LexCrossCheck
- Descriptions:
Cross-checks the content of lexical records in a text file. It performs the following checks:

| Check | Flag | Condition/Action | Description | Type | Auto-Fix |
|---|---|---|---|---|---|
| DUP | EUI | EUI is duplicated | Check duplicated EUIs | error | no |
| DUP | REC | cit\|cat is duplicated; a record is represented by citation\|category; potential dupRec are sent to ${autoFixFile}.dupRec | Check potential duplicated records | warning | no |
| nominalization, abbreviations, acronyms | NO EUI | No CC-EUI (found by cit\|cat) & CR-EUI exists; remove CR-EUI | CR-EUI is not correct | error | yes |
| nominalization, abbreviations, acronyms | NEW EUI | No CC-EUI & no CR-EUI; these cit\|cat are not in the Lexicon and should be added to it | cit\|cat is new in Lexicon | warning | no |
| nominalization, abbreviations, acronyms | MISS EUI | 1 CC-EUI & no CR-EUI; assign CC-EUI to CR-EUI | Missing CR-EUI is found | warning | yes |
| nominalization, abbreviations, acronyms | WRONG EUI | 1 CC-EUI & it does not match CR-EUI; requires manual review | CR-EUI is different from CC-EUI | error | no |
| nominalization, abbreviations, acronyms | MISS EUIs | Multiple CC-EUIs & no CR-EUI; requires manual review | Missing CR-EUI with multiple candidate CC-EUIs | error | no |
| nominalization, abbreviations, acronyms | WRONG EUIs | Multiple CC-EUIs & none matches CR-EUI; requires manual review | CR-EUI is not in CC-EUIs | error | no |
| nominalization | SYM CIT | N-EUI found (nom found by EUI) & cit does not match; requires manual review | Citation does not match in symmetric nom | error | no |
| nominalization | SYM CAT | N-EUI found & cat does not match; requires manual review | Category does not match in symmetric nom | error | no |
| nominalization | SYM NONE | No N-EUI; requires manual review | Not symmetric | error | no |
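
The six EUI flags in the table reduce to a decision on two inputs: how many CC-EUIs were found by cit|cat, and whether a CR-EUI is present and among them. The following is a minimal sketch of that decision, not the lexCheck implementation. The class name `CrossCheckSketch`, the enum `EuiFlag`, the method `classifyEui`, and the reading of CC-EUI as "EUIs found in the Lexicon by cit|cat" and CR-EUI as "the EUI stored in the cross-reference" are all assumptions made for illustration.

```java
import java.util.Set;

// Illustrative sketch only: every name here is hypothetical, not part of the lexCheck API.
final class CrossCheckSketch {

    // Flags from the EUI rows of the table, plus OK when nothing is flagged.
    enum EuiFlag { OK, NO_EUI, NEW_EUI, MISS_EUI, WRONG_EUI, MISS_EUIS, WRONG_EUIS }

    /**
     * ccEuis: EUIs found by cit|cat (assumed meaning of CC-EUI).
     * crEui:  EUI stored in the cross-reference (assumed meaning of CR-EUI),
     *         or null/empty if it is missing.
     */
    static EuiFlag classifyEui(Set<String> ccEuis, String crEui) {
        boolean hasCr = crEui != null && !crEui.isEmpty();

        if (ccEuis.isEmpty()) {
            // No candidate: a stored CR-EUI is wrong, otherwise cit|cat is new.
            return hasCr ? EuiFlag.NO_EUI : EuiFlag.NEW_EUI;
        }
        if (ccEuis.size() == 1) {
            // Exactly one candidate: fill in a missing CR-EUI or compare.
            if (!hasCr) {
                return EuiFlag.MISS_EUI;
            }
            return ccEuis.contains(crEui) ? EuiFlag.OK : EuiFlag.WRONG_EUI;
        }
        // Several candidates: nothing can be auto-fixed.
        if (!hasCr) {
            return EuiFlag.MISS_EUIS;
        }
        return ccEuis.contains(crEui) ? EuiFlag.OK : EuiFlag.WRONG_EUIS;
    }

    public static void main(String[] args) {
        // One candidate found by cit|cat, no CR-EUI stored in the record.
        System.out.println(classifyEui(Set.of("E0000001"), null));
    }
}
```

Per the table, only NO EUI (remove CR-EUI) and MISS EUI (assign CC-EUI to CR-EUI) are auto-fixed; the other flags are reported for manual handling.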
- Requirements:
- lexCheck${YEAR}dist.jar or
- lexCheck${YEAR}api.jar and lvg${YEAR}api.jar
- Usage:
shell> LexCrossCheck <inFile> <autoFixFile> <-v> <prepositionFile> <dupRecExpFile> <notBaseFormFile>
- inFile: lexical record in text format
- autoFixFile: auto-fixed lexical record in text format
- -v: set verbose to true, default: false
- prepositionFile: the preposition file, default: use the prepositions.data included in the lexCheck${YEAR}api.jar or lexCheck${YEAR}dist.jar
=> prepositions are used in the class Compl.CheckPreposition.java.
- dupRecExpFile: the duplicate exception file. The duplicate reports are generated in ${autoFixFile}.dupRec
- notBaseFormFile: the not-base-form file, taken from the tag results (output of AnalyzeNewEuiFile: *cr/abb.out).
- Outputs:
- A confirmation message if all records are valid; otherwise, error messages.
- Auto-fixed Lexicon
- ${autoFixFile}.dupRec: duplicated records that need to be addressed before further checking
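
The duplicate-record report can be pictured as a grouping of records by their cit|cat key, minus any keys listed as exceptions. The sketch below illustrates that idea only. The class `DupRecSketch`, the method `findDupRecs`, the representation of a record as a two-element `{citation, category}` array, and the assumption that the exception file boils down to a set of cit|cat keys are all hypothetical; the actual file formats are not described here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: names and data shapes are hypothetical, not the lexCheck API.
final class DupRecSketch {

    /**
     * records:    each element is {citation, category} (assumed shape).
     * exceptions: cit|cat keys that are allowed to repeat (assumed content
     *             of the duplicate exception file).
     * Returns the cit|cat keys that occur more than once and are not exceptions,
     * in order of first appearance.
     */
    static List<String> findDupRecs(List<String[]> records, Set<String> exceptions) {
        // Count occurrences per cit|cat key, keeping first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String[] rec : records) {
            counts.merge(rec[0] + "|" + rec[1], 1, Integer::sum);
        }
        List<String> dups = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1 && !exceptions.contains(e.getKey())) {
                dups.add(e.getKey());
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        List<String[]> records = List.of(
            new String[] {"run", "verb"},
            new String[] {"run", "noun"},
            new String[] {"run", "verb"});
        // Report duplicated keys with an empty exception set.
        System.out.println(findDupRecs(records, Set.of()));
    }
}
```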
- Examples:
shell> LexCrossCheck ./in.txt
shell> LexCrossCheck ./in.txt ./out.txt
shell> LexCrossCheck ./in.txt ./out.txt -v
shell> LexCrossCheck ./in.txt ./out.txt -v ./data/Files/prepositions.data ./data/Files/dupRecExceptions.data ./data/Files/notBaseForm.data