LexCrossCheck
- Descriptions:
Cross-reference checks the content of lexical records read from a text file. The validations (defined in ./CheckCont/ErrMsgUtilLexicon.java) are:
Check            Flag        Value  Condition/Action                                       Description                                  Type     Auto-Fix
---------------  ----------  -----  -----------------------------------------------------  -------------------------------------------  -------  --------
                 DUP EUI     1      if EUI is duplicated                                   Check duplicated EUIs                        error    no
                 DUP REC     2      - if cit|cat is duplicated                             Check potential duplicated records           warning  no
                                    - A record is represented by citation|category
                                    - Potential dupRec are sent to ${autoFixFile}.dupRec
nominalization,  NO EUI      3      - No CC-EUI (found by cit|cat) & CR-EUI exists         CR-EUI is not correct                        error    yes
abbreviations,                      - remove CR-EUI
acronyms
                 NEW EUI     13     - No CC-EUI & no CR-EUI                                cit|cat is new in Lexicon                    warning  no
                                    - These cit|cat are not in Lexicon and should be
                                      added to Lexicon
                 MISS EUI    6      - 1 CC-EUI & no CR-EUI                                 missing CR-EUI is found                      warning  yes
                                    - assign CC-EUI to CR-EUI
                 WRONG EUI   7      - 1 CC-EUI & does not match CR-EUI                     CR-EUI is different from CC-EUI              error    no
                                    - requires manual review
                 MISS EUIs   8      - Multi CC-EUIs & no CR-EUI                            missing CR-EUI with multi candidate CC-EUIs  error    no
                                    - requires manual review
                 WRONG EUIs  9      - Multi CC-EUIs & none matches CR-EUI                  CR-EUI is not in CC-EUIs                     error    no
                                    - requires manual review
nominalization   SYM CIT     10     - N-EUI found (nom found by EUI) & cit does not match  citation does not match in symmetric nom     error    no
                                    - requires manual review
                 SYM CAT     11     - N-EUI found & cat does not match                     category does not match in symmetric nom     error    no
                                    - requires manual review
                 SYM NONE    12     - No N-EUI                                             Not symmetric                                error    no
                                    - requires manual review
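The EUI rows of the table above (NO EUI, NEW EUI, MISS EUI, WRONG EUI, MISS EUIs, WRONG EUIs) amount to a decision on two inputs: the CC-EUIs found by cit|cat and the CR-EUI carried by the record. The following is a minimal sketch of that decision, not the LexCrossCheck source. The class, the enum, the method names, and the `OK` value of 0 are assumptions made for illustration, and the EUIs in `main` are made-up placeholders.

```java
import java.util.List;

/** Hypothetical sketch of the EUI rows in the table; not the actual implementation. */
public class EuiCrossCheckSketch {

    /** Flag names and values are copied from the table; the enum itself is an assumption. */
    enum Flag {
        OK(0), // assumption: the table lists no value for a clean match
        NO_EUI(3), MISS_EUI(6), WRONG_EUI(7), MISS_EUIS(8), WRONG_EUIS(9), NEW_EUI(13);

        final int value;

        Flag(int value) {
            this.value = value;
        }
    }

    /**
     * @param ccEuis EUIs found by cit|cat (may be empty)
     * @param crEui  the CR-EUI carried by the record, or null if absent
     */
    static Flag classify(List<String> ccEuis, String crEui) {
        boolean hasCr = crEui != null && !crEui.isEmpty();
        if (ccEuis.isEmpty()) {
            // No CC-EUI: a stale CR-EUI is wrong; otherwise the cit|cat is new.
            return hasCr ? Flag.NO_EUI : Flag.NEW_EUI;
        }
        if (ccEuis.size() == 1) {
            if (!hasCr) {
                return Flag.MISS_EUI;
            }
            return ccEuis.get(0).equals(crEui) ? Flag.OK : Flag.WRONG_EUI;
        }
        // Multiple candidate CC-EUIs.
        if (!hasCr) {
            return Flag.MISS_EUIS;
        }
        return ccEuis.contains(crEui) ? Flag.OK : Flag.WRONG_EUIS;
    }

    /** Only NO EUI and MISS EUI are marked Auto-Fix "yes" in the table. */
    static boolean isAutoFixable(Flag flag) {
        return flag == Flag.NO_EUI || flag == Flag.MISS_EUI;
    }

    public static void main(String[] args) {
        System.out.println(classify(List.of(), "E0000001"));                       // NO_EUI
        System.out.println(classify(List.of(), null));                             // NEW_EUI
        System.out.println(classify(List.of("E0000001"), null));                   // MISS_EUI
        System.out.println(classify(List.of("E0000001"), "E0000002"));             // WRONG_EUI
        System.out.println(classify(List.of("E0000001", "E0000002"), "E0000003")); // WRONG_EUIS
    }
}
```

Keeping the classification separate from `isAutoFixable` mirrors the table, where the Type and Auto-Fix columns are independent of the condition that raises the flag.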
- Usage:
shell> LexCrossCheck <inFile> <autoFixFile> <prepositionFile> <particleFile> <dupRecExpFile> <notBaseFormFile> <-v: verbose>
- inFile: lexical record in text format
- autoFixFile: auto-fixed lexical record in text format
- prepositionFile: the preposition file; default: the prepositions.data included in lexCheck${YEAR}api.jar or lexCheck${YEAR}dist.jar
=> prepositions are used in the class Compl.CheckPreposition.java.
- particleFile: the particle file; default: the particles.data included in lexCheck${YEAR}api.jar or lexCheck${YEAR}dist.jar
=> particles are used in the class Compl.CheckParticle.java.
- dupRecExpFile: the duplicate-record exception file. Duplicate-record reports are generated in ${autoFixFile}.dupRec
- notBaseFormFile: the not-base-form file, from the tag results (output of AnalyzeNewEuiFile: *cr/abb.out).
- -v: set verbose to true; default: false
- Outputs:
- On screen message:
- Confirmed message if records are valid.
- Otherwise, error message.
- autoFixFile: Auto-fixed Lexicon
- ${autoFixFile}.dupRec: duplicated records that need to be addressed before further checking
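The ${autoFixFile}.dupRec report comes from the DUP REC check, where a record is represented by citation|category. A minimal sketch of that grouping follows, assuming records have already been parsed. The `Rec` class and `findDupRecs` method are hypothetical names, the sample records are made-up placeholders, and the handling of dupRecExpFile is left out because its format is not described here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of duplicate-record detection by cit|cat key. */
public class DupRecSketch {

    /** Minimal, assumed view of a lexical record. */
    static final class Rec {
        final String eui;
        final String citation;
        final String category;

        Rec(String eui, String citation, String category) {
            this.eui = eui;
            this.citation = citation;
            this.category = category;
        }

        /** A record is represented by citation|category. */
        String key() {
            return citation + "|" + category;
        }
    }

    /** Returns each cit|cat key shared by two or more records, with the EUIs that share it. */
    static Map<String, List<String>> findDupRecs(List<Rec> recs) {
        Map<String, List<String>> byKey = new LinkedHashMap<>();
        for (Rec r : recs) {
            byKey.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r.eui);
        }
        // Keep only keys that occur more than once.
        byKey.values().removeIf(euis -> euis.size() < 2);
        return byKey;
    }

    public static void main(String[] args) {
        List<Rec> recs = List.of(
            new Rec("E0000001", "lead", "noun"),
            new Rec("E0000002", "lead", "verb"),
            new Rec("E0000003", "lead", "noun"));
        System.out.println(findDupRecs(recs)); // {lead|noun=[E0000001, E0000003]}
    }
}
```

Because the flag is a warning with no auto-fix, the sketch only reports candidates and leaves the records untouched.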
- Notes:
- Must include:
- lexCheck${YEAR}dist.jar (for LVG APIs) or
- lexCheck${YEAR}api.jar and lvg${YEAR}api.jar
- Benchmark run time for Lexicon: 10 ~ 15 sec.
- Examples:
shell> LexCrossCheck lexicon.txt lexicon.fixed ./data/Files/prepositions.data ./data/Files/particles.data ./data/Files/dupRecExceptions.data ./data/Files/notBaseForm.data -v
shell> LexCrossCheck lexicon.txt lexicon.fixed ./data/Files/prepositions.data ./data/Files/particles.data ./data/Files/dupRecExceptions.data ./data/Files/notBaseForm.data