LexCrossCheck

  • Descriptions:
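    Performs cross-reference checks on the content of lexical records from a text file. Each flag below is listed with its Condition/Action, Description, Type, and Auto-Fix status; flags are grouped by the Check column where one is given.

    • DUP EUI
      Condition/Action: EUI is duplicated
      Description: check duplicated EUIs
      Type: error
      Auto-Fix: no
    • DUP REC
      Condition/Action:
        • cit|cat is duplicated
        • a record is represented by citation|category
        • potential dupRec are sent to ${outFile}.dupRec
      Description: check potential duplicated records
      Type: warning
      Auto-Fix: no

    Check: nominalization, abbreviations, acronyms
    • NO EUI
      Condition/Action:
        • no CC-EUI (found by cit|cat) & CR-EUI exists
        • remove CR-EUI
      Description: CR-EUI is not correct
      Type: error
      Auto-Fix: yes
    • NEW EUI
      Condition/Action:
        • no CC-EUI & no CR-EUI
        • these cit|cat are not in the Lexicon and should be added to it
      Description: cit|cat is new in Lexicon
      Type: warning
      Auto-Fix: no
    • MISS EUI
      Condition/Action:
        • 1 CC-EUI & no CR-EUI
        • assign CC-EUI to CR-EUI
      Description: missing CR-EUI is found
      Type: warning
      Auto-Fix: yes
    • WRONG EUI
      Condition/Action:
        • 1 CC-EUI & it does not match CR-EUI
        • requires manual review
      Description: CR-EUI is different from CC-EUI
      Type: error
      Auto-Fix: no
    • MISS EUIs
      Condition/Action:
        • multiple CC-EUIs & no CR-EUI
        • requires manual review
      Description: missing CR-EUI with multiple candidate CC-EUIs
      Type: error
      Auto-Fix: no
    • WRONG EUIs
      Condition/Action:
        • multiple CC-EUIs & none matches CR-EUI
        • requires manual review
      Description: CR-EUI is not in CC-EUIs
      Type: error
      Auto-Fix: no

    Check: nominalization
    • SYM CIT
      Condition/Action:
        • N-EUI found (nom found by EUI) & cit does not match
        • requires manual review
      Description: citation does not match in symmetric nom
      Type: error
      Auto-Fix: no
    • SYM CAT
      Condition/Action:
        • N-EUI found & cat does not match
        • requires manual review
      Description: category does not match in symmetric nom
      Type: error
      Auto-Fix: no
    • SYM NONE
      Condition/Action:
        • no N-EUI
        • requires manual review
      Description: not symmetric
      Type: error
      Auto-Fix: no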
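
    The six EUI flags above (NO EUI, NEW EUI, MISS EUI, WRONG EUI, MISS EUIs, WRONG EUIs) depend only on how many CC-EUIs were found and whether a CR-EUI is present. The following Python sketch restates those conditions as a decision function. It is an illustration, not the lexCheck implementation: the function names, the "OK" result for a matching CR-EUI, and the sample EUI values in the usage note are all assumptions.

```python
from typing import Optional, Sequence


def classify_eui(cc_euis: Sequence[str], cr_eui: Optional[str]) -> str:
    """Return the flag for one cross-reference.

    cc_euis: candidate EUIs found by cit|cat (CC-EUIs).
    cr_eui:  the CR-EUI, or None/"" when it is missing.
    "OK" is a hypothetical label for the case where no flag applies.
    """
    candidates = set(cc_euis)
    if not candidates:
        # No CC-EUI was found for this cit|cat.
        return "NO EUI" if cr_eui else "NEW EUI"
    if len(candidates) == 1:
        if not cr_eui:
            return "MISS EUI"
        return "OK" if cr_eui in candidates else "WRONG EUI"
    # Multiple CC-EUIs were found for the same cit|cat.
    if not cr_eui:
        return "MISS EUIs"
    return "OK" if cr_eui in candidates else "WRONG EUIs"


def auto_fix(cc_euis: Sequence[str], cr_eui: Optional[str]) -> Optional[str]:
    """Apply the two auto-fixable cases; leave everything else unchanged."""
    flag = classify_eui(cc_euis, cr_eui)
    if flag == "NO EUI":
        return None  # remove CR-EUI
    if flag == "MISS EUI":
        return next(iter(set(cc_euis)))  # assign the single CC-EUI to CR-EUI
    return cr_eui  # other flags require manual review
```

    For example, classify_eui(["E0000001"], None) yields "MISS EUI", and auto_fix then fills in "E0000001". Only NO EUI and MISS EUI are marked Auto-Fix: yes, so the sketch changes nothing in the other cases.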

  • Requirements:
    • lexCheck${YEAR}dist.jar or
    • lexCheck${YEAR}api.jar and lvg${YEAR}api.jar

  • Usage:
    shell> LexCrossCheck <inFile> <autoFixFile> <-v> <prepositionFile> <dupRecExpFile> <notBaseFormFile>
    • inFile: lexical record in text format
    • autoFixFile: auto-fixed lexical record in text format
    • -v: set verbose to true, default: false
    • prepositionFile: the preposition file, default: use the prepositions.data included in the lexCheck${YEAR}api.jar or lexCheck${YEAR}dist.jar
      => prepositions are used in the class of Compl.CheckPreposition.java.
    • dupRecExpFile: the duplicate exception file. Duplicate reports are generated in ${autoFixFile}.dupRec
    • notBaseFormFile: the not-base-form file, taken from the tag results (output of AnalyzeNewEuiFile: *cr/abb.out).
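
    Because the arguments are positional, a script that drives the tool has to supply them in the order shown above. The following Python sketch builds such an argument list. The helper name build_command and its parameter names are hypothetical, and the sketch assumes a launcher named LexCrossCheck is available as in the shell usage line. The documentation does not say how to skip a position, so the sketch rejects gaps rather than guessing.

```python
from typing import List, Optional


def build_command(
    in_file: str,
    auto_fix_file: Optional[str] = None,
    verbose: bool = False,
    preposition_file: Optional[str] = None,
    dup_rec_exp_file: Optional[str] = None,
    not_base_form_file: Optional[str] = None,
) -> List[str]:
    """Build the argument list in the positional order given under Usage."""
    # Optional positional arguments, in documented order.
    optional = [
        auto_fix_file,
        "-v" if verbose else None,
        preposition_file,
        dup_rec_exp_file,
        not_base_form_file,
    ]
    cmd = ["LexCrossCheck", in_file]
    gap_seen = False
    for arg in optional:
        if arg is None:
            gap_seen = True
        elif gap_seen:
            # A later argument was given after a skipped position.
            raise ValueError("positional arguments must be supplied without gaps")
        else:
            cmd.append(arg)
    return cmd
```

    The resulting list can be handed to subprocess.run(cmd, check=True) from the standard library, provided the launcher is on the PATH.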

  • Outputs:
    • Confirmation message if records are valid; otherwise, error messages.
    • Auto-fixed Lexicon
    • ${autoFixFile}.dupRec: duplicated records that need to be addressed before further checking
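
    The duplicate report corresponds to the DUP EUI and DUP REC checks: an EUI that appears more than once, and a citation|category key that appears more than once. The following Python sketch shows one way to find both kinds of duplicates. The (EUI, citation, category) tuple layout, the function name, and the exceptions parameter (standing in for the contents of dupRecExpFile, whose format is not described here) are assumptions, and the sample EUIs in the usage note are made up.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Assumed in-memory layout for one record: (EUI, citation, category).
Record = Tuple[str, str, str]


def find_duplicates(
    records: Iterable[Record],
    exceptions: FrozenSet[str] = frozenset(),
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Return (duplicated EUIs, duplicated cit|cat keys).

    exceptions holds cit|cat keys that are known, accepted duplicates.
    """
    by_eui: Dict[str, List[str]] = defaultdict(list)
    by_cit_cat: Dict[str, List[str]] = defaultdict(list)
    for eui, cit, cat in records:
        key = f"{cit}|{cat}"  # a record is represented by citation|category
        by_eui[eui].append(key)
        by_cit_cat[key].append(eui)
    # DUP EUI: the same EUI is used by more than one record.
    dup_eui = {e: keys for e, keys in by_eui.items() if len(keys) > 1}
    # DUP REC: the same cit|cat appears more than once and is not excepted.
    dup_rec = {
        k: euis
        for k, euis in by_cit_cat.items()
        if len(euis) > 1 and k not in exceptions
    }
    return dup_eui, dup_rec
```

    With the records ("E0000002", "bank", "noun") and ("E0000003", "bank", "noun"), the key bank|noun is reported as a potential duplicate unless it is listed in exceptions.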

  • Examples:
    • shell> LexCrossCheck ./in.txt
    • shell> LexCrossCheck ./in.txt ./out.txt
    • shell> LexCrossCheck ./in.txt ./out.txt -v
    • shell> LexCrossCheck ./in.txt ./out.txt -v ./data/Files/prepositions.data ./data/Files/dupRecExceptions.data ./data/Files/notBaseForm.data