@MISC{_towardsfullautomationoflexiconconstruction, author = {}, title = {Towards Full Automation of Lexicon Construction}, year = {}}
We describe work in progress aimed at developing methods for automatically constructing a lexicon using only statistical data derived from analysis of corpora, a problem we call lexical optimization. Specifically, we use statistical methods alone to obtain information equivalent to syntactic categories, and to discover the semantically meaningful units of text, which may be multi-word units or polysemous terms-in-context. Our guiding principle is to employ a notion of “meaningfulness” that can be quantified information-theoretically, so that plausible variants of a lexicon can be judged relative to each other. We describe a technique of this nature called information theoretic co-clustering and give results of a series of experiments built around it that demonstrate the main ingredients of lexical optimization. We conclude by describing our plans for further improvements, and for applying the same mathematical principles to other problems in natural language processing.