  Similarity-based approaches to natural language processing (1997) [34 citations — 2 self]

by Lillian Jane Lee
http://www.cs.cornell.edu/home/llee/papers/thesis-single.ps

Abstract:

Statistical methods for automatically extracting information about associations between words or documents from large collections of text have the potential to have considerable impact in a number of areas, such as information retrieval and natural-language-based user interfaces. However, even huge bodies of text yield highly unreliable estimates of the probability of relatively common events, and, in fact, perfectly reasonable events may not occur in the training data at all. This is known as the sparse data problem. Traditional approaches to the sparse data problem use crude approximations. We propose a different solution: if we are able to organize the data into classes of similar events, then, if information about an event is lacking, we can estimate its behavior from information about similar events. This thesis presents two such similarity-based approaches, where, in general, we measure similarity by the Kullback-Leibler divergence, an information-theoretic quantity. Our first approach is to build soft, hierarchical clusters: soft, because each event belongs to each cluster with some probability; hierarchical, because cluster centroids are iteratively

Citations

4398 Maximum likelihood from incomplete data via the EM algorithm – Dempster, Laird, et al. - 1977
683 Finding Groups in Data: An Introduction to Cluster Analysis – Kaufman, Rousseeuw - 1990
679 WordNet: A lexical database for English – Miller - 1995
647 Pattern Recognition with Fuzzy Objective Function Algorithms – Bezdek - 1981
623 A stochastic parts program and noun phrase parser for unrestricted text – Church - 1988
596 Information Theory and Statistics – Kullback - 1959
520 Estimation of probabilities from sparse data for the language model component of a speech recognizer – Katz - 1987
500 The use of multiple measurements in taxonomic problems – Fisher - 1936
435 Word association norms, mutual information, and lexicography – Church, Hanks - 1990
419 Scatter/Gather: A cluster-based approach to browsing large document collections – Cutting, Karger, et al. - 1992
405 Distributional Clustering of English Words – Pereira, Tishby, et al. - 1993
394 Class-based n-gram models of natural language – Brown, deSouza, et al. - 1992
346 Bayesian Classification (AutoClass): Theory and Results – Cheeseman, Stutz - 1995
339 An empirical study of smoothing techniques for language modeling – Chen, Goodman - 1998
318 A maximum likelihood approach to continuous speech recognition – Bahl, Jelinek, et al. - 1983
284 Information theory and statistical mechanics – Jaynes - 1957
252 Syntactic Structures – Chomsky - 1957
238 The population frequencies of species and the estimation of population parameters – Good - 1953
238 Interpolated estimation of Markov source parameters from sparse data – Jelinek, Mercer - 1980
227 Word-sense disambiguation using statistical models of Roget's categories trained on large corpora – Yarowsky - 1992
213 AUTOCLASS: A Bayesian classification system – Cheeseman, Kelly, et al. - 1988
190 Selection and Information: A Class-Based Approach to Lexical Relationships – Resnik - 1993
159 Elements of Information Theory. Wiley Series in Telecommunications – Cover, Thomas - 1991
155 Noun Classification from Predicate-Argument Structures – Hindle - 1990
143 Pairwise data clustering by deterministic annealing – Hofmann, Buhmann - 1997
125 A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams. Computer Speech and Language 5 – Church, Gale - 1991
115 Statistical mechanics and phase transitions in clustering – Rose, Fox - 1990
106 Probability theory – Rényi - 1970
95 The Art of Computer Programming, volume 1 – Knuth - 1973
75 Contextual word similarity and estimation from sparse data – Dagan, Marcus, et al.
75 Improved clustering techniques for class-based statistical language modeling – Kneser, Ney - 1993
56 Principles of lexical language modeling for speech recognition – Jelinek, Mercer, et al. - 1992
54 Similarity-based estimation of word co-occurrence probabilities – Dagan, Pereira, et al. - 1994
52 Statistical methods and linguistics – Abney - 1996
50 Towards the Automatic Identification of Adjectival Scales: Clustering Adjectives According to Meaning – Hatzivassiloglou, McKeown - 1993
44 On the Estimation of ’Small’ Probabilities by Leaving-One-Out – Essen
44 Wordnet and distributional analysis: A class-based approach to lexical discovery – Resnik - 1992
40 On the complexity of clustering problems – Brucker - 1977
40 Word space – Schütze - 1993
39 Cooccurrence smoothing for stochastic language modeling – Essen, Steinbiss - 1992
39 A synopsis of linguistic theory. 1930-1955 – Firth - 1957
39 Work on statistical methods for word sense disambiguation – Gale, Church, et al. - 1992
35 Similarity-based methods for word sense disambiguation – Dagan, Lee, et al. - 1997
33 Statistical sense disambiguation with relatively small corpora using dictionary definitions – Luk - 1995
30 Bayesian Classification with Correlation and Inheritance – Hanson, Stutz, et al. - 1991
27 Part-of-speech induction from scratch – Schütze - 1993
27 Intrinsic classification by MML - the Snob Program – Wallace, Dowe - 1994
24 A parser for text corpora – Hindle - 1993
23 Bootstrapping syntactic categories – Finch, Chater - 1992
22 Learning similarity-based word sense disambiguation from sparse data – Karov, Edelman - 1996