  Memory-Based Learning: Using Similarity for Smoothing

unknown authors
http://acl.ldc.upenn.edu/P/P97/P97-1056.pdf

Abstract:

This paper analyses the relation between the use of similarity in Memory-Based Learning and the notion of backed-off smoothing in statistical language modeling. We show that the two approaches are closely related, and we argue that feature weighting methods in the Memory-Based paradigm can offer the advantage of automatically specifying a suitable domain-specific hierarchy between most specific and most general conditioning information without the need for a large number of parameters. We report two applications of this approach: PP-attachment and POS-tagging. Our method achieves state-of-the-art performance in both domains, and allows the easy integration of diverse information sources, such as rich lexical representations.
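
To make the link between similarity and back-off concrete, here is a minimal sketch of a memory-based classifier with a weighted overlap metric, applied to a toy PP-attachment task. It is an illustration under stated assumptions, not the paper's implementation: the training instances are invented, information gain is used as one possible feature weighting (the abstract does not name a specific one), and the names `TRAIN`, `information_gain_weights`, `weighted_overlap`, and `classify` are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

# Toy instances: (verb, noun1, preposition, noun2) -> attachment site.
# "V" = attaches to the verb, "N" = attaches to the noun.
# Invented for illustration; not data from the paper.
TRAIN = [
    (("eat", "pizza", "with", "fork"), "V"),
    (("eat", "pizza", "with", "anchovies"), "N"),
    (("see", "man", "with", "telescope"), "V"),
    (("buy", "book", "of", "poems"), "N"),
    (("read", "review", "of", "film"), "N"),
    (("put", "cup", "on", "table"), "V"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain_weights(data):
    """One weight per feature: class entropy minus the expected class
    entropy after splitting the data on that feature's values."""
    base = entropy([label for _, label in data])
    weights = []
    for i in range(len(data[0][0])):
        by_value = defaultdict(list)
        for features, label in data:
            by_value[features[i]].append(label)
        remainder = sum(
            len(labels) / len(data) * entropy(labels)
            for labels in by_value.values()
        )
        weights.append(base - remainder)
    return weights


def weighted_overlap(a, b, weights):
    """Similarity: sum of the weights of the features on which a and b agree."""
    return sum(w for x, y, w in zip(a, b, weights) if x == y)


def classify(query, data, weights):
    """Majority vote among the stored instances most similar to the query."""
    scored = [(weighted_overlap(query, x, weights), label) for x, label in data]
    best = max(score for score, _ in scored)
    votes = Counter(label for score, label in scored if score == best)
    return votes.most_common(1)[0][0]


weights = information_gain_weights(TRAIN)
# Only the preposition "of" matches any stored instance, so the decision
# rests on the instances that share that single feature.
print(classify(("write", "summary", "of", "report"), TRAIN, weights))  # N
```

When a query has no exact match in memory, its nearest neighbours are the instances that agree on the most heavily weighted subset of features. In that sense the weights induce an ordering from specific to general conditioning information, which is the role that an explicit back-off sequence plays in a smoothed statistical model.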
