Abstract:
This paper analyses the relation between the use of similarity in Memory-Based Learning and the notion of backed-off smoothing in statistical language modeling. We show that the two approaches are closely related, and we argue that feature weighting methods in the Memory-Based paradigm can offer the advantage of automatically specifying a suitable domain-specific hierarchy between most specific and most general conditioning information without the need for a large number of parameters. We report two applications of this approach: PP-attachment and POS-tagging. Our method achieves state-of-the-art performance in both domains, and allows the easy integration of diverse information sources, such as rich lexical representations.
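The idea of feature-weighted similarity can be illustrated with a minimal sketch. The abstract does not say which weighting method is used, so information gain is assumed here as one common choice. The function names and the toy PP-attachment instances (verb, noun, preposition, noun, labelled "V" for verb attachment or "N" for noun attachment) are invented for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain_weights(instances, labels):
    """One weight per feature: class entropy minus the expected
    class entropy after splitting on that feature's values."""
    base = entropy(labels)
    weights = []
    for i in range(len(instances[0])):
        by_value = defaultdict(list)
        for x, y in zip(instances, labels):
            by_value[x[i]].append(y)
        remainder = sum(len(ys) / len(labels) * entropy(ys)
                        for ys in by_value.values())
        weights.append(base - remainder)
    return weights

def weighted_overlap_distance(a, b, weights):
    """Sum of the weights of all features on which a and b differ."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)

def classify(query, instances, labels, weights):
    """Majority class among the stored instances at minimal distance."""
    distances = [weighted_overlap_distance(query, x, weights)
                 for x in instances]
    best = min(distances)
    votes = Counter(y for d, y in zip(distances, labels) if d == best)
    return votes.most_common(1)[0][0]

# Hypothetical toy data, invented for this sketch.
TRAIN = [
    (("eat", "pizza", "with", "fork"), "V"),
    (("eat", "pizza", "with", "anchovies"), "N"),
    (("see", "man", "with", "telescope"), "V"),
    (("buy", "book", "of", "poems"), "N"),
    (("read", "review", "of", "film"), "N"),
    (("put", "book", "on", "table"), "V"),
]
X = [features for features, _ in TRAIN]
y = [label for _, label in TRAIN]

weights = information_gain_weights(X, y)
prediction = classify(("buy", "review", "of", "novel"), X, y, weights)
print(weights, prediction)
```

In this sketch a mismatch on a highly weighted feature costs more than a mismatch on a weakly weighted one, so the weights implicitly order which partial matches are preferred. When a query shares no feature values with any stored instance, all instances tie and the vote reduces to the overall class distribution, which loosely mirrors backing off to the most general estimate.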