In this paper I present a novel machine learning technique based on Pair Hidden Markov Models, a class of statistical models used in bioinformatics. The technique learns finite-state string-to-string transductions. I use it to model the acquisition of the English past tense. The same model, without modification, also learns the Arabic broken plural, a much more complex morphological system. I also show how the model can be used for unsupervised learning of morphology: it can learn morphology from sets of words automatically induced from unlabelled corpora. I then discuss other applications and extensions of the technique.
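To make the idea of a Pair Hidden Markov Model as a string-to-string transducer concrete, here is a minimal sketch of the forward algorithm for a three-state Pair HMM. It assumes the textbook topology from biological sequence analysis (a state M that emits an input/output symbol pair, a state X that emits an input symbol only, and a state Y that emits an output symbol only). The topology, the function name `forward`, and every probability below are illustrative assumptions, not the model or parameters used in the paper.

```python
# Hypothetical three-state Pair HMM; all numbers are made-up toy values.
STATES = ("M", "X", "Y")  # M emits (a, b); X emits (a, -); Y emits (-, b)

START = {"M": 0.8, "X": 0.1, "Y": 0.1}
TRANS = {
    "M": {"M": 0.6, "X": 0.1, "Y": 0.1},
    "X": {"M": 0.5, "X": 0.2, "Y": 0.1},
    "Y": {"M": 0.5, "X": 0.1, "Y": 0.2},
}
END = {"M": 0.2, "X": 0.2, "Y": 0.2}  # each TRANS row plus END sums to 1

EMIT_M = {("a", "a"): 0.4, ("b", "b"): 0.4, ("a", "b"): 0.1, ("b", "a"): 0.1}
EMIT_X = {"a": 0.5, "b": 0.5}
EMIT_Y = {"a": 0.5, "b": 0.5}


def forward(x, y):
    """Joint probability of the pair (x, y), summed over all alignments."""
    n, m = len(x), len(y)
    # f[s][i][j]: probability of having emitted x[:i] and y[:j], ending in state s
    f = {s: [[0.0] * (m + 1) for _ in range(n + 1)] for s in STATES}
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue  # nothing emitted yet; handled through START below
            for s in STATES:
                # How many input/output symbols state s consumes.
                di, dj = {"M": (1, 1), "X": (1, 0), "Y": (0, 1)}[s]
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue
                if s == "M":
                    e = EMIT_M.get((x[pi], y[pj]), 0.0)
                elif s == "X":
                    e = EMIT_X.get(x[pi], 0.0)
                else:
                    e = EMIT_Y.get(y[pj], 0.0)
                if pi == 0 and pj == 0:
                    incoming = START[s]  # first emission of the sequence
                else:
                    incoming = sum(f[r][pi][pj] * TRANS[r][s] for r in STATES)
                f[s][i][j] = incoming * e
    return sum(f[s][n][m] * END[s] for s in STATES)


if __name__ == "__main__":
    print(forward("ab", "ab"), forward("ab", "ba"))
```

Under these toy parameters the model prefers identity-like transductions, because matching pairs carry most of the emission mass in state M. In a learning setting the same forward quantities, together with their backward counterparts, would supply the expected counts for EM re-estimation; that step is omitted here.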