Results 1 -
1 of
1
Long-tail Distributions and Unsupervised Learning of Morphology
"... In previous work on unsupervised learning of morphology, the long-tail pattern in the rank-frequency distribution of words, as well as of morphological units, is usually considered as following Zipf’s law (power-law). We argue that these long-tail distributions can also be considered as lognormal. S ..."
Abstract
- Add to MetaCart
(Show Context)
In previous work on unsupervised learning of morphology, the long-tail pattern in the rank-frequency distribution of words, as well as of morphological units, is usually considered as following Zipf’s law (power-law). We argue that these long-tail distributions can also be considered as lognormal. Since we know the conjugate prior distribution for a lognormal likelihood, we propose to generate morphology data from lognormal distributions. When the performance is evaluated by a tokenbased criterion, giving more weights to the results of frequent words, the proposed model preforms significantly better than other models in discussion. Moreover, we capture the statistical properties of morphological units with a Bayesian approach, other than a rule-based approach as studied in (Chan, 2008) and (Zhao and Marcus, 2011). Given the multiplicative property of lognormal distributions, we can directly capture the long-tail distribution of word frequency, without the need of an additional generative process as studied in (Goldwater et al., 2006).