J.-S. Chang and K.-Y. Su. An Unsupervised Iterative Method for Chinese New Lexicon Extraction. International Journal of Computational Linguistics & Chinese Language Processing, 1997.
....meaningful patterns. Unfortunately, segmenting an input sentence into words is a nontrivial task. There has been a significant amount of research on techniques for discovering word segmentation boundaries; see for example [BC96, Bre99, DB95, MA97, CAS98, dM95, KW99, Kit00, ZGCL00, Hua00, AL00, CS97, GPS99, PC96, Jin92, SS90, SSGC96], among which there are at least two Ph.D. theses [Kit00, dM95]. The main idea behind most of these techniques is to start with a lexicon that contains the set of possible words and then segment a concatenated character string by optimizing a heuristic objective ....
J.-S. Chang and K.-Y. Su. An Unsupervised Iterative Method for Chinese New Lexicon Extraction. International Journal of Computational Linguistics & Chinese Language Processing, 1997.
....to build a lexicon. Unfortunately, since there are over 20,000 Chinese characters, among which 6763 are most commonly used, building a complete lexicon by hand is impractical. Therefore a number of unsupervised segmentation methods have been proposed recently to segment Chinese and Japanese text [1, 15, 40, 74, 47]. Most of these approaches use some form of EM to learn a probabilistic model of character sequences and then employ Viterbi-decoding-like procedures to segment new text into words. One reason that the EM algorithm is widely adopted for unsupervised training is that it is guaranteed to converge to a ....
Chang, J.-S. and Su, K.-Y.; An Unsupervised Iterative Method for Chinese New Lexicon Extraction. International Journal of Computational Linguistics & Chinese Language Processing, 1997.
....to build a lexicon. Unfortunately, since there are over 20,000 Chinese characters, among which 6763 are most commonly used, building a complete lexicon by hand is impractical. Therefore a number of unsupervised segmentation methods have been proposed recently to segment Chinese and Japanese text [1, 3, 8, 12, 9]. Most of these approaches use some form of EM to learn a probabilistic model of character sequences and then employ Viterbi-decoding-like procedures to segment new text into words. One reason that the EM algorithm is widely adopted for unsupervised training is that it is guaranteed to converge to a ....
....standard EM segmentation can be thought of as a zero-order HMM. Mutual Information Lexicon Optimization: Other researchers have considered using mutual information to build a lexicon. For example, [14] uses mutual information to build a lexicon, but only deals with words of up to 2 characters. [3, 12] use mutual information and context information to build a lexicon based on the statistics directly obtained from the training corpus. By contrast, we are using mutual information to prune a given lexicon. That is, instead of building a lexicon from scratch, we first add all possible words and ....
Chang, J.-S. and Su, K.-Y.; An Unsupervised Iterative Method for Chinese New Lexicon Extraction. International Journal of Computational Linguistics & Chinese Language Processing, 1997.
No context found.
Chang, Jing-Shin and Keh-Yih Su, 1997a. "An Unsupervised Iterative Method for Chinese New Lexicon Extraction", to appear in International Journal of Computational Linguistics & Chinese Language Processing, 1997.