4 citations found.
Jing-Shin Chang, Yi-Chung Lin, and Keh-Yih Su. 1995. Automatic Construction of a Chinese Electronic Dictionary. In Proceedings of VLC-95, pages 107-120.

This paper is cited in the following contexts:
Automatic Extraction of New Words from Japanese Texts using.. - Nagata (1996)   (1 citation)

....the best analysis) from an unsegmented corpus of 20 million words. Initial estimates of the word frequencies were derived from the frequencies in the corpus of the strings of hanzi making up each word in the lexicon, whether or not each string is actually an instance of the word in question. (Chang et al. 1995) proposed an automatic dictionary construction method for Chinese from a large unsegmented corpus (311591 sentences) with the help of a small segmented seed corpus (1000 sentences). They combined Viterbi reestimation using the word unigram model with a post filter called the Two Class ....

....model for word segmentation, and expected word frequency for unknown word extraction. We compared the results with a segmented Japanese corpus, and reported 43.7% recall and 52.3% precision for 1000 sentences whose out-of-vocabulary rate is 2.1%. It is impossible to compare our results with (Chang et al. 1995), because the experiment conditions are completely different in terms of language (Chinese vs. Japanese), the size of seed segmented corpus, the size of target unsegmented corpus and its out-of-vocabulary rate, the size of initial word list, and the type of reference data 57 (on line ....

Jing-Shin Chang, Yi-Chung Lin, and Keh-Yih Su. 1995. Automatic Construction of a Chinese Electronic Dictionary. In Proceedings of VLC-95, pages 107-120.


USeg: A Retargetable Word Segmentation Procedure for.. - Ponte, Croft (1996)   (14 citations)

....out a per-word cost. Additional complexity arises from the inflectional endings. In order to consider a candidate word, morphological processing is done to find every possible grouping of characters that form valid words. This complication does not arise in Chinese to the same degree. Chang et al. [6] describe a method of Chinese lexical acquisition using a small seed corpus, a word-based segmenter, and a two-class classifier for words. N-grams of size 2, 3, and 4 were used as candidates in the initial lexicon. The word probabilities were updated by training the word-based segmenter on a ....

....from the training text is that many modern words, such as words used for concepts in science and technology, are very different in Chinese as spoken in Mainland China and in Taiwan [7]. 7.2 Automatic Lexical Acquisition From Text. The lexical acquisition process was done in a manner similar to [6]. Statistics were collected for n-grams of size 2, 3, and 4 since most Chinese words are of those lengths. The frequency of occurrence, mutual information and entropy were the statistics used [6]. Informally, an n-gram is assumed to be a word if it: • Occurs frequently USeg: A Retargetable Word ....

[Article contains additional citation context not shown here]

Chang, J. S., Lin, Y. C. and Su, K. Y. Automatic Construction of a Chinese Electronic Dictionary. Proceedings of the Third Workshop on Very Large Corpora, June 1995.


A Multivariate Gaussian Mixture Model for Automatic Compound.. - Chang, Su   Self-citation (Jing-shin Su)

No context found.

Chang, Jing-Shin, Yi-Chung Lin and Keh-Yih Su, "Automatic Construction of a Chinese Electronic Dictionary," Proceedings of the Third Workshop on Very Large Corpora, pp. 107-120, MIT, June, 1995.


An Unsupervised Iterative Method for Chinese New Lexicon.. - Jing-Shin Chang (1997)   (7 citations)  Self-citation (Jing-shin Su)

No context found.

Chang, Jing-Shin, Yi-Chung Lin and Keh-Yih Su, "Automatic Construction of a Chinese Electronic Dictionary," Proceedings of the Third Workshop on Very Large Corpora, pp. 107-120, MIT, June, 1995.
