Results 1 - 7 of 7
Visually and Phonologically Similar Characters in Incorrect Simplified Chinese Words
Cited by 12 (1 self)
Visually and phonologically similar characters are major contributing factors for errors in Chinese text. By defining appropriate similarity measures that consider extended Cangjie codes, we can identify visually similar characters within a fraction of a second. Relying on the pronunciation information noted for individual characters in Chinese lexicons, we can compute a list of characters that are phonologically similar to a given character. We collected 621 incorrect Chinese words reported on the Internet and analyzed the causes of these errors. 83% of these errors were related to phonological similarity, and 48% were related to visual similarity between the involved characters. The lists of phonologically and visually similar characters generated by our programs contained more than 90% of the incorrect characters in the reported errors.
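As a rough illustration of the idea of comparing characters by their input codes, the sketch below scores two code strings by their longest common subsequence and lists the characters whose score passes a threshold. The scoring formula, the threshold, the function names, and the placeholder code table are all assumptions made for illustration; the abstract does not specify the authors' similarity measures or their extended Cangjie codes.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two code strings."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def code_similarity(code_a, code_b):
    """Dice-style score in [0, 1]; an assumed measure, not the paper's."""
    if not code_a or not code_b:
        return 0.0
    return 2.0 * lcs_length(code_a, code_b) / (len(code_a) + len(code_b))


def similar_characters(target, codes, threshold=0.5):
    """Characters whose code is at least `threshold` similar to the target's."""
    target_code = codes[target]
    return sorted(
        ch for ch, code in codes.items()
        if ch != target and code_similarity(target_code, code) >= threshold
    )


# Placeholder table: the keys and codes are hypothetical, not real Cangjie data.
toy_codes = {"char1": "ABC", "char2": "ABD", "char3": "QRS"}
candidates = similar_characters("char1", toy_codes)
```

With a real character-to-code table in place of `toy_codes`, the same lookup would produce a candidate list of visually similar characters for a given character.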
A Hybrid Chinese Spelling Correction Using Language Model and Statistical Machine Translation with Reranking
A Study of Language Modeling for Chinese Spelling Check
Cited by 3 (1 self)
Chinese spelling check (CSC) is still an open problem today. To the best of our knowledge, language modeling is widely used in CSC because of its simplicity and fair predictive power, but most systems only use the conventional n-gram models. Our work in this paper continues this general line of research by further exploring different ways to glean extra semantic clues and Web resources to enhance the CSC performance in an unsupervised fashion. Empirical results demonstrate the utility of our CSC system.
Integrating Dictionary and Web N-grams for Chinese Spell Checking
- Computational Linguistics and Chinese Language Processing, 2013
Cited by 1 (0 self)
Chinese spell checking is an important component of many NLP applications, including word processors, search engines, and automatic essay rating. Nevertheless, compared to spell checkers for alphabetical languages (e.g., English or French), Chinese spell checkers are more difficult to develop because there are no word boundaries in the Chinese writing system and errors may be caused by various Chinese input methods. In this paper, we propose a novel method for detecting and correcting Chinese typographical errors. Our approach involves word segmentation, detection rules, and phrase-based machine translation. The error detection module detects errors by segmenting words and checking word and phrase frequency based on compiled and Web corpora. The phonological or morphological typographical errors found are then corrected by running a decoder based on the statistical machine translation (SMT) model. The results show that the proposed system achieves significantly better accuracy in error detection and more satisfactory performance in error correction than the state-of-the-art systems.
Chinese Word Spelling Correction Based on N-gram Ranked Inverted Index List
Cited by 1 (0 self)
Spelling correction can assist individuals to input text data with machine using written language to obtain relevant information efficiently and effectively in. By referring to relevant applications such as web search, writing systems, recommend systems, document mining, typos checking before printing is very close to spelling correction. Individuals can
Candidate Scoring Using Web-Based Measure for Chinese Spelling Error Correction
Chinese character correction involves two major steps: 1) Providing candidate corrections for all or partially identified characters in a sentence, and 2) Scoring all altered sentences and identifying which is the best corrected sentence. In this paper, a web-based measure is used to score candidate sentences, in which there exists one continuous error character in a sentence in almost all sentences in the Bakeoff corpora. The approach of using a web-based measure can be applied directly to sentences with multiple error characters, either consecutive or not, and is not optimized for one-character error correction of Chinese sentences. The results show that the approach achieved a fair precision score whereas the recall is low compared to results reported in this Bakeoff.
Chinese Spelling Check System Based on Tri-gram Model
This paper describes our system in the Chinese spelling check (CSC) task of CLP-SIGHAN Bake-Off 2014. CSC is still an open problem today. To the best of our knowledge, n-gram language modeling (LM) is widely used in CSC because of its simplicity and fair predictive power. Our work in this paper continues this general line of research by using a tri-gram LM to detect and correct possible spelling errors. In addition, we use dynamic programming to improve the efficiency of the algorithm, and additive smoothing to solve the data sparseness problem in the training set. Empirical evaluation results demonstrate the utility of our CSC system.
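For readers unfamiliar with additive smoothing, the following is a minimal sketch of a character-level tri-gram model with add-k smoothing. The function names, the boundary symbols `<s>` and `</s>`, the value of `k`, and the toy training strings are assumptions for illustration; the abstract does not describe the authors' implementation or their dynamic-programming search.

```python
from collections import Counter


def train_trigram_counts(sentences):
    """Count character tri-grams and their two-character contexts."""
    trigrams, contexts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        # Two start symbols so the first character has a full context.
        chars = ["<s>", "<s>"] + list(sentence) + ["</s>"]
        vocab.update(chars)
        for i in range(2, len(chars)):
            contexts[(chars[i - 2], chars[i - 1])] += 1
            trigrams[(chars[i - 2], chars[i - 1], chars[i])] += 1
    return trigrams, contexts, vocab


def trigram_prob(trigrams, contexts, vocab, a, b, c, k=1.0):
    """Add-k estimate of P(c | a, b); unseen tri-grams get non-zero mass."""
    return (trigrams[(a, b, c)] + k) / (contexts[(a, b)] + k * len(vocab))


# Toy training data (hypothetical); a real system would use a large corpus.
trigrams, contexts, vocab = train_trigram_counts(["abc", "abd"])
p_seen = trigram_prob(trigrams, contexts, vocab, "a", "b", "c")
p_unseen = trigram_prob(trigrams, contexts, vocab, "a", "b", "a")
```

Because every tri-gram count is raised by `k`, a candidate correction that never appeared in training still receives a small probability instead of zero, which is the data sparseness issue the abstract refers to.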