@MISC{Zhixiong_simultaneousinterpretation:, author = {Zhang Zhixiong and Li Sa and Wu Zhengxin and Lin Ying}, title = {Simultaneous Interpretation: No}, year = {} }
Abstract
Aware of the importance of Information Extraction (IE) in supporting innovation in many areas of library services, the authors set out to construct a Chinese information extraction system to effectively process large volumes of Chinese information resources. Based on experiments with and comparisons of several popular IE systems, the authors propose a Chinese IE solution that makes full use of the GATE (General Architecture for Text Engineering) system from the University of Sheffield, developing a Chinese IE plug-in to process Chinese information resources on the GATE framework. After more than one year of work, the authors implemented this system. This article analyses the framework of the GATE system, describes the Chinese IE solution based on it, and focuses on three key difficulties in implementing a Chinese information extraction system: Chinese tokenization, professional gazetteers, and Chinese named entity recognition. (1) Chinese tokenization is a problem because the structure of the Chinese language is very flexible, which makes word segmentation very difficult. To solve this problem, the open source software ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) from CAS is integrated into the system. (2) To aid named entity recognition, the GATE system provides a set of gazetteer lists. But the gazetteer lists