  Conceptualization to Develop Machine Learning Techniques for Information Extraction: Consistency Queries

by Gunter Grieser, Klaus P. Jantke, Steffen Lange
http://www.ke.informatik.tu-darmstadt.de/~gunter/./PAPERS/fgml02.pdf

Abstract:

Extracting information from documents is an increasingly urgent problem in enterprise knowledge management. Knowledge sources may be internal, such as text files and forms from business administration processes, or external, such as HTML pages. When the number of knowledge sources is large, substantial computer support is indispensable, and machine learning techniques play a crucial role. A prototypical development system named LExIKON has been developed which supports interactive information extraction from semi-structured documents. The central mechanism inside LExIKON involves the learning of formal languages. These formal languages serve as parameters of so-called wrappers, which are synthesized programs performing the intended information extraction. The essence of the LExIKON technology and the functionality of the LExIKON development system are sketched by means of a sample session, documented and discussed using several screenshots. The automatic generation of hypothetical wrappers for information extraction through the invocation of machine learning techniques raises several questions. What can we expect of a generated wrapper when it is not yet completely correct? Can we generate wrappers in a properly incremental fashion? To answer these practically relevant questions, a new formal framework of learning, learning by consistency queries, is introduced and studied. The overall scenario of learning by consistency queries for information extraction
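As a rough illustration of the two ideas named in the abstract, the sketch below treats a wrapper as a program parameterized by delimiter languages and models a consistency query as a check that either confirms a hypothetical wrapper or returns a counterexample. It is not the LExIKON implementation: the names `Wrapper`, `consistency_query` and `learn`, the use of regular expressions for the delimiter languages, and the sample document are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# One labeled example: a document and the strings the user wants extracted.
Example = Tuple[str, List[str]]


@dataclass(frozen=True)
class Wrapper:
    """Hypothetical wrapper: extracts the text between two delimiters.

    Assumption: each delimiter language is given as a regular expression
    that contains no capturing groups of its own.
    """
    left: str
    right: str

    def extract(self, document: str) -> List[str]:
        # Non-greedy match of whatever lies between a left and a right delimiter.
        pattern = re.compile(f"(?:{self.left})(.*?)(?:{self.right})", re.DOTALL)
        return pattern.findall(document)


def consistency_query(wrapper: Wrapper, examples: Iterable[Example]) -> Optional[Example]:
    """Return None if the wrapper agrees with every example, else a counterexample."""
    for document, expected in examples:
        if wrapper.extract(document) != expected:
            return (document, expected)
    return None


def learn(candidates: Iterable[Wrapper], examples: List[Example]) -> Optional[Wrapper]:
    """Return the first candidate wrapper that passes the consistency query."""
    for hypothesis in candidates:
        if consistency_query(hypothesis, examples) is None:
            return hypothesis
    return None


if __name__ == "__main__":
    doc = "<li><b>Gold</b> 1967</li><li><b>Angluin</b> 1988</li>"
    examples = [(doc, ["Gold", "Angluin"])]
    candidates = [Wrapper("<li>", "</li>"), Wrapper("<b>", "</b>")]
    print(learn(candidates, examples))
```

Here the first candidate is rejected because it extracts whole list items, while the second is accepted. Enumerating a fixed candidate list is only a stand-in for the incremental refinement the abstract asks about; a real learner would revise its hypothesis using the counterexample.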

Citations

624 Language identification in the limit – Gold - 1967
535 Theory of Recursive Functions and Effective Computability – Rogers - 1967
528 Queries and concept learning – Angluin - 1988
25 Queries revisited – Angluin - 2001
17 A unifying approach to HTML wrapper representation and learning – Grieser, Jantke, et al. - 2000
17 Combining postulates of naturalness in inductive inference – Jantke, Beick - 1981
2 Consistency queries in information extraction – Grieser, Jantke, et al. - 2002
2 Learning approaches to wrapper induction – Grieser, Lange - 2001