S. Huffman. Learning Information Extraction Patterns from Examples. Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, 1996.
....and the launching of space vehicles. In order to address the problem of creating such systems economically, some researchers have looked at techniques for automatically or semi-automatically constructing lexicons ([23], [24], [26], [30]) or extraction rules for the domain ([7], [28], [20], [16], [17], [25]). Most of these techniques have applied machine learning approaches to learn rules based on texts that have been semantically annotated. Either pre-annotation of text is done by a human expert, or the rules are post-processed by a human expert. The research reported here ....
S. Huffman. Learning Information Extraction Patterns from Examples. Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, 1996.
....Concept dictionaries typically consist of hundreds to thousands of patterns like this. New dictionaries are required for each new domain, and creating them consumes a great amount of time. Research efforts have begun to address this: e.g., Cardie (1993), Riloff (1993), Soderland et al. (1995), Huffman (1996). These corpus-based methods all employ a large training corpus annotated with examples for each concept or type of extraction. From these examples, machine learning algorithms induce conceptual patterns for extraction. However, these methods have not eliminated the cost of building information ....
....g to P; ProposeInstances: add new instances p(T) to E; RefinePattern: with s the corresponding specific pattern, unless p is already performing at its best, specialize p in the direction of s. Figure 4: Algorithm for learning extraction patterns from user-supplied instances. The LIEP system (Huffman, 1996) also uses a specific-to-general learning algorithm. It learns a new pattern for every example in the training corpus, unless a previously learned pattern matches the example or can be generalized to match it. LIEP generalizes a pattern by fixing a non-generalizable portion of the pattern; then, ....
Huffman, S. 1996. Learning Information Extraction Patterns from Examples. In Wermter, S.; Riloff, E.; and Scheler, G., editors, Symbolic, Connectionist, and Statistical Approaches to Learning for Natural Language Processing. Springer. 246--260.
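The excerpt above describes LIEP as a specific-to-general covering learner: each training instance either is already matched by an existing pattern, is covered by generalizing an existing pattern, or yields a new specific pattern. The following is a minimal Python sketch of that loop only. The feature names (`path`, `verb`, `slot1`, `slot2`), the choice of `verb` as the single generalizable feature, and the set-union generalization step are hypothetical illustrations, not LIEP's actual pattern language or generalization operators, which the truncated excerpt does not spell out.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List

# Hypothetical instance representation (not LIEP's actual one): a training
# instance maps feature names to values, e.g. the syntactic path between the
# slot fillers, the linking verb, and the semantic classes of the fillers.
Example = Dict[str, str]

# Assumption for illustration only: "verb" is the single generalizable
# feature; every other feature is the fixed, non-generalizable part.
GENERALIZABLE: FrozenSet[str] = frozenset({"verb"})


@dataclass
class Pattern:
    fixed: Dict[str, str]  # features that must match exactly
    allowed: Dict[str, FrozenSet[str]] = field(default_factory=dict)  # permitted value sets

    @classmethod
    def from_example(cls, ex: Example) -> "Pattern":
        """Build the most specific pattern that covers a single instance."""
        fixed = {k: v for k, v in ex.items() if k not in GENERALIZABLE}
        allowed = {k: frozenset({v}) for k, v in ex.items() if k in GENERALIZABLE}
        return cls(fixed, allowed)

    def matches(self, ex: Example) -> bool:
        return all(ex.get(k) == v for k, v in self.fixed.items()) and all(
            ex.get(k) in vs for k, vs in self.allowed.items()
        )

    def generalize_to(self, ex: Example) -> bool:
        """Widen the generalizable features so the pattern covers ex.

        Returns False and leaves the pattern unchanged when the feature set
        or the fixed part differs, i.e. when the permitted generalization
        cannot reach the instance.
        """
        if set(ex) != set(self.fixed) | set(self.allowed):
            return False
        if any(ex[k] != v for k, v in self.fixed.items()):
            return False
        self.allowed = {k: vs | {ex[k]} for k, vs in self.allowed.items()}
        return True


def learn_patterns(examples: List[Example]) -> List[Pattern]:
    """Specific-to-general covering loop over the training instances."""
    patterns: List[Pattern] = []
    for ex in examples:
        if any(p.matches(ex) for p in patterns):
            continue  # already covered by a learned pattern
        if any(p.generalize_to(ex) for p in patterns):
            continue  # an existing pattern was widened to cover it
        patterns.append(Pattern.from_example(ex))  # otherwise learn a new specific pattern
    return patterns


# Hypothetical training instances (feature names and values are invented).
examples = [
    {"path": "subj-verb-obj", "verb": "named", "slot1": "PERSON", "slot2": "POSITION"},
    {"path": "subj-verb-obj", "verb": "appointed", "slot1": "PERSON", "slot2": "POSITION"},
    {"path": "obj-passive-verb", "verb": "named", "slot1": "PERSON", "slot2": "POSITION"},
]
learned = learn_patterns(examples)
for p in learned:
    print(p.fixed["path"], sorted(p.allowed["verb"]))
```

Here the second instance is absorbed by widening the first pattern's verb set, while the third differs in its fixed part and therefore gets its own pattern. Restricting generalization to a small, predeclared set of features is one simple way to realize "limiting allowable generalizations"; how LIEP itself decides what may be generalized is left to the original paper.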
....itself. Only single-value extraction rules can be formulated. This is because all predicates implicitly relate to one text fragment. There are a number of other systems that learn information extraction rules and do not use any logic programming formalism, such as AutoSlog [11], LIEP [8], WHISK [13], and RAPIER [1]. More complete and detailed overviews are given in [10] and [13]. 5 Summary We have proposed our mapping of typical text patterns to logic programs, which is based on types for text, words, and text positions and on three fundamental predicates. Based on this ....
S. B. Huffman. Learning information extraction patterns from examples. In Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, volume 1040 of LNAI, pages 246--260. Springer Verlag, Berlin, 1996.
....we are able to mechanically transform an information space using attributes of such an explanation structure (by partially evaluating the programmatic representation of the operationalized explanation). Other applications in automated software engineering [23] and information pattern extraction [28], while supporting explanation-based views of scenario analysis, have not been connected to the partial evaluation aspect, which is critical for the PIPE methodology of personalization. The PIPE approach is also related to the use of task models in software design. Traditionally, such integration ....
S.B. Huffman. Learning Information Extraction Patterns from Examples. In Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, pages 246--260. Springer Verlag, 1996.
....definition, however, turned out to be a very time-consuming task. Thus, a number of machine learning approaches have been developed recently. They acquire parts of the IE model automatically from (various types of) text corpora (e.g., specifically annotated corpora or unannotated ones; cf., e.g., [12, 19, 22, 4, 8]). However, what is missing so far is an integrated approach for acquiring the IE model using machine learning techniques. The ultimate objective that we pursue in our research is a fully integrated system for building an IE model and exploiting it in an IE system by applying a bootstrapping ....
....of type-based feature structures and unification, but they do not provide an easily understandable conceptual model of the domain. Second, work on the explicit combination of information extraction and machine learning has mostly focused on only a few parts of the IE model, e.g., as described by [12, 19, 22, 4, 8]. The underlying idea is that, rather than spending weeks or months manually adapting an information extraction system to a new domain, one would like a system that can be trained on some sample documents and is expected to do a reasonable job of extracting information from new ones. In [8] a ....
S. Huffman. Learning information extraction patterns from examples. In Wermter, Riloff, and Scheler, editors, Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, volume 1040 of Lecture Notes in Artificial Intelligence, pages 246--260. Springer, Berlin, 1996.
....unigram distribution. By applying the Viterbi algorithm to previously unseen text, the most likely state sequence is produced, and this can be used to label parts of the text with a class. Several authors have developed machine learning systems that generate pattern-based extraction rules (see [20, 49, 58, 112, 113]). Craven et al. [29] have attempted something even more ambitious than simple information extraction; they seek to autonomously build AI knowledge bases by a combination of extracting data to populate predefined relations and inducing new relations from Web structure. An extended description of ....
S. Huffman, Learning information extraction patterns from examples, in: S. Wermter, E. Riloff, G. Scheler (Eds.), Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing, Springer, Berlin, 1996.
....While AutoSlog's patterns perform best when semantic class information is available, the learning algorithm and the resulting concept nodes can still operate effectively when no semantic class information can be obtained. There have been a few additional attempts to learn extraction patterns. Huffman's LIEP system [1996] learns patterns that recognize semantic relationships between two target noun phrases, i.e., between two slot fillers of an information extraction output template. The patterns describe the syntactic context that falls between the target noun phrases as well as the semantic class of the heads of ....
Huffman, Scott 1996. Learning Information Extraction Patterns from Examples. In Wermter, Stefan; Riloff, Ellen; and Scheler, Gabriele, editors, Symbolic, connectionist, and statistical approaches to learning for natural language processing, Lecture Notes in Artificial Intelligence Series. Springer. 246--260.
....and represent them as a dictionary (Lehnert, 1992). All the rules must be reconstructed from scratch when the target domain is changed. To cope with this problem, some pioneers have studied methods for learning information extraction rules (Riloff, 1996; Soderland et al., 1995; Kim et al., 1995; Huffman, 1996; Califf and Mooney, 1997). Along these lines, our approach is to apply an inductive logic programming (ILP) (Muggleton, 1991) system to the learning of IE rules, where information is extracted from semantic representations of news articles. The ILP system that we employed is a type-oriented ILP ....
....of information showed good results. This indicates that the ILP system RHB has a high potential in IE tasks. 7 Related Work Previous research on generating IE rules from texts with templates includes AutoSlog-TS (Riloff, 1996), CRYSTAL (Soderland et al., 1995), PALKA (Kim et al., 1995), LIEP (Huffman, 1996), and RAPIER (Califf and Mooney, 1997). In our approach, we use the type-oriented ILP system RHB, which is independent of natural language analysis. This point differentiates our approach from the others. Learning semantic-level IE rules using an ILP system from semantic representations is also ....
S. B. Huffman, Learning Information Extraction Patterns from Examples, Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing, pp. 246--260, 1996.
....IE systems is the high cost involved in manually adapting them to new domains and text styles. In recent years, a variety of Machine Learning (ML) techniques has been used to improve the portability of IE systems to new domains, as in SRV (Freitag, 1998), RAPIER (Califf and Mooney, 1997), LIEP (Huffman, 1996), CRYSTAL (Soderland et al., 1995), and WHISK (Soderland, 1999). However, some drawbacks remain in the portability of these systems: a) existing systems generally depend on the supported text style and learn IE rules either for structured texts, semi-structured texts, or free text; b) IE systems ....
S. Huffman. 1996. Learning information extraction patterns from examples. In S. Wermter, E. Riloff, and G. Scheler, editors, Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing. Springer-Verlag.
....texts, it also makes use of a semantic hierarchy and associated lexicon created by the expert for the task. The system of Bagga et al. [1] generalizes from sentences selected by an expert, also using an annotated corpus (in order to discover the best generalizations of these sentences). LIEP [3] does not need annotated training texts; instead, it has an interface that allows a user to identify relevant entities and the relationships between them in the text. LIEP patterns are induced from positive training instances, and the generalization step allows the patterns to be expanded by including a ....
S.B. Huffman, 'Learning information extraction patterns from examples', in IJCAI-95 Workshop on New Approaches to Learning for NLP, 1995.
....libraries in a scalable fashion. There has been substantial work on trainable IE systems in recent years. This research has tended to be split between two communities: the natural language processing community has focused on free text [16,43,62,72], while the information integration and software agent communities have focused on structured Internet documents [9,11,41,52,60]. This distinction has started to blur, as researchers have started to evaluate their systems on both structured and natural text [33,70,71]. We now discuss these .... (N. Kushmerick, Artificial Intelligence 118 (2000) 15-68, p. 62)
S. Huffman, Learning information extraction patterns from examples, in: S. Wermter, E. Riloff, G. Scheler (Eds.), Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing, Springer, Berlin, 1996.
....extracting only limited structures from the input text. Creating dictionaries of concept patterns for information extraction consumes a great amount of time and is required for each new domain. Research efforts have begun to address this: Cardie (1993), Riloff (1993), Soderland et al. (1995), Huffman (1996). These corpus-based methods all employ a large training corpus annotated with examples for each concept. From these examples, machine learning algorithms induce conceptual patterns for extraction. However, these methods have not eliminated the cost of building information extraction systems, but ....
....of examples. The AutoSlog system (Riloff, 1993) used a one-shot learning pass with a set of syntactic heuristics to encode patterns for every training instance. The resulting concept dictionary was filtered by a user to discard incorrect patterns. Both Crystal (Soderland et al., 1995) and Liep (Huffman, 1996) use specific-to-general covering algorithms. Crystal guides pattern generalization by unifying similar patterns, while Liep avoids generalization complexity by limiting allowable generalizations. These systems have laid important groundwork, which we build upon in two directions. First, we extend ....
Huffman, S. 1996. Learning Information Extraction Patterns from Examples. In Wermter, S.; Riloff, E.; and Scheler, G., editors, Symbolic, Connectionist, and Statistical Approaches to Learning for Natural Language Processing. Springer. 246--260.
....who edits the proposed rules. The second system is PALKA [Kim and Moldovan 1992], developed by the University of Southern California for its MUC-5 system. A third trainable system appeared in MUC-6: the HASTEN system from SRA [Krupka 1995]. The fourth system described in this chapter is LIEP [Huffman 1996], which was developed on the MUC-6 domain. By the time of the MUC-6 conference, the University of Massachusetts had moved from AutoSlog to CRYSTAL. 7.1 AutoSlog The AutoSlog dictionary construction tool was developed by Ellen Riloff at the University of Massachusetts [Riloff 1993]. AutoSlog passes ....
....constraints on the sentence elements used as slot fillers. The Egraph anchor had a constraint on the verb root, and the irrelevant sentence element had no constraints. 7.4 LIEP The last system to be discussed in this chapter is Scott Huffman's LIEP (Learning Information Extraction Patterns) [Huffman 1996]. LIEP uses a set of heuristics to create rules, called extraction patterns, from single training instances. There is also a mechanism to generalize extraction patterns slightly. LIEP learns patterns for multi-slot concept extraction, such as Management Succession events. Unlike AutoSlog's ....
Huffman, S. Learning Information Extraction Patterns from Examples. Connectionist, Statistical, and Symbolic approaches to Learning for Natural Language Processing. Springer, 246-260, 1996.
....extraction. In contrast to information extraction systems that rely on the availability of large text corpora, we followed a different approach in that we learn extraction templates from examples provided by the user. For that purpose we apply a technique similar to the one used in LIEP [23]. Figure 10 shows an example of the generation of a template for job opportunities. The user reaches this window by clicking on the Template button in the message list. He first has to select the relevant context by marking the corresponding area in the message. As the next step, the user chooses a ....
S. B. Huffman. Learning information extraction patterns from examples. In S. Wermter, E. Riloff, and G. Scheler, editors, Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, pages 246--260. Springer-Verlag, Berlin, 1996.
....however, there have been several efforts to automate the acquisition of extraction patterns. Two of the earliest systems to generate extraction patterns automatically were AutoSlog [Riloff, 1993] and PALKA [Kim and Moldovan, 1993]. More recently, CRYSTAL [Soderland et al., 1995] and LIEP [Huffman, 1996] have been developed. All of these systems use some form of manually tagged training data or user input. For example, AutoSlog requires text with specially tagged noun phrases. CRYSTAL requires text with specially tagged noun phrases as well as a semantic hierarchy and associated lexicon. PALKA ....
Huffman, S. 1996. Learning information extraction patterns from examples. In Wermter, Stefan; Riloff, Ellen; and Scheler, Gabriele, editors 1996, Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing. Springer-Verlag, Berlin.
....person-months of manual labor. So it is not surprising that there is strong interest in developing techniques to acquire the necessary domain-specific knowledge automatically. Several systems have been developed to generate domain-specific extraction patterns automatically or semi-automatically [Huffman, 1995; Kim and Moldovan, 1993; Riloff, 1996a; Soderland et al., 1995]. There have also been efforts to automate various aspects of discourse processing .... [Footnote 2: These activating conditions are domain-specific and would likely be prone to false hits if the case frame was applied to a more general corpus.]
Huffman, Scott B. 1995. Learning information extraction patterns from examples. In Working Notes of the IJCAI-95 Workshop on New Approaches to Learning for Natural Language Processing.
....data. For example, AutoSlog [Riloff, 1993; 1996a], CRYSTAL [Soderland et al., 1995], RAPIER [Califf, 1998], SRV [Freitag, 1998], and WHISK [Soderland, 1999] need a training corpus that includes annotations for the desired extractions, and PALKA [Kim and Moldovan, 1993] and LIEP [Huffman, 1996] require manually defined keywords, frames, or object recognizers. AutoSlog-TS [Riloff, 1996b] has simpler needs but still requires a corpus of texts that have been labeled as relevant and irrelevant to the domain. Using bootstrapping techniques described in section 2, we have developed an ....
S. Huffman. Learning information extraction patterns from examples. In Stefan Wermter, Ellen Riloff, and Gabriele Scheler, editors, Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, pages 246--260. Springer-Verlag, Berlin, 1996.
....are an attractive testbed for the application of machine learning methods to natural language processing. Recently, several researchers have begun to apply learning methods to the construction of IE systems (McCarthy & Lehnert 1995; Soderland et al. 1995; Riloff 1993; 1996; Kim & Moldovan 1995; Huffman 1996). Several symbolic and statistical methods have been employed, but learning is generally used to construct only part of a larger IE system. Our system, Rapier (Robust Automated Production of Information Extraction Rules), learns rules for the complete IE task. The resulting rules extract the ....
....job template. IE can be useful in a variety of domains. The various MUCs have focused on domains such as Latin American terrorism, joint ventures, microelectronics, and company management changes. Others have used IE to track medical patient records (Soderland et al. 1995) or company mergers (Huffman 1996). A general task considered in this paper is extracting information from postings to USENET newsgroups, such as job announcements. Relational Learning Most empirical natural language research has employed statistical techniques that base decisions on very limited contexts, or symbolic techniques ....
[Article contains additional citation context not shown here]
Huffman, S. B. 1996. Learning information extraction patterns from examples. In Wermter, S.; Riloff, E.; and Scheler, G., eds., Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing. Berlin: Springer. 246--260.
....Wrapper Induction learns rules to extract multiple slots of a case frame at once, but SRV and RAPIER extract single slots in isolation. The next systems shown in Table 1 can handle free text such as news stories: AutoSlog (Riloff 1993), CRYSTAL (Soderland et al. 1995, Soderland 1997), and LIEP (Huffman 1996). AutoSlog does single-slot extraction, LIEP does only multi-slot extraction, and CRYSTAL handles either. Each of these systems requires syntactic preprocessing of the text and semantic tagging. Given preprocessed input, CRYSTAL can be extended to handle semi-structured text (Soderland 1997a). LIEP ....
....has also been applied to semi-structured text, but only if supplied with an appropriate syntactic analyzer that allows the text to be treated as if it were grammatical (Soderland 1997a). 6.3.3. LIEP, HASTEN, and PALKA Other systems that learn text extraction rules are LIEP, HASTEN, and PALKA. LIEP (Huffman 1996) uses heuristics in a manner similar to AutoSlog, but learns multi-slot rules. In contrast to systems we have previously seen that cannot handle multi-slot extraction, LIEP cannot handle single-slot extraction. It finds context for a slot only in terms of its syntactic relationship to other slots. ....
Huffman, S. (1996). Learning information extraction patterns from examples. In Wermter, S., Riloff, E., and Scheler, G. (Eds.), Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing. Berlin: Springer.
....do extraction on the remaining texts [72]. Only those texts that met predefined characteristics were considered suitable for examination by the extraction system. Additionally, they further reduced the amount of semantic processing by examining only those sentences in which the keywords appeared [27]. Some extraction systems look for important phrases by applying heuristics. These could be: look for proper nouns to be the names of individuals, companies, or countries; look for a word in all capitals surrounded by parentheses, which would denote an acronym; look for the use of italics ....
....they did not readily apply. Limited automatic extraction was obtained, with lower-than-acceptable precision and recall rates. See the next section for definitions of precision and recall. An additional problem with IE systems is that there appears to be an implicit or even explicit assumption [27] that the information being extracted is about an event. These systems further assume that the constituent elements being found fill roles within an event. This assumption does not hold for the domains of this research; the sentences containing the desired information generally use ....
Huffman, Scott B. Learning information extraction patterns from examples. In Working Notes of the IJCAI Workshop on New Approaches to Learning for Natural Language Processing (Montreal, Canada, August 1995), AAAI, pp. 127--134.
....are available in-house and, if not, how long it will take to get them and what they will cost [28]. We see this automation of a sophisticated librarian as a natural step in the evolutionary development of a fully automated digital library. Our approach builds on current document location technology [1, 6, 15, 31, 32] by introducing the value of information and its cost, time, and likelihood of being acquired as the driving force behind the decision of when, where, and how to locate specific documents. 2 Overview of the System Our proposed system architecture is based on three primary layers that operate ....
S. B. Huffman. Learning information extraction patterns from examples. In IJCAI-95 Workshop on New Approaches to Learning for Natural Language Processing, August 1995.
....is easy to see that we can use the above concept node to extract the target of the terrorist attack in the sentence "The Parliament was bombed by the guerrillas," but we need an active-verb-based concept to perform the same task for the sentence "The guerrillas bombed the Parliament." 2.2 LIEP LIEP [Huffman, 1995] is a learning system that can be seen as a multi-slot version of AutoSlog. That is, rather than learning one extraction pattern for each item of interest in a sentence (e.g., target and perpetrator), LIEP generates a single rule for all items of interest. For instance, let us again consider the ....
Huffman, S. 1995. Learning information extraction patterns from examples. Workshop on New Approaches to Learning for Natural Language Processing (IJCAI-95), 127--142.
No context found.
Huffman, S. (1996). Learning information extraction patterns from examples. In Wermter, Learning for Natural Language Processing. Berlin: Springer.
No context found.
S. B. Huffman. Learning information extraction patterns from examples. IJCAI-95 Workshop on New Approaches to Learning for Natural Language Processing, August 1995.
No context found.
S. B. Huffman. Learning information extraction patterns from examples. Proc. IJCAI-95 workshop on New Approaches to Learning for Natural Language Processing, 1995.
CiteSeer.IST - Copyright Penn State and NEC