Califf, M. and R. Mooney: 1999, `Relational Learning of Pattern-Match Rules for Information Extraction'. In: Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99). pp. 328--334.
....further processing. Several research teams have proposed ways to extract data using several methods, including hard coded wrappers by declarative languages [2, 9, 21], natural language processing (NLP) [20, 43, 45, 47, 48], HTML structure analysis [10, 37, 46], inductive learning based wrappers [6, 7, 26, 29, 43], wrappers created by example [1, 31], and regular expression wrappers generated automatically [10]. Though all researchers report good results, the two main difficulties of traditional wrappers, resiliency and scalability, still remain. Resiliency means that a wrapper continues to function ....
M.E. Califf and R.J. Mooney. Relational learning of pattern-match rules for information extraction. In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), pages 487--493, Orlando, Florida, July 18-22, 1999. 77
....from a single relational table and are not appropriate for link discovery. Relational data mining techniques, such as inductive logic programming, are needed. Many other problems in molecular biology [36], natural language understanding [46], web page classification [9], information extraction [3, 15], and other areas also require mining multi-relational data. However, relational data mining requires exploring a much larger space of possible patterns and performing complex inference and pattern matching. Consequently, current RDM methods are not scalable to very large databases. Consequently, ....
M. E. Califf and R. J. Mooney. Relational learning of pattern-match rules for information extraction. In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), pages 328--334, Orlando, FL, July 1999.
....such as comparison shopping agents [4], job finding, etc. There are three factors when designing an IE system. First, whether the training examples are annotated may influence the design of an IE system. Most machine learning based approaches rely on user annotated training examples [9, 1, 12, 6, 8]; very few systems generate extraction rules based on unlabeled text [10, 2]. Second, depending on the characteristics of the application domains, IE systems use extraction patterns based on one of the following approaches: context-based constraints, delimiter-based constraints, or a combination of ....
....of both. For example, wrapper induction systems such as WIEN [12], Softmealy [6], and Stalker [8] generate delimiter-based extraction rules, while some generate context-based rules [10, 2]. Finally, some IE systems may rely on background knowledge for pattern generalization. For example, RAPIER [1] imposes constraints based on the WordNet semantic classes. Softmealy [6] defines token classes such as word and nonword token classes. The IEPAD System. IEPAD [2] is an IE system that does not require user annotated training examples. It applies several pattern discovery techniques including ....
M. Califf and R. Mooney. Relational learning of pattern-match rules for information extraction. In Proceedings of the Sixteenth National Conference on Artificial Intelligence, 1997.
....in industries, and the launching of space vehicles. In order to address the problem of creating such systems economically, some researchers have looked at techniques for automatically or semi-automatically constructing lexicons ([23], [24], [26], [30]) or extraction rules for the domain ([7], [28], [20], [16], [17], [25]). Most of these techniques have applied machine learning approaches to learn rules based on texts that have been semantically annotated. Either pre-annotation of text is done by a human expert, or the rules are post-processed by a human expert. The research reported ....
....is especially useful for extracting both LOCATION and BENEFITS facts. The performance was improved about 30 in F measure for those two facts. 1.2.7 Discussion. Automated rule learning from examples can also be found in other systems such as AutoSlog ([23]), PALKA ([21]), RAPIER ([7]) and WHISK ([28]). AutoSlog uses heuristics to create rules from the examples and then requires the human expert to accept or reject the rules. PALKA applies a conceptual hierarchy to control the generalization or specification of the target slot. RAPIER uses inductive logic programming to learn ....
M. Califf and R. Mooney. Relational Learning of Pattern-Match Rules for Information Extraction. Proceedings of Computational Language Learning'97, 1997.
....They can signal errors in a test set but cannot identify the values at fault. Recent work on integration leverages relational associations but not necessarily structural features [Doan, et al. 2003]. Current hierarchical approaches learn basic structural features but not more complex constraints [Califf and Mooney, 1999, Knoblock, et al. 2001, Wang, et al. 1997] ....
M. E. Califf and R. J. Mooney. Relational learning of pattern-match rules for information extraction. AAAI-99.
....strategies. Furthermore, we show that the difference in performance between two selection strategies can be (weakly) predicted from the correlation between the documents they select (Sec. 5). 2 Related work. There has been a large amount of work on adaptive information extraction, e.g. [2, 1, 9] and many others. These algorithms generally perform well, but all have the potential for further improvement through active learning techniques. Active learning refers to a variety of ways that a learning algorithm can control the training data over which it generalizes. For example, a learner ....
....generic in that they do not assume any specific IE algorithm. The learning algorithm that we use is LP [2] but the active learning strategies that we investigate are not particular to our choice of learning algorithm and so we could easily substitute another IE algorithm such as BWI [9] or Rapier [1]. COMPARE. This strategy selects for annotation the document that is textually least similar to the documents that have already been annotated. We select the document that is textually most dissimilar to the documents already in the corpus. The idea is to sample uniformly from the document space, ....
M. Califf and R. Mooney. Relational learning of pattern-match rules for information extraction. In Proc. 16th Nat. Conf. Artificial Intelligence, 1999.
....unusual proper nouns in it. Often the items we wish to extract are proper nouns such as names. These can be difficult to extract, because they are likely to be words that have not been seen before. Committee: Invoke two different IE learning algorithms (e.g. LP [Ciravegna, 2001] and Rapier [Califf and Mooney, 1999]). The next selected document is the one on which the learned rule sets most disagree. Bag: Invoke the learning algorithm on different partitions of the available training data. As with Committee, the document that maximizes disagreement is selected. Mine: Following [Nahm and Mooney, 2000] learn ....
.... than others Can we bound the utility of AL as a function of the efficiency of compared to Are there general properties of learning tasks for which our approach is effective 5 Related work. There has been a large amount of work on adaptive information extraction, e.g. [Ciravegna, 2001; Califf and Mooney, 1999; Freitag and Kushmerick, 2000] and many others. These algorithms generally perform well, but all have the potential for further improvement through active learning techniques. Multi-view learning [Blum and Mitchell, 1998] has received widespread attention. With this approach, predictions based ....
M. Califf and R. Mooney. Relational learning of pattern-match rules for information extraction. In Proc. 16th Nat. Conf. Artificial Intelligence, 1999.
....the problem of writing IE rules from a novel perspective, one which enables a much faster development of IE systems. Keywords Text Mining, Theory Revision, Information Extraction, User Guided Revision. 1. INTRODUCTION One of the main problems in building information extraction (IE) systems [1,3,5,6,22,23] is that the knowledge elicited from domain experts tends to be only approximately correct. Although knowledge so obtained might make a good first approximation to the real world, it typically contains inaccuracies that are exposed when the model asserts a fact that does not agree with empirical ....
....Fourth, the domain expert may have preferences as to which revision operators should be used for revision specific elements should they be flawed. In this paper we introduce our scheme for providing explicit revision bias in the revision of flawed IE rules. Other research on learning IE rules [1,3,5,6,22,23] has focused on inducing new IE rules based on examples rather than revising existing IE rules based on examples. In addition, we use a more sophisticated extraction language, which is more suitable for handling real world tasks and achieving high precision and recall. The rest of the paper is ....
Califf, M. E. and Mooney, R. (1997). Relational learning of pattern-match rules for information
....with a total of 410 documents, including 160 structured and 250 semi-structured ones. The performance of our system is measured, slot by slot, on the test set of 130 Korean and 120 English semi-structured documents. We also compared the previous state of the art extraction systems, BWI [6], RAPIER [2], and WHISK SmLearning( all examples ) let P = part of speech tagger( all examples ) let rule set = until P is empty do generateclass by SmLForOneClass( P ) find common part of speech tag sequence, postag, from class transformclass and postag into mDTD rules add rules to rule set remove ....
M. Califf and R. Mooney, Relational Learning of Pattern-Match Rules for Information Extraction, Proc. of the 16th National Conference on Artificial Intelligence, 1999.
....value extraction rules can be formulated. This is because all predicates implicitly relate to one text fragment. There are a number of other systems that learn information extraction rules and do not use any logic programming formalism such as AutoSlog [11], LIEP [8], WHISK [13] and RAPIER [1]. More complete and detailed overviews are given in [10] and in [13]. 5 Summary. We have proposed our mapping of typical text patterns to logic programs which is based on types for text, words, and text positions and three fundamental predicates. Based on this representation we presented the ....
M. Califf and R. Mooney. Relational Learning of Pattern-Match Rules for Information Extraction. Working Papers of the ACL 97 Workshop on Natural Language Learning, pages 9--15, 1997.
....created with our technology, requiring changes to be propagated to our application. For Win32 and Unix GDAs, modifications of GUIs tend to be rare, whereas for web pages, changes occur more often. In such cases, additional approaches such as information extraction technologies may be of help [5]. 7 Conclusions. The reuse of binary legacy applications (GDAs) that can only be accessed through GUIs is both a difficult and fundamental problem of software reuse. We have shown that its solution is a special case of intercepting architectural connectors where a human is at one end of a ....
M. Califf, R. Mooney. "Relational Learning of Pattern-Match Rules for Information Extraction", Working Notes of AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, 1997.
....and it addresses most of wien's shortcomings. However, its empirical evaluation is quite sketchy, which makes it hard to compare with wien and stalker. There are three other recent systems that are focusing on learning extraction rules from online documents: srv [Freitag, 1998], rapier [Califf and Mooney, 1999], and whisk [Soderland, 1999]. Even though these approaches are mostly concerned with extracting data from natural language text, they could be also applied to some simple wrapper induction problems. The modeling tool we have described enables unsophisticated users to turn web pages into ....
M. Califf and R. Mooney. Relational learning of pattern-match rules for information extraction. In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), pages 328--334, 1999.
....h0 = void and h i ; i 0, the h i is consumed by void. 4. If 1 jH 0 j, obtain the reversibility index k for S 0 H . 5. Construct the prefix tree A0 = PT (S H 0 ) and induce the k reversible EA. 6 Related work The wrapper induction problem has been intensively studied over last years [1, 3, 5, 6, 7, 8, 9, 10, 12]. In [7] Kushmerick first identified some simple classes of HTML wrappers which can be efficiently inferred. These classes assume a tabular structure of the response page. The wrapper inference is therefore reduced to the efficient detection of tag sequences preceding each label in such a tabular ....
....[6] nested labels in well-formed HTML pages [9] and disjunctions [6, 8, 10]. Despite the obvious progress, none of these methods has however reached the 100 success rate. All these methods follow the grammar based approach which is somewhat complementary to the approaches that use NLP techniques [3, 5, 12]. The grammar based approach is more relevant to wrapping sites that generate HTML pages upon user requests, all sorts of search engines, news delivery, electronic shopping, etc. The main advantage of grammar based approaches is the fast processing of HTML pages, as extraction automata allow to ....
M. E. Califf and R.J. Mooney. Relational learning of pattern-match rules for information extraction. Working Papers of the ACL-97 Workshop in NLP, 1997.
....definition, however, turned out to be a very time consuming task. Thus, a number of machine learning approaches have been developed recently. They acquire parts of the IE model automatically from (various types of) text corpora (e.g. specifically annotated corpora or unannotated ones; cf. e.g. [12, 19, 22, 4, 8]). However, what is missing so far is an integrated approach for acquiring the IE model using machine learning techniques. The ultimate objective that we pursue in our research is a fully integrated system for building an IE model and exploiting it in an IE system by applying a bootstrapping ....
....of type based feature structures and unification, but they do not provide an easily understandable conceptual model of the domain. Second, work on the explicit combination of information extraction and machine learning has mostly been focused on only few parts of the IE model, e.g. described by [12, 19, 22, 4, 8]. The underlying idea is rather than spending weeks or months manually adapting an information extraction system to a new domain, one would like a system that can be trained on some sample documents and that is expected to do a reasonable job of extracting information from new ones. In [8] a ....
M. Califf and R. Mooney. Relational learning of pattern-match rules for information extraction. In Proceedings of the AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, 1998.
No context found.
M. E. Califf and R. J. Mooney. Relational learning of pattern-match rules for information extraction. In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), pages 328--334, Orlando, FL, July 1999.
No context found.
Califf, M. E., and Mooney, R. J. 1999. Relational learning of pattern-match rules for information extraction. In Proceedings of the 16th National Conference on Artificial Intelligence, 328--334.
No context found.
M. E. Califf and R. J. Mooney. Relational learning of pattern-match rules for information extraction. In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), pages 328--334, Orlando, FL, July 1999.
....for link discovery. Relational data mining techniques, such as inductive logic programming, are needed. Many other problems in molecular biology [Srinivasan et al., 1996], natural language understanding [Zelle and Mooney, 1996], web page classification [Craven et al., 2000], information extraction [Califf and Mooney, 1999; Freitag, 1998], and other areas also require mining multi-relational data. However, relational data mining requires exploring a much larger space of possible patterns and performing complex inference and pattern matching. Consequently, current RDM methods are not scalable to very large databases. ....
Califf, M. E., and Mooney, R. J. 1999. Relational learning of pattern-match rules for information extraction. In Proceedings of the 16th National Conference on Artificial Intelligence, 328--334.
No context found.
Califf, M. and R. Mooney: 1999, `Relational Learning of Pattern-Match Rules for Information Extraction'. In: Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99). pp. 328--334.
No context found.
M. E. Califf and R. J. Mooney. Relational learning of pattern-match rules for information extraction. In Working Notes of AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, pages 6--11, Menlo Park, CA, 1998. AAAI Press.
No context found.
M. Califf and R. Mooney. Relational learning of pattern-match rules for information extraction. In Proc. 16th Nat. Conf. Artificial Intelligence, 1999.
No context found.
Mooney, R., Califf, M. "Relational Learning of Pattern-Match Rules for Information Extraction" In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), Orlando, FL, pp. 328-334, July 1999.
No context found.
M. E. Califf and R. J. Mooney. Relational learning of pattern-match rules for information extraction. In Working Notes of AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, pages 6--11, Menlo Park, CA, 1998. AAAI Press.
No context found.
M. E. Califf and R. J. Mooney. Relational learning of pattern-match rules for information extraction. In Sixteenth National Conference on Artificial Intelligence, 1999.
No context found.
M. Califf and R. Mooney. Relational learning of pattern-match rules for information extraction. In Proc. 16th Nat. Conf. Artificial Intelligence, 1999.
CiteSeer.IST - Copyright Penn State and NEC