11 citations found.
B. Thomas, Anti-unification based learning of T-Wrappers for information extraction, in: Proc. of AAAI Workshop on Machine Learning for IE, AAAI Press, 1999, pp. 15--20.


This paper is cited in the following contexts:
Learning Approaches to Wrapper Induction - Grieser, Lange (2001)

....we relate the learning problems at hand to the problems that learning theory papers originally address and point out what they have in common and where the differences are. More specifically, we use the problem of learning island wrappers, a particularly simple class of wrappers introduced in [5, 6], as an illustrating example. Our theoretical investigations are based on the rather idealistic assumption that, when learning any particular island wrapper, the learner eventually receives all HTML documents together with the overall set of information that can be extracted from those documents ....

....a procedure p that, given any $j \in \mathbb{N}$, enumerates a finite set $S_j$ such that (i) and (ii) are fulfilled, where (i) for all $j \in \mathbb{N}$, $S_j \subseteq L_j$; (ii) for all $j, k \in \mathbb{N}$, if $S_j \subseteq L_k$ then $L_k \not\subset L_j$. 3 Island Wrappers. Next, we define a particular type of wrappers, so-called island wrappers (cf. [5, 6]), that describe a particular way in which interesting information may be wrapped into an HTML document. This approach rests upon the following assumptions. (i) Information of the same type visually appears in the same style in a document and, vice versa, the visual appearance allows for grouping ....

Thomas, B. 1999a. Anti-unification based learning of T-Wrappers for information extraction. In Proc. AAAI Workshop on Machine Learning for IE, 15--20.


Visual Web Information Extraction with Lixto - Baumgartner, Flesca, Gottlob (2001)   (30 citations)

....and counterexamples of a large number of web pages. Stalker [22] specialises general SkipTo sequence patterns based on labelled HTML pages. An approach to maximise specific patterns is introduced by Davulcu et al. [10]. Other examples include Softmealy [13] (using finite state transducers) and MIA [27] (Prolog based wrappers using anti-unification; neural networks to generalise and learn texts). NoDoSe [2] extracts information from plain string sources and provides a user interface for example labelling. It has Name Website Used Example Page Testpages Amazon http: www.amazon.com Lord of the ....

B. Thomas. Anti-unification based learning of T-Wrappers for information extraction. In Workshop on Machine Learning for IE, 1999.


Visual Web Information Extraction with Lixto - Baumgartner, Flesca, Gottlob (2001)   (30 citations)

....and counterexamples of a large number of web pages. Stalker [20] specialises general SkipTo sequence patterns based on labelled HTML pages. An approach to maximise specific patterns is introduced by Davulcu et al. [9]. Other examples include Softmealy [12] (using finite state transducers) and MIA [24] (Prolog based wrappers using anti-unification; neural networks to generalise and learn texts). NoDoSe [2] extracts information from plain string sources and provides a user interface for example labelling. It has restricted capabilities to deal with Name Website Used Example Page Testpages ....

B. Thomas. Anti-unification based learning of T-Wrappers for information extraction. In Workshop on Machine Learning for IE, 1999.


A Multi-Agent Location Based Information Systems for.. - Beuster, Kleemann.. (2003)   Self-citation (Thomas)

....for certain web domains. Whenever the extraction agents visit one of these domains during their search, they use these pre-learned wrappers to extract information from one of the web pages. Currently the wrapper toolkit uses a one-shot learning strategy [Grieser et al. 2000; Thomas, 2000; Thomas, 1999a; Thomas, 1999b] extended with a special document representation based on the Document Object Model (DOM) representation. The general strategy to learn left and right anchors to define the start and end point for extraction (delimiters) is kept, but extended to path learning in the DOM of the ....

Bernd Thomas. Anti-Unification Based Learning of T-Wrappers for Information Extraction. In Proc. AAAI-99 Workshop on Machine Learning for Information Extraction, 1999.


Model-Based Deduction for Knowledge Representation - Baumgartner, Furbach, Thomas (2002)   Self-citation (Thomas)

....search profiles is needed. This is exactly the point where we use model based deduction for knowledge representation. Step (b), the extraction of addresses from never before seen web pages with varying structure of addresses, necessitates the use of machine learning based techniques. These techniques [11] are used to learn various extraction procedures offline and online. Nevertheless the current state of the art in machine learning based information extraction still requires some [together with wizAI.com] post cleaning of extractions, because the correctness of extractions is not and probably will ....

B. Thomas. Anti-Unification Based Learning of T-Wrappers for Information Extraction. In Workshop on Machine Learning for Information Extraction. American Association for Artificial Intelligence, July 1999. Preceding Sixteenth National American Conference on Artificial Intelligence (AAAI-99).


Core Technologies For Information Agents - Kushmerick, Thomas (2003)   Self-citation (Thomas)

....and (b) the learning methods used. For example, the relevant features may be its type (integer, char, HTML tag), whether it is upper or lower case, its length, linguistic knowledge about the word category, its genus or even additional semantic knowledge drawn from a rich taxonomy. For example Thomas [Thomas, 1999] uses a document transformation into feature terms, in which a fragment like <b>Pentium 90</b> is written as a list of tokens: token(type=html, tag=b), token(type=word, txt='Pentium'), token(type=int, val=90), token(type=html_end, tag=b). If we replace a feature value like b in ....
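The token representation quoted in this context can be sketched in a few lines of Python. This is only an illustration and not the implementation from the cited paper: the regular expression, the function name `tokenize`, and the use of dictionaries for feature terms are all assumptions made for this sketch.

```python
import re

# Hypothetical token pattern (an assumption of this sketch):
# closing tag, opening tag, standalone integer, or any other word.
TOKEN_RE = re.compile(r"</(\w+)\s*>|<(\w+)[^>]*>|(\d+)(?![^\s<>])|([^\s<>]+)")


def tokenize(fragment):
    """Turn an HTML fragment into a list of feature dictionaries."""
    tokens = []
    for end_tag, start_tag, integer, word in TOKEN_RE.findall(fragment):
        if end_tag:
            tokens.append({"type": "html_end", "tag": end_tag.lower()})
        elif start_tag:
            tokens.append({"type": "html", "tag": start_tag.lower()})
        elif integer:
            tokens.append({"type": "int", "val": int(integer)})
        else:
            tokens.append({"type": "word", "txt": word})
    return tokens


for token in tokenize("<b>Pentium 90</b>"):
    print(token)
```

Each dictionary plays the role of one `token(...)` feature term; replacing a concrete value such as the tag `b` with a variable is what generalisation over several examples would then do.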

....tuples of the form Description, URL from a HTML document. For further details of how logic programs can be used for information extraction, see [Thomas, 2000]. In the last decade various representations have been developed, some influenced largely by logic programming [Junker et al. 1999; Thomas, 1999] and other slot oriented approaches motivated by natural language processing. In essence they all can be represented without much effort in a first order predicate logic syntax. [Figure 5: An online catalogue.] Additional representations may be used to capture the document's layout. For ....

[Article contains additional citation context not shown here]

Bernd Thomas. Anti-Unification Based Learning of T-Wrappers for Information Extraction. In Proc. AAAI-99 Workshop on Machine Learning for Information Extraction, 1999.


Adaptive information extraction: Core technologies for.. - Kushmerick, Thomas (2002)   (3 citations)  Self-citation (Thomas)

....(b) the learning methods used. For example, the relevant features may be its type (integer, char, HTML tag), whether it is upper or lower case, its length, linguistic knowledge about the word category, its genus or even additional semantic knowledge drawn from a rich taxonomy. For example Thomas [34] uses a document transformation into feature terms, in which a fragment like <b>Pentium 90</b> is written as a list of tokens: token(type=html, tag=b), token(type=word, txt='Pentium'), token(type=int, val=90), token(type=html_end, tag=b). If we replace a feature value like b in ....

....token(type=html_end, tag=a) extracts tuples of the form Description, URL from a HTML document. For further details of how logic programs can be used for information extraction, see [35]. In the last decade various representations have been developed, some influenced largely by logic programming [16, 34], and other slot oriented approaches motivated by natural language processing. In essence they all can be represented without much effort in a first order predicate logic syntax. [Fig. 5. Learned T-Wrapper rule and extractions.] Additional representations may be used to capture the documents ....

[Article contains additional citation context not shown here]

B. Thomas. Anti-Unification Based Learning of T-Wrappers for Information Extraction. In Proc. AAAI-99 Workshop on Machine Learning for Information Extraction, 1999.


MIA - An Ubiquitous Multi-Agent Web Information System - Beuster, Thomas, Wolff (2000)   (1 citation)  Self-citation (Thomas)

....on web documents (HTML documents) it makes more sense to use the special text formatting and annotating strings (tags) of these documents to recognize and extract relevant information. We use the syntax based approach of automatically learning extraction procedures (wrappers [18]) described in [7, 15, 16]. Similar approaches using machine learning techniques for the automatic wrapper construction are described in [3, 6, 9]. To extract information, we can assume special text parts to be delimiters marking the beginning and the end of the relevant information to be extracted. Thus the key idea of ....

....Thus the overall task is that of automatic program synthesis to build wrappers. More specifically, we use generalization techniques based on Anti-Unification [12, 8] in combination with methods adopted from the area of inductive logic programming [11]. For further details the reader is referred to [15, 7]. The major problem we are confronted with in the context of MIA is the lack of available examples. Because the classifier provides unknown web pages to the system, we cannot determine any examples, which are needed for learning. In the previously described standard setting for learning wrappers ....
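As a rough illustration of the generalization step mentioned in this context, the following sketch computes a least general generalization of two first-order terms by syntactic anti-unification. It is a textbook-style sketch, not the T-Wrapper learning algorithm itself; the term encoding (tuples of functor and arguments, strings for constants) and the names `Var` and `anti_unify` are assumptions made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A generalization variable (hypothetical helper for this sketch)."""
    index: int


def anti_unify(s, t, table=None):
    """Return a least general generalization of terms s and t.

    Compound terms are tuples (functor, arg1, ..., argN); constants are strings.
    The table maps each pair of differing subterms to one shared variable,
    so repeated disagreements are generalized consistently.
    """
    if table is None:
        table = {}
    if s == t:
        return s
    same_shape = (
        isinstance(s, tuple) and isinstance(t, tuple)
        and len(s) == len(t) and s[0] == t[0]
    )
    if same_shape:
        args = tuple(anti_unify(a, b, table) for a, b in zip(s[1:], t[1:]))
        return (s[0],) + args
    if (s, t) not in table:
        table[(s, t)] = Var(len(table))
    return table[(s, t)]


# Two token terms that differ only in the tag value.
print(anti_unify(("token", "html", "b"), ("token", "html", "i")))
```

Applied to two token terms that differ only in their tag, the result keeps the shared structure and replaces the differing tag with a variable.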

B. Thomas. Anti-unification based learning of T-Wrappers for information extraction. In Workshop on Machine Learning for Information Extraction, July 1999. 16th National American Conference on Artificial Intelligence (AAAI-99).


Ubiquitous Web Information Agents - Beuster, Thomas, Wolff (2000)   Self-citation (Thomas)

....on web documents (HTML documents) it makes more sense to use the special text formatting and annotating strings (tags) of these documents to recognize and extract relevant information. We use the syntax based approach of automatically learning extraction procedures (wrappers [16]) described in [5, 13, 14]. Similar approaches using machine learning techniques for the automatic wrapper construction are described in [1, 4, 7]. To extract information, we can assume special text parts to be delimiters marking the beginning and the end of the relevant information to be extracted. Thus the key idea of ....

....is able to extract the instances in E and remaining instances presented on the page. Therefore we use generalization techniques based on Anti-Unification [10, 6] in combination with methods adopted from the area of inductive logic programming [9]. For further details the reader is referred to [13, 5]. The major problem we are confronted with in the context of MIA is the lack of available examples. Because the classifier provides unknown web pages to the system, we cannot determine any examples, which are needed for learning. To overcome this problem we developed a learning algorithm that ....

B. Thomas. Anti-unification based learning of T-Wrappers for information extraction. In Workshop on Machine Learning for Information Extraction, July 1999. 16th National American Conference on Artificial Intelligence (AAAI-99).


A Unifying Approach to HTML Wrapper Representation and.. - Grieser, Jantke, Lange.. (2000)   (5 citations)  Self-citation (Thomas)

....represented as an HTML document is straightforward. One first tries to find the left and right anchor for the author's name(s) and then, to the right, the nearest left and right anchor for the year of publication. This is repeated as long as those pairs of anchors can be found. Several approaches [8, 14, 15] in the IE community use this basic idea to define extraction procedures (wrappers or templates) based on their own description language. Further investigations showed that wrappers can be classified according to their expressiveness based on several structural constraints. This leads to the ....
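The anchor-based procedure described in this context can be sketched as follows. This is a minimal illustration assuming fixed string delimiters; the helper names `between` and `extract_pairs`, the sample document, and the chosen anchors are hypothetical and not taken from the cited papers.

```python
def between(doc, left, right, start):
    """Return (text, next_pos) for the nearest left/right anchor pair at or
    after start, or (None, start) if no such pair exists."""
    i = doc.find(left, start)
    if i < 0:
        return None, start
    i += len(left)
    j = doc.find(right, i)
    if j < 0:
        return None, start
    return doc[i:j], j + len(right)


def extract_pairs(doc, author_anchors, year_anchors):
    """Repeatedly extract (author, year) pairs until no anchor pair is found."""
    results = []
    pos = 0
    while True:
        author, after_author = between(doc, *author_anchors, pos)
        if author is None:
            break
        # The year is searched for to the right of the author match.
        year, after_year = between(doc, *year_anchors, after_author)
        if year is None:
            break
        results.append((author, year))
        pos = after_year
    return results


# Hypothetical sample page and anchors.
page = "<li><b>Thomas</b> <i>1999</i></li><li><b>Kushmerick</b> <i>2000</i></li>"
print(extract_pairs(page, ("<b>", "</b>"), ("<i>", "</i>")))
```

A learned wrapper would generalize the anchors from examples rather than take them as fixed strings; here they are passed in by hand to keep the control flow visible.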

....semantics is still reasonable. To lay a cornerstone of our application-oriented work, we investigate which learnability results achieved for EFSs lift to AEFSs. Additionally, we prototypically show how AEFSs can be used to describe a certain class of HTML wrappers, so-called island wrappers (cf. [14, 15]). We prove the learnability of island wrappers from only positive examples under certain natural assumptions. 3 Advanced Elementary Formal Systems. In this section, we introduce a quite general formalism to describe wrappers, namely advanced elementary formal systems (AEFSs, for short). In ....

[Article contains additional citation context not shown here]

B. Thomas, 'Anti-unification based learning of T-Wrappers for information extraction', in Proc. of AAAI Workshop on Machine Learning for IE, pp. 15--20. AAAI, (1999).


Advanced Elementary Formal Systems - Lange, Grieser, Jantke (2001)

No context found.

B. Thomas, Anti-unification based learning of T-Wrappers for information extraction, in: Proc. of AAAI Workshop on Machine Learning for IE, AAAI Press, 1999, pp. 15--20.

