Results 1 - 10 of 153
Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging
- Computational Linguistics
, 1995
"... this paper, we will describe a simple rule-based approach to automated learning of linguistic knowledge. This approach has been shown for a number of tasks to capture information in a clearer and more direct fashion without a compromise in performance. We present a detailed case study of this learni ..."
Abstract
-
Cited by 924 (8 self)
- Add to MetaCart
(Show Context)
In this paper, we will describe a simple rule-based approach to automated learning of linguistic knowledge. This approach has been shown for a number of tasks to capture information in a clearer and more direct fashion without a compromise in performance. We present a detailed case study of this learning method applied to part-of-speech tagging.
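The learning loop the abstract alludes to is compact enough to sketch. Below is a minimal Python illustration of transformation-based error-driven learning with a single assumed rule template ("change tag A to B when the previous tag is Z"); the actual tagger uses a richer template set and a separate lexical stage, so treat this as a sketch of the idea, not the paper's algorithm.

from collections import Counter

def baseline_tags(words, lexicon):
    """Initial-state annotator: assign each word its most frequent tag."""
    return [lexicon.get(w, "NN") for w in words]

def candidate_rules(tags, truth):
    """Instantiate one simple template at every current error:
    'change tag A to B when the previous tag is Z'."""
    for i in range(1, len(tags)):
        if tags[i] != truth[i]:
            yield (tags[i], truth[i], tags[i - 1])

def rule_score(rule, tags, truth):
    """Net error reduction if the rule were applied everywhere."""
    a, b, prev = rule
    score = 0
    for i in range(1, len(tags)):
        if tags[i] == a and tags[i - 1] == prev:
            score += 1 if truth[i] == b else -1
    return score

def apply_rule(rule, tags):
    a, b, prev = rule
    return [b if (i > 0 and t == a and tags[i - 1] == prev) else t
            for i, t in enumerate(tags)]

def learn(words, truth, lexicon, max_rules=10):
    """Greedily pick the rule with the largest error reduction,
    apply it, and repeat; the ordered rule list is the tagger."""
    tags, rules = baseline_tags(words, lexicon), []
    for _ in range(max_rules):
        cands = Counter(candidate_rules(tags, truth))
        if not cands:
            break
        best = max(cands, key=lambda r: rule_score(r, tags, truth))
        if rule_score(best, tags, truth) <= 0:
            break
        tags = apply_rule(best, tags)
        rules.append(best)
    return rules

lexicon = {"the": "DT", "can": "MD", "rusted": "VBD"}  # toy lexicon
print(learn(["the", "can", "rusted"], ["DT", "NN", "VBD"], lexicon))
# [('MD', 'NN', 'DT')]  i.e., "change MD to NN after DT"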
Bottom-Up Relational Learning of Pattern Matching Rules for Information Extraction
, 2003
"... Information extraction is a form of shallow text processing that locates a specified set of relevant items in a natural-language document. Systems for this task require significant domain-specific knowledge and are time-consuming and difficult to build by hand, making them a good application for ..."
Abstract
-
Cited by 406 (20 self)
- Add to MetaCart
(Show Context)
Information extraction is a form of shallow text processing that locates a specified set of relevant items in a natural-language document. Systems for this task require significant domain-specific knowledge and are time-consuming and difficult to build by hand, making them a good application for machine learning. We present an algorithm, RAPIER, that uses pairs of sample documents and filled templates to induce pattern-match rules that directly extract fillers for the slots in the template. RAPIER is a bottom-up learning algorithm that incorporates techniques from several inductive logic programming systems. We have implemented the algorithm in a system that allows patterns to have constraints on the words, part-of-speech tags, and semantic classes present in the filler and the surrounding text. We present encouraging experimental results on two domains.
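To make the rule representation concrete, here is a hedged Python sketch of the shape of such pattern-match rules: pre-filler, filler, and post-filler patterns whose elements constrain words and part-of-speech tags. The sample rule, the matcher, and the fixed-length filler are simplifying assumptions for illustration; RAPIER induces variable-length rules bottom-up rather than matching hand-written ones.

def element_matches(elem, token):
    """An element may constrain the word and/or the POS tag."""
    word, pos = token
    if "words" in elem and word.lower() not in elem["words"]:
        return False
    if "tags" in elem and pos not in elem["tags"]:
        return False
    return True

def match_rule(rule, tokens):
    """Return slot fillers wherever the pre-filler, filler, and
    post-filler patterns match contiguously (fixed lengths only)."""
    pre, fill, post = rule["pre"], rule["filler"], rule["post"]
    pattern = pre + fill + post
    fillers = []
    for i in range(len(tokens) - len(pattern) + 1):
        window = tokens[i:i + len(pattern)]
        if all(element_matches(e, t) for e, t in zip(pattern, window)):
            span = window[len(pre):len(pre) + len(fill)]
            fillers.append(" ".join(w for w, _ in span))
    return fillers

# Hypothetical induced rule for a "city" slot in a job posting:
rule = {"pre":    [{"words": {"located", "based"}}, {"words": {"in"}}],
        "filler": [{"tags": {"NNP"}}],
        "post":   [{"words": {","}}]}
tokens = [("Offices", "NNS"), ("located", "VBN"), ("in", "IN"),
          ("Austin", "NNP"), (",", ","), ("Texas", "NNP")]
print(match_rule(rule, tokens))  # ['Austin']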
Unsupervised named-entity extraction from the Web: An experimental study.
- Artificial Intelligence
, 2005
"... Abstract The KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domain-independent, and scalable manner. The paper presents an overview of KNOW-ITALL's novel architecture and ..."
Abstract
-
Cited by 372 (39 self)
- Add to MetaCart
(Show Context)
The KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domain-independent, and scalable manner. The paper presents an overview of KNOWITALL's novel architecture and design principles, emphasizing its distinctive ability to extract information without any hand-labeled training examples. In its first major run, KNOWITALL extracted over 50,000 class instances, but suggested a challenge: How can we improve KNOWITALL's recall and extraction rate without sacrificing precision? This paper presents three distinct ways to address this challenge and evaluates their performance. Pattern Learning learns domain-specific extraction rules, which enable additional extractions. Subclass Extraction automatically identifies sub-classes in order to boost recall (e.g., "chemist" and "biologist" are identified as sub-classes of "scientist"). List Extraction locates lists of class instances, learns a "wrapper" for each list, and extracts elements of each list. Since each method bootstraps from KNOWITALL's domain-independent methods, the methods also obviate hand-labeled training examples. The paper reports on experiments, focused on building lists of named entities, that measure the relative efficacy of each method and demonstrate their synergy. In concert, our methods gave KNOWITALL a 4-fold to 8-fold increase in recall at precision of 0.90, and discovered over 10,000 cities missing from the Tipster Gazetteer.
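The domain-independent extraction step can be illustrated with a generic Hearst-style pattern of the kind KNOWITALL instantiates for a class name. The regular expression and toy text below are illustrative assumptions; the real system issues search-engine queries and assesses each extraction probabilistically.

import re

def pattern_extract(class_name, text):
    """Find '<class> such as X, Y and Z' and return the capitalized
    candidate instances X, Y, Z."""
    pat = re.compile(
        rf"{class_name}\s+such\s+as\s+((?:[A-Z][\w.-]*\s*,?\s*(?:and\s+)?)+)")
    instances = []
    for m in pat.finditer(text):
        for name in re.split(r",|\band\b", m.group(1)):
            if name.strip():
                instances.append(name.strip())
    return instances

text = ("Famous scientists such as Curie, Darwin and Einstein "
        "reshaped their fields.")
print(pattern_extract("scientists", text))  # ['Curie', 'Darwin', 'Einstein']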
Using Corpus Statistics and WordNet Relations for Sense Identification
, 1998
"... Introduction An impressive array of statistical methods have been developed for word sense identification. They range from dictionary-based approaches that rely on definitions (Vronis and Ide 1990; Wilks et al. 1993) to corpus-based approaches that use only word cooccurrence frequencies extracted f ..."
Abstract
-
Cited by 201 (0 self)
- Add to MetaCart
An impressive array of statistical methods have been developed for word sense identification. They range from dictionary-based approaches that rely on definitions (Véronis and Ide 1990; Wilks et al. 1993) to corpus-based approaches that use only word co-occurrence frequencies extracted from large textual corpora (Schütze 1995; Dagan and Itai 1994). We have drawn on these two traditions, using corpus-based co-occurrence and the lexical knowledge base that is embodied in the WordNet lexicon. The two traditions complement each other. Corpus-based approaches have the advantage of being generally applicable to new texts, domains, and corpora without needing costly and perhaps error-prone parsing or semantic analysis. They require only training corpora in which the sense distinctions have been marked, but therein lies their weakness. Obtaining training materials for statistical methods is costly and time-consuming; it is a "knowledge acquisition bottleneck" (Gale, Church, and Yarowsky) ...
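As a rough illustration of combining context with WordNet relations, the following Lesk-style sketch scores each sense of a target word by overlap between the context and the words reachable from the sense (its gloss, lemmas, and hypernyms). It is a simplified stand-in for the paper's method, not a reimplementation, and it assumes NLTK with the WordNet data installed.

from nltk.corpus import wordnet as wn

def sense_signature(synset):
    """Words associated with a sense: its gloss, its lemmas, and the
    glosses/lemmas of its hypernyms (one WordNet relation)."""
    sig = set(synset.definition().lower().split())
    sig.update(l.lower() for l in synset.lemma_names())
    for hyper in synset.hypernyms():
        sig.update(hyper.definition().lower().split())
        sig.update(l.lower() for l in hyper.lemma_names())
    return sig

def pick_sense(word, context_words):
    """Choose the sense whose signature overlaps the context most."""
    context = {w.lower() for w in context_words}
    senses = wn.synsets(word)
    if not senses:
        return None
    return max(senses, key=lambda s: len(sense_signature(s) & context))

best = pick_sense("bank", ["deposit", "money", "loan", "interest"])
print(best, best.definition())
# expect the financial sense of "bank" to score highest on overlap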
Introduction to the CoNLL-2000 Shared Task: Chunking
, 2000
"... We describe the CoNLL-2000 shared task: dividing text into syntactically related nonoverlapping groups of words, so-called text chunking. We give background information on the data setst present a general overview of the systems that have taken part in the shared task and briefly discuss their perfo ..."
Abstract
-
Cited by 180 (1 self)
- Add to MetaCart
We describe the CoNLL-2000 shared task: dividing text into syntactically related, non-overlapping groups of words, so-called text chunking. We give background information on the data sets, present a general overview of the systems that have taken part in the shared task, and briefly discuss their performance.
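The shared task represents chunks with per-token B/I/O tags (B-NP begins a noun-phrase chunk, I-NP continues one, O is outside any chunk). A small decoder from tag sequences back to chunk spans shows the format; this is a toy sketch, not shared-task code.

def bio_to_chunks(tokens, tags):
    """Group (token, tag) pairs into (chunk_type, [tokens]) spans."""
    chunks, current, ctype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and
                                    tag[2:] != ctype):
            if current:
                chunks.append((ctype, current))
            current, ctype = [tok], tag[2:]
        elif tag.startswith("I-"):
            current.append(tok)
        else:  # "O" closes any open chunk
            if current:
                chunks.append((ctype, current))
            current, ctype = [], None
    if current:
        chunks.append((ctype, current))
    return chunks

tokens = ["He", "reckons", "the", "current", "account", "deficit"]
tags   = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP"]
print(bio_to_chunks(tokens, tags))
# [('NP', ['He']), ('VP', ['reckons']),
#  ('NP', ['the', 'current', 'account', 'deficit'])]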
Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data
- In ICML
, 2004
"... In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when longrange dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain cond ..."
Abstract
-
Cited by 171 (13 self)
- Add to MetaCart
(Show Context)
In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges, a distributed state representation as in dynamic Bayesian networks (DBNs), and parameters are tied across slices. Since exact ...
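For reference, the factorized distribution a DCRF defines can be written as below; the notation is reconstructed here from the standard CRF convention (clique templates c repeated at every slice t, with parameters tied across slices), so consult the paper for its exact definition.

p(\mathbf{y} \mid \mathbf{x}) =
  \frac{1}{Z(\mathbf{x})}
  \prod_{t} \prod_{c \in \mathcal{C}}
    \Phi_c(\mathbf{y}_{c,t}, \mathbf{x}, t),
\qquad
\Phi_c(\mathbf{y}_{c,t}, \mathbf{x}, t) =
  \exp\Bigl( \sum_{k} \lambda_k\, f_k(\mathbf{y}_{c,t}, \mathbf{x}, t) \Bigr)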
A Rule-Based Approach to Prepositional Phrase Attachment Disambiguation
, 1994
"... I.n this paper, we describe a new corpus-based ap proach to prepositional phrase attachment disambiguation, and 10resent results comparing perlbrmance of this algorithm with ol,her corpus-based approaches to this problem. ..."
Abstract
-
Cited by 169 (6 self)
- Add to MetaCart
In this paper, we describe a new corpus-based approach to prepositional phrase attachment disambiguation, and present results comparing the performance of this algorithm with other corpus-based approaches to this problem.
Web-scale information extraction in KnowItAll (preliminary results).
- In Proceedings of the 13th International Conference on World Wide Web (WWW ’04)
, 2004
"... ABSTRACT Manually querying search engines in order to accumulate a large body of factual information is a tedious, error-prone process of piecemeal search. Search engines retrieve and rank potentially relevant documents for human perusal, but do not extract facts, assess confidence, or fuse informa ..."
Abstract
-
Cited by 151 (5 self)
- Add to MetaCart
(Show Context)
Manually querying search engines in order to accumulate a large body of factual information is a tedious, error-prone process of piecemeal search. Search engines retrieve and rank potentially relevant documents for human perusal, but do not extract facts, assess confidence, or fuse information from multiple documents. This paper introduces KNOWITALL, a system that aims to automate the tedious process of extracting large collections of facts from the Web in an autonomous, domain-independent, and scalable manner. The paper describes preliminary experiments in which an instance of KNOWITALL, running for four days on a single machine, was able to automatically extract 54,753 facts. KNOWITALL associates a probability with each fact, enabling it to trade off precision and recall. The paper analyzes KNOWITALL's architecture and reports on lessons learned for the design of large-scale information extraction systems.
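One ingredient behind the per-fact probabilities is pointwise mutual information between an extracted instance and class "discriminator" phrases, estimated from search-engine hit counts. The Python sketch below stubs the hit-count function; the counts and the discriminator phrase are invented for illustration, whereas the real system queried a web search engine.

def hits(query):
    """Stub: return the number of web pages matching `query`.
    (Invented counts; KNOWITALL used search-engine result counts.)"""
    fake_counts = {'"Paris"': 1_000_000,
                   '"Paris is a city"': 20_000,
                   '"Trout"': 500_000,
                   '"Trout is a city"': 3}
    return fake_counts.get(query, 1)

def pmi(instance, discriminator):
    """PMI score: co-occurrence count normalized by instance count."""
    co = hits(f'"{discriminator.format(instance)}"')
    return co / hits(f'"{instance}"')

# A hypothetical discriminator phrase for the class "city":
print(pmi("Paris", "{} is a city"))  # high  -> likely a city
print(pmi("Trout", "{} is a city"))  # low   -> likely noise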
Unsupervised Learning of Disambiguation Rules for Part of Speech Tagging
- In Natural Language Processing Using Very Large Corpora
, 1995
"... In this paper we describe an unsupervised learning algorithm for automatically training a rule-based part of speech tagger without using a manually tagged corpus. We compare this algorithm to the Baum-Welch algorithm, used for unsupervised training of stochastic taggers. Next, we show a method for c ..."
Abstract
-
Cited by 130 (1 self)
- Add to MetaCart
In this paper we describe an unsupervised learning algorithm for automatically training a rule-based part of speech tagger without using a manually tagged corpus. We compare this algorithm to the Baum-Welch algorithm, used for unsupervised training of stochastic taggers. Next, we show a method for combining unsupervised and supervised rule-based training algorithms to create a highly accurate tagger using only a small amount of manually tagged text.
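The key trick in the unsupervised setting is that unambiguous words can supply training signal for ambiguous ones: each word starts with the full set of tags its dictionary entry allows, and contextual rules pick one tag from that set. The toy Python sketch below scores such a rule by unambiguous-word evidence; it simplifies the paper's actual scoring function considerably and uses an invented mini-corpus.

from collections import defaultdict

def context_counts(corpus, lexicon):
    """How often does an unambiguous word with tag T follow word w?
    corpus: list of words; lexicon: word -> set of possible tags."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, word in zip(corpus, corpus[1:]):
        tags = lexicon.get(word, set())
        if len(tags) == 1:            # unambiguous: usable evidence
            counts[prev][next(iter(tags))] += 1
    return counts

def score_rule(prev_word, tag, tagset, counts):
    """Evidence that words ambiguous among `tagset` after `prev_word`
    should take `tag`: count for `tag` minus the best competitor."""
    c = counts[prev_word]
    rivals = [c[t] for t in tagset if t != tag]
    return c[tag] - max(rivals, default=0)

lexicon = {"the": {"DT"}, "dog": {"NN"}, "barks": {"VBZ"},
           "runs": {"NN", "VBZ"}}
corpus = ["the", "dog", "barks", "the", "dog", "runs"]
counts = context_counts(corpus, lexicon)
# Should ambiguous "runs" after "dog" resolve to VBZ? Positive = yes:
print(score_rule("dog", "VBZ", {"NN", "VBZ"}, counts))  # 1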
Feature engineering for text classification
- Proceedings of ICML-99, 16th International Conference on Machine Learning
, 1999
"... Most research in text classification has used the “bag of words ” representation of text. This paper examines some alternative ways to represent text based on syntactic and semantic relationships between words (phrases, synonyms and hypernyms). We describe the new representations and try to justify ..."
Abstract
-
Cited by 129 (1 self)
- Add to MetaCart
Most research in text classification has used the “bag of words” representation of text. This paper examines some alternative ways to represent text based on syntactic and semantic relationships between words (phrases, synonyms and hypernyms). We describe the new representations and try to justify our suspicions that they could improve the performance of a rule-based learner. The representations are evaluated using the RIPPER rule-based learner on the Reuters-21578 and DigiTrad test corpora, but on their own the new representations are not found to produce a significant performance improvement. Finally, we try combining classifiers based on different representations using a majority voting technique. This step does produce some performance improvement on both test collections. In general, our work supports the emerging consensus in the information retrieval community that more sophisticated Natural Language Processing techniques need to be developed before better text representations can be produced. We conclude that, for now, research into new learning algorithms and methods for combining existing learners holds the most promise.
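The combination step lends itself to a short sketch. The paper's learner is RIPPER, which common Python libraries do not provide, so scikit-learn decision trees stand in below purely to illustrate majority voting across classifiers trained on different representations ("views") of the same documents; all names and data here are invented.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def majority_vote(classifiers, feature_views):
    """Each classifier votes using its own representation (view) of
    the same documents; the most common label per document wins."""
    votes = np.stack([clf.predict(X)
                      for clf, X in zip(classifiers, feature_views)])
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Toy data: three views (think bag-of-words, phrases, hypernyms)
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)
views = [rng.random((100, 5)) + y[:, None] * 0.5 for _ in range(3)]
clfs = [DecisionTreeClassifier(max_depth=3).fit(X, y) for X in views]
print(majority_vote(clfs, views)[:10], y[:10])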