Results 1 - 10 of 101
Reduction Techniques for Instance-based Learning Algorithms
- Machine Learning, 2000
"... Instance-based learning algorithms are often faced with the problem of deciding which instances to store for use during generalization. Storing too many instances can result in large memory requirements and slow execution speed, and can cause an oversensitivity to noise. This paper has two main pur ..."
Abstract - Cited by 213 (3 self)
Instance-based learning algorithms are often faced with the problem of deciding which instances to store for use during generalization. Storing too many instances can result in large memory requirements and slow execution speed, and can cause an oversensitivity to noise. This paper has two main purposes. First, it provides a survey of existing algorithms used to reduce storage requirements in instance-based learning algorithms and other exemplar-based algorithms. Second, it proposes six additional reduction algorithms called DROP1–DROP5 and DEL (three of which were first described in Wilson & Martinez, 1997c, as RT1–RT3) that can be used to remove instances from the concept description. These algorithms and 10 algorithms from the survey are compared on 31 classification tasks. Of those algorithms that provide substantial storage reduction, the DROP algorithms have the highest average generalization accuracy in these experiments, especially in the presence of uniform class noise.
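As a rough sketch of the decremental-reduction idea behind the DROP family (not the published DROP1–DROP5, which order removals carefully and test only each instance's associates), one can drop an instance whenever the remaining kept instances are classified at least as well without it. The k-NN classifier and the whole-set accuracy test below are simplifying assumptions.

import numpy as np

def knn_label(Xref, yref, q, k=3):
    # Majority label among the k nearest reference instances to q.
    d = np.linalg.norm(Xref - q, axis=1)
    nn = np.argsort(d)[:k]
    labels, counts = np.unique(yref[nn], return_counts=True)
    return labels[np.argmax(counts)]

def decremental_reduce(X, y, k=3):
    # Drop instance i when the other kept instances are classified at
    # least as accurately without it -- a coarse stand-in for DROP's
    # per-associate test.
    keep = list(range(len(X)))
    for i in list(keep):
        rest = [j for j in keep if j != i]
        def correct(idx):
            hits = 0
            for a in rest:  # score on the kept instances other than i
                mask = [j for j in idx if j != a]
                hits += knn_label(X[mask], y[mask], X[a], k) == y[a]
            return hits
        if correct(rest) >= correct(keep):
            keep = rest
    return keep

# usage: kept = decremental_reduce(X_train, y_train)
#        X_red, y_red = X_train[kept], y_train[kept]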
Separate-and-conquer rule learning
- Artificial Intelligence Review, 1999
"... This paper is a survey of inductive rule learning algorithms that use a separate-and-conquer strategy. This strategy can be traced back to the AQ learning system and still enjoys popularity as can be seen from its frequent use in inductive logic programming systems. We will put this wide variety of ..."
Abstract - Cited by 168 (29 self)
This paper is a survey of inductive rule learning algorithms that use a separate-and-conquer strategy. This strategy can be traced back to the AQ learning system and still enjoys popularity as can be seen from its frequent use in inductive logic programming systems. We will put this wide variety of algorithms into a single framework and analyze them along three different dimensions, namely their search, language and overfitting avoidance biases.
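The separate-and-conquer loop itself is compact enough to sketch: learn one rule that covers some positive examples, remove ("separate") the covered examples, and recur ("conquer") on the remainder. The rule representation (attribute = value conjunctions) and the greedy precision-driven refinement below are illustrative assumptions, not any particular system from the survey.

def covers(rule, example):
    # A rule is a dict of attribute -> required value; {} covers everything.
    return all(example.get(a) == v for a, v in rule.items())

def learn_one_rule(pos, neg, attributes):
    # Greedily add the single condition that best separates pos from neg.
    rule = {}
    while any(covers(rule, n) for n in neg):
        best = None
        for a in attributes:
            if a in rule:
                continue
            for v in {e[a] for e in pos if covers(rule, e)}:
                cand = dict(rule, **{a: v})
                p = sum(covers(cand, e) for e in pos)
                n = sum(covers(cand, e) for e in neg)
                score = p / (p + n + 1)  # crude precision estimate
                if p and (best is None or score > best[0]):
                    best = (score, cand)
        if best is None:  # no refinement possible; accept an impure rule
            break
        rule = best[1]
    return rule

def separate_and_conquer(pos, neg, attributes):
    rules = []
    while pos:
        rule = learn_one_rule(pos, neg, attributes)
        covered = [e for e in pos if covers(rule, e)]
        if not covered:  # no progress; stop to avoid looping
            break
        rules.append(rule)
        pos = [e for e in pos if not covers(rule, e)]  # "separate"
    return rules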
Forgetting Exceptions is Harmful in Language Learning
- Machine Learning, Special Issue on Natural Language Learning, 1999
"... We show that in language learning, contrary to received wisdom, keeping exceptional training instances in memory can be beneficial for generalization accuracy. We investigate this phenomenon empirically on a selection of benchmark natural language processing tasks: grapheme-to-phoneme conversion, pa ..."
Abstract - Cited by 136 (44 self)
We show that in language learning, contrary to received wisdom, keeping exceptional training instances in memory can be beneficial for generalization accuracy. We investigate this phenomenon empirically on a selection of benchmark natural language processing tasks: grapheme-to-phoneme conversion, part-of-speech tagging, prepositional-phrase attachment, and base noun phrase chunking. In a first series of experiments we combine memory-based learning with training set editing techniques, in which instances are edited based on their typicality and class prediction strength. Results show that editing exceptional instances (with low typicality or low class prediction strength) tends to harm generalization accuracy. In a second series of experiments we compare memory-based learning and decision-tree learning methods on the same selection of tasks, and find that decision-tree learning often performs worse than memory-based learning. Moreover, the decrease in performance can be linked to the degree of abstraction from exceptions (i.e., pruning or eagerness). We provide explanations for both results in terms of the properties of the natural language processing tasks and the learning algorithms.
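A minimal sketch of the editing operation under study, using neighborhood class agreement as a crude stand-in for the paper's typicality and class-prediction-strength measures (whose actual definitions differ): score each instance by how often its class matches its neighbors' and drop low scorers. Note that the paper's result is that exactly this kind of editing tends to hurt generalization on language tasks.

import numpy as np

def neighborhood_agreement(X, y, k=5):
    scores = np.empty(len(X))
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                       # exclude the instance itself
        nn = np.argsort(d)[:k]
        scores[i] = np.mean(y[nn] == y[i])  # fraction of like-class neighbors
    return scores

def edit_exceptions(X, y, threshold=0.4, k=5):
    # Remove "exceptional" instances, i.e. those with low agreement.
    keep = neighborhood_agreement(X, y, k) >= threshold
    return X[keep], y[keep]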
Machine learning for information extraction in informal domains
1999
"... Abstract. We consider the problem of learning to perform information extraction in domains where linguistic processing is problematic, such as Usenet posts, email, and finger plan files. In place of syntactic and semantic information, other sources of information can be used, such as term frequency, ..."
Abstract - Cited by 125 (4 self)
We consider the problem of learning to perform information extraction in domains where linguistic processing is problematic, such as Usenet posts, email, and finger plan files. In place of syntactic and semantic information, other sources of information can be used, such as term frequency, typography, formatting, and mark-up. We describe four learning approaches to this problem, each drawn from a different paradigm: a rote learner, a term-space learner based on Naive Bayes, an approach using grammatical induction, and a relational rule learner. Experiments on 14 information extraction problems defined over four diverse document collections demonstrate the effectiveness of these approaches. Finally, we describe a multistrategy approach which combines these learners and yields performance competitive with or better than the best of them. This technique is modular and flexible, and could find application in other machine learning problems.
Keywords: information extraction, multistrategy learning
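The multistrategy combination can be sketched as confidence-weighted voting over field predictions. The extractor signature and the voting rule below are assumptions, and the trivial stand-in extractors only illustrate the interface, not the paper's four learners or its actual combination scheme.

from collections import defaultdict

def combine_extractions(document, extractors):
    # Each extractor returns a list of (field, value, confidence) tuples.
    votes = defaultdict(float)
    for extract in extractors:
        for field, value, conf in extract(document):
            votes[(field, value)] += conf
    # Keep the highest-scoring value per field.
    best = {}
    for (field, value), score in votes.items():
        if field not in best or score > best[field][1]:
            best[field] = (value, score)
    return {field: value for field, (value, score) in best.items()}

# Usage with trivial stand-in extractors:
rote = lambda doc: [("speaker", "Dr. Smith", 0.9)] if "Dr. Smith" in doc else []
freq = lambda doc: [("speaker", "Smith", 0.4)] if "Smith" in doc else []
print(combine_extractions("Talk by Dr. Smith at 3pm", [rote, freq]))
# {'speaker': 'Dr. Smith'}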
The Tradeoffs Between Open and Traditional Relation Extraction
"... Traditional Information Extraction (IE) takes a relation name and hand-tagged examples of that relation as input. Open IE is a relationindependent extraction paradigm that is tailored to massive and heterogeneous corpora such as the Web. An Open IE system extracts a diverse set of relational tuples ..."
Abstract - Cited by 112 (10 self)
Traditional Information Extraction (IE) takes a relation name and hand-tagged examples of that relation as input. Open IE is a relation-independent extraction paradigm that is tailored to massive and heterogeneous corpora such as the Web. An Open IE system extracts a diverse set of relational tuples from text without any relation-specific input. How is Open IE possible? We analyze a sample of English sentences to demonstrate that numerous relationships are expressed using a compact set of relation-independent lexico-syntactic patterns, which can be learned by an Open IE system. What are the tradeoffs between Open IE and traditional IE? We consider this question in the context of two tasks. First, when the number of relations is massive, and the relations themselves are not pre-specified, we argue that Open IE is necessary. We then present a new model for Open IE called O-CRF and show that it achieves increased precision and nearly double the recall of the model employed by TEXTRUNNER, the previous state-of-the-art Open IE system. Second, when the number of target relations is small, and their names are known in advance, we show that O-CRF is able to match the precision of a traditional extraction system, though at substantially lower recall. Finally, we show how to combine the two types of systems into a hybrid that achieves higher precision than a traditional extractor, with comparable recall.
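A toy illustration of the relation-independent pattern idea: even a single verb-centered pattern over capitalized noun phrases pulls (arg1, relation, arg2) tuples out of simple sentences. Real systems such as TEXTRUNNER and O-CRF learn much richer patterns (O-CRF as a conditional random field over token sequences); the regex here is purely an assumption for illustration.

import re

# Capitalized noun phrase, verb optionally followed by a preposition,
# capitalized noun phrase -- one compact relation-independent pattern.
NP = r"[A-Z][a-zA-Z]*(?: [A-Z][a-zA-Z]*)*"
PATTERN = re.compile(
    rf"(?P<arg1>{NP}) "
    r"(?P<rel>[a-z]+(?: (?:for|of|by|in|with))?) "
    rf"(?P<arg2>{NP})"
)

def extract_tuples(sentence):
    return [(m["arg1"], m["rel"], m["arg2"]) for m in PATTERN.finditer(sentence)]

print(extract_tuples("Tesla worked for Edison"))
# [('Tesla', 'worked for', 'Edison')]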
Instance-based learning in dynamic decision making
- Cognitive Science, 2003
"... This paper presents a learning theory pertinent to dynamic decision making (DDM) called instancebased learning theory (IBLT). IBLT proposes five learning mechanisms in the context of a decision-making process: instance-based knowledge, recognition-based retrieval, adaptive strategies, necessity-base ..."
Abstract - Cited by 104 (44 self)
This paper presents a learning theory pertinent to dynamic decision making (DDM) called instance-based learning theory (IBLT). IBLT proposes five learning mechanisms in the context of a decision-making process: instance-based knowledge, recognition-based retrieval, adaptive strategies, necessity-based choice, and feedback updates. IBLT suggests that in DDM people learn through the accumulation and refinement of instances, which contain the decision-making situation, action, and utility of decisions. As decision makers interact with a dynamic task, they recognize a situation according to its similarity to past instances, adapt their judgment strategies from heuristic-based to instance-based, and refine the accumulated knowledge according to feedback on the result of their actions. IBLT's learning mechanisms have been implemented in an ACT-R cognitive model. Through a series of experiments, this paper shows how IBLT's learning mechanisms closely approximate the relative trend magnitude and performance of human data. Although the cognitive model is bounded within the context of a dynamic task, IBLT is a general theory of decision making applicable to other dynamic environments.
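The retrieve-and-blend cycle can be sketched directly from the abstract: store (situation, action, utility) instances, choose the action whose similar past instances have the best blended utility, and append feedback as a new instance. The Gaussian similarity and simple weighted average below are stand-ins for the ACT-R activation and blending machinery of the actual model.

import math

class InstanceMemory:
    def __init__(self):
        self.instances = []  # (situation_vector, action, utility)

    @staticmethod
    def similarity(s1, s2):
        # Gaussian kernel on squared distance; an assumed similarity.
        return math.exp(-sum((a - b) ** 2 for a, b in zip(s1, s2)))

    def choose(self, situation, actions, default_utility=1.0):
        def blended(action):
            pairs = [(self.similarity(situation, s), u)
                     for s, a, u in self.instances if a == action]
            if not pairs:
                return default_utility  # unexplored actions look attractive
            wsum = sum(w for w, _ in pairs)
            return sum(w * u for w, u in pairs) / wsum
        return max(actions, key=blended)

    def feedback(self, situation, action, observed_utility):
        # Refine memory with the observed outcome of the decision.
        self.instances.append((situation, action, observed_utility))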
The role of Occam’s Razor in knowledge discovery
- Data Mining and Knowledge Discovery, 1999
"... Abstract. Many KDD systems incorporate an implicit or explicit preference for simpler models, but this use of “Occam’s razor ” has been strongly criticized by several authors (e.g., Schaffer, 1993; Webb, 1996). This controversy arises partly because Occam’s razor has been interpreted in two quite di ..."
Abstract - Cited by 104 (3 self)
Many KDD systems incorporate an implicit or explicit preference for simpler models, but this use of “Occam’s razor” has been strongly criticized by several authors (e.g., Schaffer, 1993; Webb, 1996). This controversy arises partly because Occam’s razor has been interpreted in two quite different ways. The first interpretation (simplicity is a goal in itself) is essentially correct, but is at heart a preference for more comprehensible models. The second interpretation (simplicity leads to greater accuracy) is much more problematic. A critical review of the theoretical arguments for and against it shows that it is unfounded as a universal principle, and demonstrably false. A review of empirical evidence shows that it also fails as a practical heuristic. This article argues that its continued use in KDD risks causing significant opportunities to be missed, and should therefore be restricted to the comparatively few applications where it is appropriate. The article proposes and reviews the use of domain constraints as an alternative for avoiding overfitting, and examines possible methods for handling the accuracy–comprehensibility trade-off.
Context-sensitive feature selection for lazy learners
- Artificial Intelligence Review, 1997
"... ..."
(Show Context)
Mining knowledge from text using information extraction
- SIGKDD Explorations, 2005
"... An important approach to text mining involves the use of natural-language information extraction. Information extraction (IE) distills structured data or knowledge from unstructured text by identifying references to named entities as well as stated relationships between such entities. IE systems can ..."
Abstract - Cited by 56 (0 self)
An important approach to text mining involves the use of natural-language information extraction. Information extraction (IE) distills structured data or knowledge from unstructured text by identifying references to named entities as well as stated relationships between such entities. IE systems can be used to directly extricate abstract knowledge from a text corpus, or to extract concrete data from a set of documents which can then be further analyzed with traditional data-mining techniques to discover more general patterns. We discuss methods and implemented systems for both of these approaches and summarize results on mining real text corpora of biomedical abstracts, job announcements, and product descriptions. We also discuss challenges that arise when employing current information extraction technology to discover knowledge in text.
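The two-stage pipeline reads as: IE turns text into records, then ordinary mining runs over the records. In the sketch below, hand-written regexes stand in for a learned extractor and pairwise co-occurrence counting stands in for the data-mining step; the field names and patterns are invented for illustration (loosely echoing the job-announcement corpus).

import re
from collections import Counter
from itertools import combinations

def extract_record(posting):
    # Pull a flat record of skills out of a job-posting-like text.
    skills = re.findall(r"\b(Java|SQL|Python|HTML|Perl)\b", posting)
    return sorted(set(skills))

def mine_cooccurrence(postings):
    # Traditional data-mining step over the extracted records:
    # count which skills are requested together.
    counts = Counter()
    for text in postings:
        for pair in combinations(extract_record(text), 2):
            counts[pair] += 1
    return counts.most_common()

docs = ["Seeking developer with Java and SQL experience",
        "Java, HTML and SQL required",
        "Python or Perl welcome"]
print(mine_cooccurrence(docs))
# [(('Java', 'SQL'), 2), (('HTML', 'Java'), 1), ...]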
Bayesian Averaging of Classifiers and the Overfitting Problem
- In Proc. 17th International Conf. on Machine Learning, 2000
"... Although Bayesian model averaging is theoretically the optimal method for combining learned models, it has seen very little use in machine learning. In this paper we study its application to combining rule sets, and compare it with bagging and partitioning, two popular but more ad hoc alternativ ..."
Abstract - Cited by 55 (2 self)
Although Bayesian model averaging is theoretically the optimal method for combining learned models, it has seen very little use in machine learning. In this paper we study its application to combining rule sets, and compare it with bagging and partitioning, two popular but more ad hoc alternatives. Our experiments show that, surprisingly, Bayesian model averaging's error rates are consistently higher than the other methods'. Further investigation shows this to be due to a marked tendency to overfit on the part of Bayesian model averaging, contradicting previous beliefs that it solves (or avoids) the overfitting problem.
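Mechanically, Bayesian model averaging weights each classifier's prediction by its posterior, approximated below as prior times validation-set likelihood; the probabilistic-classifier interface is an assumption. Because the log-likelihood term grows with the number of validation examples, the normalized weights tend to concentrate on a single model, which is one way to see the overfitting tendency the abstract reports.

import math

def posterior_weights(models, X_val, y_val, prior=None):
    # models: callables returning P(class = 1 | x) in (0, 1).
    # Returns weights proportional to prior * likelihood of the data.
    n = len(models)
    prior = prior or [1.0 / n] * n
    logws = []
    for m, p in zip(models, prior):
        loglik = sum(math.log(m(x) if y == 1 else 1.0 - m(x))
                     for x, y in zip(X_val, y_val))
        logws.append(math.log(p) + loglik)
    zmax = max(logws)
    ws = [math.exp(w - zmax) for w in logws]  # stabilized exponentiation
    total = sum(ws)
    return [w / total for w in ws]

def bma_predict(models, weights, x):
    # Posterior-weighted average of the models' class probabilities.
    return sum(w * m(x) for m, w in zip(models, weights))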