Results 11 - 20 of 67
MDL-based Decision Tree Pruning
1995
"... This paper explores the application of the Minimum Description Length principle for pruning decision trees. We present a new algorithm that intuitively captures the primary goal of reducing the misclassification error. An experimental comparison is presented with three other pruning algorithms. The ..."
Cited by 68 (1 self)
This paper explores the application of the Minimum Description Length principle for pruning decision trees. We present a new algorithm that intuitively captures the primary goal of reducing the misclassification error. An experimental comparison is presented with three other pruning algorithms. The results show that the MDL pruning algorithm achieves good accuracy, small trees, and fast execution times.

Introduction
Construction or "induction" of decision trees from examples has been the subject of extensive research in the past [Breiman et al. 84, Quinlan 86]. It is typically performed in two steps. First, training data is used to grow a decision tree. Then, in the second step, called pruning, the tree is reduced to prevent "overfitting". There are two broad classes of pruning algorithms. The first class includes algorithms, like cost-complexity pruning [Breiman et al. 84], that use a separate set of samples for pruning, distinct from the set used to grow the tree. In many cases, ...
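The abstract does not spell out the paper's coding scheme, but the core MDL comparison it describes can be sketched as follows. This is a minimal illustration assuming a simple two-part code (bits for the subtree structure plus bits for the examples it misclassifies); the node fields `majority`, `errors`, and `n` are hypothetical bookkeeping, not the paper's actual encoding:

```python
import math

# Hypothetical node layout: internal nodes have "children" plus
# "majority", "errors", "n"; leaves have "label", "errors", "n".

def exception_bits(errors, n):
    """Bits to encode which of n training examples a node misclassifies,
    using a simple log2(n+1) + log2(C(n, errors)) two-part code."""
    return math.log2(n + 1) + math.log2(math.comb(n, errors))

def cost(node, structure_bits=1.0):
    """Total description length of a (sub)tree: structure + exceptions."""
    if "children" not in node:  # leaf
        return structure_bits + exception_bits(node["errors"], node["n"])
    return (structure_bits +    # crude cost of encoding the split itself
            sum(cost(c) for c in node["children"]))

def mdl_prune(node):
    """Replace a subtree by a leaf whenever the leaf encoding is cheaper."""
    if "children" in node:
        node["children"] = [mdl_prune(c) for c in node["children"]]
        leaf = {"label": node["majority"],
                "errors": node["errors"], "n": node["n"]}
        if cost(leaf) <= cost(node):
            return leaf
    return node
```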
Rule Induction and Instance-Based Learning: A Unified Approach
1995
"... This paper presents a new approach to inductive learning that combines aspects of instancebased learning and rule induction in a single simple algorithm. The RISE system searches for rules in a specific-to-general fashion, starting with one rule per training example, and avoids some of the difficult ..."
Cited by 68 (6 self)
This paper presents a new approach to inductive learning that combines aspects of instance-based learning and rule induction in a single simple algorithm. The RISE system searches for rules in a specific-to-general fashion, starting with one rule per training example, and avoids some of the difficulties of separate-and-conquer approaches by evaluating each proposed induction step globally, i.e., through an efficient procedure that is equivalent to checking the accuracy of the rule set as a whole on every training example. Classification is performed using a best-match strategy, and reduces to nearest-neighbor if all generalizations of instances have been rejected. An extensive empirical study shows that RISE consistently achieves higher accuracies than state-of-the-art representatives of its "parent" paradigms (PEBLS and CN2), and also outperforms a decision-tree learner (C4.5) in 13 out of 15 test domains (in 10 with 95% confidence).

1 Introduction
Several well-developed approaches to indu...
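A rough sketch of the specific-to-general search the abstract describes: one maximally specific rule per example, repeated minimal generalization toward the nearest uncovered same-class example, keeping a change only if global accuracy does not drop (ties favor the more general rule). The dict-based rule encoding is illustrative, and the naive full re-evaluation here stands in for the efficient incremental procedure the paper mentions:

```python
def covers(rule, example):
    """A rule is a dict of attribute -> required value; fewer
    conditions means a more general rule."""
    return all(example[a] == v for a, v in rule["conds"].items())

def distance(rule, example):
    """Number of rule conditions the example violates."""
    return sum(example[a] != v for a, v in rule["conds"].items())

def classify(rules, example):
    """Best-match strategy: the nearest rule wins."""
    return min(rules, key=lambda r: distance(r, example))["label"]

def global_accuracy(rules, data):
    return sum(classify(rules, x) == x["cls"] for x in data) / len(data)

def rise(data):
    # Start with one maximally specific rule per training example.
    rules = [{"conds": {a: v for a, v in x.items() if a != "cls"},
              "label": x["cls"]} for x in data]
    improved = True
    while improved:
        improved = False
        for rule in rules:
            # Generalize minimally toward the nearest uncovered
            # same-class example by dropping violated conditions.
            candidates = [x for x in data
                          if x["cls"] == rule["label"] and not covers(rule, x)]
            if not candidates:
                continue
            nearest = min(candidates, key=lambda x: distance(rule, x))
            new_conds = {a: v for a, v in rule["conds"].items()
                         if nearest[a] == v}
            old_acc = global_accuracy(rules, data)
            old_conds, rule["conds"] = rule["conds"], new_conds
            if global_accuracy(rules, data) >= old_acc:
                improved = True          # keep the generalization
            else:
                rule["conds"] = old_conds  # revert
    return rules
```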
Scaling Up: Distributed Machine Learning with Cooperation
In Proceedings of the Thirteenth National Conference on Artificial Intelligence, 1996
"... Machine-learning methods are becoming increasingly popular for automated data analysis. However, standard methods do not scale up to massive scientific and business data sets without expensive hardware. This paper investigates a practical alternative for scaling up: the use of distributed proce ..."
Cited by 59 (2 self)
Machine-learning methods are becoming increasingly popular for automated data analysis. However, standard methods do not scale up to massive scientific and business data sets without expensive hardware. This paper investigates a practical alternative for scaling up: the use of distributed processing to take advantage of the often dormant PCs and workstations available on local networks. Each workstation runs a common rule-learning program on a subset of the data. We first show that for commonly used rule-evaluation criteria, a simple form of cooperation can guarantee that a rule will look good to the set of cooperating learners if and only if it would look good to a single learner operating with the entire data set. We then show how such a system can further capitalize on different perspectives by sharing learned knowledge for a significant reduction in search effort. We demonstrate the power of the method by learning from a massive data set taken from the domain of cel...
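The "looks good locally iff good globally" guarantee rests on the observation that common rule-evaluation criteria depend only on coverage counts, which sum exactly across partitions. A minimal sketch of that aggregation, using a Laplace accuracy estimate as a stand-in for whatever criterion a given learner uses; the `rule(x)` predicate and `cls` field are illustrative:

```python
def local_counts(rule, subset):
    """Each cooperating learner reports how the rule fares on its own
    partition: (positives covered, negatives covered)."""
    p = sum(1 for x in subset if rule(x) and x["cls"] == 1)
    n = sum(1 for x in subset if rule(x) and x["cls"] == 0)
    return p, n

def cooperative_value(rule, partitions):
    """Summing per-partition counts reproduces exactly the counts a
    single learner would see on the full data, so any count-based
    criterion agrees with the centralized evaluation."""
    p = n = 0
    for subset in partitions:    # in reality: one workstation per subset
        lp, ln = local_counts(rule, subset)
        p, n = p + lp, n + ln
    return (p + 1) / (p + n + 2)  # Laplace accuracy estimate
```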
Efficient Pruning Methods For Separate-And-Conquer Rule Learning Systems
In Proceedings of the 13th International Joint Conference on Artificial Intelligence, 1993
"... Recent years have seen increased interest in systems that learn sets of rules. The goal of this paper is to study the degree to which "separate and conquer" rule learning induction methods scale up to large, real-world learning problems. In particular ..."
Cited by 57 (2 self)
Recent years have seen increased interest in systems that learn sets of rules. The goal of this paper is to study the degree to which "separate and conquer" rule learning induction methods scale up to large, real-world learning problems. In particular ...
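For readers unfamiliar with the term, the "separate and conquer" scheme the paper studies is the standard covering loop: learn one rule, separate out the examples it covers, and repeat on the remainder. A minimal sketch, with `learn_one_rule` left abstract:

```python
def separate_and_conquer(data, learn_one_rule):
    """Standard covering loop: repeatedly learn a single rule and remove
    ('separate') the examples it covers before learning the next one."""
    rules = []
    remaining = list(data)
    while remaining:
        rule = learn_one_rule(remaining)  # e.g., greedy condition search
        covered = [x for x in remaining if rule(x)]
        if not covered:                   # no progress: stop
            break
        rules.append(rule)
        remaining = [x for x in remaining if not rule(x)]
    return rules
```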
An Extensible Meta-Learning Approach for Scalable and Accurate Inductive Learning
1996
"... Much of the research in inductive learning concentrates on problems with relatively small amounts of data. With the coming age of ubiquitous network computing, it is likely that orders of magnitude more data in databases will be available for various learning problems of real world importance. Som ..."
Cited by 53 (8 self)
Much of the research in inductive learning concentrates on problems with relatively small amounts of data. With the coming age of ubiquitous network computing, it is likely that orders of magnitude more data in databases will be available for various learning problems of real-world importance. Some learning algorithms assume that the entire data set fits into main memory, which is not feasible for massive amounts of data, especially for applications in data mining. One approach to handling a large data set is to partition the data set into subsets, run the learning algorithm on each of the subsets, and combine the results. Moreover, data can be inherently distributed across multiple sites on the network, and merging all the data in one location can be expensive or prohibitive. In this thesis we propose, investigate, and evaluate a meta-learning approach to integrating the results of mul...
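A toy version of the partition-and-combine setup described here, using disjoint partitions and a majority-vote combiner as a stand-in for the thesis's richer arbiter/combiner strategies; scikit-learn decision trees are just a convenient base learner, not necessarily the thesis's choice, and integer class labels are assumed:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def meta_learn(X, y, n_partitions=4):
    """Train one base classifier per disjoint partition of the data."""
    models = []
    for Xi, yi in zip(np.array_split(X, n_partitions),
                      np.array_split(y, n_partitions)):
        models.append(DecisionTreeClassifier().fit(Xi, yi))
    return models

def combine_predict(models, X):
    """Majority vote over the base classifiers -- the simplest combiner."""
    votes = np.stack([m.predict(X) for m in models])
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)
```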
Scaling Up Inductive Learning with Massive Parallelism
Machine Learning
"... . Machine learning programs need to scale up to very large data sets for several reasons, including increasing accuracy and discovering infrequent special cases. Current inductive learners perform well with hundreds or thousands of training examples, but in some cases, up to a million or more exampl ..."
Cited by 37 (14 self)
Machine learning programs need to scale up to very large data sets for several reasons, including increasing accuracy and discovering infrequent special cases. Current inductive learners perform well with hundreds or thousands of training examples, but in some cases, up to a million or more examples may be necessary to learn important special cases with confidence. These tasks are infeasible for current learning programs running on sequential machines. We discuss the need for very large data sets and prior efforts to scale up machine learning methods. This discussion motivates a strategy that exploits the inherent parallelism present in many learning algorithms. We describe a parallel implementation of one inductive learning program on the CM-2 Connection Machine, show that it scales up to millions of examples, and show that it uncovers special-case rules that sequential learning programs, running on smaller datasets, would miss. The parallel version of the learning program is prefer...
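The inherent parallelism the paper exploits is easiest to see in rule evaluation, where every training example can be tested against a candidate condition simultaneously. A vectorized NumPy sketch of that idea, with a synthetic million-example dataset standing in for the CM-2's data parallelism:

```python
import numpy as np

def coverage_counts(X, y, feature, threshold):
    """Count positives/negatives covered by the test X[:, feature] <= t,
    touching all examples in one parallel (vectorized) operation."""
    covered = X[:, feature] <= threshold
    return int(np.sum(covered & (y == 1))), int(np.sum(covered & (y == 0)))

rng = np.random.default_rng(0)
X = rng.random((1_000_000, 10))      # a million examples in one sweep
y = (X[:, 0] + 0.1 * rng.random(1_000_000) > 0.5).astype(int)
print(coverage_counts(X, y, feature=0, threshold=0.5))
```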
Noise-Tolerant Windowing
Journal of Artificial Intelligence Research, 1998
"... Windowing has been proposed as a procedure for efficient memory use in the ID3 decision tree learning algorithm. However, it was shown that it may often lead to a decrease in performance, in particular in noisy domains. Following up on previous work, where we have demonstrated that the ability of ru ..."
Cited by 28 (4 self)
Windowing has been proposed as a procedure for efficient memory use in the ID3 decision tree learning algorithm. However, it has been shown that it may often lead to a decrease in performance, particularly in noisy domains. Following up on previous work, where we demonstrated that the ability of rule learning algorithms to learn rules independently can be exploited for more efficient windowing procedures, we show in this paper how the same property can be exploited to achieve noise tolerance in windowing.
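For context, basic windowing (without the paper's noise-tolerant refinements, such as keeping only reliable rules between rounds) can be sketched as the following loop; `learn`, `classify`, and the `cls` field are placeholders for whatever rule learner is plugged in:

```python
import random

def windowing(data, learn, classify, init_size=500, max_rounds=10):
    """Basic windowing: learn on a small window, then repeatedly add
    examples the current theory misclassifies and relearn."""
    window = random.sample(data, min(init_size, len(data)))
    theory = learn(window)
    for _ in range(max_rounds):
        mistakes = [x for x in data if classify(theory, x) != x["cls"]]
        if not mistakes:           # theory consistent with all the data
            break
        window += random.sample(mistakes, min(len(mistakes), init_size))
        theory = learn(window)
    return theory
```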
Efficient Progressive Sampling for Association Rules
2002
"... In data mining, sampling has often been suggested as an effective tool to reduce the size of the dataset operated at some cost to accuracy. However, this loss to accuracy is often difficult to measure and characterize since the exact nature of the learning curve (accuracy vs. sample size) is paramet ..."
Cited by 23 (5 self)
In data mining, sampling has often been suggested as an effective tool to reduce the size of the dataset operated on, at some cost in accuracy. However, this loss in accuracy is often difficult to measure and characterize, since the exact nature of the learning curve (accuracy vs. sample size) is parameter- and data-dependent, i.e., we do not know a priori what sample size is needed to achieve a desired accuracy on a particular dataset for a particular set of parameters. In this article we propose the use of progressive sampling to determine the required sample size for association rule mining. We first show that a naive application of progressive sampling is not very efficient for association rule mining. We then present a refinement based on equivalence classes that seems to work extremely well in practice and is able to converge to the desired sample size very quickly and very accurately. An additional novelty of our approach is the definition of a support-sensitive, interactive measure of accuracy across progressive samples.
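A schematic of the progressive-sampling loop the article proposes. The convergence test here just compares the frequent-itemset collections mined from successive samples, a crude stand-in for the paper's support-sensitive, equivalence-class-based accuracy measure; `mine_frequent` is assumed to return hashable itemsets (e.g., frozensets):

```python
def progressive_sample_size(data, mine_frequent, start=1000, factor=2,
                            agreement=0.99):
    """Grow the sample geometrically until the mined frequent itemsets
    stop changing, i.e., the learning curve has flattened."""
    size, prev = start, None
    while size < len(data):
        current = set(mine_frequent(data[:size]))
        if prev is not None:
            overlap = len(current & prev) / max(len(current | prev), 1)
            if overlap >= agreement:
                return size            # converged: this sample suffices
        prev, size = current, size * factor
    return len(data)
```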
Automatically Training a Problematic Dialogue Predictor for a Spoken Dialogue System
Journal of Artificial Intelligence Research, 2002
"... sources and services from any phone. However, current spoken dialogue systems are deficient in their strategies for preventing, identifying and repairing problems that arise in the conversation. This paper reports results on automatically training a Problematic Dialogue Predictor to predict probl ..."
Cited by 23 (1 self)
... sources and services from any phone. However, current spoken dialogue systems are deficient in their strategies for preventing, identifying, and repairing problems that arise in the conversation. This paper reports results on automatically training a Problematic Dialogue Predictor to predict problematic human-computer dialogues using a corpus of 4692 dialogues collected with the How May I Help You spoken dialogue system. The Problematic Dialogue Predictor can be immediately applied to the system's decision of whether to transfer the call to a human customer care agent, used as a cue to the system's dialogue manager to modify its behavior to repair problems, or perhaps even to prevent them. We show that a Problematic Dialogue Predictor using automatically obtainable features from the first two exchanges in the dialogue can predict problematic dialogues 13.2% more accurately than the baseline.
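A schematic of the training setup described here, assuming a feature matrix built from the first two exchanges of each dialogue. The feature names in the comment are illustrative, not the paper's actual feature set, and scikit-learn logistic regression is used purely for illustration rather than the classifier the paper evaluated:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def train_predictor(X, y):
    """X: one row per dialogue, with automatically obtainable features
    from the first two exchanges (e.g., ASR confidence scores,
    recognized-utterance lengths); y: 1 if the dialogue later turned
    out to be problematic, 0 otherwise."""
    model = LogisticRegression(max_iter=1000).fit(X, y)
    print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
    return model
```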