Results 1  10
of
140
Wrappers for Feature Subset Selection
 AIJ SPECIAL ISSUE ON RELEVANCE
, 1997
"... In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a ..."
Abstract

Cited by 1521 (3 self)
 Add to MetaCart
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach andshow a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and NaiveBayes.
Automatic Construction of Decision Trees from Data: A MultiDisciplinary Survey
 Data Mining and Knowledge Discovery
, 1997
"... Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial ne ..."
Abstract

Cited by 220 (1 self)
 Add to MetaCart
(Show Context)
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art. Keywords: classification, treestructured classifiers, data compaction 1. Introduction Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects e.g., Human Genome...
Stochastic Dynamic Programming with Factored Representations
, 1997
"... Markov decision processes(MDPs) have proven to be popular models for decisiontheoretic planning, but standard dynamic programming algorithms for solving MDPs rely on explicit, statebased specifications and computations. To alleviate the combinatorial problems associated with such methods, we prop ..."
Abstract

Cited by 188 (10 self)
 Add to MetaCart
(Show Context)
Markov decision processes(MDPs) have proven to be popular models for decisiontheoretic planning, but standard dynamic programming algorithms for solving MDPs rely on explicit, statebased specifications and computations. To alleviate the combinatorial problems associated with such methods, we propose new representational and computational techniques for MDPs that exploit certain types of problem structure. We use dynamic Bayesian networks (with decision trees representing the local families of conditional probability distributions) to represent stochastic actions in an MDP, together with a decisiontree representation of rewards. Based on this representation, we develop versions of standard dynamic programming algorithms that directly manipulate decisiontree representations of policies and value functions. This generally obviates the need for statebystate computation, aggregating states at the leaves of these trees and requiring computations only for each aggregate state. The key to these algorithms is a decisiontheoretic generalization of classic regression analysis, in which we determine the features relevant to predicting expected value. We demonstrate the method empirically on several planning problems,
BOAT  Optimistic Decision Tree Construction
, 1999
"... Classification is an important data mining problem. Given a training database of records, each tagged with a class label, the goal of classification is to build a concise model that can be used to predict the class label of future, unlabeled records. A very popular class of classifiers are decision ..."
Abstract

Cited by 115 (2 self)
 Add to MetaCart
(Show Context)
Classification is an important data mining problem. Given a training database of records, each tagged with a class label, the goal of classification is to build a concise model that can be used to predict the class label of future, unlabeled records. A very popular class of classifiers are decision trees. All current algorithms to construct decision trees, including all mainmemory algorithms, make one scan over the training database per level of the tree. We introduce a new algorithm (BOAT) for decision tree construction that improves upon earlier algorithms in both performance and functionality. BOAT constructs several levels of the tree in only two scans over the training database, resulting in an average performance gain of 300% over previous work. The key to this performance improvement is a novel optimistic approach to tree construction in which we construct an initial tree using a small subset of the data and refine it to arrive at the final tree. We guarantee that any differen...
Dynamic Weighted Majority: A New Ensemble Method for Tracking Concept Drift
, 2003
"... Algorithms for tracking concept drift are important for many applications. We present a general method based on the Weighted Majority algorithm for using any online learner for concept drift. Dynamic Weighted Majority (DWM) maintains an ensemble of base learners, predicts using a weightedmajority ..."
Abstract

Cited by 86 (0 self)
 Add to MetaCart
(Show Context)
Algorithms for tracking concept drift are important for many applications. We present a general method based on the Weighted Majority algorithm for using any online learner for concept drift. Dynamic Weighted Majority (DWM) maintains an ensemble of base learners, predicts using a weightedmajority vote of these "experts", and dynamically creates and deletes experts in response to changes in performance. We empirically evaluated two experimental systems based on the method using incremental naive Bayes and Incremental Tree Inducer (ITI) as experts.
Classification trees with unbiased multiway splits
 Journal of the American Statistical Association
, 2001
"... Two univariate split methods and one linear combination split method are proposed for the construction of classification trees with multiway splits. Examples are given where the trees are more compact and hence easier to interpret than binary trees. A major strength of the univariate split methods i ..."
Abstract

Cited by 74 (11 self)
 Add to MetaCart
Two univariate split methods and one linear combination split method are proposed for the construction of classification trees with multiway splits. Examples are given where the trees are more compact and hence easier to interpret than binary trees. A major strength of the univariate split methods is that they have negligible bias in variable selection, both when the variables differ in the number of splits they offer and when they differ in number of missing values. This is an advantage because inferences from the tree structures can be adversely affected by selection bias. The new methods are shown to be highly competitive in terms of computational speed and classification accuracy of future observations. Key words and phrases: Decision tree, linear discriminant analysis, missing value, selection bias. 1
Mining Surprising Patterns Using Temporal Description Length
, 1998
"... We propose a new notion of surprising temporal patterns in market basket data, and algorithms to find such patterns. This is distinct from finding frequent patterns as addressed in the common mining literature. We argue that once the analyst is already familiar with prevalent patterns in the data, t ..."
Abstract

Cited by 61 (0 self)
 Add to MetaCart
We propose a new notion of surprising temporal patterns in market basket data, and algorithms to find such patterns. This is distinct from finding frequent patterns as addressed in the common mining literature. We argue that once the analyst is already familiar with prevalent patterns in the data, the greatest incremental benefit is likely to be from changes in the relationship between item frequencies over time. A simple measure of surprise is the extent of departure from a model, estimated using standard multivariate time series analysis. Unfortunately, such estimation involves models, smoothing windows and parameters whose optimal choices can vary dramatically from one application to another. In contrast, we propose a precise characterization of surprise based on the number of bits in which a basket sequence can be encoded under a carefully chosen coding scheme. In this scheme it is inexpensive to encode sequences of itemsets that have steady, hence likely to be wellknown, correla...
Accurate decision trees for mining highspeed data streams
 In Proc. SIGKDD
, 2003
"... In this paper we study the problem of constructing accurate decision tree models from data streams. Data streams are incremental tasks that require incremental, online, and anytime learning algorithms. One of the most successful algorithms for mining data streams is VFDT. In this paper we extend th ..."
Abstract

Cited by 52 (7 self)
 Add to MetaCart
(Show Context)
In this paper we study the problem of constructing accurate decision tree models from data streams. Data streams are incremental tasks that require incremental, online, and anytime learning algorithms. One of the most successful algorithms for mining data streams is VFDT. In this paper we extend the VFDT system in two directions: the ability to deal with continuous data and the use of more powerful classification techniques at tree leaves. The proposed system, VFDTc, can incorporate and classify new information online, with a single scan of the data, in time constant per example. The most relevant property of our system is the ability to obtain a performance similar to a standard decision tree algorithm even for medium size datasets. This is relevant due to the anytime property. We study the behaviour of VFDTc in different problems and demonstrate its utility in large and medium data sets. Under a biasvariance analysis we observe that VFDTc in comparison to C4.5 is able to reduce the variance component.