Results 1  10
of
281
Induction of Decision Trees
 MACH. LEARN
, 1986
"... The technology for building knowledgebased systems by inductive inference from examples has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such syste ..."
Abstract

Cited by 4377 (4 self)
 Add to MetaCart
The technology for building knowledgebased systems by inductive inference from examples has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such system, ID3, in detail. Results from recent studies show ways in which the methodology can be modified to deal with information that is noisy and/or incomplete. A reported shortcoming of the basic algorithm is discussed and two means of overcoming it are compared. The paper concludes with illustrations of current research directions.
A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirtythree Old and New Classification Algorithms
, 2000
"... . Twentytwo decision tree, nine statistical, and two neural network algorithms are compared on thirtytwo datasets in terms of classication accuracy, training time, and (in the case of trees) number of leaves. Classication accuracy is measured by mean error rate and mean rank of error rate. Both cr ..."
Abstract

Cited by 234 (8 self)
 Add to MetaCart
(Show Context)
. Twentytwo decision tree, nine statistical, and two neural network algorithms are compared on thirtytwo datasets in terms of classication accuracy, training time, and (in the case of trees) number of leaves. Classication accuracy is measured by mean error rate and mean rank of error rate. Both criteria place a statistical, splinebased, algorithm called Polyclass at the top, although it is not statistically signicantly dierent from twenty other algorithms. Another statistical algorithm, logistic regression, is second with respect to the two accuracy criteria. The most accurate decision tree algorithm is Quest with linear splits, which ranks fourth and fth, respectively. Although splinebased statistical algorithms tend to have good accuracy, they also require relatively long training times. Polyclass, for example, is third last in terms of median training time. It often requires hours of training compared to seconds for other algorithms. The Quest and logistic regression algor...
Automatic Construction of Decision Trees from Data: A MultiDisciplinary Survey
 Data Mining and Knowledge Discovery
, 1997
"... Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial ne ..."
Abstract

Cited by 224 (1 self)
 Add to MetaCart
(Show Context)
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art. Keywords: classification, treestructured classifiers, data compaction 1. Introduction Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects e.g., Human Genome...
Autocontext and its Application to Highlevel Vision Tasks
 In Proc. CVPR
"... The notion of using context information for solving highlevel vision and medical image segmentation problems has been increasingly realized in the field. However, how to learn an effective and efficient context model, together with an image appearance model, remains mostly unknown. The current lite ..."
Abstract

Cited by 156 (6 self)
 Add to MetaCart
The notion of using context information for solving highlevel vision and medical image segmentation problems has been increasingly realized in the field. However, how to learn an effective and efficient context model, together with an image appearance model, remains mostly unknown. The current literature using Markov Random Fields (MRFs) and Conditional Random Fields (CRFs) often involves specific algorithm design, in which the modeling and computing stages are studied in isolation. In this paper, we propose the autocontext algorithm. Given a set of training images and their corresponding label maps, we first learn a classifier on local image patches. The discriminative probability (or classification confidence) maps created by the learned classifier are then used as context information, in addition to the original image patches, to train a new classifier. The algorithm then iterates until convergence. Autocontext integrates lowlevel and context information by fusing a large number of lowlevel appearance features with context and implicit shape information. The resulting discriminative algorithm is general and easy to implement. Under nearly the same parameter settings in training, we apply the algorithm to three challenging vision applications: foreground/background segregation, human body configuration estimation, and scene region labeling. Moreover, context also plays a very important role in medical/brain images where the anatomical structures are mostly constrained to relatively fixed positions. With only some slight changes resulting from using 3D instead of 2D features, the autocontext algorithm applied to brain MRI image segmentation is shown to outperform stateoftheart algorithms specifically designed for this domain. Furthermore, the scope of the proposed algorithm goes beyond image analysis and it has the potential to be used for a wide variety of problems in multivariate labeling.
Learning Object Identification Rules for Information Integration
 Information Systems
, 2001
"... When integrating information from multiple websites, the same data objects can exist in inconsistent text formats across sites, making it di#cult to identify matching objects using exact text match. We have developed an object identification system called Active Atlas, which compares the objects& ..."
Abstract

Cited by 111 (9 self)
 Add to MetaCart
(Show Context)
When integrating information from multiple websites, the same data objects can exist in inconsistent text formats across sites, making it di#cult to identify matching objects using exact text match. We have developed an object identification system called Active Atlas, which compares the objects' shared attributes in order to identify matching objects. Certain attributes are more important for deciding if a mapping should exist between two objects. Previous methods of object identification have required manual construction of object identification rules or mapping rules for determining the mappings between objects. This manual process is time consuming and errorprone.
C4.5, Class Imbalance, and Cost Sensitivity: Why Undersampling beats Oversampling
, 2003
"... This paper takes a new look at two sampling schemes commonly used to adapt machine learning algorithms to imbalanced classes and misclassification costs. It uses a performance analysis technique called cost curves to explore the interaction of over and undersampling with the decision tree learner C4 ..."
Abstract

Cited by 102 (2 self)
 Add to MetaCart
(Show Context)
This paper takes a new look at two sampling schemes commonly used to adapt machine learning algorithms to imbalanced classes and misclassification costs. It uses a performance analysis technique called cost curves to explore the interaction of over and undersampling with the decision tree learner C4.5. C4.5 was chosen as, when combined with one of the sampling schemes, it is quickly becoming the community standard when evaluating new cost sensitive learning algorithms. This paper shows that using C4.5 with undersampling establishes a reasonable standard for algorithmic comparison. But it is recommended that the cheapest class classifier be part of that standard as it can be better than undersampling for relatively modest costs. Oversampling, however, shows little sensitivity, there is often little difference in performance when misclassification costs are changed. 1.
Multiple Comparisons in Induction Algorithms
 MACHINE LEARNING
, 1998
"... A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching. In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score. We ..."
Abstract

Cited by 94 (10 self)
 Add to MetaCart
(Show Context)
A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching. In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score. We call this a ( ). We analyze the statistical properties of and show how failure to adjust for these properties leads to the pathologies. We also discuss approaches that can control pathological behavior, including Bonferroni adjustment, randomization testing, and crossvalidation.
Characterization of Classification Algorithms
, 1995
"... This paper is concerned with the problem of characterization of classication algorithms. The aim is to determine under what circumstances a particular classication algorithm is applicable. The method used involves generation of dierent kinds of models. These include regression and rule models, pi ..."
Abstract

Cited by 60 (9 self)
 Add to MetaCart
This paper is concerned with the problem of characterization of classication algorithms. The aim is to determine under what circumstances a particular classication algorithm is applicable. The method used involves generation of dierent kinds of models. These include regression and rule models, piecewise linear models (model trees) and instance based models. These are generated automatically on the basis of dataset characteristics and given test results. The lack of data is compensated for by various types of preprocessing. The models obtained are characterized by quantifying their predictive capability and the best models are identied.
Predicting information seeker satisfaction in community question answering
 In Proceedings of SIGIR
, 2008
"... Question answering communities such as Naver and Yahoo! Answers have emerged as popular, and often effective, means of information seeking on the web. By posting questions for other participants to answer, information seekers can obtain specific answers to their questions. Users of popular portals s ..."
Abstract

Cited by 57 (4 self)
 Add to MetaCart
(Show Context)
Question answering communities such as Naver and Yahoo! Answers have emerged as popular, and often effective, means of information seeking on the web. By posting questions for other participants to answer, information seekers can obtain specific answers to their questions. Users of popular portals such as Yahoo! Answers already have submitted millions of questions and received hundreds of millions of answers from other participants. However, it may also take hours –and sometime days – until a satisfactory answer is posted. In this paper we introduce the problem of predicting information seeker satisfaction in collaborative question answering communities, where we attempt to predict whether a question author will be satisfied with the answers submitted by the community participants. We present a general prediction model, and develop a variety of content, structure, and communityfocused features for this task. Our experimental results, obtained from a largescale evaluation over thousands of real questions and user ratings, demonstrate the feasibility of modeling and predicting asker satisfaction. We complement our results with a thorough investigation of the interactions and information seeking patterns in question answering communities that correlate with information seeker satisfaction. Our models and predictions could be useful for a variety of applications such as user intent inference, answer ranking, interface design, and query suggestion and routing.
Lazy Learning of Bayesian Rules
 Machine Learning
, 2000
"... The naive Bayesian classifier provides a simple and e#ective approach to classifier learning, but its attribute independence assumption is often violated in the real world. A number of approaches have sought to alleviate this problem. A Bayesian tree learning algorithm builds a decision tree, and ge ..."
Abstract

Cited by 55 (11 self)
 Add to MetaCart
The naive Bayesian classifier provides a simple and e#ective approach to classifier learning, but its attribute independence assumption is often violated in the real world. A number of approaches have sought to alleviate this problem. A Bayesian tree learning algorithm builds a decision tree, and generates a local naive Bayesian classifier at each leaf. The tests leading to a leaf can alleviate attribute interdependencies for the local naive Bayesian classifier. However, Bayesian tree learning still su#ers from the small disjunct problem of tree learning. While inferred Bayesian trees demonstrate low average prediction error rates, there is reason to believe that error rates will be higher for those leaves with few training examples. This paper proposes the application of lazy learning techniques to Bayesian tree induction and presents the resulting lazy Bayesian rule learning algorithm, called Lbr. This algorithm can be justified by a variant of Bayes theorem which supports a weaker conditional attribute independence assumption than is required by naive Bayes. For each test example, it builds a most appropriate rule with a local naive Bayesian classifier as its consequent. It is demonstrated that the computational requirements of Lbr are reasonable in a wide crosssection of natural domains. Experiments with these domains show that, on average, this new algorithm obtains lower error rates significantly more often than the reverse in comparison to a naive Bayesian classifier, C4.5, a Bayesian tree learning algorithm, a constructive Bayesian classifier that eliminates attributes and constructs new attributes using Cartesian products of existing nominal attributes, and a lazy decision tree learning algorithm. It also outperforms, although the result is not statisticall...