Results 1 - 10 of 27
Fast Binary Feature Selection with Conditional Mutual Information
- Journal of Machine Learning Research, 2004
"... We propose in this paper a very fast feature selection technique based on conditional mutual information. ..."
- Cited by 176 (1 self)
Abstract:
We propose in this paper a very fast feature selection technique based on conditional mutual information.
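The selection rule behind the method is simple to state: at each step, keep the candidate feature whose conditional mutual information with the class, given any already-selected feature, is largest in the worst case, so redundant candidates score poorly. Below is a minimal numpy sketch of that greedy rule, assuming discrete feature columns; the helper names are illustrative, not from the paper.

import numpy as np

def entropy(*cols):
    # Joint entropy (nats) of one or more discrete columns.
    _, counts = np.unique(np.stack(cols, axis=1), axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def mi(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(x) + entropy(y) - entropy(x, y)

def cond_mi(x, y, z):
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)
    return entropy(x, z) + entropy(y, z) - entropy(z) - entropy(x, y, z)

def cmim(X, y, k):
    # Greedy selection: a candidate's running score is its worst-case
    # conditional MI with the label given any feature chosen so far.
    d = X.shape[1]
    scores = np.array([mi(X[:, j], y) for j in range(d)])
    selected = []
    for _ in range(k):
        best = int(np.argmax(scores))
        selected.append(best)
        scores[best] = -np.inf
        for j in range(d):
            if np.isfinite(scores[j]):
                scores[j] = min(scores[j], cond_mi(X[:, j], y, X[:, best]))
    return selected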
Logic Regression
- Journal of Computational and Graphical Statistics, 2003
"... The odyssey cohort study consists of 8,394 participants who do-nated blood samples in 1974 and 1989 in Washington County, Maryland. The cohort has been followed until 2001, and envi-ronmental factors such as smoking and dietary intake are avail-able. The goals of the study include finding associatio ..."
- Cited by 76 (13 self)
Abstract:
The Odyssey cohort study consists of 8,394 participants who donated blood samples in 1974 and 1989 in Washington County, Maryland. The cohort has been followed until 2001, and environmental factors such as smoking and dietary intake are available. The goals of the study include finding associations between polymorphisms in candidate genes and disease (including cancer and cardiovascular disease). In particular, gene-environment and gene-gene interactions associated with disease are of interest. Currently, SNP data from 51 sites are available for some 1600 subjects.
Quantifying and visualizing attribute interactions: An approach based on entropy
- http://arxiv.org/abs/cs.AI/0308002 v3, 2004
"... Interactions are patterns between several attributes in data that cannot be inferred from any subset of these attributes. While mutual information is a well-established approach to evaluating the interactions between two attributes, we surveyed its generalizations as to quantify interactions between ..."
- Cited by 33 (4 self)
Abstract:
Interactions are patterns between several attributes in data that cannot be inferred from any subset of these attributes. While mutual information is a well-established approach to evaluating the interaction between two attributes, we survey its generalizations for quantifying interactions among several attributes. We have chosen McGill’s interaction information, which has been independently rediscovered a number of times under various names in various disciplines, because of its many intuitively appealing properties. We apply interaction information to visually present the most important interactions in the data. Visualization of interactions has provided insight into the structure of data in a number of domains, identifying redundant attributes and opportunities for constructing new features, discovering unexpected regularities, and helping during the construction of predictive models; we illustrate the methods on numerous examples. A machine learning method that disregards interactions may get caught in two traps: myopia is caused by learning algorithms assuming independence in spite of interactions, whereas fragmentation arises from assuming an interaction in spite of independence.
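For three variables, the interaction information the abstract refers to can be written I(X;Y;Z) = I(X;Y|Z) - I(X;Y), which expands into a signed sum of joint entropies. A small numpy sketch for discrete variables (helper names are illustrative); positive values indicate synergy, negative values redundancy:

import numpy as np

def entropy(*cols):
    # Joint entropy (bits) of one or more discrete columns.
    _, counts = np.unique(np.stack(cols, axis=1), axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def interaction_information(x, y, z):
    # I(X;Y;Z) = I(X;Y|Z) - I(X;Y)
    #          = H(X,Y) + H(X,Z) + H(Y,Z)
    #            - H(X) - H(Y) - H(Z) - H(X,Y,Z)
    return (entropy(x, y) + entropy(x, z) + entropy(y, z)
            - entropy(x) - entropy(y) - entropy(z) - entropy(x, y, z))

# Sanity check: Z = XOR(X, Y) is purely synergistic, so the value
# approaches +1 bit.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, 10_000)
y = rng.integers(0, 2, 10_000)
print(interaction_information(x, y, x ^ y))   # ~ 1.0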
Importance Sampled Learning Ensembles
- 2003
"... Learning a function of many arguments is viewed from the perspective of high-- dimensional numerical quadrature. It is shown that many of the popular ensemble learning procedures can be cast in this framework. In particular randomized methods, including bagging and random forests, are seen to cor ..."
- Cited by 27 (5 self)
Abstract:
Learning a function of many arguments is viewed from the perspective of high-dimensional numerical quadrature. It is shown that many of the popular ensemble learning procedures can be cast in this framework. In particular, randomized methods, including bagging and random forests, are seen to correspond to Monte Carlo integration methods, each based on a particular importance sampling strategy. Non-random boosting methods are seen to correspond to deterministic quasi-Monte Carlo integration techniques. This view helps explain some of their properties and suggests modifications to them that can substantially improve their accuracy while dramatically improving computational performance.
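A toy rendering of that correspondence: the ensemble prediction F(x) = (1/M) * sum_m f(x; theta_m) is a Monte Carlo estimate of an integral of base learners f(x; theta) over a sampling distribution on theta. The bootstrap-plus-random-stump base learner below is an illustrative assumption, not the paper's construction:

import numpy as np

rng = np.random.default_rng(0)

def fit_stump(X, y):
    # One random base learner f(x; theta): theta bundles the bootstrap
    # sample with a random feature and a random threshold.
    idx = rng.integers(0, len(X), len(X))       # bootstrap draw
    Xb, yb = X[idx], y[idx]
    j = rng.integers(0, X.shape[1])             # random feature
    t = np.quantile(Xb[:, j], rng.random())     # random threshold
    left = Xb[:, j] <= t
    lv = yb[left].mean() if left.any() else yb.mean()
    rv = yb[~left].mean() if (~left).any() else yb.mean()
    return j, t, lv, rv

def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)

# Bagging as an equal-weight Monte Carlo average over draws of theta.
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
ensemble = [fit_stump(X, y) for _ in range(500)]
F = np.mean([predict_stump(s, X) for s in ensemble], axis=0)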
PERT - perfect random tree ensembles
- Computing Science and Statistics, 2001
"... Ensemble classifiers originated in the machine learning community. They work by fitting many individual classifiers and combining them by weighted or unweighted voting. The ensemble classifier is often much more accurate than the individual classifiers from which it is built. In fact, ensemble class ..."
- Cited by 20 (0 self)
Abstract:
Ensemble classifiers originated in the machine learning community. They work by fitting many individual classifiers and combining them by weighted or unweighted voting. The ensemble classifier is often much more accurate than the individual classifiers from which it is built. In fact, ensemble classifiers are among the most accurate general-purpose classifiers available. We introduce a new ensemble method, PERT, in which each individual classifier is a perfectly-fit classification tree with random selection of splits. Compared to other ensemble methods, PERT is very fast to fit. Given the randomness of the split selection, PERT is surprisingly accurate. Calculations suggest that one reason why PERT works so well is that although the individual tree classifiers are extremely weak, they are almost uncorrelated. The simple probabilistic nature of the classifier lends itself to theoretical analysis. We show that PERT is fitting a continuous posterior probability surface for each class. As such, it can be viewed as a classification-via-regression procedure that fits a continuous interpolating surface. In theory, this surface could be found using a one-shot procedure.
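A compact sketch of the construction: each node picks two training points from different classes, a random feature, and a random point between their two values as the split, growing until nodes are pure; the ensemble then votes. This follows the abstract's description but compresses details, and it assumes integer class labels starting at 0:

import numpy as np

rng = np.random.default_rng(0)

def grow(X, y, depth=0, max_depth=50):
    # Grow one PERT-style tree: perfectly fit the training data with
    # randomly chosen splits.
    if len(set(y)) == 1 or depth == max_depth:
        return ('leaf', np.bincount(y).argmax())
    for _ in range(10):                       # find two mixed-class points
        i, k = rng.integers(0, len(y), 2)
        if y[i] != y[k]:
            break
    else:
        return ('leaf', np.bincount(y).argmax())
    j = rng.integers(0, X.shape[1])           # random feature
    a = rng.random()                          # random point between the two
    t = a * X[i, j] + (1 - a) * X[k, j]
    mask = X[:, j] <= t
    if mask.all() or not mask.any():          # degenerate split: stop
        return ('leaf', np.bincount(y).argmax())
    return ('split', j, t,
            grow(X[mask], y[mask], depth + 1, max_depth),
            grow(X[~mask], y[~mask], depth + 1, max_depth))

def predict(tree, x):
    while tree[0] == 'split':
        _, j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree[1]

def vote(forest, x):
    # Unweighted voting across the ensemble.
    return np.bincount([predict(t, x) for t in forest]).argmax()

# forest = [grow(X, y) for _ in range(100)]   # X: (n, d) floats, y: ints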
Integrated sensing and processing decision trees
- IEEE Trans. on Pat. Anal. and Mach. Intel., 2004
"... Abstract—We introduce a methodology for adaptive sequential sensing and processing in a classification setting. Our objective for sensor optimization is the back-end performance metric—in this case, misclassification rate. Our methodology, which we dub Integrated Sensing and Processing Decision Tree ..."
- Cited by 15 (3 self)
Abstract:
We introduce a methodology for adaptive sequential sensing and processing in a classification setting. Our objective for sensor optimization is the back-end performance metric—in this case, misclassification rate. Our methodology, which we dub Integrated Sensing and Processing Decision Trees (ISPDT), optimizes adaptive sequential sensing for scenarios in which sensor and/or throughput constraints dictate that only a small subset of all measurable attributes can be measured at any one time. Our decision trees optimize misclassification rate by invoking a local dimensionality reduction-based partitioning metric in the early stages, focusing on classification only in the leaves of the tree. We present the ISPDT methodology and illustrative theoretical, simulation, and experimental results. Index Terms: classification, clustering, adaptive sensing, sequential sensing, local dimensionality reduction.
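The abstract gives the shape of the algorithm rather than its details, so the sketch below is a loose, generic interpretation: internal nodes choose which small attribute subset to sense and partition on it, and only leaves assign labels. The subset-scoring metric here (1-D PCA projection plus 2-means inertia) is invented for illustration and is not the paper's metric:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

def build(X, y, sensed=2, depth=0, max_depth=4):
    # Leaves classify; internal nodes only partition the data.
    if depth == max_depth or len(set(y)) == 1:
        return ('leaf', np.bincount(y).argmax())
    # Score a few random attribute subsets of the size we can sense at
    # once; keep the one whose 1-D projection clusters most cleanly.
    best_score, best = np.inf, None
    for _ in range(8):
        subset = rng.choice(X.shape[1], size=sensed, replace=False)
        pca = PCA(n_components=1).fit(X[:, subset])
        km = KMeans(n_clusters=2, n_init=5, random_state=0)
        km.fit(pca.transform(X[:, subset]))
        if km.inertia_ < best_score:
            best_score, best = km.inertia_, (subset, pca, km)
    subset, pca, km = best
    labels = km.predict(pca.transform(X[:, subset]))
    if labels.min() == labels.max():
        return ('leaf', np.bincount(y).argmax())
    kids = [build(X[labels == c], y[labels == c], sensed, depth + 1, max_depth)
            for c in (0, 1)]
    return ('node', subset, pca, km, kids)

def classify(tree, x):
    # Only the subset sensed at the visited nodes is ever measured.
    while tree[0] == 'node':
        _, subset, pca, km, kids = tree
        c = km.predict(pca.transform(x[subset].reshape(1, -1)))[0]
        tree = kids[c]
    return tree[1]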
Distributed program sampling
- In Proceedings of PLDI’03, 2003
"... We propose a sampling infrastructure for gathering information about software from the set of runs experienced by its user community. We show how to gather random samples with very low overhead for users, and we also show how to make use of the information we gather. We present two example applicati ..."
- Cited by 15 (1 self)
Abstract:
We propose a sampling infrastructure for gathering information about software from the set of runs experienced by its user community. We show how to gather random samples with very low overhead for users, and we also show how to make use of the information we gather. We present two example applications: sharing the overhead of assertions, and using statistical analysis of a large number of sampled runs to help isolate the location of a bug.
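The low overhead comes from not tossing a coin at every instrumentation site: each site fires with probability p, but the common case is a cheap counter decrement, with the gap to the next sampled site drawn from a geometric distribution. A minimal sketch of that countdown idea (class and method names are illustrative):

import math
import random

class CountdownSampler:
    # Each site fires independently with probability p, yet the fast
    # path is decrement-and-test rather than a per-site random draw.
    def __init__(self, p):
        self.p = p
        self.countdown = self._next_gap()

    def _next_gap(self):
        # Geometric(p) gap to the next sampled site, via inversion.
        u = max(random.random(), 1e-12)       # guard against log(0)
        return int(math.log(u) / math.log(1.0 - self.p)) + 1

    def should_sample(self):
        self.countdown -= 1
        if self.countdown == 0:
            self.countdown = self._next_gap()
            return True
        return False

# Example: share the cost of an expensive check across many executions.
sampler = CountdownSampler(p=0.01)
for item in range(100_000):
    if sampler.should_sample():
        assert item >= 0        # stand-in for an expensive assertion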
Attractor Networks for Shape Recognition
- Neural Computation, 1999
"... We describe a system of thousands of binary perceptrons with coarse oriented edges as input which is able to successfully recognize shapes, even in a context with hundreds of classes. The perceptrons have randomized feedforward connections from the input layer and form a recurrent network among ..."
- Cited by 12 (1 self)
Abstract:
We describe a system of thousands of binary perceptrons with coarse oriented edges as input which is able to successfully recognize shapes, even in a context with hundreds of classes. The perceptrons have randomized feedforward connections from the input layer and form a recurrent network among themselves. Each class is represented by a pre-learned attractor (serving as an associative 'hook') in the recurrent net, corresponding to a randomly selected subpopulation of the perceptrons. In training, first the attractor of the correct class is activated among the perceptrons, and then the visual stimulus is presented at the input layer. The feedforward connections are then modified using field-dependent Hebbian learning with positive synapses, which we show to be stable with respect to large variations in feature statistics and coding levels, and allows the use of the same threshold on all perceptrons. Recognition is based only on the visual stimuli. These activate the recurrent network...
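As a rough illustration of the learning step, here is a generic Hebbian update with non-negative ("positive") synapses: potentiate where pre- and post-synaptic units are both active, depress where the post-synaptic unit fires without the input, and clip at zero. The field-dependent gating the paper actually uses is omitted, so this is an assumption-laden sketch rather than the paper's rule:

import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 256, 64                  # edge-feature inputs, perceptrons

def hebbian_step(W, x, y, eta=0.05):
    # x: binary input pattern; y: binary attractor pattern of the
    # correct class, activated before the stimulus is presented.
    dW = eta * np.outer(y, x) - eta * np.outer(y, 1 - x)
    return np.clip(W + dW, 0.0, None)  # keep all synapses non-negative

W = rng.uniform(0.0, 0.1, size=(n_out, n_in))
x = (rng.random(n_in) < 0.2).astype(float)    # sparse stimulus
y = (rng.random(n_out) < 0.1).astype(float)   # class subpopulation
W = hebbian_step(W, x, y)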
Recent advances in predictive (machine) learning
- Journal of Classification, 2006
"... Prediction involves estimating the unknown value of an attribute of a system under study given the values of other measured attributes. In prediction (machine) learning the prediction rule is derived from data consisting of previously solved cases. Most methods for predictive learning were originat ..."
- Cited by 11 (0 self)
Abstract:
Prediction involves estimating the unknown value of an attribute of a system under study given the values of other measured attributes. In prediction (machine) learning, the prediction rule is derived from data consisting of previously solved cases. Most methods for predictive learning originated many years ago, at the dawn of the computer age. Recently two new techniques have emerged that have revitalized the field: support vector machines and boosted decision trees. This paper provides an introduction to these two methods, tracing their respective ancestral roots to standard kernel methods and ordinary decision trees.