Minimax-optimal classification with dyadic decision trees
- IEEE TRANSACTIONS ON INFORMATION THEORY
, 2006
Cited by 20 (4 self)
Decision trees are among the most popular types of classifiers, with interpretability and ease of implementation being among their chief attributes. Despite the widespread use of decision trees, theoretical analysis of their performance has only begun to emerge in recent years. In this paper it is shown that a new family of decision trees, dyadic decision trees (DDTs), attain nearly optimal (in a minimax sense) rates of convergence for a broad range of classification problems. Furthermore, DDTs are surprisingly adaptive in three important respects: They automatically (1) adapt to favorable conditions near the Bayes decision boundary; (2) focus on data distributed on lower dimensional manifolds; and (3) reject irrelevant features. DDTs are constructed by penalized empirical risk minimization using a new data-dependent penalty and may be computed exactly with computational complexity that is nearly linear in the training sample size. DDTs are the first classifier known to achieve nearly optimal rates for the diverse class of distributions studied here while also being practical and implementable. This is also the first study (of which we are aware) to consider rates for adaptation to intrinsic data dimension and relevant features.
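To make "penalized empirical risk minimization over dyadic trees" concrete, here is a minimal sketch in one dimension. It is an illustration under simplifying assumptions, not the paper's method: the points are assumed to lie in [0, 1), the penalty is a constant per leaf (the paper uses a data-dependent penalty), and the name `fit_ddt` and all of its parameters are hypothetical. Each cell is either kept as a leaf with a majority-vote label or split at its midpoint, whichever gives the smaller value of training errors plus penalty.

```python
def fit_ddt(points, labels, lo=0.0, hi=1.0, depth=0, max_depth=4, penalty=0.5):
    """Penalized ERM over 1-D dyadic partitions of [lo, hi).

    Returns (objective, leaves), where
    objective = training errors + penalty * number of leaves
    and each leaf is a tuple (lo, hi, predicted_label).
    Assumption: constant per-leaf penalty, labels are 0 or 1.
    """
    # Labels of the training points that fall in the current cell.
    inside = [y for x, y in zip(points, labels) if lo <= x < hi]
    ones = sum(inside)
    # Majority vote; ties go to label 0.
    label = 1 if ones * 2 > len(inside) else 0
    errors = len(inside) - ones if label == 1 else ones
    leaf_obj = errors + penalty
    if depth == max_depth:
        return leaf_obj, [(lo, hi, label)]
    # Dyadic split: always cut the cell at its midpoint.
    mid = (lo + hi) / 2
    l_obj, l_leaves = fit_ddt(points, labels, lo, mid, depth + 1, max_depth, penalty)
    r_obj, r_leaves = fit_ddt(points, labels, mid, hi, depth + 1, max_depth, penalty)
    # Keep the split only if it strictly lowers the penalized objective.
    if l_obj + r_obj < leaf_obj:
        return l_obj + r_obj, l_leaves + r_leaves
    return leaf_obj, [(lo, hi, label)]
```

This naive recursion rescans the data at every node, so it does not reflect the near-linear complexity stated in the abstract; it only shows the shape of the optimization.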
Learning minimum volume sets
- J. Machine Learning Res
, 2006
Cited by 19 (8 self)
Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P-measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P, and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P. Other than these samples, no other information is available regarding P, but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules.
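As a rough illustration of a histogram-based minimum volume set estimate, the sketch below assumes that µ is Lebesgue measure on [0, 1), that the histogram has equal-width bins, and that no tolerance term is added to the mass constraint. The function name `mv_set_histogram` and its parameters are hypothetical, not taken from the paper. Because all bins have the same volume, taking bins in order of decreasing count gives the smallest union of bins whose empirical mass reaches α.

```python
def mv_set_histogram(samples, alpha, num_bins):
    """Return sorted indices of equal-width bins on [0, 1) whose union is the
    smallest set of bins with empirical mass at least alpha.

    Assumption: the reference measure is Lebesgue, so every bin has the same
    volume and minimizing volume means minimizing the number of bins.
    """
    counts = [0] * num_bins
    for x in samples:
        # Clamp so that a value equal to 1.0 lands in the last bin.
        idx = min(int(x * num_bins), num_bins - 1)
        counts[idx] += 1
    # Visit the fullest bins first.
    order = sorted(range(num_bins), key=lambda i: counts[i], reverse=True)
    chosen, mass = [], 0
    needed = alpha * len(samples)
    for i in order:
        if mass >= needed:
            break
        chosen.append(i)
        mass += counts[i]
    return sorted(chosen)
```

For example, with eight samples at 0.05 and two at 0.55, ten bins, and α = 0.8, the estimate is the single bin covering [0, 0.1).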
A Neyman-Pearson approach to statistical learning
- IEEE Trans. Inform. Theory
, 2005
Cited by 19 (8 self)
The Neyman-Pearson (NP) approach to hypothesis testing is useful in situations where different types of error have different consequences or a priori probabilities are unknown. For any α > 0, the Neyman-Pearson lemma specifies the most powerful test of size α, but assumes the distributions for each hypothesis are known or (in some cases) the likelihood ratio is monotonic in an unknown parameter. This paper investigates an extension of NP theory to situations in which one has no knowledge of the underlying distributions except for a collection of independent and identically distributed training examples from each hypothesis. Building on a “fundamental lemma” of Cannon et al., we demonstrate that several concepts from statistical learning theory have counterparts in the NP context. Specifically, we consider constrained versions of empirical risk minimization (NP-ERM) and structural risk minimization (NP-SRM), and prove performance guarantees for both. General conditions are given under which NP-SRM leads to strong universal consistency. We also apply NP-SRM to (dyadic) decision trees to derive rates of convergence. Finally, we present explicit algorithms to implement NP-SRM for histograms and dyadic decision trees.
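The idea of constrained empirical risk minimization can be sketched for a very small hypothesis class: one-dimensional threshold rules that predict class 1 when x > t. The sketch below minimizes the empirical miss rate subject to a cap on the empirical false-alarm rate. The function name `np_erm_threshold` and the `tolerance` parameter (a slack added to α) are hypothetical placeholders; the abstract does not state the exact form of the constraint used in the paper.

```python
def np_erm_threshold(class0, class1, alpha, tolerance=0.0):
    """Constrained ERM over threshold rules 'predict 1 if x > t'.

    Minimizes the empirical miss rate on class1 subject to the empirical
    false-alarm rate on class0 being at most alpha + tolerance.
    Returns (threshold, miss_rate, false_alarm_rate).
    """
    candidates = sorted(set(class0) | set(class1))
    # Also allow a threshold below all data, i.e. "always predict 1".
    candidates = [candidates[0] - 1.0] + candidates
    best = None
    for t in candidates:
        false_alarm = sum(x > t for x in class0) / len(class0)
        miss = sum(x <= t for x in class1) / len(class1)
        # Only thresholds that satisfy the false-alarm constraint compete.
        if false_alarm <= alpha + tolerance:
            if best is None or miss < best[1]:
                best = (t, miss, false_alarm)
    return best
```

The largest candidate threshold always has zero empirical false alarms, so a feasible rule exists whenever alpha + tolerance is nonnegative.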
Optimal dyadic decision trees
, 2007
Cited by 12 (0 self)
We introduce a novel algorithm for building an optimal dyadic decision tree (ODT). The method combines guaranteed performance in the learning theoretical sense and optimal search from the algorithmic point of view. Furthermore it inherits the explanatory power of tree approaches, while improving performance over classical approaches such as CART/C4.5, as shown in experiments on artificial and benchmark data.
Fast search for best representations in multitree dictionaries
- In Wavelet Applications in Signal and Image Processing VIII, Proc. SPIE 4119, 2000. [7] S.G. Mallat. A Wavelet Tour of Signal Processing, Second Edition
, 2006
Cited by 10 (4 self)
We address the best basis problem—or, more generally, the best representation problem: Given a signal, a dictionary of representations, and an additive cost function, the aim is to select the representation from the dictionary which minimizes the cost for the given signal. We develop a new framework of multitree dictionaries, which includes some previously proposed dictionaries as special cases. We show how to efficiently find the best representation in a multitree dictionary using a recursive tree-pruning algorithm. We illustrate our framework through several examples, including a novel block image coder, which significantly outperforms both the standard JPEG and quadtree-based methods and is comparable to embedded coders such as JPEG2000 and SPIHT. Index Terms—Best basis, grammar, image compression, JPEG.
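For orientation, the classical single-tree case of recursive pruning with an additive cost can be sketched as follows. This is not the multitree algorithm of the paper; it covers only one binary dyadic tree over a 1-D signal, and the names `best_basis` and `sse_plus_penalty` as well as the example cost are hypothetical. Each block is either kept whole or split in half, whichever costs less, and additivity of the cost lets the two halves be solved independently.

```python
def sse_plus_penalty(block, penalty=1.0):
    """Example additive cost (an assumption): squared error of approximating
    the block by its mean, plus a fixed charge per block."""
    mean = sum(block) / len(block)
    return sum((v - mean) ** 2 for v in block) + penalty


def best_basis(signal, cost, min_len=1):
    """Return (total_cost, blocks): the cheapest partition of `signal` into
    dyadic blocks under the additive cost function `cost`."""
    whole = cost(signal)
    # Stop at the smallest allowed block or when the block cannot be halved.
    if len(signal) <= min_len or len(signal) % 2:
        return whole, [signal]
    mid = len(signal) // 2
    left_cost, left_blocks = best_basis(signal[:mid], cost, min_len)
    right_cost, right_blocks = best_basis(signal[mid:], cost, min_len)
    # Prune: keep the split only if the children are strictly cheaper.
    if left_cost + right_cost < whole:
        return left_cost + right_cost, left_blocks + right_blocks
    return whole, [signal]
```

On the piecewise-constant signal `[0, 0, 0, 0, 4, 4, 4, 4]` with the example cost, the recursion keeps exactly one split, at the jump.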
Hierarchical stochastic image grammars for classification and segmentation
- IEEE Trans. Image Processing
, 2006
Cited by 10 (3 self)
We develop a new class of hierarchical stochastic image models called spatial random trees (SRTs) which admit polynomial-complexity exact inference algorithms. Our framework of multitree dictionaries is the starting point for this construction. SRTs are stochastic hidden tree models whose leaves are associated with image data. The states at the tree nodes are random variables, and, in addition, the structure of the tree is random and is generated by a probabilistic grammar. We describe an efficient recursive algorithm for obtaining the maximum a posteriori estimate of both the tree structure and the tree states given an image. We also develop an efficient procedure for performing one iteration of the expectation-maximization algorithm and use it to estimate the model parameters from a set of training images. We address other inference problems arising in applications such as maximization of posterior marginals and hypothesis testing. Our models and algorithms are illustrated through several image classification and segmentation experiments, ranging from the segmentation of synthetic images to the classification of natural photographs and the segmentation of scanned documents. In each case, we show that our method substantially improves accuracy over a variety of existing methods. Index Terms—Dictionary, estimation, grammar, hierarchical model, image classification, probabilistic context-free grammar, segmentation, statistical image model, stochastic context-free grammar, tree model.
Minimax optimal level set estimation
- in Proc. SPIE, Wavelets XI, 31 July - 4
, 2005
Cited by 7 (2 self)
This paper describes a new methodology and associated theoretical analysis for rapid and accurate extraction of level sets of a multivariate function from noisy data. The identification of the boundaries of such sets is an important theoretical problem with applications for digital elevation maps, medical imaging, and pattern recognition. This problem is significantly different from classical segmentation because level set boundaries may not correspond to singularities or edges in the underlying function; as a result, segmentation methods which rely upon detecting boundaries would be potentially ineffective in this regime. This issue is addressed in this paper through a novel error metric sensitive to both the error in the location of the level set estimate and the deviation of the function from the critical level. Hoeffding’s inequality is used to derive a novel regularization
On the adaptive properties of decision trees
- Advances in Neural Information Processing Systems 17
, 2005
Cited by 2 (1 self)
Decision trees are surprisingly adaptive in three important respects: They automatically (1) adapt to favorable conditions near the Bayes decision boundary; (2) focus on data distributed on lower dimensional manifolds; (3) reject irrelevant features. In this paper we examine a decision tree based on dyadic splits that adapts to each of these conditions to achieve minimax optimal rates of convergence. The proposed classifier is the first known to achieve these optimal rates while being practical and implementable.
Level set estimation in medical imaging
- in IEEE SSP Workshop
, 2005
Cited by 2 (1 self)
Rapid and accurate extraction of level sets and isoconcentration surfaces from noisy medical images is a common problem arising in a variety of contexts, such as estimating regions in which uptake of a pharmaceutical has exceeded some critical value or identifying areas of brain activity in neuroimaging. In general, a level set is the set S on which a function f exceeds a critical value (e.g. S = {x: f(x) > γ}). Boundaries of level sets and isoconcentration surfaces typically constitute manifolds embedded in the high-dimensional observation space. The tree structures underlying our method are constructed by minimizing a complexity regularized data-fitting term over a family of dyadic partitions. Our method specifically aims to minimize an error metric sensitive to both deviations in the location of the level set and the rate of change of the surface intensity or activity level statistic in the vicinity of the level
Level set estimation via trees
- in Proc. Int. Conf. Acoustics, Speech & Signal Proc
, 2005
Cited by 1 (0 self)
Tree-structured partitions provide a natural framework for rapid and accurate extraction of the level sets of a multivariate function f from noisy data. In general, a level set is the set S on which f exceeds some critical value (e.g., S = {x: f(x) ≥ γ}). Boundaries of level sets typically constitute manifolds embedded in the high-dimensional observation space. The identification of these boundaries is an important theoretical problem with applications for digital elevation maps, medical imaging, and pattern recognition. Because level set identification is intrinsically simpler than field denoising or estimation, explicit level set extraction methods can achieve higher accuracy than more indirect approaches (such as extracting a level set from an estimate of the function). The trees underlying our method are constructed by minimizing a complexity regularized data-fitting term over a family of dyadic partitions. Our method automatically adapts to spatially varying regularity of both the level set and the field underlying the data. Level set extraction using multiresolution trees can be implemented in near linear time and specifically aims to minimize an error metric sensitive to both the error in the location of the level set and the associated field estimation error.

