Results 1 - 10
of
199
A framework for learning predictive structures from multiple tasks and unlabeled data
- Journal of Machine Learning Research
, 2005
"... One of the most important issues in machine learning is whether one can improve the performance of a supervised learning algorithm by including unlabeled data. Methods that use both labeled and unlabeled data are generally referred to as semi-supervised learning. Although a number of such methods ar ..."
Abstract
-
Cited by 202 (2 self)
- Add to MetaCart
One of the most important issues in machine learning is whether one can improve the performance of a supervised learning algorithm by including unlabeled data. Methods that use both labeled and unlabeled data are generally referred to as semi-supervised learning. Although a number of such methods are proposed, at the current stage, we still don’t have a complete understanding of their effectiveness. This paper investigates a closely related problem, which leads to a novel approach to semi-supervised learning. Specifically we consider learning predictive structures on hypothesis spaces (that is, what kind of classifiers have good predictive power) from multiple learning tasks. We present a general framework in which the structural learning problem can be formulated and analyzed theoretically, and relate it to learning with unlabeled data. Under this framework, algorithms for structural learning will be proposed, and computational issues will be investigated. Experiments will be given to demonstrate the effectiveness of the proposed algorithms in the semi-supervised learning setting. 1.
Consistency of spectral clustering
, 2004
"... Consistency is a key property of statistical algorithms, when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spe ..."
Abstract
-
Cited by 170 (11 self)
- Add to MetaCart
Consistency is a key property of statistical algorithms, when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two of major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized), is only consistent under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
On the Influence of the Kernel on the Consistency of Support Vector Machines
- Journal of Machine Learning Research
, 2001
"... In this article we study the generalization abilities of several classifiers of support vector machine (SVM) type using a certain class of kernels that we call universal. It is shown that the soft margin algorithms with universal kernels are consistent for a large class of classification problems ..."
Abstract
-
Cited by 104 (16 self)
- Add to MetaCart
In this article we study the generalization abilities of several classifiers of support vector machine (SVM) type using a certain class of kernels that we call universal. It is shown that the soft margin algorithms with universal kernels are consistent for a large class of classification problems including some kind of noisy tasks provided that the regularization parameter is chosen well. In particular we derive a simple su#cient condition for this parameter in the case of Gaussian RBF kernels. On the one hand our considerations are based on an investigation of an approximation property---the so-called universality---of the used kernels that ensures that all continuous functions can be approximated by certain kernel expressions. This approximation property also gives a new insight into the role of kernels in these and other algorithms. On the other hand the results are achieved by a precise study of the underlying optimization problems of the classifiers. Furthermore, we show consistency for the maximal margin classifier as well as for the soft margin SVM's in the presence of large margins. In this case it turns out that also constant regularization parameters ensure consistency for the soft margin SVM's. Finally we prove that even for simple, noise free classification problems SVM's with polynomial kernels can behave arbitrarily badly.
Ratio Limit Theorems for Empirical Processes
- Annals of Probability
, 2003
"... Concentration inequalities are used to derive some new inequalities for ratio-type suprema of empirical processes. These general inequalities are used to prove several new limit theorems for ratio-type suprema and to recover anumber of the results from [1] and [2]. As a statistical application, an o ..."
Abstract
-
Cited by 77 (3 self)
- Add to MetaCart
Concentration inequalities are used to derive some new inequalities for ratio-type suprema of empirical processes. These general inequalities are used to prove several new limit theorems for ratio-type suprema and to recover anumber of the results from [1] and [2]. As a statistical application, an oracle inequality for nonparametric regression is obtained via ratio bounds. 1.
Local Rademacher complexities
- Annals of Statistics
, 2002
"... We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that the Rademacher averages are computed from the data, on a ..."
Abstract
-
Cited by 76 (17 self)
- Add to MetaCart
We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that the Rademacher averages are computed from the data, on a subset of functions with small empirical error. We present some applications to classification and prediction with convex function classes, and with kernel classes in particular.
Convexity, Classification, and Risk Bounds
- JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 2003
"... Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0-1 loss function. The convexity makes these algorithms computationally efficien ..."
Abstract
-
Cited by 73 (11 self)
- Add to MetaCart
Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0-1 loss function. The convexity makes these algorithms computationally efficient. The use of a surrogate, however, has statistical consequences that must be balanced against the computational virtues of convexity. To study these issues, we provide a general quantitative relationship between the risk as assessed using the 0-1 loss and the risk as assessed using any nonnegative surrogate loss function. We show that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function: that it satisfy a pointwise form of Fisher consistency for classification. The relationship is based on a simple variational transformation of the loss function that is easy to compute in many applications. We also present a refined version of this result in the case of low noise. Finally, we
Compressed sensing and redundant dictionaries
, 2006
"... This article extends the concept of compressed sensing to signals that are not sparse in an orthonormal basis but rather in a redundant dictionary. It is shown that a matrix, which is a composition of a random matrix of certain type and a deterministic dictionary, has small restricted isometry const ..."
Abstract
-
Cited by 54 (8 self)
- Add to MetaCart
This article extends the concept of compressed sensing to signals that are not sparse in an orthonormal basis but rather in a redundant dictionary. It is shown that a matrix, which is a composition of a random matrix of certain type and a deterministic dictionary, has small restricted isometry constants. Thus, signals that are sparse with respect to the dictionary can be recovered via Basis Pursuit from a small number of random measurements. Further, thresholding is investigated as recovery algorithm for compressed sensing and conditions are provided that guarantee reconstruction with high probability. The different schemes are compared by numerical experiments. Key words: compressed sensing, redundant dictionary, sparse approximation, random
Posterior consistency of Dirichlet mixtures in density estimation
- Ann. Statist
, 1999
"... A Dirichlet mixture of normal densities is a useful choice for a prior distribution on densities in the problem of Bayesian density estimation. In the recent years, efficient Markov chain Monte Carlo method for the computation of the posterior distribution has been developed. The method has been app ..."
Abstract
-
Cited by 47 (17 self)
- Add to MetaCart
A Dirichlet mixture of normal densities is a useful choice for a prior distribution on densities in the problem of Bayesian density estimation. In the recent years, efficient Markov chain Monte Carlo method for the computation of the posterior distribution has been developed. The method has been applied to data arising from different fields of interest. The important issue of consistency was however left open. In this paper, we settle this issue in affirmative. 1. Introduction. Recent

