Results 1–10 of 417
A framework for learning predictive structures from multiple tasks and unlabeled data
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2005
Abstract

Cited by 434 (3 self)
One of the most important issues in machine learning is whether one can improve the performance of a supervised learning algorithm by including unlabeled data. Methods that use both labeled and unlabeled data are generally referred to as semi-supervised learning. Although a number of such methods have been proposed, at the current stage we still don’t have a complete understanding of their effectiveness. This paper investigates a closely related problem, which leads to a novel approach to semi-supervised learning. Specifically, we consider learning predictive structures on hypothesis spaces (that is, what kind of classifiers have good predictive power) from multiple learning tasks. We present a general framework in which the structural learning problem can be formulated and analyzed theoretically, and relate it to learning with unlabeled data. Under this framework, we propose algorithms for structural learning, investigate computational issues, and present experiments demonstrating the effectiveness of the proposed algorithms in the semi-supervised learning setting.
The Convex Geometry of Linear Inverse Problems
, 2010
Abstract

Cited by 178 (19 self)
In applications throughout science and engineering one is often faced with the challenge of solving an ill-posed inverse problem, where the number of available measurements is smaller than the dimension of the model to be estimated. However, in many practical situations of interest, models are constrained structurally so that they have only a few degrees of freedom relative to their ambient dimension. This paper provides a general framework to convert notions of simplicity into convex penalty functions, resulting in convex optimization solutions to linear, underdetermined inverse problems. The class of simple models considered are those formed as the sum of a few atoms from some (possibly infinite) elementary atomic set; examples include well-studied cases such as sparse vectors (e.g., signal processing, statistics) and low-rank matrices (e.g., control, statistics), as well as several others including sums of a few permutation matrices (e.g., ranked elections, multi-object tracking), low-rank tensors (e.g., computer vision, neuroscience), orthogonal matrices (e.g., machine learning), and atomic measures (e.g., system identification). The convex programming formulation is based on minimizing the norm induced by the convex hull of the atomic set; this norm is referred to as the atomic norm.
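For reference, the atomic norm mentioned in this abstract is usually written as the gauge function of the atomic set; the following is the standard formulation rather than a quote from the paper, and Φ denotes a generic measurement operator introduced here for illustration:

```latex
% Atomic norm induced by an atomic set \mathcal{A} (gauge of its convex hull)
\|x\|_{\mathcal{A}} \;=\; \inf\bigl\{\, t > 0 \;:\; x \in t \cdot \operatorname{conv}(\mathcal{A}) \,\bigr\}

% Recovery from linear measurements y = \Phi x^{\star} via atomic-norm minimization
\hat{x} \;=\; \arg\min_{x} \; \|x\|_{\mathcal{A}} \quad \text{subject to} \quad y = \Phi x
```

For example, when the atoms are the signed standard basis vectors this norm reduces to the ℓ1 norm, and when the atoms are unit-norm rank-one matrices it reduces to the nuclear norm.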
The Brunn–Minkowski inequality
 BULL. AMER. MATH. SOC. (N.S.)
, 2002
Abstract

Cited by 175 (9 self)
In 1978, Osserman [124] wrote an extensive survey on the isoperimetric inequality. The Brunn–Minkowski inequality can be proved in a page, yet quickly yields the classical isoperimetric inequality for important classes of subsets of R^n, and deserves to be better known. This guide explains the relationship between the Brunn–Minkowski inequality and other inequalities in geometry and analysis, and some applications.
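For readers unfamiliar with the inequality being surveyed, its standard statement (not quoted from the abstract) is:

```latex
% Brunn–Minkowski inequality: for nonempty compact sets A, B \subseteq \mathbb{R}^n,
% with V_n denoting n-dimensional Lebesgue measure and A + B the Minkowski sum
V_n(A + B)^{1/n} \;\ge\; V_n(A)^{1/n} + V_n(B)^{1/n}
```

Applying this with B a small ball of radius ε and letting ε → 0 is the one-line route to the isoperimetric inequality that the abstract alludes to.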
Local Rademacher complexities
 Annals of Statistics
, 2002
Abstract

Cited by 161 (21 self)
We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that the Rademacher averages are computed from the data, on a subset of functions with small empirical error. We present some applications to classification and prediction with convex function classes, and with kernel classes in particular.
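The "local and empirical version of Rademacher averages" described above is conventionally defined as follows; this is the standard textbook formulation, not text from the paper:

```latex
% Empirical Rademacher average of a class F on a sample X_1,\dots,X_n,
% with i.i.d. uniform signs \sigma_i \in \{-1,+1\}
\hat{R}_n(F) \;=\; \mathbb{E}_{\sigma}\Bigl[\, \sup_{f \in F} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(X_i) \,\Bigr]

% Localized version: the supremum is restricted to functions with small empirical
% second moment (a proxy for small empirical error), at radius r
\hat{R}_n(F; r) \;=\; \mathbb{E}_{\sigma}\Bigl[\, \sup_{f \in F,\; P_n f^2 \le r} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(X_i) \,\Bigr]
```

Localization is what yields the faster, optimal rates: the global average is dominated by poorly fitting functions, while the restricted supremum tracks only the functions a learning algorithm could plausibly return.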
Empirical margin distributions and bounding the generalization error of combined classifiers
 Ann. Statist
, 2002
Abstract

Cited by 156 (11 self)
Dedicated to A.V. Skorohod on his seventieth birthday. We prove new probabilistic upper bounds on the generalization error of complex classifiers that are combinations of simple classifiers. Such combinations could be implemented by neural networks or by voting methods of combining the classifiers, such as boosting and bagging. The bounds are in terms of the empirical distribution of the margin of the combined classifier. They are based on the methods of the theory of Gaussian and empirical processes (comparison inequalities, symmetrization method, concentration inequalities), and they improve previous results of Bartlett (1998) on bounding the generalization error of neural networks in terms of ℓ1-norms of the weights of neurons, and of Schapire, Freund, Bartlett and Lee (1998) on bounding the generalization error of boosting. We also obtain rates of convergence in Lévy distance of the empirical margin distribution to the true margin distribution, uniformly over the classes of classifiers, and prove the optimality of these rates.
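A representative bound of the type described here, written with an illustrative constant C rather than the paper's exact constants, takes the margin-distribution form: with probability at least 1 − ε, every convex combination f of base classifiers from H satisfies

```latex
P\{\, y f(x) \le 0 \,\} \;\le\; \inf_{\delta \in (0,1]} \Bigl[\, P_n\{\, y f(x) \le \delta \,\} \;+\; \frac{C}{\delta}\, \hat{R}_n(H) \;+\; \sqrt{\frac{\log(2/\varepsilon)}{2n}} \,\Bigr]
```

The left side is the true misclassification probability; the first term on the right counts training examples with margin at most δ, so classifiers achieving large margins on most of the data get correspondingly small complexity penalties.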
Lectures on the central limit theorem for empirical processes
 Probability and Banach Spaces
, 1986
Abstract

Cited by 135 (9 self)
Concentration inequalities are used to derive some new inequalities for ratio-type suprema of empirical processes. These general inequalities are used to prove several new limit theorems for ratio-type suprema and to recover a number of the results from [1] and [2]. As a statistical application, an oracle inequality for nonparametric regression is obtained via ratio bounds.
Noise stability of functions with low influences: invariance and optimality
Abstract

Cited by 126 (17 self)
In this paper we study functions with low influences on product probability spaces. The analysis of boolean functions f : {−1,1}^n → {−1,1} with low influences has become a central problem in discrete Fourier analysis. It is motivated by fundamental questions arising from the construction of probabilistically checkable proofs in theoretical computer science and from problems in the theory of social choice in economics. We prove an invariance principle for multilinear polynomials with low influences and bounded degree; it shows that under mild conditions the distribution of such polynomials is essentially invariant for all product spaces. Ours is one of the very few known nonlinear invariance principles. It has the advantage that its proof is simple and that the error bounds are explicit. We also show that the assumption of bounded degree can be eliminated if the polynomials are slightly “smoothed”; this extension is essential for our applications to “noise stability”-type problems. In particular, as applications of the invariance principle we prove two conjectures: the “Majority Is Stablest” conjecture [27] from theoretical computer science, which was the original motivation for this work, and the “It Ain’t Over Till It’s Over” conjecture [25] from social choice theory. The “Majority Is Stablest” conjecture and its generalizations proven here, in conjunction with “Unique Games” and its variants, imply a number of (optimal) inapproximability results for graph problems.
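The two central quantities in this abstract, influence and noise stability, have standard definitions in the boolean-functions literature (stated here for orientation, not quoted from the paper):

```latex
% Influence of coordinate i on f : \{-1,1\}^n \to \{-1,1\},
% where x^{\oplus i} denotes x with its i-th coordinate flipped
\mathrm{Inf}_i(f) \;=\; \Pr_{x}\bigl[\, f(x) \ne f(x^{\oplus i}) \,\bigr]

% Noise stability at correlation \rho \in [-1,1]: each coordinate y_i equals x_i
% with probability (1+\rho)/2, independently
\mathbb{S}_{\rho}(f) \;=\; \mathbb{E}_{x,\,y}\bigl[\, f(x)\, f(y) \,\bigr]
```

"Majority Is Stablest" asserts, roughly, that among balanced functions in which every coordinate has small influence, the majority function asymptotically maximizes noise stability for ρ > 0.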
Theory of classification: A survey of some recent advances
, 2005
Abstract

Cited by 91 (3 self)
The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results.