Results 1–10 of 31
Multi-Manifold Semi-Supervised Learning
Abstract

Cited by 147 (9 self)
We study semi-supervised learning when the data consists of multiple intersecting manifolds. We give a finite sample analysis to quantify the potential gain of using unlabeled data in this multi-manifold setting. We then propose a semi-supervised learning algorithm that separates different manifolds into decision sets, and performs supervised learning within each set. Our algorithm involves a novel application of Hellinger distance and size-constrained spectral clustering. Experiments demonstrate the benefit of our multi-manifold semi-supervised learning approach.
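The abstract's use of Hellinger distance is for comparing local density estimates when deciding whether nearby regions belong to the same manifold. A minimal sketch of the Hellinger distance itself, on discrete histograms (the paper's density-estimation and spectral-clustering machinery is omitted):

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions p and q.

    Both inputs are normalized to sum to 1; the result lies in [0, 1],
    with 0 for identical distributions and 1 for disjoint supports.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))
```

For example, `hellinger([1, 0], [0, 1])` is 1.0 (disjoint supports), while identical histograms give 0.0.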
Statistical Analysis of Semi-Supervised Regression
Abstract

Cited by 42 (1 self)
Semi-supervised methods use unlabeled data in addition to labeled data to construct predictors. While existing semi-supervised methods have shown some promising empirical performance, their development has been based largely on heuristics. In this paper we study semi-supervised learning from the viewpoint of minimax theory. Our first result shows that some common methods based on regularization using graph Laplacians do not lead to faster minimax rates of convergence. Thus, the estimators that use the unlabeled data do not have smaller risk than the estimators that use only labeled data. We then develop several new approaches that provably lead to improved performance. The statistical tools of minimax analysis are thus used to offer some new perspective on the problem of semi-supervised learning.
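The graph-Laplacian regularized estimators analyzed in this abstract have a standard closed form: minimize squared loss on labeled nodes plus a smoothness penalty f'Lf over the graph. A hedged NumPy sketch of one such estimator (this is the generic method being analyzed, not the paper's new improved approaches; `lam` is an illustrative smoothness weight):

```python
import numpy as np

def laplacian_ssl(W, y, labeled, lam=0.1):
    """Graph-Laplacian regularized prediction on all graph nodes.

    W       : (n, n) symmetric affinity matrix over labeled + unlabeled points
    y       : length-n label vector (entries at unlabeled nodes are ignored)
    labeled : boolean mask marking which nodes carry labels
    lam     : weight on the smoothness penalty f' L f

    Solves min_f  sum_{i labeled} (f_i - y_i)^2 + lam * f' L f  in closed form.
    """
    L = np.diag(W.sum(axis=1)) - W          # combinatorial graph Laplacian
    J = np.diag(labeled.astype(float))      # selects the labeled nodes
    return np.linalg.solve(J + lam * L, J @ y)
```

On a toy graph of two disconnected edges, labeling one node per component propagates that label exactly across the component.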
Unlabeled data: Now it helps, now it doesn’t
Abstract

Cited by 36 (2 self)
Empirical evidence shows that in favorable situations semi-supervised learning (SSL) algorithms can capitalize on the abundance of unlabeled training data to improve the performance of a learning task, in the sense that fewer labeled training data are needed to achieve a target error bound. However, in other situations unlabeled data do not seem to help. Recent attempts at theoretically characterizing SSL gains provide only partial and sometimes apparently conflicting explanations of whether, and to what extent, unlabeled data can help. In this paper, we attempt to bridge the gap between the practice and theory of semi-supervised learning. We develop a finite sample analysis that characterizes the value of unlabeled data and quantifies the performance improvement of SSL compared to supervised learning. We show that there are large classes of problems for which SSL can significantly outperform supervised learning, in finite sample regimes and sometimes also in terms of error convergence rates.
Manifold regularization and semi-supervised learning: some theoretical analysis
, 2008
Abstract

Cited by 19 (0 self)
Manifold regularization (Belkin et al., 2006) is a geometrically motivated framework for machine learning within which several semi-supervised algorithms have been constructed. Here we try to provide some theoretical understanding of this approach. Our main result is to expose the natural structure of a class of problems on which manifold regularization methods are helpful. We show that for such problems, no supervised learner can learn effectively. On the other hand, a manifold-based learner (that knows the manifold or “learns” it from unlabeled examples) can learn with relatively few labeled examples. Our analysis follows a minimax style with an emphasis on finite sample results (in terms of n: the number of labeled examples). These results allow us to properly interpret manifold regularization and related spectral and geometric algorithms in terms of their potential use in semi-supervised learning.
kNN Regression Adapts to Local Intrinsic Dimension
Abstract

Cited by 17 (5 self)
Many nonparametric regressors were recently shown to converge at rates that depend only on the intrinsic dimension of data. These regressors thus escape the curse of dimension when high-dimensional data has low intrinsic dimension (e.g. a manifold). We show that kNN regression is also adaptive to intrinsic dimension. In particular, our rates are local to a query x and depend only on the way masses of balls centered at x vary with radius. Furthermore, we show a simple way to choose k = k(x) locally at any x so as to nearly achieve the minimax rate at x in terms of the unknown intrinsic dimension in the vicinity of x. We also establish that the minimax rate does not depend on a particular choice of metric space or distribution, but rather that this minimax rate holds for any metric space and doubling measure.
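For reference, the kNN regression estimate at a query point is simply the average label of the k nearest samples; the paper's contribution is how to choose k = k(x) locally, which this minimal sketch does not attempt:

```python
import numpy as np

def knn_regress(X, y, x, k):
    """Predict at query x by averaging the y-values of its k nearest neighbors.

    X : (n, p) array of training inputs; y : (n,) responses; x : (p,) query.
    """
    d = np.linalg.norm(X - x, axis=1)   # Euclidean distances from x to all samples
    idx = np.argsort(d)[:k]             # indices of the k closest points
    return y[idx].mean()
```

With `X = [[0], [1], [2], [10]]`, `y = [0, 1, 2, 10]`, the query `x = [0.4]` with `k = 2` averages the responses at 0 and 1, giving 0.5.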
Regression on Manifolds: Estimation of the Exterior Derivative
 Submitted to The Annals of Statistics
, 2010
Abstract

Cited by 13 (3 self)
Collinearity and near-collinearity of predictors cause difficulties when doing regression. In these cases, variable selection becomes untenable because of mathematical issues concerning the existence and numerical stability of the regression coefficients, and interpretation of the coefficients is ambiguous because gradients are not defined. Using a differential geometric interpretation, in which the regression coefficients are interpreted as estimates of the exterior derivative of a function, we develop a new method to do regression in the presence of collinearities. Our regularization scheme can improve estimation error, and it can be easily modified to include lasso-type regularization. These estimators also have simple extensions to the “large p, small n” context.
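One way to see the geometric intuition: near-collinear predictors concentrate near a lower-dimensional subspace, so one can regularize by regressing only along the data's leading directions. The sketch below is plain principal-component regression, not the paper's exterior-derivative estimator, but it illustrates this idea of restricting coefficients to the data's intrinsic directions:

```python
import numpy as np

def pcr(X, y, d):
    """Principal-component regression: project centered predictors onto their
    top-d principal directions, fit ordinary least squares in that subspace,
    and map the coefficients back to the ambient space."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:d].T                                  # coordinates in top-d subspace
    beta, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
    return Vt[:d].T @ beta                             # ambient-space coefficients
```

On perfectly collinear data (x2 = x1, y = x1), OLS coefficients are not unique, but PCR with d = 1 returns the stable solution [0.5, 0.5], splitting the effect evenly across the duplicated predictors.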
Escaping the curse of dimensionality with a tree-based regressor
 Conference on Computational Learning Theory
, 2009
Abstract

Cited by 7 (5 self)
We present the first tree-based regressor whose convergence rate depends only on the intrinsic dimension of the data, namely its Assouad dimension. The regressor uses the RPtree partitioning procedure, a simple randomized variant of k-d trees.
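The random-projection split at the heart of the RPtree procedure can be sketched as follows. Note this simplified version splits exactly at the median projection, whereas the actual RPtree uses a randomly perturbed split point:

```python
import numpy as np

def rp_tree(X, idx=None, min_size=4, rng=None):
    """Recursively partition sample indices with random-projection splits:
    project the cell's points onto a random unit direction and split at the
    median projection. Leaves are index arrays of at most min_size points."""
    rng = np.random.default_rng(0) if rng is None else rng
    idx = np.arange(len(X)) if idx is None else idx
    if len(idx) <= min_size:
        return idx                          # leaf: one cell of the partition
    u = rng.normal(size=X.shape[1])
    u /= np.linalg.norm(u)                  # random unit direction
    proj = X[idx] @ u
    m = np.median(proj)
    left, right = idx[proj <= m], idx[proj > m]
    return [rp_tree(X, left, min_size, rng), rp_tree(X, right, min_size, rng)]
```

A tree regressor then averages the responses within the leaf containing the query point; the random directions are what make the partition adapt to intrinsic rather than ambient dimension.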
Nonlinear Manifold Representations for Functional Data
Abstract

Cited by 6 (0 self)
For functional data lying on an unknown nonlinear low-dimensional space, we study manifold learning and introduce the notions of manifold mean, manifold modes of functional variation and of functional manifold components. These constitute nonlinear representations of functional data that complement classical linear representations such as eigenfunctions and functional principal components. Our manifold learning procedures borrow ideas from existing nonlinear dimension reduction methods, which we modify to address functional data settings. In simulations and applications, we study examples of functional data which lie on a manifold and validate the superior behavior of manifold mean and functional manifold components over traditional cross-sectional mean and functional principal components. We also include consistency proofs for our estimators under certain assumptions. Key words and phrases: functional data analysis, modes of functional variation, functional manifold components, dimension reduction, smoothing.
Learning gradients on manifolds
 Bernoulli
, 2010
Abstract

Cited by 6 (2 self)
A common belief in high-dimensional data analysis is that data are concentrated on a low-dimensional manifold. This motivates simultaneous dimension reduction and regression on manifolds. We provide an algorithm for learning gradients on manifolds for dimension reduction for high-dimensional data with few observations. We obtain generalization error bounds for the gradient estimates and show that the convergence rate depends on the intrinsic dimension of the manifold and not on the dimension of the ambient space. We illustrate the efficacy of this approach empirically on simulated and real data and compare the method to other dimension reduction procedures.
Local linear regression on manifolds and its geometric interpretation
 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 2013
Abstract

Cited by 5 (3 self)
High-dimensional data analysis has been an active area, and the main focuses have been variable selection and dimension reduction. In practice, it often occurs that the variables are located on an unknown, lower-dimensional nonlinear manifold. Under this manifold assumption, one purpose of this paper is regression and gradient estimation on the manifold, and another is developing a new tool for manifold learning. For the first aim, we suggest directly reducing the dimensionality to the intrinsic dimension d of the manifold, and performing the popular local linear regression (LLR) on a tangent plane estimate. An immediate consequence is a dramatic reduction in the computation time when the ambient space dimension p ≫ d. We provide rigorous theoretical justification of the convergence of the proposed regression and gradient estimators by carefully analyzing the curvature, boundary, and nonuniform sampling effects. A bandwidth selector that can handle heteroscedastic errors is proposed. For the second aim, we carefully analyze the behavior of our regression estimator both in the interior and near the boundary of the manifold, and make explicit its relationship with manifold learning, in particular estimating the Laplace-Beltrami operator of the manifold. In this context, we also make clear that it is important to use a smaller bandwidth in the tangent plane estimation than in the LLR. Simulation studies and the Isomap face data example are used to illustrate the computational speed and estimation accuracy of our methods.
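The abstract's two-step idea, estimate a tangent basis at the query by local PCA and then run weighted linear regression in those d tangent coordinates, can be sketched as below. The Gaussian kernel and fixed bandwidth `h` are illustrative choices, not the paper's bandwidth selector, and the same bandwidth is reused for the tangent estimate, which the paper advises against:

```python
import numpy as np

def llr_manifold(X, y, x, d, h):
    """Local linear regression in estimated tangent-plane coordinates.

    1. Local weighted PCA at x -> top-d eigenvectors as the tangent basis.
    2. Kernel-weighted least squares of y on the d tangent coordinates.
    Returns the fitted intercept, i.e. the regression estimate at x.
    """
    diff = X - x
    w = np.exp(-np.sum(diff**2, axis=1) / (2 * h**2))   # Gaussian kernel weights
    C = (diff * w[:, None]).T @ diff                    # local weighted covariance
    _, V = np.linalg.eigh(C)
    T = V[:, -d:]                                       # top-d eigvecs: tangent basis
    Z = np.hstack([np.ones((len(X), 1)), diff @ T])     # intercept + tangent coords
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(sw[:, None] * Z, sw * y, rcond=None)
    return beta[0]                                      # estimate of m(x)
```

For data on the line x1 = x2 in the plane with a linear response, the fit in the single tangent coordinate is exact, so the estimate at a sample point recovers the true value.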