Semi-Supervised Learning Literature Survey
, 2006
Cited by 782 (8 self)
We review the literature on semi-supervised learning, which is an area in machine learning and, more generally, artificial intelligence. There has been a whole
spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e., semi-supervised learning. This document is a chapter excerpt from the author’s
doctoral thesis (Zhu, 2005). However, the author plans to update the online version frequently to incorporate the latest developments in the field. Please obtain the latest
version at http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
Representation learning: A review and new perspectives.
 of IEEE Conf. Comp. Vision Pattern Recog. (CVPR),
, 2005
Cited by 173 (4 self)
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
Random projections of smooth manifolds
 Foundations of Computational Mathematics
, 2006
Cited by 144 (26 self)
We propose a new approach for nonadaptive dimensionality reduction of manifold-modeled data, demonstrating that a small number of random linear projections can preserve key information about a manifold-modeled signal. We center our analysis on the effect of a random linear projection operator Φ: R^N → R^M, M < N, on a smooth well-conditioned K-dimensional submanifold ℳ ⊂ R^N. As our main theoretical contribution, we establish a sufficient number M of random projections to guarantee that, with high probability, all pairwise Euclidean and geodesic distances between points on ℳ are well-preserved under the mapping Φ. Our results bear strong resemblance to the emerging theory of Compressed Sensing (CS), in which sparse signals can be recovered from small numbers of random linear measurements. As in CS, the random measurements we propose can be used to recover the original data in R^N. Moreover, like the fundamental bound in CS, our requisite M is linear in the “information level” K and logarithmic in the ambient dimension N; we also identify a logarithmic dependence on the volume and conditioning of the manifold. In addition to recovering faithful approximations to manifold-modeled signals, however, the random projections we propose can also be used to discern key properties about the manifold. We discuss connections and contrasts with existing techniques in manifold learning, a setting where dimensionality-reducing mappings are typically nonlinear and constructed adaptively from a set of sampled training data.
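The distance-preservation claim in this abstract can be illustrated numerically. The sketch below is not the authors' construction or proof: it assumes a Gaussian random matrix scaled by 1/sqrt(M) as the projection operator, uses a hypothetical one-parameter manifold of shifted Gaussian pulses in R^N, and checks Euclidean distances only (geodesic distances are not computed). All names and sizes are illustrative.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all distinct pairs of rows of X."""
    i, j = np.triu_indices(X.shape[0], k=1)
    return np.linalg.norm(X[i] - X[j], axis=1)

def random_projection(X, M, rng):
    """Project rows of X from R^N to R^M with a scaled Gaussian matrix (an assumed choice of Phi)."""
    N = X.shape[1]
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)
    return X @ Phi.T

rng = np.random.default_rng(0)
N, M = 1000, 200                      # ambient and projected dimensions (illustrative)
n = np.arange(N)
shifts = np.linspace(100, 900, 60)    # manifold parameter: pulse position

# Each row is one point on a K = 1 dimensional manifold: a shifted Gaussian pulse.
X = np.exp(-(n[None, :] - shifts[:, None]) ** 2 / (2 * 20.0 ** 2))

# Ratio of projected to original distance for every pair of sampled points.
ratios = pairwise_distances(random_projection(X, M, rng)) / pairwise_distances(X)
print("distance ratio range:", ratios.min(), ratios.max())
```

Ratios close to 1 for every pair indicate that the sampled points keep their relative geometry after projection; shrinking M widens the range.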
Dimensionality reduction by learning an invariant mapping
 In Proc. Computer Vision and Pattern Recognition Conference (CVPR’06)
, 2006
Cited by 89 (12 self)
Dimensionality reduction involves mapping a set of high-dimensional input points onto a low-dimensional manifold so that “similar” points in input space are mapped to nearby points on the manifold. We present a method called Dimensionality Reduction by Learning an Invariant Mapping (DrLIM) for learning a globally coherent nonlinear function that maps the data evenly to the output manifold. The learning relies solely on neighborhood relationships and does not require any distance measure in the input space. The method can learn mappings that are invariant to certain transformations of the inputs, as is demonstrated with a number of experiments. Comparisons are made to other techniques, in particular LLE.
Nonlinear dimensionality reduction by semidefinite programming and kernel matrix factorization
 in Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics
, 2005
Cited by 66 (5 self)
We describe an algorithm for nonlinear dimensionality reduction based on semidefinite programming and kernel matrix factorization. The algorithm learns a kernel matrix for high-dimensional data that lies on or near a low-dimensional manifold. In earlier work, the kernel matrix was learned by maximizing the variance in feature space while preserving the distances and angles between nearest neighbors. In this paper, adapting recent ideas from semi-supervised learning on graphs, we show that the full kernel matrix can be very well approximated by a product of smaller matrices. Representing the kernel matrix in this way, we can reformulate the semidefinite program in terms of a much smaller submatrix of inner products between randomly chosen landmarks. The new framework leads to order-of-magnitude reductions in computation time and makes it possible to study much larger problems in manifold learning.
Tensor subspace analysis
 In Advances in Neural Information Processing Systems 18 (NIPS)
, 2005
Cited by 65 (4 self)
Previous work has demonstrated that the image variations of many objects (human faces in particular) under variable lighting can be effectively modeled by low-dimensional linear spaces. The typical linear subspace learning algorithms include Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Locality Preserving Projection (LPP). All of these methods consider an n1 × n2 image as a high-dimensional vector in R^(n1×n2), while an image represented in the plane is intrinsically a matrix. In this paper, we propose a new algorithm called Tensor Subspace Analysis (TSA). TSA considers an image as a second-order tensor in R^n1 ⊗ R^n2, where R^n1 and R^n2 are two vector spaces. The relationship between the column vectors of the image matrix and that between the row vectors can be naturally characterized by TSA. TSA detects the intrinsic local geometrical structure of the tensor space by learning a lower-dimensional tensor subspace. We compare our proposed approach with PCA, LDA, and LPP on two standard databases. Experimental results demonstrate that TSA achieves a better recognition rate while being much more efficient.
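The contrast between treating an image as a vector and as a second-order tensor can be made concrete with a small sketch. It assumes the tensor-subspace projection takes the bilinear form Y = Uᵀ X V, which the abstract does not state explicitly, and it uses random orthonormal U and V as placeholders rather than matrices learned by TSA. The sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 32, 24          # image height and width (illustrative)
l1, l2 = 5, 4            # reduced row and column dimensions (illustrative)

X = rng.standard_normal((n1, n2))                     # stand-in for one image
U, _ = np.linalg.qr(rng.standard_normal((n1, l1)))    # n1 x l1, orthonormal columns (placeholder, not learned)
V, _ = np.linalg.qr(rng.standard_normal((n2, l2)))    # n2 x l2, orthonormal columns (placeholder, not learned)

# Tensor (matrix) view: project columns with U and rows with V.
Y = U.T @ X @ V                                       # l1 x l2

# Vector view: the same map written as one large matrix acting on the flattened image.
W = np.kron(U, V)                                     # (n1*n2) x (l1*l2)
y_vec = W.T @ X.ravel()

params_bilinear = n1 * l1 + n2 * l2                   # entries in U and V
params_vector = (n1 * n2) * (l1 * l2)                 # entries in an unconstrained projection of the same shape
print(np.allclose(Y.ravel(), y_vec), params_bilinear, params_vector)
```

The two views produce identical outputs, but the bilinear form is described by far fewer numbers, which is one plausible reading of the efficiency claim in the abstract.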
Euclidean embedding of co-occurrence data
 Advances in Neural Information Processing Systems 17
, 2005
Cited by 65 (1 self)
Embedding algorithms search for low-dimensional structure in complex data, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper describes a method for embedding objects of different types, such as images and text, into a single common Euclidean space based on their co-occurrence statistics. The joint distributions are modeled as exponentials of Euclidean distances in the low-dimensional embedding space, which links the problem to convex optimization over positive semidefinite matrices. The local structure of our embedding corresponds to the statistical correlations via random walks in the Euclidean space. We quantify the performance of our method on two text datasets, and show that it consistently and significantly outperforms standard methods of statistical correspondence modeling, such as multidimensional scaling and correspondence analysis.
1 Introduction
Embeddings of objects in a low-dimensional space are an important tool in unsupervised learning and in preprocessing data for supervised learning algorithms. They are especially valuable for exploratory data analysis and visualization by providing easily interpretable representations of the relationships among objects. Most current embedding techniques build low-dimensional mappings that preserve certain relationships among objects and differ in the relationships they choose to preserve, which range from pairwise distances in multidimensional scaling (MDS) [4] to neighborhood structure in locally linear embedding [12]. All these methods operate on objects of a single type endowed with a measure of similarity or dissimilarity. However, real-world data often involve objects of several very different types without a natural measure of similarity. For example, typical web pages or scientific papers contain
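The statement in this abstract that joint distributions are modeled as exponentials of Euclidean distances can be sketched as follows. This is a minimal illustration assuming the simplest form, p(x, y) proportional to exp(-‖φ(x) - ψ(y)‖²) with a single global normalizer; the paper may use variants with additional terms, and the embeddings below are made-up values rather than fitted ones.

```python
import numpy as np

def joint_from_embeddings(phi, psi):
    """Joint distribution over (x, y) pairs from two sets of embeddings.

    Assumed model: p(x, y) = exp(-||phi[x] - psi[y]||^2) / Z.
    """
    # Squared Euclidean distance between every x-embedding and every y-embedding.
    d2 = ((phi[:, None, :] - psi[None, :, :]) ** 2).sum(axis=2)
    unnormalized = np.exp(-d2)
    return unnormalized / unnormalized.sum()

# Hypothetical 2-D embeddings: two objects of one type, three of another.
phi = np.array([[0.0, 0.0], [2.0, 0.0]])
psi = np.array([[0.1, 0.0], [2.0, 0.2], [5.0, 5.0]])

P = joint_from_embeddings(phi, psi)
print(P.round(4))
```

Pairs that sit close together in the shared space receive most of the probability mass, while the distant third object of the second type co-occurs with almost nothing.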
The fastest mixing Markov process on a graph and a connection to a maximum variance unfolding problem
 SIAM REVIEW
, 2006
Cited by 64 (4 self)
We consider a Markov process on a connected graph, with edges labeled with transition rates between the adjacent vertices. The distribution of the Markov process converges to the uniform distribution at a rate determined by the second-smallest eigenvalue λ2 of the Laplacian of the weighted graph. In this paper we consider the problem of assigning transition rates to the edges so as to maximize λ2 subject to a linear constraint on the rates. This is the problem of finding the fastest mixing Markov process (FMMP) on the graph. We show that the FMMP problem is a convex optimization problem, which can in turn be expressed as a semidefinite program, and therefore effectively solved numerically. We formulate a dual of the FMMP problem and show that it has a natural geometric interpretation as a maximum variance unfolding (MVU) problem, i.e., the problem of choosing a set of points to be as far apart as possible, measured by their variance, while respecting local distance constraints. This MVU problem is closely related to a problem recently proposed by Weinberger and Saul as a method for “unfolding” high-dimensional data that lies on a low-dimensional manifold. The duality between the FMMP and MVU problems sheds light on both problems, and allows us to characterize and, in some cases, find optimal solutions.
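The quantity being optimized in this abstract, λ2 of the weighted graph Laplacian, is easy to compute for a small example. The sketch below is not the semidefinite program described by the authors: it evaluates λ2 on a three-vertex path and replaces the SDP with a plain grid search, under the assumed constraint that the two edge rates sum to 2. The graph, the constraint, and the function names are all illustrative.

```python
import numpy as np

def laplacian(n, edges, rates):
    """Weighted graph Laplacian for n vertices, given edges and their transition rates."""
    L = np.zeros((n, n))
    for (i, j), w in zip(edges, rates):
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L

def lambda2(n, edges, rates):
    """Second-smallest Laplacian eigenvalue (eigvalsh returns eigenvalues in ascending order)."""
    return np.linalg.eigvalsh(laplacian(n, edges, rates))[1]

edges = [(0, 1), (1, 2)]                 # path graph on 3 vertices
uniform = lambda2(3, edges, [1.0, 1.0])
skewed = lambda2(3, edges, [1.5, 0.5])

# Grid search over allocations with w1 + w2 = 2 (assumed constraint; stands in for the SDP).
grid = np.linspace(0.05, 1.95, 39)
best_w1 = max(grid, key=lambda w1: lambda2(3, edges, [w1, 2.0 - w1]))
print(uniform, skewed, best_w1)
```

On this symmetric graph the even split gives the largest λ2, so any imbalance in the rates slows convergence to the uniform distribution.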
Gaussian Process Latent Variable Models for Human Pose Estimation
Cited by 52 (9 self)
We describe a generative approach to recover 3D human pose from image silhouettes. Our method is based on learning a shared low-dimensional latent representation capable of generating both human pose and image observations through the GPLVM [1]. We learn a dynamical model over the latent space which allows us to disambiguate between ambiguous silhouettes by temporal consistency. The model has only two free parameters and requires no manual initialization.