Results 1  10
of
658
SemiSupervised Learning Literature Survey
, 2006
"... We review the literature on semisupervised learning, which is an area in machine learning and more generally, artificial intelligence. There has been a whole
spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e. semisupervised learning. This document is a chapter ..."
Abstract

Cited by 757 (8 self)
 Add to MetaCart
We review the literature on semisupervised learning, which is an area in machine learning and more generally, artificial intelligence. There has been a whole
spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e. semisupervised learning. This document is a chapter excerpt from the author’s
doctoral thesis (Zhu, 2005). However the author plans to update the online version frequently to incorporate the latest development in the field. Please obtain the latest
version at http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
A framework for learning predictive structures from multiple tasks and unlabeled data
 Journal of Machine Learning Research
, 2005
"... One of the most important issues in machine learning is whether one can improve the performance of a supervised learning algorithm by including unlabeled data. Methods that use both labeled and unlabeled data are generally referred to as semisupervised learning. Although a number of such methods ar ..."
Abstract

Cited by 440 (3 self)
 Add to MetaCart
One of the most important issues in machine learning is whether one can improve the performance of a supervised learning algorithm by including unlabeled data. Methods that use both labeled and unlabeled data are generally referred to as semisupervised learning. Although a number of such methods are proposed, at the current stage, we still don’t have a complete understanding of their effectiveness. This paper investigates a closely related problem, which leads to a novel approach to semisupervised learning. Specifically we consider learning predictive structures on hypothesis spaces (that is, what kind of classifiers have good predictive power) from multiple learning tasks. We present a general framework in which the structural learning problem can be formulated and analyzed theoretically, and relate it to learning with unlabeled data. Under this framework, algorithms for structural learning will be proposed, and computational issues will be investigated. Experiments will be given to demonstrate the effectiveness of the proposed algorithms in the semisupervised learning setting. 1.
Learning Deep Architectures for AI
"... Theoretical results suggest that in order to learn the kind of complicated functions that can represent highlevel abstractions (e.g. in vision, language, and other AIlevel tasks), one may need deep architectures. Deep architectures are composed of multiple levels of nonlinear operations, such as i ..."
Abstract

Cited by 182 (32 self)
 Add to MetaCart
Theoretical results suggest that in order to learn the kind of complicated functions that can represent highlevel abstractions (e.g. in vision, language, and other AIlevel tasks), one may need deep architectures. Deep architectures are composed of multiple levels of nonlinear operations, such as in neural nets with many hidden layers or in complicated propositional formulae reusing many subformulae. Searching the parameter space of deep architectures is a difficult task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the stateoftheart in certain areas. This paper discusses the motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of singlelayer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.
SemiSupervised Classification by Low Density Separation
, 2005
"... We believe that the cluster assumption is key to successful semisupervised learning. Based on this, we propose three semisupervised algorithms: 1. deriving graphbased distances that emphazise low density regions between clusters, followed by training a standard SVM; 2. optimizing the Transd ..."
Abstract

Cited by 175 (9 self)
 Add to MetaCart
We believe that the cluster assumption is key to successful semisupervised learning. Based on this, we propose three semisupervised algorithms: 1. deriving graphbased distances that emphazise low density regions between clusters, followed by training a standard SVM; 2. optimizing the Transductive SVM objective function, which places the decision boundary in low density regions, by gradient descent; 3. combining the first two to make maximum use of the cluster assumption. We compare with state of the art algorithms and demonstrate superior accuracy for the latter two methods.
Fast random walk with restart and its applications
 In ICDM ’06: Proceedings of the 6th IEEE International Conference on Data Mining
, 2006
"... How closely related are two nodes in a graph? How to compute this score quickly, on huge, diskresident, real graphs? Random walk with restart (RWR) provides a good relevance score between two nodes in a weighted graph, and it has been successfully used in numerous settings, like automatic captionin ..."
Abstract

Cited by 174 (19 self)
 Add to MetaCart
(Show Context)
How closely related are two nodes in a graph? How to compute this score quickly, on huge, diskresident, real graphs? Random walk with restart (RWR) provides a good relevance score between two nodes in a weighted graph, and it has been successfully used in numerous settings, like automatic captioning of images, generalizations to the “connection subgraphs”, personalized PageRank, and many more. However, the straightforward implementations of RWR do not scale for large graphs, requiring either quadratic space and cubic precomputation time, or slow response time on queries. We propose fast solutions to this problem. The heart of our approach is to exploit two important properties shared by many real graphs: (a) linear correlations and (b) blockwise, communitylike structure. We exploit the linearity by using lowrank matrix approximation, and the community structure by graph partitioning, followed by the ShermanMorrison lemma for matrix inversion. Experimental results on the Corel image and the DBLP dabasets demonstrate that our proposed methods achieve significant savings over the straightforward implementations: they can save several orders of magnitude in precomputation and storage cost, and they achieve up to 150x speed up with 90%+ quality preservation. 1
MultiManifold SemiSupervised Learning
"... We study semisupervised learning when the data consists of multiple intersecting manifolds. We give a finite sample analysis to quantify the potential gain of using unlabeled data in this multimanifold setting. We then propose a semisupervised learning algorithm that separates different manifolds ..."
Abstract

Cited by 143 (8 self)
 Add to MetaCart
We study semisupervised learning when the data consists of multiple intersecting manifolds. We give a finite sample analysis to quantify the potential gain of using unlabeled data in this multimanifold setting. We then propose a semisupervised learning algorithm that separates different manifolds into decision sets, and performs supervised learning within each set. Our algorithm involves a novel application of Hellinger distance and sizeconstrained spectral clustering. Experiments demonstrate the benefit of our multimanifold semisupervised learning approach. 1
Ranking on Data Manifolds
 Advances in Neural Information Processing Systems 16
, 2004
"... The Google search engine has enjoyed huge success with its web page ranking algorithm, which exploits global, rather than local, hyperlink structure of the web using random walks. Here we propose a simple universal ranking algorithm for data lying in the Euclidean space, such as text or image data. ..."
Abstract

Cited by 142 (1 self)
 Add to MetaCart
(Show Context)
The Google search engine has enjoyed huge success with its web page ranking algorithm, which exploits global, rather than local, hyperlink structure of the web using random walks. Here we propose a simple universal ranking algorithm for data lying in the Euclidean space, such as text or image data. The core idea of our method is to rank the data with respect to the intrinsic manifold structure collectively revealed by a great amount of data. Encouraging experimental results from synthetic, image, and text data illustrate the validity of our method.
Learning from labeled and unlabeled data on a directed graph
 in: Proceedings of the 22nd International Conference on Machine Learning (ICML
"... We propose a general framework for learning from labeled and unlabeled data on a directed graph in which the structure of the graph including the directionality of the edges is considered. The time complexity of the algorithm derived from this framework is nearly linear due to recently developed num ..."
Abstract

Cited by 133 (9 self)
 Add to MetaCart
(Show Context)
We propose a general framework for learning from labeled and unlabeled data on a directed graph in which the structure of the graph including the directionality of the edges is considered. The time complexity of the algorithm derived from this framework is nearly linear due to recently developed numerical techniques. In the absence of labeled instances, this framework can be utilized as a spectral clustering method for directed graphs, which generalizes the spectral clustering approach for undirected graphs. We have applied our framework to realworld web classification problems and obtained encouraging results. 1.
Scaling learning algorithms towards AI
 TO APPEAR IN “LARGESCALE KERNEL MACHINES”
, 2007
"... One longterm goal of machine learning research is to produce methods that are applicable to highly complex tasks, such as perception (vision, audition), reasoning, intelligent control, and other artificially intelligent behaviors. We argue that in order to progress toward this goal, the Machine Lea ..."
Abstract

Cited by 120 (23 self)
 Add to MetaCart
One longterm goal of machine learning research is to produce methods that are applicable to highly complex tasks, such as perception (vision, audition), reasoning, intelligent control, and other artificially intelligent behaviors. We argue that in order to progress toward this goal, the Machine Learning community must endeavor to discover algorithms that can learn highly complex functions, with minimal need for prior knowledge, and with minimal human intervention. We present mathematical and empirical evidence suggesting that many popular approaches to nonparametric learning, particularly kernel methods, are fundamentally limited in their ability to learn complex highdimensional functions. Our analysis focuses on two problems. First, kernel machines are shallow architectures, in which one large layer of simple template matchers is followed by a single layer of trainable coefficients. We argue that shallow architectures can be very inefficient in terms of required number of computational elements and examples. Second, we analyze a limitation of kernel machines with a local kernel, linked to the curse of dimensionality, that applies to supervised, unsupervised (manifold learning) and semisupervised kernel machines. Using empirical results on invariant image recognition tasks, kernel methods are compared with deep architectures, in and higherlevel representations. We argue that deep architectures have the potential to generalize in nonlocal ways, i.e., beyond immediate neighbors, and that this is crucial in order to make progress on the kind of complex tasks required for artificial intelligence.
Diffusion maps, spectral clustering and eigenfunctions of fokkerplanck operators
 in Advances in Neural Information Processing Systems 18
, 2005
"... This paper presents a diffusion based probabilistic interpretation of spectral clustering and dimensionality reduction algorithms that use the eigenvectors of the normalized graph Laplacian. Given the pairwise adjacency matrix of all points, we define a diffusion distance between any two data points ..."
Abstract

Cited by 114 (15 self)
 Add to MetaCart
(Show Context)
This paper presents a diffusion based probabilistic interpretation of spectral clustering and dimensionality reduction algorithms that use the eigenvectors of the normalized graph Laplacian. Given the pairwise adjacency matrix of all points, we define a diffusion distance between any two data points and show that the low dimensional representation of the data by the first few eigenvectors of the corresponding Markov matrix is optimal under a certain mean squared error criterion. Furthermore, assuming that data points are random samples from a density p(x) = e −U(x) we identify these eigenvectors as discrete approximations of eigenfunctions of a FokkerPlanck operator in a potential 2U(x) with reflecting boundary conditions. Finally, applying known results regarding the eigenvalues and eigenfunctions of the continuous FokkerPlanck operator, we provide a mathematical justification for the success of spectral clustering and dimensional reduction algorithms based on these first few eigenvectors. This analysis elucidates, in terms of the characteristics of diffusion processes, many empirical findings regarding spectral clustering algorithms.