Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps
 Proceedings of the National Academy of Sciences
, 2005
Cited by 259 (48 self)
... of contexts of data analysis, such as spectral graph theory, manifold learning, nonlinear principal components and kernel methods. We augment these approaches by showing that the diffusion distance is a key intrinsic geometric quantity linking spectral theory of the Markov process, Laplace operators, or kernels, to the corresponding geometry and density of the data. This opens the door to the application of methods from numerical analysis and signal processing to the analysis of functions and transformations of the data.
Abstract. We provide a framework for structural multiscale geometric organization of graphs and subsets of R^n. We use diffusion semigroups to generate multiscale geometries in order to organize and represent complex structures. We show that appropriately selected eigenfunctions or scaling functions of Markov matrices, which describe local transitions, lead to macroscopic descriptions at different scales. The process of iterating or diffusing the Markov matrix is seen as a generalization of some aspects of the Newtonian paradigm, in which local infinitesimal transitions of a system lead to global macroscopic descriptions by integration. In Part I below, we provide a unified view of ideas from data analysis, machine learning and numerical analysis. In Part II [1], we augment this approach by introducing fast order-N algorithms for homogenization of heterogeneous structures as well as for data representation.
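The link between the diffusion distance and the spectral theory of the Markov process can be checked numerically. The sketch below is only illustrative: the Gaussian kernel, the bandwidth `eps`, the function name `diffusion_map`, and the random test data are assumptions that do not come from the abstract. It builds a row-stochastic matrix P from the kernel, embeds each point with eigenvalue-weighted eigenvectors, and compares the Euclidean distance in the embedding with the diffusion distance computed directly from P^t.

```python
import numpy as np

def diffusion_map(X, eps, t=1):
    """Return (P, pi, coords) for a Gaussian-kernel random walk on the rows of X.

    The kernel and the bandwidth eps are illustrative assumptions.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / eps)           # symmetric affinity matrix
    d = K.sum(axis=1)                     # node degrees
    P = K / d[:, None]                    # Markov (row-stochastic) matrix
    pi = d / d.sum()                      # stationary distribution of P

    # S = D^{-1/2} K D^{-1/2} is symmetric and has the same eigenvalues as P.
    S = K / np.sqrt(np.outer(d, d))
    lam, V = np.linalg.eigh(S)
    order = np.argsort(lam)[::-1]         # sort eigenvalues in decreasing order
    lam, V = lam[order], V[:, order]

    # Right eigenvectors of P, normalized so that sum_i pi_i * psi_l(i)^2 = 1.
    psi = V / np.sqrt(pi)[:, None]

    # Drop the constant eigenvector (eigenvalue 1) and weight by lam^t.
    coords = psi[:, 1:] * lam[1:] ** t
    return P, pi, coords

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
t = 2
P, pi, coords = diffusion_map(X, eps=2.0, t=t)

# Diffusion distance between points 0 and 1, computed from the t-step kernel.
Pt = np.linalg.matrix_power(P, t)
diffusion_dist_sq = np.sum((Pt[0] - Pt[1]) ** 2 / pi)

# Euclidean distance between the same points in the embedding.
embedding_dist_sq = np.sum((coords[0] - coords[1]) ** 2)
```

With all non-trivial eigenvectors retained, the two squared distances agree up to floating-point error; truncating `coords` to its first few columns gives a low-dimensional approximation.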
On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2005
Cited by 187 (11 self)
A problem for many kernel-based methods is that the amount of computation required to find the solution scales as O(n³), where n is the number of training examples. We develop and analyze an algorithm to compute an easily-interpretable low-rank approximation to an n × n Gram matrix G such that computations of interest may be performed more rapidly. The approximation is of the form $\tilde{G}_k = C W_k^+ C^T$, where C is a matrix consisting of a small number c of columns of G and $W_k$ is the best rank-k approximation to W, the matrix formed by the intersection between those c columns of G and the corresponding c rows of G. An important aspect of the algorithm is the probability distribution used to randomly sample the columns; we will use a judiciously-chosen and data-dependent nonuniform probability distribution. Let $\|\cdot\|_2$ and $\|\cdot\|_F$ denote the spectral norm and the Frobenius norm, respectively, of a matrix, and let $G_k$ be the best rank-k approximation to G. We prove that by choosing $O(k/\epsilon^4)$ columns, $\|G - C W_k^+ C^T\|_\xi \le \|G - G_k\|_\xi + \epsilon \sum_{i=1}^n G_{ii}^2$, both in expectation and with high probability, for both $\xi = 2, F$, and for all $k : 0 \le k \le \mathrm{rank}(W)$. This approximation can be computed using O(n) additional space and time, after making two passes over the data from external storage. The relationships between this algorithm, other related matrix decompositions, and the Nyström method from integral equation theory are discussed.
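The column-sampling approximation described above can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' implementation: the function name `nystrom_approx`, the sampling probabilities proportional to the squared diagonal of G, the rescaling of sampled columns, and the eigenvalue cutoff are all assumptions made here to produce a concrete, runnable example.

```python
import numpy as np

def nystrom_approx(G, c, k, rng):
    """Approximate a PSD Gram matrix G by C @ pinv(W_k) @ C.T from c sampled columns.

    Sampling probabilities p_i proportional to G_ii^2 are an assumed choice of
    data-dependent nonuniform distribution.
    """
    n = G.shape[0]
    diag_sq = np.diag(G) ** 2
    p = diag_sq / diag_sq.sum()
    idx = rng.choice(n, size=c, replace=True, p=p)   # sample with replacement

    # Rescale the sampled columns (and rows) by 1 / sqrt(c * p_i).
    scale = 1.0 / np.sqrt(c * p[idx])
    C = G[:, idx] * scale
    W = G[np.ix_(idx, idx)] * np.outer(scale, scale)

    # Best rank-k approximation of W via its top-k eigenpairs.
    lam, U = np.linalg.eigh(W)
    top = np.argsort(lam)[::-1][:k]
    lam_k, U_k = lam[top], U[:, top]

    # Pseudo-inverse of W_k, discarding numerically zero eigenvalues.
    keep = lam_k > 1e-12 * lam_k.max()
    Wk_pinv = (U_k[:, keep] / lam_k[keep]) @ U_k[:, keep].T
    return C @ Wk_pinv @ C.T

rng = np.random.default_rng(0)
B = rng.normal(size=(50, 3))
G = B @ B.T                                          # exactly rank-3 Gram matrix
G_approx = nystrom_approx(G, c=20, k=3, rng=rng)
rel_err = np.linalg.norm(G - G_approx) / np.linalg.norm(G)
```

Because the test matrix has exact rank 3 and k = 3, the approximation recovers G up to rounding error; for a full-rank G the error is instead governed by a bound of the kind stated in the abstract.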
Learning a kernel matrix for nonlinear dimensionality reduction
 In Proceedings of the Twenty-First International Conference on Machine Learning (ICML-04)
, 2004
Cited by 154 (9 self)
We investigate how to learn a kernel matrix for high dimensional data that lies on or near a low dimensional manifold. Noting that the kernel matrix implicitly maps the data into a nonlinear feature space, we show how to discover a mapping that “unfolds” the underlying manifold from which the data was sampled. The kernel matrix is constructed by maximizing the variance in feature space subject to local constraints that preserve the angles and distances between nearest neighbors. The main optimization involves an instance of semidefinite programming, a fundamentally different computation than previous algorithms for manifold learning, such as Isomap and locally linear embedding. The optimized kernels perform better than polynomial and Gaussian kernels for problems in manifold learning, but worse for problems in large margin classification. We explain these results in terms of the geometric properties of different kernels and comment on various interpretations of other manifold learning algorithms as kernel methods.
Diffusion Wavelets
, 2004
Cited by 149 (18 self)
We present a multiresolution construction for efficiently computing, compressing and applying large powers of operators that have high powers with low numerical rank. This allows the fast computation of functions of the operator, notably the associated Green’s function, in compressed form, and their fast application. Classes of operators satisfying these conditions include diffusion-like operators, in any dimension, on manifolds, graphs, and in nonhomogeneous media. In this case our construction can be viewed as a far-reaching generalization of Fast Multipole Methods, achieved through a different point of view, and of the non-standard wavelet representation of Calderón-Zygmund and pseudodifferential operators, achieved through a different multiresolution analysis adapted to the operator. We show how the dyadic powers of an operator can be used to induce a multiresolution analysis, as in classical Littlewood-Paley and wavelet theory, and we show how to construct, with fast and stable algorithms, scaling function and wavelet bases associated to this multiresolution analysis, and the corresponding downsampling operators, and use them to compress the corresponding powers of the operator. This allows multiscale signal processing to be extended to general spaces (such as manifolds and graphs) in a very natural way, with corresponding fast algorithms.
Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis
 Journal of Machine Learning Research
, 2007
Cited by 123 (11 self)
Reducing the dimensionality of data without losing intrinsic information is an important preprocessing step in high-dimensional data analysis. Fisher discriminant analysis (FDA) is a traditional technique for supervised dimensionality reduction, but it tends to give undesired results if samples in a class are multimodal. An unsupervised dimensionality reduction method called locality-preserving projection (LPP) can work well with multimodal data due to its locality-preserving property. However, since LPP does not take the label information into account, it is not necessarily useful in supervised learning scenarios. In this paper, we propose a new linear supervised dimensionality reduction method called local Fisher discriminant analysis (LFDA), which effectively combines the ideas of FDA and LPP. LFDA has an analytic form of the embedding transformation and the solution can be easily computed just by solving a generalized eigenvalue problem. We demonstrate the practical usefulness and high scalability of the LFDA method in data visualization and classification tasks through extensive simulation studies. We also show that LFDA can be extended to nonlinear dimensionality reduction scenarios by applying the kernel trick.
A Review of Kernel Methods in Machine Learning
, 2006
Cited by 95 (4 self)
We review recent methods for learning with positive definite kernels. All these methods formulate learning and estimation problems as linear tasks in a reproducing kernel Hilbert space (RKHS) associated with a kernel. We cover a wide range of methods, ranging from simple classifiers to sophisticated methods for estimation with structured data.
Diffusion Maps, Spectral Clustering and Reaction
 Applied and Computational Harmonic Analysis: Special issue on Diffusion Maps and Wavelets
, 2006
Cited by 94 (14 self)
A central problem in data analysis is the low dimensional representation of high dimensional data, and the concise description of its underlying geometry and density. In the analysis of large scale simulations of complex dynamical systems, where the notion of time evolution comes into play, important problems are the identification of slow variables and dynamically meaningful reaction coordinates that capture the long time evolution of the system. In this paper we provide a unifying view of these apparently different tasks, by considering a family of diffusion maps, defined as the embedding of complex (high dimensional) data onto a low dimensional Euclidean space, via the eigenvectors of suitably defined random walks defined on the given datasets. Assuming that the data is randomly sampled from an underlying general probability distribution p(x) = e^{-U(x)}, we show that as the number of samples goes to infinity, the eigenvectors of each diffusion map converge to the eigenfunctions of a corresponding differential operator defined on the support of the probability distribution. Different normalizations of the Markov chain on the graph lead to different limiting differential operators.
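One way to make "different normalizations of the Markov chain" concrete is a one-parameter family in which the kernel is first divided by a power of the estimated density and then row-normalized. The sketch below assumes this α-parameterized form, which is common in the diffusion maps literature but is not spelled out in the abstract; the function name `markov_matrix`, the Gaussian kernel, and the sample data are likewise illustrative assumptions.

```python
import numpy as np

def markov_matrix(K, alpha):
    """Row-stochastic matrix from a symmetric kernel K with density normalization alpha.

    alpha = 0 is the plain random-walk normalization; larger alpha divides out
    more of the sampling density before row-normalizing (assumed parameterization).
    """
    q = K.sum(axis=1)                          # kernel density estimate at each point
    K_alpha = K / np.outer(q, q) ** alpha      # K_ij / (q_i^alpha * q_j^alpha)
    return K_alpha / K_alpha.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(25, 2))
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / 1.0)                    # Gaussian kernel, bandwidth assumed

# Three members of the family, each a valid Markov matrix on the same graph.
chains = {alpha: markov_matrix(K, alpha) for alpha in (0.0, 0.5, 1.0)}
```

Each matrix in `chains` defines a random walk on the same data set; the eigenvectors of each one give a different diffusion map.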
Learning Eigenfunctions Links Spectral Embedding And Kernel PCA
 NEURAL COMPUTATION
, 2004
Cited by 86 (5 self)
In this paper, we show a direct relation between spectral embedding methods and kernel PCA, and how both are special cases of a more general learning problem, that of learning the principal eigenfunctions of an operator defined from a kernel and the unknown data generating density. Whereas
Nonlinear dimensionality reduction by semidefinite programming and kernel matrix factorization
 in Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics
, 2005
Cited by 66 (5 self)
We describe an algorithm for nonlinear dimensionality reduction based on semidefinite programming and kernel matrix factorization. The algorithm learns a kernel matrix for high dimensional data that lies on or near a low dimensional manifold. In earlier work, the kernel matrix was learned by maximizing the variance in feature space while preserving the distances and angles between nearest neighbors. In this paper, adapting recent ideas from semi-supervised learning on graphs, we show that the full kernel matrix can be very well approximated by a product of smaller matrices. Representing the kernel matrix in this way, we can reformulate the semidefinite program in terms of a much smaller submatrix of inner products between randomly chosen landmarks. The new framework leads to order-of-magnitude reductions in computation time and makes it possible to study much larger problems in manifold learning.
Iterative kernel principal component analysis for image modeling
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2005
Cited by 57 (3 self)
In recent years, Kernel Principal Component Analysis (KPCA) has been suggested for various image processing tasks requiring an image model such as, e.g., denoising or compression. The original form of KPCA, however, can be only applied to strongly restricted image classes due to the limited number of training examples that can be processed. We therefore propose a new iterative method for performing KPCA, the Kernel Hebbian Algorithm, which iteratively estimates the Kernel Principal Components with only linear order memory complexity. In our experiments, we compute models for complex image classes such as faces and natural images which require a large number of training examples. The resulting image models are tested in single-frame super-resolution and denoising applications. The KPCA model is not specifically tailored to these tasks; in fact, the same model can be used in super-resolution with variable input resolution, or denoising with unknown noise characteristics. In spite of this, both super-resolution and denoising performance are comparable to existing methods.