Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps
- Proceedings of the National Academy of Sciences, 2005
Cited by 99 (29 self)
… of contexts of data analysis, such as spectral graph theory, manifold learning, nonlinear principal components and kernel methods. We augment these approaches by showing that the diffusion distance is a key intrinsic geometric quantity linking the spectral theory of the Markov process, Laplace operators, or kernels to the corresponding geometry and density of the data. This opens the door to the application of methods from numerical analysis and signal processing to the analysis of functions and transformations of the data.
Abstract. We provide a framework for structural multiscale geometric organization of graphs and subsets of R^n. We use diffusion semigroups to generate multiscale geometries in order to organize and represent complex structures. We show that appropriately selected eigenfunctions or scaling functions of Markov matrices, which describe local transitions, lead to macroscopic descriptions at different scales. The process of iterating or diffusing the Markov matrix is seen as a generalization of some aspects of the Newtonian paradigm, in which local infinitesimal transitions of a system lead to global macroscopic descriptions by integration. In Part I below, we provide a unified view of ideas from data analysis, machine learning and numerical analysis. In Part II [1], we augment this approach by introducing fast order-N algorithms for homogenization of heterogeneous structures as well as for data representation.
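As a rough illustration of the construction summarized above (a Markov matrix built from local transitions, whose leading eigenfunctions give a low-dimensional description of the data), here is a minimal NumPy sketch. The Gaussian kernel, the bandwidth `epsilon`, the diffusion time `t`, and the function name `diffusion_map` are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def diffusion_map(X, epsilon, n_components=2, t=1):
    """Minimal diffusion-map sketch: Gaussian kernel, random-walk normalization.

    Returns the diffusion coordinates lambda_k^t * psi_k(x) for k = 1..n_components
    and all eigenvalues of the Markov matrix in descending order.
    """
    # Pairwise squared Euclidean distances between the rows of X (n x d).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / epsilon)               # assumed Gaussian affinities
    d = K.sum(axis=1)                       # degree of each point
    # P = D^{-1} K is similar to the symmetric A = D^{-1/2} K D^{-1/2},
    # so we can use the symmetric eigensolver and map eigenvectors back.
    A = K / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(A)
    order = np.argsort(evals)[::-1]         # sort eigenvalues in descending order
    evals, evecs = evals[order], evecs[:, order]
    psi = evecs / np.sqrt(d)[:, None]       # right eigenvectors of P (up to scale)
    # Skip psi_0, which is constant with eigenvalue 1, and scale the rest by lambda^t.
    coords = psi[:, 1:n_components + 1] * evals[1:n_components + 1] ** t
    return coords, evals
```

Diagonalizing the symmetric conjugate instead of P itself keeps the eigenvalues real and the solver stable; increasing `t` damps the coordinates with smaller eigenvalues, which is what produces the coarser scales.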
Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning and data set parameterization
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006
Cited by 58 (5 self)
We provide evidence that nonlinear dimensionality reduction, clustering and data set parameterization can be solved within one and the same framework. The main idea is to define a system of coordinates with an explicit metric that reflects the connectivity of a given data set and that is robust to noise. Our construction, which is based on a Markov random walk on the data, offers a general scheme for simultaneously reorganizing and subsampling graphs and arbitrarily shaped data sets in high dimensions using intrinsic geometry. We show that clustering in embedding spaces is equivalent to compressing operators. The objective of data partitioning and clustering is to coarse-grain the random walk on the data while at the same time preserving a diffusion operator for the intrinsic geometry or connectivity of the data set up to some accuracy. We show that the quantization distortion in diffusion space bounds the error of compression of the operator, thus giving a rigorous justification for k-means clustering in diffusion space and a precise measure of the performance of general clustering algorithms.
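The coarse-graining idea can be sketched as lumping the random walk onto a partition of the points, for instance clusters found by k-means in diffusion space. The sketch below weights each point by the stationary distribution of the walk when aggregating transitions; this weighting and the helper names `markov_from_affinity` and `coarse_grain` are assumptions for illustration rather than the paper's exact definitions.

```python
import numpy as np

def markov_from_affinity(W):
    """Random-walk matrix P = D^{-1} W and its stationary distribution pi = d / sum(d)."""
    d = W.sum(axis=1)
    return W / d[:, None], d / d.sum()

def coarse_grain(P, pi, labels):
    """Lump the walk P onto the clusters given by integer `labels` (0..k-1).

    Entry (k, l) is the probability of jumping into cluster l in one step when
    the starting point is drawn from pi restricted to cluster k.
    """
    k = labels.max() + 1
    onehot = np.eye(k)[labels]                      # n x k cluster-membership matrix
    pi_c = onehot.T @ pi                            # aggregated stationary distribution
    flow = onehot.T @ (pi[:, None] * P) @ onehot    # pi-weighted probability flow between clusters
    return flow / pi_c[:, None], pi_c
```

With this weighting the coarse-grained matrix is again row-stochastic, and the aggregated distribution `pi_c` is stationary for it, so the small walk inherits the equilibrium of the original one.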
Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators
- in Advances in Neural Information Processing Systems 18, 2005
Cited by 40 (7 self)
This paper presents a diffusion-based probabilistic interpretation of spectral clustering and dimensionality reduction algorithms that use the eigenvectors of the normalized graph Laplacian. Given the pairwise adjacency matrix of all points, we define a diffusion distance between any two data points and show that the low-dimensional representation of the data by the first few eigenvectors of the corresponding Markov matrix is optimal under a certain mean squared error criterion. Furthermore, assuming that data points are random samples from a density p(x) = exp(-U(x)), we identify these eigenvectors as discrete approximations of eigenfunctions of a Fokker-Planck operator in a potential 2U(x) with reflecting boundary conditions. Finally, applying known results regarding the eigenvalues and eigenfunctions of the continuous Fokker-Planck operator, we provide a mathematical justification for the success of spectral clustering and dimensionality reduction algorithms based on these first few eigenvectors. This analysis elucidates, in terms of the characteristics of diffusion processes, many empirical findings regarding spectral clustering algorithms.
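The diffusion distance mentioned above can be computed in two ways, which makes its link to the Markov eigenvectors concrete. The sketch below assumes the common definition D_t(x, y)^2 = sum_z (p_t(x, z) - p_t(y, z))^2 / pi(z), with pi the stationary distribution of P = D^{-1} W, and computes the same quantity as a Euclidean distance between eigenvector coordinates scaled by lambda_k^t. The function names are hypothetical.

```python
import numpy as np

def diffusion_distance_direct(P, pi, t):
    """D_t(x, y) from its definition: sum over z of (p_t(x, z) - p_t(y, z))^2 / pi(z)."""
    Pt = np.linalg.matrix_power(P, t)          # t-step transition probabilities
    diff = Pt[:, None, :] - Pt[None, :, :]     # all pairs of rows (x, y), indexed by z
    return np.sqrt(np.sum(diff ** 2 / pi, axis=-1))

def diffusion_distance_spectral(W, t):
    """The same distance as a Euclidean distance between diffusion coordinates."""
    d = W.sum(axis=1)
    pi = d / d.sum()                           # stationary distribution of P = D^{-1} W
    A = W / np.sqrt(np.outer(d, d))            # symmetric matrix similar to P
    lam, V = np.linalg.eigh(A)
    psi = V / np.sqrt(pi)[:, None]             # right eigenvectors of P, orthonormal in l2(pi)
    coords = psi * lam ** t                    # the constant psi_0 contributes nothing to distances
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))
```

Truncating `coords` to the first few nontrivial columns gives the low-dimensional representation; the error of that truncation is controlled by the discarded eigenvalues raised to the power 2t.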
Data Fusion and Multicue Data Matching by Diffusion Maps
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006
Cited by 22 (2 self)
Data fusion and multi-cue data matching are fundamental tasks of high-dimensional data analysis. In this paper, we apply the recently introduced diffusion framework to address these tasks. Our contribution is threefold. First, we present the Laplace-Beltrami approach for computing density-invariant embeddings, which are essential for integrating different sources of data. Second, we describe a refinement of the Nyström extension algorithm called “geometric harmonics”. We also explain how to use this tool for data assimilation. Finally, we introduce a multi-cue data matching scheme based on nonlinear spectral graph alignment. The effectiveness of the presented schemes is validated by applying them to the problems of lip-reading and image sequence alignment.
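Geometric harmonics refine the Nyström extension; the sketch below shows only the plain Nyström step, applied to eigenvectors of a row-normalized Gaussian kernel, as an assumed simplification. Each eigenvector psi_k is extended to new points y through psi_k(y) = (1 / lambda_k) sum_i p(y, x_i) psi_k(x_i). The kernel, `epsilon`, and the helper names are hypothetical choices.

```python
import numpy as np

def gaussian_kernel(A, B, epsilon):
    """Gaussian affinities between the rows of A and the rows of B."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / epsilon)

def fit_markov_eigs(X, epsilon, n_components):
    """Leading eigenpairs of P = D^{-1} K on the training points X."""
    K = gaussian_kernel(X, X, epsilon)
    d = K.sum(axis=1)
    lam, V = np.linalg.eigh(K / np.sqrt(np.outer(d, d)))   # symmetric conjugate of P
    order = np.argsort(lam)[::-1][:n_components]           # largest eigenvalues first
    return lam[order], V[:, order] / np.sqrt(d)[:, None]    # right eigenvectors of P

def nystrom_extend(X, lam, psi, Y, epsilon):
    """Extend each psi_k to the rows of Y: psi_k(y) = sum_i p(y, x_i) psi_k(x_i) / lam_k."""
    Ky = gaussian_kernel(Y, X, epsilon)
    Py = Ky / Ky.sum(axis=1, keepdims=True)                 # transition probabilities from y
    return (Py @ psi) / lam
```

At the training points the extension reproduces the eigenvectors exactly, since P psi_k = lambda_k psi_k. Dividing by lambda_k amplifies errors for small eigenvalues, so extensions of this kind are normally restricted to the leading eigenvectors.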
An experimental investigation of graph kernels on a collaborative recommendation task
- Proceedings of the 6th International Conference on Data Mining (ICDM 2006), 2006
Cited by 12 (5 self)
This paper presents a survey as well as a systematic empirical comparison of seven graph kernels and two related similarity matrices (all simply referred to as graph kernels), namely the exponential diffusion kernel, the Laplacian exponential diffusion kernel, the von Neumann diffusion kernel, the regularized Laplacian kernel, the commute-time kernel, the random-walk-with-restart similarity matrix, and finally, three graph kernels introduced in this paper: the regularized commute-time kernel, the Markov diffusion kernel, and the cross-entropy diffusion matrix. The kernel-on-a-graph approach is simple and intuitive. It is illustrated by applying the nine graph kernels to a collaborative-recommendation task and to a semisupervised classification task, both on several databases. The graph methods compute proximity measures between nodes that help study the structure of the graph. Our comparisons suggest that the regularized commute-time and the Markov diffusion kernels perform best, closely followed by the regularized Laplacian kernel.
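For reference, here are commonly used forms of five of the kernels listed above, for an undirected graph with adjacency matrix A and Laplacian L = D - A. These are standard textbook definitions; the exact parameterization used in the paper is an assumption, and the function names `sym_fun` and `graph_kernels` are hypothetical.

```python
import numpy as np

def sym_fun(M, f):
    """Apply a scalar function to a symmetric matrix through its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * f(w)) @ V.T

def graph_kernels(A, alpha):
    """A few classic kernels on an undirected graph (A symmetric, nonnegative)."""
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A                 # combinatorial Laplacian
    I = np.eye(n)
    rho = np.max(np.abs(np.linalg.eigvalsh(A)))    # spectral radius of A
    if alpha * rho >= 1:
        raise ValueError("von Neumann kernel needs alpha < 1 / spectral radius of A")
    return {
        "exponential_diffusion": sym_fun(A, lambda w: np.exp(alpha * w)),            # expm(alpha A)
        "laplacian_exponential_diffusion": sym_fun(L, lambda w: np.exp(-alpha * w)), # expm(-alpha L)
        "von_neumann": np.linalg.inv(I - alpha * A),     # sum_k alpha^k A^k
        "regularized_laplacian": np.linalg.inv(I + alpha * L),
        "commute_time": np.linalg.pinv(L),               # Moore-Penrose pseudoinverse L^+
    }
```

For the commute-time kernel, L^+_ii + L^+_jj - 2 L^+_ij is the effective resistance between nodes i and j, and multiplying it by the graph volume gives the commute time.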
Riemannian manifold learning for nonlinear dimensionality reduction
- In Proceedings of ECCV, 2006
Cited by 9 (0 self)
In recent years, nonlinear dimensionality reduction (NLDR) techniques have attracted much attention in visual perception and many other areas of science. We propose an efficient algorithm called Riemannian manifold learning (RML). A Riemannian manifold can be constructed in the form of a simplicial complex, and thus its intrinsic dimension can be reliably estimated. Then the NLDR problem is solved by constructing Riemannian normal coordinates (RNC). Experimental results demonstrate that our algorithm can learn the data’s intrinsic geometric structure, yielding uniformly distributed and well-organized low-dimensional embedding data.
A family of dissimilarity measures between nodes generalizing both the shortest-path and the commute-time distances
- in Proceedings of the 14th SIGKDD International Conference on Knowledge Discovery and Data Mining
Cited by 8 (4 self)
This work introduces a new family of link-based dissimilarity measures between nodes of a weighted directed graph. This measure, called the randomized shortest-path (RSP) dissimilarity, depends on a parameter θ and has the interesting property of reducing, on one end, to the standard shortest-path distance when θ is large and, on the other end, to the commute-time (or resistance) distance when θ is small (near zero). Intuitively, it corresponds to the expected cost incurred by a random walker in order to reach a destination node from a starting node while maintaining a constant entropy (related to θ) spread in the graph. The parameter θ therefore gradually biases the simple random walk on the graph towards the shortest-path policy. By adopting a statistical physics approach and computing a sum over all the possible paths (a discrete path integral), it is shown that the RSP dissimilarity from every node to a particular node of interest can be computed efficiently by solving two linear systems of n equations, where n is the number of nodes. The dissimilarity between every pair of nodes, on the other hand, is obtained by inverting an n × n matrix. The proposed measure can be used for various graph mining tasks, such as computing betweenness centrality and finding dense communities, as shown in the experimental section.
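The all-pairs computation described above can be sketched with a single matrix inversion. This follows our reading of the RSP framework rather than code from the paper: W = P_ref ∘ exp(-θC) is the Boltzmann-weighted reference walk, Z = (I - W)^{-1} sums the weights of all paths, and the expected cost of first-hitting paths from i to j is S_ij - S_jj with S = (Z (C ∘ W) Z) ÷ Z (elementwise division). We assume a connected graph, positive edge costs, and θ > 0, so that I - W is invertible; `rsp_dissimilarity` is a hypothetical name.

```python
import numpy as np

def rsp_dissimilarity(A, C, theta):
    """Randomized shortest-path dissimilarity (sketch, symmetrized).

    A: nonnegative adjacency matrix defining the reference random walk (connected graph).
    C: positive edge costs; only entries where A > 0 are used.
    theta: > 0; large values approach shortest paths, small values random-walk hitting costs.
    """
    P_ref = A / A.sum(axis=1, keepdims=True)      # natural random walk on the graph
    C = np.where(A > 0, C, 0.0)                   # ignore costs on non-edges (avoids inf * 0)
    W = P_ref * np.exp(-theta * C)                # Boltzmann-weighted, substochastic
    Z = np.linalg.inv(np.eye(A.shape[0]) - W)     # Z = I + W + W^2 + ... (all paths)
    S = (Z @ (C * W) @ Z) / Z                     # expected cost over all i -> j paths
    C_bar = S - np.diag(S)[None, :]               # drop cycles at the target: S_ij - S_jj
    return (C_bar + C_bar.T) / 2                  # symmetrize for a dissimilarity
```

On a unit-cost path graph, large θ gives values close to the hop distance, while θ near zero gives half the commute time (for the path 0-1-2, about 4 between the end nodes).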
Dimensionality Reduction: A Comparative Review
- 2008
Cited by 8 (0 self)
In recent years, a variety of nonlinear dimensionality reduction techniques have been proposed, many of which rely on the evaluation of local properties of the data. The paper presents a review and systematic comparison of these techniques. The performance of the techniques is investigated on artificial and natural tasks. The results of the experiments reveal that nonlinear techniques perform well on selected artificial tasks but do not outperform traditional PCA on real-world tasks. The paper explains these results by identifying weaknesses of current nonlinear techniques and suggests how the performance of nonlinear dimensionality reduction techniques may be improved.
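For comparison, the PCA baseline referred to above fits in a few lines of NumPy through the singular value decomposition of the centered data. This is a generic sketch, not the review's experimental setup; the function name `pca` is ours.

```python
import numpy as np

def pca(X, n_components):
    """Classical PCA via the SVD of the centered data matrix (rows are samples)."""
    Xc = X - X.mean(axis=0)                          # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                   # principal directions, one per row
    explained_var = s[:n_components] ** 2 / (X.shape[0] - 1)   # variance along each direction
    scores = Xc @ components.T                       # low-dimensional coordinates
    return scores, components, explained_var
```

Because PCA only needs one SVD and has no neighborhood size or bandwidth to tune, it is a natural reference point for the local nonlinear methods.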
Set oriented computation of transport rates in 3-degree of freedom systems: scattering rates for the Rydberg atom in crossed fields, in preparation
- 2005
Cited by 2 (1 self)
We present a new method based on set-oriented computations for the calculation of reaction rates in chemical systems. The method is demonstrated with the Rydberg atom, an example for which traditional Transition State Theory fails. Coupled with dynamical systems theory, the set-oriented approach provides a global description of the dynamics. The main idea of the method is as follows. We construct a box covering of the Poincaré section under consideration, use the Poincaré first return time to identify the regions relevant for transport, and then apply an adaptation of recently developed techniques for the computation of transport rates ([12], [27]). Reaction rates are of great interest in chemistry, especially for realistic three- and higher-dimensional systems. Our approach is applied to the Rydberg atom in crossed electric and magnetic fields. Our methods are complementary to the results of [14] and agree with them on the problems both consider. For the Rydberg atom, we consider the half and full scattering problems in both the 2- and the 3-degree-of-freedom systems. The ionization of such atoms has been the subject of many experiments, and it serves to illustrate the elegance of our method.
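The abstract describes the authors' method only at a high level. As a generic illustration of the set-oriented idea (a box covering plus a box-to-box transition matrix, often called Ulam's method), the sketch below estimates a one-step transport rate between two regions for a hypothetical toy system, the doubling map x -> 2x mod 1 on [0, 1), rather than a Poincaré section of the Rydberg problem. It does not use first return times, and all names and parameters are our own assumptions.

```python
import numpy as np

def ulam_matrix(f, n_boxes, samples_per_box=10):
    """Box-to-box transition matrix of a map f on [0, 1), estimated from test points."""
    P = np.zeros((n_boxes, n_boxes))
    offsets = (np.arange(samples_per_box) + 0.5) / samples_per_box
    for i in range(n_boxes):
        x = (i + offsets) / n_boxes                            # test points inside box i
        j = np.floor(f(x) * n_boxes).astype(int) % n_boxes     # boxes containing their images
        np.add.at(P[i], j, 1.0 / samples_per_box)              # accumulate repeated indices
    return P

def invariant_measure(P, iters=2000):
    """Approximate invariant box measure by power iteration on row vectors."""
    mu = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        mu = mu @ P
    return mu

def transport_rate(P, mu, source, target):
    """Fraction of the mu-mass in `source` boxes that moves to `target` boxes in one step."""
    return (mu[source] @ P[np.ix_(source, target)].sum(axis=1)) / mu[source].sum()
```

For the doubling map with 16 boxes, half of the mass in the left half of the interval lands in the right half after one iterate. Real applications replace the toy map with the return map on a discretized Poincaré section.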

