Results 1-10 of 32
Nearest-neighbor searching and metric space dimensions
In Nearest-Neighbor Methods for Learning and Vision: Theory and Practice, 2006
Cited by 63 (0 self)
Given a set S of n sites (points) and a distance measure d, the nearest-neighbor searching problem is to build a data structure so that, given a query point q, the site nearest to q can be found quickly. This paper gives a data structure for this problem; the data structure is built using the distance function as a “black box”. The structure is able to speed up nearest-neighbor searching in a variety of settings, for example: points in low-dimensional or structured Euclidean space, strings under Hamming and edit distance, and bit vector data from an OCR application. The data structures are observed to need linear space, with a modest constant factor. The preprocessing time needed per site is observed to match the query time. The data structure can be viewed as an application of a “kd-tree” approach in the metric space setting, using Voronoi regions of a subset in place of axis-aligned boxes.
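The abstract describes a structure that only ever evaluates the distance function as a black box, with Voronoi regions of a subset of sites playing the role of kd-tree boxes. The Python below is a deliberately simplified, one-level sketch of that idea, not the paper's data structure: the class name `VoronoiPivotIndex`, the random choice of pivots, and the covering-radius pruning rule are assumptions made for illustration. A whole cell is skipped when the triangle inequality shows that none of its sites can beat the current best.

```python
import math
import random
from typing import Callable, Dict, Generic, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class VoronoiPivotIndex(Generic[T]):
    """Hypothetical one-level index: sites are grouped into the Voronoi cells
    of a random subset of pivot sites. Only the black-box `dist` is called."""

    def __init__(self, sites: Sequence[T], dist: Callable[[T, T], float],
                 num_pivots: int, seed: int = 0) -> None:
        self.sites: List[T] = list(sites)
        self.dist = dist
        rng = random.Random(seed)
        self.pivots: List[int] = rng.sample(range(len(self.sites)), num_pivots)
        self.cells: Dict[int, List[int]] = {p: [] for p in self.pivots}
        self.radius: Dict[int, float] = {p: 0.0 for p in self.pivots}
        for i, site in enumerate(self.sites):
            # Put the site in its nearest pivot's Voronoi cell and
            # grow that cell's covering radius if needed.
            d, p = min((dist(site, self.sites[p]), p) for p in self.pivots)
            self.cells[p].append(i)
            self.radius[p] = max(self.radius[p], d)

    def nearest(self, q: T) -> Tuple[int, float]:
        """Return (index, distance) of a site nearest to q."""
        # Visit cells in order of their pivot's distance to the query.
        order = sorted((self.dist(q, self.sites[p]), p) for p in self.pivots)
        best_i, best_d = -1, math.inf
        for d_qp, p in order:
            # Triangle inequality: every site s in cell p satisfies
            # d(q, s) >= d(q, p) - d(p, s) >= d(q, p) - radius[p].
            if d_qp - self.radius[p] >= best_d:
                continue
            for i in self.cells[p]:
                d = self.dist(q, self.sites[i])
                if d < best_d:
                    best_i, best_d = i, d
        return best_i, best_d


def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


# Usage with Hamming distance on random 16-bit strings.
rng = random.Random(1)
sites = ["".join(rng.choice("01") for _ in range(16)) for _ in range(200)]
index = VoronoiPivotIndex(sites, hamming, num_pivots=15)
i, d = index.nearest("0" * 16)
print(f"nearest site {sites[i]} at distance {d}")
```

Building the same structure recursively inside each cell would give a tree; the one-level form keeps the pruning argument easy to check against brute force.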
Random projections of smooth manifolds
Foundations of Computational Mathematics, 2006
Cited by 53 (19 self)
We propose a new approach for nonadaptive dimensionality reduction of manifold-modeled data, demonstrating that a small number of random linear projections can preserve key information about a manifold-modeled signal. We center our analysis on the effect of a random linear projection operator $\Phi: \mathbb{R}^N \to \mathbb{R}^M$, $M < N$, on a smooth, well-conditioned $K$-dimensional submanifold $\mathcal{M} \subset \mathbb{R}^N$. As our main theoretical contribution, we establish a sufficient number $M$ of random projections to guarantee that, with high probability, all pairwise Euclidean and geodesic distances between points on $\mathcal{M}$ are well preserved under the mapping $\Phi$. Our results bear a strong resemblance to the emerging theory of Compressed Sensing (CS), in which sparse signals can be recovered from small numbers of random linear measurements. As in CS, the random measurements we propose can be used to recover the original data in $\mathbb{R}^N$. Moreover, like the fundamental bound in CS, our requisite $M$ is linear in the “information level” $K$ and logarithmic in the ambient dimension $N$; we also identify a logarithmic dependence on the volume and conditioning of the manifold. In addition to recovering faithful approximations to manifold-modeled signals, however, the random projections we propose can also be used to discern key properties about the manifold. We discuss connections and contrasts with existing techniques in manifold learning, a setting where dimensionality-reducing mappings are typically nonlinear and constructed adaptively from a set of sampled training data.
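The central claim is that a random linear map Φ: R^N → R^M approximately preserves pairwise distances between points on a low-dimensional manifold. The sketch below checks this empirically in one setting only. The Gaussian choice of Φ (entries drawn from N(0, 1/M)), the closed curve used as the K = 1 manifold, and the sizes N = 400, M = 100 are assumptions. The paper's sufficient bound on M is not computed here.

```python
import numpy as np


def sample_curve(num_points: int, n_ambient: int) -> np.ndarray:
    """Hypothetical smooth closed curve (K = 1) in R^N:
    x(t) = [cos(k t) / k, sin(k t) / k] for k = 1..N/2."""
    t = np.linspace(0.0, 2.0 * np.pi, num_points, endpoint=False)
    k = np.arange(1, n_ambient // 2 + 1)
    phases = np.outer(t, k)                  # shape (num_points, N/2)
    amp = 1.0 / k                            # decaying amplitudes keep the curve smooth
    return np.hstack([np.cos(phases) * amp, np.sin(phases) * amp])


def pairwise_distances(x: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between the rows of x."""
    sq = np.sum(x**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.sqrt(np.maximum(d2, 0.0))


def random_projection(n_ambient: int, m: int, rng: np.random.Generator) -> np.ndarray:
    """Gaussian Phi: R^N -> R^M, scaled so that E||Phi x||^2 = ||x||^2."""
    return rng.standard_normal((m, n_ambient)) / np.sqrt(m)


rng = np.random.default_rng(0)
N, M = 400, 100
X = sample_curve(60, N)                      # 60 points on the curve in R^400
Phi = random_projection(N, M, rng)
Y = X @ Phi.T                                # the same points in R^100

d_orig = pairwise_distances(X)
d_proj = pairwise_distances(Y)
iu = np.triu_indices(len(X), k=1)            # each unordered pair once
ratios = d_proj[iu] / d_orig[iu]
print(f"projected/original distance ratios: [{ratios.min():.3f}, {ratios.max():.3f}]")
```

Increasing M narrows the printed range of ratios.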
Translated Poisson mixture model for stratification learning
Int. J. Comput. Vision, 2000
Cited by 11 (2 self)
A framework for the regularized and robust estimation of non-uniform dimensionality and density in high-dimensional noisy data is introduced in this work. This leads to learning stratifications, that is, mixtures of manifolds representing different characteristics and complexities in the data set. The basic idea relies on modeling the high-dimensional sample points as a process of Translated Poisson mixtures, with regularizing restrictions, leading to a model that includes the presence of noise. The Translated Poisson distribution is useful to model a noisy counting process, and it is derived from the noise-induced translation of a regular Poisson distribution. By maximizing the log-likelihood of the process counting the points falling into a local ball, we estimate the local dimension and density. We show that ...
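The abstract estimates local dimension by maximizing the likelihood of the counts of points falling into a local ball. The sketch below shows only the plain, untranslated Poisson version of that idea, in the form of the Levina-Bickel k-nearest-neighbor MLE. It is not the paper's Translated Poisson mixture, which adds noise modeling and regularization. The function names, k = 10, and the synthetic plane in R^5 are assumptions.

```python
import numpy as np


def knn_distances(x: np.ndarray, k: int) -> np.ndarray:
    """Sorted distances from each point to its k nearest neighbors (self excluded)."""
    sq = np.sum(x**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    np.fill_diagonal(d2, np.inf)                       # a point is not its own neighbor
    nearest = np.partition(d2, k - 1, axis=1)[:, :k]   # k smallest, in no particular order
    return np.sqrt(np.sort(nearest, axis=1))


def local_dimension_mle(x: np.ndarray, k: int = 10) -> np.ndarray:
    """Per-point MLE of intrinsic dimension under a plain Poisson model:
    m(x) = [ (1/(k-1)) * sum_{j=1}^{k-1} log(T_k(x) / T_j(x)) ]^{-1}."""
    t = knn_distances(x, k)                  # t[:, j - 1] is T_j(x)
    logs = np.log(t[:, -1:] / t[:, :-1])     # log(T_k / T_j) for j = 1..k-1
    return (k - 1) / np.sum(logs, axis=1)


# Hypothetical test data: a uniformly sampled unit square embedded in R^5.
rng = np.random.default_rng(0)
plane = rng.uniform(size=(2000, 2))
basis, _ = np.linalg.qr(rng.standard_normal((5, 2)))   # 5x2, orthonormal columns
X = plane @ basis.T

est = local_dimension_mle(X, k=10)
print(f"mean of per-point estimates: {est.mean():.2f}")
print(f"inverse-averaged estimate:   {1.0 / np.mean(1.0 / est):.2f}")
```

Per-point estimates from only k neighbors are noisy; averaging their inverses before inverting, as in the last printed value, is a common way to pool them into one global estimate.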
Stratification learning: Detecting mixed density and dimensionality in high dimensional point clouds
In Advances in NIPS 19, 2006
Cited by 9 (1 self)
The study of point cloud data sampled from a stratification, a collection of manifolds with possibly different dimensions, is pursued in this paper. We present a technique for simultaneously soft clustering and estimating the mixed dimensionality and density of such structures. The framework is based on a maximum likelihood estimation of a Poisson mixture model. The presentation of the approach is completed with artificial and real examples demonstrating the importance of extending manifold learning to stratification learning.
Estimates of the information content and dimensionality of natural scenes from proximity distributions
2007
A duality view of spectral methods for dimensionality reduction
In ICML ’06: Proceedings of the 23rd International Conference on Machine Learning, 2006
Cited by 7 (0 self)
We present a unified duality view of several recently emerged spectral methods for nonlinear dimensionality reduction, including Isomap, locally linear embedding, Laplacian eigenmaps, and maximum variance unfolding. We discuss the duality theory for the maximum variance unfolding problem and show that the other methods are directly related to either its primal formulation or its dual formulation, or can be interpreted from the optimality conditions. This duality framework reveals close connections between these seemingly quite different algorithms. In particular, it dispels the apparent puzzle of why these methods use either the top eigenvectors of a dense matrix or the bottom eigenvectors of a sparse matrix: these two eigenspaces are exactly aligned at primal-dual optimality.
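The statement that a dense top eigenspace and a sparse bottom eigenspace coincide has a simple linear-algebra core. Take a graph Laplacian L, the kind of sparse matrix Laplacian eigenmaps works with, and its dense pseudo-inverse L^+. Because L^+ inverts every nonzero eigenvalue of L and keeps the eigenvectors, the bottom nonconstant eigenvectors of L are exactly the top eigenvectors of L^+. The sketch below checks this for an assumed path graph with d = 3. It is an illustration of the alignment only, not the paper's primal-dual derivation for maximum variance unfolding.

```python
import numpy as np


def path_graph_laplacian(n: int) -> np.ndarray:
    """Laplacian L = D - W of a path graph on n nodes (tridiagonal, so sparse)."""
    w = np.zeros((n, n))
    idx = np.arange(n - 1)
    w[idx, idx + 1] = w[idx + 1, idx] = 1.0
    return np.diag(w.sum(axis=1)) - w


def laplacian_pseudoinverse(lap: np.ndarray) -> np.ndarray:
    """For a connected graph, L^+ = (L + J/n)^{-1} - J/n, with J the all-ones matrix."""
    n = lap.shape[0]
    j = np.full((n, n), 1.0 / n)
    return np.linalg.inv(lap + j) - j


def projector(vectors: np.ndarray) -> np.ndarray:
    """Orthogonal projector onto the span of orthonormal columns."""
    return vectors @ vectors.T


n, d = 30, 3
lap = path_graph_laplacian(n)
dense = laplacian_pseudoinverse(lap)

# Bottom eigenvectors of the sparse L, skipping the constant one (eigenvalue 0).
_, vec_l = np.linalg.eigh(lap)       # eigenvalues in ascending order
bottom = vec_l[:, 1:d + 1]

# Top eigenvectors of the dense L^+.
_, vec_p = np.linalg.eigh(dense)
top = vec_p[:, -d:]

gap = np.linalg.norm(projector(bottom) - projector(top))
print(f"distance between the two {d}-dimensional eigenspaces: {gap:.2e}")
```

Comparing projectors rather than eigenvectors makes the check insensitive to the sign and ordering of the computed eigenvectors.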
Estimating local intrinsic dimension with k-nearest neighbor graphs
In IEEE Workshop on Statistical Signal Processing (SSP), 2005
Cited by 6 (0 self)
Many high-dimensional data sets of practical interest exhibit varying complexity in different parts of the data space. This is the case, for example, for databases of images containing many samples of a few textures of different complexity. Such phenomena can be modeled by assuming that the data lie on a collection of manifolds with different intrinsic dimensionalities. In this extended abstract, we introduce a method to estimate the local dimensionality associated with each point in a data set, without any prior information about the manifolds, their quantity, or their sampling distributions. The proposed method uses a global dimensionality estimator based on k-nearest neighbor (k-NN) graphs, together with an algorithm for computing neighborhoods in the data with similar topological properties.
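A global dimension estimator of the k-NN graph type can be sketched from the graph's total edge length. If L_n is the sum of |e|^γ over the k-NN edges of n samples from an m-dimensional manifold, L_n grows roughly like n^((m − γ)/m). The slope a of log L_n against log n therefore gives m ≈ γ/(1 − a). The Python below is a minimal sketch under that assumption. The subsample sizes, k = 5, γ = 1, the number of repetitions, and the synthetic plane in R^4 are illustrative choices, and the paper's neighborhood-selection algorithm is not shown.

```python
import numpy as np


def knn_graph_length(x: np.ndarray, k: int, gamma: float = 1.0) -> float:
    """Total weight of the k-NN graph on the rows of x, edges weighted by |e|^gamma."""
    sq = np.sum(x**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    np.fill_diagonal(d2, np.inf)                       # no self-edges
    nearest = np.partition(d2, k - 1, axis=1)[:, :k]   # k smallest squared distances
    return float(np.sum(np.sqrt(nearest) ** gamma))


def knn_graph_dimension(x: np.ndarray, k: int = 5, gamma: float = 1.0,
                        sizes=(250, 500, 1000, 2000), reps: int = 5,
                        seed: int = 0) -> float:
    """Fit log L_n = a log n + b over random subsamples, then return gamma / (1 - a)."""
    rng = np.random.default_rng(seed)
    log_n, log_l = [], []
    for n in sizes:
        for _ in range(reps):
            sub = x[rng.choice(len(x), size=n, replace=False)]
            log_n.append(np.log(n))
            log_l.append(np.log(knn_graph_length(sub, k, gamma)))
    slope, _ = np.polyfit(log_n, log_l, 1)             # highest degree first
    return gamma / (1.0 - slope)


# Hypothetical test data: a uniformly sampled unit square embedded in R^4.
rng = np.random.default_rng(1)
plane = rng.uniform(size=(3000, 2))
basis, _ = np.linalg.qr(rng.standard_normal((4, 2)))   # 4x2, orthonormal columns
X = plane @ basis.T

dim_hat = knn_graph_dimension(X)
print(f"estimated intrinsic dimension: {dim_hat:.2f}")
```

Fitting over several subsample sizes, rather than using a single n, lets the intercept absorb the unknown constant in front of n^((m − γ)/m).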
De-biasing for intrinsic dimension estimation
In Proc. IEEE Statistical Signal Processing Workshop, 2007
Cited by 6 (4 self)
Many algorithms have been proposed for estimating the intrinsic dimension of high-dimensional data. A phenomenon common to all of them is a negative bias, perceived to be the result of undersampling. We propose improved methods for estimating intrinsic dimension, taking manifold boundaries into consideration. By estimating dimension locally, we are able to analyze and reduce the effect that sample data depth has on the negative bias. Additionally, we offer improvements to an existing algorithm for dimension estimation based on k-nearest neighbor graphs, and offer an algorithm for adapting any dimension estimation algorithm to operate locally. Finally, we illustrate the uses of local dimension estimation with data sets consisting of multiple manifolds, including applications such as diagnosing anomalies in router networks and image segmentation.
Index Terms: intrinsic dimension, manifold learning, Riemannian manifold, nearest neighbor graph, geodesics
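The abstract mentions adapting any dimension estimation algorithm to operate locally. One simple way to do that, assumed here rather than taken from the paper, is to run the estimator on each point's k-nearest-neighbor patch. The sketch pairs that wrapper with a hypothetical PCA variance-threshold estimator and two well-separated strata (a line segment and a square patch in R^3). It does not implement the paper's de-biasing.

```python
from typing import Callable

import numpy as np


def pca_dimension(points: np.ndarray, threshold: float = 0.05) -> int:
    """Hypothetical estimator: number of principal axes whose variance
    exceeds `threshold` times the total variance of the patch."""
    centered = points - points.mean(axis=0)
    var = np.linalg.svd(centered, compute_uv=False) ** 2
    return int(np.sum(var / var.sum() > threshold))


def localize(estimator: Callable[[np.ndarray], float], x: np.ndarray, k: int) -> np.ndarray:
    """Apply `estimator` to each point together with its k nearest neighbors."""
    sq = np.sum(x**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    np.fill_diagonal(d2, -1.0)                     # force each point first in its own row
    patches = np.argsort(d2, axis=1)[:, :k + 1]    # the point plus its k neighbors
    return np.array([estimator(x[idx]) for idx in patches])


# Hypothetical strata: a 1-D segment at height z = 5 and a 2-D square at z = 0.
rng = np.random.default_rng(2)
line = np.column_stack([rng.uniform(size=300), np.full(300, 0.5), np.full(300, 5.0)])
square = np.column_stack([rng.uniform(size=(600, 2)), np.zeros(600)])
X = np.vstack([line, square])

local = localize(pca_dimension, X, k=20)
print("segment:", np.bincount(local[:300]), "square:", np.bincount(local[300:]))
```

Because `localize` only needs a function from a point patch to a number, any global estimator could be swapped in for `pca_dimension`.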
Learning nonlinear image manifolds by global alignment of local linear models
IEEE Trans. Pattern Analysis and Machine Intelligence
Cited by 5 (0 self)
Appearance-based methods, based on statistical models of the pixel values in an image (region) rather than geometrical object models, are increasingly popular in computer vision. In many applications, the number of degrees of freedom (DOF) in the image generating process is much lower than the number of pixels in the image. If there is a smooth function that maps the DOF to the pixel values, then the images are confined to a low-dimensional manifold embedded in the image space. We propose a method based on probabilistic mixtures of factor analyzers to 1) model the density of images sampled from such manifolds and 2) recover global parameterizations of the manifold. A globally nonlinear probabilistic two-way mapping between coordinates on the manifold and images is obtained by combining several locally valid linear mappings. We propose a parameter estimation scheme that improves upon an existing scheme and experimentally compare the presented approach to self-organizing maps, generative topographic mapping, and mixtures of factor analyzers. In addition, we show that the approach also applies to finding mappings between different embeddings of the same manifold.
Index Terms: feature extraction or construction, machine learning, statistical image representation
On dimensionality reduction for classification and its application
In IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 2006
Cited by 5 (2 self)
In this paper, we evaluate the contribution of the classification constrained dimensionality reduction (CCDR) algorithm to the performance of several classifiers. We present an extension of the previously introduced CCDR algorithm to multiple hypotheses. We investigate classification performance using the CCDR algorithm on hyperspectral satellite imagery data. We demonstrate the performance gain for both local and global classifiers, including a 10% improvement in the performance of the k-nearest neighbors algorithm. We present a connection between intrinsic dimension estimation and the optimal embedding dimension obtained using the CCDR algorithm.

