Results 1–10 of 67
Maximum likelihood estimation of intrinsic dimension
 In Advances in Neural Information Processing Systems
, 2005
"... We propose a new method for estimating intrinsic dimension of a dataset derived by applying the principle of maximum likelihood to the distances between close neighbors. We derive the estimator by a Poisson process approximation, assess its bias and variance theoretically and by simulations, and ap ..."
Abstract

Cited by 141 (7 self)
We propose a new method for estimating intrinsic dimension of a dataset derived by applying the principle of maximum likelihood to the distances between close neighbors. We derive the estimator by a Poisson process approximation, assess its bias and variance theoretically and by simulations, and apply it to a number of simulated and real datasets. We also show it has the best overall performance compared with two other intrinsic dimension estimators.
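The estimator the abstract describes admits a short closed form: for each point, the average log-ratio of the k-th nearest-neighbor distance to the closer neighbor distances, inverted and then averaged over the dataset. The sketch below is a minimal pure-Python rendering under that reading (brute-force neighbor search; `mle_dimension` and the choice `k=10` are illustrative, not the paper's exact procedure):

```python
import math
import random

def mle_dimension(points, k=10):
    """MLE-style intrinsic dimension estimate: for each point, invert the
    mean log-ratio of the k-th neighbor distance to the nearer neighbor
    distances, then average the per-point estimates."""
    estimates = []
    for i, x in enumerate(points):
        # distances to all other points (brute force for clarity)
        dists = sorted(math.dist(x, y) for j, y in enumerate(points) if j != i)
        tk = dists[k - 1]  # distance to the k-th nearest neighbor
        s = sum(math.log(tk / dists[j]) for j in range(k - 1))
        estimates.append((k - 1) / s)
    return sum(estimates) / len(estimates)

# Sanity check: a 2-D plane embedded in 5-D ambient space
random.seed(0)
pts = [(random.random(), random.random(), 0.0, 0.0, 0.0) for _ in range(500)]
print(mle_dimension(pts, k=10))  # close to 2
```

The brute-force O(n²) neighbor search keeps the sketch self-contained; any nearest-neighbor index could replace it.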
Nearest-neighbor searching and metric space dimensions
 In Nearest-Neighbor Methods for Learning and Vision: Theory and Practice
, 2006
"... Given a set S of n sites (points), and a distance measure d, the nearest neighbor searching problem is to build a data structure so that given a query point q, the site nearest to q can be found quickly. This paper gives a data structure for this problem; the data structure is built using the distan ..."
Abstract

Cited by 106 (0 self)
Given a set S of n sites (points), and a distance measure d, the nearest neighbor searching problem is to build a data structure so that given a query point q, the site nearest to q can be found quickly. This paper gives a data structure for this problem; the data structure is built using the distance function as a “black box”. The structure is able to speed up nearest neighbor searching in a variety of settings, for example: points in low-dimensional or structured Euclidean space, strings under Hamming and edit distance, and bit vector data from an OCR application. The data structures are observed to need linear space, with a modest constant factor. The preprocessing time needed per site is observed to match the query time. The data structure can be viewed as an application of a “kd-tree” approach in the metric space setting, using Voronoi regions of a subset in place of axis-aligned boxes.
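To make the "distance as a black box" idea concrete, here is a minimal vantage-point tree sketch: a simpler relative of the structure above that also touches points only through the distance function and prunes subtrees with the triangle inequality. `VPTree` and its median-split rule are illustrative assumptions, not the paper's exact design:

```python
import math
import random

class VPTree:
    """Minimal vantage-point tree: exact nearest-neighbor search using the
    distance function as a black box."""
    def __init__(self, points, dist):
        self.dist = dist
        self.root = self._build(list(points))

    def _build(self, pts):
        if not pts:
            return None
        vp = pts.pop(random.randrange(len(pts)))  # pick a vantage point
        if not pts:
            return (vp, 0.0, None, None)
        ds = [self.dist(vp, p) for p in pts]
        mu = sorted(ds)[len(ds) // 2]  # median distance splits the set
        inner = [p for p, d in zip(pts, ds) if d < mu]
        outer = [p for p, d in zip(pts, ds) if d >= mu]
        return (vp, mu, self._build(inner), self._build(outer))

    def nearest(self, q):
        best = [None, float("inf")]
        def search(node):
            if node is None:
                return
            vp, mu, inner, outer = node
            d = self.dist(q, vp)
            if d < best[1]:
                best[0], best[1] = vp, d
            # triangle inequality prunes whole subtrees
            if d < mu:
                search(inner)
                if d + best[1] >= mu:
                    search(outer)
            else:
                search(outer)
                if d - best[1] <= mu:
                    search(inner)
        search(self.root)
        return best[0]

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(200)]
tree = VPTree(pts, math.dist)
q = (0.5, 0.5)
print(tree.nearest(q) == min(pts, key=lambda p: math.dist(q, p)))  # True
```

The same skeleton works for any metric (Hamming, edit distance) by swapping the `dist` argument, which is the point of the black-box setting.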
Geodesic entropic graphs for dimension and entropy estimation in manifold learning
 IEEE Transactions on Signal Processing
, 2004
"... In the manifold learning problem, one seeks to discover a smooth low dimensional surface, i.e., a manifold embedded in a higher dimensional linear vector space, based on a set of measured sample points on the surface. In this paper, we consider the closely related problem of estimating the manifold ..."
Abstract

Cited by 97 (5 self)
In the manifold learning problem, one seeks to discover a smooth low dimensional surface, i.e., a manifold embedded in a higher dimensional linear vector space, based on a set of measured sample points on the surface. In this paper, we consider the closely related problem of estimating the manifold’s intrinsic dimension and the intrinsic entropy of the sample points. Specifically, we view the sample points as realizations of an unknown multivariate density supported on an unknown smooth manifold. We introduce a novel geometric approach based on entropic graph methods. Although the theory presented applies to this general class of graphs, we focus on the geodesic minimal spanning tree (GMST) to obtain asymptotically consistent estimates of the manifold dimension and the Rényi entropy of the sample density on the manifold. The GMST approach is striking in its simplicity and does not require reconstruction of the manifold or estimation of the multivariate density of the samples. The GMST method simply constructs a minimal spanning tree (MST) sequence using a geodesic edge matrix and uses the overall lengths of the MSTs to simultaneously estimate manifold dimension and entropy. We illustrate the GMST approach on standard synthetic manifolds as well as on real data sets consisting of images of faces.
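The mechanism behind MST-based dimension estimation is that the total MST length over n points sampled from a d-dimensional manifold grows like n^((d-1)/d), so the slope a of log-length against log-n gives d = 1/(1-a). The sketch below uses Euclidean edges in place of the geodesic edge matrix (`mst_length`, `mst_dimension`, and the sample sizes are illustrative assumptions, not the paper's GMST procedure):

```python
import math
import random

def mst_length(points):
    """Total edge length of the Euclidean MST (Prim's algorithm, O(n^2))."""
    n = len(points)
    in_tree = [False] * n
    best = [float("inf")] * n
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total

def mst_dimension(points, sizes=(100, 200, 400, 800)):
    """Estimate intrinsic dimension from MST length growth:
    L_n ~ n^((d-1)/d), so the log-log slope a gives d = 1/(1-a)."""
    xs, ys = [], []
    for n in sizes:
        sample = random.sample(points, n)
        xs.append(math.log(n))
        ys.append(math.log(mst_length(sample)))
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return 1.0 / (1.0 - a)

random.seed(0)
# points on a 2-D square embedded in 3-D
pts = [(random.random(), random.random(), 0.0) for _ in range(800)]
print(mst_dimension(pts))  # roughly 2
```

Replacing `math.dist` with graph-geodesic distances over a neighborhood graph would move the sketch closer to the GMST setting.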
Convergence of Laplacian eigenmaps
 In NIPS
, 2006
"... Geometrically based methods for various tasks of machine learning have attracted considerable attention over the last few years. In this paper we show convergence of eigenvectors of the point cloud Laplacian to the eigenfunctions of the LaplaceBeltrami operator on the underlying manifold, thus esta ..."
Abstract

Cited by 46 (3 self)
Geometrically based methods for various tasks of machine learning have attracted considerable attention over the last few years. In this paper we show convergence of eigenvectors of the point cloud Laplacian to the eigenfunctions of the Laplace-Beltrami operator on the underlying manifold, thus establishing the first convergence results for a spectral dimensionality reduction algorithm in the manifold setting.
Data Dimensionality Estimation Methods: A Survey
 Pattern Recognition
, 2003
"... In this paper, data dimensionality estimation methods are reviewed. The estimation of the dimensionality of a data set is a classical problem of pattern recognition. There are some good reviews [1] in literature but they do not include more recent developments based on fractal techniques and neural ..."
Abstract

Cited by 38 (1 self)
In this paper, data dimensionality estimation methods are reviewed. The estimation of the dimensionality of a data set is a classical problem of pattern recognition. There are some good reviews [1] in the literature but they do not include more recent developments based on fractal techniques and neural autoassociators. The aim of this paper is to provide an up-to-date survey of the dimensionality estimation methods of a data set, paying special attention to the fractal-based methods.
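The fractal techniques the survey highlights typically rest on the correlation integral: C(r), the fraction of point pairs within distance r, scales as r^d on a d-dimensional set. A two-scale Grassberger-Procaccia-style sketch (the function name and radii are illustrative assumptions):

```python
import math
import random

def correlation_dimension(points, r1, r2):
    """Two-scale correlation-dimension estimate: C(r) counts the fraction
    of point pairs within distance r, and C(r) ~ r^d gives
    d = log(C(r2)/C(r1)) / log(r2/r1)."""
    def corr(r):
        n = len(points)
        close = sum(1
                    for i in range(n)
                    for j in range(i + 1, n)
                    if math.dist(points[i], points[j]) < r)
        return 2.0 * close / (n * (n - 1))
    return math.log(corr(r2) / corr(r1)) / math.log(r2 / r1)

random.seed(0)
# a curve (a 1-D set) embedded in 3-D
pts = [(t, math.sin(3 * t), math.cos(3 * t))
       for t in (random.random() for _ in range(400))]
print(correlation_dimension(pts, 0.05, 0.2))  # near 1
```

In practice one fits the slope of log C(r) over a range of radii rather than just two, trading bias against noise at small scales.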
Translated Poisson mixture model for stratification learning
 Int. J. Comput. Vision
, 2000
"... A framework for the regularized and robust estimation of nonuniform dimensionality and density in high dimensional noisy data is introduced in this work. This leads to learning stratifications, that is, mixture of manifolds representing different characteristics and complexities in the data set. Th ..."
Abstract

Cited by 24 (2 self)
A framework for the regularized and robust estimation of non-uniform dimensionality and density in high dimensional noisy data is introduced in this work. This leads to learning stratifications, that is, mixture of manifolds representing different characteristics and complexities in the data set. The basic idea relies on modeling the high dimensional sample points as a process of Translated Poisson mixtures, with regularizing restrictions, leading to a model which includes the presence of noise. The Translated Poisson distribution is useful to model a noisy counting process, and it is derived from the noise-induced translation of a regular Poisson distribution. By maximizing the log-likelihood of the process counting the points falling into a local ball, we estimate the local dimension and density. We show that ...
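The counting process at the heart of this model has expected ball count growing like theta * r^m at a point of local dimension m, so two ball counts already pin down a crude local estimate. The sketch below uses that two-radius shortcut in place of the paper's regularized Poisson-mixture likelihood (`local_ball_dimension` and the radii are illustrative assumptions):

```python
import math
import random

def local_ball_dimension(points, x, r1, r2):
    """Local dimension from counts of points in balls around x: with a
    counting-process rate ~ theta * r^m, the ratio of two counts gives
    m = log(N(r2)/N(r1)) / log(r2/r1)."""
    def count(r):
        return sum(1 for p in points if 0 < math.dist(x, p) <= r)
    return math.log(count(r2) / count(r1)) / math.log(r2 / r1)

random.seed(0)
# a stratification: a 2-D plane and a 1-D line mixed in one 3-D cloud
plane = [(random.random(), random.random(), 0.0) for _ in range(2000)]
line = [(random.random(), 0.5, 0.5) for _ in range(2000)]
cloud = plane + line
print(local_ball_dimension(cloud, (0.5, 0.5, 0.0), 0.1, 0.3))  # ~2 on the plane
print(local_ball_dimension(cloud, (0.5, 0.5, 0.5), 0.1, 0.3))  # ~1 on the line
```

The mixed cloud illustrates why a single global estimate fails on stratifications: the local estimate differs by stratum, which is what the mixture model captures.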
Manifold-adaptive dimension estimation
 In ICML ’07: Proceedings of the 24th international conference on Machine learning
, 2007
"... Intuitively, learning should be easier when the data points lie on a lowdimensional submanifold of the input space. Recently there has been a growing interest in algorithms that aim to exploit such geometrical properties of the data. Oftentimes these algorithms require estimating the dimension of t ..."
Abstract

Cited by 24 (0 self)
Intuitively, learning should be easier when the data points lie on a low-dimensional submanifold of the input space. Recently there has been a growing interest in algorithms that aim to exploit such geometrical properties of the data. Oftentimes these algorithms require estimating the dimension of the manifold first. In this paper we propose an algorithm for dimension estimation and study its finite-sample behaviour. The algorithm estimates the dimension locally around the data points using nearest neighbor techniques and then combines these local estimates. We show that the rate of convergence of the resulting estimate is independent of the dimension of the input space and hence the algorithm is “manifold-adaptive”. Thus, when the manifold supporting the data is low dimensional, the algorithm can be exponentially more efficient than its counterparts that are not exploiting this property. Our computer experiments confirm the obtained theoretical results.
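A common nearest-neighbor local estimate of the kind this abstract describes compares the k-th and (k/2)-th neighbor distances: on a d-dimensional manifold, doubling the neighbor count roughly scales the radius by 2^(1/d), so d ≈ ln 2 / ln(r_k / r_{k/2}). The sketch below uses plain averaging to combine the local estimates (an illustrative stand-in for the paper's combination step, not its exact algorithm):

```python
import math
import random

def knn_dimension(points, k=10):
    """Local kNN dimension estimate d(x) = ln 2 / ln(r_k / r_{k/2}),
    where r_j is the distance from x to its j-th nearest neighbor;
    the local estimates are combined by simple averaging."""
    half = k // 2
    ests = []
    for i, x in enumerate(points):
        dists = sorted(math.dist(x, y) for j, y in enumerate(points) if j != i)
        ests.append(math.log(2) / math.log(dists[k - 1] / dists[half - 1]))
    return sum(ests) / len(ests)

random.seed(0)
# a 1-D circle embedded in 10-D ambient space
pts = [(math.cos(u), math.sin(u)) + (0.0,) * 8
       for u in (2 * math.pi * random.random() for _ in range(500))]
print(knn_dimension(pts, k=10))  # near 1
```

Note the ambient dimension (10 here) never enters the estimate's accuracy, which is the "manifold-adaptive" property the paper formalizes.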
Stratification learning: Detecting mixed density and dimensionality in high dimensional point clouds
 In Advances in NIPS 19
, 2006
"... The study of point cloud data sampled from a stratification, a collection of manifolds with possible different dimensions, is pursued in this paper. We present a technique for simultaneously soft clustering and estimating the mixed dimensionality and density of such structures. The framework is base ..."
Abstract

Cited by 19 (2 self)
The study of point cloud data sampled from a stratification, a collection of manifolds with possibly different dimensions, is pursued in this paper. We present a technique for simultaneously soft clustering and estimating the mixed dimensionality and density of such structures. The framework is based on a maximum likelihood estimation of a Poisson mixture model. The presentation of the approach is completed with artificial and real examples demonstrating the importance of extending manifold learning to stratification learning.