Results 1 - 10
of
358
Searching in metric spaces
, 2001
"... The problem of searching the elements of a set that are close to a given query element under some similarity criterion has a vast number of applications in many branches of computer science, from pattern recognition to textual and multimedia information retrieval. We are interested in the rather gen ..."
Abstract
-
Cited by 436 (38 self)
- Add to MetaCart
The problem of searching the elements of a set that are close to a given query element under some similarity criterion has a vast number of applications in many branches of computer science, from pattern recognition to textual and multimedia information retrieval. We are interested in the rather general case where the similarity criterion defines a metric space, instead of the more restricted case of a vector space. Many solutions have been proposed in different areas, in many cases without cross-knowledge. Because of this, the same ideas have been reconceived several times, and very different presentations have been given for the same approaches. We present some basic results that explain the intrinsic difficulty of the search problem. This includes a quantitative definition of the elusive concept of “intrinsic dimensionality. ” We also present a unified
Citeseer: an automatic citation indexing system
- INTERNATIONAL CONFERENCE ON DIGITAL LIBRARIES
, 1998
"... We present CiteSeer: an autonomous citation indexing system which indexes academic literature in electronic format (e.g. Postscript files on the Web). CiteSeer understands how to parse citations, identify citations to the same paper in different formats, and identify the context of citations in the ..."
Abstract
-
Cited by 292 (46 self)
- Add to MetaCart
(Show Context)
We present CiteSeer: an autonomous citation indexing system which indexes academic literature in electronic format (e.g. Postscript files on the Web). CiteSeer understands how to parse citations, identify citations to the same paper in different formats, and identify the context of citations in the body of articles. CiteSeer provides most of the advantages of traditional (manually constructed) citation indexes (e.g. the ISI citation indexes), including: literature retrieval by following citation links (e.g. by providing a list of papers that cite a given paper), the evaluation and ranking of papers, authors, journals, etc. based on the number of citations, and the identification of research trends. CiteSeer has many advantages over traditional citation indexes, including the ability to create more up-to-date databases which are not limited to a preselected set of journals or restricted by journal publication delays, completely autonomous operation with a corresponding reduction in cost, and powerful interactive browsing of the literature using the context of citations. Given a particular paper of interest, CiteSeer can display the context of how the paper is cited in subsequent publications. This context may contain a brief summary of the paper, another author's response to the paper, or subsequent work which builds upon the original article. CiteSeer allows the location of papers by keyword search or by citation links. Papers related to a given paper can be located using common citation information or word vector similarity. CiteSeer will soon be available for public use.
The Bayesian image retrieval system, PicHunter: Theory, implementation, and psychophysical experiments
- IEEE TRANSACTIONS ON IMAGE PROCESSING
, 2000
"... This paper presents the theory, design principles, implementation, and performance results of PicHunter, a prototype content-based image retrieval (CBIR) system that has been developed over the past three years. In addition, this document presents the rationale, design, and results of psychophysica ..."
Abstract
-
Cited by 226 (2 self)
- Add to MetaCart
(Show Context)
This paper presents the theory, design principles, implementation, and performance results of PicHunter, a prototype content-based image retrieval (CBIR) system that has been developed over the past three years. In addition, this document presents the rationale, design, and results of psychophysical experiments that were conducted to address some key issues that arose during PicHunter’s development. The PicHunter project makes four primary contributions to research on content-based image retrieval. First, PicHunter represents a simple instance of a general Bayesian framework we describe for using relevance feedback to direct a search. With an explicit model of what users would do, given what target image they want, PicHunter uses Bayes’s rule to predict what is the target they want, given their actions. This is done via a probability distribution over possible image targets, rather than by refining a query. Second, an entropy-minimizing display algorithm is described that attempts to maximize the information obtained from a user at each iteration of the search. Third, PicHunter makes use of hidden annotation rather than a possibly inaccurate/inconsistent annotation structure that the user must learn and make queries in. Finally, PicHunter introduces two experimental paradigms to quantitatively evaluate the performance of the system, and psychophysical experiments are presented that support the theoretical claims.
Near neighbor search in large metric spaces
- In Proceedings of the 21th International Conference on Very Large Data Bases
, 1995
"... Given user data, one often wants to find approximate matches in a large database. A good example of such a task is finding images similar to a given image in a large collection of images. We focus on the important and technically difficult case where each data element is high dimensional, or more ge ..."
Abstract
-
Cited by 216 (0 self)
- Add to MetaCart
(Show Context)
Given user data, one often wants to find approximate matches in a large database. A good example of such a task is finding images similar to a given image in a large collection of images. We focus on the important and technically difficult case where each data element is high dimensional, or more generally, is represented by a point in a large metric spaceand distance calculations are computationally expensive. In this paper we introduce a data structure to solve this problem called a GNAT- Geometric Near-neighbor Access Tree. It is based on the philosophy that the data structure should act as a hierarchical geometrical model of the data as opposed to a simple decomposition of the data that does not use its intrinsic geometry. In experiments, we find that GNAT’s outperform previous data structures in a number of applications.
Index-driven similarity search in metric spaces
- ACM Transactions on Database Systems
, 2003
"... Similarity search is a very important operation in multimedia databases and other database applications involving complex objects, and involves finding objects in a data set S similar to a query object q, based on some similarity measure. In this article, we focus on methods for similarity search th ..."
Abstract
-
Cited by 192 (8 self)
- Add to MetaCart
Similarity search is a very important operation in multimedia databases and other database applications involving complex objects, and involves finding objects in a data set S similar to a query object q, based on some similarity measure. In this article, we focus on methods for similarity search that make the general assumption that similarity is represented with a distance metric d. Existing methods for handling similarity search in this setting typically fall into one of two classes. The first directly indexes the objects based on distances (distance-based indexing), while the second is based on mapping to a vector space (mapping-based approach). The main part of this article is dedicated to a survey of distance-based indexing methods, but we also briefly outline how search occurs in mapping-based methods. We also present a general framework for performing search based on distances, and present algorithms for common types of queries that operate on an arbitrary “search hierarchy. ” These algorithms can be applied on each of the methods presented, provided a suitable search hierarchy is defined.
Finding Nearest Neighbors in Growth-restricted Metrics
- In 34th Annual ACM Symposium on the Theory of Computing
, 2002
"... Most research on nearest neighbor algorithms in the literature has been focused on the Euclidean case. In many practical search problems however, the underlying metric is non-Euclidean. Nearest neighbor algorithms for general metric spaces are quite weak, which motivates a search for other classes o ..."
Abstract
-
Cited by 170 (0 self)
- Add to MetaCart
(Show Context)
Most research on nearest neighbor algorithms in the literature has been focused on the Euclidean case. In many practical search problems however, the underlying metric is non-Euclidean. Nearest neighbor algorithms for general metric spaces are quite weak, which motivates a search for other classes of metric spaces that can be tractably searched.
Approximate nearest neighbors and Fast Johnson-Lindenstrauss Transform.
- Proceedings of the Symposium on Theory of Computing,
, 2006
"... ABSTRACT We introduce a new low-distortion embedding of The FJLT is faster than standard random projections and just as easy to implement. It is based upon the preconditioning of a sparse projection matrix with a randomized Fourier transform. Sparse random projections are unsuitable for low-distort ..."
Abstract
-
Cited by 156 (6 self)
- Add to MetaCart
ABSTRACT We introduce a new low-distortion embedding of The FJLT is faster than standard random projections and just as easy to implement. It is based upon the preconditioning of a sparse projection matrix with a randomized Fourier transform. Sparse random projections are unsuitable for low-distortion embeddings. We overcome this handicap by exploiting the "Heisenberg principle" of the Fourier transform, ie, its local-global duality. The FJLT can be used to speed up search algorithms based on lowdistortion embeddings in 1 and 2. We consider the case of approximate nearest neighbors in d 2 . We provide a faster algorithm using classical projections, which we then further speed up by plugging in the FJLT. We also give a faster algorithm for searching over the hypercube.
A Simple Algorithm for Nearest Neighbor Search in High Dimensions
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1997
"... Abstract—The problem of finding the closest point in high-dimensional spaces is common in pattern recognition. Unfortunately, the complexity of most existing search algorithms, such as k-d tree and R-tree, grows exponentially with dimension, making them impractical for dimensionality above 15. In ne ..."
Abstract
-
Cited by 153 (1 self)
- Add to MetaCart
(Show Context)
Abstract—The problem of finding the closest point in high-dimensional spaces is common in pattern recognition. Unfortunately, the complexity of most existing search algorithms, such as k-d tree and R-tree, grows exponentially with dimension, making them impractical for dimensionality above 15. In nearly all applications, the closest point is of interest only if it lies within a user-specified distance e. We present a simple and practical algorithm to efficiently search for the nearest neighbor within Euclidean distance e. The use of projection search combined with a novel data structure dramatically improves performance in high dimensions. A complexity analysis is presented which helps to automatically determine e in structured problems. A comprehensive set of benchmarks clearly shows the superiority of the proposed algorithm for a variety of structured and unstructured search problems. Object recognition is demonstrated as an example application. The simplicity of the algorithm makes it possible to construct an inexpensive hardware search engine which can be 100 times faster than its software equivalent. A C++ implementation of our algorithm is available upon request to search@cs.columbia.edu/CAVE/.
On aligning curves
- IEEE TPAMI
, 2003
"... We present a novel approach to finding a correspondence (alignment) between two curves. The correspondence is based on a notion of an alignment curve which treats both curves symmetrically. We then define a similarity metric based on the alignment curve using two intrinsic properties of the curve, ..."
Abstract
-
Cited by 133 (4 self)
- Add to MetaCart
We present a novel approach to finding a correspondence (alignment) between two curves. The correspondence is based on a notion of an alignment curve which treats both curves symmetrically. We then define a similarity metric based on the alignment curve using two intrinsic properties of the curve, namely, length and curvature. The optimal correspondence is found by an efficient dynamic-programming method both for aligning pairs of curve segments and pairs of closed curves, and is effective in the presence of a variety of transformations of the curve. Finally, the correspondence is shown in application to handwritten character recognition, prototype formation, and object recognition, and is potentially useful in other applications such as registration and tracking.
Efficient algorithms for geometric optimization
- ACM Comput. Surv
, 1998
"... We review the recent progress in the design of efficient algorithms for various problems in geometric optimization. We present several techniques used to attack these problems, such as parametric searching, geometric alternatives to parametric searching, prune-and-search techniques for linear progra ..."
Abstract
-
Cited by 114 (10 self)
- Add to MetaCart
We review the recent progress in the design of efficient algorithms for various problems in geometric optimization. We present several techniques used to attack these problems, such as parametric searching, geometric alternatives to parametric searching, prune-and-search techniques for linear programming and related problems, and LPtype problems and their efficient solution. We then describe a variety of applications of these and other techniques to numerous problems in geometric optimization, including facility location, proximity problems, statistical estimators and metrology, placement and intersection of polygons and polyhedra, and ray shooting and other query-type problems.