When Is "Nearest Neighbor" Meaningful?
- In Int. Conf. on Database Theory, 1999. Cited by 222 (1 self)
We explore the effect of dimensionality on the "nearest neighbor" problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10-15 dimensions. These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple...
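The concentration effect described in this abstract can be observed with a small experiment. The following is a minimal sketch, not the authors' experimental setup: it assumes uniformly distributed synthetic points, Euclidean distance, and a single random query, and the function name `relative_contrast` is introduced here only for illustration. It reports (d_max - d_min) / d_min, which shrinks toward zero as the nearest and farthest distances converge.

```python
import math
import random


def relative_contrast(dim, n_points=500, seed=0):
    """Return (d_max - d_min) / d_min for one random query.

    Assumption: points and query are drawn uniformly from the unit
    hypercube; this is an illustrative workload, not the paper's data.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast between nearest and farthest neighbor drops with dimension.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

Under these assumptions the printed contrast falls steadily as the dimension grows, which is the sense in which "nearest" stops being informative for such workloads.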
Fast and effective retrieval of medical tumor shapes
- IEEE Transactions on Knowledge and Data Engineering, 1998. Cited by 45 (0 self)
We investigate the problem of retrieving similar shapes from a large database; in particular, we focus on medical tumor shapes (“Find tumors that are similar to a given pattern.”). We use a natural similarity function for shape-matching, based on concepts from mathematical morphology, and we show how it can be lower-bounded by a set of shape features for safely pruning candidates, thus giving fast and correct output. These features can be organized in a spatial access method, leading to fast indexing for range queries and nearest-neighbor queries. In addition to the lower-bounding, our second contribution is the design of a fast algorithm for nearest-neighbor search, achieving significant speedup while provably guaranteeing correctness. Our experiments demonstrate that roughly 90 percent of the candidates can be pruned using these techniques, resulting in up to 27 times better performance compared to sequential scan.
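The pruning idea in this abstract relies on a cheap function that never exceeds the true distance. The sketch below shows a generic lower-bound-driven nearest-neighbor search; it is not the authors' algorithm or their morphological distance. The names `nearest_neighbor`, `exact_dist`, and `lower_bound` are hypothetical, and the demo uses Euclidean distance on 2-D points with a one-coordinate difference as the bound.

```python
import math
import random


def nearest_neighbor(query, items, exact_dist, lower_bound):
    """Return (best_item, best_dist, n_exact) using lower-bound pruning.

    Requires lower_bound(q, x) <= exact_dist(q, x) for every x, so a
    candidate whose bound is not below the best distance found so far
    can be skipped without losing the true nearest neighbor.
    """
    # Visit candidates in order of increasing lower bound.
    bounded = sorted(((lower_bound(query, x), x) for x in items),
                     key=lambda pair: pair[0])
    best_item, best_dist, n_exact = None, float("inf"), 0
    for bound, item in bounded:
        if bound >= best_dist:
            break  # every remaining bound is at least this large
        d = exact_dist(query, item)  # the expensive computation
        n_exact += 1
        if d < best_dist:
            best_item, best_dist = item, d
    return best_item, best_dist, n_exact


if __name__ == "__main__":
    rng = random.Random(1)
    points = [(rng.random(), rng.random()) for _ in range(200)]
    # Stand-ins: Euclidean distance, bounded below by the x-difference.
    item, dist, n_exact = nearest_neighbor(
        (0.5, 0.5), points, math.dist, lambda a, b: abs(a[0] - b[0]))
    print(item, dist, "exact evaluations:", n_exact, "of", len(points))
```

Because the bound never overestimates, the result matches a full sequential scan while evaluating the expensive distance on only part of the candidates.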
Improving the Query Performance of High-Dimensional Index Structures by Bulk Load Operations
- In Proc. Conference on Extending Database Technology, LNCS 1377, 1998. Cited by 44 (11 self)
In this paper, we propose a new bulk-loading technique for high-dimensional indexes, which represent an important component of multimedia database systems. Since it is very inefficient to construct an index for a large amount of data by dynamic insertion of single objects, there is an increasing interest in bulk-loading techniques. In contrast to previous approaches, our technique exploits a priori knowledge of the complete data set to improve both construction time and query performance. Our algorithm operates in a manner similar to the Quicksort algorithm and has an average runtime complexity of O(n log n). We additionally improve the query performance by optimizing the shape of the bounding boxes, by completely avoiding overlap, and by clustering the pages on disk.
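A recursive, partition-based bulk load of the kind this abstract alludes to can be sketched as follows. This is an illustrative simplification and not the published algorithm: it assumes splitting at the median of the dimension with the largest extent, and it uses a full sort at each level for brevity where a Quicksort-style method would use partial partitioning. The name `bulk_load` and the `capacity` parameter are hypothetical.

```python
def bulk_load(points, capacity):
    """Recursively split points into leaf pages of at most `capacity` points.

    Assumptions: `capacity` >= 1, all points have the same dimensionality,
    and the split rule (median of the widest dimension) is chosen only
    for illustration.
    """
    if len(points) <= capacity:
        return [points]
    dims = range(len(points[0]))
    # Split along the dimension in which the data spread is largest.
    split_dim = max(
        dims,
        key=lambda d: max(p[d] for p in points) - min(p[d] for p in points),
    )
    ordered = sorted(points, key=lambda p: p[split_dim])
    mid = len(ordered) // 2
    # Each half covers a separate range of split_dim (up to ties),
    # so the halves can be loaded independently.
    return bulk_load(ordered[:mid], capacity) + bulk_load(ordered[mid:], capacity)


if __name__ == "__main__":
    data = [(i % 7, i // 7, (i * 3) % 5) for i in range(50)]
    pages = bulk_load(data, capacity=4)
    print(len(pages), "pages; sizes:", [len(p) for p in pages])
```

Because the whole data set is known in advance, each split can be chosen from the actual data distribution, which is the advantage over inserting objects one at a time.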
A Cost Model for Query Processing in High-Dimensional Data Spaces
- 2000. Cited by 43 (0 self)
During the last decade, multimedia databases have become increasingly important in many application areas such as medicine, CAD, geography or molecular biology. An important research issue in the field of multimedia databases is similarity search in large data sets. Most current approaches addressing similarity search use the so-called feature approach which transforms important properties of the stored objects into points of a high-dimensional space (feature vectors). Thus, the similarity search is transformed into a neighborhood search in the feature space. For the management of the feature vectors, multidimensional index structures are usually applied. The performance of query processing can be substantially improved by opti...
Fast Nearest Neighbor Search in High-dimensional Space
- In Proceedings of the 14th International Conference on Data Engineering, 1998. Cited by 41 (0 self)
Similarity search in multimedia databases requires efficient support for nearest-neighbor search on a large set of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state-of-the-art approaches to nearest-neighbor search are not efficient in higher dimensions. In our new approach, we therefore precompute the result of any nearest-neighbor search, which corresponds to a computation of the Voronoi cell of each data point. In a second step, we store the Voronoi cells in an index structure efficient for high-dimensional data spaces. As a result, nearest-neighbor search corresponds to a simple point query on the index structure. Although our technique is based on a precomputation of the solution space, it is dynamic, i.e. it supports insertions of new data points. An extensive experimental evaluation of our technique demonstrates its high efficiency for uniformly distributed as well as real data. We obtained a significant reduction of the...
The Gauss-tree: Efficient object identification in databases of probabilistic feature vectors
- In Proc. ICDE, 2006. Cited by 16 (0 self)
In applications of biometric databases the typical task is to identify individuals according to features which are not exactly known. Reasons for this inexactness are varying measuring techniques or environmental circumstances. Since these circumstances are not necessarily the same when determining the features for different individuals, the exactness may vary strongly between the individuals as well as between the features. To identify individuals, similarity search on feature vectors is applicable, but even adaptable distance measures cannot handle objects that each have an individual level of exactness. Therefore, we develop a comprehensive probabilistic theory in which uncertain observations are modeled by probabilistic feature vectors (pfv), i.e. feature vectors where the conventional feature values are replaced by Gaussian probability distribution functions. Each feature value of each object is complemented by a variance value indicating its uncertainty. We define two types of identification queries, k-most-likely identification and threshold identification. For efficient query processing, we propose a novel index structure, the Gauss-tree. Our experimental evaluation demonstrates that pfv stored in a Gauss-tree significantly improve the result quality compared to traditional feature vectors. Additionally, we show that the Gauss-tree significantly speeds up query times compared to competitive methods.
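The two query types named in this abstract can be illustrated with a plain sequential scan. The sketch below is not the Gauss-tree and may differ from the paper's exact probabilistic model: it assumes independent features, a Gaussian likelihood whose variance is the sum of the query and object variances, and a uniform prior over the stored objects. All function names are hypothetical, and each pfv is represented as a list of (mean, standard deviation) pairs.

```python
import math


def log_density(query, obj):
    """Log-likelihood that two pfv describe the same individual.

    Assumed model: independent features, each compared with a Gaussian
    whose variance is the sum of both variances (must be positive).
    """
    total = 0.0
    for (mq, sq), (mo, so) in zip(query, obj):
        var = sq * sq + so * so
        total += -0.5 * math.log(2 * math.pi * var) - (mq - mo) ** 2 / (2 * var)
    return total


def posteriors(query, database):
    """Normalize likelihoods into probabilities (uniform prior assumed)."""
    logs = {name: log_density(query, obj) for name, obj in database.items()}
    top = max(logs.values())  # subtract the maximum for numerical stability
    weights = {name: math.exp(v - top) for name, v in logs.items()}
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}


def k_most_likely(query, database, k):
    """Return the k objects with the highest identification probability."""
    post = posteriors(query, database)
    return sorted(post.items(), key=lambda kv: kv[1], reverse=True)[:k]


def threshold_identification(query, database, threshold):
    """Return all objects whose probability reaches the threshold."""
    post = posteriors(query, database)
    return [(name, p) for name, p in post.items() if p >= threshold]


if __name__ == "__main__":
    db = {"a": [(0.0, 1.0), (0.0, 1.0)], "b": [(5.0, 1.0), (5.0, 1.0)]}
    q = [(0.1, 0.5), (0.0, 0.5)]
    print(k_most_likely(q, db, 1))
    print(threshold_identification(q, db, 0.9))
```

A scan like this touches every stored object per query; an index structure is what would avoid that cost on large databases.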
2D Projection Interval Relationships: A Symbolic Representation of Spatial Relationships
- In Proc. of 4th Int'l Symposium on Large Spatial Databases, 1995. Cited by 13 (0 self)
Spatial relationships are important ingredients for expressing constraints in retrieval systems for spatial databases. In this paper we propose a unified representation of spatial relationships, 2D Projection Interval Relationships (2D-PIR), that integrates both directional and topological relationships. We propose a graph representation for pictures that is based on 2D-PIR. This graph representation can be constructed efficiently and leads to an efficient algorithm for "picture matching".
Effective Similarity Search on Voxelized CAD Objects
- 2003. Cited by 11 (9 self)
Similarity search in database systems is becoming an increasingly important task in modern application domains such as multimedia, molecular biology, medical imaging and many others. Especially for CAD applications, suitable similarity models and a clear representation of the results can help to reduce the cost of developing and producing new parts by maximizing the reuse of existing parts. In this paper, we adapt two known similarity models to voxelized 3-D CAD data and introduce a new model based on eigenvectors. The experimental evaluation of our three similarity models is based on two real-world test datasets. Furthermore, we introduce hierarchical clustering as a new and effective way to analyse and compare similarity models. We show that both our similarity model and our evaluation procedure are suitable for industrial use.

