T. Bozkaya, M. Ozsoyoglu (1997). "Distance-based Indexing for High-dimensional Metric Spaces". ACM SIGMOD 97, pages 357-368.
....expensive operations is to find objects in the database that are similar to a given query object. Nearest neighbor search is a central requirement in such cases. There is a long stream of research on solving the nearest neighbor search problem, and many multidimensional indexes have been proposed [3, 4, 7, 8, 14, 16, 17, 18]. However, these index structures have largely been studied in the context of disk based systems where it is assumed that the databases are too large to fit into the main memory. This assumption is increasingly being challenged as RAM gets ....
....contribute significantly to the overall cost. Several main memory indexing schemes have been designed to be cache conscious [13, 15]. However, these schemes are targeted at single or low dimensional data. Moreover, for high dimensional data, distance calculations are computationally expensive [4]. Therefore an efficient main memory index should exploit the L2 cache effectively and minimize the distance computation to improve the performance. In this paper, we propose a novel multi tier index structure, called # tree, that can facilitate efficient KNN search in main memory environment. ....
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. of the ACM SIGMOD Conference, pages 357--368, 1997.
....with the user and situation. These mechanisms require search methods that can support the adaptive queries. The goal of our work is to create a search method for adaptive ellipsoid queries that can find similar objects efficiently. Various metric indices (e.g. the M tree [CPZ97] and the mvp tree [BO97]) have been proposed as indexing methods for arbitrary distance functions. However, these indices cannot be applied to systems that handle changeable distance functions and, thus, they are not functionally adequate to support adaptive ellipsoid queries. In [SK97] Seidl et al. presented a search ....
Tolga Bozkaya and Meral Ozsoyoglu: "Distance-Based Indexing for High-Dimensional Metric Spaces", in Proc. ACM SIGMOD International Conference on Management of Data, pp. 357--368, May 1997.
....of individual queries. First, our results indicate that while there exist situations in which high dimensional nearest neighbor queries are meaningful, they are very specific in nature and are quite different from the independent dimensions basis that most studies in the literature (e.g. [20, 10, 6, 4, 5]) use to evaluate techniques in a controlled manner. In the future, these NN technique evaluations should focus on those situations in which the results are meaningful. For instance, answers are meaningful when the data consists of small, well formed clusters, and the query is guaranteed to land ....
....at 10 dimensions in all cases. In [10] linear scan vastly outperforms the SR tree in all cases in this paper for the 16 dimensional synthetic dataset. For a 16 dimensional real dataset, the SR tree performs similarly to linear scan in a few experiments, but is usually beaten by linear scan. In [6], performance numbers are presented for NN queries where bounds are imposed on the radius used to find the NN. While the performance in high dimensionality looks good in some cases, in trying to duplicate their results we found that the radius was such that few, if any, queries returned an ....
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. 16th ACM SIGACT-SIGMOD-SIGART Symposium on PODS, pages 357--368, 1997.
....objects from one or more selected pivot point(s) where the distance is computed using a given distance function. Examples of DP based techniques that are distance based include SS tree, M tree and TV tree. Examples of SP based techniques that are distance based include vp tree [31] and mvp tree [19]. A comparison between the two classes can be found in [22]. [Figure notes: Corresponding representation. Internal nodes of kd trees maintain 2 split positions (lsp and rsp) instead of one to represent overlapping splits within the hybrid tree. Space partitioning index node. Partitions mutually ....]
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high dimensional metric spaces. Proc. of SIGMOD, 1997.
....queries. We do not use SAMS as an index structure, but as a way of resolving data conflicts, which are explained in detail in Chapter 5. Several spatial access methods have been proposed, such as R tree and its variants [7, 25, 32, 38, 61], TV tree [50], X tree [8], Metric tree [65], MVP tree [10], and M tree [13]. Those methods need a distance function that is used for comparing the data objects in the multi dimensional space. The distance function measures the (dis)similarity between two objects. It is defined for all attributes and sub objects of the considered objects. Dey et al. ....
T. Bozkaya and M. Ozsoyoglu, "Distance-Based Indexing For High-Dimensional Metric Spaces," Proceedings ACM SIGMOD International Conference on Management of Data, Tucson, Arizona, 1997.
....the criteria specified in a given query. The indexing methods that have been proposed to support this kind of retrieval are known as spatial access methods (SAMs) and metric trees. The former include SS tree [31], R tree [26], and grid files [11]; the latter include the vp tree [4], mvp tree [1], GNAT [2], and M tree [6]. While these methods are effective in some specialized image database applications, many open problems in indexing still remain. First, image feature vectors have high dimensions (e.g. some image feature vectors can have up to 100 dimensions). Since the ....
....to organize and partition the search space. The only requirement is that the distance function must be metric so that the triangle inequality property applies and can be used to prune the search space. Several metric trees have been developed so far, including the vp tree [4], the GNAT [2], the mvp tree [1], and M tree [7]. Our goal is not to develop a new indexing structure for high dimensional image features but to use an existing one effectively. We chose a very well established access method called the M tree as the [...] for indexing our reduced composite image features. The M trees ....
T. Bozkaya, M. Ozsoyoglu (1997) Distance-based indexing for high-dimensional metric spaces. In: SIGMOD'97, pp 357--368, Tucson, Ariz., USA
....uses as many pivots as space permits. There are many proximity search algorithms in metric spaces that are based on pivots, such as Burkhard Keller Tree [4], Fixed Queries Tree (FQT) [1], Fixed Height FQT (FHQT) [1], Fixed Queries Array (FQA) [5], Vantage Point Tree [13], Multi Vantage Point Tree [2], Excluded Middle Vantage Point Forest [14], AESA [12], Linear AESA (LAESA) [10], and Spaghettis [6]. 3 Efficiency criterion Depending on how pivots are selected, they can filter out less or more objects. We define in this section a criterion to tell which from two pivot sets is expected to filter ....
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. ACM SIGMOD International Conference on Management of Data, pages 357--368, 1997. Sigmod Record 26(2).
.... corresponding to $d(a, b) \in [x_1, x_2]$ then we can avoid entering in the subtree of b whenever $[d(q, a) - r,\ d(q, a) + r]$ has no intersection with $[x_1, x_2]$. Data structures using this idea are the bk tree and its variants [4,29], metric trees [31], tlaesa [20], and vp trees and variants [33,5,34]. Clustering algorithms. The second trend consists in dividing the space in zones as compact as possible, normally recursively, and storing a representative point (center) for each zone plus a few extra data that permits quickly discarding the zone at query time. Two criteria can be used to ....
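The pruning test in the excerpt above can be sketched as a small interval check. This is a minimal illustration only: the function name and parameter names are hypothetical, and the excerpt itself gives no code.

```python
def can_skip_subtree(d_qa: float, r: float, x1: float, x2: float) -> bool:
    """Return True when a range query cannot reach the child subtree.

    d_qa     -- d(q, a), the distance from the query q to the root a
    r        -- the query radius
    [x1, x2] -- the range of d(a, b) recorded for the child subtree
    (all names are assumptions made for this sketch)
    """
    lo, hi = d_qa - r, d_qa + r
    # Two closed intervals are disjoint when one ends before the other begins.
    return hi < x1 or lo > x2


if __name__ == "__main__":
    print(can_skip_subtree(10.0, 1.0, 0.0, 5.0))  # query ball lies beyond the band
    print(can_skip_subtree(4.0, 1.0, 0.0, 5.0))   # query ball overlaps the band
```

The test needs only d(q, a), which is computed once per node, so a whole subtree can be discarded without any further distance evaluations.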
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. ACM Conference on Management of Data (SIGMOD'97), pages 357-368, 1997. Sigmod Record 26(2).
....(c) Methods based on trees, like the R tree [29] and its derivatives (R tree [30], etc.). Recent extensions for high dimensions include the X tree [31] and the SR trees [32]. Methods referred to as metric trees or distance trees are based on the idea of indexing using distance information [33, 34, 35]. All these methods try to exploit the triangle inequality in order to prune the search space on a range query. However, none of them tries to map images into points in a target space (also known as feature space) nor to provide a tool for visualization. Besides, most of these methods require ....
Tolga Bozkaya and Meral Ozsoyoglu. Distance-Based Indexing for High-Dimensional Metric Spaces. In Proceedings of ACM SIGMOD, 1997.
....we show that a direct use of the Euclidean distance measure is not suitable for finding series containing similar cyclic components. A variety of indexing structures and techniques have been introduced recently for similarity searches of complex objects such as shape similarity for time series [6, 34, 22, 23, 7, 31, 8, 9, 18, 5]. We attack the problem from a different direction and our proposed approximation techniques may be combined with efficient indexing structures to achieve better performance. Recently, paper [19] applies wavelet transform on a database of 20,000 images and obtained the approximations by choosing ....
T. Bozkaya and Z. M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. ACM SIGMOD, pages 357--368, 1997.
....are at the same depth h, regardless of the bucket size. Vantage Point Trees (vp trees) [20, 22] are designed for continuous distance functions. The root has two equal size subtrees that divide the elements into those closer to and those farther from the root. This can be extended to m ary trees (mvp trees) [5, 4]. Finally, algorithms like AESA [21], LAESA [16, 15] and its variants [18, 8], and Fixed Queries Arrays (fq arrays) [9] are based on a common idea: k pivots are selected and each object is mapped to k coordinates which are its distances to the pivots. Later, the query q is also mapped and if it ....
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. SIGMOD '97, pages 357--368, 1997. Sigmod Record 26(2).
.... query time
BKT [19, 59]: n pointers, O(n log n), O(n^α)
FQT [5]: n to n log n pointers, O(n log n), O(n^α)
FHQT [5, 4, 6]: n to nh pointers, O(nh), O(log n), O(n^α)
FQA [24]: nhb bits, O(nh), O(log n), O(n^α log n)
VPT [62, 68, 26]: n pointers, O(n log n), O(log n)
MVPT [16, 15]: n pointers, O(n log n), O(log n)
VPF [69]: n pointers, O(n^(2−α)), O(n^(1−α) log n)
BST [44, 52]: n pointers, O(n log n), not analyzed
GHT [62, 18]: n pointers, O(n log n), not analyzed
GNAT [16]: nm^2 distances, O(nm log_m n), not analyzed
VT [32, 51, 63]: n ....
....before checking them. Finally, the author of [68] considers the problem of pivot selection and argues that it is better to take elements far away from the set. MVPT The VPT can be extended to m ary trees by using the m − 1 uniform percentiles instead of just the median. This is suggested in [16, 15]. In [15] the Multi Vantage Point Tree (MVPT) is presented. They propose the use of many elements in a single node, much as in [59]. It can be seen that the space is O(n) since each internal node needs to store the m percentiles but the leaves do not. The construction time is O(n log n) if we ....
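The step from the median to m − 1 uniform percentiles can be sketched as follows. This is a minimal illustration, not the construction used in the cited papers: the function names and the simple rank-based percentile rule are assumptions made here.

```python
def percentile_cutoffs(distances, m):
    """Return m - 1 cut-off distances that split `distances` into m groups.

    The i-th cut-off is taken at rank i*n/m of the sorted distances; this
    rank rule is an assumption of the sketch, one of several possible choices.
    """
    ordered = sorted(distances)
    n = len(ordered)
    return [ordered[(i * n) // m] for i in range(1, m)]


def child_index(dist, cutoffs):
    """Index of the child whose distance band contains `dist`."""
    for i, cutoff in enumerate(cutoffs):
        if dist < cutoff:
            return i
    return len(cutoffs)


if __name__ == "__main__":
    # Distances from a hypothetical vantage point to nine other elements.
    dists = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    cuts = percentile_cutoffs(dists, 3)
    print(cuts, [child_index(d, cuts) for d in dists])
```

With m = 2 the single cut-off is the median, which recovers the binary vp tree split described in the excerpt.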
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. ACM SIGMOD International Conference on Management of Data, pages 357--368, 1997. Sigmod Record 26(2).
....vectors, but not multi dimensional vectors like the ones used in databases. Therefore, people have begun to develop new indexing methods for content based retrieval in databases such as R tree [33], R tree [63], R tree [6], SR tree [39], Quad tree [21], k d tree [7], VP tree [71], MVP tree [9], and some other methods [8, 68]. 1.2 Problem Defined Generally, multimedia databases contain database objects with features approximately in Gaussian distributions and there usually exist some natural data clusters in the feature vector space (see Section 2.4 for ....
....Figure 2.8: Different data distributions. a) Mixture Gaussian. b) Super Gaussian. c) Uniform. MVP tree Multi vantage point tree (MVP tree) [9] is a distance based indexing method for similarity queries on high dimensional metric spaces. Like the VP tree, it uses vantage points for indexing. Unlike the VP tree, it uses more than one vantage point to partition the feature vector space. Experiments in [9] show that MVP tree outperforms the VP tree ....
Tolga Bozkaya and Meral Ozsoyoglu. "Distance-Based Indexing for High-dimensional Metric Spaces". SIGMOD Record, 26(2):357--368, June 1997.
.... provided that we set $r_{lo}$ and $r_{hi}$ to the proper values, that is, $r_{lo} = r_{i-1}$ and $r_{hi} = r_i$ for the child corresponding to $S_i$ (unless tighter bounds are maintained). Another variant of vp trees that achieves a higher fan out, termed the mvp tree, was suggested by Bozkaya and Ozsoyoglu [7, 8]. Each node in the mvp tree is essentially equivalent to the result of collapsing the nodes at several levels of a vp tree. There is one crucial difference between the mvp tree and the result of such collapsing: only one pivot is used for each level inside an mvp tree node (although the number of ....
.... three pivots would be needed in the corresponding vp tree). Observe that some subsets are partitioned using pivots that are not members of the sets, which does not occur in the vp tree (e.g. $p_2$ is used to partition the subset inside the ball around $p_1$ in Figure 14a). Bozkaya and Ozsoyoglu [7, 8] suggest using multiple partitions for each pivot, as discussed above. Hence, with k pivots per node and m partitions per pivot, the fan out of the nonleaf nodes is $m^k$. Furthermore, they propose storing, for each data object in a leaf node, the distances to some maximum number n of ancestral ....
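As a worked instance of the fan out count in the excerpt above (the values k = 2 and m = 3 are chosen here purely for illustration and do not come from the source):

```latex
\[
  \text{fan out} = m^{k}, \qquad k = 2,\ m = 3 \;\Longrightarrow\; 3^{2} = 9 .
\]
```

Each of the k pivots splits every existing region into m distance bands, so the region count is multiplied by m once per pivot.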
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proceedings of the ACM SIGMOD Conference, J. Peckham, ed., pages 357--368, Tucson, AZ, May 1997.
....of elements being placed in the outer half of the sphere, resulting in an unbalanced tree, requiring costly rebalancing operations. Some attempts to improve the behavior of this technique involve the use of multiple vantage point objects on each level of the tree, and exhibit some improvement [4]. In addition to the VP Tree, work has been done on GH Trees (Generalized Hyperplane Trees), which choose two (or perhaps more) elements at each level of the tree and partition the remaining elements into two sets: those closer to one element or the other. This technique has the problem that it ....
Tolga Bozkaya and Z. Meral Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Joan Peckham, editor, SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, May 13-15, 1997, Tucson, Arizona, USA, pages 357--368. ACM Press, 1997.
....the author of [Yianilos 1993] considers the problem of pivot selection and argues that it is better to take elements far away from the set. 5.1.2.2 MVPT. The VPT can be extended to m ary trees by using the m − 1 uniform percentiles instead of just the median. This is suggested in [Brin 1995; Bozkaya and Ozsoyoglu 1997]. In [Bozkaya and Ozsoyoglu 1997] the Multi Vantage Point Tree (MVPT) is presented. They propose the use of many elements in a single node, much as in [Shapiro 1977]. It can be seen that the space is O(n) since each internal node needs to store the m percentiles but the leaves do not. The ....
....the problem of pivot selection and argues that it is better to take elements far away from the set. 5.1.2.2 MVPT. The VPT can be extended to m ary trees by using the m − 1 uniform percentiles instead of just the median. This is suggested in [Brin 1995; Bozkaya and Ozsoyoglu 1997]. In [Bozkaya and Ozsoyoglu 1997], the Multi Vantage Point Tree (MVPT) is presented. They propose the use of many elements in a single node, much as in [Shapiro 1977]. It can be seen that the space is O(n) since each internal node needs to store the m percentiles but the leaves do not. The construction time is O(n log n) if we ....
Bozkaya, T. and Ozsoyoglu, M. 1997. Distance-based indexing for high-dimensional metric spaces. In Proc. ACM SIGMOD International Conference on Management of Data (1997), pp. 357--368. Sigmod Record 26(2).
....as threshold the median of the distances from the pivot to all its associated elements. This guarantees that the tree is well balanced. The VP tree is generalized to use more than one pivot per node and using arbitrary quantiles instead of just the median in the Multi Vantage Point Tree (MVP) (Bozkaya and Ozsoyoglu, 1997). Another generalization of the same idea is to use a forest instead of a tree (Yianilos, 1999) to eliminate backtracking in limited radius nearest neighbor search. A different trend of algorithms based on pivots stores the information in array form. For each database element a, its distance to ....
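The median threshold described in the excerpt above can be sketched with a small recursive build. This is only an illustration under stated assumptions: the pivot is simply the first element (the excerpt does not say how pivots are chosen), and the dictionary node layout and function name are hypothetical.

```python
import statistics


def build_vp_tree(items, dist):
    """Build a binary vantage point tree as nested dicts (illustrative layout)."""
    if not items:
        return None
    pivot, rest = items[0], items[1:]  # assumption: first element is the pivot
    if not rest:
        return {"pivot": pivot, "radius": 0.0, "inner": None, "outer": None}
    # Threshold = median distance from the pivot to its associated elements,
    # so roughly half of them fall on each side.
    radius = statistics.median([dist(pivot, x) for x in rest])
    inner = [x for x in rest if dist(pivot, x) <= radius]
    outer = [x for x in rest if dist(pivot, x) > radius]
    return {
        "pivot": pivot,
        "radius": radius,
        "inner": build_vp_tree(inner, dist),
        "outer": build_vp_tree(outer, dist),
    }


if __name__ == "__main__":
    tree = build_vp_tree([5, 1, 9, 3, 7], lambda a, b: abs(a - b))
    print(tree["pivot"], tree["radius"])
```

Many equal distances can still unbalance this sketch, since ties all go to the inner side; handling that case is outside what the excerpt covers.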
Bozkaya, T. and M. Ozsoyoglu: 1997, `Distance-based indexing for high-dimensional metric spaces'. In: Proc. ACM SIGMOD International Conference on Management of Data. pp. 357--368. Sigmod Record 26(2).
....form the basis for indexes designed for high dimensional databases [14, 20]. To reduce the effect of high dimensionalities, dimensionality reduction [8] and filter and refine methods [3, 19] have been proposed. Indexes were also specifically designed to facilitate metric based query processing [7, 9]. However, linear scan remains one of the best strategies for similarity search [6]. This is because there is a high tendency for data points to be equidistant to query points in a high dimensional space. More recently, the p sphere tree [10] was proposed to support approximate nearest neighbor ....
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. 1997 ACM SIGMOD International Conference on Management of Data, pages 357--368. 1997.
.... the triangle inequality to prune subtrees, that is, if a is the tree root and b is a child corresponding to $d(a, b) \in [x_1, x_2]$ then we can avoid entering in the subtree of b whenever $[d(q, a) - r,\ d(q, a) + r]$ has no intersection with $[x_1, x_2]$. Several data structures use this idea [3, 22, 14, 24, 4, 25]. Clustering algorithms. The second trend consists in dividing the space in zones as compact as possible, normally recursively, and storing a representative point (center) for each zone plus a few extra data that permits quickly discarding the zone at query time. Two criteria can be used to ....
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. ACM Conference on Management of Data (SIGMOD'97), pages 357-368, 1997. Sigmod Record 26(2).
.... pivots There are many proximity search algorithms in metric spaces that are based on the use of pivots, such as Burkhard Keller Tree (BKT) [5], Fixed Queries Tree (FQT) [2], Fixed Height FQT (FHQT) [2], Fixed Queries Array (FQA) [7], Vantage Point Tree (VPT) [12], Multi Vantage Point Tree (MVPT) [3], Excluded Middle Vantage Point Forest (VPF) [13], Approximating Eliminating Search Algorithm (AESA) [11], Linear AESA (LAESA) [10], and Spaghettis [6]. All these algorithms use, directly or indirectly, the following procedure to answer range queries: if the universe of objects is denoted by X, ....
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. ACM SIGMOD International Conference on Management of Data, pages 357--368, 1997. Sigmod Record 26(2).
....$d_{max}(v_i, Reg(N))$ bounds, since they depend on the kind of data regions managed by the index. For instance, in the M tree the above bounds are computed as $\max\{d(v_i, v_r) - r(v_r), 0\}$ and $d(v_i, v_r) + r(v_r)$, respectively [CPZ97]. Simple calculations are similarly required for other metric trees [Chi94, Bri95, BO97], as well as for spatial access methods, such as R tree (see [RKV95]). 4.1 False Drops at the Index Level The absence of any specific assumption about the similarity environment and the access method in Theorem 4 makes it impossible to guarantee the absence of false drops at the level of index ....
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, pages 357--368, Tucson, AZ, May 1997.
....a binary tree degenerates into a simple list of vantage points. Another method [20] is the generalized hyper plane tree (gh tree) which partitions the data set into two by picking two points as representatives and assigning the remaining to the closest representative. Bozkaya and Ozsoyoglu [7] [6] proposed an extension of the vp tree called multi vantage point tree (mvp tree) which chooses in a clever way m vantage points for a node which has a fanout of $m^2$. The Geometric Near Access Tree (GNAT) of Brin [8] can be viewed as a refinement of the second technique presented in [9]. It ....
T. Bozkaya and Z. M. Ozsoyoglu, "Distance-Based Indexing for High-Dimensional Metric Spaces," Proc. ACM Int'l Conference on Data Management (SIGMOD), Tucson, AZ, 1997, pp. 357-368.
....guide the search for approximate string matches [4, 11]. In [1] Baeza Yates and Gonnet solve the problem of exact substring joins, using suffix arrays and outside the context of a relational database. In the context of databases, several indexing techniques proposed for arbitrary metric spaces [3, 2] could be applied for the problem of approximately retrieving strings. However, such structures have to be supported by the database management system. Cohen [5] presented a framework for the integration of heterogeneous databases based on textual similarity and proposed WHIRL, a logic that ....
T. Bozkaya and Z. M. Ozsoyoglu. Distance based indexing for high dimensional metric spaces. In Proceedings of the 1997 ACM SIGMOD Conference on Management of Data, pages 357--368,1997.
....indexes designed for high dimensional databases [12, 17]. To reduce the effect of high dimensionalities, use of bigger nodes [3], dimensionality reduction [7], and filter and refine methods [2, 16] have been proposed. Indexes were also specifically designed to facilitate metric based query processing [6, 8]. However, linear scan remains one of the best strategies for KNN search [5]. This is because there is a high tendency for data points to be equidistant to query points in a high dimensional space. More recently, the p sphere tree [9] was proposed to support approximate nearest neighbor (NN) ....
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. 1997 ACM SIGMOD International Conference on Management of Data, pages 357-368. 1997.
....Pivot based algorithms can be thought of as a mapping from the original metric space to a k dimensional vector space, see section 4 for a discussion on this issue. The most important feature for the performance of a pivot based algorithm is the number of pivots used for the mapping. Some indexes [5, 2, 15, 3] can use only O(log n) pivots, since they partition the space hierarchically using one pivot [5, 2, 15] or more [3] per level. After log n levels each partition has only O(1) elements. In other schemas [1] the level of the tree is a parameter and the existence of very long paths in the tree is ....
....4 for a discussion on this issue. The most important feature for the performance of a pivot based algorithm is the number of pivots used for the mapping. Some indexes [5, 2, 15, 3] can use only O(log n) pivots, since they partition the space hierarchically using one pivot [5, 2, 15] or more [3] per level. After log n levels each partition has only O(1) elements. In other schemas [1] the level of the tree is a parameter and the existence of very long paths in the tree is allowed. The elimination of the backtracking for limited radius nearest neighbor searches has been proposed as an ....
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In ACM SIGMOD'97 Conference, pages 357-368, Tucson, Arizona, USA, 1997.
....so as to predict its probable future behavior) etc. Since the problem has appeared in unrelated areas, the corresponding algorithms and data structures seem to emerge from a great diversity, and different approaches have been proposed and analyzed separately, often under different assumptions [5, 20, 22, 19, 21, 23, 13, 15, 1, 4, 14, 18, 3, 11, 17, 7, 8, 24]. Due to space limitations we refer the reader to a recent survey where all the known approaches for similarity searching are discussed [9]. Currently, the only realistic way to compare two different algorithms is to apply them to the same data set. We present a unified complexity model for the ....
....Fig. 3. With two rings we define an equivalence based on being at the same distance to both points. However, the resulting class is partitioned. 6 Pivot Based and Clustering Algorithms A large class of methods to index metric spaces are just variants of what we call pivot based algorithms [5, 20, 22, 21, 23, 13, 15, 1, 14, 18, 3, 11, 7, 8, 24]. The idea is an extension of Example 3, using more pivots in order to decrease the external complexity. Instead of just one pivot, one selects h pivots $p_1, \ldots, p_h \in U$, and stores all the distances $d(u, p_i)$ for all $u \in U$. This set of distances is the index. Now, given a query ....
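The index described above, a table of distances from every element to each pivot, can be sketched as follows. The excerpt is cut off before the query step, so the filter used here (discard u when |d(q, p_i) − d(u, p_i)| > r for some pivot, which follows from the triangle inequality) is an assumption about the intended procedure; all function names are hypothetical.

```python
def build_pivot_table(universe, pivots, dist):
    """Store d(u, p_i) for every element u and every pivot p_i."""
    return [[dist(u, p) for p in pivots] for u in universe]


def range_query(q, r, universe, pivots, table, dist):
    """Return all u with d(q, u) <= r, using the table to skip candidates."""
    dq = [dist(q, p) for p in pivots]
    results = []
    for u, row in zip(universe, table):
        # Triangle inequality: |d(q, p) - d(u, p)| <= d(q, u), so a gap
        # larger than r on any pivot proves that u is outside the query ball.
        if any(abs(a - b) > r for a, b in zip(dq, row)):
            continue
        # Survivors are verified with a real distance computation.
        if dist(q, u) <= r:
            results.append(u)
    return results


if __name__ == "__main__":
    def d(a, b):
        return abs(a - b)

    universe = [0, 2, 5, 9, 10]
    pivots = [0, 10]
    table = build_pivot_table(universe, pivots, d)
    print(range_query(4, 1.5, universe, pivots, table, d))
```

The filter step uses only stored numbers plus the h distances from q to the pivots; direct distance computations are spent only on elements that no pivot can exclude.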
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. ACM SIGMOD International Conference on Management of Data, pages 357--368, 1997. Sigmod Record 26(2).
....are at the same depth h, regardless of the bucket size. Vantage Point Trees (vp trees) [17, 19] are designed for continuous distance functions. The root has two equal size subtrees that divide the elements into those closer to and those farther from the root. This can be extended to m ary trees (mvp trees) [5, 4]. Generalized hyperplane trees (gh trees) [17] use two pivots for each tree node and divide the space according to which of the two pivots is closer to each object. If this is generalized to an m ary partition then a Geometric Near neighbor Access Tree (gna tree) is obtained [5] which makes a ....
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. SIGMOD'97, pages 357-368, 1997. Sigmod Record 26(2).
....and allows overlaps in the areas covered (i.e. a point may belong to more than one partition). This idea is also present in R trees [36] for vector spaces. MVPT The VPT can be extended to m ary trees by using the m − 1 uniform percentiles instead of just the median. This is suggested in [16, 15]. In [15] the Multi Vantage Point Tree (MVPT) is presented. They propose the use of many elements in a single node, much as in [51]. It can be seen that the space is O(n) since each internal node needs to store the m percentiles but the leaves do not. The construction time is O(n log n) if we ....
....overlaps in the areas covered (i.e. a point may belong to more than one partition). This idea is also present in R trees [36] for vector spaces. MVPT The VPT can be extended to m ary trees by using the m − 1 uniform percentiles instead of just the median. This is suggested in [16, 15]. In [15], the Multi Vantage Point Tree (MVPT) is presented. They propose the use of many elements in a single node, much as in [51]. It can be seen that the space is O(n) since each internal node needs to store the m percentiles but the leaves do not. The construction time is O(n log n) if we search ....
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. ACM SIGMOD International Conference on Management of Data, pages 357--368, 1997. Sigmod Record 26(2).
....are at the same depth h, regardless of the bucket size. Vantage Point Trees (vp trees) [13, 15] are designed for continuous distance functions. The root has two equal size subtrees that divide the elements into those closer to and those farther from the root. This can be extended to m ary trees (mvp trees) [5, 4]. Generalized hyperplane trees (gh trees) [13] use two pivots for each tree node and divide the space according to which of the two pivots is closer to each object. If this is generalized to an m ary partition then a Geometric Near neighbor Access Tree (gna tree) is obtained [5] which makes a ....
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. SIGMOD '97, pages 357--368, 1997. Sigmod Record 26(2).
....are at the same depth h, regardless of the bucket size. Vantage Point Trees (vp trees) [18, 20] are designed for continuous distance functions. The root has two equal size subtrees that divide the elements into those closer to and those farther from the root. This can be extended to m ary trees (mvp trees) [4, 3]. Finally, algorithms like AESA [19], LAESA [14, 13] and its variants [16, 7], and Fixed Queries Arrays (fq arrays) [8] are based on a common idea: k pivots are selected and each object is mapped to k coordinates which are its distances to the pivots. Later, the query q is also mapped and if it ....
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. SIGMOD'97, pages 357--368, 1997. Sigmod Record 26(2).
....the leaves are at the same depth h, regardless of the bucket size. Vantage Point Trees (VPTs) [36, 39] are designed for continuous distance functions. The root has two equal size subtrees that divide the elements into those closer to and those farther from the root. This can be extended to m ary trees (MVPTs) [10, 9]. Finally, algorithms like AESA [37], LAESA [31, 30] and its variants [33, 13], and Fixed Queries Arrays (FQAs) [14] are based on a common idea: k pivots are selected and each object is mapped to k coordinates which are its distances to the pivots. Later, the query q is also mapped and if it ....
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. ACM SIGMOD International Conference on Management of Data, pages 357--368, 1997. Sigmod Record 26(2).
.... Complexity query time
BKT [19, 58]: n pointers, O(n log n), O(n^α)
FQT [5]: n to n log n pointers, O(n log n), O(n^α)
FHQT [5, 4, 6]: n to nh pointers, O(nh), O(log n), O(n^α)
FQA [24]: nhb bits, O(nh), O(log n), O(n^α log n)
VPT [61, 67, 25]: n pointers, O(n log n), O(log n)
MVPT [16, 15]: n pointers, O(n log n), O(log n)
VPF [68]: n pointers, O(n^(2−α)), O(n^(1−α) log n)
BST [43, 51]: n pointers, O(n log n), not analyzed
GHT [61, 18]: n pointers, O(n log n), not analyzed
GNAT [16]: nm^2 distances, O(nm log_m n), not analyzed
VT [31, 50, 62]: n pointers, O(n log n), not ....
....before checking them. Finally, the author of [67] considers the problem of pivot selection and argues that it is better to take elements far away from the set. MVPT The VPT can be extended to m ary trees by using the m − 1 uniform percentiles instead of just the median. This is suggested in [16, 15]. In [15] the Multi Vantage Point Tree (MVPT) is presented. They propose the use of many elements in a single node, much as in [58]. It can be seen that the space is O(n) since each internal node needs to store the m percentiles but the leaves do not. The construction time is O(n log n) if we ....
[Article contains additional citation context not shown here]
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. ACM SIGMOD International Conference on Management of Data, pages 357-368, 1997. Sigmod Record 26(2).
....selection of a centroid distributes the distances better. The problem with the algorithm [33] is that it needs O(n^2) space and build time. In this sense it is close to [25]. This is unacceptably high for all but very small databases. Some approaches designed for continuous distance functions [31, 37, 8, 9, 12, 24] are not covered in this brief review. The reason is that these structures do not use all the information obtained from the comparisons, since this cannot be done in continuous spaces. It can, however, be done (and it is done) in discrete spaces and this fact makes the reviewed structures superior ....
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. Manuscript.
....from the pivot to all its associated elements. A more complete work on the same idea is presented in the Vantage Point Trees (VPT) [19]. This tree is generalized to use more than one pivot per node and arbitrary quantiles instead of just the median in the Multi Vantage Point Trees (MVP) [6]. Another generalization of the same idea is to use a forest instead of a tree [20] to eliminate backtracking in limited-radius nearest neighbor search in high dimensions. There is a trend of algorithms based simply on the use of k pivots, with little or no search structure. For each database ....
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. Manuscript.
....scans as a sanity check. First, our results indicate that while there exist situations in which high dimensional nearest neighbor queries are meaningful, they are very specific in nature and are quite different from the independent dimensions basis that most studies in the literature (e.g. [31, 19, 14, 10, 11]) use to evaluate techniques in a controlled manner. In the future, these NN technique evaluations should focus on those situations in which the results are meaningful. For instance, answers are meaningful when the data consists of small, well formed clusters, and the query is guaranteed to land ....
....at 10 dimensions in all cases. In [19] linear scan vastly outperforms the SR tree in all cases in this paper for the 16 dimensional synthetic data set. For a 16 dimensional real data set, the SR tree performs similarly to linear scan in a few experiments, but is usually beaten by linear scan. In [14], performance numbers are presented for NN queries where bounds are imposed on the radius used to find the NN. While the performance in high dimensionality looks good in some cases, in trying to duplicate their results we found that the radius was such that few, if any, queries returned an answer. ....
Bozkaya, T., Ozsoyoglu, M.: Distance-Based Indexing for High-Dimensional Metric Spaces. In Proc. 16th ACM SIGACT-SIGMOD-SIGART Symposium on PODS (1997) 357--368
....researchers have developed many new indexing methods for content-based retrieval in multimedia databases. For example, rectangle-based indexing as in R Tree [6], R Tree [11], R Tree [1], SR Tree [7]; partition-based indexing as in Quad tree [5], k d Tree [2], VP Tree [4, 13] and MVP tree [3]. However, one major problem of these indexing techniques has been that these methods fail to utilize the underlying data distribution to their advantage in their indexing structure. This results in what is known as the boundary query problem where the retrieval precision will degrade when a ....
Tolga Bozkaya and Meral Ozsoyoglu. "Distance-Based Indexing for High-dimensional Metric Spaces". SIGMOD Record, 26(2):357--368, June 1997.
....method for the physical clustering of vp tree nodes. We compared the costs of the n nearest neighbor search with R tree and M tree by experiments, and show that the search of vp tree is considerably more efficient. 2. The update problem has also been left open for the vp tree and its variants [6]. We propose mechanisms for update operations on the vp tree. We investigate two alternatives in the insert operation: split first and redistribute first techniques; and two alternatives in the delete operation: merge first and redistribute first. All of these techniques preserve the balanced tree ....
....the coordinates of the N objects on one axis in the first recursive call, and those on the other axis in the second recursive call. 2.2 Distance Based Index Structures. Quite a number of distance based indexing structures have been proposed. A summary of some of these methods can be found in [6, 7]. Previous work includes techniques suggested in [8], which contains some of the basic ideas for later methods, the generalized hyperplane tree (gh tree) [36], the vantage point tree (vptree) [36, 39, 10], the Geometric Near neighbor Access Tree (GNAT) [7], the mvp tree [6] which is a variation of ....
[Article contains additional citation context not shown here]
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proceedings of the ACM SIGMOD International Conference on the Management of Data, 1997.
....in a database and index them based on their content has triggered a lot of research on multidimensional index structures. Some of the recently proposed techniques include TV trees [23], X tree [3], SS tree [34], SR tree [21], M tree [8], hB tree [24], LSDh tree [17], vp tree [7] and mvp trees [6]. In this section, we develop a classification of multidimensional indexing techniques which allows us to compare the hybrid tree with the previous research in this area. The classification is summarized in Figure 1. Since we have already discussed dimensionality 3 Ordering techniques (e.g. ....
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high dimensional metric spaces. Proc. of SIGMOD, 1997.
....tree degenerates into a simple list of vantage points. Another method of Uhlmann [4] is the generalized hyperplane tree (gh tree). The gh tree partitions the data set into two by picking two points as representatives and assigning the remaining points to the closest representative. Bozkaya and Ozsoyoglu [7] proposed an extension of the vp tree called multi vantage point tree (mvp tree) which chooses in a clever way m vantage points for a node which has a fanout of m^2. The Geometric Near Access Tree (GNAT) of Brin [8] can be viewed as a refinement of another technique presented in [3]. All ....
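The gh tree partitioning step described in the snippet above can be sketched in a few lines. This is only an illustration of one split: the snippet does not say how the two representatives are picked, so they are passed in explicitly here, and the function name, tie-breaking rule, and Euclidean metric are assumptions.

```python
import math

def gh_partition(points, rep_a, rep_b, dist=math.dist):
    """Assign each remaining point to the closer of two representatives."""
    side_a, side_b = [], []
    for p in points:
        if p == rep_a or p == rep_b:
            continue  # the representatives themselves stay in the node
        # ties go to rep_a (an arbitrary choice for this sketch)
        if dist(p, rep_a) <= dist(p, rep_b):
            side_a.append(p)
        else:
            side_b.append(p)
    return side_a, side_b

points = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0),
          (8.0, 8.0), (9.0, 7.0), (10.0, 10.0)]
side_a, side_b = gh_partition(points, rep_a=(0.0, 0.0), rep_b=(10.0, 10.0))
```

Applying the same step recursively to `side_a` and `side_b` would yield a binary tree; unlike the median split of a vp tree, nothing here forces the two sides to be the same size.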
Bozkaya, T., Ozsoyoglu, Z.M. Distance-Based Indexing for High-Dimensional Metric Spaces, ACM-SIGMOD (1997) 357-368.
....bounds, since they depend on the kind of data regions managed by the index. For instance, in the M tree the above bounds are computed as $\max\{d(v_i, v_r) - r(v_r), 0\}$ and $d(v_i, v_r) + r(v_r)$, respectively [CPZ97]. Simple calculations are similarly required for other metric trees [Chi94, Bri95, BO97] as well as for spatial access methods, such as R tree (see [RKV95]). 5.1 False Drops at the Index Level. The generality of Theorem 1 does not come at no price. In particular, the absence of any specific assumption about the similarity environment and the access method makes it impossible ....
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, pages 357--368, Tucson, AZ, May 1997.
....they claim that it outperforms the SS tree and the SR tree, by comparing their improvement relative to the R tree. One caveat in the above approach is that the design of the X tree is based on the assumption that data and query points are uniformly distributed. The VP tree. The VP trees [82, 90, 16] use a simple and intuitive idea. Given a set of points P, we select a point p to be the vantage point. We compute the distance from p to all other points in P, and we use these distances to compare the points. Formally, for some point x ∈ P, let Π_p(x) = d(p, x). Given two points x and y in ....
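The snippet above stops before the tree itself is defined. As a hedged sketch of the general idea, the code below builds a binary vp tree by splitting on the median of the distances d(p, x) to the vantage point, and answers a range query by pruning subtrees with the triangle inequality. The dictionary node layout, the choice of the first point as vantage point, and the Euclidean metric are assumptions for illustration, not details from the cited papers.

```python
import math
import statistics

def build_vp_tree(points, dist=math.dist):
    """Build a binary vp tree; each node is a dict, None marks an empty set."""
    if not points:
        return None
    vantage, rest = points[0], points[1:]  # first point as vantage (arbitrary)
    if not rest:
        return {"vp": vantage, "mu": 0.0, "inner": None, "outer": None}
    ds = [dist(vantage, x) for x in rest]
    mu = statistics.median(ds)  # split radius around the vantage point
    inner = [x for x, d in zip(rest, ds) if d <= mu]
    outer = [x for x, d in zip(rest, ds) if d > mu]
    return {"vp": vantage, "mu": mu,
            "inner": build_vp_tree(inner, dist),
            "outer": build_vp_tree(outer, dist)}

def range_search(node, q, r, dist=math.dist, out=None):
    """Collect all stored points within distance r of q."""
    out = [] if out is None else out
    if node is None:
        return out
    d = dist(q, node["vp"])
    if d <= r:
        out.append(node["vp"])
    if d - r <= node["mu"]:  # query ball may reach the inner region
        range_search(node["inner"], q, r, dist, out)
    if d + r > node["mu"]:   # query ball may reach the outer region
        range_search(node["outer"], q, r, dist, out)
    return out

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0),
          (5.0, 1.0), (6.0, 6.0), (2.0, 2.0)]
tree = build_vp_tree(points)
found = range_search(tree, q=(2.5, 2.5), r=1.0)
```

The result should match a linear scan over `points`; the tree only reduces how many distances have to be computed when a subtree can be pruned.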
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proceedings of the ACM SIGMOD International Conference on Management of Data, volume 26,2, pages 357--368, New York, May 13--15 1997.
....as content information on which the distance calculations will be based. Images can also be compared on a pixel by pixel basis by calculating the distance between two images as the accumulation of the differences between the intensities of their pixels. 1 A preliminary version of this paper ([BO97]) appeared in ACM SIGMOD 1997. 2 This research is partially supported by the National Science Foundation grant IRI 92 24660, and the National Science Foundation FAW award IRI90 24152 2 In all the applications above, the problem is to find data items similar to a given query item where the ....
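The pixel by pixel comparison mentioned in the snippet above can be illustrated with a small function. The snippet does not specify how the differences are accumulated, so this sketch assumes the sum of absolute intensity differences (an L1 distance) over grayscale images stored as lists of rows; the function name and sample images are hypothetical.

```python
def pixel_distance(img_a, img_b):
    """Sum of absolute intensity differences over corresponding pixels.

    Both images are lists of rows of numeric intensities and must have
    the same dimensions.
    """
    same_shape = len(img_a) == len(img_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(img_a, img_b)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    return sum(
        abs(a - b)
        for row_a, row_b in zip(img_a, img_b)
        for a, b in zip(row_a, row_b)
    )

img_a = [[0, 10], [20, 30]]
img_b = [[0, 12], [25, 30]]
d_ab = pixel_distance(img_a, img_b)
```

Because this accumulation is an L1 distance it satisfies the triangle inequality, which is the property distance-based indexes rely on.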
T. Bozkaya, M. Ozsoyoglu, "Distance-Based Indexing for High-Dimensional Metric Spaces", Proceedings of the 1997 ACM SIGMOD Conference, Tucson, pages 357-368, 1997,
No context found.
T. Bozkaya, M. Ozsoyoglu.(1997) "Distance-based Indexing for High-dimensional Metric Spaces" ACM SIGMOD 97, page 357-368.
No context found.
T. Bozkaya and Z. M. Ozsoyoglu. Distance based indexing for high dimensional metric spaces. In Proceedings of the 1997.
No context found.
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In SIGMOD '97: Proceedings of the 1997.
No context found.
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. of ACM SIGMOD, pp. 357--368, 1997.
No context found.
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. ACM SIGMOD International Conference on Management of Data, pages 357-368, 1997.
No context found.
T. Bozkaya and M. Ozsoyoglu, "Distance-Based Indexing for High-dimensional Metric Spaces", in SIGMOD Record, volume 26, pages 357--368, 1997.
No context found.
T. Bozkaya and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Proc. ACM Conference on Management of Data (SIGMOD'97), pages 357--368, 1997. Sigmod Record 26(2).