35 citations found.
D. Shasha, Tsong-Li Wang, New techniques for best match retrieval, ACM Trans. Inform. Systems 8 (1990) 140-158.


This paper is cited in the following contexts:


Extracting Classification Knowledge of Internet.. - Lin, Shih, Chen, Ho, .. (1998)   (3 citations)  (Correct)

....When a query consisting of one or more terms is submitted to the system, the index is applied to rapidly locate documents that contain these terms. Once documents are retrieved, a VSM-based [3] similarity measurement is employed to rank the documents. A popular ranking algorithm, TFxIDF [14], combines Term Frequency and Inverse Document Frequency to estimate the metric of relevance between the terms of documents and the query. Those systems can search relevant documents efficiently. However, the index size is usually larger than the size of original documents. Huge physical memory ....

Shasha, D., and Wang, T., "New Techniques for Best-Match Retrieval", ACM Transactions on Office Information Systems, Vol. 8, No. 2, January 1990, pp. 140-158.


Finding Motifs in Time Series - Lin, Keogh, Lonardi, Patel (2002)   (2 citations)  (Correct)

....first calculate the distance $D(Q,C_a)$ and discover it to be 7. At this point we should continue on to measure $D(Q,C_b)$, but in fact we don't have to do this calculation. We can use the triangular inequality to discover that $D(Q,C_b)$ could not be a match to Q. The triangular inequality requires that [2, 22, 33]: $D(Q,C_a) \le D(Q,C_b) + D(C_a,C_b)$ (8). Filling in the known values give us $7 \le D(Q,C_b) + 2$ (9). Rearranging the terms gives us $5 \le D(Q,C_b)$ (10). But since we are only interested in subsequences that are a distance less than 1 unit away, there is no point in determining the exact value of $D(Q,C_b)$ ....

....the exact value of $D(Q,C_b)$, which we now know to be at least 5 units away. The first formalization of this idea for fast searching of nearest neighbors in matrices is generally credited to Burkhard and Keller [5]. More efficient implementations are possible; for example, Shasha and Wang [33] introduced the Approximation Distance Map (ADM) algorithm that takes advantage of an arbitrary set of pre-computed distances instead of using just one randomly chosen reference point. For the problem at hand, however, the techniques discussed above seem of little utility, since as previously ....
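
The pruning argument in the excerpt above can be sketched in a few lines of Python. This is only an illustration of the triangular-inequality bound, not code from the cited paper; the function names `lower_bound` and `can_prune` are hypothetical, and the numbers 7, 2, and 1 are the ones used in the excerpt.

```python
def lower_bound(d_q_a: float, d_a_b: float) -> float:
    """Smallest value D(Q, Cb) can take, given D(Q, Ca) and D(Ca, Cb).

    Follows from the triangular inequality in both directions:
    D(Q, Cb) >= |D(Q, Ca) - D(Ca, Cb)|.
    """
    return abs(d_q_a - d_a_b)


def can_prune(d_q_a: float, d_a_b: float, radius: float) -> bool:
    """True if Cb cannot lie strictly within `radius` of Q.

    In that case the exact distance D(Q, Cb) never needs to be computed.
    """
    return lower_bound(d_q_a, d_a_b) >= radius


# Values from the excerpt: D(Q, Ca) = 7, D(Ca, Cb) = 2, match radius = 1.
print(lower_bound(7, 2))   # lower bound on D(Q, Cb)
print(can_prune(7, 2, 1))  # whether Cb can be skipped
```

The bound costs one subtraction, so it pays off whenever the distance function itself is expensive.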

[Article contains additional citation context not shown here]

Shasha, D. & Wang, T. (1990). New techniques for bestmatch retrieval. ACM Trans. on Information Systems, Vol. 8(2). pp 140-158.


Mining Motifs in Massive Time Series Databases - Patel, Keogh, Lin, Lonardi (2002)   (2 citations)  (Correct)

....calculate the distance $D(Q,C_a)$ and discover it to be 7. At this point we should continue on to measure $D(Q,C_b)$, but in fact we don't have to do this calculation. We can use the triangular inequality to discover that $D(Q,C_b)$ could not be a match to Q. The triangular inequality requires that [2, 22, 33]: $D(Q,C_a) \le D(Q,C_b) + D(C_a,C_b)$ (7). Filling in the known values give us $7 \le D(Q,C_b) + 2$ (8). Rearranging the terms gives us $5 \le D(Q,C_b)$ (9). But since we are only interested in subsequences that are a distance less than 1 unit away, there is no point in determining the exact value of ....

....the exact value of $D(Q,C_b)$, which we now know to be at least 5 units away. The first formalization of this idea for fast searching of nearest neighbors in matrices is generally credited to Burkhard and Keller [5]. More efficient implementations are possible; for example, Shasha and Wang [33] introduced the Approximation Distance Map (ADM) algorithm that precomputes an arbitrary set of distances instead of using just one randomly chosen reference point. For the problem at hand, however, the techniques discussed above seem of little utility, since as previously noted, we are unlikely ....

[Article contains additional citation context not shown here]

Shasha, D. & Wang, T. (1990). New techniques for best-match retrieval. ACM Trans. on Information Systems, Vol. 8(2). pp 140-158.


Discovering Informative Content Blocks from Web Documents - Lin, Ho (2002)   (2 citations)  (Correct)

....according to the weight distribution of features appearing in a page cluster. For easy calculation of each feature's entropy, features of content blocks in a page can be grouped and represented as a feature-document list with term frequency (TF) or weight (such as TFxIDF [16] or its variations [18]). Considering all pages in a cluster, these lists of pages form the feature-document matrix (F-D Matrix). The F-D matrix can be generated while extracting features of documents in the cluster with the time complexity O( D F log F ) where F is the average number of features and F log F is ....

Shasha, D. and Wang, T., "New Techniques for Best-Match Retrieval," ACM Transactions on Office Information System, 8(2):140-158, 1990.


Searching in Metric Spaces - Chávez, Navarro, Baeza-Yates, .. (1999)   (8 citations)  (Correct)

....on the fly whether to take more pivots, while FQTs must precompute that decision (i.e. bucket size). The problem with the algorithm [64] is that it needs O(n^2) space and construction time. This is unacceptably high for all but very small databases. In this sense the approach is close to [58], although in this latter case they may take fewer distances and bound the unknown ones. AESA is experimentally shown to have O(1) query time. LAESA and variants. In a newer version of AESA, called LAESA (for Linear AESA) [47], they propose to use k fixed pivots, so that the space and construction ....

D. Shasha and T. Wang. New techniques for best-match retrieval. ACM Trans. on Information Systems, 8(2):140--158, 1990.


Searching in Metric Spaces - Chávez, Navarro, Baeza-Yates, .. (1999)   (8 citations)  (Correct)

....the fly whether to take more pivots, while FQTs must precompute that decision (i.e. bucket size). The problem with the algorithm [Vidal 1986] is that it needs O(n^2) space and construction time. This is unacceptably high for all but very small databases. In this sense the approach is close to [Shasha and Wang 1990], although in this latter case they may take fewer distances and bound the unknown ones. AESA is experimentally shown to have O(1) query time. 5.1.3.2 LAESA and variants. In a newer version of AESA, called LAESA (for Linear AESA) [Micó et al. 1994], they propose to use k fixed pivots, so that ....
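
The idea of k fixed pivots mentioned in the excerpt above can be illustrated with a simplified pivot filter. This is a sketch under stated assumptions, not LAESA itself: it answers a range query rather than a nearest-neighbor query, and the names `build_pivot_table` and `range_search` are hypothetical. The table costs O(kn) space instead of the O(n^2) needed to store all pairwise distances.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def build_pivot_table(db: Sequence[T], pivots: Sequence[T],
                      dist: Callable[[T, T], float]) -> List[List[float]]:
    """Precompute d(x, p) for every database object x and every pivot p."""
    return [[dist(x, p) for p in pivots] for x in db]


def range_search(query: T, radius: float, db: Sequence[T], pivots: Sequence[T],
                 table: List[List[float]],
                 dist: Callable[[T, T], float]) -> List[T]:
    """Return all objects within `radius` of `query`.

    An object is discarded without computing d(query, x) whenever the
    triangular inequality gives |d(query, p) - d(x, p)| > radius for some pivot p.
    """
    q_to_p = [dist(query, p) for p in pivots]
    hits = []
    for x, row in zip(db, table):
        bound = max((abs(qp - xp) for qp, xp in zip(q_to_p, row)), default=0.0)
        if bound > radius:
            continue  # pruned: cannot be within the radius
        if dist(query, x) <= radius:
            hits.append(x)
    return hits


# Toy usage on integers with absolute difference as the metric.
db = [0, 3, 10, 11, 25]
pivots = [0, 25]
metric = lambda a, b: abs(a - b)
table = build_pivot_table(db, pivots, metric)
print(range_search(9, 2, db, pivots, table, metric))
```

In this toy run only two of the five objects survive the filter, so only two exact distance computations are made beyond the two query-to-pivot distances.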

Shasha, D. and Wang, T. 1990. New techniques for best-match retrieval. ACM Trans. on Information Systems 8, 2, 140--158.


Extracting Classification Knowledge of Internet Documents with.. - Lin (1998)   (3 citations)  (Correct)

....document. When a query consisting of one or more terms is submitted to the system, the index is applied to rapidly locate documents that contain these terms. Once documents are retrieved, a VSM-based [3] similarity measurement is employed to rank the documents. A popular ranking algorithm, TFIDF [14], combines Term Frequency and Inverse Document Frequency to estimate the metric of relevance between the terms of documents and the query. Those systems can search relevant documents efficiently. However, the index size is usually larger than the size of original documents. Huge physical memory ....

Shasha, D., and Wang, T., "New Techniques for Best-Match Retrieval", ACM Transactions on Office Information Systems, Vol. 8, No. 2, January 1990, pp. 140-158.


Indexing Multimedia Databases - Faloutsos   (Correct)

....y) ≤ d(x; w) + d(w; y). Methods: (1) Branch and bound, searching a cluster hierarchy [FN75]. Can be applied with R-trees. (2) Pre-compute distances from some points: single point ("star") [BK73], multiple points [Sha77], arbitrary topologies [SW90]. Typically 20-80% of the file is searched. 4.5 Conclusions. Among the SAMs, Z-ordering (linear quadtrees) and R-trees seem the most promising methods. 5 ACCESS METHODS FOR TEXT. Applications: captions of multimedia objects; library automation ....

Dennis Shasha and Tsong-Li Wang. New techniques for best-match retrieval. ACM TOIS, 8(2):140--158, April 1990.


An Efficient Content-based Retrieval Method using.. - Unknown (1998)   (Correct)

[Citation context not recoverable: the extracted text is mis-encoded. Legible fragments: "2.2 ... (Multi-dimensional indexing)", "Shasha ... Wang [25] ... (triangular inequality)", "(quadratic)".]

D. Shasha, T-L. Wang. New techniques for best-match retrieval. ACM TOIS, 8(2):140--158, April 1990.


A Unified Model for Similarity Searching - Chávez, Navarro..   (Correct)

....so as to predict its probable future behavior), etc. Since the problem has appeared in unrelated areas, the corresponding algorithms and data structures seem to emerge from a great diversity, and different approaches have been proposed and analyzed separately, often under different assumptions [5, 20, 22, 19, 21, 23, 13, 15, 1, 4, 14, 18, 3, 11, 17, 7, 8, 24]. Due to space limitations we refer the reader to a recent survey where all the known approaches for similarity searching are discussed [9]. Currently, the only realistic way to compare two different algorithms is to apply them to the same data set. We present a unified complexity model for the ....

D. Shasha and T. Wang. New techniques for best-match retrieval. ACM Trans. on Information Systems, 8(2):140--158, 1990.


Searching in Metric Spaces - Chávez, Navarro, Baeza-Yates, .. (1999)   (8 citations)  (Correct)

....on the fly whether to take more pivots, while FQTs must precompute that decision (i.e. bucket size). The problem with the algorithm [55] is that it needs O(n^2) space and construction time. This is unacceptably high for all but very small databases. In this sense the approach is close to [50], although in this latter case they may take less distances and bound the unknown ones. AESA is experimentally shown to have O(1) query time. LAESA and variants. In a newer version of AESA, called LAESA (for Linear AESA) [42], they propose to use k fixed pivots, so that the space and construction ....

D. Shasha and T. Wang. New techniques for best-match retrieval. ACM Trans. on Information Systems, 8(2):140--158, 1990.


Searching in Metric Spaces - Chávez, Navarro, Baeza-Yates, .. (1999)   (8 citations)  (Correct)

....on the fly whether to take more pivots, while FQTs must precompute that decision (i.e. bucket size). The problem with the algorithm [63] is that it needs O(n^2) space and construction time. This is unacceptably high for all but very small databases. In this sense the approach is close to [57], although in this latter case they may take fewer distances and bound the unknown ones. AESA is experimentally shown to have O(1) query time. LAESA and variants. In a newer version of AESA, called LAESA (for Linear AESA) [46], they propose to use k fixed pivots, so that the space and construction ....

D. Shasha and T. Wang. New techniques for best-match retrieval. ACM Trans. on Information Systems, 8(2):140-158, 1990.


Fast Approximate String Matching in a Dictionary - Baeza-Yates, Navarro (1998)   (Correct)

....This is because the distance distribution tends to be very centered (which is bad for all range search algorithms) and the selection of a centroid distributes the distances better. The problem with the algorithm [33] is that it needs O(n^2) space and build time. In this sense it is close to [25]. This is unacceptably high for all but very small databases. Some approaches designed for continuous distance functions [31, 37, 8, 9, 12, 24] are not covered in this brief review. The reason is that these structures do not use all the information obtained from the comparisons, since this cannot ....

D. Shasha and T. Wang. New techniques for best-match retrieval. ACM TOIS, 8(2):140--158, 1990.


Indexing Large Metric Spaces for Similarity Search Queries - Bozkaya, Ozsoyoglu (1999)   (17 citations)  (Correct)

....cliques at each level, and keeping their representatives in the nodes to direct or trim the search. Note that keys may appear in more than one clique; so the aim is to select the representative keys to be the ones that appear in as many cliques as possible. In another approach, Shasha and Wang [SW90] suggested using pre-computed distances between data elements to efficiently answer similarity search queries. The aim is to minimize the number of distance computations as much as possible, as they are assumed to be very expensive. Search algorithms of O(n) or even O(n log n) (where n is the ....

....is to minimize the number of distance computations as much as possible, as they are assumed to be very expensive. Search algorithms of O(n) or even O(n log n) (where n is the number of data objects) are acceptable if they minimize the number of distance computations. In Shasha and Wang's method [SW90], a table of size O(n^2) keeps the distances between data objects if they are pre-computed. The other pairwise distances are estimated (by specifying an interval) by making use of the pre-computed distances. The technique of storing and using pre-computed distances may be effective for data ....

D. Shasha, T. Wang, "New Techniques for Best-Match Retrieval", ACM Transactions on Information Systems, 8(2), pages 140-158, 1990.


Issues in Using Knowledge to Perform Similarity Searching in .. - Leonard Brown The   (Correct)

....object implies that a distance function, D, exists that can measure the similarity. This function accepts two multimedia objects as input and returns some value. This distance function must be defined so that its returned value is nonnegative, it is symmetric, and it obeys the triangle inequality [SW90]. The most intuitive method to find the nearest neighbors of a query object, Q, is to compute the distance from it to each of the objects in the database. Once all the distances are known, they can be sorted in ascending order. The smallest distance corresponds to the nearest neighbor. The ....
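
The "most intuitive method" described in the excerpt above (compute every distance, sort ascending, take the smallest) can be sketched as follows. The function name `k_nearest` and the toy metric are assumptions made for illustration; the excerpt does not prescribe an implementation.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_nearest(query: T, database: Sequence[T],
              dist: Callable[[T, T], float], k: int = 1) -> List[Tuple[float, T]]:
    """Brute-force k-nearest-neighbor search.

    Calls `dist` once per database object, which is exactly the cost
    that pruning methods based on the triangle inequality try to avoid.
    """
    scored = [(dist(query, obj), obj) for obj in database]
    scored.sort(key=lambda pair: pair[0])  # ascending by distance, stable
    return scored[:k]


# Toy usage: integers under absolute difference, which is nonnegative,
# symmetric, and obeys the triangle inequality.
print(k_nearest(5, [1, 4, 9, 6], lambda a, b: abs(a - b), k=2))
```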

....objects. The process of computing the similarity may be extremely complex or costly [SK98]. As a result of this, many researchers have proposed methods of finding the k nearest neighbors of a query object that reduce the number of times the distance function, D, must be computed in the database [SW90, BK73, SK98]. These algorithms tend to determine a way of cheaply approximating the distance between two multimedia objects. These approximations are used to find and eliminate objects in the database that cannot possibly be the nearest neighbor to the query based on some known actual distance computations. ....

[Article contains additional citation context not shown here]

Shasha, Dennis and Tsong-Li Wang, "New Techniques for Best-Match Retrieval", ACM Transactions on Information Systems, Volume 8, Number 2, April 1990, pp. 140-158.


Efficient and Effective Querying by Image Content - Faloutsos, Equitz.. (1994)   (270 citations)  (Correct)

....to be increased communication between the vision and the database communities for such problems, and it is exactly this gap that this paper tries to bridge. 2.2 Multi-dimensional indexing. Within the database community, approximate matching has been attracting increasing interest. Shasha and Wang [43] proposed an indexing method that uses the triangular inequality and some precomputed distances to prune the search. Aurenhammer [2] surveys recent research on Voronoi diagrams, along with their use for nearest neighbor queries. Jagadish [24] suggested using a few minimum bounding rectangles to ....

Dennis Shasha and Tsong-Li Wang. New techniques for best-match retrieval. ACM TOIS, 8(2):140--158, April 1990.


An Indexing Scheme for Fast Similarity Search in Large Time.. - Keogh, Pazzani (1999)   (15 citations)  (Correct)

....a copy of one or more subsequences of the original data, together with a precomputed distance matrix which contains the distance between every possible pair of subsequences in that bin. This distance matrix enables pruning of the search space within the bin by using the triangular inequality [21]. When given a query, the retrieval algorithm achieves sub-linear search times by using three techniques. 1) Whole bin pruning: A simple procedure exists that, given the current best match, determines whether any element in a particular bin may be more similar to the query. This permits a branch ....

....the results. Others, including Shatkay and Zdonik [11], recognize that a piece-wise linear representation greatly reduces the required storage and search space for a time series, but they do not suggest a robust distance measure or indexing technique. 6.2 Indexing Techniques. Shasha and Wang [21] proposed a very general indexing method called Approximate Distance Map (ADM). ADM uses the triangular inequality combined with precomputed distances to prune the search space. Figure 8: A comparison of the representational power of linear segments ....

Shasha, D., & Wang, T. L., (1990). New techniques for best-match retrieval. ACM Transactions on Information Systems, Vol. 8, No. 2, April 1990, pp. 140-158.


ACIRD: Intelligent Internet Documents Organization and.. - Lin, Chen, Ho, Huang (2002)   (Correct)

....the improvement of retrieval efficiency by using indexing and query reformulation techniques. The first step of word-based document processing is to extract words from documents based on pre-constructed dictionary, stoplist, and stemming rules. Once words are extracted, a widely used method, TF-IDF [9, 23], is applied to determine the weights of words. Term frequency (TF) is the number of occurrences of a word in a document and inverse document frequency (IDF) is the inverse of the document frequency, defined as the number of documents in which the word occurs. The weight of a word can be determined ....
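
A minimal sketch of the TF-IDF weighting described in the excerpt above. The excerpt is cut off before it gives the exact weight formula, so the common variant tf * log(N / df) is assumed here; the function name `tf_idf` and the tokenized toy documents are likewise illustrative.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each word in each document by tf * log(N / df).

    tf: occurrences of the word in the document.
    df: number of documents containing the word.
    N:  total number of documents.
    The log(N / df) form of IDF is an assumed, commonly used variant.
    """
    n_docs = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))  # count each word once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return weights


# Toy usage: words are assumed to be already extracted and stemmed.
docs = [["index", "query", "index"], ["query", "rank"]]
for weights in tf_idf(docs):
    print(weights)
```

A word that occurs in every document gets weight 0 under this variant, which is why "query" carries no weight in the toy corpus.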

D. Shasha, and T. Wang, "New Techniques for Best-Match Retrieval", ACM Transactions on Office Information Systems, Vol. 8, No. 2, January 1990, pp. 140-158.


Proximity Matching Using Fixed-Queries Trees - Baeza-Yates, Cunto, Manber, Wu   (Correct)

....are close, under some distance function (we study in detail the Hamming and Levenshtein distance functions), to a query element. This problem has been extensively studied for distance functions based on a lexicographical order (closest neighbor or closest point problem). We refer the reader to [SW90, Mur83] for techniques that assume linear orderings, Euclidean, and similar distance functions. The problem is much harder, however, for distance functions that are not related to a linear ordering. The practical approach for finding all elements in a database close to a query element is typically to ....

....studied the effect of approximating the Levenshtein distance using simpler distance functions that are easier to compute and bound from above the original function. However, the reduction in the computation of the distance was traded by the number of extra comparisons needed. Shasha and Wang [SW90] extended BK-trees to any set of precomputed distances, using them in an optimal way. They compute an approximate distance map of the database to guide the search by using a Floyd-Warshall-style algorithm of O(n^3) running time. They study empirically the effect of the number of precomputed ....
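
To make the idea of bounding unknown distances from a set of precomputed ones concrete, here is a hedged sketch in the Floyd-Warshall style the excerpt mentions. It is not the authors' algorithm: the function name `approximate_distance_map` is hypothetical, and the code only guarantees that the returned intervals are valid under the triangle inequality, not that they are the tightest possible.

```python
import math
from typing import Dict, List, Tuple


def approximate_distance_map(
    n: int, known: Dict[Tuple[int, int], float]
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (lo, hi) with lo[i][j] <= d(i, j) <= hi[i][j] for all pairs.

    `known` maps object-index pairs to precomputed distances of a metric.
    Runs in O(n^3) time using two triple loops.
    """
    lo = [[0.0] * n for _ in range(n)]
    hi = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (i, j), d in known.items():
        lo[i][j] = lo[j][i] = d
        hi[i][j] = hi[j][i] = d
    # Upper bounds: shortest path through known distances.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                hi[i][j] = min(hi[i][j], hi[i][k] + hi[k][j])
    # Lower bounds: d(i, j) >= d(i, k) - d(k, j) and d(i, j) >= d(k, j) - d(i, k).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                lo[i][j] = max(lo[i][j],
                               lo[i][k] - hi[k][j],
                               lo[k][j] - hi[i][k])
    return lo, hi


# Toy usage: three objects, two precomputed distances, one unknown pair (0, 2).
lo, hi = approximate_distance_map(3, {(0, 1): 7, (1, 2): 2})
print(lo[0][2], hi[0][2])
```

With an interval in hand, a search can discard object j for query i as soon as lo[i][j] exceeds the current search radius, without ever calling the distance function.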

Shasha, D. and Wang, T-L. "New Techniques for Best-Match Retrieval", ACM Transactions on Information Systems 8, 1990, 140--158.


Indexing Large Metric Spaces For Similarity Search Queries - Bozkaya, al. (2002)   (17 citations)  (Correct)

....cliques at each level, and keeping their representatives in the nodes to direct or trim the search. Note that keys may appear in more than one clique; so the aim is to select the representative keys to be the ones that appear in as many cliques as possible. In another approach, Shasha and Wang [SW90] suggested using pre-computed distances between data elements to efficiently answer similarity search queries. The aim is to minimize the number of distance computations as much as possible, as they are assumed to be very expensive. Search algorithms of O(n) or even O(n log n) (where n is the ....

....is to minimize the number of distance computations as much as possible, as they are assumed to be very expensive. Search algorithms of O(n) or even O(n log n) (where n is the number of data objects) are acceptable if they minimize the number of distance computations. In Shasha and Wang's method [SW90], a table of size O(n^2) keeps the distances between data objects if they are pre-computed. The other pairwise distances are estimated (by specifying an interval) by making use of the pre-computed distances. The technique of storing and using pre-computed distances may be effective for data ....

D. Shasha, T. Wang, "New Techniques for Best-Match Retrieval", ACM Transactions on Information Systems, 8(2), pages 140-158, 1990.


An Artistic Design System - Eaglestone (1994)   (Correct)

....as an interface. Objects can be retrieved by describing some of their perceptual properties, and then projecting along the axes corresponding to the other uninstantiated perceptual properties. The fuzzy nature of this type of querying means it can be viewed as an extension of the class 2 search in [10, 70]. It is therefore appropriate to return objects which are in some sense closest to the query key. In fact the object space metaphor provides an implementation of a notion of distance between objects. A further advantage is the ability to browse or explore object spaces. This can be ....

Shasha, D., Wang, T-L.: New Techniques for Best-Match Retrieval. ACM TOIS 8:2, 1990, pp. 140-158.


FastMap: A Fast Algorithm for Indexing, Data-Mining and.. - Faloutsos, Lin (1995)   (40 citations)  (Correct)

.... methods using linear quadtrees [Gar82] or, equivalently, the z-ordering [Ore86, Ore90] or other space-filling curves [FR89, Jag90b]; and finally (c) methods that use grid files [NHS84, HN83]. There are also retrieval methods for the case where only the triangular inequality holds [BK73, Sha77, SW90, BYCMW94]. All these methods try to exploit the triangular inequality in order to prune the search space on a range query. However, none of them tries to map objects into points in "target space", nor to provide a tool for visualization. Finally, our work could be beneficial to research on ....

Dennis Shasha and Tsong-Li Wang. New techniques for best-match retrieval. ACM TOIS, 8(2):140--158, April 1990.


Fast Nearest-Neighbor Search Algorithms Based on.. - Ramasubramanian, al. (2000)   (1 citation)  (Correct)

No context found.

D. Shasha, Tsong-Li Wang, New techniques for best match retrieval, ACM Trans. Inform. Systems 8 (1990) 140-158.


A Study Of Several Specific Secure Two-Party Computation Problems - Du (2001)   (3 citations)  (Correct)

No context found.

Dennis Shasha and Tsong-Li Wang. New techniques for best-match retrieval. ACM Transactions on Information Systems, 8(2):140-158, 1990.


Fast Full-Search Equivalent Nearest-Neighbour Search Algorithms - Chua (1999)   (Correct)

No context found.

D. Shasha and T. Wang. New techniques for best-match retrieval. ACM Transactions on Information Systems, 8(2):140--158, 1990.



CiteSeer.IST - Copyright Penn State and NEC