A. Gionis, D. Gunopulos, and N. Koudas. Efficient and Tunable Similar Set Retrieval. SIGMOD Conference. 2001.
....high dimensional datasets of the UCI KDD Archive [22] collected from real application domains, the majority of the attributes are categorical. In addition, set data types (e.g. market basket transactions) are frequently used to describe complex data in object-oriented, object-relational systems [11]. In this paper we show how a hierarchical index can be used to efficiently process similarity search and other related query types on sets and categorical data. In contrast to a previous method [1], the signature tree (SG tree) is suitable for a dynamic environment with frequent updates and ....
....not straightforward. To our knowledge, the only method previously proposed for similarity search in set and categorical data spaces is [1]. Due to its high relevance to our approach, we describe it in detail in the following paragraph. The similarity search problem for sets has also been studied in [11], where hash-based indexes which provide approximate results are proposed. In this paper, we deal with the problem of finding the exact answers to queries, thus our method is not directly comparable to these indexes. Finally, a hierarchical index similar to the SG tree was proposed in [7] ....
A. Gionis, D. Gunopulos, and N. Koudas. Efficient and Tunable Similar Set Retrieval. SIGMOD Conference. 2001.
....In order to be competitive, our measure should also try to be tractable in polynomial time. In [GH 96] the idea of calculating similarities between sets, using the Jaccard coefficient, is investigated. The indexing issue for distance similarity between sets of values is treated in recent work [GGK01], again using the Jaccard coefficient to calculate the similarity. [BFS02] also investigate this with their mediator approach. The traditional cosine measure from the Information Retrieval literature (see [SM83]) has the same behavior as the Jaccard coefficient. As a matter of fact, it can be ....
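For readers unfamiliar with the two measures named in the excerpt above, the following is a minimal sketch of the Jaccard coefficient and the cosine measure applied to plain sets. It is not taken from any of the cited papers: the function names, the treatment of empty sets, and the sample baskets are assumptions made for illustration only.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: |A ∩ B| / |A ∪ B|.

    Assumption: two empty sets are treated as identical (similarity 1.0).
    """
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def cosine(a: set, b: set) -> float:
    """Cosine of the binary indicator vectors: |A ∩ B| / sqrt(|A| * |B|).

    Assumption: similarity is 0.0 when either set is empty.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical market-basket transactions sharing two of their three items.
basket_1 = {"bread", "milk", "eggs"}
basket_2 = {"bread", "milk", "beer"}

print(jaccard(basket_1, basket_2))  # 2 shared items out of 4 distinct items
print(cosine(basket_1, basket_2))   # 2 shared items over sqrt(3 * 3)
```

Both functions grow with the size of the intersection and shrink as the sets diverge, which is one way to read the excerpt's remark that the two measures behave alike.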
A. Gionis, D. Gunopulos, N. Koudas, "Efficient and Tunable Similar Set Retrieval", ACM SIGMOD (2001).