Abstract: We consider the problem of joining massive datasets. We propose two techniques for minimizing the disk I/O cost of join operations for both spatial and sequence data. Our techniques optimize the use of the available buffer space using a global view of the datasets. We build a boolean matrix on the pages of the given datasets using a lower bounding distance predictor. The marked entries of this matrix represent candidate page pairs to be joined. Our first technique joins the marked pages iteratively. Our...
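The boolean page matrix described in the abstract can be illustrated with a minimal sketch. The abstract does not say which lower bounding predictor is used, so this example assumes one common choice: the minimum Euclidean distance between the bounding boxes of two pages. The names `mbr`, `min_dist`, `prediction_matrix`, and `eps` are hypothetical and do not come from the paper.

```python
import math

def mbr(page):
    """Return (lows, highs), the axis-aligned bounding box of a page of points."""
    dims = range(len(page[0]))
    lows = [min(p[d] for p in page) for d in dims]
    highs = [max(p[d] for p in page) for d in dims]
    return lows, highs

def min_dist(box_a, box_b):
    """Minimum Euclidean distance between two boxes.

    This never exceeds the distance between any point of one page and any
    point of the other, so it is a lower bound (assumed predictor).
    """
    (lo_a, hi_a), (lo_b, hi_b) = box_a, box_b
    total = 0.0
    for la, ha, lb, hb in zip(lo_a, hi_a, lo_b, hi_b):
        gap = max(lb - ha, la - hb, 0.0)  # 0 when the intervals overlap
        total += gap * gap
    return math.sqrt(total)

def prediction_matrix(pages_r, pages_s, eps):
    """Mark entry (i, j) when pages i and j may contain a pair within eps."""
    boxes_r = [mbr(p) for p in pages_r]
    boxes_s = [mbr(p) for p in pages_s]
    return [[min_dist(a, b) <= eps for b in boxes_s] for a in boxes_r]

# Hypothetical two-page datasets in two dimensions.
pages_r = [[(0.0, 0.0), (1.0, 1.0)], [(10.0, 10.0), (11.0, 11.0)]]
pages_s = [[(1.5, 1.0), (2.0, 2.0)], [(20.0, 20.0), (21.0, 21.0)]]
matrix = prediction_matrix(pages_r, pages_s, eps=1.0)
```

Only marked entries need to be read and joined. An unmarked entry is safe to skip because the lower bound already exceeds the join threshold.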
Context of citations to this paper:
...entries, r marked rows and c marked columns, then pm-NLJ performs at least e + min(r, c) disk I/Os for that cluster. Proof: omitted (cf. [24]). For the example in Figure 3, r = 3, c = 2, and e = 5. The total number of disk I/Os is 5 + min(2, 3) = 7. Note that NLJ is the same...
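The arithmetic in the quoted context is consistent with reading the bound as e + min(r, c). The sketch below counts e, r, and c for a cluster of marked entries and evaluates that expression. The matrix is a hypothetical layout chosen only to give r = 3, c = 2, and e = 5; it is not the actual content of Figure 3, and `io_lower_bound` is an illustrative name.

```python
def io_lower_bound(cluster):
    """Evaluate e + min(r, c) for a boolean matrix holding one cluster."""
    e = sum(cell for row in cluster for cell in row)         # marked entries
    r = sum(1 for row in cluster if any(row))                # marked rows
    c = sum(1 for col in zip(*cluster) if any(col))          # marked columns
    return e + min(r, c)

# Hypothetical cluster: 3 marked rows, 2 marked columns, 5 marked entries.
cluster = [
    [True, True],
    [True, True],
    [True, False],
]
bound = io_lower_bound(cluster)
```

For this layout the result is 5 + min(3, 2) = 7, which matches the total given in the quoted passage.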
Cited by:
Joining Massive High-Dimensional Datasets - Kahveci, Lang, Singh (2003)
Active bibliography (related documents):
0.6: Optimizing Similarity Search for Arbitrary Length Time Series.. - Kahveci, Singh (2003)
0.6: R-Trees Have Grown Everywhere - Manolopoulos, Nanopoulos..
0.5: GORDER: An Efficient Method for KNN Join Processing - Chenyi Xia Hongjun (2004)
Similar documents based on text:
0.3: Variable Length Queries for Time Series Data - Kahveci, Singh (2001)
0.3: An Efficient Index Structure for String Databases - Kahveci, Singh (2001)
0.3: Shift and Scale Invariant Search of Multi-attribute Time.. - Kahveci, Singh, Gurel (2001)
BibTeX entry:
T. Kahveci, C. A. Lang, and A. K. Singh. Joining massive high-dimensional datasets. Technical Report 30, UCSB, 2002. http://citeseer.ist.psu.edu/article/kahveci03joining.html
@inproceedings{ kahveci-joining,
author = "Tamer Kahveci and Christian Lang and Ambuj K Singh",
title = "Joining Massive High-Dimensional Datasets",
booktitle = "{ICDE}",
year = "2003",
url = "citeseer.ist.psu.edu/article/kahveci03joining.html",
url = "citeseer.nj.nec.com/kahveci03joining.html" }
Citations (may not include all citations):
4212: Computers and intractability: A guide to the theory of NP-Com.. - Garey, Johnson - 1979
516: The R*-tree: An efficient and robust access method for points and r.. - Beckmann, Kriegel et al. - 1990
241: Fast subsequence matching in time-series databases - Faloutsos, Ranganathan et al. - 1994
205: Efficient similarity search in sequence databases - Agrawal, Faloutsos et al. - 1993
159: Efficient processing of spatial joins using R-trees - Brinkhoff, Kriegel et al. - 1993
134: Spatial query processing in an object-oriented database syste.. - Orenstein - 1986
126: Fast similarity search in the presence of noise - Agrawal, Lin et al. - 1995
122: Database Management Systems - Ramakrishnan, Gehrke - 2000
115: Partition based spatial-merge join - Patel, DeWitt - 1996
89: Multi-step processing of spatial joins - Brinkhoff, Kriegel et al. - 1994
87: Similarity-based queries for time series data - Rafiei, Mendelzon - 1997
81: Spatial hash-joins - Lo, Ravishankar - 1996
70: Optimal aggregation algorithms for middleware - Fagin, Lotem et al. - 2001
68: Spatial joins using seeded trees - Lo, Ravishankar - 1994
61: Efficient time series matching by wavelets - Chan, Fu - 1999
61: Scalable sweeping-based spatial join - Arge, Procopiuc et al. - 1998
55: Spatial joins using R-trees: Breadth-first traversal with gl.. - Huang, Jing et al. - 1997
49: Size separation spatial join - Koudas, Sevcik - 1997
43: Dimensionality reduction for similarity searching in dynamic .. - Kanth, Agrawal et al. - 1998
41: Incremental distance join algorithms for spatial databases - Hjaltason, Samet - 1998
37: Storage and access in relational data bases - Blasgen, Eswaran - 1977
27: Matching and indexing sequences of different lengths - Bozkaya, Yazdani et al. - 1997
27: Fast time-series searching with scaling and shifting - Chu, Wong - 1999
25: High-dimensional similarity joins - Shim, Srikant et al. - 2002
24: Variable length queries for time series data - Kahveci, Singh - 2001
21: A performance evaluation of spatial jo.. - Papadopoulos, Rigaux et al. - 1999
20: Join algorithm costs revisited - Harris, Ramamohanarao - 1996
19: High dimensional similarity joins: algorithms and performanc.. - Koudas, Sevcik - 2000
19: Sort-merge-join: An idea whose time has.. - Graefe - 1994
18: An efficient index structure for string databases - Kahveci, Singh - 2001
15: Approximate nearest neighbors and sequence comparison with b.. - Muthukrishnan, Sahinalp - 2000
15: An analysis of schedules for performing multipage requests - Seeger - 1996
13: Reading a set of disk pages - Seeger, Larson et al. - 1993
13: TSA-tree: A wavelet-based approach to improve the efficiency o.. - Shahabi, Tian et al. - 2000
8: Dissecting CPU and memory optimization effects - Manegold, Boncz et al. - 2000
8: A cost model and index architecture for the similarity join - Bohm, Kriegel - 2001
7: Efficient scheduling of page access in index-based jo.. - Chan, Ooi - 1997
6: Epsilon grid order: An algorithm for the similarity join on .. - Bohm, Braunmuller et al. - 2001
6: BLOCKS database and its applications - Henikoff, Henikoff - 1996
4: Similarity searching for multi-attribute sequences - Kahveci, Singh et al. - 2002
3: On sort-merge algorithm for band joins - Lu, Tan - 1995
3: GESS: a scalable similarity-join algorithm for mining large d.. - Dittrich, Seeger - 2001
2: Clustering non-uniform-sized spatial objects to reduce I/O c.. - Xiao, Zhang et al. - 2001
2: sortsweep algorithm new method R tree based spatial join - Rigaux, sweep et al. - 2000
1: Join: an easy-to-use generic algorithm for efficiently proce.. - Bercken, Schneider et al.
Documents on the same site (http://www.cs.ucsb.edu/research/trcs/index.shtml):
STATL Definition - Eckmann, Vigna, Kemmerer (2000)
A Comparison of Feedback Based and Fair Queuing Mechanisms.. - Iancu, Acharya (2001)
An Evaluation of Search Tree Techniques In The Presence of Caches - Iancu, Acharya (2001)