  IBM Almaden

by Anthony Tomasic, Hector Garcia-Molina, Kurt Shoens
http://www-db.stanford.edu/pub/papers/sigmod.94.ps

Abstract:

With the proliferation of the world's "information highways," a renewed interest in efficient document indexing techniques has come about. In this paper, the problem of incremental updates of inverted lists is addressed using a new dual-structure index data structure. The index dynamically separates long and short inverted lists and optimizes the retrieval, update, and storage of each type of list. To study the behavior of the index, a space of engineering tradeoffs which range from optimizing update time to optimizing query performance is described. We quantitatively explore this space by using actual data and hardware in combination with a simulation of an information retrieval system. We then describe the best algorithm for a variety of criteria.
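
The idea of separating long and short inverted lists can be illustrated with a minimal sketch. This is not the authors' algorithm; it only shows one plausible reading of the abstract, under several assumptions: short lists share a fixed set of hash buckets, a list migrates to its own storage once it exceeds a length threshold, and the names `DualStructureIndex`, `long_threshold`, and `num_buckets` are hypothetical.

```python
class DualStructureIndex:
    """Toy inverted index that keeps short and long posting lists apart."""

    def __init__(self, long_threshold=4, num_buckets=8):
        # Both parameters are illustrative assumptions, not values from the paper.
        self.long_threshold = long_threshold
        self.num_buckets = num_buckets
        # Short lists share buckets: bucket id -> {word: [doc ids]}.
        self.buckets = {}
        # Each long list gets its own growable list: word -> [doc ids].
        self.long_lists = {}

    def _bucket_of(self, word):
        # Deterministic hash (sum of bytes) so results are reproducible.
        return sum(word.encode("utf-8")) % self.num_buckets

    def add_document(self, doc_id, words):
        """Append doc_id to the posting list of every distinct word."""
        for word in set(words):
            if word in self.long_lists:
                self.long_lists[word].append(doc_id)
                continue
            bucket = self.buckets.setdefault(self._bucket_of(word), {})
            postings = bucket.setdefault(word, [])
            postings.append(doc_id)
            if len(postings) > self.long_threshold:
                # The list has outgrown the short structure: migrate it.
                self.long_lists[word] = bucket.pop(word)

    def lookup(self, word):
        """Return a copy of the posting list, whichever structure holds it."""
        if word in self.long_lists:
            return list(self.long_lists[word])
        bucket = self.buckets.get(self._bucket_of(word), {})
        return list(bucket.get(word, []))

    def is_long(self, word):
        """Report whether the word's list lives in the long-list structure."""
        return word in self.long_lists
```

In this sketch a frequent word ends up in `long_lists` after a few updates, while rare words stay packed together in shared buckets. The threshold is the knob that would trade update cost against query cost in such a design.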

Citations

2217 Introduction to Modern Information Retrieval – Salton, McGill - 1983
1651 R-trees: A dynamic index structure for spatial searching – Guttman - 1984
1446 The art of Computer Programming – Knuth - 1981
936 Database and Knowledge-Base Systems, Volume II – Ullman - 1989
542 Human Behavior and the Principle of Least Effort – Zipf - 1949
492 Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition) – Knuth - 1998
363 The grid file: An adaptable, symmetric multikey file structure – Nievergelt, Hinterberger, et al.
312 Searching Distributed Collections with Inference Networks – Callan, Lu, et al. - 1995
192 R+-tree: A dynamic index for multi-dimensional objects – Sellis, Roussopoulos, et al. - 1987
184 World-Wide Web: The Information Universe – Berners-Lee, Cailliau, et al. - 1992
172 Overview of the third text REtrieval conference (TREC-3), in Overview of the Third Text REtrieval Conference – Harman - 1995
170 Harvest: A Scalable, Customizable Discovery and Access System – Bowman, Danzig, et al. - 1994
164 Generalizing gloss to vector-space databases and broker hierarchies – Gravano, García-Molina - 1995
144 The Effectiveness of GlOSS for the Text-Database Discovery Problem – Gravano, Garcia-Molina, et al. - 1994
128 An information system for corporate users: Wide area information servers – Kahle, Medlar - 1991
124 A class of data structures for associative searching – Orenstein, Merrett - 1984
96 The Collection Fusion Problem – Voorhees, Gupta, et al. - 1995
78 A Comparison of Internet Resource Discovery Approaches – Schwartz, Emtage, et al. - 1992
76 Incremental Updates of Inverted Lists for Text Document Retrieval – Tomasic, Garcia-Molina, et al. - 1993
71 INTERNET resource discovery services – Obraczka, Danzig, et al. - 1993
63 Fast incremental indexing for full-text information retrieval – Brown, Callan, et al. - 1994
57 Retrieving records from a gigabyte of text on a minicomputer using statistical ranking – Harman, Candela - 1990
57 An efficient indexing technique for full-text database systems – Zobel, Moffat, et al. - 1992
56 Optimizations for dynamic inverted index maintenance – Cutting, Pedersen - 1990
43 The Prospero File System: A Global File System Based on the Virtual System Model – Neuman - 1992
41 The Rufus system: information organization for semistructured data – Shoens, Luniewski, et al. - 1993
39 Distributed Active Catalogs and Meta-Data Caching in Descriptive Name Services – Ordille, Miller - 1993
36 A new algorithm for computing joins with grid files – Becker, Hinrichs, et al. - 1993
36 A General Solution of the n-dimensional B-tree Problem – Freeston - 1995
35 File organization for database design – Wiederhold - 1987
34 On B-tree indices for skewed distributions – Faloutsos, Jagadish - 1992
34 Distributed indexing: a scalable mechanism for distributed information retrieval, SIGIR – Danzig, Ahn, et al. - 1991
31 Precision and recall of GlOSS estimators for database discovery – Gravano, Garcia-Molina, et al. - 1994
30 Multiattribute hashing using Gray codes – Faloutsos - 1986
29 Sparse matrix technology – Pissanetzky - 1984
28 Information Brokers: Sharing Knowledge in a Heterogeneous Distributed System – Barbara, Clifton - 1992
24 A Scalable, Non-Hierarchical Resource Discovery Mechanism Based on Probabilistic Protocols – Schwartz - 1990
21 Content routing in a network of WAIS servers – Duda, Sheldon - 1994
20 Optimal partial-match retrieval when fields are independently specified – Aho, Ullman - 1979
19 Hybrid index organizations for text databases – Faloutsos, Jagadish - 1992
18 Data structures for efficient broker implementation – Tomasic, Gravano, et al. - 1997
18 Siemens TREC-4 report: Further experiments with database merging – Voorhees - 1995
17 A content routing system for distributed information servers – Sheldon, Duda, et al. - 1994
17 Querying a network of autonomous databases – Simpson, Alonso - 1989
16 Information Retrieval: Data Structures and Algorithms – Frakes, Baeza-Yates - 1992
16 Implementation of the grid file: Design concepts and experience – Hinrichs - 1985
11 Full-text Document Retrieval Benchmark – DeFazio - 1993
10 Distributed indexing of autonomous internet services – Danzig, Li, Obraczka - 1992
10 About the Veronica service – Foster - 1992
10 Optimal partial-match retrieval – Lloyd - 1980