  Hierarchies of indices for text searching (1994) [9 citations — 5 self]

by Ricardo Baeza-Yates, Eduardo F. Barbosa, Nivio Ziviani
Rockefeller University
ftp://sunsite.dcc.uchile.cl/pub/users/rbaeza/papers/IS96.ps.gz

Abstract:

We present an efficient implementation of a recently introduced index for text databases, for the case where the database is stored on secondary storage devices such as magnetic or optical disks. The implementation is built on top of a new and simple index for texts called the pat array (or suffix array). Because text searching in a large database spends most of its time accessing external storage devices, we propose additional index structures and searching algorithms for pat arrays that reduce the number of disk accesses. We present two index structures: a two-level hierarchy model that uses main memory and one level of external storage (magnetic or optical devices), and a three-level hierarchy model that uses main memory and two levels of external storage (magnetic and optical devices). Performance improvement is achieved in both models by storing most of the higher index levels in faster memories, thus reducing accesses to the slowest devices in the hierarchy. Analytical and experimental results are presented for both models. For 160 megabytes of text stored on a CD-ROM disk, the two-level model using 2 megabytes of main memory costs 20% of what the pat array costs when used as a single level.
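
The two-level idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `TwoLevelIndex`, the parameters `block_size` and `prefix_len`, and the `block_reads` counter are all assumptions made for illustration. The pat array is split into fixed-size blocks that stand in for disk pages, and a short prefix of the first suffix in each block is kept in main memory, so that a query touches only the blocks that can contain matches. In a real system the text itself would also live on external storage; here it is an in-memory string.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text):
    """Start positions of all suffixes in lexicographic order (naive build)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


class TwoLevelIndex:
    """Hypothetical two-level index: in-memory block keys over a pat array."""

    def __init__(self, text, block_size=4, prefix_len=8):
        self.text = text
        self.block_size = block_size
        self.prefix_len = prefix_len
        # Stands in for the pat array stored on disk.
        self.sa = build_suffix_array(text)
        # In-memory level: a short prefix of the first suffix of each block.
        self.block_keys = [
            text[self.sa[b]:self.sa[b] + prefix_len]
            for b in range(0, len(self.sa), block_size)
        ]
        self.block_reads = 0  # counts simulated disk accesses

    def _read_block(self, b):
        """Simulate fetching one block of the pat array from disk."""
        self.block_reads += 1
        start = b * self.block_size
        return self.sa[start:start + self.block_size]

    def search(self, pattern):
        """Return the sorted text positions where pattern occurs."""
        p = pattern[:self.prefix_len]
        # Truncated keys stay sorted, so binary search in memory is valid.
        keys = [k[:len(p)] for k in self.block_keys]
        # Matches may begin in the block just before the first key >= p.
        first = max(bisect_left(keys, p) - 1, 0)
        # Blocks whose first key is already > p cannot contain matches.
        last = bisect_right(keys, p)
        hits = []
        for b in range(first, last):
            for pos in self._read_block(b):
                if self.text.startswith(pattern, pos):
                    hits.append(pos)
        return sorted(hits)


index = TwoLevelIndex("abracadabra")
print(index.search("abra"))   # [0, 7]
print(index.block_reads)      # 1
```

With the text "abracadabra" and blocks of four entries, the query for "abra" is resolved by reading a single block, because the in-memory keys already rule out the other two. A plain binary search over the pat array would instead probe several scattered entries, each of which could be a separate disk access.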

Citations

385 Suffix arrays: A new method for on-line string searches – Manber, Myers - 1993
172 Overview of the Third Text REtrieval Conference (TREC-3) – Harman - 1995
160 GLIMPSE: A Tool to Search through Entire File Systems – Manber, Wu - 1994
157 Handbook of Algorithms and Data Structures – Gonnet - 1984
118 PATRICIA - practical algorithm to retrieve information coded in alphanumeric – Morrison - 1968
78 The Art of Computer Programming: Sorting and Searching, volume 3 – Knuth - 1973
33 New indices for text: PAT trees and PAT arrays – Gonnet, Baeza-Yates, et al. - 1992
33 Inverted files – Fox, Harman, et al.
22 Frame-Sliced Signature Files – Lin, Faloutsos - 1992
11 Unstructured data bases or very efficient text searching – Gonnet - 1983
8 From Partial to Full Inverted Lists for Text Searching – Barbosa, Ziviani - 1995
4 Optimized binary search and text retrieval – Barbosa, Navarro, et al. - 1995
4 PAT 3.1: an efficient text searching system – 1987
3 Efficient text searching methods for secondary memory – Barbosa - 1995
3 Data structures and access methods for read-only optical disks – Barbosa, Ziviani - 1992
1 File organizations for optical disks – Ford, Christodoulakis - 1992
1 An optimal index for Pat arrays – Navarro - 1996