by Ricardo Baeza-Yates, Eduardo F. Barbosa, Nivio Ziviani
Rockefeller University
Download: ftp://sunsite.dcc.uchile.cl/pub/users/rbaeza/papers/IS96.ps.gz
Abstract:
We present an efficient implementation of a recently introduced index for text databases, for the case in which the database is stored on secondary storage devices such as magnetic or optical disks. The implementation is built on top of a new and simple index for texts called the pat array (or suffix array). Because text searching in a large database spends most of its time accessing external storage devices, we propose additional index structures and searching algorithms for pat arrays that reduce the number of disk accesses. We present two index structures: a two-level hierarchy model that uses main memory and one level of external storage (magnetic or optical devices), and a three-level hierarchy model that uses main memory and two levels of external storage (magnetic and optical devices). In both models, performance improves because most of the higher index levels are stored in faster memories, which reduces accesses to the slowest devices in the hierarchy. Analytical and experimental results are presented for both models. For 160 megabytes of text stored on a CD-ROM disk, the two-level model using 2 megabytes of main memory costs 20% of the pat array used as a single level.
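The two-level idea in the abstract can be illustrated with a small sketch. The abstract does not specify the layout of the upper level, so everything below is an assumption made for illustration: the pat array is split into fixed-size blocks, and the in-memory level keeps the first `key_len` characters of the suffix that starts each block. The function names (`build_suffix_array`, `build_memory_index`, `candidate_blocks`, `search`) are hypothetical, and ordinary Python lists stand in for the disk-resident array and text.

```python
from bisect import bisect_left, bisect_right

def build_suffix_array(text):
    """Naive pat (suffix) array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def build_memory_index(text, sa, block_size, key_len):
    """Assumed upper level: a short key for the first suffix of every block."""
    return [text[sa[i]:sa[i] + key_len] for i in range(0, len(sa), block_size)]

def candidate_blocks(memory_index, pattern, key_len):
    """Use only the in-memory keys to bound the blocks that can hold matches."""
    k = min(len(pattern), key_len)
    p = pattern[:k]
    keys = [key[:k] for key in memory_index]  # truncated keys remain sorted
    first = max(bisect_left(keys, p) - 1, 0)  # last block that starts below p
    last = bisect_right(keys, p)              # first block that starts above p
    return first, last                        # half-open block range

def search(text, sa, memory_index, block_size, key_len, pattern):
    """Return (sorted match positions, number of blocks that would be read)."""
    first, last = candidate_blocks(memory_index, pattern, key_len)
    lo, hi = first * block_size, min(last * block_size, len(sa))
    window = sa[lo:hi]  # the only slice of the pat array fetched "from disk"
    m = len(pattern)
    # For brevity every candidate prefix is materialized here; a disk-based
    # version would binary-search and touch only a few text positions.
    prefixes = [text[pos:pos + m] for pos in window]
    left = bisect_left(prefixes, pattern)
    right = bisect_right(prefixes, pattern)
    return sorted(window[left:right]), last - first

text = "abracadabra"
sa = build_suffix_array(text)
index = build_memory_index(text, sa, block_size=4, key_len=2)
print(search(text, sa, index, 4, 2, "abra"))
```

In this sketch the search over `memory_index` costs no simulated disk access, and only the blocks in the returned range are read. That is the general effect the abstract attributes to keeping the higher index levels in faster memory; the paper's actual structures and cost figures may differ from this simplification.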