by Justin Zobel, Alistair Moffat, Ron Sacks-Davis
Proc. International Conference on Very Large Databases
http://www.cs.mu.oz.au/publications/tr_db/./mu_94_17.ps.gz
Abstract:
There are several advantages to be gained by storing the lexicon of a full-text database in main memory. In this paper we describe how to use a compressed inverted file index to search such a lexicon for entries that match a pattern or partially specified term. Our experiments show that this method provides an effective compromise between speed and space, running orders of magnitude faster than brute-force search, but requiring less memory than other pattern-matching data structures; indeed, in some cases requiring less memory than would be consumed by a single pointer to each string. The pattern search method is based on text indexing techniques and is a successful adaptation of inverted files to main memory databases.
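The general idea of searching a lexicon through an inverted index can be illustrated with a small sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the index is built over character bigrams, terms are padded with a `$` boundary marker (so lexicon terms are assumed not to contain `$`), the only wildcard supported is `*`, and the posting lists are kept as plain Python lists rather than compressed. All function names are hypothetical. The index narrows the lexicon to a candidate set, and each candidate is then checked against the pattern to discard false matches.

```python
import re
from collections import defaultdict


def ngrams(term, n=2):
    """Return the set of n-grams of a term padded with '$' boundary markers."""
    padded = "$" + term + "$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(lexicon, n=2):
    """Map each n-gram to the sorted list of ids of the terms containing it."""
    index = defaultdict(list)
    for term_id, term in enumerate(lexicon):
        for gram in ngrams(term, n):
            index[gram].append(term_id)
    return index


def pattern_grams(pattern, n=2):
    """Collect the n-grams that every term matching the pattern must contain."""
    padded = "$" + pattern + "$"
    grams = set()
    # Only the literal pieces between '*' wildcards yield usable n-grams.
    for piece in padded.split("*"):
        grams.update(piece[i:i + n] for i in range(len(piece) - n + 1))
    return grams


def search(lexicon, index, pattern, n=2):
    """Return lexicon terms matching a pattern in which '*' matches any string."""
    grams = pattern_grams(pattern, n)
    if grams:
        # Intersect the posting lists of all required n-grams.
        candidates = None
        for gram in grams:
            ids = set(index.get(gram, ()))
            candidates = ids if candidates is None else candidates & ids
    else:
        # No usable n-grams: fall back to scanning the whole lexicon.
        candidates = set(range(len(lexicon)))
    # Verify candidates, since sharing n-grams does not guarantee a match.
    regex = re.compile(".*".join(re.escape(p) for p in pattern.split("*")))
    return [lexicon[i] for i in sorted(candidates) if regex.fullmatch(lexicon[i])]


lexicon = ["lab", "labor", "labour", "lobar", "slab", "table"]
index = build_index(lexicon)
print(search(lexicon, index, "lab*r"))  # ['labor', 'labour']
```

In this sketch the space cost is dominated by the posting lists. Because each list holds increasing term ids, it could be stored as differences between successive ids and compressed with an integer code, which is the kind of saving the abstract alludes to; that step is omitted here.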
Citations
|
2739
|
A mathematical theory of communication
– Shannon
- 1948
|
|
391
|
A space-economical suffix tree construction algorithm
– McCreight
- 1976
|
|
223
|
Universal codeword sets and representations of the integers
– Elias
- 1975
|
|
161
|
Run-length encodings
– Golomb
- 1966
|
|
106
|
Approximate string matching with q-grams and maximal matches
– Ukkonen
- 1992
|
|
88
|
Self-Indexing Inverted Files for Fast Text Retrieval
– Moffat, Zobel
- 1996
|
|
70
|
Data compression in full-text retrieval systems
– Bell, Moffat, et al.
- 1993
|
|
68
|
Optimal source codes for geometrically distributed integer alphabets
– Gallager, Van Voorhis
- 1975
|
|
57
|
An efficient indexing technique for full-text database systems
– Zobel, Moffat, et al.
- 1992
|
|
39
|
PATRICIA-Practical algorithm to retrieve information coded in alphanumeric
– Morrison
- 1968
|
|
30
|
Practical minimal perfect hash functions for large databases
– Fox, Heath, et al.
- 1992
|
|
27
|
A Compression Method for Clustered Bit-Vectors
– Teuhola
- 1978
|
|
22
|
Parameterised compression for sparse bitmaps
– Moffat, Zobel
- 1992
|
|
17
|
Fast pattern matching: an aid to bibliographic search
– Aho, Corasick
- 1975
|
|
17
|
A systematic approach to compressing a full-text retrieval system
– Bookstein, Klein, et al.
- 1992
|
|
13
|
Novel compression of sparse bit-strings - preliminary report
– Fraenkel, Klein
- 1985
|
|
11
|
Fast approximate string matching
– Owolabi, McGregor
- 1988
|
|
10
|
Processing truncated terms in document retrieval systems
– Bratley, Choueka
- 1982
|
|
10
|
Economical inversion of large text files
– Moffat
- 1992
|
|
7
|
Practical perfect hashing
– Cormack, Horspool, et al.
- 1985
|
|
6
|
Handbook of Algorithms and Data Structures
– Gonnet, Baeza-Yates
- 1991
|
|
4
|
Generative models for bitmap sets with compression applications
– Bookstein, Klein
- 1991
|
|
1
|
Commands for interactive text searching
– ISO
- 1988
|
|
1
|
Approximate string matching for large lexicons
– Zobel, Dart
- 1994
|