  Searching large lexicons for partially specified terms using compressed inverted files (1993)

by Justin Zobel, Alistair Moffat, Ron Sacks-Davis
Proc. International Conference on Very Large Databases
http://www.cs.mu.oz.au/publications/tr_db/./mu_94_17.ps.gz

Abstract:

There are several advantages to be gained by storing the lexicon of a full-text database in main memory. In this paper we describe how to use a compressed inverted file index to search such a lexicon for entries that match a pattern or partially specified term. Our experiments show that this method provides an effective compromise between speed and space, running orders of magnitude faster than brute-force search, but requiring less memory than other pattern-matching data structures; indeed, in some cases requiring less memory than would be consumed by a single pointer to each string. The pattern search method is based on text indexing techniques and is a successful adaptation of inverted files to main memory databases.
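
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea under stated assumptions: lexicon terms are indexed by character 2-grams, each 2-gram's list of term numbers is stored as gaps in Elias gamma code, and a pattern with `*` wildcards is answered by intersecting the lists for its 2-grams and then checking each candidate. The n-gram size, the `$` boundary marker, the `*` wildcard syntax, the choice of gamma coding, and all function names are illustrative assumptions, not the authors' design. Bit strings are kept as Python `str` for readability; a real index would pack them into bytes.

```python
import re
from collections import defaultdict


def ngrams(term, n=2):
    """Set of n-grams of a term padded with '$' boundary markers (assumed scheme)."""
    padded = "$" + term + "$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def gamma_encode(gaps):
    """Elias gamma code: (length - 1) zero bits, then the number in binary."""
    out = []
    for g in gaps:  # every gap must be >= 1
        b = bin(g)[2:]
        out.append("0" * (len(b) - 1) + b)
    return "".join(out)


def gamma_decode(bits):
    """Inverse of gamma_encode; returns the list of positive integers."""
    out, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":
            zeros += 1
            i += 1
        out.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return out


def build_index(lexicon, n=2):
    """Map each n-gram to a gamma-coded list of gaps between term numbers."""
    postings = defaultdict(list)
    for num, term in enumerate(lexicon, start=1):  # term numbers start at 1
        for g in ngrams(term, n):
            postings[g].append(num)  # appended in increasing order
    index = {}
    for g, nums in postings.items():
        gaps = [nums[0]] + [b - a for a, b in zip(nums, nums[1:])]
        index[g] = gamma_encode(gaps)
    return index


def decode_postings(bits):
    """Turn a gamma-coded gap list back into absolute term numbers."""
    total, nums = 0, []
    for gap in gamma_decode(bits):
        total += gap
        nums.append(total)
    return nums


def search(pattern, lexicon, index, n=2):
    """Return lexicon terms matching a pattern in which '*' matches any run."""
    candidates = None
    for piece in ("$" + pattern + "$").split("*"):
        for i in range(len(piece) - n + 1):
            bits = index.get(piece[i:i + n])
            if bits is None:
                return []  # a required n-gram occurs in no term
            nums = set(decode_postings(bits))
            candidates = nums if candidates is None else candidates & nums
    if candidates is None:  # pattern yielded no n-grams: check every term
        candidates = range(1, len(lexicon) + 1)
    # The n-gram filter can admit false matches, so verify each candidate.
    regex = re.compile(".*".join(re.escape(p) for p in pattern.split("*")))
    return [lexicon[k - 1] for k in sorted(candidates)
            if regex.fullmatch(lexicon[k - 1])]


lexicon = sorted(["lab", "label", "labor", "labour", "lamb", "slab"])
index = build_index(lexicon)
print(search("lab*r", lexicon, index))  # ['labor', 'labour']
print(search("*ab", lexicon, index))    # ['lab', 'slab']
```

In this sketch the final verification step matters because sharing all of a pattern's 2-grams does not guarantee that a term matches the pattern; the index only narrows the set of terms that must be checked.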
