Abstract: We present a fast compression and decompression technique for natural language texts. The novelty is that exact search can be done directly on the compressed text, using any known sequential pattern matching algorithm. Approximate search can also be done efficiently without any decoding. The compression scheme uses semi-static word-based modeling and a Huffman code whose coding alphabet is byte-oriented rather than bit-oriented. We use the first bit of each byte to mark the...
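The compression scheme described in the abstract can be illustrated with a short sketch. A first pass counts word frequencies over the whole text (the semi-static model). A 256-ary Huffman tree is then built so that every codeword digit is one byte, and a second pass replaces each word by its codeword. This is a minimal Python sketch under our own assumptions, not the authors' implementation. The tokenizer (`tokenize`, which treats words and the separators between them as symbols), the helper names `build_code`, `compress` and `decompress`, and the tie-breaking are all hypothetical. The flag bit mentioned in the truncated last sentence of the abstract is not modeled, so this code alone does not support the direct search the paper describes.

```python
import heapq
import itertools
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: alternating runs of word and non-word characters,
    # so concatenating the tokens reproduces the text exactly.
    return re.findall(r"\w+|\W+", text)


def build_code(freqs, arity=256):
    """Build an arity-ary Huffman code (arity >= 2); with arity=256 each digit is one byte.

    Returns a dict mapping each symbol to its codeword (a tuple of digits).
    """
    tie = itertools.count()  # unique tie-breaker so heap entries never compare nodes
    heap = [(f, next(tie), ("leaf", sym)) for sym, f in freqs.items()]
    if len(heap) == 1:  # a single symbol still needs a non-empty codeword
        return {heap[0][2][1]: (0,)}
    # Pad with zero-weight dummies so that every merge takes exactly `arity` nodes.
    while (len(heap) - 1) % (arity - 1) != 0:
        heap.append((0, next(tie), ("dummy", None)))
    heapq.heapify(heap)
    while len(heap) > 1:
        children = [heapq.heappop(heap) for _ in range(arity)]
        weight = sum(c[0] for c in children)
        heapq.heappush(heap, (weight, next(tie), ("node", [c[2] for c in children])))
    # Walk the tree: the digit on each edge is the child's position (0..arity-1).
    codes = {}
    stack = [(heap[0][2], ())]
    while stack:
        (kind, payload), prefix = stack.pop()
        if kind == "leaf":
            codes[payload] = prefix
        elif kind == "node":
            for digit, child in enumerate(payload):
                stack.append((child, prefix + (digit,)))
    return codes


def compress(text, arity=256):
    tokens = tokenize(text)
    # Pass 1 (semi-static model): frequencies over the whole text.
    codes = build_code(Counter(tokens), arity)
    # Pass 2: emit the codeword of every token, one digit per output byte.
    out = bytearray()
    for tok in tokens:
        out.extend(codes[tok])
    return bytes(out), codes


def decompress(data, codes):
    decode = {cw: sym for sym, cw in codes.items()}
    out, current = [], ()
    for byte in data:
        current += (byte,)
        if current in decode:  # prefix-free code: the first match is a whole codeword
            out.append(decode[current])
            current = ()
    return "".join(out)


text = "to be or not to be, that is the question"
data, codes = compress(text)
print(len(text), "chars ->", len(data), "bytes")
assert decompress(data, codes) == text
```

Because the coding alphabet is bytes, a vocabulary with fewer than 256 distinct tokens gets one-byte codewords, and decoding proceeds byte by byte rather than bit by bit. Passing `arity=2` yields an ordinary binary Huffman code for comparison (each binary digit then occupies a whole output byte in this sketch).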
Context of citations to this paper:
...in elapsed time comparison, the algorithm drastically defeats Agrep even for English text files. It should be stated that Moura et al. [9] proposed a compression scheme that uses a word-based Huffman encoding with a byte-oriented code. The compression ratio for typical...
... allow reducing the text to 25% to 30% of its original size but also allow direct searching on the compressed text without decompressing [21, 20], much faster than the same search done on the uncompressed text. Moreover, those compression techniques are based on Huffman coding...
Cited by:
Practical and Flexible Pattern Matching over Ziv-Lempel.. - Navarro, Raffinot
Fast and Flexible Word Searching on Compressed Text - de Moura, Navarro.. (2000)
Linear Time Sorting of Skewed Distributions - de Moura, Navarro
Similar documents (at the sentence level):
37.8%: Compressed Pattern Matching Approximate Compressed .. - de Moura..
19.2%: Fast Searching on Compressed Text Allowing Errors - de Moura, Navarro et al. (1998)
Active bibliography (related documents):
0.6: A General Practical Approach to Pattern Matching over.. - Navarro, Raffinot (1998)
0.2: Indexing Compressed Text - de Moura, Navarro, Ziviani (1997)
0.2: Boyer-Moore String Matching over Ziv-Lempel Compressed Text - Navarro, Tarhio (2000)
Similar documents based on text:
1.6: Adding Compression to Block Addressing Inverted Indices - Navarro, de Moura.. (2000)
1.5: Compression: A Key for Next Generation Text Retrieval.. - Ziviani, de Moura.. (2000)
1.4: Adding Compression to Block Addressing Inverted Indexes - Navarro, de Moura.. (2000)
Related documents from co-citation:
9: A text compression scheme that allows fast searching directly in the compressed .. - Manber - 1993
9: Fast text searching allowing errors - Wu, Manber - 1992
9: A general practical approach to pattern matching over Ziv-Lempel compressed text - Navarro, Raffinot - 1998
BibTeX entry:
E. de Moura, G. Navarro, N. Ziviani and R. Baeza-Yates. Direct Pattern Matching on Compressed Text. Tech. Report 03-98, Dept. of Computer Science, Univ. Federal de Minas Gerais, Brazil, Apr 1998. http://citeseer.ist.psu.edu/551537.html
@inproceedings{ demoura98direct,
author = "Edleno Silva de Moura and Gonzalo Navarro and Nivio Ziviani and Ricardo A. Baeza-Yates",
title = "Direct Pattern Matching on Compressed Text",
booktitle = "String Processing and Information Retrieval",
pages = "90--95",
year = "1998",
url = "citeseer.ist.psu.edu/551537.html" }
Citations (may not include all citations):
338: A method for the construction of minimum-redundancy codes - Huffman - 1952
196: Fast text searching allowing errors - Wu, Manber - 1992
85: Overview of the third text retrieval conference - Harman - 1995
72: A locally adaptive data compression scheme - Bentley, Sleator et al. - 1986
61: Let sleeping files lie: pattern matching in z-compressed fil.. - Amir, Benson et al. - 1996
56: A very fast substring search algorithm - Sunday - 1990
49: IEEE Transactions on Information Theory - Ziv, Lempel et al. - 1976
45: String matching in Lempel-Ziv compressed strings - Farach, Thorup
38: Information Retrieval - Computational and Theoretical Aspect.. - Heaps - 1978
38: Efficient two-dimensional compressed matching - Amir, Benson - 1992
36: A text compression scheme that allows fast searching directl.. - Manber - 1997
26: Adding compression to a full-text retrieval system - Zobel, Moffat - 1995
10: Efficient algorithms for Lempel-Ziv encodings - Gasieniec, Karpinski et al. - 1996
10: Software Practice and Experience - Moffat, compression - 1989
8: Large text searching allowing errors - Araújo, Navarro et al. - 1997
5: A faster algorithm for approximate string matching - Baeza-Yates, Navarro - 1996
5: Multiple approximate string matching - Baeza-Yates, Navarro - 1997
3: Pattern matching problem for strings with short descriptions - Karpinski, Shinohara et al. - 1997
2: Fast text searching allowing errors - de Moura, Navarro et al.
1: Indexing compressed text - de Moura, Navarro et al. - 1997
Documents on the same site (http://www.dcc.uchile.cl/~gnavarro/publ.html):
A More Precise Solution to Two Problems on Tries - Navarro, Poblete
Fast Approximate String Matching in a Dictionary - Baeza-Yates, Navarro (1998)
An Optimal Index for PAT Arrays - Navarro (1996)
CiteSeer.IST - Copyright Penn State and NEC