Direct Pattern Matching on Compressed Text (1998)  (14 citations)
Edleno Silva de Moura, Gonzalo Navarro, Nivio Ziviani
String Processing and Information Retrieval

View or download:
dcc.uchile.cl/~gnavar...spire98.3.ps.gz

From:  dcc.uchile.cl/~gnavarro/publ

Abstract: We present a fast compression and decompression technique for natural language texts. The novelty is that the exact search can be done on the compressed text directly, using any known sequential pattern matching algorithm. Approximate search can also be done efficiently without any decoding. The compression scheme uses a semi-static word-based modeling and a Huffman coding where the coding alphabet is byte-oriented rather than bit-oriented. We use the first bit of each byte to mark the...
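The abstract is cut off before it finishes describing the flag bit, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes that the high bit of a byte is set only on the first byte of each codeword, which leaves 7 bits per byte for a 128-ary Huffman code built from word frequencies counted in one pass (the semi-static model). The word model here (a hypothetical `tokenize` that splits text into alternating runs of word and non-word characters) is a simplification, and Python's built-in `bytes.find` stands in for "any known sequential pattern matching algorithm". The names `build_code`, `compress`, `decompress` and `count_word` are likewise hypothetical.

```python
import heapq
import re
from collections import Counter
from itertools import count

DIGITS = 128  # assumed: 7 payload bits per byte, high bit reserved as a tag


def tokenize(text):
    # Hypothetical word model: alternating runs of word / non-word characters.
    return re.findall(r"\w+|\W+", text)


def tag(digits):
    # Set the high bit only on the first byte of each codeword.
    return bytes([0x80 | digits[0]] + digits[1:])


def build_code(tokens):
    """Semi-static 128-ary Huffman code: count frequencies once, then build."""
    freq = Counter(tokens)
    if len(freq) == 1:
        return {w: tag([0]) for w in freq}
    digits = {w: [] for w in freq}
    tie = count()  # tie-breaker so heap entries never compare their lists
    heap = [(f, next(tie), [w]) for w, f in freq.items()]
    # Pad with zero-weight dummy leaves so every merge takes exactly DIGITS nodes.
    while (len(heap) - 1) % (DIGITS - 1):
        heap.append((0, next(tie), []))
    heapq.heapify(heap)
    while len(heap) > 1:
        weight, members = 0, []
        for d in range(DIGITS):
            w, _, syms = heapq.heappop(heap)
            for s in syms:
                digits[s].insert(0, d)  # prepend the digit for this tree level
            weight += w
            members.extend(syms)
        heapq.heappush(heap, (weight, next(tie), members))
    return {w: tag(ds) for w, ds in digits.items()}


def compress(text):
    tokens = tokenize(text)
    code = build_code(tokens)
    return b"".join(code[t] for t in tokens), code


def decompress(data, code):
    inverse = {v: k for k, v in code.items()}
    out, start = [], 0
    for i in range(1, len(data) + 1):
        # A tagged byte (or the end of the data) closes the current codeword.
        if i == len(data) or data[i] & 0x80:
            out.append(inverse[data[start:i]])
            start = i
    return "".join(out)


def count_word(data, code, word):
    """Search the compressed bytes directly for the word's codeword."""
    pattern = code.get(word)
    if pattern is None:
        return 0  # the word does not occur in the text at all
    hits, pos = 0, data.find(pattern)
    while pos != -1:
        hits += 1
        pos = data.find(pattern, pos + 1)
    return hits
```

Because only the first byte of a codeword carries the tag, any match that `bytes.find` reports must start at a codeword boundary, and the prefix-freeness of the Huffman code rules out a match that ends inside a longer codeword; under these assumptions `count_word` needs neither decoding nor a verification step. For example, compressing "to be or not to be" yields 11 bytes (the text has only five distinct tokens, so every codeword is a single byte), and `count_word(data, code, "be")` returns 2.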

Context of citations to this paper:

...in elapsed time comparison, the algorithm drastically defeats Agrep even for English text files. It should be stated that Moura et al. [9] proposed a compression scheme that uses a word based Huffman encoding with a byte oriented code. The compression ratio for typical...

... allow reducing the text to 25% to 30% of its original size but also allows direct searching on the compressed text without decompressing [21, 20], much faster than the same search done on the uncompressed text. Moreover, those compression techniques are based on Huffman coding...

Cited by:
Practical and Flexible Pattern Matching over Ziv-Lempel.. - Navarro, Raffinot
Fast and Flexible Word Searching on Compressed Text - de Moura, Navarro.. (2000)
Linear Time Sorting of Skewed Distributions - de Moura, Navarro

Similar documents (at the sentence level):
37.8%:   Compressed Pattern Matching Approximate Compressed .. - de Moura..
19.2%:   Fast Searching on Compressed Text Allowing Errors - de Moura, Navarro et al. (1998)

Active bibliography (related documents):
0.6:   A General Practical Approach to Pattern Matching over.. - Navarro, Raffinot (1998)
0.2:   Indexing Compressed Text - de Moura, Navarro, Ziviani (1997)
0.2:   Boyer-Moore String Matching over Ziv-Lempel Compressed Text - Navarro, Tarhio (2000)

Similar documents based on text:
1.6:   Adding Compression to Block Addressing Inverted Indices - Navarro, de Moura.. (2000)
1.5:   Compression: A Key for Next Generation Text Retrieval.. - Ziviani, de Moura.. (2000)
1.4:   Adding Compression to Block Addressing Inverted Indexes - Navarro, de Moura.. (2000)

Related documents from co-citation:
9:   A text compression scheme that allows fast searching directly in the compressed .. - Manber - 1993
9:   Fast text searching allowing errors - Wu, Manber - 1992
9:   A general practical approach to pattern matching over Ziv-Lempel compressed text - Navarro, Raffinot - 1998

BibTeX entry:

E. de Moura, G. Navarro, N. Ziviani and R. Baeza-Yates. Direct Pattern Matching on Compressed Text. Tech. Report 03-98, Dept. of Computer Science, Univ. Federal de Minas Gerais, Brazil, Apr 1998. http://citeseer.ist.psu.edu/551537.html

@inproceedings{ demoura98direct,
    author = "Edleno Silva de Moura and Gonzalo Navarro and Nivio Ziviani and Ricardo A. Baeza-Yates",
    title = "Direct Pattern Matching on Compressed Text",
    booktitle = "String Processing and Information Retrieval",
    pages = "90--95",
    year = "1998",
    url = "citeseer.ist.psu.edu/551537.html" }
Citations (may not include all citations):
338   A method for the construction of minimum-redundancy codes - Huffman - 1952
196   Fast text searching allowing errors - Wu, Manber - 1992
85   Overview of the third text retrieval conference - Harman - 1995
72   A locally adaptive data compression scheme - Bentley, Sleator et al. - 1986
61   Let sleeping files lie: pattern matching in z-compressed fil.. - Amir, Benson et al. - 1996
56   A very fast substring search algorithm - Sunday - 1990
49   IEEE Transactions on Information Theory - Ziv, Lempel et al. - 1976
45   String matching in Lempel-Ziv compressed strings - Farach, Thorup
38   Information Retrieval - Computational and Theoretical Aspect.. - Heaps - 1978
38   Efficient two-dimensional compressed matching - Amir, Benson - 1992
36   A text compression scheme that allows fast searching directl.. - Manber - 1997
26   Adding compression to a fulltext retrieval system - Zobel, Moffat - 1995
10   Efficient algorithms for Lempel-Ziv encodings - Gasieniec, Karpinski et al. - 1996
10   Software Practice and Experience - Moffat, compression - 1989
8   Large text searching allowing errors - Araújo, Navarro et al. - 1997
5   A faster algorithm for approximate string matching - Baeza-Yates, Navarro - 1996
5   Multiple approximate string matching - Baeza-Yates, Navarro - 1997
3   Pattern matching problem for strings with short descriptions - Karpinski, Shinohara et al. - 1997
2   Fast text searching allowing errors - de Moura, Navarro et al.
1   Indexing compressed text - de Moura, Navarro et al. - 1997

Documents on the same site (http://www.dcc.uchile.cl/~gnavarro/publ.html):
A More Precise Solution to Two Problems on Tries - Navarro, Poblete
Fast Approximate String Matching in a Dictionary - Baeza-Yates, Navarro (1998)
An Optimal Index for PAT Arrays - Navarro (1996)

CiteSeer.IST - Copyright Penn State and NEC