G. Navarro, E. S. de Moura, M. Neubert, N. Ziviani, and R. Baeza-Yates. Fast and flexible word searching on compressed text. Information Retrieval Journal, 3(1):49--77, 2000.
....large texts. Some researchers therefore focused on the compressed matching problem (i.e. designing algorithms that perform string matching through compressed text without decompression). Some of this research is on searching LZ77 compressed texts [24], LZ78 compressed texts [6], and Huffman coded texts [45, 46]. Although these algorithms improve query efficiency, they still rely on a full linear scan of the compressed text and are inefficient over large texts. Thus approaches for combining indexing and compression techniques to facilitate searching in compressed text are nowadays receiving more and ....
E. Moura, G. Navarro, N. Ziviani, and R. Baeza-Yates. Fast and flexible word searching on compressed text. ACM Transactions on Information Systems, 18(2):113--139, 2000.
No context found.
G. Navarro, E. S. de Moura, M. Neubert, N. Ziviani, and R. Baeza-Yates. Fast and flexible word searching on compressed text. Information Retrieval Journal, 3(1):49--77, 2000.
.... collections (where the cost of keeping an up to date index is prohibitive, including the searchers inside text editors and Web interfaces) for not very large texts (up to a few hundred megabytes) and even as internal tools of indexed schemes (as agrep [29] is used inside glimpse [15] or cgrep [17] is used inside compressed indexes [21]). Dept. of Computer Science, University of Chile. Blanco Encalada 2120, Santiago, Chile. gnavarro@dcc.uchile.cl. Work developed while the author was on a postdoctoral stay at the Institut Gaspard Monge, Univ. de Marne-la-Vallée, France, partially supported ....
E. Moura, G. Navarro, N. Ziviani, and R. Baeza-Yates. Fast and flexible word searching on compressed text. ACM Transactions on Information Systems (TOIS), 2000. To appear. Earlier versions in SIGIR'98 and SPIRE'98.
....this is the first practical result for this problem. 2 Related Work. Two different approaches exist to search compressed text. The first one applies to compression methods based on single symbol replacement, such as Huffman coding [Huf52]. Efficient solutions exist for this case [Man97, MNZBY00], but in general the compression ratio is not so good or the functionality is limited (e.g. to natural language text). The second approach considers Ziv-Lempel compression, which is based on finding repetitions in the text and replacing them with references to similar strings previously ....
E. Moura, G. Navarro, N. Ziviani, and R. Baeza-Yates. Fast and flexible word searching on compressed text. ACM TOIS, 18(2):113--139, 2000.
....and Ricardo Baeza-Yates) 1 and text had not been compressed. This is one of the few cases where there is no space-time trade-off. Traditionally, compression techniques have not been used in IR systems because the compressed texts did not allow fast access. Recent text compression methods [7] have made it possible to search the compressed text directly, faster than the original text, and have also improved the amount of compression obtained. Direct access to any point in the compressed text has also become possible, and the IR system might access a given word in a ....
....a first pass over the text to obtain the frequency of each different text word and then it performs the actual compression in a second pass. The text is not only composed of words but also of separators. An efficient way to deal with words and separators is to use a method called spaceless words [7]. If a word is followed by a space, just the word is encoded. If not, the word and then the separator are encoded. At decoding time, it is assumed that a space follows each word, except if the next symbol corresponds to a separator. Figure 1 presents an example of compression using Huffman coding ....
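The spaceless words rule quoted above can be illustrated with a minimal sketch. It is not taken from the cited paper: the function names, the ASCII-only definition of a word, and the treatment of a space at the very start or end of the text (kept as an explicit separator) are assumptions made for this example. The Huffman coding step itself is omitted; only the symbol stream and the first-pass frequency count are shown.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters/digits; anything else is a separator.
TOKEN = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9]+")
WORD = re.compile(r"[A-Za-z0-9]+")


def is_word(symbol):
    """Return True if the symbol is a word under the assumption above."""
    return WORD.fullmatch(symbol) is not None


def spaceless_encode(text):
    """Turn text into a symbol list, dropping a single space between two words."""
    tokens = TOKEN.findall(text)  # words and separators alternate
    symbols = []
    for i, tok in enumerate(tokens):
        between_words = 0 < i < len(tokens) - 1
        if tok == " " and between_words:
            continue  # implied by the model, so not encoded
        symbols.append(tok)
    return symbols


def spaceless_decode(symbols):
    """Rebuild the text: a space follows a word unless a separator comes next."""
    parts = []
    for i, sym in enumerate(symbols):
        parts.append(sym)
        next_is_word = i + 1 < len(symbols) and is_word(symbols[i + 1])
        if is_word(sym) and next_is_word:
            parts.append(" ")
    return "".join(parts)


def symbol_frequencies(text):
    """First pass: count each distinct symbol, as a coder would need."""
    return Counter(spaceless_encode(text))
```

For the input "to be, or not  to be", the single spaces disappear from the symbol stream while the comma-space and the double space remain as explicit separator symbols, and decoding restores the original text.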
E. Moura, G. Navarro, N. Ziviani, and R. Baeza-Yates. Fast and flexible word searching on compressed text. ACM Transactions on Information Systems (TOIS), 18(2):113--139, 2000.
....takes worst case time O(m n R) where R is the number of matches (note that it could be that R = u n). Two different approaches exist to search compressed text. The first one is rather practical. Efficient solutions based on Huffman coding [10] on words have been presented by Moura et al. [18], but they require the text to contain natural language and to be large (say, 10 Mb or more). Moreover, they allow only searching for whole words and phrases. There are also other practical ad hoc methods [15], but the compression they obtain is poor. Moreover, in these compression formats n = ....
....a new, faster algorithm based on Boyer-Moore. Approximate string matching on compressed text aims at finding the pattern where a limited number of differences between the pattern and its occurrences are permitted. The problem, advocated in 1992 [2], had been solved for Huffman coding of words [18], but the solution is limited to searching for a whole word and retrieving whole words that are similar. The first true solutions appeared very recently, by Kärkkäinen et al. [11], Matsumoto et al. [16], and Navarro et al. [22]. 3 The Ziv-Lempel Compression Formats LZ78 and LZW. The general idea of ....
E. Moura, G. Navarro, N. Ziviani, and R. Baeza-Yates. Fast and flexible word searching on compressed text. ACM Trans. on Information Systems, 18(2):113--139, 2000.
....are twice as fast as a decompression followed by a search using the best algorithms for both tasks. 2 Related Work. Two different approaches exist to search compressed text. The first one is rather practical. Efficient solutions based on Huffman coding [10] on words have been presented in [16], but they require the text to contain natural language and to be large (say, 10 Mb or more). Moreover, they allow only searching for whole words and phrases. There are also other practical ad hoc methods [15], but the compression they obtain is poor. Moreover, in these compression formats n = ....
E. Moura, G. Navarro, N. Ziviani, and R. Baeza-Yates. Fast and flexible word searching on compressed text. ACM Trans. on Information Systems, 2000. To appear. Previous versions in SIGIR'98 and SPIRE'98.
.... (where the cost of keeping an up to date index is prohibitive, including the searchers inside text editors and Web interfaces) for not very large texts (up to a few hundred megabytes) and even as internal tools of indexed schemes (as agrep [26] is used inside glimpse [14] or cgrep [16] is used inside compressed indexes [20]). There is a large class of string matching algorithms in the literature (see, for example, [8, 4]), but not all of them are practical. There is also a wide variety of fast online string matching tools in the Dept. of Computer Science, University of Chile. ....
E. Moura, G. Navarro, N. Ziviani, and R. Baeza-Yates. Fast and flexible word searching on compressed text. ACM Transactions on Information Systems (TOIS), 2000. To appear. Earlier versions in SIGIR'98 and SPIRE'98.