9 citations found.
Ricardo A. Baeza-Yates and Gonzalo Navarro. Block addressing indices for approximate text retrieval. Journal of the American Society for Information Science, 51(1):69--82, 2000.

This paper is cited in the following contexts:
Opportunistic Data Structures with Applications - Ferragina, Manzini (2000)   (28 citations)

....still unacceptable for large text collections. Approaches to combine compression and indexing techniques are nowadays receiving more and more attention, especially in the context of word-based indices, achieving experimental trade-offs between space occupancy and query performance (see e.g. [4, 19, 27]). An interesting idea towards the direct compression of the index data structure has been proposed in [13, 14], where the properties of the Lempel-Ziv compression scheme have been exploited to reduce the number of index points, still supporting pattern searches. As a result, the overall index ....

.... of novel text retrieval systems based on the concept of block addressing (first introduced in the Glimpse tool [19]). The notable feature of block addressing is that it can achieve both sublinear space overhead and sublinear query time, whereas inverted indices achieve only the second goal [4]. Unfortunately, up to now all the known block addressing indices [4, 19] achieve time and space sublinearity only under some restrictive conditions on the block size. We show how to use our opportunistic data structure to devise a novel block addressing scheme, called CGlimpse (standing for ....

[Article contains additional citation context not shown here]

R. Baeza-Yates and G. Navarro. Block addressing indices for approximate text retrieval. Journal of the American Society for Information Science, 51(1):69--82, 2000.


A Metric Index for Approximate String Matching - Chávez, Navarro (2002)   (1 citation)  Self-citation (Navarro)

....n=m) Indexing text for approximate string matching has received attention only recently. Despite some progress in the last decade, the indexing schemes for this problem are still rather immature. There exist some indexing schemes specialized to word-wise searching on natural language text [24, 4, 3]. These indexes perform quite well in that case but they cannot be extended to handle the general case. Extremely important applications such as DNA, proteins or oriental languages fall outside this case. The indexes that solve the general problem can be divided into three classes. A first one [19, ....

R. Baeza-Yates and G. Navarro. Block-addressing indices for approximate text retrieval. In Proc. ACM CIKM'97, pages 1--8, 1997.


Matchsimile: A Flexible Approximate Matching Tool for.. - Navarro, Baeza-Yates, Jo (2001)   Self-citation (Baeza-Yates Navarro)

....e.g. 6] and none addressing our particular problem. The first such system was Glimpse [10], which indexes the text and permits approximate searching by looking sequentially at all the vocabulary words. The same idea, with few modifications, has been used in other natural language indexes [3, 13]. A recent system relying on a slightly different approximate matching model is LikeIt [17]. In this system symbol transpositions are permitted and penalized according to their distances from their original positions. Based on recent algorithmic developments, LikeIt still does not deal with the ....

R. Baeza-Yates and G. Navarro. Block-addressing indices for approximate text retrieval. Journal of the American Society for Information Science (JASIS), 51(1):69-82, January 2000.


Adding Compression to Block Addressing Inverted Indexes - Navarro, de Moura.. (2000)   (9 citations)  Self-citation (Baeza-Yates Navarro)

....Two words can be in the same document but they may or may not form a phrase or be close. For these queries we must search directly the text of those documents where all the relevant words appear. The third type, called a block addressing index, goes one step further (Manber and Wu, 1994; Baeza-Yates and Navarro, 2000). It divides the text into blocks of fixed size (which may span many documents, be part of a document or overlap with document boundaries). The index stores only the blocks where each word appears. Since normally there are far fewer blocks than documents, the space occupied by the index is very ....

....to documents (or files) this way following the natural divisions of the text collection. If we consider the case of simple queries (say, one word) where we are required to return only the list of matching documents, then pointing to documents is a very adequate choice. Moreover, as shown in (Baeza-Yates and Navarro, 2000), it may reduce space requirements with respect to using blocks of fixed size. (Figure 3. A block addressing index.) Also, if we use blocks of fixed size and pack ....

[Article contains additional citation context not shown here]

Baeza-Yates, R. and G. Navarro: 2000, `Block-Addressing Indices for Approximate Text Retrieval'. Journal of the American Society for Information Science (JASIS) 51(1), 69--82.


A Hybrid Indexing Method for Approximate String Matching - Navarro, Baeza-Yates (2001)   (8 citations)  Self-citation (Baeza-Yates Navarro)

....results and Section 7 our conclusions. This paper is an extended and improved version of [29]. 2 Previous Work and Our Contribution. There are two types of indexing mechanisms for approximate string matching, which we call word retrieving and sequence retrieving. Word retrieving indexes [23, 8, 2] are more oriented to natural language text and information retrieval. They can retrieve every word whose edit distance to the pattern word is at most k. Hence, they are not able to recover from an error involving a separator, such as recovering the word flowers from the misspelled text flo ....

....search longer patterns as the text grows. As we can tune j, we softly move to j = 1 (then eliminating verification costs) when n becomes large with respect to m. This trick is also present in the sublinearity result of Myers' index [24] and implicit in similar results on natural language texts [8, 25]. Finally, it is interesting to compare the limit error level for which our index remains sublinear in its average search time against that of Myers [24] and typical filtration indexes. As explained in Section 4.2, ours and Myers' index should share a single analysis, as the idealized method ....

R. Baeza-Yates and G. Navarro. Block-addressing indices for approximate text retrieval. Journal of the American Society for Information Science (JASIS), 51(1):69--82, January 2000.


Indexing Text with Approximate q-grams - Navarro, Sutinen, Tanninen, Tarhio (2000)   (8 citations)  Self-citation (Navarro)

....problems in this area [27, 3]. Despite some progress in the last years, the indexing schemes for this problem are still rather immature. There are two types of indexing mechanisms for approximate string matching, which we call word retrieving and sequence retrieving. Word retrieving indexes [18, 5, 2] are more oriented to natural language text and information retrieval. They can retrieve every word whose edit distance to the pattern word is at most k. Hence, they are not able to recover from an error involving a separator, such as recovering the word flowers from the misspelled text flo ....

R. Baeza-Yates and G. Navarro. Block-addressing indices for approximate text retrieval. J. of the American Society for Information Science (JASIS), 51(1):69--82, January 2000.


Fast and Flexible Word Searching on Compressed Text - de Moura, Navarro.. (2000)   (4 citations)  Self-citation (Baeza-Yates Navarro)

....codewords of the vocabulary symbols that match with it. This is done by sequentially traversing the vocabulary and collecting all the words that match the pattern. This technique has already been used in block addressing indices on uncompressed texts [Manber and Wu 1993; Araújo et al. 1997; Baeza-Yates and Navarro 1997]. Since the vocabulary is very small compared to the text size, the sequential search time on the vocabulary is negligible, and there is no other additional cost to allow complex queries. This is very difficult to achieve with online plain text searching, since we take advantage of the knowledge ....

Baeza-Yates, R. and Navarro, G. 1997. Block addressing indices for approximate text retrieval. In Proc. of Sixth ACM International Conference on Information and Knowledge Management (CIKM'97) (1997), pp. 1--8. Extended version to appear in JASIS.


Linear Time Sorting of Skewed Distributions - de Moura, Navarro   Self-citation (Navarro)

.... language texts show that the value of the constant for natural language texts is between 1.5 and 2.0 [ANZ97]. Further, the least frequent word of a text (xn) has a small number of occurrences that is close to 1 (in almost all natural language texts there are many words with frequency 1 [BYN97]). Therefore, the extra space used by the remainingsort algorithm in this application is K = O((log n)^2), which is a small extra space requirement. To show the usefulness of the idea we made experiments using literary texts from the TREC collection [Har95]. We have chosen the following texts: ....

R. Baeza-Yates and G. Navarro. Block addressing indices for approximate text retrieval. In Proc. of Sixth ACM International Conference on Information and Knowledge Management (CIKM'97), pages 1--8, Las Vegas, Nevada, 1997.


A formal derivation of Heaps' Law - Grootjen, van Leijenhorst, van der.. (2003)

No context found.

Ricardo A. Baeza-Yates and Gonzalo Navarro. Block addressing indices for approximate text retrieval. Journal of the American Society for Information Science, 51(1):69--82, 2000.

CiteSeer.IST - Copyright Penn State and NEC