Haertel, M., "Gnugrep-2.0," Usenet archive comp.sources.reviewed, Vol. 3, 1993.
....of input patterns. Clearly, a better solution to this problem can be devised. A desirable solution would find all pattern occurrences by scanning T only once, regardless of the number of input patterns. Much research on multiple string pattern matching can be found in the literature [1, 5, 6, 13]. State-of-the-art algorithms can quickly find all pattern occurrences by scanning T only once, even for a very large number of patterns; for example, Wu and Manber's algorithm [13] can find all occurrences of 10,000 input patterns in 15.8 MB of text in about 10 seconds. Partially supported by the ....
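To make the "scan T only once" idea concrete, here is a minimal Aho-Corasick sketch in Python. It is an illustration of single-pass multi-pattern matching in general, not the Wu-Manber algorithm [13] cited above; the function names `build_automaton` and `search` are hypothetical, and patterns are assumed to be non-empty strings.

```python
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton (sketch): trie, failure links, outputs."""
    goto = [{}]       # goto[state][char] -> next state; state 0 is the root
    fail = [0]        # failure link of each state
    out = [set()]     # patterns recognized on reaching each state
    for pat in patterns:  # assumes every pattern is non-empty
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    # Breadth-first traversal sets failure links one depth level at a time;
    # children of the root keep the default failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the longest proper suffix state.
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def search(text, patterns):
    """Return sorted (start_index, pattern) pairs, reading text once."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return sorted(hits)


print(search("ushers", ["he", "she", "his", "hers"]))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Each text character is read once; the work per character is bounded by failure-link steps (amortized constant) plus the number of matches reported, independent of how many patterns were loaded.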
.... of 2 the standard egrep package, with only 25 patterns (egrep cannot handle more than 500 patterns), even though egrep does not allow errors. This implementation is not competitive, however, with very fast multiple string matching without errors, such as the methods used in agrep [WM92] or in GNU grep [Ha93] (which uses an algorithm based on Commentz-Walter [CW79]). We cannot hope to compete with string matching without errors, but we can hope to come close, which we did. We call this basic algorithm Algorithm 00; next, we show how to improve it. 3. Better and Faster Hash ....
Haertel, M., "Gnugrep-2.0," Usenet archive comp.sources.reviewed, Volume 3 (July, 1993).
....matching problem that combines the Boyer-Moore technique with the Aho-Corasick algorithm. The Commentz-Walter algorithm is substantially faster than the Aho-Corasick algorithm in practice. Hume [Hu91] designed a tool called gre based on this algorithm, and version 2.0 of fgrep from the GNU project [Ha93] uses it. Baeza-Yates [Ba89] also gave an algorithm that combines the Boyer-Moore-Horspool algorithm [Ho80] (a slight variation of the classical Boyer-Moore algorithm) with the Aho-Corasick algorithm. We present a different approach that also uses the ideas of Boyer and Moore. Our ....
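The common thread in these approaches is Boyer-Moore-style skipping applied to a whole set of patterns. The Python sketch below illustrates only that skipping idea: it is neither Commentz-Walter (which matches right to left through a trie of reversed patterns) nor the Baeza-Yates algorithm, and it uses naive verification where those algorithms use an automaton. The function name `horspool_multi` is hypothetical. The window length is the shortest pattern length `lmin`, and each Horspool shift is the minimum safe shift over all patterns' `lmin`-character prefixes.

```python
def horspool_multi(text, patterns):
    """Horspool-style skipping over a set of patterns (simplified sketch).

    Hypothetical helper for illustration only. Returns (start, pattern)
    pairs in order of start position.
    """
    lmin = min(len(p) for p in patterns)  # window length
    # shift[c]: distance from the rightmost occurrence of c among the first
    # lmin - 1 characters of any pattern to the end of the lmin-char prefix.
    # Taking the minimum over all patterns keeps every shift safe.
    shift = {}
    for p in patterns:
        for j in range(lmin - 1):
            shift[p[j]] = min(shift.get(p[j], lmin), lmin - 1 - j)
    hits = []
    pos = 0  # start of the current window
    while pos + lmin <= len(text):
        # Naive verification stands in for the trie/automaton of real methods.
        for p in patterns:
            if text.startswith(p, pos):
                hits.append((pos, p))
        # Characters absent from every prefix allow a full-window jump.
        pos += shift.get(text[pos + lmin - 1], lmin)
    return hits


print(horspool_multi("ushers", ["he", "she", "his", "hers"]))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The shift depends on the shortest pattern: the larger `lmin` is, the longer the possible jumps. A set containing very short patterns therefore loses most of the benefit of skipping.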
....experiments was a collection of articles from the Wall Street Journal totaling 15.8 MB. The patterns were words from the file (all patterns appeared in the text). Table 1 compares our algorithm, labeled agrep, against four other search routines: the original egrep and fgrep, GNU grep version 2.0 [Ha93], and gre, an older program written by Andrew Hume (which at the time was the only program that could handle a large number of patterns). The patterns were words ranging in length from 5 to 15 characters, with an average length slightly above 6. The original egrep and fgrep could not handle (or took too long for) ....