R. A. Baeza-Yates and G. H. Gonnet. Efficient text searching of regular expressions. In G. Ausiello, M. Dezani-Ciancaglini, and S. R. D. Rocca, editors, Proceedings of the 16th International Colloquium on Automata, Languages and Programming, volume 372 of LNCS, pages 46--62, Berlin, July 1989. Springer.
....substring stated here is not the most efficient one, and it is possible to achieve sublinear expected time for any regular expression, even logarithmic expected time, if preprocessing of the text to be searched is allowed, using the algorithms developed for regular expressions by R. Baeza-Yates [4, 3], or in the case of regular grammars by B. W. Watson [13, 14]. 6. Conclusions. I have described an efficient way to match regular languages and extract last points of use for selected transitions in a single pass. The algorithms are used in a regular expression matching and search library, and a ....
R. A. Baeza-Yates and G. H. Gonnet. Efficient text searching of regular expressions. In G. Ausiello, M. Dezani-Ciancaglini, and S. R. D. Rocca, editors, Proceedings of the 16th International Colloquium on Automata, Languages and Programming, volume 372 of LNCS, pages 46--62, Berlin, July 1989. Springer.
....first with shortcuts. The smaller k is, the more subtries will be cut off. When k = 0, all irrelevant subtries are cut off, and this gives the exact string search in time proportional only to the length of the string being sought. The algorithm can also be used to search full regular expressions [3]. We have proposed a trie structure which uses two bits per node and has no pointers. Our trie structure is designed for storing very large sets of word strings on secondary storage. The trie is partitioned by pages and neighboring nodes, such as parents, children and siblings, are clustered in ....
R.A. Baeza-Yates and G.H. Gonnet. Efficient text searching of regular expressions. In G. Ausiello, M. Dezani-Ciancaglini, and S.R.D. Rocca, editors, Proceedings of 16th International Colloquium on Automata, Languages and Programming, LNCS 372, pages 46--62, Stresa, Italy, July 1989. Springer-Verlag.
....the suffix tree. Other algorithms have used the BM approach in approximate string matching [4, 18]. Running times are O(kn) for Baeza-Yates and Gonnet [4] and O(kn(1/(m-k) + k/c)) for Tarhio and Ukkonen [18]. The problem of approximate matching of a class of patterns was also studied [2, 1, 5], especially in the case of patterns with don't care symbols [10, 17, 16, 3, 8, 14]. Fischer and Paterson [10] developed an O(n log c log^2 m log log m) time algorithm based on the linear product. Abrahamson [1] extended this method for generalized string pattern. Pinter [17] has used the Aho and ....
R. Baeza-Yates and G. Gonnet. Efficient text searching of regular expressions. 16th International colloquium on Automata, Languages and Programming. Stresa, Italy, July 1989.
....of this automaton is such that it also helps to identify easily the occurrences of the patterns found in t. We do not impose any preprocessing of the text. There exists a sublinear average time algorithm for searching for any regular expression in a preprocessed text represented by a Patricia tree [Baeza-Yates and Gonnet 1989]. Constructing a deterministic automaton representing A*L(G) from G can be done by adding loops labeled with elements of the alphabet A to the initial state and then using the classical powerset construction to obtain a deterministic automaton [Aho et al. 1986]. However, the size of the alphabet ....
Baeza-Yates, Ricardo A. and Gonnet, Gaston H. 1989. Efficient text searching of regular expressions. In Proceedings of the 16th International Colloquium on Automata, Languages and Programming, ICALP '89, Lecture Notes in Computer Science, vol. 372, Springer-Verlag, Berlin, 46--62.
....be solved by scanning the text. Regular expression matching takes O(n) time (excluding the preprocessing of the regular expression) [1] and approximate string matching O(kn) time [7, 18]. Baeza-Yates and Gonnet have described methods to use the suffix tree to do both regular expression matching [5] and approximate string matching [6]. The latter idea was also independently mentioned in [10, Remark 2]. Both of these methods are based on scanning one suffix of T at a time to find whether it has a matching prefix. The methods take advantage of the fact that, if a set of suffixes has a common ....
R. A. Baeza-Yates and G. H. Gonnet. Efficient text searching of regular expressions. In Proc. 16th International Colloquium on Automata, Languages and Programming (ICALP), pages 46--62, 1989.
....are then provided that are efficient in utilizing the special data structure. In this section, we describe two prominent data structures that fall under this type of approach. 3.2.2.1 Patricia Trees. Patricia trees are based on the semi-infinite string (sistring) model of textual data [GBY91, BYG89]. In this model, textual data (whether structured or unstructured) is viewed as a string starting at each position of the text and continuing indefinitely to the right. This model is primarily targeted towards unstructured textual data but can be adapted to text with ....
....for a text of size n, there are n external nodes (or leaves) in the Pat tree and n - 1 internal nodes, thus making the tree O(n) in size. For example, the Patricia tree for the string 01100100010111 when the first eight sistrings have been inserted looks like Figure 7 (figure taken from [BYG89]). [Figure 7: The Pat tree for the string 01100100010111 after the insertion of the first eight sistrings.] In Figure 7, the numbers in the leaves represent the position of the sistring at that node (e.g. the leftmost leaf represents the seventh sistring 00010111 in ....
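The sistring model quoted above can be illustrated with a minimal sketch. It assumes 1-based positions, as in the excerpt, and approximates each semi-infinite string by the finite suffix of the text; the helper name `sistrings` is hypothetical and is not taken from [BYG89] or from the PAT software.

```python
def sistrings(text, count=None):
    """Return (position, sistring) pairs for the first `count` positions.

    Positions are 1-based. Each sistring is approximated by the finite
    suffix of `text` that starts at that position.
    """
    limit = len(text) if count is None else min(count, len(text))
    return [(i + 1, text[i:]) for i in range(limit)]


text = "01100100010111"
first_eight = sistrings(text, 8)

# Leaves of a Pat tree appear in lexicographic order of their sistrings,
# so the leftmost leaf holds the smallest sistring inserted so far.
leftmost = min(first_eight, key=lambda pair: pair[1])

for position, s in first_eight:
    print(position, s)
print("leftmost leaf:", leftmost)
```

Under these assumptions the smallest of the first eight sistrings is the seventh, 00010111, which agrees with the leftmost leaf described in the excerpt.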
[Article contains additional citation context not shown here]
Ricardo A. Baeza-Yates and Gaston H. Gonnet. Efficient text searching of regular expressions. Proceedings, 16th International Colloquium on Automata, Languages, and Programming, pages 46--62, 1989.
....in a journal authored by Jane Doe with a title containing "creature". These types of searches are less common in document processing systems because of the lack of strong searching capabilities. Such searches can truly help readers find apparently complex information quickly and easily [BYG89, ACM93]. Fortunately, SGML provides a framework for these searches. In this paper, we refer to searches that involve conditions over logical structural regions of documents as "queries". 2 Query languages for structured documents. Since documents have a complex hierarchical structure, it is ....
Ricardo A. Baeza-Yates and Gaston H. Gonnet. Efficient text searching of regular expressions. Proceedings, 16th International Colloquium on Automata, Languages, and Programming, pages 46--62, 1989.
.... on the practical experience gained with tools developed in conjunction with the Unix system, is given by Aho [2]. More recently, a general algorithm for regular expression search in preprocessed text with average case time complexity of O(√n) has been developed by Baeza-Yates and Gonnet [6]. Much effort has been devoted to addressing special cases of regular expression search. Search algorithms have been developed for finding single keywords [8, 15, 19], sets of keywords [4], keywords separated by sequences of don't cares [1, 12, 20], and other simple patterns [7, 23]. A recent ....
Ricardo A. Baeza-Yates and Gaston H. Gonnet. Efficient text searching of regular expressions. In Proceedings 16th International Colloquium on Automata, Languages and Programming, pages 46--62, Stresa, Italy, 1989.
.... Jane Doe with a title containing "creature". These types of searches are less common in traditional information retrieval systems, which lack the capability of involving structure in searches. Searches involving structure and complex operations such as join have been the object of recent research [4, 1]. In this paper, we will refer to such searches as "queries". 2 Query languages for structured documents. The object of developing languages for the purpose of querying is to attain the capability of involving complex operations involving text (data) and structure (meta data). Traditional query ....
Ricardo A. Baeza-Yates and Gaston H. Gonnet. Efficient text searching of regular expressions. Proceedings, 16th International Colloquium on Automata, Languages, and Programming, pages 46--62, 1989.
.... the problem of approximate search [4, 18]. Running times are O(kn) for Baeza-Yates and Gonnet [4] and O(kn(1/(m-k) + k/c)) for Tarhio and Ukkonen [18]. The problem of searching with errors for a set of words in a text has also been studied [2, 1, 5], in particular the search for words with gaps [10, 17, 16, 3, 8, 14]. Fischer and Paterson [10] developed an algorithm based on a method using the linear product, with running time O(n log c log^2 m log log m). Abrahamson [1] generalized this algorithm to the case of ....
R. Baeza-Yates and G. H. Gonnet. Efficient text searching of regular expressions. 16th International colloquium on Automata, Languages and Programming. Stresa, Italy, July 1989.
....the system depend heavily on this strategy. A data structure for indexing the data is first decided upon, and operations are provided that are efficient in utilizing the constructed data structure. The work by Gonnet et al. is most notable in this approach. They used an indexing approach [GBY91, BYG89] with Patricia trees for prefix searches on semi-infinite strings (sistrings). The PAT software [Ope94] from Open Text uses this indexing approach for its query processing. The base for our work will be the PAT software. The reasons for choosing PAT are various: i) it is the result of a great ....
Ricardo A. Baeza-Yates and Gaston H. Gonnet. Efficient text searching of regular expressions. Proceedings, 16th International Colloquium on Automata, Languages, and Programming, pages 46--62, 1989.
....the eigenvalues of a matrix (for example, Gerschgorin's theorem [Wil65, Chapter 2, §13]). In our example, from the T matrix we can obtain a bound of -2 on Re(λ_1), which is far from -6.55, but good enough for asymptotic purposes. Another application of these bounding techniques can be found in [BYG89]. 3 Weakly Closed Collections. For some classes of balanced trees, an event on the fringe may depend on events happening outside the fringe. In that case unknown probabilities are introduced to handle this problem. For example, in AVL trees, the composition of the fringe may change due to ....
R. Baeza-Yates and G.H. Gonnet. Efficient text searching of regular expressions. In G. Ausiello, M. Dezani-Ciancaglini, and S. Ronchi Della Rocca, editors, ICALP'89, Lecture Notes in Computer Science 372, pages 46--62, Stresa, Italy, July 1989. Springer-Verlag.
....5 gives the main result of the paper: an algorithm to search any regular expression, with its expected time analysis. Section 6 gives a heuristic to optimize generic pattern matching queries, in particular when given as regular expressions. A preliminary version of this paper was presented in [BYG89a], and these results are part of [BY89]. 2 Preliminaries. 2.1 Notation. We use Σ to denote the alphabet (a set of symbols). A string over an alphabet Σ is a finite length sequence of symbols from Σ. The empty string (ε) is the string with no symbols. If x and y are strings, xy ....
R. Baeza-Yates and G.H. Gonnet. Efficient text searching of regular expressions. In G. Ausiello, M. Dezani-Ciancaglini, and S. Ronchi Della Rocca, editors, ICALP'89, Lecture Notes in Computer Science 372, pages 46--62, Stresa, Italy, July 1989. Springer-Verlag.
No context found.
Ricardo Baeza-Yates and Gaston H. Gonnet, Efficient text searching of regular expressions, Proceedings of the 16th International Colloquium on Automata, Languages and Programming, Lecture Notes in Computer Science, vol. 372, Springer-Verlag,