Hui, L.: Color set size problem with applications to string matching. In: Proceedings of the Third Annual Symposium on Combinatorial Pattern Matching (CPM 92). Volume 644 of LNCS., Springer-Verlag (1992) 230--243

This paper is cited in the following contexts:

Frequent-Subsequence-Based Prediction of Outer - Membrane Proteins Rong

....subsequences, we made use of an efficient implementation of generalized suffix tree (GST) [23] with some simple modifications. Suffix trees have been extensively used in string matching and are shown to be an effective data structure for finding common subsequences that runs in linear time [7, 10]. Since each protein sequence is essentially a string of letters, generalized suffix trees can be easily applied to mine frequent subsequences among protein sequences. Interested readers are referred to the above mentioned documents for details. Starting from frequent subsequences, we build ....

Hui L., Color Set Size Problem with Applications to String Matching, Combinatorial String Matching, Lecture Notes in Computer Science, 644, p230-243, Springer-Verlag, 1992.


String Processing Algorithms - Inenaga (2003)

....to characterize the data. Shimozono et al. developed knowledge discovery system BONSAI [63] that outputs a decision tree based on the best substring patterns for separating two input data sets S, T # # . The best substring pattern separating S and T can be found in linear time by a clever use [32] of the suffix tree for S T . In order that BONSAI system can deal with subsequence patterns as well, Hirao et al. [28] proposed a practical algorithm to find the subsequence pattern that is the most abundant in one set and the rarest in the other. Since this problem is NP hard, they employed ....

....w . The following lemma is quite clear from the above definition. str u, then L (u) Definition 38 (Finding the best substring pattern according to f) # # of strings. p) It is stated in [28] that the above problem is solvable in O(#S# #T#) time by a clever use [32] of STree(S T ) 13.3 Finding Best Subsequence Patterns seq w if p is a subsequence of w. # # . The subsequence language of p is defined by seq w . The following lemma is quite clear from the above definition. seq u, then L (u) Definition 40 (Finding the best ....

L. Hui. Color set size problem with applications to string matching. In Proc. 3rd Lecture Notes in Computer Science, pages 230--243. Springer-Verlag, 1992.


Multiple Pattern Matching in LZW Compressed Text - Kida, Takeda, Shinohara.. (1998)   (9 citations)

....This paper is based on [18] and [17]. [FIG. 1. Dictionary trie.] 3 Preliminaries In the following subsections we briefly sketch the LZW compression, and review the Aho-Corasick pattern matching machine and the generalized suffix trie [15]. These data structures are used in our algorithm. First, we introduce some notation. Let Σ be a finite set of characters, called an alphabet, and Σ* be the set of strings over Σ. We denote the length of u ∈ Σ* by |u|. We call the string whose length is 0 the null string, and denote it by ε. Let Σ ....

....lemma characterizes the state transition function δ of the AC machine. This is a modified version of Lemma 3 in [2]. Let q ∈ Prefix(Π), u ∈ Σ*, and let p = δ(q, u). Then, the string p is the longest string in the set Suffix(qu) ∩ Prefix(Π). 3.3 Generalized suffix trie A generalized suffix trie [15] for a set Π of strings (GST, for short) is a trie, which represents the set of suffixes of the strings in Π. It is an extension of the suffix trie for a single string. FIG. 4 shows the GST for Π = {aba, ababb, abca, bb}. Note that each node of the GST for Π corresponds to a string in Factor(Π) ....

L. C. K Hui. Color set size problem with application to string matching. In Combinatorial Pattern Matching, volume 644 of Lecture Notes in Computer Science, pages 230-243. Springer-Verlag, 1992.
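
The generalized suffix trie described in the context above can be illustrated with a minimal sketch. It assumes the naive construction that inserts every suffix of every string (quadratic in the total length), not the linear-time suffix tree construction; the function names `build_gst`, `is_factor`, and `count_nodes` are hypothetical and chosen only for this illustration. The input is the four example strings aba, ababb, abca, bb quoted in that context.

```python
def build_gst(strings):
    """Naive generalized suffix trie: insert every suffix of every string.

    Each node is a dict mapping a character to a child node.
    """
    root = {}
    for s in strings:
        for i in range(len(s)):
            node = root
            for ch in s[i:]:
                node = node.setdefault(ch, {})
    return root


def is_factor(trie, w):
    """Return True if w is a substring (factor) of at least one input string."""
    node = trie
    for ch in w:
        if ch not in node:
            return False
        node = node[ch]
    return True


def count_nodes(trie):
    """Count nodes including the root; each node spells one distinct factor."""
    return 1 + sum(count_nodes(child) for child in trie.values())


if __name__ == "__main__":
    trie = build_gst(["aba", "ababb", "abca", "bb"])
    print(is_factor(trie, "bab"))  # True: "bab" occurs in "ababb"
    print(is_factor(trie, "ac"))   # False: no input string contains "ac"
```

Because every path from the root spells a distinct factor, the node count equals the number of distinct substrings of the set (counting the empty string for the root), which is one way to sanity-check such a structure.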


A String Pattern Regression Algorithm and Its.. - Bannai, Inenaga.. (2002)

....This measure is more convenient for our calculations, and will be used for the remaining of the paper. For the substring pattern class, the problem can be solved in linear time, in the summed length of all the string attributes, by a clever use of suffix trees for strings, extending the algorithm of [7] to incorporate the numerical attributes. However, the problem is NP hard in general, for example, for more complex pattern classes such as subsequence patterns. This paper will focus on giving an efficient but exact branch and bound algorithm for solving this problem in a general case, applicable ....

Hui, L., Color set size problem with applications to string matching, Proc. Third Annual Symposium on Combinatorial Pattern Matching, LNCS 644:230--243, 1992.


Pattern Inference under many Guises - Sagot, Wakabayashi

....at the end of s a symbol not appearing in . 1.5.3.2 Generalized suffix trees Trees for representing all the suffixes of a set of strings {s_i, 1 ≤ i ≤ N for some N ≥ 2} are called generalized suffix trees and are constructed in a way very similar to the construction of the suffix tree for a single string [1, 15]. We denote such generalized trees by GT. They share all the properties of a suffix tree given in Section 1.5.3.1 with, in Property 1.5.1, string s substituted by strings s_1, ..., s_N. In particular, a generalized suffix tree GT verifies the fact that every suffix of every string s_i in the set ....

L. C. K. Hui. Color set size problem with applications to string matching. In A. Apostolico, M. Crochemore, Z. Galil, and U. Manber, editors, Combinatorial Pattern Matching, volume 644 of Lecture Notes in Computer Science, pages 230-243. Springer-Verlag, 1992.


Multiple Pattern Matching Algorithms on Collage System - Kida, Matsumoto, Takeda.. (2001)

....assuming that Occ( x) lpf (x) and lsf (x) are already computed. Proof. It is trivial for k 2. Suppose k 2. Note that we can enumerate the set Occ ( x x) in O(jOcc ( x x)j) time from Lemma 6, and that 8 lps (x) lps (lpf (x) We use a generalized suffix trie [5] for a set of strings (GST for short) in order to represent the set of suffixes of the strings in . It is an extension of the suffix trie for a single string. Note that each node of the GST corresponds to a string in Factor( The construction of the GST takes O(m^2) time and space. Now, we ....

L. C. K. Hui. Color set size problem with application to string matching. In Combinatorial Pattern Matching, volume 644 of Lecture Notes in Computer Science, pages 230-243. Springer-Verlag, 1992.


New Techniques for Extracting Features from Protein Sequences - Wang, Ma, Shasha, Wu (2001)   (10 citations)

....number of mutations between the motif and the sequence is 1, representing the cost of inserting an A in the motif. The Sdiscover tool is based on a heuristic that works by taking a small sample K of sequences from the given set of sequences T p and storing them in a generalized suffix tree (GST) [21]. The GST can be constructed asymptotically in O(n) time and space where n is the total length of all sequences in the sample. The heuristic then traverses the GST to generate candidate regular expression motifs and compares these candidate motifs with all the sequences in T p to calculate their ....

L. C. K. Hui. Color set size problem with applications to string matching. In: A. Apostolico, M. Crochemore, Z. Galil, and U. Manber (eds.), Combinatorial Pattern Matching, Lecture Notes in Computer Science, volume 644, Springer-Verlag, 1992, pp. 230--243.


Pattern Discovery in Biology: Theory and Applications - Floratos (1999)   (2 citations)

....of L(P ) Unfortunately, the amount of expressiveness allowed in C directly impacts the computational demands of the problem. In the simplest case, when C = Sigma , the problem of finding all patterns in C with a given minimum support can be solved in linear time using generalized suffix trees [56]. In almost every other case though, the class C is expressive enough to render the problem NP hard (the hardness result can be usually shown by a reduction from the longest common subsequence problem ....

L.C.K. Hui. Color set size problem with applications to string matching. In Combinatorial Pattern Matching, pages 230--243, 1992.


Free Parallel Data Mining - Li (1998)   (1 citation)

....segments among a small sample A of the sequences; 2) combine the segments to form candidate motifs and evaluate the activity of the motifs in all of S to determine which motifs satisfy the specified requirements. Phase (1) consists of two subphases. In subphase A, a generalized suffix tree [44] (GST) for the sample of sequences is constructed. A suffix tree is a trie like data structure that compactly represents a string by collapsing a series of nodes having one child to a single node whose parent edge is associated with a string [60, 54]. A GST is an extension of the suffix tree, ....

L. C. K. Hui. Color set size problem with applications to string matching. In A. Apostolico, M. Crochemore, Z. Galil, and U. Manber, editors, Combinatorial Pattern Matching, Lecture Notes in Computer Science, Volume 644, pages 230--243, Springer-Verlag, 1992.


Suffix Trees and their Applications in String Algorithms - Grossi, Italiano (1993)   (2 citations)

.... finding squares [10, 68] and repetitions in a string [10] computing statistics for the non overlapping occurrences [11] finding the longest match between all ordered suffix prefix pairs of a given set of strings [55] finding the longest substring that appears in h out of k strings, for any h 2 [58]; computing characteristic strings [59] matching a string as an arbitrary path of an unrooted labeled tree [4] performing efficient dictionary matching [6, 5, 7, 21, 43] data compression schemes [39, 40, 73, 83, 84, 102, 103] searching for the longest run of a given motif in molecular ....

.... squares [10, 68] and repetitions in a string [10] computing statistics for the non overlapping occurrences [11] finding the longest match between all ordered suffix prefix pairs of a given set of strings [55] finding the longest substring that appears in h out of k strings, for any h 2 [58]; computing characteristic strings [59] matching a string as an arbitrary path of an unrooted labeled tree [4] performing efficient dictionary matching [6, 5, 7, 21, 43] Other interesting applications are described in the excellent survey of Apostolico [8] Beside pattern matching, suffix ....

Hui, L.C.K, Color set size problem with applications to string matching, Combinatorial Pattern Matching, 230--243, (1992).


Motifs in Sequences: Localization and Extraction - Crochemore, Sagot (2000)   (1 citation)

....repetitions, that is, using an index of the strings such as a suffix tree. Trees for representing all the suffixes of a set of strings {s_i, 1 ≤ i ≤ N for some N ≥ 2} are called generalized suffix trees and are constructed in a way very similar to the construction of the suffix tree for a single string [64] [65]. We denote such generalized trees by GT. They share all the properties of a suffix tree given in Section 3.3 with string s substituted by strings s_1, ..., s_N. In particular, a generalized suffix tree GT satisfies the fact that every suffix of every string s_i in the set leads to a distinct leaf. ....

L. C. K. Hui. Color set size problem with applications to string matching. In A. Apostolico, M. Crochemore, Z. Galil, and U. Manber, editors, Combinatorial Pattern Matching, volume 644 of Lecture Notes in Computer Science, pages 230-243. Springer-Verlag, 1992.


Verbumculus and the Discovery of Unusual Oligonucleotides - Apostolico, Gong, Lonardi (2000)

....time, which in cases like z 4 is achieved through resort to rather complex algorithmics, due to the structure of Var. For scores that require multiple tries to be built and superimposed to one another, like in the computation of SObs(w) for z 7 , z 8 and z 9 , the linear time algorithm by Hui [26] is used. 3 Software description and usage Verbumculus is composed by three modules: the tree builder Verbum, the graph drawing program Dot, and the graphic interface TreeViz. The entire package consists of about ten thousand lines of code. Verbum is written in C using the Standard Template ....

Hui, L. C. K. Color set size problem with applications to string matching. In Proceedings of the 3rd Annual Symposium on Combinatorial Pattern Matching (Tucson, AZ, 1992), A. Apostolico, M. Crochemore, Z. Galil, and U. Manber, Eds., no. 644 in Lecture Notes in Computer Science, Springer-Verlag, Berlin, pp. 230-243.


A Unifying Framework for Compressed Pattern Matching - Kida, Shibata, Takeda.. (1999)   (13 citations)

....and the procedure for enumerating Output. We have verified that Jump can be generalized to treat multiple patterns. Although we omit the detail, the idea is almost the same as [11] That is, we simulate the move of the AC automaton instead of the KMP automaton, and use the generalized suffix trie [8]. For Output, we have also done if a collage system contains neither repetitions nor truncations. The rest is left for our future work. Kosaraju [12] showed a faster pattern matching algorithm for LZW compression, which runs in O(n m # m log m) time. It is a challenging problem to achieve this ....

L. C. K. Hui. Color set size problem with application to string matching. In Combinatorial Pattern Matching, volume 644 of Lecture Notes in Computer Science, pages 230--243. Springer-Verlag, 1992.


Spelling Approximate Repeated Or Common Motifs Using a Suffix Tree - Sagot (1998)   (8 citations)

.... that, in this second case, if no errors are allowed, we obtain the same time and space complexities, in O(n) of the best algorithms for identifying repeated motifs [3] [5]. This is not true for the common motifs problem where we have an O(nN^2) time bound whereas Hui obtains an O(nN) bound [9]. His approach should thus be preferred when e = 0. Since both algorithms share similar structures, we show that only a minor modification to ours is needed so as to be able to switch to Hui's when e is zero (which seldom happens when one is dealing with biological sequences). Suffix trees for ....

....a little more than that since we also dealt with gaps) It is easy to modify it so that it can handle the repeated motifs problem as well. However, the approach described there did not try to take advantage of the sequence (or sequences) structure in order to obtain the valid models as was done in [9] but for identically repeated motifs only (no mismatches allowed) or in [11] but for fixed length motifs that had to appear at least once exactly in the sequence. In the present paper, this underlying structure is exploited to obtain a new model building algorithm dealing with a Hamming distance ....

[Article contains additional citation context not shown here]

L. C. K. Hui. Color set size problem with applications to string matching. In A. Apostolico, M. Crochemore, Z. Galil, and U. Manber, editors, Combinatorial Pattern Matching, volume 644 of Lecture Notes in Computer Science, pages 230--


Discovering Patterns from Large and Dynamic Sequential Data - Wang (1997)   (15 citations)

....among different strings. In addition, it needs to traverse every suffix tree to determine the support of a substring. The same is true of finding all positions of a substring. Another approach is to represent all strings in a single dynamic suffix tree, based on the generalized suffix tree (GST) [H92] designed for a set of static strings. In the following, we extend the GST to dynamic strings to solve the incremental discovery problems for multiple strings. In the case of multiple strings, the position of a character in a string consists of a string identifier and a position number within that ....

L.C.K. Hui, "Color set size problem with applications to string matching", in A. Apostolico, M. Crochemore, Z. Galil, and U. Manber, editors, Combinatorial Pattern Matching, Lecture Notes in Computer Science 644, 230-243, Springer-Verlag, 1992


Pattern Discovery In Sequence Databases: Algorithms And.. - Chirn (1997)   (1 citation)

....patterns are solutions of the query. Phase 1 consists of two subphases. In subphase A, we construct an index structure for the sequences in the sample. In subphase B, we traverse the structure to locate the candidate segments. 2.2.1 Subphase A of Phase 1 We construct a generalized suffix tree [39] (GST) for the sample of sequences. A suffix tree is a trie like data structure that compactly represents a string by collapsing a series of nodes having one child to a single node whose parent edge is associated with a string. Suffix trees are used extensively in string matching [46, 54]. A GST ....

....and hence appear twice in the leaves. Fact 2.1 ∀u ∈ subtree(v): count(u) ≤ count(v) and |string(u)| ≥ |string(v)|. Fact 2.2 If count(v) b, then occurrence no 0 A (string(v) b The reason is that if count(v) b, string(v) is a prefix of the suffixes from b sequences in A. Fact 2.3 [39, 54] The time and space needed to construct the GST is O(n) where n is the total length of all sequences in the sample. 2.2.2 Subphase B of Phase 1 In this phase, we traverse the GST constructed in subphase A to find all segments (i.e. all prefixes of strings labeled on root to leaf paths) that ....

L. C. K. Hui, "Color set size problem with applications to string matching," in Combinatorial Pattern Matching, Lecture Notes in Computer Science, 644 (A. Apostolico, M. Crochemore, Z. Galil, and U. Manber, eds.), Springer-Verlag, New York, NY, pp. 230--243, 1992.


Fast and Intuitive Clustering of Web Documents - Zamir (1997)   (38 citations)

....cluster's content. For example, for a query on the word "Clinton" a sample of the phrases found common to many documents includes: "progress on the aids pandemic", "democratic party", and "Hillary Rodham Clinton". Another advantage of Phrase IC is that there exist efficient algorithms (for example [16]) for discovering long substrings common to many documents. Thus, for certain definitions of cluster cohesion based on common phrases we can leverage off this potential computational advantage. 4.1 Phrase Intersection Clustering using GQF A small variation in the definition of the GQF allows us ....

L. C. K. Hui. Color set size problem with applications to string matching. In Combinatorial Pattern Matching Third Annual Symposium, 1992.


Approaches to the Automatic Discovery of Patterns in.. - Brazma, Jonassen.. (1995)   (34 citations)

....subset in an exact form. This number can be estimated by using random sampling theory. Therefore, we can enumerate only these substrings that are present in the subset more than a certain number of times. Moreover, the strings in the subset can be represented as a generalised suffix tree (GST) [Hui92] and then the potential candidates for the pattern can be selected in linear time. GSTs are a generalisation of the notion of suffix trees [McC76, Ukk92]. Thus the algorithm becomes linear time in the length of the examples and the patterns. This approach has been used by Wang et al. WMS ....

....given sequences. First of all, a GST can be constructed in linear time in the total length of sequences. After the GST is constructed, the query to find the longest substring common to at least k strings can be executed also in linear time, thus the whole problem can be solved in linear time (see [Hui92] for details). However we note that the construction of GST in linear time is technically very complicated and in practice often simpler and theoretically less efficient algorithms are used for constructing GSTs. Moreover, currently no very efficient algorithms using GSTs for approximate ....

[Article contains additional citation context not shown here]

L. C. K. Hui. Color set size problem with application to string matching. In Proc. of Combinatorial Pattern Matching, pages 230--243. Springer-Verlag, 1992.


Augmenting Suffix Trees, with Applications - Matias, Muthukrishnan.. (1998)

....that contain P. Clearly, the latter may be substantially smaller than the former if P occurs multiple times in the documents. A related query is where we are required to merely report the number of documents that contain P. An algorithm that solves this problem in O(|P|) time is given in [Hui92], which is based on data structures for computing lowest common ancestor (LCA) queries. The document listing problem is of great interest in information retrieval and has independently been formulated in many scenarios (see pages 124-125 in [Gus98] for a morbid application) for example in ....

....superlinear with the size of the input) or not online. For a survey and comparison of these methods, see [BCW90] 4 The analyses in [HZ95,HZ98] are for binary strings. 5 The suffix DAG data structure also helps solve the count query in time O(|P|) which matches the best known complexity [Hui92]. We present a new lemma for Although this problem is natural, no nontrivial algorithmic results were known before our paper. The fastest algorithms for solving this problem run in time proportional to the number of occurrences of the pattern in all documents; clearly, this could be much larger ....

J. Hui. Color set size problem with applications to string matching. In Combinatorial Pattern Matching, 1992.
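
The document-count query mentioned in the contexts above (report how many documents contain a pattern P) can be illustrated with a small sketch. It is a naive stand-in under stated assumptions: every trie node stores the explicit set of document ids ("colors") passing through it, which costs far more space and preprocessing than the LCA-based linear-time method attributed to [Hui92] in those contexts. The class `Node` and the functions `build_colored_trie` and `count_documents` are hypothetical names introduced only for this illustration.

```python
class Node:
    """Trie node holding child links and the set of document ids seen here."""
    __slots__ = ("children", "docs")

    def __init__(self):
        self.children = {}
        self.docs = set()


def build_colored_trie(documents):
    """Insert every suffix of every document, tagging nodes with document ids.

    Naive preprocessing (quadratic in total length); an assumption made for
    clarity, not the linear-time construction from the cited paper.
    """
    root = Node()
    for doc_id, text in enumerate(documents):
        root.docs.add(doc_id)  # the empty pattern occurs in every document
        for i in range(len(text)):
            node = root
            for ch in text[i:]:
                if ch not in node.children:
                    node.children[ch] = Node()
                node = node.children[ch]
                node.docs.add(doc_id)
    return root


def count_documents(root, pattern):
    """Number of distinct documents containing pattern; walks |pattern| edges."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return 0
    return len(node.docs)


if __name__ == "__main__":
    root = build_colored_trie(["aba", "ababb", "abca", "bb"])
    print(count_documents(root, "ab"))  # 3: "aba", "ababb", "abca"
    print(count_documents(root, "bb"))  # 2: "ababb", "bb"
```

After preprocessing, each query touches one node per pattern character, which mirrors the O(|P|) query bound quoted above; only the preprocessing differs from the cited approach.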


Mining Sequential Patterns - Agrawal, Srikant (1995)   (331 citations)

....entirely different. The solution in [10] is not guaranteed to be complete, whereas we guarantee that we have discovered all sequential patterns of interest that are present in a specified minimum number of sequences. The algorithm in [10] is a main memory algorithm based on generalized suffix tree [7] and was tested against a database of 150 sequences (although the paper does contain some hints on how they might extend their approach to handle larger databases) Our solution is targeted at millions of customer sequences. Organization of the Paper We solve the problem of finding all sequential ....

L. Hui. Color set size problem with applications to string matching. In A. Apostolico, M. Crochemore, Z. Galil, and U. Manber, editors, Combinatorial Pattern Matching, LNCS 644, pages 230--243. Springer-Verlag, 1992.


Discovering Unbounded Unions of Regular Pattern Languages.. - Brazma, Ukkonen, Vilo   (1 citation)

....a list of the names of the original strings that have as a suffix the string that leads from root to this leaf. All substrings of every string in GST are represented by some (may be partial) path from the root. GST can be constructed in linear time on the sum of the lengths of strings it contains [8]. For the substring pattern case the input of the algorithm is only the set U = A, but the pairs ( i ; F i ) are found by the algorithm Greedy substring from A. We first build the GST in time O(‖A‖). Next, for i = 1, ..., |A| we find the longest substring i present in at least i strings of ....

....for i = 1, ..., |A| we find the longest substring i present in at least i strings of A. For each i we also compute c i = c( i ; i) and keep track of the best i (i.e. i having the highest c i ) so far. This can be done in time O(‖A‖) by using the construction described on page 239 of [8]. After this we pick the substring pattern i having the highest cost c( i ; F i ) among all possible subsets F i of A, such that F i = L( i ) A and also find the subset F i itself in time O(‖A‖) (having i this can be done by a fast string search algorithm). Simultaneously we can compute U ....

L. C. K. Hui. Color set size problem with application to string matching. In Proc. of Third Annual Symposium on Combinatorial Pattern Matching, Lecture Notes in Comp. Science, 644, pages 230--243. Springer-Verlag, 1992.


Finding Optimal Pairs of Patterns - Hideo Bannai Heikki (2004)

No context found.

Hui, L.: Color set size problem with applications to string matching. In: Proceedings of the Third Annual Symposium on Combinatorial Pattern Matching (CPM 92). Volume 644 of LNCS., Springer-Verlag (1992) 230--243


Finding Optimal Pairs of Cooperative and Competing .. - Inenaga, Bannai..

No context found.

Hui, L.: Color set size problem with applications to string matching. In: Proc. 3rd Annual Symposium on Combinatorial Pattern Matching (CPM'92). Volume 644 of LNCS., Springer-Verlag (1992) 230--243


Finding Optimal Pairs of Patterns - Bannai, Hyyrö, Shinohara, Takeda.. (2004)

No context found.

Hui, L.: Color set size problem with applications to string matching. In: Proceedings of the Third Annual Symposium on Combinatorial Pattern Matching (CPM 92). Volume 644 of LNCS., Springer-Verlag (1992) 230--243


Multiple Pattern Matching Algorithms on Collage System - Kida, Matsumoto, Takeda.. (2001)

No context found.

L.C.K. Hui. Color set size problem with application to string matching. In Combinatorial Pattern Matching, volume 644 of Lecture Notes in Computer Science, pages 230--243. Springer-Verlag, 1992.

CiteSeer.IST - Copyright Penn State and NEC