L. Parida, I. Rigoutsos, A. Floratos, D. Platt, and Y. Gao. Pattern discovery on character sets and real-valued data: linear bound on irredundant motifs and an efficient polynomial time algorithm. In Proceedings of the 11th Symposium on Discrete Algorithms, pages 297-308, 2000.
....between a pair of sequences, to one which considers multiple possible alignments. Another important problem is how to apply this test to discover all repeats in a long genome segment, extending the work on sequence discovery algorithms available for non-tandem repeats [1] and other motifs [5, 6, 15] under conventional measures of sequence similarity. One particularly interesting testbed is the identification of the exact boundaries of multi-layered tandemly repeated DNA segments. A practical approach to this problem is to slide a fixed-size window across the sequence of interest, measuring ....
L. Parida, I. Rigoutsos, A. Floratos, D. Platt, Y. Gao, Pattern discovery on character sets and real-valued data: linear bound on irredundant motifs and an efficient polynomial time algorithm, Proceedings of ACM-SIAM SODA, 2000.
....in other real strings over a small alphabet occurring in practice, e.g. DNA sequences. Some heuristics try to alleviate this drawback by reducing the number of interesting motifs to make feasible any further processing of them, but they cannot guarantee subexponential bounds in the worst case [7]. In this paper, we explore the algorithmic ideas behind motif discovery while getting some insight into their combinatorial complexity and their connections with string algorithmics. Given a motif x for a string s of length n, we denote the set of positions on s at which the occurrences of x ....
....are less specific than it. Unfortunately, this property does not bound significantly the number of maximal motifs. For example, A ATA A contains an exponential number of them for q = 2 (see Section 2). A further requirement on the maximal motifs is the notion of irredundant motifs ([7]). A maximal motif x is redundant if there exist maximal motifs $y_1, \ldots, y_k \neq x$ such that the set of occurrences of x satisfies $L_x = L_{y_1} \cup \cdots \cup L_{y_k}$; it is irredundant otherwise. The set of occurrences of a redundant motif can be covered by other sets of occurrences while that of an ....
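The redundancy condition quoted above can be illustrated with a minimal Python sketch. It is not taken from any of the cited papers: the function names `occurrences` and `is_redundant`, the use of `.` as the don't care symbol, and the example string are assumptions made for illustration. The sketch applies only the covering condition as stated in the excerpt and does not check maximality or the quorum q.

```python
def occurrences(motif, s):
    """Return the set of 0-based start positions where motif matches s.

    The character '.' in motif is treated as a don't care (assumed notation).
    """
    m = len(motif)
    return {
        i
        for i in range(len(s) - m + 1)
        if all(c == "." or c == s[i + j] for j, c in enumerate(motif))
    }


def is_redundant(x, others, s):
    """Check whether L_x equals a union of occurrence sets of other motifs.

    Some subset of the other lists has union L_x exactly when the union of
    all lists contained in L_x equals L_x, so only those lists are collected.
    """
    l_x = occurrences(x, s)
    covered = set()
    for y in others:
        if y == x:
            continue
        l_y = occurrences(y, s)
        if l_y <= l_x:  # only lists inside L_x can take part in the cover
            covered |= l_y
    return bool(l_x) and covered == l_x
```

For instance, in the hypothetical string `abcabd`, the occurrence set of `ab` is covered by those of `abc` and `abd`, so `is_redundant("ab", ["abc", "abd"], "abcabd")` holds, while dropping `abd` from the candidates leaves position 3 uncovered.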
[Article contains additional citation context not shown here]
L. Parida, I. Rigoutsos, A. Floratos, D. Platt, and Y. Gao. Pattern Discovery on Character Sets and Real-valued Data: Linear Bound on Irredundant Motifs and Efficient Polynomial Time Algorithm. In SIAM Symposium on Discrete Algorithms, 2000.
....or to the right by adding further symbols or by replacing any of its don't care symbols by an alphabet letter, without losing any occurrences. For example, motif M T E is maximal in COMMITTEE. Unfortunately, the notion of maximality is not sufficient to bound the number of interesting motifs [7]. A significant step in this field has been the introduction of the notion of basis. Informally speaking, a basis is a set of motifs that can generate all the maximal motifs by simple mechanical rules. The paper by Parida et al. [7] proposes a mathematical way of expressing this notion in a novel ....
....is not sufficient to bound the number of interesting motifs [7]. A significant step in this field has been the introduction of the notion of basis. Informally speaking, a basis is a set of motifs that can generate all the maximal motifs by simple mechanical rules. The paper by Parida et al. [7] proposes a mathematical way of expressing this notion in a novel way. It relies on the idea that a basis of motifs can be defined in the algebraic sense of the term. The maximal motifs in the basis defined in [7] are irredundant. Representing each occurrence of a motif by its starting position in ....
[Article contains additional citation context not shown here]
L. Parida, I. Rigoutsos, A. Floratos, D. Platt, and Y. Gao. Pattern Discovery on Character Sets and Real-valued Data: Linear Bound on Irredundant Motifs and Efficient Polynomial Time Algorithm. In SIAM Symposium on Discrete Algorithms (SODA), 2000.
....in other real strings over a small alphabet occurring in practice, e.g. DNA sequences. Some heuristics try to alleviate this drawback by reducing the number of interesting motifs to make feasible any further processing of them, but they cannot guarantee subexponential bounds in the worst case [13]. In this paper we explore the algorithmic ideas behind motif discovery while getting some insight into their combinatorial complexity and their connections with string algorithmics. Given a motif x for a string s of length n, let us denote the set of positions on s at which the occurrences of x ....
....of the irredundant motifs of string s with quorum q is the set of all the irredundant motifs in s. Informally speaking, a basis can generate all the other maximal motifs by simple mechanical rules and can be expressed mathematically in the algebraic sense of the term. According to Parida et al. [13], what makes the irredundant motifs interesting is that their number is always upper bounded by 3n independently of any chosen q ≥ 2; moreover, their discovery takes O(n log n) time because of this bound, notwithstanding the exponential number of maximal motifs that are candidates for the ....
[Article contains additional citation context not shown here]
L. Parida, I. Rigoutsos, A. Floratos, D. Platt, and Y. Gao. Pattern Discovery on Character Sets and Real-valued Data: Linear Bound on Irredundant Motifs and Efficient Polynomial Time Algorithm. In SIAM Symposium on Discrete Algorithms (SODA), 2000.
....[12, 29, 41, 47, 48, 49] Extension of the algorithm to deal with models m defined over P( is straightforward [17, 16] no error) 39] There is, however, a notion of redundancy that appears at the level of models over P( that is not trivial to treat. The interested reader is referred to [27] and [36] for further details. 1.5.2.2.3 Complexity In the case of no error, the difference in time complexity between KMR KMRC and the algorithm sketched above varies only in that a log k term for the first approach is changed into a k term for the last one. Before giving the complexity for the ....
L. Parida, I. Rigoutsos, A. Floratos, D. Platt, and Y. Gao. Pattern discovery on character sets and real-valued data: linear bound on irredundant motifs and polynomial time algorithms. In Proc. of the eleventh ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 297-308. ACM Press, 2000.
....<L, W> patterns. For details see [Rigoutsos and Floratos, 1998b]. Conclusions. The TEIRESIAS algorithm is an exact algorithm. It is guaranteed to find all <L, W> maximal patterns supported by at least K sequences. However, the number of such patterns can be very high. In particular, in [Parida et al. 2000] it is shown that the number of maximal patterns can be exponential. In such a case TEIRESIAS will take exponential time. However, such a situation is not likely to happen in real data. For example, in entire ....
....all patterns of the specified form will be found. 3.3.2 Work related to TEIRESIAS algorithm Irredundant patterns. One of the drawbacks of TEIRESIAS algorithm is potentially exponential size of the output and thus potentially exponential running time. This issue was recently addressed in [Parida et al. 2000] by a new algorithm. This algorithm imposes more restrictions on the set of patterns written to the output. They define a set of irredundant patterns such that all other patterns can be easily obtained from this set. The size of this set is at most 3n where n is the length of the input. Also they ....
Parida, L., Rigoutsos, I., Floratos, A., Platt, D., and Gao, Y. (2000). Pattern discovery on character sets and real-valued data: linear bound on irredundant motifs and an efficient polynomial time algorithm. In Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 297-308.
....an illustration, the meet of suf7 and suf5 is represented in bold in our figure. We find it convenient to assume that for each meet m, the sorted list m is appended to the entry that corresponds to the first match of m in the rightmost diagonal where m appears. For example, in Fig. 1, the match M[6, 21] defines m1 = aba, and points to the list m1 = (1, 4, 6, 9, 12, 14, 21). The match M[1, 20] identifies m2 = a. ab, and similarly yields the list m2 = (1, 3, 6, 9, 11, 20). [Fig. 1: figure content not recoverable] ....
....can be computed incrementally in O(n a) time. Conclusion We examined the problem of extracting maximal, irredundant motifs that occur at least twice in a string. The notion of irredundancy for motifs with at least k occurrences for general k had been introduced and studied in previous work [6]. Here we give a linear upper bound for the number of irredundant 2-motifs in a string. It is easy to adapt arguments such as that of Theorem i so that it generalizes to k-motifs, however, the extension of our constructions to the general case seems more involved. The motif discovery paradigms ....
L. Parida, I. Rigoutsos, A. Floratos, D. Platt, Y. Gao, Pattern Discovery on Character Sets and Real-valued Data: Linear Bound on Irredundant Motifs and Polynomial Time Algorithms, Proceedings of the eleventh ACM-SIAM Symposium on Discrete Algorithms (SODA), 2000.
No context found.
L. Parida, I. Rigoutsos, A. Floratos, D. Platt, and Y. Gao. Pattern discovery on character sets and real-valued data: linear bound on irredundant motifs and an efficient polynomial time algorithm. In Proceedings of the 11th Symposium on Discrete Algorithms, pages 297-308, 2000.