M.A. Roytberg, A search for common patterns in many sequences, CABIOS (1992).
....(Figure 2) of the encoded data instead of the actual graphical user interface. [Figure 2: numeric listing of the encoded data; values not recoverable] ....
....integrates with the presented Path Analysis infrastructure. For example, consider the possibility of dynamically updating page content for a customer on a particular path. 7. RELATED WORK. General sequence analysis has a long history. Applications cover many fields, including biology [19] [16], information retrieval [12] [17] and time series [5]. A sequential patterns approach to retail basket data mining is proposed in [3] and [18]. Academic and experimental work in Web data analysis is exploding in popularity. This paper concerns itself with Web site log data analysis. The ....
M.A. Roytberg. A search for common patterns in many sequences. Computer Applications in the Biosciences, 8(1), 57-64, 1992.
....conserved, regions in a group of sequences are at the core of many molecular biology problems. We solve three main open questions in this area. Assume that we are given n DNA sequences s_1, ..., s_n. The Consensus Patterns problem, which has been widely studied in bioinformatics research [26, 16, 12, 25, 4, 6, 15, 22, 24, 27], in its simplest form, asks for a region of length L in each s_i, and a median string s of length L so that the total Hamming distance from s to these regions is minimized. We show the problem is NP-hard and give a polynomial time approximation scheme (PTAS) for it. We also give a PTAS for the ....
....of unaligned bio sequences. It is the central term to minimize in various objective functions in [26, 16, 12, 25]. The authors in [12, 16, 25, 26] gave heuristic or exponential time algorithms, and developed a working system for this problem. Other related software and applications can be found in [4, 6, 15, 22, 24, 27]. Taking only the maximum term without the log factor in each c_j in General Consensus Patterns gives Consensus Patterns and its complement Max Consensus Patterns. Another motivation for studying Consensus Patterns is that it is applicable to a restricted case of Star alignment. Star alignment ....
M.A. Roytberg, A search for common patterns in many sequences, CABIOS (1992) 57-64.
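The Consensus Patterns objective quoted in the context above (one region of length L per sequence, plus a median string s of length L minimizing the total Hamming distance) can be illustrated with a minimal brute-force sketch. This is not the PTAS from the citing paper, nor Roytberg's method; it simply enumerates every candidate median string and is only feasible for very small L. The function names and the default DNA alphabet are assumptions made for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def consensus_pattern_bruteforce(seqs, L, alphabet="ACGT"):
    """Exhaustively solve the simplest Consensus Patterns objective.

    Returns (total_distance, median_string, start_positions), where
    start_positions[i] is the start of the chosen length-L region in seqs[i].
    Hypothetical helper; runs in O(|alphabet|^L * total sequence length * L).
    """
    best = None
    for cand in product(alphabet, repeat=L):
        s = "".join(cand)
        total = 0
        starts = []
        for seq in seqs:
            # Closest length-L window of this sequence to the candidate median
            # (ties broken by the smallest start position).
            d, i = min(
                (hamming(s, seq[i:i + L]), i)
                for i in range(len(seq) - L + 1)
            )
            total += d
            starts.append(i)
        # Keep the first candidate that achieves the smallest total distance.
        if best is None or total < best[0]:
            best = (total, s, starts)
    return best


result = consensus_pattern_bruteforce(["TTACGT", "ACGAAA", "GGACGG"], 3)
print(result)
```

In this toy input the only 3-mer shared exactly by all three sequences is ACG, so the sketch reports a total distance of 0 with that median string.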
.... [72, 84]. The exact notion of sufficient similarity varies between different methods of sequence comparison but is invariably defined in terms of the edit distance (using appropriate cost matrices) between Q and A and the statistical distribution of this distance [72, 61, 60]. Other methods [112, 50, 98, 51, 79, 96, 87, 95, 57, 77, 89, 108, 117, 103, 90] exploit the first fact of biological sequence analysis in a different manner: they start with several proteins from the same family (e.g. all the hemoglobin proteins from different mammals) and try to identify sequence signatures diagnostic of that protein family. A sequence signature is usually ....
....does not appear in the same sequences where a previously generated pattern has appeared. At the next expansion step, the set S' will comprise those strings that match the new pattern P' and the next column of the alignment will be considered. A different heuristic is proposed by Roytberg [87]. Although his method does not directly use alignment, it works in a way reminiscent of the other algorithms in this class in that it gets information about potential patterns by pairwise comparisons of the input strings. One of the input sequences is selected as the basic sequence and is compared ....
M.A. Roytberg. A search for common patterns in many sequences. CABIOS, 8:57--64, 1992.
.... recently methods have been proposed which instead of directly searching for consensus elements among the sequences compute pairwise similarities and then assemble the overall multiple alignment from pairwise comparisons (Vihinen [Vih88], Vingron and Argos [ViA91], Schuler et al. [SAL91], Roytberg [Roy92], Miller [Mil93]). In molecular biology similarities between sequences are frequently depicted in the form of dot matrices ([MaL81], [Arg87]). Formally we will view a dot matrix simply as a matrix with all entries either 0 or 1, where a 1 at position (i, j) denotes the existence of similarity ....
....by all sequences. The above problem has already attracted the attention of several researchers. Vihinen [Vih88] proposed to superimpose pairwise dot matrices by choosing one reference sequence and relating all others to this one. This approach has been taken significantly farther by Roytberg [Roy92]. Gotoh [Got90] has defined a notion of consistency for three sequences. Schuler et al. [SAL91] use diagonals instead of dots as the basic entity and devise a heuristic to assemble those into a multiple alignment. An exact algorithm for assembling an n2 dimensional dot matrix from 2 dimensional ....
Roytberg, M.A. (1992) A search for common patterns in many sequences. Comp. Appl. Biosci., 8, 57-64.
.... of motifs could potentially be exponential in the size of the input sequence and in the third case there could be infinite number of motifs (see Section 6). Pattern or motif discovery in data is widely used as a means of understanding large volumes of data such as DNA or protein sequences [Roy92, JCH95, NG94, SV96, WCM 96, SNJ95, RF98, Cal99]. One of the major difficulties with the pattern discovery problem is the size of the output. Typically, the higher the self-similarity in the sequence, the greater is the number of motifs in the data. Motif discovery on such data, such as repeating ....
M. A. Roytberg. A search for common patterns in many sequences. CABIOS, pages 57--64, 1992.
....the sequences have only short regions of local similarities, this approach makes no sense. There are also techniques based on local similarity search. The techniques work effectively when similarities meet some constraints, such as they occur in a predetermined number of sequences in the database [69], they differ by mismatches, but not by insertions/deletions [5], or they are situated at almost the same distance from the start of the sequences [85]. In contrast to the above techniques, our approach can find similarities composed of nonconsecutive segments separated by variable length don't ....
M. A. Roytberg, "A search for common patterns in many sequences," Computer Applications in Biosciences, vol. 8, no. 1, pp. 57--64, 1992.
....optimal. An example is presented in Fig. 1. Fig. 1. An alignment amongst five DNA sequences (adapted from Fig. 6 in [13], with permission from Oxford University Press). 2.1 What is an Optimal Alignment. Intuitively, an optimal or good alignment amongst two or more sequences of symbols is one which is relatively long, involves a relatively large number of sequences, has few gaps (or no gaps at all) and, ....
Roytberg MA (1992). A search for common patterns in many sequences, Cabios, 8(1), 57-64.
....The new search strategy has been implemented in a new version of the Pratt program. 1 Introduction. The automatic discovery of patterns conserved in a set of bio sequences is an important problem in computational molecular biology. A number of different approaches have been proposed, for instance [Sta89, SS90, SAC90, Roy92, NG94, JCH95]. A discussion of the problem and an overview of proposed methods is found in [BJEG95]. In [JCH95] we described an algorithm for discovering all patterns in a user specified class of patterns, that matches some minimum number of a given set of sequences. The algorithm was implemented in a program ....
M. A. Roytberg. A search for common patterns in many sequences. CABIOS, 8(1):57--64, 1992.
....is very important if we want to use it for predicting biological properties of the sequences: too specific a language may not be expressive enough, too general a language may lead to a hypothesis space too large for efficient search. Biologists have introduced quite a large number of languages [3, 8, 9, 11, 12, 14, 15, 16], each of which differs from the others in more or less important ways, as well as differing from what is commonly understood by pattern languages in computer science [1, 2, 5, 10]. Up to now, computer scientists have paid relatively little attention to these biopattern languages. One of the ....
M. A. Roytberg. A search for common patterns in many sequences. CABIOS, 8(1):57--64, 1992.
....order for the discovery algorithm to discover patterns discriminating between these and the family members. 3.1.1 Only positive or both positive and negative examples. Most methods reported in the bioinformatics literature use only positive examples, e.g. [WAG84, LAC89, Sta89, SS94, SS90, VA91, Roy92, NG94, SAC90, WMS 94, JCH95, Jon96]. However, there are some exceptions. For instance, Ogiwara et al. [OUSK92] analyse a database of protein sequences where the sequences are grouped into (super)families. Each family is analysed separately; the sequences in the family are treated as ....
....have any well conserved columns). A number of methods have been proposed for the problem of local multiple sequence alignment, e.g. [SAL91, LAB 93]. Most methods for local multiple alignment can only find ungapped alignments, i.e. blocks. However, some allow for insertions/deletions, e.g. [Roy92, SVS95a]. The approaches of global sequence alignment and pattern discovery (or local sequence alignment) have complementary strengths. When the sequences share global similarity, methods for global sequence alignment often provide good alignments. In such cases, local alignments, for instance ....
M. A. Roytberg. A search for common patterns in many sequences. CABIOS, 8(1):57--64, 1992.
....of DNA sequences is shown in Figure 1. Figure 1. A good alignment amongst five DNA sequences (adapted from Fig. 6 in [31], with permission from Oxford University Press). In this area of research, it is widely recognised that the number of possible alignments of symbols is normally too large to be searched exhaustively and that, to achieve a search which has acceptable speed and acceptable scaling properties, ....
Roytberg, M. A. (1992) A Search for Common Patterns in Many Sequences. Cabios, 8 (1), 57-64.
....to all given sequences. In addition to the pattern that is common to all sequences, the algorithm also obtains patterns common to subsets of related sequences, therefore the algorithm can be also used for classification (in fact for unsupervised learning). A different heuristic is developed by Roytberg (1992). One sequence is selected as the basic sequence, and all the other sequences (so-called serial sequences) are aligned against it. This approach corresponds to a dendrogram of the type given in figure 5. The algorithm finds the substrings in the basic sequence that have approximate matches in all, ....
.... (Fortran) Vax VMS n a (Smith and Smith, 1990) Fa N N protein (Smith, et al. 1990) Ba N Y protein MOTIF Src (TurboC) IBMPC WWW (Vingron and Argos, 1991) Ga [FILLOG SUM] Gb [FILMAXAV ] N N protein unkown A (Kudo et al. 1992) Ba, Ca Y Y DNA (Ogiwara et al. 1992) Ga N Y N protein (Roytberg, 1992) Ab N N protein, DNA MuSCo IBMPC, IBM 370 n a avail (Arikawa et al. 1993) Gd N N protein (Neuwald and Green, 1994) Da N N protein ASSET Src SPARC2 a ftp (Saqi and Sternberg, 1994) Ca N N protein (Wang et al. 1994) Gb N N protein DISCOVER, CLASSIFY Ex DOS, DEC Ultra, SunSPARC A ....
Roytberg, M. A. 1992. A search for common patterns in many sequences. Comput. Applic.
....where one sequence is chosen as a basic sequence and all other sequences are aligned against it is common to all sequences, we also obtain patterns common to subsets of related sequences, therefore the algorithm can be also used for classification. A different heuristic is developed by Roytberg [Roy92]. One sequence is selected as the basic sequence, and all the other sequences (the so-called serial sequences) are aligned against it. This approach would correspond to a dendrogram of the type given in Figure 6. The substrings in the basic sequence are found that have approximate matches in all, ....
.... Y DNA unknown Fortran77 Vax VMS n a [SS90] 2E, exact N protein none [SAC90] 2E, exact, before TD element Y protein MOTIF Turbo C IBMPC n a [VA91] 3A [FILLOG SUM] 3A, approx [FILMAXAV ] N align protein unkown A [KKASI92] 1C exact Y DNA none [OUSK92] 3A, exact Y N protein none [Roy92] 2A, approx N protein, DNA MuSCo IBMPC, IBM 370 n a avail [AMS 93] 3A, decision trees N protein none [NG94] 2C N protein ASSET Src SPARC2 a ftp [SS94] 2C N protein none [WMS 94] 3A, approx N protein DISCOVER, CLASSIFY Ex DOS, DEC Ultra, SunSPARC A [JCH95] 2E Y N protein ....
M. A. Roytberg. A search for common patterns in many sequences. CABIOS, 8(1):57--64, 1992.
....in figuring out what patterns to try and then efficiently finding out which ones are contained in a customer sequence. Techniques based on multiple alignment [11] have been proposed to find entire text sequences that are similar. There also has been work to find locally similar subsequences [4] [8] [9]. However, as pointed out in [10], these techniques apply when the discovered patterns consist of consecutive characters or multiple lists of consecutive characters separated by a fixed length of noise characters. Closest to our problem is the problem formulation in [10] in the context of ....
M. Roytberg. A search for common patterns in many sequences. Computer Applications in the Biosciences, 8(1):57--64, 1992.
....of DNA sequences is shown in Figure 1. Figure 1. A good alignment amongst five DNA sequences (adapted from Fig. 6 in Roytberg (1992), with permission from Oxford University Press). In this area of research, it is widely recognised that the number of possible alignments of symbols is normally too large to be searched exhaustively and that, to achieve a search which has acceptable computational complexity, heuristic ....
Roytberg, M. A. (1992) A Search for Common Patterns in Many Sequences. Cabios, 8 (1), 57-64.
....to try and then efficiently finding out which of those patterns are contained in enough data sequences. Techniques based on multiple alignment [Wat89] have been proposed to find entire text sequences that are similar. There also has been work to find locally similar subsequences [AGM 90] [Roy92] [VA89]. However, as pointed out in [WCM 94], these techniques apply when the discovered patterns consist of consecutive characters or multiple lists of consecutive characters separated by a fixed length of noise characters. 1.2. Organization of the Paper. We give a formal description of ....
M. A. Roytberg. A search for common patterns in many sequences. Computer Applications in the Biosciences, 8(1):57--64, 1992.
No context found.
M.A. Roytberg, A search for common patterns in many sequences, CABIOS (1992.
No context found.
M. A. Roytberg. A search for common patterns in many sequences. CABIOS, pages 57--64, 1992.
No context found.
Roytberg, M. A., "A Search for Common Patterns in Many Sequences," Cabios, 8,
No context found.
Roytberg, M. A. (1992) A search for common patterns in many sequences. Cabios. 8(1), 57-64.