H. Carrillo and D.J. Lipman. The multiple sequence alignment problem in biology. SIAM J. Appl. Math., 48:1073--1082, 1988.
.... By our experiments, both the alignment quality and the required time of our algorithm are better than those of the Clustal W algorithm (using the neighbor-joining method). 1 Introduction. Multiple sequence alignment (MSA) is important in functional, structural and evolutionary studies of biological sequences [5,6,8]. Due to its importance, many algorithms have been proposed [9, 11, 19, 21, 22]. The multiple sequence alignment problem is to obtain the alignment of a set of sequences with the best score based on some given scoring criteria [6]. A biological sequence is a string consisting of characters chosen ....
....NP complete [13]. A polynomial time approximation scheme has been presented [12]. An optimal alignment of k sequences can be obtained by dynamic programming in $O((2n)^k)$ time with $O(n^k)$ space [13], where n is the maximum length over all sequences. The algorithms proposed by Carrillo and Lipman [5] and Kececioglu [14] can reduce the time for finding an exact solution by reducing the search space with an upper bound. For the SP alignment, Gusfield [10] presented a $(2 - 2/k)$ approximation algorithm, and Pevsner [16] reduced the approximation factor to $2 - 3/k$. Bafna et al. [4] ....
H. Carrillo and D. J. Lipman. The multiple sequence alignment problem in biology. SIAM 7
....[Figure 2: Steiner Tree Alignment Example, with panels (a) Given tree and (b) Solution; total distance = 4.] 2.4 Open questions. A primary method for obtaining approximate solutions to MSA is the Carrillo-Lipman [23, 30] use of projection, using dynamic programming in 2-dimensional state spaces to infer bounds. Gusfield [38] gave a simple proof of an approximation algorithm, which guarantees being no worse than twice the alignment score, using the sum of pairwise alignments (for any distance function). One open ....
H. Carrillo and D. Lipman. The multiple sequence alignment problem in biology. SIAM Journal of Applied Mathematics, 48(5):1073--1082, 1988.
.... prominently applied in the comparison of genetic material such as DNA, RNA, and protein sequences [5] The applications of sequence alignment include searching sequence databases for specific genes or patterns [4] and discovering phylogenetic relationships through the use of multiple alignments [3]. Sequence alignment is a generalization of the classic longest common subsequence problem (lcs) Given two strings A = a 1 a 2 . a m and B = b 1 b 2 . b n , with m = n, over some alphabet S of size s, the lcs problem is to find a sequence of greatest possible length that can be obtained from ....
Carrillo, H., and Lipman, D. The multiple sequence alignment problem in biology. SIAM Journal of Applied Math 48(5): 1073-1082, 1988.
....stated above has not deterred people from taking stabs at it. Bafna et al. [12] present an approximation algorithm finding a multiple alignment of k sequences with a sum of pairs score that is at most 2 from the optimal score in polynomial time for fixed l (and k l). Carrillo and Lipman [25] and Gupta et al. [51] suggest bounding the search space for finding the optimal multiple alignment. A multitude of heuristics have been .... Figure 2.5: An example evolution of a hypothetical ancestral sequence into four different homologous ....
H. Carrillo and D. Lipman. The multiple sequence alignment problem in biology. SIAM Journal on Applied Mathematics, 48:1073--1082, 1988.
....a score function that assigns a score to each possible multiple alignment describing its quality with respect to some criteria. Secondly, it involves constructing a method to compute a multiple alignment with optimal score. The sum of pairs score function introduced by Carrillo and Lipman in [36] defines the score of a multiple alignment of k strings as the sum of the scores of the $k(k-1)/2$ pairwise alignments induced by the multiple alignment. [Figure 2.6: A multiple alignment of five strings.] It is difficult to give a ....
H. Carrillo and D. Lipman. The multiple sequence alignment problem in biology. SIAM Journal on Applied Mathematics, 48:1073--1082, 1988.
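The sum of pairs definition quoted in the excerpt above can be illustrated with a minimal Python sketch. The names `pair_cost` and `sp_score`, the gap symbol "-", and the unit mismatch and gap costs are assumptions made only for this illustration; none of them come from the cited papers, which use their own scoring functions.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol for this illustration


def pair_cost(a, b, mismatch=1, gap=1):
    """Cost of one column restricted to two rows (assumed unit-cost scheme)."""
    if a == GAP and b == GAP:
        return 0  # a gap-gap column vanishes in the induced pairwise alignment
    if a == GAP or b == GAP:
        return gap
    return 0 if a == b else mismatch


def sp_score(rows, mismatch=1, gap=1):
    """Sum the induced pairwise alignment costs over all k(k-1)/2 row pairs."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    return sum(
        pair_cost(a, b, mismatch, gap)
        for r1, r2 in combinations(rows, 2)
        for a, b in zip(r1, r2)
    )
```

For example, `sp_score(["AC-GT", "ACTGT", "A--GT"])` adds the costs of the three induced pairwise alignments (1, 1 and 2 under the assumed unit costs).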
....j # = x. 2 126 Note that any vertex v ∈ V whose corresponding vertex v (k) p1 , p k (#) ∉ V (k) p1 , p k (#) for some k and p 1 , p k can be ignored in the search of the shortest path from s to t in G, which is regarded as the extension of Carrillo and Lipman's bounding techniques [1] to k # 3 dimensional sub alignments. Finally we must consider the way of computing V (k) p1 , p k (#) for G (k) p1 , p k . Fortunately it is rather easy for the A* algorithm to compute the exact V (k) p1 , p k (#) efficiently. For simplicity, we use G,V, V# , s, t, v, x instead of G (k) ....
Carrillo, H. and Lipman, D.J., The Multiple Sequence Alignment Problem in Biology, SIAM J. Appl. Math., 48(5):1073--1082, 1988.
....as a natural constraint while dealing with a large number of sequences. 1 Introduction Given a set of N sequences, the Multiple Sequence Alignment problem is to align these N sequences, possibly with gaps, that brings out the best commonality of the N sequences. Various alignment cost functions [2, 3, 4, 6, 8, 7, 14, 15, 12, 9], have been used in literature. The general approach to solving the pairwise (N = 2) sequence alignment problem has been a dynamic programming technique using different mechanisms of scores which is a function of the edit distance, along with gap penalties, to evaluate the similarity of the ....
Carrillo, H. and Lipman, D., The multiple sequence alignment problem in biology, SIAM J. Appl. Math., 48:1073--1082, 1988.
.... i ;p j 1=x: 2 126 Note that any vertex v ∈ V whose corresponding vertex v (k) p1, ..., pk (1) ∉ V (k) p1, ..., pk (1) for some k and p1, ..., pk can be ignored in the search of the shortest path from s to t in G, which is regarded as the extension of Carrillo and Lipman's bounding techniques [1] to k 3 dimensional sub alignments. Finally we must consider the way of computing V (k) p1, ..., pk (1) for G (k) p1, ..., pk . Fortunately it is rather easy for the A* algorithm to compute the exact V (k) p1, ..., pk (1) efficiently. For simplicity, we use G; V; V1; s; t; v; x instead of G ....
Carrillo, H. and Lipman, D.J., The Multiple Sequence Alignment Problem in Biology, SIAM J. Appl. Math., 48(5):1073--1082, 1988.
....The scoring function used can drastically affect the performance of certain alignment algorithms. Scoring functions are discussed more thoroughly in [3] and [7] 2. 2 Sequence Alignment As Path Finding Sequence alignment can be formalized as a shortest path problem through a d dimensional lattice [2], where d is the number of sequences being aligned. The alignment statespace consists of nodes where either the next character from a sequence is accepted or a gap is entered. We start from an Matrix computeMatrix(Sequence A, Sequence B) int x,y,max; Matrix m = newMatrix(length(A) ....
H. Carrillo and D. Lipman. The multiple sequence alignment problem in biology. SIAM Journal of Applied Mathematics, 48(5):1073--1082, 1988.
....Ent VNKMDAIQ YKQERYEEIKKEI SAFLKKTGYNPDKIP Pla VNKMDTVK YSEDRYEEIKKEV KDYLKKVGYQADKVD Hyphens, or gaps, are inserted into the sequences so that the same characters occupy the same column. We formulate the multiple sequence alignment problem as a search problem (Carrillo & Lipman 1988; Ikeda & Imai 1994). The following notations are used. $d$: number of sequences to be aligned. $S_k$: the k-th sequence. $L(S_1, \ldots, S_d)$: state space, which is a d-dimensional lattice; the k-th axis corresponds to $S_k$. This lattice is the Cartesian product of the d sequences. $L(S_i, S_j)$: two-dimensional ....
Carrillo, H., and Lipman, D. 1988. The multiple sequence alignment problem in biology. SIAM Journal Applied Mathematics 48:1073--1082.
....from their associated sequences. It also plays an important role in determining similar, highly conserved regions across different sequences, that might not be so obvious when comparing only two sequences at each time. For different algorithms on Multiple Sequence Alignment refer to [CWC92], [AL89], [CL88], [Hei89] and [Gus97]. One of the more commonly used approaches to solving the multiple alignment problem is the tree alignment approach. The ancestral relationship between k given sequences is generally depicted as a phylogenetic tree or as a phylogeny. A phylogenetic tree is the hypothetical ....
H. Carrillo and D. Lipman. The multiple sequence alignment problem in biology. SIAM J. Appl. Math., 48:1073-1082, 1988.
....the alignment problem. We define the distance of two sequences $A_i = A_i^1 \ldots A_i^l$ and $A_j = A_j^1 \ldots A_j^l$ of the same length $l$ as $D(A_i, A_j) = \sum_{k=1}^{l} d(A_i^k, A_j^k)$. Carrillo and Lipman introduced a scoring scheme for alignments called the sum-of-pairs score (SP score, see [4]). The SP score of an alignment $A = \{A_1, \ldots, A_n\}$ is given by $D(A) = \sum_{1 \le i < j \le n} D(A_i, A_j)$. Multiple sequence alignment (MSA) is the problem of finding an alignment with minimum SP score. A generalization of MSA to weighted SP score was introduced by Wu et al. [11]. Let W : wS i ;S ....
H. Carrillo and D. Lipman. The multiple sequence alignment problem in biology. SIAM J. Appl. Math., 48:1073--1082, 1988.
....[Figure 2: The modified PAM 250 matrix.] Formulation as the shortest path problem. The multiple sequence alignment problem can be formulated as the shortest path problem in the d-dimensional lattice (Carrillo & Lipman 1988). Let $d$ be the number of sequences to be aligned and $S_k$ be the k-th sequence. $L(S_1 \times \cdots \times S_d)$: state space, which is a d-dimensional lattice; the i-th axis corresponds to $S_i$. $s$: the start node. $t$: the target node. $\gamma$: path from $s$ to $t$ in $L(S_1 \times \cdots \times S_d)$. $k$: the length of $\gamma$. $n_i$: the i-th node ....
Carrillo, H., and Lipman, D. 1988. The multiple sequence alignment problem in biology. SIAM Journal Applied Mathematics 48: 1073-1082.
.... which reduces the search space considerably (see [4, 7, 9]). Our two main contributions are (1) to provide an efficient implementation of the A* algorithm for exactly solving the SP alignment problem with quasi natural gap costs, which can be shown to be superior to the Carrillo-Lipman bounding [2] applied in MSA, and (2) combining this algorithm with the DCA approach in order to develop an iterative procedure for computing multiple alignments with a nice time versus quality tradeoff. In Section 2 we will review the techniques of Gupta et al. and the A* algorithm. Section 3 describes the ....
....so Dijkstra's algorithm with the simple bounding procedure can be used as before. We can choose l(u) L(u) because L fulfills the consistency condition. It is worthwhile noting that it can be shown [4, 7] that the above redefinition of edge costs implies the well known Carrillo-Lipman bound [2]. We used this result to implement an exact multiple sequence alignment algorithm that provably explores at most as many nodes as any algorithm using Carrillo-Lipman bounding. In the next section we describe how we make use of this algorithm in our newly proposed heuristics. 3 Iterative ....
H. Carrillo and D. Lipman. The multiple sequence alignment problem in biology. SIAM J. Appl. Math., 48(5):1073--1082, 1988.
....requires time proportional to $2^n l^n$, for a problem with n sequences each of length at most l. Considering that in typical real life instances l can be a few hundred, the basic dynamic programming approach turns out to be infeasible for all but very small problems. Carrillo and Lipman [5] have introduced some bounding criteria which reduce the time and space requirements of dynamic programming and make solvable problems for n 6 and l 200. However, constructing optimal alignments is bound to be computationally expensive, since the problem has been shown NP complete (Wang and ....
H. Carrillo and D. Lipman, The multiple sequence alignment problem in biology, SIAM J. Appl. Math., 48 (1988), pp. 1073--1082.
....to help design new proteins. Multiple sequence alignment belongs to a class of optimization problems with exponential time complexity, called combinatorial problems. It exhibits $O(L^N)$ time complexity, where L is the mean length of the sequences to be aligned and N is the number of sequences [Carrillo88]. In biology, the sequences can have lengths in the order of hundreds (proteins), thousands (RNA) or millions of units (DNA). This results in unacceptably long times, even for aligning only a few sequences [Sankoff83]. To compare different alignments, a fitness function is defined based on the ....
Carrillo H., and Lipman D., "The multiple sequence alignment problem in biology", Siam J. Appl. Math., vol. 48, no. 5, pp. 1073-1082, October 1988.
....rarely allow one to reconsider an alignment of a subfamily in the light of information coming from sequences outside that subfamily. On the other hand, significant progress with implementing dynamic programming procedures for simultaneous multiple alignment was made by H. Carrillo and D.J. Lipman [7] in the late eighties: it became possible to align simultaneously and optimally between six and eight protein sequences (of medium length and comparatively high pairwise similarity) in some minutes. This was achieved by a branch and bound approach, cutting down the (high dimensional) search space ....
H. Carrillo and D.J. Lipman. The Multiple Sequence Alignment Problem in Biology. SIAM J. Appl. Math., 48(5), pages 1073--1082, 1988.
....problem is to align these N sequences, possibly with gaps, that brings out the best commonality of the N sequences. Broadly speaking, the quality of the alignment is usually measured by penalizing the mismatches and gaps, and rewarding the matches with appropriate weight functions [Alt89, CHM94, CL88, GKS95, HTHI95, HS88, Wat94, Wat95]. This measure works best for small values of N (about 2-6), and in [ZZ96, Vih98] the authors handle the case of N > 2 by first doing a pairwise alignment for some or all possible pairs in some order and then building an N-wise alignment from these. However ....
H. Carrillo and D. Lipman. The multiple sequence alignment problem in biology. SIAM Journal of Applied Mathematics, pages 1073--1082, 1988.
....from the sequences, with the weight of the tree defined as the number of changes between pairs of sequences that correspond to two adjacent nodes in the tree, summed over all such pairs. Sum of Pairs: The sum of pairwise distances between all the pairs of sequences. Carrillo and Lipman [3] found a heuristic method for accelerating the search for the best multiple alignment. The method is based on the property that if the strings are relatively similar, the alignment path would be close to the main diagonal, therefore not all the values in the multi dimensional cube need to be ....
H. Carrillo and D. Lipman. The multiple sequence alignment problem in biology. SIAM J. Appl. Math, 48:1073--1082, 1988.
....from the sequences, with the weight of the tree defined as the number of changes between pairs of sequences that correspond to two adjacent nodes in the tree, summed over all such pairs. Sum of Pairs: The sum of pairwise distances between all the pairs of sequences. Carrillo and Lipman [3] found a heuristic method for accelerating the search for the best multiple alignment. The method is based on the property that if the strings are relatively similar, the alignment path would be close to the main diagonal, therefore not all the values in the multi dimensional cube need to be ....
H. Carrillo and D. Lipman. The multiple sequence alignment problem in biology. SIAM J. Appl. Math, 48:1073--1082, 1988.
....graph with changed edge weights always yields better results than the well known Carrillo-Lipman bounding. Carrillo and Lipman employ a different idea to reduce the number of vertices in the dynamic programming graph. The following property holds for any optimal multiple alignment $A^*$ (c.f. [CL88]). Theorem 2.1 (Carrillo, Lipman). Let $A^*$ be an optimal alignment of the K strings $S_1, \ldots, S_K$, L : L(s t) be the lower bound defined in Equation 1 and $U = c(A^{heur})$ be an upper bound for $c(A^*)$. Then the following inequality holds for every projection on a pair $S_i, S_j$ of ....
H. Carrillo and David J. Lipman. The multiple sequence alignment problem in biology. SIAM J. Appl. Math., 48(5):1073--1082, 1988.
....it does not contain any column consisting of gaps only and, for each $i = 1, 2, \ldots, k$, the row $(m_{i1}, m_{i2}, \ldots, m_{iN})$ reproduces the sequence $s_i$ upon eliminating all of its gap letters. The weighted sum of pairs multiple sequence alignment problem can now be described as follows (cf. [4]). Given $s_1, s_2, \ldots, s_k$, and given a scoring function $D : (\mathcal{A} \cup \{-\})^2 \to \mathbb{R}$, defined on all possible pairs of letters, find an optimal alignment $M$, i.e. an alignment that minimizes $w(M) = \sum_{1 \le p < q \le k} \left( \alpha_{p,q} \cdot \sum_{j=1}^{N} D(m_{pj}, m_{qj}) \right)$, where the $\alpha_{p,q}$ are ....
....programming with a running time proportional to $(2^k - 1) \cdot \prod_{i=1}^{k} n_i$, searching for a shortest alignment path in a directed graph with $\prod_{i=1}^{k} n_i$ vertices. Faster variants and a number of speedups of dynamic programming have therefore been proposed (see for instance [14] and [4]). Unfortunately, most of these approaches still need quite large computing time and computer memory when applied to more than, say, six protein sequences of average length 300. 3 The Basic Algorithm. A first outline of our algorithm has been presented at the Third International Conference on ....
H. Carrillo and D. Lipman. The Multiple Sequence Alignment Problem in Biology. SIAM J. Appl. Math., 48(5):1073--1082, 1988.
....criterion for multiple sequence alignment. In order to circumvent these problems, one just has to stick to the original task of trying to construct high quality simultaneous alignments. In the late eighties, significant progress with this technique was made by H. Carrillo and D.J. Lipman [7]: it became possible to align simultaneously and optimally up to between six and eight protein sequences (of medium length and comparatively high pairwise similarity) in some minutes. To this end, the (high dimensional) search space used in the standard dynamic programming approach was reduced ....
H. Carrillo and D.J. Lipman. The Multiple Sequence Alignment Problem in Biology. SIAM J. Appl. Math., 48(5), pages 1073--1082, 1988.
....major problems in computational molecular biology. Consequently, the design and study of alignment procedures is presently a very active area of research, with an abundance of publications and software contributions (see [48], [34], [9] or [28] for a survey). While some papers (e.g. [27], [26], [8], [20], [1]) propose procedures to speed up the search for the true multiple sequence alignment, that is, the optimal alignment relative to some predefined optimality criterion (see below), others ([39], [17], [42], [23], [6]), like the present one, propose computationally efficient ....
.... by (see footnote 1) $w(m_{1j}, m_{2j}, \ldots, m_{nj}) = \sum_{k < k'} \delta(m_{kj}, m_{k'j}) \cdot \alpha_{k,k'}$, where the parameters $\alpha_{k,k'}$ are sequence dependent weights, and $\delta : (\mathcal{A} \cup \{-\})^2 \to \mathbb{R}$ is a (real valued) weight function, defined on all possible pairs of matrix entries (cf. [27], [13], [8]). Obviously, the optimization task is to elongate the given sequences by inserting gaps, bringing all of them up to the same length (denoted above by N) and, simultaneously, minimizing the sum over all column weights. Footnote 1: In the formula below, we charge for gaps at the beginning or the end of a ....
H. Carrillo and D. Lipman. The Multiple Sequence Alignment Problem in Biology. SIAM J. Appl. Math., 48(5):1073--1082, 1988.
....requires time proportional to $2^n l^n$, for a problem with n sequences each of length at most l. Considering that in typical real life instances l can be a few hundred, the basic dynamic programming approach turns out to be infeasible for all but very small problems. Carrillo and Lipman [3] have introduced some bounding criteria which reduce the time and space requirements of dynamic programming and make solvable problems for n 6 and l 200. However, constructing optimal alignments is bound to be computationally expensive, since the problem has been shown NP complete (Wang ....
H. Carrillo and D. Lipman, The multiple sequence alignment problem in biology, SIAM J. Appl. Math., 48 (1988), pp. 1073--1082.
....or both of which may be alignments, the problem of optimal alignment with linear gap costs becomes more difficult. When aligning several sequences or two alignments, the output is a multiple sequence alignment, and a common objective function on multiple alignments is the sum of pairs objective [3], which takes as the cost of a multiple alignment the sum of the costs of all induced pairwise alignments. If in the sum of pairs objective pairwise alignments are scored with linear gap costs, the problem of counting gaps in the dynamic programming recurrences arises. Unfortunately in this more ....
....is to align the characters of A to the columns of B so as to minimize the sum over all rows of B of the cost of the induced pairwise alignment between A and the given row of B, where pairwise alignments are scored with linear gap costs. This differs from general sum of pairs multiple alignment [3] (which is intractable [22]) in that the alignment on the rows of B is fixed. The alignment cost function is again given by substitution costs $\delta$, gap start up cost $\gamma$, and gap extension cost . To count whether or not an element $b_{ij}$ of alignment B is the null character, we define $\beta_{ij}$ : ....
Carrillo, H. and D. Lipman. "The multiple sequence alignment problem in biology. " SIAM Journal on Applied Mathematics 48, 1073--1082, 1988.
....2.1 Aligning a sequence to a family The input to the algorithm is the query sequence A and the fixed multiple alignment B of the sequence family. The output is an optimal alignment of the elements of A against the columns of B, where alignments are scored using the sum of pairs scoring function [4] with linear gap costs [15] More precisely, an alignment of A to B is assigned a score by summing over all rows of B the score of the induced pairwise alignment between A and the given row of B, where pairwise alignments are scored with gaps penalized by a general linear function of their length. ....
Carrillo, H. and D. Lipman. "The multiple sequence alignment problem in biology." SIAM Journal on Applied Mathematics 48, 1073--1082, 1988.
.... Exercise 27 [05M, opt.] In the beginning we said that for the case of 2 sequences, our maths boils down to useless, yet obviously true observations. What does the Carrillo-Lipman bound assert for the 2 sequence case? 2.4 Some Bibliographic Hints. The standard reference for Carrillo-Lipman is [CaL88], followed up by [AlL89], the latter dealing with the more sophisticated cost model of scoring along a tree. The standard reference for its implementation, known as MSA, is [GKS95]. WWW server access, paper and code (Release 2.1) are currently available at http://ibc.wustl.edu/msa.html . The ....
H. Carrillo, D. Lipman, The Multiple Sequence Alignment Problem in Biology. SIAM J Appl Math 1988;48(5):1073-1082
.... of sequences that can be examined at one time is often limited to less than six by the $O(n^k)$ time requirements of the best known algorithms for these analyses, where k is the number of sequences and n is the maximum number of symbols in any of the given sequences (Kruskal and Sankoff, 1983; Carrillo and Lipman, 1988; Timkovskii, 1990; Irving and Fraser, 1992). These requirements seem to be inherent in the dynamic programming paradigm in which many of these algorithms have been derived (see Pearson and Miller, 1992, and references). Within the last decade, new algorithmic techniques derived from the work of ....
Carrillo, H. and Lipman, D. (1988) The multiple sequence alignment problem in biology. SIAM J.
....three-dimensional DP as the basis for an initial alignment to the subsequent iterative algorithm. The A* algorithm reduces the search space without loss of optimality of the resulting alignment. Some methods of reducing the search space in multiple alignment have been proposed (Carrillo and Lipman [3], Spouge [9], etc.), but the A* algorithm with upper bounding operation would be the best method to derive an optimal alignment [7, 8] (see also [9]). Araki et al. [1] proposed to use an A* algorithm for two-dimensional DP in the Berger-Munson iterative algorithm. They use an estimate ....
H. Carrillo and D. J. Lipman, "The Multiple Sequence Alignment Problem in Biology," SIAM J. Appl. Math., Vol.48 (1988), pp.1073-1082.
.... not necessarily of equal lengths, that are similar in a certain way [20], [21], [31]. It is a different approach from the more classical one of optimizing the value of a function of similarity defined between strings in view of obtaining a multiple alignment of the corresponding macromolecules [2], [8], [9], [15], [19], [28], [39], [40]. Obtaining a list of common similar words may lead in a second step to such an alignment but it can also represent an end in itself. We stop here at this first end. It is a multiple comparison we are interested in, not a multiple global alignment. We wish to ....
H. Carrillo and D. J. Lipman. The multiple sequence alignment problem in biology. SIAM J. Appl. Math., 48:1073--1082, 1988.
....Keywords Computational biology, approximation algorithms, multiple sequence alignment, evolutionary trees. 1 Introduction Multiple sequence alignment is a ubiquitous problem in computational biology for which many objective functions have been studied, including the sum of pairs objective [3, 6, 13, 2, 5], the maximum trace objective [12, 15] and objectives defined in terms of an evolutionary tree [16, Dedicated to the memory of Eugene Lawler. An earlier version of this paper was presented at the 5th Symposium on Combinatorial Pattern Matching [14] y Graduate School of Industrial ....
Carrillo, H. and D. Lipman. "The multiple sequence alignment problem in biology." SIAM Journal on Applied Mathematics 48, 1073--1082, 1988.
.... $(s'_j(i), s'_l(i))$, where $(s'_j(i), s'_l(i))$ is the score of the two opposing letters $s'_j(i)$ and $s'_l(i)$. The SP score is sensible and has previously been studied extensively. The SP alignment problem is to find an alignment with the smallest SP score. It was first studied in [9] and subsequently used in [1, 3, 30, 58]. The SP alignment problem can be solved exactly by using dynamic programming. However, if there are k sequences and the length of sequences is n, it takes $O(n^k)$ time. Thus, it works for only small numbers of sequences. Some techniques to reduce the time ....
H. Carrillo and D. Lipman. The multiple sequence alignment problem in biology, SIAM Journal on Applied Mathematics, 48 (1988), pp. 1073--1082.
.... Exercise 27 [05M, opt.] In the beginning we said that for the case of 2 sequences, our math boils down to useless, yet obviously true observations. What does the Carrillo-Lipman bound assert for the 2 sequence case? 2.4 Some Bibliographic Hints. The standard reference for Carrillo-Lipman is [CaL88], followed up by [AlL89], the latter dealing with the more sophisticated cost model of scoring along a tree. The standard reference for its implementation, known as MSA, is [GKS95]. WWW server access, paper and code (Release 2.1) are currently available at http://ibc.wustl.edu/msa.html . The ....
H. Carrillo, D. Lipman, The Multiple Sequence Alignment Problem in Biology. SIAM J Appl Math 1988;48(5):1073-1082
.... the problem of assigning statistical significance to the scores obtained by these methods, most notably by Karlin and Altschul [KA90, KA93, AG96] In parallel with pairwise alignment, many algorithms for simultaneous or progressive alignment of multiple sequences were also developed [SK83, WP84, CL88, FD87, Tay87, BS87, HS89] Without going into excessive detail, these algorithms attempt to find shortcuts to calculating the full multi dimensional Figure 1.2: The dynamic programming matrix for Needleman Wunsch alignment of two intron sequences from C.elegans. analogue of the two dimensional ....
H. Carrillo and D. Lipman. The multiple sequence alignment problem in biology. SIAM Journal of Applied Mathematics, 48:1073-- 1082, 1988.
....in the number of sequences for all definitions, and is provably hard for a few [13, 4, 18] Nevertheless it is often useful to align a small (but larger than 2) number of sequences optimally. One of the existing programs that attempts to construct an optimal alignment for some definition is MSA [5, 12]. MSA (version 1.0) was completed and distributed in 1989; most of the implementation of the resource intensive part was performed by the second author of this paper. The paper [12] gives a short overview of the program, but no detailed description of the implementation has ever been published. ....
....to get $\delta$ is that the user may suppress the preprocessor program by supplying a file with values for $\epsilon_{i,j}$, in which case these values are used in the same way as the computed $\epsilon$ values in the second option. Again, this method may yield an upper bound U that is too low. Carrillo and Lipman [5] realized that not all alignments A need to be considered. Define $d(S_1, S_2, \ldots, S_K) = \min_A c(A)$. Let L be the sum of pairs lower bound defined as above, and let U be an upper bound on $d(S_1, \ldots, S_K)$ and let A be a multiple sequence alignment of minimal cost. Then $U - L \ge \sum$ ....
H. Carrillo and D. Lipman. The multiple sequence alignment problem in biology. SIAM J. Appl. Math., 48:1073--1082, 1988.
....well established measure of pairwise comparison, either a distance or a similarity score, and to extend it to the case of a multiple comparison. For instance, the score of a multiple sequence alignment can be defined as the sum of the scores of all projected pairwise alignments (Sum of the Pairs [6], [3]) or as the sum of the projected pairwise alignment score associated with each edge of an evolutionary tree (Sum according to a tree [23], [24]). In each case, a function of similarity is defined, and an optimal alignment is one which minimizes this function. Dynamic programming for instance ....
H. Carrillo and D.J. Lipman. The multiple sequence alignment problem in biology. SIAM J. Appl. Math., 48:1073--1082, 1988.
....the concept itself. One natural way to proceed is to start from a well established measure of pairwise comparison and to extend it to the case of a multiple one. For instance, the score of a multiple sequence alignment can be defined as the sum of the scores of all projected pairwise alignments [4], or as the sum of the projected pairwise alignment score associated with each edge of an evolutionary tree [18], [19]. In each case, a function of similarity is defined, and an optimal alignment is one which minimizes this function. Dynamic programming is an algorithm that, in the special case of ....
H. Carrillo and D.J. Lipman. The multiple sequence alignment problem in biology. SIAM J. Appl. Math., 48:1073--1082, 1988.
....and is provably hard for a few [13, 5, 11, 18] Nevertheless it is often useful to align a small number of sequences, that is greater than two, optimally. One of the existing programs that attempts to construct an optimal alignment, for some definition of multiple sequence alignment, is MSA [6, 12]. MSA (version 1.0) was completed and distributed in 1989. Most of the implementation of the resource intensive part was performed by the second author, and the shortest paths computation description in Sections 2, 3, and 4 below is derived from notes written by the second author in 1989. ....
....get $\delta$ is that the user may suppress the preprocessor program by supplying a file with values for $\epsilon_{i,j}$, in which case these values are used in the same way as the computed $\epsilon$ values in the second option. Again, this method may yield an upper bound U that is too low. As Carrillo and Lipman [6] realized, to solve the sum of pairs alignment problem, not all alignments A have to be considered. Define $d(S_1, S_2, \ldots, S_K) = \min_A c(A)$. Let L be the sum of pairs lower bound defined as above. Let U be an upper bound on $d(S_1, \ldots, S_K)$ and let A be a multiple sequence alignment ....
H. Carrillo and D. Lipman. The multiple sequence alignment problem in biology. SIAM J. Appl. Math., 48:1073--1082, 1988.
....sequence alignment minimizing the sum of the costs of all induced pairwise alignments is discussed. The algorithm incorporates a bound of Carrillo and Lipman on the cost of an optimal alignment and uses a definition of Altschul for counting gaps in multiple sequence alignments. 1 The bound. In [CL88], Carrillo and Lipman consider the problem of computing a multiple sequence alignment whose sum of costs of all induced pairwise alignments is minimal. For a multiple sequence alignment $A_{1,\ldots,K}$ of sequences $S_1, S_2, \ldots, S_K$, let $A_{i,j}$ be the induced pairwise alignment of S ....
Carrillo, Humberto and David Lipman. "The multiple sequence alignment problem in biology." SIAM Journal on Applied Mathematics 48, 1073--1082, 1988.
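Several excerpts above refer to the Carrillo and Lipman bound: with L the sum of the optimal pairwise costs d(S_p, S_q) and U an upper bound on the optimal sum of pairs cost, the projection of an optimal multiple alignment onto sequences i and j costs at most U - L + d(S_i, S_j). The Python sketch below shows how that budget can prune cells of one pairwise lattice. It assumes unit mismatch and gap costs and a caller-supplied U; the function names `prefix_costs` and `admissible_cells` are hypothetical and are not taken from the MSA program or from any cited paper.

```python
from itertools import combinations


def prefix_costs(s, t, mismatch=1, gap=1):
    """f[a][b] = minimum cost of aligning s[:a] with t[:b] (assumed unit costs)."""
    rows, cols = len(s) + 1, len(t) + 1
    f = [[0] * cols for _ in range(rows)]
    for a in range(rows):
        for b in range(cols):
            if a == 0 and b == 0:
                continue
            options = []
            if a > 0:
                options.append(f[a - 1][b] + gap)   # s[a-1] against a gap
            if b > 0:
                options.append(f[a][b - 1] + gap)   # t[b-1] against a gap
            if a > 0 and b > 0:
                sub = 0 if s[a - 1] == t[b - 1] else mismatch
                options.append(f[a - 1][b - 1] + sub)
            f[a][b] = min(options)
    return f


def admissible_cells(seqs, i, j, upper):
    """Cells (a, b) of the (i, j) lattice that survive the pairwise budget.

    A cell is kept when the cheapest pairwise path through it costs at most
    upper - lower + d(i, j), where lower is the sum of all optimal pairwise costs.
    """
    d = {(p, q): prefix_costs(seqs[p], seqs[q])[-1][-1]
         for p, q in combinations(range(len(seqs)), 2)}
    lower = sum(d.values())
    budget = upper - lower + d[(min(i, j), max(i, j))]
    s, t = seqs[i], seqs[j]
    fwd = prefix_costs(s, t)
    # Suffix costs are obtained by running the same recurrence on reversed strings.
    bwd = prefix_costs(s[::-1], t[::-1])
    return {(a, b)
            for a in range(len(s) + 1)
            for b in range(len(t) + 1)
            if fwd[a][b] + bwd[len(s) - a][len(t) - b] <= budget}
```

A tighter U leaves fewer cells: in this sketch, for the sequences ACGT, ACGT and AGT with U = 2, only the main diagonal of the lattice for the first two sequences remains, while a looser U admits neighbouring cells as well.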