L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....was performed by computing the global multiple alignment of the orthologous sequences and by identifying highly conserved regions in the alignment. That approach led to significant discoveries, but it is quite limited, in the following sense. First, multiple alignment is an NP-hard problem [31], so even if the optimal alignment could identify REs, we might not be able to compute it. More importantly, the upstream sequences considered are quite often highly diverged (except for their regulatory elements). Since regulatory elements are very short compared to the sequences considered (say ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....total edit distance is $O(2^k L_1 \cdots L_k)$. A small problem has $k = 10$ and $L_i = 100$ for $i = 1, \ldots, k$. The DP for the associated problem has complexity $782 \times 10^{20}$, which is greater than the postulated age of the universe in microseconds. Thus, DP is not practical. Further, MSA is NP-hard [78], so we should not expect any exact algorithm to solve this problem efficiently. 2.3.1 Tree alignment Another approach to MSA considers the sequences as leaves of a phylogenetic tree, a graphical representation of the evolutionary history of species or their parts. Solving MSA involves finding ....
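The bound above is easier to grasp when it is evaluated directly. The sketch below computes $2^k L_1 \cdots L_k$ for the small problem ($k = 10$, $L_i = 100$). It also computes the exact table size and the per-cell predecessor count of the naive k-dimensional DP. The function names are illustrative. The per-column scoring cost (roughly $k^2$ pairwise comparisons under sum-of-pairs) is deliberately left out.

```python
from math import prod

def naive_dp_bound(lengths):
    """Evaluate the 2^k * L_1 * ... * L_k bound for the exact k-dimensional DP."""
    return 2 ** len(lengths) * prod(lengths)

def naive_dp_table(lengths):
    """Exact table size and number of predecessor moves examined per cell.

    The table has (L_1 + 1) * ... * (L_k + 1) cells, and each cell can be
    reached by 2^k - 1 moves (every non-empty subset of sequences advances).
    """
    return prod(L + 1 for L in lengths), 2 ** len(lengths) - 1

if __name__ == "__main__":
    small = [100] * 10  # k = 10 sequences, each of length 100
    print(f"{naive_dp_bound(small):.3e}")  # 1.024e+23
    cells, moves = naive_dp_table(small)
    print(f"{cells:.3e} cells x {moves} moves per cell")
```

The count is a product over all k sequences, so adding one more sequence multiplies the work by roughly $2(L + 1)$.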
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....only gap characters) a set of pairwise alignments that are without such incompatibilities. The score defined by taking the sum over the costs of all these alignments is called the sum-of-pairs score; the problem of finding the multiple alignment with least sum-of-pairs score is NP-hard, cf. [152]. Other scoring schemes might seem more relevant, e.g. scoring schemes that take the tree-like shape of evolutionary relationships (cf. figure 2.5) into account; this does not help with the intractability of the problem, as finding the optimal tree alignment is MAX SNP-hard, cf. [152]. The problem ....
....NP-hard, cf. [152]. Other scoring schemes might seem more relevant, e.g. scoring schemes that take the tree-like shape of evolutionary relationships (cf. figure 2.5) into account; this does not help with the intractability of the problem, as finding the optimal tree alignment is MAX SNP-hard, cf. [152]. The problem of finding a good multiple alignment is of such importance that the tractability results stated above have not deterred people from taking stabs at it. Bafna et al. [12] present an approximation algorithm finding a multiple alignment of k sequences with a sum-of-pairs score ....
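The sum-of-pairs score can be computed directly from a finished alignment. The sketch below assumes a hypothetical unit-cost scheme: a match costs 0, and a mismatch or a character against a gap costs 1. It treats gap-against-gap columns as free, which is equivalent to dropping gap-only columns from each induced pairwise alignment.

```python
from itertools import combinations

GAP = "-"

def pair_cost(a, b, mismatch=1, indel=1):
    """Cost of one column entry pair under an assumed unit-cost scheme."""
    if a == GAP and b == GAP:
        return 0  # gap-gap columns vanish from the induced pairwise alignment
    if a == GAP or b == GAP:
        return indel
    return 0 if a == b else mismatch

def sp_score(alignment):
    """Sum over all pairs of rows of the cost of their induced pairwise alignment."""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    return sum(
        pair_cost(r1[c], r2[c])
        for r1, r2 in combinations(alignment, 2)
        for c in range(width)
    )

if __name__ == "__main__":
    print(sp_score(["AC-", "A-G", "ACG"]))  # 4
```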
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....penalties are zero, then we have a scoring scheme with fixed gap penalties. If $d_M(s, \Delta) > 0$ for all $s \in \Sigma$, then we will say that the scoring scheme specifies strictly positive gap extension penalties. Both the Tree Alignment and SP Alignment problems have been proved to be NP-hard by Wang and Jiang [17]. Hence research has focused on heuristic algorithms or approximation algorithms for such problems, and on finding restrictions that are efficiently solvable. A restriction which has a natural interpretation is the one where the scoring function is a metric, for which some approximation algorithms ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....a generalization of the dynamic programming method for computing an optimal pairwise alignment. Despite the simplicity of this multiple alignment method, its steep running time and space consumption make it impractical even for modestly sized sets of relatively short strings. Wang and Jiang [197] show that the problem of computing a multiple alignment with optimal sum-of-pairs score is NP-hard. However, the need for good multiple alignments has motivated several heuristics and approximation algorithms for computing a multiple alignment with a good sum-of-pairs score. For example, Feng and ....
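Here is a sketch of the generalized dynamic programming method for a handful of short strings. It minimizes the sum-of-pairs cost under an assumed unit-cost scheme. Each cell of the k-dimensional table records how many characters of each sequence have been consumed. Each move advances a non-empty subset of the sequences. Time and space are exponential in k, which is the practical face of the NP-hardness result.

```python
from itertools import combinations, product

GAP = "-"

def column_cost(column):
    """Unit-cost SP cost of one column (an assumed scoring scheme).

    Equal symbols (a match, or gap against gap) cost 0; a mismatch or a
    character against a gap costs 1, summed over all pairs of rows.
    """
    return sum(1 for a, b in combinations(column, 2) if a != b)

def exact_sp_cost(seqs):
    """Optimal SP alignment cost by k-dimensional dynamic programming."""
    k = len(seqs)
    # Every non-empty subset of the sequences may advance by one character.
    moves = [m for m in product((0, 1), repeat=k) if any(m)]
    ranges = [range(len(s) + 1) for s in seqs]
    best = {}
    # Lexicographic order visits every predecessor before the cell itself.
    for cell in product(*ranges):
        if not any(cell):
            best[cell] = 0
            continue
        value = float("inf")
        for move in moves:
            prev = tuple(i - step for i, step in zip(cell, move))
            if min(prev) < 0:
                continue
            column = [seqs[j][cell[j] - 1] if move[j] else GAP for j in range(k)]
            value = min(value, best[prev] + column_cost(column))
        best[cell] = value
    return best[tuple(len(s) for s in seqs)]

if __name__ == "__main__":
    print(exact_sp_cost(["AC", "A", "AC"]))  # 2
```

With two sequences, this reduces to the ordinary pairwise edit-distance recurrence.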
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....sequences. The approximation ratio has been improved to a $2 - l/k$ factor, for any fixed l, by Bafna, Lawler and Pevzner [1]. But, besides these results, it was an open question whether the problem is NP-complete. The computational complexity of multiple sequence alignment has been investigated in [9], which gives a simple proof of NP-completeness for alignment with a score scheme over a fixed alphabet of four letters that satisfies the triangle inequality and assigns a non-zero distance between identical letters. But this result leaves open the problem of analyzing the complexity of ....
....inequality, and assigns a non-zero distance between identical letters. But this result leaves open the problem of analyzing the complexity of computing optimal SP-score multiple sequence alignments for instances of this problem which are of practical biological relevance. Mainly, the result in [9] does not consider an important requirement for score schemes [4, 11], namely the property of metricity, which implies a zero distance between identical letters. Here, we prove the intractability of multiple sequence alignment in the very restricted case in which sequences are over a binary ....
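Metricity is straightforward to check. The sketch below uses a hypothetical dictionary representation of a penalty matrix and tests the usual metric axioms:

- zero distance between identical letters (the property violated by the scheme in [9]);
- positive distance between distinct letters;
- symmetry;
- the triangle inequality.

```python
from itertools import product

def is_metric(alphabet, d):
    """Check whether a penalty matrix is a metric on the given symbols.

    `d` maps ordered pairs (x, y) to a cost; `alphabet` should include the
    gap symbol if indel costs are to be checked as well.
    """
    for x in alphabet:
        if d[(x, x)] != 0:  # identical letters must be at distance zero
            return False
    for x, y in product(alphabet, repeat=2):
        if x != y and d[(x, y)] <= 0:  # distinct letters at positive distance
            return False
        if d[(x, y)] != d[(y, x)]:  # symmetry
            return False
    for x, y, z in product(alphabet, repeat=3):
        if d[(x, z)] > d[(x, y)] + d[(y, z)]:  # triangle inequality
            return False
    return True

def unit_cost(alphabet):
    """Hypothetical helper: cost 0 for identical symbols, 1 otherwise."""
    return {(x, y): 0 if x == y else 1 for x, y in product(alphabet, repeat=2)}

if __name__ == "__main__":
    sigma = "AC-"
    print(is_metric(sigma, unit_cost(sigma)))  # True
```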
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....a vertex. An optimal evolutionary tree is one that minimises the cost. One classical phylogeny is the star: a tree with n + 1 nodes, n of them being leaves. Clearly, finding an optimal evolutionary tree when the given phylogeny is a star is exactly the same problem as finding the median string. In [16] it is proved that finding an optimal evolutionary tree is NP-hard when the given phylogeny is a binary tree or a star, but the score scheme the authors use is problem-dependent and does not even satisfy the triangle inequality. Other results are proven in [17] if the phylogeny has bounded degree ....
....(computing D(w, W) each time), try to improve it. In [3] another approximation is proposed, based on a greedy strategy that builds a solution incrementally. Other heuristics have been proposed, but it remains unclear whether the problem admits a PTAS (Polynomial Time Approximation Scheme); in [16] Wang and Jiang prove that the answer is negative when the score scheme is a parameter of the problem. In their proof they use a score scheme that does not satisfy the triangle inequality. We end this section with an easy lemma that will be used in the sequel. Lemma 1. Given two strings u and v, ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....common supersequence w of $s_1, \ldots, s_n$ such that $|w| = m$. Note that if a or b in $\alpha$ is aligned with $\Delta$ for all $s_i$, we can delete that character in $\alpha$ and obtain a common supersequence shorter than m. See case (c) in Figure 4. A similar alignment was used by Wang and Jiang [32]. (Only if) Let w be a common supersequence of S such that $|w| \le m$. Let $\alpha$ be the string constructed by substituting a for 0 and b for 1 in w. When $|w| < m$, we append some characters from $\{a, b\}$ to $\alpha$ so that $|\alpha| = m$. Partition x just after every occurrence of . The distance between each ....
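Both directions of the reduction rely on two small steps. The first checks that a string is a common supersequence of the binary strings $s_1, \ldots, s_n$. The second applies the 0/1 to a/b substitution that turns w into $\alpha$. Here is a minimal sketch of those two steps; the function names are hypothetical, and the alignment itself is not constructed.

```python
def is_subsequence(s, w):
    """True if s can be obtained from w by deleting characters."""
    it = iter(w)
    # `ch in it` consumes the iterator up to and including the first match.
    return all(ch in it for ch in s)

def is_common_supersequence(w, strings):
    """True if every string in `strings` is a subsequence of w."""
    return all(is_subsequence(s, w) for s in strings)

def to_ab(w):
    """Substitute a for 0 and b for 1, as in the construction of alpha."""
    return w.translate(str.maketrans("01", "ab"))

if __name__ == "__main__":
    S = ["01", "011", "001"]
    print(is_common_supersequence("0101", S), to_ab("0101"))  # True abab
```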
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....application where the multiple alignment is being used. One such objective function used for evaluating a multiple alignment is the sum-of-pairs (SP) function, which computes the sum of the distances of all pairs of strings in a multiple alignment. It was shown by Wang and Jiang [15] that the problem of computing a multiple sequence alignment with optimal SP score (called the SP alignment problem [6]) is NP-complete. Another well-known objective function is the consensus function, which finds a consensus string of a set of strings. In this case, we first need to define ....
....string is to use the consensus error. The consensus error of a string w with respect to a given set S is the sum of the distances between w and all the strings in S. A consensus string of S based on consensus error is a string that minimizes the consensus error with respect to S. Wang and Jiang [15] showed that if the penalty matrix (used to define the distance between two strings) does not satisfy the triangle inequality, the problem of finding a consensus string based on consensus error is MAX SNP-hard. In fact, they showed the MAX SNP-hardness of this problem by a reduction to the ....
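The consensus error defined above is a sum of pairwise distances. The sketch below uses unit-cost edit distance, which is an assumed metric penalty matrix and not the non-metric matrix from the hardness proof. It adds a hypothetical exhaustive search over all strings up to a given length. That search is exponential and only meant to make the definition of a consensus string concrete.

```python
from itertools import product

def edit_distance(x, y):
    """Unit-cost edit distance (an assumed metric penalty matrix)."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, 1):
        cur = [i]
        for j, b in enumerate(y, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # match or substitute
        prev = cur
    return prev[-1]

def consensus_error(w, strings):
    """Sum of the distances between w and every string in the set."""
    return sum(edit_distance(w, s) for s in strings)

def brute_force_consensus(strings, alphabet, max_len):
    """Exhaustively find a string of length <= max_len with least consensus error."""
    best = None
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            err = consensus_error(w, strings)
            if best is None or err < best[1]:
                best = (w, err)
    return best

if __name__ == "__main__":
    print(brute_force_consensus(["AC", "AC", "AG"], "ACG", 2))  # ('AC', 1)
```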
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....[30, 60, 85], but multiple sequence alignment remains an open research area [32, 88] (see [2, 8, 57] for surveys). In addition to the ambiguity inherent in selecting gap cost and alignment cost criteria, multiple sequence alignment has also been proven to be computationally intractable [41], [87]. Step 3: Selecting the Optimality Criterion. As discussed in Section 1.2, the number of possible phylogenies spanning a given set of taxa is exponential in the number of taxa. However, only one of them coincides with the actual evolutionary history. Many criteria have been proposed to ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....will minimize the objective function. A complete description of the problem is outside the scope of this dissertation. What is important to note, though, is that under any reasonable definition of the objective function the problem of finding an optimal sequence alignment is NP-hard [109]. As a result, most algorithms in this class resort to heuristics that produce suboptimal alignments, gaining execution speed at the cost of results that may not be optimal. There are also other problems related to global string alignment (e.g. the problem of domain swapping, see ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....is an important problem in computational molecular biology, and many algorithms have been presented in this area of research (for a recent comparison see [3]). Since the problem of computing optimal alignments with respect to the sum-of-pairs criterion, along with most of its variants, is NP-hard [4], many approximative algorithms have been proposed (e.g. [5-8]). Unfortunately, almost all of these methods either exhibit a prohibitive computational complexity or yield biologically implausible results. With our algorithm, we try to contribute towards improving this situation. The authors ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....No non-approximability result was known for geometric versions of the Steiner Tree problem. Steiner Tree in Hamming spaces is known to be Max SNP-hard, though this result is usually expressed in terms of an equivalent problem in computational biology (see Wareham's Master's thesis [War93] and also [JW94, War95]). Our Results. We prove the Max SNP-hardness of Min ST in $\mathbb{R}^n$ under the $\ell_1$ norm. We establish this result by means of a reduction from the Min ST problem in Hamming spaces. The reduction is based on the following combinatorial result (Theorem 14) for an instance where all the points ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....We consider the case of a fixed alphabet $\Sigma$ and a fixed scoring function d, and thus a problem instance of WMSA is given by a pair (S, W). A special case of WMSA is Binary Multiple Sequence Alignment (BMSA), where the weights are restricted to 1 and 0. It has been shown that MSA is NP-complete [10]. For an arbitrary fixed constant r, MSA can be approximated in polynomial time within a factor of $2 - r/n$, where $n \ge r$ is the number of sequences [3]. It is unknown whether MSA admits a polynomial-time approximation scheme (PTAS, see [2]). WMSA with arbitrary weights can be approximated ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....reveal important information about the specific problem. One widely used framework for such a cost function is the (weighted) sum-of-pairs ((W)SP) score with quasi-natural gap costs [1, 3, 5]. Since the problem of computing optimal multiple alignments according to the SP score is NP-complete [13], heuristic (tree-based) methods are usually used in practice. Nevertheless, the computation of optimal multiple alignments is justified as a means of evaluating heuristic approaches or as a subprocedure of heuristic alignment methods like the recursive Divide and Conquer Alignment ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....path cover knowledge are faster than the others, much more than could be expected from the complexity bounds. 1. INTRODUCTION The computation of a minimal-cost alignment of an unbounded number of sequences is known to be an NP-hard problem for the costs usually used [Garey and Johnson 1979; Wang and Jiang 1994; Jiang et al. 1994]. Solutions thus necessarily rely on the choice of heuristics. Several approaches have been proposed, the most efficient in computation time often being greedy algorithms. In such algorithms, when two positions in two different sequences are aligned (put together in the same ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....example, dynamic programming algorithms that are guaranteed to find a best-scoring alignment of k sequences with mean length n have a running time of $O(n^k)$ (Carrillo and Lipman 1988). This explains the widespread use of heuristic algorithms for multiple alignment. It has been formally proved by Wang and Jiang (1994) and Bonizzoni and Della Vedova (2000) that there are scoring matrices for which the problem of finding a multiple alignment of k sequences with optimal SP score is NP-hard. Unfortunately, the scoring matrix used by Wang and Jiang (1994) for obtaining this result is not a metric, which makes it ....
....for multiple alignment. It has been formally proved by Wang and Jiang (1994) and Bonizzoni and Della Vedova (2000) that there are scoring matrices for which the problem of finding a multiple alignment of k sequences with optimal SP score is NP-hard. Unfortunately, the scoring matrix used by Wang and Jiang (1994) for obtaining this result is not a metric, which makes it very different from the matrices that are actually used in biological applications. The proof technique used by Bonizzoni and Della Vedova (2000) uses matrices in which the indel (insertion/deletion) penalties depend on which character a ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....This can be done by dynamic programming in much the same way as for minimum-cost alignments. Furthermore, a multiple alignment is a good starting point for reconstructing the phylogeny of the species the sequences originate from. This problem is unfortunately NP-complete in most settings; see e.g. [54]. Several ways to circumvent this problem have been suggested. In [5] an approximation algorithm is presented that, for any fixed l, achieves an approximation ratio of $2 - l/k$ in polynomial time, where k is the number of sequences aligned. The use of hidden Markov models to generate a profile of the ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....in which the cost of the alignment is obtained by adding the costs of the symbols matched up at the same positions. Analogously, in a multiple alignment the cost is obtained by adding up the costs of the matched characters, over all positions and for all pairs of sequences. Minimizing SP is NP-hard [26]. In [9] Gusfield showed that a tree-based progressive alignment method due to Feng and Doolittle (described below) using the minimum-cost star gives a 2-approximation. In the program described in this paper we push this idea further, by also considering trees that are not stars, and also ....
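The quality of the star approach depends on choosing the center well. The center is the input sequence whose summed distance to all the others is smallest. The sketch below performs only that selection step, using unit-cost edit distance as an assumed metric. It does not merge the pairwise alignments against the center into a multiple alignment, which is the step Gusfield's 2-approximation bound is stated for.

```python
def edit_distance(x, y):
    """Unit-cost edit distance (an assumed metric)."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, 1):
        cur = [i]
        for j, b in enumerate(y, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]

def choose_center(strings):
    """Return (index, total distance) of the sequence minimizing summed distance."""
    totals = [sum(edit_distance(s, t) for t in strings) for s in strings]
    # min() keeps the first index on ties.
    center = min(range(len(strings)), key=totals.__getitem__)
    return center, totals[center]

if __name__ == "__main__":
    S = ["ACGT", "ACGA", "ACG", "TCGT"]
    print(choose_center(S))  # (0, 3)
```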
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....to inner nodes, and the resulting pairwise alignments across edges imply a multiple alignment. For a given tree, finding inner-node assignments whose implied multiple alignment minimizes the tree alignment score is known as the Tree alignment problem and has been shown to be MAX SNP-hard by Wang et al. [26]. Having neither sequence alignment nor tree topology given, one obtains the following connection between the Steiner tree formalization of phylogeny and multiple sequence alignment. Consider a graph whose nodes are all sequences with length smaller than .... (Footnote: Work supported by DFG grant Vi 160/1-1.)
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....it is well formulated from a mathematical point of view. This cannot be said of many other methods based on some biological intuition, which cannot be easily framed in a rigorous scoring system. Computing a minimum SP alignment for more than two sequences is NP-complete (Wang and Jiang [WJ94]). For pairwise alignment, however, there exist efficient polynomial algorithms based on dynamic programming. Needleman and Wunsch [NW70] are credited with the first application of this method. Dynamic programming can also be used to solve the more general case, but with complexity $O((2l)^n)$ ....
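Here is the pairwise dynamic programming in its distance-minimizing form. The original Needleman-Wunsch formulation maximizes a similarity score instead. The unit mismatch and indel costs are assumptions. The traceback recovers one optimal alignment, preferring diagonal moves when several choices tie.

```python
def global_align(x, y, mismatch=1, indel=1):
    """Minimum-cost global alignment of x and y, with one optimal traceback."""
    n, m = len(x), len(y)
    # D[i][j] is the cost of aligning the prefixes x[:i] and y[:j].
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * indel
    for j in range(1, m + 1):
        D[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if x[i - 1] == y[j - 1] else mismatch
            D[i][j] = min(D[i - 1][j - 1] + sub,  # match or substitution
                          D[i - 1][j] + indel,    # x[i-1] against a gap
                          D[i][j - 1] + indel)    # y[j-1] against a gap
    # Walk back from the corner to recover the aligned rows.
    i, j, top, bottom = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (
            0 if x[i - 1] == y[j - 1] else mismatch
        ):
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + indel:
            top.append(x[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(y[j - 1])
            j -= 1
    return D[n][m], "".join(reversed(top)), "".join(reversed(bottom))

if __name__ == "__main__":
    cost, a, b = global_align("kitten", "sitting")
    print(cost)  # 3
    print(a)
    print(b)
```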
....dynamic programming is used to compute the best labeling of the internal nodes among those in which only labels from F are allowed. The complexity of the overall algorithm is O(n p (p2 p l p n p l 2 ) pj Sigmaj p 1 ). Computing Steiner sequences in general is an NP-hard problem [WJ94]. Sankoff's [San75] algorithm takes $O(p 2^p l^p)$ for p sequences, and turns out to be impractical for p 4, also considering the number of such problems that need to be solved. Because computing exact Steiner sequences is a bottleneck of the overall algorithm, in our implementation we ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....of amino acid or nucleotide substitutions in a weighted or unweighted manner. They take a multiple sequence alignment (MSA) as input and minimize the number of changes needed to explain the corresponding evolutionary tree. The construction of an optimal MSA, which is needed as input, is also NP-complete [22]. In addition, many algorithms for calculating MSAs need an evolutionary tree as input, which makes the problem circular. Definition 1.2. Given a set of sequences $S = \{s_1, \ldots, s_n\}$ with $s_i \in \Sigma^*$, where $\Sigma$ is a finite alphabet, a Multiple Sequence Alignment (MSA) consists of a set of ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....1998; Jiang et al. 1996; Sankoff and Cedergren, 1983) and the exact construction of evolutionary trees (Sankoff, 1975; Dress and Steel, 1993; Estabrook et al. 1975; Estabrook et al. 1976; Felsenstein, 1973; Felsenstein, 1981; Thorne et al. 1993) are NP-complete (Foulds and Graham, 1982; Wang and Jiang, 1994), and become computationally expensive for many real protein families encountered in a contemporary database. This means that virtually all MSAs and evolutionary trees used in the post-genomic era will have to be constructed using approximate heuristics. Definition 1.4 An MSA scoring ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....in which the cost of the alignment is obtained by adding the costs of the symbols matched up at the same positions. Analogously, in a multiple alignment the cost is obtained by adding up the costs of the matched characters, over all positions and for all pairs of sequences. Minimizing SP is NP-hard [26]. In [9] Gusfield showed that a tree-based progressive alignment method due to Feng and Doolittle (described below) using the minimum-cost star gives a 2-approximation. In the program described in this paper we push this idea further, by also considering trees that are not stars, and also ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....introduced some bounding criteria which reduce the time and space requirements of dynamic programming and make problems solvable for up to n = 6 and l = 200. However, constructing optimal alignments is bound to be computationally expensive, since the problem has been shown to be NP-complete (Wang and Jiang [20]). Despite these very expensive solution methods, the SP objective is implemented in several widely available multiple alignment packages, such as MACAW [19] and MSA [13]. 2.2 Approximation algorithms via routing cost trees. The first approximation algorithm for the SP alignment problem ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....only if the running time were polynomial in both n and k. In particular, k should not appear in the exponent as it does in the expression $(2n)^k$. Unfortunately, we are very unlikely to find such an algorithm, which is a consequence of the following theorem: Theorem 6.5 (Wang and Jiang [50]) The optimal SP alignment problem is NP-complete. What NP-completeness means and what its consequences are will be discussed in the following section. 6.4. NP-completeness In this section we give a brief introduction to NP-completeness, and how problems can be proved to be NP-complete. ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....which is defined as a binary tree such that each leaf is assigned a unique sequence of X, we need to construct a sequence for each internal node such that the total cost of the tree is minimized. The problem of Tree Alignment with a given phylogeny is NP-hard even if the phylogeny is a binary tree [WJ1994]. Some heuristic algorithms have been proposed [HEI1989, SCL1976]. Here we use an approximation algorithm based on the results of Jiang, Lawler and Wang [JLW1994]. Step 1: Initialize the internal nodes. Algorithm Initialization(X, T) Input: $X = \{s_1, \ldots, s_k\}$, T (a phylogeny for X) ....
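Once every internal node has a sequence, the quantity being minimized is simply a sum over tree edges. The sketch below evaluates that objective for a fixed labelling under an assumed unit-cost edit distance. The edge-list and label-dictionary representation is hypothetical. Finding the minimizing labelling is the hard part, and this sketch does not attempt it.

```python
def edit_distance(x, y):
    """Unit-cost edit distance (an assumed scoring scheme)."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, 1):
        cur = [i]
        for j, b in enumerate(y, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]

def tree_cost(edges, labels):
    """Total cost of a labelled tree: sum of edit distances across its edges.

    `edges` lists (u, v) node pairs; `labels` maps every node, leaf or
    internal, to its sequence.
    """
    return sum(edit_distance(labels[u], labels[v]) for u, v in edges)

if __name__ == "__main__":
    # A binary phylogeny: root r with children u and s3; u has leaves s1, s2.
    edges = [("r", "u"), ("r", "s3"), ("u", "s1"), ("u", "s2")]
    labels = {"r": "ACGT", "u": "ACGT", "s1": "ACGT", "s2": "ACGA", "s3": "TCGT"}
    print(tree_cost(edges, labels))  # 2
```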
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....are zero, then we have a scoring scheme with fixed gap penalties. If $d_M(s, \Delta) > 0$ for all $s \in \Sigma$, then we will say that the scoring scheme specifies strictly positive gap extension penalties. Both the Tree Alignment and SP Alignment problems have been proved to be NP-hard by Wang and Jiang [WJ94]. Hence research has focused on heuristic algorithms or approximation algorithms for such problems, and on finding restrictions that are efficiently solvable. A restriction which has a natural interpretation is the one where the scoring function is a metric. For this restriction some approximation ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....the alignment further. Experimental results are shown at the end. 1. Introduction The computation of multiple sequence alignments (MSAs) remains one of the major open problems in computational biology, possibly because of its complexity: most problems associated with MSAs are NP-complete [20]. It is therefore desirable to develop heuristics that try to come as close to the optimum as possible but terminate within reasonable time and space bounds. An MSA consists of a set of strings, where each string represents an amino acid, DNA or RNA sequence (see Definition 1.1). Example: In the ....
....probability 1. Problem 2.1 (TSP problem) Given a set of sequences $S = \{s_1, \ldots, s_n\}$ and the corresponding scores of the optimal pairwise alignments, the problem is to find the longest tour in which each sequence is visited exactly once. Both the construction of optimal evolutionary trees [8, 20] and the Traveling Salesman Problem are NP-complete. But the TSP is very well studied: optimal solutions can be calculated within a few hours for up to 1000 cities and within a few seconds for up to 100 cities, whereas the construction of evolutionary trees is still a big problem. There are ....
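Problem 2.1 can be made concrete with an exhaustive search over tours. This is feasible only for a handful of sequences, but it shows exactly what is being maximized. The score-matrix representation is a hypothetical choice, and the TSP solvers mentioned above rely on far more sophisticated methods than brute force.

```python
from itertools import permutations

def best_tour(score):
    """Exhaustively find the highest-scoring closed tour over all sequences.

    `score[i][j]` is the (assumed symmetric) optimal pairwise alignment
    score of sequences i and j. Fixing sequence 0 as the start avoids
    re-examining rotations of the same tour; the search is still factorial.
    """
    n = len(score)
    best = None
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        total = sum(score[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if best is None or total > best[1]:
            best = (tour, total)
    return best

if __name__ == "__main__":
    scores = [[0, 5, 1, 2],
              [5, 0, 4, 1],
              [1, 4, 0, 6],
              [2, 1, 6, 0]]
    print(best_tour(scores))  # ((0, 1, 2, 3), 17)
```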
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....1 Introduction Multiple sequence alignment (MSA) is important in functional, structural and evolutionary studies of sequence data. Much research has focussed on the formal study of MSA as an optimization problem, and several optimization criteria have been discussed at length in the literature [8, 34, 42, 44, 58, 65, 75, 85]. In addition, many software tools for constructing MSAs are available, mostly based on heuristics, although some use exact or branch-and-bound techniques (see [16, 54] for surveys). The concept of an evolutionary tree is a widely used model for MSA, where the tree encodes the historical ....
....out that, in general, there is no unique structural alignment: different methods for comparing structures result in different alignments. Complexity of Multiple Sequence Alignment. Complexity results have been published for MSA using both the sum-of-pairs and the TA metrics. Wang and Jiang [85] have shown that SP alignment is NP-complete, and Sweedyk and Warnow [75] showed that general TA is NP-complete. TA for a fixed binary tree topology is also NP-complete [85], and a polynomial-time approximation scheme has been presented [42]. However, when the fixed topology is a star, TA is MAX SNP ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....to inner nodes, and the resulting pairwise alignments across edges imply a multiple alignment. For a given tree, finding inner-node assignments whose implied multiple alignment minimizes the tree alignment score is known as the Tree alignment problem and has been shown to be MAX SNP-hard by Wang et al. [26]. Having neither sequence alignment nor tree topology given, one obtains the following connection between .... (Footnotes: Work supported by DFG grant Vi 160/1-1, presented at RECOMB 97, http://www.cs.sandia.gov/recomb97; e-mail: schwikowski@dkfz-heidelberg.de; e-mail: m.vingron@dkfz-heidelberg.de.)
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....beforehand, meaning that the problem can no longer be decomposed into columns, and thus the candidate sets assigned to inner nodes in the Candidate Heuristic consist of whole sequences instead of single letters. Optimal Algorithms. The Tree alignment problem was proved NP-hard by Wang and Jiang [11]. The dynamic programming algorithm of Sankoff [4] solves the problem for unit mutation and indel costs, at the expense of a running time that is exponential in the number of input sequences. Subsequent extensions include arbitrary mutation costs [8] and the simultaneous prediction of the inner ....
....the phylogeny of the input sequences and pleaded for the simultaneous calculation of alignment and phylogeny. When we keep asking for the most parsimonious tree alignment without fixing the tree topology, the problem is even harder. The so-called Generalized Tree Alignment problem is MAX SNP-hard [11], so that, unlike for the Tree Alignment problem, probably no polynomial-time approximation scheme exists. Due to the close ties between topology and multiple alignment, previous authors [6, 35, 21] have integrated the construction of the phylogeny into the Candidate Heuristic scheme. This can be ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....sequence alignment is an important problem in computational molecular biology, and many algorithms have been presented in this area of research (see [18], [15], [5] or [13] for a survey). Since the problem of computing optimal alignments with respect to the sum-of-pairs criterion is NP-hard [23], many approximative algorithms have been proposed (e.g. [9], [11], [20] or [2]). Unfortunately, almost all of these methods either exhibit a prohibitive computational complexity or yield biologically implausible results. With our algorithm, we try to contribute to improving this situation. 2 ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....problem in order to presort simultaneously every sequence under consideration into several subsequences in a Divide and Conquer fashion appears to be a novel approach. Recently it was shown that, not unexpectedly, multiple sequence alignment with the sum-of-pairs score is NP-complete (cf. [47]). Therefore, to align large sets of sequences in reasonable time, one needs fast heuristic algorithms. Unfortunately, most of the more reliable heuristic approaches suffer from high computational costs for large problems, while fast heuristics often do not yield plausible results. ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....study of alignment procedures is presently a very active area of research, with an abundance of papers and software contributions (see [28], [6] or [22] for a survey). Recently it was shown that multiple sequence alignment with the sum-of-pairs score, on which we will focus, is NP-complete (cf. [38]). Therefore, to align large sets of sequences in reasonable time, one needs fast heuristic algorithms. Unfortunately, most of the more reliable heuristic approaches suffer from high computational costs for large problems, while fast heuristics often do not yield plausible results. So, ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....or nucleotide substitutions in a weighted or unweighted manner. They take a multiple sequence alignment (MSA) as input and minimize the number of changes or hypotheses needed to explain the corresponding evolutionary tree. The construction of an optimal MSA, which is needed as input, is also NP-complete [22]. In addition, many algorithms for calculating MSAs need an evolutionary tree as input, which makes the problem circular. Definition 1.2. Given a set of sequences $S = \{s_1, \ldots, s_n\}$ with $s_i \in \Sigma^*$, where $\Sigma$ is a finite alphabet, a Multiple Sequence Alignment (MSA) consists of a set of ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....cost for each pair of characters and the insertion/deletion cost for each character. An arbitrary penalty matrix can also be used as a relative measure because it can contain both positive and negative costs [28]. It is common to assume that a penalty matrix satisfies the triangle inequality [30]. Relative measures. When we want to compare the similarity between x and y with the similarity between x′ and y′, we need relative measures (rather than absolute measures) because the lengths of the strings x, y, x′, y′ may be different. There are two ways to define relative ....
....b in $\alpha$, we obtain a common supersequence $w$ of $s_1, \ldots, s_n$ such that $|w| = m$. Note that if an a or b in $\alpha$ is aligned with $\Delta$ for all $s_i$, we can delete that character from $\alpha$ and obtain a common supersequence shorter than $m$. A similar alignment was used by Wang and Jiang [30]. (Only if) Let $s$ be a common supersequence of $S$ such that $|s| \le m$. Let $\alpha$ be the string constructed by substituting a for 0 and b for 1 in $s$. Partition $x$ just after every occurrence of . The distance between each partition block of $x$ and $\#\alpha$ is at most $m$ since each a (resp. b) in $\alpha$ can be ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
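The proof fragment above works with common supersequences of binary strings and the substitution of a for 0 and b for 1. The sketch below only checks those two ingredients; it is not the reduction itself, and the function names are hypothetical.

```python
def is_supersequence(w: str, s: str) -> bool:
    """True if s can be obtained from w by deleting characters."""
    it = iter(w)
    # 'ch in it' advances the iterator until it finds ch, so order is respected.
    return all(ch in it for ch in s)

def is_common_supersequence(w: str, strings: list[str]) -> bool:
    return all(is_supersequence(w, s) for s in strings)

def binary_to_ab(s: str) -> str:
    """The substitution used in the proof: a for 0 and b for 1."""
    return s.translate(str.maketrans("01", "ab"))

S = ["0110", "1010", "0011"]
print(is_common_supersequence("010110", S))  # True
print(binary_to_ab("010110"))                # ababba
```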
....chosen in such a way that the sum over all induced pairwise distances between sequences is minimized [2]. This formulation does not refer to evolution and does not require the reconstruction of ancestral sequences. Both of these formulations of multiple sequence alignment have been shown to be NP-hard [38]. Formalizations of the phylogeny problem can be roughly described as character-based, distance-based or maximum-likelihood methods [32]. Parsimony, to be described below, is a character-based method. Its underlying philosophy allows for ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
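The "induced pairwise distances" in the passage above come from projecting an MSA onto two of its rows and dropping the columns where both rows have a gap. A minimal sketch, assuming unit mismatch and insertion/deletion costs; the helper names are hypothetical. When a gap facing a gap costs 0, summing these induced distances gives the same value as summing the pairwise cost column by column.

```python
from itertools import combinations

GAP = "-"

def induced_pair(row1: str, row2: str) -> tuple[str, str]:
    """Project two MSA rows onto a pairwise alignment, dropping gap-gap columns."""
    kept = [(a, b) for a, b in zip(row1, row2) if not (a == GAP and b == GAP)]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)

def unit_distance(x: str, y: str) -> int:
    """Cost of a pairwise alignment with unit mismatch and indel costs (an assumption)."""
    return sum(a != b for a, b in zip(x, y))

def sum_of_induced_distances(alignment: list[str]) -> int:
    return sum(
        unit_distance(*induced_pair(r1, r2))
        for r1, r2 in combinations(alignment, 2)
    )

print(induced_pair("A-C", "A-G"))                            # ('AC', 'AG')
print(sum_of_induced_distances(["AC-GT", "ACAGT", "-CAGA"]))  # 6
```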
....Section 6.1. We would like an algorithm that also works for k in the hundreds, which would only be possible if the running time were polynomial in both n and k. Unfortunately, we are very unlikely to find such an algorithm, which is the essence of the following theorem: Theorem 6.5 (Wang and Jiang [6]). The optimal SP alignment problem is NP-complete. What NP-completeness means and what its consequences are will be discussed in the following section. 6.4. NP-completeness. In this section we give a brief introduction to NP-completeness and to how problems can be proved to be NP-complete. ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
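To see why an exact method is not polynomial in k, here is a sketch of the standard dynamic program for optimal SP alignment (a textbook technique, not taken from the quoted text). It runs over the k-dimensional lattice of prefix positions, and each cell tries all $2^k - 1$ ways of advancing a non-empty subset of rows, so the work grows like $2^k \prod_i (n_i + 1)$. Unit costs are assumed; names are hypothetical.

```python
from functools import lru_cache
from itertools import combinations, product

GAP = "-"

def pair_cost(a: str, b: str) -> int:
    # Illustrative unit costs (assumption): equal symbols, including gap-gap, cost 0, else 1.
    return 0 if a == b else 1

def exact_sp_cost(seqs: tuple[str, ...]) -> int:
    """Minimum SP cost over all alignments of seqs (exponential in len(seqs))."""
    k = len(seqs)
    # Every non-empty subset of rows may consume its next character: 2^k - 1 moves.
    moves = [m for m in product((0, 1), repeat=k) if any(m)]

    @lru_cache(maxsize=None)
    def best(pos: tuple[int, ...]) -> int:
        if all(p == len(s) for p, s in zip(pos, seqs)):
            return 0
        result = None
        for m in moves:
            if any(p + d > len(s) for p, d, s in zip(pos, m, seqs)):
                continue  # this move would run past the end of some sequence
            column = [s[p] if d else GAP for p, d, s in zip(pos, m, seqs)]
            col_cost = sum(pair_cost(a, b) for a, b in combinations(column, 2))
            cand = col_cost + best(tuple(p + d for p, d in zip(pos, m)))
            if result is None or cand < result:
                result = cand
        return result

    return best((0,) * k)

print(exact_sp_cost(("kitten", "sitting")))  # 3, the unit-cost edit distance when k = 2
```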
No context found.
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
No context found.
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
No context found.
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
....the consensus score $\sum_{i=1}^{n} d_H(S, S_i)$, where $S$ is the majority sequence of $S_1, S_2, \ldots, S_n$ and is called the median sequence. Consensus Alignment is NP-hard for a score scheme where a mismatch costs 1 and a match costs 0 [9]. The problem is MAX SNP-hard if the score scheme is arbitrary [21]. The best known approximation algorithm for Consensus Alignment has performance ratio $2 - o(1)$ [5]. SP alignment has been extensively studied recently. With much effort, the best known performance ratio for SP alignment has been improved from $2 - 2/k$ to $2 - l/k$ for any ....
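For a fixed alignment and the Hamming distance $d_H$, each column contributes to the consensus score independently, so taking the most frequent symbol in every column minimizes $\sum_{i=1}^{n} d_H(S, S_i)$. The sketch below assumes the aligned rows are given (choosing the alignment is the hard part); ties are broken by first occurrence, and the names are hypothetical.

```python
from collections import Counter

def hamming(x: str, y: str) -> int:
    if len(x) != len(y):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(a != b for a, b in zip(x, y))

def majority_sequence(rows: list[str]) -> str:
    """Most frequent symbol per column; Counter breaks ties by first occurrence."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def consensus_score(center: str, rows: list[str]) -> int:
    return sum(hamming(center, r) for r in rows)

rows = ["ACGT-", "AC-TA", "TCGTA"]
median = majority_sequence(rows)
print(median, consensus_score(median, rows))  # ACGTA 3
```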
....smallest score for two distinct letters in the alphabet. The assumption here is very general. In fact, in any score scheme for a finite alphabet, $d_{\max}/d_{\min}$ is bounded by a constant if $d_{\min} \neq 0$. The analysis of our algorithm does not depend on the triangle inequality. Contrary to our results, the MAX SNP-hardness in [21] assumes that $d_{\max}/d_{\min} = 1/0$, that is, $d_{\min} = 0$. In this section, we show that algorithm diagonalAlign works for a very general type of score scheme, i.e., $d(i, i) \ge 0$ and $d_{\max}/d_{\min}$ bounded by a constant, where $d(i, j)$ is the score between the pair of letters $i$ and $j$, $d_{\max} = \max_{i \neq j} d(i, j)$ and $d_{\min} = \min$ ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
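The ratio $d_{\max}/d_{\min}$ discussed above is taken over pairs of distinct letters. The sketch below computes it for a score matrix stored as a dictionary, returning infinity when $d_{\min} = 0$ (the "1/0" case). The transition/transversion scores are only an illustrative assumption, and the names are hypothetical.

```python
from itertools import permutations

def score_ratio(d: dict[tuple[str, str], float], alphabet: str) -> float:
    """d_max / d_min over ordered pairs of distinct letters; inf if d_min == 0."""
    off_diagonal = [d[i, j] for i, j in permutations(alphabet, 2)]
    d_max, d_min = max(off_diagonal), min(off_diagonal)
    return float("inf") if d_min == 0 else d_max / d_min

ALPHABET = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
d = {
    (i, j): 0 if i == j else (1 if (i, j) in TRANSITIONS else 2)
    for i in ALPHABET
    for j in ALPHABET
}
print(score_ratio(d, ALPHABET))  # 2.0
```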
....; $s_k(i)$ of the $i$-th column is thus defined as $\mu(s_1(i), s_2(i), \ldots, s_k(i)) = \sum_{(p,q) \in E} \mu(s_p(i), s_q(i))$. The tree score of $A$ is the total score of all columns in $A$. It is known that the two definitions of tree alignment are equivalent. Tree alignment was proved to be NP-hard [20]. Many heuristic algorithms have been proposed in the literature [1, 6, 15, 16]. Some approximation algorithms with guaranteed relative error bounds have been reported recently. In particular, a polynomial-time approximation scheme (PTAS) is presented in [21] and improved versions are given in ....
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
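The tree score described above sums, for every column, the pairwise scores across the edges $(p, q) \in E$ of the tree. In tree alignment the internal nodes also carry (aligned) sequences, so the sketch takes one row per node. Unit pairwise scores, the star-shaped tree, and the node names `u`, `x`, `y`, `z` are assumptions for illustration.

```python
def pair_cost(a: str, b: str) -> int:
    # Illustrative unit scores (assumption): 0 for equal symbols, including gap-gap, else 1.
    return 0 if a == b else 1

def tree_score(alignment: dict[str, str], edges: list[tuple[str, str]]) -> int:
    """Sum over columns i and tree edges (p, q) of pair_cost(s_p(i), s_q(i))."""
    length = len(next(iter(alignment.values())))
    if any(len(row) != length for row in alignment.values()):
        raise ValueError("all aligned rows must have the same length")
    return sum(
        pair_cost(alignment[p][i], alignment[q][i])
        for i in range(length)
        for p, q in edges
    )

# Hypothetical star tree: internal node u joined to leaves x, y, z.
alignment = {"x": "AC-GT", "y": "ACAGT", "z": "-CAGA", "u": "ACAGT"}
edges = [("u", "x"), ("u", "y"), ("u", "z")]
print(tree_score(alignment, edges))  # 3
```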
No context found.
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
No context found.
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
No context found.
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
No context found.
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
No context found.
L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337-348, 1994.
CiteSeer.IST - Copyright Penn State and NEC