Results 11-20 of 328
A Polynomial Time Approximation Scheme for Minimum Routing Cost Spanning Trees
, 1998
Abstract

Cited by 45 (8 self)
Given an undirected graph with nonnegative costs on the edges, the routing cost of any of its spanning trees is the sum over all pairs of vertices of the cost of the path between the pair in the tree. Finding a spanning tree of minimum routing cost is NP-hard, even when the costs obey the triangle inequality. We show that the general case is in fact reducible to the metric case and present a polynomial-time approximation scheme valid for both versions of the problem. In particular, we show how to build a spanning tree of an n-vertex weighted graph with routing cost within $(1 + \epsilon)$ of the minimum in time $O(n^{O(1/\epsilon)})$. Besides the obvious connection to network design, trees with small routing cost also find application in the construction of good multiple sequence alignments in computational biology. The communication cost spanning tree problem is a generalization of the minimum routing cost tree problem where the routing costs of different pairs are weighted by different r...
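To make the routing cost definition in the abstract above concrete, here is a minimal Python sketch. It only evaluates the routing cost of a given spanning tree; it is not the approximation scheme from the paper. The function name `routing_cost`, the edge-list input format, and the choice to count each unordered vertex pair once are assumptions made for illustration.

```python
from collections import defaultdict


def routing_cost(n, tree_edges):
    """Routing cost of a spanning tree on vertices 0..n-1.

    tree_edges is a list of (u, v, cost) triples forming a tree.
    Assumption: each unordered pair of vertices is counted once.
    """
    adj = defaultdict(list)
    for u, v, w in tree_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    total = 0
    for src in range(n):
        # In a tree the path between two vertices is unique, so a plain
        # traversal from src yields the path cost to every other vertex.
        dist = {src: 0}
        stack = [src]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + w
                    stack.append(v)
        total += sum(dist.values())

    # Every unordered pair was visited from both endpoints.
    return total / 2
```

For the path 0-1-2 with edge costs 1 and 2, the three pairs contribute 1, 2, and 3, so the routing cost is 6. This quadratic-time evaluation is only meant to illustrate the objective.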
Computational Complexity of Multiple Sequence Alignment with SP-Score
 Journal of Computational Biology
, 1999
Abstract

Cited by 39 (1 self)
It is shown that the multiple alignment problem with SP-score is NP-hard for each scoring matrix in a broad class M that includes most scoring matrices actually used in biological applications. The problem remains NP-hard even if sequences can only be shifted relative to each other and no internal gaps are allowed. It is also shown that there is a scoring matrix $M_0$ such that the multiple alignment problem for $M_0$ is MAX-SNP-hard, regardless of whether or not internal gaps are allowed. Key words and phrases: sequence alignment, scoring matrix, NP-hardness, MAX-SNP-hardness, polynomial time approximation scheme. Author: Winfried Just (email: just@math.ohiou.edu, phone: (740)5931260, fax: (740)5939805). 1. Introduction. The importance of good multiple sequence alignment algorithms is evidenced by the large number of programs that have been developed for this task (Fasman and Salzberg 1998). Finding an optimal alignment of k sequences appears to quickly become computationall...
The emergence of pattern discovery techniques in computational biology
 Metabolic Engineering
, 2000
Abstract

Cited by 36 (5 self)
In the past few years, pattern discovery has been emerging as a generic tool of choice for tackling problems from the computational biology domain. In this presentation, and after defining the problem in its generality, we review some of the algorithms that have appeared in the literature and describe several applications of pattern discovery to problems from computational biology. © 2000 Academic Press.
Finding Similar Regions in Many Sequences
 JOURNAL OF COMPUTER AND SYSTEM SCIENCES
, 1999
Abstract

Cited by 35 (7 self)
Algorithms for finding similar, or highly conserved, regions in a group of sequences are at the core of many molecular biology problems. Assume that we are given n DNA sequences $s_1, \ldots, s_n$. The Consensus Patterns problem, which has been widely studied in bioinformatics research [22, 10, 7, 21, 2, 3, 9, 18, 19, 27], in its simplest form, asks for a region of length L in each $s_i$, and a median string s of length L so that the total Hamming distance from s to these regions is minimized. We show that the problem is NP-hard and give a polynomial time approximation scheme (PTAS) for it. We then present an efficient approximation algorithm for the consensus pattern problem under the original relative entropy measure of [22, 10, 7, 21]. As an interesting application of our analysis, we further obtain a PTAS for a restricted (but still NP-hard) version of the important consensus alignment problem [6] allowing at most a constant number of gaps, each of arbitrary length, in each sequence.
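The Consensus Patterns objective described in the abstract above can be illustrated with a small brute-force sketch in Python. This is not the PTAS from the paper: it enumerates every combination of window start positions, which is exponential in the number of sequences and only usable on toy inputs. The function names are hypothetical, and the sketch relies on the standard fact that, for fixed windows, a column-wise most frequent character minimizes the total Hamming distance.

```python
from collections import Counter
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_median(windows):
    """Column-wise majority string for a list of equal-length windows."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*windows))


def consensus_pattern_bruteforce(seqs, L):
    """Return (cost, median, starts) minimizing the total Hamming distance.

    Exhaustive search over all start positions; illustration only.
    """
    best = None
    start_ranges = [range(len(s) - L + 1) for s in seqs]
    for starts in product(*start_ranges):
        windows = [s[i:i + L] for s, i in zip(seqs, starts)]
        median = best_median(windows)
        cost = sum(hamming(median, w) for w in windows)
        if best is None or cost < best[0]:
            best = (cost, median, starts)
    return best
```

For example, the sequences `"AACGT"`, `"TACGA"`, and `"GGACG"` with L = 3 share the region `"ACG"`, so the optimal total distance is 0.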
The complexity of multiple sequence alignment with SP-score that is a metric
 TCS
, 2001
Abstract

Cited by 34 (0 self)
This paper analyzes the computational complexity of computing the optimal alignment of a set of sequences under the SP (sum of all pairs) score scheme. We solve an open question by showing that the problem is NP-complete in the very restricted case in which the sequences are over a binary alphabet and the score is a metric. This result establishes the intractability of multiple sequence alignment under a score function of mathematical interest, which has indeed received much attention in biological sequence comparison.
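As a small illustration of the SP (sum of all pairs) score mentioned above, the following Python sketch scores a given alignment by summing a column-wise cost over every pair of rows. The unit cost used here (0 for identical symbols, 1 otherwise, with the gap symbol `-` treated as an ordinary character) is an assumption chosen because it is a simple metric; the paper's results concern metric scores in general, not this particular one.

```python
from itertools import combinations


def unit_cost(a, b):
    """Assumed metric: 0 if the symbols match (including gap-gap), else 1."""
    return 0 if a == b else 1


def sp_score(rows, cost=unit_cost):
    """Sum-of-pairs score of an alignment given as equal-length rows."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all alignment rows must have the same length")
    return sum(
        cost(a, b)
        for r1, r2 in combinations(rows, 2)
        for a, b in zip(r1, r2)
    )
```

For the alignment `AC-T`, `A-GT`, `ACGT`, the three row pairs contribute 2, 1, and 1, giving an SP score of 4. Scoring a fixed alignment is easy; the hardness result concerns finding the alignment that optimizes this score.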
Implied alignment: a synapomorphy-based multiple-sequence alignment method and its use in cladogram search
 Cladistics
Abstract

Cited by 32 (2 self)
A method to align sequence data based on parsimonious synapomorphy schemes generated by direct optimization (DO; earlier termed optimization alignment) is proposed. DO directly diagnoses sequence data on cladograms without an intervening multiple-alignment step, thereby creating topology-specific, dynamic homology statements. Hence, no multiple alignment is required to generate cladograms. Unlike general and globally optimal multiple-alignment procedures, the method described here, implied alignment (IA), takes these dynamic homologies and traces them back through a single cladogram, linking the unaligned sequence positions in the terminal taxa via DO transformation series. These "lines of correspondence" link ancestor–descendent states and, when displayed as linearly arrayed columns without hypothetical ancestors, are largely indistinguishable from standard multiple alignment. Since this method is based on synapomorphy, the treatment of certain classes of insertion–deletion (indel) events may be different from that of other alignment procedures. As with all alignment methods, results are dependent on parameter assumptions such as indel cost and transversion:transition ratios. Such an IA could be used as a basis for phylogenetic search, but this would be questionable since the homologies derived from the implied alignment depend on its natal cladogram and any variance, between DO and IA+Search, due to heuristic approach. The utility of this procedure in heuristic cladogram searches using DO and the improvement of heuristic cladogram cost calculations are discussed. © 2003 The Willi Hennig Society. Published by Elsevier Science (USA). All rights reserved.
Settling the Intractability of Multiple Alignment
 Proc. of the 14th Ann. Int. Symp. on Algorithms and Computation (ISAAC)
, 2003
Abstract

Cited by 31 (1 self)
In this paper some of the most fundamental problems in computational biology are proved intractable. The following problems are shown NP-hard for all binary or larger alphabets under all fixed metrics: Multiple Alignment with SP-score, Star Alignment, and Tree Alignment (for a given phylogeny). Earlier these problems have only been shown intractable for sporadic alphabets and distances; here the intractability is settled. Moreover, Consensus Patterns and Substring Parsimony are shown NP-hard.
Towards trajectory anonymization: a generalization-based approach
 Transactions on Data Privacy
Abstract

Cited by 28 (2 self)
Trajectory datasets are becoming popular due to the massive usage of GPS and location-based services. In this paper, we address privacy issues regarding the identification of individuals in static trajectory datasets. We first adopt the notion of k-anonymity to trajectories and propose a novel generalization-based approach for anonymization of trajectories. We further show that releasing anonymized trajectories may still have some privacy leaks. Therefore we propose a randomization-based reconstruction algorithm for releasing anonymized trajectory data and also present how the underlying techniques can be adapted to other anonymity standards. The experimental results on real and synthetic trajectory datasets show the effectiveness of the proposed techniques.
Topology of Strings: Median String is NP-Complete
 THEORETICAL COMPUTER SCIENCE
, 2000
Abstract

Cited by 27 (4 self)
Given a set of strings, the problem of finding a string that minimises its distance to the set is directly related with problems frequently encountered in areas involving Pattern Recognition or Computational Biology. Based on the Levenshtein (or edit) distance, different definitions of distances between a string and a set of strings can be adopted. In particular, if this definition is the sum of the distances to each string of the set, the string that minimises this distance is the (generalised) median string. Finding this string corresponds in Speech Recognition to giving a model for a set of acoustic sequences, and in Computational Biology to constructing an optimal evolutionary tree when the given phylogeny is a star. Efficient algorithms are known only for finding approximate solutions. The results in this paper are combinatorial and negative. We prove that computing the median string corresponds to an NP-complete decision problem, thus proving that this problem is NP-hard.
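The median string objective described above (sum of Levenshtein distances to every string in the set) can be sketched in Python as follows. The `set_median` helper is an assumption added for illustration: it restricts the candidates to the input strings themselves, which is a common cheap heuristic and not the generalised median that the paper proves hard to compute.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def sum_of_distances(candidate, strings):
    """Objective minimised by the (generalised) median string."""
    return sum(levenshtein(candidate, s) for s in strings)


def set_median(strings):
    """Heuristic: the input string with the smallest sum of distances."""
    return min(strings, key=lambda c: sum_of_distances(c, strings))
```

For the set `"cat"`, `"cut"`, `"cart"`, the sums of distances are 2, 3, and 3 respectively, so `set_median` returns `"cat"`. The generalised median may be a string outside the set with an even smaller sum.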
A More Efficient Approximation Scheme for Tree Alignment
 SIAM Journal on Computing
, 1997
Abstract

Cited by 26 (5 self)
We present a new polynomial time approximation scheme (PTAS) for tree alignment, which is an important variant of multiple sequence alignment. As in the existing PTASs in the literature, the basic approach of our algorithm is to partition the given tree into overlapping components of a constant size and then apply local optimization on each such component. But the new algorithm uses a clever partitioning strategy and achieves a better efficiency for the same performance ratio. For example, to achieve approximation ratios 1.6 and 1.5, the best existing PTAS has to spend time $O(kdn^5)$ and $O(kdn^9)$, respectively, where n is the length of each leaf sequence and d, k are the depth and number of leaves of the tree, while the new PTAS only has to spend time $O(kdn^4)$ and $O(kdn^5)$. Moreover, the performance of the PTAS is more sensitive to the size of the components, which basically determines the running time, and we obtain an improved approximation ratio for each size. Some experiments of the algorithm on simulated and real data are also given.