Removing Noise from Gene Trees
Cited by 8 (3 self)
Abstract. Reconciliation is the commonly used method for inferring the evolutionary scenario for a gene family. It consists of “embedding” an inferred gene tree into a known species tree, revealing the evolution of the gene family by duplications and losses. The main complaint about reconciliation is that the inferred evolutionary scenario depends strongly on the considered gene tree, as a few misplaced leaves may lead to a completely different history, with significantly more duplications and losses. As using different phylogenetic methods with different parameters may lead to different gene trees, it is essential to have criteria to choose, among those, the appropriate one for reconciliation. In this paper, following the conclusion of a previous paper, we flag certain duplication vertices of a gene tree, the “non-apparent duplication” (NAD) vertices, as resulting from the misplacement of leaves, and consider the optimization problem of removing the minimum number of leaves leading to a tree without any NAD vertex. We develop a polynomial-time algorithm that is exact for two special classes of gene trees, and show a good performance on simulated data sets in the general case.
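To make the notion of a flagged duplication vertex concrete, here is a minimal Python sketch. It is not the paper's leaf-removal algorithm. It only labels the internal vertices of a gene tree, under two assumptions that the abstract does not spell out: a vertex is a duplication when its LCA mapping into the species tree equals that of one of its children, and a duplication is "non-apparent" when its two child subtrees share no species. The nested-tuple tree encoding and the names `clusters`, `lca_cluster`, and `classify` are hypothetical, chosen for illustration.

```python
def clusters(tree):
    """Leaf sets of all nodes of a binary nested-tuple tree; the root's set is last."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = tree
    cl, cr = clusters(left), clusters(right)
    return cl + cr + [cl[-1] | cr[-1]]


def lca_cluster(species, species_clusters):
    """LCA mapping (assumed definition): smallest species-tree cluster containing the set."""
    return min((c for c in species_clusters if species <= c), key=len)


def classify(gene_tree, species_tree):
    """Label each internal gene-tree vertex in post-order.

    Leaves of the gene tree are species names (each gene is named by its species).
    Labels: "speciation", "apparent duplication", or "NAD" (assumed definitions).
    """
    species_clusters = clusters(species_tree)
    labels = []

    def visit(vertex):
        if isinstance(vertex, str):
            return frozenset([vertex])
        left, right = visit(vertex[0]), visit(vertex[1])
        both = left | right
        mapped = lca_cluster(both, species_clusters)
        child_maps = (lca_cluster(left, species_clusters),
                      lca_cluster(right, species_clusters))
        if mapped in child_maps:
            # Duplication: apparent only if the child subtrees share a species.
            label = "apparent duplication" if left & right else "NAD"
        else:
            label = "speciation"
        labels.append((vertex, label))
        return both

    visit(gene_tree)
    return labels
```

For example, with the species tree `(("a", "b"), "c")`, the gene tree `(("a", "c"), "b")` gets a root labelled "NAD" under these assumptions, because the misplaced leaf forces a duplication whose two sides share no species.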
General Techniques for Comparing Unrooted Evolutionary Trees
1997
Cited by 6 (2 self)
This paper presents two sets of techniques for comparing unrooted evolutionary trees, namely, label compression and four-way dynamic programming. The technique of four-way dynamic programming transforms existing algorithms for computing rooted maximum agreement subtrees into new ones for unrooted trees. Let n be the size of the two input trees. This technique leads to an O(n log n)-time algorithm for unrooted trees whose degrees are bounded by a constant, matching the best known complexity for the rooted binary case. The technique of label compression is not based on dynamic programming. With this technique, we obtain an O(n^1.5 log n)-time algorithm for unrooted trees with arbitrary degrees, also matching the best algorithm for the rooted unbounded degree case.
1 Introduction. An evolutionary tree is a tree whose leaves are labeled with distinct symbols representing species. Evolutionary trees are useful for modeling the evolutionary relationship of species. Many mathematical biol...
The maximum agreement of two nested phylogenetic networks
New Topics in Theoretical Computer Science, chap. 4, Nova Publishers, 2008
Cited by 6 (4 self)
Given a set N of phylogenetic networks, the maximum agreement phylogenetic subnetwork problem (MASN) asks for a subnetwork embedded in every N_i ∈ N with as many leaves as possible. MASN can be used to identify shared branching structure among phylogenetic networks or to measure their similarity. In this chapter, we prove that the general case of MASN is NP-hard already for two phylogenetic networks (in fact, even if one of the two input networks is a binary tree), but that the problem can be solved efficiently if each of the two input phylogenetic networks exhibits a nested structure. For this purpose, we introduce the concept of a nested phylogenetic network and study some of its underlying fundamental combinatorial properties. We first show that the total number of nodes |V(N)| in any nested phylogenetic network N with n leaves and nesting depth d is O(n(d + 1)). We then describe a simple algorithm for testing if a given phylogenetic network is nested, and if so, determining its nesting depth in O(|V(N)| · (d + 1)) time. Next, we present a polynomial-time algorithm for MASN for two nested phylogenetic networks N_1, N_2. Its running time is O(|V(N_1)| ·
Cavity matchings, label compressions, and unrooted evolutionary trees
SIAM J. Comput., 2000
Cited by 6 (1 self)
Abstract. We present an algorithm for computing a maximum agreement subtree of two unrooted evolutionary trees. It takes O(n^1.5 log n) time for trees with unbounded degrees, matching the best known time complexity for the rooted case. Our algorithm allows the input trees to be mixed trees, i.e., trees that may contain directed and undirected edges at the same time. Our algorithm adopts a recursive strategy exploiting a technique called label compression. The backbone of this technique is an algorithm that computes the maximum weight matchings over many subgraphs of a bipartite graph as fast as it takes to compute a single matching.
Forbidden Patterns
Cited by 5 (3 self)
Abstract. We consider the problem of indexing a collection of documents (a.k.a. strings) of total length n such that the following kind of queries are supported: given two patterns P+ and P−, list all n_match documents containing P+ but not P−. This is a natural extension of the classic problem of document listing as considered by Muthukrishnan [SODA’02], where only the positive pattern P+ is given. Our main solution is an index of size O(n^{3/2}) bits that supports queries in O(|P+| + |P−| + n_match + √n) time.
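The query semantics described above can be pinned down with a brute-force baseline. The following Python sketch is not the index from the paper: it scans every document at query time, so it takes time proportional to the total collection length instead of the stated O(|P+| + |P−| + n_match + √n) bound. The function name `forbidden_pattern_query` is a hypothetical choice for illustration.

```python
from typing import List, Sequence


def forbidden_pattern_query(docs: Sequence[str], p_plus: str, p_minus: str) -> List[int]:
    """Return indices of documents that contain p_plus but do not contain p_minus.

    Naive baseline: one substring scan per document, no preprocessing.
    """
    return [
        i
        for i, doc in enumerate(docs)
        if p_plus in doc and p_minus not in doc
    ]
```

A baseline like this is mainly useful as a correctness oracle when testing a real index, since its output defines exactly which documents a query should list.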
An algebraic view of the relation between largest common subtrees and smallest common supertrees
2004
Cited by 5 (1 self)
Abstract. The relationship between two important problems in tree pattern matching, the largest common subtree and the smallest common supertree problems, is established by means of simple constructions, which allow one to obtain a largest common subtree of two trees from a smallest common supertree of them, and vice versa. These constructions are the same for isomorphic, homeomorphic, topological, and minor embeddings; they take only time linear in the size of the trees, and they turn out to have a clear algebraic meaning.
Online Consensus and Agreement of Phylogenetic Trees
Algorithms in Bioinformatics, 4th International Workshop, WABI 2004, 2004
Cited by 5 (0 self)
Abstract. Computational heuristics are the primary methods for reconstruction of phylogenetic trees on large datasets. Most large-scale phylogenetic analyses produce numerous trees that are equivalent for some optimization criteria. Even using the best heuristics, it takes a significant amount of time to obtain optimal trees in simulation experiments. When biological data are used, the score of the optimal tree is not known. As a result, the heuristics are either run for a fixed (long) period of time, or until some measure of a lack of improvement is achieved. It is unclear, though, what a good criterion for measuring this lack of improvement is. However, it is often useful to represent the collection of best trees so far in a compact way to allow scientists to monitor the reconstruction progress. Consensus and agreement trees are common representations of this kind. Using existing static algorithms to produce these trees substantially increases an already lengthy computation time. In this paper we present efficient online algorithms for computing strict and majority consensus trees and the maximum agreement subtree.
Boolean property encoding for local set pattern discovery: an application to gene expression data analysis
Local Pattern Detection. Springer-Verlag LNAI 3539, 2005
Cited by 4 (0 self)
Abstract. In the domain of gene expression data analysis, several researchers have recently emphasized the promising application of local pattern (e.g., association rules, closed sets) discovery techniques from boolean matrices that encode gene properties. Detecting local patterns by means of complete constraint-based mining techniques turns out to be an important complementary approach or invaluable counterpart to heuristic global model mining. To get the most from local set pattern mining approaches, a necessary step concerns gene expression property encoding (e.g., over-expression). The impact of this preprocessing phase on both the quantity and the quality of the extracted patterns is crucial. In this paper, we study the impact of discretization techniques by a sound comparison between the dendrograms, i.e., trees that are generated by a hierarchical clustering algorithm on raw numerical expression data and its various derived boolean matrices. Thanks to a new similarity measure, we can select the boolean property encoding technique which preserves similarity structures holding in the raw data. The discussion relies on several experimental results for three gene expression data sets. We believe our framework is an interesting direction of work for the many application domains in which (a) local set patterns have been proved useful, and (b) Boolean properties have to be derived from raw numerical data.
Approximating the Maximum Isomorphic Agreement Subtree is Hard
Cited by 3 (0 self)
The Maximum Isomorphic Agreement Subtree (MIT) problem is one of the simplest versions of the Maximum Interval Weight Agreement Subtree method (MIWT), which is used to compare phylogenies. More precisely, MIT provides a subset of the species such that the exact distances between species in that subset are preserved among all evolutionary trees considered. In this paper, the approximation complexity of the MIT problem is investigated, showing that it cannot be approximated in polynomial time within factor log^δ n for any δ > 0 unless NP ⊆ DTIME(2^{polylog n}) for instances containing three trees. Moreover, we show that this result can be strengthened whenever instances of the MIT problem can contain an arbitrary number of trees, since MIT shares the same approximation lower bound as MAX CLIQUE.