Results 1-10 of 40
Maximum agreement and compatible supertrees
 Proceedings of the 15th Combinatorial Pattern Matching Symposium (CPM'04), volume 3109 of LNCS, 2004
Cited by 20 (8 self)
Given a set of leaf-labelled trees with identical leaf sets, the MAST problem, respectively MCT problem, consists of finding a largest subset of leaves such that all input trees restricted to these leaves are isomorphic, respectively compatible. In this paper, we propose extensions of these problems to the context of supertree inference, where input trees have non-identical leaf sets. This situation is of particular interest in phylogenetics. The resulting problems are called SMAST and SMCT. A sufficient condition is given that identifies cases where these problems can be solved by resorting to MAST and MCT as subproblems. This condition is met, for instance, when only two input trees are considered. Then we give algorithms for SMAST and SMCT that benefit from the link with the subtree problems. These algorithms run in time linear in the time needed to solve MAST, respectively MCT, on an instance of the same or smaller size. It is shown that arbitrary instances of SMAST and SMCT can be turned in polynomial time into instances composed of trees with a bounded number of leaves. SMAST is shown to be W[2]-hard when the considered parameter is the number of input leaves that have to be removed to obtain the agreement of the input trees. A similar result holds for SMCT. Moreover, the corresponding optimization problems, that is, the complements of SMAST and SMCT, cannot be approximated in polynomial time within a constant factor, unless P = NP. These results also hold when the input trees have a bounded number of leaves. The presented results apply to both collections of rooted and unrooted trees. Preprint submitted to Elsevier Science, 17 November 2006.
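To make the MAST definition concrete, here is a minimal brute-force sketch for rooted trees. It is not the algorithm of the paper: it simply tries leaf subsets from largest to smallest, so it is exponential and only suitable for toy inputs. The tree encoding (a leaf is a string, an internal node is a tuple of children) and all function names are assumptions made for this illustration.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels of a tree (leaf = str, node = tuple)."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found


def restrict(tree, keep):
    """Restrict a tree to the leaves in `keep`, suppressing single-child nodes.

    Returns None when no leaf of the subtree is kept.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [r for r in (restrict(c, keep) for c in tree) if r is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]
    return tuple(kids)


def canonical(tree):
    """Order-independent string form; assumes labels contain no '(' ',' ')'."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(sorted(canonical(c) for c in tree)) + ")"


def brute_force_mast(trees):
    """Largest leaf subset on which all restricted trees are isomorphic."""
    labels = sorted(leaves(trees[0]))  # input trees share one leaf set
    for size in range(len(labels), 0, -1):
        for subset in combinations(labels, size):
            keep = set(subset)
            if len({canonical(restrict(t, keep)) for t in trees}) == 1:
                return keep
    return set()


t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "b"), "d"), "c")
agreement = brute_force_mast([t1, t2])
```

For the two caterpillar trees above, the trees disagree on all four leaves but agree once `d` is dropped, so the search stops at a subset of size three.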
Assessment of discretization techniques for relevant pattern discovery from gene expression data
 In 4th Workshop on Data Mining in Bioinformatics, 2004
Cited by 19 (11 self)
In the domain of gene expression data analysis, various researchers have recently emphasized the promising application of pattern discovery techniques like association rule mining or formal concept extraction from boolean matrices that encode gene properties. To get the most from these approaches, a needed step concerns gene property encoding (e.g., overexpression) and its need for the discretization of raw gene expression data. The impact of this preprocessing step on both the quantity and the relevancy of the extracted patterns is crucial. In this paper, we study the impact of discretization parameters by a sound comparison between the dendrograms, i.e., trees that are generated by a hierarchical clustering algorithm, computed from raw expression data and from the various derived boolean matrices. Thanks to a new similarity measure and practical validation over several gene expression data sets, we propose a method that supports the choice of a discretization technique and its parameters for each specific data set.
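As an illustration of what encoding overexpression as a boolean property can look like, the sketch below marks a gene as overexpressed in a sample when its value reaches a chosen fraction of that gene's maximum. The abstract does not name a specific rule, so the rule, the function name `discretize_overexpression`, and the `fraction` parameter are all assumptions for this example; it also assumes non-negative expression values.

```python
def discretize_overexpression(matrix, fraction=0.5):
    """Turn a gene-by-sample expression matrix into a boolean matrix.

    Hypothetical rule: a cell is True when its value is at least
    `fraction` times the maximum value observed for that gene (row).
    """
    boolean_matrix = []
    for row in matrix:
        cutoff = fraction * max(row)
        boolean_matrix.append([value >= cutoff for value in row])
    return boolean_matrix


expression = [
    [1.0, 4.0, 2.0],  # gene 1 across three samples
    [3.0, 3.0, 0.5],  # gene 2 across three samples
]
encoded = discretize_overexpression(expression, fraction=0.5)
```

Changing `fraction` changes how many cells are set to true, which is the kind of parameter sensitivity the abstract sets out to assess.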
Relationships of Cetacea (Artiodactyla) Among Mammals: Increased taxon sampling alters interpretations of key fossils and character evolution
 PLoS ONE 2009; 4(9): e7062. doi: 10.1371/journal.pone.0007062 PMID: 19774069
Cited by 16 (0 self)
Background: Integration of diverse data (molecules, fossils) provides the most robust test of the phylogeny of cetaceans. Positioning key fossils is critical for reconstructing the character change from life on land to life in the water. Methodology/Principal Findings: We reexamine relationships of critical extinct taxa that impact our understanding of the origin of Cetacea. We do this in the context of the largest total evidence analysis of morphological and molecular information for Artiodactyla (661 phenotypic characters and 46,587 molecular characters, coded for 33 extant and 48 extinct taxa). We score morphological data for Carnivoramorpha, †Creodonta, Lipotyphla, and the †raoellid artiodactylan †Indohyus and concentrate on determining which fossils are positioned along stem lineages to major artiodactylan crown clades. Shortest trees place Cetacea within Artiodactyla and close to †Indohyus, with †Mesonychia outside of Artiodactyla. The relationships of †Mesonychia and †Indohyus are highly unstable, however: in trees only two steps longer than minimum length, †Mesonychia falls inside Artiodactyla and displaces †Indohyus from a position close to Cetacea. Trees based only on data that fossilize continue to show the classic arrangement of relationships within Artiodactyla with Cetacea grouping outside the clade, a signal incongruent with the molecular data that dominate the total evidence result. Conclusions/Significance: Integration of new fossil material of †Indohyus impacts placement of another extinct clade †Mesonychia, pushing it much farther down the tree. The phylogenetic position of †Indohyus suggests that the cetacean
An even faster and more unifying algorithm for comparing trees via unbalanced bipartite matchings
 Journal of Algorithms
Cited by 15 (6 self)
A widely used method for determining the similarity of two labeled trees is to compute a maximum agreement subtree of the two trees. Previous work on this similarity measure is only concerned with the comparison of labeled trees of two special kinds, namely, uniformly labeled trees (i.e., trees with all their nodes labeled by the same symbol) and evolutionary trees (i.e., leaf-labeled trees with distinct symbols for distinct leaves). This paper presents an algorithm for comparing trees that are labeled in an arbitrary manner. In addition to this generality, this algorithm is faster than the previous algorithms. Another contribution of this paper is on maximum weight bipartite matchings. We show how to speed up the best known matching algorithms when the input graphs are node-unbalanced or weight-unbalanced. Based on these enhancements, we obtain an efficient algorithm for a new matching problem called the hierarchical bipartite matching problem, which is at the core of our maximum agreement subtree algorithm.
On the approximation of computing evolutionary trees
 In Proceedings of the 11th International Computing and Combinatorics Conference (COCOON'05), 2005
Cited by 12 (4 self)
Given a set of leaf-labelled trees with identical leaf sets, the well-known MAST problem consists of finding a subtree homeomorphically included in all input trees and with the largest number of leaves. MAST and its variant called MCT are of particular interest in computational biology. This paper presents positive and negative results on the approximation of MAST, MCT and their complement versions, denoted CMAST and CMCT. For CMAST and CMCT on rooted trees we give 3-approximation algorithms achieving significantly lower running times than those previously known. In particular, the algorithm for CMAST runs in linear time. The approximation threshold for CMAST, resp. CMCT, is shown to be the same whether collections of rooted trees or of unrooted trees are considered. Moreover, hardness of approximation results are stated for CMAST, CMCT and MCT on a small number of trees, and for MCT on an unbounded number of trees.
Rooted Maximum Agreement Supertrees
 2005
Cited by 11 (2 self)
Given a set $T$ of rooted, unordered trees, where each $T_i \in T$ is distinctly leaf-labeled by a set $\Lambda(T_i)$ and where the sets $\Lambda(T_i)$ may overlap, the maximum agreement supertree problem (MASP) is to construct a distinctly leaf-labeled tree $Q$ with leaf set $\Lambda(Q) \subseteq \bigcup_{T_i \in T} \Lambda(T_i)$ such that $|\Lambda(Q)|$ is maximized and for each $T_i \in T$, the topological restriction of $T_i$ to $\Lambda(Q)$ is isomorphic to the topological restriction of $Q$ to $\Lambda(T_i)$. Let $n = \left|\bigcup_{T_i \in T} \Lambda(T_i)\right|$, $k = |T|$, and $D = \max_{T_i \in T} \{\deg(T_i)\}$. We first show that MASP with $k = 2$ can be solved in $O(\sqrt{D}\, n \log(2n/D))$ time, which is $O(n \log n)$ when $D = O(1)$ and $O(n^{1.5})$ when $D$ is unrestricted. We then present an algorithm for MASP with $D = 2$ whose running time is polynomial if $k = O(1)$. On the other hand, we prove that MASP is NP-hard for any fixed $k \geq 3$ when $D$ is unrestricted, and also NP-hard for any fixed $D \geq 2$ when $k$ is unrestricted even if each input tree is required to contain at most three leaves. Finally, we describe a polynomial-time $(n/\log n)$-approximation algorithm for MASP.
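The two special cases of the $k = 2$ running time can be checked with a short calculation. The sketch below assumes the bound reads $\sqrt{D} \cdot n \cdot \log(2n/D)$ and that $1 \le D \le n$ (a node cannot have more children than there are leaves); the helper function $f$ is introduced only for this illustration, and natural logarithms are used since the base only changes a constant factor.

```latex
\begin{align*}
f(D) &= \sqrt{D}\,\ln\!\left(\frac{2n}{D}\right), \qquad 1 \le D \le n, \\
D = O(1) &\;\Rightarrow\; f(D) = O(\log n)
          \;\Rightarrow\; n\, f(D) = O(n \log n), \\
f'(D) &= \frac{\ln(2n/D) - 2}{2\sqrt{D}} = 0
          \;\Longleftrightarrow\; D^{*} = \frac{2n}{e^{2}}, \\
f(D^{*}) &= \frac{2\sqrt{2n}}{e} = O(\sqrt{n})
          \;\Rightarrow\; n\, f(D) = O(n^{1.5}) \text{ for all } 1 \le D \le n.
\end{align*}
```

So the worst case over unrestricted $D$ is attained at a degree proportional to $n$, which is where the $O(n^{1.5})$ figure comes from.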
Improved Parameterized Complexity of the Maximum Agreement Subtree and . . .
 IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2006
Cited by 9 (4 self)
Given a set of evolutionary trees on the same set of taxa, the maximum agreement subtree problem (MAST), respectively maximum compatible tree problem (MCT), consists of finding a largest subset of taxa such that all input trees restricted to these taxa are isomorphic, respectively compatible. These problems
On the maximum common embedded subtree problem for ordered trees
 In C. Iliopoulos and T. Lecroq, editors, String Algorithmics, chapter 7. King's College London Publications, 2004
Cited by 8 (0 self)
The maximum common embedded subtree problem, which generalizes the minor containment problem on trees, is reduced for ordered trees to a variant of the longest common subsequence problem. While the maximum common embedded subtree problem is known to be APX-hard for unordered trees, an exact solution for ordered trees can be found in polynomial time. In this paper, the longest common balanced sequence problem, and thus the maximum common embedded subtree problem, are solved in $O(n_1 n_2 \min(d_1, \ell_1) \min(d_2, \ell_2))$ time, on ordered trees with $n_1$ and $n_2$ nodes, of depth $d_1$ and $d_2$, and with $\ell_1$ and $\ell_2$ leaves.