Complexity Insights of the Minimum Duplication Problem
 38TH INTERNATIONAL CONFERENCE ON CURRENT TRENDS IN THEORY AND PRACTICE OF COMPUTER SCIENCE (SOFSEM 2012), ŠPINDLERUV MLÝN: CZECH REPUBLIC
, 2012
Cited by 4 (2 self)
The Minimum Duplication problem is a well-known problem in phylogenetics and comparative genomics. Given a set of gene trees, the Minimum Duplication problem asks for a species tree that induces the minimum number of gene duplications in the input gene trees. More recently, a variant of the Minimum Duplication problem, called Minimum Duplication Bipartite, has been introduced in [14], where the goal is to find all pre-duplications, that is, duplications that precede, in the evolution, the first speciation with respect to a species tree. In this paper, we investigate the complexity of both the Minimum Duplication and the Minimum Duplication Bipartite problems. First, we prove that the Minimum Duplication problem is APX-hard, even when the input consists of five uniquely leaf-labelled gene trees (progressing on the complexity of the problem). Then, we show that the Minimum Duplication Bipartite problem can be solved efficiently by a randomized algorithm when the input gene trees have bounded depth.
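To make the objective concrete, the following minimal sketch counts the duplications that a fixed species tree induces in one gene tree. It assumes the standard LCA-mapping (reconciliation) convention, which the abstract does not spell out: a gene-tree node is a duplication when it maps to the same species-tree node as one of its children. Trees are written as nested tuples with species names at the leaves; this representation and the function names are illustrative assumptions, not taken from the paper.

```python
def clusters(tree):
    """Return the leaf set (cluster) of every node of a nested-tuple tree."""
    out = []

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def lca_cluster(species_clusters, leaf_set):
    """Smallest species-tree cluster containing leaf_set, i.e. its LCA."""
    return min((c for c in species_clusters if leaf_set <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Count gene-tree nodes that map to the same species node as a child."""
    s_clusters = clusters(species_tree)
    dups = 0

    def walk(node):
        nonlocal dups
        if isinstance(node, str):
            return lca_cluster(s_clusters, frozenset([node]))
        child_maps = [walk(child) for child in node]
        # The LCA of a node is the LCA of the images of its children.
        mapped = lca_cluster(s_clusters, frozenset().union(*child_maps))
        if mapped in child_maps:
            dups += 1
        return mapped

    walk(gene_tree)
    return dups


species = (("a", "b"), "c")
print(count_duplications((("a", "b"), "c"), species))
print(count_duplications((("a", "c"), "b"), species))
```

The Minimum Duplication problem then asks for the species tree that minimizes the sum of this count over all input gene trees; the sketch only evaluates one candidate and does not search for it.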
An approximation algorithm for computing a parsimonious first speciation in the gene duplication model
, 2010
Clustering with Relative Constraints
Cited by 3 (1 self)
Recent studies [26, 22] have suggested using relative distance comparisons as constraints to represent domain knowledge. A natural extension to relative comparisons is the combination of two comparisons defined on the same set of three instances. Constraints in this form, termed Relative Constraints, provide a unified knowledge representation for both partitional and hierarchical clusterings. But many key properties of relative constraints remain unknown. In this paper, we answer the following important questions that enable the broader application of relative constraints in general clustering problems:
• Feasibility: Does there exist a clustering that satisfies a given set of relative constraints? (consistency of constraints)
• Completeness: Given a set of consistent relative constraints, how can one derive a complete clustering without running into dead ends?
• Informativeness: How can one extract the most informative relative constraints from given knowledge sources?
We show that any hierarchical domain knowledge can be easily represented by relative constraints. We further present a hierarchical algorithm that finds a clustering satisfying all given constraints in polynomial time. Experiments showed that our algorithm achieves significantly higher accuracy than the existing metric learning approach based on relative comparisons.
A 2-Approximation for the Minimum Duplication Speciation Problem
in Journal of Computational Biology
Cited by 2 (0 self)
We consider the following problem: given a set of gene family trees, spanning a given set of species, find a first speciation which splits these species into two subsets and minimizes the number of gene duplications that happened before this speciation. We call this problem the Minimum Duplication Bipartition Problem. Using a generalization of the Minimum Edge-Cut Problem, we propose a polynomial-time 2-approximation algorithm for the Minimum Duplication Bipartition Problem. We apply this algorithm to the inference of species trees on synthetic datasets and on two datasets of eukaryotic species. Key words: computational molecular biology, dynamic programming, genomics rearrangements.
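As an illustration of the quantity being minimized, the sketch below counts, for one gene tree and one candidate first speciation, the duplications placed before that speciation. It is not the 2-approximation algorithm from the paper. It assumes the usual LCA-mapping convention: a gene-tree node lies before the first speciation as a duplication when at least one of its children still contains species from both sides of the bipartition. The nested-tuple tree encoding and all function names are hypothetical.

```python
def leaf_species(tree):
    """Set of species labels at the leaves of a nested-tuple gene tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaf_species(child) for child in tree))


def spans(tree, left, right):
    """True if the subtree contains species from both sides of the split."""
    species = leaf_species(tree)
    return bool(species & left) and bool(species & right)


def count_preduplications(gene_tree, left, right):
    """Count duplications preceding the first speciation (left | right)."""
    if isinstance(gene_tree, str):
        return 0
    total = sum(count_preduplications(c, left, right) for c in gene_tree)
    # A child spanning both sides maps to the species-tree root, and so
    # does its parent, which makes the parent a duplication at the root.
    if any(spans(child, left, right) for child in gene_tree):
        total += 1
    return total


gene = (("a", "c"), ("b", "c"))
print(count_preduplications(gene, {"a", "b"}, {"c"}))
```

Recomputing the leaf sets at every node keeps the sketch short at the cost of quadratic time; a single bottom-up pass would avoid that.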
Computing a Smallest Multilabeled Phylogenetic Tree from Rooted Triplets
We investigate the computational complexity of a new combinatorial problem of inferring a smallest possible multilabeled phylogenetic tree (MUL tree) which is consistent with each of the rooted triplets in a given set. We prove that even the restricted case of determining if there exists a MUL tree consistent with the input and having just one leaf duplication is NP-hard. Furthermore, we show that the general minimization problem is NP-hard to approximate within a ratio of $n^{1-\epsilon}$ for any constant $0 < \epsilon \le 1$, where $n$ denotes the number of distinct leaf labels in the input set, although a simple polynomial-time approximation algorithm achieves the approximation ratio $n$. We also provide an exact algorithm for the problem running in $O^*(7^n)$ time and $O^*(3^n)$ space.
A New Heuristic Algorithm for MRTC Problem
, 2012
A rooted phylogenetic tree is a rooted tree which represents the evolutionary history of currently living species. A rooted binary tree on three leaves is a rooted triplet. The problem of determining whether there exists a rooted phylogenetic tree that contains all of the rooted triplets is polynomially solvable, while the problem of finding a rooted phylogenetic tree that contains the maximum number of rooted triplets is known to be NP-hard. This maximization problem is known as the Maximum Rooted Triplets Consistency (MRTC) problem. In this paper we present a new heuristic algorithm for this problem based on the concept of the height function of a tree. We study the performance of our algorithm from both simulation and theoretical viewpoints.
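The polynomially solvable decision version can be sketched with the classical BUILD procedure of Aho, Sagiv, Szymanski and Ullman. This is the standard textbook method for triplet consistency, not the heuristic proposed in the paper. A triplet ab|c is encoded here as the tuple (a, b, c), and the output is a nested tuple; both encodings are assumptions made for the example.

```python
def build(leaves, triplets):
    """Return a tree containing every triplet (a, b, c) = ab|c, or None."""
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))

    # Union-find over the current leaves.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Keep only triplets whose three leaves are all present, and join
    # a and b for each of them: they must stay below the same child.
    relevant = [t for t in triplets if leaves.issuperset(t)]
    for a, b, _ in relevant:
        parent[find(a)] = find(b)

    components = {}
    for x in sorted(leaves):
        components.setdefault(find(x), set()).add(x)

    # A single component means no valid split exists at the root.
    if len(components) == 1:
        return None

    subtrees = []
    for component in components.values():
        subtree = build(component, relevant)
        if subtree is None:
            return None
        subtrees.append(subtree)
    return tuple(subtrees)


print(build({"a", "b", "c"}, [("a", "b", "c")]))
print(build({"a", "b", "c"}, [("a", "b", "c"), ("a", "c", "b")]))
```

When the triplet set is inconsistent, BUILD only reports failure; choosing a tree that satisfies as many triplets as possible is the NP-hard MRTC problem that the heuristic addresses.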
Project-Team SEQUOIA: Algorithms for large-scale sequence analysis for molecular biology, INRIA Activity Report
, 2008
The main goal of the SEQUOIA project-team is to define appropriate combinatorial models and efficient algorithms for large-scale sequence analysis in molecular biology. Emphasis is placed on the annotation of non-coding regions in genomes (RNA genes and regulatory sequences) via comparative genomics methods. This task involves several complementary issues such as sequence comparison; prediction, analysis and manipulation of RNA secondary structures; and identification and processing of regulatory sequences. Efficient algorithms and parallelism on high-performance computing architectures make it possible to handle large-scale instances of these problems. Our aim is to tackle all those issues in an integrated fashion and to put together the developed software tools into a common platform for annotation of non-coding regions. We also explore complementary problems of protein sequence analysis. Those include new approaches to protein sequence comparison on the one hand, and a system for storing and manipulating non-ribosomal peptides on the other hand. Special attention is given to the development of robust software, its validation on biological data, and its availability from the software platform of the team and by other means. Most research projects are carried out in collaboration with biologists.
Do Triplets Have Enough Information to Construct the
, 2013
The evolutionary history of certain species, such as polyploids, is modeled by a generalization of phylogenetic trees called multi-labeled phylogenetic trees, or MUL trees for short. One problem that relates to inferring a MUL tree is how to construct the smallest possible MUL tree that is consistent with a given set of rooted triplets, or the SMRT problem for short. This problem is NP-hard. There is one exact algorithm for the SMRT problem, which runs in $O(7^n)$ time, where $n$ is the number of taxa. In this paper, we show that SMRT does not seem to be an appropriate formulation from the biological point of view. Indeed, we present a heuristic algorithm named MTRT for this problem and execute it on some real and simulated datasets. The results of MTRT show that triplets alone cannot provide enough information to infer the true MUL tree. So, it is inappropriate to infer a MUL tree using triplet information alone and considering the minimum number of duplications. Finally, we introduce some new problems which are more suitable from the biological point of view.