Parametric Complexity of Sequence Assembly: Theory and Applications to Next Generation Sequencing
Cited by 31 (2 self)
In recent years a flurry of new DNA sequencing technologies has altered the landscape of genomics, providing a vast amount of sequence information at a fraction of the cost that was previously feasible. The task of assembling these sequences into a genome has, however, remained an algorithmic challenge that is in practice answered by heuristic solutions. In order to design better assembly algorithms and exploit the characteristics of sequence data from new technologies, we need an improved understanding of the parametric complexity of the assembly problem. In this work, we provide a first theoretical study in this direction, exploring the connections between repeat complexity, read lengths, overlap lengths and coverage in determining the “hard” instances of the assembly problem. Our work suggests at least two ways in which existing assemblers can be extended in a rigorous fashion, in addition to delineating directions for future theoretical investigations.
The Restriction Scaffold Problem
- Journal of Computational Biology, 2003
Cited by 17 (0 self)
Most shotgun sequencing projects undergo a long and costly phase of finishing, in which a partial assembly forms several contigs whose order, orientation and relative distance are unknown. We propose here a new technique that supplements the shotgun assembly data with experimentally simple and commonly used complete restriction digests of the target. By computationally combining information from the contig sequences and the fragment sizes measured for several different enzymes, we seek to form a "scaffold" on which the contigs will be placed in their correct orientation, order and distance. We give a heuristic search algorithm for solving the problem and report on promising preliminary simulation results. The key to the success of the search scheme is the very rapid solution of two time-critical subproblems that are solved to optimality in linear time.
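The abstract above combines contig sequences with fragment sizes from complete restriction digests. As a minimal sketch of that input only (not the paper's search algorithm), the following Python function predicts the fragment sizes a complete digest would produce from a known linear sequence. The function name, the `cut_offset` parameter and the toy sequence are assumptions made for illustration; the recognition site GAATTC, cut after the first base, is the well-known EcoRI pattern.

```python
from typing import List


def digest_fragment_sizes(sequence: str, site: str, cut_offset: int = 0) -> List[int]:
    """Predict fragment lengths for a complete digest of a linear sequence.

    Hypothetical helper: `site` is the recognition sequence and `cut_offset`
    is how many bases into the site the enzyme cuts (an assumption of this
    sketch, one strand only).
    """
    cuts = []
    start = 0
    while True:
        hit = sequence.find(site, start)
        if hit == -1:
            break
        cuts.append(hit + cut_offset)
        start = hit + 1  # step by one so overlapping sites are also found
    # Fragment boundaries are the sequence ends plus every cut position.
    bounds = [0] + cuts + [len(sequence)]
    # Skip zero-length fragments (for example, a cut exactly at position 0).
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Toy contig with two GAATTC sites; the enzyme is assumed to cut after the G.
print(digest_fragment_sizes("AAGAATTCTTGAATTCGG", "GAATTC", cut_offset=1))
```

On the toy sequence this prints `[3, 8, 7]`. A scaffolding method of the kind described would compare such predicted sizes against the experimentally measured ones, which this sketch does not attempt.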
Visualization challenges for a new cyberpharmaceutical computing paradigm
- Proceedings of the Symposium on Large-Data Visualization and Graphics, 2001
Cited by 7 (0 self)
In recent years, an explosion in data has been profoundly changing the field of biology and creating the need for new areas of expertise, particularly in the handling of data. One vital area that has so far received insufficient attention is how to communicate the large quantities of diverse and complex information that is being generated. Celera has encountered a number of visualization problems in the course of developing tools for bioinformatics research, applying them to our data generation efforts, and making that data available to our customers. This paper presents several examples from Celera’s experience. In the area of genomics, challenging visualization problems have come up in assembling genomes, studying variations between individuals, and comparing different genomes to one another. The emerging area of proteomics has created new visualization challenges in interpreting protein expression data, studying protein regulatory networks, and examining protein structure. These examples illustrate how the field of bioinformatics is posing new challenges concerning the communication of data that are often very different from those that have heretofore dominated scientific computing. Addressing the level of detail, the degree of complexity, and the interdisciplinary barriers that characterize bioinformatic problems can be expected to be a sizable but rewarding task for the field of scientific visualization.
A Survey of Computational Techniques for Genome Sequencing
- Project Report supported by Korea Institute of Science and Technology Information, 2002
Cited by 1 (0 self)
Contents (excerpt): 2. An overview of genome sequencing; 3. Computational Techniques for Genome Sequencing
AND
Given a collection of contigs and mate-pairs, the Contig Scaffolding Problem is to order and orient the given contigs in a manner that is consistent with as many mate-pairs as possible. This paper describes an efficient heuristic called the greedy-path merging algorithm for solving this problem. The method was originally developed as a key component of the compartmentalized assembly strategy developed at Celera Genomics. This interim approach was used at an early stage of the sequencing of the human genome to produce a preliminary assembly based on preliminary whole genome shotgun data produced at Celera and preliminary human contigs produced by the Human Genome Project. Categories and Subject Descriptors: F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems—computations on discrete structures; J.3 [Life and Medical
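To make the objective in the abstract above concrete, here is a minimal Python sketch that counts how many mate-pairs a candidate ordering and orientation of contigs satisfies. It is not the greedy-path merging algorithm, which the abstract does not detail. The model is a simplifying assumption: a mate-pair counts as consistent when its two reads face each other in the scaffold, and distance constraints are ignored. The names `MateLink` and `count_consistent` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class MateLink(NamedTuple):
    """One mate-pair joining two contigs (simplified, hypothetical model)."""
    contig_a: str
    a_forward: bool  # read A lies on the forward strand of contig_a
    contig_b: str
    b_forward: bool  # read B lies on the forward strand of contig_b


def count_consistent(scaffold: List[Tuple[str, bool]], links: List[MateLink]) -> int:
    """Count links whose reads face each other in the given scaffold.

    `scaffold` lists (contig name, flipped?) pairs in left-to-right order.
    Distances between mates are ignored in this sketch.
    """
    placement = {name: (idx, flipped) for idx, (name, flipped) in enumerate(scaffold)}
    satisfied = 0
    for link in links:
        if link.contig_a not in placement or link.contig_b not in placement:
            continue
        idx_a, flip_a = placement[link.contig_a]
        idx_b, flip_b = placement[link.contig_b]
        if idx_a == idx_b:
            continue  # both reads in one contig: nothing to scaffold
        # Flipping a contig reverses the direction of every read inside it.
        dir_a = link.a_forward != flip_a
        dir_b = link.b_forward != flip_b
        left_dir, right_dir = (dir_a, dir_b) if idx_a < idx_b else (dir_b, dir_a)
        # Consistent when the left read points right and the right read points left.
        if left_dir and not right_dir:
            satisfied += 1
    return satisfied


links = [MateLink("c1", True, "c2", False)]
print(count_consistent([("c1", False), ("c2", False)], links))  # consistent placement
print(count_consistent([("c1", False), ("c2", True)], links))   # c2 flipped the wrong way
```

The two calls print `1` and `0`. A scaffolding heuristic would search over orderings and orientations to make this count large; reversing the whole scaffold (both the order and every flip) leaves the count unchanged, as expected for double-stranded DNA.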
COVER FEATURE
Algorithms that can assemble millions of small DNA fragments into gene sequences underlie the current revolution in biotechnology, helping researchers build the growing database of complete genomes. Each cell of a living organism contains chromosomes composed of a sequence of DNA base pairs. This sequence, the genome, represents a set of instructions that controls the replication and function of each organism. The automated DNA sequencer gave birth to genomics, the analytic and comparative study of genomes, by allowing scientists to decode entire genomes. Although genomes vary in size from millions of nucleotides in bacteria to billions of nucleotides in humans and most animals and plants, the chemical reactions researchers use to decode the DNA base pairs are accurate for only about 600 to 700 nucleotides at a time. The process of sequencing begins by physically breaking the DNA into millions of random fragments, which are then “read” by a DNA sequencing machine. Next, a computer program called an assembler pieces together the many overlapping reads and reconstructs the original sequence. This general technique, called shotgun sequencing, was introduced by Fred Sanger in 1982. The technique took a quantum leap forward in 1995, when a team led by Craig Venter and Robert Fleischmann of The
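The step in which an assembler "pieces together the many overlapping reads" can be illustrated with a toy greedy merge. The sketch below is a textbook-style heuristic written for illustration, not the assembler the article describes. It assumes error-free reads from a single strand and no repeats, and the names `overlap`, `greedy_assemble` and `min_len` are hypothetical.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the largest overlap.

    Toy assumptions: reads are error-free, come from one strand, and the
    target has no repeats longer than `min_len`.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


print(greedy_assemble(["ATGGCGT", "GCGTGCA", "GTGCAAT"]))
```

For these three toy reads the result is the single contig `['ATGGCGTGCAAT']`. Real assemblers must also cope with sequencing errors, both strands and repeated regions, none of which this sketch handles.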
Pages 1–8 Design of a Compartmentalized Shotgun
Two different strategies for determining the human genome are currently being pursued: one is the “clone-by-clone” approach, employed by the publicly funded project, and the other is the “whole genome shotgun assembler” approach, favored by researchers at Celera Genomics. An interim strategy employed at Celera, called compartmentalized shotgun assembly, makes use of preliminary data produced by both approaches. In this paper we describe the design, implementation and operation of the “compartmentalized shotgun assembler”.