Kececioglu, J. and E. Myers (1995). Combinatorial algorithms for DNA sequence assembly. Algorithmica 13, 7--51.
....requires the shredding of multiple copies of the original strand yielding a system of DNA fragments. The fragments are read individually using electrophoresis. The sequenced fragments, or reads, are stored in silico, and the original sequence is then deduced from fragment overlap information ([2, 6, 7, 8, 9, 14, 18, 23]). The main sources of difficulty for shotgun sequencing are gaps in the DNA coverage by the fragments, erroneous readings of DNA fragments, and the presence of repeats, i.e. long identical disjoint segments of a DNA string. An assembler implementing the traditional overlap layout consensus paradigm ....
....until a sufficient number of errors have been cleaned and a sufficient number of fragments have been labeled. Labeling. This process deals with the problem of read orientation and is an improvement over the original VEDA algorithm. A scheme for read (fragment) orientation was first proposed in [9]. We present a new scheme which directly aids in the removal of errors from fragments. In a typical input, synthetic or real, approximately half of the fragments come from one strand of the DNA and the rest from the opposite one. Given a fragment f, the corresponding interval f # of the DNA from ....
J. Kececioglu and E. Myers, Combinatorial Algorithms for DNA sequence assembly, Algorithmica 13:7-51, 1995.
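The excerpt above notes that roughly half of the fragments come from each strand, so an assembler has to consider every read in both orientations. The following is a minimal sketch of that idea only, not the labeling scheme of the citing paper; the function names `reverse_complement` and `candidate_orientations` are hypothetical, and reads are assumed to contain only the symbols A, C, G, T.

```python
# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(read: str) -> str:
    """Return the read as it would appear when read from the opposite strand."""
    # Complement every base, then reverse the result.
    return read.upper().translate(_COMPLEMENT)[::-1]


def candidate_orientations(read: str) -> tuple[str, str]:
    """Return both orientations an assembler would have to consider for a read."""
    return read.upper(), reverse_complement(read)
```

For example, `candidate_orientations("AACGT")` yields the read itself together with its reverse complement; applying `reverse_complement` twice returns the original read.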
....to which symbol from stream 2. But due to the loss of a random number of symbols, that information is lost, and all we are left with is a sequence of pieces of the original message that need to be assembled back together. This problem is conceptually identical to the problem of DNA sequencing [7], under the assumption of no errors. In that problem, given a large number of short overlapping substrings, the goal is to reconstruct a long string defined over an alphabet of 4 symbols. In our problem, the short substrings are the outputs of the channel while the switch is closed, and the ....
J. Kececioglu and E. Myers. Combinatorial Algorithms for DNA Sequence Assembly. Algorithmica, 13(1/2):7-51, 1995.
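The error-free reconstruction problem described in the excerpt above can be illustrated with the classic greedy merging heuristic for common superstrings. This is a sketch under stated assumptions, not the method of the cited or citing paper: the pieces are assumed to be error-free and in the same orientation, the function names are hypothetical, and the greedy strategy is a heuristic that does not guarantee a shortest result.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(pieces: list[str]) -> str:
    """Repeatedly merge the pair of pieces with the largest overlap."""
    unique = set(pieces)
    # Drop pieces that are wholly contained in another piece.
    items = sorted(p for p in unique
                   if not any(p != q and p in q for q in unique))
    while len(items) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(items):
            for j, b in enumerate(items):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the best pair, writing the shared region only once.
        merged = items[best_i] + items[best_j][best_k:]
        items = [s for idx, s in enumerate(items)
                 if idx not in (best_i, best_j)] + [merged]
    return items[0] if items else ""
```

With the pieces `ACGTAC`, `TACGGA`, and `GGATTT`, the heuristic first joins the two pieces sharing `TAC`, then appends the third through the shared `GGA`. Repeats in the underlying string can mislead this strategy, which is one reason the excerpts on this page treat repeats as a central difficulty.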
....in the trace, is termed base calling and the sequence obtained is called a read. Once the fragments have been sequenced, they are assembled. It is this process we are interested in. At the conceptual level, the problem of assembling DNA sequence fragments naturally divides into three phases [27, 32]. In the overlap phase, each fragment is compared against every other fragment to see if they share a common subsequence, implying that they were potentially sampled from overlapping stretches of the original strand [13, 14, 15]. At this stage, clipping may occur to remove parts of the sequence ....
J. Kececioglu and E. W. Myers. Combinatorial algorithms for DNA sequence assembly. Algorithmica, 13:7-51, 1995.
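The overlap phase described in the excerpt above compares each fragment against every other fragment. As a rough illustration, the sketch below builds an overlap graph from exact suffix-prefix matches. It is a simplification: real assemblers use approximate alignment to tolerate read errors and also check reverse-complement orientations. The function names and the `min_len` threshold are assumptions made for this example.

```python
from itertools import permutations


def suffix_prefix_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Longest exact suffix of a matching a prefix of b, or 0 if below min_len."""
    shortest = max(min_len, 1)  # never test a zero-length overlap
    for k in range(min(len(a), len(b)), shortest - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def overlap_graph(fragments: list[str], min_len: int = 3) -> dict[tuple[int, int], int]:
    """Map each ordered pair (i, j) with a sufficient overlap to its length."""
    edges = {}
    # All ordered pairs of distinct fragments: quadratic in the fragment count.
    for i, j in permutations(range(len(fragments)), 2):
        k = suffix_prefix_overlap(fragments[i], fragments[j], min_len)
        if k:
            edges[(i, j)] = k
    return edges
```

The all-pairs loop makes the quadratic cost of this phase explicit, which is consistent with the remark in a later excerpt that the overlap phase is usually much slower than the layout phase.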
.... it has been recognized by researchers in several public forums that analysis of distinguishing base sites is necessary to correctly separate repeats, we are not aware of any rigorous and fully automatic method for carrying out such an analysis in the literature on sequence assembly algorithms [10, 8, 13, 19, 16, 5, 7, 15, 3, 4, 20, 2]. We present a four phase approach to separating repeats which works with any assembler that follows the standard three phase decomposition [17] into (1) constructing a graph whose edges represent how fragments overlap, (2) selecting edges to form a fragment layout, and (3) multiple alignment ....
....problem to yield an assembler that is correct in the presence of repeats. A potential disadvantage is that several iterations of the layout phase may be necessary to obtain a layout that does not compress repeats. The layout phase, however, is the least computationally intensive phase in practice [10], usually at least an order of magnitude faster than the overlap phase. Furthermore, our procedure for repeat separation is carefully designed to be fast (as demonstrated in Section 6) so repeated iteration should not significantly increase the total time for assembly. Overview Our approach to ....
Kececioglu, J.D. and E.W. Myers. "Combinatorial algorithms for DNA sequence assembly." Algorithmica 13:1/2, 7-51, 1995.
....be covered by a collection of paths. The other nodes of the graph are called terminals. Thus a smallest set of node disjoint directed paths covering the terminals of G gives a solution for the given instance of the MkCP. A similar formulation to the problem has been proposed by Kececioglu [K91] [KM95], the main difference lying in the treatment of the Steiner nodes. In the practical applications we are interested in, many different problems occur when the resulting substrings are considered. We know that the DNA molecule can be seen as two parallel strings over {A, C, G, T}, where the second ....
J.D. Kececioglu and E.W. Myers, "Combinatorial Algorithms for DNA Sequence Assembly", Algorithmica 13, 7-51 (1995).
.... we do. In that case, we have the shortest common superstring problem which is defined as follows: Given a finite set X and a collection $\mathcal{I}$ of subsets of X, what is the shortest string in which the members of each subset $I \in \mathcal{I}$ appear as a consecutive subsequence? This problem is known to be NP-hard [16]. The frontier of a tree T is the permutation of X obtained by reading the labels of the leaves from left to right. Two PQ trees T and T′ are equivalent, denoted T j T′, if one can be obtained from the other by applying a sequence of the following transformation rules: 1. Arbitrarily ....
John D. Kececioglu, Eugene W. Myers, Combinatorial Algorithms for DNA sequence assembly, TR 92-37, University of Arizona, 1993.
.... be the ordering of: f1, f6, f5, f10, f4, f9, f2, f7, f3, f8. The hypothesis followed in the assembly process is that fragments with a high degree of similarity of their sequences (a high overlap strength) likely come from the same area of the DNA (see [Kececioglu and Myers, 1995]). Table 1: Table of Relative Similarity Scores for Example Fragments (rows and columns f1 to f10; empty cells were lost in extraction, so column positions are not recoverable; surviving entries per row: f1: M M H M M; f2: M M M M H; f3: M M M; f4: M M M M H L; f5: M M M; f6: H M M L M; f7: M M M; f8: M M M; f9: M H M H L M M; f10: M L M M). H ....
J. D. Kececioglu and E.W. Myers. Combinatorial algorithms for DNA sequence assembly. Algorithmica, 13(1/2):7--51.
....of at most on the order of 1000 nucleotides (i.e. 1kbps) at a time [Waterman95]. The shotgun sequence assembly problem is to reconstruct a long DNA segment from short fragment data by identifying overlaps between fragments. Much existing research on this problem can be found in the literature [Bonfield95, Foulser90, Gingeras79, Kececioglu95, Parsons95, Peltola84, Idury95, Huang96, Huang92, Staden80, Green]. Most traditional approaches are based on finding pairwise alignments between fragments, and have demonstrated only limited success when assembling long sequences. Consequently, newer approaches not solely based on simple pairwise alignments have also been recently developed [Bonfield95, Idury95, ....
....which comprise the contig. Thus to determine the consensus sequence for the contig, we must reconcile all the individual fragment sequences. One view of this problem, which is consistent with most previous work in the area, is to see it as an instance of the multiple sequence alignment problem [Kececioglu95, Sutton95, Huang92, Huang96]. In contrast, our view of this problem is as a natural extension of the overlap map generation idea, using exact matches of additional probes selected at random from gaps between existing probes in an iterative fashion. This exploits both the low error rate in the input data as well as the high ....
J. D. Kececioglu and E. W. Myers, 1995, "Combinatorial Algorithms for DNA Sequence Assembly", Algorithmica, 13
....substances) broken into several small pieces that we are able to handle. Then, the problem is how to glue the small pieces in the correct way to reconstruct the original sequence (see [M92] and [MS96]). Many different approaches have been used to solve the DNA Fragment Assembly Problem (see [KM95] for a good survey on the subject). A usual strategy is to apply algorithms designed for the shortest common superstring problem. Kececioglu proposes in his Ph.D. thesis [K91] a natural graph theoretical model to this problem. In this paper we suggest a very similar formulation and the use of ....
....terminals of G gives a solution for the given instance of the MkCP. Example 2.1: For the instance given in Example 1.1 the corresponding graph is the following. Nodes 5, 7 and 9 are Steiner nodes. [Figure: graph on nodes 1 to 10; edges not recoverable] A similar formulation for the problem has been proposed by Kececioglu [K91] [KM95], the main difference lying in the treatment of the Steiner nodes. Kececioglu's model includes special edges when a string is contained in another one. In this case the correspondence between paths and k-contigs (or its skeletons) is not given. His model does not include either the idea of Steiner ....
J.D. Kececioglu and E.W. Myers, "Combinatorial Algorithms for DNA Sequence Assembly", Algorithmica 13, 7-51 (1995).
....realizes objects of type overlap graph, constraint set, and assembly that may be created, destroyed, and manipulated only via routines of the kernel. An object persists until it is explicitly destroyed. The kernel developed actually represents the Arizona group's second such construction effort [Kec91, MiM91, KeM93]. This second effort started from scratch with a complete redesign of the underlying algorithms and interface. 10. FAKtory Additions (Internal Only) To facilitate the prescreening of fragments with respect to vector and other sequences, the overlap routine fa overlap seqs has been added: int ....
Kececioglu, J.D. and E.W. Myers. "Combinatorial algorithms for DNA sequence assembly". Accepted for publication in Algorithmica (1993).
....polynomial time results in combinatorial optimization is Edmonds' algorithm for maximum matching [6]. This paper gives an experimental study of a new implementation of Edmonds' algorithm for large sparse graphs. Our own motivation comes from the problem of large scale DNA sequence assembly [13] in computational biology, which can be approximated in the presence of error using maximum weight matchings. [Footnotes: Extended abstract submitted to the 2nd Workshop on Algorithm Engineering (WAE 98). Corresponding author: Department of Computer Science, The University of Georgia, Athens, GA ....]
Kececioglu, J.D. and E.W. Myers. "Combinatorial algorithms for DNA sequence assembly." Algorithmica 13:1/2, 7--51, 1995.
No context found.
Kececioglu, J. and E. Myers (1995). Combinatorial algorithms for DNA sequence assembly. Algorithmica 13, 7--51.
No context found.
J.D. Kececioglu and E.W. Myers, "Combinatorial algorithms for DNA sequence assembly," Algorithmica, vol. 13, pp. 7--51, 1995.
No context found.
Kececioglu, JD and EW Myers 1995. Combinatorial algorithms for DNA sequence assembly. Algorithmica 13: 7-51.
No context found.
John Kececioglu and Eugene Myers. Combinatorial algorithms for DNA sequence assembly. Algorithmica, 13:7--51, 1995.
No context found.
J.D. Kececioglu and E.W. Myers. Combinatorial algorithms for DNA sequence assembly. Algorithmica, 13(1/2):7--51, 1995.