Results 1 - 10 of 961
The diploid genome sequence of an individual human
- PLoS Biol
Cited by 293 (6 self)
Presented here is a genome sequence of an individual human. It was produced from ~32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels) (1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor, however they involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of
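The variant counts quoted in this abstract can be cross-checked with a little arithmetic. The sketch below is only a sanity check on the enumerated categories; it assumes that the "22% of all events" figure is computed over the five counted classes and excludes the segmental duplications and copy number variation regions, for which the abstract gives no counts.

```python
# Variant counts as quoted in the abstract above.
counts = {
    "SNPs": 3_213_401,
    "block substitutions": 53_823,
    "heterozygous indels": 292_102,
    "homozygous indels": 559_473,
    "inversions": 90,
}

total_events = sum(counts.values())
non_snp_events = total_events - counts["SNPs"]
non_snp_share = non_snp_events / total_events

print(f"total events:   {total_events:,}")
print(f"non-SNP events: {non_snp_events:,}")
print(f"non-SNP share:  {non_snp_share:.1%}")
```

Under that assumption the total comes to just over 4.1 million events and the non-SNP share rounds to 22%, which agrees with the figures stated in the abstract.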
mreps: efficient and flexible detection of tandem repeats in dna
- Nucleic Acids Res
, 2003
Cited by 95 (3 self)
The presence of repeated sequences is a fundamental feature of genomes. Tandemly repeated DNA appears in both eukaryotic and prokaryotic genomes; it is associated with various regulatory mechanisms and plays an important role in genomic fingerprinting. In this paper, we describe mreps, a powerful software tool for fast identification of tandemly repeated structures in DNA sequences. mreps is able to identify all types of tandem repeats within a single run on a whole genomic sequence. It has a resolution parameter that allows the program to identify ‘fuzzy’ repeats. We introduce the main algorithmic solutions behind mreps, describe its usage, give some execution time benchmarks and present several case studies to illustrate its capabilities. The mreps web interface is accessible through
YASS: enhancing the sensitivity of DNA similarity search
- Nucleic Acids Res
, 2005
Cited by 93 (17 self)
YASS is a DNA local alignment tool based on an efficient and sensitive filtering algorithm. It applies transition-constrained seeds to specify the most probable conserved motifs between homologous sequences, combined with a flexible hit criterion used to identify groups of seeds that are likely to exhibit significant alignments. A web interface (http://www.loria.fr/projects/YASS/) is available to upload input sequences in fasta format, query the program and visualize the results obtained in several forms (dot-plot, tabular output and others). A standalone version is available for download from the web page.
An Algorithm for Approximate Tandem Repeats
- In Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 684 of Lecture Notes in Computer Science
, 1993
Cited by 88 (3 self)
A perfect single tandem repeat is defined as a nonempty string that can be divided into two identical substrings, e.g. abcabc. An approximate single tandem repeat is one in which the substrings are similar, but not identical, e.g. abcdaacd.
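The two definitions above can be illustrated with a short sketch. It is not the algorithm from the paper: it is a brute-force check that assumes "similar" means small edit distance between the two parts, and the function names are hypothetical, chosen only for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def is_perfect_tandem_repeat(s: str) -> bool:
    """True if s is nonempty and splits into two identical halves."""
    half, rem = divmod(len(s), 2)
    return len(s) > 0 and rem == 0 and s[:half] == s[half:]


def tandem_repeat_distance(s: str) -> int:
    """Smallest edit distance between the two parts over all split points.

    Requires len(s) >= 2. A result of 0 means a perfect tandem repeat.
    """
    return min(edit_distance(s[:i], s[i:]) for i in range(1, len(s)))


print(is_perfect_tandem_repeat("abcabc"))   # the perfect example from the text
print(tandem_repeat_distance("abcdaacd"))   # the approximate example from the text
```

For the examples in the abstract, abcabc splits into abc and abc with distance 0, while abcdaacd splits into abcd and aacd, which differ by a single substitution. Trying every split point costs far more than a dedicated algorithm would, so this is suitable only for short strings.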
CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats
- Nucleic Acids Res
, 2007
Genome annotation assessment in Drosophila melanogaster
- Genome Res 10
, 2000
Cited by 60 (6 self)
Computational methods for automated genome annotation are critical to our community’s ability to make full use of the large volume of genomic sequence being generated and released. To explore the accuracy of these automated feature prediction tools in the genomes of higher organisms, we evaluated their performance on a large, well-characterized sequence contig from the Adh region of Drosophila melanogaster. This experiment, known as the Genome Annotation Assessment Project (GASP), was launched in May 1999. Twelve groups, applying state-of-the-art tools, contributed predictions for features including gene structure, protein homologies, promoter sites, and repeat elements. We evaluated these predictions using two standards, one based on previously unreleased high-quality full-length cDNA sequences and a second based on the set of annotations generated as part of an in-depth study of the region by a group of Drosophila experts. Although these standard sets only approximate the unknown distribution of features in this region, we believe that when taken in context the results of an evaluation based on them are meaningful. The results were presented as a tutorial at the conference on Intelligent Systems in Molecular Biology (ISMB-99) in August 1999. Over 95% of the coding nucleotides in the region were correctly identified by the majority of the gene finders, and the correct intron/exon structures were predicted for >40% of the genes. Homology-based annotation techniques recognized and associated functions with almost half of the genes in the region; the remainder were only
Improved hit criteria for DNA local alignment
, 2004
Cited by 55 (12 self)
The hit criterion is a key component of heuristic local alignment algorithms. It specifies a class of patterns assumed to witness a potential similarity, and this choice is decisive for the selectivity and sensitivity of the whole method. In this paper, we propose two ways to improve the hit criterion. First, we define the group criterion combining the advantages of the single-seed and double-seed approaches used in existing algorithms. Second, we introduce transition-constrained seeds that extend spaced seeds by the possibility of distinguishing transition and transversion mismatches. We provide analytical data as well as experimental results, obtained with our YASS software, supporting both improvements.
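The idea of a transition-constrained seed can be sketched in a few lines. This is an illustration, not the YASS implementation: the three-symbol seed alphabet used here ('#' for a required match, '@' for a match or a transition, '-' for a don't-care position) and the function name are assumptions made for the example.

```python
# Transitions are purine<->purine (A<->G) and pyrimidine<->pyrimidine (C<->T)
# substitutions; all other mismatches are transversions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def seed_hits(seed: str, x: str, y: str) -> bool:
    """Return True if the equal-length windows x and y satisfy the seed.

    Assumed seed alphabet (illustrative):
      '#'  the two bases must match exactly
      '@'  the two bases must match or differ by a transition
      '-'  any pair of bases is accepted
    """
    if not (len(seed) == len(x) == len(y)):
        raise ValueError("seed and both windows must have the same length")
    for sym, a, b in zip(seed, x, y):
        if sym == "#":
            if a != b:
                return False
        elif sym == "@":
            if a != b and (a, b) not in TRANSITIONS:
                return False
        elif sym != "-":
            raise ValueError(f"unknown seed symbol: {sym!r}")
    return True


print(seed_hits("#@-#", "ACGT", "ATTT"))  # C/T at '@' is a transition
print(seed_hits("#@-#", "ACGT", "AAGT"))  # C/A at '@' is a transversion
print(seed_hits("##-#", "ACGT", "ATGT"))  # same C/T pair under a strict '#'
```

The third call shows what the extension buys: an ordinary spaced seed with '#' at the second position rejects the C/T pair, whereas the '@' position tolerates it while still rejecting transversions.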
microRNA target predictions across seven Drosophila species and comparison to mammalian targets
- PLoS Comput. Biol
, 2005
Cited by 54 (3 self)
microRNAs are small noncoding genes that regulate the protein production of genes by binding to partially complementary sites in the mRNAs of targeted genes. Here, using our algorithm PicTar, we exploit cross-species comparisons to predict, on average, 54 targeted genes per microRNA above noise in Drosophila melanogaster. Analysis of the functional annotation of target genes furthermore suggests specific biological functions for many microRNAs. We also predict combinatorial targets for clustered microRNAs and find that some clustered microRNAs are likely to coordinately regulate target genes. Furthermore, we compare microRNA regulation between insects and vertebrates. We find that the widespread extent of gene regulation by microRNAs is comparable between flies and mammals but that certain microRNAs may function in clade-specific modes of gene regulation. One of these microRNAs (miR-210) is predicted to contribute to the regulation of fly oogenesis. We also list specific regulatory relationships that appear to be conserved between flies and mammals. Our findings provide the most extensive microRNA target predictions in Drosophila to date, suggest specific functional roles for most microRNAs, indicate the existence of coordinate gene regulation executed by clustered microRNAs, and shed light on the evolution of microRNA function across large evolutionary distances. All predictions are freely accessible at our searchable Web site
PILER: identification and classification of genomic repeats
- Bioinformatics
, 2005
Cited by 46 (0 self)
Repeated elements such as satellites and transposons are ubiquitous in eukaryotic genomes. De novo computational identification and classification of such elements is a challenging problem, so repeat annotation of sequenced genomes has historically largely relied on sequence similarity to hand-curated libraries of known repeat families. We present a new approach to de novo repeat annotation that exploits characteristic patterns of local alignments induced by certain classes of repeats. We describe PILER, a package of efficient search algorithms for identifying such patterns. Novel repeats found using PILER are reported for H. sapiens, A. thaliana and D. melanogaster. The software is freely available at