Results 1-10 of 335
IntAct–open source resource for molecular interaction data
- Nucleic Acids Res, 2007
"... interaction data ..."
The Molecular Biology Database Collection: 2005 update
- Nucleic Acids Res, 2005
Cited by 163 (1 self)
The NAR Molecular Biology Database Collection is a public online resource that contains links to all databases described in this issue of Nucleic Acids Research. In addition, this collection lists databases that have been featured in previous issues of NAR, as well as selected other databases that are freely available to the public and may be useful to the molecular biologist. The 2006 update includes 858 databases, 139 more than the previous one. The databases come with brief summaries, many of which have been updated recently. Each database is assigned a stable accession number that does not change if the database moves to a new location and its URL, authors' names or the contact person address are updated. The complete database list and summaries are available online at the Nucleic Acids Research website
The Sorcerer II Global Ocean Sampling expedition: Expanding the universe of protein families
- PLoS Biol 5: e16, 2007
Cited by 151 (6 self)
Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in
An annotation management system for relational databases
- In VLDB, 2004
Cited by 129 (8 self)
We present an annotation management system for relational databases. In this system, every piece of data in a relation is assumed to have zero or more annotations associated with it, and annotations are propagated from the source to the output as data is transformed through a query. Such an annotation management system is important for understanding the provenance and quality of data, especially in applications that deal with integration of scientific and biological data. We present an extension, pSQL, of a fragment of SQL that has three different types of annotation propagation schemes, each useful for different purposes. The default scheme propagates annotations according to where data is copied from. The default-all scheme propagates annotations according to where data is copied from among all equivalent formulations of a given query. The custom scheme allows a user to specify how annotations should propagate. We present a storage scheme for the annotations and describe algorithms for translating a pSQL query under each propagation scheme into one or more SQL queries that would correctly retrieve the relevant annotations according to the specified propagation scheme. For the default-all scheme, we also show how we generate finitely many queries that can simulate the annotation propagation behavior of the set of all equivalent queries, which is possibly infinite. The algorithms are implemented and the feasibility of the system is demonstrated by a set of experiments that we have conducted.
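To make the difference between the default and default-all propagation schemes concrete, here is a minimal Python sketch over an in-memory equijoin. It is not pSQL, nor the authors' storage scheme or query-translation algorithm; the names `Cell` and `join_project` are hypothetical, and default-all is simplified to the single case of an equijoin key, whose output value could equally have been copied from either relation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A data value together with the annotations attached to it (hypothetical model)."""
    value: object
    annotations: frozenset = frozenset()


def join_project(left, right, key, out_cols, scheme="default"):
    """Equijoin two relations on `key`, then project `out_cols`.

    Each relation is a list of rows; a row maps column names to Cells.
    out_cols is a list of (side, column) pairs with side "L" or "R".

    scheme="default": each output cell keeps only the annotations of the
        single source cell it was copied from.
    scheme="default-all" (simplified): the join column also collects the
        annotations of the equal key cell on the other side, because an
        equivalent query could have copied the value from either relation.
    """
    if scheme not in ("default", "default-all"):
        raise ValueError(f"unsupported scheme: {scheme}")
    result = []
    for l_row in left:
        for r_row in right:
            if l_row[key].value != r_row[key].value:
                continue  # join condition fails, no output row
            sources = {"L": l_row, "R": r_row}
            out = {}
            for side, col in out_cols:
                cell = sources[side][col]
                if scheme == "default-all" and col == key:
                    merged = l_row[key].annotations | r_row[key].annotations
                    cell = Cell(cell.value, merged)
                out[f"{side}.{col}"] = cell
            result.append(out)
    return result


# Usage: the same protein ID is annotated differently in two source tables.
genes = [{"id": Cell("P1", frozenset({"curated by A"})), "name": Cell("lacZ")}]
scores = [{"id": Cell("P1", frozenset({"imported from B"})), "score": Cell(0.9)}]
cols = [("L", "id"), ("L", "name"), ("R", "score")]
print(join_project(genes, scores, "id", cols)[0]["L.id"].annotations)
print(join_project(genes, scores, "id", cols, scheme="default-all")[0]["L.id"].annotations)
```

Under the default scheme the output `id` carries only the annotation of the `genes` cell it was selected from; under the simplified default-all scheme it also picks up the annotation attached to the matching `scores` cell.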
BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark
- Proteins, 2005
Cited by 129 (2 self)
Multiple sequence alignment is one of the cornerstones of modern molecular biology. It is used to identify conserved motifs, to determine protein domains, in 2D/3D structure prediction by homology and in evolutionary studies. Recently, high-throughput technologies such as genome sequencing and structural proteomics have led to an explosion in the amount of sequence and structure information available. In response, several new multiple alignment methods have been developed that improve both the efficiency and the quality of protein alignments. Consequently, the benchmarks used to evaluate and compare these methods must also evolve. We present here the latest release of the most widely used multiple alignment benchmark, BAliBASE, which provides high quality, manually refined, reference alignments based on 3D structural superpositions. Version 3.0 of BAliBASE includes new, more challenging test cases, representing the real problems encountered when aligning large sets of complex sequences. Using a novel, semiautomatic update protocol, the number of protein families in the benchmark has been increased and representative test cases are now available that cover most of the protein fold space. The total number of proteins in BAliBASE has also been significantly increased from 1444 to 6255 sequences. In addition, full-length sequences are now provided for all test cases, which represent difficult cases for both global and local alignment programs. Finally, the BAliBASE Web site
MEROPS: the peptidase database
- Nucleic Acids Res, 2004
Cited by 128 (4 self)
Peptidases, their substrates and inhibitors are of great relevance to biology, medicine and biotechnology. The MEROPS database
Recent improvements to the PROSITE database
- Nucleic Acids Research, 2004
Cited by 122 (9 self)
The PROSITE database consists of a large collection of biologically meaningful signatures that are described as patterns or profiles. Each signature is linked to documentation that provides useful biological information on the protein family, domain or functional site identified by the signature. The PROSITE web page has been redesigned and several tools have been implemented to help the user discover new conserved regions in their own proteins and to visualize domain arrangements. We also introduced the facility to search PDB with a PROSITE entry or a user's pattern and visualize matched positions on 3D structures. The latest
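PROSITE patterns use a compact syntax: elements are separated by `-`, `x` stands for any residue, `[..]` lists allowed residues, `{..}` lists forbidden ones, `(n)` or `(n,m)` gives a repeat count, and `<` or `>` anchors the pattern at the N- or C-terminus. The sketch below translates such a pattern into a Python regular expression so a sequence can be scanned with `re.search`. It is not a PROSITE tool; the function name `prosite_to_regex` is hypothetical, and less common cases such as `>` inside brackets (e.g. `[G>]`) are deliberately not handled.

```python
import re

# One pattern element: a residue spec followed by an optional (n) or (n,m) repeat.
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}.' into a regex string.

    Sketch only: does not handle '>' inside brackets such as '[G>]'.
    """
    body = pattern.strip().rstrip(".")  # a trailing period ends the pattern
    parts = []
    for element in body.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"cannot parse element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":               # any amino acid
            rx = "."
        elif residue.startswith("["):    # any of the listed residues
            rx = "[" + residue[1:-1] + "]"
        elif residue.startswith("{"):    # none of the listed residues
            rx = "[^" + residue[1:-1] + "]"
        else:                            # a single residue
            rx = residue
        if low is not None:              # repeat count (n) or range (n,m)
            rx += f"{{{low}}}" if high is None else f"{{{low},{high}}}"
        parts.append(prefix + rx + suffix)
    return "".join(parts)


# Usage: scan a toy sequence with the N-glycosylation-site pattern N-{P}-[ST]-{P}.
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
hit = re.search(regex, "MKANGSAVL")
print(regex, hit.start() if hit else None)
```

Mapping each element to a regex fragment keeps the translation one-to-one with the pattern syntax; profiles, the other PROSITE signature type, are weighted scoring matrices and cannot be expressed this way.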
PIRSF: family classification system at the Protein Information Resource
- Nucleic Acids Res, 2004
Cited by 82 (13 self)
proteins and domains. The primary PIRSF classification unit is the homeomorphic family, whose members are both homologous (evolved from a common ancestor) and homeomorphic (sharing full-length sequence similarity and a common domain architecture). PIRSF families are curated systematically based on literature review and integrative sequence and functional analysis, including sequence and structure similarity, domain architecture, functional association, genome context, and phyletic pattern. The results of classification and expert annotation are summarized in PIRSF family reports with graphical viewers for taxonomic distribution, domain architecture, family hierarchy, and multiple alignment and phylogenetic tree. The PIRSF system provides a comprehensive resource for bioinformatics analysis and comparative studies of protein function and evolution. Domain or fold-based searches allow identification of evolutionarily related protein families sharing domains or structural folds. Functional convergence and functional divergence are revealed by the relationships between protein classification and curated family functions. The taxonomic distribution allows the identification of lineage-specific or broadly conserved protein families and can reveal horizontal gene transfer. Here we demonstrate, with illustrative examples, how to use the web-based PIRSF system as a tool for functional and evolutionary studies of protein families.
ProFunc: a server for predicting protein function from 3D structure
- Nucleic Acids Res, 2005
"... ProFunc ..."
BASys: a web server for automated bacterial genome annotation
- Nucleic Acids Res, 2005
Cited by 67 (5 self)
BASys (Bacterial Annotation System) is a web server that supports automated, in-depth annotation of bacterial genomic (chromosomal and plasmid) sequences. It accepts raw DNA sequence data and an optional list of gene identification information and provides extensive textual annotation and hyperlinked image output. BASys uses more than 30 programs to determine 60 annotation subfields for each gene, including gene/protein name, GO function, COG function, possible paralogues and orthologues, molecular weight, isoelectric point, operon structure, subcellular localization, signal peptides, transmembrane regions, secondary structure, 3D structure, reactions and pathways. The depth and detail of a BASys annotation matches or exceeds that found in a standard SwissProt entry. BASys also generates colorful, clickable and fully zoomable maps of each query chromosome to permit rapid navigation and detailed visual analysis of all resulting gene annotations. The textual annotations and images that are provided by BASys can be generated in 24 h for an average bacterial chromosome (5 Mb). BASys annotations may be viewed and downloaded anonymously or through a password-protected access system. The BASys server and databases can also be downloaded and run locally. BASys is accessible at