The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003
- Nucleic Acids Res
, 2003
"... The SWISS-PROT protein knowledgebase ..."
Relating Whole-Genome Expression Data with Protein-Protein Interactions
, 2002
Cited by 101 (14 self)
this paper is the interactions occurring within specific complexes. These were obtained from the MIPS complexes catalog (Fellenberg et al. 2000), which represents a carefully annotated, comprehensive data set of protein complexes culled from the scientific literature. In addition, we looked at other types of protein-protein interactions from large "aggregated" data sets collecting many heterogeneous pair-wise interactions. We collected these from the MIPS catalogs of physical and genetic interactions (Fellenberg et al. 2000), databases of interacting proteins (DIP and BIND) (Bader and Hogue 2000; Xenarios 2000), and a comprehensive collection of yeast two-hybrid experiments (Cagney et al. 2000; Ito et al. 2000; Schwikowski et al. 2000; Uetz et al. 2000; Uetz and Hughes 2000; Ito et al. 2001). These interactions are subdivided into groups based on their method of discovery. They include physical interactions (e.g., collected through coimmunoprecipitation and copurification), genetic interactions (e.g., determined through genetic means such as synthetic lethality or suppression experiments), and yeast two-hybrid pairs
MIPS: analysis and annotation of proteins from whole genomes
- Nucleic Acids Res
, 2004
Cited by 97 (7 self)
resources related to genome information. Manually curated databases for several reference organisms are maintained. Several of these databases are described elsewhere in this and other recent NAR database issues. In a complementary effort, a comprehensive set of >400 genomes automatically annotated with the PEDANT system are maintained. The main goal of our current work on creating and maintaining genome databases is to extend gene-centered information to information on interactions within a generic comprehensive framework. We have concentrated our efforts along three lines: (i) the development of suitable comprehensive data structures and database technology, communication and query tools to include a wide range of different types of information enabling the representation of complex information such as functional modules or networks (Genome Research Environment System), (ii) the development of databases covering computable information such as the basic evolutionary relations among all genes, namely SIMAP, the sequence similarity matrix and the CABiNet network analysis framework, and (iii) the compilation and manual annotation of information related to interactions such as protein–protein interactions or other types of relations (e.g. MPCDB, MPPI, CYGD). All databases described and the detailed descriptions of our projects can be accessed through the MIPS WWW server
Genome-wide discovery of transcriptional modules from DNA sequence and gene expression
- Bioinformatics
, 2003
Cited by 70 (4 self)
In this paper, we describe an approach for understanding
transcriptional regulation from both gene expression and
promoter sequence data. We aim to identify transcriptional
modules—sets of genes that are co-regulated in a set
of experiments, through a common motif profile. Using
the EM algorithm, our approach refines both the module
assignment and the motif profile so as to best explain
the expression data as a function of transcriptional motifs.
It also dynamically adds and deletes motifs, as required
to provide a genome-wide explanation of the expression
data. We evaluate the method on two Saccharomyces
cerevisiae gene expression data sets, showing that our
approach is better than a standard one at recovering
known motifs and at generating biologically coherent
modules. We also combine our results with binding
localization data to obtain regulatory relationships with
known transcription factors, and show that many of the
inferred relationships have support in the literature.
Associating Genes with Gene Ontology Codes Using a Maximum Entropy Analysis of Biomedical Literature
, 2002
Cited by 58 (3 self)
this paper but has been provided elsewhere (Ratnaparkhi 1997; Manning and Schutze 1999)
Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps
- Bioinformatics, Vol. 21, Suppl. 1, 2005, pages i302–i310
, 2005
Molecular Fossils in the Human Genome: Identification and analysis of the pseudogenes in chromosomes 21 and 22
, 2001
Cited by 36 (20 self)
We have developed an initial approach for annotating and surveying pseudogenes in the human genome. We search human genomic DNA for regions that are similar to known protein sequences and contain obvious disablements (i.e. mid-sequence stop codons or frameshifts), while ensuring minimal overlap with annotations of known genes. Pseudogenes can be divided into 'processed' and 'non-processed'; the former are reverse-transcribed from mRNA (and therefore have no intron structure) whereas the latter presumably arise from genomic duplications. We annotate putative processed pseudogenes based on whether there is a continuous span of homology that is >70% of the length of the closest matching human protein (i.e. with introns removed), or whether there is evidence of polyadenylation. We have applied our approach to chromosomes 21 and 22, the first parts of the human genome completely sequenced, finding 190 new pseudogene annotations beyond the 264 reported by the sequencing centres. In total, on chromosomes 21 and 22, there are 189 processed pseudogenes, 195 non-processed pseudogenes and, additionally, 70 pseudogenic immunoglobulin gene segments. (Detailed assignments are available at http://bioinfo.mbb.yale.edu/genome/pseudogene.) By extrapolation, we predict that there could be up to ~20,000 pseudogenes in the whole human genome, with a little more than half of them processed. We have determined the main populations and clusters of pseudogenes on chromosomes 21 and 22. There are notable excesses of pseudogenes relative to genes near the centromeres of both chromosomes, suggesting the existence of pseudogenic 'hot-spots' in the genome. We have looked at the distribution of InterPro families and GO functional categories in our pseudogenes. Overall, the families in both processed ...
The Sorcerer II Global Ocean Sampling expedition: Expanding the universe of protein families
- PLoS Biol 5: e16
, 2007
Cited by 33 (3 self)
Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in
ELM server: a new resource for investigating short functional sites in modular eukaryotic proteins
- Nucleic Acids Res
, 2003
Cited by 32 (5 self)
Multidomain proteins predominate in eukaryotic proteomes. Individual functions assigned to different sequence segments combine to create a complex function for the whole protein. While on-line resources are available for revealing globular domains in sequences, there has hitherto been no comprehensive collection of small functional sites/motifs comparable to the globular domain resources, yet these are as important for the function of multidomain proteins. Short linear peptide motifs are used for cell compartment targeting, protein–protein interaction, regulation by phosphorylation, acetylation, glycosylation and a host of other post-translational modifications. ELM, the Eukaryotic Linear Motif server at

