Improving the Prediction of Protein Secondary Structure in Three and Eight Classes Using Recurrent Neural Networks and Profiles
, 2001
Cited by 87 (21 self)
Secondary structure predictions are increasingly becoming the workhorse for several methods aiming at predicting protein structure and function. Here we use ensembles of bidirectional recurrent neural network architectures, PSI-BLAST-derived profiles, and a large nonredundant training set to derive two new predictors: (a) the second version of the SSpro program for secondary structure classification into three categories and (b) the first version of the SSpro8 program for secondary structure classification into the eight classes produced by the DSSP program. We describe the results of three different test sets on which SSpro achieved a sustained performance of about 78% correct prediction. We report confusion matrices, compare PSI-BLAST to BLAST-derived profiles, and assess the corresponding performance improvements. SSpro and SSpro8 are implemented as web servers, available together with other structural feature predictors at: http://promoter.ics.uci.edu/BRNN-PRED/. Proteins 2002;47:228-235.
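As a rough illustration of the evaluation terms in this abstract, the sketch below reduces an eight-class DSSP string to three classes, then computes the fraction of correctly predicted residues and a confusion matrix. The reduction table is an assumption (one common convention; the abstract does not state which mapping SSpro uses), and the function names are hypothetical.

```python
from collections import Counter

# ASSUMPTION: one common 8-to-3 reduction; the abstract does not specify the mapping.
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and bridges
    "T": "C", "S": "C", "-": "C",   # turns, bends, everything else
}


def reduce_to_three(dssp8):
    """Map a string of eight-class DSSP codes to three classes (H, E, C)."""
    return "".join(DSSP8_TO_3[code] for code in dssp8)


def fraction_correct(observed, predicted):
    """Fraction of residues whose predicted class equals the observed class."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


def confusion_matrix(observed, predicted):
    """Count (observed class, predicted class) pairs over all residues."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return Counter(zip(observed, predicted))
```

The same two scoring functions work for either three-class or eight-class strings, since they only compare characters position by position.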
Large-Scale Comparison of Protein Sequence Alignment Algorithms With Structure Alignments
- Proteins
, 2000
Cited by 36 (1 self)
Sequence alignment programs such as BLAST and PSI-BLAST are used routinely in pairwise, profile-based, or intermediate-sequence search (ISS) methods to detect remote homologies for the purposes of fold assignment and comparative modeling. Yet, the sequence alignment quality of these methods at low sequence identity is not known. We have used the CE structure alignment program (Shindyalov and Bourne, Prot Eng 1998;11:739) to derive sequence alignments for all superfamily and family-level related proteins in the SCOP domain database. CE aligns structures and their sequences based on distances within each protein, rather than on interprotein distances. We compared BLAST, PSI-BLAST, CLUSTALW, and ISS alignments with the CE structural alignments. We found that global alignments with CLUSTALW were very poor at low sequence identity (<25%), as judged by the CE alignments. We used PSI-BLAST to search the nonredundant sequence database (nr) with every sequence in SCOP using up to four iterations. The resulting matrix was used to search a database of SCOP sequences. PSI-BLAST is only slightly better than BLAST in alignment accuracy on a per-residue basis, but PSI-BLAST matrix alignments are much longer than BLAST's, and so align correctly a larger fraction of the total number of aligned residues in the structure alignments. Any two SCOP sequences in the same superfamily that shared a hit or hits in the nr PSI-BLAST searches were identified as linked by the shared intermediate sequence. We examined the quality of the longest SCOP-query/SCOP-hit alignment via an intermediate sequence, and found that ISS produced longer alignments than PSI-BLAST searches alone, of nearly comparable per-residue quality. At 10-15% sequence identity, BLAST correctly aligns 28%, PSI-BLAST 40%, and ISS ...
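The abstract separates two measures: how accurate a sequence alignment is per aligned residue, and what fraction of the reference structure alignment it reproduces. A minimal sketch of that distinction follows, assuming each alignment is given as a collection of (query position, hit position) pairs; this representation and the function names are illustrative assumptions, not the paper's actual procedure.

```python
def per_residue_accuracy(sequence_pairs, structure_pairs):
    """Fraction of the sequence alignment's residue pairs found in the reference."""
    seq = set(sequence_pairs)
    if not seq:
        raise ValueError("sequence alignment has no aligned pairs")
    return len(seq & set(structure_pairs)) / len(seq)


def fraction_of_reference_aligned(sequence_pairs, structure_pairs):
    """Fraction of the reference (structure) alignment's pairs that are reproduced."""
    ref = set(structure_pairs)
    if not ref:
        raise ValueError("structure alignment has no aligned pairs")
    return len(ref & set(sequence_pairs)) / len(ref)


# Hypothetical toy data: a four-pair reference, one short and one longer alignment.
reference = [(0, 0), (1, 1), (2, 2), (3, 3)]
short_alignment = [(0, 0), (1, 1)]
long_alignment = [(0, 0), (1, 1), (2, 2), (3, 4)]
```

In the toy data, the short alignment is perfectly accurate per residue but covers only half of the reference, while the longer alignment has one wrong pair yet reproduces more of the reference. This is the pattern the abstract describes for longer alignments.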
RIO: Analyzing proteomes by automated phylogenomics using resampled inference of orthologs
, 2002
Cited by 34 (0 self)
When analyzing protein sequences using sequence similarity searches, orthologous sequences (that diverged by speciation) are more reliable predictors of a new protein's function than paralogous sequences (that diverged by gene duplication). The utility of phylogenetic information in high-throughput genome annotation ("phylogenomics") is widely recognized, but existing approaches are either manual or not explicitly based on phylogenetic trees. Results: Here we present RIO (Resampled Inference of Orthologs), a procedure for automated phylogenomics using explicit phylogenetic inference. RIO analyses are performed over bootstrap resampled phylogenetic trees to estimate the reliability of orthology assignments. We also introduce supplementary concepts that are helpful for functional inference. RIO has been implemented as a Perl pipeline connecting several C and Java programs. It is available at [http://www.genetics.wustl.edu/eddy/forester/]. A web server is at [http://www.rio.wustl.edu/]. RIO was tested on the Arabidopsis thaliana and Caenorhabditis elegans proteomes. Conclusion: The RIO procedure is particularly useful for the automated detection of first representatives of novel protein subfamilies. We also describe how some orthologies can be misleading for functional inference. Background: Accurate computational protein function analysis is an important way of extracting value from primary sequence data. Due to the large amount of data, automated systems seem unavoidable (at least for initial, prioritizing steps). Such efforts are complicated, for a variety of reasons. Many proteins belong to large families, as suggested by Dayhoff [1]. Such families are often composed of subfamilies related to each other by gene duplication events. For example, Ingram [2] showed tha...
Novel small RNA-encoding genes in the intergenic regions of Escherichia Coli
Cited by 34 (5 self)
... to the prediction of 24 putative sRNA-encoding genes, of which 23 were tested experimentally. Here we report on the discovery of 14 genes encoding novel small RNAs in E. coli and their expression patterns under a variety of physiological conditions. Most of the newly discovered RNAs are abundant. Interestingly, the expression level of a significant number of these RNAs increases upon entry into stationary phase. Conclusions: Based on our results, we conclude that small RNAs are much more widespread than previously imagined and that these versatile ... Current Biology 2001, 11:941-950. Received: 2 May 2001; Revised: 21 May 2001; Accepted: 23 May 2001; Published: 26 June 2001.
Distance Based Indexing for String Proximity Search
- IN ICDE
, 2003
Cited by 28 (0 self)
In many database applications involving string data, it is common to have near neighbor queries (asking for strings that are similar to a query string) or nearest neighbor queries (asking for strings that are most similar to a query string). The similarity between strings is defined in terms of a distance function determined by the application domain. The most popular string distance measures are based on (a weighted) count of (i) character edit or (ii) block edit operations to transform one string into the other. Examples include the Levenshtein edit distance and the recently introduced compression distance. The main goal
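For concreteness, the Levenshtein edit distance named in this abstract can be computed with the standard dynamic program below, followed by a brute-force near neighbor query. This is only a baseline sketch with unit edit costs. It is not the distance-based index the paper proposes, and `near_neighbors` and its `radius` parameter are hypothetical names.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def near_neighbors(query, strings, radius):
    """Linear scan: every string within `radius` edits of `query`, in input order."""
    return [s for s in strings if levenshtein(query, s) <= radius]
```

The linear scan computes one full distance per stored string, which is the cost that an index over the distance function aims to avoid.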
A tutorial of recent developments in the seeding of local alignment
- JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL
, 2004
Cited by 12 (0 self)
We review recent results on local alignment. We begin with a review of classical methods and early heuristic methods, and then focus on more recent work on the seeding of local alignment. We show that these techniques give a vast improvement in both sensitivity and specificity over previous methods, and can achieve sensitivity at the level of classical algorithms while requiring orders of magnitude less runtime.
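To make the notion of seeding concrete, the sketch below indexes one sequence by short patterns and reports positions where the other sequence matches exactly under the same pattern. These hits are the candidate start points that a local aligner would then extend. The mask notation (`1` = position must match, `0` = position ignored) and all function names are assumptions for illustration; the abstract does not say which seed models the tutorial covers.

```python
from collections import defaultdict


def seed_key(sequence, start, mask):
    """Characters of `sequence` at the positions where `mask` has a '1'."""
    return "".join(sequence[start + k] for k, bit in enumerate(mask) if bit == "1")


def find_seed_hits(query, target, mask="111"):
    """Return (query_pos, target_pos) pairs whose masked windows match exactly."""
    span = len(mask)
    index = defaultdict(list)
    # Index every window of the target by its masked key.
    for j in range(len(target) - span + 1):
        index[seed_key(target, j, mask)].append(j)
    hits = []
    # Look up every window of the query in the index.
    for i in range(len(query) - span + 1):
        for j in index.get(seed_key(query, i, mask), []):
            hits.append((i, j))
    return hits
```

A mask of all ones gives the classical consecutive seed, while a mask such as `101` still produces a hit when the two sequences differ at the ignored middle position.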
Reinvestigation of the Saccharomyces cerevisiae genome annotation by comparison to the genome of a related fungus: Ashbya gossypii
- Genome Biol
, 2003
Cited by 11 (1 self)
The electronic version of this article is the complete one and can be found online at
LabBase: A Database to Manage Laboratory Data in a Large-Scale Genome-Mapping Project
- IEEE Computers in Medicine and Biology
, 1995
Cited by 10 (4 self)
The central task of managing laboratory data is keeping track of laboratory samples, the experimental steps performed on them, and the results of these experiments. This task engenders several challenges, namely: (i) the need to accommodate frequent changes to laboratory protocols; (ii) the need to provide data access to programs written in multiple languages and running on heterogeneous hardware; (iii) the need to represent unusual data types, such as DNA sequences, with specialized behavior; and (iv) the need to view data in both static and historical perspectives. The static perspective deals with the current state of knowledge about a material such as the "sequence of a DNA fragment" or the "chromosome from which a DNA fragment was obtained". The historical perspective deals with the history of experimental steps, such as "for what percentage of DNA fragments has the sequence of constituent bases been read more than once?" Such historical queries are crucial to understanding and refining l...
Microarray Analysis of the Transcriptional Network Controlled by the Photoreceptor Homeobox Gene
, 2000
Cited by 10 (0 self)
... study demonstrates that cDNA microarrays can be successfully used to define the transcriptional networks controlled by transcription factors in vertebrate tissue in vivo. Background: Studies of neural development have highlighted the role of cell- and tissue-specific transcription factors in regulating both cell fate determination events and the later morphological stages of neuronal differentiation [1]. Little is known, however, about the gene expression or transcriptional networks regulated by these factors or those controlling cell fate determination. In the developing vertebrate retina, several transcription factors have been implicated in the differentiation of specific cell types, including the paired-type family member Chx10 (bipolar neurons; [2]) and the POU-domain transcription factor family member Brn-3b (subtype of ganglion cells [3]). The transcription factor Crx (cone, rod homeobox) has a pivotal role in the morphological differentiation of both rod and cone pho
An unappreciated role for RNA surveillance
- Genome Biol
, 2004
Cited by 9 (0 self)
The electronic version of this article is the complete one and can be found online at

