Results 1 -
9 of
9
Folding@Home and Genome@Home: Using distributed computing to tackle previously intractable problems in computational biology
"... For decades, researchers have been applying computer simulation to address problems in biology. However, many of these grand challenges in computational biology, such as simulating how proteins fold, remained unsolved due to their great complexity. Indeed, even to simulate the fastest folding prot ..."
Abstract
-
Cited by 103 (0 self)
- Add to MetaCart
For decades, researchers have been applying computer simulation to address problems in biology. However, many of these grand challenges in computational biology, such as simulating how proteins fold, remained unsolved due to their great complexity. Indeed, even to simulate the fastest folding protein would require decades on the fastest modern CPUs. Here, we review novel methods to fundamentally speed such previously intractable problems using a new computational paradigm: distributed computing. By efficiently harnessing tens of thousands of computers throughout the world, we have been able to break previous computational barriers. However, distributed computing brings new challenges, such as how to efficiently divide a complex calculation of many PCs that are connected by relatively slow networking. Moreover, even if the challenge of accurately reproducing reality can be conquered, a new challenge emerges: how can we take the results of these simulations (typically tens to hundreds of gigabytes of raw data) and gain some insight into the questions at hand. This challenge of the analysis of the sea of data resulting from large-scale simulation will likely remain for decades to come.
A Search for Energy Minimized Sequences of Proteins
, 2009
"... In this paper, we present numerical evidence that supports the notion of minimization in the sequence space of proteins for a target conformation. We use the conformations of the real proteins in the Protein Data Bank (PDB) and present computationally efficient methods to identify the sequences with ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
In this paper, we present numerical evidence that supports the notion of minimization in the sequence space of proteins for a target conformation. We use the conformations of the real proteins in the Protein Data Bank (PDB) and present computationally efficient methods to identify the sequences with minimum energy. We use edge-weighted connectivity graph for ranking the residue sites with reduced amino acid alphabet and then use continuous optimization to obtain the energy-minimizing sequences. Our methods enable the computation of a lower bound as well as a tight upper bound for the energy of a given conformation. We validate our results by using three different inter-residue energy matrices for five proteins from protein data bank (PDB), and by comparing our energy-minimizing sequences with 80 million diverse sequences that are generated based on different considerations in each case. When we submitted some of our chosen energy-minimizing sequences to Basic Local Alignment Search Tool (BLAST), we obtained some sequences from nonredundant protein sequence database that are similar to ours with an E-value of the order of 10-7. In summary, we conclude that proteins show a trend towards minimizing energy in the sequence space but do not seem to adopt the global energyminimizing sequence. The reason for this could be either that the existing energy matrices are not able to accurately represent the inter-residue interactions in the context of the protein environment or that Nature does not push the
What’s in a likelihood? Simple models of protein evolution and the contribution of structurally viable reconstructions to the likelihood
- Syst. Biol
"... Abstract.-Most phylogenetic models of protein evolution assume that sites are independent and identically distributed. Interactions between sites are ignored, and the likelihood can be conveniently calculated as the product of the individual site likelihoods. The calculation considers all possible ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Abstract.-Most phylogenetic models of protein evolution assume that sites are independent and identically distributed. Interactions between sites are ignored, and the likelihood can be conveniently calculated as the product of the individual site likelihoods. The calculation considers all possible transition paths (also called substitution histories or mappings) that are consistent with the observed states at the terminals, and the probability density of any particular reconstruction depends on the substitution model. The likelihood is the integral of the probability density of each substitution history taken over all possible histories that are consistent with the observed data. We investigated the extent to which transition paths that are incompatible with a protein's three-dimensional structure contribute to the likelihood. Several empirical amino acid models were tested for sequence pairs of different degrees of divergence. When simulating substitutional histories starting from a real sequence, the structural integrity of the simulated sequences quickly disintegrated. This result indicates that simple models are clearly unable to capture the constraints on sequence evolution. However, when we sampled transition paths between real sequences from the posterior probability distribution according to these same models, we found that the sampled histories were largely consistent with the tertiary structure. This suggests that simple empirical substitution models may be adequate for interpolating changes between observed sequences during phylogenetic inference despite the fact that the models cannot predict the effects of structural constraints from first principles. This study is significant because it provides a quantitative assessment of the biological realism of substitution models from the perspective of protein structure, and it provides insight on the prospects for improving models of protein sequence evolution. [Ancestral state reconstruction; empirical amino acid models; maximum likelihood; phylogenetics; protein structure.]
Program of Study Committee:
, 2008
"... New statistical potentials for improved protein structure prediction ..."
PROTEINS: Structure, Function, and Genetics 51:390–396 (2003) Increased Detection of Structural Templates Using Alignments of Designed Sequences
"... ABSTRACT Protein structure prediction by comparative modeling benefits greatly from the use of multiple sequence alignment information to improve the accuracy of structural template identification and the alignment of target sequences to structural templates. Unfortunately, this benefit is limited t ..."
Abstract
- Add to MetaCart
ABSTRACT Protein structure prediction by comparative modeling benefits greatly from the use of multiple sequence alignment information to improve the accuracy of structural template identification and the alignment of target sequences to structural templates. Unfortunately, this benefit is limited to those protein sequences for which at least several natural sequence homologues exist. We show here that the use of large diverse alignments of computationally designed protein sequences confers many of the same benefits as natural sequences in identifying structural templates for comparative modeling targets. A large-scale massively parallelized application of an all-atom protein design algorithm, including a simple model of peptide backbone flexibility, has allowed us to generate 500 diverse, non-native, high-quality sequences for each of 264 protein structures in our test set. PSI-BLAST searches using the sequence profiles generated from the designed sequences (“reverse ” BLAST searches) give nearperfect accuracy in identifying true structural homologues of the parent structure, with 54 % coverage. In 41 of 49 genomes scanned using reverse BLAST searches, at least one novel structural template (not found by the standard method of PSI-BLAST against PDB) is identified. Further improvements in coverage, through optimizing the scoring function used to design sequences and continued application to new protein structures beyond the test set, will allow this method to mature into a useful strategy for identifying distantly related structural templates.
présentée et soutenue publiquement par
, 2010
"... Eléments d’adaptation de la méthodologie d’analyse de cycle de vie aux carburants végétaux: cas de la première génération ..."
Abstract
- Add to MetaCart
(Show Context)
Eléments d’adaptation de la méthodologie d’analyse de cycle de vie aux carburants végétaux: cas de la première génération