Abstract:
The biotechnology revolution stems from rapid advances in the biological sciences. One important product of these advances is a large and rapidly growing database of biopolymer (DNA, RNA, and protein) sequences, which has attracted much attention from researchers in different fields. The great majority of the techniques developed for studying these data were designed to analyze a single sequence or to compare a pair of sequences. Multiple sequence analysis has remained a difficult challenge. In recent years, formal statistical models have shown potential in one such problem, multiple sequence alignment. In this article we describe a general statistical paradigm, the unified Gibbs method, for converting nearly any existing method for analyzing a single sequence or comparing a pair of sequences into a multiple sequence analysis method. Our previous successful applications of the unified Gibbs method include the development of the site sampler, the motif sampler, and PROBE. Here we demonstrate the power of this paradigm again by describing a multiple sequence partitioning method for delineating subsequences indicative of underlying structural features. We also show that the simple Bayesian framework is useful for model selection even in pairwise sequence comparisons.