Waterman, M. and Vingron, M., Sequence comparison significance and Poisson approximation, Statistical Science, 9:401--418, 1994.
....further need for computation which made the original version of BLAST so successful. However, in order to detect weak sequence homologies, it is crucial to allow gaps in an alignment [24]. In the presence of gaps, numerical studies indicate that the E-values still follow the universal form of Eq. (1) [31, 10, 19, 33, 34, 2, 23]. However, the numerical values of the two parameters $\lambda$ and $K$ are not known. There are various approaches to solve this dilemma: for large gap costs there are approximate analytical formulas for $\lambda$ [21, 20, 29]. For a small subclass of scoring systems there is even an analytical formula for $\lambda$ that ....
M. S. Waterman and M. Vingron. Sequence comparison significance and Poisson approximation. Stat. Sci., 9(3):367--381, August 1994.
....$\beta),\; E_{i,j-1} - \beta\}$, $F_{i,j} = \max\{SW_{i-1,j} - (\alpha + \beta),\; F_{i-1,j} - \beta\}$ and $SW_{i,j} = \max\{SW_{i-1,j-1} + S(x_i, y_j),\; E_{i,j},\; F_{i,j},\; 0\}$. Figure 3.5 gives an example of a Smith-Waterman table. The optimal local .... Figure 3.5: A Smith-Waterman table. The optimal local alignment paths are highlighted. ....
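The recurrence in the excerpt above can be sketched in Python as follows. This is a minimal illustration, not the cited author's implementation: it assumes the operators lost in extraction are the usual ones (a gap of length k costs alpha + beta*k, and the substitution score S is added on the diagonal), and the function name `smith_waterman_affine` and the `score` callback are hypothetical.

```python
def smith_waterman_affine(x, y, score, alpha, beta):
    """Best local alignment score of x and y (assumed gap cost: alpha + beta * k)."""
    neg_inf = float("-inf")
    n, m = len(x), len(y)
    # SW[i][j]: best score of a local alignment ending at x[i-1], y[j-1]
    # E[i][j]:  best score ending with a gap that consumes y[j-1]
    # F[i][j]:  best score ending with a gap that consumes x[i-1]
    SW = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Open a new gap (alpha + beta) or extend an existing one (beta).
            E[i][j] = max(SW[i][j - 1] - (alpha + beta), E[i][j - 1] - beta)
            F[i][j] = max(SW[i - 1][j] - (alpha + beta), F[i - 1][j] - beta)
            SW[i][j] = max(
                SW[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                E[i][j],
                F[i][j],
                0,  # a local alignment may start anywhere
            )
            best = max(best, SW[i][j])
    return best


def toy_score(a, b):
    """Hypothetical scoring: +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1
```

With the toy scoring, `smith_waterman_affine("ACGTACGT", "ACGTTACGT", toy_score, 1, 1)` aligns all eight letters of the first sequence across a single one-letter gap. A real search would replace `toy_score` with a lookup in a substitution matrix such as PAM 250.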
....this prediction. Experiment 9.1.1: We generate 1000 independent random queries of length $m = 350$. The residues are chosen independently according to the average amino acid distribution shown in Table 3.1. Each query is aligned to all entries of the SWISS-PROT database, release 31.0, using the Smith-Waterman algorithm and the PAM 250 scoring matrix. The experiment is done twice: once with gap-free local alignments, and a second time using the affine gap cost function $g(k) = 12 + 3k$. In both cases, the maximal scores $H_{\max}$ of each of the 1000 searches are sampled and their empirical distribution function is derived. ....
[Article contains additional citation context not shown here]
Waterman, M. S. and Vingron, M. (1994b) Sequence comparison significance and Poisson approximation. Stat. Sci., 9, 367.
....at different gap costs. The solid lines are best fits to the Gumbel probability density $p(\Sigma) \propto \exp(-\lambda\Sigma - K e^{-\lambda\Sigma})$. The values of $\lambda$ and $K$ derived from these fits are summarized in Table 1. These empirical estimates of the Gumbel parameters were carried out using Smith-Waterman alignments with PAM 120 and PAM 250 scoring matrices, at gap costs of $\delta = \infty$, 10.8, 4.3, and 2.9. The alignment was performed for sequences of lengths $N = N' = 350$ and $N = N' = 700$. Table 1 gives the extracted values of the Gumbel parameters $\lambda$ and $K$ for the different scoring parameter settings. The ....
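One way to extract two Gumbel parameters from a sample of alignment scores is sketched below. It is a rough illustration under stated assumptions: the parameters are written `lam` and `K` (names assumed here), the density is taken as lam * K * exp(-lam * s - K * exp(-lam * s)), and the fit uses the method of moments, which is a simpler stand-in for the curve fitting described in the excerpt. All function names are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def gumbel_pdf(score, lam, K):
    """Gumbel density with assumed parameters lam (decay rate) and K (prefactor)."""
    return lam * K * math.exp(-lam * score - K * math.exp(-lam * score))


def fit_gumbel_moments(scores):
    """Estimate (lam, K) from the sample mean and standard deviation.

    Uses mean = (ln K + gamma) / lam and variance = pi^2 / (6 lam^2),
    which hold for the density defined in gumbel_pdf.
    """
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    lam = math.pi / (math.sqrt(6.0) * sd)
    K = math.exp(lam * mean - EULER_GAMMA)
    return lam, K


def sample_gumbel(n, lam, K, seed=0):
    """Draw n synthetic scores by inverting the CDF exp(-K * exp(-lam * s))."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        u = max(rng.random(), 1e-300)  # guard against log(0)
        samples.append((math.log(K) - math.log(-math.log(u))) / lam)
    return samples
```

For example, `fit_gumbel_moments(sample_gumbel(20000, 0.3, 0.1))` should return values close to the inputs 0.3 and 0.1. Moment estimates are sensitive to outliers in the tail, so a maximum-likelihood or least-squares fit may be preferable on real score data.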
Waterman, M.S. and Vingron, M. 1994a. Sequence Comparison Significance and Poisson Approximation. Stat. Sci. 9:367--381.
....k where $\lambda^{(k)}_{\max}$ is the largest eigenvalue for a random matrix in the Hermite ensemble of $k \times k$ Hermitian matrices. It is intriguing to speculate that there may be connections with the subject of sequence comparison statistics, motivated by DNA sequence matching (e.g. Waterman and Vingron [57]), where analogous asymptotic distribution theory is less well developed. 5 More about patience sorting. From the viewpoint of patience sorting with n cards, the number of piles is just one aspect of the final configuration. There are other natural questions such as the number of cards in the ....
M.S. Waterman and M. Vingron. Sequence comparison significance and Poisson approximation. Statist. Science, 9:367--381, 1994.
....Sequences. In this section we consider the problem of quantifying the degree to which two categorical time series are coherent (see (1.7)). The goal is to discover whether the sequences contain similar patterns, and the problem is motivated by the matching of two DNA sequences (for example, see Waterman & Vingron, 1994). This approach builds on the ideas used in defining the spectral envelope for a qualitative-valued time series, and technical details can be found in Stoffer & Tyler (1998). We continue to focus on methods that are computationally simple and fast, and can be applied to long sequences. 4.1 The ....
Waterman, M. S. & Vingron, M. (1994). Sequence comparison significance and Poisson approximation. Statistical Science, 9, 367-381.
....of scores. We generate 1000 independent random queries of length $m = 350$. The residues are chosen independently according to an average amino acid distribution as described in (McCaldon & Argos, 1988). Each query is aligned to all entries of the SWISS-PROT database, release 31.0, using the Smith-Waterman algorithm. We use the PAM 250 scoring matrix and very high gap penalties that do not allow gaps at all. Using the much faster programs FASTA or BLAST is inappropriate for studying random scores, since they fail to detect all similarities on weak significance levels reliably. ....
Waterman, M. S. & Vingron, M. (1994a). Sequence comparison significance and Poisson approximation. Stat. Sci.
....score of the respective similarities. However, only for rather simple ways of comparing sequences has the statistical distribution of this score been understood. In this paper we want to summarize recent advances in our understanding of the statistics of the score of local Smith-Waterman alignments [14, 13]. To judge the statistical significance of the score of a given pairwise alignment, the following procedure has found wide acceptance. The letters of the sequences are permuted randomly and a new alignment score is calculated. This procedure is repeated roughly 100 times and mean and standard ....
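The shuffling procedure described in the excerpt above can be sketched as follows. The excerpt breaks off before saying how the mean and standard deviation are used, so the z-score returned here is an assumption, as are the function names. The toy scorer `longest_common_run` merely stands in for a real Smith-Waterman score so that the example is self-contained.

```python
import random
import statistics


def longest_common_run(x, y):
    """Toy alignment score: length of the longest common contiguous substring."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def shuffle_z_score(x, y, align_score, n_shuffles=100, seed=0):
    """Compare the observed score with scores obtained after permuting y."""
    rng = random.Random(seed)
    observed = align_score(x, y)
    letters = list(y)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # keeps the letter composition, destroys the order
        shuffled_scores.append(align_score(x, "".join(letters)))
    mean = statistics.fmean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    # Assumed summary statistic: distance from the shuffled mean in units of sd.
    z = (observed - mean) / sd if sd > 0 else float("nan")
    return observed, mean, sd, z
```

A call such as `shuffle_z_score(seq_a, seq_b, longest_common_run)` returns the observed score together with the shuffled mean, standard deviation and z-score. Because local alignment scores are not normally distributed, a z-score of this kind is only a rough indicator of significance.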
....Arratia and Waterman [3] showed that a negative expected global alignment score is equivalent to logarithmic growth of the local alignment score in the sequence length. For details and applications see Arratia and Waterman [3], Waterman [11], Vingron and Waterman [10], and Waterman and Vingron [14]. The main obstacle in applying the framework described in the interpretation of the Karlin-Altschul formula to local alignment with gaps is the following. Different HSPs do not overlap each other. Therefore the dependence between the HSPs exceeding a threshold is low. In order to apply the above ....
[Article contains additional citation context not shown here]
Michael S. Waterman and Martin Vingron. Sequence comparison significance and Poisson approximation. Statistical Science, 9:401--418, 1994.
No context found.
M. S. Waterman and M. Vingron. Sequence Comparison Significance and Poisson Approximation. Statistical Science, 9(3):367--381, 1994.