40 citations found.
Santner, T. J. and Duffy, D. E. 1989. The Statistical Analysis of Discrete Data. Springer Verlag, New York.

This paper is cited in the following contexts:
Predicting Protein Structure using only Sequence.. - Karplus, Barrett.. (1999)   (6 citations)

....and local sequence alignment methods, and can be used to assign probabilities to proteins in database search [6]. Our HMM fold recognition method differs from protein threading methods [10, 19, 14, 15] in that pairwise interactions are not modeled or used. Instead, we employ Bayesian methods [3, 2, 17] to incorporate prior information in the form of Dirichlet mixture densities [20] over position-specific amino acid distributions. The components of the mixture reflect different patterns of sequence conservation and can be combined with data from aligned homologs to form data-dependent estimates of ....

T. J. Santner and D. E. Duffy. The Statistical Analysis of Discrete Data. Springer Verlag, New York, 1989.


Supervised Naive Bayes Parameters - Wettig, al. (2002)

....for each possible value of an attribute and one (binary) output variable for each value of the class. Looking at the L model in this light also proves concavity of S (Theorem 2 (ii)), since the supervised log likelihood is known to be concave for logistic regression models (see e.g. [11], p. 234). Neural Networks. The conditional distribution (6) is also equivalent to a single-layer (no hidden units) linear feed-forward neural network with logistic sigmoid (softmax) activation function; see e.g. [1]. In this type of network both inputs and outputs are encoded using the so ....

T. Santner and D. Duffy. The Statistical Analysis of Discrete Data. Springer-Verlag, New York, 1989.


DNA Sequence Classification via an Expectation Maximization.. - Ma, Wang (2001)

.... a MAP EM algorithm to make the objective function more concave [16]. The prior probabilities of $P_{10,j}$ and $P_{35,j}$, $j = 1, \ldots, 6$, are in the Dirichlet distribution, conjugate to the multinomial distribution, which means the posterior probabilities are also in the Dirichlet distribution [3] [24]. The Dirichlet distribution on the probability vector $P = (p(A), p(C), p(G), p(T))$ ($P$ could be $P_{10,j}$ or $P_{35,j}$, $j = 1, \ldots, 6$) has the form: $P(p(A), p(C), p(G), p(T) \mid \alpha_A, \alpha_C, \alpha_G, \alpha_T) = \frac{\Gamma(\alpha_0)}{\prod_{x=A}^{T} \Gamma(\alpha_x)} \prod_{x=A}^{T} p(x)^{\alpha_x - 1}$ (3) where $\alpha_0$ ....

T. J. Santner. The Statistical Analysis of Discrete Data. Springer-Verlag, New York, New York, 1989.
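
Note on the excerpt above: the Dirichlet density (3) and its conjugacy with the multinomial distribution can be checked numerically. The following is a minimal Python sketch, not code from the citing paper; the function names `dirichlet_pdf` and `posterior_alpha` and the example counts are assumptions made for illustration.

```python
import math

def dirichlet_pdf(p, alpha):
    """Density of a Dirichlet(alpha) distribution at the probability vector p.

    Computed in log space: Gamma(alpha_0) / prod Gamma(alpha_x) * prod p(x)^(alpha_x - 1).
    """
    alpha0 = sum(alpha)
    log_norm = math.lgamma(alpha0) - sum(math.lgamma(a) for a in alpha)
    log_kernel = sum((a - 1.0) * math.log(px) for a, px in zip(alpha, p))
    return math.exp(log_norm + log_kernel)

def posterior_alpha(alpha, counts):
    """Conjugate update: Dirichlet prior + multinomial counts -> Dirichlet posterior."""
    return [a + c for a, c in zip(alpha, counts)]

# Hypothetical example over the nucleotides A, C, G, T.
prior = [1.0, 1.0, 1.0, 1.0]   # flat prior (an assumption for illustration)
counts = [3, 0, 1, 0]          # made-up observed counts
posterior = posterior_alpha(prior, counts)
density_at_uniform = dirichlet_pdf([0.25, 0.25, 0.25, 0.25], prior)
```

With a flat prior the density is constant over the simplex, and observing counts simply adds them to the prior parameters, which is what "conjugate" means here.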


Confidence Intervals and Prediction Intervals for.. - Dybowski, Roberts

....based on the delta method (i.e. interval (1.21)), where $x$ is a $(d+1)$-dimensional vector $(1, x_1, \ldots, x_d)^T$, $\hat{\Sigma}$ is the covariance matrix for $\hat{w}$, and $\chi^2_{\alpha}[d+1]$ is the $\chi^2$ critical value for the $100(1-\alpha)$ percentage point for $d+1$ degrees of freedom (figure 1.5). See Santner & Duffy (1989, pp. 238-239) for further discussion. 1.4 Confidence intervals for feed-forward neural networks. So far, we have looked at linear and logistic regression, but if we have $\hat{y}(x; \hat{w})$ from an FNN, how can we obtain a confidence interval for $y(x)$? We start with two approaches: the delta ....

Santner, T.J. & D.E. Duffy (1989), The Statistical Analysis of Discrete Data, Springer-Verlag, New York.


Machine Learning and Natural Language Processing - Marquez (2000)   (1 citation)

....semi-supervised word sense disambiguation [166] and, in combination with the Naive Bayes classifier, in a semi-supervised approach to text classification [162, 163]. Finally, log-linear models [44] are also being applied to natural language processing. In particular, log-linear regression [195], a popular technique for binary classification, is used in [209] to classify verbs for machine translation purposes. In the same direction, the work by Marques et al. [135] uses log-linear models to induce verbal transitivity. 2.2 Symbolic Machine Learning Approaches 2.2.1 Decision Trees ....

T. J. Santner and D. E. Duffy. The Statistical Analysis of Discrete Data. Springer-Verlag, New York, 1989.


Predicting protein structure using hidden Markov models - Karplus, Sjölander.. (1997)   (8 citations)

....and local sequence alignment methods, and can be used to assign probabilities to proteins in database search [6]. Our HMM fold recognition method differs from protein threading methods [13, 23, 16, 17] in that pairwise interactions are not modeled or used. Instead, we employ Bayesian methods [4, 3, 21] to incorporate prior information in the form of Dirichlet mixture densities [24] over position-specific amino acid distributions, and over insertion and deletion probabilities in different structural environments (Section 2.1). The priors reflect different patterns of sequence conservation, such ....

T. J. Santner and D. E. Duffy. The Statistical Analysis of Discrete Data. Springer Verlag, New York, 1989.


An Investigation of Linguistic Features and.. - Hatzivassiloglou.. (2000)

.... in the $j$-th pair should be in the same cluster, 0 otherwise. The values $R_j$ can be obtained from a training set of documents for which optimal cluster assignments are available (this is the case in the TDT2 training data used in our experiments). Then, we fit a log-linear regression model [17] in which the $V_i$'s are the predictors and $R$ the response. Such a model calculates first a linear internal predictor $\eta_j$, which is a weighted sum of the $V_i$'s, $\eta_j = \sum_{i=1}^{k} w_i V_i$, and then relates $\eta_j$ to the final response via the logistic transformation $R_j = \frac{e^{\eta_j}}{1 + e^{\eta_j}}$. Note that ....

.... weighted sum because of technical reasons relating to the statistical assumptions inherent in such modeling (the fact that the variance in the binomial distribution, which appropriately models each $R_j$, is dependent on the mean and not constant as assumed by the linear regression model; see [17]). Given very modest assumptions about the distribution of the $V_i$'s, the optimal set of weights $w_i$ can be calculated efficiently using the iterative reweighted nonlinear least squares algorithm [2]. This approach aims to optimize the final similarity function, rather than the evaluation ....

T. J. Santner and D. E. Duffy. The Statistical Analysis of Discrete Data. Springer-Verlag, New York, 1989.
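
Note on the excerpts above: the two-step structure they describe (a linear predictor followed by the logistic transformation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions; the function names and the example weights and feature values are hypothetical, and the weight-fitting step (iterative reweighted least squares) is not shown.

```python
import math

def linear_predictor(weights, features):
    """Weighted sum of the predictors V_i with weights w_i."""
    return sum(w * v for w, v in zip(weights, features))

def logistic(eta):
    """Logistic transformation e^eta / (1 + e^eta), computed in a stable form."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def binary_response_variance(mean):
    """Variance of a binary response; it depends on the mean, unlike in linear regression."""
    return mean * (1.0 - mean)

# Hypothetical weights and feature values for one document pair.
weights = [0.8, -0.4, 1.5]
features = [1.0, 2.0, 0.0]
eta = linear_predictor(weights, features)
response = logistic(eta)
```

The response always lies strictly between 0 and 1, and its variance peaks at a mean of 0.5, which is why an ordinary constant-variance linear regression is a poor fit for this kind of response.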


Text-Based Approaches for the Categorization of Images - Sable, Hatzivassiloglou (1999)   (8 citations)

....in our training set. This last step is done to ensure that the words we keep will be frequent enough to be general discriminators, and to avoid cases where a particular word occurs in a few captions of images from a particular class simply by chance. We construct a log-linear regression model (Santner and Duffy 1989) using binary variables corresponding to the occurrence of each of these words as predictors and the output feature (e.g. indoor or outdoor image) as the response. The model is fitted with iterative reweighted least squares (Bates and Watts 1988) and the fit assigns a weight to each of the ....

T. J. Santner and D. E. Duffy. The Statistical Analysis of Discrete Data. Springer-Verlag, New York, 1989.


Computer-assisted Semiparametric Generalized Linear Models - Müller, Rönz, Härdle (1997)

....function $\eta_j = H(\mu_j) = \log(\mu_j) = z_j^T \beta$ and response function $\mu_j = G(\eta_j) = \exp(\eta_j)$ connecting $\mu_j$ multiplicatively with the linear predictor $\eta_j$. For details see the monographs Bishop, Fienberg & Holland (1975), Christensen (1990), Fahrmeir & Hamerle (1984), Langenheine (1989), Santner & Duffy (1989). In all these models the predictor $\eta$ is of the linear form $\eta = z^T \beta$ but linked in different ways to the expectation of the response. Generalized linear models have found many applications and attained considerable popularity, especially in social sciences, for analyzing qualitative data, ....

Santner, T. J. & Duffy, D. E. (1989). The Statistical Analysis of Discrete Data, Springer, New York.


Integration of Visual and Text-Based Approaches for the.. - Seungyup Paek (1999)   (14 citations)

....words plus prepositions from the first sentences of captions. We exclude proper nouns from this analysis since they are unlikely to be general indicators of one of the categories, and only consider words occurring five times or more in our training set. We construct a log linear regression model [ Santner and Duffy 1989 ] using binary variables corresponding to the occurrence of each of these words as predictors and the output feature (e.g. indoor or outdoor image) as the response. The model is fitted with iterative reweighted least squares [ Bates and Watts 1988 ] and the fit assigns a weight to each of the ....

Thomas J. Santner and Diane E. Duffy. The Statistical Analysis of Discrete Data. Springer-Verlag, New York, 1989.


Predicting protein structure using hidden Markov models - Karplus, Sjölander.. (1997)   (8 citations)

....alignment methods, and can be used to assign probabilities to proteins in database search [6]. Our HMM fold recognition method differs from protein threading methods [13, 23, 16, 17] in that pairwise (residue-residue) interactions are not modeled or used. Instead, we employ Bayesian methods [4, 3, 21] to incorporate prior information in the form of Dirichlet mixture densities [24] over position-specific amino acid distributions, and over insertion and deletion probabilities in different structural environments (Section 2.1). The priors reflect different patterns of sequence conservation, such ....

T. J. Santner and D. E. Duffy. The Statistical Analysis of Discrete Data. Springer Verlag, New York, 1989.


Key Words: - Ceres Plot

....the logit and explanatory variables. Fowlkes (1987) then adapted these plots for the assessment of nonlinearity in explanatory variables by applying a smoothing technique to the binary data. Further discussions of partial residual plots with respect to logistic regression models can be found in Santner and Duffy (1989) and Collett (1991). In this paper we obtain partial residual plots for use with generalized linear models (Section 2) and we extend Cook's CERES plots to these models. In Section 3 we develop partial residual plots for transformed generalized linear models. In Section 4 we present six examples ....

Santner, T. J. and Duffy, D. E.(1989). The Statistical Analysis of Discrete Data. New York: Springer-Verlag.


More Powerful Tests from Confidence Interval P Values - Berger (1995)

....the test defined by $p_Z$. Note, this property applies in general to confidence interval p values, not just this problem and this test statistic $Z$. 6 OTHER TEST STATISTICS. Other statistics besides $Z(x, y)$, such as the likelihood ratio test statistic and $p_2 - p_1$, can be used to test (1). Santner and Duffy (1989, Exercises 5.11 and 5.12), Haber (1987) and Martín and Silva (1994) list several possible statistics. The experience with $Z$ suggests that if another statistic is used, the confidence interval p value might provide improved power over the usual unconditional p value. The power comparisons of ....

Santner, T. J. and Duffy, D. E. (1989). The Statistical Analysis of Discrete Data. Springer-Verlag, New York.


Separating Mixtures Using Megapriors - Bailey (1996)

....[Table 1: The 30-component Dirichlet mixture prior for proteins.] ....the component. The magnitude $b_i$ gives an idea of the relative strength of the prior since the variance of a Dirichlet distribution [Santner and Duffy, 1989] with parameters $\beta^{(i)}$ is inversely proportional to $b_i$, $\mathrm{Var}(c) = \frac{(\beta^{(i)}/b_i)\,(I - \beta^{(i)}/b_i)}{b_i + 1}$, where $I$ is the identity vector of length $|L|$. The Dirichlet mixture distribution thus captures the idea that a column of a motif $p$ is a random sample generated by a process ....

T. J. Santner and D. E. Duffy. The Statistical Analysis of Discrete Data. Springer Verlag, 1989.


Analysis of Dose-Response Data in the Presence of Extra-Binomial.. - Boos (1991)

.... Score tests; Teratology. 1 Introduction. Logistic regression and probit analysis are often used in dose-response modeling of binary responses where binomial likelihoods form the basis for a well-developed theory of estimation and inference (e.g. Cox and Snell, 1989, Hosmer and Lemeshow, 1989, and Santner and Duffy, 1989). In certain situations, however, the binary responses arise naturally in groups or litters and a binomial likelihood description of the data is not correct due to induced correlations within litters. If $Y$ is the number of successes in a litter of size $n$ with $E(Y \mid n) = np$, then typically Var($Y$ ....

Santner, T. J., and Duffy, D. E. (1989), The Statistical Analysis of Discrete Data, New York: Springer-Verlag.


The Megaprior Heuristic for Discovering Protein Sequence.. - Bailey, Gribskov (1996)   (3 citations)

....sequences, sample size is approximately equal to the total number of characters in the sequences. distribution $\rho_i$ with parameters $\beta^{(i)} = (\beta^{(i)}_a, \ldots, \beta^{(i)}_z)$ is defined as $b_i = \sum_{x \in L} \beta^{(i)}_x$. The variance of $\rho_i$ is inversely proportional to its magnitude, $b_i$, since (Santner and Duffy 1989) $\mathrm{Var}(c) = \frac{(\beta^{(i)}/b_i)\,(I - \beta^{(i)}/b_i)}{b_i + 1}$. Thus, multiplying the parameter vector $\beta^{(i)}$ of component $\rho_i$ of a Dirichlet mixture prior by scale factor $s > 0$ reduces the variance of the component by a factor of approximately $1/s$. This scaling does not affect the ....

T. J. Santner and D. E. Duffy. The Statistical Analysis of Discrete Data. Springer Verlag, 1989.
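
Note on the excerpt above: the claim that scaling a Dirichlet parameter vector by $s$ shrinks the variance by roughly $1/s$ while leaving the mean unchanged can be verified directly from the variance formula. The Python sketch below is illustrative only; `dirichlet_mean_var` and the example parameter values are assumptions, not taken from the citing paper.

```python
def dirichlet_mean_var(beta):
    """Per-component mean and variance of a Dirichlet distribution.

    With magnitude b = sum(beta): mean_x = beta_x / b and
    var_x = mean_x * (1 - mean_x) / (b + 1).
    """
    b = sum(beta)
    mean = [x / b for x in beta]
    var = [m * (1.0 - m) / (b + 1.0) for m in mean]
    return mean, var

# Hypothetical three-letter example; the values are made up for illustration.
beta = [2.0, 1.0, 1.0]
s = 10.0
mean_before, var_before = dirichlet_mean_var(beta)
mean_after, var_after = dirichlet_mean_var([s * x for x in beta])

# The exact shrink factor is (b + 1) / (s * b + 1), which approaches 1/s as b grows.
shrink = var_after[0] / var_before[0]
```

Here the exact factor is $(b+1)/(sb+1)$, so the "approximately $1/s$" in the excerpt becomes tighter as the magnitude $b$ grows.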


Hidden Markov Models in Computational Biology.. - Krogh, Brown.. (1993)   (144 citations)

....$= \sum_{i=1}^{20} \alpha_i$. The numbers $\alpha_1, \ldots, \alpha_{20}$ define an a priori probability density over the set of all possible vectors $(p_1, \ldots, p_{20})$ such that $p_i \geq 0$ and $\sum_{i=1}^{20} p_i = 1$. This density is known as the Dirichlet distribution with parameters $\alpha_1, \ldots, \alpha_{20}$ (Berger, 1985; Santner & Duffy, 1989). The $p_i$ defined in (7) are the posterior estimates of the parameters, given the count data and this prior. We use the same $\alpha_1, \ldots, \alpha_{20}$ for reestimating the probabilities of the amino acids in each match state. [Footnote 7] Assuming that $\alpha_i \geq 1$, the MAP estimate is actually obtained by ....

Santner, T. J. & Duffy, D. E. (1989). The Statistical Analysis of Discrete Data. New York: Springer Verlag.


Using Dirichlet Mixture Priors to Derive Hidden Markov.. - Brown, Hughey, al. (1993)   (22 citations)

....parameter vectors and instead use the data from the count vectors to directly estimate the underlying density $\rho$. To make this feasible, we have assumed a simple parametric form for the density $\rho$, initially choosing a Dirichlet density with unknown parameters $\alpha_1, \ldots, \alpha_{20}$ (Berger, 1985; Santner and Duffy, 1989). The value of $\rho$ at a particular point $p$ is given by: $\rho(p) = \frac{\prod_{i=1}^{20} p_i^{\alpha_i - 1}}{Z}$, (1) where $Z$ is the normalizing constant such that $\rho$ integrates to unity. Letting $\alpha = \sum_{i=1}^{20} \alpha_i$, it is easy to see that the Dirichlet density with parameters $\alpha_1, \ldots, \alpha_{20}$ is ....

Santner, T.J. and Duffy, D.E. 1989. The Statistical Analysis of Discrete Data. Springer Verlag, New York.


Stimulus-Response Analysis for Data in the Form of Proportions - Piegorsch (1993)

....to $\partial l / \partial b_j = 0$ ($j = 0, 1$) are functions of the $Y_i$; these may be viewed as the most likely values of $b_j$ available from the data. Unfortunately, no closed form expressions exist for the ML estimates under equation (1). Hence, the estimation requires computer iteration to achieve the $b$ estimates (Santner and Duffy, 1989, §5.3B). Modern computer packages such as SAS (SAS Institute Inc., 1985) or GLIM (Payne, 1987) can provide ML logistic regression estimates for $b_j$ ($j = 0, 1$) and also give estimates of se($b_j$) along with other summary measures of the model fit. To test for an increasing trend over ....

Santner, T. J., and Duffy, D. E. (1989). The Statistical Analysis of Discrete Data. New York: Springer-Verlag.


Recent Methods for RNA Modeling Using Stochastic.. - Sakakibara.. (1994)   (7 citations)

....approach to the parameter estimation problem, similar to our approach with protein HMMs [KBM+94, BHK+93b]. Before training the grammars, we construct a prior probability density for each of their important parameter sets. This prior density takes the form of a Dirichlet distribution [SD89]. The important productions are of two forms: $S \to aSb$ and $S \to aS$, where terminal symbols $a, b \in \{A, C, G, U\}$. $S \to aSb$ productions, which generate base pairs, come in groups of 16, corresponding to all possible pairs of terminal symbols. $S \to aS$ productions, which generate nucleotides in loop ....

T. J. Santner and D. E. Duffy. The Statistical Analysis of Discrete Data. Springer Verlag, New York, 1989.


Hidden Markov models for sequence analysis: extension and.. - Hughey, Krogh (1996)   (30 citations)

....The prior distribution is a distribution over the model parameters; for the HMM it is a probability distribution over probability distributions. The prior contains our prior beliefs about the parameters of the model. In our work we use Dirichlet distributions for the prior (Berger, 1985; Santner & Duffy, 1989). For a discrete probability distribution $p_1, \ldots, p_M$ a Dirichlet distribution is described by $M$ parameters $\alpha_1, \ldots, \alpha_M$. The mean of the Dirichlet distribution is $p_i = \alpha_i / \alpha_0$, where $\alpha_0 = \sum_i \alpha_i$, and the variance is inversely proportional to $\alpha_0$. If $\alpha_0$ is large, it ....

Santner, T. J. & Duffy, D. E. (1989). The Statistical Analysis of Discrete Data. New York: Springer Verlag.


Bayesian Variable Selection in Qualitative Models by.. - Dupuis, Robert (1998)   (1 citation)

.... covariate, say $x_1$, is generally associated with the point null hypothesis $H_0: \forall j \in \{1, \ldots, J\},\ \forall (u, v) \in \mathcal{X}_1^2,\ \alpha_j(u, s_2, \ldots, s_p) - \alpha_j(v, s_2, \ldots, s_p) = 0$, (2.1) where $(s_2, \ldots, s_p)$ represents any value of $(x_2, \ldots, x_p)$. See for instance Santner and Duffy (1989). When some covariates are continuous, or when they are discrete but $p$ is large, a parameterized covariate-dependent model can be considered instead, as for instance in a generalized linear model, $P(y_i = j \mid x_i, \alpha) = \Phi_j(x_i^t \alpha)$, $j = 1, \ldots, J$; $i = 1, \ldots, n$; $\alpha \in \mathbb{R}^p$. (2.2) In ....

Santner, T.J. and Duffy, D. (1989) The Statistical Analysis of Discrete Data. Springer-Verlag, New York.


Dirichlet Mixtures: A Method for Improving.. - Sjölander.. (1996)   (3 citations)

....mixture was used in experiments elsewhere (Tatusov et al. 1994; Henikoff and Henikoff, 1995). 3 Mathematical Foundations. 3.1 What are Dirichlet densities? A Dirichlet density $\rho$ is a probability density over the set of all probability vectors $p$ (i.e. $p_i \geq 0$ and $\sum_i p_i = 1$) (Berger, 1985; Santner and Duffy, 1989). In the case of proteins, with a 20-letter alphabet, $p = p_1, \ldots, p_{20}$ and $p_i$ = Prob(amino acid $i$). Here, each vector $p$ represents a possible probability distribution over the 20 amino acids. A Dirichlet density has parameters $\alpha = \alpha_1, \ldots, \alpha_{20}$, $\alpha_i > 0$. The value of the ....

Santner, T. J. and Duffy, D. E. 1989. The Statistical Analysis of Discrete Data. Springer Verlag, New York.


Dirichlet Mixtures: A Method for Improved.. - Sjölander.. (1996)   (1 citation)

....about amino acid distributions that typically occur in columns of multiple alignments into the process of building a statistical model. We present a method to condense the information in databases of multiple alignments into a mixture of Dirichlet densities (Bernardo and Smith, 1994; Berger, 1985; Santner and Duffy, 1989) over amino acid distributions, and to combine this prior information with the observed amino acids to form more effective estimates of the expected distributions. Multiple alignments used in these experiments were taken from the Blocks database (Henikoff and Henikoff, 1991). We use Maximum ....

....smoothly between reliance on the prior information concerning likely amino acid distributions, in the absence of data, and confidence in the amino acid frequencies observed at each position, given sufficient data. 1.3 What is a Dirichlet density? A Dirichlet density $\rho$ (Berger, 1985; Santner and Duffy, 1989) is a probability density over the set of all probability vectors $p$ (i.e. $p_i \geq 0$ and $\sum_i p_i = 1$). Proteins have a 20-letter alphabet, with $p_i$ = Prob(amino acid $i$). Each vector $p$ represents a possible probability distribution over the 20 amino acids. A Dirichlet density has parameters $\alpha$ ....

Santner, T. J. and Duffy, D. E. 1989. The Statistical Analysis of Discrete Data. Springer Verlag, New York.


Markov Switching Time Series Models with Application to a.. - Lu, Berliner (1999)   (4 citations)

....probabilities. In general, one can employ multinomial regression; this introduces no conceptual differences, though calculations are a bit more intricate. Binary regression facilitates the incorporation of the Y and X series as covariates. For introductions to this area, see Agresti (1990) and Santner and Duffy (1990). First, note that the resulting models for the transition probabilities are time dependent; i.e. the chain is not stationary. Second, to model the transition probabilities, one may choose an arbitrary probability distribution function, say H. A particular transition probability is then modeled ....

Santner, T. J., and Duffy, D. E. (1990). The Statistical Analysis of Discrete Data.


Fitting a Mixture Model by Expectation Maximization to.. - Bailey, Elkan (1994)   (31 citations)

.... and Lawrence et al. 1993] the equations above for $f_{ij}$ are replaced by $f_{ij} = \frac{c_{ij} + \beta_j}{\sum_{k=1}^{L} c_{ik} + \beta}$, $i = 0, \ldots, W$; $j = 1, \ldots, L$; $\beta = \sum_{k=1}^{L} \beta_k$. (20) This turns out to be equivalent to using the Bayes estimate for the value of $\theta$ under squared error loss (SEL) [Santner and Duffy, 1989] assuming that the prior distribution of each $\theta_j$, $P(\theta_j)$, is the so-called Dirichlet distribution with parameter $\beta' = (\beta_1, \ldots, \beta_L)$. The value of $\beta'$ must be chosen by the user depending on what information is available about the distribution of $\theta_j$ for motifs and for the ....

T. J. Santner and D. E. Duffy. The Statistical Analysis of Discrete Data. Springer Verlag, 1989.
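
Note on the excerpt above: the pseudocount form of equation (20) is the posterior mean of a Dirichlet prior updated with observed counts, which is the Bayes estimate under squared error loss. A minimal Python sketch for a single position follows; the function name `posterior_mean_frequencies` and the example counts and prior parameters are hypothetical.

```python
def posterior_mean_frequencies(counts, beta):
    """Letter frequencies for one position: (c_j + beta_j) / (sum_k c_k + sum_k beta_k).

    `counts` holds the observed letter counts at the position and `beta`
    the Dirichlet prior parameters, one per letter of the alphabet.
    """
    total = sum(counts) + sum(beta)
    return [(c + b) / total for c, b in zip(counts, beta)]

# Hypothetical DNA position (A, C, G, T) with made-up counts and a uniform prior.
counts = [3, 0, 1, 0]
beta = [1.0, 1.0, 1.0, 1.0]
freqs = posterior_mean_frequencies(counts, beta)
```

Letters that were never observed still receive a nonzero frequency, and the estimates sum to 1; larger prior parameters pull the estimate further toward the prior mean.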


Dirichlet Mixtures: A Method for Improved Detection of Weak - But Significant Protein

No context found.

Santner, T. J. and Duffy, D. E. 1989. The Statistical Analysis of Discrete Data. Springer Verlag, New York.


Supervised Learning of Bayesian Network Parameters Made .. - Wettig, Grünwald.. (2002)

No context found.

T. Santner and D. Duffy. 1989. The Statistical Analysis of Discrete Data. Springer Verlag, New York.


Categorizing Web Queries According to - Geographical Locality Luis (2003)

No context found.

T. J. Santner and D. E. Duffy. The Statistical Analysis of Discrete Data. Springer-Verlag, New York, 1989.


Learning Anchor Verbs for Biological Interaction.. - Vasileios.. (2002)

No context found.

J. Santner, D.E. Duffy, The Statistical Analysis of Discrete Data, Springer Verlag, New York, 1989.


A Quantitative Evaluation of Linguistic Tests for - The Automatic Prediction (1995)

No context found.

Thomas J. Santner and Diane E. Duffy. 1989. The Statistical Analysis of Discrete Data. Springer-Verlag, New York.


Predicting the Semantic Orientation of Adjectives - Vasileios Hatzivassiloglou And (1997)   (5 citations)

No context found.

Thomas J. Santner and Diane E. Duffy. 1989. The Statistical Analysis of Discrete Data. Springer-Verlag, New York.


Corpus-Based Linguistic Indicators for Aspectual.. - Eric Siegel Department (1999)

No context found.

T.J. Santner and D.E. Duffy. 1989. The Statistical Analysis of Discrete Data. Springer-Verlag, New York.


Effects of Adjective Orientation and Gradability.. - Vasileios.. (2000)   (2 citations)

No context found.

Thomas J. Santner and Diane E. Duffy. 1989. The Statistical Analysis of Discrete Data. Springer-Verlag, New York.


Interval Estimation in Exponential Families - Brown, Cai, DasGupta

No context found.

Santner, T.J. & Duffy, D.E. (1989). The Statistical Analysis of Discrete Data. Springer-Verlag, Berlin.


Interval Estimation for a Binomial Proportion - Brown, Cai, DasGupta (2001)

No context found.

Santner, T.J. & Duffy, D.E. (1989). The Statistical Analysis of Discrete Data. Springer-Verlag, Berlin.


Partially Improper Gaussian Priors for Nonparametric.. - Nandini Raghavan (1995)

No context found.

Thomas J. Santner and Diane E. Duffy. The Statistical Analysis of Discrete Data. Springer-Verlag, New York, 1989.


.2 Kinase experiments - Protein Kinases

No context found.

T.J. Santner and D.E. Duffy. The Statistical Analysis of Discrete Data. Springer Verlag, 1989.

CiteSeer.IST - Copyright Penn State and NEC