Tutorial on maximum likelihood estimation. (2003)
Venue: Journal of Mathematical Psychology
Citations: 115 (3 self)
Citations
5132 | Optimization by simulated annealing
- Kirkpatrick, Gelatt, Vecchi
- 1983
Citation Context ...ir effectiveness. For example, one may choose different starting values over multiple runs of the iteration procedure and then examine the results to see whether the same solution is obtained repeatedly. When that happens, one can conclude with some confidence that a global maximum has been found. [Footnote 2: A stochastic optimization algorithm known as simulated annealing (Kirkpatrick, Gelatt, & Vecchi, 1983) can overcome the local maxima problem, at least in theory, though the algorithm may not be a feasible option in practice as it may take an unrealistically long time to find the solution.] [Fig. 3 caption: A schematic plot of the log-likelihood function for a fictitious one-parameter model. Point B is the global maximum whereas points A and C are two local maxima. The series of arrows depicts an iterative optimization process.] 3.3. Relation to least-squares estimation. Recall that in MLE we seek the parameter values that are most likely to have produced the data. In LSE, on the other hand, we seek the parameter values that provide the most accurate description of the data, measured in terms of how closely the model fits the data under the square-loss function. Formall...
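The multi-start strategy described in this passage is easy to script. Below is a minimal Matlab sketch (not the paper's appendix listing): it runs fminsearch from several random starting values on a binomial negative log-likelihood for the exponential retention model and prints each solution so agreement across runs can be checked. The data values, number of trials n, and starting range are hypothetical placeholders.

% Multi-start optimization: run fminsearch from several random starting
% points and compare the solutions found (data and settings are assumed).
t = [1 3 6 9 12 18]';                   % retention intervals (hypothetical)
y = [0.94 0.77 0.40 0.26 0.24 0.16]';   % proportions correct (hypothetical)
n = 100;                                % trials per interval (assumed)

% Exponential-model predictions, clamped to (0,1) for numerical safety
pred = @(w) min(max(w(1) * exp(-w(2) * t), eps), 1 - eps);
% Binomial negative log-likelihood, omitting the binomial coefficient,
% which does not depend on w
negloglik = @(w) -sum(n*y .* log(pred(w)) + n*(1 - y) .* log(1 - pred(w)));

nruns = 10; best = Inf; wbest = [NaN NaN];
for r = 1:nruns
    w0 = rand(1, 2);                    % random start in (0,1)^2 (assumed range)
    [w, f] = fminsearch(negloglik, w0);
    fprintf('run %2d: w1 = %.4f, w2 = %.4f, -lnL = %.4f\n', r, w(1), w(2), f);
    if f < best
        best = f; wbest = w;
    end
end
fprintf('best solution: w1 = %.4f, w2 = %.4f\n', wbest(1), wbest(2));

If all runs report essentially the same parameter values and objective, one can conclude with some confidence, as the passage notes, that the global maximum has been found.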
4300 | Estimating the dimension of a model
- Schwarz
- 1978
Citation Context ...ation used). In contrast, no such things can be said about LSE. As such, most statisticians would not view LSE as a general method for parameter estimation, but rather as an approach that is primarily used with linear regression models. Further, many of the inference methods in statistics are developed based on MLE. For example, MLE is a prerequisite for the chi-square test, the G-square test, Bayesian methods, inference with missing data, modeling of random effects, and many model selection criteria such as the Akaike information criterion (Akaike, 1973) and the Bayesian information criterion (Schwarz, 1978). In this tutorial paper, I introduce the maximum likelihood estimation method for mathematical modeling. The paper is written for researchers who are primarily involved in empirical work and publish in experimental journals (e.g. Journal of Experimental Psychology) but do modeling. The paper is intended to serve as a stepping stone for the modeler to move beyond the current practice of using LSE to more informed modeling ...
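Both criteria mentioned here are one-liners once the log-likelihood has been maximized: AIC = -2 lnL + 2k (Akaike, 1973) and BIC = -2 lnL + k ln n (Schwarz, 1978), for k free parameters and n observations. The Matlab sketch below assumes a maximized log-likelihood value obtained elsewhere; the numbers are hypothetical.

% Model selection criteria from a maximized log-likelihood
lnL = -203.4;              % maximized log-likelihood (assumed value)
k   = 2;                   % free parameters, e.g. (w1, w2)
n   = 600;                 % total number of observations (assumed)

AIC = -2*lnL + 2*k;        % Akaike information criterion
BIC = -2*lnL + k*log(n);   % Bayesian information criterion
fprintf('AIC = %.2f, BIC = %.2f\n', AIC, BIC);
% When comparing models fit to the same data, lower values are preferred.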
764 | A theory of memory retrieval
- Ratcliff
- 1978
Citation Context ...of estimation can have non-trivial consequences. In general, LSE estimates tend to differ from MLE estimates, especially for data that are not normally distributed such as proportion correct and response time. An implication is that one might possibly arrive at different conclusions about the same data set depending upon which method of estimation is employed in analyzing the data. When this occurs, MLE should be preferred to LSE, unless the probability density function is unknown or difficult to obtain in an easily computable form, for instance, for the diffusion model of recognition memory (Ratcliff, 1978). There is a situation, however, in which the two methods intersect. This is when observations are independent of one another and are normally distributed with a constant variance. In this case, maximization of the log-likelihood is equivalent to minimization of SSE, and therefore, the same parameter values are obtained under either MLE or LSE. 4. Illustrative example. In this section, I present an application example of maximum likelihood estimation. To illustrate the method, I chose forgetting data given the recent surge of interest in this topic (e.g. Rubin & Wenzel, 1996; Wickens, 1998; Wi...
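The equivalence noted in this passage follows in one line from the form of the normal log-likelihood. Using the paper's notation, with $m$ independent observations $y_i$, model predictions $p(w, t_i)$, and constant variance $\sigma^2$:

$$\ln L(w \mid y) = -\frac{m}{2}\ln(2\pi\sigma^2) \;-\; \frac{1}{2\sigma^2}\sum_{i=1}^{m}\bigl(y_i - p(w, t_i)\bigr)^2 \;=\; \mathrm{const} \;-\; \frac{\mathrm{SSE}(w)}{2\sigma^2},$$

so the $w$ that maximizes $\ln L$ is exactly the $w$ that minimizes $\mathrm{SSE}(w)$, whatever the value of $\sigma^2$.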
477 | The time course of perceptual choice: the Leaky, Competing Accumulator model
- Usher, McClelland
- 2001
Citation Context ... viability of such models. Once a model is specified with its parameters, and data have been collected, one is in a position to evaluate its goodness of fit, that is, how well it fits the observed data. Goodness of fit is assessed by finding the parameter values of a model that best fit the data, a procedure called parameter estimation. There are two general methods of parameter estimation: least-squares estimation (LSE) and maximum likelihood estimation (MLE). The former has been a popular choice of model fitting in psychology (e.g., Rubin, Hinton, & Wenzel, 1999; Lamberts, 2000; but see Usher & McClelland, 2001) and is tied to many familiar statistical concepts such as linear regression, sum of squares error, proportion of variance accounted for (i.e., r²), and root mean squared deviation. LSE, which unlike MLE requires no or minimal distributional assumptions, is useful for obtaining a descriptive measure for the purpose of summarizing observed data, but it has no basis for testing hypotheses or constructing confidence intervals. On the other hand, MLE is not as widely recognized among modelers in psychology, but it is a standard approach to parameter estimation and inference in statistics. MLE has many...
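The descriptive measures tied to LSE in this passage (sum of squares error, r², root mean squared deviation) all reduce to a few lines given observations and model predictions. A minimal Matlab sketch, with hypothetical data and predictions standing in for a fitted model:

% Descriptive fit measures commonly reported with LSE
y    = [0.94 0.77 0.40 0.26 0.24 0.16]';   % observed values (hypothetical)
pred = [0.92 0.74 0.46 0.29 0.19 0.13]';   % model predictions (hypothetical)
m    = numel(y);

SSE  = sum((y - pred).^2);                 % sum of squares error
r2   = 1 - SSE / sum((y - mean(y)).^2);    % proportion of variance accounted for
RMSD = sqrt(SSE / m);                      % root mean squared deviation
fprintf('SSE = %.4f, r^2 = %.4f, RMSD = %.4f\n', SSE, r2, RMSD);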
231 | Model Selection
- Linhart, Zucchini
- 1986
Citation Context ...elity to the underlying process. For example, it is well established in statistics that a complex model with many parameters fits data better than a simple model with few parameters, even if it is the latter that generated the data. The central question is then how one should decide among a set of competing models. A short answer is that a model should be selected based on its generalizability, which is defined as a model's ability not only to fit the current data but also to predict future data. For a thorough treatment of this and related issues in model selection, the reader is referred elsewhere (e.g. Linhart & Zucchini, 1986; Myung, Forster, & Browne, 2000; Pitt, Myung, & Zhang, 2002). 5. Concluding remarks. This article provides a tutorial exposition of maximum likelihood estimation. MLE is of fundamental importance in the theory of inference and is a basis of many inferential techniques in statistics, unlike LSE, which is primarily a descriptive tool. In this paper, I provide a simple, intuitive explanation of the method so that the reader can have a grasp of some of the basic principles. I hope the reader will apply the method in his or her mathematical modeling efforts so a plethora of widely available MLE-bas...
157 | Toward a method of selecting among computational models of cognition.
- Pitt, Myung, et al.
- 2002
Citation Context ...stablished in statistics that a complex model with many parameters fits data better than a simple model with few parameters, even if it is the latter that generated the data. The central question is then how one should decide among a set of competing models. A short answer is that a model should be selected based on its generalizability, which is defined as a model's ability not only to fit the current data but also to predict future data. For a thorough treatment of this and related issues in model selection, the reader is referred elsewhere (e.g. Linhart & Zucchini, 1986; Myung, Forster, & Browne, 2000; Pitt, Myung, & Zhang, 2002). 5. Concluding remarks. This article provides a tutorial exposition of maximum likelihood estimation. MLE is of fundamental importance in the theory of inference and is a basis of many inferential techniques in statistics, unlike LSE, which is primarily a descriptive tool. In this paper, I provide a simple, intuitive explanation of the method so that the reader can have a grasp of some of the basic principles. I hope the reader will apply the method in his or her mathematical modeling efforts so a plethora of widely available MLE-based analyses (e.g. Batchelder & Crowther, 1997; Van Zandt, 20...
88 | How to fit a response time distribution - Van Zandt - 2000
76 | On the form of forgetting
- Wixted, Ebbesen
- 1991
Citation Context ...8). There is a situation, however, in which the two methods intersect. This is when observations are independent of one another and are normally distributed with a constant variance. In this case, maximization of the log-likelihood is equivalent to minimization of SSE, and therefore, the same parameter values are obtained under either MLE or LSE. 4. Illustrative example. In this section, I present an application example of maximum likelihood estimation. To illustrate the method, I chose forgetting data given the recent surge of interest in this topic (e.g. Rubin & Wenzel, 1996; Wickens, 1998; Wixted & Ebbesen, 1991). Among a half-dozen retention functions that have been proposed and tested in the past, I provide an example of MLE for two functions, power and exponential. Let $w = (w_1, w_2)$ be the parameter vector, $t$ time, and $p(w, t)$ the model's prediction of the probability of correct recall at time $t$. The two models are defined as: power model: $p(w, t) = w_1 t^{-w_2}$ ($w_1, w_2 > 0$); exponential model: $p(w, t) = w_1 \exp(-w_2 t)$ ($w_1, w_2 > 0$) (Eq. 13). Suppose that data $y = (y_1, \ldots, y_m)$ consist of $m$ observations in which $y_i$ ($0 \le y_i \le 1$) represents an observed proportion of correct recall at time $t_i$ ($i = 1, \ldots, m$). We are interested ...
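The two retention functions of Eq. (13) translate directly into Matlab function handles. A minimal sketch; the time grid and parameter values below are illustrative, not from the paper:

% Power and exponential retention functions of Eq. (13)
power_model = @(w, t) w(1) .* t.^(-w(2));        % p(w,t) = w1*t^(-w2), w1, w2 > 0
expon_model = @(w, t) w(1) .* exp(-w(2) .* t);   % p(w,t) = w1*exp(-w2*t), w1, w2 > 0

t = (1:18)';          % retention intervals (hypothetical)
w = [0.95, 0.13];     % illustrative parameter values (assumed)
plot(t, power_model(w, t), 'o-', t, expon_model(w, t), 's-');
xlabel('time t'); ylabel('p(w, t)'); legend('power', 'exponential');

Plotting both curves for the same parameters makes the qualitative difference visible: the power function falls steeply at first and then flattens, while the exponential declines at a constant proportional rate.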
61 | Information-accumulation theory of speeded categorization.
- Lamberts
- 2000
Citation Context ... process by testing the viability of such models. Once a model is specified with its parameters, and data have been collected, one is in a position to evaluate its goodness of fit, that is, how well it fits the observed data. Goodness of fit is assessed by finding the parameter values of a model that best fit the data, a procedure called parameter estimation. There are two general methods of parameter estimation: least-squares estimation (LSE) and maximum likelihood estimation (MLE). The former has been a popular choice of model fitting in psychology (e.g., Rubin, Hinton, & Wenzel, 1999; Lamberts, 2000; but see Usher & McClelland, 2001) and is tied to many familiar statistical concepts such as linear regression, sum of squares error, proportion of variance accounted for (i.e., r²), and root mean squared deviation. LSE, which unlike MLE requires no or minimal distributional assumptions, is useful for obtaining a descriptive measure for the purpose of summarizing observed data, but it has no basis for testing hypotheses or constructing confidence intervals. On the other hand, MLE is not as widely recognized among modelers in psychology, but it is a standard approach to parameter estimation and inf...
55 | Mathematical statistics.
- Bickel, Doksum
- 1977
Citation Context ...y involved in empirical work and publish in experimental journals (e.g. Journal of Experimental Psychology) but do modeling. The paper is intended to serve as a stepping stone for the modeler to move beyond the current practice of using LSE to more informed modeling analyses, thereby expanding his or her repertoire of statistical instruments, especially in non-linear modeling. The purpose of the paper is to provide a good conceptual understanding of the method with concrete examples. For in-depth, technically more rigorous treatment of the topic, the reader is directed to other sources (e.g., Bickel & Doksum, 1977, Chap. 3; Casella & Berger, 2002, Chap. 7; DeGroot & Schervish, 2002, Chap. 6; Spanos, 1999, Chap. 13). 2. Model specification. 2.1. Probability density function. From a statistical standpoint, the data vector $y = (y_1, \ldots, y_m)$ is a random sample from an unknown population. The goal of data analysis is to identify the population that is most likely to have generated the sample. In statistics, each population is identified by a corresponding probability distribution. Associated with each probability distribution is a unique value of the model's parameter. As the parameter changes in value, differen...
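The idea that each parameter value picks out one distribution from a family can be made concrete with a tiny example. The sketch below (hypothetical n and observed count; base Matlab only, using nchoosek rather than any toolbox function) sweeps the success probability of a binomial distribution and evaluates the likelihood of one fixed observation, which is exactly the construction the tutorial builds on:

% A family of distributions indexed by a parameter: the binomial PMF
% for fixed n, as the success probability p varies (values assumed).
n = 10; y = 7;                              % n trials, y successes (hypothetical)
p = 0.01:0.01:0.99;                         % candidate parameter values
L = nchoosek(n, y) * p.^y .* (1 - p).^(n - y);  % likelihood of y given each p
[~, imax] = max(L);
fprintf('likelihood peaks at p = %.2f (y/n = %.2f)\n', p(imax), y/n);
plot(p, L); xlabel('p'); ylabel('L(p | y)');

As the parameter p changes, a different probability distribution is generated; the likelihood function simply reads this family "sideways", asking which member makes the observed data most probable.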
44 | The precise time course of retention.
- Rubin, Hinton, et al.
- 1999
Citation Context ...duce the form of the underlying process by testing the viability of such models. Once a model is specified with its parameters, and data have been collected, one is in a position to evaluate its goodness of fit, that is, how well it fits the observed data. Goodness of fit is assessed by finding the parameter values of a model that best fit the data, a procedure called parameter estimation. There are two general methods of parameter estimation: least-squares estimation (LSE) and maximum likelihood estimation (MLE). The former has been a popular choice of model fitting in psychology (e.g., Rubin, Hinton, & Wenzel, 1999; Lamberts, 2000; but see Usher & McClelland, 2001) and is tied to many familiar statistical concepts such as linear regression, sum of squares error, proportion of variance accounted for (i.e., r²), and root mean squared deviation. LSE, which unlike MLE requires no or minimal distributional assumptions, is useful for obtaining a descriptive measure for the purpose of summarizing observed data, but it has no basis for testing hypotheses or constructing confidence intervals. On the other hand, MLE is not as widely recognized among modelers in psychology, but it is a standard approach to parameter es...
35 | A special issue on model selection.
- Myung, Forster, et al.
- 2000
Citation Context ...ocess. For example, it is well established in statistics that a complex model with many parameters fits data better than a simple model with few parameters, even if it is the latter that generated the data. The central question is then how one should decide among a set of competing models. A short answer is that a model should be selected based on its generalizability, which is defined as a model's ability not only to fit the current data but also to predict future data. For a thorough treatment of this and related issues in model selection, the reader is referred elsewhere (e.g. Linhart & Zucchini, 1986; Myung, Forster, & Browne, 2000; Pitt, Myung, & Zhang, 2002). 5. Concluding remarks. This article provides a tutorial exposition of maximum likelihood estimation. MLE is of fundamental importance in the theory of inference and is a basis of many inferential techniques in statistics, unlike LSE, which is primarily a descriptive tool. In this paper, I provide a simple, intuitive explanation of the method so that the reader can have a grasp of some of the basic principles. I hope the reader will apply the method in his or her mathematical modeling efforts so a plethora of widely available MLE-based analyses (e.g. Batchelder & C...
24 | Probability theory and statistical inference.
- Spanos
- 1999
Citation Context ...ology) but do modeling. The paper is intended to serve as a stepping stone for the modeler to move beyond the current practice of using LSE to more informed modeling analyses, thereby expanding his or her repertoire of statistical instruments, especially in non-linear modeling. The purpose of the paper is to provide a good conceptual understanding of the method with concrete examples. For in-depth, technically more rigorous treatment of the topic, the reader is directed to other sources (e.g., Bickel & Doksum, 1977, Chap. 3; Casella & Berger, 2002, Chap. 7; DeGroot & Schervish, 2002, Chap. 6; Spanos, 1999, Chap. 13). 2. Model specification. 2.1. Probability density function. From a statistical standpoint, the data vector $y = (y_1, \ldots, y_m)$ is a random sample from an unknown population. The goal of data analysis is to identify the population that is most likely to have generated the sample. In statistics, each population is identified by a corresponding probability distribution. Associated with each probability distribution is a unique value of the model's parameter. As the parameter changes in value, different probability distributions are generated. Formally, a model is defined as the family of pro...
5 | Multinomial processing tree models of factorial categorization.
- Batchelder, Crowther
- 1997
Citation Context ...& Browne, 2000; Pitt, Myung, & Zhang, 2002). 5. Concluding remarks. This article provides a tutorial exposition of maximum likelihood estimation. MLE is of fundamental importance in the theory of inference and is a basis of many inferential techniques in statistics, unlike LSE, which is primarily a descriptive tool. In this paper, I provide a simple, intuitive explanation of the method so that the reader can have a grasp of some of the basic principles. I hope the reader will apply the method in his or her mathematical modeling efforts so a plethora of widely available MLE-based analyses (e.g. Batchelder & Crowther, 1997; Van Zandt, 2000) can be performed on data, thereby extracting as much information and insight as possible into the underlying mental process under investigation. Acknowledgments. This work was supported by research Grant R01 MH57472 from the National Institute of Mental Health. The author thanks Mark Pitt, Richard Schweickert, and two anonymous reviewers for valuable comments on earlier versions of this paper. Appendix. This appendix presents Matlab code that performs MLE and LSE analyses for the example described in the text. Matlab Code for MLE: % This is the main program that finds MLE estim...
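The appendix listing itself is truncated in this excerpt. Below is a minimal Matlab sketch of the kind of program it describes, a reconstruction for illustration rather than the author's actual code: it fits the power retention model by maximum likelihood with fminsearch, under a binomial sampling assumption and with hypothetical data, trial counts, and starting values.

% A minimal MLE driver in the spirit of the truncated appendix listing
% (a reconstruction; data and starting guess are hypothetical).
t = [1 3 6 9 12 18]';                   % time intervals (hypothetical)
y = [0.94 0.77 0.40 0.26 0.24 0.16]';   % proportions correct (hypothetical)
n = 100;                                % trials per interval (assumed)

% Negative log-likelihood of the power model p(w,t) = w1*t^(-w2),
% with predictions clamped away from 0 and 1 for numerical safety;
% the binomial coefficient is omitted since it does not depend on w.
pred = @(w) min(max(w(1) .* t.^(-w(2)), eps), 1 - eps);
nll  = @(w) -sum(n*y .* log(pred(w)) + n*(1 - y) .* log(1 - pred(w)));

[w, fval] = fminsearch(nll, [0.5 0.5]);   % starting guess (assumed)
fprintf('MLE: w1 = %.4f, w2 = %.4f, lnL = %.4f\n', w(1), w(2), -fval);

Replacing pred with the exponential model gives the companion fit, and comparing the two maximized log-likelihoods (or the criteria computed from them) is the model comparison the tutorial walks through.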
4 | The retention of individual items. - Murdock, Jr - 1961
4 | On the form of the retention function:
- Wickens
- 1998
Citation Context ... (Ratcliff, 1978). There is a situation, however, in which the two methods intersect. This is when observations are independent of one another and are normally distributed with a constant variance. In this case, maximization of the log-likelihood is equivalent to minimization of SSE, and therefore, the same parameter values are obtained under either MLE or LSE. 4. Illustrative example. In this section, I present an application example of maximum likelihood estimation. To illustrate the method, I chose forgetting data given the recent surge of interest in this topic (e.g. Rubin & Wenzel, 1996; Wickens, 1998; Wixted & Ebbesen, 1991). Among a half-dozen retention functions that have been proposed and tested in the past, I provide an example of MLE for two functions, power and exponential. Let $w = (w_1, w_2)$ be the parameter vector, $t$ time, and $p(w, t)$ the model's prediction of the probability of correct recall at time $t$. The two models are defined as: power model: $p(w, t) = w_1 t^{-w_2}$ ($w_1, w_2 > 0$); exponential model: $p(w, t) = w_1 \exp(-w_2 t)$ ($w_1, w_2 > 0$) (Eq. 13). Suppose that data $y = (y_1, \ldots, y_m)$ consist of $m$ observations in which $y_i$ ($0 \le y_i \le 1$) represents an observed proportion of correct recall at time $t_i$ ($i = 1, \ldots$...
1 | Information theory and an extension of the maximum likelihood principle - Akaike - 1973
1 | One hundred years of forgetting: A quantitative description of retention. - Rubin, Wenzel - 1996