
## Model comparison and the principle of parsimony (2015)

Venue: In …

Citations: 1 (0 self)

### Citations

4628 |
A new look at statistical model identification
- Akaike
- 1974
Citation Context ...s. or BIC) value. The model with the smallest IC value should be preferred, but the extent of this preference is not immediately apparent. For better interpretation we can calculate IC model weights (Akaike, 1974b; Burnham & Anderson, 2002; Wagenmakers & Farrell, 2004). First, we compute, for each model i, the difference in IC with respect to the IC of the best candidate model: Δi = ICi − min IC. (3) This step... |
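The weight computation sketched in this context (Equation 3 plus a normalization step that the excerpt truncates) can be illustrated in a few lines of Python. The weight formula w_i = exp(−Δi/2) / Σj exp(−Δj/2) is the standard one from the cited Wagenmakers and Farrell (2004) paper, not visible in the excerpt, and the IC values below are invented:

```python
import math

def ic_weights(ic_values):
    # Equation 3: difference with respect to the best (smallest) IC value.
    deltas = [ic - min(ic_values) for ic in ic_values]
    # Standard IC-weight normalization (Wagenmakers & Farrell, 2004); the
    # chapter's own Equation 4 is truncated in this excerpt, so this step
    # is an assumption.
    unnorm = [math.exp(-d / 2.0) for d in deltas]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical IC values for three candidate models:
weights = ic_weights([302.4, 300.0, 307.9])
```

Subtracting min IC before exponentiating is what buys the numerical stability the context mentions: the largest exponent is pinned at zero.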

4307 | Estimating the dimension of a model - Schwarz - 1978 |

2893 | Information theory as an extension of the maximum likelihood principle - Akaike - 1973 |

2055 |
Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach
- Burnham, Anderson
- 2002
Citation Context ...tion loss between reality f and model g, where the discrepancy is quantified by the Kullback-Leibler divergence I(f, g), a distance metric between two probability distributions (for full details, see Burnham & Anderson, 2002). The AIC is unfortunately not consistent: as the number of observations grows infinitely large, AIC is not guaranteed to choose the true data generating model. In fact, there is cause to believe tha... |

1934 | Information Theory, Inference & Learning Algorithms. Cambridge University Press 2002 [http://www.inference.phy.cam.ac.uk/mackay/itila/book.html]
- MacKay
Citation Context ...e of minimum description length (Grünwald, 2000, 2007). In addition, Occam’s razor is automatically accommodated through Bayes factor model comparisons (e.g., Jeffreys, 1961; Jefferys & Berger, 1992; MacKay, 2003). Both minimum description length and Bayes factors feature prominently in this chapter as principled methods to quantify the tradeoff between parsimony and goodness-of-fit. Note that parsimony plays... |

1823 | Bayes factors
- Kass, Raftery
- 1995
Citation Context ...M1 | y)/p(M2 | y) is given by the ratio of marginal likelihoods m(y | M1)/m(y | M2) (see below for the definition of the marginal likelihood). This ratio is known as the Bayes factor (Jeffreys, 1961; Kass & Raftery, 1995). The log of the Bayes factor is often interpreted as the weight of evidence provided by the data (Good, 1985; for details see Berger & Pericchi, 1996; Bernardo & Smith, 1994; Gill, 2002; O’Hagan, 19... |

1559 |
Modeling by shortest data description
- Rissanen
- 1978
Citation Context ...re available based on the idea of model selection through data compression. These methods, most of them developed by Jorma Rissanen, fall under the general heading of minimum description length (MDL; Rissanen, 1978, 1987, 1996, 2001). In psychology, the MDL principle has been applied and promoted primarily by Grünwald (2000), Grünwald, Myung, and Pitt (2005), and Grünwald (2007), as well as Myung, Navarro, and ... |

1516 |
Inference from iterative simulation using multiple sequences
- Gelman, Rubin
- 1992
Citation Context ...ance density (i.e., a mixture of the uniform Beta density and the Beta posterior density, with a mixture weight w = 0.2 on the uniform component). confirmed by visual inspection and the R̂ statistic (Gelman & Rubin, 1992). The top panel of Figure 6 shows the posterior distributions for the no-conflict model. Although there is slightly more certainty about parameter p than there is about parameters q and c, the poster... |
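The R̂ statistic cited here (Gelman & Rubin, 1992) compares between-chain and within-chain variance; values near 1 suggest the chains have converged to the same distribution. A minimal sketch of the basic (non-split) version, assuming equal-length chains of one parameter:

```python
def rhat(chains):
    """Basic Gelman-Rubin potential scale reduction factor for a list of
    equal-length MCMC chains of a single parameter (non-split version)."""
    m = len(chains)        # number of chains
    n = len(chains[0])     # draws per chain
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    # Pooled variance estimate, then the scale reduction factor.
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5
```

Modern samplers report the split-R̂ refinement; this sketch only shows the quantity being estimated.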

1506 | Bayesian Theory
- Bernardo, Smith
- 2000
Citation Context ...es factor (Jeffreys, 1961; Kass & Raftery, 1995). The log of the Bayes factor is often interpreted as the weight of evidence provided by the data (Good, 1985; for details see Berger & Pericchi, 1996; Bernardo & Smith, 1994; Gill, 2002; O’Hagan, 1995). Thus, when the Bayes factor BF12 = m(y | M1)/m(y | M2) equals 5, the observed data y are 5 times more likely to occur under M1 than under M2; when BF12 equals 0.1, the obse... |
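The interpretation in this context (BF12 = 5 means the data are 5 times more likely under M1) is easy to reproduce numerically. A toy sketch, assuming a binomial experiment with a uniform Beta(1,1) prior on the rate under M1 and a fixed rate of 0.5 under M2; the models and data are illustrative, not the chapter's:

```python
from math import comb

def marginal_likelihood_uniform(k, n):
    # Under a uniform Beta(1,1) prior on the rate, the binomial marginal
    # likelihood integrates to exactly 1 / (n + 1), whatever k is.
    return 1.0 / (n + 1)

def likelihood_fixed_rate(k, n, theta=0.5):
    # M2 has no free parameter, so its "marginal" likelihood is just the
    # binomial likelihood evaluated at theta.
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)

k, n = 9, 10  # 9 successes in 10 trials
bf12 = marginal_likelihood_uniform(k, n) / likelihood_fixed_rate(k, n)
# bf12 > 1: these data are more likely under the free-rate model M1
# than under the fixed theta = 0.5 of M2.
```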

1145 |
Data analysis using regression and multilevel
- Gelman, Hill
- 2007
Citation Context ...tool to obtain Bayes factors. We hope that this methodology will facilitate the principled comparison of MPT... [Footnote 7: We confirmed the high quality of fit in a Bayesian framework using posterior predictives (Gelman & Hill, 2007), results not reported here.] [Figure: Anscombe’s Quartet, four panels, each with r = 0.816.] |

768 | A theory of memory retrieval
- Ratcliff
- 1978
Citation Context ...97), Anderson’s ACT-R model (Anderson et al., 2004), Cohen et al.’s PDP model (Cohen, Dunbar, & McClelland, 1990), Ratcliff’s diffusion model (Dutilh, Vandekerckhove, Tuerlinckx, & Wagenmakers, 2009; Ratcliff, 1978), or Brown and Heathcote’s linear ballistic accumulator model (Brown & Heathcote, 2005, 2008; Heathcote & Hayes, 2012). When various models provide competing accounts of the same data set, it can be ... |

645 | Toward an instance theory of automatization.
- Logan
- 1988
Citation Context ...re interesting is the observation that practice tends to improve performance such that most of the benefit is accrued early on, a pattern of diminishing returns that is well described by a power law (Logan, 1988; but see Heathcote, Brown, & Mewhort, 2000). This pattern occurs across so many different tasks (e.g., cigar rolling, maze solving, fact retrieval, and a variety of standard psychological tasks) that... |

619 |
Monte Carlo Methods
- Hammersley, Handscomb
- 1964
Citation Context ... case, draws from p(θ | M(·)) tend to result in low likelihoods and only few chance draws may have high likelihood. This problem can be overcome by a numerical technique known as importance sampling (Hammersley & Handscomb, 1964). Bayes factors (Jeffreys, 1961; Kass & Raftery, 1995) come with two main challenges, one practical and one conceptual. The practical challenge arises because Bayes factors are de... |
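The importance-sampling fix described here can be sketched for a conjugate toy case where the answer is known analytically. The sketch estimates a binomial marginal likelihood under a uniform prior, drawing from the mixture proposal described elsewhere in the excerpt (weight w = 0.2 on the uniform component, the rest on the conjugate Beta posterior); the data values are invented:

```python
import math
import random
from math import comb, lgamma

random.seed(1)

k, n = 7, 10  # invented data: 7 successes in 10 trials

def likelihood(theta):
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)

def beta_pdf(x, a, b):
    log_b = lgamma(a) + lgamma(b) - lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_b)

def sample_beta(a, b):
    # Beta draw built from two gamma draws (standard construction).
    x = random.gammavariate(a, 1.0)
    y = random.gammavariate(b, 1.0)
    return x / (x + y)

# Importance density: 0.2 * uniform + 0.8 * conjugate Beta(k+1, n-k+1) posterior.
w = 0.2
def g_pdf(theta):
    return w * 1.0 + (1 - w) * beta_pdf(theta, k + 1, n - k + 1)

def g_sample():
    return random.random() if random.random() < w else sample_beta(k + 1, n - k + 1)

draws = [g_sample() for _ in range(20000)]
# m(y) = E_g[ p(y | theta) * p(theta) / g(theta) ]; the uniform prior density is 1.
m_hat = sum(likelihood(t) / g_pdf(t) for t in draws) / len(draws)
# Analytic check: under a uniform prior the marginal likelihood is 1 / (n + 1).
```

The uniform component keeps the importance weights bounded in the tails, which is exactly why a mixture is preferred over the posterior alone.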

598 | Stochastic complexity - Rissanen - 1987 |

583 | Bayesian model selection in social research
- Raftery
- 1995
Citation Context ... BIC was derived as an approximation of a Bayesian hypothesis test using default parameter priors (the “unit information prior”; see below for more information on Bayesian hypothesis testing, and see Raftery, 1995, for more information on the BIC). The BIC is consistent: as the number of observations grows infinitely large, BIC is guaranteed to choose the true data generating model. Nevertheless, there is evid... |
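For reference, the BIC discussed in this context has the standard form BIC = −2 ln L̂ + k ln n (the chapter's own equation is not visible in the excerpt, so this form is the textbook definition), and placing it next to the AIC penalty 2k makes the consistency contrast concrete:

```python
import math

def bic(log_max_likelihood, k, n):
    # Schwarz's Bayesian information criterion: the penalty k * ln(n)
    # grows with the number of observations.
    return -2.0 * log_max_likelihood + k * math.log(n)

def aic(log_max_likelihood, k):
    # Akaike's information criterion: a constant penalty of 2 per parameter.
    return -2.0 * log_max_likelihood + 2 * k

# Once n exceeds e^2 (about 7.4 observations), BIC penalizes each extra
# parameter more heavily than AIC does, which is what drives consistency.
```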

511 | On the control of automatic processes: a parallel distributed processing account of the Stroop effect.
- Cohen, Dunbar, et al.
- 1990
Citation Context ...ions. For example, the effects of practice can also be accounted for by Rickard’s component power laws model (Rickard, 1997), Anderson’s ACT-R model (Anderson et al., 2004), Cohen et al.’s PDP model (Cohen, Dunbar, & McClelland, 1990), Ratcliff’s diffusion model (Dutilh, Vandekerckhove, Tuerlinckx, & Wagenmakers, 2009; Ratcliff, 1978), or Brown and Heathcote’s linear ballistic accumulator model (Brown & Heathcote, 2005, 2008; Hea... |

382 | An introduction to mcmc for machine learning - Andrieu, Freitas, et al. - 2003 |

355 |
Fisher information and stochastic complexity
- Rissanen
- 1996
Citation Context ...length. Unfortunately, it can be difficult to define the number of bits required to describe a model. Second, there is the Fisher information approximation (FIA; Pitt et al., 2002; Rissanen, 1996): FIA = −ln p(y | θ̂) + (k/2) ln(n/(2π)) + ln ∫_Θ √det I(θ) dθ, (5) where I(θ) denotes the Fisher information matrix of sample size 1 (Ly, Verhagen, & Wagenmakers, in preparation). I(θ) is a k... |
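Equation 5 can be evaluated in closed form for a one-parameter example. For a binomial rate model, the Fisher information of a single observation is I(θ) = 1/(θ(1−θ)), and the integral of √I(θ) over (0, 1) equals π, so the geometric-complexity term reduces to ln π. A sketch under that assumption (the model and data are illustrative):

```python
from math import comb, log, pi

def fia_binomial(k, n):
    """Fisher information approximation (Equation 5) for a binomial rate model.
    The complexity integral of sqrt(1 / (theta * (1 - theta))) over (0, 1)
    equals pi, so it enters as log(pi). Assumes 0 < k < n."""
    theta_hat = k / n
    log_max_lik = (log(comb(n, k)) + k * log(theta_hat)
                   + (n - k) * log(1 - theta_hat))
    num_params = 1
    return -log_max_lik + (num_params / 2) * log(n / (2 * pi)) + log(pi)
```

Unlike AIC and BIC, the last term depends on the functional form of the model through I(θ), not just on the parameter count.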

340 | Goodness-of-Fit Techniques, - D’Agostino, Stephens - 1986 |

337 |
Markov chain Monte Carlo: Stochastic Simulation for Bayesian Inference
- Gamerman
- 1997
Citation Context ... obtain. Fortunately, there are many approximate and exact methods to facilitate the computation of the Bayes factor (e.g., Ardia, Baştürk, Hoogerheide, & van Dijk, 2012; Chen, Shao, & Ibrahim, 2002; Gamerman & Lopes, 2006); in this chapter we focus on BIC (a crude approximation), the Savage-Dickey density ratio (applies only to nested models) and importance sampling. The conceptual challenge that Bayes factors bring i... |

300 | Bayesian model averaging: A tutorial
- Hoeting, Madigan, et al.
- 1999
Citation Context ...idate models (i.e., they express a degree to which we should prefer one model from the set as superior), but also provide a method to combine predictions across multiple models using model averaging (Hoeting, Madigan, Raftery, & Volinsky, 1999). Both AIC and BIC rely on an assessment of model complexity that is relatively crude, as it is determined entirely by the number of free parameters but not by the shape of the function through which... |
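Model averaging as described by this context can be sketched with IC weights. The exp(−Δ/2) normalization is the standard IC-weight formula, and the IC values and per-model predictions below are invented; Hoeting et al. (1999) describe the fully Bayesian version, which uses posterior model probabilities instead:

```python
import math

def ic_weights(ic_values):
    # exp(-Delta/2) weights, normalized to sum to one.
    deltas = [ic - min(ic_values) for ic in ic_values]
    unnorm = [math.exp(-d / 2.0) for d in deltas]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def model_average(ic_values, predictions):
    # Combine per-model point predictions, weighted by the IC weights.
    return sum(w * p for w, p in zip(ic_weights(ic_values), predictions))

# Two hypothetical models predicting a success probability:
avg = model_average([100.0, 102.0], [0.4, 0.6])
```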

274 |
A model for recognition memory: REM— retrieving effectively from memory
- Shiffrin, Steyvers
- 1997
Citation Context ... encompassing model). As is true for the information criteria and minimum description length methods, Bayes factors can be used to compare structurally very different models, such as for example REM (Shiffrin & Steyvers, 1997) versus ACT-R (Anderson et al., 2004), or the diffusion model (Ratcliff, 1978) versus the linear ballistic accumulator model (Brown & Heathcote, 2008). In other words, Bayes factors can be applied to... |

272 |
The Minimum Description Length Principle
- Grünwald
- 2007
Citation Context ...ve—considerable attention in the field of statistics, and the results of those efforts have been made accessible to psychologists through a series of recent special issues, books, and articles (e.g., Grünwald, 2007; Myung, Forster, & Browne, 2000; Pitt & Myung, 2002; Wagenmakers & Waldorp, 2006). Here we discuss several procedures for model comparison, with an emphasis on minimum description length and the Baye... |

244 |
The intrinsic Bayes factor for model selection and prediction
- Berger, Pericchi
- 1996
Citation Context ...ratio is known as the Bayes factor (Jeffreys, 1961; Kass & Raftery, 1995). The log of the Bayes factor is often interpreted as the weight of evidence provided by the data (Good, 1985; for details see Berger & Pericchi, 1996; Bernardo & Smith, 1994; Gill, 2002; O’Hagan, 1995). Thus, when the Bayes factor BF12 = m(y | M1)/m(y | M2) equals 5, the observed data y are 5 times more likely to occur under M1 than under M2; when B... |

219 | Polynomial splines and their tensor products in extended linear modeling
- Stone, Hansen, et al.
- 1997
Citation Context ...ty that d equals 0. This means that the data support M_NCM over M_DUM. The prior ordinate equals 1, and hence BF_NCM,DUM simply equals the posterior ordinate at d = 0. A nonparametric density estimator (Stone, Hansen, Kooperberg, & Truong, 1997) that respects the bound at 0 yields an estimate of 2.81. This estimate is close to 2.77, the estimate from the importance sampling approach. The Savage-Dickey density ratio test can be applied simil... |

197 | JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling;
- Plummer
- 2003
Citation Context ...e results of the Bayes factor model comparison we first inspect the posterior distributions. The posterior distributions were approximated using Markov chain Monte Carlo sampling implemented in JAGS (Plummer, 2003) and WinBUGS (Lunn, Jackson, Best, Thomas, & Spiegelhalter, 2012). All code is available on the authors’ websites. Convergence was... [Footnote 5: The second author used WinBUGS, the first and third authors used J...] |

192 |
Monte Carlo methods in Bayesian computation.
- Chen, Shao, et al.
- 2000
Citation Context ...es factor can be difficult to obtain. Fortunately, there are many approximate and exact methods to facilitate the computation of the Bayes factor (e.g., Ardia, Baştürk, Hoogerheide, & van Dijk, 2012; Chen, Shao, & Ibrahim, 2002; Gamerman & Lopes, 2006); in this chapter we focus on BIC (a crude approximation), the Savage-Dickey density ratio (applies only to nested models) and importance sampling. The conceptual challenge th... |

188 | Finite Mixture and Markov Switching Models - Fruhwirth-Schnatter - 2006 |

178 | Fractional Bayes factors for model comparison (with discussion - O'Hagan - 1995 |

159 | Semantic integration of verbal information into a visual memory. - Loftus, Miller, et al. - 1978 |

157 | Toward a method of selecting among computational models of cognition.
- Pitt, Myung, et al.
- 2002
Citation Context ...g the summed code length. Unfortunately, it can be difficult to define the number of bits required to describe a model. Second, there is the Fisher information approximation (FIA; Pitt et al., 2002; Rissanen, 1996): FIA = −ln p(y | θ̂) + (k/2) ln(n/(2π)) + ln ∫_Θ √det I(θ) dθ, (5) where I(θ) denotes the Fisher information matrix of sample size 1 (Ly, Verhagen, & Wagenmakers, in preparati... |

133 |
Applying Occam’s razor in modeling cognition: A Bayesian approach.
- Myung, Pitt
- 1997
Citation Context ...ulation of different standards of evidence. Bayes factors negotiate the tradeoff between parsimony and goodness-of-fit and implement an automatic Occam’s razor (Jefferys & Berger, 1992; MacKay, 2003; Myung & Pitt, 1997). To see this, consider that the marginal likelihood m(y | M(·)) can be expressed as ∫_Θ p(y | θ, M(·)) p(θ | M(·)) dθ: an average across the entire parameter space, with the prior providing the averag... |

123 |
Computing Bayes factors using a generalization of the Savage-Dickey density ratio
- Verdinelli, Wasserman
- 1995
Citation Context ...z (1970), who attributed it to Leonard J. “Jimmie” Savage. The result is now generally known as the Savage-Dickey density ratio (e.g., Dickey, 1971; for extensions and generalizations see Chen, 2005; Verdinelli & Wasserman, 1995; Wetzels, Grasman, & Wagenmakers, 2010; for an introduction for psychologists see Wagenmakers, Lodewyckx, Kuriyal, & Grasman, 2010; a short mathematical proof is presented in O’Hagan & Forster, 2004,... |

120 |
AIC model selection using Akaike weights.
- Wagenmakers, Farrell
- 2004
Citation Context ...lest IC value should be preferred, but the extent of this preference is not immediately apparent. For better interpretation we can calculate IC model weights (Akaike, 1974b; Burnham & Anderson, 2002; Wagenmakers & Farrell, 2004). First, we compute, for each model i, the difference in IC with respect to the IC of the best candidate model: Δi = ICi − min IC. (3) This step is taken to increase numerical stability, but it also s... |

116 |
Bayesian methods: A Social and Behavioral Sciences Approach, Chapman and Hall,
- Gill
- 2002
Citation Context ...1; Kass & Raftery, 1995). The log of the Bayes factor is often interpreted as the weight of evidence provided by the data (Good, 1985; for details see Berger & Pericchi, 1996; Bernardo & Smith, 1994; Gill, 2002; O’Hagan, 1995). Thus, when the Bayes factor BF12 = m(y | M1)/m(y | M2) equals 5, the observed data y are 5 times more likely to occur under M1 than under M2; when BF12 equals 0.1, the observed data ar... |

114 |
The importance of complexity in model selection.
- Myung
- 2000
Citation Context ...well-known of these). Another measure involves the likelihood function, which expresses the likelihood of observing the data under the model, and is maximized by the best fitting parameter estimates (Myung, 2000). Occam’s razor (sometimes Ockham’s) is named after the English philosopher and Franciscan friar Father William of Occam (c.1288-c.1348), who wrote “Numquam ponenda est pluralitas ... |

108 |
The power law repealed: The case for an exponential law of practice
- Heathcote, Brown, et al.
- 2000
Citation Context ... observation that practice tends to improve performance such that most of the benefit is accrued early on, a pattern of diminishing returns that is well described by a power law (Logan, 1988; but see Heathcote, Brown, & Mewhort, 2000). This pattern occurs across so many different tasks (e.g., cigar rolling, maze solving, fact retrieval, and a variety of standard psychological tasks) that it is known as the “power law of practice”... |

92 |
Ockham’s razor and Bayesian analysis.
- Jefferys, Berger
- 1992
Citation Context ...undation for the principle of minimum description length (Grünwald, 2000, 2007). In addition, Occam’s razor is automatically accommodated through Bayes factor model comparisons (e.g., Jeffreys, 1961; Jefferys & Berger, 1992; MacKay, 2003). Both minimum description length and Bayes factors feature prominently in this chapter as principled methods to quantify the tradeoff between parsimony and goodness-of-fit. Note that p... |

84 | Mixtures of g priors for Bayesian variable selection
- Liang, Paulo, et al.
- 2008
Citation Context ...vercome this challenge one can either spend more time and effort on the specification of realistic priors, or else one can choose default priors that fulfill general desiderata (e.g., Jeffreys, 1961; Liang, Paulo, Molina, Clyde, & Berger, 2008). Finally, the robustness of the conclusions can be verified by conducting a sensitivity analysis in which one examines the effect of changing the prior specification (e.g., Wagenmakers, Wetzels, Bor... |

84 | Strong optimality of the normalized ML models as universal codes and information in data
- Rissanen
- 2001
Citation Context ... weights) can be obtained by multiplying FIA by 2 and then applying Equations 3 and 4. The third version of the MDL principle discussed here is normalized maximum likelihood (NML; Myung et al., 2006; Rissanen, 2001): NML = p(y | θ̂(y)) / ∫_X p(x | θ̂(x)) dx. (6) This equation shows that NML tempers the enthusiasm about a good fit to the observed data y (i.e., the numerator) to the extent that the model cou... |
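Equation 6 is directly computable for a binomial rate model, because the denominator, a sum of maximized likelihoods over every data set the model could have produced, runs over only n + 1 possible outcomes. A minimal sketch (the choice of model is illustrative):

```python
from math import comb

def max_lik(j, n):
    # Binomial likelihood evaluated at its own MLE theta_hat = j / n.
    # Python's 0 ** 0 == 1 handles the boundary cases j = 0 and j = n.
    th = j / n
    return comb(n, j) * th ** j * (1 - th) ** (n - j)

def nml_binomial(k, n):
    """Normalized maximum likelihood (Equation 6): the maximized likelihood
    of the observed count k, divided by the sum of maximized likelihoods
    over all counts the model could have produced."""
    return max_lik(k, n) / sum(max_lik(j, n) for j in range(n + 1))
```

Because every possible data set is fit at its own MLE, a flexible model pays in the denominator exactly what it gains in the numerator, which is how NML implements Occam's razor.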

80 |
When a good fit can be bad .
- Pitt, Myung
- 2002
Citation Context ...ant aspects and focus solely on the quantitative elements. The single most important quantitative element of model comparison relates to the ubiquitous tradeoff between parsimony and goodness-of-fit (Pitt & Myung, 2002). The motivating insight is that the appeal of an excellent fit to the data (i.e., high descriptive adequacy) needs to be tempered to the extent that the fit was achieved with a highly complex and po... |

78 | An instance theory of attention and memory. - Logan - 2002 |

72 | Theoretical and empirical review of multinomial processing tree modeling. - Batchelder, Riefer - 1999 |

72 | Shapes of reaction-time distributions and shapes of learning curves: A test of the instance theory of automatization.
- Logan
- 1992
Citation Context ...e latencies; in fact, they show a power law decrease in the entire response time distribution, that is, both the fast responses and the slow responses speed up with practice according to a power law (Logan, 1992). The observation that practice makes perfect is trivial, but the finding that practice-induced improvement follows a general law is not. Nevertheless, the power law of practice only provides a descri... |

64 | Model selection based on minimum description length.
- Grunwald
- 2000
Citation Context ...made as simple as possible, but no simpler”), and many others. In the field of statistical reasoning and inference, Occam’s razor forms the foundation for the principle of minimum description length (Grünwald, 2000, 2007). In addition, Occam’s razor is automatically accommodated through Bayes factor model comparisons (e.g., Jeffreys, 1961; Jefferys & Berger, 1992; MacKay, 2003). Both minimum description length ... |

64 | Bending the power law: A CMPL theory of strategy shifts and the automatization of cognitive skills. Journal of Experimental Psychology:
- Rickard
- 1997
Citation Context ...actice. The main reason is that single phenomena often afford different competing explanations. For example, the effects of practice can also be accounted for by Rickard’s component power laws model (Rickard, 1997), Anderson’s ACT-R model (Anderson et al., 2004), Cohen et al.’s PDP model (Cohen, Dunbar, & McClelland, 1990), Ratcliff’s diffusion model (Dutilh, Vandekerckhove, Tuerlinckx, & Wagenmakers, 2009; Ra... |

61 |
Multinomial modeling and the measurement of cognitive processes.
- Riefer, Batchelder
- 1988
Citation Context ...is a penalty for model complexity, measured by the number of adjustable model parameters. Multinomial processing tree models (Batchelder & Riefer, 1980; Chechile, 1973; Chechile & Meyer, 1976; Riefer & Batchelder, 1988) are psychological process models for categorical data. MPT models are used in two ways: as a psychometric tool to measure unobserved cognitive processes, and as a convenient formalization of competi... |

60 | Estimating Bayes factors via posterior simulation with the Laplace-Metropolis estimator
- Lewis, Raftery
- 1997
Citation Context ...d, consequently, the Bayes factor) automatically takes all these aspects into account. Bayes factors represent “the standard Bayesian solution to the hypothesis testing and model selection problems” (Lewis & Raftery, 1997, p. 648) and “the primary tool used in Bayesian inference for hypothesis testing and model selection” (Berger, 2006, p. 378), but their application is not without challenges (Box 14.3). Below we show... |

53 |
The weighted likelihood ratio, linear hypotheses on normal location parameters.
- Dickey
- 1971
Citation Context ...0. This surprising result was first published by Dickey and Lientz (1970), who attributed it to Leonard J. “Jimmie” Savage. The result is now generally known as the Savage-Dickey density ratio (e.g., Dickey, 1971; for extensions and generalizations see Chen, 2005; Verdinelli & Wasserman, 1995; Wetzels, Grasman, & Wagenmakers, 2010; for an introduction for psychologists see Wagenmakers, Lodewyckx, Kuriyal, & G... |

53 | Advances in minimum description length: Theory and applications. - Grünwald, Myung, et al. - 2005 |

52 |
Statistical analysis and the illusion of objectivity.
- Berger, Berry
- 1988
Citation Context ... Grünwald, 2007). Additionally, NML requires an integration over the entire set of possible data sets, which may be difficult to define as it depends on unknown decision processes in the researchers (Berger & Berry, 1988). Note that, since the computation of NML depends on the likelihood of data that might have occurred but did not, the procedure violates the likelihood principle, which states that all information ab... |

52 | Why psychologists must change the way they analyze their data: The case of psi. - Wagenmakers, Wetzels, et al. - 2011 |

50 |
The simplest complete model of choice reaction time: Linear ballistic accumulation.
- Brown, Heathcote
- 2008
Citation Context ...very different models, such as for example REM (Shiffrin & Steyvers, 1997) versus ACT-R (Anderson et al., 2004), or the diffusion model (Ratcliff, 1978) versus the linear ballistic accumulator model (Brown & Heathcote, 2008). In other words, Bayes factors can be applied to nested and non-nested models alike. For the models under consideration, however, there exists a nested structure that allows one to obtain the Bayes ... |

40 |
Weight of evidence: A brief survey,”
- Good
- 1985
Citation Context ... marginal likelihood). This ratio is known as the Bayes factor (Jeffreys, 1961; Kass & Raftery, 1995). The log of the Bayes factor is often interpreted as the weight of evidence provided by the data (Good, 1985; for details see Berger & Pericchi, 1996; Bernardo & Smith, 1994; Gill, 2002; O’Hagan, 1995). Thus, when the Bayes factor BF12 = m(y | M1)/m(y | M2) equals 5, the observed data y are 5 times more lik... |

38 | Kendall's Advanced Theory of Statistics, Vol. 2B: Bayesian Inference - O'Hagan - 1994 |

34 | Bayesian hypothesis testing for psychologists: A tutorial on the Savage-Dickey method.
- Wagenmakers, Lodewyckx, et al.
- 2010
Citation Context ...ey density ratio (e.g., Dickey, 1971; for extensions and generalizations see Chen, 2005; Verdinelli & Wasserman, 1995; Wetzels, Grasman, & Wagenmakers, 2010; for an introduction for psychologists see Wagenmakers, Lodewyckx, Kuriyal, & Grasman, 2010; a short mathematical proof is presented in O’Hagan & Forster, 2004, pp. 174-177). Thus, we can exploit the fact that M_NCM is nested in M_DUM and use the Savage-Dickey density ratio to obtain the Bay... |
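The Savage-Dickey density ratio introduced in this context reduces a nested-model Bayes factor to two density evaluations: the posterior and prior ordinates at the restricted parameter value. For a conjugate binomial example the result can be checked against the analytic Bayes factor; the models here are illustrative, not Wagenaar and Boer's MPT models:

```python
import math
from math import comb, lgamma

def beta_pdf(x, a, b):
    log_b = lgamma(a) + lgamma(b) - lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_b)

def savage_dickey_bf01(k, n, theta0=0.5, a=1.0, b=1.0):
    """BF01 for the point null theta = theta0 nested in a Beta(a, b) prior:
    the posterior ordinate at theta0 divided by the prior ordinate there.
    With a conjugate prior the posterior is Beta(a + k, b + n - k)."""
    posterior_ordinate = beta_pdf(theta0, a + k, b + n - k)
    prior_ordinate = beta_pdf(theta0, a, b)
    return posterior_ordinate / prior_ordinate

k, n = 7, 10
bf_sd = savage_dickey_bf01(k, n)
# Analytic check: BF01 = p(y | theta = 0.5) / m(y), and under the uniform
# prior the marginal likelihood m(y) equals 1 / (n + 1).
bf_analytic = comb(n, k) * 0.5 ** n * (n + 1)
```

In realistic models the posterior ordinate is not available analytically, which is why the chapter estimates it from MCMC output with a nonparametric density estimator.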

32 | The BUGS Book: A Practical Introduction to Bayesian Analysis, Chapman & Hall/CRC,
- Lunn, Jackson, et al.
- 2013
Citation Context ...r model comparison we first inspect the posterior distributions. The posterior distributions were approximated using Markov chain Monte Carlo sampling implemented in JAGS (Plummer, 2003) and WinBUGS (Lunn, Jackson, Best, Thomas, & Spiegelhalter, 2012). All code is available on the authors’ websites. Convergence was... [Footnote 5: The second author used WinBUGS, the first and third authors used JAGS.] [Figure residue omitted.] |

30 |
The Schwarz criterion and related methods for normal linear models
- Pauler
- 1998
Citation Context ...rnham & Anderson, 2002). Now consider a set of candidate models, Mi, i = 1, ..., m, each with a specific IC (AIC... [Footnote 2: Note that for hierarchical models, the definition of sample size n is more complicated (Pauler, 1998; Raftery, 1995).] [Figure residue omitted.] |

26 | Model selection by normalized maximum likelihood.
- Myung, Navarro, et al.
- 2006
Citation Context ...weights (or Rissanen weights) can be obtained by multiplying FIA by 2 and then applying Equations 3 and 4. The third version of the MDL principle discussed here is normalized maximum likelihood (NML; Myung et al., 2006; Rissanen, 2001): NML = p(y | θ̂(y)) / ∫_X p(x | θ̂(x)) dx. (6) This equation shows that NML tempers the enthusiasm about a good fit to the observed data y (i.e., the numerator) to the extent th... |

23 |
On the likelihood of a time series model.,”
- Akaike
- 1978
Citation Context ...s. or BIC) value. The model with the smallest IC value should be preferred, but the extent of this preference is not immediately apparent. For better interpretation we can calculate IC model weights (Akaike, 1974b; Burnham & Anderson, 2002; Wagenmakers & Farrell, 2004). First, we compute, for each model i, the difference in IC with respect to the IC of the best candidate model: Δi = ICi − min IC. (3) This step... |

23 | The weighted likelihood ratio, sharp hypotheses about chances, the order of a Markov chain. - Dickey, Lientz - 1970 |

23 |
Model selection [Special issue].
- Myung, Forster, et al.
- 2000
Citation Context ...attention in the field of statistics, and the results of those efforts have been made accessible to psychologists through a series of recent special issues, books, and articles (e.g., Grünwald, 2007; Myung, Forster, & Browne, 2000; Pitt & Myung, 2002; Wagenmakers & Waldorp, 2006). Here we discuss several procedures for model comparison, with an emphasis on minimum description length and the Bayes factor. Both procedures entail... |

21 |
A diffusion model decomposition of the practice effect.
- Dutilh, Wagenmakers, et al.
- 2009
Citation Context ... for by Rickard’s component power laws model (Rickard, 1997), Anderson’s ACT-R model (Anderson et al., 2004), Cohen et al.’s PDP model (Cohen, Dunbar, & McClelland, 1990), Ratcliff’s diffusion model (Dutilh, Vandekerckhove, Tuerlinckx, & Wagenmakers, 2009; Ratcliff, 1978), or Brown and Heathcote’s linear ballistic accumulator model (Brown & Heathcote, 2005, 2008; Heathcote & Hayes, 2012). When various models provide competing accounts of the same data... |

18 |
Comptutational modeling in cognition: Principles and practice.
- Lewandowsky, Farrell
- 2011
Citation Context ...t from a coherent set of assumptions about the underlying cognitive processes—a theory. Ideally, substantive psychological theories are formalized as quantitative models (Busemeyer & Diederich, 2010; Lewandowsky & Farrell, 2010). For example, the power law of practice has been explained by instance theory (Logan, 1992, ... [Acknowledgment footnote: This work was partially supported by the starting grant “Bayes or Bust” awarded by the European Research C...] |

17 |
Theory of probability, 3
- Jeffreys
- 1961
Citation Context ...zor forms the foundation for the principle of minimum description length (Grünwald, 2000, 2007). In addition, Occam’s razor is automatically accommodated through Bayes factor model comparisons (e.g., Jeffreys, 1961; Jefferys & Berger, 1992; MacKay, 2003). Both minimum description length and Bayes factors feature prominently in this chapter as principled methods to quantify the tradeoff between parsimony and goo... |

17 |
A hierarchical process dissociation model.
- Rouder, Lu, et al.
- 2008
Citation Context ... this chapter, however, our preferred method for fitting MPT models is Bayesian (Chechile & Meyer, 1976; Klauer, 2010; Lee & Wagenmakers, in press; Matzke, Dolan, Batchelder, & Wagenmakers, in press; Rouder, Lu, Morey, Sun, & Speckman, 2008; Smith & Batchelder, 2010). [Box 14.2: Popularity of multinomial processing tree models.] ...parameters k. The AIC estimates the expected information loss incurred when a probability distribution f (assoc... |

16 | Multinomial processing tree models: A review of the literature. Zeitschrift für Psychologie / - Erdfelder, Auer, et al. - 2009 |

14 |
Bayes factors.
- Berger
- 2006
Citation Context ...Bayesian solution to the hypothesis testing and model selection problems” (Lewis & Raftery, 1997, p. 648) and “the primary tool used in Bayesian inference for hypothesis testing and model selection” (Berger, 2006, p. 378), but their application is not without challenges (Box 14.3). Below we show how these challenges can be overcome for the general class of MPT models. Next we compare the results of our Bayes ... |

14 |
Hierarchical multinomial processing tree models: a latent-trait approach.
- Klauer
- 2010
Citation Context ...ann & Kellen, in press) with which we have good experiences. As will become apparent throughout this chapter, however, our preferred method for fitting MPT models is Bayesian (Chechile & Meyer, 1976; Klauer, 2010; Lee & Wagenmakers, in press; Matzke, Dolan, Batchelder, & Wagenmakers, in press; Rouder, Lu, Morey, Sun, & Speckman, 2008; Smith & Batchelder, 2010). [Box 14.2: Popularity of multinomial processing t... |

13 |
Model selection: Theoretical developments and applications [Special issue].
- Wagenmakers, Waldorp
- 2006
Citation Context ...s of those efforts have been made accessible to psychologists through a series of recent special issues, books, and articles (e.g., Grünwald, 2007; Myung, Forster, & Browne, 2000; Pitt & Myung, 2002; Wagenmakers & Waldorp, 2006). Here we discuss several procedures for model comparison, with an emphasis on minimum description length and the Bayes factor. Both procedures entail principled and general solutions to the tradeoff... |

9 |
An encompassing prior generalization of the Savage-Dickey density ratio.
- Wetzels, Grasman, et al.
- 2010
Citation Context ... Leonard J. “Jimmie” Savage. The result is now generally known as the Savage-Dickey density ratio (e.g., Dickey, 1971; for extensions and generalizations see Chen, 2005; Verdinelli & Wasserman, 1995; Wetzels, Grasman, & Wagenmakers, 2010; for an introduction for psychologists see Wagenmakers, Lodewyckx, Kuriyal, & Grasman, 2010; a short mathematical proof is presented in O’Hagan & Forster, 2004, pp. 174-177). Thus, we can exploit th...
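The Savage-Dickey density ratio mentioned in this excerpt states that, for a point hypothesis θ = θ0 nested within a model with prior p(θ), the Bayes factor in favor of the null is the posterior density at θ0 divided by the prior density at θ0. A minimal sketch using a conjugate beta-binomial setup (the data and the uniform prior are hypothetical, chosen only so the posterior is available in closed form):

```python
from math import gamma

def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * x**(a - 1) * (1 - x)**(b - 1)

# Hypothetical data: 7 successes in 10 trials; H0: theta = 0.5.
successes, n = 7, 10
theta0 = 0.5

# Uniform Beta(1, 1) prior on theta under H1; conjugacy gives the posterior.
a_post, b_post = 1 + successes, 1 + (n - successes)

# Savage-Dickey: BF01 = posterior density at theta0 / prior density at theta0.
bf01 = beta_pdf(theta0, a_post, b_post) / beta_pdf(theta0, 1, 1)
print(round(bf01, 4))  # ≈ 1.289: the data are mildly consistent with H0
```

Because only the prior and posterior ordinates at θ0 are needed, the ratio can also be read off from MCMC output when no closed-form posterior exists, which is what makes the identity useful for MPT models.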

8 | A comparative study of Monte Carlo methods for efficient evaluation of marginal likelihood. Computational Statistics & Data Analysis XX, forthcoming - Ardia, Baştürk, et al. - 2010 |

8 |
Separation of storage and retrieval factors in free recall of clusterable pairs
- Batchelder, Riefer
- 1980
Citation Context ...radeoff with goodness-of-fit. The second section summarizes the research of Wagenaar and Boer (1987) who carried out an experiment to compare three competing multinomial processing tree models (MPTs; Batchelder & Riefer, 1980); this model comparison exercise is used as a running example throughout the chapter. The third section outlines different methods for model comparison and applies them to Wagenaar and Boer’s MPT mod...

8 | Practice increases the efficiency of evidence accumulation in perceptual choice
- Brown, Heathcote
- 2005
Citation Context ...(Cohen, Dunbar, & McClelland, 1990), Ratcliff’s diffusion model (Dutilh, Vandekerckhove, Tuerlinckx, & Wagenmakers, 2009; Ratcliff, 1978), or Brown and Heathcote’s linear ballistic accumulator model (Brown & Heathcote, 2005, 2008; Heathcote & Hayes, 2012). When various models provide competing accounts of the same data set, it can be difficult to choose between them. The process of choosing between models is called mode...

8 |
Model selection and psychological theory: A discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC)
- Vrieze
- 2012
Citation Context ...uaranteed to choose the true data generating model. In fact, there is cause to believe that the AIC tends to select complex models that overfit the data (O’Hagan & Forster, 2004; for a discussion see Vrieze, 2012). Another information criterion, the BIC (“Bayesian information criterion”) was proposed by Schwarz (1978): BIC = −2 ln p(y | θ̂) + k ln n. (2) Here, the penalty term is k ln n, where n is the numb...
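The two criteria excerpted here differ only in their complexity penalty: AIC adds 2k to the deviance, while BIC adds k ln n. A small sketch with invented log-likelihoods (not values from the chapter) shows how the heavier BIC penalty can reverse a model ranking once ln n exceeds 2:

```python
from math import log

def aic(log_lik, k):
    # AIC = -2 ln L + 2k  (Akaike, 1973, 1974)
    return -2 * log_lik + 2 * k

def bic(log_lik, k, n):
    # BIC = -2 ln L + k ln n  (Schwarz, 1978)
    return -2 * log_lik + k * log(n)

# Hypothetical maximized log-likelihoods for two models fit to n = 200 observations.
models = {"simple": (-105.0, 3), "complex": (-101.5, 6)}
n = 200
for name, (ll, k) in models.items():
    print(f"{name}: AIC = {aic(ll, k):.1f}, BIC = {bic(ll, k, n):.1f}")

# With ln(200) ≈ 5.3 > 2, BIC punishes the three extra parameters more
# heavily than AIC: here AIC prefers the complex model, BIC the simple one.
```

Lower values indicate the preferred model under either criterion; the example illustrates why BIC is said to favor parsimony more strongly than AIC for all but very small samples.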

6 |
The Likelihood Principle (2nd ed.)
- Berger, Wolpert
- 1988
Citation Context ...ot, the procedure violates the likelihood principle, which states that all information about a parameter θ obtainable from an experiment is contained in the likelihood function for θ for the given y (Berger & Wolpert, 1988). Application to Multinomial Processing Tree Models. Using the parameter estimates from Table 1 and the code provided by Wu, Myung, and Batchelder (2010), we can compute the FIA for the three competi...

6 |
Computing marginal likelihoods from a single MCMC output
- Chen
- 2005
Citation Context ...ey and Lientz (1970), who attributed it to Leonard J. “Jimmie” Savage. The result is now generally known as the Savage-Dickey density ratio (e.g., Dickey, 1971; for extensions and generalizations see Chen, 2005; Verdinelli & Wasserman, 1995; Wetzels, Grasman, & Wagenmakers, 2010; for an introduction for psychologists see Wagenmakers, Lodewyckx, Kuriyal, & Grasman, 2010; a short mathematical proof is present...

5 |
Cognitive modeling. Thousand
- Busemeyer, Diederich
- 2010
Citation Context ... other. Such bridges are built from a coherent set of assumptions about the underlying cognitive processes—a theory. Ideally, substantive psychological theories are formalized as quantitative models (Busemeyer & Diederich, 2010; Lewandowsky & Farrell, 2010). For example, the power law of practice has been explained by instance theory (Logan, 1992, ...

4 | Using multinomial processing tree models to measure cognitive deficits in clinical populations - Batchelder, Riefer - 2007 |

4 |
Diffusion versus linear ballistic accumulation: different models for response time with different conclusions about psychological mechanisms
- Heathcote, A., et al.
- 2012
Citation Context ...990), Ratcliff’s diffusion model (Dutilh, Vandekerckhove, Tuerlinckx, & Wagenmakers, 2009; Ratcliff, 1978), or Brown and Heathcote’s linear ballistic accumulator model (Brown & Heathcote, 2005, 2008; Heathcote & Hayes, 2012). When various models provide competing accounts of the same data set, it can be difficult to choose between them. The process of choosing between models is called model comparison, model selection, ...

3 |
Beta–MPT: Multinomial processing tree models for addressing individual differences
- Smith, Batchelder
- 2010
Citation Context ...od for fitting MPT models is Bayesian (Chechile & Meyer, 1976; Klauer, 2010; Lee & Wagenmakers, in press; Matzke, Dolan, Batchelder, & Wagenmakers, in press; Rouder, Lu, Morey, Sun, & Speckman, 2008; Smith & Batchelder, 2010). Box 14.2: Popularity of multinomial processing tree models. parameters k. The AIC estimates the expected information loss incurred when a probability distribution f (associated with the true data-g...

3 |
Misleading postevent information: Testing parameterized models of integration in memory
- Wagenaar, Boer
- 1987
Citation Context ...ith parsimony and hence measure generalizability. We adopt their more accurate terminology here. Figure 2. A pair of pictures from the third phase (i.e., the recognition test) of (Wagenaar & Boer, 1987, reprinted with permission), containing the critical episode at the intersection. which nonetheless remains viable though temporarily inaccessible. Finally, a no-conflict model (NCM) simply states th...

2 | The flexibility of models of recognition memory: An analysis by the minimum description length principle - Klauer, Kellen - 2011 |

2 | Testing adaptive toolbox models: A Bayesian hierarchical approach - Scheibehenne, Rieskamp, et al. - 2013 |

2 | The mind–body equation revisited - Townsend - 1975 |

1 |
The relative storage and retrieval losses in short–term memory as a function of the similarity and amount of information processing in the interpolated task. Unpublished doctoral dissertation
- Chechile
- 1973
Citation Context ... parameter estimate; the second term 2k is a penalty for model complexity, measured by the number of adjustable model parameters k. Multinomial processing tree models (Batchelder & Riefer, 1980; Chechile, 1973; Chechile & Meyer, 1976; Riefer & Batchelder, 1988) are psychological process models for categorical data. MPT models are used in two ways: as a psychometric tool to measure unobserved cognitive proc...

1 |
A Bayesian procedure for separately estimating storage and retrieval components of forgetting
- Chechile, Meyer
- 1976
Citation Context ...ate; the second term 2k is a penalty for model complexity, measured by the number of adjustable model parameters k. Multinomial processing tree models (Batchelder & Riefer, 1980; Chechile, 1973; Chechile & Meyer, 1976; Riefer & Batchelder, 1988) are psychological process models for categorical data. MPT models are used in two ways: as a psychometric tool to measure unobserved cognitive processes, and as a convenie...

1 | The signal and the noise: The art and science of prediction - Silver - 2012 |