40 citations found.
H. Linhart and W. Zucchini. Model selection. J. Wiley, New York, 1986.

This paper is cited in the following contexts:

First 50 documents

On the Value of Partial Information for Learning from Examples - Ratsaby, Maiorov (1998)   (1 citation)  (Correct)

.... problem, then based on fixed m and n the best choice of a hypothesis class over which the learner should run empirical error minimization is with d as in (16). The notion of having an optimal complexity d is closely related to statistical model selection (cf. Linhart and Zucchini [19], Devroye et al. [12], Ratsaby et al. [29]). For instance, in Vapnik's structural risk minimization criterion (SRM) [38] the trade-off is between m and d. For a fixed m, it is possible to calculate the optimal complexity d of a hypothesis class in a nested class structure, by ....

Linhart, H., and Zucchini, W. (1986), "Model Selection," Wiley Series in Probability and Mathematical Statistics, Wiley, New York.


Minimum Message Length Inference: Theory and Applications - Baxter (1996)   (2 citations)  (Correct)

....in the Philosophy of Science is concerned with the scientific method and considers the basis for accepting some theories and rejecting others. Each of these disciplines has its own terminology for describing the problem. I attempt to summarize some of the terms in tables 1.1, 1.2, and 1.3 [86, 33, 22, 77]. Note that terminology varies with sub-disciplines of each of these fields. Even the names of the sub-disciplines are changing. For example, we now have data mining and knowledge discovery in databases (KDD). In this thesis, I use terms from these fields interchangeably. For example, I use the ....

....
Criterion    0    1    2    3    4    5    6
MML68        0   10   74   16    0    0    0
MDL          0    0   15   67    8    6    4
CAICF        0    0   32   64    2    1    1
MML68        0    0   18   82    0    0    0
MDL          0    0    2   96    1    0    1
CAICF        0    0    2   98    0    0    0
MML68        0    0    0   99    1    0    0
MDL          0    0    0   97    3    0    0
CAICF        0    0    0   98    2    0    0
= 5

6.4.2 Linhart and Zucchini Datasets. Linhart and Zucchini [86] have published a number of datasets to which they fit polynomials. They employ a Kullback-Leibler distance criterion, cross validation and the bootstrap method. On their Bagasse dataset [86, page 82] all my criteria fitted a straight line, with and without the two outlier points. The bootstrap ....

H. Linhart and W. Zucchini. Model Selection. Wiley, 1986.


Short-Term Travel Time Prediction Using A Time-Varying.. - Zhang, Rice   (Correct)

....subsegments. In the extreme case, we may apply the method link by link where each link is delimited by a pair of adjacent sensors. The question is analogous to the model selection problem, and it may be worthwhile to look for inspirations in the rich statistical model selection literature ([1], [12]). Essentially the number of parameters in the TVC model increases with the number of sub-segments. By dividing the trip with many sub-segments, we might be able to reduce the systematic bias of the model, but run higher risk of overfitting the historical data. On the other hand, the TVC model ....

....sometimes a necessity. We have shown that adequate segmenting for long trips may be beneficial. This raises the question of how segmenting may be done adequately. As suggested earlier, the problem is related to the more general problem of model selection, and techniques such as AIC and BIC ([1], [12]) may be applied. This is a direction for future work. 3) The trade-off in prediction accuracy is negligible when replacing the speed based T # with the proxy T Q based on the flow occupancy ratio. The implementation framework of the methodology is to compute and store the estimated model ....

H. Linhart and W. Zucchini. Model Selection. John Wiley and Sons, 1986.


Dynamic Bayesian Multinets - Bilmes (2000)   (7 citations)  (Correct)

....approach can also be taken where we use a (potentially uncountably infinite) probabilistically weighted mixture over multiple choices. The task of learning graphical models can be seen as learning any or all of the above four components given a collection of data, and is akin to the model selection [22] problem known to the statistics community for years. In all cases, the underlying goal is to identify a system for .... [Footnote 1: While this is not standard terminology, a concise way to refer to the representation of the local conditional probability model is simply to use the term implementation.]

....accuracy, even if the structures are augmented in a class conditional way, as in case 1 above. Note that the likelihood scores for these models are dramatically higher both for the training and testing data, suggesting that overfitting is not the problem. The goal of many model selection methods [6, 22] is to choose a model that provides the best description of the data, but the above suggests that this can be inappropriate for classification. Admittedly, model selection procedures typically include complexity penalty terms (e.g. MDL, BIC, and so on). But these penalties do not select for ....

H. Linhart and W. Zucchini. Model Selection. Wiley, 1986.


Directed Graphical Models Of Classifier Combination.. - Bilmes, Kirchhoff (2000)   (1 citation)  (Correct)

....of combination rules, one can consider the corresponding set of underlying statistical models. By choosing the model most accurately reflected by the data, one can select a correspondingly appropriate combination rule. Selecting a combination rule can therefore be seen as a model selection [23] procedure. It has been shown that multiple different models might lead to exactly the same combination rule, with the shared rule being valid with respect to multiple models. Also, a single model can lead to more than one valid combination rule. In the latter case, the simplest model could be chosen. ....

H. Linhart and W. Zucchini. Model Selection. Wiley, 1986.


Counting Probability Distributions: Differential.. - Myung.. (1999)   (Correct)

....and the functional form of a model. In other words, we would like an analytic realization of Occam's Razor. Previous Approaches to Measuring Model Complexity. The overarching goal of many model selection approaches has been the estimation of a model's generalizability (for a review, see [4]). Some representative methods used for inference of parametric models are the Akaike Information Criterion (AIC, [5]), the Bayesian Information Criterion (BIC, [6]), and Rissanen's Stochastic Complexity (SC, [2, 7]): $\mathrm{AIC} = -2 \ln f(y \mid \hat{\theta}) + 2k$, $\mathrm{BIC} = -2 \ln f(y \mid \hat{\theta}) + k \ln N$, SC = -ln f(y ....

Linhart, H. & Zucchini, W. (1986) Model Selection (John Wiley & Sons, New York).
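
As an illustration of the AIC and BIC criteria named in the context above, the following is a minimal sketch assuming the standard forms AIC = -2 ln L + 2k and BIC = -2 ln L + k ln N, where ln L is the maximized log-likelihood, k the number of free parameters, and N the sample size. The function names and the candidate log-likelihood values are hypothetical and are not taken from the cited paper.

```python
import math


def aic(log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion: -2 ln L + 2k (smaller is better)."""
    return -2.0 * log_likelihood + 2.0 * k


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: -2 ln L + k ln N (smaller is better)."""
    return -2.0 * log_likelihood + k * math.log(n)


def select_model(candidates: dict, n: int) -> tuple:
    """Return the names of the candidates minimizing AIC and BIC.

    `candidates` maps a model name to (maximized log-likelihood, k).
    """
    best_aic = min(candidates, key=lambda name: aic(*candidates[name]))
    best_bic = min(candidates, key=lambda name: bic(*candidates[name], n))
    return best_aic, best_bic


# Hypothetical maximized log-likelihoods for three nested polynomial fits.
candidates = {
    "linear": (-120.0, 2),
    "quadratic": (-118.5, 3),
    "cubic": (-118.2, 4),
}

if __name__ == "__main__":
    print(select_model(candidates, n=50))
```

With these made-up numbers the two criteria disagree: the heavier ln N penalty in BIC favours the smaller model, while AIC accepts one extra parameter.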


Handling Uncertainty When You're Handling Uncertainty: Model.. - Van Allen (2000)   (Correct)

....selection criteria, its analysis is theoretical and based on asymptotic behaviour, and it only considers complexity penalization; by contrast we are empirically investigating small sample behaviour over a different class of criteria, including Bayesian, bootstrap and cross validation criteria. Linhart and Zucchini (1986) provide an overview of the general problem of model selection, covering AIC and cross validation, but not MDL. Rissanen (1989) gives a detailed development of the Minimum Description Length Principle, which is the information theoretic view of induction that the MDL criterion is based on. Schwarz ....

....the data using the optimal code given by the model. 2 If the model captures significant features of the data, this encoding will be considerably smaller than the original encoding of the sample. On the other hand, if the model represents too much about the sample, the encoding size will increase (Linhart & Zucchini, 1986). This trade-off is similar to the bias-variance trade-off. The MDL criterion we will use is given by: MDL(h; d) Fit(h; d) Dim(h) log m 2m where Dim(h) is the number of free parameters of h. Recall that m is the sample size. This version differs from the standard form in that we ....

Linhart, H., & Zucchini, W. (1986). Model selection. New York: John Wiley & Sons.


Discrepancy Risk Model Selection Test Theory For Comparing.. - Golden   (Correct)

....3 Introduction. Let $\Omega$ be a set of probability distributions. Let the distribution generating the data be the distinguished environmental distribution $p_e \in \Omega$. Define a probability model $M_\Theta$ (i.e. a family of approximating distributions, following the terminology of Linhart and Zucchini, 1986) such that $M_\Theta = \{p_\theta \in \Omega : \theta \in \Theta\}$. Similarly, define $M_\Psi = \{p_\psi \in \Omega : \psi \in \Psi\}$. It is not necessarily assumed that $p_e \in M_\Theta$ or that $p_e \in M_\Psi$ (i.e. the models may be misspecified). Let $D(p_e; p_\theta)$ be a real number that measures how well a ....

....into account not only their respective discrepancy loss functions but also their respective penalty terms. Examples of such penalty terms (see Sin and White, 1996, for a general analysis) include the Akaike Information Criterion (AIC) and its variants (Akaike, 1973; Bozdogan, 1987, in press; Linhart & Zucchini, 1986), as well as the Bayes Information Criterion (BIC)/Schwarz Information Criterion (SIC) (Djuric, 1998; Kass and Wasserman, 1995; Schwarz, 1978) and various versions of Bayesian model selection penalty terms associated with Minimum Descriptive Length (MDL) and Stochastic Complexity (SC) methods ....

Linhart, H. & Zucchini, W. (1986). Model selection. New York: Wiley.


Counting Probability Distributions: Differential.. - Myung.. (1999)   (Correct)

....and the functional form of a model. In other words, we would like an analytic realization of Occam's Razor. Previous Approaches to Measuring Model Complexity. The overarching goal of many model selection approaches has been the estimation of a model's generalizability (for a review, see [2]). Four representative methods for inference of parametric models are the Akaike Information Criterion (AIC, [3]), the Bayesian Information Criterion (BIC, [4]), Rissanen's Stochastic Complexity (SC, [5, 6]) and the Information-theoretic Measure of Complexity ....

H. Linhart and W. Zucchini, Model Selection (John Wiley & Sons, New York, NY 1986).


Wrappers for Feature Subset Selection - Kohavi, John (1996)   (329 citations)  (Correct)

.... In a node expansion, all children can be evaluated in parallel, which will cut the running time by a factor equal to the number of attributes (e.g. 180 for DNA). In theory, every possible feature subset identifies a different model, so the problem can be viewed as that of model selection (Linhart & Zucchini 1986) in statistics. If there are only a few models, as is the case when one chooses between three induction algorithms, one can estimate the accuracy of each one and select the one with the highest accuracy (Schaffer 1993) or perhaps even find some underlying theory to help predict the best one for a ....

Linhart, H. & Zucchini, W. (1986), Model Selection, John Wiley & Sons.


Comparing Model Selection Criteria for Belief Networks - Van Allen, Greiner (2000)   (Correct)

....selection criteria, its analysis is theoretical and based on asymptotic behaviour, and it only considers complexity penalization; by contrast we are empirically investigating small sample behaviour over a different class of criteria, including Bayesian, bootstrap and cross validation criteria. Linhart and Zucchini (1986) provide an overview of the general problem of model selection, covering AIC and cross validation, but not MDL. Rissanen (1989) gives a detailed development of the Minimum Description Length Principle, which is the information theoretic view of induction that the MDL criterion is based on. Schwarz ....

....the data using the optimal code given by the model. 4 If the model captures significant features of the data, this encoding will be considerably smaller than the original encoding of the sample. On the other hand, if the model represents too much about the sample, the encoding size will increase (Linhart & Zucchini, 1986). This trade-off is similar to the bias-variance trade-off. 3 We leave the estimation of as a black box for the time being we return to this at the end of the section. 4 Every probability distribution has an associated optimal code (Cover & Thomas 1991) 6 The MDL criterion we will use ....

Linhart, H., & Zucchini, W. (1986). Model selection. New York: John Wiley & Sons.


Model Selection Criteria for Learning Belief Nets: An.. - Van Allen, Greiner (2000)   (4 citations)  (Correct)

....complexity penalty approaches to belief net learning. While that work also addresses suitability of various selection criteria, its analysis is theoretical and based on asymptotic behaviour; by contrast we are empirically investigating small sample behaviour over a different class of criteria. Linhart and Zucchini (1986) provide an overview of the general problem of model selection, covering AIC and Cross Validation, but not MDL. Rissanen (1989) gives a detailed development of the Minimum Description Length Principle, which is the information theoretic view of induction that the MDL criterion is based on. ....

....and afterward combining these estimates. The logical extreme is to divide the sample into m subsamples of 1 datum each. This family of methods goes under the generic name of Cross Validation, being respectively called simple, k-fold, and leave-one-out Cross Validation (Stone, 1974; Linhart & Zucchini, 1986). For our experiments, we used the simple version, dividing the sample into two equal size subsamples, one for training and one for validation. XV (h; s) info(h(s 1 ) s 2 ) where s has been split into disjoint halves s 1 and s 2 , and h(s 1 ) is the hypothesis h instantiated using the ....

[Article contains additional citation context not shown here]

Linhart, H., & Zucchini, W. (1986). Model selection. New York: John Wiley & Sons.
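
The "simple" cross validation described in the context above (split the sample into two halves, fit on one, score on the other) can be sketched as follows. This is only an illustration under stated assumptions: the hypothesis here is a one-dimensional Gaussian fitted by its sample mean and variance, and the score is the average negative log-likelihood on the held-out half. The cited paper works with belief nets; the function names below are hypothetical.

```python
import math
import random


def gaussian_nll(data, mean, var):
    """Average negative log-likelihood of `data` under N(mean, var)."""
    total = sum(
        0.5 * math.log(2.0 * math.pi * var) + (x - mean) ** 2 / (2.0 * var)
        for x in data
    )
    return total / len(data)


def simple_cross_validation(sample, seed=0):
    """Fit on one half of the sample and score on the other half.

    The sample is shuffled with a fixed seed, then cut in the middle;
    for an odd-sized sample the validation half gets the extra datum.
    """
    rng = random.Random(seed)
    shuffled = list(sample)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    train, valid = shuffled[:half], shuffled[half:]

    # Instantiate the hypothesis on the training half only.
    mean = sum(train) / len(train)
    var = sum((x - mean) ** 2 for x in train) / len(train)
    var = max(var, 1e-12)  # guard against a degenerate training half

    # Score the fitted hypothesis on the held-out half.
    return gaussian_nll(valid, mean, var)


if __name__ == "__main__":
    data = [0.1 * i for i in range(40)]
    print(simple_cross_validation(data))
```

Competing hypotheses would each be scored this way on the same split, and the one with the smallest held-out score selected.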


Complexity-Penalized Model Selection For Feedwater.. - Urmanov, Gribok.. (2000)   (Correct)

....of minimum Mean Squared Error (MSE) on this set of data. This solution does not guarantee good prediction on future observations. Experience shows that in most applications prediction accuracy is not improved by simply using all available predictors, more often the opposite effect is achieved (Linhart, 1986). In other words, the prediction accuracy of the full model is worse (or at least not better) than those of the subset models because the variance of the predicted values for linear models with parameters fitted by least squares increases monotonically with the number of variables used in ....

Linhart, H., Zucchini, W., 1986. Model Selection, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, Inc.


Optimum Probability Model Selection Using Akaike's.. - Chandramouli..   (Correct)

....reasons: a) well studied asymptotic theory, b) estimates can be computed easily, c) can be combined with the likelihood based statistical inference methods. 3. MODEL SELECTION USING AIC. The theory of probability model selection deals with constructing probability models from observations [9]. Let the n independent observations be denoted by $\{p_1, p_2, \ldots, p_n\}$. In this paper each observation corresponds to the simulated average value of the power consumption of a circuit. Let each observation have a distribution function F. Let M be the set of all distribution functions that ....

....p_2, ..., p_n}. In this paper each observation corresponds to the simulated average value of the power consumption of a circuit. Let each observation have a distribution function F. Let M be the set of all distribution functions that are completely known. We then have the following definitions [9]. Definition 1. A family of models, $G_\theta$, $\theta \in \Theta$, is a subset of M whose individual members are identified by the vector of parameters $\{\theta_1, \theta_2, \ldots, \theta_k\}$. Definition 2. A fitted model, $G_{\hat{\theta}}$, is a member of a family of models $G_\theta$, $\theta \in \Theta$, that has been selected by estimating the parameters ....

H. Linhart and W. Zucchini, Model selection, John Wiley & Sons, 1986.


Evolutionary Model-Building and Its Application in.. - Chen, Pan, Xue, Liu   (Correct)

.... but also for adaptive representation of functions f(·; ·). Needless to say, it is usually an arduous work to choose an appropriate model f(x; w) as well as to find the optimal values of the parameters w, especially when we do not grasp well the background of the original problem [2]. [Footnote: in: Proc. of ICOTA 95, World Scientific, 1385-1389, 1995. This work was supported in part by National Natural Science Foundation of China.] The newly developed artificial neural network techniques also provide an alternative approach to the model building problem. In order to make an ....

H.Linhart and W.Zucchini, Model Selection, John Wiley & Sons, New York, 1986.


An Evolutionary Approach to Adaptive Model-Building - Pan, Kang, He, Liu   (1 citation)  (Correct)

....a very poor representation of $\{(x_i, y_i)\}$, even f has optimal parameter values. However, it is usually an arduous work to choose an appropriate model f(x; w) as well as to find the optimal values of the parameters w, especially when we do not grasp well the background of the original problem [5]. The newly developed artificial neural network techniques also provide an alternative approach to the model building problem. In order to make an artificial neural network model to perform well, the network must have an appropriate structure, which includes the topology of nodes and y in: ....

H. Linhart and W. Zucchini, Model Selection, John Wiley & Sons, New York, 1986.


Model Selection and Accounting for Model Uncertainty in.. - Raftery, Madigan.. (1993)   (91 citations)  (Correct)

.... best model or as one of a small set of best models, thus largely resolving the paradox. The background literature for our approach includes several areas of research, namely the selection of subsets of predictor variables in linear regression models (Hocking, 1976; Draper and Smith, 1981; Linhart and Zucchini, 1986; Mitchell and Beauchamp, 1988; Miller, 1990; George and McCulloch, 1993) and model uncertainty (Raftery, 1993; Madigan and Raftery, 1994; Madigan and York, 1993; Kass and Raftery, 1993; Draper, 1994). In the next section we outline the philosophy underlying our approach, describe how we selected ....

Linhart, H. and Zucchini, W. (1986), Model Selection, New York: Wiley.


Statistical Test For Statistical Tests for Comparing Possibly.. - Golden   (Correct)

.... Misspecified and Non-nested Models. An important problem in model selection is concerned with identifying the best-fitting model to some unobservable data generating process given only a data sample from that process (for further discussion see Akaike, 1973; Cox, 1962; Schwarz, 1978; Bozdogan, 1987; Linhart and Zucchini, 1986; Sin and White, 1996). In such procedures, the fit of each model to the data generating distribution is evaluated using some goodness of fit function. The model with the best goodness of fit is then selected. Such procedures permit multiple models to be simultaneously compared so that if there ....

....p 3 . Let probability models F and G be subsets of the probability model M. If G is a subset of F, then the reduced model G is said to be fully nested in the full model F. Alternatively, suppose that $F \cap G = \emptyset$; then F and G are said to be strictly non-nested. [Footnote 1: The term operating model (Linhart & Zucchini, 1986; Zucchini, this issue) is an alternative term sometimes used to refer to the environmental distribution. This paper will always use the term model to refer to a set of probability distributions.] [Footnote 2: The term approximating family of probability distributions (Linhart & Zucchini, 1986; Zucchini, ....

[Article contains additional citation context not shown here]

Linhart, H. & Zucchini, W. (1986). Model selection. New York: Wiley.


On the Value of Partial Information - Ratsaby, Maiorov (1998)   (2 citations)  (Correct)

.... problem, then given that m and n are fixed, the best choice of a hypothesis class on which the learner should run empirical loss minimization is H_d with d as in (16). The notion of having an optimal complexity d is closely related to statistical model selection (cf. Linhart & Zucchini [12]). For instance, in Vapnik's Structural Risk Minimization criterion [27] the tradeoff is between m and d. For a fixed m, it is possible to calculate the optimal complexity d of a hypothesis class in a nested class structure by minimizing an upper bound on the loss L(hm ) Lm (h m ) ffl(m; ....

Linhart H., Zucchini W., (1986), "Model Selection", Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, N.Y.


Monotone B-spline Smoothing - He, Shi (1996)   (Correct)

....similar to the well known Akaike's (1973) information criterion. It aims to balance the fidelity to data and the complexity of the fit. There are several ways to motivate this type of model selection criteria. For instance, it can be understood through minimization of an expected discrepancy, see Linhart and Zucchini (1986, Section 2.4). The use of the total absolute residuals in (2.4) may be viewed as a robust alternative to the residual sum of squares that one would obtain from Gaussian likelihood. It may also be derived directly assuming Laplace errors. The constant factor 2 in the second term of (2.4) is ....

Linhart, H. and Zucchini, W. (1986), Model Selection. John Wiley & Sons, New York.


Automatic Stopping Criterion For Anisotropic Diffusion - Victor Solo School   (Correct)

No context found.

H. Linhart and W. Zucchini. Model selection. J. Wiley, New York, 1986.


Selection of Credibility Regression Models - Bühlmann, Bühlmann (1998)   (Correct)

No context found.

Linhart, H. and Zucchini, W. (1986). Model Selection. Wiley.


Asymptotics for Lasso-type estimators - Knight, Fu (2000)   (2 citations)  (Correct)

No context found.

Linhart, H. and Zucchini, W. (1986) Model Selection. New York: Wiley.


The Informational Complexity of Learning from Examples - Niyogi (1996)   (2 citations)  (Correct)

No context found.

H. Linhart and W. Zucchini. Model Selection. John Wiley and Sons, 1986.


Master Thesis - Lanterman   (Correct)

No context found.

H. Linhart and W. Zucchini, Model Selection, Wiley, New York, 1986.

CiteSeer.IST - Copyright Penn State and NEC