17 citations found.
Kearns, M. (1996). A bound on the error of cross validation using the approximation and estimation rates, with consequences for the training-test split. In Advances in Neural Information Processing Systems 8, pages 183-189. MIT Press.


This paper is cited in the following contexts:
On Optimal Data Split For Generalization Estimation And Model .. - Larsen, Goutte (1999)   (Correct)

....variance term measures the reliability of the estimator and it increases when $\gamma$ decreases. We therefore expect an optimal $\gamma$ to solve the bias-variance trade-off: $\gamma_{\mathrm{opt}} = \arg\min_{\gamma} \mathrm{MSE}_{\mathrm{HO}}(\gamma)$. This optimal choice has been studied asymptotically for non-linear models [11] using Vapnik-like bounds [7], and in the context of pattern recognition [6]. Surprisingly, $\gamma_{\mathrm{opt}} \to 1$ as $N \to \infty$, indicating that most data should be used for testing. For finite sample sizes, theoretical investigations are limited to simple models (see below). K-Fold Cross-Validation (KCV): The average over all training sets ....

....by the experiments reported in figure 1 (left). All curves are averaged over 40000 replication of the data for each size. When N increases, the optimal $\gamma$ increases towards 1. Note that the MSE curves flatten, indicating that a wide interval of possible split ratios are near optimal (see also [7]). For K-fold CV: $$\mathrm{MSE}_{\mathrm{KCV}}(\gamma) = \begin{cases} \dfrac{\sigma^4 \left(2\gamma^3 N - 2\gamma^2 - 6N\gamma^2 + 7\gamma + 6N\gamma - 7 - 2N\right)}{N^2 (\gamma - 1)^3}, & \gamma \le 0.5 \\ \dfrac{\sigma^4 \left(-4N\gamma^2 - 9\gamma + 8 + 2N\gamma + 2\gamma^2 + 2\gamma^3 N\right)}{N^2 (\gamma - 1)^2 \gamma}, & \gamma > 0.5 \end{cases} \qquad (10)$$ It is easy to ....
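
The K-fold expression quoted above can be evaluated numerically. The following is only a sketch under stated assumptions: the extraction tokens are read as `fl` = γ, `oe` = σ and `Gamma` = minus, the operators missing between the remaining terms are taken to be plus signs, and the first branch is assumed to apply for γ ≤ 0.5. The function name `mse_kcv` is hypothetical and does not come from the cited paper.

```python
def mse_kcv(gamma, n, sigma=1.0):
    """Evaluate the quoted K-fold CV MSE expression (reconstructed reading).

    gamma: split ratio, strictly between 0 and 1
    n:     total sample size N
    sigma: noise standard deviation (the expression scales with sigma**4)
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    g = gamma
    s4 = sigma ** 4
    if g <= 0.5:
        # Assumed first branch: denominator N^2 (gamma - 1)^3
        num = 2*g**3*n - 2*g**2 - 6*n*g**2 + 7*g + 6*n*g - 7 - 2*n
        den = n**2 * (g - 1)**3
    else:
        # Assumed second branch: denominator N^2 (gamma - 1)^2 gamma
        num = -4*n*g**2 - 9*g + 8 + 2*n*g + 2*g**2 + 2*g**3*n
        den = n**2 * (g - 1)**2 * g
    return s4 * num / den


if __name__ == "__main__":
    for g in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(g, mse_kcv(g, n=100))
```

Under this reading both branches take the same value at γ = 0.5, namely σ⁴(2N + 32)/N², which is a useful consistency check on the reconstruction but not a confirmation of it.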

M. Kearns, "A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split," Neural Computation, vol. 9, no. 5, pp. 1143--1161, 1997.


Estimating the Generalization Performance of an SVM Efficiently - Joachims (2000)   (24 citations)  (Correct)

....2 ) 1 4 k (14) Equation (13) and (14) exhibit a trade-off in selecting l and k. The larger l, the smaller the bias. At the same time, the variance increases with $k = n - l$ decreasing. The optimal choice of l and k depend on the learner L, the hypothesis space H, and the learning task Pr(x, y) [Kearns, 1996]. Nevertheless, there are good heuristics for selecting reasonable values for l and k [Kearns, 1996]. Let's finally also look at worst-case bounds for the deviation. Using Hoeffding bounds [Hoeffding, 1963] it holds that $\Pr(|\mathrm{Err}^{l}(h_L) - \mathrm{Err}^{l,k}_{ho}(h_L)| \ge \epsilon) \le 2\exp(-2k\epsilon^2)$ (15) ....

....the smaller the bias. At the same time, the variance increases with $k = n - l$ decreasing. The optimal choice of l and k depend on the learner L, the hypothesis space H, and the learning task Pr(x, y) [Kearns, 1996]. Nevertheless, there are good heuristics for selecting reasonable values for l and k [Kearns, 1996]. Let's finally also look at worst-case bounds for the deviation. Using Hoeffding bounds [Hoeffding, 1963] it holds that $\Pr(|\mathrm{Err}^{l}(h_L) - \mathrm{Err}^{l,k}_{ho}(h_L)| \ge \epsilon) \le 2\exp(-2k\epsilon^2)$ (15). Experimental results for hold-out estimates are given in [Kearns et al. 1997]. The hold-out estimate is ....
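
The worst-case bound in this excerpt is the two-sided Hoeffding inequality for a hold-out set of k examples: the probability that the hold-out error deviates from the true error by at least ε is at most 2 exp(-2kε²). A minimal sketch of how that bound translates into a hold-out set size follows; the function names `hoeffding_bound` and `holdout_size_for` are illustrative and do not appear in the cited paper.

```python
import math


def hoeffding_bound(k, eps):
    """Upper bound 2*exp(-2*k*eps^2) on the deviation probability, capped at 1."""
    return min(1.0, 2.0 * math.exp(-2.0 * k * eps ** 2))


def holdout_size_for(eps, delta):
    """Smallest hold-out size k for which 2*exp(-2*k*eps^2) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


if __name__ == "__main__":
    k = holdout_size_for(eps=0.05, delta=0.05)
    print(k, hoeffding_bound(k, 0.05))
```

Because k = n - l, every example added to the hold-out set to tighten this bound is one example removed from training, which is the trade-off the excerpt describes.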

Kearns, M. (1996). A bound on the error of cross validation using the approximation and estimation rates, with consequences for the training-test split. In Advances in Neural Information Processing Systems 8, pages 183-189. MIT Press.


A Scaling Law for the Validation-Set Training-Set Size Ratio - Guyon (1997)   (1 citation)  (Correct)

....be reserved for the validation set. The optimum tradeoff between having more data to train and more data to validate must be found. Cross validation is a method of model selection which has been widely studied and criticized. Foundational papers include [2, 3, 4] and recent contributions include [5, 6]. In this paper, our emphasis is on exhibiting a simple and general scaling law which can guide experimentalists in pattern recognition: the ratio of the validation set size over the training set size scales like the square root of the complexity of the second level of inference (minimizing the ....

....of the benchmark test set can be solved with classical statistics. In contrast, the problem of determining the size of the validation set involves the complexity of the learning process and the theory of uniform convergence [7]. Our derivation method follows similar lines as found in reference [5]: we bound the probability of error of the recognizer selected by cross validation using both classical bounds and VC bounds [7]; we optimize the resulting bound for the training-validation split. Similarly also, we exhibit two tradeoff terms the balance of which decides of the optimum. In spite ....

[Article contains additional citation context not shown here]

M. Kearns. A bound on the error of cross validation using the approximation and estimation rates, with consequences for the training-test split. In Advances in Neural Information Processing Systems 7 (NIPS 95), 1996, to appear. 10


Expected Error Analysis for Model Selection - Scheffer, Joachims (1999)   (8 citations)  (Correct)

.... difference between true and empirical error (which can be chosen according to Vapnik, 1982) plus an additional penalty term that accounts for the possibility of cross validation choosing a sub-optimal model (this penalty term depends on the VC dimension of the largest model, and the sample size; Kearns, 1996). When the sample size increases, both the bound on the difference between true and empirical error and the penalty term for choosing a wrong model vanish, so cross validation always works. For many applications, n-fold cross validation works quite well in practice, but it does not scale well ....

Kearns, M. (1996). A bound on the error of cross validation using the approximation and estimation rates, with consequences for the training-test split. In Advances in Neural Information Processing Systems, Vol. 8, pp. 183--189.


Estimating the Generalization Performance of an SVM Efficiently - Joachims (1999)   (24 citations)  (Correct)

....1 4 k (14) Equation (13) and (14) exhibit a trade-off in selecting l and k. The larger l, the smaller the bias. At the same time, the variance increases with $k = n - l$ decreasing. The optimal choice of l and k depend on the learner L, the hypothesis space H, and the learning task Pr(x, y) [Kearns, 1996]. Nevertheless, there are good heuristics for selecting reasonable values for l and k [Kearns, 1996]. Let's finally also look at worst-case bounds for the deviation. Using Hoeffding bounds [Hoeffding, 1963] it holds that $\Pr(|\mathrm{Err}^{l}(h_L) - \mathrm{Err}^{l,k}_{ho}(h_L)| \ge \epsilon) \le 2\exp(-2k\epsilon^2$ ....

....the bias. At the same time, the variance increases with $k = n - l$ decreasing. The optimal choice of l and k depend on the learner L, the hypothesis space H, and the learning task Pr(x, y) [Kearns, 1996]. Nevertheless, there are good heuristics for selecting reasonable values for l and k [Kearns, 1996]. Let's finally also look at worst-case bounds for the deviation. Using Hoeffding bounds [Hoeffding, 1963] it holds that $\Pr(|\mathrm{Err}^{l}(h_L) - \mathrm{Err}^{l,k}_{ho}(h_L)| \ge \epsilon) \le 2\exp(-2k\epsilon^2)$ (15). Experimental results for hold-out estimates are given in [Kearns et al. 1997]. The hold-out ....

Kearns, M. (1996). A bound on the error of cross validation using the approximation and estimation rates, with consequences for the training-test split. In Advances in Neural Information Processing Systems 8, pages 183--189. MIT Press.


Probabilistic Outputs for Support Vector Machines and Comparisons.. - Platt (1999)   (74 citations)  (Correct)

....simply training on the entire data set. Because determining the system parameters is often unavoidable, determining A and B from the hold-out set may not incur extra computation with this method. Cross validation is an even better method than a hold-out set for estimating the parameters A and B [10]. In three-fold cross validation, the training set is split into three parts. Each of three SVMs are trained on permutations of two out of three parts, and the $f_i$ are evaluated on the remaining third. The union of all three sets of $f_i$ can form the training set of the sigmoid (and also can be ....
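
The three-fold procedure described in this excerpt can be sketched as follows. This is a rough illustration rather than the cited method itself: `fit_midpoint_scorer` is a hypothetical one-dimensional stand-in for SVM training, and all function names are assumptions introduced here. The point is only that every example receives its score f_i from a model that never saw it during training.

```python
import random


def k_fold_indices(n, k=3, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def out_of_fold_scores(xs, ys, fit, k=3, seed=0):
    """Train on k-1 folds and score the held-out fold, for each fold in turn."""
    scores = [None] * len(xs)
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        f = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            scores[i] = f(xs[i])  # f_i comes from a model that did not see i
    return scores


def fit_midpoint_scorer(xs, ys):
    """Hypothetical stand-in for SVM training on 1-D inputs.

    Returns f(x) = x - threshold, with the threshold halfway between the
    class means. Assumes both classes occur in the training data.
    """
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y != 1]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0
    return lambda x: x - threshold


if __name__ == "__main__":
    xs = [0.1 * i for i in range(12)]
    ys = [-1] * 6 + [1] * 6
    f_values = out_of_fold_scores(xs, ys, fit_midpoint_scorer)
    # The pairs (f_i, y_i) would then serve as the training set for the
    # sigmoid parameters A and B; that fit is not shown here.
    print(list(zip(f_values, ys)))
```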

M. Kearns. A bound on the error of cross validation using the approximation and estimation rates, with consequences for the training-test split. Neural Computation, 9(5):1143-1161, 1997.


Model Selection for Probabilistic Clustering Using Cross-Validated .. - Smyth (1998)   (9 citations)  (Correct)

.... for an application to hidden Markov models). Directions for further work on cross-validated likelihood include a bias-variance characterization for better understanding of the trade-offs involved in choosing $\beta$ (see for example the work of Shao (1993) and Zhang (1993) in a regression context and Kearns (1996) in a classification context) and comparative studies between penalized likelihood, Bayesian, and cross validation methodologies. In related work, Smyth and Wolpert (1998) extend the framework in this paper to model averaging of mixture models for density estimation, using cross validation to ....

Kearns, M., `A bound on the error of cross validation using the approximation and estimation rates, with consequences for the training-test split,' in Advances in Neural Information Processing 8, Touretzky, D. S., Mozer, M. C., and Hasselmo, M. E. (eds.), Cambridge, MA: The MIT Press, 183--189, 1996.


Expected Error Analysis for Model Selection - Scheffer, Joachims (1999)   (8 citations)  (Correct)

.... error (which can be chosen according to Wapnik & Tscherwonenkis, 1979; Vapnik, 1982) plus an additional penalty term that accounts for the possibility of cross validation choosing a sub-optimal model (this penalty term depends on the VC dimension of the largest model, and the sample size; Kearns, 1996). When the sample size increases, both the bound on the difference between true and empirical error and the penalty term for choosing a wrong model vanish, so cross validation always works. For many applications, n-fold cross validation works quite well in practice, but it does not scale well ....

Kearns, M. (1996). A bound on the error of cross validation using the approximation and estimation rates, with consequences for the training-test split. In Advances in Neural Information Processing Systems, Vol. 8, pp. 183--189.


Asymptotic Statistical Theory of Overtraining and.. - S. Amari, N.. (1996)   (15 citations)  (Correct)

....early stopping lead to similar solutions and stressed the analogy between the number of iterations and the regularization parameter. Barber, Saad and Sollich [1995a,b] considered the evaluation of the generalization error by cross-validation for linear perceptrons. Recently Guyon et al. [1996] and Kearns [1996] derived a VC bound for the optimal split between training and validation set, which shows the same scaling as our result. The VC result scales inversely with the square root of the VC dimension (cf. Vapnik [1982]) of the network, which in the case of realizable rules coincides with the number of ....

Kearns, M. [1996], A bound on the error of cross validation using the approximation and estimation rates, with consequences for the training-test split, in NIPS`95: Advances in Neural Information Processing Systems 8, D.S. Touretzky, M.C. Mozer and M.E. Hasselmo (eds.), MIT Press: Cambridge, MA.


On Optimal Data Split For Generalization Estimation And Model.. - Jan Larsen (1999)   (Correct)

....variance term measures the reliability of the estimator and increases when $\gamma$ decreases. We therefore expect an optimal $\gamma$ to solve the bias-variance trade-off: $\gamma_{\mathrm{opt}} = \arg\min_{\gamma} \mathrm{MSE}_{\mathrm{HO}}(\gamma)$. This optimal choice has been studied asymptotically for non-linear models [13] using Vapnik-like bounds [8], and in the context of pattern recognition [7]. Surprisingly, $\gamma_{\mathrm{opt}} \to 1$ as $N \to \infty$, indicating that most data should be used for testing. For finite sample sizes exact theoretical investigations are limited to simple models (see below). K-Fold Cross-Validation (KCV): The average over all training ....

....the experiments reported in figure 1 (left). All curves are averaged over 40000 replication of the data for each size. When N increases, the optimal $\gamma$ increases towards 1. Note that the MSE curves flatten, indicating that a wide interval of possible split ratios are near optimal, as confirmed in [8] under different presumptions. The results obtained for K-fold CV are totally different (detailed in [11]): $\mathrm{MSE}_{\mathrm{KCV}}(\gamma) = \sigma^4 (2\gamma^3 N - 2\gamma^2 - 6N\gamma^2 + 7\gamma + 6N\gamma - 7 - 2N) / (N^2 (\gamma - 1)^3)$ for $\gamma \le 0.5$; $\sigma^4 (-4N\gamma^2 - 9\gamma$ ....

M. Kearns, "A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split," Neural Computation, vol. 9, no. 5, pp. 1143--1161, 1997.


Optimal Cross-Validation Split Ratio: Experimental Investigation - Cyril Goutte (1998)   (Correct)

....to its setting: [6] shows that for linear model selection, $\gamma$ should asymptotically tend to 1, a surprising result that indicates that the relative amount of data left for model estimation should tend to 0. A similar result was obtained by [7] in a different setting, while a recent communication [8] suggests that small split ratios should be favoured for the split sample method. Accordingly, we expect the optimal $\gamma$ to either approach 1 or 0 when N increases. The experiments presented in section 3 address this important issue. 2 Adaptive metric kernel regression Adaptive metric kernels are ....

Kearns M, A bound on the error of cross validation using the approximation and estimation rates, with consequences for the training-test split. Neural Computation, 1997, 9:1143--1161


Model Selection for Probabilistic Clustering Using Cross-Validated .. - Smyth (1998)   (9 citations)  (Correct)

.... for an application to hidden Markov models). Directions for further work on cross-validated likelihood include a bias-variance characterization for better understanding of the trade-offs involved in choosing $\beta$ (see for example the work of Shao (1993) and Zhang (1993) in a regression context and Kearns (1996) in a classification context) and comparative studies between penalized likelihood, Bayesian, and cross validation methodologies. In related work, Smyth and Wolpert (1998) extend the framework in this paper to model averaging, again using cross validation to empirically determine the model ....

Kearns, M., `A bound on the error of cross validation using the approximation and estimation rates, with consequences for the training-test split,' in Advances in Neural Information Processing 8, Touretzky, D. S., Mozer, M. C., and Hasselmo, M. E. (eds.), Cambridge, MA: The MIT Press, 183--189, 1996.


The Connection between Regularization Operators and.. - Smola, Schölkopf, Müller (1998)   (44 citations)  (Correct)

....more information on using prior knowledge for choosing kernels see [24]. Prior knowledge can also be used to determine the free parameters of the kernel, e.g. its width ($\sigma$) in the examples 4 and 6. Besides that, model selection principles like structural risk minimization [34], cross validation [3,2,11], MDL [19], Bayesian methods [13,3], etc. can be employed. Choosing a small width of the kernels leads to high generalization error as it effectively decouples the separate basis functions of the kernel expansion into very localized functions which is equivalent to memorizing the data, whereas a ....

M. Kearns. A bound on the error of cross validation using the approximation and estimation rates, with consequences for the training-test split. Neural Computation, 9(5):1143--1161, 1997.


Preventing "Overfitting" of Cross-Validation Data - Ng (1997)   (6 citations)  (Correct)

....from having tested too many hypotheses, rather than from having chosen too complex a hypothesis. For example, within the set of hypotheses we are selecting from, there is no notion of a structure or a sequence of nested hypothesis classes of increasing complexity such as assumed in some models [Kearns, 1996, Vapnik and Chervonenkis, 1971] or of some hypotheses having been trained using a more complex hypothesis class. Because of this, even though the literature treating overfitting is rich with algorithms for selecting limiting the complexity of the output hypothesis, this literature does not ....

Kearns, M. J. (1996). A bound on the error of Cross Validation using the approximation and estimation rates, with consequences for the training-test split. In Advances in Neural Information Processing Systems 8, pages 183--189. Morgan Kaufmann.


On Feature Selection: Learning with Exponentially many Irrelevant.. - Ng (1998)   (10 citations)  (Correct)

....to find feature sets better suited to the inductive biases of our learning algorithm, and tends to give superior performance [Langley, 1994]. In this paper, we study only the wrapper model of feature selection, and largely in the context of classification. Our analysis is largely inspired by [Kearns, 1996], with our theoretical results heavily based on the techniques given there and those outlined in [Kearns et al. 1997]. We also rely heavily on tools from [Vapnik, 1982] that give a very general framework for bounding the deviation of training error from generalization error. 2 ....

....we make the (rather strong) assumption that given a particular data set $S|F$, L chooses the hypothesis h from some class of hypotheses (shortly to be formalized) so as to minimize training error. This closely ties in with the learning framework studied by [Vapnik, 1982] and is also used in [Kearns, 1996] and [Kearns et al. 1997] in proving bounds on generalization error. We believe it to be a very natural model, and that it is a rich enough class of learning algorithms to merit detailed study. But also see [Kearns et al. 1997] for comments regarding relations to learning algorithms that ....

[Article contains additional citation context not shown here]

Kearns, M. J. (1996). A bound on the error of Cross Validation using the approximation and estimation rates, with consequences for the training-test split. In Advances in Neural Information Processing Systems 8, pages 183--189. Morgan Kaufmann.


The Maximum-Margin Approach to Learning Text Classifiers -.. - Joachims (2000)   (17 citations)  (Correct)

No context found.

Kearns, M. (1996). A bound on the error of cross validation using the approximation and estimation rates, with consequences for the training-test split. In Advances in Neural Information Processing Systems 8, pages 183-189. MIT Press.


Model Selection for Probabilistic Clustering Using Cross-Validated .. - Smyth (1998)   (9 citations)  (Correct)

No context found.

Kearns, M., `A bound on the error of cross validation using the approximation and estimation rates, with consequences for the training-test split,' in Advances in Neural Information Processing 8, Touretzky, D. S., Mozer, M. C., and Hasselmo, M. E. (eds.), Cambridge, MA: The MIT Press, 183--189, 1996.

