13 citations found.
T. Scheffer and T. Joachims. Expected error analysis for model selection. Preprint, TU Berlin, 1999. Available at http:ki.cs.tu-berlin.de/ scheffer.

This paper is cited in the following contexts:
Finding the Most Interesting Patterns in a Database Quickly.. - Scheffer, Wrobel (2001)   (Correct)

....are 2^n decision trees with n fixed leaf nodes it is trivial to assign optimal class labels in O(n) steps without representing all 2^n alternatives. Similarly, the histogram of error rates of a set of decision trees or rule sets can be determined in time logarithmic in the number of hypotheses [23]. We are confident that our sampling algorithm can be applied analogously for complex and structured hypothesis spaces without explicit representation of all hypotheses. By giving worst case bounds on the sample size (and proving that there is no sampling algorithm for some utility functions) our ....

T. Scheffer and T. Joachims. Expected error analysis for model selection. In Proceedings of the Sixteenth International Conference on Machine Learning, 1999.


Predicting the Relation between Model Class, Domain, and Error Rate - Scheffer   (Correct)

....empirical error is e if none of the hypotheses incur a lower empirical error. Since the histogram tells us how many hypotheses have which generalization error, we can calculate P(least empirical error is e) ∏_h P(empirical error e | generalization error of h). The analysis is described in [10, 6, 7]. Applying the analysis to the model selection problem. Given the histogram of error rates in the model class, the sample size (and, for non-exhaustive learners, the empirical error), the analysis yields the expected generalization error of a learner that uses the given model class for the given ....

....hypotheses in the model class. Some results on the complexity of estimating the error histogram can be found in [6]. The resulting model selection algorithm has been applied to the problem of selecting the optimal number of leaves of decision trees [7] and to a large text categorization problem [10, 6]. Quantifying the error rate of cross validation. Similarly to the generalization error of the hypothesis which is returned by a learner, we can quantify the generalization error of the hypothesis which is returned by a cross validation wrapper. The wrapper invokes the learner once for each ....

T. Scheffer and T. Joachims. Expected error analysis for model selection. In Proceedings of the Sixteenth International Conference on Machine Learning, 1999.


Predicting the Generalization Performance of Cross Validatory.. - Scheffer (2000)   (Correct)

.... which would make the analysis an interesting heuristic for determining parameters of the wrapper (e.g. the split ratio). After a quick clarification of some notational details in this section, we will, in Section 2, briefly recapture the average case analysis of exhaustive learning algorithms of Scheffer and Joachims (1999) on which our analysis of cross validation is based. We will present this analysis in Section 3. In Section 4, we will study empirically how accurately our analysis predicts the behavior of holdout testing and n-fold cross validation wrappers for model selection using both randomly drawn boolean ....

....in favor of the model class with the smaller index). The whole sample is now used to minimize the empirical error within H which yields the final hypothesis h_L that is returned by the wrapper. 2. Error Rate for a Fixed Model Class. In this Section, we summarize the expected error analysis (Scheffer & Joachims, 1999; Scheffer, 1999) on which our analysis of cross validation is based. Let us assume that the model class H_i is fixed and that our learner minimizes the empirical error within H_i (breaking ties by drawing at random, when the hypothesis with least empirical error rate is not uniquely determined) ....

[Article contains additional citation context not shown here]

Scheffer, T., & Joachims, T. (1999). Expected error analysis for model selection. Proceedings of the Sixteenth International Conference on Machine Learning.


Average-Case Analysis of Classification Algorithms for Boolean.. - Scheffer (2000)   (1 citation)  (Correct)

....not have to be invoked before the analysis can be applied). We can predict both the resulting empirical error rate and the resulting generalization error from the histogram of error rates and the number of hypotheses. The analysis is a simplification of an analysis proposed by Scheffer and Joachims [19]. Let us first sketch how the resulting empirical error rate on the training set can be predicted without running the learning algorithm at all. The empirical error rate of a single hypothesis with generalization error is governed by the binomial distribution B[m; The least empirical error ....

....by Langley et al. [10]; under some simplifying approximations [9] the analysis becomes computationally efficient. An average case analysis of cross validation has been presented in [16]. A first version of the analysis class was presented by Scheffer and Joachims [18, 17] and later generalized [19] and applied to text categorization and decision tree regularization [15]. Independently, Domingos [1] presented a similar analysis which additionally assumes that all hypotheses incur equal error rates. Lifting the latter assumption [2] leads to an analysis that (besides making the additional ....

T. Scheffer and T. Joachims. Expected error analysis for model selection. In Proceedings of the Sixteenth International Conference on Machine Learning, 1999.
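
The prediction described in the first excerpt above (each hypothesis's empirical error is binomially distributed, and the learner returns the least empirical error) can be illustrated with a minimal sketch. It assumes that the empirical errors of distinct hypotheses are independent given their true error rates, and that the histogram of error rates is known exactly. The function names and the dictionary representation of the histogram are hypothetical, chosen for illustration only; this is not the authors' implementation.

```python
from math import comb


def binom_pmf(k, m, eps):
    """P(exactly k errors on m examples) for a hypothesis with true error eps."""
    return comb(m, k) * eps**k * (1 - eps) ** (m - k)


def binom_tail(k, m, eps):
    """P(at least k errors on m examples) for a hypothesis with true error eps."""
    return sum(binom_pmf(j, m, eps) for j in range(k, m + 1))


def least_empirical_error_dist(histogram, m):
    """Distribution of the least number of training errors over all hypotheses.

    histogram: dict mapping a true error rate to the number of hypotheses
               with that rate (assumed known here).
    m:         sample size.
    Returns a list p where p[k] = P(least number of training errors == k).
    Assumes empirical errors are independent across hypotheses.
    """

    def p_min_at_least(k):
        # The minimum is >= k exactly when every hypothesis makes >= k errors.
        prob = 1.0
        for eps, count in histogram.items():
            prob *= binom_tail(k, m, eps) ** count
        return prob

    tail = [p_min_at_least(k) for k in range(m + 2)]
    return [tail[k] - tail[k + 1] for k in range(m + 1)]
```

For example, `least_empirical_error_dist({0.1: 5, 0.4: 100}, 20)` gives the distribution of the least training error count for a class with five hypotheses of true error 0.1 and one hundred of true error 0.4; dividing the index `k` by `m` converts a count to an empirical error rate.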


Nonparametric Regularization of Decision Trees - Scheffer (2000)   (Correct)

....complexity of H_i do we need to obtain a reliable estimate of (h) for all possible problems, without assessing hypotheses on holdout data. We will identify this missing information and discuss how it can be acquired efficiently in many cases. In Section 2, we simplify the expected error analysis of [18] slightly and apply it to the problem of choosing the optimal decision tree complexity. The original analysis is restricted to exhaustive learners while decision tree algorithms are usually greedy. Our main theoretical result (Section 3) is an extension of the analysis to greedy learning ....

....sample is given. Our analysis is an actual case analysis (for a given learner and a given learning problem) rather than a (PAC style) worst case analysis (for the worst possible problem). Compared to earlier actual case analyses [17, 16, 5] our analysis is based on weaker assumptions. Compared to [18], our analysis is considerably simpler and, most importantly, covers greedy learners. An actual case analysis for Naive Bayesian classifiers that is guided by a similar idea has been presented by Langley and Sage [10]; an actual case analysis for linear neural networks is given in [6] ....

T. Scheffer and T. Joachims. Expected error analysis for model selection. In ICML-99, 1999.


Computable Shell Decomposition Bounds - Langford, McAllester (2000)   (12 citations)  (Correct)

....than or equal to our uncomputable upper bound. Our lower bound on generalization error also shows that there is essentially no loss in working with an upper bound computed from the true error distribution rather than expectations computed from this distribution as used by Scheffer and Joachims [4]. Asymptotically, the computable bound is simply the uncomputable bound with the unknown distribution of true errors replaced with observed histogram of training errors. Unfortunately, we can show that in limits where ln|H_m| / m converges to a value greater than zero, the histogram of training ....

Tobias Scheffer and Thorsten Joachims, "Expected error analysis for model selection", International Conference on Machine Learning (ICML), 1999.


Expected Error Analysis for Model Selection - Scheffer, Joachims (1999)   (8 citations)  Self-citation (Scheffer Joachims)   (Correct)

No context found.

T. Scheffer and T. Joachims. Expected error analysis for model selection. Preprint, TU Berlin, 1999. Available at http:ki.cs.tu-berlin.de/ scheffer.


Clipping and Analyzing News Using Machine Learning.. - Gründel, Naphtali.. (2001)   Self-citation (Sche)   (Correct)

....often that word occurs in the text. We thus transform each text into a feature vector, treating a text as a bag of words. Finally, we weight each feature by the inverse frequency of the corresponding word which has generally been observed to increase the accuracy of the resulting classifiers (e.g. [10, 22]). This procedure maps each text to a point in a high-dimensional space. The Support Vector Machine (SVM) [11] is then used to efficiently find a hyperplane which separates positive from negative examples, such that the margin between any example and the plane is maximized. For each category we thus ....

T. Scheffer and T. Joachims. Expected error analysis for model selection. In Proceedings of the Sixteenth International Conference on Machine Learning, 1999.


Error Estimation and Model Selection - Scheffer (1999)   Self-citation (Scheffer)   (Correct)

....the empirical error rates of O(log m) randomly drawn hypotheses. Hence, the key feature of the theorem is that it provides us with an estimate of the true error rate of the hypothesis returned by an empirical error minimizing rate without us having to run the learner at all. Experimental results [6, 4] show that the error estimate is often at least as accurate as an estimate obtained by 10 fold cross validation. Since the estimate is obtained very efficiently we now have a means of conducting model selection that can be applied in cases in which n fold cross validation cannot. In fact, we ....

....10 fold cross validation. Since the estimate is obtained very efficiently we now have a means of conducting model selection that can be applied in cases in which n fold cross validation cannot. In fact, we can solve model selection problems with as many as 12,000 examples and 10,000 attributes. [6, 4]. 3 Holdout Testing Based Model Selection In the previous section, we have seen that there is a mathematical model that characterizes the actual behavior of exhaustive, empirical error minimizing learners quite accurately. We can use it, for instance, to predict which model class will lead to ....

T. Scheffer and T. Joachims. Expected error analysis for model selection. In Proceedings of the International Conference on Machine Learning (ICML-99), 1999.


Expected Error Analysis for Model Selection - Scheffer, Joachims (1999)   (8 citations)  Self-citation (Scheffer Joachims)   (Correct)

....by comparing Equation 10 of Domingos (1999) to Theorem 3 of Scheffer and Joachims (1998a). Note further that Theorem 1 of Scheffer and Joachims (1998a) generalizes Theorem 3 by taking nonzero empirical errors into account. By contrast, in this paper and in its corresponding extended abstract (Scheffer & Joachims, 1999), we presented a very general solution which is only based on the weaker assumption that the empirical errors of distinct hypotheses are independent, given the true error rates. Learning curves. We presented experiments which showed that, when the prior distribution of error values P_{h}(E_D ....

....learning problems for which simply minimizing the empirical error rate rather than constraining the learner to a restricted model is optimal. But several conditions have been identified under which model selection can in fact be shown to be beneficial; see Schaffer (1993a, 1993b), Wolpert (1993), Scheffer (1999). 6.4 Concluding Remarks. We have demonstrated that the expected error rate of a hypothesis that minimizes the empirical error rate within a given model depends on the prior distribution of error rates of hypotheses in that model. This distribution can be estimated from the data and, when it is ....

Scheffer, T., & Joachims, T. (1999). Expected error analysis for model selection. In Proceedings of the International Conference on Machine Learning (ICML-99).


Expected Error Analysis for Model Selection - Scheffer, Joachims (1999)   (8 citations)  Self-citation (Scheffer Joachims)   (Correct)

....to overfitting. The task of choosing a model such that the error of the hypothesis returned by a learner (which uses this model) is low is referred to as model selection. Approaches to model selection fall into the categories hold out testing, complexity penalization, and Bayesian learning. See [17] or [7] for a more detailed overview. Hold out testing algorithms stratify the potential hypothesis language H into subsets ("models") H_1, H_2, ... ⊂ H. Starting with the smallest model, the learning algorithm returns one hypothesis from each model. The hold out set (which has not been ....

....Appendix B. Note that the only inputs to Theorem 2 are the error prior, |H_i|, and m. Theorem 2 (which effectively replaces Theorem 1) solves the primary complexity problem by removing the product over all subsets of H_i from Equation 2. A careful implementation of the formula (see the full paper [17] for details) now runs in O(m^2). Estimating P_{h}(E_D(h) | H_i, D_XY). As P_{h}(E_D(h) | H_i, D_XY) depends on D_XY, it cannot be determined exactly. All information on D_XY which we can access is contained in S. We can use S to measure P_{h}(E_S(h) | H_i, S) which will serve as an estimate ....

[Article contains additional citation context not shown here]

T. Scheffer and T. Joachims. Expected error analysis for model selection. Preprint, TU Berlin, 1999. Available at http:ki.cs.tu-berlin.de/ scheffer.

CiteSeer.IST - Copyright Penn State and NEC