W.N. Street and O.L. Mangasarian, Improved Generalization via Tolerant Training, J. Optimization Theory and Applications, vol. 96, no. 2, pp. 259-279, Feb. 1998. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/95-11.ps.
....is the ultimate goal of training). This phenomenon is known as overtraining or overfitting. Fitting the data very accurately can be particularly harmful in the presence of noise. It is a widely accepted heuristic in machine learning that tolerant training should be employed to avoid overfitting [20]. Thus state-of-the-art neural network training systems almost always use some kind of early stopping criterion that terminates training before an exact solution to (1.1) is attained. For example, the use of tuning sets is popular [8]. We refer the reader to the artificial intelligence literature for a ....
....almost always use some kind of early stopping criterion that terminates training before an exact solution to (1.1) is attained. For example, the use of tuning sets is popular [8]. We refer the reader to the artificial intelligence literature for a discussion of overfitting and other related issues (see [20] and references therein). Motivated by the above considerations, we adopt a slightly different point of view on the issues related to convergence of IGA-type techniques than that in [13, 10, 1, 21]. We note that tolerant training permits certain errors in fitting the training data which, in the ....
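The early stopping idea in the excerpt above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the excerpt does not show problem (1.1), so a one-dimensional least-squares fit stands in for it, and the names `train_with_early_stopping`, `tune`, and `patience` are hypothetical. Training halts once the error on a held-out tuning set stops improving, rather than continuing until the training data is fit exactly.

```python
def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train_with_early_stopping(train, tune, lr=0.05, max_epochs=1000, patience=5):
    """Gradient descent on `train`, stopped early using the tuning set `tune`.

    Returns (w, b, tune_error) for the iterate with the lowest tuning error.
    """
    w, b = 0.0, 0.0
    best_err, best_w, best_b = mse(w, b, tune), w, b
    stale = 0  # consecutive epochs without improvement on the tuning set
    n = len(train)
    for _ in range(max_epochs):
        # Gradient of the mean squared training error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / n
        w -= lr * grad_w
        b -= lr * grad_b
        tune_err = mse(w, b, tune)
        if tune_err < best_err:
            best_err, best_w, best_b = tune_err, w, b
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # tuning error stopped improving: terminate training
    return best_w, best_b, best_err


if __name__ == "__main__":
    train = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
    tune = [(0.5, 0.5), (1.5, 1.5), (2.5, 2.5)]
    print(train_with_early_stopping(train, tune))
```

The returned parameters are those with the best tuning-set error seen so far, not the final iterate, which is one common way to implement a tuning-set criterion.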
[Article contains additional citation context not shown here]
W.N. Street and O.L. Mangasarian. Improved generalization via tolerant training. Journal of Optimization Theory and Applications, 96: 259--279, 1998.
....(19). The loss function corresponding to this problem is essentially that of [18, 30], which is zero in the interval $[-\varepsilon, \varepsilon]$ and otherwise linear. Thus, this loss function replaces the quadratic part of the Huber loss function by zero. Such a loss function was also studied in [32]. Finally, we also note that Smola [29, 31] uses rather different techniques to derive a dual formulation of the quadratic Huber M-estimator problem (9). Smola's formulation can be stated as follows, after removing a suppression term: $\min_{u,v}\ \ldots\,\|u\|_2^2\ \ldots\,\|v\|_2^2\ \ldots\ b'u\ \ldots\ v \quad \text{s.t.} \quad A'$ ....
W.N. Street and O.L. Mangasarian, Improved Generalization via Tolerant Training, J. Optimization Theory and Applications, vol. 96, no. 2, pp. 259-279, Feb. 1998. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/95-11.ps.
....as many as 16,000 data points and more than a billion nonzero matrix elements are solved. 1 Introduction. Tolerating a small error in fitting a given set of data, i.e. disregarding errors that fall within some positive $\varepsilon$, can improve testing set correctness over a standard zero-tolerance error fit [18]. Vapnik [19, Section 5.9] makes use of Huber's robust regression ideas [7] by utilizing a robust loss function [19, p. 152] with an $\varepsilon$-insensitive zone (Figure 1 below) and setting up linear and quadratic programs for solving the problem. Schölkopf et al. [15, 16] use quadratic programming based ....
....constrained optimization problem with positive parameter $C$: $\min_{(\alpha,b,s)}\ \frac{1}{\ell}\|\alpha\|_1 + \frac{C}{\ell}\|s\|_1 \quad \text{s.t.} \quad -s \le K(A,A')\alpha + be - y \le s,\ \ e\varepsilon \le s$, (6) which can be represented as a linear program. For a linear kernel and a fixed tolerance $\varepsilon$, this is essentially the model proposed in [18], which utilizes a 2-norm instead of a 1-norm. We now allow $\varepsilon$ to be a variable in the optimization problem above that will be driven to some positive tolerance error determined by the size of the parameter $\mu$. By making use of linear programming perturbation theory [11] we parametrically maximize $\varepsilon$ ....
W. N. Street and O. L. Mangasarian. Improved generalization via tolerant training. Journal of Optimization Theory and Applications, 96(2):259--279, February 1998. ftp://ftp.cs.wisc.edu/math-prog/techreports/95-11.ps.
.... economics [71, 35], engineering mechanics [57, 37], and more recently to machine learning [3, 61, 23, 48, 46]. In this paper we describe three recent mathematical programming based developments that are relevant to data mining: feature selection [45, 10], clustering [11], and robust representation [67]. We note at the outset that we do not plan to survey either the fields of data mining or mathematical programming, but rather highlight some recent and highly effective applications of the latter to the former. We will, however, point out other approaches that are mostly not based on mathematical ....
....when the data on which the model is based changes. This problem is closely related to the generalization problem of machine learning: how to train a system on a given training set so as to improve generalization on a new unseen testing set [39, 63, 73]. We use here a simple linear model [67] and will show that if a sufficiently small error is purposely tolerated in constructing the model, then for a broad class of perturbations the model will be a more accurate representation than one obtained by a conventional zero error tolerance. A simple example demonstrates this result. 1.1 ....
[Article contains additional citation context not shown here]
W. Nick Street and O. L. Mangasarian. Improved generalization via tolerant training. Technical Report 95-11, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, July 1995. Machine Learning, submitted. Available by ftp://ftp.cs.wisc.edu/math-prog/techreports/95-11.ps.Z.
....considered, namely the number of misclassified points by a separating plane. This problem, even though shown to be NP-complete [7], can be effectively solved by a parametric [2] or a hybrid method [7]. In Section 3 we describe a central problem of machine learning, that of improving generalization [27]. We give a very simple model which justifies the often accepted rule of thumb of machine learning and approximation theory, that overfitting leads to poor generalization. In fact we go in the opposite direction, and show that inexact fitting can lead to improved generalization. In Section 4 we use ....
.... given in [2, 7]. 3 Improving Generalization. In this section we shall consider a fundamental problem of machine learning: how to train a system on a given training set so as to improve generalization on a new unseen testing set [13, 24, 28]. We shall concentrate on some very recent results [27] obtained for a simple linear model, which make critical use of mathematical programming ideas. These ideas, although rigorously established for a simple linear model only, seem to extend to much more complex systems, including neural networks [27]. The model that we shall consider here ....
[Article contains additional citation context not shown here]
W. Nick Street and O. L. Mangasarian. Improved generalization via tolerant training. Technical report, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, 1995. To appear.
.... loss functional [119]: $|\xi|_\varepsilon = 0$ if $|\xi| \le \varepsilon$; $|\xi| - \varepsilon$ otherwise. (32) Problem (31) can then be formulated as the following constrained quadratic program: minimize over $w, y$: $\frac{1-\lambda}{N}(e^T y) + \lambda(w^T w)$ subject to $-y - \varepsilon e \le Aw - b \le y + \varepsilon e$, $y \ge 0$. (33) In [115] a similar Tolerant Training Method is proposed in which the following quadratic program is solved for some nonnegative tolerance $\tau$ (which is analogous to $\varepsilon$ above) and some small value of $\lambda/(1-\lambda)$: minimize over $w, y, z$: $\frac{1-\lambda}{2}\left[\|y\|_2^2 + \|z\|_2^2\right] + \frac{\lambda}{2}\|w\|_2^2$ subject to ....
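The tolerant loss in (32), which is zero inside a tolerance band and grows linearly outside it, can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `eps_insensitive_loss` and `total_loss` and the parameter name `eps` are hypothetical, and the sketch evaluates the loss on given residuals rather than solving the quadratic program (33).

```python
def eps_insensitive_loss(residual, eps):
    """Return 0 if |residual| <= eps, otherwise |residual| - eps."""
    if eps < 0:
        raise ValueError("eps must be nonnegative")
    return max(0.0, abs(residual) - eps)


def total_loss(predictions, targets, eps):
    """Sum the tolerant loss over paired predictions and targets."""
    return sum(eps_insensitive_loss(p - t, eps) for p, t in zip(predictions, targets))


if __name__ == "__main__":
    # A residual inside the band costs nothing; one outside is charged only the excess.
    print(eps_insensitive_loss(0.05, 0.1))
    print(eps_insensitive_loss(-0.5, 0.1))
```

Setting `eps` to zero recovers the ordinary absolute-value loss, which corresponds to the conventional zero-tolerance fit that the citing papers contrast with tolerant training.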
....be a valuable contribution. 6. Modeling noise. Mathematical programming models that purposely tolerate error, either because there is noise in the data or because the model is an inaccurate representation of the real problem, are likely to perform better. One such approach, tolerant training [115], purposely tolerates such inaccuracies in the model, and often leads to better predictive results. Extending this tolerant mathematical programming model to a wider class of problems would be an important and practical contribution. 7. Visualization and understandability of derived models. ....
W. N. Street and O. L. Mangasarian. Improved generalization via tolerant training. Journal of Optimization Theory and Applications, 96(2):259--279, February 1998. ftp.cs.wisc.edu/math-prog/techreports/95-11.ps.Z.
No context found.
W.N. Street and O.L. Mangasarian. Improved generalization via tolerant training. Technical Report MP-TR-95-11, University of Wisconsin, Madison, 1995.