15 citations found.
I. Guyon, B. Boser, and V. Vapnik. Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, and C. Lee Giles, editors, Advances in Neural Information Processing Systems, volume 5, pages 147--155. Morgan Kaufmann, San Mateo, CA, 1993.

This paper is cited in the following contexts:
Support Vector Machines for Phoneme Classification - Salomon (2001)

....2.4 Non linear Classifiers In the above section, we described how the linear SVM could handle misclassified examples. Another extension is needed before SVMs can be used to effectively handle real world data: the modelling of non linear decision surfaces. The method for doing this was proposed by [23]. The idea is to explicitly map the input data to some higher dimensional space, where the data is linearly separable. We can use a mapping $F : \mathbb{R}^N \to \mathcal{H}$ (2.28), where $N$ is the dimension of the input space and $\mathcal{H}$ is a higher dimensional space, termed feature space. In feature space, the technique ....

....Figure 2.5: The role of the kernel (from [43]). a) The data is mapped from input space to feature space by a mapping F. b) The optimal separating hyperplane is found. c) The hyperplane is mapped back down to input space, where it results in a non linear decision boundary. Guyon ([23]) showed that an old trick by [35] can be used, called the kernel trick. Using this trick, the above three steps could be combined into one. In the training phase described in the previous section, notice in equation (2.23) that only includes the training data in the form of their scalar inner ....
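
The kernel trick described in the excerpt above can be illustrated with a minimal sketch. It is not taken from the cited thesis: the degree-2 polynomial kernel, the 2-D inputs, and the names `phi` and `poly_kernel` are all assumptions chosen for illustration. It only shows that a kernel evaluated in input space can equal a scalar inner product in an explicit feature space.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (hypothetical choice)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(a, b):
    """Plain scalar inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: k(x, y) = (x . y)^2."""
    return dot(x, y) ** 2

x = (1.0, 2.0)
y = (3.0, -1.0)

# Inner product computed after mapping both points to feature space.
explicit = dot(phi(x), phi(y))
# Same quantity computed directly in input space, without the mapping.
implicit = poly_kernel(x, y)

print(explicit, implicit)
```

Under these assumptions the two values agree up to floating-point rounding, which is why a training procedure that only uses inner products never has to construct the feature vectors.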

I. Guyon, B. Boser, and V. Vapnik. Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems, volume 5, pages 147--155. Morgan Kaufmann, San Mateo, CA, 1993.


A Learning Rule for Universal Approximators with a.. - Harald Burgsteiner Peter

....learning algorithms for single threshold gates such as the perceptron learning algorithm that can also easily be implemented in hardware. But single threshold gates have a very limited expressive power (unless their input dimension is excessively enlarged, like in Support Vector Machines [Guyon et al. 1993]) and hence one has resorted to multi layer perceptrons. These have the desired expressive power since they satisfy the universal approximation theorem, but they require to compute and communicate precise values of derivatives, which provides a serious obstacle for any analog implementation. ....

....above update might collect all weights around the z s line. This would make the predictions of the WTA circuit very unstable. Thus we carry out an additional update to move the weights away from the z s line. This is similar in spirit to large margin classification as in Support Vector Machines [Guyon et al. 1993], where one tries to have a large margin between data points and the separating hyperplane. In our case the width of the margin γ is given as a parameter to the algorithm. Note that we are considering the dual space. Thus we are interested in the margin between the z s line and the points ....

Guyon I., Boser B., and Vapnik V. (1993). Automatic capacity tuning of very large VC-dimension classifiers. Advances in Neural Information Processing Systems, volume 5, Morgan Kaufmann (San Mateo) 147-155.


Choice and Value Flexibility Jointly Contribute to the.. - Poirazi, Mel

....to its linear counterpart. Although it has been long appreciated that inclusion of higher order terms can increase the power of a learning machine for both regression and classification (Poggio, 1975; Barron, Mucciardi, Cook, Craig, & Barron, 1984; Giles & Maxwell, 1987; Ghosh & Shin, 1992; Guyon, Boser, & Vapnik, 1993; Karpinsky & Werther, 1993; Schurmann, 1996) and algorithms for learning polynomials have often included heuristic strategies for subsampling the intractable number of higher order product terms, existing theory bearing on the learning capabilities of quadratic classifiers is limited. Cover ....

Guyon, I., Boser, B., & Vapnik, V. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. Hanson, J. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147--155). San Mateo, CA: Morgan Kaufmann.


On the Difficulty of Designing Good Classifiers - Grigni, Mirelli, Papadimitriou (1996)   (2 citations)

....Suppose that the two point sets can be separated by a single linear inequality, but we want to find the inequality that separates them and involves as few variables as possible. This situation is of interest when we use functions of the points as additional coordinates to facilitate classification [4, 11]. We point out that variants of this problem are hard for various levels of the W hierarchy [3, 6] which implies that (unless an unlikely collapse occurs) they cannot be solved in polynomial time even if the optimum sought is small (bounded by any very slowly growing function) 2. Definitions ....

....however, the separable case is practically interesting because it arises when we introduce extra variables to make classification possible. For example, one may introduce low-degree monomials (products of variables) or radial basis functions (simple functions of the distance from a point) [11, 13], and then construct a linear decision tree treating the outputs of these functions as new variables. Or one could even allow more costly special purpose classifying heuristics, and also treat their outputs as variables. It is clear that any disjoint finite sets W and B may be separated given ....

I. Guyon, B. Boser, and V. Vapnik, Automatic capacity tuning of very large VC-dimension classifiers, in Advances in Neural Information Processing Systems, S. J. Hanson, J. D. Cowan, and C. L. Giles, eds., vol. 5, Morgan Kaufmann, 1993, pp. 147--155.


Choice and Value Flexibility Jointly Contribute to the.. - Poirazi, Mel (1999)

....relative to its linear counterpart. While it has been long appreciated that inclusion of higher order terms can increase the power of a learning machine for both regression and classification (Poggio, 1975; Barron, Mucciardi, Cook, Craig, & Barron, 1984; Giles & Maxwell, 1987; Ghosh & Shin, 1992; Guyon, Boser, & Vapnik, 1993; Karpinsky & Werther, 1993; Schurmann, 1996) and algorithms for learning polynomials have often included heuristic strategies for subsampling the intractable number of higher order product terms, existing theory bearing on the learning capabilities of quadratic classifiers is limited. Cover ....

Guyon, I., Boser, B., & Vapnik, V. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In Hanson, S., Cowan, J., & Giles, C. (Eds.), Advances in Neural Information Processing Systems, Vol. 5, pp. 147-155. Morgan Kaufmann, San Mateo, CA.


A Tutorial on Support Vector Regression - Smola, Schölkopf (1998)   (97 citations)

....Vapnik and Chervonenkis [1974], Vapnik [1982, 1995]. In a nutshell, VC theory characterizes properties of learning machines which enable them to generalize well to unseen data. In its present form, the SV machine was developed at AT&T Bell Laboratories by Vapnik and co-workers [Boser et al. 1992, Guyon et al. 1993, Cortes and Vapnik, 1995, Schölkopf et al. 1995, Vapnik et al. 1997]. Due to this industrial context, SV research has up to date had a sound orientation towards real world applications. Initial work focused on OCR (optical character recognition). Within a short period of time, SV classifiers ....

....estimate) on E and then apply (150) to obtain the overall entropy numbers, which then, in turn, are substituted into an error bound of the type introduced in (136). 8.5 Bounding the shape of Φ(X). As already mentioned beforehand the first approach to estimate the shape of E was carried out in [Guyon et al. 1993, Vapnik, 1995]. There the assumption was made that E has the shape of a ball. It turns out [Schölkopf et al. 1995] that the radius r of the latter can be estimated by solving a quadratic programming problem. This method, however, has a fundamental drawback, which is not resolved in [Vapnik, ....

I. Guyon, B. Boser, and V. Vapnik. Automatic capacity tuning of very large VC-dimension classifiers. In Stephen José Hanson, Jack D. Cowan, and C. Lee Giles, editors, Advances in Neural Information Processing Systems, volume 5, pages 147--155. Morgan Kaufmann, San Mateo, CA, 1993.


On the Difficulty of Designing Good Classifiers - Grigni, Mirelli, Papadimitriou (1996)   (2 citations)

....the two point sets can be separated by a single linear inequality, but we want to find the inequality that separates them and involves as few variables as possible. This situation is of interest when we use functions of the points as additional coordinates to facilitate classification [BGV92, BGV94]. We point out that variants of this problem are complete for various levels of the W hierarchy [BFH94, CCDF94] which implies that (unless an unlikely collapse occurs) they cannot be solved in polynomial time even if the optimum sought is small (bounded by any very slowly growing function) 2 ....

....the separable case is practically interesting because it comes up when we introduce extra variables to make classification possible. For example, one may introduce low degree monomials (products of variables) or radial basis functions (simple functions of the distance from a point) [Hay94, BGV94] and then construct a linear decision tree treating the outputs of these functions as new variables. Or one could even allow more costly special purpose classifying heuristics, and also treat their outputs as variables. It is clear that any disjoint finite sets W and B may be separated given ....

B. E. Boser, I. M. Guyon, and V. N. Vapnik. Automatic capacity tuning of very-large VC-dimension classifiers. Manuscript, 1994.


Robust Linear Discriminant Trees - John (1995)   (5 citations)

....algorithm they instead implement Fisher's linear discriminant function. Regarding robust methods and outlier rejection, Huber (1977) states I am inclined to : prefer technical expertise to any statistical criterion for straight outlier rejection. We are guilty of this sin in our work, but Guyon, Boser & Vapnik (1993) have proposed an interesting method for making use of human expertise in outlier removal to clean a dataset. 7 Future Work A known problem with decision trees is that as one goes deeper in the tree, less data is available to each node, due to the recursive hard partitioning. Since DT SE ....

Guyon, I., Boser, B. & Vapnik, V. (1993), Automatic capacity tuning of very large VC-dimension classifiers, in S. J. Hanson, J. Cowan & C. L. Giles, eds, "Advances in Neural Information Processing Systems", Vol. 5, Morgan Kaufmann, pp. 147--154.


Robust Linear Discriminant Trees - John   (5 citations)

....algorithm they instead implement Fisher's linear discriminant function. Regarding robust methods and outlier rejection, Huber (1977) states I am inclined to : prefer technical expertise to any statistical criterion for straight outlier rejection. We are guilty of this sin in our work, but Guyon, Boser & Vapnik (1993) have proposed an interesting method for making use of human expertise in outlier removal to clean a dataset. John (1995) presents further experiments with iterative re-filtering in the context of the C4.5 (Quinlan 1993) decision tree learning algorithm. 36.7 Future Work A known problem with ....

Guyon, I., Boser, B. & Vapnik, V. (1993), Automatic capacity tuning of very large VC-dimension classifiers, in S. J. Hanson, J. Cowan & C. L. Giles, eds, "Advances in Neural Information Processing Systems", Vol. 5, Morgan Kaufmann, pp. 147--154.


The Sample Complexity of Pattern Classification With Neural.. - Bartlett (1997)   (69 citations)

.... every h in H has $\mathrm{er}_P(h) < \mathrm{er}_z^{\gamma}(h) + \sqrt{\frac{2}{m}\left(d \ln(34em/d)\,\log_2(578m) + \ln(4/\delta)\right)}$, where $d = \mathrm{fat}_H(\gamma/16)$. The idea of using the magnitudes of the values of $h(x_i)$ to give a more precise estimate of the generalization performance was first proposed in [40] and was further developed in [18] and [11]. There it was used only for the case of linear function classes. Rather than giving bounds on the generalization error, the results in [40] were restricted to bounds on the misclassification probability for a fixed test sample, presented in advance. The problem was further investigated ....

I. Guyon, B. Boser, and V. Vapnik. Automatic capacity tuning of very large VC-dimension classifiers. In NIPS 5, pages 147--155. Morgan Kaufmann, 1993.


Regression Estimation with Support Vector Learning Machines - Smola, al. (1996)   (6 citations)  Self-citation (Vapnik)

No context found.

I. Guyon, B. Boser, and V. Vapnik. Automatic capacity tuning of very large VC-dimension classifiers. In Stephen José Hanson, Jack D. Cowan, and C. Lee Giles, editors, Advances in Neural Information Processing Systems, volume 5, pages 147--155. Morgan Kaufmann, San Mateo, CA, 1993.


Regression Estimation with Support Vector Learning.. - Smola, Burges.. (1996)   (6 citations)  Self-citation (Vapnik)

....a great variety having to satisfy only some condition from Hilbert Schmidt theory. Capacity can be controlled effectively through the regularization functional used (some more research will have to be done on this point) by the same method that was applied to Support Vector Pattern Recognition [GBV93]. Chapter 1 Introduction Before any calculus is done one should consider the basic problem we are dealing with in its fundamental setting: to approximate a function from given data and/or to estimate a function from given data. There is an important difference between these two problems. In the ....

....L 1 and L 2 loss functions for different choices of ε and σ. The minimum of these curves is shown in 5.20. 5.3 Future Perspectives Still one question remains unanswered: the choice of the regularization constant U. Unfortunately the bounds which had been applied for Pattern Recognition [GBV93] are not tight enough for Regression Estimation. One might consider a poor man's approach like cross validation or something more sophisticated like bootstrapping methods [ET94] until the problem is settled in a reasonable manner. This leaves a wide field of research open for future work. Another ....

I. Guyon, B. Boser, and V. Vapnik. Automatic capacity tuning of very large VC-dimension classifiers. In Stephen José Hanson, Jack D. Cowan, and C. Lee Giles, editors, Advances in Neural Information Processing Systems, volume 5, pages 147--155. Morgan Kaufmann, San Mateo, CA, 1993.


Journal of Machine Learning Research 6 (2005) 1579--1619.. - With Online And (2005)

No context found.

I. Guyon, B. Boser, and V. Vapnik. Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, and C. Lee Giles, editors, Advances in Neural Information Processing Systems, volume 5, pages 147--155. Morgan Kaufmann, San Mateo, CA, 1993.


Building Text Classifiers Using Positive and Unlabeled Examples - Liu, Dai, Li, al. (2003)   (4 citations)

No context found.

Guyon, I., Boser, B. and Vapnik, V. (1993). Automatic capacity tuning of very large VC-dimension classifiers. Advances in Neural Information Processing Systems, Vol. 5.


Find A Good Initial Guess: An Interior-Point Preconditioner for SVM .. - Wen (1999)

No context found.

I. Guyon, B. Boser, V. Vapnik. Automatic Capacity Tuning of Very Large VC-dimensions Classifiers.

CiteSeer.IST - Copyright Penn State and NEC