## The Nature of Statistical Learning Theory (1999)

Citations: 12872 (32 self)

### Citations

3618 | Learning internal representations by error propagation - Rumelhart, Hinton, et al. - 1986 |
Citation Context: …n this superposition are replaced by sigmoid functions. A method for calculating the gradient of the empirical risk for the sigmoid approximation of NN’s, called the backpropagation method, was found [15], [12]. Using this gradient descent method, one can determine the corresponding coefficient values (weights) of all elements of the NN. In the 1990s, it was proven that the VC dimension of NN’s depend…
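The gradient-descent procedure this excerpt describes can be sketched in a few lines of Python. Below is a minimal illustration, not the cited papers' algorithm: a 1-input, 1-hidden-unit, 1-output sigmoid network trained by backpropagation on a single (input, target) pair; the network size, learning rate, and data are arbitrary choices for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w1, w2):
    # one sigmoid hidden unit feeding one sigmoid output unit
    h = sigmoid(w1 * x)
    return h, sigmoid(w2 * h)

def backprop_step(x, target, w1, w2, lr=1.0):
    # one gradient-descent step on the squared error 0.5 * (y - target)**2
    h, y = forward(x, w1, w2)
    err = y - target
    dy = y * (1.0 - y)                 # derivative of the output sigmoid
    dh = h * (1.0 - h)                 # derivative of the hidden sigmoid
    grad_w2 = err * dy * h             # chain rule through the output unit
    grad_w1 = err * dy * w2 * dh * x   # chain rule back through the hidden unit
    return w1 - lr * grad_w1, w2 - lr * grad_w2

w1, w2 = 0.3, -0.2
for _ in range(5000):
    w1, w2 = backprop_step(1.0, 0.9, w1, w2)
```

Each step applies the chain rule backward through the network, which is all the backpropagation method is: the gradient of the empirical risk with respect to every weight, computed layer by layer.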

3594 | Support-vector networks - Cortes, Vapnik - 1995 |

1830 | Spline Models for Observational Data - Wahba - 1990 |

1827 | A training algorithm for optimal margin classifiers - Boser, Guyon, et al. - 1992 |
Citation Context: …h high-dimensional spaces: to construct a polynomial of degree 4 or 5 in a 200-dimensional space it is necessary to construct hyperplanes in a billion-dimensional feature space. In 1992, it was noted [5] that for both describing the optimal separating hyperplane in the feature space (41) and estimating the corresponding coefficients of expansion of the separating hyperplane (39) one uses the inner pr…
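The observation credited to [5] — that only inner products in feature space are ever needed — is easy to verify numerically. The sketch below (an illustration, not code from the cited work) checks that the polynomial kernel value (x·z)^d equals the inner product of explicit degree-d monomial feature maps:

```python
from itertools import product
import math

def poly_kernel(x, z, d):
    # inner product in input space, raised to the d-th power
    return sum(a * b for a, b in zip(x, z)) ** d

def phi(x, d):
    # explicit feature map: one coordinate per ordered index tuple
    # (i1, ..., id), holding the monomial x[i1] * ... * x[id]
    feats = []
    for idx in product(range(len(x)), repeat=d):
        m = 1.0
        for i in idx:
            m *= x[i]
        feats.append(m)
    return feats

x, z, d = [1.0, 2.0, -1.0], [0.5, -1.0, 3.0], 3
explicit = sum(a * b for a, b in zip(phi(x, d), phi(z, d)))
assert math.isclose(explicit, poly_kernel(x, z, d))
```

The explicit map already has 3³ = 27 coordinates here; for degree 5 in 200 dimensions the number of distinct monomials is C(204, 5) ≈ 2.8 × 10⁹, consistent with the "billion-dimensional" remark, while the kernel evaluation stays O(n).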

1539 | Nonlinear Component Analysis as a Kernel Eigenvalue Problem - Schölkopf, Smola, et al. - 1998 |

1309 | A Probabilistic Theory of Pattern Recognition - Devroye, Györfi, et al. - 1996 |
Citation Context: …; however, the term responsible for the confidence interval [summand in (20)] is increased. The SRM principle takes both factors into account.) The main results of the theory of SRM are the following [9], [22]. Theorem: For any distribution function the SRM method provides convergence to the best possible solution with probability one. In other words, the SRM method is universally strongly consistent. T…
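The trade-off the excerpt describes — empirical risk falls along the structure while the confidence interval grows — can be sketched as a model-selection rule. Everything below (the risks, the VC dimensions, and the simplified form of the confidence term) is illustrative, not taken from the cited theorem:

```python
import math

def vc_confidence(h, l, eta=0.05):
    # schematic VC confidence term (a simplified form of Vapnik's bound,
    # chosen for illustration; not the exact expression from the book)
    return math.sqrt((h * (math.log(2 * l / h) + 1) + math.log(4 / eta)) / l)

def srm_select(emp_risks, vc_dims, l):
    # structural risk minimization: over a nested structure S1 ⊂ S2 ⊂ ...,
    # pick the element minimizing empirical risk + confidence interval
    scores = [r + vc_confidence(h, l) for r, h in zip(emp_risks, vc_dims)]
    return min(range(len(scores)), key=scores.__getitem__)

# hypothetical training results: richer classes fit better but pay a
# larger confidence term; SRM trades the two off
emp_risks = [0.30, 0.12, 0.10, 0.09]   # empirical risk per structure element
vc_dims   = [2, 10, 50, 200]           # VC dimension per element
best = srm_select(emp_risks, vc_dims, l=1000)
```

With these (made-up) numbers the second element wins: the drop in empirical risk from h = 10 to h = 50 no longer pays for the growth of the confidence term.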

1133 | On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications - Vapnik, Chervonenkis - 1971 |

952 | Estimation of Dependences Based on Empirical Data - Vapnik - 1982 |
Citation Context: …ions Case 1—The Set of Totally Bounded Functions: Without restriction in generality, we assume that (19) The main result in the theory of bounds for sets of totally bounded functions is the following [20]–[22]. Theorem: With probability at least , the inequality (20) holds true simultaneously for all functions of the set (19), where (21) in -dimensional coordinate space is also equal to because the VC…
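The inequality labeled (20)–(21) does not survive in this excerpt. For reference, the widely quoted form of the VC bound for indicator losses (as in Vapnik's book; the totally bounded real-valued case discussed here carries an extra factor of the bound B on the loss) states that with probability at least 1 − η, simultaneously for all functions of the set,

```latex
R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha)
  + \sqrt{\frac{h\left(\ln\frac{2l}{h} + 1\right) - \ln\frac{\eta}{4}}{l}},
```

where h is the VC dimension of the set and l is the sample size.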

713 | Learnability and the Vapnik-Chervonenkis dimension - Blumer, Ehrenfeucht, et al. - 1989 |

387 | Regularization theory and neural networks architectures - Girosi, Jones, et al. - 1995 |

240 | An equivalence between sparse approximation and support vector machines - Girosi - 1998 |

238 | Scale-sensitive dimensions, uniform convergence, and learnability - Alon, Ben-David, et al. - 1997 |

182 | Simplified support vector decision rules - Burges - 1996 |

142 | Generalization Performance of Support Vector Machines and other Pattern Classifiers - BARTLETT - 1999 |

127 | A Theory of Learning and Generalization - Vidyasagar - 1997 |

80 | Fat-shattering and the learnability of real-valued functions - Bartlett, Long, et al. - 1996 |

37 | Learning process in an asymmetric threshold network - Le Cun - 1986 |
Citation Context: …superposition are replaced by sigmoid functions. A method for calculating the gradient of the empirical risk for the sigmoid approximation of NN’s, called the backpropagation method, was found [15], [12]. Using this gradient descent method, one can determine the corresponding coefficient values (weights) of all elements of the NN. In the 1990s, it was proven that the VC dimension of NN’s depends on t…

30 | A framework for structural risk minimization - Shawe-Taylor, Bartlett, et al. - 1996 |

23 | The Glivenko–Cantelli problem, ten years later - Talagrand - 1996 |

6 | Necessary and sufficient conditions for the uniform convergence of means to their expectations - Vapnik, Chervonenkis - 1981 |
Citation Context: …ncides with the growth function. For another extreme case, where the set contains only one function, the generalized growth function coincides with the annealed VC-entropy. The following assertion is true [20], [26]. Theorem: Suppose that a set of loss functions is bounded. Then for sufficiently large sample size the following inequality…
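The relation between the quantities mentioned here — VC-entropy, annealed VC-entropy, and growth function — follows the standard chain from VC theory (stated for reference; the notation is assumed, not taken from the excerpt):

```latex
H^{\Lambda}(l) \;\le\; H_{\mathrm{ann}}^{\Lambda}(l) \;\le\; G^{\Lambda}(l)
  \;\le\; h\left(\ln\frac{l}{h} + 1\right), \qquad l > h,
```

where the VC-entropy is the expectation of the log of the number of distinct separations of a sample, the annealed entropy is the log of that expectation (Jensen's inequality gives the first step), the growth function is the log of its maximum, and h is the VC dimension (the last step is the Sauer–Shelah bound).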

4 | On the uniform convergence of relative frequencies of events to their probabilities - Vapnik, Chervonenkis - 1971 |
Citation Context: …et of indicator functions on the sample of size… The main result of the theory of consistency for the pattern recognition problem (the consistency for the indicator loss function) is the following theorem [24]. Theorem: For uniform two-sided convergence of the frequencies to their probabilities it is necessary and sufficient that the equality (13), (14) hold. Slightly modifying the condition (14) one can…
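The condition referred to as (13)–(14) is elided in this excerpt. In its standard Vapnik–Chervonenkis form (given here for reference, with notation assumed: ν(α; l) for frequencies, P(α) for probabilities, H for the VC-entropy), uniform two-sided convergence holds if and only if the entropy grows sublinearly in the sample size:

```latex
\lim_{l\to\infty} P\Bigl\{ \sup_{\alpha}
  \bigl| P(\alpha) - \nu(\alpha; l) \bigr| > \varepsilon \Bigr\} = 0
\;\;\forall \varepsilon > 0
\quad\Longleftrightarrow\quad
\lim_{l\to\infty} \frac{H^{\Lambda}(l)}{l} = 0 .
```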

2 | On the annealed vc entropy for margin classifiers: A statistical mechanics study - Opper - 1998 |

1 | “Geometry and invariance in kernel-based methods” - Burges - 1999 |

1 | “The connection between regularization operators and support vector kernels” - Smola, Schölkopf, et al. - 1998 |

1 | “The necessary and sufficient conditions for consistency of the method of empirical risk minimization,” Yearbook of the Academy of Sciences of the USSR - Vapnik, Chervonenkis - 1989 |
Citation Context: …unds that evaluate this concept for a fixed amount of observations. A. The Key Theorem of Learning Theory The key theorem of the theory concerning ERM-based learning processes is the following [27]. The Key Theorem: Let… be a set of functions that has a bounded loss for probability measure… This type of convergence is called uniform one-sided convergence. In other words, according to the Key theo…

1 | “…support vector kernels” - Williamson, Smola, et al. - 1999 |