Results 1  10
of
183
An introduction to kernelbased learning algorithms
 IEEE TRANSACTIONS ON NEURAL NETWORKS
, 2001
"... This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and ..."
Abstract

Cited by 598 (55 self)
 Add to MetaCart
This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and
Bayesian measures of model complexity and fit
 Journal of the Royal Statistical Society, Series B
, 2002
"... [Read before The Royal Statistical Society at a meeting organized by the Research ..."
Abstract

Cited by 466 (4 self)
 Add to MetaCart
[Read before The Royal Statistical Society at a meeting organized by the Research
An introduction to boosting and leveraging
 Advanced Lectures on Machine Learning, LNCS
, 2003
"... ..."
(Show Context)
How to Tell When Simpler, More Unified, or Less Ad Hoc Theories will Provide More Accurate Predictions
, 1994
"... Traditional analyses of the curve fitting problem maintain that the data do not indicate what form the fitted curve should take. Rather, this issue is said to be settled by prior probabilities, by simplicity, or by a background theory. In this paper, we describe a result due to Akaike [1973], which ..."
Abstract

Cited by 126 (33 self)
 Add to MetaCart
Traditional analyses of the curve fitting problem maintain that the data do not indicate what form the fitted curve should take. Rather, this issue is said to be settled by prior probabilities, by simplicity, or by a background theory. In this paper, we describe a result due to Akaike [1973], which shows how the data can underwrite an inference concerning the curve’s form based on an estimate of how predictively accurate it will be. We argue that this approach throws light on the theoretical virtues of parsimoniousness, unification, and non ad hocness, on the dispute about Bayesianism, and on empiricism and scientific realism.
Parallelized stochastic gradient descent
 Advances in Neural Information Processing Systems 23
, 2010
"... Abstract With the increase in available data parallel machine learning has become an increasingly pressing problem. In this paper we present the first parallel stochastic gradient descent algorithm including a detailed analysis and experimental evidence. Unlike prior work on parallel optimization a ..."
Abstract

Cited by 97 (4 self)
 Add to MetaCart
(Show Context)
Abstract With the increase in available data parallel machine learning has become an increasingly pressing problem. In this paper we present the first parallel stochastic gradient descent algorithm including a detailed analysis and experimental evidence. Unlike prior work on parallel optimization algorithms
On a Kernelbased Method for Pattern Recognition, Regression, Approximation, and Operator Inversion
, 1997
"... We present a Kernelbased framework for Pattern Recognition, Regression Estimation, Function Approximation and multiple Operator Inversion. Previous approaches such as ridgeregression, Support Vector methods and regression by Smoothing Kernels are included as special cases. We will show connection ..."
Abstract

Cited by 93 (23 self)
 Add to MetaCart
We present a Kernelbased framework for Pattern Recognition, Regression Estimation, Function Approximation and multiple Operator Inversion. Previous approaches such as ridgeregression, Support Vector methods and regression by Smoothing Kernels are included as special cases. We will show connections between the costfunction and some properties up to now believed to apply to Support Vector Machines only. The optimal solution of all the problems described above can be found by solving a simple quadratic programming problem. The paper closes with a proof of the equivalence between Support Vector kernels and Greene's functions of regularization operators.
Statistical Inference, Occam’s Razor, and Statistical Mechanics on the Space of Probability Distributions
, 1997
"... The task of parametric model selection is cast in terms of a statistical mechanics on the space of probability distributions. Using the techniques of lowtemperature expansions, I arrive at a systematic series for the Bayesian posterior probability of a model family that significantly extends known ..."
Abstract

Cited by 78 (3 self)
 Add to MetaCart
The task of parametric model selection is cast in terms of a statistical mechanics on the space of probability distributions. Using the techniques of lowtemperature expansions, I arrive at a systematic series for the Bayesian posterior probability of a model family that significantly extends known results in the literature. In particular, I arrive at a precise understanding of how Occam’s razor, the principle that simpler models should be preferred until the data justify more complex models, is automatically embodied by probability theory. These results require a measure on the space of model parameters and I derive and discuss an interpretation of Jeffreys ’ prior distribution as a uniform prior over the distributions indexed by a family. Finally, I derive a theoretical index of the complexity of a parametric family relative to some true distribution that I call the razor of the model. The form of the razor immediately suggests several interesting questions in the theory of learning that can be studied using the techniques of statistical mechanics.
Slow learners are fast
 In NIPS
, 2009
"... Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems. However, they are inherently sequential in their design which prevents them from taking advantage of modern multicore architectures. In this paper we prove t ..."
Abstract

Cited by 74 (4 self)
 Add to MetaCart
(Show Context)
Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems. However, they are inherently sequential in their design which prevents them from taking advantage of modern multicore architectures. In this paper we prove that online learning with delayed updates converges well, thereby facilitating parallel online learning. 1