Results 1 - 10 of 620,222
Active Learning with Statistical Models
, 1995
"... For many types of learners one can compute the statistically "optimal" way to select data. We review how these techniques have been used with feedforward neural networks [MacKay, 1992; Cohn, 1994]. We then show how the same principles may be used to select data for two alternative, statist ..."
Cited by 679 (10 self)
... statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate.
Sparse Bayesian Learning and the Relevance Vector Machine
, 2001
"... This paper introduces a general Bayesian framework for obtaining sparse solutions to regression and classification tasks utilising models linear in the parameters. Although this framework is fully general, we illustrate our approach with a particular specialisation that we denote the `relevance vect ..."
Cited by 966 (5 self)
Gaussian processes for machine learning
, 2003
"... We give a basic introduction to Gaussian Process regression models. We focus on understanding the role of the stochastic process and how it is used to define a distribution over functions. We present the simple equations for incorporating training data and examine how to learn the hyperparameters us ..."
Cited by 720 (2 self)
Locally weighted learning
 ARTIFICIAL INTELLIGENCE REVIEW
, 1997
"... This paper surveys locally weighted learning, a form of lazy learning and memory-based learning, and focuses on locally weighted linear regression. The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, ass ..."
Cited by 599 (51 self)
Training Linear SVMs in Linear Time
, 2006
"... Linear Support Vector Machines (SVMs) have become one of the most prominent machine learning techniques for high-dimensional sparse data commonly encountered in applications like text classification, word-sense disambiguation, and drug design. These applications involve a large number of examples n ..."
Cited by 549 (6 self)
... as well as a large number of features N, while each example has only s << N nonzero features. This paper presents a Cutting-Plane Algorithm for training linear SVMs that provably has training time O(sn) for classification problems and O(sn log(n)) for ordinal regression problems. The algorithm ...
A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments
 Journal of Business & Economic Statistics
, 2002
"... Weak instruments arise when the instruments in linear instrumental variables (IV) regression are weakly correlated with the included endogenous variables. In generalized method of moments (GMM), more generally, weak instruments correspond to weak identification of some or all of the unknown paramete ..."
Cited by 484 (11 self)
Learning from demonstration.
 Advances in Neural Information Processing Systems 9.
, 1997
"... By now it is widely accepted that learning a task from scratch, i.e., without any prior knowledge, is a daunting undertaking. Humans, however, rarely attempt to learn from scratch. They extract initial biases as well as strategies how to approach a learning problem from instructions and/or ..."
Cited by 399 (32 self)
... speed up learning. In general nonlinear learning problems, only model-based reinforcement learning shows significant speedup after a demonstration, while in the special case of linear quadratic regulator (LQR) problems, all methods profit from the demonstration. In an implementation of pole balancing ...
Support Vector Machines for Classification and Regression
 UNIVERSITY OF SOUTHAMPTON, TECHNICAL REPORT
, 1998
"... The problem of empirical data modelling is germane to many engineering applications. In empirical data modelling a process of induction is used to build up a model of the system, from which it is hoped to deduce responses of the system that have yet to be observed. Ultimately the quantity and qualit ..."
Cited by 357 (5 self)
... and quality of the observations govern the performance of this empirical model. By its observational nature data obtained is finite and sampled; typically this sampling is non-uniform and due to the high dimensional nature of the problem the data will form only a sparse distribution in the input space ...
The Determinants of Credit Spread Changes.
 Journal of Finance
, 2001
"... Using dealer's quotes and transactions prices on straight industrial bonds, we investigate the determinants of credit spread changes. Variables that should in theory determine credit spread changes have rather limited explanatory power. Further, the residuals from this regression are ..."
Cited by 422 (2 self)
... rates, $r^{10}_t$. To capture potential nonlinear effects due to convexity, we also include the squared level of the term structure, $(r^{10}_t)^2$. Slope of Yield Curve: We define the slope of the yield curve as the difference between Datastream's 10-year and 2-year Benchmark Treasury yields, slope ...
Greedy layer-wise training of deep networks
, 2006
"... Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multilayer neural networks have many levels of nonlinearities allow ..."
Cited by 394 (48 self)