Results 1 - 10 of 6,049
Dynamic Bayesian Networks: Representation, Inference and Learning, 2002
"... Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have bee ..."
Cited by 770 (3 self)
random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from
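The snippet above frames Kalman filter models as the linear-Gaussian special case that DBNs generalize. As a hedged illustration only (this is not code from the cited thesis; the 1-D dynamics and all noise values are invented), exact inference in a KFM is the familiar Kalman predict/update step:

```python
import numpy as np

# One predict/update step of a Kalman filter model (KFM), the
# linear-Gaussian special case of a DBN. All matrices are illustrative.
def kalman_step(mu, P, A, Q, H, R, y):
    # Predict: propagate the Gaussian belief through the linear dynamics.
    mu_pred = A @ mu
    P_pred = A @ P @ A.T + Q
    # Update: condition on the observation y (the posterior stays Gaussian,
    # which is exactly what a general DBN does NOT require).
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    mu_new = mu_pred + K @ (y - H @ mu_pred)
    P_new = (np.eye(len(mu)) - K @ H) @ P_pred
    return mu_new, P_new

# Tiny 1-D example: a random-walk state observed with noise.
mu, P = np.array([0.0]), np.array([[1.0]])
A = Q = H = np.array([[1.0]])
R = np.array([[0.5]])
mu, P = kalman_step(mu, P, A, Q, H, R, y=np.array([1.0]))
```

Replacing the linear-Gaussian assumption with arbitrary conditional distributions is what forces the move from this closed-form update to the exact and approximate inference schemes the thesis surveys.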
Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories, 2004
"... Abstract — Current computational approaches to learning visual object categories require thousands of training images, are slow, cannot learn in an incremental manner and cannot incorporate prior information into the learning process. In addition, no algorithm presented in the literature has been te ..."
Cited by 784 (16 self)
tested on more than a handful of object categories. We present a method for learning object categories from just a few training images. It is quick and it uses prior information in a principled way. We test it on a dataset composed of images of objects belonging to 101 widely varied categories. Our
Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 1998
"... Schultz, Wolfram. Predictive reward signal of dopamine neurons. J. Neurophysiol. 80: 1–27, 1998. The effects of lesions, receptor blocking, electrical self-stimulation, and drugs ... is called rewards, which elicit and reinforce approach behavior. The functions of rewards were developed further during ..."
Cited by 747 (12 self)
conditions that resemble reward-predicting stimuli or are novel or particularly salient. However, only few phasic activations follow aversive stimuli. Thus dopamine neurons label ... Rewards come in various physical forms, are highly variable in time and depend on the particular environment of the subject.
Policy gradient methods for reinforcement learning with function approximation. In NIPS, 1999
"... Abstract Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly repres ..."
Cited by 439 (20 self)
“actor-critic” or policy-iteration architectures (e.g., ...). Policy Gradient Theorem: We consider the standard reinforcement learning framework (see, e.g., Sutton and Barto, 1998), in which a learning agent interacts with a Markov decision process (MDP). The state, action, and reward at each time t ∈ {0, 1, 2
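The core idea this entry describes, representing the policy explicitly and following the gradient of expected return with respect to its parameters, can be sketched in its simplest form (a hedged illustration, not the paper's experiments: the two-armed bandit, its payoffs, and the step size are all invented):

```python
import math, random

# REINFORCE-style policy gradient on a two-armed bandit with a
# softmax policy. theta holds the action preferences.
random.seed(0)
theta = [0.0, 0.0]
payoff = [0.2, 1.0]          # expected rewards (illustrative)
alpha = 0.1                  # step size

def softmax(t):
    z = [math.exp(x) for x in t]
    s = sum(z)
    return [x / s for x in z]

for _ in range(2000):
    p = softmax(theta)
    a = 0 if random.random() < p[0] else 1
    r = payoff[a] + random.gauss(0, 0.1)
    # grad of log pi(a) wrt theta_i is (1[a == i] - p_i); scale by reward.
    for i in range(2):
        theta[i] += alpha * r * ((1 if a == i else 0) - p[i])

probs = softmax(theta)  # after training, most mass should sit on arm 1
```

With function approximation and multi-step MDPs the gradient estimate becomes the actor-critic form the paper analyzes, but the update direction is this same score-function term weighted by return.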
A Neural Probabilistic Language Model. Journal of Machine Learning Research, 2003
"... A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen ..."
Cited by 447 (19 self)
is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar (in the sense of having a nearby representation) to words forming an already seen sentence. Training such large models (with millions of parameters) within a reasonable time
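The mechanism the snippet describes, an unseen word sequence receiving high probability because its words have embeddings near those of seen words, can be sketched as follows. This is a hedged toy (the vocabulary, the untrained random weights, and the deliberately similar "cat"/"dog" embeddings are all invented; the real model learns these by gradient descent):

```python
import numpy as np

# Toy neural probabilistic language model: score the next word from
# the concatenated embeddings of two context words. No training here;
# the point is that nearby embeddings give similar predictions.
rng = np.random.default_rng(0)
vocab = ["the", "cat", "dog", "sat"]
V, d = len(vocab), 3
C = rng.normal(size=(V, d))              # word embedding table
C[2] = C[1] + 0.01 * rng.normal(size=d)  # place "dog" next to "cat"
W = rng.normal(size=(V, 2 * d))          # context-to-logits weights

def next_word_probs(w1, w2):
    x = np.concatenate([C[vocab.index(w1)], C[vocab.index(w2)]])
    logits = W @ x
    e = np.exp(logits - logits.max())    # stable softmax
    return e / e.sum()

p_cat = next_word_probs("the", "cat")
p_dog = next_word_probs("the", "dog")
# Because "dog" sits near "cat" in embedding space, the two predicted
# distributions are close even if "the dog" never appeared in training.
```

The training-time cost the snippet alludes to comes from the softmax over the full vocabulary, which for millions of parameters and large V dominates the computation.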
The Simple Economics of Basic Scientific Research. Journal of Political Economy, 1959
"... I begin this essay by reflecting on my early paper (Nelson, 1959), and Ken’s (Arrow, 1962), as period pieces. These papers certainly have been influential in shaping the discussion of science and technology policy over the last forty years, at least among economists, but at the time they were writte ..."
Cited by 438 (5 self)
written, economists were just beginning to get into analysis of the key processes and institutions involved in technological advance. A lot has been learned since that time, and the discussion has become much more sophisticated. I will highlight two of those intellectual developments: the growing
Near-optimal reinforcement learning in polynomial time. Machine Learning, 1998
"... We present new algorithms for reinforcement learning, and prove that they have polynomial bounds on the resources required to achieve nearoptimal return in general Markov decision processes. After observing that the number of actions required to approach the optimal return is lower bounded by the m ..."
Cited by 304 (5 self)
by the mixing time T of the optimal policy (in the undiscounted case) or by the horizon time T (in the discounted case), we then give algorithms requiring a number of actions and total computation time that are only polynomial in T and the number of states, for both the undiscounted and discounted cases
R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001
"... R-max is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-max, the agent always maintains a complete, but possibly inaccurate model of its environment and acts based on the optimal policy derived from this model. The mod ..."
Cited by 297 (10 self)
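The optimism rule behind the abstract's "complete but possibly inaccurate model" can be sketched in miniature. This is a hedged simplification, not the paper's algorithm: full R-max plans with value iteration over an MDP model, whereas this toy strips the idea down to a 3-armed problem where every insufficiently sampled action is assumed to pay the maximum reward R_max (the arm means and the known-ness threshold m are invented):

```python
import random

# Optimism-under-uncertainty core of R-max on a toy 3-armed problem:
# an arm's reward estimate is R_MAX until it has been tried m times.
random.seed(1)
true_mean = [0.3, 0.7, 0.5]   # illustrative; unknown to the agent
R_MAX, m = 1.0, 20
counts = [0, 0, 0]
sums = [0.0, 0.0, 0.0]

def estimate(a):
    # "Unknown" arms look maximally rewarding, so the agent must try them.
    return R_MAX if counts[a] < m else sums[a] / counts[a]

for _ in range(500):
    a = max(range(3), key=estimate)
    r = true_mean[a] + random.gauss(0, 0.1)
    counts[a] += 1
    sums[a] += r
# Every arm is pulled at least m times (forced exploration), after which
# the agent settles on the empirically best arm.
```

The polynomial-time guarantee comes from the same mechanism: each state–action pair can be "unknown" only finitely often, so the optimistic model is corrected after polynomially many mistakes.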
Boosting Image Retrieval, 2000
"... We present an approach for image retrieval using a very large number of highly selective features and efficient online learning. Our approach is predicated on the assumption that each image is generated by a sparse set of visual “causes” and that images which are visually similar share causes. We p ..."
Cited by 304 (4 self)
propose a mechanism for computing a very large number of highly selective features which capture some aspects of this causal structure (in our implementation there are over 45,000 highly selective features). At query time a user selects a few example images, and a technique known as “boosting” is used
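The "boosting" step the snippet mentions, selecting a small weighted combination of features from a large selective pool, works in its textbook AdaBoost form roughly as below. A hedged sketch only (the binary feature vectors and labels are invented, and feature 0 happens to separate this toy data perfectly; the paper's 45,000-feature setting is far richer):

```python
import math

# Minimal AdaBoost with one-feature stumps over binary features,
# in the spirit of boosting over many highly selective features.
X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 0]]  # toy binary features
y = [1, 1, -1, -1]                                 # relevance labels

def stump(j):
    return lambda x: 1 if x[j] == 1 else -1

w = [0.25] * 4                 # uniform weights over examples
classifiers = []
for _ in range(3):
    # Pick the feature whose stump has the lowest weighted error.
    best = min(range(3), key=lambda j: sum(
        wi for wi, xi, yi in zip(w, X, y) if stump(j)(xi) != yi))
    h = stump(best)
    err = sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
    err = max(err, 1e-10)      # guard against zero error
    a = 0.5 * math.log((1 - err) / err)
    classifiers.append((a, h))
    # Reweight: misclassified examples get more weight next round.
    w = [wi * math.exp(-a * yi * h(xi)) for wi, xi, yi in zip(w, X, y)]
    s = sum(w)
    w = [wi / s for wi in w]

def predict(x):
    return 1 if sum(a * h(x) for a, h in classifiers) > 0 else -1
```

The sparsity the authors rely on shows up here as the final classifier using only the few stumps boosting selected, regardless of how many candidate features exist.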
Just-In-Time Learning for Fast and Flexible Inference. In Advances in Neural Information Processing Systems 27, 2014
"... Abstract Much of research in machine learning has centered around the search for inference algorithms that are both generalpurpose and efficient. The problem is extremely challenging and general inference remains computationally expensive. We seek to address this problem by observing that in most ..."
Cited by 3 (3 self)
specific applications of a model, we typically only need to perform a small subset of all possible inference computations. Motivated by this, we introduce just-in-time learning, a framework for fast and flexible inference that learns to speed up inference at runtime. Through a series of experiments, we