Results 1 - 10 of 824,879
Maximum likelihood from incomplete data via the EM algorithm
Journal of the Royal Statistical Society, Series B, 1977
"... A broadly applicable algorithm for computing maximum likelihood estimates from incomplete data is presented at various levels of generality. Theory showing the monotone behaviour of the likelihood and convergence of the algorithm is derived. Many examples are sketched, including missing value situations, applications to grouped, censored or truncated data, finite mixture models, variance component estimation, hyperparameter estimation, iteratively reweighted least squares and factor analysis. ..."
Cited by 11807 (17 self)
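The EM iteration the abstract alludes to is easy to state for the finite-mixture case it lists. Below is a minimal sketch for a two-component 1-D Gaussian mixture; the synthetic data, initial guesses, and iteration cap are illustrative choices, not details from the paper. The assertion checks the monotone likelihood behaviour the abstract highlights:

```python
import math
import random

random.seed(0)
# Synthetic 1-D data from two well-separated Gaussians (illustrative choice)
data = [random.gauss(-2.0, 1.0) for _ in range(200)] + \
       [random.gauss(3.0, 1.0) for _ in range(200)]

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Deliberately rough initial guesses
pi = [0.5, 0.5]          # mixing weights
mu = [-1.0, 1.0]         # component means
sigma = [1.0, 1.0]       # component standard deviations

prev_ll = float("-inf")
for _ in range(50):
    # E-step: posterior responsibility of each component for each point
    resp = []
    for x in data:
        p = [pi[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
        total = sum(p)
        resp.append([pk / total for pk in p])
    # M-step: closed-form re-estimates of weights, means, and variances
    for k in range(2):
        nk = sum(r[k] for r in resp)
        pi[k] = nk / len(data)
        mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
        sigma[k] = math.sqrt(sum(r[k] * (x - mu[k]) ** 2
                                 for r, x in zip(resp, data)) / nk)
    # The property proved in the paper: the log-likelihood never decreases
    ll = sum(math.log(sum(pi[k] * normal_pdf(x, mu[k], sigma[k])
                          for k in range(2))) for x in data)
    assert ll >= prev_ll - 1e-9
    prev_ll = ll
```

After a few iterations the estimated means settle near the true values (-2 and 3); the assertion inside the loop is exactly the monotonicity guarantee, checked numerically at each step.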
Least-Squares Policy Iteration
Journal of Machine Learning Research, 2003
"... We propose a new approach to reinforcement learning for control problems which combines value-function approximation with linear architectures and approximate policy iteration. This new approach ..."
Cited by 461 (12 self)
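As a rough illustration of the idea in this abstract (not the authors' implementation), the sketch below runs LSTD-Q inside a policy-iteration loop on a hypothetical two-state MDP with one-hot features; the MDP, the features, and the discount factor are all assumptions made for the example:

```python
import itertools

GAMMA = 0.9
STATES = (0, 1)
ACTIONS = (0, 1)

def phi(s, a):
    # One-hot feature vector over the four (state, action) pairs
    v = [0.0] * 4
    v[2 * s + a] = 1.0
    return v

def step(s, a):
    # Hypothetical deterministic MDP: the action chooses the next state,
    # and landing in state 1 pays reward 1
    s2 = a
    return s2, 1.0 if s2 == 1 else 0.0

def solve(A, b):
    # Gaussian elimination with partial pivoting (small dense systems only)
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (M[i][n] - sum(M[i][c] * w[c] for c in range(i + 1, n))) / M[i][i]
    return w

def lstdq(samples, policy):
    # LSTD-Q: solve A w = b, where
    #   A = sum phi(s,a) (phi(s,a) - gamma * phi(s', policy(s')))^T
    #   b = sum phi(s,a) * r
    # giving the weights of the value function Q^policy in the feature basis
    n = 4
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for s, a, s2, r in samples:
        f, f2 = phi(s, a), phi(s2, policy[s2])
        for i in range(n):
            for j in range(n):
                A[i][j] += f[i] * (f[j] - GAMMA * f2[j])
            b[i] += f[i] * r
    return solve(A, b)

# One sampled transition per (s, a) pair suffices in this deterministic toy MDP
samples = [(s, a) + step(s, a) for s, a in itertools.product(STATES, ACTIONS)]

policy = {0: 0, 1: 0}    # start from an arbitrary policy
for _ in range(10):      # approximate policy iteration loop
    w = lstdq(samples, policy)
    greedy = {s: max(ACTIONS, key=lambda a: sum(wi * fi for wi, fi in zip(w, phi(s, a))))
              for s in STATES}
    if greedy == policy:  # policy stable: done
        break
    policy = greedy
```

On this toy problem the loop converges in two evaluations to the policy that always moves to state 1, with Q-weights matching the exact values (10 for the rewarding action, 9 otherwise). The one-hot feature basis makes the approximation exact here; the point of the method is that the same loop runs with far fewer features than state-action pairs.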
Exploration in Least-Squares Policy Iteration
"... One of the key problems in reinforcement learning is balancing exploration and exploitation. Another is learning and acting in large or even continuous Markov decision processes (MDPs), where compact function approximation has to be used. In this paper, we provide a practical solution to exploring large MDPs by integrating a powerful exploration technique, R-max, into a state-of-the-art learning algorithm, least-squares policy iteration (LSPI). This approach combines the strengths of both methods, and has shown its effectiveness and superiority over LSPI with two other popular exploration rules ..."
Online Exploration in Least-Squares Policy Iteration
"... One of the key problems in reinforcement learning is balancing exploration and exploitation. Another is learning and acting in large or even continuous Markov decision processes (MDPs), where compact function approximation has to be used. In this paper, we provide a practical solution to exploring large MDPs by integrating a powerful exploration technique, R-max, into a state-of-the-art learning algorithm, least-squares policy iteration (LSPI). This approach combines the strengths of both methods, and has shown its effectiveness and superiority over LSPI with two other popular exploration rules ..."
Cited by 18 (5 self)
Finite-Sample Analysis of Least-Squares Policy Iteration
Journal of Machine Learning Research (JMLR), 2011
"... In this paper, we report a performance bound for the widely used least-squares policy iteration (LSPI) algorithm. We first consider the problem of policy evaluation in reinforcement learning, that is, learning the value function of a fixed policy, using the least-squares temporal-difference (LSTD) ..."
Cited by 19 (5 self)
Incremental least squares policy iteration for POMDPs
In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI)
"... We present a new algorithm, called incremental least squares policy iteration (ILSPI), for finding the infinite-horizon stationary policy for partially observable Markov decision processes (POMDPs). The ILSPI algorithm computes a basis representation of the infinite-horizon value function ..."
Cited by 4 (1 self)
Incremental Least Squares Policy Iteration for POMDPs
"... We present a new algorithm, called incremental least squares policy iteration (ILSPI), for solving the infinite-horizon stationary policy for partially observable Markov decision processes (POMDPs). The ILSPI algorithm computes a basis representation of the infinite-horizon value function ..."
Least Square Policy Iteration in Reinforcement Learning
"... Policy iteration is the core procedure for solving reinforcement learning problems. Policy iteration evaluates policies by evaluating their value functions, and then new, improved policies are derived from these value functions. ... The policy is expressed by instantly calculating the policy action from the approximate function rather than from an explicit expression. The least-squares reinforcement method is sample-effective in solving for the parameters approximating the value function ..."
Model-Free Least-Squares Policy Iteration
Advances in Neural Information Processing Systems, 2001
"... We propose a new approach to reinforcement learning which combines least-squares function approximation with policy iteration. Our method is model-free and completely off-policy. We are motivated by the least-squares temporal difference learning algorithm (LSTD), which is known for its efficient ..."
Cited by 25 (1 self)
Model-Free Least Squares Policy Iteration
"... 1 Introduction. Linear least squares function approximators offer many advantages in the context of reinforcement learning. While their ability to generalize is less powerful than black box methods such as neural networks, they have their virtues: they are easy to implement and use ..."