Results 1 - 10 of 4,056,022
Q-learning with linear function approximation
Proceedings of the 20th Annual Conference on Learning Theory, 2007
"... In this paper, we analyze the convergence of Q-learning with linear function approximation. We identify a set of conditions that implies the convergence of this method with probability 1, when a fixed learning policy is used. We discuss the differences and similarities between our results and those ..."
Cited by 6 (1 self)
Optimality of reinforcement learning algorithms with linear function approximation
In NIPS, 2002
"... There are several reinforcement learning algorithms that yield approximate solutions for the problem of policy evaluation when the value function is represented with a linear function approximator. In this paper we show that each of the solutions is optimal with respect to a specific objective func ..."
Cited by 31 (2 self)
Convergence of Q-learning with linear function approximation
"... In this paper, we analyze the convergence properties of Q-learning using linear function approximation. This algorithm can be seen as an extension to stochastic control settings of TD-learning using linear function approximation, as described in [1]. We derive a set of conditions that imp ..."
Cited by 6 (0 self)
On the Convergence of Temporal-Difference Learning with Linear Function Approximation
"... The asymptotic properties of temporal-difference learning algorithms with linear function approximation are analyzed in this paper. The analysis is carried out in the context of the approximation of a discounted cost-to-go function associated with an uncontrolled Markov chain with an unco ..."
Convergence of Synchronous Reinforcement Learning with Linear Function Approximation
2004
"... Synchronous reinforcement learning (RL) algorithms with linear function approximation are representable as inhomogeneous matrix iterations of a special form (Schoknecht & Merke, 2003). In this paper we state conditions of convergence for general inhomogeneous matrix iterations and prove th ..."
Cited by 1 (0 self)
Improved Temporal Difference Methods with Linear Function Approximation
"... This chapter considers temporal difference algorithms within the context of infinite-horizon finite-state dynamic programming problems with discounted cost and linear cost function approximation. This problem arises as a subproblem in the policy iteration method of dynamic programming. Additional d ..."
Cited by 32 (8 self)
Convergent Fitted Value Iteration with Linear Function Approximation
"... Fitted value iteration (FVI) with ordinary least squares regression is known to diverge. We present a new method, “Expansion-Constrained Ordinary Least Squares” (ECOLS), that produces a linear approximation but also guarantees convergence when used with FVI. To ensure convergence, we constrain the ..."
Cited by 1 (0 self)
The Stability of General Discounted Reinforcement Learning with Linear Function Approximation
In Proceedings of the UK Workshop on Computational Intelligence (UKCI02), 2002
"... This paper shows that general discounted return estimating reinforcement learning algorithms cannot diverge to infinity when a form of linear function approximator is used for approximating the value-function or Q-function. The results are significant insofar as examples of divergence of the value-f ..."
Cited by 5 (0 self)
Nonlinear Functional Approximation of Heterogeneous Dynamics
2005
"... In modeling phenomena continuously observed and/or sampled at discrete time sequences, one problem is that often dynamics come from heterogeneous sources of uncertainty. This turns out particularly challenging with a low signal-to-noise ratio, due to the structural or experimental conditions; for instance, information appears dispersed in a wide spectrum of frequency bands or resolution levels. We aim to design ad hoc approximation instruments dealing with a particularly complex class of random processes, the one that generates financial returns, or their aggregates as index returns. The underlying ..."
A Logarithmic Neural Network Architecture for Unbounded Non-Linear Function Approximation
"... Multilayer feedforward neural networks with sigmoidal activation functions have been termed "universal function approximators". Although these types of networks can approximate any continuous function to a desired degree of accuracy, this approximation may require an inordinate number of hidden nodes and is only accurate over a finite interval. These shortcomings are due to the standard multilayer perceptron's (MLP) architecture not being well suited to unbounded non-linear function approximation. A new architecture incorporating a logarithmic hidden layer proves to be superior ..."
Cited by 5 (0 self)