Results 1 – 10 of 9,070
Online Convex Programming and Generalized Infinitesimal Gradient Ascent
, 2003
"... Convex programming involves a convex set F ⊆ R^n and a convex function c : F → R. The goal of convex programming is to find a point in F which minimizes c. In this paper, we introduce online convex programming. In online convex programming, the convex set is known in advance, but in each step of some ..."
Cited by 298 (4 self)
been constructed. We introduce an algorithm for this domain, apply it to repeated games, and show that it is really a generalization of infinitesimal gradient ascent, and the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
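The update behind Zinkevich's online convex programming is a projected gradient step: after observing the convex cost for the current round, move against its gradient and project back into the known convex set F. A minimal sketch, assuming a Euclidean-ball feasible set and quadratic per-round costs (both illustrative choices, not from the paper):

```python
import numpy as np

def project_to_ball(x, radius=1.0):
    """Euclidean projection onto F = {x : ||x|| <= radius},
    a simple example of a convex feasible set."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def online_gradient_step(x, grad, step_size):
    """One round of online convex programming: descend the current
    cost's gradient, then project the iterate back into F."""
    return project_to_ball(x - step_size * grad)

# Toy run: per-round costs c_t(x) = ||x - target||^2.
x = np.zeros(2)
target = np.array([0.5, -0.5])
for t in range(1, 101):
    grad = 2.0 * (x - target)              # gradient of c_t at x
    x = online_gradient_step(x, grad, step_size=1.0 / np.sqrt(t))
```

With step sizes of order 1/√t this scheme guarantees sublinear regret against the best fixed point in F, which is the sense in which GIGA is universally consistent.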
Gradient Ascent Critic Optimization
 University of Massachusetts
, 2010
"... In this paper, we address the critic optimization problem within the context of reinforcement learning. The focus of this problem is on improving an agent’s critic, so as to increase performance over a distribution of tasks. We use ordered derivatives, in a process similar to back propagation throug ..."
Cited by 1 (0 self)
, the hunger-thirst domain, and the boxes domain. Starting from a random reward function, our gradient ascent critic optimization is able to find high-performing reward functions which are competitive with ones that are hand-crafted and those found through exhaustive search. We conclude that our sample-based gradient
Reward Design via Online Gradient Ascent
"... Recent work has demonstrated that when artificial agents are limited in their ability to achieve their goals, the agent designer can benefit by making the agent’s goals different from the designer’s. This gives rise to the optimization problem of designing the artificial agent’s goals—in the RL fram ..."
Cited by 10 (1 self)
framework, designing the agent’s reward function. Existing attempts at solving this optimal reward problem do not leverage experience gained online during the agent’s lifetime nor do they take advantage of knowledge about the agent’s structure. In this work, we develop a gradient ascent approach with formal
Convergent Gradient Ascent in General-Sum Games
 in Proceedings of the 13th European Conference on Machine Learning, August 13–19, 2002
, 2002
"... In this work we look at the recent results in policy gradient learning in a general-sum game scenario, in the form of two algorithms, IGA and WoLF-IGA. We address the drawbacks in convergence properties of these algorithms, and propose a more accurate version of WoLF-IGA that is guaranteed to co ..."
Cited by 2 (0 self)
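IGA itself has each player ascend the gradient of its own expected payoff in a two-player, two-action game. A sketch of one IGA step under that setup (the function name and the clipping of probabilities to [0, 1] are our own simplifications; WoLF-IGA additionally switches between a fast and a slow learning rate depending on whether the player is winning or losing):

```python
import numpy as np

def iga_step(alpha, beta, payoff_row, payoff_col, eta=0.01):
    """One Infinitesimal Gradient Ascent (IGA) step in a 2x2 bimatrix game.
    alpha, beta are the probabilities the row/column player assigns to
    their first action; each ascends its own expected payoff gradient.
    Illustrative sketch only."""
    R, C = payoff_row, payoff_col
    # d/d(alpha) of the row player's expected payoff.
    grad_alpha = beta * (R[0, 0] - R[0, 1] - R[1, 0] + R[1, 1]) + (R[0, 1] - R[1, 1])
    # d/d(beta) of the column player's expected payoff.
    grad_beta = alpha * (C[0, 0] - C[0, 1] - C[1, 0] + C[1, 1]) + (C[1, 0] - C[1, 1])
    clip = lambda p: min(1.0, max(0.0, p))
    return clip(alpha + eta * grad_alpha), clip(beta + eta * grad_beta)
```

In a coordination game with payoff matrices R = C = [[1, 0], [0, 1]], repeated IGA steps from a shared starting point above 0.5 drive both players to the pure-strategy equilibrium (1, 1).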
GeNGA: A generalization of natural gradient ascent with positive and negative convergence results.
 ICML,
, 2014
"... Abstract Natural gradient ascent (NGA) is a popular optimization method that uses a positive definite metric tensor. In many applications the metric tensor is only guaranteed to be positive semidefinite (e.g., when using the Fisher information matrix as the metric tensor), in which case NGA is not ..."
Cited by 1 (0 self)
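Plain NGA preconditions the gradient with the inverse metric tensor, θ ← θ + η G(θ)⁻¹ ∇f(θ), which is where a merely positive semidefinite G breaks down. One common workaround, shown here purely as an illustration and not as the GeNGA algorithm itself, is to substitute the Moore-Penrose pseudo-inverse:

```python
import numpy as np

def natural_gradient_step(theta, grad, metric, step_size=0.1):
    """One natural-gradient ascent step using the Moore-Penrose
    pseudo-inverse, so the update stays defined even when the metric
    tensor (e.g. a Fisher information matrix) is only positive
    semidefinite. Illustrative sketch, not GeNGA itself."""
    return theta + step_size * np.linalg.pinv(metric) @ grad
```

With an identity metric this reduces to ordinary gradient ascent; with a singular metric the pseudo-inverse simply discards the gradient components in the metric's null space.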
3.1 Deterministic Gradient Ascent; 3.2 Stochastic Gradient Ascent
, 2015
Direct Gradient-Based Reinforcement Learning: II. Gradient Ascent Algorithms and Experiments
, 1999
"... Abstract In [2] we introduced GPOMDP, an algorithm for computing arbitrarily accurate approximations to the performance gradient of parameterized partially observable Markov decision processes (POMDPs). The algorithm's chief advantages are that it requires only a single sample path of the under ..."
present CONJPOMDP, a conjugate-gradient ascent algorithm that uses GPOMDP as a subroutine to estimate the gradient direction. CONJPOMDP uses a novel line-search routine that relies solely on gradient estimates and hence is robust to noise in the performance estimates. OLPOMDP, an online gradient ascent
Training MRF-Based Phrase Translation Models using Gradient Ascent
"... This paper presents a general, statistical framework for modeling phrase translation via Markov random fields. The model allows for arbitrary features extracted from a phrase pair to be incorporated as evidence. The parameters of the model are estimated using a large-scale discriminative training ap ..."
Cited by 5 (2 self)
approach that is based on stochastic gradient ascent and an N-best list based expected BLEU as the objective function. The model is easy to incorporate into a standard phrase-based statistical machine translation system, requiring no code change in the runtime engine. Evaluation is performed on two
Reinforcement Learning in POMDP's via Direct Gradient Ascent
 In Proc. 17th International Conf. on Machine Learning
, 2000
"... This paper discusses theoretical and experimental aspects of gradient-based approaches to the direct optimization of policy performance in controlled POMDPs. We introduce GPOMDP, a REINFORCE-like algorithm for estimating an approximation to the gradient of the average reward as a function of ..."
Cited by 76 (2 self)
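The REINFORCE-like idea behind GPOMDP is a likelihood-ratio gradient estimate: weight each observed reward by the score function ∇_θ log π_θ(a). A toy one-step sketch with a single-parameter Bernoulli policy (the bandit setup and names are ours; GPOMDP itself extends this with discounted eligibility traces to handle POMDPs):

```python
import numpy as np

def reinforce_gradient(theta, episodes=5000, seed=0):
    """Monte-Carlo likelihood-ratio estimate of d/dtheta E[reward]
    for a sigmoid Bernoulli policy on a toy one-step problem where
    action 1 pays reward 1 and action 0 pays reward 0."""
    rng = np.random.default_rng(seed)
    p = 1.0 / (1.0 + np.exp(-theta))      # pi(a=1 | theta)
    estimates = []
    for _ in range(episodes):
        a = rng.random() < p              # sample an action
        r = 1.0 if a else 0.0             # observe the reward
        score = (1.0 - p) if a else -p    # d/dtheta log pi(a | theta)
        estimates.append(r * score)
    return float(np.mean(estimates))
```

At θ = 0 the true gradient of the expected reward is p(1 − p) = 0.25, and the Monte-Carlo estimate concentrates around that value as the number of episodes grows.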
Learning POMDP Policies with Internal State using Gradient Ascent
, 2001
"... In [8, 9] we introduced GPOMDP, an algorithm for estimating the gradient of the average reward for arbitrary Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies. GPOMDP applies to purely reactive (memoryless) policies, or policies that generate act ..."