Policy gradient methods for reinforcement learning with function approximation.
In NIPS, 1999
"... Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly repres ..."
Cited by 439 (20 self)
represented by its own function approximator, independent of the value function, and is updated according to the gradient of expected reward with respect to the policy parameters. Williams's REINFORCE method and actor-critic methods are examples of this approach. Our main new result is to show
Flatness and defect of nonlinear systems: Introductory theory and examples
International Journal of Control, 1995
"... We introduce flat systems, which are equivalent to linear ones via a special type of feedback called endogenous. Their physical properties are subsumed by a linearizing output and they might be regarded as providing another nonlinear extension of Kalman’s controllability. The distance to flatness is ..."
Cited by 346 (23 self)
is measured by a nonnegative integer, the defect. We utilize differential algebra which suits well to the fact that, in accordance with Willems’ standpoint, flatness and defect are best defined without distinguishing between input, state, output and other variables. Many realistic classes of examples
Algorithmic information theory
IBM Journal of Research and Development, 1977
"... This paper reviews algorithmic information theory, which is an attempt to apply information-theoretic and probabilistic ideas to recursive function theory. Typical concerns in this approach are, for example, the number of bits of information required to specify an algorithm, or the probability that ..."
Cited by 385 (18 self)
Reflective and impulsive determinants of social behavior
Personality and Social Psychology Review, 2004
"... This article describes a 2-systems model that explains social behavior as a joint function of reflective and impulsive processes. In particular, it is assumed that social behavior is controlled by 2 interacting systems that follow different operating principles. The reflective system generates behav ..."
Cited by 365 (5 self)
behavioral decisions that are based on knowledge about facts and values, whereas the impulsive system elicits behavior through associative links and motivational orientations. The proposed model describes how the 2 systems interact at various stages of processing, and how their outputs may determine behavior
Discriminative Reranking for Natural Language Parsing
2005
"... This article considers approaches which rerank the output of an existing probabilistic parser. The base parser produces a set of candidate parses for each input sentence, with associated probabilities that define an initial ranking of these parses. A second model then attempts to improve upon this i ..."
Cited by 333 (9 self)
Solving Shape-Analysis Problems in Languages with Destructive Updating
POPL '96, 1996
"... This paper concerns the static analysis of programs that perform destructive updating on heap-allocated storage. We give an algorithm that conservatively solves this problem by using a finite shape-graph to approximate the possible “shapes” that heap-allocated structures in a program can take on. In ..."
Cited by 306 (20 self)
In contrast with previous work, our method is even accurate for certain programs that update cyclic data structures. For example, our method can determine that when the input to a program that searches a list and splices in a new element is a possibly circular list, the output is a possibly circular list.
Multiobjective output feedback control via LMI
In Proc. Amer. Contr. Conf., 1997
"... The problem of multiobjective H2/H∞ optimal controller design is reviewed. There is as yet no exact solution to this problem. We present a method based on that proposed by Scherer [14]. The problem is formulated as a convex semidefinite program (SDP) using the LMI formulation of the H2 and H∞ norms. ..."
Cited by 220 (8 self)
case. A simple example computed using FIR (Finite Impulse Response) Q's is presented.
Marginal Likelihood From the Metropolis-Hastings Output
Journal of the American Statistical Association, 2001
"... This article provides a framework for estimating the marginal likelihood for the purpose of Bayesian model comparisons. The approach extends and completes the method presented in Chib (1995) by overcoming the problems associated with the presence of intractable full conditional densities. The propos ..."
Cited by 217 (16 self)
, hierarchical random effects model for clustered Gaussian data, Poisson regression model for clustered count data, and the multivariate probit model for correlated binary data, are used to illustrate the performance and implementation of the method. These examples demonstrate that the method is practical
Appendix G. THERPS Methodology and example output from THERPS model G.1. Summary of Changes in TREX to Allow for Food Intake Estimation for
Equation 1 is an iguanid food ingestion rate that was implemented in THERPS to allow for estimation of daily food ingestion for herptiles (Nagy, 1987 as cited in U.S. EPA, 1993, equation 313, page 37). (EQ 1) Equation 1 replaces the following equivalent allometric equation that is used in TREX (v. 1.3.1.) to estimate food ingestion rates of birds, reported by Nagy (1987) and cited in U.S. EPA (1993): (EQ 2) The iguanid allometric equation presented in U.S. EPA (1993) (EQ 1) is used to estimate the food ingestion rate of herpetofauna. It is assumed that since both reptiles and amphibians are poikilothermic, they have similar caloric requirements. The assumption that use of the iguanid lizard allometric equation results in a reasonable approximation of terrestrial phase amphibian food intake was tested. For this analysis, measured food intake values reported for juvenile bullfrogs (Rana catesbeiana) by Modzelewski and Culley (1974, as cited in U.S. EPA, 1993) were compared to estimates
The Relevance Vector Machine
2000
"... The support vector machine (SVM) is a state-of-the-art technique for regression and classification, combining excellent generalisation properties with a sparse kernel representation. However, it does suffer from a number of disadvantages, notably the absence of probabilistic outputs, the requirement ..."
Cited by 294 (6 self)
