58 citations found.
P. R. Kumar and P. P. Varaiya. Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice Hall, Englewood Cliffs, NJ, 1986.

This paper is cited in the following contexts:

Multi-model Approach to Non-stationary Reinforcement Learning - Samuel Choi Dit

....is relatively easy as existing model based RL techniques (e.g. prioritized sweeping [7]) can be directly applied. In our implementation, we keep the most recent N tuples of experience by using a fixed width history window and then build the environment models by the certainty equivalent method [5]. When a model is constructed, the optimal policy (or a near optimal policy, depending on the allowed computation time) is also computed simultaneously. The main difficulty in model construction is that the window width N must be carefully chosen. If the constant N is too small, there will be ....

P. Kumar and P. Varaiya. Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice Hall, 1986.


A Gentle Introduction to the Universal Algorithmic Agent AIXI - Hutter (2003)   (2 citations)

....useful from a practical point of view. 4.2 On the Optimality of AIXI. In this section we outline ways towards an optimality proof of AIXI. Sources of inspiration are the SP loss bounds proven in Section 3 and optimality criteria from the adaptive control literature (mainly) for linear systems [KV86]. The value bounds for AIXI are expected to be, in a sense, weaker than the SP loss bounds because the problem class covered by AIXI is much larger than the class of induction problems. Convergence of # to has already been proven, but is not sufficient to establish convergence of the behavior of ....

....is meant by universal optimality. A learner (like AIXI) may converge to the optimal informed decision maker (like AI) in several senses. Possibly relevant concepts from statistics are consistency, self tuningness, self optimizingness, efficiency, unbiasedness, asymptotic or finite convergence [KV86], Pareto optimality, and some more defined in Section 4.3. Some concepts are stronger than necessary, others are weaker than desirable but suitable to start with. Self optimizingness is defined as the asymptotic convergence of the average true value 1m of AI# to the optimal value V # 1m . ....

[Article contains additional citation context not shown here]

P. R. Kumar and P. P. Varaiya. Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice Hall, Englewood Cliffs, NJ, 1986.


Structure Theorems for Partially Asynchronous Iterations of .. - Reza Gharavi School (2001)   (2 citations)

....In discounted dynamic programming, for instance, this reduces to finding the fixed point of the equation y = c + aPy, where a ∈ (0, 1) is a discount factor, c is the cost vector associated with some stationary policy, and P is the transition probability matrix associated with this policy; see [9], for instance, for a discussion of the theory of discounted dynamic programming. Again, in most applications, P is large and sparse, making iterative approaches to finding the fixed point advantageous over direct methods. The sequence of iterates of the fixed point equation y = Ay Dc ....

Kumar, P. R., and Varaiya, P., Stochastic Systems: Estimation, Identification, and Adaptive Control, Prentice Hall, 1986.


Random Sampling of a Continuous-time Stochastic Dynamical.. - Micheli, Jordan (2002)   (3 citations)

....initial condition Q(0) 0, calculated in T k , i.e. Q k = Q(T k ). Equation (6) may be solved numerically on line; see Appendix A of [5] for further details. 2.1 Estimation at Poisson arrivals. The natural way of performing state estimation for a discrete time system like (2) is Kalman Filtering [4], [7]. However, one has to pay special attention to the fact that some of the parameters that are deterministic in ordinary Kalman Filtering are, in our case, random: namely, matrices A k and Q k . Also, time intervals are themselves measurements, as well as sequence k ) In the light of ....

P. R. Kumar and P. Varaiya. Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice Hall, Englewood Cliffs, New Jersey, 1986.


Algorithms for Sequential Decision Making - Littman (1996)   (62 citations)

....tuple ( a, r, the arrays are updated by r[ r[ cid, cid, Given these statistics, we estimate ( c[ nd k( C[ The estimated model can be used in any of several ways to find a good policy. In the certainty equivalence approach [86], an optimal policy for the estimated model is found at each step. This makes maximal use of the available data at the cost of high computational overhead. In the DYNA [155], prioritized sweeping [111], and Queue dyna [119] approaches, an estimated value function is maintained and updated according ....

P. R. Kumar and P. P. Varaiya. Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice Hall, Englewood Cliffs, New Jersey, 1986.


Approccio Probabilistico Alla Navigazione Autonoma in Tre.. - Micheli (1999)

.... The first three subsections refer to the Bayesian approach, which is suitable for those situations where a prior probabilistic model of the variables that have to be estimated is known; the last subsection refers instead to the classical (or Fisher's) approach, which does not presume any kind of prior knowledge on such variables that are therefore treated as deterministic (unknown) quantities. (See, for example, [32] and [40].) [Figure 4.1: Representation of the static estimation problem.] 4.1.1 Static Bayesian estimation: generalities. Suppose we wish to estimate the ....

....to be known. The best linear estimate of x given y is given by (4.1) where: # xy = # x H , # y = H# x H #, m y = Hm x , therefore: E[x y] m x # x H H# x H (y x ) (4.5). Sometimes it is convenient to express such estimate in another form, using the following lemma [40] [32]: One can prove that linear model (4.4) can actually fit any problem where the covariance matrix of vector [x is defined [40, p. 38] Lemma 4.3 (Matrix Inversion Lemma) Given two invertible matrices NN MM and a matrix H , the following identity holds: H#H 1 = # ....

[Article contains additional citation context not shown here]

P. R. Kumar and P. Varaiya. Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice Hall, Englewood Cliffs, New Jersey, 1986.


Broadcasting and Streaming Stored Video - Saparilla (2000)

....to the enhancement layer) a more aggressive, optimistic policy is to allocate the available bandwidth in proportion to the average consumption rates of the layers. The problem of dynamically allocating bandwidth among the layers can be formulated as an adaptive stochastic control problem [24]. The fraction of bandwidth allocated to a layer can depend on a number of observable factors, including the current and past available bandwidth, the current prefetch buffer contents, and the dynamic consumption rates of the videos. However, the statistical characteristics of the available ....

P. R. Kumar and P. Varaiya. Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice-Hall, 1986.


Practical Reinforcement Learning in Continuous Domains - Forbes, Andre (2000)   (2 citations)

....that point, the sequence was repeated again. use a gradient ascent strategy to find an approximation to the optimal action for our value function. Many algorithms discretize the action space, but for many control tasks, insufficiently quantized actions can cause oscillations and unstable policies [Kumar and Varaiya, 1986]. In order to test our instance based Q learning algorithm, we evaluated three function approximation algorithms in the cart centering domain. The domain is similar to the one described in Section 5, but it is a bit simpler (i.e. smaller world and smaller time step). In order to simulate the ....

P.R. Kumar and Pravin Varaiya. Stochastic systems: Estimation, identification, and adaptive control. Prentice-Hall, Englewood Cliffs, New Jersey, 1986.


Monotonicity Of Optimal Performance Measures For Polling Systems - Van Oyen (1997)

....node n in service) at time t. The class of admissible strategies, U, is taken to be the set of non anticipative, non preemptive (with respect to services and set ups) policies that are based on perfect present and past observations of the queue length processes (see Ross [9] or Kumar and Varaiya [5] for further elaboration on the stochastic control framework). Randomized control actions are admissible. A switching set up time, D n , and a switching set up cost of K n (0 K n 1) is incurred at each instant (including time 0) the server switches to queue n from a different queue to process ....

Kumar, P.R. and Varaiya, P. (1986) Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice-Hall, Englewood Cliffs.


Reinforcement Learning: A Survey - Kaelbling, Littman, Moore (1996)   (367 citations)

....We begin with the most conceptually straightforward method: first, learn the T and R functions by exploring the environment and keeping statistics about the results of each action; next, compute an optimal policy using one of the methods of Section 3. This method is known as certainty equivalence (Kumar & Varaiya, 1986). There are some serious objections to this method: • It makes an arbitrary division between the learning phase and the acting phase. • How should it gather data about the environment initially? Random exploration might be dangerous, and in some environments is an immensely inefficient method ....

Kumar, P.R., & Varaiya, P.P. (1986). Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice Hall, Englewood Cliffs, New Jersey.


Finite-capacity Multi-class Production Scheduling with Set-up.. - Kim, Van Oyen (1998)

.... to be the set of non preemptive, stationary, nonrandomized, greedy (that is, never idling in a nonempty queue) policies that are functions of the current state, based on perfect observations of the queue length processes (see page 103 of Ross [19] and pages 38, 42, and 152 of Kumar and Varaiya [11] for terminology). By non preemptive, we mean that service times and set ups cannot be interrupted once initiated. Because of the non preemptive assumption, the set of decision epochs is assumed to be the set of all arrival epochs to an idle server, service completion epochs, and set up ....

Kumar, P.R. and Varaiya, P. (1986) Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice-Hall, Englewood Cliffs.


A General Method for Incremental Self-Improvement and.. - Schmidhuber (1996)   (6 citations)

....system consisting of multiple agents, where each agent is in fact just a connection in a fully recurrent RL neural net. 2 The biggest difference between time and space is that you can't reuse time. Merrick Furst 1 Theoretical Considerations Previous work on reinforcement learning (e.g. [16, 2, 46, 48]) requires strong assumptions about the environment. For instance, previous algorithms are designed for environments where the expected reward for a certain behavior is the same during all successive trials. In realistic settings, however, such assumptions do not hold. In general, any ....

....occurring early in system life may influence events actions experiments at any later time. In particular, PMP i may affect the environmental conditions for PMP k ; k i. This is not addressed by existing algorithms for adaptive control and reinforcement learning (see, e.g. [16, 2, 46, 48]) and not even by naive, inefficient, but more general and supposedly infallible exhaustive search among all possible policies, as will be seen next. What's wrong with exhaustive search? Apart from the fact that exhaustive search is not considered practical even for moderate search spaces, it ....

P. R. Kumar and P. Varaiya. Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice Hall, 1986.


Finite-capacity Multi-class Production Scheduling with Set-up.. - Kim, Van Oyen (1998)

.... to be the set of non preemptive, stationary, nonrandomized, greedy (that is, never idling in a nonempty queue) policies that are functions of the current state, based on perfect observations of the queue length processes (see page 103 of Ross [18] and pages 38, 42, and 152 of Kumar and Varaiya [11] for terminology). By non preemptive, we mean that service times and set ups cannot be interrupted once initiated. Because of the non preemptive assumption, the set of decision epochs is assumed to be the set of all arrival epochs to an idle server, service completion epochs, and set up completion ....

....policy for system A. If we allow A to be in U R;NS , the preceding argument can easily be adapted to yield the same conclusion and J A (A) J B (B). Because the problem with finite queue lengths has both finite state and action spaces, the results of Ross [18] or Kumar and Varaiya [11] indicate that there exists a policy in U that is optimal. Numerical optimization computes the optimal cost of the truncated problem to within ε. Since U ⊂ U R;NS , the result holds. To conclude, we note that the same arguments can be applied if the buffers for system A satisfy M 0 i M i ....

Kumar, P.R. and Varaiya, P. (1986) Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice-Hall, Englewood Cliffs.


Practical Reinforcement Learning in Continuous Domains - Forbes, Andre (1999)   (2 citations)

....The first 20 trials at = −1, the next 10 trials at = −0.5, the next 10 trials at = −0.25, and the final 10 trials at = −0.125. At that point, the sequence was repeated again. control tasks, insufficiently quantized actions can cause oscillations and unstable policies [Kumar and Varaiya, 1986]. In order to test our instance based Q learning algorithm, we evaluated three function approximation algorithms in the cart centering domain. The domain is similar to the one described in Section 5, but it is a bit simpler (i.e. smaller world and smaller time step). In order to simulate the ....

P.R. Kumar and Pravin Varaiya. Stochastic systems: Estimation, identification, and adaptive control. Prentice-Hall, 1986.


Extended Message Passing Algorithm for Inference in Loopy.. - Plarre, Kumar (2002)   (6 citations)  Self-citation (Kumar)

....models, as a way of approximating the posterior distributions; see, for example, [6]. In Gaussian graphical models (models in which the nodal variables are jointly Gaussian) the problem of probabilistic inference reduces to that of computing the posterior mean and covariance. The problem is linear [10], and the algorithm takes a simplified form [16]. In fact, the posterior mean can be obtained as the solution to a system of linear equations, while the posterior covariance can be obtained as the inverse of a matrix. Recently, message passing algorithms have been studied on Gaussian graphical ....

P. R. Kumar, P. Varaiya, Stochastic systems: estimation, identification, and adaptive control, Englewood Cliffs, N.J.: Prentice Hall, 1986.


Control of Hybrid Systems - Deshpande (1994)   (11 citations)  Self-citation (Varaiya)

....is guaranteed with a specified confidence. In our development, we are concerned with viable control of nondeterministic hybrid systems. Some results on viability theory are given by Aubin [2]. The approaches of optimal control and stochastic control are frequently combined (see Kumar and Varaiya [13]). Similarly, the approaches of viable control and stochastic control can also be combined (see Kurzhansky [14]). When the condition for optimality is relaxed so that any control trajectory with cost close to optimal is permitted, then the approaches of optimal control and viable control (or all ....

P. Kumar and P. Varaiya. Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice Hall, 1986.


Technical Report IDSIA-09-06 Asymptotic Learnability of.. - Problems With Arbitrary

No context found.

P. R. Kumar and P. P. Varaiya. Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice Hall, Englewood Cliffs, NJ, 1986.


Individual Equilibrium and Learning in a Processor Sharing.. - Altman, Shimkin (1996)   (1 citation)

No context found.

Kumar, P. R., and P. Varaiya. 1986. Stochastic Systems: Estimation, Identification and Adaptive Control. Prentice-Hall, NJ.


Discrete-Time Analysis of Adaptive Rate Control Mechanisms - Altman, Baccelli, Bolot (1994)   (15 citations)

No context found.

P. R. Kumar, P. Varaiya, Stochastic Systems: Estimation, Identification and Control, Prentice-Hall, 1986.


Revisiting the TTL-based Controlled Flooding Search: Optimality .. - Chang, Liu (2004)   (2 citations)

No context found.

P.R. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification, and Adaptive Control, Prentice-Hall, Inc, 1986, Englewood Cliffs, NJ.


Sequence Prediction based on Monotone Complexity - Hutter (2003)

No context found.

P. R. Kumar and P. P. Varaiya. Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice Hall, Englewood Cliffs, NJ, 1986.


Cross Traffic Estimation by Loss Process Analysis - Salamatian, BAYNAT, BUGNAZET

No context found.

P. R. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification, and Adaptive Control, Prentice-Hall, Englewood Cliffs, N.J., 1986.


Self-Optimizing and Pareto-Optimal Policies in General.. - Hutter (2002)   (6 citations)

No context found.

P. R. Kumar and P. P. Varaiya. Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice Hall, Englewood Cliffs, NJ, 1986.


Dynamic Power Management for Nonstationary Service.. - Chung, Benini.. (2002)   (19 citations)

No context found.

P. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice Hall, 1986.


Stochastic Routing in Ad Hoc Wireless Networks - Lott, Teneketzis (2001)

No context found.

P. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification, and Adaptive Control, Prentice-Hall, 1986.

CiteSeer.IST - Copyright Penn State and NEC