S. S. Keerthi and B. Ravindran, A Tutorial Survey of Reinforcement Learning, to appear in: Sadhana.
....the weights of the other network which is deputed to these proposals. Sophisticated policies for rewarding the control action proposals are studied in the literature based on relaxed versions of Dynamic Programming [55]. However, actual applications in the control field generally concern model-free [64], possibly memoryless [143] rewarding methods. The joint use of neural networks and symbolic tools is a favourite approach in the control scientist community. Actually, system theory boasts so many results that to ignore it completely very often appears wasteful. This cooperation might have ....
S. S. Keerthi, B. Ravindran, "A tutorial survey of reinforcement learning", TR Dept of Comp. Sci. and Automat., Indian Institute of Sci., Bangalore, 1995.
....Networks 3 Reinforcement and Active Learning Systems Study of animal learning and human cognition has yielded the concept of reinforcement learning. A reinforcement learning system is one which learns an associative mapping by maximizing a scalar evaluation of its performance from the environment [5]. The concept of active learning is similar. An active learning system is one that can influence the training data it receives by actions or queries to its environment [2]. If a module can be constructed which is able to design and conduct active experimentation and to decide when it is necessary ....
S. Sathiya Keerthi and B. Ravindran. A tutorial survey of reinforcement learning. Sadhana (published by the Indian Academy of Sciences), January 1995.
....less than N_j. We now discuss the following two methods to achieve the above goal: the credit assignment method, and the sequential adaptation method. Credit assignment method The basic idea has been adapted from the reinforcement learning literature (a good survey can be found in [13]). In the problems considered, the reinforcement signal r for an action comes a random amount of time after the action has been taken. Hence credit for receiving r is to be assigned among all the actions taken till the arrival of r. Such an assignment problem is usually referred to as temporal credit ....
S. Sathiya Keerthi and B. Ravindran, "A tutorial survey of reinforcement learning", Sādhanā: Indian Academy of Science Proceedings in Engg. Sciences, 19 (1994) 851-889.
....adapt the tunable parameters (α, β, γ, used by Quo Vadis at each node in response to changes in network dynamics are of interest. In particular, variations of techniques drawn from adaptive control [White & Sofge, 1992] and machine learning [Honavar, 1994], especially reinforcement learning [Keerthi & Ravindran, 1994], are currently under investigation. For examples of preliminary work by other investigators on this topic, the reader is referred to [Littman and Boyan 1993; Lehman et al. 1993]. In conclusion, it must be noted that Quo Vadis exemplifies a family of parameterized algorithms, different instances ....
Keerthi, S.S. and Ravindran, B. A Tutorial Survey of Reinforcement Learning. (preprint) (1994).
....the tunable parameters (α, β, γ, j, used by Quo Vadis at each node in response to changes in network dynamics are of interest. In particular, variations of techniques drawn from adaptive control [White & Sofge, 1992] and machine learning [Honavar, 1994], especially reinforcement learning [Keerthi & Ravindran, 1994], are currently under investigation. For examples of preliminary work by other investigators on this topic, the reader is referred to [Littman and Boyan 1993; Lehman et al. 1993]. In conclusion, it must be noted that Quo Vadis exemplifies a family of parameterized algorithms, different instances of ....
Keerthi, S.S. and Ravindran, B. A Tutorial Survey of Reinforcement Learning. (preprint) (1994).
....choice of data. Currently there exist two overall separate recognised approaches to implementing active learning: reinforcement learning and querying. Whereas the former refers broadly to a class of problems which involve policy-based learning algorithms using state-action transition models [10, 9], the latter is concerned with techniques of asking the most appropriate questions, usually for optimal information gain [23]. While the two approaches must necessarily have elements in common, not much exists in the literature that addresses their unification. In this work we will adhere mostly ....
S. Sathiya Keerthi and B. Ravindran. A tutorial survey of reinforcement learning. Sadhana (published by the Indian Academy of Sciences), January 1995.
S. Sathiya Keerthi and B. Ravindran. A tutorial survey of reinforcement learning. Sadhana, 19(6):851-889, 1994.