10 citations found.
Crites, R. H. (1996). Large-scale dynamic optimization using teams of reinforcement learning agents. Ph.D. thesis, University of Massachusetts, Amherst.

This paper is cited in the following contexts:
Reinforcement Learning with Exploration - Reynolds (2002)

.... function approximators that can be trained using this scheme include state aggregation (state aliasing and nearest neighbour methods), k-nearest neighbour, certain kernel-based learners (such as RBF methods with fixed centres and basis widths), piecewise and barycentric linear interpolation [80, 37, 93], and table lookup. All of these methods differ only by their choice of input mapping, which is often normalised. Many of these methods are already employed in RL (see [136, 167, 140, 117, 93, 97] for recent examples). Special cases of this framework for which convergence theorems exist are, ....

Robert H. Crites. Large-Scale Dynamic Optimization Using Teams Of Reinforcement Learning Agents. PhD thesis, (Computer Science) Graduate School of the University of Massachusetts, Amherst, September 1996.


The Stability of General Discounted Reinforcement Learning with.. - Reynolds

.... Q(s; a) f( s) a) or Q(s; a) f( s; a). Examples of methods which fit this parameter framework are non-linear methods such as multi-layer perceptrons (MLP). Although these non-linear methods have had some eyebrow-raising success in practical applications of RL (see [21, 3, 25]), there is little or no practical theory about their behaviour. In particular, it is not generally known whether these methods are stable (i.e. not prone to diverge) when used in combination with RL. A much stronger body of theory exists for linear function approximators, such as the CMAC [23, ....

Robert H. Crites. Large-Scale Dynamic Optimization Using Teams Of Reinforcement Learning Agents. PhD thesis, (Computer Science) Graduate School of the University of Massachusetts, Amherst, September 1996.


Recent Advances in Hierarchical Reinforcement Learning - Barto, Mahadevan (2003)   (10 citations)

....it can be computed recursively during the waiting time. Bradtke and Duff [7] showed how to do this for continuous-time SMDPs, Parr [48] proved that it converges under essentially the same conditions required for Q-learning convergence, and Das et al. [12] developed the average reward case. Crites [10, 11] used SMDP Q-learning in a continuous-time discrete-event formulation of an elevator dispatching problem, an application that illustrates two useful features of RL methods for discrete-event systems. First, Q-learning and Sarsa do not require explicit knowledge of the expected immediate rewards or ....

R. H. Crites. Large-Scale Dynamic Optimization Using Teams of Reinforcement Learning Agents. PhD thesis, University of Massachusetts, Amherst, MA, 1996.


Decision Boundary Partitioning: Variable Resolution Model-Free.. - Reynolds (1999)   (3 citations)

.... By far the most common method is to perform uniform discretisation and assume that each region approximates a state in a discrete Markov process [21, 8, 16]. Many other methods exist, including coarse tile coding [18, 21, 15, 14], memory-based methods [1, 14], neural networks with backpropagation [20, 5], and recurrent networks [13], to name just a few. All the above methods have the requirement that the designer of the system needs to decide upon various parameters in advance. These include: appropriate scaling of dimensions, levels of generalisation (kernel and tile sizes), available resources ....

Robert H. Crites. Large-Scale Dynamic Optimization Using Teams Of Reinforcement Learning Agents. PhD thesis, (Computer Science) Graduate School of the University of Massachusetts, Amherst, September 1996.


Rule Extraction: Where Do We Go from Here? - Craven, Shavlik (1999)

....from a set of candidates, Trepan uses information gain as an evaluation measure. An Application of Trepan. Trepan has been used to extract trees from networks trained in a wide variety of problem domains, including several in which the neural networks were developed by others: elevator control (Crites & Barto 1996; Crites 1996), exchange rate prediction (Weigend, Zimmermann, & Neuneier 1995), and climate modeling (Trimble, Santee, & Neidrauer 1997). In all of these cases, the networks were developed without any prior intention of applying rule extraction methods to them. Here we briefly describe the application of Trepan ....

Crites, R. H. 1996. Large-Scale Dynamic Optimization Using Teams of Reinforcement Learning Agents. Ph.D.


Learning Situation-Specific Control In Multi-Agent Systems - Prasad (1997)

....an agent is concerned, the other agents are a part of its noisy environment. In spite of the simple nature of the agents, the work by Barto [Barto, 1985, Barto, 1986] represents one of the pioneering attempts in learning among multiple agents. More recently, Crites and Barto [Crites and Barto, 1996, Crites, 1996] use similar techniques to improve elevator dispatching performance. 2.6 Emergent Behaviors. Recent excitement in Artificial Life [Langton (Ed.), 1989, Langton et al., 1992] has led to a sizable body of work on emergent behaviors. A large number of extremely simple elements (dumb agents) ....

Crites, R. H. Large-Scale Dynamic Optimization Using Teams Of Reinforcement Learning Agents. PhD thesis, Dept. of Computer Science, University of Massachusetts, Amherst, 1996.


Elevator Group Control Using Multiple Reinforcement Learning.. - Crites, Barto (1998)   (20 citations)  Self-citation (Crites)

....the following sections, we give some additional background on RL, introduce the elevator domain, describe in more detail the multi-agent RL algorithm and network architecture we used, present and discuss our results, and finally draw some conclusions. For further details on all these topics, see Crites (1996). 2. Reinforcement Learning. Both symbolic and connectionist learning researchers have focused primarily on supervised learning, where a teacher provides the learning system with a set of training examples in the form of input-output pairs. Supervised learning techniques are useful in a wide ....

....way of approximating DP on very large problems. The same focusing phenomenon can also be achieved with simulated online training. One can often construct a simulation model without ever explicitly determining the state transition probabilities for an environment (Barto & Sutton, forthcoming; Crites & Barto, 1996). For an example of such a simulation model, see section 3.3. There are several advantages to this use of a simulation model if it is sufficiently accurate. It is possible to generate huge amounts of simulated experience very quickly, potentially speeding up the training process by many orders ....

[Article contains additional citation context not shown here]

R. H. Crites. Large-Scale Dynamic Optimization Using Teams of Reinforcement Learning Agents. PhD thesis, University of Massachusetts, September 1996.


A Reinforcement Learning Scheme for a Partially-Observable.. - Ishii, Fujita, et al. (2004)

No context found.

Crites, R. H. (1996). Large-scale dynamic optimization using teams of reinforcement learning agents. Ph.D. thesis, University of Massachusetts, Amherst.


Recent Advances in Hierarchical Reinforcement Learning - Barto, Mahadevan (2003)   (10 citations)

No context found.

R. H. Crites. Large-Scale Dynamic Optimization Using Teams of Reinforcement Learning Agents. PhD thesis, University of Massachusetts, Amherst, MA, 1996.


Title of the Book! - Name Of Author

No context found.

R. H. Crites. Large-Scale Dynamic Optimization Using Teams of Reinforcement Learning Agents. PhD thesis, University of Massachusetts, Amherst, MA, 1996.

CiteSeer.IST - Copyright Penn State and NEC