Hansen, E. A., Barto, A. G., Zilberstein, S.: Reinforcement Learning for Mixed Open-loop and Closed-loop Control. NIPS-9, MIT Press (1996)
.... world actions) Eric Hansen showed that a COMDP with only two observation actions, one that reveals the entire state and the other that observes nothing, can be converted into an MDP, provided there is a bound on the number of world actions taken between observation actions (see [5, 6] and also Sven Koenig's extension to sensor planning in [9]). The chain MDP algorithm is a heuristic for approximately solving COMDPs, starting with the MDP underlying the POMDP, and constructing a sequence of MDPs M_1, M_2, ... whose reward functions have been modified to incorporate ....
Hansen, E. A., Barto, A. G., Zilberstein, S.: Reinforcement Learning for Mixed Open-loop and Closed-loop Control. NIPS-9, MIT Press (1996)
....as general models for planning in the face of uncertainty (Littman 1996). Optimal POMDP solution methods are known to be impractical in many cases even for small problems. Proposed approaches often rely on additional assumptions about the problem that are not present in this example (e.g., (Hansen, Barto, and Zilberstein 1996) relies on the presence of sensor actions that reveal the hidden state information). Queueing scheduling problems such as the one we consider are most commonly formulated using continuous time MDP or semi MDP models (see for example (Stidham and Weber 1993)). For simplicity in explicating the ....
Hansen, Barto, and Zilberstein 1996 Hansen, E. A., Barto, A.G. and Zilberstein, S. Reinforcement Learning for Mixed Open-loop and Closed-loop Control. Proceedings of the Ninth Neural Information Processing Systems Conference. Denver, Colorado, December, 1996.
....decentralized control of finite state Markov processes (Aicardi, Davoli, Minciardi 1987; Sandell et al. 1978) but they do not have communication decisions as well. The problem of decision making with the cost of communication is a very important one. In the single agent case, it is studied in (Hansen, Barto, Zilberstein 1996; Hansen Zilberstein 1996) where communication takes the special form of an agent sensing the environment. In a multi agent system, communication costs may relate to transmission fee, resource cost, etc. In the following sections we present a definition of a decentralized MMDP, followed by an ....
....Markov processes (Aicardi, Davoli, Minciardi 1987; Sandell et al. 1978) but they do not have communication decisions as well. The problem of decision making with the cost of communication is a very important one. In the single agent case, it is studied in (Hansen, Barto, Zilberstein 1996; Hansen Zilberstein 1996), where communication takes the special form of an agent sensing the environment. In a multi agent system, communication costs may relate to transmission fee, resource cost, etc. In the following sections we present a definition of a decentralized MMDP, followed by an example system, and a ....
Hansen, E.; Barto, A.; and Zilberstein, S. 1996. Reinforcement learning for mixed open-loop and closed-loop control. In Proceedings of the Ninth Neural Information Processing Systems Conference.
....structure is assumed, usually in the form of a delay of nonlocal information, i.e. the global state information will be available for all agents after k stages. The problem of decision making with the cost of communication is a very important one. In the single agent case, it is studied in [5, 6, 7], where communication takes the special form of an agent sensing the environment. In a multi agent system, communication costs may relate to transmission fee, resource cost, etc. In the following sections we present a definition of a decentralized cooperative multi agent decision process, followed ....
E. Hansen, A. Barto, and S. Zilberstein. Reinforcement learning for mixed open-loop and closed-loop control. In Proceedings of the Ninth Neural Information Processing Systems Conference, December 1996.
....still do not need to make decision on communication. In our work, however, since agents need to make decision on their communication of nonlocal information, they may not have a fixed common information structure. The problem of decision making with the cost of communication is also studied in [3, 4, 5], where communication takes the special form as an agent sensing the environment, where sensing requires a cost but can provide information to resolve the uncertainty about the environment. Our work extends to the case when a team of decentralized agents are cooperating. In the following sections ....
E. Hansen, A. Barto, and S. Zilberstein. Reinforcement learning for mixed open-loop and closed-loop control. In Proceedings of the Ninth Neural Information Processing Systems Conference, December 1996.
No context found.
E. Hansen, A. Barto, and S. Zilberstein. Reinforcement learning for mixed open-loop and closed-loop control. In Proceedings of the Ninth Neural Information Processing Systems Conference, December 1996.
No context found.
E. A. Hansen, A. G. Barto, and S. Zilberstein, "Reinforcement learning for mixed open-loop and closed-loop control," in Proc. of NIPS, 1997, pp. 1026.