Fahiem Bacchus, Craig Boutilier, and Adam Grove, "Rewarding behaviors," in Proceedings of the Thirteenth National Conference on Artificial Intelligence and the Eighth Innovative Applications of Artificial Intelligence Conference, Menlo Park, Aug. 4-8 1996, pp. 1160-1167, AAAI Press / MIT Press.
....terms in the declarative SADL language. Partially observable Markov decision processes [20, 2] provide an elegant representation of sensing actions and actions with uncertain outcomes in Markov domains. However, they don't lend themselves to efficient algorithms. With few exceptions, such as [1], work in MDPs assumes that reward functions (goals) are Markov as well, so temporal goals like initially are inexpressible. A number of contingent planning systems have introduced novel representations of uncertainty and sensing actions. Warplan C [34] tags actions as conditional, meaning they ....
Fahiem Bacchus, Craig Boutilier, and Adam Grove. Rewarding behaviors. In Proc. 14th Nat. Conf. on AI, 1995.
....criterion could be used to optimize risk-sensitive behavior. Haddawy et al. [21] looked at a broad family of decision-theoretic objectives that make it possible to specify trade-offs between partially satisfying goals quickly and satisfying them completely. Bacchus, Boutilier, and Grove [2] show how some richer objectives based on evaluations of sequences of actions can actually be converted to total reward problems. Other objectives considered in planning systems, aside from simple goals of achievement, include goals of maintenance and goals of prevention [15]; these types of goals ....
Fahiem Bacchus, Craig Boutilier, and Adam Grove. Rewarding behaviors. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 1160--1167. AAAI Press/The MIT Press, 1996.
....criterion could be used to optimize risk-sensitive behavior. Haddawy et al. [15] looked at a broad family of decision-theoretic objectives that make it possible to specify trade-offs between partially satisfying goals quickly and satisfying them completely. Bacchus, Boutilier, and Grove [2] show how some richer objectives based on evaluations of sequences of actions can actually be converted to total reward problems. Other objectives considered in planning systems, aside from simple goals of achievement, include goals of maintenance and goals of prevention [13]; these types of goals ....
Fahiem Bacchus, Craig Boutilier, and Adam Grove. Rewarding behaviors. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 1160--1167. AAAI Press/The MIT Press, 1996.
....criterion could be used to optimize risk-sensitive behavior. Haddawy et al. [16] looked at a broad family of decision-theoretic objectives that make it possible to specify trade-offs between partially satisfying goals quickly and satisfying them completely. Bacchus, Boutilier, and Grove [2] show how some richer objectives based on evaluations of sequences of actions can actually be converted to total reward problems. Other objectives considered in planning systems, aside from simple goals of achievement, include goals of maintenance and goals of prevention [14]; these types of goals ....
Fahiem Bacchus, Craig Boutilier, and Adam Grove. Rewarding behaviors. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 1160--1167. AAAI Press/The MIT Press, 1996.
....fact holds in situation $s$. All state changes are assumed to result from the execution of actions. The special function DO is used to describe these changes: $DO(a, s)$ returns the situation resulting from executing action $a$ in situation $s$. We use $\{a\}_1^n$ to represent the sequence of actions $a_1, a_2, \ldots, a_n$. $DO(\{a\}_1^n, s)$ denotes the nested application $DO(a_n, DO(a_{n-1}, \ldots, DO(a_1, s)))$, i.e., the result of executing the entire sequence, starting in situation $s$. We use $s_n$ as a shorthand for $DO(\{a\}_1^n, s_0)$. Our formulation of SADL is based on Scherl and Levesque's [ ....
....The special function DO is used to describe these changes: $DO(a, s)$ returns the situation resulting from executing action $a$ in situation $s$. We use $\{a\}_1^n$ to represent the sequence of actions $a_1, a_2, \ldots, a_n$. $DO(\{a\}_1^n, s)$ denotes the nested application $DO(a_n, DO(a_{n-1}, \ldots, DO(a_1, s)))$, i.e., the result of executing the entire sequence, starting in situation $s$. We use $s_n$ as a shorthand for $DO(\{a\}_1^n, s_0)$. Our formulation of SADL is based on Scherl and Levesque's [32] solution to the frame problem for knowledge-producing actions. We adopt their completeness ....
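The nested application of DO described in this excerpt amounts to a left fold over the action sequence. The following is only a minimal sketch of that idea, not the cited paper's formalism: situations are modeled as frozen sets of facts, actions as functions on situations, and the helpers `add`, `delete`, and `do_sequence` as well as all fact names are hypothetical.

```python
from functools import reduce
from typing import Callable, FrozenSet, Iterable

# Assumption: a situation is modeled as an immutable set of facts that hold in it.
Situation = FrozenSet[str]
# Assumption: an action is modeled as a function from situations to situations.
Action = Callable[[Situation], Situation]


def do(action: Action, situation: Situation) -> Situation:
    """DO(a, s): the situation that results from executing a in s."""
    return action(situation)


def do_sequence(actions: Iterable[Action], situation: Situation) -> Situation:
    """DO({a}_1^n, s): apply a_1 first and a_n last, i.e. the nested
    application DO(a_n, DO(a_{n-1}, ..., DO(a_1, s)))."""
    return reduce(lambda s, a: do(a, s), actions, situation)


def add(fact: str) -> Action:
    """Hypothetical helper: an action that makes `fact` hold."""
    return lambda s: s | {fact}


def delete(fact: str) -> Action:
    """Hypothetical helper: an action that makes `fact` stop holding."""
    return lambda s: s - {fact}


# s_0 is the initial situation; s_3 is shorthand for DO({a}_1^3, s_0).
s0: Situation = frozenset({"door_closed"})
plan = [delete("door_closed"), add("door_open"), add("in_room")]
s3 = do_sequence(plan, s0)
```

Folding from the left keeps the execution order explicit: the first action in the list is the innermost call in the nested form.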
[Article contains additional citation context not shown here]
Fahiem Bacchus, Craig Boutilier, and Adam Grove. Rewarding behaviors. In Proc. 14th Nat. Conf. on AI, 1995.
....that terminates with probability $1 - \gamma$ at any point in time (e.g., the robot can break down), in which case discounted models correspond to expected total reward over a finite but uncertain horizon. For these reasons, discounting is sometimes used for finite-horizon problems. (Footnote 16: See [3, 4], however, for a systematic approach to dealing with certain types of history-dependent reward functions.) Another technique for dealing with infinite-horizon problems is to evaluate a trajectory based on the average reward accrued per stage, or gain. The gain of a history is defined to be ....
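The correspondence between discounting and random termination mentioned in this excerpt can be checked with a short calculation. The sketch below relies on assumptions that the excerpt does not state: $\gamma$ is the per-stage continuation probability, $r_t$ is the (bounded) reward at stage $t$, $T$ is the random last stage, termination is independent of the rewards, and the reward of stage $t$ is collected only if the process is still running at stage $t$.

```latex
% Assumption: the process survives each stage independently with probability \gamma.
\Pr(T \ge t) = \gamma^{t}, \qquad t = 0, 1, 2, \ldots

% Expected total reward over the uncertain horizon T:
\mathbb{E}\!\left[\sum_{t=0}^{T} r_t\right]
  = \sum_{t=0}^{\infty} \Pr(T \ge t)\, r_t
  = \sum_{t=0}^{\infty} \gamma^{t}\, r_t .
```

Under these assumptions the right-hand side is the usual discounted total reward with discount factor $\gamma$.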
Fahiem Bacchus, Craig Boutilier, and Adam Grove. Rewarding behaviors. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 1160--1167, Portland, OR, 1996.
....desirable behaviors by referring to trajectory properties (properties of the sequence of states passed through, i.e., the system's history) in addition to just the current state. This has shown up in work on planning [HH92, Dru89, Kab90, GK91] (e.g., in the use of maintenance goals), and in [BBG96] we have argued that many reward functions for process-oriented prob .... The work of Fahiem Bacchus and Craig Boutilier was supported by the Canadian government through their NSERC and IRIS programs. Copyright © 1997, American Association for Artificial Intelligence (www.aaai.org). All ....
....etc. For instance, rewarding an agent for achieving a goal within k steps of a request being issued is a natural, yet history-dependent, specification of desirable behavior. Similarly, process dynamics (action effects) are sometimes most naturally expressed in a history-dependent fashion. In [BBG96] we examined non-Markovian decision processes (NMDPs) and identified two key issues, namely, the specification of non-Markovian properties and the solution of NMDPs. A temporal logic called PLTL was used as a mechanism for specifying the non-Markovian aspects of a system, and we will adopt the ....
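The "goal within k steps of a request" example in this excerpt can be made concrete with a small history-dependent reward function. This is a simplified sketch in plain Python, not PLTL and not the cited paper's encoding: states are modeled as sets of propositions, the names `request` and `goal` are hypothetical, and the check does not track whether a request was already served.

```python
from typing import Sequence, Set

# Assumption: a state is modeled as the set of propositions true in it.
State = Set[str]


def reward_within_k(history: Sequence[State], k: int,
                    request: str = "request", goal: str = "goal") -> float:
    """Reward for the current (last) state of `history`.

    Returns 1.0 if the goal holds now and a request was issued in the
    current state or in one of the k states before it, otherwise 0.0.
    """
    if not history or goal not in history[-1]:
        return 0.0
    # The current state plus at most k predecessors.
    window = history[-(k + 1):]
    return 1.0 if any(request in state for state in window) else 0.0


# Two histories that end in the same state but differ earlier on.
with_request = [{"request"}, set(), {"goal"}]
without_request = [set(), set(), {"goal"}]
```

Because `with_request` and `without_request` end in the same state yet can receive different rewards, the reward cannot be written as a function of the current state alone, which is what makes it non-Markovian.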
[Article contains additional citation context not shown here]
Fahiem Bacchus, Craig Boutilier, and Adam Grove. Rewarding behaviors. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 1160--1167, Portland, OR, 1996.
No context found.
Fahiem Bacchus, Craig Boutilier, and Adam Grove, "Rewarding behaviors," in Proceedings of the Thirteenth National Conference on Artificial Intelligence and the Eighth Innovative Applications of Artificial Intelligence Conference, Menlo Park, Aug. 4-8 1996, pp. 1160-1167, AAAI Press / MIT Press.