8 citations found.
Fahiem Bacchus, Craig Boutilier, and Adam Grove, "Rewarding behaviors," in Proceedings of the Thirteenth National Conference on Artificial Intelligence and the Eighth Innovative Applications of Artificial Intelligence Conference, Menlo Park, Aug. 4-8 1996, pp. 1160-1167, AAAI Press / MIT Press.


This paper is cited in the following contexts:
Representing Sensing Actions: The Middle Ground Revisited - Golden, Weld (1996)   (30 citations)  (Correct)

....terms in the declarative sadl language. Partially observable Markov Decision Processes [20, 2] provide an elegant representation of sensing actions and actions with uncertain outcomes in Markov domains. However, they don't lend themselves to efficient algorithms. With few exceptions, such as [1], work in MDPs assumes that reward functions (goals) are Markov as well, so temporal goals like initially are inexpressible. A number of contingent planning systems have introduced novel representations of uncertainty and sensing actions. Warplan C [34] tags actions as conditional, meaning they ....

Fahiem Bacchus, Craig Boutilier, and Adam Grove. Rewarding behaviors. In Proc. 14th Nat. Conf. on AI, 1995.


Planning and Acting in Partially Observable Stochastic Domains - Kaelbling, Littman, et al. (1998)   (30 citations)  (Correct)

....criterion could be used to optimize risk-sensitive behavior. Haddawy et al. [21] looked at a broad family of decision-theoretic objectives that make it possible to specify trade-offs between partially satisfying goals quickly and satisfying them completely. Bacchus, Boutilier, and Grove [2] show how some richer objectives based on evaluations of sequences of actions can actually be converted to total reward problems. Other objectives considered in planning systems, aside from simple goals of achievement, include goals of maintenance and goals of prevention [15]; these types of goals ....

Fahiem Bacchus, Craig Boutilier, and Adam Grove. Rewarding behaviors. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 1160--1167. AAAI Press/The MIT Press, 1996.


Planning and Acting in Partially Observable Stochastic.. - Kaelbling, Littman.. (1995)   (182 citations)  (Correct)

....criterion could be used to optimize risk-sensitive behavior. Haddawy et al. [15] looked at a broad family of decision-theoretic objectives that make it possible to specify trade-offs between partially satisfying goals quickly and satisfying them completely. Bacchus, Boutilier, and Grove [2] show how some richer objectives based on evaluations of sequences of actions can actually be converted to total reward problems. Other objectives considered in planning systems, aside from simple goals of achievement, include goals of maintenance and goals of prevention [13]; these types of goals ....

Fahiem Bacchus, Craig Boutilier, and Adam Grove. Rewarding behaviors. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 1160--1167. AAAI Press/The MIT Press, 1996.


Planning and Acting in Partially Observable Stochastic.. - Kaelbling, Littman.. (1997)   (182 citations)  (Correct)

....criterion could be used to optimize risk-sensitive behavior. Haddawy et al. [16] looked at a broad family of decision-theoretic objectives that make it possible to specify trade-offs between partially satisfying goals quickly and satisfying them completely. Bacchus, Boutilier, and Grove [2] show how some richer objectives based on evaluations of sequences of actions can actually be converted to total reward problems. Other objectives considered in planning systems, aside from simple goals of achievement, include goals of maintenance and goals of prevention [14]; these types of goals ....

Fahiem Bacchus, Craig Boutilier, and Adam Grove. Rewarding behaviors. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 1160--1167. AAAI Press/The MIT Press, 1996.


Representing Sensing Actions: The Middle Ground Revisited - Golden, Weld (1996)   (30 citations)  (Correct)

....fact holds in situation s. All state changes are assumed to result from the execution of actions. The special function DO is used to describe these changes: DO(a, s) returns the situation resulting from executing action a in situation s. We use {a}_1^n to represent the sequence of actions a_1, a_2, ..., a_n. DO({a}_1^n, s) denotes the nested application DO(a_n, DO(a_{n-1}, ... DO(a_1, s))), i.e. the result of executing the entire sequence, starting in situation s. We use s_n as a shorthand for DO({a}_1^n, s_0). Our formulation of sadl is based on Scherl and Levesque's [ ....

....The special function DO is used to describe these changes: DO(a, s) returns the situation resulting from executing action a in situation s. We use {a}_1^n to represent the sequence of actions a_1, a_2, ..., a_n. DO({a}_1^n, s) denotes the nested application DO(a_n, DO(a_{n-1}, ... DO(a_1, s))), i.e. the result of executing the entire sequence, starting in situation s. We use s_n as a shorthand for DO({a}_1^n, s_0). Our formulation of sadl is based on Scherl and Levesque's [32] solution to the frame problem for knowledge-producing actions. We adopt their completeness ....

[Article contains additional citation context not shown here]

Fahiem Bacchus, Craig Boutilier, and Adam Grove. Rewarding behaviors. In Proc. 14th Nat. Conf. on AI, 1995.


Decision Theoretic Planning: Structural Assumptions and.. - Boutilier, Dean, Hanks (1999)   (150 citations)  Self-citation (Boutilier)   (Correct)

....that terminates with probability 1 - γ at any point in time (e.g. the robot can break down), in which case discounted models correspond to expected total reward over a finite but uncertain horizon. For these reasons, discounting is sometimes used for finite-horizon problems. (Footnote 16: See [3, 4], however, for a systematic approach to dealing with certain types of history-dependent reward functions.) Another technique for dealing with infinite-horizon problems is to evaluate a trajectory based on the average reward accrued per stage, or gain. The gain of a history is defined to be ....

Fahiem Bacchus, Craig Boutilier, and Adam Grove. Rewarding behaviors. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 1160--1167, Portland, OR, 1996.


Structured Solution Methods for Non-Markovian Decision.. - Bacchus, Boutilier, Grove (1997)   (2 citations)  Self-citation (Bacchus Boutilier)   (Correct)

....desirable behaviors by referring to trajectory properties (properties of the sequence of states passed through, i.e. the system's history) in addition to just the current state. This has shown up in work on planning [HH92, Dru89, Kab90, GK91] (e.g. in the use of maintenance goals) and in [BBG96] we have argued that many reward functions for process-oriented prob ....

....etc. For instance, rewarding an agent for achieving a goal within k steps of a request being issued is a natural, yet history-dependent, specification of desirable behavior. Similarly, process dynamics (action effects) are sometimes most naturally expressed in a history-dependent fashion. In [BBG96] we examined non-Markovian decision processes (NMDPs) and identified two key issues, namely, the specification of non-Markovian properties and the solution of NMDPs. A temporal logic called PLTL was used as a mechanism for specifying the non-Markovian aspects of a system, and we will adopt the ....

[Article contains additional citation context not shown here]

Fahiem Bacchus, Craig Boutilier, and Adam Grove. Rewarding behaviors. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 1160--1167, Portland, OR, 1996.


An Approach to the Design of Reinforcement Functions.. - Bonarini, Bonacina..   (Correct)

No context found.

Fahiem Bacchus, Craig Boutilier, and Adam Grove, "Rewarding behaviors," in Proceedings of the Thirteenth National Conference on Artificial Intelligence and the Eighth Innovative Applications of Artificial Intelligence Conference, Menlo Park, Aug. 4-8 1996, pp. 1160-1167, AAAI Press / MIT Press.


CiteSeer.IST - Copyright Penn State and NEC