59 citations found.
McCallum, R. A. Overcoming incomplete perception with utile distinction memory. In "Proceedings of the tenth international machine learning conference", Amherst (1993).

This paper is cited in the following contexts:

Multi-model Approach to Non-stationary Reinforcement Learning - Samuel Choi Dit

....action = $\arg\max_{a \in A} \sum_i P(M_i \mid D) \cdot Q_i(s, a)$ (2), where $A$ is the action space, $M_i$ is the $i$-th environment model stored in the model library, and $Q_i(s, a)$ is the Q-value of $M_i$ given state $s$ and action $a$. This decision rule is equivalent to the one used by Chrisman and McCallum [2, 6]. While this simple decision rule provides only suboptimal solutions due to ignoring the future uncertainty, it requires substantially less computation and performs reasonably well in most cases. Unlike Chrisman and McCallum's formulation, our decision rule also considers the current model ....

A. McCallum. Overcoming incomplete perception with utile distinction memory. In Tenth International Machine Learning Conference, Amherst, MA, 1993.
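
The decision rule quoted in the excerpt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the cited authors' implementation: the function name `select_action`, the dictionary-based Q-tables, and the two-model example data are all hypothetical.

```python
def select_action(model_posteriors, q_tables, state, actions):
    """Return argmax_a sum_i P(M_i | D) * Q_i(state, a).

    model_posteriors: list of P(M_i | D), one per model (assumed given).
    q_tables: list of dicts mapping (state, action) -> Q-value, one per model.
    """
    def expected_q(action):
        # Posterior-weighted Q-value of this action across all models.
        return sum(p * q[(state, action)]
                   for p, q in zip(model_posteriors, q_tables))

    return max(actions, key=expected_q)


# Hypothetical model library with two models and one state.
posteriors = [0.7, 0.3]
q_tables = [
    {("s0", "left"): 1.0, ("s0", "right"): 0.0},
    {("s0", "left"): 0.0, ("s0", "right"): 2.0},
]
best = select_action(posteriors, q_tables, "s0", ["left", "right"])
```

With these made-up numbers the weighted value of "left" is 0.7 and that of "right" is 0.6, so the rule picks "left" even though the second model strongly prefers "right". As the excerpt notes, the rule ignores future uncertainty about which model is active.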


Visual Feature Learning - Piater (2001)

....the agent may not be distinguishable on the basis of an instantaneous perception. In this case, the recent history of percepts may allow the disambiguation of these perceptually aliased states. This is the basic idea underlying McCallum's Utile Distinction Memory and Utile Suffix Memory algorithms [68, 69]. In addition to resolving hidden state, his U-Tree algorithm [67] performs selective perception by testing which perceptual distinctions are meaningful. Temporal state information is also the basis of Coelho's system for learning haptically guided grasping strategies [22, 23]. Grasping experience ....

McCallum, R. A. Overcoming incomplete perception with utile distinction memory. In Proceedings of the Tenth International Conference on Machine Learning (1993), Morgan Kaufmann.


Attention Control for State Space Construction - Kamiharako, Ishiguro, Ishida (1998)

....one is to select necessary sensory data according to rewards and then construct a state space with selected sensory data, the other is to construct a state space by dividing states when new sensory data and rewards are obtained. G algorithm [2] picks out proper sensors to describe states. UDM [4] splits perceptually aliased states according to obtained sensory data and rewards while the robot behaves. Nakamura's work deals with physical sensory data but needs feature extraction methods given by human intuitions. Mahadevan's work is basically difficult to deal with physical sensory ....

R. A. McCallum, Overcoming incomplete perception with utile distinction memory, Machine Learning, pp. 190-196, 1993.


Reinforcement Learning with LSTM in Non-Markovian Tasks with.. - Bakker

.... (Lin & Mitchell, 1993; Bakker & van der Voort van der Kleij, 2000; Gomez & Miikkulainen, 1999). Finally, the policy may be implemented as a finite state automaton (FSA) where the state of the FSA represents the history of observations and actions (Meuleau, Peshkin, Kim, & Kaelbling, 1999; McCallum, 1993). 1.2 Long-term dependencies. Most of the approaches described above have problems if there are long-term dependencies between relevant events. An example of a long-term dependency problem is a maze navigation task where the only way to distinguish between two T-junctions that look identical is ....

McCallum, R. A. (1993). Overcoming incomplete perception with utile distinction memory. In Proceedings of the tenth international machine learning conference.


Market-Based Reinforcement Learning in Partially.. - Kwee, Hutter, Schmidhuber (2001)   (1 citation)

....and memorize important past events in complex, partially observable settings, such as in the introductory example above. Several recent, non-standard RL approaches in principle are able to deal with partially observable environments and can learn to memorize certain types of relevant events [7, 8, 6, 3, 9, 10, 13, 18, 17, 14]. None of them, however, represents a satisfactory solution to the general problem of learning in worlds described by partially observable Markov Decision Processes (POMDPs). (Footnote: This work was supported by SNF grants 21 55409.98 and 2000 61847.00.) The approaches are either ad hoc, or work in ....

R. A. McCallum. Overcoming incomplete perception with utile distinction memory. In Machine Learning: Proceedings of the Tenth International Conference. Morgan Kaufmann, Amherst, MA, 1993.


Planning in an Imperfect World Using Previous Experiences - Chiu (1995)

....over world states). The policy can be constructed from and the state estimator can be constructed from T and O by straightforward application of Bayes' rule. Monahan [Mon82] and Lovejoy [Lov91] give extensive surveys of the operations research literature on POMDPs. Chrisman [Chr92] and McCallum [McC93] describe algorithms for inducing a POMDP from interactions with the environment and use relatively simple approximations to the resulting optimal value function. Cassandra et al. [CKL94a, CKL94b] present an algorithm for finding arbitrarily good approximations to optimal policies and a method ....

R. A. McCallum. Overcoming incomplete Perception with Utile Distinction Memory. In Proceedings of the Tenth International Conference on Machine Learning, Amherst, Massachusetts, U.S.A., 1993. Morgan Kaufmann.
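
The state estimator mentioned in the excerpt above, built from T and O by Bayes' rule, can be sketched as a standard POMDP belief update, b'(s') proportional to O(s', a, o) * sum over s of T(s, a, s') * b(s). This is a minimal sketch, not code from the cited thesis: the name `update_belief`, the dictionary layout of `T` and `O`, and the two-state example are assumptions made for illustration.

```python
def update_belief(belief, T, O, action, observation):
    """One Bayes-rule step of a POMDP state estimator.

    belief: dict state -> probability.
    T: dict (s, a, s_next) -> transition probability (assumed layout).
    O: dict (s_next, a, o) -> observation probability (assumed layout).
    """
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        # Prediction step: probability of reaching s_next under the action.
        predicted = sum(T[(s, action, s_next)] * belief[s] for s in states)
        # Correction step: weight by the likelihood of the observation.
        unnormalized[s_next] = O[(s_next, action, observation)] * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: v / total for s, v in unnormalized.items()}


# Hypothetical two-state world where the action "stay" never changes state.
T = {("A", "stay", "A"): 1.0, ("A", "stay", "B"): 0.0,
     ("B", "stay", "A"): 0.0, ("B", "stay", "B"): 1.0}
O = {("A", "stay", "ping"): 0.8, ("B", "stay", "ping"): 0.2}
posterior = update_belief({"A": 0.5, "B": 0.5}, T, O, "stay", "ping")
```

Starting from a uniform belief, observing "ping" shifts the belief toward state A, because the assumed observation model makes "ping" four times as likely there.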


Using Classifier Systems as Adaptive Expert Systems for Control - Sigaud, Gérard (2000)

....designing the input and output of the agents so that the problem they solve is Markov is very hard, since agents cannot have a perfect knowledge of every relevant feature of all other agents and of the behavior of these agents. Hence we are doomed to solving non-Markov problems. Some researchers [McCallum, 1993, Witkowski, 1997, Dorigo, 1994] have proposed algorithms which pick the relevant inputs from a set given beforehand. In these approaches, adding a new input strongly relies on the non-Markov character of the problem: if the system receives different rewards for the same action in what it considers ....

McCallum, R. A. (1993). Overcoming incomplete perception with utile distinction memory. In Proceedings of the Tenth International Conference on Machine Learning, pages 190-196, Amherst, MA. Morgan Kaufmann.


Reinforcement Learning: A Survey - Kaelbling, Littman, Moore (1996)   (367 citations)

....including the hidden state, then to use that model to construct a perfect memory controller (Cassandra, Kaelbling, Littman, 1994; Lovejoy, 1991; Monahan, 1982). Chrisman (1992) showed how the forward-backward algorithm for learning HMMs could be adapted to learning POMDPs. He, and later McCallum (1993), also gave heuristic state-splitting rules to attempt to learn the smallest possible model for a given environment. The resulting model can then be used to integrate information from the agent's observations in order to make decisions. Figure 10 illustrates the basic structure for a ....

McCallum, R. A. (1993). Overcoming incomplete perception with utile distinction memory. In Proceedings of the Tenth International Conference on Machine Learning, pp. 190-196, Amherst, Massachusetts. Morgan Kaufmann.


A Heuristic Search Algorithm for Markov Decision Problems - Hansen, Zilberstein

....all possible belief states is uncountably infinite, there is a subset of POMDPs for which an optimal policy visits a finite number of belief states, starting from a particular belief state. Simple examples include the tiger-behind-the-door problem of Kaelbling et al. (1998) and the maze problem of McCallum (1993). Inspection-replacement problems, a widely studied class of problems in operations research, also have this property (Thomas et al. 1991). Taking either a replace or an inspect action (assuming perfect inspections) creates a loop to some previously visited belief state and a policy that visits a ....

McCallum, R.A. (1993) Overcoming incomplete perception with utile distinction memory. In Proceedings of the Tenth International Machine Learning Conference, pp. 190--196. Amherst, MA.


Temporal Difference Learning: Eligibility Traces and the Successor .. - White (1995)

....spaces and actions. In most cases, reinforcement learning algorithms assume that the problems they will be applied to are Markovian in nature. The issues arising when available state information is incomplete, causing observed state transitions to exhibit non-Markovian behaviour, are the subject of (McCallum, 1993; Singh, Jaakkola and Jordan, 1994; Jaakkola, Singh and Jordan, 1994; Ring, 1994; Cassandra, Kaelbling and Littman, 1992). Making efficient use of time and experience is a fundamental goal of reinforcement learning. There are several interesting areas of research which address these concerns: the ....

McCallum, R. A. (1993). Overcoming incomplete perception with utile distinction memory.


Approximation Algorithms for Solving Cost Observable Markov.. - Bayer (1998)

....in the world not currently accounted for by the model, this will increase its number of states to fit the observed data. While Chrisman's algorithm creates new states based on predicting perception, McCallum's utile distinctions algorithm creates new states if this helps predict reward (see [22]). Utility-based distinctions will build a state space as large as needed to perform the current task, while perception-based distinctions will build a state space as complex as the perceived world. McCallum learns on-line a relevant history of perceptions and actions, and stores it as a branch ....

A. McCallum. Overcoming incomplete perception with utile distinction memory. Proceedings of the Tenth International Conference on Machine Learning, 1993.


Solving POMDPs by Searching the Space of Finite Policies - Meuleau, Kim, Kaelbling.. (1999)   (13 citations)

....we would want to impose on the policy is to be representable by a finite state automaton, or, as we will call it, a finite policy graph. Many previous approaches implicitly rely on a similar hypothesis: Some authors [14, 12, 2, 27] search for optimal reactive (or memoryless) policies, McCallum [15, 16] searches the space of policies using a finite-horizon memory, Wiering and Schmidhuber's HQL [26] learns finite sequences of reactive policies, and Peshkin et al. [19] look for optimal finite external-memory policies. All these examples are particular cases of finite policy graphs, with a set of ....

.... can also be used, for instance, we can limit the search to policy representable as a finite sequence of RPs (with particular rules governing the transition from one RP to another) as in Wiering and Schmidhuber's HQL [26]. Other instances include the finite-horizon memory policies used by McCallum [15, 16], or the external memory policies used by Peshkin et al. [19]. Although these architectures cannot be described only in terms of restriction constraints (there are also equality constraints between different parameters of the graph) the previous results and algorithm can be extended to each of ....

R.A. McCallum. Overcoming incomplete perception with utile distinction memory. In The Proceedings of the Tenth International Machine Learning Conference, Amherst, MA, 1993.


Solving Large POMDPs using Real Time Dynamic Programming - Geffner, Bonet (1998)   (2 citations)

....complexity of computing the next belief state $b_a^o$ from $b$ grows with $|S|^2$ only in states of complete uncertainty but is significantly faster in other cases. 7 Experimental Results 7.1 Small Problems. For the first set of experiments with rtdp-bel we considered the three problems from [11, 16, 13] called Cheese, 4x3 and 4x4. These are all small navigation problems involving in the order of 11-16 states, 2-7 observations, and 4 actions, formulated here as goal POMDPs with positive action costs. For reasons of space we omit the details about the formulations (which follow the ....

A. McCallum. Overcoming incomplete perception with utile distinction memory. In Proceedings Tenth Int. Conf. on Machine Learning, pages 190--196, 1993.


Learning Finite-State Controllers for Partially.. - Meuleau, Peshkin.. (1999)   (17 citations)

....limits, we may reduce the search to policies representable as finite policy graphs. Many existing algorithms for learning to plan in POMDPs rely on a similar assumption. For instance, some researchers [14, 3, 32] try to learn memoryless (or reactive) policies, McCallum's learning algorithm [19, 20] uses a finite-horizon memory, Wiering and Schmidhuber's HQL [31] learns finite sequences of reactive policies using an implicit memory of some of the previous observations, and Peshkin et al. [23] look for optimal finite external-memory policies. All these finite memory architectures correspond to ....

R.A. McCallum. Overcoming incomplete perception with utile distinction memory. In The Proceedings of the Tenth International Machine Learning Conference, Amherst, MA, 1993.


Planning and Acting in Partially Observable Stochastic Domains - Kaelbling, Littman, al. (1998)   (30 citations)

....of the model with the computation of the policy. This approach has the potential significant advantage of being able to learn a model that is complex enough to support optimal (or good) behavior without making irrelevant distinctions; this idea has been pursued by Chrisman [11] and McCallum [40, 41]. A Appendix. Theorem 1. Let $U^a$ be a non-empty set of useful policy trees, and $Q^a_t$ be the complete set of useful policy trees. Then $U^a \neq Q^a_t$ if and only if there is some tree $p \in U^a$, observation $o \in \Omega$, and subtree $p' \in V_{t-1}$ for which there is some belief state $b$ such ....

R. Andrew McCallum. Overcoming incomplete perception with utile distinction memory. In Proceedings of the Tenth International Conference on Machine Learning, pages 190--196, Amherst, Massachusetts, 1993. Morgan Kaufmann.


Incremental Pruning: A Simple, Fast, Exact Method for.. - Cassandra, Littman.. (1997)

....across problems.

Table 1: Test problem parameter sizes.

Test Problem    States  Acts.  Obs.  Stages  |V_f|  Reference
1D maze              4      2     2      70      4
4x3                 11      4     6       8    436  Parr & Russell (1995)
4x3 CO              11      4    11     367      4  Russell & Norvig (1994)
4x4                 16      4     2     374     20  Cassandra, Kaelbling, & Littman (1994)
Cheese              11      4     7     373     14  McCallum (1993)
Part painting        4      4     2     371      9  Kushmerick, Hanks, & Weld (1995)
Network              7      4     2      14    438
Shuttle              8      3     5       7    481  Chrisman (1992)
Aircraft ID         12      6     5       4    258

Table 4: Total time (sec.) spent constructing S^a sets.

                TS^a-BUILD
Test Problem    Witness      IP      RR
1D maze             7.1     0.1     0.1
4x3               599.1   220.5    31.0 ....

McCallum, R. A. 1993. Overcoming incomplete perception with utile distinction memory. In Proceedings of the Tenth International Conference on Machine Learning, 190--196. Amherst, Massachusetts: Morgan Kaufmann.


Reinforcement Learning Using Approximate Belief States - Rodriguez, Parr, Koller (1999)   (1 citation)

....reinforcement learning methods and space does not permit a full review of them. The approach upon which we focus is the use of a belief state as a probability distribution over underlying model states. This is in contrast to methods that manipulate augmented state descriptions with finite memory [9, 12] and methods that work directly with observations [8]. The main advantage of a probability distribution is that it summarizes all of the information necessary to make optimal decisions [1]. The main disadvantages are that a model is required to compute a belief state, and that the task of ....

Andrew R. McCallum. Overcoming incomplete perception with utile distinction memory. In Proc. ICML, pages 190--196, 1993.


Hidden-Mode Markov Decision Processes - Samuel Choi Dit-Yan (1999)   (2 citations)

....data size and time. Experiments on various model sizes and settings were conducted. The results are quite consistent. In the following, some details of a typical run are presented for illustration. 4.1 Experimental Setting. The experimental model is a randomly generated HM-MDP with 3 modes, 10 states and 5 actions. Note that this HM-MDP is equivalent to a fairly large POMDP, with 30 hidden states, 10 observations and 5 actions. In order to simulate the infrequent mode changes, each mode is set to have a minimum probability of 0.9 in looping back to itself. In addition, each state of the ....

....consistent. In the following, some details of a typical run are presented for illustration. 4.1 Experimental Setting. The experimental model is a randomly generated HM-MDP with 3 modes, 10 states and 5 actions. Note that this HM-MDP is equivalent to a fairly large POMDP, with 30 hidden states, 10 observations and 5 actions. In order to simulate the infrequent mode changes, each mode is set to have a minimum probability of 0.9 in looping back to itself. In addition, each state of the HM-MDP has 3 to 8 non-zero transition probabilities, and rewards are uniformly distributed between $r_m(s, a)$ ....

[Article contains additional citation context not shown here]

A. McCallum. Overcoming incomplete perception with utile distinction memory. In Tenth International Machine Learning Conference, Amherst, MA, 1993.


Planning and Acting in Partially Observable Stochastic.. - Kaelbling, Littman.. (1995)   (182 citations)

....of the model with the computation of the policy. This approach has the potential significant advantage of being able to learn a model that is complex enough to support optimal (or good) behavior without making irrelevant distinctions; this idea has been pursued by Chrisman [9] and McCallum [31, 32]. A Appendix. Theorem 1. Let $U$ be a non-empty set of useful policy trees, and $Q^a_t$ be the complete set of useful policy trees. Then $U \neq Q^a_t$ if and only if there is some tree $p \in U$, $o \in \Omega$, and $p' \in V_{t-1}$ for which there is some belief state $b$ such that V pnew (b) V p ....

R. Andrew McCallum. Overcoming incomplete perception with utile distinction memory. In Proceedings of the Tenth International Conference on Machine Learning, pages 190--196, Amherst, Massachusetts, 1993. Morgan Kaufmann.


Large-Scale Dynamic Optimization Using Teams of Reinforcement.. - Crites (1996)   (9 citations)

....must perceive and act in the world. These agents are driven by their perceptions, and in most cases they are not able to perceive the complete state of their environments. Perceptual aliasing [123] occurs when a perception does not uniquely identify a state of the world. As McCallum points out [74], perceptual aliasing can be a blessing or a curse. It is a blessing when it confounds states where the same action is best, since it reduces the size of the policy space to be searched. However, it is a curse when it confounds states where different actions are best. Jaakkola et al. [52] present ....

....the model to be trained) and for estimating the state occupation probabilities after some sequence of observations (allowing the model to be used). One difficulty with this approach is that the number of underlying states needed in the HMM is not known a priori. Chrisman [29] and McCallum [74] both add states incrementally by splitting existing ones. State splitting was also used by Chapman & Kaelbling [28] in a reinforcement learning context. They built an agent to play a videogame called Amazon. The visual system of their agent had 100 input bits. With that large of an input space, ....

[Article contains additional citation context not shown here]

McCallum, R. A. Overcoming incomplete perception with utile distinction memory. In Proceedings of the Tenth International Conference on Machine Learning, 1993.


Instance-Based State Identification for Reinforcement Learning - Andrew McCallum (1994)   (20 citations)  Self-citation (McCallum)

....windows because, for most tasks, different amounts of memory are needed at different steps of the task. The on-line memory creation approach has been adopted in several reinforcement learning algorithms. The Perceptual Distinctions Approach [Chrisman, 1992] and Utile Distinction Memory [McCallum, 1993] are both based on splitting states of a finite state machine by doing off-line analysis of statistics gathered over many steps. Recurrent Q [Lin, 1993] is based on training recurrent neural networks. Indexed Memory [Teller, 1994] uses genetic programming to evolve agents that use load and ....

....failed 70%, 30% or 20% of the time, and when they failed, resulted in random states. NSM used β = 0.2, γ = 0.9, k = 8, and N = 1000. PDA takes almost 8000 steps to learn the task. NSM learns a good policy in less than 1000 steps, although the policy is not quite optimal. Utile Distinction Memory [McCallum, 1993] was demonstrated on several local perception mazes. Unlike most reinforcement learning maze domains, the agent perceives only four bits indicating whether there is a barrier to the immediately adjacent north, east, south and west. NSM used β = 0.9, γ = 0.9, k = 4, and N = 1000. In two of the ....

[Article contains additional citation context not shown here]

R. Andrew McCallum. Overcoming incomplete perception with utile distinction memory. In The Proceedings of the Tenth International Machine Learning Conference. Morgan Kaufmann Publishers, Inc., 1993.


Using Markov-k Memory for Problems with Hidden-State - Matthew Mitchell (2003)

No context found.

R MCCALLUM, A. Overcoming incomplete perception with utile distinction memory. In "Proceedings of the tenth international machine learning conference", Amherst (1993).


A Bayesian Approach to Landmark Discovery and Active Perception in .. - Thrun (1996)   (11 citations)

No context found.

McCallum, R. A. Overcoming Incomplete Perception with Utile Distinction Memory. in: Proceedings of the Tenth International Conference on Machine Learning, edited by P. E. Utgoff. Morgan Kaufmann, San Mateo, CA, 1993, pp. 190--196.


Reinforcement Learning for Problems with Hidden State - Hasinoff (2003)

No context found.

R. A. McCallum. Overcoming incomplete perception with utile distinction memory. In Proceedings of the Tenth International Conference on Machine Learning, pages 190--196, 1993.


Using Markov-k Memory for Problems with Hidden-State - Matthew Mitchell

No context found.

R MCCALLUM, A. Overcoming incomplete perception with utile distinction memory. In "Proceedings of the tenth international machine learning conference", Amherst (1993).


CiteSeer.IST - Copyright Penn State and NEC