R. A. McCallum. Instance-based utile distinctions for reinforcement learning with hidden state. In Proceedings of the Twelfth International Conference on Machine Learning, pages 387--395, 1995.
....suggests to begin with simple behaviors taking only few objects into account, and then use the basic ones to develop others. This description naturally induces the growing of a tree as exemplified in Figure 5. This principle can be compared to the exhaustive observables of [4] or the U-trees of [7]. And, in a similar way, difficulties should be expected in deciding when to stop exploring new leaves at a given depth. 6. CONCLUSION Figure 5: Behaviors exploration tree. There is no precision here on which behaviors are basic or not. But notice that various combinations of types of reward must be tested. ....
R. A. McCallum. Instance-based utile distinctions for reinforcement learning with hidden state. In Proceedings of the 12th International Machine Learning Conference (ML'95), 1995.
....with other methods. A Partially Observable Markov Decision Process (POMDP) is a representative class of non-Markovian environments, where agents sense different environmental states as the same sensory input. The most traditional approach to POMDPs is the memory-based approach [1, 6] that uses histories of sensor-action pairs to divide partially observable states. The memory-based approach is hardware intensive in order to store many histories of sensor-action pairs. In order to overcome this disadvantage of the memory-based approach, we can use a stochastic policy [4, 9, 5] where ....
McCallum, R. A.: Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State, Proc. of the 12th International Conference on Machine Learning, pp. 387-395 (1995).
....process is repeated in each of the leaves. This method was able to learn very small representations of the Q-function in the presence of an overwhelming number of irrelevant, noisy state attributes. It outperformed Q-learning with backpropagation in a simple videogame environment and was used by McCallum (1995) (in conjunction with other techniques for dealing with partial observability) to learn behaviors in a complex driving simulator. It cannot, however, acquire partitions in which attributes are only significant in combination (such as those needed to solve parity problems). Variable Resolution ....
....Figure 7c shows the second trial, started from a slightly different position. This is a very fast algorithm, learning policies in spaces of up to nine dimensions in less than a minute. The restriction of the current implementation to deterministic environments limits its applicability, however. McCallum (1995) suggests some related tree-structured methods. 6.2 Generalization over Actions The networks described in Section 6.1.1 generalize over state descriptions presented as inputs. They also produce outputs in a discrete, factored representation and thus could ....
McCallum, R. A. (1995). Instance-based utile distinctions for reinforcement learning with hidden state. In Proceedings of the Twelfth International Conference on Machine Learning, pp. 387-395, San Francisco, CA. Morgan Kaufmann.
....as needed to perform the current task, while perception-based distinctions will build a state space as complex as the perceived world. McCallum learns online a relevant history of perceptions and actions, and stores it as a branch in a suffix tree; a leaf node stores the policy for that history [23]. This approach does not scale well to large numbers of observations, but his PhD thesis [24] presents an algorithm that incorporates perceptual distinctions in the suffix tree. 5 Cost-observable Markov Decision Processes (COMDPs) In POMDPs, observations are received for free; we want to ....
A. McCallum. Instance-based utile distinctions for reinforcement learning with hidden state. Proceedings of the Twelfth International Conference on Machine Learning, 1995.
....our attention only to its subspace that we believe to contain the optimal solution or a good approximation. Memoryless policies (Platzman, 1977; White & Scherer, 1994; Littman, 1994; Singh, Jaakkola, & Jordan, 1994), policies based on truncated histories (Platzman, 1977; White & Scherer, 1994; McCallum, 1995), or finite-state controllers with a fixed number of memory states (Platzman, 1980; Hauskrecht, 1997; Hansen, 1998a, 1998b) are all examples of a policy space restriction. In the following we consider only the finite-state machine model (see Section 2.6.1), which is quite general; other models can be ....
McCallum, R. (1995). Instance-based utile distinctions for reinforcement learning with hidden state. In Proceedings of the Twelfth International Conference on Machine Learning.
....the optimal value function V_MDP and then applies it online to construct approximately optimal policies for the POMDP [4]. Our paper presents a new method related to this approach. The third approach attempts to directly construct a finite-state controller that implements a good POMDP policy [6, 10]. What makes POMDPs difficult? The fundamental problem is that the agent may become lost, that is, the agent may have a large amount of uncertainty about the current state of the environment. Consequently, the optimal policy may involve avoiding getting lost and also acting when lost. As ....
McCallum, R. A.: Instance-based Utile Distinctions for Reinforcement Learning with Hidden State. ICML-95. Morgan Kaufmann (1995) 387--396
....can be implemented using a table [16, 15] or using a function approximator, such as a feedforward neural network [1, 12, 7]. Only when the task is non-Markovian is it necessary to equip the agent with some sort of internal state. This internal state represents the history of inputs and actions [10, 9, 8] or it contains a more explicit model [3] of the environment. It may be implemented using a recurrent neural network [9, 8]. However, even in a Markovian task, the single job of learning the direct mapping can be difficult. One often-investigated problem results from dealing with a large state ....
R.A. McCallum. Instance-based utile distinctions for reinforcement learning with hidden state. In Proceedings of the Twelfth International Conference on Machine Learning, pages 387--395. Morgan Kaufmann, San Francisco, CA, 1995.
.... process models (Dean et al. 1993). [Figure 1: Simple Gridworld] Planning tasks with incompletely known start cell or nondeterministic movement, and nondeterministic sensing or sensing on demand have been modeled as partially observable Markov decision process models (McCallum 1995; Hansen 1997). Planning tasks with incompletely known start cell, deterministic movement, and automatic deterministic sensing have been modeled as AND-OR search tasks (Nourbakhsh 1997; Koenig & Simmons 1998a). In this paper, we present a first analysis of the last case. The Gridworld Planning ....
McCallum, R. 1995. Instance-based utile distinctions for reinforcement learning with hidden state. In Proceedings of the International Conference on Machine Learning, 387--395.
....of the model with the computation of the policy. This approach has the potential significant advantage of being able to learn a model that is complex enough to support optimal (or good) behavior without making irrelevant distinctions; this idea has been pursued by Chrisman [11] and McCallum [40,41]. A Appendix Theorem 1 Let $U^a$ be a nonempty set of useful policy trees, and $Q^a_t$ be the complete set of useful policy trees. Then $U^a \neq Q^a_t$ if and only if there is some tree $p \in U^a$, observation $o \in \Omega$, and subtree $p' \in V_{t-1}$ for which there is some belief state $b$ such ....
R. Andrew McCallum. Instance-based utile distinctions for reinforcement learning with hidden state. In Proceedings of the Twelfth International Conference on Machine Learning, pages 387--395, San Francisco, CA, 1995. Morgan Kaufmann.
....and closed-loop control. By learning to take open-loop sequences of actions between sensing, it can optimize a tradeoff between the cost and value of sensing. The problem we address is a special case of the problem of hidden state or partial observability in RL (e.g. Whitehead & Lin, 1995; McCallum, 1995). Although we assume sensing provides perfect information (a significant limiting assumption), use of open-loop control means that actions must sometimes be taken when the current state of the controlled system is uncertain. Previous work on RL for partially observable environments has focused on ....
....this work would be to explore its relationship to work on RL for partially observable MDPs, which has so far focused on the problem of sensor uncertainty and hidden state. Because some of this work also makes use of tree representations of the state space and of learned state-action values (e.g. McCallum, 1995), it may be that a similar pruning rule can constrain exploration for such problems. Acknowledgements Support for this work was provided in part by the National Science Foundation under grants ECS 9214866 and IRI 9409827 and in part by Rome Laboratory, USAF, under grant F30602 95 1 0012. ....
McCallum, R.A. (1995) Instance-based utile distinctions for reinforcement learning with hidden state. In Proc. 12th Int. Machine Learning Conf. Morgan Kaufmann.
....of the model with the computation of the policy. This approach has the potential significant advantage of being able to learn a model that is complex enough to support optimal (or good) behavior without making irrelevant distinctions; this idea has been pursued by Chrisman [9] and McCallum [31, 32]. A Appendix Theorem 1 Let $U$ be a nonempty set of useful policy trees, and $Q^a_t$ be the complete set of useful policy trees. Then $U \neq Q^a_t$ if and only if there is some tree $p \in U$, $o \in \Omega$, and $p' \in V_{t-1}$ for which there is some belief state $b$ such that $V_{p_{new}}(b)$ V p ....
R. Andrew McCallum. Instance-based utile distinctions for reinforcement learning with hidden state. In Proceedings of the Twelfth International Conference on Machine Learning, pages 387--395, San Francisco, CA, 1995. Morgan Kaufmann.
....estimate the state transition probabilities $P_{ij}(a)$. If the state transition probability density function has multiple peaks, the state is a candidate for a hidden state. By allocating one memory to each candidate of the hidden states, the candidate state can be discriminated as a hidden state [23]. After finding hidden states and adding them to the state space, we apply Q-learning again to determine the adequate actions in the hidden states. 5 Experimental Results The experiment consists of two parts: first, learning the optimal policy f through the computer simulation, then applying the ....
R.A. McCallum. "Instance-based utile distinctions for reinforcement learning with hidden state". In Proc. of the 12th Int. Conf. on Machine Learning, pages 387-395, 1995.
....pendulum (a pole) on a moving cart. Peng (1995) performed the cart pole task using locally weighted regression to interpolate a value function. Aha and Salzberg (1993) explored nearest neighbor and locally weighted learning approaches to a tracking task in which a robot pursued and caught a ball. McCallum (1995) explored the use of lazy learning techniques in situations where states were not completely measured. Farmer and Sidorowich (1987, 1988a,b) apply locally weighted regression to modeling and prediction of chaotic dynamic systems. Huang (1996) uses nearest neighbor and weighted averaging techniques ....
McCallum, R. A. (1995). Instance-based utile distinctions for reinforcement learning with hidden state.
....[13] also use learnt instances but they are carefully selected by a genetic algorithm. The second issue is addressed by Moore and Atkeson [11] who keep a queue of interesting instances. These are updated most frequently to improve the learning rate. The third issue is addressed by McCallum [7] who uses trees which expand the state representation to include prior states removing ambiguity due to hidden states. In further work [8] he uses a single representation to address both the hidden state problem and the general problem of representing a large state space by using a case base of ....
R. A. McCallum, (1995). Instance-based utile distinctions for reinforcement learning. Proc. Twelfth International Conf. on Machine Learning. pp 387-395
.... states (called hidden states), we estimate the state transition probabilities by using the MLE (Maximum Likelihood Estimation). If the state transition probability density function has multiple peaks, we trace back the history of state transitions until this hidden state can be discriminated [McCallum, 1995]. After finding hidden states and adding them to the state space, we apply Q-learning to our task. 5 Experimental Results The experiment consists of two parts: first, learning the optimal policy f through the computer simulation, then applying the learned policy to a real situation. 5.1 ....
R.A. McCallum. "Instance-based utile distinctions for reinforcement learning with hidden state". In Proc. of the 12th Int. Conf. on Machine Learning, pages 387-395, 1995.
....of the model with the computation of the policy. This approach has the potential significant advantage of being able to learn a model that is complex enough to support optimal (or good) behavior without making irrelevant distinctions; this idea has been pursued by Chrisman [10] and McCallum [33,34]. A Appendix Theorem 1 Let $U^a$ be a nonempty set of useful policy trees, and $Q^a_t$ be the complete set of useful policy trees. Then $U^a \neq Q^a_t$ if and only if there is some tree $p \in U^a$, $o \in \Omega$, and $p' \in V_{t-1}$ for which there is some belief state $b$ such that $V_{p_{new}}(b)$ V ....
R. Andrew McCallum. Instance-based utile distinctions for reinforcement learning with hidden state. In Proceedings of the Twelfth International Conference on Machine Learning, pages 387--395, San Francisco, CA, 1995. Morgan Kaufmann.
....been to address two problems: the economical representation of the state space and dealing with hidden state. The former uses learnt instances combined with various forms of local regression. Both Peng [11] and Tadepalli [15] use linear regression over a set of neighbouring points. McCallum [7] uses a tree representation to expand the state representation to include prior states to remove ambiguity due to hidden states. In further work [8] he uses a single representation to address both the hidden state problem and the general problem of representing a large state space by using a ....
R. A. McCallum, (1995). Instance-based utile distinctions for reinforcement learning. Proceedings of the Twelfth International Conference on Machine Learning. pp 387-395
....navigation or factory control, where the available sensors provide only partial information about the environment. A few algorithms for the solution of hidden-state reinforcement learning problems have been developed (Cassandra, Kaelbling, & Littman, 1994; Littman, Cassandra, & Kaelbling, 1995; McCallum, 1995; Parr & Russell, 1995). Exact solution appears to be very difficult. The challenge is to find approximate methods that scale well to large hidden-state applications. Despite these very substantial open problems, reinforcement learning methods are already being applied to a wide range of ....
McCallum, R. A. (1995). Instance-based utile distinctions for reinforcement learning with hidden state. In Proceedings of the Twelfth International Conference on Machine Learning, pp. 387--396, San Francisco, CA. Morgan Kaufmann.
....Of course we can construct environments in which long-term memory is essential. However, in many environments, because of their stochasticity, the significance of past information decreases exponentially fast as time goes on. In such environments, memories of moderate length will work fine. McCallum (1995) proposes the utile suffix memory (USM) algorithm. USM uses a tree structure to represent short-term memories with variable length. USM's model learning is based on a statistical test, which requires time and space proportional to the number of learning steps. This makes it difficult to adapt USM to the ....
McCallum, R. A. (1995). Instance-based utile distinctions for reinforcement learning with hidden state. In Proc. of the 12th International Conference on Machine Learning.
....1: A BIG PARTIALLY OBSERVABLE MAZE (POM) Task. Figure 1 shows a 39 × 38 maze with a single start position (S) and a single goal position (G). The maze has many more fields and obstacles than mazes used by previous authors working on POMDPs (for instance, McCallum's maze has only 23 free fields (McCallum, 1995)). The goal is to find a program that makes an agent move from S to G. Instructions. Programs can be composed from 9 primitive instructions. These instructions represent the initial bias provided by the programmer (in what follows, superscripts will indicate instruction numbers). The first 8 ....
McCallum, R. A. (1995). Instance-based utile distinctions for reinforcement learning with hidden state. In Prieditis, A. and Russell, S., editors, Machine Learning: Proceedings of the Twelfth International Conference, pages 387--395. Morgan Kaufmann Publishers, San Francisco, CA.
....are appropriate. 3.3 Components in integrated learning frameworks CBL algorithms have been integrated with learning algorithms that target a variety of performance tasks, including knowledge acquisition (Tecuci, 1993), analytic problem solving (Borrajo & Veloso, 1997), reinforcement learning (McCallum, 1995), and Bayesian reasoning (Tirri et al. 1996). These integrations frequently use CBL in innovative fashions. For example, McCallum's cases provide historical information that allows agents to distinguish perceptually identical locations, while Borrajo and Veloso used several lazy explanation-based ....
McCallum, R. A. (1995). Instance-based utile distinctions for reinforcement learning.
....(Levin, 1973; Levin, 1984) as only learning action to be plugged into MRL. We apply it to partially observable Markov decision problems (POMDPs), which recently received a lot of attention in the reinforcement learning community (e.g. Jaakkola et al. 1995; Kaelbling et al. 1995; Ring, 1994; McCallum, 1995; Littman, 1994; Cliff and Ross, 1994; Schmidhuber, 1991). We first show that LS by itself can solve partially observable mazes (POMs) involving many more states and obstacles than those solved by various previous authors (we will also see that LS can easily outperform Q-learning). We then extend ....
....(Wiering and Schmidhuber, 1996) Task. Figure 1 shows a 39 × 38 maze with a single start position (S) and a single goal position (G). The maze has many more fields and obstacles than mazes used by previous authors working on POMDPs; for instance, McCallum's maze has only 23 free fields (McCallum, 1995). The goal is to find a program that makes an agent move from S to G. Instructions. Programs can be composed from 9 primitive instructions. These instructions represent the initial bias provided by the programmer (in what follows, superscripts will indicate instruction numbers). The first 8 ....
McCallum, R. A. (1995). Instance-based utile distinctions for reinforcement learning with hidden state.
.... Several reinforcement learning algorithms augment their state representations online by adding memory; examples include techniques based on partially observable Markov decision processes [Chrisman, 1992; McCallum, 1993], recurrent neural networks [Lin and Mitchell, 1992], and suffix trees [McCallum, 1995]. The problem is that directed exploration techniques all hinge on the assumption that the state of the world is observable. That is, they all depend on associating unique exploration statistics with each world state. This assumption is broken by hidden state. For example, since different world ....
....to many different exploration techniques. 3 Fringe Exploration with USM Fringe Exploration has been implemented in conjunction with Utile Suffix Memory (USM), a reinforcement learning algorithm that learns to use variable amounts of short-term memory in order to solve tasks with hidden state [McCallum, 1995]. USM has solved many partially observable tasks well (performing over an order of magnitude better than some previous hidden-state RL algorithms); however, in previous work it has only used undirected exploration techniques; thus, like the other algorithms it outperforms, USM still wasted much ....
R. Andrew McCallum. Instance-based utile distinctions for reinforcement learning. In The Proceedings of the Twelfth International Machine Learning Conference. Morgan Kaufmann Publishers, Inc., 1995.
R. A. McCallum. Instance-based utile distinctions for reinforcement learning with hidden state. In Proceedings of the Twelfth International Conference on Machine Learning, pages 387--395, 1995.
McCallum, R. A.: Instance-based Utile Distinctions for Reinforcement Learning with Hidden State. ICML-95. Morgan Kaufmann (1995) 387--396
CiteSeer.IST - Copyright Penn State and NEC