| McCallum, R. A. Instance-Based State Identification for Reinforcement Learning. in: Advances in Neural Information Processing Systems 7, edited by G. Tesauro, D. Touretzky, and T. Leen. MIT Press, Cambridge, MA, 1995, To appear. |
.... Previously, we have shown this approach to perform exceedingly well in predicting future user actions (Gorniak and Poole 2000). In that research we identified the user's state implicitly by finding the longest sequences in observed history that match the actions the user just performed, similar to (McCallum 1996). We ranked possible future actions according to the lengths of these matches. Our goal in the research presented here is to explicitly identify the states of user and application. This results in a detailed model of how one or many people are using the application. Such a model can help ....
McCallum, A. R.: 1996, Instance-based state identification for reinforcement learning, Technical report, University of Rochester.
....the same symbol, but it is bound to fail when perceptual aliasing is present. A much better measure of similarity is the length of matching action-observation sequences prior to the two time states. This is exactly the similarity measure that McCallum used in his instance-based Q-learning algorithm (McCallum, 1995). The intuition behind this measure is that the trajectories leading to a time state represent an embedding space of the true hidden state space, and close points in the embedding space (matching trajectories) correspond to close hidden states. This is a common approach used in system ....
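The similarity measure described in the excerpt above can be sketched as follows. This is a minimal illustration, not McCallum's implementation: the function name `match_length` and the representation of the history as a list of `(action, observation)` tuples are assumptions made for this example.

```python
def match_length(history, i, j):
    """Count how many consecutive (action, observation) steps
    immediately preceding time i equal those preceding time j.

    `history` is a hypothetical list of (action, observation) tuples.
    """
    n = 0
    # Walk backwards from both time steps while the steps agree.
    while (i - 1 - n >= 0 and j - 1 - n >= 0
           and history[i - 1 - n] == history[j - 1 - n]):
        n += 1
    return n


history = [("a", 0), ("b", 1), ("a", 0), ("b", 1), ("c", 2)]
# The two steps before time 2 are the same as the two steps before time 4.
similar = match_length(history, 2, 4)
# The step before time 1 differs from the step before time 5.
dissimilar = match_length(history, 1, 5)
```

A longer match means the two time steps were reached by more similar trajectories, which is the sense in which they are treated as close in the embedding space.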
....learning research, with the difference that the observations here are continuous, and not discrete. The size of the state space is comparable to that used in previous work on recovering worlds with hidden state: Chrisman (1992) experimented with a space station docking problem with 6 states, and McCallum (1995) used synthetic worlds with 8, 11, 14, and 15 states. All of these previous experiments used discrete observations. Some of the states in this world are indistinguishable from each other; the following two pairs of states generate the same readings: (50; 25; 0) and (50; 25; 180); (50; 25; 90) ....
McCallum, R. A. (1995). Instance-based state identification for reinforcement learning. Advances in Neural Information Processing Systems (pp. 377--384).
....Approximators Another class of sparse coarse coded memory is memory based. Although memory-based function approximators have not been widely used in conjunction with reinforcement learning, they are common in other tasks such as classification (e.g. [8]) and robot control (e.g. [2]; but see [16, 14, 11]). In a memory-based function approximator, each memory element represents some of the state-action pairs or a case the agent has experienced before. A query is performed by first retrieving the nearest neighbors to the query point according to some similarity metric and then performing a weighted ....
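The query procedure in the excerpt above (retrieve nearest neighbors, then take a weighted combination) can be sketched as below. The excerpt is truncated before it names the weighting, so the inverse-distance weighted average, the Euclidean metric, and the names `knn_predict`, `memory`, `k`, and `eps` are all assumptions for illustration.

```python
import math


def knn_predict(memory, query, k=3, eps=1e-9):
    """Estimate a value at `query` from stored (point, value) cases.

    Assumptions: Euclidean distance as the similarity metric and
    inverse-distance weights; the source does not specify either.
    """
    # Retrieve the k stored cases closest to the query point.
    nearest = sorted(memory, key=lambda case: math.dist(case[0], query))[:k]
    # Closer cases get larger weights; eps avoids division by zero.
    weights = [1.0 / (math.dist(point, query) + eps) for point, _ in nearest]
    total = sum(weights)
    return sum(w * value for w, (_, value) in zip(weights, nearest)) / total


memory = [((0.0,), 0.0), ((2.0,), 2.0)]
estimate = knn_predict(memory, (1.0,), k=2)
```

Here the query sits midway between the two stored cases, so both receive equal weight and the estimate is the plain mean of their values.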
R. A. McCallum, G. Tesauro, D. Touretzky, and T. Leen. Instance-based state identification for reinforcement learning. Advances in Neural Information Processing Systems, 7, 1995.
....is usually smaller, which comes at the expense of increased computational complexity. 5.5 Learning Finite State Automata Within the AI community, research has been conducted on general methods that can reverse engineer (learn) finite state automata based on their input-output behavior (see e.g. [3,20,78,66,72,86,87]). Finite state automata (FSAs) are learned by observing the result of sequences of actions. Often, algorithms capable of learning FSAs require a pre-given homing sequence, i.e. a sequence that resets the state of the finite state machine (a routine that carries a robot to a unique location) ....
R. A. McCallum. Instance-based state identification for reinforcement learning. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems 7, Cambridge, MA, 1995. MIT Press. To appear.
....of distance learning algorithms determine their tractability. Instead, we investigate learning algorithms that are able to change the structure of the POMDP. Alternatives to the Baum-Welch algorithm for learning POMDPs are described by [Chrisman, 1992], [Stolcke and Omohundro, 1993], and [McCallum, 1995], among others. These algorithms are able to change the structure of a POMDP, but have the disadvantage that they either require a large amount of training data, learn task-specific representations only, or cannot utilize prior knowledge. Consequently, we have designed a novel POMDP learning ....
McCallum, R.A. 1995. Instance-based state identification for reinforcement learning. In Advances in Neural Information Processing Systems 7.
....a conjunction of perceptions in sequence predicts reward, but only if they each predict reward on their own. The relevance of pieces of information must be detectable in isolation. This limits the feasibility of UDM in environments where there is extended concealment of crucial features. McCallum [75] has developed one of the most promising approaches to RL with hidden state, called instance based state identification, which applies memory based learning techniques such as nearest neighbor to perception action reward sequences. He reports orders of magnitude faster learning than previous ....
McCallum, R. A. Instance-based state identification for reinforcement learning. In Tesauro, G., Touretzky, D., and Leen, T., editors, Advances in Neural Information Processing Systems 7. MIT Press, Cambridge, MA, 1995.
.... from T and O, an optimal policy can be constructed [7]. However, these indirect methods have been criticized as being computationally intractable in realistic environments [14]. Indeed, the empirical results suggest that a prohibitive amount of training time is required with these algorithms [20]. Rather than convert a POMDP to an MDP through state estimation, direct methods update a value function over observations without recovering absolute state. Jaakkola et al. present a Monte Carlo algorithm for policy evaluation and improvement using Q values defined over actions and observations ....
....defined over actions and observations [14]. They provide a recursive formulation of the Monte Carlo algorithm, and prove the policy improvement method will converge to at least a local maximum. A hybrid approach sharing both direct and indirect qualities has been explored by Lin [16] and McCallum [20]. These methods use a memory-based approach to identifying state. Lin's window-Q algorithm supplies a history of the N most recent observations and actions to a conventional Q-learning algorithm. The intuition is that perceptually aliased states can be discriminated by indexing utility over a ....
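The window-Q idea described above (feeding the N most recent observations and actions to ordinary Q-learning) might look like the sketch below. It is an illustration only, not Lin's implementation: the class name `WindowQ`, the tabular Q representation, and the `alpha` and `gamma` defaults are assumptions.

```python
from collections import defaultdict, deque


class WindowQ:
    """Tabular Q-learning whose state is a window of recent steps.

    Hypothetical sketch: the window holds the n most recent
    (action, observation) pairs and serves as the Q-table key.
    """

    def __init__(self, n, actions, alpha=0.1, gamma=0.9):
        self.window = deque(maxlen=n)   # old steps fall off automatically
        self.actions = list(actions)
        self.alpha = alpha              # learning rate (assumed value)
        self.gamma = gamma              # discount factor (assumed value)
        self.q = defaultdict(float)     # unseen entries default to 0.0

    def observe(self, action, observation):
        self.window.append((action, observation))

    def state(self):
        return tuple(self.window)       # hashable snapshot of the window

    def update(self, prev_state, action, reward, next_state):
        # Standard one-step Q-learning backup over window states.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        key = (prev_state, action)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])


agent = WindowQ(n=2, actions=["l", "r"])
agent.observe(None, "o0")
s0 = agent.state()
agent.observe("l", "o1")
s1 = agent.state()
agent.update(s0, "l", 1.0, s1)
```

Two situations that produce the same current observation map to different Q-table keys whenever their recent windows differ, which is how the window separates perceptually aliased states.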
McCallum, R. A., Instance-based State Identification for Reinforcement Learning. In Advances In Neural Information Processing Systems 7, MIT Press, 1995.
....[11] who keep a queue of interesting instances. These are updated most frequently to improve the learning rate. The third issue is addressed by McCallum [7] who uses trees which expand the state representation to include prior states removing ambiguity due to hidden states. In further work [8] he uses a single representation to address both the hidden state problem and the general problem of representing a large state space by using a case base of state sequences associated with various trajectories. Unlike this other research, in the work presented here the case is not an example of ....
R. A. McCallum (1995). Instance-based state identification for reinforcement learning. Advances in Neural Information Processing Systems 7. pp 377-384
....Approximators Another class of sparse coarse coded memory is memory based. Although memory-based function approximators have not been widely used in conjunction with reinforcement learning, they are common in other tasks such as classification (e.g. [8]) and robot control (e.g. [2]; but see [16, 14, 11]). In a memory based memory, each memory element represents some of the state-action pairs or a case the agent has experienced before. A query is performed by first retrieving the nearest neighbors to the query point according to some similarity metric and then performing a weighted average of ....
McCallum, R.A., Tesauro, G., Touretzky, D., & Leen, T. (1995). Instance-based state identification for reinforcement learning. Advances in Neural Information Processing Systems, 7.
....combined with various forms of local regression. Both Peng [11] and Tadepalli [15] use linear regression over a set of neighbouring points. McCallum [7] uses a tree representation to expand the state representation to include prior states to remove ambiguity due to hidden states. In further work [8] he uses a single representation to address both the hidden state problem and the general problem of representing a large state space by using a case base of sequences of states associated with various trajectories. Unlike this other research, in the work presented here the case is not an ....
R. A. McCallum (1995). Instance-based state identification for reinforcement learning. Advances in Neural Information Processing Systems 7. pp 377-384
....and an observation o is generated with probability O(s; a; o). In practice, T and O are not easily obtainable, and we use methods which do not require them a priori. Rather than attempt to solve the POMDP explicitly, we look to techniques for hidden state reinforcement learning to find a solution [12, 10, 8, 1]. Our state is defined by the user's pose, facial expression, and hand configurations, expressed in nine variables. Three of these are boolean (person present, left arm extended, and right arm extended) and are provided directly by the person tracker. Three more are provided by the foveated gesture ....
....performance. Given the reward function defined in the AGR task, this will correspond to a successful recognition. 2.2 Hidden State Learning To find policies for AGR tasks we have implemented an instance-based method for hidden state reinforcement learning, based on earlier work by McCallum [12]. This method performs Q-learning ([14, 15]; see Appendix A) but replaces the absolute state with a distributed memory-based state representation. Given a history of action, reward, and observation tuples, (a[t], r[t], o[t]), 0 ≤ t ≤ T, a Q value is also stored with each time step, q[t], and ....
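A rough sketch of how such a memory-based state representation could be queried is given below. The excerpt is truncated before it describes the query, so this is a guess in the spirit of k-nearest-neighbor instance-based methods rather than the authors' procedure: the names `match_length` and `estimate_q`, the comparison of whole `(a, r, o)` tuples, and the unweighted mean over the `k` best matches are all assumptions.

```python
def match_length(history, i, j):
    """Number of consecutive steps before time i that equal those before time j."""
    n = 0
    while (i - 1 - n >= 0 and j - 1 - n >= 0
           and history[i - 1 - n] == history[j - 1 - n]):
        n += 1
    return n


def estimate_q(history, q, t, action, k=2):
    """Estimate the value of taking `action` at time t.

    `history[s]` is a hypothetical (a, r, o) tuple and `q[s]` the Q value
    stored with step s. Past steps that took `action` are ranked by how
    long their preceding history matches the history before t, and the
    stored Q values of the k best matches are averaged (an assumption).
    """
    candidates = [s for s in range(t) if history[s][0] == action]
    candidates.sort(key=lambda s: match_length(history, s, t), reverse=True)
    nearest = candidates[:k]
    if not nearest:
        return 0.0  # no experience with this action yet
    return sum(q[s] for s in nearest) / len(nearest)


history = [("l", 0, "x"), ("r", 0, "y"), ("r", 0, "z"), ("l", 0, "x"), ("r", 0, "y")]
q = [0.0, 0.2, 0.8, 0.0, 0.4]
value = estimate_q(history, q, t=5, action="r", k=1)
```

In this toy history, the step at index 2 is the only past "r" step preceded by the same two steps that precede time 5, so with `k=1` its stored Q value is returned.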
R. A. McCallum. Instance-based State Identification for Reinforcement Learning. In Advances NIPS 7, MIT Press, 1995.
....memory based. Although memory-based function approximators have not been widely used in conjunction with reinforcement learning, they are common in other tasks such as classification (e.g. Kibler and Aha, 1989) and robot control (e.g. Atkeson, 1991; but see Ram and Santamaría, 1997; Peng, 1993; McCallum et al. 1995). In a memory-based function approximator, each memory element represents some of the state-action pairs or a case the agent has experienced before. A query is performed by first retrieving the nearest neighbors to the query point according to some similarity metric and then performing a weighted ....
McCallum, R. A., Tesauro, G., Touretzky, D., and Leen, T. (1995). Instance-based state identification for reinforcement learning. In Tesauro, G., Touretzky, D., and Leen, T. (eds.), Advances in Neural Information Processing Systems 7, pp 377--384. MIT Press, Cambridge, MA.
.... computation are most important to compute between real trials [Moore and Atkeson, 1993]. Moore has also proposed using variable resolution in the state space to speed up RL [Moore, 1991; Moore, 1993]. McCallum uses a memory-based method to glean more information from the robot's executions [McCallum, 1994]. Atkeson [Atkeson, 1993] looks at ways of using local models to do local optimizations in dynamic programming which could be incorporated into a more efficient form of reinforcement learning. Despite the work on improving efficiency, reinforcement learning does not seem appropriate for the ....
A. McCallum, "Instance-based State Identification for Reinforcement Learning," In Advances in Neural Information Processing Systems 7 (NIPS-94), 1994.
....of the improved efficiency provided by storing examples in kd-trees in using a lazy approach to learn several robot control tasks. More recently, Moore and Atkeson (1993) developed their prioritized sweeping algorithm, in which interesting examples in a Q table are the focus of updating. McCallum (1995) developed the nearest sequence memory algorithm, which is a lazy algorithm for solving control problems plagued by hidden state. Hidden state is an artifact of perceptual aliasing in which the mapping between states and perceptions is not one-to-one (Whitehead, 1992). McCallum showed through ....
McCallum, R. A. (1995). Instance-based state identification for reinforcement learning. In Advances in Neural Information Processing Systems 7, pp. 377--384.
....as to what the non-observable quantities (state) of the environment are that make the environment Markov. Since in general it is difficult to specify what constitutes the state of the environment, methods that can discover hidden state and model it from data are clearly desirable (see e.g. [9, 29, 36]). Other sensors. Integrating sensors other than sonar and cameras into mobile robot navigation is an important problem, since different sensors have different perceptual characteristics. In principle, the general mechanisms for mapping, localization and navigation are not specialized to a ....
R. A. McCallum. Instance-based state identification for reinforcement learning. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems 7, Cambridge, MA, 1995. MIT Press. To appear.
.... Although memory-based function approximators have not been widely used in conjunction with reinforcement learning, they are common in other tasks such as classification (e.g. [Kibler and Aha, 1989]) and robot control (e.g. [Atkeson, 1991]; but see [Ram and Santamaría, 1997; Peng, 1993; McCallum et al. 1995]). In a memory-based function approximator, each memory element represents some of the state-action pairs or a case the agent has experienced before. A query is performed by first retrieving the nearest neighbors to the query point according to some similarity metric and then performing a weighted ....
McCallum, R. A., Tesauro, G., Touretzky, D., and Leen, T. (1995). Instance-based state identification for reinforcement learning. In Tesauro, G., Touretzky, D., and Leen, T., editors, Advances in Neural Information Processing Systems 7, pages 377--384. MIT Press, Cambridge, MA.
....effects of hidden state is to estimate it directly, and augment the state space correspondingly. There are various methods for estimating hidden state, such as Hidden Markov Models [6, 22, 29], recurrent neural networks [28, 43], Kalman filters [16, 37], or tree-based or instance-based approaches [21]. The obvious disadvantage of such an approach would be the increased size of the state space, which might cause severe data gathering and computational efficiency problems. To the best of our knowledge, none of the existing probabilistic approaches to mobile robot localization is capable of ....
McCallum, R. A. Instance-Based State Identification for Reinforcement Learning. in: Advances in Neural Information Processing Systems 7, edited by G. Tesauro, D. Touretzky, and T. Leen. MIT Press, Cambridge, MA, 1995, To appear.
.... various perceptual domains, including multi-sensory inputs, is under study [Sarukkai and Ballard 1995, 1994a,b,c], as are hidden Markov and other techniques to endow reinforcement learning with the sort of memory it needs to distinguish states that are perceptually the same but actually different [McCallum 1994, 1995]. Psychophysical Learning and approximate dynamic modeling work together to improve performance in open and closed loop control of complex, highly nonlinear and unstable motor control tasks [Schneider 1994, 1995; Schneider and Brown 1994ab; Schneider and Gans 1994]. The mental ....
McCallum, R.A., "Instance-based state identification for reinforcement learning," Neural Information Processing Systems 6 (NIPS), Denver, CO, December 1994.
....learning extremely slowly, it has trouble discovering the utility of memories longer than one time step since the statistical test only examines the benefit of making a single additional state split at a time. These concerns are addressed by a recent algorithm called Nearest Sequence Memory (NSM) [McCallum, 1995]. It comes from a family of techniques dubbed Instance-Based State Identification techniques that use instance-based (or memory-based) learning for uncovering hidden state. NSM, specifically, is based on k-nearest neighbor. The key idea behind Instance-Based State Identification is the ....
....shift during learning. When faced with an evolving state space, keeping raw previous experience is the path of least commitment, and thus the most cautious about losing information. NSM learns about an order of magnitude faster than several other hidden state reinforcement learning algorithms [McCallum, 1995]. NSM's other advantage over UDM is that it can easily create memories that are multiple steps long. Since the algorithm records the raw experience in sequence, all the steps leading up to a reward are available for examination. [Figure: observation at time t, action at time t-1, and Q values] ....
R. Andrew McCallum. Instance-based state identification for reinforcement learning. In Advances of Neural Information Processing Systems (NIPS 7), 1995.
McCallum, R. A. 1995b. Instance-based state identification for reinforcement learning. In Advances of Neural Information Processing Systems (NIPS 7), 377-- 384.
....memory to augment a perceptual state space that is missing crucial features due to hidden state. 2. 1 U Tree Ancestors The algorithm can be seen as the result of an effort to combine the advantages of several previous algorithms: 1) Like Parti game [Moore, 1993] and Nearest Sequence Memory [McCallum, 1995b] the algorithm is instance based, making efficient use of raw experience in order to learn quickly and discover multi element feature conjunctions easily. 2) Like Utile Distinction Memory [McCallum, 1993] the algorithm uses a robust statistical technique, a utile distinction test in or 1 ....
R. Andrew McCallum. Instance-based state identification for reinforcement learning. In Advances of Neural Information Processing Systems (NIPS 7), pages 377--384, 1995.
McCallum, R. A. 1995b. Instance-based state identification for reinforcement learning. In Advances of Neural Information Processing Systems (NIPS 7), 377--384.
McCallum, R. A. Instance-Based State Identification for Reinforcement Learning. in: Advances in Neural Information Processing Systems 7, edited by G. Tesauro, D. Touretzky, and T. Leen. MIT Press, Cambridge, MA, 1995, To appear.
R. A. McCallum. Instance-based state identification for reinforcement learning. In Advances in Neural Information Processing Systems 7, pages 377--384, 1995.