M. C. Mozer and J. Bachrach, 1990b, Discovering the structure of a reactive environment by exploration. Neural Computation, 2:447-457.
....depicted in Figure 3. The robot agent begins .... (Footnote 4: We intentionally avoid the complex problem of incomplete and noisy perception, since the algorithm presented in this section is somewhat orthogonal to research on these issues. See [Bachrach and Mozer, 1991; Chrisman, 1992; Lin and Mitchell, 1992; Mozer and Bachrach, 1989; Rivest and Schapire, 1987; Tan, 1991; Whitehead and Ballard, 1991] for approaches to learning with incomplete perception.) (Footnote 5: See, for example, [Barto et al., 1989; Jordan, 1989; Munro, 1987; Thrun, 1992] for more approaches to learning action models with neural networks.)
Michael C. Mozer and Jonathan R. Bachrach. Discovering the structure of a reactive environment by exploration. Technical Report CU-CS-451-89, Dept. of Computer Science, University of Colorado, Boulder, November 1989.
....at all. In many applications of adaptive control, one distinguishes between a learning phase and a subsequent performance phase (exploitation phase), and costs count exclusively in the latter phase. Random exploration is frequently found in recent literature on adaptive neurocontrol, e.g. in [1, 3, 16, 17, 31, 33, 37, 38, 40]. However, there are undirected exploration techniques that take costs into account during learning. These typically use the actual controller for assessing the expected costs (or rewards, respectively) of actions. Actions are randomly selected, but the expected rewards are used for drawing the ....
Michael C. Mozer and Jonathan R. Bachrach. Discovering the structure of a reactive environment by exploration. Technical Report CU-CS-451-89, Dept. of Computer Science, University of Colorado, Boulder, November 1989.
....generators and recognizers have been known for several decades [40, 52, 56]. More recently, recurrent neural network realizations of finite state automata for recognition and learning of finite state (regular) languages have been explored by numerous authors [3, 8, 14, 16, 18, 37, 60, 64, 66, 67, 82, 87, 102]. There has been considerable work on extending the computational capabilities of recurrent neural network models by providing some form of external memory in the form of a tape [103] or a stack [5, 11, 29, 55, 74, 84, 90, 93, 105]. To the best ....
....pattern recognition, neural networks, computational learning theory, natural language processing, and related areas. The interested reader is referred to [34, 44, 54] and [70] for surveys of grammar inference in general and to [3, 5, 11, 16, 18, 17, 29, 37, 39, 55, 60, 61, 64, 66, 82, 84, 87, 93] and [102, 104] for recent results on grammar inference using neural networks. The neural architecture for syntax analysis that is proposed in this paper does not appear to lend itself to use in grammar inference using conventional neural network ....
M. C. Mozer and J. Bachrach, "Discovering the structure of a reactive environment by exploration," Neural Comput., vol. 2, no. 4, p. 447, 1990.
....inapplicable to all but the simplest problems. The MDPDHS model lies in between, inheriting some of the computational efficiency from the MDP model, while being able to cope with certain types of hidden state. The work is also related to the rich literature on learning finite state machines [11, 7]. Like ours, the problem addressed in this literature is characterized by deterministic hidden state. It differs, however, in the control goal (system identification versus reward maximization) and it does not incorporate a model of control noise of any type, in contrast to the problem studied ....
M. C. Mozer and J. R. Bachrach. Discovering the structure of a reactive environment by exploration. TR CU-CS-451-89, Univ. of Colorado, 1989.
....trajectory. The need for exploration has been recognized by many authors. We summarize two approaches to exploration. • Training models by randomly generated patterns. In order to explore the whole world regularly, actions are generated randomly in a first phase, the phase of model training [MB89, Mun87, Sut90]. Asymptotically, this ensures that the whole state space is explored. This method has three important drawbacks. First, during model training, actions are not optimized. An autonomous agent would behave extremely poorly until its model is sufficiently well trained because there is that ....
M. C. Mozer and J. Bachrach. Discovering the structure of a reactive environment by exploration. Technical Report CU-CS-451-89, Dept. of Computer Science, University of Colorado, Boulder, November 1989.
....More specifically, each time an action is performed, the previous state, denoted by s, the action a, and the next state s' form a training example for the action model network, denoted by M: .... (Footnote 3: We intentionally avoid the complex problem of incomplete and noisy perception, since the algorithm presented in this section is somewhat orthogonal to research on these issues. See [Bachrach and Mozer, 1991; Chrisman, 1992; Lin and Mitchell, 1992; Mozer and Bachrach, 1989; Rivest and Schapire, 1987; Tan, 1991; Whitehead and Ballard, 1991] for approaches to learning with incomplete perception.)
Michael C. Mozer and Jonathan R. Bachrach. Discovering the structure of a reactive environment by exploration. Technical Report CU-CS-451-89, Dept. of Computer Science, University of Colorado, Boulder, November 1989.
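The model-training phase described in the two contexts above has two parts. Actions are generated randomly, and each executed action yields an (s, a, s') training example for the action model M. The sketch below is a minimal illustration of that loop under stated assumptions. The corridor environment and all function names are hypothetical and are not taken from the cited papers. A count-based lookup table stands in for the neural action-model network purely for brevity.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy environment (assumption, not from the cited papers):
# a corridor of N_STATES cells; action -1 moves left, +1 moves right,
# and the walls at both ends clamp the position.
N_STATES = 5
ACTIONS = (-1, +1)


def step(state: int, action: int) -> int:
    """Deterministic corridor dynamics with walls at both ends."""
    return min(max(state + action, 0), N_STATES - 1)


def collect_random_experience(n_steps: int, seed: int = 0):
    """Random exploration: every executed action yields one (s, a, s') example."""
    rng = random.Random(seed)
    s = 0
    examples = []
    for _ in range(n_steps):
        a = rng.choice(ACTIONS)   # actions are not optimized, only sampled
        s_next = step(s, a)
        examples.append((s, a, s_next))
        s = s_next
    return examples


def fit_table_model(examples):
    """Stand-in for the action-model network M (a swapped-in technique):
    predict the most frequently observed successor for each (s, a) pair."""
    counts = defaultdict(Counter)
    for s, a, s_next in examples:
        counts[(s, a)][s_next] += 1
    return {sa: c.most_common(1)[0][0] for sa, c in counts.items()}


if __name__ == "__main__":
    data = collect_random_experience(200)
    model = fit_table_model(data)
    print(len(model), "of", N_STATES * len(ACTIONS), "(s, a) pairs observed")
```

The printout reports how many (s, a) pairs random exploration actually visited. That coverage is what the context means by the whole state space being explored only "asymptotically".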
....is usually smaller, which comes at the expense of increased computational complexity. 5.5 Learning Finite State Automata. Within the AI community, research has been conducted on general methods that can reverse-engineer (learn) finite state automata based on their input-output behavior (see, e.g., [3, 20, 78, 66, 72, 86, 87]). Finite state automata (FSAs) are learned by observing the result of sequences of actions. Often, algorithms capable of learning FSAs require a pre-given homing sequence, i.e., a sequence that resets the state of the finite state machine (a routine that carries a robot to a unique location) ....
M. C. Mozer and J. R. Bachrach. Discovering the structure of a reactive environment by exploration. Technical Report CU-CS-451-89, Dept. of Computer Science, University of Colorado, Boulder, November 1989.
....we do not address issues relating to state representation and state estimation. State representations might involve delayed values of previous actions and sensations (Ljung & Söderström, 1986), or they might involve internal state variables that are induced as part of the learning procedure (Mozer & Bachrach, 1990). Given the state $x[n-1]$ and the input $p[n-1]$, the learner produces an action $u[n-1]$: $u[n-1] = h(x[n-1], p[n-1])$ (3). (Footnote 1: The choice of time indices in Equations 1, 2, and 3 is based on our focus on the output at time n. In our framework a ....)
Mozer, M. C. & Bachrach, J. (1990). Discovering the structure of a reactive environment by exploration. In D. Touretzky (Ed.), Advances in Neural Information Processing Systems 2. San Mateo, CA: Morgan Kaufmann.
....is actively sought out and used to shape internal models of (a part of) the world. Exploration has already been applied to problems involving adaptation: in robot control [12, 14], in systems with vehicles coming to terms with a changing environment [1], in automata employing behaviors [5, 7, 9], and in the learning of functions [16]. It has also been addressed in [13] in the context of intelligent control. Our aim in this paper is to define exploration in a general way in the framework of learning from examples, and to outline its possible strengths and limitations. The idea of open and ....
M. C. Mozer and J. Bachrach. Discovering the structure of a reactive environment by exploration. Technical Report CU-CS-451-89, Dept. of Computer Science, University of Colorado, Boulder, November 1989.
....1 Introduction. 1.1 Background. Dynamically driven recurrent neural networks (DRNNs) have empirically shown the ability to perform inference in problems as diverse as grammar induction (Cleeremans et al., 1989; Das and Das, 1991; Elman, 1991; Frasconi et al., 1992a; Giles et al., 1992; Mozer and Bachrach, 1990; Pollack, 1991; Zeng et al., 1994) and system identification in control (Barto, 1990; Billings et al., 1992; Narendra and Parthasarathy, 1990). We discuss results concerning the learning of temporal sequences for a particular class of discrete-time recurrent neural network architectures ....
Mozer, M. and Bachrach, J. (1990). Discovering the structure of a reactive environment by exploration.
....1987a; Rivest & Schapire, 1987b; Schapire, 1988). An update graph is an alternative representation of an SSM that can be much smaller than the SSM for certain environments that often arise in practice. Update graphs can be mapped to a connectionist system that can learn the environment from examples (Mozer & Bachrach, 1990; Mozer & Bachrach, 1991). 4.2 Traditional Approaches. Other methods for grammatical inference, which do not use neural networks, have demonstrated some promising results. In fact, a polynomial-time algorithm proposed by Trakhtenbrot and Barzdin (Trakhtenbrot & Barzdin, 1973) has been shown to be ....
Mozer, M. C. & Bachrach, J. (1990). Discovering the structure of a reactive environment by exploration. Neural Computation, 2 (4), 447--457.
....passive case, the adaptation is done on the basis of the data seen throughout the lifetime of the agent. In the active case, the agent itself may explore its possibilities by asking questions. This is done in the form of choosing those m s which may provide the a functions with unknown information [11, 1, 7]. The c of the JANUS master agent can be used as a trigger for including new agents or methods in the system. If c is too low or even zero, this indicates that the current task probably cannot be solved using only the current agents with their current knowledge. Now we discuss the different ....
M. C. Mozer and J. Bachrach. Discovering the structure of a reactive environment by exploration. Technical Report CU-CS-451-89, Dept. of Computer Science, University of Colorado, Boulder, November 1989.
....in Figure 3. The robot agent .... (Footnote 4: We intentionally avoid the complex problem of incomplete and noisy perception, since the algorithm presented in this section is somewhat orthogonal to research on these issues. See [Bachrach and Mozer, 1991; Chrisman, 1992; Lin and Mitchell, 1992; Mozer and Bachrach, 1989; Rivest and Schapire, 1987; Tan, 1991; Whitehead and Ballard, 1991] for approaches to learning with incomplete perception.) (Footnote 5: See, for example, [Barto et al., 1989; Jordan, 1989; Munro, 1987; Thrun, 1992] for more approaches to learning action models with neural networks.)
Michael C. Mozer and Jonathan R. Bachrach. Discovering the structure of a reactive environment by exploration. Technical Report CU-CS-451-89, Dept. of Computer Science, University of Colorado, Boulder, November 1989.
....between two behaviors, exploration and exploitation, depending on expected costs and knowledge gain. The appropriateness of this method is demonstrated on a simple robot navigation task. INTRODUCTION: The need for exploration in adaptive control has been recognized by various authors [MB89, Sut90, Moo90, Sch90, BBS91]. Many connectionist approaches (e.g. [Mel89, MB89]) distinguish a random exploration phase, in which a controller is constructed by generating actions randomly, and a subsequent exploitation phase. Random exploration usually suffers from three major disadvantages: • Whenever costs are assigned to certain experiences, which is the case for ....
M.C. Mozer and J.R. Bachrach. Discovering the structure of a reactive environment by exploration. Technical Report CU-CS-451-89, Dept. of Computer Science, University of Colorado, Boulder, Nov. 1989.
....Efficient exploration is of fundamental importance for autonomous systems that learn to act. In recent years, a variety of exploration techniques have been proposed for reinforcement learning (RL). Many researchers use undirected approaches closely related to random walks, e.g. [Mozer and Bachrach, 1989; Barto et al., 1995]. These include the common strategies of selecting a random action with probability ε and of drawing actions from a Boltzmann distribution based on utility, with a decreasing temperature. While easy to implement, these approaches are often unbearably inefficient; Whitehead has proved that undirected exploration can cause the ....
M. C. Mozer and J. R. Bachrach. Discovering the structure of a reactive environment by exploration. Technical Report CU-CS-451-89, Dept. of Computer Science, University of Colorado, Boulder, November 1989.
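The context above names two undirected strategies. One selects a random action with some probability ε. The other draws actions from a Boltzmann distribution over utilities while the temperature decreases. The sketch below is a minimal illustration of both, assuming utilities are given as a plain list of action-value estimates. The function names and the exponential cooling schedule are hypothetical choices, not taken from the cited papers.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action;
    otherwise pick an action with the highest estimated utility."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def boltzmann(q_values, temperature, rng):
    """Sample action a with probability proportional to exp(Q(a) / T)."""
    m = max(q_values)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    return rng.choices(range(len(q_values)), weights=weights, k=1)[0]


def temperature_schedule(t, t0=1.0, decay=0.99, t_min=0.01):
    """Hypothetical exponential cooling schedule with a floor at t_min."""
    return max(t_min, t0 * decay ** t)


if __name__ == "__main__":
    rng = random.Random(42)
    q = [0.2, 0.5, 0.1]
    for t in (0, 100, 500):
        T = temperature_schedule(t)
        picks = [boltzmann(q, T, rng) for _ in range(1000)]
        print(f"t={t} T={T:.3f} share of greedy action: {picks.count(1) / 1000:.2f}")
```

As the temperature falls, Boltzmann selection concentrates on the highest-utility action. The choice is still driven by utility alone, with no notion of how much would be learned. That is what makes both strategies "undirected" in the sense of the context.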
....diversity approach finds a set of canonical tests using repeatable cycles in the environment. The Rivest-Schapire algorithm depends on a completely deterministic environment. Work on a neural network implementation of this approach can accommodate noisy sensations, but has other disadvantages [Mozer and Bachrach, 1990] .... be duplicated. A second disadvantage is that UDM and PDA both require an inordinate number of steps to learn the tasks on which they were demonstrated. They are both based on an Expectation-Maximization (EM) algorithm in which the learner alternates between gathering statistics using the ....
Michael Mozer and Jonathan Bachrach. Discovering the structure of a reactive environment by exploration. Neural Computation, 2:447--457, 1990.
....function. Because $0 \le \gamma < 1$, $V(x)$ is bounded. Also, since the number of π's is finite, $V(x)$ exists. How do we define an optimal policy? For a given x, let $\pi_x$ denote a policy that achieves the maximum in (11). Thus we have a collection of policies, $\{\pi_x : x \in X\}$. Now $\pi^*$ is defined by picking .... (Footnote 4: If the state is not completely observable, then a method that uses the observable states and retains past information has to be used; see (Bachrach 1991; Bachrach 1992; Chrisman 1992; Mozer and Bachrach 1990a, 1990b; Whitehead and Ballard 1990).)
....(1989) for a test problem; Singh (1991, 1992b, 1992d) and Tham and Prager (1994) for a navigation problem; Lin and Kim (1991) for pole balancing; and Sutton (1990, 1991b) in his Dyna architecture. Recurrent networks with context information feedback have been used by Bachrach (1991, 1992) and Mozer and Bachrach (1990a, 1990b) in dealing with RL problems with incomplete state information. A few non-connectionist methods have also been used for RL. Mahadevan and Connell (1991) have used statistical clustering in association with Q-learning for the automatic programming of a mobile robot. A novel feature of ....
M. C. Mozer and J. Bachrach, 1990b, Discovering the structure of a reactive environment by exploration. Neural Computation, 2:447--457.
....function. Because $0 \le \gamma < 1$, $V(x)$ is bounded. Also, since the number of π's is finite, $V(x)$ exists. How do we define an optimal policy? For a given x, let $\pi_x$ denote a policy that achieves the maximum in (11). Thus we have a collection of policies, $\{\pi_x : x \in X\}$. Now $\pi^*$ is defined by picking .... (Footnote 4: If the state is not completely observable, then a method that uses the observable states and retains past information has to be used; see (Bachrach 1991; Bachrach 1992; Chrisman 1992; Mozer and Bachrach 1990a, 1990b; Whitehead and Ballard 1990).)
....(1989) for a test problem; Singh (1991, 1992b, 1992d) and Tham and Prager (1994) for a navigation problem; Lin and Kim (1991) for pole balancing; and Sutton (1990, 1991b) in his Dyna architecture. Recurrent networks with context information feedback have been used by Bachrach (1991, 1992) and Mozer and Bachrach (1990a, 1990b) in dealing with RL problems with incomplete state information. A few non-connectionist methods have also been used for RL. Mahadevan and Connell (1991) have used statistical clustering in association with Q-learning for the automatic programming of a mobile robot. A novel feature of ....
M. C. Mozer and J. Bachrach, 1990a, Discovering the structure of a reactive environment by exploration. In Advances in Neural Information Processing Systems 2, D. S. Touretzky, editor, pages 439--446, Morgan Kaufmann, San Mateo, CA, USA.
....information processing, the combination of two or more such different models presents major difficulties. Exploration has already been demonstrated in adaptive models: in robot control [14, 16], for vehicles coming to terms with a changing environment [1], for automata employing behaviors [6, 8, 12], and in the learning of functions [17]. It has also been addressed in [15] in the context of intelligent control. Our aim in this paper is to examine reflective exploration as an adaptive property, and to outline its possible advantages and limitations. The example system used in the ....
M. C. Mozer and J. Bachrach. Discovering the structure of a reactive environment by exploration. Technical Report CU-CS-451-89, Dept. of Computer Science, University of Colorado, Boulder, November 1989.
....be appropriate. The work we report here involves designing an inductive bias into the learning procedure in order to encourage the formation of discrete internal representations. In recent years, various approaches have been explored for learning discrete representations using neural networks (McMillan, Mozer, & Smolensky, 1992; Mozer & Bachrach, 1990; Mozer & Das, 1993; Schütze, 1993; Towell & Shavlik, 1992). However, these approaches are domain-specific, making strong assumptions about the nature of the task. In our work, we describe a general methodology that makes no assumption about the domain to which it is applied, beyond the fact that ....
M. C. Mozer & J. R. Bachrach (1990). Discovering the structure of a reactive environment by exploration. Neural Computation 2(4):447-457.
....involves designing an inductive bias into the learning procedure in order to encourage the formation of discrete internal representations. In recent years, various approaches have been explored for learning discrete representations using neural networks (McMillan, Mozer, and Smolensky, 1992; Mozer and Bachrach, 1990; Mozer and Das, 1993; Schütze, 1993; Towell and Shavlik, 1992). However, these approaches are domain-specific, making strong assumptions about the nature of the task. The research described here is a general methodology that makes no assumption about the domain to which it is applied, beyond the ....
Mozer, M. and Bachrach, J. (1990). Discovering the structure of a reactive environment by exploration.
No context found.
M. C. Mozer and J. Bachrach, 1990b, Discovering the structure of a reactive environment by exploration. Neural Computation, 2:447-457.
No context found.
M. C. Mozer and J. Bachrach, 1990a, Discovering the structure of a reactive environment by exploration. In Advances in Neural Information Processing Systems 2, D. S. Touretzky, editor, pages 439-446, Morgan Kaufmann, San Mateo, CA, USA.
No context found.
M. C. Mozer, J. Bachrach, "Discovering the structure of a reactive environment by exploration", Neural Computation, vol. 2, no. 4, pp. 447-457, 1990.
CiteSeer.IST - Copyright Penn State and NEC