Sebastian B. Thrun and Knut Möller. On planning and exploration in non-discrete environments. Technical Report 528, GMD, Sankt Augustin, FRG, 1991.
....they choose the one that has been tried the least often previously; error-based techniques that use the measured (or predicted) variations of the variables $V_i^k$ or $Q_i^k$ during their last (or next) update(s) in the action-selection rule (e.g. Moore 1990; Schmidhuber 1991a; Thrun and Moller 1991, 1992). In general, these techniques prefer the states or actions whose estimated quality varied the most in the past, or is predicted to vary the most in the future; recency-based techniques that deal with non-stationary problems by keeping in memory the date of the last trial of each ....
Thrun, S., & Moller, K. (1991). On planning and exploration in non-discrete environments. Technical Report 528, GMD, Sankt Augustin, Germany.
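As a rough illustration of the three directed selection rules in the excerpt above (counter-based, error-based, recency-based), here is a minimal tabular sketch. The class and method names are hypothetical, and measuring the "variation" as the absolute change of Q in its most recent update is an assumption; the cited works may define these quantities differently.

```python
from collections import defaultdict

class DirectedExplorer:
    """Hypothetical tabular helper illustrating three directed exploration rules.

    counts[(s, a)]:     how often action a was tried in state s (counter-based)
    last_delta[(s, a)]: |change| of Q(s, a) in its most recent update (error-based, assumed measure)
    last_time[(s, a)]:  time step of the most recent trial of a in s (recency-based)
    """

    def __init__(self) -> None:
        self.counts = defaultdict(int)
        self.last_delta = defaultdict(float)
        self.last_time = defaultdict(lambda: -1)  # -1 means "never tried"
        self.t = 0

    def record(self, s, a, old_q: float, new_q: float) -> None:
        # Update all three statistics after trying action a in state s.
        self.counts[(s, a)] += 1
        self.last_delta[(s, a)] = abs(new_q - old_q)
        self.last_time[(s, a)] = self.t
        self.t += 1

    def counter_based(self, s, actions):
        # Prefer the action tried least often in s.
        return min(actions, key=lambda a: self.counts[(s, a)])

    def error_based(self, s, actions):
        # Prefer the action whose estimated quality changed most in its last update.
        return max(actions, key=lambda a: self.last_delta[(s, a)])

    def recency_based(self, s, actions):
        # Prefer the action whose last trial lies furthest in the past.
        return min(actions, key=lambda a: self.last_time[(s, a)])
```

Ties are broken by the order of `actions`. Under this error-based rule an untried action has variation 0.0, so in practice it would likely be combined with a counter.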
....error-predicting unit by creating action sequences for provoking mismatches between expectations and reality. The gradient computed for the error predictor also served to change the internal representations of the whole network (whose error function simply contained an additional term). Recently, [12] described related ideas (they use the term competence network instead of the term confidence network as used in [7] and [5]). One problem with the idea above is that in non-deterministic environments the controller will focus on parts of the environmental dynamics which are inherently ....
S. Thrun and K. Moller. On planning and exploration in non-discrete environments. Technical report, Gesellschaft für Mathematik und Datenverarbeitung, D-5205 St. Augustin, Germany, March 1991.
....a state vector from the Environment to which it must respond with an action. The process of selecting an action consists of three functions: action and next-state generation, state encoding, and state evaluation (see Figure 3.2). Two distinct methods for selecting an action are direct and indirect (Thrun and Moller, 1991). Direct: A direct selection scheme uses a neural network to map a state to an action without any search. Advantage: It selects an action quickly and in constant time. Disadvantages: It doesn't handle many-to-many mappings of states to actions (there might be several equally good ....
Thrun, S. B. and Moller, K. (1991). On planning and exploration in non-discrete environments. Available via FTP from archive.ohiostate.
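A minimal sketch contrasting the two selection schemes in the excerpt above. Plain Python callables stand in for the neural networks (an assumption for brevity), and the three hooks of the indirect scheme mirror the three functions listed there: action and next-state generation, state encoding, and state evaluation. The function names are hypothetical.

```python
from typing import Callable, Sequence

State = tuple[float, ...]

def select_direct(policy: Callable[[State], int], state: State) -> int:
    # Direct scheme: a single state -> action mapping, no search,
    # so the cost does not depend on the number of candidate actions.
    return policy(state)

def select_indirect(
    state: State,
    actions: Sequence[int],
    next_state: Callable[[State, int], State],  # action and next-state generation (model)
    encode: Callable[[State], State],            # state encoding
    evaluate: Callable[[State], float],          # state evaluation
) -> int:
    # Indirect scheme: predict the successor of every candidate action,
    # encode and score it, and keep the best-scoring action.
    # Cost grows linearly with len(actions).
    return max(actions, key=lambda a: evaluate(encode(next_state(state, a))))
```

Because the indirect scheme scores every candidate, it can expose several equally good actions (ties in `evaluate`), which a single state-to-action mapping cannot represent; the price is a cost linear in the number of candidates.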
....sequences for provoking mismatches between expectations and reality. The gradient computed for the error predictor also served to change the internal representations of the whole network (whose error function simply contained an additional term). Recently, Thrun and Moller described related ideas [14] (they use the term competence network instead of the term confidence network as used in [9] and [8]). One problem with the idea above is that in non-deterministic environments the controller will focus on parts of the environmental dynamics which are inherently unpredictable. This is because ....
S. Thrun and K. Moller. On planning and exploration in non-discrete environments. Technical report, Gesellschaft für Mathematik und Datenverarbeitung, D-5205 St. Augustin, Germany, March 1991.
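The pitfall named in the excerpt above can be reproduced with a toy, tabular stand-in for the model and the confidence (error-predicting) network. The two-action world, learning rate, and function names are illustrative assumptions: action 0 has a deterministic outcome, while action 1 returns pure noise. A controller that always picks the action with the largest predicted error keeps returning to the noisy action, because that error can never be learned away.

```python
import random

def run(steps: int = 2000, lr: float = 0.05, seed: int = 0):
    rng = random.Random(seed)

    # Hypothetical two-action world: action 0 is deterministic, action 1 is pure noise.
    def outcome(a: int) -> float:
        return 1.0 if a == 0 else rng.choice([-1.0, 1.0])

    model = [0.0, 0.0]     # model: predicted outcome per action
    err_pred = [1.0, 1.0]  # error predictor: predicted squared model error per action
    picks = [0, 0]
    for _ in range(steps):
        # Curiosity-style controller: try the action with the largest predicted error.
        a = max((0, 1), key=lambda i: err_pred[i])
        y = outcome(a)
        err = (y - model[a]) ** 2
        model[a] += lr * (y - model[a])          # LMS update of the model
        err_pred[a] += lr * (err - err_pred[a])  # LMS update of the error predictor
        picks[a] += 1
    return model, err_pred, picks
```

The predicted error for action 0 shrinks as the model learns it, while the predicted error for action 1 stays near the noise variance, so the controller ends up spending most of its steps on the unlearnable action.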
No context found.
Sebastian B. Thrun and Knut Möller. On planning and exploration in non-discrete environments. Technical Report 528, GMD, Sankt Augustin, FRG, 1991.
....bp architectures. • active exploration, i.e. the focus of attention on those regions of the workspace that are important and whose approximations are less satisfying. It is straightforward to see how our learning method can be used to generate input for those parts where knowledge is still inadequate [TM91]. To summarize, the motivation for this approach was • to support coarse-grained parallelization, e.g. on transputer nets, • to accelerate learning of real-world problems such as robot kinematics and dynamics. This is accomplished by function decomposition, i.e. the use of many small networks ....
S. Thrun and K. Moller. On planning and exploration in non-discrete environments. Technical Report 528, GMD, St. Augustin, FRG, February, 1991.
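A minimal sketch of active exploration on top of function decomposition, as described in the excerpt above: the workspace [0, 1) is split into regions, each served by its own small linear approximator (a stand-in for a small network), and the next training input is drawn from the region whose running error estimate is largest. The target function, region count, step sizes, and class names are hypothetical.

```python
import random

def target(x: float) -> float:
    # Hypothetical function to be learned (stand-in for e.g. a kinematics mapping).
    return x * x

class LocalModel:
    """Small linear approximator responsible for one region of the workspace."""

    def __init__(self) -> None:
        self.w0 = 0.0
        self.w1 = 0.0
        self.err = 1.0  # running squared-error estimate; starts high so every region gets sampled

    def predict(self, x: float) -> float:
        return self.w0 + self.w1 * x

    def train(self, x: float, y: float, lr: float = 0.5, decay: float = 0.1) -> None:
        e = y - self.predict(x)
        # LMS (delta-rule) update of the local weights.
        self.w0 += lr * e
        self.w1 += lr * e * x
        # Exponential moving average of the squared prediction error.
        self.err += decay * (e * e - self.err)

def active_learning(n_regions: int = 4, steps: int = 400, seed: int = 0):
    rng = random.Random(seed)
    models = [LocalModel() for _ in range(n_regions)]
    samples = [0] * n_regions
    for _ in range(steps):
        # Query the region whose approximation currently looks worst.
        k = max(range(n_regions), key=lambda i: models[i].err)
        x = (k + rng.random()) / n_regions  # uniform input inside region k
        models[k].train(x, target(x))
        samples[k] += 1
    return models, samples
```

Because each region has its own model, the per-region training steps are independent of one another, which is what makes this kind of decomposition a natural fit for coarse-grained parallel hardware.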
No context found.
Sebastian B. Thrun and Knut Moller. On planning and exploration in non-discrete environments. Technical Report 528, GMD, Sankt Augustin, FRG, 1991.
....back through the model network. Using these gradients, actions are optimized progressively by gradient descent in action space, minimizing $E_{\text{exploit}}$. The resulting actions exploit the world. THE COMPETENCE MAP. The general principle of many enhanced exploration schemes [BS90, Sut90, Moo90, TM91, Sch91, Thr92] is to select actions such that the resulting observations are expected to optimally improve the controller. In terms of the above control scheme, this may be realized by driving the agent into regions in state-action space where the accuracy of the model network is assumed to be ....
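The gradient-based action optimization in the excerpt above can be sketched in one dimension. The model, its derivative, and the form of E_exploit (squared distance between the predicted next state and a goal) are assumptions made for illustration; `dmodel_da` plays the role of the gradient that backpropagation through the model network would supply.

```python
import math

def model(s: float, a: float) -> float:
    # Hypothetical differentiable model network: predicted next state.
    return s + math.tanh(a)

def dmodel_da(s: float, a: float) -> float:
    # Derivative of the model output w.r.t. the action (what backprop would supply).
    return 1.0 - math.tanh(a) ** 2

def e_exploit(s: float, a: float, goal: float) -> float:
    # Assumed exploitation error: squared distance of the predicted next state to a goal.
    return (model(s, a) - goal) ** 2

def optimize_action(s: float, goal: float, a: float = 0.0,
                    lr: float = 0.5, steps: int = 200) -> float:
    # Gradient descent in action space; the model itself is held fixed.
    for _ in range(steps):
        grad = 2.0 * (model(s, a) - goal) * dmodel_da(s, a)  # chain rule: dE/da
        a -= lr * grad
    return a
```

For `s = 0.0` and `goal = 0.5` the optimized action approaches `atanh(0.5)`, the action for which the model predicts exactly the goal state.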
....the agent into regions in state-action space where the accuracy of the model network is assumed to be low, and thus the knowledge gain from visiting these regions is assumed to be high. In order to estimate the accuracy of the model network, we introduce the notion of a competence network [Sch91, TM91]. Basically, this map estimates an upper bound of the LMS error of the model network. This estimate is used for exploring the world by selecting actions which minimize the expected competence of the model, and thus maximize the resulting learning effect. However, training the competence map ....
S.B. Thrun and K. Moller. On planning and exploration in non-discrete environments. Technical Report 528, GMD, St. Augustin, FRG, 1991.
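A tabular sketch of a competence map as characterized in the excerpts above: it keeps a running upper-bound estimate of the model's squared error per state/action cell, and exploration picks the action with the lowest expected competence, i.e. the largest error bound. The asymmetric update rule (jump up immediately, decay slowly) and all names are assumptions for illustration, not the cited method.

```python
from collections import defaultdict

class CompetenceMap:
    """Tabular stand-in for a competence network.

    bound[(s, a)] is a running upper-bound estimate of the model's squared error
    in that state/action cell; a high bound means low competence.
    """

    def __init__(self, initial_bound: float = 1.0, decay: float = 0.1) -> None:
        # Optimistic start: unvisited cells look incompetent and attract exploration.
        self.bound = defaultdict(lambda: initial_bound)
        self.decay = decay

    def update(self, s, a, observed_sq_error: float) -> None:
        b = self.bound[(s, a)]
        if observed_sq_error > b:
            # Jump up at once so the estimate stays an upper bound.
            self.bound[(s, a)] = observed_sq_error
        else:
            # Shrink only slowly toward smaller observed errors.
            self.bound[(s, a)] = b + self.decay * (observed_sq_error - b)

    def explore(self, s, actions):
        # Choose the action where the model is expected to be least competent.
        return max(actions, key=lambda a: self.bound[(s, a)])
```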
....• active exploration, i.e. the focus of attention on those parts of the workspace that are important at the moment and whose approximations are still not satisfying. It is straightforward to see how our learning method can be used to generate input for those areas where knowledge is still inadequate [TM91]. • easier relearning, i.e. when the learning system faces a constantly changing environment, fast relearning is necessary. Unfortunately, pure backpropagation is not well suited for online training with open training sets. Our decomposition allows certain changes to be kept local. In ....
S. Thrun and K. Moller. On planning and exploration in non-discrete environments. Technical Report, GMD, February, 1991.
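The locality argument in the excerpt above can be checked with a small sketch in which each region of a one-dimensional workspace is approximated by its own constant (a hypothetical stand-in for the small networks). When the environment changes in one region only, retraining on samples from that region updates only that region's parameter and leaves the rest untouched.

```python
def region_of(x: float, n_regions: int) -> int:
    # Map an input in [0, 1) to the index of the region responsible for it.
    return min(int(x * n_regions), n_regions - 1)

def fit_local(models: list[float], data, n_regions: int,
              lr: float = 0.2, epochs: int = 50) -> list[float]:
    # Each local model is a single constant trained by the delta rule.
    # A sample only touches the model of its own region, so changes stay local.
    for _ in range(epochs):
        for x, y in data:
            k = region_of(x, n_regions)
            models[k] += lr * (y - models[k])
    return models
```

With a single monolithic network trained by backpropagation, the same retraining would in general move weights shared by all regions.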
....it is used for control even after learning. The most uninformed undirected exploration technique is the random walk (Nguyen and Widrow, 1989; Anderson, 1986; Mozer and Bachrach, 1989; Bachrach and Mozer, 1991; Jordan, 1989; Jordan and Jacobs, 1990; Mel, 1989; Munro, 1987; Thrun et al., 1991), which completely ignores costs and negative rewards from the environment. As a result of Whitehead's theorem, and as we will demonstrate, this exploration technique is inferior to other exploration techniques even in scenarios where costs do not matter. Other undirected exploration techniques rely ....
....distributions (Whitehead and Ballard, 1991; Mahadevan and Connell, 1990; Mahadevan and Connell, 1991). Directed exploration, on the other hand, does utilize further knowledge of the learning process (Moore, 1990; Moore, 1991; Kaelbling, 1990; Sutton, 1990; Schmidhuber, 1991; Thrun and Moller, 1991; Thrun and Moller, 1992). This exploration-specific knowledge typically cannot be used for controlling the environment and is thus useless after learning. We assume throughout this paper that this knowledge does not exceed the complexity of the knowledge stored for control, and Exploitation ....
[Article contains additional citation context not shown here]
Sebastian B. Thrun and Knut Moller. On planning and exploration in non-discrete environments. Technical Report 528, GMD, Sankt Augustin, FRG, 1991.
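To make the contrast between undirected and directed exploration in the excerpts above concrete, the following sketch counts how many steps a random walk and a simple counter-based rule need to visit every state of a chain. The chain world and function names are hypothetical, and the directed rule assumes the agent knows which neighboring state each move leads to. The visit counts it relies on are exactly the kind of exploration-specific knowledge that is of no use for control after learning.

```python
import random

def cover_steps(n: int, choose, max_steps: int = 100_000) -> int:
    """Steps needed to visit every state of an n-state chain, starting at state 0."""
    visits = [0] * n
    s = 0
    visits[s] = 1
    steps = 0
    while min(visits) == 0 and steps < max_steps:
        neighbors = [t for t in (s - 1, s + 1) if 0 <= t < n]
        s = choose(s, neighbors, visits)
        visits[s] += 1
        steps += 1
    return steps

def random_walk(rng: random.Random):
    # Undirected: ignore everything seen so far and pick a neighbor uniformly.
    return lambda s, neighbors, visits: rng.choice(neighbors)

def least_visited(s, neighbors, visits):
    # Directed (counter-based): move to the neighbor visited least often so far.
    return min(neighbors, key=lambda t: visits[t])
```

On a 20-state chain the counter-based rule walks straight to the far end in 19 steps, whereas a random walk needs 19 steps only if every one of its moves happens to go right.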
CiteSeer.IST - Copyright Penn State and NEC