S. Thrun, K. Moller, and A. Linden. Adaptive look ahead planning. In Proceedings OEGAI 90, 1990.
....has received considerable attention in the last few years [And86, Bar89, Sut84]. In general, there are two approaches to solving reinforcement learning problems: direct and indirect techniques, each with its own advantages and disadvantages. We present a system that combines both methods [TML91, TML90]. By interaction with an unknown environment, a world model is progressively constructed using the backpropagation algorithm. To optimize actions with respect to future reinforcement, planning is applied in two steps: an experience network proposes a plan, which is subsequently optimized by ....
....Planning has also been used for reinforcement learning. For example, Sutton [Sut90] uses off-line planning to improve the controller without interacting with the world. In this article we present a planning technique which relies on a combination of direct and indirect learning control [TML91, TML90]. A model network, which approximates the behavior of the world, is used for looking ahead into the future and optimizing actions by gradient descent with respect to future reinforcement. In addition, an experience network is trained like a controller but used for accelerating .... (Footnote 1: Even this is a science ....)
[Article contains additional citation context not shown here]
S. Thrun, K. Moller, and A. Linden. Adaptive look-ahead planning. In Proceedings OEGAI 90, 1990.
....to backpropagation is sped up by using many small networks in a modular fashion rather than a single large one.
• Execution speed is enhanced by modularization, and faster (gradient-directed) search in input space becomes possible, as used by the planning method described in [TML90, TML91].
• The problem of determining the number of hidden units is circumvented. The system performs adaptive resource allocation such that difficult parts of a function attract more networks (or hidden units) than easier ones. Destructive constructive extensions can be incorporated ....
S. Thrun, K. Moller, and A. Linden. Adaptive look ahead planning. In G. Dorffner, editor, Konnektionismus in Artificial Intelligence, Springer-Verlag, Berlin, 1990.
S. Thrun, K. Moller, and A. Linden. Adaptive look-ahead planning. In G. Dorffner, editor, Konnektionismus in Artificial Intelligence, Springer-Verlag, Berlin, 1990.
S. Thrun, K. Moller, and A. Linden. Adaptive look ahead planning. In Proceedings OEGAI 90, 1990.
....for Computer Science (GMD), D 5205 St. Augustin, FRG; Knut Moller, University of Bonn, Department of Computer Science, D 5300 Bonn, FRG; Alexander Linden, German National Research Center for Computer Science (GMD), D 5205 St. Augustin, FRG. Abstract: We present a new connectionist planning method [TML90]. By interaction with an unknown environment, a world model is progressively constructed using gradient descent. For deriving optimal actions with respect to future reinforcement, planning is applied in two steps: an experience network proposes a plan which is subsequently optimized by gradient ....
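....is the time of the $s$th action. Thus, for each action ($\forall i, s$) its influence on later activations ($\forall j, \forall \tau \geq s$) of the chain of networks, including all predictions, is measured by

$$\xi^{is}_j(\tau) \equiv \frac{\partial\, \mathrm{activation}_j(\tau)}{\partial\, \mathrm{action}_i(s)}.$$

It has been shown in an earlier paper that this gradient can easily be propagated forward through the network [TML90]:

$$\xi^{is}_j(\tau) = \begin{cases}
\delta_{ij}\,\delta_{\tau s} & \text{if $j$ is an action input unit} \\
0 & \text{if $\tau = 1$ and $j$ is a state/context input unit} \\
\xi^{is}_{j'}(\tau - 1) & \text{if $\tau > 1$ and $j$ is a state/context input unit ($j'$ the corresponding output unit of the preceding model)} \\
\mathrm{logistic}'(\mathrm{net}_j(\tau)) \cdot \sum_{l \in \mathrm{pred}(j)} \mathrm{weight}_{jl}\, \xi^{is}_l(\tau) & \dots
\end{cases}$$

....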
S. Thrun, K. Moller, and A. Linden. Adaptive look-ahead planning. In G. Dorffner, editor, Proceedings KONNAI/OEGAI, Springer, Sept. 1990.
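The forward gradient propagation quoted in the context above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy "model network" with a single logistic output unit, one state input, and one action input, and the names `rollout_with_sensitivity`, `finite_difference`, `w_state`, `w_action`, and `bias` are hypothetical. Time steps are 0-indexed here, so the "first model" case corresponds to index 0.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def rollout_with_sensitivity(state0, actions, w_state, w_action, bias, s):
    """Chain a toy one-unit model network over all actions and track
    xi[tau] = d output(tau) / d actions[s] by forward propagation."""
    state = state0
    xi_state_in = 0.0  # state input of the first model depends on no action
    outputs, xis = [], []
    for tau, action in enumerate(actions):
        # Action input unit: Kronecker delta in time (only one action unit here).
        xi_action_in = 1.0 if tau == s else 0.0
        net = w_state * state + w_action * action + bias
        out = logistic(net)
        # Non-input unit: logistic'(net) times the weighted sum of the
        # predecessors' sensitivities; logistic'(net) = out * (1 - out).
        xi_out = out * (1.0 - out) * (w_state * xi_state_in + w_action * xi_action_in)
        outputs.append(out)
        xis.append(xi_out)
        # The next model's state input copies this model's output unit,
        # so it inherits that unit's sensitivity.
        state, xi_state_in = out, xi_out
    return outputs, xis


def finite_difference(state0, actions, w_state, w_action, bias, s, eps=1e-6):
    """Central-difference estimate of the same derivatives, as a sanity check."""
    plus, minus = list(actions), list(actions)
    plus[s] += eps
    minus[s] -= eps
    out_p, _ = rollout_with_sensitivity(state0, plus, w_state, w_action, bias, s)
    out_m, _ = rollout_with_sensitivity(state0, minus, w_state, w_action, bias, s)
    return [(p - m) / (2.0 * eps) for p, m in zip(out_p, out_m)]


if __name__ == "__main__":
    plan = [0.1, -0.4, 0.7, 0.2]  # arbitrary illustrative actions
    _, xis = rollout_with_sensitivity(0.3, plan, 1.5, -2.0, 0.1, s=1)
    print(xis)
    print(finite_difference(0.3, plan, 1.5, -2.0, 0.1, s=1))
```

The two printed lists should agree closely, and the entry before step `s` is zero because an action cannot influence earlier predictions. A planner in the spirit of the quoted text would combine such sensitivities with the gradient of future reinforcement to adjust each action.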
S. Thrun, K. Moller, and A. Linden. Adaptive look ahead planning. In Proceedings OEGAI 90, 1990.