| Thrun S. B. : The role of exploration in learning control. In D A. Sofge (eds.). Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Florence, Kentucky:Van Nostrand Reinhold (1992). |
....exploration. Since these two strategies, exploitation and exploration, cannot be operated at once, their control has long been an important issue in the control fields [18]. Methods for exploration can roughly be classified into two: undirected exploration methods and directed exploration methods [58]. Undirected exploration methods try to explore the whole state-action space by assigning positive probabilities to all possible actions. For example, semiuniform (ε-greedy) exploration and the Boltzmann exploration [55] are undirected methods. Directed exploration methods use the statistics ....
....forgetting effect of the environmental dynamics, the agent comes to try an action that is not optimal with respect to the current estimation of the value function. We discuss in this paper a new control method of the exploitation-exploration balance. The balance control was also studied by Thrun [58]. Our method is mainly an undirected method, in which the balancing parameter is controlled depending on the current state. Our method also uses an exploration bonus. Usher et al. [60] have suggested that the exploitation-exploration balance in a real brain may be controlled by noradrenergic neurons ....
Thrun, S.B. (1992). The role of exploration in learning control. In Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, Florence, KY: Van Nostrand Reinhold.
....know little about the value of the actions we should be more exploratory, but once we have tried the actions numerous times we should simply exploit the information we have since our value estimates should now be good. Many ways of dealing with this explore-exploit tradeoff have been proposed [110, 41]. The simplest are ad hoc, taking into account only the differences in value between actions (preferring actions with high value). The simplest of these, known as uniform random exploration, involves selecting the action with the highest Q value with some probability 1 − p, and with probability p ....
....the best action is performed. Even without the advantages of reasoning about the learned model (i.e. when counter-based exploration is applied in a model-free learning algorithm) counter-based exploration generally outperforms any of the model-free exploration techniques we have described [6, 110], although convergence is no longer guaranteed. One common counter-based approach, first introduced as part of the Prioritised Sweeping algorithm [79], is optimism in the face of uncertainty. The idea here is that, in the absence of other evidence, any action in any state is assumed to lead us ....
Sebastian B. Thrun. The role of exploration in learning control. In David A. White and Donald A. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, Florence, KY, To appear.
.... of the current vertex to one plus the old u value of the current vertex (meaning that the vertex has been visited another time). Variants of Node Counting have been used in the literature to explore unknown terrain, either on their own [19] or to accelerate reinforcement learning methods [27]. Node Counting is also the foundation of "Avoiding the Past: A Simple but Effective Strategy for Reactive [Robot] Navigation" [2]. It is probably easier to implement ants that use Node Counting than ants that use LRTA because Node Counting always increases the u value of the current vertex by ....
S. Thrun. The role of exploration in learning control. In D. White and D. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, pages 527-559. Van Nostrand Reinhold, 1992.
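The Node Counting update quoted in the context above (set the u value of the current vertex to one plus its old value) can be sketched as follows. The excerpt does not show the movement rule, so this sketch assumes the agent then moves to the neighbor with the smallest u value; the function name `node_counting_walk` and the adjacency-dictionary graph format are hypothetical choices made for illustration.

```python
def node_counting_walk(graph, start, steps):
    """Walk a graph with Node Counting.

    graph: dict mapping each vertex to a list of neighboring vertices
           (assumed format, not taken from the cited work).
    Returns the visited path and the final u values.
    """
    u = {vertex: 0 for vertex in graph}  # visit counters, initially zero
    current = start
    path = [current]
    for _ in range(steps):
        # Update from the excerpt: u(current) := 1 + u(current).
        u[current] += 1
        # Assumed movement rule: go to the least-visited neighbor
        # (ties are broken by neighbor order).
        current = min(graph[current], key=lambda v: u[v])
        path.append(current)
    return path, u


if __name__ == "__main__":
    line_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(node_counting_walk(line_graph, "a", 4))
```

Because the update always adds exactly one, each vertex only needs an integer counter, which is consistent with the excerpt's remark that Node Counting is easy to implement.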
....list at the previous stage. As an alternative to a static selection of examples, we propose an online strategy where examples are generated on the fly while the network is being trained. Methods that implement state space exploration have been suggested in the reinforcement learning literature [41, 40] as practical ways for dynamically sampling training instances. These methods would be too costly for implementing a dynamic training scheme in the present context. We propose the following simplification. During the learning phase, for a given sequence V the learner is asked to follow a ....
....where a state is the undirected graph of a candidate contact map and an action is an edit operator that adds one edge. Various exploration strategies can be implemented in order to investigate the effects of the exploration-exploitation trade-off, which is well known in reinforcement learning [41, 40]. In random exploration, we choose the successor graph randomly with uniform probability. In pure exploitation, we select the successor graph that has maximum score, as predicted by the current network. In semi-uniform exploration (also known as ε-greedy policy) with probability ε we make a ....
S. Thrun. The role of exploration in learning control. In D. White and D. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, Florence, Kentucky, 1992.
....open list of size 3. The open list is filled at each stage with the best 3 candidates, selected from all the possible successors of graphs in the open list at the previous stage. In the dynamic case, ideas for exploring the state space can be borrowed from the reinforcement learning literature [17, 16]. During the learning phase, for a given sequence V the learner is asked to follow a particular trajectory from the null graph to a final graph according to a given policy. A policy maps states to actions where a state is a candidate contact map and an action is an edit operator that adds one edge ....
....details on how to do this are not indicated, because this is what differentiates one exploration strategy from another. In our experiments, we have implemented various strategies, investigating the effect of the exploration-exploitation trade-off, which is well known in reinforcement learning [17, 16]. In random exploration, we choose the successor graph randomly with uniform probability, whereas in pure exploitation the successor is always chosen to be the best possible one (according to the computed score). In semiuniform exploration (also known as ε-greedy policy) with probability ε a ....
S. Thrun. The role of exploration in learning control. In D. White and D. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, Florence, Kentucky, 1992.
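The three successor-selection strategies named in the contexts above (random exploration, pure exploitation, and semiuniform or ε-greedy exploration) differ only in how often a random choice is made. A minimal sketch, assuming the candidate successors have already been scored and are passed as a list of numbers; the function name `epsilon_greedy` and its parameters are hypothetical and not taken from the cited papers.

```python
import random


def epsilon_greedy(scores, epsilon, rng=random):
    """Return the index of the chosen successor.

    scores:  list of predicted scores, one per candidate successor.
    epsilon: probability of picking a successor uniformly at random.
             epsilon = 1 gives random exploration,
             epsilon = 0 gives pure exploitation.
    rng:     random-number source (the random module or a random.Random).
    """
    if rng.random() < epsilon:
        # Exploration step: uniform choice over all candidates.
        return rng.randrange(len(scores))
    # Exploitation step: candidate with the highest predicted score.
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    rng = random.Random(0)
    picks = [epsilon_greedy([0.2, 0.9, 0.5], 0.1, rng) for _ in range(1000)]
    print(picks.count(1) / len(picks))  # fraction of greedy-looking picks
```

Setting `epsilon` between 0 and 1 interpolates between the two extreme strategies.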
....involved computer operated sensors and actuators. One of the reasons that this domain was chosen was that concepts could be represented by relatively simple equations. D.2 Robotic Discovery and Reinforcement Learning The careful choice of trials by ASE Progol is related to directed exploration [50] in robotic discovery and reinforcement learning. Directed exploration involves an important tradeoff between exploration (analogous to experimentation) and exploitation (using previously learned knowledge) in the context of robot navigation and neural network learning, a tradeoff that doesn't ....
S. Thrun. The role of exploration in learning control. In D. White and D. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, Florence, KY, 1992.
....LEARNING to as the exploration. The exploration has the same function as the excitation in system identification, described in chapter 1. The exploration means that sometimes a different action is tried than the action according to the policy. There are different strategies to explore. In [76] a distinction is made between directed and undirected exploration. In case of undirected exploration, the tried actions are truly random and are selected based on a predefined scheme. Undirected exploration does not take information about the system into account but can use experiences gathered ....
S.B. Thrun. The role of exploration in learning control. In Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, 1992.
....compare two RL techniques. Another problem is that the performance of an RL technique may vary with type of exploration strategy used. Accordingly, in this paper we compare the average performance of R-learning and Q-learning over multiple runs using two types of undirected exploration strategies [19], semi-uniform exploration and Boltzmann exploration, and one recency-based directed exploration method [15]. Computing averaged performance improvement plots does not reveal the actual variation across runs. For this reason, we will show the variation across runs in this paper. Also, we deviate ....
....payoffs (of 1) when the robot actually pushed a box. We also measured how well the robot learned each separate behavior, which turned out to be very useful in understanding differences between the two RL techniques. 4.1 Different Types of Exploration Strategies Following Thrun's terminology [19], we divide exploration methods into undirected and directed methods. Undirected exploration methods do not use the results of learning to guide exploration; they merely select a random action some of the time. Two popular undirected exploration methods are Boltzmann exploration, which we study in ....
[Article contains additional citation context not shown here]
S. Thrun. The role of exploration in learning control. In D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand Reinhold. To Appear.
....is specified by one or more numerical values for a control panel, and the result consists of one or more numerical readings. Robotic Discovery and Reinforcement Learning The selection of experiments arises in robotic discovery and reinforcement learning through the notion of directed exploration [50]. Thrun identifies three strategies for directed exploration [50]: counter-based: investigate less explored regions in state space; recency-based: investigate less recently explored regions; error-based: investigate regions in state space estimated to have high error. All three of these strategies ....
....and the result consists of one or more numerical readings. Robotic Discovery and Reinforcement Learning The selection of experiments arises in robotic discovery and reinforcement learning through the notion of directed exploration [50]. Thrun identifies three strategies for directed exploration [50]: counter-based: investigate less explored regions in state space; recency-based: investigate less recently explored regions; error-based: investigate regions in state space estimated to have high error. All three of these strategies to exploration are closely linked with Cohn, Atlas, and Ladner's ....
[Article contains additional citation context not shown here]
S. Thrun. The role of exploration in learning control. In D. White and D. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, Florence, KY, 1992.
....have to use the policy to control action. But in doing so, we are in danger of violating the requirement that every situation-action pair be tried infinitely often. This is an instance of the exploration versus exploitation trade-off; it has been treated extensively elsewhere [Kaelbling, 1993a, Thrun, 1992]. In the interest of simplicity in this paper, we generate actions probabilistically based on the Q values using a Boltzmann distribution. Given a situation s, we choose action a with probability $e^{Q(a,s)/T} / \sum_{a' \in A} e^{Q(a',s)/T}$. This serves to choose actions whose values are much better than ....
Thrun, Sebastian B. 1992. The role of exploration in learning control. In White, David A. and Sofge, Donald A., editors 1992, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand Reinhold, New York.
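The Boltzmann action selection described in the context above chooses action a in situation s with probability proportional to exp(Q(a, s)/T). A minimal sketch, assuming the Q values for the current situation are given as a list and T is a positive temperature; the function names are hypothetical, and subtracting the maximum Q value before exponentiating is a standard numerical-stability step that leaves the probabilities unchanged.

```python
import math
import random


def boltzmann_probabilities(q_values, temperature):
    """Return exp(Q/T) / sum(exp(Q/T)) for each action."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    q_max = max(q_values)
    # Shifting by q_max avoids overflow; the shift cancels in the ratio.
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_action(q_values, temperature, rng=random):
    """Sample an action index according to the Boltzmann distribution."""
    probs = boltzmann_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


if __name__ == "__main__":
    print(boltzmann_probabilities([1.0, 2.0, 3.0], temperature=1.0))
    print(boltzmann_action([1.0, 2.0, 3.0], temperature=0.5))
```

A high temperature makes the choice close to uniform, while a low temperature concentrates the probability on the actions with the largest Q values.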
....rule. The system then randomly selects a set of new examples from each of the three classes, presents them to the User for tagging and finally adds them to the training set. A.6 Robotic Discovery and Reinforcement Learning The careful choice of trials in CLML is related to directed exploration [56] in robotic discovery and reinforcement learning. The strategies for directed exploration [56] are closely linked with uncertainty sampling (see Section A.5). Directed exploration involves an important tradeoff between exploration (analogous to experimentation) and exploitation (using previously ....
....presents them to the User for tagging and finally adds them to the training set. A.6 Robotic Discovery and Reinforcement Learning The careful choice of trials in CLML is related to directed exploration [56] in robotic discovery and reinforcement learning. The strategies for directed exploration [56] are closely linked with uncertainty sampling (see Section A.5). Directed exploration involves an important tradeoff between exploration (analogous to experimentation) and exploitation (using previously learned knowledge) in the context of robot navigation and neural network learning, a tradeoff ....
S. Thrun. The role of exploration in learning control. In D. White and D. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, Florence, KY, 1992.
....of the forward connections. And since the hidden nodes do not correspond to any previously identified features, it is impossible to include pre-wired network structure to interpret them. Nonetheless, there is one way to attempt to reverse such mappings, called backpropagation through the inputs (Thrun 1992). Bailey (1992) attempted to use this technique to model the mapping from a spatial term to a mental image of its prototypical case, in the context of Regier's (1996) system. The technique involves applying a random input vector (i.e. image) to the trained network, computing the corresponding ....
Thrun, Sebastian B. 1992. The role of exploration in learning control. In Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, ed. by D. White & D. Sofge. Van Nostrand Reinhold.
....and this new state of the environment in turn determines what the student may observe. There are long time dependencies between action and observation and an observation may depend on many previous observations. For strategies of how to select actions in a directed manner in this context, see [20, 21] and the references therein. 3.1.4 Other Related Approaches A different kind of queries is considered by Ratsaby [22]. He allows queries for additional examples from a selected pattern class. Heskes [23] presents a learning algorithm for perceptron learning from an unreliable teacher that is ....
S. B. Thrun. The role of exploration in learning control. In D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, pages 527-559. Van Nostrand Reinhold, Florence, Kentucky, 1992.
....utility-based interleaved exploration is nearly always preferable to a priori exploration. Furthermore, we investigated the contribution of the utility-based formulation, by comparing it to a randomized approach to interleaved exploration (similar to methods commonly used in reinforcement learning [30, 22]). We found that for a single repeated task in our model, utility-based interleaved exploration consistently outperforms the alternatives, unless the number of task repetitions is very high. In addition, we extended the algorithms for the case of multiple repeated tasks, where the agent has a ....
....Learning While Performing tasks We also compared our original algorithm with a simpler algorithm that explores randomly while performing its tasks. This randomized learning while performing (RLWP) algorithm is modeled on probabilistic exploration methods used in reinforcement learning (see, e.g., [30]). The algorithm generally follows the shortest known path towards its current goal, but if the current node has any unknown neighboring edges, with exploration probability p_e RLWP attempts to traverse the best such unknown edge according to the heuristic function used by LBP (see above). In our ....
S. Thrun. The role of exploration in learning control. In Handbook for Intelligent Control: Neural, Fuzzy and Adaptive Approaches, Florence, Kentucky 41022, 1992. Van Nostrand Reinhold.
....method attempts to optimize resources (i.e. memory and time to manage data) by collecting only significant experiences; that is, those experiences that actually improve the model accuracy. • Exploration: In order to optimize the learning time, we use an active learning approach (e.g. [12], [13]). An active learner is one that uses its current knowledge to drive the generation of training data in order to maximize the gain of information in the least possible time. This active behavior is obtained by making the robot always explore the least known region of the environment. The ....
....smallest possible number of exploration steps. This active behavior is obtained by making the robot always explore the least known region of the environment. Active exploration is based on estimating the exploration utility U_e of each existing partition and selecting the one which maximizes it [13], [22]. This heuristic real-valued function measures how much a partition is worth exploring. To define U_e, we use a technique called counter-based exploration with decay [13]. It gives higher utility to partitions that have been visited less often and less recently. A counter c(p) keeps ....
[Article contains additional citation context not shown here]
S.B. Thrun, "The role of exploration in learning control," in Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, D.A. White and D.A. Sofge, Eds. Van Nostrand Reinhold, 1992.
....carefully in order to obtain an accurate map with the smallest possible number of training data. This selection is based on active learning techniques [1]. The intelligent choice of training examples has been shown to reduce the computational complexity of the learning process drastically [11]. Exploration is based on estimating the exploration utility U_explor of each existing partition and selecting the one which maximizes it. This heuristic real-valued function measures how much a partition is worth exploring. Different exploration strategies utilize different functions U ....
....selecting the one which maximizes it. This heuristic real-valued function measures how much a partition is worth exploring. Different exploration strategies utilize different functions U_explor. In our current implementation we use the technique called counter-based exploration with decay [11]. It gives higher utilities to partitions that have been visited less often and less recently. For each partition p ∈ P, a counter c(p) keeps track of the number of occurrences (i.e. how many times it has been visited). In order to take also into account when a partition occurred, the counter ....
S.B. Thrun. "The role of exploration in learning control ". In D. A. White and D. A. Sofge (eds.), Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, Van Nostrand Reinhold, 1992.
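The contexts above describe counter-based exploration with decay only in outline: each partition p has a counter c(p), and partitions visited less often and less recently receive a higher exploration utility. The excerpt is cut off before the decay rule, so the sketch below fills it in with assumptions that are not taken from the source: every counter is multiplied by a factor `decay` in (0, 1) at each step, the visited partition's counter is then incremented, and the utility is 1 / (1 + c(p)). The class and method names are hypothetical.

```python
class DecayingCounters:
    """Counter-based exploration with decay (illustrative sketch only)."""

    def __init__(self, partitions, decay=0.9):
        self.decay = decay  # assumed multiplicative decay per step
        self.counts = {p: 0.0 for p in partitions}

    def visit(self, partition):
        # Assumed rule: old visits fade, the current visit counts fully.
        for p in self.counts:
            self.counts[p] *= self.decay
        self.counts[partition] += 1.0

    def utility(self, partition):
        # Assumed utility: larger for rarely or long-ago visited partitions.
        return 1.0 / (1.0 + self.counts[partition])

    def next_target(self):
        # Select the partition with the highest exploration utility.
        return max(self.counts, key=self.utility)


if __name__ == "__main__":
    counters = DecayingCounters(["a", "b", "c"], decay=0.5)
    for p in ["a", "a", "b"]:
        counters.visit(p)
    print(counters.next_target())  # "c" has never been visited
```

With this rule a partition visited many times long ago can again become the preferred target, which matches the "less often and less recently" behavior the excerpt describes.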
....between exploration and exploitation is one of the major issues in learning control, and systematic methods for addressing it are sorely required. In reinforcement learning, various approaches to addressing the tradeoff between exploration and exploitation have been studied empirically (see Thrun, 1992, for a review). One popular technique is to be optimistic in parts of the state space that have never been explored, or, if the environment can change over time, have not been explored recently (Sutton, 1990; Moore & Atkeson, 1993; Christiansen, Mason & Mitchell, 1991). Sutton's DYNA system does ....
Thrun, SB (1992). The role of exploration in learning control. In DA White & DA Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. New York, NY: Van Nostrand Reinhold.
....Kaelbling, Littman, Moore 2.2 Ad Hoc Techniques In reinforcement learning practice, some simple, ad hoc strategies have been popular. They are rarely, if ever, the best choice for the models of optimality we have used, but they may be viewed as reasonable, computationally tractable, heuristics. Thrun (1992) has surveyed a variety of these techniques. 2.2.1 Greedy Strategies The first strategy that comes to mind is to always choose the action with the highest estimated payoff. The flaw is that early unlucky sampling might indicate that the best action's reward is less than the reward obtained from a ....
Thrun, S. B. (1992). The role of exploration in learning control. In White, D. A., & Sofge, D. A. (Eds.), Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand Reinhold, New York, NY.
....is universally favored. Some of the approaches for which rigorous theoretical results are available are reviewed by Kumar [39], and a variety of more heuristic approaches have been studied by Barto and Singh [6], Kaelbling [34], Moore [53], Schmidhuber [61], Sutton [67], Watkins [79], Thrun [77], and Thrun and Moller [76]. In the following subsections, we describe several non-Bayesian methods for solving Markovian decision problems with incomplete information. Although these methods can form the basis of algorithms that can be proved to converge to optimal policies, we do not describe ....
....the controller tends to visit regions of the state space where the confidence is low so as to improve the model for these regions. This strategy produces exploration that aids identification but can conflict with control. Kaelbling [34], Lin [46], Moore [53], Schmidhuber [61], Sutton [67], Thrun [77], and Thrun and Moller [76] discuss these and other possibilities. 7.3 Q-Learning Q-Learning is a method proposed by Watkins [79] for solving Markovian decision problems with incomplete information. 14 Unlike the indirect adaptive methods discussed above, it is a direct method because it does ....
S.B. Thrun. The role of exploration in learning control. In D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, pages 527--559. Van Nostrand Reinhold, New York, 1992.
....of the missing information against the cost of acquiring it. Moreover, model completeness cannot be guaranteed, unless special assumptions are made. The issue of how much experience is necessary is related to the issue of exploration-exploitation in the reinforcement learning literature; see Thrun (1992) for a lucid review of the techniques proposed to address this issue. The number of epochs required for sampling dynamics is an important parameter; it is an upper bound to total number of models in M(i). Its specification can be made within the framework of dynamical systems theory, where the ....
Thrun, S. (1992). The role of exploration in learning control. In D. White and D. Sofge (Eds.), Handbook of intelligent control: Neural fuzzy, and adaptive approaches, 527-559. New York: Van Nostrand Reinhold.
....of implicit exploration. The performance of the best Q-learner was easily overtaken by even the ARTDP algorithm, whose computational requirements were comparable to that of Q-learning. We have also tested another exploration strategy which Thrun found the best among several undirected methods 10 [27]. These runs reinforced our previous findings that estimating a model (i.e. running ADP or ARTDP instead of Q-learning) could reduce the regret rate by as much as 50%. 4 Related Work There are two main research tracks that influenced our work. The first was the introduction of features in RL. ....
S. Thrun. The role of exploration in learning control. Van Nostrand Reinhold, Florence KY, 1992.
....is typically modeled as a Markov decision process (MDP) and it is assumed that the agent does not know the parameters of this process, but has to learn how to act directly from experience. Thus, the reinforcement learning agent faces a fundamental trade-off between exploitation and exploration (Thrun, 1992; Sutton & Barto, 1998): should the agent exploit its cumulative experience so far, by executing the action that currently seems best, or should it execute a different action, with the hope of gaining information or experience that could lead to higher future payoffs? Too little exploration can ....
Thrun, S. B. (1992). The role of exploration in learning control. In White, D. A., & Sofge, D. A. (Eds.), Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Florence, Kentucky 41022: Van Nostrand Reinhold.
....LRTA* (Learning RTA*) [K90] are two variations on the famous A* heuristic search, with a cost function that takes the current searcher's location into account, thus making the algorithm more realistic for field applications like robotics. The worst case cover time by LRTA* is O(mn). In [T92], [T92a] a counter-based exploration method is described in which a counter holds the number of visits to each vertex so far. The cover time by this method is bounded by O(n^2 d) where d is the diameter of the graph. In the Nearest Neighbor Approach (NNA) method of [KS96] a graph is learned by ....
....vertex degree Δ. For randomized algorithms the expected bound is given.

method | reference | cover time
depth first search | [T95] | 2m
random walk | [AKLLR79] | O(mn) exp.
k random walkers | [BKRU94] | O(mn/k) exp.
semi random | [GA90] | 1.5m (exp.)
learning real time A* | [K90] | O(mn)
counter based | [T92] | O(n^2 d)
nearest neighbor | [KS96] | O(m log n)
k edge ant walkers | [WLB96a] | O(Δ n^2 / k)
vertex ant walk | [WLB98] | O(nd)

All the algorithms and cover times described so far are suitable for static graphs, that is graphs which do not change during the execution of the algorithm. If the graph ....
S. Thrun, "The Role of Exploration in Learning Control," In Handbook for Intelligent Control: Neural, Fuzzy and Adaptive Approaches, Van Nostrand Reinhold, Florence, Kentucky 41022, 1992.
....selects the action with the highest Q value) however, usually causes some loss of immediate reinforcement intake. The animat faces the problem of spending as little time as possible on exploration without affecting its ability to find optimal policies. Previous work. Schmidhuber (1991a) and Thrun (1992) present comparisons between directed and undirected exploration methods. Directed exploration methods use special exploration specific knowledge to guide the search through alternative policies. Undirected exploration methods use randomized action selection methods to guess useful experiences. ....
....use special exploration-specific knowledge to guide the search through alternative policies. Undirected exploration methods use randomized action selection methods to guess useful experiences. Previous research has shown significant benefits of using directed exploration (e.g. Schmidhuber 1991a; Thrun 1992; Storck, Hochreiter and Schmidhuber, 1995; Wilson 1996; Schmidhuber 1997). Koenig and Simmons (1996) show how undirected exploration techniques can be improved by using the action penalty rule which makes unexplored actions look more promising; this decreases the advantage of directed ....
[Article contains additional citation context not shown here]
Thrun, S. B. (1992). The role of exploration in learning control.
....Robotic Discovery and Reinforcement Learning Because Closed Loop learning involves the use of robots to perform the experiments, Closed Loop learning is related to robotic discovery and reinforcement learning. More specifically, the careful choice of experiments is related to directed exploration [Thrun, 1992]. Thrun identifies the following strategies for directed exploration [Thrun, 1992]: • counter-based: investigate less explored regions in state space 1 The consistency problem for a representation R is to answer the following question, given any particular set of labeled examples: does R ....
....the use of robots to perform the experiments, Closed Loop learning is related to robotic discovery and reinforcement learning. More specifically, the careful choice of experiments is related to directed exploration [Thrun, 1992]. Thrun identifies the following strategies for directed exploration [Thrun, 1992]: • counter-based: investigate less explored regions in state space 1 The consistency problem for a representation R is to answer the following question, given any particular set of labeled examples: does R contain a hypothesis that is consistent with the labeled examples? • ....
S. Thrun. The role of exploration in learning control. In D. White and D. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, Florence, KY, 1992.
....[Figure 2: The gridworld domain.] ....the experiments detailed in this paper, the ε-greedy exploration algorithm (Thrun, 1992) was used, with ε = 0.1 (i.e. at each step a random exploratory action is chosen with probability 1 in 10). The result of the executed action, in terms of changes in the high level state description, is used by the Reinforcement Schema to determine the reinforcement feedback, r, to provide to ....
Thrun, S. B. (1992). The role of exploration in learning control. In White, D., & Sofge, D. (Eds.), Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold.
....the prediction of the value for the action is compared to the real evaluation the agent receives; this difference constitutes the prediction error, which is stored now and used in subsequent timesteps to fill the action's bucket. This algorithm combines recency-based and error-based properties (see [Thrun, 1992]) and has the advantage that it can be used for online learning. Many other exploration methods lack this last property since they depend on a decreasing influence of randomness over time. 4 Results An experiment lasting 2000 pursuit games was carried out. Initially, before the learning agent ....
Thrun, S. (1992). The role of exploration in learning control. Van Nostrand Reinhold.
....learns slightly better routing policies than Q-Routing at low and medium loads and sustains higher loads than Q-Routing. Optimal shows the performance of an optimal router based on global information about the state of the network. speed of learning, as can be seen in figures 3, 4, and 5 (cf. [Thrun (1992)]). 2. Backward exploration is more accurate than forward exploration. In backward exploration, information about the path already traversed is propagated, instead of estimates of the remaining path. At low load levels (such as shown in figure 3) DRQ-Routing is almost three times faster than ....
Thrun, S. B., 1992, "The Role of Exploration in Learning Control," In White, D. A., Sofge, D. A., editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand Reinhold, New York.
....learning automata from data sequences and the applied area of map acquisition for robot navigation. In the following we provide a survey of results from both of these areas. The reinforcement learning literature also addresses some aspects of learning models for Markov decision processes [Sut90, Thr92, Kae93]. The latter can be viewed as a special case of learning probabilistic automata with fully observable states, and we briefly review related work from this domain in Section 2.1.3. 2.1 Learning Automata from Data Informally speaking, an automaton consists of a set of states, and a set of ....
....by Akaike is described in a book by Sakamoto et al. [SIK86]. In Section 6.2 we suggest another possible heuristic for estimating the number of states as part of an initialization algorithm. 2.1.3 Models for Markov Decision Processes Much of the work on reinforcement learning [Kae93, Sut90, Thr92, BBS95, MB98] is concerned with acting optimally within the context of fully observable Markov decision processes. The Markov model consists of states and actions that transition an agent from one state to the other, where every such transition has associated with it a reward. The goal of the ....
S. Thrun, The Role of Exploration in Learning Control, in Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, D. A. White and D. A. Sofge, eds., Van Nostrand Reinhold, Florence, Kentucky, 1992.
....exploitation to the exclusion of exploration, are likely to find themselves trapped in suboptimal behavior. The exploration vs. exploitation problem has received significant attention within RL research. Several methods were developed for incorporating exploration strategies into RL agents [14]. Undirected methods are based on incorporating randomness into the decision procedure by assigning each action a positive probability of being selected and performed. Directed methods use the interaction history to evaluate the expected contribution of every action to exploration. In previous ....
S. B. Thrun. The role of exploration in learning control. In D. A. White and D. Sofge, editors, Handbook for Intelligent Control. Multiscience Press Inc., 1992.
....neighbor. In adaptive routing, however, the routing decisions of a node for a particular destination might be different at different times. The adaptation is based on exploration where routing information at each node is spread to other nodes over the network as the packets are transmitted (see (Thrun 1992) for the role of exploration in learning control). Individual nodes exploit this information by updating their routing policies. 2.1 Shortest Path Routing In Shortest Path Routing (Floyd 1962; Ravindra K. Ahuja and Tarjan 1988; Thomas H. Cormen and Rivest 1990) for any pair (s, d) of source ....
Thrun, S. B. (1992). The role of exploration in learning control. In White, D. A., and Sofge, D. A., editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. New York: Van Nostrand Reinhold.
....of the forward connections. And since the hidden nodes do not correspond to any previously identified features, it is impossible to include pre-wired network structure to interpret them. Nonetheless, there is one way to attempt to reverse such mappings, called backpropagation through the inputs (Thrun 1992). Bailey (1992) attempted to use this technique to model the mapping from a spatial term to a mental image of its prototypical case, in the context of Regier's (1996) system. The technique involves applying a random input vector (i.e. image) to the trained network, computing the corresponding ....
Thrun, Sebastian B. 1992. The role of exploration in learning control. In Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, ed. by D. White & D. Sofge. Van Nostrand Reinhold.
....Khepera, an autonomous mobile robot with a very restricted view, is to explore its unknown surroundings. That is, a series of local information collected by the robot moving around its environment has to be accumulated to form a global map. This problem is well known in robotics, see e.g. [2, 3, 4]. There are two major aspects suggesting the integration of fuzziness into the system: 1) Computing the actual position w.r.t. Khepera's former movements introduces uncertainty due to dynamics, which cannot be incorporated into the kinematical model. That is, after some time the robot cannot be ....
S.B. Thrun, "The role of exploration in learning control", Technical Report CMU-CS-92-102, Carnegie Mellon University, 1992.
....for the RTQL 3) indicated that the difference between the performances of the RTQL 3 and ADP is indeed significant at the level p = 0.05 (Student's t-test was applied for testing this). We have also tested another exploration strategy which Thrun found the best among several undirected methods 6 [21]. These runs reinforced our previous findings that estimating a model (i.e. running ADP or ARTDP instead of Q-learning) could reduce the regret rate by as much as 40%. 4 Related Work There are two main research tracks that influenced our work. The first was the introduction of features in RL. ....
S. Thrun. The role of exploration in learning control. Van Nostrand Reinhold, Florence KY, 1992.
....number of states. Nguyen and Widrow used neural networks to learn to reverse a simulated articulated lorry into a loading bay (Nguyen and Widrow 1989, Nguyen and Widrow 1990). Thrun uses a robot navigation task to illustrate the trade-off between exploration and exploitation in machine learning (Thrun 1992). Tham and Prager have approached controlling a simulated robot arm as a reinforcement learning task (Tham and Prager 1992). The arm is penalized for collisions by an amount proportional to their velocity. Tham and Prager use a neural network known as a Cerebellar Model Articulation Controller, or ....
Thrun, S. B. (1992). The role of exploration in learning control, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, Van Nostrand Reinhold, Florence, Kentucky 41022.
No context found.
S. Thrun. The role of exploration in learning control. In D. A. White and D. A. Sofge, editors, Handbook of intelligent control: neural, fuzzy and adaptive approaches. Van Nostrand Reinhold, Florence, Kentucky 41022, 1992.
....[Bachrach and Mozer, 1991], [Chrisman, 1992], [Lin and Mitchell, 1992], [Mozer and Bachrach, 1989], [Rivest and Schapire, 1987], [Tan, 1991], [Whitehead and Ballard, 1991] for approaches to learning with incomplete perception. s See for example [Barto et al., 1989], [Jordan, 1989], [Munro, 1987], [Thrun, 1992] for more approaches to learning action models with neural networks. agent begins at the initial state s and performs the action sequence a, a2, a. After action a it receives the reward r = 92.3 which, in this example, indicates that the action sequence was considerably successful. At a first ....
Sebastian Thrun. The role of exploration in learning control. In David A. White and Donald A. Sofge, editors, Handbook of intelligent control: neural, fuzzy and adaptive approaches. Van Nostrand Reinhold, Florence, Kentucky 41022, 1992.
....tabula rasa learning methods would result in collisions in any new, unknown environment before learning to avoid them. COLUMBUS' top-level goal is efficient exploration. The approach taken in this paper is motivated by earlier research on exploration in the context of reinforcement learning [Thrun, 1992b]. Theoretical results on the efficiency of exploration indicate the importance of the exploration strategy for the amount of knowledge gained, and for the efficiency of learning control in general. It has been shown that for certain hard deterministic environments, that an autonomous robot can ....
Sebastian B. Thrun. The role of exploration in learning control. In David A. White and Donald A. Sofge, editors, Handbook of intelligent control: neural, fuzzy and adaptive approaches, Florence, Kentucky 41022, 1992. Van Nostrand Reinhold.
....optimal actions that maximize reward over time. Given some state, s, that the agent finds itself in, it computes its control action a by considering its available actions to determine which of them produces the highest $Q_i$ value: $a = \arg\max_{a \in A} Q_i(s, a)$. Since $Q_i(s, a)$ measures the future cumulative reward, a is the optimal action if $Q_i(s, a)$ is correct. (Footnote 4: See for example [Barto et al., 1989], [Jordan, 1989], [Munro, 1987], [Thrun, 1992] for more approaches to learning action models with neural networks.) The problem of learning a policy ....
Sebastian B. Thrun. The role of exploration in learning control. In David A. White and Donald A. Sofge, editors, Handbook of intelligent control: neural, fuzzy and adaptive approaches, Florence, Kentucky 41022, 1992. Van Nostrand Reinhold.
....Many global approaches to function fitting (e.g. with a single monolithic neural network) in mobile robot domains either fail in complex environments due to effects like un relearning, or demand many well distributed training examples and have often been tested in simplified simulations only [2, 9, 15, 18]. Modularized, local and instance-based approaches to function approximation have often been reported to generalize better from fewer or ill distributed examples [1, 5, 6, 7, 10, 12]. Instance-based curve fitting techniques have also been applied successfully to fairly complex robot control ....
....rasa learning methods will cause collisions in any new, unknown environment, before the robot eventually learns to avoid them. COLUMBUS' top-level goal is efficient exploration. The approach taken in this paper is motivated by earlier research on exploration in the context of reinforcement learning [18]. Theoretical results on the efficiency of exploration indicate the importance of the exploration strategy for the amount of knowledge gained, and for the efficiency of learning control in general. It has been shown that for certain hard deterministic environments, that an autonomous robot can ....
Sebastian B. Thrun. The role of exploration in learning control. In David A. White and Donald A. Sofge, editors, Handbook of intelligent control: neural, fuzzy and adaptive approaches, Florence, Kentucky 41022, 1992. Van Nostrand Reinhold.
....greatly reduced by actively interacting with the environment. For example, choosing the right action during exploration can reduce exponential complexity to low-degree polynomial complexity, as for example shown in Koenig's and Thrun's work on exploration in heuristic search and learning control [10, 19]. Similarly, active vision (see e.g. [1]) has also led to results superior to passive approaches to computer vision. In the context of mobile robot localization, actively controlling a robot pays off whenever the environment possesses relatively few features that enable a robot to unambiguously ....
S. Thrun. The role of exploration in learning control. In D. A. White and D. A. Sofge, editors, Handbook of intelligent control: neural, fuzzy and adaptive approaches. Van Nostrand Reinhold, Florence, Kentucky 41022, 1992.
....either expressed or implied, of Siemens Corp. with a single monolithic neural network) in mobile robot domains either fail in complex environments due to effects like un relearning, or demand many well distributed training examples and have often been tested in simplified simulations only [2, 9, 15, 18]. Modularized, local and instance-based approaches to function approximation have often been reported to generalize better from fewer or ill distributed examples [1, 5, 6, 7, 10, 12]. Instance-based curve fitting techniques have also been applied successfully to fairly complex robot control ....
....rasa learning methods will cause collisions in any new, unknown environment, before the robot eventually learns to avoid them. COLUMBUS' top-level goal is efficient exploration. The approach taken in this paper is motivated by earlier research on exploration in the context of reinforcement learning [18]. Theoretical results on the efficiency of exploration indicate the importance of the exploration strategy for the amount of knowledge gained, and for the efficiency of learning control in general. It has been shown that for certain hard deterministic environments, that an autonomous robot can ....
Sebastian B. Thrun. The role of exploration in learning control. In David A. White and Donald A. Sofge, editors, Handbook of intelligent control: neural, fuzzy and adaptive approaches, Florence, Kentucky 41022, 1992. Van Nostrand Reinhold.
No context found.
Thrun S. B. : The role of exploration in learning control. In D A. Sofge (eds.). Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Florence, Kentucky:Van Nostrand Reinhold (1992).
No context found.
Thrun, S. B. 1992. The role of exploration in learning control. In White, D. A., and Sofge, D. A., eds., Handbook for Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold. 527--559.
No context found.
Sebastian B. Thrun. The role of exploration in learning control. In D. White and D. Sofge, editors, Handbook for Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, Florence, Kentucky 41022, 1992.
No context found.
S.B. Thrun. The Role of Exploration in Learning Control. In D.A. White and D.A. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, Florence, Kentucky, 1992. Van Nostrand Reinhold.
No context found.
Sebastian B. Thrun. The role of exploration in learning control. In D. White and D. Sofge, editors, Handbook for Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, Florence, Kentucky 41022, 1992.
No context found.
S. Thrun. The role of exploration in learning control. In Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand Reinhold, 1994.
No context found.
Thrun, S. (1992b). The role of exploration in learning control. In D. A. White and D. A. Sofge (Eds.), Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. NY: Van Nostrand Reinhold.
No context found.
Thrun, S. 1994. The Role of Exploration in Learning Control. In Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, Van Nostrand Reinhold.