80 citations found.
Thrun S. B.: The role of exploration in learning control. In D. A. Sofge (eds.). Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Florence, Kentucky: Van Nostrand Reinhold (1992).

This paper is cited in the following contexts:

Control of Exploitation-Exploration Meta-Parameter in.. - Ishii, Yoshida.. (2002)   (1 citation)  (Correct)

....exploration. Since these two strategies, exploitation and exploration, cannot be operated at once, their control has long been an important issue in the control fields [18]. Methods for exploration can roughly be classified into two: undirected exploration methods and directed exploration methods [58]. Undirected exploration methods try to explore the whole state-action space by assigning positive probabilities to all possible actions. For example, semiuniform (ε-greedy) exploration and the Boltzmann exploration [55] are undirected methods. Directed exploration methods use the statistics ....

....forgetting effect of the environmental dynamics, the agent comes to try an action that is not optimal with respect to the current estimation of the value function. We discuss in this paper a new control method of the exploitation-exploration balance. The balance control was also studied by Thrun [58]. Our method is mainly an undirected method, in which the balancing parameter is controlled depending on the current state. Our method also uses exploration bonus. Usher et al. [60] has suggested that the exploitation-exploration balance in a real brain may be controlled by noradrenergic neurons ....

Thrun, S.B. (1992). The role of exploration in learning control. In Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, Florence, KY: Van Nostrand Reinhold.


Learning and Planning in Structured Worlds - Dearden   (Correct)

....know little about the value of the actions we should be more exploratory, but once we have tried the actions numerous times we should simply exploit the information we have since our value estimates should now be good. Many ways of dealing with this explore-exploit tradeoff have been proposed [110, 41]. The simplest are ad hoc, taking into account only the differences in value between actions (preferring actions with high value). The simplest of these, known as uniform random exploration, involves selecting the action with the highest Q value with some probability 1 - p, and with probability p ....

....the best action is performed. Even without the advantages of reasoning about the learned model (i.e. when counter-based exploration is applied in a model-free learning algorithm) counter-based exploration generally outperforms any of the model-free exploration techniques we have described [6, 110], although convergence is no longer guaranteed. One common counter-based approach, first introduced as part of the Prioritised Sweeping algorithm [79], is optimism in the face of uncertainty. The idea here is that, in the absence of other evidence, any action in any state is assumed to lead us ....

Sebastian B. Thrun. The role of exploration in learning control. In David A. White and Donald A. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, Florence, KY, To appear.


Efficient and Inefficient Ant Coverage Methods - Koenig, Szymanski, Liu   (3 citations)  (Correct)

....of the current vertex to one plus the old u value of the current vertex (meaning that the vertex has been visited another time). Variants of Node Counting have been used in the literature to explore unknown terrain, either on their own [19] or to accelerate reinforcement learning methods [27]. Node Counting is also the foundation of Avoiding the Past: A Simple but Effective Strategy for Reactive [Robot] Navigation [2]. It is probably easier to implement ants that use Node Counting than ants that use LRTA because Node Counting always increases the u value of the current vertex by ....

S. Thrun. The role of exploration in learning control. In D. White and D. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, pages 527-559. Van Nostrand Reinhold, 1992.


New Machine Learning Methods for the Prediction Of.. - Baldi, Pollastri.. (2002)   (Correct)

....list at the previous stage. As an alternative to a static selection of examples, we propose an online strategy where examples are generated on the fly while the network is being trained. Methods that implement state space exploration have been suggested in the reinforcement learning literature [41, 40] as practical ways for dynamically sampling training instances. These methods would be too costly for implementing a dynamic training scheme in the present context. We propose the following simplification. During the learning phase, for a given sequence V the learner is asked to follow a ....

....where a state is the undirected graph of a candidate contact map and an action is an edit operator that adds one edge. Various exploration strategies can be implemented in order to investigate the effects of the exploration-exploitation trade-off, which is well known in reinforcement learning [41, 40]. In random exploration, we choose the successor graph randomly with uniform probability. In pure exploitation, we select the successor graph that has maximum score, as predicted by the current network. In semi-uniform exploration (also known as ε-greedy policy) with probability ε we make a ....

S. Thrun. The role of exploration in learning control. In D. White and D. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, Florence, Kentucky, 1992.


A Bi-Recursive Neural Network Architecture for the Prediction .. - Vullo, Frasconi   (Correct)

....open list of size 3. The open list is filled at each stage with the best 3 candidates, selected from all the possible successors of graphs in the open list at the previous stage. In the dynamic case, ideas for exploring the state space can be borrowed from the reinforcement learning literature [17, 16]. During the learning phase, for a given sequence V the learner is asked to follow a particular trajectory from the null graph to a final graph according to a given policy. A policy maps states to actions where a state is a candidate contact map and an action is an edit operator that adds one edge ....

....details on how to do this are not indicated, because this is what differentiates one exploration strategy from another. In our experiments, we have implemented various strategies, investigating the effect of the exploration-exploitation trade-off, which is well known in reinforcement learning [17, 16]. In random exploration, we choose the successor graph randomly with uniform probability, whereas in pure exploitation the successor is always chosen to be the best possible one (according to the computed score). In semiuniform exploration (also known as ε-greedy policy) with probability ε a ....

S. Thrun. The role of exploration in learning control. In D. White and D. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, Florence, Kentucky, 1992.
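
The three strategies named in the two excerpts above (random exploration, pure exploitation, semi-uniform exploration) can be sketched as one selection rule. Both excerpts are cut off before the semi-uniform case is fully defined, so this sketch assumes the usual ε-greedy reading: with probability ε pick a successor uniformly at random, otherwise pick the highest-scoring one. The function name `semi_uniform_choice` and the list-of-scores interface are hypothetical and not taken from the cited papers.

```python
import random


def semi_uniform_choice(scores, epsilon, rng=random):
    """Return the index of the chosen successor.

    scores  -- predicted score of each candidate successor (assumed interface)
    epsilon -- probability of a uniformly random choice, in [0, 1]
    rng     -- source of randomness; any object with random() and randrange()
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    # Explore: with probability epsilon, ignore the scores entirely.
    if rng.random() < epsilon:
        return rng.randrange(len(scores))
    # Exploit: otherwise take the successor with the maximum score.
    return max(range(len(scores)), key=lambda i: scores[i])
```

Under this reading, `epsilon=1.0` reduces to random exploration and `epsilon=0.0` to pure exploitation.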


Combining Inductive Logic Programming, Active Learning and.. - Muggleton, al.   (Correct)

....involved computer operated sensors and actuators. One of the reasons that this domain was chosen was that concepts could be represented by relatively simple equations. D.2 Robotic Discovery and Reinforcement Learning The careful choice of trials by ASE Progol is related to directed exploration [50] in robotic discovery and reinforcement learning. Directed exploration involves an important tradeoff between exploration (analogous to experimentation) and exploitation (using previously learned knowledge) in the context of robot navigation and neural network learning, a tradeoff that doesn't ....

S. Thrun. The role of exploration in learning control. In D. White and D. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, Florence, KY, 1992.


Continuous State Space Q-Learning for Control of Nonlinear Systems - Hagen (2001)   (3 citations)  (Correct)

....LEARNING to as the exploration. The exploration has the same function as the excitation in system identification, described in chapter 1. The exploration means that sometimes a different action is tried than the action according to the policy. There are different strategies to explore. In [76] a distinction is made between directed and undirected exploration. In case of undirected exploration, the tried actions are truly random and are selected based on a predefined scheme. Undirected exploration does not take information about the system into account but can use experiences gathered ....

S.B. Thrun. The role of exploration in learning control. In Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, 1992.


To Discount or not to Discount in Reinforcement Learning: A Case .. - Mahadevan (1994)   (13 citations)  (Correct)

....compare two RL techniques. Another problem is that the performance of an RL technique may vary with type of exploration strategy used. Accordingly, in this paper we compare the average performance of R-learning and Q-learning over multiple runs using two types of undirected exploration strategies [19], semi-uniform exploration and Boltzmann exploration, and one recency-based directed exploration method [15]. Computing averaged performance improvement plots does not reveal the actual variation across runs. For this reason, we will show the variation across runs in this paper. Also, we deviate ....

....payoffs (of 1) when the robot actually pushed a box. We also measured how well the robot learned each separate behavior, which turned out to be very useful in understanding differences between the two RL techniques. 4.1 Different Types of Exploration Strategies Following Thrun's terminology [19], we divide exploration methods into undirected and directed methods. Undirected exploration methods do not use the results of learning to guide exploration; they merely select a random action some of the time. Two popular undirected exploration methods are Boltzmann exploration, which we study in ....

[Article contains additional citation context not shown here]

S. Thrun. The role of exploration in learning control. In D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand Reinhold. To Appear.


C.1 Motivation - From Business To   (Correct)

....is specified by one or more numerical values for a control panel, and the result consists of one or more numerical readings. Robotic Discovery and Reinforcement Learning The selection of experiments arises in robotic discovery and reinforcement learning through the notion of directed exploration [50]. Thrun identifies three strategies for directed exploration [50]: counter-based: investigate less explored regions in state space; recency-based: investigate less recently explored regions; error-based: investigate regions in state space estimated to have high error. All three of these strategies ....

....and the result consists of one or more numerical readings. Robotic Discovery and Reinforcement Learning The selection of experiments arises in robotic discovery and reinforcement learning through the notion of directed exploration [50]. Thrun identifies three strategies for directed exploration [50]: counter-based: investigate less explored regions in state space; recency-based: investigate less recently explored regions; error-based: investigate regions in state space estimated to have high error. All three of these strategies to exploration are closely linked with Cohn, Atlas, and Ladner's ....

[Article contains additional citation context not shown here]

S. Thrun. The role of exploration in learning control. In D. White and D. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, Florence, KY, 1992.


Hierarchical Learning in Stochastic Domains: Preliminary Results - Kaelbling (1993)   (37 citations)  (Correct)

....have to use the policy to control action. But in doing so, we are in danger of violating the requirement that every situation-action pair be tried infinitely often. This is an instance of the exploration versus exploitation trade-off; it has been treated extensively elsewhere [Kaelbling, 1993a, Thrun, 1992]. In the interest of simplicity in this paper, we generate actions probabilistically based on the Q values using a Boltzmann distribution. Given a situation s, we choose action a with probability $e^{Q(a,s)/T} / \sum_{a' \in A} e^{Q(a',s)/T}$. This serves to choose actions whose values are much better than ....

Thrun, Sebastian B. 1992. The role of exploration in learning control. In White, David A. and Sofge, Donald A., editors 1992, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand Reinhold, New York.
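
The Boltzmann action-selection rule quoted in the excerpt above can be sketched as follows. This is a minimal illustration rather than Kaelbling's implementation: the function names, the list-of-Q-values interface, and the max-subtraction step (a common numerical-stability measure that leaves the probabilities unchanged) are assumptions made here.

```python
import math
import random


def boltzmann_probabilities(q_values, temperature):
    """Probability of each action: exp(Q/T) normalised over all actions."""
    if not q_values:
        raise ValueError("q_values must not be empty")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtracting the maximum avoids overflow in exp() and cancels out
    # in the ratio, so the resulting distribution is the same.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(q_values, temperature, rng=random):
    """Draw one action index according to the Boltzmann distribution."""
    probs = boltzmann_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

A high temperature T flattens the distribution toward uniform choice, while a low T concentrates probability on the actions with the largest Q values.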


Closed Loop Machine Learning - Bryant, Muggleton (2000)   (1 citation)  (Correct)

....rule. The system then randomly selects a set of new examples from each of the three classes, presents them to the User for tagging and finally adds them to the training set. A.6 Robotic Discovery and Reinforcement Learning The careful choice of trials in CLML is related to directed exploration [56] in robotic discovery and reinforcement learning. The strategies for directed exploration [56] are closely linked with uncertainty sampling (see Section A.5). Directed exploration involves an important tradeoff between exploration (analogous to experimentation) and exploitation (using previously ....

....presents them to the User for tagging and finally adds them to the training set. A.6 Robotic Discovery and Reinforcement Learning The careful choice of trials in CLML is related to directed exploration [56] in robotic discovery and reinforcement learning. The strategies for directed exploration [56] are closely linked with uncertainty sampling (see Section A.5). Directed exploration involves an important tradeoff between exploration (analogous to experimentation) and exploitation (using previously learned knowledge) in the context of robot navigation and neural network learning, a tradeoff ....

S. Thrun. The role of exploration in learning control. In D. White and D. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, Florence, KY, 1992.


When Push Comes to Shove: A Computational Model of the Role of.. - Bailey (1997)   (8 citations)  (Correct)

....of the forward connections. And since the hidden nodes do not correspond to any previously identified features, it is impossible to include pre-wired network structure to interpret them. Nonetheless, there is one way to attempt to reverse such mappings, called backpropagation through the inputs (Thrun 1992). Bailey (1992) attempted to use this technique to model the mapping from a spatial term to a mental image of its prototypical case, in the context of Regier's (1996) system. The technique involves applying a random input vector (i.e. image) to the trained network, computing the corresponding ....

Thrun, Sebastian B. 1992. The role of exploration in learning control. In Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, ed. by D. White & D. Sofge. Van Nostrand Reinhold.


Active Learning in Neural Networks - Hasenjäger, Ritter   (Correct)

....and this new state of the environment in turn determines what the student may observe. There are long time dependencies between action and observation and an observation may depend on many previous observations. For strategies of how to select actions in a directed manner in this context, see [20, 21] and the references therein. 3.1.4 Other Related Approaches A different kind of queries is considered by Ratsaby [22]. He allows queries for additional examples from a selected pattern class. Heskes [23] presents a learning algorithm for perceptron learning from an unreliable teacher that is ....

S. B. Thrun. The role of exploration in learning control. In D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, pages 527-559. Van Nostrand Reinhold, Florence, Kentucky, 1992.


Interleaved vs. A Priori Exploration for Repeated.. - Argamon-Engelson..   (Correct)

....utility-based interleaved exploration is nearly always preferable to a priori exploration. Furthermore, we investigated the contribution of the utility-based formulation, by comparing it to a randomized approach to interleaved exploration (similar to methods commonly used in reinforcement learning [30, 22]). We found that for a single repeated task in our model, utility-based interleaved exploration consistently outperforms the alternatives, unless the number of task repetitions is very high. In addition, we extended the algorithms for the case of multiple repeated tasks, where the agent has a ....

....Learning While Performing tasks We also compared our original algorithm with a simpler algorithm that explores randomly while performing its tasks. This randomized learning while performing (RLWP) algorithm is modeled on probabilistic exploration methods used in reinforcement learning (see, e.g., [30]). The algorithm generally follows the shortest known path towards its current goal, but if the current node has any unknown neighboring edges, with exploration probability p_e RLWP attempts to traverse the best such unknown edge according to the heuristic function used by LBP (see above). In our ....

S. Thrun. The role of exploration in learning control. In Handbook for Intelligent Control: Neural, Fuzzy and Adaptive Approaches, Florence, Kentucky 41022, 1992. Van Nostrand Reinhold.


Efficient Learning of Variable-Resolution Cognitive.. - Arleo.. (1999)   (2 citations)  (Correct)

....method attempts to optimize resources (i.e. memory and time to manage data) by collecting only significant experiences; that is, those experiences that actually improve the model accuracy. • Exploration: In order to optimize the learning time, we use an active learning approach (e.g. [12], [13]). An active learner is one that uses its current knowledge to drive the generation of training data in order to maximize the gain of information in the least possible time. This active behavior is obtained by making the robot always explore the least known region of the environment. The ....

....smallest possible number of exploration steps. This active behavior is obtained by making the robot always explore the least known region of the environment. Active exploration is based on estimating the exploration utility U_e of each existing partition and selecting the one which maximizes it [13], [22]. This heuristic real-valued function measures how much a partition is worth to be explored. To define U_e, we use a technique called counter-based exploration with decay [13]. It gives higher utility to partitions that have been visited less often and less recently. A counter c(p) keeps ....

[Article contains additional citation context not shown here]

S.B. Thrun, "The role of exploration in learning control," in Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, D.A. White and D.A. Sofge, Eds. Van Nostrand Reinhold, 1992.


Neural Network Learning of Variable Grid-Based Maps for the.. - Millán, Arleo (1997)   (1 citation)  (Correct)

....carefully in order to obtain an accurate map with the smallest possible number of training data. This selection is based on active learning techniques [1]. The intelligent choice of training examples has been shown to reduce the computational complexity of the learning process drastically [11]. Exploration is based on estimating the exploration utility U_explor of each existing partition and selecting the one which maximizes it. This heuristic real-valued function measures how much a partition is worth to be explored. Different exploration strategies utilize different functions U ....

....selecting the one which maximizes it. This heuristic real-valued function measures how much a partition is worth to be explored. Different exploration strategies utilize different functions U_explor. In our current implementation we use the technique called counter-based exploration with decay [11]. It gives higher utilities to partitions that have been visited less often and less recently. For each partition p ∈ P, a counter c(p) keeps track of the number of occurrences (i.e. how many times it has been visited). In order to take also into account when a partition occurred, the counter ....

S.B. Thrun. "The role of exploration in learning control". In D. A. White and D. A. Sofge (eds.), Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, Van Nostrand Reinhold, 1992.


Active Mobile Robot Localization by Entropy Minimization - Wolfram Burgard Dept (1997)   (8 citations)  Self-citation (Thrun)   (Correct)

No context found.

S. Thrun. The role of exploration in learning control. In D. A. White and D. A. Sofge, editors, Handbook of intelligent control: neural, fuzzy and adaptive approaches. Van Nostrand Reinhold, Florence, Kentucky 41022, 1992.


Lifelong Robot Learning - Thrun, Mitchell (1993)   (5 citations)  Self-citation (Thrun)   (Correct)

....[Bachrach and Mozer, 1991], [Chrisman, 1992], [Lin and Mitchell, 1992], [Mozer and Bachrach, 1989], [Rivest and Schapire, 1987], [Tan, 1991], [Whitehead and Ballard, 1991] for approaches to learning with incomplete perception. s See for example [Barto et al. 1989], [Jordan, 1989], [Munro, 1987], [Thrun, 1992] for more approaches to learning action models with neural networks. agent begins at the initial state s and performs the action sequence a, a2, a. After action a it receives the reward r = 92.3 which, in this example, indicates that the action sequence was considerably successful. At a first ....

Sebastian Thrun. The role of exploration in learning control. In David A. White and Donald A. Sofge, editors, Handbook of intelligent control: neural, fuzzy and adaptive approaches. Van Nostrand Reinhold, Florence, Kentucky 41022, 1992.


Exploration and Model Building in Mobile Robot Domains - Thrun (1993)   (20 citations)  Self-citation (Thrun)   (Correct)

....tabula rasa learning methods would result in collisions in any new, unknown environment before learning to avoid them. COLUMBUS top level goal is efficient exploration. The approach taken in this paper is motivated by earlier research on exploration in the context of reinforcement learning [Thrun, 1992b]. Theoretical results on the efficiency of exploration indicate the importance of the exploration strategy for the amount of knowledge gained, and for the efficiency of learning control in general. It has been shown that for certain hard deterministic environments, that an autonomous robot can ....

Sebastian B. Thrun. The role of exploration in learning control. In David A. White and Donald A. Sofge, editors, Handbook of intelligent control: neural, fuzzy and adaptive approaches, Florence, Kentucky 41022, 1992. Van Nostrand Reinhold.


The Exploration-Exploitation Dilemma for Adaptive Agents - Rejeb, Guessoum, M'Hallah (2005)   (Correct)

No context found.

Thrun S. B.: The role of exploration in learning control. In D. A. Sofge (eds.). Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Florence, Kentucky: Van Nostrand Reinhold (1992).


Adaptive Modeling and Planning for Reactive Agents - Mykel Kochenderfer Institute (2005)   (Correct)

No context found.

Thrun, S. B. 1992. The role of exploration in learning control. In White, D. A., and Sofge, D. A., eds., Handbook for Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold. 527--559.


Memory-guided Exploration in Reinforcement Learning - Carroll, Peterson, Owens   (Correct)

No context found.

Sebastian B. Thrun. The role of exploration in learning control. In D. White and D. Sofge, editors, Handbook for Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, Florence, Kentucky 41022, 1992.


Multiagent Reinforcement Learning: Stochastic Games with.. - Chalkiadakis (2003)   (1 citation)  (Correct)

No context found.

S.B. Thrun. The Role of Exploration in Learning Control. In D.A. White and D.A. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, Florence, Kentucky, 1992. Van Nostrand Reinhold.


Memory-guided Exploration in Reinforcement Learning - James Carroll Todd   (Correct)

No context found.

Sebastian B. Thrun. The role of exploration in learning control. In D. White and D. Sofge, editors, Handbook for Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, Florence, Kentucky 41022, 1992.


Model-based Average Reward Reinforcement Learning - Tadepalli, Ok (1998)   (4 citations)  (Correct)

No context found.

S. Thrun. The role of exploration in learning control. In Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand Reinhold, 1994.

CiteSeer.IST - Copyright Penn State and NEC