23 citations found.
T. G. Dietterich and N. S. Flann, "Explanation-based learning and reinforcement learning: a unified view," Machine Learning, vol. 28, pp. 169--210, 1997.

This paper is cited in the following contexts:
Adapting to Subsequent Changes of Environment by Learning.. - Matsui, Inuzuka, Seki (2002)

....rules. Decision tree learning algorithms, such as ID3 [14] and C4.5 [15], are well known and widely used for many practical applications. 2 Adapting to Changes of Environment 2.1 Changing Environment To illustrate our method, consider a simple maze problem which was used by Dietterich and Flann [16]. One field of this problem is shown in Figure 1, and we call it environment 1. There are six goal states, which are indicated by G. The initial state is selected randomly. The robot can use 16 operators that are divided into three groups: a) Single step operators (north, south, east, and west) ....

T. G. Dietterich and N. S. Flann, "Explanation-Based Learning and Reinforcement Learning: A Unified View," Machine Learning, 28:169-214, 1997.


Reinforcement Learning with Policy Constraints - Thrun, Schulte

....in lifelong building control. 5 Related Work The transfer of knowledge across multiple reinforcement learning tasks has previously been studied by several researchers. A popular approach is to acquire action models, which describe the state transition probabilities of individual actions [8, 18, 27, 30]. Action models can be employed to generate synthetic training data [27] or to explain observations [8, 30]. Such approaches work well if the state space dynamics stay the same across multiple tasks. In this paper, however, we are interested in cases where even the state space dynamics may change ....

....tasks has previously been studied by several researchers. A popular approach is to acquire action models, which describe the state transition probabilities of individual actions [8, 18, 27, 30]. Action models can be employed to generate synthetic training data [27] or to explain observations [8, 30]. Such approaches work well if the state space dynamics stay the same across multiple tasks. In this paper, however, we are interested in cases where even the state space dynamics may change across different tasks. In such situations, it is unclear how to reuse previously learned action models. ....

T. G. Dietterich and N. S. Flann. Explanation-based learning and reinforcement learning: A unified view. In A. Prieditis and S. Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, 1995.


Machine Learning and Inductive Logic Programming for.. - Kazakov, Kudenko (2001)

....et al. [7] explore the combination of relational regression tree learning and Q learning. The result is a more expressive and more general Q function representation that may still be applicable even if the goals or the environment of the agent change. In another approach, Dietterich and Flann [6] combine explanation based learning (a form of learning that is focused on speeding up reasoning processes) with reinforcement learning. In Section 5 we discuss ways of combining ILP with reinforcement learning. 3.3 Integrating Machine Learning into an Agent Architecture As Section 2 showed, ....

T. Dietterich and N. Flann. Explanation-based learning and reinforcement learning. In Proceedings of the Twelfth International Conference on Machine Learning, pages 176--184, 1995.


Learning Preconditions for Control Policies in.. - Matsui, Inuzuka, Seki (2001)

....by a tree and it can be translated into sets of if-then rules. Some decision tree learning algorithms, such as ID3 [7] and C4.5 [8], are well known and widely used for many practical applications. 2.3 Changing Environment To illustrate our method, consider a simple maze problem which is used in [1]. The field of this problem is shown in Figure 2 and we call it environment 1. There are six goal states, which are indicated by G, and an initial state is selected randomly. The robot can use 16 operators that are divided into three groups: a) Single step operators (north, south, east, and ....

....units. The robot must pay the cost for using these operators, and gets 100 units as rewards when it reaches a goal. The intention of the robot is to learn a control policy that maximizes its profits. (Figure 2. Environment 1: This is a maze in simple maze problem [1]. Letter G indicates a goal state.) (Figure 3. An optimal policy #1 in environment 1 [1]. For details for arrows, see text.) Figure 3 shows an optimal policy #1 for this maze. A simple arrow indicates a single step operator; a double arrow indicates a to wall ....

[Article contains additional citation context not shown here]

T. G. Dietterich and N. S. Flann. Explanation-based learning and reinforcement learning: A unified view. Machine Learning, 28:169--214, 1997.


Decision Theoretic Planning: Structural Assumptions and.. - Boutilier, Dean, Hanks (1999)   (150 citations)

....To implement this scheme effectively, we have to perform operations like regression without ever enumerating the set of all states, and this is where the structured representations for action transition, value, and policy functions play a role. For FOMDPs, approaches of this type are taken in [17, 21, 22, 46]. We illustrate the basic intuitions behind this approach by describing how value iteration for discounted, infinite horizon FOMDPs might work. We assume that the MDP is specified using a compact representation of the reward function (such as a decision tree) and actions (such as 2TBNs). In value ....

....together (i.e., maxed; see Section 3.1) to determine V 1. Of course, the process can be repeated some number of times to produce Vn for some suitable n, as well as the optimal policy with respect to Vn. This basic technique can be used in a number of different ways. Dietterich and Flann [46] propose ideas similar to these, but attention is restricted to MDPs with goal regions and deterministic actions (represented using STRIPS operators), thus rendering true goal regression techniques directly applicable. Boutilier et al. [22] develop a version of modified policy iteration to ....

[Article contains additional citation context not shown here]

Thomas G. Dietterich and Nicholas S. Flann. Explanation-based learning and reinforcement learning: A unified approach. In Proceedings of the Twelfth International Conference on Machine Learning, pages 176--184, Lake Tahoe, NV, 1995.


Software as Learning: Quality Factors and Life-Cycle.. - Hernández-Orallo.. (2000)

....applications in this area. Many ML paradigms and techniques [31] can be used in different processes and stages of software construction: evaluation criteria like cross validation, query learning [3], reinforcement learning applied to constructive languages [23], explanation based learning [14], data mining for knowledge based software, analogical reasoning, case based reasoning, genetic computation, etc. In summary, our analogy also shows that until machine intelligence (and ML) approaches human ability more closely, fully automated programming will remain a fallacy. In the meantime, ....

Dietterich, T.G.; Flann, N.S. "Explanation-Based Learning and Reinforcement Learning: A Unified View" Machine Learning, 28, 169-210, 1997.


Stochastic Dynamic Programming with Factored Representations - Boutilier, Dearden, al. (1999)   (30 citations)

.... of Givan and Dean [26, 27, 39]. In this work, the notion of automaton minimization [42, 51] is extended to MDPs and is used to analyze abstraction techniques such as those presented in [30]. More closely related to the specific model we propose in the current paper is that of Dietterich and Flann [32, 33]. They apply regression methods to the solution of MDPs (and consider this problem in the context of reinforcement learning in addition). Their original proposal [32] is restricted to MDPs with goal regions and deterministic actions (represented using STRIPS operators), thus rendering true ....

....consider this problem in the context of reinforcement learning in addition). Their original proposal [32] is restricted to MDPs with goal regions and deterministic actions (represented using STRIPS operators), thus rendering true goal regression techniques directly applicable. This is extended in [33] to allow stochastic actions, thus providing a stochastic generalization of goal regression. We discuss these models in more detail in Section 4.7. 1.3 Outline In Section 2 we describe the basic MDP model, various concepts that are used in the solution of MDPs, as well as several classical ....

[Article contains additional citation context not shown here]

Thomas G. Dietterich and Nicholas S. Flann. Explanation-based learning and reinforcement learning: A unified view. Machine Learning, 28(2):169--210, 1997.


Stochastic Dynamic Programming with Factored Representations - Boutilier, Dearden, al. (1999)   (30 citations)

.... of Givan and Dean [26, 27, 39]. In this work, the notion of automaton minimization [42, 51] is extended to MDPs and is used to analyze abstraction techniques such as those presented in [30]. More closely related to the specific model we propose in the current paper is that of Dietterich and Flann [32, 33]. They apply regression methods to the solution of MDPs (and consider this problem in the context of reinforcement learning in addition). Their original proposal [32] is restricted to MDPs with goal regions and deterministic actions (represented using STRIPS operators), thus rendering true ....

.... presented in [30]. More closely related to the specific model we propose in the current paper is that of Dietterich and Flann [32, 33]. They apply regression methods to the solution of MDPs (and consider this problem in the context of reinforcement learning in addition). Their original proposal [32] is restricted to MDPs with goal regions and deterministic actions (represented using STRIPS operators), thus rendering true goal regression techniques directly applicable. This is extended in [33] to allow stochastic actions, thus providing a stochastic generalization of goal regression. We ....

[Article contains additional citation context not shown here]

Thomas G. Dietterich and Nicholas S. Flann. Explanation-based learning and reinforcement learning: A unified approach. In Proceedings of the Twelfth International Conference on Machine Learning, pages 176--184, Lake Tahoe, 1995.


Structured Solution Methods for Non-Markovian Decision.. - Bacchus, Boutilier, Grove (1997)   (2 citations)

....fact that the state space grows exponentially with the number of problem variables. A recent focus in DTP research has been the development of MDP representation and solution techniques that do not require an explicit enumeration of the state space. For instance, the use of STRIPS [BD94, DF95, KHW94] or Bayes nets [BDG95, BD96] to represent actions in MDPs, and structured policy construction (SPC) methods that exploit such representations [DF95, BDG95, BD96] to avoid explicit state based computations when solving MDPs, promise to make MDPs more effective for such DTP problems. For such ....

....representation and solution techniques that do not require an explicit enumeration of the state space. For instance, the use of STRIPS [BD94, DF95, KHW94] or Bayes nets [BDG95, BD96] to represent actions in MDPs, and structured policy construction (SPC) methods that exploit such representations [DF95, BDG95, BD96] to avoid explicit state based computations when solving MDPs, promise to make MDPs more effective for such DTP problems. For such problems, the NMDP conversion algorithm proposed in [BBG96] has some obvious drawbacks. First, being state based, its complexity is exponential in the number of ....

Thomas G. Dietterich and Nicholas S. Flann. Explanation-based learning and reinforcement learning: A unified approach. In Proceedings of the Twelfth International Conference on Machine Learning, pages 176--184, Lake Tahoe, 1995.


Approximating Value Trees in Structured Dynamic Programming - Boutilier, Dearden (1996)   (15 citations)

....use of aggregation methods (or generalization) in which a number of states are grouped because they have similar or identical values and/or action choice. These aggregates are treated as a single state in dynamic programming algorithms for the solution of MDPs or the related methods used in RL [22, 2, 16, 4, 5, 11, 12, 9, 17]. Such aggregations can be based on a number of different problem features, such as similarity of states according to some domain metric; but most methods generally assume that the states so grouped have the same value. In addition, such schemes can be exact or approximate, adaptive or fixed, and ....

....V) [19]. In an effort to mitigate the curse of dimensionality, researchers have sought to use aggregation or generalization to group states. One possible approach uses action models to form regions in the state space that have identical value and performs dynamic programming steps in this way [5, 12]. We briefly describe a structured version of value iteration (SVI) based on this intuition: at each stage, V i will be represented as a decision tree. In value iteration, we need to produce the sequence of value functions V 0, V 1, ... using Bellman backups, and we d ....

[Article contains additional citation context not shown here]

Thomas G. Dietterich and Nicholas S. Flann. Explanation-based learning and reinforcement learning: A unified approach. In Proceedings of the Twelfth International Conference on Machine Learning, pages 176--184, Lake Tahoe, 1995.


Correlated Action Effects in Decision Theoretic Regression - Boutilier (1997)   (3 citations)

....iteration is developed, in which value functions and policies are represented using decision trees and the DBN representation of the MDP is exploited to build these compact policies. (Footnote 4: See [12] for a similar, though less general, method in the con....) This technique is applied in [4] to value iteration, and dynamic approximation methods are considered as well. (Footnote 3: To simplify the presentation, we restrict our attention to binary variables in our examples.) Roughly, if one has a tree representation of a value function, only certain variables will be mentioned as being relevant (under ....

Thomas G. Dietterich and Nicholas S. Flann. Explanation-based learning and reinforcement learning: A unified approach. In Proceedings of the Twelfth International Conference on Machine Learning, pages 176--184, Lake Tahoe, 1995.


Planning, Learning and Coordination in Multiagent Decision.. - Boutilier (1996)   (1 citation)

....use of aggregation methods (or generalization) in which a number of states are grouped because they have similar or identical value and/or action choice. These aggregates are treated as a single state in dynamic programming algorithms for the solution of MDPs or the related methods used in RL [44, 5, 36, 7, 9, 19, 22, 14, 38]. Such aggregations can be based on a number of different problem features, such as similarity of states according to some domain metric, but generally assume that the states so grouped have the same optimal value. In addition, such schemes can be exact or approximate, adaptive or fixed, and ....

....to get coffee from a coffee shop across the street, can get wet if it is raining unless it has an umbrella, and is rewarded if it brings coffee when the user requests it, and penalized (to a lesser extent) if it gets wet [9, 10]. This network describes the action of fetching coffee. (Footnote 28: See [22] for a similar approach to RL for goal based, deterministic problems.) [Figure: panels "Tree Representation" and "Matrix"] ....

[Article contains additional citation context not shown here]

Thomas G. Dietterich and Nicholas S. Flann. Explanation-based learning and reinforcement learning: A unified approach. In Proceedings of the Twelfth International Conference on Machine Learning, pages 176--184, Lake Tahoe, 1995.


Learning to Take Actions - Khardon (1996)   (10 citations)

....to derive better results. The use of action models discussed above is also of interest. Action models have also been used in reinforcement learning as a tool for learning from imaginary experiments (Sutton, 1990) and more recently as part of the temporal difference update of the value function (Dietterich and Flann, 1995). The combination of declarative and procedural knowledge in this way is an interesting issue to be explored. Another direction for further work is the use of different models of interaction with the environment. An interesting model is suggested by Natarajan (1989) who studies learning to act in ....

Dietterich, T.G. and N.S. Flann. 1995. Explanation based learning and reinforcement learning: A unified view. In International Conference on Machine Learning, pages 176--184.


Model Minimization, Regression, and Propositional STRIPS Planning - Givan, Dean (1997)   (1 citation)  Self-citation (Thomas)

....used to trade time for space in computing approximately optimal solutions to Markov decision processes. Finally, in the longer version of this paper, we show how the methods of this paper can be used to understand the advantages of the explanation-based reinforcement learning algorithm developed by Dietterich and Flann [1995]. 7 Conclusions In this paper, we demonstrate how traditional methods for solving propositional STRIPS planning problems can be viewed in terms of finite automata (model) minimization. Given a finite automaton whose state-transition function is defined by a set of STRIPS rules, we show how ....

Dietterich, Thomas G. and Flann, Nicholas S. 1995. Explanation-based learning and reinforcement learning: A unified view. In Proceedings Twelfth International Conference on Machine Learning. 176--184.


Explanation-Based Learning and Reinforcement Learning: A.. - Dietterich, Flann (1995)   (20 citations)  Self-citation (Dietterich Flann)

....into state s for the maximizing player have been completed. Hence, the algorithms attach to each state (or region) a count of the number of maximizing backups into that state remaining to be performed. Backups from that state are delayed until the count is zero. Further details are given in Dietterich & Flann (1994). We studied point dp and rect dp applied to the king versus king and rook (KRK) ending in chess. While this ending is one of the simplest, it is difficult to play well, even for experts, and it can involve up to 42 ply of forced moves to win. The value function is derived for the maximizing side ....

Dietterich, T. G., & Flann, N. S. (1994). Explanation-based learning and reinforcement learning: A unified view. Tech. rep., Oregon State University, Corvallis, OR.


Hierarchical Explanation-Based Reinforcement Learning - Tadepalli, Dietterich (1997)   (5 citations)  Self-citation (Dietterich)

....{tadepalli,tgd}@research.cs.orst.edu Abstract Explanation Based Reinforcement Learning (EBRL) was introduced by Dietterich and Flann as a way of combining the ability of Reinforcement Learning (RL) to learn optimal plans with the generalization ability of Explanation Based Learning (EBL) (Dietterich & Flann, 1995). We extend this work to domains where the agent must order and achieve a sequence of subgoals in an optimal fashion. Hierarchical EBRL can effectively learn optimal policies in some of these sequential task domains even when the subgoals weakly interact with each other. We also show that when a ....

....Learning (RL) has emerged as the method of choice for building autonomous agents that improve their performance with experience. One obstacle to scaling this approach to large problems is the lack of a robust and justifiable method to generalize from one experience to another. Dietterich and Flann (Dietterich & Flann, 1995) showed that Explanation-Based Learning (EBL) can be used to generalize the experience of a reinforcement learner across different states, provided the learner has access to a complete and correct domain theory. The result is an effective learning method called Explanation Based Reinforcement ....

[Article contains additional citation context not shown here]

Dietterich, T. G., & Flann, N. (1995). Explanation-based learning and reinforcement learning: A unified view. In Proceedings of Machine Learning Conference.


Learning Relational Navigation Policies - Cocora, Kersting, Plagemann.. (2006)

No context found.

T. G. Dietterich and N. S. Flann, "Explanation-based learning and reinforcement learning: a unified view," Machine Learning, vol. 28, pp. 169--210, 1997.


Equivalence Notions and Model Minimization in - Markov Decision Processes

No context found.

Dietterich, T. G., and Flann, N. S. 1995. Explanation-based learning and reinforcement learning: A unified view. In Proceedings Twelfth International Conference on Machine Learning, 176-184.


Generalizing Dijkstra's Algorithm and Gaussian Elimination.. - McMahan, Gordon (2005)

No context found.

T. G. Dietterich and N. S. Flann. Explanation-based learning and reinforcement learning: A unified view. In 12th International Conference on Machine Learning (ICML), pages 176--184. Morgan Kaufmann, 1995.


Computational 'Consilience' as - Theory

No context found.

Dietterich, T.G.; Flann, N.S. "Explanation-Based Learning and Reinforcement Learning: A Unified View" Machine Learning, 28, 169-210, 1997.


Game Design Verification using Reinforcement Learning - Ntoutsi, Kalles

No context found.

T. Dietterich, N. Flann. "Explanation-Based Learning and Reinforcement Learning: A Unified View", Machine Learning, Vol. 28, 1997.


Learning to be Competent - Khardon (1996)

No context found.

Dietterich, T.G. and N.S. Flann. 1995. Explanation based learning and reinforcement learning: A unified view. In Workshop on Machine Learning, pages


Constructive Reinforcement Learning - Hernandez-Orallo (1999)

No context found.

T.G. Dietterich and N.S. Flann, "Explanation-Based Learning and Reinforcement Learning: A Unified View" Machine Learning, 28, 169-210, (1997).
