
## Identifying useful subgoals in reinforcement learning by local graph partitioning (2005)

Venue: In Proceedings of the Twenty-Second International Conference on Machine Learning

Citations: 68 (10 self)

### Citations

3724 | Normalized cuts and image segmentation
- Shi, Malik
- 2000
Citation Context: ...t of edges, a cut (A, B) of G is a partition of V; the edges that cross the cut are those with one endpoint in block A and the other in block B. We seek to minimize the Normalized Cut metric (NCut) (Shi & Malik, 2000), defined as follows: NCut(A, B) = cut(A, B)/vol(A) + cut(B, A)/vol(B), (1) where cut(A, B) is the sum of the weights on edges that originate in A and end in B, and vol(A) is the sum of weights on all...
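The NCut definition in this context is direct to compute; a minimal sketch in Python, assuming a weighted directed graph stored as a dict-of-dicts (the representation, example graph, and function names are illustrative assumptions, not the paper's):

```python
def cut(graph, A, B):
    """Sum of weights on edges that originate in A and end in B."""
    return sum(w for u in A for v, w in graph.get(u, {}).items() if v in B)

def vol(graph, A):
    """Sum of weights on all edges that originate in a state in A."""
    return sum(w for u in A for w in graph.get(u, {}).values())

def ncut(graph, A, B):
    """Normalized Cut (Shi & Malik, 2000), Eq. (1)."""
    return cut(graph, A, B) / vol(graph, A) + cut(graph, B, A) / vol(graph, B)

# Two dense blocks {0, 1} and {2, 3} joined by light edges: NCut is small.
g = {0: {1: 1.0, 2: 0.1}, 1: {0: 1.0},
     2: {3: 1.0, 0: 0.1}, 3: {2: 1.0}}
print(round(ncut(g, {0, 1}, {2, 3}), 4))  # → 0.0952
```

Because each cut term is normalized by the volume of its own block, a cut that strands a tiny block with most of its weight crossing the boundary scores badly, which is why the paper prefers NCut over the raw MinCut.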

2764 |
Pattern classification
- Duda, Hart, et al.
- 2012
Citation Context: ...at all targets have the same success probability p, and all non-targets have the same success probability q, the optimal decision rule is to label a state as target if the following inequality holds (Duda et al., 2001):

$$\frac{n_t}{n} > \frac{\ln\frac{1-q}{1-p}}{\ln\frac{p(1-q)}{q(1-p)}} + \frac{1}{n}\cdot\frac{\ln\!\left(\frac{\lambda_{fa}\,p(N)}{\lambda_{miss}\,p(T)}\right)}{\ln\frac{p(1-q)}{q(1-p)}}, \qquad (3)$$

where n_t is the number of times the state was a hit, n is the number of observations on this state (i.e...
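Inequality (3) is a likelihood-ratio threshold on the hit fraction n_t/n; a hedged sketch of the rule, with parameter names taken from the context above (the function name and the symmetric example values are assumptions):

```python
import math

def label_as_target(n_t, n, p, q, lam_fa, lam_miss, p_T, p_N):
    """Label a state as target iff n_t/n exceeds the threshold of Eq. (3).
    p, q: success probabilities of targets and non-targets; lam_fa, lam_miss:
    costs of a false alarm and a miss; p_T, p_N: class prior probabilities."""
    denom = math.log(p * (1 - q) / (q * (1 - p)))
    threshold = (math.log((1 - q) / (1 - p)) / denom
                 + (1 / n) * math.log(lam_fa * p_N / (lam_miss * p_T)) / denom)
    return n_t / n > threshold

# Symmetric case (p = 0.9, q = 0.1, equal costs and priors): threshold is 0.5.
print(label_as_target(8, 10, 0.9, 0.1, 1.0, 1.0, 0.5, 0.5))  # → True
print(label_as_target(3, 10, 0.9, 0.1, 1.0, 1.0, 0.5, 0.5))  # → False
```

Note the second term shrinks as 1/n, so with many observations the rule reduces to comparing the hit fraction against a fixed threshold determined by p and q alone.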

1985 | Network Flows: Theory, Algorithms, and Applications. Prentice-Hall
- Ahuja, Magnanti, et al.
- 1993
Citation Context: ...t score, when the sample graph has no edges from the corresponding block to the other one. We note here that there are two alternative cut metrics that are commonly used in graph partitioning: MinCut [15] and RatioCut [16]. MinCut is the sum of edge weights that cross the cut, while RatioCut equals cutsize(A, B)/|A| + cutsize(A, B)/|B| for an undirected graph. Neither of these meets our needs as well as...

554 | Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
- Sutton, Precup, et al.
- 1999
Citation Context: ...w computational cost. 1. Introduction Recent methods in Reinforcement Learning (RL) allow an agent to plan, act, and learn with temporally-extended actions (Dietterich, 2000; Parr, 1998; Precup, 2000; Sutton et al., 1999). A temporally-extended action, or a skill, is a closed-loop policy over one-step actions, for example one that takes a robot to its battery charger using lower-level sensory and motor actions. A sui...

435 | Hierarchical reinforcement learning with the MAXQ value function decomposition
- Dietterich
Citation Context: ...e space—while producing an algorithm with low computational cost. 1. Introduction Recent methods in Reinforcement Learning (RL) allow an agent to plan, act, and learn with temporally-extended actions (Dietterich, 2000; Parr, 1998; Precup, 2000; Sutton et al., 1999). A temporally-extended action, or a skill, is a closed-loop policy over one-step actions, for example one that takes a robot to its battery charger usi...

313 | Self-Improving reactive agents based on reinforcement learning, planning and teaching
- Lin
- 1992
Citation Context: ...lock and whose distance in the sample graph to the subgoal was less than option lag (l_o), a parameter of the algorithm. The option’s policy is specified through an RL process employing action replay (Lin, 1992) using a pseudo-reward function (Dietterich, 2000). The policy learned takes the agent to the subgoal state in as few transitions as possible while remaining in the option’s initiation set. After the...
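The option-learning step described here (replaying stored experience under a pseudo-reward) can be sketched in tabular Q-learning; the pseudo-reward values, hyperparameters, and all names below are illustrative assumptions, not the paper's exact procedure:

```python
def learn_option_policy(transitions, subgoal, init_set, n_actions,
                        alpha=0.2, gamma=0.9, sweeps=200):
    """Replay stored (s, a, s') transitions (Lin, 1992) under a pseudo-reward
    (Dietterich, 2000): +1 for reaching the subgoal, -1 for leaving the
    initiation set, a small step cost otherwise (values are assumptions)."""
    Q = {}
    for _ in range(sweeps):
        for s, a, s2 in transitions:
            if s2 == subgoal:
                r, done = 1.0, True       # pseudo-reward: subgoal reached
            elif s2 not in init_set:
                r, done = -1.0, True      # penalty: left the initiation set
            else:
                r, done = -0.01, False    # small per-step cost
            best_next = 0.0 if done else max(
                Q.get((s2, b), 0.0) for b in range(n_actions))
            q = Q.get((s, a), 0.0)
            Q[(s, a)] = q + alpha * (r + gamma * best_next - q)
    states = {s for s, _, _ in transitions}
    return {s: max(range(n_actions), key=lambda a: Q.get((s, a), 0.0))
            for s in states}

# Three-state chain 0-1-2 with subgoal 2; action 1 = "right", action 0 = "left".
# Replay learns to move right from both non-subgoal states.
replay = [(0, 1, 1), (1, 1, 2), (1, 0, 0), (0, 0, 0)]
policy = learn_option_policy(replay, subgoal=2, init_set={0, 1, 2}, n_actions=2)
print(policy)  # → {0: 1, 1: 1}
```

The step cost plus discounting makes the greedy policy prefer the shortest route to the subgoal, matching the stated goal of reaching it "in as few transitions as possible while remaining in the option's initiation set."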

295 | New spectral methods for ratio cut partitioning and clustering
- Hagen, Kahng
- 1992
Citation Context: ...ng between blocks has a low probability and transitioning within blocks has a high probability. Alternative cut metrics commonly used in graph partitioning are MinCut (Wu & Leahy, 1993) and RatioCut (Hagen & Kahng, 1992). MinCut is the sum of edge weights that cross the cut, while RatioCut equals cut(A, B)/|A| + cut(A, B)/|B| for an undirected graph. Neither of these meets our needs as well as NCut does. MinCut, in pa...
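The two alternative metrics defined in this context can be computed as follows; a sketch for an undirected graph stored as a symmetric dict-of-dicts (the representation and example graph are assumptions for illustration):

```python
def cut_size(adj, A, B):
    """Total weight of undirected edges with one endpoint in A and one in B
    (adj is symmetric; each crossing edge is counted once, from its A side)."""
    return sum(w for u in A for v, w in adj.get(u, {}).items() if v in B)

def min_cut(adj, A, B):
    """MinCut (Wu & Leahy, 1993): the raw edge weight crossing the cut."""
    return cut_size(adj, A, B)

def ratio_cut(adj, A, B):
    """RatioCut (Hagen & Kahng, 1992): cut(A, B)/|A| + cut(A, B)/|B|."""
    c = cut_size(adj, A, B)
    return c / len(A) + c / len(B)

# Symmetric adjacency: blocks {0, 1} and {2, 3} joined by one light edge.
adj = {0: {1: 1.0, 2: 0.5}, 1: {0: 1.0},
       2: {0: 0.5, 3: 1.0}, 3: {2: 1.0}}
print(min_cut(adj, {0, 1}, {2, 3}))    # → 0.5
print(ratio_cut(adj, {0, 1}, {2, 3}))  # → 0.5
```

RatioCut normalizes by block cardinality and MinCut not at all; neither accounts for the weight concentrated inside each block the way NCut's volume terms do, which is the contrast the context draws.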

280 | Reinforcement learning with hierarchies of machines
- Parr, Russell
- 1998
Citation Context: ...ement learning (RL) researchers have recently developed several formalisms that address planning, acting, and learning with temporally-extended actions. These include Hierarchies of Abstract Machines [1, 2], MAXQ value function decomposition [3], and the options framework [4, 5]. These formalisms pave the way toward dramatically improved capabilities of autonomous agents, but to fully realize their bene...

162 | Sequential composition of dynamically dexterous robot behaviors
- Burridge, Rizzi, et al.
- 1999
Citation Context: ...identifying the access states. For example, analogous situations exist in continuous control problems where sequential composition of “funnels” in system dynamics can give rise to access-like states (Burridge et al., 1999). We note here an alternative use of local cuts to identify access states: build the entire transition graph, but perform cuts on local neighborhoods, for example on a part of the graph that contains...

146 | Automatic discovery of subgoals in reinforcement learning using diverse density
- McGovern, Barto
- 2001
Citation Context: ...e literature include states that are visited frequently or that have a high reward gradient (Digney, 1998), states that are visited frequently on successful trajectories but not on unsuccessful ones (McGovern & Barto, 2001), and states that lie between densely-connected regions of the state space (Mannor et al., 2004; Menache et al., 2002; Şimşek & Barto, 2004). In addition, Hengst (2002) has used the notion of a sub...

119 | Hierarchical Control and Learning for Markov Decision Processes
- Parr
- 1998
Citation Context: ...ucing an algorithm with low computational cost. 1. Introduction Recent methods in Reinforcement Learning (RL) allow an agent to plan, act, and learn with temporally-extended actions (Dietterich, 2000; Parr, 1998; Precup, 2000; Sutton et al., 1999). A temporally-extended action, or a skill, is a closed-loop policy over one-step actions, for example one that takes a robot to its battery charger using lower lev...

110 | Finding structure in reinforcement learning
- Thrun, Schwartz
- 1995
Citation Context: ...been suggested towards this end. One approach is to search for commonly occurring subpolicies in solutions to a set of tasks and to generate skills with corresponding policies (Pickett & Barto, 2002; Thrun & Schwartz, 1995). [Running footer: Appearing in Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, 2005. Copyright 2005 by the author(s)/owner(s).] A second approach is to identify subgoals—states t...

93 | Discovering hierarchy in reinforcement learning with HEXQ - Hengst

75 | Using relative novelty to identify useful temporal abstractions in reinforcement learning - Şimşek, Barto - 2004

62 | Temporal Abstraction in Reinforcement Learning
- Precup
Citation Context: ...s that address planning, acting, and learning with temporally-extended actions. These include Hierarchies of Abstract Machines [1, 2], MAXQ value function decomposition [3], and the options framework [4, 5]. These formalisms pave the way toward dramatically improved capabilities of autonomous agents, but to fully realize their benefits, an agent needs to be able to create useful temporally-extended acti...

51 | Learning Hierarchical Control Structure for Multiple Tasks and Changing Environments
- Digney
- 1998
Citation Context: ...ful to reach—and to learn skills that take the agent efficiently to these subgoals. Subgoals proposed in the literature include states that are visited frequently or that have a high reward gradient (Digney, 1998), states that are visited frequently on successful trajectories but not on unsuccessful ones (McGovern & Barto, 2001), and states that lie between densely-connected regions of the state space (Mannor...

51 | Dynamic abstraction in reinforcement learning via clustering
- Mannor, Menache, et al.
- 2004
Citation Context: ..., 1998), states that are visited frequently on successful trajectories but not on unsuccessful ones (McGovern & Barto, 2001), and states that lie between densely-connected regions of the state space (Mannor et al., 2004; Menache et al., 2002; Şimşek & Barto, 2004). In addition, Hengst (2002) has used the notion of a subgoal in performing temporal and spatial abstraction simultaneously, defining subgoals to be thos...

48 | Q-Cut: dynamic discovery of sub-goals in reinforcement learning
- Menache, Mannor, et al.
- 2002
Citation Context: ...are visited frequently on successful trajectories but not on unsuccessful ones (McGovern & Barto, 2001), and states that lie between densely-connected regions of the state space (Mannor et al., 2004; Menache et al., 2002; Şimşek & Barto, 2004). In addition, Hengst (2002) has used the notion of a subgoal in performing temporal and spatial abstraction simultaneously, defining subgoals to be those states that lead to...

35 | Policyblocks: An algorithm for creating useful macro-actions in reinforcement learning
- Pickett, Barto
- 2002
Citation Context: ...number of methods have been suggested towards this end. One approach is to search for commonly occurring subpolicies in solutions to a set of tasks and to generate skills with corresponding policies (Pickett & Barto, 2002; Thrun & Schwartz, 1995). A second approach is to id...

9 | Temporal abstraction in reinforcement learning. Doctoral dissertation
- Precup
- 2000
Citation Context: ...orithm with low computational cost. 1. Introduction Recent methods in Reinforcement Learning (RL) allow an agent to plan, act, and learn with temporally-extended actions (Dietterich, 2000; Parr, 1998; Precup, 2000; Sutton et al., 1999). A temporally-extended action, or a skill, is a closed-loop policy over one-step actions, for example one that takes a robot to its battery charger using lower-level sensory and...

1 | Identifying Subgoals by Local Graph Partitioning - Hengst - 2002