Results 1 -
2 of
2
Efficient learning of relational models for sequential decision making
, 2010
"... The exploration-exploitation tradeoff is crucial to reinforcement-learning (RL) agents, and a significant number of sample complexity results have been derived for agents in propositional domains. These results guarantee, with high probability, near-optimal behavior in all but a polynomial number of ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
The exploration-exploitation tradeoff is crucial to reinforcement-learning (RL) agents, and a significant number of sample complexity results have been derived for agents in propositional domains. These results guarantee, with high probability, near-optimal behavior in all but a polynomial number of timesteps in the agent’s lifetime. In this work, we prove similar results for certain relational representations, primarily a class we call “relational action schemas”. These generalized models allow us to specify state transitions in a compact form, for instance describing the effect of picking up a generic block instead of picking up 10 different specific blocks. We present theoretical results on crucial subproblems in action-schema learning using the KWIK framework, which allows us to characterize the sample efficiency of an agent learning these models in a reinforcement-learning setting. These results are extended in an apprenticeship learning paradigm where and agent has access not only to its environment, but also to a teacher that can demonstrate traces of state/action/state sequences. We show that the class of action schemas that are efficiently learnable in this paradigm is strictly larger than those learnable in the online setting. We link
unknown title
"... Abstract. We consider an agent which learns a relational action model in order to be able to predict the effects of his actions. The model consists of a set of STRIPS-like rules, i.e. rules predicting what has changed in the current state when applying a given action as far as a set of preconditions ..."
Abstract
- Add to MetaCart
Abstract. We consider an agent which learns a relational action model in order to be able to predict the effects of his actions. The model consists of a set of STRIPS-like rules, i.e. rules predicting what has changed in the current state when applying a given action as far as a set of preconditions is satisfied by the current state. Here several rules can be associated to a given action, therefore allowing to model conditional effects. Learning is online, as examples result from actions performed by the agent, and incremental, as the current action model is revised each time it is contradicted by unexpected effects resulting from his actions. The form of the model allows using it as an input of standard planners. In this work, the learning unit IRALe 1 is embedded in an integrated system able to i) learn an action model ii) select its actions iii) plan to reach a goal. The agent uses the current action model to perform active learning, i.e. to select actions with the purpose of reaching states that will enforce a revision of the model, and uses its planning abilities to have a realistic evaluation of the accuracy of the model. 1

