Ontogenetic and Phylogenetic Reinforcement Learning
Cited by 4 (1 self)
Reinforcement learning (RL) problems come in many flavours, as do the algorithms for solving them. It is currently not clear which of the commonly used RL benchmarks best measure an algorithm’s capacity for solving real-world problems. Similarly, it is not clear which types of RL algorithms are best suited to which kinds of RL problems. Here we present some dimensions along which RL problems and algorithms can be varied to help distinguish them from each other. Based on results and arguments in the literature, we present some conjectures as to which algorithms should work best for particular types of problems, and argue that tunable RL benchmarks are needed in order to further understand the capabilities of RL algorithms.
Evolving memory cell structures for sequence learning
The best recent supervised sequence learning methods use gradient descent to train networks of miniature nets called memory cells. The most popular cell structure seems somewhat arbitrary though. Here we optimize its topology with a multi-objective evolutionary algorithm. The fitness function reflects the structure’s usefulness for learning various formal languages. The evolved cells help to understand crucial structural features that aid sequence learning.
Artificial Intelligence and Robotics Lab. ∗Illinois Genetic Algorithms Lab.
2009
We applied on-line neuroevolution to evolve non-player characters for The Open Racing Car Simulator. While previous approaches allowed on-line learning with performance improvements during each generation, our approach enables finer-grained on-line learning with performance improvements within each lap. We tested our approach on three tracks using two methods of on-line neuroevolution (NEAT and rtNEAT) combined with four evaluation strategies (ε-Greedy, ε-Greedy-Improved, Softmax, and Interval-based) taken from the literature. We compared the eight resulting configurations on several driving tasks involving (i) the learning of a driving behavior for a specific track, (ii) its adaptation to a new track, and (iii) the generalization capability to unknown tracks. The results we present show that our approach can successfully evolve drivers from scratch and can also be used to transfer evolved knowledge to other tracks. Overall, our results suggest that the approach performs significantly better when coupled with on-line NEAT and also indicate that ε-Greedy-Improved and Softmax are generally better than the other evaluation strategies.
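As a rough illustration of two of the evaluation strategies named in this abstract, the sketch below shows textbook ε-greedy and softmax (Boltzmann) selection over a list of fitness estimates. It is not the authors' implementation: the function names, the `temperature` parameter, and the idea of selecting among candidate individuals by index are assumptions made for illustration, and the ε-Greedy-Improved and Interval-based variants are not reproduced because the abstract does not describe them.

```python
import math
import random


def epsilon_greedy(fitness, epsilon, rng):
    """Return the index of the best candidate with probability 1 - epsilon,
    otherwise the index of a uniformly random candidate (hypothetical helper)."""
    if rng.random() < epsilon:
        return rng.randrange(len(fitness))
    return max(range(len(fitness)), key=lambda i: fitness[i])


def softmax_probs(fitness, temperature):
    """Boltzmann selection probabilities. The maximum is subtracted before
    exponentiating for numerical stability; this does not change the result."""
    top = max(fitness)
    weights = [math.exp((f - top) / temperature) for f in fitness]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_select(fitness, temperature, rng):
    """Sample a candidate index in proportion to its Boltzmann probability."""
    probs = softmax_probs(fitness, temperature)
    return rng.choices(range(len(fitness)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    estimates = [1.0, 3.0, 2.0]  # made-up fitness estimates, not data from the paper
    print(epsilon_greedy(estimates, 0.1, rng))
    print(softmax_select(estimates, 0.5, rng))
```

A lower `temperature` makes softmax selection behave more greedily, while a higher one approaches uniform sampling; ε plays the analogous role for ε-greedy.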
The Exploration/Exploitation Trade-off in Reinforcement Learning for Dialogue Management
Conversational systems use deterministic rules that trigger actions such as requests for confirmation or clarification. More recently, Reinforcement Learning and (Partially Observable) Markov Decision Processes have been proposed for this task. In this paper, we investigate action selection strategies for dialogue management, in particular the exploration/exploitation trade-off and its impact on final reward (i.e. the session reward after optimization has ended) and lifetime reward (i.e. the overall reward accumulated over the learner’s lifetime). We propose to use interleaved exploitation sessions as a learning methodology to assess the reward obtained from the current policy. The experiments show a statistically significant difference in final reward of exploitation-only sessions between a system that optimizes lifetime reward and one that maximizes the reward of the final policy.
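The distinction between lifetime reward and final reward, and the idea of interleaving exploitation-only assessments, can be sketched on a toy problem. The code below is a heavily simplified stand-in, not the paper's dialogue system: it assumes a Bernoulli multi-armed bandit in place of dialogue sessions, an ε-greedy learner, and assessments that report the expected reward of the current greedy policy without adding to the lifetime total. The function name, `arm_means`, and `eval_every` are all hypothetical.

```python
import random


def run_learner(n_sessions, epsilon, eval_every, rng, arm_means=(0.2, 0.8)):
    """Train an epsilon-greedy learner on a Bernoulli bandit (toy assumption).

    Returns (lifetime_reward, exploitation_rewards), where exploitation_rewards
    holds one assessment of the greedy policy every `eval_every` sessions.
    """
    values = [0.0] * len(arm_means)   # running mean reward per arm
    counts = [0] * len(arm_means)
    lifetime_reward = 0.0
    exploitation_rewards = []

    def greedy_arm():
        return max(range(len(values)), key=lambda i: values[i])

    for session in range(1, n_sessions + 1):
        # Learning session: explore with probability epsilon, else exploit.
        arm = rng.randrange(len(arm_means)) if rng.random() < epsilon else greedy_arm()
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        lifetime_reward += reward

        # Interleaved exploitation-only assessment of the current policy.
        # Assumption: it is evaluation-only and uses the arm's true mean.
        if session % eval_every == 0:
            exploitation_rewards.append(arm_means[greedy_arm()])

    return lifetime_reward, exploitation_rewards


if __name__ == "__main__":
    lifetime, assessments = run_learner(200, 0.1, 20, random.Random(1))
    print("lifetime reward:", lifetime)
    print("final reward (last assessment):", assessments[-1])
```

In this toy setting, the last entry of `exploitation_rewards` plays the role of the final reward, while `lifetime_reward` also pays for every exploratory choice made along the way.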

