Results 1–10 of 16
Output space search for structured prediction
In ICML, 2012
Abstract

Cited by 10 (4 self)
We consider a framework for structured prediction based on search in the space of complete structured outputs. Given a structured input, an output is produced by running a time-bounded search procedure guided by a learned cost function, and then returning the least-cost output uncovered during the search. This framework can be instantiated for a wide range of search spaces and search procedures, and easily incorporates arbitrary structured-prediction loss functions. In this paper, we make two main technical contributions. First, we define the limited-discrepancy search space over structured outputs, which is able to leverage powerful classification learning algorithms to improve the search space quality. Second, we give a generic cost function learning approach, where the key idea is to learn a cost function that attempts to mimic the behavior of conducting searches guided by the true loss function. Our experiments on six benchmark domains demonstrate that using our framework with only a small amount of search is sufficient for significantly improving on state-of-the-art structured-prediction performance.
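The time-bounded search described in this abstract can be sketched in a few lines. The cost function below is a toy stand-in (Hamming distance to a hidden target tag sequence) for the learned cost function, and `flips` is a hypothetical neighborhood generator over binary tag sequences; the paper's actual search spaces and procedures differ.

```python
def search_best_output(x, initial_output, cost_fn, neighbors_fn, budget=50):
    """Time-bounded search over complete structured outputs: expand
    candidates, track the least-cost output uncovered so far, and stop
    when the step budget is exhausted (a sketch, not the paper's exact
    procedure)."""
    best = initial_output
    best_cost = cost_fn(x, best)
    frontier = [best]
    seen = {best}
    steps = 0
    while frontier and steps < budget:
        current = frontier.pop(0)
        for cand in neighbors_fn(current):
            if cand in seen:
                continue
            seen.add(cand)
            steps += 1
            c = cost_fn(x, cand)
            if c < best_cost:          # keep the least-cost output found
                best, best_cost = cand, c
                frontier.append(cand)  # only improving outputs are expanded
            if steps >= budget:
                break
    return best, best_cost

# Toy stand-in for a learned cost function: Hamming distance to a hidden
# target tag sequence; neighbors flip one binary tag at a time.
target = (1, 0, 1, 1)
cost = lambda x, y: sum(a != b for a, b in zip(y, target))
def flips(y):
    for i in range(len(y)):
        yield y[:i] + (1 - y[i],) + y[i + 1:]

best, best_cost = search_best_output(None, (0, 0, 0, 0), cost, flips)
```

With a budget of 50 steps the search reaches the zero-cost output on this toy instance; under a tighter budget it would simply return the best output uncovered so far, which is the point of the framework.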
HC-Search: A learning framework for search-based structured prediction
In JAIR
Abstract

Cited by 7 (3 self)
Structured prediction is the problem of learning a function that maps structured inputs to structured outputs. Prototypical examples of structured prediction include part-of-speech tagging and semantic segmentation of images. Inspired by the recent successes of search-based structured prediction, we introduce a new framework for structured prediction called HC-Search. Given a structured input, the framework uses a search procedure guided by a learned heuristic H to uncover high-quality candidate outputs and then employs a separate learned cost function C to select a final prediction among those outputs. The overall loss of this prediction architecture decomposes into the loss due to H not leading to high-quality outputs, and the loss due to C not selecting the best among the generated outputs. Guided by this decomposition, we minimize the overall loss in a greedy stage-wise manner by first training H to quickly uncover high-quality outputs via imitation learning, and then training C to correctly rank the outputs generated via H according to their true losses. Importantly, this training procedure is sensitive to the particular loss function of interest and the time-bound allowed for predictions. Experiments on several benchmark domains show that our approach significantly outperforms several state-of-the-art methods.
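The H/C separation can be illustrated with a minimal sketch. Outputs here are plain integers and both H and C are toy distance functions (assumptions for illustration, not the paper's learned models): H greedily steers the search, and C picks the best output anywhere along the trajectory H produced.

```python
def hc_search(x, initial, heuristic, cost, neighbors, steps=10):
    """HC-Search sketch: a heuristic H guides a greedy search that
    collects candidate outputs; a separate cost function C selects the
    final prediction among all candidates uncovered."""
    current = initial
    candidates = [current]
    for _ in range(steps):
        nbrs = list(neighbors(current))
        if not nbrs:
            break
        current = min(nbrs, key=lambda y: heuristic(x, y))  # H guides
        candidates.append(current)
    return min(candidates, key=lambda y: cost(x, y))        # C selects

# Toy instance: H steers toward 9 while the true cost prefers 6;
# C rescues the best output that H's trajectory passed through.
H = lambda x, y: abs(y - 9)
C = lambda x, y: abs(y - 6)
pred = hc_search(None, 0, H, C, lambda y: [y - 1, y + 1], steps=10)
```

Even though H never stops at the cost-optimal output, C recovers it from the candidate set, mirroring the loss decomposition in the abstract: H's job is only to make good outputs reachable, C's job is to rank them.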
Learned Prioritization for Trading Off Accuracy and Speed
, 2012
Abstract

Cited by 6 (0 self)
Users want natural language processing (NLP) systems to be both fast and accurate, but quality often comes at the cost of speed. The field has been manually exploring various speed-accuracy tradeoffs for particular problems or datasets. We aim to explore this space automatically, focusing here on the case of agenda-based syntactic parsing (Kay, 1986). Unfortunately, off-the-shelf reinforcement learning techniques fail to learn good policies: the state space is too large to explore naively. We propose a hybrid reinforcement/apprenticeship learning algorithm that, even with few inexpensive features, can automatically learn weights that achieve competitive accuracies at significant improvements in speed over state-of-the-art baselines.
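Agenda-based parsing pops items from a priority queue, so the speed side of the trade-off is governed by how many pops the learned priority function needs before reaching a goal item. A generic sketch (the items, expansion rule, and priority here are toy assumptions, not the parser's actual chart items):

```python
import heapq

def agenda_search(items, priority, goal_test, expand, max_pops=1000):
    """Agenda-based search sketch: items are popped in order of a
    (learned) priority score; a good priority function reaches a goal
    item in few pops, which is where the speed savings come from."""
    counter = 0
    agenda = []
    for it in items:
        heapq.heappush(agenda, (-priority(it), counter, it))
        counter += 1
    pops = 0
    while agenda and pops < max_pops:
        _, _, item = heapq.heappop(agenda)
        pops += 1
        if goal_test(item):
            return item, pops
        for new in expand(item):
            heapq.heappush(agenda, (-priority(new), counter, new))
            counter += 1
    return None, pops

# Toy run: integer items, expand n -> n + 1, goal is 5; the priority
# rewards being close to the goal, so the agenda marches straight there.
parsed, pops = agenda_search([0], lambda n: -abs(n - 5),
                             lambda n: n == 5, lambda n: [n + 1])
```

A worse priority function would still find the goal but only after many more pops; learning the priority weights is what the paper automates.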
Learning from Demonstrations: Is It Worth Estimating a Reward Function?
Abstract

Cited by 6 (5 self)
This paper provides a comparative study between Inverse Reinforcement Learning (IRL) and Apprenticeship Learning (AL). IRL and AL are two frameworks, based on Markov Decision Processes (MDPs), for the imitation learning problem, in which an agent tries to learn from demonstrations of an expert. In the AL framework, the agent tries to learn the expert policy, whereas in the IRL framework, the agent tries to learn a reward which can explain the behavior of the expert. This reward is then optimized to imitate the expert. One can wonder whether it is worth estimating such a reward, or whether estimating a policy is sufficient. This quite natural question has not really been addressed in the literature so far. We provide partial answers, both from a theoretical and an empirical point of view.
Preference elicitation and inverse reinforcement learning
, 2011
Abstract

Cited by 5 (0 self)
We state the problem of inverse reinforcement learning in terms of preference elicitation, resulting in a principled (Bayesian) statistical formulation. This generalises previous work on Bayesian inverse reinforcement learning and allows us to obtain a posterior distribution on the agent’s preferences, policy, and (optionally) the obtained reward sequence, from observations. We examine the relation of the resulting approach to other statistical methods for inverse reinforcement learning via analysis and experimental results. We show that preferences can be determined accurately, even if the observed agent’s policy is suboptimal with respect to its own preferences. In that case, significantly improved policies with respect to the agent’s preferences are obtained, compared both to other methods and to the performance of the demonstrated policy.
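The Bayesian posterior-over-preferences idea can be sketched with a discrete set of candidate reward functions and a simple Boltzmann demonstration likelihood (both are illustrative assumptions; the paper's formulation is over richer preference structures):

```python
import math

def reward_posterior(rewards, demos, likelihood, prior=None):
    """Bayesian-IRL-as-preference-elicitation sketch: update a posterior
    over a discrete set of candidate reward functions from demonstrated
    (state, action) pairs (a hypothetical simplification)."""
    n = len(rewards)
    post = list(prior) if prior else [1.0 / n] * n
    for s, a in demos:
        # Bayes rule: reweight each candidate by how likely it makes
        # the demonstrated action, then renormalize.
        post = [p * likelihood(r, s, a) for p, r in zip(post, rewards)]
        z = sum(post)
        post = [p / z for p in post]
    return post

# Two candidate rewards over actions L/R; the softmax likelihood assumes
# the demonstrator usually (not always) picks the higher-reward action,
# so even a suboptimal demonstrator shifts the posterior correctly.
cands = [{"L": 1.0, "R": 0.0}, {"L": 0.0, "R": 1.0}]
lik = lambda r, s, a: math.exp(r[a]) / (math.exp(r["L"]) + math.exp(r["R"]))
post = reward_posterior(cands, [(0, "R")] * 3, lik)
```

After three R-demonstrations the posterior concentrates on the candidate that rewards R, which is the mechanism behind the abstract's claim that preferences can be recovered from a suboptimal demonstrator.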
Active Imitation Learning via Reduction to I.I.D. Active Learning
Abstract

Cited by 5 (1 self)
In standard passive imitation learning, the goal is to learn a target policy by passively observing full execution trajectories of it. Unfortunately, generating such trajectories can require substantial expert effort and be impractical in some cases. In this paper, we consider active imitation learning with the goal of reducing this effort by querying the expert about the desired action at individual states, which are selected based on answers to past queries and the learner’s interactions with an environment simulator. We introduce a new approach based on reducing active imitation learning to i.i.d. active learning, which can leverage progress in the i.i.d. setting. Our first contribution is to analyze reductions for both non-stationary and stationary policies, showing that the label complexity (number of queries) of active imitation learning can be substantially less than that of passive learning. Our second contribution is to introduce a practical algorithm inspired by the reductions, which is shown to be highly effective in four test domains compared to a number of alternatives.
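The reduction can be sketched as a query loop over a pool of states treated as i.i.d. unlabeled examples. The uncertainty score below (distance to the nearest labeled state) is a toy assumption standing in for whatever i.i.d. active-learning criterion the reduction plugs in:

```python
def active_imitation(states, expert, uncertainty, budget):
    """Active imitation sketch: treat states as an i.i.d. pool of
    unlabeled examples and query the expert's action only at the state
    the learner is currently most uncertain about."""
    labeled = {}
    pool = list(states)
    for _ in range(budget):
        if not pool:
            break
        s = max(pool, key=lambda t: uncertainty(t, labeled))
        labeled[s] = expert(s)  # one expert query per iteration
        pool.remove(s)
    return labeled

# Toy uncertainty: distance to the nearest already-labeled state, so
# queries spread out over the state pool instead of clustering.
gap = lambda s, labeled: min((abs(s - t) for t in labeled),
                             default=float("inf"))
labels = active_imitation(range(10), lambda s: s % 2, gap, budget=4)
```

Four queries label the most informative states under this toy criterion; the label-complexity claim in the abstract is that, for suitable criteria, this can need far fewer expert answers than observing full trajectories.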
A cascaded supervised learning approach to inverse reinforcement learning
In European Conference on Machine Learning (ECML), 2013
Abstract

Cited by 4 (3 self)
This paper considers the Inverse Reinforcement Learning (IRL) problem, that is, inferring a reward function for which a demonstrated expert policy is optimal. We propose to break the IRL problem down into two generic Supervised Learning steps: this is the Cascaded Supervised IRL (CSI) approach. A classification step that defines a score function is followed by a regression step providing a reward function. A theoretical analysis shows that the demonstrated expert policy is near-optimal for the computed reward function. Not needing to repeatedly solve a Markov Decision Process (MDP) and the ability to leverage existing techniques for classification and regression are two important advantages of the CSI approach. It is furthermore empirically demonstrated to compare positively to state-of-the-art approaches when using only transitions sampled according to the expert policy, up to the use of some heuristics. This is exemplified on two classical benchmarks (the mountain car problem and a highway driving simulator).
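The hand-off between the two supervised steps can be sketched as follows. Treating the classification score q(s, a) like an action-value function, a Bellman-like residual turns it into reward targets for the regression step; the tabular score function below is a toy assumption, not the paper's learned classifier:

```python
def csi_reward_samples(transitions, actions, q, gamma=0.9):
    """CSI sketch: from a classification score q(s, a) trained to imitate
    the expert, derive reward regression targets via the residual
    r(s, a) = q(s, a) - gamma * max_b q(s', b); the full algorithm then
    fits a regressor to these ((state, action) -> target) pairs."""
    samples = []
    for s, a, s_next in transitions:
        target = q(s, a) - gamma * max(q(s_next, b) for b in actions)
        samples.append(((s, a), target))
    return samples

# Toy tabular score function on two states and two actions.
table = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.5, (1, 1): 0.2}
q = lambda s, a: table[(s, a)]
samples = csi_reward_samples([(0, 0, 1)], [0, 1], q)
```

On the single transition (s=0, a=0, s'=1) the target is 1.0 − 0.9 × 0.5 = 0.55. Neither step solves an MDP, which is the advantage the abstract highlights.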
Boosted and Reward-regularized Classification for Apprenticeship Learning
Abstract

Cited by 3 (1 self)
This paper deals with the problem of learning from demonstrations, where an agent called the apprentice tries to learn a behavior from demonstrations of another agent called the expert. To address this problem, we place ourselves in the Markov Decision Process (MDP) framework, which is well suited for sequential decision-making problems. One way to tackle this problem is to reduce it to classification, but in doing so we do not take into account the MDP structure. Other methods which take the MDP structure into account need to solve MDPs, which is a difficult task, and/or need a choice of features which is problem-dependent. The main contribution of the paper is to extend a large-margin approach, which is a classification method, by adding a regularization term which takes into account the MDP structure. The derived algorithm, called Reward-regularized Classification for Apprenticeship Learning (RCAL), does not need to solve MDPs. The major advantage, however, is that it can be boosted: this avoids the choice of features, which is a drawback of parametric approaches. A state-of-the-art experiment (Highway) and generic experiments (structured Garnets) are conducted to show the performance of RCAL compared to algorithms from the literature.
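The shape of such an objective can be sketched as a large-margin classification loss plus a term that injects the MDP structure through the implied rewards. This is a hypothetical simplification: the margin form, the residual regularizer, and the tabular q below are illustrative assumptions, not RCAL's exact boosted objective.

```python
def margin_plus_reward_reg(q, expert_pairs, transitions, actions,
                           gamma=0.99, lam=0.1):
    """Sketch of a reward-regularized classification objective: a
    hinge-style margin loss on expert (state, action) pairs, plus a
    penalty on the implied rewards q(s,a) - gamma*max_b q(s',b), which
    is one way MDP structure can enter a classification problem."""
    margin_loss = 0.0
    for s, a in expert_pairs:
        # the expert action should beat every other action by margin 1
        runner_up = max(q(s, b) + (0.0 if b == a else 1.0) for b in actions)
        margin_loss += runner_up - q(s, a)
    reg = sum(abs(q(s, a) - gamma * max(q(s2, b) for b in actions))
              for s, a, s2 in transitions)
    return margin_loss + lam * reg

# Toy tabular score: the expert action (0 in state 0) already clears the
# margin, so only the reward-regularization term contributes.
table = {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
q = lambda s, a: table[(s, a)]
loss = margin_plus_reward_reg(q, [(0, 0)], [(0, 0, 1)], [0, 1])
```

Here the margin term is zero and the regularizer contributes 0.1 × |2.0 − 0| = 0.2; minimizing such an objective trades pure classification fit against implied-reward simplicity.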
Predicting when to laugh with structured classification
 In Proc. of Interspeech
Abstract

Cited by 1 (1 self)
Today, Embodied Conversational Agents (ECAs) are emerging as natural media to interact with machines. Applications are numerous, and ECAs can reduce the technological gap between people by providing user-friendly interfaces. Yet, ECAs are still unable to produce social signals appropriately during their interaction with humans, which tends to make the interaction less instinctive. In particular, very little attention has been paid to the use of laughter in human-avatar interactions despite the crucial role played by laughter in human-human interaction. In this paper, a method for predicting the most appropriate moment for an ECA to laugh is proposed. Imitation learning via a structured classification algorithm is used for this purpose and is shown to produce a behavior similar to humans’ in a practical application: the yes/no game.
Action-Gap Phenomenon in Reinforcement Learning
Abstract

Cited by 1 (0 self)
Many practitioners of reinforcement learning have observed that oftentimes the performance of the agent gets very close to the optimal performance even though the estimated (action-)value function is still far from the optimal one. The goal of this paper is to explain and formalize this phenomenon by introducing the concept of the action-gap regularity. As a typical result, we prove that for an agent following the greedy policy π̂ with respect to an action-value function Q̂, the performance loss E[V*(X) − V^π̂(X)] is upper bounded by O(‖Q̂ − Q*‖∞^(1+ζ)), in which ζ ≥ 0 is the parameter quantifying the action-gap regularity. For ζ > 0, our results indicate a smaller performance loss than previous analyses had suggested. Finally, we show how this regularity affects the performance of the family of approximate value iteration algorithms.
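The phenomenon is easy to reproduce numerically: when the gap between the best and second-best action values exceeds twice the estimation error, the greedy policy with respect to an inaccurate Q̂ is still optimal, so the performance loss is zero even though ‖Q̂ − Q*‖ is not. The two-state, two-action table below is a toy illustration, not an example from the paper.

```python
import numpy as np

# Two states (rows) x two actions (cols); the best action in each state
# leads by a gap of 0.8, well above twice the noise level eps = 0.1.
Q_star = np.array([[1.0, 0.2],
                   [0.9, 0.1]])
eps = 0.1
rng = np.random.default_rng(0)
Q_hat = Q_star + rng.uniform(-eps, eps, size=Q_star.shape)

# Greedy actions under the noisy estimate, and the resulting loss of
# value relative to acting greedily under the true Q*.
greedy_hat = Q_hat.argmax(axis=1)
loss = float((Q_star.max(axis=1)
              - Q_star[np.arange(2), greedy_hat]).mean())
```

Since every perturbation is below 0.1 in magnitude while the action gap is 0.8, the argmax is unchanged in both states and the performance loss is exactly zero, which is the ζ > 0 intuition in miniature.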