Results 1–10 of 365
Algorithms for Inverse Reinforcement Learning
in Proc. 17th International Conf. on Machine Learning, 2000
"... This paper addresses the problem of inverse reinforcement learning (IRL) in Markov decision processes, that is, the problem of extracting a reward function given observed, optimal behaviour. IRL may be useful for apprenticeship learning to acquire skilled behaviour, and for ascertaining the re ..."
Abstract

Cited by 307 (6 self)
 Add to MetaCart
(Show Context)
This paper addresses the problem of inverse reinforcement learning (IRL) in Markov decision processes, that is, the problem of extracting a reward function given observed, optimal behaviour. IRL may be useful for apprenticeship learning to acquire skilled behaviour, and for ascertaining the reward function being optimized by a natural system. We first characterize the set of all reward functions for which a given policy is optimal. We then derive three algorithms for IRL. The first two deal with the case where the entire policy is known; we handle tabulated reward functions on a finite state space and linear functional approximation of the reward function over a potentially infinite state space. The third algorithm deals with the more realistic case in which the policy is known only through a finite set of observed trajectories. In all cases, a key issue is degeneracy: the existence of a large set of reward functions for which the observed policy is optimal. To remove...
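The characterization the abstract refers to can be exercised numerically. The sketch below, on a hypothetical 2-state, 2-action MDP (all names and numbers are illustrative, not taken from the paper), tests the condition that a policy pi is optimal for reward R iff (P_pi(s) - P_a(s)) . (I - gamma P_pi)^{-1} R >= 0 for every state s and action a, and also exhibits the degeneracy issue: the all-zero reward always passes.

```python
# Minimal feasibility check for IRL on a hypothetical 2-state MDP.
# States {0, 1}; action "stay" keeps the state, "switch" flips it.

GAMMA = 0.9
P = {  # transition rows per action: P[a][s] = distribution over next states
    "stay":   [[1.0, 0.0], [0.0, 1.0]],
    "switch": [[0.0, 1.0], [1.0, 0.0]],
}
PI = ["stay", "switch"]  # policy: stay in state 0, switch in state 1

def value(R):
    """Solve (I - gamma * P_pi) V = R by hand for the 2x2 case."""
    p = [P[PI[s]][s] for s in (0, 1)]          # rows of P_pi
    a = 1 - GAMMA * p[0][0]; b = -GAMMA * p[0][1]
    c = -GAMMA * p[1][0];    d = 1 - GAMMA * p[1][1]
    det = a * d - b * c
    return [( d * R[0] - b * R[1]) / det,
            (-c * R[0] + a * R[1]) / det]

def consistent(R):
    """True iff pi is optimal under reward R (the IRL feasibility test)."""
    V = value(R)
    for s in (0, 1):
        for act in P:
            gap = sum((P[PI[s]][s][j] - P[act][s][j]) * V[j] for j in (0, 1))
            if gap < -1e-12:
                return False
    return True

print(consistent([1.0, 0.0]))  # True: pi chases the reward in state 0
print(consistent([0.0, 0.0]))  # True: the degenerate all-zero reward
print(consistent([0.0, 1.0]))  # False: pi would be suboptimal
```

The second call is exactly the degeneracy the abstract names: many rewards, including the trivial one, rationalize the same policy.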
Kalman filtering with intermittent observations
IEEE Transactions on Automatic Control, 2004
"... Motivated by navigation and tracking applications within sensor networks, we consider the problem of performing Kalman filtering with intermittent observations. When data travel along unreliable communication channels in a large, wireless, multihop sensor network, the effect of communication delays ..."
Abstract

Cited by 279 (40 self)
 Add to MetaCart
Motivated by navigation and tracking applications within sensor networks, we consider the problem of performing Kalman filtering with intermittent observations. When data travel along unreliable communication channels in a large, wireless, multihop sensor network, the effect of communication delays and loss of information in the control loop cannot be neglected. We address this problem starting from the discrete Kalman filtering formulation, and modeling the arrival of the observation as a random process. We study the statistical convergence properties of the estimation error covariance, showing the existence of a critical value for the arrival rate of the observations, beyond which a transition to an unbounded state error covariance occurs. We also give upper and lower bounds on this expected state error covariance.
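The critical arrival rate can be seen in a scalar sketch. Assuming an unstable system x_{k+1} = a x_k + w with each observation arriving independently with probability lam (all constants below are illustrative), iterating the modified Riccati recursion for the expected error covariance shows bounded versus divergent behaviour on either side of the threshold lam_c = 1 - 1/a^2.

```python
# Scalar modified Riccati iteration for Kalman filtering with
# intermittently arriving observations (illustrative parameters).

def expected_covariance(lam, a=2.0, q=1.0, r=1.0, steps=200):
    """Iterate P <- a^2 P + q - lam * a^2 P^2 / (P + r)."""
    p = 1.0
    for _ in range(steps):
        p = a * a * p + q - lam * (a * a * p * p) / (p + r)
    return p

lam_c = 1 - 1 / 2.0 ** 2                  # 0.75 for a = 2
print(expected_covariance(0.9) < 100)     # True: bounded above the threshold
print(expected_covariance(0.5) > 1e6)     # True: divergent below it
```

For lam above lam_c the recursion settles near a finite fixed point; below it the covariance grows geometrically, matching the phase transition described in the abstract.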
Multiobjective output feedback control via LMI
in Proc. Amer. Contr. Conf., 1997
"... The problem of multiobjective H2=H1 optimal controller design is reviewed. There is as yet no exact solution to this problem. We present a method based on that proposed by Scherer [14]. The problem is formulated as a convex semidefinite program (SDP) using the LMI formulation of the H2 and H1 norms. ..."
Abstract

Cited by 212 (8 self)
 Add to MetaCart
The problem of multiobjective H2/H∞ optimal controller design is reviewed. There is as yet no exact solution to this problem. We present a method based on that proposed by Scherer [14]. The problem is formulated as a convex semidefinite program (SDP) using the LMI formulation of the H2 and H∞ norms. Suboptimal solutions are computed using finite-dimensional Q-parametrization. The objective value of the suboptimal Q's converges to the true optimum as the dimension of Q is increased. State space representations are presented which are the analog of those given by Khargonekar and Rotea [11] for the H2 case. A simple example computed using FIR (Finite Impulse Response) Q's is presented.
A Survey of Computational Complexity Results in Systems and Control, 2000
"... The purpose of this paper is twofold: (a) to provide a tutorial introduction to some key concepts from the theory of computational complexity, highlighting their relevance to systems and control theory, and (b) to survey the relatively recent research activity lying at the interface between these fi ..."
Abstract

Cited by 186 (18 self)
 Add to MetaCart
The purpose of this paper is twofold: (a) to provide a tutorial introduction to some key concepts from the theory of computational complexity, highlighting their relevance to systems and control theory, and (b) to survey the relatively recent research activity lying at the interface between these fields. We begin with a brief introduction to models of computation, the concepts of undecidability, polynomial time algorithms, NP-completeness, and the implications of intractability results. We then survey a number of problems that arise in systems and control theory, some of them classical, some of them related to current research. We discuss them from the point of view of computational complexity and also point out many open problems. In particular, we consider problems related to stability or stabilizability of linear systems with parametric uncertainty, robust control, time-varying linear systems, nonlinear and hybrid systems, and stochastic optimal control.
Robust Portfolio Selection Problems, 2003
"... In this paper we show how to formulate and solve robust portfolio selection problems. The objective of these robust formulations is to systematically combat the sensitivity of the optimal portfolio to statistical and modeling errors in the estimates of the relevant market parameters. We introduce “u ..."
Abstract

Cited by 156 (8 self)
 Add to MetaCart
In this paper we show how to formulate and solve robust portfolio selection problems. The objective of these robust formulations is to systematically combat the sensitivity of the optimal portfolio to statistical and modeling errors in the estimates of the relevant market parameters. We introduce “uncertainty structures” for the market parameters and show that the robust portfolio selection problems corresponding to these uncertainty structures can be reformulated as second-order cone programs and, therefore, the computational effort required to solve them is comparable to that required for solving convex quadratic programs. Moreover, we show that these uncertainty structures correspond to confidence regions associated with the statistical procedures employed to estimate the market parameters. Finally, we demonstrate a simple recipe for efficiently computing robust portfolios given raw market data and a desired level of confidence.
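A minimal sketch of why ellipsoidal uncertainty leads to second-order cone structure (illustrative numbers, not the paper's formulation): if the mean-return vector is only known to lie in {mu_hat + S u : ||u|| <= kappa}, the worst-case expected return of a fixed portfolio w is mu_hat . w - kappa * ||S^T w||, a linear term minus a norm term.

```python
# Worst-case expected return under a hypothetical ellipsoidal
# uncertainty set for the mean-return vector.
import math

def worst_case_return(w, mu_hat, S, kappa):
    nominal = sum(mi * wi for mi, wi in zip(mu_hat, w))
    # S^T w, assuming S is square with the same dimension as w
    St_w = [sum(S[i][j] * w[i] for i in range(len(w))) for j in range(len(w))]
    return nominal - kappa * math.sqrt(sum(x * x for x in St_w))

mu_hat = [0.10, 0.05]                 # hypothetical mean returns
S = [[0.02, 0.0], [0.0, 0.01]]        # hypothetical uncertainty shape
w = [0.5, 0.5]
print(worst_case_return(w, mu_hat, S, 0.0))  # nominal return (about 0.075)
```

Maximizing this quantity over w is a second-order cone program, which is why the robust formulations stay about as cheap as convex quadratic programs.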
Theory of semidefinite programming for sensor network localization
in SODA05, 2005
"... We analyze the semidefinite programming (SDP) based model and method for the position estimation problem in sensor network localization and other Euclidean distance geometry applications. We use SDP duality and interior–point algorithm theories to prove that the SDP localizes any network or graph th ..."
Abstract

Cited by 118 (9 self)
 Add to MetaCart
We analyze the semidefinite programming (SDP) based model and method for the position estimation problem in sensor network localization and other Euclidean distance geometry applications. We use SDP duality and interior-point algorithm theories to prove that the SDP localizes any network or graph that has unique sensor positions to fit given distance measures. Therefore, we show, for the first time, that these networks can be localized in polynomial time. We also give a simple and efficient criterion for checking whether a given instance of the localization problem has a unique realization in R^2 using graph rigidity theory. Finally, we introduce a notion called strong localizability and show that the SDP model will identify all strongly localizable sub-networks in the input network.
SDPA (SemiDefinite Programming Algorithm) User's Manual, Version 7.0.5, 2008
"... The SDPA (SemiDefinite Programming Algorithm) [5] is a software package for solving semidefinite programs (SDPs). It is based on a Mehrotratype predictorcorrector infeasible primaldual interiorpoint method. The SDPA handles the standard form SDP and its dual. It is implemented in C++ language u ..."
Abstract

Cited by 111 (32 self)
 Add to MetaCart
(Show Context)
The SDPA (SemiDefinite Programming Algorithm) [5] is a software package for solving semidefinite programs (SDPs). It is based on a Mehrotra-type predictor-corrector infeasible primal-dual interior-point method. The SDPA handles the standard form SDP and its dual. It is implemented in C++, utilizing LAPACK [1] for matrix computations. The SDPA version 7.0.5 enjoys the following features:
• Efficient method for computing the search directions when the SDP to be solved is large-scale and sparse [4].
• Block diagonal matrix structure and sparse matrix structure are supported for data matrices.
• Sparse or dense Cholesky factorization for the Schur matrix is automatically selected.
• An initial point can be specified.
• Some information on infeasibility of the SDP is provided.
This manual and the SDPA can be downloaded from the WWW site
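For orientation, a toy problem in the SDPA sparse input format (.dat-s) might look as follows. This is a hypothetical fragment recalled from the format's general shape (comment line; mDIM; nBLOCK; block structure; the c vector; then one line per nonzero entry as constraint index, block, row, column, value, with index 0 denoting the constant matrix); consult the manual itself for the authoritative definition.

```
"hypothetical 1-variable toy SDP in sparse format"
1  = mDIM
1  = nBLOCK
1  = bLOCKsTRUCT
{1.0}
0 1 1 1 1.0
1 1 1 1 1.0
```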
Research on gain scheduling, 2000
"... Current research on gain scheduling is clarifying customary practices as well as devising new approaches and methods for the design of nonlinear control systems. Gain scheduling for nonlinear controller design is described in terms of general features of the approach and in terms of early examples o ..."
Abstract

Cited by 92 (2 self)
 Add to MetaCart
Current research on gain scheduling is clarifying customary practices as well as devising new approaches and methods for the design of nonlinear control systems. Gain scheduling for nonlinear controller design is described in terms of general features of the approach and in terms of early examples of applications in flight control and automotive engine control. Then recent research is discussed, emphasizing work on linearization-based scheduling and work on linear parameter-varying approaches.
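The customary practice the abstract mentions can be sketched in a few lines: design fixed gains at several operating points, then interpolate between them as a measured scheduling variable moves. The variable names and numbers below are illustrative, not from any cited design.

```python
# Classical gain scheduling sketch: linear interpolation of a scalar
# gain over a grid of operating points (e.g. airspeeds).

def scheduled_gain(p, points, gains):
    """Interpolate the gain at scheduling variable p; clamp at the ends."""
    if p <= points[0]:
        return gains[0]
    if p >= points[-1]:
        return gains[-1]
    for p0, p1, k0, k1 in zip(points, points[1:], gains, gains[1:]):
        if p0 <= p <= p1:
            t = (p - p0) / (p1 - p0)
            return (1 - t) * k0 + t * k1

points = [100.0, 200.0, 300.0]   # hypothetical operating points
gains = [2.0, 1.2, 0.8]          # gains designed at each point
print(scheduled_gain(150.0, points, gains))   # 1.6, midway between 2.0 and 1.2
```

The linearization-based research the abstract highlights asks when such interpolated designs actually guarantee stability of the nonlinear closed loop.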
The Lyapunov exponent and joint spectral radius of pairs of matrices are hard, when not impossible, to compute and to approximate, 1997
"... We analyse the computability and the complexity of various definitions of spectral radii for sets of matrices. We show that the joint and generalized spectral radii of two integer matrices are not approximable in polynomial time, and that two related quantities  the lower spectral radius and th ..."
Abstract

Cited by 92 (18 self)
 Add to MetaCart
We analyse the computability and the complexity of various definitions of spectral radii for sets of matrices. We show that the joint and generalized spectral radii of two integer matrices are not approximable in polynomial time, and that two related quantities, the lower spectral radius and the largest Lyapunov exponent, are not algorithmically approximable.
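The hardness concerns the general problem; lower bounds are still easy to compute by brute force. The sketch below evaluates max ρ(A_{i1}...A_{ik})^{1/k} over all products up to a fixed length, a cost that grows as 2^k, which is one face of the intractability. The pair of matrices is chosen for illustration; it is a standard example whose joint spectral radius equals the golden ratio.

```python
# Brute-force lower bound on the joint spectral radius of two 2x2 matrices.
import cmath
from itertools import product as words

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def spectral_radius(A):
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))

def jsr_lower_bound(mats, max_len=8):
    best = 0.0
    for k in range(1, max_len + 1):
        for word in words(mats, repeat=k):   # all 2^k products of length k
            P = word[0]
            for M in word[1:]:
                P = matmul(P, M)
            best = max(best, spectral_radius(P) ** (1.0 / k))
    return best

A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]
print(jsr_lower_bound([A, B]))   # approaches the golden ratio, ~1.618
```

Upper bounds, and hence approximation to a prescribed accuracy, are exactly what the paper shows to be intractable in general.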
Bayesian inverse reinforcement learning
in 20th Int. Joint Conf. Artificial Intelligence, 2007
"... Inverse Reinforcement Learning (IRL) is the problem of learning the reward function underlying a Markov Decision Process given the dynamics of the system and the behaviour of an expert. IRL is motivated by situations where knowledge of the rewards is a goal by itself (as in preference elicitation) a ..."
Abstract

Cited by 80 (0 self)
 Add to MetaCart
Inverse Reinforcement Learning (IRL) is the problem of learning the reward function underlying a Markov Decision Process given the dynamics of the system and the behaviour of an expert. IRL is motivated by situations where knowledge of the rewards is a goal by itself (as in preference elicitation) and by the task of apprenticeship learning (learning policies from an expert). In this paper we show how to combine prior knowledge and evidence from the expert’s actions to derive a probability distribution over the space of reward functions. We present efficient algorithms that find solutions for the reward learning and apprenticeship learning tasks that generalize well over these distributions. Experimental results show strong improvement for our methods over previous heuristic-based approaches.
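The prior-times-likelihood combination can be illustrated on a toy problem (a sketch of the Bayesian idea only, not the paper's algorithm): score a few candidate reward functions for a 2-armed bandit by P(R | demo) ∝ P(demo | R) P(R), with a Boltzmann expert model P(a | R) ∝ exp(alpha * R[a]). All names, candidates, and constants are illustrative.

```python
# Toy Bayesian reward inference from an expert's observed choices.
import math

ALPHA = 2.0                      # assumed confidence in the expert
demo = [0, 0, 1, 0]              # expert mostly picks arm 0

def likelihood(R):
    """Boltzmann probability of the whole demonstration under reward R."""
    weights = [math.exp(ALPHA * r) for r in R]
    Z = sum(weights)
    return math.prod(weights[a] / Z for a in demo)

candidates = {"arm0 better": [1.0, 0.0],
              "arm1 better": [0.0, 1.0],
              "indifferent": [0.5, 0.5]}
prior = 1.0 / len(candidates)    # uniform prior over the candidates
post = {name: prior * likelihood(R) for name, R in candidates.items()}
Z = sum(post.values())
post = {name: p / Z for name, p in post.items()}
best = max(post, key=post.get)
print(best)                      # "arm0 better" gets most of the mass
```

The occasional off-policy action in the demonstration lowers but does not destroy the posterior mass on the reward the expert is actually optimizing, which is the robustness the probabilistic formulation buys.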