Restless bandits with switching costs: Linear programming relaxations, performance bounds and limited lookahead policies (2006)

by J Le Ny, E Feron
Venue: In Proceedings of the American Control Conference

Sorted by:
Results 1 - 5 of 5

Multi-agent task assignment in the bandit framework

by Jerome Le Ny, Munther Dahleh, Eric Feron - in Proc. 45th IEEE Conf. Decision Control , 2006
Cited by 4 (2 self)
Abstract—We consider a task assignment problem for a fleet of UAVs in a surveillance/search mission. We formulate the problem as a restless bandits problem with switching costs and discounted rewards: there are N sites to inspect, each of them evolving as a Markov chain, with different transition probabilities depending on whether the site is inspected or not. The sites evolve independently of each other, there are transition costs c_ij for moving between sites i and j ∈ {1,...,N}, rewards when visiting the sites, and we maximize a mixed objective function of these costs and rewards. This problem is known to be PSPACE-hard. We present a systematic method, inspired by the work of Bertsimas and Niño-Mora [1] on restless bandits, for deriving a linear programming relaxation for such locally decomposable MDPs. The relaxation is computable offline in polynomial time, provides a bound on the achievable performance, as well as an approximation of the cost-to-go which can be used online in conjunction with standard suboptimal stochastic control methods. In particular, the one-step lookahead policy based on this approximate cost-to-go reduces to computing the optimal value of a linear assignment problem of size N. We present numerical experiments, in which we assess the quality of the heuristics using the performance bound.
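The abstract's key computational claim is that the one-step lookahead policy collapses to a size-N linear assignment problem. A minimal Python sketch of that final step, assuming a precomputed score matrix (the hypothetical `score[a][s]` combines immediate reward at site s, the switching cost for agent a, and the relaxation's approximate cost-to-go; none of these names come from the paper):

```python
from itertools import permutations

def one_step_lookahead_assignment(score):
    """Assign each agent to a distinct site, maximizing the total score.

    score[a][s] is a hypothetical one-step value for sending agent a to
    site s.  Brute force over permutations keeps the sketch dependency-
    free; a Hungarian-algorithm solver would replace it for larger N.
    """
    n = len(score)
    best_perm, best_val = None, float("-inf")
    for perm in permutations(range(n)):
        val = sum(score[a][perm[a]] for a in range(n))
        if val > best_val:
            best_perm, best_val = list(perm), val
    return best_perm, best_val
```

For example, `one_step_lookahead_assignment([[5, 1], [2, 4]])` keeps both agents at their high-score sites, returning `([0, 1], 9)`.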

Citation Context

...m (the multi-armed bandit problem with switching costs is NP-hard). In this paper, we extend our previous work on polynomial-time relaxations of the restless bandits problem with switching costs (RBSC) [7], from the single-agent to the multi-agent case. In Section II, we formulate the RBSC problem in the framework of Markov decision processes (MDP). In Section III, we show how the local structure of th...

Optimal Cooperative Internetwork Spectrum Sharing for Cognitive Radio Systems With Spectrum Pooling

by Pengbo Si, Hong Ji, Senior Member, F. Richardyu, Senior Member, Victor C. M. Leung
Cited by 3 (1 self)
Abstract—Spectrum pooling in cognitive radio systems is an approach to manage the available spectrum bands from different licensed networks. Most previous work on spectrum pooling concentrates on the system architecture and the design of flexible-access algorithms and schemes. In this paper, we present a cooperative scheme for internetwork spectrum sharing among multiple secondary systems, which takes the price and spectrum efficiency into account as design criteria. Specifically, the spectrum-sharing problem is formulated as a stochastic bandit system; thus, the optimal spectrum-sharing scheme is simply allocating the new available band to the secondary network with the lowest index. Extensive simulation examples illustrate that the proposed scheme significantly improves performance compared with an existing scheme that ignores optimal spectrum sharing. Index Terms—Cognitive radio (CR), spectrum sharing, stochastic bandits.
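The allocation rule the abstract arrives at is strikingly simple: give each new band to the secondary network with the lowest index. A sketch under that reading, assuming the indices themselves are already computed from the underlying stochastic bandit model (the paper derives them; here they are just given numbers, and the network names are made up):

```python
def allocate_band(indices):
    """Give a newly available band to the secondary network whose
    current bandit index is lowest.

    indices: hypothetical mapping from network id to its index value;
    computing these indices is the substance of the bandit formulation,
    which this sketch takes as input.
    """
    return min(indices, key=indices.get)
```

For instance, `allocate_band({"netA": 2.5, "netB": 0.7, "netC": 1.9})` returns `"netB"`.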

© 2015 INFORMS Learning from Experience, Simply

by Song Lin, Juanjuan Zhang, John R. Hauser
There is substantial academic interest in modeling consumer experiential learning. However, (approximately) optimal solutions to forward-looking experiential learning problems are complex, limiting their behavioral plausibility and empirical feasibility. We propose that consumers use cognitively simple heuristic strategies. We explore one viable heuristic, index strategies, and demonstrate that they are intuitive, tractable, and plausible. Index strategies are much simpler for consumers to use but provide close-to-optimal utility. They also avoid exponential growth in computational complexity, enabling researchers to study learning models in more complex situations. Well-defined index strategies depend on a structural property called indexability. We prove the indexability of a canonical forward-looking experiential learning model in which consumers learn brand quality while facing random utility shocks. Following an index strategy, consumers develop an index for each brand separately and choose the brand with the highest index. Using synthetic data, we demonstrate that an index strategy achieves nearly optimal utility at substantially lower computational cost. Using IRI data for diapers, we find that an index strategy performs as well as an approximately optimal solution and better than myopic learning. We extend the analysis to incorporate risk aversion, other cognitively simple heuristics, heterogeneous foresight, and an alternative specification of brands.
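The decision rule described above — compute an index per brand from that brand's posterior alone, then pick the highest — can be sketched with a toy index. The bonus formula below is an illustrative stand-in for the shape of such a policy (mean plus an uncertainty-driven exploration bonus), not the paper's actual index:

```python
import math

def brand_index(mean, var, noise_var=1.0):
    """Toy per-brand index: posterior mean of perceived quality plus an
    exploration bonus that grows with posterior uncertainty.  The exact
    bonus here is an assumption for illustration only."""
    return mean + math.sqrt(var / (var + noise_var)) * math.sqrt(var)

def choose_brand(posteriors, noise_var=1.0):
    """posteriors: {brand: (mean, var)}.  Each brand's index depends only
    on that brand's own posterior; the consumer picks the highest index."""
    return max(posteriors,
               key=lambda b: brand_index(*posteriors[b], noise_var=noise_var))
```

A consumer nearly certain that brand A yields utility 1.0 but very uncertain about brand B (mean 0.9, variance 4.0) would try B first, since its exploration bonus dominates; with equal uncertainty, the higher mean wins.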

A Linear Programming Relaxation and a Heuristic for the Restless Bandits Problem with General Switching Costs

by Jerome Le Ny, Munther Dahleh, Eric Feron , 2008
We extend a relaxation technique due to Bertsimas and Niño-Mora for the restless bandit problem to the case where arbitrary costs penalize switching between the bandits. We also construct a one-step lookahead policy using the solution of the relaxation. Computational experiments and a bound for approximate dynamic programming provide some empirical support for the heuristic.
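One practical use of such a relaxation, implied by the "bound" language above: since the LP value upper-bounds the achievable discounted reward, it certifies how far any heuristic can be from optimal without ever computing the (intractable) optimum. A trivial sketch of that certification step, with both input values assumed given:

```python
def optimality_gap(heuristic_value, relaxation_bound):
    """Relative gap between the discounted reward a heuristic achieves
    (e.g. a one-step lookahead policy, estimated by simulation) and the
    LP-relaxation upper bound.  A small gap certifies near-optimality
    even when the exact optimum is out of reach."""
    return (relaxation_bound - heuristic_value) / abs(relaxation_bound)
```

For example, a simulated heuristic value of 95.0 against a relaxation bound of 100.0 gives a gap of 0.05, i.e. the heuristic is provably within 5% of optimal.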

Citation Context

...ransition costs in a job search problem or transaction fees in a portfolio optimization problem. It is easy to see that the MABPSC is NP-hard, since the HAMILTON CYCLE problem is a special case of it [12]. The MABPSC has been studied in particular by Asawa and Teneketzis [1], and very recently by Glazebrook et al. [6] and Niño-Mora [13]. These authors are concerned with the case where the switching co...

[Thesis entry: title and abstract not recoverable from the scanned front matter ("Certified by ..." signature lines)]

by Jerome Le Ny, Emilio Frazzoli, Munther A. Dahleh, David L. Darmofal , 2008
