Results 1 - 5 of 5
Multi-agent task assignment in the bandit framework
in Proc. 45th IEEE Conf. Decision Control, 2006
"... Abstract—We consider a task assignment problem for a fleet of UAVs in a surveillance/search mission. We formulate the problem as a restless bandits problem with switching costs and discounted rewards: there are N sites to inspect, each one of them evolving as a Markov chain, with different transitio ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
(Show Context)
Abstract—We consider a task assignment problem for a fleet of UAVs in a surveillance/search mission. We formulate the problem as a restless bandits problem with switching costs and discounted rewards: there are N sites to inspect, each evolving as a Markov chain with different transition probabilities depending on whether the site is inspected or not. The sites evolve independently of each other, there are transition costs c_ij for moving between sites i and j ∈ {1, ..., N} and rewards for visiting the sites, and we maximize a mixed objective function of these costs and rewards. This problem is known to be PSPACE-hard. We present a systematic method, inspired by the work of Bertsimas and Niño-Mora [1] on restless bandits, for deriving a linear programming relaxation for such locally decomposable MDPs. The relaxation is computable offline in polynomial time, provides a bound on the achievable performance, and yields an approximation of the cost-to-go which can be used online in conjunction with standard suboptimal stochastic control methods. In particular, the one-step lookahead policy based on this approximate cost-to-go reduces to computing the optimal value of a linear assignment problem of size N. We present numerical experiments, for which we assess the quality of the heuristics using the performance bound.
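As a sketch of the assignment step the abstract describes (not the authors' code): with an approximate cost-to-go v_hat from the LP relaxation in hand, the one-step lookahead reduces to a size-N linear assignment problem, which SciPy's Hungarian-algorithm routine can solve. All names, shapes, and the discount factor below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def one_step_lookahead(pos, rewards, c, v_hat, beta=0.95):
    """One-step lookahead as a linear assignment problem.

    pos     : current site index of each UAV (length M <= N)
    rewards : expected one-step reward for inspecting each site (length N)
    c       : c[i, j] = switching cost for moving from site i to site j
    v_hat   : approximate cost-to-go per site (stand-in for the LP output)
    """
    M = len(pos)
    score = np.empty((M, len(rewards)))
    for u in range(M):
        # Immediate reward at each site, minus the cost of moving UAV u
        # there, plus the discounted approximate continuation value.
        score[u] = rewards - c[pos[u]] + beta * v_hat
    # linear_sum_assignment minimizes, so negate to maximize total score.
    rows, cols = linear_sum_assignment(-score)
    return dict(zip(rows, cols))  # UAV u -> assigned site

# Toy instance: two UAVs at sites 0 and 2, four sites, distance-based costs.
pos = np.array([0, 2])
rewards = np.array([1.0, 0.4, 0.7, 0.9])
c = 0.1 * np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
print(one_step_lookahead(pos, rewards, c, np.zeros(4)))
```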
Optimal Cooperative Internetwork Spectrum Sharing for Cognitive Radio Systems With Spectrum Pooling
"... Abstract—Spectrum pooling in cognitive radio systems is an approach to manage the available spectrum bands from different licensed networks. Most previous work on spectrum pooling concentrates on the system architecture and the design of flexibleaccess algorithms and schemes. In this paper, we prese ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
Abstract—Spectrum pooling in cognitive radio systems is an approach to manage the available spectrum bands from different licensed networks. Most previous work on spectrum pooling concentrates on the system architecture and the design of flexible-access algorithms and schemes. In this paper, we present a cooperative scheme for internetwork spectrum sharing among multiple secondary systems, which takes price and spectrum efficiency into account as the design criteria. Specifically, the spectrum-sharing problem is formulated as a stochastic bandit system, so the optimal spectrum-sharing scheme reduces to allocating the newly available band to the secondary network with the lowest index. Extensive simulation examples illustrate that the proposed scheme significantly improves performance compared with an existing scheme that ignores optimal spectrum sharing. Index Terms—Cognitive radio (CR), spectrum sharing, stochastic bandits.
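A minimal sketch of the allocation rule the abstract states: assign a newly available band to the secondary network with the lowest index. The index function and the toy price/efficiency data below are placeholders, since the paper's exact index is not given in the abstract.

```python
def allocate_band(networks, index_of):
    """Give a newly vacated band to the secondary network with the lowest
    index. index_of is a placeholder for the paper's priority index."""
    return min(networks, key=index_of)

# Illustrative only: a toy index trading off price against spectrum efficiency.
networks = [
    {"name": "A", "price": 3.0, "efficiency": 0.9},
    {"name": "B", "price": 2.0, "efficiency": 0.5},
]
winner = allocate_band(networks, lambda n: n["price"] / n["efficiency"])
print(winner["name"])
```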
Learning from Experience, Simply
INFORMS, 2015
"... There is substantial academic interest in modeling consumer experiential learning. However, (approximately)optimal solutions to forward-looking experiential learning problems are complex, limiting their behavioral plausibility and empirical feasibility. We propose that consumers use cognitively simp ..."
Abstract
- Add to MetaCart
There is substantial academic interest in modeling consumer experiential learning. However, (approximately) optimal solutions to forward-looking experiential learning problems are complex, limiting their behavioral plausibility and empirical feasibility. We propose that consumers use cognitively simple heuristic strategies. We explore one viable heuristic—index strategies—and demonstrate that they are intuitive, tractable, and plausible. Index strategies are much simpler for consumers to use but provide close-to-optimal utility. They also avoid exponential growth in computational complexity, enabling researchers to study learning models in more complex situations. Well-defined index strategies depend on a structural property called indexability. We prove the indexability of a canonical forward-looking experiential learning model in which consumers learn brand quality while facing random utility shocks. Following an index strategy, consumers develop an index for each brand separately and choose the brand with the highest index. Using synthetic data, we demonstrate that an index strategy achieves nearly optimal utility at substantially lower computational costs. Using IRI data for diapers, we find that an index strategy performs as well as an approximately optimal solution and better than myopic learning. We extend the analysis to incorporate risk aversion, other cognitively simple heuristics, heterogeneous foresight, and an alternative specification of brands.
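To make the index strategy concrete, here is a hedged sketch: each brand gets an index computed only from that brand's own posterior over quality, and the consumer picks the argmax. The specific bonus term is an illustrative upper-confidence stand-in, not the paper's derived index.

```python
import numpy as np

def brand_indices(means, variances, beta=0.9):
    """One index per brand, computed separately from that brand's posterior.
    The uncertainty bonus is a toy stand-in for the paper's index."""
    return means + np.sqrt(beta * variances / (1.0 - beta))

means = np.array([0.6, 0.5, 0.7])         # posterior mean quality per brand
variances = np.array([0.20, 0.05, 0.01])  # posterior variance per brand
choice = int(np.argmax(brand_indices(means, variances)))
print(choice)  # the consumer picks the brand with the highest index
```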
A Linear Programming Relaxation and a Heuristic for the Restless Bandits Problem with General Switching Costs
2008
"... We extend a relaxation technique due to Bertsimas and Niño-Mora for the restless bandit problem to the case where arbitrary costs penalize switching between the bandits. We also construct a one-step lookahead policy using the solution of the relaxation. Computational experiments and a bound for appr ..."
Abstract
- Add to MetaCart
(Show Context)
We extend a relaxation technique due to Bertsimas and Niño-Mora for the restless bandit problem to the case where arbitrary costs penalize switching between the bandits. We also construct a one-step lookahead policy using the solution of the relaxation. Computational experiments and a bound for approximate dynamic programming provide some empirical support for the heuristic.
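A hedged sketch of the one-step lookahead policy this abstract describes, for a single decision maker: pick the next bandit by trading immediate reward against the switching cost and a discounted approximate cost-to-go from the relaxation. The signature and discount factor are assumptions for illustration.

```python
def next_bandit(current, rewards, c, v_hat, beta=0.95):
    """Engage the bandit j maximizing immediate reward, minus the cost of
    switching from the current bandit, plus the discounted approximate
    cost-to-go v_hat[j]. All inputs are illustrative stand-ins for the
    quantities the relaxation computes offline."""
    return max(range(len(rewards)),
               key=lambda j: rewards[j] - c[current][j] + beta * v_hat[j])
```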