Optimality of Myopic Sensing in Multichannel Opportunistic Access
, 2008
Cited by 31 (17 self)
We consider opportunistic communication over multiple channels where the state (“good” or “bad”) of each channel evolves as independent and identically distributed Markov processes. A user, with limited channel sensing and access capability, chooses one channel to sense and subsequently access (based on the sensed channel state) in each time slot. A reward is obtained whenever the user senses and accesses a “good” channel. The objective is to design an optimal channel selection policy that maximizes the expected total (discounted or average) reward accrued over a finite or infinite horizon. This problem can be cast as a Partially Observable Markov Decision Process (POMDP) or a restless multi-armed bandit process, to which optimal solutions are often intractable. We show in this paper that a myopic policy that maximizes the immediate one-step reward is always optimal when the state transitions are positively correlated over time. When the state transitions are negatively correlated, we show that the same policy is optimal when the number of channels is limited to 2 or 3, while presenting a counterexample for the case of 4 channels. This result finds applications in opportunistic transmission scheduling in a fading environment, cognitive radio networks for spectrum overlay, and resource-constrained jamming and anti-jamming.
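The myopic policy described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the names `p11` (probability that a good channel stays good), `p01` (probability that a bad channel becomes good), `propagate`, and `myopic_step` are assumptions introduced here, and the belief for each channel is taken to be the probability that it is good in the current slot.

```python
import random

def propagate(belief, p11, p01):
    """One-step belief update for a channel that was not sensed."""
    return belief * p11 + (1.0 - belief) * p01

def myopic_step(beliefs, states, p11, p01, rng):
    """Sense the channel with the highest belief and update all beliefs.

    beliefs: probability that each channel is good in this slot
    states:  true (hidden) channel states, True meaning good
    """
    # Myopic choice: maximize the immediate expected reward.
    choice = max(range(len(beliefs)), key=lambda i: beliefs[i])
    reward = 1 if states[choice] else 0

    new_beliefs = []
    for i, b in enumerate(beliefs):
        if i == choice:
            # The sensed channel's state is known exactly for this slot.
            new_beliefs.append(p11 if states[i] else p01)
        else:
            new_beliefs.append(propagate(b, p11, p01))

    # Hidden states evolve independently of the sensing decision.
    new_states = [rng.random() < (p11 if s else p01) for s in states]
    return choice, reward, new_beliefs, new_states
```

In this sketch the positively correlated case of the abstract corresponds to `p11 >= p01`. Repeated calls to `myopic_step` simulate the policy over a horizon.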
Approximation Algorithms for Partial-information based Stochastic Control with Markovian Rewards
Cited by 16 (1 self)
We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according to an underlying on/off Markov process with known parameters. The evolution of the Markov chain happens irrespective of whether the arm is played, and furthermore, the exact state of the Markov chain is only revealed to the player when the arm is played and the reward observed. At most one arm (or in general, M arms) can be played at any time step. The goal is to design a policy for playing the arms in order to maximize the infinite horizon time average expected reward. This problem is an instance of a Partially Observable Markov Decision Process (POMDP), and a special case of the notoriously intractable “restless bandit” problem. Unlike the stochastic MAB problem, the FEEDBACK MAB problem does not admit greedy index-based optimal policies. The state of the system at any time step encodes the beliefs about the states of different arms, and the policy decisions change these beliefs; this aspect complicates the design and analysis of simple algorithms. We design a constant factor approximation to the FEEDBACK MAB problem by solving and rounding a natural LP relaxation to this problem. As far as we are aware, this is the first approximation algorithm for a POMDP problem.
Dynamic Allocation Indices For Restless Projects And Queueing Admission Control: A Polyhedral Approach
, 2002
Cited by 14 (8 self)
This paper develops a polyhedral approach to the design, analysis, and computation of dynamic allocation indices for scheduling binary-action (engage/rest) Markovian stochastic projects which can change state when rested (restless bandits (RBs)), based on partial conservation laws (PCLs). This extends previous work by the author [J. Niño-Mora (2001): Restless bandits, partial conservation laws and indexability. Adv. Appl. Probab. 33, 76-98], where PCLs were shown to imply the optimality of index policies with a postulated structure in stochastic scheduling problems, under admissible linear objectives, and they were deployed to obtain simple sufficient conditions for the existence of Whittle's (1988) RB index (indexability), along with an adaptive-greedy index algorithm. The new contributions include: (i) we develop the polyhedral foundation of the PCL framework, based on the structural and algorithmic properties of a new polytope associated with an accessible set system (J, F) (F-extended polymatroid); (ii) we present new dynamic allocation indices for RBs, motivated by an admission control model, which extend Whittle's and have a significantly increased scope; (iii) we deploy PCLs to obtain both sufficient conditions for the existence of the new indices (PCL-indexability), and a new adaptive-greedy index algorithm; (iv) we interpret PCL-indexability as a form of the classic economics law of diminishing marginal returns, and characterize the index as an optimal marginal cost rate; we further solve a related optimal constrained control problem; (v) we carry out a PCL-indexability analysis of the motivating admission control model, under time-discounted and long-run average criteria; this gives, under mild conditions, a new index characterization of optimal threshold...
Adapting to a Changing Environment: the Brownian Restless Bandits
Cited by 8 (3 self)
In the multi-armed bandit (MAB) problem there are k distributions associated with the rewards of playing each of k strategies (slot machine arms). The reward distributions are initially unknown to the player. The player iteratively plays one strategy per round, observes the associated reward, and decides on the strategy for the next iteration. The goal is to maximize the reward by balancing exploitation (the use of acquired information) with exploration (learning new information). We introduce and study a dynamic MAB problem in which the reward functions stochastically and gradually change in time. Specifically, the expected reward of each arm follows a Brownian motion, a discrete random walk, or similar processes. In this setting a player has to continuously keep exploring in order to adapt to the changing environment. Our formulation is (roughly) a special case of the notoriously intractable restless MAB problem. Our goal here is to characterize the cost of learning and adapting to the changing environment, in terms of the stochastic rate of the change. We consider an infinite time horizon, and strive to minimize the average cost per step, which we define with respect to a hypothetical algorithm that at every step plays the arm with the maximum expected reward at this step. A related line of work on the adversarial MAB problem used a significantly weaker benchmark, the best time-invariant policy. The dynamic MAB problem models a variety of practical online, game-against-nature type optimization settings. While building on prior work, algorithms and steady-state analysis for the dynamic setting require a novel approach based on different stochastic tools.
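The benchmark in this abstract, the average per-step cost relative to an algorithm that always plays the arm with the highest current expected reward, can be made concrete with a small sketch. Everything here is an illustrative assumption rather than the paper's construction: the function names, the Gaussian step size `sigma`, and the choice to clip the random walk to [0, 1].

```python
import random

def random_walk_means(k, steps, sigma, rng):
    """Expected rewards of k arms over time, each a random walk clipped to [0, 1]."""
    means = [rng.random() for _ in range(k)]
    path = []
    for _ in range(steps):
        path.append(list(means))
        # Assumed dynamics: Gaussian increments, clipped at the boundaries.
        means = [min(1.0, max(0.0, m + rng.gauss(0.0, sigma))) for m in means]
    return path

def average_cost(path, choices):
    """Average per-step gap between the best arm's expected reward and the chosen arm's."""
    total = sum(max(means) - means[arm] for means, arm in zip(path, choices))
    return total / len(path)
```

Given a trajectory from `random_walk_means` and the sequence of arms a policy played, `average_cost` returns zero only when the policy matched the best arm at every step, so it measures the price of having to keep exploring.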
Restless bandit marginal productivity indices, diminishing returns, and scheduling a multiclass make-to-order/-stock queue
- In Proceedings of the 41st Annual Allerton Conference on Communications, Control and Computing
, 2003
Cited by 6 (5 self)
doi 10.1287/moor.1050.0165, © 2006 INFORMS
This paper presents a framework grounded on convex optimization and economics ideas to solve, by index policies, problems of optimal dynamic allocation of effort to a discrete-state (finite or countable) binary-action (work/rest) semi-Markov restless bandit project, elucidating issues raised by previous work. Its contributions include: (i) the concept of a restless bandit’s marginal productivity index (MPI), characterizing optimal policies relative to general cost and work measures; (ii) the characterization of indexable restless bandits as those satisfying diminishing marginal returns to work, consistently with a nested family of threshold policies; (iii) sufficient indexability conditions via partial conservation laws (PCLs); (iv) the characterization of the MPI as an optimal marginal productivity rate relative to feasible active-state sets; (v) application to semi-Markov bandits under several criteria, including a new mixed average-bias criterion; and (vi) PCL-indexability analyses and MPIs for optimal service control of make-to-order/make-to-stock queues with convex holding costs, under discounted and average-bias criteria. Key words: restless bandits; stochastic scheduling; index policies; indexability; control by price; semi-Markov decision processes; dynamic resource allocation; diminishing returns; marginal productivity; efficient frontier; convex optimization.
Dynamic multichannel access with imperfect channel state detection
- IEEE Trans. Signal Process
, 2010
Cited by 5 (3 self)
A restless multi-armed bandit problem that arises in multichannel opportunistic communications is considered, where channels are modeled as independent and identical Gilbert–Elliot channels and channel state detection is subject to errors. A simple structure of the myopic policy is established under a certain condition on the false alarm probability of the channel state detector. It is shown that myopic actions can be obtained by maintaining a simple channel ordering without knowing the underlying Markovian model. The optimality of the myopic policy is proved for the case of two channels and conjectured for general cases. Lower and upper bounds on the performance of the myopic policy are obtained in closed form, which characterize the scaling behavior of the achievable throughput of the multichannel opportunistic system. The approximation factor of the myopic policy is also analyzed to bound its worst-case performance loss with respect to the optimal performance. Index terms: cognitive radio, dynamic multichannel access, myopic policy, restless multi-armed bandit.
The economics of attention: maximizing user value in information rich environments
- The First International Workshop on Data Mining and Audience Intelligence for Advertising (ADKDD’07)
, 2006
Cited by 3 (2 self)
We introduce an automatic configuration mechanism that generates the most relevant information to be presented to limited-attention users of information-rich media. It is also guaranteed to maximize their total expected utility from the information they receive. A computationally efficient algorithm is used to assign an index value to each information item, which then determines whether or not a given item appears in the top list presented to users at a given time.
Mortal Multi-Armed Bandits
Cited by 3 (0 self)
We formulate and study a new variant of the k-armed bandit problem, motivated by e-commerce applications. In our model, arms have a (stochastic) lifetime after which they expire. In this setting an algorithm needs to continuously explore new arms, in contrast to the standard k-armed bandit model in which arms are available indefinitely and exploration is reduced once an optimal arm is identified with near-certainty. The main motivation for our setting is online advertising, where ads have a limited lifetime due to, for example, the nature of their content and their campaign budgets. An algorithm needs to choose among a large collection of ads, more than can be fully explored within the typical ad lifetime. We present an optimal algorithm for the state-aware (deterministic reward function) case, and build on this technique to obtain an algorithm for the state-oblivious (stochastic reward function) case. Empirical studies on various reward distributions, including one derived from a real-world ad serving application, show that the proposed algorithms significantly outperform the standard multi-armed bandit approaches applied to these settings.
MULTI-ARMED BANDIT PROBLEMS
Cited by 2 (0 self)
Multi-armed bandit (MAB) problems are a class of sequential resource allocation problems concerned with allocating one or more resources among several alternative (competing) projects. Such problems are paradigms of a fundamental conflict between making decisions (allocating resources) that yield
Characterization and computation of restless bandit marginal productivity indices. SMCtools ’07
- Proc. 2007 Workshop on Tools for Solving Structured Markov Chains
Cited by 1 (1 self)
Appl. Probab. 25A, 287-298] yields a practical scheduling rule for the versatile yet intractable multi-armed restless bandit problem, involving the optimal dynamic priority allocation to multiple stochastic projects, modeled as restless bandits, i.e., binary-action (active/passive) (semi-)Markov decision processes. A growing body of evidence shows that such a rule is nearly optimal in a wide variety of applications, which raises the need to efficiently compute the Whittle index and more general marginal productivity index (MPI) extensions in large-scale models. For such a purpose, this paper extends to restless bandits the parametric linear programming (LP) approach deployed in [J. Niño-Mora. A (2/3)n^3 fast-pivoting algorithm for the Gittins index and optimal stopping of a Markov chain, INFORMS J. Comp., in press], which yielded a fast Gittins-index algorithm. Yet the extension is not straightforward, as the MPI is only defined for the limited range of so-called indexable bandits, which motivates the quest for methods to establish indexability. This paper furnishes algorithmic and analytical tools to realize the potential of MPI policies in large-scale applications, presenting the following contributions: (i) a complete algorithmic

