Results 1–10 of 104
Optimality of Myopic Sensing in Multichannel Opportunistic Access
, 2008
Abstract

Cited by 112 (38 self)
We consider opportunistic communication over multiple channels where the state ("good" or "bad") of each channel evolves as independent and identically distributed Markov processes. A user, with limited channel sensing and access capability, chooses one channel to sense and subsequently access (based on the sensed channel state) in each time slot. A reward is obtained whenever the user senses and accesses a "good" channel. The objective is to design an optimal channel selection policy that maximizes the expected total (discounted or average) reward accrued over a finite or infinite horizon. This problem can be cast as a Partially Observable Markov Decision Process (POMDP) or a restless multi-armed bandit process, for which optimal solutions are often intractable. We show in this paper that a myopic policy that maximizes the immediate one-step reward is always optimal when the state transitions are positively correlated over time. When the state transitions are negatively correlated, we show that the same policy is optimal when the number of channels is limited to 2 or 3, while presenting a counterexample for the case of 4 channels. This result finds applications in opportunistic transmission scheduling in a fading environment, cognitive radio networks for spectrum overlay, and resource-constrained jamming and anti-jamming.
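The belief update and myopic selection rule this abstract describes can be sketched in a few lines (an illustrative toy, not the paper's analysis; the transition probabilities p11 and p01 and the belief values are hypothetical):

```python
def myopic_select(beliefs):
    # Myopic policy: sense the channel whose conditional probability
    # of being "good" is highest.
    return max(range(len(beliefs)), key=lambda i: beliefs[i])

def belief_update(b, p11, p01):
    # One-step prediction for an unobserved channel:
    # P(good at t+1) = b * p11 + (1 - b) * p01.
    return b * p11 + (1.0 - b) * p01

# Hypothetical transition probabilities; p11 > p01 means the state is
# positively correlated over time (the regime where myopic is optimal).
p11, p01 = 0.8, 0.2
beliefs = [0.5, 0.7, 0.3]
chosen = myopic_select(beliefs)
# Suppose the sensed channel is observed "bad": its belief resets to p01,
# while the unobserved channels are propagated one step.
new_beliefs = [belief_update(b, p11, p01) for b in beliefs]
new_beliefs[chosen] = p01
```

Iterating this select/observe/update loop over slots yields the policy whose optimality the paper establishes.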
Indexability of Restless Bandit Problems and Optimality of Whittle's Index for Dynamic . . .
Abstract

Cited by 59 (13 self)
We consider a class of restless multi-armed bandit problems (RMBP) that arises in dynamic multichannel access, user/server scheduling, and optimal activation in multi-agent systems. For this class of RMBP, we establish the indexability and obtain Whittle's index in closed form for both discounted and average reward criteria. These results lead to a direct implementation of Whittle's index policy with remarkably low complexity. When arms are stochastically identical, we show that Whittle's index policy is optimal under certain conditions. Furthermore, it has a semi-universal structure that obviates the need to know the Markov transition probabilities. The optimality and the semi-universal structure result from the equivalence between Whittle's index policy and the myopic policy established in this work. For non-identical arms, we develop efficient algorithms for computing a performance upper bound given by Lagrangian relaxation. The tightness of the upper bound and the near-optimal performance of Whittle's index policy are illustrated with simulation examples.
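The semi-universal structure mentioned above can be sketched as a simple queue discipline for stochastically identical, positively correlated channels: stay on the current channel while it is observed good, and on a bad observation move it to the back of the rotation. This is only an illustrative sketch of the structure; the conditions and proofs are in the paper.

```python
from collections import deque

class SemiUniversalPolicy:
    """Sketch: sense the channel at the front of a rotation; keep it while
    observations are "good", and rotate it to the back on a "bad" observation.
    Note that no transition probabilities are needed."""

    def __init__(self, n_channels):
        self.order = deque(range(n_channels))  # front = channel to sense next

    def select(self):
        return self.order[0]

    def update(self, observed_good):
        if not observed_good:
            self.order.rotate(-1)  # demote the current channel to the back
```

For example, with three channels the policy stays on channel 0 while it is good and moves on to channel 1, then 2, as bad observations arrive.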
Learning Multiuser Channel Allocations in Cognitive Radio Networks: A Combinatorial Multi-Armed Bandit Formulation
Abstract

Cited by 39 (6 self)
Abstract—We consider the following fundamental problem in the context of channelized dynamic spectrum access. There are M secondary users and N ≥ M orthogonal channels. Each secondary user requires a single channel for operation that does not conflict with the channels assigned to the other users. Due to geographic dispersion, each secondary user can potentially see different primary user occupancy behavior on each channel. Time is divided into discrete decision rounds. The throughput obtainable from spectrum opportunities on each user-channel combination over a decision period is modeled as an arbitrarily distributed random variable with bounded support but unknown mean, i.i.d. over time. The objective is to search for an allocation of channels for all users that maximizes the expected sum throughput. We formulate this problem as a combinatorial multi-armed bandit (MAB), in which each arm corresponds to a matching of the users to channels. Unlike most prior work on multi-armed bandits, this combinatorial formulation results in dependent arms. Moreover, the number of arms grows super-exponentially as the permutation P(N, M). We present a novel matching-learning algorithm with polynomial storage and polynomial computation per decision period for this problem, and prove that it results in a regret (the gap between the expected sum throughput obtained by a genie-aided perfect allocation and that obtained by this algorithm) that is uniformly upper-bounded for all time n by a function that grows as O(M^4 N log n), i.e., polynomial in the number of unknown parameters and logarithmic in time. We also discuss how our results provide a non-trivial generalization of known theoretical results on multi-armed bandits.
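The matching view of the combinatorial bandit can be illustrated by computing optimistic (UCB-style) estimates per user-channel pair and then selecting a max-weight matching over them. This is a toy sketch under stated assumptions: the brute-force matching below is only workable for tiny M and N (the paper's algorithm uses a polynomial-time matching routine), and the confidence-bound constant is illustrative.

```python
import itertools
import math

def best_matching(weights, M, N):
    # Brute-force max-weight assignment of M users to N channels.
    # Exponential in M; stands in for a polynomial-time matching algorithm.
    best, best_val = None, -math.inf
    for perm in itertools.permutations(range(N), M):
        val = sum(weights[u][perm[u]] for u in range(M))
        if val > best_val:
            best, best_val = list(perm), val
    return best

def ucb_indices(means, counts, t):
    # Optimistic estimate per user-channel pair: empirical mean plus an
    # exploration bonus that shrinks as the pair is sampled more often.
    return [[means[u][c] + math.sqrt(2.0 * math.log(t) / max(counts[u][c], 1))
             for c in range(len(means[u]))]
            for u in range(len(means))]
```

Each round the learner would recompute `ucb_indices` from its observations and play the matching returned by `best_matching`; the dependence between arms enters because all matchings share the same M·N unknown means.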
Algorithms for dynamic spectrum access with learning for cognitive radio
 IEEE Transactions on Signal Processing
, 2010
Abstract

Cited by 36 (2 self)
We study the problem of dynamic spectrum sensing and access in cognitive radio systems as a partially observed Markov decision process (POMDP). A group of cognitive users cooperatively tries to exploit vacancies in some primary (licensed) channels whose occupancies follow a Markovian evolution. We first consider the scenario where the cognitive users have perfect knowledge of the distribution of the signals they receive from the primary users. For this problem, we obtain a greedy channel selection and access policy that maximizes the instantaneous reward, while satisfying a constraint on the probability of interfering with licensed transmissions. We also derive an analytical universal upper bound on the performance of the optimal policy. Through simulation, we show that our scheme achieves good performance relative to the upper bound and substantial improvement relative to an existing scheme. We then consider the more practical scenario where the exact distribution of the signal from the primary is unknown. We assume a parametric model for the distribution and develop an algorithm that can learn the true distribution, still guaranteeing the constraint on the interference probability. We show ...
Approximation algorithms for restless bandit problems
 CoRR
Abstract

Cited by 35 (4 self)
In this paper, we consider the restless bandit problem, which is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. In its ultimate generality, the restless bandit problem is known to be PSPACE-hard to approximate to any non-trivial factor, and little progress has been made on this problem despite its significance in modeling activity allocation under uncertainty. We make progress on this problem by showing that for an interesting and general subclass that we term Monotone bandits, a surprisingly simple and intuitive greedy policy yields a factor-2 approximation. Such greedy policies are termed index policies, and are popular due to their simplicity and their optimality for the stochastic multi-armed bandit problem. The Monotone bandit problem strictly generalizes the stochastic multi-armed bandit problem, and naturally models multi-project scheduling where the state of a project becomes increasingly uncertain when the project is not scheduled. We develop several novel techniques in the design and analysis of the index policy. Our algorithm proceeds by introducing a novel "balance" constraint to the dual of a well-known LP relaxation of the restless bandit problem. This is followed by a structural characterization of the optimal solution using both the exact primal and dual complementary slackness conditions. This yields an interpretation of the dual variables as potential functions, from which we derive the index policy and the associated analysis.
Delay analysis for cognitive radio networks with random access: A fluid queue view
 In Proc. IEEE INFOCOM
, 2010
Abstract

Cited by 32 (8 self)
Abstract—We consider a cognitive radio network where multiple secondary users (SUs) contend for spectrum usage, using random access, over available primary user (PU) channels. Our focus is on SUs' queueing delay performance, for which a systematic understanding is lacking. We take a fluid queue approximation approach to study the steady-state delay performance of SUs, for cases with a single PU channel and multiple PU channels. Using stochastic fluid models, we represent the queue dynamics as Poisson-driven stochastic differential equations, and characterize the moments of the SUs' queue lengths accordingly. Since in practical systems a secondary user would have no knowledge of other users' activities, its contention probability has to be set based on local information. With this observation, we develop adaptive algorithms to find the optimal contention probability that minimizes the mean queue lengths. Moreover, we study the impact of multiple channels and multiple interfaces on SUs' delay performance. As expected, the use of multiple channels and/or multiple interfaces leads to significant delay reduction.
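The contention-probability tuning behind the adaptive algorithms can be illustrated with the standard slotted-ALOHA trade-off (an illustrative model only, not the paper's fluid-queue machinery): with n symmetric contenders each transmitting with probability p, a tagged user's per-slot success probability is p(1-p)^(n-1), which is maximized at p = 1/n.

```python
def aloha_success_prob(p, n):
    # Probability that a tagged SU transmits and no one else does, when all
    # n contenders use the same access probability p (slotted-ALOHA model).
    return p * (1.0 - p) ** (n - 1)

def optimal_contention(n):
    # Setting d/dp [p * (1-p)^(n-1)] = 0 gives p* = 1/n.
    return 1.0 / n
```

An adaptive scheme in this spirit would estimate the effective number of contenders from local observations and drive p toward 1/n, which is what makes low-delay operation possible without global knowledge.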
Multichannel opportunistic access: a case of restless bandits with multiple plays
 Allerton Conference, Allerton, IL, October 2009
, 2009
Abstract

Cited by 27 (1 self)
This paper considers the following stochastic control problem that arises in opportunistic spectrum access: a system consists of n channels where the state (“good” or “bad”) of each channel evolves as independent and identically distributed Markov processes. A user can select exactly k channels to sense and access (based on the sensing result) in each time slot. A reward is obtained whenever the user senses and accesses a “good” channel. The objective is to design a channel selection policy that maximizes the expected discounted total reward accrued over a finite or infinite horizon. In our previous work we established the optimality of a greedy policy for the special case of k = 1 (i.e., single channel access) under the condition that the channel state transitions are positively correlated over time. In this paper we show under the same condition the greedy policy is optimal for the general case of k ≥ 1; the methodology introduced here is thus more general. This problem may be viewed as a special case of the restless bandit problem, with multiple plays. We discuss connections between the current problem and existing literature on this class of problems.
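For the general case of k plays, the greedy policy amounts to sensing the k channels with the highest conditional probability of being good. A minimal sketch (the belief values are hypothetical):

```python
def myopic_select_k(beliefs, k):
    # Greedy policy with multiple plays: sense the k channels with the
    # highest posterior probability of being "good".
    return sorted(range(len(beliefs)), key=lambda i: -beliefs[i])[:k]

# Hypothetical beliefs for four channels; with k = 2 the policy senses
# the two most promising channels.
picked = myopic_select_k([0.2, 0.9, 0.5, 0.7], 2)
```

With k = 1 this reduces to the single-channel myopic rule of the authors' earlier work.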
The Non-Bayesian Restless Multi-Armed Bandit: A Case of Near-Logarithmic Regret
 Proc. of International Conference on Acoustics, Speech and Signal Processing (ICASSP)
, 2011
Abstract

Cited by 25 (17 self)
In the classic Bayesian restless multi-armed bandit (RMAB) problem, there are N arms, with rewards on all arms evolving at each time as Markov chains with known parameters. A player seeks to activate K ≥ 1 arms at each time in order to maximize the expected total reward obtained over multiple plays. RMAB is a challenging problem that is known to be PSPACE-hard in general. We consider in this work the even harder non-Bayesian RMAB, in which the parameters of the Markov chain are assumed to be unknown a priori. We develop an original approach to this problem that is applicable when the corresponding Bayesian problem has the structure that, depending on the known parameter values, the optimal solution is one of a prescribed finite set of policies. In such settings, we propose to learn the optimal policy for the non-Bayesian RMAB by employing a suitable meta-policy which treats each policy from this finite set as an arm in a different non-Bayesian multi-armed bandit problem for which a single-arm selection policy is optimal. We demonstrate this approach by developing a novel sensing policy for opportunistic spectrum access over unknown dynamic channels. We prove that our policy achieves near-logarithmic regret (the difference in expected reward compared to a model-aware genie), which leads to the same average reward that can be achieved by the optimal policy under a known model. This is the first such result in the literature for a non-Bayesian RMAB.
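The meta-policy idea, treating each candidate policy in the prescribed finite set as an arm of an ordinary bandit, can be sketched as follows. UCB1 is used here purely for illustration; the paper employs its own single-arm selection policy with the regret guarantees stated above.

```python
import math

def meta_policy_select(rewards, plays, t):
    """Choose which candidate policy to run next, treating each policy in a
    finite set as an arm of an ordinary multi-armed bandit.
    rewards[i] = total reward accrued while running policy i,
    plays[i]   = number of rounds policy i has been run, t = current round."""
    # Run every policy at least once before comparing indices.
    for i, n in enumerate(plays):
        if n == 0:
            return i
    # UCB1-style index (illustrative): empirical mean + exploration bonus.
    return max(range(len(plays)),
               key=lambda i: rewards[i] / plays[i]
                             + math.sqrt(2.0 * math.log(t) / plays[i]))
```

The selected policy is then run for the round (or an epoch), its reward is credited to its arm, and the meta-level bandit gradually concentrates on the policy that is optimal for the unknown parameters.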
Optimal Cognitive Access of Markovian Channels under Tight Collision Constraints
Abstract

Cited by 23 (7 self)
Abstract—The problem of cognitive access of channels of primary users by a secondary user is considered. The transmissions of primary users are modeled as independent continuous-time Markovian on-off processes. A secondary cognitive user employs a slotted transmission format, and it senses one of the possible channels before transmission. The objective of the cognitive user is to maximize its throughput subject to collision constraints imposed by the primary users. The optimal access strategy is in general the solution of a constrained partially observable Markov decision process, which involves a constrained optimization in an infinite-dimensional functional space. It is shown in this paper that, when the collision constraints are tight, the optimal access strategy can be implemented by a simple memoryless access policy with periodic channel sensing. Analytical expressions are given for the thresholds on collision probabilities under which memoryless access performs optimally. Extensions to multiple secondary users are also presented. Numerical and theoretical results are presented to validate and extend the analysis for different practical scenarios. Index Terms—Cognitive radio, dynamic spectrum allocation, cognitive medium access, Markov decision processes.
A restless bandit formulation of opportunistic access: indexability and index policy
 in Proc. 5th IEEE Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON) Workshops
, 2008
Abstract

Cited by 21 (6 self)
Abstract—We focus on an opportunistic communication system consisting of multiple independent channels with time-varying states. With limited sensing, a user can only sense and access a subset of channels and accrue rewards determined by the state of the sensed channels. We formulate the problem of optimal sequential channel probing as a restless multi-armed bandit process, for which a powerful index policy, Whittle's index policy, can be implemented based on the indexability of the system. Exploiting the underlying structure of the multichannel opportunistic access problem, we establish the indexability and obtain Whittle's index in closed form, which leads to a direct implementation of Whittle's index policy with little complexity. Furthermore, we show that Whittle's index policy is equivalent to the myopic policy when channels are statistically identical. Index Terms—Opportunistic access, optimal channel probing, restless multi-armed bandit, Whittle's index policy, indexability.