Results

### Adaptive Concentration Inequalities for Sequential Decision Problems

A key challenge in sequential decision problems is to determine how many samples are needed for an agent to make reliable decisions with good probabilistic guarantees. We introduce Hoeffding-like concentration inequalities that hold for a random, adaptively chosen number of samples. Our inequalities are tight under natural assumptions and can greatly simplify the analysis of common sequential decision problems. In particular, we apply them to sequential hypothesis testing, best arm identification, and sorting. The resulting algorithms rival or exceed the state of the art both theoretically and empirically.
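To illustrate why a bound that holds "for a random, adaptively chosen number of samples" matters, here is a minimal sketch of a sequential test that stops as soon as a confidence interval excludes a threshold. It is not the authors' inequality: the abstract describes tighter bounds, whereas this sketch assumes samples in [0, 1] and uses a cruder union-bound construction (spending δ/(n(n+1)) of the failure budget at step n) so that the interval holds for all n at once and therefore at the data-dependent stopping time. The function names `anytime_radius` and `sequential_test` are hypothetical.

```python
import math
import random
from typing import Callable, Tuple


def anytime_radius(n: int, delta: float) -> float:
    """Two-sided Hoeffding radius for the mean of n samples in [0, 1].

    Step n is allotted failure probability delta / (n * (n + 1)); because these
    sum to delta over n = 1, 2, ..., a union bound makes the interval hold for
    every n simultaneously (a crude stand-in for a tight adaptive bound).
    """
    return math.sqrt(math.log(2.0 * n * (n + 1) / delta) / (2.0 * n))


def sequential_test(sample: Callable[[], float], threshold: float = 0.5,
                    delta: float = 0.05, max_samples: int = 100_000) -> Tuple[str, int]:
    """Draw samples until the anytime interval lies entirely above or below threshold."""
    total = 0.0
    for n in range(1, max_samples + 1):
        total += sample()
        mean = total / n
        r = anytime_radius(n, delta)
        if mean - r > threshold:
            return "above", n
        if mean + r < threshold:
            return "below", n
    return "undecided", max_samples


# Usage: a simulated coin with mean 0.9, tested against threshold 0.5.
rng = random.Random(0)
print(sequential_test(lambda: 1.0 if rng.random() < 0.9 else 0.0))
```

The number of samples used is itself random: a coin far from the threshold stops early, one close to it keeps sampling. A fixed-n Hoeffding bound would not justify stopping this way, which is the gap that adaptive concentration inequalities address.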

### Intelligent Approaches for Communication Denial

2015

Spectrum supremacy is a vital part of security in the modern era. In the past 50 years, a great deal of work has been devoted to designing defenses against attacks from malicious nodes (e.g., anti-jamming), while significantly less work has been devoted to the equally important task of designing effective strategies for denying communication between enemy nodes/radios within an area (e.g., jamming). Such denial techniques are especially useful in military applications and intrusion detection systems where untrusted communication must be stopped. In this dissertation, we study these offensive attack procedures, collectively termed communication denial. The communication denial strategies studied here are useful not only for undermining communication between enemy nodes but also for analyzing the vulnerabilities of existing systems. Most works that address communication denial assume that knowledge about the enemy nodes is available a priori. However, recent advances in communication systems create the potential for dynamic environmental conditions in which it is difficult, and most likely impossible, to obtain a priori information about the environment and the nodes present in it. Therefore, it is necessary to have cognitive capabilities that enable the attacker to learn ...

### An optimal algorithm for the Thresholding Bandit Problem

Maurilio Gutzeit

We study a specific combinatorial pure exploration stochastic bandit problem where the learner aims at finding the set of arms whose means are above a given threshold, up to a given precision, and for a fixed time horizon. We propose a parameter-free algorithm based on an original heuristic, and prove that it is optimal for this problem by deriving matching upper and lower bounds. To the best of our knowledge, this is the first non-trivial pure exploration setting with fixed budget for which optimal strategies are constructed.
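The abstract does not spell out the heuristic, so the following is only a sketch of the problem setting under stated assumptions: Bernoulli arms, threshold `tau`, precision `eps`, and a fixed budget of pulls. The sampling rule pulls the arm minimising the index sqrt(T_i) · (|μ̂_i − τ| + ε), where T_i is the number of pulls of arm i and μ̂_i its empirical mean. We believe this index is close to the rule in the paper, but it is an assumption here, not a quotation of it. `threshold_bandit` is a hypothetical name.

```python
import math
import random
from typing import List, Set, Tuple


def threshold_bandit(means: List[float], tau: float, eps: float, budget: int,
                     seed: int = 0) -> Tuple[Set[int], List[int]]:
    """Fixed-budget thresholding on simulated Bernoulli arms (illustrative sketch)."""
    k = len(means)
    if budget < k:
        raise ValueError("budget must allow one pull per arm")
    rng = random.Random(seed)
    counts = [0] * k
    sums = [0.0] * k

    def pull(i: int) -> None:
        # Simulated Bernoulli reward; a real learner would query the environment.
        counts[i] += 1
        sums[i] += 1.0 if rng.random() < means[i] else 0.0

    for i in range(k):  # one initial pull per arm so every empirical mean exists
        pull(i)
    for _ in range(budget - k):
        # Assumed index: sqrt(T_i) * (|mu_hat_i - tau| + eps). Arms that look close
        # to tau and have few pulls get small indices, so they are sampled next.
        index = [math.sqrt(counts[i]) * (abs(sums[i] / counts[i] - tau) + eps)
                 for i in range(k)]
        pull(min(range(k), key=index.__getitem__))
    # Classify each arm by which side of the threshold its empirical mean falls on.
    above = {i for i in range(k) if sums[i] / counts[i] >= tau}
    return above, counts


above, counts = threshold_bandit([0.1, 0.2, 0.8, 0.9], tau=0.5, eps=0.1, budget=2000)
print(above, counts)
```

The rule needs no knowledge of the arm gaps or the budget beyond `tau` and `eps`, which is in the spirit of the "parameter-free" claim: effort flows automatically toward arms whose side of the threshold is still uncertain.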

### Online Learning with Feedback Graphs Without the Graphs

We study an online learning framework introduced by ...

### Regulation of Exploration for Simple Regret Minimization in Monte-Carlo Tree Search

The application of multi-armed bandit (MAB) algorithms was a critical step in the development of Monte-Carlo tree search (MCTS). One example is the UCT algorithm, which applies the UCB bandit algorithm. Much research has since been conducted on applying other bandit algorithms to MCTS. Simple regret bandit algorithms, which aim to identify the optimal arm after a number of trials, have attracted great interest in various fields in recent years. However, simple regret bandit algorithms tend to spend more time sampling suboptimal arms, which may be a problem in the context of game tree search. In this research, we propose combined confidence bounds, which utilize the characteristics of the confidence bounds of the improved UCB and UCB√· algorithms to regulate exploration for simple regret minimization in MCTS. We demonstrate that the combined confidence bounds bandit algorithm has better empirical performance than the UCB algorithm on the MAB problem. We show that combined confidence bounds MCTS (CCB-MCTS) outperforms plain UCT on the game of 9 × 9 Go and shows good scalability. We also show that the performance of CCB-MCTS can be further enhanced by applying the all-moves-as-first (AMAF) heuristic.
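The abstract gives no formula for the combined confidence bounds, so this sketch does not attempt them. Instead it shows the baseline the abstract compares against: the UCB1 index (empirical mean plus sqrt(c · ln t / n_i)) that UCT applies at each tree node. It also shows how the simple regret of the final recommendation is measured, using simulated Bernoulli arms. Recommending the most-pulled arm is one common convention and is assumed here. `ucb1_simple_regret` and the exploration constant `c = 2.0` are illustrative choices.

```python
import math
import random
from typing import List, Tuple


def ucb1_simple_regret(means: List[float], rounds: int, c: float = 2.0,
                       seed: int = 0) -> Tuple[int, float, List[int]]:
    """Run UCB1 on simulated Bernoulli arms; report the simple regret of its recommendation."""
    k = len(means)
    rng = random.Random(seed)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # play every arm once before using the index
        else:
            # UCB1 index: empirical mean + sqrt(c * ln t / n_i), as UCT uses per node.
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(c * math.log(t) / counts[i]))
        counts[arm] += 1
        sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
    # Assumed convention: recommend the most-pulled arm. Simple regret is the gap
    # between the best true mean and the recommended arm's true mean.
    recommended = max(range(k), key=counts.__getitem__)
    simple_regret = max(means) - means[recommended]
    return recommended, simple_regret, counts


rec, regret, pulls = ucb1_simple_regret([0.2, 0.5, 0.8], rounds=5000)
print(rec, regret, pulls)
```

UCB1 is designed for cumulative regret, so it concentrates pulls on the apparent best arm. Pure simple-regret strategies instead spread more pulls over suboptimal arms, which is the tendency the abstract says CCB is meant to regulate.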