Results 1 - 10 of 160
The ant colony optimization meta-heuristic
In New Ideas in Optimization, 1999
Cited by 252 (22 self)
Ant algorithms are multi-agent systems in which the behavior of each single agent, called artificial ant or ant for short in the following, is inspired by the behavior of real ants. Ant algorithms are one of the most successful examples of swarm intelligent systems [3], and have been applied to many types of problems, ranging from the classical traveling salesman
Bayesian Learning in Negotiation
1996
Cited by 71 (6 self)
Recent growing interest in autonomous interacting software agents and their potential application in areas such as electronic commerce [Sandholm & Lesser 1995] has given increased importance to automated negotiation. Much DAI and game theoretic research [Rosenschein & Zlotkin 1994; Osborne & Rubinstein 1994] deals with coordination and negotiation issues by giving pre-computed solutions to specific problems. There has been much research reported on developing theoretical models in which learning plays an eminent role, especially in the area of adaptive dynamics of games (e.g., [Jordan 1992; Kalai & Lehrer 1993]). However, to build autonomous agents that improve their negotiation competence based on learning from their interactions with other agents is still an emerging area. We are interested in developing autonomous agents capable of reasoning based on experience and improving their negotiation behavior incrementally. Learning in negotiation is closely coupled with...
How to Dynamically Merge Markov Decision Processes
1997
Cited by 55 (1 self)
We are frequently called upon to perform multiple tasks that compete for our attention and resource. Often we know the optimal solution to each task in isolation; in this paper, we describe how this knowledge can be exploited to efficiently find good solutions for doing the tasks in parallel. We formulate this problem as that of dynamically merging multiple Markov decision processes (MDPs) into a composite MDP, and present a new theoretically-sound dynamic programming algorithm for finding an optimal policy for the composite MDP. We analyze various aspects of our algorithm and illustrate its use on a simple merging problem. Every day, we are faced with the problem of doing multiple tasks in parallel, each of which competes for our attention and resource. If we are running a job shop, we must decide which machines to allocate to which jobs, and in what order, so that no jobs miss their deadlines. If we are a mail delivery robot, we must find the intended recipients of the mail while simul...
A Bayesian framework for reinforcement learning
In Proceedings of the Seventeenth International Conference on Machine Learning, 2000
Cited by 48 (1 self)
The reinforcement learning problem can be decomposed into two parallel types of inference: (i) estimating the parameters of a model for the underlying process; (ii) determining behavior which maximizes return under the estimated model. Following Dearden, Friedman and Andre (1999), it is proposed that the learning process estimates online the full posterior distribution over models. To determine behavior, a hypothesis is sampled from this distribution and the greedy policy with respect to the hypothesis is obtained by dynamic programming. By using a different hypothesis for each trial appropriate exploratory and exploitative behavior is obtained. This Bayesian method always converges to the optimal policy for a stationary process with discrete states.
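The sample-then-act loop described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: it assumes a small tabular MDP with known rewards, independent Dirichlet posteriors over transition probabilities with uniform pseudo-counts, and plain value iteration as the dynamic programming step. All function and parameter names (`sample_model`, `greedy_policy`, `run_trials`, `gamma`, `sweeps`) are hypothetical.

```python
import random


def sample_model(counts, rng):
    """Draw one transition model (one "hypothesis") from the posterior.

    Assumption: counts[s][a][s2] holds Dirichlet pseudo-counts plus observed
    transitions; a Dirichlet draw is obtained by normalizing Gamma draws.
    """
    model = []
    for state_counts in counts:
        rows = []
        for action_counts in state_counts:
            draws = [rng.gammavariate(c, 1.0) for c in action_counts]
            total = sum(draws)
            rows.append([d / total for d in draws])
        model.append(rows)
    return model


def greedy_policy(model, rewards, gamma=0.95, sweeps=200):
    """Run value iteration on the given model and return its greedy policy."""
    n_states = len(model)
    values = [0.0] * n_states

    def q(s, a):
        expected = sum(p * values[s2] for s2, p in enumerate(model[s][a]))
        return rewards[s][a] + gamma * expected

    for _ in range(sweeps):
        values = [max(q(s, a) for a in range(len(model[s])))
                  for s in range(n_states)]
    return [max(range(len(model[s])), key=lambda a: q(s, a))
            for s in range(n_states)]


def run_trials(true_model, rewards, n_trials=50, horizon=20, seed=0):
    """Sample a fresh hypothesis per trial, act greedily, update the counts."""
    rng = random.Random(seed)
    n_states, n_actions = len(true_model), len(true_model[0])
    # Uniform prior: one pseudo-count per (state, action, next state).
    counts = [[[1.0] * n_states for _ in range(n_actions)]
              for _ in range(n_states)]
    policy = None
    for _ in range(n_trials):
        policy = greedy_policy(sample_model(counts, rng), rewards)
        s = 0  # assumption: every trial starts in state 0
        for _ in range(horizon):
            a = policy[s]
            s2 = rng.choices(range(n_states), weights=true_model[s][a])[0]
            counts[s][a][s2] += 1.0
            s = s2
    return policy, counts
```

Because a new model is drawn for every trial, early trials follow diverse policies while the counts are small, and later trials follow increasingly similar policies as the posterior concentrates.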
Distributed Value Functions
In Proceedings of the Sixteenth International Conference on Machine Learning, 1999
Cited by 40 (1 self)
Many interesting problems, such as power grids, network switches, and traffic flow, that are candidates for solving with reinforcement learning (RL), also have properties that make distributed solutions desirable. We propose an algorithm for distributed reinforcement learning based on distributing the representation of the value function across nodes. Each node in the system only has the ability to sense state locally, choose actions locally, and receive reward locally (the goal of the system is to maximize the sum of the rewards over all nodes and over all time). However each node is allowed to give its neighbors the current estimate of its value function for the states it passes through. We present a value function learning rule, using that information, that allows each node to learn a value function that is an estimate of a weighted sum of future rewards for all the nodes in the network. With this representation, each node can choose actions to improve the performance of the overall...
Quantitative Solution of Omega-Regular Games
Cited by 37 (12 self)
We consider two-player games played for an infinite number of rounds, with ω-regular winning conditions. The games may be concurrent, in that the players choose their moves simultaneously and independently, and probabilistic, in that the moves determine a probability distribution for the successor state. We introduce quantitative game µ-calculus, and we show that the maximal probability of winning such games can be expressed as the fixpoint formulas in this calculus. We develop the arguments both for deterministic and for probabilistic concurrent games; as a special case, we solve probabilistic turn-based games with ω-regular winning conditions, which was also open. We also characterize the optimality, and the memory requirements, of the winning strategies. In particular, we show that while memoryless strategies suffice for winning games with safety and reachability conditions, Büchi conditions require the use of strategies with infinite memory. The existence of optimal strategies, as opposed to ε-optimal, is only guaranteed in games with safety winning conditions.
Concurrent Reachability Games
2008
Cited by 36 (18 self)
We consider concurrent two-player games with reachability objectives. In such games, at each round, player 1 and player 2 independently and simultaneously choose moves, and the two choices determine the next state of the game. The objective of player 1 is to reach a set of target states; the objective of player 2 is to prevent this. These are zero-sum games, and the reachability objective is one of the most basic objectives: determining the set of states from which player 1 can win the game is a fundamental problem in control theory and system verification. There are three types of winning states, according to the degree of certainty with which player 1 can reach the target. From type-1 states, player 1 has a deterministic strategy to always reach the target. From type-2 states, player 1 has a randomized strategy to reach the target with probability 1. From type-3 states, player 1 has for every real ε > 0 a randomized strategy to reach the target with probability greater than 1 − ε. We show that for finite state spaces, all three sets of winning states can be computed in polynomial time: type-1 states in linear time, and type-2 and type-3 states in quadratic time. The algorithms to compute the three sets of winning states also enable the construction of the winning and spoiling strategies.
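As an illustration of the type-1 case only, the set of states from which player 1 can always reach the target with a deterministic strategy can be computed as a least fixpoint: start from the target and repeatedly add any state where some player-1 move leads into the current set whatever player 2 plays. The sketch below is a naive iteration under assumed data structures, not the linear-time algorithm the abstract refers to; the names `sure_reach`, `moves1`, `moves2`, and `succ` are hypothetical.

```python
def sure_reach(states, moves1, moves2, succ, target):
    """Return (win, strategy) for sure reachability in a concurrent game.

    Assumptions: moves1[s] and moves2[s] are the non-empty move sets of the
    two players in state s, and succ[(s, a, b)] is the set of possible
    successor states when player 1 plays a and player 2 plays b in s.
    """
    win = set(target)
    strategy = {}  # one winning player-1 move per newly added state
    changed = True
    while changed:
        changed = False
        for s in states:
            if s in win:
                continue
            for a in moves1[s]:
                # Move a is safe if every reply b keeps all successors in win.
                if all(succ[(s, a, b)] <= win for b in moves2[s]):
                    win.add(s)
                    strategy[s] = a
                    changed = True
                    break
    return win, strategy


# Hypothetical four-state game: state 3 is the target, state 0 is a trap,
# state 2 always moves to the target, and state 1 is a matching-pennies
# choice where no single player-1 move is safe against both replies.
states = [0, 1, 2, 3]
moves1 = {0: ["a"], 1: ["a", "b"], 2: ["a"], 3: ["a"]}
moves2 = {0: ["x"], 1: ["x", "y"], 2: ["x", "y"], 3: ["x"]}
succ = {
    (0, "a", "x"): {0},
    (1, "a", "x"): {2}, (1, "a", "y"): {0},
    (1, "b", "x"): {0}, (1, "b", "y"): {2},
    (2, "a", "x"): {3}, (2, "a", "y"): {3},
    (3, "a", "x"): {3},
}
win, strategy = sure_reach(states, moves1, moves2, succ, target={3})
```

In this example state 1 is left out of `win` because each player-1 move has a player-2 reply leading to the trap, which is the kind of state where the distinction between deterministic and randomized strategies in the abstract matters.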
Application-aware Admission Control and Scheduling in Web Servers
2002
Cited by 36 (0 self)
This paper presents an architecture and algorithms for optimizing the performance of web services. For a given service, session-based admission control is combined with stage-wise request queuing, where the stages represent sub-tasks within sessions. The scheduling of requests is governed by generalized processor sharing. We present a performance model, relying on online estimation of parameters describing client-server interaction. A reward function corresponding to the service provider's objective is maximized using techniques for nonlinear optimization. In a case study, we model and optimize the resource sharing at a web server hosting an electronic store. The performance advantages of our approach are quantified numerically, and the robustness to parameter estimation errors is assessed by sensitivity analysis.
How to Specify and Verify the Long-Run Average Behavior of Probabilistic Systems
In Proc. LICS'98, 1998
Cited by 35 (3 self)
Long-run average properties of probabilistic systems refer to the average behavior of the system, measured over a period of time whose length diverges to infinity. These properties include many relevant performance and reliability indices, such as system throughput, average response time, and mean time between failures. In this paper, we argue that current formal specification methods cannot be used to specify long-run average properties of probabilistic systems. To enable the specification of these properties, we propose an approach based on the concept of experiments. Experiments are labeled graphs that can be used to describe behavior patterns of interest, such as the request for a resource followed by either a grant or a rejection. Experiments are meant to be performed infinitely often, and it is possible to specify their long-run average outcome or duration. We propose simple extensions of temporal logics based on experiments, and we present model-checking algorithms for the verif...
Stochastic Linear Control over a Communication Channel
2003
Cited by 32 (7 self)
We examine linear stochastic control systems when there is a communication channel connecting the sensor to the controller. The problem consists of designing the channel encoder and decoder as well as the controller to satisfy some given control objectives. In particular we examine the role communication has on the classical LQG problem. We give conditions under which the classical separation property between estimation and control holds and the certainty equivalent control law is optimal. We then present the sequential rate distortion framework. We present bounds on the achievable performance and show the inherent tradeoffs between control and communication costs. In particular we show that optimal quadratic cost decomposes into two terms: a full knowledge cost and a sequential rate distortion cost.

