Results 1–10 of 13
Evaluating the Performance of DCOP Algorithms in a Real World, Dynamic Problem
, 2008
"... Complete algorithms have been proposed to solve problems modelled as distributed constraint optimization (DCOP). However, there are only few attempts to address real world scenarios using this formalism, mainly because of the complexity associated with those algorithms. In the present work we compar ..."
Abstract

Cited by 31 (1 self)
Complete algorithms have been proposed to solve problems modelled as distributed constraint optimization (DCOP). However, there are only a few attempts to address real-world scenarios using this formalism, mainly because of the complexity associated with those algorithms. In the present work we compare three complete algorithms for DCOP, aiming to study how they perform in complex and dynamic scenarios of increasing size. To assess their performance we measure not only standard quantities, such as the number of cycles to arrive at a solution and the size and quantity of exchanged messages, but also computing time and the quality of the solution, which is related to the particular domain we use. This study can shed light on how the algorithms perform when applied to problems other than those reported in the literature (graph coloring, meeting scheduling, and distributed sensor networks).
Incremental learning of variable rate concept drift
 International Workshop on Multiple Classifier Systems (MCS 2009) in Lecture
"... Abstract. We have recently introduced an incremental learning algorithm, Learn ++.NSE, for NonStationary Environments, where the data distribution changes over time due to concept drift. Learn ++.NSE is an ensemble of classifiers approach, training a new classifier on each consecutive batch of data ..."
Abstract

Cited by 10 (5 self)
We have recently introduced an incremental learning algorithm, Learn++.NSE, for Non-Stationary Environments, where the data distribution changes over time due to concept drift. Learn++.NSE is an ensemble-of-classifiers approach, training a new classifier on each consecutive batch of data that becomes available and combining them through an age-adjusted, dynamic, error-based weighted majority voting. Prior work has shown the algorithm’s ability to track gradually changing environments, as well as its ability to retain former knowledge in cases of cyclical or recurring data by retaining and appropriately weighting all classifiers generated thus far. In this contribution, we extend the analysis of the algorithm to more challenging environments experiencing varying drift rates; more importantly, we present preliminary results on the ability of the algorithm to accommodate the addition or subtraction of classes over time. Furthermore, we also present comparative results for a variation of the algorithm that employs error-based pruning in cyclical environments.
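The error-weighted combination described above can be illustrated with a minimal sketch. The stub threshold classifiers, their error rates, and the log((1 - err) / err) weighting below are illustrative assumptions (a simplified, AdaBoost-style weight); the actual Learn++.NSE weighting additionally discounts errors by classifier age:

```python
import math

def weighted_majority_vote(classifiers, errors, x):
    """Combine binary classifiers (outputs in {-1, +1}) by weighted vote.

    Each classifier's weight is log((1 - err) / err), so low-error
    classifiers dominate the vote. This is a simplified stand-in for
    the age-adjusted, error-based weighting the abstract describes.
    """
    eps = 1e-6
    total = 0.0
    for clf, err in zip(classifiers, errors):
        err = min(max(err, eps), 1 - eps)   # clamp to avoid log(0)
        weight = math.log((1 - err) / err)  # reliable -> large weight
        total += weight * clf(x)
    return 1 if total >= 0 else -1

# Two hypothetical classifiers: one accurate (error 0.1), one noisy (0.45).
clfs = [lambda x: 1 if x > 0 else -1,   # well-placed threshold
        lambda x: 1 if x > 5 else -1]   # badly calibrated threshold
errs = [0.1, 0.45]

print(weighted_majority_vote(clfs, errs, 2.0))
```

Because the accurate classifier carries roughly ten times the weight of the noisy one, the ensemble follows it even when the two disagree.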
Choosing best fitness function with reinforcement learning
 In Proceedings of the 2011 10th International Conference on Machine Learning and Applications
, 2011
"... AbstractThis paper describes an optimization problem with one target function to be optimized and several supporting functions that can be used to speed up the optimization process. A method based on reinforcement learning is proposed for choosing a good supporting function during optimization usi ..."
Abstract

Cited by 3 (2 self)
This paper describes an optimization problem with one target function to be optimized and several supporting functions that can be used to speed up the optimization process. A method based on reinforcement learning is proposed for choosing a good supporting function during optimization with a genetic algorithm. Results of applying this method to a model problem are presented.
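The idea of an RL agent choosing which fitness function guides each step can be sketched as an epsilon-greedy bandit wrapped around a (1+1) evolutionary algorithm. OneMax as the target objective, LeadingOnes as the supporting objective, and all parameter values here are illustrative assumptions, not the paper's setup:

```python
import random

random.seed(0)

# Target objective: OneMax (number of 1-bits). Supporting objective:
# LeadingOnes. Both are illustrative stand-ins.
def one_max(bits):
    return sum(bits)

def leading_ones(bits):
    count = 0
    for b in bits:
        if b == 0:
            break
        count += 1
    return count

objectives = [one_max, leading_ones]
q = [0.0, 0.0]       # estimated reward per objective (bandit arms)
counts = [0, 0]
eps, n = 0.2, 30     # exploration rate, bitstring length

bits = [0] * n
for step in range(400):
    # epsilon-greedy choice of which objective guides this step
    arm = random.randrange(2) if random.random() < eps else q.index(max(q))
    f = objectives[arm]
    # standard bit-flip mutation with rate 1/n
    child = [b ^ (random.random() < 1.0 / n) for b in bits]
    if f(child) >= f(bits):
        # reward = improvement of the *target* objective, even when a
        # supporting objective guided acceptance
        reward = one_max(child) - one_max(bits)
        bits = child
    else:
        reward = 0
    counts[arm] += 1
    q[arm] += (reward - q[arm]) / counts[arm]  # incremental mean

print(one_max(bits))
```

The bandit learns which objective yields target-objective improvement; here OneMax ends up preferred, since LeadingOnes-guided steps rarely increase the target.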
To adapt or not to adapt – consequences of adapting driver and traffic light agents
 Adaptive Agents and Multi-Agent Systems III. LNCS (LNAI
, 2008
"... ..."
(Show Context)
A Game-Theoretic Procedure for Learning Hierarchically Structured Strategies
"... Abstract—This paper addresses the problem of acquiring a hierarchically structured robotic skill in a nonstationary environment. This is achieved through a combination of learning primitive strategies from observation of an expert, and autonomously synthesising composite strategies from that basis. ..."
Abstract

Cited by 1 (1 self)
This paper addresses the problem of acquiring a hierarchically structured robotic skill in a non-stationary environment. This is achieved through a combination of learning primitive strategies from observation of an expert and autonomously synthesising composite strategies from that basis. Both aspects of this problem are approached from a game-theoretic viewpoint, building on prior work in the area of multiplicative weights learning algorithms. The utility of this procedure is demonstrated through simulation experiments motivated by the problem of autonomous driving. We show that this procedure allows the agent to come to terms with two forms of uncertainty in the world: continually varying goals (due to oncoming traffic) and non-stationarity of optimisation criteria (e.g., driven by changing navigability of the road). We argue that this type of factored task specification and learning is a necessary ingredient for robust autonomous behaviour in a “large-world” setting.
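The multiplicative-weights scheme the authors build on can be sketched in a few lines. The loss sequence and learning rate below are illustrative assumptions; the paper composes hierarchical strategies on top of this basic update:

```python
def mw_run(losses_per_round, eta=0.5):
    """Multiplicative-weights update over a fixed set of strategies.

    losses_per_round: list of rounds, each a list of per-strategy
    losses in [0, 1]. Each round, every strategy's weight is scaled
    by (1 - eta * loss), so consistently low-loss strategies come to
    dominate. Returns the final normalized weights.
    """
    n = len(losses_per_round[0])
    w = [1.0] * n
    for losses in losses_per_round:
        w = [wi * (1 - eta * li) for wi, li in zip(w, losses)]
    total = sum(w)
    return [wi / total for wi in w]

# Strategy 0 is consistently good (loss 0), strategy 1 bad (loss 1).
weights = mw_run([[0.0, 1.0]] * 10)
print(weights)
```

After ten rounds the bad strategy's weight has been halved ten times, so nearly all probability mass sits on the good one.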
Addressing Environment Non-Stationarity by Repeating Q-learning Updates
, 2016
"... Abstract Qlearning (QL) is a popular reinforcement learning algorithm that is guaranteed to converge to optimal policies in Markov decision processes. However, QL exhibits an artifact: in expectation, the effective rate of updating the value of an action depends on the probability of choosing that ..."
Abstract
Q-learning (QL) is a popular reinforcement learning algorithm that is guaranteed to converge to optimal policies in Markov decision processes. However, QL exhibits an artifact: in expectation, the effective rate of updating the value of an action depends on the probability of choosing that action. In other words, there is a tight coupling between the learning dynamics and the underlying execution policy. This coupling can cause performance degradation in noisy non-stationary environments. Here, we introduce Repeated Update Q-learning (RUQL), a learning algorithm that resolves this undesirable artifact of Q-learning while maintaining simplicity. We theoretically analyze the similarities and differences between RUQL, QL, and the closest state-of-the-art algorithms. Our analysis shows that RUQL maintains the convergence guarantee of QL in stationary environments, while relaxing the coupling between the execution policy and the learning dynamics. Experimental results confirm the theoretical insights and show how RUQL outperforms both QL and the closest state-of-the-art algorithms in noisy non-stationary environments.
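The repeated update has a simple closed form: applying the standard Q-learning update k times with the same target contracts Q toward the target by a factor (1 - alpha)^k. Taking k = 1/pi(a|s), the selection probability of the executed action, gives a sketch like the following (variable names are mine, and the closed-form shortcut is an assumption about the paper's construction):

```python
def ql_update(q, alpha, target):
    """Standard Q-learning update: move a fraction alpha toward target."""
    return (1 - alpha) * q + alpha * target

def ruql_update(q, alpha, target, pi_a):
    """Repeated Update Q-learning sketch: apply the Q-learning update
    1/pi_a times in closed form, where pi_a is the probability with
    which the executed action was selected. Rarely chosen actions thus
    receive proportionally stronger updates, decoupling the learning
    dynamics from the exploration policy."""
    k = 1.0 / pi_a                # number of repeated updates
    decay = (1 - alpha) ** k      # k applications of (1 - alpha)
    return decay * q + (1 - decay) * target

# A rarely chosen action (pi = 0.1) moves much further toward the
# target than under the single standard update.
q0, alpha, target = 0.0, 0.1, 1.0
print(ql_update(q0, alpha, target))
print(ruql_update(q0, alpha, target, 0.1))
```

With pi = 0.1 the effective step is 1 - 0.9^10, roughly the progress ten ordinary updates would make, which is exactly the compensation the abstract describes.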
Improved Selection of Auxiliary Objectives using Reinforcement Learning in Non-Stationary Environment
"... AbstractEfficiency of evolutionary algorithms can be increased by using auxiliary objectives. The method which is called EA+RL is considered. In this method a reinforcement learning (RL) algorithm is used to select objectives in evolutionary algorithms (EA) during optimization. In earlier studies, ..."
Abstract
The efficiency of evolutionary algorithms can be increased by using auxiliary objectives. We consider the EA+RL method, in which a reinforcement learning (RL) algorithm is used to select objectives in an evolutionary algorithm (EA) during optimization. In earlier studies, reinforcement learning algorithms for stationary environments were used in the EA+RL method. However, if the behavior of auxiliary objectives changes during the optimization process, it can be better to use reinforcement learning algorithms specially developed for non-stationary environments. In our previous work we proposed a new reinforcement learning algorithm to be used in the EA+RL method. In this work we propose an improved version of that algorithm. The new algorithm is applied to a non-stationary problem and compared with the methods used in other studies. It is shown that the proposed method reaches the optimal value more often and obtains higher values of the target objective than the other algorithms.
Learning in Non-Stationary MDPs as Transfer Learning
"... ABSTRACT In this paper we present a learning algorithm for a particular subclass of nonstationary environments where the learner is required to interact with other agents. The behaviorpolicy of the agents are determined by a latent variable that changes rarely, but can modify the agent policies d ..."
Abstract
In this paper we present a learning algorithm for a particular subclass of non-stationary environments where the learner is required to interact with other agents. The behavior policies of the agents are determined by a latent variable that changes rarely, but can modify the agent policies drastically when it does change (like traffic conditions in a driving problem). This unpredictable change in the latent variable results in non-stationarity. We frame this problem as transfer learning in a particular subclass of MDPs where each task/MDP requires the learner to learn to interact with opponent agents with fixed policies. Across the tasks, the state and action space remains the same (and is known), but the agent policies change. We transfer information from previous tasks to quickly infer the combined agent behavior policy in a new task after some limited initial exploration, and hence rapidly learn an optimal or near-optimal policy. We propose a transfer algorithm which, given a collection of source behavior policies, eliminates the policies that do not apply in the new task in time polynomial in the relevant parameters, using a novel statistical test. We also perform experiments in three interesting domains and show that our algorithm significantly outperforms relevant algorithms.
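The policy-elimination step can be illustrated with a toy likelihood filter. The traffic-themed candidate policies, the threshold, and the average log-likelihood criterion below are all illustrative assumptions; the paper's actual mechanism is a different, polynomial-time statistical test:

```python
import math

def filter_policies(candidates, observations, min_avg_loglik=-1.0):
    """Keep candidate opponent policies consistent with observed behavior.

    candidates: {name: {state: {action: probability}}}.
    observations: list of (state, action) pairs seen in the new task.
    A candidate is eliminated when its average log-likelihood of the
    observations falls below the threshold. This likelihood filter is
    a simplified stand-in for the paper's statistical test.
    """
    kept = {}
    for name, policy in candidates.items():
        loglik = 0.0
        for state, action in observations:
            # near-zero floor so unmodeled actions give a huge penalty
            loglik += math.log(policy[state].get(action, 1e-12))
        if loglik / len(observations) >= min_avg_loglik:
            kept[name] = policy
    return kept

# Two hypothetical source policies for an opponent driver at a merge.
candidates = {
    "aggressive": {"merge": {"speed_up": 0.9, "yield": 0.1}},
    "cautious":   {"merge": {"speed_up": 0.1, "yield": 0.9}},
}
obs = [("merge", "yield")] * 5   # limited initial exploration
print(sorted(filter_policies(candidates, obs)))
```

After a handful of observed yields, the "aggressive" hypothesis is implausible and drops out, leaving a single policy to plan against.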
Reinforcement Learning-based Control of Traffic Lights in Non-stationary Environments: A Case Study in a Microscopic Simulator
"... Coping with dynamic changes in traffic volume has been the object of recent publications. Recently, a method was proposed, which is capable of learning in nonstationary scenarios via an approach to detect context changes. For particular scenarios such as the traffic control one, the performance of ..."
Abstract
Coping with dynamic changes in traffic volume has been the object of recent publications. Recently, a method was proposed which is capable of learning in non-stationary scenarios via an approach that detects context changes. For particular scenarios such as traffic control, the performance of that method is better than a greedy strategy, as well as other reinforcement learning approaches such as Q-learning and Prioritized Sweeping. The goal of the present paper is to assess the feasibility of applying the above-mentioned approach in a more realistic scenario, implemented by means of a microscopic traffic simulator. We intend to show that the use of context detection is suitable for dealing with noisy scenarios where non-stationarity occurs not only due to the changing volume of vehicles, but also because of the random behavior of drivers with regard to the operational task of driving (e.g., deceleration probability). The results confirm the tendencies already detected in the previous paper, although here the increase in noise makes the learning task much more difficult and the correct separation of contexts harder.
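Context detection by model prediction error can be sketched as follows. The two toy "traffic contexts" and the exponentially discounted error trace are illustrative assumptions, not the specific mechanism of the method under study:

```python
def detect_context(models, transitions, lam=0.9):
    """Pick the partial model that best predicts recent transitions.

    models: {name: predict(state, action) -> next_state}.
    transitions: observed (state, action, next_state) triples.
    Each model keeps an exponentially discounted error trace; the
    active context is the model with the lowest trace. This is a
    simplified version of the error-based context detection the
    abstract refers to.
    """
    err = {name: 0.0 for name in models}
    for state, action, next_state in transitions:
        for name, predict in models.items():
            mistake = 0.0 if predict(state, action) == next_state else 1.0
            err[name] = lam * err[name] + (1 - lam) * mistake
    return min(err, key=err.get)

# Two hypothetical traffic contexts: in light traffic a green phase
# clears the queue, in heavy traffic the queue persists regardless.
models = {
    "light": lambda s, a: "clear",
    "heavy": lambda s, a: "queued",
}
obs = [("approach", "green", "queued")] * 10
print(detect_context(models, obs))
```

When observations keep contradicting the "light" model, its error trace rises and the detector switches to the "heavy" context, which is what lets the controller re-learn after a traffic-volume change.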