Results 1–10 of 17
On measuring social intelligence: experiments on competition and cooperation
Abstract

Cited by 5 (1 self)
Evaluating agent intelligence is a fundamental issue for the understanding, construction and improvement of autonomous agents. New intelligence tests have recently been developed based on an assessment of task complexity using algorithmic information theory. Some early experimental results have shown that these intelligence tests may be able to distinguish between agents of the same kind, but they do not place very different agents, e.g., humans and machines, on a correct scale. It has been suggested that a possible explanation is that these tests do not measure social intelligence. One formal approach to incorporating social environments in an intelligence test is the recent notion of the Darwin-Wallace distribution. Inspired by this distribution, we present several new test settings considering competition and cooperation, where we evaluate the “social intelligence” of several reinforcement learning algorithms. The results show that evaluating social intelligence raises many issues that need to be addressed in order to devise tests of social intelligence.
Emergence of social networks via direct and indirect reciprocity
, 2012
Abstract

Cited by 3 (1 self)
Many models of social network formation implicitly assume that network properties are static in steady state. In contrast, actual social networks are highly dynamic: allegiances and collaborations expire and may or may not be renewed at a later date. Moreover, empirical studies show that human social networks are dynamic at the individual level but static at the global level: individuals’ degree rankings change considerably over time, whereas network-level metrics such as network diameter and clustering coefficient are relatively stable. There have been some attempts to explain these properties of empirical social networks using agent-based models in which agents play social dilemma games with their immediate neighbours, but can also manipulate their network connections to strategic advantage. However, such models cannot straightforwardly account for reciprocal behaviour based on reputation scores (“indirect reciprocity”), which is known to play an important role in many economic interactions. In order to account for indirect reciprocity, we model the network in a bottom-up fashion: the network emerges from the low-level interactions between agents. By so doing we are able to simultaneously account for the effect of both direct reciprocity (e.g. “tit-for-tat”) and indirect reciprocity (helping strangers in order to increase one’s reputation). This leads to a strategic equilibrium in the frequencies with which strategies are adopted in the population as a whole, but intermittent cycling over different strategies at the level of individual agents, which in turn gives rise to social networks that are dynamic at the individual level but stable at the network level.
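The indirect-reciprocity mechanism the abstract refers to can be illustrated with a minimal donation game using image scoring (in the spirit of Nowak and Sigmund). This is a hedged sketch, not the paper's model: the benefit/cost values, score bounds, population mix and threshold encoding are all illustrative assumptions.

```python
import random

B, C = 2.0, 1.0  # benefit to recipient, cost to donor (assumed values)

def play_round(pop, scores, payoff):
    """One donation-game interaction with image scoring."""
    donor, recipient = random.sample(range(len(pop)), 2)
    threshold = pop[donor]                 # donate iff recipient's image
    if scores[recipient] >= threshold:     # score meets the threshold
        payoff[donor] -= C
        payoff[recipient] += B
        scores[donor] = min(scores[donor] + 1, 5)   # reputation rises
    else:
        scores[donor] = max(scores[donor] - 1, -5)  # reputation falls

# threshold 0 = discriminators, -5 = unconditional donors (assumed mix)
pop = [0] * 50 + [-5] * 50
scores = [0] * len(pop)
payoff = [0.0] * len(pop)
for _ in range(10_000):
    play_round(pop, scores, payoff)
```

Donating raises the donor's reputation, so discriminators end up helping strangers who have themselves helped others, which is exactly the reputation-based reciprocity the model above accounts for.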
FAQ-learning in Matrix Games: Demonstrating Convergence near Nash Equilibria, and Bifurcation of Attractors in the Battle of Sexes
Abstract

Cited by 3 (2 self)
This article studies Frequency Adjusted Q-learning (FAQ-learning), a variation of Q-learning that simulates simultaneous value function updates. The main contributions are empirical and theoretical support for the convergence of FAQ-learning to attractors near Nash equilibria in two-agent two-action matrix games. The games can be divided into three types: Matching Pennies, Prisoners’ Dilemma and Battle of Sexes. This article shows that Matching Pennies and the Prisoners’ Dilemma yield one attractor of the learning dynamics, while the Battle of Sexes exhibits a supercritical pitchfork bifurcation at a critical temperature τ, where one attractor splits into two attractors and one repellent fixed point. Experiments illustrate that the distance between ...
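The frequency adjustment described above can be sketched for a stateless matrix game: the learning rate of the chosen action is scaled by min(β/x_i, 1), so rarely played actions are updated as strongly as if all actions were updated simultaneously. The constants α, β and τ below are illustrative assumptions, not values from the paper's experiments.

```python
import math

ALPHA, BETA, TAU = 0.1, 0.01, 0.5  # assumed illustrative constants

def boltzmann(q, tau=TAU):
    """Softmax policy over Q-values with temperature tau."""
    m = max(q)
    exps = [math.exp((v - m) / tau) for v in q]
    s = sum(exps)
    return [e / s for e in exps]

def faq_update(q, action, reward):
    """FAQ-learning update for the chosen action (stateless game)."""
    x = boltzmann(q)[action]                # probability of the action
    rate = min(BETA / x, 1.0) * ALPHA       # frequency-adjusted rate
    q[action] += rate * (reward - q[action])
    return q

q = [0.0, 0.0]
faq_update(q, 0, 1.0)   # the rewarded action's value moves toward 1.0
```

The temperature τ of the Boltzmann policy is the bifurcation parameter the abstract mentions: it controls how sharply the policy concentrates on high-valued actions.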
RESQ-learning in stochastic games
Abstract

Cited by 2 (2 self)
This paper introduces a new multi-agent learning algorithm for stochastic games based on replicator dynamics from evolutionary game theory. We identify and transfer desired convergence behavior of these dynamical systems by leveraging the link between evolutionary game theory and multi-agent reinforcement learning. More precisely, the algorithm (RESQ-learning) presented here is the result of Reverse Engineering State-coupled replicator dynamics injected with the Q-learning Boltzmann mutation scheme. The contributions of this paper are twofold. One, we demonstrate the importance of a mathematical multi-agent learning framework by transferring insights from evolutionary game theory to reinforcement learning. Two, the resulting learning algorithm successfully inherits the convergence behavior of the reverse-engineered dynamical system. Results show that RESQ-learning provides convergence to pure as well as mixed Nash equilibria in a selection of stateless and stochastic multi-agent games.
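The replicator dynamics that RESQ-learning is reverse-engineered from can be sketched with a simple Euler integration of dx_i/dt = x_i[(Ax)_i − xᵀAx]. The payoff matrix (a standard Prisoner's Dilemma) and step size are illustrative assumptions, not the paper's setup.

```python
def replicator_step(x, A, dt=0.01):
    """One Euler step of dx_i/dt = x_i * ((A x)_i - x^T A x)."""
    n = len(x)
    fitness = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    avg = sum(x[i] * fitness[i] for i in range(n))   # mean fitness
    return [x[i] + dt * x[i] * (fitness[i] - avg) for i in range(n)]

# Row player's Prisoner's Dilemma payoffs over [cooperate, defect]
A = [[3, 0], [5, 1]]
x = [0.5, 0.5]
for _ in range(2000):
    x = replicator_step(x, A)
# the defecting strategy share x[1] grows toward 1, as expected
```

In the paper's framework, injecting a Boltzmann mutation term into these equations is what links them to Q-learning-style exploration; the plain selection dynamics above are the starting point.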
K.: Lenient learning in a multi-player stag hunt
 In: Proc. of the 23rd Benelux Conf. on Artificial Intelligence (BNAIC)
, 2011
Abstract

Cited by 1 (1 self)
This paper describes the learning dynamics of individual learners in a multi-player Stag Hunt game, focusing primarily on the difference between lenient and non-lenient learning. We find that, as in 2-player games, leniency significantly promotes cooperative outcomes in 3-player games, as the basins of attraction of (partially) cooperative equilibria grow under this learning scheme. Moreover, we observe significant differences between purely selection-based models, as often encountered in related analytical research, and models that include mutation. Therefore, purely selection-based analysis might not always accurately predict the behavior of practical learning algorithms, which often include mutation.
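The lenient scheme compared above can be sketched as follows: before updating, the learner collects κ reward samples for an action and learns only from the maximum, so a partner's exploratory mis-coordination does not drag the value estimate down. κ, α and the sample values are illustrative assumptions.

```python
KAPPA, ALPHA = 5, 0.1  # leniency degree and learning rate (assumed)

def lenient_update(q, action, reward_samples):
    """Update Q toward the best of the collected rewards."""
    target = max(reward_samples)     # leniency: ignore low outcomes
    q[action] += ALPHA * (target - q[action])
    return q

q = [0.0]
# partner mis-coordinates in 4 of 5 joint plays, but leniency keeps
# the estimate anchored to the cooperative payoff of 1.0
samples = [0.0, 0.0, 0.0, 0.0, 1.0]
lenient_update(q, 0, samples)   # moves toward 1.0, not the mean 0.2
```

A non-lenient learner averaging the same samples would estimate 0.2 and drift toward the safe (hare) equilibrium, which is why leniency enlarges the basins of attraction of cooperative equilibria.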
Addressing the Policy-bias of Q-learning by Repeating Updates
Abstract

Cited by 1 (1 self)
Q-learning is a very popular reinforcement learning algorithm, proven to converge to optimal policies in Markov decision processes. However, Q-learning shows artifacts in non-stationary environments, e.g., the probability of playing the optimal action may decrease if Q-values deviate significantly from the true values, a situation that may arise in the initial phase as well as after changes in the environment. These artifacts were resolved in the literature by the variant Frequency Adjusted Q-learning (FAQL). However, FAQL also suffered from practical concerns that limited the policy subspace for which the behavior was improved. Here, we introduce Repeated Update Q-learning (RUQL), a variant of Q-learning that resolves the undesirable artifacts of Q-learning without the practical concerns of FAQL. We show (both theoretically and experimentally) the similarities and differences between RUQL and FAQL (the closest state of the art). Experimental results verify the theoretical insights and show how RUQL outperforms FAQL and Q-learning in non-stationary environments.
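The repeated-update idea can be sketched in a stateless setting: the standard Q-learning update is applied 1/π(a) times for the chosen action, which has the closed form below, so rarely played actions are not under-updated. This is a hedged illustration; α and the probability value are assumptions, and the full algorithm operates over states as well.

```python
ALPHA = 0.1  # assumed learning rate

def ruql_update(q, action, reward, pi_a):
    """Apply the Q-learning update 1/pi_a times, in closed form.

    Repeating q <- (1-a)q + a*r exactly n = 1/pi_a times gives
    q <- (1-a)^n * q + (1 - (1-a)^n) * r.
    """
    w = (1.0 - ALPHA) ** (1.0 / pi_a)        # effective retention
    q[action] = w * q[action] + (1.0 - w) * reward
    return q

q = [0.0, 0.0]
ruql_update(q, 1, 1.0, pi_a=0.1)   # a rare action gets a strong update
```

Because the repetition count depends only on the action probability, the correction applies across the whole policy space rather than the limited subspace where FAQL's β bound is effective.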
POMDP Opponent Models for Best Response Behavior
, 2011
Abstract
This work focuses on the integration of Opponent Modeling (OM) in two-player repeated games. For this purpose, a theoretical analysis of strategies and best response strategies is given. This analysis explains why OM can be a key to long-term best response behavior. It is shown that opponent models based on Partially Observable Markov Decision Processes (POMDPs) generalize a broad range of existing OM techniques. As an instance of a POMDP-based learning algorithm, a variation of McCallum’s Utile Distinction Memory algorithm is presented. This technique is based on Baum-Welch maximum likelihood estimation and uses a t-test to adjust the number of model states. Experimental results demonstrate that this algorithm can identify the structure of some popular simple strategies, such as Tit-for-Tat. It is also able to approximate the behavior of more complex examples like Q-learning.
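The kind of strategy structure such a model can recover is easy to make concrete: Tit-for-Tat is a two-state machine whose state is the opponent's memory of our last move. The sketch below is an assumed illustration of that structure, not the paper's learning algorithm.

```python
COOPERATE, DEFECT = "C", "D"

class TitForTat:
    """Tit-for-Tat as a tiny deterministic finite-state model."""
    def __init__(self):
        self.state = COOPERATE          # starts by cooperating

    def act(self):
        return self.state               # plays its remembered state

    def observe(self, our_move):
        self.state = our_move           # next move mirrors ours

tft = TitForTat()
moves = []
for our in [COOPERATE, DEFECT, DEFECT, COOPERATE]:
    moves.append(tft.act())
    tft.observe(our)
# moves == ["C", "C", "D", "D"]: each reply echoes our previous move
```

A POMDP opponent model with two hidden states and deterministic transitions represents exactly this machine; Baum-Welch estimation recovers the transition structure from interaction data.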
Learning in Networked Interactions: A Replicator Dynamics Approach
Abstract
Many real-world scenarios can be modelled as multi-agent systems, where multiple autonomous decision makers interact in a single environment. The complex and dynamic nature of such interactions prevents hand-crafting solutions for all possible scenarios, hence learning is crucial. Studying the dynamics of multi-agent learning is imperative in selecting and tuning the right learning algorithm for the task at hand. So far, analysis of these dynamics has been mainly limited to normal form games, or unstructured populations. However, many multi-agent systems are highly structured, complex networks, with agents only interacting locally. Here, we study the dynamics of such networked interactions, using the well-known replicator dynamics of evolutionary game theory as a model for learning. Different learning algorithms are modelled by altering the replicator equations slightly. In particular, we investigate lenience as an enabler for cooperation. Moreover, we show how well-connected, stubborn agents can influence the learning outcome. Finally, we investigate the impact of structural network properties on the learning outcome, as well as the influence of mutation driven by exploration.
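The locality the abstract emphasises can be sketched with a discrete stand-in for replicator dynamics on a network: agents on a ring play a 2-action game with their immediate neighbours and imitate a better-scoring neighbour. The ring topology, stag-hunt payoffs and imitation rule are illustrative assumptions, not the paper's model.

```python
import random

A = [[4, 0], [3, 3]]          # assumed stag-hunt payoffs: [stag, hare]
N = 30
strategy = [random.randint(0, 1) for _ in range(N)]

def payoff(i):
    """Accumulated payoff of agent i against its two ring neighbours."""
    return sum(A[strategy[i]][strategy[j]] for j in ((i - 1) % N, (i + 1) % N))

def imitation_step():
    """A random agent copies a random neighbour that scored higher."""
    i = random.randrange(N)
    j = random.choice([(i - 1) % N, (i + 1) % N])
    if payoff(j) > payoff(i):
        strategy[i] = strategy[j]

for _ in range(2000):
    imitation_step()
```

Because imitation only crosses edges, local clusters of one strategy can persist, which is precisely the kind of outcome unstructured-population replicator analysis cannot capture.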
Lenient Frequency Adjusted Q-learning
Abstract
Overcoming convergence to suboptimal solutions in cooperative multi-agent games has been a main challenge in reinforcement learning. The concept of “leniency” has been proposed to be more forgiving of initial mis-coordination. It has been shown theoretically that an arbitrarily high certainty of convergence to the global optimum can be achieved by increasing the degree of leniency, but the relation of the evolutionary game-theoretic model to the Lenient Q-learning algorithm relied on the simplifying assumption that all actions would be updated simultaneously. Building on insights from Frequency Adjusted Q-learning, this article introduces the variation Lenient Frequency Adjusted Q-learning that matches the theoretical model precisely, and allows for arbitrarily high convergence to Pareto-optimal equilibria in cooperative games.
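The combination the abstract describes can be sketched by joining the two mechanisms: a lenient target (maximum over collected reward samples) with a frequency-adjusted learning rate min(β/x_a, 1). All constants, and the stateless form itself, are assumptions for illustration only.

```python
ALPHA, BETA = 0.1, 0.01  # assumed learning rate and FAQ bound

def lfaq_update(q, action, reward_samples, x_a):
    """Lenient FAQ-style update (stateless sketch).

    x_a is the current probability of playing `action`; the lenient
    target is the best of the collected reward samples.
    """
    rate = min(BETA / x_a, 1.0) * ALPHA     # frequency adjustment
    q[action] += rate * (max(reward_samples) - q[action])
    return q

q = [0.0, 0.0]
lfaq_update(q, 0, [0.0, 0.0, 1.0], x_a=0.5)
```

The frequency adjustment is what makes the simultaneous-update assumption of the theoretical model hold in expectation, so the learner's trajectory tracks the lenient evolutionary dynamics.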
Opponent Modeling with POMDPs
Abstract
Reinforcement Learning techniques such as Q-learning are commonly studied in the context of two-player repeated games. However, Q-learning fails to converge to best response behavior even against simple strategies such as Tit-for-two-Tat. Opponent Modeling (OM) can be used to overcome this problem. This article shows that OM based on Partially Observable Markov Decision Processes (POMDPs) can represent a large class of opponent strategies. A variation of McCallum’s Utile Distinction Memory algorithm is presented as a means to compute such a POMDP opponent model. This technique is based on Baum-Welch maximum likelihood estimation and uses a t-test to adjust the number of model states. Experimental results demonstrate that this algorithm can identify the structure of strategies against which pure Q-learning is insufficient. This provides a basis for best response behavior against a larger class of strategies.