

### Citations

2206 | Probability inequalities for sums of bounded random variables
- Hoeffding
- 1963
Citation Context ...d in the continuation. A.2.1 Preliminary results and notation In this section, we introduce several preliminary results that will be of use in the proof. We start with a simple lemma due to Hoeffding [30] that provides a useful bound for the moment generating function of a bounded r.v. X. Lemma 2 Let X be a r.v. such that E[X] = 0 and a ≤ X ≤ b almost surely. Then, for any λ > 0, E[e^{−λX}] ≤ e^{λ²(b−...
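The truncated inequality in Lemma 2 is Hoeffding's moment-generating-function bound, which in full reads E[e^{−λX}] ≤ e^{λ²(b−a)²/8}. As a quick numerical sanity check, the sketch below evaluates both sides for a Rademacher (±1) variable, a hypothetical choice (not from the paper) for which the MGF has the closed form cosh(λ):

```python
import math

# Numerical check of Hoeffding's MGF bound (Lemma 2 in the excerpt):
# for a zero-mean r.v. X with a <= X <= b, E[exp(-lam*X)] <= exp(lam**2 * (b - a)**2 / 8).
# Example r.v. (an assumption for illustration): Rademacher, X = +/-1 with prob 1/2,
# so a = -1, b = 1 and E[exp(-lam*X)] = cosh(lam) in closed form.

def mgf_rademacher(lam):
    return 0.5 * (math.exp(lam) + math.exp(-lam))  # cosh(lam)

def hoeffding_bound(lam, a=-1.0, b=1.0):
    return math.exp(lam ** 2 * (b - a) ** 2 / 8.0)

for lam in (0.1, 0.5, 1.0, 2.0, 5.0):
    assert mgf_rademacher(lam) <= hoeffding_bound(lam)
```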

1669 | Learning from Delayed Rewards
- Watkins
- 1989
Citation Context ...ake an optimal decision. The MDP agent is evaluated in terms of payoff only. An RL agent. [RL] The RL agent is a standard reinforcement learning agent (RL) running the well-known Q-learning algorithm [65]. This agent has no prior knowledge about the target task or the teammate behavior, and must learn both the task and the coordination policy by a process of trial-and-error. Unlike the other agents, t...
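For readers unfamiliar with the rule behind the RL agent, here is a minimal tabular Q-learning sketch. The two-state chain task and the hyperparameters (alpha, gamma, eps) are illustrative assumptions, not the paper's domain:

```python
import random

# Tabular Q-learning update [65]:
#   Q(x, a) <- Q(x, a) + alpha * (r + gamma * max_a' Q(x', a') - Q(x, a))
# Hypothetical task: in state 0, action 1 advances to state 1 (reward 0);
# in state 1, action 1 collects reward 1 and resets to state 0; action 0 stays put.

def step(x, a):
    if a == 0:
        return x, 0.0
    return (1, 0.0) if x == 0 else (0, 1.0)

def q_learning(steps=5000, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[state][action]
    x = 0
    for _ in range(steps):
        # epsilon-greedy action selection (trial and error)
        a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda u: q[x][u])
        x2, r = step(x, a)
        q[x][a] += alpha * (r + gamma * max(q[x2]) - q[x][a])
        x = x2
    return q

q = q_learning()
# The learned greedy policy should advance/collect (action 1) in both states.
assert q[0][1] > q[0][0] and q[1][1] > q[1][0]
```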

1150 | The theory of learning in games
- Fudenberg
- 1998
Citation Context ...where an agent must identify a target task from the observed behavior of other agent(s). Similarly, it is possible to identify teammate identification with the large body of work on learning in games [20] and opponent modeling [21, 48], where an agent must predict the behavior of other agents from observing their actions. Finally, planning is clearly related with the growing body of work on decentrali...

509 | Asymptotically efficient adaptive allocation rules
- Lai, Robbins
- 1985
Citation Context ...l bound on the regret that is O(log n). Such a result would also match (up to constants) the lower bound provided in the precursor work of Lai and Robbins [39] (see also [35]), establishing our POMDP approach as near-optimal. A second remark concerns the close relation between Theorem 2 and a rich body of work from both the game-theoretic community on Bayes...

368 | Multiagent Systems: A Survey from a Machine Learning Perspective
- Stone, Veloso
- 2000
Citation Context ...ent” as being an agent that is able to engage in ad hoc teamwork. A third experiment evaluates the performance of our ad hoc agents in a benchmark domain from the literature [59] and compares some of our results with related works in the ad hoc teamwork literature. 2 The Ad Hoc Teamwork Problem The ad hoc teamwork setting [60] is a research problem in which an autonomous agen...

312 | Algorithms for inverse reinforcement learning
- Ng, Russell
- 2000
Citation Context ...ing. To some extent, our setting is close to that of inverse reinforcement learning (IRL), where an agent must identify a target task from the observation of the actions of another agent (the expert) [45]. In IRL, the learning agent is provided with a set of reward functions that represent potential tasks, and the agent must then determine which reward function better explains the behavior of the expe...

281 | Rational Learning Leads to Nash Equilibrium
- Kalai, Lehrer
- 1991
Citation Context ...blishing our POMDP approach as near-optimal. A second remark concerns the close relation between Theorem 2 and a rich body of work from both the game-theoretic community on Bayesian learning in games [9, 18, 19, 31, 32, 34, 44] and the information-theoretic community on the convergence of Bayesian estimation [8, 14, 17, 24, 25, 28, 29, 54, 63]. Finally, it is worth noting that the potential benefits, in terms of performance...

268 | Bandit Processes and Dynamic Allocation Indices.
- Gittins
- 1989
Citation Context ...r approach in distinct ways. The consideration of the prior p0, alluded to in (i), suggests a Bayesian approach to the ad hoc teamwork problem, possibly along the lines of the seminal work of Gittins [26] or the more recent works of Kaufmann et al [35, 36]. As for (ii), as briefly discussed in Section 3.3, it impacts the way in which regret is defined. In particular, the performance of our predictor s...

168 | Reputation and Equilibrium Selection in Games with a Patient Player, Econometrica
- Fudenberg, Levine
- 1989
Citation Context ...blishing our POMDP approach as near-optimal. A second remark concerns the close relation between Theorem 2 and a rich body of work from both the game-theoretic community on Bayesian learning in games [9, 18, 19, 31, 32, 34, 44] and the information-theoretic community on the convergence of Bayesian estimation [8, 14, 17, 24, 25, 28, 29, 54, 63]. Finally, it is worth noting that the potential benefits, in terms of performance...

160 | A Formal theory of plan recognition and its implementation
- Kautz
- 1991
Citation Context ...losely related problems in the literature that address each of the challenges in Fig. 1 separately. For example, it is possible to identify task identification with the literature on plan recognition [37], inverse reinforcement learning [1] and related topics, where an agent must identify a target task from the observed behavior of other agent(s). Similarly, it is possible to identify teammate identif...

139 | An analytic solution to discrete Bayesian reinforcement learning.
- Poupart, Vlassis, et al.
- 2006
Citation Context ...xploiting the information already available. In a sense, a similar perspective is behind POMDP-based approaches to Bayesian reinforcement learning that optimally trade off exploration and exploitation [16, 47]. In practical terms, this tradeoff translates to the superior performance of the POMDP agent observed in the different experiments. Finally, we note that the POMDP planning takes place offline. As su...

129 | A framework for sequential planning in multiagent settings
- Gmytrasiewicz, Doshi
- 2005
Citation Context ...1, 48], where an agent must predict the behavior of other agents from observing their actions. Finally, planning is clearly related with the growing body of work on decentralized/distributed planning [27, 52]. We refer to the work of Stone et al [60] for a detailed discussion. (Footnote 2: In the nomenclature of Fig. 1, we take planning in its broadest sense, which includes any offline and/or online reasoning about which act...

121 | Planning, learning and coordination in multiagent decision processes.
- Boutilier
- 1996
Citation Context ...rget task. Therefore, as expected, it outperforms the other approaches. In fact, because the... (Footnote 16: Such games are often referred to in the multiagent learning literature as state games. See, for example, [10, 64].) [Figure: (a) total average payoff per time-step for the OL, BSKR (known task) and RL agents]

104 | Optimal learning: Computational procedures for Bayes-adaptive Markov decision processes. Doctoral dissertation
- Duff
- 2002
Citation Context ...xploiting the information already available. In a sense, a similar perspective is behind POMDP-based approaches to Bayesian reinforcement learning that optimally trade off exploration and exploitation [16, 47]. In practical terms, this tradeoff translates to the superior performance of the POMDP agent observed in the different experiments. Finally, we note that the POMDP planning takes place offline. As su...

102 | Exploration and apprenticeship learning in reinforcement learning.
- Abbeel, Ng
- 2005
Citation Context ...ture that address each of the challenges in Fig. 1 separately. For example, it is possible to identify task identification with the literature on plan recognition [37], inverse reinforcement learning [1] and related topics, where an agent must identify a target task from the observed behavior of other agent(s). Similarly, it is possible to identify teammate identification with the large body of work ...

88 | Reinforcement learning to play an optimal nash equilibrium in team markov games.
- Wang, Sandholm
- 2003
Citation Context ...rget task. Therefore, as expected, it outperforms the other approaches. In fact, because the... (Footnote 16: Such games are often referred to in the multiagent learning literature as state games. See, for example, [10, 64].) [Figure: (a) total average payoff per time-step for the OL, BSKR (known task) and RL agents]

61 | Essentials of game theory: A concise multidisciplinary introduction. Synthesis lectures on artificial intelligence and machine learning
- Leyton-Brown, Shoham
- 2008
Citation Context ..., the authors address only the teammate identification and planning steps, assuming knowledge of the target task. Game theory provides a strong theoretical framework to analyze multiagent interaction [40], and results from iterated play in normal-form games have been adapted to address ad hoc teamwork. For instance, Wu et al [66] propose an online planning algorithm, where the planning problem is ...

58 | Value-function reinforcement learning in Markov games
- Littman
Citation Context ...ry. In this sense, our work is also closely related with reinforcement learning (RL), where an agent must learn an optimal line of action by interacting with its environment and other existing agents [42, 62]. However, unlike the standard RL setting, the ad hoc agent α receives no evaluative feedback from the environment. We now revisit Example 1 and illustrate how the e-commerce scenario can be modeled u...

57 | Steady state learning and Nash equilibrium, Econometrica
- Levine
- 1993
Citation Context ...blishing our POMDP approach as near-optimal. A second remark concerns the close relation between Theorem 2 and a rich body of work from both the game-theoretic community on Bayesian learning in games [9, 18, 19, 31, 32, 34, 44] and the information-theoretic community on the convergence of Bayesian estimation [8, 14, 17, 24, 25, 28, 29, 54, 63]. Finally, it is worth noting that the potential benefits, in terms of performance...

56 | An effective personal mobile robot agent through symbiotic human-robot interaction,”
- Rosenthal, Biswas, et al.
- 2010
Citation Context ...f interest would be the application of ad hoc teamwork with human agents. This is closely related with recent research on symbiotic human-robot interaction, as proposed in the work of Rosenthal et al [51]. Finally, from a more technical point of view, it would be possible to improve our online learning approach by explicitly taking into consideration the fact that the observations of the agent provide...

55 | Mutual information, metric entropy and cumulative relative entropy risk.
- Haussler, Opper
- 1997
Citation Context ... a rich body of work from both the game-theoretic community on Bayesian learning in games [9, 18, 19, 31, 32, 34, 44] and the information-theoretic community on the convergence of Bayesian estimation [8, 14, 17, 24, 25, 28, 29, 54, 63]. Finally, it is worth noting that the potential benefits, in terms of performance arising from the POMDP approach, come at a cost in complexity. In fact, even if an approximate solver such as Perseus...

54 | Bayesian learning in normal form games.
- Jordan
- 1991

43 | The exponential convergence of posterior probabilities with implications for Bayes estimators of density functions
- Barron
- 1987
Citation Context ... a rich body of work from both the game-theoretic community on Bayesian learning in games [9, 18, 19, 31, 32, 34, 44] and the information-theoretic community on the convergence of Bayesian estimation [8, 14, 17, 24, 25, 28, 29, 54, 63]. Finally, it is worth noting that the potential benefits, in terms of performance arising from the POMDP approach, come at a cost in complexity. In fact, even if an approximate solver such as Perseus...

35 | Thompson sampling: An asymptotically optimal finite time analysis
- Kaufmann, Korda, et al.
- 2012
Citation Context ...f the prior p0, alluded to in (i), suggests a Bayesian approach to the ad hoc teamwork problem, possibly along the lines of the seminal work of Gittins [26] or the more recent works of Kaufmann et al [35, 36]. As for (ii), as briefly discussed in Section 3.3, it impacts the way in which regret is defined. In particular, the performance of our predictor should not be measured against that of the best exper...

27 | Empirical Evaluation of Ad Hoc Teamwork in the Pursuit Domain.
- Barrett, Stone, et al.
- 2011
Citation Context ...in random scenarios of increasing complexity. Section 4.4 investigates the general applicability of our approach in a benchmark scenario from the literature on multiagent systems, the pursuit domain [6, 7]. In this benchmark scenario, we conduct a qualitative comparison with a related approach from the ad hoc teamwork literature [7]. All experiments follow a similar methodology. One ad hoc agent intera...

21 | Coordination and adaptation in impromptu teams.
- Bowling, McCracken
- 2005
Citation Context ... teammate to ensure, for example, that both are purchasing the corresponding items from the same supplier. A similar situation can also arise in other domains [60]. For example, Bowling and McCracken [11] provide an early example of ad hoc teamwork in robot soccer. This work introduces the notion of an impromptu team, where an agent (referred to as a pickup player) is teamed up with a group of unknown...

18 | Leading Ad Hoc Agents in Joint Action Settings with Multiple Teammates.
- Agmon, Stone
- 2012
Citation Context ...ferring the unknown strategy adopted by the teammates from the actions of the latter. This work, like many of the existing studies in ad hoc teamwork, assumes that agents are aware of the target task [2, 13, 61] and does not consider the problem of identifying it from the teammate’s behavior. The key contributions of this work are thus fourfold. First, we present in Section 2 a new perspective on the ad hoc ...

18 | Online planning for ad hoc autonomous agent teams
- Wu, Zilberstein, et al.
- 2011
Citation Context ...des a strong theoretical framework to analyze multiagent interaction [40], and results from iterated play in normal-form games have been adapted to address ad hoc teamwork. For instance, Wu et al [66] propose an online planning algorithm, where the planning problem is approximated by solving a series of stage games. Albrecht and Ramamoorthy [3] formulate the coordination problem in ad hoc teams as...

11 | Merging of Opinions with Increasing Information
- Blackwell, Dubins
- 1962

11 | Game theory-based opponent modeling in large imperfect-information games
- Ganzfried, Sandholm
- 2011
Citation Context ...y a target task from the observed behavior of other agent(s). Similarly, it is possible to identify teammate identification with the large body of work on learning in games [20] and opponent modeling [21, 48], where an agent must predict the behavior of other agents from observing their actions. Finally, planning is clearly related with the growing body of work on decentralized/distributed planning [27, 5...

11 | Convergence rates of posterior distributions. The Annals of Statistics 28
- Ghosal, Ghosh, van der Vaart
- 2000
Citation Context ... a rich body of work from both the game-theoretic community on Bayesian learning in games [9, 18, 19, 31, 32, 34, 44] and the information-theoretic community on the convergence of Bayesian estimation [8, 14, 17, 24, 25, 28, 29, 54, 63]. Finally, it is worth noting that the potential benefits, in terms of performance arising from the POMDP approach, come at a cost in complexity. In fact, even if an approximate solver such as Perseus...

9 | Ad hoc teamwork for leading a flock.
- Genter, Agmon, et al.
- 2013
Citation Context ...agent teams, adopting a game-theoretic analysis; however, instead of leading the team to the optimal joint utility, the ad hoc agent leads the team to the optimal reachable joint action. Genter et al [23] are concerned with how ad hoc agents can lead a flock of agents to a desired orientation and propose an initial theoretical and empirical analysis of the problem. In our work, we also assume teammate...

8 | Convergence rates of posterior distributions for non-i.i.d. observations
- Ghosal, van der Vaart
- 2007

7 | The exponential convergence of Bayesian learning in normal form games. Games and Economic Behavior 4(2):202–217
- Jordan
- 1992

7 | Bandit based Monte-Carlo planning
- Kocsis, Szepesvári
- 2006
Citation Context ...s manageable, it can be solved exactly using value iteration. For larger problems (where a larger grid is considered, for example), more powerful methods are required, such as Monte Carlo tree search [38]. For each task τ ∈ T, it is now possible to associate with each state-action pair (x, a) a value, denoted by Qτ(x, a). Qτ(x, a) represents the total MDP reward that the agent expects to receive up...
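The Qτ values mentioned in this excerpt can be obtained by standard value iteration. The sketch below does this for a hypothetical two-state, two-action task (the transition and reward tables are illustrative assumptions, not the paper's grid domain):

```python
# Value iteration computing Q_tau(x, a) for one candidate task tau.
# Hypothetical MDP: action 0 stays put (reward 0); action 1 switches state,
# paying reward 1 only when leaving state 1. Discount GAMMA = 0.9.
P = {0: [[1.0, 0.0], [0.0, 1.0]],
     1: [[0.0, 1.0], [1.0, 0.0]]}        # P[a][x][x']
R = {0: [0.0, 0.0], 1: [0.0, 1.0]}       # R[a][x]
GAMMA = 0.9

def value_iteration(n_states=2, n_actions=2, tol=1e-8):
    v = [0.0] * n_states
    while True:
        q = [[R[a][x] + GAMMA * sum(P[a][x][y] * v[y] for y in range(n_states))
              for a in range(n_actions)] for x in range(n_states)]
        v_new = [max(row) for row in q]
        if max(abs(v_new[x] - v[x]) for x in range(n_states)) < tol:
            return q
        v = v_new

q_tau = value_iteration()
# Fixed point: Q_tau(1, 1) = 1 + GAMMA^2 * Q_tau(1, 1), i.e. 1 / (1 - 0.81).
assert abs(q_tau[1][1] - 1.0 / 0.19) < 1e-5
```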

7 | Perseus: Randomized point-based value iteration for POMDPs
- Spaan, Vlassis
- 2005
Citation Context ... wide range of approximate methods are available in the literature, the most popular of which are, perhaps, point-based methods [46]. In this paper, we adopt the popular Perseus point-based algorithm [56]. POMDP formulation of the ad hoc teamwork problem: In order to apply the Perseus algorithm to the ad hoc teamwork problem, we start by formulating the latter as a POMDP M = (X, A, Z, P, O, c, γ). Let HN(...

6 | An analysis framework for ad hoc teamwork tasks
- Barrett, Stone
- 2012
Citation Context ...Our work introduces the notion of task identification and teammate identification as important steps so that the planning step can select the action of the ad hoc agent accordingly. Barrett and Stone [5] present a framework for analyzing ad hoc team problems by defining three dimensions along which to conduct such analysis. First, team knowledge is related to the knowledge that the ad hoc agent has a...

6 | Planning and acting in partially observable stochastic domains
- Kaelbling, Littman, Cassandra
- 1998
Citation Context .... Partially Observable Markov Decision Problems: Partially observable Markov decision problems (POMDPs) provide a general framework for modeling sequential decision problems in the face of uncertainty [33]. At each time step, and depending on its perception of the environment, the agent in a POMDP must select an action from its action repertoire in order to maximize a numerical reward signal. Actions d...

4 | Ad hoc teamwork modeled with multi-armed bandits: An extension to discounted infinite rewards
- Barrett, Stone
- 2011
Citation Context ...Like the aforementioned work, we also rely on a specific type of Markovian teammate model (i.e., a history-based best-response model). In order to empirically evaluate ad hoc teams, Barrett and Stone [4] present a study of several ad hoc teamwork strategies in a more open and complex teamwork domain, the pursuit domain, which has been frequently used in the literature on multiagent systems [59]. In a...

4 | Prediction, Learning and Games
- Cesa-Bianchi, Lugosi
- 2006
Citation Context ...ons regarding the actions of the teammate. With the loss defined in (1), it is possible to recast the ad hoc teamwork problem as a simple online learning problem, for which a rich body of work exists [12]. The remainder of this subsection details the construction of such an online learning problem and the corresponding application of a standard prediction algorithm, providing some general performance gua...
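A standard prediction algorithm from this body of work [12] is the exponentially weighted average forecaster. The sketch below is a generic version under assumed ingredients (each "expert" τ predicts the teammate's next action; a 0/1 loss; a learning rate eta), not the paper's exact construction:

```python
import math

# Exponentially weighted forecaster: each expert's weight decays by
# exp(-eta * loss) after every round; the forecast follows the weighted majority.
def exp_weights(expert_predictions, outcomes, eta=0.5):
    """expert_predictions[n][k]: expert k's predicted action at round n."""
    k = len(expert_predictions[0])
    weights = [1.0] * k
    forecasts = []
    for preds, y in zip(expert_predictions, outcomes):
        total = sum(weights)
        score = {}
        for w, p in zip(weights, preds):
            score[p] = score.get(p, 0.0) + w / total
        forecasts.append(max(score, key=score.get))
        # 0/1 loss: an expert is penalized only when its prediction was wrong.
        weights = [w * math.exp(-eta) if p != y else w
                   for w, p in zip(weights, preds)]
    return forecasts

# Expert 1 always matches the observed action, so the forecaster locks onto it.
preds = [["a", "b"]] * 4
outcomes = ["b"] * 4
assert exp_weights(preds, outcomes)[1:] == ["b", "b", "b"]
```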

4 | Weighted synergy graphs for effective team formation with heterogeneous ad hoc agents.
- Liemhetcharat, Veloso
- 2014
Citation Context ...e team. The work also shows that the approach has a predictive nature, whereby the method can be trained for a task and used successfully for another task. In a related work, Liemhetcharat and Veloso [41] introduce the notion of a weighted synergy graph for role assignment, which enables a set of agents (represented as vertices in the graph) to reason about team formation. Given a set of agents and a ta...

4 | On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision problems
- Madani, Hanks, Condon
- 1999
Citation Context ... V*(pn) = max_{a∈A} Σ_x pn(x) [ r(x, a) + γ Σ_{x′,z} P(x′ | x, a) O(z | x′, a) V*(B(pn, z, a)) ]. Unfortunately, partially observable Markov decision problems are generally undecidable [43], and V* cannot be computed exactly. Instead, a wide range of approximate methods are available in the literature, the most popular of which are, perhaps, point-based methods [46]. In this paper, we ...
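On small problems the belief-space Bellman backup in this excerpt can be evaluated directly by finite-horizon recursion. The sketch below does so for a hypothetical two-state POMDP (an informative "listen" action and a risky "act" action; all tables are illustrative assumptions, not the paper's model):

```python
# Finite-horizon evaluation of the belief-space Bellman backup
#   V(b) = max_a sum_x b(x) [ r(x,a) + gamma * sum_{x',z} P(x'|x,a) O(z|x',a) V(B(b,a,z)) ]
# Hypothetical model: action 0 "listens" (state unchanged, 85%-accurate observation);
# action 1 "acts" (reward +1 in state 0, -1 in state 1, then resets to uniform).
P = {0: [[1.0, 0.0], [0.0, 1.0]], 1: [[0.5, 0.5], [0.5, 0.5]]}      # P[a][x][x']
O = {0: [[0.85, 0.15], [0.15, 0.85]], 1: [[0.5, 0.5], [0.5, 0.5]]}  # O[a][x'][z]
R = {0: [0.0, 0.0], 1: [1.0, -1.0]}                                 # R[a][x]
GAMMA = 0.9

def belief_update(b, a, z):
    unnorm = [sum(b[x] * P[a][x][x2] for x in range(2)) * O[a][x2][z]
              for x2 in range(2)]
    pz = sum(unnorm)  # probability of observing z under (b, a)
    post = [u / pz for u in unnorm] if pz > 0 else list(b)
    return post, pz

def V(b, horizon):
    if horizon == 0:
        return 0.0
    best = float("-inf")
    for a in (0, 1):
        val = sum(b[x] * R[a][x] for x in range(2))
        for z in (0, 1):
            post, pz = belief_update(b, a, z)
            if pz > 0:
                val += GAMMA * pz * V(post, horizon - 1)
        best = max(best, val)
    return best

assert abs(V([1.0, 0.0], 1) - 1.0) < 1e-9  # certain of state 0: act immediately
```

With an uncertain belief [0.5, 0.5] and two steps to go, listening first is strictly better than acting, which is exactly the exploration/exploitation tradeoff the belief-space formulation captures.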

4 | Prediction, optimization and learning
- Nachbar
- 1997

4 | Rates of convergence of posterior distributions. Ann Statist 29(3):687–714
- Shen, Wasserman
- 2001

3 | Cooperating with a Markovian ad hoc teammate.
- Chakraborty, Stone
- 2013
Citation Context ...in (11) reduces to an update of the belief over the target task, given by p_{n+1}(τ) ≜ P[T* = τ | H(n+1)] = ξ P[A_{−α}(n+1) = a_{−α} | T* = τ, H(n) = h_{1:n}] P[T* = τ | H(n) = h_{1:n}] = ξ E_τ^{−α}(h_{1:n}, a_{−α}) p_n(τ), (13) where ξ is a normalization constant and E_τ^{−α} is defined in (3). The decision rule P obtained from Perseus thus maps, at each time-step n, a pair (H_N(n), p_n) to an action P(H_N(n), p_n) ∈ A, where p_n...
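Stripped of the POMDP bookkeeping, update (13) is a standard Bayes rule over the finite set of candidate tasks: multiply the current belief by the likelihood E_τ of the observed teammate action and renormalize. A minimal sketch, with a hypothetical three-task example and made-up likelihood values:

```python
# Task-belief update (13): p_{n+1}(tau) = xi * E_tau(h, a) * p_n(tau),
# where xi normalizes the posterior to sum to 1.
def update_task_belief(p_n, likelihoods):
    """p_n[tau]: current belief; likelihoods[tau]: E_tau(h_{1:n}, a_{-alpha})."""
    unnorm = {tau: likelihoods[tau] * p_n[tau] for tau in p_n}
    xi = 1.0 / sum(unnorm.values())  # normalization constant
    return {tau: xi * u for tau, u in unnorm.items()}

# Hypothetical example: uniform prior over three tasks; the observed teammate
# action is much more likely under task A.
p0 = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
lik = {"A": 0.8, "B": 0.1, "C": 0.1}
p1 = update_task_belief(p0, lik)
assert abs(sum(p1.values()) - 1.0) < 1e-12
assert abs(p1["A"] - 0.8) < 1e-9  # 0.8 / (0.8 + 0.1 + 0.1)
```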

3 | The exponential rates of convergence of posterior distributions
- Fu, Kass
- 1988

3 | Anytime point-based approximations for large POMDPs
- Pineau, Gordon, Thrun
- 2006
Citation Context ...nerally undecidable [43], and V* cannot be computed exactly. Instead, a wide range of approximate methods are available in the literature, the most popular of which are, perhaps, point-based methods [46]. In this paper, we adopt the popular Perseus point-based algorithm [56]. POMDP formulation of the ad hoc teamwork problem: In order to apply the Perseus algorithm to the ad hoc teamwork problem, we st...

3 | The design of a proactive personal agent for task management
- Yorke-Smith, Saadati, et al.
- 2012
Citation Context ...ion and distribution [50] and personal assistants in smartphones that are able to keep track of users’ calendars and preferences in order to make recommendations and assist the users in several tasks [67]. As technology evolves, so will the autonomy and perceptual/actuation capabilities of such agents, prompting the need for truly autonomous agents that are able to co-exist with other (different) agen...

2 | Entropy bounds on Bayesian learning
- Gossner, Tomala
- 2008

1 | A game-theoretic model and best-response learning method for ad hoc coordination in multiagent systems
- Albrecht, Ramamoorthy
- 2013
Citation Context ...d to address ad hoc teamwork. For instance, Wu et al [66] propose an online planning algorithm, where the planning problem is approximated by solving a series of stage games. Albrecht and Ramamoorthy [3] formulate the coordination problem in ad hoc teams as a stochastic Bayesian game and derive a best-response rule. Our work is also based on a game-theoretic framework: tasks are modeled as K-player ...

1 | Teamwork with limited knowledge of teammates
- Barrett, Stone, et al.
- 2013
Citation Context ...of several ad hoc teamwork strategies in a more open and complex teamwork domain, the pursuit domain, which has been frequently used in the literature on multiagent systems [59]. In a subsequent work [7], the same authors extend the previous ad hoc teamwork strategies to make use of a library of learned teammates’ models and use it for the planning step. This approach is closely related to our online...

1 | Combining expert advice in reactive environments
- de Farias, Megiddo
- 2006
Citation Context ...nst that of the “experts” Eτ, τ ∈ T, for any possible but fixed history hn. In this sense, the bound in Theorem 1 may be somewhat misleading, and an approach closer to that of de Farias and Megiddo [15] would be more suited. In the next section, we propose a more evolved approach to overcoming the limitations just identified. In the remainder of this section, we revisit Example 1 and illustrate the...

1 | Role-based ad hoc teamwork
- Genter, Agmon, Stone
- 2011
Citation Context ...ehavior of the ad hoc agent, which is closely related with agent reactivity. Our paper goes beyond this analysis and proposes two novel approaches to address the ad hoc teamwork problem. Genter et al [22] propose a role-based approach for ad hoc teams, where an agent must identify a role, within a set of possible roles, that yields optimal utility for the team. The work also shows that the approach ha...

1 | On Bayesian upper confidence bounds for bandit problems
- Kaufmann, Cappé, Garivier
- 2012
Citation Context ...f the prior p0, alluded to in (i), suggests a Bayesian approach to the ad hoc teamwork problem, possibly along the lines of the seminal work of Gittins [26] or the more recent works of Kaufmann et al [35, 36]. As for (ii), as briefly discussed in Section 3.3, it impacts the way in which regret is defined. In particular, the performance of our predictor should not be measured against that of the best exper...

1 | An overview on opponent modeling in RoboCup soccer simulation 2D
- Pourmehr, Dadkhah
- 2012
Citation Context ...y a target task from the observed behavior of other agent(s). Similarly, it is possible to identify teammate identification with the large body of work on learning in games [20] and opponent modeling [21, 48], where an agent must predict the behavior of other agents from observing their actions. Finally, planning is clearly related with the growing body of work on decentralized/distributed planning [27, 5...

1 | Agentswitch: Towards smart energy tariff selection.
- Ramchurn, Osborne, et al.
- 2013
Citation Context ...ect information and act upon it. Examples include smart grids that autonomously gather information and act upon it to improve the efficiency and reliability of electricity production and distribution [50] and personal assistants in smartphones that are able to keep track of users’ calendars and preferences in order to make recommendations and assist the users in several tasks [67]. As technology evolv...

1 | Formal models and algorithms for decentralized decision making under uncertainty
- Seuken, Zilberstein
- 2008
Citation Context ...1, 48], where an agent must predict the behavior of other agents from observing their actions. Finally, planning is clearly related with the growing body of work on decentralized/distributed planning [27, 52]. We refer to the work of Stone et al [60] for a detailed discussion. (Footnote 2: In the nomenclature of Fig. 1, we take planning in its broadest sense, which includes any offline and/or online reasoning about which act...

1 | A survey of point-based POMDP solvers
- Shani, Pineau, Kaplow
- 2013
Citation Context ...odest growth is expected because the number of tasks impacts linearly the dimension of the POMDP’s state-space, and point-based methods have a computational complexity that is polynomial on the latter [53]. (Footnote 13: We emphasize that the reported times are offline planning times. The online learning approach requires no offline planning and hence no time is reported for that approach. All times were obtained...

1 | Decentralized planning under uncertainty for teams of communicating agents
- Spaan, Gordon, Vlassis
- 2006
Citation Context ...ation to attain more efficient teamwork. This is certainly a challenging problem because it would require the ad hoc agent to first acquire some sort of model of the teammates’ communication protocol [57]. Also of interest would be the application of ad hoc teamwork with human agents. This is closely related with recent research on symbiotic human-robot interaction, as proposed in the work of Rosentha...

1 | To teach or not to teach? Decision-making under uncertainty in ad hoc teams
- Stone, Kraus
- 2010
Citation Context ... our knowledge, most research on ad hoc teamwork is focused on the planning step, relying on different assumptions that simplify the other challenges associated with it. For instance, Stone and Kraus [58] propose one of the first algorithms for ad hoc teamwork. The authors formulate the ad hoc teamwork problem using the formalism of multi-armed bandits, in order to maximize the expected sum of payoffs...

1 | Ad hoc autonomous agent teams: Collaboration without pre-coordination
- Stone, Kaminka, Kraus, Rosenschein
- 2010
Citation Context ...icit communication. The challenge of developing autonomous agents that are capable of cooperatively engaging with other (unknown) agents in a joint common task is known as the ad hoc teamwork problem [60]. It is a new and exciting research area, recently introduced in the pioneering work of Stone et al [60], and is closely related to other areas in the literature on autonomous agents and multiagent syste...

1 | Leading a best-response teammate in an ad hoc team
- Stone, Kaminka, Rosenschein
- 2010
Citation Context ...ach for action selection. The main difference in our approach is the explicit reasoning about task identification and the consideration of an adaptive teammate model. In a different work, Stone et al [61] describe an algorithm for an ad hoc agent to lead a best-response agent to perform actions that yield optimal joint utility. Agmon and Stone [2] extend the work of Stone et al [61] to the more genera...

1 | On rates of convergence for posterior distributions in infinite-dimensional models. The Annals of Statistics 35(2):738–746
- Walker, Lijoi, Prünster
- 2007