A bilinear programming approach for multiagent planning
Venue: | Journal of Artificial Intelligence Research |
Citations: | 16 - 2 self |
Citations
2633 | A Course in Game Theory - Osborne, Rubinstein - 1994 |
1900 | Markov Decision Processes: Discrete Stochastic Dynamic Programming - Puterman - 1994 |
Citation Context ...re arises from introducing the primitive and compound events in the work of Becker et al. (2004). This reward structure is necessary to capture the characteristics of the Mars rover benchmark. Interestingly, this extension does not complicate our proposed solution methods in any way. Note that the stochastic shortest path formulation (right side of Figure 1) inherently eliminates any loops because time always advances when an action is taken. Therefore, every state in that representation may be visited at most once. This property is commonly used when an MDP is formulated as a linear program (Puterman, 2005). The solution of a DEC-MDP is a deterministic stationary policy π = (π1, π2), where πi : Si → Ai is the standard MDP policy (Puterman, 2005) for agent i. In particular, πi(si) represents the action taken by agent i in state si. To define the bilinear program, we use variables x(s1, a1) to denote the probability that agent 1 visits state s1 and takes action a1 and y(s2, a2) to denote the same for agent 2. These are the standard dual variables in MDP formulation. Given a solution in terms of x for agent 1, the policy is calculated for s ∈ S1 as follows, breaking ties arbitrarily. π1(s) = arg m... |
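The policy-extraction step quoted above can be sketched in a few lines. This is a minimal illustration, assuming the truncated formula picks, for each state, the action with the largest value of x(s, a); the function name `extract_policy` and the example values are hypothetical and do not come from the paper.

```python
def extract_policy(x):
    """Map each state to the action with the largest occupancy value.

    x: dict mapping (state, action) -> value of the dual variable x(s, a).
    Ties are broken arbitrarily (here: the first action encountered wins).
    Assumption: the policy rule is an argmax over actions, as the truncated
    formula in the quoted context suggests.
    """
    best = {}
    for (state, action), value in x.items():
        if state not in best or value > best[state][1]:
            best[state] = (action, value)
    return {state: action for state, (action, _) in best.items()}


# Hypothetical occupancy values for a two-state, two-action agent.
x_example = {
    ("s0", "go"): 0.7,
    ("s0", "wait"): 0.0,
    ("s1", "go"): 0.0,
    ("s1", "wait"): 0.7,
}
policy = extract_policy(x_example)
```

Because the quoted context says every state is visited at most once, a deterministic policy of this kind loses nothing relative to the randomized one implied by x.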
786 | The Linear Complementarity Problem. - Cottle, Pang, et al. - 1992 |
Citation Context ...ith a mixed integer linear program (MILP), derived for Eq. (1) as Petrik and Zilberstein (2007b) describe. Although Eq. (1) can also be modeled as a linear complementarity problem (LCP) (Murty, 1988; =-=Cottle et al., 1992-=-), we do not evaluate that option experimentally because LCPs are closely related to MILPs (Rosen, 1986). We expect these two formulations to exhibit similar performance. We also do not compare to any... |
510 | Modeling Bounded Rationality - Rubinstein - 1998 |
Citation Context ...ng certain independence assumptions (Becker, Zilberstein, & Lesser, 2003) or by adding explicit communication actions (Goldman & Zilberstein, 2008). DEC-POMDPs are closely related to extensive games (=-=Rubinstein, 1997-=-). In fact, any DEC-POMDP represents an exponentially larger extensive game with a common objective. Unfortunately, DEC-POMDPs with just two agents are intractable in general, unlike MDPs that can be ... |
411 | The complexity of decentralized control of Markov decision processes. - Bernstein, Zilberstein, et al. - 2000 |
Citation Context ...out the world–must cooperate with each other in order to achieve some joint objective. Such problems are common in practice and can be modeled as decentralized partially observable MDPs (DEC-POMDPs) (=-=Bernstein, Zilberstein, & Immerman, 2000-=-). Some refinements of this model have been studied, for example by making certain independence assumptions (Becker, Zilberstein, & Lesser, 2003) or by adding explicit communication actions (Goldman &... |
392 | Global Optimization: Deterministic Approaches - Horst, Tuy - 1996 |
Citation Context ...the bilinear term of the objective function are independently constrained. The theory of nonseparable bilinear programs is much more complicated and the corresponding algorithms are not as efficient (=-=Horst & Tuy, 1996-=-). Thus, we limit the discussion in this paper to separable bilinear programs and often omit the term “separable”. As discussed later in more detail, a separable bilinear program may be seen as a conc... |
191 | Taming decentralized POMDPs: Towards efficient policy computation for multiagent settings. - Nair, Tambe, et al. - 2003 |
Citation Context ...ram. While the algorithm often performs well in practice, it tends to converge to a suboptimal solution (Mangasarian, 1995). When applied to DEC-MDPs, this algorithm is essentially identical to JESP (=-=Nair, Tambe, Yokoo, Pynadath, & Marsella, 2003-=-)–one of the early solution methods. In the following, we use f(w, x, y, z) to denote the objective value of Eq. (1). The rest of this section presents a new anytime algorithm for solving bilinear pro... |
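The convergence to a suboptimal solution mentioned above is easy to reproduce on a toy instance. The sketch below is not the paper's algorithm: it assumes a simplified bilinear objective x^T C y in which x and y each range over a probability simplex, so the best response to a fixed opponent is always a vertex (a single row or column). The names `best_vertex` and `alternate` and the matrix `C` are hypothetical.

```python
def best_vertex(scores):
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def alternate(C, j0, max_iters=100):
    """Alternate best responses on x^T C y over two probability simplices.

    Starting from column j0, repeatedly pick the best row for the current
    column and the best column for the current row until nothing changes.
    Returns (row, column, objective value) at the fixed point.
    """
    j = j0
    i = best_vertex([row[j] for row in C])
    for _ in range(max_iters):
        new_j = best_vertex(C[i])
        new_i = best_vertex([row[new_j] for row in C])
        if (new_i, new_j) == (i, j):
            break
        i, j = new_i, new_j
    return i, j, C[i][j]


# Two coordinated outcomes with different payoffs.
C = [[3, 0],
     [0, 5]]
local = alternate(C, j0=0)   # stuck at the payoff-3 equilibrium
better = alternate(C, j0=1)  # reaches the payoff-5 optimum
```

Neither agent can improve on (0, 0) by changing its own choice alone, which is why the iteration stops there even though (1, 1) is better; the outcome depends entirely on the starting point.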
166 | Sequential optimality and coordination in multiagent systems. - Boutilier - 1999 |
Citation Context ... model to cooperative multiagent problems. One possibility is to assume that all the agents share all the information about the underlying state. This results in a multiagent Markov decision process (=-=Boutilier, 1999-=-), which is essentially an MDP with a factored action set. A more complex alternative is to allow only partial sharing of information among agents. In these settings, several agents–each having differ... |
158 | Complementarity, Linear and Nonlinear Programming - Murty |
Citation Context ...nal CSA and with a mixed integer linear program (MILP), derived for Eq. (1) as Petrik and Zilberstein (2007b) describe. Although Eq. (1) can also be modeled as a linear complementarity problem (LCP) (=-=Murty, 1988-=-; Cottle et al., 1992), we do not evaluate that option experimentally because LCPs are closely related to MILPs (Rosen, 1986). We expect these two formulations to exhibit similar performance. We also ... |
116 | Fast algorithms for finding randomized strategies in game trees. - Koller, Megiddo, et al. - 1994 |
107 | Solving transition independent decentralized Markov decision processes. - Becker - 2004 |
Citation Context ...d through a common reward function that depends on their states. The coverage set algorithm (CSA) was the first optimal algorithm to solve efficiently transition and observation independent DEC-MDPs (=-=Becker, Zilberstein, Lesser, & Goldman, 2004-=-). By exploiting the fact that the interaction between the agents is limited compared to their individual local problems, CSA can solve problems that cannot be solved by the more general exact DEC-POM... |
92 | Approximate solutions for partially observable stochastic games with common payoffs. - Emery-Montemerlo, Gordon, et al. - 2004 |
Citation Context ...blems (Becker, Lesser, & Zilberstein, 2004; Kim, Nair, Varakantham, Tambe, & Yokoo, 2006) or provide only approximate solutions (=-=Emery-Montemerlo, Gordon, Schneider, & Thrun, 2004-=-; Nair, Roth, Yokoo, & Tambe, 2004; Seuken & Zilberstein, 2007). In this paper, we introduce an efficient algorithm for several restricted classes, most notably decentralized MDPs with transition and ... [Page footer: ©2009 AI Access Foundation. All rights reserved.] |
87 | Algorithms for Partially Observable Markov Decision Processes. - Cheng - 1988 |
Citation Context ...the intersection points of planes defined by the current solutions in $\tilde{X}$. That guarantees that g(y) is eventually known precisely (Becker et al., 2004). A similar approach was also taken for POMDPs (=-=Cheng, 1988-=-). The upper bound on the number of intersection points in CSA is $\binom{|\tilde{X}|}{\dim Y}$. The principal problem is that the bound is exponential in the dimension of Y, and experiments do not show a slower... |
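To see how quickly the CSA bound grows, the following sketch evaluates it for a fixed number of stored solutions. It assumes the garbled bound in the quoted context is the binomial coefficient "size of the solution set choose dim Y"; that reading, the function name `csa_intersection_bound`, and the sample sizes are assumptions made for illustration.

```python
from math import comb


def csa_intersection_bound(num_solutions, dim_y):
    """Upper bound on intersection points, read as C(num_solutions, dim_y).

    Assumption: the bound in the quoted context is a binomial coefficient.
    """
    return comb(num_solutions, dim_y)


# With 20 stored solutions, the bound as dim Y grows from 1 to 8.
bounds = [csa_intersection_bound(20, d) for d in (1, 2, 4, 8)]
# bounds == [20, 190, 4845, 125970]
```

Under this reading, reducing the dimension of Y pays off far more than reducing the number of stored solutions.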
78 | Transition-independent decentralized Markov decision processes. - Becker, Zilberstein, et al. - 2003 |
Citation Context ...entralized partially observable MDPs (DEC-POMDPs) (Bernstein, Zilberstein, & Immerman, 2000). Some refinements of this model have been studied, for example by making certain independence assumptions (=-=Becker, Zilberstein, & Lesser, 2003-=-) or by adding explicit communication actions (Goldman & Zilberstein, 2008). DEC-POMDPs are closely related to extensive games (Rubinstein, 1997). In fact, any DEC-POMDP represents an exponentially la... |
66 | Formal models and algorithms for decentralized decision making under uncertainty. - Seuken, Zilberstein - 2008 |
Citation Context ...ntractable in general, unlike MDPs that can be solved in polynomial time. Despite recent progress in solving DEC-POMDPs, even state-of-the-art algorithms are generally limited to very small problems (=-=Seuken & Zilberstein, 2008-=-). This has motivated the development of algorithms that either solve a restricted class of problems (Becker, Lesser, & Zilberste... |
53 | Decentralized Markov decision processes with event-driven interactions. - Becker, Lesser, et al. - 2004 |
Citation Context ...izon), to handle interdependent observations, and to find Nash equilibria in competitive settings. 2.1 DEC-MDPs As mentioned previously, any transition-independent and observation-independent DEC-MDP (=-=Becker et al., 2004-=-) may be formulated as a bilinear program. Intuitively, a DEC-MDP is transition independent when no agent can influence the other agents’ transitions. A DEC-MDP is observation independent when no agent... |
41 | Bilinear separation of two sets in n-space - Bennett, Mangasarian - 1993 |
Citation Context ...orithm. For this purpose we use the Mars rover problem described earlier. We compared our algorithm with the original CSA and with a mixed integer linear program (MILP), derived for Eq. (1) as Petrik and Zilberstein (2007b) describe. Although Eq. (1) can also be modeled as a linear complementarity problem (LCP) (Murty, 1988; Cottle et al., 1992), we do not evaluate that option experimentally because LCPs are closely related to MILPs (Rosen, 1986). We expect these two formulations to exhibit similar performance. We also do not compare to any of the methods described by Horst and Tuy (1996) and Bennett and Mangasarian (1992) due to their very different nature and high complexity, and because some of these algorithms do not provide any optimality guarantees. In our experiments, we applied the algorithm to randomly generated problem instances with the same parameters that Becker et al. (2003, 2004) used. Each problem instance includes 2 rovers and 6 sites. At each site, the rovers can decide to perform an experiment or to skip the site. Performing experiments takes some time, and all the experiments must be performed in 15 time units. The time required to perform an experiment is drawn from a discrete normal distri... |
40 | Communication for improving policy computation in distributed POMDPs. - Nair, Roth, et al. - 2004 |
Citation Context ...Lesser, & Zilberstein, 2004; Kim, Nair, Varakantham, Tambe, & Yokoo, 2006) or provide only approximate solutions (Emery-Montemerlo, Gordon, Schneider, & Thrun, 2004; =-=Nair, Roth, Yokoo, & Tambe, 2004-=-; Seuken & Zilberstein, 2007). In this paper, we introduce an efficient algorithm for several restricted classes, most notably decentralized MDPs with transition and observation independence (Becker e... |
28 | Decentralized control of a multiple access broadcast channel: performance bounds. - Ooi, Wornell - 1996 |
Citation Context ...rik & Zilberstein, 2007b). This is particularly useful for infinite-horizon DEC-MDPs. For example, consider the infinite-horizon version of the Multiple Access Broadcast Channel (MABC) (Rosberg, 1983; =-=Ooi & Wornell, 1996-=-). In this problem, which has been used widely in recent studies of decentralized decision making, two communication devices share a single channel, and they need to periodically transmit some data. H... |
22 | The linear complementarity problem as a separable bilinear program - Mangasarian - 1995 |
Citation Context ...ipulation (Pang, Trinkle, & Lo, 1996), bilinear separation (Bennett & Mangasarian, 1992), and even general linear complementarity problems (=-=Mangasarian, 1995-=-). We focus on multiagent planning problems where this formulation turns out to be particularly effective. Definition 1. A separable bilinear program in the normal form is defined as follows: maximize... |
21 | Anytime coordination using separable bilinear programs. - Petrik, Zilberstein - 2007 |
Citation Context ...inds of separable bilinear problems. When the algorithm is applied to DEC-MDPs, it improves efficiency by several orders of magnitude compared with previous state-of-the-art algorithms (Becker, 2006; =-=Petrik & Zilberstein, 2007-=-a). In addition, the algorithm provides useful runtime bounds on the approximation error, which makes it more useful as an anytime algorithm. Finally, the algorithm is formulated for general separable... |
13 | Exploiting locality of interaction in networked distributed POMDPs. - Kim, Nair, et al. - 2006 |
Citation Context ...s motivated the development of algorithms that either solve a restricted class of problems (Becker, Lesser, & Zilberstein, 2004; =-=Kim, Nair, Varakantham, Tambe, & Yokoo, 2006-=-) or provide only approximate solutions (Emery-Montemerlo, Gordon, Schneider, & Thrun, 2004; Nair, Roth, Yokoo, & Tambe, 2004; Seuken & Zilberstein, 2007). In this paper, we introduce an efficient alg... |
13 | A complementarity approach to a quasistatic rigid body motion problem. - Pang, Trinkle, et al. - 1996 |
Citation Context ...tiagent planning problems that can be formulated as such. In addition to multiagent planning problems, bilinear programs can be used to solve a variety of other problems such as robotic manipulation (=-=Pang, Trinkle, & Lo, 1996-=-), bilinear separation (Bennett & Mangasarian, 1992), and even general linear complementarity problems (Mangasarian, 1995). We focus on mult... |
13 | A finitely convergent algorithm for bilinear programming problems using polar cuts and disjunctive face cuts. - Sherali, Shetty - 1980 |
Citation Context ... problems can be treated within this framework. Besides multiagent coordination problems, bilinear programs have been previously used to solve problems in operations research and global optimization (=-=Sherali & Shetty, 1980-=-; White, 1992; Gabriel, García-Bertrand, Sahakij, & Conejo, 2005). Global optimization deals with finding the optimal solutions to problems with multi-extremal objective function. Solution techniques o... |
13 | Linear Programming: Foundations and Extensions (2nd edition). - Vanderbei - 2001 |
Citation Context ...inequalities: maximize over $x, y$ the objective $x^{\mathsf{T}} C y$ subject to $A_1 x \le b_1$, $x \ge 0$, $A_2 y \le b_2$, $y \ge 0$ (2). The latter formulation can be easily transformed into the normal form using standard transformations of linear programs (=-=Vanderbei, 2001-=-). In particular, we can introduce slack... [Footnote 1: It is possible to define the dimensionality in terms of x, or the minimum of dimensions of x and y. The issue is discussed in Appendix B.] |
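The conversion from inequality constraints to equalities mentioned above can be sketched as follows. This assumes the truncated sentence refers to ordinary slack variables, one per inequality, and that the objective matrix is padded with zeros so the slacks do not affect the bilinear term. The helper names `add_slack` and `pad_objective` and all numbers are hypothetical; the paper's exact normal form is not visible in the quoted context.

```python
def add_slack(A, b):
    """Turn A x <= b, x >= 0 into [A | I] [x; s] = b with x, s >= 0."""
    m = len(A)
    A_eq = [row + [1.0 if i == k else 0.0 for k in range(m)]
            for i, row in enumerate(A)]
    return A_eq, list(b)


def pad_objective(C, extra_rows, extra_cols):
    """Extend C with zero rows/columns so slack variables earn no reward."""
    width = len(C[0]) + extra_cols
    padded = [row + [0.0] * extra_cols for row in C]
    padded += [[0.0] * width for _ in range(extra_rows)]
    return padded


# Hypothetical data: x has 2 components, y has 1.
A1, b1 = [[1.0, 2.0]], [4.0]
A2, b2 = [[1.0], [3.0]], [2.0, 3.0]
C = [[1.0], [2.0]]

A1_eq, _ = add_slack(A1, b1)
A2_eq, _ = add_slack(A2, b2)
C_pad = pad_objective(C, extra_rows=len(A1), extra_cols=len(A2))
```

Each agent's constraint block gains one slack column per inequality, and the padded objective keeps the original value of x^T C y for every feasible point.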
8 | Communication-based decomposition mechanisms for decentralized MDPs. - Goldman, Zilberstein - 2008 |
Citation Context ...man, 2000). Some refinements of this model have been studied, for example by making certain independence assumptions (Becker, Zilberstein, & Lesser, 2003) or by adding explicit communication actions (=-=Goldman & Zilberstein, 2008-=-). DEC-POMDPs are closely related to extensive games (Rubinstein, 1997). In fact, any DEC-POMDP represents an exponentially larger extensive game with a common objective. Unfortunately, DEC-POMDPs wit... |
8 | Average reward decentralized Markov decision processes. - Petrik, Zilberstein - 2007 |
Citation Context ...inds of separable bilinear problems. When the algorithm is applied to DEC-MDPs, it improves efficiency by several orders of magnitude compared with previous state-of-the-art algorithms (Becker, 2006; =-=Petrik & Zilberstein, 2007-=-a). In addition, the algorithm provides useful runtime bounds on the approximation error, which makes it more useful as an anytime algorithm. Finally, the algorithm is formulated for general separable... |
7 | Optimal decentralized control in a multiaccess channel with partial information. - Rosberg - 1983 |
Citation Context ...ar program (Petrik & Zilberstein, 2007b). This is particularly useful for infinite-horizon DEC-MDPs. For example, consider the infinite-horizon version of the Multiple Access Broadcast Channel (MABC) (=-=Rosberg, 1983-=-; Ooi & Wornell, 1996). In this problem, which has been used widely in recent studies of decentralized decision making, two communication devices share a single channel, and they need to periodically ... |
7 | Memory bounded dynamic programming for DEC-POMDPs. - Seuken, Zilberstein - 2007 |
Citation Context ...Lesser, & Zilberstein, 2004; Kim, Nair, Varakantham, Tambe, & Yokoo, 2006) or provide only approximate solutions (Emery-Montemerlo, Gordon, Schneider, & Thrun, 2004; Nair, Roth, Yokoo, & Tambe, 2004; =-=Seuken & Zilberstein, 2007-=-). In this paper, we introduce an efficient algorithm for several restricted classes, most notably decentralized MDPs with transition and observation independence (Becker et al., 2003). For the sake o... |
6 | A linear programming approach to solving bilinear programmes. - White - 1992 |
Citation Context ...pproach, it is possible to reduce the dimensionality of such problems automatically, without extensive modeling effort. This makes it easy to apply our new method in practice. When applied to DEC-MDPs, the algorithm is much faster than the existing CSA method, on average reducing computation time by four orders of magnitude. We also show that a variety of other coordination problems can be treated within this framework. Besides multiagent coordination problems, bilinear programs have been previously used to solve problems in operations research and global optimization (Sherali & Shetty, 1980; White, 1992; Gabriel, García-Bertrand, Sahakij, & Conejo, 2005). Global optimization deals with finding the optimal solutions to problems with multi-extremal objective function. Solution techniques often share the same idea and are based on cutting plane methods. The main idea is to iteratively restrict the set of feasible solutions, while improving the incumbent solution. Horst and Tuy (1996) provide an excellent overview of these techniques. These algorithms have different characteristics and cannot be directly compared to the algorithm we deve... |
5 | Exploiting Structure in Decentralized Markov Decision Processes. - Becker - 2006 |
Citation Context ...olving these kinds of separable bilinear problems. When the algorithm is applied to DEC-MDPs, it improves efficiency by several orders of magnitude compared with previous state-of-the-art algorithms (=-=Becker, 2006-=-; Petrik & Zilberstein, 2007a). In addition, the algorithm provides useful runtime bounds on the approximation error, which makes it more useful as an anytime algorithm. Finally, the algorithm is form... |
5 | Solution of general LCP by 0-1 mixed integer programming. - Rosen - 1986 |
Citation Context ...ough Eq. (1) can also be modeled as a linear complementarity problem (LCP) (Murty, 1988; Cottle et al., 1992), we do not evaluate that option experimentally because LCPs are closely related to MILPs (=-=Rosen, 1986-=-). We expect these two formulations to exhibit similar performance. We also do not compare to any of the methods described by Horst and Tuy (1996) and Bennett and Mangasarian (1992) due to their very ... |
4 | Mathematical programming methods for decentralized POMDPs. - Aras - 2008 |
Citation Context ...tengel, 1994), and integer linear program formulation of DEC-POMDPs (Aras & Charpillet, 2007). The approach we develop is closely related to event-driven DEC-POMDPs (Becker et al., 2004), but it is in general more efficient. Nevertheless, the size of the bilinear program is exponential in the size of the DEC-POMDP. This can be expected since solving DEC-POMDPs is NEXP-complete (Bernstein et al., 2000), while solving bilinear programs is NP-complete (Mangasarian, 1995). Because the general formulation in this case is somewhat cumbersome, we only illustrate it using the following simple example. Aras (2008) provides the details of a similar construction. Example 6. Consider the problem depicted in Figure 3, assuming that the agents are cooperative. The actions of the other agent are not observable, as denoted by the information sets. This approach can be generalized to any problem with any observable sets as long as the perfect recall condition is satisfied. Agents satisfy the perfect recall condition when they remember the set of actions taken in the prior moves (Osborne & Rubinstein, 1994). Rewards are only collected in the leaf-nodes in this case. The variables on the edges represent the prob... |
3 | Interaction structure and dimensionality in decentralized problem solving. - Allen, Petrik, et al. - 2008 |
3 | A practical approach to approximate bilinear functions in mathematical programming problems by using Schur's decomposition and SOS type 2 variables. - Gabriel, García-Bertrand, et al. - 2005 |
Citation Context ...framework. Besides multiagent coordination problems, bilinear programs have been previously used to solve problems in operations research and global optimization (Sherali & Shetty, 1980; White, 1992; =-=Gabriel, García-Bertrand, Sahakij, & Conejo, 2005-=-). Global optimization deals with finding the optimal solutions to problems with multi-extremal objective function. Solution techniques often share the same idea and are based on cutting plane methods... |
1 | A mixed integer linear programming method for the finite-horizon Dec-POMDP problem. - Aras, Charpillet - 2007 |
Citation Context ...The approach in this case may be similar to linear complementarity problem formulation of extensive games (Koller, Megiddo, & von Stengel, 1994), and integer linear program formulation of DEC-POMDPs (=-=Aras & Charpillet, 2007-=-). The approach we develop is closely related to event-driven DEC-POMDPs (Becker et al., 2004), but it is in general more efficient. Nevertheless, the size of the bilinear program is exponential in the... |
1 | [title missing in source] - Bertsekas, Tsitsiklis - 1996 |
1 | Increased flexibility and robustness of Mars rovers. - Bresina, Golden, et al. - 1999 |
Citation Context ...MDP to refer to transition and observation independent DEC-MDP. The DEC-MDP model has proved useful in several multiagent planning domains. One example that we use is the Mars rover planning problem (=-=Bresina, Golden, Smith, & Washington, 1999-=-), first formulated as a DEC-MDP by Becker et al. (2003). This domain involves two autonomous rovers that visit several sites in a given order and may decide to perform certain scientific experiments ... |