Efficient selectivity and backup operators in Monte-Carlo tree search
- In: Proceedings Computers and Games 2006, 2006
Cited by 66 (2 self)
Monte-Carlo evaluation consists in estimating a position by averaging the outcome of several random continuations, and can serve as an evaluation function at the leaves of a min-max tree. This paper presents a new framework to combine tree search with Monte-Carlo evaluation that does not separate between a min-max phase and a Monte-Carlo phase. Instead of backing up the min-max value close to the root, and the average value at some depth, a more general backup operator is defined that progressively changes from averaging to min-max as the number of simulations grows. This approach provides a fine-grained control of the tree growth, at the level of individual simulations, and allows efficient selectivity methods. This algorithm was implemented in a 9 × 9 Go-playing program, Crazy Stone, that won the 10th KGS computer-Go tournament.
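The first sentence of this abstract describes plain Monte-Carlo evaluation: average the outcomes of random continuations. A minimal sketch of that idea follows. The toy game (single-pile Nim, take 1 to 3 stones, taking the last stone wins) and the function names are assumptions chosen for illustration; this is not Crazy Stone's code and does not implement the paper's backup operator.

```python
import random


def random_playout(stones, rng):
    """Play uniformly random moves to the end of the game.

    Returns 1.0 if the player to move in the starting position wins,
    0.0 otherwise. Toy rules (an assumption): take 1-3 stones per turn,
    whoever takes the last stone wins.
    """
    player = 0  # 0 = player to move in the evaluated position
    while stones > 0:
        take = rng.randint(1, min(3, stones))  # inclusive on both ends
        stones -= take
        if stones == 0:
            return 1.0 if player == 0 else 0.0
        player = 1 - player
    return 0.0  # no stones at the start: the player to move has already lost


def monte_carlo_evaluate(stones, n_playouts, rng):
    """Estimate a position's value as the mean outcome of random playouts."""
    total = sum(random_playout(stones, rng) for _ in range(n_playouts))
    return total / n_playouts


if __name__ == "__main__":
    print(monte_carlo_evaluate(5, 1000, random.Random(0)))
```

In a tree search, such an estimate would be computed at leaf nodes and then combined toward the root by whatever backup rule the search uses.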
Issues in computational Vickrey auction
- International Journal of Electronic Commerce, 2000
Cited by 48 (25 self)
The Vickrey auction has been widely advocated for multiagent systems. First we review its limitations so as to guide practitioners in their decision of when to use that protocol. These limitations include lower revenue than alternative protocols, lying in non-private-value auctions, bidder collusion, a lying auctioneer, and undesirable revelation of sensitive information. We discuss the special characteristics of Internet auctions: third party auction servers, cryptography, and how proxy agents relate to the revelation principle and fail to promote truth-telling.
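For readers unfamiliar with the protocol under discussion, a Vickrey auction is a sealed-bid auction in which the highest bidder wins but pays the second-highest bid. The sketch below shows only that allocation and pricing rule; the function name, the dictionary input, and the tie-breaking choice (first bidder listed wins a tie) are assumptions for illustration and are not taken from the paper.

```python
def vickrey_auction(bids):
    """Run a single-item sealed-bid second-price auction.

    bids: dict mapping bidder name -> bid amount.
    Returns (winner, price), where price is the second-highest bid.
    Ties are broken in favor of the bidder listed first (an assumption).
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    # sorted() is stable, so equal bids keep their insertion order.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


if __name__ == "__main__":
    print(vickrey_auction({"alice": 10, "bob": 7, "carol": 3}))
```

Because the winner's payment does not depend on the winner's own bid, bidding one's true value is a dominant strategy in the private-value setting; the limitations listed in the abstract concern cases where that setting does not hold.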
Costly valuation computation in auctions
- In Proceedings of the Eighth Conference on Theoretical Aspects of Rationality and Knowledge (TARK VIII), Siena, 2001
Cited by 47 (24 self)
We investigate deliberation and bidding strategies of agents with unlimited but costly computation who are participating in auctions. The agents do not a priori know their valuations for the items being auctioned. Instead they devote computational resources to compute their valuations. We present a normative model of bounded rationality where deliberation actions of agents are incorporated into strategies and equilibria are analyzed for standard auction protocols. We show that even in settings such as English auctions where information about other agents' valuations is revealed for free by the bidding process, agents may still compute on opponents' valuation problems, incurring a cost, in order to determine how to bid. We compare the costly computation model of bounded rationality with a different model where computation is free but limited. For some auction mechanisms the equilibrium strategies are substantially different. It can be concluded that the model of bounded rationality impacts the agents' equilibrium strategies and must be considered when designing mechanisms for computationally limited agents.
Bargaining with Limited Computation: Deliberation Equilibrium
- Artificial Intelligence, 2001
Cited by 40 (18 self)
We develop a normative theory of interaction, negotiation in particular, among self-interested computationally limited agents where computational actions are game-theoretically treated as part of an agent's strategy. We focus on a 2-agent setting where each agent has an intractable individual problem, and there is a potential gain from pooling the problems, giving rise to an intractable joint problem. At any time, an agent can compute to improve its solution to its own problem, its opponent's problem, or the joint problem. At a deadline the agents then decide whether to implement the joint solution, and if so, how to divide its value (or cost). We present a fully normative model for controlling anytime algorithms where each agent has statistical performance profiles which are optimally conditioned on the problem instance as well as on the path of results of the algorithm run so far. Using this model, we introduce a solution concept, which we call deliberation equilibrium. It is the perfect Bayesian equilibrium of the game where deliberation actions are part of each agent's strategy. The equilibria differ based on whether the performance profiles are deterministic or stochastic, whether the deadline is known or not, and whether the proposer is known in advance or not. We present algorithms for finding the equilibria. Finally, we show that there exist instances of the deliberation-bargaining problem where no pure strategy equilibria exist and also instances where the unique equilibrium outcome is not Pareto efficient.
The Games Computers (and People) Play
- 2000
Cited by 17 (0 self)
In the 40 years since Arthur Samuel's 1960 Advances in Computers chapter, enormous progress has been made in developing programs to play games of skill at a level comparable to, and in some cases beyond, what the best humans can achieve. In Samuel's time, it would have seemed unlikely that only a scant 40 years would be needed to develop programs that play world-class backgammon, checkers, chess, Othello, and Scrabble. These remarkable achievements are the result of a better understanding of the problems being solved, major algorithmic insights, and tremendous advances in hardware technology. Computer games research is one of the major success stories of artificial intelligence. This chapter can be viewed as a successor to Samuel's work. A review of the scientific advances made in developing computer games is given. These ideas are the ingredients required for a successful program. Case studies for the games of backgammon, bridge, checkers, chess, Othello, poker, and Scrabb...
Improving heuristic mini-max search by supervised learning
- Artificial Intelligence, 2002
Cited by 11 (0 self)
This article surveys three techniques for enhancing heuristic game-tree search pioneered in the author's Othello program Logistello, which dominated the computer Othello scene for several years and won against the human world champion 6-0 in 1997. First, a generalized linear evaluation model (GLEM) is described that combines conjunctions of Boolean features linearly. This approach allows an automatic, data-driven exploration of the feature space. Combined with efficient least-squares weight fitting, GLEM greatly eases the programmer's task of finding significant features and assigning weights to them. Second, the selective search heuristic ProbCut and its enhancements are discussed. Based on evaluation correlations, ProbCut can prune probably irrelevant sub-trees with a prescribed confidence. Tournament results indicate a considerable playing strength improvement compared to full-width search. Third, an opening book framework is presented that enables programs to improve upon previous play and to explore new opening lines by constructing and searching a game-tree based on evaluations of played variations. These general methods represent the state of the art in computer Othello programming and begin to attract researchers in related fields.
Partial Order Bounding: A New Approach to Evaluation in Game Tree Search
Cited by 10 (5 self)
In computer game-playing, the established method for constructing an evaluation function uses a scalar value computed as a weighted sum of features. This paper advocates the use of partial order evaluation, and describes an efficient new search method called partial order bounding (POB). Previous tree search algorithms using a partial order evaluation have attempted to propagate partially ordered values through the search tree, which leads to many problems in practice, such as the complexity of backing up sets of incomparable evaluations. POB compares partially ordered values only in the leaves of a game tree, and backs up boolean values through the tree. A closely related new algorithm, linear extension partial order bounding (LE-POB), uses a standard scalar alpha-beta search with values from a suitably chosen linear extension of the partial order evaluation. As an application, the effectiveness of partial order evaluation is shown in the case of modeling capturing races called semeai in ...
Associating Shallow and Selective Global Tree Search with Monte Carlo for 9x9 Go
- In Proceedings of the 4th Computers and Games Conference (CG04), 2004
Cited by 10 (2 self)
This paper explores the association of shallow and selective global tree search with Monte Carlo in 9x9 Go. This exploration is based on Olga and Indigo, two experimental Monte Carlo programs. We provide a min-max algorithm that iteratively deepens the tree until one move at the root is proved to be superior to the other ones. At each iteration, random games are started at leaf nodes to compute mean values. The progressive pruning rule and the min-max rule are applied to non-terminal nodes. We set up experiments demonstrating the relevance of this approach. Indigo used this algorithm at the 8th Computer Olympiad held in Graz.
Using Performance Profile Trees to Improve Deliberation Control
- 2004
Cited by 7 (2 self)
Performance profile trees have recently been proposed as a theoretical basis for fully normative deliberation control. In this paper we conduct the first experimental study of their feasibility and accuracy in making stopping decisions for anytime algorithms on optimization problems. Using data and algorithms from two different real-world domains, we compare performance profile trees to other well-established deliberation-control techniques. We show that performance profile trees are feasible in practice and lead to significantly better deliberation control decisions. We then conduct experiments using performance profile trees where deliberation-control decisions are made using conditioning on multiple features of the solution to illustrate that such an approach is feasible in practice.
Definition and Complexity of Some Basic Metareasoning Problems
- 2003
Cited by 7 (2 self)
In most real-world settings, due to limited time or other resources, an agent cannot perform all potentially useful deliberation and information gathering actions. This leads to the metareasoning problem of selecting such actions. Decision-theoretic methods for metareasoning have been studied in AI, but there are few theoretical results on the complexity of metareasoning. We derive hardness results for three settings which most real metareasoning systems would have to encompass as special cases. In the first, the agent has to decide how to allocate its deliberation time across anytime algorithms running on different problem instances. We show this to be NP-complete. In the second, the agent has to (dynamically) allocate its deliberation or information gathering resources across multiple actions that it has to choose among. We show this to be NP-hard even when evaluating each individual action is extremely simple. In the third, the agent has to (dynamically) choose a limited number of deliberation or information gathering actions to disambiguate the state of the world. We show that this is NP-hard under a natural restriction, and PSPACE-hard in general.
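To make the first setting concrete, the sketch below exhaustively searches over ways to split a time budget across several anytime algorithms. Everything here is an assumption for illustration: time is discretized into steps, each performance profile is a deterministic list giving solution quality after t steps, and total utility is the sum of qualities. This is not the paper's formal model, and the exhaustive search is exponential in the number of instances.

```python
from itertools import product


def best_allocation(profiles, budget):
    """Exhaustively search discrete time allocations.

    profiles[i][t] is the (assumed deterministic) solution quality of
    instance i after t computation steps, for t = 0 .. len(profiles[i]) - 1.
    budget is the total number of steps available.
    Returns (allocation tuple, total quality) for the best split found.
    """
    best_value, best_alloc = float("-inf"), None
    step_ranges = [range(len(profile)) for profile in profiles]
    for alloc in product(*step_ranges):
        if sum(alloc) > budget:
            continue  # this split exceeds the time budget
        value = sum(profile[t] for profile, t in zip(profiles, alloc))
        if value > best_value:
            best_value, best_alloc = value, alloc
    return best_alloc, best_value


if __name__ == "__main__":
    # Greedy would spend the first step on instance 0 (gain 5),
    # but putting both steps into instance 1 yields more.
    print(best_allocation([[0, 5, 6], [0, 1, 8]], budget=2))
```

The example input shows why myopic allocation can fail: the profile with the small first-step gain is the one that pays off when given the whole budget.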

