
Recursive Random Games:
A Probabilistic Model for Perfect Information Games

0. Preface and Summary

Traditionally, game-playing has been regarded as one of the key activities a machine would have to perform well before the question of its intelligence could be addressed, but all is not well with this idea. The fascination exerted by a working program is such that people often see it as an end in itself, yet since computer science is still largely an experimental field, we have the possibility of trying out ideas before questioning their scope or validity. The result is that if some ideas work reasonably well, they are unlikely ever to be questioned. For example, the reasons why a simple idea like minimaxing works are still quite obscure once one recognizes that an initial intuitive appeal and/or a decent performance in actual tests do not qualify as rational explanations. Moreover, before we could even wonder why any given methodology works, we should wonder what it means that such a methodology does work. Do we want a procedure to perform well against opponents whose abilities are inherently limited, or do we request some kind of absolute measure?

Game-playing is a perfect laboratory for studying a pure form of deductive reasoning uncorrupted by previous knowledge. We are not interested here in the form of expertise that is gained through the long and tedious compilation of recipes: our object is the process by which those recipes are synthesized efficiently from scratch. What are the mechanisms that turn a rough and inaccurate understanding of a situation into a decent level of expertise without resorting to externally compiled knowledge? A better understanding of such mechanisms would undoubtedly be gained if their effects could be measured with respect to some universal yardstick.

The staggering combinatorial complexity of usual games makes a probabilistic yardstick an obvious choice. This is similar to the situation in statistical mechanics, where the number of molecules in a typical real sample is so large that deterministic laws can only be used as tools for deriving the probabilistic laws that govern the behavior of the sample as a whole. Compare this to the size of the game tree for chess, traditionally estimated at around 10^120, and the number of different board positions, roughly 10^40. Moreover, one can even design games that are theoretically undecidable: no algorithm could be designed that would be guaranteed to determine whether a typical position of such a game can be settled in finitely many moves when played perfectly. [*]
* Petri Nets can be viewed as games in which the player whose turn it is chooses which transition is fired next. The outcome of such games is determined according to the normal play rule: whoever cannot fire a transition loses the game. 'Petri Net Games' are undecidable in the sense described here.
 

In spite of this staggering complexity, simple practical ideas like the minimax evaluation of a truncated game tree have proved reasonably effective. However, while the playing performance of commercially available microprocessor-based machines is no longer ridiculous, the level of play of even the best programs running on the most powerful machines available still does not compare with that of the better human experts. We conjecture that this is due, in part, to the lack of a reliable measure that could accurately monitor progress in game-playing technology. We still lack an efficient way of discriminating between good and bad new ideas, and are often too shy in questioning the validity of old ones.

Normally, we hope that a few general concepts combined with a lot of wasted computations will enable a machine to reach a decent level of expertise. This works to some extent. When the result does not measure up to our expectations, the usual remedy is to introduce still more computing power and hope that the outcome will somehow be better. The basic processes through which common-sense principles are turned into expertise are hardly ever questioned. Undoubtedly, current approaches are able to perform this magic, but there is also very little doubt that they are suboptimal. It is unlikely that we just need a few more tricks and a little more speed to let our machines compete against the best experts and win. The success that the brute-force approach has enjoyed needs explaining. A probabilistic quantification of game concepts should help in such a clarification and contribute to a better design of future procedures. To achieve this goal, our probabilistic yardstick needs to be both simple to use and universal enough.

Our exploration of these problems in the present study is modest, restricted to an effort to explain how some procedures effectively take advantage of certain probabilistic features of games to improve on the quality of any decision procedure, including random guessing, at the expense of more computational time. For this, we need a meta-domain of discourse that allows us to examine infinitely many game positions at once: if there is a form of intelligence that works surprisingly well without any previous experience in the domain of discourse, the very mechanisms that make such a form of intelligence work at all cannot be captured by the study of any single game. Such a meta-domain is given to us by the theory of probability.

The use of probabilistic methods in the study of a totally deterministic domain like perfect information games may appear superficially surprising, but in the world of games, as in many others, determining the likelihood of a given feature is often much easier than recognizing its occurrence in a particular instance. For example, it may well be easier to find the probability of winning from a typical position than it is to find the correct winning strategy from a given situation. In other words, it may be much easier to know how well we are expected to do than how well we are doing. The best we can do, then, is to choose a procedure of good expected performance according to some probabilistic model of the situation.

This paper presents such a probabilistic model for two-player perfect information games. Every game position allows a random number of legal moves, each of which, if enacted, leads to a similar situation. The model both conveys nontrivial general properties of games and accounts for some aspects of games that were overlooked in former models discussed in the literature. This includes game models based on a uniform tree structure [15,17,23,26], that is, games that always allow the same number of legal options irrespective of the particular situation. By placing those models in a larger perspective, we explain some of the paradoxical artifacts with which they were recently found to be plagued.
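The recursive flavor of the model can be conveyed by a small simulation. The sketch below is ours, not a procedure from the text: it assumes a geometric branching distribution (the convenient special case discussed in Section 3), a termination probability 1-b, a terminal winning probability p, and the negamax convention that a position is a win for the player to move iff some legal move leads to a position lost by the opponent. All parameter values are illustrative.

```python
import random

def branching(q, rng):
    """Geometric number of legal moves: P(N = k) = (1-q) * q**(k-1), k >= 1."""
    n = 1
    while rng.random() < q:
        n += 1
    return n

def sample_win(b, p, q, rng, depth=200):
    """Sample one recursive random game position and report whether the
    player to move wins.  b, p, q are illustrative parameters."""
    if depth == 0 or rng.random() > b:   # external (terminal) node
        return rng.random() < p          # win for the mover with probability p
    n = branching(q, rng)                # internal node with n legal moves
    # Negamax rule: win iff some move leads to a loss for the opponent.
    return any(not sample_win(b, p, q, rng, depth - 1) for _ in range(n))
```

With b = 0.5 and q = 0.3 the mean number of offspring per node is 0.5/(1-0.3) ≈ 0.71 < 1, so sampled games are finite with probability one and the depth cap is almost never reached.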

Section 1 introduces the fundamental concepts generally used in the world of games in a perspective suitable for the rest of this exposition. We emphasize that games can be reduced to game trees, that games whose graphs contain cycles will unfold into infinite game trees, and that some, but not all, of these infinite game trees correspond to games that cannot be settled in a finite number of moves if both players are playing perfectly. We find it useful to view those games as 'dynamic ties'.

Once section 2 introduces our model quantitatively, the natural emergence of dynamic ties appears as an a posteriori justification of our methodology. Our probabilistic assumptions do not rule out a counterpart of the cyclic games that may appear with games played on graphs. The catch is that the kind of ties that appear in game trees are essentially impossible to recognize as such by any finite procedure. So, even though our model does not rule out ties, we must consider that whenever the model's parameters are such that ties appear with nonzero probability, the games may well be impossible to solve even for players with unbounded look-ahead capabilities. This can also be seen as an intrinsic limitation of game-playing procedures that discard the cycle structure of the game graph entirely. [*]
* Hash coding techniques are sometimes used to detect nodes that have been previously examined. While this technique is intended primarily to eliminate duplicate expansions of a node's successors in the acyclic case, it has the potential of avoiding cyclic expansions. Therefore, game-playing procedures that incorporate hash-coding may recognize ties in a finite number of steps.
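On an explicit game graph, the idea alluded to in this footnote can be carried to its conclusion: retrograde analysis labels every position WIN, LOSS, or TIE under the normal play rule, and the positions left unlabeled by the propagation are exactly the dynamic ties. The sketch below is our illustration, not a procedure from the text.

```python
from collections import deque

def solve_graph(succ):
    """Retrograde analysis under the normal play rule (no move => loss).
    succ maps each position to the list of its legal successors."""
    preds = {v: [] for v in succ}
    for v, ws in succ.items():
        for w in ws:
            preds[w].append(v)
    value = {}                                   # position -> 'WIN'/'LOSS'/'TIE'
    remaining = {v: len(ws) for v, ws in succ.items()}
    queue = deque(v for v, ws in succ.items() if not ws)
    for v in queue:
        value[v] = 'LOSS'                        # no legal move: mover loses
    while queue:
        v = queue.popleft()
        for u in preds[v]:
            if u in value:
                continue
            if value[v] == 'LOSS':               # u has a move to a lost position
                value[u] = 'WIN'
                queue.append(u)
            else:                                # one more winning successor of u
                remaining[u] -= 1
                if remaining[u] == 0:            # all moves lead to opponent wins
                    value[u] = 'LOSS'
                    queue.append(u)
    for v in succ:
        value.setdefault(v, 'TIE')               # never settled: a dynamic tie
    return value
```

On the toy graph `{'t': [], 'a': ['t'], 'c': ['a'], 'x': ['y'], 'y': ['x']}`, the two-position cycle x, y is correctly reported as a tie while t, a, c are settled.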
 

Section 3 separates the model's parameters into two basic groups. The first group, called the internal structure K, governs the behavior of games before their actual termination by determining the probability distribution of the number of legal options. The second, called the external structure, consists of a probability of termination (1-b) and a probability p that a terminated game is a win for the player whose turn it is to play. The external structure may be matched to the internal structure when the parameter p is equal to a special value xK. When this happens, internal and external nodes have the same probability of being first-player wins, and our analysis remains valid even if external nodes appear according to some unspecified scheme. This scheme does not need to obey our general probabilistic assumption, and this remark extends the scope of our discussion. For example, external nodes could appear predominantly as children of bushy nodes, or they could all be located at a given fixed depth, as in most former models. In the matched case, the exact conditions for termination influence only the game's length, not its outcome. Furthermore, we show that some internal structures, called inert, have the property that they do not allow ties (games of infinite length), even if the probability of termination is arbitrarily low. The key to inertness is shown to be the fact that those games allow a large standard deviation in the number of legal options available from a typical game position. This can happen for arbitrarily bushy game structures. Inert structures are appropriate for describing long and bushy games in which ties almost never occur. The geometric distribution is shown to be a convenient special case appropriate for such a description.
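One plausible reading of the matching condition, under the model's independence assumptions, is the following (our reconstruction, not a formula quoted from the text). Write K(z) for the generating function of the number of legal options and w for the probability that a random position is a win for its player to move. If terminal nodes are wins for the mover with probability p, the negamax rule gives w = (1-b)p + b(1 - K(w)), which becomes independent of b precisely when p equals the root xK of x = 1 - K(x). That root is easy to compute numerically:

```python
def matched_p(K, tol=1e-12):
    """Bisection for the root of x = 1 - K(x) on [0, 1].
    x + K(x) - 1 is increasing when K is a probability generating function
    with K(0) = 0, so the root is unique."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid + K(mid) - 1.0 < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For the uniform structure K(z) = z^2 this yields (sqrt(5)-1)/2 ≈ 0.618; for a geometric structure such as K(z) = 0.5z/(1-0.5z) the root is 2 - sqrt(2) ≈ 0.586.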

The use of the model in the comparative analysis of the complexity of game-solving procedures is exemplified in section 4. An expression for the branching factor of the standard depth-first search procedure SOLVE is given. Various procedures that improve on SOLVE are also discussed. The lowest probability of termination that makes a game-solving procedure almost surely terminate is suggested as a good measure of the limitations inherent in that particular procedure.
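The flavor of the depth-first procedure can be conveyed on a concrete game. The following is our illustrative stand-in for SOLVE, shown on a subtraction game (remove one or two tokens; whoever cannot move loses, the normal play rule). The short-circuiting `any` stops at the first winning move, which is the source of the savings measured by the branching factor mentioned above.

```python
def solve(n):
    """Depth-first solver for a subtraction game: from n tokens remove 1 or 2.
    A position is a win for the player to move iff some move leads to a
    position lost by the opponent."""
    if n == 0:
        return False                     # no legal move: loss under normal play
    return any(not solve(n - k) for k in (1, 2) if n - k >= 0)
```

As expected for this game, `solve(n)` is True exactly when n is not a multiple of 3.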

Section 5 extends the model to include asymmetrical partisan games. In our probabilistic perspective, partisan games are games that allow the probability distribution of the number of legal options to depend explicitly on whose turn it is to play. The winning probability for tip nodes may also depend on who played last. The discussion is not significantly more complex than that for the impartial case, which is used in the rest of the paper mostly for convenience. The world of partisan games, however, is significantly richer than that of impartial ones. Most real games have partisan rules, and this often translates into a partisan probabilistic model. The partisan version of our model may exhibit an amusing phenomenon, illustrated in section 5: one player may have a decisive advantage in a game lasting many moves, while the other player only stands a chance if the game lasts very few moves. Partisan games can also be totally unfair. If one of the players cannot win a terminal position while the other player cannot lose one, the best the former player can hope for is a tie. In this very special case, virtually all internal structures will yield a nonzero probability for ties if the probability of termination is low enough. [*]
* One of the rare exceptions is the geometric distribution (see Section 3). If the game's internal structure K is the same for both players, then exceptions to this statement can only be obtained if 1-K is its own inverse. At this writing, the geometric distribution is the only such example we know of.
 

Section 6 outlines some ideas relating to the propagation of errors in the evaluation of the status of a game tree. This issue was the initial motivation for the model presented here. Program designers have been assuming for decades that the quality of a game-playing decision based on minimaxing could only improve with a deeper search. Yet it was recently discovered [20] that the standard theoretical game model that is traditionally [15,17,23,26] used supports the opposite view, in spite of the fact that minimax programs have been performing reasonably well for decades. Theoretically, the deeper a minimax search, the worse the decision. Nau [20] termed this phenomenon pathological. Minimax search performs much better in practice than it has any right to in theory! Explanations have been attempted [24,25], but the question is still essentially open.

Our model supports the view that a tractable game will not be pathological, while an intractable game, a game in which dynamic ties appear naturally, normally will be pathological. In this context, pathology appears as a property of the internal structure of a game. A minimax estimate can be expected to discriminate between good and bad moves better than the static evaluation function it is based on, provided the internal structure of the game is inert. To put it bluntly, pathology was observed on former models that used a constant branching degree d because the internal structure of these models (K(z) = z^d) is not inert. This remark could also be seen as an indication that the games people actually play are somehow better represented by an inert structure than by a non-inert one. [*]

* It is not our intention to say that chess, say, can be represented accurately by a recursive random game. However, we believe that the concepts presented here are general enough to have a counterpart in the domain of real games. Any reference to such games is to be taken in a figurative sense.
 

In actual game-playing procedures, it would be foolish to hope for an exhaustive search of the game tree. For all situations that are not end-game positions, the search has to be truncated at some point, yet truncating the search tree at a fixed depth turns out to be a poor idea in practice. Consequently, pursuing the search of game trees only from certain positions called non-quiescent was recognized very early [27] as an effective scheme. Section 7 presents an analysis of quiescence, and demonstrates that our model is flexible enough to allow an investigation of the theoretical reasons for the advantage of quiescence-based search. It contains a theoretical confirmation of the usefulness of quiescence and establishes an appropriate criterion for truncating the search.
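A minimal sketch of the scheme, on a toy subtraction game (remove one or two tokens; whoever cannot move loses) and with an illustrative quiescence predicate of our own choosing: the fixed-depth cutoff is applied only to positions the predicate declares quiescent, while the others are searched further.

```python
def qsearch(n, depth, quiescent):
    """Negamax value (+1 win, -1 loss, 0 unknown) for a subtraction game:
    from n tokens remove 1 or 2; the player unable to move loses.
    The depth cutoff applies only where `quiescent` holds."""
    moves = [n - k for k in (1, 2) if n - k >= 0]
    if not moves:
        return -1                    # no legal move: loss for the mover
    if depth <= 0 and quiescent(n):
        return 0                     # static guess: outcome unknown
    return max(-qsearch(m, depth - 1, quiescent) for m in moves)
```

Declaring every position quiescent gives plain fixed-depth truncation, which gives up on n = 2 at depth 0; extending the "hot" positions n <= 2 past the nominal depth recovers the true value +1 of that position.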

Section 8 presents an alternative to the minimax rule for processing the information obtained from nodes farther down in the game tree. The alternative suggested is to propagate winning estimates in the same way winning probabilities would propagate. This results in a product propagation rule. A simple-minded analysis of that rule is given in that section.
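Under the convention (ours, for illustration) that each child's estimate is the probability that the child is a win for its own player to move, and that these estimates may be treated as independent, the two backup rules compare as follows: minimax backs up the single most promising move, while the product rule says the mover loses only if every move leads to an opponent win.

```python
from math import prod

def minimax_backup(child_estimates):
    """Boolean-minimax analogue: move to the child least likely to be
    a win for the opponent."""
    return 1.0 - min(child_estimates)

def product_backup(child_estimates):
    """Product propagation rule: the parent is a loss for its mover only
    if all moves lead to opponent wins (estimates treated as independent)."""
    return 1.0 - prod(child_estimates)
```

For child estimates 0.3 and 0.8, minimax backs up 0.7 while the product rule backs up 0.76: the second, less promising option still contributes under the product rule.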


Doctoral dissertation © Copyright 1983 by Gérard P. Michon, Ph.D.