Recursive Random Games:
A Probabilistic Model for Perfect Information Games
0. Preface and Summary
Traditionally, game-playing has been regarded as one of the key activities a machine
would have to perform well before the question of its intelligence could be addressed,
but all is not well with this idea. The power of fascination of a working program is
such that people often see this as an end in itself, yet since computer science is still
largely an experimental field, we have the possibility of trying out ideas before
questioning their scope or validity.
The result is that if some ideas work reasonably well,
they are unlikely ever to be questioned.
For example, the reasons why a simple idea like minimaxing works are still quite obscure
once one recognizes that an initial intuitive appeal and/or a decent performance in actual
tests do not qualify as rational explanations.
Moreover, before we could even wonder why any given methodology works,
we should wonder what it means that such a methodology does work.
Do we want a procedure to perform well against opponents whose abilities are
inherently limited, or do we request some kind of
absolute measure?
Game-playing is a perfect laboratory for studying a pure form of deductive reasoning
uncorrupted by previous knowledge.
We are not interested here in the form of expertise that is gained through
long and tedious compilation of recipes: our object is the process by which
those recipes are synthesized efficiently from scratch.
What are the mechanisms that contribute to turn a rough and inaccurate understanding
of a situation into a decent level of expertise without resorting to externally compiled
knowledge?
A better understanding of such mechanisms would undoubtly be gained if their effects
could be measured with respect to some universal yardstick.
The staggering combinatorial complexity of usual games makes a probabilistic yardstick
an obvious choice.
This is similar to the problem of statistical mechanics,
where the number of molecules in a typical real sample is so large that deterministic laws
can only be used as tools for deriving the probabilistic laws that govern the behavior of
the sample as a whole.
Compare this to the size of the game tree for chess, traditionally estimated to be around
10120, and the number of different board positions roughly
1040.
Moreover, one can even design
games that are theoretically undecidable.
No algorithm could be designed that would be guaranteed to find whether a typical position
of such a game can be settled in finitely many moves when played perfectly. [*]
* Petri Nets can be viewed as games where the transition which is fired next is up
to the player whose turn it is.
The outcome of such games is determined according to the normal play rule:
whoever cannot fire a transition loses the game.
'Petri Net Games' are undecidable in the sense described here.
|
| |
In spite of this staggering complexity, simple practical ideas like the minimax evaluation
of a truncated game tree have proved reasonably effective.
However, if the playing performance of commercially available microprocessor-based machines
is no longer ridiculous, the play-level of even the best programs running
on the most powerful machines available still does not compare with that of
the better human experts.
We conjecture that this is due, in part, to the lack of a reliable measure that could
accurately monitor progress in game-playing technology.
We still lack an efficient way of discriminating between good and bad new ideas,
and are often too shy in questioning the validity of old ones.
Normally, we hope that a few general concepts combined with a lot of wasted computations
will enable a machine to reach a decent level of expertise.
This works to some extent.
When this does not measure up to our expectations, the usual remedy is to introduce still
more computing power and hope that the result will somehow be better.
The basic processes through which common sense principles are turned into expertise are
hardly ever questioned.
Undoubtely, current approaches are
able to perform this magic but there is also very little doubt that they are suboptimal.
It is unlikely that we just need a few more tricks and a little more speed to let our
machines compete against the best experts and win.
The success that brute force approach has enjoyed needs explaining.
A probabilistic quantification of game concepts should help in such a clarification and
contribute to a better design of future procedures.
To achieve this goal, our probabilistic yardstick needs to be both simple to use and
universal enough.
Our exploration of these problems in the present study is modest,
restricted to an effort to explain how some procedures effectively take advantage
of some probabilistic features of certain games to improve on the quality of any decision
procedure, including random guessing, at the expense of more computational time.
For this, we need a meta-domain of discourse that would allow us
to examine infinitely many game positions at once.
For if there is a form of intelligence that works surprisingly well without any previous
experience in the domain of discourse, the very mechanisms that make
such a form of intelligence work at all cannot be captured by the study of any single game.
Such a discourse is given to us by the theory of probability.
The use of probabilistic methods in the study of a totally deterministic domain
like perfect information games may appear superficially surprising,
but in the world of games like in many others,
determining the likelihood of a given feature is often much easier than recognizing
its occurrence in a particular instance.
For example, it may well be easier to find the probability of winning from a typical
position than
it is to find the correct winning strategy from a given situation.
In other words, it may be much easier to know how well we are expected to do
than how well we are doing.
The best we can do, then, is to choose a procedure of good
expected performance according to some probabilistic model of the situation.
This paper presents such a probabilistic model for two-player perfect information games.
Every game position allows a random number of legal moves such that each, if enacted,
leads to a similar situation.
The model both conveys nontrivial general properties of games and accounts
for some aspects of games that were overlooked in former models discussed in the literature.
This includes game models based on a uniform tree structure
[15,17,23,26], that is, games that always allow the same number
of legal options irrespective of the particular situation.
By placing those models in a larger perspective, we explain some of the paradoxical artifacts
with which they were recently found to be plagued.
Section 1 introduces the fundamental concepts generally used in the world of games
in a perspective suitable for the rest of this exposition.
We emphasize that games can be reduced to game trees,
that games whose graphs contain cycles will unfold into infinite game trees,
and that some but not all, of these infinite game trees correspond to games that
cannot be settled in a finite number of moves if both players are playing perfectly.
We find it useful to view those games as 'dynamic ties'.
Once section 2 introduces our model quantitatively,
the natural appearance of dynamic ties appears as a justification a posteriori of
our methodology.
Our probabilistic assumptions do not rule out a counterpart of the cyclic games
that may appear with games played on graphs.
The catch is that the kind of ties that appear in game trees are essentially
impossible to recognize as such by any finite procedure.
So, even though our model does not rule out ties,
we must consider that whenever the model's parameters
are such that ties appear with nonzero probability the games may well be impossible
to solve even for players with unbounded look-ahead capabilities.
This can also be seen as an intrinsic limitation of game-playing procedures
that discard the cycle structure of the game graph entirely. [*]
* Hash coding techniques are sometimes used to detect nodes that
have been previously examined. While this technique is intended primarily to eliminate
duplicate expansions of a node's successors in the acyclic case,
it has the potential of avoiding cyclic expansions.
Therefore, game-playing procedures that incorporate hash-coding may recognize ties
in a finite number of steps.
|
| |
Section 3 separates the model's parameters into two basic groups.
The first group, called the internal structure K governs the behavior of games
before their actual termination by determining the probability distribution
of the number of legal options.
The second, called the external structure
consists of a probability of termination (1-b) and a probability p that a terminated game
is a win for whoever turn it is to play.
The external structure may be matched to the internal structure when the parameter p
is equal to a special value xK.
When this happens, internal and external nodes have the same probability
of being first player wins, and our analysis remains valid even if external nodes
appear according to some unspecified scheme.
This scheme does not need to obey our general probabilistic assumption,
and this remark extends the scope of our discussion.
For example, external nodes could appear predominantly as children of bushy nodes,
or they could all be located at a given fixed depth, as in most former models.
In the matched case, the exact conditions for termination influence only the game's length,
not its outcome. Furthermore, we show that some internal structures, called inert,
have the property that they do not allow ties, games of infinite length,
even if the probability of termination is arbitrarily low.
The key to inertness is shown to be the fact that those games allow a large
standard deviation in the number of legal options available from a typical game position.
This can happen for arbitrarily bushy game structures.
Inert structures are appropriate for describing long and bushy games in which ties
almost never occur.
The geometric distribution is shown to be a convenient special case appropriate
for such a description.
The use of the model in the comparative analysis of the complexity of game-solving
procedures is examplified in section 4.
An expression for the branching factor of the standard depth-first search procedure
SOLVE is given.
Various procedures that improve on SOLVE are also discussed.
The lowest probability of termination that makes a game-solving procedure almost
surely terminate is suggested as a good measure of the limitations inherent
in that particular procedure.
Section 5 extends the model to include asymmetrical partisan games.
In our probabilistic perspective, partisan games are games that
allow the probability distribution of the number of legal options to depend explicitly
on whose turn it is to play.
The winning probability for tip nodes may also depend on who played last.
The discussion is not significantly more complex than that for the impartial case
which is used in the rest of the paper mostly for convenience.
The world of partisan games, however, is significantly richer than that of impartial ones.
Most real games have partisan rules and this often translates
into a partisan probabilistic model.
The partisan version of our model may exhibit an amusing phenomenon,
illustrated in section 5.
One player may have a decisive advantage in a game lasting many moves,
while the other player only stands a chance if the game lasts very few moves.
Partisan games can also be totally unfair.
If one of the players cannot win a terminal position while the other player cannot lose one,
the best the former player can hope for is a tie.
In this very special case, virtually all internal structures will yield
a nonzero probability for ties if the probability of termination is low enough. [*]
* One of the rare exceptions is the geometric distribution (see Section 3).
If the game's internal structure K is the same for both players,
then exceptions to this statement can only be obtained if 1-K is its own inverse.
At this writing, the geometric distribution is the only such example we know of.
|
| |
Section 6 outlines some ideas relating to the propagation of errors in the evaluation
of the status of a game tree.
This issue was the initial motivation for the model presented here.
Program designers have been assuming for decades that the quality of a game-playing
decision based on minimaxing could only improve with a deeper search.
Yet it was recently discovered [20] that the standard
theoretical game model that is traditionally [15,17,23,26]
used supports the opposite view, in spite
of the fact that minimax programs have been performing reasonably well for decades.
Theoretically, the deeper a minimax search, the worse the decision.
Nau [20] termed this phenomenon pathological.
Minimax search performs much better in practice than it has any right to in theory!
Explanations have been attempted [24,25],
but the question is still essentially open.
Our model supports the view that a tractable game will not be pathological,
while an intractable game, a game in which dynamic ties appear naturally,
normally will be pathological.
In this context, pathology appears as a property of the internal structure of a game.
A minimax estimate can be expected to discriminate between good and bad moves better
than the static evaluation function it is based on,
provided the internal structure of the game is inert.
To put it bluntly, pathology was observed on former models that used a constant branching
degree d because the internal structure of these models (K(z) = zd) is not inert.
This remark could also be seen as an indication that the games people actually play
are somehow better represented by an inert structure than by a non-inert one. [*]
* It is not our intention to say that chess, say,
can be represented accurately by a recursive random game.
However, we believe that the concepts presented here are general enough
to have a counterpart in the domain of real games.
Any reference to such games is to be taken in a figurative sense.
|
| |
In actual game-playing procedures, it would be foolish to hope for an exhaustive search
of the game tree.
For all situations that are not end-game positions,
the search has to be truncated as some point,
yet truncating the search tree at a fixed depth turns out to be a poor idea
in practice. Consequently, pursuing the search of game trees only from certain positions
called non-quiescent was recognized very early [27]
as an effective scheme.
Section 7 presents an analysis of quiescence,
and demonstrates that our model is flexible enough to allow an investigation
of the theoretical reasons for the advantage of quiescence-based search.
Section 7 contains a theoretical confirmation of the usefulness of quiescence
and establishes an appropriate criterion for truncating the search.
Section 8 presents an alternative to the minimax rule of processing the information
obtained from nodes farther down in the game tree.
The alternative suggested is to propagate winning estimates in the same way
winning probabilities would.
This results in a product propagation rule.
A simple-minded analysis of that rule is given in that section.
Doctoral dissertation © Copyright 1983
by Gérard P. Michon, Ph.D.
home |
title |
contents |
overview |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
appendix |
references |
help