Solving a Problem by Search

(1)

Introduction to Artificial Intelligence and Game Theory

Ahmed Bouajjani

abou@irif.fr

Solving a Problem by Search

Nondeterministic Search: Games

(2)

Nondeterministic / adversarial environment

- An action does not have a unique effect

- This is due to the environment, which may act unpredictably

- This can be seen as a game between the agent and the environment

- The agent chooses an action; the environment chooses the successor state

- => Can the goal be reached whatever the environment does?

- The solution is not a sequence of actions

- It is a "plan" (a "strategy") that determines which action to take in each state


(5)

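Reachability games

[Figure: game tree whose levels alternate between Agent and Environment; agent actions are labeled A, B at the first agent level and C, D, E, F, G, H, I, J at the second.]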

(6)

Reachability games

[Figure: the same game tree with Strategy 1 highlighted step by step; one branch ends in a failure leaf, marked X.]

Strategy 1: losing!

(10)

Reachability games

[Figure: the same game tree with Strategy 2 highlighted step by step; the four leaves it can reach are all marked √.]

Strategy 2: winning!

(19)

Reachability games

[Figure: the same game tree with the agent states labeled S1 (top) and S2, S3, S4, S5 (below); the four leaves reached by the winning strategy are marked √.]

- A strategy is represented by a tree

- Or by an expression made of nested if-then-else: B ; if S4 then G else J

- Plan = [] | Action ; case state in S1 : Plan_1 ; … ; Sk : Plan_k
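The plan grammar above can be sketched as a nested data structure. The following is a minimal Python sketch, not the course's own code: the tuple/dict encoding, the `execute` and `scripted` helpers, and the final state name `"goal"` are all illustrative assumptions. Only the strategy "B ; if S4 then G else J" comes from the slide.

```python
# A plan is either [] (nothing left to do) or a pair
# (action, {observed_state: sub_plan, ...}), mirroring
# Plan = [] | Action ; case state in S1 : Plan_1 ; ... ; Sk : Plan_k

# The winning strategy from the slide: B ; if S4 then G else J
plan = ("B", {"S4": ("G", {}), "S5": ("J", {})})


def execute(plan, environment):
    """Run a conditional plan; `environment(action)` returns the state it chose."""
    trace = []
    while plan:  # an empty plan ([]) ends the loop
        action, branches = plan
        state = environment(action)
        trace.append((action, state))
        # A state with no sub-plan means the plan ends there.
        plan = branches.get(state, [])
    return trace


def scripted(states):
    """Hypothetical environment that replies with a fixed sequence of states."""
    replies = iter(states)
    return lambda action: next(replies)


print(execute(plan, scripted(["S4", "goal"])))
print(execute(plan, scripted(["S5", "goal"])))
```

Unlike a fixed sequence of actions, the second action here depends on the state the environment actually chose.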

(20)

Reachability games: solving

[Figure: the same game tree; the 16 leaves are labeled, from left to right, 0 0 1 0 1 1 0 0 1 1 0 1 0 0 1 1.]

(22)

Reachability games = AND-OR games

[Figure: the same game tree with leaves 0 0 1 0 1 1 0 0 1 1 0 1 0 0 1 1; the levels are labeled OR and AND, and the 0/1 values are propagated step by step from the leaves up to the root.]
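The bottom-up labeling can be sketched in a few lines of Python. This is a sketch under an assumption about the figure: the tree is taken to be a complete binary tree whose levels are OR, AND, OR, AND from the root down, which is the shape suggested by the labels (2 actions A, B; 8 actions C to J; 16 leaves). An OR node gets 1 if some child is 1, and an AND node gets 1 only if all children are 1.

```python
# Leaf values read left to right from the slide (1 = goal reached, 0 = failure).
LEAVES = [0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1]
# Assumed shape: complete binary tree, levels OR, AND, OR, AND from the root down.
LEVELS = ["OR", "AND", "OR", "AND"]


def evaluate(values, levels):
    """Propagate 0/1 values bottom-up; return one list of values per level, root first."""
    rows = [values]
    for kind in reversed(levels):
        combine = any if kind == "OR" else all
        below = rows[0]
        # Each node of this level has two children in the row below.
        row = [int(combine(below[i:i + 2])) for i in range(0, len(below), 2)]
        rows.insert(0, row)
    return rows


rows = evaluate(LEAVES, LEVELS)
for kind, row in zip(LEVELS, rows):
    print(kind, row)
```

Under this assumed shape the root evaluates to 1 through the second action (B), and the third and fourth agent states are winning through G and J respectively, which agrees with the strategy "B ; if S4 then G else J".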

(25)

AND-OR search algorithm on a graph

- Similar to depth-first search, recursive version

- AND nodes (the adversary) are distinguished from OR nodes (the agent)

- Visited states must be marked, but for efficiency only the states on the path from the initial state to the current state are taken into account

- If we come back to a state already seen higher up on the path, it is treated as a 0 leaf (if a way to succeed exists, it will be found on an alternative branch starting from the earlier occurrence)

- The plan is built recursively from the plans for the descendants

(26)

AND-OR search algorithm

136 Chapter 4. Beyond Classical Search

function AND-OR-GRAPH-SEARCH(problem) returns a conditional plan, or failure
  OR-SEARCH(problem.INITIAL-STATE, problem, [ ])

function OR-SEARCH(state, problem, path) returns a conditional plan, or failure
  if problem.GOAL-TEST(state) then return the empty plan
  if state is on path then return failure
  for each action in problem.ACTIONS(state) do
    plan ← AND-SEARCH(RESULTS(state, action), problem, [state | path])
    if plan ≠ failure then return [action | plan]
  return failure

function AND-SEARCH(states, problem, path) returns a conditional plan, or failure
  for each s_i in states do
    plan_i ← OR-SEARCH(s_i, problem, path)
    if plan_i = failure then return failure
  return [if s_1 then plan_1 else if s_2 then plan_2 else . . . if s_{n−1} then plan_{n−1} else plan_n]

Figure 4.11 An algorithm for searching AND–OR graphs generated by nondeterministic environments. It returns a conditional plan that reaches a goal state in all circumstances. (The notation [x | l] refers to the list formed by adding object x to the front of list l.)

[…] construct.) Modifying the basic problem-solving agent shown in Figure 3.1 to execute contingent solutions of this kind is straightforward. One may also consider a somewhat different agent design, in which the agent can act before it has found a guaranteed plan and deals with some contingencies only as they arise during execution. This type of interleaving of search and execution is also useful for exploration problems (see Section 4.5) and for game playing (see Chapter 5).

Figure 4.11 gives a recursive, depth-first algorithm for AND–OR graph search. One key aspect of the algorithm is the way in which it deals with cycles, which often arise in nondeterministic problems (e.g., if an action sometimes has no effect or if an unintended effect can be corrected). If the current state is identical to a state on the path from the root, then it returns with failure. This doesn’t mean that there is no solution from the current state; it simply means that if there is a noncyclic solution, it must be reachable from the earlier incarnation of the current state, so the new incarnation can be discarded. With this check, we ensure that the algorithm terminates in every finite state space, because every path must reach a goal, a dead end, or a repeated state. Notice that the algorithm does not check whether the current state is a repetition of a state on some other path from the root, which is important for efficiency. Exercise 4.5 investigates this issue.

AND–OR graphs can also be explored by breadth-first or best-first methods. The concept of a heuristic function must be modified to estimate the cost of a contingent solution rather than a sequence, but the notion of admissibility carries over and there is an analog of the A* algorithm for finding optimal solutions. Pointers are given in the bibliographical notes at the end of the chapter.
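The recursive AND-OR search described above might be written in Python roughly as follows. This is a sketch, not the textbook's code: the dictionary-based `problem` encoding, the use of `None` for failure, and the toy transition table are assumptions made for illustration. A plan is either `[]` or `[action, {outcome_state: sub_plan}]`.

```python
def and_or_graph_search(problem):
    """Return a conditional plan that reaches a goal in all cases, or None on failure."""
    return or_search(problem["initial"], problem, [])


def or_search(state, problem, path):
    if problem["goal"](state):
        return []  # empty plan
    if state in path:
        return None  # cycle: give up on this branch
    for action in problem["actions"](state):
        plan = and_search(problem["results"](state, action), problem, [state] + path)
        if plan is not None:
            return [action, plan]
    return None


def and_search(states, problem, path):
    plans = {}
    for s in states:
        plan = or_search(s, problem, path)
        if plan is None:
            return None  # one bad outcome is enough to reject the action
        plans[s] = plan
    return plans  # maps each possible outcome to its sub-plan


# Hypothetical toy problem: from s0, "go" may slip and leave the agent in s0.
TRANSITIONS = {
    ("s0", "go"): ["s1", "s0"],
    ("s0", "detour"): ["s2"],
    ("s2", "go"): ["goal", "s1"],
    ("s1", "go"): ["goal"],
}
problem = {
    "initial": "s0",
    "goal": lambda s: s == "goal",
    "actions": lambda s: [a for (st, a) in TRANSITIONS if st == s],
    "results": lambda s, a: TRANSITIONS[(s, a)],
}

print(and_or_graph_search(problem))
```

In the toy problem, "go" from s0 is rejected because one of its outcomes loops back to s0, so the search falls back on "detour", whose outcomes all lead to the goal.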

(27)

Search in the presence of an adversary

Zero-sum games with perfect information

• Two players: MAX and MIN

• What one player wins, the other loses: if $o$ is the outcome of the game and $u_{MAX}$ and $u_{MIN}$ are the payoff functions (utility functions) of the two players, then

$$u_{MIN}(o) + u_{MAX}(o) = 0$$

for every $o$.

(28)

The game tree (two players moving in turn)

Example: tic-tac-toe

(29)

Description of a game

• s_0: the initial state

• PLAYER(s): the player who moves in state s

• ACTIONS(s): the set of actions available in state s

• RESULTAT(s,a): the transition, i.e. the state obtained by executing action a in s

• TERMINAL(s): a Boolean test that determines whether s is a final state

• UTILITE(s): the utility function, defined for every final state s. UTILITE is assumed to be the utility function of player MAX; changing its sign gives the utility for player MIN.
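The six components above can be sketched for a tiny game. The game chosen here is an assumption for illustration only (it does not appear in the slides): a pile of sticks from which the players alternately remove 1 or 2, the player taking the last stick being the winner. The function names follow the slide's identifiers in lowercase.

```python
from typing import List, Tuple

State = Tuple[int, str]  # (sticks left, player to move)

S0: State = (4, "MAX")  # initial state


def player(s: State) -> str:
    """The player who moves in state s."""
    return s[1]


def actions(s: State) -> List[int]:
    """Number of sticks that may be removed in state s."""
    return [k for k in (1, 2) if k <= s[0]]


def resultat(s: State, a: int) -> State:
    """State obtained by removing a sticks; the turn passes to the other player."""
    sticks, who = s
    return (sticks - a, "MIN" if who == "MAX" else "MAX")


def terminal(s: State) -> bool:
    """The game is over when no stick is left."""
    return s[0] == 0


def utilite(s: State) -> int:
    """Utility for MAX in a final state; negate it to get the utility for MIN."""
    # The player who took the last stick wins, so in a final state
    # the player "to move" is the loser.
    return -1 if player(s) == "MAX" else +1


print(actions(S0), resultat(S0, 2))
```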

(30)

Difficulty of games

Large number of states. For chess:

• branching factor: 35 on average

• number of moves per player: 50

• the game tree has $35^{100}$ nodes
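To get a feeling for the size of this number, one can compute it exactly with Python's arbitrary-precision integers. The variable names are illustrative; the two figures come from the slide.

```python
branching_factor = 35   # average number of legal moves (from the slide)
plies = 2 * 50          # 50 moves per player, two players

nodes = branching_factor ** plies
# Number of decimal digits of 35^100, i.e. its order of magnitude plus one.
print(len(str(nodes)))
```

The result has 155 decimal digits, so the tree has on the order of $10^{154}$ nodes, far beyond what exhaustive search can explore.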

(31)

Minimax

[Figure: game tree with alternating MAX and MIN levels and numeric utilities at the leaves; in the second step each internal node is annotated with its minimax value.]

Compute the minimax value of each node, starting from the leaves.

(33)

Minimax algorithm

function MinMaxDecision(state s) returns action
  if Player(s) == Max then
    return the a in Actions(s) for which MinMax(Resultat(s,a)) is maximal
  else
    return the a in Actions(s) for which MinMax(Resultat(s,a)) is minimal

function MinMax(state s) returns value
  if Terminal(s) then return Utilité(s)
  if Player(s) == Max then
    v = −∞
    for a in Actions(s) do
      v = max(v, MinMax(Resultat(s,a)))
  else
    v = +∞
    for a in Actions(s) do
      v = min(v, MinMax(Resultat(s,a)))
  return v

(34)

Properties of the minimax algorithm

• complete? yes, if the tree is finite

• optimal? yes, if the tree is finite (and the opponent plays optimally)

• time complexity: $O(b^m)$, where $b$ is the branching factor and $m$ the maximum depth

• space complexity: $O(bm)$

Unrealistic for chess.

(35)

Example of α-β pruning


(40)

Properties of pruning

• Pruning does not affect the result

• The order in which the children are visited matters: it influences the complexity, which depends on the size of the pruned parts.

• With the optimal visiting order, the time complexity drops from $O(b^m)$ to $O(b^{m/2})$, so the depth of the tree that can be explored doubles.

(41)

The role of the alpha-beta parameters

• α: Max has a strategy that guarantees it at least α

• β: Min has a strategy that guarantees it loses no more than β

[Figure: a Max node above a Min node whose current value is v; when v ≤ α, the remaining children of the Min node are pruned (α pruning).]

(42)

Min player

[Figure: the possible positions of the value v relative to α and β at a Min node; surviving labels: "will not be modified" and "v < α: prune".]

(43)

α-β pruning algorithm

170 Chapter 5. Adversarial Search

function ALPHA-BETA-SEARCH(state) returns an action
  v ← MAX-VALUE(state, −∞, +∞)
  return the action in ACTIONS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for each a in ACTIONS(state) do
    v ← MAX(v, MIN-VALUE(RESULT(state, a), α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v

function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for each a in ACTIONS(state) do
    v ← MIN(v, MAX-VALUE(RESULT(state, a), α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v

Figure 5.7 The alpha–beta search algorithm. Notice that these routines are the same as the MINIMAX functions in Figure 5.3, except for the two lines in each of MIN-VALUE and MAX-VALUE that maintain α and β (and the bookkeeping to pass these parameters along).

Adding dynamic move-ordering schemes, such as trying first the moves that were found to be best in the past, brings us quite close to the theoretical limit. The past could be the previous move—often the same threats remain—or it could come from previous exploration of the current move. One way to gain information from the current move is with iterative deepening search. First, search 1 ply deep and record the best path of moves. Then search 1 ply deeper, but use the recorded path to inform move ordering. As we saw in Chapter 3, iterative deepening on an exponential game tree adds only a constant fraction to the total search time, which can be more than made up from better move ordering. The best moves are often called killer moves and to try them first is called the killer move heuristic.

In Chapter 3, we noted that repeated states in the search tree can cause an exponential increase in search cost. In many games, repeated states occur frequently because of transpositions: different permutations of the move sequence that end up in the same position. For example, if White has one move, a₁, that can be answered by Black with b₁ and an unrelated move a₂ on the other side of the board that can be answered by b₂, then the sequences [a₁, b₁, a₂, b₂] and [a₂, b₂, a₁, b₁] both end up in the same position. It is worthwhile to store the evaluation of the resulting position in a hash table the first time it is encountered so that we don’t have to recompute it on subsequent occurrences. The hash table of previously seen positions is traditionally called a transposition table; it is essentially identical to the explored […]

(44)

α-β pruning algorithm: example

[Figure: worked example of α-β pruning on a game tree with alternating MAX and MIN levels and numeric leaf values; nodes are annotated with their α, β and returned values.]

(45)

The evaluation function

If the tree is too deep, the search is cut off at some level d and a heuristic is applied to evaluate the positions at that level (an estimate of the value of the nodes).

Most evaluation functions compute a number of properties (features) of the position and combine them in a weighted sum:

$$\mathrm{Eval}(s) = w_1 f_1(s) + \dots + w_n f_n(s)$$

For example, $w_1 = 9$ and $f_1(s)$ = the number of white queens minus the number of black queens.
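The weighted sum can be sketched in Python as follows. Only the queen weight 9 comes from the slide; the rook and pawn weights are conventional material values assumed here for illustration, and describing a position by piece counts alone is a simplification.

```python
# Weights w_i: the queen weight comes from the slide; the others are
# conventional material values assumed for illustration.
WEIGHTS = {"queen": 9, "rook": 5, "pawn": 1}


def features(white, black):
    """f_i(s) = number of white pieces of kind i minus number of black ones."""
    return {kind: white.get(kind, 0) - black.get(kind, 0) for kind in WEIGHTS}


def evaluate(white, black):
    """Eval(s) = w_1 f_1(s) + ... + w_n f_n(s)."""
    f = features(white, black)
    return sum(WEIGHTS[kind] * f[kind] for kind in WEIGHTS)


# Hypothetical position, described only by its piece counts.
white = {"queen": 1, "rook": 1, "pawn": 5}
black = {"queen": 0, "rook": 2, "pawn": 6}
print(evaluate(white, black))
```

For this position the features are 1, -1 and -1, so the evaluation is 9 - 5 - 1 = 3 in White's favor.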

(46)

Search versus lookup

Chess: databases for openings.

For chess endgames: all endgames with up to 5 pieces, and many with 6 pieces, were completely solved in the 1990s.

In particular, an endgame was discovered that ends in checkmate after 262 moves. But the rules of the game require that a piece be captured or a pawn be moved within 50 moves.

In 2006, all 6-piece endgames without pawns and some 7-piece endgames were solved. There is a 7-piece endgame that requires at least 517 moves before checkmate.

(47)

State of the art

• Checkers. Since 2007 the program CHINOOK has played perfectly. It uses alpha-beta pruning and a database of 39 × 10^12 endgame positions.

• Chess. Deep Blue defeated Garry Kasparov in 1997. It used a sophisticated evaluation function with 8000 features. Deep Blue searched to depth 14 but could sometimes examine positions to depth 40 (a good player: depth 8; a grandmaster: depth 12). Large database of openings and endgames.

• Null move heuristic: evaluating a position (an underestimate) by giving the opponent two moves in a row at the start, then applying the classical method to a shallow depth. Good results for chess.

• Hydra (successor to Deep Blue): a cluster of 64 processors, 1 GB per process, 200 million evaluations per second, search to depth 18. The best program is RYBKA, but its evaluation function is unknown.

• Othello (Reversi). In 1997 the program Logistello defeated the world champion.