Game playing


Chapter 6


Outline

♦ Games
♦ Perfect play
   – minimax decisions
   – α–β pruning
♦ Resource limits and approximate evaluation
♦ Games of chance
♦ Games of imperfect information


Games vs. search problems

“Unpredictable” opponent ⇒ solution is a strategy specifying a move for every possible opponent reply

Time limits ⇒ unlikely to find goal, must approximate

Plan of attack:

• Computer considers possible lines of play (Babbage, 1846)
• Algorithm for perfect play (Zermelo, 1912; Von Neumann, 1944)
• Finite horizon, approximate evaluation (Zuse, 1945; Wiener, 1948; Shannon, 1950)
• First chess program (Turing, 1951)
• Machine learning to improve evaluation accuracy (Samuel, 1952–57)
• Pruning to allow deeper search (McCarthy, 1956)


Types of games

                        deterministic                  chance
perfect information     chess, checkers, go, othello   backgammon, monopoly
imperfect information   battleships, blind tictactoe   bridge, poker, scrabble, nuclear war


Game tree (2-player, deterministic, turns)

[Figure: partial game tree for tic-tac-toe. Levels alternate MAX (X) and MIN (O), starting from the empty board with MAX to move; the leaves are TERMINAL positions labelled with their Utility: −1, 0 or +1.]


Minimax

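Perfect play for deterministic, perfect-information games

Idea: choose move to position with highest minimax value = best achievable payoff against best play

E.g., 2-ply game:

[Figure: MAX at the root chooses among moves A1, A2, A3, each leading to a MIN node. MIN's replies A11–A13 reach leaves 3, 12, 8; A21–A23 reach 2, 4, 6; A31–A33 reach 14, 5, 2. The MIN nodes are worth 3, 2, 2, so MAX's best move is A1, with minimax value 3.]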


Minimax algorithm

function Minimax-Decision(state) returns an action
   inputs: state, current state in game
   return the a in Actions(state) maximizing Min-Value(Result(a, state))

function Max-Value(state) returns a utility value
   if Terminal-Test(state) then return Utility(state)
   v ← −∞
   for a, s in Successors(state) do v ← Max(v, Min-Value(s))
   return v

function Min-Value(state) returns a utility value
   if Terminal-Test(state) then return Utility(state)
   v ← ∞
   for a, s in Successors(state) do v ← Min(v, Max-Value(s))
   return v
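The pseudocode translates almost line for line into Python. This is a minimal sketch, assuming a hypothetical representation in which an internal state is a dict from action names to successor states and a terminal state is simply its utility (an int). It is run on the 2-ply example, with the leaf values from that figure.

```python
import math

# Hypothetical representation for this sketch: an internal state is a dict mapping
# action names to successor states; a terminal state is its utility (an int).
def successors(state):
    return state.items()

def terminal_test(state):
    return isinstance(state, int)

def utility(state):
    return state

def max_value(state):
    if terminal_test(state):
        return utility(state)
    v = -math.inf
    for _action, s in successors(state):
        v = max(v, min_value(s))
    return v

def min_value(state):
    if terminal_test(state):
        return utility(state)
    v = math.inf
    for _action, s in successors(state):
        v = min(v, max_value(s))
    return v

def minimax_decision(state):
    # The action a maximizing Min-Value(Result(a, state)); here Result(a, state) is state[a].
    return max(state, key=lambda a: min_value(state[a]))

# The 2-ply example: MAX chooses A1..A3, then MIN chooses a leaf.
GAME = {
    "A1": {"A11": 3, "A12": 12, "A13": 8},
    "A2": {"A21": 2, "A22": 4, "A23": 6},
    "A3": {"A31": 14, "A32": 5, "A33": 2},
}

print(minimax_decision(GAME), max_value(GAME))  # A1 3
```

The search is depth-first, so only the current path is held in memory, which is where the $O(bm)$ space bound comes from.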


Properties of minimax

Complete?? Yes, if tree is finite (chess has specific rules for this). NB a finite strategy can exist even in an infinite tree!

Optimal?? Yes, against an optimal opponent. Otherwise??

Time complexity?? $O(b^m)$

Space complexity?? $O(bm)$ (depth-first exploration)

For chess, $b \approx 35$, $m \approx 100$ for “reasonable” games ⇒ exact solution completely infeasible

But do we need to explore every path?
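A quick check of the chess estimate, plugging $b \approx 35$ and $m \approx 100$ into the $O(b^m)$ time bound:

```latex
b^{m} \approx 35^{100} = 10^{100\log_{10}35} \approx 10^{154.4}
```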


α–β pruning example

[Figure, built up over five slides: α–β applied to the 2-ply minimax example, with MAX at the root. The first MIN node examines leaves 3, 12, 8 and is worth 3, so the root is worth at least 3. The second MIN node's first leaf is 2, so its value is at most 2 and its remaining two leaves are pruned (marked X). The third MIN node's bound falls from 14 to 5 to 2 as its leaves are examined. The root value is 3.]


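Why is it called α–β?

[Figure: a path descending from a MAX node through alternating MIN and MAX levels to a MIN node with value V.]

α is the best value (to max) found so far off the current path

If V is worse than α, max will avoid it ⇒ prune that branch

Define β similarly for min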


The α–β algorithm

function Alpha-Beta-Decision(state) returns an action
   v ← Max-Value(state, −∞, +∞)
   return the a in Actions(state) with value v

function Max-Value(state, α, β) returns a utility value
   inputs: state, current state in game
           α, the value of the best alternative for max along the path to state
           β, the value of the best alternative for min along the path to state
   if Terminal-Test(state) then return Utility(state)
   v ← −∞
   for a, s in Successors(state) do
      v ← Max(v, Min-Value(s, α, β))
      if v ≥ β then return v
      α ← Max(α, v)
   return v

function Min-Value(state, α, β) returns a utility value
   same as Max-Value but with roles of α, β reversed
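A minimal Python sketch of α–β, assuming a hypothetical dict-of-successors representation (terminal states are integer utilities) and counting the leaves examined. The loop at the root behaves like Max-Value, passing the best value found so far as α to each later MIN node; that is what allows the second MIN node of the pruning example to be cut off after its first leaf.

```python
import math

# Hypothetical representation for this sketch:
# internal state = dict of action -> successor, terminal state = int utility.
def max_value(state, alpha, beta, stats):
    if isinstance(state, int):          # Terminal-Test, then Utility
        stats["leaves"] += 1
        return state
    v = -math.inf
    for s in state.values():
        v = max(v, min_value(s, alpha, beta, stats))
        if v >= beta:                   # MIN above will never let play reach here
            return v
        alpha = max(alpha, v)
    return v

def min_value(state, alpha, beta, stats):
    if isinstance(state, int):
        stats["leaves"] += 1
        return state
    v = math.inf
    for s in state.values():
        v = min(v, max_value(s, alpha, beta, stats))
        if v <= alpha:                  # MAX above already has an option at least this good
            return v
        beta = min(beta, v)
    return v

def alpha_beta_decision(state):
    """Return (best action, root value, number of leaves examined)."""
    stats = {"leaves": 0}
    best_action, alpha = None, -math.inf
    for action, s in state.items():
        v = min_value(s, alpha, math.inf, stats)
        if v > alpha:
            best_action, alpha = action, v
    return best_action, alpha, stats["leaves"]

# The 2-ply example used in the pruning slides.
GAME = {
    "A1": {"A11": 3, "A12": 12, "A13": 8},
    "A2": {"A21": 2, "A22": 4, "A23": 6},
    "A3": {"A31": 14, "A32": 5, "A33": 2},
}

print(alpha_beta_decision(GAME))  # ('A1', 3, 7)
```

Seven of the nine leaves are examined; the two skipped are the leaves under A2 that the example marks as pruned, and the decision matches plain minimax.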


Properties of α–β

Pruning does not affect final result

Good move ordering improves effectiveness of pruning

With “perfect ordering,” time complexity = $O(b^{m/2})$ ⇒ doubles solvable depth

A simple example of the value of reasoning about which computations are relevant (a form of metareasoning)

Unfortunately, $35^{50}$ is still impossible!
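Why perfect ordering doubles the solvable depth, assuming a fixed budget of $N$ nodes per move, and how large $35^{50}$ is for chess with $m = 100$:

```latex
\begin{aligned}
\text{minimax:}\quad & b^{m} = N \;\Rightarrow\; m = \log_b N \\
\alpha\text{--}\beta \text{ (perfect ordering):}\quad & b^{m/2} = N \;\Rightarrow\; m = 2\log_b N \\
\text{chess, } m = 100\text{:}\quad & 35^{50} = 10^{50\log_{10}35} \approx 10^{77}
\end{aligned}
```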


Resource limits

Standard approach:

• Use Cutoff-Test instead of Terminal-Test, e.g., depth limit (perhaps add quiescence search)
• Use Eval instead of Utility, i.e., evaluation function that estimates desirability of position

Suppose we have 100 seconds and explore $10^4$ nodes/second ⇒ $10^6$ nodes per move ≈ $35^{8/2}$

⇒ α–β reaches depth 8 ⇒ pretty good chess program
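A sketch of the standard approach, assuming a toy game chosen for illustration (it is not from the slides): players alternately take 1 or 2 sticks, and whoever takes the last stick wins. Cutoff-Test stops at terminal states or at a depth limit, and Eval replaces Utility at non-terminal cutoffs. The heuristic Eval used here (a multiple of 3 sticks is a lost position for the player to move) happens to be exact for this game, which keeps the example easy to check; a real evaluation function only estimates desirability. Quiescence search is not included.

```python
import math
from typing import NamedTuple

class Nim(NamedTuple):
    """Hypothetical toy game state: sticks left and whose turn it is."""
    sticks: int
    max_to_move: bool

def actions(s):
    return [k for k in (1, 2) if k <= s.sticks]

def result(s, k):
    return Nim(s.sticks - k, not s.max_to_move)

def is_terminal(s):
    return s.sticks == 0

def utility(s):
    # The player who just moved took the last stick and won.
    return -1 if s.max_to_move else +1

def evaluate(s):
    # Heuristic Eval: a multiple of 3 sticks is lost for the player to move.
    if s.sticks % 3 == 0:
        return -0.5 if s.max_to_move else 0.5
    return 0.5 if s.max_to_move else -0.5

def cutoff_test(s, depth, limit):
    # Cutoff-Test replaces Terminal-Test: stop at terminal states or at the depth limit.
    return is_terminal(s) or depth >= limit

def value(s, depth, limit, alpha, beta):
    if cutoff_test(s, depth, limit):
        return utility(s) if is_terminal(s) else evaluate(s)
    if s.max_to_move:
        v = -math.inf
        for a in actions(s):
            v = max(v, value(result(s, a), depth + 1, limit, alpha, beta))
            if v >= beta:
                return v
            alpha = max(alpha, v)
    else:
        v = math.inf
        for a in actions(s):
            v = min(v, value(result(s, a), depth + 1, limit, alpha, beta))
            if v <= alpha:
                return v
            beta = min(beta, v)
    return v

def decide(s, limit):
    """MAX's move by depth-limited alpha-beta (assumes MAX is to move in s)."""
    return max(actions(s), key=lambda a: value(result(s, a), 1, limit, -math.inf, math.inf))

print(decide(Nim(5, True), limit=10), decide(Nim(4, True), limit=10), decide(Nim(5, True), limit=1))  # 2 1 2
```

With a deep limit the search reaches true terminal states; with `limit=1` it relies entirely on Eval and still picks the move that leaves a multiple of 3.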


Evaluation functions

[Figure: two chess positions, captioned “Black to move / White slightly better” and “White to move / Black winning”.]

For chess, typically linear weighted sum of features

$\mathrm{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s)$

e.g., $w_1 = 9$ with $f_1(s) =$ (number of white queens) − (number of black queens), etc.
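A sketch of a linear weighted material evaluation. Only $w_1 = 9$ for queens comes from the text; the other weights (rook 5, bishop 3, knight 3, pawn 1) are the conventional material values, used here as an assumption, and the piece-count dictionaries are a hypothetical summary of the board.

```python
# Conventional material weights; only the queen weight (9) comes from the text.
WEIGHTS = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def material_eval(white, black, weights=WEIGHTS):
    """Linear weighted sum of features: sum over i of w_i * f_i(s), where
    f_i(s) = (number of white pieces of type i) - (number of black pieces of type i).
    `white` and `black` are hypothetical piece-count dicts such as {"Q": 1, "P": 8}."""
    return sum(w * (white.get(p, 0) - black.get(p, 0)) for p, w in weights.items())

start = {"Q": 1, "R": 2, "B": 2, "N": 2, "P": 8}
print(material_eval(start, start))                          # 0
print(material_eval(start, {**start, "Q": 0}))              # 9: White is a queen up
print(material_eval({**start, "N": 1}, {**start, "P": 6}))  # -1: knight down, two pawns up
```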


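Digression: Exact values don’t matter

[Figure: the same MAX–MIN tree evaluated twice. With leaf values 1, 2 under one MIN node and 2, 4 under the other, the MIN nodes are worth 1 and 2; with the leaves relabelled 1 → 1, 2 → 20, 4 → 400, they are worth 1 and 20. MAX chooses the same move in both cases.]

Behaviour is preserved under any monotonic transformation of Eval

Only the order matters: payoff in deterministic games acts as an ordinal utility function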
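A small check of the ordinal claim on a 2-ply MAX–MIN tree, using leaf values read from the figure plus one extra monotonic map (cubing), chosen here for illustration. MAX's choice is the same under all three.

```python
def max_choice(tree):
    """Index of MAX's move in a 2-ply MAX-MIN tree given as a list of leaf lists."""
    min_values = [min(leaves) for leaves in tree]
    return max(range(len(tree)), key=lambda i: min_values[i])

original = [[1, 2], [2, 4]]              # leaf values read from the figure
relabelled = [[1, 20], [20, 400]]        # the figure's transformed leaves
cubed = [[x ** 3 for x in leaves] for leaves in original]  # another monotonic map

print(max_choice(original), max_choice(relabelled), max_choice(cubed))  # 1 1 1
```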


Deterministic games in practice

Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions.

Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and uses undisclosed methods for extending some lines of search up to 40 ply.

Othello: human champions refuse to compete against computers, who are too good.

Go: human champions refuse to compete against computers, who are too bad. In go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.


Nondeterministic games: backgammon

[Figure: a backgammon position, with the board's points numbered 1–24 and the two ends labelled 0 and 25.]


Nondeterministic games in general

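In nondeterministic games, chance is introduced by dice, card-shuffling

Simplified example with coin-flipping:

[Figure: MAX at the root chooses between two CHANCE nodes, each a fair coin flip (probability 0.5 per outcome) leading to a MIN node. The four MIN nodes have leaves 2, 4 / 7, 4 / 6, 0 / 5, −2 and values 2, 4, 0, −2, so the CHANCE nodes are worth 3 and −1 and MAX's value is 3.]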


Algorithm for nondeterministic games

Expectiminimax gives perfect play

Just like Minimax, except we must also handle chance nodes:

   . . .
   if state is a Max node then
      return the highest ExpectiMinimax-Value of Successors(state)
   if state is a Min node then
      return the lowest ExpectiMinimax-Value of Successors(state)
   if state is a chance node then
      return the probability-weighted average of ExpectiMinimax-Value of Successors(state)
   . . .
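A minimal Python sketch of expectiminimax, assuming hypothetical node classes for MAX, MIN and chance nodes; a chance node carries (probability, child) pairs so its value is a probability-weighted average. It is run on the coin-flipping example, with leaf values read from that figure.

```python
from dataclasses import dataclass

@dataclass
class MaxNode:
    children: list

@dataclass
class MinNode:
    children: list

@dataclass
class ChanceNode:
    outcomes: list  # (probability, child) pairs; probabilities sum to 1

def expectiminimax(node):
    if isinstance(node, (int, float)):   # terminal: utility value
        return node
    if isinstance(node, MaxNode):
        return max(expectiminimax(c) for c in node.children)
    if isinstance(node, MinNode):
        return min(expectiminimax(c) for c in node.children)
    # Chance node: probability-weighted average of successor values.
    return sum(p * expectiminimax(c) for p, c in node.outcomes)

coin = MaxNode([
    ChanceNode([(0.5, MinNode([2, 4])), (0.5, MinNode([7, 4]))]),
    ChanceNode([(0.5, MinNode([6, 0])), (0.5, MinNode([5, -2]))]),
])

print([expectiminimax(c) for c in coin.children], expectiminimax(coin))  # [3.0, -1.0] 3.0
```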


Nondeterministic games in practice

Dice rolls increase b: 21 possible rolls with 2 dice
Backgammon ≈ 20 legal moves (can be 6,000 with 1-1 roll)

depth 4 = $20 \times (21 \times 20)^3 = 20 \times 420^3 \approx 1.5 \times 10^9$

As depth increases, probability of reaching a given node shrinks ⇒ value of lookahead is diminished

α–β pruning is much less effective

TDGammon uses depth-2 search + very good Eval ≈ world-champion level


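Digression: Exact values DO matter

[Figure: two versions of the same MAX–DICE–MIN tree, with chance outcomes of probability .9 and .1. Left: MIN values 2, 3 under the first DICE node and 1, 4 under the second, giving 2.1 and 1.3, so MAX prefers the first move. Right: the same leaves relabelled 1 → 1, 2 → 20, 3 → 30, 4 → 400, giving MIN values 20, 30 and 1, 400 and DICE values 21 and 40.9, so MAX now prefers the second move.]

Behaviour is preserved only by positive linear transformation of Eval

Hence Eval should be proportional to the expected payoff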
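The figure's effect can be reproduced in a few lines. This sketch assumes each MAX move leads straight to a DICE node whose outcomes are (probability, MIN value) pairs, as in the figure, and compares the original values, the figure's order-preserving relabelling, and a positive linear map $10x + 5$ chosen for illustration.

```python
def expected_value(outcomes):
    """Value of a chance node: probability-weighted average of its outcomes."""
    return sum(p * v for p, v in outcomes)

def max_choice(moves, transform=lambda x: x):
    """Index of MAX's best move after applying `transform` to every MIN value."""
    values = [expected_value([(p, transform(v)) for p, v in outcomes]) for outcomes in moves]
    return max(range(len(moves)), key=lambda i: values[i])

# Each move leads to a DICE node; pairs are (probability, MIN value) from the figure.
moves = [[(0.9, 2), (0.1, 3)], [(0.9, 1), (0.1, 4)]]
relabel = {1: 1, 2: 20, 3: 30, 4: 400}.get  # order-preserving, but not linear

print(max_choice(moves))                        # 0 (values about 2.1 vs 1.3)
print(max_choice(moves, relabel))               # 1 (values about 21 vs 40.9)
print(max_choice(moves, lambda x: 10 * x + 5))  # 0 (positive linear map)
```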


Games of imperfect information

E.g., card games, where opponent’s initial cards are unknown

Typically we can calculate a probability for each possible deal

Seems just like having one big dice roll at the beginning of the game∗

Idea: compute the minimax value of each action in each deal, then choose the action with highest expected value over all deals∗

Special case: if an action is optimal for all deals, it’s optimal.∗

GIB, current best bridge program, approximates this idea by
1) generating 100 deals consistent with bidding information
2) picking the action that wins most tricks on average
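A minimal sketch of the averaging step described above: given sampled deals and some way to value an action in a fully known deal, pick the action with the best average. The value function and the table of values are hypothetical, and generating deals consistent with the bidding is not shown. The ∗ marks on the idea above flag it as questionable.

```python
from statistics import mean

def averaged_choice(actions, sampled_deals, value_in_deal):
    """Return the action whose value, averaged over the sampled deals, is highest.

    value_in_deal(action, deal) is assumed to give the minimax value (e.g. tricks
    won) of `action` when `deal` is fully known.
    """
    return max(actions, key=lambda a: mean(value_in_deal(a, d) for d in sampled_deals))

# Hypothetical values of two candidate plays in two sampled deals.
TABLE = {
    ("play_A", "deal_1"): 2, ("play_A", "deal_2"): 2,
    ("play_B", "deal_1"): 4, ("play_B", "deal_2"): 1,
}
choice = averaged_choice(["play_A", "play_B"], ["deal_1", "deal_2"],
                         lambda a, d: TABLE[(a, d)])
print(choice)  # play_B (average 2.5 vs 2)
```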


Example

Four-card bridge/whist/hearts hand, Max to play first

[Figure, built up over three slides: MAX/MIN game trees for this hand under different deals of the cards. The first two trees have value 0; the final slide shows two branches valued −0.5.]


Commonsense example

Road A leads to a small heap of gold pieces.
Road B leads to a fork: take the left fork and you’ll find a mound of jewels; take the right fork and you’ll be run over by a bus.

Road A leads to a small heap of gold pieces.
Road B leads to a fork: take the left fork and you’ll be run over by a bus; take the right fork and you’ll find a mound of jewels.

Road A leads to a small heap of gold pieces.
Road B leads to a fork: guess correctly and you’ll find a mound of jewels; guess incorrectly and you’ll be run over by a bus.


Proper analysis

* Intuition that the value of an action is the average of its values in all actual states is WRONG

With partial observability, value of an action depends on the information state or belief state the agent is in

Can generate and search a tree of information states

Leads to rational behaviors such as
♦ Acting to obtain information
♦ Signalling to one’s partner
♦ Acting randomly to minimize information disclosure
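A sketch contrasting the two analyses on the last commonsense example (guess correctly at the fork or be run over). The payoffs and state names are hypothetical. Averaging over actual states lets the agent pick the right fork separately in each state; valuing the action in the belief state forces a single choice of fork that must serve both states.

```python
# Hypothetical payoffs for the commonsense example.
GOLD, JEWELS, BUS = 1, 5, -100
# Two equally likely actual states; the agent does not know which one holds.
STATES = ["jewels_left", "jewels_right"]
FORKS = ["left", "right"]

def outcome(state, fork):
    """Payoff of taking `fork` on Road B when `state` is the actual state."""
    return JEWELS if state == "jewels_" + fork else BUS

def averaged_value_of_b():
    # Flawed analysis: choose the best fork separately in each actual state, then average.
    return sum(max(outcome(s, f) for f in FORKS) for s in STATES) / len(STATES)

def belief_state_value_of_b():
    # Belief-state analysis: one fork must be chosen without knowing the state.
    return max(sum(outcome(s, f) for s in STATES) / len(STATES) for f in FORKS)

print(averaged_value_of_b(), belief_state_value_of_b(), GOLD)  # 5.0 -47.5 1
```

Averaging says Road B (5.0) beats Road A (1); the belief-state value (−47.5) says take Road A.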


Summary

Games are fun to work on! (and dangerous)

They illustrate several important points about AI

♦ perfection is unattainable ⇒ must approximate
♦ good idea to think about what to think about
♦ uncertainty constrains the assignment of values to states
♦ optimal decisions depend on information state, not real state

Games are to AI as grand prix racing is to automobile design
