Game playing
Chapter 6

Outline
♦ Games
♦ Perfect play
   – minimax decisions
   – α–β pruning
♦ Resource limits and approximate evaluation
♦ Games of chance
♦ Games of imperfect information

Games vs. search problems
“Unpredictable” opponent ⇒ solution is a strategy specifying a move for every possible opponent reply
Time limits ⇒ unlikely to find goal, must approximate
Plan of attack:
• Computer considers possible lines of play (Babbage, 1846)
• Algorithm for perfect play (Zermelo, 1912; Von Neumann, 1944)
• Finite horizon, approximate evaluation (Zuse, 1945; Wiener, 1948; Shannon, 1950)
• First chess program (Turing, 1951)
• Machine learning to improve evaluation accuracy (Samuel, 1952–57)
• Pruning to allow deeper search (McCarthy, 1956)

Types of games
                        deterministic                   chance
perfect information     chess, checkers, go, othello    backgammon, monopoly
imperfect information   battleships, blind tictactoe    bridge, poker, scrabble, nuclear war

Game tree (2-player, deterministic, turns)
[Figure: partial game tree for tic-tac-toe. MAX (X) and MIN (O) alternate moves from the empty board down to TERMINAL states, whose utilities are −1, 0, or +1.]

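Minimax
Perfect play for deterministic, perfect-information games
Idea: choose move to position with highest minimax value = best achievable payoff against best play
E.g., 2-ply game:
[Figure: 2-ply game tree. The MAX root (value 3) has actions A1, A2, A3 leading to MIN nodes with values 3, 2, 2. Leaf utilities: 3, 12, 8 under A1; 2, 4, 6 under A2; 14, 5, 2 under A3.]
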
Minimax algorithm
function Minimax-Decision(state) returns an action
   inputs: state, current state in game
   return the a in Actions(state) maximizing Min-Value(Result(a, state))

function Max-Value(state) returns a utility value
   if Terminal-Test(state) then return Utility(state)
   v ← −∞
   for a, s in Successors(state) do v ← Max(v, Min-Value(s))
   return v

function Min-Value(state) returns a utility value
   if Terminal-Test(state) then return Utility(state)
   v ← ∞
   for a, s in Successors(state) do v ← Min(v, Max-Value(s))
   return v

Properties of minimax
Complete?? Yes, if tree is finite (chess has specific rules for this). NB a finite strategy can exist even in an infinite tree!
Optimal?? Yes, against an optimal opponent. Otherwise??
Time complexity?? O(b^m)
Space complexity?? O(bm) (depth-first exploration)
For chess, b ≈ 35, m ≈ 100 for “reasonable” games ⇒ exact solution completely infeasible
But do we need to explore every path?

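α–β pruning example
[Figure: the 2-ply example tree searched left to right, built up over five steps.
 1. First MIN node: leaves 3, 12, 8 give value 3, so the MAX root is at least 3.
 2. Second MIN node: the first leaf is 2, so its value is at most 2; its two remaining leaves (marked X) are pruned.
 3. Third MIN node: the first leaf is 14, so its value is at most 14.
 4. The next leaf is 5, so its value is at most 5.
 5. The last leaf is 2, so its value is 2. The MAX root has value 3.]
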
Why is it called α–β?
[Figure: a path of alternating MAX and MIN nodes leading down to a node with value V.]
α is the best value (to max) found so far off the current path
If V is worse than α, max will avoid it ⇒ prune that branch
Define β similarly for min

The α–β algorithm
function Alpha-Beta-Decision(state) returns an action
   return the a in Actions(state) maximizing Min-Value(Result(a, state), −∞, +∞)

function Max-Value(state, α, β) returns a utility value
   inputs: state, current state in game
           α, the value of the best alternative for max along the path to state
           β, the value of the best alternative for min along the path to state
   if Terminal-Test(state) then return Utility(state)
   v ← −∞
   for a, s in Successors(state) do
      v ← Max(v, Min-Value(s, α, β))
      if v ≥ β then return v
      α ← Max(α, v)
   return v

function Min-Value(state, α, β) returns a utility value
   same as Max-Value but with roles of α, β reversed

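To watch the pruning happen, the sketch below runs α–β on the 2-ply example and records which leaves are actually evaluated. It is an illustrative Python version under assumed names: the nested-dict tree encoding, the single alphabeta function with a maximizing flag, and the visited list are not part of the slides.

```python
import math

# Hypothetical encoding: leaves are utilities, internal nodes are dicts
# mapping action names to children (insertion order = search order).
TREE = {
    "A1": {"A11": 3, "A12": 12, "A13": 8},
    "A2": {"A21": 2, "A22": 4, "A23": 6},
    "A3": {"A31": 14, "A32": 5, "A33": 2},
}


def alphabeta(node, alpha, beta, maximizing, visited):
    """Return the minimax value of node, skipping branches that cannot matter."""
    if not isinstance(node, dict):
        visited.append(node)  # record every leaf that is actually evaluated
        return node
    if maximizing:
        v = -math.inf
        for child in node.values():
            v = max(v, alphabeta(child, alpha, beta, False, visited))
            if v >= beta:  # MIN already has a better alternative: prune
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node.values():
        v = min(v, alphabeta(child, alpha, beta, True, visited))
        if v <= alpha:  # MAX already has a better alternative: prune
            return v
        beta = min(beta, v)
    return v


visited = []
value = alphabeta(TREE, -math.inf, math.inf, True, visited)
print(value, visited)  # 3 [3, 12, 8, 2, 14, 5, 2]
```

Only seven of the nine leaves are examined. The leaves 4 and 6 under the second MIN node are never evaluated, and the root value is the same as plain minimax would give.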
Properties of α–β
Pruning does not affect final result
Good move ordering improves effectiveness of pruning
With “perfect ordering,” time complexity = O(b^(m/2)) ⇒ doubles solvable depth
A simple example of the value of reasoning about which computations are relevant (a form of metareasoning)
Unfortunately, 35^50 is still impossible!

Resource limits
Standard approach:
• Use Cutoff-Test instead of Terminal-Test
   e.g., depth limit (perhaps add quiescence search)
• Use Eval instead of Utility
   i.e., evaluation function that estimates desirability of position
Suppose we have 100 seconds, explore 10^4 nodes/second
⇒ 10^6 nodes per move ≈ 35^(8/2)
⇒ α–β reaches depth 8 ⇒ pretty good chess program

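The depth estimate can be checked with a few lines of arithmetic. The sketch below takes the slide's figures (100 seconds, 10^4 nodes/second, b = 35) and assumes the idealized node counts b^d for plain minimax and b^(d/2) for α–β with perfect ordering. The variable names are illustrative.

```python
import math

seconds_per_move = 100
nodes_per_second = 10**4
budget = seconds_per_move * nodes_per_second  # nodes we can afford per move

b = 35  # approximate branching factor for chess

# Depth d such that b**d equals the budget (plain minimax) ...
plain_depth = math.log(budget) / math.log(b)
# ... and such that b**(d/2) equals the budget (alpha-beta, perfect ordering).
alphabeta_depth = 2 * plain_depth

print(budget, b**4)                                     # 1000000 1500625
print(round(plain_depth, 1), round(alphabeta_depth, 1))  # 3.9 7.8
```

So 10^6 nodes is the same order of magnitude as 35^4, which is where the "depth 8" figure for well-ordered α–β comes from. Plain minimax would reach only about depth 4 on the same budget.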
Evaluation functions
[Figure: two chess positions. First: Black to move, White slightly better. Second: White to move, Black winning.]
For chess, typically linear weighted sum of features
Eval(s) = w_1 f_1(s) + w_2 f_2(s) + ... + w_n f_n(s)
e.g., w_1 = 9 with f_1(s) = (number of white queens) – (number of black queens), etc.

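A linear weighted sum of this kind might be coded as follows. Only the queen weight of 9 comes from the slide. The other weights are the conventional material values, used here as an assumption, and the piece-count dictionaries and the name material_eval are hypothetical.

```python
# Weight per feature; each feature is (white count) - (black count) of a piece.
# Only "queen": 9 is from the slide; the rest are assumed conventional values.
WEIGHTS = {"queen": 9, "rook": 5, "bishop": 3, "knight": 3, "pawn": 1}


def material_eval(white, black):
    """Eval(s) = sum_i w_i * f_i(s), scored from White's point of view."""
    return sum(
        weight * (white.get(piece, 0) - black.get(piece, 0))
        for piece, weight in WEIGHTS.items()
    )


# Hypothetical position: White is a queen up, a bishop down and a pawn down.
white = {"queen": 1, "rook": 2, "pawn": 6}
black = {"queen": 0, "rook": 2, "bishop": 1, "pawn": 7}

print(material_eval(white, black))  # 9*1 + 5*0 + 3*(-1) + 3*0 + 1*(-1) = 5
```

A real evaluation function would add positional features. Material alone is just the simplest instance of the weighted-sum form.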
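Digression: Exact values don’t matter
[Figure: two MAX-over-MIN trees. First tree: leaves 1, 2 and 2, 4; MIN values 1 and 2; MAX value 2. Second tree: leaves 1, 20 and 20, 400; MIN values 1 and 20; MAX value 20. MAX chooses the same move in both.]
Behaviour is preserved under any monotonic transformation of Eval
Only the order matters: payoff in deterministic games acts as an ordinal utility function
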
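The invariance is easy to confirm numerically. The sketch below assumes a tiny tree encoding (a list of MIN nodes, each a list of leaf values) and compares MAX's choice before and after two order-preserving rescalings: the mapping 1 → 1, 2 → 20, 4 → 400, and cubing. The function and variable names are illustrative.

```python
def minimax_choice(min_nodes):
    """Index of the MIN node MAX should move to; each MIN node is a list of leaves."""
    values = [min(leaves) for leaves in min_nodes]
    return values.index(max(values))


original = [[1, 2], [2, 4]]

# Two monotonic (order-preserving) transformations of the leaf values.
rescale = {1: 1, 2: 20, 4: 400}
transformed = [[rescale[x] for x in leaves] for leaves in original]
cubed = [[x**3 for x in leaves] for leaves in original]

print(minimax_choice(original), minimax_choice(transformed), minimax_choice(cubed))
# 1 1 1
```

Minimax only ever compares values with min and max, so any transformation that keeps their order leaves the chosen move unchanged.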
Deterministic games in practice
Checkers: Chinook ended 40-year reign of human world champion Marion Tinsley in 1994. Used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions.
Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply.
Othello: human champions refuse to compete against computers, who are too good.
Go: human champions refuse to compete against computers, who are too bad. In go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.

Nondeterministic games: backgammon
[Figure: backgammon board with points numbered 1–12 along one side and 13–24 along the other, plus positions 0 and 25.]

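Nondeterministic games in general
In nondeterministic games, chance introduced by dice, card-shuffling
Simplified example with coin-flipping:
[Figure: MAX root over two CHANCE nodes (each branch has probability 0.5) over MIN nodes. First CHANCE node: MIN values 2 (leaves 2, 4) and 4 (leaves 7, 4), expected value 3. Second CHANCE node: MIN values 0 (leaves 6, 0) and −2 (leaves 5, −2), expected value −1. MAX value 3.]
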
Algorithm for nondeterministic games
Expectiminimax gives perfect play
Just like Minimax, except we must also handle chance nodes:
   ...
   if state is a Max node then
      return the highest ExpectiMinimax-Value of Successors(state)
   if state is a Min node then
      return the lowest ExpectiMinimax-Value of Successors(state)
   if state is a chance node then
      return the probability-weighted average of ExpectiMinimax-Value of Successors(state)
   ...

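One possible Python rendering of these three cases, applied to the coin-flipping example (leaves 2, 4 / 7, 4 / 6, 0 / 5, −2, each flip with probability 0.5), is given below. The tuple-based node encoding and the names expectiminimax and COIN_TREE are assumptions made for this sketch.

```python
def expectiminimax(node):
    """Value of a game-tree node with MAX, MIN and chance levels.

    Hypothetical encoding: a leaf is a number; ("max", children) and
    ("min", children) hold lists of nodes; ("chance", outcomes) holds a
    list of (probability, node) pairs.
    """
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(child) for child in children)
    if kind == "min":
        return min(expectiminimax(child) for child in children)
    if kind == "chance":
        return sum(p * expectiminimax(child) for p, child in children)
    raise ValueError(f"unknown node kind: {kind!r}")


COIN_TREE = ("max", [
    ("chance", [(0.5, ("min", [2, 4])), (0.5, ("min", [7, 4]))]),
    ("chance", [(0.5, ("min", [6, 0])), (0.5, ("min", [5, -2]))]),
])

print(expectiminimax(COIN_TREE))  # 3.0
```

The two chance nodes evaluate to 3 and −1, so MAX prefers the first move.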
Nondeterministic games in practice
Dice rolls increase b: 21 possible rolls with 2 dice
Backgammon ≈ 20 legal moves (can be 6,000 with 1-1 roll)
depth 4 = 20 × (21 × 20)^3 ≈ 1.5 × 10^9
As depth increases, probability of reaching a given node shrinks ⇒ value of lookahead is diminished
α–β pruning is much less effective
TDGammon uses depth-2 search + very good Eval ≈ world-champion level

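Both numbers in the branching estimate can be reproduced directly: the 21 distinct rolls of two dice, and the product 20 × (21 × 20)^3. The snippet below is a plain arithmetic check using the slide's figure of about 20 legal moves. The variable names are illustrative.

```python
from itertools import combinations_with_replacement

# Unordered rolls of two six-sided dice: (1,1), (1,2), ..., (6,6).
rolls = list(combinations_with_replacement(range(1, 7), 2))
n_rolls = len(rolls)

moves = 20  # approximate number of legal moves per position (from the slide)

# One move at the root, then three further (roll, move) levels.
nodes = moves * (n_rolls * moves) ** 3

print(n_rolls, nodes)  # 21 1481760000
```

The product evaluates to roughly 1.5 × 10^9 nodes at depth 4.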
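Digression: Exact values DO matter
[Figure: two MAX-over-DICE-over-MIN trees with chance probabilities .9 and .1. First tree: MIN values 2, 3 (expected value 2.1) and 1, 4 (expected value 1.3), so MAX prefers the first move. Second tree, with the same ordering of values: MIN values 20, 30 (expected value 21) and 1, 400 (expected value 40.9), so MAX prefers the second move.]
Behaviour is preserved only by positive linear transformation of Eval
Hence Eval should be proportional to the expected payoff
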
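The contrast with the deterministic case can be checked numerically. The sketch below uses the MIN values and probabilities from the figure and compares MAX's choice under an order-preserving but nonlinear rescaling (1 → 1, 2 → 20, 3 → 30, 4 → 400) and under a positive linear one. The encoding, the function names, and the particular linear map 10v + 5 are assumptions for illustration.

```python
def expectimax_choice(chance_nodes):
    """Index of the chance node with the highest expected value.

    Each chance node is a list of (probability, value) pairs.
    """
    values = [sum(p * v for p, v in outcomes) for outcomes in chance_nodes]
    return values.index(max(values))


def transform(chance_nodes, f):
    """Apply f to every value, leaving the probabilities unchanged."""
    return [[(p, f(v)) for p, v in outcomes] for outcomes in chance_nodes]


original = [[(0.9, 2), (0.1, 3)], [(0.9, 1), (0.1, 4)]]  # expected 2.1 vs 1.3

rescale = {1: 1, 2: 20, 3: 30, 4: 400}            # monotonic, not linear
monotonic = transform(original, rescale.get)      # expected 21 vs 40.9
linear = transform(original, lambda v: 10 * v + 5)  # expected 26 vs 18

print(expectimax_choice(original),
      expectimax_choice(monotonic),
      expectimax_choice(linear))
# 0 1 0
```

The monotonic rescaling flips the decision while the positive linear one does not, because expected values are preserved in order only under positive linear maps.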
Games of imperfect information
E.g., card games, where opponent’s initial cards are unknown
Typically we can calculate a probability for each possible deal
Seems just like having one big dice roll at the beginning of the game*
Idea: compute the minimax value of each action in each deal, then choose the action with highest expected value over all deals*
Special case: if an action is optimal for all deals, it’s optimal.*
GIB, current best bridge program, approximates this idea by
1) generating 100 deals consistent with bidding information
2) picking the action that wins most tricks on average

Example
Four-card bridge/whist/hearts hand, Max to play first
[Figure: card-play trees with MAX and MIN alternating plays, built up over three steps. The first two trees, one per deal, are each labelled with value 0. The third tree, for the case where the deal is not known, is labelled with value −0.5.]

Commonsense example
Road A leads to a small heap of gold pieces
Road B leads to a fork:
   take the left fork and you’ll find a mound of jewels;
   take the right fork and you’ll be run over by a bus.

Road A leads to a small heap of gold pieces
Road B leads to a fork:
   take the left fork and you’ll be run over by a bus;
   take the right fork and you’ll find a mound of jewels.

Road A leads to a small heap of gold pieces
Road B leads to a fork:
   guess correctly and you’ll find a mound of jewels;
   guess incorrectly and you’ll be run over by a bus.

Proper analysis
* Intuition that the value of an action is the average of its values in all actual states is WRONG
With partial observability, value of an action depends on the information state or belief state the agent is in
Can generate and search a tree of information states
Leads to rational behaviors such as
♦ Acting to obtain information
♦ Signalling to one’s partner
♦ Acting randomly to minimize information disclosure

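The road example makes the gap concrete. The sketch below compares "solve each world with full knowledge, then average" against choosing from the belief state, where the fork must be picked without knowing which world is real. The utilities (gold = 1, jewels = 10, bus = −100), the equal probability of the two worlds, and all names are assumptions chosen for illustration.

```python
GOLD, JEWELS, BUS = 1, 10, -100  # hypothetical utilities

# Two equally likely worlds, differing in which fork hides the bus.
WORLDS = [
    {"left": JEWELS, "right": BUS},
    {"left": BUS, "right": JEWELS},
]

value_a = GOLD  # Road A pays the same in every world

# Averaging over worlds: in each world, pretend we know it and take the best fork.
clairvoyant_b = sum(max(world.values()) for world in WORLDS) / len(WORLDS)

# Belief state: commit to one fork, then average its payoff over the worlds.
belief_b = max(
    sum(world[fork] for world in WORLDS) / len(WORLDS)
    for fork in ("left", "right")
)

clairvoyant_choice = "B" if clairvoyant_b > value_a else "A"
belief_choice = "B" if belief_b > value_a else "A"

print(clairvoyant_b, belief_b)            # 10.0 -45.0
print(clairvoyant_choice, belief_choice)  # B A
```

Averaging the per-world values recommends Road B, since jewels are reachable in every world. An agent that cannot tell the worlds apart faces an expected −45 on Road B and should take Road A.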
Summary
Games are fun to work on! (and dangerous)
They illustrate several important points about AI
♦ perfection is unattainable ⇒ must approximate
♦ good idea to think about what to think about
♦ uncertainty constrains the assignment of values to states
♦ optimal decisions depend on information state, not real state
Games are to AI as grand prix racing is to automobile design