
Instructor’s Manual: Exercise Solutions for
Artificial Intelligence: A Modern Approach
Third Edition (International Version)

Stuart J. Russell and Peter Norvig
with contributions from Ernest Davis, Nicholas J. Hay, and Mehran Sahami

Upper Saddle River Boston Columbus San Francisco New York Indianapolis London Toronto Sydney Singapore Tokyo Montreal Dubai Madrid Hong Kong Mexico City Munich Paris Amsterdam Cape Town

Editor-in-Chief: Michael Hirsch

Executive Editor: Tracy Dunkelberger

Assistant Editor: Melinda Haggerty

Editorial Assistant: Allison Michael

Vice President, Production: Vince O’Brien

Senior Managing Editor: Scott Disanno

Production Editor: Jane Bonnell

Interior Designers: Stuart Russell and Peter Norvig

Copyright © 2010, 2003, 1995 by Pearson Education, Inc., Upper Saddle River, New Jersey 07458. All rights reserved. Manufactured in the United States of America. This publication is protected by Copyright and permissions should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission(s) to use materials from this work, please submit a written request to Pearson Higher Education, Permissions Department, 1 Lake Street, Upper Saddle River, NJ 07458.

The author and publisher of this book have used their best efforts in preparing this book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The author and publisher make no warranty of any kind, expressed or implied, with regard to these programs or the documentation contained in this book. The author and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.

Library of Congress Cataloging-in-Publication Data on File

10 9 8 7 6 5 4 3 2 1

ISBN-13: 978-0-13-606738-2

ISBN-10: 0-13-606738-7

Preface

This Instructor’s Solution Manual provides solutions (or at least solution sketches) for almost all of the 400 exercises in Artificial Intelligence: A Modern Approach (Third Edition). We only give actual code for a few of the programming exercises; writing a lot of code would not be that helpful, if only because we don’t know what language you prefer.

In many cases, we give ideas for discussion and follow-up questions, and we try to explain why we designed each exercise.

There is more supplementary material that we want to offer to the instructor, but we have decided to do it through the medium of the World Wide Web rather than through a CD or printed Instructor’s Manual. The idea is that this solution manual contains the material that must be kept secret from students, but the Web site contains material that can be updated and added to in a more timely fashion. The address for the web site is:

http://aima.cs.berkeley.edu

and the address for the online Instructor’s Guide is:

http://aima.cs.berkeley.edu/instructors.html

There you will find:

• Instructions on how to join the aima-instructors discussion list. We strongly recommend that you join so that you can receive updates, corrections, notification of new versions of this Solutions Manual, additional exercises and exam questions, etc., in a timely manner.

• Source code for programs from the text. We offer code in Lisp, Python, and Java, and point to code developed by others in C++ and Prolog.

• Programming resources and supplemental texts.

• Figures from the text, for making your own slides.

• Terminology from the index of the book.

• Other courses using the book that have home pages on the Web. You can see example syllabi and assignments here. Please do not put solution sets for AIMA exercises on public web pages!

• AI Education information on teaching introductory AI courses.

• Other sites on the Web with information on AI. Organized by chapter in the book; check this for supplemental material.

We welcome suggestions for new exercises, new environments and agents, etc. The book belongs to you, the instructor, as much as us. We hope that you enjoy teaching from it, that these supplemental materials help, and that you will share your supplements and experiences with other instructors.


Solutions for Chapter 1
Introduction

1.1

a. Dictionary definitions of intelligence talk about “the capacity to acquire and apply knowledge” or “the faculty of thought and reason” or “the ability to comprehend and profit from experience.” These are all reasonable answers, but if we want something quantifiable we would use something like “the ability to apply knowledge in order to perform better in an environment.”

b. We define artificial intelligence as the study and construction of agent programs that perform well in a given environment, for a given agent architecture.

c. We define an agent as an entity that takes action in response to percepts from an environment.

d. We define rationality as the property of a system which does the “right thing” given what it knows. See Section 2.2 for a more complete discussion. Both this definition and that discussion describe perfect rationality, however; see Section 27.3.

e. We define logical reasoning as a process of deriving new sentences from old, such that the new sentences are necessarily true if the old ones are true. (Notice that this does not refer to any specific syntax or formal language, but it does require a well-defined notion of truth.)

1.2

The probability of fooling an interrogator depends on just how unskilled the interrogator is. One entrant in the 2002 Loebner prize competition (which is not quite a real Turing Test) did fool one judge, although if you look at the transcript, it is hard to imagine what that judge was thinking. There certainly have been examples of a chatbot or other online agent fooling humans. For example, see Lenny Foner’s account of the Julia chatbot at foner.www.media.mit.edu/people/foner/Julia/. We’d say the chance today is something like 10%, with the variation depending more on the skill of the interrogator than on the program. In 50 years, we expect that the entertainment industry (movies, video games, commercials) will have made sufficient investments in artificial actors to create very credible impersonators.

See the solution for exercise 26.1 for some discussion of potential objections.

1.3 Yes, they are rational, because slower, deliberative actions would tend to result in more damage to the hand. If “intelligent” means “applying knowledge” or “using thought and reasoning” then it does not require intelligence to make a reflex action.

1.4 No. IQ test scores correlate well with certain other measures, such as success in college, ability to make good decisions in complex, real-world situations, ability to learn new skills and subjects quickly, and so on, but only if they’re measuring fairly normal humans. The IQ test doesn’t measure everything. A program that is specialized only for IQ tests (and specialized further only for the analogy part) would very likely perform poorly on other measures of intelligence. Consider the following analogy: if a human runs the 100 m in 10 seconds, we might describe him or her as very athletic and expect competent performance in other areas such as walking, jumping, hurdling, and perhaps throwing balls; but we would not describe a Boeing 747 as very athletic because it can cover 100 m in 0.4 seconds, nor would we expect it to be good at hurdling and throwing balls.

Even for humans, IQ tests are controversial because of their theoretical presuppositions about innate ability (distinct from training effects) and the generalizability of results. See The Mismeasure of Man by Stephen Jay Gould, Norton, 1981 or Multiple intelligences: the theory in practice by Howard Gardner, Basic Books, 1993 for more on IQ tests, what they measure, and what other aspects there are to “intelligence.”

1.5 In order of magnitude figures, the computational power of the computer is 100 times larger.

1.6 Just as you are unaware of all the steps that go into making your heart beat, you are also unaware of most of what happens in your thoughts. You do have a conscious awareness of some of your thought processes, but the majority remains opaque to your consciousness. The field of psychoanalysis is based on the idea that one needs trained professional help to analyze one’s own thoughts.

1.7

• Although bar code scanning is in a sense computer vision, these are not AI systems. The problem of reading a bar code is an extremely limited and artificial form of visual interpretation, and it has been carefully designed to be as simple as possible, given the hardware.

• In many respects. The problem of determining the relevance of a web page to a query is a problem in natural language understanding, and the techniques are related to those we will discuss in Chapters 22 and 23. Search engines like Ask.com, which group the retrieved pages into categories, use clustering techniques analogous to those we discuss in Chapter 20. Likewise, other functionalities provided by search engines use intelligent techniques; for instance, the spelling corrector uses a form of data mining based on observing users’ corrections of their own spelling errors. On the other hand, the problem of indexing billions of web pages in a way that allows retrieval in seconds is a problem in database design, not in artificial intelligence.

• To a limited extent. Such menus tend to use vocabularies which are very limited (e.g., the digits, “Yes”, and “No”) and within the designers’ control, which greatly simplifies the problem. On the other hand, the programs must deal with an uncontrolled space of all kinds of voices and accents. The voice activated directory assistance programs used by telephone companies, which must deal with a large and changing vocabulary, are certainly AI programs.

• This is borderline. There is something to be said for viewing these as intelligent agents working in cyberspace. The task is sophisticated, the information available is partial, the techniques are heuristic (not guaranteed optimal), and the state of the world is dynamic. All of these are characteristic of intelligent activities. On the other hand, the task is very far from those normally carried out in human cognition.

1.8

Presumably the brain has evolved so as to carry out these operations on visual images, but the mechanism is only accessible for one particular purpose in this particular cognitive task of image processing. Until about two centuries ago there was no advantage in people (or animals) being able to compute the convolution of a Gaussian for any other purpose.

The really interesting question here is what we mean by saying that the “actual person” can do something. The person can see, but he cannot compute the convolution of a Gaussian; but computing that convolution is part of seeing. This is beyond the scope of this solution manual.

1.9

Evolution tends to perpetuate organisms (and combinations and mutations of organisms) that are successful enough to reproduce. That is, evolution favors organisms that can optimize their performance measure to at least survive to the age of sexual maturity, and then be able to win a mate. Rationality just means optimizing a performance measure, so this is in line with evolution.

1.10 This question is intended to be about the essential nature of the AI problem and what is required to solve it, but could also be interpreted as a sociological question about the current practice of AI research.

A science is a field of study that leads to the acquisition of empirical knowledge by the scientific method, which involves falsifiable hypotheses about what is. A pure engineering field can be thought of as taking a fixed base of empirical knowledge and using it to solve problems of interest to society. Of course, engineers do bits of science (e.g., they measure the properties of building materials) and scientists do bits of engineering to create new devices and so on.

As described in Section 1.1, the “human” side of AI is clearly an empirical science (called cognitive science these days) because it involves psychological experiments designed to find out how human cognition actually works. What about the “rational” side?

If we view it as studying the abstract relationship among an arbitrary task environment, a computing device, and the program for that computing device that yields the best performance in the task environment, then the rational side of AI is really mathematics and engineering; it does not require any empirical knowledge about the actual world, and the actual task environment, that we inhabit; that a given program will do well in a given environment is a theorem. (The same is true of pure decision theory.) In practice, however, we are interested in task environments that do approximate the actual world, so even the rational side of AI involves finding out what the actual world is like. For example, in studying rational agents that communicate, we are interested in task environments that contain humans, so we have to find out what human language is like. In studying perception, we tend to focus on sensors such as cameras that extract useful information from the actual world. (In a world without light, cameras wouldn’t be much use.) Moreover, to design vision algorithms that are good at extracting information from camera images, we need to understand the actual world that generates those images. Obtaining the required understanding of scene characteristics, object types, surface markings, and so on is a quite different kind of science from ordinary physics, chemistry, biology, and so on, but it is still science.

In summary, AI is definitely engineering but it would not be especially useful to us if it were not also an empirical science concerned with those aspects of the real world that affect the design of intelligent systems for that world.

1.11 This depends on your definition of “intelligent” and “tell.” In one sense computers only do what the programmers command them to do, but in another sense what the programmer consciously tells the computer to do often has very little to do with what the computer actually does. Anyone who has written a program with an ornery bug knows this, as does anyone who has written a successful machine learning program. So in one sense Samuel “told” the computer “learn to play checkers better than I do, and then play that way,” but in another sense he told the computer “follow this learning algorithm” and it learned to play. So we’re left in the situation where you may or may not consider learning to play checkers to be a sign of intelligence (or you may think that learning to play in the right way requires intelligence, but not in this way), and you may think the intelligence resides in the programmer or in the computer.

1.12 The point of this exercise is to notice the parallel with the previous one. Whatever you decided about whether computers could be intelligent in 1.11, you are committed to making the same conclusion about animals (including humans), unless your reasons for deciding whether something is intelligent take into account the mechanism (programming via genes versus programming via a human programmer). Note that Searle makes this appeal to mechanism in his Chinese Room argument (see Chapter 26).

1.13 Again, the choice you make in 1.11 drives your answer to this question.

1.14

a. (ping-pong) A reasonable level of proficiency was achieved by Andersson’s robot (Andersson, 1988).

b. (driving in Cairo) No. Although there has been a lot of progress in automated driving, all such systems currently rely on certain relatively constant clues: that the road has shoulders and a center line, that the car ahead will travel a predictable course, that cars will keep to their side of the road, and so on. Some lane changes and turns can be made on clearly marked roads in light to moderate traffic. Driving in downtown Cairo is too unpredictable for any of these to work.

c. (driving in Victorville, California) Yes, to some extent, as demonstrated in DARPA’s Urban Challenge. Some of the vehicles managed to negotiate streets, intersections, well-behaved traffic, and well-behaved pedestrians in good visual conditions.

d. (shopping at the market) No. No robot can currently put together the tasks of moving in a crowded environment, using vision to identify a wide variety of objects, and grasping the objects (including squishable vegetables) without damaging them. The component pieces are nearly able to handle the individual tasks, but it would take a major integration effort to put it all together.

e. (shopping on the web) Yes. Software robots are capable of handling such tasks, particularly if the design of the web grocery shopping site does not change radically over time.

f. (bridge) Yes. Programs such as GIB now play at a solid level.

g. (theorem proving) Yes. For example, the proof of Robbins algebra described on page 360.

h. (funny story) No. While some computer-generated prose and poetry is hysterically funny, this is invariably unintentional, except in the case of programs that echo back prose that they have memorized.

i. (legal advice) Yes, in some cases. AI has a long history of research into applications of automated legal reasoning. Two outstanding examples are the Prolog-based expert systems used in the UK to guide members of the public in dealing with the intricacies of the social security and nationality laws. The social security system is said to have saved the UK government approximately $150 million in its first year of operation. However, extension into more complex areas such as contract law awaits a satisfactory encoding of the vast web of common-sense knowledge pertaining to commercial transactions and agreement and business practices.

j. (translation) Yes. In a limited way, this is already being done. See Kay, Gawron and Norvig (1994) and Wahlster (2000) for an overview of the field of speech translation, and some limitations on the current state of the art.

k. (surgery) Yes. Robots are increasingly being used for surgery, although always under the command of a doctor. Robotic skills demonstrated at superhuman levels include drilling holes in bone to insert artificial joints, suturing, and knot-tying. They are not yet capable of planning and carrying out a complex operation autonomously from start to finish.

1.15

• DARPA Grand Challenge for Robotic Cars. In 2004 the Grand Challenge was a 240 km race through the Mojave Desert. It clearly stressed the state of the art of autonomous driving, and in fact no competitor finished the race. The best team, CMU, completed only 12 of the 240 km. In 2005 the race featured a 212 km course with fewer curves and wider roads than the 2004 race. Five teams finished, with Stanford finishing first, edging out two CMU entries. This was hailed as a great achievement for robotics and for the Challenge format. In 2007 the Urban Challenge put cars in a city setting, where they had to obey traffic laws and avoid other cars. This time CMU edged out Stanford.

The progress made in these contests is a matter of fact, but the impact of that progress is a matter of opinion.

The competition appears to have been a good testing ground to put theory into practice, something that the failures of 2004 showed was needed. But it is important that the competition was done at just the right time, when there was theoretical work to consolidate, as demonstrated by the earlier work by Dickmanns (whose VaMP car drove autonomously for 158 km in 1995) and by Pomerleau (whose Navlab car drove 5000 km across the USA, also in 1995, with the steering controlled autonomously for 98% of the trip, although the brakes and accelerator were controlled by a human driver).

• International Planning Competition. In 1998, five planners competed: Blackbox, HSP, IPP, SGP, and STAN. The result page (ftp://ftp.cs.yale.edu/pub/mcdermott/aipscomp-results.html) stated “all of these planners performed very well, compared to the state of the art a few years ago.” Most plans found were 30 or 40 steps, with some over 100 steps. In 2008, the competition had expanded quite a bit: there were more tracks (satisficing vs. optimizing; sequential vs. temporal; static vs. learning). There were about 25 planners, including submissions from the 1998 groups (or their descendants) and new groups. Solutions found were much longer than in 1998. In sum, the field has progressed quite a bit in participation, in breadth, and in power of the planners. In the 1990s it was possible to publish a Planning paper that discussed only a theoretical approach; now it is necessary to show quantitative evidence of the efficacy of an approach. The field is stronger and more mature now, and it seems that the planning competition deserves some of the credit. However, some researchers feel that too much emphasis is placed on the particular classes of problems that appear in the competitions, and not enough on real-world applications.

• Robocup Robotics Soccer. This competition has proved extremely popular, attracting 407 teams from 43 countries in 2009 (up from 38 teams from 11 countries in 1997). The robotic platform has advanced to a more capable humanoid form, and the strategy and tactics have advanced as well. Although the competition has spurred innovations in distributed control, the winning teams in recent years have relied more on individual ball-handling skills than on advanced teamwork. The competition has served to increase interest and participation in robotics, although it is not clear how well the teams are advancing towards the goal of defeating a human team by 2050.

• TREC Information Retrieval Conference. This is one of the oldest competitions, started in 1992. The competitions have served to bring together a community of researchers, have led to a large literature of publications, and have seen progress in participation and in quality of results over the years. In the early years, TREC served its purpose as a place to do evaluations of retrieval algorithms on text collections that were large for the time. However, starting around 2000 TREC became less relevant as the advent of the World Wide Web created a corpus that was available to anyone and was much larger than anything TREC had created, and the development of commercial search engines surpassed academic research.

• NIST Open Machine Translation Evaluation. This series of evaluations (explicitly not labelled a “competition”) has existed since 2001. Since then we have seen great advances in Machine Translation quality as well as in the number of languages covered. The dominant approach has switched from one based on grammatical rules to one that relies primarily on statistics. The NIST evaluations seem to track these changes well, but don’t appear to be driving the changes.

Overall, we see that whatever you measure is bound to increase over time. For most of these competitions, the measurement was a useful one, and the state of the art has progressed. In the case of ICAPS, some planning researchers worry that too much attention has been lavished on the competition itself. In some cases, progress has left the competition behind, as in TREC, where the resources available to commercial search engines outpaced those available to academic researchers. In this case the TREC competition was useful (it helped train many of the people who ended up in commercial search engines) and in no way drew energy away from new ideas.

Solutions for Chapter 2
Intelligent Agents

2.1 This question tests the student’s understanding of environments, rational actions, and performance measures. Any sequential environment in which rewards may take time to arrive will work, because then we can arrange for the reward to be “over the horizon.” Suppose that in any state there are two action choices, a and b, and consider two cases: the agent is in state s at time T or at time T - 1. In state s, action a reaches state s′ with reward 0, while action b reaches state s again with reward 1; in s′ either action gains reward 10. At time T - 1, it’s rational to do a in s, with expected total reward 10 before time is up; but at time T, it’s rational to do b with total expected reward 1 because the reward of 10 cannot be obtained before time is up.

Students may also provide common-sense examples from real life: investments whose payoff occurs after the end of life, exams where it doesn’t make sense to start the high-value question with too little time left to get the answer, and so on.

The environment state can include a clock, of course; this doesn’t change the gist of the answer (now the action will depend on the clock as well as on the non-clock part of the state) but it does mean that the agent can never be in the same state twice.
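The two-state argument in 2.1 can be checked with a few lines of backward induction. The sketch below is only an illustration: the dictionary layout and the function name `best_action` are hypothetical, and it assumes that either action in s′ pays 10 and leaves the agent in s′, which the solution does not spell out.

```python
# Hypothetical encoding of the example: TRANSITIONS[state][action] = (next_state, reward)
TRANSITIONS = {
    "s": {"a": ("s_prime", 0), "b": ("s", 1)},
    # Assumption: either action in s' pays 10 and stays in s'.
    "s_prime": {"a": ("s_prime", 10), "b": ("s_prime", 10)},
}


def best_action(state, steps_left):
    """Return (best action, total reward) with steps_left actions remaining."""
    if steps_left == 0:
        return None, 0
    best = None
    for action, (next_state, reward) in sorted(TRANSITIONS[state].items()):
        _, future = best_action(next_state, steps_left - 1)
        total = reward + future
        if best is None or total > best[1]:
            best = (action, total)
    return best


at_T_minus_1 = best_action("s", 2)  # two actions remain before time is up
at_T = best_action("s", 1)          # only one action remains
print(at_T_minus_1, at_T)
```

With two actions left the search prefers a (total 10), and with one action left it prefers b (total 1), matching the answer above.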

2.2 Notice that for our simple environmental assumptions we need not worry about quantitative uncertainty.

a. It suffices to show that for all possible actual environments (i.e., all dirt distributions and initial locations), this agent cleans the squares at least as fast as any other agent. This is trivially true when there is no dirt. When there is dirt in the initial location and none in the other location, the world is clean after one step; no agent can do better. When there is no dirt in the initial location but dirt in the other, the world is clean after two steps; no agent can do better. When there is dirt in both locations, the world is clean after three steps; no agent can do better. (Note: in general, the condition stated in the first sentence of this answer is much stricter than necessary for an agent to be rational.)

b. The agent in (a) keeps moving backwards and forwards even after the world is clean. It is better to do NoOp once the world is clean (the chapter says this). Now, since the agent’s percept doesn’t say whether the other square is clean, it would seem that the agent must have some memory to say whether the other square has already been cleaned. To make this argument rigorous is more difficult; for example, could the agent arrange things so that it would only be in a clean left square when the right square was already clean? As a general strategy, an agent can use the environment itself as a form of external memory, a common technique for humans who use things like appointment calendars and knots in handkerchiefs. In this particular case, however, that is not possible. Consider the reflex actions for [A, Clean] and [B, Clean]. If either of these is NoOp, then the agent will fail in the case where that is the initial percept but the other square is dirty; hence, neither can be NoOp and therefore the simple reflex agent is doomed to keep moving. In general, the problem with reflex agents is that they have to do the same thing in situations that look the same, even when the situations are actually quite different. In the vacuum world this is a big liability, because every interior square (except home) looks either like a square with dirt or a square without dirt.

c. If we consider asymptotically long lifetimes, then it is clear that learning a map (in some form) confers an advantage because it means that the agent can avoid bumping into walls. It can also learn where dirt is most likely to accumulate and can devise an optimal inspection strategy. The precise details of the exploration method needed to construct a complete map appear in Chapter 4; methods for deriving an optimal inspection/cleanup strategy are in Chapter 21.
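The NoOp argument in part (b) can be checked by brute force in the two-square world. The sketch below is an illustration under assumptions rather than code from the repository: the percept encoding, the helper names `run` and `cleans_every_world`, and the 10-step cutoff are all hypothetical. It tries the four reflex tables that differ only in their responses to [A, Clean] and [B, Clean].

```python
from itertools import product

LOCATIONS = ("A", "B")


def run(table, location, dirt, steps=10):
    """Simulate a simple reflex agent; return True if no dirt remains."""
    dirt = dict(dirt)  # copy so the caller's dict is untouched
    for _ in range(steps):
        percept = (location, "Dirty" if dirt[location] else "Clean")
        action = table[percept]
        if action == "Suck":
            dirt[location] = False
        elif action == "Right":
            location = "B"
        elif action == "Left":
            location = "A"
        # "NoOp" leaves the world unchanged
    return not any(dirt.values())


def cleans_every_world(table):
    """Check all 8 initial configurations (2 locations x 4 dirt patterns)."""
    return all(
        run(table, loc, {"A": dirt_a, "B": dirt_b})
        for loc, dirt_a, dirt_b in product(LOCATIONS, (False, True), (False, True))
    )


results = {}
for act_a, act_b in product(("Right", "NoOp"), ("Left", "NoOp")):
    table = {
        ("A", "Dirty"): "Suck",
        ("B", "Dirty"): "Suck",
        ("A", "Clean"): act_a,
        ("B", "Clean"): act_b,
    }
    results[(act_a, act_b)] = cleans_every_world(table)
print(results)
```

Only the table that keeps moving on both Clean percepts cleans every initial configuration; each table containing a NoOp leaves dirt behind in at least one of them.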

2.3

a. An agent that senses only partial information about the state cannot be perfectly rational.

False. Perfect rationality refers to the ability to make good decisions given the sensor information received.

b. There exist task environments in which no pure reflex agent can behave rationally.

True. A pure reflex agent ignores previous percepts, so cannot obtain an optimal state estimate in a partially observable environment. For example, correspondence chess is played by sending moves; if the other player’s move is the current percept, a reflex agent could not keep track of the board state and would have to respond to, say, “a4” in the same way regardless of the position in which it was played.

c. There exists a task environment in which every agent is rational.

True. For example, in an environment with a single state, such that all actions have the same reward, it doesn’t matter which action is taken. More generally, any environment that is reward-invariant under permutation of the actions will satisfy this property.

d. The input to an agent program is the same as the input to the agent function.

False. The agent function, notionally speaking, takes as input the entire percept sequence up to that point, whereas the agent program takes the current percept only.

e. Every agent function is implementable by some program/machine combination.

False. For example, the environment may contain Turing machines and input tapes and the agent’s job is to solve the halting problem; there is an agent function that specifies the right answers, but no agent program can implement it. Another example would be an agent function that requires solving intractable problem instances of arbitrary size in constant time.

f. Suppose an agent selects its action uniformly at random from the set of possible actions. There exists a deterministic task environment in which this agent is rational.

True. This is a special case of (c); if it doesn’t matter which action you take, selecting randomly is rational.

g. It is possible for a given agent to be perfectly rational in two distinct task environments.

True. For example, we can arbitrarily modify the parts of the environment that are unreachable by any optimal policy as long as they stay unreachable.

h. Every agent is rational in an unobservable environment.

False. Some actions are stupid (and the agent may know this if it has a model of the environment) even if one cannot perceive the environment state.

i. A perfectly rational poker-playing agent never loses.

False. Unless it draws the perfect hand, the agent can always lose if an opponent has better cards. This can happen for game after game. The correct statement is that the agent’s expected winnings are nonnegative.

2.4 Many of these can actually be argued either way, depending on the level of detail and abstraction.

A. Partially observable, stochastic, sequential, dynamic, continuous, multi-agent.

B. Partially observable, stochastic, sequential, dynamic, continuous, single agent (unless there are alien life forms that are usefully modeled as agents).

C. Partially observable, deterministic, sequential, static, discrete, single agent. This can be multi-agent and dynamic if we buy books via auction, or dynamic if we purchase on a long enough scale that book offers change.

D. Fully observable, stochastic, episodic (every point is separate), dynamic, continuous, multi-agent.

E. Fully observable, stochastic, episodic, dynamic, continuous, single agent.

F. Fully observable, stochastic, sequential, static, continuous, single agent.

G. Fully observable, deterministic, sequential, static, continuous, single agent.

H. Fully observable, strategic, sequential, static, discrete, multi-agent.

2.5 The following are just some of the many possible definitions that can be written:

• Agent: an entity that perceives and acts; or, one that can be viewed as perceiving and acting. Essentially any object qualifies; the key point is the way the object implements an agent function. (Note: some authors restrict the term to programs that operate on behalf of a human, or to programs that can cause some or all of their code to run on other machines on a network, as in mobile agents.)

• Agent function: a function that specifies the agent’s action in response to every possible percept sequence.

• Agent program: that program which, combined with a machine architecture, implements an agent function. In our simple designs, the program takes a new percept on each invocation and returns an action.

• Rationality: a property of agents that choose actions that maximize their expected utility, given the percepts to date.

• Autonomy: a property of agents whose behavior is determined by their own experience rather than solely by their initial programming.

• Reflex agent: an agent whose action depends only on the current percept.

• Model-based agent: an agent whose action is derived directly from an internal model of the current world state that is updated over time.

• Goal-based agent: an agent that selects actions that it believes will achieve explicitly represented goals.

• Utility-based agent: an agent that selects actions that it believes will maximize the expected utility of the outcome state.

• Learning agent: an agent whose behavior improves over time based on its experience.

2.6 Although these questions are very simple, they hint at some very fundamental issues. Our answers are for the simple agent designs for static environments where nothing happens while the agent is deliberating; the issues get even more interesting for dynamic environments.

a. Yes; take any agent program and insert null statements that do not affect the output.

b. Yes; the agent function might specify that the agent print true when the percept is a Turing machine program that halts, and false otherwise. (Note: in dynamic environments, for machines of less than infinite speed, the rational agent function may not be implementable; e.g., the agent function that always plays a winning move, if any, in a game of chess.)

c. Yes; the agent’s behavior is fixed by the architecture and program.

d. There are 2^n agent programs, although many of these will not run at all. (Note: Any given program can devote at most n bits to storage, so its internal state can distinguish among only 2^n past histories. Because the agent function specifies actions based on percept histories, there will be many agent functions that cannot be implemented because of lack of memory in the machine.)

e. It depends on the program and the environment. If the environment is dynamic, speeding up the machine may mean choosing different (perhaps better) actions and/or acting sooner. If the environment is static and the program pays no attention to the passage of elapsed time, the agent function is unchanged.
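Answer (a) can be made concrete with two textually different programs that compute the same mapping. The sketch below is hypothetical: both function names and the percept encoding are illustrative, and because both programs are pure reflex programs, comparing them on every single percept is enough to compare the agent functions they implement.

```python
from itertools import product


def program_plain(percept):
    """A reflex vacuum program written directly."""
    location, status = percept
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"


def program_padded(percept):
    """The same mapping with do-nothing statements and reordered tests."""
    location, status = percept
    unused = 0          # null statement: no effect on the output
    for _ in range(3):  # more work that never influences the action
        unused += 1
    if status != "Dirty":
        return "Left" if location != "A" else "Right"
    return "Suck"


PERCEPTS = list(product(("A", "B"), ("Clean", "Dirty")))
same_function = all(program_plain(p) == program_padded(p) for p in PERCEPTS)
print(same_function)
```

The padded version takes longer to run, which is exactly the distinction raised in (e): in a static environment the agent function is unaffected by that difference.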

2.7

Thedesignofgoal-andutility-basedagentsdependsonthestructureofthetaskenvironment.Thesimplestsuchagents,forexamplethoseinchapters3and10,computethe agent’sentirefuturesequenceofactionsinadvancebefore actingatall.Thisstrategyworks forstaticanddeterministicenvironmentswhichareeither fully-knownorunobservable

Forfully-observableandfully-knownstaticenvironments apolicycanbecomputedin advancewhichgivestheactiontobytakeninanygivenstate.

function GOAL-BASED-AGENT(percept) returns anaction persistent: state,theagent’scurrentconceptionoftheworldstate model,adescriptionofhowthenextstatedependsoncurrentstate andaction goal,adescriptionofthedesiredgoalstate plan,asequenceofactionstotake,initiallyempty action,themostrecentaction,initiallynone

state ← UPDATE-STATE(state, action, percept, model ) if GOAL-ACHIEVED(state,goal) thenreturn anullaction if plan isempty then plan ← PLAN(state,goal,model) action ← FIRST(plan) plan ← REST(plan) return action

For partially-observable environments the agent can compute a conditional plan, which specifies the sequence of actions to take as a function of the agent’s perception. In the extreme, a conditional plan gives the agent’s response to every contingency, and so it is a representation of the entire agent function.

In all cases it may be either intractable or too expensive to compute everything out in advance. Instead of a conditional plan, it may be better to compute a single sequence of actions which is likely to reach the goal, then monitor the environment to check whether the plan is succeeding, repairing or replanning if it is not. It may be even better to compute only the start of this plan before taking the first action, continuing to plan at later time steps.

Pseudocode for a simple goal-based agent is given in Figure S2.1. GOAL-ACHIEVED tests to see whether the current state satisfies the goal or not, doing nothing if it does. PLAN computes a sequence of actions to take to achieve the goal. This might return only a prefix of the full plan; the rest will be computed after the prefix is executed. This agent will act to maintain the goal: if at any point the goal is not satisfied it will (eventually) replan to achieve the goal again.

At this level of abstraction the utility-based agent is not much different than the goal-based agent, except that action may be continuously required (there is not necessarily a point where the utility function is “satisfied”). Pseudocode is given in Figure S2.2.
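The control loop of the goal-based agent can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the class name, the three callables passed to the constructor, and the one-dimensional demo world are hypothetical illustrations and are not taken from the book's code repository. The demo planner deliberately returns only a two-step prefix of the plan, to show replanning once the prefix is used up.

```python
class GoalBasedAgent:
    """Sketch of a goal-based agent loop; all names are hypothetical."""

    def __init__(self, state, goal, update_state, goal_achieved, plan_fn):
        self.state = state
        self.goal = goal
        self.update_state = update_state    # (state, action, percept) -> state
        self.goal_achieved = goal_achieved  # (state, goal) -> bool
        self.plan_fn = plan_fn              # (state, goal) -> non-empty list of actions
        self.plan = []                      # initially empty
        self.action = None                  # most recent action, initially none

    def __call__(self, percept):
        self.state = self.update_state(self.state, self.action, percept)
        if self.goal_achieved(self.state, self.goal):
            return None                     # the null action
        if not self.plan:
            # The planner may return only a prefix of the full plan.
            self.plan = self.plan_fn(self.state, self.goal)
        self.action, self.plan = self.plan[0], self.plan[1:]
        return self.action


def run_demo(start=0, goal=3, max_steps=10):
    """Assumed toy world: the agent walks along a line and perceives its position."""
    agent = GoalBasedAgent(
        state=start,
        goal=goal,
        update_state=lambda state, action, percept: percept,
        goal_achieved=lambda state, g: state == g,
        # Plan at most two unit steps toward the goal (a plan prefix).
        plan_fn=lambda state, g: [1 if g > state else -1] * min(2, abs(g - state)),
    )
    position, trace = start, []
    for _ in range(max_steps):
        action = agent(position)
        if action is None:
            break
        position += action
        trace.append(action)
    return position, trace
```

Calling `run_demo()` walks the agent from 0 to 3; the planner is invoked twice because the first plan covers only two of the three steps.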

Figure S2.1 A goal-based agent.

2.8 The file "agents/environments/vacuum.lisp" in the code repository implements the vacuum-cleaner environment. Students can easily extend it to generate different shaped rooms, obstacles, and so on.

2.9 A reflex agent program implementing the rational agent function described in the chapter is as follows:

(defun reflex-rational-vacuum-agent (percept)
  (destructuring-bind (location status) percept
    (cond ((eq status 'Dirty) 'Suck)
          ((eq location 'A) 'Right)
          (t 'Left))))

For states 1, 3, 5, 7 in Figure 4.9, the performance measures are 1996, 1999, 1998, 2000 respectively.

function UTILITY-BASED-AGENT(percept) returns an action
  persistent: state, the agent’s current conception of the world state
              model, a description of how the next state depends on current state and action
              utility function, a description of the agent’s utility function
              plan, a sequence of actions to take, initially empty
              action, the most recent action, initially none

  state ← UPDATE-STATE(state, action, percept, model)
  if plan is empty then
      plan ← PLAN(state, utility function, model)
  action ← FIRST(plan)
  plan ← REST(plan)
  return action

2.10

a. No; see answer to 2.4(b).

b. See answer to 2.4(b).

c. In this case, a simple reflex agent can be perfectly rational. The agent can consist of a table with eight entries, indexed by percept, that specifies an action to take for each possible state. After the agent acts, the world is updated and the next percept will tell the agent what to do next. For larger environments, constructing a table is infeasible. Instead, the agent could run one of the optimal search algorithms in Chapters 3 and 4 and execute the first step of the solution sequence. Again, no internal state is required, but it would help to be able to store the solution sequence instead of recomputing it for each new percept.

2.11

a. Because the agent does not know the geography and perceives only location and local dirt, and cannot remember what just happened, it will get stuck forever against a wall when it tries to move in a direction that is blocked; that is, unless it randomizes.

b. One possible design cleans up dirt and otherwise moves randomly:

(defun randomized-reflex-vacuum-agent (percept)
  (destructuring-bind (location status) percept
    (cond ((eq status 'Dirty) 'Suck)
          (t (random-element '(Left Right Up Down))))))

Figure S2.2 A utility-based agent.

This is fairly close to what the Roomba™ vacuum cleaner does (although the Roomba has a bump sensor and randomizes only when it hits an obstacle). It works reasonably well in nice, compact environments. In maze-like environments or environments with small connecting passages, it can take a very long time to cover all the squares.

c. An example is shown in Figure S2.3. Students may also wish to measure clean-up time for linear or square environments of different sizes, and compare those to the efficient online search algorithms described in Chapter 4.

d. A reflex agent with state can build a map (see Chapter 4 for details). An online depth-first exploration will reach every state in time linear in the size of the environment; therefore, the agent can do much better than the simple reflex agent.

The question of rational behavior in unknown environments is a complex one but it is worth encouraging students to think about it. We need to have some notion of the prior probability distribution over the class of environments; call this the initial belief state. Any action yields a new percept that can be used to update this distribution, moving the agent to a new belief state. Once the environment is completely explored, the belief state collapses to a single possible environment. Therefore, the problem of optimal exploration can be viewed as a search for an optimal strategy in the space of possible belief states. This is a well-defined, if horrendously intractable, problem. Chapter 21 discusses some cases where optimal exploration is possible. Another concrete example of exploration is the Minesweeper computer game (see Exercise 7.22). For very small Minesweeper environments, optimal exploration is feasible although the belief state update is nontrivial to explain.

Figure S2.3 An environment in which random motion will take a long time to cover all the squares.

2.12 The problem appears at first to be very similar; the main difference is that instead of using the location percept to build the map, the agent has to “invent” its own locations (which, after all, are just nodes in a data structure representing the state space graph). When a bump is detected, the agent assumes it remains in the same location and can add a wall to its map. For grid environments, the agent can keep track of its (x, y) location and so can tell when it has returned to an old state. In the general case, however, there is no simple way to tell if a state is new or old.

2.13

a. For a reflex agent, this presents no additional challenge, because the agent will continue to Suck as long as the current location remains dirty. For an agent that constructs a sequential plan, every Suck action would need to be replaced by “Suck until clean.” If the dirt sensor can be wrong on each step, then the agent might want to wait for a few steps to get a more reliable measurement before deciding whether to Suck or move on to a new square. Obviously, there is a trade-off because waiting too long means that dirt remains on the floor (incurring a penalty), but acting immediately risks either dirtying a clean square or ignoring a dirty square (if the sensor is wrong). A rational agent must also continue touring and checking the squares in case it missed one on a previous tour (because of bad sensor readings). It is not immediately obvious how the waiting time at each square should change with each new tour. These issues can be clarified by experimentation, which may suggest a general trend that can be verified mathematically. This problem is a partially observable Markov decision process; see Chapter 17. Such problems are hard in general, but some special cases may yield to careful analysis.

b. In this case, the agent must keep touring the squares indefinitely. The probability that a square is dirty increases monotonically with the time since it was last cleaned, so the rational strategy is, roughly speaking, to repeatedly execute the shortest possible tour of all squares. (We say “roughly speaking” because there are complications caused by the fact that the shortest tour may visit some squares twice, depending on the geography.) This problem is also a partially observable Markov decision process.

Solutions for Chapter 3

Solving Problems by Searching

3.1 In goal formulation, we decide which aspects of the world we are interested in, and which can be ignored or abstracted away. Then in problem formulation we decide how to manipulate the important aspects (and ignore the others). If we did problem formulation first we would not know what to include and what to leave out. That said, it can happen that there is a cycle of iterations between goal formulation, problem formulation, and problem solving until one arrives at a sufficiently useful and efficient solution.

3.2

a. We’ll define the coordinate system so that the center of the maze is at (0, 0), and the maze itself is a square from (−1, −1) to (1, 1).

Initial state: robot at coordinate (0, 0), facing North.

Goal test: either |x| > 1 or |y| > 1 where (x, y) is the current location.

Successor function: move forwards any distance d; change the direction the robot is facing.

Cost function: total distance moved.

The state space is infinitely large, since the robot’s position is continuous.

b. The state will record the intersection the robot is currently at, along with the direction it’s facing. At the end of each corridor leaving the maze we will have an exit node. We’ll assume some node corresponds to the center of the maze.

Initial state: at the center of the maze facing North.

Goal test: at an exit node.

Successor function: move to the next intersection in front of us, if there is one; turn to face a new direction.

Cost function: total distance moved.

There are 4n states, where n is the number of intersections.

c. Initial state: at the center of the maze.

Goal test: at an exit node.

Successor function: move to next intersection to the North, South, East, or West.

Cost function: total distance moved.

We no longer need to keep track of the robot’s orientation since it is irrelevant to predicting the outcome of our actions, and not part of the goal test. The motor system that executes this plan will need to keep track of the robot’s current orientation, to know when to rotate the robot.

d. State abstractions:

(i) Ignoring the height of the robot off the ground, whether it is tilted off the vertical.

(ii) The robot can face in only four directions.

(iii) Other parts of the world ignored: possibility of other robots in the maze, the weather in the Caribbean.

Action abstractions:

(i) We assumed all positions were safely accessible: the robot couldn’t get stuck or damaged.

(ii) The robot can move as far as it wants, without having to recharge its batteries.

(iii) Simplified movement system: moving forwards a certain distance, rather than controlling each individual motor and watching the sensors to detect collisions.

3.3

a. State space: States are all possible city pairs (i, j). The map is not the state space.

Successor function: The successors of (i, j) are all pairs (x, y) such that Adjacent(x, i) and Adjacent(y, j).

Goal: Be at (i, i) for some i.

Step cost function: The cost to go from (i, j) to (x, y) is max(d(i, x), d(j, y)).

b. In the best case, the friends head straight for each other in steps of equal size, reducing their separation by twice the time cost on each step. Hence (iii) is admissible.

c. Yes: e.g., a map with two nodes connected by one link. The two friends will swap places forever. The same will happen on any chain if they start an odd number of steps apart. (One can see this best on the graph that represents the state space, which has two disjoint sets of nodes.) The same even holds for a grid of any size or shape, because every move changes the Manhattan distance between the two friends by 0 or 2.

d. Yes: take any of the unsolvable maps from part (c) and add a self-loop to any one of the nodes. If the friends start an odd number of steps apart, a move in which one of the friends takes the self-loop changes the distance by 1, rendering the problem solvable. If the self-loop is not taken, the argument from (c) applies and no solution is possible.

3.4 From http://www.cut-the-knot.com/pythagoras/fifteen.shtml, this proof applies to the fifteen puzzle, but the same argument works for the eight puzzle:

Definition: The goal state has the numbers in a certain order, which we will measure as starting at the upper left corner, then proceeding left to right, and when we reach the end of a row, going down to the leftmost square in the row below. For any other configuration besides the goal, whenever a tile with a greater number on it precedes a tile with a smaller number, the two tiles are said to be inverted.

Proposition: For a given puzzle configuration, let N denote the sum of the total number of inversions and the row number of the empty square. Then (N mod 2) is invariant under any legal move. In other words, after a legal move an odd N remains odd whereas an even N remains even. Therefore the goal state in Figure 3.4, with no inversions and empty square in the first row, has N = 1, and can only be reached from starting states with odd N, not from starting states with even N.

Proof: First of all, sliding a tile horizontally changes neither the total number of inversions nor the row number of the empty square. Therefore let us consider sliding a tile vertically.

Let’s assume, for example, that the tile A is located directly over the empty square. Sliding it down changes the parity of the row number of the empty square. Now consider the total number of inversions. The move only affects relative positions of tiles A, B, C, and D. If none of the B, C, D caused an inversion relative to A (i.e., all three are larger than A) then after sliding one gets three (an odd number) of additional inversions. If one of the three is smaller than A, then before the move B, C, and D contributed a single inversion (relative to A) whereas after the move they’ll be contributing two inversions, a change of 1, also an odd number. Two additional cases obviously lead to the same result. Thus the change in the sum N is always even. This is precisely what we have set out to show.

So before we solve a puzzle, we should compute the N value of the start and goal state and make sure they have the same parity, otherwise no solution is possible.
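The invariant can be checked empirically with a short Python sketch. It assumes the 4×4 fifteen puzzle that the proof is written for, encodes a state as a row-major tuple with 0 for the empty square, and numbers rows from 1; the function names are hypothetical. Note that the vertical-move step of the proof relies on the tile passing three others, which is specific to a board of width four; on a board of odd width a vertical slide passes an even number of tiles, so there the inversion count alone keeps its parity.

```python
import random

WIDTH = 4  # fifteen puzzle; 0 denotes the empty square


def parity_n(state):
    """Return N mod 2, where N = inversions + row number of the empty square."""
    tiles = [t for t in state if t != 0]
    inversions = sum(
        1
        for i in range(len(tiles))
        for j in range(i + 1, len(tiles))
        if tiles[i] > tiles[j]
    )
    blank_row = state.index(0) // WIDTH + 1  # rows numbered from 1
    return (inversions + blank_row) % 2


def legal_moves(state):
    """All states reachable by sliding one tile into the empty square."""
    blank = state.index(0)
    row, col = divmod(blank, WIDTH)
    targets = []
    if row > 0:
        targets.append(blank - WIDTH)
    if row < WIDTH - 1:
        targets.append(blank + WIDTH)
    if col > 0:
        targets.append(blank - 1)
    if col < WIDTH - 1:
        targets.append(blank + 1)
    result = []
    for t in targets:
        s = list(state)
        s[blank], s[t] = s[t], s[blank]
        result.append(tuple(s))
    return result


def random_walk_parities(start, steps=200, seed=0):
    """Set of N mod 2 values seen along a random walk of legal moves."""
    rng = random.Random(seed)
    state = tuple(start)
    seen = {parity_n(state)}
    for _ in range(steps):
        state = rng.choice(legal_moves(state))
        seen.add(parity_n(state))
    return seen
```

A walk that starts from a solved board only ever sees one parity, and a board with two tiles exchanged stays in the other class.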

3.5 The formulation puts one queen per column, with a new queen placed only in a square that is not attacked by any other queen. To simplify matters, we’ll first consider the n-rooks problem. The first rook can be placed in any square in column 1 (n choices), the second in any square in column 2 except the same row as the rook in column 1 (n − 1 choices), and so on. This gives n! elements of the search space.

For n queens, notice that a queen attacks at most three squares in any given column, so in column 2 there are at least (n − 3) choices, in column 3 at least (n − 6) choices, and so on. Thus the state space size $S \ge n \cdot (n-3) \cdot (n-6) \cdots$. Hence we have

$$S^3 \ge n \cdot n \cdot n \cdot (n-3)(n-3)(n-3)(n-6)(n-6)(n-6) \cdots \ge n(n-1)(n-2)(n-3)(n-4)(n-5)(n-6)(n-7)(n-8) \cdots = n!$$

or $S \ge \sqrt[3]{n!}$.

3.6

a. Initial state: No regions colored.

Goal test: All regions colored, and no two adjacent regions have the same color.

Successor function: Assign a color to a region.

Cost function: Number of assignments.

b. Initial state: As described in the text.

Goal test: Monkey has bananas.

Successor function: Hop on crate; Hop off crate; Push crate from one spot to another; Walk from one spot to another; grab bananas (if standing on crate).

Cost function: Number of actions.

c. Initial state: considering all input records.

Goal test: considering a single record, and it gives “illegal input” message.

Successor function: run again on the first half of the records; run again on the second half of the records.

Cost function: Number of runs.

Note: This is a contingency problem; you need to see whether a run gives an error message or not to decide what to do next.

d. Initial state: jugs have values [0, 0, 0].

Successor function: given values [x, y, z], generate [12, y, z], [x, 8, z], [x, y, 3] (by filling); [0, y, z], [x, 0, z], [x, y, 0] (by emptying); or for any two jugs with current values x and y, pour y into x; this changes the jug with x to the minimum of x + y and the capacity of the jug, and decrements the jug with y by the amount gained by the first jug.

Cost function: Number of actions.

3.7

a. If we consider all (x, y) points, then there are an infinite number of states, and of paths.

b. (For this problem, we consider the start and goal points to be vertices.) The shortest distance between two points is a straight line, and if it is not possible to travel in a straight line because some obstacle is in the way, then the next shortest distance is a sequence of line segments, end-to-end, that deviate from the straight line by as little as possible. So the first segment of this sequence must go from the start point to a tangent point on an obstacle; any path that gave the obstacle a wider girth would be longer. Because the obstacles are polygonal, the tangent points must be at vertices of the obstacles, and hence the entire path must go from vertex to vertex. So now the state space is the set of vertices, of which there are 35 in Figure 3.31.

c. Code not shown.

d. Implementations and analysis not shown.

3.8

a. Any path, no matter how bad it appears, might lead to an arbitrarily large reward (negative cost). Therefore, one would need to exhaust all possible paths to be sure of finding the best one.

b. Suppose the greatest possible reward is c. Then if we also know the maximum depth of the state space (e.g. when the state space is a tree), then any path with d levels remaining can be improved by at most cd, so any paths worse than cd less than the best path can be pruned. For state spaces with loops, this guarantee doesn’t help, because it is possible to go around a loop any number of times, picking up c reward each time.

c. The agent should plan to go around this loop forever (unless it can find another loop with even better reward).

d. The value of a scenic loop is lessened each time one revisits it; a novel scenic sight is a great reward, but seeing the same one for the tenth time in an hour is tedious, not rewarding. To accommodate this, we would have to expand the state space to include a memory: a state is now represented not just by the current location, but by a current location and a bag of already-visited locations. The reward for visiting a new location is now a (diminishing) function of the number of times it has been seen before.

e. Real domains with looping behavior include eating junk food and going to class.

3.9

a. Here is one possible representation: A state is a six-tuple of integers listing the number of missionaries, cannibals, and boats on the first side, and then the second side of the river. The goal is a state with 3 missionaries and 3 cannibals on the second side. The cost function is one per action, and the successors of a state are all the states that move 1 or 2 people and 1 boat from one side to another.

b. The search space is small, so any optimal algorithm works. For an example, see the file "search/domains/cannibals.lisp". It suffices to eliminate moves that circle back to the state just visited. From all but the first and last states, there is only one other choice.

c. It is not obvious that almost all moves are either illegal or revert to the previous state. There is a feeling of a large branching factor, and no clear way to proceed.
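Because the search space is so small, a plain breadth-first search is enough to solve the puzzle. The sketch below is an illustration only and is not the repository's "search/domains/cannibals.lisp". As an assumption for brevity, it compresses the six-tuple of part (a) to a triple (missionaries, cannibals, boats) for the first side, since the second side is determined by the totals; all function names are hypothetical.

```python
from collections import deque


def solve_cannibals(total=3, boat_capacity=2):
    """Breadth-first search; returns the list of states from start to goal."""
    start, goal = (total, total, 1), (0, 0, 0)

    def safe(m, c):
        # Missionaries are never outnumbered on either bank.
        first_ok = m == 0 or m >= c
        second_ok = (total - m) == 0 or (total - m) >= (total - c)
        return first_ok and second_ok

    def successors(state):
        m, c, boat = state
        sign = -1 if boat == 1 else 1  # people leave the bank the boat is on
        for dm in range(boat_capacity + 1):
            for dc in range(boat_capacity + 1 - dm):
                if dm + dc == 0:
                    continue  # the boat cannot cross empty
                nm, nc = m + sign * dm, c + sign * dc
                if 0 <= nm <= total and 0 <= nc <= total and safe(nm, nc):
                    yield (nm, nc, 1 - boat)

    parent = {start: None}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:  # repeated-state check
                parent[nxt] = state
                frontier.append(nxt)
    return None
```

The number of crossings in the returned solution is `len(solve_cannibals()) - 1`.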

3.10 A state is a situation that an agent can find itself in. We distinguish two types of states: world states (the actual concrete situations in the real world) and representational states (the abstract descriptions of the real world that are used by the agent in deliberating about what to do).

A state space is a graph whose nodes are the set of all states, and whose links are actions that transform one state into another.

A search tree is a tree (a graph with no undirected loops) in which the root node is the start state and the set of children for each node consists of the states reachable by taking any action.

A search node is a node in the search tree.

A goal is a state that the agent is trying to reach.

An action is something that the agent can choose to do.

A successor function describes the agent’s options: given a state, it returns a set of (action, state) pairs, where each state is the state reachable by taking the action.

The branching factor in a search tree is the number of actions available to the agent.

3.11 A world state is how reality is or could be. In one world state we’re in Arad, in another we’re in Bucharest. The world state also includes which street we’re on, what’s currently on the radio, and the price of tea in China. A state description is an agent’s internal description of a world state. Examples are In(Arad) and In(Bucharest). These descriptions are necessarily approximate, recording only some aspect of the state.

We need to distinguish between world states and state descriptions because state descriptions are lossy abstractions of the world state, because the agent could be mistaken about how the world is, because the agent might want to imagine things that aren’t true but it could make true, and because the agent cares about the world not its internal representation of it.

Search nodes are generated during search, representing a state the search process knows how to reach. They contain additional information aside from the state description, such as the sequence of actions used to reach this state. This distinction is useful because we may generate different search nodes which have the same state, and because search nodes contain more information than a state representation.

3.12

The state space is a tree of depth one, with all states successors of the initial state. There is no distinction between depth-first search and breadth-first search on such a tree. If the sequence length is unbounded the root node will have infinitely many successors, so only algorithms which test for goal nodes as we generate successors can work.

What happens next depends on how the composite actions are sorted. If there is no particular ordering, then a random but systematic search of potential solutions occurs. If they are sorted by dictionary order, then this implements depth-first search. If they are sorted by length first, then dictionary ordering, this implements breadth-first search.

A significant disadvantage of collapsing the search space like this is if we discover that a plan starting with the action “unplug your battery” can’t be a solution, there is no easy way to ignore all other composite actions that start with this action. This is a problem in particular for informed search algorithms.

Discarding sequence structure is not a particularly practical approach to search.

3.13

The graph separation property states that “every path from the initial state to an unexplored state has to pass through a state in the frontier.”

At the start of the search, the frontier holds the initial state; hence, trivially, every path from the initial state to an unexplored state includes a node in the frontier (the initial state itself).

Now, we assume that the property holds at the beginning of an arbitrary iteration of the GRAPH-SEARCH algorithm in Figure 3.7. We assume that the iteration completes, i.e., the frontier is not empty and the selected leaf node n is not a goal state. At the end of the iteration, n has been removed from the frontier and its successors (if not already explored or in the frontier) placed in the frontier. Consider any path from the initial state to an unexplored state; by the induction hypothesis such a path (at the beginning of the iteration) includes at least one frontier node; except when n is the only such node, the separation property automatically holds. Hence, we focus on paths passing through n (and no other frontier node). By definition, the next node n′ along the path from n must be a successor of n that (by the preceding sentence) is not already in the frontier. Furthermore, n′ cannot be in the explored set, since by assumption there is a path from n′ to an unexplored node not passing through the frontier, which would violate the separation property as every explored node is connected to the initial state by explored nodes (see lemma below for proof this is always possible). Hence, n′ is not in the explored set, hence it will be added to the frontier; then the path will include a frontier node and the separation property is restored.

The property is violated by algorithms that move nodes from the frontier into the explored set before all of their successors have been generated, as well as by those that fail to add some of the successors to the frontier. Note that it is not necessary to generate all successors of a node at once before expanding another node, as long as partially expanded nodes remain in the frontier.

Lemma: Every explored node is connected to the initial state by a path of explored nodes.

Proof: This is true initially, since the initial state is connected to itself. Since we never remove nodes from the explored region, we only need to check new nodes we add to the explored list on an expansion. Let n be such a new explored node. This is previously on the frontier, so it is a neighbor of a node n′ previously explored (i.e., its parent). n′ is, by hypothesis, connected to the initial state by a path of explored nodes. This path with n appended is a path of explored nodes connecting n to the initial state.

3.14

a. False: a lucky DFS might expand exactly d nodes to reach the goal. A∗ largely dominates any graph-search algorithm that is guaranteed to find optimal solutions.

b. True: h(n) = 0 is always an admissible heuristic, since costs are nonnegative.

c. True: A∗ search is often used in robotics; the space can be discretized or skeletonized.

d. True: depth of the solution matters for breadth-first search, not cost.

e. False: a rook can move across the board in one move, although the Manhattan distance from start to finish is 8.

3.15

a. See Figure S3.1.

Figure S3.1 The state space for the problem defined in Ex. 3.15 (nodes 1 through 15).

b. Breadth-first: 1 2 3 4 5 6 7 8 9 10 11

Depth-limited: 1 2 4 8 9 5 10 11

Iterative deepening: 1; 1 2 3; 1 2 4 5 3 6 7; 1 2 4 8 9 5 10 11

c. Bidirectional search is very useful, because the only successor of n in the reverse direction is ⌊n/2⌋. This helps focus the search. The branching factor is 2 in the forward direction; 1 in the reverse direction.

d. Yes; start at the goal, and apply the single reverse successor action until you reach 1.

e. The solution can be read off the binary numeral for the goal number. Write the goal number in binary. Since we can only reach positive integers, this binary expansion begins with a 1. From most- to least-significant bit, skipping the initial 1, go Left to the node 2n if this bit is 0 and go Right to node 2n + 1 if it is 1. For example, suppose the goal is 11, which is 1011 in binary. The solution is therefore Left, Right, Right.
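The rule in part (e) translates directly into a few lines of Python. This is a minimal sketch; the function names `path_to` and `follow` are hypothetical, and `follow` is included only to check a path by replaying it from node 1.

```python
def path_to(goal):
    """Read the Left/Right solution off the binary numeral of the goal."""
    if goal < 1:
        raise ValueError("only positive integers are reachable")
    bits = bin(goal)[3:]  # strip the '0b' prefix and the leading 1
    return ["Left" if bit == "0" else "Right" for bit in bits]


def follow(path, start=1):
    """Replay a path: Left goes to 2n, Right goes to 2n + 1."""
    n = start
    for step in path:
        n = 2 * n if step == "Left" else 2 * n + 1
    return n
```

For the goal 11, `path_to(11)` gives Left, Right, Right, matching the worked example, and `follow(path_to(g))` returns `g` for any positive integer `g`.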

3.16

a. Initial state: one arbitrarily selected piece (say a straight piece).

Successor function: for any open peg, add any piece type from remaining types. (You can add to open holes as well, but that isn’t necessary as all complete tracks can be made by adding to pegs.) For a curved piece, add in either orientation; for a fork, add in either orientation and (if there are two holes) connecting at either hole. It’s a good idea to disallow any overlapping configuration, as this terminates hopeless configurations early. (Note: there is no need to consider open holes, because in any solution these will be filled by pieces added to open pegs.)

Goal test: all pieces used in a single connected track, no open pegs or holes, no overlapping tracks.

Step cost: one per piece (actually, doesn’t really matter).

b. All solutions are at the same depth, so depth-first search would be appropriate. (One could also use depth-limited search with limit n − 1, but strictly speaking it’s not necessary to do the work of checking the limit because states at depth n − 1 have no successors.) The space is very large, so uniform-cost and breadth-first would fail, and iterative deepening simply does unnecessary extra work. There are many repeated states, so it might be good to use a closed list.

c. A solution has no open pegs or holes, so every peg is in a hole, so there must be equal numbers of pegs and holes. Removing a fork violates this property. There are two other “proofs” that are acceptable: 1) a similar argument to the effect that there must be an even number of “ends”; 2) each fork creates two tracks, and only a fork can rejoin those tracks into one, so if a fork is missing it won’t work. The argument using pegs and holes is actually more general, because it also applies to the case of a three-way fork that has one hole and three pegs or one peg and three holes. The “ends” argument fails here, as does the fork/rejoin argument (which is a bit handwavy anyway).

d. The maximum possible number of open pegs is 3 (starts at 1, adding a two-peg fork increases it by one). Pretending each piece is unique, any piece can be added to a peg, giving at most 12 + (2 · 16) + (2 · 2) + (2 · 2 · 2) = 56 choices per peg. The total depth is 32 (there are 32 pieces), so an upper bound is $168^{32}/(12! \cdot 16! \cdot 2! \cdot 2!)$ where the factorials deal with permutations of identical pieces. One could do a more refined analysis to handle the fact that the branching factor shrinks as we go down the tree, but it is not pretty.

3.17

a. The algorithm expands nodes in order of increasing path cost; therefore the first goal it encounters will be the goal with the cheapest cost.

b. It will be the same as iterative deepening, d iterations, in which $O(b^d)$ nodes are generated.

c. d/ε

d. Implementation not shown.

3.18 Consider a domain in which every state has a single successor, and there is a single goal at depth n. Then depth-first search will find the goal in n steps, whereas iterative deepening search will take $1 + 2 + 3 + \cdots + n = O(n^2)$ steps.
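The two step counts can be reproduced by simulating the chain directly. The sketch below is an illustration under simple assumptions: a "step" is one successor generation, the chain is represented only by its depth counter, and the function names are hypothetical.

```python
def depth_limited_steps(goal_depth, limit):
    """Walk down a single-successor chain; return (steps taken, goal found)."""
    steps = 0
    depth = 0
    while depth < limit:
        depth += 1   # generate the only successor
        steps += 1
        if depth == goal_depth:
            return steps, True
    return steps, False


def dfs_steps(goal_depth):
    """Depth-first search never backtracks on a chain."""
    return depth_limited_steps(goal_depth, goal_depth)[0]


def ids_steps(goal_depth):
    """Iterative deepening restarts from the root for every depth limit."""
    total = 0
    for limit in range(1, goal_depth + 1):
        steps, found = depth_limited_steps(goal_depth, limit)
        total += steps
        if found:
            break
    return total
```

For a goal at depth 10, depth-first search takes 10 steps while iterative deepening takes 1 + 2 + ... + 10 = 55.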

3.19 As an ordinary person (or agent) browsing the web, we can only generate the successors of a page by visiting it. We can then do breadth-first search, or perhaps best-first search where the heuristic is some function of the number of words in common between the start and goal pages; this may help keep the links on target. Search engines keep the complete graph of the web, and may provide the user access to all (or at least some) of the pages that link to a page; this would allow us to do bidirectional search.

3.20 Code not shown, but a good start is in the code repository. Clearly, graph search must be used: this is a classic grid world with many alternate paths to each state. Students will quickly find that computing the optimal solution sequence is prohibitively expensive for moderately large worlds, because the state space for an n × n world has $n^2 \cdot 2^n$ states. The completion time of the random agent grows less than exponentially in n, so for any reasonable exchange rate between search cost and path cost the random agent will eventually win.

3.21

a. When all step costs are equal, g(n) ∝ depth(n), so uniform-cost search reproduces breadth-first search.

b. Breadth-first search is best-first search with f(n) = depth(n); depth-first search is best-first search with f(n) = −depth(n); uniform-cost search is best-first search with f(n) = g(n).

c. Uniform-cost search is A∗ search with h(n) = 0.

3.22 The student should find that on the 8-puzzle, RBFS expands more nodes (because it does not detect repeated states) but has lower cost per node because it does not need to maintain a queue. The number of RBFS node re-expansions is not too high because the presence of many tied values means that the best path changes seldom. When the heuristic is slightly perturbed, this advantage disappears and RBFS’s performance is much worse.

For TSP, the state space is a tree, so repeated states are not an issue. On the other hand, the heuristic is real-valued and there are essentially no tied values, so RBFS incurs a heavy penalty for frequent re-expansions.

3.23 The sequence of queues is as follows:

L[0+244=244]

M[70+241=311],T[111+329=440]

L[140+244=384],D[145+242=387],T[111+329=440]

D[145+242=387],T[111+329=440],M[210+241=451],T[251+329=580]

C[265+160=425],T[111+329=440],M[210+241=451],M[220+241=461],T[251+329=580]

T[111+329=440],M[210+241=451],M[220+241=461],P[403+100=503],T[251+329=580],R[411+193=604], D[385+242=627]

M[210+241=451],M[220+241=461],L[222+244=466],P[403+100=503],T[251+329=580],A[229+366=595], R[411+193=604],D[385+242=627]

M[220+241=461],L[222+244=466],P[403+100=503],L[280+244=524],D[285+242=527],T[251+329=580], A[229+366=595],R[411+193=604],D[385+242=627]

L[222+244=466],P[403+100=503],L[280+244=524],D[285+242=527],L[290+244=534],D[295+242=537], T[251+329=580],A[229+366=595],R[411+193=604],D[385+242=627]

P[403+100=503],L[280+244=524],D[285+242=527],M[292+241=533],L[290+244=534],D[295+242=537], T[251+329=580],A[229+366=595],R[411+193=604],D[385+242=627],T[333+329=662]

B[504+0=504],L[280+244=524],D[285+242=527],M[292+241=533],L[290+244=534],D[295+242=537],T[251+329=580], A[229+366=595],R[411+193=604],D[385+242=627],T[333+329=662],R[500+193=693],C[541+160=701]
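The queue sequence can be regenerated with a small A∗ tree search. The sketch below is an illustration, not the repository code. Its map is an assumption in one specific sense: it contains only the road segments and heuristic values that can be read off the trace itself (for example, L to M costs 70 because M first appears with g = 70), so it is a fragment of the full map, keyed by the same city initials the trace uses. The function name is hypothetical.

```python
import heapq

# Road costs recovered from the g-values in the trace (fragment of the map).
EDGES = {
    ("L", "M"): 70, ("L", "T"): 111, ("M", "D"): 75, ("D", "C"): 120,
    ("C", "P"): 138, ("C", "R"): 146, ("P", "R"): 97, ("P", "B"): 101,
    ("T", "A"): 118,
}
# Heuristic values recovered from the trace.
H = {"L": 244, "M": 241, "T": 329, "D": 242, "C": 160,
     "P": 100, "R": 193, "A": 366, "B": 0}


def astar_tree(start, goal):
    """A* tree search (no repeated-state check), as in the trace."""
    neighbors = {}
    for (a, b), cost in EDGES.items():
        neighbors.setdefault(a, []).append((b, cost))
        neighbors.setdefault(b, []).append((a, cost))
    frontier = [(H[start], 0, start, [start])]  # entries are (f, g, city, path)
    expanded = []
    while frontier:
        f, g, city, path = heapq.heappop(frontier)
        expanded.append(city)
        if city == goal:
            return g, path, expanded
        for nxt, cost in neighbors[city]:
            g2 = g + cost
            heapq.heappush(frontier, (g2 + H[nxt], g2, nxt, path + [nxt]))
    return None
```

`astar_tree("L", "B")` returns the path cost, the path, and the order in which nodes were taken off the queue, which can be compared against the head of each queue line above.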

3.24 See Figure S3.2.

Figure S3.2 A graph with an inconsistent heuristic on which GRAPH-SEARCH fails to return the optimal solution. The successors of S are A with f = 5 and B with f = 7. A is expanded first, so the path via B will be discarded because A will already be in the closed list.
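The failure described in the caption can be reproduced with a short A∗ graph search that never reopens closed nodes. The graph below is an assumption: it is one assignment of edge costs and heuristic values to the nodes S, A, B, G that is consistent with the caption (A with f = 5, B with f = 7), and it should not be read as a transcription of the figure. The function name is hypothetical.

```python
import heapq

# Assumed graph, chosen to match the caption: f(A) = 4 + 1, f(B) = 2 + 5.
EDGES = {"S": [("A", 4), ("B", 2)], "B": [("A", 1)], "A": [("G", 4)]}
H_INCONSISTENT = {"S": 7, "A": 1, "B": 5, "G": 0}  # h(B) > c(B, A) + h(A)
H_ZERO = {"S": 0, "A": 0, "B": 0, "G": 0}          # trivially consistent


def astar_closed_list(edges, h, start, goal):
    """A* graph search that discards any node already on the closed list."""
    frontier = [(h[start], 0, start, [start])]  # entries are (f, g, node, path)
    closed = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node in closed:
            continue
        if node == goal:
            return g, path
        closed.add(node)
        for nxt, cost in edges.get(node, []):
            if nxt not in closed:
                g2 = g + cost
                heapq.heappush(frontier, (g2 + h[nxt], g2, nxt, path + [nxt]))
    return None
```

With the inconsistent heuristic the search returns the path S, A, G of cost 8, although S, B, A, G costs only 7; with the zero heuristic the same procedure finds the cheaper path.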

3.25 It is complete whenever $0 \le w < 2$. $w = 0$ gives $f(n) = 2g(n)$. This behaves exactly like uniform-cost search: the factor of two makes no difference in the ordering of the nodes. $w = 1$ gives A∗ search. $w = 2$ gives $f(n) = 2h(n)$, i.e., greedy best-first search. We also have

$$f(n) = (2 - w)\left[g(n) + \frac{w}{2 - w}\,h(n)\right]$$

which behaves exactly like A∗ search with a heuristic $\frac{w}{2-w}h(n)$. For $w \le 1$, this is never greater than $h(n)$ and hence admissible, provided $h(n)$ is itself admissible.

3.26

a. The branching factor is 4 (number of neighbors of each location).

b. The states at depth k form a square rotated at 45 degrees to the grid. Obviously there are a linear number of states along the boundary of the square, so the answer is 4k.

[Figure S3.2 labels: nodes S, A, G, B; heuristic values h = 7, h = 5, h = 1, h = 0; edge costs 2, 1, 4, 4.]

Chapter3.SolvingProblemsbySearching

c. Without repeated state checking, BFS expands exponentially many nodes: counting precisely, we get $((4^{x+y+1} - 1)/3) - 1$.

d. There are quadratically many states within the square for depth x + y, so the answer is $2(x + y)(x + y + 1) - 1$.

e. True; this is the Manhattan distance metric.

f. False; all nodes in the rectangle defined by (0, 0) and (x, y) are candidates for the optimal path, and there are quadratically many of them, all of which may be expanded in the worst case.

g. True; removing links may induce detours, which require more steps, so h is an underestimate.

h. False; nonlocal links can reduce the actual path length below the Manhattan distance.
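The counts behind parts (b) through (d) can be checked by brute force. The sketch below verifies only the building blocks: 4k states at depth exactly k, 2k(k + 1) + 1 distinct states within depth k, and (4^(k+1) − 1)/3 tree nodes through depth k when repeated states are not detected. It does not attempt to reproduce the final "− 1" adjustments in the answers, and the function names are hypothetical.

```python
def states_at_depth(k):
    """Grid points at Manhattan distance exactly k from the origin."""
    return sum(
        1
        for x in range(-k, k + 1)
        for y in range(-k, k + 1)
        if abs(x) + abs(y) == k
    )


def states_within_depth(k):
    """Distinct grid points at Manhattan distance at most k."""
    return sum(states_at_depth(i) for i in range(k + 1))


def tree_nodes_through_depth(k, branching=4):
    """Search-tree nodes through depth k with no repeated-state checking."""
    return sum(branching ** i for i in range(k + 1))
```

For k = 3 these give 12 boundary states, 25 distinct states, and 85 tree nodes, which shows how quickly the tree count outruns the number of distinct states.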

3.27

a. $n^{2n}$. There are n vehicles in $n^2$ locations, so roughly (ignoring the one-per-square constraint) $(n^2)^n = n^{2n}$ states.

b. $5^n$.

c. Manhattan distance, i.e., $|(n - i + 1) - x_i| + |n - y_i|$. This is exact for a lone vehicle.

d. Only (iii) $\min\{h_1, \ldots, h_n\}$. The explanation is nontrivial as it requires two observations. First, let the work W in a given solution be the total distance moved by all vehicles over their joint trajectories; that is, for each vehicle, add the lengths of all the steps taken. We have $W \ge \sum_i h_i \ge n \cdot \min\{h_1, \ldots, h_n\}$. Second, the total work we can get done per step is $\le n$. (Note that for every car that jumps 2, another car has to stay put (move 0), so the total work per step is bounded by n.) Hence, completing all the work requires at least $n \cdot \min\{h_1, \ldots, h_n\}/n = \min\{h_1, \ldots, h_n\}$ steps.

3.28 The heuristic $h = h_1 + h_2$ (adding misplaced tiles and Manhattan distance) sometimes overestimates. Now, suppose $h(n) \le h^*(n) + c$ (as given) and let $G_2$ be a goal that is suboptimal by more than c, i.e., $g(G_2) > C^* + c$. Now consider any node n on a path to an optimal goal. We have

$$\begin{aligned}
f(n) &= g(n) + h(n) \\
&\le g(n) + h^*(n) + c \\
&\le C^* + c \\
&< g(G_2)
\end{aligned}$$

so $G_2$ will never be expanded before an optimal goal is expanded.

3.29 A heuristic is consistent iff, for every node n and every successor n′ of n generated by any action a,

$$h(n) \le c(n, a, n') + h(n')$$

One simple proof is by induction on the number k of nodes on the shortest path to any goal from n. For k = 1, let n′ be the goal node; then $h(n) \le c(n, a, n')$. For the inductive case, assume n′ is on the shortest path k steps from the goal and that h(n′) is admissible by hypothesis; then

$$h(n) \le c(n, a, n') + h(n') \le c(n, a, n') + h^*(n') = h^*(n)$$

so h(n) at k + 1 steps from the goal is also admissible.

3.30 This exercise reiterates a small portion of the classic work of Held and Karp (1970).

a. The TSP problem is to find a minimal (total length) path through the cities that forms a closed loop. MST is a relaxed version of that because it asks for a minimal (total length) graph that need not be a closed loop; it can be any fully-connected graph. As a heuristic, MST is admissible: it is always shorter than or equal to a closed loop.

b. The straight-line distance back to the start city is a rather weak heuristic; it vastly underestimates when there are many cities. In the later stage of a search when there are only a few cities left it is not so bad. To say that MST dominates straight-line distance is to say that MST always gives a higher value. This is obviously true because a MST that includes the goal node and the current node must either be the straight line between them, or it must include two or more lines that add up to more. (This all assumes the triangle inequality.)

c. See "search/domains/tsp.lisp" for a start at this. The file includes a heuristic based on connecting each unvisited city to its nearest neighbor, a close relative to the MST approach.

d. See (Cormen et al., 1990, p. 505) for an algorithm that runs in O(E log E) time, where E is the number of edges. The code repository currently contains a somewhat less efficient algorithm.

3.31 The misplaced-tiles heuristic is exact for the problem where a tile can move from square A to square B. As this is a relaxation of the condition that a tile can move from square A to square B if B is blank, Gaschnig’s heuristic cannot be less than the misplaced-tiles heuristic. As it is also admissible (being exact for a relaxation of the original problem), Gaschnig’s heuristic is therefore more accurate.

If we permute two adjacent tiles in the goal state, we have a state where misplaced-tiles and Manhattan both return 2, but Gaschnig’s heuristic returns 3.

To compute Gaschnig’s heuristic, repeat the following until the goal state is reached: let B be the current location of the blank; if B is occupied by tile X (not the blank) in the goal state, move X to B; otherwise, move any misplaced tile to B. Students could be asked to prove that this is the optimal solution to the relaxed problem.
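The procedure just described can be sketched directly. This is a minimal illustration, assuming a state is a tuple of tile numbers in board order with 0 standing for the blank; the names gaschnig and misplaced are hypothetical.

```python
def gaschnig(state, goal):
    """Number of moves to solve the relaxed puzzle where any tile may jump to the blank."""
    state = list(state)
    goal = list(goal)
    moves = 0
    while state != goal:
        b = state.index(0)  # B: current location of the blank
        if goal[b] != 0:
            # B holds tile X in the goal state: move X into B.
            src = state.index(goal[b])
        else:
            # The blank is already home: move any misplaced tile into B.
            src = next(i for i, t in enumerate(state) if t != 0 and t != goal[i])
        state[b], state[src] = state[src], state[b]
        moves += 1
    return moves


def misplaced(state, goal):
    """Number of non-blank tiles that are not on their goal square."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)
```

For the adjacent-swap case mentioned above, with goal (0, 1, 2, ..., 8) and tiles 1 and 2 exchanged, misplaced counts 2 tiles while gaschnig needs 3 moves, because the blank must first leave its home square.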

3.32 Students should provide results in the form of graphs and/or tables showing both runtime and number of nodes generated. (Different heuristics have different computation costs.) Runtimes may be very small for 8-puzzles, so you may want to assign the 15-puzzle or 24-puzzle instead. The use of pattern databases is also worth exploring experimentally.

Solutions for Chapter 4
Beyond Classical Search

4.1

a. Local beam search with k = 1 is hill-climbing search.

b. Local beam search with one initial state and no limit on the number of states retained resembles breadth-first search in that it adds one complete layer of nodes before adding the next layer. Starting from one state, the algorithm would be essentially identical to breadth-first search except that each layer is generated all at once.

c. Simulated annealing with T = 0 at all times: ignoring the fact that the termination step would be triggered immediately, the search would be identical to first-choice hill climbing because every downward successor would be rejected with probability 1. (Exercise may be modified in future printings.)

d. Simulated annealing with T = ∞ at all times is a random-walk search: it always accepts a new state.

e. Genetic algorithm with population size N = 1: if the population size is 1, then the two selected parents will be the same individual; crossover yields an exact copy of the individual; then there is a small chance of mutation. Thus, the algorithm executes a random walk in the space of individuals.

4.2 Despite its humble origins, this question raises many of the same issues as the scientifically important problem of protein design. There is a discrete assembly space in which pieces are chosen to be added to the track and a continuous configuration space determined by the “joint angles” at every place where two pieces are linked. Thus we can define a state as a set of oriented, linked pieces and the associated joint angles in the range [−10, 10], plus a set of unlinked pieces. The linkage and joint angles exactly determine the physical layout of the track; we can allow for (and penalize) layouts in which tracks lie on top of one another, or we can disallow them. The evaluation function would include terms for how many pieces are used, how many loose ends there are, and (if allowed) the degree of overlap. We might include a penalty for the amount of deviation from 0-degree joint angles. (We could also include terms for “interestingness” and “traversability”; for example, it is nice to be able to drive a train starting from any track segment to any other, ending up in either direction without having to lift up the train.) The tricky part is the set of allowed moves. Obviously we can unlink any piece or link an unlinked piece to an open peg with either orientation at any allowed angle (possibly excluding moves that create overlap). More problematic are moves to join a peg and hole on already-linked pieces and moves to change the angle of a joint. Changing one angle may force changes in others, and the changes will vary depending on whether the other pieces are at their joint-angle limit. In general there will be no unique “minimal” solution for a given angle change in terms of the consequent changes to other angles, and some changes may be impossible.

4.3 Here is one simple hill-climbing algorithm:

• Connect all the cities into an arbitrary path.

• Pick two points along the path at random.

• Split the path at those points, producing three pieces.

• Try all six possible ways to connect the three pieces.

• Keep the best one, and reconnect the path accordingly.

• Iterate the steps above until no improvement is observed for a while.
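The steps above can be sketched as follows. This is an illustration under stated assumptions rather than the manual's own code: cities are 2-D points, "six possible ways" is read as the six orderings of the three pieces (no piece is reversed), the objective is the open-path length, and "for a while" is a hypothetical patience counter.

```python
import math
import random
from itertools import permutations


def path_length(path):
    """Length of an open path through the points, in the given order."""
    return sum(math.dist(path[i], path[i + 1]) for i in range(len(path) - 1))


def hill_climb_tsp(cities, patience=200, seed=0):
    """Hill climbing by cutting the path into three pieces and reordering them."""
    rng = random.Random(seed)
    path = list(cities)  # arbitrary initial path
    stale = 0            # consecutive iterations without improvement
    while stale < patience and len(path) >= 3:
        # Two distinct cut points, giving three non-empty pieces.
        i, j = sorted(rng.sample(range(1, len(path)), 2))
        pieces = [path[:i], path[i:j], path[j:]]
        # Try all six orderings of the pieces and keep the shortest.
        best = min(
            ([city for piece in order for city in piece]
             for order in permutations(pieces)),
            key=path_length,
        )
        if path_length(best) < path_length(path) - 1e-12:
            path, stale = best, 0
        else:
            stale += 1
    return path
```

Because a candidate is accepted only when it is strictly shorter, the returned path is never longer than the initial one; a variant that also tries each piece reversed would search a larger neighborhood at higher cost per step.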

4.4 Code not shown.

4.5 See Figure S4.1 for the adapted algorithm. For states that OR-SEARCH finds a solution for, it records the solution found. If it later visits that state again, it immediately returns that solution.

When OR-SEARCH fails to find a solution it has to be careful. Whether a state can be solved depends on the path taken to that state, as we do not allow cycles. So on failure OR-SEARCH records the value of path. If a state is visited which has previously failed when path contained any subset of its present value, OR-SEARCH returns failure.

To avoid repeating sub-solutions we can label all new solutions found, record these labels, then return the label if these states are visited again. Post-processing can prune off unused labels. Alternatively, we can output a directed acyclic graph structure rather than a tree. See (Bertoli et al., 2001) for further details.

4.6 The question statement describes the required changes in detail; see Figure S4.2 for the modified algorithm. When OR-SEARCH cycles back to a state on path it returns a token loop, which means to loop back to the most recent time this state was reached along the path to it. Since path is implicitly stored in the returned plan, there is sufficient information for later processing, or a modified implementation, to replace these with labels.

The plan representation is implicitly augmented to keep track of whether the plan is cyclic (i.e., contains a loop) so that OR-SEARCH can prefer acyclic solutions.

AND-SEARCH returns failure if all branches lead directly to a loop, as in this case the plan will always loop forever. This is the only case it needs to check, as if all branches in a finite plan loop there must be some And-node whose children all immediately loop.

function AND-OR-GRAPH-SEARCH(problem) returns a conditional plan, or failure
    OR-SEARCH(problem.INITIAL-STATE, problem, [ ])

function OR-SEARCH(state, problem, path) returns a conditional plan, or failure
    if problem.GOAL-TEST(state) then return the empty plan
    if state has previously been solved then return RECALL-SUCCESS(state)
    if state has previously failed for a subset of path then return failure
    if state is on path then
        RECORD-FAILURE(state, path)
        return failure
    for each action in problem.ACTIONS(state) do
        plan ← AND-SEARCH(RESULTS(state, action), problem, [state | path])
        if plan ≠ failure then
            RECORD-SUCCESS(state, [action | plan])
            return [action | plan]
    return failure

function AND-SEARCH(states, problem, path) returns a conditional plan, or failure
    for each s_i in states do
        plan_i ← OR-SEARCH(s_i, problem, path)
        if plan_i = failure then return failure
    return [if s_1 then plan_1 else if s_2 then plan_2 else ... if s_{n−1} then plan_{n−1} else plan_n]

Figure S4.1 AND-OR search with repeated state checking.

4.7 A sequence of actions is a solution to a belief state problem if it takes every initial physical state to a goal state. We can relax this problem by requiring it take only some initial physical state to a goal state. To make this well defined, we’ll require that it finds a solution for the physical state with the most costly solution. If h∗(s) is the optimal cost of solution starting from the physical state s, then

h(S) = max_{s∈S} h∗(s)

is the heuristic estimate given by this relaxed problem. This heuristic assumes any solution to the most difficult state the agent thinks possible will solve all states.

On the sensorless vacuum cleaner problem in Figure 4.14, h correctly determines the optimal cost for all states except the central three states (those reached by [suck], [suck, left] and [suck, right]) and the root, for which h estimates to be 1 unit cheaper than they really are. This means A∗ will expand these three central nodes, before marching towards the solution.

function AND-OR-GRAPH-SEARCH(problem) returns a conditional plan, or failure
    OR-SEARCH(problem.INITIAL-STATE, problem, [ ])

function OR-SEARCH(state, problem, path) returns a conditional plan, or failure
    if problem.GOAL-TEST(state) then return the empty plan
    if state is on path then return loop
    cyclic-plan ← None
    for each action in problem.ACTIONS(state) do
        plan ← AND-SEARCH(RESULTS(state, action), problem, [state | path])
        if plan ≠ failure then
            if plan is acyclic then return [action | plan]
            cyclic-plan ← [action | plan]
    if cyclic-plan ≠ None then return cyclic-plan
    return failure

function AND-SEARCH(states, problem, path) returns a conditional plan, or failure
    loopy ← True
    for each s_i in states do
        plan_i ← OR-SEARCH(s_i, problem, path)
        if plan_i = failure then return failure
        if plan_i ≠ loop then loopy ← False
    if not loopy then
        return [if s_1 then plan_1 else if s_2 then plan_2 else ... if s_{n−1} then plan_{n−1} else plan_n]
    return failure

Figure S4.2 AND-OR search with repeated state checking.

4.8

a. An action sequence is a solution for belief state b if performing it starting in any state s ∈ b reaches a goal state. Since any state in a subset of b is in b, the result is immediate. Any action sequence which is not a solution for belief state b is also not a solution for any superset; this is the contrapositive of what we’ve just proved. One cannot, in general, say anything about arbitrary supersets, as the action sequence need not lead to a goal on the states outside of b. One can say, for example, that if an action sequence solves a belief state b and a belief state b′ then it solves the union belief state b ∪ b′.

b. On expansion of a node, do not add to the frontier any child belief state which is a superset of a previously explored belief state.

c. If you keep a record of previously solved belief states, add a check to the start of OR-SEARCH to check whether the belief state passed in is a subset of a previously solved belief state, returning the previous solution in case it is.

4.9 Consider a very simple example: an initial belief state {S1, S2}, actions a and b both leading to goal state G from either initial state, and

c(S1, a, G) = 3;  c(S2, a, G) = 5;
c(S1, b, G) = 2;  c(S2, b, G) = 6.

In this case, the solution [a] costs 3 or 5, the solution [b] costs 2 or 6. Neither is “optimal” in any obvious sense.

In some cases, there will be an optimal solution. Let us consider just the deterministic case. For this case, we can think of the cost of a plan as a mapping from each initial physical state to the actual cost of executing the plan. In the example above, the cost for [a] is {S1:3, S2:5} and the cost for [b] is {S1:2, S2:6}. We can say that plan p1 weakly dominates p2 if, for each initial state, the cost for p1 is no higher than the cost for p2. (Moreover, p1 dominates p2 if it weakly dominates it and has a lower cost for some state.) If a plan p weakly dominates all others, it is optimal. Notice that this definition reduces to ordinary optimality in the observable case where every belief state is a singleton. As the preceding example shows, however, a problem may have no optimal solution in this sense. A perhaps acceptable version of A∗ would be one that returns any solution that is not dominated by another.
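The dominance relations defined above are easy to state in code. The following is a minimal sketch, assuming each plan's cost is a dictionary from initial physical state to execution cost, as in the example; the function names are illustrative.

```python
def weakly_dominates(cost1, cost2):
    """True if cost1 is no higher than cost2 in every initial state."""
    return all(cost1[s] <= cost2[s] for s in cost1)


def dominates(cost1, cost2):
    """True if cost1 weakly dominates cost2 and is strictly lower somewhere."""
    return weakly_dominates(cost1, cost2) and any(
        cost1[s] < cost2[s] for s in cost1
    )


def undominated(plans):
    """Names of the plans that no other plan dominates."""
    return [
        p for p in plans
        if not any(dominates(plans[q], plans[p]) for q in plans if q != p)
    ]


# Cost mappings for the two plans in the example above.
plans = {
    "[a]": {"S1": 3, "S2": 5},
    "[b]": {"S1": 2, "S2": 6},
}
```

With these two plans neither weakly dominates the other, so undominated(plans) keeps both, which matches the observation that the example has no optimal solution in this sense.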

To understand whether it is possible to apply A∗ at all, it helps to understand its dependence on Bellman’s (1957) principle of optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision. It is important to understand that this is a restriction on performance measures designed to facilitate efficient algorithms, not a general definition of what it means to be optimal.

In particular, if we define the cost of a plan in belief-state space as the minimum cost of any physical realization, we violate Bellman’s principle. Modifying and extending the previous example, suppose that a and b reach S3 from S1 and S4 from S2, and then reach G from there:

c(S1, a, S3) = 6;  c(S2, a, S4) = 2;
c(S1, b, S3) = 6;  c(S2, b, S4) = 1.
c(S3, a, G) = 2;  c(S4, a, G) = 2;
c(S3, b, G) = 1;  c(S4, b, G) = 9.
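The cost tables above are enough to evaluate any plan under this minimum-cost measure. Here is a minimal sketch of that calculation; the names step, nxt, physical_cost and belief_cost are illustrative, and the step costs and transitions are copied from the example.

```python
from itertools import product

# Step costs c(state, action) from the example.
step = {
    ("S1", "a"): 6, ("S2", "a"): 2,
    ("S1", "b"): 6, ("S2", "b"): 1,
    ("S3", "a"): 2, ("S4", "a"): 2,
    ("S3", "b"): 1, ("S4", "b"): 9,
}
# Deterministic transitions: both actions lead to the same next state.
nxt = {"S1": "S3", "S2": "S4", "S3": "G", "S4": "G"}


def physical_cost(state, plan):
    """Cost of executing the action sequence `plan` from one physical state."""
    total = 0
    for action in plan:
        total += step[(state, action)]
        state = nxt[state]
    return total


def belief_cost(belief, plan):
    """Minimum cost of the plan over the physical states in the belief state."""
    return min(physical_cost(s, plan) for s in belief)


# Costs of the four two-step plans from the initial belief state {S1, S2}.
costs = {plan: belief_cost(["S1", "S2"], plan) for plan in product("ab", repeat=2)}
```

Taking min(costs, key=costs.get) selects the plan ('b', 'a'), while belief_cost(["S3", "S4"], ("b",)) is lower than the same call with ("a",), so the best second step in isolation differs from the second step of the best overall plan.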

In the belief state {S3, S4}, the minimum cost of [a] is min{2, 2} = 2 and the minimum cost of [b] is min{1, 9} = 1, so the optimal plan is [b]. In the initial belief state {S1, S2}, the four possible plans have the following costs:

[a, a]: min{8, 4} = 4;  [a, b]: min{7, 11} = 7;  [b, a]: min{8, 3} = 3;  [b, b]: min{7, 10} = 7

Hence, the optimal plan in {S1, S2} is [b, a], which does not choose b in {S3, S4} even though that is the optimal plan at that point. This counterintuitive behavior is a direct consequence of choosing the minimum of the possible path costs as the performance measure.

This example gives just a small taste of what might happen with nonadditive performance measures. Details of how to modify and analyze A∗ for general path-dependent cost functions are given by Dechter and Pearl (1985). Many aspects of A∗ carry over; for example, we can still derive lower bounds on the cost of a path through a given node. For a belief state b, the minimum value of g(s) + h(s) for each state s in b is a lower bound on the minimum cost of a plan that goes through b.

4.10 The belief state space is shown in Figure S4.3. No solution is possible because no path leads to a belief state all of whose elements satisfy the goal. If the problem is fully observable, the agent reaches a goal state by executing a sequence such that Suck is performed only in a dirty square. This ensures deterministic behavior and every state is obviously solvable.

4.11 The student needs to make several design choices in answering this question. First, how will the vertices of objects be represented? The problem states the percept is a list of vertex positions, but that is not precise enough. Here is one good choice: The agent has an orientation (a heading in degrees). The visible vertexes are listed in clockwise order, starting straight ahead of the agent. Each vertex has a relative angle (0 to 360 degrees) and a distance. We also want to know if a vertex represents the left edge of an obstacle, the right edge, or an interior point. We can use the symbols L, R, or I to indicate this.

The student will need to do some basic computational geometry calculations: intersection of a path and a set of line segments to see if the agent will bump into an obstacle, and visibility calculations to determine the percept. There are efficient algorithms for doing this on a set of line segments, but don’t worry about efficiency; an exhaustive algorithm is ok. If this seems too much, the instructor can provide an environment simulator and ask the student only to program the agent.

To answer (c), the student will need some exchange rate for trading off search time with movement time. It is probably too complex to make the simulation asynchronous real-time; it is easier to impose a penalty in points for computation.

For (d), the agent will need to maintain a set of possible positions. Each time the agent moves, it may be able to eliminate some of the possibilities. The agent may consider moves that serve to reduce uncertainty rather than just get to the goal.

4.12 This question is slightly ambiguous as to what the percept is: either the percept is just the location, or it gives exactly the set of unblocked directions (i.e., blocked directions are illegal actions). We will assume the latter. (Exercise may be modified in future printings.) There are 12 possible locations for internal walls, so there are 2^12 = 4096 possible environment configurations. A belief state designates a subset of these as possible configurations; for example, before seeing any percepts all 4096 configurations are possible, and this is a single belief state.

a. Online search is equivalent to offline search in belief-state space where each action in a belief state can have multiple successor belief states: one for each percept the agent could observe after the action. A successor belief state is constructed by taking the previous belief state, itself a set of states, replacing each state in this belief state by the successor state under the action, and removing all successor states which are inconsistent with the percept. This is exactly the construction in Section 4.4.2. AND-OR search can be used to solve this search problem. The initial belief state has 2^10 = 1024 states in it, as we know whether two edges have walls or not (the upper and right edges have no walls) but nothing more. There are 2^(2^12) possible belief states, one for each set of environment configurations.

Figure S4.3 The belief state space for the sensorless vacuum world under Murphy’s law.

We can view this as a contingency problem in belief state space. After each action and percept, the agent learns whether or not an internal wall exists between the current square and each neighboring square. Hence, each reachable belief state can be represented exactly by a list of status values (present, absent, unknown) for each wall separately. That is, the belief state is completely decomposable and there are exactly 3^12 reachable belief states. The maximum number of possible wall-percepts in each state is 16 (2^4), so each belief state has four actions, each with up to 16 nondeterministic successors.

b. Assuming the external walls are known, there are two internal walls and hence 2^2 = 4 possible percepts.

c. The initial null action leads to four possible belief states, as shown in Figure S4.4. From each belief state, the agent chooses a single action which can lead to up to 8 belief states (on entering the middle square). Given the possibility of having to retrace its steps at a dead end, the agent can explore the entire maze in no more than 18 steps, so the complete plan (expressed as a tree) has no more than 8^18 nodes. On the other hand, there are just 3^12 reachable belief states, so the plan could be expressed more concisely as a table of actions indexed by belief state (a policy in the terminology of Chapter 17).

4.13 Hill climbing is surprisingly effective at finding reasonable if not optimal paths for very little computational cost, and seldom fails in two dimensions.

Figure S4.4 The 3 × 3 maze exploration problem: the initial state, first percept, and one selected action with its perceptual outcomes.

a. It is possible (see Figure S4.5(a)) but very unlikely: the obstacle has to have an unusual shape and be positioned correctly with respect to the goal.

b. With nonconvex obstacles, getting stuck is much more likely to be a problem (see Figure S4.5(b)).

c. Notice that this is just depth-limited search, where you choose a step along the best path even if it is not a solution.

d. Set k to the maximum number of sides of any polygon and you can always escape.

e. LRTA* always makes a move, but may move back if the old state looks better than the new state. But then the old state is penalized for the cost of the trip, so eventually the local minimum fills up and the agent escapes.

4.14 Since we can observe successor states, we always know how to backtrack to a previous state. This means we can adapt iterative deepening search to solve this problem. The only difference is that backtracking must be explicit, following the action which the agent can see leads to the previous state.

Figure S4.5 (a) Getting stuck with a convex obstacle. (b) Getting stuck with a nonconvex obstacle.

The algorithm expands the following nodes:
Depth 1: (0, 0), (1, 0), (0, 0), (−1, 0), (0, 0)
Depth 2: (0, 1), (0, 0), (0, 1), (0, 0), (1, 0), (2, 0), (1, 0), (0, 0), (1, 0), (1, 1), (1, 0), (1, 1)

Solutions for Chapter 5
Adversarial Search

5.1 The translation uses the model of the opponent OM(s) to fill in the opponent’s actions, leaving our actions to be determined by the search algorithm. Let P(s) be the state predicted to occur after the opponent has made all their moves according to OM. Note that the opponent may take multiple moves in a row before we get a move, so we need to define this recursively. We have P(s) = s if PLAYER(s) is us or TERMINAL-TEST(s) is true, otherwise P(s) = P(RESULT(s, OM(s))).

The search problem is then given by:

a. Initial state: P(S0) where S0 is the initial game state. We apply P as the opponent may play first.

b. Actions: defined as in the game by ACTIONS(s).

c. Successor function: RESULT′(s, a) = P(RESULT(s, a)).

d. Goal test: goals are terminal states.

e. Step cost: the cost of an action is zero unless the resulting state s′ is terminal, in which case its cost is M − UTILITY(s′) where M = max_s UTILITY(s). Notice that all costs are non-negative.

Notice that the state space of the search problem consists of game states where we are to play and terminal states. States where the opponent is to play have been compiled out. One might alternatively leave those states in but just have a single possible action.

Any of the search algorithms of Chapter 3 can be applied. For example, depth-first search can be used to solve this problem, if all games eventually end. This is equivalent to using the minimax algorithm on the original game if OM(s) always returns the minimax move in s.
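The translation can be sketched in a few lines. This is an illustration under stated assumptions: the game is passed in as plain functions, the string "us" marks our turn, and the toy take-away game with its opponent model (always take one object) is hypothetical and exists only to exercise the construction.

```python
def compile_out_opponent(s0, player, terminal, result, utility, om, max_utility):
    """Turn a game plus an opponent model OM into a single-agent search problem."""

    def predict(s):
        # P(s): apply the modelled opponent's moves until it is our turn
        # or the game has ended.
        while not terminal(s) and player(s) != "us":
            s = result(s, om(s))
        return s

    def successor(s, a):
        # RESULT'(s, a) = P(RESULT(s, a))
        return predict(result(s, a))

    def step_cost(s, a):
        # Zero unless the resulting state is terminal; then M - UTILITY(s').
        s2 = successor(s, a)
        return max_utility - utility(s2) if terminal(s2) else 0

    return predict(s0), successor, step_cost


# Hypothetical toy game: a state is (pile, player to move); a move removes
# 1 or 2 objects; whoever removes the last object wins.
def player(s):
    return s[1]

def terminal(s):
    return s[0] == 0

def actions(s):
    return [k for k in (1, 2) if k <= s[0]]

def result(s, k):
    return (s[0] - k, "opp" if s[1] == "us" else "us")

def utility(s):
    # Terminal states only: 1 if we removed the last object, else 0.
    return 1 if s[1] == "opp" else 0

def om(s):
    return 1  # assumed opponent model: always remove one object


start, successor, step_cost = compile_out_opponent(
    (3, "us"), player, terminal, result, utility, om, max_utility=1
)

def best_cost(s):
    """Cheapest total cost from s, by exhaustive depth-first search."""
    if terminal(s):
        return 0
    return min(step_cost(s, a) + best_cost(successor(s, a)) for a in actions(s))
```

A total cost of zero corresponds to reaching a terminal state of maximum utility, so best_cost(start) tells us whether a win is available against this particular opponent model.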

5.2

a. Initial state: two arbitrary 8-puzzle states. Successor function: one move on an unsolved puzzle. (You could also have actions that change both puzzles at the same time; this is OK but technically you have to say what happens when one is solved but not the other.) Goal test: both puzzles in goal state. Path cost: 1 per move.

b. Each puzzle has 9!/2 reachable states (remember that half the states are unreachable). The joint state space has (9!)^2/4 states.

c. This is like backgammon; expectiminimax works.

d. Actually the statement in the question is not true (it applies to a previous version of part (c) in which the opponent is just trying to prevent you from winning; in that case, the coin tosses will eventually allow you to solve one puzzle without interruptions). For the game described in (c), consider a state in which the coin has come up heads, say, and you get to work on a puzzle that is 2 steps from the goal. Should you move one step closer? If you do, your opponent wins if he tosses heads; or if he tosses tails, you toss tails, and he tosses heads; or any sequence where both toss tails n times and then he tosses heads. So his probability of winning is at least 1/2 + 1/8 + 1/32 + ··· = 2/3. So it seems you’re better off moving away from the goal. (There’s no way to stay the same distance from the goal.) This problem unintentionally seems to have the same kind of solution as suicide tic-tac-toe with passing.

5.3

a. See Figure S5.1; the values are just (minus) the number of steps along the path from the root.

b. See Figure S5.1; note that there is both an upper bound and a lower bound for the left child of the root.

c. See figure.

d. The shortest-path length between the two players is a lower bound on the total capture time (here the players take turns, so no need to divide by two), so the “?” leaves have a capture time greater than or equal to the sum of the cost from the root and the shortest-path length. Notice that this bound is derived when the Evader plays very badly. The true value of a node comes from best play by both players, so we can get better bounds by assuming better play. For example, we can get a better bound from the cost when the Evader simply moves backwards and forwards rather than moving towards the Pursuer.

e. See figure (we have used the simple bounds). Notice that once the right child is known to have a value below –6, the remaining successors need not be considered.

f. The pursuer always wins if the tree is finite. To prove this, let the tree be rooted as the pursuer’s current node. (I.e., pick up the tree by that node and dangle all the other branches down.) The evader must either be at the root, in which case the pursuer has won, or in some subtree. The pursuer takes the branch leading to that subtree. This process repeats at most d times, where d is the maximum depth of the original subtree, until the pursuer either catches the evader or reaches a leaf node. Since the leaf has no subtrees, the evader must be at that node.

Figure S5.1 Pursuit-evasion solution tree.

5.4 The basic physical state of these games is fairly easy to describe. One important thing to remember for Scrabble and bridge is that the physical state is not accessible to all players and so cannot be provided directly to each player by the environment simulator. Particularly in bridge, each player needs to maintain some best guess (or multiple hypotheses) as to the actual state of the world. We expect to be putting some of the game implementations online as they become available.

5.5 Code not shown.

5.6 The most obvious change is that the space of actions is now continuous. For example, in pool, the cueing direction, angle of elevation, speed, and point of contact with the cue ball are all continuous quantities.

The simplest solution is just to discretize the action space and then apply standard methods. This might work for tennis (modelled crudely as alternating shots with speed and direction), but for games such as pool and croquet it is likely to fail miserably because small changes in direction have large effects on action outcome. Instead, one must analyze the game to identify a discrete set of meaningful local goals, such as “potting the 4-ball” in pool or “laying up for the next hoop” in croquet. Then, in the current context, a local optimization routine can work out the best way to achieve each local goal, resulting in a discrete set of possible choices. Typically, these games are stochastic, so the backgammon model is appropriate provided that we use sampled outcomes instead of summing over all outcomes.

Whereas pool and croquet are modelled correctly as turn-taking games, tennis is not. While one player is moving to the ball, the other player is moving to anticipate the opponent’s return. This makes tennis more like the simultaneous-action games studied in Chapter 17. In particular, it may be reasonable to derive randomized strategies so that the opponent cannot anticipate where the ball will go.

5.7 Consider a MIN node whose children are terminal nodes. If MIN plays suboptimally, then the value of the node is greater than or equal to the value it would have if MIN played optimally. Hence, the value of the MAX node that is the MIN node’s parent can only be increased. This argument can be extended by a simple induction all the way to the root. If the suboptimal play by MIN is predictable, then one can do better than a minimax strategy. For example, if MIN always falls for a certain kind of trap and loses, then setting the trap guarantees a win even if there is actually a devastating response for MIN. This is shown in Figure S5.2.

Figure S5.2 A simple game tree showing that setting a trap for MIN by playing a1 is a win if MIN falls for it, but may also be disastrous. The minimax move is of course a2, with value −5.

Figure S5.3 The game tree for the four-square game in Exercise 5.8. Terminal states are in single boxes, loop states in double boxes. Each state is annotated with its minimax value in a circle.

5.8

a. (5) The game tree, complete with annotations of all minimax values, is shown in Figure S5.3.

b. (5) The “?” values are handled by assuming that an agent with a choice between winning the game and entering a “?” state will always choose the win. That is, min(–1, ?) is –1 and max(+1, ?) is +1. If all successors are “?”, the backed-up value is “?”.

c. (5) Standard minimax is depth-first and would go into an infinite loop. It can be fixed by comparing the current state against the stack; and if the state is repeated, then return a “?” value. Propagation of “?” values is handled as above. Although it works in this case, it does not always work because it is not clear how to compare “?” with a drawn position; nor is it clear how to handle the comparison when there are wins of different degrees (as in backgammon). Finally, in games with chance nodes, it is unclear how to compute the average of a number and a “?”. Note that it is not correct to treat repeated states automatically as drawn positions; in this example, both (1,4) and (2,4) repeat in the tree but they are won positions.

What is really happening is that each state has a well-defined but initially unknown value. These unknown values are related by the minimax equation at the bottom of page 164. If the game tree is acyclic, then the minimax algorithm solves these equations by propagating from the leaves. If the game tree has cycles, then a dynamic programming method must be used, as explained in Chapter 17. (Exercise 17.7 studies this problem in particular.) These algorithms can determine whether each node has a well-determined value (as in this example) or is really an infinite loop in that both players prefer to stay in the loop (or have no choice). In such a case, the rules of the game will need to define the value (otherwise the game will never end). In chess, for example, a state that occurs 3 times (and hence is assumed to be desirable for both players) is a draw.

d. This question is a little tricky. One approach is a proof by induction on the size of the game. Clearly, the base case n = 3 is a loss for A and the base case n = 4 is a win for A. For any n > 4, the initial moves are the same: A and B both move one step towards each other. Now, we can see that they are engaged in a subgame of size n − 2 on the squares [2, ..., n − 1], except that there is an extra choice of moves on squares 2 and n − 1. Ignoring this for a moment, it is clear that if the “n − 2” game is won for A, then A gets to the square n − 1 before B gets to square 2 (by the definition of winning) and therefore gets to n before B gets to 1, hence the “n” game is won for A. By the same line of reasoning, if “n − 2” is won for B then “n” is won for B. Now, the presence of the extra moves complicates the issue, but not too much. First, the player who is slated to win the subgame [2, ..., n − 1] never moves back to his home square. If the player slated to lose the subgame does so, then it is easy to show that he is bound to lose the game itself: the other player simply moves forward and a subgame of size n − 2k is played one step closer to the loser’s home square.

5.9 For a, there are at most 9! games. (This is the number of move sequences that fill up the board, but many wins and losses end before the board is full.) For b–e, Figure S5.4 shows the game tree, with the evaluation function values below the terminal nodes and the backed-up values to the right of the non-terminal nodes. The values imply that the best starting move for X is to take the center. The terminal nodes with a bold outline are the ones that do not need to be evaluated, assuming the optimal ordering.

5.10

a. An upper bound on the number of terminal nodes is N!, one for each ordering of squares, so an upper bound on the total number of nodes is Σ_{i=1}^{N} i!. This is not much
