Instructor’s Manual: Exercise Solutions
for
Artificial Intelligence A Modern Approach
Third Edition (International Version)

Stuart J. Russell and Peter Norvig
with contributions from Ernest Davis, Nicholas J. Hay, and Mehran Sahami

Upper Saddle River Boston Columbus San Francisco New York Indianapolis London Toronto Sydney Singapore Tokyo Montreal Dubai Madrid Hong Kong Mexico City Munich Paris Amsterdam Cape Town

Editor-in-Chief: Michael Hirsch

Executive Editor: Tracy Dunkelberger

Assistant Editor: Melinda Haggerty

Editorial Assistant: Allison Michael

Vice President, Production: Vince O’Brien

Senior Managing Editor: Scott Disanno

Production Editor: Jane Bonnell

Interior Designers: Stuart Russell and Peter Norvig

Copyright © 2010, 2003, 1995 by Pearson Education, Inc., Upper Saddle River, New Jersey 07458. All rights reserved. Manufactured in the United States of America. This publication is protected by Copyright and permissions should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission(s) to use materials from this work, please submit a written request to Pearson Higher Education, Permissions Department, 1 Lake Street, Upper Saddle River, NJ 07458.

The author and publisher of this book have used their best efforts in preparing this book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The author and publisher make no warranty of any kind, expressed or implied, with regard to these programs or the documentation contained in this book. The author and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.

Library of Congress Cataloging-in-Publication Data on File

10 9 8 7 6 5 4 3 2 1

ISBN-13: 978-0-13-606738-2

ISBN-10: 0-13-606738-7

Preface

This Instructor’s Solution Manual provides solutions (or at least solution sketches) for almost all of the 400 exercises in Artificial Intelligence: A Modern Approach (Third Edition). We only give actual code for a few of the programming exercises; writing a lot of code would not be that helpful, if only because we don’t know what language you prefer.

In many cases, we give ideas for discussion and follow-up questions, and we try to explain why we designed each exercise.

There is more supplementary material that we want to offer to the instructor, but we have decided to do it through the medium of the World Wide Web rather than through a CD or printed Instructor’s Manual. The idea is that this solution manual contains the material that must be kept secret from students, but the Web site contains material that can be updated and added to in a more timely fashion. The address for the web site is:

http://aima.cs.berkeley.edu

and the address for the online Instructor’s Guide is:

http://aima.cs.berkeley.edu/instructors.html

There you will find:

• Instructions on how to join the aima-instructors discussion list. We strongly recommend that you join so that you can receive updates, corrections, notification of new versions of this Solutions Manual, additional exercises and exam questions, etc., in a timely manner.

• Source code for programs from the text. We offer code in Lisp, Python, and Java, and point to code developed by others in C++ and Prolog.

• Programming resources and supplemental texts.

• Figures from the text, for making your own slides.

• Terminology from the index of the book.

• Other courses using the book that have home pages on the Web. You can see example syllabi and assignments here. Please do not put solution sets for AIMA exercises on public web pages!

• AI Education information on teaching introductory AI courses.

• Other sites on the Web with information on AI. Organized by chapter in the book; check this for supplemental material.

We welcome suggestions for new exercises, new environments and agents, etc. The book belongs to you, the instructor, as much as us. We hope that you enjoy teaching from it, that these supplemental materials help, and that you will share your supplements and experiences with other instructors.


Solutions for Chapter 1
Introduction

1.1

a. Dictionary definitions of intelligence talk about “the capacity to acquire and apply knowledge” or “the faculty of thought and reason” or “the ability to comprehend and profit from experience.” These are all reasonable answers, but if we want something quantifiable we would use something like “the ability to apply knowledge in order to perform better in an environment.”

b. We define artificial intelligence as the study and construction of agent programs that perform well in a given environment, for a given agent architecture.

c. We define an agent as an entity that takes action in response to percepts from an environment.

d. We define rationality as the property of a system which does the “right thing” given what it knows. See Section 2.2 for a more complete discussion. Both describe perfect rationality, however; see Section 27.3.

e. We define logical reasoning as the process of deriving new sentences from old, such that the new sentences are necessarily true if the old ones are true. (Notice that this does not refer to any specific syntax or formal language, but it does require a well-defined notion of truth.)

1.2

The probability of fooling an interrogator depends on just how unskilled the interrogator is. One entrant in the 2002 Loebner prize competition (which is not quite a real Turing Test) did fool one judge, although if you look at the transcript, it is hard to imagine what that judge was thinking. There certainly have been examples of a chatbot or other online agent fooling humans. For example, see Lenny Foner’s account of the Julia chatbot at foner.www.media.mit.edu/people/foner/Julia/. We’d say the chance today is something like 10%, with the variation depending more on the skill of the interrogator than on the program. In 50 years, we expect that the entertainment industry (movies, video games, commercials) will have made sufficient investments in artificial actors to create very credible impersonators.

See the solution for exercise 26.1 for some discussion of potential objections.
1.3 Yes, they are rational, because slower, deliberative actions would tend to result in more damage to the hand. If “intelligent” means “applying knowledge” or “using thought and reasoning” then it does not require intelligence to make a reflex action.

1.4 No. IQ test scores correlate well with certain other measures, such as success in college, ability to make good decisions in complex, real-world situations, ability to learn new skills and subjects quickly, and so on, but only if they’re measuring fairly normal humans. The IQ test doesn’t measure everything. A program that is specialized only for IQ tests (and specialized further only for the analogy part) would very likely perform poorly on other measures of intelligence. Consider the following analogy: if a human runs the 100 m in 10 seconds, we might describe him or her as very athletic and expect competent performance in other areas such as walking, jumping, hurdling, and perhaps throwing balls; but we would not describe a Boeing 747 as very athletic because it can cover 100 m in 0.4 seconds, nor would we expect it to be good at hurdling and throwing balls.

Even for humans, IQ tests are controversial because of their theoretical presuppositions about innate ability (distinct from training effects) and the generalizability of results. See The Mismeasure of Man by Stephen Jay Gould, Norton, 1981 or Multiple intelligences: the theory in practice by Howard Gardner, Basic Books, 1993 for more on IQ tests, what they measure, and what other aspects there are to “intelligence.”

1.5 In order of magnitude figures, the computational power of the computer is 100 times larger.

1.6 Just as you are unaware of all the steps that go into making your heart beat, you are also unaware of most of what happens in your thoughts. You do have a conscious awareness of some of your thought processes, but the majority remains opaque to your consciousness. The field of psychoanalysis is based on the idea that one needs trained professional help to analyze one’s own thoughts.

1.7

• Although bar code scanning is in a sense computer vision, these are not AI systems. The problem of reading a bar code is an extremely limited and artificial form of visual interpretation, and it has been carefully designed to be as simple as possible, given the hardware.

• In many respects. The problem of determining the relevance of a web page to a query is a problem in natural language understanding, and the techniques are related to those we will discuss in Chapters 22 and 23. Search engines like Ask.com, which group the retrieved pages into categories, use clustering techniques analogous to those we discuss in Chapter 20. Likewise, other functionalities provided by search engines use intelligent techniques; for instance, the spelling corrector uses a form of data mining based on observing users’ corrections of their own spelling errors. On the other hand, the problem of indexing billions of web pages in a way that allows retrieval in seconds is a problem in database design, not in artificial intelligence.

• To a limited extent. Such menus tend to use vocabularies which are very limited (e.g., the digits, “Yes”, and “No”) and within the designers’ control, which greatly simplifies the problem. On the other hand, the programs must deal with an uncontrolled space of all kinds of voices and accents.

The voice-activated directory assistance programs used by telephone companies, which must deal with a large and changing vocabulary, are certainly AI programs.

• This is borderline. There is something to be said for viewing these as intelligent agents working in cyberspace. The task is sophisticated, the information available is partial, the techniques are heuristic (not guaranteed optimal), and the state of the world is dynamic. All of these are characteristic of intelligent activities. On the other hand, the task is very far from those normally carried out in human cognition.

1.8

Presumably the brain has evolved so as to carry out this operation on visual images, but the mechanism is only accessible for one particular purpose in this particular cognitive task of image processing. Until about two centuries ago there was no advantage in people (or animals) being able to compute the convolution of a Gaussian for any other purpose.

The really interesting question here is what we mean by saying that the “actual person” can do something. The person can see, but he cannot compute the convolution of a Gaussian; but computing that convolution is part of seeing. This is beyond the scope of this solution manual.

1.9

Evolution tends to perpetuate organisms (and combinations and mutations of organisms) that are successful enough to reproduce. That is, evolution favors organisms that can optimize their performance measure to at least survive to the age of sexual maturity, and then be able to win a mate. Rationality just means optimizing performance measure, so this is in line with evolution.

1.10 This question is intended to be about the essential nature of the AI problem and what is required to solve it, but could also be interpreted as a sociological question about the current practice of AI research.

A science is a field of study that leads to the acquisition of empirical knowledge by the scientific method, which involves falsifiable hypotheses about what is. A pure engineering field can be thought of as taking a fixed base of empirical knowledge and using it to solve problems of interest to society. Of course, engineers do bits of science (e.g., they measure the properties of building materials) and scientists do bits of engineering to create new devices and so on.

As described in Section 1.1, the “human” side of AI is clearly an empirical science (called cognitive science these days) because it involves psychological experiments designed to find out how human cognition actually works. What about the “rational” side?

If we view it as studying the abstract relationship among an arbitrary task environment, a computing device, and the program for that computing device that yields the best performance in the task environment, then the rational side of AI is really mathematics and engineering; it does not require any empirical knowledge about the actual world (and the actual task environment) that we inhabit; that a given program will do well in a given environment is a theorem. (The same is true of pure decision theory.) In practice, however, we are interested in task environments that do approximate the actual world, so even the rational side of AI involves finding out what the actual world is like. For example, in studying rational agents that communicate, we are interested in task environments that contain humans, so we have to find out what human language is like. In studying perception, we tend to focus on sensors such as cameras that extract useful information from the actual world. (In a world without light, cameras wouldn’t be much use.) Moreover, to design vision algorithms that are good at extracting information from camera images, we need to understand the actual world that generates those images. Obtaining the required understanding of scene characteristics, object types, surface markings, and so on is a quite different kind of science from ordinary physics, chemistry, biology, and so on, but it is still science.

In summary, AI is definitely engineering but it would not be especially useful to us if it were not also an empirical science concerned with those aspects of the real world that affect the design of intelligent systems for that world.

1.11 This depends on your definition of “intelligent” and “tell.” In one sense computers only do what the programmers command them to do, but in another sense what the programmer consciously tells the computer to do often has very little to do with what the computer actually does. Anyone who has written a program with an ornery bug knows this, as does anyone who has written a successful machine learning program. So in one sense Samuel “told” the computer “learn to play checkers better than I do, and then play that way,” but in another sense he told the computer “follow this learning algorithm” and it learned to play. So we’re left in the situation where you may or may not consider learning to play checkers to be a sign of intelligence (or you may think that learning to play in the right way requires intelligence, but not in this way), and you may think the intelligence resides in the programmer or in the computer.

1.12 The point of this exercise is to notice the parallel with the previous one. Whatever you decided about whether computers could be intelligent in 1.11, you are committed to making the same conclusion about animals (including humans), unless your reasons for deciding whether something is intelligent take into account the mechanism (programming via genes versus programming via a human programmer). Note that Searle makes this appeal to mechanism in his Chinese Room argument (see Chapter 26).

1.13 Again, the choice you make in 1.11 drives your answer to this question.

1.14

a. (ping-pong) A reasonable level of proficiency was achieved by Andersson’s robot (Andersson, 1988).

b. (driving in Cairo) No. Although there has been a lot of progress in automated driving, all such systems currently rely on certain relatively constant clues: that the road has shoulders and a center line, that the car ahead will travel a predictable course, that cars will keep to their side of the road, and so on. Some lane changes and turns can be made on clearly marked roads in light to moderate traffic. Driving in downtown Cairo is too unpredictable for any of these to work.

c. (driving in Victorville, California) Yes, to some extent, as demonstrated in DARPA’s Urban Challenge. Some of the vehicles managed to negotiate streets, intersections, well-behaved traffic, and well-behaved pedestrians in good visual conditions.

d. (shopping at the market) No. No robot can currently put together the tasks of moving in a crowded environment, using vision to identify a wide variety of objects, and grasping the objects (including squishable vegetables) without damaging them. The component pieces are nearly able to handle the individual tasks, but it would take a major integration effort to put it all together.

e. (shopping on the web) Yes. Software robots are capable of handling such tasks, particularly if the design of the web grocery shopping site does not change radically over time.

f. (bridge) Yes. Programs such as GIB now play at a solid level.

g. (theorem proving) Yes. For example, the proof of Robbins algebra described on page 360.

h. (funny story) No. While some computer-generated prose and poetry is hysterically funny, this is invariably unintentional, except in the case of programs that echo back prose that they have memorized.

i. (legal advice) Yes, in some cases. AI has a long history of research into applications of automated legal reasoning. Two outstanding examples are the Prolog-based expert systems used in the UK to guide members of the public in dealing with the intricacies of the social security and nationality laws. The social security system is said to have saved the UK government approximately $150 million in its first year of operation. However, extension into more complex areas such as contract law awaits a satisfactory encoding of the vast web of common-sense knowledge pertaining to commercial transactions and agreement and business practices.

j. (translation) Yes. In a limited way, this is already being done. See Kay, Gawron and Norvig (1994) and Wahlster (2000) for an overview of the field of speech translation, and some limitations on the current state of the art.

k. (surgery) Yes. Robots are increasingly being used for surgery, although always under the command of a doctor. Robotic skills demonstrated at superhuman levels include drilling holes in bone to insert artificial joints, suturing, and knot-tying. They are not yet capable of planning and carrying out a complex operation autonomously from start to finish.

1.15

• DARPA Grand Challenge for Robotic Cars. In 2004 the Grand Challenge was a 240 km race through the Mojave Desert. It clearly stressed the state of the art of autonomous driving, and in fact no competitor finished the race. The best team, CMU, completed only 12 of the 240 km. In 2005 the race featured a 212 km course with fewer curves and wider roads than the 2004 race. Five teams finished, with Stanford finishing first, edging out two CMU entries. This was hailed as a great achievement for robotics and for the Challenge format. In 2007 the Urban Challenge put cars in a city setting, where they had to obey traffic laws and avoid other cars. This time CMU edged out Stanford.

The progress made in these contests is a matter of fact, but the impact of that progress is a matter of opinion.

The competition appears to have been a good testing ground to put theory into practice, something that the failures of 2004 showed was needed. But it is important that the competition was done at just the right time, when there was theoretical work to consolidate, as demonstrated by the earlier work by Dickmanns (whose VaMP car drove autonomously for 158 km in 1995) and by Pomerleau (whose Navlab car drove 5000 km across the USA, also in 1995, with the steering controlled autonomously for 98% of the trip, although the brakes and accelerator were controlled by a human driver).

• International Planning Competition. In 1998, five planners competed: Blackbox, HSP, IPP, SGP, and STAN. The result page (ftp://ftp.cs.yale.edu/pub/mcdermott/aipscomp-results.html) stated “all of these planners performed very well, compared to the state of the art a few years ago.” Most plans found were 30 or 40 steps, with some over 100 steps. In 2008, the competition had expanded quite a bit: there were more tracks (satisficing vs. optimizing; sequential vs. temporal; static vs. learning). There were about 25 planners, including submissions from the 1998 groups (or their descendants) and new groups. Solutions found were much longer than in 1998. In sum, the field has progressed quite a bit in participation, in breadth, and in power of the planners. In the 1990s it was possible to publish a Planning paper that discussed only a theoretical approach; now it is necessary to show quantitative evidence of the efficacy of an approach. The field is stronger and more mature now, and it seems that the planning competition deserves some of the credit. However, some researchers feel that too much emphasis is placed on the particular classes of problems that appear in the competitions, and not enough on real-world applications.

• Robocup Robotics Soccer. This competition has proved extremely popular, attracting 407 teams from 43 countries in 2009 (up from 38 teams from 11 countries in 1997). The robotic platform has advanced to a more capable humanoid form, and the strategy and tactics have advanced as well. Although the competition has spurred innovations in distributed control, the winning teams in recent years have relied more on individual ball-handling skills than on advanced teamwork. The competition has served to increase interest and participation in robotics, although it is not clear how well they are advancing towards the goal of defeating a human team by 2050.

• TREC Information Retrieval Conference. This is one of the oldest competitions, started in 1992. The competitions have served to bring together a community of researchers, have led to a large literature of publications, and have seen progress in participation and in quality of results over the years. In the early years, TREC served its purpose as a place to do evaluations of retrieval algorithms on text collections that were large for the time. However, starting around 2000 TREC became less relevant as the advent of the World Wide Web created a corpus that was available to anyone and was much larger than anything TREC had created, and the development of commercial search engines surpassed academic research.

• NIST Open Machine Translation Evaluation. This series of evaluations (explicitly not labelled a “competition”) has existed since 2001. Since then we have seen great advances in Machine Translation quality as well as in the number of languages covered. The dominant approach has switched from one based on grammatical rules to one that relies primarily on statistics. The NIST evaluations seem to track these changes well, but don’t appear to be driving the changes.

Overall, we see that whatever you measure is bound to increase over time. For most of these competitions, the measurement was a useful one, and the state of the art has progressed. In the case of ICAPS, some planning researchers worry that too much attention has been lavished on the competition itself. In some cases, progress has left the competition behind, as in TREC, where the resources available to commercial search engines outpaced those available to academic researchers. In this case the TREC competition was useful (it helped train many of the people who ended up in commercial search engines) and in no way drew energy away from new ideas.


Solutions for Chapter 2
Intelligent Agents

2.1 This question tests the student’s understanding of environments, rational actions, and performance measures. Any sequential environment in which rewards may take time to arrive will work, because then we can arrange for the reward to be “over the horizon.” Suppose that in any state there are two action choices, a and b, and consider two cases: the agent is in state s at time T or at time T − 1. In state s, action a reaches state s′ with reward 0, while action b reaches state s again with reward 1; in s′ either action gains reward 10. At time T − 1, it’s rational to do a in s, with expected total reward 10 before time is up; but at time T, it’s rational to do b with total expected reward 1 because the reward of 10 cannot be obtained before time is up.
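The two-state example can be checked with a small finite-horizon recursion. The sketch below is illustrative only: the names TRANSITIONS and best are hypothetical, "time T" is read as one action remaining and "time T − 1" as two, and the assumption that actions taken in s′ leave the agent in s′ is ours, since the text does not say where they lead.

```python
from functools import lru_cache

# Hypothetical encoding of the example: (state, action) -> (next_state, reward).
# Assumption: both actions in s' keep the agent in s'; the text leaves this open.
TRANSITIONS = {
    ("s", "a"): ("s_prime", 0),
    ("s", "b"): ("s", 1),
    ("s_prime", "a"): ("s_prime", 10),
    ("s_prime", "b"): ("s_prime", 10),
}
ACTIONS = ("a", "b")


@lru_cache(maxsize=None)
def best(state, steps_left):
    """Return (total_reward, first_action) of an optimal plan with steps_left actions remaining."""
    if steps_left == 0:
        return (0, None)
    options = []
    for action in ACTIONS:
        next_state, reward = TRANSITIONS[(state, action)]
        future_reward = best(next_state, steps_left - 1)[0]
        options.append((reward + future_reward, action))
    # Tuples compare by total reward first, so max() picks a highest-reward option.
    return max(options)
```

Under these assumptions, best("s", 2) selects a with total reward 10, while best("s", 1) selects b with total reward 1, matching the two cases in the answer: the same state calls for different actions depending on how much time remains.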

Students may also provide common-sense examples from real life: investments whose payoff occurs after the end of life, exams where it doesn’t make sense to start the high-value question with too little time left to get the answer, and so on.

The environment state can include a clock, of course; this doesn’t change the gist of the answer (now the action will depend on the clock as well as on the non-clock part of the state) but it does mean that the agent can never be in the same state twice.

2.2 Notice that for our simple environmental assumptions we need not worry about quantitative uncertainty.

a. It suffices to show that for all possible actual environments (i.e., all dirt distributions and initial locations), this agent cleans the squares at least as fast as any other agent. This is trivially true when there is no dirt. When there is dirt in the initial location and none in the other location, the world is clean after one step; no agent can do better. When there is no dirt in the initial location but dirt in the other, the world is clean after two steps; no agent can do better. When there is dirt in both locations, the world is clean after three steps; no agent can do better. (Note: in general, the condition stated in the first sentence of this answer is much stricter than necessary for an agent to be rational.)

b. The agent in (a) keeps moving backwards and forwards even after the world is clean. It is better to do NoOp once the world is clean (the chapter says this). Now, since the agent’s percept doesn’t say whether the other square is clean, it would seem that the agent must have some memory to say whether the other square has already been cleaned. To make this argument rigorous is more difficult; for example, could the agent arrange things so that it would only be in a clean left square when the right square was already clean? As a general strategy, an agent can use the environment itself as a form of external memory, a common technique for humans who use things like appointment calendars and knots in handkerchiefs. In this particular case, however, that is not possible. Consider the reflex actions for [A, Clean] and [B, Clean]. If either of these is NoOp, then the agent will fail in the case where that is the initial percept but the other square is dirty; hence, neither can be NoOp and therefore the simple reflex agent is doomed to keep moving. In general, the problem with reflex agents is that they have to do the same thing in situations that look the same, even when the situations are actually quite different. In the vacuum world this is a big liability, because every interior square (except home) looks either like a square with dirt or a square without dirt.

c. If we consider asymptotically long lifetimes, then it is clear that learning a map (in some form) confers an advantage because it means that the agent can avoid bumping into walls. It can also learn where dirt is most likely to accumulate and can devise an optimal inspection strategy. The precise details of the exploration method needed to construct a complete map appear in Chapter 4; methods for deriving an optimal inspection/cleanup strategy are in Chapter 21.
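The argument in part (b) can be checked by brute force, since a simple reflex agent in the two-square world is just a table from four percepts to actions. The sketch below is illustrative only: the names step, cleans_everything and successful are hypothetical, and it assumes a deterministic world in which dirt never reappears. It enumerates all 256 tables and keeps those that clean both squares from every one of the eight initial configurations.

```python
from itertools import product

ACTIONS = ("Suck", "Left", "Right", "NoOp")
PERCEPTS = [(loc, status) for loc in "AB" for status in ("Dirty", "Clean")]


def step(loc, dirt, action):
    """Apply one action; dirt is a frozenset holding the names of the dirty squares."""
    if action == "Suck":
        dirt = dirt - {loc}
    elif action == "Left":
        loc = "A"
    elif action == "Right":
        loc = "B"
    return loc, dirt


def cleans_everything(table, max_steps=10):
    """True if the reflex table cleans both squares from all 8 initial configurations."""
    for start_loc in "AB":
        for start_dirt in (frozenset(), frozenset("A"), frozenset("B"), frozenset("AB")):
            loc, dirt = start_loc, start_dirt
            for _ in range(max_steps):
                status = "Dirty" if loc in dirt else "Clean"
                loc, dirt = step(loc, dirt, table[(loc, status)])
            if dirt:  # some square is still dirty after max_steps actions
                return False
    return True


tables = [dict(zip(PERCEPTS, choice)) for choice in product(ACTIONS, repeat=len(PERCEPTS))]
successful = [t for t in tables if cleans_everything(t)]
uses_noop_when_clean = [
    t for t in successful if "NoOp" in (t[("A", "Clean")], t[("B", "Clean")])
]
```

Under these assumptions exactly one table survives (suck when dirty, otherwise move to the other square), and it never uses NoOp, which is the sense in which the simple reflex agent is doomed to keep moving.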

2.3

a. An agent that senses only partial information about the state cannot be perfectly rational.

False. Perfect rationality refers to the ability to make good decisions given the sensor information received.

b. There exist task environments in which no pure reflex agent can behave rationally.

True. A pure reflex agent ignores previous percepts, so cannot obtain an optimal state estimate in a partially observable environment. For example, correspondence chess is played by sending moves; if the other player’s move is the current percept, a reflex agent could not keep track of the board state and would have to respond to, say, “a4” in the same way regardless of the position in which it was played.

c. There exists a task environment in which every agent is rational.

True. For example, in an environment with a single state, such that all actions have the same reward, it doesn’t matter which action is taken. More generally, any environment that is reward-invariant under permutation of the actions will satisfy this property.

d. The input to an agent program is the same as the input to the agent function.

False. The agent function, notionally speaking, takes as input the entire percept sequence up to that point, whereas the agent program takes the current percept only.

e. Every agent function is implementable by some program/machine combination.

False. For example, the environment may contain Turing machines and input tapes and the agent’s job is to solve the halting problem; there is an agent function that specifies the right answers, but no agent program can implement it. Another example would be an agent function that requires solving intractable problem instances of arbitrary size in constant time.


f. Suppose an agent selects its action uniformly at random from the set of possible actions. There exists a deterministic task environment in which this agent is rational.

True. This is a special case of (c); if it doesn’t matter which action you take, selecting randomly is rational.

g. It is possible for a given agent to be perfectly rational in two distinct task environments.

True. For example, we can arbitrarily modify the parts of the environment that are unreachable by any optimal policy as long as they stay unreachable.

h. Every agent is rational in an unobservable environment.

False. Some actions are stupid (and the agent may know this if it has a model of the environment) even if one cannot perceive the environment state.

i. A perfectly rational poker-playing agent never loses.

False. Unless it draws the perfect hand, the agent can always lose if an opponent has better cards. This can happen for game after game. The correct statement is that the agent’s expected winnings are nonnegative.

2.4 Many of these can actually be argued either way, depending on the level of detail and abstraction.

A. Partially observable, stochastic, sequential, dynamic, continuous, multi-agent.

B. Partially observable, stochastic, sequential, dynamic, continuous, single agent (unless there are alien life forms that are usefully modeled as agents).

C. Partially observable, deterministic, sequential, static, discrete, single agent. This can be multi-agent and dynamic if we buy books via auction, or dynamic if we purchase on a long enough scale that book offers change.

D. Fully observable, stochastic, episodic (every point is separate), dynamic, continuous, multi-agent.

E. Fully observable, stochastic, episodic, dynamic, continuous, single agent.

F. Fully observable, stochastic, sequential, static, continuous, single agent.

G. Fully observable, deterministic, sequential, static, continuous, single agent.

H. Fully observable, strategic, sequential, static, discrete, multi-agent.

2.5 The following are just some of the many possible definitions that can be written:

• Agent: an entity that perceives and acts; or, one that can be viewed as perceiving and acting. Essentially any object qualifies; the key point is the way the object implements an agent function. (Note: some authors restrict the term to programs that operate on behalf of a human, or to programs that can cause some or all of their code to run on other machines on a network, as in mobile agents.)

• Agent function: a function that specifies the agent’s action in response to every possible percept sequence.

• Agent program: that program which, combined with a machine architecture, implements an agent function. In our simple designs, the program takes a new percept on each invocation and returns an action.

• Rationality: a property of agents that choose actions that maximize their expected utility, given the percepts to date.

• Autonomy: a property of agents whose behavior is determined by their own experience rather than solely by their initial programming.

• Reflex agent: an agent whose action depends only on the current percept.

• Model-based agent: an agent whose action is derived directly from an internal model of the current world state that is updated over time.

• Goal-based agent: an agent that selects actions that it believes will achieve explicitly represented goals.

• Utility-based agent: an agent that selects actions that it believes will maximize the expected utility of the outcome state.

• Learning agent: an agent whose behavior improves over time based on its experience.
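The difference between an agent function and an agent program in the definitions above can be made concrete with a small sketch. Everything here is hypothetical illustration for the two-square vacuum world and not code from the book's repository: agent_function sees the whole percept sequence, while the program returned by make_agent_program receives one percept per call and stores the history itself. The stopping rule assumes that a square once seen clean stays clean.

```python
def agent_function(percept_sequence):
    """Hypothetical agent function: maps the entire percept history to an action."""
    location, status = percept_sequence[-1]
    if status == "Dirty":
        return "Suck"
    # Stopping requires the history: both squares must have been observed clean.
    seen_clean = {loc for loc, st in percept_sequence if st == "Clean"}
    if seen_clean == {"A", "B"}:
        return "NoOp"
    return "Right" if location == "A" else "Left"


def make_agent_program():
    """Agent program: takes only the current percept and keeps internal state."""
    history = []

    def program(percept):
        history.append(percept)
        return agent_function(history)

    return program


program = make_agent_program()
percepts = [("A", "Dirty"), ("A", "Clean"), ("B", "Dirty"), ("B", "Clean")]
actions = [program(p) for p in percepts]
```

Storing the full history is the simplest way to make a program agree with its agent function; a practical program would keep only a compact summary, here the set of squares already seen clean.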

2.6 Although these questions are very simple, they hint at some very fundamental issues. Our answers are for the simple agent designs for static environments where nothing happens while the agent is deliberating; the issues get even more interesting for dynamic environments.

a. Yes; take any agent program and insert null statements that do not affect the output.

b. Yes; the agent function might specify that the agent print true when the percept is a Turing machine program that halts, and false otherwise. (Note: in dynamic environments, for machines of less than infinite speed, the rational agent function may not be implementable; e.g., the agent function that always plays a winning move, if any, in a game of chess.)

c. Yes; the agent’s behavior is fixed by the architecture and program.

d. There are 2^n agent programs, although many of these will not run at all. (Note: Any given program can devote at most n bits to storage, so its internal state can distinguish among only 2^n past histories. Because the agent function specifies actions based on percept histories, there will be many agent functions that cannot be implemented because of lack of memory in the machine.)

e. It depends on the program and the environment. If the environment is dynamic, speeding up the machine may mean choosing different (perhaps better) actions and/or acting sooner. If the environment is static and the program pays no attention to the passage of elapsed time, the agent function is unchanged.

2.7

The design of goal- and utility-based agents depends on the structure of the task environment. The simplest such agents, for example those in chapters 3 and 10, compute the agent’s entire future sequence of actions in advance before acting at all. This strategy works for static and deterministic environments which are either fully-known or unobservable.

For fully-observable and fully-known static environments a policy can be computed in advance which gives the action to be taken in any given state.
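The plan-in-advance strategy described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the manual's own design: the environment is taken to be fully observable, deterministic and static, so the percept is the state itself, and the names bfs_plan, GoalBasedAgent, vacuum_successors and run are hypothetical. Breadth-first search stands in for whatever planner is available.

```python
from collections import deque


def bfs_plan(state, goal_test, successors):
    """Breadth-first search; returns a list of actions reaching a goal state, or None."""
    frontier = deque([(state, [])])
    seen = {state}
    while frontier:
        current, actions = frontier.popleft()
        if goal_test(current):
            return actions
        for action, nxt in successors(current):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, actions + [action]))
    return None


class GoalBasedAgent:
    """Plans when it has no plan left, then executes one stored action per call."""

    def __init__(self, goal_test, successors):
        self.goal_test = goal_test
        self.successors = successors
        self.plan = []

    def __call__(self, state):
        # Assumption: full observability, so the percept is the world state.
        if self.goal_test(state):
            return "NoOp"
        if not self.plan:
            self.plan = bfs_plan(state, self.goal_test, self.successors) or []
        return self.plan.pop(0) if self.plan else "NoOp"


def vacuum_successors(state):
    """Hypothetical model: two-square vacuum world, state = (location, frozenset of dirty squares)."""
    loc, dirt = state
    return [("Suck", (loc, dirt - {loc})), ("Left", ("A", dirt)), ("Right", ("B", dirt))]


def run(agent, state, steps):
    """Drive the agent for a fixed number of steps and record the actions it chooses."""
    trace = []
    for _ in range(steps):
        action = agent(state)
        trace.append(action)
        if action != "NoOp":
            state = dict(vacuum_successors(state))[action]
    return trace, state


agent = GoalBasedAgent(lambda s: not s[1], vacuum_successors)
trace, final_state = run(agent, ("A", frozenset("AB")), 5)
```

Starting in square A with both squares dirty, the agent plans once, executes Suck, Right, Suck, and then returns NoOp because the goal test holds. Replanning only when the stored plan runs out is one simple choice; monitoring the environment and repairing the plan would need an extra check on each call.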


For partially-observable environments the agent can compute a conditional plan, which specifies the sequence of actions to take as a function of the agent’s perception. In the extreme, a conditional plan gives the agent’s response to every contingency, and so it is a representation of the entire agent function.

In all cases it may be either intractable or too expensive to compute everything out in advance. Instead of a conditional plan, it may be better to compute a single sequence of actions which is likely to reach the goal, then monitor the environment to check whether the plan is succeeding, repairing or replanning if it is not. It may be even better to compute only the start of this plan before taking the first action, continuing to plan at later time steps.

Pseudocode for a simple goal-based agent is given in Figure S2.1. GOAL-ACHIEVED tests to see whether the current state satisfies the goal or not, doing nothing if it does. PLAN computes a sequence of actions to take to achieve the goal. This might return only a prefix of the full plan; the rest will be computed after the prefix is executed. This agent will act to maintain the goal: if at any point the goal is not satisfied it will (eventually) replan to achieve the goal again.

At this level of abstraction the utility-based agent is not much different than the goal-based agent, except that action may be continuously required (there is not necessarily a point where the utility function is “satisfied”). Pseudocode is given in Figure S2.2.

    function GOAL-BASED-AGENT(percept) returns an action
      persistent: state, the agent’s current conception of the world state
                  model, a description of how the next state depends on current state and action
                  goal, a description of the desired goal state
                  plan, a sequence of actions to take, initially empty
                  action, the most recent action, initially none

      state ← UPDATE-STATE(state, action, percept, model)
      if GOAL-ACHIEVED(state, goal) then return a null action
      if plan is empty then
          plan ← PLAN(state, goal, model)
      action ← FIRST(plan)
      plan ← REST(plan)
      return action

Figure S2.1 A goal-based agent.

    function UTILITY-BASED-AGENT(percept) returns an action
      persistent: state, the agent’s current conception of the world state
                  model, a description of how the next state depends on current state and action
                  utility-function, a description of the agent’s utility function
                  plan, a sequence of actions to take, initially empty
                  action, the most recent action, initially none

      state ← UPDATE-STATE(state, action, percept, model)
      if plan is empty then
          plan ← PLAN(state, utility-function, model)
      action ← FIRST(plan)
      plan ← REST(plan)
      return action

Figure S2.2 A utility-based agent.

2.8 The file "agents/environments/vacuum.lisp" in the code repository implements the vacuum-cleaner environment. Students can easily extend it to generate different shaped rooms, obstacles, and so on.

2.9 A reflex agent program implementing the rational agent function described in the chapter is as follows:

    (defun reflex-rational-vacuum-agent (percept)
      (destructuring-bind (location status) percept
        (cond ((eq status 'Dirty) 'Suck)
              ((eq location 'A) 'Right)
              (t 'Left))))

Forstates1,3,5,7inFigure4.9,theperformancemeasuresare1996,1999,1998,2000 respectively.

2.10

a.No;seeanswerto2.4(b).

b.Seeanswerto2.4(b).

c.Inthiscase,asimplereflexagentcanbeperfectlyrational.Theagentcanconsistof atablewitheightentries,indexedbypercept,thatspecifiesanactiontotakeforeach possiblestate.Aftertheagentacts,theworldisupdatedandthenextperceptwilltell theagentwhattodonext.Forlargerenvironments,constructingatableisinfeasible. Instead,theagentcouldrunoneoftheoptimalsearchalgorithmsinChapters3and4 andexecutethefirststepofthesolutionsequence.Again,no internalstateis required, butitwouldhelptobeabletostorethesolutionsequenceinsteadofrecomputingitfor eachnewpercept.

2.11

a.Becausetheagentdoesnotknowthegeographyandperceives onlylocationandlocal dirt,andcannotrememberwhatjusthappened,itwillgetstuckforeveragainstawall whenittriestomoveinadirectionthatisblocked—thatis,unlessitrandomizes.

b.Onepossibledesigncleansupdirtandotherwisemovesrandomly:

(defunrandomized-reflex-vacuum-agent(percept) (destructuring-bind(locationstatus)percept (cond((eqstatus’Dirty)’Suck)

(t(random-element’(LeftRightUpDown))))))

13
FigureS2.2 Autility-basedagent.

This is fairly close to what the Roomba™ vacuum cleaner does (although the Roomba has a bump sensor and randomizes only when it hits an obstacle). It works reasonably well in nice, compact environments. In maze-like environments or environments with small connecting passages, it can take a very long time to cover all the squares.

c. An example is shown in Figure S2.3. Students may also wish to measure clean-up time for linear or square environments of different sizes, and compare those to the efficient online search algorithms described in Chapter 4.

Figure S2.3 An environment in which random motion will take a long time to cover all the squares.

d. A reflex agent with state can build a map (see Chapter 4 for details). An online depth-first exploration will reach every state in time linear in the size of the environment; therefore, the agent can do much better than the simple reflex agent.

The question of rational behavior in unknown environments is a complex one but it is worth encouraging students to think about it. We need to have some notion of the prior probability distribution over the class of environments; call this the initial belief state. Any action yields a new percept that can be used to update this distribution, moving the agent to a new belief state. Once the environment is completely explored, the belief state collapses to a single possible environment. Therefore, the problem of optimal exploration can be viewed as a search for an optimal strategy in the space of possible belief states. This is a well-defined, if horrendously intractable, problem. Chapter 21 discusses some cases where optimal exploration is possible. Another concrete example of exploration is the Minesweeper computer game (see Exercise 7.22). For very small Minesweeper environments, optimal exploration is feasible although the belief state update is nontrivial to explain.
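The clean-up-time measurement suggested in 2.11(c) could be set up along the following lines. This is only a Python sketch under stated assumptions, not the repository's Lisp environment: the grid is an obstacle-free n × n square, every square starts dirty, the agent starts in a corner, and a move into a wall leaves the agent in place. The names `cleanup_time` and `MOVES` are illustrative.

```python
import random

# Hypothetical action set; the displacement of each move on an (x, y) grid.
MOVES = {"Left": (-1, 0), "Right": (1, 0), "Up": (0, -1), "Down": (0, 1)}


def randomized_reflex_agent(percept, rng):
    """Suck if the current square is dirty, otherwise move at random."""
    _location, status = percept
    if status == "Dirty":
        return "Suck"
    return rng.choice(list(MOVES))


def cleanup_time(n, seed=0, max_steps=1_000_000):
    """Count the steps the agent needs to clean an n x n obstacle-free room."""
    rng = random.Random(seed)
    dirty = {(x, y) for x in range(n) for y in range(n)}  # assumed: all dirty
    pos = (0, 0)                                          # assumed: corner start
    steps = 0
    while dirty and steps < max_steps:
        status = "Dirty" if pos in dirty else "Clean"
        action = randomized_reflex_agent((pos, status), rng)
        if action == "Suck":
            dirty.discard(pos)
        else:
            dx, dy = MOVES[action]
            nx, ny = pos[0] + dx, pos[1] + dy
            if 0 <= nx < n and 0 <= ny < n:  # bumping a wall: stay in place
                pos = (nx, ny)
        steps += 1
    return steps


if __name__ == "__main__":
    for size in (2, 4, 8):
        print(size, cleanup_time(size, seed=1))
```

Averaging over several seeds, and repeating the run on a maze-like layout, would give the comparison the exercise asks for.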

2.12 The problem appears at first to be very similar; the main difference is that instead of using the location percept to build the map, the agent has to “invent” its own locations (which, after all, are just nodes in a data structure representing the state space graph). When a bump is detected, the agent assumes it remains in the same location and can add a wall to its map. For grid environments, the agent can keep track of its (x, y) location and so can tell when it has returned to an old state. In the general case, however, there is no simple way to tell if a state is new or old.

2.13

a. For a reflex agent, this presents no additional challenge, because the agent will continue to Suck as long as the current location remains dirty. For an agent that constructs a sequential plan, every Suck action would need to be replaced by “Suck until clean.” If the dirt sensor can be wrong on each step, then the agent might want to wait for a few steps to get a more reliable measurement before deciding whether to Suck or move on to a new square. Obviously, there is a trade-off because waiting too long means that dirt remains on the floor (incurring a penalty), but acting immediately risks either dirtying a clean square or ignoring a dirty square (if the sensor is wrong). A rational agent must also continue touring and checking the squares in case it missed one on a previous tour (because of bad sensor readings). It is not immediately obvious how the waiting time at each square should change with each new tour. These issues can be clarified by experimentation, which may suggest a general trend that can be verified mathematically. This problem is a partially observable Markov decision process (see Chapter 17). Such problems are hard in general, but some special cases may yield to careful analysis.

b. In this case, the agent must keep touring the squares indefinitely. The probability that a square is dirty increases monotonically with the time since it was last cleaned, so the rational strategy is, roughly speaking, to repeatedly execute the shortest possible tour of all squares. (We say “roughly speaking” because there are complications caused by the fact that the shortest tour may visit some squares twice, depending on the geography.) This problem is also a partially observable Markov decision process.

Solutions for Chapter 3

Solving Problems by Searching

3.1 In goal formulation, we decide which aspects of the world we are interested in, and which can be ignored or abstracted away. Then in problem formulation we decide how to manipulate the important aspects (and ignore the others). If we did problem formulation first we would not know what to include and what to leave out. That said, it can happen that there is a cycle of iterations between goal formulation, problem formulation, and problem solving until one arrives at a sufficiently useful and efficient solution.

3.2

a. We’ll define the coordinate system so that the center of the maze is at (0, 0), and the maze itself is a square from (−1, −1) to (1, 1).

Initial state: robot at coordinate (0, 0), facing North.

Goal test: either |x| > 1 or |y| > 1 where (x, y) is the current location.

Successor function: move forwards any distance d; change the direction the robot is facing.

Cost function: total distance moved.

The state space is infinitely large, since the robot’s position is continuous.

b. The state will record the intersection the robot is currently at, along with the direction it’s facing. At the end of each corridor leaving the maze we will have an exit node. We’ll assume some node corresponds to the center of the maze.

Initial state: at the center of the maze facing North.

Goal test: at an exit node.

Successor function: move to the next intersection in front of us, if there is one; turn to face a new direction.

Cost function: total distance moved.

There are 4n states, where n is the number of intersections.

c. Initial state: at the center of the maze.

Goal test: at an exit node.

Successor function: move to next intersection to the North, South, East, or West.

Cost function: total distance moved.

We no longer need to keep track of the robot’s orientation since it is irrelevant to predicting the outcome of our actions, and not part of the goal test. The motor system that executes this plan will need to keep track of the robot’s current orientation, to know when to rotate the robot.

d. State abstractions:

(i) Ignoring the height of the robot off the ground, whether it is tilted off the vertical.

(ii) The robot can face in only four directions.

(iii) Other parts of the world ignored: possibility of other robots in the maze, the weather in the Caribbean.

Action abstractions:

(i) We assumed all positions were safely accessible: the robot couldn’t get stuck or damaged.

(ii) The robot can move as far as it wants, without having to recharge its batteries.

(iii) Simplified movement system: moving forwards a certain distance, rather than controlling each individual motor and watching the sensors to detect collisions.

3.3

a. State space: States are all possible city pairs (i, j). The map is not the state space.

Successor function: The successors of (i, j) are all pairs (x, y) such that Adjacent(x, i) and Adjacent(y, j).

Goal: Be at (i, i) for some i.

Step cost function: The cost to go from (i, j) to (x, y) is max(d(i, x), d(j, y)).

b. In the best case, the friends head straight for each other in steps of equal size, reducing their separation by twice the time cost on each step. Hence (iii) is admissible.

c. Yes: e.g., a map with two nodes connected by one link. The two friends will swap places forever. The same will happen on any chain if they start an odd number of steps apart. (One can see this best on the graph that represents the state space, which has two disjoint sets of nodes.) The same even holds for a grid of any size or shape, because every move changes the Manhattan distance between the two friends by 0 or 2.

d. Yes: take any of the unsolvable maps from part (c) and add a self-loop to any one of the nodes. If the friends start an odd number of steps apart, a move in which one of the friends takes the self-loop changes the distance by 1, rendering the problem solvable. If the self-loop is not taken, the argument from (c) applies and no solution is possible.
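The formulation in part (a) and the two-node counterexample in part (c) can be checked with a small Python sketch. The map encoding (`adjacent`, `dist`) and the function names are assumptions made for illustration; only the successor rule and the max step cost come from the answer above.

```python
def successors(state, adjacent, dist):
    """Successors of (i, j): both friends move to an adjacent city at once.

    The step cost is the larger of the two travel times, as in part (a).
    """
    i, j = state
    return [((x, y), max(dist[i, x], dist[j, y]))
            for x in adjacent[i]
            for y in adjacent[j]]


def reachable(start, adjacent, dist):
    """All joint states reachable from start (simple graph exploration)."""
    seen, stack = {start}, [start]
    while stack:
        for nxt, _cost in successors(stack.pop(), adjacent, dist):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Hypothetical two-node map from part (c): one link between A and B.
adjacent = {"A": ["B"], "B": ["A"]}
dist = {("A", "B"): 1.0, ("B", "A"): 1.0}

states = reachable(("A", "B"), adjacent, dist)
print(states)                                 # the friends only ever swap places
print(any(i == j for i, j in states))         # no goal state (i, i) is reachable
```

Adding `"A"` to `adjacent["A"]` with `dist["A", "A"]` set to some positive value models the self-loop of part (d), after which a goal state becomes reachable.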

3.4 From http://www.cut-the-knot.com/pythagoras/fifteen.shtml, this proof applies to the fifteen puzzle, but the same argument works for the eight puzzle:

Definition: The goal state has the numbers in a certain order, which we will measure as starting at the upper left corner, then proceeding left to right, and when we reach the end of a row, going down to the leftmost square in the row below. For any other configuration besides the goal, whenever a tile with a greater number on it precedes a tile with a smaller number, the two tiles are said to be inverted.

Proposition: For a given puzzle configuration, let N denote the sum of the total number of inversions and the row number of the empty square. Then (N mod 2) is invariant under any legal move. In other words, after a legal move an odd N remains odd whereas an even N remains even. Therefore the goal state in Figure 3.4, with no inversions and empty square in the first row, has N = 1, and can only be reached from starting states with odd N, not from starting states with even N.

Proof: First of all, sliding a tile horizontally changes neither the total number of inversions nor the row number of the empty square. Therefore let us consider sliding a tile vertically.

Let’s assume, for example, that the tile A is located directly over the empty square. Sliding it down changes the parity of the row number of the empty square. Now consider the total number of inversions. The move only affects relative positions of tiles A, B, C, and D, where B, C, and D are the three tiles that A passes in the reading order. If none of the B, C, D caused an inversion relative to A (i.e., all three are larger than A) then after sliding one gets three (an odd number) of additional inversions. If one of the three is smaller than A, then before the move B, C, and D contributed a single inversion (relative to A) whereas after the move they’ll be contributing two inversions, a change of 1, also an odd number. Two additional cases obviously lead to the same result. Thus the change in the sum N is always even. This is precisely what we have set out to show.

So before we solve a puzzle, we should compute the N value of the start and goal state and make sure they have the same parity, otherwise no solution is possible.
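The parity check described above might be coded as in the following Python sketch. It assumes boards are given as flat tuples in reading order with 0 for the empty square. The proof's vertical-move case is stated for the fifteen puzzle, where a vertical move passes three tiles; on a board of odd width such as the eight puzzle a vertical move passes an even number of tiles, so this sketch uses the inversion count alone there. The function names are illustrative.

```python
def inversions(tiles):
    """Count pairs in which a larger tile precedes a smaller one (blank skipped)."""
    seq = [t for t in tiles if t != 0]
    return sum(1
               for a in range(len(seq))
               for b in range(a + 1, len(seq))
               if seq[a] > seq[b])


def parity_invariant(tiles, width):
    """Quantity whose value mod 2 is preserved by every legal move."""
    inv = inversions(tiles)
    if width % 2 == 1:
        return inv % 2                       # odd width: inversions alone
    blank_row = tiles.index(0) // width + 1  # rows numbered from 1
    return (inv + blank_row) % 2             # even width: N from the proposition


def may_be_solvable(start, goal, width):
    """False means no solution exists; True means the parity test is passed."""
    return parity_invariant(start, width) == parity_invariant(goal, width)


goal15 = tuple(range(1, 16)) + (0,)
swapped = tuple(range(1, 14)) + (15, 14, 0)   # tiles 14 and 15 exchanged
print(may_be_solvable(swapped, goal15, 4))    # False: parities differ
```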

3.5 The formulation puts one queen per column, with a new queen placed only in a square that is not attacked by any other queen. To simplify matters, we’ll first consider the n-rooks problem. The first rook can be placed in any square in column 1 ($n$ choices), the second in any square in column 2 except the same row as the rook in column 1 ($n - 1$ choices), and so on. This gives $n!$ elements of the search space.

For n queens, notice that a queen attacks at most three squares in any given column, so in column 2 there are at least $(n - 3)$ choices, in column 3 at least $(n - 6)$ choices, and so on. Thus the state space size $S \ge n \cdot (n-3) \cdot (n-6) \cdots$. Hence we have

$$S^3 \ge n \cdot n \cdot n \cdot (n-3)(n-3)(n-3)(n-6)(n-6)(n-6) \cdots \ge n(n-1)(n-2)(n-3)(n-4)(n-5)(n-6)(n-7)(n-8) \cdots = n!$$

or $S \ge \sqrt[3]{n!}$.

3.6

a. Initial state: No regions colored.

Goal test: All regions colored, and no two adjacent regions have the same color.

Successor function: Assign a color to a region.

Cost function: Number of assignments.

b. Initial state: As described in the text.

Goal test: Monkey has bananas.

Successor function: Hop on crate; Hop off crate; Push crate from one spot to another; Walk from one spot to another; grab bananas (if standing on crate).

Cost function: Number of actions.

c. Initial state: considering all input records.

Goal test: considering a single record, and it gives “illegal input” message.

Successor function: run again on the first half of the records; run again on the second half of the records.

Cost function: Number of runs.

Note: This is a contingency problem; you need to see whether a run gives an error message or not to decide what to do next.

d. Initial state: jugs have values [0, 0, 0].

Successor function: given values [x, y, z], generate [12, y, z], [x, 8, z], [x, y, 3] (by filling); [0, y, z], [x, 0, z], [x, y, 0] (by emptying); or for any two jugs with current values x and y, pour y into x; this changes the jug with x to the minimum of x + y and the capacity of the jug, and decrements the jug with y by the amount gained by the first jug.

Cost function: Number of actions.

3.7

a. If we consider all (x, y) points, then there are an infinite number of states, and of paths.

b. (For this problem, we consider the start and goal points to be vertices.) The shortest distance between two points is a straight line, and if it is not possible to travel in a straight line because some obstacle is in the way, then the next shortest distance is a sequence of line segments, end-to-end, that deviate from the straight line by as little as possible. So the first segment of this sequence must go from the start point to a tangent point on an obstacle; any path that gave the obstacle a wider girth would be longer. Because the obstacles are polygonal, the tangent points must be at vertices of the obstacles, and hence the entire path must go from vertex to vertex. So now the state space is the set of vertices, of which there are 35 in Figure 3.31.

c. Code not shown.

d. Implementations and analysis not shown.
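Returning to the jug problem of 3.6(d), the successor function described there can be written out as a short Python sketch. The capacities 12, 8, and 3 come from the answer; the tuple encoding, the name `jug_successors`, and the choice to drop actions that leave the state unchanged are assumptions made here.

```python
CAPACITY = (12, 8, 3)  # jug sizes from 3.6(d)


def jug_successors(state):
    """States reachable in one action: fill a jug, empty a jug, or pour j into i."""
    result = set()
    for i, cap in enumerate(CAPACITY):
        filled = list(state)
        filled[i] = cap                      # fill jug i from the faucet
        result.add(tuple(filled))

        emptied = list(state)
        emptied[i] = 0                       # empty jug i onto the ground
        result.add(tuple(emptied))

        for j in range(len(CAPACITY)):
            if i == j:
                continue
            # Pour jug j into jug i until i is full or j is empty.
            amount = min(state[j], cap - state[i])
            poured = list(state)
            poured[i] += amount
            poured[j] -= amount
            result.add(tuple(poured))
    result.discard(tuple(state))             # assumption: ignore no-op actions
    return result


print(sorted(jug_successors((0, 0, 0))))
print((4, 8, 0) in jug_successors((12, 0, 0)))  # pouring the 12 into the 8
```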

3.8

a. Any path, no matter how bad it appears, might lead to an arbitrarily large reward (negative cost). Therefore, one would need to exhaust all possible paths to be sure of finding the best one.

b. Suppose the greatest possible reward is c. Then if we also know the maximum depth of the state space (e.g. when the state space is a tree), then any path with d levels remaining can be improved by at most cd, so any paths worse than cd less than the best path can be pruned. For state spaces with loops, this guarantee doesn’t help, because it is possible to go around a loop any number of times, picking up c reward each time.

c. The agent should plan to go around this loop forever (unless it can find another loop with even better reward).

d. The value of a scenic loop is lessened each time one revisits it; a novel scenic sight is a great reward, but seeing the same one for the tenth time in an hour is tedious, not rewarding. To accommodate this, we would have to expand the state space to include a memory: a state is now represented not just by the current location, but by a current location and a bag of already-visited locations. The reward for visiting a new location is now a (diminishing) function of the number of times it has been seen before.

e. Real domains with looping behavior include eating junk food and going to class.

3.9

a. Here is one possible representation: A state is a six-tuple of integers listing the number of missionaries, cannibals, and boats on the first side, and then the second side of the river. The goal is a state with 3 missionaries and 3 cannibals on the second side. The cost function is one per action, and the successors of a state are all the states that move 1 or 2 people and 1 boat from one side to another.

b. The search space is small, so any optimal algorithm works. For an example, see the file "search/domains/cannibals.lisp". It suffices to eliminate moves that circle back to the state just visited. From all but the first and last states, there is only one other choice.

c. It is not obvious that almost all moves are either illegal or revert to the previous state. There is a feeling of a large branching factor, and no clear way to proceed.
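To see how small the search space in 3.9 is, the problem can be solved by breadth-first search in a few lines. The Python sketch below is not the repository's cannibals.lisp. As a simplifying assumption it stores only the first-side counts (missionaries, cannibals, boats), since the second-side counts of the six-tuple follow by subtraction from 3, 3, and 1.

```python
from collections import deque

LOADS = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]  # (missionaries, cannibals) in boat


def mc_successors(state):
    """Legal successor states; state = (m, c, b) counted on the first bank."""
    m, c, b = state
    sign = -1 if b == 1 else 1               # boat leaves the bank it is on
    for dm, dc in LOADS:
        nm, nc = m + sign * dm, c + sign * dc
        if not (0 <= nm <= 3 and 0 <= nc <= 3):
            continue
        # Missionaries must not be outnumbered on either bank (if any are present).
        if (nm > 0 and nm < nc) or (3 - nm > 0 and 3 - nm < 3 - nc):
            continue
        yield (nm, nc, 1 - b)


def solve(start=(3, 3, 1), goal=(0, 0, 0)):
    """Breadth-first search with repeated-state checking; returns the state path."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in mc_successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


path = solve()
print(len(path) - 1, "crossings")
print(path)
```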

3.10 A state is a situation that an agent can find itself in. We distinguish two types of states: world states (the actual concrete situations in the real world) and representational states (the abstract descriptions of the real world that are used by the agent in deliberating about what to do).

A state space is a graph whose nodes are the set of all states, and whose links are actions that transform one state into another.

A search tree is a tree (a graph with no undirected loops) in which the root node is the start state and the set of children for each node consists of the states reachable by taking any action.

A search node is a node in the search tree.

A goal is a state that the agent is trying to reach.

An action is something that the agent can choose to do.

A successor function describes the agent’s options: given a state, it returns a set of (action, state) pairs, where each state is the state reachable by taking the action.

The branching factor in a search tree is the number of actions available to the agent.

3.11 A world state is how reality is or could be. In one world state we’re in Arad, in another we’re in Bucharest. The world state also includes which street we’re on, what’s currently on the radio, and the price of tea in China. A state description is an agent’s internal description of a world state. Examples are In(Arad) and In(Bucharest). These descriptions are necessarily approximate, recording only some aspect of the state.

We need to distinguish between world states and state descriptions because state descriptions are lossy abstractions of the world state, because the agent could be mistaken about how the world is, because the agent might want to imagine things that aren’t true but it could make true, and because the agent cares about the world not its internal representation of it.

Search nodes are generated during search, representing a state the search process knows how to reach. They contain additional information aside from the state description, such as the sequence of actions used to reach this state. This distinction is useful because we may generate different search nodes which have the same state, and because search nodes contain more information than a state representation.

3.12

The state space is a tree of depth one, with all states successors of the initial state. There is no distinction between depth-first search and breadth-first search on such a tree. If the sequence length is unbounded the root node will have infinitely many successors, so only algorithms which test for goal nodes as we generate successors can work.

What happens next depends on how the composite actions are sorted. If there is no particular ordering, then a random but systematic search of potential solutions occurs. If they are sorted by dictionary order, then this implements depth-first search. If they are sorted by length first, then dictionary ordering, this implements breadth-first search.

A significant disadvantage of collapsing the search space like this is if we discover that a plan starting with the action “unplug your battery” can’t be a solution, there is no easy way to ignore all other composite actions that start with this action. This is a problem in particular for informed search algorithms.

Discarding sequence structure is not a particularly practical approach to search.

3.13

The graph separation property states that “every path from the initial state to an unexplored state has to pass through a state in the frontier.”

At the start of the search, the frontier holds the initial state; hence, trivially, every path from the initial state to an unexplored state includes a node in the frontier (the initial state itself).

Now, we assume that the property holds at the beginning of an arbitrary iteration of the GRAPH-SEARCH algorithm in Figure 3.7. We assume that the iteration completes, i.e., the frontier is not empty and the selected leaf node n is not a goal state. At the end of the iteration, n has been removed from the frontier and its successors (if not already explored or in the frontier) placed in the frontier. Consider any path from the initial state to an unexplored state; by the induction hypothesis such a path (at the beginning of the iteration) includes at least one frontier node; except when n is the only such node, the separation property automatically holds. Hence, we focus on paths passing through n (and no other frontier node). By definition, the next node n′ along the path from n must be a successor of n that (by the preceding sentence) is already not in the frontier. Furthermore, n′ cannot be in the explored set, since by assumption there is a path from n′ to an unexplored node not passing through the frontier, which would violate the separation property as every explored node is connected to the initial state by explored nodes (see lemma below for proof this is always possible). Hence, n′ is not in the explored set, hence it will be added to the frontier; then the path will include a frontier node and the separation property is restored.

The property is violated by algorithms that move nodes from the frontier into the explored set before all of their successors have been generated, as well as by those that fail to add some of the successors to the frontier. Note that it is not necessary to generate all successors of a node at once before expanding another node, as long as partially expanded nodes remain in the frontier.

Lemma: Every explored node is connected to the initial state by a path of explored nodes.

Proof: This is true initially, since the initial state is connected to itself. Since we never remove nodes from the explored region, we only need to check new nodes we add to the explored list on an expansion. Let n be such a new explored node. This is previously on the frontier, so it is a neighbor of a node n′ previously explored (i.e., its parent). n′ is, by hypothesis, connected to the initial state by a path of explored nodes. This path with n appended is a path of explored nodes connecting n to the initial state.

3.14

a. False: a lucky DFS might expand exactly d nodes to reach the goal. A* largely dominates any graph-search algorithm that is guaranteed to find optimal solutions.

b. True: h(n) = 0 is always an admissible heuristic, since costs are nonnegative.

c. True: A* search is often used in robotics; the space can be discretized or skeletonized.

d. True: depth of the solution matters for breadth-first search, not cost.

e. False: a rook can move across the board in move one, although the Manhattan distance from start to finish is 8.

3.15

a. See Figure S3.1.

Figure S3.1 The state space for the problem defined in Ex. 3.15. (Figure not reproduced; it shows the nodes 1 through 15.)

b. Breadth-first: 1 2 3 4 5 6 7 8 9 10 11

Depth-limited: 1 2 4 8 9 5 10 11

Iterative deepening: 1; 1 2 3; 1 2 4 5 3 6 7; 1 2 4 8 9 5 10 11

c. Bidirectional search is very useful, because the only successor of n in the reverse direction is ⌊n/2⌋. This helps focus the search. The branching factor is 2 in the forward direction; 1 in the reverse direction.

d. Yes; start at the goal, and apply the single reverse successor action until you reach 1.

e. The solution can be read off the binary numeral for the goal number. Write the goal number in binary. Since we can only reach positive integers, this binary expansion begins with a 1. From most- to least-significant bit, skipping the initial 1, go Left to the node 2n if this bit is 0 and go Right to node 2n + 1 if it is 1. For example, suppose the goal is 11, which is 1011 in binary. The solution is therefore Left, Right, Right.
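The rule in part (e) is short enough to state as code. In this Python sketch the names `path_to` and `follow` are illustrative; `follow` replays the actions from node 1 as a check on the result.

```python
def path_to(goal):
    """Read the Left/Right solution off the binary numeral of the goal number."""
    if goal < 1:
        raise ValueError("only positive integers are reachable from state 1")
    bits = bin(goal)[3:]          # drop the '0b' prefix and the leading 1
    return ["Left" if bit == "0" else "Right" for bit in bits]


def follow(actions, start=1):
    """Apply the actions: Left goes to 2n, Right goes to 2n + 1."""
    n = start
    for action in actions:
        n = 2 * n if action == "Left" else 2 * n + 1
    return n


print(path_to(11))                # ['Left', 'Right', 'Right']
print(follow(path_to(11)))        # 11
```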

3.16

a. Initial state: one arbitrarily selected piece (say a straight piece).

Successor function: for any open peg, add any piece type from remaining types. (You can add to open holes as well, but that isn’t necessary as all complete tracks can be made by adding to pegs.) For a curved piece, add in either orientation; for a fork, add in either orientation and (if there are two holes) connecting at either hole. It’s a good idea to disallow any overlapping configuration, as this terminates hopeless configurations early. (Note: there is no need to consider open holes, because in any solution these will be filled by pieces added to open pegs.)

Goal test: all pieces used in a single connected track, no open pegs or holes, no overlapping tracks.

Step cost: one per piece (actually, doesn’t really matter).

b. All solutions are at the same depth, so depth-first search would be appropriate. (One could also use depth-limited search with limit $n - 1$, but strictly speaking it’s not necessary to do the work of checking the limit because states at depth $n - 1$ have no successors.) The space is very large, so uniform-cost and breadth-first would fail, and iterative deepening simply does unnecessary extra work. There are many repeated states, so it might be good to use a closed list.

c. A solution has no open pegs or holes, so every peg is in a hole, so there must be equal numbers of pegs and holes. Removing a fork violates this property. There are two other “proofs” that are acceptable: 1) a similar argument to the effect that there must be an even number of “ends”; 2) each fork creates two tracks, and only a fork can rejoin those tracks into one, so if a fork is missing it won’t work. The argument using pegs and holes is actually more general, because it also applies to the case of a three-way fork that has one hole and three pegs or one peg and three holes. The “ends” argument fails here, as does the fork/rejoin argument (which is a bit handwavy anyway).

d. The maximum possible number of open pegs is 3 (starts at 1, adding a two-peg fork increases it by one). Pretending each piece is unique, any piece can be added to a peg, giving at most $12 + (2 \cdot 16) + (2 \cdot 2) + (2 \cdot 2 \cdot 2) = 56$ choices per peg. The total depth is 32 (there are 32 pieces), so an upper bound is $168^{32}/(12! \cdot 16! \cdot 2! \cdot 2!)$ where the factorials deal with permutations of identical pieces. One could do a more refined analysis to handle the fact that the branching factor shrinks as we go down the tree, but it is not pretty.

3.17

a. The algorithm expands nodes in order of increasing path cost; therefore the first goal it encounters will be the goal with the cheapest cost.

b. It will be the same as iterative deepening, d iterations, in which $O(b^d)$ nodes are generated.

c. $d/\epsilon$

d. Implementation not shown.

3.18 Consider a domain in which every state has a single successor, and there is a single goal at depth n. Then depth-first search will find the goal in n steps, whereas iterative deepening search will take $1 + 2 + 3 + \cdots + n = O(n^2)$ steps.

3.19 As an ordinary person (or agent) browsing the web, we can only generate the successors of a page by visiting it. We can then do breadth-first search, or perhaps best-first search where the heuristic is some function of the number of words in common between the start and goal pages; this may help keep the links on target. Search engines keep the complete graph of the web, and may provide the user access to all (or at least some) of the pages that link to a page; this would allow us to do bidirectional search.

3.20 Code not shown, but a good start is in the code repository. Clearly, graph search must be used: this is a classic grid world with many alternate paths to each state. Students will quickly find that computing the optimal solution sequence is prohibitively expensive for moderately large worlds, because the state space for an $n \times n$ world has $n^2 \cdot 2^{n^2}$ states (one of $n^2$ agent locations combined with a clean-or-dirty status for each of the $n^2$ squares). The completion time of the random agent grows less than exponentially in n, so for any reasonable exchange rate between search cost and path cost the random agent will eventually win.

3.21

a. When all step costs are equal, g(n) ∝ depth(n), so uniform-cost search reproduces breadth-first search.

b. Breadth-first search is best-first search with f(n) = depth(n); depth-first search is best-first search with f(n) = −depth(n); uniform-cost search is best-first search with f(n) = g(n).

c. Uniform-cost search is A* search with h(n) = 0.

3.22 The student should find that on the 8-puzzle, RBFS expands more nodes (because it does not detect repeated states) but has lower cost per node because it does not need to maintain a queue. The number of RBFS node re-expansions is not too high because the presence of many tied values means that the best path changes seldom. When the heuristic is slightly perturbed, this advantage disappears and RBFS’s performance is much worse.

For TSP, the state space is a tree, so repeated states are not an issue. On the other hand, the heuristic is real-valued and there are essentially no tied values, so RBFS incurs a heavy penalty for frequent re-expansions.

3.23 The sequence of queues is as follows:

L[0+244=244]

M[70+241=311],T[111+329=440]

L[140+244=384],D[145+242=387],T[111+329=440]

D[145+242=387],T[111+329=440],M[210+241=451],T[251+329=580]


C[265+160=425],T[111+329=440],M[210+241=451],M[220+241=461],T[251+329=580]

T[111+329=440],M[210+241=451],M[220+241=461],P[403+100=503],T[251+329=580],R[411+193=604], D[385+242=627]

M[210+241=451],M[220+241=461],L[222+244=466],P[403+100=503],T[251+329=580],A[229+366=595], R[411+193=604],D[385+242=627]

M[220+241=461],L[222+244=466],P[403+100=503],L[280+244=524],D[285+242=527],T[251+329=580], A[229+366=595],R[411+193=604],D[385+242=627]

L[222+244=466],P[403+100=503],L[280+244=524],D[285+242=527],L[290+244=534],D[295+242=537], T[251+329=580],A[229+366=595],R[411+193=604],D[385+242=627]

P[403+100=503],L[280+244=524],D[285+242=527],M[292+241=533],L[290+244=534],D[295+242=537], T[251+329=580],A[229+366=595],R[411+193=604],D[385+242=627],T[333+329=662]

B[504+0=504],L[280+244=524],D[285+242=527],M[292+241=533],L[290+244=534],D[295+242=537],T[251+329=580], A[229+366=595],R[411+193=604],D[385+242=627],T[333+329=662],R[500+193=693],C[541+160=701]
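The queue sequence above can be reproduced with a short A* tree search. In the Python sketch below, the edge costs and heuristic values are read off the queue entries themselves (for example, M[70+241] gives the L to M cost 70 and h(M) = 241), so only the part of the map that the trace touches is included. The one-letter city names follow the trace, and the function names are illustrative. No repeated-state check is done, which is why the trace contains several entries for the same city.

```python
import heapq

# Edge costs implied by the g-values in the trace (undirected roads).
EDGES = {
    ("L", "M"): 70, ("L", "T"): 111, ("M", "D"): 75, ("D", "C"): 120,
    ("C", "P"): 138, ("C", "R"): 146, ("P", "R"): 97, ("P", "B"): 101,
    ("T", "A"): 118,
}
# Heuristic values as they appear in the trace entries.
H = {"L": 244, "M": 241, "T": 329, "D": 242, "C": 160,
     "P": 100, "R": 193, "A": 366, "B": 0}


def neighbors(city):
    """Yield (neighbor, cost) pairs for the partial map above."""
    for (a, b), cost in EDGES.items():
        if a == city:
            yield b, cost
        elif b == city:
            yield a, cost


def a_star_tree_search(start, goal):
    """A* without repeated-state checking; returns (path, cost, expansion order)."""
    frontier = [(H[start], 0, [start])]
    expanded = []
    while frontier:
        _f, g, path = heapq.heappop(frontier)
        city = path[-1]
        expanded.append(city)
        if city == goal:
            return path, g, expanded
        for nxt, cost in neighbors(city):
            heapq.heappush(frontier, (g + cost + H[nxt], g + cost, path + [nxt]))
    return None


path, cost, order = a_star_tree_search("L", "B")
print(path, cost)
print(order)
```

The expansion order printed by the sketch corresponds to the first entry of each queue in the sequence.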

3.24 See Figure S3.2.

Figure S3.2 A graph with an inconsistent heuristic on which GRAPH-SEARCH fails to return the optimal solution. The successors of S are A with f = 5 and B with f = 7. A is expanded first, so the path via B will be discarded because A will already be in the closed list. (Figure not reproduced; it shows nodes S, A, G, B with labels h = 7, h = 5, h = 1, h = 0 and edge costs 2, 1, 4, 4.)

3.25 It is complete whenever $0 \le w < 2$. $w = 0$ gives $f(n) = 2g(n)$. This behaves exactly like uniform-cost search: the factor of two makes no difference in the ordering of the nodes. $w = 1$ gives A* search. $w = 2$ gives $f(n) = 2h(n)$, i.e., greedy best-first search. We also have

$$f(n) = (2 - w)\left[g(n) + \frac{w}{2 - w}h(n)\right]$$

which behaves exactly like A* search with a heuristic $\frac{w}{2 - w}h(n)$. For $w \le 1$, this is never greater than $h(n)$ and hence admissible, provided $h(n)$ is itself admissible.

3.26

a. The branching factor is 4 (number of neighbors of each location).

b. The states at depth k form a square rotated at 45 degrees to the grid. Obviously there are a linear number of states along the boundary of the square, so the answer is $4k$.

c. Without repeated state checking, BFS expands exponentially many nodes: counting precisely, we get $((4^{x+y+1} - 1)/3) - 1$.

d. There are quadratically many states within the square for depth $x + y$, so the answer is $2(x + y)(x + y + 1) - 1$.

e. True; this is the Manhattan distance metric.

f. False; all nodes in the rectangle defined by (0, 0) and (x, y) are candidates for the optimal path, and there are quadratically many of them, all of which may be expanded in the worst case.

g. True; removing links may induce detours, which require more steps, so h is an underestimate.

h. False; nonlocal links can reduce the actual path length below the Manhattan distance.
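The count in part (b) can be checked empirically with a layer-by-layer breadth-first expansion of the unbounded grid. This Python sketch assumes the origin as the start state and the four unit moves as actions; the name `states_at_depth` is illustrative.

```python
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def states_at_depth(k):
    """Number of distinct grid states first reached at exactly depth k."""
    frontier = {(0, 0)}
    seen = {(0, 0)}
    for _ in range(k):
        next_layer = set()
        for x, y in frontier:
            for dx, dy in STEPS:
                state = (x + dx, y + dy)
                if state not in seen:      # repeated-state check
                    seen.add(state)
                    next_layer.add(state)
        frontier = next_layer
    return len(frontier)


for k in range(1, 6):
    print(k, states_at_depth(k))           # each count equals 4k
```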

3.27

a. $n^{2n}$. There are n vehicles in $n^2$ locations, so roughly (ignoring the one-per-square constraint) $(n^2)^n = n^{2n}$ states.

b. $5^n$.

c. Manhattan distance, i.e., $|(n - i + 1) - x_i| + |n - y_i|$. This is exact for a lone vehicle.

d. Only (iii) $\min\{h_1, \ldots, h_n\}$. The explanation is nontrivial as it requires two observations. First, let the work W in a given solution be the total distance moved by all vehicles over their joint trajectories; that is, for each vehicle, add the lengths of all the steps taken. We have $W \ge \sum_i h_i \ge n \cdot \min\{h_1, \ldots, h_n\}$. Second, the total work we can get done per step is $\le n$. (Note that for every car that jumps 2, another car has to stay put (move 0), so the total work per step is bounded by n.) Hence, completing all the work requires at least $n \cdot \min\{h_1, \ldots, h_n\}/n = \min\{h_1, \ldots, h_n\}$ steps.

3.28 The heuristic h = h1 + h2 (adding misplaced tiles and Manhattan distance) sometimes overestimates. Now, suppose h(n) ≤ h*(n) + c (as given) and let G2 be a goal that is suboptimal by more than c, i.e., g(G2) > C* + c. Now consider any node n on a path to an optimal goal. We have

f(n) = g(n) + h(n)
     ≤ g(n) + h*(n) + c
     ≤ C* + c
     ≤ g(G2)

so G2 will never be expanded before an optimal goal is expanded.

3.29 A heuristic is consistent iff, for every node n and every successor n′ of n generated by any action a,

h(n) ≤ c(n, a, n′) + h(n′)

One simple proof is by induction on the number k of nodes on the shortest path to any goal from n. For k = 1, let n′ be the goal node; then h(n) ≤ c(n, a, n′). For the inductive case, assume n′ is on the shortest path k steps from the goal and that h(n′) is admissible by hypothesis; then

h(n) ≤ c(n, a, n′) + h(n′) ≤ c(n, a, n′) + h*(n′) = h*(n)

so h(n) at k + 1 steps from the goal is also admissible.

3.30 Thisexercisereiteratesasmallportionoftheclassicwork ofHeldandKarp(1970).

a.TheTSPproblemistofindaminimal(totallength)paththroughthecitiesthatforms aclosedloop.MSTisarelaxedversionofthatbecauseitasks foraminimal(total length)graphthatneednotbeaclosedloop—itcanbeanyfully-connectedgraph.As aheuristic,MSTisadmissible—itisalwaysshorterthanorequaltoaclosedloop.

b.Thestraight-linedistancebacktothestartcityisaratherweakheuristic—itvastly underestimateswhentherearemanycities.Inthelaterstageofasearchwhenthereare onlyafewcitiesleftitisnotsobad.TosaythatMSTdominatesstraight-linedistance istosaythatMSTalwaysgivesahighervalue.ThisisobviouslytruebecauseaMST thatincludesthegoalnodeandthecurrentnodemusteitherbethestraightlinebetween them,oritmustincludetwoormorelinesthatadduptomore.(Thisallassumesthe triangleinequality.)

c.See "search/domains/tsp.lisp" forastartatthis.Thefileincludesaheuristic basedonconnectingeachunvisitedcitytoitsnearestneighbor,acloserelativetothe MSTapproach.

d.See(Cormen etal.,1990,p.505)foranalgorithmthatrunsin O(E log E) time,where E isthenumberofedges.Thecoderepositorycurrentlycontainsasomewhatless efficientalgorithm.

3.31 The misplaced-tiles heuristic is exact for the problem where a tile can move from square A to square B. As this is a relaxation of the condition that a tile can move from square A to square B if B is blank, Gaschnig’s heuristic cannot be less than the misplaced-tiles heuristic. As it is also admissible (being exact for a relaxation of the original problem), Gaschnig’s heuristic is therefore more accurate.

If we permute two adjacent tiles in the goal state, we have a state where misplaced-tiles and Manhattan both return 2, but Gaschnig’s heuristic returns 3.

To compute Gaschnig’s heuristic, repeat the following until the goal state is reached: let B be the current location of the blank; if B is occupied by tile X (not the blank) in the goal state, move X to B; otherwise, move any misplaced tile to B. Students could be asked to prove that this is the optimal solution to the relaxed problem.
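The procedure just described can be sketched in Python as follows. The state encoding (a tuple read row by row, with 0 for the blank) and the function name `gaschnig` are assumptions made for this illustration, not part of the exercise.

```python
def gaschnig(state, goal):
    """Number of moves in the relaxed problem where any tile may jump to the blank."""
    current = list(state)
    goal = list(goal)
    moves = 0
    while current != goal:
        blank = current.index(0)
        if goal[blank] != 0:
            # The blank sits where tile X belongs: move X into the blank.
            source = current.index(goal[blank])
        else:
            # The blank is already home: move any misplaced tile into it.
            source = next(
                i for i, tile in enumerate(current)
                if tile != 0 and tile != goal[i]
            )
        current[blank], current[source] = current[source], current[blank]
        moves += 1
    return moves


goal = (0, 1, 2, 3, 4, 5, 6, 7, 8)
swapped = (0, 2, 1, 3, 4, 5, 6, 7, 8)  # two adjacent tiles permuted
print(gaschnig(swapped, goal))
```

On the swapped state the sketch counts three moves, while only two tiles are misplaced, which is the gap noted above.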

3.32 Students should provide results in the form of graphs and/or tables showing both runtime and number of nodes generated. (Different heuristics have different computation costs.) Runtimes may be very small for 8-puzzles, so you may want to assign the 15-puzzle or 24-puzzle instead. The use of pattern databases is also worth exploring experimentally.

Solutions for Chapter 4
Beyond Classical Search

4.1

a. Local beam search with k = 1 is hill-climbing search.

b. Local beam search with one initial state and no limit on the number of states retained resembles breadth-first search in that it adds one complete layer of nodes before adding the next layer. Starting from one state, the algorithm would be essentially identical to breadth-first search except that each layer is generated all at once.

c. Simulated annealing with T = 0 at all times: ignoring the fact that the termination step would be triggered immediately, the search would be identical to first-choice hill climbing because every downward successor would be rejected with probability 1. (Exercise may be modified in future printings.)

d. Simulated annealing with T = ∞ at all times is a random-walk search: it always accepts a new state.

e. Genetic algorithm with population size N = 1: if the population size is 1, then the two selected parents will be the same individual; crossover yields an exact copy of the individual; then there is a small chance of mutation. Thus, the algorithm executes a random walk in the space of individuals.
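Parts (c) and (d) can be checked against the usual simulated-annealing acceptance rule (accept an improving move always, and a worsening move with probability e^(ΔE/T)). The sketch below is only an illustration; the function name and the convention of returning 0 for non-improving moves at T = 0 are assumptions, since the formula itself is undefined there.

```python
import math


def acceptance_probability(delta_e, temperature):
    """Probability of accepting a move whose value change is delta_e."""
    if delta_e > 0:
        return 1.0  # improving moves are always accepted
    if temperature == 0:
        return 0.0  # limiting case T -> 0: non-improving moves are rejected
    return math.exp(delta_e / temperature)


print(acceptance_probability(-5, 0))         # part (c): downhill moves rejected
print(acceptance_probability(-5, math.inf))  # part (d): every move accepted
```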

4.2 Despite its humble origins, this question raises many of the same issues as the scientifically important problem of protein design. There is a discrete assembly space in which pieces are chosen to be added to the track and a continuous configuration space determined by the “joint angles” at every place where two pieces are linked. Thus we can define a state as a set of oriented, linked pieces and the associated joint angles in the range [−10, 10], plus a set of unlinked pieces. The linkage and joint angles exactly determine the physical layout of the track; we can allow for (and penalize) layouts in which tracks lie on top of one another, or we can disallow them. The evaluation function would include terms for how many pieces are used, how many loose ends there are, and (if allowed) the degree of overlap. We might include a penalty for the amount of deviation from 0-degree joint angles. (We could also include terms for “interestingness” and “traversability”; for example, it is nice to be able to drive a train starting from any track segment to any other, ending up in either direction without having to lift up the train.) The tricky part is the set of allowed moves. Obviously we can unlink any piece or link an unlinked piece to an open peg with either orientation at any allowed angle (possibly excluding moves that create overlap). More problematic are moves to join a peg and hole on already-linked pieces and moves to change the angle of a joint. Changing one angle may force changes in others, and the changes will vary depending on whether the other pieces are at their joint-angle limit. In general there will be no unique “minimal” solution for a given angle change in terms of the consequent changes to other angles, and some changes may be impossible.

4.3 Here is one simple hill-climbing algorithm:

• Connect all the cities into an arbitrary path.

• Pick two points along the path at random.

• Split the path at those points, producing three pieces.

• Try all six possible ways to connect the three pieces.

• Keep the best one, and reconnect the path accordingly.

• Iterate the steps above until no improvement is observed for a while.
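The steps above can be sketched in Python as shown below. This is one possible reading rather than a reference solution: it assumes the “six ways” are the 3! orderings of the three pieces without reversing any piece, measures a candidate by the length of the closed tour, and uses a hypothetical `patience` parameter to decide when “a while” has passed.

```python
import itertools
import math
import random


def tour_length(tour):
    """Length of the closed loop that visits the cities in the given order."""
    return sum(
        math.dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour))
    )


def hill_climb_tsp(cities, patience=200, seed=0):
    """Hill climbing by cutting the path into three pieces and reordering them."""
    rng = random.Random(seed)
    tour = list(cities)  # an arbitrary initial path: the input order
    best = tour_length(tour)
    stale = 0  # consecutive iterations without improvement
    while stale < patience and len(tour) >= 3:
        # Two distinct cut points, so that all three pieces are non-empty.
        i, j = sorted(rng.sample(range(1, len(tour)), 2))
        pieces = (tour[:i], tour[i:j], tour[j:])
        candidates = [
            [city for piece in order for city in piece]
            for order in itertools.permutations(pieces)
        ]
        candidate = min(candidates, key=tour_length)
        length = tour_length(candidate)
        if length < best - 1e-12:
            tour, best, stale = candidate, length, 0
        else:
            stale += 1
    return tour, best


cities = [(0, 0), (1, 1), (1, 0), (0, 1), (2, 0), (2, 1)]
tour, length = hill_climb_tsp(cities)
print(tour, length)
```

Allowing each piece to be reversed as well would give a larger neighborhood; that variation is left out here to stay close to the six reconnections named in the list.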

4.4 Code not shown.

4.5 See Figure S4.1 for the adapted algorithm. For states that OR-SEARCH finds a solution for, it records the solution found. If it later visits that state again it immediately returns that solution.

When OR-SEARCH fails to find a solution it has to be careful. Whether a state can be solved depends on the path taken to that solution, as we do not allow cycles. So on failure OR-SEARCH records the value of path. If a state is visited which has previously failed when path contained any subset of its present value, OR-SEARCH returns failure.

To avoid repeating sub-solutions we can label all new solutions found, record these labels, then return the label if these states are visited again. Post-processing can prune off unused labels. Alternatively, we can output a directed acyclic graph structure rather than a tree. See (Bertoli et al., 2001) for further details.

4.6 The question statement describes the required changes in detail; see Figure S4.2 for the modified algorithm. When OR-SEARCH cycles back to a state on path it returns a token loop which means to loop back to the most recent time this state was reached along the path to it. Since path is implicitly stored in the returned plan, there is sufficient information for later processing, or a modified implementation, to replace these with labels.

The plan representation is implicitly augmented to keep track of whether the plan is cyclic (i.e., contains a loop) so that OR-SEARCH can prefer acyclic solutions.

AND-SEARCH returns failure if all branches lead directly to a loop, as in this case the plan will always loop forever. This is the only case it needs to check, as if all branches in a finite plan loop there must be some And-node whose children all immediately loop.

function AND-OR-GRAPH-SEARCH(problem) returns a conditional plan, or failure
  OR-SEARCH(problem.INITIAL-STATE, problem, [ ])

function OR-SEARCH(state, problem, path) returns a conditional plan, or failure
  if problem.GOAL-TEST(state) then return the empty plan
  if state has previously been solved then return RECALL-SUCCESS(state)
  if state has previously failed for a subset of path then return failure
  if state is on path then
    RECORD-FAILURE(state, path)
    return failure
  for each action in problem.ACTIONS(state) do
    plan ← AND-SEARCH(RESULTS(state, action), problem, [state | path])
    if plan ≠ failure then
      RECORD-SUCCESS(state, [action | plan])
      return [action | plan]
  return failure

function AND-SEARCH(states, problem, path) returns a conditional plan, or failure
  for each s_i in states do
    plan_i ← OR-SEARCH(s_i, problem, path)
    if plan_i = failure then return failure
  return [if s_1 then plan_1 else if s_2 then plan_2 else ... if s_{n−1} then plan_{n−1} else plan_n]

Figure S4.1 AND-OR search with repeated state checking.

4.7 A sequence of actions is a solution to a belief state problem if it takes every initial physical state to a goal state. We can relax this problem by requiring it take only some initial physical state to a goal state. To make this well defined, we’ll require that it finds a solution for the physical state with the most costly solution. If h∗(s) is the optimal cost of solution starting from the physical state s, then

h(S) = max_{s ∈ S} h∗(s)

is the heuristic estimate given by this relaxed problem. This heuristic assumes any solution to the most difficult state the agent thinks possible will solve all states.

On the sensorless vacuum cleaner problem in Figure 4.14, h correctly determines the optimal cost for all states except the central three states (those reached by [suck], [suck, left] and [suck, right]) and the root, for which h estimates to be 1 unit cheaper than they really are. This means A∗ will expand these three central nodes, before marching towards the solution.
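To make the heuristic h(S) = max over s in S of h∗(s) concrete, here is a small Python sketch for a two-square vacuum world. The state encoding (location, dirt in A, dirt in B), the deterministic action model, and the use of breadth-first search to obtain h∗ are all assumptions made for this illustration.

```python
from collections import deque
from itertools import product

ACTIONS = ("Left", "Right", "Suck")


def result(state, action):
    """Deterministic two-square vacuum world; state = (location, dirt_a, dirt_b)."""
    loc, dirt_a, dirt_b = state
    if action == "Left":
        return ("A", dirt_a, dirt_b)
    if action == "Right":
        return ("B", dirt_a, dirt_b)
    # Suck cleans the square the agent is standing on.
    return (loc, False, dirt_b) if loc == "A" else (loc, dirt_a, False)


def is_goal(state):
    return not state[1] and not state[2]


def h_star(state):
    """Optimal solution cost from one physical state, by breadth-first search."""
    frontier = deque([(state, 0)])
    seen = {state}
    while frontier:
        current, depth = frontier.popleft()
        if is_goal(current):
            return depth
        for action in ACTIONS:
            nxt = result(current, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    raise ValueError("no goal reachable")


def belief_heuristic(belief):
    """h(S): the cost of the hardest physical state the agent thinks possible."""
    return max(h_star(s) for s in belief)


all_states = set(product("AB", (True, False), (True, False)))
print(belief_heuristic(all_states))
```

For the initial belief state containing all eight physical states the sketch returns 3, one less than the four-action sensorless plan (for example Right, Suck, Left, Suck), in line with the remark about the root above.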

function AND-OR-GRAPH-SEARCH(problem) returns a conditional plan, or failure
  OR-SEARCH(problem.INITIAL-STATE, problem, [ ])

function OR-SEARCH(state, problem, path) returns a conditional plan, or failure
  if problem.GOAL-TEST(state) then return the empty plan
  if state is on path then return loop
  cyclic-plan ← None
  for each action in problem.ACTIONS(state) do
    plan ← AND-SEARCH(RESULTS(state, action), problem, [state | path])
    if plan ≠ failure then
      if plan is acyclic then return [action | plan]
      cyclic-plan ← [action | plan]
  if cyclic-plan ≠ None then return cyclic-plan
  return failure

function AND-SEARCH(states, problem, path) returns a conditional plan, or failure
  loopy ← True
  for each s_i in states do
    plan_i ← OR-SEARCH(s_i, problem, path)
    if plan_i = failure then return failure
    if plan_i ≠ loop then loopy ← False
  if not loopy then
    return [if s_1 then plan_1 else if s_2 then plan_2 else ... if s_{n−1} then plan_{n−1} else plan_n]
  return failure

Figure S4.2 AND-OR search with repeated state checking.

4.8

a. An action sequence is a solution for belief state b if performing it starting in any state s ∈ b reaches a goal state. Since any state in a subset of b is in b, the result is immediate. Any action sequence which is not a solution for belief state b is also not a solution for any superset; this is the contrapositive of what we’ve just proved. One cannot, in general, say anything about arbitrary supersets, as the action sequence need not lead to a goal on the states outside of b. One can say, for example, that if an action sequence solves a belief state b and a belief state b′ then it solves the union belief state b ∪ b′.

b. On expansion of a node, do not add to the frontier any child belief state which is a superset of a previously explored belief state.

c. If you keep a record of previously solved belief states, add a check to the start of OR-search to check whether the belief state passed in is a subset of a previously solved belief state, returning the previous solution in case it is.

4.9 Consider a very simple example: an initial belief state {S1, S2}, actions a and b both leading to goal state G from either initial state, and

c(S1, a, G) = 3; c(S2, a, G) = 5;
c(S1, b, G) = 2; c(S2, b, G) = 6.

In this case, the solution [a] costs 3 or 5, the solution [b] costs 2 or 6. Neither is “optimal” in any obvious sense.

In some cases, there will be an optimal solution. Let us consider just the deterministic case. For this case, we can think of the cost of a plan as a mapping from each initial physical state to the actual cost of executing the plan. In the example above, the cost for [a] is {S1: 3, S2: 5} and the cost for [b] is {S1: 2, S2: 6}. We can say that plan p1 weakly dominates p2 if, for each initial state, the cost for p1 is no higher than the cost for p2. (Moreover, p1 dominates p2 if it weakly dominates it and has a lower cost for some state.) If a plan p weakly dominates all others, it is optimal. Notice that this definition reduces to ordinary optimality in the observable case where every belief state is a singleton. As the preceding example shows, however, a problem may have no optimal solution in this sense. A perhaps acceptable version of A∗ would be one that returns any solution that is not dominated by another.

To understand whether it is possible to apply A∗ at all, it helps to understand its dependence on Bellman’s (1957) principle of optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision. It is important to understand that this is a restriction on performance measures designed to facilitate efficient algorithms, not a general definition of what it means to be optimal.

In particular, if we define the cost of a plan in belief-state space as the minimum cost of any physical realization, we violate Bellman’s principle. Modifying and extending the previous example, suppose that a and b reach S3 from S1 and S4 from S2, and then reach G from there:

c(S1, a, S3) = 6; c(S2, a, S4) = 2;
c(S1, b, S3) = 6; c(S2, b, S4) = 1.

c(S3, a, G) = 2; c(S4, a, G) = 2;
c(S3, b, G) = 1; c(S4, b, G) = 9.
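The step costs just listed are enough to tabulate every two-step plan by brute force under the minimum-cost measure. The Python sketch below does this; the dictionary layout and the function names are assumptions made for illustration only.

```python
from itertools import product

# (state, action) -> (successor state, step cost), copied from the costs above.
STEP = {
    ("S1", "a"): ("S3", 6), ("S2", "a"): ("S4", 2),
    ("S1", "b"): ("S3", 6), ("S2", "b"): ("S4", 1),
    ("S3", "a"): ("G", 2), ("S4", "a"): ("G", 2),
    ("S3", "b"): ("G", 1), ("S4", "b"): ("G", 9),
}


def physical_cost(state, plan):
    """Cost of executing the plan from a single physical state."""
    total = 0
    for action in plan:
        state, cost = STEP[(state, action)]
        total += cost
    return total


def min_cost(belief, plan):
    """Belief-state plan cost defined as the cheapest physical realization."""
    return min(physical_cost(s, plan) for s in belief)


costs = {plan: min_cost(("S1", "S2"), plan) for plan in product("ab", repeat=2)}
for plan, cost in costs.items():
    print(plan, cost)
print("cheapest plan from {S1, S2}:", min(costs, key=costs.get))
print("cost of [b] in {S3, S4}:", min_cost(("S3", "S4"), ("b",)))
```

Comparing the cheapest two-step plan with the cheapest single action in {S3, S4} shows the two choices disagreeing on the second action, which is the violation of Bellman’s principle under discussion.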

In the belief state {S3, S4}, the minimum cost of [a] is min{2, 2} = 2 and the minimum cost of [b] is min{1, 9} = 1, so the optimal plan is [b]. In the initial belief state {S1, S2}, the four possible plans have the following costs:

[a, a]: min{8, 4} = 4; [a, b]: min{7, 11} = 7; [b, a]: min{8, 3} = 3; [b, b]: min{7, 10} = 7.

Hence, the optimal plan in {S1, S2} is [b, a], which does not choose b in {S3, S4} even though that is the optimal plan at that point. This counterintuitive behavior is a direct consequence of choosing the minimum of the possible path costs as the performance measure.

This example gives just a small taste of what might happen with nonadditive performance measures. Details of how to modify and analyze A∗ for general path-dependent cost functions are given by Dechter and Pearl (1985). Many aspects of A∗ carry over; for example, we can still derive lower bounds on the cost of a path through a given node. For a belief state b, the minimum value of g(s) + h(s) for each state s in b is a lower bound on the minimum cost of a plan that goes through b.

4.10 The belief state space is shown in Figure S4.3. No solution is possible because no path leads to a belief state all of whose elements satisfy the goal. If the problem is fully observable, the agent reaches a goal state by executing a sequence such that Suck is performed only in a dirty square. This ensures deterministic behavior and every state is obviously solvable.

4.11 The student needs to make several design choices in answering this question. First, how will the vertices of objects be represented? The problem states the percept is a list of vertex positions, but that is not precise enough. Here is one good choice: The agent has an orientation (a heading in degrees). The visible vertexes are listed in clockwise order, starting straight ahead of the agent. Each vertex has a relative angle (0 to 360 degrees) and a distance. We also want to know if a vertex represents the left edge of an obstacle, the right edge, or an interior point. We can use the symbols L, R, or I to indicate this.

The student will need to do some basic computational geometry calculations: intersection of a path and a set of line segments to see if the agent will bump into an obstacle, and visibility calculations to determine the percept. There are efficient algorithms for doing this on a set of line segments, but don’t worry about efficiency; an exhaustive algorithm is ok. If this seems too much, the instructor can provide an environment simulator and ask the student only to program the agent.

To answer (c), the student will need some exchange rate for trading off search time with movement time. It is probably too complex to make the simulation asynchronous real-time; easier to impose a penalty in points for computation.

For (d), the agent will need to maintain a set of possible positions. Each time the agent moves, it may be able to eliminate some of the possibilities. The agent may consider moves that serve to reduce uncertainty rather than just get to the goal.

4.12 This question is slightly ambiguous as to what the percept is: either the percept is just the location, or it gives exactly the set of unblocked directions (i.e., blocked directions are illegal actions). We will assume the latter. (Exercise may be modified in future printings.) There are 12 possible locations for internal walls, so there are 2^12 = 4096 possible environment configurations. A belief state designates a subset of these as possible configurations; for example, before seeing any percepts all 4096 configurations are possible, and this is a single belief state.

a. Online search is equivalent to offline search in belief-state space where each action in a belief-state can have multiple successor belief-states: one for each percept the agent could observe after the action. A successor belief-state is constructed by taking the previous belief-state, itself a set of states, replacing each state in this belief-state by the successor state under the action, and removing all successor states which are inconsistent with the percept. This is exactly the construction in Section 4.4.2. AND-OR search can be used to solve this search problem. The initial belief state has 2^10 = 1024 states in it, as we know whether two edges have walls or not (the upper and right edges have no walls) but nothing more. There are 2^(2^12) possible belief states, one for each set of environment configurations.

Figure S4.3 The belief state space for the sensorless vacuum world under Murphy’s law.

We can view this as a contingency problem in belief state space. After each action and percept, the agent learns whether or not an internal wall exists between the current square and each neighboring square. Hence, each reachable belief state can be represented exactly by a list of status values (present, absent, unknown) for each wall separately. That is, the belief state is completely decomposable and there are exactly 3^12 reachable belief states. The maximum number of possible wall-percepts in each state is 16 (2^4), so each belief state has four actions, each with up to 16 nondeterministic successors.

b. Assuming the external walls are known, there are two internal walls and hence 2^2 = 4 possible percepts.

c. The initial null action leads to four possible belief states, as shown in Figure S4.4. From each belief state, the agent chooses a single action which can lead to up to 8 belief states (on entering the middle square). Given the possibility of having to retrace its steps at a dead end, the agent can explore the entire maze in no more than 18 steps, so the complete plan (expressed as a tree) has no more than 8^18 nodes. On the other hand, there are just 3^12 reachable belief states, so the plan could be expressed more concisely as a table of actions indexed by belief state (a policy in the terminology of Chapter 17).
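The successor construction described in part (a), which first applies the action to every state and then splits the results by percept, can be sketched generically in Python. The toy corridor world used to exercise it (positions 0 to 3 with one blocked cell at an unknown position) is an invented example, not the maze of the exercise, and all names are hypothetical.

```python
def successor_belief_states(belief, action, result, percept):
    """Map each possible percept to the belief state it leaves the agent in."""
    # Prediction: replace every state by its successor under the action.
    predicted = {result(state, action) for state in belief}
    # Filtering: keep together only the states that agree with a given percept.
    grouped = {}
    for state in predicted:
        grouped.setdefault(percept(state), set()).add(state)
    return {obs: frozenset(states) for obs, states in grouped.items()}


def corridor_result(state, action):
    """Toy world: state = (position, blocked cell); the agent can only move right."""
    pos, wall = state
    if action == "Right" and pos + 1 != wall and pos + 1 <= 3:
        return (pos + 1, wall)
    return state


def corridor_percept(state):
    """True when the cell to the right is blocked or is the end of the corridor."""
    pos, wall = state
    return pos + 1 == wall or pos + 1 > 3


initial = frozenset({(0, 2), (0, 3)})  # the blocked cell is either 2 or 3
print(successor_belief_states(initial, "Right", corridor_result, corridor_percept))
```

Each key of the returned dictionary corresponds to one branch of an AND node, so AND-OR search would recurse on every value.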

4.13 Hill climbing is surprisingly effective at finding reasonable if not optimal paths for very little computational cost, and seldom fails in two dimensions.

Figure S4.4 The 3 × 3 maze exploration problem: the initial state, first percept, and one selected action with its perceptual outcomes.

a. It is possible (see Figure S4.5(a)) but very unlikely: the obstacle has to have an unusual shape and be positioned correctly with respect to the goal.

b. With nonconvex obstacles, getting stuck is much more likely to be a problem (see Figure S4.5(b)).

c. Notice that this is just depth-limited search, where you choose a step along the best path even if it is not a solution.

d. Set k to the maximum number of sides of any polygon and you can always escape.

e. LRTA* always makes a move, but may move back if the old state looks better than the new state. But then the old state is penalized for the cost of the trip, so eventually the local minimum fills up and the agent escapes.

Figure S4.5 (a) Getting stuck with a convex obstacle. (b) Getting stuck with a nonconvex obstacle.

4.14 Since we can observe successor states, we always know how to backtrack to a previous state. This means we can adapt iterative deepening search to solve this problem. The only difference is backtracking must be explicit, following the action which the agent can see leads to the previous state.

The algorithm expands the following nodes:
Depth 1: (0, 0), (1, 0), (0, 0), (−1, 0), (0, 0)
Depth 2: (0, 1), (0, 0), (0, 1), (0, 0), (1, 0), (2, 0), (1, 0), (0, 0), (1, 0), (1, 1), (1, 0), (1, 1)

Solutions for Chapter 5
Adversarial Search

5.1 The translation uses the model of the opponent OM(s) to fill in the opponent’s actions, leaving our actions to be determined by the search algorithm. Let P(s) be the state predicted to occur after the opponent has made all their moves according to OM. Note that the opponent may take multiple moves in a row before we get a move, so we need to define this recursively. We have P(s) = s if PLAYER(s) is us or TERMINAL-TEST(s) is true, otherwise P(s) = P(RESULT(s, OM(s))).

The search problem is then given by:

a. Initial state: P(S0) where S0 is the initial game state. We apply P as the opponent may play first.

b. Actions: defined as in the game by ACTIONS(s).

c. Successor function: RESULT′(s, a) = P(RESULT(s, a)).

d. Goal test: goals are terminal states.

e. Step cost: the cost of an action is zero unless the resulting state s′ is terminal, in which case its cost is M − UTILITY(s′) where M = max_s UTILITY(s). Notice that all costs are non-negative.

Notice that the state space of the search problem consists of game states where we are to play and terminal states. States where the opponent is to play have been compiled out. One might alternatively leave those states in but just have a single possible action.

Any of the search algorithms of Chapter 3 can be applied. For example, depth-first search can be used to solve this problem, if all games eventually end. This is equivalent to using the minimax algorithm on the original game if OM(s) always returns the minimax move in s.
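The recursive definition of P(s) above can be written as a simple loop. The Python sketch below takes the game functions as parameters; the toy counting game used to run it (the counter grows by the chosen amount, and we move only when the counter is a multiple of 3) is invented for this illustration, as are all the names.

```python
def make_predict(player, terminal_test, result, opponent_model, us):
    """Build P(s): fast-forward through the opponent's modelled moves."""
    def predict(state):
        # Equivalent to the recursion P(s) = P(RESULT(s, OM(s))).
        while not terminal_test(state) and player(state) != us:
            state = result(state, opponent_model(state))
        return state
    return predict


# Toy game: state = (counter, player to move); the game ends at 10 or more.
def toy_result(state, amount):
    counter = state[0] + amount
    return (counter, "us" if counter % 3 == 0 else "them")


predict = make_predict(
    player=lambda s: s[1],
    terminal_test=lambda s: s[0] >= 10,
    result=toy_result,
    opponent_model=lambda s: 1,  # the modelled opponent always adds 1
    us="us",
)

print(predict((1, "them")))  # the opponent moves twice in a row before our turn


def result_prime(state, action):
    """Successor function of the compiled search problem: RESULT'(s, a)."""
    return predict(toy_result(state, action))
```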

5.2

a. Initial state: two arbitrary 8-puzzle states. Successor function: one move on an unsolved puzzle. (You could also have actions that change both puzzles at the same time; this is OK but technically you have to say what happens when one is solved but not the other.) Goal test: both puzzles in goal state. Path cost: 1 per move.

b. Each puzzle has 9!/2 reachable states (remember that half the states are unreachable). The joint state space has (9!)^2/4 states.

c. This is like backgammon; expectiminimax works.

d. Actually the statement in the question is not true (it applies to a previous version of part (c) in which the opponent is just trying to prevent you from winning; in that case, the coin tosses will eventually allow you to solve one puzzle without interruptions). For the game described in (c), consider a state in which the coin has come up heads, say, and you get to work on a puzzle that is 2 steps from the goal. Should you move one step closer? If you do, your opponent wins if he tosses heads; or if he tosses tails, you toss tails, and he tosses heads; or any sequence where both toss tails n times and then he tosses heads. So his probability of winning is at least 1/2 + 1/8 + 1/32 + ··· = 2/3. So it seems you’re better off moving away from the goal. (There’s no way to stay the same distance from the goal.) This problem unintentionally seems to have the same kind of solution as suicide tic-tac-toe with passing.
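The series in part (d) is geometric with first term 1/2 and ratio 1/4, so its sum is (1/2)/(1 − 1/4) = 2/3. A short Python check with exact fractions follows; the function name is only illustrative.

```python
from fractions import Fraction


def opponent_win_partial_sum(terms):
    """Sum of the first `terms` terms of 1/2 + 1/8 + 1/32 + ..."""
    # Term k: both players toss tails k times, then the opponent tosses heads.
    return sum(Fraction(1, 2) * Fraction(1, 4) ** k for k in range(terms))


closed_form = Fraction(1, 2) / (1 - Fraction(1, 4))
print(closed_form)
print(float(opponent_win_partial_sum(20)))
```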

5.3

a. See Figure S5.1; the values are just (minus) the number of steps along the path from the root.

b. See Figure S5.1; note that there is both an upper bound and a lower bound for the left child of the root.

c. See figure.

d. The shortest-path length between the two players is a lower bound on the total capture time (here the players take turns, so no need to divide by two), so the “?” leaves have a capture time greater than or equal to the sum of the cost from the root and the shortest-path length. Notice that this bound is derived when the Evader plays very badly. The true value of a node comes from best play by both players, so we can get better bounds by assuming better play. For example, we can get a better bound from the cost when the Evader simply moves backwards and forwards rather than moving towards the Pursuer.

e. See figure (we have used the simple bounds). Notice that once the right child is known to have a value below −6, the remaining successors need not be considered.

f. The pursuer always wins if the tree is finite. To prove this, let the tree be rooted at the pursuer’s current node. (I.e., pick up the tree by that node and dangle all the other branches down.) The evader must either be at the root, in which case the pursuer has won, or in some subtree. The pursuer takes the branch leading to that subtree. This process repeats at most d times, where d is the maximum depth of the original subtree, until the pursuer either catches the evader or reaches a leaf node. Since the leaf has no subtrees, the evader must be at that node.

Figure S5.1 Pursuit-evasion solution tree.

5.4 The basic physical state of these games is fairly easy to describe. One important thing to remember for Scrabble and bridge is that the physical state is not accessible to all players and so cannot be provided directly to each player by the environment simulator. Particularly in bridge, each player needs to maintain some best guess (or multiple hypotheses) as to the actual state of the world. We expect to be putting some of the game implementations online as they become available.

5.5 Code not shown.

5.6 The most obvious change is that the space of actions is now continuous. For example, in pool, the cueing direction, angle of elevation, speed, and point of contact with the cue ball are all continuous quantities.

The simplest solution is just to discretize the action space and then apply standard methods. This might work for tennis (modelled crudely as alternating shots with speed and direction), but for games such as pool and croquet it is likely to fail miserably because small changes in direction have large effects on action outcome. Instead, one must analyze the game to identify a discrete set of meaningful local goals, such as “potting the 4-ball” in pool or “laying up for the next hoop” in croquet. Then, in the current context, a local optimization routine can work out the best way to achieve each local goal, resulting in a discrete set of possible choices. Typically, these games are stochastic, so the backgammon model is appropriate provided that we use sampled outcomes instead of summing over all outcomes.

Whereas pool and croquet are modelled correctly as turn-taking games, tennis is not. While one player is moving to the ball, the other player is moving to anticipate the opponent’s return. This makes tennis more like the simultaneous-action games studied in Chapter 17. In particular, it may be reasonable to derive randomized strategies so that the opponent cannot anticipate where the ball will go.

5.7 Consider a MIN node whose children are terminal nodes. If MIN plays suboptimally, then the value of the node is greater than or equal to the value it would have if MIN played optimally. Hence, the value of the MAX node that is the MIN node’s parent can only be increased. This argument can be extended by a simple induction all the way to the root. If the suboptimal play by MIN is predictable, then one can do better than a minimax strategy. For example, if MIN always falls for a certain kind of trap and loses, then setting the trap guarantees a win even if there is actually a devastating response for MIN. This is shown in Figure S5.2.
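The contrast between minimax play and play against a predictable opponent can be sketched in Python on a two-move tree. The leaf values follow the labels recoverable from Figure S5.2 (−10, 1000, 1000 under a1 and −5, −5, −5 under a2); their left-to-right order, the dictionary encoding, and the “gullible” opponent model are assumptions made for this illustration.

```python
# MAX's moves mapped to the leaf values MIN can choose among afterwards.
TREE = {"a1": [-10, 1000, 1000], "a2": [-5, -5, -5]}


def minimax_choice(tree):
    """Best move for MAX when MIN is assumed to reply optimally."""
    move = max(tree, key=lambda m: min(tree[m]))
    return move, min(tree[move])


def choice_against_model(tree, reply_index):
    """Best move for MAX when MIN's reply is predicted by reply_index(move, leaves)."""
    def predicted_value(move):
        return tree[move][reply_index(move, tree[move])]
    move = max(tree, key=predicted_value)
    return move, predicted_value(move)


def gullible(move, leaves):
    """A predictable MIN that always picks its worst reply (falls for the trap)."""
    return leaves.index(max(leaves))


print(minimax_choice(TREE))
print(choice_against_model(TREE, gullible))
```

Against the gullible model the trap move a1 pays off, whereas against an optimal MIN the same move would be worth only −10, which is why minimax prefers a2.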

Figure S5.2 A simple game tree showing that setting a trap for MIN by playing a1 is a win if MIN falls for it, but may also be disastrous. The minimax move is of course a2, with value −5. (In the tree, MAX chooses between a1, leading to a MIN node with leaf values −10, 1000, and 1000, and a2, leading to a MIN node whose three leaves all have value −5.)

Figure S5.3 The game tree for the four-square game in Exercise 5.8. Terminal states are in single boxes, loop states in double boxes. Each state is annotated with its minimax value in a circle.

5.8

a. (5) The game tree, complete with annotations of all minimax values, is shown in Figure S5.3.

b. (5) The “?” values are handled by assuming that an agent with a choice between winning the game and entering a “?” state will always choose the win. That is, min(−1, ?) is −1 and max(+1, ?) is +1. If all successors are “?”, the backed-up value is “?”.

c. (5) Standard minimax is depth-first and would go into an infinite loop. It can be fixed by comparing the current state against the stack; and if the state is repeated, then return a “?” value. Propagation of “?” values is handled as above. Although it works in this case, it does not always work because it is not clear how to compare “?” with a drawn position; nor is it clear how to handle the comparison when there are wins of different degrees (as in backgammon). Finally, in games with chance nodes, it is unclear how to compute the average of a number and a “?”. Note that it is not correct to treat repeated states automatically as drawn positions; in this example, both (1,4) and (2,4) repeat in the tree but they are won positions.

What is really happening is that each state has a well-defined but initially unknown value. These unknown values are related by the minimax equation at the bottom of page 164. If the game tree is acyclic, then the minimax algorithm solves these equations by propagating from the leaves. If the game tree has cycles, then a dynamic programming method must be used, as explained in Chapter 17. (Exercise 17.7 studies this problem in particular.) These algorithms can determine whether each node has a well-determined value (as in this example) or is really an infinite loop in that both players prefer to stay in the loop (or have no choice). In such a case, the rules of the game will need to define the value (otherwise the game will never end). In chess, for example, a state that occurs 3 times (and hence is assumed to be desirable for both players) is a draw.

d. This question is a little tricky. One approach is a proof by induction on the size of the game. Clearly, the base case n = 3 is a loss for A and the base case n = 4 is a win for A. For any n > 4, the initial moves are the same: A and B both move one step towards each other. Now, we can see that they are engaged in a subgame of size n − 2 on the squares [2, ..., n − 1], except that there is an extra choice of moves on squares 2 and n − 1. Ignoring this for a moment, it is clear that if the “n − 2” is won for A, then A gets to the square n − 1 before B gets to square 2 (by the definition of winning) and therefore gets to n before B gets to 1, hence the “n” game is won for A. By the same line of reasoning, if “n − 2” is won for B then “n” is won for B. Now, the presence of the extra moves complicates the issue, but not too much. First, the player who is slated to win the subgame [2, ..., n − 1] never moves back to his home square. If the player slated to lose the subgame does so, then it is easy to show that he is bound to lose the game itself: the other player simply moves forward and a subgame of size n − 2k is played one step closer to the loser’s home square.

5.9 For a, there are at most 9! games. (This is the number of move sequences that fill up the board, but many wins and losses end before the board is full.) For b–e, Figure S5.4 shows the game tree, with the evaluation function values below the terminal nodes and the backed-up values to the right of the non-terminal nodes. The values imply that the best starting move for X is to take the center. The terminal nodes with a bold outline are the ones that do not need to be evaluated, assuming the optimal ordering.

5.10

a. An upper bound on the number of terminal nodes is N!, one for each ordering of squares, so an upper bound on the total number of nodes is Σ_{i=1}^{N} i!. This is not much
