Instructor’s Manual: Exercise Solutions for
Artificial Intelligence: A Modern Approach
Third Edition (International Version)
Stuart J. Russell and Peter Norvig
with contributions from Ernest Davis, Nicholas J. Hay, and Mehran Sahami
Upper Saddle River Boston Columbus San Francisco New York Indianapolis London Toronto Sydney Singapore Tokyo Montreal Dubai Madrid Hong Kong Mexico City Munich Paris Amsterdam Cape Town
Editor-in-Chief: Michael Hirsch
Executive Editor: Tracy Dunkelberger
Assistant Editor: Melinda Haggerty
Editorial Assistant: Allison Michael
Vice President, Production: Vince O’Brien
Senior Managing Editor: Scott Disanno
Production Editor: Jane Bonnell
Interior Designers: Stuart Russell and Peter Norvig
Copyright © 2010, 2003, 1995 by Pearson Education, Inc., Upper Saddle River, New Jersey 07458. All rights reserved. Manufactured in the United States of America. This publication is protected by Copyright and permissions should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission(s) to use materials from this work, please submit a written request to Pearson Higher Education, Permissions Department, 1 Lake Street, Upper Saddle River, NJ 07458.
The author and publisher of this book have used their best efforts in preparing this book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The author and publisher make no warranty of any kind, expressed or implied, with regard to these programs or the documentation contained in this book. The author and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.
Library of Congress Cataloging-in-Publication Data on File
10 9 8 7 6 5 4 3 2 1
ISBN-13: 978-0-13-606738-2
ISBN-10: 0-13-606738-7
Preface
This Instructor’s Solution Manual provides solutions (or at least solution sketches) for almost all of the 400 exercises in Artificial Intelligence: A Modern Approach (Third Edition). We only give actual code for a few of the programming exercises; writing a lot of code would not be that helpful, if only because we don’t know what language you prefer.
In many cases, we give ideas for discussion and follow-up questions, and we try to explain why we designed each exercise.
There is more supplementary material that we want to offer to the instructor, but we have decided to do it through the medium of the World Wide Web rather than through a CD or printed Instructor’s Manual. The idea is that this solution manual contains the material that must be kept secret from students, but the Web site contains material that can be updated and added to in a more timely fashion. The address for the web site is:
http://aima.cs.berkeley.edu
and the address for the online Instructor’s Guide is:
http://aima.cs.berkeley.edu/instructors.html
There you will find:
• Instructions on how to join the aima-instructors discussion list. We strongly recommend that you join so that you can receive updates, corrections, notification of new versions of this Solutions Manual, additional exercises and exam questions, etc., in a timely manner.
• Source code for programs from the text. We offer code in Lisp, Python, and Java, and point to code developed by others in C++ and Prolog.
• Programming resources and supplemental texts.
• Figures from the text, for making your own slides.
• Terminology from the index of the book.
• Other courses using the book that have home pages on the Web. You can see example syllabi and assignments here. Please do not put solution sets for AIMA exercises on public web pages!
• AI Education information on teaching introductory AI courses.
• Other sites on the Web with information on AI. Organized by chapter in the book; check this for supplemental material.
We welcome suggestions for new exercises, new environments and agents, etc. The book belongs to you, the instructor, as much as us. We hope that you enjoy teaching from it, that these supplemental materials help, and that you will share your supplements and experiences with other instructors.
Solutions for Chapter 1
Introduction
1.1
a. Dictionary definitions of intelligence talk about “the capacity to acquire and apply knowledge” or “the faculty of thought and reason” or “the ability to comprehend and profit from experience.” These are all reasonable answers, but if we want something quantifiable we would use something like “the ability to apply knowledge in order to perform better in an environment.”
b. We define artificial intelligence as the study and construction of agent programs that perform well in a given environment, for a given agent architecture.
c. We define an agent as an entity that takes action in response to percepts from an environment.
d. We define rationality as the property of a system which does the “right thing” given what it knows. See Section 2.2 for a more complete discussion. Both describe perfect rationality, however; see Section 27.3.
e. We define logical reasoning as a process of deriving new sentences from old, such that the new sentences are necessarily true if the old ones are true. (Notice that this does not refer to any specific syntax or formal language, but it does require a well-defined notion of truth.)
1.2
The probability of fooling an interrogator depends on just how unskilled the interrogator is. One entrant in the 2002 Loebner prize competition (which is not quite a real Turing Test) did fool one judge, although if you look at the transcript, it is hard to imagine what that judge was thinking. There certainly have been examples of a chatbot or other online agent fooling humans. For example, see Lenny Foner’s account of the Julia chatbot at foner.www.media.mit.edu/people/foner/Julia/. We’d say the chance today is something like 10%, with the variation depending more on the skill of the interrogator than on the program. In 50 years, we expect that the entertainment industry (movies, video games, commercials) will have made sufficient investments in artificial actors to create very credible impersonators.
See the solution for exercise 26.1 for some discussion of potential objections.
1.3 Yes, they are rational, because slower, deliberative actions would tend to result in more damage to the hand. If “intelligent” means “applying knowledge” or “using thought and reasoning” then it does not require intelligence to make a reflex action.
1.4 No. IQ test scores correlate well with certain other measures, such as success in college, ability to make good decisions in complex, real-world situations, ability to learn new skills and subjects quickly, and so on, but only if they’re measuring fairly normal humans. The IQ test doesn’t measure everything. A program that is specialized only for IQ tests (and specialized further only for the analogy part) would very likely perform poorly on other measures of intelligence. Consider the following analogy: if a human runs the 100 m in 10 seconds, we might describe him or her as very athletic and expect competent performance in other areas such as walking, jumping, hurdling, and perhaps throwing balls; but we would not describe a Boeing 747 as very athletic because it can cover 100 m in 0.4 seconds, nor would we expect it to be good at hurdling and throwing balls.
Even for humans, IQ tests are controversial because of their theoretical presuppositions about innate ability (distinct from training effects) and the generalizability of results. See The Mismeasure of Man by Stephen Jay Gould, Norton, 1981 or Multiple intelligences: the theory in practice by Howard Gardner, Basic Books, 1993 for more on IQ tests, what they measure, and what other aspects there are to “intelligence.”
1.5 In order of magnitude figures, the computational power of the computer is 100 times larger.
1.6 Just as you are unaware of all the steps that go into making your heart beat, you are also unaware of most of what happens in your thoughts. You do have a conscious awareness of some of your thought processes, but the majority remains opaque to your consciousness. The field of psychoanalysis is based on the idea that one needs trained professional help to analyze one’s own thoughts.
1.7
• Although bar code scanning is in a sense computer vision, these are not AI systems. The problem of reading a bar code is an extremely limited and artificial form of visual interpretation, and it has been carefully designed to be as simple as possible, given the hardware.
• In many respects. The problem of determining the relevance of a web page to a query is a problem in natural language understanding, and the techniques are related to those we will discuss in Chapters 22 and 23. Search engines like Ask.com, which group the retrieved pages into categories, use clustering techniques analogous to those we discuss in Chapter 20. Likewise, other functionalities provided by search engines use intelligent techniques; for instance, the spelling corrector uses a form of data mining based on observing users’ corrections of their own spelling errors. On the other hand, the problem of indexing billions of web pages in a way that allows retrieval in seconds is a problem in database design, not in artificial intelligence.
• To a limited extent. Such menus tend to use vocabularies which are very limited (e.g. the digits, “Yes”, and “No”) and within the designers’ control, which greatly simplifies the problem. On the other hand, the programs must deal with an uncontrolled space of all kinds of voices and accents. The voice activated directory assistance programs used by telephone companies, which must deal with a large and changing vocabulary, are certainly AI programs.
• This is borderline. There is something to be said for viewing these as intelligent agents working in cyberspace. The task is sophisticated, the information available is partial, the techniques are heuristic (not guaranteed optimal), and the state of the world is dynamic. All of these are characteristic of intelligent activities. On the other hand, the task is very far from those normally carried out in human cognition.
1.8
Presumably the brain has evolved so as to carry out these operations on visual images, but the mechanism is only accessible for one particular purpose in this particular cognitive task of image processing. Until about two centuries ago there was no advantage in people (or animals) being able to compute the convolution of a Gaussian for any other purpose.
The really interesting question here is what we mean by saying that the “actual person” can do something. The person can see, but he cannot compute the convolution of a Gaussian; but computing that convolution is part of seeing. This is beyond the scope of this solution manual.
1.9
Evolution tends to perpetuate organisms (and combinations and mutations of organisms) that are successful enough to reproduce. That is, evolution favors organisms that can optimize their performance measure to at least survive to the age of sexual maturity, and then be able to win a mate. Rationality just means optimizing a performance measure, so this is in line with evolution.
1.10 This question is intended to be about the essential nature of the AI problem and what is required to solve it, but could also be interpreted as a sociological question about the current practice of AI research.
A science is a field of study that leads to the acquisition of empirical knowledge by the scientific method, which involves falsifiable hypotheses about what is. A pure engineering field can be thought of as taking a fixed base of empirical knowledge and using it to solve problems of interest to society. Of course, engineers do bits of science (e.g., they measure the properties of building materials) and scientists do bits of engineering to create new devices and so on.
As described in Section 1.1, the “human” side of AI is clearly an empirical science (called cognitive science these days) because it involves psychological experiments designed to find out how human cognition actually works. What about the “rational” side?
If we view it as studying the abstract relationship among an arbitrary task environment, a computing device, and the program for that computing device that yields the best performance in the task environment, then the rational side of AI is really mathematics and engineering; it does not require any empirical knowledge about the actual world, and the actual task environment, that we inhabit; that a given program will do well in a given environment is a theorem. (The same is true of pure decision theory.) In practice, however, we are interested in task environments that do approximate the actual world, so even the rational side of AI involves finding out what the actual world is like. For example, in studying rational agents that communicate, we are interested in task environments that contain humans, so we have to find out what human language is like. In studying perception, we tend to focus on sensors such as cameras that extract useful information from the actual world. (In a world without light, cameras wouldn’t be much use.) Moreover, to design vision algorithms that are good at extracting information from camera images, we need to understand the actual world that generates those images. Obtaining the required understanding of scene characteristics, object types, surface markings, and so on is a quite different kind of science from ordinary physics, chemistry, biology, and so on, but it is still science.
In summary, AI is definitely engineering but it would not be especially useful to us if it were not also an empirical science concerned with those aspects of the real world that affect the design of intelligent systems for that world.
1.11 This depends on your definition of “intelligent” and “tell.” In one sense computers only do what the programmers command them to do, but in another sense what the programmer consciously tells the computer to do often has very little to do with what the computer actually does. Anyone who has written a program with an ornery bug knows this, as does anyone who has written a successful machine learning program. So in one sense Samuel “told” the computer “learn to play checkers better than I do, and then play that way,” but in another sense he told the computer “follow this learning algorithm” and it learned to play. So we’re left in the situation where you may or may not consider learning to play checkers to be a sign of intelligence (or you may think that learning to play in the right way requires intelligence, but not in this way), and you may think the intelligence resides in the programmer or in the computer.
1.12 The point of this exercise is to notice the parallel with the previous one. Whatever you decided about whether computers could be intelligent in 1.11, you are committed to making the same conclusion about animals (including humans), unless your reasons for deciding whether something is intelligent take into account the mechanism (programming via genes versus programming via a human programmer). Note that Searle makes this appeal to mechanism in his Chinese Room argument (see Chapter 26).
1.13 Again, the choice you make in 1.11 drives your answer to this question.
1.14
a. (ping-pong) A reasonable level of proficiency was achieved by Andersson’s robot (Andersson, 1988).
b. (driving in Cairo) No. Although there has been a lot of progress in automated driving, all such systems currently rely on certain relatively constant clues: that the road has shoulders and a center line, that the car ahead will travel a predictable course, that cars will keep to their side of the road, and so on. Some lane changes and turns can be made on clearly marked roads in light to moderate traffic. Driving in downtown Cairo is too unpredictable for any of these to work.
c. (driving in Victorville, California) Yes, to some extent, as demonstrated in DARPA’s Urban Challenge. Some of the vehicles managed to negotiate streets, intersections, well-behaved traffic, and well-behaved pedestrians in good visual conditions.
d. (shopping at the market) No. No robot can currently put together the tasks of moving in a crowded environment, using vision to identify a wide variety of objects, and grasping the objects (including squishable vegetables) without damaging them. The component pieces are nearly able to handle the individual tasks, but it would take a major integration effort to put it all together.
e. (shopping on the web) Yes. Software robots are capable of handling such tasks, particularly if the design of the web grocery shopping site does not change radically over time.
f. (bridge) Yes. Programs such as GIB now play at a solid level.
g. (theorem proving) Yes. For example, the proof of Robbins algebra described on page 360.
h. (funny story) No. While some computer-generated prose and poetry is hysterically funny, this is invariably unintentional, except in the case of programs that echo back prose that they have memorized.
i. (legal advice) Yes, in some cases. AI has a long history of research into applications of automated legal reasoning. Two outstanding examples are the Prolog-based expert systems used in the UK to guide members of the public in dealing with the intricacies of the social security and nationality laws. The social security system is said to have saved the UK government approximately $150 million in its first year of operation. However, extension into more complex areas such as contract law awaits a satisfactory encoding of the vast web of common-sense knowledge pertaining to commercial transactions and agreement and business practices.
j. (translation) Yes. In a limited way, this is already being done. See Kay, Gawron and Norvig (1994) and Wahlster (2000) for an overview of the field of speech translation, and some limitations on the current state of the art.
k. (surgery) Yes. Robots are increasingly being used for surgery, although always under the command of a doctor. Robotic skills demonstrated at superhuman levels include drilling holes in bone to insert artificial joints, suturing, and knot-tying. They are not yet capable of planning and carrying out a complex operation autonomously from start to finish.
1.15
• DARPA Grand Challenge for Robotic Cars In 2004 the Grand Challenge was a 240 km race through the Mojave Desert. It clearly stressed the state of the art of autonomous driving, and in fact no competitor finished the race. The best team, CMU, completed only 12 of the 240 km. In 2005 the race featured a 212 km course with fewer curves and wider roads than the 2004 race. Five teams finished, with Stanford finishing first, edging out two CMU entries. This was hailed as a great achievement for robotics and for the Challenge format. In 2007 the Urban Challenge put cars in a city setting, where they had to obey traffic laws and avoid other cars. This time CMU edged out Stanford.
The progress made in these contests is a matter of fact, but the impact of that progress is a matter of opinion.
The competition appears to have been a good testing ground to put theory into practice, something that the failures of 2004 showed was needed. But it is important that the competition was done at just the right time, when there was theoretical work to consolidate, as demonstrated by the earlier work by Dickmanns (whose VaMP car drove autonomously for 158 km in 1995) and by Pomerleau (whose Navlab car drove 5000 km across the USA, also in 1995, with the steering controlled autonomously for 98% of the trip, although the brakes and accelerator were controlled by a human driver).
• International Planning Competition In 1998, five planners competed: Blackbox, HSP, IPP, SGP, and STAN. The result page (ftp://ftp.cs.yale.edu/pub/mcdermott/aipscomp-results.html) stated “all of these planners performed very well, compared to the state of the art a few years ago.” Most plans found were 30 or 40 steps, with some over 100 steps. In 2008, the competition had expanded quite a bit: there were more tracks (satisficing vs. optimizing; sequential vs. temporal; static vs. learning). There were about 25 planners, including submissions from the 1998 groups (or their descendants) and new groups. Solutions found were much longer than in 1998. In sum, the field has progressed quite a bit in participation, in breadth, and in power of the planners. In the 1990s it was possible to publish a Planning paper that discussed only a theoretical approach; now it is necessary to show quantitative evidence of the efficacy of an approach. The field is stronger and more mature now, and it seems that the planning competition deserves some of the credit. However, some researchers feel that too much emphasis is placed on the particular classes of problems that appear in the competitions, and not enough on real-world applications.
• Robocup Robotics Soccer This competition has proved extremely popular, attracting 407 teams from 43 countries in 2009 (up from 38 teams from 11 countries in 1997). The robotic platform has advanced to a more capable humanoid form, and the strategy and tactics have advanced as well. Although the competition has spurred innovations in distributed control, the winning teams in recent years have relied more on individual ball-handling skills than on advanced teamwork. The competition has served to increase interest and participation in robotics, although it is not clear how well they are advancing towards the goal of defeating a human team by 2050.
• TREC Information Retrieval Conference This is one of the oldest competitions, started in 1992. The competitions have served to bring together a community of researchers, have led to a large literature of publications, and have seen progress in participation and in quality of results over the years. In the early years, TREC served its purpose as a place to do evaluations of retrieval algorithms on text collections that were large for the time. However, starting around 2000 TREC became less relevant as the advent of the World Wide Web created a corpus that was available to anyone and was much larger than anything TREC had created, and the development of commercial search engines surpassed academic research.
• NIST Open Machine Translation Evaluation This series of evaluations (explicitly not labelled a “competition”) has existed since 2001. Since then we have seen great advances in Machine Translation quality as well as in the number of languages covered. The dominant approach has switched from one based on grammatical rules to one that relies primarily on statistics. The NIST evaluations seem to track these changes well, but don’t appear to be driving the changes.
Overall, we see that whatever you measure is bound to increase over time. For most of these competitions, the measurement was a useful one, and the state of the art has progressed. In the case of ICAPS, some planning researchers worry that too much attention has been lavished on the competition itself. In some cases, progress has left the competition behind, as in TREC, where the resources available to commercial search engines outpaced those available to academic researchers. In this case the TREC competition was useful (it helped train many of the people who ended up in commercial search engines) and in no way drew energy away from new ideas.
Solutions for Chapter 2
Intelligent Agents
2.1 This question tests the student’s understanding of environments, rational actions, and performance measures. Any sequential environment in which rewards may take time to arrive will work, because then we can arrange for the reward to be “over the horizon.” Suppose that in any state there are two action choices, a and b, and consider two cases: the agent is in state s at time T or at time T − 1. In state s, action a reaches state s′ with reward 0, while action b reaches state s again with reward 1; in s′ either action gains reward 10. At time T − 1, it’s rational to do a in s, with expected total reward 10 before time is up; but at time T, it’s rational to do b with total expected reward 1 because the reward of 10 cannot be obtained before time is up.
Students may also provide common-sense examples from real life: investments whose payoff occurs after the end of life, exams where it doesn’t make sense to start the high-value question with too little time left to get the answer, and so on.
The environment state can include a clock, of course; this doesn’t change the gist of the answer (now the action will depend on the clock as well as on the non-clock part of the state), but it does mean that the agent can never be in the same state twice.
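The two-state example in 2.1 can be checked with a small finite-horizon calculation. The following is only a sketch under stated assumptions: the names TRANSITIONS and best_action are hypothetical, and "time T" is taken to mean that exactly one action remains before time is up (so "time T − 1" means two actions remain).

```python
# Hypothetical encoding of the environment described in the answer to 2.1.
# (state, action) -> (next_state, reward)
TRANSITIONS = {
    ("s", "a"): ("s_prime", 0),
    ("s", "b"): ("s", 1),
    ("s_prime", "a"): ("s_prime", 10),
    ("s_prime", "b"): ("s_prime", 10),
}
ACTIONS = ("a", "b")


def best_action(state, steps_left):
    """Return (best total reward, best first action) with steps_left actions remaining."""
    if steps_left == 0:
        return 0, None
    best = None
    for action in ACTIONS:
        next_state, reward = TRANSITIONS[(state, action)]
        future, _ = best_action(next_state, steps_left - 1)
        total = reward + future
        # Keep the first action that achieves the highest total.
        if best is None or total > best[0]:
            best = (total, action)
    return best


# Assumption: one action left corresponds to "time T", two to "time T - 1".
at_time_T = best_action("s", 1)
at_time_T_minus_1 = best_action("s", 2)
```

Under these assumptions the same state s calls for a different action depending only on how much time is left, which is the point of the exercise.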
2.2 Notice that for our simple environmental assumptions we need not worry about quantitative uncertainty.
a. It suffices to show that for all possible actual environments (i.e., all dirt distributions and initial locations), this agent cleans the squares at least as fast as any other agent. This is trivially true when there is no dirt. When there is dirt in the initial location and none in the other location, the world is clean after one step; no agent can do better. When there is no dirt in the initial location but dirt in the other, the world is clean after two steps; no agent can do better. When there is dirt in both locations, the world is clean after three steps; no agent can do better. (Note: in general, the condition stated in the first sentence of this answer is much stricter than necessary for an agent to be rational.)
b. The agent in (a) keeps moving backwards and forwards even after the world is clean. It is better to do NoOp once the world is clean (the chapter says this). Now, since the agent’s percept doesn’t say whether the other square is clean, it would seem that the agent must have some memory to say whether the other square has already been cleaned. To make this argument rigorous is more difficult; for example, could the agent arrange things so that it would only be in a clean left square when the right square was already clean? As a general strategy, an agent can use the environment itself as a form of external memory, a common technique for humans who use things like appointment calendars and knots in handkerchiefs. In this particular case, however, that is not possible. Consider the reflex actions for [A, Clean] and [B, Clean]. If either of these is NoOp, then the agent will fail in the case where that is the initial percept but the other square is dirty; hence, neither can be NoOp and therefore the simple reflex agent is doomed to keep moving. In general, the problem with reflex agents is that they have to do the same thing in situations that look the same, even when the situations are actually quite different. In the vacuum world this is a big liability, because every interior square (except home) looks either like a square with dirt or a square without dirt.
c. If we consider asymptotically long lifetimes, then it is clear that learning a map (in some form) confers an advantage because it means that the agent can avoid bumping into walls. It can also learn where dirt is most likely to accumulate and can devise an optimal inspection strategy. The precise details of the exploration method needed to construct a complete map appear in Chapter 4; methods for deriving an optimal inspection/cleanup strategy are in Chapter 21.
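The case analysis in part (a) can be reproduced by simulation. This is a minimal sketch, assuming a two-square world with squares named "A" and "B" and a deterministic Suck action; the function names are illustrative and are not taken from the book's code repository.

```python
def reflex_vacuum_agent(percept):
    """Simple reflex agent: suck if dirty, otherwise move to the other square."""
    location, status = percept
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"


def steps_until_clean(location, dirty_squares):
    """Count the actions the agent takes before both squares are clean."""
    dirty = set(dirty_squares)
    steps = 0
    while dirty:
        status = "Dirty" if location in dirty else "Clean"
        action = reflex_vacuum_agent((location, status))
        if action == "Suck":
            dirty.discard(location)
        elif action == "Right":
            location = "B"
        else:  # "Left"
            location = "A"
        steps += 1
    return steps


# The four dirt distributions, starting in square A.
results = {
    "no dirt": steps_until_clean("A", []),
    "dirt here only": steps_until_clean("A", ["A"]),
    "dirt in other only": steps_until_clean("A", ["B"]),
    "dirt in both": steps_until_clean("A", ["A", "B"]),
}
```

Starting in square B gives the same counts by symmetry. The simulation stops as soon as the world is clean, so it says nothing about the wasted motion discussed in part (b).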
2.3
a. An agent that senses only partial information about the state cannot be perfectly rational.
False. Perfect rationality refers to the ability to make good decisions given the sensor information received.
b. There exist task environments in which no pure reflex agent can behave rationally.
True. A pure reflex agent ignores previous percepts, so cannot obtain an optimal state estimate in a partially observable environment. For example, correspondence chess is played by sending moves; if the other player’s move is the current percept, a reflex agent could not keep track of the board state and would have to respond to, say, “a4” in the same way regardless of the position in which it was played.
c. There exists a task environment in which every agent is rational.
True. For example, in an environment with a single state, such that all actions have the same reward, it doesn’t matter which action is taken. More generally, any environment that is reward-invariant under permutation of the actions will satisfy this property.
d. The input to an agent program is the same as the input to the agent function.
False. The agent function, notionally speaking, takes as input the entire percept sequence up to that point, whereas the agent program takes the current percept only.
e. Every agent function is implementable by some program/machine combination.
False. For example, the environment may contain Turing machines and input tapes and the agent’s job is to solve the halting problem; there is an agent function that specifies the right answers, but no agent program can implement it. Another example would be an agent function that requires solving intractable problem instances of arbitrary size in constant time.
f. Suppose an agent selects its action uniformly at random from the set of possible actions. There exists a deterministic task environment in which this agent is rational.
True. This is a special case of (c); if it doesn’t matter which action you take, selecting randomly is rational.
g. It is possible for a given agent to be perfectly rational in two distinct task environments.
True. For example, we can arbitrarily modify the parts of the environment that are unreachable by any optimal policy as long as they stay unreachable.
h. Every agent is rational in an unobservable environment.
False. Some actions are stupid (and the agent may know this if it has a model of the environment) even if one cannot perceive the environment state.
i. A perfectly rational poker-playing agent never loses.
False. Unless it draws the perfect hand, the agent can always lose if an opponent has better cards. This can happen for game after game. The correct statement is that the agent’s expected winnings are nonnegative.
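The distinction in item (d) between the agent function (defined on whole percept sequences) and the agent program (called with one percept at a time) can be illustrated as follows. This is a sketch only: make_table_driven_agent and example_agent_function are hypothetical names, and the example function assumes a two-square vacuum world in which dirt never reappears.

```python
def example_agent_function(percept_sequence):
    """Hypothetical agent function: maps an entire percept sequence to an action."""
    location, status = percept_sequence[-1]
    if status == "Dirty":
        return "Suck"
    # Stop once both squares have been observed clean at some point.
    seen_clean = {loc for loc, st in percept_sequence if st == "Clean"}
    if seen_clean == {"A", "B"}:
        return "NoOp"
    return "Right" if location == "A" else "Left"


def make_table_driven_agent(agent_function):
    """Wrap an agent function in a program that receives one percept per call."""
    percepts = []  # internal state: the percept history so far

    def program(percept):
        percepts.append(percept)
        return agent_function(tuple(percepts))

    return program


program = make_table_driven_agent(example_agent_function)
first_action = program(("A", "Clean"))   # history: [A, Clean]
second_action = program(("B", "Clean"))  # history: [A, Clean], [B, Clean]
```

The program sees only the current percept on each call; it is the stored history that lets it reproduce a function of the full sequence, which a pure reflex agent cannot do.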
2.4 Many of these can actually be argued either way, depending on the level of detail and abstraction.
A. Partially observable, stochastic, sequential, dynamic, continuous, multi-agent.
B. Partially observable, stochastic, sequential, dynamic, continuous, single agent (unless there are alien life forms that are usefully modeled as agents).
C. Partially observable, deterministic, sequential, static, discrete, single agent. This can be multi-agent and dynamic if we buy books via auction, or dynamic if we purchase on a long enough scale that book offers change.
D. Fully observable, stochastic, episodic (every point is separate), dynamic, continuous, multi-agent.
E. Fully observable, stochastic, episodic, dynamic, continuous, single agent.
F. Fully observable, stochastic, sequential, static, continuous, single agent.
G. Fully observable, deterministic, sequential, static, continuous, single agent.
H. Fully observable, strategic, sequential, static, discrete, multi-agent.
2.5 The following are just some of the many possible definitions that can be written:
• Agent: an entity that perceives and acts; or, one that can be viewed as perceiving and acting. Essentially any object qualifies; the key point is the way the object implements an agent function. (Note: some authors restrict the term to programs that operate on behalf of a human, or to programs that can cause some or all of their code to run on other machines on a network, as in mobile agents.)
• Agent function: a function that specifies the agent’s action in response to every possible percept sequence.
• Agent program: that program which, combined with a machine architecture, implements an agent function. In our simple designs, the program takes a new percept on each invocation and returns an action.
• Rationality: a property of agents that choose actions that maximize their expected utility, given the percepts to date.
• Autonomy: a property of agents whose behavior is determined by their own experience rather than solely by their initial programming.
• Reflex agent: an agent whose action depends only on the current percept.
• Model-based agent: an agent whose action is derived directly from an internal model of the current world state that is updated over time.
• Goal-based agent: an agent that selects actions that it believes will achieve explicitly represented goals.
• Utility-based agent: an agent that selects actions that it believes will maximize the expected utility of the outcome state.
• Learning agent: an agent whose behavior improves over time based on its experience.
2.6 Although these questions are very simple, they hint at some very fundamental issues. Our answers are for the simple agent designs for static environments where nothing happens while the agent is deliberating; the issues get even more interesting for dynamic environments.
a. Yes; take any agent program and insert null statements that do not affect the output.
b. Yes; the agent function might specify that the agent print true when the percept is a Turing machine program that halts, and false otherwise. (Note: in dynamic environments, for machines of less than infinite speed, the rational agent function may not be implementable; e.g., the agent function that always plays a winning move, if any, in a game of chess.)
c. Yes; the agent’s behavior is fixed by the architecture and program.
d. There are 2^n agent programs, although many of these will not run at all. (Note: Any given program can devote at most n bits to storage, so its internal state can distinguish among only 2^n past histories. Because the agent function specifies actions based on percept histories, there will be many agent functions that cannot be implemented because of lack of memory in the machine.)
e. It depends on the program and the environment. If the environment is dynamic, speeding up the machine may mean choosing different (perhaps better) actions and/or acting sooner. If the environment is static and the program pays no attention to the passage of elapsed time, the agent function is unchanged.
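Part (a) can be made concrete with two syntactically different programs that implement the same agent function. The sketch below assumes the two-square vacuum world and stateless (reflex) programs, so agreeing on every single percept is enough to show that they agree on every percept sequence; all names are illustrative.

```python
from itertools import product


def program_one(percept):
    """Reflex vacuum agent written with conditionals."""
    location, status = percept
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"


def program_two(percept):
    """Same behavior, written as a lookup table plus a statement with no effect."""
    _unused = 0  # a "null statement": it does not affect the output
    table = {
        ("A", "Dirty"): "Suck",
        ("B", "Dirty"): "Suck",
        ("A", "Clean"): "Right",
        ("B", "Clean"): "Left",
    }
    return table[percept]


# Enumerate every possible percept in this small world.
ALL_PERCEPTS = list(product(("A", "B"), ("Clean", "Dirty")))
same_agent_function = all(program_one(p) == program_two(p) for p in ALL_PERCEPTS)
```

For programs that keep internal state, the comparison would have to range over percept sequences rather than single percepts.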
2.7
The design of goal- and utility-based agents depends on the structure of the task environment. The simplest such agents, for example those in chapters 3 and 10, compute the agent’s entire future sequence of actions in advance before acting at all. This strategy works for static and deterministic environments which are either fully-known or unobservable.
For fully-observable and fully-known static environments a policy can be computed in advance which gives the action to be taken in any given state.
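As an illustration of computing the whole action sequence before acting, here is a minimal sketch for a deterministic, fully known two-square vacuum world. Breadth-first search is used only as one convenient planner; it is not claimed to be the method of chapters 3 or 10. The state encoding (location, frozenset of dirty squares) and the names result and plan are assumptions made for this example.

```python
from collections import deque


def result(state, action):
    """Deterministic transition model for the hypothetical two-square world."""
    location, dirty = state
    if action == "Suck":
        return (location, dirty - {location})
    if action == "Right":
        return ("B", dirty)
    if action == "Left":
        return ("A", dirty)
    raise ValueError(f"unknown action: {action}")


def plan(start, goal_test, actions=("Suck", "Left", "Right")):
    """Breadth-first search; returns a shortest action sequence, or None."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, path = frontier.popleft()
        if goal_test(state):
            return path
        for action in actions:
            next_state = result(state, action)
            if next_state not in visited:
                visited.add(next_state)
                frontier.append((next_state, path + [action]))
    return None


def all_clean(state):
    """Goal test: no dirty squares remain."""
    return not state[1]


# Agent starts in A with both squares dirty; the plan is computed up front.
full_plan = plan(("A", frozenset({"A", "B"})), all_clean)
```

Once full_plan is computed the agent can execute it without consulting its percepts, which is why this strategy only suits static, deterministic environments.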
11
function GOAL-BASED-AGENT(percept) returns anaction persistent: state,theagent’scurrentconceptionoftheworldstate model,adescriptionofhowthenextstatedependsoncurrentstate andaction goal,adescriptionofthedesiredgoalstate plan,asequenceofactionstotake,initiallyempty action,themostrecentaction,initiallynone
state ← UPDATE-STATE(state, action, percept, model ) if GOAL-ACHIEVED(state,goal) thenreturn anullaction if plan isempty then plan ← PLAN(state,goal,model) action ← FIRST(plan) plan ← REST(plan) return action
Forpartially-observableenvironmentstheagentcancomputeaconditionalplan,which specifiesthesequenceofactionstotakeasafunctionoftheagent’sperception.Intheextreme,aconditionalplangivestheagent’sresponsetoeverycontingency,andsoitisarepresentationoftheentireagentfunction.
Inallcasesitmaybeeitherintractableortooexpensivetocomputeeverythingoutin advance.Insteadofaconditionalplan,itmaybebettertocomputeasinglesequenceof actionswhichislikelytoreachthegoal,thenmonitortheenvironmenttocheckwhetherthe planissucceeding,repairingorreplanningifitisnot.Itmaybeevenbettertocomputeonly thestartofthisplanbeforetakingthefirstaction,continuingtoplanatlatertimesteps.
Pseudocodeforsimplegoal-basedagentisgiveninFigureS2.1.GOAL-ACHIEVED teststoseewhetherthecurrentstatesatisfiesthegoalornot,doingnothingifitdoes.PLAN computesasequenceofactionstotaketoachievethegoal.Thismightreturnonlyaprefix ofthefullplan,therestwillbecomputedaftertheprefixisexecuted.Thisagentwillactto maintainthegoal:ifatanypointthegoalisnotsatisfieditwill(eventually)replantoachieve thegoalagain.
Atthislevelofabstractiontheutility-basedagentisnotmuchdifferentthanthegoalbasedagent,exceptthatactionmaybecontinuouslyrequired(thereisnotnecessarilyapoint wheretheutilityfunctionis“satisfied”).PseudocodeisgiveninFigureS2.2.
Figure S2.1 A goal-based agent.

function UTILITY-BASED-AGENT(percept) returns an action
  persistent: state, the agent’s current conception of the world state
              model, a description of how the next state depends on current state and action
              utility function, a description of the agent’s utility function
              plan, a sequence of actions to take, initially empty
              action, the most recent action, initially none

  state ← UPDATE-STATE(state, action, percept, model)
  if plan is empty then
      plan ← PLAN(state, utility function, model)
  action ← FIRST(plan)
  plan ← REST(plan)
  return action

2.8 The file "agents/environments/vacuum.lisp" in the code repository implements the vacuum-cleaner environment. Students can easily extend it to generate different shaped rooms, obstacles, and so on.
2.9 A reflex agent program implementing the rational agent function described in the chapter is as follows:
(defun reflex-rational-vacuum-agent (percept)
  (destructuring-bind (location status) percept
    (cond ((eq status 'Dirty) 'Suck)
          ((eq location 'A) 'Right)
          (t 'Left))))
For states 1, 3, 5, 7 in Figure 4.9, the performance measures are 1996, 1999, 1998, 2000 respectively.
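These four numbers can be reproduced with a short simulation. The sketch below rests on assumptions that are not stated in this answer: a lifetime of 1000 time steps, one point per clean square counted at the start of each step, and a reading of Figure 4.9 in which states 1, 3, 5, 7 all have the agent in square A with dirt in (A and B), (A only), (B only), and (neither) respectively. The function and variable names are illustrative.

```python
def reflex_vacuum_agent(percept):
    """Same decision rule as the Lisp agent above."""
    location, status = percept
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"


def performance(location, dirty_a, dirty_b, steps=1000):
    """Score one point per clean square at the start of every time step."""
    dirty = {"A": dirty_a, "B": dirty_b}
    score = 0
    for _ in range(steps):
        score += sum(1 for square in "AB" if not dirty[square])
        status = "Dirty" if dirty[location] else "Clean"
        action = reflex_vacuum_agent((location, status))
        if action == "Suck":
            dirty[location] = False
        elif action == "Right":
            location = "B"
        else:
            location = "A"
    return score


# Assumed numbering of the Figure 4.9 states: (agent location, A dirty, B dirty).
STATES = {
    1: ("A", True, True),
    3: ("A", True, False),
    5: ("A", False, True),
    7: ("A", False, False),
}

scores = {k: performance(*v) for k, v in STATES.items()}
```

Under these assumptions `scores` matches the figures quoted above; a different scoring convention (for example, counting after the action) would shift each total by a small constant.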
2.10
a. No; see answer to 2.4(b).
b. See answer to 2.4(b).
c. In this case, a simple reflex agent can be perfectly rational. The agent can consist of a table with eight entries, indexed by percept, that specifies an action to take for each possible state. After the agent acts, the world is updated and the next percept will tell the agent what to do next. For larger environments, constructing a table is infeasible. Instead, the agent could run one of the optimal search algorithms in Chapters 3 and 4 and execute the first step of the solution sequence. Again, no internal state is required, but it would help to be able to store the solution sequence instead of recomputing it for each new percept.
2.11
a. Because the agent does not know the geography and perceives only location and local dirt, and cannot remember what just happened, it will get stuck forever against a wall when it tries to move in a direction that is blocked; that is, unless it randomizes.
b. One possible design cleans up dirt and otherwise moves randomly:
(defun randomized-reflex-vacuum-agent (percept)
  (destructuring-bind (location status) percept
    (cond ((eq status 'Dirty) 'Suck)
          (t (random-element '(Left Right Up Down))))))
Figure S2.2 A utility-based agent.
This is fairly close to what the Roomba™ vacuum cleaner does (although the Roomba has a bump sensor and randomizes only when it hits an obstacle). It works reasonably well in nice, compact environments. In maze-like environments or environments with small connecting passages, it can take a very long time to cover all the squares.
c. An example is shown in Figure S2.3. Students may also wish to measure clean-up time for linear or square environments of different sizes, and compare those to the efficient online search algorithms described in Chapter 4.
d. A reflex agent with state can build a map (see Chapter 4 for details). An online depth-first exploration will reach every state in time linear in the size of the environment; therefore, the agent can do much better than the simple reflex agent.
The question of rational behavior in unknown environments is a complex one but it is worth encouraging students to think about it. We need to have some notion of the prior probability distribution over the class of environments; call this the initial belief state. Any action yields a new percept that can be used to update this distribution, moving the agent to a new belief state. Once the environment is completely explored, the belief state collapses to a single possible environment. Therefore, the problem of optimal exploration can be viewed as a search for an optimal strategy in the space of possible belief states. This is a well-defined, if horrendously intractable, problem. Chapter 21 discusses some cases where optimal exploration is possible. Another concrete example of exploration is the Minesweeper computer game (see Exercise 7.22). For very small Minesweeper environments, optimal exploration is feasible although the belief state update is nontrivial to explain.
Figure S2.3 An environment in which random motion will take a long time to cover all the squares.
2.12 The problem appears at first to be very similar; the main difference is that instead of using the location percept to build the map, the agent has to “invent” its own locations (which, after all, are just nodes in a data structure representing the state space graph). When a bump is detected, the agent assumes it remains in the same location and can add a wall to its map. For grid environments, the agent can keep track of its (x, y) location and so can tell when it has returned to an old state. In the general case, however, there is no simple way to tell if a state is new or old.
2.13
a. For a reflex agent, this presents no additional challenge, because the agent will continue to Suck as long as the current location remains dirty. For an agent that constructs a sequential plan, every Suck action would need to be replaced by “Suck until clean.” If the dirt sensor can be wrong on each step, then the agent might want to wait for a few steps to get a more reliable measurement before deciding whether to Suck or move on to a new square. Obviously, there is a trade-off because waiting too long means that dirt remains on the floor (incurring a penalty), but acting immediately risks either dirtying a clean square or ignoring a dirty square (if the sensor is wrong). A rational agent must also continue touring and checking the squares in case it missed one on a previous tour (because of bad sensor readings). It is not immediately obvious how the waiting time at each square should change with each new tour. These issues can be clarified by experimentation, which may suggest a general trend that can be verified mathematically. This problem is a partially observable Markov decision process (see Chapter 17). Such problems are hard in general, but some special cases may yield to careful analysis.
b. In this case, the agent must keep touring the squares indefinitely. The probability that a square is dirty increases monotonically with the time since it was last cleaned, so the rational strategy is, roughly speaking, to repeatedly execute the shortest possible tour of all squares. (We say “roughly speaking” because there are complications caused by the fact that the shortest tour may visit some squares twice, depending on the geography.) This problem is also a partially observable Markov decision process.
Solutions for Chapter 3
Solving Problems by Searching
3.1 In goal formulation, we decide which aspects of the world we are interested in, and which can be ignored or abstracted away. Then in problem formulation we decide how to manipulate the important aspects (and ignore the others). If we did problem formulation first we would not know what to include and what to leave out. That said, it can happen that there is a cycle of iterations between goal formulation, problem formulation, and problem solving until one arrives at a sufficiently useful and efficient solution.
3.2
a. We’ll define the coordinate system so that the center of the maze is at (0, 0), and the maze itself is a square from (−1, −1) to (1, 1).
Initial state: robot at coordinate (0, 0), facing North.
Goal test: either |x| > 1 or |y| > 1 where (x, y) is the current location.
Successor function: move forwards any distance d; change the direction the robot is facing.
Cost function: total distance moved.
The state space is infinitely large, since the robot’s position is continuous.
b. The state will record the intersection the robot is currently at, along with the direction it’s facing. At the end of each corridor leaving the maze we will have an exit node. We’ll assume some node corresponds to the center of the maze.
Initial state: at the center of the maze facing North.
Goal test: at an exit node.
Successor function: move to the next intersection in front of us, if there is one; turn to face a new direction.
Cost function: total distance moved.
There are 4n states, where n is the number of intersections.
c. Initial state: at the center of the maze.
Goal test: at an exit node.
Successor function: move to next intersection to the North, South, East, or West.
Cost function: total distance moved.
We no longer need to keep track of the robot’s orientation since it is irrelevant to predicting the outcome of our actions, and not part of the goal test. The motor system that executes this plan will need to keep track of the robot’s current orientation, to know when to rotate the robot.
d. State abstractions:
(i) Ignoring the height of the robot off the ground, whether it is tilted off the vertical.
(ii) The robot can face in only four directions.
(iii) Other parts of the world ignored: possibility of other robots in the maze, the weather in the Caribbean.
Action abstractions:
(i) We assumed all positions were safely accessible: the robot couldn’t get stuck or damaged.
(ii) The robot can move as far as it wants, without having to recharge its batteries.
(iii) Simplified movement system: moving forwards a certain distance, rather than controlling each individual motor and watching the sensors to detect collisions.
3.3
a. State space: States are all possible city pairs (i, j). The map is not the state space.
Successor function: The successors of (i, j) are all pairs (x, y) such that Adjacent(x, i) and Adjacent(y, j).
Goal: Be at (i, i) for some i.
Step cost function: The cost to go from (i, j) to (x, y) is max(d(i, x), d(j, y)).
b. In the best case, the friends head straight for each other in steps of equal size, reducing their separation by twice the time cost on each step. Hence (iii) is admissible.
c. Yes: e.g., a map with two nodes connected by one link. The two friends will swap places forever. The same will happen on any chain if they start an odd number of steps apart. (One can see this best on the graph that represents the state space, which has two disjoint sets of nodes.) The same even holds for a grid of any size or shape, because every move changes the Manhattan distance between the two friends by 0 or 2.
d. Yes: take any of the unsolvable maps from part (c) and add a self-loop to any one of the nodes. If the friends start an odd number of steps apart, a move in which one of the friends takes the self-loop changes the distance by 1, rendering the problem solvable. If the self-loop is not taken, the argument from (c) applies and no solution is possible.
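The claims in (c) and (d) can be checked mechanically by searching the space of pair states. The sketch below is a plain breadth-first reachability test that ignores step costs; the function name `can_meet` and the three small maps are illustrative assumptions.

```python
from collections import deque
from itertools import product


def can_meet(adjacent, start):
    """Breadth-first search over pair states (i, j).

    A successor of (i, j) is any (x, y) with x adjacent to i and y adjacent
    to j, because both friends must move on every turn. Returns True if some
    state (i, i) is reachable from start.
    """
    frontier, seen = deque([start]), {start}
    while frontier:
        i, j = frontier.popleft()
        if i == j:
            return True
        for pair in product(adjacent[i], adjacent[j]):
            if pair not in seen:
                seen.add(pair)
                frontier.append(pair)
    return False


# Two nodes joined by one link: the friends swap places forever.
two_nodes = {"a": ["b"], "b": ["a"]}
# A chain 0-1-2-3, and the same chain with a self-loop added at node 3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
chain_with_loop = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 3]}
```

Starting the friends at the two ends of `chain` (three steps apart) yields no meeting state, while the same start on `chain_with_loop` does, because one friend can wait on the self-loop for a turn.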
3.4 From http://www.cut-the-knot.com/pythagoras/fifteen.shtml, this proof applies to the fifteen puzzle, but the same argument works for the eight puzzle:
Definition: The goal state has the numbers in a certain order, which we will measure as starting at the upper left corner, then proceeding left to right, and when we reach the end of a row, going down to the leftmost square in the row below. For any other configuration besides the goal, whenever a tile with a greater number on it precedes a tile with a smaller number, the two tiles are said to be inverted.
Proposition: For a given puzzle configuration, let N denote the sum of the total number of inversions and the row number of the empty square. Then (N mod 2) is invariant under any legal move. In other words, after a legal move an odd N remains odd whereas an even N remains even. Therefore the goal state in Figure 3.4, with no inversions and empty square in the first row, has N = 1, and can only be reached from starting states with odd N, not from starting states with even N.
Proof: First of all, sliding a tile horizontally changes neither the total number of inversions nor the row number of the empty square. Therefore let us consider sliding a tile vertically.
Let’s assume, for example, that the tile A is located directly over the empty square. Sliding it down changes the parity of the row number of the empty square. Now consider the total number of inversions. The move only affects relative positions of tiles A, B, C, and D. If none of the B, C, D caused an inversion relative to A (i.e., all three are larger than A) then after sliding one gets three (an odd number) additional inversions. If one of the three is smaller than A, then before the move B, C, and D contributed a single inversion (relative to A) whereas after the move they’ll be contributing two inversions, a change of 1, also an odd number. Two additional cases obviously lead to the same result. Thus the change in the sum N is always even. This is precisely what we have set out to show.
So before we solve a puzzle, we should compute the N value of the start and goal state and make sure they have the same parity, otherwise no solution is possible.
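The invariant can be checked empirically. The sketch below uses the 4 × 4 fifteen puzzle, for which the proof is stated: a vertical slide there passes exactly three tiles (B, C, D). On a 3 × 3 board a vertical slide passes only two tiles, so for the eight puzzle the analogous invariant is the parity of the inversion count alone. The board encoding (row-major tuple, 0 for the empty square, goal with the empty square first) and all function names are assumptions made for illustration.

```python
import random

WIDTH = 4  # fifteen puzzle


def parity_of_n(board):
    """Return N mod 2, where N = inversions + row number of the empty square."""
    tiles = [t for t in board if t != 0]
    inversions = sum(
        1
        for a in range(len(tiles))
        for b in range(a + 1, len(tiles))
        if tiles[a] > tiles[b]
    )
    blank_row = board.index(0) // WIDTH + 1  # rows numbered from 1
    return (inversions + blank_row) % 2


def legal_moves(board):
    """Yield every board reachable by sliding one tile into the empty square."""
    blank = board.index(0)
    row, col = divmod(blank, WIDTH)
    for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < WIDTH and 0 <= c < WIDTH:
            target = r * WIDTH + c
            new = list(board)
            new[blank], new[target] = new[target], new[blank]
            yield tuple(new)


def random_walk_keeps_parity(start, steps=500, seed=0):
    """Apply random legal moves and report whether N mod 2 ever changes."""
    rng = random.Random(seed)
    board, parity = start, parity_of_n(start)
    for _ in range(steps):
        board = rng.choice(list(legal_moves(board)))
        if parity_of_n(board) != parity:
            return False
    return True


GOAL = tuple(range(16))                     # empty square first, then 1..15
SWAPPED = (0, 2, 1) + tuple(range(3, 16))   # tiles 1 and 2 exchanged
```

`GOAL` has odd N and `SWAPPED` has even N, so by the proposition neither can be reached from the other.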
3.5 The formulation puts one queen per column, with a new queen placed only in a square that is not attacked by any other queen. To simplify matters, we’ll first consider the n-rooks problem. The first rook can be placed in any square in column 1 (n choices), the second in any square in column 2 except the same row as the rook in column 1 (n − 1 choices), and so on. This gives n! elements of the search space.
For n queens, notice that a queen attacks at most three squares in any given column, so in column 2 there are at least (n − 3) choices, in column 3 at least (n − 6) choices, and so on. Thus the state space size $S \ge n \cdot (n-3) \cdot (n-6) \cdots$. Hence we have
$$S^3 \ge n \cdot n \cdot n \cdot (n-3)(n-3)(n-3) \cdot (n-6)(n-6)(n-6) \cdots \ge n(n-1)(n-2)(n-3)(n-4)(n-5)(n-6)(n-7)(n-8) \cdots = n!$$
or $S \ge \sqrt[3]{n!}$.
3.6
a. Initial state: No regions colored.
Goal test: All regions colored, and no two adjacent regions have the same color.
Successor function: Assign a color to a region.
Cost function: Number of assignments.
b. Initial state: As described in the text.
Goal test: Monkey has bananas.
Successor function: Hop on crate; Hop off crate; Push crate from one spot to another; Walk from one spot to another; grab bananas (if standing on crate).
Cost function: Number of actions.
c. Initial state: considering all input records.
Goal test: considering a single record, and it gives “illegal input” message.
Successor function: run again on the first half of the records; run again on the second half of the records.
Cost function: Number of runs.
Note: This is a contingency problem; you need to see whether a run gives an error message or not to decide what to do next.
d. Initial state: jugs have values [0, 0, 0].
Successor function: given values [x, y, z], generate [12, y, z], [x, 8, z], [x, y, 3] (by filling); [0, y, z], [x, 0, z], [x, y, 0] (by emptying); or for any two jugs with current values x and y, pour y into x; this changes the jug with x to the minimum of x + y and the capacity of the jug, and decrements the jug with y by the amount gained by the first jug.
Cost function: Number of actions.
3.7
a. If we consider all (x, y) points, then there are an infinite number of states, and of paths.
b. (For this problem, we consider the start and goal points to be vertices.) The shortest distance between two points is a straight line, and if it is not possible to travel in a straight line because some obstacle is in the way, then the next shortest distance is a sequence of line segments, end-to-end, that deviate from the straight line by as little as possible. So the first segment of this sequence must go from the start point to a tangent point on an obstacle; any path that gave the obstacle a wider girth would be longer. Because the obstacles are polygonal, the tangent points must be at vertices of the obstacles, and hence the entire path must go from vertex to vertex. So now the state space is the set of vertices, of which there are 35 in Figure 3.31.
c. Code not shown.
d. Implementations and analysis not shown.
3.8
a. Any path, no matter how bad it appears, might lead to an arbitrarily large reward (negative cost). Therefore, one would need to exhaust all possible paths to be sure of finding the best one.
b. Suppose the greatest possible reward is c. Then if we also know the maximum depth of the state space (e.g. when the state space is a tree), then any path with d levels remaining can be improved by at most cd, so any paths worse than cd less than the best path can be pruned. For state spaces with loops, this guarantee doesn’t help, because it is possible to go around a loop any number of times, picking up c reward each time.
c. The agent should plan to go around this loop forever (unless it can find another loop with even better reward).
d. The value of a scenic loop is lessened each time one revisits it; a novel scenic sight is a great reward, but seeing the same one for the tenth time in an hour is tedious, not rewarding. To accommodate this, we would have to expand the state space to include a memory: a state is now represented not just by the current location, but by a current location and a bag of already-visited locations. The reward for visiting a new location is now a (diminishing) function of the number of times it has been seen before.
e. Real domains with looping behavior include eating junk food and going to class.
3.9
a. Here is one possible representation: A state is a six-tuple of integers listing the number of missionaries, cannibals, and boats on the first side, and then the second side of the river. The goal is a state with 3 missionaries and 3 cannibals on the second side. The cost function is one per action, and the successors of a state are all the states that move 1 or 2 people and 1 boat from one side to another.
b. The search space is small, so any optimal algorithm works. For an example, see the file "search/domains/cannibals.lisp". It suffices to eliminate moves that circle back to the state just visited. From all but the first and last states, there is only one other choice.
c. It is not obvious that almost all moves are either illegal or revert to the previous state. There is a feeling of a large branching factor, and no clear way to proceed.
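As a rough illustration of part (b), the sketch below solves the problem by breadth-first search. It is not the repository's Lisp implementation. It compresses the six-tuple to a triple (missionaries, cannibals, boats on the first side), since the second side is determined by the first, and it assumes the usual rule from the exercise that missionaries may never be outnumbered on either bank; the names `safe`, `successors`, and `solve` are illustrative.

```python
from collections import deque

# (missionaries, cannibals) carried by the boat: 1 or 2 people per crossing.
LOADS = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]


def safe(m, c):
    """True if neither bank has its missionaries outnumbered (3 of each in total)."""
    if not (0 <= m <= 3 and 0 <= c <= 3):
        return False
    first_ok = m == 0 or m >= c
    second_ok = (3 - m) == 0 or (3 - m) >= (3 - c)
    return first_ok and second_ok


def successors(state):
    m, c, boat = state
    sign = -1 if boat == 1 else 1  # the boat carries people away from its side
    for dm, dc in LOADS:
        nm, nc = m + sign * dm, c + sign * dc
        if safe(nm, nc):
            yield (nm, nc, 1 - boat)


def solve(start=(3, 3, 1), goal=(0, 0, 0)):
    """Breadth-first search; returns the list of states on a shortest path."""
    frontier, parent = deque([start]), {start: None}
    while frontier:
        state = frontier.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None
```

The returned path has twelve states, that is, eleven crossings. The `parent` dictionary doubles as the repeated-state check, which is stronger than the "do not circle back" rule mentioned in (b).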
3.10 A state is a situation that an agent can find itself in. We distinguish two types of states: world states (the actual concrete situations in the real world) and representational states (the abstract descriptions of the real world that are used by the agent in deliberating about what to do).
A state space is a graph whose nodes are the set of all states, and whose links are actions that transform one state into another.
A search tree is a tree (a graph with no undirected loops) in which the root node is the start state and the set of children for each node consists of the states reachable by taking any action.
A search node is a node in the search tree.
A goal is a state that the agent is trying to reach.
An action is something that the agent can choose to do.
A successor function describes the agent’s options: given a state, it returns a set of (action, state) pairs, where each state is the state reachable by taking the action.
The branching factor in a search tree is the number of actions available to the agent.
3.11 A world state is how reality is or could be. In one world state we’re in Arad, in another we’re in Bucharest. The world state also includes which street we’re on, what’s currently on the radio, and the price of tea in China. A state description is an agent’s internal description of a world state. Examples are In(Arad) and In(Bucharest). These descriptions are necessarily approximate, recording only some aspect of the state.
We need to distinguish between world states and state descriptions because state descriptions are lossy abstractions of the world state, because the agent could be mistaken about how the world is, because the agent might want to imagine things that aren’t true but it could make true, and because the agent cares about the world, not its internal representation of it.
Search nodes are generated during search, representing a state the search process knows how to reach. They contain additional information aside from the state description, such as the sequence of actions used to reach this state. This distinction is useful because we may generate different search nodes which have the same state, and because search nodes contain more information than a state representation.
3.12
The state space is a tree of depth one, with all states successors of the initial state. There is no distinction between depth-first search and breadth-first search on such a tree. If the sequence length is unbounded the root node will have infinitely many successors, so only algorithms which test for goal nodes as we generate successors can work.
What happens next depends on how the composite actions are sorted. If there is no particular ordering, then a random but systematic search of potential solutions occurs. If they are sorted by dictionary order, then this implements depth-first search. If they are sorted by length first, then dictionary ordering, this implements breadth-first search.
A significant disadvantage of collapsing the search space like this is that if we discover that a plan starting with the action “unplug your battery” can’t be a solution, there is no easy way to ignore all other composite actions that start with this action. This is a problem in particular for informed search algorithms.
Discarding sequence structure is not a particularly practical approach to search.
3.13
The graph separation property states that “every path from the initial state to an unexplored state has to pass through a state in the frontier.”
At the start of the search, the frontier holds the initial state; hence, trivially, every path from the initial state to an unexplored state includes a node in the frontier (the initial state itself).
Now, we assume that the property holds at the beginning of an arbitrary iteration of the GRAPH-SEARCH algorithm in Figure 3.7. We assume that the iteration completes, i.e., the frontier is not empty and the selected leaf node n is not a goal state. At the end of the iteration, n has been removed from the frontier and its successors (if not already explored or in the frontier) placed in the frontier. Consider any path from the initial state to an unexplored state; by the induction hypothesis such a path (at the beginning of the iteration) includes at least one frontier node; except when n is the only such node, the separation property automatically holds. Hence, we focus on paths passing through n (and no other frontier node). By definition, the next node n′ along the path from n must be a successor of n that (by the preceding sentence) is not already in the frontier. Furthermore, n′ cannot be in the explored set, since by assumption there is a path from n′ to an unexplored node not passing through the frontier, which would violate the separation property as every explored node is connected to the initial state by explored nodes (see lemma below for proof this is always possible). Hence, n′ is not in the explored set, hence it will be added to the frontier; then the path will include a frontier node and the separation property is restored.
The property is violated by algorithms that move nodes from the frontier into the explored set before all of their successors have been generated, as well as by those that fail to add some of the successors to the frontier. Note that it is not necessary to generate all successors of a node at once before expanding another node, as long as partially expanded nodes remain in the frontier.
Lemma: Every explored node is connected to the initial state by a path of explored nodes.
Proof: This is true initially, since the initial state is connected to itself. Since we never remove nodes from the explored region, we only need to check new nodes we add to the explored list on an expansion. Let n be such a new explored node. This node was previously on the frontier, so it is a neighbor of a node n′ previously explored (i.e., its parent). n′ is, by hypothesis, connected to the initial state by a path of explored nodes. This path with n appended is a path of explored nodes connecting n to the initial state.
3.14
a. False: a lucky DFS might expand exactly d nodes to reach the goal. A∗ largely dominates any graph-search algorithm that is guaranteed to find optimal solutions.
b. True: h(n) = 0 is always an admissible heuristic, since costs are nonnegative.
c. True: A∗ search is often used in robotics; the space can be discretized or skeletonized.
d. True: depth of the solution matters for breadth-first search, not cost.
e. False: a rook can move across the board in move one, although the Manhattan distance from start to finish is 8.
3.15
a. See Figure S3.1.
Figure S3.1 The state space for the problem defined in Ex. 3.15. [Tree diagram over states 1 through 15.]
b. Breadth-first: 1 2 3 4 5 6 7 8 9 10 11
Depth-limited: 1 2 4 8 9 5 10 11
Iterative deepening: 1; 1 2 3; 1 2 4 5 3 6 7; 1 2 4 8 9 5 10 11
c. Bidirectional search is very useful, because the only successor of n in the reverse direction is ⌊(n/2)⌋. This helps focus the search. The branching factor is 2 in the forward direction; 1 in the reverse direction.
d. Yes; start at the goal, and apply the single reverse successor action until you reach 1.
e. The solution can be read off the binary numeral for the goal number. Write the goal number in binary. Since we can only reach positive integers, this binary expansion begins with a 1. From most- to least-significant bit, skipping the initial 1, go Left to the node 2n if this bit is 0 and go Right to node 2n + 1 if it is 1. For example, suppose the goal is 11, which is 1011 in binary. The solution is therefore Left, Right, Right.
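The rule in (e) is short enough to write down directly. In this sketch the function names `path_to` and `follow` are illustrative; `follow` simply replays a path from state 1 to confirm where it ends.

```python
def path_to(goal):
    """Read the Left/Right solution off the binary numeral of the goal."""
    if goal < 1:
        raise ValueError("only positive integers are reachable")
    bits = bin(goal)[3:]  # strip the '0b' prefix and the leading 1
    return ["Left" if bit == "0" else "Right" for bit in bits]


def follow(path, start=1):
    """Apply the actions: Left goes to 2n, Right goes to 2n + 1."""
    n = start
    for action in path:
        n = 2 * n if action == "Left" else 2 * n + 1
    return n
```

For the goal 11 this gives Left, Right, Right, matching the example, and the path length is the number of bits after the leading 1.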
3.16
a. Initial state: one arbitrarily selected piece (say a straight piece).
Successor function: for any open peg, add any piece type from remaining types. (You can add to open holes as well, but that isn’t necessary as all complete tracks can be made by adding to pegs.) For a curved piece, add in either orientation; for a fork, add in either orientation and (if there are two holes) connecting at either hole. It’s a good idea to disallow any overlapping configuration, as this terminates hopeless configurations early. (Note: there is no need to consider open holes, because in any solution these will be filled by pieces added to open pegs.)
Goal test: all pieces used in a single connected track, no open pegs or holes, no overlapping tracks.
Step cost: one per piece (actually, doesn’t really matter).
b. All solutions are at the same depth, so depth-first search would be appropriate. (One could also use depth-limited search with limit n − 1, but strictly speaking it’s not necessary to do the work of checking the limit because states at depth n − 1 have no successors.) The space is very large, so uniform-cost and breadth-first would fail, and iterative deepening simply does unnecessary extra work. There are many repeated states, so it might be good to use a closed list.
c. A solution has no open pegs or holes, so every peg is in a hole, so there must be equal numbers of pegs and holes. Removing a fork violates this property. There are two other “proofs” that are acceptable: 1) a similar argument to the effect that there must be an even number of “ends”; 2) each fork creates two tracks, and only a fork can rejoin those tracks into one, so if a fork is missing it won’t work. The argument using pegs and holes is actually more general, because it also applies to the case of a three-way fork that has one hole and three pegs or one peg and three holes. The “ends” argument fails here, as does the fork/rejoin argument (which is a bit handwavy anyway).
d. The maximum possible number of open pegs is 3 (starts at 1, adding a two-peg fork increases it by one). Pretending each piece is unique, any piece can be added to a peg, giving at most 12 + (2 · 16) + (2 · 2) + (2 · 2 · 2) = 56 choices per peg, and hence at most 3 · 56 = 168 choices per step. The total depth is 32 (there are 32 pieces), so an upper bound is $168^{32}/(12! \cdot 16! \cdot 2! \cdot 2!)$ where the factorials deal with permutations of identical pieces. One could do a more refined analysis to handle the fact that the branching factor shrinks as we go down the tree, but it is not pretty.
3.17
a. The algorithm expands nodes in order of increasing path cost; therefore the first goal it encounters will be the goal with the cheapest cost.
b. It will be the same as iterative deepening, d iterations, in which $O(b^d)$ nodes are generated.
c. $d/\epsilon$
d. Implementation not shown.
3.18 Consider a domain in which every state has a single successor, and there is a single goal at depth n. Then depth-first search will find the goal in n steps, whereas iterative deepening search will take $1 + 2 + 3 + \cdots + n = O(n^2)$ steps.
3.19 As an ordinary person (or agent) browsing the web, we can only generate the successors of a page by visiting it. We can then do breadth-first search, or perhaps best-first search where the heuristic is some function of the number of words in common between the start and goal pages; this may help keep the links on target. Search engines keep the complete graph of the web, and may provide the user access to all (or at least some) of the pages that link to a page; this would allow us to do bidirectional search.
3.20 Code not shown, but a good start is in the code repository. Clearly, graph search must be used: this is a classic grid world with many alternate paths to each state. Students will quickly find that computing the optimal solution sequence is prohibitively expensive for moderately large worlds, because the state space for an n × n world has $n^2 \cdot 2^{n^2}$ states ($n^2$ agent positions times $2^{n^2}$ dirt configurations). The completion time of the random agent grows less than exponentially in n, so for any reasonable exchange rate between search cost and path cost the random agent will eventually win.
3.21
a. When all step costs are equal, g(n) ∝ depth(n), so uniform-cost search reproduces breadth-first search.
b. Breadth-first search is best-first search with f(n) = depth(n); depth-first search is best-first search with f(n) = −depth(n); uniform-cost search is best-first search with f(n) = g(n).
c. Uniform-cost search is A∗ search with h(n) = 0.
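Part (b) can be illustrated on a small tree. The sketch below runs a generic best-first expansion on the tree of Exercise 3.15 (successors 2n and 2n + 1, cut off at state 15) with f = depth and with f = −depth. The function name, the depth-only evaluation function, and the tie-breaking rule (earliest generated node first) are assumptions made for this illustration.

```python
import heapq
from itertools import count


def best_first_order(f, max_state=15):
    """Return the order in which states are expanded.

    f maps a node's depth to its priority; lower values are expanded first,
    and ties go to the node that was generated earliest.
    """
    tie = count()
    frontier = [(f(0), next(tie), 1, 0)]  # (priority, tie-break, state, depth)
    order = []
    while frontier:
        _, _, state, depth = heapq.heappop(frontier)
        order.append(state)
        for child in (2 * state, 2 * state + 1):
            if child <= max_state:
                heapq.heappush(frontier, (f(depth + 1), next(tie), child, depth + 1))
    return order


breadth_first = best_first_order(lambda depth: depth)
depth_first = best_first_order(lambda depth: -depth)
```

With f = depth the states come out level by level; with f = −depth the deepest generated node is always expanded next, which gives the depth-first order 1 2 4 8 9 5 10 11 and so on.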
3.22 The student should find that on the 8-puzzle, RBFS expands more nodes (because it does not detect repeated states) but has lower cost per node because it does not need to maintain a queue. The number of RBFS node re-expansions is not too high because the presence of many tied values means that the best path changes seldom. When the heuristic is slightly perturbed, this advantage disappears and RBFS’s performance is much worse.
For TSP, the state space is a tree, so repeated states are not an issue. On the other hand, the heuristic is real-valued and there are essentially no tied values, so RBFS incurs a heavy penalty for frequent re-expansions.
3.23 The sequence of queues is as follows:
L[0+244=244]
M[70+241=311],T[111+329=440]
L[140+244=384],D[145+242=387],T[111+329=440]
D[145+242=387],T[111+329=440],M[210+241=451],T[251+329=580]
24
C[265+160=425],T[111+329=440],M[210+241=451],M[220+241=461],T[251+329=580]
T[111+329=440],M[210+241=451],M[220+241=461],P[403+100=503],T[251+329=580],R[411+193=604], D[385+242=627]
M[210+241=451],M[220+241=461],L[222+244=466],P[403+100=503],T[251+329=580],A[229+366=595], R[411+193=604],D[385+242=627]
M[220+241=461],L[222+244=466],P[403+100=503],L[280+244=524],D[285+242=527],T[251+329=580], A[229+366=595],R[411+193=604],D[385+242=627]
L[222+244=466],P[403+100=503],L[280+244=524],D[285+242=527],L[290+244=534],D[295+242=537], T[251+329=580],A[229+366=595],R[411+193=604],D[385+242=627]
P[403+100=503],L[280+244=524],D[285+242=527],M[292+241=533],L[290+244=534],D[295+242=537], T[251+329=580],A[229+366=595],R[411+193=604],D[385+242=627],T[333+329=662]
B[504+0=504],L[280+244=524],D[285+242=527],M[292+241=533],L[290+244=534],D[295+242=537],T[251+329=580], A[229+366=595],R[411+193=604],D[385+242=627],T[333+329=662],R[500+193=693],C[541+160=701]
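The trace above can be regenerated with a few lines of A∗ tree search. This sketch uses only the cities, road lengths, and straight-line distances that appear in the trace (roads to cities outside the trace, such as Arad's other neighbors, are omitted, which is harmless here because those nodes are never expanded). The names `EDGES`, `H`, and `astar_tree_search` are illustrative.

```python
import heapq

# Road lengths between the cities that occur in the trace.
EDGES = [
    ("Lugoj", "Mehadia", 70), ("Lugoj", "Timisoara", 111),
    ("Mehadia", "Drobeta", 75), ("Drobeta", "Craiova", 120),
    ("Craiova", "Pitesti", 138), ("Craiova", "Rimnicu", 146),
    ("Rimnicu", "Pitesti", 97), ("Pitesti", "Bucharest", 101),
    ("Timisoara", "Arad", 118),
]
# Straight-line distance to Bucharest, read off the h terms in the trace.
H = {"Lugoj": 244, "Mehadia": 241, "Timisoara": 329, "Drobeta": 242,
     "Craiova": 160, "Pitesti": 100, "Rimnicu": 193, "Arad": 366,
     "Bucharest": 0}

ROADS = {}
for a, b, dist in EDGES:
    ROADS.setdefault(a, {})[b] = dist
    ROADS.setdefault(b, {})[a] = dist


def astar_tree_search(start, goal):
    """A* without a closed list, so repeated states reappear as in the trace."""
    frontier = [(H[start], 0, [start])]  # entries are (f, g, path)
    expanded = []
    while frontier:
        f, g, path = heapq.heappop(frontier)
        city = path[-1]
        if city == goal:
            return g, path, expanded
        expanded.append(city)
        for nxt, dist in ROADS[city].items():
            heapq.heappush(frontier, (g + dist + H[nxt], g + dist, path + [nxt]))
    return None


cost, path, expanded = astar_tree_search("Lugoj", "Bucharest")
```

The expansion order is L, M, L, D, C, T, M, M, L, P, after which Bucharest is removed from the queue with path cost 504.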
Figure S3.2 A graph with an inconsistent heuristic on which GRAPH-SEARCH fails to return the optimal solution. The successors of S are A with f = 5 and B with f = 7. A is expanded first, so the path via B will be discarded because A will already be in the closed list.
3.24 See Figure S3.2.
3.25 It is complete whenever 0 ≤ w < 2. w = 0 gives f(n) = 2g(n). This behaves exactly like uniform-cost search: the factor of two makes no difference in the ordering of the nodes. w = 1 gives A∗ search. w = 2 gives f(n) = 2h(n), i.e., greedy best-first search. We also have
$$f(n) = (2 - w)\left[g(n) + \frac{w}{2 - w}\,h(n)\right]$$
which behaves exactly like A∗ search with a heuristic $\frac{w}{2-w}h(n)$. For w ≤ 1, this is never greater than h(n) and hence admissible, provided h(n) is itself admissible.
3.26
a. The branching factor is 4 (number of neighbors of each location).
b. The states at depth k form a square rotated at 45 degrees to the grid. Obviously there are a linear number of states along the boundary of the square, so the answer is 4k.
c. Without repeated state checking, BFS expands exponentially many nodes: counting precisely, we get $((4^{x+y+1} - 1)/3) - 1$.
d. There are quadratically many states within the square for depth x + y, so the answer is $2(x + y)(x + y + 1) - 1$.
e. True; this is the Manhattan distance metric.
f. False; all nodes in the rectangle defined by (0, 0) and (x, y) are candidates for the optimal path, and there are quadratically many of them, all of which may be expanded in the worst case.
g. True; removing links may induce detours, which require more steps, so h is an underestimate.
h. False; nonlocal links can reduce the actual path length below the Manhattan distance.
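The counts behind (b), (c), and (d) are easy to confirm by enumeration. The sketch below grows the set of grid states layer by layer from the origin and also sums the sizes of the unpruned search tree; the function names are illustrative. The closed forms in (c) and (d) are these totals less a small constant that depends on how the start and goal nodes themselves are counted.

```python
def layer_and_total(k):
    """Return (distinct states at depth exactly k, distinct states within depth k)."""
    frontier, seen = {(0, 0)}, {(0, 0)}
    for _ in range(k):
        nxt = set()
        for x, y in frontier:
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                state = (x + dx, y + dy)
                if state not in seen:
                    seen.add(state)
                    nxt.add(state)
        frontier = nxt
    return len(frontier), len(seen)


def tree_nodes_within(depth):
    """Nodes in the search tree down to the given depth with no repeated-state check."""
    return sum(4 ** level for level in range(depth + 1))
```

For every k ≥ 1 the layer has 4k states and the square within depth k has 2k(k + 1) + 1 states, while the unpruned tree has $(4^{k+1} - 1)/3$ nodes.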
3.27
a. $n^{2n}$. There are n vehicles in $n^2$ locations, so roughly (ignoring the one-per-square constraint) $(n^2)^n = n^{2n}$ states.
b. $5^n$.
c. Manhattan distance, i.e., $|(n - i + 1) - x_i| + |n - y_i|$. This is exact for a lone vehicle.
d. Only (iii) $\min\{h_1, \ldots, h_n\}$. The explanation is nontrivial as it requires two observations. First, let the work W in a given solution be the total distance moved by all vehicles over their joint trajectories; that is, for each vehicle, add the lengths of all the steps taken. We have $W \ge \sum_i h_i \ge n \cdot \min\{h_1, \ldots, h_n\}$. Second, the total work we can get done per step is ≤ n. (Note that for every car that jumps 2, another car has to stay put (move 0), so the total work per step is bounded by n.) Hence, completing all the work requires at least $n \cdot \min\{h_1, \ldots, h_n\}/n = \min\{h_1, \ldots, h_n\}$ steps.
3.28 The heuristic $h = h_1 + h_2$ (adding misplaced tiles and Manhattan distance) sometimes overestimates. Now, suppose $h(n) \le h^*(n) + c$ (as given) and let $G_2$ be a goal that is suboptimal by more than c, i.e., $g(G_2) > C^* + c$. Now consider any node n on a path to an optimal goal. We have
$$f(n) = g(n) + h(n)$$
$$\le g(n) + h^*(n) + c$$
$$\le C^* + c$$
$$\le g(G_2)$$
so $G_2$ will never be expanded before an optimal goal is expanded.
3.29 A heuristic is consistent iff, for every node n and every successor n′ of n generated by any action a,
$$h(n) \le c(n, a, n') + h(n')$$
One simple proof is by induction on the number k of nodes on the shortest path to any goal from n. For k = 1, let n′ be the goal node; then $h(n) \le c(n, a, n')$. For the inductive case, assume n′ is on the shortest path k steps from the goal and that h(n′) is admissible by hypothesis; then
$$h(n) \le c(n, a, n') + h(n') \le c(n, a, n') + h^*(n') = h^*(n)$$
so h(n) at k + 1 steps from the goal is also admissible.
3.30 This exercise reiterates a small portion of the classic work of Held and Karp (1970).
a. The TSP problem is to find a minimal (total length) path through the cities that forms a closed loop. MST is a relaxed version of that because it asks for a minimal (total length) graph that need not be a closed loop; it can be any fully connected graph. As a heuristic, MST is admissible: it is always shorter than or equal to a closed loop.
b. The straight-line distance back to the start city is a rather weak heuristic; it vastly underestimates when there are many cities. In the later stage of a search when there are only a few cities left it is not so bad. To say that MST dominates straight-line distance is to say that MST always gives a higher value. This is obviously true because a MST that includes the goal node and the current node must either be the straight line between them, or it must include two or more lines that add up to more. (This all assumes the triangle inequality.)
c. See "search/domains/tsp.lisp" for a start at this. The file includes a heuristic based on connecting each unvisited city to its nearest neighbor, a close relative to the MST approach.
d. See (Cormen et al., 1990, p. 505) for an algorithm that runs in O(E log E) time, where E is the number of edges. The code repository currently contains a somewhat less efficient algorithm.
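For experimentation, the MST heuristic value can be computed with a few lines of Python. Note that the sketch below is the simple array-based version of Prim's algorithm, which takes O(V²) time on the complete graph of cities; it is neither the O(E log E) algorithm cited in (d) nor the repository's Lisp code. The function name and the unit-square example are assumptions made for illustration.

```python
import math


def mst_weight(points):
    """Total edge length of a minimum spanning tree over 2-D points (Prim, O(V^2))."""
    n = len(points)
    if n == 0:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each point to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the cheapest point that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
```

On the unit square the MST has length 3 while the optimal tour has length 4, consistent with the admissibility argument in (a).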
3.31 The misplaced-tiles heuristic is exact for the problem where a tile can move from square A to square B. As this is a relaxation of the condition that a tile can move from square A to square B if B is blank, Gaschnig’s heuristic cannot be less than the misplaced-tiles heuristic. As it is also admissible (being exact for a relaxation of the original problem), Gaschnig’s heuristic is therefore more accurate.
If we permute two adjacent tiles in the goal state, we have a state where misplaced-tiles and Manhattan both return 2, but Gaschnig’s heuristic returns 3.
To compute Gaschnig’s heuristic, repeat the following until the goal state is reached: let B be the current location of the blank; if B is occupied by tile X (not the blank) in the goal state, move X to B; otherwise, move any misplaced tile to B. Students could be asked to prove that this is the optimal solution to the relaxed problem.
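The procedure above can be sketched in a few lines of Python. This is only an illustration: the tuple encoding of the board (row-major, 0 for the blank) and the name gaschnig are assumptions made here, not part of the exercise.

```python
def gaschnig(state, goal):
    """Count moves of the relaxed problem in which any tile may jump to the blank.

    state and goal are tuples in row-major order with 0 standing for the blank
    (an encoding assumed for this sketch).
    """
    state = list(state)
    goal = list(goal)
    moves = 0
    while state != goal:
        b = state.index(0)                 # B: current location of the blank
        if goal[b] != 0:
            src = state.index(goal[b])     # tile X that belongs at B
        else:
            # Blank is already home: move any misplaced tile into it.
            src = next(i for i, t in enumerate(state) if t != 0 and t != goal[i])
        state[b], state[src] = state[src], 0
        moves += 1
    return moves
```

With the goal (1, 2, 3, 4, 5, 6, 7, 8, 0) and the first two tiles swapped, the loop moves tile 2 into the blank, then tile 1, then tile 2 again, which is the three-move case mentioned above.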
3.32 Students should provide results in the form of graphs and/or tables showing both runtime and number of nodes generated. (Different heuristics have different computation costs.) Runtimes may be very small for 8-puzzles, so you may want to assign the 15-puzzle or 24-puzzle instead. The use of pattern databases is also worth exploring experimentally.
Solutions for Chapter 4
Beyond Classical Search

4.1
a. Local beam search with k = 1 is hill-climbing search.
b. Local beam search with one initial state and no limit on the number of states retained resembles breadth-first search in that it adds one complete layer of nodes before adding the next layer. Starting from one state, the algorithm would be essentially identical to breadth-first search except that each layer is generated all at once.
c. Simulated annealing with T = 0 at all times: ignoring the fact that the termination step would be triggered immediately, the search would be identical to first-choice hill climbing because every downward successor would be rejected with probability 1. (Exercise may be modified in future printings.)
d. Simulated annealing with T = ∞ at all times is a random-walk search: it always accepts a new state.
e. Genetic algorithm with population size N = 1: if the population size is 1, then the two selected parents will be the same individual; crossover yields an exact copy of the individual; then there is a small chance of mutation. Thus, the algorithm executes a random walk in the space of individuals.
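Parts (c) and (d) both follow from the acceptance rule of simulated annealing, which takes an improving move always and a worsening move with probability e^(ΔE/T). The sketch below evaluates that rule at the two limiting temperatures; the function name accept_probability is an assumption for illustration, and the value returned for T = 0 is the limit of the formula for ΔE < 0.

```python
import math

def accept_probability(delta_e, temperature):
    """Probability of accepting a successor whose value differs by delta_e.

    Maximisation is assumed, so delta_e > 0 is an improvement.
    """
    if delta_e > 0:
        return 1.0                      # uphill moves are always taken
    if temperature == 0:
        return 0.0                      # limit of exp(delta_e / T) as T -> 0 for delta_e < 0
    if math.isinf(temperature):
        return 1.0                      # exp(delta_e / inf) = exp(0)
    return math.exp(delta_e / temperature)
```

At T = 0 only improving successors survive, which is first-choice hill climbing; at T = ∞ every successor is accepted, which is a random walk.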
4.2 Despite its humble origins, this question raises many of the same issues as the scientifically important problem of protein design. There is a discrete assembly space in which pieces are chosen to be added to the track and a continuous configuration space determined by the “joint angles” at every place where two pieces are linked. Thus we can define a state as a set of oriented, linked pieces and the associated joint angles in the range [−10, 10], plus a set of unlinked pieces. The linkage and joint angles exactly determine the physical layout of the track; we can allow for (and penalize) layouts in which tracks lie on top of one another, or we can disallow them. The evaluation function would include terms for how many pieces are used, how many loose ends there are, and (if allowed) the degree of overlap. We might include a penalty for the amount of deviation from 0-degree joint angles. (We could also include terms for “interestingness” and “traversability”; for example, it is nice to be able to drive a train starting from any track segment to any other, ending up in either direction without having to lift up the train.) The tricky part is the set of allowed moves. Obviously we can unlink any piece or link an unlinked piece to an open peg with either orientation at any allowed angle (possibly excluding moves that create overlap). More problematic are moves to join a peg and hole on already-linked pieces and moves to change the angle of a joint. Changing one angle may force changes in others, and the changes will vary depending on whether the other pieces are at their joint-angle limit. In general there will be no unique “minimal” solution for a given angle change in terms of the consequent changes to other angles, and some changes may be impossible.
4.3 Here is one simple hill-climbing algorithm:
• Connect all the cities into an arbitrary path.
• Pick two points along the path at random.
• Split the path at those points, producing three pieces.
• Try all six possible ways to connect the three pieces.
• Keep the best one, and reconnect the path accordingly.
• Iterate the steps above until no improvement is observed for a while.
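The steps above can be sketched as follows. Several choices here are assumptions rather than part of the solution: the "six ways" are read as the 3! orderings of the three pieces without reversing any piece, the path is scored as a closed tour, and "for a while" is modelled by a hypothetical patience counter. For a closed tour several of the six orderings coincide as cycles, so a fuller version would also try reversed pieces.

```python
import itertools
import math
import random

def tour_length(tour, pts):
    """Length of the closed tour that visits pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def hill_climb_tsp(pts, patience=200, seed=0):
    """Hill climbing by cutting the path into three pieces and reordering them.

    Needs at least three cities so that both cut points exist.
    """
    rng = random.Random(seed)
    tour = list(range(len(pts)))              # arbitrary initial path
    best = tour_length(tour, pts)
    stale = 0                                 # iterations since the last improvement
    while stale < patience:
        i, j = sorted(rng.sample(range(1, len(tour)), 2))   # two cut points
        pieces = [tour[:i], tour[i:j], tour[j:]]
        candidates = [a + b + c for a, b, c in itertools.permutations(pieces)]
        cand = min(candidates, key=lambda t: tour_length(t, pts))
        length = tour_length(cand, pts)
        if length < best - 1e-12:
            tour, best, stale = cand, length, 0
        else:
            stale += 1
    return tour, best
```

Because a candidate is kept only when it is strictly shorter, the loop cannot cycle and stops once patience consecutive attempts fail to improve the tour.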
4.4 Code not shown.
4.5 See Figure S4.1 for the adapted algorithm. For states for which OR-SEARCH finds a solution, it records the solution found. If it later visits that state again, it immediately returns that solution.
When OR-SEARCH fails to find a solution it has to be careful. Whether a state can be solved depends on the path taken to that state, as we do not allow cycles. So on failure OR-SEARCH records the value of path. If a state is visited which has previously failed when path contained any subset of its present value, OR-SEARCH returns failure.
To avoid repeating sub-solutions we can label all new solutions found, record these labels, then return the label if these states are visited again. Post-processing can prune off unused labels. Alternatively, we can output a directed acyclic graph structure rather than a tree. See (Bertoli et al., 2001) for further details.
4.6 The question statement describes the required changes in detail; see Figure S4.2 for the modified algorithm. When OR-SEARCH cycles back to a state on path it returns a token loop, which means to loop back to the most recent time this state was reached along the path to it. Since path is implicitly stored in the returned plan, there is sufficient information for later processing, or a modified implementation, to replace these with labels.
The plan representation is implicitly augmented to keep track of whether the plan is cyclic (i.e., contains a loop) so that OR-SEARCH can prefer acyclic solutions.
AND-SEARCH returns failure if all branches lead directly to a loop, as in this case the plan will always loop forever. This is the only case it needs to check, since if all branches in a finite plan loop there must be some And-node whose children all immediately loop.
function AND-OR-GRAPH-SEARCH(problem) returns a conditional plan, or failure
  OR-SEARCH(problem.INITIAL-STATE, problem, [])

function OR-SEARCH(state, problem, path) returns a conditional plan, or failure
  if problem.GOAL-TEST(state) then return the empty plan
  if state has previously been solved then return RECALL-SUCCESS(state)
  if state has previously failed for a subset of path then return failure
  if state is on path then
    RECORD-FAILURE(state, path)
    return failure
  for each action in problem.ACTIONS(state) do
    plan ← AND-SEARCH(RESULTS(state, action), problem, [state | path])
    if plan ≠ failure then
      RECORD-SUCCESS(state, [action | plan])
      return [action | plan]
  return failure

function AND-SEARCH(states, problem, path) returns a conditional plan, or failure
  for each s_i in states do
    plan_i ← OR-SEARCH(s_i, problem, path)
    if plan_i = failure then return failure
  return [if s_1 then plan_1 else if s_2 then plan_2 else ... if s_{n−1} then plan_{n−1} else plan_n]

Figure S4.1 AND-OR search with repeated state checking.

4.7 A sequence of actions is a solution to a belief state problem if it takes every initial physical state to a goal state. We can relax this problem by requiring it take only some initial physical state to a goal state. To make this well defined, we’ll require that it finds a solution for the physical state with the most costly solution. If h∗(s) is the optimal cost of a solution starting from the physical state s, then
    h(S) = max_{s∈S} h∗(s)
is the heuristic estimate given by this relaxed problem. This heuristic assumes any solution to the most difficult state the agent thinks possible will solve all states.
On the sensorless vacuum cleaner problem in Figure 4.14, h correctly determines the optimal cost for all states except the central three states (those reached by [suck], [suck, left] and [suck, right]) and the root, which h estimates to be 1 unit cheaper than they really are. This means A∗ will expand these three central nodes before marching towards the solution.
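As a rough illustration of h(S) = max_{s∈S} h∗(s), the sketch below computes h∗ by breadth-first search in the physical state space of the two-square deterministic vacuum world and takes the maximum over a belief state. The state encoding (location, left dirty, right dirty) and all function names are assumptions made for this example.

```python
from collections import deque
from itertools import product

ACTIONS = ("Left", "Right", "Suck")

def result(state, action):
    """Deterministic transition model; state = (location, dirt_left, dirt_right)."""
    loc, dirt_l, dirt_r = state
    if action == "Left":
        return ("L", dirt_l, dirt_r)
    if action == "Right":
        return ("R", dirt_l, dirt_r)
    # Suck cleans whichever square the agent is in.
    return (loc, False, dirt_r) if loc == "L" else (loc, dirt_l, False)

def is_goal(state):
    return not state[1] and not state[2]

def h_star(state):
    """Optimal number of steps from a single physical state (unit step costs)."""
    frontier, seen = deque([(state, 0)]), {state}
    while frontier:
        s, depth = frontier.popleft()
        if is_goal(s):
            return depth
        for a in ACTIONS:
            nxt = result(s, a)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return float("inf")

def h(belief):
    """Heuristic for a belief state: cost of its hardest physical state."""
    return max(h_star(s) for s in belief)

ALL_STATES = set(product("LR", (False, True), (False, True)))
```

For the initial belief state containing all eight physical states, the hardest state has both squares dirty and needs three steps (Suck, move, Suck), so h returns 3, one less than a four-step sensorless plan such as [Right, Suck, Left, Suck].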
4.8
a. An action sequence is a solution for belief state b if performing it starting in any state s ∈ b reaches a goal state. Since any state in a subset of b is in b, the result is immediate. Any action sequence which is not a solution for belief state b is also not a solution for any superset; this is the contrapositive of what we’ve just proved. One cannot, in general, say anything about arbitrary supersets, as the action sequence need not lead to a goal on the states outside of b. One can say, for example, that if an action sequence solves a belief state b and a belief state b′ then it solves the union belief state b ∪ b′.
b. On expansion of a node, do not add to the frontier any child belief state which is a superset of a previously explored belief state.
c. If you keep a record of previously solved belief states, add a check to the start of OR-search to check whether the belief state passed in is a subset of a previously solved belief state, returning the previous solution in case it is.

function AND-OR-GRAPH-SEARCH(problem) returns a conditional plan, or failure
  OR-SEARCH(problem.INITIAL-STATE, problem, [])

function OR-SEARCH(state, problem, path) returns a conditional plan, or failure
  if problem.GOAL-TEST(state) then return the empty plan
  if state is on path then return loop
  cyclic-plan ← None
  for each action in problem.ACTIONS(state) do
    plan ← AND-SEARCH(RESULTS(state, action), problem, [state | path])
    if plan ≠ failure then
      if plan is acyclic then return [action | plan]
      cyclic-plan ← [action | plan]
  if cyclic-plan ≠ None then return cyclic-plan
  return failure

function AND-SEARCH(states, problem, path) returns a conditional plan, or failure
  loopy ← True
  for each s_i in states do
    plan_i ← OR-SEARCH(s_i, problem, path)
    if plan_i = failure then return failure
    if plan_i ≠ loop then loopy ← False
  if not loopy then
    return [if s_1 then plan_1 else if s_2 then plan_2 else ... if s_{n−1} then plan_{n−1} else plan_n]
  return failure
Figure S4.2 AND-OR search with repeated state checking.

4.9 Consider a very simple example: an initial belief state {S1, S2}, actions a and b both leading to goal state G from either initial state, and
    c(S1, a, G) = 3;  c(S2, a, G) = 5;
    c(S1, b, G) = 2;  c(S2, b, G) = 6.
In this case, the solution [a] costs 3 or 5, the solution [b] costs 2 or 6. Neither is “optimal” in any obvious sense.
In some cases, there will be an optimal solution. Let us consider just the deterministic case. For this case, we can think of the cost of a plan as a mapping from each initial physical state to the actual cost of executing the plan. In the example above, the cost for [a] is {S1: 3, S2: 5} and the cost for [b] is {S1: 2, S2: 6}. We can say that plan p1 weakly dominates p2 if, for each initial state, the cost for p1 is no higher than the cost for p2. (Moreover, p1 dominates p2 if it weakly dominates it and has a lower cost for some state.) If a plan p weakly dominates all others, it is optimal. Notice that this definition reduces to ordinary optimality in the observable case where every belief state is a singleton. As the preceding example shows, however, a problem may have no optimal solution in this sense. A perhaps acceptable version of A∗ would be one that returns any solution that is not dominated by another.
To understand whether it is possible to apply A∗ at all, it helps to understand its dependence on Bellman’s (1957) principle of optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision. It is important to understand that this is a restriction on performance measures designed to facilitate efficient algorithms, not a general definition of what it means to be optimal.
In particular, if we define the cost of a plan in belief-state space as the minimum cost of any physical realization, we violate Bellman’s principle. Modifying and extending the previous example, suppose that a and b reach S3 from S1 and S4 from S2, and then reach G from there:
    c(S1, a, S3) = 6;  c(S2, a, S4) = 2;
    c(S1, b, S3) = 6;  c(S2, b, S4) = 1.
    c(S3, a, G) = 2;   c(S4, a, G) = 2;
    c(S3, b, G) = 1;   c(S4, b, G) = 9.
In the belief state {S3, S4}, the minimum cost of [a] is min{2, 2} = 2 and the minimum cost of [b] is min{1, 9} = 1, so the optimal plan is [b]. In the initial belief state {S1, S2}, the four possible plans have the following costs:
    [a, a]: min{8, 4} = 4;  [a, b]: min{7, 11} = 7;  [b, a]: min{8, 3} = 3;  [b, b]: min{7, 10} = 7.
Hence, the optimal plan in {S1, S2} is [b, a], which does not choose b in {S3, S4} even though that is the optimal plan at that point. This counterintuitive behavior is a direct consequence of choosing the minimum of the possible path costs as the performance measure.
This example gives just a small taste of what might happen with nonadditive performance measures. Details of how to modify and analyze A∗ for general path-dependent cost functions are given by Dechter and Pearl (1985). Many aspects of A∗ carry over; for example, we can still derive lower bounds on the cost of a path through a given node. For a belief state b, the minimum value of g(s) + h(s) for each state s in b is a lower bound on the minimum cost of a plan that goes through b.
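The arithmetic of the two-step example can be checked mechanically. The sketch below encodes the step costs given above in a dictionary and scores each plan by the minimum cost over the physical states in a belief state; the names STEP, plan_cost, and belief_cost are assumptions made for illustration.

```python
from itertools import product

# (state, action) -> (successor, step cost), taken from the example above.
STEP = {
    ("S1", "a"): ("S3", 6), ("S2", "a"): ("S4", 2),
    ("S1", "b"): ("S3", 6), ("S2", "b"): ("S4", 1),
    ("S3", "a"): ("G", 2),  ("S4", "a"): ("G", 2),
    ("S3", "b"): ("G", 1),  ("S4", "b"): ("G", 9),
}

def plan_cost(state, plan):
    """Additive cost of executing plan from one physical state."""
    total = 0
    for action in plan:
        state, cost = STEP[(state, action)]
        total += cost
    return total

def belief_cost(belief, plan):
    """The nonadditive measure discussed above: cheapest physical realization."""
    return min(plan_cost(s, plan) for s in belief)

two_step = {plan: belief_cost({"S1", "S2"}, plan) for plan in product("ab", repeat=2)}
one_step = {plan: belief_cost({"S3", "S4"}, plan) for plan in product("ab", repeat=1)}
best_two_step = min(two_step, key=two_step.get)
best_one_step = min(one_step, key=one_step.get)
```

Here best_two_step is ("b", "a") while best_one_step is ("b",), so the tail of the best two-step plan is not the best plan for the intermediate belief state, which is the violation of Bellman’s principle described above.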
4.10 The belief state space is shown in Figure S4.3. No solution is possible because no path leads to a belief state all of whose elements satisfy the goal. If the problem is fully observable, the agent reaches a goal state by executing a sequence such that Suck is performed only in a dirty square. This ensures deterministic behavior and every state is obviously solvable.
4.11 The student needs to make several design choices in answering this question. First, how will the vertices of objects be represented? The problem states the percept is a list of vertex positions, but that is not precise enough. Here is one good choice: The agent has an orientation (a heading in degrees). The visible vertices are listed in clockwise order, starting straight ahead of the agent. Each vertex has a relative angle (0 to 360 degrees) and a distance. We also want to know if a vertex represents the left edge of an obstacle, the right edge, or an interior point. We can use the symbols L, R, or I to indicate this.
The student will need to do some basic computational geometry calculations: intersection of a path and a set of line segments to see if the agent will bump into an obstacle, and visibility calculations to determine the percept. There are efficient algorithms for doing this on a set of line segments, but don’t worry about efficiency; an exhaustive algorithm is ok. If this seems too much, the instructor can provide an environment simulator and ask the student only to program the agent.
To answer (c), the student will need some exchange rate for trading off search time with movement time. It is probably too complex to make the simulation asynchronous real-time; it is easier to impose a penalty in points for computation.
For (d), the agent will need to maintain a set of possible positions. Each time the agent moves, it may be able to eliminate some of the possibilities. The agent may consider moves that serve to reduce uncertainty rather than just get to the goal.
Figure S4.3 The belief state space for the sensorless vacuum world under Murphy’s law.

4.12 This question is slightly ambiguous as to what the percept is: either the percept is just the location, or it gives exactly the set of unblocked directions (i.e., blocked directions are illegal actions). We will assume the latter. (Exercise may be modified in future printings.) There are 12 possible locations for internal walls, so there are 2^12 = 4096 possible environment configurations. A belief state designates a subset of these as possible configurations; for example, before seeing any percepts all 4096 configurations are possible, and this is a single belief state.
a. Online search is equivalent to offline search in belief-state space where each action in a belief state can have multiple successor belief states: one for each percept the agent could observe after the action. A successor belief state is constructed by taking the previous belief state, itself a set of states, replacing each state in this belief state by the successor state under the action, and removing all successor states which are inconsistent with the percept. This is exactly the construction in Section 4.4.2. AND-OR search can be used to solve this search problem. The initial belief state has 2^10 = 1024 states in it, as we know whether two edges have walls or not (the upper and right edges have no walls) but nothing more. There are 2^(2^12) possible belief states, one for each set of environment configurations.
We can view this as a contingency problem in belief state space. After each action and percept, the agent learns whether or not an internal wall exists between the current square and each neighboring square. Hence, each reachable belief state can be represented exactly by a list of status values (present, absent, unknown) for each wall separately. That is, the belief state is completely decomposable and there are exactly 3^12 reachable belief states. The maximum number of possible wall-percepts in each state is 16 (2^4), so each belief state has four actions, each with up to 16 nondeterministic successors.
b. Assuming the external walls are known, there are two internal walls and hence 2^2 = 4 possible percepts.
c. The initial null action leads to four possible belief states, as shown in Figure S4.4. From each belief state, the agent chooses a single action which can lead to up to 8 belief states (on entering the middle square). Given the possibility of having to retrace its steps at a dead end, the agent can explore the entire maze in no more than 18 steps, so the complete plan (expressed as a tree) has no more than 8^18 nodes. On the other hand, there are just 3^12 reachable belief states, so the plan could be expressed more concisely as a table of actions indexed by belief state (a policy in the terminology of Chapter 17).
Figure S4.4 The 3 × 3 maze exploration problem: the initial state, first percept, and one selected action with its perceptual outcomes.

4.13 Hill climbing is surprisingly effective at finding reasonable if not optimal paths for very little computational cost, and seldom fails in two dimensions.
a. It is possible (see Figure S4.5(a)) but very unlikely: the obstacle has to have an unusual shape and be positioned correctly with respect to the goal.
b. With nonconvex obstacles, getting stuck is much more likely to be a problem (see Figure S4.5(b)).
c. Notice that this is just depth-limited search, where you choose a step along the best path even if it is not a solution.
d. Set k to the maximum number of sides of any polygon and you can always escape.
e. LRTA* always makes a move, but may move back if the old state looks better than the new state. But then the old state is penalized for the cost of the trip, so eventually the local minimum fills up and the agent escapes.
Figure S4.5 (a) Getting stuck with a convex obstacle. (b) Getting stuck with a nonconvex obstacle.

4.14 Since we can observe successor states, we always know how to backtrack to a previous state. This means we can adapt iterative deepening search to solve this problem. The only difference is that backtracking must be explicit, following the action which the agent can see leads to the previous state.
The algorithm expands the following nodes:
Depth 1: (0, 0), (1, 0), (0, 0), (−1, 0), (0, 0)
Depth 2: (0, 1), (0, 0), (0, 1), (0, 0), (1, 0), (2, 0), (1, 0), (0, 0), (1, 0), (1, 1), (1, 0), (1, 1)
Solutions for Chapter 5
Adversarial Search
5.1 The translation uses the model of the opponent OM(s) to fill in the opponent’s actions, leaving our actions to be determined by the search algorithm. Let P(s) be the state predicted to occur after the opponent has made all their moves according to OM. Note that the opponent may take multiple moves in a row before we get a move, so we need to define this recursively. We have P(s) = s if PLAYER(s) is us or TERMINAL-TEST(s) is true, otherwise P(s) = P(RESULT(s, OM(s))).
The search problem is then given by:
a. Initial state: P(S0) where S0 is the initial game state. We apply P as the opponent may play first.
b. Actions: defined as in the game by ACTIONS(s).
c. Successor function: RESULT′(s, a) = P(RESULT(s, a)).
d. Goal test: goals are terminal states.
e. Step cost: the cost of an action is zero unless the resulting state s′ is terminal, in which case its cost is M − UTILITY(s′) where M = max_s UTILITY(s). Notice that all costs are non-negative.
Notice that the state space of the search problem consists of game states where we are to play and terminal states. States where the opponent is to play have been compiled out. One might alternatively leave those states in but just have a single possible action.
Any of the search algorithms of Chapter 3 can be applied. For example, depth-first search can be used to solve this problem, if all games eventually end. This is equivalent to using the minimax algorithm on the original game if OM(s) always returns the minimax move in s.
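The translation can be sketched on a toy game. Everything concrete below is an assumption made for illustration: a subtraction game in which players alternately remove one or two stones and whoever takes the last stone wins, and a hypothetical opponent model that always removes one stone. Only predict (the function P) and the step cost M − UTILITY(s′) come from the construction above.

```python
from typing import NamedTuple

class State(NamedTuple):
    stones: int
    to_move: str      # "us" or "opp"
    last: str         # who made the previous move ("" at the start)

def terminal(s):
    return s.stones == 0

def utility(s):
    return 1 if s.last == "us" else -1        # taking the last stone wins

def actions(s):
    return [k for k in (1, 2) if k <= s.stones]

def result(s, k):
    nxt = "opp" if s.to_move == "us" else "us"
    return State(s.stones - k, nxt, s.to_move)

def opponent_model(s):
    return 1                                   # hypothetical OM(s): always take one stone

def predict(s):
    """P(s): play out the opponent's modelled moves until it is our turn or the game ends."""
    while s.to_move != "us" and not terminal(s):
        s = result(s, opponent_model(s))
    return s

M = 1                                          # maximum utility in this toy game

def best_plan(s):
    """Depth-first search over our moves only; returns (cost, action sequence)."""
    if terminal(s):
        return 0, []
    best = None
    for a in actions(s):
        s2 = predict(result(s, a))             # RESULT'(s, a) = P(RESULT(s, a))
        cost, plan = best_plan(s2)
        if terminal(s2):
            cost += M - utility(s2)            # only the final step carries a cost
        if best is None or cost < best[0]:
            best = (cost, [a] + plan)
    return best
```

A call such as best_plan(predict(State(4, "us", ""))) searches a space that contains only states where we are to move plus terminal states, and a returned cost of 0 means that the plan wins against the modelled opponent.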
5.2
a. Initial state: two arbitrary 8-puzzle states. Successor function: one move on an unsolved puzzle. (You could also have actions that change both puzzles at the same time; this is OK but technically you have to say what happens when one is solved but not the other.) Goal test: both puzzles in goal state. Path cost: 1 per move.
b. Each puzzle has 9!/2 reachable states (remember that half the states are unreachable). The joint state space has (9!)^2/4 states.
c. This is like backgammon; expectiminimax works.
d. Actually the statement in the question is not true (it applies to a previous version of part (c) in which the opponent is just trying to prevent you from winning; in that case, the coin tosses will eventually allow you to solve one puzzle without interruptions). For the game described in (c), consider a state in which the coin has come up heads, say, and you get to work on a puzzle that is 2 steps from the goal. Should you move one step closer? If you do, your opponent wins if he tosses heads; or if he tosses tails, you toss tails, and he tosses heads; or any sequence where both toss tails n times and then he tosses heads. So his probability of winning is at least 1/2 + 1/8 + 1/32 + ··· = 2/3. So it seems you’re better off moving away from the goal. (There’s no way to stay the same distance from the goal.) This problem unintentionally seems to have the same kind of solution as suicide tic-tac-toe with passing.
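The series in part (d) is geometric with first term 1/2 and ratio 1/4, so its sum is (1/2)/(1 − 1/4) = 2/3. A small exact-arithmetic check of the partial sums follows; the function name is an assumption made for this sketch.

```python
from fractions import Fraction

def opponent_win_lower_bound(terms):
    """Partial sum of 1/2 + 1/8 + 1/32 + ... using the given number of terms."""
    return sum(Fraction(1, 2) * Fraction(1, 4) ** n for n in range(terms))
```

The gap between 2/3 and the partial sum with N terms is exactly (2/3)(1/4)^N, so the bound is approached very quickly.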
5.3
a. See Figure S5.1; the values are just (minus) the number of steps along the path from the root.
b. See Figure S5.1; note that there is both an upper bound and a lower bound for the left child of the root.
c. See figure.
d. The shortest-path length between the two players is a lower bound on the total capture time (here the players take turns, so no need to divide by two), so the “?” leaves have a capture time greater than or equal to the sum of the cost from the root and the shortest-path length. Notice that this bound is derived when the Evader plays very badly. The true value of a node comes from best play by both players, so we can get better bounds by assuming better play. For example, we can get a better bound from the cost when the Evader simply moves backwards and forwards rather than moving towards the Pursuer.
e. See figure (we have used the simple bounds). Notice that once the right child is known to have a value below –6, the remaining successors need not be considered.
f. The pursuer always wins if the tree is finite. To prove this, let the tree be rooted at the pursuer’s current node. (I.e., pick up the tree by that node and dangle all the other branches down.) The evader must either be at the root, in which case the pursuer has won, or in some subtree. The pursuer takes the branch leading to that subtree. This process repeats at most d times, where d is the maximum depth of the original subtree, until the pursuer either catches the evader or reaches a leaf node. Since the leaf has no subtrees, the evader must be at that node.

Figure S5.1 Pursuit-evasion solution tree.
5.4 The basic physical state of these games is fairly easy to describe. One important thing to remember for Scrabble and bridge is that the physical state is not accessible to all players and so cannot be provided directly to each player by the environment simulator. Particularly in bridge, each player needs to maintain some best guess (or multiple hypotheses) as to the actual state of the world. We expect to be putting some of the game implementations online as they become available.
5.5 Code not shown.
5.6 The most obvious change is that the space of actions is now continuous. For example, in pool, the cueing direction, angle of elevation, speed, and point of contact with the cue ball are all continuous quantities.
The simplest solution is just to discretize the action space and then apply standard methods. This might work for tennis (modelled crudely as alternating shots with speed and direction), but for games such as pool and croquet it is likely to fail miserably because small changes in direction have large effects on action outcome. Instead, one must analyze the game to identify a discrete set of meaningful local goals, such as “potting the 4-ball” in pool or “laying up for the next hoop” in croquet. Then, in the current context, a local optimization routine can work out the best way to achieve each local goal, resulting in a discrete set of possible choices. Typically, these games are stochastic, so the backgammon model is appropriate provided that we use sampled outcomes instead of summing over all outcomes.
Whereas pool and croquet are modelled correctly as turn-taking games, tennis is not. While one player is moving to the ball, the other player is moving to anticipate the opponent’s return. This makes tennis more like the simultaneous-action games studied in Chapter 17. In particular, it may be reasonable to derive randomized strategies so that the opponent cannot anticipate where the ball will go.
5.7 Consider a MIN node whose children are terminal nodes. If MIN plays suboptimally, then the value of the node is greater than or equal to the value it would have if MIN played optimally. Hence, the value of the MAX node that is the MIN node’s parent can only be increased. This argument can be extended by a simple induction all the way to the root. If the suboptimal play by MIN is predictable, then one can do better than a minimax strategy. For example, if MIN always falls for a certain kind of trap and loses, then setting the trap guarantees a win even if there is actually a devastating response for MIN. This is shown in Figure S5.2.
Figure S5.2 A simple game tree showing that setting a trap for MIN by playing a1 is a win if MIN falls for it, but may also be disastrous. The minimax move is of course a2, with value −5.

Figure S5.3 The game tree for the four-square game in Exercise 5.8. Terminal states are in single boxes, loop states in double boxes. Each state is annotated with its minimax value in a circle.

5.8
a. (5) The game tree, complete with annotations of all minimax values, is shown in Figure S5.3.
b. (5) The “?” values are handled by assuming that an agent with a choice between winning the game and entering a “?” state will always choose the win. That is, min(–1, ?) is –1 and max(+1, ?) is +1. If all successors are “?”, the backed-up value is “?”.
c. (5) Standard minimax is depth-first and would go into an infinite loop. It can be fixed by comparing the current state against the stack; and if the state is repeated, then return a “?” value. Propagation of “?” values is handled as above. Although it works in this case, it does not always work because it is not clear how to compare “?” with a drawn position; nor is it clear how to handle the comparison when there are wins of different degrees (as in backgammon). Finally, in games with chance nodes, it is unclear how to compute the average of a number and a “?”. Note that it is not correct to treat repeated states automatically as drawn positions; in this example, both (1,4) and (2,4) repeat in the tree but they are won positions.
What is really happening is that each state has a well-defined but initially unknown value. These unknown values are related by the minimax equation at the bottom of page 164. If the game tree is acyclic, then the minimax algorithm solves these equations by propagating from the leaves. If the game tree has cycles, then a dynamic programming method must be used, as explained in Chapter 17. (Exercise 17.7 studies this problem in particular.) These algorithms can determine whether each node has a well-determined value (as in this example) or is really an infinite loop in that both players prefer to stay in the loop (or have no choice). In such a case, the rules of the game will need to define the value (otherwise the game will never end). In chess, for example, a state that occurs 3 times (and hence is assumed to be desirable for both players) is a draw.
d. This question is a little tricky. One approach is a proof by induction on the size of the game. Clearly, the base case n = 3 is a loss for A and the base case n = 4 is a win for A. For any n > 4, the initial moves are the same: A and B both move one step towards each other. Now, we can see that they are engaged in a subgame of size n − 2 on the squares [2, ..., n − 1], except that there is an extra choice of moves on squares 2 and n − 1. Ignoring this for a moment, it is clear that if the “n − 2” game is won for A, then A gets to the square n − 1 before B gets to square 2 (by the definition of winning) and therefore gets to n before B gets to 1, hence the “n” game is won for A. By the same line of reasoning, if “n − 2” is won for B then “n” is won for B. Now, the presence of the extra moves complicates the issue, but not too much. First, the player who is slated to win the subgame [2, ..., n − 1] never moves back to his home square. If the player slated to lose the subgame does so, then it is easy to show that he is bound to lose the game itself: the other player simply moves forward and a subgame of size n − 2k is played one step closer to the loser’s home square.
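The stack-based fix described in part (c) can be sketched for the n-square game as follows. The encoding is an assumption made here: a state is the pair of token positions plus whose turn it is, “?” is represented by None, and the ordering −1 < ? < +1 implements the rule that a player takes a sure win over a “?” but prefers a “?” to a sure loss. As noted in part (c), this treatment is not correct for games in general.

```python
UNKNOWN = None          # stands for the "?" value

def rank(v):
    """Order values as -1 < ? < +1."""
    return 0 if v is UNKNOWN else v

def moves(pos, other, n):
    """Squares reachable from pos: step to an adjacent square, or jump over an adjacent opponent."""
    out = []
    for d in (-1, 1):
        t = pos + d
        if t == other:
            t += d      # jump over the opponent's token
        if 1 <= t <= n and t != other:
            out.append(t)
    return out

def value(a, b, a_to_move, n, stack=()):
    """Minimax value for A, returning UNKNOWN when a state repeats on the current path."""
    if a == n:
        return +1       # A has reached B's home square
    if b == 1:
        return -1       # B has reached A's home square
    state = (a, b, a_to_move)
    if state in stack:
        return UNKNOWN
    stack = stack + (state,)
    if a_to_move:
        return max((value(t, b, False, n, stack) for t in moves(a, b, n)), key=rank)
    return min((value(a, t, True, n, stack) for t in moves(b, a, n)), key=rank)
```

For the four-square game, value(1, 4, True, 4) backs up +1 at the root, and value(1, 3, True, 3) gives −1, in agreement with the two base cases used in part (d).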
5.9 For a, there are at most 9! games. (This is the number of move sequences that fill up the board, but many wins and losses end before the board is full.) For b–e, Figure S5.4 shows the game tree, with the evaluation function values below the terminal nodes and the backed-up values to the right of the non-terminal nodes. The values imply that the best starting move for X is to take the center. The terminal nodes with a bold outline are the ones that do not need to be evaluated, assuming the optimal ordering.
5.10
a. An upper bound on the number of terminal nodes is N!, one for each ordering of squares, so an upper bound on the total number of nodes is Σ_{i=1}^{N} i!. This is not much