ROBUST ADAPTIVE DYNAMIC PROGRAMMING

YU JIANG
The MathWorks, Inc.

ZHONG-PING JIANG
New York University

Copyright © 2017 by The Institute of Electrical and Electronics Engineers, Inc.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

Library of Congress Cataloging-in-Publication Data is available.

ISBN: 978-1-119-13264-6

Printed in the United States of America.

CONTENTS

GLOSSARY

1 INTRODUCTION
1.1 From RL to RADP
1.2 Summary of Each Chapter
References

2 ADAPTIVE DYNAMIC PROGRAMMING FOR UNCERTAIN LINEAR SYSTEMS
2.1 Problem Formulation and Preliminaries
2.2 Online Policy Iteration
2.3 Learning Algorithms
2.4 Applications
2.5 Notes
References

3 SEMI-GLOBAL ADAPTIVE DYNAMIC PROGRAMMING
3.1 Problem Formulation and Preliminaries
3.2 Semi-Global Online Policy Iteration
3.3 Application
3.4 Notes
References

4 GLOBAL ADAPTIVE DYNAMIC PROGRAMMING FOR NONLINEAR POLYNOMIAL SYSTEMS
4.1 Problem Formulation and Preliminaries
4.2 Relaxed HJB Equation and Suboptimal Control
4.3 SOS-Based Policy Iteration for Polynomial Systems
4.4 Global ADP for Uncertain Polynomial Systems
4.5 Extension for Nonlinear Non-Polynomial Systems
4.6 Applications
4.7 Notes
References

5 ROBUST ADAPTIVE DYNAMIC PROGRAMMING
5.1 RADP for Partially Linear Composite Systems
5.2 RADP for Nonlinear Systems
5.3 Applications
5.4 Notes
References

6 ROBUST ADAPTIVE DYNAMIC PROGRAMMING FOR LARGE-SCALE SYSTEMS
6.1 Stability and Optimality for Large-Scale Systems
6.2 RADP for Large-Scale Systems
6.3 Extension for Systems with Unmatched Dynamic Uncertainties
6.4 Application to a Ten-Machine Power System
6.5 Notes
References

7 ROBUST ADAPTIVE DYNAMIC PROGRAMMING AS A THEORY OF SENSORIMOTOR CONTROL
7.1 ADP for Continuous-Time Stochastic Systems
7.2 RADP for Continuous-Time Stochastic Systems
7.3 Numerical Results: ADP-Based Sensorimotor Control
7.4 Numerical Results: RADP-Based Sensorimotor Control
7.5 Discussion
7.6 Notes
References

A BASIC CONCEPTS IN NONLINEAR SYSTEMS
A.1 Lyapunov Stability
A.2 ISS and the Small-Gain Theorem

B SEMIDEFINITE PROGRAMMING AND SUM-OF-SQUARES PROGRAMMING
B.1 SDP and SOSP

C PROOFS
C.1 Proof of Theorem 3.1.4
C.2 Proof of Theorem 3.2.3
References

INDEX

PREFACE AND ACKNOWLEDGMENTS

This book covers the topic of adaptive optimal control (AOC) for continuous-time systems. An adaptive optimal controller can gradually modify itself to adapt to the controlled system, and the adaptation is measured by some performance index of the closed-loop system. The study of AOC can be traced back to the 1970s, when researchers at the Los Alamos Scientific Laboratory (LASL) started to investigate the use of adaptive and optimal control techniques in buildings with solar-based temperature control. Compared with conventional adaptive control, AOC has the important ability to improve energy conservation and system performance. However, even though there are various ways in AOC to compute the optimal controller, most of the previously known approaches are model-based, in the sense that a model with a fixed structure is assumed before designing the controller. In addition, these approaches do not generalize to nonlinear models.

On the other hand, quite a few model-free, data-driven approaches for AOC have emerged in recent years. In particular, adaptive/approximate dynamic programming (ADP) is a powerful methodology that integrates the idea of reinforcement learning (RL), observed from the mammalian brain, with decision theory so that controllers for man-made systems can learn to achieve optimal performance in spite of uncertainty about the environment and the lack of detailed system models. Since the 1960s, RL has been brought to the computer science and control science literature as a way to study artificial intelligence, and it has been successfully applied to many discrete-time systems, or Markov decision processes (MDPs). However, it has always been challenging to generalize those results to the controller design of physical systems. This is mainly because the state space of a physical control system is generally continuous and unbounded, and the states are continuous in time. Therefore, the convergence and the stability properties have to be carefully studied for ADP-based approaches. The main purpose of this book is to introduce the recently developed framework, known as robust adaptive dynamic programming (RADP), for data-driven, non-model-based adaptive optimal control design for both linear and nonlinear continuous-time systems.

In addition, this book is intended to address in a systematic way the presence of dynamic uncertainty. Dynamic uncertainty exists ubiquitously in control engineering. It is primarily caused by dynamics which are part of the physical system but are either difficult to model mathematically or ignored for the sake of controller design and system analysis. Without addressing the dynamic uncertainty, controller designs based on the simplified model will most likely fail when applied to the physical system. In most of the previously developed ADP or other RL methods, it is assumed that the full-state information is always available, and therefore the system order must be known. Although this assumption excludes the existence of any dynamic uncertainty, it is apparently too strong to be realistic. For a physical model on a relatively large scale, knowing the exact number of state variables can be difficult, not to mention that not all state variables can be measured precisely. For example, consider a power grid with a main generator controlled by the utility company and small distributed generators (DGs) installed by customers. The utility company should not neglect the dynamics of the DGs, but should treat them as dynamic uncertainties when controlling the grid, such that stability, performance, and power security can always be maintained as expected.

The book is organized in four parts. First, an overview of RL, ADP, and RADP is contained in Chapter 1. Second, a few recently developed continuous-time ADP methods are introduced in Chapters 2, 3, and 4. Chapter 2 covers the topic of ADP for uncertain linear systems. Chapters 3 and 4 provide neural network-based and sum-of-squares (SOS)-based ADP methodologies to achieve semi-global and global stabilization for uncertain nonlinear continuous-time systems, respectively. Third, Chapters 5 and 6 focus on RADP for linear and nonlinear systems, with dynamic uncertainties rigorously addressed. In Chapter 5, different robustification schemes are introduced to achieve RADP. Chapter 6 further extends the RADP framework to large-scale systems and illustrates its applicability to industrial power systems. Finally, Chapter 7 applies ADP and RADP to study the sensorimotor control of humans, and the results suggest that humans may be using very similar approaches to learn to coordinate movements and to handle uncertainties in daily life.

This book makes a major departure from most existing texts covering the same topics by providing many practical examples, such as power systems and human sensorimotor control systems, to illustrate the effectiveness of our results. The book uses MATLAB in each chapter to conduct numerical simulations. MATLAB is used as a computational tool, a programming tool, and a graphical tool. Simulink, a graphical programming environment for modeling, simulating, and analyzing multidomain dynamic systems, is used in Chapter 2. The third-party MATLAB-based software packages SOSTOOLS and CVX are used in Chapters 4 and 5 to solve SOS programs and semidefinite programs (SDPs). All MATLAB programs and the Simulink model developed in this book, as well as extensions of these programs, are available at http://yu-jiang.github.io/radpbook/

The development of this book would not have been possible without the support and help of many people. The authors wish to thank Prof. Frank Lewis and Dr. Paul Werbos, whose seminal work on adaptive/approximate dynamic programming has laid down the foundation of the book. The first-named author (YJ) would like to thank his Master's Thesis adviser, Prof. Jie Huang, for guiding him into the area of nonlinear control, and Dr. Yebin Wang for offering him a summer research internship position at Mitsubishi Electric Research Laboratories, where parts of the ideas in Chapters 4 and 5 were originally inspired. The second-named author (ZPJ) would like to acknowledge his colleagues, especially Drs. Alessandro Astolfi, Lei Guo, Iven Mareels, and Frank Lewis, for many useful comments and constructive criticism on some of the research summarized in the book. He is grateful to his students for their boldness in entering the interesting yet still unpopular field of data-driven adaptive optimal control. The authors wish to thank the editors and editorial staff, in particular Mengchu Zhou, Mary Hatcher, Brady Chin, Suresh Srinivasan, and Divya Narayanan, for their efforts in publishing the book. We thank Tao Bian and Weinan Gao for collaboration on generalizations and applications of ADP based on the framework of RADP presented in this book. Finally, we thank our families for their sacrifice in adapting to our hard-to-predict working schedules that often involve dynamic uncertainties. From our family members, we have learned the importance of exploration noise in achieving the desired trade-off between robustness and optimality. The bulk of this research was accomplished while the first-named author was working toward his Ph.D. degree in the Control and Networks Lab at New York University Tandon School of Engineering. The authors wish to acknowledge the research funding support from the National Science Foundation.

Yu Jiang
Wellesley, Massachusetts

Zhong-Ping Jiang
Brooklyn, New York

ACRONYMS

ADP   Adaptive/approximate dynamic programming
AOC   Adaptive optimal control
ARE   Algebraic Riccati equation
DF    Divergent force field
DG    Distributed generator/generation
DP    Dynamic programming
GAS   Global asymptotic stability
HJB   Hamilton-Jacobi-Bellman (equation)
IOS   Input-to-output stability
ISS   Input-to-state stability
LQR   Linear quadratic regulator
MDP   Markov decision process
NF    Null-field
PE    Persistent excitation
PI    Policy iteration
RADP  Robust adaptive dynamic programming
RL    Reinforcement learning
SDP   Semidefinite programming
SOS   Sum-of-squares
SUO   Strong unboundedness observability
VF    Velocity-dependent force field
VI    Value iteration

GLOSSARY

| ⋅ |   The Euclidean norm for vectors, or the induced matrix norm for matrices
‖ ⋅ ‖   For any piecewise continuous function u : ℝ+ → ℝm, ‖u‖ = sup{|u(t)|, t ≥ 0}
⊗   Kronecker product
C¹   The set of all continuously differentiable functions
J ⊕ D   The cost for the coupled large-scale system
J ⊙ D   The cost for the decoupled large-scale system
The set of all functions in C¹ that are also positive definite and radially unbounded
(⋅)   Infinitesimal generator
ℝ   The set of all real numbers
ℝ+   The set of all non-negative real numbers
ℝ[x]d1,d2   The set of all polynomials in x ∈ ℝn with degree no less than d1 > 0 and no greater than d2
vec(⋅)   vec(A) is defined to be the mn-vector formed by stacking the columns of A ∈ ℝn×m on top of one another, that is, vec(A) = [a1ᵀ a2ᵀ ⋯ amᵀ]ᵀ, where ai ∈ ℝn, with i = 1, 2, …, m, are the columns of A
ℤ+   The set of all non-negative integers
[x]d1,d2   The vector of all C(n+d2, d2) − C(n+d1−1, d1−1) distinct monic monomials in x ∈ ℝn with degree no less than d1 > 0 and no greater than d2
∇   ∇V refers to the gradient of a differentiable function V : ℝn → ℝ

CHAPTER 1

INTRODUCTION

1.1 FROM RL TO RADP

1.1.1 Introduction to RL

Reinforcement learning (RL) was originally inspired by the learning behavior of humans and other mammals. The definition of RL varies across the literature. Indeed, learning a certain task through trial and error can be considered an example of RL. In general, an RL problem requires the existence of an agent that can interact with some unknown environment by taking actions and receiving a reward from it. Sutton and Barto referred to RL as how to map situations to actions so as to maximize a numerical reward signal [47]. Clearly, maximizing a reward is equivalent to minimizing a cost, the term used more frequently in the context of optimal control [32]. In this book, a mapping between situations and actions is called a policy, and the goal of RL is to learn an optimal policy such that a predefined cost is minimized.

As a unique learning approach, RL does not require a supervisor to teach an agent to take the optimal action. Instead, it focuses on how the agent, through interactions with the unknown environment, should modify its own actions toward the optimal one (Figure 1.1). An RL iteration generally contains two major steps. First, the agent evaluates the cost under the current policy, through interacting with the environment. This step is known as policy evaluation. Second, based on the evaluated cost, the agent adopts a new policy aiming at further reducing the cost. This is the step of policy improvement.

FIGURE 1.1 Illustration of RL. The agent takes an action to interact with the unknown environment and evaluates the resulting cost, based on which the agent can further improve the action to reduce the cost.

As an important branch of machine learning theory, RL was brought into the computer science and control science literature in the 1960s as a way to study artificial intelligence [37, 38, 54]. Since then, numerous contributions to RL, from a control perspective, have been made (see, e.g., [2, 29, 33, 34, 46, 53, 56]). Recently, AlphaGo, a computer program developed by Google DeepMind, was able to improve itself through reinforcement learning and has beaten professional human Go players [44]. It is believed that significant attention will continue to be paid to the study of reinforcement learning, since it is a promising tool for better understanding the true intelligence of the human brain.

1.1.2 Introduction to DP

On the other hand, dynamic programming (DP) [4] offers a theoretical way to solve multistage decision-making problems. However, it suffers from inherent computational complexity, also known as the curse of dimensionality [41]. Therefore, the need for approximate methods was recognized as early as the late 1950s [3]. In [15], an iterative technique called policy iteration (PI) was devised by Howard for Markov decision processes (MDPs). Also, Howard referred to the iterative method developed by Bellman [3, 4] as value iteration (VI). Computing the optimal solution through successive approximations, PI is closely related to learning methods. In 1968, Werbos pointed out that PI can be employed to perform RL [58]. Since then, many real-time RL methods for finding online optimal control policies have emerged; they are broadly called approximate/adaptive dynamic programming (ADP) [31, 33, 41, 43, 55, 60–65, 68], or neurodynamic programming [5]. The main feature of ADP [59, 61] is that it employs ideas from RL to achieve online approximation of the value function, without using knowledge of the system dynamics.

1.1.3 The Development of ADP

The development of ADP theory consists of three phases. In the first phase, ADP was extensively investigated within the communities of computer science and operations research. PI and VI are usually employed as two basic algorithms. In [46], Sutton introduced the temporal difference method. In 1989, Watkins proposed the well-known Q-learning method in his PhD thesis [56]. Q-learning shares similar features with the action-dependent heuristic dynamic programming (ADHDP) scheme proposed by Werbos in [62]. Other related research work under a discrete-time and discrete state-space Markov decision process framework can be found in [5, 6, 8, 9, 41, 42, 47, 48] and references therein. In the second phase, stability is brought into the context of ADP while real-time control problems are studied for dynamic systems. To the best of our knowledge, Lewis and his co-workers are the first who contributed to the integration of stability theory and ADP theory [33]. An essential advantage of ADP theory is that an optimal control policy can be obtained via a recursive numerical algorithm using online information without solving the Hamilton-Jacobi-Bellman (HJB) equation (for nonlinear systems) or the algebraic Riccati equation (ARE) (for linear systems), even when the system dynamics are not precisely known. Related optimal feedback control designs for linear and nonlinear dynamic systems have been proposed by several researchers over the past few years; see, for example, [7, 10, 39, 40, 50, 52, 66, 69]. While most of the previous work on ADP theory was devoted to discrete-time (DT) systems (see [31] and references therein), there has been relatively less research for the continuous-time (CT) counterpart. This is mainly because ADP is considerably more difficult for CT systems than for DT systems. Indeed, many results developed for DT systems [35] cannot be extended straightforwardly to CT systems. As a result, early attempts were made to apply Q-learning to CT systems via discretization techniques [1, 11]. However, the convergence and stability analysis of these schemes is challenging. In [40], Murray et al. proposed an implementation method which requires measurements of the derivatives of the state variables. As mentioned previously, Lewis and his co-workers proposed the first solution to stability analysis and convergence proofs for ADP-based control systems by means of linear quadratic regulator (LQR) theory [52]. A synchronous policy iteration scheme was also presented in [49]. For CT linear systems, partial knowledge of the system dynamics (i.e., the input matrix) is still required in these schemes. This restriction has been completely removed in [18]. A nonlinear variant of this method can be found in [22] and [23].

The third phase in the development of ADP theory is related to extensions of previous ADP results to nonlinear uncertain systems. Neural networks and game theory are utilized to address the presence of uncertainty and nonlinearity in control systems. See, for example, [14, 31, 50, 51, 57, 67, 69, 70]. An implicit assumption in these papers is that the system order is known and that the uncertainty is static, not dynamic. The presence of dynamic uncertainty has not been systematically addressed in the literature of ADP. By dynamic uncertainty, we refer to the mismatch between the nominal model (also referred to as the reduced-order system) and the real plant when the order of the nominal model is lower than the order of the real system. A closely related topic of research is how to account for the effect of unseen variables [60]. It is quite common that the full-state information is missing in many engineering applications and only output measurements or partial-state measurements are available. Adaptation of the existing ADP theory to this practical scenario is important yet non-trivial. Neural networks have been sought for addressing the state estimation problem [12, 28]. However, the stability analysis of the estimator/controller augmented system is by no means easy, because the total system is highly interconnected and often strongly nonlinear. The configuration of a standard ADP-based control system is shown in Figure 1.2.

FIGURE 1.2 Illustration of the ADP scheme. (Blocks shown: unknown environment, value network, policy network, action.)

Our recent work [17, 19, 20, 21] on the development of robust ADP (RADP for short) theory is targeted exactly at addressing these challenges.

1.1.4 What Is RADP?

RADP is developed to address the presence of dynamic uncertainty in linear and nonlinear dynamical systems. See Figure 1.3 for an illustration. There are several reasons for which we pursue a new framework for RADP. First and foremost, it is well known that building an exact mathematical model for physical systems is often a hard task. Also, even if the exact mathematical model can be obtained for some particular engineering and biological applications, simplified nominal models are often more preferable for system analysis and control synthesis than the original complex system model. While we refer to the mismatch between the simplified nominal model and the original system as dynamic uncertainty here, the engineering literature often uses the term unmodeled dynamics instead. Second, observation errors may often be captured by dynamic uncertainty. From the literature of modern nonlinear control [25, 26, 30], it is known that the presence of dynamic uncertainty makes the feedback control problem extremely challenging in the context of nonlinear systems. In order to broaden the application scope of ADP theory in the presence of dynamic uncertainty, our strategy is to integrate tools from nonlinear control theory, such as Lyapunov designs, input-to-state stability theory [45], and nonlinear small-gain techniques [27]. This way, RADP becomes applicable to wide classes of uncertain dynamic systems with incomplete state information and unknown system order/dynamics.

FIGURE 1.3 In the RADP learning scheme, a new component, known as the dynamic uncertainty, is taken into consideration. (Blocks shown: unknown environment, reduced-order system, dynamic uncertainty, value network, policy network; signals: action, cost.)

Additionally, RADP can be applied to large-scale dynamic systems, as shown in our recent paper [20]. By integrating a simple version of the cyclic-small-gain theorem [36], asymptotic stability can be achieved by assigning appropriate weighting matrices for each subsystem. Further, a certain suboptimality property can be obtained. Because of several emerging applications of practical importance, such as the smart electric grid, intelligent transportation systems, and groups of mobile autonomous agents, this topic deserves further investigation from an RADP point of view. The existence of unknown parameters and/or dynamic uncertainties and the limited information of state variables give rise to challenges for the decentralized or distributed controller design of large-scale systems.

1.2 SUMMARY OF EACH CHAPTER

This book is organized as follows. Chapter 2 studies ADP for uncertain linear systems, for which the only a priori knowledge is an initial, stabilizing static state-feedback control policy. Then, via policy iteration, the optimal control policy is approximated. Two ADP methods, on-policy learning and off-policy learning, are introduced to achieve online implementation of conventional policy iteration. As a result, the optimal control policy can be approximated using online measurements, instead of the knowledge of the system dynamics.

Chapter 3 further extends the ADP methods to uncertain affine nonlinear systems. To guarantee proper approximation of the value function and the control policy, neural networks are applied. Convergence and stability properties of the nonlinear ADP method are rigorously proved. It is shown that semi-global stabilization is attainable for a general class of continuous-time nonlinear systems under the approximate optimal control policy.

Chapter 4 focuses on the theory of global adaptive dynamic programming (GADP). It aims at simultaneously improving the closed-loop system performance and achieving global asymptotic stability of the overall system at the origin. It is shown that the equality constraint used in policy evaluation can be relaxed to a sum-of-squares (SOS) constraint. Hence, an SOS-based policy iteration is formulated by relaxing the conventional policy iteration. In this new policy iteration algorithm, the control policy obtained at each iteration step is globally stabilizing. Similarly, the SOS-based policy iteration can be implemented online, without the need to identify the exact system dynamics.

Chapter 5 presents the new framework of RADP. In contrast to the ADP theory introduced in Chapters 2–4, RADP does not require all the state variables to be available, nor the system order to be known. Instead, it incorporates a subsystem, known as the dynamic uncertainty, that interacts with a simplified reduced-order model. While ADP methods are performed on the reduced model, the interactions between the dynamic uncertainty and the simplified model are studied using tools borrowed from modern nonlinear system analysis and controller design. The learning objective in RADP is to achieve optimal performance of the reduced-order model in the absence of dynamic uncertainties, and to maintain robustness of stability in the presence of the dynamic uncertainty.

Chapter 6 applies the RADP framework to solve the decentralized optimal control problem for a class of large-scale uncertain systems. In recent years, considerable attention has been paid to the stabilization of large-scale complex systems, as well as related consensus and synchronization problems. Examples of large-scale systems arise from ecosystems, transportation networks, and power systems. Often, in real-world applications, precise mathematical models are hard to build, and the model mismatch, caused by parametric and dynamic uncertainties, is thus unavoidable. This, together with the exchange of only local system information, makes the design problem challenging in the context of complex networks. In this chapter, the controller design for each subsystem only needs to utilize local state measurements, without knowing the system dynamics. By integrating a simple version of the cyclic-small-gain theorem, asymptotic stability can be achieved by assigning appropriate nonlinear gains for each subsystem.

Chapter 7 studies sensorimotor control with static and dynamic uncertainties under the framework of RADP [18, 19, 21, 24]. The linear version of RADP is extended to stochastic systems by taking into account signal-dependent noise [13], and the proposed method is applied to study the sensorimotor control problem with both static and dynamic uncertainties. Results presented in this chapter suggest that the central nervous system (CNS) may use an RADP-like learning strategy to coordinate movements and to achieve successful adaptation in the presence of static and/or dynamic uncertainties. In the absence of dynamic uncertainties, the learning strategy reduces to an ADP-like mechanism.

All the numerical simulations in this book are developed using MATLAB® R2015a. Source code is available on the webpage of the book [16].

REFERENCES

[1] L. C. Baird. Reinforcement learning in continuous time: Advantage updating. In: Proceedings of the IEEE World Congress on Computational Intelligence, Vol. 4, pp. 2448–2453, Orlando, FL, 1994.
[2] A. G. Barto, R. S. Sutton, and C. W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man and Cybernetics, 13(5):834–846, 1983.
[3] R. Bellman and S. Dreyfus. Functional approximations and dynamic programming. Mathematical Tables and Other Aids to Computation, 13(68):247–251, 1959.
[4] R. E. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.
[5] D. P. Bertsekas. Dynamic Programming and Optimal Control, 4th ed. Athena Scientific, Belmont, MA, 2007.
[6] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Nashua, NH, 1996.
[7] S. Bhasin, N. Sharma, P. Patre, and W. Dixon. Asymptotic tracking by a reinforcement learning-based adaptive critic controller. Journal of Control Theory and Applications, 9(3):400–409, 2011.
[8] V. S. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press, Cambridge, 2008.
[9] L. Busoniu, R. Babuska, B. De Schutter, and D. Ernst. Reinforcement Learning and Dynamic Programming Using Function Approximators. CRC Press, 2010.
[10] T. Dierks and S. Jagannathan. Output feedback control of a quadrotor UAV using neural networks. IEEE Transactions on Neural Networks, 21(1):50–66, 2010.
[11] K. Doya. Reinforcement learning in continuous time and space. Neural Computation, 12(1):219–245, 2000.
[12] L. A. Feldkamp and D. V. Prokhorov. Recurrent neural networks for state estimation. In: Proceedings of the Twelfth Yale Workshop on Adaptive and Learning Systems, pp. 17–22, New Haven, CT, 2003.
[13] C. M. Harris and D. M. Wolpert. Signal-dependent noise determines motor planning. Nature, 394:780–784, 1998.
[14] H. He, Z. Ni, and J. Fu. A three-network architecture for on-line learning and optimization based on adaptive dynamic programming. Neurocomputing, 78(1):3–13, 2012.
[15] R. Howard. Dynamic Programming and Markov Processes. MIT Press, Cambridge, MA, 1960.
[16] Y. Jiang and Z. P. Jiang. RADP Book. http://yu-jiang.github.io/radpbook/. Accessed May 1, 2015.
[17] Y. Jiang and Z. P. Jiang. Robust approximate dynamic programming and global stabilization with nonlinear dynamic uncertainties. In: Proceedings of the Joint 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC), pp. 115–120, Orlando, FL, 2011.
[18] Y. Jiang and Z. P. Jiang. Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics. Automatica, 48(10):2699–2704, 2012.
[19] Y. Jiang and Z. P. Jiang. Robust adaptive dynamic programming. In: D. Liu and F. Lewis, editors, Reinforcement Learning and Adaptive Dynamic Programming for Feedback Control, Chapter 13, pp. 281–302. John Wiley & Sons, 2012.
[20] Y. Jiang and Z. P. Jiang. Robust adaptive dynamic programming for large-scale systems with an application to multimachine power systems. IEEE Transactions on Circuits and Systems II: Express Briefs, 59(10):693–697, 2012.
[21] Y. Jiang and Z. P. Jiang. Robust adaptive dynamic programming with an application to power systems. IEEE Transactions on Neural Networks and Learning Systems, 24(7):1150–1156, 2013.
[22] Y. Jiang and Z. P. Jiang. Robust adaptive dynamic programming and feedback stabilization of nonlinear systems. IEEE Transactions on Neural Networks and Learning Systems, 25(5):882–893, 2014.
[23] Y. Jiang and Z. P. Jiang. Global adaptive dynamic programming for continuous-time nonlinear systems. IEEE Transactions on Automatic Control, 60(11):2917–2929, November 2015.
[24] Z. P. Jiang and Y. Jiang. Robust adaptive dynamic programming for linear and nonlinear systems: An overview. European Journal of Control, 19(5):417–425, 2013.
[25] Z. P. Jiang and I. Mareels. A small-gain control method for nonlinear cascaded systems with dynamic uncertainties. IEEE Transactions on Automatic Control, 42(3):292–308, 1997.
[26] Z. P. Jiang and L. Praly. Design of robust adaptive controllers for nonlinear systems with dynamic uncertainties. Automatica, 34(7):825–840, 1998.
[27] Z. P. Jiang, A. R. Teel, and L. Praly. Small-gain theorem for ISS systems and applications. Mathematics of Control, Signals and Systems, 7(2):95–120, 1994.
[28] Y. H. Kim and F. L. Lewis. High-Level Feedback Control with Neural Networks. World Scientific, 1998.
[29] B. Kiumarsi, F. L. Lewis, H. Modares, A. Karimpour, and M.-B. Naghibi-Sistani. Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics. Automatica, 50(4):1167–1175, 2014.
[30] M. Krstic, I. Kanellakopoulos, and P. V. Kokotovic. Nonlinear and Adaptive Control Design. John Wiley & Sons, New York, 1995.
[31] F. L. Lewis and D. Liu. Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. John Wiley & Sons, 2012.
[32] F. L. Lewis and K. G. Vamvoudakis. Reinforcement learning for partially observable dynamic processes: Adaptive dynamic programming using measured output data. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 41(1):14–25, 2011.
[33] F. L. Lewis and D. Vrabie. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits and Systems Magazine, 9(3):32–50, 2009.
[34] F. L. Lewis, D. Vrabie, and V. L. Syrmos. Optimal Control, 3rd ed. John Wiley & Sons, New York, 2012.
[35] D. Liu and D. Wang. Optimal control of unknown nonlinear discrete-time systems using the iterative globalized dual heuristic programming algorithm. In: F. Lewis and D. Liu, editors, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, pp. 52–77. John Wiley & Sons, 2012.
[36] T. Liu, D. J. Hill, and Z. P. Jiang. Lyapunov formulation of ISS cyclic-small-gain in continuous-time dynamical networks. Automatica, 47(9):2088–2093, 2011.
[37] J. Mendel and R. McLaren. Reinforcement-learning control and pattern recognition systems. In: A Prelude to Neural Networks, pp. 287–318. Prentice Hall Press, 1994.
[38] M. Minsky. Steps toward artificial intelligence. Proceedings of the IRE, 49(1):8–30, 1961.
[39] H. Modares, F. L. Lewis, and M.-B. Naghibi-Sistani. Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks. IEEE Transactions on Neural Networks and Learning Systems, 24(10):1513–1525, 2013.
[40] J. J. Murray, C. J. Cox, G. G. Lendaris, and R. Saeks. Adaptive dynamic programming. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 32(2):140–153, 2002.
[41] W. B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality. John Wiley & Sons, New York, 2007.
[42] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, Vol. 414. John Wiley & Sons, 2009.
[43] J. Si, A. G. Barto, W. B. Powell, and D. C. Wunsch (editors). Handbook of Learning and Approximate Dynamic Programming. John Wiley & Sons, Inc., Hoboken, NJ, 2004.
[44] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
[45] E. D. Sontag. Input to state stability: Basic concepts and results. In: Nonlinear and Optimal Control Theory, pp. 163–220. Springer, 2008.
[46] R. S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3(1):9–44, 1988.
[47] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. Cambridge University Press, 1998.
[48] C. Szepesvari. Reinforcement learning algorithms for MDPs. Technical Report TR09-13, Department of Computing Science, University of Alberta, Edmonton, CA, 2009.
[49] K. G. Vamvoudakis and F. L. Lewis. Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica, 46(5):878–888, 2010.
[50] K. G. Vamvoudakis and F. L. Lewis. Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton–Jacobi equations. Automatica, 47(8):1556–1569, 2011.
[51] K. G. Vamvoudakis and F. L. Lewis. Online solution of nonlinear two-player zero-sum games using synchronous policy iteration. International Journal of Robust and Nonlinear Control, 22(13):1460–1483, 2012.
[52] D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. Lewis. Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica, 45(2):477–484, 2009.
[53] D. Vrabie, K. G. Vamvoudakis, and F. L. Lewis. Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles. IET, London, 2013.
[54] M. Waltz and K. Fu. A heuristic approach to reinforcement learning control systems. IEEE Transactions on Automatic Control, 10(4):390–398, 1965.
[55] F.-Y. Wang, H. Zhang, and D. Liu. Adaptive dynamic programming: An introduction. IEEE Computational Intelligence Magazine, 4(2):39–47, 2009.
[56] C. Watkins. Learning from delayed rewards. PhD Thesis, King's College of Cambridge, 1989.
[57] Q. Wei and D. Liu. Data-driven neuro-optimal temperature control of water gas shift reaction using stable iterative adaptive dynamic programming. IEEE Transactions on Industrial Electronics, 61(11):6399–6408, November 2014.
[58] P. Werbos. The elements of intelligence. Cybernetica (Namur), (3), 1968.
[59] P. Werbos. Advanced forecasting methods for global crisis warning and models of intelligence. General Systems Yearbook, 22:25–38, 1977.
[60] P. Werbos. Reinforcement learning and approximate dynamic programming (RLADP): Foundations, common misconceptions and the challenges ahead. In: F. L. Lewis and D. Liu, editors, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, pp. 3–30. John Wiley & Sons, Hoboken, NJ, 2013.
[61] P. J. Werbos. Beyond regression: New tools for prediction and analysis in the behavioral sciences. PhD Thesis, Harvard University, 1974.
[62] P. J. Werbos. Neural networks for control and system identification. In: Proceedings of the 28th IEEE Conference on Decision and Control, pp. 260–265, Tampa, FL, 1989.
[63] P. J. Werbos. A menu of designs for reinforcement learning over time. In: W. Miller, R. Sutton, and P. Werbos, editors, Neural Networks for Control, pp. 67–95. MIT Press, Cambridge, MA, 1990.
[64] P. J. Werbos. Approximate dynamic programming for real-time control and neural modeling. In: D. White and D. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, pp. 493–525. Van Nostrand Reinhold, New York, 1992.
[65] P. J. Werbos. From ADP to the brain: Foundations, roadmap, challenges and research priorities. In: 2014 International Joint Conference on Neural Networks (IJCNN), pp. 107–111, Beijing, 2014. doi:10.1109/IJCNN.2014.6889359
[66] H. Xu, S. Jagannathan, and F. L. Lewis. Stochastic optimal control of unknown linear networked control system in the presence of random delays and packet losses. Automatica, 48(6):1017–1030, 2012.
[67] X. Xu, C. Wang, and F. L. Lewis. Some recent advances in learning and adaptation for uncertain feedback control systems. International Journal of Adaptive Control and Signal Processing, 28(3–5):201–204, 2014.
[68] H. Zhang, D. Liu, Y. Luo, and D. Wang. Adaptive Dynamic Programming for Control. Springer, London, 2013.
[69] H. Zhang, Q. Wei, and D. Liu. An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games. Automatica, 47(1):207–214, 2011.
[70] X. Zhang, H. He, H. Zhang, and Z. Wang. Optimal control for unknown discrete-time nonlinear Markov jump systems using adaptive dynamic programming. IEEE Transactions on Neural Networks and Learning Systems, 25(12):2141–2155, 2014.

CHAPTER 2

ADAPTIVE DYNAMIC PROGRAMMING FOR UNCERTAIN LINEAR SYSTEMS

This chapter presents a reinforcement learning-inspired ADP approach for finding a new class of online adaptive optimal controllers for uncertain linear systems. The only information required for feedback controller design is the dimension of the state vector x(t), the dimension of the input vector u(t), and an a priori linear state-feedback control policy that asymptotically stabilizes the system at the origin. The block diagram of such a setting is shown in Figure 2.1. The proposed approach employs the idea of ADP to iteratively solve the algebraic Riccati equation (ARE) using the online information of the state and the input, without requiring the a priori knowledge of, or identifying, the system matrices.

2.1 PROBLEM FORMULATION AND PRELIMINARIES

Consider a continuous-time linear system described by

    ẋ = Ax + Bu    (2.1)

where x ∈ ℝn is the system state, fully available for feedback control design; u ∈ ℝm is the control input; and A ∈ ℝn×n and B ∈ ℝn×m are uncertain constant matrices. In addition, the system is assumed to be stabilizable, in the sense that there exists a constant matrix K of appropriate dimensions such that A − BK is Hurwitz (i.e., all eigenvalues of A − BK are located in the open left-half plane).


FIGURE 2.1 ADP-based online learning control for uncertain linear systems. (Blocks: uncertain linear system; ADP controller; signals x(t) and u(t).)

We are interested in finding a linear quadratic regulator (LQR) in the form of

    u = −Kx    (2.2)

which minimizes the following performance index

    J = ∫₀^∞ (xᵀQx + uᵀRu) dt    (2.3)

where Q = Qᵀ ≥ 0, R = Rᵀ > 0, with (A, Q^{1/2}) observable.

According to conventional LQR optimal control theory, with (2.2), we have

    J = x(0)ᵀPx(0),  where  P = ∫₀^∞ exp((A − BK)ᵀt)(Q + KᵀRK) exp((A − BK)t) dt    (2.5)

is a finite matrix if and only if A − BK is Hurwitz. Taking the derivative of xᵀPx along the solutions of (2.1), it follows that P is the unique positive definite solution to the Lyapunov equation

    (A − BK)ᵀP + P(A − BK) + Q + KᵀRK = 0.    (2.6)

The optimal solution to the above-mentioned problem is associated with the following well-known ARE (see [29])

    AᵀP + PA + Q − PBR⁻¹BᵀP = 0    (2.7)

which has a unique real symmetric, positive definite solution P*. Once P* is obtained, the optimal feedback gain matrix K* in (2.2) is thus determined by

    K* = R⁻¹BᵀP*.    (2.8)

Since (2.7) is nonlinear in P, it is usually difficult to solve it analytically, especially for large-size matrices. Nevertheless, many efficient algorithms have been developed to numerically approximate the solution of (2.7). One such algorithm is widely known as Kleinman's algorithm [27] and is recalled below.

Theorem 2.1.1 ([27]) Let K0 ∈ ℝm×n be any stabilizing feedback gain matrix (i.e., A − BK0 is Hurwitz), and repeat the following steps for k = 0, 1, …

(1) Solve for the real symmetric positive definite solution Pk of the Lyapunov equation

    Akᵀ Pk + Pk Ak + Q + Kkᵀ R Kk = 0    (2.9)

    where Ak = A − BKk.

(2) Update the feedback gain matrix by

    Kk+1 = R⁻¹ Bᵀ Pk.    (2.10)

Then, the following properties hold:

(1) A − BKk is Hurwitz,
(2) P* ≤ Pk+1 ≤ Pk,
(3) lim_{k→∞} Kk = K*, lim_{k→∞} Pk = P*.

Proof: Consider the Lyapunov equation (2.9) with k = 0. Since A − BK0 is Hurwitz, by (2.5) we know P0 is finite and positive definite. In addition, by (2.5) and (2.9) we have P1 ≤ P0. Similarly, by (2.5) and (2.7) we obtain P* ≤ P1. Therefore, we have P* ≤ P1 ≤ P0. Since P* is positive definite and P0 is finite, P1 must be finite and positive definite. This implies that A − BK1 is Hurwitz. Repeating the above analysis for k = 1, 2, … proves Properties (1) and (2) in Theorem 2.1.1. Finally, since {Pk} is a monotonically decreasing sequence lower bounded by P*, lim_{k→∞} Pk = P∞ exists. By (2.9) and (2.10), P = P∞ satisfies (2.7), which has a unique solution. Therefore, P∞ = P*. The proof is thus complete.

The algorithm described in Theorem 2.1.1 is in fact a policy iteration method [17] for continuous-time linear systems. Indeed, given a stabilizing gain matrix Kk, (2.9) is known as the step of policy evaluation, since it evaluates the cost matrix Pk associated with the control policy. Equation (2.10), known as policy improvement, finds a new feedback gain Kk+1 based on the evaluated cost matrix Pk.

According to Theorem 2.1.1, by iteratively solving for Pk from the Lyapunov equation (2.9) and updating Kk according to (2.10), the (unique) solution to the nonlinear equation (2.7) is numerically approximated. However, in each iteration, perfect knowledge of A and B is required, because these two matrices appear explicitly in (2.9) and (2.10). In Section 2.2, we will show how this policy iteration can be implemented via reinforcement learning, without knowing A or B, or both.
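As a concrete illustration of the model-based iteration (2.9)-(2.10), the following short MATLAB sketch (ours, not from the book) runs Kleinman's algorithm on a small hypothetical system and checks the result against the direct ARE solution. It assumes the Control System Toolbox functions lyap and lqr are available; the matrices A, B, the weights Q, R, and the iteration count are arbitrary choices made only for illustration.

% Model-based Kleinman iteration (2.9)-(2.10): a minimal sketch for a
% hypothetical second-order plant. A and B are assumed known here; the
% data-driven methods of Sections 2.2-2.3 remove this requirement.
A = [0 1; -1 -2];  B = [0; 1];       % example system (A is Hurwitz)
Q = eye(2);        R = 1;
K = zeros(1, 2);                     % K0 = 0 is stabilizing since A is Hurwitz
for k = 1:10
    Ak = A - B*K;
    P  = lyap(Ak', Q + K'*R*K);      % policy evaluation, Eq. (2.9)
    K  = R\(B'*P);                   % policy improvement, Eq. (2.10)
end
[Kopt, Popt] = lqr(A, B, Q, R);      % direct ARE solution for comparison
disp(norm(P - Popt));  disp(norm(K - Kopt));   % both should be close to zero

Note that lyap(Ak', Q + K'*R*K) solves exactly the Lyapunov equation (2.9), and the two displayed norms should be near zero, reflecting convergence of (Pk, Kk) to (P*, K*).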

2.2 ONLINE POLICY ITERATION

To begin with, let us consider the following control policy

    u = −Kk x + e    (2.13)

where the time-varying signal e denotes an artificial noise, known as the exploration noise, added for the purpose of online learning.

Remark 2.2.1 Choosing the exploration noise is not a trivial task for general reinforcement learning problems and other related machine learning problems, especially for high-dimensional systems. In solving practical problems, several types of exploration noise have been adopted, such as random noise [1, 48] and exponentially decreasing probing noise [41]. For the simulations in Section 2.4, sums of sinusoidal signals with different frequencies will be used to construct the exploration noise, as in [22].

Under the control policy (2.13), the original system (2.1) can be rewritten as

    ẋ = (A − BKk)x + Be = Ak x + Be.    (2.14)

Then, taking the time derivative of xᵀPk x along the solutions of (2.14), it follows that

    d/dt (xᵀPk x) = −xᵀQk x + 2eᵀRKk+1 x    (2.15)

where Qk = Q + Kkᵀ R Kk. It is worth pointing out that in (2.15) we used (2.9) to replace the term xᵀ(AkᵀPk + PkAk)x, which depends on A and B, by the term −xᵀQk x. This new term can be measured online from real-time data along the system trajectories. Also, by (2.10), we replaced the term BᵀPk with RKk+1, in which Kk+1 is treated as another unknown matrix to be solved together with Pk. Therefore, we have removed the dependence on the system matrices A and B in (2.15), such that it becomes possible to solve simultaneously for Pk and Kk+1 using online measurements.

Now, by integrating both sides of (2.15) on any given interval [t, t + δt] and rearranging the terms, we have

    x(t + δt)ᵀPk x(t + δt) − x(t)ᵀPk x(t) = −∫_t^{t+δt} xᵀQk x dτ + 2∫_t^{t+δt} eᵀRKk+1 x dτ.    (2.16)

We call (2.16) the online policy iteration equation, for it relies on the knowledge of state measurements and the control policy being applied, instead of the system knowledge. Further, we can use (2.16) to obtain a set of equations by specifying t = tk,1, tk,2, …, tk,lk, with 0 ≤ tk,i + δt ≤ tk,i+1 and tk,i + δt ≤ tk+1,1 for all k = 0, 1, … and i = 1, 2, …, lk. These equations, based on input/state data, can then be used to solve for Pk and Kk+1. The details will be given in Section 2.3. We also consider the online policy iteration (2.16) as an on-policy learning method. This is because each time a new control policy, represented by the gain matrix Kk, is obtained, it must be implemented to generate new solutions of the closed-loop system. These new solutions are then used for evaluating the current cost and finding the new policy. To be more specific, x(t) appearing in (2.16) is the solution of (2.14), in which Kk is used to formulate the control policy.

Although on-policy iteration closely mimics biological learning, the entire adaptation process can be slow, and one needs to keep collecting online data until some convergence criterion is satisfied. In engineering applications, we are sometimes more interested in obtaining an approximate optimal solution by making full use of some finite data. This motivates us to develop an off-policy learning strategy, in which we apply an initial control policy to the system on a finite number of time intervals and collect the online measurements. Then, all iterations are conducted by repeatedly using the same online data.

To this end, consider the following system, which is the closed-loop system composed of (2.1) and an arbitrary feedback control policy u0

    ẋ = Ax + Bu0.    (2.17)

Similar to (2.16), we have

    x(t + δt)ᵀPk x(t + δt) − x(t)ᵀPk x(t) = −∫_t^{t+δt} xᵀQk x dτ + 2∫_t^{t+δt} (Kk x + u0)ᵀRKk+1 x dτ.    (2.18)

Although (2.16) and (2.18) share a very similar structure, a fundamental difference between them is that x(t) in (2.18) is generated from system (2.17), in which Kk is not involved. Therefore, the same amount of data collected on the interval [t1, tl + δt] can be used for calculating Kk, with k = 1, 2, … As a result, we call this implementation off-policy learning, in that the actual policy being used can be an arbitrary one, as long as it keeps the solutions of the overall system bounded. The on-policy and the off-policy implementations will be further discussed in Section 2.3.

2.3 LEARNING ALGORITHMS

2.3.1 On-Policy Learning

For computational simplicity, we would like to convert the unknown matrices Pk and Kk+1 into a vector. One convenient way to achieve this conversion, without losing any information, is via the Kronecker product representation [28]. One important identity we use is

    vec(ABC) = (Cᵀ ⊗ A) vec(B).    (2.19)

Therefore, we have

    xᵀPk x = (x ⊗ x)ᵀ vec(Pk)    (2.20)
    eᵀRKk+1 x = (x ⊗ Re)ᵀ vec(Kk+1).    (2.21)

Applying the above equalities (2.20)–(2.21), (2.16) can be converted to

    [x(t + δt) ⊗ x(t + δt) − x(t) ⊗ x(t)]ᵀ vec(Pk) − 2(∫_t^{t+δt} (x ⊗ Re)ᵀ dτ) vec(Kk+1) = −∫_t^{t+δt} xᵀQk x dτ.    (2.22)
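The vectorization step can be sanity-checked numerically. The snippet below (ours, using arbitrary random data) verifies identities (2.20) and (2.21) with MATLAB's kron function and the column-stacking operation A(:), which implements vec(A) as defined in the Glossary; the variable K1 stands in for Kk+1.

% Numerical check of the Kronecker-product identities (2.20)-(2.21)
n = 3;  m = 2;
x = randn(n, 1);  e = randn(m, 1);
P = randn(n);  P = (P + P')/2;        % symmetric P, as in the algorithm
R = eye(m);    K1 = randn(m, n);      % stand-ins for R and K_{k+1}
lhs1 = x'*P*x;
rhs1 = kron(x, x)'*P(:);              % (x (x) x)' vec(P), Eq. (2.20)
lhs2 = e'*R*K1*x;
rhs2 = kron(x, R*e)'*K1(:);           % (x (x) Re)' vec(K_{k+1}), Eq. (2.21)
disp([lhs1 - rhs1, lhs2 - rhs2])      % both differences are numerically zero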

As mentioned in Section 2.2, we can now apply (2.16) on multiple time intervals to obtain a set of linear equations represented in the following matrix form

    Θk [vec(Pk); vec(Kk+1)] = Ξk    (2.23)

where, for i = 1, 2, …, lk, the ith row of the data matrix Θk ∈ ℝ^{lk×(n²+mn)} is

    [ (x ⊗ x)ᵀ evaluated from tk,i to tk,i + δt,  −2 ∫_{tk,i}^{tk,i+δt} (x ⊗ Re)ᵀ dτ ]    (2.24)

and the ith entry of the data vector Ξk ∈ ℝ^{lk} is

    −∫_{tk,i}^{tk,i+δt} xᵀQk x dτ.    (2.25)

Before solving the pair (Pk, Kk+1) from (2.23), it is important to check whether the solution is unique. To this end, Assumption 2.3.1 is introduced.

Assumption 2.3.1 For each k = 0, 1, 2, …, there exists a sufficiently large integer lk > 0, such that the following rank condition holds:

    rank(Θk) = n(n+1)/2 + mn.    (2.26)

Each interval [tk,j, tk,j+1] is called a sampling interval. We need to collect enough sampled data (which means a large enough lk for each iteration step k). The choice of the exploration noise plays a vital role. In general, this rank condition can be checked computationally, but not analytically.

To satisfy the rank condition in (2.26), a good practice is to ensure that each iteration step utilizes data from at least twice as many sampling intervals as there are unknowns, that is, lk ≥ n(n + 1) + 2mn for k = 0, 1, 2, …. In addition, if the exploration noise is a periodic signal, the length of any sampling interval should be sufficiently larger than the period of the noise.
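Assuming the data matrix Θk of the current iteration has already been assembled in a variable Theta (our naming, for illustration only), the rank condition (2.26) can be verified in one line of MATLAB before attempting to solve (2.23):

% Check the rank condition (2.26) before solving for (Pk, Kk+1).
% Theta is the lk-by-(n^2 + m*n) data matrix of the current iteration.
if rank(Theta) < n*(n+1)/2 + m*n
    warning('Rank condition (2.26) not met: collect more data or enrich the exploration noise.');
end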

Remark 2.3.2 The rank condition (2.26) introduced above is essentially inspired by the persistent excitation (PE) condition in adaptive control [18, 33].

Lemma 2.3.3 Under Assumption 2.3.1, there is a unique pair (Pk, Kk+1) ∈ ℝn×n × ℝm×n satisfying (2.23) with Pk = Pkᵀ.

Proof: By (2.24), there are n(n−1)/2 duplicated columns among the first n² columns of Θk. Since Pk is symmetric, there are n(n−1)/2 duplicated entries in the first n² entries of the vector [vec(Pk); vec(Kk+1)], and the row indices of these duplicated entries match exactly the indices of the n(n−1)/2 duplicated columns in Θk. For example, if n = 2, the third column of Θk is a duplicated column because it is identical to the second column of Θk. Meanwhile, the third entry in [vec(Pk); vec(Kk+1)] is duplicated from the second entry in the same vector.

Under Assumption 2.3.1, the n(n+1)/2 + mn distinct columns in Θk are linearly independent. As discussed above, the indices of these independent columns are exactly the same as the row indices of the n(n+1)/2 + mn distinct elements in the vector [vec(Pk); vec(Kk+1)], provided that Pk is symmetric. Therefore, the pair (Pk, Kk+1) satisfying (2.23) with Pk = Pkᵀ must be unique.

As long as the rank condition (2.26) is satisfied, the unique pair (Pk, Kk+1) mentioned in Lemma 2.3.3 can be easily solved for in MATLAB. In particular, the following MATLAB function can be used. The four input arguments of this function are expected to be Θk, Ξk, m, and n, respectively. The two output arguments are the corresponding matrices Pk and Kk+1.

function [P, K] = PKsolver(Theta, Xi, m, n)
w = pinv(Theta)*Xi;               % Solve for w = [vec(P); vec(K)]
P = reshape(w(1:n*n), n, n);      % Reshape w(1:n^2) to get P
P = (P + P')/2;                   % Convert P to a symmetric matrix
K = reshape(w(n*n+1:end), m, n);  % Reshape w(n^2+1:end) to get K
end
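As a quick sanity check of PKsolver (our own test, not part of the book), one can construct a synthetic, full-column-rank Θ from a known pair (P, K) and confirm that the function recovers them:

% Verify PKsolver on synthetic data with a known (P, K).
n = 3;  m = 1;  l = 2*(n*n + m*n);          % more rows than unknowns
Ptrue = randn(n);  Ptrue = (Ptrue + Ptrue')/2;
Ktrue = randn(m, n);
Theta = randn(l, n*n + m*n);                % generic full-column-rank data matrix
Xi    = Theta*[Ptrue(:); Ktrue(:)];         % consistent right-hand side
[P, K] = PKsolver(Theta, Xi, m, n);
disp(norm(P - Ptrue));  disp(norm(K - Ktrue));   % both should be ~0

In the actual algorithm, Θk is assembled from the integrals in (2.24)-(2.25) and contains duplicated columns, which is why PKsolver symmetrizes P after the least-squares solve.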

Now, we are ready to give the following on-policy continuous-time ADP algorithm. A flowchart describing the algorithm is shown in Figure 2.2.

FIGURE 2.2 Flowchart of the on-policy ADP algorithm: initialization (k = 0, stabilizing K0, t0,1 = 0); online data collection (apply u = −Kk x + e and build Θk, Ξk); policy evaluation and improvement (solve Pk and Kk+1 from Θk [vec(Pk); vec(Kk+1)] = Ξk).

Algorithm 2.3.4 On-policy ADP algorithm

(1) Initialization:
    Find K0 such that A − BK0 is Hurwitz. Let k = 0 and t0,1 = 0.

(2) Online data collection:
    Apply u = −Kk x + e to the system from t = tk,1 and construct each row of the data matrices Θk and Ξk, until the rank condition (2.26) is satisfied.

(3) Policy evaluation and improvement:
    Solve for Pk = Pkᵀ and Kk+1 from (2.23).

(4) Stopping criterion:
    Terminate the exploration noise and apply u = −Kk x as the control if k ≥ 1 and

    |Pk − Pk−1| ≤ ε    (2.27)

    with ε > 0 a predefined, sufficiently small threshold. Otherwise, let tk+1,1 satisfy tk+1,1 ≥ tk,lk + δt and go to Step (2), with k ← k + 1.

Remark 2.3.5 Notice that ε > 0 is selected to balance the exploration/exploitation trade-off. In practice, a larger ε may lead to a shorter exploration time and therefore will allow the system to implement the control policy and terminate the exploration noise sooner. On the other hand, to obtain a more accurate approximation of the optimal solution, the threshold ε should be chosen small, and (2.27) should hold for several consecutive values of k. The same is true for off-policy algorithms.

Theorem 2.3.6 Let K0 ∈ ℝm×n be any stabilizing feedback gain matrix, and let (Pk, Kk+1) be a pair of matrices obtained from Algorithm 2.3.4. Then, under Assumption 2.3.1, the following properties hold:

(1) A − BKk is Hurwitz,
(2) P* ≤ Pk+1 ≤ Pk,
(3) lim_{k→∞} Kk = K*, lim_{k→∞} Pk = P*.

Proof: From (2.15), (2.16), and (2.22), one sees that the pair (Pk, Kk+1) obtained from (2.9) and (2.10) must satisfy (2.23). In addition, by Lemma 2.3.3, such a pair is unique. Therefore, the solution to (2.9) and (2.10) is the same as the solution to (2.23) for all k = 0, 1, …. The proof is thus completed by Theorem 2.1.1.
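To see how the pieces of Algorithm 2.3.4 fit together, here is a compact simulation sketch. It is our own minimal example, not the book's code: the plant, gains, noise, and step sizes are hypothetical, the integrals in (2.24)-(2.25) are approximated by a crude forward-Euler rule, and the function PKsolver from Section 2.3.1 is assumed to be saved on the MATLAB path. The matrices A and B appear only to simulate the plant; the learning equations themselves use online data alone.

% Minimal on-policy ADP sketch (Algorithm 2.3.4) for a hypothetical plant.
A = [0 1; -1 -2];  B = [0; 1];            % unknown to the learner
Q = eye(2);  R = 1;  n = 2;  m = 1;
K = zeros(m, n);                          % K0 = 0 is stabilizing (A is Hurwitz)
dt = 0.005;  Nstep = 20;                  % Euler step and steps per sampling interval
l  = 20;                                  % sampling intervals per iteration
x  = [1; -1];  t = 0;                     % initial state and time
for k = 1:8
    Qk = Q + K'*R*K;
    Theta = zeros(l, n*n + m*n);  Xi = zeros(l, 1);
    for j = 1:l
        x0 = x;  Ixq = 0;  Ixe = zeros(1, n*m);
        for s = 1:Nstep                   % simulate one sampling interval
            e   = 0.5*(sin(t) + sin(3*t) + sin(7*t) + sin(11*t));  % exploration noise
            u   = -K*x + e;               % control policy (2.13)
            Ixq = Ixq + (x'*Qk*x)*dt;               % integral of x'*Qk*x
            Ixe = Ixe + kron(x', (R*e)')*dt;        % integral of (x (x) Re)'
            x   = x + (A*x + B*u)*dt;     % forward-Euler plant update
            t   = t + dt;
        end
        Theta(j,:) = [kron(x', x') - kron(x0', x0'), -2*Ixe];  % row of Theta_k, cf. (2.24)
        Xi(j)      = -Ixq;                                     % entry of Xi_k, cf. (2.25)
    end
    [P, K] = PKsolver(Theta, Xi, m, n);   % policy evaluation and improvement
end
disp(P);  disp(K);                        % compare with lqr(A, B, Q, R)

Because of the Euler approximation, the learned (P, K) only approximately match the LQR solution; shrinking dt or lengthening the sampling intervals tightens the match, which is the practical trade-off the stopping criterion (2.27) is meant to manage.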

Remark 2.3.7 The ADP approach introduced here is related to the action-dependent heuristic dynamic programming (ADHDP) [46], or Q-learning [44], method for discrete-time systems. Indeed, it can be viewed that we solve for the following matrix Hk at each iteration step.

Another random document with no related content on Scribd:

"Yes, Pooh."

"Will you be here too?"

"Yes, Pooh, I will be, really. I promise I will be, Pooh."

"That's good," said Pooh.

"Pooh, promise you won't forget about me, ever. Not even when I'm a hundred."

Pooh thought for a little.

"How old shall I be then?"

"Ninety-nine."

Pooh nodded.

"I promise," he said.

Still with his eyes on the world Christopher Robin put out a hand and felt for Pooh's paw.

"Pooh," said Christopher Robin earnestly, "if I—if I'm not quite——" he stopped and tried again—"Pooh, whatever happens, you will understand, won't you?"

"Understand what?"

"Oh, nothing." He laughed and jumped to his feet. "Come on!"

"Where?" said Pooh.

"Anywhere," said Christopher Robin.

So they went off together. But wherever they go, and whatever happens to them on the way, in that enchanted place on the top of the Forest, a little boy and his Bear will always be playing.

BOOKS FOR BOYS AND GIRLS

with Decorations by E. H. SHEPARD:

WHEN WE WERE VERY YOUNG NOW WE ARE SIX WINNIE-THE-POOH

THE HOUSE AT POOH CORNER

THE CHRISTOPHER ROBIN STORY BOOK

SONG-BOOKS FROM THE POEMS OF A. A. MILNE with Music by H. FRASER-SIMSON: FOURTEEN SONGS

THE KING'S BREAKFAST

TEDDY BEAR AND OTHER SONGS

THE HUMS OF POOH

SONGS FROM "NOW WE ARE SIX"

E. P. DUTTON & CO., INC.

*** END OF THE PROJECT GUTENBERG EBOOK THE HOUSE AT POOH

CORNER

***

Updated editions will replace the previous one—the old editions will be renamed.

Creating the works from print editions not protected by U.S. copyright law means that no one owns a United States copyright in these works, so the Foundation (and you!) can copy and distribute it in the United States without permission and without paying copyright royalties. Special rules, set forth in the General Terms of Use part of this license, apply to copying and distributing Project Gutenberg™ electronic works to protect the PROJECT GUTENBERG™ concept and trademark. Project Gutenberg is a registered trademark, and may not be used if you charge for an eBook, except by following the terms of the trademark license, including paying royalties for use of the Project Gutenberg trademark. If you do not charge anything for copies of this eBook, complying with the trademark license is very easy. You may use this eBook for nearly any purpose such as creation of derivative works, reports, performances and research. Project Gutenberg eBooks may be modified and printed and given away—you may do practically ANYTHING in the United States with eBooks not protected by U.S. copyright law. Redistribution is subject to the trademark license, especially commercial redistribution.

START: FULL LICENSE

THE FULL PROJECT GUTENBERG LICENSE

PLEASE READ THIS BEFORE YOU DISTRIBUTE OR USE THIS WORK

To protect the Project Gutenberg™ mission of promoting the free distribution of electronic works, by using or distributing this work (or any other work associated in any way with the phrase “Project Gutenberg”), you agree to comply with all the terms of the Full Project Gutenberg™ License available with this file or online at www.gutenberg.org/license.

Section 1. General Terms of Use and Redistributing Project Gutenberg™ electronic works

1.A. By reading or using any part of this Project Gutenberg™ electronic work, you indicate that you have read, understand, agree to and accept all the terms of this license and intellectual property (trademark/copyright) agreement. If you do not agree to abide by all the terms of this agreement, you must cease using and return or destroy all copies of Project Gutenberg™ electronic works in your possession. If you paid a fee for obtaining a copy of or access to a Project Gutenberg™ electronic work and you do not agree to be bound by the terms of this agreement, you may obtain a refund from the person or entity to whom you paid the fee as set forth in paragraph 1.E.8.

1.B. “Project Gutenberg” is a registered trademark. It may only be used on or associated in any way with an electronic work by people who agree to be bound by the terms of this agreement. There are a few things that you can do with most Project Gutenberg™ electronic works even without complying with the full terms of this agreement. See paragraph 1.C below. There are a lot of things you can do with Project Gutenberg™ electronic works if you follow the terms of this agreement and help preserve free future access to Project Gutenberg™ electronic works. See paragraph 1.E below.

1.C. The Project Gutenberg Literary Archive Foundation (“the Foundation” or PGLAF), owns a compilation copyright in the collection of Project Gutenberg™ electronic works. Nearly all the individual works in the collection are in the public domain in the United States. If an individual work is unprotected by copyright law in the United States and you are located in the United States, we do not claim a right to prevent you from copying, distributing, performing, displaying or creating derivative works based on the work as long as all references to Project Gutenberg are removed. Of course, we hope that you will support the Project Gutenberg™ mission of promoting free access to electronic works by freely sharing Project Gutenberg™ works in compliance with the terms of this agreement for keeping the Project Gutenberg™ name associated with the work. You can easily comply with the terms of this agreement by keeping this work in the same format with its attached full Project Gutenberg™ License when you share it without charge with others.

1.D. The copyright laws of the place where you are located also govern what you can do with this work. Copyright laws in most countries are in a constant state of change. If you are outside the United States, check the laws of your country in addition to the terms of this agreement before downloading, copying, displaying, performing, distributing or creating derivative works based on this work or any other Project Gutenberg™ work. The Foundation makes no representations concerning the copyright status of any work in any country other than the United States.

1.E. Unless you have removed all references to Project Gutenberg:

1.E.1. The following sentence, with active links to, or other immediate access to, the full Project Gutenberg™ License must appear prominently whenever any copy of a Project Gutenberg™ work (any work on which the phrase “Project Gutenberg” appears, or with which the phrase “Project

Gutenberg” is associated) is accessed, displayed, performed, viewed, copied or distributed:

This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org. If you are not located in the United States, you will have to check the laws of the country where you are located before using this eBook.

1.E.2. If an individual Project Gutenberg™ electronic work is derived from texts not protected by U.S. copyright law (does not contain a notice indicating that it is posted with permission of the copyright holder), the work can be copied and distributed to anyone in the United States without paying any fees or charges. If you are redistributing or providing access to a work with the phrase “Project Gutenberg” associated with or appearing on the work, you must comply either with the requirements of paragraphs 1.E.1 through 1.E.7 or obtain permission for the use of the work and the Project Gutenberg™ trademark as set forth in paragraphs 1.E.8 or 1.E.9.

1.E.3. If an individual Project Gutenberg™ electronic work is posted with the permission of the copyright holder, your use and distribution must comply with both paragraphs 1.E.1 through 1.E.7 and any additional terms imposed by the copyright holder. Additional terms will be linked to the Project Gutenberg™ License for all works posted with the permission of the copyright holder found at the beginning of this work.

1.E.4. Do not unlink or detach or remove the full Project Gutenberg™ License terms from this work, or any files containing a part of this work or any other work associated with Project Gutenberg™.

1.E.5. Do not copy, display, perform, distribute or redistribute this electronic work, or any part of this electronic work, without prominently displaying the sentence set forth in paragraph 1.E.1 with active links or immediate access to the full terms of the Project Gutenberg™ License.

1.E.6. You may convert to and distribute this work in any binary, compressed, marked up, nonproprietary or proprietary form, including any word processing or hypertext form. However, if you provide access to or distribute copies of a Project Gutenberg™ work in a format other than “Plain Vanilla ASCII” or other format used in the official version posted on the official Project Gutenberg™ website (www.gutenberg.org), you must, at no additional cost, fee or expense to the user, provide a copy, a means of exporting a copy, or a means of obtaining a copy upon request, of the work in its original “Plain Vanilla ASCII” or other form. Any alternate format must include the full Project Gutenberg™ License as specified in paragraph 1.E.1.

1.E.7. Do not charge a fee for access to, viewing, displaying, performing, copying or distributing any Project Gutenberg™ works unless you comply with paragraph 1.E.8 or 1.E.9.

1.E.8. You may charge a reasonable fee for copies of or providing access to or distributing Project Gutenberg™ electronic works provided that:

• You pay a royalty fee of 20% of the gross profits you derive from the use of Project Gutenberg™ works calculated using the method you already use to calculate your applicable taxes. The fee is owed to the owner of the Project Gutenberg™ trademark, but he has agreed to donate royalties under this paragraph to the Project Gutenberg Literary Archive Foundation. Royalty payments must be paid within 60 days following each date on which you prepare (or are legally required to prepare) your periodic tax returns. Royalty payments should be clearly marked as such and sent to the Project Gutenberg Literary Archive Foundation at the address specified in Section 4, “Information about donations to the Project Gutenberg Literary Archive Foundation.”

• You provide a full refund of any money paid by a user who notifies you in writing (or by e-mail) within 30 days of receipt that s/he does not agree to the terms of the full Project Gutenberg™ License. You must require such a user to return or destroy all copies of the works possessed in a physical medium and discontinue all use of and all access to other copies of Project Gutenberg™ works.

• You provide, in accordance with paragraph 1.F.3, a full refund of any money paid for a work or a replacement copy, if a defect in the electronic work is discovered and reported to you within 90 days of receipt of the work.

• You comply with all other terms of this agreement for free distribution of Project Gutenberg™ works.

1.E.9. If you wish to charge a fee or distribute a Project Gutenberg™ electronic work or group of works on different terms than are set forth in this agreement, you must obtain permission in writing from the Project Gutenberg Literary Archive Foundation, the manager of the Project Gutenberg™ trademark. Contact the Foundation as set forth in Section 3 below.

1.F.

1.F.1. Project Gutenberg volunteers and employees expend considerable effort to identify, do copyright research on, transcribe and proofread works not protected by U.S. copyright law in creating the Project Gutenberg™ collection. Despite these efforts, Project Gutenberg™ electronic works, and the medium on which they may be stored, may contain “Defects,” such as, but not limited to, incomplete, inaccurate or corrupt data, transcription errors, a copyright or other intellectual property infringement, a defective or damaged disk or other medium, a computer virus, or computer codes that damage or cannot be read by your equipment.

1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except for the “Right of Replacement or Refund” described in paragraph 1.F.3, the Project Gutenberg Literary Archive Foundation, the owner of the Project Gutenberg™ trademark, and any other party distributing a Project Gutenberg™ electronic work under this agreement, disclaim all liability to you for damages, costs and expenses, including legal fees. YOU AGREE THAT YOU HAVE NO REMEDIES FOR NEGLIGENCE, STRICT LIABILITY, BREACH OF WARRANTY OR BREACH OF CONTRACT EXCEPT THOSE PROVIDED IN PARAGRAPH 1.F.3. YOU AGREE THAT THE FOUNDATION, THE TRADEMARK OWNER, AND ANY DISTRIBUTOR UNDER THIS AGREEMENT WILL NOT BE LIABLE TO YOU FOR ACTUAL, DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE OR INCIDENTAL DAMAGES EVEN IF YOU GIVE NOTICE OF THE POSSIBILITY OF SUCH DAMAGE.

1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you discover a defect in this electronic work within 90 days of receiving it, you can receive a refund of the money (if any) you paid for it by sending a written explanation to the person you received the work from. If you received the work on a physical medium, you must return the medium with your written explanation. The person or entity that provided you with the defective work may elect to provide a replacement copy in lieu of a refund. If you received the work electronically, the person or entity providing it to you may choose to give you a second opportunity to receive the work electronically in lieu of a refund. If the second copy is also defective, you may demand a refund in writing without further opportunities to fix the problem.

1.F.4. Except for the limited right of replacement or refund set forth in paragraph 1.F.3, this work is provided to you ‘AS-IS’, WITH NO OTHER WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.

1.F.5. Some states do not allow disclaimers of certain implied warranties or the exclusion or limitation of certain types of damages. If any disclaimer or limitation set forth in this agreement violates the law of the state applicable to this agreement, the agreement shall be interpreted to make the maximum disclaimer or limitation permitted by the applicable state law. The invalidity or unenforceability of any provision of this agreement shall not void the remaining provisions.

1.F.6. INDEMNITY - You agree to indemnify and hold the Foundation, the trademark owner, any agent or employee of the Foundation, anyone providing copies of Project Gutenberg™ electronic works in accordance with this agreement, and any volunteers associated with the production, promotion and distribution of Project Gutenberg™ electronic works, harmless from all liability, costs and expenses, including legal fees, that arise directly or indirectly from any of the following which you do or cause to occur: (a) distribution of this or any Project Gutenberg™ work, (b) alteration, modification, or additions or deletions to any Project Gutenberg™ work, and (c) any Defect you cause.

Section 2. Information about the Mission of Project Gutenberg™

Project Gutenberg™ is synonymous with the free distribution of electronic works in formats readable by the widest variety of computers including obsolete, old, middle-aged and new computers. It exists because of the efforts of hundreds of volunteers and donations from people in all walks of life.

Volunteers and financial support to provide volunteers with the assistance they need are critical to reaching Project Gutenberg™’s goals and ensuring that the Project Gutenberg™ collection will remain freely available for generations to come. In 2001, the Project Gutenberg Literary Archive Foundation was created to provide a secure and permanent future for Project Gutenberg™ and future generations. To learn more about the Project Gutenberg Literary Archive Foundation and how your efforts and donations can help, see Sections 3 and 4 and the Foundation information page at www.gutenberg.org.

Section 3. Information about the Project Gutenberg Literary Archive Foundation

The Project Gutenberg Literary Archive Foundation is a nonprofit 501(c)(3) educational corporation organized under the laws of the state of Mississippi and granted tax exempt status by the Internal Revenue Service. The Foundation’s EIN or federal tax identification number is 64-6221541. Contributions to the Project Gutenberg Literary Archive Foundation are tax deductible to the full extent permitted by U.S. federal laws and your state’s laws.

The Foundation’s business office is located at 809 North 1500 West, Salt Lake City, UT 84116, (801) 596-1887. Email contact links and up to date contact information can be found at the Foundation’s website and official page at www.gutenberg.org/contact

Section 4. Information about Donations to the Project Gutenberg Literary Archive Foundation

Project Gutenberg™ depends upon and cannot survive without widespread public support and donations to carry out its mission of increasing the number of public domain and licensed works that can be freely distributed in machine-readable form accessible by the widest array of equipment including outdated equipment. Many small donations ($1 to $5,000) are particularly important to maintaining tax exempt status with the IRS.

The Foundation is committed to complying with the laws regulating charities and charitable donations in all 50 states of the United States. Compliance requirements are not uniform and it takes a considerable effort, much paperwork and many fees to meet and keep up with these requirements. We do not solicit donations in locations where we have not received written confirmation of compliance. To SEND DONATIONS or determine the status of compliance for any particular state visit www.gutenberg.org/donate.

While we cannot and do not solicit contributions from states where we have not met the solicitation requirements, we know of no prohibition against accepting unsolicited donations from donors in such states who approach us with offers to donate.

International donations are gratefully accepted, but we cannot make any statements concerning tax treatment of donations received from outside the United States. U.S. laws alone swamp our small staff.

Please check the Project Gutenberg web pages for current donation methods and addresses. Donations are accepted in a number of other ways including checks, online payments and credit card donations. To donate, please visit: www.gutenberg.org/donate.

Section 5. General Information About Project Gutenberg™ electronic works

Professor Michael S. Hart was the originator of the Project Gutenberg™ concept of a library of electronic works that could be freely shared with anyone. For forty years, he produced and distributed Project Gutenberg™ eBooks with only a loose network of volunteer support.

Project Gutenberg™ eBooks are often created from several printed editions, all of which are confirmed as not protected by copyright in the U.S. unless a copyright notice is included. Thus, we do not necessarily keep eBooks in compliance with any particular paper edition.

Most people start at our website which has the main PG search facility: www.gutenberg.org.

This website includes information about Project Gutenberg™, including how to make donations to the Project Gutenberg Literary Archive Foundation, how to help produce our new eBooks, and how to subscribe to our email newsletter to hear about new eBooks.
