Introduction to Algorithms for Data Mining and Machine Learning
Xin-She Yang
Middlesex University
School of Science and Technology
London, United Kingdom
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
Copyright © 2019 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
ISBN: 978-0-12-817216-2
For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals
Publisher: Candice Janco
Acquisition Editor: J. Scott Bentley
Editorial Project Manager: Michael Lutz
Production Project Manager: Nilesh Kumar Shah
Designer: Miles Hitchen
Typeset by VTeX
1 Introduction to optimization
1.1 Algorithms
1.1.1 Essence of an algorithm
1.1.2 Issues with algorithms
1.1.3 Types of algorithms
1.2 Optimization
1.2.1 A simple example
1.2.2 General formulation of optimization
1.2.3 Feasible solution
1.2.4 Optimality criteria
1.3 Unconstrained optimization
1.3.1 Univariate functions
1.3.2 Multivariate functions
1.4 Nonlinear constrained optimization
1.4.1 Penalty method
1.4.2 Lagrange multipliers
1.4.3 Karush–Kuhn–Tucker conditions
1.5 Notes on software
2 Mathematical foundations
2.1 Convexity
2.1.1 Linear and affine functions
2.1.2 Convex functions
2.1.3 Mathematical operations on convex functions
2.2 Computational complexity
2.2.1 Time and space complexity
2.2.2 Complexity of algorithms
2.3 Norms and regularization
2.3.1 Norms
2.3.2 Regularization
2.4 Probability distributions
2.4.1 Random variables
2.4.2 Probability distributions
2.4.3 Conditional probability and Bayesian rule
2.4.4 Gaussian process
2.5 Bayesian network and Markov models
2.6 Monte Carlo sampling
2.6.1 Markov chain Monte Carlo
2.6.2 Metropolis–Hastings algorithm
2.6.3 Gibbs sampler
2.7 Entropy, cross entropy, and KL divergence
2.7.1 Entropy and cross entropy
2.7.2 KL divergence
2.8 Fuzzy rules
2.9 Data mining and machine learning
2.9.1 Data mining
2.9.2 Machine learning
2.10 Notes on software
3 Optimization algorithms
3.1 Gradient-based methods
3.1.1 Newton's method
3.1.2 Newton's method for multivariate functions
3.1.3 Line search
3.2 Variants of gradient-based methods
3.2.1 Stochastic gradient descent
3.2.2 Subgradient method
3.2.3 Conjugate gradient method
3.3 Optimizers in deep learning
3.4 Gradient-free methods
3.5 Evolutionary algorithms and swarm intelligence
3.5.1 Genetic algorithm
3.5.2 Differential evolution
3.5.3 Particle swarm optimization
3.5.4 Bat algorithm
3.5.5 Firefly algorithm
3.5.6 Cuckoo search
3.5.7 Flower pollination algorithm
3.6 Notes on software
4 Data fitting and regression
4.1 Sample mean and variance
4.2 Regression analysis
4.2.1 Maximum likelihood
4.2.2 Linear regression
4.2.3 Linearization
4.2.4 Generalized linear regression
4.2.5 Goodness of fit
4.3 Nonlinear least squares
4.3.1 Gauss–Newton algorithm
4.3.2 Levenberg–Marquardt algorithm
4.3.3 Weighted least squares
4.4 Overfitting and information criteria
4.5 Regularization and Lasso method
4.6 Notes on software
5 Logistic regression, PCA, LDA, and ICA
5.1 Logistic regression
5.2 Softmax regression
5.3 Principal component analysis
5.4 Linear discriminant analysis
5.5 Singular value decomposition
5.6 Independent component analysis
5.7 Notes on software
6 Data mining techniques
6.1 Introduction
6.1.1 Types of data
6.1.2 Distance metric
6.2 Hierarchy clustering
6.3 k-Nearest-neighbor algorithm
6.4 k-Means algorithm
6.5 Decision trees and random forests
6.5.1 Decision tree algorithm
6.5.2 ID3 algorithm and C4.5 classifier
6.5.3 Random forest
6.6 Bayesian classifiers
6.6.1 Naive Bayesian classifier
6.6.2 Bayesian networks
6.7 Data mining for big data
6.7.1 Characteristics of big data
6.7.2 Statistical nature of big data
6.7.3 Mining big data
6.8 Notes on software
7 Support vector machine and regression
7.1 Statistical learning theory
7.2 Linear support vector machine
7.3 Kernel functions and nonlinear SVM
7.4 Support vector regression
7.5 Notes on software
8 Neural networks and deep learning
8.1 Learning
8.2 Artificial neural networks
8.2.1 Neuron models
8.2.2 Activation models
8.2.3 Artificial neural networks
8.3 Backpropagation algorithm
8.4 Loss functions in ANN
8.5 Optimizers and choice of optimizers
8.6 Network architecture
8.7 Deep learning
8.7.1 Convolutional neural networks
8.7.2 Restricted Boltzmann machine
8.7.3 Deep neural nets
8.7.4 Trends in deep learning
8.8 Tuning of hyperparameters
8.9 Notes on software
1 Introduction to optimization
This book introduces the fundamental concepts and algorithms related to optimization, data mining, and machine learning. The main requirement is some understanding of high-school mathematics and basic calculus; however, we will review and introduce some of the mathematical foundations in the first two chapters.
1.1 Algorithms
An algorithm is an iterative, step-by-step procedure for computation. The detailed procedure can be a simple description, an equation, or a series of descriptions in combination with equations. Finding the roots of a polynomial, checking whether a natural number is prime, and generating random numbers can all be done by algorithms.
1.1.1 Essence of an algorithm
In essence, an algorithm can be written as an iterative equation or a set of iterative equations. For example, to find the square root of a > 0, we can use the following iterative equation:

x_{k+1} = (1/2)(x_k + a/x_k),    (1.1)

where k = 0, 1, 2, ... is the iteration counter and x0 is an initial guess.
Example 1
As an example, if x0 = 1 and a = 4, then we have
x1 = (1/2)(1 + 4/1) = 2.5.
Similarly, we have
x2 = (1/2)(2.5 + 4/2.5) = 2.05,  x3 ≈ 2.000610,  x4 ≈ 2.000000093,
which is very close to the true value of √4 = 2. The accuracy of this iterative formula or algorithm is high because it achieves an accuracy of at least five decimal places after four iterations.
The convergence is also very quick if we start from different initial values such as x0 = 10 or even x0 = 100. However, for an obvious reason, we cannot start with x0 = 0 due to division by zero.
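This iteration is easy to try out in a few lines of code. The following is a minimal Python sketch of Eq. (1.1) (the helper name sqrt_iter and the printing are illustrative, not from the book); it reproduces the numbers of Example 1 using only the standard library.

def sqrt_iter(a, x0=1.0, n_iter=4):
    """Approximate the square root of a using x_{k+1} = (x_k + a/x_k)/2, as in Eq. (1.1)."""
    x = x0
    for k in range(n_iter):
        x = 0.5 * (x + a / x)      # one step of Eq. (1.1)
        print(f"x{k+1} = {x:.9f}")
    return x

sqrt_iter(4.0, x0=1.0)             # prints 2.5, 2.05, 2.000609756, 2.000000093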
Finding the root of x = √a is equivalent to solving the equation
f(x) = x² − a = 0,
which is again equivalent to finding the roots of the polynomial f(x). We know that Newton's root-finding algorithm can be written as
x_{k+1} = x_k − f(x_k)/f′(x_k),    (1.6)
where f′(x) is the first derivative or gradient of f(x). In this case we have f′(x) = 2x. Thus, Newton's formula becomes
x_{k+1} = x_k − (x_k² − a)/(2x_k),
which can be written as
x_{k+1} = x_k/2 + a/(2x_k) = (1/2)(x_k + a/x_k).
This is exactly what we have in Eq. (1.1).
Newton's method has rigorous mathematical foundations and guaranteed convergence under certain conditions. Eq. (1.6) is also more general than Eq. (1.1), but it requires the gradient information f′(x). In addition, for the formula to be valid, we must have f′(x) ≠ 0.
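For readers who want to experiment with Eq. (1.6) directly, here is a small Python sketch (the function name newton_root and its default tolerances are illustrative assumptions, not from the book), applied to f(x) = x² − a with f′(x) = 2x.

def newton_root(f, fprime, x0, tol=1e-10, max_iter=50):
    """Newton's root-finding iteration x_{k+1} = x_k - f(x_k)/f'(x_k), Eq. (1.6)."""
    x = x0
    for _ in range(max_iter):
        g = fprime(x)
        if g == 0.0:                       # the formula requires f'(x) != 0
            raise ZeroDivisionError("zero derivative encountered")
        x_next = x - f(x) / g
        if abs(x_next - x) < tol:          # stop when successive iterates agree
            return x_next
        x = x_next
    return x

a = 4.0
print(newton_root(lambda x: x**2 - a, lambda x: 2.0 * x, x0=10.0))   # approximately 2.0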
1.1.2 Issues with algorithms
The advantage of the algorithm given in Eq. (1.1) is that it converges very quickly. However, careful readers may have asked: we know that √4 = ±2, so how can we find the other root −2 in addition to +2?
Even if we use different initial values such as x0 = 10 or x0 = 0.5, we can only reach x∗ = 2, not −2.
What happens if we start with a negative value such as x0 = −1? Then the iteration gives x1 = (1/2)(−1 + 4/(−1)) = −2.5 and x2 = −2.05, which is approaching −2 very quickly. If we start from x0 = −10 or x0 = −0.5, then we always get x∗ = −2, not +2.
This highlights a key issue: the final solution seems to depend on the initial starting point for this algorithm, which is true for many algorithms.
Now the relevant question is: how do we know where to start to get a particular solution? The general short answer is "we do not know". Thus, some knowledge of the problem under consideration or an educated guess may be useful to find the final solution.
In fact, most algorithms may depend on the initial configuration, and such algorithms often carry out search moves locally. Thus, this type of algorithm is often referred to as local search. A good algorithm should be able to "forget" its initial configuration, though such algorithms may not exist at all for most types of problems.
What we need in general is global search, which attempts to find final solutions that are less sensitive to the initial starting point(s).
Another important issue in our discussion is that the gradient information f′(x) is necessary for some algorithms such as Newton's method given in Eq. (1.6). This poses certain requirements on the smoothness of the function f(x). For example, we know that |x| is not differentiable at x = 0. Thus, we cannot directly use Newton's method to find the roots of f(x) = |x|x² − a = 0 for a > 0; some modifications are needed. There are other issues related to algorithms such as the setting of parameters, the slow rate of convergence, condition numbers, and iteration structures. All these make algorithm design and usage somewhat challenging, and we will discuss these issues in more detail later in this book.
1.1.3 Types of algorithms
An algorithm can only do a specific computational task (at most a class of computational tasks), and no single algorithm can do all tasks. Thus, algorithms can be classified according to their purposes. An algorithm that finds the roots of a polynomial belongs to root-finding algorithms, whereas an algorithm for ordering a set of numbers belongs to sorting algorithms. There are many classes of algorithms for different purposes. Even for the same purpose such as sorting, there are many different algorithms such as merge sort, bubble sort, quicksort, and others.
We can also categorize algorithms in terms of their characteristics. The root-finding algorithms we just introduced are deterministic algorithms because the final solutions are exactly the same if we start from the same initial guess: we obtain the same set of solutions every time we run the algorithm. On the other hand, we may introduce some randomization into the algorithm, for example, by using purely random initial points. Every time we run the algorithm, we use a new random initial guess. In this case the algorithm has some nondeterministic nature, and such algorithms are referred to as stochastic. Sometimes, using randomness may be advantageous. For example, in the example of √4 = ±2 using Eq. (1.1), random initial values (both positive and negative) allow the algorithm to find both roots. In fact, a major trend in modern metaheuristics is to use randomization to suit different purposes.
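A minimal sketch of this idea, restarting the deterministic iteration of Eq. (1.1) from random initial points (the helper names and the number of restarts are illustrative assumptions):

import random

def sqrt_iter(a, x0, n_iter=30):
    """Deterministic iteration of Eq. (1.1) from a given starting point."""
    x = x0
    for _ in range(n_iter):
        x = 0.5 * (x + a / x)
    return x

random.seed(1)
roots = set()
for _ in range(10):
    x0 = random.uniform(-10.0, 10.0)   # random restart: a new initial guess each run
    if x0 == 0.0:
        continue                        # avoid division by zero
    roots.add(round(sqrt_iter(4.0, x0), 6))
print(roots)                            # typically {-2.0, 2.0}: the sign of x0 selects the root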
In this book, we are mainly concerned with algorithms for data mining, optimization, and machine learning. We use a relatively unified approach to link algorithms for data mining and machine learning to algorithms for optimization.
1.2 Optimization
Optimization is everywhere, from engineering design to business planning. After all, time and resources are limited, and optimal use of such valuable resources is crucial. In addition, designs of products have to maximize performance, sustainability, and energy efficiency and to minimize costs. Therefore, optimization is important for many applications.
1.2.1 A simple example
Let us start with a very simple example: design a container with volume capacity V0 = 10 m³. As the main cost is related to the cost of materials, the main aim is to minimize the total surface area S.
The first thing we have to decide is the shape of the container (cylinder, cube, sphere, ellipsoid, or a more complex geometry). For simplicity, let us start with a cylindrical shape with radius r and height h (see Fig. 1.1).
The total surface area of a cylinder is
S = 2(πr²) + 2πrh,    (1.11)
and the volume is
V = πr²h.    (1.12)
There are only two design variables, r and h, and one objective function S to be minimized. Obviously, if there is no capacity constraint, then we can choose not to build the container, and then the cost of materials is zero for r = 0 and h = 0. However,
Figure 1.1 Design of a cylindrical container.
the constraint requirement means that we have to build a container with the fixed volume V0 = πr²h = 10 m³. Therefore, this optimization problem can be written as
minimize S = 2πr² + 2πrh,    (1.13)
subject to the equality constraint
πr²h = V0 = 10.
To solve this problem, we can first try to use the equality constraint to reduce the number of design variables by solving for h. So we have
h = V0/(πr²).
Substituting it into (1.13), we get
S = 2πr² + 2πr · V0/(πr²) = 2πr² + 2V0/r.
This is now a univariate function of r. From basic calculus we know that the minimum or maximum can occur at a stationary point, where the first derivative is zero, that is,
dS/dr = 4πr − 2V0/r² = 0,
which gives
r³ = V0/(2π),  or  r = (V0/(2π))^(1/3).
Thus, the height is
h = V0/(πr²) = 2πr³/(πr²) = 2r.
This means that the height is twice the radius: h = 2r. Thus, the minimum surface area is
S∗ = 2πr² + 2πrh = 2πr² + 2πr(2r) = 6πr².
For V0 = 10, we have
r = (10/(2π))^(1/3) ≈ 1.1675,  h = 2r ≈ 2.3350,
and the total surface area
S∗ = 2πr² + 2πrh ≈ 25.69.
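After eliminating h, the problem is simply the one-dimensional minimization of S(r) = 2πr² + 2V0/r for r > 0, which can also be checked numerically. This sketch assumes SciPy is available; the search interval (0.1, 10) is an arbitrary bracket chosen for illustration.

import numpy as np
from scipy.optimize import minimize_scalar

V0 = 10.0
S = lambda r: 2.0 * np.pi * r**2 + 2.0 * V0 / r    # surface area after substituting h = V0/(pi r^2)

res = minimize_scalar(S, bounds=(0.1, 10.0), method="bounded")
print(res.x, 2.0 * res.x, res.fun)                 # about 1.1675, 2.3350, 25.69, matching the analysis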
It is worth pointing out that this optimal solution is based on the assumption or requirement to design a cylindrical container. If we decide to use a sphere with radius R, we know that its volume and surface area are
V = (4/3)πR³,  S = 4πR².
We can solve for R from (4/3)πR³ = V0, that is,
R = (3V0/(4π))^(1/3),
which gives the surface area
S = 4πR² = 3^(2/3)(4π)^(1/3) V0^(2/3).
Writing the minimum cylinder area in the same form, S∗ = 6πr² = 6π/(4π²)^(1/3) · V0^(2/3), and noting that 6π/(4π²)^(1/3) ≈ 5.5358 and 3^(2/3)(4π)^(1/3) ≈ 4.83598, we have S < S∗, that is, the surface area of a sphere is smaller than the minimum surface area of a cylinder with the same volume. In fact, for the same V0 = 10, we have
S = 4πR² ≈ 22.45,
which is smaller than S∗ ≈ 25.69 for a cylinder.
This highlights the importance of the choice of design type (here in terms of shape) before we can do any truly useful optimization. Obviously, there are many other factors that can influence the choice of design, including the manufacturability of the design, stability of the structure, ease of installation, space availability, and so on. For a container, in most applications, a cylinder may be much easier to produce than a sphere, and thus the overall cost may be lower in practice. Though there are so many factors to be considered in engineering design, for the purpose of optimization, here we will only focus on the improvement and optimization of a design with well-posed mathematical formulations.
1.2.2 General formulation of optimization
Whatever the real-world applications may be, it is usually possible to formulate an optimization problem in a generic form [49,53,160]. All optimization problems with explicit objectives can in general be expressed as a nonlinearly constrained optimization problem:
maximize/minimize f(x),  x = (x1, x2, ..., xD)^T ∈ R^D,
subject to
φj(x) = 0 (j = 1, 2, ..., M),
ψk(x) ≤ 0 (k = 1, ..., N),    (1.25)
where f(x), φj(x), and ψk(x) are scalar functions of the design vector x. Here the components xi of x = (x1, ..., xD)^T are called design or decision variables, and they can be continuous, discrete, or a mixture of the two. The vector x is often called the decision vector, which varies in the D-dimensional space R^D.
It is worth pointing out that we use a column vector here for x (thus with the transpose T). We can also use a row vector x = (x1, ..., xD), and the results will be the same. Different textbooks may use slightly different formulations; once we are aware of such minor variations, they should cause no difficulty or confusion.
In addition, the function f(x) is called the objective function or cost function, the φj(x) are constraints in terms of M equalities, and the ψk(x) are constraints written as N inequalities, so there are M + N constraints in total. The optimization problem formulated here is a nonlinear constrained problem. Here the inequalities ψk(x) ≤ 0 are written as "less than"; they can also be written as "greater than" via a simple transformation, multiplying both sides by −1.
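The generic form (1.25) maps directly onto numerical solvers. As a purely illustrative sketch (the objective and constraints below are made up for demonstration, and SciPy is assumed to be available), the SLSQP method handles both equality and inequality constraints; note that SciPy expects inequalities as g(x) ≥ 0, so a constraint ψ(x) ≤ 0 is passed as −ψ(x) ≥ 0.

import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 1.0)**2 + (x[1] - 2.0)**2   # objective f(x) (illustrative)
phi = lambda x: x[0] + x[1] - 3.0                  # equality constraint: phi(x) = 0
psi = lambda x: x[0] - 0.5                         # inequality constraint: psi(x) <= 0

constraints = [
    {"type": "eq", "fun": phi},
    {"type": "ineq", "fun": lambda x: -psi(x)},    # SciPy convention: fun(x) >= 0
]

res = minimize(f, x0=np.array([0.0, 0.0]), method="SLSQP", constraints=constraints)
print(res.x, res.fun)                              # roughly [0.5, 2.5] with f = 0.5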
The space spanned by the decision variables is called the search space R^D, whereas the space formed by the values of the objective function is called the objective or response space, and sometimes the landscape. The optimization problem essentially maps the domain R^D, or the space of decision variables, into the solution space R (or the real axis in general).
The objective function f(x) can be either linear or nonlinear. If the constraints φj and ψk are all linear, it becomes a linearly constrained problem. Furthermore, when φj, ψk, and the objective function f(x) are all linear, it becomes a linear programming problem [35]. If the objective is at most quadratic with linear constraints, then it is called a quadratic programming problem. If all the values of the decision variables can only be integers, then this type of linear programming is called integer programming or integer linear programming.
On the other hand, if no constraints are specified so that xi can take any values on the real axis (or any integers), then the optimization problem is referred to as an unconstrained optimization problem.
As a very simple example of optimization problems without any constraints, we discuss the search for the maxima or minima of a univariate function.
Example 2
For example, to find the maximum of the univariate function
f(x) = x²e^(−x²),  −∞ < x < ∞,
is a simple unconstrained problem, whereas the following problem is a simple constrained minimization problem:
subject to
It is worth pointing out that the objectives are explicitly known in all the optimization problems to be discussed in this book. However, in reality it is often difficult to quantify what we want to achieve, yet we still try to optimize certain things, such as the degree of enjoyment or the quality of service on holiday. In other cases, it may be impossible to write the objective function in any explicit mathematical form.
From basic calculus we know that, for a given curve described by f(x), its gradient f′(x) describes the rate of change. When f′(x) = 0, the curve has a horizontal tangent at that particular point, which makes it a point of special interest. In fact, the maximum or minimum of a curve occurs at
f′(x∗) = 0,    (1.29)
which is a critical condition or stationary condition. The solution x∗ to this equation corresponds to a stationary point, and there may be multiple stationary points for a given curve.
To see if a stationary point x = x∗ is a maximum or a minimum, we have to use the information of the second derivative f″(x). In fact, f″(x∗) > 0 corresponds to a minimum, whereas f″(x∗) < 0 corresponds to a maximum. Let us see a concrete example.
Example 3
To find the minimum of f(x) = x²e^(−x²) (see Fig. 1.2), we have the stationary condition f′(x) = 0, or
f′(x) = 2xe^(−x²) − 2x³e^(−x²) = 2x(1 − x²)e^(−x²) = 0.
Figure 1.2 A simple multimodal function f(x) = x²e^(−x²).
Figure 1.3 (a) Feasible domain with nonlinear inequality constraints ψ1(x) and ψ2(x) (left) and linear inequality constraint ψ3(x). (b) An example with an objective of f(x) = x² subject to x ≥ 2 (right).
Since e^(−x²) > 0, we have
x(1 − x²) = 0,
or x = 0 and x = ±1.
The second derivative is given by
f″(x) = 2e^(−x²)(1 − 5x² + 2x⁴),
which is an even function with respect to x.
So at x = ±1, f″(±1) = 2[1 − 5(±1)² + 2(±1)⁴]e^(−(±1)²) = −4e^(−1) < 0. Thus there are two maxima that occur at x∗ = ±1 with fmax = e^(−1). At x = 0, we have f″(0) = 2 > 0, so the minimum of f(x) occurs at x∗ = 0 with fmin = 0.
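The derivatives and stationary points in Example 3 can also be verified symbolically. A small sketch assuming SymPy is installed:

import sympy as sp

x = sp.symbols('x', real=True)
f = x**2 * sp.exp(-x**2)

fprime = sp.simplify(sp.diff(f, x))       # 2x(1 - x^2) exp(-x^2)
fsecond = sp.simplify(sp.diff(f, x, 2))   # 2(2x^4 - 5x^2 + 1) exp(-x^2)
stationary = sp.solve(sp.Eq(fprime, 0), x)

for s in stationary:                       # x = -1, 0, 1
    print(s, fsecond.subs(x, s), f.subs(x, s))
# x = ±1: f'' = -4/e < 0 (maxima with f = 1/e); x = 0: f'' = 2 > 0 (minimum with f = 0)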
Whatever the objective is, we have to evaluate it many times. In most cases, the evaluations of the objective function consume a substantial amount of computational power (which costs money) and design time. Any efficient algorithm that can reduce the number of objective evaluations saves both time and money.
In mathematical programming there are many important concepts, and we will first introduce a few of them: feasible solutions, optimality criteria, strong local optima, and weak local optima.
1.2.3 Feasible solution
A point x that satisfies all the constraints is called a feasible point and is thus a feasible solution to the problem. The set of all feasible points is called the feasible region (see Fig. 1.3).
For example, we know that the domain of f(x) = x² consists of all real numbers. If we want to minimize f(x) without any constraint, all solutions such as x = −1, x = 1, and x = 0 are feasible. In fact, the feasible region is the whole real axis. Obviously, x = 0 corresponds to f(0) = 0 as the true minimum.
However, if we want to find the minimum of f(x) = x² subject to x ≥ 2, then it becomes a constrained optimization problem. Points such as x = 1 and x = 0 are no longer feasible because they do not satisfy x ≥ 2. In this case the feasible solutions are all the points that satisfy x ≥ 2, so x = 2, x = 100, and x = 10⁸ are all feasible. It is obvious that the minimum occurs at x = 2 with f(2) = 2² = 4, that is, the optimal solution for this problem occurs at the boundary point x = 2 (see Fig. 1.3).
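A numerical solver returns the same boundary optimum. A quick sketch, assuming SciPy is available, minimizes x² with the bound x ≥ 2:

from scipy.optimize import minimize

res = minimize(lambda x: x[0]**2, x0=[5.0], bounds=[(2.0, None)])
print(res.x, res.fun)    # about [2.0] and 4.0: the minimum sits on the boundary x = 2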
Figure 1.4 Local optima, weak optima, and global optimality.
1.2.4 Optimality criteria
A point x∗ is called a strong local maximum of the nonlinearly constrained optimization problem if f(x) is defined in a δ-neighborhood N(x∗, δ) and satisfies f(x∗) > f(u) for all u ∈ N(x∗, δ), where δ > 0 and u ≠ x∗. If x∗ is not a strong local maximum, then the inclusion of equality in the condition f(x∗) ≥ f(u) for all u ∈ N(x∗, δ) defines the point x∗ as a weak local maximum (see Fig. 1.4). The local minima can be defined in a similar manner when > and ≥ are replaced by < and ≤, respectively.
Fig. 1.4 shows various local maxima and minima. Point A is a strong local maximum, whereas point B is a weak local maximum because there are many (in fact, infinitely many) different values of x that lead to the same value of f(x∗). Point D is the global maximum, and point E is the global minimum. In addition, point F is a strong local minimum. Point C is also a strong local minimum, but it has a discontinuity in f′(x∗), so the stationary condition f′(x∗) = 0 is not valid at this point. We will not deal with these types of minima or maxima in detail.
As we briefly mentioned before, for a smooth curve f(x), optimal solutions usually occur at stationary points where f′(x) = 0. This is not always the case because optimal solutions can also occur at the boundary, as we have seen in the previous example of minimizing f(x) = x² subject to x ≥ 2. In our present discussion we will assume that both f(x) and f′(x) are always continuous, or that f(x) is everywhere twice continuously differentiable. Obviously, the information of f′(x) is not sufficient to determine whether a stationary point is a local maximum or minimum; higher-order derivatives such as f″(x) are needed, but we do not make any assumption at this stage. We will discuss this further in the next section.
1.3 Unconstrained optimization
Optimization problems can be classified as either unconstrained or constrained. Unconstrained optimization problems can in turn be subdivided into univariate and multivariate problems.
1.3.1 Univariate functions
The simplest optimization problem without any constraints is probably the search for the maxima or minima of a univariate function f(x). For unconstrained optimization problems, the optimality occurs at the critical points given by the stationary condition f′(x) = 0.
However, this stationary condition is just a necessary condition, not a sufficient one. If f′(x∗) = 0 and f″(x∗) > 0, then x∗ is a local minimum. Conversely, if f′(x∗) = 0 and f″(x∗) < 0, then it is a local maximum. However, if f′(x∗) = 0 and f″(x∗) = 0, care should be taken because f″(x) may be indefinite (both positive and negative) when x → x∗, in which case x∗ may correspond to a saddle point.
For example, for f(x) = x³ we have
f′(x) = 3x²,  f″(x) = 6x.
The stationary condition f′(x) = 3x² = 0 gives x∗ = 0. However, we also have f″(x∗) = f″(0) = 0.
In fact, f(x) = x³ has a saddle point at x∗ = 0 because f′(0) = 0, while f″ changes sign from f″(0+) > 0 to f″(0−) < 0 as x moves from positive to negative.
Example 4
For example, to find the maximum or minimum of the univariate function
f(x) = 3x⁴ − 4x³ − 12x² + 9,  −∞ < x < ∞,
we first have to find its stationary points x∗, where the first derivative f′(x) is zero, that is,
f′(x) = 12x³ − 12x² − 24x = 12(x³ − x² − 2x) = 0.
Since f′(x) = 12(x³ − x² − 2x) = 12x(x + 1)(x − 2) = 0, we have x∗ = −1, x∗ = 2, and x∗ = 0.
The second derivative of f(x) is simply
f″(x) = 36x² − 24x − 24.
From basic calculus we know that a maximum requires f″(x∗) ≤ 0, whereas a minimum requires f″(x∗) ≥ 0.
At x∗ = −1, we have
f″(−1) = 36(−1)² − 24(−1) − 24 = 36 > 0,
so this point corresponds to a local minimum
f(−1) = 3(−1)⁴ − 4(−1)³ − 12(−1)² + 9 = 4.
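As with Example 3, these stationary points and second-derivative values can be checked symbolically; a short sketch assuming SymPy:

import sympy as sp

x = sp.symbols('x', real=True)
f = 3*x**4 - 4*x**3 - 12*x**2 + 9

fprime = sp.diff(f, x)            # 12x^3 - 12x^2 - 24x
fsecond = sp.diff(f, x, 2)        # 36x^2 - 24x - 24
stationary = sp.solve(fprime, x)  # [-1, 0, 2]

for s in stationary:
    print(s, fsecond.subs(x, s), f.subs(x, s))
# x = -1: f'' = 36 > 0 and f(-1) = 4, a local minimum, in agreement with the text above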