
Introduction to Algorithms for Data Mining and Machine Learning

Xin-She Yang

Middlesex University
School of Science and Technology
London, United Kingdom

Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

Copyright © 2019 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data

A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library

ISBN: 978-0-12-817216-2

For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Candice Janco
Acquisition Editor: J. Scott Bentley
Editorial Project Manager: Michael Lutz
Production Project Manager: Nilesh Kumar Shah
Designer: Miles Hitchen

Typeset by VTeX

Contents

1 Introduction to optimization
1.1 Algorithms
1.1.1 Essence of an algorithm
1.1.2 Issues with algorithms
1.1.3 Types of algorithms
1.2 Optimization
1.2.1 A simple example
1.2.2 General formulation of optimization
1.2.3 Feasible solution
1.2.4 Optimality criteria
1.3 Unconstrained optimization
1.3.1 Univariate functions
1.3.2 Multivariate functions
1.4 Nonlinear constrained optimization
1.4.1 Penalty method
1.4.2 Lagrange multipliers
1.4.3 Karush–Kuhn–Tucker conditions
1.5 Notes on software

2 Mathematical foundations
2.1 Convexity
2.1.1 Linear and affine functions
2.1.2 Convex functions
2.1.3 Mathematical operations on convex functions
2.2 Computational complexity
2.2.1 Time and space complexity
2.2.2 Complexity of algorithms
2.3 Norms and regularization
2.3.1 Norms
2.3.2 Regularization
2.4 Probability distributions
2.4.1 Random variables
2.4.2 Probability distributions
2.4.3 Conditional probability and Bayesian rule
2.4.4 Gaussian process
2.5 Bayesian network and Markov models
2.6 Monte Carlo sampling
2.6.1 Markov chain Monte Carlo
2.6.2 Metropolis–Hastings algorithm
2.6.3 Gibbs sampler
2.7 Entropy, cross entropy, and KL divergence
2.7.1 Entropy and cross entropy
2.7.2 KL divergence
2.8 Fuzzy rules
2.9 Data mining and machine learning
2.9.1 Data mining
2.9.2 Machine learning
2.10 Notes on software

3 Optimization algorithms
3.1 Gradient-based methods
3.1.1 Newton's method
3.1.2 Newton's method for multivariate functions
3.1.3 Line search
3.2 Variants of gradient-based methods
3.2.1 Stochastic gradient descent
3.2.2 Subgradient method
3.2.3 Conjugate gradient method
3.3 Optimizers in deep learning
3.4 Gradient-free methods
3.5 Evolutionary algorithms and swarm intelligence
3.5.1 Genetic algorithm
3.5.2 Differential evolution
3.5.3 Particle swarm optimization
3.5.4 Bat algorithm
3.5.5 Firefly algorithm
3.5.6 Cuckoo search
3.5.7 Flower pollination algorithm
3.6 Notes on software

4 Data fitting and regression
4.1 Sample mean and variance
4.2 Regression analysis
4.2.1 Maximum likelihood
4.2.2 Linear regression
4.2.3 Linearization
4.2.4 Generalized linear regression
4.2.5 Goodness of fit
4.3 Nonlinear least squares
4.3.1 Gauss–Newton algorithm
4.3.2 Levenberg–Marquardt algorithm
4.3.3 Weighted least squares
4.4 Overfitting and information criteria
4.5 Regularization and Lasso method
4.6 Notes on software

5 Logistic regression, PCA, LDA, and ICA
5.1 Logistic regression
5.2 Softmax regression
5.3 Principal component analysis
5.4 Linear discriminant analysis
5.5 Singular value decomposition
5.6 Independent component analysis
5.7 Notes on software

6 Data mining techniques
6.1 Introduction
6.1.1 Types of data
6.1.2 Distance metric
6.2 Hierarchy clustering
6.3 k-Nearest-neighbor algorithm
6.4 k-Means algorithm
6.5 Decision trees and random forests
6.5.1 Decision tree algorithm
6.5.2 ID3 algorithm and C4.5 classifier
6.5.3 Random forest
6.6 Bayesian classifiers
6.6.1 Naive Bayesian classifier
6.6.2 Bayesian networks
6.7 Data mining for big data
6.7.1 Characteristics of big data
6.7.2 Statistical nature of big data
6.7.3 Mining big data
6.8 Notes on software

7 Support vector machine and regression
7.1 Statistical learning theory
7.2 Linear support vector machine
7.3 Kernel functions and nonlinear SVM
7.4 Support vector regression
7.5 Notes on software

8 Neural networks and deep learning
8.1 Learning
8.2 Artificial neural networks
8.2.1 Neuron models
8.2.2 Activation models
8.2.3 Artificial neural networks
8.3 Backpropagation algorithm
8.4 Loss functions in ANN
8.5 Optimizers and choice of optimizers
8.6 Network architecture
8.7 Deep learning
8.7.1 Convolutional neural networks
8.7.2 Restricted Boltzmann machine
8.7.3 Deep neural nets
8.7.4 Trends in deep learning
8.8 Tuning of hyperparameters
8.9 Notes on software

About the author

Xin-She Yang obtained his PhD in Applied Mathematics from the University of Oxford. He then worked at Cambridge University and the National Physical Laboratory (UK) as a Senior Research Scientist. He is now Reader at Middlesex University London and an elected Bye-Fellow at Cambridge University.

He is also the IEEE Computational Intelligence Society (CIS) Chair for the Task Force on Business Intelligence and Knowledge Management, Director of the International Consortium for Optimization and Modelling in Science and Industry (iCOMSI), and an Editor of Springer's Book Series Springer Tracts in Nature-Inspired Computing (STNIC).

With more than 20 years of research and teaching experience, he has authored 10 books and edited more than 15 books. He has published more than 200 research papers in international peer-reviewed journals and conference proceedings, with more than 36800 citations. He has been on the prestigious lists of Clarivate Analytics and Web of Science highly cited researchers in 2016, 2017, and 2018. He serves on the editorial boards of many international journals, including International Journal of Bio-Inspired Computation, Elsevier's Journal of Computational Science (JoCS), International Journal of Parallel, Emergent and Distributed Systems, and International Journal of Computer Mathematics. He is also the Editor-in-Chief of the International Journal of Mathematical Modelling and Numerical Optimisation.


Preface

Both data mining and machine learning are becoming popular subjects for university courses and industrial applications. This popularity is partly driven by the Internet and social media because they generate a huge amount of data every day, and the understanding of such big data requires sophisticated data mining techniques. In addition, many applications such as facial recognition and robotics have extensively used machine learning algorithms, leading to the increasing popularity of artificial intelligence. From a more general perspective, both data mining and machine learning are closely related to optimization. After all, in many applications we have to minimize costs, errors, energy consumption, and environmental impact and to maximize sustainability, productivity, and efficiency. Many problems in data mining and machine learning are usually formulated as optimization problems so that they can be solved by optimization algorithms. Therefore, optimization techniques are closely related to many techniques in data mining and machine learning.

Courses on data mining, machine learning, and optimization are often compulsory for students studying computer science, management science, engineering design, operations research, data science, finance, and economics. All students have to develop a certain level of data modeling skills so that they can process and interpret data for classification, clustering, curve fitting, and prediction. They should also be familiar with machine learning techniques that are closely related to data mining so as to carry out problem solving in many real-world applications. This book provides an introduction to all the major topics for such courses, covering the essential ideas of all key algorithms and techniques for data mining, machine learning, and optimization.

Though there are over a dozen good books on such topics, most of these books are either too specialized, with a specific readership, or too lengthy (often over 500 pages). This book fills the gap with a compact and concise approach, focusing on the key concepts, algorithms, and techniques at an introductory level. The main approach of this book is informal, theorem-free, and practical. By using an informal approach, all fundamental topics required for data mining and machine learning are covered, and readers can gain basic knowledge of all important algorithms with a focus on their key ideas, without worrying about any tedious, rigorous mathematical proofs. In addition, the practical approach provides about 30 worked examples in this book so that readers can see how each step of the algorithms and techniques works. Thus, readers can build their understanding and confidence gradually and in a step-by-step manner. Furthermore, with the minimal requirements of basic high-school mathematics and some basic calculus, such an informal and practical style can also enable readers to learn the contents by self-study and at their own pace.

This book is suitable for undergraduates and graduates to rapidly develop all the fundamental knowledge of data mining, machine learning, and optimization. It can also be used by students and researchers as a reference to review and refresh their knowledge in data mining, machine learning, optimization, computer science, and data science.

Xin-She Yang
January 2019 in London

Acknowledgments

I would like to thank all my students and colleagues who have given valuable feedback and comments on some of the contents and examples of this book. I also would like to thank my editors, J. Scott Bentley and Michael Lutz, and the staff at Elsevier for their professionalism. Last but not least, I thank my family for all the help and support.

Xin-She Yang

January 2019


1 Introduction to optimization

This book introduces the fundamental concepts and algorithms related to optimization, data mining, and machine learning. The main requirement is some understanding of high-school mathematics and basic calculus; however, we will review and introduce some of the mathematical foundations in the first two chapters.

1.1 Algorithms

An algorithm is an iterative, step-by-step procedure for computation. The detailed procedure can be a simple description, an equation, or a series of descriptions in combination with equations. Finding the roots of a polynomial, checking if a natural number is a prime number, and generating random numbers are all algorithms.

1.1.1 Essence of an algorithm

In essence, an algorithm can be written as an iterative equation or a set of iterative equations. For example, to find a square root of a > 0, we can use the following iterative equation:

x_{k+1} = (1/2)(x_k + a/x_k),   k = 0, 1, 2, ...,   (1.1)

where x_k is the estimate of √a at the kth iteration.

Example 1

As an example, if x0 = 1 and a = 4, then we have

x1 = (1/2)(1 + 4/1) = 2.5.

Similarly, we have

x2 = 2.05,   x3 ≈ 2.000610,   x4 ≈ 2.0000001,

which is very close to the true value of √4 = 2. The accuracy of this iterative formula or algorithm is high because it achieves an accuracy of five decimal places after four iterations.

The convergence is very quick if we start from different initial values such as x0 = 10 and even x0 = 100. However, for an obvious reason, we cannot start with x0 = 0 due to division by zero.
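For readers who want to reproduce these numbers, the following short Python sketch iterates Eq. (1.1); it is only an illustration, and the function name and iteration count are our own choices, not part of the original text.

# Iterate Eq. (1.1): x_{k+1} = (1/2)(x_k + a/x_k), starting from x0.
# A minimal sketch for reproducing Example 1; names are illustrative.
def sqrt_iteration(a, x0, n_iter=4):
    x = x0
    for k in range(n_iter):
        x = 0.5 * (x + a / x)
        print(f"x{k + 1} = {x:.9f}")
    return x

sqrt_iteration(4.0, 1.0)   # x1 = 2.5, x2 = 2.05, ..., x4 close to 2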

Finding the root of x = √a is equivalent to solving the equation

f(x) = x² − a = 0,

which is again equivalent to finding the roots of a polynomial f(x). We know that Newton's root-finding algorithm can be written as

x_{k+1} = x_k − f(x_k)/f'(x_k),   (1.6)

where f'(x) is the first derivative or gradient of f(x). In this case we have f'(x) = 2x. Thus, Newton's formula becomes

x_{k+1} = x_k − (x_k² − a)/(2x_k),

which can be written as

x_{k+1} = (1/2)(x_k + a/x_k).

This is exactly what we have in Eq. (1.1).

Newton's method has rigorous mathematical foundations and guaranteed convergence under certain conditions. However, Eq. (1.6) is more general, and the gradient information f'(x) is needed. In addition, for the formula to be valid, we must have f'(x_k) ≠ 0.
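A general Newton iteration of the form of Eq. (1.6) can be sketched in a few lines of Python. This is our own illustrative code; the function names, tolerance, and iteration cap are assumptions, not from the book.

# General Newton root finding, Eq. (1.6): x <- x - f(x)/f'(x).
def newton(f, fprime, x0, tol=1e-10, max_iter=50):
    x = x0
    for _ in range(max_iter):
        d = fprime(x)
        if d == 0.0:                 # the formula is invalid when f'(x) = 0
            raise ZeroDivisionError("zero derivative encountered")
        x_new = x - f(x) / d
        if abs(x_new - x) < tol:     # stop when successive iterates agree
            return x_new
        x = x_new
    return x

# For f(x) = x^2 - a this reduces to the square-root iteration above.
print(newton(lambda x: x**2 - 4.0, lambda x: 2.0 * x, x0=1.0))    # about  2
print(newton(lambda x: x**2 - 4.0, lambda x: 2.0 * x, x0=-1.0))   # about -2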

1.1.2 Issues with algorithms

The advantage of the algorithm given in Eq. (1.1) is that it converges very quickly. However, careful readers may have asked: we know that √4 = ±2, so how can we find the other root −2 in addition to +2?

Even if we use a different initial value such as x0 = 10 or x0 = 0.5, we can only reach x∗ = 2, not −2.

What happens if we start with a negative initial value, say x0 = −1? The iteration then gives x1 = −2.5, x2 = −2.05, and so on, which approaches −2 very quickly. If we start from x0 = −10 or x0 = −0.5, then we always get x∗ = −2, not +2.

This highlights a key issue: the final solution seems to depend on the initial starting point for this algorithm, which is true for many algorithms.

Now the relevant question is: how do we know where to start to get a particular solution? The general short answer is "we do not know". Thus, some knowledge of the problem under consideration or an educated guess may be useful to find the final solution.

In fact, most algorithms may depend on the initial configuration, and such algorithms often carry out their search moves locally. Thus, this type of algorithm is often referred to as local search. A good algorithm should be able to "forget" its initial configuration, though such algorithms may not exist at all for most types of problems.

What we need in general is a global search, which attempts to find final solutions that are less sensitive to the initial starting point(s).

Another important issue in our discussion is that the gradient information f'(x) is necessary for some algorithms such as Newton's method given in Eq. (1.6). This poses certain requirements on the smoothness of the function f(x). For example, we know that |x| is not differentiable at x = 0. Thus, we cannot directly use Newton's method to find the roots of f(x) = |x|x² − a = 0 for a > 0; some modifications are needed. There are other issues related to algorithms, such as the setting of parameters, the slow rate of convergence, condition numbers, and iteration structures. All these make algorithm design and usage somewhat challenging, and we will discuss these issues in more detail later in this book.

1.1.3 Types of algorithms

An algorithm can only do a specific computational task (at most a class of computational tasks), and no single algorithm can do all tasks. Thus, algorithms can be classified according to their purposes. An algorithm for finding the roots of a polynomial belongs to root-finding algorithms, whereas an algorithm for ranking a set of numbers belongs to sorting algorithms. There are many classes of algorithms for different purposes. Even for the same purpose such as sorting, there are many different algorithms, such as merge sort, bubble sort, quicksort, and others.

We can also categorize algorithms in terms of their characteristics. The root-finding algorithms we just introduced are deterministic algorithms because the final solutions are exactly the same if we start from the same initial guess; we obtain the same set of solutions every time we run the algorithm. On the other hand, we may introduce some randomization into the algorithm, for example, by using purely random initial points. Every time we run the algorithm, we use a new random initial guess. In this case the algorithm has some nondeterministic nature, and such algorithms are referred to as stochastic. Sometimes, using randomness may be advantageous. For example, in the example of √4 = ±2 using Eq. (1.1), random initial values (both positive and negative) allow the algorithm to find both roots. In fact, a major trend in modern metaheuristics is to use some randomization to suit different purposes.
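As a small sketch of this idea (ours, not the book's), wrapping the deterministic iteration of Eq. (1.1) in random restarts recovers both roots ±2:

import random

def sqrt_iteration(a, x0, n_iter=50):
    """Deterministic iteration of Eq. (1.1) from a given starting point."""
    x = x0
    for _ in range(n_iter):
        x = 0.5 * (x + a / x)
    return x

# Stochastic restarts: random starting points (positive and negative)
# let the same deterministic update find both roots of x^2 - 4 = 0.
roots = set()
for _ in range(20):
    x0 = random.uniform(-10.0, 10.0)
    if x0 != 0.0:                      # x0 = 0 is not allowed (division by zero)
        roots.add(round(sqrt_iteration(4.0, x0), 6))
print(roots)   # typically {2.0, -2.0}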

For the algorithms to be introduced in this book, we are mainly concerned with algorithms for data mining, optimization, and machine learning. We use a relatively unified approach to link algorithms in data mining and machine learning to algorithms for optimization.

1.2 Optimization

Optimization is everywhere, from engineering design to business planning. After all, time and resources are limited, and the optimal use of such valuable resources is crucial. In addition, designs of products have to maximize performance, sustainability, and energy efficiency and to minimize costs. Therefore, optimization is important for many applications.

1.2.1 A simple example

Let us start with a very simple example: design a container with volume capacity V0 = 10 m³. As the main cost is related to the cost of materials, the main aim is to minimize the total surface area S.

The first thing we have to decide is the shape of the container (cylinder, cube, sphere, ellipsoid, or a more complex geometry). For simplicity, let us start with a cylindrical shape with radius r and height h (see Fig. 1.1).

The total surface area of a cylinder is

S = 2(πr²) + 2πrh,   (1.11)

and the volume is

V = πr²h.   (1.12)

There are only two design variables, r and h, and one objective function S to be minimized. Obviously, if there is no capacity constraint, then we can choose not to build the container, and the cost of materials is zero for r = 0 and h = 0. However,

Figure 1.1 Design of a cylindrical container.

the constraint requirement means that we have to build a container with a fixed volume V0 = πr²h = 10 m³. Therefore, this optimization problem can be written as

minimize S = 2πr² + 2πrh,   (1.13)

subject to the equality constraint

πr²h = V0 = 10.

To solve this problem, we can first use the equality constraint to reduce the number of design variables by solving for h. So we have

h = V0/(πr²).

Substituting it into (1.13), we get

S = 2πr² + 2πr · V0/(πr²) = 2πr² + 2V0/r.

This is a univariate function. From basic calculus we know that the minimum or maximum can occur at a stationary point, where the first derivative is zero, that is,

dS/dr = 4πr − 2V0/r² = 0,

which gives

r³ = V0/(2π), or r = (V0/(2π))^(1/3).

Thus, the height is

h = V0/(πr²) = 2r.

This means that the height is twice the radius: h = 2r. Thus, the minimum surface is

S∗ = 2πr² + 2πrh = 2πr² + 2πr(2r) = 6πr².

For V0 = 10, we have

r = (10/(2π))^(1/3) ≈ 1.1675,   h = 2r ≈ 2.3350,

and the total surface area is

S∗ = 2πr² + 2πrh ≈ 25.69.
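The arithmetic above is easy to verify numerically. The sketch below is our own illustration (scipy is used only as one convenient solver); it evaluates the closed-form optimum r = (V0/(2π))^(1/3), h = 2r and checks it against a direct one-dimensional minimization of S(r) = 2πr² + 2V0/r.

import numpy as np
from scipy.optimize import minimize_scalar

V0 = 10.0
S = lambda r: 2.0 * np.pi * r**2 + 2.0 * V0 / r   # surface area after eliminating h

r_opt = (V0 / (2.0 * np.pi)) ** (1.0 / 3.0)       # closed-form stationary point
print(r_opt, 2.0 * r_opt, S(r_opt))               # r ~ 1.1675, h ~ 2.335, S* ~ 25.69

res = minimize_scalar(S, bounds=(1e-3, 10.0), method="bounded")
print(res.x, res.fun)                             # should agree with the closed form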

It is worth pointing out that this optimal solution is based on the assumption or requirement to design a cylindrical container. If we decide to use a sphere with radius R, we know that its volume and surface area are

V = (4/3)πR³,   S = 4πR².

We can solve for R from V = V0, that is, R = (3V0/(4π))^(1/3), which gives the surface area

S = 4πR² = 4π(3V0/(4π))^(2/3) = (36π)^(1/3) V0^(2/3).

Since 6π/(4π²)^(1/3) ≈ 5.5358 and (36π)^(1/3) ≈ 4.836, we have S < S∗, that is, the surface area of a sphere is smaller than the minimum surface area of a cylinder with the same volume. In fact, for the same V0 = 10, we have

S = (36π)^(1/3) × 10^(2/3) ≈ 22.45,

which is smaller than S∗ ≈ 25.69 for a cylinder.

This highlights the importance of the choice of design type (here in terms of shape) before we can do any truly useful optimization. Obviously, there are many other factors that can influence the choice of design, including the manufacturability of the design, stability of the structure, ease of installation, space availability, and so on. For a container, in most applications, a cylinder may be much easier to produce than a sphere, and thus the overall cost may be lower in practice. Though there are so many factors to be considered in engineering design, for the purpose of optimization, here we will only focus on the improvement and optimization of a design with well-posed mathematical formulations.

1.2.2 General formulation of optimization

Whatever the real-world applications may be, it is usually possible to formulate an optimization problem in a generic form [49,53,160]. All optimization problems with explicit objectives can in general be expressed as a nonlinearly constrained optimization problem:

maximize/minimize f(x),   x = (x1, x2, ..., xD)^T ∈ R^D,

subject to

φj(x) = 0 (j = 1, 2, ..., M),
ψk(x) ≤ 0 (k = 1, ..., N),   (1.25)

where f(x), φj(x), and ψk(x) are scalar functions of the design vector x. Here the components xi of x = (x1, ..., xD)^T are called design or decision variables, and they can be continuous, discrete, or a mixture of the two. The vector x is often called the decision vector, which varies in a D-dimensional space R^D.

It is worth pointing out that we use a column vector here for x (thus with transpose T). We can also use a row vector x = (x1, ..., xD), and the results will be the same. Different textbooks may use slightly different formulations. Once we are aware of such minor variations, they should cause no difficulty or confusion.

In addition, the function f(x) is called the objective function or cost function, φj(x) are constraints in terms of M equalities, and ψk(x) are constraints written as N inequalities. So there are M + N constraints in total. The optimization problem formulated here is a nonlinear constrained problem. Here the inequalities ψk(x) ≤ 0 are written as "less than", and they can also be written as "greater than" via a simple transformation, multiplying both sides by −1.

The space spanned by the decision variables is called the search space R^D, whereas the space formed by the values of the objective function is called the objective or response space, and sometimes the landscape. The optimization problem essentially maps the domain R^D, or the space of decision variables, into the solution space R (or the real axis in general).

The objective function f(x) can be either linear or nonlinear. If the constraints φj and ψk are all linear, it becomes a linearly constrained problem. Furthermore, when φj, ψk, and the objective function f(x) are all linear, it becomes a linear programming problem [35]. If the objective is at most quadratic with linear constraints, then it is called a quadratic programming problem. If all the values of the decision variables can only be integers, then this type of linear programming is called integer programming or integer linear programming.

On the other hand, if no constraints are specified, so that xi can take any values on the real axis (or any integers), then the optimization problem is referred to as an unconstrained optimization problem.
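To see how this generic form maps onto an off-the-shelf solver, here is a small illustrative sketch using scipy.optimize.minimize on a toy problem of our own; the particular objective and constraints are not an example from the book. Note that scipy expects inequality constraints in the form g(x) ≥ 0, so a constraint ψ(x) ≤ 0 is passed as −ψ(x) ≥ 0.

import numpy as np
from scipy.optimize import minimize

# Toy instance of the generic form: minimize f(x)
# subject to phi(x) = 0 (equality) and psi(x) <= 0 (inequality).
f = lambda x: x[0]**2 + x[1]**2            # objective f(x)
phi = lambda x: x[0] + x[1] - 2.0          # equality constraint phi(x) = 0
psi = lambda x: 1.0 - x[0]                 # inequality psi(x) <= 0, i.e. first variable >= 1

constraints = [
    {"type": "eq", "fun": phi},
    {"type": "ineq", "fun": lambda x: -psi(x)},   # scipy convention: g(x) >= 0
]
res = minimize(f, x0=np.array([0.0, 0.0]), constraints=constraints)
print(res.x, res.fun)   # expect x ~ (1, 1) and f ~ 2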

As a very simple example of optimization problems without any constraints, we discuss the search for the maxima or minima of a univariate function.

Figure 1.2 A simple multimodal function f(x) = x²e^(−x²).

Example 2

For example, to find the maximum of a univariate function f(x), such as the multimodal function f(x) = x²e^(−x²) shown in Fig. 1.2, over the whole real line is a simple unconstrained problem, whereas minimizing the same kind of objective subject to additional constraints on x becomes a simple constrained minimization problem.

It is worth pointing out that the objectives are explicitly known in all the optimization problems to be discussed in this book. However, in reality it is often difficult to quantify what we want to achieve, but we still try to optimize certain things, such as the degree of enjoyment or service quality on holiday. In other cases, it may be impossible to write the objective function in any explicit mathematical form.

From basic calculus we know that, for a given curve described by f(x), its gradient f'(x) describes the rate of change. When f'(x) = 0, the curve has a horizontal tangent at that particular point. This means that it becomes a point of special interest. In fact, the maximum or minimum of a curve occurs at

f'(x∗) = 0,   (1.29)

which is a critical condition or stationary condition. The solution x∗ to this equation corresponds to a stationary point, and there may be multiple stationary points for a given curve.

To see if it is a maximum or minimum at x = x∗, we have to use the information of its second derivative f''(x). In fact, f''(x∗) > 0 corresponds to a minimum, whereas f''(x∗) < 0 corresponds to a maximum. Let us see a concrete example.

Example 3

To find the minimum of f(x) = x²e^(−x²) (see Fig. 1.2), we have the stationary condition f'(x) = 0, or

f'(x) = 2x e^(−x²) − 2x³ e^(−x²) = 2x(1 − x²) e^(−x²) = 0.


Figure 1.3 (a) Feasible domain with nonlinear inequality constraints ψ1(x) and ψ2(x) (left) and linear inequality constraint ψ3(x). (b) An example with an objective of f(x) = x² subject to x ≥ 2 (right).

Since e^(−x²) > 0, we have

x(1 − x²) = 0, or x = 0 and x = ±1.

The second derivative is given by

f''(x) = 2e^(−x²)(1 − 5x² + 2x⁴),

which is an even function with respect to x.

So at x = ±1, f''(±1) = 2[1 − 5(±1)² + 2(±1)⁴]e^(−(±1)²) = −4e^(−1) < 0. Thus, there are two maxima that occur at x∗ = ±1 with fmax = e^(−1). At x = 0, we have f''(0) = 2 > 0; thus the minimum of f(x) occurs at x∗ = 0 with fmin = f(0) = 0.
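The derivatives and stationary points in Example 3 can also be checked symbolically. The following sketch is ours (sympy is just one convenient tool); it reproduces f'(x), f''(x), and the classification of x = 0 and x = ±1.

import sympy as sp

x = sp.symbols("x", real=True)
f = x**2 * sp.exp(-x**2)

f1 = sp.simplify(sp.diff(f, x))       # 2x(1 - x^2) e^(-x^2)
f2 = sp.simplify(sp.diff(f, x, 2))    # 2(1 - 5x^2 + 2x^4) e^(-x^2)
stationary = sp.solve(sp.Eq(f1, 0), x)
print(stationary)                     # [-1, 0, 1]

for s in stationary:
    # the sign of f'' classifies each stationary point; f gives its value
    print(s, f2.subs(x, s), f.subs(x, s))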

Whatever the objective is, we have to evaluate it many times. In most cases, the evaluations of the objective functions consume a substantial amount of computational power (which costs money) and design time. Any efficient algorithm that can reduce the number of objective evaluations saves both time and money.

In mathematical programming there are many important concepts, and we will first introduce a few related ones: feasible solutions, optimality criteria, the strong local optimum, and the weak local optimum.

1.2.3 Feasible solution

A point x that satisfies all the constraints is called a feasible point and thus is a feasible solution to the problem. The set of all feasible points is called the feasible region (see Fig. 1.3).

For example, we know that the domain of f(x) = x² consists of all real numbers. If we want to minimize f(x) without any constraint, all solutions such as x = −1, x = 1, and x = 0 are feasible. In fact, the feasible region is the whole real axis. Obviously, x = 0 corresponds to f(0) = 0 as the true minimum.

However, if we want to find the minimum of f(x) = x² subject to x ≥ 2, then it becomes a constrained optimization problem. Points such as x = 1 and x = 0 are no longer feasible because they do not satisfy x ≥ 2. In this case the feasible solutions are all the points that satisfy x ≥ 2, so x = 2, x = 100, and x = 10⁸ are all feasible. It is obvious that the minimum occurs at x = 2 with f(2) = 2² = 4, that is, the optimal solution for this problem occurs at the boundary point x = 2 (see Fig. 1.3).
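A quick numerical check (our own short illustration using scipy) confirms that the constrained minimizer sits on the boundary x = 2:

from scipy.optimize import minimize_scalar

# minimize f(x) = x^2 subject to x >= 2: the optimum lies on the boundary.
res = minimize_scalar(lambda x: x**2, bounds=(2.0, 1e3), method="bounded")
print(res.x, res.fun)   # approximately 2.0 and 4.0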

Figure 1.4 Local optima, weak optima, and global optimality.

1.2.4 Optimality criteria

A point x∗ is called a strong local maximum of the nonlinearly constrained optimization problem if f(x) is defined in a δ-neighborhood N(x∗, δ) and satisfies f(x∗) > f(u) for all u ∈ N(x∗, δ), where δ > 0 and u ≠ x∗. If x∗ is not a strong local maximum, then the inclusion of equality in the condition f(x∗) ≥ f(u) for all u ∈ N(x∗, δ) defines the point x∗ as a weak local maximum (see Fig. 1.4). The local minima can be defined in a similar manner when > and ≥ are replaced by < and ≤, respectively.

Fig. 1.4 shows various local maxima and minima. Point A is a strong local maximum, whereas point B is a weak local maximum because there are many (in fact, infinitely many) different values of x that lead to the same value of f(x∗). Point D is the global maximum, and point E is the global minimum. In addition, point F is a strong local minimum. Point C is also a strong local minimum, but it has a discontinuity in f'(x∗), so the stationary condition f'(x∗) = 0 is not valid at this point. We will not deal with these types of minima or maxima in detail.

As we briefly mentioned before, for a smooth curve f(x), optimal solutions usually occur at stationary points where f'(x) = 0. This is not always the case because optimal solutions can also occur at the boundary, as we have seen in the previous example of minimizing f(x) = x² subject to x ≥ 2. In our present discussion we will assume that both f(x) and f'(x) are always continuous, or that f(x) is everywhere twice continuously differentiable. Obviously, the information of f'(x) is not sufficient to determine whether a stationary point is a local maximum or minimum; higher-order derivatives such as f''(x) are needed, but we do not make any assumption at this stage. We will further discuss this in detail in the next section.

1.3 Unconstrained optimization

Optimization problems can be classified as either unconstrained or constrained. Unconstrained optimization problems can in turn be subdivided into univariate and multivariate problems.

1.3.1 Univariate functions

The simplest optimization problem without any constraints is probably the search for the maxima or minima of a univariate function f(x). For unconstrained optimization problems, the optimality occurs at the critical points given by the stationary condition f'(x) = 0.

However, this stationary condition is just a necessary condition, not a sufficient one. If f'(x∗) = 0 and f''(x∗) > 0, then x∗ is a local minimum. Conversely, if f'(x∗) = 0 and f''(x∗) < 0, then it is a local maximum. However, if f'(x∗) = 0 and f''(x∗) = 0, care should be taken because f''(x) may be indefinite (both positive and negative) as x → x∗; in that case x∗ may correspond to a saddle point.

For example, for f(x) = x³, we have

f'(x) = 3x²,   f''(x) = 6x.

The stationary condition f'(x) = 3x² = 0 gives x∗ = 0. However, we also have f''(x∗) = f''(0) = 0.

In fact, f(x) = x³ has a saddle point at x∗ = 0 because f'(0) = 0 but f changes sign from f(0+) > 0 to f(0−) < 0 as x moves from positive to negative.

Example 4

For example, to find the maximum or minimum of the univariate function

f(x) = 3x⁴ − 4x³ − 12x² + 9,   −∞ < x < ∞,

we first have to find its stationary points x∗ where the first derivative f'(x) is zero, that is,

f'(x) = 12x³ − 12x² − 24x = 12(x³ − x² − 2x) = 0.

Since f'(x) = 12(x³ − x² − 2x) = 12x(x + 1)(x − 2) = 0, we have x∗ = −1, x∗ = 2, and x∗ = 0.

The second derivative of f(x) is simply

f''(x) = 36x² − 24x − 24.

From basic calculus we know that a maximum requires f''(x∗) ≤ 0, whereas a minimum requires f''(x∗) ≥ 0.

At x∗ = −1, we have

f''(−1) = 36(−1)² − 24(−1) − 24 = 36 > 0,

so this point corresponds to a local minimum

f(−1) = 3(−1)⁴ − 4(−1)³ − 12(−1)² + 9 = 4.
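A few lines of Python (our own check, applying the same second-derivative test) confirm this and classify the remaining stationary points x = 0 and x = 2 in the same way:

def f(x):
    return 3 * x**4 - 4 * x**3 - 12 * x**2 + 9

def f2(x):
    return 36 * x**2 - 24 * x - 24          # second derivative f''(x)

for s in (-1.0, 0.0, 2.0):                  # stationary points found above
    kind = ("local minimum" if f2(s) > 0
            else "local maximum" if f2(s) < 0
            else "inconclusive")
    print(s, f(s), f2(s), kind)
# x = -1: f = 4,   f'' = 36  -> local minimum
# x =  0: f = 9,   f'' = -24 -> local maximum
# x =  2: f = -23, f'' = 72  -> local minimum (the lowest of the three values)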
