Introduction to Algorithms for Data Mining and Machine Learning
Xin-She Yang
Middlesex University
School of Science and Technology
London, United Kingdom
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
Copyright © 2019 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
ISBN: 978-0-12-817216-2
For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals
Publisher: Candice Janco
Acquisition Editor: J. Scott Bentley
Editorial Project Manager: Michael Lutz
Production Project Manager: Nilesh Kumar Shah
Designer: Miles Hitchen
Typeset by VTeX
1 Introduction to optimization
1.1 Algorithms
1.1.1 Essence of an algorithm
1.1.2 Issues with algorithms
1.1.3 Types of algorithms
1.2 Optimization
1.2.1 A simple example
1.2.2 General formulation of optimization
1.2.3 Feasible solution
1.2.4 Optimality criteria
1.3 Unconstrained optimization
1.3.1 Univariate functions
1.3.2 Multivariate functions
1.4 Nonlinear constrained optimization
1.4.1 Penalty method
1.4.2 Lagrange multipliers
1.4.3 Karush–Kuhn–Tucker conditions
1.5 Notes on software
2 Mathematical foundations
2.1 Convexity
2.1.1 Linear and affine functions
2.1.2 Convex functions
2.1.3 Mathematical operations on convex functions
2.2 Computational complexity
2.2.1 Time and space complexity
2.2.2 Complexity of algorithms
2.3 Norms and regularization
2.3.1 Norms
2.3.2 Regularization
2.4 Probability distributions
2.4.1 Random variables
2.4.2 Probability distributions
2.4.3 Conditional probability and Bayesian rule
2.4.4 Gaussian process
2.5 Bayesian network and Markov models
2.6 Monte Carlo sampling
2.6.1 Markov chain Monte Carlo
2.6.2 Metropolis–Hastings algorithm
2.6.3 Gibbs sampler
2.7 Entropy, cross entropy, and KL divergence
2.7.1 Entropy and cross entropy
2.7.2 KL divergence
2.8 Fuzzy rules
2.9 Data mining and machine learning
2.9.1 Data mining
2.9.2 Machine learning
2.10 Notes on software
3 Optimization algorithms
3.1 Gradient-based methods
3.1.1 Newton's method
3.1.2 Newton's method for multivariate functions
3.1.3 Line search
3.2 Variants of gradient-based methods
3.2.1 Stochastic gradient descent
3.2.2 Subgradient method
3.2.3 Conjugate gradient method
3.3 Optimizers in deep learning
3.4 Gradient-free methods
3.5 Evolutionary algorithms and swarm intelligence
3.5.1 Genetic algorithm
3.5.2 Differential evolution
3.5.3 Particle swarm optimization
3.5.4 Bat algorithm
3.5.5 Firefly algorithm
3.5.6 Cuckoo search
3.5.7 Flower pollination algorithm
3.6 Notes on software
4 Data fitting and regression
4.1 Sample mean and variance
4.2 Regression analysis
4.2.1 Maximum likelihood
4.2.2 Linear regression
4.2.3 Linearization
4.2.4 Generalized linear regression
4.2.5 Goodness of fit
4.3 Nonlinear least squares
4.3.1 Gauss–Newton algorithm
4.3.2 Levenberg–Marquardt algorithm
4.3.3 Weighted least squares
4.4 Overfitting and information criteria
4.5 Regularization and Lasso method
4.6 Notes on software
5 Logistic regression, PCA, LDA, and ICA
5.1 Logistic regression
5.2 Softmax regression
5.3 Principal component analysis
5.4 Linear discriminant analysis
5.5 Singular value decomposition
5.6 Independent component analysis
5.7 Notes on software
6 Data mining techniques
6.1 Introduction
6.1.1 Types of data
6.1.2 Distance metric
6.2 Hierarchy clustering
6.3 k-Nearest-neighbor algorithm
6.4 k-Means algorithm
6.5 Decision trees and random forests
6.5.1 Decision tree algorithm
6.5.2 ID3 algorithm and C4.5 classifier
6.5.3 Random forest
6.6 Bayesian classifiers
6.6.1 Naive Bayesian classifier
6.6.2 Bayesian networks
6.7 Data mining for big data
6.7.1 Characteristics of big data
6.7.2 Statistical nature of big data
6.7.3 Mining big data
6.8 Notes on software
7 Support vector machine and regression
7.1 Statistical learning theory
7.2 Linear support vector machine
7.3 Kernel functions and nonlinear SVM
7.4 Support vector regression
7.5 Notes on software
8 Neural networks and deep learning
8.1 Learning
8.2 Artificial neural networks
8.2.1 Neuron models
8.2.2 Activation models
8.2.3 Artificial neural networks
8.3 Backpropagation algorithm
8.4 Loss functions in ANN
8.5 Optimizers and choice of optimizers
8.6 Network architecture
8.7 Deep learning
8.7.1 Convolutional neural networks
8.7.2 Restricted Boltzmann machine
8.7.3 Deep neural nets
8.7.4 Trends in deep learning
8.8 Tuning of hyperparameters
8.9 Notes on software
1 Introduction to optimization
This book introduces the fundamental concepts and algorithms related to optimization, data mining, and machine learning. The main requirement is some understanding of high-school mathematics and basic calculus; however, we will review and introduce some of the mathematical foundations in the first two chapters.
1.1 Algorithms
An algorithm is an iterative, step-by-step procedure for computation. The detailed procedure can be a simple description, an equation, or a series of descriptions in combination with equations. Finding the roots of a polynomial, checking whether a natural number is prime, and generating random numbers can all be done by algorithms.
1.1.1 Essence of an algorithm
In essence, an algorithm can be written as an iterative equation or a set of iterative equations. For example, to find the square root of a > 0, we can use the following iterative equation:

x_{k+1} = (1/2)(x_k + a/x_k),    (1.1)

where k = 0, 1, 2, ... is the iteration counter and x0 is an initial guess.
Example 1
As an example, if x0 = 1 and a = 4, then we have
x1 = (1/2)(1 + 4/1) = 2.5.
Similarly, we have
x2 = (1/2)(2.5 + 4/2.5) = 2.05,  x3 ≈ 2.000610,  x4 ≈ 2.000000093,
which is very close to the true value of √4 = 2. The accuracy of this iterative formula or algorithm is high because it achieves an accuracy of at least five decimal places after four iterations.
The convergence is also very quick if we start from different initial values such as x0 = 10 or even x0 = 100. However, for an obvious reason, we cannot start with x0 = 0 due to division by zero.
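This iteration is easy to try out in a few lines of code. The following is a minimal Python sketch of Eq. (1.1) (the helper name sqrt_iter and the printing are illustrative, not from the book); it reproduces the numbers of Example 1 using only the standard library.

def sqrt_iter(a, x0=1.0, n_iter=4):
    """Approximate the square root of a using x_{k+1} = (x_k + a/x_k)/2, as in Eq. (1.1)."""
    x = x0
    for k in range(n_iter):
        x = 0.5 * (x + a / x)      # one step of Eq. (1.1)
        print(f"x{k+1} = {x:.9f}")
    return x

sqrt_iter(4.0, x0=1.0)             # prints 2.5, 2.05, 2.000609756, 2.000000093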
Finding the root of x = √a is equivalent to solving the equation
f(x) = x² − a = 0,
which is again equivalent to finding the roots of the polynomial f(x). We know that Newton's root-finding algorithm can be written as
x_{k+1} = x_k − f(x_k)/f′(x_k),    (1.6)
where f′(x) is the first derivative or gradient of f(x). In this case we have f′(x) = 2x. Thus, Newton's formula becomes
x_{k+1} = x_k − (x_k² − a)/(2x_k),
which can be written as
x_{k+1} = x_k/2 + a/(2x_k) = (1/2)(x_k + a/x_k).
This is exactly what we have in Eq. (1.1).
Newton's method has rigorous mathematical foundations and guaranteed convergence under certain conditions. Eq. (1.6) is also more general than Eq. (1.1), but it requires the gradient information f′(x). In addition, for the formula to be valid, we must have f′(x) ≠ 0.
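For readers who want to experiment with Eq. (1.6) directly, here is a small Python sketch (the function name newton_root and its default tolerances are illustrative assumptions, not from the book), applied to f(x) = x² − a with f′(x) = 2x.

def newton_root(f, fprime, x0, tol=1e-10, max_iter=50):
    """Newton's root-finding iteration x_{k+1} = x_k - f(x_k)/f'(x_k), Eq. (1.6)."""
    x = x0
    for _ in range(max_iter):
        g = fprime(x)
        if g == 0.0:                       # the formula requires f'(x) != 0
            raise ZeroDivisionError("zero derivative encountered")
        x_next = x - f(x) / g
        if abs(x_next - x) < tol:          # stop when successive iterates agree
            return x_next
        x = x_next
    return x

a = 4.0
print(newton_root(lambda x: x**2 - a, lambda x: 2.0 * x, x0=10.0))   # approximately 2.0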
1.1.2 Issues with algorithms
The advantage of the algorithm given in Eq. (1.1) is that it converges very quickly. However, careful readers may have asked: we know that √4 = ±2, so how can we find the other root −2 in addition to +2?
Even if we use different initial values such as x0 = 10 or x0 = 0.5, we can only reach x∗ = 2, not −2.
What happens if we start with a negative value such as x0 = −1? Then the iteration gives x1 = (1/2)(−1 + 4/(−1)) = −2.5 and x2 = −2.05, which is approaching −2 very quickly. If we start from x0 = −10 or x0 = −0.5, then we always get x∗ = −2, not +2.
This highlights a key issue: the final solution seems to depend on the initial starting point for this algorithm, which is true for many algorithms.
Now the relevant question is: how do we know where to start to get a particular solution? The general short answer is "we do not know". Thus, some knowledge of the problem under consideration or an educated guess may be useful to find the final solution.
In fact, most algorithms may depend on the initial configuration, and such algorithms often carry out search moves locally. Thus, this type of algorithm is often referred to as local search. A good algorithm should be able to "forget" its initial configuration, though such algorithms may not exist at all for most types of problems.
What we need in general is global search, which attempts to find final solutions that are less sensitive to the initial starting point(s).
Another important issue in our discussion is that the gradient information f′(x) is necessary for some algorithms such as Newton's method given in Eq. (1.6). This poses certain requirements on the smoothness of the function f(x). For example, we know that |x| is not differentiable at x = 0. Thus, we cannot directly use Newton's method to find the roots of f(x) = |x|x² − a = 0 for a > 0; some modifications are needed. There are other issues related to algorithms such as the setting of parameters, the slow rate of convergence, condition numbers, and iteration structures. All these make algorithm design and usage somewhat challenging, and we will discuss these issues in more detail later in this book.
1.1.3 Types of algorithms
An algorithm can only do a specific computational task (at most a class of computational tasks), and no single algorithm can do all tasks. Thus, algorithms can be classified according to their purposes. An algorithm that finds the roots of a polynomial belongs to root-finding algorithms, whereas an algorithm for ordering a set of numbers belongs to sorting algorithms. There are many classes of algorithms for different purposes. Even for the same purpose such as sorting, there are many different algorithms such as merge sort, bubble sort, quicksort, and others.
We can also categorize algorithms in terms of their characteristics. The root-finding algorithms we just introduced are deterministic algorithms because the final solutions are exactly the same if we start from the same initial guess: we obtain the same set of solutions every time we run the algorithm. On the other hand, we may introduce some randomization into the algorithm, for example, by using purely random initial points. Every time we run the algorithm, we use a new random initial guess. In this case the algorithm has some nondeterministic nature, and such algorithms are referred to as stochastic. Sometimes, using randomness may be advantageous. For example, in the example of √4 = ±2 using Eq. (1.1), random initial values (both positive and negative) allow the algorithm to find both roots. In fact, a major trend in modern metaheuristics is to use randomization to suit different purposes.
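A minimal sketch of this idea, restarting the deterministic iteration of Eq. (1.1) from random initial points (the helper names and the number of restarts are illustrative assumptions):

import random

def sqrt_iter(a, x0, n_iter=30):
    """Deterministic iteration of Eq. (1.1) from a given starting point."""
    x = x0
    for _ in range(n_iter):
        x = 0.5 * (x + a / x)
    return x

random.seed(1)
roots = set()
for _ in range(10):
    x0 = random.uniform(-10.0, 10.0)   # random restart: a new initial guess each run
    if x0 == 0.0:
        continue                        # avoid division by zero
    roots.add(round(sqrt_iter(4.0, x0), 6))
print(roots)                            # typically {-2.0, 2.0}: the sign of x0 selects the root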
In this book, we are mainly concerned with algorithms for data mining, optimization, and machine learning. We use a relatively unified approach to link algorithms for data mining and machine learning to algorithms for optimization.
1.2 Optimization
Optimization is everywhere, from engineering design to business planning. After all, time and resources are limited, and optimal use of such valuable resources is crucial. In addition, designs of products have to maximize performance, sustainability, and energy efficiency and to minimize costs. Therefore, optimization is important for many applications.
1.2.1 A simple example
Let us start with a very simple example: design a container with volume capacity V0 = 10 m³. As the main cost is related to the cost of materials, the main aim is to minimize the total surface area S.
The first thing we have to decide is the shape of the container (cylinder, cube, sphere, ellipsoid, or a more complex geometry). For simplicity, let us start with a cylindrical shape with radius r and height h (see Fig. 1.1).
The total surface area of a cylinder is
S = 2(πr²) + 2πrh,    (1.11)
and the volume is
V = πr²h.    (1.12)
There are only two design variables, r and h, and one objective function S to be minimized. Obviously, if there is no capacity constraint, then we can choose not to build the container, and then the cost of materials is zero for r = 0 and h = 0. However,
Figure 1.1 Design of a cylindrical container.
the constraint requirement means that we have to build a container with the fixed volume V0 = πr²h = 10 m³. Therefore, this optimization problem can be written as
minimize S = 2πr² + 2πrh,    (1.13)
subject to the equality constraint
πr²h = V0 = 10.
To solve this problem, we can first try to use the equality constraint to reduce the number of design variables by solving for h. So we have
h = V0/(πr²).
Substituting it into (1.13), we get
S = 2πr² + 2πr · V0/(πr²) = 2πr² + 2V0/r.
This is now a univariate function of r. From basic calculus we know that the minimum or maximum can occur at a stationary point, where the first derivative is zero, that is,
dS/dr = 4πr − 2V0/r² = 0,
which gives
r³ = V0/(2π),  or  r = (V0/(2π))^(1/3).
Thus, the height is
h = V0/(πr²) = 2πr³/(πr²) = 2r.
This means that the height is twice the radius: h = 2r. Thus, the minimum surface area is
S∗ = 2πr² + 2πrh = 2πr² + 2πr(2r) = 6πr².
For V0 = 10, we have
r = (10/(2π))^(1/3) ≈ 1.1675,  h = 2r ≈ 2.3350,
and the total surface area
S∗ = 2πr² + 2πrh ≈ 25.69.
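After eliminating h, the problem is simply the one-dimensional minimization of S(r) = 2πr² + 2V0/r for r > 0, which can also be checked numerically. This sketch assumes SciPy is available; the search interval (0.1, 10) is an arbitrary bracket chosen for illustration.

import numpy as np
from scipy.optimize import minimize_scalar

V0 = 10.0
S = lambda r: 2.0 * np.pi * r**2 + 2.0 * V0 / r    # surface area after substituting h = V0/(pi r^2)

res = minimize_scalar(S, bounds=(0.1, 10.0), method="bounded")
print(res.x, 2.0 * res.x, res.fun)                 # about 1.1675, 2.3350, 25.69, matching the analysis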
It is worth pointing out that this optimal solution is based on the assumption or requirement to design a cylindrical container. If we decide to use a sphere with radius R, we know that its volume and surface area are
V = (4/3)πR³,  S = 4πR².
We can solve for R from (4/3)πR³ = V0, that is,
R = (3V0/(4π))^(1/3),
which gives the surface area
S = 4πR² = 3^(2/3)(4π)^(1/3) V0^(2/3).
Writing the minimum cylinder area in the same form, S∗ = 6πr² = 6π/(4π²)^(1/3) · V0^(2/3), and noting that 6π/(4π²)^(1/3) ≈ 5.5358 and 3^(2/3)(4π)^(1/3) ≈ 4.83598, we have S < S∗, that is, the surface area of a sphere is smaller than the minimum surface area of a cylinder with the same volume. In fact, for the same V0 = 10, we have
S = 4πR² ≈ 22.45,
which is smaller than S∗ ≈ 25.69 for a cylinder.
This highlights the importance of the choice of design type (here in terms of shape) before we can do any truly useful optimization. Obviously, there are many other factors that can influence the choice of design, including the manufacturability of the design, stability of the structure, ease of installation, space availability, and so on. For a container, in most applications, a cylinder may be much easier to produce than a sphere, and thus the overall cost may be lower in practice. Though there are so many factors to be considered in engineering design, for the purpose of optimization, here we will only focus on the improvement and optimization of a design with well-posed mathematical formulations.
1.2.2 General formulation of optimization
Whatever the real-world applications may be, it is usually possible to formulate an optimization problem in a generic form [49,53,160]. All optimization problems with explicit objectives can in general be expressed as a nonlinearly constrained optimization problem:
maximize/minimize f(x),  x = (x1, x2, ..., xD)^T ∈ R^D,
subject to
φj(x) = 0 (j = 1, 2, ..., M),
ψk(x) ≤ 0 (k = 1, ..., N),    (1.25)
where f(x), φj(x), and ψk(x) are scalar functions of the design vector x. Here the components xi of x = (x1, ..., xD)^T are called design or decision variables, and they can be continuous, discrete, or a mixture of the two. The vector x is often called the decision vector, which varies in the D-dimensional space R^D.
It is worth pointing out that we use a column vector here for x (thus with the transpose T). We can also use a row vector x = (x1, ..., xD), and the results will be the same. Different textbooks may use slightly different formulations; once we are aware of such minor variations, they should cause no difficulty or confusion.
In addition, the function f(x) is called the objective function or cost function, the φj(x) are constraints in terms of M equalities, and the ψk(x) are constraints written as N inequalities, so there are M + N constraints in total. The optimization problem formulated here is a nonlinear constrained problem. Here the inequalities ψk(x) ≤ 0 are written as "less than"; they can also be written as "greater than" via a simple transformation, multiplying both sides by −1.
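The generic form (1.25) maps directly onto numerical solvers. As a purely illustrative sketch (the objective and constraints below are made up for demonstration, and SciPy is assumed to be available), the SLSQP method handles both equality and inequality constraints; note that SciPy expects inequalities as g(x) ≥ 0, so a constraint ψ(x) ≤ 0 is passed as −ψ(x) ≥ 0.

import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 1.0)**2 + (x[1] - 2.0)**2   # objective f(x) (illustrative)
phi = lambda x: x[0] + x[1] - 3.0                  # equality constraint: phi(x) = 0
psi = lambda x: x[0] - 0.5                         # inequality constraint: psi(x) <= 0

constraints = [
    {"type": "eq", "fun": phi},
    {"type": "ineq", "fun": lambda x: -psi(x)},    # SciPy convention: fun(x) >= 0
]

res = minimize(f, x0=np.array([0.0, 0.0]), method="SLSQP", constraints=constraints)
print(res.x, res.fun)                              # roughly [0.5, 2.5] with f = 0.5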
The space spanned by the decision variables is called the search space R^D, whereas the space formed by the values of the objective function is called the objective or response space, and sometimes the landscape. The optimization problem essentially maps the domain R^D, or the space of decision variables, into the solution space R (or the real axis in general).
The objective function f(x) can be either linear or nonlinear. If the constraints φj and ψk are all linear, it becomes a linearly constrained problem. Furthermore, when φj, ψk, and the objective function f(x) are all linear, it becomes a linear programming problem [35]. If the objective is at most quadratic with linear constraints, then it is called a quadratic programming problem. If all the values of the decision variables can only be integers, then this type of linear programming is called integer programming or integer linear programming.
On the other hand, if no constraints are specified so that xi can take any values on the real axis (or any integers), then the optimization problem is referred to as an unconstrained optimization problem.
As a very simple example of optimization problems without any constraints, we discuss the search for the maxima or minima of a univariate function.
Example 2
For example, to find the maximum of the univariate function
f(x) = x²e^(−x²),  −∞ < x < ∞,
is a simple unconstrained problem, whereas the following problem is a simple constrained minimization problem:
subject to
It is worth pointing out that the objectives are explicitly known in all the optimization problems to be discussed in this book. However, in reality it is often difficult to quantify what we want to achieve, yet we still try to optimize certain things, such as the degree of enjoyment or the quality of service on holiday. In other cases, it may be impossible to write the objective function in any explicit mathematical form.
From basic calculus we know that, for a given curve described by f(x), its gradient f′(x) describes the rate of change. When f′(x) = 0, the curve has a horizontal tangent at that particular point, which makes it a point of special interest. In fact, the maximum or minimum of a curve occurs at
f′(x∗) = 0,    (1.29)
which is a critical condition or stationary condition. The solution x∗ to this equation corresponds to a stationary point, and there may be multiple stationary points for a given curve.
To see if a stationary point x = x∗ is a maximum or a minimum, we have to use the information of the second derivative f″(x). In fact, f″(x∗) > 0 corresponds to a minimum, whereas f″(x∗) < 0 corresponds to a maximum. Let us see a concrete example.
Example 3
To find the minimum of f(x) = x²e^(−x²) (see Fig. 1.2), we have the stationary condition f′(x) = 0, or
f′(x) = 2xe^(−x²) − 2x³e^(−x²) = 2x(1 − x²)e^(−x²) = 0.
Figure 1.2 A simple multimodal function f(x) = x²e^(−x²).
Figure 1.3 (a) Feasible domain with nonlinear inequality constraints ψ1(x) and ψ2(x) (left) and linear inequality constraint ψ3(x). (b) An example with an objective of f(x) = x² subject to x ≥ 2 (right).
Since e^(−x²) > 0, we have
x(1 − x²) = 0,
or x = 0 and x = ±1.
The second derivative is given by
f″(x) = 2e^(−x²)(1 − 5x² + 2x⁴),
which is an even function with respect to x.
So at x = ±1, f″(±1) = 2[1 − 5(±1)² + 2(±1)⁴]e^(−(±1)²) = −4e^(−1) < 0. Thus there are two maxima that occur at x∗ = ±1 with fmax = e^(−1). At x = 0, we have f″(0) = 2 > 0, so the minimum of f(x) occurs at x∗ = 0 with fmin = 0.
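The derivatives and stationary points in Example 3 can also be verified symbolically. A small sketch assuming SymPy is installed:

import sympy as sp

x = sp.symbols('x', real=True)
f = x**2 * sp.exp(-x**2)

fprime = sp.simplify(sp.diff(f, x))       # 2x(1 - x^2) exp(-x^2)
fsecond = sp.simplify(sp.diff(f, x, 2))   # 2(2x^4 - 5x^2 + 1) exp(-x^2)
stationary = sp.solve(sp.Eq(fprime, 0), x)

for s in stationary:                       # x = -1, 0, 1
    print(s, fsecond.subs(x, s), f.subs(x, s))
# x = ±1: f'' = -4/e < 0 (maxima with f = 1/e); x = 0: f'' = 2 > 0 (minimum with f = 0)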
Whatever the objective is, we have to evaluate it many times. In most cases, the evaluations of the objective function consume a substantial amount of computational power (which costs money) and design time. Any efficient algorithm that can reduce the number of objective evaluations saves both time and money.
In mathematical programming there are many important concepts, and we will first introduce a few of them: feasible solutions, optimality criteria, strong local optima, and weak local optima.
1.2.3 Feasible solution
A point x that satisfies all the constraints is called a feasible point and is thus a feasible solution to the problem. The set of all feasible points is called the feasible region (see Fig. 1.3).
For example, we know that the domain of f(x) = x² consists of all real numbers. If we want to minimize f(x) without any constraint, all solutions such as x = −1, x = 1, and x = 0 are feasible. In fact, the feasible region is the whole real axis. Obviously, x = 0 corresponds to f(0) = 0 as the true minimum.
However, if we want to find the minimum of f(x) = x² subject to x ≥ 2, then it becomes a constrained optimization problem. Points such as x = 1 and x = 0 are no longer feasible because they do not satisfy x ≥ 2. In this case the feasible solutions are all the points that satisfy x ≥ 2, so x = 2, x = 100, and x = 10⁸ are all feasible. It is obvious that the minimum occurs at x = 2 with f(2) = 2² = 4, that is, the optimal solution for this problem occurs at the boundary point x = 2 (see Fig. 1.3).
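A numerical solver returns the same boundary optimum. A quick sketch, assuming SciPy is available, minimizes x² with the bound x ≥ 2:

from scipy.optimize import minimize

res = minimize(lambda x: x[0]**2, x0=[5.0], bounds=[(2.0, None)])
print(res.x, res.fun)    # about [2.0] and 4.0: the minimum sits on the boundary x = 2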
Figure 1.4 Local optima, weak optima, and global optimality.
1.2.4 Optimality criteria
A point x∗ is called a strong local maximum of the nonlinearly constrained optimization problem if f(x) is defined in a δ-neighborhood N(x∗, δ) and satisfies f(x∗) > f(u) for all u ∈ N(x∗, δ), where δ > 0 and u ≠ x∗. If x∗ is not a strong local maximum, then the inclusion of equality in the condition f(x∗) ≥ f(u) for all u ∈ N(x∗, δ) defines the point x∗ as a weak local maximum (see Fig. 1.4). The local minima can be defined in a similar manner when > and ≥ are replaced by < and ≤, respectively.
Fig. 1.4 shows various local maxima and minima. Point A is a strong local maximum, whereas point B is a weak local maximum because there are many (in fact, infinitely many) different values of x that lead to the same value of f(x∗). Point D is the global maximum, and point E is the global minimum. In addition, point F is a strong local minimum. Point C is also a strong local minimum, but it has a discontinuity in f′(x∗), so the stationary condition f′(x∗) = 0 is not valid at this point. We will not deal with these types of minima or maxima in detail.
As we briefly mentioned before, for a smooth curve f(x), optimal solutions usually occur at stationary points where f′(x) = 0. This is not always the case because optimal solutions can also occur at the boundary, as we have seen in the previous example of minimizing f(x) = x² subject to x ≥ 2. In our present discussion we will assume that both f(x) and f′(x) are always continuous, or that f(x) is everywhere twice continuously differentiable. Obviously, the information of f′(x) is not sufficient to determine whether a stationary point is a local maximum or minimum; higher-order derivatives such as f″(x) are needed, but we do not make any assumption at this stage. We will discuss this further in the next section.
1.3 Unconstrained optimization
Optimization problems can be classified as either unconstrained or constrained. Unconstrained optimization problems can in turn be subdivided into univariate and multivariate problems.
1.3.1 Univariate functions
The simplest optimization problem without any constraints is probably the search for the maxima or minima of a univariate function f(x). For unconstrained optimization problems, the optimality occurs at the critical points given by the stationary condition f′(x) = 0.
However, this stationary condition is just a necessary condition, not a sufficient one. If f′(x∗) = 0 and f″(x∗) > 0, then x∗ is a local minimum. Conversely, if f′(x∗) = 0 and f″(x∗) < 0, then it is a local maximum. However, if f′(x∗) = 0 and f″(x∗) = 0, care should be taken because f″(x) may be indefinite (both positive and negative) when x → x∗, in which case x∗ may correspond to a saddle point.
For example, for f(x) = x³ we have
f′(x) = 3x²,  f″(x) = 6x.
The stationary condition f′(x) = 3x² = 0 gives x∗ = 0. However, we also have f″(x∗) = f″(0) = 0.
In fact, f(x) = x³ has a saddle point at x∗ = 0 because f′(0) = 0, while f″ changes sign from f″(0+) > 0 to f″(0−) < 0 as x moves from positive to negative.
Example 4
For example, to find the maximum or minimum of the univariate function
f(x) = 3x⁴ − 4x³ − 12x² + 9,  −∞ < x < ∞,
we first have to find its stationary points x∗, where the first derivative f′(x) is zero, that is,
f′(x) = 12x³ − 12x² − 24x = 12(x³ − x² − 2x) = 0.
Since f′(x) = 12(x³ − x² − 2x) = 12x(x + 1)(x − 2) = 0, we have x∗ = −1, x∗ = 2, and x∗ = 0.
The second derivative of f(x) is simply
f″(x) = 36x² − 24x − 24.
From basic calculus we know that a maximum requires f″(x∗) ≤ 0, whereas a minimum requires f″(x∗) ≥ 0.
At x∗ = −1, we have
f″(−1) = 36(−1)² − 24(−1) − 24 = 36 > 0,
so this point corresponds to a local minimum
f(−1) = 3(−1)⁴ − 4(−1)³ − 12(−1)² + 9 = 4.
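As with Example 3, these stationary points and second-derivative values can be checked symbolically; a short sketch assuming SymPy:

import sympy as sp

x = sp.symbols('x', real=True)
f = 3*x**4 - 4*x**3 - 12*x**2 + 9

fprime = sp.diff(f, x)            # 12x^3 - 12x^2 - 24x
fsecond = sp.diff(f, x, 2)        # 36x^2 - 24x - 24
stationary = sp.solve(fprime, x)  # [-1, 0, 2]

for s in stationary:
    print(s, fsecond.subs(x, s), f.subs(x, s))
# x = -1: f'' = 36 > 0 and f(-1) = 4, a local minimum, in agreement with the text above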