MATHEMATICS FOR MACHINE LEARNING

Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong

This material will be published by Cambridge University Press as Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. This pre-publication version is free to view and download for personal use only. Not for re-distribution, re-sale or use in derivative works. © by M. P. Deisenroth, A. A. Faisal, and C. S. Ong, 2020. https://mml-book.com

Contents

Foreword

Part I: Mathematical Foundations

1 Introduction and Motivation
  1.1 Finding Words for Intuitions
  1.2 Two Ways to Read This Book
  1.3 Exercises and Feedback

2 Linear Algebra
  2.1 Systems of Linear Equations
  2.2 Matrices
  2.3 Solving Systems of Linear Equations
  2.4 Vector Spaces
  2.5 Linear Independence
  2.6 Basis and Rank
  2.7 Linear Mappings
  2.8 Affine Spaces
  2.9 Further Reading
  Exercises

3 Analytic Geometry
  3.1 Norms
  3.2 Inner Products
  3.3 Lengths and Distances
  3.4 Angles and Orthogonality
  3.5 Orthonormal Basis
  3.6 Orthogonal Complement
  3.7 Inner Product of Functions
  3.8 Orthogonal Projections
  3.9 Rotations
  3.10 Further Reading
  Exercises

4 Matrix Decompositions
  4.1 Determinant and Trace
  4.2 Eigenvalues and Eigenvectors
  4.3 Cholesky Decomposition
  4.4 Eigendecomposition and Diagonalization
  4.5 Singular Value Decomposition
  4.6 Matrix Approximation
  4.7 Matrix Phylogeny
  4.8 Further Reading
  Exercises

5 Vector Calculus
  5.1 Differentiation of Univariate Functions
  5.2 Partial Differentiation and Gradients
  5.3 Gradients of Vector-Valued Functions
  5.4 Gradients of Matrices
  5.5 Useful Identities for Computing Gradients
  5.6 Backpropagation and Automatic Differentiation
  5.7 Higher-Order Derivatives
  5.8 Linearization and Multivariate Taylor Series
  5.9 Further Reading
  Exercises

6 Probability and Distributions
  6.1 Construction of a Probability Space
  6.2 Discrete and Continuous Probabilities
  6.3 Sum Rule, Product Rule, and Bayes' Theorem
  6.4 Summary Statistics and Independence
  6.5 Gaussian Distribution
  6.6 Conjugacy and the Exponential Family
  6.7 Change of Variables/Inverse Transform
  6.8 Further Reading
  Exercises

7 Continuous Optimization
  7.1 Optimization Using Gradient Descent
  7.2 Constrained Optimization and Lagrange Multipliers
  7.3 Convex Optimization
  7.4 Further Reading
  Exercises

Part II: Central Machine Learning Problems

8 When Models Meet Data
  8.1 Data, Models, and Learning
  8.2 Empirical Risk Minimization
  8.3 Parameter Estimation
  8.4 Probabilistic Modeling and Inference
  8.5 Directed Graphical Models
  8.6 Model Selection

9 Linear Regression
  9.1 Problem Formulation
  9.2 Parameter Estimation
  9.3 Bayesian Linear Regression
  9.4 Maximum Likelihood as Orthogonal Projection
  9.5 Further Reading

10 Dimensionality Reduction with Principal Component Analysis
  10.1 Problem Setting
  10.2 Maximum Variance Perspective
  10.3 Projection Perspective
  10.4 Eigenvector Computation and Low-Rank Approximations
  10.5 PCA in High Dimensions
  10.6 Key Steps of PCA in Practice
  10.7 Latent Variable Perspective
  10.8 Further Reading

11 Density Estimation with Gaussian Mixture Models
  11.1 Gaussian Mixture Model
  11.2 Parameter Learning via Maximum Likelihood
  11.3 EM Algorithm
  11.4 Latent-Variable Perspective
  11.5 Further Reading

12 Classification with Support Vector Machines
  12.1 Separating Hyperplanes
  12.2 Primal Support Vector Machine
  12.3 Dual Support Vector Machine
  12.4 Kernels
  12.5 Numerical Solution
  12.6 Further Reading

References
Index

Foreword

Machine learning is the latest in a long line of attempts to distill human knowledge and reasoning into a form that is suitable for constructing machines and engineering automated systems. As machine learning becomes more ubiquitous and its software packages become easier to use, it is natural and desirable that the low-level technical details are abstracted away and hidden from the practitioner. However, this brings with it the danger that a practitioner becomes unaware of the design decisions and, hence, the limits of machine learning algorithms.

The enthusiastic practitioner who is interested to learn more about the magic behind successful machine learning algorithms currently faces a daunting set of pre-requisite knowledge:

- Programming languages and data analysis tools
- Large-scale computation and the associated frameworks
- Mathematics and statistics and how machine learning builds on it

At universities, introductory courses on machine learning tend to spend early parts of the course covering some of these pre-requisites. For historical reasons, courses in machine learning tend to be taught in the computer science department, where students are often trained in the first two areas of knowledge, but not so much in mathematics and statistics.

Current machine learning textbooks primarily focus on machine learning algorithms and methodologies and assume that the reader is competent in mathematics and statistics. Therefore, these books only spend one or two chapters of background mathematics, either at the beginning of the book or as appendices. We have found many people who want to delve into the foundations of basic machine learning methods who struggle with the mathematical knowledge required to read a machine learning textbook. Having taught undergraduate and graduate courses at universities, we find that the gap between high school mathematics and the mathematics level required to read a standard machine learning textbook is too big for many people.

This book brings the mathematical foundations of basic machine learning concepts to the fore and collects the information in a single place so that this skills gap is narrowed or even closed.


Why Another Book on Machine Learning?

Machine learning builds upon the language of mathematics to express concepts that seem intuitively obvious but that are surprisingly difficult to formalize. Once formalized properly, we can gain insights into the task we want to solve. One common complaint of students of mathematics around the globe is that the topics covered seem to have little relevance to practical problems. We believe that machine learning is an obvious and direct motivation for people to learn mathematics.

"Math is linked in the popular mind with phobia and anxiety. You'd think we're discussing spiders." (Strogatz, 2014, page 281)

This book is intended to be a guidebook to the vast mathematical literature that forms the foundations of modern machine learning. We motivate the need for mathematical concepts by directly pointing out their usefulness in the context of fundamental machine learning problems. In the interest of keeping the book short, many details and more advanced concepts have been left out. Equipped with the basic concepts presented here, and how they fit into the larger context of machine learning, the reader can find numerous resources for further study, which we provide at the end of the respective chapters. For readers with a mathematical background, this book provides a brief but precisely stated glimpse of machine learning. In contrast to other books that focus on methods and models of machine learning (MacKay, 2003; Bishop, 2006; Alpaydin, 2010; Barber, 2012; Murphy, 2012; Shalev-Shwartz and Ben-David, 2014; Rogers and Girolami, 2016) or programmatic aspects of machine learning (Müller and Guido, 2016; Raschka and Mirjalili, 2017; Chollet and Allaire, 2018), we provide only four representative examples of machine learning algorithms. Instead, we focus on the mathematical concepts behind the models themselves. We hope that readers will be able to gain a deeper understanding of the basic questions in machine learning and connect practical questions arising from the use of machine learning with fundamental choices in the mathematical model.

We do not aim to write a classical machine learning book. Instead, our intention is to provide the mathematical background, applied to four central machine learning problems, to make it easier to read other machine learning textbooks.

Who Is the Target Audience?

As applications of machine learning become widespread in society, we believe that everybody should have some understanding of its underlying principles. This book is written in an academic mathematical style, which enables us to be precise about the concepts behind machine learning. We encourage readers unfamiliar with this seemingly terse style to persevere and to keep the goals of each topic in mind. We sprinkle comments and remarks throughout the text, in the hope that it provides useful guidance with respect to the big picture.

The book assumes the reader to have mathematical knowledge commonly covered in high school mathematics and physics. For example, the reader should have seen derivatives and integrals before, and geometric vectors in two or three dimensions. Starting from there, we generalize these concepts. Therefore, the target audience of the book includes undergraduate university students, evening learners, and learners participating in online machine learning courses.

In analogy to music, there are three types of interaction that people have with machine learning:

Astute Listener. The democratization of machine learning by the provision of open-source software, online tutorials, and cloud-based tools allows users to not worry about the specifics of pipelines. Users can focus on extracting insights from data using off-the-shelf tools. This enables non-tech-savvy domain experts to benefit from machine learning. This is similar to listening to music; the user is able to choose and discern between different types of machine learning, and benefits from it. More experienced users are like music critics, asking important questions about the application of machine learning in society such as ethics, fairness, and privacy of the individual. We hope that this book provides a foundation for thinking about the certification and risk management of machine learning systems, and allows them to use their domain expertise to build better machine learning systems.

Experienced Artist. Skilled practitioners of machine learning can plug and play different tools and libraries into an analysis pipeline. The stereotypical practitioner would be a data scientist or engineer who understands machine learning interfaces and their use cases, and is able to perform wonderful feats of prediction from data. This is similar to a virtuoso playing music, where highly skilled practitioners can bring existing instruments to life and bring enjoyment to their audience. Using the mathematics presented here as a primer, practitioners would be able to understand the benefits and limits of their favorite method, and to extend and generalize existing machine learning algorithms. We hope that this book provides the impetus for more rigorous and principled development of machine learning methods.

Fledgling Composer. As machine learning is applied to new domains, developers of machine learning need to develop new methods and extend existing algorithms. They are often researchers who need to understand the mathematical basis of machine learning and uncover relationships between different tasks. This is similar to composers of music who, within the rules and structure of musical theory, create new and amazing pieces. We hope this book provides a high-level overview of other technical books for people who want to become composers of machine learning. There is a great need in society for new researchers who are able to propose and explore novel approaches for attacking the many challenges of learning from data.


Acknowledgments

We are grateful to many people who looked at early drafts of the book and suffered through painful expositions of concepts. We tried to implement their ideas that we did not vehemently disagree with. We would like to especially acknowledge Christfried Webers for his careful reading of many parts of the book, and his detailed suggestions on structure and presentation. Many friends and colleagues have also been kind enough to provide their time and energy on different versions of each chapter. We have been lucky to benefit from the generosity of the online community, who have suggested improvements via github.com, which greatly improved the book.

The following people have found bugs, proposed clarifications, and suggested relevant literature, either via github.com or personal communication. Their names are sorted alphabetically.

Abdul-Ganiy Usman, Adam Gaier, Adele Jackson, Aditya Menon, Alasdair Tran, Aleksandar Krnjaic, Alexander Makrigiorgos, Alfredo Canziani, Ali Shafti, Amr Khalifa, Andrew Tanggara, Angus Gruen, Antal A. Buss, Antoine Toisoul Le Cann, Areg Sarvazyan, Artem Artemev, Artyom Stepanov, Bill Kromydas, Bob Williamson, Boon Ping Lim, Chao Qu, Cheng Li, Chris Sherlock, Christopher Gray, Daniel McNamara, Daniel Wood, Darren Siegel, David Johnston, Dawei Chen, Ellen Broad, Fengkuangtian Zhu, Fiona Condon, Georgios Theodorou, He Xin, Irene Raissa Kameni, Jakub Nabaglo, James Hensman, Jamie Liu, Jean Kaddour, Jean-Paul Ebejer, Jerry Qiang, Jitesh Sindhare, John Lloyd, Jonas Ngnawe, Jon Martin, Justin Hsi, Kai Arulkumaran, Kamil Dreczkowski, Lily Wang, Lionel Tondji Ngoupeyou, Lydia Knufing, Mahmoud Aslan, Mark Hartenstein, Mark van der Wilk, Markus Hegland, Martin Hewing, Matthew Alger, Matthew Lee, Maximus McCann, Mengyan Zhang, Michael Bennett, Michael Pedersen, Minjeong Shin, Mohammad Malekzadeh, Naveen Kumar, Nico Montali, Oscar Armas, Patrick Henriksen, Patrick Wieschollek, Pattarawat Chormai, Paul Kelly, Petros Christodoulou, Piotr Januszewski, Pranav Subramani, Quyu Kong, Ragib Zaman, Rui Zhang, Ryan-Rhys Griffiths, Salomon Kabongo, Samuel Ogunmola, Sandeep Mavadia, Sarvesh Nikumbh, Sebastian Raschka, Senanayak Sesh Kumar Karri, Seung-Heon Baek, Shahbaz Chaudhary, Shakir Mohamed, Shawn Berry, Sheikh Abdul Raheem Ali, Sheng Xue, Sridhar Thiagarajan, Syed Nouman Hasany, Szymon Brych, Thomas Buhler, Timur Sharapov, Tom Melamed, Vincent Adam, Vincent Dutordoir, Vu Minh, Wasim Aftab, Wen Zhi, Wojciech Stokowiec, Xiaonan Chong, Xiaowei Zhang, Yazhou Hao, Yicheng Luo, Young Lee, Yu Lu, Yun Cheng, Yuxiao Huang, Zac Cranko, Zijian Cao, Zoe Nolan.

Contributors through github, whose real names were not listed on their github profile, are:

SamDataMad, bumptiousmonkey, idoamihai, deepakiim, insad, HorizonP, cs-maillist, kudo23, empet, victorBigand, 17SKYE, jessjing1995.

We are also very grateful to Parameswaran Raman and the many anonymous reviewers, organized by Cambridge University Press, who read one or more chapters of earlier versions of the manuscript, and provided constructive criticism that led to considerable improvements. A special mention goes to Dinesh Singh Negi, our LaTeX support, for detailed and prompt advice about LaTeX-related issues. Last but not least, we are very grateful to our editor Lauren Cowles, who has been patiently guiding us through the gestation process of this book.


Table of Symbols

Symbol                Typical meaning
a, b, c, α, β, γ      Scalars are lowercase
x, y, z               Vectors are bold lowercase
A, B, C               Matrices are bold uppercase
x⊤, A⊤                Transpose of a vector or matrix
A⁻¹                   Inverse of a matrix
⟨x, y⟩                Inner product of x and y
x⊤y                   Dot product of x and y
B = (b1, b2, b3)      (Ordered) tuple
B = [b1, b2, b3]      Matrix of column vectors stacked horizontally
B = {b1, b2, b3}      Set of vectors (unordered)
Z, N                  Integers and natural numbers, respectively
R, C                  Real and complex numbers, respectively
Rⁿ                    n-dimensional vector space of real numbers
∀x                    Universal quantifier: for all x
∃x                    Existential quantifier: there exists x
a := b                a is defined as b
a =: b                b is defined as a
a ∝ b                 a is proportional to b, i.e., a = constant · b
g ∘ f                 Function composition: "g after f"
⇐⇒                    If and only if
=⇒                    Implies
A, C                  Sets
a ∈ A                 a is an element of the set A
∅                     Empty set
D                     Number of dimensions; indexed by d = 1, ..., D
N                     Number of data points; indexed by n = 1, ..., N
I_m                   Identity matrix of size m × m
0_{m,n}               Matrix of zeros of size m × n
1_{m,n}               Matrix of ones of size m × n
e_i                   Standard/canonical vector (where i is the component that is 1)
dim                   Dimensionality of vector space
rk(A)                 Rank of matrix A
Im(Φ)                 Image of linear mapping Φ
ker(Φ)                Kernel (null space) of a linear mapping Φ
span[b1]              Span (generating set) of b1
tr(A)                 Trace of A
det(A)                Determinant of A
|·|                   Absolute value or determinant (depending on context)
‖·‖                   Norm; Euclidean unless specified
λ                     Eigenvalue or Lagrange multiplier
E_λ                   Eigenspace corresponding to eigenvalue λ
θ                     Parameter vector
∂f/∂x                 Partial derivative of f with respect to x
df/dx                 Total derivative of f with respect to x
∇                     Gradient
L                     Lagrangian
L                     Negative log-likelihood
(n k)                 Binomial coefficient, n choose k
V_X[x]                Variance of x with respect to the random variable X
E_X[x]                Expectation of x with respect to the random variable X
Cov_{X,Y}[x, y]       Covariance between x and y
X ⊥⊥ Y | Z            X is conditionally independent of Y given Z
X ∼ p                 Random variable X is distributed according to p
N(µ, Σ)               Gaussian distribution with mean µ and covariance Σ
Ber(µ)                Bernoulli distribution with parameter µ
Bin(N, µ)             Binomial distribution with parameters N, µ
Beta(α, β)            Beta distribution with parameters α, β

Table of Abbreviations and Acronyms

Acronym     Meaning
e.g.        Exempli gratia (Latin: for example)
GMM         Gaussian mixture model
i.e.        Id est (Latin: this means)
i.i.d.      Independent, identically distributed
MAP         Maximum a posteriori
MLE         Maximum likelihood estimation/estimator
ONB         Orthonormal basis
PCA         Principal component analysis
PPCA        Probabilistic principal component analysis
REF         Row-echelon form
SPD         Symmetric, positive definite
SVM         Support vector machine


Part I

Mathematical Foundations

1 Introduction and Motivation

Machine learning is about designing algorithms that automatically extract valuable information from data. The emphasis here is on "automatic", i.e., machine learning is concerned about general-purpose methodologies that can be applied to many datasets, while producing something that is meaningful. There are three concepts that are at the core of machine learning: data, a model, and learning.

Since machine learning is inherently data driven, data is at the core of machine learning. The goal of machine learning is to design general-purpose methodologies to extract valuable patterns from data, ideally without much domain-specific expertise. For example, given a large corpus of documents (e.g., books in many libraries), machine learning methods can be used to automatically find relevant topics that are shared across documents (Hoffman et al., 2010). To achieve this goal, we design models that are typically related to the process that generates data, similar to the dataset we are given. For example, in a regression setting, the model would describe a function that maps inputs to real-valued outputs. To paraphrase Mitchell (1997): A model is said to learn from data if its performance on a given task improves after the data is taken into account. The goal is to find good models that generalize well to yet unseen data, which we may care about in the future. Learning can be understood as a way to automatically find patterns and structure in data by optimizing the parameters of the model.

While machine learning has seen many success stories, and software is readily available to design and train rich and flexible machine learning systems, we believe that the mathematical foundations of machine learning are important in order to understand fundamental principles upon which more complicated machine learning systems are built. Understanding these principles can facilitate creating new machine learning solutions, understanding and debugging existing approaches, and learning about the inherent assumptions and limitations of the methodologies we are working with.


1.1 Finding Words for Intuitions

A challenge we face regularly in machine learning is that concepts and words are slippery, and a particular component of the machine learning system can be abstracted to different mathematical concepts. For example, the word "algorithm" is used in at least two different senses in the context of machine learning. In the first sense, we use the phrase "machine learning algorithm" to mean a system that makes predictions based on input data. We refer to these algorithms as predictors. In the second sense, we use the exact same phrase "machine learning algorithm" to mean a system that adapts some internal parameters of the predictor so that it performs well on future unseen input data. Here we refer to this adaptation as training a system.

This book will not resolve the issue of ambiguity, but we want to highlight upfront that, depending on the context, the same expressions can mean different things. However, we attempt to make the context sufficiently clear to reduce the level of ambiguity.

The first part of this book introduces the mathematical concepts and foundations needed to talk about the three main components of a machine learning system: data, models, and learning. We will briefly outline these components here, and we will revisit them again in Chapter 8 once we have discussed the necessary mathematical concepts.

While not all data is numerical, it is often useful to consider data in a number format. In this book, we assume that data has already been appropriately converted into a numerical representation suitable for reading into a computer program. Therefore, we think of data as vectors. As another illustration of how subtle words are, there are (at least) three different ways to think about vectors: a vector as an array of numbers (a computer science view), a vector as an arrow with a direction and magnitude (a physics view), and a vector as an object that obeys addition and scaling (a mathematical view).

A model is typically used to describe a process for generating data, similar to the dataset at hand. Therefore, good models can also be thought of as simplified versions of the real (unknown) data-generating process, capturing aspects that are relevant for modeling the data and extracting hidden patterns from it. A good model can then be used to predict what would happen in the real world without performing real-world experiments.

We now come to the crux of the matter, the learning component of machine learning. Assume we are given a dataset and a suitable model. Training the model means to use the data available to optimize some parameters of the model with respect to a utility function that evaluates how well the model predicts the training data. Most training methods can be thought of as an approach analogous to climbing a hill to reach its peak. In this analogy, the peak of the hill corresponds to a maximum of some desired performance measure. However, in practice, we are interested in the model to perform well on unseen data. Performing well on data that we have already seen (training data) may only mean that we found a good way to memorize the data. However, this may not generalize well to unseen data, and, in practical applications, we often need to expose our machine learning system to situations that it has not encountered before.

Let us summarize the main concepts of machine learning that we cover in this book:

- We represent data as vectors.
- We choose an appropriate model, either using the probabilistic or optimization view.
- We learn from available data by using numerical optimization methods with the aim that the model performs well on data not used for training.

1.2 Two Ways to Read This Book

We can consider two strategies for understanding the mathematics for machine learning:

- Bottom-up: Building up the concepts from foundational to more advanced. This is often the preferred approach in more technical fields, such as mathematics. This strategy has the advantage that the reader at all times is able to rely on their previously learned concepts. Unfortunately, for a practitioner many of the foundational concepts are not particularly interesting by themselves, and the lack of motivation means that most foundational definitions are quickly forgotten.

- Top-down: Drilling down from practical needs to more basic requirements. This goal-driven approach has the advantage that the readers know at all times why they need to work on a particular concept, and there is a clear path of required knowledge. The downside of this strategy is that the knowledge is built on potentially shaky foundations, and the readers have to remember a set of words that they do not have any way of understanding.

We decided to write this book in a modular way to separate foundational (mathematical) concepts from applications so that this book can be read in both ways. The book is split into two parts, where Part I lays the mathematical foundations and Part II applies the concepts from Part I to a set of fundamental machine learning problems, which form four pillars of machine learning as illustrated in Figure 1.1: regression, dimensionality reduction, density estimation, and classification. Chapters in Part I mostly build upon the previous ones, but it is possible to skip a chapter and work backward if necessary. Chapters in Part II are only loosely coupled and can be read in any order. There are many pointers forward and backward between the two parts of the book to link mathematical concepts with machine learning algorithms.

Figure 1.1 The foundations and four pillars of machine learning: regression, dimensionality reduction, density estimation, and classification, built on linear algebra, analytic geometry, matrix decomposition, vector calculus, probability and distributions, and optimization.

Of course there are more than two ways to read this book. Most readers learn using a combination of top-down and bottom-up approaches, sometimes building up basic mathematical skills before attempting more complex concepts, but also choosing topics based on applications of machine learning.

Part I Is about Mathematics

The four pillars of machine learning we cover in this book (see Figure 1.1) require a solid mathematical foundation, which is laid out in Part I. We represent numerical data as vectors and represent a table of such data as a matrix. The study of vectors and matrices is called linear algebra, which we introduce in Chapter 2. The collection of vectors as a matrix is also described there.

Given two vectors representing two objects in the real world, we want to make statements about their similarity. The idea is that vectors that are similar should be predicted to have similar outputs by our machine learning algorithm (our predictor). To formalize the idea of similarity between vectors, we need to introduce operations that take two vectors as input and return a numerical value representing their similarity. The construction of similarity and distances is central to analytic geometry and is discussed in Chapter 3.

In Chapter 4, we introduce some fundamental concepts about matrices and matrix decomposition. Some operations on matrices are extremely useful in machine learning, and they allow for an intuitive interpretation of the data and more efficient learning.

We often consider data to be noisy observations of some true underlying signal. We hope that by applying machine learning we can identify the signal from the noise. This requires us to have a language for quantifying what "noise" means. We often would also like to have predictors that allow us to express some sort of uncertainty, e.g., to quantify the confidence we have about the value of the prediction at a particular test data point. Quantification of uncertainty is the realm of probability theory and is covered in Chapter 6.

To train machine learning models, we typically find parameters that maximize some performance measure. Many optimization techniques require the concept of a gradient, which tells us the direction in which to search for a solution. Chapter 5 is about vector calculus and details the concept of gradients, which we subsequently use in Chapter 7, where we talk about optimization to find maxima/minima of functions.

Part II Is about Machine Learning

The second part of the book introduces the four pillars of machine learning as shown in Figure 1.1. We illustrate how the mathematical concepts introduced in the first part of the book are the foundation for each pillar. Broadly speaking, chapters are ordered by difficulty (in ascending order).

In Chapter 8, we restate the three components of machine learning (data, models, and parameter estimation) in a mathematical fashion. In addition, we provide some guidelines for building experimental set-ups that guard against overly optimistic evaluations of machine learning systems. Recall that the goal is to build a predictor that performs well on unseen data.

In Chapter 9, we will have a close look at linear regression, where our objective is to find functions that map inputs x ∈ Rᴰ to corresponding observed function values y ∈ R, which we can interpret as the labels of their respective inputs. We will discuss classical model fitting (parameter estimation) via maximum likelihood and maximum a posteriori estimation, as well as Bayesian linear regression, where we integrate the parameters out instead of optimizing them.

Chapter 10 focuses on dimensionality reduction, the second pillar in Figure 1.1, using principal component analysis. The key objective of dimensionality reduction is to find a compact, lower-dimensional representation of high-dimensional data x ∈ Rᴰ, which is often easier to analyze than the original data. Unlike regression, dimensionality reduction is only concerned about modeling the data; there are no labels associated with a data point x.

In Chapter 11, we will move to our third pillar: density estimation. The objective of density estimation is to find a probability distribution that describes a given dataset. We will focus on Gaussian mixture models for this purpose, and we will discuss an iterative scheme to find the parameters of this model. As in dimensionality reduction, there are no labels associated with the data points x ∈ Rᴰ. However, we do not seek a low-dimensional representation of the data. Instead, we are interested in a density model that describes the data.

Chapter 12 concludes the book with an in-depth discussion of the fourth pillar: classification. We will discuss classification in the context of support vector machines. Similar to regression (Chapter 9), we have inputs x and corresponding labels y. However, unlike regression, where the labels were real-valued, the labels in classification are integers, which requires special care.

1.3 Exercises and Feedback

We provide some exercises in Part I, which can be done mostly by pen and paper. For Part II, we provide programming tutorials (Jupyter notebooks) to explore some properties of the machine learning algorithms we discuss in this book.

We appreciate that Cambridge University Press strongly supports our aim to democratize education and learning by making this book freely available for download at https://mml-book.com, where tutorials, errata, and additional materials can be found. Mistakes can be reported and feedback provided using the preceding URL.


2 Linear Algebra

When formalizing intuitive concepts, a common approach is to construct a set of objects (symbols) and a set of rules to manipulate these objects. This is known as an algebra. Linear algebra is the study of vectors and certain rules to manipulate vectors. The vectors many of us know from school are called "geometric vectors", which are usually denoted by a small arrow above the letter, e.g., →x and →y. In this book, we discuss more general concepts of vectors and use a bold letter to represent them, e.g., x and y. In general, vectors are special objects that can be added together and multiplied by scalars to produce another object of the same kind. From an abstract mathematical viewpoint, any object that satisfies these two properties can be considered a vector. Here are some examples of such vector objects:

1. Geometric vectors. This example of a vector may be familiar from high school mathematics and physics. Geometric vectors (see Figure 2.1(a)) are directed segments, which can be drawn (at least in two dimensions). Two geometric vectors →x, →y can be added, such that →x + →y = →z is another geometric vector. Furthermore, multiplication by a scalar λ→x, λ ∈ R, is also a geometric vector. In fact, it is the original vector scaled by λ. Therefore, geometric vectors are instances of the vector concepts introduced previously. Interpreting vectors as geometric vectors enables us to use our intuitions about direction and magnitude to reason about mathematical operations.

2. Polynomials are also vectors; see Figure 2.1(b): Two polynomials can be added together, which results in another polynomial; and they can be multiplied by a scalar λ ∈ R, and the result is a polynomial as well. Therefore, polynomials are (rather unusual) instances of vectors. Note that polynomials are very different from geometric vectors. While geometric vectors are concrete "drawings", polynomials are abstract concepts. However, they are both vectors in the sense previously described.

(Figure 2.1 Different types of vectors. Vectors can be surprising objects, including (a) geometric vectors and (b) polynomials.)

3. Audio signals are vectors. Audio signals are represented as a series of numbers. We can add audio signals together, and their sum is a new audio signal. If we scale an audio signal, we also obtain an audio signal. Therefore, audio signals are a type of vector, too.

4. Elements of Rⁿ (tuples of n real numbers) are vectors. Rⁿ is more abstract than polynomials, and it is the concept we focus on in this book. For instance,

   a = (1, 2, 3)⊤ ∈ R³    (2.1)

   is an example of a triplet of numbers. Adding two vectors a, b ∈ Rⁿ component-wise results in another vector: a + b = c ∈ Rⁿ. Moreover, multiplying a ∈ Rⁿ by λ ∈ R results in a scaled vector λa ∈ Rⁿ. Considering vectors as elements of Rⁿ has an additional benefit that it loosely corresponds to arrays of real numbers on a computer. Many programming languages support array operations, which allow for convenient implementation of algorithms that involve vector operations. (Be careful to check whether array operations actually perform vector operations when implementing on a computer.)
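To make the array view concrete, here is a minimal NumPy sketch of our own (it is not part of the book, whose companion tutorials are separate Jupyter notebooks). It checks the two defining operations, addition and scalar multiplication, for vectors in R³:

import numpy as np

# Two vectors in R^3, represented as NumPy arrays.
a = np.array([1., 2., 3.])
b = np.array([4., 5., 6.])

# Component-wise addition yields another element of R^3 ...
c = a + b          # array([5., 7., 9.])

# ... and so does multiplication by a scalar lambda in R.
lam = 2.0
d = lam * a        # array([2., 4., 6.])

# Both results are objects of the same kind (shape) as the inputs.
assert c.shape == a.shape and d.shape == a.shape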

Linear algebra focuses on the similarities between these vector concepts. We can add them together and multiply them by scalars. We will largely focus on vectors in Rⁿ since most algorithms in linear algebra are formulated in Rⁿ. We will see in Chapter 8 that we often consider data to be represented as vectors in Rⁿ. In this book, we will focus on finite-dimensional vector spaces, in which case there is a 1:1 correspondence between any kind of vector and Rⁿ. When it is convenient, we will use intuitions about geometric vectors and consider array-based algorithms.

One major idea in mathematics is the idea of "closure". This is the question: What is the set of all things that can result from my proposed operations? In the case of vectors: What is the set of vectors that can result by starting with a small set of vectors, and adding them to each other and scaling them? This results in a vector space (Section 2.4). The concept of a vector space and its properties underlie much of machine learning. The concepts introduced in this chapter are summarized in Figure 2.2.

This chapter is mostly based on the lecture notes and books by Drumm and Weil (2001), Strang (2003), Hogben (2013), Liesen and Mehrmann (2015), as well as Pavel Grinfeld's Linear Algebra series (http://tinyurl.com/nahclwm). Other excellent resources are Gilbert Strang's Linear Algebra course at MIT (http://tinyurl.com/29p5q8j) and the Linear Algebra Series by 3Blue1Brown (https://tinyurl.com/h5g4kps).

Linear algebra plays an important role in machine learning and general mathematics. The concepts introduced in this chapter are further expanded to include the idea of geometry in Chapter 3. In Chapter 5, we will discuss vector calculus, where a principled knowledge of matrix operations is essential. In Chapter 10, we will use projections (to be introduced in Section 3.8) for dimensionality reduction with principal component analysis (PCA). In Chapter 9, we will discuss linear regression, where linear algebra plays a central role for solving least-squares problems.

2.1 Systems of Linear Equations

Systems of linear equations play a central part of linear algebra. Many problems can be formulated as systems of linear equations, and linear algebra gives us the tools for solving them.

Example 2.1
A company produces products N1, ..., Nn for which resources R1, ..., Rm are required. To produce a unit of product Nj, aij units of resource Ri are needed, where i = 1, ..., m and j = 1, ..., n.
The objective is to find an optimal production plan, i.e., a plan of how many units xj of product Nj should be produced if a total of bi units of resource Ri are available and (ideally) no resources are left over.
If we produce x1, ..., xn units of the corresponding products, we need a total of

ai1x1 + ··· + ainxn    (2.2)

many units of resource Ri. An optimal production plan (x1, ..., xn) ∈ Rⁿ, therefore, has to satisfy the following system of equations:

a11x1 + ··· + a1nxn = b1
              ⋮
am1x1 + ··· + amnxn = bm    (2.3)

where aij ∈ R and bi ∈ R.

(Figure 2.2 A mind map of the concepts introduced in this chapter (vector, vector space, group, matrix, system of linear equations, linear independence, basis, linear/affine mappings, Gaussian elimination, matrix inverse), along with where they are used in other parts of the book, e.g., Chapters 3, 5, 10, and 12.)

Equation (2.3) is the general form of a system of linear equations, and x1, ..., xn are the unknowns of this system. Every n-tuple (x1, ..., xn) ∈ Rⁿ that satisfies (2.3) is a solution of the linear equation system.
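To connect (2.3) with code, the following NumPy sketch (our own illustration; the coefficients are made up and are not taken from Example 2.1) stores the aij in an array and checks whether a candidate production plan satisfies the system:

import numpy as np

# a_ij: units of resource R_i needed per unit of product N_j
# (made-up numbers for this sketch).
A = np.array([[1., 2.],    # resource R_1
              [3., 1.]])   # resource R_2; m = 2 resources, n = 2 products
b = np.array([5., 5.])     # b_i: available units of resource R_i

x = np.array([1., 2.])     # candidate production plan (x_1, x_2)

# x solves the system (2.3) exactly when every equation holds,
# i.e., when the left-hand sides A @ x match b.
print(A @ x)                  # [5. 5.]
print(np.allclose(A @ x, b))  # True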

Example 2.2
The system of linear equations

x1 + x2 + x3 = 3    (1)
x1 − x2 + 2x3 = 2    (2)
2x1 + 3x3 = 1    (3)        (2.4)

has no solution: Adding the first two equations yields 2x1 + 3x3 = 5, which contradicts the third equation (3).

Let us have a look at the system of linear equations

x1 + x2 + x3 = 3    (1)
x1 − x2 + 2x3 = 2    (2)
x2 + x3 = 2    (3)        (2.5)

From the first and third equation, it follows that x1 = 1. From (1)+(2), we get 2x1 + 3x3 = 5, i.e., x3 = 1. From (3), we then get that x2 = 1. Therefore, (1, 1, 1) is the only possible and unique solution (verify that (1, 1, 1) is a solution by plugging in).

As a third example, we consider

x1 + x2 + x3 = 3    (1)
x1 − x2 + 2x3 = 2    (2)
2x1 + 3x3 = 5    (3)        (2.6)

Since (1)+(2) = (3), we can omit the third equation (redundancy). From (1) and (2), we get 2x1 = 5 − 3x3 and 2x2 = 1 + x3. We define x3 = a ∈ R as a free variable, such that any triplet

(5/2 − (3/2)a, 1/2 + (1/2)a, a),  a ∈ R    (2.7)


is a solution of the system of linear equations, i.e., we obtain a solution set that contains infinitely many solutions.

(Figure 2.3 The solution space of a system of two linear equations with two variables, here 4x1 + 4x2 = 5 and 2x1 − 4x2 = 1, can be geometrically interpreted as the intersection of two lines. Every linear equation represents a line.)

In general, for a real-valued system of linear equations we obtain either no, exactly one, or infinitely many solutions. Linear regression (Chapter 9) solves a version of Example 2.1 when we cannot solve the system of linear equations.
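The three cases can also be distinguished numerically. The sketch below (our own, not from the book) classifies the systems (2.4)-(2.6) by comparing the rank of the coefficient matrix with the rank of the augmented matrix, a standard criterion related to the rank concept introduced in Section 2.6:

import numpy as np

def solution_type(A, b):
    """Classify Ax = b via the ranks of A and the augmented matrix [A | b]."""
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rank_A < rank_Ab:
        return "no solution"
    if rank_A == A.shape[1]:
        return "exactly one solution"
    return "infinitely many solutions"

A = np.array([[1., 1., 1.], [1., -1., 2.], [2., 0., 3.]])
print(solution_type(A, np.array([3., 2., 1.])))   # (2.4): no solution
print(solution_type(A, np.array([3., 2., 5.])))   # (2.6): infinitely many solutions

A5 = np.array([[1., 1., 1.], [1., -1., 2.], [0., 1., 1.]])
print(solution_type(A5, np.array([3., 2., 2.])))  # (2.5): exactly one solution
print(np.linalg.solve(A5, np.array([3., 2., 2.])))  # [1. 1. 1.]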

Remark (Geometric Interpretation of Systems of Linear Equations). In a system of linear equations with two variables x1, x2, each linear equation defines a line on the x1x2-plane. Since a solution to a system of linear equations must satisfy all equations simultaneously, the solution set is the intersection of these lines. This intersection set can be a line (if the linear equations describe the same line), a point, or empty (when the lines are parallel). An illustration is given in Figure 2.3 for the system

4x1 + 4x2 = 5
2x1 − 4x2 = 1    (2.8)

where the solution space is the point (x1, x2) = (1, 1/4). Similarly, for three variables, each linear equation determines a plane in three-dimensional space. When we intersect these planes, i.e., satisfy all linear equations at the same time, we can obtain a solution set that is a plane, a line, a point, or empty (when the planes have no common intersection).

For a systematic approach to solving systems of linear equations, we will introduce a useful compact notation. We collect the coefficients aij into vectors and collect the vectors into matrices. In other words, we write the system from (2.3) in the following form:

x_1 \begin{bmatrix} a_{11} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + x_n \begin{bmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}    (2.9)

\iff \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}    (2.10)

In the following, we will have a close look at these matrices and define computation rules. We will return to solving linear equations in Section 2.3.
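The equivalence of (2.9) and (2.10) is easy to verify numerically: a linear combination of the columns of the coefficient matrix equals the matrix-vector product. A small sketch of our own, using the system (2.8):

import numpy as np

# Coefficient matrix and right-hand side of the system (2.8).
A = np.array([[4., 4.],
              [2., -4.]])
b = np.array([5., 1.])
x = np.array([1., 0.25])   # the solution point (x1, x2) = (1, 1/4)

# (2.9): weight the columns of A by x1, x2 and add them up ...
lhs = x[0] * A[:, 0] + x[1] * A[:, 1]

# ... (2.10): this is exactly the matrix-vector product A @ x.
print(np.allclose(lhs, A @ x))  # True
print(np.allclose(A @ x, b))    # True: x solves the system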

2.2 Matrices

Matrices play a central role in linear algebra. They can be used to compactly represent systems of linear equations, but they also represent linear functions (linear mappings) as we will see later in Section 2.7. Before we discuss some of these interesting topics, let us first define what a matrix is and what kind of operations we can do with matrices. We will see more properties of matrices in Chapter 4.

Definition 2.1 (Matrix). With m, n ∈ N, a real-valued (m, n) matrix A is an m·n-tuple of elements aij, i = 1, ..., m, j = 1, ..., n, which is ordered according to a rectangular scheme consisting of m rows and n columns:

A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix},  a_{ij} ∈ R.    (2.11)

By convention (1, n)-matrices are called rows and (m, 1)-matrices are called columns. These special matrices are also called row/column vectors.

R^{m×n} is the set of all real-valued (m, n)-matrices. A ∈ R^{m×n} can be equivalently represented as a ∈ R^{mn} by stacking all n columns of the matrix into a long vector; see Figure 2.4.

(Figure 2.4 By stacking its columns, a matrix A can be represented as a long vector a; re-shape, A ∈ R^{4×2} to a ∈ R^{8}.)

2.2.1 Matrix Addition and Multiplication

The sum of two matrices A ∈ R^{m×n}, B ∈ R^{m×n} is defined as the element-wise sum, i.e.,

A + B := \begin{bmatrix} a_{11}+b_{11} & \cdots & a_{1n}+b_{1n} \\ \vdots & & \vdots \\ a_{m1}+b_{m1} & \cdots & a_{mn}+b_{mn} \end{bmatrix} ∈ R^{m×n}.    (2.12)

For matrices A ∈ R^{m×n}, B ∈ R^{n×k}, the elements cij of the product C = AB ∈ R^{m×k} are computed as

c_{ij} = \sum_{l=1}^{n} a_{il} b_{lj},  i = 1, ..., m,  j = 1, ..., k.    (2.13)

(Note the size of the matrices: C = AB ∈ R^{m×k} can be computed as C = np.einsum('il,lj', A, B).)

This means, to compute element cij we multiply the elements of the ith row of A with the jth column of B and sum them up; there are n columns in A and n rows in B so that we can compute a_{il}b_{lj} for l = 1, ..., n. Later in Section 3.2, we will call this the dot product of the corresponding row and column. (Commonly, the dot product between two vectors a, b is denoted by a⊤b or ⟨a, b⟩.) In cases where we need to be explicit that we are performing multiplication, we use the notation A · B to denote multiplication (explicitly showing "·").

Remark. Matrices can only be multiplied if their "neighboring" dimensions match. For instance, an n × k-matrix A can be multiplied with a k × m-matrix B, but only from the left side:

\underbrace{A}_{n \times k} \, \underbrace{B}_{k \times m} = \underbrace{C}_{n \times m}    (2.14)

The product BA is not defined if m ≠ n since the neighboring dimensions do not match. ♦

Remark. Matrix multiplication is not defined as an element-wise operation on matrix elements, i.e., c_{ij} ≠ a_{ij} b_{ij} (even if the size of A, B was chosen appropriately). This kind of element-wise multiplication often appears in programming languages when we multiply (multi-dimensional) arrays with each other, and is called a Hadamard product. ♦

Example 2.3
For A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix} ∈ R^{2×3}, B = \begin{bmatrix} 0 & 2 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} ∈ R^{3×2}, we obtain

AB = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix} \begin{bmatrix} 0 & 2 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 2 & 5 \end{bmatrix} ∈ R^{2×2},    (2.15)

BA = \begin{bmatrix} 0 & 2 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 6 & 4 & 2 \\ -2 & 0 & 2 \\ 3 & 2 & 1 \end{bmatrix} ∈ R^{3×3}.    (2.16)

From this example, we can already see that matrix multiplication is not commutative, i.e., AB ≠ BA; see also Figure 2.5 for an illustration.

(Figure 2.5 Even if both matrix multiplications AB and BA are defined, the dimensions of the results can be different.)

Definition 2.2 (Identity Matrix). In R^{n×n}, we define the identity matrix

I_n := \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} ∈ R^{n×n}.    (2.17)
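The following NumPy sketch is ours; only the einsum call mirrors the book's margin note above. It reproduces Example 2.3 and contrasts the matrix product of (2.13) with the element-wise Hadamard product:

import numpy as np

A = np.array([[1., 2., 3.],
              [3., 2., 1.]])   # A in R^{2x3}
B = np.array([[0., 2.],
              [1., -1.],
              [0., 1.]])       # B in R^{3x2}

AB = A @ B                     # (2.15): [[2, 3], [2, 5]]
BA = B @ A                     # (2.16): 3x3 matrix, so AB != BA
print(AB)
print(BA)

# Equation (2.13) spelled out with einsum: c_ij = sum_l a_il * b_lj.
C = np.einsum('il,lj', A, B)
print(np.allclose(C, AB))      # True

# The Hadamard product is element-wise and needs equal shapes;
# A * B would fail here, but A * A works:
print(A * A)                   # [[1, 4, 9], [9, 4, 1]]

# Identity matrix (2.17): multiplying by I_n leaves a matrix unchanged.
I3 = np.eye(3)
print(np.allclose(I3 @ BA, BA))  # True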
