MATHEMATICS FOR MACHINE LEARNING

Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong

This material will be published by Cambridge University Press as Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. This pre-publication version is free to view and download for personal use only. Not for re-distribution, re-sale or use in derivative works. © by M. P. Deisenroth, A. A. Faisal, and C. S. Ong, 2020. https://mml-book.com

Contents

Foreword

Part I: Mathematical Foundations

1 Introduction and Motivation
  1.1 Finding Words for Intuitions
  1.2 Two Ways to Read This Book
  1.3 Exercises and Feedback

2 Linear Algebra
  2.1 Systems of Linear Equations
  2.2 Matrices
  2.3 Solving Systems of Linear Equations
  2.4 Vector Spaces
  2.5 Linear Independence
  2.6 Basis and Rank
  2.7 Linear Mappings
  2.8 Affine Spaces
  2.9 Further Reading
  Exercises

3 Analytic Geometry
  3.1 Norms
  3.2 Inner Products
  3.3 Lengths and Distances
  3.4 Angles and Orthogonality
  3.5 Orthonormal Basis
  3.6 Orthogonal Complement
  3.7 Inner Product of Functions
  3.8 Orthogonal Projections
  3.9 Rotations
  3.10 Further Reading
  Exercises

4 Matrix Decompositions
  4.1 Determinant and Trace
  4.2 Eigenvalues and Eigenvectors
  4.3 Cholesky Decomposition
  4.4 Eigendecomposition and Diagonalization
  4.5 Singular Value Decomposition
  4.6 Matrix Approximation
  4.7 Matrix Phylogeny
  4.8 Further Reading
  Exercises

5 Vector Calculus
  5.1 Differentiation of Univariate Functions
  5.2 Partial Differentiation and Gradients
  5.3 Gradients of Vector-Valued Functions
  5.4 Gradients of Matrices
  5.5 Useful Identities for Computing Gradients
  5.6 Backpropagation and Automatic Differentiation
  5.7 Higher-Order Derivatives
  5.8 Linearization and Multivariate Taylor Series
  5.9 Further Reading
  Exercises

6 Probability and Distributions
  6.1 Construction of a Probability Space
  6.2 Discrete and Continuous Probabilities
  6.3 Sum Rule, Product Rule, and Bayes' Theorem
  6.4 Summary Statistics and Independence
  6.5 Gaussian Distribution
  6.6 Conjugacy and the Exponential Family
  6.7 Change of Variables/Inverse Transform
  6.8 Further Reading
  Exercises

7 Continuous Optimization
  7.1 Optimization Using Gradient Descent
  7.2 Constrained Optimization and Lagrange Multipliers
  7.3 Convex Optimization
  7.4 Further Reading
  Exercises

Part II: Central Machine Learning Problems

8 When Models Meet Data
  8.1 Data, Models, and Learning
  8.2 Empirical Risk Minimization
  8.3 Parameter Estimation
  8.4 Probabilistic Modeling and Inference
  8.5 Directed Graphical Models
  8.6 Model Selection

9 Linear Regression
  9.1 Problem Formulation
  9.2 Parameter Estimation
  9.3 Bayesian Linear Regression
  9.4 Maximum Likelihood as Orthogonal Projection
  9.5 Further Reading

10 Dimensionality Reduction with Principal Component Analysis
  10.1 Problem Setting
  10.2 Maximum Variance Perspective
  10.3 Projection Perspective
  10.4 Eigenvector Computation and Low-Rank Approximations
  10.5 PCA in High Dimensions
  10.6 Key Steps of PCA in Practice
  10.7 Latent Variable Perspective
  10.8 Further Reading

11 Density Estimation with Gaussian Mixture Models
  11.1 Gaussian Mixture Model
  11.2 Parameter Learning via Maximum Likelihood
  11.3 EM Algorithm
  11.4 Latent-Variable Perspective
  11.5 Further Reading

12 Classification with Support Vector Machines
  12.1 Separating Hyperplanes
  12.2 Primal Support Vector Machine
  12.3 Dual Support Vector Machine
  12.4 Kernels
  12.5 Numerical Solution
  12.6 Further Reading

References
Index

Foreword

Machine learning is the latest in a long line of attempts to distill human knowledge and reasoning into a form that is suitable for constructing machines and engineering automated systems. As machine learning becomes more ubiquitous and its software packages become easier to use, it is natural and desirable that the low-level technical details are abstracted away and hidden from the practitioner. However, this brings with it the danger that a practitioner becomes unaware of the design decisions and, hence, the limits of machine learning algorithms.

The enthusiastic practitioner who is interested to learn more about the magic behind successful machine learning algorithms currently faces a daunting set of pre-requisite knowledge:

- Programming languages and data analysis tools
- Large-scale computation and the associated frameworks
- Mathematics and statistics and how machine learning builds on it

At universities, introductory courses on machine learning tend to spend early parts of the course covering some of these pre-requisites. For historical reasons, courses in machine learning tend to be taught in the computer science department, where students are often trained in the first two areas of knowledge, but not so much in mathematics and statistics.

Current machine learning textbooks primarily focus on machine learning algorithms and methodologies and assume that the reader is competent in mathematics and statistics. Therefore, these books only spend one or two chapters of background mathematics, either at the beginning of the book or as appendices. We have found many people who want to delve into the foundations of basic machine learning methods who struggle with the mathematical knowledge required to read a machine learning textbook. Having taught undergraduate and graduate courses at universities, we find that the gap between high school mathematics and the mathematics level required to read a standard machine learning textbook is too big for many people.

This book brings the mathematical foundations of basic machine learning concepts to the fore and collects the information in a single place so that this skills gap is narrowed or even closed.


Why Another Book on Machine Learning?

Machine learning builds upon the language of mathematics to express concepts that seem intuitively obvious but that are surprisingly difficult to formalize. Once formalized properly, we can gain insights into the task we want to solve. One common complaint of students of mathematics around the globe is that the topics covered seem to have little relevance to practical problems. We believe that machine learning is an obvious and direct motivation for people to learn mathematics.

"Math is linked in the popular mind with phobia and anxiety. You'd think we're discussing spiders." (Strogatz, 2014, page 281)

This book is intended to be a guidebook to the vast mathematical literature that forms the foundations of modern machine learning. We motivate the need for mathematical concepts by directly pointing out their usefulness in the context of fundamental machine learning problems. In the interest of keeping the book short, many details and more advanced concepts have been left out. Equipped with the basic concepts presented here, and how they fit into the larger context of machine learning, the reader can find numerous resources for further study, which we provide at the end of the respective chapters. For readers with a mathematical background, this book provides a brief but precisely stated glimpse of machine learning. In contrast to other books that focus on methods and models of machine learning (MacKay, 2003; Bishop, 2006; Alpaydin, 2010; Barber, 2012; Murphy, 2012; Shalev-Shwartz and Ben-David, 2014; Rogers and Girolami, 2016) or programmatic aspects of machine learning (Müller and Guido, 2016; Raschka and Mirjalili, 2017; Chollet and Allaire, 2018), we provide only four representative examples of machine learning algorithms. Instead, we focus on the mathematical concepts behind the models themselves. We hope that readers will be able to gain a deeper understanding of the basic questions in machine learning and connect practical questions arising from the use of machine learning with fundamental choices in the mathematical model.

We do not aim to write a classical machine learning book. Instead, our intention is to provide the mathematical background, applied to four central machine learning problems, to make it easier to read other machine learning textbooks.

Who Is the Target Audience?

As applications of machine learning become widespread in society, we believe that everybody should have some understanding of its underlying principles. This book is written in an academic mathematical style, which enables us to be precise about the concepts behind machine learning. We encourage readers unfamiliar with this seemingly terse style to persevere and to keep the goals of each topic in mind. We sprinkle comments and remarks throughout the text, in the hope that it provides useful guidance with respect to the big picture.

The book assumes the reader to have mathematical knowledge commonly covered in high school mathematics and physics. For example, the reader should have seen derivatives and integrals before, and geometric vectors in two or three dimensions. Starting from there, we generalize these concepts. Therefore, the target audience of the book includes undergraduate university students, evening learners, and learners participating in online machine learning courses.

In analogy to music, there are three types of interaction that people have with machine learning:

Astute Listener. The democratization of machine learning by the provision of open-source software, online tutorials, and cloud-based tools allows users to not worry about the specifics of pipelines. Users can focus on extracting insights from data using off-the-shelf tools. This enables non-tech-savvy domain experts to benefit from machine learning. This is similar to listening to music; the user is able to choose and discern between different types of machine learning, and benefits from it. More experienced users are like music critics, asking important questions about the application of machine learning in society such as ethics, fairness, and privacy of the individual. We hope that this book provides a foundation for thinking about the certification and risk management of machine learning systems, and allows them to use their domain expertise to build better machine learning systems.

Experienced Artist. Skilled practitioners of machine learning can plug and play different tools and libraries into an analysis pipeline. The stereotypical practitioner would be a data scientist or engineer who understands machine learning interfaces and their use cases, and is able to perform wonderful feats of prediction from data. This is similar to a virtuoso playing music, where highly skilled practitioners can bring existing instruments to life and bring enjoyment to their audience. Using the mathematics presented here as a primer, practitioners would be able to understand the benefits and limits of their favorite method, and to extend and generalize existing machine learning algorithms. We hope that this book provides the impetus for more rigorous and principled development of machine learning methods.

Fledgling Composer. As machine learning is applied to new domains, developers of machine learning need to develop new methods and extend existing algorithms. They are often researchers who need to understand the mathematical basis of machine learning and uncover relationships between different tasks. This is similar to composers of music who, within the rules and structure of musical theory, create new and amazing pieces. We hope this book provides a high-level overview of other technical books for people who want to become composers of machine learning. There is a great need in society for new researchers who are able to propose and explore novel approaches for attacking the many challenges of learning from data.


Acknowledgments

We are grateful to many people who looked at early drafts of the book and suffered through painful expositions of concepts. We tried to implement their ideas that we did not vehemently disagree with. We would like to especially acknowledge Christfried Webers for his careful reading of many parts of the book, and his detailed suggestions on structure and presentation. Many friends and colleagues have also been kind enough to provide their time and energy on different versions of each chapter. We have been lucky to benefit from the generosity of the online community, who have suggested improvements via github.com, which greatly improved the book.

The following people have found bugs, proposed clarifications, and suggested relevant literature, either via github.com or personal communication. Their names are sorted alphabetically.

Abdul-Ganiy Usman, Adam Gaier, Adele Jackson, Aditya Menon, Alasdair Tran, Aleksandar Krnjaic, Alexander Makrigiorgos, Alfredo Canziani, Ali Shafti, Amr Khalifa, Andrew Tanggara, Angus Gruen, Antal A. Buss, Antoine Toisoul Le Cann, Areg Sarvazyan, Artem Artemev, Artyom Stepanov, Bill Kromydas, Bob Williamson, Boon Ping Lim, Chao Qu, Cheng Li, Chris Sherlock, Christopher Gray, Daniel McNamara, Daniel Wood, Darren Siegel, David Johnston, Dawei Chen, Ellen Broad, Fengkuangtian Zhu, Fiona Condon, Georgios Theodorou, He Xin, Irene Raissa Kameni, Jakub Nabaglo, James Hensman, Jamie Liu, Jean Kaddour, Jean-Paul Ebejer, Jerry Qiang, Jitesh Sindhare, John Lloyd, Jonas Ngnawe, Jon Martin, Justin Hsi, Kai Arulkumaran, Kamil Dreczkowski, Lily Wang, Lionel Tondji Ngoupeyou, Lydia Knufing, Mahmoud Aslan, Mark Hartenstein, Mark van der Wilk, Markus Hegland, Martin Hewing, Matthew Alger, Matthew Lee, Maximus McCann, Mengyan Zhang, Michael Bennett, Michael Pedersen, Minjeong Shin, Mohammad Malekzadeh, Naveen Kumar, Nico Montali, Oscar Armas, Patrick Henriksen, Patrick Wieschollek, Pattarawat Chormai, Paul Kelly, Petros Christodoulou, Piotr Januszewski, Pranav Subramani, Quyu Kong, Ragib Zaman, Rui Zhang, Ryan-Rhys Griffiths, Salomon Kabongo, Samuel Ogunmola, Sandeep Mavadia, Sarvesh Nikumbh, Sebastian Raschka, Senanayak Sesh Kumar Karri, Seung-Heon Baek, Shahbaz Chaudhary, Shakir Mohamed, Shawn Berry, Sheikh Abdul Raheem Ali, Sheng Xue, Sridhar Thiagarajan, Syed Nouman Hasany, Szymon Brych, Thomas Buhler, Timur Sharapov, Tom Melamed, Vincent Adam, Vincent Dutordoir, Vu Minh, Wasim Aftab, Wen Zhi, Wojciech Stokowiec, Xiaonan Chong, Xiaowei Zhang, Yazhou Hao, Yicheng Luo, Young Lee, Yu Lu, Yun Cheng, Yuxiao Huang, Zac Cranko, Zijian Cao, Zoe Nolan.

Contributors through github, whose real names were not listed on their github profile, are:

SamDataMad, bumptiousmonkey, idoamihai, deepakiim, insad, HorizonP, cs-maillist, kudo23, empet, victorBigand, 17SKYE, jessjing1995.

We are also very grateful to Parameswaran Raman and the many anonymous reviewers, organized by Cambridge University Press, who read one or more chapters of earlier versions of the manuscript, and provided constructive criticism that led to considerable improvements. A special mention goes to Dinesh Singh Negi, our LaTeX support, for detailed and prompt advice about LaTeX-related issues. Last but not least, we are very grateful to our editor Lauren Cowles, who has been patiently guiding us through the gestation process of this book.


Table of Symbols

Symbol                Typical meaning
a, b, c, α, β, γ      Scalars are lowercase
x, y, z               Vectors are bold lowercase
A, B, C               Matrices are bold uppercase
x⊤, A⊤                Transpose of a vector or matrix
A⁻¹                   Inverse of a matrix
⟨x, y⟩                Inner product of x and y
x⊤y                   Dot product of x and y
B = (b1, b2, b3)      (Ordered) tuple
B = [b1, b2, b3]      Matrix of column vectors stacked horizontally
B = {b1, b2, b3}      Set of vectors (unordered)
Z, N                  Integers and natural numbers, respectively
R, C                  Real and complex numbers, respectively
Rⁿ                    n-dimensional vector space of real numbers
∀x                    Universal quantifier: for all x
∃x                    Existential quantifier: there exists x
a := b                a is defined as b
a =: b                b is defined as a
a ∝ b                 a is proportional to b, i.e., a = constant · b
g ∘ f                 Function composition: "g after f"
⇐⇒                    If and only if
=⇒                    Implies
A, C                  Sets
a ∈ A                 a is an element of the set A
∅                     Empty set
D                     Number of dimensions; indexed by d = 1, ..., D
N                     Number of data points; indexed by n = 1, ..., N
I_m                   Identity matrix of size m × m
0_{m,n}               Matrix of zeros of size m × n
1_{m,n}               Matrix of ones of size m × n
e_i                   Standard/canonical vector (where i is the component that is 1)
dim                   Dimensionality of vector space
rk(A)                 Rank of matrix A
Im(Φ)                 Image of linear mapping Φ
ker(Φ)                Kernel (null space) of a linear mapping Φ
span[b1]              Span (generating set) of b1
tr(A)                 Trace of A
det(A)                Determinant of A
|·|                   Absolute value or determinant (depending on context)
‖·‖                   Norm; Euclidean unless specified
λ                     Eigenvalue or Lagrange multiplier
E_λ                   Eigenspace corresponding to eigenvalue λ
θ                     Parameter vector
∂f/∂x                 Partial derivative of f with respect to x
df/dx                 Total derivative of f with respect to x
∇                     Gradient
L                     Lagrangian
L                     Negative log-likelihood
(n k)                 Binomial coefficient, n choose k
V_X[x]                Variance of x with respect to the random variable X
E_X[x]                Expectation of x with respect to the random variable X
Cov_{X,Y}[x, y]       Covariance between x and y
X ⊥⊥ Y | Z            X is conditionally independent of Y given Z
X ∼ p                 Random variable X is distributed according to p
N(µ, Σ)               Gaussian distribution with mean µ and covariance Σ
Ber(µ)                Bernoulli distribution with parameter µ
Bin(N, µ)             Binomial distribution with parameters N, µ
Beta(α, β)            Beta distribution with parameters α, β

Table of Abbreviations and Acronyms

Acronym     Meaning
e.g.        Exempli gratia (Latin: for example)
GMM         Gaussian mixture model
i.e.        Id est (Latin: this means)
i.i.d.      Independent, identically distributed
MAP         Maximum a posteriori
MLE         Maximum likelihood estimation/estimator
ONB         Orthonormal basis
PCA         Principal component analysis
PPCA        Probabilistic principal component analysis
REF         Row-echelon form
SPD         Symmetric, positive definite
SVM         Support vector machine


Part I

Mathematical Foundations

1 Introduction and Motivation

Machine learning is about designing algorithms that automatically extract valuable information from data. The emphasis here is on "automatic", i.e., machine learning is concerned about general-purpose methodologies that can be applied to many datasets, while producing something that is meaningful. There are three concepts that are at the core of machine learning: data, a model, and learning.

Since machine learning is inherently data driven, data is at the core of machine learning. The goal of machine learning is to design general-purpose methodologies to extract valuable patterns from data, ideally without much domain-specific expertise. For example, given a large corpus of documents (e.g., books in many libraries), machine learning methods can be used to automatically find relevant topics that are shared across documents (Hoffman et al., 2010). To achieve this goal, we design models that are typically related to the process that generates data, similar to the dataset we are given. For example, in a regression setting, the model would describe a function that maps inputs to real-valued outputs. To paraphrase Mitchell (1997): A model is said to learn from data if its performance on a given task improves after the data is taken into account. The goal is to find good models that generalize well to yet unseen data, which we may care about in the future. Learning can be understood as a way to automatically find patterns and structure in data by optimizing the parameters of the model.

While machine learning has seen many success stories, and software is readily available to design and train rich and flexible machine learning systems, we believe that the mathematical foundations of machine learning are important in order to understand fundamental principles upon which more complicated machine learning systems are built. Understanding these principles can facilitate creating new machine learning solutions, understanding and debugging existing approaches, and learning about the inherent assumptions and limitations of the methodologies we are working with.


1.1 Finding Words for Intuitions

A challenge we face regularly in machine learning is that concepts and words are slippery, and a particular component of the machine learning system can be abstracted to different mathematical concepts. For example, the word "algorithm" is used in at least two different senses in the context of machine learning. In the first sense, we use the phrase "machine learning algorithm" to mean a system that makes predictions based on input data. We refer to these algorithms as predictors. In the second sense, we use the exact same phrase "machine learning algorithm" to mean a system that adapts some internal parameters of the predictor so that it performs well on future unseen input data. Here we refer to this adaptation as training a system.

This book will not resolve the issue of ambiguity, but we want to highlight upfront that, depending on the context, the same expressions can mean different things. However, we attempt to make the context sufficiently clear to reduce the level of ambiguity.

The first part of this book introduces the mathematical concepts and foundations needed to talk about the three main components of a machine learning system: data, models, and learning. We will briefly outline these components here, and we will revisit them again in Chapter 8 once we have discussed the necessary mathematical concepts.

While not all data is numerical, it is often useful to consider data in a number format. In this book, we assume that data has already been appropriately converted into a numerical representation suitable for reading into a computer program. Therefore, we think of data as vectors. As another illustration of how subtle words are, there are (at least) three different ways to think about vectors: a vector as an array of numbers (a computer science view), a vector as an arrow with a direction and magnitude (a physics view), and a vector as an object that obeys addition and scaling (a mathematical view).

A model is typically used to describe a process for generating data, similar to the dataset at hand. Therefore, good models can also be thought of as simplified versions of the real (unknown) data-generating process, capturing aspects that are relevant for modeling the data and extracting hidden patterns from it. A good model can then be used to predict what would happen in the real world without performing real-world experiments.

We now come to the crux of the matter, the learning component of machine learning. Assume we are given a dataset and a suitable model. Training the model means to use the data available to optimize some parameters of the model with respect to a utility function that evaluates how well the model predicts the training data. Most training methods can be thought of as an approach analogous to climbing a hill to reach its peak. In this analogy, the peak of the hill corresponds to a maximum of some desired performance measure. However, in practice, we are interested in the model to perform well on unseen data. Performing well on data that we have already seen (training data) may only mean that we found a good way to memorize the data. However, this may not generalize well to unseen data, and, in practical applications, we often need to expose our machine learning system to situations that it has not encountered before.

Let us summarize the main concepts of machine learning that we cover in this book:

- We represent data as vectors.
- We choose an appropriate model, either using the probabilistic or optimization view.
- We learn from available data by using numerical optimization methods with the aim that the model performs well on data not used for training.

1.2 Two Ways to Read This Book

We can consider two strategies for understanding the mathematics for machine learning:

- Bottom-up: Building up the concepts from foundational to more advanced. This is often the preferred approach in more technical fields, such as mathematics. This strategy has the advantage that the reader at all times is able to rely on their previously learned concepts. Unfortunately, for a practitioner many of the foundational concepts are not particularly interesting by themselves, and the lack of motivation means that most foundational definitions are quickly forgotten.

- Top-down: Drilling down from practical needs to more basic requirements. This goal-driven approach has the advantage that the readers know at all times why they need to work on a particular concept, and there is a clear path of required knowledge. The downside of this strategy is that the knowledge is built on potentially shaky foundations, and the readers have to remember a set of words that they do not have any way of understanding.

We decided to write this book in a modular way to separate foundational (mathematical) concepts from applications so that this book can be read in both ways. The book is split into two parts, where Part I lays the mathematical foundations and Part II applies the concepts from Part I to a set of fundamental machine learning problems, which form four pillars of machine learning as illustrated in Figure 1.1: regression, dimensionality reduction, density estimation, and classification. Chapters in Part I mostly build upon the previous ones, but it is possible to skip a chapter and work backward if necessary. Chapters in Part II are only loosely coupled and can be read in any order. There are many pointers forward and backward between the two parts of the book to link mathematical concepts with machine learning algorithms.

Figure 1.1 The foundations and four pillars of machine learning: regression, dimensionality reduction, density estimation, and classification, built on linear algebra, analytic geometry, matrix decomposition, vector calculus, probability and distributions, and optimization.

Of course there are more than two ways to read this book. Most readers learn using a combination of top-down and bottom-up approaches, sometimes building up basic mathematical skills before attempting more complex concepts, but also choosing topics based on applications of machine learning.

Part I Is about Mathematics

The four pillars of machine learning we cover in this book (see Figure 1.1) require a solid mathematical foundation, which is laid out in Part I. We represent numerical data as vectors and represent a table of such data as a matrix. The study of vectors and matrices is called linear algebra, which we introduce in Chapter 2. The collection of vectors as a matrix is also described there.

Given two vectors representing two objects in the real world, we want to make statements about their similarity. The idea is that vectors that are similar should be predicted to have similar outputs by our machine learning algorithm (our predictor). To formalize the idea of similarity between vectors, we need to introduce operations that take two vectors as input and return a numerical value representing their similarity. The construction of similarity and distances is central to analytic geometry and is discussed in Chapter 3.

In Chapter 4, we introduce some fundamental concepts about matrices and matrix decomposition. Some operations on matrices are extremely useful in machine learning, and they allow for an intuitive interpretation of the data and more efficient learning.

We often consider data to be noisy observations of some true underlying signal. We hope that by applying machine learning we can identify the signal from the noise. This requires us to have a language for quantifying what "noise" means. We often would also like to have predictors that allow us to express some sort of uncertainty, e.g., to quantify the confidence we have about the value of the prediction at a particular test data point. Quantification of uncertainty is the realm of probability theory and is covered in Chapter 6.

To train machine learning models, we typically find parameters that maximize some performance measure. Many optimization techniques require the concept of a gradient, which tells us the direction in which to search for a solution. Chapter 5 is about vector calculus and details the concept of gradients, which we subsequently use in Chapter 7, where we talk about optimization to find maxima/minima of functions.

Part II Is about Machine Learning

The second part of the book introduces the four pillars of machine learning as shown in Figure 1.1. We illustrate how the mathematical concepts introduced in the first part of the book are the foundation for each pillar. Broadly speaking, chapters are ordered by difficulty (in ascending order).

In Chapter 8, we restate the three components of machine learning (data, models, and parameter estimation) in a mathematical fashion. In addition, we provide some guidelines for building experimental set-ups that guard against overly optimistic evaluations of machine learning systems. Recall that the goal is to build a predictor that performs well on unseen data.

In Chapter 9, we will have a close look at linear regression, where our objective is to find functions that map inputs x ∈ Rᴰ to corresponding observed function values y ∈ R, which we can interpret as the labels of their respective inputs. We will discuss classical model fitting (parameter estimation) via maximum likelihood and maximum a posteriori estimation, as well as Bayesian linear regression, where we integrate the parameters out instead of optimizing them.

Chapter 10 focuses on dimensionality reduction, the second pillar in Figure 1.1, using principal component analysis. The key objective of dimensionality reduction is to find a compact, lower-dimensional representation of high-dimensional data x ∈ Rᴰ, which is often easier to analyze than the original data. Unlike regression, dimensionality reduction is only concerned about modeling the data; there are no labels associated with a data point x.

In Chapter 11, we will move to our third pillar: density estimation. The objective of density estimation is to find a probability distribution that describes a given dataset. We will focus on Gaussian mixture models for this purpose, and we will discuss an iterative scheme to find the parameters of this model. As in dimensionality reduction, there are no labels associated with the data points x ∈ Rᴰ. However, we do not seek a low-dimensional representation of the data. Instead, we are interested in a density model that describes the data.

Chapter 12 concludes the book with an in-depth discussion of the fourth pillar: classification. We will discuss classification in the context of support vector machines. Similar to regression (Chapter 9), we have inputs x and corresponding labels y. However, unlike regression, where the labels were real-valued, the labels in classification are integers, which requires special care.

1.3 Exercises and Feedback

We provide some exercises in Part I, which can be done mostly by pen and paper. For Part II, we provide programming tutorials (Jupyter notebooks) to explore some properties of the machine learning algorithms we discuss in this book.

We appreciate that Cambridge University Press strongly supports our aim to democratize education and learning by making this book freely available for download at https://mml-book.com, where tutorials, errata, and additional materials can be found. Mistakes can be reported and feedback provided using the preceding URL.


2 Linear Algebra

When formalizing intuitive concepts, a common approach is to construct a set of objects (symbols) and a set of rules to manipulate these objects. This is known as an algebra. Linear algebra is the study of vectors and certain rules to manipulate vectors. The vectors many of us know from school are called "geometric vectors", which are usually denoted by a small arrow above the letter, e.g., →x and →y. In this book, we discuss more general concepts of vectors and use a bold letter to represent them, e.g., x and y. In general, vectors are special objects that can be added together and multiplied by scalars to produce another object of the same kind. From an abstract mathematical viewpoint, any object that satisfies these two properties can be considered a vector. Here are some examples of such vector objects:

1. Geometric vectors. This example of a vector may be familiar from high school mathematics and physics. Geometric vectors (see Figure 2.1(a)) are directed segments, which can be drawn (at least in two dimensions). Two geometric vectors →x, →y can be added, such that →x + →y = →z is another geometric vector. Furthermore, multiplication by a scalar λ→x, λ ∈ R, is also a geometric vector. In fact, it is the original vector scaled by λ. Therefore, geometric vectors are instances of the vector concepts introduced previously. Interpreting vectors as geometric vectors enables us to use our intuitions about direction and magnitude to reason about mathematical operations.

2. Polynomials are also vectors; see Figure 2.1(b): Two polynomials can be added together, which results in another polynomial; and they can be multiplied by a scalar λ ∈ R, and the result is a polynomial as well. Therefore, polynomials are (rather unusual) instances of vectors. Note that polynomials are very different from geometric vectors. While geometric vectors are concrete "drawings", polynomials are abstract concepts. However, they are both vectors in the sense previously described.

(Figure 2.1 Different types of vectors. Vectors can be surprising objects, including (a) geometric vectors and (b) polynomials.)

3. Audio signals are vectors. Audio signals are represented as a series of numbers. We can add audio signals together, and their sum is a new audio signal. If we scale an audio signal, we also obtain an audio signal. Therefore, audio signals are a type of vector, too.

4. Elements of Rⁿ (tuples of n real numbers) are vectors. Rⁿ is more abstract than polynomials, and it is the concept we focus on in this book. For instance,

   a = (1, 2, 3)⊤ ∈ R³    (2.1)

   is an example of a triplet of numbers. Adding two vectors a, b ∈ Rⁿ component-wise results in another vector: a + b = c ∈ Rⁿ. Moreover, multiplying a ∈ Rⁿ by λ ∈ R results in a scaled vector λa ∈ Rⁿ. Considering vectors as elements of Rⁿ has an additional benefit that it loosely corresponds to arrays of real numbers on a computer. Many programming languages support array operations, which allow for convenient implementation of algorithms that involve vector operations. (Be careful to check whether array operations actually perform vector operations when implementing on a computer.)
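To make the array view concrete, here is a minimal NumPy sketch of our own (it is not part of the book, whose companion tutorials are separate Jupyter notebooks). It checks the two defining operations, addition and scalar multiplication, for vectors in R³:

import numpy as np

# Two vectors in R^3, represented as NumPy arrays.
a = np.array([1., 2., 3.])
b = np.array([4., 5., 6.])

# Component-wise addition yields another element of R^3 ...
c = a + b          # array([5., 7., 9.])

# ... and so does multiplication by a scalar lambda in R.
lam = 2.0
d = lam * a        # array([2., 4., 6.])

# Both results are objects of the same kind (shape) as the inputs.
assert c.shape == a.shape and d.shape == a.shape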

Linear algebra focuses on the similarities between these vector concepts. We can add them together and multiply them by scalars. We will largely focus on vectors in Rⁿ since most algorithms in linear algebra are formulated in Rⁿ. We will see in Chapter 8 that we often consider data to be represented as vectors in Rⁿ. In this book, we will focus on finite-dimensional vector spaces, in which case there is a 1:1 correspondence between any kind of vector and Rⁿ. When it is convenient, we will use intuitions about geometric vectors and consider array-based algorithms.

One major idea in mathematics is the idea of "closure". This is the question: What is the set of all things that can result from my proposed operations? In the case of vectors: What is the set of vectors that can result by starting with a small set of vectors, and adding them to each other and scaling them? This results in a vector space (Section 2.4). The concept of a vector space and its properties underlie much of machine learning. The concepts introduced in this chapter are summarized in Figure 2.2.

This chapter is mostly based on the lecture notes and books by Drumm and Weil (2001), Strang (2003), Hogben (2013), Liesen and Mehrmann (2015), as well as Pavel Grinfeld's Linear Algebra series (http://tinyurl.com/nahclwm). Other excellent resources are Gilbert Strang's Linear Algebra course at MIT (http://tinyurl.com/29p5q8j) and the Linear Algebra Series by 3Blue1Brown (https://tinyurl.com/h5g4kps).

Linear algebra plays an important role in machine learning and general mathematics. The concepts introduced in this chapter are further expanded to include the idea of geometry in Chapter 3. In Chapter 5, we will discuss vector calculus, where a principled knowledge of matrix operations is essential. In Chapter 10, we will use projections (to be introduced in Section 3.8) for dimensionality reduction with principal component analysis (PCA). In Chapter 9, we will discuss linear regression, where linear algebra plays a central role for solving least-squares problems.

2.1 Systems of Linear Equations

Systems of linear equations play a central part of linear algebra. Many problems can be formulated as systems of linear equations, and linear algebra gives us the tools for solving them.

Example 2.1
A company produces products N1, ..., Nn for which resources R1, ..., Rm are required. To produce a unit of product Nj, aij units of resource Ri are needed, where i = 1, ..., m and j = 1, ..., n.
The objective is to find an optimal production plan, i.e., a plan of how many units xj of product Nj should be produced if a total of bi units of resource Ri are available and (ideally) no resources are left over.
If we produce x1, ..., xn units of the corresponding products, we need a total of

ai1x1 + ··· + ainxn    (2.2)

many units of resource Ri. An optimal production plan (x1, ..., xn) ∈ Rⁿ, therefore, has to satisfy the following system of equations:

a11x1 + ··· + a1nxn = b1
              ⋮
am1x1 + ··· + amnxn = bm    (2.3)

where aij ∈ R and bi ∈ R.

(Figure 2.2 A mind map of the concepts introduced in this chapter (vector, vector space, group, matrix, system of linear equations, linear independence, basis, linear/affine mappings, Gaussian elimination, matrix inverse), along with where they are used in other parts of the book, e.g., Chapters 3, 5, 10, and 12.)

Equation (2.3) is the general form of a system of linear equations, and x1, ..., xn are the unknowns of this system. Every n-tuple (x1, ..., xn) ∈ Rⁿ that satisfies (2.3) is a solution of the linear equation system.
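To connect (2.3) with code, the following NumPy sketch (our own illustration; the coefficients are made up and are not taken from Example 2.1) stores the aij in an array and checks whether a candidate production plan satisfies the system:

import numpy as np

# a_ij: units of resource R_i needed per unit of product N_j
# (made-up numbers for this sketch).
A = np.array([[1., 2.],    # resource R_1
              [3., 1.]])   # resource R_2; m = 2 resources, n = 2 products
b = np.array([5., 5.])     # b_i: available units of resource R_i

x = np.array([1., 2.])     # candidate production plan (x_1, x_2)

# x solves the system (2.3) exactly when every equation holds,
# i.e., when the left-hand sides A @ x match b.
print(A @ x)                  # [5. 5.]
print(np.allclose(A @ x, b))  # True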

Example 2.2
The system of linear equations

x1 + x2 + x3 = 3    (1)
x1 − x2 + 2x3 = 2    (2)
2x1 + 3x3 = 1    (3)        (2.4)

has no solution: Adding the first two equations yields 2x1 + 3x3 = 5, which contradicts the third equation (3).

Let us have a look at the system of linear equations

x1 + x2 + x3 = 3    (1)
x1 − x2 + 2x3 = 2    (2)
x2 + x3 = 2    (3)        (2.5)

From the first and third equation, it follows that x1 = 1. From (1)+(2), we get 2x1 + 3x3 = 5, i.e., x3 = 1. From (3), we then get that x2 = 1. Therefore, (1, 1, 1) is the only possible and unique solution (verify that (1, 1, 1) is a solution by plugging in).

As a third example, we consider

x1 + x2 + x3 = 3    (1)
x1 − x2 + 2x3 = 2    (2)
2x1 + 3x3 = 5    (3)        (2.6)

Since (1)+(2) = (3), we can omit the third equation (redundancy). From (1) and (2), we get 2x1 = 5 − 3x3 and 2x2 = 1 + x3. We define x3 = a ∈ R as a free variable, such that any triplet

(5/2 − (3/2)a, 1/2 + (1/2)a, a),  a ∈ R    (2.7)


is a solution of the system of linear equations, i.e., we obtain a solution set that contains infinitely many solutions.

(Figure 2.3 The solution space of a system of two linear equations with two variables, here 4x1 + 4x2 = 5 and 2x1 − 4x2 = 1, can be geometrically interpreted as the intersection of two lines. Every linear equation represents a line.)

In general, for a real-valued system of linear equations we obtain either no, exactly one, or infinitely many solutions. Linear regression (Chapter 9) solves a version of Example 2.1 when we cannot solve the system of linear equations.
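The three cases can also be distinguished numerically. The sketch below (our own, not from the book) classifies the systems (2.4)-(2.6) by comparing the rank of the coefficient matrix with the rank of the augmented matrix, a standard criterion related to the rank concept introduced in Section 2.6:

import numpy as np

def solution_type(A, b):
    """Classify Ax = b via the ranks of A and the augmented matrix [A | b]."""
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rank_A < rank_Ab:
        return "no solution"
    if rank_A == A.shape[1]:
        return "exactly one solution"
    return "infinitely many solutions"

A = np.array([[1., 1., 1.], [1., -1., 2.], [2., 0., 3.]])
print(solution_type(A, np.array([3., 2., 1.])))   # (2.4): no solution
print(solution_type(A, np.array([3., 2., 5.])))   # (2.6): infinitely many solutions

A5 = np.array([[1., 1., 1.], [1., -1., 2.], [0., 1., 1.]])
print(solution_type(A5, np.array([3., 2., 2.])))  # (2.5): exactly one solution
print(np.linalg.solve(A5, np.array([3., 2., 2.])))  # [1. 1. 1.]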

Remark (Geometric Interpretation of Systems of Linear Equations). In a system of linear equations with two variables x1, x2, each linear equation defines a line on the x1x2-plane. Since a solution to a system of linear equations must satisfy all equations simultaneously, the solution set is the intersection of these lines. This intersection set can be a line (if the linear equations describe the same line), a point, or empty (when the lines are parallel). An illustration is given in Figure 2.3 for the system

4x1 + 4x2 = 5
2x1 − 4x2 = 1    (2.8)

where the solution space is the point (x1, x2) = (1, 1/4). Similarly, for three variables, each linear equation determines a plane in three-dimensional space. When we intersect these planes, i.e., satisfy all linear equations at the same time, we can obtain a solution set that is a plane, a line, a point, or empty (when the planes have no common intersection).

For a systematic approach to solving systems of linear equations, we will introduce a useful compact notation. We collect the coefficients aij into vectors and collect the vectors into matrices. In other words, we write the system from (2.3) in the following form:

x_1 \begin{bmatrix} a_{11} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + x_n \begin{bmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}    (2.9)

\iff \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}    (2.10)

In the following, we will have a close look at these matrices and define computation rules. We will return to solving linear equations in Section 2.3.
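The equivalence of (2.9) and (2.10) is easy to verify numerically: a linear combination of the columns of the coefficient matrix equals the matrix-vector product. A small sketch of our own, using the system (2.8):

import numpy as np

# Coefficient matrix and right-hand side of the system (2.8).
A = np.array([[4., 4.],
              [2., -4.]])
b = np.array([5., 1.])
x = np.array([1., 0.25])   # the solution point (x1, x2) = (1, 1/4)

# (2.9): weight the columns of A by x1, x2 and add them up ...
lhs = x[0] * A[:, 0] + x[1] * A[:, 1]

# ... (2.10): this is exactly the matrix-vector product A @ x.
print(np.allclose(lhs, A @ x))  # True
print(np.allclose(A @ x, b))    # True: x solves the system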

2.2 Matrices

Matrices play a central role in linear algebra. They can be used to compactly represent systems of linear equations, but they also represent linear functions (linear mappings) as we will see later in Section 2.7. Before we discuss some of these interesting topics, let us first define what a matrix is and what kind of operations we can do with matrices. We will see more properties of matrices in Chapter 4.

Definition 2.1 (Matrix). With m, n ∈ N, a real-valued (m, n) matrix A is an m·n-tuple of elements aij, i = 1, ..., m, j = 1, ..., n, which is ordered according to a rectangular scheme consisting of m rows and n columns:

A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix},  a_{ij} ∈ R.    (2.11)

By convention (1, n)-matrices are called rows and (m, 1)-matrices are called columns. These special matrices are also called row/column vectors.

R^{m×n} is the set of all real-valued (m, n)-matrices. A ∈ R^{m×n} can be equivalently represented as a ∈ R^{mn} by stacking all n columns of the matrix into a long vector; see Figure 2.4.

(Figure 2.4 By stacking its columns, a matrix A can be represented as a long vector a; re-shape, A ∈ R^{4×2} to a ∈ R^{8}.)

2.2.1 Matrix Addition and Multiplication

The sum of two matrices A ∈ R^{m×n}, B ∈ R^{m×n} is defined as the element-wise sum, i.e.,

A + B := \begin{bmatrix} a_{11}+b_{11} & \cdots & a_{1n}+b_{1n} \\ \vdots & & \vdots \\ a_{m1}+b_{m1} & \cdots & a_{mn}+b_{mn} \end{bmatrix} ∈ R^{m×n}.    (2.12)

For matrices A ∈ R^{m×n}, B ∈ R^{n×k}, the elements cij of the product C = AB ∈ R^{m×k} are computed as

c_{ij} = \sum_{l=1}^{n} a_{il} b_{lj},  i = 1, ..., m,  j = 1, ..., k.    (2.13)

(Note the size of the matrices: C = AB ∈ R^{m×k} can be computed as C = np.einsum('il,lj', A, B).)

This means, to compute element cij we multiply the elements of the ith row of A with the jth column of B and sum them up; there are n columns in A and n rows in B so that we can compute a_{il}b_{lj} for l = 1, ..., n. Later in Section 3.2, we will call this the dot product of the corresponding row and column. (Commonly, the dot product between two vectors a, b is denoted by a⊤b or ⟨a, b⟩.) In cases where we need to be explicit that we are performing multiplication, we use the notation A · B to denote multiplication (explicitly showing "·").

Remark. Matrices can only be multiplied if their "neighboring" dimensions match. For instance, an n × k-matrix A can be multiplied with a k × m-matrix B, but only from the left side:

\underbrace{A}_{n \times k} \, \underbrace{B}_{k \times m} = \underbrace{C}_{n \times m}    (2.14)

The product BA is not defined if m ≠ n since the neighboring dimensions do not match. ♦

Remark. Matrix multiplication is not defined as an element-wise operation on matrix elements, i.e., c_{ij} ≠ a_{ij} b_{ij} (even if the size of A, B was chosen appropriately). This kind of element-wise multiplication often appears in programming languages when we multiply (multi-dimensional) arrays with each other, and is called a Hadamard product. ♦

Example 2.3
For A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix} ∈ R^{2×3}, B = \begin{bmatrix} 0 & 2 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} ∈ R^{3×2}, we obtain

AB = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix} \begin{bmatrix} 0 & 2 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 2 & 5 \end{bmatrix} ∈ R^{2×2},    (2.15)

BA = \begin{bmatrix} 0 & 2 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 6 & 4 & 2 \\ -2 & 0 & 2 \\ 3 & 2 & 1 \end{bmatrix} ∈ R^{3×3}.    (2.16)

From this example, we can already see that matrix multiplication is not commutative, i.e., AB ≠ BA; see also Figure 2.5 for an illustration.

(Figure 2.5 Even if both matrix multiplications AB and BA are defined, the dimensions of the results can be different.)

Definition 2.2 (Identity Matrix). In R^{n×n}, we define the identity matrix

I_n := \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} ∈ R^{n×n}.    (2.17)
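The following NumPy sketch is ours; only the einsum call mirrors the book's margin note above. It reproduces Example 2.3 and contrasts the matrix product of (2.13) with the element-wise Hadamard product:

import numpy as np

A = np.array([[1., 2., 3.],
              [3., 2., 1.]])   # A in R^{2x3}
B = np.array([[0., 2.],
              [1., -1.],
              [0., 1.]])       # B in R^{3x2}

AB = A @ B                     # (2.15): [[2, 3], [2, 5]]
BA = B @ A                     # (2.16): 3x3 matrix, so AB != BA
print(AB)
print(BA)

# Equation (2.13) spelled out with einsum: c_ij = sum_l a_il * b_lj.
C = np.einsum('il,lj', A, B)
print(np.allclose(C, AB))      # True

# The Hadamard product is element-wise and needs equal shapes;
# A * B would fail here, but A * A works:
print(A * A)                   # [[1, 4, 9], [9, 4, 1]]

# Identity matrix (2.17): multiplying by I_n leaves a matrix unchanged.
I3 = np.eye(3)
print(np.allclose(I3 @ BA, BA))  # True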
