MATHEMATICS FOR MACHINE LEARNING

Marc Peter Deisenroth
A. Aldo Faisal
Cheng Soon Ong

This material will be published by Cambridge University Press as Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. This pre-publication version is free to view and download for personal use only. Not for re-distribution, re-sale or use in derivative works. © by M. P. Deisenroth, A. A. Faisal, and C. S. Ong, 2020. https://mml-book.com
Contents

Foreword

Part I Mathematical Foundations

1 Introduction and Motivation
  1.1 Finding Words for Intuitions
  1.2 Two Ways to Read This Book
  1.3 Exercises and Feedback

2 Linear Algebra
  2.1 Systems of Linear Equations
  2.2 Matrices
  2.3 Solving Systems of Linear Equations
  2.4 Vector Spaces
  2.5 Linear Independence
  2.6 Basis and Rank
  2.7 Linear Mappings
  2.8 Affine Spaces
  2.9 Further Reading
  Exercises

3 Analytic Geometry
  3.1 Norms
  3.2 Inner Products
  3.3 Lengths and Distances
  3.4 Angles and Orthogonality
  3.5 Orthonormal Basis
  3.6 Orthogonal Complement
  3.7 Inner Product of Functions
  3.8 Orthogonal Projections
  3.9 Rotations
  3.10 Further Reading
  Exercises

4 Matrix Decompositions
  4.1 Determinant and Trace
  4.2 Eigenvalues and Eigenvectors
  4.3 Cholesky Decomposition
  4.4 Eigendecomposition and Diagonalization
  4.5 Singular Value Decomposition
  4.6 Matrix Approximation
  4.7 Matrix Phylogeny
  4.8 Further Reading
  Exercises

5 Vector Calculus
  5.1 Differentiation of Univariate Functions
  5.2 Partial Differentiation and Gradients
  5.3 Gradients of Vector-Valued Functions
  5.4 Gradients of Matrices
  5.5 Useful Identities for Computing Gradients
  5.6 Backpropagation and Automatic Differentiation
  5.7 Higher-Order Derivatives
  5.8 Linearization and Multivariate Taylor Series
  5.9 Further Reading
  Exercises

6 Probability and Distributions
  6.1 Construction of a Probability Space
  6.2 Discrete and Continuous Probabilities
  6.3 Sum Rule, Product Rule, and Bayes' Theorem
  6.4 Summary Statistics and Independence
  6.5 Gaussian Distribution
  6.6 Conjugacy and the Exponential Family
  6.7 Change of Variables/Inverse Transform
  6.8 Further Reading
  Exercises

7 Continuous Optimization
  7.1 Optimization Using Gradient Descent
  7.2 Constrained Optimization and Lagrange Multipliers
  7.3 Convex Optimization
  7.4 Further Reading
  Exercises

Part II Central Machine Learning Problems

8 When Models Meet Data
  8.1 Data, Models, and Learning
  8.2 Empirical Risk Minimization
  8.3 Parameter Estimation
  8.4 Probabilistic Modeling and Inference
  8.5 Directed Graphical Models
  8.6 Model Selection

9 Linear Regression
  9.1 Problem Formulation
  9.2 Parameter Estimation
  9.3 Bayesian Linear Regression
  9.4 Maximum Likelihood as Orthogonal Projection
  9.5 Further Reading

10 Dimensionality Reduction with Principal Component Analysis
  10.1 Problem Setting
  10.2 Maximum Variance Perspective
  10.3 Projection Perspective
  10.4 Eigenvector Computation and Low-Rank Approximations
  10.5 PCA in High Dimensions
  10.6 Key Steps of PCA in Practice
  10.7 Latent Variable Perspective
  10.8 Further Reading

11 Density Estimation with Gaussian Mixture Models
  11.1 Gaussian Mixture Model
  11.2 Parameter Learning via Maximum Likelihood
  11.3 EM Algorithm
  11.4 Latent-Variable Perspective
  11.5 Further Reading

12 Classification with Support Vector Machines
  12.1 Separating Hyperplanes
  12.2 Primal Support Vector Machine
  12.3 Dual Support Vector Machine
  12.4 Kernels
  12.5 Numerical Solution
  12.6 Further Reading

References
Index
Foreword
Machine learning is the latest in a long line of attempts to distill human knowledge and reasoning into a form that is suitable for constructing machines and engineering automated systems. As machine learning becomes more ubiquitous and its software packages become easier to use, it is natural and desirable that the low-level technical details are abstracted away and hidden from the practitioner. However, this brings with it the danger that a practitioner becomes unaware of the design decisions and, hence, the limits of machine learning algorithms.

The enthusiastic practitioner who is interested to learn more about the magic behind successful machine learning algorithms currently faces a daunting set of pre-requisite knowledge:

- Programming languages and data analysis tools
- Large-scale computation and the associated frameworks
- Mathematics and statistics and how machine learning builds on it

At universities, introductory courses on machine learning tend to spend early parts of the course covering some of these pre-requisites. For historical reasons, courses in machine learning tend to be taught in the computer science department, where students are often trained in the first two areas of knowledge, but not so much in mathematics and statistics.

Current machine learning textbooks primarily focus on machine learning algorithms and methodologies and assume that the reader is competent in mathematics and statistics. Therefore, these books only spend one or two chapters on background mathematics, either at the beginning of the book or as appendices. We have found many people who want to delve into the foundations of basic machine learning methods who struggle with the mathematical knowledge required to read a machine learning textbook. Having taught undergraduate and graduate courses at universities, we find that the gap between high school mathematics and the mathematics level required to read a standard machine learning textbook is too big for many people.

This book brings the mathematical foundations of basic machine learning concepts to the fore and collects the information in a single place so that this skills gap is narrowed or even closed.
Why Another Book on Machine Learning?
Machine learning builds upon the language of mathematics to express concepts that seem intuitively obvious but that are surprisingly difficult to formalize. Once formalized properly, we can gain insights into the task we want to solve. One common complaint of students of mathematics around the globe is that the topics covered seem to have little relevance to practical problems. We believe that machine learning is an obvious and direct motivation for people to learn mathematics.

"Math is linked in the popular mind with phobia and anxiety. You'd think we're discussing spiders." (Strogatz, 2014, page 281)

This book is intended to be a guidebook to the vast mathematical literature that forms the foundations of modern machine learning. We motivate the need for mathematical concepts by directly pointing out their usefulness in the context of fundamental machine learning problems. In the interest of keeping the book short, many details and more advanced concepts have been left out. Equipped with the basic concepts presented here, and how they fit into the larger context of machine learning, the reader can find numerous resources for further study, which we provide at the end of the respective chapters. For readers with a mathematical background, this book provides a brief but precisely stated glimpse of machine learning. In contrast to other books that focus on methods and models of machine learning (MacKay, 2003; Bishop, 2006; Alpaydin, 2010; Barber, 2012; Murphy, 2012; Shalev-Shwartz and Ben-David, 2014; Rogers and Girolami, 2016) or programmatic aspects of machine learning (Müller and Guido, 2016; Raschka and Mirjalili, 2017; Chollet and Allaire, 2018), we provide only four representative examples of machine learning algorithms. Instead, we focus on the mathematical concepts behind the models themselves. We hope that readers will be able to gain a deeper understanding of the basic questions in machine learning and connect practical questions arising from the use of machine learning with fundamental choices in the mathematical model.

We do not aim to write a classical machine learning book. Instead, our intention is to provide the mathematical background, applied to four central machine learning problems, to make it easier to read other machine learning textbooks.
Who Is the Target Audience?
As applications of machine learning become widespread in society, we believe that everybody should have some understanding of its underlying principles. This book is written in an academic mathematical style, which enables us to be precise about the concepts behind machine learning. We encourage readers unfamiliar with this seemingly terse style to persevere and to keep the goals of each topic in mind. We sprinkle comments and remarks throughout the text, in the hope that it provides useful guidance with respect to the big picture.

The book assumes the reader to have mathematical knowledge commonly covered in high school mathematics and physics. For example, the reader should have seen derivatives and integrals before, and geometric vectors in two or three dimensions. Starting from there, we generalize these concepts. Therefore, the target audience of the book includes undergraduate university students, evening learners, and learners participating in online machine learning courses.
In analogy to music, there are three types of interaction that people have with machine learning:

Astute Listener The democratization of machine learning by the provision of open-source software, online tutorials, and cloud-based tools allows users to not worry about the specifics of pipelines. Users can focus on extracting insights from data using off-the-shelf tools. This enables non-tech-savvy domain experts to benefit from machine learning. This is similar to listening to music; the user is able to choose and discern between different types of machine learning, and benefits from it. More experienced users are like music critics, asking important questions about the application of machine learning in society such as ethics, fairness, and privacy of the individual. We hope that this book provides a foundation for thinking about the certification and risk management of machine learning systems, and allows them to use their domain expertise to build better machine learning systems.

Experienced Artist Skilled practitioners of machine learning can plug and play different tools and libraries into an analysis pipeline. The stereotypical practitioner would be a data scientist or engineer who understands machine learning interfaces and their use cases, and is able to perform wonderful feats of prediction from data. This is similar to a virtuoso playing music, where highly skilled practitioners can bring existing instruments to life and bring enjoyment to their audience. Using the mathematics presented here as a primer, practitioners would be able to understand the benefits and limits of their favorite method, and to extend and generalize existing machine learning algorithms. We hope that this book provides the impetus for more rigorous and principled development of machine learning methods.

Fledgling Composer As machine learning is applied to new domains, developers of machine learning need to develop new methods and extend existing algorithms. They are often researchers who need to understand the mathematical basis of machine learning and uncover relationships between different tasks. This is similar to composers of music who, within the rules and structure of musical theory, create new and amazing pieces. We hope this book provides a high-level overview of other technical books for people who want to become composers of machine learning. There is a great need in society for new researchers who are able to propose and explore novel approaches for attacking the many challenges of learning from data.
Acknowledgments
We are grateful to many people who looked at early drafts of the book and suffered through painful expositions of concepts. We tried to implement their ideas that we did not vehemently disagree with. We would like to especially acknowledge Christfried Webers for his careful reading of many parts of the book, and his detailed suggestions on structure and presentation. Many friends and colleagues have also been kind enough to provide their time and energy on different versions of each chapter. We have been lucky to benefit from the generosity of the online community, who have suggested improvements via github.com, which greatly improved the book.

The following people have found bugs, proposed clarifications and suggested relevant literature, either via github.com or personal communication. Their names are sorted alphabetically.
Abdul-Ganiy Usman, Adam Gaier, Adele Jackson, Aditya Menon, Alasdair Tran, Aleksandar Krnjaic, Alexander Makrigiorgos, Alfredo Canziani, Ali Shafti, Amr Khalifa, Andrew Tanggara, Angus Gruen, Antal A. Buss, Antoine Toisoul Le Cann, Areg Sarvazyan, Artem Artemev, Artyom Stepanov, Bill Kromydas, Bob Williamson, Boon Ping Lim, Chao Qu, Cheng Li, Chris Sherlock, Christopher Gray, Daniel McNamara, Daniel Wood, Darren Siegel, David Johnston, Dawei Chen, Ellen Broad, Fengkuangtian Zhu, Fiona Condon, Georgios Theodorou, He Xin, Irene Raissa Kameni, Jakub Nabaglo, James Hensman, Jamie Liu, Jean Kaddour, Jean-Paul Ebejer, Jerry Qiang, Jitesh Sindhare, John Lloyd, Jonas Ngnawe, Jon Martin, Justin Hsi, Kai Arulkumaran, Kamil Dreczkowski, Lily Wang, Lionel Tondji Ngoupeyou, Lydia Knufing, Mahmoud Aslan, Mark Hartenstein, Mark van der Wilk, Markus Hegland, Martin Hewing, Matthew Alger, Matthew Lee, Maximus McCann, Mengyan Zhang, Michael Bennett, Michael Pedersen, Minjeong Shin, Mohammad Malekzadeh, Naveen Kumar, Nico Montali, Oscar Armas, Patrick Henriksen, Patrick Wieschollek, Pattarawat Chormai, Paul Kelly, Petros Christodoulou, Piotr Januszewski, Pranav Subramani, Quyu Kong, Ragib Zaman, Rui Zhang, Ryan-Rhys Griffiths, Salomon Kabongo, Samuel Ogunmola, Sandeep Mavadia, Sarvesh Nikumbh, Sebastian Raschka, Senanayak Sesh Kumar Karri, Seung-Heon Baek, Shahbaz Chaudhary, Shakir Mohamed, Shawn Berry, Sheikh Abdul Raheem Ali, Sheng Xue, Sridhar Thiagarajan, Syed Nouman Hasany, Szymon Brych, Thomas Buhler, Timur Sharapov, Tom Melamed, Vincent Adam, Vincent Dutordoir, Vu Minh, Wasim Aftab, Wen Zhi, Wojciech Stokowiec, Xiaonan Chong, Xiaowei Zhang, Yazhou Hao, Yicheng Luo, Young Lee, Yu Lu, Yun Cheng, Yuxiao Huang, Zac Cranko, Zijian Cao, Zoe Nolan.
Contributors through github, whose real names were not listed on their github profile, are:

SamDataMad, bumptiousmonkey, idoamihai, deepakiim, insad, HorizonP, cs-maillist, kudo23, empet, victorBigand, 17SKYE, jessjing1995.

We are also very grateful to Parameswaran Raman and the many anonymous reviewers, organized by Cambridge University Press, who read one or more chapters of earlier versions of the manuscript, and provided constructive criticism that led to considerable improvements. A special mention goes to Dinesh Singh Negi, our LaTeX support, for detailed and prompt advice about LaTeX-related issues. Last but not least, we are very grateful to our editor Lauren Cowles, who has been patiently guiding us through the gestation process of this book.
Table of Symbols

Symbol                         Typical meaning
a, b, c, α, β, γ               Scalars are lowercase
x, y, z                        Vectors are bold lowercase
A, B, C                        Matrices are bold uppercase
x^⊤, A^⊤                       Transpose of a vector or matrix
A^{-1}                         Inverse of a matrix
⟨x, y⟩                         Inner product of x and y
x^⊤ y                          Dot product of x and y
B = (b_1, b_2, b_3)            (Ordered) tuple
B = [b_1, b_2, b_3]            Matrix of column vectors stacked horizontally
B = {b_1, b_2, b_3}            Set of vectors (unordered)
Z, N                           Integers and natural numbers, respectively
R, C                           Real and complex numbers, respectively
R^n                            n-dimensional vector space of real numbers
∀x                             Universal quantifier: for all x
∃x                             Existential quantifier: there exists x
a := b                         a is defined as b
a =: b                         b is defined as a
a ∝ b                          a is proportional to b, i.e., a = constant · b
g ∘ f                          Function composition: "g after f"
⇐⇒                             If and only if
=⇒                             Implies
A, C                           Sets
a ∈ A                          a is an element of the set A
∅                              Empty set
D                              Number of dimensions; indexed by d = 1, ..., D
N                              Number of data points; indexed by n = 1, ..., N
I_m                            Identity matrix of size m × m
0_{m,n}                        Matrix of zeros of size m × n
1_{m,n}                        Matrix of ones of size m × n
e_i                            Standard/canonical vector (where i is the component that is 1)
dim                            Dimensionality of vector space
rk(A)                          Rank of matrix A
Im(Φ)                          Image of linear mapping Φ
ker(Φ)                         Kernel (null space) of a linear mapping Φ
span[b_1]                      Span (generating set) of b_1
tr(A)                          Trace of A
det(A)                         Determinant of A
|·|                            Absolute value or determinant (depending on context)
‖·‖                            Norm; Euclidean unless specified
λ                              Eigenvalue or Lagrange multiplier
E_λ                            Eigenspace corresponding to eigenvalue λ
θ                              Parameter vector
∂f/∂x                          Partial derivative of f with respect to x
df/dx                          Total derivative of f with respect to x
∇                              Gradient
L                              Lagrangian
L                              Negative log-likelihood
(n choose k)                   Binomial coefficient, n choose k
V_X[x]                         Variance of x with respect to the random variable X
E_X[x]                         Expectation of x with respect to the random variable X
Cov_{X,Y}[x, y]                Covariance between x and y
X ⊥⊥ Y | Z                     X is conditionally independent of Y given Z
X ∼ p                          Random variable X is distributed according to p
N(µ, Σ)                        Gaussian distribution with mean µ and covariance Σ
Ber(µ)                         Bernoulli distribution with parameter µ
Bin(N, µ)                      Binomial distribution with parameters N, µ
Beta(α, β)                     Beta distribution with parameters α, β
Table of Abbreviations and Acronyms

Acronym     Meaning
e.g.        Exempli gratia (Latin: for example)
GMM         Gaussian mixture model
i.e.        Id est (Latin: this means)
i.i.d.      Independent, identically distributed
MAP         Maximum a posteriori
MLE         Maximum likelihood estimation/estimator
ONB         Orthonormal basis
PCA         Principal component analysis
PPCA        Probabilistic principal component analysis
REF         Row-echelon form
SPD         Symmetric, positive definite
SVM         Support vector machine
Part I

Mathematical Foundations
1 Introduction and Motivation
Machine learning is about designing algorithms that automatically extract valuable information from data. The emphasis here is on "automatic", i.e., machine learning is concerned about general-purpose methodologies that can be applied to many datasets, while producing something that is meaningful. There are three concepts that are at the core of machine learning: data, a model, and learning.

Since machine learning is inherently data driven, data is at the core of machine learning. The goal of machine learning is to design general-purpose methodologies to extract valuable patterns from data, ideally without much domain-specific expertise. For example, given a large corpus of documents (e.g., books in many libraries), machine learning methods can be used to automatically find relevant topics that are shared across documents (Hoffman et al., 2010). To achieve this goal, we design models that are typically related to the process that generates data, similar to the dataset we are given. For example, in a regression setting, the model would describe a function that maps inputs to real-valued outputs. To paraphrase Mitchell (1997): A model is said to learn from data if its performance on a given task improves after the data is taken into account. The goal is to find good models that generalize well to yet unseen data, which we may care about in the future. Learning can be understood as a way to automatically find patterns and structure in data by optimizing the parameters of the model.

While machine learning has seen many success stories, and software is readily available to design and train rich and flexible machine learning systems, we believe that the mathematical foundations of machine learning are important in order to understand fundamental principles upon which more complicated machine learning systems are built. Understanding these principles can facilitate creating new machine learning solutions, understanding and debugging existing approaches, and learning about the inherent assumptions and limitations of the methodologies we are working with.
1.1 Finding Words for Intuitions
A challenge we face regularly in machine learning is that concepts and words are slippery, and a particular component of the machine learning system can be abstracted to different mathematical concepts. For example, the word "algorithm" is used in at least two different senses in the context of machine learning. In the first sense, we use the phrase "machine learning algorithm" to mean a system that makes predictions based on input data. We refer to these algorithms as predictors. In the second sense, we use the exact same phrase "machine learning algorithm" to mean a system that adapts some internal parameters of the predictor so that it performs well on future unseen input data. Here we refer to this adaptation as training a system.

This book will not resolve the issue of ambiguity, but we want to highlight upfront that, depending on the context, the same expressions can mean different things. However, we attempt to make the context sufficiently clear to reduce the level of ambiguity.
The first part of this book introduces the mathematical concepts and foundations needed to talk about the three main components of a machine learning system: data, models, and learning. We will briefly outline these components here, and we will revisit them again in Chapter 8 once we have discussed the necessary mathematical concepts.

While not all data is numerical, it is often useful to consider data in a number format. In this book, we assume that data has already been appropriately converted into a numerical representation suitable for reading into a computer program. Therefore, we think of data as vectors. As another illustration of how subtle words are, there are (at least) three different ways to think about vectors: a vector as an array of numbers (a computer science view), a vector as an arrow with a direction and magnitude (a physics view), and a vector as an object that obeys addition and scaling (a mathematical view).

A model is typically used to describe a process for generating data, similar to the dataset at hand. Therefore, good models can also be thought of as simplified versions of the real (unknown) data-generating process, capturing aspects that are relevant for modeling the data and extracting hidden patterns from it. A good model can then be used to predict what would happen in the real world without performing real-world experiments.

We now come to the crux of the matter, the learning component of machine learning. Assume we are given a dataset and a suitable model. Training the model means to use the data available to optimize some parameters of the model with respect to a utility function that evaluates how well the model predicts the training data. Most training methods can be thought of as an approach analogous to climbing a hill to reach its peak. In this analogy, the peak of the hill corresponds to a maximum of some desired performance measure. However, in practice, we are interested in the model to perform well on unseen data. Performing well on data that we have already seen (training data) may only mean that we found a good way to memorize the data. However, this may not generalize well to unseen data, and, in practical applications, we often need to expose our machine learning system to situations that it has not encountered before.
Let us summarize the main concepts of machine learning that we cover in this book:

- We represent data as vectors.
- We choose an appropriate model, either using the probabilistic or optimization view.
- We learn from available data by using numerical optimization methods with the aim that the model performs well on data not used for training.
1.2 Two Ways to Read This Book
We can consider two strategies for understanding the mathematics for machine learning:

Bottom-up: Building up the concepts from foundational to more advanced. This is often the preferred approach in more technical fields, such as mathematics. This strategy has the advantage that the reader at all times is able to rely on their previously learned concepts. Unfortunately, for a practitioner many of the foundational concepts are not particularly interesting by themselves, and the lack of motivation means that most foundational definitions are quickly forgotten.

Top-down: Drilling down from practical needs to more basic requirements. This goal-driven approach has the advantage that the readers know at all times why they need to work on a particular concept, and there is a clear path of required knowledge. The downside of this strategy is that the knowledge is built on potentially shaky foundations, and the readers have to remember a set of words that they do not have any way of understanding.
We decided to write this book in a modular way to separate foundational (mathematical) concepts from applications so that this book can be read in both ways. The book is split into two parts, where Part I lays the mathematical foundations and Part II applies the concepts from Part I to a set of fundamental machine learning problems, which form four pillars of machine learning as illustrated in Figure 1.1: regression, dimensionality reduction, density estimation, and classification. Chapters in Part I mostly build upon the previous ones, but it is possible to skip a chapter and work backward if necessary. Chapters in Part II are only loosely coupled and can be read in any order. There are many pointers forward and backward between the two parts of the book to link mathematical concepts with machine learning algorithms.

[Figure 1.1: The foundations and four pillars of machine learning. Pillars: regression, dimensionality reduction, density estimation, classification; foundations: linear algebra, analytic geometry, matrix decomposition, vector calculus, probability & distributions, optimization.]

Of course there are more than two ways to read this book. Most readers learn using a combination of top-down and bottom-up approaches, sometimes building up basic mathematical skills before attempting more complex concepts, but also choosing topics based on applications of machine learning.
Part I Is about Mathematics
The four pillars of machine learning we cover in this book (see Figure 1.1) require a solid mathematical foundation, which is laid out in Part I.

We represent numerical data as vectors and represent a table of such data as a matrix. The study of vectors and matrices is called linear algebra, which we introduce in Chapter 2. The collection of vectors as a matrix is also described there.

Given two vectors representing two objects in the real world, we want to make statements about their similarity. The idea is that vectors that are similar should be predicted to have similar outputs by our machine learning algorithm (our predictor). To formalize the idea of similarity between vectors, we need to introduce operations that take two vectors as input and return a numerical value representing their similarity. The construction of similarity and distances is central to analytic geometry and is discussed in Chapter 3.

In Chapter 4, we introduce some fundamental concepts about matrices and matrix decomposition. Some operations on matrices are extremely useful in machine learning, and they allow for an intuitive interpretation of the data and more efficient learning.

We often consider data to be noisy observations of some true underlying signal. We hope that by applying machine learning we can identify the signal from the noise. This requires us to have a language for quantifying what "noise" means. We often would also like to have predictors that allow us to express some sort of uncertainty, e.g., to quantify the confidence we have about the value of the prediction at a particular test data point. Quantification of uncertainty is the realm of probability theory and is covered in Chapter 6.

To train machine learning models, we typically find parameters that maximize some performance measure. Many optimization techniques require the concept of a gradient, which tells us the direction in which to search for a solution. Chapter 5 is about vector calculus and details the concept of gradients, which we subsequently use in Chapter 7, where we talk about optimization to find maxima/minima of functions.
Part II Is about Machine Learning
The second part of the book introduces the four pillars of machine learning as shown in Figure 1.1. We illustrate how the mathematical concepts introduced in the first part of the book are the foundation for each pillar. Broadly speaking, chapters are ordered by difficulty (in ascending order).

In Chapter 8, we restate the three components of machine learning (data, models, and parameter estimation) in a mathematical fashion. In addition, we provide some guidelines for building experimental set-ups that guard against overly optimistic evaluations of machine learning systems. Recall that the goal is to build a predictor that performs well on unseen data.

In Chapter 9, we will have a close look at linear regression, where our objective is to find functions that map inputs x ∈ R^D to corresponding observed function values y ∈ R, which we can interpret as the labels of their respective inputs. We will discuss classical model fitting (parameter estimation) via maximum likelihood and maximum a posteriori estimation, as well as Bayesian linear regression, where we integrate the parameters out instead of optimizing them.

Chapter 10 focuses on dimensionality reduction, the second pillar in Figure 1.1, using principal component analysis. The key objective of dimensionality reduction is to find a compact, lower-dimensional representation of high-dimensional data x ∈ R^D, which is often easier to analyze than the original data. Unlike regression, dimensionality reduction is only concerned about modeling the data; there are no labels associated with a data point x.

In Chapter 11, we will move to our third pillar: density estimation. The objective of density estimation is to find a probability distribution that describes a given dataset. We will focus on Gaussian mixture models for this purpose, and we will discuss an iterative scheme to find the parameters of this model. As in dimensionality reduction, there are no labels associated with the data points x ∈ R^D. However, we do not seek a low-dimensional representation of the data. Instead, we are interested in a density model that describes the data.

Chapter 12 concludes the book with an in-depth discussion of the fourth pillar: classification. We will discuss classification in the context of support vector machines. Similar to regression (Chapter 9), we have inputs x and corresponding labels y. However, unlike regression, where the labels were real-valued, the labels in classification are integers, which requires special care.
1.3 Exercises and Feedback
We provide some exercises in Part I, which can be done mostly by pen and paper. For Part II, we provide programming tutorials (jupyter notebooks) to explore some properties of the machine learning algorithms we discuss in this book.

We appreciate that Cambridge University Press strongly supports our aim to democratize education and learning by making this book freely available for download at

https://mml-book.com

where tutorials, errata, and additional materials can be found. Mistakes can be reported and feedback provided using the preceding URL.
2 Linear Algebra
When formalizing intuitive concepts, a common approach is to construct a set of objects (symbols) and a set of rules to manipulate these objects. This is known as an algebra. Linear algebra is the study of vectors and certain rules to manipulate vectors. The vectors many of us know from school are called "geometric vectors", which are usually denoted by a small arrow above the letter, e.g., x⃗ and y⃗. In this book, we discuss more general concepts of vectors and use a bold letter to represent them, e.g., x and y.

In general, vectors are special objects that can be added together and multiplied by scalars to produce another object of the same kind. From an abstract mathematical viewpoint, any object that satisfies these two properties can be considered a vector. Here are some examples of such vector objects:

1. Geometric vectors. This example of a vector may be familiar from high school mathematics and physics. Geometric vectors (see Figure 2.1(a)) are directed segments, which can be drawn (at least in two dimensions). Two geometric vectors x⃗, y⃗ can be added, such that x⃗ + y⃗ = z⃗ is another geometric vector. Furthermore, multiplication by a scalar λx⃗, λ ∈ R, is also a geometric vector. In fact, it is the original vector scaled by λ. Therefore, geometric vectors are instances of the vector concepts introduced previously. Interpreting vectors as geometric vectors enables us to use our intuitions about direction and magnitude to reason about mathematical operations.

[Figure 2.1: Different types of vectors. Vectors can be surprising objects, including (a) geometric vectors and (b) polynomials.]

2. Polynomials are also vectors; see Figure 2.1(b): Two polynomials can be added together, which results in another polynomial; and they can be multiplied by a scalar λ ∈ R, and the result is a polynomial as well. Therefore, polynomials are (rather unusual) instances of vectors. Note that polynomials are very different from geometric vectors. While geometric vectors are concrete "drawings", polynomials are abstract concepts. However, they are both vectors in the sense previously described.
3. Audio signals are vectors. Audio signals are represented as a series of numbers. We can add audio signals together, and their sum is a new audio signal. If we scale an audio signal, we also obtain an audio signal. Therefore, audio signals are a type of vector, too.
4. Elements of R^n (tuples of n real numbers) are vectors. R^n is more abstract than polynomials, and it is the concept we focus on in this book. For instance,

\[
a = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \in \mathbb{R}^3 \tag{2.1}
\]

is an example of a triplet of numbers. Adding two vectors a, b ∈ R^n component-wise results in another vector: a + b = c ∈ R^n. Moreover, multiplying a ∈ R^n by λ ∈ R results in a scaled vector λa ∈ R^n. Considering vectors as elements of R^n has an additional benefit that it loosely corresponds to arrays of real numbers on a computer. Many programming languages support array operations, which allow for convenient implementation of algorithms that involve vector operations. (Be careful to check whether array operations actually perform vector operations when implementing on a computer; see the sketch below.)
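The correspondence between vectors in R^n and arrays can be made concrete in code. The following is a minimal NumPy sketch (our illustration; the book's own programming tutorials accompany Part II) showing the two defining vector operations, and the kind of array operation that is not one of them:

    import numpy as np

    # The vector from (2.1), represented as a NumPy array.
    a = np.array([1.0, 2.0, 3.0])
    b = np.array([4.0, 5.0, 6.0])
    lam = 2.0

    # The two defining vector-space operations in R^3:
    print(a + b)    # [5. 7. 9.]   vector addition, again an element of R^3
    print(lam * a)  # [2. 4. 6.]   scalar multiplication, again in R^3

    # Caution: arrays support more operations than the vector-space
    # structure provides. Element-wise multiplication is well-defined
    # for arrays, but it is not a vector-space operation on R^3.
    print(a * b)    # [ 4. 10. 18.]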
Linear algebra focuses on the similarities between these vector concepts. We can add them together and multiply them by scalars. We will largely focus on vectors in R^n since most algorithms in linear algebra are formulated in R^n. We will see in Chapter 8 that we often consider data to be represented as vectors in R^n. In this book, we will focus on finite-dimensional vector spaces, in which case there is a 1:1 correspondence between any kind of vector and R^n. When it is convenient, we will use intuitions about geometric vectors and consider array-based algorithms.

One major idea in mathematics is the idea of "closure". This is the question: What is the set of all things that can result from my proposed operations? In the case of vectors: What is the set of vectors that can result by starting with a small set of vectors, and adding them to each other and scaling them? This results in a vector space (Section 2.4). The concept of a vector space and its properties underlie much of machine learning. The concepts introduced in this chapter are summarized in Figure 2.2.

This chapter is mostly based on the lecture notes and books by Drumm and Weil (2001), Strang (2003), Hogben (2013), Liesen and Mehrmann (2015), as well as Pavel Grinfeld's Linear Algebra series (http://tinyurl.com/nahclwm). Other excellent resources are Gilbert Strang's Linear Algebra course at MIT (http://tinyurl.com/29p5q8j) and the Linear Algebra Series by 3Blue1Brown (https://tinyurl.com/h5g4kps).
Linear algebra plays an important role in machine learning and general mathematics. The concepts introduced in this chapter are further expanded to include the idea of geometry in Chapter 3. In Chapter 5, we will discuss vector calculus, where a principled knowledge of matrix operations is essential. In Chapter 10, we will use projections (to be introduced in Section 3.8) for dimensionality reduction with principal component analysis (PCA). In Chapter 9, we will discuss linear regression, where linear algebra plays a central role for solving least-squares problems.
2.1 Systems of Linear Equations
Systems of linear equations play a central part of linear algebra. Many problems can be formulated as systems of linear equations, and linear algebra gives us the tools for solving them.
Example 2.1
A company produces products N_1, ..., N_n for which resources R_1, ..., R_m are required. To produce a unit of product N_j, a_{ij} units of resource R_i are needed, where i = 1, ..., m and j = 1, ..., n.
The objective is to find an optimal production plan, i.e., a plan of how many units x_j of product N_j should be produced if a total of b_i units of resource R_i are available and (ideally) no resources are left over.
If we produce x_1, ..., x_n units of the corresponding products, we need a total of

\[
a_{i1}x_1 + \cdots + a_{in}x_n \tag{2.2}
\]

many units of resource R_i. An optimal production plan (x_1, ..., x_n) ∈ R^n, therefore, has to satisfy the following system of equations:

\[
\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= b_1\\
&\;\;\vdots\\
a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m\,,
\end{aligned} \tag{2.3}
\]

where a_{ij} ∈ R and b_i ∈ R.

[Figure 2.2: A mind map of the concepts introduced in this chapter, along with where they are used in other parts of the book.]

Equation (2.3) is the general form of a system of linear equations, and x_1, ..., x_n are the unknowns of this system. Every n-tuple (x_1, ..., x_n) ∈ R^n that satisfies (2.3) is a solution of the linear equation system.

Example 2.2
The system of linear equations

\[
\begin{aligned}
x_1 + x_2 + x_3 &= 3 \quad (1)\\
x_1 - x_2 + 2x_3 &= 2 \quad (2)\\
2x_1 + 3x_3 &= 1 \quad (3)
\end{aligned} \tag{2.4}
\]

has no solution: Adding the first two equations yields 2x_1 + 3x_3 = 5, which contradicts the third equation (3).

Let us have a look at the system of linear equations

\[
\begin{aligned}
x_1 + x_2 + x_3 &= 3 \quad (1)\\
x_1 - x_2 + 2x_3 &= 2 \quad (2)\\
x_2 + x_3 &= 2 \quad (3)\,.
\end{aligned} \tag{2.5}
\]

From the first and third equation, it follows that x_1 = 1. From (1)+(2), we get 2x_1 + 3x_3 = 5, i.e., x_3 = 1. From (3), we then get that x_2 = 1. Therefore, (1, 1, 1) is the only possible and unique solution (verify that (1, 1, 1) is a solution by plugging in).

As a third example, we consider

\[
\begin{aligned}
x_1 + x_2 + x_3 &= 3 \quad (1)\\
x_1 - x_2 + 2x_3 &= 2 \quad (2)\\
2x_1 + 3x_3 &= 5 \quad (3)\,.
\end{aligned} \tag{2.6}
\]

Since (1)+(2)=(3), we can omit the third equation (redundancy). From (1) and (2), we get 2x_1 = 5 - 3x_3 and 2x_2 = 1 + x_3. We define x_3 = a ∈ R as a free variable, such that any triplet

\[
\left(\tfrac{5}{2} - \tfrac{3}{2}a,\ \tfrac{1}{2} + \tfrac{1}{2}a,\ a\right),\quad a \in \mathbb{R} \tag{2.7}
\]

is a solution of the system of linear equations, i.e., we obtain a solution set that contains infinitely many solutions.

In general, for a real-valued system of linear equations we obtain either no, exactly one, or infinitely many solutions. Linear regression (Chapter 9) solves a version of Example 2.1 when we cannot solve the system of linear equations.

Remark (Geometric Interpretation of Systems of Linear Equations). In a system of linear equations with two variables x_1, x_2, each linear equation defines a line on the x_1x_2-plane. Since a solution to a system of linear equations must satisfy all equations simultaneously, the solution set is the intersection of these lines. This intersection set can be a line (if the linear equations describe the same line), a point, or empty (when the lines are parallel). An illustration is given in Figure 2.3 for the system

\[
\begin{aligned}
4x_1 + 4x_2 &= 5\\
2x_1 - 4x_2 &= 1\,,
\end{aligned} \tag{2.8}
\]

where the solution space is the point (x_1, x_2) = (1, 1/4). Similarly, for three variables, each linear equation determines a plane in three-dimensional space. When we intersect these planes, i.e., satisfy all linear equations at the same time, we can obtain a solution set that is a plane, a line, a point or empty (when the planes have no common intersection). ♦

[Figure 2.3: The solution space of a system of two linear equations with two variables can be geometrically interpreted as the intersection of two lines. Every linear equation represents a line.]

For a systematic approach to solving systems of linear equations, we will introduce a useful compact notation. We collect the coefficients a_{ij} into vectors and collect the vectors into matrices. In other words, we write the system from (2.3) in the following form:

\[
x_1 \begin{bmatrix} a_{11}\\ \vdots\\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12}\\ \vdots\\ a_{m2} \end{bmatrix} + \cdots + x_n \begin{bmatrix} a_{1n}\\ \vdots\\ a_{mn} \end{bmatrix} = \begin{bmatrix} b_1\\ \vdots\\ b_m \end{bmatrix} \tag{2.9}
\]

\[
\Longleftrightarrow \quad \begin{bmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1\\ \vdots\\ x_n \end{bmatrix} = \begin{bmatrix} b_1\\ \vdots\\ b_m \end{bmatrix}\,. \tag{2.10}
\]

In the following, we will have a close look at these matrices and define computation rules. We will return to solving linear equations in Section 2.3.
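As a quick numerical companion to Example 2.2 (a sketch of ours, not the book's code, assuming NumPy is available): the system (2.5) has an invertible coefficient matrix, so np.linalg.solve recovers the unique solution, and plugging it back in verifies it.

    import numpy as np

    # System (2.5): coefficient matrix A and right-hand side b.
    A = np.array([[1.0,  1.0, 1.0],
                  [1.0, -1.0, 2.0],
                  [0.0,  1.0, 1.0]])
    b = np.array([3.0, 2.0, 2.0])

    x = np.linalg.solve(A, b)   # unique solution since A is invertible
    print(x)                    # [1. 1. 1.]

    # "Verify by plugging in": A @ x reproduces b up to floating-point error.
    assert np.allclose(A @ x, b)

    # For systems such as (2.4) or (2.6) the coefficient matrix is singular
    # (its third row is the sum of the first two), so np.linalg.solve raises
    # LinAlgError; np.linalg.lstsq finds a least-squares solution instead,
    # which is the theme of linear regression in Chapter 9.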
2.2 Matrices

Matrices play a central role in linear algebra. They can be used to compactly represent systems of linear equations, but they also represent linear functions (linear mappings) as we will see later in Section 2.7. Before we discuss some of these interesting topics, let us first define what a matrix is and what kind of operations we can do with matrices. We will see more properties of matrices in Chapter 4.

Definition 2.1 (Matrix). With m, n ∈ N, a real-valued (m, n) matrix A is an m·n-tuple of elements a_{ij}, i = 1, ..., m, j = 1, ..., n, which is ordered according to a rectangular scheme consisting of m rows and n columns:

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix},\quad a_{ij} \in \mathbb{R}\,. \tag{2.11}
\]

By convention (1, n)-matrices are called rows and (m, 1)-matrices are called columns. These special matrices are also called row/column vectors.

R^{m×n} is the set of all real-valued (m, n)-matrices. A ∈ R^{m×n} can be equivalently represented as a ∈ R^{mn} by stacking all n columns of the matrix into a long vector; see Figure 2.4.

[Figure 2.4: By stacking its columns, a matrix A can be represented as a long vector a; e.g., A ∈ R^{4×2} is re-shaped into a ∈ R^8.]

2.2.1 Matrix Addition and Multiplication

The sum of two matrices A ∈ R^{m×n}, B ∈ R^{m×n} is defined as the element-wise sum, i.e.,

\[
A + B := \begin{bmatrix}
a_{11} + b_{11} & \cdots & a_{1n} + b_{1n}\\
\vdots & & \vdots\\
a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn}
\end{bmatrix} \in \mathbb{R}^{m\times n}\,. \tag{2.12}
\]

For matrices A ∈ R^{m×n}, B ∈ R^{n×k}, the elements c_{ij} of the product C = AB ∈ R^{m×k} are computed as

\[
c_{ij} = \sum_{l=1}^{n} a_{il}\, b_{lj}\,,\quad i = 1, \dots, m,\quad j = 1, \dots, k\,. \tag{2.13}
\]

This means, to compute element c_{ij} we multiply the elements of the ith row of A with the jth column of B and sum them up. (Note the size of the matrices: there are n columns in A and n rows in B so that we can compute a_{il} b_{lj} for l = 1, ..., n. In NumPy, C can be computed as C = np.einsum('il,lj', A, B).) Later in Section 3.2, we will call this the dot product of the corresponding row and column. (Commonly, the dot product between two vectors a, b is denoted by a^⊤b or ⟨a, b⟩.) In cases where we need to be explicit that we are performing multiplication, we use the notation A · B to denote multiplication (explicitly showing "·").

Remark. Matrices can only be multiplied if their "neighboring" dimensions match. For instance, an n × k-matrix A can be multiplied with a k × m-matrix B, but only from the left side:

\[
\underbrace{A}_{n\times k}\;\underbrace{B}_{k\times m} = \underbrace{C}_{n\times m} \tag{2.14}
\]

The product BA is not defined if m ≠ n since the neighboring dimensions do not match. ♦

Remark. Matrix multiplication is not defined as an element-wise operation on matrix elements, i.e., c_{ij} ≠ a_{ij} b_{ij} (even if the size of A, B was chosen appropriately). This kind of element-wise multiplication often appears in programming languages when we multiply (multi-dimensional) arrays with each other, and is called a Hadamard product. ♦

Example 2.3
For

\[
A = \begin{bmatrix} 1 & 2 & 3\\ 3 & 2 & 1 \end{bmatrix} \in \mathbb{R}^{2\times 3}\,,\quad
B = \begin{bmatrix} 0 & 2\\ 1 & -1\\ 0 & 1 \end{bmatrix} \in \mathbb{R}^{3\times 2}\,,
\]

we obtain

\[
AB = \begin{bmatrix} 1 & 2 & 3\\ 3 & 2 & 1 \end{bmatrix} \begin{bmatrix} 0 & 2\\ 1 & -1\\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 3\\ 2 & 5 \end{bmatrix} \in \mathbb{R}^{2\times 2}\,, \tag{2.15}
\]

\[
BA = \begin{bmatrix} 0 & 2\\ 1 & -1\\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3\\ 3 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 6 & 4 & 2\\ -2 & 0 & 2\\ 3 & 2 & 1 \end{bmatrix} \in \mathbb{R}^{3\times 3}\,. \tag{2.16}
\]

From this example, we can already see that matrix multiplication is not commutative, i.e., AB ≠ BA; see also Figure 2.5 for an illustration.

[Figure 2.5: Even if both matrix multiplications AB and BA are defined, the dimensions of the results can be different.]

Definition 2.2 (Identity Matrix). In R^{n×n}, we define the identity matrix

\[
I_n := \begin{bmatrix}
1 & 0 & \cdots & 0 & \cdots & 0\\
0 & 1 & \cdots & 0 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots & & \vdots\\
0 & 0 & \cdots & 1 & \cdots & 0\\
\vdots & \vdots & & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 0 & \cdots & 1
\end{bmatrix} \in \mathbb{R}^{n\times n} \tag{2.17}
\]
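The following NumPy sketch (our illustration; only the einsum call in the margin note above is taken from the book) reproduces Example 2.3 and contrasts the matrix product (2.13) with the element-wise Hadamard product:

    import numpy as np

    # Matrices from Example 2.3.
    A = np.array([[1, 2, 3],
                  [3, 2, 1]])      # A in R^{2x3}
    B = np.array([[0,  2],
                  [1, -1],
                  [0,  1]])        # B in R^{3x2}

    # Matrix product (2.13): sum over the shared index l.
    C = A @ B
    assert np.array_equal(C, np.einsum('il,lj', A, B))
    print(C)       # [[2 3], [2 5]]   matches (2.15)
    print(B @ A)   # a 3x3 matrix     matches (2.16); so AB != BA

    # Hadamard (element-wise) product: requires equal shapes, e.g., A * A.
    print(A * A)

    # Identity matrix (2.17): I_2 A = A for A in R^{2x3}.
    assert np.allclose(np.eye(2) @ A, A)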