
Matrix and Tensor Decompositions in Signal Processing

Matrices and Tensors with Signal Processing Set coordinated by Gérard Favier

Matrix and Tensor Decompositions in Signal Processing

Gérard Favier

First published 2021 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK
www.iste.co.uk

John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA
www.wiley.com

© ISTE Ltd 2021

The rights of Gérard Favier to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.

Library of Congress Control Number: 2021938218

British Library Cataloguing-in-Publication Data

A CIP record for this book is available from the British Library

ISBN 978-1-78630-155-0

Contents

Chapter 1. Matrix Decompositions
1.1. Introduction
1.2. Overview of the most common matrix decompositions
1.3. Eigenvalue decomposition
1.3.1. Reminders about the eigenvalues of a matrix
1.3.2. Eigendecomposition and properties
1.3.3. Special case of symmetric/Hermitian matrices
1.3.4. Application to compute the powers of a matrix and a matrix polynomial
1.3.5. Application to compute a state transition matrix
1.3.6. Application to compute the transfer function and the output of a discrete-time linear system
1.4. URV^H decomposition
1.5. Singular value decomposition
1.5.1. Definition and properties
1.5.2. Reduced SVD and dyadic decomposition
1.5.3. SVD and fundamental subspaces associated with a matrix
1.5.4. SVD and the Moore–Penrose pseudo-inverse
1.5.5. SVD computation
1.5.6. SVD and matrix norms
1.5.7. SVD and low-rank matrix approximation
1.5.8. SVD and orthogonal projectors
1.5.9. SVD and LS estimator
1.5.10. SVD and polar decomposition
1.5.11. SVD and PCA
1.5.12. SVD and blind source separation
1.6. CUR decomposition

Chapter 2. Hadamard, Kronecker and Khatri–Rao Products
2.1. Introduction
2.2. Notation
2.3. Hadamard product
2.3.1. Definition and identities
2.3.2. Fundamental properties
2.3.3. Basic relations
2.3.4. Relations between the diag operator and the Hadamard product
2.4. Kronecker product
2.4.1. Kronecker product of vectors
2.4.2. Kronecker product of matrices
2.4.3. Rank, trace, determinant and spectrum of a Kronecker product
2.4.4. Structural properties of a Kronecker product
2.4.5. Inverse and Moore–Penrose pseudo-inverse of a Kronecker product
2.4.6. Decompositions of a Kronecker product
2.5. Kronecker sum
2.5.1. Definition
2.5.2. Properties
2.6. Index convention
2.6.1. Writing vectors and matrices with the index convention
2.6.2. Basic rules and identities with the index convention
2.6.3. Matrix products and index convention
2.6.4. Kronecker products and index convention
2.6.5. Vectorization and index convention
2.6.6. Vectorization formulae
2.6.7. Vectorization of partitioned matrices
2.6.8. Traces of matrix products and index convention
2.7. Commutation matrices
2.7.1. Definition
2.7.2. Properties
2.7.3. Kronecker product and permutation of factors
2.7.4. Multiple Kronecker product and commutation matrices
2.7.5. Block Kronecker product
2.7.6. Strong Kronecker product
2.8. Relations between the diag operator and the Kronecker product
2.9. Khatri–Rao product
2.9.1. Definition
2.9.2. Khatri–Rao product and index convention
2.9.3. Multiple Khatri–Rao product
2.9.4. Properties
2.9.5. Identities
2.9.6. Khatri–Rao product and permutation of factors
2.9.7. Trace of a product of matrices and Khatri–Rao product
2.10. Relations between vectorization and Kronecker and Khatri–Rao products
2.11. Relations between the Kronecker, Khatri–Rao and Hadamard products
2.12. Applications
2.12.1. Partial derivatives and index convention
2.12.2. Solving matrix equations

Chapter 3. Tensor Operations
3.1. Introduction
3.2. Notation and particular sets of tensors
3.3. Notion of slice
3.3.1. Fibers
3.3.2. Matrix and tensor slices
3.4. Mode combination
3.5. Partitioned tensors or block tensors
3.6. Diagonal tensors
3.6.1. Case of a tensor X ∈ K^[N;I]
3.6.2. Case of a square tensor
3.6.3. Case of a rectangular tensor
3.7. Matricization
3.7.1. Matricization of a third-order tensor
3.7.2. Matrix unfoldings and index convention
3.7.3. Matricization of a tensor of order N
3.7.4. Tensor matricization by index blocks
3.8. Subspaces associated with a tensor and multilinear rank
3.9. Vectorization
3.9.1. Vectorization of a tensor of order N
3.9.2. Vectorization of a third-order tensor
3.10. Transposition
3.10.1. Definition of a transpose tensor
3.10.2. Properties of transpose tensors
3.10.3. Transposition and tensor contraction
3.11. Symmetric/partially symmetric tensors
3.11.1. Symmetric tensors
3.11.2. Partially symmetric/Hermitian tensors
3.11.3. Multilinear forms with Hermitian symmetry and Hermitian tensors
3.11.4. Symmetrization of a tensor
3.12. Triangular tensors
3.13. Multiplication operations
3.13.1. Outer product of tensors
3.13.2. Tensor–matrix multiplication
3.13.3. Tensor–vector multiplication
3.13.4. Mode-(p,n) product
3.13.5. Einstein product
3.14. Inverse and pseudo-inverse tensors
3.15. Tensor decompositions in the form of factorizations
3.15.1. Eigendecomposition of a symmetric square tensor
3.15.2. SVD decomposition of a rectangular tensor
3.15.3. Connection between SVD and HOSVD
3.15.4. Full-rank decomposition
3.16. Inner product, Frobenius norm and trace of a tensor
3.16.1. Inner product of two tensors
3.16.2. Frobenius norm of a tensor
3.16.3. Trace of a tensor
3.17. Tensor systems and homogeneous polynomials
3.17.1. Multilinear systems based on the mode-n product
3.17.2. Tensor systems based on the Einstein product
3.17.3. Solving tensor systems using LS
3.18. Hadamard and Kronecker products of tensors
3.19. Tensor extension
3.20. Tensorization
3.21. Hankelization

Chapter 4. Eigenvalues and Singular Values of a Tensor
4.1. Introduction
4.2. Eigenvalues of a tensor of order greater than two
4.2.1. Different definitions of the eigenvalues of a tensor
4.2.2. Positive/negative (semi-)definite tensors
4.2.3. Orthogonally/unitarily similar tensors
4.3. Best rank-one approximation
4.4. Orthogonal decompositions
4.5. Singular values of a tensor

Chapter 5. Tensor Decompositions
5.1. Introduction
5.2. Tensor models
5.2.1. Tucker model
5.2.2. Tucker-(N1, N) model
5.2.3. Tucker model of a transpose tensor
5.2.4. Tucker decomposition and multidimensional Fourier transform
5.2.5. PARAFAC model
5.2.6. Block tensor models
5.2.7. Constrained tensor models
5.3. Examples of tensor models
5.3.1. Model of multidimensional harmonics
5.3.2. Source separation
5.3.3. Model of a FIR system using fourth-order output cumulants

Introduction

The first book of this series was dedicated to introducing matrices and tensors (of order greater than two) from the perspective of their algebraic structure, presenting their similarities, differences and connections with representations of linear, bilinear and multilinear mappings. This second volume will now study tensor operations and decompositions in greater depth.

In this introduction, we will motivate the use of tensors by answering five questions that prospective users might and should ask:

– What are the advantages of tensor approaches?
– For what uses?
– In what fields of application?
– With what tensor decompositions?
– With what cost functions and optimization algorithms?

Although our answers are necessarily incomplete, our aim is to:

– present the advantages of tensor approaches over matrix approaches;
– show a few examples of how tensor tools can be used;
– give an overview of the extensive diversity of problems that can be solved using tensors, including a few example applications;
– introduce the three most widely used tensor decompositions, presenting some of their properties and comparing their parametric complexity;
– state a few problems based on tensor models in terms of the cost functions to be optimized;
– describe various types of tensor-based processing, with a brief glimpse of the optimization methods that can be used.

I.1. What are the advantages of tensor approaches?

In most applications, a tensor X of order N is viewed as an array of real or complex numbers. The current element of the tensor is denoted x_{i1,...,iN}, where each index in ∈ {1, ..., In}, for n ∈ {1, ..., N}, is associated with the nth mode, and In is its dimension, i.e. the number of elements for the nth mode. The order of the tensor is the number N of indices, i.e. the number of modes. Tensors are written with calligraphic letters¹. An Nth-order tensor with entries x_{i1,...,iN} is written X = [x_{i1,...,iN}] ∈ K^{I1×···×IN}, where K = R or C, depending on whether the tensor is real-valued or complex-valued, and I1 × ··· × IN represents the size of X.
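To make this notation concrete, here is a minimal sketch (in Python with NumPy, which is not part of the book; sizes and values are arbitrary) showing a third-order tensor, its order and its dimensions.

    import numpy as np

    # A third-order tensor X of size I1 x I2 x I3 with K = R (real-valued entries)
    I1, I2, I3 = 4, 3, 2
    X = np.random.randn(I1, I2, I3)

    print(X.ndim)       # order N = 3, i.e. the number of modes (indices)
    print(X.shape)      # dimensions (I1, I2, I3) = (4, 3, 2)
    print(X[0, 1, 1])   # element x_{1,2,2} in the book's 1-based index notation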

In general, a mode (also called a way) can have one of the following interpretations: (i) as a source of information (user, patient, client, trial, etc.); (ii) as a type of entity attached to the data (items/products, types of music, types of film, etc.); (iii) as a tag that characterizes an item, a piece of music, a film, etc.; (iv) as a recording modality that captures diversity in various domains (space, time, frequency, wavelength, polarization, color, etc.). Thus, a digital image in color can be represented as a three-dimensional tensor (of pixels) with two spatial modes, one for the rows (width) and one for the columns (height), and one channel mode (RGB colors). For example, a color image can be represented as a tensor of size 1024 × 768 × 3, where the third mode corresponds to the intensity of the three RGB colors (red, green, blue). For a volumetric image, there are three spatial modes (width × height × depth), and the points of the image are called voxels. In the context of hyperspectral imagery, in addition to the two spatial dimensions, there is a third dimension corresponding to the emission wavelength within a spectral band.

Tensor approaches benefit from the following advantages over matrix approaches:

– the essential uniqueness property², satisfied by some tensor decompositions, such as PARAFAC (parallel factors) (Harshman 1970) under certain mild conditions; for matrix decompositions, this property requires certain restrictive conditions on the factor matrices, such as orthogonality, non-negativity, or a specific structure (triangular, Vandermonde, Toeplitz, etc.);

– the ability to solve certain problems, such as the identification of communication channels, directly from measured signals, without requiring the calculation of high-order statistics of these signals or the use of long pilot sequences. The resulting deterministic and semi-blind processings can be performed with signal recordings that are shorter than those required by statistical methods, based on the estimation of high-order moments or cumulants. For the blind source separation problem, tensor approaches can be used to tackle the case of underdetermined systems, i.e. systems with more sources than sensors;

– the possibility of compressing big data sets via a data tensorization and the use of a tensor decomposition, in particular, a low multilinear rank approximation;

– a greater flexibility in representing and processing multimodal data by considering the modalities separately, instead of stacking the corresponding data into a vector or a matrix. This allows the multilinear structure of data to be preserved, meaning that interactions between modes can be taken into account;

– a greater number of modalities can be incorporated into tensor representations of data, meaning that more complementary information is available, which allows the performance of certain systems to be improved, e.g. wireless communication, recommendation, diagnostic, and monitoring systems, by making detection, interpretation, recognition, and classification operations easier and more efficient. This led to a generalization of certain matrix algorithms, like SVD (singular value decomposition) to MLSVD (multilinear SVD), also known as HOSVD (higher-order SVD) (de Lathauwer et al. 2000a); similarly, certain signal processing algorithms were generalized, like PCA (principal component analysis) to MPCA (multilinear PCA) (Lu et al. 2008) or TRPCA (tensor robust PCA) (Lu et al. 2020), and ICA (independent component analysis) to MICA (multilinear ICA) (Vasilescu and Terzopoulos 2005) or tensor PICA (probabilistic ICA) (Beckmann and Smith 2005).

1 Scalars, vectors, and matrices are written in lowercase, bold lowercase, and bold uppercase letters, respectively: a, a, A.
2 A decomposition satisfies the essential uniqueness property if it is unique up to permutation and scaling factors in the columns of its factor matrices.

It is worth noting that, with a tensor model, the number of modalities considered in a problem can be increased either by increasing the order of the data tensor or by coupling tensor and/or matrix decompositions that share one or several modes. Such a coupling approach is called data fusion using a coupled tensor/matrix factorization. Two examples of this type of coupling are presented later in this introductory chapter. In the first, EEG signals are coupled with functional magnetic resonance imaging (fMRI) data to analyze the brain function; in the second, hyperspectral and multispectral images are merged for remote sensing.

The other approach, namely, increasing the number of modalities, will be illustrated in Volume 3 of this series by giving a unified presentation of various models of wireless communication systems designed using tensors. In order to improve system performance, both in terms of transmission and reception, the idea is to employ multiple types of diversity simultaneously in various domains (space, time, frequency, code, etc.), each type of diversity being associated with a mode of the tensor of received signals. Coupled tensor models will also be presented in the context of cooperative communication systems with relays.

I.2. For what uses?

In the big data³ era, digital information processing plays a key role in various fields of application. Each field has its own specificities and requires specialized, often multidisciplinary, skills to manage both the multimodality of the data and the processing techniques that need to be implemented. Thus, the “intelligent” information processing systems of the future will have to integrate representation tools, such as tensors and graphs, and signal and image processing methods, with artificial intelligence techniques based on artificial neural networks and machine learning.

The needs of such systems are diverse and numerous – whether in terms of storage, visualization (3D representation, virtual reality, dissemination of works of art), transmission, imputation, prediction/forecasting, analysis, classification or fusion of multimodal and heterogeneous data. The reader is invited to refer to Lahat et al. (2015) and Papalexakis et al. (2016) for a presentation of various examples of data fusion and data mining based on tensor models.

Some of the key applications of tensor tools are as follows:

– decomposition or separation of heterogeneous data sets into components/factors or subspaces with the goal of exploiting the multimodal structure of the data and extracting useful information for users from uncertain or noisy data or measurements provided by different sources of information and/or types of sensor. Thus, features can be extracted in different domains (spatial, temporal, frequential) for classification and decision-making tasks;

– imputation of missing data within an incomplete database using a low-rank tensor model, where the missing data results from defective sensors or communication links, for example. This task is called tensor completion and is a higher-order generalization of matrix completion (Candès and Recht 2009; Signoretto et al. 2011; Liu et al. 2013);

– recovery of useful information from compressed data by reconstructing a signal or an image that has a sparse representation in a predefined basis, using compressive sampling (CS; also known as compressed sensing) techniques (Candès and Wakin 2008; Candès and Plan 2010), applied to sparse, low-rank tensors (Sidiropoulos and Kyrillidis 2012);

– fusion of data using coupled tensor and matrix decompositions;

– design of cooperative multi-antenna communication systems (also called MIMO (multiple-input multiple-output) systems); this type of application, which led to the development of several new tensor models, will be considered in the next two volumes of this series;

– multilinear compressive learning that combines compressed sensing with machine learning;

– reduction of the dimensionality of multimodal, heterogeneous databases with very large dimensions (big data) by solving a low-rank tensor approximation problem;

– multiway filtering and tensor data denoising.

3 Big data is characterized by 3Vs (Volume, Variety, Velocity) linked to the size of the data set, the heterogeneity of the data and the rate at which it is captured, stored and processed.

Tensors can also be used to tensorize neural networks with fully connected layers by expressing the weight matrix of a layer as a tensor train (TT) whose cores represent the parameters of the layer. This considerably reduces the parametric complexity and, therefore, the storage space. This compression property of the information contained in layered neural networks when using tensor decompositions provides a way to increase the number of hidden units (Novikov et al. 2015). Tensors, when used together with multilayer perceptron neural networks to solve classification problems, achieve lower error rates with fewer parameters and less computation time than neural networks alone (Chien and Bao 2017). Neural networks can also be used to learn the rank of a tensor (Zhou et al. 2019), or to compute its eigenvalues and singular values, and hence the rank-one approximation of a tensor (Che et al. 2017).

I.3. In what fields of application?

Tensors have applications in many domains. The fields of psychometrics and chemometrics in the 1970s and 1990s paved the way for signal and image processing applications, such as blind source separation, digital communications, and computer vision in the 1990s and early 2000s. Today, there is a quantitative explosion of big data in medicine, astronomy and meteorology, with fifth-generation wireless communications (5G), for medical diagnostic aid, web services delivered by recommendation systems (video on demand, online sales, restaurant and hotel reservations, etc.), as well as for information searching within multimedia databases (texts, images, audio and video recordings) and with social networks. This explains why various scientific communities and the industrial world are showing a growing interest in tensors.

Among the many examples of applications of tensors for signal and image processing, we can mention:

– blind source separation and blind system identification. These problems play a fundamental role in signal processing. They involve separating the input signals (also called sources) and identifying a system from the knowledge of only the output signals and certain hypotheses about the input signals, such as statistical independence in the case of independent component analysis (Comon 1994), or the assumption of a finite alphabet in the context of digital communications. This type of processing is, in particular, used to jointly estimate communication channels and information symbols emitted by a transmitter. It can also be used for speech or music separation, or to process seismic signals;

– use of tensor decompositions to analyze biomedical signals (EEG, MEG, ECG, EOG⁴) in the space, time and frequency domains, in order to provide a medical diagnostic aid; for instance, Acar et al. (2007) used a PARAFAC model of EEG signals to analyze epileptic seizures; Becker et al. (2014) used the same type of decomposition to locate sources within EEG signals;

– analysis of brain activity by merging imaging data (fMRI) and biomedical signals (EEG and MEG) with the goal of enabling non-invasive medical tests (see Table I.4);

– analysis and classification of hyperspectral images used in many fields (medicine, environment, agriculture, monitoring, astrophysics, etc.). To improve the spatial resolution of hyperspectral images, Li et al. (2018) merged hyperspectral and multispectral images using a coupled Tucker decomposition with a sparse core (coupled sparse tensor factorization (CSTF)) (see Table I.4);

– design of semi-blind receivers for point-to-point or cooperative MIMO communication systems based on tensor models; see the overviews by de Almeida et al. (2016) and da Costa et al. (2018);

– modeling and identification of nonlinear systems via a tensor representation of Volterra kernels or Wiener–Hammerstein systems (see, for example, Kibangou and Favier 2009a, 2010; Favier and Kibangou 2009; Favier and Bouilloc 2009, 2010; Favier et al. 2012a);

– identification of tensor-based separable trilinear systems that are linear with respect to (w.r.t.) the input signal and trilinear w.r.t. the coefficients of the global impulse response, modeled as a Kronecker product of three individual impulse responses (Elisei-Iliescu et al. 2020). Note that such systems are to be compared with third-order Volterra filters that are linear w.r.t. the Volterra kernel coefficients and trilinear w.r.t. the input signal;

– facial recognition, based on face tensors, for purposes of authentication and identification in surveillance systems. For facial recognition, photos of people to recognize are stored in a database with different lighting conditions, different facial expressions, from multiple angles, for each individual. In Vasilescu and Terzopoulos (2002), the tensor of facial images is of order five, with dimensions 28 × 5 × 3 × 3 × 7943, corresponding to the modes: people × views × illumination × expressions × pixels per image. For an overview of various facial recognition systems, see Arachchilage and Izquierdo (2020);

– tensor-based anomaly detection used in monitoring and surveillance systems.

4 Electroencephalography (EEG), magnetoencephalography (MEG), electrocardiography (ECG) and electrooculography (EOG).

Table I.1 presents a few examples of signal and image tensors, specifying the nature of the modes in each case.

Signals (modes and references):
– Antenna processing: space (antennas) × time × sensor subnetwork (Sidiropoulos et al. 2000a); space × time × polarization (Raimondi et al. 2017)
– Digital communications: space (antennas) × time × code (Sidiropoulos et al. 2000b); antennas × blocks × symbol periods × code × frequencies (Favier and de Almeida 2014b)
– ECG: space (electrodes) × time × frequencies (Acar et al. 2007; Padhy et al. 2019)
– EEG: space (electrodes) × time × frequencies × subjects or trials (Becker et al. 2014; Cong et al. 2015)
– EEG + fMRI: subjects × electrodes × time + subjects × voxels (model with matrix and tensor factorizations coupled via the “subjects” mode) (Acar et al. 2017)

Images (modes and references):
– Color images: space (width) × space (height) × channel (colors)
– Videos in grayscale: space (width) × space (height) × time
– Videos in color: space × space × channel × time
– Hyperspectral images: space × space × spectral bands (Makantasis et al. 2018)
– Computer vision: people × views × illumination × expressions × pixels (Vasilescu and Terzopoulos 2002)

Table I.1. Signal and image tensors

Other fields of application are considered in Table I.2.

Below, we give some details about the application concerning recommendation systems, which play an important role in various websites. The goal of these systems is to help users to select items from tags that have been assigned to each item by users. These items could, for example, be movies, books, musical recordings, web pages, products for sale on an e-commerce site, etc. A standard recommendation system is based on the three following modes: users × items × tags.

Collaborative filtering techniques use the opinions of a set of people, or assessments from these people based on a rating system, to generate a list of recommendations for a specific user. This type of filtering is, for example, used by websites like Netflix for renting DVDs. Collaborative filtering methods are classified into three categories, depending on whether the filtering is based on (a) history and a similarity metric; (b) a model based on matrix factorization using algorithms like SVD or non-negative matrix factorization (NMF); (c) some combination of both, known as hybrid collaborative filtering techniques. See Luo et al. (2014) and Bokde et al. (2015) for approaches based on matrix factorization.

Other so-called passive filtering techniques exploit the data of a matrix of relations between items to deduce recommendations for a user from correlations between items and the user’s previous choices, without using any kind of rating system. This is known as a content-based approach.

Domains (modes and references):
– Phonetics: subjects × vowels × formants (Harshman 1970)
– Chemometrics (fluorescence): excitation × emission × samples (excitation/emission wavelengths) (Bro 1997, 2006; Smilde et al. 2004)
– Contextual recommendation systems: users × items × tags × context1 × ··· × contextN (Rendle and Schmidt-Thieme 2010; Symeonidis and Zioupos 2016; Frolov and Oseledets 2017)
– Transportation (speed measurements): space (sensors) × time (days) × time (weeks) (periods of 15 s and 24 h) (Goulart et al. 2017; Tan et al. 2013; Ran et al. 2016)
– Music: types of music × frequencies × frequencies (Panagakis et al. 2010); users × keywords × songs (Nanopoulos et al. 2010); recordings × (audio) characteristics × segments (Benetos and Kotropoulos 2008)
– Bioinformatics: medicine × targets × diseases (Wang et al. 2019)

Table I.2. Other fields of application

Recommendation systems can also use information about the users (age, nationality, geographic location, participation on social networks, etc.) and the items themselves (types of music, types of film, classes of hotels, etc.). This is called contextual information. Taking this additional information into account allows the relevance of the recommendations to be improved, at the cost of increasing the dimensionality and the complexity of the data representation model and, therefore, of the processing algorithms. This is why tensor approaches are so important for this type of application today. Note that, for recommendation systems, the data tensors are sparse. Consequently, some tags can be automatically generated by the system based on similarity metrics between items. This is, for example, the case for music recommendations based on the acoustic characteristics of songs (Nanopoulos et al. 2010). Personalized tag recommendations take into account the user’s profile, preferences, and interests. The system can also help the user select existing tags or create new ones (Rendle and Schmidt-Thieme 2010).

The articles by Bobadilla et al. (2013) and Frolov and Oseledets (2017) present various recommendation systems with many bibliographical references. Operating according to a similar principle as recommendation systems, social network websites, such as Wikipedia, Facebook, or Twitter, allow different types of data to be exchanged and shared, content to be produced and connections to be established.

I.4. With what tensor decompositions?

It is important to note that, for an Nth-order tensor X ∈ K^{I1×···×IN}, the number of elements is ∏_{n=1}^{N} In, and, assuming In = I for n ∈ {1, ..., N}, this number becomes I^N, which induces an exponential increase with the tensor order N. This is called the curse of dimensionality (Oseledets and Tyrtyshnikov 2009). For big data tensors, tensor decompositions play a fundamental role in alleviating this curse of dimensionality, due to the fact that the number of parameters that characterize the decompositions is generally much smaller than the number of elements in the original tensor.

We now introduce three basic decompositions: PARAFAC/CANDECOMP/CPD, TD and TT⁵. The first two are studied in depth in Chapter 5, whereas the third, briefly introduced in Chapter 3, will be considered in more detail in Volume 3.

5 PARAFAC for parallel factors; CANDECOMP for canonical decomposition; CPD for canonical polyadic decomposition; TD for Tucker decomposition; TT for tensor train.

Table I.3 gives the expression of the element x_{i1,...,iN} of a tensor X ∈ K^{I1×···×IN} of order N and size I1 × ··· × IN, either real (K = R) or complex (K = C), for each of the three decompositions cited above. Their parametric complexity is compared in terms of the size of each matrix and tensor factor, assuming In = I and Rn = R for all n ∈ {1, ..., N}.

Figures I.1–I.3 show graphical representations of the PARAFAC model [[A^(1), A^(2), A^(3); R]] and the TD model [[G; A^(1), A^(2), A^(3)]] for a third-order tensor X ∈ K^{I1×I2×I3}, and of the TT model [[A^(1), A^(2), A^(3), A^(4)]] for a fourth-order tensor X ∈ K^{I1×I2×I3×I4}. In the case of the PARAFAC model, we define A^(n) = [a_1^(n), ..., a_R^(n)] ∈ K^{In×R} using its columns, for n ∈ {1, 2, 3}.

Figure I.1. Third-order PARAFAC model

We can make a few remarks about each of these decompositions:

– The PARAFAC decomposition (Harshman 1970), also known as CANDECOMP (Carroll and Chang 1970) or CPD (Hitchcock 1927), of an Nth-order tensor X is a sum of R rank-one tensors, each defined as the outer product of one column from each of the N matrix factors A^(n) ∈ K^{In×R}. When R is minimal, it is called the rank of the tensor. If the matrix factors satisfy certain conditions, this decomposition has the essential uniqueness property. See Figure I.1 for a third-order tensor (N = 3), and Chapter 5 for a detailed presentation.
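As an illustration of this sum of rank-one terms, the following sketch (in Python with NumPy, not from the book; dimensions, rank and factor values are arbitrary) builds a third-order CPD tensor from its factor matrices and checks one entry against the element-wise formula x_{i,j,k} = Σ_r a_{i,r} b_{j,r} c_{k,r}.

    import numpy as np

    I1, I2, I3, R = 4, 3, 5, 2          # dimensions and rank (arbitrary)
    A = np.random.randn(I1, R)          # factor A^(1)
    B = np.random.randn(I2, R)          # factor A^(2)
    C = np.random.randn(I3, R)          # factor A^(3)

    # Sum of R rank-one tensors: outer product of one column of each factor
    X = sum(np.einsum('i,j,k->ijk', A[:, r], B[:, r], C[:, r]) for r in range(R))

    # Element-wise check of x_{i,j,k} = sum_r a_{i,r} b_{j,r} c_{k,r}
    i, j, k = 1, 2, 3
    assert np.isclose(X[i, j, k], np.sum(A[i, :] * B[j, :] * C[k, :]))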

Table I.3. Parametric complexity of the CPD, TD and TT decompositions

Figure I.2. Third-order Tucker model

Figure I.3. Fourth-order TT model

– The Tucker decomposition (Tucker 1966) can be viewed as a generalization of the PARAFAC decomposition that takes into account all the interactions between the columns of the matrix factors A^(n) ∈ K^{In×Rn} via the introduction of a core tensor G ∈ K^{R1×···×RN}. This decomposition is not unique in general. Note that, if Rn ≤ In for all n ∈ {1, ..., N}, then the core tensor G provides a compressed form of X. If Rn, for n ∈ {1, ..., N}, is chosen as the rank of the mode-n matrix unfolding⁶ of X, then the N-tuple (R1, ..., RN) is minimal, and it is called the multilinear rank of the tensor.

Such a Tucker decomposition can be obtained using the truncated higher-order SVD (THOSVD), under the constraint of column-orthonormal matrices A^(n) (de Lathauwer et al. 2000a). This algorithm is described in section 5.2.1.8.

See Figure I.2 for a third-order tensor, and Chapter 5 for a detailed presentation.
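To make the Tucker model concrete, here is a minimal sketch (Python with NumPy, not from the book; sizes and values are arbitrary) that reconstructs a third-order tensor from a core G and factor matrices A^(1), A^(2), A^(3) via the three mode-n products.

    import numpy as np

    I1, I2, I3 = 6, 5, 4
    R1, R2, R3 = 3, 2, 2                 # multilinear dimensions of the core
    G = np.random.randn(R1, R2, R3)      # core tensor
    A1 = np.random.randn(I1, R1)         # factor matrices A^(n)
    A2 = np.random.randn(I2, R2)
    A3 = np.random.randn(I3, R3)

    # x_{i1,i2,i3} = sum_{r1,r2,r3} g_{r1,r2,r3} a1_{i1,r1} a2_{i2,r2} a3_{i3,r3}
    X = np.einsum('abc,ia,jb,kc->ijk', G, A1, A2, A3)
    print(X.shape)   # (6, 5, 4): the core provides a compressed form when Rn <= In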

– The TT decomposition (Oseledets 2011) is composed of a train of third-order tensors A^(n) ∈ K^{R_{n-1}×In×Rn}, for n ∈ {2, 3, ..., N-1}, the first and last carriages of the train being matrices A^(1) ∈ K^{I1×R1} and A^(N) ∈ K^{R_{N-1}×IN}, which implies R0 = RN = 1, so that the first and last cores reduce to matrices. The dimensions Rn, for n ∈ {1, ..., N-1}, called the TT ranks, are given by the ranks of some matrix unfoldings of the original tensor.
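The following sketch (Python with NumPy, not from the book; ranks, sizes and values are arbitrary) reconstructs a fourth-order tensor from a TT model with matrices as the first and last carriages, contracting the carriages from left to right.

    import numpy as np

    I1, I2, I3, I4 = 4, 3, 3, 2
    R1, R2, R3 = 2, 3, 2                       # TT ranks (R0 = R4 = 1)
    A1 = np.random.randn(I1, R1)               # first carriage: matrix I1 x R1
    A2 = np.random.randn(R1, I2, R2)           # third-order cores of size R_{n-1} x In x Rn
    A3 = np.random.randn(R2, I3, R3)
    A4 = np.random.randn(R3, I4)               # last carriage: matrix R3 x I4

    # x_{i1,i2,i3,i4} = sum over r1,r2,r3 of A1[i1,r1] A2[r1,i2,r2] A3[r2,i3,r3] A4[r3,i4]
    X = np.einsum('ia,ajb,bkc,cl->ijkl', A1, A2, A3, A4)
    print(X.shape)   # (4, 3, 3, 2)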

This decomposition has been used to solve the tensor completion problem (Grasedyck et al. 2015; Bengua et al. 2017), for facial recognition (Brandoni and Simoncini 2020) and for modeling MIMO communication channels (Zniyed et al. 2020), among many other applications. A brief description of the TT decomposition is given in section 3.13.4 using the mode-(p,n) product. Note that a specific SVD-based algorithm, called TT-SVD, was proposed by Oseledets (2011) for computing a TT decomposition.

This decomposition and the hierarchical Tucker (HT) one (Grasedyck and Hackbusch 2011; Ballani et al. 2013) are special cases of tensor networks (TNs) (Cichocki 2014), as will be discussed in more detail in the next volume.

6 See definition [3.41], in Chapter 3, of the mode-n matrix unfolding Xn of a tensor X, whose columns are the mode-n vectors obtained by fixing all indices except the nth one.

From this brief description of the three tensor models, one can conclude that, unlike matrices, the notion of rank is not unique for tensors, since it depends on the decomposition used. Thus, as mentioned above, one defines the tensor rank (also called the canonical rank or Kruskal's rank) associated with the PARAFAC decomposition, the multilinear rank that relies on the Tucker model, and the TT ranks linked with the TT decomposition.

It is important to note that the number of characteristic parameters of the PARAFAC and TT decompositions is proportional to N, the order of the tensor, whereas the parametric complexity of the Tucker decomposition increases exponentially with N. This is why the first two decompositions are especially valuable for large-scale problems. Although the Tucker model is not unique in general, imposing an orthogonality constraint on the matrix factors yields the HOSVD decomposition, a truncated form of which gives an approximate solution to the best low multilinear rank approximation problem (de Lathauwer et al. 2000a). This solution, which is based on an a priori choice of the dimensions Rn of the core tensor, is to be compared with the truncated SVD in the matrix case, although it does not have the same optimality property. It is widely used to reduce the parametric complexity of data tensors.
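A quick numerical comparison illustrates this growth. The sketch below (Python, not from the book) counts parameters for In = I and Rn = R, using counts consistent with the factor sizes described above: NIR for CPD, NIR + R^N for Tucker, and 2IR + (N-2)IR² for a TT with matrices as end carriages; the exact entries of Table I.3 may be presented differently.

    def cpd_params(N, I, R):
        return N * I * R                        # N factor matrices of size I x R

    def tucker_params(N, I, R):
        return N * I * R + R ** N               # N factors plus the R x ... x R core

    def tt_params(N, I, R):
        return 2 * I * R + (N - 2) * I * R * R  # two end matrices plus N - 2 cores

    N, I, R = 10, 100, 5
    print(cpd_params(N, I, R))      # 5000     : linear in N
    print(tucker_params(N, I, R))   # 9770625  : dominated by the R^N core
    print(tt_params(N, I, R))       # 21000    : linear in N as well
    print(I ** N)                   # 10^20 elements in the full tensor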

From the above, it can be concluded that the TT model combines the advantages of the other two decompositions, in terms of parametric complexity (like PARAFAC) and numerical stability (like the Tucker model), due to a parameter estimation algorithm based on a calculation of SVDs.

To illustrate the use of the PARAFAC decomposition, let us consider the case of multi-user mobile communications with a CDMA (code-division multiple access) encoding system. The multiple access technique allows multiple emitters to simultaneously transmit information over the same communication channel by assigning a code to each emitter. The information is transmitted as symbols s_{n,m}, with n ∈ {1, ..., N} and m ∈ {1, ..., M}, where N and M are the number of transmission time slots, i.e. the number of symbol periods, and the number of emitting antennas, respectively. The symbols belong to a finite alphabet that depends on the modulation being used. They are encoded with a space-time coding that introduces code diversity by repeating each symbol P times with a code c_{p,m} assigned to the mth emitting antenna, p ∈ {1, ..., P}, where P denotes the length of the spreading code. The signal received by the kth receiving antenna, during the nth symbol period and the pth chip period, is a linear combination of the symbols encoded and transmitted by the M emitting antennas:

x_{k,n,p} = Σ_{m=1}^{M} h_{k,m} s_{n,m} c_{p,m},   [I.1]

where h_{k,m} is the fading coefficient of the communication channel between the receiving antenna k and the emitting antenna m.

The received signals, which are complex-valued, therefore form a third-order tensor X ∈ C^{K×N×P} whose modes are space × time × code, associated with the indices (k, n, p). This signal tensor satisfies a PARAFAC decomposition [[H, S, C; M]] whose rank is equal to the number M of emitting antennas and whose matrix factors are the channel matrix (H ∈ C^{K×M}), the matrix of transmitted symbols (S ∈ C^{N×M}) and the coding matrix (C ∈ C^{P×M}). This example is a simplified form of the DS-CDMA (direct-sequence CDMA) system proposed by Sidiropoulos et al. (2000b).
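The following sketch (Python with NumPy, not from the book; all sizes, the toy symbol alphabet and the matrix values are arbitrary placeholders) simulates a noiseless received-signal tensor according to equation [I.1], which is exactly the CPD built from the factors H, S and C.

    import numpy as np

    K, N, P, M = 4, 50, 8, 3                                   # antennas, symbols, chips, emitters
    H = np.random.randn(K, M) + 1j * np.random.randn(K, M)     # channel fading coefficients h_{k,m}
    S = np.sign(np.random.randn(N, M)) + 0j                    # BPSK-like symbols s_{n,m} (toy alphabet)
    C = np.random.randn(P, M) + 1j * np.random.randn(P, M)     # spreading codes c_{p,m}

    # Equation [I.1]: x_{k,n,p} = sum_m h_{k,m} s_{n,m} c_{p,m}
    X = np.einsum('km,nm,pm->knp', H, S, C)

    print(X.shape)   # (4, 50, 8): modes space x time x code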

I.5. With what cost functions and optimization algorithms?

We will now briefly describe the most common processing operations carried out with tensors, as well as some of the optimization algorithms that are used. It is important to first present the preprocessing operations that need to be performed. Preprocessing typically involves data centering operations (offset elimination), scaling of non-homogeneous data, suppression of outliers and artifacts, image adjustment (size, brightness, contrast, alignment, etc.), denoising, signal transformation using certain transforms (wavelets, Fourier, etc.), and finally, in some cases, the calculation of statistics of the signals to be processed.

Preprocessing is fundamental, both to improve the quality of the estimated models and, therefore, of the subsequent processing operations, and to avoid numerical problems with optimization algorithms, such as conditioning problems that may cause the algorithms to fail to converge. Centering and scaling preprocessing operations are potentially problematic because they are interdependent and can be combined in several different ways. If data are missing, centering can also reduce the rank of the tensor model. For a more detailed description of these preprocessing operations, see Smilde et al. (2004).

For the processing operations themselves, we can distinguish between several different classes:

– supervised/non-supervised (blind or semi-blind), i.e. with or without training data, for example, to solve classification problems, or when a priori information, called a pilot sequence, is transmitted to the receiver for channel estimation;
– real-time (online)/batch (offline) processing;
– centralized/distributed;
– adaptive/blockwise (with respect to the data);
– with/without coupling of tensor and/or matrix models;
– with/without missing data.

It is important to distinguish batch processing, which is performed to analyze data recorded as signal and image sets, from the real-time processing required by wireless communication systems, recommendation systems, web searches and social networks. In real-time applications, the dimensionality of the model and the algorithmic complexity are predominant factors. The signals received by receiving antennas, the information exchanged between a website and the users and the messages exchanged between the users of a social network are time-dependent. For instance, a recommendation system interacts with the users in real time, via a possible extension of an existing database by means of machine learning techniques. For a description of various applications of tensors for data mining and machine learning, see Anandkumar et al. (2014) and Sidiropoulos et al. (2017).

Tensor-based processings lead to various types of optimization algorithms, as follows:

– constrained/unconstrained optimization;
– iterative/non-iterative, or closed-form;
– alternating/global;
– sequential/parallel.

Furthermore, depending on the information that is available a priori, different types of constraints can be taken into account in the cost function to be optimized: low rank, sparseness, non-negativity, orthogonality and differentiability/smoothness. In the case of constrained optimization, weights need to be chosen in the cost function according to the relative importance of each constraint and the quality of the a priori information that is available.

Table I.4 presents a few examples of cost functions that can be minimized for the parameter estimation of certain third-order tensor models (CPD, Tucker, coupled matrix Tucker (CMTucker) and coupled sparse tensor factorization (CSTF)), for the imputation of missing data in a tensor and for the estimation of a sparse data tensor with a low-rank constraint expressed in the form of the nuclear norm of the tensor.

REMARK I.1.– We can make the following remarks:

– the cost functions presented in Table I.4 correspond to data fitting criteria. These criteria, expressed in terms of tensor and matrix Frobenius norms (||·||_F), are quadratic in the difference between the data tensor X and the output of CPD and TD models, as well as between the data matrix Y and a matrix factorization model, in the case of the CMTucker model. They are trilinear and quadrilinear, respectively, with respect to the parameters of the CPD and TD models to be estimated, and bilinear with respect to the parameters of the matrix factorization model;

– for the missing data imputation problem using a CPD or TD model, the binary tensor W, which has the same size as X, is defined as:

w_{ijk} = 1 if x_{ijk} is known, and w_{ijk} = 0 if x_{ijk} is missing.

The purpose of the Hadamard (element-wise) product of W with the difference between X and the output of the CPD and TD models is to fit the model to the available data only, ignoring any missing data for model estimation (a short numerical sketch of this masked criterion is given after these remarks). This imputation problem, known as the tensor completion problem, was originally dealt with by Tomasi and Bro (2005) and Acar et al. (2011a) using a CPD model, followed by Filipovic and Jukic (2015) using a TD model. Various articles have discussed this problem in the context of different applications. An overview of the literature will be given in the next volume;

Table I.4 lists the cost functions associated with three problems: estimation, imputation, and imputation with a low-rank constraint.

Table I.4. Cost functions for model estimation and recovery of missing data

– for the imputation problem with the low-rank constraint, the term ||X||_* in the cost function replaces the low-rank constraint with the nuclear norm of X, since the function rank(X) is not convex, and the nuclear norm is the closest convex approximation of the rank. In Liu et al. (2013), this term is replaced by Σ_{n=1}^{3} λ_n ||X_n||_*, where X_n represents the mode-n unfolding of X⁷;

– in the case of the CMTucker model, the coupling considered here relates to the first modes of the tensor X and the matrix Y of data via the common matrix factor A.

Coupled matrix and tensor factorization (CMTF) models were introduced in Acar et al. (2011b) by coupling a CPD model with a matrix factorization and using the gradient descent algorithm to estimate the parameters. This type of model was used by Acar et al. (2017) to merge EEG and fMRI data with the goal of analyzing brain activity. The EEG signals are modeled with a normalized CPD model (see Chapter 5), whereas the fMRI data are modeled with a matrix factorization. The data are coupled through the subjects mode (see Table I.1). The cost function to be minimized is given by criterion [I.3], where the column vectors of the matrix factors (A, B, C) have unit norm, Σ is a diagonal matrix whose diagonal elements are the coefficients of the vector σ, and α > 0 is a penalty parameter that allows the importance of the sparseness constraints on the weight vectors (g, σ) to be increased or decreased, modeled by means of the l1 norm. The advantage of merging EEG and fMRI data with the criterion [I.3] is that the acquisition and observation methods are complementary in terms of resolution, since EEG signals have a high temporal resolution but low spatial resolution, while fMRI imaging provides high spatial resolution;

– in the case of the CSTF model (Li et al. 2018), the tensor of high-resolution hyperspectral images (HR-HSI) is represented using a third-order Tucker model that has a sparse core (X = G ×_1 W ×_2 H ×_3 S), with the following modes: space (width) × space (height) × spectral bands. The matrices W ∈ R^{M×nw}, H ∈ R^{N×nh} and S ∈ R^{P×ns} denote the dictionaries for the width, height and spectral modes, composed of nw, nh and ns atoms, respectively, and the core tensor G contains the coefficients relative to the three dictionaries. The matrices W*, H* and S* are spatially and spectrally subsampled versions with respect to each mode. The term λ is a regularization parameter for the sparseness constraint on the core tensor, expressed in terms of the l1 norm of G.
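As announced in the remark on missing data imputation, here is a minimal sketch (Python with NumPy, not from the book; data, mask and CPD factors are arbitrary) that evaluates the masked data-fitting term ||W ⊙ (X − X̂)||_F², where ⊙ denotes the Hadamard product and X̂ is the output of a third-order CPD model; only the observed entries contribute to the criterion.

    import numpy as np

    I, J, K, R = 5, 4, 3, 2
    X = np.random.randn(I, J, K)                  # data tensor with missing entries
    W = np.random.rand(I, J, K) > 0.3             # binary mask: True where x_{ijk} is known
    X[~W] = np.nan                                # missing values play no role in the criterion

    A, B, C = (np.random.randn(d, R) for d in (I, J, K))   # current CPD factor estimates
    X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)            # CPD model output

    # Masked data-fitting criterion: only the observed entries contribute
    residual = np.where(W, X - X_hat, 0.0)
    cost = np.sum(np.abs(residual) ** 2)
    print(cost)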

The criteria listed in Table I.4 can be globally minimized using a nonlinear optimization method such as a gradient descent algorithm (with fixed or optimal step size), or the Gauss–Newton and Levenberg–Marquardt algorithms, the latter being a regularized form of the former. In the case of constrained optimization, the augmented Lagrangian method is very often used, as it allows the constrained optimization problem to be transformed into a sequence of unconstrained optimization problems.

The drawbacks of these optimization methods include slow convergence for gradient-type algorithms and high numerical complexity for the Gauss–Newton and Levenberg–Marquardt algorithms due to the need to compute the Jacobian matrix of the criterion w.r.t. the parameters being estimated, as well as the inverse of a large matrix.

7 See definition [3.41] of the unfolding Xn, and definitions [1.65] and [1.67] of the Frobenius norm (||·||_F) and the nuclear norm (||·||_*) of a matrix; for a tensor, see section 3.16.

Alternating optimization methods are therefore often used instead of a global optimization w.r.t. all matrix and tensor factors to be estimated. These iterative methods perform a sequence of separate optimizations of criteria that are linear in each unknown factor, while fixing the other factors at the values estimated at previous iterations. An example is the standard ALS (alternating least squares) algorithm, presented in Chapter 5 for estimating PARAFAC models. For constrained optimization, the alternating direction method of multipliers (ADMM) is often used (Boyd et al. 2011).
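To give a flavor of this alternating scheme, the sketch below (Python with NumPy, not from the book; it is a generic ALS iteration for a third-order CPD, not the specific algorithm detailed in Chapter 5) updates each factor in turn by a linear least squares step on the corresponding mode-n unfolding, using the Khatri–Rao product of the two factors that are kept fixed.

    import numpy as np

    def khatri_rao(A, B):
        # Column-wise Kronecker product of A (I x R) and B (J x R) -> (I*J) x R
        return np.einsum('ir,jr->ijr', A, B).reshape(A.shape[0] * B.shape[0], -1)

    def als_cpd(X, R, n_iter=50):
        I, J, K = X.shape
        A, B, C = (np.random.randn(d, R) for d in (I, J, K))
        X1 = X.reshape(I, J * K)                       # mode-1 unfolding
        X2 = X.transpose(1, 0, 2).reshape(J, I * K)    # mode-2 unfolding
        X3 = X.transpose(2, 0, 1).reshape(K, I * J)    # mode-3 unfolding
        for _ in range(n_iter):
            A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)   # LS update of A with B, C fixed
            B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)   # LS update of B with A, C fixed
            C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)   # LS update of C with A, B fixed
        return A, B, C

    # Sanity check on a synthetic rank-2 tensor
    A0, B0, C0 = (np.random.randn(d, 2) for d in (5, 4, 3))
    X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
    A, B, C = als_cpd(X, R=2)
    print(np.linalg.norm(X - np.einsum('ir,jr,kr->ijk', A, B, C)))  # should be close to 0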

To complete this introductory chapter, let us outline the key knowledge needed to employ tensor tools, whose presentation constitutes the main objective of this second volume:

– arrangement (also called reshaping) operations that express the data tensor as a vector (vectorization), a matrix (matricization), or a lower-order tensor by combining modes; conversely, the tensorization and Hankelization operations allow us to construct tensors from data contained in large vectors or matrices;

– tensor operations such as transposition, symmetrization, Hadamard and Kronecker products, inversion and pseudo-inversion;

– the notions of eigenvalue and singular value of a tensor;

– tensor decompositions/models, and their uniqueness properties;

– algorithms used to solve dimensionality reduction problems and, hence, best low-rank approximation, parameter estimation and missing data imputation. This algorithmic aspect linked to tensors will be explored in more depth in Volume 3.

I.6. Brief description of content

Tensor operations and decompositions often use matrix tools, so we will begin by reviewing some matrix decompositions in Chapter 1, going into further detail on eigenvalue decomposition (EVD) and SVD, as well as a few of their applications.

The Hadamard, Kronecker and Khatri–Rao matrix products are presented in detail in Chapter 2, together with many of their properties and a few relations between them. To illustrate these operations, we will use them to represent first-order partial derivatives of a function, and to solve matrix equations, such as the Sylvester and Lyapunov equations. This chapter also introduces an index convention that is very useful for tensor computations. This convention, which generalizes Einstein's summation convention (Pollock 2011), will be used to represent various matrix products and to prove some matrix product vectorization formulae, as well as various relations between the Kronecker, Khatri–Rao and Hadamard products. It will be used in Chapter 3 for tensor matricization and vectorization in an original way, as well as in Chapter 5 to establish matrix forms of the Tucker and PARAFAC decompositions.

Chapter 3 presents various sets of tensors before introducing the notions of matrix and tensor slices and of mode combination, on which reshaping operations are based. The key tensor operations listed above are then presented. Several links between products of tensors and systems of tensor equations are also outlined, and some of these systems are solved with the least squares method.

Chapter 4 is dedicated to introducing the notions of eigenvalue and singular value for tensors. The problem of the best rank-one approximation of a tensor is also considered.

In Chapter 5, we will give a detailed presentation of various tensor decompositions, with a particular focus on the basic Tucker and CPD decompositions, which can be viewed as generalizations of matrix SVD to tensors of order greater than two. Block tensor models and constrained tensor models will also be described, as well as certain variants, such as HOSVD and BTD (block term decomposition). CPD-type decompositions are generally used to estimate latent parameters, whereas the Tucker decomposition is often used to estimate modal subspaces and reduce the dimensionality via low multilinear rank approximation and truncated HOSVD.

A description of the ALS algorithm for parameter estimation of PARAFAC models will also be given. The uniqueness properties of the Tucker and CPD decompositions will be presented, as well as the various notions of the rank of a tensor. The chapter will end with illustrations of BTD and CPD decompositions for the tensor modeling of multidimensional harmonics, the problem of source separation in an instantaneous linear mixture, and the modeling and estimation of a finite impulse response (FIR) linear system, using a tensor of fourth-order cumulants of the system output.

High-order cumulants of random signals, which can be viewed as tensors, play a central role in various signal processing applications, as illustrated in Chapter 5. This motivated us to include an Appendix to present a brief overview of some basic results concerning the higher-order statistics (HOS) of random signals, with two applications to the HOS-based estimation of a linear time-invariant system and a homogeneous quadratic system.

Chapter 1. Matrix Decompositions

1.1. Introduction

The goal of this chapter is to give an overview of the most important matrix decompositions, with a more detailed presentation of the eigenvalue decomposition (EVD) and singular value decomposition (SVD), as well as some of their applications. Matrix decompositions (also called factorizations) play a key role in matrix computation, in particular, for computing the pseudo-inverse of a matrix (see section 1.5.4), the low-rank approximation of a matrix (see section 1.5.7), the solution of a system of linear equations using the least squares (LS) method (see section 1.5.9), or for parametric estimation of nonlinear models using the ALS method, as illustrated in Chapter 5 with the estimation of tensor models.

Matrix decompositions have two goals. The first is to factorize a given matrix into structured factor matrices that are easier to invert, and the second is to reduce the dimensionality, in order to reduce both the memory capacity required to store the data and the computational cost of the data processing algorithms. After giving a brief overview of the most common decompositions, we will recall a few results about the eigenvalues of a matrix, and then present the EVD of a square matrix. The use of this decomposition will be illustrated by computing the powers of a matrix, a matrix polynomial, a state transition matrix and the transfer function of a discrete-time linear system.
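As a quick illustration of the first of these applications, here is a minimal sketch (Python with NumPy, not from the book; the matrix is arbitrary) that computes a matrix power through the EVD A = P D P^{-1}, so that A^k = P D^k P^{-1}.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])          # arbitrary diagonalizable matrix

    eigvals, P = np.linalg.eig(A)       # A = P diag(eigvals) P^{-1}
    k = 5
    A_k = P @ np.diag(eigvals ** k) @ np.linalg.inv(P)

    assert np.allclose(A_k, np.linalg.matrix_power(A, k))   # same result as direct computation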

The URV^H decomposition of a rectangular matrix will then be introduced, followed by a presentation of the SVD. The latter can be viewed as a special case of the URV^H decomposition with U and V orthogonal (respectively, unitary) in the case of a real (respectively, complex) matrix and R pseudo-diagonal. The SVD can also be viewed as an extension of the EVD for diagonalizing rectangular matrices.

We will present several results relating to the SVD, such as the links between the SVD and the fundamental subspaces of a matrix and certain matrix norms. Applications of the SVD to compute the pseudo-inverse of a matrix and hence the LS estimator, as well as a low-rank matrix approximation, will also be described. The polar decomposition will be demonstrated using the SVD. The connection between the SVD and principal component analysis (PCA) will be established, with an application to data compression by reducing the dimensionality of a data matrix. The use of the SVD for the blind source separation (BSS) problem will also be considered. Finally, the CUR decomposition of a matrix, which is based on selecting certain columns and rows, will be briefly described.
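Several of these SVD applications can be previewed with a short sketch (Python with NumPy, not from the book; the data matrix is arbitrary): the best rank-R approximation in the Frobenius norm is obtained by truncating the SVD to its R dominant singular triplets, the Eckart–Young result discussed in section 1.5.7.

    import numpy as np

    A = np.random.randn(8, 6)                    # arbitrary data matrix
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    R = 2                                        # target rank
    A_R = U[:, :R] @ np.diag(s[:R]) @ Vt[:R, :]  # truncated SVD: rank-R approximation

    # Approximation error equals the energy of the discarded singular values
    err = np.linalg.norm(A - A_R, 'fro')
    print(np.isclose(err, np.sqrt(np.sum(s[R:] ** 2))))   # True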

1.2. Overview of the most common matrix decompositions

Table 1.1 presents the most common matrix decompositions as products of matrices. These decompositions differ from one another in terms of the structural properties of their factor matrices (diagonal/pseudo-diagonal, upper/lower triangular, orthogonal/unitary).

The EVD, the SVD and the polar decomposition are presented in detail in sections 1.3 and 1.5. The URV^H and CUR decompositions are also described in this chapter. The full-rank decomposition is discussed in Chapter 3, in section 3.15.4. For other decompositions, see Lawson and Hanson (1974), Favier (1982, 2019), Golub and Van Loan (1983), Lancaster and Tismenetsky (1985), Horn and Johnson (1985, 1991) and Meyer (2000), among many others.

We can make the following remarks:

– A square root of a symmetric positive semi-definite matrix A ∈ R^{I×I} is defined as a square matrix S ∈ R^{I×I} such that A = SS^T. We often write S = A^{1/2}. The square root is not unique, since any matrix SQ with Q orthogonal is also a square root of A.

The Cholesky decomposition gives a square root in lower triangular (L) or upper triangular (U) form for a symmetric positive semi-definite matrix A. The matrix L is computed row by row, from left to right and top to bottom, whereas U is computed column by column, from right to left and bottom to top. See Favier (1982) for a detailed presentation. The factors L and U are unique if A is positive definite. In the complex case, Cholesky decompositions are expressed in the form A = LL^H = UU^H.

The UD decomposition is obtained by modifying the Cholesky decomposition so that the factor matrices L and U are unit lower triangular and unit upper triangular, respectively, and D is diagonal.

– The Schur decomposition is written as A = UTU^H (respectively, A = QTQ^T), where U is unitary (respectively, Q is orthogonal), and T is upper or lower triangular with the eigenvalues of A along the diagonal. This decomposition, which can also be written U^H AU = T (respectively, Q^T AQ = T), shows that every complex (respectively, real) square matrix is unitarily (respectively, orthogonally) similar to a triangular matrix (see Table 1.6).

– The decomposition A = LU, where A ∈ C^{I×I} is of rank R, L ∈ C^{I×I} is lower triangular and U ∈ C^{I×I} is upper triangular, satisfies the property that U or L is non-singular. This decomposition does not always exist.

In the case of a non-singular matrix A ∈ R^{I×I}, a variant of the LU decomposition is A = LDU, where L and U are unit lower and upper triangular, respectively, and D is diagonal. In this case, the matrices L and U are unique.

– The QR decomposition can be computed using various orthogonalization methods based on Householder, Givens or Gram–Schmidt transformations¹.

– The LU and QR decompositions are used to solve systems of linear equations of the form Ax = b using the LS method.

Using the decomposition A = LU, with A ∈ R^{I×I}, the original system is replaced by two triangular systems Ly = b and Ux = y. To solve these new systems, the square triangular matrices L and U are inverted using forward and backward substitution algorithms, respectively, instead of inverting A.

Similarly, using the decomposition A = QR, with A ∈ R^{I×J} assumed to have full column rank (I ≥ J), the original system of equations can be solved by inverting a triangular matrix. To see this, partition R = [R1 ; 0] and Q^T b = [c ; d], where R1 ∈ R^{J×J} is non-singular, c ∈ R^J and d ∈ R^{I-J}. Using the fact that pre-multiplying a vector by an orthogonal matrix preserves its Euclidean norm (||Qy||_2 = ||y||_2), the LS criterion to minimize can be rewritten as follows:

||Ax - b||_2^2 = ||Q^T Ax - Q^T b||_2^2 = ||R1 x - c||_2^2 + ||d||_2^2,

which is minimized by solving the triangular system R1 x = c.
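A compact sketch of this QR-based LS solution (Python with NumPy, not from the book; the system is arbitrary), compared against the pseudo-inverse solution:

    import numpy as np

    I, J = 7, 3
    A = np.random.randn(I, J)            # full column rank with probability 1
    b = np.random.randn(I)

    Q, R1 = np.linalg.qr(A)              # reduced QR: Q is I x J column-orthonormal, R1 is J x J
    c = Q.T @ b
    x = np.linalg.solve(R1, c)           # solve the triangular system R1 x = c

    assert np.allclose(x, np.linalg.pinv(A) @ b)   # same LS solution as the pseudo-inverse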

1 A matrix Q ∈ R^{I×J}, with J ≤ I, is said to be column-wise orthonormal (or simply column orthonormal) if its column vectors form an orthonormal set (Q^T Q = I_J). Similarly, when I ≤ J, the matrix Q is said to be row-wise orthonormal (or simply row orthonormal) if its row vectors form an orthonormal set (QQ^T = I_I).
