
Matrix and Tensor Decompositions in Signal Processing

Matrices and Tensors with Signal Processing Set coordinated by Gérard Favier

Matrix and Tensor Decompositions in Signal Processing

Gérard Favier

First published 2021 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK
www.iste.co.uk

John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA
www.wiley.com

© ISTE Ltd 2021

The rights of Gérard Favier to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.

Library of Congress Control Number: 2021938218

British Library Cataloguing-in-Publication Data

A CIP record for this book is available from the British Library

ISBN 978-1-78630-155-0

Contents

Chapter 1. Matrix Decompositions
1.1. Introduction
1.2. Overview of the most common matrix decompositions
1.3. Eigenvalue decomposition
1.3.1. Reminders about the eigenvalues of a matrix
1.3.2. Eigendecomposition and properties
1.3.3. Special case of symmetric/Hermitian matrices
1.3.4. Application to compute the powers of a matrix and a matrix polynomial
1.3.5. Application to compute a state transition matrix
1.3.6. Application to compute the transfer function and the output of a discrete-time linear system
1.4. URV^H decomposition
1.5. Singular value decomposition
1.5.1. Definition and properties
1.5.2. Reduced SVD and dyadic decomposition
1.5.3. SVD and fundamental subspaces associated with a matrix
1.5.4. SVD and the Moore–Penrose pseudo-inverse
1.5.5. SVD computation
1.5.6. SVD and matrix norms
1.5.7. SVD and low-rank matrix approximation
1.5.8. SVD and orthogonal projectors
1.5.9. SVD and LS estimator
1.5.10. SVD and polar decomposition
1.5.11. SVD and PCA
1.5.12. SVD and blind source separation
1.6. CUR decomposition

Chapter 2. Hadamard, Kronecker and Khatri–Rao Products
2.1. Introduction
2.2. Notation
2.3. Hadamard product
2.3.1. Definition and identities
2.3.2. Fundamental properties
2.3.3. Basic relations
2.3.4. Relations between the diag operator and the Hadamard product
2.4. Kronecker product
2.4.1. Kronecker product of vectors
2.4.2. Kronecker product of matrices
2.4.3. Rank, trace, determinant and spectrum of a Kronecker product
2.4.4. Structural properties of a Kronecker product
2.4.5. Inverse and Moore–Penrose pseudo-inverse of a Kronecker product
2.4.6. Decompositions of a Kronecker product
2.5. Kronecker sum
2.5.1. Definition
2.5.2. Properties
2.6. Index convention
2.6.1. Writing vectors and matrices with the index convention
2.6.2. Basic rules and identities with the index convention
2.6.3. Matrix products and index convention
2.6.4. Kronecker products and index convention
2.6.5. Vectorization and index convention
2.6.6. Vectorization formulae
2.6.7. Vectorization of partitioned matrices
2.6.8. Traces of matrix products and index convention
2.7. Commutation matrices
2.7.1. Definition
2.7.2. Properties
2.7.3. Kronecker product and permutation of factors
2.7.4. Multiple Kronecker product and commutation matrices
2.7.5. Block Kronecker product
2.7.6. Strong Kronecker product
2.8. Relations between the diag operator and the Kronecker product
2.9. Khatri–Rao product
2.9.1. Definition
2.9.2. Khatri–Rao product and index convention
2.9.3. Multiple Khatri–Rao product
2.9.4. Properties
2.9.5. Identities
2.9.6. Khatri–Rao product and permutation of factors
2.9.7. Trace of a product of matrices and Khatri–Rao product
2.10. Relations between vectorization and Kronecker and Khatri–Rao products
2.11. Relations between the Kronecker, Khatri–Rao and Hadamard products
2.12. Applications
2.12.1. Partial derivatives and index convention
2.12.2. Solving matrix equations

Chapter 3. Tensor Operations
3.1. Introduction
3.2. Notation and particular sets of tensors
3.3. Notion of slice
3.3.1. Fibers
3.3.2. Matrix and tensor slices
3.4. Mode combination
3.5. Partitioned tensors or block tensors
3.6. Diagonal tensors
3.6.1. Case of a tensor X ∈ K^[N;I]
3.6.2. Case of a square tensor
3.6.3. Case of a rectangular tensor
3.7. Matricization
3.7.1. Matricization of a third-order tensor
3.7.2. Matrix unfoldings and index convention
3.7.3. Matricization of a tensor of order N
3.7.4. Tensor matricization by index blocks
3.8. Subspaces associated with a tensor and multilinear rank
3.9. Vectorization
3.9.1. Vectorization of a tensor of order N
3.9.2. Vectorization of a third-order tensor
3.10. Transposition
3.10.1. Definition of a transpose tensor
3.10.2. Properties of transpose tensors
3.10.3. Transposition and tensor contraction
3.11. Symmetric/partially symmetric tensors
3.11.1. Symmetric tensors
3.11.2. Partially symmetric/Hermitian tensors
3.11.3. Multilinear forms with Hermitian symmetry and Hermitian tensors
3.11.4. Symmetrization of a tensor
3.12. Triangular tensors
3.13. Multiplication operations
3.13.1. Outer product of tensors
3.13.2. Tensor–matrix multiplication
3.13.3. Tensor–vector multiplication
3.13.4. Mode-(p,n) product
3.13.5. Einstein product
3.14. Inverse and pseudo-inverse tensors
3.15. Tensor decompositions in the form of factorizations
3.15.1. Eigendecomposition of a symmetric square tensor
3.15.2. SVD decomposition of a rectangular tensor
3.15.3. Connection between SVD and HOSVD
3.15.4. Full-rank decomposition
3.16. Inner product, Frobenius norm and trace of a tensor
3.16.1. Inner product of two tensors
3.16.2. Frobenius norm of a tensor
3.16.3. Trace of a tensor
3.17. Tensor systems and homogeneous polynomials
3.17.1. Multilinear systems based on the mode-n product
3.17.2. Tensor systems based on the Einstein product
3.17.3. Solving tensor systems using LS
3.18. Hadamard and Kronecker products of tensors
3.19. Tensor extension
3.20. Tensorization
3.21. Hankelization

Chapter 4. Eigenvalues and Singular Values of a Tensor
4.1. Introduction
4.2. Eigenvalues of a tensor of order greater than two
4.2.1. Different definitions of the eigenvalues of a tensor
4.2.2. Positive/negative (semi-)definite tensors
4.2.3. Orthogonally/unitarily similar tensors
4.3. Best rank-one approximation
4.4. Orthogonal decompositions
4.5. Singular values of a tensor

Chapter 5. Tensor Decompositions
5.1. Introduction
5.2. Tensor models
5.2.1. Tucker model
5.2.2. Tucker-(N1, N) model
5.2.3. Tucker model of a transpose tensor
5.2.4. Tucker decomposition and multidimensional Fourier transform
5.2.5. PARAFAC model
5.2.6. Block tensor models
5.2.7. Constrained tensor models
5.3. Examples of tensor models
5.3.1. Model of multidimensional harmonics
5.3.2. Source separation
5.3.3. Model of a FIR system using fourth-order output cumulants

Introduction

The first book of this series was dedicated to introducing matrices and tensors (of order greater than two) from the perspective of their algebraic structure, presenting their similarities, differences and connections with representations of linear, bilinear and multilinear mappings. This second volume will now study tensor operations and decompositions in greater depth.

In this introduction, we will motivate the use of tensors by answering five questions that prospective users might and should ask:

– What are the advantages of tensor approaches?
– For what uses?
– In what fields of application?
– With what tensor decompositions?
– With what cost functions and optimization algorithms?

Although our answers are necessarily incomplete, our aim is to:

– present the advantages of tensor approaches over matrix approaches;
– show a few examples of how tensor tools can be used;
– give an overview of the extensive diversity of problems that can be solved using tensors, including a few example applications;
– introduce the three most widely used tensor decompositions, presenting some of their properties and comparing their parametric complexity;
– state a few problems based on tensor models in terms of the cost functions to be optimized;
– describe various types of tensor-based processing, with a brief glimpse of the optimization methods that can be used.

I.1. What are the advantages of tensor approaches?

In most applications, a tensor X of order N is viewed as an array of real or complex numbers. The current element of the tensor is denoted x_{i1,...,iN}, where each index in ∈ {1, ..., In}, for n ∈ {1, ..., N}, is associated with the nth mode, and In is its dimension, i.e. the number of elements for the nth mode. The order of the tensor is the number N of indices, i.e. the number of modes. Tensors are written with calligraphic letters¹. An Nth-order tensor with entries x_{i1,...,iN} is written X = [x_{i1,...,iN}] ∈ K^{I1×···×IN}, where K = R or C, depending on whether the tensor is real-valued or complex-valued, and I1 × ··· × IN represents the size of X.
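To make this notation concrete, here is a minimal sketch (in Python with NumPy, which is not part of the book; sizes and values are arbitrary) showing a third-order tensor, its order and its dimensions.

    import numpy as np

    # A third-order tensor X of size I1 x I2 x I3 with K = R (real-valued entries)
    I1, I2, I3 = 4, 3, 2
    X = np.random.randn(I1, I2, I3)

    print(X.ndim)       # order N = 3, i.e. the number of modes (indices)
    print(X.shape)      # dimensions (I1, I2, I3) = (4, 3, 2)
    print(X[0, 1, 1])   # element x_{1,2,2} in the book's 1-based index notation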

In general, a mode (also called a way) can have one of the following interpretations: (i) as a source of information (user, patient, client, trial, etc.); (ii) as a type of entity attached to the data (items/products, types of music, types of film, etc.); (iii) as a tag that characterizes an item, a piece of music, a film, etc.; (iv) as a recording modality that captures diversity in various domains (space, time, frequency, wavelength, polarization, color, etc.). Thus, a digital image in color can be represented as a three-dimensional tensor (of pixels) with two spatial modes, one for the rows (width) and one for the columns (height), and one channel mode (RGB colors). For example, a color image can be represented as a tensor of size 1024 × 768 × 3, where the third mode corresponds to the intensity of the three RGB colors (red, green, blue). For a volumetric image, there are three spatial modes (width × height × depth), and the points of the image are called voxels. In the context of hyperspectral imagery, in addition to the two spatial dimensions, there is a third dimension corresponding to the emission wavelength within a spectral band.

Tensor approaches benefit from the following advantages over matrix approaches:

– the essential uniqueness property², satisfied by some tensor decompositions, such as PARAFAC (parallel factors) (Harshman 1970) under certain mild conditions; for matrix decompositions, this property requires certain restrictive conditions on the factor matrices, such as orthogonality, non-negativity, or a specific structure (triangular, Vandermonde, Toeplitz, etc.);

– the ability to solve certain problems, such as the identification of communication channels, directly from measured signals, without requiring the calculation of high-order statistics of these signals or the use of long pilot sequences. The resulting deterministic and semi-blind processings can be performed with signal recordings that are shorter than those required by statistical methods, based on the estimation of high-order moments or cumulants. For the blind source separation problem, tensor approaches can be used to tackle the case of underdetermined systems, i.e. systems with more sources than sensors;

– the possibility of compressing big data sets via a data tensorization and the use of a tensor decomposition, in particular, a low multilinear rank approximation;

– a greater flexibility in representing and processing multimodal data by considering the modalities separately, instead of stacking the corresponding data into a vector or a matrix. This allows the multilinear structure of data to be preserved, meaning that interactions between modes can be taken into account;

– a greater number of modalities can be incorporated into tensor representations of data, meaning that more complementary information is available, which allows the performance of certain systems to be improved, e.g. wireless communication, recommendation, diagnostic, and monitoring systems, by making detection, interpretation, recognition, and classification operations easier and more efficient. This led to a generalization of certain matrix algorithms, like SVD (singular value decomposition) to MLSVD (multilinear SVD), also known as HOSVD (higher-order SVD) (de Lathauwer et al. 2000a); similarly, certain signal processing algorithms were generalized, like PCA (principal component analysis) to MPCA (multilinear PCA) (Lu et al. 2008) or TRPCA (tensor robust PCA) (Lu et al. 2020), and ICA (independent component analysis) to MICA (multilinear ICA) (Vasilescu and Terzopoulos 2005) or tensor PICA (probabilistic ICA) (Beckmann and Smith 2005).

1 Scalars, vectors, and matrices are written in lowercase, bold lowercase, and bold uppercase letters, respectively: a, a, A.
2 A decomposition satisfies the essential uniqueness property if it is unique up to permutation and scaling factors in the columns of its factor matrices.

It is worth noting that, with a tensor model, the number of modalities considered in a problem can be increased either by increasing the order of the data tensor or by coupling tensor and/or matrix decompositions that share one or several modes. Such a coupling approach is called data fusion using a coupled tensor/matrix factorization. Two examples of this type of coupling are presented later in this introductory chapter. In the first, EEG signals are coupled with functional magnetic resonance imaging (fMRI) data to analyze the brain function; in the second, hyperspectral and multispectral images are merged for remote sensing.

The other approach, namely, increasing the number of modalities, will be illustrated in Volume 3 of this series by giving a unified presentation of various models of wireless communication systems designed using tensors. In order to improve system performance, both in terms of transmission and reception, the idea is to employ multiple types of diversity simultaneously in various domains (space, time, frequency, code, etc.), each type of diversity being associated with a mode of the tensor of received signals. Coupled tensor models will also be presented in the context of cooperative communication systems with relays.

I.2. For what uses?

In the big data³ era, digital information processing plays a key role in various fields of application. Each field has its own specificities and requires specialized, often multidisciplinary, skills to manage both the multimodality of the data and the processing techniques that need to be implemented. Thus, the “intelligent” information processing systems of the future will have to integrate representation tools, such as tensors and graphs, and signal and image processing methods, with artificial intelligence techniques based on artificial neural networks and machine learning.

The needs of such systems are diverse and numerous – whether in terms of storage, visualization (3D representation, virtual reality, dissemination of works of art), transmission, imputation, prediction/forecasting, analysis, classification or fusion of multimodal and heterogeneous data. The reader is invited to refer to Lahat et al. (2015) and Papalexakis et al. (2016) for a presentation of various examples of data fusion and data mining based on tensor models.

Some of the key applications of tensor tools are as follows:

– decomposition or separation of heterogeneous data sets into components/factors or subspaces with the goal of exploiting the multimodal structure of the data and extracting useful information for users from uncertain or noisy data or measurements provided by different sources of information and/or types of sensor. Thus, features can be extracted in different domains (spatial, temporal, frequential) for classification and decision-making tasks;

– imputation of missing data within an incomplete database using a low-rank tensor model, where the missing data results from defective sensors or communication links, for example. This task is called tensor completion and is a higher-order generalization of matrix completion (Candès and Recht 2009; Signoretto et al. 2011; Liu et al. 2013);

– recovery of useful information from compressed data by reconstructing a signal or an image that has a sparse representation in a predefined basis, using compressive sampling (CS; also known as compressed sensing) techniques (Candès and Wakin 2008; Candès and Plan 2010), applied to sparse, low-rank tensors (Sidiropoulos and Kyrillidis 2012);

– fusion of data using coupled tensor and matrix decompositions;

– design of cooperative multi-antenna communication systems (also called MIMO (multiple-input multiple-output) systems); this type of application, which led to the development of several new tensor models, will be considered in the next two volumes of this series;

– multilinear compressive learning that combines compressed sensing with machine learning;

– reduction of the dimensionality of multimodal, heterogeneous databases with very large dimensions (big data) by solving a low-rank tensor approximation problem;

– multiway filtering and tensor data denoising.

3 Big data is characterized by 3Vs (Volume, Variety, Velocity) linked to the size of the data set, the heterogeneity of the data and the rate at which it is captured, stored and processed.

Tensors can also be used to tensorize neural networks with fully connected layers by expressing the weight matrix of a layer as a tensor train (TT) whose cores represent the parameters of the layer. This considerably reduces the parametric complexity and, therefore, the storage space. This compression property of the information contained in layered neural networks when using tensor decompositions provides a way to increase the number of hidden units (Novikov et al. 2015). Tensors, when used together with multilayer perceptron neural networks to solve classification problems, achieve lower error rates with fewer parameters and less computation time than neural networks alone (Chien and Bao 2017). Neural networks can also be used to learn the rank of a tensor (Zhou et al. 2019), or to compute its eigenvalues and singular values, and hence the rank-one approximation of a tensor (Che et al. 2017).

I.3. In what fields of application?

Tensors have applications in many domains. The fields of psychometrics and chemometrics in the 1970s and 1990s paved the way for signal and image processing applications, such as blind source separation, digital communications, and computer vision in the 1990s and early 2000s. Today, there is a quantitative explosion of big data in medicine, astronomy and meteorology, with fifth-generation wireless communications (5G), for medical diagnostic aid, web services delivered by recommendation systems (video on demand, online sales, restaurant and hotel reservations, etc.), as well as for information searching within multimedia databases (texts, images, audio and video recordings) and with social networks. This explains why various scientific communities and the industrial world are showing a growing interest in tensors.

Among the many examples of applications of tensors for signal and image processing, we can mention:

– blind source separation and blind system identification. These problems play a fundamental role in signal processing. They involve separating the input signals (also called sources) and identifying a system from the knowledge of only the output signals and certain hypotheses about the input signals, such as statistical independence in the case of independent component analysis (Comon 1994), or the assumption of a finite alphabet in the context of digital communications. This type of processing is, in particular, used to jointly estimate communication channels and information symbols emitted by a transmitter. It can also be used for speech or music separation, or to process seismic signals;

– use of tensor decompositions to analyze biomedical signals (EEG, MEG, ECG, EOG⁴) in the space, time and frequency domains, in order to provide a medical diagnostic aid; for instance, Acar et al. (2007) used a PARAFAC model of EEG signals to analyze epileptic seizures; Becker et al. (2014) used the same type of decomposition to locate sources within EEG signals;

– analysis of brain activity by merging imaging data (fMRI) and biomedical signals (EEG and MEG) with the goal of enabling non-invasive medical tests (see Table I.4);

– analysis and classification of hyperspectral images used in many fields (medicine, environment, agriculture, monitoring, astrophysics, etc.). To improve the spatial resolution of hyperspectral images, Li et al. (2018) merged hyperspectral and multispectral images using a coupled Tucker decomposition with a sparse core (coupled sparse tensor factorization (CSTF)) (see Table I.4);

– design of semi-blind receivers for point-to-point or cooperative MIMO communication systems based on tensor models; see the overviews by de Almeida et al. (2016) and da Costa et al. (2018);

– modeling and identification of nonlinear systems via a tensor representation of Volterra kernels or Wiener–Hammerstein systems (see, for example, Kibangou and Favier 2009a, 2010; Favier and Kibangou 2009; Favier and Bouilloc 2009, 2010; Favier et al. 2012a);

– identification of tensor-based separable trilinear systems that are linear with respect to (w.r.t.) the input signal and trilinear w.r.t. the coefficients of the global impulse response, modeled as a Kronecker product of three individual impulse responses (Elisei-Iliescu et al. 2020). Note that such systems are to be compared with third-order Volterra filters that are linear w.r.t. the Volterra kernel coefficients and trilinear w.r.t. the input signal;

– facial recognition, based on face tensors, for purposes of authentication and identification in surveillance systems. For facial recognition, photos of people to recognize are stored in a database with different lighting conditions, different facial expressions, from multiple angles, for each individual. In Vasilescu and Terzopoulos (2002), the tensor of facial images is of order five, with dimensions 28 × 5 × 3 × 3 × 7943, corresponding to the modes: people × views × illumination × expressions × pixels per image. For an overview of various facial recognition systems, see Arachchilage and Izquierdo (2020);

– tensor-based anomaly detection used in monitoring and surveillance systems.

4 Electroencephalography (EEG), magnetoencephalography (MEG), electrocardiography (ECG) and electrooculography (EOG).

Table I.1 presents a few examples of signal and image tensors, specifying the nature of the modes in each case.

Signals (modes and references):
– Antenna processing: space (antennas) × time × sensor subnetwork (Sidiropoulos et al. 2000a); space × time × polarization (Raimondi et al. 2017)
– Digital communications: space (antennas) × time × code (Sidiropoulos et al. 2000b); antennas × blocks × symbol periods × code × frequencies (Favier and de Almeida 2014b)
– ECG: space (electrodes) × time × frequencies (Acar et al. 2007; Padhy et al. 2019)
– EEG: space (electrodes) × time × frequencies × subjects or trials (Becker et al. 2014; Cong et al. 2015)
– EEG + fMRI: subjects × electrodes × time + subjects × voxels (model with matrix and tensor factorizations coupled via the “subjects” mode) (Acar et al. 2017)

Images (modes and references):
– Color images: space (width) × space (height) × channel (colors)
– Videos in grayscale: space (width) × space (height) × time
– Videos in color: space × space × channel × time
– Hyperspectral images: space × space × spectral bands (Makantasis et al. 2018)
– Computer vision: people × views × illumination × expressions × pixels (Vasilescu and Terzopoulos 2002)

Table I.1. Signal and image tensors

Other fields of application are considered in Table I.2.

Below, we give some details about the application concerning recommendation systems, which play an important role in various websites. The goal of these systems is to help users to select items from tags that have been assigned to each item by users. These items could, for example, be movies, books, musical recordings, web pages, products for sale on an e-commerce site, etc. A standard recommendation system is based on the three following modes: users × items × tags.

Collaborative filtering techniques use the opinions of a set of people, or assessments from these people based on a rating system, to generate a list of recommendations for a specific user. This type of filtering is, for example, used by websites like Netflix for renting DVDs. Collaborative filtering methods are classified into three categories, depending on whether the filtering is based on (a) history and a similarity metric; (b) a model based on matrix factorization using algorithms like SVD or non-negative matrix factorization (NMF); (c) some combination of both, known as hybrid collaborative filtering techniques. See Luo et al. (2014) and Bokde et al. (2015) for approaches based on matrix factorization.

Other so-called passive filtering techniques exploit the data of a matrix of relations between items to deduce recommendations for a user from correlations between items and the user’s previous choices, without using any kind of rating system. This is known as a content-based approach.

Domains (modes and references):
– Phonetics: subjects × vowels × formants (Harshman 1970)
– Chemometrics (fluorescence): excitation × emission × samples (excitation/emission wavelengths) (Bro 1997, 2006; Smilde et al. 2004)
– Contextual recommendation systems: users × items × tags × context1 × ··· × contextN (Rendle and Schmidt-Thieme 2010; Symeonidis and Zioupos 2016; Frolov and Oseledets 2017)
– Transportation (speed measurements): space (sensors) × time (days) × time (weeks) (periods of 15 s and 24 h) (Goulart et al. 2017; Tan et al. 2013; Ran et al. 2016)
– Music: types of music × frequencies × frequencies (Panagakis et al. 2010); users × keywords × songs (Nanopoulos et al. 2010); recordings × (audio) characteristics × segments (Benetos and Kotropoulos 2008)
– Bioinformatics: medicine × targets × diseases (Wang et al. 2019)

Table I.2. Other fields of application

Recommendation systems can also use information about the users (age, nationality, geographic location, participation on social networks, etc.) and the items themselves (types of music, types of film, classes of hotels, etc.). This is called contextual information. Taking this additional information into account allows the relevance of the recommendations to be improved, at the cost of increasing the dimensionality and the complexity of the data representation model and, therefore, of the processing algorithms. This is why tensor approaches are so important for this type of application today. Note that, for recommendation systems, the data tensors are sparse. Consequently, some tags can be automatically generated by the system based on similarity metrics between items. This is, for example, the case for music recommendations based on the acoustic characteristics of songs (Nanopoulos et al. 2010). Personalized tag recommendations take into account the user’s profile, preferences, and interests. The system can also help the user select existing tags or create new ones (Rendle and Schmidt-Thieme 2010).

The articles by Bobadilla et al. (2013) and Frolov and Oseledets (2017) present various recommendation systems with many bibliographical references. Operating according to a similar principle as recommendation systems, social network websites, such as Wikipedia, Facebook, or Twitter, allow different types of data to be exchanged and shared, content to be produced and connections to be established.

I.4. With what tensor decompositions?

It is important to note that, for an Nth-order tensor X ∈ K^{I1×···×IN}, the number of elements is ∏_{n=1}^{N} In, and, assuming In = I for n ∈ {1, ..., N}, this number becomes I^N, which induces an exponential increase with the tensor order N. This is called the curse of dimensionality (Oseledets and Tyrtyshnikov 2009). For big data tensors, tensor decompositions play a fundamental role in alleviating this curse of dimensionality, due to the fact that the number of parameters that characterize the decompositions is generally much smaller than the number of elements in the original tensor.

We now introduce three basic decompositions: PARAFAC/CANDECOMP/CPD, TD and TT⁵. The first two are studied in depth in Chapter 5, whereas the third, briefly introduced in Chapter 3, will be considered in more detail in Volume 3.

5 PARAFAC for parallel factors; CANDECOMP for canonical decomposition; CPD for canonical polyadic decomposition; TD for Tucker decomposition; TT for tensor train.

Table I.3 gives the expression of the element x_{i1,...,iN} of a tensor X ∈ K^{I1×···×IN} of order N and size I1 × ··· × IN, either real (K = R) or complex (K = C), for each of the three decompositions cited above. Their parametric complexity is compared in terms of the size of each matrix and tensor factor, assuming In = I and Rn = R for all n ∈ {1, ..., N}.

Figures I.1–I.3 show graphical representations of the PARAFAC model [[A^(1), A^(2), A^(3); R]] and the TD model [[G; A^(1), A^(2), A^(3)]] for a third-order tensor X ∈ K^{I1×I2×I3}, and of the TT model [[A^(1), A^(2), A^(3), A^(4)]] for a fourth-order tensor X ∈ K^{I1×I2×I3×I4}. In the case of the PARAFAC model, we define A^(n) = [a_1^(n), ..., a_R^(n)] ∈ K^{In×R} using its columns, for n ∈ {1, 2, 3}.

Figure I.1. Third-order PARAFAC model

We can make a few remarks about each of these decompositions:

– The PARAFAC decomposition (Harshman 1970), also known as CANDECOMP (Carroll and Chang 1970) or CPD (Hitchcock 1927), of an Nth-order tensor X is a sum of R rank-one tensors, each defined as the outer product of one column from each of the N matrix factors A^(n) ∈ K^{In×R}. When R is minimal, it is called the rank of the tensor. If the matrix factors satisfy certain conditions, this decomposition has the essential uniqueness property. See Figure I.1 for a third-order tensor (N = 3), and Chapter 5 for a detailed presentation.
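As an illustration of this sum of rank-one terms, the following sketch (in Python with NumPy, not from the book; dimensions, rank and factor values are arbitrary) builds a third-order CPD tensor from its factor matrices and checks one entry against the element-wise formula x_{i,j,k} = Σ_r a_{i,r} b_{j,r} c_{k,r}.

    import numpy as np

    I1, I2, I3, R = 4, 3, 5, 2          # dimensions and rank (arbitrary)
    A = np.random.randn(I1, R)          # factor A^(1)
    B = np.random.randn(I2, R)          # factor A^(2)
    C = np.random.randn(I3, R)          # factor A^(3)

    # Sum of R rank-one tensors: outer product of one column of each factor
    X = sum(np.einsum('i,j,k->ijk', A[:, r], B[:, r], C[:, r]) for r in range(R))

    # Element-wise check of x_{i,j,k} = sum_r a_{i,r} b_{j,r} c_{k,r}
    i, j, k = 1, 2, 3
    assert np.isclose(X[i, j, k], np.sum(A[i, :] * B[j, :] * C[k, :]))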

Table I.3. Parametric complexity of the CPD, TD and TT decompositions

Figure I.2. Third-order Tucker model

Figure I.3. Fourth-order TT model

– The Tucker decomposition (Tucker 1966) can be viewed as a generalization of the PARAFAC decomposition that takes into account all the interactions between the columns of the matrix factors A^(n) ∈ K^{In×Rn} via the introduction of a core tensor G ∈ K^{R1×···×RN}. This decomposition is not unique in general. Note that, if Rn ≤ In for all n ∈ {1, ..., N}, then the core tensor G provides a compressed form of X. If Rn, for n ∈ {1, ..., N}, is chosen as the rank of the mode-n matrix unfolding⁶ of X, then the N-tuple (R1, ..., RN) is minimal, and it is called the multilinear rank of the tensor.

Such a Tucker decomposition can be obtained using the truncated higher-order SVD (THOSVD), under the constraint of column-orthonormal matrices A^(n) (de Lathauwer et al. 2000a). This algorithm is described in section 5.2.1.8.

See Figure I.2 for a third-order tensor, and Chapter 5 for a detailed presentation.
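To make the Tucker model concrete, here is a minimal sketch (Python with NumPy, not from the book; sizes and values are arbitrary) that reconstructs a third-order tensor from a core G and factor matrices A^(1), A^(2), A^(3) via the three mode-n products.

    import numpy as np

    I1, I2, I3 = 6, 5, 4
    R1, R2, R3 = 3, 2, 2                 # multilinear dimensions of the core
    G = np.random.randn(R1, R2, R3)      # core tensor
    A1 = np.random.randn(I1, R1)         # factor matrices A^(n)
    A2 = np.random.randn(I2, R2)
    A3 = np.random.randn(I3, R3)

    # x_{i1,i2,i3} = sum_{r1,r2,r3} g_{r1,r2,r3} a1_{i1,r1} a2_{i2,r2} a3_{i3,r3}
    X = np.einsum('abc,ia,jb,kc->ijk', G, A1, A2, A3)
    print(X.shape)   # (6, 5, 4): the core provides a compressed form when Rn <= In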

– The TT decomposition (Oseledets 2011) is composed of a train of third-order tensors A^(n) ∈ K^{R_{n-1}×In×Rn}, for n ∈ {2, 3, ..., N-1}, the first and last carriages of the train being matrices A^(1) ∈ K^{I1×R1} and A^(N) ∈ K^{R_{N-1}×IN}, which implies R0 = RN = 1, so that the first and last cores reduce to matrices. The dimensions Rn, for n ∈ {1, ..., N-1}, called the TT ranks, are given by the ranks of some matrix unfoldings of the original tensor.
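The following sketch (Python with NumPy, not from the book; ranks, sizes and values are arbitrary) reconstructs a fourth-order tensor from a TT model with matrices as the first and last carriages, contracting the carriages from left to right.

    import numpy as np

    I1, I2, I3, I4 = 4, 3, 3, 2
    R1, R2, R3 = 2, 3, 2                       # TT ranks (R0 = R4 = 1)
    A1 = np.random.randn(I1, R1)               # first carriage: matrix I1 x R1
    A2 = np.random.randn(R1, I2, R2)           # third-order cores of size R_{n-1} x In x Rn
    A3 = np.random.randn(R2, I3, R3)
    A4 = np.random.randn(R3, I4)               # last carriage: matrix R3 x I4

    # x_{i1,i2,i3,i4} = sum over r1,r2,r3 of A1[i1,r1] A2[r1,i2,r2] A3[r2,i3,r3] A4[r3,i4]
    X = np.einsum('ia,ajb,bkc,cl->ijkl', A1, A2, A3, A4)
    print(X.shape)   # (4, 3, 3, 2)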

This decomposition has been used to solve the tensor completion problem (Grasedyck et al. 2015; Bengua et al. 2017), for facial recognition (Brandoni and Simoncini 2020) and for modeling MIMO communication channels (Zniyed et al. 2020), among many other applications. A brief description of the TT decomposition is given in section 3.13.4 using the mode-(p,n) product. Note that a specific SVD-based algorithm, called TT-SVD, was proposed by Oseledets (2011) for computing a TT decomposition.

This decomposition and the hierarchical Tucker (HT) one (Grasedyck and Hackbusch 2011; Ballani et al. 2013) are special cases of tensor networks (TNs) (Cichocki 2014), as will be discussed in more detail in the next volume.

6 See definition [3.41], in Chapter 3, of the mode-n matrix unfolding Xn of a tensor X, whose columns are the mode-n vectors obtained by fixing all indices except the nth one.

From this brief description of the three tensor models, one can conclude that, unlike matrices, the notion of rank is not unique for tensors, since it depends on the decomposition used. Thus, as mentioned above, one defines the tensor rank (also called the canonical rank or Kruskal's rank) associated with the PARAFAC decomposition, the multilinear rank that relies on the Tucker model, and the TT ranks linked with the TT decomposition.

It is important to note that the number of characteristic parameters of the PARAFAC and TT decompositions is proportional to N, the order of the tensor, whereas the parametric complexity of the Tucker decomposition increases exponentially with N. This is why the first two decompositions are especially valuable for large-scale problems. Although the Tucker model is not unique in general, imposing an orthogonality constraint on the matrix factors yields the HOSVD decomposition, a truncated form of which gives an approximate solution to the best low multilinear rank approximation problem (de Lathauwer et al. 2000a). This solution, which is based on an a priori choice of the dimensions Rn of the core tensor, is to be compared with the truncated SVD in the matrix case, although it does not have the same optimality property. It is widely used to reduce the parametric complexity of data tensors.
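A quick numerical comparison illustrates this growth. The sketch below (Python, not from the book) counts parameters for In = I and Rn = R, using counts consistent with the factor sizes described above: NIR for CPD, NIR + R^N for Tucker, and 2IR + (N-2)IR² for a TT with matrices as end carriages; the exact entries of Table I.3 may be presented differently.

    def cpd_params(N, I, R):
        return N * I * R                        # N factor matrices of size I x R

    def tucker_params(N, I, R):
        return N * I * R + R ** N               # N factors plus the R x ... x R core

    def tt_params(N, I, R):
        return 2 * I * R + (N - 2) * I * R * R  # two end matrices plus N - 2 cores

    N, I, R = 10, 100, 5
    print(cpd_params(N, I, R))      # 5000     : linear in N
    print(tucker_params(N, I, R))   # 9770625  : dominated by the R^N core
    print(tt_params(N, I, R))       # 21000    : linear in N as well
    print(I ** N)                   # 10^20 elements in the full tensor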

From the above, it can be concluded that the TT model combines the advantages of the other two decompositions, in terms of parametric complexity (like PARAFAC) and numerical stability (like the Tucker model), due to a parameter estimation algorithm based on a calculation of SVDs.

To illustrate the use of the PARAFAC decomposition, let us consider the case of multi-user mobile communications with a CDMA (code-division multiple access) encoding system. The multiple access technique allows multiple emitters to simultaneously transmit information over the same communication channel by assigning a code to each emitter. The information is transmitted as symbols s_{n,m}, with n ∈ {1, ..., N} and m ∈ {1, ..., M}, where N and M are the number of transmission time slots, i.e. the number of symbol periods, and the number of emitting antennas, respectively. The symbols belong to a finite alphabet that depends on the modulation being used. They are encoded with a space-time coding that introduces code diversity by repeating each symbol P times with a code c_{p,m} assigned to the mth emitting antenna, p ∈ {1, ..., P}, where P denotes the length of the spreading code. The signal received by the kth receiving antenna, during the nth symbol period and the pth chip period, is a linear combination of the symbols encoded and transmitted by the M emitting antennas:

x_{k,n,p} = Σ_{m=1}^{M} h_{k,m} s_{n,m} c_{p,m},   [I.1]

where h_{k,m} is the fading coefficient of the communication channel between the receiving antenna k and the emitting antenna m.

The received signals, which are complex-valued, therefore form a third-order tensor X ∈ C^{K×N×P} whose modes are space × time × code, associated with the indices (k, n, p). This signal tensor satisfies a PARAFAC decomposition [[H, S, C; M]] whose rank is equal to the number M of emitting antennas and whose matrix factors are the channel matrix (H ∈ C^{K×M}), the matrix of transmitted symbols (S ∈ C^{N×M}) and the coding matrix (C ∈ C^{P×M}). This example is a simplified form of the DS-CDMA (direct-sequence CDMA) system proposed by Sidiropoulos et al. (2000b).
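The following sketch (Python with NumPy, not from the book; all sizes, the toy symbol alphabet and the matrix values are arbitrary placeholders) simulates a noiseless received-signal tensor according to equation [I.1], which is exactly the CPD built from the factors H, S and C.

    import numpy as np

    K, N, P, M = 4, 50, 8, 3                                   # antennas, symbols, chips, emitters
    H = np.random.randn(K, M) + 1j * np.random.randn(K, M)     # channel fading coefficients h_{k,m}
    S = np.sign(np.random.randn(N, M)) + 0j                    # BPSK-like symbols s_{n,m} (toy alphabet)
    C = np.random.randn(P, M) + 1j * np.random.randn(P, M)     # spreading codes c_{p,m}

    # Equation [I.1]: x_{k,n,p} = sum_m h_{k,m} s_{n,m} c_{p,m}
    X = np.einsum('km,nm,pm->knp', H, S, C)

    print(X.shape)   # (4, 50, 8): modes space x time x code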

I.5. With what cost functions and optimization algorithms?

We will now briefly describe the most common processing operations carried out with tensors, as well as some of the optimization algorithms that are used. It is important to first present the preprocessing operations that need to be performed. Preprocessing typically involves data centering operations (offset elimination), scaling of non-homogeneous data, suppression of outliers and artifacts, image adjustment (size, brightness, contrast, alignment, etc.), denoising, signal transformation using certain transforms (wavelets, Fourier, etc.), and finally, in some cases, the calculation of statistics of the signals to be processed.

Preprocessing is fundamental, both to improve the quality of the estimated models and, therefore, of the subsequent processing operations, and to avoid numerical problems with optimization algorithms, such as conditioning problems that may cause the algorithms to fail to converge. Centering and scaling preprocessing operations are potentially problematic because they are interdependent and can be combined in several different ways. If data are missing, centering can also reduce the rank of the tensor model. For a more detailed description of these preprocessing operations, see Smilde et al. (2004).

For the processing operations themselves, we can distinguish between several different classes:

– supervised/non-supervised (blind or semi-blind), i.e. with or without training data, for example, to solve classification problems, or when a priori information, called a pilot sequence, is transmitted to the receiver for channel estimation;
– real-time (online)/batch (offline) processing;
– centralized/distributed;
– adaptive/blockwise (with respect to the data);
– with/without coupling of tensor and/or matrix models;
– with/without missing data.

It is important to distinguish batch processing, which is performed to analyze data recorded as signal and image sets, from the real-time processing required by wireless communication systems, recommendation systems, web searches and social networks. In real-time applications, the dimensionality of the model and the algorithmic complexity are predominant factors. The signals received by receiving antennas, the information exchanged between a website and the users and the messages exchanged between the users of a social network are time-dependent. For instance, a recommendation system interacts with the users in real time, via a possible extension of an existing database by means of machine learning techniques. For a description of various applications of tensors for data mining and machine learning, see Anandkumar et al. (2014) and Sidiropoulos et al. (2017).

Tensor-based processings lead to various types of optimization algorithms, as follows:

– constrained/unconstrained optimization;
– iterative/non-iterative, or closed-form;
– alternating/global;
– sequential/parallel.

Furthermore, depending on the information that is available a priori, different types of constraints can be taken into account in the cost function to be optimized: low rank, sparseness, non-negativity, orthogonality and differentiability/smoothness. In the case of constrained optimization, weights need to be chosen in the cost function according to the relative importance of each constraint and the quality of the a priori information that is available.

Table I.4 presents a few examples of cost functions that can be minimized for the parameter estimation of certain third-order tensor models (CPD, Tucker, coupled matrix Tucker (CMTucker) and coupled sparse tensor factorization (CSTF)), for the imputation of missing data in a tensor and for the estimation of a sparse data tensor with a low-rank constraint expressed in the form of the nuclear norm of the tensor.

REMARK I.1.– We can make the following remarks:

– the cost functions presented in Table I.4 correspond to data fitting criteria. These criteria, expressed in terms of tensor and matrix Frobenius norms (||·||_F), are quadratic in the difference between the data tensor X and the output of CPD and TD models, as well as between the data matrix Y and a matrix factorization model, in the case of the CMTucker model. They are trilinear and quadrilinear, respectively, with respect to the parameters of the CPD and TD models to be estimated, and bilinear with respect to the parameters of the matrix factorization model;

– for the missing data imputation problem using a CPD or TD model, the binary tensor W, which has the same size as X, is defined as:

w_{ijk} = 1 if x_{ijk} is known, and w_{ijk} = 0 if x_{ijk} is missing.

The purpose of the Hadamard (element-wise) product of W with the difference between X and the output of the CPD and TD models is to fit the model to the available data only, ignoring any missing data for model estimation (a short numerical sketch of this masked criterion is given after these remarks). This imputation problem, known as the tensor completion problem, was originally dealt with by Tomasi and Bro (2005) and Acar et al. (2011a) using a CPD model, followed by Filipovic and Jukic (2015) using a TD model. Various articles have discussed this problem in the context of different applications. An overview of the literature will be given in the next volume;

Table I.4 lists the cost functions associated with three problems: estimation, imputation, and imputation with a low-rank constraint.

Table I.4. Cost functions for model estimation and recovery of missing data

– for the imputation problem with the low-rank constraint, the term ||X||_* in the cost function replaces the low-rank constraint with the nuclear norm of X, since the function rank(X) is not convex, and the nuclear norm is the closest convex approximation of the rank. In Liu et al. (2013), this term is replaced by Σ_{n=1}^{3} λ_n ||X_n||_*, where X_n represents the mode-n unfolding of X⁷;

– in the case of the CMTucker model, the coupling considered here relates to the first modes of the tensor X and the matrix Y of data via the common matrix factor A.

Coupled matrix and tensor factorization (CMTF) models were introduced in Acar et al. (2011b) by coupling a CPD model with a matrix factorization and using the gradient descent algorithm to estimate the parameters. This type of model was used by Acar et al. (2017) to merge EEG and fMRI data with the goal of analyzing brain activity. The EEG signals are modeled with a normalized CPD model (see Chapter 5), whereas the fMRI data are modeled with a matrix factorization. The data are coupled through the subjects mode (see Table I.1). The cost function to be minimized is given by criterion [I.3], where the column vectors of the matrix factors (A, B, C) have unit norm, Σ is a diagonal matrix whose diagonal elements are the coefficients of the vector σ, and α > 0 is a penalty parameter that allows the importance of the sparseness constraints on the weight vectors (g, σ) to be increased or decreased, modeled by means of the l1 norm. The advantage of merging EEG and fMRI data with the criterion [I.3] is that the acquisition and observation methods are complementary in terms of resolution, since EEG signals have a high temporal resolution but low spatial resolution, while fMRI imaging provides high spatial resolution;

– in the case of the CSTF model (Li et al. 2018), the tensor of high-resolution hyperspectral images (HR-HSI) is represented using a third-order Tucker model that has a sparse core (X = G ×_1 W ×_2 H ×_3 S), with the following modes: space (width) × space (height) × spectral bands. The matrices W ∈ R^{M×nw}, H ∈ R^{N×nh} and S ∈ R^{P×ns} denote the dictionaries for the width, height and spectral modes, composed of nw, nh and ns atoms, respectively, and the core tensor G contains the coefficients relative to the three dictionaries. The matrices W*, H* and S* are spatially and spectrally subsampled versions with respect to each mode. The term λ is a regularization parameter for the sparseness constraint on the core tensor, expressed in terms of the l1 norm of G.
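As announced in the remark on missing data imputation, here is a minimal sketch (Python with NumPy, not from the book; data, mask and CPD factors are arbitrary) that evaluates the masked data-fitting term ||W ⊙ (X − X̂)||_F², where ⊙ denotes the Hadamard product and X̂ is the output of a third-order CPD model; only the observed entries contribute to the criterion.

    import numpy as np

    I, J, K, R = 5, 4, 3, 2
    X = np.random.randn(I, J, K)                  # data tensor with missing entries
    W = np.random.rand(I, J, K) > 0.3             # binary mask: True where x_{ijk} is known
    X[~W] = np.nan                                # missing values play no role in the criterion

    A, B, C = (np.random.randn(d, R) for d in (I, J, K))   # current CPD factor estimates
    X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)            # CPD model output

    # Masked data-fitting criterion: only the observed entries contribute
    residual = np.where(W, X - X_hat, 0.0)
    cost = np.sum(np.abs(residual) ** 2)
    print(cost)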

The criteria listed in Table I.4 can be globally minimized using a nonlinear optimization method such as a gradient descent algorithm (with fixed or optimal step size), or the Gauss–Newton and Levenberg–Marquardt algorithms, the latter being a regularized form of the former. In the case of constrained optimization, the augmented Lagrangian method is very often used, as it allows the constrained optimization problem to be transformed into a sequence of unconstrained optimization problems.

The drawbacks of these optimization methods include slow convergence for gradient-type algorithms and high numerical complexity for the Gauss–Newton and Levenberg–Marquardt algorithms due to the need to compute the Jacobian matrix of the criterion w.r.t. the parameters being estimated, as well as the inverse of a large matrix.

7 See definition [3.41] of the unfolding Xn, and definitions [1.65] and [1.67] of the Frobenius norm (||·||_F) and the nuclear norm (||·||_*) of a matrix; for a tensor, see section 3.16.

Alternating optimization methods are therefore often used instead of a global optimization w.r.t. all matrix and tensor factors to be estimated. These iterative methods perform a sequence of separate optimizations of criteria that are linear in each unknown factor, while fixing the other factors at the values estimated at previous iterations. An example is the standard ALS (alternating least squares) algorithm, presented in Chapter 5 for estimating PARAFAC models. For constrained optimization, the alternating direction method of multipliers (ADMM) is often used (Boyd et al. 2011).
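To give a flavor of this alternating scheme, the sketch below (Python with NumPy, not from the book; it is a generic ALS iteration for a third-order CPD, not the specific algorithm detailed in Chapter 5) updates each factor in turn by a linear least squares step on the corresponding mode-n unfolding, using the Khatri–Rao product of the two factors that are kept fixed.

    import numpy as np

    def khatri_rao(A, B):
        # Column-wise Kronecker product of A (I x R) and B (J x R) -> (I*J) x R
        return np.einsum('ir,jr->ijr', A, B).reshape(A.shape[0] * B.shape[0], -1)

    def als_cpd(X, R, n_iter=50):
        I, J, K = X.shape
        A, B, C = (np.random.randn(d, R) for d in (I, J, K))
        X1 = X.reshape(I, J * K)                       # mode-1 unfolding
        X2 = X.transpose(1, 0, 2).reshape(J, I * K)    # mode-2 unfolding
        X3 = X.transpose(2, 0, 1).reshape(K, I * J)    # mode-3 unfolding
        for _ in range(n_iter):
            A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)   # LS update of A with B, C fixed
            B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)   # LS update of B with A, C fixed
            C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)   # LS update of C with A, B fixed
        return A, B, C

    # Sanity check on a synthetic rank-2 tensor
    A0, B0, C0 = (np.random.randn(d, 2) for d in (5, 4, 3))
    X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
    A, B, C = als_cpd(X, R=2)
    print(np.linalg.norm(X - np.einsum('ir,jr,kr->ijk', A, B, C)))  # should be close to 0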

To complete this introductory chapter, let us outline the key knowledge needed to employ tensor tools, whose presentation constitutes the main objective of this second volume:

– arrangement (also called reshaping) operations that express the data tensor as a vector (vectorization), a matrix (matricization), or a lower-order tensor by combining modes; conversely, the tensorization and Hankelization operations allow us to construct tensors from data contained in large vectors or matrices;

– tensor operations such as transposition, symmetrization, Hadamard and Kronecker products, inversion and pseudo-inversion;

– the notions of eigenvalue and singular value of a tensor;

– tensor decompositions/models, and their uniqueness properties;

– algorithms used to solve dimensionality reduction problems and, hence, best low-rank approximation, parameter estimation and missing data imputation. This algorithmic aspect linked to tensors will be explored in more depth in Volume 3.

I.6. Brief description of content

Tensor operations and decompositions often use matrix tools, so we will begin by reviewing some matrix decompositions in Chapter 1, going into further detail on eigenvalue decomposition (EVD) and SVD, as well as a few of their applications.

The Hadamard, Kronecker and Khatri–Rao matrix products are presented in detail in Chapter 2, together with many of their properties and a few relations between them. To illustrate these operations, we will use them to represent first-order partial derivatives of a function, and to solve matrix equations, such as the Sylvester and Lyapunov equations. This chapter also introduces an index convention that is very useful for tensor computations. This convention, which generalizes Einstein's summation convention (Pollock 2011), will be used to represent various matrix products and to prove some matrix product vectorization formulae, as well as various relations between the Kronecker, Khatri–Rao and Hadamard products. It will be used in Chapter 3 for tensor matricization and vectorization in an original way, as well as in Chapter 5 to establish matrix forms of the Tucker and PARAFAC decompositions.

Chapter 3 presents various sets of tensors before introducing the notions of matrix and tensor slices and of mode combination, on which reshaping operations are based. The key tensor operations listed above are then presented. Several links between products of tensors and systems of tensor equations are also outlined, and some of these systems are solved with the least squares method.

Chapter 4 is dedicated to introducing the notions of eigenvalue and singular value for tensors. The problem of the best rank-one approximation of a tensor is also considered.

In Chapter 5, we will give a detailed presentation of various tensor decompositions, with a particular focus on the basic Tucker and CPD decompositions, which can be viewed as generalizations of matrix SVD to tensors of order greater than two. Block tensor models and constrained tensor models will also be described, as well as certain variants, such as HOSVD and BTD (block term decomposition). CPD-type decompositions are generally used to estimate latent parameters, whereas the Tucker decomposition is often used to estimate modal subspaces and reduce the dimensionality via low multilinear rank approximation and truncated HOSVD.

A description of the ALS algorithm for parameter estimation of PARAFAC models will also be given. The uniqueness properties of the Tucker and CPD decompositions will be presented, as well as the various notions of the rank of a tensor. The chapter will end with illustrations of BTD and CPD decompositions for the tensor modeling of multidimensional harmonics, the problem of source separation in an instantaneous linear mixture, and the modeling and estimation of a finite impulse response (FIR) linear system, using a tensor of fourth-order cumulants of the system output.

High-order cumulants of random signals, which can be viewed as tensors, play a central role in various signal processing applications, as illustrated in Chapter 5. This motivated us to include an Appendix to present a brief overview of some basic results concerning the higher-order statistics (HOS) of random signals, with two applications to the HOS-based estimation of a linear time-invariant system and a homogeneous quadratic system.

Chapter 1. Matrix Decompositions

1.1. Introduction

The goal of this chapter is to give an overview of the most important matrix decompositions, with a more detailed presentation of the eigenvalue decomposition (EVD) and singular value decomposition (SVD), as well as some of their applications. Matrix decompositions (also called factorizations) play a key role in matrix computation, in particular, for computing the pseudo-inverse of a matrix (see section 1.5.4), the low-rank approximation of a matrix (see section 1.5.7), the solution of a system of linear equations using the least squares (LS) method (see section 1.5.9), or for parametric estimation of nonlinear models using the ALS method, as illustrated in Chapter 5 with the estimation of tensor models.

Matrix decompositions have two goals. The first is to factorize a given matrix into structured factor matrices that are easier to invert, and the second is to reduce the dimensionality, in order to reduce both the memory capacity required to store the data and the computational cost of the data processing algorithms. After giving a brief overview of the most common decompositions, we will recall a few results about the eigenvalues of a matrix, and then present the EVD of a square matrix. The use of this decomposition will be illustrated by computing the powers of a matrix, a matrix polynomial, a state transition matrix and the transfer function of a discrete-time linear system.
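As a quick illustration of the first of these applications, here is a minimal sketch (Python with NumPy, not from the book; the matrix is arbitrary) that computes a matrix power through the EVD A = P D P^{-1}, so that A^k = P D^k P^{-1}.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])          # arbitrary diagonalizable matrix

    eigvals, P = np.linalg.eig(A)       # A = P diag(eigvals) P^{-1}
    k = 5
    A_k = P @ np.diag(eigvals ** k) @ np.linalg.inv(P)

    assert np.allclose(A_k, np.linalg.matrix_power(A, k))   # same result as direct computation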

The URV^H decomposition of a rectangular matrix will then be introduced, followed by a presentation of the SVD. The latter can be viewed as a special case of the URV^H decomposition with U and V orthogonal (respectively, unitary) in the case of a real (respectively, complex) matrix and R pseudo-diagonal. The SVD can also be viewed as an extension of the EVD for diagonalizing rectangular matrices.

We will present several results relating to the SVD, such as the links between the SVD and the fundamental subspaces of a matrix and certain matrix norms. Applications of the SVD to compute the pseudo-inverse of a matrix and hence the LS estimator, as well as a low-rank matrix approximation, will also be described. The polar decomposition will be demonstrated using the SVD. The connection between the SVD and principal component analysis (PCA) will be established, with an application to data compression by reducing the dimensionality of a data matrix. The use of the SVD for the blind source separation (BSS) problem will also be considered. Finally, the CUR decomposition of a matrix, which is based on selecting certain columns and rows, will be briefly described.
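Several of these SVD applications can be previewed with a short sketch (Python with NumPy, not from the book; the data matrix is arbitrary): the best rank-R approximation in the Frobenius norm is obtained by truncating the SVD to its R dominant singular triplets, the Eckart–Young result discussed in section 1.5.7.

    import numpy as np

    A = np.random.randn(8, 6)                    # arbitrary data matrix
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    R = 2                                        # target rank
    A_R = U[:, :R] @ np.diag(s[:R]) @ Vt[:R, :]  # truncated SVD: rank-R approximation

    # Approximation error equals the energy of the discarded singular values
    err = np.linalg.norm(A - A_R, 'fro')
    print(np.isclose(err, np.sqrt(np.sum(s[R:] ** 2))))   # True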

1.2. Overview of the most common matrix decompositions

Table 1.1 presents the most common matrix decompositions as products of matrices. These decompositions differ from one another in terms of the structural properties of their factor matrices (diagonal/pseudo-diagonal, upper/lower triangular, orthogonal/unitary).

The EVD, the SVD and the polar decomposition are presented in detail in sections 1.3 and 1.5. The URV^H and CUR decompositions are also described in this chapter. The full-rank decomposition is discussed in Chapter 3, in section 3.15.4. For other decompositions, see Lawson and Hanson (1974), Favier (1982, 2019), Golub and Van Loan (1983), Lancaster and Tismenetsky (1985), Horn and Johnson (1985, 1991) and Meyer (2000), among many others.

We can make the following remarks:

– A square root of a symmetric positive semi-definite matrix A ∈ R^{I×I} is defined as a square matrix S ∈ R^{I×I} such that A = SS^T. We often write S = A^{1/2}. The square root is not unique, since any matrix SQ with Q orthogonal is also a square root of A.

The Cholesky decomposition gives a square root in lower triangular (L) or upper triangular (U) form for a symmetric positive semi-definite matrix A. The matrix L is computed row by row, from left to right and top to bottom, whereas U is computed column by column, from right to left and bottom to top. See Favier (1982) for a detailed presentation. The factors L and U are unique if A is positive definite. In the complex case, Cholesky decompositions are expressed in the form A = LL^H = UU^H.

The UD decomposition is obtained by modifying the Cholesky decomposition so that the factor matrices L and U are unit lower triangular and unit upper triangular, respectively, and D is diagonal.

– The Schur decomposition is written as A = UTU^H (respectively, A = QTQ^T), where U is unitary (respectively, Q is orthogonal), and T is upper or lower triangular with the eigenvalues of A along the diagonal. This decomposition, which can also be written U^H AU = T (respectively, Q^T AQ = T), shows that every complex (respectively, real) square matrix is unitarily (respectively, orthogonally) similar to a triangular matrix (see Table 1.6).

– The decomposition A = LU, where A ∈ C^{I×I} is of rank R, L ∈ C^{I×I} is lower triangular and U ∈ C^{I×I} is upper triangular, satisfies the property that U or L is non-singular. This decomposition does not always exist.

In the case of a non-singular matrix A ∈ R^{I×I}, a variant of the LU decomposition is A = LDU, where L and U are unit lower and upper triangular, respectively, and D is diagonal. In this case, the matrices L and U are unique.

– The QR decomposition can be computed using various orthogonalization methods based on Householder, Givens or Gram–Schmidt transformations¹.

– The LU and QR decompositions are used to solve systems of linear equations of the form Ax = b using the LS method.

Using the decomposition A = LU, with A ∈ R^{I×I}, the original system is replaced by two triangular systems Ly = b and Ux = y. To solve these new systems, the square triangular matrices L and U are inverted using forward and backward substitution algorithms, respectively, instead of inverting A.

Similarly, using the decomposition A = QR, with A ∈ R^{I×J} assumed to have full column rank (I ≥ J), the original system of equations can be solved by inverting a triangular matrix. To see this, partition R = [R1 ; 0] and Q^T b = [c ; d], where R1 ∈ R^{J×J} is non-singular, c ∈ R^J and d ∈ R^{I-J}. Using the fact that pre-multiplying a vector by an orthogonal matrix preserves its Euclidean norm (||Qy||_2 = ||y||_2), the LS criterion to minimize can be rewritten as follows:

||Ax - b||_2^2 = ||Q^T Ax - Q^T b||_2^2 = ||R1 x - c||_2^2 + ||d||_2^2,

which is minimized by solving the triangular system R1 x = c.
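A compact sketch of this QR-based LS solution (Python with NumPy, not from the book; the system is arbitrary), compared against the pseudo-inverse solution:

    import numpy as np

    I, J = 7, 3
    A = np.random.randn(I, J)            # full column rank with probability 1
    b = np.random.randn(I)

    Q, R1 = np.linalg.qr(A)              # reduced QR: Q is I x J column-orthonormal, R1 is J x J
    c = Q.T @ b
    x = np.linalg.solve(R1, c)           # solve the triangular system R1 x = c

    assert np.allclose(x, np.linalg.pinv(A) @ b)   # same LS solution as the pseudo-inverse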

1 A matrix Q ∈ R^{I×J}, with J ≤ I, is said to be column-wise orthonormal (or simply column orthonormal) if its column vectors form an orthonormal set (Q^T Q = I_J). Similarly, when I ≤ J, the matrix Q is said to be row-wise orthonormal (or simply row orthonormal) if its row vectors form an orthonormal set (QQ^T = I_I).
