
John Gustafson

Siew Hoon Leong

Next Generation Arithmetic

4th International Conference, CoNGA 2023 Singapore, March 1–2, 2023

Proceedings

Lecture Notes in Computer Science 13851

Founding Editors

Gerhard Goos

Juris Hartmanis

Editorial Board Members

Elisa Bertino, Purdue University, West Lafayette, IN, USA

Wen Gao, Peking University, Beijing, China

Bernhard Steffen, TU Dortmund University, Dortmund, Germany

Moti Yung, Columbia University, New York, NY, USA

The series Lecture Notes in Computer Science (LNCS), including its subseries Lecture Notes in Artificial Intelligence (LNAI) and Lecture Notes in Bioinformatics (LNBI), has established itself as a medium for the publication of new developments in computer science and information technology research, teaching, and education.

LNCS enjoys close cooperation with the computer science R&D community, the series counts many renowned academics among its volume editors and paper authors, and collaborates with prestigious societies. Its mission is to serve this international community by providing an invaluable service, mainly focused on the publication of conference and workshop proceedings and postproceedings. LNCS commenced publication in 1973.

John Gustafson · Siew Hoon Leong

Editors

Next Generation Arithmetic

4th International Conference, CoNGA 2023
Singapore, March 1–2, 2023

Proceedings

Editors

John Gustafson
Arizona State University
Tempe, AZ, USA

Marek Michalewicz
National Supercomputing Centre
Singapore, Singapore

Siew Hoon Leong
Swiss National Supercomputing Centre, ETH Zurich
Lugano, Switzerland

ISSN 0302-9743    ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science

ISBN 978-3-031-32179-5    ISBN 978-3-031-32180-1 (eBook)
https://doi.org/10.1007/978-3-031-32180-1

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Computer arithmetic is once again a controversial topic. Before the establishment of IEEE Std 754™ for floating-point arithmetic in 1985, no one expected similar answers from different makes of computers, because the vendors all used different ways to represent real numbers. IEEE 754 compliance made it possible to get similar, albeit not identical, answers on different systems. It also led to the casual use of double precision (64-bit) floating-point everywhere as an easy substitute for having to think carefully about rounding errors.

Adherence to that Standard began to wane in the early 2000s as system designers began to abandon hardware support for the many complicated and expensive provisions of IEEE 754, such as gradual underflow. Different vendors made different shortcut choices, resulting again in the situation that no one expects similar answers from different computers. As computing became increasingly bandwidth-limited, pressure mounted to try 32-bit and even 16-bit representations without incurring unacceptable accuracy loss. Lower precisions made more obvious the inefficiency of the outdated 754 Standard, and the Machine Learning community in particular is experimenting with new real number formats. While some cling to the idea that the IEEE 754 Standard will last forever, an increasing part of the computing community recognizes the need for fresh designs more suited to present-day priorities. The controversy is not unlike the transition from sequential computing to parallel computing in the 1990s.

As part of SC Asia 2023, the Conference on Next-Generation Arithmetic (CoNGA 2023) is the premier forum for the presentation of the impact of novel number formats on

• Application Speed and Accuracy
• Hardware Costs
• Software-Hardware Codevelopment
• Algorithm Choices
• Tools and Programming Environments

This is the fourth CoNGA conference, and the largest to date. The 16 submitted papers for the technical papers program went through a rigorous peer review process by an international program committee, with an average of three reviews per submission. Eleven papers were selected for inclusion in the Proceedings. Accepted papers cover topics ranging from better ways to build arithmetic units to the application consequences of the new formats, many using the posit format standardized in 2022.

We thank all authors for their submissions to CoNGA. Our sincere thanks go to all Program Committee members for doing high-quality and in-depth submission reviews. We also thank the organizers for giving us the opportunity to hold CoNGA 2023 as a sub-conference of SC Asia 2023.

February 2023

John L. Gustafson
Marek Michalewicz
Cerlane Leong

Organization

Co-chairs

John L. Gustafson, Arizona State University, USA
Marek Michalewicz, National Supercomputing Centre (NSCC), Singapore

Program Chair

Cerlane Leong, CSCS, Switzerland

Program Committee

Shin Yee Chung, NSCC, Singapore
Marco Cococcioni, University of Pisa, Italy
Himeshi De Silva, A*STAR, Singapore
Vassil Dimitrov, Lemurian Labs, Canada
Roman Iakymchuk, Umeå University, Sweden
Peter Lindstrom, Lawrence Livermore National Laboratory, USA
Andrew Shewmaker, OpenEye Scientific, USA

Contents

Lossless FFTs Using Posit Arithmetic .......................................... 1
    Siew Hoon Leong and John L. Gustafson

Bedot: Bit Efficient Dot Product for Deep Generative Models .................. 19
    Nhut-Minh Ho, Duy-Thanh Nguyen, John L. Gustafson, and Weng-Fai Wong

A Paradigm for Interval-Aware Programming .................................... 38
    Moritz Beutel and Robert Strzodka

Decoding-Free Two-Input Arithmetic for Low-Precision Real Numbers ............ 61
    John L. Gustafson, Marco Cococcioni, Federico Rossi, Emanuele Ruffaldi, and Sergio Saponara

Hybrid SORN Hardware Accelerator for Support Vector Machines ................. 77
    Nils Hülsmeier, Moritz Bärthel, Jochen Rust, and Steffen Paul

PHAc: Posit Hardware Accelerator for Efficient Arithmetic Logic Operations ... 88
    Diksha Shekhawat, Jugal Gandhi, M. Santosh, and Jai Gopal Pandey

Fused Three-Input SORN Arithmetic ............................................ 101
    Moritz Bärthel, Chen Yuxing, Nils Hülsmeier, Jochen Rust, and Steffen Paul

Towards a Better 16-Bit Number Representation for Training Neural Networks .. 114
    Himeshi De Silva, Hongshi Tan, Nhut-Minh Ho, John L. Gustafson, and Weng-Fai Wong

Improving the Stability of Kalman Filters with Posit Arithmetic .............. 134
    Ponsuganth Ilangovan P., Rohan Rayan, and Vinay Shankar Saxena

Evaluation of the Use of Low Precision Floating-Point Arithmetic for Applications in Radio Astronomy ................................................. 155
    Thushara Kanchana Gunaratne

PLAUs: Posit Logarithmic Approximate Units to Implement Low-Cost Operations with Real Numbers ................................................. 171

Author Index

Lossless FFTs Using Posit Arithmetic

Siew Hoon Leong¹ and John L. Gustafson²

¹ Swiss National Supercomputing Centre, ETH Zurich, Zurich, Switzerland
cerlane.leong@cscs.ch
² Arizona State University, Tempe, USA
jlgusta6@asu.edu

Abstract. The Fast Fourier Transform (FFT) is required for chemistry, weather, defense, and signal processing for seismic exploration and radio astronomy. It is communication-bound, making supercomputers thousands of times slower at FFTs than at dense linear algebra. The key to accelerating FFTs is to minimize bits per datum without sacrificing accuracy. The 16-bit fixed-point and IEEE float types lack sufficient accuracy for 1024- and 4096-point FFTs of data from analog-to-digital converters. We show that the 16-bit posit, with higher accuracy and larger dynamic range, can perform FFTs so accurately that a forward-inverse FFT restores the original signal perfectly. "Reversible" FFTs with posits are lossless, eliminating the need for 32-bit or higher precision. Similarly, 32-bit posit FFTs can replace 64-bit float FFTs for many HPC tasks. Speed, energy efficiency, and storage costs can thus be improved by 2× for a broad range of HPC workloads.

Keywords: Posit · Quire · FFT · Computer Arithmetic

1 Introduction

The posit™ number format is the rounded form of Type III universal number (unum) arithmetic [13, 16]. It evolved from Type II unums in December 2016 as a hardware-friendly drop-in alternative to the floating-point IEEE Std 754™ [19]. The tapered accuracy of posits allows them to have more fraction bits in the most commonly used range, thus enabling posits to be more accurate than floating-point numbers (floats) of the same size, yet have an even larger dynamic range than floats. Posit arithmetic also introduces the quire, an exact accumulator for fused dot products, that can dramatically reduce rounding errors.

The computation of the Discrete Fourier Transform (DFT) using the Fast Fourier Transform (FFT) algorithm has become one of the most important and powerful tools of High Performance Computing (HPC). FFTs are investigated here to demonstrate the speed and accuracy of posit arithmetic when compared to floating-point arithmetic. Improving FFTs can potentially improve the performance of HPC applications such as CP2K [18], SpecFEM3D [20, 21], and WRF [31], leading to higher speed and improved accuracy. "Precision" here refers to the number of bits in a number format, and "accuracy" refers to the correctness of an answer, measured in the number of correct decimals or correct bits. Posits achieve orders of magnitude smaller rounding errors when compared to floats that have the same precision [26, 27]. Thus, the commonly-used 32-bit and 64-bit precision floats can potentially be replaced with 16-bit and 32-bit posits respectively in some applications, doubling the speed of communication-bound computations and halving the storage and power costs.

Supported by A*STAR and NSCC Singapore.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
J. Gustafson et al. (Eds.): CoNGA 2023, LNCS 13851, pp. 1–18, 2023.
https://doi.org/10.1007/978-3-031-32180-1_1

We say FFT accuracy is lossless when the inverse FFT reproduces the original signal perfectly; that is, the FFT is reversible. The rounding error of 16-bit IEEE floats and the 16-bit fixed-point format adds too much noise for those formats to perform lossless FFTs, which forces programmers to use 32-bit floats for signal processing tasks. In contrast, 16-bit posits have enough accuracy for a "round trip" to be lossless at the resolution of common Analog-to-Digital Converters (ADCs) that supply the input data. Once reversibility is achieved, the use of more bits of precision is wasteful since the transformation retains all information.

The paper is organized as follows: In Sect. 2, related work on posits, fixed-point FFTs and floating-point FFTs is shared. Background information on the posit and quire formats and the FFT is provided in Sect. 3. Section 4 presents the approach used to evaluate the accuracy and performance of radix-2 and radix-4 (1024- and 4096-point) FFTs (Decimation-In-Time and Decimation-In-Frequency) using both 16-bit posits and 16-bit floats. The results of the evaluation are discussed in Sect. 5. Finally, the conclusions and plans for future work are presented in Sect. 6.

2 Related Work

Posits are a new form of computer arithmetic invented by Gustafson in December 2016. The concept was first publicly shared as a Stanford lecture seminar [16] in February 2017. The first peer-reviewed posit journal paper [15] was published in June 2017. Since then, studies on posit correctness [6], accuracy and efficiency when compared to floats [26], and various software and Field-Programmable Gate Array (FPGA) implementations [5, 24, 29] have been performed. Due to the flexibility to choose the precision required and express high dynamic range using very few bits, researchers have found posits particularly well-suited to machine learning applications. [23] has demonstrated that very low precision posits outperform fixed-point and all other tested formats for inference, thus improving speed for time-critical AI tasks such as self-driving cars.

Efforts to improve DFTs by reducing the number of operations can be traced back to the work of Cooley and Tukey [8] in 1965, whose improvements based on the algorithm of Good [12] reduced the operation complexity from O(N²) to O(N log₂ N), now called FFTs [7, p. 1667]. Additional work to further improve the FFT algorithm led to radix-2^m algorithms [9, 30], the Rader-Brenner algorithm [30], the Winograd algorithm (WFTA) [35, 36] and prime factor algorithms (PFA) [11, 33]. In practice, radix-2, radix-4, and split-radix are the most widely adopted types.

The effect of arithmetic precision on performance has been studied [1]. For optimum speed, the lowest possible bit precision should be used that still meets accuracy requirements. Fixed-point as opposed to floating-point is traditionally used to implement FFT algorithms in custom Digital Signal Processing (DSP) hardware [3] due to the higher cost and complexity of floating-point logic. Fixed-point reduces power dissipation and achieves higher speed [3, 25]. Custom Application-Specific Integrated Circuits (ASICs) and FPGAs allow the use of unusual and variable word sizes without a speed penalty, but that flexibility is not available in a general programming environment.

3 Background

In this section, the posit format, the corresponding quire exact accumulator format, and the FFT algorithm are discussed.

3.1 Posits

Posits are much simpler than floats, which can potentially result in faster circuits requiring less chip area [2]. The main advantages of posits (as specified in the Posit Standard, 2022) over floats are:

– Higher accuracy for the most commonly-used value range
– 1-to-1 map of signed binary integers to ordered real numbers
– Bitwise reproducibility across all computing systems
– Increased dynamic range
– More information per bit (higher Shannon entropy)
– Only two exception values: zero and Not-a-Real (NaR)
– Support for associative and distributive laws

The differences among 16-bit float, fixed-point, and posit formats are displayed in Fig. 1, where each color block is a bit, i.e. 0 or 1. The colors depict the fields (sign, exponent, fraction, integer, or regime) that the bits represent.

A 16-bit float (Fig. 1a) consists of a sign bit and 5 exponent bits, leaving 10 bits for fraction after the "hidden bit". If signal data is centered about 0 so the sign bit is significant, a 16-bit float is capable of storing signed values from ADCs with up to 12 bits of output, but no larger.
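The 12-bit limit is easy to check numerically. The sketch below (plain numpy, mapping a signed ADC code k to k/2048; our harness, not the paper's code) confirms that every 12-bit code is exact in float16 while a 13-bit code need not be:

```python
import numpy as np

# A 12-bit signed ADC produces integer codes -2048..2047; map them to
# [-1, 1) as k / 2048.  Each such value needs at most 11 significand bits
# plus a sign, so an IEEE half (10 fraction bits + hidden bit) holds
# every code exactly.
codes = np.arange(-2048, 2048, dtype=np.int64)
exact = codes / 2048.0                                 # exact in float64
ok_16 = bool((exact.astype(np.float16).astype(np.float64) == exact).all())
print(ok_16)                                           # True: 12 bits survive

# One extra ADC bit breaks exactness: the 13-bit value 4095/4096 needs
# 12 significand bits, one more than float16 provides.
v = 4095 / 4096.0
print(float(np.float16(v)) == v)                       # False
```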

Although the number of bits to the left of the radix point can be flexibly chosen for a fixed-point format (Fig. 1b), 1024-point FFTs of data in the range −1 to 1 can produce values between −32 and 32 in general. Therefore, a 16-bit fixed-point format for the FFT requires integer bits that increase from 2 to 6 with the stages of the FFT, leaving 13 to 9 bits respectively for the fraction part. At first glance, fixed-point would appear to have the best accuracy for FFTs, since it allows the maximum possible number of fraction bits. However, towards the final stage of a 1024-point FFT computation, a 16-bit float will still have 10 fraction bits (excluding the hidden bit) while fixed-point will only have

(a) Floating-point

(b) Fixed-point

9 fraction bits to accommodate the larger worst-case integer part it needs to store to avoid catastrophic overflow. For 4096-point FFTs, fixed-point will only have 8 fraction bits. Posits will have 10 to 12 fraction bits for the results of the FFT. Consequently, 16-bit fixed-point has the lowest accuracy among the three number formats for both 1024- and 4096-point FFTs; posits have the highest accuracy (see Fig. 2). Note also that the "twiddle factors" are trigonometric functions in the range −1 to 1, which posits represent with about 0.6 decimals greater accuracy than floats.

As with floats and integers, the most significant bit of a posit indicates the sign. The "regime" bits use signed unary encoding requiring 2 to 15 bits (Fig. 1c). Accuracy tapers, with the highest accuracy for values with magnitudes near 1 and less accuracy for the largest and smallest magnitude numbers. Posit arithmetic hardware requires integer adders, integer multipliers, shifters, leading-zero counters and AND trees very similar to those required for IEEE floats; however, posit hardware is simpler in having a single rounding mode, no internal flags, and only two exception values to deal with. Comparison operations are those of integers; no extra hardware is needed. Proprietary designs show a reduction in gate count for posits versus floats, for both FPGA and VLSI designs, and a reduction in operation latency [2].

The dynamic range for IEEE Standard 16-bit floats is from 2⁻²⁴ to 65504, or about 6.0 × 10⁻⁸ to 6.5 × 10⁴ (12 orders of magnitude). Floats use tapered accuracy for small-magnitude ("subnormal") values only, making their dynamic range unbalanced about 1. The reciprocal of a small-magnitude float overflows to infinity. For 16-bit posits, the use of a single eS exponent bit allows expression of magnitudes from 2⁻²⁸ to 2²⁸, or about 3.7 × 10⁻⁹ to 2.7 × 10⁸ (almost 17 orders of magnitude). Posit tapered precision is symmetrical about magnitude 1, and reciprocation is closed and exact for integer powers of 2. Thus, the accuracy advantage of posits does not come at the cost of reduced dynamic range.
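These ranges can be verified in a few lines of Python. The posit side uses the relations useed = 2^(2^eS) and maxPos = useed^(n−2) from the posit definition (our formulas, not quoted from the paper):

```python
import math
import numpy as np

# IEEE half precision: largest finite value 65504, smallest subnormal 2**-24.
fi = np.finfo(np.float16)
print(fi.max)                                  # 65504.0
smallest = float(fi.tiny) * 2.0**-10           # tiny = 2**-14; 10 fraction bits below it
print(smallest == 2.0**-24)                    # True

# posit<16,1>: useed = 2**(2**eS) = 4, maxPos = useed**(n-2) = 4**14 = 2**28,
# minPos = 1/maxPos, matching the 2**-28 .. 2**28 range quoted above.
n, eS = 16, 1
useed = 2 ** (2 ** eS)
maxPos = useed ** (n - 2)
print(maxPos == 2**28)                         # True
print(round(math.log10(maxPos * maxPos)))      # ~17 orders of magnitude
```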

(c) Posit
Fig. 1. Different number formats

Fig. 2. Comparison of 16-bit posits and floats for accuracy

3.2 The Quire Register

The concept of the quire [14, pp. 80–84], a fixed-point scratch value, originates from the work of Kulisch and Miranker [22]. The quire data type is used to accumulate dot products with no rounding errors. When the accumulated result is converted back to posit form with a single rounding, the result is a "fused dot product". The quire data type is used to accumulate the addition/subtraction of a product of two posits, using exact fixed-point arithmetic. Thus, a quire data type only needs to support add and subtract operations, and obey the same rules as integer add and subtract operations (augmented with the ability to handle a NaR input value).

To store the result of a fused dot product without any rounding, a quire data type must minimally support the range [minPos², maxPos²], where minPos is the smallest expressible real greater than zero and maxPos is the biggest expressible real, for a particular n-bit posit. Since there will be a need for the quire to accumulate the results of fused dot products of long vectors, an additional n − 1 bits are prepended to the most significant bits as carry overflow protection. Thus, a 16-bit posit with a 1-bit eS (posit⟨16,1⟩) will have a corresponding 128-bit quire, notated quire128 for posit⟨16,1⟩.
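The 128-bit figure follows from the field widths this paragraph implies; the sketch below tallies them (the exact field breakdown is our reading of the text, not a quote from the paper):

```python
# Width of the quire for an n-bit posit with eS exponent bits: enough
# fixed-point bits to span [minPos**2, maxPos**2] exactly, plus n-1
# carry-guard bits and a sign bit.
def quire_bits(n, eS):
    useed_log2 = 2 ** eS                  # log2(useed)
    maxpos_log2 = useed_log2 * (n - 2)    # maxPos = useed**(n-2)
    frac = 2 * maxpos_log2                # bits right of the binary point (down to minPos**2)
    intg = 2 * maxpos_log2                # bits left of it (up to maxPos**2)
    carry = n - 1                         # overflow protection for long dot products
    sign = 1
    return sign + carry + intg + frac

print(quire_bits(16, 1))                  # 128, i.e. quire128 for posit<16,1>
```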

The use of the quire reduces cumulative rounding error, as will be demonstrated in Sects. 4 and 5, and enables correctly-rounded fused dot products. Notice that the complex multiply-add in an FFT can be expressed as a pair of dot products, so all of the complex rotations in the FFT need incur only one rounding per real and imaginary part, instead of four (if all operations are rounded) or two (if fused multiply-add is used).
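As a sketch of this idea (using Python's Fraction as a stand-in for the exact quire, and plain float rounding where a posit rounding would occur; the values are illustrative), the butterfly update A + w·B rounds exactly once per component:

```python
from fractions import Fraction

# A + w*B with complex A, B, w can be written as two real dot products:
#   Re = A.re + (w.re*B.re - w.im*B.im) = dot([1, w.re, -w.im], [A.re, B.re, B.im])
#   Im = A.im + (w.im*B.re + w.re*B.im) = dot([1, w.im,  w.re], [A.im, B.re, B.im])
# A quire accumulates each dot product exactly and rounds once at the end.
def fused_dot(xs, ys, rnd):
    acc = Fraction(0)                      # stand-in for the exact quire
    for x, y in zip(xs, ys):
        acc += Fraction(x) * Fraction(y)   # products accumulated without rounding
    return rnd(acc)                        # the single rounding step

rnd = float                                # hypothetical "round to posit" hook
A, B, w = (0.25, -0.5), (0.75, 0.125), (0.5, 0.75)
re = fused_dot([1.0, w[0], -w[1]], [A[0], B[0], B[1]], rnd)
im = fused_dot([1.0, w[1],  w[0]], [A[1], B[0], B[1]], rnd)
print((re, im))                            # equals A + w*B
```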

3.3 The FFT Algorithm

The discrete form of the Fourier transform (DFT) can be written as

    X_k = (1/√N) Σ_{n=0}^{N−1} x_n e^{−2πikn/N},    k = 0, ..., N − 1    (1)

where

x_n is the real-valued sequence of N data points,
X_k is the complex-valued sequence of N data points.

The sum is scaled by 1/√N such that the inverse DFT has the same form other than the sign of the exponent, which requires reversing the direction of the angular rotation factors (the imaginary part of the complex value), commonly known as "twiddles" or "twiddle factors":

    x_n = (1/√N) Σ_{k=0}^{N−1} X_k e^{2πikn/N},    n = 0, ..., N − 1    (2)

Following this convention, the twiddle factor e^{−2πikn/N} will be written as w for short. While it is also possible to have no scaling in one direction and a scaling of 1/N in the other, this has only the merit that it saves one operation per point in a forward-inverse transformation. Scaling by 1/√N makes both forward and inverse transforms unitary and consistent. The forms as shown in Eqs. 1 and 2 have the additional advantage that they keep intermediate values from growing in magnitude unnecessarily, a property that is crucial for fixed-point arithmetic to prevent overflow, and desirable for posit arithmetic since it maximizes accuracy. The only variant from traditional FFT algorithms used here is that the data set is scaled by 0.5 on every radix-4 pass, or every other pass of a radix-2 FFT. This automatically provides the 1/√N scaling while keeping the computations in the range where posits have maximum accuracy. The scaling by 0.5 can be incorporated into the twiddle factor table to eliminate the cost of an extra multiply operation.
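A minimal float64 sketch of this scaling scheme (not the paper's 16-bit implementation): an iterative radix-2 DIT FFT that halves the data on every other pass, checked against numpy's unitary FFT. It assumes N = 4^k so the number of passes is even:

```python
import numpy as np

def fft_unitary(x):
    """Radix-2 DIT FFT with the 1/sqrt(N) factor folded in as a 0.5
    scaling on every other pass (requires N a power of 4)."""
    a = np.asarray(x, dtype=np.complex128).copy()
    N = a.size
    # bit-reversal permutation
    j = 0
    for i in range(1, N):
        bit = N >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    size, pass_no = 2, 0
    while size <= N:
        half = size // 2
        w = np.exp(-2j * np.pi * np.arange(half) / size)   # twiddle factors
        for start in range(0, N, size):
            u = a[start : start + half].copy()
            t = w * a[start + half : start + size]
            a[start : start + half] = u + t
            a[start + half : start + size] = u - t
        pass_no += 1
        if pass_no % 2 == 0:
            a *= 0.5                 # every other pass: scale by 1/2
        size <<= 1
    return a                         # overall factor 2**-(log2(N)/2) = 1/sqrt(N)

x = np.random.default_rng(1).uniform(-1, 1, 1024)
ref = np.fft.fft(x) / np.sqrt(1024)  # unitary reference
print(np.max(np.abs(fft_unitary(x) - ref)) < 1e-12)   # True
```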

The FFT is a form of the DFT that uses the fact that the summations can be represented as a matrix-vector product, and the matrix can be factored to reduce the computational complexity to N log₂ N operations. In this paper, two basic classes of FFT algorithms, Decimation-In-Time (DIT) and Decimation-In-Frequency (DIF), will be discussed. DIT algorithms decompose the time sequences into successively smaller subsequences while DIF algorithms decompose the coefficients into smaller subsequences [28].

Traditional analysis of the computational complexity of the FFT centers on the number of multiplications and additions. The kernel operation of an FFT is often called a "butterfly" because of its data flow pattern (see Fig. 3). The original radix-2 algorithm performs 10 operations (6 additions and 4 multiplications) per butterfly [34, p. 42], highlighted in red in Fig. 3, and there are (1/2)N log₂ N butterflies, so the operation complexity is 5N log₂ N for large N. Use of radix-4 reduces this to 4.5N log₂ N; split-radix methods are 4N log₂ N, and with a little more whittling away this can be further reduced to about 3.88N log₂ N [32]. Operation count, however, is not the key to increasing FFT performance, since the FFT is communication-bound and not computation-bound.
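The 10-operation count per butterfly, and the resulting 5N log₂ N total, can be tallied directly:

```python
import math

# One radix-2 butterfly computes A' = A + w*B and B' = A - w*B.
# The complex product w*B costs 4 real multiplies and 2 real adds;
# the two complex add/subtracts cost 4 more real adds.
mults = 4
adds = 2 + 4
ops_per_butterfly = mults + adds
print(ops_per_butterfly)                         # 10

# With N/2 butterflies in each of the log2(N) passes:
N = 1024
total = ops_per_butterfly * (N // 2) * int(math.log2(N))
print(total == 5 * N * int(math.log2(N)))        # the 5*N*log2(N) figure
```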

Supercomputer users are often surprised by the tiny fraction of peak performance they obtain when they perform FFTs while using highly-optimized vendor libraries that are well-tuned for a particular system. The TOP500 list shows

Fig. 3. A radix-2 "butterfly" calculation

many systems achieving over 80% of their peak performance for multiply-add operations in a dense linear algebra problem. FFTs, which have only multiply and add operations and predetermined data flow for a given size N, might be expected to achieve similar performances. However, traditional complexity analysis, which counts the number of operations, does a poor job of predicting actual FFT execution timings.

FFTs are thus the Achilles' heel of HPC, because they tax the most valuable resource: data motion. In an era when performance is almost always communication-bound and not computation-bound, it is more sensible to optimize the data motion as opposed to the operation count. Dense linear algebra involves order N³ operations but only order N² data motion, making it one of the few HPC tasks that is still compute-bound. Dense linear algebra is one of the few workloads for which operation count correlates well with peak arithmetic performance. While some have studied the communication aspects of the FFT based on a single processor with a cache hierarchy (a valid model for small-scale digital signal processing), supercomputers are increasingly limited by the laws of physics and not by architecture. The communication cost for the FFT (in the limit where data access is limited by the speed of light and its physical distance) is thus not order N log₂ N.

Figure 4 shows a typical (16-point) FFT diagram where the nodes (brown dots) and edges (lines in blue) represent the data points and communications in each stage respectively. The figure illustrates a DIT-FFT, but a DIF-FFT is simply its mirror image, so the following argument applies to either case. For any input on the left side, data travels in the y dimension by absolute distance 1, 2, 4, ..., N/2 positions, a total motion of N − 1 positions. Simplistic models of performance assume all edges are of equal time cost, but this is not true if the physical limits on communication speed are considered. The total motion cost of N − 1 positions holds for each of the N data points, hence the total communication work is order N², the same order complexity as a DFT without any clever factoring. This assumes memory is physically placed in a line. In a real system like a supercomputer cluster covering many square meters, memory is distributed over a plane, for which the average distance between locations is order N^(1/2), or in a volume, for which the average distance is order N^(1/3). Those configurations result in physics-limited FFT communication complexity of order N^(3/2) or N^(4/3) respectively, both of which grow faster with N than does N log₂ N. It is possible


to do all-to-all exchanges partway through the FFT to make communications local again, but this merely shifts the communication cost into the all-to-all exchange. Thus, it is not surprising that large-scale supercomputers attain only a fraction of their peak arithmetic performance when performing FFTs.

This observation points us to a different approach: reduce the bits moved, not the operation count. The communication cost grows linearly with the number of bits used per data item. The use of a data format, i.e. posits, that has considerably more information-per-bit than IEEE 754 floating-point numbers can generate answers with acceptable accuracy using fewer bits. In the following section, 16-bit posits will be used to compute FFTs with higher accuracy than 16-bit (IEEE half-precision) floats, potentially allowing 32-bit floats to be safely replaced by 16-bit posits, doubling the speed (by halving the communication cost) of signal and image processing workloads. We speculate that similar performance doubling is possible for HPC workloads that presently use 64-bit floats for FFTs, by making them sufficiently accurate using 32-bit posits.

4 Approach

4.1 Accuracy

To test the effectiveness of posits, 1024- and 4096-point FFTs are used as the sizes most commonly found in the literature. Both radix-2 and radix-4 methods are studied here, but not split-radix [9, 30]. Although modified split-radix has the smallest operation count (about 3.88N log₂ N), fixed-point studies [4] show it has poorer accuracy than radix-2 and radix-4 methods.

Both DIT and DIF methods are tested. The DIF approach introduces multiplicative rounding early in the processing. Intuition says this might pollute the later computations more than the DIT method where the first passes perform no multiplications. Empirical tests are conducted to check this intuition with three numerical approaches:

Fig. 4. A typical FFT diagram

– 16-bit IEEE standard floats, with maximum use of fused multiply-add operations to reduce rounding error in the multiplications of complex numbers,
– 16-bit posits (eS = 1) with exactly the same operations as used for floats, and
– 16-bit posits using the quire to further reduce rounding error to one rounding per pass of the FFT.

For each of these 24 combinations of data-point size, radix, decimation type, and numerical approach, random uniform distribution input signals in the range [−1, 1) at the resolution of a 12-bit ADC are created. The 12-bit fixed-point ADC inputs are first transformed into their corresponding 16-bit posits and floats as shown in Fig. 5. A "round-trip" (forward followed by inverse) FFT is then applied before the results are converted (with rounding) back to the 12-bit ADC fixed-point format (represented by ADC′ in Fig. 5). If no errors and roundings occur, the original signal is recovered perfectly, i.e. ADC′ = ADC.

Fig. 5. A round-trip FFT for a 12-bit ADC
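The round-trip experiment can be sketched as follows, with numpy's float64 FFT standing in for the 16-bit posit arithmetic (running the actual 16-bit formats requires SoftPosit/SoftFloat); float64 easily passes the ADC′ = ADC test:

```python
import numpy as np

# Quantize a random signal to 12-bit ADC codes, run forward + inverse
# unitary FFTs, re-quantize, and check that ADC' == ADC.
rng = np.random.default_rng(0)
signal = rng.uniform(-1.0, 1.0, 1024)
adc = np.round(signal * 2048).astype(np.int64).clip(-2048, 2047)  # 12-bit codes

x = adc / 2048.0                           # codes -> real values in [-1, 1)
X = np.fft.fft(x, norm="ortho")            # unitary forward FFT
y = np.fft.ifft(X, norm="ortho").real      # unitary inverse FFT
adc_back = np.round(y * 2048).astype(np.int64)

print(np.array_equal(adc_back, adc))       # True: lossless round trip
```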

The absolute error of a 12-bit ADC input can thus be computed as shown in Eq. 3. This error represents the rounding errors incurred by posits and floats respectively.

    absolute error = |ADC′ − ADC|    (3)

To evaluate the accuracy of posits and floats, the vector of absolute errors of all ADC inputs is evaluated. Three flavors of measures, the maximum (L∞ norm), RMS (L2 norm) and average (L1 norm) of the vector, are computed.

To gain additional insight into the error, the units in the last place (ULPs) metric is used. As shown by Goldberg [10, p. 8], the ULP error measure is superior to relative error for measuring pure rounding error. ULP error can be computed as shown in Eq. 4. For a 12-bit fixed-point ADC, ulp(ADC) is a constant value (2⁻¹¹ for input in the [−1, 1) range).

    ULP error = (ADC′ − ADC) / ulp(ADC)    (4)

where ulp(ADC) is one unit in the last place of ADC.
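With an error vector in hand, the three norms and the ULP error of Eq. 4 reduce to a few lines (the code values are illustrative, not data from the paper):

```python
import numpy as np

# Per-sample errors for a handful of 12-bit codes before/after a round trip.
adc      = np.array([  5, -17, 100, -2048, 2047])   # original 12-bit codes
adc_back = np.array([  5, -16, 101, -2048, 2046])   # after the round trip

err = np.abs(adc_back - adc) / 2048.0   # absolute error on the [-1, 1) scale
print(np.max(err))                      # L-infinity norm
print(np.sqrt(np.mean(err**2)))         # RMS (L2) norm
print(np.mean(err))                     # average (L1) norm

# ULP error (Eq. 4): for a 12-bit ADC on [-1, 1), ulp = 2**-11 everywhere,
# so the ULP error is just the signed difference of the integer codes.
ulp = 2.0**-11
ulp_err = (adc_back - adc) * (1.0 / 2048.0) / ulp
print(ulp_err)
```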


In the case of posits and floats, even if an answer is correctly rounded, there will be an error of as much as 0.5 ULP. With each additional arithmetic operation, the errors accumulate. Consequently, to minimize the effect of rounding errors, it is important to minimize the error, e.g. by using the quire to defer rounding as much as possible.

Because an ADC handles only a finite range of uniformly-spaced points adjusted to the dynamic range it needs to handle, a Gaussian distribution that is unbounded is deemed unsuitable for this evaluation. A uniformly-spaced distribution that is bounded to the required range is used instead. Preliminary tests were also conducted on bell-shaped distributions (truncated Gaussian distributions) confined to the same range, [−1, 1), representing three standard deviations from the mean at 0; they yielded results similar to those for the uniform distribution tests presented here.

For 16-bit fixed-point, we rely on analysis because it obviates experimentation. After every pass of an FFT, the FFT values must be scaled by 1/2 to guarantee there is no overflow. For an FFT with 2^(2k) points, the result of the 2k scalings will be an answer that is too small by a factor of 2^k, so it must be scaled up by a factor of 2^k, shifting left by k bits. This introduces zeros on the right that reveal the loss of accuracy of the fixed-point approach. The loss of accuracy is 5 bits for a 1024-point FFT, and 6 bits for a 4096-point FFT. In FPGA development, it is possible to use non-power-of-two data sizes easily, and fixed-point can be made to yield acceptable 1024-point FFT results if the data points have 18 bits of precision [25]. Since fixed-point requires much less hardware than floats (or posits), this is an excellent approach for special-purpose FPGA designs. In the more general computing environment where data sizes are a power-of-two bits in size, a programmer using the fixed-point format has little choice but to upsize all the values to 32-bit size. The same will be shown true for 16-bit floats, which cannot achieve acceptable accuracy.
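The 5- and 6-bit losses follow directly from N = 2^(2k):

```python
import math

# An FFT with N = 2**(2k) points is scaled by 1/2 on each of its 2k passes,
# overshooting the desired 1/sqrt(N) = 2**-k by a factor of 2**k; scaling
# back up shifts in k zero bits, i.e. k bits of accuracy are lost.
for N in (1024, 4096):
    k = int(math.log2(N)) // 2
    print(N, k)          # 1024 -> 5 lost bits, 4096 -> 6 lost bits
```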

4.2 Performance

The performance of large-scale FFTs is communication-bound, as pointed out in Sect. 3.3. The reduction in the size of operands not only reduces time proportionately, but also reduces cache spill effects. For example, a 1024-by-1024 2D-FFT computed with 16-bit posits will fit in a 4 MB cache. If the FFT were performed using 32-bit floats to achieve sufficient accuracy, the data would not fit in cache and the cache "spill" would reduce performance dramatically, by a factor of more than two. This is a well-known effect.

However, there is a need to show that posit arithmetic can be as fast as float arithmetic, possibly faster. Otherwise, the projected bandwidth savings might be offset by slower arithmetic. Until VLSI processors using posits as a native type are complete, arithmetic performance comparisons between posits and floats of the same bit size can be performed with similar implementations in software.

A software library, SoftPosit, is used in this study. It is closely based on Berkeley SoftFloat [17] (Release 3d). Similar implementation and optimization techniques are adopted to enable a fair comparison of the performance of 16-bit posits versus 16-bit floats. Note: the performance results on posits are preliminary since SoftPosit is a new library; 26 years of optimization effort have been put into Berkeley SoftFloat.

Table 1. Test machine specification

The specification of the machine used to evaluate the performance is shown in Table 1. Both SoftPosit and SoftFloat are compiled with GNU GCC 4.8.5 with optimization level "O2" and architecture set to "core-avx2".

The arithmetic operations of posit⟨16,1⟩ and quire128, and of IEEE Standard half-precision floats (float⟨16,5⟩) are shown in Table 2. Each operation is implemented using integer operators in C. With the exception of the fused dot product (a posit arithmetic functionality that is not in the IEEE 754 Standard), there is an equivalent posit operation for every float operation shown in Table 2.

The most significant rounding errors that influence the accuracy of DFT algorithms occur in each butterfly calculation, the bfly routine. To reduce the number of rounding errors, fused multiply-adds are leveraged. Posit arithmetic can perform fused multiply-adds or, better, leverage fused dot products with the quire data type to further reduce the accumulation of rounding errors.

The twiddle factors are obtained using a precomputed cosine table with 1024 points to store the values of cos(0) to cos(π/2). The sine and cosine values for the entire unit circle are found through indexed reflections and negations of these discrete values.

A 1D-FFT with input sizes 1024 and 4096 is used to check the computational performance with posit⟨16,1⟩ without quire, posit⟨16,1⟩ with quire, and float⟨16,5⟩, by pinning each run to the same selected core.

5 Results

5.1 Accuracy

Figure 6 shows the average RMS errors for 1024-point tests, representing hundreds of thousands of experimental data points. The RMS error bar graph for the 4096-point FFTs looks very similar, but the bars are uniformly about 12% higher. The vertical axis represents Units in the Last Place (ULP) at the resolution of

Table 2. Arithmetic operations

Arithmetic operation     Posit⟨16,1⟩ function   Float⟨16,5⟩ function

Add                      p16_add                f16_add
Subtract                 p16_sub                f16_sub
Multiply                 p16_mul                f16_mul
Divide                   p16_div                f16_div
Fused multiply-add       p16_mulAdd             f16_mulAdd
Fused dot product-add    q16_fdp_add            Not applicable
Fused dot product-sub    q16_fdp_sub            Not applicable

a 12-bit ADC, as described in the previous section. Besides the RMS error, L1 and L∞ errors are calculated and show a nearly identical pattern, differing only in the vertical scale.

Fig. 6. RMS errors per value ×10^6 for 1024-point FFTs

The obvious difference is that 16-bit posits have about 1/4 the rounding error of floats when running algorithms that round at identical places in the dataflow. This follows from the fact that most of the operations occur where posits have two bits greater accuracy in the fraction part of their format. The use of the quire further reduces errors, by as much as 1.8× for the case of the radix-4 form of the FFT. (Similarly, rounding error is about 16 times less for 32-bit posits than for 32-bit floats, since 32-bit posits have 28 bits in the significand for values with magnitudes between 1/16 and 16, compared with 24 bits in the significand of a standard float.)
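The "two bits greater accuracy" claim can be made concrete with a little posit bookkeeping. For posit⟨16,1⟩, the regime value k determines how many bits remain for the fraction; the helper below is our own sketch, not SoftPosit code:

```c
#include <assert.h>

/* Fraction bits left in a posit<16,1> whose magnitude lies in
   [useed^k, useed^(k+1)), useed = 2^(2^es) = 4: one sign bit, the
   regime run plus its terminating bit, and es = 1 exponent bit are
   spent before the fraction begins. */
static int p16_frac_bits(int k) {
    int regime_len = (k >= 0) ? k + 2 : -k + 1;  /* run + terminator */
    int left = 16 - 1 - regime_len - 1;          /* sign, regime, es */
    return left > 0 ? left : 0;
}
```

A float⟨16,5⟩ always has 10 fraction bits; a posit⟨16,1⟩ has 12 for magnitudes in [1/4, 4) and 11 in the adjacent ranges [4, 16) and [1/16, 1/4), which is where the "only one bit instead of two" regions mentioned below arise.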

The other differences, between DIT and DIF, or between radix-2 and radix-4, are subtle but statistically significant and repeatable. For posits without the quire, radix-4 is slightly less accurate than radix-2 because intermediate calculations in the longer dot product can stray into the regions where posits have only one bit instead of two bits of greater fraction precision. However, the quire provides a strong advantage for radix-4, reducing the number of rounding events per result to only 4 per point for a 1024-point FFT, and 4 more for the inverse.

However, Fig. 6 understates the significance of the higher accuracy of posits. The original ADC signal is the "gold standard" for the correct answer. Both float and posit computations make rounding errors in performing a round-trip FFT. Additionally, we round that result again to express it in the fixed-point format used by the ADC, as shown in Fig. 5. Once the result is accurate to within 0.5 ULP of the ADC input format, it will round to the original value with no error. But if the result is more than 0.5 ULP from the original ADC value, it "falls off a cliff" and rounds to the wrong fixed-point number. Because of this insight, we focus on the number of bits wrong compared to the original signal (measured in ADC ULPs) and not on measures such as RMS error or dB signal-to-noise ratios, which are more appropriate when numerical errors are frequent and pervasive.

Suppose we measure and sum the absolute ULPs of error for every data point in a round-trip (radix-2, DIF) 1024-point FFT. Figure 7a shows that the massive losses produced by 16-bit floats preclude their use for FFTs.

For posits, on the other hand, 97.9% of the values make the round trip with all bits identical to the original value. The 2.1% that differ are off by only 1 ULP. While the reversibility is not mathematically perfect, it is nearly so, and may be accurate enough to eliminate the need to use a 32-bit data representation. The bar chart shows 16-bit posits to be about 36 times as accurate as 16-bit floats in preserving the information content of the original data. The ratio is similar for a 4096-point FFT, shown in Fig. 7b; the two charts are almost indistinguishable except for the vertical scale.

(a) Round-trip 1024-point FFT. (b) Round-trip 4096-point FFT.

Figure 8a shows another way to visualize the error, plotting the error in ULPs as a function of the data point (real-part errors in blue, and imaginary-part errors in orange). The errors are as large as six ULPs from the original data, and 68% of the round-trip values are incorrect. An error of six ULPs represents a worst-case loss of 1.8 decimals of accuracy (a one-bit loss represents about a 0.3-decimal loss), and 16-bit floats only have about 3.6 decimals to begin with. Figure 8b shows the results when using posits and the quire, with just a few scattered points that do not lie on the x-axis. The information loss is very slight.

Fig. 7. Total ULPs of error

(a) Floats. (b) Posits + quire.

Fig. 8. ULPs of error, 1024-point round-trip FFT

(a) Posits. (b) Floats.

Fig. 9. Percent round-trip errors versus ADC bit resolution for 1024-point FFTs

Can information loss be reduced to zero? It can, as shown in Fig. 9. If we were to use a lower-resolution ADC, with 11 or 10 bits of resolution, the 16-bit posit approach achieves perfect reversibility; no bits are lost. Low-precision ADCs are in widespread use, all the way down to fast 2-bit converters that produce only the values −1, 0, or 1. The oil and gas industry is notorious for moving seismic data by the truckload, literally, and it stores its FFTs of low-resolution ADC output in 32-bit floats to protect against any loss of data that was very expensive to acquire. A similar situation holds for radio astronomy, X-ray crystallography, magnetic resonance imaging (MRI) data, and so on. The use of 16-bit posits could cut all the storage and data-motion costs in half for these applications. Figure 9 shows that the insufficient accuracy of 16-bit floats forces the use of 32-bit floats to achieve lossless reversibility.

5.2 Performance

The performance of the arithmetic operations add, subtract, multiply, and divide from SoftPosit and SoftFloat is measured exhaustively by simulating all possible combinations of the inputs in the range [−1, 1], where most of the FFT calculations occur. Fused multiply-add and fused dot product would require from days to weeks to be exhaustively tested on the selected machine. Consequently, those tests are simplified by reusing function arguments, i.e., restricting the test to run only two nested loops exhaustively instead of three. Ten runs for each operation were performed on a selected core to eliminate performance variations between cores and to remove the variation caused by operating-system interrupts and other interference.
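The argument-reuse trick can be sketched as follows (our own harness, not the paper's): sweeping all 16-bit patterns of a three-argument operation would take 2^48 calls, but reusing one argument as the addend keeps the sweep at the two-loop 2^32 cost. Shown here with a reduced width w so the sweep is cheap to run:

```c
#include <assert.h>
#include <stdint.h>

static unsigned long calls;

static uint16_t dummy_fma(uint16_t a, uint16_t b, uint16_t c) {
    calls++;                          /* stand-in for p16_mulAdd / f16_mulAdd */
    return (uint16_t)(a * b + c);
}

/* Sweep every pair of w-bit patterns, reusing the first argument as
   the third: 2^(2w) calls instead of the 2^(3w) a full sweep needs. */
static void sweep(unsigned w, uint16_t (*op)(uint16_t, uint16_t, uint16_t)) {
    uint32_t lim = 1u << w;
    for (uint32_t a = 0; a < lim; a++)
        for (uint32_t b = 0; b < lim; b++)
            op((uint16_t)a, (uint16_t)b, (uint16_t)a);
}
```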

In the selected input range, float⟨16,5⟩ and posit⟨16,1⟩ have 471951364 and 1073807361 bit patterns respectively. Thus posit⟨16,1⟩ has more accuracy than float⟨16,5⟩. One of the reasons why floats⟨16,5⟩ have fewer than half the bit patterns compared to posits⟨16,1⟩ is the 2048 bit patterns reserved for "non-numbers," i.e., those in which all exponent bits are 1s. "Non-numbers" represent positive and negative infinity and not-a-number (NaN). In comparison, posits do not waste bit patterns: they have only one Not-a-Real (NaR) bit pattern, 100...000, and only one representation for zero, 000...000. Additionally, posits have more values close to ±1 and to 0 than do floats.
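The 2048-pattern figure is easy to verify by brute force: a float⟨16,5⟩ pattern is a "non-number" exactly when its five exponent bits (mask 0x7C00) are all ones.

```c
#include <assert.h>
#include <stdint.h>

/* Count float<16,5> bit patterns whose exponent field is all ones;
   these encode +/-infinity and NaNs rather than finite numbers. */
static unsigned count_nonnumbers(void) {
    unsigned count = 0;
    for (uint32_t p = 0; p < 0x10000u; p++)
        if ((p & 0x7C00u) == 0x7C00u)
            count++;
    return count;                 /* 2 signs * 1024 significands */
}
```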

The performance in operations per second for arguments in the range [−1, 1] is shown in Fig. 10.

Fig. 10. Posit⟨16,1⟩ versus float⟨16,5⟩ performance in the range [−1, 1]

The results show that posits have slightly better performance in multiply and divide operations, while floats have slightly better performance in add, subtract, and FMA operations. "FDP-add" and "FDP-sub" are additions and subtractions of products using the quire when computing a fused dot product. They show higher performance than FMA because one of the arguments, the quire, does not require additional shifting before adding/subtracting the dot product of the two other posit arguments. It also does not need to perform rounding until it completes all accumulations, which saves time. When appropriately used, quires can potentially improve performance while minimizing rounding errors.
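The deferred rounding that makes the quire both fast and accurate can be illustrated with a toy fixed-point dot product. In this sketch of ours, products carry 8 extra fraction bits and a 64-bit integer accumulator stands in for the 128-bit quire; the names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Round a value with 8 extra fraction bits back to the working format
   (round half up). */
static int32_t round8(int64_t exact) {
    return (int32_t)((exact + 128) >> 8);
}

/* Round after every product, as a plain FMA loop must. */
static int32_t dot_round_each(const int32_t *a, const int32_t *b, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += round8((int64_t)a[i] * b[i]);
    return acc;
}

/* Quire-style: accumulate exactly, round once at the end. */
static int32_t dot_quire_style(const int32_t *a, const int32_t *b, int n) {
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)a[i] * b[i];
    return round8(acc);
}
```

When every product is smaller than half a ULP of the working format, per-step rounding discards all of them, while the exact accumulator preserves their sum.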


6 Conclusions and Future Work

We have shown that 16-bit posits outperform 16-bit floats and fixed point in accuracy for radix-2 and radix-4, 1024- and 4096-point FFTs of both DIT and DIF classes. To obtain accuracy similar to that of 16-bit posits, 32-bit floats would have to be used. When ADC inputs are 11 bits or smaller, 16-bit posits can compute completely lossless "round-trip" FFTs. 16-bit posits have computation performance comparable to that of 16-bit floats, but approximately twice the performance on bandwidth-bound tasks such as the FFT. Because posit arithmetic is still in its infancy, the performance results shown here, obtained using an in-house software emulator, SoftPosit, are preliminary.

While we have here studied examples from the low-precision side of HPC, the advantages of posits should also apply to high-precision FFT applications such as ab initio computational chemistry, radar cross-section analysis, and the solution of partial differential equations (PDEs) by spectral methods. For some users, posits might be desirable to increase accuracy at the same precision, instead of to enable the use of half as many bits per variable. A 32-bit posit is nominally 16 times as accurate per operation as a 32-bit float in terms of its fraction bits, though tests on real applications show the advantage to be more like 50 times as accurate [26], probably because of the accumulation of error in time-stepping physical simulations.

Another area for future FFT investigation is the Winograd FFT algorithm [35, 36], which trades multiplications for additions. Normally, summing numbers of similar magnitude and opposite sign magnifies any relative error in the addends, whereas multiplications are always accurate to 0.5 ULP, so Winograd's approach might seem dubious for floats or posits. However, the range of values where FFTs take place is rich in additions that make no rounding errors, so this deserves investigation.

It is crucial to note that the advantage of posits is not limited to FFTs, and we expect to expand experimental comparisons of float and posit accuracy to other algorithms such as linear equation solution and matrix multiplication, for both HPC and machine learning (ML) purposes. One promising area for 32-bit posits is weather prediction (which frequently relies on FFTs that are typically performed with 64-bit floats).

Large-scale simulations typically achieve only single-digit percentages of the peak speed of an HPC system, which means the arithmetic units spend most of their time waiting for operands to be communicated to them. Hardware bandwidths have been increasing very slowly for the last several decades, and there has been no steep trend line for bandwidth like the one for transistor size and cost (Moore's law). High-accuracy posit arithmetic permits the use of reduced data sizes, which promises to provide dramatic speedups not just for FFT kernels but for the very broad range of bandwidth-bound applications that presently rely on floating-point arithmetic.
