Lossless FFTs Using Posit Arithmetic
Siew Hoon Leong¹ and John L. Gustafson²
¹ Swiss National Supercomputing Centre, ETH Zurich, Zurich, Switzerland
cerlane.leong@cscs.ch
² Arizona State University, Tempe, USA
jlgusta6@asu.edu
Abstract. The Fast Fourier Transform (FFT) is required for chemistry, weather, defense, and signal processing for seismic exploration and radio astronomy. It is communication-bound, making supercomputers thousands of times slower at FFTs than at dense linear algebra. The key to accelerating FFTs is to minimize bits per datum without sacrificing accuracy. The 16-bit fixed-point and IEEE float types lack sufficient accuracy for 1024- and 4096-point FFTs of data from analog-to-digital converters. We show that the 16-bit posit, with higher accuracy and larger dynamic range, can perform FFTs so accurately that a forward-inverse FFT restores the original signal perfectly. "Reversible" FFTs with posits are lossless, eliminating the need for 32-bit or higher precision. Similarly, 32-bit posit FFTs can replace 64-bit float FFTs for many HPC tasks. Speed, energy efficiency, and storage costs can thus be improved by 2x for a broad range of HPC workloads.
Keywords: Posit · Quire · FFT · Computer Arithmetic
1 Introduction
The posit™ number format is the rounded form of Type III universal number (unum) arithmetic [13,16]. It evolved from Type II unums in December 2016 as a hardware-friendly drop-in alternative to the floating-point IEEE Std 754™ [19]. The tapered accuracy of posits allows them to have more fraction bits in the most commonly used range, thus enabling posits to be more accurate than floating-point numbers (floats) of the same size, yet have an even larger dynamic range than floats. Posit arithmetic also introduces the quire, an exact accumulator for fused dot products, that can dramatically reduce rounding errors.
The computation of the Discrete Fourier Transform (DFT) using the Fast Fourier Transform (FFT) algorithm has become one of the most important and powerful tools of High Performance Computing (HPC). FFTs are investigated here to demonstrate the speed and accuracy of posit arithmetic when compared to floating-point arithmetic. Improving FFTs can potentially improve the performance of HPC applications such as CP2K [18], SpecFEM3D [20,21], and
Supported by A*STAR and NSCC Singapore.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
J. Gustafson et al. (Eds.): CoNGA 2023, LNCS 13851, pp. 1–18, 2023.
https://doi.org/10.1007/978-3-031-32180-1_1
WRF [31], leading to higher speed and improved accuracy. "Precision" here refers to the number of bits in a number format, and "accuracy" refers to the correctness of an answer, measured in the number of correct decimals or correct bits. Posits achieve orders-of-magnitude smaller rounding errors when compared to floats that have the same precision [26,27]. Thus, the commonly used 32-bit and 64-bit precision floats can potentially be replaced with 16-bit and 32-bit posits respectively in some applications, doubling the speed of communication-bound computations and halving the storage and power costs.
We say FFT accuracy is lossless when the inverse FFT reproduces the original signal perfectly; that is, the FFT is reversible. The rounding error of 16-bit IEEE floats and the 16-bit fixed-point format adds too much noise for those formats to perform lossless FFTs, which forces programmers to use 32-bit floats for signal processing tasks. In contrast, 16-bit posits have enough accuracy for a "round trip" to be lossless at the resolution of common Analog-to-Digital Converters (ADCs) that supply the input data. Once reversibility is achieved, the use of more bits of precision is wasteful since the transformation retains all information.
The paper is organized as follows: In Sect. 2, related work on posits, fixed-point FFTs, and floating-point FFTs is surveyed. Background information on the posit and quire formats and the FFT is provided in Sect. 3. Section 4 presents the approach used to evaluate the accuracy and performance of radix-2 and radix-4 (1024- and 4096-point) FFTs (Decimation-In-Time and Decimation-In-Frequency) using both 16-bit posits and 16-bit floats. The results of the evaluation are discussed in Sect. 5. Finally, the conclusions and plans for future work are presented in Sect. 6.
2 Related Work
Posits are a new form of computer arithmetic invented by Gustafson in December 2016. The concept was first publicly shared as a Stanford lecture seminar [16] in February 2017. The first peer-reviewed posit journal paper [15] was published in June 2017. Since then, studies on posit correctness [6], accuracy and efficiency when compared to floats [26], and various software and Field-Programmable Gate Array (FPGA) implementations [5,24,29] have been performed. Due to the flexibility to choose the precision required and express a high dynamic range using very few bits, researchers have found posits particularly well-suited to machine learning applications. The work in [23] demonstrated that very-low-precision posits outperform fixed-point and all other tested formats for inference, thus improving speed for time-critical AI tasks such as self-driving cars.
Efforts to improve DFTs by reducing the number of operations can be traced back to the work of Cooley and Tukey [8] in 1965, whose improvements based on the algorithm of Good [12] reduced the operation complexity from O(N²) to O(N log₂ N), now called FFTs [7, p. 1667]. Additional work to further improve the FFT algorithm led to radix-2^m algorithms [9,30], the Rader-Brenner algorithm [30], the Winograd algorithm (WFTA) [35,36], and prime factor algorithms (PFA) [11,33]. In practice, radix-2, radix-4, and split-radix are the most widely adopted types.
The effect of arithmetic precision on performance has been studied [1]. For optimum speed, the lowest possible bit precision should be used that still meets accuracy requirements. Fixed-point, as opposed to floating-point, is traditionally used to implement FFT algorithms in custom Digital Signal Processing (DSP) hardware [3] due to the higher cost and complexity of floating-point logic. Fixed-point reduces power dissipation and achieves higher speed [3,25]. Custom Application-Specific Integrated Circuits (ASICs) and FPGAs allow the use of unusual and variable word sizes without a speed penalty, but that flexibility is not available in a general programming environment.
3 Background
In this section, the posit format, the corresponding quire exact accumulator format, and the FFT algorithm are discussed.
3.1 Posits
Posits are much simpler than floats, which can potentially result in faster circuits requiring less chip area [2]. The Posit Standard (2022) specifies the format and its required operations. The main advantages of posits over floats are:
– Higher accuracy for the most commonly used value range
– 1-to-1 map of signed binary integers to ordered real numbers
– Bitwise reproducibility across all computing systems
– Increased dynamic range
– More information per bit (higher Shannon entropy)
– Only two exception values: zero and Not-a-Real (NaR)
– Support for associative and distributive laws
The differences among 16-bit float, fixed-point, and posit formats are displayed in Fig. 1, where each color block is a bit, i.e. 0 or 1. The colors depict the fields (sign, exponent, fraction, integer, or regime) that the bits represent.
A 16-bit float (Fig. 1a) consists of a sign bit and 5 exponent bits, leaving 10 bits for the fraction after the "hidden bit." If signal data is centered about 0 so the sign bit is significant, a 16-bit float is capable of storing signed values from ADCs with up to 12 bits of output, but no larger.
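As a quick sanity check on that claim, the sketch below (our own, using NumPy's `float16` as the IEEE half-precision type) verifies that every signed 12-bit ADC code, scaled into [-1, 1), converts to a 16-bit float and back without loss:

```python
import numpy as np

# Every 12-bit ADC code k in [-2048, 2048), scaled to [-1, 1), needs at
# most 11 significand bits (hidden bit + 10 fraction bits), so IEEE
# half precision (float16) represents it exactly.
exact = all(
    float(np.float16(k / 2048.0)) == k / 2048.0
    for k in range(-2048, 2048)
)
print(exact)  # True: a 12-bit ADC sample fits losslessly in a 16-bit float
```

A 13-bit converter would break this: odd 13-bit codes need 12 significand bits, one more than half precision carries.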
Although the number of bits to the left of the radix point can be flexibly chosen for a fixed-point format (Fig. 1b), 1024-point FFTs of data in the range -1 to 1 can produce values between -32 and 32 in general. Therefore, a 16-bit fixed-point format for the FFT requires integer bits that increase from 2 to 6 with the stages of the FFT, leaving 13 to 9 bits respectively for the fraction part. At first glance, fixed-point would appear to have the best accuracy for FFTs, since it allows the maximum possible number of fraction bits. However, towards the final stage of a 1024-point FFT computation, a 16-bit float will still have 10 fraction bits (excluding the hidden bit) while fixed-point will only have 9 fraction bits to accommodate the larger worst-case integer part it needs to store to avoid catastrophic overflow. For 4096-point FFTs, fixed-point will only have 8 fraction bits. Posits will have 10 to 12 fraction bits for the results of the FFT. Consequently, 16-bit fixed-point has the lowest accuracy among the three number formats for both 1024- and 4096-point FFTs; posits have the highest accuracy (see Fig. 2). Note also that the "twiddle factors" are trigonometric functions in the range -1 to 1, which posits represent with about 0.6 decimals greater accuracy than floats.
As with floats and integers, the most significant bit of a posit indicates the sign. The "regime" bits use signed unary encoding requiring 2 to 15 bits (Fig. 1c). Accuracy tapers, with the highest accuracy for values with magnitudes near 1 and less accuracy for the largest- and smallest-magnitude numbers. Posit arithmetic hardware requires integer adders, integer multipliers, shifters, leading-zero counters, and AND trees very similar to those required for IEEE floats; however, posit hardware is simpler in having a single rounding mode, no internal flags, and only two exception values to deal with. Comparison operations are those of integers; no extra hardware is needed. Proprietary designs show a reduction in gate count for posits versus floats, for both FPGA and VLSI designs, and a reduction in operation latency [2].
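To make the tapered-accuracy tradeoff concrete, the hypothetical helper below counts how many fraction bits a posit⟨16,1⟩ has left after the sign, regime, and one-bit exponent fields are paid for, given a value's magnitude. The field-width arithmetic follows the format description above; the function name is ours.

```python
import math

def p16_fraction_bits(x):
    """Fraction bits available in a posit<16,1> for magnitude x (sketch).

    useed = 2^(2^eS) = 4, so the regime value k is floor(e/2) for binary
    exponent e; the regime field is a run of identical bits plus one
    terminating bit, capped at the 15 bits that follow the sign bit."""
    assert x > 0
    e = math.floor(math.log2(x))
    k = e // 2                               # regime value
    run = k + 1 if k >= 0 else -k            # length of the identical-bit run
    regime_bits = min(run + 1, 15)           # run + terminator, capped
    return max(0, 16 - 1 - regime_bits - 1)  # minus sign, regime, eS fields

# Near magnitude 1 a posit<16,1> keeps 12 fraction bits (vs. 10 for a
# 16-bit float); accuracy tapers away toward the extremes of the range.
print(p16_fraction_bits(1.0), p16_fraction_bits(8.0), p16_fraction_bits(2.0**27))
```

The two extra fraction bits in the most-used range are the source of the roughly 4x rounding-error advantage reported in Sect. 5.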
The dynamic range for IEEE Standard 16-bit floats is from 2^-24 to 65504, or about 6.0 × 10^-8 to 6.5 × 10^4 (12 orders of magnitude). Floats use tapered accuracy for small-magnitude ("subnormal") values only, making their dynamic range unbalanced about 1. The reciprocal of a small-magnitude float overflows to infinity. For 16-bit posits, the use of a single eS exponent bit allows expression of magnitudes from 2^-28 to 2^28, or about 3.7 × 10^-9 to 2.7 × 10^8 (almost 17 orders of magnitude). Posit tapered precision is symmetrical about magnitude 1, and reciprocation is closed and exact for integer powers of 2. Thus, the accuracy advantage of posits does not come at the cost of reduced dynamic range.
Fig. 1. Different number formats: (a) floating-point; (b) fixed-point; (c) posit
Fig. 2. Comparison of 16-bit posits and floats for accuracy
3.2 The Quire Register
The concept of the quire [14, pp. 80–84], a fixed-point scratch value, originates from the work of Kulisch and Miranker [22]. The quire data type accumulates the additions and subtractions of products of two posits using exact fixed-point arithmetic, with no rounding errors. When the accumulated result is converted back to posit form with a single rounding, the result is a "fused dot product." Thus, a quire data type only needs to support add and subtract operations, and obey the same rules as integer add and subtract operations (augmented with the ability to handle a NaR input value).
To store the result of a fused dot product without any rounding, a quire data type must minimally support the range [minPos², maxPos²], where minPos is the smallest expressible real greater than zero and maxPos is the biggest expressible real, for a particular n-bit posit. Since there will be a need for the quire to accumulate the results of fused dot products of long vectors, an additional n - 1 bits are prepended to the most significant bits as carry overflow protection. Thus, a 16-bit posit with a 1-bit eS (posit⟨16,1⟩) will have a corresponding 128-bit quire, notated quire128⟨16,1⟩.
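The required quire width follows directly from that range. The sketch below (our own variable names) recomputes the 128-bit figure for posit⟨16,1⟩, whose maxPos is 2^28:

```python
# Quire width for posit<16,1>: cover [minPos^2, maxPos^2] exactly in
# fixed point, plus a sign bit and n-1 carry-guard bits.
n = 16
max_pos_exp = 28              # maxPos = 2^28, minPos = 2^-28
frac_bits = 2 * max_pos_exp   # to hold minPos^2 = 2^-56 exactly
int_bits = 2 * max_pos_exp    # to hold maxPos^2 = 2^56
sign_bit = 1
carry_guard = n - 1           # headroom for accumulating long vectors
quire_width = sign_bit + int_bits + frac_bits + carry_guard
print(quire_width)  # 128
```

The same arithmetic for a 32-bit posit with eS = 2 (maxPos = 2^120) gives the 512-bit quire used at higher precision.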
The use of the quire reduces cumulative rounding error, as will be demonstrated in Sects. 4 and 5, and enables correctly rounded fused dot products. Notice that the complex multiply-add in an FFT can be expressed as a pair of dot products, so all of the complex rotations in the FFT need incur only one rounding per real and imaginary part, instead of four (if all operations are rounded) or two (if fused multiply-add is used).
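A minimal model of that observation: the butterfly rotation u + w·t can be computed as two real three-term dot products, each accumulated exactly and rounded once. In this sketch (our own names) Python's exact `Fraction` arithmetic stands in for the quire:

```python
from fractions import Fraction

def twiddle_mac(ur, ui, wr, wi, tr, ti):
    """u + w*t for complex u, w, t, as two exact dot products.

    Each part is accumulated without rounding (Fraction stands in for
    the quire) and rounded once at the end, instead of once per
    multiply and per add."""
    re = Fraction(ur) + Fraction(wr) * Fraction(tr) - Fraction(wi) * Fraction(ti)
    im = Fraction(ui) + Fraction(wr) * Fraction(ti) + Fraction(wi) * Fraction(tr)
    return float(re), float(im)   # one rounding per real/imaginary part

# Rotating t = 1 by w = i and adding u = 1 gives 1 + i:
print(twiddle_mac(1, 0, 0, 1, 1, 0))  # (1.0, 1.0)
```

Real posit hardware performs the same deferral with the 128-bit quire register rather than arbitrary-precision rationals.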
3.3 The FFT Algorithm
The discrete form of the Fourier transform (DFT) can be written as

X_k = (1/√N) Σ_{n=0}^{N-1} x_n e^{-2πikn/N},   k = 0, ..., N-1,   (1)

where x_n is the real-valued sequence of N data points and X_k is the complex-valued sequence of N data points. The sum is scaled by 1/√N such that the inverse DFT has the same form other than the sign of the exponent, which requires reversing the direction of the angular rotation factors (the imaginary part of the complex value), commonly known as "twiddles" or "twiddle factors":

x_n = (1/√N) Σ_{k=0}^{N-1} X_k e^{2πikn/N},   n = 0, ..., N-1.   (2)

Following this convention, the twiddle factor e^{-2πikn/N} will be written as w for short. While it is also possible to have no scaling in one direction and a scaling of 1/N in the other, this has only the merit that it saves one operation per point in a forward-inverse transformation. Scaling by 1/√N makes both forward and inverse transforms unitary and consistent. The forms shown in Eqs. 1 and 2 have the additional advantage that they keep intermediate values from growing in magnitude unnecessarily, a property that is crucial for fixed-point arithmetic to prevent overflow, and desirable for posit arithmetic since it maximizes accuracy. The only variant from traditional FFT algorithms used here is that the data set is scaled by 0.5 on every radix-4 pass, or every other pass of a radix-2 FFT. This automatically provides the 1/√N scaling while keeping the computations in the range where posits have maximum accuracy. The scaling by 0.5 can be incorporated into the twiddle factor table to eliminate the cost of an extra multiply operation.
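The per-pass 0.5 scaling can be illustrated with a plain radix-2 implementation. The sketch below is our own pure-Python DIT FFT, not the paper's SoftPosit code: it scales by 0.5 on every other pass, so the accumulated scale after log₂ N passes is exactly 1/√N and the forward/inverse pair is unitary.

```python
import cmath

def fft_scaled(x, inverse=False):
    """Iterative radix-2 DIT FFT with 0.5 scaling on alternate passes,
    yielding the 1/sqrt(N) unitary scaling (N an even power of two)."""
    n = len(x)
    m = n.bit_length() - 1
    assert 1 << m == n and m % 2 == 0
    sign = 1.0 if inverse else -1.0
    # Bit-reversal permutation of the input indices.
    a = [x[int(format(i, '0%db' % m)[::-1], 2)] for i in range(n)]
    for p in range(m):
        half = 1 << p
        scale = 0.5 if p % 2 == 1 else 1.0   # scale by 0.5 every other pass
        for start in range(0, n, 2 * half):
            for j in range(half):
                w = cmath.exp(sign * 2j * cmath.pi * j / (2 * half))
                u, t = a[start + j], w * a[start + j + half]
                a[start + j] = (u + t) * scale
                a[start + j + half] = (u - t) * scale
    return a

# Round trip: forward followed by inverse restores the input (to roundoff).
x = [complex(i % 5 - 2, 0) for i in range(16)]
y = fft_scaled(fft_scaled(x), inverse=True)
print(max(abs(a - b) for a, b in zip(x, y)) < 1e-12)  # True
```

In a production kernel the 0.5 would be folded into the twiddle table, as the paper notes, rather than multiplied separately.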
The FFT is a form of the DFT that uses the fact that the summations can be represented as a matrix-vector product, and the matrix can be factored to reduce the computational complexity to N log₂ N operations. In this paper, two basic classes of FFT algorithms, Decimation-In-Time (DIT) and Decimation-In-Frequency (DIF), will be discussed. DIT algorithms decompose the time sequences into successively smaller subsequences, while DIF algorithms decompose the coefficients into smaller subsequences [28].
Traditional analysis of the computational complexity of the FFT centers on the number of multiplications and additions. The kernel operation of an FFT is often called a "butterfly" because of its dataflow pattern (see Fig. 3). The original radix-2 algorithm performs 10 operations (6 additions and 4 multiplications) per butterfly [34, p. 42], highlighted in red in Fig. 3, and there are (1/2) N log₂ N butterflies, so the operation complexity is 5 N log₂ N for large N. Use of radix-4 reduces this to 4.5 N log₂ N; split-radix methods are 4 N log₂ N, and with a little more whittling away this can be further reduced to about 3.88 N log₂ N [32]. Operation count, however, is not the key to increasing FFT performance, since the FFT is communication-bound and not computation-bound.
Supercomputer users are often surprised by the tiny fraction of peak performance they obtain when they perform FFTs while using highly optimized vendor libraries that are well-tuned for a particular system. The TOP500 list shows
Fig. 3. A radix-2 "butterfly" calculation
many systems achieving over 80% of their peak performance for multiply-add operations in a dense linear algebra problem. FFTs, which have only multiply and add operations and predetermined dataflow for a given size N, might be expected to achieve similar performance. However, traditional complexity analysis, which counts the number of operations, does a poor job of predicting actual FFT execution timings.
FFTs are thus the Achilles' heel of HPC, because they tax the most valuable resource: data motion. In an era when performance is almost always communication-bound and not computation-bound, it is more sensible to optimize the data motion as opposed to the operation count. Dense linear algebra involves order N³ operations but only order N² data motion, making it one of the few HPC workloads that is still compute-bound and for which operation count correlates well with peak arithmetic performance. While some have studied the communication aspects of the FFT based on a single processor with a cache hierarchy (a valid model for small-scale digital signal processing), supercomputers are increasingly limited by the laws of physics and not by architecture. The communication cost for the FFT (in the limit where data access is limited by the speed of light and its physical distance) is thus not order N log₂ N.
Figure 4 shows a typical (16-point) FFT diagram where the nodes (brown dots) and edges (blue lines) represent the data points and communications in each stage, respectively. The figure illustrates a DIT FFT, but a DIF FFT is simply its mirror image, so the following argument applies to either case. For any input on the left side, data travels in the y dimension by absolute distance 1, 2, 4, ..., N/2 positions, a total motion of N - 1 positions. Simplistic models of performance assume all edges are of equal time cost, but this is not true if the physical limits on communication speed are considered. The total motion cost of N - 1 positions holds for each of the N data points, hence the total communication work is order N², the same order complexity as a DFT without any clever factoring. This assumes memory is physically placed in a line. In a real system like a supercomputer cluster covering many square meters, memory is distributed over a plane, for which the average distance between locations is order N^(1/2), or in a volume, for which the average distance is order N^(1/3). Those configurations result in physics-limited FFT communication complexity of order N^(3/2) or N^(4/3) respectively, both of which grow faster with N than does N log₂ N. It is possible
to do all-to-all exchanges partway through the FFT to make communications local again, but this merely shifts the communication cost into the all-to-all exchange. Thus, it is not surprising that large-scale supercomputers attain only a fraction of their peak arithmetic performance when performing FFTs.
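The distance argument above is easy to check numerically; this small helper (our own) sums the per-stage vertical distances 1, 2, 4, ..., N/2 of the butterfly diagram:

```python
def total_butterfly_motion(n):
    """Total vertical distance one input travels across all log2(n)
    stages of the FFT diagram: 1 + 2 + 4 + ... + n/2 = n - 1."""
    assert n > 1 and n & (n - 1) == 0      # n must be a power of two
    return sum(1 << s for s in range(n.bit_length() - 1))

print(total_butterfly_motion(16))   # 15 positions, as in the Fig. 4 diagram
# Multiplied by n data points, the total communication work is order n^2.
```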
This observation points us to a different approach: reduce the bits moved, not the operation count. The communication cost grows linearly with the number of bits used per data item. The use of a data format, i.e. posits, that has considerably more information per bit than IEEE 754 floating-point numbers can generate answers with acceptable accuracy using fewer bits. In the following section, 16-bit posits will be used to compute FFTs with higher accuracy than 16-bit (IEEE half-precision) floats, potentially allowing 32-bit floats to be safely replaced by 16-bit posits, doubling the speed (by halving the communication cost) of signal and image processing workloads. We speculate that similar performance doubling is possible for HPC workloads that presently use 64-bit floats for FFTs, by making them sufficiently accurate using 32-bit posits.
4 Approach
4.1 Accuracy
To test the effectiveness of posits, 1024- and 4096-point FFTs are used as the sizes most commonly found in the literature. Both radix-2 and radix-4 methods are studied here, but not split-radix [9,30]. Although modified split-radix has the smallest operation count (about 3.88 N log₂ N), fixed-point studies [4] show it has poorer accuracy than radix-2 and radix-4 methods.
Both DIT and DIF methods are tested. The DIF approach introduces multiplicative rounding early in the processing. Intuition says this might pollute the later computations more than the DIT method, where the first passes perform no multiplications. Empirical tests are conducted to check this intuition with three numerical approaches:
Fig. 4. A typical FFT diagram
– 16-bit IEEE standard floats, with maximum use of fused multiply-add operations to reduce rounding error in the multiplications of complex numbers,
– 16-bit posits (eS = 1) with exactly the same operations as used for floats, and
– 16-bit posits using the quire to further reduce rounding error to one rounding per pass of the FFT.
For each of these 24 combinations of data-point size, radix, decimation type, and numerical approach, random uniform-distribution input signals in the range [-1, 1) at the resolution of a 12-bit ADC are created. The 12-bit fixed-point ADC inputs are first transformed into their corresponding 16-bit posits and floats as shown in Fig. 5. A "round-trip" (forward followed by inverse) FFT is then applied before the results are converted (with rounding) back to the 12-bit ADC fixed-point format (represented by ADC′ in Fig. 5). If no errors and roundings occur, the original signal is recovered perfectly, i.e. ADC′ = ADC.
Fig. 5. A round-trip FFT for a 12-bit ADC
The absolute error of a 12-bit ADC input can thus be computed as

AbsoluteError = |ADC′ - ADC|.   (3)

This error represents the rounding errors incurred by posits and floats respectively.
To evaluate the accuracy of posits and floats, the vector of absolute errors of all ADC inputs is evaluated. Three flavors of measures are computed over this vector: the maximum (L∞ norm), RMS (L2 norm), and average (L1 norm).
To gain additional insight into the error, the units-in-the-last-place (ULP) metric is used. As shown by Goldberg [10, p. 8], the ULP error measure is superior to relative error for measuring pure rounding error. The ULP error is computed as shown in Eq. 4. For a 12-bit fixed-point ADC, ulp(ADC) is a constant value (2^-11 for input in the [-1, 1) range).
ULPerror = |ADC′ - ADC| / ulp(ADC),   (4)

where ulp(ADC) is one unit in the last place of the ADC value.
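The error pipeline of Fig. 5 and Eqs. 3–4 can be sketched as below. This is not the paper's SoftPosit experiment: as a stand-in, a float64 unitary FFT round trip is used with the intermediate spectrum rounded through float16 (emulating storage rounding only), and the result is requantized to the 12-bit ADC grid; all names are ours.

```python
import numpy as np

N = 1024
ulp = 2.0 ** -11                    # one ADC ULP for inputs in [-1, 1)
rng = np.random.default_rng(0)
adc_in = rng.integers(-2048, 2048, size=N) * ulp   # 12-bit ADC samples

# Stand-in round trip: unitary forward/inverse FFT in float64 with the
# spectrum rounded through float16 (models storage rounding only).
spec = np.fft.fft(adc_in) / np.sqrt(N)
spec = (spec.real.astype(np.float16).astype(np.float64)
        + 1j * spec.imag.astype(np.float16).astype(np.float64))
back = np.real(np.fft.ifft(spec * np.sqrt(N)))
adc_out = np.clip(np.round(back / ulp), -2048, 2047) * ulp   # requantize

abs_err = np.abs(adc_out - adc_in)             # Eq. 3, per sample
print("L-inf:", abs_err.max())                 # maximum norm
print("RMS  :", np.sqrt(np.mean(abs_err**2)))  # L2 norm
print("L1   :", abs_err.mean())                # average norm
print("ULP errors:", abs_err / ulp)            # Eq. 4, multiples of one ULP
```

Because both endpoints of Eq. 3 lie on the ADC grid, every absolute error here is an exact integer multiple of ulp(ADC), which is what makes the "0.5 ULP cliff" discussion in Sect. 5 meaningful.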
In the case of posits and floats, even if an answer is correctly rounded, there will be an error of as much as 0.5 ULP. With each additional arithmetic operation, the errors accumulate. Consequently, to minimize the effect of rounding errors, it is important to round as seldom as possible, e.g. by using the quire to defer rounding as long as possible.
Because an ADC handles only a finite range of uniformly spaced points adjusted to the dynamic range it needs to handle, a Gaussian distribution, which is unbounded, is deemed unsuitable for this evaluation. A uniformly spaced distribution that is bounded to the required range is used instead. Preliminary tests were also conducted on bell-shaped distributions (truncated Gaussian distributions) confined to the same range, [-1, 1), representing three standard deviations from the mean at 0; they yielded results similar to those for the uniform-distribution tests presented here.
For 16-bit fixed-point, we rely on analysis because it obviates experimentation. After every pass of an FFT, the FFT values must be scaled by 1/2 to guarantee there is no overflow. For an FFT with 2^(2k) points, the result of the 2k scalings will be an answer that is too small by a factor of 2^k, so it must be scaled up by a factor of 2^k, shifting left by k bits. This introduces zeros on the right that reveal the loss of accuracy of the fixed-point approach. The loss of accuracy is 5 bits for a 1024-point FFT, and 6 bits for a 4096-point FFT. In FPGA development, it is possible to use non-power-of-two data sizes easily, and fixed-point can be made to yield acceptable 1024-point FFT results if the data points have 18 bits of precision [25]. Since fixed-point requires much less hardware than floats (or posits), this is an excellent approach for special-purpose FPGA designs. In the more general computing environment where data sizes are a power-of-two bits in size, a programmer using the fixed-point format has little choice but to upsize all the values to 32-bit size. The same will be shown true for 16-bit floats, which cannot achieve acceptable accuracy.
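That analysis reduces to one line of arithmetic; the helper below (our own naming) returns the bits of accuracy a 16-bit fixed-point FFT gives up for a given transform size:

```python
import math

def fixed_point_bits_lost(n_points):
    """Accuracy loss, in bits, of a 16-bit fixed-point FFT that scales
    by 1/2 on every pass: with N = 2^(2k) points, the final scale-up by
    2^k shifts k zero bits into the result."""
    m = int(math.log2(n_points))
    assert 1 << m == n_points and m % 2 == 0
    return m // 2

print(fixed_point_bits_lost(1024), fixed_point_bits_lost(4096))  # 5 6
```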
4.2 Performance
The performance of large-scale FFTs is communication-bound, as pointed out in Sect. 3.3. The reduction in the size of operands not only reduces time proportionately, but also reduces cache spill effects. For example, a 1024-by-1024 2D FFT computed with 16-bit posits will fit in a 4 MB cache. If the FFT were performed using 32-bit floats to achieve sufficient accuracy, the data would not fit in cache and the cache "spill" would reduce performance dramatically, by a factor of more than two. This is a well-known effect.
However, there is a need to show that posit arithmetic can be as fast as float arithmetic, possibly faster. Otherwise, the projected bandwidth savings might be offset by slower arithmetic. Until VLSI processors using posits as a native type are complete, arithmetic performance comparisons between posits and floats of the same bit size can be performed with similar implementations in software.
A software library, SoftPosit, is used in this study. It is closely based on Berkeley SoftFloat [17] (Release 3d). Similar implementation and optimization techniques are adopted to enable a fair comparison of the performance of 16-bit posits versus 16-bit floats. Note: the performance results on posits are preliminary since SoftPosit is a new library; 26 years of optimization effort have been put into Berkeley SoftFloat.
Table 1. Test machine specification
The specification of the machine used to evaluate the performance is shown in Table 1. Both SoftPosit and SoftFloat are compiled with GNU GCC 4.8.5 with optimization level "O2" and architecture set to "core-avx2".
The arithmetic operations of posit⟨16,1⟩ and quire128⟨16,1⟩, and of IEEE Standard half-precision floats (float⟨16,5⟩), are shown in Table 2. Each operation is implemented using integer operators in C. With the exception of the fused dot product (a posit arithmetic functionality that is not in the IEEE 754 Standard), there is an equivalent posit operation for every float operation shown in Table 2.
The most significant rounding errors that influence the accuracy of DFT algorithms occur in each butterfly calculation, the bfly routine. To reduce the number of rounding errors, fused multiply-adds are leveraged. Posit arithmetic can perform fused multiply-adds or, better, leverage fused dot products with the quire data type to further reduce the accumulation of rounding errors.
The twiddle factors are obtained using a precomputed cosine table with 1024 points to store the values of cos(0) to cos(π/2). The sine and cosine values for the entire unit circle are found through indexed reflections and negations of these discrete values.
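A quarter-wave table of that kind can be sketched as follows. The index arithmetic is ours, and the lookup is exact whenever the FFT size divides 4 × 1024, as it does for the 1024- and 4096-point cases studied here:

```python
import math

Q = 1024                                  # table covers cos(0)..cos(pi/2)
cos_tab = [math.cos(math.pi * i / (2 * Q)) for i in range(Q + 1)]

def cos_twiddle(i, n):
    """cos(2*pi*i/n) via reflections/negations of the quarter-wave table."""
    j = (i * 4 * Q // n) % (4 * Q)        # map onto one full period
    if j <= Q:
        return cos_tab[j]                 # first quadrant: direct
    if j <= 2 * Q:
        return -cos_tab[2 * Q - j]        # second quadrant: reflect, negate
    if j <= 3 * Q:
        return -cos_tab[j - 2 * Q]        # third quadrant: negate
    return cos_tab[4 * Q - j]             # fourth quadrant: reflect

def sin_twiddle(i, n):
    """sin(2*pi*i/n) = cos(2*pi*(i - n/4)/n), for n divisible by 4."""
    return cos_twiddle(i - n // 4, n)
```

A usage example: `cos_twiddle(100, 1024)` returns cos(2π·100/1024) from the table without any trigonometric evaluation at run time.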
A 1D FFT with input sizes 1024 and 4096 is used to check the computational performance with posit⟨16,1⟩ without quire, posit⟨16,1⟩ with quire, and float⟨16,5⟩, pinning each run to the same selected core.
5 Results
5.1 Accuracy
Figure 6 shows the average RMS errors for 1024-point tests, representing hundreds of thousands of experimental data points. The RMS error bar graph for the 4096-point FFTs looks very similar, but the bars are uniformly about 12% higher. The vertical axis represents Units in the Last Place (ULP) at the resolution of
Table 2. Arithmetic operations

Arithmetic operation   | posit⟨16,1⟩ function | float⟨16,5⟩ function
Add                    | p16_add              | f16_add
Subtract               | p16_sub              | f16_sub
Multiply               | p16_mul              | f16_mul
Divide                 | p16_div              | f16_div
Fused multiply-add     | p16_mulAdd           | f16_mulAdd
Fused dot product-add  | q16_fdp_add          | Not applicable
Fused dot product-sub  | q16_fdp_sub          | Not applicable
a 12-bit ADC, as described in the previous section. Besides the RMS error, L1 and L∞ errors are calculated and show a nearly identical pattern, differing only in the vertical scale.
Fig. 6. RMS errors per value × 10⁶ for 1024-point FFTs
The obvious difference is that 16-bit posits have about 1/4 the rounding error of floats when running algorithms that round at identical places in the dataflow. This follows from the fact that most of the operations occur where posits have two bits greater accuracy in the fraction part of their format. The use of the quire further reduces errors, by as much as 1.8x for the case of the radix-4 form of the FFT. (Similarly, rounding error is about 16 times less for 32-bit posits than for 32-bit floats, since 32-bit posits have 28 bits in the significand for values with magnitudes between 1/16 and 16, compared with 24 bits in the significand of a standard float.)
The other differences, between DIT and DIF, or between radix-2 and radix-4, are subtle but statistically significant and repeatable. For posits without the quire, radix-4 is slightly less accurate than radix-2 because intermediate calculations in the longer dot product can stray into the regions where posits have only one bit instead of two bits of greater fraction precision. However, the quire provides a strong advantage for radix-4, reducing the number of rounding events per result to only 4 per point for a 1024-point FFT, and 4 more for the inverse.
However, Fig. 6 understates the significance of the higher accuracy of posits. The original ADC signal is the "gold standard" for the correct answer. Both float and posit computations make rounding errors in performing a round-trip FFT. Additionally, we round that result again to express it in the fixed-point format used by the ADC, as shown in Fig. 5. Once the result is accurate to within 0.5 ULP of the ADC input format, it will round to the original value with no error. But if the result is more than 0.5 ULP from the original ADC value, it "falls off a cliff" and rounds to the wrong fixed-point number. Because of this insight, we focus on the number of bits wrong compared to the original signal (measured in ADC ULPs) and not measures such as RMS error or dB signal-to-noise ratios, which are more appropriate when numerical errors are frequent and pervasive.
Suppose we measure and sum the absolute ULPs of error for every data point in a round-trip (radix-2, DIF) 1024-point FFT. Figure 7a shows that the massive losses produced by 16-bit floats preclude their use for FFTs.
For posits, on the other hand, 97.9% of the values make the round trip with all bits identical to the original value. The 2.1% that are off are off by only 1 ULP. While the reversibility is not mathematically perfect, it is nearly so, and may be accurate enough to eliminate the need to use a 32-bit data representation. The bar chart shows 16-bit posits to be about 36 times as accurate as 16-bit floats in preserving the information content of the original data. The ratio is similar for a 4096-point FFT, shown in Fig. 7b; the two plots are almost indistinguishable except for the vertical scale.
Figure 8a shows another way to visualize the error, plotting the error in ULPs as a function of the data point (real-part errors in blue, imaginary-part errors in orange). The errors are as large as six ULPs from the original data, and 68% of the round-trip values are incorrect. An error of six ULPs represents a worst-case loss of 1.8 decimals of accuracy (a one-bit loss represents about a 0.3-decimal loss), and 16-bit floats only have about 3.6 decimals to begin with. Figure 8b shows the results when using posits and the quire, with just a few scattered points that do not lie on the x-axis. The information loss is very slight.
Fig. 7. Total ULPs of error for (a) a round-trip 1024-point FFT and (b) a round-trip 4096-point FFT
Fig. 8. ULP errors for a 1024-point round-trip FFT: (a) floats; (b) posits with quire
Fig. 9. Percent round-trip errors versus ADC bit resolution for 1024-point FFTs: (a) posits; (b) floats
Can information loss be reduced to zero? It can, as shown in Fig. 9. If we were to use a lower-resolution ADC, with 11 or 10 bits of resolution, the 16-bit posit approach can result in perfect reversibility; no bits are lost. Low-precision ADCs are in widespread use, all the way down to fast 2-bit converters that produce only the values -1, 0, or 1. The oil and gas industry is notorious for moving seismic data by the truckload, literally, and they store their FFTs of low-resolution ADC output in 32-bit floats to protect against any loss of data that was very expensive to acquire. A similar situation holds for radio astronomy, X-ray crystallography, magnetic resonance imaging (MRI) data, and so on. The use of 16-bit posits could cut all the storage and data motion costs in half for these applications. Figure 9 shows that the insufficient accuracy of 16-bit floats forces the use of 32-bit floats to achieve lossless reversibility.
5.2 Performance
The performance of the arithmetic operations add, subtract, multiply, and divide from SoftPosit and SoftFloat is measured exhaustively by simulating all possible combinations of the inputs in the range [-1, 1], where most of the FFT calculations occur. Fused multiply-add and fused dot product would require from days to weeks to be exhaustively tested on the selected machine. Consequently, those tests are simplified by reusing function arguments, i.e. restricting the test to run only two nested loops exhaustively instead of three. Ten runs for each operation were performed on a selected core to eliminate performance variations between cores and remove the variation caused by operating system interrupts and other interference.
In the selected input range, float⟨16,5⟩ and posit⟨16,1⟩ have 471,951,364 and 1,073,807,361 two-argument input combinations respectively. Thus posit⟨16,1⟩ has more accuracy than float⟨16,5⟩. One of the reasons why floats⟨16,5⟩ have fewer than half the bit patterns compared to posits⟨16,1⟩ is the 2048 bit patterns reserved for "non-numbers," i.e. when all exponent bits are 1s. "Non-numbers" represent positive and negative infinity and not-a-number (NaN). In comparison, posits do not waste bit patterns and have only one Not-a-Real (NaR) bit pattern, 100...000, and only one representation for zero, 000...000. Additionally, posits have more values close to ±1 and to 0 than do floats.
The performance in operations per second for arguments in the range [-1, 1] is shown in Fig. 10.
Fig. 10. posit⟨16,1⟩ versus float⟨16,5⟩ performance in the range [-1, 1]
The results show that posits have slightly better performance in multiply and divide operations, while floats have slightly better performance in add, subtract, and FMA operations. "FDP-add" and "FDP-sub" are additions and subtractions of products using the quire when computing a fused dot product. They show higher performance than FMA because one of the arguments, the quire, does not require additional shifting before adding/subtracting the dot product of the two other posit arguments. It also does not need to perform rounding until it completes all accumulations, which saves time. When appropriately used, quires can potentially improve performance while minimizing rounding errors.
6 Conclusions and Future Work
We have shown that 16-bit posits outperform 16-bit floats and fixed-point in accuracy for radix-2 and radix-4, 1024- and 4096-point FFTs, for both DIT and DIF classes. To have accuracy similar to that of 16-bit posits, 32-bit floats would have to be used. When ADC inputs are 11 bits or smaller, 16-bit posits can compute completely lossless "round-trip" FFTs. 16-bit posits have computation performance comparable to that of 16-bit floats, but approximately twice the performance on bandwidth-bound tasks such as the FFT. Because posit arithmetic is still in its infancy, the performance results shown here, obtained using an in-house software emulator, SoftPosit, are preliminary.
While we have here studied examples from the low-precision side of HPC, the advantages of posits should also apply to high-precision FFT applications such as ab initio computational chemistry, radar cross-section analysis, and the solution of partial differential equations (PDEs) by spectral methods. For some users, posits might be desirable to increase accuracy using the same precision, instead of to enable the use of half as many bits per variable. A 32-bit posit is nominally 16 times as accurate per operation as a 32-bit float in terms of its fraction bits, though tests on real applications show the advantage to be more like 50 times as accurate [26], probably because of the accumulation of error in time-stepping physical simulations.
Another area for future FFT investigation is the Winograd FFT algorithm [35,36], which trades multiplications for additions. Normally, summing numbers of similar magnitude and opposite sign magnifies any relative error in the addends, whereas multiplications are always accurate to 0.5 ULP, so Winograd's approach might seem dubious for floats or posits. However, the range of values where FFTs take place is rich in additions that make no rounding errors, so this deserves investigation.
It is crucial to note that the advantage of posits is not limited to FFTs, and we expect to expand experimental comparisons of float and posit accuracy to other algorithms such as linear equation solution and matrix multiplication, for both HPC and machine learning (ML) purposes. One promising area for 32-bit posits is weather prediction (which frequently relies on FFTs that are typically performed with 64-bit floats).
Large-scale simulations typically achieve only single-digit percentages of the peak speed of an HPC system, which means the arithmetic units are spending most of their time waiting for operands to be communicated to them. Hardware bandwidths have been increasing very slowly for the last several decades, and there has been no steep trend line for bandwidth like there has been for transistor size and cost (Moore's law). High-accuracy posit arithmetic permits the use of reduced data sizes, which promises to provide dramatic speedups not just for FFT kernels but for the very broad range of bandwidth-bound applications that presently rely on floating-point arithmetic.