
Xian-he Sun Wenyu Qu Ivan Stojmenovic

Wanlei Zhou Zhiyang Li Hua Guo

Geyong Min Tingting Yang Yulei Wu

Lei Liu (Eds.)

Algorithms and Architectures for Parallel Processing

14th International Conference, ICA3PP 2014 Dalian, China, August 24–27, 2014

Proceedings, Part I

LNCS 8630

Lecture Notes in Computer Science 8630

Commenced Publication in 1973

Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board

David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbruecken, Germany


Volume Editors

Xian-he Sun
Illinois Institute of Technology, Chicago, IL, USA, e-mail: sun@iit.edu

Wenyu Qu
Dalian Maritime University, China, e-mail: wenyu@dlmu.edu.cn

Ivan Stojmenovic
University of Ottawa, ON, Canada, e-mail: ivan@site.ottawa.ca

Wanlei Zhou
Deakin University, Burwood, VIC, Australia, e-mail: wanlei.zhou@deakin.edu.au

Zhiyang Li
Dalian Maritime University, China, e-mail: lizy0205@gmail.com

Hua Guo
BeiHang University, Beijing, China, e-mail: hguo@buaa.edu.cn

Geyong Min
University of Bradford, UK, e-mail: g.min@brad.ac.uk

Tingting Yang
Dalian Maritime University, China, e-mail: yangtingting820523@163.com

Yulei Wu
Chinese Academy of Sciences, Beijing, China, e-mail: yulei.frank.wu@gmail.com

Lei Liu
Shandong University, Jinan City, China, e-mail: l.liu@sdu.edu.cn

ISSN 0302-9743
e-ISSN 1611-3349
ISBN 978-3-319-11196-4
e-ISBN 978-3-319-11197-1
DOI 10.1007/978-3-319-11197-1

Springer Cham Heidelberg New York Dordrecht London

Library of Congress Control Number: 2014947719

LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues

© Springer International Publishing Switzerland 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

Welcome to the proceedings of the 14th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2014), held in Dalian, China.

ICA3PP 2014 is the 14th in this series of conferences, started in 1995, that are devoted to algorithms and architectures for parallel processing. As applications of computing systems have permeated every aspect of daily life, the power of computing systems has become increasingly critical. This conference provides a forum for academics and practitioners from countries around the world to exchange ideas for improving the efficiency, performance, reliability, security, and interoperability of computing systems and applications.

It is our great honor to introduce the program for the conference. Thanks to the Program Committee’s hard work, we were able to finalize the technical program. In the selection process, each paper was assigned to at least four PC members as reviewers. Authors and PC members from the same institution were separated in the reviewing process to avoid conflicts of interest. We received 285 submissions from all over the world. The large number of submissions indicated continued excitement in the field worldwide. The manuscripts were ranked according to their original contribution, quality, presentation, and relevance to the themes of the conference. In the end, 70 papers (24.56%) were accepted as main conference papers for inclusion in the conference.

ICA3PP 2014 obtained the support of many people and organizations, as well as of the general chairs, whose main responsibility was coordinating the various tasks carried out by other willing and talented volunteers. We want to express our appreciation to Professor Xian-He Sun for accepting our invitation to be the keynote/invited speaker.

We would like to give our special thanks to the program chairs of the conference for their hard and excellent work on organizing the Program Committee, conducting an outstanding review process to select high-quality papers, and making an excellent conference program. We are grateful to all workshop organizers for their professional expertise and excellence in organizing the attractive workshops/symposia, and to the other committee chairs, advisory members, and PC members for their great support. We appreciate all authors who submitted their high-quality papers to the main conference and workshops/symposia.

We thank all of you for participating in this year’s ICA3PP 2014 conference, and hope you find this conference stimulating and interesting.

July 2014
Ivan Stojmenovic
Wanlei Zhou

Organization

General Chairs
Ivan Stojmenovic, Ottawa University, Canada
Wanlei Zhou, Deakin University, Australia

Program Chairs
Xianhe Sun, Illinois Institute of Technology, USA
Wenyu Qu, Dalian Maritime University, China

Publicity Chairs
Jaime Lloret Mauri, Polytechnic University of Valencia, Spain
Al-Sakib Khan Pathan, International Islamic University Malaysia, Malaysia

Publication Chair
Yang Xiang, Deakin University, Australia

Steering Committee Chairs
Andrzej Goscinski, Deakin University, Australia
Yi Pan, Georgia State University, USA
Yang Xiang, Deakin University, Australia

Workshop Chairs
Mianxiong Dong, National Institute of Information and Communications Technology, Japan
Lei Liu, Shandong University, China

Local Organizing Chair
Zhiyang Li, Dalian Maritime University, China

Registration Chair
Weijiang Liu, Dalian Maritime University, China

Finance Chair
Zhaobin Liu, Dalian Maritime University, China

Web Chairs
Yang Shang, Dalian Maritime University, China
Tingting Wang, Dalian Maritime University, China

Program Committee Members

Zafeirios Papazachos, Queen’s University of Belfast, UK
Paolo Trunfio, University of Calabria, Italy
Chao-Tung Yang, Tunghai University, Taiwan
Yong Zhao, University of Electronic Science and Technology of China, China
Xingquan (Hill) Zhu, Florida Atlantic University, USA
Giandomenico Spezzano, ICAR-CNR, Italy
Yasuhiko Takenaga, The University of Electro-Communications, Japan
Sushil Prasad, University of Georgia, USA
Tansel Ozyer, TOBB University of Economics and Technology, Turkey
Deng Pan, Florida International University, USA
Apostolos Papadopoulos, Aristotle University of Thessaloniki, Greece
Eric Pardede, La Trobe University, Australia
Karampelas Panagiotis, Hellenic American University, Greece
Paul Lu, University of Alberta, Canada
Kamesh Madduri, Penn State University, USA
Ching-Hsien Hsu, Chung Hua University, Taiwan
Muhammad Khurram Khan, King Saud University, Saudi Arabia
Morihiro Kuga, Kumamoto University, Japan
Weiwei Fang, Beijing Jiaotong University, China
Franco Frattolillo, Università del Sannio, Italy
Longxiang Gao, Deakin University, Australia
Javier García, University Carlos III, Spain
Michael Glass, University of Erlangen-Nuremberg, Germany
David E. Singh, Universidad Carlos III de Madrid, Spain
Marion Oswald, TU Wien, Austria
Rajkumar Buyya, The University of Melbourne, Australia


Yue-Shan Chang, National Taipei University, Taiwan
Christian Engelman, Oak Ridge National Lab, USA
Alessio Bechini, University of Pisa, Italy
Hideharu Amano, Keio University, Japan
Wei Wei, Xi’an University of Technology, China
Toshihiro Yamauchi, Okayama University, Japan
Bo Yang, University of Electronic Science and Technology of China, China
Laurence T. Yang, St. Francis Xavier University, Canada
Sherali Zeadally, University of the District of Columbia, USA
Sotirios G. Ziavras, NJIT, USA
Gennaro Della Vecchia, ICAR-CNR, Italy
Olivier Terzo, Istituto Superiore Mario Boella, Italy
Hiroyuki Tomiyama, Ritsumeikan University, Japan
Tomoaki Tsumura, Nagoya Institute of Technology, Japan
Luis Javier García Villalba, Universidad Complutense de Madrid (UCM), Spain
Gaocai Wang, Guangxi University, China
Chen Wang, CSIRO ICT Centre, Australia
Martine Wedlake, IBM, USA
Wei Xue, Tsinghua University, China
Edwin Sha, University of Texas at Dallas, USA
Sachin Shetty, Tennessee State University, USA
Ching-Lung Su, National Yunlin University of Science and Technology, Taiwan
Anthony Sulistio, High Performance Computing Center Stuttgart (HLRS), Germany
Magdalena Szmajduch, Cracow University of Technology (CDN Partner Cracow), Poland
Jie Tao, University of Karlsruhe (Karlsruhe Institute of Technology), Germany
Dana Petcu, West University of Timisoara, Romania
Florin Pop, University Politehnica of Bucharest, Romania
Rajeev Raje, Indiana University-Purdue University Indianapolis, USA
Francoise Sailhan, CNAM, France
Subhash Saini, NASA, USA
Erich Schikuta, University of Vienna, Austria
Alba Amato, Second University of Naples, Italy
Cosimo Anglano, Università del Piemonte Orientale, Italy
Ladjel Bellatreche, ENSMA, France
Ateet Bhalla, Oriental Institute of Science and Technology, India


Surendra Byna, Lawrence Berkeley National Lab, USA
Aleksander Byrski, AGH University of Science and Technology, Poland
Juan M. Marin, University of Murcia, Spain
Francesco Moscato, Second University of Naples, Italy
Hirotaka Ono, Kyushu University, Japan
Fabrizio Petrini, IBM Research, USA
Stefano Marrone, Second University of Naples, Italy
Alejandro Masrur, Technology University of Munich, Germany
Susumu Matsumae, Saga University, Japan
Wei Lu, Keene University, USA
Amit Majumdar, San Diego Supercomputer Center, USA
Tomas Margalef, Universitat Autonoma de Barcelona, Spain
Che-Rung Lee, National Tsing Hua University, Taiwan
Keqin Li, State University of New York at New Paltz, USA
Mauro Iacono, Second University of Naples, Italy
Shadi Ibrahim, Inria, France
Helen Karatza, Aristotle University of Thessaloniki, Greece
Soo-Kyun Kim, Pai Chai University, Korea
Edmund Lai, Massey University, New Zealand
Karl Fuerlinger, Ludwig-Maximilians-University Munich, Germany
Jose Daniel Garcia, University Carlos III of Madrid, Spain
Harald Gjermundrod, University of Nicosia, Cyprus
Houcine Hassan, Universidad Politecnica de Valencia, Spain
Raphaël Couturier, University of Franche-Comté, France
Eugen Dedu, University of Franche-Comté, France
Ciprian Dobre, University Politehnica of Bucharest, Romania
Massimo Cafaro, University of Salento, Italy
Ruay-Shiung Chang, National Dong Hwa University, Taiwan
Dan Chen, University of Geosciences, China
Zizhong (Jeffrey) Chen, University of California at Riverside, USA
Jing Chen, National Cheng Kung University, Taiwan
Carmela Comito, University of Calabria, Italy
Yujie Xu, Dalian Maritime University, China
Natalija Vlajic, York University, Canada
Kenji Saito, Keio University, Japan
Thomas Rauber, University of Bayreuth, Germany
Pilar Herero, Universidad Politecnica de Madrid, Spain
Tania Cerquitelli, Politecnico di Torino, Italy
Tzung-Shi Chen, National University of Tainan, Taiwan
David Expósito, University Carlos III, Spain
Peter Strazdins, The Australian National University, Australia
Uwe Tangen, Ruhr-Universitaet Bochum, Germany


Luca Tasquier, Second University of Naples, Italy
Rafael Santos, National Institute for Space Research, Brazil
George Bosilca, University of Tennessee, USA
Esmond Ng, Lawrence Berkeley National Lab, USA
Laurent Lefevre, Inria, University of Lyon, France
Giuseppina Cretella, Second University of Naples, Italy
Gregoire Danoy, University of Luxembourg, Luxembourg
Bernabe Dorronsoro, University of Lille 1, France
Massimo Ficco, Second University of Naples, Italy
Jorge Bernal Bernabe, University of Murcia, Spain


Keynote

C-AMAT: Concurrent Data Access Model for the Big Data Era

Xian-He Sun

Illinois Institute of Technology, Chicago, USA
sun@iit.edu

Abstract. Scalable data management for big data applications is a challenging task. It puts even more pressure on the lasting memory-wall problem, which makes data access the prominent performance bottleneck for high-end computing. High-end computing is known for its massively parallel architectures. A natural way to improve memory performance is to increase and utilize memory concurrency to a level commensurate with that of high-end computing. We argue that substantial memory concurrency exists at each layer of current memory systems, but it has not been fully utilized. In this talk we reevaluate memory systems and introduce the novel C-AMAT model for system design analysis of concurrent data accesses. C-AMAT is a paradigm shift to support sustained data accessing from a data-centric view. The power of C-AMAT is that it has opened new directions to reduce data access delay. In an ideal parallel memory system, the system will explicitly express and utilize parallel data accesses. This awareness is largely missing from current memory systems. We will review the concurrency available in modern memory systems, present the concept of C-AMAT, and discuss the considerations and possibility of optimizing parallel data access for big data applications. We will also present some of our recent results which quantize and utilize parallel I/O following the parallel memory concept.

Keywords: Big Data; Parallel memory system; Data access model

1 Bio - Short version

Dr. Xian-He Sun is a Distinguished Professor of Computer Science and the chairman of the Department of Computer Science at the Illinois Institute of Technology (IIT). He is the director of the Scalable Computing Software laboratory at IIT and a guest faculty member in the Mathematics and Computer Science Division at the Argonne National Laboratory. Before joining IIT, he worked at DoE Ames National Laboratory, at ICASE, NASA Langley Research Center, at Louisiana State University, Baton Rouge, and was an ASEE fellow at Navy Research Laboratories. Dr. Sun is an IEEE fellow and is known for his memory-bounded speedup model, also called Sun-Ni’s Law, for scalable computing. His research interests include parallel and distributed processing, memory and I/O systems, software systems for big data applications, and performance evaluation. He has over 200 publications and 4 patents in these areas. He is a former IEEE CS distinguished speaker and former vice chair of the IEEE Technical Committee on Scalable Computing, and is serving or has served on the editorial boards of most of the leading professional journals in the field of parallel processing. More information about Dr. Sun can be found at his website www.cs.iit.edu/~sun/.


Table of Contents – Part I

Porting the Princeton Ocean Model to GPUs
Shizhen Xu, Xiaomeng Huang, Yan Zhang, Yong Hu, Haohuan Fu, and Guangwen Yang

Web Service Recommendation via Exploiting Temporal QoS Information
Chao Zhou, Wancai Zhang, and Bo Li

Optimizing and Scaling HPCG on Tianhe-2: Early Experience
Xianyi Zhang, Chao Yang, Fangfang Liu, Yiqun Liu, and Yutong Lu

Understanding the SIMD Efficiency of Graph Traversal on GPU
Yichao Cheng, Hong An, Zhitao Chen, Feng Li, Zhaohui Wang, Xia Jiang, and Yi Peng

A GPU Implementation of Clipping-Free Halftoning Using the Direct Binary Search
Hiroaki Koge, Yasuaki Ito, and Koji Nakano

A Reliable and Secure GPU-Assisted File System
Shang-Chieh Lin, Yu-Cheng Liao, and Yarsun Hsu

Efficient Detection of Cloned Attacks for Large-Scale RFID Systems
Xiulong Liu, Heng Qi, Keqiu Li, Jie Wu, Weilian Xue, Geyong Min, and Bin Xiao

Probability Based Algorithms for Guaranteeing the Stability of Rechargeable Wireless Sensor Networks
Yiyi Gao, Ce Yu, Jian Xiao, Jizhou Sun, Guiyuan Jiang, and Hui Wang

PTAS for Minimum k-Path Connected Vertex Cover in Growth-Bounded Graphs
Yan Chu, Jianxi Fan, Wenjun Liu, and Cheng-Kuan Lin

A Simple and Effective Long Duration Contact-Based Utility Metric for Mobile Opportunistic Networking
Chyouhwa Chen, Wei-Chung Teng, and Yu-Ren Wu

Adaptive QoS and Security for Video Transmission over Wireless Networks: A Cognitive-Based Approach
Walid Abdallah, Sukkyu Lee, Hwagnam Kim, and Noureddine Boudriga

Virtual Network Mapping Algorithm in Wireless Data Center Networks
Juan Luo, Wenfeng He, Keqin Li, and Yaling Guo

A Weighted Centroid Based Tracking System in Wireless Sensor Networks
Hongyang Liu, Qianqian Ren, Longjiang Guo, Jinbao Li, Hui Xu, Hu Jin, Nan Wang, and Chengjie Song

A Smartphone Location Independent Activity Recognition Method Based on the Angle Feature
Changhai Wang, Jianzhong Zhang, Meng Li, Yuan Yuan, and Yuwei Xu

Reliable and Energy Efficient Routing Algorithm for WirelessHART
Qun Zhang, Feng Li, Lei Ju, Zhiping Jia, and Zhaopeng Zhang

A DSCP-Based Method of QoS Class Mapping between WLAN and EPS Network
Yao Liu, Gang Lu, Wei Zhang, Fengling Cai, and Qian Kong

HostoSink: A Collaborative Scheduling in Heterogeneous Environment
Xiaofei Liao, Xiaobao Xiang, Hai Jin, Wei Zhang, and Feng Lu

Load Balancing in MapReduce Based on Data Locality
Yi Chen, Zhaobin Liu, Tingting Wang, and Lu Wang

RD-PCA: A Traffic Condition Data Imputation Method Based on Robust Distance
XueJin Wan, Yong Du, and Jiong Wang

Network-Aware Re-Scheduling: Towards Improving Network Performance of Virtual Machines in a Data Center
Gangyi Luo, Zhuzhong Qian, Mianxiong Dong, Kaoru Ota, and Sanglu Lu

A Novel Petri-Net Based Resource Constrained Multi-project Scheduling Method
Wenbin Hu and Huan Wang

Interconnection Network Reconstruction for Fault-Tolerance of Torus-Connected VLSI Array
Longting Zhu, Jigang Wu, Guiyuan Jiang, and Jizhou Sun

An Ant Colony Optimization Algorithm for Virtual Network Embedding
Wenjie Cao, Hua Wang, and Lei Liu

Temperature-Aware Scheduling Based on Dynamic Time-Slice Scaling
Gangyong Jia, Youwei Yuan, Jian Wan, Congfeng Jiang, Xi Li, and Dong Dai


An Improved Energy-Efficient Scheduling for Precedence Constrained Tasks in Multiprocessor Clusters
Xin Li, Yanheng Zhao, Yibin Li, Lei Ju, and Zhiping Jia

Hierarchical Eventual Leader Election for Dynamic Systems
Huaguan Li, Weigang Wu, and Yu Zhou

Efficient Resource Provisioning for Mobile Media Traffic Management in a Cloud Computing Environment
Mohammad Mehedi Hassan, Muhammad Al-Qurishi, Biao Song, and Atif Alamri

A Community Cloud for a Real-Time Financial Application - Requirements, Architecture and Mechanisms
Marcelo Dutra Os and Graça Bressan

Strategies for Evacuating from an Affected Area with One or Two Groups
Qi Wei, Yuan Shi, Bo Jiang, and Lijuan Wang

A Novel Adaptive Web Service Selection Algorithm Based on Ant Colony Optimization for Dynamic Web Service Composition
Denghui Wang, Hao Huang, and Changsheng Xie

An Optimization VM Deployment for Maximizing Energy Utility in Cloud Environment
Jinhai Wang, Chuanhe Huang, Qin Liu, Kai He, Jing Wang, Peng Li, and Xiaohua Jia

Performance Evaluation of Light-Weighted Virtualization for PaaS in Clouds
Xuehai Tang, Zhang Zhang, Min Wang, Yifang Wang, Qingqing Feng, and Jizhong Han

An Access Control Scheme with Direct Cloud-Aided Attribute Revocation Using Version Key
Jiaoli Shi, Chuanhe Huang, Jing Wang, Kai He, and Jinhai Wang

Full and Live Virtual Machine Migration over XIA
Dalu Zhang, Xiang Jin, Dejiang Zhou, Jianpeng Wang, and Jiaqi Zhu

A Near-Exact Defragmentation Scheme to Improve Restore Performance for Cloud Backup Systems
Rongyu Lai, Yu Hua, Dan Feng, Wen Xia, Min Fu, and Yifan Yang

A Music Recommendation Method for Large-Scale Music Library on a Heterogeneous Platform
Yao Zheng, Limin Xiao, Wenqi Tang, and Li Ruan

GPU-Accelerated Verification of the Collatz Conjecture
Takumi Honda, Yasuaki Ito, and Koji Nakano

Reducing the Interconnection Length for 3D Fault-Tolerant Processor Arrays
Guiyuan Jiang, Jigang Wu, Jizhou Sun, and Longting Zhu

Feature Evaluation for Early Stage Internet Traffic Identification
Lizhi Peng, Hongli Zhang, Bo Yang, and Yuehui Chen

Hyper-Star Graphs: Some Topological Properties and an Optimal Neighbourhood Broadcasting Algorithm
F. Zhang, K. Qiu, and J.S. Kim

Customized Network-on-Chip for Message Reduction
Hongwei Wang, Siyu Lu, Youhui Zhang, Guangwen Yang, and Weimin Zheng

Athena: A Fault-Tolerant, Efficient and Applicable Routing Mechanism for Data Centers
Lijun Lyu, Junjie Xie, Yuhui Deng, and Yongtao Zhou

Performance-Aware Data Placement in Hybrid Parallel File Systems
Shuibing He, Xian-He Sun, Bo Feng, and Kun Feng

Security Analysis and Protection Based on Smali Injection for Android Applications
Junfeng Xu, Shoupeng Li, and Tao Zhang

The 1st International Workshop on Emerging Topics in Wireless and Mobile Computing (ETWMC 2014)

A Novel Key Management Scheme in VANETs
Guihua Duan, Yun Xiao, Rui Ju, and Hong Song

Design and Implementation of Network Hard Disk
Hong Song, Jialong Xu, and Xiaoqiang Cai

Combining Supervised and Unsupervised Learning for Automatic Attack Signature Generation System
Lili Yang, Jie Wang, and Ping Zhong

The Study on the Increasing Strategy of Detecting Moving Target in Wireless Sensor Networks
Jialong Xu, Zhigang Chen, Anfeng Liu, and Hong Song

A CRC-Based Lightweight Authentication Protocol for EPCglobal Class-1 Gen-2 Tags
Zhicai Shi, Yongxiang Xia, Yu Zhang, Yihan Wang, and Jian Dai


Test Case Prioritization Based on Genetic Algorithm and Test-Points Coverage
Weixiang Zhang, Bo Wei, and Huisen Du

SAEP: Simulated Annealing Based Ensemble Projecting Method for Solving Conditional Nonlinear Optimal Perturbation
Shicheng Wen, Shijin Yuan, Bin Mu, Hongyu Li, and Lei Chen

Converting Ptolemy II Models to SpaceEx for Applied Verification
Shiwei Ran, Jinzhi Lin, Ying Wu, Jianzhong Zhang, and Yuwei Xu

Research on Interest Searching Mechanism in SNS Learning Community
Renfeng Wang, Junpei Liu, Haining Sun, and Zhihuai Li

The 5th International Workshop on Intelligent Communication Networks (IntelNet 2014)

Improving the Frequency Adaptive Capability of Hybrid Immune Detector Maturation Algorithm
Jungan Chen, ShaoZhong Zhang, and Danjiang Chen

Cluster-Based Time Synchronization Protocol for Wireless Sensor Networks
Jian Zhang, Shiping Lin, and Dandan Liu

A Fast CABAC Algorithm for Transform Coefficients in HEVC
Nana Shan, Wei Zhou, and Zhemin Duan

A Improved PageRank Algorithm Based on Page Link Weight
Xinsheng Wang, Jianchu Ma, Kaiyuan Bi, and Zhihuai Li

Computation Offloading Management for Vehicular Ad Hoc Cloud
Bo Li, Yijian Pei, Hao Wu, Zhi Liu, and Haixia Liu

An Approach to Model Complex Big Data Driven Cyber Physical Systems
Lichen Zhang

The 5th International Workshop on Wireless Networks and Multimedia (WNM 2014)

Reliable Transmission with Multipath and Redundancy for Wireless Mesh Networks
Wenze Shi, Takeshi Ikenaga, Daiki Nobayashi, Xinchun Yin, and Yebin Xu


CommunityRoamer: A Social-Based Routing Algorithm in Opportunistic Mobile Networks
Tieying Zhu, Cheng Wang, and Dandan Liu

A Self-adaptive Reliable Packet Transmission Scheme for Wireless Mesh Networks
Wenze Shi, Takeshi Ikenaga, Daiki Nobayashi, Xinchun Yin, and Hui Xu

Distributed Efficient Node Localization in Wireless Sensor Networks Using the Backtracking Search Algorithm
Alan Oliveira de Sá, Nadia Nedjah, and Luiza de Macedo Mourelle

User Specific QoS and Its Application in Resources Scheduling for Wireless System
Chao He and Richard D. Gitlin

A Distributed Storage Model for Sensor Networks
Lee Luan Ling

Relation between Irregular Sampling and Estimated Covariance for Closed-Loop Tracking Method
Bei-bei Miao and Xue-bo Jin

Author Index

Table of Contents – Part II

Parallel Data Processing in Dynamic Hybrid Computing Environment Using MapReduce
Bing Tang, Haiwu He, and Gilles Fedak

Fast Scalable k-means++ Algorithm with MapReduce
Yujie Xu, Wenyu Qu, Zhiyang Li, Changqing Ji, Yuanyuan Li, and Yinan Wu

Acceleration of Solving Non-Equilibrium Ionization via Tracer Particles and MapReduce on Eulerian Mesh
Jian Xiao, Xingyu Xu, Jizhou Sun, Xin Zhou, and Li Ji

A Continuous Virtual Vector-Based Algorithm for Measuring Cardinality Distribution
Xuefei Zhou, Weijiang Liu, Zhiyang Li, and Wenwen Gao

Hmfs: Efficient Support of Small Files Processing over HDFS
Cairong Yan, Tie Li, Yongfeng Huang, and Yanglan Gan

Utilizing Multiple Xeon Phi Coprocessors on One Compute Node
Xinnan Dong, Jun Chai, Jing Yang, Mei Wen, Nan Wu, Xing Cai, Chunyuan Zhang, and Zhaoyun Chen

HPSO: Prefetching Based Scheduling to Improve Data Locality for MapReduce Clusters
Mingming Sun, Hang Zhuang, Xuehai Zhou, Kun Lu, and Changlong Li

Service Scheduling Algorithm in Vehicle Embedded Middleware
Juan Luo, Xin Jin, and Feng Wu

Similar Samples Cleaning in Speculative Multithreading
Yuxiang Li, Yinliang Zhao, and Bin Liu

Equi-join for Multiple Datasets Based on Time Cost Evaluation Model
Hong Zhu, Libo Xia, Mieyi Xie, and Ke Yan

Identifying File Similarity in Large Data Sets by Modulo File Length
Yongtao Zhou, Yuhui Deng, Xiaoguang Chen, and Junjie Xie

Conpy: Concolic Execution Engine for Python Applications
Ting Chen, Xiao-song Zhang, Rui-dong Chen, Bo Yang, and Yang Bai

A Platform for Stock Market Simulation with Distributed Agent-Based Modeling
Chunyu Wang, Ce Yu, Hutong Wu, Xiang Chen, Yuelei Li, and Xiaotao Zhang

C2CU: A CUDA C Program Generator for Bulk Execution of a Sequential Algorithm
Daisuke Takafuji, Koji Nakano, and Yasuaki Ito

Dynamically Spawning Speculative Threads to Improve Speculative Path Execution
Meirong Li, Yinliang Zhao, and You Tao

A Parallel Algorithm of Kirchhoff Pre-stack Depth Migration Based on GPU
Yida Wang, Chao Li, Yang Tian, Haihua Yan, Changhai Zhao, and Jianlei Zhang

An Algorithm to Embed a Family of Node-Disjoint 3D Meshes into Locally Twisted Cubes
Lantao You and Yuejuan Han

GPU Acceleration of Finding Maximum Eigenvalue of Positive Matrices
Ning Tian, Longjiang Guo, Chunyu Ai, Meirui Ren, and Jinbao Li

Improving Speculation Accuracy with Inter-thread Fetching Value Prediction
Fan Xu, Li Shen, Zhiying Wang, Hui Guo, Bo Su, and Wei Chen

Towards Efficient Distributed SPARQL Queries on Linked Data
Xuejin Li, Zhendong Niu, and Chunxia Zhang

MRFS: A Distributed Files System with Geo-replicated Metadata
Jiongyu Yu, Weigang Wu, Di Yang, and Ning Huang

An Advanced Data Redistribution Approach to Accelerate the Scale-Down Process of RAID-6
Congjin Du, Chentao Wu, and Jie Li

Thread Mapping and Parallel Optimization for MIC Heterogeneous Parallel Systems
Tao Ju, Zhengdong Zhu, Yinfeng Wang, Liang Li, and Xiaoshe Dong


Efficient Storage Support for Real-Time Near-Duplicate Video Retrieval
Zhenhua Nie, Yu Hua, Dan Feng, Qiuyu Li, and Yuanyuan Sun

Repairing Multiple Data Losses by Parallel Max-min Trees Based on Regenerating Codes in Distributed Storage Systems
Pengfei You, Yuxing Peng, Zhen Huang, and Changjian Wang

Exploiting Content Locality to Improve the Performance and Reliability of Phase Change Memory
Suzhen Wu, Zaifa Xi, Bo Mao, and Hong Jiang

Computing,CommunicationandControl TechnologiesinIntelligentTransportationSystem (3CinITS2014)

Application of Support Vector Machine in the Decision-Making of Maneuvering
Zhuang Qi, Zheng Chang, Hanbang Song, and Xinyu Zhang

Mobile Phone Data Reveal the Spatiotemporal Regularity of Human Mobility
Zihan Sun, Hanxiao Zhou, Jianfeng Zheng, and Yuhao Qin

Research on Large-Scale Vessel Riding Tidal Current to Promote Efficiency of Fairway
Kang Zhou, Ran Dai, and Xingwang Yue

A Vertex-Clustering Algorithm Based on the Cluster-Clique
Deqiang Wang, Bin Zhang, and Kelun Wang

Designed Slide Mode Controller for Ship Autopilot with Steering Gear Saturation
Gao-Xiaori, Hong-Biguang, Xing-Shengwei, and Li-Tieshan

Automatic Assessment Model for Sailing in Narrow Channel
Wang Delong and Ren Hongxiang

Bus Arrival Time Prediction and Release: System, Database and Android Application Design
Junhao Fu, Lei Wang, Mingyang Pan, Zhongyi Zuo, and Qian Yang

On Key Techniques of a Radar Remote Telemetry and Monitoring System
Jiangling Hao, Mingyang Pan, Deqiang Wang, Lining Zhao, and Depeng Zhao

PSC Ship-Selecting Model Based on Improved Particle Swarm Optimization and BP Neural Network Algorithm
Tingting Yang, Zhonghua Sun, Shouna Wang, Chengming Yang, and Bin Lin

LRPON Based Infrastructure Layout Planning of Backbone Networks for Mobile Cloud Services in Transportation
Song Yingge, Dong Jie, Lin Bin, and Ding Ning

Infrastructure Deployment and Dimensioning of Relayed-Based Heterogeneous Wireless Access Networks for Green Intelligent Transportation
Lin Bin, Guo Jiamei, He Rongxi, and Yang Tingting

Vessel Motion Pattern Recognition Based on One-Way Distance and Spectral Clustering Algorithm
Wenyao Ma, Zhaolin Wu, Jiaxuan Yang, and Weifeng Li

Navigation Safety Assessment of Ship in Rough Seas Based on Bayesian Network
Fengde Qu, Fengwu Wang, Zongmo Yang, and Jian Sun

Optimization of Ship Scheduling Based on One-Way Fairway
Jun Lin, Xin-yu Zhang, Yong Yin, Jin-tao Wang, and Shun Yao

Research on Virtual Crew Path Planning Simulator Based on A* Algorithm
Huilong Hao, Hongxiang Ren, and Dajun Chen

Speech Recognition Applied in VHF Simulation System
Dajun Chen, Hongxiang Ren, and Huilong Hao

The Assessment of Risk of Collision between Two Ships Avoiding Collision by Altering Course
Weifeng Li, Wenyao Ma, Jiaxuan Yang, Guoyou Shi, and Robert Desrosiers

The Merging Algorithm of Radar Simulation Data in Navigational Simulator
Shun Yao, Xin-yu Zhang, Yong Yin, Xin Xiong, and Jun Lin

Data Mining Research Based on College Forum
Liming Xue, Zhihuai Li, and Weixin Luan

Simulation of Maritime Joint Sea-Air Search Trend Using 3D GIS
Xing Shengwei, Wang Renda, Yang Xuefeng, and Liu Jiandao

Quantitative Analysis for the Development of Maritime Transport Efficiency
Wenbo Zhang, Zhaolin Wu, Yong Liu, and Zebing Li

Security and Privacy in Computer and Network Systems (SPCNS 2014)

Image Compression Based on Time-Domain Lapped Transform and Quadtree Partition
Xiuhua Ma, Jiwen Dong, and Lei Wang

The Applicability and Security Analysis of IPv6 Tunnel Transition Mechanisms
Wei Mi

QOS Performance Analysis for Flexible Workflow Supporting Exception Handling
Xiaoyan Zhu, Jingle Zhang, and Bo Wang

Analysis of Propagation Characteristics of Variant Worms
Tao Liu, Can Zhang, Mingjing Cao, and Ruping Wu

A Design of Network Behavior-Based Malware Detection System for Android
Yincheng Qi, Mingjing Cao, Can Zhang, and Ruping Wu

Detection and Defense Technology of Blackhole Attacks in Wireless Sensor Network
Huisheng Gao, Ruping Wu, Mingjing Cao, and Can Zhang

An Improved Remote Data Possession Checking Protocol in Cloud Storage
Enguang Zhou and Zhoujun Li

Fault Localization of Concurrency Bugs and Its Application in Web Security
Zhenyuan Jiang

Feature Selection Toward Optimizing Internet Traffic Behavior Identification
Zhenxiang Chen, Lizhi Peng, Shupeng Zhao, Lei Zhang, and Shan Jing

ID-Based Anonymous Multi-receiver Key Encapsulation Mechanism with Sender Authentication
Bo Zhang, Tao Sun, and Dairong Yu


Energy Efficient Routing with a Tree-Based Particle Swarm Optimization Approach

A Context-Aware Framework for SaaS Service Dynamic Discovery in Clouds
Shaochong Li and Hao-peng Chen

Author Index

Porting the Princeton Ocean Model to GPUs

Shizhen Xu (1,3), Xiaomeng Huang (1,2,3), Yan Zhang (1,2,3), Yong Hu (1,3), Haohuan Fu (1,2,3), and Guangwen Yang (1,2,3)

(1) Ministry of Education Key Laboratory for Earth System Modeling

(2) Center for Earth System Science, Tsinghua University, 100084

(3) Joint Center for Global Change Studies, Beijing, 100875, China
{hxm,haohuan,ygw}@tsinghua.edu.cn, {yan-zhang12,huyong11,xsz12}@mails.tsinghua.edu.cn

Abstract. While the GPU is becoming a compelling acceleration solution for a range of scientific applications, most existing work on climate models has achieved only limited speedup. This is due to partial porting of the huge code base and the inherently memory-bound nature of these models. In this work, we design and implement a customized GPU-based acceleration of the Princeton Ocean Model (gpuPOM). Based on Nvidia's state-of-the-art GPU architectures (K20X and K40m), we completely rewrite the original model from Fortran into CUDA-C. Several acceleration methods are presented, including optimizing memory access on a single GPU and overlapping communication with boundary operations among multiple GPUs. The experimental results show that the gpuPOM achieves a 6.9-fold to 17.8-fold speedup on one K40m GPU and a 5.8-fold to 15.5-fold speedup on one K20X GPU, compared with different Intel CPUs. Further experiments on multiple GPUs indicate that the performance of the gpuPOM on a super-workstation containing 4 GPUs is equivalent to that of a powerful cluster consisting of 34 pure CPU nodes with over 400 CPU cores.

1 Introduction

There is no doubt that high-resolution climate modeling is crucial for simulating global and regional climate change. Most research groups in climate modeling have established their own roadmaps for high resolution, ranging from several kilometers down to hundreds of meters. The need for high resolution exposes serious problems, because the time consumed in running high-resolution climate models remains a significant scientific and engineering challenge.

In recent years, many scientific codes have been ported to the GPU and have proved well suited to it. In the area of climate models, most of the previous work achieved different levels of speedup for entire models on GPUs. For instance, Michalakes et al. accelerated a computationally intensive microphysics process of the Weather Research and Forecasting (WRF) model with a speedup of nearly 25x, which speeds up the entire WRF model by only 1.23x [1]; Shimokawabe et al. fully accelerated the ASUCA model, a production non-hydrostatic weather model, on 528 Nvidia Tesla GT200 GPUs and achieved 15 TFlops [2]; and the work in [3] accelerated the full, huge operational weather forecasting model COSMO and achieved a 2.8x speedup for its dynamic core.

X.-h. Sun et al. (Eds.): ICA3PP 2014, Part I, LNCS 8630, pp. 1–14, 2014. © Springer International Publishing Switzerland 2014

According to our analysis, further speedup of climate models is limited for two reasons. Firstly, partial GPU porting limits the performance of the whole application. Many scientific models suffer from a flat performance profile during GPU acceleration. In CAM [4], the single most expensive subroutine accounts for only about 10% of the total runtime and most subroutines account for less than 5%; the same holds for mpiPOM. According to Amdahl's law, the whole model cannot be significantly sped up by partial porting, as seen in the GPU acceleration of CAM [4], ROMS [5], WRF [1] and HOMME [6], although partial porting is sometimes an accepted compromise because of the huge code. Secondly, climate models are mainly bounded by memory access, especially in their dynamic cores [3]. Great work has been done on the full GPU acceleration of COSMO [3], ASUCA [2] and NIM [7], including all the dynamic cores and some portion of the physics. However, the memory-bound problem remains and can be further eased through better use of the state-of-the-art GPU memory hierarchy.
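The Amdahl's-law argument can be made concrete with a small calculation. The sketch below is only illustrative: the function names are our own, and the accelerated fraction for WRF is inferred from the two figures cited in the introduction (a kernel speedup of about 25x yielding 1.23x overall) rather than reported by the cited work.

```python
def amdahl_speedup(fraction, kernel_speedup):
    """Overall speedup when `fraction` of the runtime is sped up by `kernel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


def accelerated_fraction(overall_speedup, kernel_speedup):
    """Invert Amdahl's law: runtime fraction implied by an observed overall speedup."""
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / kernel_speedup)


# Flat profile: porting a single routine that takes 10% of the runtime.
# Even an infinitely fast kernel cannot exceed 1 / (1 - 0.10) overall.
upper_bound = 1.0 / (1.0 - 0.10)
ten_percent_at_25x = amdahl_speedup(0.10, 25.0)

# WRF figures cited in the text: about 25x on the kernel, 1.23x overall.
wrf_fraction = accelerated_fraction(1.23, 25.0)

print(f"10% ported at 25x -> {ten_percent_at_25x:.3f}x overall")
print(f"10% ported, bound -> {upper_bound:.3f}x overall")
print(f"implied WRF kernel share of runtime: {wrf_fraction:.1%}")
```

Under Amdahl's law, the cited WRF figures would correspond to the accelerated kernel accounting for roughly a fifth of the original runtime, which is consistent with the observation that porting individual routines of a flat profile yields little overall gain.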

The objective of our study is to shorten the computation time of high-resolution ocean models by parallelizing their existing model structures using the GPU. Using a representative ocean model, the mpiPOM, as an example, we demonstrate how to parallelize an ocean model so that it runs effectively on the GPU architecture. Using state-of-the-art GPU architecture, we first rewrite the entire mpiPOM model from the original Fortran version into a new CUDA-C version, which we call gpuPOM. Then, we design and implement several optimization methods at two levels: computation optimization on a single GPU and communication optimization among multiple GPUs.

In terms of computation, we concentrate on memory access optimization and on making better use of the GPU's memory hierarchy. We deploy four categories of optimization that are especially effective for climate models: using the read-only data cache, local memory blocking, loop fusion and subroutine fusion. The experimental results demonstrate that one K40m GPU achieves a 6.9-fold to 17.8-fold speedup and one K20X achieves a 5.8-fold to 15.5-fold speedup over different Intel multi-core CPUs.

In terms of communication, we concentrate on fine-grained overlapping between the inner-region computation and the outer-region communication and updating. With the new design, multiple GPUs in one node can communicate directly, bypassing the CPU. In addition, with fine-grained control of the CUDA streams and their priorities, inner-region computation can execute concurrently with outer-region communication and updating.

To understand the accuracy, performance and scalability of the gpuPOM, we build a customized super-workstation with four K20X GPUs inside. Experimental results show that the performance of the gpuPOM running on this super-workstation is comparable to that of a powerful cluster consisting of 34 pure CPU nodes with over 400 CPU cores, which means that this novel gpuPOM version provides a fast and attractive solution for ocean scientists to conduct simulation research.


The remainder of this paper is organized as follows. We review the mpiPOM in Sec. 2 and introduce the structure of the gpuPOM in Sec. 3. In Sec. 4, we introduce four optimizations to efficiently use the GPU's memory hierarchy for the gpuPOM. In Sec. 5, we present detailed techniques for communication optimization among multiple GPUs. We provide the corresponding experimental results on correctness, performance and scalability in Sec. 6 and conclude our work in Sec. 7.

2 The mpiPOM

The mpiPOM [8] is a widely used parallel branch of POM based on the Message Passing Interface (MPI), and it retains part of the physics package of the original POM [9]. POM is a powerful regional ocean model and has been used in a wide range of applications, such as circulation and mixing processes in rivers, seas and oceans.

The mpiPOM solves the primitive equations under the hydrostatic and Boussinesq approximations. In the horizontal, spatial derivatives are computed using centered-space differencing on a staggered Arakawa C-grid. In the vertical, the mpiPOM supports terrain-following sigma coordinates and a fourth-order scheme option to reduce the internal pressure-gradient errors. The mpiPOM uses the classical time-splitting technique to separate the vertically integrated equations (external mode) from the vertical structure equations (internal mode). The external mode calculation is responsible for updating the surface elevation and the vertically averaged velocities. The internal mode calculation results in updates for velocity, temperature and salinity, in addition to the turbulence quantities. The three-dimensional internal mode and the two-dimensional external mode are both integrated explicitly using a second-order leapfrog scheme. These two modules are the most time-consuming modules of mpiPOM.
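As a reminder of what a second-order leapfrog step looks like, the sketch below applies it to a toy oscillation equation dz/dt = iωz. This is not the mpiPOM discretization: the test equation, the function name and the parameter values are assumptions chosen for illustration, and the time filtering that leapfrog codes commonly add is omitted.

```python
import cmath


def leapfrog_oscillator(omega, dt, n_steps):
    """Integrate dz/dt = i*omega*z with z(0) = 1 using the leapfrog scheme.

    Leapfrog is a three-level scheme: z[n+1] = z[n-1] + 2*dt*f(z[n]).
    It needs two starting levels; here the second one is taken from the
    exact solution (an assumption made to keep the sketch short).
    """
    z_prev = 1.0 + 0.0j                      # z at step 0
    z_curr = cmath.exp(1j * omega * dt)      # z at step 1
    for _ in range(n_steps - 1):
        z_next = z_prev + 2.0 * dt * (1j * omega * z_curr)
        z_prev, z_curr = z_curr, z_next
    return z_curr                            # z at step n_steps


omega, dt, n_steps = 1.0, 0.1, 1000          # hypothetical values, omega*dt <= 1
z_num = leapfrog_oscillator(omega, dt, n_steps)
z_exact = cmath.exp(1j * omega * dt * n_steps)
amplitude = abs(z_num)
phase_error = abs(cmath.phase(z_num / z_exact))
print("amplitude:", amplitude)
print("phase error (rad):", phase_error)
```

For this toy problem with ω·dt ≤ 1, the amplitude stays close to 1 while the phase drifts slowly, which is the typical behaviour of an explicit, non-dissipative scheme.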

To demonstrate the memory-bound problem, the Performance API (PAPI) [10] is used to estimate the floating-point operation count and the memory access (store/load) instruction count. The results show that the computational intensity (flops/byte) of the mpiPOM is about 1:3.3, while that provided by Sandy Bridge E5-2670 CPUs is about 7.5:1. Moreover, data are mostly streamed from memory and show little locality, which means that mpiPOM is mainly bounded by memory access. The wider memory bandwidth of the GPU is what attracted us and, to the best of our knowledge, our work is the first GPU port of the POM.
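To see why these two ratios imply a memory-bound code, the following back-of-the-envelope sketch compares them in a simple roofline-style model. It assumes that the 7.5:1 figure can be read as the ratio of the CPU's peak floating-point rate to its memory bandwidth; the function name is hypothetical and the result is an order-of-magnitude estimate, not a measurement from the paper.

```python
def attainable_fraction_of_peak(code_intensity, machine_balance):
    """Roofline-style bound on the fraction of peak flops a code can reach.

    code_intensity:  flops performed per byte moved by the code
    machine_balance: peak flops per byte of memory bandwidth of the machine
    """
    return min(1.0, code_intensity / machine_balance)


mpipom_intensity = 1.0 / 3.3      # about 1 flop per 3.3 bytes (from the text)
cpu_balance = 7.5 / 1.0           # about 7.5 flops per byte (from the text)

fraction = attainable_fraction_of_peak(mpipom_intensity, cpu_balance)
print(f"code intensity : {mpipom_intensity:.3f} flops/byte")
print(f"upper bound    : {fraction:.1%} of peak floating-point rate")
```

Under this reading, memory traffic caps the code at a few percent of the CPU's peak arithmetic rate, so additional memory bandwidth matters far more than additional arithmetic throughput.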

3 Structure of the gpuPOM

The flowchart of the gpuPOM is illustrated in Fig. 1. The main difference between mpiPOM and gpuPOM is that the CPU in the gpuPOM implementation is only responsible for initialization and output. The gpuPOM begins by initializing the relevant arrays on the CPU host and then copies these data to the GPU. The GPU then handles all of the computation, including the external mode and the internal mode. During the computation, the variables required for output, such as velocity and surface elevation, are copied back to the CPU host and then written to disk at a constant frequency.

[Figure: flowchart with the boxes "Input data and initialization", "MemcpyHostToDevice", "External mode" (sea surface height; vertically integrated momentum equations; compute UT, VT for the internal mode; boundary operations), "Internal mode" (advection and horizontal diffusion of U, V; baroclinic term of U, V; continuity equation; turbulence equation; tracer transport equation (T, S); momentum equation; update U, V; boundary operations) and "Output NetCDF files".]

Fig. 1. Flowchart of the gpuPOM

In Fig. 1, each module represents a part of the primitive equations and is implemented by several subroutines. We retained the overall structure of the original code and rewrote all the subroutines as about 70 CUDA kernel functions. Thus, data always reside on the GPU during the computation.

In our implementation, the 3D arrays of variables are stored sequentially in the order of x, y, z (i, j, k ordering) and the 2D arrays are stored in the order of x, y, which is the same as in the original code. Each GPU thread handles one (x, y) point in the horizontal direction and performs all the calculations from surface to bottom (30 to 50 points). The thread blocks are configured with (32, 4, 1) threads, and similar performance can be achieved with (32, 8, 1) or (32, 16, 1) on the K20X GPU we use.
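The layout and thread mapping described here can be sketched in a few lines. The grid dimensions, the helper names and the assumption that x is the fastest-varying index (as in Fortran's column-major order) are ours for illustration; the sketch only mimics the indexing arithmetic in Python and is not the gpuPOM kernel code.

```python
def linear_index(i, j, k, nx, ny):
    """Offset of element (i, j, k) when x varies fastest, then y, then z."""
    return i + nx * (j + ny * k)


def grid_dims(nx, ny, block=(32, 4, 1)):
    """Number of thread blocks needed to cover an nx-by-ny horizontal grid."""
    bx, by, _ = block
    return ((nx + bx - 1) // bx, (ny + by - 1) // by, 1)


def column_offsets(i, j, nx, ny, nz):
    """Offsets visited by the thread for point (i, j), from surface to bottom."""
    return [linear_index(i, j, k, nx, ny) for k in range(nz)]


nx, ny, nz = 100, 50, 40          # hypothetical grid; nz within the 30 to 50 range
print(grid_dims(nx, ny))          # number of (32, 4, 1) blocks in x, y, z
print(column_offsets(0, 0, nx, ny, nz)[:3])
print(column_offsets(1, 0, nx, ny, nz)[:3])
```

With this ordering, threads that are neighbours in x touch neighbouring addresses at every vertical level k, which is the kind of access pattern that GPUs can coalesce into wide memory transactions.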
