Xian-he Sun, Wenyu Qu, Ivan Stojmenovic, Wanlei Zhou, Zhiyang Li, Hua Guo, Geyong Min, Tingting Yang, Yulei Wu, Lei Liu (Eds.)
Algorithms and Architectures for Parallel Processing
14th International Conference, ICA3PP 2014 Dalian, China, August 24–27, 2014
Proceedings, Part I
LNCS 8630
Lecture Notes in Computer Science 8630
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbruecken, Germany
Volume Editors
Xian-he Sun
Illinois Institute of Technology, Chicago, IL, USA, e-mail: sun@iit.edu
Wenyu Qu
Dalian Maritime University, China, e-mail: wenyu@dlmu.edu.cn
Ivan Stojmenovic
University of Ottawa, ON, Canada, e-mail: ivan@site.ottawa.ca
Wanlei Zhou
Deakin University, Burwood, VIC, Australia, e-mail: wanlei.zhou@deakin.edu.au
Zhiyang Li
Dalian Maritime University, China, e-mail: lizy0205@gmail.com
Hua Guo
BeiHang University, Beijing, China, e-mail: hguo@buaa.edu.cn
Geyong Min
University of Bradford, UK, e-mail: g.min@brad.ac.uk
Tingting Yang
Dalian Maritime University, China, e-mail: yangtingting820523@163.com
Yulei Wu
Chinese Academy of Sciences, Beijing, China, e-mail: yulei.frank.wu@gmail.com
Lei Liu
Shandong University, Jinan City, China, e-mail: l.liu@sdu.edu.cn
ISSN 0302-9743, e-ISSN 1611-3349
ISBN 978-3-319-11196-4, e-ISBN 978-3-319-11197-1
DOI 10.1007/978-3-319-11197-1
Springer Cham Heidelberg New York Dordrecht London
Library of Congress Control Number: 2014947719
LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues
© Springer International Publishing Switzerland 2014
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
Welcome to the proceedings of the 14th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2014) held in Dalian, China.
ICA3PP 2014 is the 14th in this series of conferences, started in 1995, devoted to algorithms and architectures for parallel processing. As computing systems have permeated every aspect of daily life, the power of computing systems has become increasingly critical. This conference provides a forum for academics and practitioners from countries around the world to exchange ideas for improving the efficiency, performance, reliability, security, and interoperability of computing systems and applications.
It is our great honor to introduce the program for the conference. Thanks to the Program Committee's hard work, we were able to finalize the technical program. We received 285 submissions from all over the world; this large number indicates continued excitement in the field worldwide. In the selection process, each paper was assigned to at least 4 PC members as reviewers, and authors and PC members from the same institution were separated in the reviewing process to avoid conflicts of interest. The manuscripts were ranked according to their original contribution, quality, presentation, and relevance to the themes of the conference. In the end, 70 papers (24.56%) were accepted as main conference papers for inclusion in the conference.
ICA3PP 2014 received the support of many people and organizations, as well as of the general chairs, whose main responsibility was to oversee the various tasks carried out by other willing and talented volunteers. We want to express our appreciation to Professor Xian-He Sun for accepting our invitation to be the keynote/invited speaker.
We would like to give our special thanks to the program chairs of the conference for their hard and excellent work in organizing the Program Committee, running an outstanding review process to select high-quality papers, and building an excellent conference program. We are grateful to all workshop organizers for their professional expertise and excellence in organizing the attractive workshops/symposia, and to the other committee chairs, advisory members, and PC members for their great support. We appreciate all authors who submitted their high-quality papers to the main conference and workshops/symposia.
We thank all of you for participating in this year's ICA3PP 2014 conference, and hope you find this conference stimulating and interesting.
July 2014
Ivan Stojmenovic
Wanlei Zhou
Organization
General Chairs
Ivan Stojmenovic, Ottawa University, Canada
Wanlei Zhou, Deakin University, Australia
Program Chairs
Xianhe Sun, Illinois Institute of Technology, USA
Wenyu Qu, Dalian Maritime University, China
Publicity Chairs
Jaime Lloret Mauri, Polytechnic University of Valencia, Spain
Al-Sakib Khan Pathan, International Islamic University Malaysia, Malaysia
Publication Chair
Yang Xiang, Deakin University, Australia
Steering Committee Chairs
Andrzej Goscinski, Deakin University, Australia
Yi Pan, Georgia State University, USA
Yang Xiang, Deakin University, Australia
Workshop Chairs
Mianxiong Dong, National Institute of Information and Communications Technology, Japan
Lei Liu, Shandong University, China
Local Organizing Chair
Zhiyang Li, Dalian Maritime University, China
Registration Chair
Weijiang Liu, Dalian Maritime University, China
Finance Chair
Zhaobin Liu, Dalian Maritime University, China
Web Chairs
Yang Shang, Dalian Maritime University, China
Tingting Wang, Dalian Maritime University, China
Program Committee Members
Zafeirios Papazachos, Queen's University of Belfast, UK
Paolo Trunfio, University of Calabria, Italy
Chao-Tung Yang, Tunghai University, Taiwan
Yong Zhao, University of Electronic Science and Technology of China, China
Xingquan (Hill) Zhu, Florida Atlantic University, USA
Giandomenico Spezzano, ICAR-CNR, Italy
Yasuhiko Takenaga, The University of Electro-Communications, Japan
Sushil Prasad, University of Georgia, USA
Tansel Ozyer, TOBB University of Economics and Technology, Turkey
Deng Pan, Florida International University, USA
Apostolos Papadopoulos, Aristotle University of Thessaloniki, Greece
Eric Pardede, La Trobe University, Australia
Karampelas Panagiotis, Hellenic American University, Greece
Paul Lu, University of Alberta, Canada
Kamesh Madduri, Penn State University, USA
Ching-Hsien Hsu, Chung Hua University, Taiwan
Muhammad Khurram Khan, King Saud University, Saudi Arabia
Morihiro Kuga, Kumamoto University, Japan
Weiwei Fang, Beijing Jiaotong University, China
Franco Frattolillo, Università del Sannio, Italy
Longxiang Gao, Deakin University, Australia
Javier García, University Carlos III, Spain
Michael Glass, University of Erlangen-Nuremberg, Germany
David E. Singh, Universidad Carlos III de Madrid, Spain
Marion Oswald, TU Wien, Austria
Rajkumar Buyya, The University of Melbourne, Australia
Yue-Shan Chang, National Taipei University, Taiwan
Christian Engelman, Oak Ridge National Lab, USA
Alessio Bechini, University of Pisa, Italy
Hideharu Amano, Keio University, Japan
Wei Wei, Xi'an University of Technology, China
Toshihiro Yamauchi, Okayama University, Japan
Bo Yang, University of Electronic Science and Technology of China, China
Laurence T. Yang, St. Francis Xavier University, Canada
Sherali Zeadally, University of the District of Columbia, USA
Sotirios G. Ziavras, NJIT, USA
Gennaro Della Vecchia, ICAR-CNR, Italy
Olivier Terzo, Istituto Superiore Mario Boella, Italy
Hiroyuki Tomiyama, Ritsumeikan University, Japan
Tomoaki Tsumura, Nagoya Institute of Technology, Japan
Luis Javier García Villalba, Universidad Complutense de Madrid (UCM), Spain
Gaocai Wang, Guangxi University, China
Chen Wang, CSIRO ICT Centre, Australia
Martine Wedlake, IBM, USA
Wei Xue, Tsinghua University, China
Edwin Sha, University of Texas at Dallas, USA
Sachin Shetty, Tennessee State University, USA
Ching-Lung Su, National Yunlin University of Science and Technology, Taiwan
Anthony Sulistio, High Performance Computing Center Stuttgart (HLRS), Germany
Magdalena Szmajduch, Cracow University of Technology (CDN Partner Cracow), Poland
Jie Tao, University of Karlsruhe (Karlsruhe Institute of Technology), Germany
Dana Petcu, West University of Timisoara, Romania
Florin Pop, University Politehnica of Bucharest, Romania
Rajeev Raje, Indiana University-Purdue University Indianapolis, USA
Francoise Sailhan, CNAM, France
Subhash Saini, NASA, USA
Erich Schikuta, University of Vienna, Austria
Alba Amato, Second University of Naples, Italy
Cosimo Anglano, Università del Piemonte Orientale, Italy
Ladjel Bellatreche, ENSMA, France
Ateet Bhalla, Oriental Institute of Science and Technology, India
Surendra Byna, Lawrence Berkeley National Lab, USA
Aleksander Byrski, AGH University of Science and Technology, Poland
Juan M. Marin, University of Murcia, Spain
Francesco Moscato, Second University of Naples, Italy
Hirotaka Ono, Kyushu University, Japan
Fabrizio Petrini, IBM Research, USA
Stefano Marrone, Second University of Naples, Italy
Alejandro Masrur, Technology University of Munich, Germany
Susumu Matsumae, Saga University, Japan
Wei Lu, Keene University, USA
Amit Majumdar, San Diego Supercomputer Center, USA
Tomas Margalef, Universitat Autonoma de Barcelona, Spain
Che-Rung Lee, National Tsing Hua University, Taiwan
Keqin Li, State University of New York at New Paltz, USA
Mauro Iacono, Second University of Naples, Italy
Shadi Ibrahim, Inria, France
Helen Karatza, Aristotle University of Thessaloniki, Greece
Soo-Kyun Kim, PaiChai University, Korea
Edmund Lai, Massey University, New Zealand
Karl Fuerlinger, Ludwig-Maximilians-University Munich, Germany
Jose Daniel Garcia, University Carlos III of Madrid, Spain
Harald Gjermundrod, University of Nicosia, Cyprus
Houcine Hassan, Universidad Politecnica de Valencia, Spain
Raphaël Couturier, University of Franche-Comté, France
Eugen Dedu, University of Franche-Comté, France
Ciprian Dobre, University Politehnica of Bucharest, Romania
Massimo Cafaro, University of Salento, Italy
Ruay-Shiung Chang, National Dong Hwa University, Taiwan
Dan Chen, University of Geosciences, China
Zizhong (Jeffrey) Chen, University of California at Riverside, USA
Jing Chen, National Cheng Kung University, Taiwan
Carmela Comito, University of Calabria, Italy
Yujie Xu, Dalian Maritime University, China
Natalija Vlajic, York University, Canada
Kenji Saito, Keio University, Japan
Thomas Rauber, University of Bayreuth, Germany
Pilar Herero, Universidad Politecnica de Madrid, Spain
Tania Cerquitelli, Politecnico di Torino, Italy
Tzung-Shi Chen, National University of Tainan, Taiwan
David Expósito, University Carlos III, Spain
Peter Strazdins, The Australian National University, Australia
Uwe Tangen, Ruhr-Universitaet Bochum, Germany
Luca Tasquier, Second University of Naples, Italy
Rafael Santos, National Institute for Space Research, Brazil
George Bosilca, University of Tennessee, USA
Esmond Ng, Lawrence Berkeley National Lab, USA
Laurent Lefevre, Inria, University of Lyon, France
Giuseppina Cretella, Second University of Naples, Italy
Gregoire Danoy, University of Luxembourg, Luxembourg
Bernabe Dorronsoro, University of Lille 1, France
Massimo Ficco, Second University of Naples, Italy
Jorge Bernal Bernabe, University of Murcia, Spain
Keynote
C-AMAT: Concurrent Data Access Model for the Big Data Era
Xian-He Sun
Illinois Institute of Technology, Chicago, USA
sun@iit.edu
Abstract. Scalable data management for big data applications is a challenging task. It puts even more pressure on the long-standing memory-wall problem, which makes data access the dominant performance bottleneck for high-end computing. High-end computing is known for its massively parallel architectures. A natural way to improve memory performance is to increase and utilize memory concurrency to a level commensurate with that of high-end computing. We argue that substantial memory concurrency exists at each layer of current memory systems, but it has not been fully utilized. In this talk we reevaluate memory systems and introduce the novel C-AMAT (Concurrent Average Memory Access Time) model for system design analysis of concurrent data accesses. C-AMAT is a paradigm shift to support sustained data access from a data-centric view. The power of C-AMAT is that it opens new directions for reducing data access delay. In an ideal parallel memory system, the system explicitly expresses and utilizes parallel data accesses. This awareness is largely missing from current memory systems. We will review the concurrency available in modern memory systems, present the concept of C-AMAT, and discuss the considerations and possibilities of optimizing parallel data access for big data applications. We will also present some of our recent results, which quantify and utilize parallel I/O following the parallel memory concept.
Keywords: Big Data; Parallel memory system; Data access model
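The abstract names C-AMAT but does not state it. As a hedged illustration only: the classic sequential model is AMAT = H + MR × AMP (hit time, miss rate, average miss penalty), and the form commonly cited for C-AMAT is H/C_H + pMR × pAMP/C_M, where C_H is hit concurrency, C_M is pure-miss concurrency, and pMR and pAMP are the rate and average penalty of "pure" misses (miss cycles not overlapped by any hit). Treat that formula, and every field name below, as an assumption for illustration rather than the talk's own definition. The sketch evaluates both models and shows that C-AMAT collapses to AMAT when nothing overlaps; the numbers are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemoryParams:
    """Illustrative inputs to AMAT / C-AMAT; all field names are hypothetical."""
    hit_time: float                            # H: cycles to serve one hit
    miss_rate: float                           # MR: fraction of accesses that miss
    miss_penalty: float                        # AMP: average miss penalty (cycles)
    hit_concurrency: float = 1.0               # C_H: average number of overlapping hits
    pure_miss_rate: Optional[float] = None     # pMR: defaults to MR (every miss is "pure")
    pure_miss_penalty: Optional[float] = None  # pAMP: defaults to AMP
    miss_concurrency: float = 1.0              # C_M: average number of overlapping pure misses


def amat(p: MemoryParams) -> float:
    """Classic sequential model: AMAT = H + MR * AMP."""
    return p.hit_time + p.miss_rate * p.miss_penalty


def c_amat(p: MemoryParams) -> float:
    """Assumed concurrent model: C-AMAT = H / C_H + pMR * pAMP / C_M."""
    if p.hit_concurrency <= 0 or p.miss_concurrency <= 0:
        raise ValueError("concurrency values must be positive")
    pmr = p.miss_rate if p.pure_miss_rate is None else p.pure_miss_rate
    pamp = p.miss_penalty if p.pure_miss_penalty is None else p.pure_miss_penalty
    return p.hit_time / p.hit_concurrency + pmr * pamp / p.miss_concurrency


if __name__ == "__main__":
    # No overlap at all: C-AMAT collapses to AMAT.
    seq = MemoryParams(hit_time=2.0, miss_rate=0.1, miss_penalty=50.0)
    print(amat(seq), c_amat(seq))  # 7.0 7.0

    # Made-up numbers: overlapped hits, fewer pure misses, overlapped misses.
    conc = MemoryParams(hit_time=2.0, miss_rate=0.1, miss_penalty=50.0,
                        hit_concurrency=2.0, pure_miss_rate=0.05,
                        pure_miss_penalty=40.0, miss_concurrency=4.0)
    print(c_amat(conc))  # 1.5
```

Defaulting the pure-miss terms to the ordinary miss terms is a simplification chosen so the two models can be compared on the same inputs; in a real analysis pMR and pAMP would be measured separately.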
1 Bio: Short Version
Dr. Xian-He Sun is a Distinguished Professor of Computer Science and the chairman of the Department of Computer Science at the Illinois Institute of Technology (IIT). He is the director of the Scalable Computing Software laboratory at IIT and a guest faculty member in the Mathematics and Computer Science Division at the Argonne National Laboratory. Before joining IIT, he worked at DoE Ames National Laboratory, at ICASE, NASA Langley Research Center, and at Louisiana State University, Baton Rouge, and was an ASEE fellow at Navy Research Laboratories. Dr. Sun is an IEEE fellow and is known for his memory-bounded speedup model, also called Sun-Ni's Law, for scalable computing. His research interests include parallel and distributed processing, memory and I/O systems, software systems for big data applications, and performance evaluation. He has over 200 publications and 4 patents in these areas. He is a former IEEE CS distinguished speaker and former vice chair of the IEEE Technical Committee on Scalable Computing, and has served and is serving on the editorial boards of most of the leading professional journals in the field of parallel processing. More information about Dr. Sun can be found at his website www.cs.iit.edu/~sun/.
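Sun-Ni's Law is mentioned above only by name. For readers who want its shape, a minimal sketch of the commonly cited memory-bounded speedup: with sequential fraction f of the original workload, p processors, and G(p) the factor by which the parallel workload grows when aggregate memory grows p-fold, S = (f + (1 − f)G(p)) / (f + (1 − f)G(p)/p). Choosing G(p) = 1 gives Amdahl's fixed-size law and G(p) = p gives Gustafson's fixed-time law. The function names and the example G(p) = p^1.5 (a dense matrix multiply, whose n^3 work grows as its n^2 memory grows) are illustrative assumptions, not taken from the bio.

```python
from typing import Callable


def memory_bounded_speedup(f: float, p: int, g: Callable[[int], float]) -> float:
    """Memory-bounded speedup (Sun-Ni): (f + (1-f)G(p)) / (f + (1-f)G(p)/p).

    f -- sequential fraction of the original workload, in [0, 1]
    p -- number of processors, each contributing its own memory
    g -- G(p): workload growth factor when aggregate memory grows p-fold
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    scaled = (1.0 - f) * g(p)  # parallel portion after the problem is scaled up
    return (f + scaled) / (f + scaled / p)


def amdahl(f: float, p: int) -> float:
    """Fixed-size special case, G(p) = 1."""
    return memory_bounded_speedup(f, p, lambda _p: 1.0)


def gustafson(f: float, p: int) -> float:
    """Fixed-time special case, G(p) = p."""
    return memory_bounded_speedup(f, p, lambda q: float(q))


if __name__ == "__main__":
    f, p = 0.05, 64
    print(f"Amdahl:    {amdahl(f, p):.2f}")
    print(f"Gustafson: {gustafson(f, p):.2f}")
    # Hypothetical dense matrix multiply: memory ~ n^2, work ~ n^3, so G(p) = p^1.5.
    print(f"Sun-Ni:    {memory_bounded_speedup(f, p, lambda q: q ** 1.5):.2f}")
```

Passing G(p) as a function keeps the two classic laws as one-line special cases of the same formula, which is the point the memory-bounded model makes.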
Table of Contents – Part I
Porting the Princeton Ocean Model to GPUs
Shizhen Xu, Xiaomeng Huang, Yan Zhang, Yong Hu, Haohuan Fu, and Guangwen Yang
Web Service Recommendation via Exploiting Temporal QoS Information
Chao Zhou, Wancai Zhang, and Bo Li
Optimizing and Scaling HPCG on Tianhe-2: Early Experience
Xianyi Zhang, Chao Yang, Fangfang Liu, Yiqun Liu, and Yutong Lu
Understanding the SIMD Efficiency of Graph Traversal on GPU
Yichao Cheng, Hong An, Zhitao Chen, Feng Li, Zhaohui Wang, Xia Jiang, and Yi Peng
A GPU Implementation of Clipping-Free Halftoning Using the Direct Binary Search
Hiroaki Koge, Yasuaki Ito, and Koji Nakano
A Reliable and Secure GPU-Assisted File System
Shang-Chieh Lin, Yu-Cheng Liao, and Yarsun Hsu
Efficient Detection of Cloned Attacks for Large-Scale RFID Systems
Xiulong Liu, Heng Qi, Keqiu Li, Jie Wu, Weilian Xue, Geyong Min, and Bin Xiao
Probability Based Algorithms for Guaranteeing the Stability of Rechargeable Wireless Sensor Networks
Yiyi Gao, Ce Yu, Jian Xiao, Jizhou Sun, Guiyuan Jiang, and Hui Wang
PTAS for Minimum k-Path Connected Vertex Cover in Growth-Bounded Graphs
Yan Chu, Jianxi Fan, Wenjun Liu, and Cheng-Kuan Lin
A Simple and Effective Long Duration Contact-Based Utility Metric for Mobile Opportunistic Networking
Chyouhwa Chen, Wei-Chung Teng, and Yu-Ren Wu
Adaptive QoS and Security for Video Transmission over Wireless Networks: A Cognitive-Based Approach
Walid Abdallah, Sukkyu Lee, Hwagnam Kim, and Noureddine Boudriga
Virtual Network Mapping Algorithm in Wireless Data Center Networks
Juan Luo, Wenfeng He, Keqin Li, and Yaling Guo
A Weighted Centroid Based Tracking System in Wireless Sensor Networks
Hongyang Liu, Qianqian Ren, Longjiang Guo, Jinbao Li, Hui Xu, Hu Jin, Nan Wang, and Chengjie Song
A Smartphone Location Independent Activity Recognition Method Based on the Angle Feature
Changhai Wang, Jianzhong Zhang, Meng Li, Yuan Yuan, and Yuwei Xu
Reliable and Energy Efficient Routing Algorithm for WirelessHART
Qun Zhang, Feng Li, Lei Ju, Zhiping Jia, and Zhaopeng Zhang
A DSCP-Based Method of QoS Class Mapping between WLAN and EPS Network
Yao Liu, Gang Lu, Wei Zhang, Fengling Cai, and Qian Kong
HostoSink: A Collaborative Scheduling in Heterogeneous Environment
Xiaofei Liao, Xiaobao Xiang, Hai Jin, Wei Zhang, and Feng Lu
Load Balancing in MapReduce Based on Data Locality
Yi Chen, Zhaobin Liu, Tingting Wang, and Lu Wang
RD-PCA: A Traffic Condition Data Imputation Method Based on Robust Distance
XueJin Wan, Yong Du, and Jiong Wang
Network-Aware Re-Scheduling: Towards Improving Network Performance of Virtual Machines in a Data Center
Gangyi Luo, Zhuzhong Qian, Mianxiong Dong, Kaoru Ota, and Sanglu Lu
A Novel Petri-Net Based Resource Constrained Multi-project Scheduling Method
Wenbin Hu and Huan Wang
Interconnection Network Reconstruction for Fault-Tolerance of Torus-Connected VLSI Array
Longting Zhu, Jigang Wu, Guiyuan Jiang, and Jizhou Sun
An Ant Colony Optimization Algorithm for Virtual Network Embedding
Wenjie Cao, Hua Wang, and Lei Liu
Temperature-Aware Scheduling Based on Dynamic Time-Slice Scaling
Gangyong Jia, Youwei Yuan, Jian Wan, Congfeng Jiang, Xi Li, and Dong Dai
An Improved Energy-Efficient Scheduling for Precedence Constrained Tasks in Multiprocessor Clusters
Xin Li, Yanheng Zhao, Yibin Li, Lei Ju, and Zhiping Jia
Hierarchical Eventual Leader Election for Dynamic Systems
Huaguan Li, Weigang Wu, and Yu Zhou
Efficient Resource Provisioning for Mobile Media Traffic Management in a Cloud Computing Environment
Mohammad Mehedi Hassan, Muhammad Al-Qurishi, Biao Song, and Atif Alamri
A Community Cloud for a Real-Time Financial Application – Requirements, Architecture and Mechanisms
Marcelo Dutra Os and Graça Bressan
Strategies for Evacuating from an Affected Area with One or Two Groups
Qi Wei, Yuan Shi, Bo Jiang, and Lijuan Wang
A Novel Adaptive Web Service Selection Algorithm Based on Ant Colony Optimization for Dynamic Web Service Composition
Denghui Wang, Hao Huang, and Changsheng Xie
An Optimization VM Deployment for Maximizing Energy Utility in Cloud Environment
Jinhai Wang, Chuanhe Huang, Qin Liu, Kai He, Jing Wang, Peng Li, and Xiaohua Jia
Performance Evaluation of Light-Weighted Virtualization for PaaS in Clouds
Xuehai Tang, Zhang Zhang, Min Wang, Yifang Wang, Qingqing Feng, and Jizhong Han
An Access Control Scheme with Direct Cloud-Aided Attribute Revocation Using Version Key
Jiaoli Shi, Chuanhe Huang, Jing Wang, Kai He, and Jinhai Wang
Full and Live Virtual Machine Migration over XIA
Dalu Zhang, Xiang Jin, Dejiang Zhou, Jianpeng Wang, and Jiaqi Zhu
A Near-Exact Defragmentation Scheme to Improve Restore Performance for Cloud Backup Systems
Rongyu Lai, Yu Hua, Dan Feng, Wen Xia, Min Fu, and Yifan Yang
A Music Recommendation Method for Large-Scale Music Library on a Heterogeneous Platform
Yao Zheng, Limin Xiao, Wenqi Tang, and Li Ruan
GPU-Accelerated Verification of the Collatz Conjecture
Takumi Honda, Yasuaki Ito, and Koji Nakano
Reducing the Interconnection Length for 3D Fault-Tolerant Processor Arrays
Guiyuan Jiang, Jigang Wu, Jizhou Sun, and Longting Zhu
Feature Evaluation for Early Stage Internet Traffic Identification
Lizhi Peng, Hongli Zhang, Bo Yang, and Yuehui Chen
Hyper-Star Graphs: Some Topological Properties and an Optimal Neighbourhood Broadcasting Algorithm
F. Zhang, K. Qiu, and J.S. Kim
Customized Network-on-Chip for Message Reduction
Hongwei Wang, Siyu Lu, Youhui Zhang, Guangwen Yang, and Weimin Zheng
Athena: A Fault-Tolerant, Efficient and Applicable Routing Mechanism for Data Centers
Lijun Lyu, Junjie Xie, Yuhui Deng, and Yongtao Zhou
Performance-Aware Data Placement in Hybrid Parallel File Systems
Shuibing He, Xian-He Sun, Bo Feng, and Kun Feng
Security Analysis and Protection Based on Smali Injection for Android Applications
Junfeng Xu, Shoupeng Li, and Tao Zhang
The 1st International Workshop on Emerging Topics in Wireless and Mobile Computing (ETWMC 2014)
A Novel Key Management Scheme in VANETs
Guihua Duan, Yun Xiao, Rui Ju, and Hong Song
Design and Implementation of Network Hard Disk
Hong Song, Jialong Xu, and Xiaoqiang Cai
Combining Supervised and Unsupervised Learning for Automatic Attack Signature Generation System
Lili Yang, Jie Wang, and Ping Zhong
The Study on the Increasing Strategy of Detecting Moving Target in Wireless Sensor Networks
Jialong Xu, Zhigang Chen, Anfeng Liu, and Hong Song
A CRC-Based Lightweight Authentication Protocol for EPCglobal Class-1 Gen-2 Tags
Zhicai Shi, Yongxiang Xia, Yu Zhang, Yihan Wang, and Jian Dai
Test Case Prioritization Based on Genetic Algorithm and Test-Points Coverage
Weixiang Zhang, Bo Wei, and Huisen Du
SAEP: Simulated Annealing Based Ensemble Projecting Method for Solving Conditional Nonlinear Optimal Perturbation
Shicheng Wen, Shijin Yuan, Bin Mu, Hongyu Li, and Lei Chen
Converting Ptolemy II Models to SpaceEx for Applied Verification
Shiwei Ran, Jinzhi Lin, Ying Wu, Jianzhong Zhang, and Yuwei Xu
Research on Interest Searching Mechanism in SNS Learning Community
Renfeng Wang, Junpei Liu, Haining Sun, and Zhihuai Li
The 5th International Workshop on Intelligent Communication Networks (IntelNet 2014)
Improving the Frequency Adaptive Capability of Hybrid Immune Detector Maturation Algorithm
Jungan Chen, ShaoZhong Zhang, and Danjiang Chen
Cluster-Based Time Synchronization Protocol for Wireless Sensor Networks
Jian Zhang, Shiping Lin, and Dandan Liu
A Fast CABAC Algorithm for Transform Coefficients in HEVC
Nana Shan, Wei Zhou, and Zhemin Duan
A Improved PageRank Algorithm Based on Page Link Weight
Xinsheng Wang, Jianchu Ma, Kaiyuan Bi, and Zhihuai Li
Computation Offloading Management for Vehicular Ad Hoc Cloud
Bo Li, Yijian Pei, Hao Wu, Zhi Liu, and Haixia Liu
An Approach to Model Complex Big Data Driven Cyber Physical Systems
Lichen Zhang
The 5th International Workshop on Wireless Networks and Multimedia (WNM 2014)
Reliable Transmission with Multipath and Redundancy for Wireless Mesh Networks
Wenze Shi, Takeshi Ikenaga, Daiki Nobayashi, Xinchun Yin, and Yebin Xu
Community Roamer: A Social-Based Routing Algorithm in Opportunistic Mobile Networks
Tieying Zhu, Cheng Wang, and Dandan Liu
A Self-adaptive Reliable Packet Transmission Scheme for Wireless Mesh Networks
Wenze Shi, Takeshi Ikenaga, Daiki Nobayashi, Xinchun Yin, and Hui Xu
Distributed Efficient Node Localization in Wireless Sensor Networks Using the Backtracking Search Algorithm
Alan Oliveira de Sá, Nadia Nedjah, and Luiza de Macedo Mourelle
User Specific QoS and Its Application in Resources Scheduling for Wireless System
Chao He and Richard D. Gitlin
A Distributed Storage Model for Sensor Networks
Lee Luan Ling
Relation between Irregular Sampling and Estimated Covariance for Closed-Loop Tracking Method
Bei-bei Miao and Xue-bo Jin
Author Index
Table of Contents – Part II
Parallel Data Processing in Dynamic Hybrid Computing Environment Using MapReduce
Bing Tang, Haiwu He, and Gilles Fedak
Fast Scalable k-means++ Algorithm with MapReduce
Yujie Xu, Wenyu Qu, Zhiyang Li, Changqing Ji, Yuanyuan Li, and Yinan Wu
Acceleration of Solving Non-Equilibrium Ionization via Tracer Particles and MapReduce on Eulerian Mesh
Jian Xiao, Xingyu Xu, Jizhou Sun, Xin Zhou, and Li Ji
A Continuous Virtual Vector-Based Algorithm for Measuring Cardinality Distribution
Xuefei Zhou, Weijiang Liu, Zhiyang Li, and Wenwen Gao
Hmfs: Efficient Support of Small Files Processing over HDFS
Cairong Yan, Tie Li, Yongfeng Huang, and Yanglan Gan
Utilizing Multiple Xeon Phi Coprocessors on One Compute Node
Xinnan Dong, Jun Chai, Jing Yang, Mei Wen, Nan Wu, Xing Cai, Chunyuan Zhang, and Zhaoyun Chen
HPSO: Prefetching Based Scheduling to Improve Data Locality for MapReduce Clusters
Mingming Sun, Hang Zhuang, Xuehai Zhou, Kun Lu, and Changlong Li
Service Scheduling Algorithm in Vehicle Embedded Middleware
Juan Luo, Xin Jin, and Feng Wu
Similar Samples Cleaning in Speculative Multithreading
Yuxiang Li, Yinliang Zhao, and Bin Liu
Equi-join for Multiple Datasets Based on Time Cost Evaluation Model
Hong Zhu, Libo Xia, Mieyi Xie, and Ke Yan
Identifying File Similarity in Large Data Sets by Modulo File Length
Yongtao Zhou, Yuhui Deng, Xiaoguang Chen, and Junjie Xie
Conpy: Concolic Execution Engine for Python Applications
Ting Chen, Xiao-song Zhang, Rui-dong Chen, Bo Yang, and Yang Bai
A Platform for Stock Market Simulation with Distributed Agent-Based Modeling
Chunyu Wang, Ce Yu, Hutong Wu, Xiang Chen, Yuelei Li, and Xiaotao Zhang
C2CU: A CUDA C Program Generator for Bulk Execution of a Sequential Algorithm
Daisuke Takafuji, Koji Nakano, and Yasuaki Ito
Dynamically Spawning Speculative Threads to Improve Speculative Path Execution
Meirong Li, Yinliang Zhao, and You Tao
A Parallel Algorithm of Kirchhoff Pre-stack Depth Migration Based on GPU
Yida Wang, Chao Li, Yang Tian, Haihua Yan, Changhai Zhao, and Jianlei Zhang
An Algorithm to Embed a Family of Node-Disjoint 3D Meshes into Locally Twisted Cubes
Lantao You and Yuejuan Han
GPU Acceleration of Finding Maximum Eigenvalue of Positive Matrices
Ning Tian, Longjiang Guo, Chunyu Ai, Meirui Ren, and Jinbao Li
Improving Speculation Accuracy with Inter-thread Fetching Value Prediction
Fan Xu, Li Shen, Zhiying Wang, Hui Guo, Bo Su, and Wei Chen
Towards Efficient Distributed SPARQL Queries on Linked Data
Xuejin Li, Zhendong Niu, and Chunxia Zhang
MRFS: A Distributed Files System with Geo-replicated Metadata
Jiongyu Yu, Weigang Wu, Di Yang, and Ning Huang
An Advanced Data Redistribution Approach to Accelerate the Scale-Down Process of RAID-6
Congjin Du, Chentao Wu, and Jie Li
Thread Mapping and Parallel Optimization for MIC Heterogeneous Parallel Systems
Tao Ju, Zhengdong Zhu, Yinfeng Wang, Liang Li, and Xiaoshe Dong
Efficient Storage Support for Real-Time Near-Duplicate Video Retrieval
Zhenhua Nie, Yu Hua, Dan Feng, Qiuyu Li, and Yuanyuan Sun
Repairing Multiple Data Losses by Parallel Max-min Trees Based on Regenerating Codes in Distributed Storage Systems
Pengfei You, Yuxing Peng, Zhen Huang, and Changjian Wang
Exploiting Content Locality to Improve the Performance and Reliability of Phase Change Memory
Suzhen Wu, Zaifa Xi, Bo Mao, and Hong Jiang
Computing, Communication and Control Technologies in Intelligent Transportation System (3C in ITS 2014)
ApplicationofSupportVectorMachineintheDecision-Makingof Maneuvering ....................................................
352 ZhuangQi,ZhengChang,HanbangSong,andXinyuZhang
MobilePhoneDataRevealtheSpatiotemporalRegularityofHuman Mobility ........................................................ 359
ZihanSun,HanxiaoZhou,JianfengZheng,andYuhaoQin
ResearchonLarge-ScaleVesselRidingTidalCurrenttoPromote EfficiencyofFairway 366 KangZhou,RanDai,andXingwangYue
AVertex-ClusteringAlgorithmBasedontheCluster-Clique ........... 376 DeqiangWang,BinZhang,andKelunWang
Designed Slide Mode Controller for Ship Autopilot with Steering Gear Saturation .... 386
Gao-Xiaori, Hong-Biguang, Xing-Shengwei, and Li-Tieshan
Automatic Assessment Model for Sailing in Narrow Channel .... 396
Wang Delong and Ren Hongxiang
Bus Arrival Time Prediction and Release: System, Database and Android Application Design .... 404
Junhao Fu, Lei Wang, Mingyang Pan, Zhongyi Zuo, and Qian Yang
On Key Techniques of a Radar Remote Telemetry and Monitoring System .... 417
Jiangling Hao, Mingyang Pan, Deqiang Wang, Lining Zhao, and Depeng Zhao
PSC Ship-Selecting Model Based on Improved Particle Swarm Optimization and BP Neural Network Algorithm .... 425
Tingting Yang, Zhonghua Sun, Shouna Wang, Chengming Yang, and Bin Lin
LRPON Based Infrastructure Layout Planning of Backbone Networks for Mobile Cloud Services in Transportation .... 436
Song Yingge, Dong Jie, Lin Bin, and Ding Ning
Infrastructure Deployment and Dimensioning of Relayed-Based Heterogeneous Wireless Access Networks for Green Intelligent Transportation .... 447
Lin Bin, Guo Jiamei, He Rongxi, and Yang Tingting
Vessel Motion Pattern Recognition Based on One-Way Distance and Spectral Clustering Algorithm .... 461
Wenyao Ma, Zhaolin Wu, Jiaxuan Yang, and Weifeng Li
Navigation Safety Assessment of Ship in Rough Seas Based on Bayesian Network .... 470
Fengde Qu, Fengwu Wang, Zongmo Yang, and Jian Sun
Optimization of Ship Scheduling Based on One-Way Fairway .... 479
Jun Lin, Xin-yu Zhang, Yong Yin, Jin-tao Wang, and Shun Yao
Research on Virtual Crew Path Planning Simulator Based on A* Algorithm .... 487
Huilong Hao, Hongxiang Ren, and Dajun Chen
Speech Recognition Applied in VHF Simulation System .... 496
Dajun Chen, Hongxiang Ren, and Huilong Hao
The Assessment of Risk of Collision between Two Ships Avoiding Collision by Altering Course .... 507
Weifeng Li, Wenyao Ma, Jiaxuan Yang, Guoyou Shi, and Robert Desrosiers
The Merging Algorithm of Radar Simulation Data in Navigational Simulator .... 516
Shun Yao, Xin-yu Zhang, Yong Yin, Xin Xiong, and Jun Lin
Data Mining Research Based on College Forum .... 525
Liming Xue, Zhihuai Li, and Weixin Luan
Simulation of Maritime Joint Sea-Air Search Trend Using 3D GIS .... 533
Xing Shengwei, Wang Renda, Yang Xuefeng, and Liu Jiandao
Quantitative Analysis for the Development of Maritime Transport Efficiency .... 543
Wenbo Zhang, Zhaolin Wu, Yong Liu, and Zebing Li
Security and Privacy in Computer and Network Systems (SPCNS 2014)
Image Compression Based on Time-Domain Lapped Transform and Quadtree Partition .... 553
Xiuhua Ma, Jiwen Dong, and Lei Wang
The Applicability and Security Analysis of IPv6 Tunnel Transition Mechanisms .... 560
Wei Mi
QOS Performance Analysis for Flexible Workflow Supporting Exception Handling .... 571
Xiaoyan Zhu, Jingle Zhang, and Bo Wang
Analysis of Propagation Characteristics of Variant Worms .... 581
Tao Liu, Can Zhang, Mingjing Cao, and Ruping Wu
A Design of Network Behavior-Based Malware Detection System for Android .... 590
Yincheng Qi, Mingjing Cao, Can Zhang, and Ruping Wu
Detection and Defense Technology of Blackhole Attacks in Wireless Sensor Network .... 601
Huisheng Gao, Ruping Wu, Mingjing Cao, and Can Zhang
An Improved Remote Data Possession Checking Protocol in Cloud Storage .... 611
Enguang Zhou and Zhoujun Li
Fault Localization of Concurrency Bugs and Its Application in Web Security .... 618
Zhenyuan Jiang
Feature Selection Toward Optimizing Internet Traffic Behavior Identification .... 631
Zhenxiang Chen, Lizhi Peng, Shupeng Zhao, Lei Zhang, and Shan Jing
ID-Based Anonymous Multi-receiver Key Encapsulation Mechanism with Sender Authentication .... 645
Bo Zhang, Tao Sun, and Dairong Yu
Energy Efficient Routing with a Tree-Based Particle Swarm Optimization Approach .... 659
Guodong Wang, Hua Wang, and Lei Liu
A Context-Aware Framework for SaaS Service Dynamic Discovery in Clouds .... 671
Shaochong Li and Hao-peng Chen
Author Index .... 685
Porting the Princeton Ocean Model to GPUs
Shizhen Xu (1,3), Xiaomeng Huang (1,2,3), Yan Zhang (1,2,3), Yong Hu (1,3), Haohuan Fu (1,2,3), and Guangwen Yang (1,2,3)
1 Ministry of Education Key Laboratory for Earth System Modeling
2 Center for Earth System Science, Tsinghua University, 100084
3 Joint Center for Global Change Studies, Beijing, 100875, China
{hxm,haohuan,ygw}@tsinghua.edu.cn, {yan-zhang12,huyong11,xsz12}@mails.tsinghua.edu.cn
Abstract. While the GPU is becoming a compelling acceleration solution for a range of scientific applications, most existing work on climate models has achieved only limited speedup. This is due to partial porting of the huge code base and the inherently memory-bound nature of these models. In this work, we design and implement a customized GPU-based acceleration of the Princeton Ocean Model (gpuPOM). Targeting Nvidia's state-of-the-art GPU architectures (K20X and K40m), we completely rewrite the original model from Fortran into CUDA-C. Several acceleration methods are presented, including optimizing memory access on a single GPU and overlapping communication and boundary operations among multiple GPUs. The experimental results show that, compared with different Intel CPUs, the gpuPOM achieves a 6.9-fold to 17.8-fold speedup on one K40m GPU and a 5.8-fold to 15.5-fold speedup on one K20X GPU. Further experiments on multiple GPUs indicate that the performance of the gpuPOM on a super-workstation containing 4 GPUs is equivalent to that of a powerful cluster consisting of 34 pure CPU nodes with over 400 CPU cores.
1 Introduction
There is no doubt that high-resolution climate modeling is crucial for simulating global and regional climate change. Most research groups in climate modeling have established their own roadmaps toward high resolution, ranging from several kilometers down to hundreds of meters. This need for high resolution exposes a serious problem: the time consumed in running high-resolution climate models remains a significant scientific and engineering challenge.
In recent years, many scientific codes have been ported to the GPU and have proven well suited to it. In the area of climate models, previous work has achieved different levels of speedup for entire models on GPUs. For instance, Michalakes et al. accelerated a computationally intensive microphysics process of the Weather Research and Forecast (WRF) model by nearly 25x, which sped up the entire WRF model by only 1.23x [1]; Shimokawabe et al. fully accelerated the ASUCA model, a production non-hydrostatic weather model, on 528 Nvidia Tesla GT200 GPUs and achieved 15 TFlops [2]; and [3] fully accelerated COSMO, a large operational weather forecasting model, and achieved a 2.8x speedup for its dynamical core.
X.-h. Sun et al. (Eds.): ICA3PP 2014, Part I, LNCS 8630, pp. 1–14, 2014.
© Springer International Publishing Switzerland 2014
According to our analysis, further speedup of climate models is limited by two factors. First, partial GPU porting limits the performance of the whole application. Many scientific models suffer from a flat performance profile during GPU acceleration. In CAM [4], the single most expensive subroutine accounts for only about 10% of the total runtime and most subroutines account for less than 5%; the same holds for mpiPOM. According to Amdahl's law, the whole model therefore cannot be significantly sped up by porting only a few subroutines, as seen in the GPU acceleration of CAM [4], ROMS [5], WRF [1] and HOMME [6], although partial porting is sometimes a necessary compromise given the huge code base. Second, climate models are mainly bound by memory access, especially in their dynamical cores [3]. Great work has been done on the full GPU acceleration of COSMO [3], ASUCA [2] and NIM [7], covering all the dynamical cores and some portion of the physics. But the memory-bound problem remains and can be further eased through better use of the state-of-the-art GPU memory hierarchy.
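To make the flat-profile argument concrete, here is a worked application of Amdahl's law (our illustration, not a calculation from the cited works). Let p be the fraction of runtime that is ported and s the speedup achieved on that fraction. The second line inverts the formula using the WRF figures quoted above, under the idealized assumption that host-device transfers and other overheads are negligible.

```latex
S(p, s) = \frac{1}{(1 - p) + p/s}, \qquad
\lim_{s \to \infty} S(0.10,\, s) = \frac{1}{0.90} \approx 1.11

p = \frac{1 - 1/S}{1 - 1/s}
  = \frac{1 - 1/1.23}{1 - 1/25}
  \approx \frac{0.187}{0.96} \approx 0.195
```

Even an infinitely fast port of a subroutine that takes 10% of the runtime caps the whole-model speedup near 1.11x, and the WRF numbers are consistent with a microphysics share of roughly 19.5% of the runtime.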
The objective of our study is to shorten the computation time of high-resolution ocean models by parallelizing their existing model structures on the GPU. Using a representative ocean model, the mpiPOM, as an example, we demonstrate how to parallelize an ocean model so that it runs effectively on GPU architectures. Targeting state-of-the-art GPU architectures, we first rewrite the entire mpiPOM model from the original Fortran version into a new CUDA-C version, which we call gpuPOM. Then, we design and implement several optimization methods at two levels: computation optimization on a single GPU and communication optimization among multiple GPUs.
In terms of computation, we concentrate on optimizing memory access and making better use of the GPU's memory hierarchy. We deploy four categories of optimization, namely using the read-only data cache, local memory blocking, loop fusion and subroutine fusion, which are especially effective for climate models. The experimental results demonstrate that one K40m GPU achieves a 6.9-fold to 17.8-fold speedup and one K20X achieves a 5.8-fold to 15.5-fold speedup over different Intel multi-core CPUs.
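Of the four optimizations, loop fusion is the simplest to illustrate. The host-side C sketch below is our own example (the arrays `a`, `b`, `c` and both update formulas are hypothetical, not gpuPOM code). The unfused version streams `a` from memory twice. The fused version loads each element once and reuses it from a register, which roughly halves the memory traffic on `a` in a memory-bound loop while producing identical results.

```c
#include <assert.h>
#include <stddef.h>

/* Unfused: two separate sweeps, each of which streams a[] from memory. */
void update_unfused(const double *a, double *b, double *c, size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t idx = 0; idx < n; ++idx)
        b[idx] = 0.5 * a[idx];        /* first sweep reads a[] */
    for (size_t idx = 0; idx < n; ++idx)
        c[idx] = a[idx] * a[idx];     /* second sweep reads a[] again */
}

/* Fused: one sweep; each a[idx] is loaded once and reused. */
void update_fused(const double *a, double *b, double *c, size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t idx = 0; idx < n; ++idx) {
        const double ai = a[idx];     /* single load of a[idx] */
        b[idx] = 0.5 * ai;
        c[idx] = ai * ai;
    }
}
```

On a GPU the same reasoning applies one level up. If two kernels read the same array, fusing them into one kernel saves a full pass over global memory and one kernel launch.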
In terms of communication, we concentrate on fine-grained overlapping between the inner-region computation and the outer-region communication and updating. With the new design, multiple GPUs in one node can communicate directly, bypassing the CPU. In addition, with fine-grained control of CUDA streams and their priorities, inner-region computation can execute concurrently with outer-region communication and updating.
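The inner/outer split can be expressed as index ranges. The C sketch below is our illustration and uses hypothetical names (`Rect`, `split_subdomain`, and a boundary width `w` assumed equal to the halo width). It divides an `nx` × `ny` subdomain into an inner rectangle, which needs no data from neighbouring GPUs, and four boundary strips that take part in the halo exchange. Under the design described above, the inner rectangle would run on a low-priority CUDA stream while the strips and their exchange run on a high-priority stream.

```c
#include <assert.h>

/* Half-open index rectangle [i0, i1) x [j0, j1). */
typedef struct { int i0, i1, j0, j1; } Rect;

/* Split an nx-by-ny subdomain into an inner region and four boundary
   strips of width w (north, south, west, east). Strip points exchange
   halo data with neighbouring subdomains; inner points do not.
   Requires nx > 2*w and ny > 2*w. Returns the inner rectangle and
   fills strips[0..3]; together they cover every point exactly once. */
Rect split_subdomain(int nx, int ny, int w, Rect strips[4]) {
    assert(w >= 1 && nx > 2 * w && ny > 2 * w);
    strips[0] = (Rect){0, nx, ny - w, ny};       /* north, full width */
    strips[1] = (Rect){0, nx, 0, w};             /* south, full width */
    strips[2] = (Rect){0, w, w, ny - w};         /* west, corners excluded */
    strips[3] = (Rect){nx - w, nx, w, ny - w};   /* east, corners excluded */
    return (Rect){w, nx - w, w, ny - w};         /* inner region */
}

/* Number of grid points in a rectangle. */
int rect_size(Rect r) {
    return (r.i1 - r.i0) * (r.j1 - r.j0);
}
```

For large subdomains the inner rectangle dominates the work, which is what gives the overlap room to hide the exchange of the thin strips.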
To evaluate the accuracy, performance and scalability of the gpuPOM, we build a customized super-workstation with four K20X GPUs inside. Experimental results show that the performance of the gpuPOM running on this super-workstation is comparable to that of a powerful cluster consisting of 34 pure CPU nodes with over 400 CPU cores, which means this new gpuPOM version provides a fast and attractive solution for ocean scientists conducting simulation research.
2S.Xuetal.
The remainder of this paper is organized as follows. We review the mpiPOM in Sec. 2 and introduce the structure of the gpuPOM in Sec. 3. In Sec. 4, we introduce four optimizations that make efficient use of the GPU's memory hierarchy for the gpuPOM. In Sec. 5, we present detailed techniques for communication optimization among multiple GPUs. We provide the corresponding experimental results on correctness, performance and scalability in Sec. 6 and conclude our work in Sec. 7.
2 The mpiPOM
The mpiPOM [8] is a widely used parallel branch of POM based on the Message Passing Interface (MPI), and it retains part of the physics package of the original POM [9]. POM is a powerful regional ocean model and has been used in a wide range of applications, such as circulation and mixing processes in rivers, seas and oceans.
The mpiPOM solves the primitive equations under the hydrostatic and Boussinesq approximations. In the horizontal, spatial derivatives are computed using centered-space differencing on a staggered Arakawa C-grid. In the vertical, the mpiPOM supports terrain-following sigma coordinates and a fourth-order scheme option to reduce internal pressure-gradient errors. The mpiPOM uses the classical time-splitting technique to separate the vertically integrated equations (external mode) from the vertical structure equations (internal mode). The external mode calculation updates the surface elevation and the vertically averaged velocities. The internal mode calculation updates velocity, temperature and salinity, as well as the turbulence quantities. The three-dimensional internal mode and the two-dimensional external mode are both integrated explicitly using a second-order leapfrog scheme. These two modules are the most time-consuming modules of the mpiPOM.
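To illustrate how a second-order leapfrog step works, the C sketch below integrates a toy linear oscillator du/dt = ωv, dv/dt = −ωu. This is our own example, not a POM equation. Each new time level is computed from the level two steps back plus 2Δt times the tendency at the current level. The first step is bootstrapped with forward Euler, which is one common choice. The time filter that many leapfrog-based models add to damp the scheme's computational mode is deliberately omitted.

```c
#include <assert.h>
#include <math.h>

/* Integrate du/dt = omega*v, dv/dt = -omega*u with the leapfrog scheme
   x^{n+1} = x^{n-1} + 2*dt*f(x^n), starting from (u0, v0). The first
   step uses forward Euler. The state after nsteps steps is written to
   *u_out and *v_out. */
void leapfrog_oscillator(double u0, double v0, double omega, double dt,
                         int nsteps, double *u_out, double *v_out) {
    assert(nsteps >= 1 && dt > 0.0);
    double u_prev = u0, v_prev = v0;        /* time level n-1 */
    double u = u0 + dt * omega * v0;        /* time level n (Euler bootstrap) */
    double v = v0 - dt * omega * u0;
    for (int n = 1; n < nsteps; ++n) {
        double u_next = u_prev + 2.0 * dt * omega * v;  /* centered in time */
        double v_next = v_prev - 2.0 * dt * omega * u;
        u_prev = u;                         /* shift the two stored levels */
        v_prev = v;
        u = u_next;
        v = v_next;
    }
    *u_out = u;
    *v_out = v;
}
```

With u0 = 1, v0 = 0 and ω = 1, the exact solution is u = cos t, v = −sin t, and the scheme stays second-order accurate. Note that leapfrog must keep two time levels of every field, which doubles the memory footprint and, in a memory-bound model, the data that each step streams.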
To demonstrate the memory-bound problem, the Performance API (PAPI) [10] is used to count floating-point operations and memory access (store/load) instructions. The results show that the computational intensity (flops/byte) of the mpiPOM is about 1:3.3, whereas that provided by Sandy Bridge E5-2670 CPUs is about 7.5:1. Moreover, data are mostly streamed from memory and show little locality, which means that the mpiPOM is mainly bound by memory access. The wider memory bandwidth of the GPU therefore makes it an attractive target, and to the best of our knowledge, our work is the first GPU port of the POM.
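A simple roofline estimate shows why these two ratios matter. This is our back-of-the-envelope reading of the numbers above, not a measurement. Attainable performance is bounded by the smaller of peak compute P_peak and memory bandwidth B times computational intensity I. Taking the machine balance P_peak/B as the quoted 7.5 flops/byte gives:

```latex
P_{\text{attainable}} = \min\left(P_{\text{peak}},\; B \cdot I\right), \qquad
I_{\text{mpiPOM}} \approx \frac{1}{3.3} \approx 0.30\ \text{flops/byte}

\frac{P_{\text{attainable}}}{P_{\text{peak}}}
  \le \frac{I_{\text{mpiPOM}}}{P_{\text{peak}}/B}
  \approx \frac{0.30}{7.5} \approx 0.04
```

Under this model, the CPU can sustain at most about 4% of its peak flop rate on the mpiPOM. Any GPU speedup therefore has to come mainly from higher memory bandwidth B, not from higher peak arithmetic throughput.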
3 Structure of the gpuPOM
The flowchart of the gpuPOM is illustrated in Fig. 1. The main difference between the mpiPOM and the gpuPOM is that, in the gpuPOM implementation, the CPU is responsible only for initialization and output. The gpuPOM begins by initializing the relevant arrays on the CPU host and then copies these data to the GPU. The GPU handles all the computation, including the external mode and the internal mode. During the computation, the variables required for output, such as velocity and surface elevation, are copied back to the CPU host and then written to disk at a constant frequency.
Fig. 1. Flowchart of the gpuPOM (boxes: input data and initialization; MemcpyHostToDevice; external mode: vertically integrated momentum equations, sea surface height, compute UT, VT for the internal mode; internal mode: advection and horizontal diffusion of U, V, baroclinic term of U, V, momentum, continuity, turbulence and tracer transport (T, S) equations, update U, V; each equation followed by a boundary operation; output NetCDF files)
In Fig. 1, each module represents a part of the primitive equations and is implemented by several subroutines. We retained the overall structure of the original code and rewrote all the subroutines as about 70 CUDA kernel functions. Thus, the data always reside on the GPU during the computation.
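The host/device division of labour can be summarized as a driver loop. The C sketch below is a CPU-only emulation and uses hypothetical names (`run_model`, `step_model`, `output_interval`). Here `memcpy` stands in for `cudaMemcpy` in each direction, `step_model` stands in for the kernels that advance one time step, and for brevity the whole state is treated as the output field. The point it illustrates is that the state is copied to the "device" once, and only output data travel back every `output_interval` steps.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the GPU kernels of one time step (hypothetical update). */
static void step_model(double *dev_state, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dev_state[i] += 1.0;
}

/* Runs nsteps steps and returns the number of device-to-host output
   copies made, or -1 if the "device" buffer cannot be allocated. */
int run_model(const double *host_init, double *host_out, size_t n,
              int nsteps, int output_interval) {
    assert(n > 0 && nsteps >= 0 && output_interval > 0);
    double *dev_state = malloc(n * sizeof *dev_state);  /* "device" memory */
    if (dev_state == NULL)
        return -1;
    /* Single host-to-device copy after initialization on the host. */
    memcpy(dev_state, host_init, n * sizeof *dev_state);
    int outputs = 0;
    for (int step = 1; step <= nsteps; ++step) {
        step_model(dev_state, n);          /* all computation stays on "device" */
        if (step % output_interval == 0) {
            /* Device-to-host copy of the output fields; the host would
               then write them to a NetCDF file (not shown). */
            memcpy(host_out, dev_state, n * sizeof *dev_state);
            ++outputs;
        }
    }
    free(dev_state);
    return outputs;
}
```

Because PCIe bandwidth is far below device memory bandwidth, keeping transfers out of the time-stepping loop, except at output steps, is what lets the kernels run at device speed.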
In our implementation, the 3D arrays of variables are stored sequentially in the order x, y, z (i, j, k ordering) and the 2D arrays in the order x, y, which is the same as in the original code. Each GPU thread handles one (x, y) point in the horizontal direction and performs all the calculations from the surface to the bottom (30 to 50 points). The thread blocks are configured with (32, 4, 1) threads; on the K20X GPU we use, configurations of (32, 8, 1) or (32, 16, 1) achieve similar performance.
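The column-per-thread mapping can be sketched on the host. The C sketch below assumes that x varies fastest, as in the Fortran column-major layout of `a(i,j,k)`, so element (i, j, k) sits at offset i + nx*(j + ny*k). The function names are hypothetical. With a (32, 4, 1) block, the 32 threads of a warp take consecutive i at fixed j, so every load inside the vertical loop touches 32 consecutive elements, which is the pattern that permits coalesced memory access. `column_sum` emulates the work of one thread as it walks its (i, j) column over all levels.

```c
#include <assert.h>
#include <stddef.h>

/* Linear offset of (i, j, k) with x fastest, then y, then z. */
static inline size_t idx3(int i, int j, int k, int nx, int ny) {
    return (size_t)i + (size_t)nx * ((size_t)j + (size_t)ny * (size_t)k);
}

/* Number of thread blocks needed along one dimension (ceiling division),
   e.g. blocks_for(nx, 32) and blocks_for(ny, 4) for a (32, 4, 1) block. */
static inline int blocks_for(int n, int block_dim) {
    return (n + block_dim - 1) / block_dim;
}

/* Work of the single thread that owns horizontal point (i, j). On the
   GPU, i = blockIdx.x * blockDim.x + threadIdx.x and j is formed
   likewise from the .y components; threads with i >= nx or j >= ny
   would return early. The sum is a stand-in for a real kernel body. */
double column_sum(const double *a, int i, int j, int nx, int ny, int nz) {
    assert(i >= 0 && i < nx && j >= 0 && j < ny);
    double s = 0.0;
    for (int k = 0; k < nz; ++k)        /* surface-to-bottom loop */
        s += a[idx3(i, j, k, nx, ny)];
    return s;
}
```

Consecutive levels of one column are nx*ny elements apart, so a single thread does not read contiguous memory. Contiguity comes from neighbouring threads in the warp reading the same level at the same time.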