Organization
Program Committee
Habtamu Abie, Norwegian Computing Center, Norway
Marco Aldinucci, University of Turin, Italy
Giulio Aliberti, Università degli Studi Roma III, Italy
Pedro Alonso, Universitat Politècnica de València, Spain
Alba Amato, Seconda Università degli Studi di Napoli, Italy
Daniel Andresen, Kansas State University, USA
Cosimo Anglano, Università del Piemonte Orientale, Italy
Danilo Ardagna, Politecnico di Milano, Italy
Marcos Assuncao, Inria, LIP, ENS Lyon, France
David Del Rio Astorga, University Carlos III of Madrid, Spain
Hrachya Astsatryan, National Academy of Sciences of Armenia
Nikzad Babaii Rizvandi, University of Sydney, NICTA, Australia
Yan Bai, University of Washington Tacoma, USA
Muneer Masadeh Bani Yassein, Al al-Bayt University, Jordan
Saad Bani-Mohammad, Al al-Bayt University, Jordan
Jorge Barbosa, FEUP, Portugal
Novella Bartolini, University of Rome Sapienza, Italy
Ladjel Bellatreche, LIAS/ENSMA, France
Salima Benbernou, Université Paris Descartes, France
Siegfried Benkner, University of Vienna, Austria
Jorge Bernal Bernabe, University of Murcia, Spain
Md Zakirul Alam Bhuiyan, Temple University, USA
Javier Garcia Blas, University Carlos III of Madrid, Spain
Oana Boncalo, University Politehnica of Timisoara, Romania
Daniel Rubio Bonilla, HLRS – University of Stuttgart, Germany
George Bosilca, Innovative Computing Laboratory, University of Tennessee, USA
Pascal Bouvry, University of Luxembourg
Suren Byna, Lawrence Berkeley National Laboratory, USA
Massimo Cafaro, University of Salento, Italy
Silvina Caino Lores, University Carlos III of Madrid, Spain
Christian Callegari, University of Pisa, Italy
Aparicio Carranza, New York City College of Technology, USA
Jesus Carretero, University Carlos III of Madrid, Spain
Pedro A. Castillo Valdivieso, Universidad de Granada, Spain
Tania Cerquitelli, Politecnico di Torino, Italy
Sudip Chakraborty, Valdosta State University, USA
Jerry H. Chang, National Center for High-Performance Computing, China
Yue-Shan Chang, National Taipei University, Taiwan
Anupam Chattopadhyay, Nanyang Technological University, Singapore
Jing Chen, National Cheng Kung University, Taiwan
Tzung-Shi Chen, National University of Tainan, Taiwan
Yu Chen, State University of New York – Binghamton, USA
Zizhong Chen, University of California, Riverside, USA
John A. Clark, University of York, UK
Stefania Colonnese, Università di Roma La Sapienza, Italy
Massimo Coppola, Institute of Information Science and Technologies (ISTI/CNR), Italy
Ana Cortes, Universitat Autònoma de Barcelona, Spain
Raphaël Couturier, University of Franche Comte, France
Félix Cuadrado, Queen Mary University of London, UK
Alfredo Cuzzocrea, University of Trieste, Italy
Bogusław Cyganek, AGH University of Science and Technology, Poland
Georges Da Costa, IRIT/Toulouse III, France
Masoud Daneshtalab, KTH Royal Institute of Technology, Sweden
Gregoire Danoy, University of Luxembourg
Sabrina De Capitani di Vimercati, Università degli Studi di Milano, Italy
Saptarshi Debroy, City University of New York, USA
Casimer Decusatis, Marist College, USA
Eugen Dedu, University of Franche-Comté, France
Juan-Carlos Díaz-Martín, University of Extremadura, Spain
Yacine Djemaiel, Communication Networks and Security, Research Laboratory, Tunisia
Ciprian Dobre, University Politehnica of Bucharest, Romania
Manuel F. Dolz, University Carlos III of Madrid, Spain
Susan Donohue, University of Virginia, USA
Zhihui Du, Tsinghua University, Beijing, China
Yucong Duan, Hainan University, China
Christian Esposito, University of Salerno, Italy
Roberto R. Expósito, University of A Coruña, Spain
Jose Alfredo Ferreira Costa, UFRN – Universidade Federal do Rio Grande do Norte, Brazil
Ugo Fiore, Federico II University, Italy
Neki Frasheri, Polytechnic University of Tirana, Albania
Franco Frattolillo, University of Sannio, Italy
Marc Frincu, University of Southern California, USA
Jaafar Gaber, Université de Technologie de Belfort-Montbéliard, France
Jose Daniel Garcia, University Carlos III of Madrid, Spain
Luis Javier García Villalba, University Complutense of Madrid, Spain
Juan-L. García-Zapata, University of Extremadura, Spain
Saurabh Kumar Garg, University of Tasmania, Australia
Ester Martin Garzon, Almeria University, Spain
Paolo Gasti, New York Institute of Technology, USA
Victor Gergel, Nizhny Novgorod State University, Russia
Ansgar Gerlicher, Stuttgart Media University, Germany
Vladimir Getov, University of Westminster, UK
Harald Gjermundrod, University of Nicosia, Cyprus
Dieter Gollmann, Hamburg University of Technology, Germany
Jing Gong, KTH, Sweden
Arturo Gonzalez-Escribano, Universidad de Valladolid, Spain
Pilar Gonzalez-Ferez, ICS-FORTH, Greece
Jose-Luis Gonzalez-Sanchez, CINVESTAV, Mexico
José Gracia, High-Performance Computing Center Stuttgart, Germany
Christos Grecos, Independent Consultants, Greece
Daniel Grosu, Wayne State University, USA
Sheikh M. Habib, Technische Universität Darmstadt, Germany
Khalid Hasanov, IBM Research Ireland
Houcine Hassan, Universitat Politecnica de Valencia, Spain
Shi-Jinn Horng, National Taiwan University of Science and Technology, Taiwan
Atanas Hristov, University of Information Science and Technology, FYR Macedonia
Sun-Yuan Hsieh, National Cheng Kung University, Taiwan
Ching-Hsien Hsu, Chung Hua University, China
Jia Hu, Liverpool Hope University, UK
Xinyi Huang, Fujian Normal University, China
Yonggang Huang, Beijing Institute of Technology, China
Zhiyi Huang, University of Otago, New Zealand
Mauro Iacono, Seconda Università degli Studi di Napoli, Italy
Shadi Ibrahim, Inria, Rennes Bretagne Atlantique Research Center, France
Young-Sik Jeong, Dongguk University, South Korea
Hai Jiang, Arkansas State University, USA
Wenjun Jiang, Hunan University, China
Edward Jung, Southern Polytechnic State University
Vana Kalogeraki, Athens University of Economics and Business, Greece
Georgios Kambourakis, University of the Aegean, Greece
Panagiotis Karampelas, Hellenic American University, USA
Helen Karatza, Aristotle University of Thessaloniki, Greece
Christoph Kessler, Linköping University, Sweden
Muhammad Khurram Khan, King Saud University, Saudi Arabia
Peter Kilpatrick, Queen’s University Belfast, UK
Sookyun Kim, Pai Chai University, South Korea
Ryan K.L. Ko, The University of Waikato, New Zealand
Peter Kropf, University of Neuchatel, Switzerland
Ruggero Donida Labati, Università degli Studi di Milano, Italy
Kuan-Chou Lai, National Taichung University, Taiwan
Algirdas Lančinskas, Vilnius University, Lithuania
Alexey Lastovetsky, University College Dublin, Ireland
Che-Rung Lee, National Tsing Hua University, Taiwan
Laurent Lefevre, Ecole Normal Lyon, France
Yingjiu Li, Singapore Management University, Singapore
Yusen Li, Nanyang Technological University, Singapore
Zengxiang Li, Institute of High Performance Computing Agency for Science, Technology and Research, Singapore
Xin Liao, Hunan University, China
Chunyu Lin, HTC Corp., Taiwan
Zhen Ling, Southeast University, China
Qin Liu, Hunan University, China
Haikun Liu, North China Electricity Power University
Xiao Liu, Software Engineering Institute, East China Normal University, China
Yongchao Liu, Georgia Institute of Technology, USA
Giovanni Livraga, Università degli Studi di Milano, Italy
Jaime Lloret, Universidad Politécnica de Valencia, Spain
George Loukas, University of Greenwich, UK
Haibing Lu, Santa Clara University, USA
Paul Lu, University of Alberta, Canada
Rongxing Lu, University of New Brunswick, Canada
Wei Lu, Keene State College, USA
Liang Luo, University of Washington, USA
Sidi Ahmed Mahmoudi, University of Mons, France
Amit Majumdar, University of California San Diego, San Diego Supercomputer Center, USA
Damian Alvarez Mallon, Forschungszentrum Jülich, Germany
Jose Miguel Mantas Ruiz, Universidad de Granada, Spain
Ravindranath Manumachu, University College Dublin, Ireland
Xinjun Mao, National University of Defense Technology, China
Tomas Margalef, Universitat Autonoma de Barcelona, Spain
Stefano Markidis, KTH, Sweden
Fabrizio Marozzo, DIMES, University of Calabria, Italy
Pedro J. Marron, University of Duisburg-Essen, Germany
Stefano Marrone, Second University of Naples, Italy
Alejandro Masrur, TU Chemnitz, Germany
Barbara Masucci, University of Salerno, Italy
Susumu Matsumae, Saga University, Japan
Rafael Mayo Gual, University Jaume I, Spain
Anatoly Melnyk, Lviv Polytechnic National University, Ukraine
Viktor Melnyk, John Paul II Catholic University of Lublin, Poland
Iosif Meyerov, UNN, Russia
Konstantina Mitropoulou, University of Cambridge, UK
Miguel Cárdenas Montes, CIEMAT, Spain
Francesco Moscato, Second University of Naples, Italy
Peter Mueller, IBM Zurich Research, Switzerland
David Naccache, Université Paris II, France
Takeshi Nanri, Kyushu University, Japan
Esmond Ng, Lawrence Berkeley National Laboratory, USA
Kenneth O’Brien, University College Dublin, Ireland
Hirotaka Ono, Kyushu University, Japan
Eunok Paek, Hanyang University, South Korea
Francesco Palmieri, Federico II University, Italy
Benoit Parrein, University of Nantes, France
Abel Francisco Paz Gallardo, CIEMAT, Spain
Cathryn Peoples, University of Ulster, Ireland
Fernando Pereniguez-Garcia, Catholic University of Murcia, Spain
Günther Pernul, University of Regensburg, Germany
Kalyan Perumalla, Oak Ridge National Laboratory, USA
Dana Petcu, West University of Timisoara, Romania
Salvador Petit, Universidad Politécnica de Valencia, Spain
Jean-Marc Pierson, University of Toulouse, IRIT, France
Roberto Di Pietro, Roma Tre University of Rome, Italy
Vincenzo Piuri, University of Milan, Italy
Florin Pop, University Politehnica of Bucharest, Romania
Nina Popova, Moscow State University, Russia
Heng Qi, Dalian University of Technology, China
Md. Obaidur Rahman, Dhaka University of Engineering and Technology, Bangladesh
Rajiv Ranjan, The University of New South Wales, Australia
Thomas Rauber, University of Bayreuth, Germany
Md. Abdur Razzaque, University of Dhaka, Bangladesh
Yongli Ren, Deakin University, Australia
Juan Antonio Rico Gallego, University of Extremadura, Spain
Gabriel Rodríguez, Universidade da Coruña, Spain
Félix R. Rodríguez, University of Extremadura, Spain
Imed Romdhani, Edinburgh Napier University, UK
Bimal Roy, Indian Statistical Institute, Kolkata, India
Gudula Ruenger, Chemnitz University of Technology, Germany
Salvatore Ruggieri, Università di Pisa, Italy
Antonio Ruiz-Martínez, University of Murcia, Spain
Francoise Sailhan, CNAM, France
Subhash Saini, NASA, USA
Rizos Sakellariou, University of Manchester, UK
Luis Miguel Sanchez, University Carlos III of Madrid, Spain
Hamid Sarbazi-Azad, IPM, Iran
Sven-Bodo Scholz, Heriot-Watt University, UK
Estefania Serrano, University Carlos III of Madrid, Spain
Jun Shen, University of Wollongong, Australia
Ali Shoker, INESC TEC and University of Minho, Portugal
Anna Sikora, University Autonoma of Barcelona, Spain
Dragan Simic, University of Novi Sad, Serbia
Dimitris E. Simos, SBA Research, Austria
David E. Singh, University Carlos III of Madrid, Spain
Genoveva Vargas Solar, CNRS-LIG-LAFMIA, France
Chao Song, University of Electronic Science and Technology of China
Andrey Sozykin, Ural Federal University, Russia
Giandomenico Spezzano, CNR-ICAR and University of Calabria, Italy
Patricia Stolf, IRIT, France
Peter Strazdins, The Australian National University, Australia
Chunhua Su, Japan Advanced Institute of Science and Technology
Chang-Ai Sun, University of Science and Technology Beijing, China
Magdalena Szmajduch, Cracow University of Technology, Poland
Daisuke Takahashi, University of Tsukuba, Japan
Domenico Talia, University of Calabria, Italy
Andrei Tchernykh, CICESE Research Center
Sabu M. Thampi, Indian Institute of Information Technology and Management, India
Hiroyuki Tomiyama, Ritsumeikan University, Japan
Massimo Torquati, University of Pisa, Italy
Paolo Trunfio, DEIS, University of Calabria, Italy
Tomoaki Tsumura, Nagoya Institute of Technology, Japan
Radu Tudoran, HUAWEI ERC, Germany
Didem Unat, Lawrence Berkeley National Laboratory, USA
Sebastien Varrette, University of Luxembourg, Luxembourg
Maria Barreda Vayá, Universitat Jaume I, Spain
Salvatore Venticinque, Seconda Università di Napoli, Italy
Vladimir Voevodin, Moscow University, Russia
Chen Wang, CSIRO ICT Center, Australia
Mingzhong Wang, University of the Sunshine Coast, Australia
Qian Wang, Wuhan University, China
You-Chiun Wang, National Sun Yat-sen University, China
Yunsheng Wang, Kettering University, USA
Zeke Wang, Nanyang Technological University, Singapore
Zhibo Wang, Wuhan University, China
Martine Wedlake, IBM, USA
Jin Wei, University of Akron, USA
Sheng Wen, Deakin University, Australia
Beat Wolf, HES-SO, University of Würzburg, Germany
Hejun Wu, Sun Yat-Sen University, China
Weigang Wu, Sun Yat-sen University, China
Yongdong Wu, Institute for Infocomm Research, Singapore
Roman Wyrzykowski, Czestochowa University of Technology, Poland
Liao Xiaofei, Huazhong University of Science and Technology, China
Xiaofei Xing, Guangzhou University, China
Quanqing Xu, Data Storage Institute, A*STAR, Singapore
Wei Xue, Tsinghua University, China
Ramin Yahyapour, GWDG – University of Göttingen, Germany
Chao-Tung Yang, Tunghai University, Taiwan
Baijian Yang, Purdue University, USA
Baoliu Ye, Nanjing University, China
Hua Yu, Huazhong University of Science and Technology, China
Shucheng Yu, University of Arkansas, USA
Mazdak Zamani, Kean University, USA
Sherali Zeadally, University of Kentucky, USA
Deze Zeng, University of Aizu, Japan
Peng Zhang, Stony Brook University, USA
Daqiang Zhang, Tongji University, China
Dongfang Zhao, Pacific Northwest National Laboratory
Yun-Wei Zhao, Nanyang Technological University, Singapore
Yunhui Zheng, IBM T.J. Watson Research Center, USA
Jianlong Zhong, GRAPHSQL INC, USA
Xingquan Zhu, Florida Atlantic University, USA
Sotirios Ziavras, New Jersey Institute of Technology, USA
Additional Reviewers
Andión, José M.; Bao, Tao; Bezemskij, Anatolij; Catuogno, Luigi; Cortez Mendoza, Jorge Mario; Crane, Paul; Dhoutaut, Dominique; Fernandez, Javier; García Zapata, Juan Luis; Heartfield, Ryan; Kieffer, Emmanuel; Mair, Jason; Niu, Zhaojie; Peng, Tao; Seo, Hwajeong; Soundararajan, Varun; Tao, Jinsong; Tygart, Adam; Veiga, Jorge; Zhang, Shaobo; Zhao, Jieyi
Contents
Parallel and Distributed Architectures
Intelligent SPARQL Endpoints: Optimizing Execution Performance by Automatic Query Relaxation and Queue Scheduling
  Ana I. Torre-Bastida, Esther Villar-Rodriguez, Miren Nekane Bilbao, and Javier Del Ser
Hardware-Based Sequential Consistency Violation Detection Made Simpler
  Mohammad Majharul Islam, Riad Akram, and Abdullah Muzahid
Optimized Mapping Spiking Neural Networks onto Network-on-Chip
  Yu Ji, Youhui Zhang, He Liu, and Weimin Zheng
Software Systems and Programming
A Portable Lock-Free Bounded Queue
  Peter Pirkelbauer, Reed Milewicz, and Juan Felipe Gonzalez
A C++ Generic Parallel Pattern Interface for Stream Processing
  David del Rio Astorga, Manuel F. Dolz, Luis Miguel Sanchez, Javier García Blas, and J. Daniel García
Creating Distributed Execution Plans with BobolangNG
  David Bednárek, Martin Kruliš, Jakub Yaghob, and Filip Zavoral
Deciding the Deadlock and Livelock in a Petri Net with a Target Marking Based on Its Basic Unfolding
  Guanjun Liu, Kun Zhang, and Changjun Jiang
A New Scalable Approach for Distributed Metadata in HPC
  Cristina Rodríguez-Quintana, Antonio F. Díaz, Julio Ortega, Raúl H. Palacios, and Andrés Ortiz
Enabling Android-Based Devices to High-End GPGPUs
  Raffaele Montella, Carmine Ferraro, Sokol Kosta, Valentina Pelliccia, and Giulio Giunta
Distributed and Network-Based Computing
3-Additive Approximation Algorithm for Multicast Time in 2D Torus Networks
  Hovhaness A. Harutyunyan and Meghrig Terzian
Online Resource Coalition Reorganization for Efficient Scheduling on the Intercloud
  Adrian Spataru, Teodora Selea, and Marc Frincu
Graphein: A Novel Optical High-Radix Switch Architecture for 3D Integration
  Jie Jian, Mingche Lai, Liquan Xiao, and Weixia Xu
Improving the Performance of Volunteer Computing with Data Volunteers: A Case Study with the ATLAS@home Project
  Saúl Alonso-Monsalve, Félix García-Carballeira, and Alejandro Calderón
Microcities: A Platform Based on Microclouds for Neighborhood Services
  Ismael Cuadrado-Cordero, Felix Cuadrado, Chris Phillips, Anne-Cécile Orgerie, and Christine Morin
Impact of Shutdown Techniques for Energy-Efficient Cloud Data Centers
  Issam Raïs, Anne-Cécile Orgerie, and Martin Quinson
Processing Partially Ordered Requests in Distributed Stream Processing Systems
  Rijun Cai, Weigang Wu, Ning Huang, and Lihui Wu
Implement and Optimization of Indoor Positioning System Based on Wi-Fi Signal
  Chongsheng Yu, Xin Li, Lei Dou, Jianwei Li, Yu Zhang, Jian Qin, Yuqing Sun, and Zhiyue Cao
Big Data and Its Applications
Optimizing Inter-server Communications by Exploiting Overlapping Communities in Online Social Networks
  Jingya Zhou, Jianxi Fan, Baolei Cheng, and Juncheng Jia
Road Segment Information Based Named Data Networking for Vehicular Environments
  Junlan Xiao, Jian Deng, Hui Cao, and Weigang Wu
Energy-Aware Query Processing on a Parallel Database Cluster Node
  Amine Roukh, Ladjel Bellatreche, Nikos Tziritas, and Carlos Ordonez
Current Flow Betweenness Centrality with Apache Spark
  Massimiliano Bertolucci, Alessandro Lulli, and Laura Ricci
Parallel and Distributed Algorithms
Light Loss-Less Data Compression, with GPU Implementation
  Shunji Funasaka, Koji Nakano, and Yasuaki Ito
Deterministic Construction of Regular Geometric Graphs with Short Average Distance and Limited Edge Length
  Satoshi Fujita, Koji Nakano, Michihiro Koibuchi, and Ikki Fujiwara
A GPU-Based Backtracking Algorithm for Permutation Combinatorial Problems
  Tiago Carneiro Pessoa, Jan Gmys, Nouredine Melab, Francisco Heron de Carvalho Junior, and Daniel Tuyttens
Buffer Minimization for Rate-Optimal Scheduling of Synchronous Dataflow Graphs on Multicore Systems
  Mingze Ma and Rizos Sakellariou
Implementing Snapshot Objects on Top of Crash-Prone Asynchronous Message-Passing Systems
  Carole Delporte-Gallet, Hugues Fauconnier, Sergio Rajsbaum, and Michel Raynal
Scaling DBSCAN-like Algorithms for Event Detection Systems in Twitter
  Joan Capdevila, Gonzalo Pericacho, Jordi Torres, and Jesús Cerquides
Towards Parallel CFD Computation for the ADAPT Framework
  Imad Kissami, Christophe Cérin, Fayssal Benkhaldoun, and Gilles Scarella
Feedback Control Optimization for Performance and Energy Efficiency on CPU-GPU Heterogeneous Systems
  Feng-Sheng Lin, Po-Ting Liu, Ming-Hua Li, and Pao-Ann Hsiung
The Impact of Panel Factorization on the Gauss-Huard Algorithm for the Solution of Linear Systems on Modern Architectures
  Sandra Catalán, Pablo Ezzatti, Enrique S. Quintana-Ortí, and Alfredo Remón
Leveraging the Performance of LBM-HPC for Large Sizes on GPUs Using Ghost Cells
  Pedro Valero-Lara
Improving Hash Distributed A* for Shared Memory Architectures Using Abstraction
  Victoria Sanz, Armando De Giusti, and Marcelo Naiouf
On a Parallel Algorithm for the Determination of Multiple Optimal Solutions for the LCSS Problem
  Bchira Ben Mabrouk, Hamadi Hasni, and Zaher Mahjoub
Locality of Computation for Stencil Optimization
  Lufeng Yuan, Junhong Liu, Yulong Luo, and Guangming Tan
GPU Computing to Speed-Up the Resolution of Microrheology Models
  Gloria Ortega, Antonio Puertas, Fco Javier de Las Nieves, and Ester Martin-Garzón
Applications of Parallel and Distributed Computing
Methodological Approach to Data-Centric Cloudification of Scientific Iterative Workflows
  Silvina Caíno-Lores, Andrei Lapin, Peter Kropf, and Jesús Carretero
Efficient Parallel Algorithm for Optimal DAG Structure Search on Parallel Computer with Torus Network
  Hirokazu Honda, Yoshinori Tamada, and Reiji Suda
Bin Recycling Strategy for an Accuracy-Aware Implementation of Two-Point Angular Correlation Function on GPU
  Miguel Cárdenas-Montes, Juan José Rodríguez-Vázquez, Miguel A. Vega-Rodríguez, Ignacio Sevilla Noarbe, and Antonio Gómez-Iglesias
An Efficient Implementation of LZW Compression in the FPGA
  Xin Zhou, Yasuaki Ito, and Koji Nakano
Shared Memory Tile-Based vs Hybrid Memory GOP-Based Parallel Algorithms for HEVC Encoder
  Héctor Migallón, Otoniel López-Granado, Vicente Galiano, Pablo Piñol, and Manuel P. Malumbres
GPU-Based Heterogeneous Coding Architecture for HEVC
  Gabriel Cebrián-Márquez, Héctor Migallón, José Luis Martínez, Otoniel López-Granado, Pablo Piñol, and Pedro Cuenca
Optimizing GPU Code for CPU Execution Using OpenCL and Vectorization: A Case Study on Image Coding
  Pedro M.M. Pereira, Patricio Domingues, Nuno M.M. Rodrigues, Gabriel Falcao, and Sergio M.M. de Faria
Improving the Performance of Cardiac Simulations in a Multi-GPU Architecture Using a Coalesced Data and Kernel Scheme
  Raphael Pereira Cordeiro, Rafael Sachetto Oliveira, Rodrigo Weber dos Santos, and Marcelo Lobosco
Service Dependability and Security in Distributed and Parallel Systems
Dynamic Verifiable Search Over Encrypted Data in Untrusted Clouds
  Xiaohong Nie, Qin Liu, Xuhui Liu, Tao Peng, and Yapin Lin
Reducing TCB of Linux Kernel Using User-Space Device Driver
  Weizhong Qiang, Kang Zhang, and Hai Jin
OBC Based Optimization of Re-encryption for Cryptographic Cloud Storage
  Huidong Qiao, Jiangchun Ren, Zhiying Wang, Haihe Ba, Huaizhe Zhou, and Tie Hong
Performance Modeling and Evaluation
Modeling Performance of Hadoop Applications: A Journey from Queueing Networks to Stochastic Well Formed Nets
  Danilo Ardagna, Simona Bernardi, Eugenio Gianniti, Soroush Karimian Aliabadi, Diego Perez-Palacin, and José Ignacio Requeno
D-SPACE4Cloud: A Design Tool for Big Data Applications
  Michele Ciavotta, Eugenio Gianniti, and Danilo Ardagna
Porting MATLAB Applications to High-Performance C++ Codes: CPU/GPU-Accelerated Spherical Deconvolution of Diffusion MRI Data
  Javier Garcia Blas, Manuel F. Dolz, J. Daniel Garcia, Jesus Carretero, Alessandro Daducci, Yasser Aleman, and Erick Jorge Canales-Rodriguez
On Stochastic Performance and Cost-Aware Optimal Capacity Planning of Unreliable Infrastructure-as-a-Service Cloud
  Weiling Li, Lei Wu, Yunni Xia, Yuandou Wang, Kunyin Guo, Xin Luo, Mingwei Lin, and Wanbo Zheng
A Distributed Formal Model for the Analysis and Verification of Arbitration Protocols on MPSoCs Architecture
  Imen Ben Hafaiedh, Maroua Ben Slimane, and Riadh Robbana
Synthetic Traffic Model of the Graph500 Communications
  Pablo Fuentes, Enrique Vallejo, José Luis Bosque, Ramón Beivide, Andreea Anghel, Germán Rodríguez, Mitch Gusat, and Cyriel Minkenberg
Author Index
Parallel and Distributed Architectures
Intelligent SPARQL Endpoints: Optimizing Execution Performance by Automatic Query Relaxation and Queue Scheduling
Ana I. Torre-Bastida¹ (✉), Esther Villar-Rodriguez¹, Miren Nekane Bilbao², and Javier Del Ser¹,²,³
¹ TECNALIA. OPTIMA Unit, 48160 Derio, Spain {isabel.torre,esther.villar,javier.delser}@tecnalia.com
² University of the Basque Country UPV/EHU, 48013 Bilbao, Spain {nekane.bilbao,javier.delser}@ehu.eus
³ Basque Center for Applied Mathematics (BCAM), 48009 Bilbao, Spain
© Springer International Publishing AG 2016. J. Carretero et al. (Eds.): ICA3PP 2016, LNCS 10048, pp. 3–17, 2016. DOI: 10.1007/978-3-319-49583-5_1
Abstract. The Web of Data is widely considered one of the major global repositories, populated with countless interconnected and structured data, which prompts these linked datasets to grow continuously and sharply. In this context the so-called SPARQL Protocol and RDF Query Language is commonly used to retrieve and manage stored data by means of SPARQL endpoints, a query processing service especially designed to get access to these databases. Nevertheless, due to the large amount of data tackled by such endpoints and their structural complexity, these services usually suffer from severe performance issues, including inadmissible processing times. This work aims at overcoming this noted inefficiency by designing a distributed parallel system architecture that improves the performance of SPARQL endpoints by incorporating two functionalities: (1) a queuing system to avoid bottlenecks during the execution of SPARQL queries; and (2) an intelligent relaxation of the queries submitted to the endpoint at hand whenever the relaxation itself and the consequently lowered complexity of the query are beneficial for the overall performance of the system. To this end the system relies on a two-fold optimization criterion: the minimization of the query running time, as predicted by a supervised learning model; and the maximization of the quality of the results of the query, as quantified by a measure of similarity. These two conflicting optimization criteria are efficiently balanced by two bi-objective heuristic algorithms sequentially executed over groups of SPARQL queries. The approach is validated on a prototype and several experiments that evince the applicability of the proposed scheme.
Keywords: SPARQL · Query rewriting · Linked Open Data · Ontology management · Multiobjective optimization
1 Introduction and Motivation
It will soon be a decade since the so-called Linked Open Data (LOD) paradigm, along with several related projects and initiatives, became the main technology enabler for the expansion of the Semantic Web, whose raison d'être was an intrinsic information technologies revolution centered on enriching the published data and coping with the inherent inability of machines to understand websites [1]. Over the last decade the increasing adoption of LOD led to the development of a distributed mesh of globally interlinked knowledge capable of providing a pioneering method to traverse the web and interpret its contents: the Web of Data. This huge, distributed, diverse database is deployed on manifold domains and a wide range of subjects such as government, libraries, life science and media, among many others. It allows for the execution of exploratory and selective queries over an enormous set of updated, comprehensive and pertinent data. The prevalent semantic query language for these repositories is SPARQL, which provides a full set of query operations and functionalities. Notwithstanding, in order to fully unleash the Semantic Web potential, SPARQL users are forced to master the syntax of the SPARQL language. To this end the community has devoted considerable research effort towards deriving sophisticated yet friendly tools to help users properly exploit the vast amount of available data and achieve a satisfactory performance in terms of accuracy. Under this rationale, the systems and engines where SPARQL endpoints are deployed have become the primary target for allocating specialized resources and intelligent software procedures to enhance the quality of service, which is commonly jeopardized and called into question due to significant delays, especially when dealing with large datasets [2].
The contribution of this research work revolves around three main axes to improve the performance of SPARQL systems: the performance prediction of SPARQL queries prior to their processing, their relaxation, and the planning of run queues in processing engines. In the field of performance prediction there is a large body of work on the SQL query language for relational databases, which has traditionally revolved around statistical or heuristic cost estimation. Regarding the prediction of the query execution time, supervised learning models have positioned themselves as the off-the-shelf estimators in recent years (see e.g. [3, 4]). To the best of our knowledge there are very few studies that extrapolate this knowledge acquired with relational databases to the LOD repositories. The main difference between these two areas resides in the absence of a schematic structure in the RDF standard, as well as in the shortage of statistics on the datasets composing the LOD environment. Justifiably, the current generation of SPARQL query cost estimation approaches that are inspired by those derived for relational databases has been proven to be inadequate for the task of performance prediction. This is the rationale for the brand new direction started in [3] and subsequently followed in [4, 5] that resorts to machine learning techniques to extract SPARQL performance metrics from already executed queries. Despite the good predictive scores reported in these references (with the latest work in [5] scoring an average $R^2$ of 0.94 with Support Vector Machines), we will show throughout this manuscript that there is still room for improvement in terms of the learning model and the set of features.
Concerning the second aspect that can be leveraged so as to improve the performance of endpoints, the optimization of SPARQL queries has hitherto mainly focused on rewriting the query at hand based on different objectives, such as the minimization of the execution time or the reduction of its structural complexity. We classify these studies into three categories depending on the utilized optimization technique: cost-based [6–8], heuristics-based [9–11] and machine learning techniques [12, 13]. Cost-based schemes suffer from the aforementioned low availability of statistics in the LOD. Heuristic approaches assume that structurally simple queries are in general less expensive, but this is not the case in SPARQL due to the inference and variant extensional information contained in a SPARQL arrangement. The work by Bicer et al. in [12] introduces the concept of Relational Kernel Machines, which simplify the problem of extracting features from the complex structure of semantic data and hence improve on naïve approximations based on Support Vector Machines. Likewise, in [13] long-running queries (detected by predicting their computational costs) are relaxed by applying a Genetic Algorithm (GA) based rewriting approach so as to yield a faster rewritten query. In our work we will take a step further so as to consider, in the determination of the query relaxation policy, the inherent Pareto trade-off between the quality of the results returned by the query and the relative running time with respect to its original version. This Pareto-optimal balance between both objectives will be shown to be tractable via evolutionary multi-objective heuristics.
Finally, the third axis refers to the scheduling of run queues to organize and coordinate query executions, around which our literature survey has identified a single, recent yet relevant contribution for SPARQL endpoints [14]. In that work the authors explain that guaranteeing a consistently good quality of service in SPARQL endpoints is a difficult task to accomplish, for which the use of a scheduler is proposed to optimally manage the execution of queries in SPARQL endpoints. We go one step beyond the simple schedulers explored in this reference by proposing a novel approach in which we optimize the scheduling criterion based on the previously mentioned SPARQL relaxation policies.
Our software system blends together the three aspects discussed above to improve the runtime performance of a SPARQL endpoint. The problem is that many of the queries processed by such systems cannot be executed within a reasonable time for the user. To address this issue a bi-objective algorithm is designed to obtain the optimal set of relaxation rules on the dataset at hand without disregarding the quality of the query result. By applying such a Pareto-optimal set of relaxation rules the execution time of the queries is reduced while keeping the quality degradation of their results to a minimum. Such rule sets can be further exploited by implementing a set of processing queues in the SPARQL endpoint, so that the optimization algorithm determines the adequate set of relaxation rules, the allocation of queries over the pool of processing queues and the execution order of the queries assigned to every queue. In summary, the main goal of this paper is the design of a software system capable of enhancing the performance of a SPARQL endpoint by combining optimized run queues, adequate query relaxation policies and SPARQL query runtime predictions. Schematically the novel technical ingredients of this research work are as follows:
1. The derivation of new predictive features for the design of a runtime estimator for SPARQL queries, which can be divided into query language algebra features and vocabulary features defining the terms of the query.
2. The design and implementation of a system based on run queues to improve the performance of SPARQL endpoints, which to our knowledge is the first one proposed in the literature.
3. A query relaxation optimization algorithm guided by two objectives: the maximization of the query quality (quantified in terms of similarity) and the minimization of the runtime of the query.
4. The use of parallelizable evolutionary meta-heuristic solvers for the performance improvement of SPARQL endpoints in the particular aspects of query relaxation and run queue scheduling mentioned previously.
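The two objectives named in item 3 (shorter runtime, higher result quality) conflict, so no single relaxation policy is best on both. As a minimal sketch of what a Pareto-optimal set means in this setting, the Python snippet below filters non-dominated candidates. The `Candidate` type, the policy names and all numeric values are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's own solver is an evolutionary meta-heuristic rather than this exhaustive filter.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical relaxation policy with its two objective values."""
    name: str
    runtime: float   # predicted running time, lower is better
    quality: float   # similarity to the original results, higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and better on one."""
    no_worse = a.runtime <= b.runtime and a.quality >= b.quality
    strictly_better = a.runtime < b.runtime or a.quality > b.quality
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Illustrative values only (not measurements from the paper).
candidates = [
    Candidate("no relaxation", 10.0, 1.00),
    Candidate("drop one filter", 4.0, 0.90),
    Candidate("drop optional", 6.0, 0.85),    # slower and worse than the above
    Candidate("drop two triples", 1.0, 0.40),
]
front = pareto_front(candidates)
print([c.name for c in front])
```

The quadratic scan is adequate for a handful of policies; for large candidate sets an evolutionary multi-objective solver, as the text proposes, avoids enumerating every combination.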
The rest of the manuscript is structured as follows: Sect. 2 overviews the general architecture of the proposed system and formulates the optimization problem that mathematically defines its operation. Subsects. 2.1 and 2.2 delve into the design of estimators for the query running time and quality on which the aforementioned optimization problem is based. Next, Sect. 3 elaborates on the meta-heuristic optimization algorithm designed to efficiently implement the proposed system, including relevant aspects such as the solution encoding and the design of the operators. Section 4 reports on the experimental evaluation of the proposed scheme and conclusions are drawn in Sect. 5.
2 Architecture Overview and Problem Formulation
In this section we briefly introduce key concepts and notation used throughout the rest of the paper. SPARQL is the standard query language for RDF. Let $I$ be the set of all IRIs (Internationalized Resource Identifiers), $L$ be the set of RDF literals, and $B$ be the set of RDF blank nodes. These three infinite sets are pairwise disjoint. An RDF triple is a tuple $(s, p, o) \in (I \cup B) \times I \times (I \cup B \cup L)$; $s$ is called the subject, $p$ is the predicate, and $o$ stands for the object of the triple, respectively. An RDF graph is a finite set of triples. For the purpose of this paper, a dataset $D$ is an RDF graph. Given a dataset $D$, we refer to the set $\mathrm{voc}(D) \subseteq (I \cup L)$ of IRIs and literals occurring in $D$ as the vocabulary of $D$. We use the words term or resource to refer to elements in $I \cup L$.
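The definitions above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the class names `IRI`, `Literal` and `BlankNode`, the helper `make_triple` and the example identifiers are hypothetical and do not come from the paper or from any RDF library.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple, Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


Term = Union[IRI, Literal, BlankNode]
Triple = Tuple[Term, Term, Term]


def make_triple(s: Term, p: Term, o: Term) -> Triple:
    """Build (s, p, o), enforcing (I u B) x I x (I u B u L)."""
    if not isinstance(s, (IRI, BlankNode)):
        raise TypeError("subject must be an IRI or a blank node")
    if not isinstance(p, IRI):
        raise TypeError("predicate must be an IRI")
    if not isinstance(o, (IRI, BlankNode, Literal)):
        raise TypeError("object must be an IRI, blank node or literal")
    return (s, p, o)


def voc(dataset: Iterable[Triple]) -> Set[Term]:
    """Vocabulary of a dataset: the IRIs and literals that occur in it."""
    return {t for triple in dataset for t in triple
            if isinstance(t, (IRI, Literal))}


# A tiny dataset D (an RDF graph, i.e. a finite set of triples).
dataset = {
    make_triple(IRI("ex:alice"), IRI("foaf:name"), Literal("Alice")),
    make_triple(BlankNode("b0"), IRI("foaf:knows"), IRI("ex:alice")),
}
vocabulary = voc(dataset)
```

Blank nodes are excluded from `voc(dataset)` by construction, which mirrors the requirement that the vocabulary is a subset of $I \cup L$.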
The core of a SPARQL query is a basic graph pattern, which is used to match an RDF graph in order to search for the required answers. A triple pattern is a triple, without blank nodes, where a variable may occur in any place of the triple. A graph pattern is an expression recursively defined as follows: (1) a triple pattern is a graph pattern; (2) if $P_1$, $P_2$ are graph patterns, then ($P_1$ and $P_2$), ($P_1$ union $P_2$), and ($P_1$ opt $P_2$) are graph patterns; and (3) if $P$ is a graph pattern and $C$ is a SPARQL constraint, then ($P$ filter $C$) is a graph pattern. With these definitions in mind, a query is defined by $Q = (D, \delta)$ where $D$ is the dataset to be used during the pattern matching and $\delta$ is the graph pattern of the query.
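The recursive definition of a graph pattern maps naturally onto a small algebraic data type. The sketch below is a simplified illustration: all class names are hypothetical, constant terms and constraints are represented as plain strings, and no pattern matching against a dataset is implemented.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Union


@dataclass(frozen=True)
class Var:
    name: str


# In this sketch a position holds either a constant term (a string) or a Var.
Slot = Union[str, Var]


@dataclass(frozen=True)
class TriplePattern:          # rule (1): a triple pattern is a graph pattern
    s: Slot
    p: Slot
    o: Slot


@dataclass(frozen=True)
class And:                    # rule (2): (P1 and P2)
    left: "GraphPattern"
    right: "GraphPattern"


@dataclass(frozen=True)
class UnionOf:                # rule (2): (P1 union P2)
    left: "GraphPattern"
    right: "GraphPattern"


@dataclass(frozen=True)
class Opt:                    # rule (2): (P1 opt P2)
    left: "GraphPattern"
    right: "GraphPattern"


@dataclass(frozen=True)
class Filter:                 # rule (3): (P filter C)
    pattern: "GraphPattern"
    constraint: str


GraphPattern = Union[TriplePattern, And, UnionOf, Opt, Filter]


@dataclass(frozen=True)
class Query:                  # Q = (D, delta)
    dataset: FrozenSet
    pattern: GraphPattern


def triple_patterns(p: GraphPattern) -> List[TriplePattern]:
    """Collect the triple patterns of a graph pattern, left to right."""
    if isinstance(p, TriplePattern):
        return [p]
    if isinstance(p, Filter):
        return triple_patterns(p.pattern)
    return triple_patterns(p.left) + triple_patterns(p.right)


t1 = TriplePattern(Var("x"), "foaf:name", Var("n"))
t2 = TriplePattern(Var("x"), "foaf:mbox", Var("m"))
delta = Filter(Opt(t1, t2), "regex(?n, 'A')")
query = Query(dataset=frozenset(), pattern=delta)
```

A traversal such as `triple_patterns` is the kind of structural walk that a relaxation operator or a feature extractor would need, since both act on individual triples, filters or optional clauses inside $\delta$.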
6 A. I. Torre-Bastida et al.
Figure 1 shows an overview of the proposed system, which is conceived as an intermediate manager between the users submitting their queries and the pool of parallel processing queues that compose the SPARQL endpoint. Several modules can be found in this diagram. First, it is important to remark that the relaxation policies and the mapping to processing queues are optimized at the level of previously clustered query groups, so that queries within the same cluster undergo the same relaxation rules and are assigned to the same processing queue. This cluster analysis module is based on the methodology presented in [15], which follows these steps: data generation mimicking an input data source, query log mining, clustering and SPARQL feature analysis. As a result, P query sets $\{Q_p\}_{p=1}^{P} = \{\{Q_p^n\}_{n=1}^{N_p}\}_{p=1}^{P}$ (clusters) are produced, with $Q_p^n = (D, \delta_p^n)$.

[Figure 1: block diagram with off-line and on-line stages, comprising cluster analysis, estimation of T(·) and P(·), Pareto estimation, scheduling optimization, cluster mapping, the relaxation module, the scheduler module, the processing queues and the SPARQL database.]

Fig. 1. Overview of the proposed architecture assuming P = 3 query profiles $\{Q_p\}_{p=1}^{3}$ and Z = 3 processing queues at the endpoint. The lower part of the plot corresponds to the processing stages that are performed off-line based on a historic record of queries submitted to the endpoint, whereas the upper part illustrates the entire relaxation and scheduling procedure applied to a new incoming query submitted to the endpoint.

Prior to its on-line working regime, the SPARQL endpoint must decide the set of relaxation policies, the queue and the priority within the queue for each of such clusters. Let $f_r(Q)$ be the generic definition for a relaxation rule, drawn from an R-sized vocabulary $F = \{f_r(Q)\}_{r=1}^{R}$ of possible relaxation operators. It is important to note that $f_r(Q)$ may only impact on a certain triple (s, p, o) within Q or, instead, involve more terms within its expression. Three kinds of rules have been considered in the setup:

1. Deletion rules, which consist of eliminating a triple (s, p, o), filter, terms, union and/or optional clauses from the query.

Intelligent SPARQL Endpoints 7
2. Addition rules, which add a restrictive clause to the query, e.g. a limit operator.
3. Hierarchical rules, by which a term of the query is substituted by its descendant or ascendant in the ontological hierarchy of the queried dataset.
The complete list of possible rules $F$ is sorted by their estimated degree of degradation on the results of the relaxed query. Under this notation, $F_p \subseteq F$ will denote the sequence of relaxation operators that will be applied to the queries belonging to cluster $p$, whereas $Q_p^{n,\prime}$ will denote the relaxed version of query $Q_p^n$ after the application of the rules in $F_p$.
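To make the three kinds of rules and their successive application concrete, the following sketch models a query as a small record and each relaxation rule as a function from query to query. The `Query` fields, the rule names and the example terms (`ex:Sedan`, `ex:Car`) are hypothetical and chosen only for illustration; they do not reproduce the rule vocabulary used in the experiments.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Query:
    triples: Tuple[Triple, ...]
    filters: Tuple[str, ...] = ()
    limit: Optional[int] = None


Rule = Callable[[Query], Query]


def drop_filters(q: Query) -> Query:
    """Deletion rule: remove every filter clause."""
    return replace(q, filters=())


def add_limit(k: int) -> Rule:
    """Addition rule: add a limit clause, keeping any stricter existing limit."""
    def rule(q: Query) -> Query:
        return replace(q, limit=k if q.limit is None else min(q.limit, k))
    return rule


def generalize(term: str, ancestor: str) -> Rule:
    """Hierarchical rule: substitute a term by an ancestor in the hierarchy."""
    def rule(q: Query) -> Query:
        swap = lambda t: ancestor if t == term else t
        return replace(q, triples=tuple((swap(s), swap(p), swap(o)) for s, p, o in q.triples))
    return rule


def relax(q: Query, rules: Sequence[Rule]) -> Query:
    """Apply the rules of a policy in order, returning the relaxed query."""
    for rule in rules:
        q = rule(q)
    return q


original = Query(
    triples=(("?x", "rdf:type", "ex:Sedan"), ("?x", "ex:price", "?c")),
    filters=("?c < 20000",),
)
policy = [drop_filters, add_limit(100), generalize("ex:Sedan", "ex:Car")]
relaxed = relax(original, policy)
```

Keeping `Query` immutable means the original query survives relaxation unchanged, which is convenient when both versions are needed to compare running time and quality.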
The determination of $\{F_p\}_{p=1}^{P}$ will be done under a twofold criterion: we seek to optimally balance the impact of the relaxation policy on the average running time and quality of the results associated to the query; the more relaxed the query $Q_p^{n,\prime}$ is, the faster it will be executed at the endpoint, but the less precise the returned results will be with respect to the original, unrelaxed query $Q_p^n$. Such objectives will be represented by two functions $T(Q_p^{n,\prime}, Q_p^n)$ and $P(Q_p^{n,\prime}, Q_p^n)$, both $\in [0, 1]$, corresponding to the relative running time and quality of the relaxed query $Q_p^{n,\prime}$ w.r.t. $Q_p^n$. In mathematical terms, the relaxation module in Fig. 1 seeks, for each query profile $Q_p$, a group of policies $F_p^{*}$ composed by several relaxation rule sets $\{F_p^{*,m}\}_{m=1}^{M_r}$ such that

$$F_p^{*,m} = \underset{F_p \subseteq F}{\arg} \left\{ \min \frac{1}{N_p} \sum_{n=1}^{N_p} T(Q_p^{n,\prime}, Q_p^n),\ \max \frac{1}{N_p} \sum_{n=1}^{N_p} P(Q_p^{n,\prime}, Q_p^n) \right\}, \tag{1}$$

subject to $Q_p^{n,\prime}$ being the query resulting from the successive application of the relaxation rules $f \in F_p$ to $Q_p^n$. For each $m \in \{1, \ldots, M_r\}$ a different set of rules $F_p^{*,m}$ balances differently both fitness objectives when applied over the reference query profile $Q_p$. Subsects. 2.1 and 2.2 will elaborate on the estimation of the value for $T(Q_p^{n,\prime}, Q_p^n)$ and $P(Q_p^{n,\prime}, Q_p^n)$ prior to the execution of the query itself.

Once such Pareto estimations have been produced off-line for each query profile, the scheduler module exploits this information to determine (1) which processing queue should be assigned to an incoming query associated to a certain cluster $p \in \{1, \ldots, P\}$; (2) which relaxation policy should be applied to the query among those in $F_p^{*}$; and (3) the execution order of the queries (i.e. their priority) in the case several of them are assigned to the same queue. Without loss of generality, computing power differences between processing queues are assumed to yield factors $\{\tau_z\}_{z=1}^{Z}$ (with $\tau_z \in (0, 1]$ and $Z$ denoting the number of queues) such that the time taken by queue $z$ to process $Q_p^n$ is reduced by $100 \cdot \tau_z$ %. The queue allocation to be decided at this module will be denoted as a non-surjective, non-injective mapping function $\lambda : \{1, \ldots, P\} \to \{1, \ldots, Z\}$, such that $\lambda(p)$ will stand for the queue to which the queries associated to profile $p \in \{1, \ldots, P\}$ will be forwarded. Priorities within queue $z \in \{1, \ldots, Z\}$ will be denoted as a real-valued variable $\alpha_p \in \mathbb{R}$ such that if $\lambda(p) = \lambda(p')$ (i.e. profiles $p$ and $p'$ are assigned the same processing queue), the queries in profile $p$ will be executed first if $\alpha_p \le \alpha_{p'}$. Conversely, if $\alpha_p > \alpha_{p'}$, queries belonging to
cluster $p'$ will be granted a higher execution priority level than those in $p$. The criterion to determine the optimal mapping $\lambda^{\diamond}(p)$, relaxation policies $\{F_p^{\diamond}\}_{p=1}^{P}$ and priority factors $\alpha^{\diamond} = \{\alpha_p^{\diamond}\}_{p=1}^{P}$ at the scheduler module will again rely on the aforementioned time-quality Pareto trade-off, but incorporating a subtle yet relevant aspect: queries within the same processing queue interact in terms of their completion time, i.e. both the relative order of queries within a given queue and the different processing capabilities of the queues themselves are meaningful for the overall evaluation of the average execution time taken by the endpoint to process incoming queries. In other words, a vector of mapping functions $\lambda^{\diamond}(\cdot) \doteq \{\lambda^{\diamond,m}(\cdot)\}_{m=1}^{M_s}$, relaxation policies $F_p^{\diamond} \doteq \{F_p^{\diamond,m}\}_{m=1}^{M_s}$ and priority levels $A^{\diamond}(\cdot) \doteq \{\alpha^{\diamond,m}\}_{m=1}^{M_s} = \{\{\alpha_p^{\diamond,m}\}_{p=1}^{P}\}_{m=1}^{M_s}$ will balance the following Pareto:

\begin{align}
\lambda^{\diamond,m}(\cdot),\ \alpha^{\diamond,m},\ F_p^{\diamond,m} = \underset{\substack{\lambda \in \Lambda \\ \alpha \in \mathbb{R}^P \\ F_p \subseteq F}}{\arg} \Bigg\{ & \min \frac{1}{P} \sum_{p=1}^{P} \Bigg[ \frac{\tau_{\lambda(p)}}{N_p} \sum_{n=1}^{N_p} T(Q_p^{n,\prime}, Q_p^n) + \sum_{\substack{p'=1 \\ p' \neq p}}^{P} \frac{1}{N_{p'}} \sum_{\eta=1}^{N_{p'}} \tau_{\lambda(p')}\, T(Q_{p'}^{\eta,\prime}, Q_{p'}^{\eta})\, I(\alpha_{p'} \le \alpha_p)\, I(\lambda(p') = \lambda(p)) \Bigg], \tag{2} \\
& \max \frac{1}{P} \sum_{p=1}^{P} \frac{1}{N_p} \sum_{n=1}^{N_p} P(Q_p^{n,\prime}, Q_p^n) \Bigg\}, \tag{3}
\end{align}

where $I(\cdot)$ is an auxiliary indicator function taking value 1 if its argument is true and 0 otherwise; $\lambda(p) \in \{1, \ldots, Z\}$ denotes the index of the queue to which the queries in cluster $p$ are assigned; and $Q_p^{n,\prime}$ is the result of relaxing query $Q_p^n$ through policy $F_p$. In words, Expression (2) denotes the time taken by the queries $Q_p^n$ within cluster $p$, which depends not only on the assigned queue through $\tau_{\lambda(p)}$, but also on the average time taken by queries belonging to other clusters $p' \in \{1, \ldots, p-1, p+1, \ldots, P\}$ provided that they are assigned to the same processing queue and granted higher priority. Finally, Expression (3) poses the mean quality score averaged over all the considered query clusters.

Before proceeding with the algorithmic solution proposed to efficiently solve the above problems, it should be noted that in practice the relaxation and scheduling modules might be conceived and formulated as a single optimization problem driven by the objective functions in Expressions (2) and (3). However, by decoupling both modules a deeper understanding of the flexibility of the clusters with respect to the set of relaxation operators can be acquired, with further potential applications beyond the one addressed in this paper (e.g. optimizing a distributed deployment of the database at hand).

2.1 SPARQL Query Run-Time Prediction

As shown in Fig. 1 and argued above, an estimation of the running time required to complete a given relaxed query $Q'$ is needed when the proposed approach
operates in both off-line and on-line modes. Such an estimation must be produced without executing the query itself. Therefore, a supervised learning model is included in order to predict execution times of generic SPARQL queries based on a historic set of already executed queries. The adopted approach is similar to the one presented by Hassan et al. in [5], but with novel ingredients: the learning model itself and the set of features extracted from the expression of the SPARQL queries to build the training dataset. As such, this dataset consists of a set of previously executed queries and the observed performance metric values (execution times) for those queries in their native, unrelaxed form. The goal is to extract proper features from the syntax of the queries to construct a prediction model that provides us with an accurate estimation of the execution time that can be generalized to new, possibly relaxed query sets. The proposed features are classified as:
1. Algebra features, which represent the syntax of the SPARQL query, its operators and structural information. First we transform a query into an algebra expression tree, from which we extract the following features: number of basic graph patterns, filter operator presence, type of filter, limit operator presence, optional operator presence, distinct operator presence, number of projected variables, group operator presence, number of unions, number of joins and number of left joins.
2. Dataset vocabulary features, for which we use the dataset terms involved in the SPARQL query definition to extract intensional and semantic information about them. First we compute the overall set of terms, and with this bag of words we compute the TF-IDF frequency [16] as a quantitative score of the importance of the terms of the query (words) in the dataset (document) (Fig. 2).
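As a rough illustration of the two feature families, the sketch below extracts a handful of the algebra features and computes TF-IDF scores for a bag of terms. It rests on two simplifying assumptions that differ from the approach described above: keyword matching on the query string is used as a crude stand-in for the algebra expression tree (so keywords inside literals or IRIs would be miscounted), and each bag of terms passed in the corpus is treated as one document, since the delimitation of documents is not detailed here. All function names and the example query are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def algebra_features(query: str) -> Dict[str, int]:
    """Crude keyword-based approximation of a few algebra features."""
    q = query.upper()

    def count(keyword: str) -> int:
        return len(re.findall(r"\b" + keyword + r"\b", q))

    head = re.search(r"SELECT(.*?)WHERE", q, re.S)
    projected = set(re.findall(r"\?\w+", head.group(1))) if head else set()
    return {
        "has_filter": int(count("FILTER") > 0),
        "has_limit": int(count("LIMIT") > 0),
        "has_optional": int(count("OPTIONAL") > 0),
        "has_distinct": int(count("DISTINCT") > 0),
        "n_union": count("UNION"),
        "n_projected": len(projected),
    }


def tf_idf(doc: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """Score each term of `doc`: term frequency times log inverse document frequency."""
    counts = Counter(doc)
    scores = {}
    for term, c in counts.items():
        df = sum(1 for d in corpus if term in d)
        idf = math.log(len(corpus) / df) if df else 0.0
        scores[term] = (c / len(doc)) * idf
    return scores


example = """SELECT DISTINCT ?x ?c WHERE {
  { ?x a ex:Sedan } UNION { ?x a ex:Coupe }
  OPTIONAL { ?x ex:price ?c }
  FILTER(?c < 20000)
} LIMIT 10"""
features = algebra_features(example)

corpus = [["ex:Car", "ex:price"], ["ex:Car", "ex:color"], ["ex:Person", "ex:name"]]
scores = tf_idf(corpus[0], corpus)
```

In this toy corpus the term `ex:price` scores higher than `ex:Car`, because it occurs in fewer bags; concatenating both kinds of values yields one feature vector per query.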
Regarding the supervised learning model, we opt for a so-called Random Forest Classifier [17], a widely utilized ensemble model characterized by its good generalization properties and low tendency to overfit. In short, Random Forests exploit the principle of bagging by randomly splitting the data into chunks, selecting a feature subset and training a weak learner (tree) on each of them, from where the predicted output is given by voting (classification) or averaging (prediction) the individual outputs of the aforementioned weak models.
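The bagging principle can be sketched in plain Python. This is a deliberately reduced illustration and not the model used in the paper: the weak learners are one-split regression stumps rather than full trees, the data chunks are drawn as bootstrap samples (sampling with replacement, which is an assumption about how the random chunks are formed), and the feature values and running times are made up.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature, threshold, left mean, right mean)


def fit_stump(X: List[Sequence[float]], y: List[float], features: Sequence[int]) -> Stump:
    """Pick the single split among `features` with the lowest squared error."""
    best = None
    for j in features:
        for thr in sorted({row[j] for row in X}):
            left = [t for row, t in zip(X, y) if row[j] <= thr]
            right = [t for row, t in zip(X, y) if row[j] > thr]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            err = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or err < best[0]:
                best = (err, j, thr, ml, mr)
    if best is None:  # no valid split in this sample: fall back to a constant
        m = mean(y)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump: Stump, row: Sequence[float]) -> float:
    j, thr, ml, mr = stump
    return ml if row[j] <= thr else mr


def fit_forest(X, y, n_trees: int = 25, n_features: int = 1, seed: int = 0) -> List[Stump]:
    """Train each weak learner on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        feats = rng.sample(range(d), n_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest: List[Stump], row: Sequence[float]) -> float:
    """Average the individual outputs of the weak learners."""
    return mean(predict_stump(s, row) for s in forest)


# Made-up training data: [number of triple patterns, filter present] -> running time.
X = [[1, 0], [2, 0], [3, 1], [5, 1], [6, 1], [8, 0]]
y = [0.1, 0.2, 0.5, 0.9, 1.1, 1.4]
forest = fit_forest(X, y)
estimate = predict_forest(forest, [7, 1])
```

For classification the final averaging step would be replaced by a majority vote over the outputs of the weak learners.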
Fig. 2. SPARQL query features vector.