Lecture Notes in Artificial Intelligence 10997
Subseries of Lecture Notes in Computer Science

LNAI Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Yuzuru Tanaka, Hokkaido University, Sapporo, Japan
Wolfgang Wahlster, DFKI and Saarland University, Saarbrücken, Germany

LNAI Founding Series Editor
Joerg Siekmann, DFKI and Saarland University, Saarbrücken, Germany

More information about this series at http://www.springer.com/series/1244
Dietmar Seipel • Michael Hanus • Salvador Abreu (Eds.)

Declarative Programming and Knowledge Management
Conference on Declarative Programming, DECLARE 2017
Unifying INAP, WFLP, and WLP
Würzburg, Germany, September 19–22, 2017
Revised Selected Papers
Editors

Dietmar Seipel
Universität Würzburg
Würzburg
Germany

Michael Hanus
Christian-Albrechts-Universität zu Kiel
Kiel
Germany

Salvador Abreu
Universidade de Évora
Évora
Portugal
ISSN 0302-9743
ISSN 1611-3349 (electronic)
Lecture Notes in Artificial Intelligence
ISBN 978-3-030-00800-0
ISBN 978-3-030-00801-7 (eBook)
https://doi.org/10.1007/978-3-030-00801-7
Library of Congress Control Number: 2018954670
LNCS Sublibrary: SL7 – Artificial Intelligence
© Springer Nature Switzerland AG 2018
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This volume contains a selection of the papers presented at the International Conference on Declarative Programming, Declare 2017. The joint conference was held in Würzburg, Germany, during September 19–22, 2017. It consisted of the 21st International Conference on Applications of Declarative Programming and Knowledge Management (INAP), the 31st Workshop on Logic Programming (WLP), and the 25th Workshop on Functional and (Constraint) Logic Programming (WFLP), and it was accompanied by a one-week summer school on Advanced Concepts for Databases and Logic Programming for students and PhD students.
Declarative programming is an advanced paradigm for modeling and solving complex problems, which has attracted increased attention over the last decades, e.g., in the domains of data and knowledge engineering, databases, artificial intelligence, natural language processing, modeling and processing combinatorial problems, and for establishing knowledge-based systems for the web. The conference Declare 2017 aimed to promote the cross-fertilizing exchange of ideas and experiences among researchers and students from the different communities interested in the foundations, applications, and combinations of high-level, declarative programming and related areas.
The INAP conferences provide a forum for intensive discussions of applications of important technologies around logic programming, constraint problem solving, and closely related advanced software. They comprehensively cover the impact of programmable logic solvers in the Internet society, its underlying technologies, and leading-edge applications in industry, commerce, government, and societal services. Previous INAP conferences have been held in Japan, Germany, Portugal, and Austria. The Workshops on Logic Programming (WLP) are the annual meeting of the German Society for Logic Programming (GLP e.V.). They bring together international researchers interested in logic programming, constraint programming, and related areas like databases and artificial intelligence. Previous WLP workshops have been held in Germany, Austria, Switzerland, and Egypt. The International Workshop on Functional and Logic Programming (WFLP) brings together researchers interested in functional programming, logic programming, as well as the integration of these paradigms. Previous WFLP editions have been held in Germany, France, Spain, Italy, Estonia, Brazil, Denmark, and Japan. The topics of the papers of this year's joint conference Declare concentrated on three currently important fields: constraint programming and solving, functional and logic programming, and declarative programming.
The declarative programming paradigm expresses the logic of a computation in an abstract way. Thus, the semantics of a declarative language becomes easier to grasp for domain experts. Declarative programming offers many advantages for data and knowledge engineering, such as security, safety, and shorter development time. During the last couple of years, a lot of research has been conducted on the usage of declarative systems in areas like answer set programming, reasoning, meta-programming, and deductive databases. Reasoning about knowledge wrapped in rules, databases, or the Semantic Web makes it possible to explore interesting hidden knowledge. Declarative techniques for the transformation, deduction, induction, visualization, or querying of knowledge have the advantage of high transparency and better maintainability compared to procedural approaches.
Many problems which occur in large industrial tasks are intractable, which rules out solving them exactly, or even with many approximate constructive algorithms. One approach which has made substantial progress over the last few years is constraint programming. Its declarative nature offers significant advantages from a software engineering standpoint, in the specification, implementation, and maintenance phases. Several interesting aspects are in discussion: how can this paradigm be improved or combined with known, classical methods; how can real-world situations be modelled as constraint problems; what strategies may be pursued to solve a problem once it has been specified; and what is the experience of applications in really large industrial planning, simulation, and optimisation tasks?
Another area of active research is the use of declarative programming languages, in particular, functional and logic languages, to implement more reliable software systems. The closeness of these languages to logical models provides new methods to test and verify programs. Combining different programming paradigms is beneficial from a software engineering point of view. Therefore, the extension of the logic programming paradigm and its integration with other programming concepts are active research branches. The successful extension of logic programming with constraints has already been mentioned. The integration of logic programming with other programming paradigms has been mainly investigated for the case of functional programming, so that types, modules, higher-order operators, or lazy evaluation can also be used in logic-oriented computations.
The three events INAP, WLP, and WFLP were jointly organized by the University of Würzburg and the Society for Logic Programming (GLP e.V.). We would like to thank all authors who submitted papers and all conference participants for the fruitful discussions. We are grateful to the members of the Program Committee and the external referees for their timely expertise in carefully reviewing the papers. We would like to express our thanks to the German Federal Ministry of Education and Research (BMBF) for funding the summer school on Advanced Concepts for Databases and Logic Programming (under 01PL16019) and to the University of Würzburg for hosting the conference in the new Central Lecture Building Z6 and for providing the Tuscany Hall in the Baroque-style Würzburg Residence Palace for a classical music concert in honor of Jack Minker, a pioneer in deductive databases and disjunctive logic programming and the long-time mentor of the first editor, who celebrated his 90th birthday in 2017.
July 2018

Dietmar Seipel
Michael Hanus
Salvador Abreu
Constraint Solving on Hybrid Systems
Pedro Roque (B) and Vasco Pedro

LISP, Universidade de Évora, Évora, Portugal
d11735@alunos.uevora.pt, vp@di.uevora.pt
Abstract. Applying parallelism to constraint solving seems a promising approach and it has been done with varying degrees of success. Early attempts to parallelize constraint propagation, which constitutes the core of traditional interleaved propagation and search constraint solving, were hindered by its essentially sequential nature. Recently, parallelization efforts have focussed mainly on the search part of constraint solving, as well as on local-search based solving. Lately, a particular source of parallelism has become pervasive, in the guise of GPUs, able to run thousands of parallel threads, and they have naturally drawn the attention of researchers in parallel constraint solving.
In this paper, we address challenges faced when using multiple devices for constraint solving, especially GPUs, such as deciding on the appropriate level of parallelism to employ, load balancing and inter-device communication, and present our current solutions.
Keywords: Constraint solving · Parallelism · GPU · Intel MIC · Hybrid systems
1 Introduction
Constraint Satisfaction Problems (CSPs) allow modeling problems like the Costas Array problem [6], and some real-life problems like planning and scheduling [2], resource allocation [7] and route definition [3].
CPU parallelism is already being used with success to speed up the solving process of harder CSPs [5, 16, 19, 21]. However, very few constraint solvers contemplate the use of GPUs. In fact, Jenkins et al. recently concluded that the execution model and the architecture of GPUs are not well suited to computations displaying irregular data access and code execution patterns such as backtracking search [10].
We are currently developing a constraint solver named Parallel Heterogeneous Architecture Toolkit (PHACT) that is already capable of achieving state-of-the-art performance on multi-core CPUs, and can also speed up the solving process by adding GPUs and processors like Intel Many Integrated Cores (MICs¹) to solve the problems.

¹ Intel MICs are coprocessors that combine many Intel processor cores onto a single chip with dedicated RAM.
© Springer Nature Switzerland AG 2018
D. Seipel et al. (Eds.): DECLARE 2017, LNAI 10997, pp. 3–19, 2018. https://doi.org/10.1007/978-3-030-00801-7_1
The next section introduces the main CSP concepts and Sect. 3 presents some related work. Section 4 describes the architecture of PHACT, and in Sect. 5 the results achieved with PHACT, when solving some CSPs on multiple combinations of devices and when compared with some state-of-the-art solvers, are displayed and discussed. Section 6 presents the conclusions and directions for future work.
2 CSP Concepts
A CSP can be briefly described as a set of variables with finite domains, and a set of constraints between the values of those variables. The solution of a CSP is the assignment of one value from the respective domain to each one of the variables, ensuring that all constraints are met [3].
For example, the Costas Array problem consists in placing n dots on an n × n matrix such that each row and column contains only one dot and all vectors between dots are distinct. It can be modeled as a CSP with n + n(n−1)/2 variables, n of which correspond to the dots, each one mapped to a different matrix column. The domain of these n variables is composed of the integers that correspond to the matrix rows where each dot may be placed. The remaining n(n−1)/2 variables constitute a difference triangle, whose rows cannot contain repeated values [6].
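To make the model concrete, one possible formulation of this CSP is sketched below; it is our own illustration of the description above, not necessarily the exact model used in [6] or in the solvers compared later.

X_i \in \{1,\dots,n\}, \quad i = 1,\dots,n
D_{d,i} = X_i - X_{i+d}, \quad d = 1,\dots,n-1, \; i = 1,\dots,n-d
\mathrm{allDifferent}(X_1,\dots,X_n)
\mathrm{allDifferent}(D_{d,1},\dots,D_{d,n-d}) \quad \text{for each } d

Here the n variables X_i give the row of the dot placed in column i, and the n(n−1)/2 variables D_{d,i} form the difference triangle; requiring each row d of the triangle to contain no repeated values is what makes all vectors between dots distinct.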
The methods for solving CSPs can be categorized as incomplete or complete. Incomplete solvers do not guarantee that an existing solution will be found, being mostly used for optimization problems and for large problems that would take too much time to fully explore. Incomplete search is beyond the scope of this paper and will not be discussed here. On the contrary, complete methods, such as the one implemented in PHACT, guarantee that if a solution exists, it will be found.
3 Related Work
Searching for CSP solutions in a backtracking approach can be represented in the form of a search tree. To take advantage of parallelism, this search tree may be split into multiple subtrees and each one of them explored in a different thread that may be running on a different core, device or machine. This is the approach generally found in parallel constraint solvers, which run on single or distributed multi-core CPUs [5, 16, 19, 21].
Pedro developed a CSP solver named Parallel Complete Constraint Solver (PaCCS) capable of running from a single core CPU to multiple multi-core CPUs in a distributed system [16]. Distributing the work among the threads through work stealing techniques and using the Message Passing Interface (MPI) to allow communication between machines, this solver achieved almost linear speedups for most of the problems tested, when using machines with up to 16 CPU cores.
Régin et al. implemented Embarrassingly Parallel Search, featuring an interface responsible for decomposing an initial problem into multiple sub-problems, filtering out those found to be inconsistent [20]. After generating the sub-problems, it creates multiple threads, each one corresponding to an execution of a solver (e.g., Gecode [22]), to which a sub-problem is sent at a time for exploration.
For some optimization and search problems, where the full search space is explored, these authors achieved average gains of 13.8 and 7.7 against a sequential version, when using Gecode through their interface or just Gecode, respectively [20]. In their trials, the best results were achieved when decomposing the initial problem into 30 sub-problems per thread and running 40 threads on a machine with 40 CPU cores.
While solving CSPs through parallelization has been a subject of research for decades, the usage of GPUs for that purpose is a recent area, and as such there are not many published reports of related work. To our knowledge, there are only two published papers related with constraint solving on GPUs [1, 4]. Of these two, only Campeotto et al. presented a complete solver [4].
Campeotto et al. developed a CSP solver with Nvidia's Compute Unified Device Architecture (CUDA), capable of simultaneously using a CPU and an Nvidia GPU to solve CSPs [4]. On the GPU, this solver implements an approach different from the one mentioned before, namely, instead of splitting the search tree over multiple threads, it splits each constraint propagation over multiple threads. Constraints relating many variables are propagated on the GPU, while the remaining constraints are filtered sequentially by the CPU. On the GPU, the propagation and consistency check for each constraint are assigned to one or more blocks of threads according to the number of variables involved. The domain of each variable is filtered by a different thread.
Campeotto et al. reduced the data transfer to a minimum by transferring to the GPU only the domains of the variables that were not labeled yet and the events generated during the last propagation. Events identify the changes that happened to a domain, like becoming a singleton or having a new maximum value, which allows deciding on the appropriate propagator to apply.
Campeotto et al. obtained speedups of up to 6.61, with problems like the Langford problem and some real problems such as the modified Renault problem [4], when comparing a sequential execution on a CPU with the hybrid CPU/GPU version.
4 Solver Architecture
PHACT is a complete solver, capable of finding a solution for a CSP if one exists. It is meant to be able to use all the (parallel) processing power of the devices available on a system, such as CPUs, GPUs and MICs, to speed up solving constraint problems.
The solver is composed of a master process which collects information about the devices that are available on the machine, such as the number of cores and the type of device (CPU, GPU or MIC), and calculates the number of sub-search spaces that will be created to distribute among those devices. For each device there will be one thread (communicator) responsible for communicating with that device, and inside each device there will be a range of threads (search engines) that will perform labeling, constraint propagation and backtracking on one sub-search space at a time. The number of search engines that will be created inside each device will depend on the number of cores and type of that device, and may vary from 8 on a quad-core CPU to more than 100,000 on a GPU.
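This organization can be pictured with a short host-side sketch in C with POSIX threads. The structure, names and values below are our own illustration of the description above, not PHACT's actual code: a master sets up one communicator thread per device, and each communicator feeds its device blocks of sub-search spaces.

#include <pthread.h>
#include <stdio.h>

typedef struct {
    int  id;             /* device index                            */
    int  is_gpu;         /* device type flag (CPU/GPU/MIC)          */
    int  compute_units;  /* reported number of compute units        */
    long block_size;     /* sub-search spaces in the current block  */
} device_t;

/* A communicator launches the search-engine kernel on its device,
   waits for the block to be explored and asks for more work. */
static void *communicator(void *arg) {
    device_t *dev = (device_t *)arg;
    /* ... enqueue kernel, wait, collect results, request next block ... */
    printf("device %d finished a block of %ld sub-search spaces\n",
           dev->id, dev->block_size);
    return NULL;
}

int main(void) {
    device_t devs[2] = { { 0, 0, 8, 64 }, { 1, 1, 16, 4096 } };  /* example values */
    pthread_t th[2];

    for (int i = 0; i < 2; i++)                 /* one communicator per device */
        pthread_create(&th[i], NULL, communicator, &devs[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(th[i], NULL);
    return 0;
}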
PHACT may be used to count all the solutions of a given CSP, to find just one solution, or to find a best one (for optimization problems).
Framework
PHACT is implemented in C and OpenCL [13], which allows it to run on multiple types of devices from different vendors, both on Linux and on Microsoft Windows.
We present some OpenCL concepts that help in understanding PHACT's architecture:
– Compute unit. One or more processing elements and their local memory. In Nvidia GPUs each Streaming Multiprocessor (SM) is a compute unit. AMD GPUs have their own components called Compute Units that match this definition. For CPUs and MICs, the number of available compute units is normally equal to or higher than the number of threads that the device can execute simultaneously [13];
– Kernel. The code that will be executed on the devices;
– Work-item. An instance of the kernel (thread);
– Work-group. Composed of one or more work-items that will be executed on the same compute unit, in parallel. All work-groups for one kernel on one device have the same number of work-items;
– Host. CPU where the application responsible for managing the execution of the kernels is run;
– Device. A device where the kernels are executed (CPU, GPU, MIC).
In the implementation described here, the master process and the threads responsible for communicating with the devices run on the OpenCL host, and the search engines run on the devices. The OpenCL host may also constitute a device, in which case it will be simultaneously controlling and communicating with the devices and running search engines. Each search engine corresponds to a work-item, and all work-items execute the same kernel code, which implements the search engine.
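As an illustration of the host side, a minimal OpenCL snippet for the device discovery step performed by the master process might look as follows. This is a sketch assuming a single platform, with error handling omitted; it only uses standard OpenCL 1.x calls and is not taken from PHACT's sources.

#include <CL/cl.h>
#include <stdio.h>

int main(void) {
    cl_platform_id platform;
    cl_device_id devices[8];
    cl_uint ndev = 0;

    clGetPlatformIDs(1, &platform, NULL);                  /* first platform only */
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 8, devices, &ndev);

    for (cl_uint i = 0; i < ndev; i++) {
        cl_uint cu = 0;
        cl_device_type type = 0;
        clGetDeviceInfo(devices[i], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(cu), &cu, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_TYPE, sizeof(type), &type, NULL);
        printf("device %u: %s, %u compute units\n", i,
               (type & CL_DEVICE_TYPE_GPU) ? "GPU" : "CPU/accelerator", cu);
    }
    return 0;
}

From this kind of information the master decides how many search engines (work-items) to create on each device and how to size the first blocks of sub-search spaces.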
Search Space Splitting and Work Distribution
For distributing the work between the devices, PHACT splits the search space into multiple sub-search spaces. Search-space splitting is effected by partitioning the domains of one or more of the variables of the problem, so that the resulting sub-search spaces partition the full search space. The number and the size of the sub-search spaces thus created depend on the number of work-items which will be used, and may go up to a few million.
Example 1 shows the result of splitting the search space of a CSP with three variables, V1, V2 and V3, all with domain {1, 2}, into 4 sub-search spaces, SS1, SS2, SS3 and SS4.
Example 1. Creation of 4 sub-search spaces
SS1: V1 = 1, V2 = 1, V3 ∈ {1, 2}
SS2: V1 = 1, V2 = 2, V3 ∈ {1, 2}
SS3: V1 = 2, V2 = 1, V3 ∈ {1, 2}
SS4: V1 = 2, V2 = 2, V3 ∈ {1, 2}
Since each device will have multiple search engines running in parallel, the computed partition is organized into blocks of contiguous sub-search spaces that will be handled by each device, one at a time. The number of sub-search spaces that will compose each block will vary along the solving process and depends on the performance of each device on solving the current problem.
The communicator threads running on the host launch the execution of the search engines on the devices, hand each device one block of sub-search spaces to explore, and coordinate the progress of the solving process as each device finishes exploring its assigned block. The coordination of the devices consists in assessing the state of the search, distributing more blocks to the devices, signaling to all the devices that they should stop (when a solution has been found and only one is wanted), or updating the current bound (in optimization problems).
Load Balancing
An essential aspect to consider when parallelizing some task is the balancing of the work between the parallel components. Creating sub-search spaces with balanced domains, when possible, is no guarantee that the amount of work involved in exploring each of them is even similar. To compound the issue, we are dealing with devices with differing characteristics and varying speeds, making it even harder to statically determine an optimal, or even good, work distribution.
Achieving effective load balancing between devices with such different architectures as CPUs and GPUs is a complex task [10]. When trying to implement dynamic load balancing, two important OpenCL limitations arise: while a device is executing a kernel it cannot communicate with other devices [8], and the execution of a kernel cannot be paused or stopped. Hence, techniques like work stealing [5, 17], which require communication between threads, will not work with kernels that run independently on different devices, and load balancing must be done on the host side.
To better manage the distribution of work, the host could reduce the amount of work it sends to the devices each time, by reducing the number of sub-search spaces in each block. This would make the devices synchronize more frequently on the host and allow for a finer control over the behavior of the solver. When working with GPUs, though, the number and the size of data transfers between the devices and the host should be as small as possible, because these are very time-consuming operations. So, a balance must be struck between the workload of the devices and the amount of communication needed.
PHACT implements a dynamic load balancing technique which adjusts the size of the blocks of sub-search spaces to the performance of each device solving the current problem, when compared to the performance of the other devices.
Initially each device d explores two small blocks of sub-search spaces to get the average time, avg(d), it needs to explore one sub-search space. The size of those blocks may be distinct among devices, as it is calculated according to the number of threads that each device is capable of running simultaneously and its clock speed. When two or more devices finish exploring those first two blocks, their rank, rank(d), is calculated according to Eq. (1), where m is the total number of devices.
The rank of a device consists of a value between 0 and 1, corresponding to the relative speed of the device against all the devices that were used for solving a block of sub-search spaces. Faster devices will get a higher rank than slower devices, and the sum of the ranks of all the devices will be 1. The rank is then used to calculate the size of the next block of sub-search spaces to send to the device, by multiplying its value by the number of sub-search spaces that are yet to be explored.
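The body of Eq. (1) did not survive extraction; a formula consistent with the properties just stated (ranks between 0 and 1, summing to 1, higher for faster devices) would be

rank(d) = \frac{1/avg(d)}{\sum_{i=1}^{m} 1/avg(i)} \qquad (1)

which should be read as a plausible reconstruction from the surrounding description, not necessarily the exact expression used in PHACT.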
Since the size of the first two blocks of sub-search spaces explored by each device is small, to prevent slow devices from dominating the solving process, it often only allows for a rough approximation of the speed of a device. So, in the beginning, only 1/3 of the remaining sub-search spaces are considered when computing the size of the next block to send to a device.
For the first device to finish its first two blocks, it will not be possible to calculate its rank yet, as that would require the average time of at least one more device. In this case, that device will get a new block with twice the size of the previous ones, as it is probably the fastest device solving the current problem.
As search progresses, every time a device finishes exploring another block, its average time and rank are updated. The value of the average time of a device is the result of dividing the total time that the device spent exploring sub-search spaces by the total number of sub-search spaces that it has already explored.
As the rank value stabilizes, the size of the new block of sub-search spaces for the device will be the corresponding percentage of all unexplored sub-search spaces. Table 1 exemplifies the calculation of the number of sub-search spaces that will compose the block sent to each device as soon as it finishes its previous block. This is repeated until a device waiting for work is estimated to need less than one second² to solve all the remaining sub-search spaces, in which case it will be assigned all of them.
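The sizing rule just described can be summarized by a small sketch in C. The names, the rank formula (taken from the reconstruction of Eq. (1) above) and the early-phase damping are our reading of the text, not code taken from PHACT.

#include <stddef.h>

typedef struct {
    double explore_time;   /* total time spent exploring (s)      */
    long   explored;       /* sub-search spaces already explored  */
} dev_stats_t;

/* Average time per sub-search space for one device. */
static double avg_time(const dev_stats_t *d) {
    return d->explore_time / (double)d->explored;
}

/* rank(d): relative speed of device d among the m devices (sums to 1). */
static double rank_of(const dev_stats_t *devs, size_t m, size_t d) {
    double sum = 0.0;
    for (size_t i = 0; i < m; i++)
        sum += 1.0 / avg_time(&devs[i]);
    return (1.0 / avg_time(&devs[d])) / sum;
}

/* Size of the next block: the device's share of the unexplored sub-search
   spaces, damped to 1/3 while the rank is still a rough estimate. */
static long next_block(const dev_stats_t *devs, size_t m, size_t d,
                       long remaining, int early_phase) {
    double share = rank_of(devs, m, d) * (double)remaining;
    if (early_phase)
        share /= 3.0;
    return share < 1.0 ? 1 : (long)share;
}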
² If a device takes less than one second to explore a block of search spaces, most of that time is spent communicating with the host and initializing its data structures.
Table 1. Example of the calculation of block sizes when using three devices (columns: Device; Average time per search space (ms); Rank; Remaining sub-search spaces to explore; Size of the next block of sub-search spaces)
Another challenge GPUs pose is that they achieve their best performance when running hundreds or even thousands of threads simultaneously. But to use that level of parallelism, they must have enough work to keep that many threads busy. Otherwise, when a GPU receives a block with fewer sub-search spaces than the number of threads that would allow it to achieve its best performance, the average time needed to explore one sub-search space increases sharply.
For example, the Nvidia GeForce GTX 980M takes about 1.1 s to find all the solutions for the n-Queens 13 problem when splitting the problem into 742,586 sub-search spaces, and 2.4 s when splitting it into only 338 sub-search spaces. This challenge also applies to CPUs, but it is less problematic there due to their lesser degree of parallelism when compared with GPUs.
To overcome that challenge, sub-search spaces may be further divided inside a device, by applying a multiplier factor m to the size of a block and turning a block of sub-search spaces into a block with m times the original number of sub-search spaces, which will be created as presented in Example 1.
Communication
To reduce the amount of data that is transferred to each device, all of them will receive the full CSP, that is, all the constraints, variables and their domains, at the beginning of the solving process. Afterwards, when a device must be instructed to solve a new block of sub-search spaces, instead of sending all the sub-search spaces to the device, only the information needed to create those sub-search spaces is sent.
If a device is to solve sub-search spaces SS2 and SS3 from Example 1, it will receive the information that the tree must be expanded down to depth 2, that the values of the first variable are repeated 2 times and the values of the second variable are repeated 1 time only (not repeated). With this information the device will know that the values of the first variable are repeated 2 times, so the third sub-search space (SS3) will get the second value of that variable, and so on down to the expansion depth. The values of the variables that were not expanded are simply copied from the original CSP that was passed to the devices at the beginning of the solving process.
Each time a work-item needs a new sub-search space to explore, it increments by one the number of the first/next sub-search space that is yet to be explored on that device and creates the sub-search space corresponding to the number before the increment. Then it will do labeling, propagation and backtracking on that search space, repeating all these steps until either all the sub-search spaces of that block have been explored, when all the solutions must be found, or, when only one solution is wanted, one of the work-items on that device finds a solution.
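A device-side sketch of this loop, in OpenCL C, could look like the following. The shared counter is shown here with atomic_inc, and the names and the exact decoding are our own illustration based on Example 1, not PHACT's kernel.

/* Each work-item claims the next unexplored sub-search space of the block
   and rebuilds it from its index, as in Example 1 (depth 2: the first
   variable's values repeat twice, the second variable's values once). */
__kernel void search_engine(__global volatile int *next_ss, /* shared counter  */
                            const int block_size,           /* spaces in block */
                            const int depth,                /* expansion depth */
                            __global const int *repeat,     /* repetitions per
                                                               expanded variable */
                            __global const int *dom_size)   /* domain sizes    */
{
    for (;;) {
        int idx = atomic_inc(next_ss);      /* claim a sub-search space */
        if (idx >= block_size)
            break;                          /* block exhausted          */

        /* Decode idx: pick one value for each expanded variable. */
        for (int v = 0; v < depth; v++) {
            int value_pos = (idx / repeat[v]) % dom_size[v];
            /* assign the value at position value_pos to variable v and
               copy the remaining domains from the original CSP ...      */
            (void)value_pos;
        }

        /* ... labeling, propagation and backtracking on this space ... */
    }
}

With the data of Example 1 (repeat = {2, 1}, dom_size = {2, 2}), index 2 decodes to the second value of the first variable and the first value of the second, i.e., SS3, matching the description above.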
Implementation Details
Several tests were made to find the best number of work-groups to use for each type of device. It was found that for CPUs and MICs the best results were achieved with the same number of work-groups as the number of compute units of the device. For GPUs, the predefined number of work-groups is 4096, due to the increased level of parallelism allowed by this type of device.
The user can specify how many sub-search spaces must be created or let PHACT estimate that number. For estimating the number of sub-search spaces that will be generated, PHACT will sum all the work-items that will be used in all the devices and multiply that value by 40 if all the solutions must be found for the current CSP, or by 100 if only one solution is required or when solving an optimization problem. After several tests, these values (40 and 100) were found to achieve a good load balancing between the devices, and as such they are the predefined values.
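In code form, this default estimate amounts to something like the following sketch (function and parameter names are ours):

/* Sketch of the default estimate described above. */
long estimate_sub_search_spaces(const long *work_items_per_device,
                                int n_devices, int find_all) {
    long total = 0;
    for (int i = 0; i < n_devices; i++)       /* work-items over all devices      */
        total += work_items_per_device[i];
    return total * (find_all ? 40 : 100);     /* 40: count all; 100: one/optimize */
}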
When looking for just one solution or optimizing, the amount of work sent to each device is reduced by generating more sub-search spaces and decreasing the size of the blocks sent to the devices, which makes each block faster to explore and ensures that all the devices synchronize on the host more frequently.
As for the number of work-items per work-group, CPUs and MICs are assigned one work-item per work-group, as their compute units can only execute one thread at a time.
On the contrary, each GPU compute unit can execute more than one thread simultaneously. For example, the Nvidia GeForce GTX 980 has 16 SMs with 128 CUDA cores³ each, making a total of 2048 CUDA cores. Nevertheless, each SM is only capable of executing 32 threads simultaneously (using only 32 CUDA cores at the same time), making the GPU capable of running 512 threads simultaneously [15].
Each SM has very limited resources that are shared between work-groups and their work-items, thus limiting the number of work-items per work-group that can be used, according to the resources needed by each work-item. The main limitation is the size of the local memory of each SM, which is shared between all the work-items of the same work-group and between some work-groups (8 work-groups for the Nvidia GeForce GTX 980).
For this reason, PHACT estimates the best number of work-items per work-group to use for GPUs by limiting the amount of local memory required to the size of the available local memory on the GPU. When the available local memory is not enough to efficiently use at least one work-item per work-group, PHACT will only use the global memory of the device, which is much larger but also much slower, and 32 work-items per work-group, as each SM is only capable of running 32 threads simultaneously.
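One possible way to apply this limit on the host, sketched with the standard OpenCL query for the device's local memory size (the sizing policy itself is our simplification of the description above):

#include <CL/cl.h>

/* Returns how many work-items per work-group fit in local memory, given an
   estimate of the local memory each work-item needs; 0 means "fall back to
   global memory and 32 work-items per work-group". */
size_t work_items_per_group(cl_device_id dev, size_t local_mem_per_item) {
    cl_ulong local_mem = 0;
    clGetDeviceInfo(dev, CL_DEVICE_LOCAL_MEM_SIZE, sizeof(local_mem), &local_mem, NULL);
    size_t fit = (size_t)(local_mem / local_mem_per_item);
    return fit >= 1 ? fit : 0;
}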
³ A CUDA core is a processing element capable of executing one integer or floating-point instruction per clock for a thread.
Note that PHACT represents variable domains as either 32-bit bitmaps, multiples of 64-bit bitmaps, or (compact) intervals. When using intervals, PHACT is slower than when using bitmaps, but intervals are meant to be used instead of larger bitmaps on systems where the size of the RAM is an issue.
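For illustration, a single-word bitmap domain over the values 0..63 can be handled with a few bit operations. This is a sketch in the spirit of the representation described, not PHACT's actual data structure; the population count uses a GCC/Clang builtin.

#include <stdint.h>

typedef uint64_t domain_t;   /* bit v set <=> value v is still in the domain */

static inline int  dom_contains(domain_t d, int v) { return (int)((d >> v) & 1u); }
static inline void dom_remove(domain_t *d, int v)  { *d &= ~((domain_t)1 << v); }
static inline int  dom_size(domain_t d)            { return __builtin_popcountll(d); }
static inline int  dom_is_singleton(domain_t d)    { return d != 0 && (d & (d - 1)) == 0; }

Larger domains would use several such 64-bit words, while the interval representation would store only the bounds.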
The techniques described in this section allow PHACT to use all the devices compatible with OpenCL to solve a CSP. It splits the search space into multiple sub-search spaces that are distributed among the devices in blocks, to reduce the number of communications between the host and the devices. The size of each block is calculated according to the speed of the respective device when solving the previous blocks, to try to achieve a good load balancing between the devices. The size of the data transfers between the devices and the host is reduced by replacing the blocks of fully created search spaces with a small data set containing the information needed for a device to generate those search spaces.
5 Results and Discussion
PHACT was evaluated on finding all the solutions for four different CSPs, on optimizing one other CSP and on finding one solution for another CSP, each one with two different sizes, except for the Latin problem, whose smaller size is solved too fast and whose bigger size takes too long to solve. Those tests were executed on one, two and three devices and on four different machines running Linux, to evaluate the speedups when adding more devices to help the CPU. PHACT's performance was compared with that of PaCCS and Gecode 5.1.0 on these four machines. The four machines have the following characteristics:
M1. Machine with 32 GB of RAM and:
– Intel Core i7-4870HQ (8 compute units);
– Nvidia GeForce GTX 980M (12 compute units).
M2. Machine with 64 GB of RAM and:
– Intel Xeon E5-2690 v2 (referred to as Xeon 1, 40 compute units);
– Nvidia Tesla K20c (13 compute units).
M3. Machine with 128 GB of RAM and:
– AMD Opteron 6376 (64 compute units);
– Two AMD Tahitis (32 compute units each). These two devices are combined in an AMD Radeon HD 7990, but are managed separately by OpenCL.
M4. Machine with 64 GB of RAM and:
– Intel Xeon CPU E5-2640 v2 (referred to as Xeon 2, 32 compute units);
– Two Intel Many Integrated Core 7120P (240 compute units each).
Tables 2, 3, 4 and 5 present the elapsed times and speedups when solving all the problems on M1, M2, M3 and M4, respectively. Five of the six problem models were retrieved from the MiniZinc Benchmarks suite [12]. The Langford Numbers problem was retrieved from CSPLib [9], due to the absence of reified constraints in PHACT and PaCCS; these are used in the MiniZinc Benchmarks model, which would lead to different constraints being used among the three solvers. PaCCS does not have the "absolute value" constraint implemented, so it was not tested with the All Interval problem.
This set of problems made it possible to evaluate the solvers with 8 different constraints combined with each other in different ways. All the solutions were found for the problems whose name is followed by "(Count)" in the tables, the optimal solution was searched for the problem identified with "(Optim.)", and for the problem whose name is followed by "(One)", only one solution was required. For simplicity, the four tables have the resources used on the respective machine identified as R1, R2, R3 and R4, where R1 means using only a single thread on the CPU, R2 means using all the threads of that CPU, R3 means using all the threads on the CPU and one device (GeForce, Tesla, Tahiti or MIC), and R4 means using all the threads on the CPU and two identical devices (MICs or Tahitis). It must be noted that only PHACT is capable of using R3 and R4 resources.
Table 2 shows that using the GeForce to help the i7 allowed speedups of up to 4.66. However, in two problems, also using the GeForce resulted in more time being needed to solve the same problems. This result is mainly due to the small number of work-items per work-group that was effectively used on the GeForce, because of the local memory limitations detailed in Sect. 4. On this machine, adding the GeForce to help the i7 allowed a geometric mean speedup of 1.53.
The slowdown noted when optimizing the Golomb Ruler with 12 marks is also due to the impossibility of different devices communicating with each other while their kernels are running, as stated in Sect. 4. This is problematic when optimizing, as a device which finds a better solution cannot tell the other devices to look only for solutions better than the one it just found. Instead, it will finish exploring its block of sub-search spaces and only after that will it inform the host about the new solution, and only after this point, when another device finishes its block, will that device be informed about the new solution to improve upon. Due to this limitation, the devices spend some time looking for solutions that may already be worse than the ones found by other devices. This problem was also noted on the other three machines.
As for the Langford Numbers problem with 14 numbers, the worse result when adding the GeForce was due to the very unbalanced sub-search spaces that are generated, leading to most of the sub-search spaces being easily detected as inconsistent, and only a few containing most of the work. This is problematic because, as each thread explores each sub-search space sequentially, in the end only a few threads will be working on the harder sub-search spaces while the others are idle. This problem was also noted on the other three machines.
PHACT was faster than PaCCS in all problems, achieving speedups of up to 5.37.
When comparing with Gecode, PHACT achieved good speedups on all the problems, except on Market Split, which is a simple problem with only one constraint type, which may have a faster propagator in Gecode. On the contrary, with the Latin problem, Gecode was 127.85 times slower than PHACT when using only the CPU. Gecode was slower in solving this problem with all the CPU threads than when using only one thread, which suggests that the method used for load balancing between threads is very inefficient for this problem. This behavior of Gecode was noted on all the machines.

Table 2. Elapsed times and speedups on M1, with 4 cores and 1 GPU
Table 3 presents the results of solving the same problems on M2. Using the Tesla GPU to help the Xeon 1 resulted in a slowdown in most of the cases. In fact, adding the Tesla to help the Xeon 1 introduced an average slowdown of 0.84. This is due to the fact that the Tesla was the slowest GPU used in the tests, being no match for the Xeon 1. In fact, the work done by the Tesla did not compensate for the time spent by the Xeon 1 (the host) to control the Tesla (the device).
On this machine, PHACT was faster than PaCCS in all but one problem, resulting in an average speedup of 1.44 favoring PHACT. Compared with Gecode, PHACT was faster on all the problems with all the resource combinations.
Table 3. Elapsed times and speedups on M2, with 40 cores and 1 GPU

The results for the M3 machine are presented in Table 4. This machine possesses the CPU used in the tests that has the greatest number of cores (64), and it is paired up with two Tahiti GPUs, which are faster than the Tesla, but slower than the GeForce. So it is very hard for the Tahitis to display some performance gain when compared with a 64-core CPU. However, with the All Interval 15 problem, they were capable of speeding up the solving process by 1.48 times. On average, adding the two Tahiti GPUs to help the Opteron did not allow any speedup, because the time spent by the Opteron to control and communicate with the Tahitis was similar to the time that the Opteron would take to perform the work done by the Tahitis.
The issues with the Golomb Ruler and Langford Numbers discussed earlier in this section were also noted on this machine.
When comparing with PaCCS, PHACT achieved speedups that ranged from 0.21, on a very small problem, to 4.67. PHACT was faster than Gecode in all the tests, except when optimizing Golomb Ruler 12 with the Opteron and one Tahiti.
Table 5 presents the results on the M4 machine. This machine possesses two MICs, whose architecture is more similar to that of CPUs than of GPUs, so they are better suited to solving sequential problems than GPUs are. That difference was noted with the Langford Numbers problem, where they were capable of achieving a speedup of 1.51 despite the unbalanced sub-search spaces. On this machine, adding the two MICs to help the Xeon 2 allowed an average speedup of 1.45. When counting all the solutions for the Costas Array 15, the two MICs allowed a top speedup of 1.90.
When compared with PaCCS and Gecode, the results are very similar to the ones achieved on the other machines, PHACT being faster than Gecode in all but one problem and faster than PaCCS in 19 of the 24 tests.
Table 4. Elapsed times and speedups on M3, with 64 cores and 2 GPUs
Figure 1 presents the geometric mean of the speedups achieved by PHACT against PaCCS and Gecode, showing that PHACT was faster than Gecode and PaCCS on all the machines with all the resource combinations.
We can observe that the difference in performance between PHACT and Gecode is greater on the machines that have a CPU with more cores, which shows that the load balancing techniques implemented in PHACT are more efficient for the problems that were presented here. When compared with PaCCS, that relation is no longer noticeable and the results are much closer between the two solvers when using only the CPUs.
Using all the available resources on the four machines allowed PHACT to increase its performance when compared to PaCCS and Gecode, which shows that its greater versatility can lead to improved performance.
Table 5. Elapsed times and speedups on M4, with 32 cores and 2 MICs
6 Conclusion and Future Work
To our knowledge, PHACT is the only constraint solver capable of simultaneously using CPUs, GPUs, MICs and any other device compatible with OpenCL to solve CSPs in a faster manner. Although GPUs are not particularly efficient for this type of problem, they can still speed up the solving process and, in some cases, be even faster than the CPU of the same machine.
PHACT has been tested with 6 different CSPs on 4 different machines with 2 and 3 devices each, namely Intel CPUs and MICs, Nvidia GPUs, and AMD CPUs and GPUs, allowing it to achieve speedups of up to 4.66 when compared with using only the CPU of the machine to solve a single CSP, and a geometric mean speedup of up to 1.53 when solving all the referred CSPs on each machine.
On the four machines used for testing, PHACT achieved a geometric mean speedup that ranged from 1.28 to 2.83 when compared with PaCCS, and from 2.31 to 28.44 when compared with Gecode. The use of all the devices compatible with OpenCL to solve a CSP allowed PHACT to improve its performance against PaCCS and Gecode when compared with using only the CPUs.
Campeotto et al. [4] achieved a top speedup of 6.61 when using one thread of a CPU together with a GPU, while PHACT achieved average speedups between 1.56 and 2.88 when using one thread on a CPU and a GPU, with a top speedup of 15.15, and average and top speedups of 7.33 and 30.63 when replacing the GPU by two MICs. Although their technique of using the GPUs to propagate constraints relating many variables seems to have significant host–device synchronization requirements, we intend to test this approach in the future.
PHACT is still being improved to try to overcome the lack of synchronization between devices when optimizing. The solution may involve more frequent communication between the host and the devices, taking into account the number of solutions already found and increasing the frequency of the communication for problems with more solutions.
As for the unbalanced sub-search spaces that lead to only a few threads working in parallel while the others have already finished their work, we are analysing a work-sharing strategy [18] that may be executed when all the sub-search spaces generated for the block have been exhausted but some threads are still working.
A MiniZinc/FlatZinc [14] reader is also being implemented to allow the direct input of problems already modeled in this language.
Acknowledgments. This work was partially funded by Fundação para a Ciência e Tecnologia (FCT) under grant UID/CEC/4668/2016 (LISP). Some of the experimentation was carried out on the khromeleque cluster of the University of Évora, which was partly funded by grants ALENT-07-0262-FEDER-001872 and ALENT-07-0262-FEDER-001876.
References
1. Arbelaez, A., Codognet, P.: A GPU implementation of parallel constraint-based local search. In: 2014 22nd Euromicro International Conference on PDP, pp. 648–655. IEEE (2014)
2. Barták, R., Salido, M.A.: Constraint satisfaction for planning and scheduling problems. Constraints 16(3), 223–227 (2011)
3. Brailsford, S., Potts, C., Smith, B.: Constraint satisfaction problems: algorithms and applications. Eur. J. Oper. Res. 119, 557–581 (1999)
4. Campeotto, F., Dal Palù, A., Dovier, A., Fioretto, F., Pontelli, E.: Exploring the use of GPUs in constraint solving. In: Flatt, M., Guo, H.-F. (eds.) PADL 2014. LNCS, vol. 8324, pp. 152–167. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-04132-2_11
5. Chu, G., Schulte, C., Stuckey, P.J.: Confidence-based work stealing in parallel constraint programming. In: Gent, I.P. (ed.) CP 2009. LNCS, vol. 5732, pp. 226–241. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04244-7_20
6. Diaz, D., Richoux, F., Codognet, P., Caniou, Y., Abreu, S.: Constraint-based local search for the Costas Array problem. In: Hamadi, Y., Schoenauer, M. (eds.) LION 2012. LNCS, pp. 378–383. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34413-8_31
7. Filho, C., Rocha, D., Costa, M., Albuquerque, P.: Using constraint satisfaction problem approach to solve human resource allocation problems in cooperative health services. Expert Syst. Appl. 39(1), 385–394 (2012)
8. Gaster, B., Howes, L., Kaeli, D., Mistry, P., Schaa, D.: Heterogeneous Computing with OpenCL. Morgan Kaufmann Publishers Inc., San Francisco (2011)
9. Jefferson, C., Miguel, I., Hnich, B., Walsh, T., Gent, I.P.: CSPLib: a problem library for constraints (1999). http://www.csplib.org
10. Jenkins, J., Arkatkar, I., Owens, J.D., Choudhary, A., Samatova, N.F.: Lessons learned from exploring the backtracking paradigm on the GPU. In: Jeannot, E., Namyst, R., Roman, J. (eds.) Euro-Par 2011. LNCS, vol. 6853, pp. 425–437. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23397-5_42
11. Mairy, J.-B., Deville, Y., Lecoutre, C.: Domain k-wise consistency made as simple as generalized arc consistency. In: Simonis, H. (ed.) CPAIOR 2014. LNCS, vol. 8451, pp. 235–250. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07046-9_17
12. MIT: A suite of MiniZinc benchmarks (2017). https://github.com/MiniZinc/minizinc-benchmarks
13. Munshi, A., Gaster, B., Mattson, T.G., Fung, J., Ginsburg, D.: OpenCL Programming Guide, 1st edn. Addison-Wesley Professional, Boston (2011)
14. Nethercote, N., Stuckey, P.J., Becket, R., Brand, S., Duck, G.J., Tack, G.: MiniZinc: towards a standard CP modelling language. In: Bessière, C. (ed.) CP 2007. LNCS, vol. 4741, pp. 529–543. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74970-7_38
15. NVIDIA Corporation: NVIDIA GeForce GTX 980 featuring Maxwell, the most advanced GPU ever made. White paper. NVIDIA Corporation (2014)
16. Pedro, V.: Constraint programming on hierarchical multiprocessor systems. Ph.D. thesis, Universidade de Évora (2012)
17. Pedro, V., Abreu, S.: Distributed work stealing for constraint solving. In: Vidal, G., Zhou, N.F. (eds.) CICLOPS-WLPE 2010, Edinburgh, Scotland, U.K. (2010)
18. Rolf, C.C., Kuchcinski, K.: Load-balancing methods for parallel and distributed constraint solving. In: 2008 IEEE International Conference on Cluster Computing, pp. 304–309, September 2008
19. Rolf, C.C., Kuchcinski, K.: Parallel solving in constraint programming. In: MCC 2010, November 2010
20. Régin, J.-C., Rezgui, M., Malapert, A.: Embarrassingly parallel search. In: Schulte, C. (ed.) CP 2013. LNCS, vol. 8124, pp. 596–610. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40627-0_45
21. Schulte, C.: Parallel search made simple. In: Beldiceanu, N., et al. (eds.) Proceedings of TRICS: CP 2000, Singapore, September 2000
22. Schulte, C., Duchier, D., Konvicka, F., Szokoli, G., Tack, G.: Generic constraint development environment. http://www.gecode.org/