Lecture Notes in Artificial Intelligence 10997
Subseries of Lecture Notes in Computer Science

LNAI Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Yuzuru Tanaka, Hokkaido University, Sapporo, Japan
Wolfgang Wahlster, DFKI and Saarland University, Saarbrücken, Germany

LNAI Founding Series Editor
Joerg Siekmann, DFKI and Saarland University, Saarbrücken, Germany

More information about this series at http://www.springer.com/series/1244
Dietmar Seipel • Michael Hanus • Salvador Abreu (Eds.)

Declarative Programming and Knowledge Management
Conference on Declarative Programming, DECLARE 2017
Unifying INAP, WFLP, and WLP
Würzburg, Germany, September 19–22, 2017
Revised Selected Papers
Editors

Dietmar Seipel
Universität Würzburg
Würzburg
Germany

Michael Hanus
Christian-Albrechts-Universität zu Kiel
Kiel
Germany

Salvador Abreu
Universidade de Évora
Évora
Portugal
ISSN 0302-9743
ISSN 1611-3349 (electronic)
Lecture Notes in Artificial Intelligence
ISBN 978-3-030-00800-0
ISBN 978-3-030-00801-7 (eBook)
https://doi.org/10.1007/978-3-030-00801-7
Library of Congress Control Number: 2018954670
LNCS Sublibrary: SL7 – Artificial Intelligence
© Springer Nature Switzerland AG 2018
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This volume contains a selection of the papers presented at the International Conference on Declarative Programming, Declare 2017. The joint conference was held in Würzburg, Germany, during September 19–22, 2017. It consisted of the 21st International Conference on Applications of Declarative Programming and Knowledge Management (INAP), the 31st Workshop on Logic Programming (WLP), and the 25th Workshop on Functional and (Constraint) Logic Programming (WFLP), and it was accompanied by a one-week summer school on Advanced Concepts for Databases and Logic Programming for students and PhD students.
Declarative programming is an advanced paradigm for modeling and solving complex problems, which has attracted increased attention over the last decades, e.g., in the domains of data and knowledge engineering, databases, artificial intelligence, natural language processing, modeling and processing combinatorial problems, and for establishing knowledge-based systems for the web. The conference Declare 2017 aimed to promote the cross-fertilizing exchange of ideas and experiences among researchers and students from the different communities interested in the foundations, applications, and combinations of high-level, declarative programming and related areas.
The INAP conferences provide a forum for intensive discussions of applications of important technologies around logic programming, constraint problem solving, and closely related advanced software. They comprehensively cover the impact of programmable logic solvers in the Internet society, its underlying technologies, and leading-edge applications in industry, commerce, government, and societal services. Previous INAP conferences have been held in Japan, Germany, Portugal, and Austria. The Workshops on Logic Programming (WLP) are the annual meeting of the German Society for Logic Programming (GLP e.V.). They bring together international researchers interested in logic programming, constraint programming, and related areas like databases and artificial intelligence. Previous WLP workshops have been held in Germany, Austria, Switzerland, and Egypt. The International Workshop on Functional and Logic Programming (WFLP) brings together researchers interested in functional programming, logic programming, as well as the integration of these paradigms. Previous WFLP editions have been held in Germany, France, Spain, Italy, Estonia, Brazil, Denmark, and Japan. The topics of the papers of this year's joint conference Declare concentrated on three currently important fields: constraint programming and solving, functional and logic programming, and declarative programming.
The declarative programming paradigm expresses the logic of a computation in an abstract way. Thus, the semantics of a declarative language becomes easier to grasp for domain experts. Declarative programming offers many advantages for data and knowledge engineering, such as security, safety, and shorter development time. During the last couple of years, a lot of research has been conducted on the usage of declarative systems in areas like answer set programming, reasoning, meta-programming, and deductive databases. Reasoning about knowledge wrapped in rules, databases, or the Semantic Web makes it possible to explore interesting hidden knowledge. Declarative techniques for the transformation, deduction, induction, visualization, or querying of knowledge have the advantage of high transparency and better maintainability compared to procedural approaches.
Many problems which occur in large industrial tasks are intractable, which rules out solving them exactly, or even with many approximate constructive algorithms. One approach which has made substantial progress over the last few years is constraint programming. Its declarative nature offers significant advantages from a software engineering standpoint, in the specification, implementation, and maintenance phases. Several interesting aspects are in discussion: how can this paradigm be improved or combined with known, classical methods; how can real-world situations be modelled as constraint problems; what strategies may be pursued to solve a problem once it has been specified; and what is the experience of applications in really large industrial planning, simulation, and optimisation tasks?
Another area of active research is the use of declarative programming languages, in particular, functional and logic languages, to implement more reliable software systems. The closeness of these languages to logical models provides new methods to test and verify programs. Combining different programming paradigms is beneficial from a software engineering point of view. Therefore, the extension of the logic programming paradigm and its integration with other programming concepts are active research branches. The successful extension of logic programming with constraints has already been mentioned. The integration of logic programming with other programming paradigms has been mainly investigated for the case of functional programming, so that types, modules, higher-order operators, or lazy evaluation can also be used in logic-oriented computations.
The three events INAP, WLP, and WFLP were jointly organized by the University of Würzburg and the Society for Logic Programming (GLP e.V.). We would like to thank all authors who submitted papers and all conference participants for the fruitful discussions. We are grateful to the members of the Program Committee and the external referees for their timely expertise in carefully reviewing the papers. We would like to express our thanks to the German Federal Ministry of Education and Research (BMBF) for funding the summer school on Advanced Concepts for Databases and Logic Programming (under 01PL16019) and to the University of Würzburg for hosting the conference in the new Central Lecture Building Z6 and for providing the Tuscany Hall in the Baroque-style Würzburg Residence Palace for a classical music concert in honor of Jack Minker, a pioneer in deductive databases and disjunctive logic programming and the long-time mentor of the first editor, who celebrated his 90th birthday in 2017.
July 2018

Dietmar Seipel
Michael Hanus
Salvador Abreu
Constraint Solving on Hybrid Systems
Pedro Roque (B) and Vasco Pedro

LISP, Universidade de Évora, Évora, Portugal
d11735@alunos.uevora.pt, vp@di.uevora.pt
Abstract. Applying parallelism to constraint solving seems a promising approach and it has been done with varying degrees of success. Early attempts to parallelize constraint propagation, which constitutes the core of traditional interleaved propagation and search constraint solving, were hindered by its essentially sequential nature. Recently, parallelization efforts have focussed mainly on the search part of constraint solving, as well as on local-search based solving. Lately, a particular source of parallelism has become pervasive, in the guise of GPUs, able to run thousands of parallel threads, and they have naturally drawn the attention of researchers in parallel constraint solving.
In this paper, we address challenges faced when using multiple devices for constraint solving, especially GPUs, such as deciding on the appropriate level of parallelism to employ, load balancing and inter-device communication, and present our current solutions.
Keywords: Constraint solving · Parallelism · GPU · Intel MIC · Hybrid systems
1 Introduction
Constraint Satisfaction Problems (CSPs) allow modeling problems like the Costas Array problem [6], and some real-life problems like planning and scheduling [2], resource allocation [7] and route definition [3].
CPU parallelism is already being used with success to speed up the solving process of harder CSPs [5, 16, 19, 21]. However, very few constraint solvers contemplate the use of GPUs. In fact, Jenkins et al. recently concluded that the execution model and the architecture of GPUs are not well suited to computations displaying irregular data access and code execution patterns such as backtracking search [10].
We are currently developing a constraint solver named Parallel Heterogeneous Architecture Toolkit (PHACT) that is already capable of achieving state-of-the-art performance on multi-core CPUs, and can also speed up the solving process by adding GPUs and processors like Intel Many Integrated Cores (MICs¹) to solve the problems.

¹ Intel MICs are coprocessors that combine many Intel processor cores onto a single chip with dedicated RAM.
© Springer Nature Switzerland AG 2018
D. Seipel et al. (Eds.): DECLARE 2017, LNAI 10997, pp. 3–19, 2018. https://doi.org/10.1007/978-3-030-00801-7_1
The next section introduces the main CSP concepts and Sect. 3 presents some related work. Section 4 describes the architecture of PHACT, and in Sect. 5 the results achieved with PHACT, when solving some CSPs on multiple combinations of devices and when compared with some state-of-the-art solvers, are displayed and discussed. Section 6 presents the conclusions and directions for future work.
2 CSP Concepts
A CSP can be briefly described as a set of variables with finite domains, and a set of constraints between the values of those variables. The solution of a CSP is the assignment of one value from the respective domain to each one of the variables, ensuring that all constraints are met [3].
For example, the Costas Array problem consists in placing n dots on an n × n matrix such that each row and column contains only one dot and all vectors between dots are distinct. It can be modeled as a CSP with n + n(n−1)/2 variables, n of which correspond to the dots, each one mapped to a different matrix column. The domain of these n variables is composed of the integers that correspond to the matrix rows where each dot may be placed. The remaining n(n−1)/2 variables constitute a difference triangle, whose rows cannot contain repeated values [6].
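To make the model concrete, one possible formulation of this CSP is sketched below; it is our own illustration of the description above, not necessarily the exact model used in [6] or in the solvers compared later.

X_i \in \{1,\dots,n\}, \quad i = 1,\dots,n
D_{d,i} = X_i - X_{i+d}, \quad d = 1,\dots,n-1, \; i = 1,\dots,n-d
\mathrm{allDifferent}(X_1,\dots,X_n)
\mathrm{allDifferent}(D_{d,1},\dots,D_{d,n-d}) \quad \text{for each } d

Here the n variables X_i give the row of the dot placed in column i, and the n(n−1)/2 variables D_{d,i} form the difference triangle; requiring each row d of the triangle to contain no repeated values is what makes all vectors between dots distinct.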
The methods for solving CSPs can be categorized as incomplete or complete. Incomplete solvers do not guarantee that an existing solution will be found, being mostly used for optimization problems and for large problems that would take too much time to fully explore. Incomplete search is beyond the scope of this paper and will not be discussed here. On the contrary, complete methods, such as the one implemented in PHACT, guarantee that if a solution exists, it will be found.
3 Related Work
Searching for CSP solutions in a backtracking approach can be represented in the form of a search tree. To take advantage of parallelism, this search tree may be split into multiple subtrees and each one of them explored in a different thread that may be running on a different core, device or machine. This is the approach generally found in parallel constraint solvers, which run on single or distributed multi-core CPUs [5, 16, 19, 21].
Pedro developed a CSP solver named Parallel Complete Constraint Solver (PaCCS) capable of running from a single core CPU to multiple multi-core CPUs in a distributed system [16]. Distributing the work among the threads through work stealing techniques and using the Message Passing Interface (MPI) to allow communication between machines, this solver achieved almost linear speedups for most of the problems tested, when using machines with up to 16 CPU cores.
Régin et al. implemented Embarrassingly Parallel Search, featuring an interface responsible for decomposing an initial problem into multiple sub-problems, filtering out those found to be inconsistent [20]. After generating the sub-problems, it creates multiple threads, each one corresponding to an execution of a solver (e.g., Gecode [22]), to which a sub-problem is sent at a time for exploration.
For some optimization and search problems, where the full search space is explored, these authors achieved average gains of 13.8 and 7.7 against a sequential version, when using Gecode through their interface or just Gecode, respectively [20]. In their trials, the best results were achieved when decomposing the initial problem into 30 sub-problems per thread and running 40 threads on a machine with 40 CPU cores.
While solving CSPs through parallelization has been a subject of research for decades, the usage of GPUs for that purpose is a recent area, and as such there are not many published reports of related work. To our knowledge, there are only two published papers related with constraint solving on GPUs [1, 4]. Of these two, only Campeotto et al. presented a complete solver [4].
Campeotto et al. developed a CSP solver with Nvidia's Compute Unified Device Architecture (CUDA), capable of simultaneously using a CPU and an Nvidia GPU to solve CSPs [4]. On the GPU, this solver implements an approach different from the one mentioned before, namely, instead of splitting the search tree over multiple threads, it splits each constraint propagation over multiple threads. Constraints relating many variables are propagated on the GPU, while the remaining constraints are filtered sequentially by the CPU. On the GPU, the propagation and consistency check for each constraint are assigned to one or more blocks of threads according to the number of variables involved. The domain of each variable is filtered by a different thread.
Campeotto et al. reduced the data transfer to a minimum by transferring to the GPU only the domains of the variables that were not labeled yet and the events generated during the last propagation. Events identify the changes that happened to a domain, like becoming a singleton or having a new maximum value, which allows deciding on the appropriate propagator to apply.
Campeotto et al. obtained speedups of up to 6.61, with problems like the Langford problem and some real problems such as the modified Renault problem [4], when comparing a sequential execution on a CPU with the hybrid CPU/GPU version.
4 Solver Architecture
PHACT is a complete solver, capable of finding a solution for a CSP if one exists. It is meant to be able to use all the (parallel) processing power of the devices available on a system, such as CPUs, GPUs and MICs, to speed up solving constraint problems.
The solver is composed of a master process which collects information about the devices that are available on the machine, such as the number of cores and the type of device (CPU, GPU or MIC), and calculates the number of sub-search spaces that will be created to distribute among those devices. For each device there will be one thread (communicator) responsible for communicating with that device, and inside each device there will be a range of threads (search engines) that will perform labeling, constraint propagation and backtracking on one sub-search space at a time. The number of search engines that will be created inside each device will depend on the number of cores and type of that device, and may vary from 8 on a quad-core CPU to more than 100,000 on a GPU.
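This organization can be pictured with a short host-side sketch in C with POSIX threads. The structure, names and values below are our own illustration of the description above, not PHACT's actual code: a master sets up one communicator thread per device, and each communicator feeds its device blocks of sub-search spaces.

#include <pthread.h>
#include <stdio.h>

typedef struct {
    int  id;             /* device index                            */
    int  is_gpu;         /* device type flag (CPU/GPU/MIC)          */
    int  compute_units;  /* reported number of compute units        */
    long block_size;     /* sub-search spaces in the current block  */
} device_t;

/* A communicator launches the search-engine kernel on its device,
   waits for the block to be explored and asks for more work. */
static void *communicator(void *arg) {
    device_t *dev = (device_t *)arg;
    /* ... enqueue kernel, wait, collect results, request next block ... */
    printf("device %d finished a block of %ld sub-search spaces\n",
           dev->id, dev->block_size);
    return NULL;
}

int main(void) {
    device_t devs[2] = { { 0, 0, 8, 64 }, { 1, 1, 16, 4096 } };  /* example values */
    pthread_t th[2];

    for (int i = 0; i < 2; i++)                 /* one communicator per device */
        pthread_create(&th[i], NULL, communicator, &devs[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(th[i], NULL);
    return 0;
}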
PHACT may be used to count all the solutions of a given CSP, to find just one solution, or to find a best one (for optimization problems).
Framework
PHACT is implemented in C and OpenCL [13], which allows it to run on multiple types of devices from different vendors, both on Linux and on Microsoft Windows.
We present some OpenCL concepts that help in understanding PHACT's architecture:
– Compute unit. One or more processing elements and their local memory. In Nvidia GPUs each Streaming Multiprocessor (SM) is a compute unit. AMD GPUs have their own components called Compute Units that match this definition. For CPUs and MICs, the number of available compute units is normally equal to or higher than the number of threads that the device can execute simultaneously [13];
– Kernel. The code that will be executed on the devices;
– Work-item. An instance of the kernel (thread);
– Work-group. Composed of one or more work-items that will be executed on the same compute unit, in parallel. All work-groups for one kernel on one device have the same number of work-items;
– Host. CPU where the application responsible for managing the execution of the kernels is run;
– Device. A device where the kernels are executed (CPU, GPU, MIC).
In the implementation described here, the master process and the threads responsible for communicating with the devices run on the OpenCL host, and the search engines run on the devices. The OpenCL host may also constitute a device, in which case it will be simultaneously controlling and communicating with the devices and running search engines. Each search engine corresponds to a work-item, and all work-items execute the same kernel code, which implements the search engine.
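As an illustration of the host side, a minimal OpenCL snippet for the device discovery step performed by the master process might look as follows. This is a sketch assuming a single platform, with error handling omitted; it only uses standard OpenCL 1.x calls and is not taken from PHACT's sources.

#include <CL/cl.h>
#include <stdio.h>

int main(void) {
    cl_platform_id platform;
    cl_device_id devices[8];
    cl_uint ndev = 0;

    clGetPlatformIDs(1, &platform, NULL);                  /* first platform only */
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 8, devices, &ndev);

    for (cl_uint i = 0; i < ndev; i++) {
        cl_uint cu = 0;
        cl_device_type type = 0;
        clGetDeviceInfo(devices[i], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(cu), &cu, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_TYPE, sizeof(type), &type, NULL);
        printf("device %u: %s, %u compute units\n", i,
               (type & CL_DEVICE_TYPE_GPU) ? "GPU" : "CPU/accelerator", cu);
    }
    return 0;
}

From this kind of information the master decides how many search engines (work-items) to create on each device and how to size the first blocks of sub-search spaces.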
Search Space Splitting and Work Distribution
For distributing the work between the devices, PHACT splits the search space into multiple sub-search spaces. Search-space splitting is effected by partitioning the domains of one or more of the variables of the problem, so that the resulting sub-search spaces partition the full search space. The number and the size of the sub-search spaces thus created depend on the number of work-items which will be used, and may go up to a few million.
Example 1 shows the result of splitting the search space of a CSP with three variables, V1, V2 and V3, all with domain {1, 2}, into 4 sub-search spaces, SS1, SS2, SS3 and SS4.
Example 1. Creation of 4 sub-search spaces
SS1: V1 = 1, V2 = 1, V3 ∈ {1, 2}
SS2: V1 = 1, V2 = 2, V3 ∈ {1, 2}
SS3: V1 = 2, V2 = 1, V3 ∈ {1, 2}
SS4: V1 = 2, V2 = 2, V3 ∈ {1, 2}
Since each device will have multiple search engines running in parallel, the computed partition is organized into blocks of contiguous sub-search spaces that will be handled by each device, one at a time. The number of sub-search spaces that will compose each block will vary along the solving process and depends on the performance of each device on solving the current problem.
The communicator threads running on the host launch the execution of the search engines on the devices, hand each device one block of sub-search spaces to explore, and coordinate the progress of the solving process as each device finishes exploring its assigned block. The coordination of the devices consists in assessing the state of the search, distributing more blocks to the devices, signaling to all the devices that they should stop (when a solution has been found and only one is wanted), or updating the current bound (in optimization problems).
Load Balancing
An essential aspect to consider when parallelizing some task is the balancing of the work between the parallel components. Creating sub-search spaces with balanced domains, when possible, is no guarantee that the amount of work involved in exploring each of them is even similar. To compound the issue, we are dealing with devices with differing characteristics and varying speeds, making it even harder to statically determine an optimal, or even good, work distribution.
Achieving effective load balancing between devices with such different architectures as CPUs and GPUs is a complex task [10]. When trying to implement dynamic load balancing, two important OpenCL limitations arise: while a device is executing a kernel it cannot communicate with other devices [8], and the execution of a kernel cannot be paused or stopped. Hence, techniques like work stealing [5, 17], which require communication between threads, will not work with kernels that run independently on different devices, and load balancing must be done on the host side.
To better manage the distribution of work, the host could reduce the amount of work it sends to the devices each time, by reducing the number of sub-search spaces in each block. This would make the devices synchronize more frequently on the host and allow for a finer control over the behavior of the solver. When working with GPUs, though, the number and the size of data transfers between the devices and the host should be as small as possible, because these are very time-consuming operations. So, a balance must be struck between the workload of the devices and the amount of communication needed.
PHACT implements a dynamic load balancing technique which adjusts the size of the blocks of sub-search spaces to the performance of each device solving the current problem, when compared to the performance of the other devices.
Initially each device d explores two small blocks of sub-search spaces to get the average time, avg(d), it needs to explore one sub-search space. The size of those blocks may be distinct among devices, as it is calculated according to the number of threads that each device is capable of running simultaneously and its clock speed. When two or more devices finish exploring those first two blocks, their rank, rank(d), is calculated according to Eq. (1), where m is the total number of devices.
The rank of a device consists of a value between 0 and 1, corresponding to the relative speed of the device against all the devices that were used for solving a block of sub-search spaces. Faster devices will get a higher rank than slower devices, and the sum of the ranks of all the devices will be 1. The rank is then used to calculate the size of the next block of sub-search spaces to send to the device, by multiplying its value by the number of sub-search spaces that are yet to be explored.
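The body of Eq. (1) did not survive extraction; a formula consistent with the properties just stated (ranks between 0 and 1, summing to 1, higher for faster devices) would be

rank(d) = \frac{1/avg(d)}{\sum_{i=1}^{m} 1/avg(i)} \qquad (1)

which should be read as a plausible reconstruction from the surrounding description, not necessarily the exact expression used in PHACT.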
Since the size of the first two blocks of sub-search spaces explored by each device is small, to prevent slow devices from dominating the solving process, it often only allows for a rough approximation of the speed of a device. So, in the beginning, only 1/3 of the remaining sub-search spaces are considered when computing the size of the next block to send to a device.
For the first device to finish its first two blocks, it will not be possible to calculate its rank yet, as that would require the average time of at least one more device. In this case, that device will get a new block with twice the size of the previous ones, as it is probably the fastest device solving the current problem.
As search progresses, every time a device finishes exploring another block, its average time and rank are updated. The value of the average time of a device is the result of dividing the total time that the device spent exploring sub-search spaces by the total number of sub-search spaces that it has already explored.
As the rank value stabilizes, the size of the new block of sub-search spaces for the device will be the corresponding percentage of all unexplored sub-search spaces. Table 1 exemplifies the calculation of the number of sub-search spaces that will compose the block sent to each device as soon as it finishes its previous block. This is repeated until a device waiting for work is estimated to need less than one second² to solve all the remaining sub-search spaces, in which case it will be assigned all of them.
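The sizing rule just described can be summarized by a small sketch in C. The names, the rank formula (taken from the reconstruction of Eq. (1) above) and the early-phase damping are our reading of the text, not code taken from PHACT.

#include <stddef.h>

typedef struct {
    double explore_time;   /* total time spent exploring (s)      */
    long   explored;       /* sub-search spaces already explored  */
} dev_stats_t;

/* Average time per sub-search space for one device. */
static double avg_time(const dev_stats_t *d) {
    return d->explore_time / (double)d->explored;
}

/* rank(d): relative speed of device d among the m devices (sums to 1). */
static double rank_of(const dev_stats_t *devs, size_t m, size_t d) {
    double sum = 0.0;
    for (size_t i = 0; i < m; i++)
        sum += 1.0 / avg_time(&devs[i]);
    return (1.0 / avg_time(&devs[d])) / sum;
}

/* Size of the next block: the device's share of the unexplored sub-search
   spaces, damped to 1/3 while the rank is still a rough estimate. */
static long next_block(const dev_stats_t *devs, size_t m, size_t d,
                       long remaining, int early_phase) {
    double share = rank_of(devs, m, d) * (double)remaining;
    if (early_phase)
        share /= 3.0;
    return share < 1.0 ? 1 : (long)share;
}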
² If a device takes less than one second to explore a block of search spaces, most of that time is spent communicating with the host and initializing its data structures.
Table 1. Example of the calculation of block sizes when using three devices (columns: Device; Average time per search space (ms); Rank; Remaining sub-search spaces to explore; Size of the next block of sub-search spaces)
Another challenge GPUs pose is that they achieve their best performance when running hundreds or even thousands of threads simultaneously. But to use that level of parallelism, they must have enough work to keep that many threads busy. Otherwise, when a GPU receives a block with fewer sub-search spaces than the number of threads that would allow it to achieve its best performance, the average time needed to explore one sub-search space increases sharply.
For example, the Nvidia GeForce GTX 980M takes about 1.1 s to find all the solutions for the n-Queens 13 problem when splitting the problem into 742,586 sub-search spaces, and 2.4 s when splitting it into only 338 sub-search spaces. This challenge also applies to CPUs, but it is less problematic there due to their lesser degree of parallelism when compared with GPUs.
To overcome that challenge, sub-search spaces may be further divided inside a device, by applying a multiplier factor m to the size of a block and turning a block of sub-search spaces into a block with m times the original number of sub-search spaces, which will be created as presented in Example 1.
Communication
To reduce the amount of data that is transferred to each device, all of them will receive the full CSP, that is, all the constraints, variables and their domains, at the beginning of the solving process. Afterwards, when a device must be instructed to solve a new block of sub-search spaces, instead of sending all the sub-search spaces to the device, only the information needed to create those sub-search spaces is sent.
If a device is to solve sub-search spaces SS2 and SS3 from Example 1, it will receive the information that the tree must be expanded down to depth 2, that the values of the first variable are repeated 2 times and the values of the second variable are repeated 1 time only (not repeated). With this information the device will know that the values of the first variable are repeated 2 times, so the third sub-search space (SS3) will get the second value of that variable, and so on down to the expansion depth. The values of the variables that were not expanded are simply copied from the original CSP that was passed to the devices at the beginning of the solving process.
Each time a work-item needs a new sub-search space to explore, it increments by one the number of the first/next sub-search space that is yet to be explored on that device and creates the sub-search space corresponding to the number before the increment. Then it will do labeling, propagation and backtracking on that search space, repeating all these steps until either all the sub-search spaces of that block have been explored, when all the solutions must be found, or, when only one solution is wanted, one of the work-items on that device finds a solution.
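A device-side sketch of this loop, in OpenCL C, could look like the following. The shared counter is shown here with atomic_inc, and the names and the exact decoding are our own illustration based on Example 1, not PHACT's kernel.

/* Each work-item claims the next unexplored sub-search space of the block
   and rebuilds it from its index, as in Example 1 (depth 2: the first
   variable's values repeat twice, the second variable's values once). */
__kernel void search_engine(__global volatile int *next_ss, /* shared counter  */
                            const int block_size,           /* spaces in block */
                            const int depth,                /* expansion depth */
                            __global const int *repeat,     /* repetitions per
                                                               expanded variable */
                            __global const int *dom_size)   /* domain sizes    */
{
    for (;;) {
        int idx = atomic_inc(next_ss);      /* claim a sub-search space */
        if (idx >= block_size)
            break;                          /* block exhausted          */

        /* Decode idx: pick one value for each expanded variable. */
        for (int v = 0; v < depth; v++) {
            int value_pos = (idx / repeat[v]) % dom_size[v];
            /* assign the value at position value_pos to variable v and
               copy the remaining domains from the original CSP ...      */
            (void)value_pos;
        }

        /* ... labeling, propagation and backtracking on this space ... */
    }
}

With the data of Example 1 (repeat = {2, 1}, dom_size = {2, 2}), index 2 decodes to the second value of the first variable and the first value of the second, i.e., SS3, matching the description above.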
Implementation Details
Several tests were made to find the best number of work-groups to use for each type of device. It was found that for CPUs and MICs the best results were achieved with the same number of work-groups as the number of compute units of the device. For GPUs, the predefined number of work-groups is 4096, due to the increased level of parallelism allowed by this type of device.
The user can specify how many sub-search spaces must be created or let PHACT estimate that number. For estimating the number of sub-search spaces that will be generated, PHACT will sum all the work-items that will be used in all the devices and multiply that value by 40 if all the solutions must be found for the current CSP, or by 100 if only one solution is required or when solving an optimization problem. After several tests, these values (40 and 100) were found to achieve a good load balancing between the devices, and as such they are the predefined values.
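In code form, this default estimate amounts to something like the following sketch (function and parameter names are ours):

/* Sketch of the default estimate described above. */
long estimate_sub_search_spaces(const long *work_items_per_device,
                                int n_devices, int find_all) {
    long total = 0;
    for (int i = 0; i < n_devices; i++)       /* work-items over all devices      */
        total += work_items_per_device[i];
    return total * (find_all ? 40 : 100);     /* 40: count all; 100: one/optimize */
}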
When looking for just one solution or optimizing, the amount of work sent to each device is reduced by generating more sub-search spaces and decreasing the size of the blocks sent to the devices, which makes each block faster to explore and ensures that all the devices synchronize on the host more frequently.
As for the number of work-items per work-group, CPUs and MICs are assigned one work-item per work-group, as their compute units can only execute one thread at a time.
On the contrary, each GPU compute unit can execute more than one thread simultaneously. For example, the Nvidia GeForce GTX 980 has 16 SMs with 128 CUDA cores³ each, making a total of 2048 CUDA cores. Nevertheless, each SM is only capable of executing 32 threads simultaneously (using only 32 CUDA cores at the same time), making the GPU capable of running 512 threads simultaneously [15].
Each SM has very limited resources that are shared between work-groups and their work-items, thus limiting the number of work-items per work-group that can be used, according to the resources needed by each work-item. The main limitation is the size of the local memory of each SM, which is shared between all the work-items of the same work-group and between some work-groups (8 work-groups for the Nvidia GeForce GTX 980).
For this reason, PHACT estimates the best number of work-items per work-group to use for GPUs by limiting the amount of local memory required to the size of the available local memory on the GPU. When the available local memory is not enough to efficiently use at least one work-item per work-group, PHACT will only use the global memory of the device, which is much larger but also much slower, and 32 work-items per work-group, as each SM is only capable of running 32 threads simultaneously.
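One possible way to apply this limit on the host, sketched with the standard OpenCL query for the device's local memory size (the sizing policy itself is our simplification of the description above):

#include <CL/cl.h>

/* Returns how many work-items per work-group fit in local memory, given an
   estimate of the local memory each work-item needs; 0 means "fall back to
   global memory and 32 work-items per work-group". */
size_t work_items_per_group(cl_device_id dev, size_t local_mem_per_item) {
    cl_ulong local_mem = 0;
    clGetDeviceInfo(dev, CL_DEVICE_LOCAL_MEM_SIZE, sizeof(local_mem), &local_mem, NULL);
    size_t fit = (size_t)(local_mem / local_mem_per_item);
    return fit >= 1 ? fit : 0;
}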
³ A CUDA core is a processing element capable of executing one integer or floating-point instruction per clock for a thread.
Note that PHACT represents variable domains as either 32-bit bitmaps, multiples of 64-bit bitmaps, or (compact) intervals. When using intervals, PHACT is slower than when using bitmaps, but intervals are meant to be used instead of larger bitmaps on systems where the size of the RAM is an issue.
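For illustration, a single-word bitmap domain over the values 0..63 can be handled with a few bit operations. This is a sketch in the spirit of the representation described, not PHACT's actual data structure; the population count uses a GCC/Clang builtin.

#include <stdint.h>

typedef uint64_t domain_t;   /* bit v set <=> value v is still in the domain */

static inline int  dom_contains(domain_t d, int v) { return (int)((d >> v) & 1u); }
static inline void dom_remove(domain_t *d, int v)  { *d &= ~((domain_t)1 << v); }
static inline int  dom_size(domain_t d)            { return __builtin_popcountll(d); }
static inline int  dom_is_singleton(domain_t d)    { return d != 0 && (d & (d - 1)) == 0; }

Larger domains would use several such 64-bit words, while the interval representation would store only the bounds.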
The techniques described in this section allow PHACT to use all the devices compatible with OpenCL to solve a CSP. It splits the search space into multiple sub-search spaces that are distributed among the devices in blocks, to reduce the number of communications between the host and the devices. The size of each block is calculated according to the speed of the respective device when solving the previous blocks, to try to achieve a good load balancing between the devices. The size of the data transfers between the devices and the host is reduced by replacing the blocks of fully created search spaces with a small data set containing the information needed for a device to generate those search spaces.
5 Results and Discussion
PHACT was evaluated on finding all the solutions for four different CSPs, on optimizing one other CSP and on finding one solution for another CSP, each one with two different sizes, except for the Latin problem, whose smaller size is solved too fast and whose bigger size takes too long to solve. Those tests were executed on one, two and three devices and on four different machines running Linux, to evaluate the speedups when adding more devices to help the CPU. PHACT's performance was compared with that of PaCCS and Gecode 5.1.0 on these four machines. The four machines have the following characteristics:
M1. Machine with 32 GB of RAM and:
– Intel Core i7-4870HQ (8 compute units);
– Nvidia GeForce GTX 980M (12 compute units).
M2. Machine with 64 GB of RAM and:
– Intel Xeon E5-2690 v2 (referred to as Xeon 1, 40 compute units);
– Nvidia Tesla K20c (13 compute units).
M3. Machine with 128 GB of RAM and:
– AMD Opteron 6376 (64 compute units);
– Two AMD Tahitis (32 compute units each). These two devices are combined in an AMD Radeon HD 7990, but are managed separately by OpenCL.
M4. Machine with 64 GB of RAM and:
– Intel Xeon CPU E5-2640 v2 (referred to as Xeon 2, 32 compute units);
– Two Intel Many Integrated Core 7120P (240 compute units each).
Tables 2, 3, 4 and 5 present the elapsed times and speedups when solving all the problems on M1, M2, M3 and M4, respectively. Five of the six problem models were retrieved from the MiniZinc Benchmarks suite [12]. The Langford Numbers problem was retrieved from CSPLib [9], due to the absence of reified constraints in PHACT and PaCCS; these are used in the MiniZinc Benchmarks model, which would lead to different constraints being used among the three solvers. PaCCS does not have the "absolute value" constraint implemented, so it was not tested with the All Interval problem.
This set of problems made it possible to evaluate the solvers with 8 different constraints combined with each other in different ways. All the solutions were found for the problems whose name is followed by "(Count)" in the tables, the optimal solution was searched for the problem identified with "(Optim.)", and for the problem whose name is followed by "(One)", only one solution was required. For simplicity, the four tables have the resources used on the respective machine identified as R1, R2, R3 and R4, where R1 means using only a single thread on the CPU, R2 means using all the threads of that CPU, R3 means using all the threads on the CPU and one device (GeForce, Tesla, Tahiti or MIC), and R4 means using all the threads on the CPU and two identical devices (MICs or Tahitis). It must be noted that only PHACT is capable of using R3 and R4 resources.
Table 2 shows that using the GeForce to help the i7 allowed speedups of up to 4.66. However, in two problems, also using the GeForce resulted in more time being needed to solve the same problems. This result is mainly due to the small number of work-items per work-group that was effectively used on the GeForce, because of the local memory limitations detailed in Sect. 4. On this machine, adding the GeForce to help the i7 allowed a geometric mean speedup of 1.53.
The slowdown noted when optimizing the Golomb Ruler with 12 marks is also due to the impossibility of different devices communicating with each other while their kernels are running, as stated in Sect. 4. This is problematic when optimizing, as a device which finds a better solution cannot tell the other devices to look only for solutions better than the one it just found. Instead, it will finish exploring its block of sub-search spaces and only after that will it inform the host about the new solution, and only after this point, when another device finishes its block, will that device be informed about the new solution to improve upon. Due to this limitation, the devices spend some time looking for solutions that may already be worse than the ones found by other devices. This problem was also noted on the other three machines.
As for the Langford Numbers problem with 14 numbers, the worse result when adding the GeForce was due to the very unbalanced sub-search spaces that are generated, leading to most of the sub-search spaces being easily detected as inconsistent, and only a few containing most of the work. This is problematic because, as each thread explores each sub-search space sequentially, in the end only a few threads will be working on the harder sub-search spaces while the others are idle. This problem was also noted on the other three machines.
PHACT was faster than PaCCS in all problems, achieving speedups of up to 5.37.
When comparing with Gecode, PHACT achieved good speedups on all the problems, except on Market Split, which is a simple problem with only one constraint type, which may have a faster propagator in Gecode. On the contrary, with the Latin problem, Gecode was 127.85 times slower than PHACT when using only the CPU. Gecode was slower in solving this problem with all the CPU threads than when using only one thread, which suggests that the method used for load balancing between threads is very inefficient for this problem. This behavior of Gecode was noted on all the machines.

Table 2. Elapsed times and speedups on M1, with 4 cores and 1 GPU
Table 3 presents the results of solving the same problems on M2. Using the Tesla GPU to help the Xeon 1 resulted in a slowdown in most of the cases. In fact, adding the Tesla to help the Xeon 1 introduced an average slowdown of 0.84. This is due to the fact that the Tesla was the slowest GPU used in the tests, being no match for the Xeon 1. In fact, the work done by the Tesla did not compensate for the time spent by the Xeon 1 (the host) to control the Tesla (the device).
On this machine, PHACT was faster than PaCCS in all but one problem, resulting in an average speedup of 1.44 favoring PHACT. Compared with Gecode, PHACT was faster on all the problems with all the resource combinations.
Table 3. Elapsed times and speedups on M2, with 40 cores and 1 GPU

The results for the M3 machine are presented in Table 4. This machine possesses the CPU used in the tests that has the greatest number of cores (64), and it is paired up with two Tahiti GPUs, which are faster than the Tesla, but slower than the GeForce. So it is very hard for the Tahitis to display some performance gain when compared with a 64-core CPU. However, with the All Interval 15 problem, they were capable of speeding up the solving process by 1.48 times. On average, adding the two Tahiti GPUs to help the Opteron did not allow any speedup, because the time spent by the Opteron to control and communicate with the Tahitis was similar to the time that the Opteron would take to perform the work done by the Tahitis.
The issues with the Golomb Ruler and Langford Numbers discussed earlier in this section were also noted on this machine.
When comparing with PaCCS, PHACT achieved speedups that ranged from 0.21, on a very small problem, to 4.67. PHACT was faster than Gecode in all the tests, except when optimizing Golomb Ruler 12 with the Opteron and one Tahiti.
Table 5 presents the results on the M4 machine. This machine possesses two MICs, whose architecture is more similar to that of CPUs than of GPUs, so they are better suited to solving sequential problems than GPUs are. That difference was noted with the Langford Numbers problem, where they were capable of achieving a speedup of 1.51 despite the unbalanced sub-search spaces. On this machine, adding the two MICs to help the Xeon 2 allowed an average speedup of 1.45. When counting all the solutions for the Costas Array 15, the two MICs allowed a top speedup of 1.90.
When compared with PaCCS and Gecode, the results are very similar to the ones achieved on the other machines, PHACT being faster than Gecode in all but one problem and faster than PaCCS in 19 of the 24 tests.
Table 4. Elapsed times and speedups on M3, with 64 cores and 2 GPUs
Figure 1 presents the geometric mean of the speedups achieved by PHACT against PaCCS and Gecode, showing that PHACT was faster than Gecode and PaCCS on all the machines with all the resource combinations.
We can observe that the difference in performance between PHACT and Gecode is greater on the machines that have a CPU with more cores, which shows that the load balancing techniques implemented in PHACT are more efficient for the problems that were presented here. When compared with PaCCS, that relation is no longer noticeable and the results are much closer between the two solvers when using only the CPUs.
Using all the available resources on the four machines allowed PHACT to increase its performance when compared to PaCCS and Gecode, which shows that its greater versatility can lead to improved performance.
Table 5. Elapsed times and speedups on M4, with 32 cores and 2 MICs
6 Conclusion and Future Work
To our knowledge, PHACT is the only constraint solver capable of simultaneously using CPUs, GPUs, MICs and any other device compatible with OpenCL to solve CSPs in a faster manner. Although GPUs are not particularly efficient for this type of problem, they can still speed up the solving process and, in some cases, be even faster than the CPU of the same machine.
PHACT has been tested with 6 different CSPs on 4 different machines with 2 and 3 devices each, namely Intel CPUs and MICs, Nvidia GPUs, and AMD CPUs and GPUs, allowing it to achieve speedups of up to 4.66 when compared with using only the CPU of the machine to solve a single CSP, and a geometric mean speedup of up to 1.53 when solving all the referred CSPs on each machine.
On the four machines used for testing, PHACT achieved a geometric mean speedup that ranged from 1.28 to 2.83 when compared with PaCCS, and from 2.31 to 28.44 when compared with Gecode. The use of all the devices compatible with OpenCL to solve a CSP allowed PHACT to improve its performance against PaCCS and Gecode when compared with using only the CPUs.
Campeotto et al. [4] achieved a top speedup of 6.61 when using one thread of a CPU together with a GPU, while PHACT achieved average speedups between 1.56 and 2.88 when using one thread on a CPU and a GPU, with a top speedup of 15.15, and average and top speedups of 7.33 and 30.63 when replacing the GPU by two MICs. Although their technique of using the GPUs to propagate constraints relating many variables seems to have significant host–device synchronization requirements, we intend to test this approach in the future.
PHACT is still being improved to try to overcome the lack of synchronization between devices when optimizing. The solution may involve more frequent communication between the host and the devices, taking into account the number of solutions already found and increasing the frequency of the communication for problems with more solutions.
As for the unbalanced sub-search spaces that lead to only a few threads working in parallel while the others have already finished their work, we are analysing a work-sharing strategy [18] that may be executed when all the sub-search spaces generated for the block have been exhausted but some threads are still working.
A MiniZinc/FlatZinc [14] reader is also being implemented to allow the direct input of problems already modeled in this language.
Acknowledgments. This work was partially funded by Fundação para a Ciência e Tecnologia (FCT) under grant UID/CEC/4668/2016 (LISP). Some of the experimentation was carried out on the khromeleque cluster of the University of Évora, which was partly funded by grants ALENT-07-0262-FEDER-001872 and ALENT-07-0262-FEDER-001876.
References
1. Arbelaez, A., Codognet, P.: A GPU implementation of parallel constraint-based local search. In: 2014 22nd Euromicro International Conference on PDP, pp. 648–655. IEEE (2014)
2. Barták, R., Salido, M.A.: Constraint satisfaction for planning and scheduling problems. Constraints 16(3), 223–227 (2011)
3. Brailsford, S., Potts, C., Smith, B.: Constraint satisfaction problems: algorithms and applications. Eur. J. Oper. Res. 119, 557–581 (1999)
4. Campeotto, F., Dal Palù, A., Dovier, A., Fioretto, F., Pontelli, E.: Exploring the use of GPUs in constraint solving. In: Flatt, M., Guo, H.-F. (eds.) PADL 2014. LNCS, vol. 8324, pp. 152–167. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-04132-2_11
5. Chu, G., Schulte, C., Stuckey, P.J.: Confidence-based work stealing in parallel constraint programming. In: Gent, I.P. (ed.) CP 2009. LNCS, vol. 5732, pp. 226–241. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04244-7_20
6. Diaz, D., Richoux, F., Codognet, P., Caniou, Y., Abreu, S.: Constraint-based local search for the Costas Array problem. In: Hamadi, Y., Schoenauer, M. (eds.) LION 2012. LNCS, pp. 378–383. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34413-8_31
7. Filho, C., Rocha, D., Costa, M., Albuquerque, P.: Using constraint satisfaction problem approach to solve human resource allocation problems in cooperative health services. Expert Syst. Appl. 39(1), 385–394 (2012)
8. Gaster, B., Howes, L., Kaeli, D., Mistry, P., Schaa, D.: Heterogeneous Computing with OpenCL. Morgan Kaufmann Publishers Inc., San Francisco (2011)
9. Jefferson, C., Miguel, I., Hnich, B., Walsh, T., Gent, I.P.: CSPLib: a problem library for constraints (1999). http://www.csplib.org
10. Jenkins, J., Arkatkar, I., Owens, J.D., Choudhary, A., Samatova, N.F.: Lessons learned from exploring the backtracking paradigm on the GPU. In: Jeannot, E., Namyst, R., Roman, J. (eds.) Euro-Par 2011. LNCS, vol. 6853, pp. 425–437. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23397-5_42
11. Mairy, J.-B., Deville, Y., Lecoutre, C.: Domain k-wise consistency made as simple as generalized arc consistency. In: Simonis, H. (ed.) CPAIOR 2014. LNCS, vol. 8451, pp. 235–250. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07046-9_17
12. MIT: A suite of MiniZinc benchmarks (2017). https://github.com/MiniZinc/minizinc-benchmarks
13. Munshi, A., Gaster, B., Mattson, T.G., Fung, J., Ginsburg, D.: OpenCL Programming Guide, 1st edn. Addison-Wesley Professional, Boston (2011)
14. Nethercote, N., Stuckey, P.J., Becket, R., Brand, S., Duck, G.J., Tack, G.: MiniZinc: towards a standard CP modelling language. In: Bessière, C. (ed.) CP 2007. LNCS, vol. 4741, pp. 529–543. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74970-7_38
15. NVIDIA Corporation: NVIDIA GeForce GTX 980 featuring Maxwell, the most advanced GPU ever made. White paper. NVIDIA Corporation (2014)
16. Pedro, V.: Constraint programming on hierarchical multiprocessor systems. Ph.D. thesis, Universidade de Évora (2012)
17. Pedro, V., Abreu, S.: Distributed work stealing for constraint solving. In: Vidal, G., Zhou, N.F. (eds.) CICLOPS-WLPE 2010, Edinburgh, Scotland, U.K. (2010)
18. Rolf, C.C., Kuchcinski, K.: Load-balancing methods for parallel and distributed constraint solving. In: 2008 IEEE International Conference on Cluster Computing, pp. 304–309, September 2008
19. Rolf, C.C., Kuchcinski, K.: Parallel solving in constraint programming. In: MCC 2010, November 2010
20. Régin, J.-C., Rezgui, M., Malapert, A.: Embarrassingly parallel search. In: Schulte, C. (ed.) CP 2013. LNCS, vol. 8124, pp. 596–610. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40627-0_45
21. Schulte, C.: Parallel search made simple. In: Beldiceanu, N., et al. (eds.) Proceedings of TRICS: CP 2000, Singapore, September 2000
22. Schulte, C., Duchier, D., Konvicka, F., Szokoli, G., Tack, G.: Generic constraint development environment. http://www.gecode.org/