ENCYCLOPEDIA OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY
EDITORS IN CHIEF
Shoba Ranganathan
Macquarie University, Sydney, NSW, Australia
Michael Gribskov
Purdue University, West Lafayette, IN, United States
Kenta Nakai
The University of Tokyo, Tokyo, Japan
Christian Schönbach
Nazarbayev University, School of Science and Technology, Department of Biology, Astana, Kazakhstan
VOLUME 1 Methods
Mario Cannataro
The Magna Græcia University of Catanzaro, Catanzaro, Italy
Christian Schönbach is currently Department Chair and Professor at the Department of Biology, School of Science and Technology, Nazarbayev University, Kazakhstan, and Visiting Professor at the International Research Center for Medical Sciences at Kumamoto University, Japan. He is a bioinformatics practitioner interfacing genetics, immunology and informatics, conducting research on major histocompatibility complex, immune responses following virus infection, biomedical knowledge discovery, peroxisomal diseases, and autism spectrum disorder that resulted in more than 80 publications. His previous academic appointments included Professor at Kumamoto University (2016–2017), Nazarbayev University (2013–2016), Kazakhstan, Kyushu Institute of Technology (2009–2013), Japan, Associate Professor at Nanyang Technological University (2006–2009), Singapore, and Team Leader at RIKEN Genomic Sciences Center (2002–2006), Japan. Other prior positions included Principal Investigator at Kent Ridge Digital Labs, Singapore, and Research Scientist at Chugai Institute for Molecular Medicine, Inc., Japan. In 2018 he became a member of the International Society for Computational Biology (ISCB) Board of Directors. Since 2010 he has served the Asia-Pacific Bioinformatics Network (APBioNet) as Vice-President (Conferences, 2010–2016) and President (2016–2018).
CONTENTS OF VOLUME 1
Editors in Chief
Pre-Processing: A Data Preparation Step
Swarup Roy, Pooja Sharma, Keshab Nath, Dhruba K Bhattacharyya, and Jugal K Kalita
Kernel Machines: Introduction
Italo Zoppis, Giancarlo Mauri, and Riccardo Dondi
Kernel Methods: Support Vector Machines
Italo Zoppis, Giancarlo Mauri, and Riccardo Dondi
Kernel Machines: Applications
Italo Zoppis, Giancarlo Mauri, and Riccardo Dondi
Text Mining Basics in Bioinformatics
Carmen De Maio, Giuseppe Fenza, Vincenzo Loia, and Mimmo Parente
Data-Information-Concept Continuum From a Text Mining Perspective
Danilo Cavaliere, Sabrina Senatore, and Vincenzo Loia
Text Mining for Bioinformatics Using Biomedical Literature
Andre Lamurias and Francisco M Couto
Deep Learning
Massimo Guarascio, Giuseppe Manco, and Ettore Ritacco
Biological and Medical Ontologies: Human Phenotype Ontology (HPO)
Biological and Medical Ontologies: Systems Biology Ontology (SBO)
Raffaele Giancarlo, Daniele Greco, Francesco Landolina, and Simona E Rombo
Max Kotlyar, Chiara Pastrello, Andrea EM Rossos, and Igor Jurisica
Alignment of Protein-Protein Interaction Networks
Swarup Roy, Hazel N Manners, Ahed Elmsallati, and Jugal K Kalita
Visualization of Biomedical Networks
Anne-Christin Hauschild, Chiara Pastrello, Andrea EM Rossos, and Igor Jurisica
Cluster Analysis of Biological Networks
Asuda Sharma, Hesham Ali, and Dario Ghersi
LIST OF CONTRIBUTORS FOR VOLUME 1
Giuseppe Agapito, University “Magna Graecia” of Catanzaro, Catanzaro, Italy
Hesham Ali, University of Nebraska at Omaha, Omaha, NE, United States
Alessia Amelio, University of Calabria, Rende, Italy
Claudia Angelini, Istituto per le Applicazioni del Calcolo “M. Picone”, Napoli, Italy
Fabrizio Angiulli, University of Calabria, Rende, Italy
Matteo Baldoni, University of Turin, Turin, Italy
Cristina Baroglio, University of Turin, Turin, Italy
Denis Bauer, CSIRO, North Ryde, NSW, Australia
Stefano Beretta, University of Milan-Bicocca, Milan, Italy
Federico Bergenti, University of Parma, Parma, Italy
Anna Bernasconi, Politecnico di Milano, Milan, Italy
Daniel Berrar, Tokyo Institute of Technology, Tokyo, Japan
Dhruba K. Bhattacharyya, Tezpur University, Tezpur, India
Mariaconcetta Bilotta, University of Catanzaro, Catanzaro, Italy; and Institute S. Anna of Crotone, Crotone, Italy
Francesco Buccafurri, University of Reggio Calabria, Italy
Massimo Cafaro, University of Salento, Lecce, Italy
Barbara Calabrese, University “Magna Graecia” of Catanzaro, Catanzaro, Italy
Mario Cannataro, University “Magna Graecia” of Catanzaro, Catanzaro, Italy
Maria Francesca Carfora, Istituto per le Applicazioni del Calcolo CNR, Napoli, Italy
Mauro Castelli, NOVA IMS, Universidade Nova de Lisboa, Lisboa, Portugal
Giuseppe Cattaneo, University of Salerno, Fisciano, Italy
Francesco Cauteruccio, University of Calabria, Rende, Italy
Danilo Cavaliere, Università degli Studi di Salerno, Fisciano, Italy
Davide Chicco, Princess Margaret Cancer Centre, Toronto, ON, Canada
Pietro Cinaglia, Magna Graecia University of Catanzaro, Catanzaro, Italy
Francisco M. Couto, Universidade de Lisboa, Lisboa, Portugal
Luisa Cutillo, University of Sheffield, Sheffield, United Kingdom; and Parthenope University of Naples, Naples, Italy
Vincenzo De Angelis, University of Reggio Calabria, Italy
Daniela De Canditiis, Istituto per le Applicazioni del Calcolo “M. Picone”, Rome, Italy
Italia De Feis, Istituto per le Applicazioni del Calcolo CNR, Napoli, Italy
Carmen De Maio, University of Salerno, Fisciano, Italy
Luca Denti, University of Milan-Bicocca, Milan, Italy
Giuseppe Di Fatta, University of Reading, Reading, United Kingdom
Maria T. Di Martino, University “Magna Graecia” of Catanzaro, Catanzaro, Italy
Asuda Sharma, University of Nebraska at Omaha, Omaha, NE, United States
David Simoncini, University of Toulouse, Toulouse, France; and RIKEN, Yokohama, Japan
Ankita Singh, IIT Delhi, New Delhi, India; and Banasthali Vidyapith, Banasthali, India
Giovanni Stracquadanio, University of Essex, Colchester, United Kingdom
Andrea Tagarelli, University of Calabria, Rende, Italy
Roberto Tagliaferri, University of Salerno, Salerno, Italy
Domenico Talia, University of Calabria, Rende, Italy
Giorgio Terracina, University of Calabria, Rende, Italy
Alfredo Tirado-Ramos, University of Texas Health at San Antonio, San Antonio, TX, United States
Paolo Torroni, University of Bologna, Bologna, Italy
Giuseppe Tradigo, University of Calabria, Rende, Italy; and University of Florida, Gainesville, United States
Paolo Trunfio, University of Calabria, Rende, Italy
Andrea Tundis, Darmstadt University of Technology, Darmstadt, Germany
Natalie Twine, CSIRO, North Ryde, NSW, Australia
Domenico Ursino, University “Mediterranea” of Reggio Calabria, Reggio Calabria, Italy
Alfonso Urso, Via Ugo La Malfa, Palermo, Italy
Filippo Utro, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, United States
Leonardo Vanneschi, NOVA IMS, Universidade Nova de Lisboa, Lisboa, Portugal
Pierangelo Veltri, University “Magna Graecia” of Catanzaro, Catanzaro, Italy
Ramakanth C. Venkata, University of Nebraska at Omaha, Omaha, NE, United States
Giuseppe Vizzari, University of Milano-Bicocca, Milan, Italy
Haiying Wang, Ulster University, Newtownabbey, Northern Ireland, United Kingdom
Jyotsna T. Wassan, Ulster University, Newtownabbey, Northern Ireland, United Kingdom
Marco Wiltgen, Graz General Hospital and University Clinics, Graz, Austria
Kam Y. J. Zhang, RIKEN, Yokohama, Japan
Huiru Zheng, Ulster University, Newtownabbey, Northern Ireland, United Kingdom
Italo Zoppis, University of Milan-Bicocca, Milan, Italy
Chiara Zucco, University “Magna Graecia” of Catanzaro, Catanzaro, Italy
Algorithms Foundations
Nadia Pisanti, University of Pisa, Pisa, Italy
© 2019 Elsevier Inc. All rights reserved.
Introduction
Biology offers a huge amount and variety of data to be processed. Such data has to be stored, analysed, compared, searched, classified, et cetera, posing new challenges to many fields of computer science. Among them, algorithmics plays a special role in the analysis of biological sequences, structures, and networks. Indeed, especially due to the flood of data coming from sequencing projects as well as from its downstream analysis, the size of digital biological data to be studied requires the design of very efficient algorithms. Moreover, biology has become, probably more than any other fundamental science, a great source of new algorithmic problems asking for accurate solutions. Nowadays, biologists increasingly need to work with in silico data, and therefore it is important for them to understand why and how an algorithm works, in order to be confident in its results. The goal of this chapter is to give an overview of the fundamentals of algorithm design and evaluation to a non-computer scientist.
Algorithms and Their Complexity
Computationally speaking, a problem is defined by an input/output relation: we are given an input, and we want to return as output a well-defined solution which is a function of the input satisfying some property.
An algorithm is a computational procedure (described by means of an unambiguous sequence of instructions) that has to be executed in order to solve a computational problem. An algorithm solving a given problem is correct if it outputs the right result for every possible input. The algorithm has to be described according to the entity which will execute it: if this is a computer, then the algorithm will have to be written in a programming language.
Example: Sorting Problem
INPUT: A sequence S of n numbers ⟨a1, a2, …, an⟩
OUTPUT: A permutation ⟨a′1, a′2, …, a′n⟩ of S such that a′1 ≤ a′2 ≤ … ≤ a′n
Given a problem, there can be many algorithms that correctly solve it, but in general they will not all be equally efficient. The efficiency of an algorithm is measured as a function of its input size.
For example, a solution for the sorting problem would be to generate all possible permutations of S and, for each one of them, check whether it is sorted. With this procedure, one needs to be lucky to find the right ordering fast, as the number of such permutations (n!) grows at least exponentially in n, and in the average case, as well as in the worst case, this algorithm would require a number of elementary operations (such as writing a value in a memory cell, comparing two values, swapping two values, et cetera) which is exponential in the input size n. In this case, since the worst case cannot be excluded, we say that the algorithm has an exponential time complexity. In computer science, exponential algorithms are considered intractable. An algorithm is, instead, tractable if its complexity function is polynomial in the input size. The complexity of a problem is that of the most efficient algorithm that solves it. Fortunately, the sorting problem is tractable, as there exist tractable solutions that we will describe later.
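The brute-force strategy described above can be sketched in Python as follows. This is only an illustration under our own assumptions: the function names `is_sorted` and `permutation_sort` are hypothetical, and the standard-library `itertools.permutations` is used to enumerate the candidate orderings.

```python
from itertools import permutations


def is_sorted(seq):
    """Return True if seq is in non-decreasing order."""
    return all(seq[k] <= seq[k + 1] for k in range(len(seq) - 1))


def permutation_sort(seq):
    """Try the permutations of seq one by one until a sorted one is found.

    There are n! permutations of n elements, so in the worst case the
    number of candidates examined grows faster than any polynomial in n.
    Only one candidate is held in memory at a time.
    """
    for candidate in permutations(seq):
        if is_sorted(candidate):
            return list(candidate)
    return []  # not reached: some permutation is always sorted


print(permutation_sort([3, 2, 7, 1]))  # [1, 2, 3, 7]
```

Already for a few dozen elements the number of candidate permutations is far beyond what any computer can enumerate, which is what makes this approach intractable.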
In order to evaluate the running time of an algorithm independently of the specific hardware on which it is executed, it is computed in terms of the number of simple operations, each of which is assigned a unit cost or, in any case, a cost which is constant with respect to the input size. A constant running time is a negligible cost, as it does not grow when the input size does; moreover, a constant term added to a higher-degree polynomial in n is also negligible; furthermore, even a constant factor multiplying a higher-degree polynomial is considered negligible in running time analysis. What counts is the growth rate with respect to the input size, i.e., the asymptotic complexity T(n) as the input size n grows. In computational complexity theory, this is formalized using the big-O notation, which disregards both coefficients and lower-order terms: the asymptotic time complexity T(n) of an algorithm is in O(f(n)) if there exist n₀ and c > 0 such that T(n) ≤ c·f(n) for all n ≥ n₀. For example, an algorithm that scans an input of size n a constant number of times, and then performs a constant number of some other operations, takes O(n) time, and is said to have linear time complexity. An algorithm that takes linear time only in the worst case is also said to be in O(n), because the big-O notation represents an upper bound. There is also an asymptotic complexity notation Ω(f(n)) for the lower bound: T(n) = Ω(f(n)) whenever f(n) = O(T(n)). A third notation Θ(f(n)) denotes asymptotic equivalence: we write T(n) = Θ(f(n)) if both T(n) = O(f(n)) and f(n) = O(T(n)) hold. For example, an algorithm that always performs a linear scan of the input, and not just in the worst case, has time complexity in Θ(n). Finally, an algorithm which needs to at least read, hence scan, the whole input of size n (and possibly also perform more costly tasks) has time complexity in Ω(n).
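As a small worked instance of these definitions, consider a running time of the form T(n) = 3n + 5. This particular function is an illustrative assumption of ours, not one taken from the text; it only serves to show how the constants c and n₀ can be chosen.

```latex
% Illustrative assumption: T(n) = 3n + 5
\begin{align*}
T(n) &= 3n + 5 \le 3n + n = 4n && \text{for all } n \ge 5, \\
     &\Rightarrow\; T(n) = O(n) \text{ with } c = 4,\ n_0 = 5; \\
T(n) &= 3n + 5 \ge 3n \ge n && \text{for all } n \ge 1, \\
     &\Rightarrow\; n = O(T(n)), \text{ i.e. } T(n) = \Omega(n); \\
     &\Rightarrow\; T(n) = \Theta(n).
\end{align*}
```

The coefficient 3 and the additive term 5 both disappear in the final bound, as expected from a notation that ignores constant factors and lower-order terms.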
Time complexity is not the only cost parameter of an algorithm: space complexity is also relevant to evaluate its efficiency. By space complexity, computer scientists do not mean the size of the program describing an algorithm, but rather the data structures it actually keeps in memory during its execution. As for time complexity, the concern is about how much memory the execution takes in the worst case and with respect to the input size. For example, an algorithm solving the sorting problem without requiring any additional data structure (besides possibly a constant number of constant-size variables) would have linear space complexity. The exponential time complexity algorithm we described above also has linear space complexity: at each step, it suffices to keep in memory only one permutation of S, as those previously attempted can be discarded. This observation offers an example of why, often, time complexity is of more concern than space complexity. The reason is not that space is less relevant than time, but rather that space complexity is in practice a lower bound of (and thus no larger than) time complexity: if an algorithm has to write and/or read a certain amount of data, then it necessarily has to perform at least that number of elementary steps (Cormen et al., 2009; Jones and Pevzner, 2004).
Iterative Algorithms
An iterative algorithm is an algorithm which repeats the same sequence of actions several times; the number of such times does not need to be known a priori, but it has to be finite. In programming languages, there are basically two kinds of iterative commands: the for command repeats the actions a number of times which is computed, or anyhow known, before the iterations begin; the while command, instead, performs the actions as long as a certain given condition is satisfied, and the number of times this will occur is not known a priori. What we call here an action is a command which can be, in turn, iterative itself. The cost of an iterative command is the cost of its actions multiplied by the number of iterations.
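To make the distinction concrete, here is a minimal Python sketch of the two kinds of iterative commands, together with a nested loop whose cost is the product of the two iteration counts. The function names (`sum_first`, `first_index_at_least`, `count_pairs`) are hypothetical and chosen only for this illustration.

```python
def sum_first(values, k):
    """for command: the number of iterations (k) is known before the loop starts."""
    total = 0
    for i in range(k):
        total += values[i]
    return total


def first_index_at_least(values, threshold):
    """while command: how many iterations occur depends on the data."""
    i = 0
    while i < len(values) and values[i] < threshold:
        i += 1
    return i  # equals len(values) if no element reaches the threshold


def count_pairs(n):
    """Nested iteration: a constant-time action repeated n times, n times over."""
    actions = 0
    for i in range(n):
        for j in range(n):
            actions += 1  # the constant-cost action
    return actions  # n * n


print(sum_first([4, 1, 3, 9], 3))             # 8
print(first_index_at_least([1, 2, 5, 7], 5))  # 2
print(count_pairs(4))                         # 16
```

In `count_pairs`, the inner for is itself the action of the outer for, so the total cost is the cost of one inner loop (n actions) multiplied by the n outer iterations.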
From now on, in this article we will describe an algorithm by means of so-called pseudocode: an informal description of a real computer program, which is a mixture of natural language and keywords representing commands that are typical of programming languages. To this purpose, before exhibiting an example of an iterative algorithm for the sorting problem, we introduce the syntax of a fundamental elementary command: the assignment “x ← E”, whose effect is to assign the value of an expression E to the variable x, and whose time cost is constant, provided that computing the value of E, which can in turn contain variables as well as calls of functions, also takes constant time. We will assume that the input sequence S of the sorting problem is given as an array: an array is a data structure of known fixed length that contains elements of the same type (in this case numbers). The i-th element of array S is denoted by S[i], and reading or writing S[i] takes constant time. Swapping two values of the array also takes constant time, and we will denote this as a single command in our pseudocode, even if in practice it will be implemented by a few operations that use a third temporary variable. What follows is the pseudocode of an algorithm that solves the sorting problem in polynomial time.
INSERTION-SORT(S, n)
  for i = 1 to n − 1 do
    j ← i
    while (j > 0 and S[j − 1] > S[j])
      swap S[j] and S[j − 1]
      j ← j − 1
    endwhile
  endfor
INSERTION-SORT takes as input the array S and its size n. It works iteratively by inserting the elements, one after the other, into the partially sorted S. The array is indexed from 0 to n − 1, and a for command performs actions for each i in the interval [1, n − 1] so that at the end of iteration i, the left end of the array up to its i-th position is sorted. This is realized by means of another iterative command, nested into the first one, that uses a second index j that starts from i, compares S[j] (the new element) with its predecessor, and possibly swaps them so that S[j] moves down towards its right position; then j is decreased and the task is repeated until S[j] has reached its correct position; this inner iterative command is a while command because this task has to be performed as long as the predecessor of S[j] is larger than it.
Example: Let us consider S = [3, 2, 7, 1]. Recall that arrays are indexed from position 0 (that is, S[0] = 3, S[1] = 2, and so on).
INSERTION-SORT for i = 1 sets j = 1 as well, and then executes the while because j = 1 > 0 and S[0] > S[1]: these two values are swapped and j becomes 0, so that the while command ends with S = [2, 3, 7, 1]. Then a new for iteration starts with i = 2 (notice that at this time, correctly, S is sorted up to S[1]), and S[2] is taken into account; this time the while command is entered with j = 2 and its condition is not satisfied (as S[2] > S[1]), so that the while immediately ends without changing S: the first three values of S are already sorted. Finally, the last for iteration with i = 3 will execute the while three times (that is, n − 1), swapping 1 with 7, then with 3, and finally with 2, leading to S = [1, 2, 3, 7], which is the correct output.
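For readers who want to run the procedure, the following is a direct Python transcription of the INSERTION-SORT pseudocode. It is a sketch rather than a reference implementation: the function name `insertion_sort` and the optional `trace` flag (which prints the array after each iteration of the outer loop) are our own additions.

```python
def insertion_sort(s, trace=False):
    """Sort the list s in place, following the INSERTION-SORT pseudocode."""
    n = len(s)
    for i in range(1, n):                        # i = 1, ..., n - 1
        j = i
        while j > 0 and s[j - 1] > s[j]:
            s[j], s[j - 1] = s[j - 1], s[j]      # swap S[j] and S[j - 1]
            j -= 1
        if trace:
            print(f"after i = {i}: {s}")
    return s


insertion_sort([3, 2, 7, 1], trace=True)
# after i = 1: [2, 3, 7, 1]
# after i = 2: [2, 3, 7, 1]
# after i = 3: [1, 2, 3, 7]
```

On the input S = [3, 2, 7, 1], the array after each iteration of the outer loop is [2, 3, 7, 1], then [2, 3, 7, 1] again (no swap is needed for the value 7), and finally [1, 2, 3, 7] after the last iteration, i = 3.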
INSERTION-SORT takes at least linear time (that is, its time complexity is in Ω(n)) because all elements of S must be read, and indeed the for command is executed Θ(n) times: once per array position from the second to the last. The invariant is that at the beginning of each such iteration, the array is sorted up to position S[i − 1], and then the new value at S[i] is processed. Each iteration of the for, besides the constant time (hence negligible) assignment j ← i, executes the while command. The latter checks its condition (in constant time) and, if the newly read element S[j] is greater than, or equal to, S[j − 1] (which is the largest of the so far sorted array), then it does nothing; else, it swaps S[j] and S[j − 1], decreases j, checks the condition again, and possibly repeats these actions, until either S[j] finds its place after a value that is not larger, or it becomes the new first element of S as it is the smallest found so far. Therefore, the actions of the while command are never executed if the array is already sorted. This is the best case time complexity of INSERTION-SORT: linear in the input size n. The worst case is, instead, when the input array is sorted in the reverse order: in this case, at each iteration i, the while command has to perform exactly i swaps to let S[j] move down to the first position. Therefore, in this case, iteration i of the for takes i steps, and there are n − 1 such iterations, one for each 1 ≤ i ≤ n − 1. Hence, the worst case running time is
\[
T(n) = \sum_{i=1}^{n-1} i = \frac{n(n-1)}{2} = \Theta(n^2).
\]
As for space complexity, INSERTION-SORT works within the input array plus a constant number of temporary variables, and hence it has linear space complexity. Since n is also a lower bound (the whole array must be stored), in this case the space complexity is optimal.
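The best and worst cases of the running time discussed above can be checked empirically by counting swaps. The sketch below is illustrative only: `count_swaps` is a hypothetical helper that runs the insertion sort strategy on a copy of its input and returns the number of swaps performed.

```python
def count_swaps(values):
    """Run insertion sort on a copy of values and return the number of swaps."""
    s = list(values)
    swaps = 0
    for i in range(1, len(s)):
        j = i
        while j > 0 and s[j - 1] > s[j]:
            s[j], s[j - 1] = s[j - 1], s[j]
            j -= 1
            swaps += 1
    return swaps


n = 6
print(count_swaps(range(n)))              # already sorted: 0 swaps
print(count_swaps(range(n - 1, -1, -1)))  # reverse order: 15 = n(n - 1)/2 swaps
```

For an already sorted input the while body never runs, while for a reversed input of size n the count is n(n − 1)/2, in agreement with a quadratic worst case.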
The algorithm we just described is an example of an iterative algorithm that realises a quite intuitive sorting strategy; indeed, this algorithm is often explained as the way we would sort playing cards in one hand by using the other hand to iteratively insert each new card in its correct position. Iteration is powerful enough to achieve, for our sorting problem, a polynomial time solution, although an almost trivial one; the time complexity of INSERTION-SORT cannot however be proved to be optimal, as the lower bound for the sorting problem is not n², but rather n log₂ n (result not proved here). In order to achieve O(n log₂ n) time complexity we need an even more powerful paradigm that we will introduce in the next section.
Recursive Algorithms
A recursive algorithm is an algorithm which, among its commands, recursively calls itself on smaller instances: it splits the main problem into subproblems, recursively solves them and combines their solutions in order to build up the solution of the original problem. There is a fascinating mathematical foundation, that goes back to the arithmetic of Peano, and even further back to induction theory, for the conditions that guarantee correctness of a recursive algorithm. We will omit details of this involved mathematical framework. Surprisingly enough, for a computer this apparently very complex paradigm is easy to implement by means of a simple data structure (the stack).
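As a hedged illustration of how recursion relates to a stack, the sketch below sums an array in two ways: by a recursive function that splits the range into two halves, and by a loop that keeps the pending subproblems on an explicit stack. The task and the names `total_recursive` and `total_with_stack` are our own choices for this example; an actual language runtime manages its call stack internally and in more detail than shown here.

```python
def total_recursive(values, lo, hi):
    """Sum values[lo..hi] (inclusive) by recursively summing the two halves."""
    if lo > hi:
        return 0                      # empty range
    if lo == hi:
        return values[lo]             # base case: a single element
    mid = (lo + hi) // 2
    return total_recursive(values, lo, mid) + total_recursive(values, mid + 1, hi)


def total_with_stack(values):
    """Same splitting strategy, with pending subproblems kept on an explicit stack."""
    if not values:
        return 0
    stack = [(0, len(values) - 1)]    # each entry is a range still to be processed
    total = 0
    while stack:
        lo, hi = stack.pop()
        if lo == hi:
            total += values[lo]
        else:
            mid = (lo + hi) // 2
            stack.append((lo, mid))
            stack.append((mid + 1, hi))
    return total


data = [3, 2, 7, 1]
print(total_recursive(data, 0, len(data) - 1))  # 13
print(total_with_stack(data))                   # 13
```

Both functions visit exactly the same subranges; the second one simply makes visible the bookkeeping that the first leaves to the machine.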
In order to show how powerful induction is, we will again use our Sorting Problem running example. Namely, we describe here the recursive MERGE-SORT algorithm, which achieves Θ(n log₂ n) time complexity, and is thus optimal. Basically, the algorithm MERGE-SORT splits the array into two halves, sorts them (by means of two recursive calls on as many sub-arrays of size n/2 each), and then merges the outcomes into a whole sorted array. The two recursive calls, in turn, will recursively split again into sub-arrays of size n/4, and so on, until the base case (the already sorted sub-array of size 1) is reached. The merging procedure will be implemented by the function MERGE (pseudocode not shown), which takes as input the array and the starting and ending positions of its portions that contain the two contiguous sub-arrays to be merged. Recalling that the two half-arrays to be merged are sorted, MERGE simply uses two indices sliding along them from left to right and, at each step, makes a comparison, writes the smaller value, and increases the index of the sub-array which contained it. This is done until both sub-arrays have been entirely written into the result.
MERGE-SORT(S, p, r)
  if p < r then
    q ← ⌊(p + r)/2⌋
    MERGE-SORT(S, p, q)
    MERGE-SORT(S, q + 1, r)
    MERGE(S, p, q, r)
  endif
Given the need of calling the algorithm on different array fragments, the input parameters, besides S itself, will be the starting and ending positions of the portion of the array to be sorted. Therefore, the first call will be MERGE-SORT(S, 0, n − 1). Then the index q which splits S into two halves is computed, and the two sub-arrays so found are sorted by means of as many recursive calls; the two resulting sorted arrays of size n/2 are then fused by MERGE into the final result. The correctness of the recursion follows from the fact that the recursive call is done on an array of half the length, and from the termination condition “p < r”: if this holds, then the recursion goes on; else (p = r) there is nothing to do as the array has length 1 and it is sorted. Notice, indeed, that if S is not empty, then p > r can never hold, as q is computed such that p ≤ q < r.
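A runnable Python sketch of MERGE-SORT is given below. Since the text does not show the pseudocode of MERGE, the `merge` function here is only one possible implementation, written under our own assumptions: it copies the two sorted halves into temporary lists (thus using auxiliary memory linear in the size of the merged portion) and then writes the values back into S with two sliding indices.

```python
def merge(s, p, q, r):
    """Merge the sorted portions s[p..q] and s[q+1..r] (inclusive bounds) in place.

    Assumption: temporary copies of the two halves are used; the source text
    does not specify how MERGE is implemented.
    """
    left, right = s[p:q + 1], s[q + 1:r + 1]
    i = j = 0
    k = p
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:       # write the smaller value, advance its index
            s[k] = left[i]
            i += 1
        else:
            s[k] = right[j]
            j += 1
        k += 1
    # One half is exhausted: copy what remains of the other one.
    s[k:r + 1] = left[i:] + right[j:]


def merge_sort(s, p, r):
    """Sort s[p..r] in place, following the MERGE-SORT pseudocode."""
    if p < r:
        q = (p + r) // 2              # floor of the midpoint
        merge_sort(s, p, q)
        merge_sort(s, q + 1, r)
        merge(s, p, q, r)


data = [3, 2, 7, 1]
merge_sort(data, 0, len(data) - 1)
print(data)  # [1, 2, 3, 7]
```

Each call to `merge` on a portion of length m performs at most m − 1 comparisons and writes m values, which is consistent with a linear-time merging step.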
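The algorithm MERGE-SORT has linear (hence optimal) space complexity as it only uses S itself plus a constant number of variables. The time complexity T(n) of MERGE-SORT can be defined by the following recurrence relation:
\[
T(n) =
\begin{cases}
\Theta(1) & \text{if } n = 1 \\
2\,T(n/2) + \Theta(n) & \text{if } n > 1
\end{cases}
\]
because, with an input of size n, MERGE-SORT calls itself twice on arrays of size n/2, and then calls MERGE which takes, as we showed above, Θ(n) time.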
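The growth of such a recurrence can be explored numerically. In the sketch below we make simplifying assumptions that are ours and not the text's: the hidden constants are set to 1 (so that T(1) = 1 and T(n) = 2T(n/2) + n), and n is restricted to powers of two so that n/2 is always an integer. The function name `t` is hypothetical.

```python
from math import log2


def t(n):
    """T(1) = 1 and T(n) = 2 T(n/2) + n, for n a power of two (constants set to 1)."""
    if n == 1:
        return 1
    return 2 * t(n // 2) + n


for k in range(1, 6):
    n = 2 ** k
    # Under these assumptions T(n) equals n log2(n) + n exactly.
    print(n, t(n), n * log2(n) + n)
```

Under these assumptions the values coincide with n log₂ n + n, whose dominant term is n log₂ n.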
We now show by induction on n that T(n) = Θ(n log₂ n). The base case is simple: if n = 1 then S is already sorted and, correctly, MERGE-SORT does nothing and ends in Θ(1) time. If n > 1, assuming that T(n′) = Θ(n′ log₂ n′) holds for n′ < n, then we have