Encyclopedia of bioinformatics and computational biology: abc of bioinformatics shoba ranganathan -

ENCYCLOPEDIA OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY


EDITORS IN CHIEF

Shoba Ranganathan

Macquarie University, Sydney, NSW, Australia

Michael Gribskov

Purdue University, West Lafayette, IN, United States

Kenta Nakai

The University of Tokyo, Tokyo, Japan

Christian Schönbach

Nazarbayev University, School of Science and Technology, Department of Biology, Astana, Kazakhstan

VOLUME 1 Methods

Mario Cannataro

The Magna Græcia University of Catanzaro, Catanzaro, Italy

Christian Schönbach is currently Department Chair and Professor at the Department of Biology, School of Science and Technology, Nazarbayev University, Kazakhstan and Visiting Professor at the International Research Center for Medical Sciences at Kumamoto University, Japan. He is a bioinformatics practitioner interfacing genetics, immunology and informatics, conducting research on major histocompatibility complex, immune responses following virus infection, biomedical knowledge discovery, peroxisomal diseases, and autism spectrum disorder that resulted in more than 80 publications. His previous academic appointments included Professor at Kumamoto University (2016–2017), Nazarbayev University (2013–2016), Kazakhstan, Kyushu Institute of Technology (2009–2013) Japan, Associate Professor at Nanyang Technological University (2006–2009), Singapore, and Team Leader at RIKEN Genomic Sciences Center (2002–2006), Japan. Other prior positions included Principal Investigator at Kent Ridge Digital Labs, Singapore and Research Scientist at Chugai Institute for Molecular Medicine, Inc., Japan. In 2018 he became a member of the International Society for Computational Biology (ISCB) Board of Directors. Since 2010 he has served the Asia-Pacific Bioinformatics Network (APBioNet) as Vice-President (Conferences 2010–2016) and President (2016–2018).

CONTENTS OF VOLUME 1

Editors in Chief

Pre-Processing: A Data Preparation Step
Swarup Roy, Pooja Sharma, Keshab Nath, Dhruba K Bhattacharyya, and Jugal K Kalita

Kernel Machines: Introduction
Italo Zoppis, Giancarlo Mauri, and Riccardo Dondi

Kernel Methods: Support Vector Machines
Italo Zoppis, Giancarlo Mauri, and Riccardo Dondi

Kernel Machines: Applications
Italo Zoppis, Giancarlo Mauri, and Riccardo Dondi

Text Mining Basics in Bioinformatics
Carmen De Maio, Giuseppe Fenza, Vincenzo Loia, and Mimmo Parente

Data-Information-Concept Continuum From a Text Mining Perspective
Danilo Cavaliere, Sabrina Senatore, and Vincenzo Loia

Text Mining for Bioinformatics Using Biomedical Literature
Andre Lamurias and Francisco M Couto

Deep Learning
Massimo Guarascio, Giuseppe Manco, and Ettore Ritacco

Biological and Medical Ontologies: Human Phenotype Ontology (HPO)

Biological and Medical Ontologies: Systems Biology Ontology (SBO)

Raffaele Giancarlo, Daniele Greco, Francesco Landolina, and Simona E Rombo

Max Kotlyar, Chiara Pastrello, Andrea E M Rossos, and Igor Jurisica

Alignment of Protein-Protein Interaction Networks
Swarup Roy, Hazel N Manners, Ahed Elmsallati, and Jugal K Kalita

Visualization of Biomedical Networks
Anne-Christin Hauschild, Chiara Pastrello, Andrea E M Rossos, and Igor Jurisica

Cluster Analysis of Biological Networks
Asuda Sharma, Hesham Ali, and Dario Ghersi

LIST OF CONTRIBUTORS FOR VOLUME 1

Giuseppe Agapito
University “Magna Graecia” of Catanzaro, Catanzaro, Italy

Hesham Ali
University of Nebraska at Omaha, Omaha, NE, United States

Alessia Amelio
University of Calabria, Rende, Italy

Claudia Angelini
Istituto per le Applicazioni del Calcolo “M. Picone”, Napoli, Italy

Fabrizio Angiulli
University of Calabria, Rende, Italy

Matteo Baldoni
University of Turin, Turin, Italy

Cristina Baroglio
University of Turin, Turin, Italy

Denis Bauer
CSIRO, North Ryde, NSW, Australia

Stefano Beretta
University of Milano-Bicocca, Milan, Italy

Federico Bergenti
University of Parma, Parma, Italy

Anna Bernasconi
Politecnico di Milano, Milan, Italy

Daniel Berrar
Tokyo Institute of Technology, Tokyo, Japan

Dhruba K. Bhattacharyya
Tezpur University, Tezpur, India

Mariaconcetta Bilotta
University of Catanzaro, Catanzaro, Italy; and Institute S. Anna of Crotone, Crotone, Italy

Francesco Buccafurri
University of Reggio Calabria, Italy

Massimo Cafaro
University of Salento, Lecce, Italy

Barbara Calabrese
University “Magna Graecia” of Catanzaro, Catanzaro, Italy

Mario Cannataro
University “Magna Graecia” of Catanzaro, Catanzaro, Italy

Maria Francesca Carfora
Istituto per le Applicazioni del Calcolo CNR, Napoli, Italy

Mauro Castelli
NOVA IMS, Universidade Nova de Lisboa, Lisboa, Portugal

Giuseppe Cattaneo
University of Salerno, Fisciano, Italy

Francesco Cauteruccio
University of Calabria, Rende, Italy

Danilo Cavaliere
Università degli Studi di Salerno, Fisciano, Italy

Davide Chicco
Princess Margaret Cancer Centre, Toronto, ON, Canada

Pietro Cinaglia
Magna Graecia University of Catanzaro, Catanzaro, Italy

Francisco M. Couto
Universidade de Lisboa, Lisboa, Portugal

Luisa Cutillo
University of Sheffield, Sheffield, United Kingdom; and Parthenope University of Naples, Naples, Italy

Vincenzo De Angelis
University of Reggio Calabria, Italy

Daniela De Canditiis
Istituto per le Applicazioni del Calcolo “M. Picone”, Rome, Italy

Italia De Feis
Istituto per le Applicazioni del Calcolo CNR, Napoli, Italy

Carmen De Maio
University of Salerno, Fisciano, Italy

Luca Denti
University of Milano-Bicocca, Milan, Italy

Giuseppe Di Fatta
University of Reading, Reading, United Kingdom

Maria T. Di Martino
University “Magna Graecia” of Catanzaro, Catanzaro, Italy

Asuda Sharma
University of Nebraska at Omaha, Omaha, NE, United States

David Simoncini
University of Toulouse, Toulouse, France; and RIKEN, Yokohama, Japan

Ankita Singh
IIT Delhi, New Delhi, India; and Banasthali Vidyapith, Banasthali, India

Giovanni Stracquadanio
University of Essex, Colchester, United Kingdom

Andrea Tagarelli
University of Calabria, Rende, Italy

Roberto Tagliaferri
University of Salerno, Salerno, Italy

Domenico Talia
University of Calabria, Rende, Italy

Giorgio Terracina
University of Calabria, Rende, Italy

Alfredo Tirado-Ramos
University of Texas Health at San Antonio, San Antonio, TX, United States

Paolo Torroni
University of Bologna, Bologna, Italy

Giuseppe Tradigo
University of Calabria, Rende, Italy; and University of Florida, Gainesville, United States

Paolo Trunfio
University of Calabria, Rende, Italy

Andrea Tundis
Darmstadt University of Technology, Darmstadt, Germany

Natalie Twine
CSIRO, North Ryde, NSW, Australia

Domenico Ursino
University “Mediterranea” of Reggio Calabria, Reggio Calabria, Italy

Alfonso Urso
Via Ugo La Malfa, Palermo, Italy

Filippo Utro
IBM Thomas J. Watson Research Center, Yorktown Heights, NY, United States

Leonardo Vanneschi
NOVA IMS, Universidade Nova de Lisboa, Lisboa, Portugal

Pierangelo Veltri
University “Magna Graecia” of Catanzaro, Catanzaro, Italy

Ramakanth C. Venkata
University of Nebraska at Omaha, Omaha, NE, United States

Giuseppe Vizzari
University of Milano-Bicocca, Milan, Italy

Haiying Wang
Ulster University, Newtownabbey, Northern Ireland, United Kingdom

Jyotsna T. Wassan
Ulster University, Newtownabbey, Northern Ireland, United Kingdom

Marco Wiltgen
Graz General Hospital and University Clinics, Graz, Austria

Kam Y. J. Zhang
RIKEN, Yokohama, Japan

Huiru Zheng
Ulster University, Newtownabbey, Northern Ireland, United Kingdom

Italo Zoppis
University of Milano-Bicocca, Milan, Italy

Chiara Zucco
University “Magna Graecia” of Catanzaro, Catanzaro, Italy

Algorithms Foundations

Nadia Pisanti, University of Pisa, Pisa, Italy

© 2019 Elsevier Inc. All rights reserved.

Introduction

Biology offers a huge amount and variety of data to be processed. Such data has to be stored, analysed, compared, searched, classified, etcetera, feeding many fields of computer science with new challenges. Among them, algorithmics plays a special role in the analysis of biological sequences, structures, and networks. Indeed, especially due to the flood of data coming from sequencing projects as well as from its downstream analysis, the size of the digital biological data to be studied requires the design of very efficient algorithms. Moreover, biology has become, probably more than any other fundamental science, a great source of new algorithmic problems asking for accurate solutions. Nowadays, biologists increasingly need to work with in silico data, and therefore it is important for them to understand why and how an algorithm works, in order to be confident in its results. The goal of this chapter is to give an overview of the fundamentals of algorithm design and evaluation to a non-computer scientist.

Algorithms and Their Complexity

Computationally speaking, a problem is defined by an input/output relation: we are given an input, and we want to return as output a well defined solution which is a function of the input satisfying some property.

An algorithm is a computational procedure (described by means of an unambiguous sequence of instructions) that has to be executed in order to solve a computational problem. An algorithm solving a given problem is correct if it outputs the right result for every possible input. The algorithm has to be described according to the entity which will execute it: if this is a computer, then the algorithm will have to be written in a programming language.

Example: Sorting Problem

INPUT: A sequence S of n numbers ⟨a₁, a₂, …, aₙ⟩

OUTPUT: A permutation ⟨a′₁, a′₂, …, a′ₙ⟩ of S such that a′₁ ≤ a′₂ ≤ … ≤ a′ₙ

Given a problem, there can be many algorithms that correctly solve it, but in general they will not all be equally efficient. The efficiency of an algorithm is a function of its input size.

For example, a solution for the sorting problem would be to generate all possible permutations of S and, for each one of them, check whether it is sorted. With this procedure, one needs to be lucky to find the right sorting fast, as there is an exponential (in n) number of such permutations and in the average case, as well as in the worst case, this algorithm would require a number of elementary operations (such as writing a value in a memory cell, comparing two values, swapping two values, etcetera) which is exponential in the input size n. In this case, since the worst case cannot be excluded, we say that the algorithm has an exponential time complexity. In computer science, exponential algorithms are considered intractable. An algorithm is, instead, tractable, if its complexity function is polynomial in the input size. The complexity of a problem is that of the most efficient algorithm that solves it. Fortunately, the sorting problem is tractable, as there exist tractable solutions that we will describe later.
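The brute-force strategy just described can be sketched in Python as follows. This is only an illustration: the function names `is_sorted` and `permutation_sort` are hypothetical and do not come from the chapter, and the standard-library `itertools.permutations` is used to enumerate the candidate orderings one at a time.

```python
from itertools import permutations


def is_sorted(seq):
    """Return True if seq is in non-decreasing order."""
    return all(seq[k] <= seq[k + 1] for k in range(len(seq) - 1))


def permutation_sort(s):
    """Sort by trying permutations of s until a sorted one is found.

    Returns the sorted list and the number of permutations examined.
    Only one candidate permutation is held in memory at a time.
    """
    tried = 0
    for candidate in permutations(s):
        tried += 1
        if is_sorted(candidate):
            return list(candidate), tried
    return [], tried  # not reached: some permutation is always sorted


result, tried = permutation_sort([4, 3, 2, 1])
print(result, tried)
```

On a reverse-sorted input the sorted ordering is the last one enumerated, so all n! permutations are examined; already for a few dozen elements this is out of reach.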

In order to evaluate the running time of an algorithm independently from the specific hardware on which it is executed, this is computed in terms of the number of simple operations, each of which is assigned a unitary cost or, at any rate, a cost which is constant with respect to the input size. A constant running time is a negligible cost, as it does not grow when the input size does; moreover, a constant term added to a higher degree polynomial in n is also negligible; furthermore, even a constant factor multiplying a higher polynomial is considered negligible in running time analysis. What counts is the growth factor with respect to the input size, i.e. the asymptotic complexity T(n) as the input size n grows. In computational complexity theory, this is formalized using the big-O notation that excludes both coefficients and lower order terms: the asymptotic time complexity T(n) of an algorithm is in O(f(n)) if there exist n₀ and c > 0 such that T(n) ≤ c f(n) for all n ≥ n₀. For example, an algorithm that scans an input of size n a constant number of times, and then performs a constant number of some other operations, takes O(n) time, and is said to have linear time complexity. An algorithm that takes linear time only in the worst case is also said to be in O(n), because the big-O notation represents an upper bound. There is also an asymptotic complexity notation Ω(f(n)) for the lower bound: T(n) = Ω(f(n)) whenever f(n) = O(T(n)). A third notation Θ(f(n)) denotes asymptotic equivalence: we write T(n) = Θ(f(n)) if both T(n) = O(f(n)) and f(n) = O(T(n)) hold. For example, an algorithm that always performs a linear scan of the input, and not just in the worst case, has time complexity in Θ(n). Finally, an algorithm which needs to at least read, hence scan, the whole input of size n (and possibly also perform more costly tasks), has time complexity in Ω(n).
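As a small worked instance of these definitions, consider a hypothetical running time T(n) = 3n + 5 (the function is chosen here purely for illustration and is not taken from the chapter). The constants c and n₀ required by the definition can be exhibited explicitly:

```latex
\begin{align*}
T(n) &= 3n + 5 \le 3n + n = 4n \quad \text{for all } n \ge 5
  && \Rightarrow\ T(n) = O(n) \text{ with } c = 4,\ n_0 = 5, \\
n &\le 3n + 5 = T(n) \quad \text{for all } n \ge 1
  && \Rightarrow\ n = O(T(n)), \text{ i.e. } T(n) = \Omega(n), \\
&&& \Rightarrow\ T(n) = \Theta(n).
\end{align*}
```

The same T(n) is also in O(n²), since big-O is only an upper bound, but it is not in Θ(n²).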

Time complexity is not the only cost parameter of an algorithm: space complexity is also relevant to evaluate its efficiency. By space complexity, computer scientists do not mean the size of the program describing an algorithm, but rather the data structures it actually keeps in memory during its execution. As for time complexity, the concern is about how much memory the execution takes in the worst case and with respect to the input size. For example, an algorithm solving the sorting problem without requiring any additional data structure (besides possibly a constant number of constant-size variables), would have linear space complexity. Also the exponential time complexity algorithm we described above has linear space complexity: at each step, it suffices to keep in memory only one permutation of S, as those previously attempted can be discarded. This observation offers an example of why, often, time complexity is of more concern than space complexity. The reason is not that space is less relevant than time, but rather that space complexity is in practice a lower bound of (and thus no larger than) time complexity: if an algorithm has to write and/or read a certain amount of data, then it necessarily has to perform at least that amount of elementary steps (Cormen et al., 2009; Jones and Pevzner, 2004).

Iterative Algorithms

An iterative algorithm is an algorithm which repeats the same sequence of actions several times; the number of such times does not need to be known a priori, but it has to be finite. In programming languages, there are basically two kinds of iterative commands: the for command repeats the actions a number of times which is computed, or anyhow known, before the iterations begin; the while command, instead, performs the actions as long as a certain given condition is satisfied, and the number of times this will occur is not known a priori. What we call here an action is a command which can be, in turn, again iterative. The cost of an iterative command is the cost of its actions multiplied by the number of iterations.

From now on, in this article we will describe an algorithm by means of the so-called pseudocode: an informal description of a real computer program, which is a mixture of natural language and keywords representing commands that are typical of programming languages. To this purpose, before exhibiting an example of an iterative algorithm for the sorting problem, we introduce the syntax of a fundamental elementary command: the assignment “x ← E”, whose effect is to assign the value of an expression E to the variable x, and whose time cost is constant, provided that computing the value of E, which can in turn contain variables as well as calls of functions, is also constant. We will assume that the input sequence S of the sorting problem is given as an array: an array is a data structure of known fixed length that contains elements of the same type (in this case numbers). The i-th element of array S is denoted by S[i], and reading or writing S[i] takes constant time. Also swapping two values of the array takes constant time, and we will denote this as a single command in our pseudocode, even if in practice it will be implemented by a few operations that use a third temporary variable. What follows is the pseudocode of an algorithm that solves the sorting problem in polynomial time.

INSERTION-SORT(S, n)
  for i = 1 to n − 1 do
    j ← i
    while (j > 0 and S[j − 1] > S[j])
      swap S[j] and S[j − 1]
      j ← j − 1
    endwhile
  endfor

INSERTION-SORT takes in input the array S and its size n. It works iteratively by inserting into the partially sorted S the elements one after the other. The array is indexed from 0 to n − 1, and a for command performs actions for each i in the interval [1, n − 1] so that at the end of iteration i, the left end of the array up to its i-th position is sorted. This is realized by means of another iterative command, nested into the first one, that uses a second index j that starts from i, compares S[j] (the new element) with its predecessor, and possibly swaps them so that S[j] moves down towards its right position; then j is decreased and the task is repeated until S[j] has reached its correct position; this inner iterative command is a while command because this task has to be performed as long as the predecessor of S[j] is larger than it.
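A direct Python transcription of the INSERTION-SORT pseudocode might look like the sketch below. The function name `insertion_sort` is an assumption of this example; the size n is obtained with `len` rather than passed as a parameter, and the swap is written with Python's tuple assignment instead of a temporary variable.

```python
def insertion_sort(s):
    """Sort the list s in place, following INSERTION-SORT, and return it."""
    n = len(s)
    for i in range(1, n):                      # for i = 1 to n - 1
        j = i
        # Move the new element down while its predecessor is larger.
        while j > 0 and s[j - 1] > s[j]:
            s[j - 1], s[j] = s[j], s[j - 1]    # swap S[j] and S[j - 1]
            j -= 1
    return s


print(insertion_sort([3, 2, 7, 1]))
```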

Example: Let us consider S = [3, 2, 7, 1]. Recall that arrays are indexed from position 0 (that is, S[0] = 3, S[1] = 2, and so on).

INSERTION-SORT for i = 1 sets j = 1 as well, and then executes the while because j = 1 > 0 and S[0] > S[1]: these two values are swapped and j becomes 0 so that the while command ends with S = [2, 3, 7, 1]. Then a new for iteration starts with i = 2 (notice that at this time, correctly, S is sorted up to S[1]), and S[2] is taken into account; this time the while command is entered with j = 2 and its condition is not satisfied (as S[2] > S[1]) so that the while immediately ends without changing S: the first three values of S are already sorted. Finally, the last for iteration with i = 3 will execute the while three times (that is, n − 1) swapping 1 with 7, then with 3, and finally with 2, leading to S = [1, 2, 3, 7] which is the correct output.
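The trace above can be reproduced with a small instrumented variant of the algorithm. In this sketch the helper `insertion_sort_trace` (a hypothetical name) records a copy of the array at the end of every for iteration, so the intermediate states can be compared with the ones discussed in the example.

```python
def insertion_sort_trace(s):
    """Run INSERTION-SORT on a copy of s.

    Returns a list with one snapshot of the array per for iteration
    (i = 1, 2, ..., n - 1), taken when that iteration ends.
    """
    s = list(s)                  # work on a copy, leave the input untouched
    snapshots = []
    for i in range(1, len(s)):
        j = i
        while j > 0 and s[j - 1] > s[j]:
            s[j - 1], s[j] = s[j], s[j - 1]
            j -= 1
        snapshots.append(list(s))
    return snapshots


for i, state in enumerate(insertion_sort_trace([3, 2, 7, 1]), start=1):
    print(f"after i = {i}: {state}")
# after i = 1: [2, 3, 7, 1]
# after i = 2: [2, 3, 7, 1]
# after i = 3: [1, 2, 3, 7]
```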

INSERTION-SORT takes at least linear time (that is, its time complexity is in Ω(n)) because all elements of S must be read, and indeed the for command is executed Θ(n) times: once per array position from the second to the last. The invariant is that at the beginning of each such iteration, the array is sorted up to position S[i − 1], and then the new value at S[i] is processed. Each iteration of the for, besides the constant time (hence negligible) assignment j ← i, executes the while command. This latter checks its condition (in constant time) and, if the newly read element S[j] is greater than, or equal to, S[j − 1] (which is the largest of the so far sorted array), then it does nothing; else, it swaps S[j] and S[j − 1], decreases j, checks the condition again, and possibly repeats these actions, until either S[j] finds its place after a smaller value, or it becomes the new first element of S as it is the smallest found so far. Therefore, the actions of the while command are never executed if the array is already sorted. This is the best case time complexity of INSERTION-SORT: linear in the input size n. The worst case is, instead, when the input array is sorted in the reverse order: in this case, at each iteration i, the while command has to perform exactly i swaps to let S[j] move down to the first position. Therefore, in this case, iteration i of the for takes i steps, and there are n − 1 such iterations, one for each 1 ≤ i ≤ n − 1. Hence, the worst case running time is

\[ \sum_{i=1}^{n-1} i = \frac{n(n-1)}{2} = \Theta(n^2). \]

As for space complexity, INSERTION-SORT works within the input array plus a constant number of temporary variables, and hence it has linear space complexity. Since n is also a lower bound (the whole array must be stored), in this case the space complexity is optimal.
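The best and worst cases of INSERTION-SORT's running time can also be checked empirically by counting swaps. The sketch below uses a hypothetical helper `count_swaps`; it assumes that the number of swaps is a fair proxy for the work done by the while command.

```python
def count_swaps(s):
    """Return the number of swaps INSERTION-SORT performs on a copy of s."""
    s = list(s)
    swaps = 0
    for i in range(1, len(s)):
        j = i
        while j > 0 and s[j - 1] > s[j]:
            s[j - 1], s[j] = s[j], s[j - 1]
            swaps += 1
            j -= 1
    return swaps


n = 10
best = count_swaps(range(n))               # already sorted input
worst = count_swaps(range(n - 1, -1, -1))  # reverse-sorted input
print(best, worst, n * (n - 1) // 2)
```

For n = 10 the sorted input needs no swaps, while the reverse-sorted input needs 1 + 2 + … + 9 = 45 of them, which is n(n − 1)/2.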

The algorithm we just described is an example of an iterative algorithm that realises a quite intuitive sorting strategy; indeed, this algorithm is often explained as the way we would sort playing cards in one hand by using the other hand to iteratively insert each new card in its correct position. Iteration is powerful enough to achieve, for our sorting problem, a polynomial time (although almost trivial) solution; the time complexity of INSERTION-SORT cannot however be proved to be optimal, as the lower bound for the sorting problem is not n², but rather n log₂ n (result not proved here). In order to achieve O(n log₂ n) time complexity we need an even more powerful paradigm that we will introduce in the next section.

Recursive Algorithms

A recursive algorithm is an algorithm which, among its commands, recursively calls itself on smaller instances: it splits the main problem into subproblems, recursively solves them and combines their solutions in order to build up the solution of the original problem. There is a fascinating mathematical foundation, that goes back to the arithmetic of Peano, and even further back to induction theory, for the conditions that guarantee correctness of a recursive algorithm. We will omit details of this involved mathematical framework. Surprisingly enough, for a computer this apparently very complex paradigm is easy to implement by means of a simple data structure (the stack).

In order to show how powerful induction is, we will use again our Sorting Problem running example. Namely, we describe here the recursive MERGE-SORT algorithm which achieves Θ(n log₂ n) time complexity, and is thus optimal. Basically, the algorithm MERGE-SORT splits the array into two halves, sorts them (by means of two recursive calls on as many sub-arrays of size n/2 each), and then merges the outcomes into a whole sorted array. The two recursive calls, in turn, will recursively split again into sub-arrays of size n/4, and so on, until the base case (the already sorted sub-array of size 1) is reached. The merging procedure will be implemented by the function MERGE (pseudocode not shown) which takes in input the array and the starting and ending positions of its portions that contain the two contiguous sub-arrays to be merged. Recalling that the two half-arrays to be merged are sorted, MERGE simply uses two indices along them sliding from left to right, and, at each step: makes a comparison, writes the smallest, and increases the index of the sub-array which contained it. This is done until both sub-arrays have been entirely written into the result.
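Since the pseudocode of MERGE is not shown, the following Python function is only one possible reading of the description above, not the chapter's own procedure. It assumes that the two sorted portions are S[p..q] and S[q+1..r] (inclusive bounds), and it copies them into two temporary lists before writing the merged result back into S.

```python
def merge(s, p, q, r):
    """Merge the sorted portions s[p..q] and s[q+1..r] (inclusive) in place."""
    left = s[p:q + 1]           # temporary copy of the first sorted half
    right = s[q + 1:r + 1]      # temporary copy of the second sorted half
    i = j = 0                   # the two indices sliding from left to right
    k = p                       # next position of s to be written
    while i < len(left) and j < len(right):
        # Compare, write the smaller value, advance the index it came from.
        if left[i] <= right[j]:
            s[k] = left[i]
            i += 1
        else:
            s[k] = right[j]
            j += 1
        k += 1
    # One of the two halves is exhausted: copy what remains of the other.
    s[k:r + 1] = left[i:] + right[j:]


a = [2, 3, 1, 7]
merge(a, 0, 1, 3)
print(a)  # [1, 2, 3, 7]
```

Each element of the two portions is written exactly once, which is where the linear cost of merging comes from.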

MERGE-SORT(S, p, r)
  if p < r then
    q ← ⌊(p + r)/2⌋
    MERGE-SORT(S, p, q)
    MERGE-SORT(S, q + 1, r)
    MERGE(S, p, q, r)
  endif

Given the need of calling the algorithm on different array fragments, the input parameters, besides S itself, will be the starting and ending position of the portion of array to be sorted. Therefore, the first call will be MERGE-SORT(S, 0, n − 1). Then the index q which splits S in two halves is computed, and the two so found sub-arrays are sorted by means of as many recursive calls; the two resulting sorted arrays of size n/2 are then fused by MERGE into the final result. The correctness of the recursion follows from the fact that the recursive call is done on a half-long array, and from the termination condition “p < r”: if this holds, then the recursion goes on; else (p = r) there is nothing to do as the array has length 1 and it is sorted. Notice, indeed, that if S is not empty, then p > r can never hold as q is computed such that p ≤ q < r.
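Putting the pieces together, MERGE-SORT can be sketched in Python as below. The recursive function mirrors the pseudocode; the `merge` helper is an assumed implementation (the chapter does not show MERGE), and the default arguments that make `merge_sort(s)` equivalent to MERGE-SORT(S, 0, n − 1) are a convenience added for this example.

```python
def merge(s, p, q, r):
    """Assumed MERGE: fuse sorted s[p..q] and s[q+1..r] (inclusive) in place."""
    left, right = s[p:q + 1], s[q + 1:r + 1]
    i = j = 0
    k = p
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            s[k] = left[i]
            i += 1
        else:
            s[k] = right[j]
            j += 1
        k += 1
    s[k:r + 1] = left[i:] + right[j:]   # copy the leftover tail


def merge_sort(s, p=0, r=None):
    """Sort s[p..r] in place, following MERGE-SORT, and return s."""
    if r is None:
        r = len(s) - 1                  # first call: MERGE-SORT(S, 0, n - 1)
    if p < r:                           # at least two elements left
        q = (p + r) // 2                # floor of the midpoint
        merge_sort(s, p, q)
        merge_sort(s, q + 1, r)
        merge(s, p, q, r)
    return s


print(merge_sort([3, 2, 7, 1]))  # [1, 2, 3, 7]
```

Note that this `merge` uses temporary copies of the two halves, so the sketch needs additional memory proportional to the portion being merged; the overall space usage remains linear in n.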

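The algorithm MERGE-SORT has linear (hence optimal) space complexity as it only uses S itself plus a constant number of variables. The time complexity T(n) of MERGE-SORT can be defined by the following recurrence relation:

\[ T(n) = \begin{cases} \Theta(1) & \text{if } n = 1 \\ 2\,T(n/2) + \Theta(n) & \text{if } n > 1 \end{cases} \]

because, with an input of size n, MERGE-SORT calls itself twice on arrays of size n/2, and then calls MERGE which takes, as we described above, Θ(n) time.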

We now show by induction on n that T(n) = Θ(n log₂ n). The base case is simple: if n = 1 then S is already sorted and correctly MERGE-SORT does nothing and ends in Θ(1) time. If n > 1, assuming that T(n′) = Θ(n′ log₂ n′) holds for n′ < n, then we have
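One way to carry out the inductive step for the upper bound is sketched here, under simplifying assumptions that belong to this sketch rather than to the chapter: n is a power of two with n ≥ 2, the merging cost is at most dn for some constant d > 0, and the inductive hypothesis is written with an explicit constant c ≥ d as T(n′) ≤ c n′ log₂ n′.

```latex
\begin{align*}
T(n) &\le 2\,T(n/2) + d\,n \\
     &\le 2\,c\,\frac{n}{2}\,\log_2\frac{n}{2} + d\,n \\
     &= c\,n\,(\log_2 n - 1) + d\,n \\
     &= c\,n\log_2 n - (c - d)\,n \\
     &\le c\,n\log_2 n \qquad \text{since } c \ge d .
\end{align*}
```

The matching lower bound follows in the same way with the inequalities reversed and a constant no larger than the one bounding the merging cost from below, which together give T(n) = Θ(n log₂ n).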
