ENCYCLOPEDIA OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY


EDITORS IN CHIEF

Shoba Ranganathan

Macquarie University, Sydney, NSW, Australia

Michael Gribskov

Purdue University, West Lafayette, IN, United States

Kenta Nakai

The University of Tokyo, Tokyo, Japan

Christian Schönbach

Nazarbayev University, School of Science and Technology, Department of Biology, Astana, Kazakhstan

VOLUME 1
Methods

Mario Cannataro

The Magna Græcia University of Catanzaro, Catanzaro, Italy

Elsevier

Radarweg 29, PO Box 211, 1000 AE Amsterdam, Netherlands
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
50 Hampshire Street, 5th Floor, Cambridge MA 02139, United States

Copyright © 2019 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers may always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data

A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library

ISBN 978-0-12-811414-8

For information on all publications visit our website at http://store.elsevier.com

Publisher: Oliver Walter

Acquisition Editor: Sam Crowe

Content Project Manager: Paula Davies

Associate Content Project Manager: Ebin Clinton Rozario

Designer: Greg Harris

Printed and bound in the United States

EDITORS IN CHIEF

Shoba Ranganathan has held a Chair in Bioinformatics at Macquarie University since 2004. She has held research and academic positions in India, USA, Singapore and Australia as well as a consultancy in industry. She hosted the Macquarie Node of the ARC Centre of Excellence in Bioinformatics (2008–2013). She was elected the first Australian Board Director of the International Society for Computational Biology (ISCB; 2003–2005); President of Asia-Pacific Bioinformatics Network (2005–2016) and Steering Committee Member (2007–2012) of Bioinformatics Australia. She initiated the Workshops on Education in Bioinformatics (WEB) as an ISMB 2001 Special Interest Group meeting and also served as Chair of ISCB's Education Committee. Shoba currently serves as Co-Chair of the Computational Mass Spectrometry (CompMS) initiative of the Human Proteome Organization (HuPO), ISCB and Metabolomics Society and as Board Director, APBioNet Ltd.

Shoba's research addresses several key areas of bioinformatics to understand biological systems using computational approaches. Her group has achieved both experience and expertise in different aspects of computational biology, ranging from metabolites and small molecules to biochemical networks, pathway analysis and computational systems biology. She has authored and edited several books, as well as articles for the 2013 Encyclopedia of Systems Biology. She is currently an Editor-in-Chief of the Encyclopedia of Bioinformatics and Computational Biology and the Bioinformatics Section Editor of the Reference Module in Life Science as well as an editorial board member of several bioinformatics journals.

Dr. Gribskov graduated from Oregon State University in 1979 with a Bachelor of Science degree (with Honors) in Biochemistry and Biophysics. He then moved to the University of Wisconsin-Madison for graduate studies focused on the structure and function of the sigma subunit of E. coli RNA polymerase, receiving his Ph.D. in 1985. Dr. Gribskov studied X-ray crystallography as an American Cancer Society post-doctoral fellow at UCLA in the laboratory of David Eisenberg, and followed this with both crystallographic and computational studies at the National Cancer Institute. In 1992, Dr. Gribskov moved to the San Diego Supercomputer Center at the University of California, San Diego where he was lead scientist in the area of computational biology and an adjunct associate professor in the department of Biology. From 2003 to 2007, Dr. Gribskov was the president of the International Society for Computational Biology, the largest professional society devoted to bioinformatics and computational biology. In 2004, Dr. Gribskov moved to Purdue University where he holds an appointment as a full professor in the Biological Sciences and Computer Science departments (by courtesy). Dr. Gribskov's interests include genomic and transcriptomic analysis of model and non-model organisms, the application of pattern recognition and machine learning techniques to biomolecules, the design and implementation of biological databases to support molecular and systems biology, development of methods to study RNA structural patterns, and systems biology studies of human disease.

Kenta Nakai received the PhD degree on the prediction of subcellular localization sites of proteins from Kyoto University in 1992. From 1989, he worked at Kyoto University, National Institute of Basic Biology, and Osaka University. From 1999 to 2003, he was an Associate Professor at the Human Genome Center, the Institute of Medical Science, the University of Tokyo, Japan. Since 2003, he has been a full Professor at the same institute. His main research interest is to develop computational ways for interpreting biological information, especially that of transcriptional regulation, from genome sequence data. He has published more than 150 papers, some of which have been cited more than 1,000 times.

Christian Schönbach is currently Department Chair and Professor at Department of Biology, School of Science and Technology, Nazarbayev University, Kazakhstan and Visiting Professor at International Research Center for Medical Sciences at Kumamoto University, Japan. He is a bioinformatics practitioner interfacing genetics, immunology and informatics, conducting research on major histocompatibility complex, immune responses following virus infection, biomedical knowledge discovery, peroxisomal diseases, and autism spectrum disorder that resulted in more than 80 publications. His previous academic appointments included Professor at Kumamoto University (2016–2017), Nazarbayev University (2013–2016), Kazakhstan, Kyushu Institute of Technology (2009–2013) Japan, Associate Professor at Nanyang Technological University (2006–2009), Singapore, and Team Leader at RIKEN Genomic Sciences Center (2002–2006), Japan. Other prior positions included Principal Investigator at Kent Ridge Digital Labs, Singapore and Research Scientist at Chugai Institute for Molecular Medicine, Inc., Japan. In 2018 he became a member of International Society for Computational Biology (ISCB) Board of Directors. Since 2010 he has served Asia-Pacific Bioinformatics Network (APBioNet) as Vice-President (Conferences 2010–2016) and President (2016–2018).

VOLUME EDITORS

Mario Cannataro is a Full Professor of Computer Engineering and Bioinformatics at University “Magna Graecia” of Catanzaro, Italy. He is the director of the Data Analytics research center and the chair of the Bioinformatics Laboratory at University “Magna Graecia” of Catanzaro. His current research interests include bioinformatics, medical informatics, data analytics, parallel and distributed computing. He is a Member of the editorial boards of Briefings in Bioinformatics, High-Throughput, Encyclopedia of Bioinformatics and Computational Biology, Encyclopedia of Systems Biology. He was guest editor of several special issues on bioinformatics and he is serving as a program committee member of several conferences. He published three books and more than 200 papers in international journals and conference proceedings. Prof. Cannataro is a Senior Member of IEEE, ACM and BITS, and a member of the Board of Directors for ACM SIGBIO.

Bruno Gaeta is Senior Lecturer and Director of Studies in Bioinformatics in the School of Computer Science and Engineering at UNSW Australia. His research interests cover multiple areas of bioinformatics including gene regulation and protein structure, currently with a focus on the immune system, antibody genes and the generation of antibody diversity. He is a pioneer of bioinformatics education and has trained thousands of biologists and trainee bioinformaticians in the use of computational tools for biological research through courses, workshops as well as a book series. He has worked both in academia and in the bioinformatics industry, and currently coordinates the largest bioinformatics undergraduate program in Australia.

Mohammad Asif Khan, PhD, is an associate professor and the Dean of the School of Data Sciences, as well as the Director of the Centre for Bioinformatics at Perdana University, Malaysia. He is also a visiting scientist at the Department of Pharmacology and Molecular Sciences, Johns Hopkins University School of Medicine (JHUSOM), USA. His research interests are in the area of biological data warehousing and applications of bioinformatics to the study of immune responses, vaccines, inhibitory drugs, venom toxins, and disease biomarkers. He has published in these areas, been involved in the development of several novel bioinformatics methodologies, tools, and specialized databases, and currently has three patent applications granted. He has also led the curriculum development of a Postgraduate Diploma in Bioinformatics programme and an MSc (Bioinformatics) programme at Perdana University. He has been an elected ExCo member of the Asia-Pacific Bioinformatics Network (APBioNET) since 2010 and is currently the President of Association for Medical and BioInformatics, Singapore (AMBIS). He has donned various important roles in the organization of many local and international bioinformatics conferences, meetings and workshops.

CONTENTS OF VOLUME 1

Editors in Chief

Standards and Models for Biological Data: Common Formats

Standards and Models for Biological Data: BioPAX

Computing for Bioinformatics

Computing Languages for Bioinformatics: Perl

Computing Languages for Bioinformatics: BioPerl

Computing Languages for Bioinformatics: Python

Computing Languages for Bioinformatics: R

Computing Languages for Bioinformatics: Java

MapReduce in Computational Biology Via Hadoop and Spark
Giuseppe Cattaneo, Raffaele Giancarlo, Umberto Ferraro Petrillo, and Gianluca Roscigno

Infrastructures for High-Performance Computing: Cloud Computing

Infrastructures for High-Performance Computing: Cloud Infrastructures

Infrastructures for High-Performance Computing: Cloud Computing Development Environments

The Challenge of Privacy in the Cloud
Francesco Buccafurri, Vincenzo De Angelis, Gianluca Lax, Serena Nicolazzo, and Antonino Nocera

Artificial Intelligence and Machine Learning in Bioinformatics
Kaitao Lai, Natalie Twine, Aidan O’Brien, Yi Guo, and Denis Bauer

Machine Learning in Bioinformatics
Jyotsna T Wassan, Haiying Wang, and Huiru Zheng

Intelligent Agents and Environment
Alfredo Garro, Max Mühlhäuser, Andrea Tundis, Stefano Mariani, Andrea Omicini, and Giuseppe Vizzari

Intelligent Agents: Multi-Agent Systems
Alfredo Garro, Max Mühlhäuser, Andrea Tundis, Matteo Baldoni, Cristina Baroglio, Federico Bergenti, and Paolo Torroni

Stochastic Methods for Global Optimization and Problem Solving
Giovanni Stracquadanio and Panos M Pardalos

Data Mining: Classification and Prediction
Alfonso Urso, Antonino Fiannaca, Massimo La Rosa, Valentina Ravì, and Riccardo Rizzo

Bayes’ Theorem and Naive Bayes Classifier

Data Mining: Prediction Methods

Data Mining: Accuracy and Error Measures for Classification and Prediction
Paola Galdi and Roberto Tagliaferri

Data Mining: Clustering

Data Mining: Outlier Detection

Pre-Processing: A Data Preparation Step
Swarup Roy, Pooja Sharma, Keshab Nath, Dhruba K Bhattacharyya, and Jugal K Kalita

Kernel Machines: Introduction
Italo Zoppis, Giancarlo Mauri, and Riccardo Dondi

Kernel Methods: Support Vector Machines
Italo Zoppis, Giancarlo Mauri, and Riccardo Dondi

Kernel Machines: Applications
Italo Zoppis, Giancarlo Mauri, and Riccardo Dondi

Text Mining Basics in Bioinformatics
Carmen De Maio, Giuseppe Fenza, Vincenzo Loia, and Mimmo Parente

Data-Information-Concept Continuum From a Text Mining Perspective
Danilo Cavaliere, Sabrina Senatore, and Vincenzo Loia

Text Mining for Bioinformatics Using Biomedical Literature
Andre Lamurias and Francisco M Couto

Deep Learning
Massimo Guarascio, Giuseppe Manco, and Ettore Ritacco

Biological and Medical Ontologies: Human Phenotype Ontology (HPO)

Biological and Medical Ontologies: Systems Biology Ontology (SBO)

Raffaele Giancarlo, Daniele Greco, Francesco Landolina, and Simona E Rombo

Max Kotlyar, Chiara Pastrello, Andrea E M Rossos, and Igor Jurisica

Alignment of Protein-Protein Interaction Networks
Swarup Roy, Hazel N Manners, Ahed Elmsallati, and Jugal K Kalita

Visualization of Biomedical Networks
Anne-Christin Hauschild, Chiara Pastrello, Andrea E M Rossos, and Igor Jurisica

Cluster Analysis of Biological Networks
Asuda Sharma, Hesham Ali, and Dario Ghersi

Biological Pathways
Giuseppe Agapito

Biological Pathway Data Formats and Standards
Ramakanth C Venkata and Dario Ghersi

Biological Pathway Analysis
Ramakanth Chirravuri Venkata and Dario Ghersi

Two Decades of Biological Pathway Databases: Results and Challenges
Sara Rahmati, Chiara Pastrello, Andrea E M Rossos, and Igor Jurisica

Visualization of Biological Pathways
Giuseppe Agapito

Integrative Bioinformatics
Marco Masseroli

Integrative Bioinformatics of Transcriptome: Databases, Tools and Pipelines
Maria T Di Martino and Pietro H Guzzi

Information Retrieval in Life Sciences
Pietro Cinaglia, Domenico Mirarchi, and Pierangelo Veltri

LIST OF CONTRIBUTORS FOR VOLUME 1

Giuseppe Agapito
University “Magna Graecia” of Catanzaro, Catanzaro, Italy

Hesham Ali
University of Nebraska at Omaha, Omaha, NE, United States

Alessia Amelio
University of Calabria, Rende, Italy

Claudia Angelini
Istituto per le Applicazioni del Calcolo “M. Picone”, Napoli, Italy

Fabrizio Angiulli
University of Calabria, Rende, Italy

Matteo Baldoni
University of Turin, Turin, Italy

Cristina Baroglio
University of Turin, Turin, Italy

Denis Bauer
CSIRO, North Ryde, NSW, Australia

Stefano Beretta
University of Milano-Bicocca, Milan, Italy

Federico Bergenti
University of Parma, Parma, Italy

Anna Bernasconi
Politecnico di Milano, Milan, Italy

Daniel Berrar
Tokyo Institute of Technology, Tokyo, Japan

Dhruba K. Bhattacharyya
Tezpur University, Tezpur, India

Mariaconcetta Bilotta
University of Catanzaro, Catanzaro, Italy; and Institute S. Anna of Crotone, Crotone, Italy

Francesco Buccafurri
University of Reggio Calabria, Italy

Massimo Cafaro
University of Salento, Lecce, Italy

Barbara Calabrese
University “Magna Graecia” of Catanzaro, Catanzaro, Italy

Mario Cannataro
University “Magna Graecia” of Catanzaro, Catanzaro, Italy

Maria Francesca Carfora
Istituto per le Applicazioni del Calcolo CNR, Napoli, Italy

Mauro Castelli
NOVA IMS, Universidade Nova de Lisboa, Lisboa, Portugal

Giuseppe Cattaneo
University of Salerno, Fisciano, Italy

Francesco Cauteruccio
University of Calabria, Rende, Italy

Danilo Cavaliere
Università degli Studi di Salerno, Fisciano, Italy

Davide Chicco
Princess Margaret Cancer Centre, Toronto, ON, Canada

Pietro Cinaglia
Magna Graecia University of Catanzaro, Catanzaro, Italy

Francisco M. Couto
Universidade de Lisboa, Lisboa, Portugal

Luisa Cutillo
University of Sheffield, Sheffield, United Kingdom; and Parthenope University of Naples, Naples, Italy

Vincenzo De Angelis
University of Reggio Calabria, Italy

Daniela De Canditiis
Istituto per le Applicazioni del Calcolo “M. Picone”, Rome, Italy

Italia De Feis
Istituto per le Applicazioni del Calcolo CNR, Napoli, Italy

Carmen De Maio
University of Salerno, Fisciano, Italy

Luca Denti
University of Milano-Bicocca, Milan, Italy

Giuseppe Di Fatta
University of Reading, Reading, United Kingdom

Maria T. Di Martino
University “Magna Graecia” of Catanzaro, Catanzaro, Italy

Riccardo Dondi
University of Bergamo, Bergamo, Italy

Ahed Elmsallati
McKendree University, Lebanon, IL, United States

Italo Epicoco
University of Salento, Lecce, Italy

Giuseppe Fenza
University of Salerno, Fisciano, Italy

Antonino Fiannaca
Via Ugo La Malfa, Palermo, Italy

Valeria Fionda
University of Calabria, Rende, Italy

Monica Franzese
Institute for Applied Mathematics “Mauro Picone”, Napoli, Italy

Michele Fratello
DP Control, Salerno, Italy

Paola Galdi
University of Salerno, Fisciano, Italy

Alfredo Garro
University of Calabria, Rende, Italy

Dario Ghersi
University of Nebraska at Omaha, Omaha, NE, United States

Raffaele Giancarlo
University of Palermo, Palermo, Italy

Gianluigi Greco
University of Calabria, Cosenza, Italy

Daniele Greco
University of Palermo, Palermo, Italy

Massimo Guarascio
ICAR-CNR, Rende, Italy

Yi Guo
Western Sydney University, Penrith, NSW, Australia

Pietro H. Guzzi
University “Magna Graecia” of Catanzaro, Catanzaro, Italy

Antonella Guzzo
University of Calabria, Rende, Italy

Xu Han
Nanyang Technological University, Singapore

Anne-Christin Hauschild
Krembil Research Institute, Toronto, ON, Canada

Antonella Iuliano
Institute for Applied Mathematics “Mauro Picone”, Napoli, Italy

Audrone Jakaitiene
Vilnius University, Vilnius, Lithuania

B. Jayaram
IIT Delhi, New Delhi, India

Igor Jurisica
University of Toronto, ON, Canada; and Slovak Academy of Sciences, Bratislava, Slovakia

Jugal K. Kalita
University of Colorado, Boulder, CO, United States

Rahul Kaushik
IIT Delhi, New Delhi, India

Max Kotlyar
University Health Network, Toronto, ON, Canada

Chee K. Kwoh
Nanyang Technological University, Singapore

Massimo La Rosa
Via Ugo La Malfa, Palermo, Italy

Kaitao Lai
CSIRO, North Ryde, NSW, Australia

Andre Lamurias
Universidade de Lisboa, Lisboa, Portugal

Francesco Landolina
University of Palermo, Palermo, Italy

Álvaro Rubio Largo
NOVA IMS, Universidade Nova de Lisboa, Lisboa, Portugal

Gianluca Lax
University of Reggio Calabria, Italy

Paolo Lo Giudice
University “Mediterranea” of Reggio Calabria, Reggio Calabria, Italy

Vincenzo Loia
University of Salerno, Fisciano, Italy

Max Mühlhäuser
Darmstadt University of Technology, Darmstadt, Germany

Giuseppe Manco
ICAR-CNR, Rende, Italy

Marco Manna
University of Calabria, Cosenza, Italy

Hazel N. Manners
North-Eastern Hill University, Shillong, India

Laura Manuel
University of Texas Health at San Antonio, San Antonio, TX, United States

Stefano Mariani
University of Bologna, Bologna, Italy

Fabrizio Marozzo
University of Calabria, Rende, Italy

Marco Masseroli
Polytechnic University of Milan, Milan, Italy

Giancarlo Mauri
University of Milano-Bicocca, Milan, Italy

Ivan Merelli
Institute for Biomedical Technologies (CNR), Milan, Italy; and National Research Council, Segrate, Italy

Marianna Milano
University of Catanzaro, Catanzaro, Italy

Domenico Mirarchi
Magna Graecia University of Catanzaro, Catanzaro, Italy

Keshab Nath
North-Eastern Hill University, Shillong, India

Serena Nicolazzo
University of Reggio Calabria, Italy

Antonino Nocera
University of Reggio Calabria, Italy

Aidan O’Brien
CSIRO, North Ryde, NSW, Australia

Andrea Omicini
University of Bologna, Bologna, Italy

Luigi Palopoli
Università della Calabria, Cosenza, Italy

Panos M. Pardalos
University of Florida, Gainesville, FL, United States

Mimmo Parente
University of Salerno, Fisciano, Italy

Chiara Pastrello
Krembil Research Institute, Toronto, ON, Canada

Marco Pellegrini
Consiglio Nazionale delle Ricerche, Istituto di Informatica e Telematica, Pisa, Italy

Umberto Ferraro Petrillo
University of Rome “Sapienza”, Rome, Italy

Giuseppe Pirrò
ICAR-CNR, Rende, Italy

Nadia Pisanti
University of Pisa, Pisa, Italy

Clara Pizzuti
Institute for High Performance Computing and Networking (ICAR), Cosenza, Italy

Gianluca Pollastri
University College Dublin, Dublin, Ireland

Erinija Pranckeviciene
Vilnius University, Vilnius, Lithuania

Marco Previtali
University of Milano-Bicocca, Milan, Italy

Marco Pulimeno
University of Salento, Lecce, Italy

Sara Rahmati
University of Toronto, Toronto, ON, Canada; and Krembil Research Institute, Toronto, ON, Canada

Valentina Ravì
Via Ugo La Malfa, Palermo, Italy

Francesco Ricca
University of Calabria, Rende, Italy

Ettore Ritacco
ICAR-CNR, Rende, Italy

Riccardo Rizzo
ICAR-CNR, Rende, Italy

Simona E. Rombo
University of Palermo, Palermo, Italy

Francesca Rondinelli
Università degli Studi di Napoli Federico II, Napoli, Italy

Gianluca Roscigno
University of Salerno, Fisciano, Italy

Andrea E.M. Rossos
Krembil Research Institute, Toronto, ON, Canada

Swarup Roy
Sikkim University, Gangtok, India; and North-Eastern Hill University, Shillong, India

Francesco Scarcello
University of Calabria, Rende, Italy

Sabrina Senatore
Università degli Studi di Salerno, Fisciano, Italy

Angela Serra
University of Salerno, Salerno, Italy

Pooja Sharma
Tezpur University, Tezpur, India

Asuda Sharma
University of Nebraska at Omaha, Omaha, NE, United States

David Simoncini
University of Toulouse, Toulouse, France; and RIKEN, Yokohama, Japan

Ankita Singh
IIT Delhi, New Delhi, India; and Banasthali Vidyapith, Banasthali, India

Giovanni Stracquadanio
University of Essex, Colchester, United Kingdom

Andrea Tagarelli
University of Calabria, Rende, Italy

Roberto Tagliaferri
University of Salerno, Salerno, Italy

Domenico Talia
University of Calabria, Rende, Italy

Giorgio Terracina
University of Calabria, Rende, Italy

Alfredo Tirado-Ramos
University of Texas Health at San Antonio, San Antonio, TX, United States

Paolo Torroni
University of Bologna, Bologna, Italy

Giuseppe Tradigo
University of Calabria, Rende, Italy; and University of Florida, Gainesville, United States

Paolo Trunfio
University of Calabria, Rende, Italy

Andrea Tundis
Darmstadt University of Technology, Darmstadt, Germany

Natalie Twine
CSIRO, North Ryde, NSW, Australia

Domenico Ursino
University “Mediterranea” of Reggio Calabria, Reggio Calabria, Italy

Alfonso Urso
Via Ugo La Malfa, Palermo, Italy

Filippo Utro
IBM Thomas J. Watson Research Center, Yorktown Heights, NY, United States

Leonardo Vanneschi
NOVA IMS, Universidade Nova de Lisboa, Lisboa, Portugal

Pierangelo Veltri
University “Magna Graecia” of Catanzaro, Catanzaro, Italy

Ramakanth C. Venkata
University of Nebraska at Omaha, Omaha, NE, United States

Giuseppe Vizzari
University of Milano-Bicocca, Milan, Italy

Haiying Wang
Ulster University, Newtownabbey, Northern Ireland, United Kingdom

Jyotsna T. Wassan
Ulster University, Newtownabbey, Northern Ireland, United Kingdom

Marco Wiltgen
Graz General Hospital and University Clinics, Graz, Austria

Kam Y.J. Zhang
RIKEN, Yokohama, Japan

Huiru Zheng
Ulster University, Newtownabbey, Northern Ireland, United Kingdom

Italo Zoppis
University of Milano-Bicocca, Milan, Italy

Chiara Zucco
University “Magna Graecia” of Catanzaro, Catanzaro, Italy

PREFACE

Bioinformatics and Computational Biology (BCB) combine elements of computer science, information technology, mathematics, statistics, and biotechnology, providing the methodology and in silico solutions to mine biological data and processes for knowledge discovery. In the era of molecular diagnostics, targeted drug design and Big Data for personalized or even precision medicine, computational methods for data analysis are essential for biochemistry, biology, biotechnology, pharmacology, biomedical science, and mathematics and statistics. Bioinformatics and Computational Biology are essential for making sense of the molecular data from many modern high-throughput studies of mice and men, as well as key model organisms and pathogens. This Encyclopedia spans basics to cutting-edge methodologies, authored by leaders in the field, providing an invaluable resource to students as well as scientists, in academia and research institutes as well as biotechnology, biomedical and pharmaceutical industries.

Navigating the maze of often contradictory jargon, combined with a plethora of software tools, is often confusing for students and researchers alike. This comprehensive and unique resource provides up-to-date theory and application content to address molecular data analysis requirements, with precise definition of terminology, and lucid explanations by experts.

No single authoritative entity exists in this area that provides a comprehensive definition of the myriad of computer science, information technology, mathematics, statistics, and biotechnology terms used by scientists working in bioinformatics and computational biology. Current books available in this area as well as existing publications address parts of a problem or provide chapters on the topic, essentially addressing practicing bioinformaticists or computational biologists. Newcomers to this area depend on Google searches leading to published literature as well as several textbooks, to collect the relevant information.

Although curricula have been developed for Bioinformatics education for two decades now (Altman, 1998), offering education in bioinformatics continues to remain challenging from the multidisciplinary perspective, and is perhaps an NP-hard problem (Ranganathan, 2005). A minimum Bioinformatics skill set for university graduates has been suggested (Tan et al., 2009). The Bioinformatics section of the Reference Module in Life Sciences (Ranganathan, 2017) commenced by addressing the paucity of a comprehensive reference book, leading to the development of this Encyclopedia. This compilation aims to fill the “gap” for readers with succinct and authoritative descriptions of current and cutting-edge bioinformatics areas, supplemented with the theoretical concepts underpinning these topics.

This Encyclopedia comprises three sections, covering Methods, Topics and Applications. The theoretical methodology underpinning BCB is described in the Methods section, with Topics covering traditional areas such as phylogeny, as well as more recent areas such as translational bioinformatics, cheminformatics and computational systems biology. Additionally, Applications will provide guidance for commonly asked “how to” questions on scientific areas described in the Topics section, using the methodology set out in the Methods section. Throughout this Encyclopedia, we have endeavored to keep the content as lucid as possible, making the text “as simple as possible, but not simpler,” a maxim attributed to Albert Einstein. Comprehensive chapters provide overviews while details are provided by shorter, encyclopedic chapters.

During the planning phase of this Encyclopedia, the encouragement of Elsevier’s Priscilla Braglia and the constructive comments from no less than ten reviewers led our small preliminary editorial team (Christian Schönbach, Kenta Nakai and myself) to embark on this massive project. We then welcomed one more Editor-in-Chief, Michael Gribskov, and three section editors, Mario Cannataro, Bruno Gaeta and Asif Khan, whose toils have resulted in gathering most of the current content, with all editors reviewing the submissions. Throughout the production phase, we have received invaluable support and guidance as well as milestone reminders from Paula Davies, for which we remain extremely grateful.

Finally we would like to acknowledge all our authors, from around the world, who dedicated their valuable time to share their knowledge and expertise to provide educational guidance for our readers, as well as leave a lasting legacy of their work.

We hope the readers will enjoy this Encyclopedia as much as the editorial team have, in compiling this as an ABC of bioinformatics, suitable for naïve as well as experienced scientists and as an essential reference and invaluable teaching guide for students, post-doctoral scientists, senior scientists, academics in universities and research institutes as well as pharmaceutical, biomedical and biotechnological industries. Nobel laureate Walter Gilbert predicted in 1990 that “In the year 2020 you will be able to go into the drug store, have your DNA sequence read in an hour or so, and given back to you on a compact disk so you can analyze it.” While technology may have already arrived at this milestone, we are confident one of the readers of this Encyclopedia will be ready to extract valuable biological data by computational analysis, resulting in biomedical and therapeutic solutions, using bioinformatics to “measure” health for early diagnosis of “disease.”

References

Altman, R.B., 1998. A curriculum for bioinformatics: the time is ripe. Bioinformatics 14 (7), 549–550.
Ranganathan, S., 2005. Bioinformatics education–perspectives and challenges. PLoS Comput Biol 1 (6), e52.
Tan, T.W., Lim, S.J., Khan, A.M., Ranganathan, S., 2009. A proposed minimum skill set for university graduates to meet the informatics needs and challenges of the “-omics” era. BMC Genomics 10 (Suppl 3), S36.
Ranganathan, S., 2017. Bioinformatics. Reference Module in Life Sciences. Oxford: Elsevier.

Shoba Ranganathan

Algorithms Foundations

Nadia Pisanti, University of Pisa, Pisa, Italy


Introduction

Biology offers a huge amount and variety of data to be processed. Such data has to be stored, analysed, compared, searched, classified, etcetera, feeding many fields of computer science with new challenges. Among them, algorithmics plays a special role in the analysis of biological sequences, structures, and networks. Indeed, especially due to the flood of data coming from sequencing projects as well as from its down-stream analysis, the size of digital biological data to be studied requires the design of very efficient algorithms. Moreover, biology has become, probably more than any other fundamental science, a great source of new algorithmic problems asking for accurate solutions. Nowadays, biologists increasingly need to work with in silico data, and therefore it is important for them to understand why and how an algorithm works, in order to be confident in its results. The goal of this chapter is to give non-computer scientists an overview of the fundamentals of algorithm design and evaluation.

Algorithms and Their Complexity

Computationally speaking, a problem is defined by an input/output relation: we are given an input, and we want to return as output a well-defined solution which is a function of the input satisfying some property.

An algorithm is a computational procedure (described by means of an unambiguous sequence of instructions) that has to be executed in order to solve a computational problem. An algorithm solving a given problem is correct if it outputs the right result for every possible input. The algorithm has to be described according to the entity which will execute it: if this is a computer, then the algorithm will have to be written in a programming language.

Example: Sorting Problem

INPUT: A sequence S of n numbers $\langle a_1, a_2, \ldots, a_n \rangle$

OUTPUT: A permutation $\langle a'_1, a'_2, \ldots, a'_n \rangle$ of S such that $a'_1 \le a'_2 \le \ldots \le a'_n$
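The input/output relation of the sorting problem can be made concrete with a small checker. The following Python sketch is only an illustration: the function name `is_valid_sorting_output` is an assumption introduced here and does not come from the chapter. It tests the two properties stated in the OUTPUT line, namely that the result is a permutation of S and that it is in non-decreasing order.

```python
from collections import Counter


def is_valid_sorting_output(s, out):
    """Return True if `out` is a permutation of `s` in non-decreasing order.

    Hypothetical helper, named here for illustration only.
    """
    # Permutation check: both sequences hold the same values
    # with the same multiplicities.
    same_elements = Counter(s) == Counter(out)
    # Order check: every element is <= its successor.
    non_decreasing = all(out[i] <= out[i + 1] for i in range(len(out) - 1))
    return same_elements and non_decreasing
```

A checker of this kind says nothing about how the output is produced; it only states what a correct algorithm for the problem must return.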

Given a problem, there can be many algorithms that correctly solve it, but in general they will not all be equally efficient. The efficiency of an algorithm is a function of its input size.

For example, a solution for the sorting problem would be to generate all possible permutations of S and, for each one of them, check whether it is sorted. With this procedure, one needs to be lucky to find the right ordering fast, as there are n! such permutations, a number at least exponential in n, and in the average case, as well as in the worst case, this algorithm would require a number of elementary operations (such as writing a value in a memory cell, comparing two values, swapping two values, etcetera) which is exponential in the input size n. In this case, since the worst case cannot be excluded, we say that the algorithm has an exponential time complexity. In computer science, exponential algorithms are considered intractable. An algorithm is, instead, tractable if its complexity function is polynomial in the input size. The complexity of a problem is that of the most efficient algorithm that solves it. Fortunately, the sorting problem is tractable, as there exist tractable solutions that we will describe later.
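The brute-force procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an implementation from the chapter: the names `is_sorted` and `permutation_sort` are hypothetical, and the counter of tried permutations is added only to make the cost visible.

```python
from itertools import permutations


def is_sorted(seq):
    """Return True if seq is in non-decreasing order."""
    return all(seq[i] <= seq[i + 1] for i in range(len(seq) - 1))


def permutation_sort(s):
    """Brute-force sort: try permutations of s until a sorted one appears.

    Returns the sorted list and the number of permutations examined.
    """
    tried = 0
    for candidate in permutations(s):
        tried += 1
        if is_sorted(candidate):
            return list(candidate), tried
    # Some permutation of a finite sequence is always sorted,
    # so this point is never reached.
    raise AssertionError("unreachable")


# A reverse-sorted input is a worst case for this enumeration order:
# the sorted permutation is the last of the n! candidates.
result, tried = permutation_sort([4, 3, 2, 1])
```

With the enumeration order used by `itertools.permutations`, an already sorted input is found at the first attempt, while a reverse-sorted input of 4 elements requires all 4! = 24 attempts. This is the "luck" the text refers to.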

In order to evaluate the running time of an algorithm independently of the specific hardware on which it is executed, this is computed in terms of the number of simple operations, each of which is assigned a unitary cost or, in any case, a cost which is constant with respect to the input size. A constant running time is a negligible cost, as it does not grow when the input size does; moreover, a constant term added to a higher degree polynomial in n is also negligible; furthermore, even a constant factor multiplying a higher polynomial is considered negligible in running time analysis. What counts is the growth factor with respect to the input size, i.e. the asymptotic complexity T(n) as the input size n grows. In computational complexity theory, this is formalized using the big-O notation that excludes both coefficients and lower order terms: the asymptotic time complexity T(n) of an algorithm is in O(f(n)) if there exist n₀ and c > 0 such that T(n) ≤ c f(n) for all n ≥ n₀. For example, an algorithm that scans an input of size n a constant number of times, and then performs a constant number of some other operations, takes O(n) time, and is said to have linear time complexity. An algorithm that takes linear time only in the worst case is also said to be in O(n), because the big-O notation represents an upper bound. There is also an asymptotic complexity notation Ω(f(n)) for the lower bound: T(n) = Ω(f(n)) whenever f(n) = O(T(n)). A third notation Θ(f(n)) denotes asymptotic equivalence: we write T(n) = Θ(f(n)) if both T(n) = O(f(n)) and f(n) = O(T(n)) hold. For example, an algorithm that always performs a linear scan of the input, and not just in the worst case, has time complexity in Θ(n). Finally, an algorithm which needs to at least read, hence scan, the whole input of size n (and possibly also perform more costly tasks), has time complexity in Ω(n).
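The definition of big-O asks for two witnesses, c and n₀. As a hedged illustration, the Python sketch below spot-checks candidate witnesses for a hypothetical operation count T(n) = 3n² + 5n + 2 against f(n) = n². Both the cost function and the helper name `holds_from` are assumptions chosen for this example, and a finite check of this kind is of course not a proof.

```python
def holds_from(T, f, c, n0, n_max=10_000):
    """Check T(n) <= c * f(n) for every n with n0 <= n <= n_max (finite spot check)."""
    return all(T(n) <= c * f(n) for n in range(n0, n_max + 1))


def T(n):
    # Hypothetical number of elementary operations for an input of size n.
    return 3 * n * n + 5 * n + 2


def f(n):
    # Candidate asymptotic bound.
    return n * n


# c = 4 and n0 = 6 work: 3n^2 + 5n + 2 <= 4n^2 exactly when n^2 - 5n - 2 >= 0.
print(holds_from(T, f, c=4, n0=6))
# The same c fails if we start too early (n = 1 gives 10 > 4).
print(holds_from(T, f, c=4, n0=1))
```

The coefficient 3 and the lower order terms 5n + 2 are absorbed by the choice of c and n₀, which is why T(n) is simply said to be in O(n²).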

Time complexity is not the only cost parameter of an algorithm: space complexity is also relevant to evaluate its efficiency. By space complexity, computer scientists do not mean the size of the program describing an algorithm, but rather the data structures this actually keeps in memory during its execution. As for time complexity, the concern is about how much memory the execution takes in the worst case and with respect to the input size. For example, an algorithm solving the sorting problem without requiring any additional data structure (besides possibly a constant number of constant-size variables) would have linear space complexity. The exponential time complexity algorithm we described above also has linear space complexity: at each step, it suffices to keep in memory only one permutation of S, as those previously attempted can be discarded. This observation offers an example of why, often, time complexity is of more concern than space complexity. The reason is not that space is less relevant than time, but rather that space complexity is in practice a lower bound of (and thus not larger than) time complexity: if an algorithm has to write and/or read a certain amount of data, then it necessarily has to perform at least that number of elementary steps (Cormen et al., 2009; Jones and Pevzner, 2004).

Iterative Algorithms

An iterative algorithm is an algorithm which repeats the same sequence of actions several times; the number of such times does not need to be known a priori, but it has to be finite. In programming languages, there are basically two kinds of iterative commands: the for command repeats the actions a number of times which is computed, or anyhow known, before the iterations begin; the while command, instead, performs the actions as long as a certain given condition is satisfied, and the number of times this will occur is not known a priori. What we call here an action is a command which can be, in turn, again iterative. The cost of an iterative command is the cost of its actions multiplied by the number of iterations.
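The difference between the two iterative commands, and the rule that the cost is the cost of the actions times the number of iterations, can be illustrated with a small Python sketch. The two functions below are hypothetical examples that simply count how many times the innermost action is executed.

```python
def count_nested_for(n, m):
    """Two nested for commands: the iteration counts are known before starting."""
    steps = 0
    for _ in range(n):        # outer command: n iterations
        for _ in range(m):    # inner command (the "action"): m iterations each time
            steps += 1        # constant-cost action
    return steps              # n * m in total


def count_while_halving(x):
    """A while command: it runs as long as its condition holds."""
    steps = 0
    while x > 1:              # the condition is re-checked before every iteration
        x //= 2
        steps += 1
    return steps
```

For instance, `count_nested_for(3, 4)` performs 3 × 4 = 12 steps, while the number of steps of `count_while_halving` depends on the value reached at each iteration rather than on a counter fixed in advance.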

From now on, in this article we will describe an algorithm by means of the so-called pseudocode: an informal description of a real computer program, which is a mixture of natural language and keywords representing commands that are typical of programming languages. To this purpose, before exhibiting an example of an iterative algorithm for the sorting problem, we introduce the syntax of a fundamental elementary command: the assignment “x ← E”, whose effect is to assign the value of an expression E to the variable x, and whose time cost is constant, provided that computing the value of E, which can in turn contain variables as well as calls of functions, is also constant. We will assume that the input sequence S of the sorting problem is given as an array: an array is a data structure of known fixed length that contains elements of the same type (in this case numbers). The i-th element of array S is denoted by S[i], and reading or writing S[i] takes constant time. Swapping two values of the array also takes constant time, and we will denote this as a single command in our pseudocode, even if in practice it will be implemented by a few operations that use a third temporary variable. What follows is the pseudocode of an algorithm that solves the sorting problem in polynomial time.

INSERTION-SORT(S, n)
    for i = 1 to n − 1 do
        j ← i
        while (j > 0 and S[j − 1] > S[j])
            swap S[j] and S[j − 1]
            j ← j − 1
        endwhile
    endfor

INSERTION-SORT takes as input the array S and its size n. It works iteratively by inserting the elements, one after the other, into the partially sorted S. The array is indexed from 0 to n − 1, and a for command performs actions for each i in the interval [1, n − 1] so that at the end of iteration i, the left end of the array up to its i-th position is sorted. This is realized by means of another iterative command, nested into the first one, that uses a second index j that starts from i, compares S[j] (the new element) with its predecessor, and possibly swaps them so that S[j] moves down towards its right position; then j is decreased and the task is repeated until S[j] has reached its correct position; this inner iterative command is a while command because this task has to be performed as long as the predecessor of S[j] is larger than it.
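As a minimal sketch, the INSERTION-SORT pseudocode translates almost line by line into Python. The function name `insertion_sort` and the choice of returning the (in-place modified) list are assumptions made for convenience; the size n is read from the list instead of being passed as a parameter.

```python
def insertion_sort(S):
    """Sort the list S in place, following the INSERTION-SORT pseudocode, and return it."""
    n = len(S)
    for i in range(1, n):                    # for i = 1 to n - 1
        j = i
        # Move S[j] leftwards as long as its predecessor is larger.
        while j > 0 and S[j - 1] > S[j]:
            S[j], S[j - 1] = S[j - 1], S[j]  # swap, done via a tuple assignment
            j -= 1
    return S


print(insertion_sort([3, 2, 7, 1]))
```

Python's tuple assignment plays the role of the single swap command of the pseudocode, hiding the temporary variable that other languages would need.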

Example: Let us consider S = [3, 2, 7, 1]. Recall that arrays are indexed from position 0 (that is, S[0] = 3, S[1] = 2, and so on).

INSERTION-SORT for i = 1 sets j = 1 as well, and then executes the while because j = 1 > 0 and S[0] > S[1]: these two values are swapped and j becomes 0 so that the while command ends with S = [2, 3, 7, 1]. Then a new for iteration starts with i = 2 (notice that at this time, correctly, S is sorted up to S[1]), and S[2] is taken into account; this time the while command is entered with j = 2 and its condition is not satisfied (as S[2] > S[1]) so that the while immediately ends without changing S: the first three values of S are already sorted. Finally, the last for iteration with i = 3 will execute the while three times (that is, n − 1) swapping 1 with 7, then with 3, and finally with 2, leading to S = [1, 2, 3, 7] which is the correct output.

INSERTION-SORT takes at least linear time (that is, its time complexity is in Ω(n)) because all elements of S must be read, and indeed the for command is executed Θ(n) times: once for each array position from the second to the last. The invariant is that at the beginning of each such iteration, the array is sorted up to position i − 1, and then the new value at S[i] is processed. Each iteration of the for, besides the constant time (hence negligible) assignment j ← i, executes the while command. This latter checks its condition (in constant time) and, if the newly read element S[j] is greater than, or equal to, S[j − 1] (which is the largest of the so far sorted array), then it does nothing; else, it swaps S[j] and S[j − 1], decreases j, checks the condition again, and possibly repeats these actions, until either S[j] finds its place after a smaller value, or it becomes the new first element of S as it is the smallest found so far. Therefore, the actions of the while command are never executed if the array is already sorted. This is the best case time complexity of INSERTION-SORT: linear in the input size n. The worst case is, instead, when the input array is sorted in the reverse order: in this case, at each iteration i, the while command has to perform exactly i swaps to let S[j] move down to the first position. Therefore, in this case, iteration i of the for takes i steps, and there are n − 1 such iterations, one for each 1 ≤ i ≤ n − 1. Hence, the worst case running time is

1 + 2 + … + (n − 1) = n(n − 1)/2 = Θ(n²).

As for space complexity, INSERTION-SORT works within the input array plus a constant number of temporary variables, and hence it has linear space complexity. Since n is also a lower bound (the whole array must be stored), in this case the space complexity is optimal.
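The best and worst cases of INSERTION-SORT can be observed by counting swaps. The sketch below is an instrumented variant written for illustration only (the name `count_swaps` is an assumption): with i swaps at iteration i, a reverse-sorted input of size n should need 1 + 2 + … + (n − 1) = n(n − 1)/2 swaps, while an already sorted input should need none.

```python
def count_swaps(values):
    """Run insertion sort on a copy of `values` and return the number of swaps performed."""
    S = list(values)           # work on a copy so the caller's data is left untouched
    swaps = 0
    for i in range(1, len(S)):
        j = i
        while j > 0 and S[j - 1] > S[j]:
            S[j], S[j - 1] = S[j - 1], S[j]
            swaps += 1
            j -= 1
    return swaps


n = 10
print(count_swaps(range(n)))             # best case: already sorted
print(count_swaps(range(n - 1, -1, -1)))  # worst case: reverse sorted
print(n * (n - 1) // 2)                  # closed form for the worst case
```

Note that only the counter and the two loop indices are added to the input copy, in line with the constant number of temporary variables mentioned above.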

The algorithm we just described is an example of an iterative algorithm that realises a quite intuitive sorting strategy; indeed, this algorithm is often explained as the way we would sort playing cards in one hand by using the other hand to iteratively insert each new card in its correct position. Iteration is powerful enough to achieve, for our sorting problem, a polynomial time (although almost trivial) solution; the time complexity of INSERTION-SORT cannot, however, be proved to be optimal, as the lower bound for the sorting problem is not n², but rather n log₂n (result not proved here). In order to achieve O(n log₂n) time complexity we need an even more powerful paradigm that we will introduce in the next section.

Recursive Algorithms

A recursive algorithm is an algorithm which, among its commands, recursively calls itself on smaller instances: it splits the main problem into subproblems, recursively solves them and combines their solutions in order to build up the solution of the original problem. There is a fascinating mathematical foundation, that goes back to the arithmetic of Peano, and even further back to induction theory, for the conditions that guarantee correctness of a recursive algorithm. We will omit details of this involved mathematical framework. Surprisingly enough, for a computer this apparently very complex paradigm is easy to implement by means of a simple data structure (the stack).

In order to show how powerful induction is, we will use again our Sorting Problem running example. Namely, we describe here the recursive MERGE-SORT algorithm which achieves Θ(n log₂n) time complexity, and is thus optimal. Basically, the algorithm MERGE-SORT splits the array into two halves, sorts them (by means of two recursive calls on as many sub-arrays of size n/2 each), and then merges the outcomes into a whole sorted array. The two recursive calls, in turn, will recursively split again into sub-arrays of size n/4, and so on, until the base case (the already sorted sub-array of size 1) is reached. The merging procedure will be implemented by the function MERGE (pseudocode not shown) which takes as input the array and the starting and ending positions of its portions that contain the two contiguous sub-arrays to be merged. Recalling that the two half-arrays to be merged are sorted, MERGE simply uses two indices sliding along them from left to right and, at each step, makes a comparison, writes the smallest, and increases the index of the sub-array which contained it. This is done until both sub-arrays have been entirely written into the result.
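Since the pseudocode of MERGE is not shown, the following Python function is only one possible sketch of the procedure described above, not the authors' version. It assumes the two sorted runs occupy S[p..q] and S[q+1..r] (inclusive bounds), and it copies both runs before merging, so it uses a linear amount of auxiliary memory.

```python
def merge(S, p, q, r):
    """Merge the sorted runs S[p..q] and S[q+1..r] (inclusive) back into S[p..r]."""
    left = S[p:q + 1]          # copy of the first sorted run
    right = S[q + 1:r + 1]     # copy of the second sorted run
    a = b = 0                  # sliding indices into left and right
    k = p                      # next position to write in S
    while a < len(left) and b < len(right):
        # Compare the two current elements and write the smaller one.
        if left[a] <= right[b]:
            S[k] = left[a]
            a += 1
        else:
            S[k] = right[b]
            b += 1
        k += 1
    # One run is exhausted: copy whatever remains of the other.
    S[k:r + 1] = left[a:] + right[b:]
```

Each element of the two runs is written exactly once, which is why the merging step takes time linear in the total length r − p + 1.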

MERGE-SORT(S, p, r)
    if p < r then
        q ← ⌊(p + r)/2⌋
        MERGE-SORT(S, p, q)
        MERGE-SORT(S, q + 1, r)
        MERGE(S, p, q, r)
    endif

Given the need of calling the algorithm on different array fragments, the input parameters, besides S itself, will be the starting and ending positions of the portion of the array to be sorted. Therefore, the first call will be MERGE-SORT(S, 0, n − 1). Then the index q which splits S in two halves is computed, and the two sub-arrays so found are sorted by means of as many recursive calls; the two resulting sorted arrays of size n/2 are then fused by MERGE into the final result. The correctness of the recursion follows from the fact that the recursive call is done on a half-long array, and from the termination condition “p < r”: if this holds, then the recursion goes on; else (p = r) there is nothing to do as the array has length 1 and it is sorted. Notice, indeed, that if S is not empty, then p > r can never hold as q is computed such that p ≤ q < r.
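The recursive structure of MERGE-SORT can be sketched in Python as follows. To keep the example short, the standard library function `heapq.merge`, which lazily merges already sorted iterables, stands in for MERGE; this substitution, like the name `merge_sort`, is an assumption of this sketch and not the chapter's own procedure.

```python
from heapq import merge as merge_sorted  # stand-in for the MERGE procedure


def merge_sort(S, p, r):
    """Sort S[p..r] (inclusive bounds) in place, following MERGE-SORT, and return S."""
    if p < r:                              # more than one element left
        q = (p + r) // 2                   # floor of the midpoint
        merge_sort(S, p, q)                # sort the left half
        merge_sort(S, q + 1, r)            # sort the right half
        # Fuse the two sorted halves back into S[p..r].
        S[p:r + 1] = list(merge_sorted(S[p:q + 1], S[q + 1:r + 1]))
    return S


data = [3, 2, 7, 1]
print(merge_sort(data, 0, len(data) - 1))  # first call: MERGE-SORT(S, 0, n - 1)
```

When p = r the function returns immediately, which is the base case of the recursion; the slices passed to the merging step are temporary copies, so this sketch trades some extra memory for brevity.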

The algorithm MERGE-SORT has linear (hence optimal) space complexity as it only uses S itself plus a constant number of variables. The time complexity T(n) of MERGE-SORT can be defined by the following recurrence relation:

T(1) = Θ(1), and T(n) = 2T(n/2) + Θ(n) for n > 1,

because, with an input of size n, MERGE-SORT calls itself twice on arrays of size n/2, and then calls MERGE which takes, as we showed above, Θ(n) time.
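The growth of this running time can be explored numerically. The sketch below fixes one concrete instance of the cost described above, assuming that MERGE costs exactly n steps and that the base case costs exactly 1 step (both constants are illustrative assumptions), and compares the result with n log₂n + n for inputs whose size is a power of two.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def T(n):
    """Steps for an input of size n: two half-size calls plus n steps for merging."""
    if n == 1:
        return 1               # assumed constant cost of the base case
    return 2 * T(n // 2) + n   # assumed merging cost of exactly n steps


for k in range(0, 6):
    n = 2 ** k
    # For n = 2^k, n * log2(n) + n equals n * k + n.
    print(n, T(n), n * k + n)
```

Under these assumptions the two columns coincide on powers of two, which is consistent with a time complexity of Θ(n log₂n).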

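We now show by induction on n that T(n) = Θ(n log₂n). The base case is simple: if n = 1 then S is already sorted and, correctly, MERGE-SORT does nothing and ends in Θ(1) time. If n > 1, assuming that T(n′) = Θ(n′ log₂n′) holds for n′ < n, then we have

T(n) = 2T(n/2) + Θ(n) = 2Θ((n/2) log₂(n/2)) + Θ(n) = Θ(n log₂n − n) + Θ(n) = Θ(n log₂n).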
