ENCYCLOPEDIA OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY
EDITORS IN CHIEF
Shoba Ranganathan
Macquarie University, Sydney, NSW, Australia
Michael Gribskov
Purdue University, West Lafayette, IN, United States
Kenta Nakai
The University of Tokyo, Tokyo, Japan
Christian Schönbach
Nazarbayev University, School of Science and Technology, Department of Biology, Astana, Kazakhstan
VOLUME 1 Methods
Mario Cannataro
The Magna Græcia University of Catanzaro, Catanzaro, Italy
Elsevier
Radarweg 29, PO Box 211, 1000 AE Amsterdam, Netherlands
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
50 Hampshire Street, 5th Floor, Cambridge MA 02139, United States
Copyright © 2019 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers may always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
ISBN 978-0-12-811414-8
For information on all publications visit our website at http://store.elsevier.com
Publisher: Oliver Walter
Acquisition Editor: Sam Crowe
Content Project Manager: Paula Davies
Associate Content Project Manager: Ebin Clinton Rozario
Designer: Greg Harris
Printed and bound in the United States
EDITORS IN CHIEF
Shoba Ranganathan has held a Chair in Bioinformatics at Macquarie University since 2004. She has held research and academic positions in India, USA, Singapore and Australia as well as a consultancy in industry. She hosted the Macquarie Node of the ARC Centre of Excellence in Bioinformatics (2008–2013). She was elected the first Australian Board Director of the International Society for Computational Biology (ISCB; 2003–2005), and served as President of the Asia-Pacific Bioinformatics Network (2005–2016) and as Steering Committee Member (2007–2012) of Bioinformatics Australia. She initiated the Workshops on Education in Bioinformatics (WEB) as an ISMB 2001 Special Interest Group meeting and also served as Chair of ISCB's Education Committee. Shoba currently serves as Co-Chair of the Computational Mass Spectrometry (CompMS) initiative of the Human Proteome Organization (HuPO), ISCB and Metabolomics Society and as Board Director, APBioNet Ltd.
Shoba's research addresses several key areas of bioinformatics to understand biological systems using computational approaches. Her group has developed experience and expertise in different aspects of computational biology, ranging from metabolites and small molecules to biochemical networks, pathway analysis and computational systems biology. She has authored and edited several books, as well as articles for the 2013 Encyclopedia of Systems Biology. She is currently an Editor-in-Chief of the Encyclopedia of Bioinformatics and Computational Biology and the Bioinformatics Section Editor of the Reference Module in Life Science, as well as an editorial board member of several bioinformatics journals.
Dr. Gribskov graduated from Oregon State University in 1979 with a Bachelor of Science degree (with Honors) in Biochemistry and Biophysics. He then moved to the University of Wisconsin-Madison for graduate studies focused on the structure and function of the sigma subunit of E. coli RNA polymerase, receiving his Ph.D. in 1985. Dr. Gribskov studied X-ray crystallography as an American Cancer Society post-doctoral fellow at UCLA in the laboratory of David Eisenberg, and followed this with both crystallographic and computational studies at the National Cancer Institute. In 1992, Dr. Gribskov moved to the San Diego Supercomputer Center at the University of California, San Diego, where he was lead scientist in the area of computational biology and an adjunct associate professor in the department of Biology. From 2003 to 2007, Dr. Gribskov was the president of the International Society for Computational Biology, the largest professional society devoted to bioinformatics and computational biology. In 2004, Dr. Gribskov moved to Purdue University, where he holds an appointment as a full professor in the Biological Sciences and Computer Science departments (by courtesy). Dr. Gribskov's interests include genomic and transcriptomic analysis of model and non-model organisms, the application of pattern recognition and machine learning techniques to biomolecules, the design and implementation of biological databases to support molecular and systems biology, development of methods to study RNA structural patterns, and systems biology studies of human disease.
Kenta Nakai received his PhD degree from Kyoto University in 1992 for work on the prediction of subcellular localization sites of proteins. From 1989, he worked at Kyoto University, the National Institute of Basic Biology, and Osaka University. From 1999 to 2003, he was an Associate Professor at the Human Genome Center, the Institute of Medical Science, the University of Tokyo, Japan. Since 2003, he has been a full Professor at the same institute. His main research interest is to develop computational ways of interpreting biological information, especially that of transcriptional regulation, from genome sequence data. He has published more than 150 papers, some of which have been cited more than 1,000 times.
Christian Schönbach is currently Department Chair and Professor at the Department of Biology, School of Science and Technology, Nazarbayev University, Kazakhstan, and Visiting Professor at the International Research Center for Medical Sciences at Kumamoto University, Japan. He is a bioinformatics practitioner interfacing genetics, immunology and informatics, conducting research on the major histocompatibility complex, immune responses following virus infection, biomedical knowledge discovery, peroxisomal diseases, and autism spectrum disorder that has resulted in more than 80 publications. His previous academic appointments included Professor at Kumamoto University, Japan (2016–2017), Nazarbayev University, Kazakhstan (2013–2016) and Kyushu Institute of Technology, Japan (2009–2013); Associate Professor at Nanyang Technological University, Singapore (2006–2009); and Team Leader at RIKEN Genomic Sciences Center, Japan (2002–2006). Other prior positions included Principal Investigator at Kent Ridge Digital Labs, Singapore, and Research Scientist at Chugai Institute for Molecular Medicine, Inc., Japan. In 2018 he became a member of the International Society for Computational Biology (ISCB) Board of Directors. Since 2010 he has served the Asia-Pacific Bioinformatics Network (APBioNet) as Vice-President (Conferences, 2010–2016) and President (2016–2018).
VOLUME EDITORS
Mario Cannataro is a Full Professor of Computer Engineering and Bioinformatics at University “Magna Graecia” of Catanzaro, Italy. He is the director of the Data Analytics research center and the chair of the Bioinformatics Laboratory at University “Magna Graecia” of Catanzaro. His current research interests include bioinformatics, medical informatics, data analytics, and parallel and distributed computing. He is a member of the editorial boards of Briefings in Bioinformatics, High-Throughput, Encyclopedia of Bioinformatics and Computational Biology, and Encyclopedia of Systems Biology. He was guest editor of several special issues on bioinformatics and he is serving as a program committee member of several conferences. He has published three books and more than 200 papers in international journals and conference proceedings. Prof. Cannataro is a Senior Member of IEEE, ACM and BITS, and a member of the Board of Directors for ACM SIGBIO.
Bruno Gaeta is Senior Lecturer and Director of Studies in Bioinformatics in the School of Computer Science and Engineering at UNSW Australia. His research interests cover multiple areas of bioinformatics including gene regulation and protein structure, currently with a focus on the immune system, antibody genes and the generation of antibody diversity. He is a pioneer of bioinformatics education and has trained thousands of biologists and trainee bioinformaticians in the use of computational tools for biological research through courses and workshops, as well as a book series. He has worked both in academia and in the bioinformatics industry, and currently coordinates the largest bioinformatics undergraduate program in Australia.
Mohammad Asif Khan, PhD, is an associate professor and the Dean of the School of Data Sciences, as well as the Director of the Centre for Bioinformatics at Perdana University, Malaysia. He is also a visiting scientist at the Department of Pharmacology and Molecular Sciences, Johns Hopkins University School of Medicine (JHUSOM), USA. His research interests are in the area of biological data warehousing and applications of bioinformatics to the study of immune responses, vaccines, inhibitory drugs, venom toxins, and disease biomarkers. He has published in these areas, has been involved in the development of several novel bioinformatics methodologies, tools, and specialized databases, and currently has three patent applications granted. He has also led the curriculum development of a Postgraduate Diploma in Bioinformatics programme and an MSc (Bioinformatics) programme at Perdana University. He has been an elected ExCo member of the Asia-Pacific Bioinformatics Network (APBioNET) since 2010 and is currently the President of the Association for Medical and Bio-Informatics, Singapore (AMBIS). He has held various important roles in the organization of many local and international bioinformatics conferences, meetings and workshops.
CONTENTS OF VOLUME 1
Editors in Chief
Standards and Models for Biological Data: Common Formats
Standards and Models for Biological Data: BioPAX
Computing for Bioinformatics
Computing Languages for Bioinformatics: Perl
Computing Languages for Bioinformatics: BioPerl
Computing Languages for Bioinformatics: Python
Computing Languages for Bioinformatics: R
Computing Languages for Bioinformatics: Java
MapReduce in Computational Biology Via Hadoop and Spark
Giuseppe Cattaneo, Raffaele Giancarlo, Umberto Ferraro Petrillo, and Gianluca Roscigno
Infrastructures for High-Performance Computing: Cloud Computing
Infrastructures for High-Performance Computing: Cloud Infrastructures
Infrastructures for High-Performance Computing: Cloud Computing Development Environments
The Challenge of Privacy in the Cloud
Francesco Buccafurri, Vincenzo De Angelis, Gianluca Lax, Serena Nicolazzo, and Antonino Nocera
Artificial Intelligence and Machine Learning in Bioinformatics
Kaitao Lai, Natalie Twine, Aidan O’Brien, Yi Guo, and Denis Bauer
Machine Learning in Bioinformatics
Jyotsna T Wassan, Haiying Wang, and Huiru Zheng
Intelligent Agents and Environment
Alfredo Garro, Max Mühlhäuser, Andrea Tundis, Stefano Mariani, Andrea Omicini, and Giuseppe Vizzari
Intelligent Agents: Multi-Agent Systems
Alfredo Garro, Max Mühlhäuser, Andrea Tundis, Matteo Baldoni, Cristina Baroglio, Federico Bergenti, and Paolo Torroni
Stochastic Methods for Global Optimization and Problem Solving
Giovanni Stracquadanio and Panos M Pardalos
Data Mining: Classification and Prediction
Alfonso Urso, Antonino Fiannaca, Massimo La Rosa, Valentina Ravì, and Riccardo Rizzo
Bayes’ Theorem and Naive Bayes Classifier
Data Mining: Prediction Methods
Data Mining: Accuracy and Error Measures for Classification and Prediction
Paola Galdi and Roberto Tagliaferri
Data Mining: Clustering
Data Mining: Outlier Detection
Pre-Processing: A Data Preparation Step
Swarup Roy, Pooja Sharma, Keshab Nath, Dhruba K Bhattacharyya, and Jugal K Kalita
Kernel Machines: Introduction
Italo Zoppis, Giancarlo Mauri, and Riccardo Dondi
Kernel Methods: Support Vector Machines
Italo Zoppis, Giancarlo Mauri, and Riccardo Dondi
Kernel Machines: Applications
Italo Zoppis, Giancarlo Mauri, and Riccardo Dondi
Text Mining Basics in Bioinformatics
Carmen De Maio, Giuseppe Fenza, Vincenzo Loia, and Mimmo Parente
Data-Information-Concept Continuum From a Text Mining Perspective
Danilo Cavaliere, Sabrina Senatore, and Vincenzo Loia
Text Mining for Bioinformatics Using Biomedical Literature
Andre Lamurias and Francisco M Couto
Deep Learning
Massimo Guarascio, Giuseppe Manco, and Ettore Ritacco
Biological and Medical Ontologies: Human Phenotype Ontology (HPO)
Biological and Medical Ontologies: Systems Biology Ontology (SBO)
Raffaele Giancarlo, Daniele Greco, Francesco Landolina, and Simona E Rombo
Max Kotlyar, Chiara Pastrello, Andrea EM Rossos, and Igor Jurisica
Alignment of Protein-Protein Interaction Networks
Swarup Roy, Hazel N Manners, Ahed Elmsallati, and Jugal K Kalita
Visualization of Biomedical Networks
Anne-Christin Hauschild, Chiara Pastrello, Andrea EM Rossos, and Igor Jurisica
Cluster Analysis of Biological Networks
Asuda Sharma, Hesham Ali, and Dario Ghersi
Biological Pathways
Giuseppe Agapito
Biological Pathway Data Formats and Standards
Ramakanth C Venkata and Dario Ghersi
Biological Pathway Analysis
Ramakanth Chirravuri Venkata and Dario Ghersi
Two Decades of Biological Pathway Databases: Results and Challenges
Sara Rahmati, Chiara Pastrello, Andrea EM Rossos, and Igor Jurisica
Visualization of Biological Pathways
Giuseppe Agapito
Integrative Bioinformatics
Marco Masseroli
Integrative Bioinformatics of Transcriptome: Databases, Tools and Pipelines
Maria T Di Martino and Pietro H Guzzi
Information Retrieval in Life Sciences
Pietro Cinaglia, Domenico Mirarchi, and Pierangelo Veltri
LIST OF CONTRIBUTORS FOR VOLUME 1
Giuseppe Agapito, University “Magna Graecia” of Catanzaro, Catanzaro, Italy
Hesham Ali, University of Nebraska at Omaha, Omaha, NE, United States
Alessia Amelio, University of Calabria, Rende, Italy
Claudia Angelini, Istituto per le Applicazioni del Calcolo “M. Picone”, Napoli, Italy
Fabrizio Angiulli, University of Calabria, Rende, Italy
Matteo Baldoni, University of Turin, Turin, Italy
Cristina Baroglio, University of Turin, Turin, Italy
Denis Bauer, CSIRO, North Ryde, NSW, Australia
Stefano Beretta, University of Milano-Bicocca, Milan, Italy
Federico Bergenti, University of Parma, Parma, Italy
Anna Bernasconi, Politecnico di Milano, Milan, Italy
Daniel Berrar, Tokyo Institute of Technology, Tokyo, Japan
Dhruba K. Bhattacharyya, Tezpur University, Tezpur, India
Mariaconcetta Bilotta, University of Catanzaro, Catanzaro, Italy; and Institute S. Anna of Crotone, Crotone, Italy
Francesco Buccafurri, University of Reggio Calabria, Italy
Massimo Cafaro, University of Salento, Lecce, Italy
Barbara Calabrese, University “Magna Graecia” of Catanzaro, Catanzaro, Italy
Mario Cannataro, University “Magna Graecia” of Catanzaro, Catanzaro, Italy
Maria Francesca Carfora, Istituto per le Applicazioni del Calcolo CNR, Napoli, Italy
Mauro Castelli, NOVA IMS, Universidade Nova de Lisboa, Lisboa, Portugal
Giuseppe Cattaneo, University of Salerno, Fisciano, Italy
Francesco Cauteruccio, University of Calabria, Rende, Italy
Danilo Cavaliere, Università degli Studi di Salerno, Fisciano, Italy
Davide Chicco, Princess Margaret Cancer Centre, Toronto, ON, Canada
Pietro Cinaglia, Magna Graecia University of Catanzaro, Catanzaro, Italy
Francisco M. Couto, Universidade de Lisboa, Lisboa, Portugal
Luisa Cutillo, University of Sheffield, Sheffield, United Kingdom; and Parthenope University of Naples, Naples, Italy
Vincenzo De Angelis, University of Reggio Calabria, Italy
Daniela De Canditiis, Istituto per le Applicazioni del Calcolo “M. Picone”, Rome, Italy
Italia De Feis, Istituto per le Applicazioni del Calcolo CNR, Napoli, Italy
Carmen De Maio, University of Salerno, Fisciano, Italy
Luca Denti, University of Milano-Bicocca, Milan, Italy
Giuseppe Di Fatta, University of Reading, Reading, United Kingdom
Maria T. Di Martino, University “Magna Graecia” of Catanzaro, Catanzaro, Italy
Riccardo Dondi, University of Bergamo, Bergamo, Italy
Ahed Elmsallati, McKendree University, Lebanon, IL, United States
Italo Epicoco, University of Salento, Lecce, Italy
Giuseppe Fenza, University of Salerno, Fisciano, Italy
Antonino Fiannaca, Via Ugo La Malfa, Palermo, Italy
Valeria Fionda, University of Calabria, Rende, Italy
Monica Franzese, Institute for Applied Mathematics “Mauro Picone”, Napoli, Italy
Michele Fratello, DP Control, Salerno, Italy
Paola Galdi, University of Salerno, Fisciano, Italy
Alfredo Garro, University of Calabria, Rende, Italy
Dario Ghersi, University of Nebraska at Omaha, Omaha, NE, United States
Raffaele Giancarlo, University of Palermo, Palermo, Italy
Gianluigi Greco, University of Calabria, Cosenza, Italy
Daniele Greco, University of Palermo, Palermo, Italy
Massimo Guarascio, ICAR-CNR, Rende, Italy
Yi Guo, Western Sydney University, Penrith, NSW, Australia
Pietro H. Guzzi, University “Magna Graecia” of Catanzaro, Catanzaro, Italy
Antonella Guzzo, University of Calabria, Rende, Italy
Xu Han, Nanyang Technological University, Singapore
Anne-Christin Hauschild, Krembil Research Institute, Toronto, ON, Canada
Antonella Iuliano, Institute for Applied Mathematics “Mauro Picone”, Napoli, Italy
Audrone Jakaitiene, Vilnius University, Vilnius, Lithuania
B. Jayaram, IIT Delhi, New Delhi, India
Igor Jurisica, University of Toronto, ON, Canada; and Slovak Academy of Sciences, Bratislava, Slovakia
Jugal K. Kalita, University of Colorado, Boulder, CO, United States
Rahul Kaushik, IIT Delhi, New Delhi, India
Max Kotlyar, University Health Network, Toronto, ON, Canada
Chee K. Kwoh, Nanyang Technological University, Singapore
Massimo La Rosa, Via Ugo La Malfa, Palermo, Italy
Kaitao Lai, CSIRO, North Ryde, NSW, Australia
Andre Lamurias, Universidade de Lisboa, Lisboa, Portugal
Francesco Landolina, University of Palermo, Palermo, Italy
Álvaro Rubio Largo, NOVA IMS, Universidade Nova de Lisboa, Lisboa, Portugal
Gianluca Lax, University of Reggio Calabria, Italy
Paolo Lo Giudice, University “Mediterranea” of Reggio Calabria, Reggio Calabria, Italy
Vincenzo Loia, University of Salerno, Fisciano, Italy
Max Mühlhäuser, Darmstadt University of Technology, Darmstadt, Germany
Giuseppe Manco, ICAR-CNR, Rende, Italy
Marco Manna, University of Calabria, Cosenza, Italy
Hazel N. Manners, North-Eastern Hill University, Shillong, India
Laura Manuel, University of Texas Health at San Antonio, San Antonio, TX, United States
Stefano Mariani, University of Bologna, Bologna, Italy
Fabrizio Marozzo, University of Calabria, Rende, Italy
Marco Masseroli, Polytechnic University of Milan, Milan, Italy
Giancarlo Mauri, University of Milano-Bicocca, Milan, Italy
Ivan Merelli, Institute for Biomedical Technologies (CNR), Milan, Italy; and National Research Council, Segrate, Italy
Marianna Milano, University of Catanzaro, Catanzaro, Italy
Domenico Mirarchi, Magna Graecia University of Catanzaro, Catanzaro, Italy
Keshab Nath, North-Eastern Hill University, Shillong, India
Serena Nicolazzo, University of Reggio Calabria, Italy
Antonino Nocera, University of Reggio Calabria, Italy
Aidan O’Brien, CSIRO, North Ryde, NSW, Australia
Andrea Omicini, University of Bologna, Bologna, Italy
Luigi Palopoli, Università della Calabria, Cosenza, Italy
Panos M. Pardalos, University of Florida, Gainesville, FL, United States
Mimmo Parente, University of Salerno, Fisciano, Italy
Chiara Pastrello, Krembil Research Institute, Toronto, ON, Canada
Marco Pellegrini, Consiglio Nazionale delle Ricerche, Istituto di Informatica e Telematica, Pisa, Italy
Umberto Ferraro Petrillo, University of Rome “Sapienza”, Rome, Italy
Giuseppe Pirrò, ICAR-CNR, Rende, Italy
Nadia Pisanti, University of Pisa, Pisa, Italy
Clara Pizzuti, Institute for High Performance Computing and Networking (ICAR), Cosenza, Italy
Gianluca Pollastri, University College Dublin, Dublin, Ireland
Erinija Pranckeviciene, Vilnius University, Vilnius, Lithuania
Marco Previtali, University of Milano-Bicocca, Milan, Italy
Marco Pulimeno, University of Salento, Lecce, Italy
Sara Rahmati, University of Toronto, Toronto, ON, Canada; and Krembil Research Institute, Toronto, ON, Canada
Valentina Ravì, Via Ugo La Malfa, Palermo, Italy
Francesco Ricca, University of Calabria, Rende, Italy
Ettore Ritacco, ICAR-CNR, Rende, Italy
Riccardo Rizzo, ICAR-CNR, Rende, Italy
Simona E. Rombo, University of Palermo, Palermo, Italy
Francesca Rondinelli, Università degli Studi di Napoli Federico II, Napoli, Italy
Gianluca Roscigno, University of Salerno, Fisciano, Italy
Andrea E.M. Rossos, Krembil Research Institute, Toronto, ON, Canada
Swarup Roy, Sikkim University, Gangtok, India; and North-Eastern Hill University, Shillong, India
Francesco Scarcello, University of Calabria, Rende, Italy
Sabrina Senatore, Università degli Studi di Salerno, Fisciano, Italy
Angela Serra, University of Salerno, Salerno, Italy
Pooja Sharma, Tezpur University, Tezpur, India
Asuda Sharma, University of Nebraska at Omaha, Omaha, NE, United States
David Simoncini, University of Toulouse, Toulouse, France; and RIKEN, Yokohama, Japan
Ankita Singh, IIT Delhi, New Delhi, India; and Banasthali Vidyapith, Banasthali, India
Giovanni Stracquadanio, University of Essex, Colchester, United Kingdom
Andrea Tagarelli, University of Calabria, Rende, Italy
Roberto Tagliaferri, University of Salerno, Salerno, Italy
Domenico Talia, University of Calabria, Rende, Italy
Giorgio Terracina, University of Calabria, Rende, Italy
Alfredo Tirado-Ramos, University of Texas Health at San Antonio, San Antonio, TX, United States
Paolo Torroni, University of Bologna, Bologna, Italy
Giuseppe Tradigo, University of Calabria, Rende, Italy; and University of Florida, Gainesville, United States
Paolo Trunfio, University of Calabria, Rende, Italy
Andrea Tundis, Darmstadt University of Technology, Darmstadt, Germany
Natalie Twine, CSIRO, North Ryde, NSW, Australia
Domenico Ursino, University “Mediterranea” of Reggio Calabria, Reggio Calabria, Italy
Alfonso Urso, Via Ugo La Malfa, Palermo, Italy
Filippo Utro, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, United States
Leonardo Vanneschi, NOVA IMS, Universidade Nova de Lisboa, Lisboa, Portugal
Pierangelo Veltri, University “Magna Graecia” of Catanzaro, Catanzaro, Italy
Ramakanth C. Venkata, University of Nebraska at Omaha, Omaha, NE, United States
Giuseppe Vizzari, University of Milano-Bicocca, Milan, Italy
Haiying Wang, Ulster University, Newtownabbey, Northern Ireland, United Kingdom
Jyotsna T. Wassan, Ulster University, Newtownabbey, Northern Ireland, United Kingdom
Marco Wiltgen, Graz General Hospital and University Clinics, Graz, Austria
Kam Y.J. Zhang, RIKEN, Yokohama, Japan
Huiru Zheng, Ulster University, Newtownabbey, Northern Ireland, United Kingdom
Italo Zoppis, University of Milano-Bicocca, Milan, Italy
Chiara Zucco, University “Magna Graecia” of Catanzaro, Catanzaro, Italy
PREFACE
Bioinformatics and Computational Biology (BCB) combine elements of computer science, information technology, mathematics, statistics, and biotechnology, providing the methodology and in silico solutions to mine biological data and processes for knowledge discovery. In the era of molecular diagnostics, targeted drug design and Big Data for personalized or even precision medicine, computational methods for data analysis are essential for biochemistry, biology, biotechnology, pharmacology, biomedical science, and mathematics and statistics. Bioinformatics and Computational Biology are essential for making sense of the molecular data from many modern high-throughput studies of mice and men, as well as key model organisms and pathogens. This Encyclopedia spans basics to cutting-edge methodologies, authored by leaders in the field, providing an invaluable resource to students as well as scientists, in academia and research institutes as well as the biotechnology, biomedical and pharmaceutical industries.
Navigating the maze of confusing and often contradictory jargon, combined with a plethora of software tools, is often bewildering for students and researchers alike. This comprehensive and unique resource provides up-to-date theory and application content to address molecular data analysis requirements, with precise definitions of terminology and lucid explanations by experts.
No single authoritative resource exists in this area that provides a comprehensive definition of the myriad of computer science, information technology, mathematics, statistics, and biotechnology terms used by scientists working in bioinformatics and computational biology. Current books and other existing publications in this area address parts of the problem or provide chapters on the topic, essentially addressing practicing bioinformaticists or computational biologists. Newcomers to this area depend on Google searches leading to published literature, as well as on several textbooks, to collect the relevant information.
Although curricula have been developed for Bioinformatics education for two decades now (Altman, 1998), offering education in bioinformatics remains challenging from the multidisciplinary perspective, and is perhaps an NP-hard problem (Ranganathan, 2005). A minimum Bioinformatics skill set for university graduates has been suggested (Tan et al., 2009). The Bioinformatics section of the Reference Module in Life Sciences (Ranganathan, 2017) commenced by addressing the paucity of a comprehensive reference book, leading to the development of this Encyclopedia. This compilation aims to fill the “gap” for readers with succinct and authoritative descriptions of current and cutting-edge bioinformatics areas, supplemented with the theoretical concepts underpinning these topics.
This Encyclopedia comprises three sections, covering Methods, Topics and Applications. The theoretical methodology underpinning BCB is described in the Methods section, with Topics covering traditional areas such as phylogeny, as well as more recent areas such as translational bioinformatics, cheminformatics and computational systems biology. Additionally, Applications provides guidance for commonly asked “how to” questions on scientific areas described in the Topics section, using the methodology set out in the Methods section. Throughout this Encyclopedia, we have endeavored to keep the content as lucid as possible, making the text “as simple as possible, but not simpler,” as the saying attributed to Albert Einstein goes. Comprehensive chapters provide overviews, while details are provided by shorter, encyclopedic chapters.
During the planning phase of this Encyclopedia, the encouragement of Elsevier’s Priscilla Braglia and the constructive comments from no fewer than ten reviewers led our small preliminary editorial team (Christian Schönbach, Kenta Nakai and myself) to embark on this massive project. We then welcomed one more Editor-in-Chief, Michael Gribskov, and three section editors, Mario Cannataro, Bruno Gaeta and Asif Khan, whose toils have resulted in gathering most of the current content, with all editors reviewing the submissions. Throughout the production phase, we have received invaluable support and guidance as well as milestone reminders from Paula Davies, for which we remain extremely grateful.
Finally, we would like to acknowledge all our authors, from around the world, who dedicated their valuable time to share their knowledge and expertise to provide educational guidance for our readers, as well as to leave a lasting legacy of their work.
We hope the readers will enjoy this Encyclopedia as much as the editorial team enjoyed compiling it as an ABC of bioinformatics, suitable for naïve as well as experienced scientists, and as an essential reference and invaluable teaching guide for students, post-doctoral scientists, senior scientists, and academics in universities and research institutes, as well as for the pharmaceutical, biomedical and biotechnological industries. Nobel laureate Walter Gilbert predicted in 1990 that “In the year 2020 you will be able to go into the drug store, have your DNA sequence read in an hour or so, and given back to you on a compact disk so you can analyze it.” While technology may have already arrived at this milestone, we are confident that one of the readers of this Encyclopedia will be ready to extract valuable biological data by computational analysis, resulting in biomedical and therapeutic solutions, using bioinformatics to “measure” health for early diagnosis of “disease.”
References
Altman, R.B., 1998. A curriculum for bioinformatics: the time is ripe. Bioinformatics 14 (7), 549–550.
Ranganathan, S., 2005. Bioinformatics education – perspectives and challenges. PLoS Comput Biol 1 (6), e52.
Ranganathan, S., 2017. Bioinformatics. Reference Module in Life Sciences. Oxford: Elsevier.
Tan, T.W., Lim, S.J., Khan, A.M., Ranganathan, S., 2009. A proposed minimum skill set for university graduates to meet the informatics needs and challenges of the “-omics” era. BMC Genomics 10 (Suppl 3), S36.
Shoba Ranganathan
Algorithms Foundations
Nadia Pisanti, University of Pisa, Pisa, Italy
Introduction
Biology offers a huge amount and variety of data to be processed. Such data has to be stored, analysed, compared, searched, classified, etcetera, posing new challenges to many fields of computer science. Among them, algorithmics plays a special role in the analysis of biological sequences, structures, and networks. Indeed, especially due to the flood of data coming from sequencing projects as well as from its downstream analysis, the size of the digital biological data to be studied requires the design of very efficient algorithms. Moreover, biology has become, probably more than any other fundamental science, a great source of new algorithmic problems asking for accurate solutions. Nowadays, biologists increasingly need to work with in silico data, and therefore it is important for them to understand why and how an algorithm works, in order to be confident in its results. The goal of this chapter is to give a non-computer scientist an overview of the fundamentals of algorithm design and evaluation.
Algorithms and Their Complexity
Computationally speaking, a problem is defined by an input/output relation: we are given an input, and we want to return as output a well-defined solution which is a function of the input satisfying some property.
An algorithm is a computational procedure (described by means of an unambiguous sequence of instructions) that has to be executed in order to solve a computational problem. An algorithm solving a given problem is correct if it outputs the right result for every possible input. The algorithm has to be described according to the entity which will execute it: if this is a computer, then the algorithm will have to be written in a programming language.
Example: Sorting Problem
INPUT: A sequence $S$ of $n$ numbers $\langle a_1, a_2, \ldots, a_n \rangle$.
OUTPUT: A permutation $\langle a'_1, a'_2, \ldots, a'_n \rangle$ of $S$ such that $a'_1 \le a'_2 \le \cdots \le a'_n$.
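The input/output relation of the sorting problem can be turned into a mechanical check of whether a given output is a right answer for a given input. The following is a minimal Python sketch for illustration, not part of the chapter: the function name `is_valid_sorting_output` is hypothetical, and it interprets "permutation of $S$" as "same values with the same multiplicities", so repeated numbers are allowed.

```python
from collections import Counter


def is_valid_sorting_output(s, out):
    """Return True iff `out` is a correct answer to the sorting problem for input `s`.

    Two conditions from the problem statement are checked:
    1. `out` is a permutation of `s` (same values with the same multiplicities);
    2. `out` is in non-decreasing order: out[0] <= out[1] <= ... <= out[n-1].
    """
    is_permutation = Counter(s) == Counter(out)
    is_non_decreasing = all(out[i] <= out[i + 1] for i in range(len(out) - 1))
    return is_permutation and is_non_decreasing


print(is_valid_sorting_output([3, 1, 2], [1, 2, 3]))  # True
print(is_valid_sorting_output([3, 1, 2], [1, 2, 2]))  # False: not a permutation of the input
print(is_valid_sorting_output([3, 1, 2], [3, 2, 1]))  # False: not in non-decreasing order
```

Such a checker only examines one input at a time: an algorithm is correct only if its output passes this check for every possible input, which testing on examples cannot establish by itself.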
Given a problem, there can be many algorithms that correctly solve it, but in general they will not all be equally efficient. The efficiency of an algorithm is a function of its input size.
For example, a solution for the sorting problem would be to generate all possible permutations of $S$ and, for each one of them, check whether it is sorted. With this procedure, one needs to be lucky to find the right ordering fast, as there are $n!$ such permutations, a number that grows at least exponentially in $n$. In the average case, as well as in the worst case, this algorithm would require a number of elementary operations (such as writing a value in a memory cell, comparing two values, swapping two values, etcetera) which is exponential in the input size $n$. In this case, since the worst case cannot be excluded, we say that the algorithm has an exponential time complexity. In computer science, exponential algorithms are considered intractable. An algorithm is, instead, tractable if its complexity function is polynomial in the input size. The complexity of a problem is that of the most efficient algorithm that solves it. Fortunately, the sorting problem is tractable, as there exist tractable solutions that we will describe later.
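To make the blow-up concrete, here is a minimal Python sketch of the generate-and-test procedure just described, written for illustration only (the name `permutation_sort` is hypothetical). It relies on the standard library's `itertools.permutations`, which yields the orderings one at a time in lexicographic order of positions, and it counts how many candidates it tries before finding a sorted one.

```python
from itertools import permutations
from math import factorial


def permutation_sort(s):
    """Generate-and-test sorting: try orderings of `s` until one is non-decreasing.

    Returns (sorted_list, attempts). Every input has at least one sorted
    ordering, so the loop always returns. Each check costs O(n) comparisons,
    and up to n! candidates may be tried.
    """
    attempts = 0
    for candidate in permutations(s):  # produced lazily, one ordering at a time
        attempts += 1
        if all(candidate[i] <= candidate[i + 1] for i in range(len(candidate) - 1)):
            return list(candidate), attempts
    raise AssertionError("unreachable: some ordering of s is always sorted")


# A reverse-sorted input of distinct values is the worst case for this
# generation order: the sorted ordering is the very last one produced.
result, attempts = permutation_sort([5, 4, 3, 2, 1])
print(result, attempts, factorial(5))  # [1, 2, 3, 4, 5] 120 120
```

With 20 distinct values in reverse order, the same loop would try $20! \approx 2.4 \times 10^{18}$ candidates, which is why this procedure is intractable even though it is correct.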
In order to evaluate the running time of an algorithm independently of the specific hardware on which it is executed, it is computed in terms of the number of simple operations performed, each of which is assigned a unit cost or, in any case, a cost which is constant with respect to the input size. A constant running time is a negligible cost, as it does not grow when the input size does; moreover, a constant term summed with a higher-degree polynomial in $n$ is also negligible; furthermore, even a constant factor multiplying a higher-degree polynomial is considered negligible in running time analysis. What counts is the growth rate with respect to the input size, i.e. the asymptotic complexity $T(n)$ as the input size $n$ grows.

In computational complexity theory, this is formalized using the big-O notation, which excludes both coefficients and lower-order terms: the asymptotic time complexity $T(n)$ of an algorithm is in $O(f(n))$ if there exist $n_0$ and $c > 0$ such that $T(n) \le c\,f(n)$ for all $n \ge n_0$. For example, an algorithm that scans an input of size $n$ a constant number of times, and then performs a constant number of some other operations, takes $O(n)$ time, and is said to have linear time complexity. An algorithm that takes linear time only in the worst case is also said to be in $O(n)$, because the big-O notation represents an upper bound. There is also an asymptotic complexity notation $\Omega(f(n))$ for the lower bound: $T(n) = \Omega(f(n))$ whenever $f(n) = O(T(n))$. A third notation, $\Theta(f(n))$, denotes asymptotic equivalence: we write $T(n) = \Theta(f(n))$ if both $T(n) = O(f(n))$ and $f(n) = O(T(n))$ hold. For example, an algorithm that always performs a linear scan of the input, and not just in the worst case, has time complexity in $\Theta(n)$. Finally, an algorithm which needs to at least read, hence scan, the whole input of size $n$ (and possibly also perform more costly tasks) has time complexity in $\Omega(n)$.
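The definition of $O(f(n))$ can be illustrated, though not proved (it quantifies over all $n \ge n_0$), by counting elementary operations. In the hypothetical Python sketch below, `max_with_count` finds the maximum of a list while counting comparisons, taken here as the unit-cost operation; it always performs exactly $n - 1$ of them, so $T(n) = \Theta(n)$. The helper `satisfies_big_o` checks the inequality $T(n) \le c\,f(n)$ over a finite range of $n$ only.

```python
def max_with_count(values):
    """Return (maximum of a non-empty list, number of comparisons performed)."""
    best = values[0]
    comparisons = 0
    for v in values[1:]:
        comparisons += 1  # one unit-cost comparison per remaining element
        if v > best:
            best = v
    return best, comparisons


def satisfies_big_o(cost, f, c, n0, n_max):
    """Check T(n) <= c * f(n) for every n0 <= n <= n_max (a finite sample only)."""
    return all(cost(n) <= c * f(n) for n in range(n0, n_max + 1))


def comparisons_for(n):
    """T(n): comparisons used on an input of size n (the same for every input)."""
    return max_with_count(list(range(n)))[1]


print(satisfies_big_o(comparisons_for, lambda n: n, c=1, n0=1, n_max=500))   # True
print(satisfies_big_o(comparisons_for, lambda n: 1, c=10, n0=1, n_max=500))  # False
```

The second check fails at $n = 12$, where $n - 1 = 11 > 10$: no constant can bound $n - 1$ for all $n$, so a linear cost is not $O(1)$.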
Time complexity is not the only cost parameter of an algorithm: space complexity is also relevant to evaluate its efficiency. By space complexity, computer scientists do not mean the size of the program describing an algorithm, but rather the data structures it actually keeps in memory during its execution. As for time complexity, the concern is how much memory the execution takes in the worst case and with respect to the input size. For example, an algorithm solving the sorting problem without requiring any additional data structure (besides possibly a constant number of constant-size variables) would have linear space complexity. The exponential time complexity algorithm we described above also has linear space complexity: at each step, it suffices to keep in memory only one permutation of S, as those previously attempted can be discarded. This observation offers an example of why time complexity is often of more concern than space complexity. The reason is not that space is less relevant than time, but rather that space complexity is in practice a lower bound of (and thus no larger than) time complexity: if an algorithm has to write and/or read a certain amount of data, then it necessarily has to perform at least that number of elementary steps (Cormen et al., 2009; Jones and Pevzner, 2004).
Iterative Algorithms
An iterative algorithm is an algorithm which repeats the same sequence of actions several times; the number of such times does not need to be known a priori, but it has to be finite. In programming languages, there are basically two kinds of iterative commands: the for command repeats the actions a number of times which is computed, or otherwise known, before the iterations begin; the while command, instead, performs the actions as long as a certain given condition is satisfied, and the number of times this will occur is not known a priori. What we call here an action is a command which can, in its turn, be iterative again. The cost of an iterative command is the cost of its actions multiplied by the number of iterations.
From now on, in this article we will describe an algorithm by means of so-called pseudocode: an informal description of a real computer program, which is a mixture of natural language and keywords representing commands that are typical of programming languages. To this purpose, before exhibiting an example of an iterative algorithm for the sorting problem, we introduce the syntax of a fundamental elementary command: the assignment "x ← E", whose effect is to assign the value of an expression E to the variable x, and whose time cost is constant, provided that computing the value of E, which can in its turn contain variables as well as function calls, is also constant. We will assume that the input sequence S of the sorting problem is given as an array: an array is a data structure of known fixed length that contains elements of the same type (in this case numbers). The i-th element of array S is denoted by S[i], and reading or writing S[i] takes constant time. Swapping two values of the array also takes constant time, and we will denote this as a single command in our pseudocode, even if in practice it will be implemented by a few operations that use a third temporary variable. What follows is the pseudocode of an algorithm that solves the sorting problem in polynomial time.
INSERTION-SORT(S, n)
  for i = 1 to n − 1 do
    j ← i
    while (j > 0 and S[j − 1] > S[j])
      swap S[j] and S[j − 1]
      j ← j − 1
    endwhile
  endfor
INSERTION-SORT takes as input the array S and its size n. It works iteratively by inserting the elements, one after the other, into the partially sorted S. The array is indexed from 0 to n − 1, and a for command performs actions for each i in the interval [1, n − 1] so that at the end of iteration i, the left end of the array up to its i-th position is sorted. This is realized by means of another iterative command, nested into the first one, that uses a second index j which starts from i, compares S[j] (the new element) with its predecessor, and possibly swaps them so that S[j] moves down towards its right position; then j is decreased and the task is repeated until S[j] has reached its correct position. This inner iterative command is a while command because the task has to be performed as long as the predecessor of S[j] is larger than it.
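The pseudocode translates almost line by line into Python. The following is a minimal sketch, assuming S is a Python list sorted in place (so n is simply `len(S)`); the swap is written with an explicit temporary variable, as described above, even though idiomatic Python would use tuple assignment.

```python
def insertion_sort(S):
    """Sort the list S in place, following the INSERTION-SORT pseudocode.

    Invariant: at the start of iteration i, S[0..i-1] is sorted.
    """
    n = len(S)
    for i in range(1, n):           # for i = 1 to n - 1
        j = i
        # Move S[j] left while its predecessor is larger.
        while j > 0 and S[j - 1] > S[j]:
            tmp = S[j]              # swap through a temporary variable
            S[j] = S[j - 1]
            S[j - 1] = tmp
            j -= 1
    return S                        # returned only for convenience


data = [3, 2, 7, 1]
insertion_sort(data)
print(data)  # [1, 2, 3, 7]
```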
Example: Let us consider S = [3, 2, 7, 1]. Recall that arrays are indexed from position 0 (that is, S[0] = 3, S[1] = 2, and so on).
For i = 1, INSERTION-SORT sets j = 1 as well, and then executes the while because j = 1 > 0 and S[0] > S[1]: these two values are swapped and j becomes 0, so that the while command ends with S = [2, 3, 7, 1]. Then a new for iteration starts with i = 2 (notice that at this time, correctly, S is sorted up to S[1]), and S[2] is taken into account; this time the while command is reached with j = 2 and its condition is not satisfied (as S[2] > S[1]), so that the while immediately ends without changing S: the first three values of S are already sorted. Finally, the last for iteration, with i = 3, will execute the while three times (that is, n − 1), swapping 1 with 7, then with 3, and finally with 2, leading to S = [1, 2, 3, 7], which is the correct output.
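The worked example can be reproduced with a small instrumented variant of the same loop (the function name `insertion_sort_trace` is hypothetical). After each iteration of the outer for, it records the value of i, how many swaps the while performed, and the state of the array.

```python
def insertion_sort_trace(S):
    """Run INSERTION-SORT on a copy of S, recording (i, swaps, array)
    after each iteration of the outer for loop."""
    S = list(S)
    trace = []
    for i in range(1, len(S)):
        j = i
        swaps = 0
        while j > 0 and S[j - 1] > S[j]:
            S[j], S[j - 1] = S[j - 1], S[j]
            j -= 1
            swaps += 1
        trace.append((i, swaps, list(S)))
    return trace


for i, swaps, state in insertion_sort_trace([3, 2, 7, 1]):
    print(f"i = {i}: {swaps} swap(s), S = {state}")
```

The three printed lines report 1, 0 and 3 swaps respectively, ending with S = [1, 2, 3, 7], as in the walk-through.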
INSERTION-SORT takes at least linear time (that is, its time complexity is in Ω(n)) because all elements of S must be read, and indeed the for command is executed Θ(n) times: once for each array position from the second to the last. The invariant is that at the beginning of each such iteration, the array is sorted up to position S[i − 1], and then the new value at S[i] is processed. Each iteration of the for, besides the constant-time (hence negligible) assignment j ← i, executes the while command. The latter checks its condition (in constant time) and, if the newly read element S[j] is greater than, or equal to, S[j − 1] (which is the largest element of the already sorted portion), it does nothing; otherwise, it swaps S[j] and S[j − 1], decreases j, checks the condition again, and possibly repeats these actions, until either S[j] finds its place after a value not larger than it, or it becomes the new first element of S, as it is the smallest found so far. Therefore, the actions of the while command are never executed if the array is already sorted. This is the best-case time complexity of INSERTION-SORT: linear in the input size n. The worst case is, instead, when the input array is sorted in the reverse order: in this case, at each iteration i, the while command has to perform exactly i swaps to let S[j] move down to the first position. Therefore, in this case, iteration i of the for takes i steps, and there are n − 1 such iterations, one for each 1 ≤ i ≤ n − 1. Hence, the worst-case running time is
$$\sum_{i=1}^{n-1} i = \frac{n(n-1)}{2} = \Theta(n^2).$$
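A quick way to observe the best and worst cases is to count the swaps INSERTION-SORT performs. In this sketch (the helper `count_swaps` is illustrative), an already sorted input yields no swaps, while a reverse-sorted input of n distinct values yields exactly 1 + 2 + … + (n − 1) = n(n − 1)/2 swaps.

```python
def count_swaps(S):
    """Return how many swaps INSERTION-SORT performs when sorting a copy of S."""
    S = list(S)
    swaps = 0
    for i in range(1, len(S)):
        j = i
        while j > 0 and S[j - 1] > S[j]:
            S[j], S[j - 1] = S[j - 1], S[j]
            j -= 1
            swaps += 1
    return swaps


n = 100
best = count_swaps(range(n))           # already sorted: 0, 1, ..., 99
worst = count_swaps(range(n, 0, -1))   # reverse order: 100, 99, ..., 1
print(best, worst, n * (n - 1) // 2)   # 0 4950 4950
```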
As for space complexity, INSERTION-SORT works within the input array plus a constant number of temporary variables, and hence it has linear space complexity. Since n is also a lower bound (the whole array must be stored), in this case the space complexity is optimal.
The algorithm we just described is an example of an iterative algorithm that realizes a quite intuitive sorting strategy; indeed, this algorithm is often explained as the way we would sort playing cards held in one hand, using the other hand to iteratively insert each new card in its correct position. Iteration is powerful enough to achieve, for our sorting problem, a polynomial-time (although almost trivial) solution; the time complexity of INSERTION-SORT is, however, not optimal, as the lower bound for the sorting problem (for algorithms that sort by comparing elements) is not n² but rather n log₂ n (a result not proved here). In order to achieve O(n log₂ n) time complexity we need an even more powerful paradigm, which we introduce in the next section.
Recursive Algorithms
A recursive algorithm is an algorithm which, among its commands, recursively calls itself on smaller instances: it splits the main problem into subproblems, recursively solves them, and combines their solutions in order to build up the solution of the original problem. There is a fascinating mathematical foundation, which goes back to Peano arithmetic, and even further back to induction theory, for the conditions that guarantee the correctness of a recursive algorithm. We will omit the details of this involved mathematical framework. Surprisingly enough, for a computer this apparently very complex paradigm is easy to implement by means of a simple data structure (the stack).
In order to show how powerful induction is, we will again use our sorting problem running example. Namely, we describe here the recursive MERGE-SORT algorithm, which achieves Θ(n log₂ n) time complexity and is thus optimal. Basically, MERGE-SORT splits the array into two halves, sorts them (by means of two recursive calls, one on each sub-array of size n/2), and then merges the outcomes into a whole sorted array. The two recursive calls, in their turn, will recursively split again into sub-arrays of size n/4, and so on, until the base case (an already sorted sub-array of size 1) is reached. The merging procedure will be implemented by the function MERGE (pseudocode not shown), which takes as input the array and the starting and ending positions of the portions that contain the two contiguous sub-arrays to be merged. Recalling that the two half-arrays to be merged are sorted, MERGE simply slides two indices along them from left to right and, at each step, makes a comparison, writes the smaller of the two values, and increments the index of the sub-array which contained it. This is done until both sub-arrays have been entirely written into the result.
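Since the pseudocode of MERGE is not shown, here is one possible Python sketch of the procedure described above. The signature `merge(S, p, q, r)` mirrors the call MERGE(S, p, q, r) in MERGE-SORT, with S[p..q] and S[q+1..r] as the two sorted runs (inclusive bounds). It writes into an auxiliary list and copies the result back, which is a common textbook choice, not necessarily the authors' implementation.

```python
def merge(S, p, q, r):
    """Merge the sorted runs S[p..q] and S[q+1..r] (inclusive bounds) in place."""
    left, right = p, q + 1
    result = []                         # auxiliary area of r - p + 1 elements
    while left <= q and right <= r:
        if S[left] <= S[right]:         # <= keeps equal keys in original order
            result.append(S[left])
            left += 1
        else:
            result.append(S[right])
            right += 1
    # One run is exhausted: copy whatever remains of the other one.
    result.extend(S[left:q + 1])
    result.extend(S[right:r + 1])
    S[p:r + 1] = result


S = [1, 4, 9, 2, 3, 10]   # two sorted runs: S[0..2] and S[3..5]
merge(S, 0, 2, 5)
print(S)                  # [1, 2, 3, 4, 9, 10]
```

Each step writes one element and advances one index, so merging k elements costs Θ(k) time.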
MERGE-SORT(S, p, r)
  if p < r then
    q ← ⌊(p + r)/2⌋
    MERGE-SORT(S, p, q)
    MERGE-SORT(S, q + 1, r)
    MERGE(S, p, q, r)
  endif
Given the need to call the algorithm on different array fragments, the input parameters, besides S itself, will be the starting and ending positions of the portion of the array to be sorted. Therefore, the first call will be MERGE-SORT(S, 0, n − 1). Then the index q, which splits S into two halves, is computed, and the two sub-arrays so found are sorted by means of two recursive calls; the two resulting sorted arrays of size n/2 are then merged by MERGE into the final result. The correctness of the recursion follows from the fact that each recursive call is made on an array half as long, and from the termination condition "p < r": if this holds, then the recursion goes on; otherwise (p = r) there is nothing to do, as the array has length 1 and is therefore sorted. Notice, indeed, that if S is not empty, then p > r can never hold, as q is computed such that p ≤ q < r.
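A runnable counterpart of the MERGE-SORT pseudocode, as a minimal Python sketch: indices are inclusive as in the text, and the first call is `merge_sort(S, 0, n - 1)`. To keep the block short, the merge step is delegated to the standard library's `heapq.merge`, which merges already-sorted iterables; this is a substitution for the MERGE routine described in the text, not that routine itself.

```python
from heapq import merge as heap_merge


def merge_sort(S, p, r):
    """Sort S[p..r] (inclusive bounds) in place, following MERGE-SORT."""
    if p < r:                       # a portion of length 1 is already sorted
        q = (p + r) // 2            # q = floor((p + r) / 2), so p <= q < r
        merge_sort(S, p, q)
        merge_sort(S, q + 1, r)
        # Stand-in for MERGE(S, p, q, r): merge the two sorted halves.
        S[p:r + 1] = list(heap_merge(S[p:q + 1], S[q + 1:r + 1]))


data = [3, 2, 7, 1, 9, 4]
merge_sort(data, 0, len(data) - 1)  # first call: MERGE-SORT(S, 0, n - 1)
print(data)                         # [1, 2, 3, 4, 7, 9]
```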
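The algorithm MERGE-SORT has linear (hence optimal) space complexity: besides S itself, it uses a constant number of variables per recursive call, a recursion stack of depth Θ(log₂ n), and, within MERGE, an auxiliary area of at most n elements into which the result is written. The time complexity T(n) of MERGE-SORT can be defined by the following recurrence relation:
$$T(n) = \begin{cases} \Theta(1) & \text{if } n = 1, \\ 2\,T(n/2) + \Theta(n) & \text{if } n > 1, \end{cases}$$
because, with an input of size n, MERGE-SORT calls itself twice on arrays of size n/2, and then calls MERGE, which takes, as we showed above, Θ(n) time.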
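To see the recurrence at work, one can evaluate it with concrete constants. The sketch below assumes, purely for illustration, T(1) = 1 and a merge cost of exactly n, restricting n to powers of two; under these assumptions the recurrence has the closed form T(n) = n log₂ n + n, which is in Θ(n log₂ n).

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def T(n):
    """MERGE-SORT recurrence with illustrative constants: T(1) = 1 and a
    merge cost of exactly n, i.e. T(n) = 2 T(n/2) + n for n a power of two."""
    if n == 1:
        return 1
    return 2 * T(n // 2) + n


for k in range(6):
    n = 2 ** k
    # Compare with the closed form n * log2(n) + n (here log2(n) = k).
    print(n, T(n), n * k + n)
```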
We now show by induction on n that T(n) = Θ(n log₂ n). The base case is simple: if n = 1, then S is already sorted, and MERGE-SORT correctly does nothing and ends in Θ(1) time. If n > 1, assuming that T(n′) = Θ(n′ log₂ n′) holds for all n′ < n, then we have