Emerging trends in applications and infrastructures for computational biology, bioinformatics, and s


C. A. Cole

Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States

R. M. Cordeiro

Center for Natural and Human Sciences, Federal University of ABC, Santo André, SP, Brazil

E. B. Costa

Center for Natural and Human Sciences, Federal University of ABC, Santo André, SP, Brazil

P. Costa

Department of Systems Engineering and Operations Research, George Mason University, Fairfax, VA, United States

C. A. de Luna Ortega

Universidad Politécnica de Aguascalientes, Aguascalientes, Mexico

A. Deeter

Integrated Bioscience Program, Department of Computer Science, The University of Akron, Akron, OH, United States

J. R. Deller

Michigan State University, East Lansing, MI, United States

C. Di Ruberto

Department of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy

Z.-H. Duan

Integrated Bioscience Program, Department of Computer Science, The University of Akron, Akron, OH, United States

C. Early

Department of Science and Engineering Technology, University of Houston-Clear Lake, Houston, TX, United States

C. S. Ee

Multimedia University, Melaka, Malaysia

A. Ertas

Department of Mechanical Engineering, Texas Tech University, Lubbock, TX, United States

A. Fahim

Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States

A. C. Ferraz

Physics Institute, University of Sao Paulo, Sao Paulo, SP, Brazil

Y. Fischer

Fraunhofer Institute of Optronics, System Technologies and Image Exploitation IOSB, Karlsruhe, Germany

B. D. Fleet

Michigan State University, East Lansing, MI, United States

A. Fronville

Computer Science Department, University of Western Brittany, Brest, France

Á. Monteagudo

Department of Computer Science, University of A Coruña, A Coruña, Spain

L. M. Montoni

Complex System and Security Laboratory, University Campus Bio-Medico of Rome, Rome, Italy

J. L. Mustard

Division of Basic Sciences, Laboratory of Bioinformatics and Computational Biology, Kansas City University of Medicine and Biosciences, Kansas City, MO, United States

A. J. P. Neto

Center for Natural and Human Sciences, Federal University of ABC, Santo André, SP, Brazil

M. E. Nia

Multimedia University, Melaka, Malaysia

H. Nishimura

Graduate School of Applied Informatics, University of Hyogo, Hyogo, Japan

S. Nobukawa

Department of Management Information Science, Fukui University of Technology, Fukui, Japan

P. Philipp

Vision and Fusion Laboratory IES, Karlsruhe Institute of Technology KIT, Karlsruhe, Germany

C. W. Philipson

BioTherapeutics Inc., Blacksburg, VA, United States

L. Putzu

Department of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy

T. S. Rani

SCIS University of Hyderabad, Hyderabad, Telangana, India

S. K. Rath

Department of Computer Science and Engineering, National Institute of Technology Rourkela, Rourkela, India

W. C. Ray

The Research Institute at Nationwide Children’s Hospital, Columbus, OH, United States

V. Rehbock

Department of Mathematics and Statistics, Curtin University, Perth, WA, Australia

V. L. Rivas

Computer Science Department, Instituto Tecnologico de Aguascalientes, Aguascalientes, Mexico

V. Rodin

Computer Science Department, University of Western Brittany, Brest, France

J. C. M. Romo

Computer Science Department, Instituto Tecnologico de Aguascalientes, Aguascalientes, Mexico

F. J. L. Rosas

Computer Science Department, Instituto Tecnologico de Aguascalientes, Aguascalientes, Mexico

Q. Zhang

Shantou University Medical College, Shantou, Guangdong, PR China; Shantou University, Shantou, Guangdong, PR China

Y. Zhang

North Dakota State University, Fargo, ND, United States

H. Zhao

Integrated Bioscience Program, Department of Computer Science, The University of Akron, Akron, OH, United States

B. Zheng

Shantou University, Shantou, Guangdong, PR China

D. Zhukov

Moscow State Technical University of Radio Engineering, Electronics and Automation "MIREA", Moscow, Russia

B. B. Zobel

Department of Diagnostic Imaging, University Campus Bio-Medico of Rome, Rome, Italy

Preface

It gives us great pleasure to introduce this collection of chapters to the readers of the book series “Emerging Trends in Computer Science and Applied Computing” (Morgan Kaufmann/Elsevier). This book is entitled “Emerging Trends in Computational Biology, Bioinformatics, and Systems Biology: Systems and Applications.” This is the second book in the series on this topic. We are indebted to Professor Quoc-Nam Tran (Professor and Department Chair) of the University of South Dakota for accepting our invitation to be the senior editor. His leadership and strategic plan made the implementation of this book project a wonderful experience.

Computational Biology is the science of using biological data to develop algorithms and relations among various biological systems. It involves the development and application of data-analytical and theoretical methods, mathematical modeling, and simulation techniques to the study of biological, behavioral, and social systems. The field is multidisciplinary in that it includes topics that are traditionally covered in computer science, mathematics, imaging science, statistics, chemistry, biophysics, genetics, genomics, ecology, evolution, anatomy, neuroscience, and visualization, with computer science acting as the topical bridge between all these diverse areas (for a formal definition of Computational Biology, refer to http://www.bisti.nih.gov/docs/compubiodef.pdf). Many consider Bioinformatics to be a subfield of Computational Biology that includes methods for acquiring, storing, retrieving, organizing, analyzing, and visualizing biological data. Systems Biology is an emerging methodology applied to biomedical and biological scientific research; it overlaps with computational biology and bioinformatics. This edited book attempts to cover the emerging trends in many important areas of Computational Biology, Bioinformatics, and Systems Biology, with particular emphasis on systems and applications.

The book is composed of selected papers that were accepted for the 2014 and 2015 International Conferences on Bioinformatics & Computational Biology (BIOCOMP’14 and BIOCOMP’15), July, Las Vegas, USA. Selected authors were given the opportunity to submit extended versions of their conference papers as chapters for publication consideration in this edited book. Other authors (not affiliated with BIOCOMP) were also given the opportunity to contribute to this book by submitting their chapters for evaluation. The editorial board selected 34 chapters to comprise this book.

The BIOCOMP annual conferences are held as part of the World Congress in Computer Science, Computer Engineering, and Applied Computing, WORLDCOMP (http://www.world-academy-of-science.org/). An important mission of WORLDCOMP includes “Providing a unique platform for a diverse community of constituents composed of scholars, researchers, developers, educators, and practitioners. The Congress makes concerted effort to reach out to participants affiliated with diverse entities (such as: universities, institutions, corporations, government agencies, and research centers/labs) from all over the world. The congress also attempts to connect participants from institutions that have teaching as their main mission with those who are affiliated with institutions that have research as their main mission. The congress uses a quota system to achieve its institution and geography diversity objectives.” As this book is mainly composed of the extended versions of the accepted papers of BIOCOMP annual conferences, it is no surprise that the book has chapters from a highly qualified and diverse group of authors.

• Prof. Xiang Simon Wang. Head, Laboratory of Cheminfomatics and Drug Design, Howard University College of Pharmacy, Washington, DC, USA

• Prof. Mary Q. Yang. Director, Mid-South Bioinformatics Center and Joint Bioinformatics Ph.D. Program, Medical Sciences and George W. Donaghey College of Engineering and Information Technology, University of Arkansas, USA

• Prof. Jane You. Associate Head, Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong

• Peng Zhang. Biomedical Engineering Department, Stony Brook University, Stony Brook, New York, USA

• Prof. Wenbing Zhao. Department of Electrical and Computer Engineering, Cleveland State University, Cleveland, Ohio, USA

We are grateful to all authors who submitted their contributions to us for evaluation. We express our gratitude to Brian Romer and Amy Invernizzi (Elsevier) and their staff. We hope that you enjoy reading this book as much as we enjoyed editing it.

On behalf of the Editorial Board:
Hamid R. Arabnia, PhD
Editor-in-Chief, “Emerging Trends in Computer Science and Applied Computing”
Professor, Computer Science
Department of Computer Science, The University of Georgia, Athens, GA, United States

Acknowledgments

We are very grateful to the many colleagues who offered their services in preparing and publishing this edited book. In particular, we would like to thank the members of the Program Committee of the BIOCOMP’14 and BIOCOMP’15 Annual International Conferences; their names appear at http://www.worldacademyofscience.org/worldcomp14/ws/conferences/biocomp14/committee.html and http://www.world-academy-of-science.org/worldcomp15/ws/conferences/biocomp15/committee.html. We would also like to thank the members of the Steering Committee of the Federated Congress, WORLDCOMP 2015 (http://www.world-academy-of-science.org/), and the referees that were designated by them. The American Council on Science and Education (ACSE: http://www.americancse.org/about) provided the use of a computer and a web server for managing the evaluation of the submitted chapters. We would like to extend our appreciation to Brian Romer (Elsevier Executive Editor) and Amy Invernizzi (Elsevier Editorial Project Manager) and their staff at Elsevier for the outstanding professional service that they provided to us. We are also very grateful to Ron Rouhani and Kaveh Arbtan for providing IT services at each phase of this project.

We consider a theoretical experiment in which a single colon crypt, with approximately 2000 cells, is bisulfite-sequenced (using 454 sequencing technology). We consider methylation patterns made up of 9 CpG sites contained in a single amplicon sequence; these numbers are typical of real data (cf. [13, 14, 18, 19]). Specifically, we investigate how the measurement process affects inference of two quantities: the number of haploid genomes (out of the total of 4000, two per diploid cell) that have the most common pattern, and how many distinct patterns are present in the entire population. While we consider a specific case for illustration, the reader can generalize to other examples.

1.2 ERRORS, BIASES, AND UNCERTAINTY IN BISULFITE SEQUENCING

To understand how the steps in the bisulfite sequencing protocol affect the analysis of methylation patterns, we start by describing a naive statistical method for this type of data. Suppose we know that there are $N^{(0)}$ genomes in our sample and we observe $k$ different patterns with counts $y_1, \ldots, y_k$. Then the naive estimator of the number of distinct patterns is $k$, and the estimator of the number of genomes with pattern $i$ is $N^{(0)} y_i / Y$, where $Y = y_1 + \cdots + y_k$ is the total number of reads.
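As a minimal sketch, the two naive estimators can be computed directly from the read counts. The helper `naive_estimates` and the toy reads are hypothetical; each read is represented simply by the pattern it shows.

```python
from collections import Counter

def naive_estimates(reads, n_genomes):
    """Naive estimators for bisulfite-sequencing reads (illustrative helper).

    reads: iterable of observed patterns, one per read (e.g. "110").
    n_genomes: N^(0), assumed known.
    Returns (k, estimates) where k is the number of distinct observed
    patterns and estimates[pattern] = N^(0) * y_i / Y.
    """
    counts = Counter(reads)          # y_1, ..., y_k
    total = sum(counts.values())     # Y = y_1 + ... + y_k
    if total == 0:
        raise ValueError("no reads observed")
    k = len(counts)                  # naive estimate of distinct patterns
    estimates = {pat: n_genomes * y / total for pat, y in counts.items()}
    return k, estimates


# Hypothetical toy data: 10 reads over 3 CpG sites, N^(0) = 4000.
reads = ["111"] * 6 + ["101"] * 3 + ["000"]
k, est = naive_estimates(reads, n_genomes=4000)
print(k, est["111"])  # 3 2400.0
```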

There are many ways that errors, biases, and uncertainty can be introduced in the course of bisulfite sequencing, and these may make the estimators above biased or highly variable. For this study we limit the sources of error and bias that we consider: we consider only those biases that result from the loss of molecules from the experiment, for example, by bisulfite degradation, by sampling, by ligation, and by bead placement. We expect that the loss of molecules from observation will affect estimates of the methylation pattern frequencies as well as estimation of the total number of distinct methylation patterns. In summary, we focus on the effect of losing molecules from the experiment, and we assess how small the degradation probability can be while still achieving small bias and variance in estimation.

1.3 MODEL FOR DEGRADATION AND SAMPLING

1.3.1 MODELING

We focus on the experimental steps of bisulfite degradation and the other sampling steps. Let $b$ be the number of CpGs at which the methylation status will be observed. Then there are $k = 2^b$ possible patterns, which without loss of generality we can label $1, \ldots, k$. A quantitative model for the protocol is as follows.
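The labeling can be made concrete with a short sketch. It assumes a 0/1 encoding of each CpG site (1 = methylated) and lexicographic order; both are our illustrative choices, and any other labeling would serve. For the $b = 9$ CpG sites of the running example this gives $k = 2^9 = 512$ patterns.

```python
from itertools import product

def enumerate_patterns(b):
    """Map each of the k = 2**b methylation patterns over b CpG sites to a
    label in 1..k. Patterns are tuples of 0/1 (1 = methylated, an assumed
    convention); labels follow lexicographic order, one arbitrary labeling."""
    patterns = product((0, 1), repeat=b)
    return {pat: label for label, pat in enumerate(patterns, start=1)}


labels = enumerate_patterns(9)
print(len(labels))                           # 512
print(labels[(0,) * 9], labels[(1,) * 9])    # 1 512
```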

1. We start with a total of $N^{(0)}$ molecules containing the CpG sites, with $n_i^{(0)}$ of pattern $i$ for $i = 1, \ldots, k$, so that $n_1^{(0)} + \cdots + n_k^{(0)} = N^{(0)}$.

2. Bisulfite treatment causes the failure of a fraction $(1 - p)$ of the molecules, leaving a total of $N^{(1)}$ molecules, which is $N^{(0)} p$ on average, with $n_i^{(1)}$ of pattern $i$ for $i = 1, \ldots, k$.

3. PCR amplifies the number of each pattern by a constant factor $M$.

4. The number of reads $Y$ will be smaller than $M N^{(1)}$ due to the loss of molecules during ligation, bead placement, and other sampling steps that happen after PCR.
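The four steps can be turned into a forward simulation. This sketch rests on assumptions that go beyond the text: each molecule survives bisulfite treatment independently with probability $p$, the PCR factor `M` is an arbitrary placeholder, and the $Y$ reads are drawn as a multinomial sample from the amplified pool. The function `simulate_reads` and the pattern counts in the usage lines are hypothetical.

```python
import numpy as np

def simulate_reads(n0, p, Y, M=1024, rng=None):
    """Forward simulation of the four protocol steps (a sketch).

    n0 : counts n_i^(0) of each pattern before bisulfite treatment
    p  : assumed per-molecule probability of surviving treatment
    Y  : number of reads
    M  : PCR amplification factor (placeholder value)
    Returns (n1, y): surviving counts n_i^(1) and read counts y_i.
    """
    rng = np.random.default_rng(rng)
    n0 = np.asarray(n0, dtype=np.int64)
    # Step 2: each molecule survives independently with probability p,
    # so N^(1) = n1.sum() is N^(0) * p on average.
    n1 = rng.binomial(n0, p)
    if n1.sum() == 0:
        raise ValueError("every molecule was lost during bisulfite treatment")
    # Step 3: PCR multiplies every pattern by the same factor M.
    amplified = M * n1
    # Step 4 (assumption): reads are a multinomial sample of size Y from the
    # amplified pool; M cancels, so pattern i is read with prob n_i^(1)/N^(1).
    y = rng.multinomial(Y, amplified / amplified.sum())
    return n1, y


# Hypothetical population of 4000 haploid genomes in which the most common
# pattern is carried by half of them.
n0 = np.array([2000, 1000, 600, 400])
n1, y = simulate_reads(n0, p=0.1, Y=10_000, rng=1)
print(n1.sum(), y.sum())
```

Under these assumptions the read proportions track $n^{(1)}/N^{(1)}$ rather than $n^{(0)}/N^{(0)}$, so when few molecules survive treatment, a large $Y$ does not remove the extra variability.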

We can describe this model probabilistically as

FIGURE 1.2

Number of reads with the pattern (vertical axis) against the probability of bisulfite degradation (horizontal axis, 0.01 to 0.49). For an initial population $n^{(0)}$ such that the most frequent pattern was present in 50% of genomes, the data $y$ were simulated 100 times, with $Y = 10^4$, for each of a range of values of $p$ in $(0, 0.5)$. The observed number of this pattern has very large variation, but is unbiased, when $p$ is small.

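Choice of noninformative prior. We need to specify a prior for $(n_1^{(0)}, \ldots, n_k^{(0)})$. Considering this inference problem in isolation from everything known about the evolutionary process that created the population of methylation patterns, a simple three-level prior would be

$$\alpha \sim \pi, \qquad q \mid \alpha \sim \mathrm{Dirichlet}(\alpha \mathbf{1}_k), \qquad n^{(0)} \mid q \sim \mathrm{Multinomial}(N^{(0)}, q),$$

where $\mathbf{1}_k$ is a $k$-vector of ones, and $\pi$ is a density function on $(0, 1)$ that is yet to be decided. The priors in this family satisfy the sensible property of being exchangeable in $n^{(0)}$; that is,

$$p\big(n_1^{(0)}, \ldots, n_k^{(0)}\big) = p\big(n_{\sigma(1)}^{(0)}, \ldots, n_{\sigma(k)}^{(0)}\big)$$

for any permutation $\sigma \in S_k$. Equivalently, our prior knowledge is invariant to the way the patterns are labeled. Simulations from

show that this prior is less informative for the number of distinct methylation patterns; see Chapter 2 of [23] for further details.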
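To get a feel for what such a prior implies about the number of distinct patterns, one can draw from it directly. The sketch below assumes the three levels are $\alpha \sim \pi$, $q \mid \alpha \sim \mathrm{Dirichlet}(\alpha \mathbf{1}_k)$ and $n^{(0)} \mid q \sim \mathrm{Multinomial}(N^{(0)}, q)$, matching the Dirichlet and multinomial factors in the full conditional for $(n^{(0)}, q)$. Because $\pi$ is left undecided, it takes $\pi$ to be Uniform(0, 1) purely for illustration; the helper `prior_distinct_patterns` is hypothetical, and this is not the simulation study cited in the text.

```python
import numpy as np

def prior_distinct_patterns(N0, k, n_draws, sample_alpha, rng=None):
    """Draw n^(0) from the three-level prior and count distinct patterns.

    sample_alpha(rng) stands in for the undecided density pi (hypothetical).
    Returns the number of patterns with n_i^(0) > 0 in each prior draw.
    """
    rng = np.random.default_rng(rng)
    distinct = np.empty(n_draws, dtype=np.int64)
    for t in range(n_draws):
        alpha = sample_alpha(rng)              # alpha ~ pi
        q = rng.dirichlet(np.full(k, alpha))   # q | alpha ~ Dirichlet(alpha 1_k)
        n0 = rng.multinomial(N0, q)            # n^(0) | q ~ Multinomial(N^(0), q)
        distinct[t] = np.count_nonzero(n0)
    return distinct


# Running example: N^(0) = 4000 genomes, b = 9 CpG sites, so k = 512.
# 1 - U with U ~ Uniform[0, 1) gives alpha in (0, 1], avoiding alpha = 0.
draws = prior_distinct_patterns(4000, 512, n_draws=200,
                                sample_alpha=lambda r: 1.0 - r.random(), rng=0)
print(np.percentile(draws, [5, 50, 95]))
```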

Algorithm development. Fig. 1.3 shows a directed acyclic graph (DAG) representation of the model, including the prior and observation model. We shall show how $n^{(0)}$ and $q$ can be integrated out.

FIGURE 1.3

A directed acyclic graph (DAG) representation of the model. The circular nodes are those that are not observed; the square nodes are observed; the diamond nodes are assumed known. The arrows represent the conditional relationships: the conditional density of a node given all its ancestors is only a function of said node and its parent nodes. The nodes in the plate exist in $k$ copies; two nodes in different plates are independent conditional on the parents of the plate nodes.

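The full conditional for $(n^{(0)}, q)$ is

$$
\begin{aligned}
p(n^{(0)}, q \mid n^{(1)}, \alpha, y) \propto{} & \prod_{i=1}^{k} \binom{n_i^{(0)}}{n_i^{(1)}} p^{n_i^{(1)}} (1-p)^{n_i^{(0)} - n_i^{(1)}} \, \mathbf{1}\{0 \le n_i^{(1)} \le n_i^{(0)}\} \\
& \times N^{(0)}! \prod_{i=1}^{k} \frac{q_i^{n_i^{(0)}}}{n_i^{(0)}!} \, \mathbf{1}\{0 \le n_i^{(0)}\} \; \mathbf{1}\Big\{\sum_j n_j^{(0)} = N^{(0)}\Big\} \\
& \times \Gamma(\alpha k) \prod_{i=1}^{k} \frac{q_i^{\alpha - 1}}{\Gamma(\alpha)} \, \mathbf{1}\{0 \le q_i \le 1\} \; \mathbf{1}\Big\{\sum_j q_j = 1\Big\},
\end{aligned}
$$

which is proportional (in $n^{(0)}$ and $q$) to

$$
\begin{aligned}
p(n^{(0)}, q \mid n^{(1)}, \alpha, y) ={} & \Big(N^{(0)} - \sum_{j=1}^{k} n_j^{(1)}\Big)! \prod_{i=1}^{k} \frac{q_i^{n_i^{(0)} - n_i^{(1)}}}{\big(n_i^{(0)} - n_i^{(1)}\big)!} \, \mathbf{1}\{n_i^{(1)} \le n_i^{(0)}\} \; \mathbf{1}\Big\{\sum_j n_j^{(0)} = N^{(0)}\Big\} \\
& \times \frac{\Gamma\big(\alpha k + \sum_{j} n_j^{(1)}\big)}{\prod_{i=1}^{k} \Gamma\big(\alpha + n_i^{(1)}\big)} \prod_{i=1}^{k} q_i^{\alpha + n_i^{(1)} - 1} \, \mathbf{1}\{0 \le q_i \le 1\} \; \mathbf{1}\Big\{\sum_j q_j = 1\Big\}.
\end{aligned}
$$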

Dividing $p(n^{(0)}, n^{(1)}, q, \alpha \mid y)$ by $p(n^{(0)}, q \mid n^{(1)}, \alpha, y)$ leaves the joint posterior $p(n^{(1)}, \alpha \mid y)$, which cannot obviously be written in terms of simpler distributions.

The fact that the posterior distribution can be decomposed in this way suggests a program for sampling using an MCMC algorithm: get a sample from $n^{(1)}, \alpha \mid y$, then sample directly from the exact conditional distributions of $q \mid n^{(1)}, \alpha, y$ and then $n^{(0)} \mid n^{(1)}, q, y$. This should be better than updating all the variables in an MCMC scheme, because it decreases the correlation between successive samples.
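Collecting the $q$ terms of the full conditional for $(n^{(0)}, q)$ indicates that both exact draws are conjugate: $q \mid n^{(1)}, \alpha, y \sim \mathrm{Dirichlet}(\alpha \mathbf{1}_k + n^{(1)})$, and, given $q$, the molecules lost to degradation, $n^{(0)} - n^{(1)}$, are $\mathrm{Multinomial}(N^{(0)} - \sum_j n_j^{(1)}, q)$. The following is a minimal sketch of those two draws under that reading, assuming $N^{(0)}$ is known; the helper `draw_q_and_n0` and the counts are hypothetical.

```python
import numpy as np

def draw_q_and_n0(n1, alpha, N0, rng=None):
    """Exact conditional draws of q and n^(0) given n^(1) and alpha (a sketch).

    q | n^(1), alpha, y         ~ Dirichlet(alpha * 1_k + n^(1))
    n^(0) - n^(1) | q, n^(1), y ~ Multinomial(N^(0) - sum_j n_j^(1), q)
    """
    rng = np.random.default_rng(rng)
    n1 = np.asarray(n1, dtype=np.int64)
    lost = N0 - n1.sum()               # molecules degraded by bisulfite
    if lost < 0:
        raise ValueError("n1 sums to more than N0")
    q = rng.dirichlet(alpha + n1)      # conjugate Dirichlet update
    n0 = n1 + rng.multinomial(lost, q) # allocate lost molecules to patterns
    return q, n0


# Hypothetical surviving counts for four patterns, with N^(0) = 4000.
n1 = np.array([180, 95, 60, 40])
q, n0 = draw_q_and_n0(n1, alpha=0.5, N0=4000, rng=2)
print(n0.sum())  # 4000
```

Because both draws are exact, they add no autocorrelation of their own; all of the mixing burden falls on the $(n^{(1)}, \alpha)$ updates.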

The full conditionals for the random variables $\alpha, n_1^{(1)}, \ldots, n_k^{(1)}$ are

where $s_{(-i)} = \sum_{j \neq i} n_j^{(1)}$.

One iteration of the MCMC algorithm proceeds as follows:

1. update $n^{(1)}$ by $k$ Metropolis-Hastings steps: for each $i = 1, \ldots, k$, update $n_i^{(1)}$ given $s_{(-i)}$ and $\alpha$;

2. update $\alpha$ by a Metropolis-Hastings step;

3. sample $q$ given $n^{(1)}$ and $\alpha$;

4. sample $n^{(0)}$ given $n^{(1)}$, $q$, and $\alpha$.
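Steps 1 and 2 need the full conditionals for $n^{(1)}$ and $\alpha$, which depend on the observation model, so the sketch below takes the unnormalized log posterior of $(n^{(1)}, \alpha)$ as a user-supplied function `log_target` (hypothetical). The proposals are our own choices and not necessarily the chapter's: a symmetric $\pm 1$ move for each $n_i^{(1)}$, and a log-normal random walk for $\alpha$ with the matching Jacobian correction. Steps 3 and 4 are the exact draws and are not repeated here.

```python
import math
import numpy as np

def mh_update_n1_alpha(n1, alpha, log_target, rng, alpha_step=0.2):
    """Steps 1 and 2 of one iteration: k single-site Metropolis-Hastings
    updates of n^(1), then one update of alpha (a sketch).

    log_target(n1, alpha) is a hypothetical user-supplied function returning
    the unnormalized log posterior of (n^(1), alpha) given y, or -inf outside
    the support. The starting point must have a finite log_target.
    """
    n1 = np.array(n1, dtype=np.int64)
    current = log_target(n1, alpha)
    # Step 1: for each i, propose n_i^(1) +/- 1 with s_(-i) held fixed.
    for i in range(n1.size):
        proposal = n1.copy()
        proposal[i] += rng.choice((-1, 1))   # symmetric proposal
        if proposal[i] < 0:
            continue                          # counts cannot be negative: reject
        cand = log_target(proposal, alpha)
        # Accept with probability min(1, exp(cand - current)); 1 - U avoids log(0).
        if math.log(1.0 - rng.random()) < cand - current:
            n1, current = proposal, cand
    # Step 2: random walk on log(alpha). The proposal is symmetric in
    # log(alpha), so the acceptance ratio gains the term log(new_alpha / alpha).
    new_alpha = alpha * math.exp(alpha_step * rng.standard_normal())
    cand = log_target(n1, new_alpha)
    if math.log(1.0 - rng.random()) < cand - current + math.log(new_alpha / alpha):
        alpha, current = new_alpha, cand
    return n1, alpha


def toy_log_target(n1, alpha):
    """Hypothetical stand-in target used only to exercise the code:
    independent Poisson(50) counts and an Exponential(1) density on alpha."""
    if alpha <= 0 or np.any(n1 < 0):
        return -math.inf
    return sum(v * math.log(50.0) - math.lgamma(v + 1) for v in n1.tolist()) - alpha


rng = np.random.default_rng(3)
n1, alpha = np.array([40, 60, 50]), 1.0
for _ in range(500):
    n1, alpha = mh_update_n1_alpha(n1, alpha, toy_log_target, rng)
```

The toy target is not this model's posterior; substituting the real $p(n^{(1)}, \alpha \mid y)$ would require the observation model for $y$.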

There is one issue with this scheme that would prevent it from working well in certain cases: if $Y$ is large, then the posterior for $n^{(0)}$ has local modes at vectors that are approximately integer multiples of the truth. If the algorithm is initialized near a local mode, then it will not converge to the true posterior.
