
Statistical Modeling with R

A dual frequentist and Bayesian approach for life scientists

PABLO INCHAUSTI

Centro Universitario Regional del Este, Universidad de la República, Uruguay

Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries

© Pablo Inchausti 2023

The moral rights of the author have been asserted

Impression: 1

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this work in any other form and you must impose this same condition on any acquirer

Published in the United States of America by Oxford University Press, 198 Madison Avenue, New York, NY 10016, United States of America

British Library Cataloguing in Publication Data

Data available

Library of Congress Control Number: 2022937827

ISBN 978–0–19–285901–3 (hbk)

ISBN 978–0–19–285902–0 (pbk)

DOI: 10.1093/oso/9780192859013.001.0001

Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY

Cover image: John Lund/Getty Images.

Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.

For Joana, because two is so much more than one

Preface

A preface is the closest an author may have to the “letter of marque” used by European governments in the seventeenth century to authorize piracy with the sovereign’s tacit consent. An author can write just about anything in the preface with the resigned permission of the editor.

Having finished the book, I hereby give myself permission to write in the first person singular. I have spent many thousands of hours endlessly writing and rewriting this book over the last 24 months. It has been a challenge, a pleasure, a thrill, and a burden at the same time.

At this point, I have the following unsettling mixture of feelings:

* Relief and joy: Finally it is over.

* Pride and accomplishment: I have done it! I have done it! I have done it!

* Uncertainty: Was it worth the effort? Will it be well/badly received?

* Insecurity: Did I double-check everything? Does it have any embarrassing errors?

It is now time for the book to sink or swim on its own at the hands of its readers.

A long time ago, my undergraduate adviser sent me to fetch some rainfall data from the Venezuelan Ministry of the Environment. I reluctantly went, got lost inside the ugly building, failed to gather the data, but stumbled by chance into a nearly empty and shabby Ministry bookstore. There I found a gem: the original 1975 Spanish edition of the book Areografía: estrategias geográficas de las especies by Eduardo Rapoport. He was a clever and original Argentinian ecologist who anticipated what later became known as macroecology. The Pergamon Press translation (Areography: The Geographic Strategies of Species) missed the preface of the Spanish original, easily the best preface of a scientific book I have ever read. I read Rapoport’s preface at the bookstore, and right away I bought the book using all the money I had, including the return bus fare. I walked the 4.5 km home, reading the book between traffic lights. Recognizing the absence of rules for what the preface of a scientific book should contain, Rapoport aimed to bring a humanized depiction of his CV, so that “science books would leave home and be taken to the dentist waiting room,” and you would know what to say if you ever met the author. Following the master Eduardo Rapoport, here is my attempt.

Things that I love

* The dazzling imaginations of Gabriel García Márquez, Julio Cortázar, and Alejo Carpentier, the dignity of Primo Levi, the wisdom of Umberto Eco, the humanity of Italo Calvino, The Magus by John Fowles.

* The music and shining smile of Louis Armstrong, the saxophone of John Coltrane (his mission: “a masterpiece by midnight”), and Art Tatum playing the piano; the Beatles, Eric Clapton, Sting, and Mark Knopfler; the blues of Taj Mahal, Keb’ Mo’, and Buddy Guy; the stirring voices of Annie Lennox, Ute Lemper, and Madeleine Peyroux; the lyrics of Leonard Cohen and Bob Dylan.

* Two scientific heroes: John (JBS) Haldane and Richard Feynman.

* All the Monty Python films and the original BBC series.

* All Picasso, except the pink period; Gustav Klimt and Vincent van Gogh.

* Let’s share a full Uruguayan barbecue, including mollejas (thymus), kidneys, and sweet blood sausages. It would be glorious to enjoy some French goat cheese, a fresh green salad with endives and cherry tomatoes, red wine of course, and a mango or passionfruit mousse to bring my soul closer to earthly paradise.

* And, above all, let’s talk, exchanging stories, books, and anecdotes. The folders of my memory hold countless megabytes of historical, literary, and scientific information, some of which may even interest or entertain you for a while. And I can swiftly change my opinion on any issue under the sun as many times as you can manage to convince me with good arguments, sensible evidence, and a modicum of straight reasoning.

Things I hate

* Social injustice in any shape or form.

* The stupidity of the military and all its cheerleaders.

* Social, racial, and sexual discrimination under any disguise or shade.

* All totalitarian ideologies and forms of thought.

* The stiffness, conservatism, intolerance, and backwardness of the traditional Catholic Church and of many recently created protestant churches.

* Pineapple on pizza: a horrendous mix that spoils two great things.

My story

My own biography is pretty ordinary. I will recall just a few events that might perhaps inspire others to believe in themselves. I was born in Uruguay, a small country that lies sandwiched between Argentina and Brazil at the bottom left of the world map. After starting public primary school there, I followed my father to Venezuela. I started university wishing to become an electrical engineer, but finally managed to graduate in biology at the tardy age of 26.

I desperately wanted to study more and become a scientist. With my then partner, we managed to get admitted to the State University of New York at Stony Brook (now Stony Brook University) by sheer luck. In September 1992, we gathered all our money and belongings, got some family loans, and traveled to New York. We landed there perfectly unaware of everything, including how to get to the university from the airport. We had expected to pay the first year of university fees, betting that our good background would allow us to obtain good grades that might lead to some financial support. But it turned out that we did not have to pay any university fees at all!

And even better, one day before the start of my first semester, another graduate student chose to take care of her ill grandmother and declined her teaching assistant position. I was offered it, and of course took it: $750 a month minus taxes amounted to touching heaven. The next day I went to teach some very basic biology to 25 puzzled American students. During my fourth day in an English-speaking country, I barely understood 40% of what the students said. But I had an inspired idea that saved me. I shamelessly told them of a hearing disability that required them to speak slowly and very loudly for me to understand them. And they did it to such an extent that my strange disability miraculously disappeared after a few weeks. We quickly bought a TV to help train my wooden ears. At first, the only program that I could understand was the (British) Prime Minister’s Questions that was broadcast on C-SPAN very late at night just for the (dis)pleasure of insomniacs. This very theatrical, ceremonial, and mostly pointless weekly exercise of British politics was my door to understanding spoken English, and the start of an anglophilia that only Brexit recently and definitively managed to cure.

At Stony Brook, I met Lev Ginzburg by chance while eating sandwiches at the Department of Ecology and Evolution. This very intelligent and witty Russian mathematician became my PhD supervisor. At first, it was nearly impossible to understand what this man was talking so quickly about. I used to share a 6 m² office in front of his. Лев often called me into his office using mundane excuses to spend many hours talking and teaching me on a one-to-one basis as if I was a medieval apprentice. These interactions over the years shaped me into a scientist, and affected my brain more than anything since gastrulation. The wide range of topics of these conversations included mathematics, ecology, classical physics, dynamical systems, risk analysis, philosophy of science, the latest books we were reading, and who knows what else. I still vividly recall two entire Friday afternoons that Лев devoted to teaching me the puzzling basics of quantum mechanics (including the Schrödinger wave equation) using a small green blackboard and white chalk. It was an indescribable pleasure to have received such a gift of human knowledge from you, мой дорогой друг и наставник (my dear friend and mentor).

I graduated in 1998, and my Italian passport (life lesson for the young: you can never have too many passports; acquire as many as possible since some may open unexpected doors) got me an EU fellowship for a postdoc with John Lawton at Imperial College, UK. I later moved to France where I lived and worked for nine years. Other moves following a non-traditional and hardly straight path took me back to Uruguay, where I live now.

I will not bother the reader with further details of my academic past. There is hardly any merit involved in it. Like you, I have 23 pairs of chromosomes in every cell, blood the same color as yours, and a genome that differs from yours and from Mandela’s, Einstein’s, Himmler’s, and Stalin’s by about six million DNA bases (~0.06%, an irrelevant difference since only about 2% of our DNA is translated into proteins). Therefore, rest assured that there is nothing special, unique, or even good about me. You can easily do better than me if you wish.

Just trust me on this one. Most people who succeed in life are those that seriously apply their heart and mind and energy long enough to pursue their dreams with stubborn determination. I am convinced that life (or the universe, or the gods) rewards persistence and single-mindedness over apparent leaps of inspired genius. However, for that you first need to hold dreams and ambitions for yourself. Nobody can teach you to dream and aspire to a higher future than your present. Dreaming turns out to be a spontaneous and personal affair. I gleaned the next quote (out of context, and oddly enough due to Lenin) from a Julio Cortázar book that summarizes well what I wish to convey:

The rift between dreams and reality causes no harm if only the person dreaming believes seriously in his dream, if he attentively observes life, compares his observations with his castles in the air, and if, generally speaking, he works conscientiously for the achievement of his fantasies. If there is some connection between dreams and life then all is well.

I have been helped beyond the call of duty by the staff of Oxford University Press. Ian Sherman, senior editor of Life Sciences, incredibly remembered me after a 19-year hiatus and, even more surprisingly, believed in and liked the idea of this book. He variously guided, prompted, kept quiet, and encouraged me, and I cannot thank him enough for all this and more. I must also thank Katie Lakina for putting the production of this book back on track, Karen Moore for her diligent and dedicated work during the transformation of many files into a finished book, and Richard Hutchinson for his attentive and careful copyediting that greatly improved the quality of the text that you are reading.

The free and open software R and the many packages used in this book stem from the fantastic and creative work of many generous scientists and programmers around the world. Their incredible work has created the collective property of statistical knowledge that made this book possible. While I lack the means to thank you all, let me at least raise a glass to toast you with endless gratitude. If there is any informatics god, its blessings should also extend to the creators and maintainers of Linux Ubuntu and LibreOffice.

Sebastián Aguiar, Marc Kéry, Enrique Lessa, Daniel Naya, and Matías Schrauf kindly read, commented on, and corrected different chapters of this book. Their input and feedback prompted changes that led to improvements and hopefully fewer embarrassing mistakes. The stubborn errors, plain inconsistencies, and straight omissions that might remain are, of course, mine only. Melina Aranda, Javier García, Daniel Naya, Alicia Ponce, and Agustín Sáez kindly provided data from their published papers that are used as either case studies or problems at the end of some chapters. I thank Alexandra Elbakyan for allowing me to access an enormous amount of essential information that I could not otherwise have ever dreamed to read and use in this book:

I refuse to indulge in the tacky final sentences that end the prefaces of many scientific books: “Last but not least, I want to thank ... for their patience and ... for the many hours I spent ...” Oh no, please not that again! But I will say this: over the last 12 years, I have been blessed beyond deserving by the earthly gods to share my life with Joana Gagliardi. She is my magnificent partner, my passionate lover, my close friend, and a truly great and beautiful woman with a shiny soul enveloped by a large smile and almond-shaped eyes. I have also had the privilege to share these years with Fiamma (24) and Iahel (20), Joana’s bright daughter and son, whom I have seen grow into two beautiful adults who are the better angels of my soul.

This is enough now. You did not buy the book to read this babble. You want some stats, and that is what you will find starting on the next page. Should you have any comments, complaints, remarks, or suggestions, or have spotted any small or large errors, I want to hear from you, so please write to pablo.inchausti.f@gmail.com

With warm regards,
Pablo

5 The General Linear Model II: Categorical explanatory

5.8 A posteriori tests in frequentist models

6.5 Analysis of covariance: Mixing continuous and categorical explanatory variables

6.6 Analysis of covariance: Frequentist fitting

6.7 Analysis of covariance: Bayesian fitting

7 Model Selection: One, two, and more models fitted to the

7.1 Introduction

7.2 The problem of model selection: Parsimony in statistics

7.3 Model selection criteria in the frequentist framework: AIC

7.4 Model selection criteria in the Bayesian framework: DIC and WAIC

7.5 The posterior predictive distribution and posterior predictive checks

7.6 Now back to the WAIC and LOO-CV

7.7 Prior predictive distributions: A relatively “new” kid on the block

8 The Generalized Linear Model

8.1 Introduction

8.2 What are GLMs made of?

8.3 Fitting GLMs

8.4 Goodness of fit in GLMs

9 When the Response Variable is Binary

9.1 Introduction

9.2 Key concepts for binary GLMs: Odds, log odds, and additional link functions

9.3 Fitting binary GLMs

9.4 Ungrouped binary GLM: Frequentist fitting

9.5 Further issues about validating binary GLMs

9.6 Ungrouped binary GLMs: Bayesian fitting

9.7 Grouped binary GLMs

9.8 Problems

10 When the Response Variable is a Count, Often with Many Zeros

10.1 Introduction

10.2 Over-dispersion: A common problem with many causes and some solutions

10.3 Plant species richness and geographical variables

10.4 Modeling of counts with an excess of zeros: Zero-inflated and hurdle models

10.4.1 Frequentist fitting of a zero-inflated model

13.4 Problems and inconsistencies with the definition of random effects

13.5 Population-level and group-level effects in Bayesian hierarchical models

13.6 Fitting mixed models in the frequentist framework

13.7 Statistical significance and model selection in frequentist mixed models

13.8 The shrinkage or borrowing strength effect in mixed models

13.9 Fitting mixed models in the Bayesian framework

14.4.2 Randomized block design

14.4.3 Split-plot design

14.4.4 Nested design

14.4.5 Repeated measures design

15 Mixed Hierarchical Models and Experimental Design Data

15.2.1 Binary GLMM with a randomized block design: Frequentist models

15.2.2 Binary GLMM with a randomized block design: Bayesian models

15.3 Gaussian GLMM with a repeated measures design

15.3.1 Gaussian GLMM with a repeated measures design: Frequentist models

15.3.2 Gaussian GLMM with a repeated measures design: Bayesian models

15.4 Beta GLMM with a split-plot design

15.4.1 Beta GLMM with a split-plot design: Frequentist model

15.4.2 Beta GLMM with a split-plot design: Bayesian model

15.5 Problems

Afterword

Appendix A: List of R Packages Used in This Book

Appendix B: Exploring and Describing the Evidence in Graphics (only available online at www.oup.com/companion/InchaustiSMWR)

Appendix C: Using R and RStudio: The Bare-Bones Basics (only available online at www.oup.com/companion/InchaustiSMWR)

Index

PART I

The Conceptual Basis for Fitting Statistical Models

CHAPTER 1

General Introduction

1.1 The purpose of statistics

The first article of the first issue of Annual Review of Statistics and Its Application was entitled “What is statistics?” (Fienberg 2014). It started by listing eight different and only partly overlapping definitions. It is hard to imagine that chemists or physicists would provide as much variety when defining their own trades. The American Statistical Association offers a very inclusive definition: “Statistics is the science of learning from data, and of measuring, controlling and communicating uncertainty” (https://www.amstat.org/asa-newsroom). While not every statistician would agree with this, it serves to highlight that statistics is a kind of meta-discipline aiming to extract real-world insights from data gathered within other realms of knowledge (Wild et al. 2011). Statistics is a meta-discipline because, in dealing with the fuzziness, imprecision, and vagaries of real-world data, it pushes its practitioners to formulate “theoretical scaffolds” that can be used on other areas of knowledge.

Obtaining insights from statistics involves specifying hypotheses, gathering data relevant to a problem, modeling data with quantitative methods, and interpreting quantitative findings within the specific context of the scientific hypotheses that motivated the research. These activities do not, and cannot, take place as an intellectual abstraction aiming to solve problems within the clearly defined boundaries of applied mathematics where statistics is sometimes placed. Mathematicians often need to (over-)simplify the context of the initial problem to better define a narrower, more interesting, and hopefully solvable research question. In contrast, in statistics the context is the key to interpreting the findings of computer printouts of tables and graphs and to transforming data into insights in terms of the research problem and hypotheses that motivated the gathering of evidence. The practice of statistics is (or rather should be) something far more subtle and interesting than a quasi-mechanical quest to contrast and reject hypotheses whenever p < 0.05, as you might have learned in undergraduate courses.

“Statisticians are engaged in an exhausting but exhilarating struggle with the biggest challenge that philosophy makes to science: how do we translate information into knowledge?” (Senn 2003, p. 3). Taken at face value, how can this last statement fail to excite you? Statisticians deal with the excruciating messiness of real-world data. By that we mean the uncertainty in the measurements of variables, the pervasive variability of the world, and the often foggy relations between the variables that we aim to uncover in order to claim empirical support for a scientific hypothesis. Statistics has to tackle the chance and contingency that lie entangled within real-world data, and whose influence can be as pervasive as that of the signal related to the main patterns that we wish to reliably retrieve. The statistical holy grail is to uncover an approximate statistical model that could have plausibly generated (and hence fits acceptably well) the available evidence. But this is not all. The magnitudes of the estimated parameters of such a well-fitting model should allow the evaluation of a statistical hypothesis and have a tangible, real-world interpretation in the research context that prompted the design of the experiment, the gathering of data, and its analysis.

1.2 Statistics in a schizophrenic state?

Over the last century, statistics has fully developed two theoretical frameworks (frequentist and Bayesian, to be explained in Chapters 2 and 3) that have contended to become “the right and appropriate” way of analyzing data. You will not find practitioners in other scientific disciplines spilling so many barrels of ink fighting each other without ever achieving complete victory. These two frameworks largely stem from two different views of probability that have coexisted since the seventeenth century, and their proponents and defenders have engaged in acrimonious and protracted disputes during most of the twentieth century. The currently dominant frequentist framework is an incoherent blend that arose from the protracted clash between R. Fisher on one side and J. Neyman and E. Pearson on the other. It is likely that Fisher and Neyman/Pearson only agreed on their strong dislike and distrust of the use of prior information (again, to be explained in Chapter 2) as a subjective and arbitrary component of the Bayesian framework that they wanted uprooted from statistics. Aiming for objectivity and conclusions that are independent of whoever analyzes the data, most of the practice of statistics championed under the frequentist framework has turned into a quasi-mechanized procedure aiming to reject statistical hypotheses.

It is currently fair to say that a clear majority of scientists have been educated in courses based on (and hence only use) frequentist methods. However, being in a (rapidly growing) minority does not suggest, much less prove, that the champions of the Bayesian framework are “wrong” by any stretch of the imagination. The struggle for primacy between proponents of these two statistical frameworks has been largely inconclusive thus far. At present, scientists have a more ecumenical or pragmatic view of using what seems appropriate, and what they know best, to solve the problem at hand. Scientists needing to employ the other framework almost need to relearn from scratch. This book explains, discusses, and applies both the frequentist and Bayesian statistical frameworks to analyze the different types of data that are commonly gathered by research scientists and students.

The book in your hands aims to present material in an informal, approachable, and progressive manner suitable for research scientists and graduate students with a modicum of previous training. The book covers all the material in a theoretically rigorous manner, focusing on the practical applications of all methods to actual research data. It aims to provide just enough theoretical background for you to understand the basic underpinnings of the statistical models explained here. Every important formula will be “translated” into words to provide a clear, non-intimidating description to readers with only a basic background in mathematics and inferential statistics. In contrast to books laden with more theory, this is a “how-to” book. It emphasizes teaching by learning to compute using R, and to thoroughly interpret the results from the viewpoint and needs of research scientists and students.

1.3 How is this book organized?

It is unthinkable to carry out statistical analysis of meaningful amounts of data of even moderate complexity without a computer. This book will make extensive use of the R programming environment (http://www.r-project.org/). This is an open-source (one can access and edit the code of all the R functions and save a revised version on one’s computer), interpreted (it does not require compilation to be executed) programming language environment for statistical computing and graphics. R runs on Linux, Windows, and macOS, among others, and is the brainchild of its creators Ross Ihaka and Robert Gentleman. It is now supported by the R Foundation for Statistical Computing (Thieme 2018). R has experienced phenomenal growth since August 1993 to become one of the most popular and fastest growing programs for statistical analysis and graphics worldwide. Being a programming language, R can be easily extended by writing functions and extensions. There is a growing and very active R community creating packages (more than 17,500 packages in April 2021) and providing answers in terms of code and explanations in many active and fast-reacting mailing lists. R code is mostly written in the R language itself, although advanced users can link it to other computer languages such as C, C++, FORTRAN, Java, and Python using specific commands to assist in the execution of computer-intensive tasks.
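To make the point about extending R by writing functions concrete, here is a minimal sketch of a user-defined function. The function name std_error and the example values are illustrative assumptions and do not come from the book; only base R functions (sd, sqrt, length, is.na) are used.

```r
# Hypothetical helper (not from the book): standard error of the mean.
# x     : numeric vector of observations
# na.rm : if TRUE, missing values are dropped before computing
std_error <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  # sample standard deviation divided by the square root of the sample size
  sd(x) / sqrt(length(x))
}

# Made-up values, for illustration only
se_example <- std_error(c(2, 4, 4, 4, 5, 5, 7, 9))
print(se_example)
```

Once defined, such a function can be called like any built-in one, which is essentially how contributed packages add new capabilities to R.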

Most statistics books using R aim for standalone use by providing brief (and by necessity incomplete) introductory chapters about the installation and basic use of R, including the basic commands to generate graphics. This introductory material about R can take up several chapters, often 10 to 20 percent of the overall length of many statistical textbooks. There are many books and companion websites that cover both the basic steps for using R and producing graphs: see Beckerman et al. (2017), Lander (2017), Petchey et al. (2021), and Teetor (2017) for the basics of R; Horton and Kleinman (2011) and Kabacoff (2011) for simple graphics; and Abedin and Mittal (2015), Chang (2012), and Teutonico (2015) for ggplot2 graphics. We felt it unwise to provide the same material in print yet again. The companion website (www.oup.com/companion/InchaustiSMWR) contains detailed information about the installation of R in Windows, macOS, and Linux, along with the basic syntax for using and manipulating R objects. The website also provides detailed explanations for making basic plots in R using the package ggplot2 (Wickham 2016), which is rapidly becoming the dominant approach to producing graphics in R. From here on, all R code in the book will be shown in this font and highlighted in gray. While the code necessary for each statistical analysis will be thoroughly explained in each chapter, the code used to make all the figures can be found on the companion website to avoid distracting you from understanding the main ideas. You will also find all the datasets and scripts (i.e., text files with commands) for each chapter in the companion website.

R has a rather minimalist interface in which the user types commands and obtains statistical and graphical results. RStudio (https://rstudio.org) has become a very popular graphical interface that manages the interaction between the user and R with great flexibility. The installation and basic use of this free graphical interface are also explained on the companion website. Nonetheless, all statistical and graphical analyses described in this book are independent of whether one uses a graphical interface such as RStudio.

This book is organized in three parts. Part I provides the fundamental definitions of probability that underlie the frequentist and Bayesian frameworks, and develops the notion of parameter estimation as the main goal of statistical inference (Chapter 2). Chapter 3 then covers the basic underpinnings of the frequentist and Bayesian methods of parameter estimation (i.e., maximum likelihood, and the Markov chain and Hamiltonian Monte Carlo algorithms) that will be used in the data analyses of all the chapters of Parts II and III.

Part II represents the bulk of this book. It covers the analysis of the main types of data gathered in social and natural sciences from both frequentist and Bayesian perspectives. Each dataset will be analyzed with both frameworks. Readers may choose to focus on separate, largely self-contained chapters depending on the type of response variable. However, the single effects of numerical and categorical explanatory variables (Chapters 4 to 6) should be examined as basic foundational aspects. Chapter 7 covers the theoretical basis of model selection (and a few other things), again for both frequentist and Bayesian frameworks. Chapter 8 reviews the conceptual basis of the generalized linear models that allow viewing most of the analyses explained in separate chapters of Part II as special cases. The assessment of statistical significance of parameter estimates, the calculation of confidence intervals, and the assessment of model goodness of fit are also covered. The rest of Part II covers, in separate chapters, the analysis of different types of data commonly encountered in scientific research involving binary, count, proportions, and other real-valued outcome variables. The quality of fit of all the statistical models to the data will be assessed with residual analysis and related methods, all of which will be explained in detail.

Part III builds on the understanding gained in Part II to incorporate random or group-level effects (Chapter 13). This enables the incorporation of structure in the data imposed by experimental and survey designs (Chapter 14). It is at this point that the book reaches its highest level of complexity, generality, and usefulness. As in all chapters of Part II, the emphasis is placed on formulating the starting statistical model, fitting the model using either the frequentist or Bayesian framework, interpreting and understanding the model outputs, assessing the goodness of fit to the data, and translating into words and figures the statistical findings for interpretation.

The book was structured and written assuming an imaginary reader interested in acquiring a broad and comprehensive understanding of univariate statistical analysis after a basic undergraduate course as taught in most engineering and science faculties around the world. These single-semester courses provide a basic understanding of descriptive statistics (mean, variance, quartiles), the basic notions of probability theory, a working knowledge of some probability distributions (e.g., normal, binomial), how to calculate the confidence intervals of at least the population mean, the basis (i.e., types of statistical errors, the notion of statistical significance) for testing statistical hypotheses about the differences between two means, and hopefully simple linear regression. The book starts slowly to progressively build a basic understanding of the main concepts and ideas that will be used in subsequent chapters.
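As a quick self-check of that assumed background, the descriptive statistics and the confidence interval for the population mean mentioned above can be sketched in a few lines of base R. The data vector x is made up for illustration, and the interval shown is the usual t-based 95% interval; neither is taken from the book.

```r
# Hypothetical sample of 10 measurements (made-up values, for illustration only)
x <- c(4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.1, 5.7, 4.4, 5.2)

n     <- length(x)
x_bar <- mean(x)                                  # sample mean
s2    <- var(x)                                   # sample variance (n - 1 denominator)
q     <- quantile(x, probs = c(0.25, 0.50, 0.75)) # quartiles

# 95% confidence interval for the population mean, using the t distribution
se     <- sqrt(s2 / n)                            # standard error of the mean
t_crit <- qt(0.975, df = n - 1)                   # two-sided 95% critical value
ci     <- c(lower = x_bar - t_crit * se,
            upper = x_bar + t_crit * se)

print(x_bar)
print(q)
print(ci)
```

The same interval is returned by t.test(x)$conf.int, which offers a convenient cross-check on the hand calculation.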

1.4 How to use this book

In 1963 the Argentinian writer Julio Cortázar published the remarkable book Hopscotch (or Rayuela for those who can read it in the Spanish original). This novel has 155 mostly short chapters, 99 of which were considered “expendable” by its author. Even more, Julio Cortázar proposed several alternative ways in which his book could be read as if the chapters were pieces of many different possible puzzles to be assembled at will by its readers. Following Cortázar’s lead, here are a few suggested paths for using this book:

• If you lack a reasonable knowledge of R and how to make graphics, you should definitely start by reading the introductory material about R and R graphics on the companion website.

• Should you not be interested in the historical roots and the conceptual basis of the frequentist and Bayesian frameworks over which statisticians have spilled so much ink, you may skip Chapters 2 and 3. However, please have a look at the final table of Chapter 3 highlighting the main differences between the Bayesian and frequentist approaches that are worth knowing even if just for basic statistical literacy.

• If you are just interested in a specific data analysis (say, logistic regression, factorial analysis of variance, count regression), Table 2.1 points to the chapters you need depending on the probability distribution appropriate for modeling each type of response variable. Beware that you may need to have a look at parts of Chapter 8 to understand certain key features of the generalized linear models such as the link function. The main aspects of incorporating numerical and/or categorical explanatory variables in models are covered in Chapters 4 to 6, and they are valid for all models covered in this book.

• If you wish to learn either frequentist or Bayesian statistics, you may only read selected parts of specific chapters and simply dismiss the other half. But again, at this point in the twenty-first century it is becoming essential for scientists to possess at least a broad understanding of the theoretical/conceptual basis of both frequentist and Bayesian frameworks as discussed in Chapter 3. You will need the basics just to avoid getting lost and being fooled while reading papers.

• Readers only interested in Bayesian statistics may find it frustrating that there is no single chapter devoted to priors, the perennially debated feature of this framework. Starting in Chapter 4, the setting of priors is progressively built up in complexity in different chapters. There is a summary of the many non-exclusive steps or approaches to defining priors in the different chapters on page 323.

• ShouldyoubeinterestedinmodelselectionineitherthefrequentistorBayesianframework,youneedtoreadpartsofChapter 7 toacquireatleastaflavorofhowitisdonein eitherframework.Pleasereadthischapterbeforedoinganymodelselectionwithyour specificdatatype,asunwrittenandoraltraditionshaveplaguedtoomuchofstatistical modelselectioncarriedoutbylifescientists.Althoughthebookhaslimitedemphasis onmodelselectionissues,therearespecificexamplesinChapters 11 and 12.

• Readerswithdatastemmingfromspecificexperimentaldesignsshouldfirstreadthe chapterdealingwiththetypeofdatainPart II,thenhaveatleastaquickreadonthe theoreticalbasisofthemixedmodels(Chapter 13),andthencarryoutthedataanalysis perhapsinspiredbyoneoftheseveralexamplesgiveninthechaptersofPart III.

• Finally, for readers wishing to acquire a broad and reasonably exhaustive overview of univariate statistics, the author suggests starting with Chapters 4 to 6, jumping to Chapter 8 to cover the basic theory of generalized linear models, and then going straight to the chapter(s) dealing with the types of data according to Table 2.1.

Whichever of the suggested (or other) paths you take through this book, it is very likely that you will have to flip back and forth to improve or check your understanding of a concept, an idea, or the interpretation of model results, or simply the code for an analysis or a figure. In this regard, while each chapter is self-contained, the book is heavily cross-referenced to allow you to find your way back and forth between chapters as needed.


CHAPTER 2 Statistical Modeling

A short historical background

2.1 What is a statistical model?

Using data to test statistical hypotheses, to fit empirical relations, or to explore suggestive patterns requires formulating statistical models. All statistical tests of hypotheses and statistical estimators of parameters are derived from statistical models. In very general terms, a statistical model can be defined as one or more mathematical equations having at least one variable exhibiting stochastic (i.e., probabilistic) variation to represent the inherent uncertainty of observing its potential values.

The statistical models considered in this book contain a single response variable Y reflecting the effect of, or the variation associated with, the explanatory variables X. The latter can be any number of numerical variables, categorical variables denoting groups, or combinations thereof (i.e., interactions between explanatory variables). In all the models considered in this book, the response variable is a random variable with an associated probability distribution whose parameters embody both the effect of the explanatory variables and the variability of its potential values. Statistical models are thus equations that can be seen as data-generating mechanisms. They contain explicit assumptions that may reproduce the data for some combination of their parameters and values of the explanatory variables.
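
As a minimal sketch of what "data-generating mechanism" can mean in practice, the R code below simulates a response variable Y from a normal distribution whose mean depends on a single numerical explanatory variable X. The parameter values (intercept, slope, residual standard deviation), the sample size, and the choice of a normal distribution are all assumptions made for illustration; they do not come from any data set or example in this book.

```r
# Illustrative sketch: a statistical model seen as a data-generating mechanism.
# Every numerical value below is an assumption chosen for this example.
set.seed(123)                          # make the simulation reproducible

n     <- 200                           # number of observations (assumed)
beta0 <- 2.0                           # intercept (assumed)
beta1 <- 0.5                           # slope: effect of X on the mean of Y (assumed)
sigma <- 1.0                           # standard deviation of Y around its mean (assumed)

x  <- runif(n, min = 0, max = 10)      # values of the explanatory variable X
mu <- beta0 + beta1 * x                # deterministic part: mean of Y for each X
y  <- rnorm(n, mean = mu, sd = sigma)  # stochastic part: Y is a random variable

# Fitting the same model to the simulated data should give estimates that are
# close to, but not exactly equal to, the values used to generate the data.
fit       <- lm(y ~ x)
estimates <- coef(fit)
print(estimates)
```

Running the block again with a different seed yields different data and slightly different estimates, which is the variability that the probability distribution of Y is meant to represent.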

You might recall from previous introductory courses the existence of probability mass functions (PMFs) and probability density functions (PDFs) that are associated with discrete and continuous random variables, respectively. PMFs and PDFs are collectively also termed “probability distributions,” and sometimes both are also subsumed under the term PDF. The names of some probability distributions that may spring to mind are binomial, Poisson, normal, and perhaps others. Which probability distribution could or should be used for each statistical model essentially depends on the main attributes of its response variable. Rather than showing a bestiary of the probability distributions that will be considered in this book along with their equations and their different shapes according to particular parameter values, we simply list them in relation to the type of data to which they apply (i.e., the domain of the response variable) in Table 2.1, and defer further details to the respective chapters where the analysis of each data type is explained. In addition, you can find such bestiaries of probability distributions in almost any statistics book on the shelf of the library of your institute, as well as on the internet.

Yet, why must the response variable Y of all statistical models be a random variable? There are several lines of argumentation for this (Blitzstein and Hwang 2014). One line of reasoning is that the randomness of the outcome variables results from epistemic uncertainty (a fancy way of saying limited knowledge) and from the measurement errors described before, which prevent us from precisely predicting them before actually measuring or estimating them during the data collection. Whenever we repeatedly perform simple experiments and measure or estimate the values of an outcome variable that characterize their outcome, we inevitably observe that variability is a pervasive feature. Every time you drive the same car 5 km at a constant speed it takes a different amount of time to reach its destination. After giving the same amount of water to identical genetic clones from the same original plant you will observe that plant height will vary among the pots. Variability, be it due to the uncertainty of the variables affecting an outcome or to the vagaries of measurement, is as important a part of reality as are the main trends observed in data. There would be no need for statistics in the absence of variability since, barring measurement error, a given set of inputs would then always render the same set of observable outputs.

Table 2.1 List of probability functions considered in this book.

We are interested in statistics because we need to understand how to explore, analyze, and interpret the data at hand in the context of our current research, or because we wish to understand some of the main principles involved in designing experiments and surveys to gather the data. True enough, the statistics involved in the exploration, analysis, and interpretation of data and in designing experiments and surveys requires a decent minimal background in probability theory. This much you already knew, which is why you acquired such basic knowledge before picking up this book. This book need not pretend to be a self-contained encyclopedia by repeating the introductory material on probability whose retelling has become an enduring ritual of statistics textbooks. Should you wish to refresh these fundamental concepts and ideas, consider consulting Wasserman (2004), Westfall and Henning (2014), and Blitzstein and Hwang (2014) among many, many others.

More interesting and useful to the goals of this book would be to recall the main interpretations of probability. This is because these interpretations of probability underlie and gave origin to the frequentist and Bayesian frameworks of statistical inference that are the subject matter of this book.

2.2 What is this thing called probability?

Probability is a principled way of quantifying uncertainty by assigning plausibility or credibility to a set of mutually exclusive possibilities or results of an experiment or observation. The concept of probability has a long, interesting, and convoluted history (see Tabak 2004, Stigler 1986, and Weisberg 2014). The origin of modern probability stems from the question that Antoine Gombaud (also known as the Chevalier de Méré) put to Blaise Pascal (1623–1662) in a Paris salon regarding the fair division of stakes of an interrupted card game accounting for the previous and potential gains of each player. Gombaud’s questions led to a brief exchange of letters between Pascal and Pierre de Fermat (1607–1665). Pascal and Fermat were mostly concerned with evaluating gambles and equity, not with evaluating either evidence (i.e., data) or truth in arguments. Their correspondence would probably have vanished from public view were it not for Christian Huygens’ 1657 book De Ratiociniis in Ludo Aleae (On Reasoning in Games of Dice). While still exclusively focusing on analyzing games of chance, Huygens’ book remained the reference for probability for the next 50 or so years.

Jacob Bernoulli’s posthumously published book The Art of Conjecturing (1713) marked a turning point in the history of probability for several reasons. First, Bernoulli (1654–1705) showed how to calculate probabilities as a frequency obtained by the ratio of the number of favorable events to the total number of events. By so doing, Bernoulli permanently defined probability as an index of uncertainty that is bounded by 0 and 1. He also related the calculation of (some aspects of) probability to data, and made the crucial link between probability and the long-term frequency of an event, later called the law of large numbers. Second, Bernoulli applied probability to model uncertainty in areas other than gambling, such as human mortality and criminal justice, and by so doing he created what came to be known as “subjective probability.” By making the crucial intellectual leap of viewing human existence as an existential lottery akin to a game of chance, Bernoulli was able to calculate mortality odds in order to price the lifetime yearly payments that the European states of the time made to those who lent them money. Bernoulli was probably the first to practically apply probability theory outside of games of chance.

Following the chronological line, Thomas Bayes’ paper published posthumously in 1763 became the next key contribution to what in the twentieth century became known as statistical theory. Here’s the historical context. The Scottish philosopher David Hume had argued in 1748 that “causes and effects are discoverable not by reason, but by experience.” Hume stated that we can never be certain about the cause of a given effect as either or both of them may be due to an as yet unknown ultimate cause of both. In Hume’s view, inductive inference embraced the uncertainty of inferring causes from effects by referring to probable rather than to definite causes. Being a Presbyterian minister and trained in mathematics, Bayes (1702–1761) wanted to counter Hume’s view by finding a mathematical way based on probability theory to reliably infer the cause from an observed effect. His solution came to be known as Bayes’ theorem or rule. It allows us to compute so-called “inverse probabilities,” i.e., the chances of inferring a cause from its observed effects.
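
In modern notation, the rule for such an inverse probability is usually written as shown below. The symbols C (a candidate cause) and E (an observed effect) are assumptions introduced here for illustration and are not the notation of Bayes’ paper.

$$
P(C \mid E) = \frac{P(E \mid C)\, P(C)}{P(E)}, \qquad P(E) > 0
$$

Here P(E | C) is the probability of the effect when the cause is present, P(C) is the probability of the cause before the effect is observed, and P(E) is the overall probability of the effect. For example, with the assumed values P(C) = 0.1, P(E | C) = 0.8, and P(E) = 0.2, the inverse probability is P(C | E) = 0.8 × 0.1 / 0.2 = 0.4.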

Bernoulli’s monumental contribution was continued by Pierre-Simon Laplace (1749–1827) with two major works published in 1774 and 1814. In them, Laplace not only further developed the two views of probability contained in Bernoulli’s book, but also independently reached the same result as Bayes, using a clearer and more thorough analysis. It is Laplace’s results that form the current basis of what is called Bayes’ rule in modern statistics (Chapter 3).

At this point, it is useful to reconsider the two main interpretations of probability considered by Bernoulli and Laplace in more detail to help synthesize the major ideas. You should also be aware that there are other classifications and accounts of the historical developments and interpretations of probability (e.g., Tabak 2004, Howie 2004, Zabell 2005, Stigler 1986, Weisberg 2014), but for the sake of brevity it suffices to consider here the two main, broad interpretations of probability.

The first interpretation of probability is sometimes called aleatory probability (Spiegelhalter 2019). It describes either a chance or experimental setup involving the process of obtaining uncertain observations, or the intrinsic uncertainty in nature. Let’s also recall that the unpredictable chance events at subatomic levels can only be characterized using probability. Aleatory probability describes the propensity of the occurrence of events referring to an objective reality that is independent of an observer’s knowledge, and of the amount of information they possess to describe it. This interpretation of probability includes the frequentist interpretation present in the books of Bernoulli and Laplace. In it, the probability of an event is the long-run proportion of times that it occurs within a set of infinitely many identical potential repetitions of an experiment or observation. Calculating that proportion thus requires defining a reference set of hypothetical, replicated experiments whose cumulative results would reflect the true tendency or propensity to observe any of all the possible outcomes. Aleatory probability requires an actual or hypothesized chance mechanism capable of generating a set of uncertain results whose frequencies we could count in a large or infinite set of identical trials or observations. The objective or frequentist interpretation of probability was later formalized in John Venn’s (1866) The Logic of Chance. In Venn’s view, probability is objective, literal, and not a conceptual or personal belief. As a consequence, Venn dogmatically dismissed the use of probability to refer to single events or anything unrelated to frequency (Howie 2004). The frequentist view of probability gained traction in the natural and social sciences in the late nineteenth century as scientists inspired by Francis Galton and Adolphe Quetelet were amassing ever larger amounts of empirical data (see Clayton 2021). Fisher (1925, p. 25) credited Venn with “developing the concept of probability as an objective fact, verifiable by observations of frequency,” which came to be the dominant view for most scientists.
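
The idea of a long-run proportion can be sketched with a small simulation. The R code below is only an illustration under stated assumptions: the chance mechanism is a simulated coin, its probability of heads (0.5) and the number of tosses (10,000) are chosen arbitrarily, and a finite run can only approximate the infinite set of repetitions that the frequentist definition refers to.

```r
# Illustrative sketch: running proportion of heads in repeated coin tosses.
# The probability of heads and the number of tosses are assumed values.
set.seed(42)                             # make the simulation reproducible

p_heads  <- 0.5                          # assumed propensity of the coin
n_tosses <- 10000                        # finite stand-in for "infinitely many"

tosses       <- rbinom(n_tosses, size = 1, prob = p_heads)  # 1 = heads, 0 = tails
running_prop <- cumsum(tosses) / seq_len(n_tosses)          # proportion so far

# Proportion of heads after 10, 100, 1000, and 10000 tosses
checkpoints <- c(10, 100, 1000, 10000)
print(round(running_prop[checkpoints], 3))
```

The early proportions typically wander away from 0.5, while the later ones settle near it, which is the behavior that the law of large numbers describes.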

The other main interpretation of probability can be called epistemic probability (Spiegelhalter 2019). Here, probability refers to a measure of a personal degree of belief in some proposition. Therefore, epistemic probability is a mental construct that does not directly apply to actual events, but to our imperfect knowledge of reality that may be progressively modified by information. By “knowledge of reality” we refer to any observable event such as flipping a coin, rolling a die, or observing the results of experiments. First considered by both Bernoulli and Laplace as a universal model of rationality, this view of probability was later regarded by Augustus De Morgan and George Boole as a logical relationship between evidence and belief that measures our ignorance about the true state of affairs in the world (Howie 2004). This view of probability as a measure of a degree of belief was later championed by John Maynard Keynes (1921), Frank Ramsey (1931), and Bruno de Finetti (1933). Jeffreys (1939) wrote: “The essence of the present theory is that no probability, direct, prior, or posterior, is simply a frequency.” In contrast to aleatory probability, epistemic probability quantifies the amount of information we possess regarding the occurrence of an event, or the degree of truth of a statement, or the degree of (un)certainty about an event. Additional information would generally decrease our ignorance and hence reduce our epistemic uncertainty.

Consider the perennial example of tossing a fair coin. The adjective “fair” to describe a coin could come from the long-term sequence of tosses yielding similar numbers of heads and tails. But it could just as well stem from considering the physical constitution of the coin that allows an equal chance of obtaining a heads or a tails. Or from the absence of additional knowledge that warrants you believing otherwise. Or from our a priori personal considerations that it is an even or fair bet. The probability of heads will be the same regardless of one’s interpretation of probability. Following William Feller (1967, p. 3), “we shall no more attempt to explain the ‘true meaning’ of probability than the modern physicist dwells on the ‘real meaning’ of mass and energy or the geometer discusses the nature of a point.”

2.3 Linking probability with statistics

Our short historical account of probability ended sometime in the 1920s. This is when Ronald A. Fisher formulated a novel framework for a frequency-based general theory of parametric statistical inference. Fisher’s framework included, among other things, maximum likelihood, tests of significance, randomization methods, sampling theory, analysis of variance, and experimental design. In 1922 Fisher published one of the most influential papers in the history of statistics that fundamentally changed its theory and methods forever. In this paper he single-handedly coined fundamental concepts such as “parameter,” “statistic,” “variance,” “sufficiency,” “consistency,” “information,” “estimation,” “maximum likelihood estimate,” “efficiency,” and “optimality.” He was also the first to use Greek letters for unknown parameters and Latin letters for their estimates. Much like in classical physics, the founding father of modern statistics was also an ill-tempered genius from Cambridge. In the words of Hald (1999, p. 1): “There are three revolutions in parametric statistical inference due to Laplace (1774), Gauss and Laplace in 1809–1812, and Fisher (1922).” The first revolution formally introduced the method of inverse probability, the second developed linear statistical methods based on the normal distribution, and the third introduced maximum likelihood as the workhorse method for statistical inference (Hald 1999). Indeed, it might be said that statistics as the child of probability theory was born with Bayes’ posthumous 1763 paper, and was brought to maturity by Laplace who used inverse probability, via the now-standard Bayes’ theorem (Pawitan 2001). The second revolution involved the development of a theory of errors by Gauss (1809). It was inspired by the need to adjust and summarize observational data from astronomy or surveying. Gauss proposed use of the normal distribution and the principle of least squares as a general method of estimation (Fig. 2.1).

Fig. 2.1 Dates and theoretical landmarks. Timeline by century with columns for theory of probability, frequentist statistics, and Bayesian statistics. Landmarks shown: 1654 B. Pascal and P. Fermat exchange; 1657 C. Huygens’ book; 1713 J. Bernoulli’s book; T. Bayes’ paper; 1812 P. Laplace’s book; 1866 J. Venn’s book; books by J. Keynes (1921), F. Ramsey (1925), B. de Finetti (1933), and A. Kolmogorov (1933); 1922, 1925, 1935 R. Fisher’s books and papers; 1932, 1934 J. Neyman and E. Pearson’s papers; 1939 H. Jeffreys’ book; 1953 Metropolis et al. paper; rediscovery of MCMC.

Prior to the third, Fisherian revolution, statistics mostly consisted of a collection of semi-independent, discipline-specific methods (Efron 1998) developed to analyze data in biology, agronomy, psychology, astronomy, etc. These methods included the least-squares method of estimation, linear regression and correlation, chi-square tables, and the t-test (see Stigler 1986, 1999 and Hald 1999 for historical overviews). These methods were applied to the large data sets that were amassed throughout the nineteenth century in the
