Statistical Modeling With R
A dual frequentist and Bayesian approach for life scientists
PABLO INCHAUSTI
Centro Universitario Regional del Este, Universidad de la República, Uruguay
Great Clarendon Street, Oxford, OX2 6DP, United Kingdom
Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries
© Pablo Inchausti 2023
The moral rights of the author have been asserted
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above
You must not circulate this work in any other form and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press 198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2022937827
ISBN 978–0–19–285901–3 (hbk)
ISBN 978–0–19–285902–0 (pbk)
DOI: 10.1093/oso/9780192859013.001.0001
Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY
Cover image: John Lund/Getty Images.
Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.
For Joana, because two is so much more than one
Preface
A preface is the closest an author may have to the “letter of marque” used by European governments in the seventeenth century to authorize piracy with the sovereign’s tacit consent. An author can write just about anything in the preface with the resigned permission of the editor.
Having finished the book, I hereby give myself permission to write in the first person singular. I have spent many thousands of hours endlessly writing and rewriting this book over the last 24 months. It has been a challenge, a pleasure, a thrill, and a burden at the same time.
At this point, I have the following unsettling mixture of feelings:
* Relief and joy: Finally it is over.
* Pride and accomplishment: I have done it! I have done it! I have done it!
* Uncertainty: Was it worth the effort? Will it be well/badly received?
* Insecurity: Did I double-check everything? Does it have any embarrassing errors?
It is now time for the book to sink or swim on its own at the hands of its readers.
A long time ago, my undergraduate adviser sent me to fetch some rainfall data from the Venezuelan Ministry of the Environment. I reluctantly went, got lost inside the ugly building, failed to gather the data, but stumbled by chance into a nearly empty and shabby Ministry bookstore. There I found a gem: the original 1975 Spanish edition of the book Areografía: estrategias geográficas de las especies by Eduardo Rapoport. He was a clever and original Argentinian ecologist who anticipated what later became known as macroecology. The Pergamon Press translation (Areography: The Geographic Strategies of Species) missed the preface of the Spanish original, easily the best preface of a scientific book I have ever read. I read Rapoport’s preface at the bookstore, and right away I bought the book using all the money I had, including the return bus fare. I walked the 4.5 km home, reading the book between traffic lights. Recognizing the absence of rules for what the preface of a scientific book should contain, Rapoport aimed to bring a humanized depiction of his CV, so that “science books would leave home and be taken to the dentist waiting room,” and you knew what to say if you ever meet the author. Following the master Eduardo Rapoport, here is my attempt.
Things that I love
* The dazzling imaginations of Gabriel García Márquez, Julio Cortázar, and Alejo Carpentier, the dignity of Primo Levi, the wisdom of Umberto Eco, the humanity of Italo Calvino, The Magus by John Fowles.
* The music and shining smile of Louis Armstrong, the saxophone of John Coltrane (his mission: “a masterpiece by midnight”), and Art Tatum playing the piano; the Beatles, Eric Clapton, Sting, and Mark Knopfler; the blues of Taj Mahal, Keb’ Mo’, and Buddy Guy; the stirring voices of Annie Lennox, Ute Lemper, and Madeleine Peyroux; the lyrics of Leonard Cohen and Bob Dylan.
* Two scientific heroes: John (JBS) Haldane and Richard Feynman.
* All the Monty Python films and the original BBC series.
* All Picasso, except the pink period; Gustav Klimt and Vincent van Gogh.
* Let’s share a full Uruguayan barbecue, including mollejas (thymus), kidneys, and sweet blood sausages. It would be glorious to enjoy some French goat cheese, a fresh green salad with endives and cherry tomatoes, red wine of course, and a mango or passion fruit mousse to bring my soul closer to earthly paradise.
* And, above all, let’s talk, exchanging stories, books, and anecdotes. The folders of my memory hold countless megabytes of historical, literary, and scientific information, some of which may even interest or entertain you for a while. And I can swiftly change my opinion on any issue under the sun as many times as you can manage to convince me with good arguments, sensible evidence, and a modicum of straight reasoning.
Things I hate
* Social injustice in any shape or form.
* The stupidity of the military and all its cheerleaders.
* Social, racial, and sexual discrimination under any disguise or shade.
* All totalitarian ideologies and forms of thought.
* The stiffness, conservatism, intolerance, and backwardness of the traditional Catholic Church and of many recently created protestant churches.
* Pineapple on pizza: a horrendous mix that spoils two great things.
My story
My own biography is pretty ordinary. I will recall just a few events that might perhaps inspire others to believe in themselves. I was born in Uruguay, a small country that lies sandwiched between Argentina and Brazil at the bottom left of the world map. After starting public primary school there, I followed my father to Venezuela. I started university wishing to become an electrical engineer, but finally managed to graduate in biology at the tardy age of 26.
I desperately wanted to study more and become a scientist. With my then partner, we managed to get admitted to the State University of New York at Stony Brook (now Stony Brook University) by sheer luck. In September 1992, we gathered all our money and belongings, got some family loans, and traveled to New York. We landed there perfectly unaware of everything, including how to get to the university from the airport. We had expected to pay the first year of university fees, betting that our good background would allow us to obtain good grades that might lead to some financial support. But it turned out that we did not have to pay any university fees at all!
And even better, one day before the start of my first semester, another graduate student chose to take care of her ill grandmother and declined her teaching assistant position. I was offered it, and of course took it: $750 a month minus taxes amounted to touching heaven. The next day I went to teach some very basic biology to 25 puzzled American students. During my fourth day in an English-speaking country, I barely understood 40% of what the students said. But I had an inspired idea that saved me. I shamelessly told them of a hearing disability that required them to speak slowly and very loudly for me to understand them. And they did it to such an extent that my strange disability miraculously disappeared after a few weeks. We quickly bought a TV to help train my wooden ears. At first, the only program that I could understand was the (British) Prime Minister’s Questions that was broadcast on CSPAN very late at night just for the (dis)pleasure of insomniacs. This very theatrical, ceremonial, and mostly pointless weekly exercise of British politics was my door to understanding spoken English, and the starting of an anglophilia that only Brexit recently and definitely managed to cure.
At Stony Brook, I met Lev Ginzburg by chance while eating sandwiches at the Department of Ecology and Evolution. This very intelligent and witty Russian mathematician became my PhD supervisor. At first, it was nearly impossible to understand what this man was talking so quickly about. I used to share a 6 m² office in front of his. Лев often called me into his office using mundane excuses to spend many hours talking and teaching me on a one-to-one basis as if I was a medieval apprentice. These interactions over the years shaped me into a scientist, and affected my brain more than anything since gastrulation. The wide range of topics of these conversations included mathematics, ecology, classical physics, dynamical systems, risk analysis, philosophy of science, the latest books we were reading, and who knows what else. I still vividly recall two entire Friday afternoons that Лев devoted to teaching me the puzzling basics of quantum mechanics (including the Schrödinger wave equation) using a small green blackboard and white chalk. It was an indescribable pleasure to have received such a gift of human knowledge from you, мой дорогой друг и наставник.
I graduated in 1998, and my Italian passport (life lesson for the young: you can never have too many passports; acquire as many as possible since some may open unexpected doors) got me an EU fellowship for a postdoc with John Lawton at Imperial College, UK. I later moved to France where I lived and worked for nine years. Other moves following a non-traditional and hardly straight path took me back to Uruguay, where I live now.
I will not bother the reader with further details of my academic past. There is hardly any merit involved in it. Like you, I have 23 pairs of chromosomes in every cell, blood the same color as yours, and a genome that differs from yours and from Mandela’s, Einstein’s, Himmler’s, and Stalin’s by about six million DNA bases (~0.06%, an irrelevant difference since only about 2% of our DNA is translated into proteins). Therefore, rest assured that there is nothing special, unique, or even good about me. You can easily do better than me if you wish.
Just trust me on this one. Most people who succeed in life are those that seriously apply their heart and mind and energy long enough to pursue their dreams with stubborn determination. I am convinced that life (or the universe, or the gods) rewards persistence and single-mindedness over apparent leaps of inspired genius. However, for that you first need to hold dreams and ambitions for yourself. Nobody can teach you to dream and aspire to a higher future than your present. Dreaming turns out to be a spontaneous and personal affair. I gleaned the next quote (out of context, and oddly enough due to Lenin) from a Julio Cortázar book that summarizes well what I wish to convey:
The rift between dreams and reality causes no harm if only the person dreaming believes seriously in his dream, if he attentively observes life, compares his observations with his castles in the air, and if, generally speaking, he works conscientiously for the achievement of his fantasies. If there is some connection between dreams and life then all is well.
I have been helped beyond the call of duty by the staff of Oxford University Press. Ian Sherman, senior editor of Life Sciences, incredibly remembered me after a 19-year hiatus and, even more surprisingly, believed in and liked the idea of this book. He variously guided, prompted, kept quiet, and encouraged me, and I cannot thank him enough for all this and more. I must also thank Katie Lakina for putting the production of this book back on track, Karen Moore for her diligent and dedicated work during the transformation of many files into a finished book, and Richard Hutchinson for his attentive and careful copyediting that greatly improved the quality of the text that you are reading.
The free and open software R and the many packages used in this book stem from the fantastic and creative work of many generous scientists and programmers around the world. Their incredible work has created the collective property of statistical knowledge that made this book possible. While I lack the means to thank you all, let me at least raise a glass to toast you with endless gratitude. If there is any informatics god, its blessings should also extend to the creators and maintainers of Linux Ubuntu and LibreOffice.
Sebastián Aguiar, Marc Kéry, Enrique Lessa, Daniel Naya, and Matías Schrauf kindly read, commented on, and corrected different chapters of this book. Their input and feedback prompted changes that led to improvements and hopefully fewer embarrassing mistakes. The stubborn errors, plain inconsistencies, and straight omissions that might remain are, of course, mine only. Melina Aranda, Javier García, Daniel Naya, Alicia Ponce, and Agustín Sáez kindly provided data from their published papers that are used as either case studies or problems at the end of some chapters. I thank Alexandra Elbakyan for allowing me to access an enormous amount of essential information that I could not otherwise have ever dreamed to read and use in this book:
I refuse to indulge in the tacky final sentences that end the prefaces of many scientific books: “Last but not least, I want to thank ... for their patience and ... for the many hours I spent ...” Oh no, please not that again! But I will say this: over the last 12 years, I have been blessed beyond deserving by the earthly gods to share my life with Joana Gagliardi. She is my magnificent partner, my passionate lover, my close friend, and a truly great and beautiful woman with a shiny soul enveloped by a large smile and almond-shaped eyes. I have also had the privilege to share these years with Fiamma (24) and Iahel (20), Joana’s bright daughter and son, whom I have seen grow into two beautiful adults who are the better angels of my soul.
This is enough now. You did not buy the book to read this babble. You want some stats, and that is what you will find starting on the next page. Should you have any comments, complaints, remarks, or suggestions, or have spotted any small or large errors, I want to hear from you, so please write to pablo.inchausti.f@gmail.com
With warm regards,
Pablo
5 The General Linear Model II: Categorical explanatory
5.8 A posteriori tests in frequentist models
6.5 Analysis of covariance: Mixing continuous and categorical explanatory variables
6.6 Analysis of covariance: Frequentist fitting
6.7 Analysis of covariance: Bayesian fitting
7 Model Selection: One, two, and more models fitted to the
7.1 Introduction
7.2 The problem of model selection: Parsimony in statistics
7.3 Model selection criteria in the frequentist framework: AIC
7.4 Model selection criteria in the Bayesian framework: DIC and WAIC
7.5 The posterior predictive distribution and posterior predictive checks
7.6 Now back to the WAIC and LOO-CV
7.7 Prior predictive distributions: A relatively “new” kid on the block
8 The Generalized Linear Model
8.1 Introduction
8.2 What are GLMs made of?
8.3 Fitting GLMs
8.4 Goodness of fit in GLMs
9 When the Response Variable is Binary
9.1 Introduction
9.2 Key concepts for binary GLMs: Odds, log odds, and additional link functions
9.3 Fitting binary GLMs
9.4 Ungrouped binary GLM: Frequentist fitting
9.5 Further issues about validating binary GLMs
9.6 Ungrouped binary GLMs: Bayesian fitting
9.7 Grouped binary GLMs
9.8 Problems
10 When the Response Variable is a Count, Often with Many Zeros
10.1 Introduction
10.2 Over-dispersion: A common problem with many causes and some solutions
10.3 Plant species richness and geographical variables
10.4 Modeling of counts with an excess of zeros: Zero-inflated and hurdle models
10.4.1 Frequentist fitting of a zero-inflated model
13.4 Problems and inconsistencies with the definition of random effects
13.5 Population-level and group-level effects in Bayesian hierarchical models
13.6 Fitting mixed models in the frequentist framework
13.7 Statistical significance and model selection in frequentist mixed models
13.8 The shrinkage or borrowing strength effect in mixed models
13.9 Fitting mixed models in the Bayesian framework
14.4.2 Randomized block design
14.4.3 Split-plot design
14.4.4 Nested design
14.4.5 Repeated measures design
15 Mixed Hierarchical Models and Experimental Design Data
15.2.1 Binary GLMM with a randomized block design: Frequentist models
15.2.2 Binary GLMM with a randomized block design: Bayesian models
15.3 Gaussian GLMM with a repeated measures design
15.3.1 Gaussian GLMM with a repeated measures design: Frequentist models
15.3.2 Gaussian GLMM with a repeated measures design: Bayesian models
15.4 Beta GLMM with a split-plot design
15.4.1 Beta GLMM with a split-plot design: Frequentist model
15.4.2 Beta GLMM with a split-plot design: Bayesian model
15.5 Problems
Afterword
Appendix A: List of R Packages Used in This Book
Appendix B: Exploring and Describing the Evidence in Graphics (only available online at www.oup.com/companion/InchaustiSMWR)
Appendix C: Using R and RStudio: The Bare-Bones Basics (only available online at www.oup.com/companion/InchaustiSMWR)
Index
PART I
The Conceptual Basis for Fitting Statistical Models
CHAPTER 1
General Introduction
1.1 The purpose of statistics
The first article of the first issue of Annual Review of Statistics was entitled “What is statistics?” (Fienberg 2014). It started by listing eight different and only partly overlapping definitions. It is hard to imagine that chemists or physicists would provide as much variety when defining their own trades. The American Statistical Association offers a very inclusive definition: “Statistics is the science of learning from data, and of measuring, controlling and communicating uncertainty” (https://www.amstat.org/asa-newsroom). While not every statistician would agree with this, it serves to highlight that statistics is a kind of meta-discipline aiming to extract real-world insights from data gathered within other realms of knowledge (Wild et al. 2011). Statistics is a meta-discipline because, in dealing with the fuzziness, imprecision, and vagaries of real-world data, it pushes its practitioners to formulate “theoretical scaffolds” that can be used on other areas of knowledge.
Obtaining insights from statistics involves specifying hypotheses, gathering data relevant to a problem, modeling data with quantitative methods, and interpreting quantitative findings within the specific context of the scientific hypotheses that motivated the research. These activities do not, and cannot, take place as an intellectual abstraction aiming to solve problems within the clearly defined boundaries of applied mathematics where statistics is sometimes placed. Mathematicians often need to (over-)simplify the context of the initial problem to better define a narrower, more interesting, and hopefully solvable research question. In contrast, in statistics the context is the key to interpreting the findings of computer printouts of tables and graphs and to transforming data into insights in terms of the research problem and hypotheses that motivated the gathering of evidence. The practice of statistics is (or rather should be) something far more subtle and interesting than a quasi-mechanical quest to contrast and reject hypotheses whenever p < 0.05, as you might have learned in undergraduate courses.
“Statisticians are engaged in an exhausting but exhilarating struggle with the biggest challenge that philosophy makes to science: how do we translate information into knowledge?” (Senn 2003, p. 3). Taken at face value, how can this last statement fail to excite you? Statisticians deal with the excruciating messiness of real-world data. By that we mean the uncertainty in the measurements of variables, the pervasive variability of the world, and the often foggy relations between the variables that we aim to uncover in order to claim empirical support for a scientific hypothesis. Statistics has to tackle the chance and contingency that lie entangled within real-world data, and whose influence can be as pervasive as that of the signal related to the main patterns that we wish to reliably retrieve. The statistical holy grail is to uncover an approximate statistical model that could have plausibly generated (and hence fits acceptably well) the available evidence. But this is not all. The magnitudes of the estimated parameters of such a well-fitting model should allow the evaluation of a statistical hypothesis and have a tangible, real-world interpretation in the research context that prompted the design of the experiment, the gathering of data, and its analysis.
1.2 Statistics in a schizophrenic state?
Over the last century, statistics has fully developed two theoretical frameworks (frequentist and Bayesian, to be explained in Chapters 2 and 3) that have contended to become “the right and appropriate” way of analyzing data. You will not find practitioners in other scientific disciplines spilling so many barrels of ink fighting each other without ever achieving complete victory. These two frameworks largely stem from two different views of probability that have coexisted since the seventeenth century, and their proponents and defenders have engaged in acrimonious and protracted disputes during most of the twentieth century. The currently dominant frequentist framework is an incoherent blend that arose from the protracted clash between R. Fisher on one side and J. Neyman and E. Pearson on the other. It is likely that Fisher and Neyman/Pearson only agreed on their strong dislike and distrust of the use of prior information (again, to be explained in Chapter 2) as a subjective and arbitrary component of the Bayesian framework that they wanted uprooted from statistics. Aiming for objectivity and conclusions that are independent of whoever analyzes the data, most of the practice of statistics championed under the frequentist framework has turned into a quasi-mechanized procedure aiming to reject statistical hypotheses.
It is currently fair to say that a clear majority of scientists have been educated in courses based on (and hence only use) frequentist methods. However, being in a (rapidly growing) minority does not suggest, let alone prove, that the champions of the Bayesian framework are “wrong” by any stretch of the imagination. The struggle for primacy between proponents of these two statistical frameworks has been largely inconclusive thus far. At present, scientists have a more ecumenical or pragmatic view of using what seems appropriate, and what they know best, to solve the problem at hand. Scientists needing to employ the other framework almost need to relearn from scratch. This book explains, discusses, and applies both the frequentist and Bayesian statistical frameworks to analyze the different types of data that are commonly gathered by research scientists and students.
The book in your hands aims to present material in an informal, approachable, and progressive manner suitable for research scientists and graduate students with a modicum of previous training. The book covers all the material in a theoretically rigorous manner, focusing on the practical applications of all methods to actual research data. It aims to provide just enough theoretical background for you to understand the basic underpinnings of the statistical models explained here. Every important formula will be “translated” into words to provide a clear, non-intimidating description to readers with only a basic background in mathematics and inferential statistics. In contrast to books laden with more theory, this is a “how-to” book. It emphasizes teaching by learning to compute using R, and to thoroughly interpret the results from the viewpoint and needs of research scientists and students.
1.3 How is this book organized?
It is unthinkable to carry out statistical analysis of meaningful amounts of data of even moderate complexity without a computer. This book will make extensive use of the R programming environment (http://www.r-project.org/). This is an open-source (one can access and edit the code of all the R functions and save a revised version in one’s computer), interpreted (it does not require compilation to be executed) programming language environment for statistical computing and graphics. R runs on Linux, Windows, and macOS, among others, and is the brainchild of its creators Ross Ihaka and Robert Gentleman. It is now supported by the R Foundation for Statistical Computing (Thieme 2018). R has experienced phenomenal growth since August 1993 to become one of the most popular and fastest growing programs for statistical analysis and graphics worldwide. Being a programming language, R can be easily extended by writing functions and extensions. There is a growing and very active R community creating packages (more than 17,500 packages in April 2021) and providing answers in terms of code and explanations in many active and fast-reacting mailing lists. R code is mostly written in the R language itself, although advanced users can link it to other computer languages such as C, C++, FORTRAN, Java, and Python using specific commands to assist in the execution of computer-intensive tasks.
Most statistics books using R aim for standalone use by providing brief (and by necessity incomplete) introductory chapters about the installation and basic use of R, including the basic commands to generate graphics. This introductory material about R can take up several chapters, often 10 to 20 percent of the overall length of many statistical textbooks. There are many books and companion websites that cover both the basic steps for using R and producing graphs: see Beckerman et al. (2017), Lander (2017), Petchey et al. (2021), and Teetor (2017) for the basics of R; Horton and Kleinman (2011) and Kabacoff (2011) for simple graphics, and Abedin and Mittal (2015), Chang (2012), and Teutonico (2015) for ggplot2 graphics. We felt it unwise to provide the same material in print yet again. The companion website (www.oup.com/companion/InchaustiSMWR) contains detailed information about the installation of R in Windows, macOS, and Linux, along with the basic syntax for using and manipulating R objects. The website also provides detailed explanations for making basic plots in R using the package ggplot2 (Wickham 2016), which is rapidly becoming the dominant approach to producing graphics in R. From here on, all R code in the book will be shown in this font and highlighted in gray. While the code necessary for each statistical analysis will be thoroughly explained in each chapter, the code used to make all the figures can be found on the companion website to avoid distracting you from understanding the main ideas. You will also find all the datasets and scripts (i.e., text files with commands) for each chapter in the companion website.
R has a rather minimalist interface in which the user types commands and obtains statistical and graphical results. RStudio (https://rstudio.org) has become a very popular graphical interface that manages the interaction between the user and R with great flexibility. The installation and basic use of this free graphical interface is also explained on the companion website. Nonetheless, all statistical and graphical analyses described in this book are independent of whether one uses a graphical interface such as RStudio.
This book is organized in three parts. Part I provides the fundamental definitions of probability that underlie the frequentist and Bayesian frameworks, and develops the notion of parameter estimation as the main goal of statistical inference (Chapter 2). Chapter 3 then covers the basic underpinnings of the frequentist and Bayesian methods of parameter estimation (i.e., maximum likelihood, and the Markov chain and Hamiltonian Monte Carlo algorithms) that will be used in the data analyses of all the chapters of Parts II and III.
Part II represents the bulk of this book. It covers the analysis of the main types of data gathered in social and natural sciences from both frequentist and Bayesian perspectives. Each dataset will be analyzed with both frameworks. Readers may choose to focus on separate, largely self-contained chapters depending on the type of response variable. However, the single effects of numerical and categorical explanatory variables (Chapters 4 to 6) should be examined as basic foundational aspects. Chapter 7 covers the theoretical basis of model selection (and a few other things), again for both frequentist and Bayesian frameworks. Chapter 8 reviews the conceptual basis of the generalized linear models that allow viewing most of the analyses explained in separate chapters of Part II as special cases. The assessment of statistical significance of parameter estimates, the calculation of confidence intervals, and the assessment of model goodness of fit are also covered. The rest of Part II covers, in separate chapters, the analysis of different types of data commonly encountered in scientific research involving binary, count, proportion, and other real-valued outcome variables. The quality of fit of all the statistical models to the data will be assessed with residual analysis and related methods, all of which will be explained in detail.
Part III builds on the understanding gained in Part II to incorporate random or population-level effects (Chapter 13). This enables the incorporation of structure in the data imposed by experimental and survey designs (Chapter 14). It is at this point that the book reaches its highest level of complexity, generality, and usefulness. As in all chapters of Part II, the emphasis is placed on formulating the starting statistical model, fitting the model using either the frequentist or Bayesian framework, interpreting and understanding the model outputs, assessing the goodness of fit to the data, and translating into words and figures the statistical findings for interpretation.
The book was structured and written assuming an imaginary reader interested in acquiring a broad and comprehensive understanding of univariate statistical analysis after a basic undergraduate course as taught in most engineering and science faculties around the world. These single-semester courses provide a basic understanding of descriptive statistics (mean, variance, quartiles), the basic notions of probability theory, a working knowledge of some probability distributions (e.g., normal, binomial), how to calculate the confidence intervals of at least the population mean, the basis (i.e., types of statistical errors, the notion of statistical significance) for testing statistical hypotheses about the differences between two means, and hopefully simple linear regression. The book starts slowly to progressively build a basic understanding of the main concepts and ideas that will be used in subsequent chapters.
1.4 How to use this book
In 1963 the Argentinian writer Julio Cortázar published the remarkable book Hopscotch (or Rayuela for those who can read it in the Spanish original). This novel has 155 mostly short chapters, 99 of which were considered “expendable” by its author. Even more, Julio Cortázar proposed several alternative ways in which his book could be read as if the chapters were pieces of many different possible puzzles to be assembled at will by its readers. Following Cortázar’s lead, here are a few suggested paths for using this book:
• If you lack a reasonable knowledge of R and how to make graphics, you should definitely start by reading the introductory material about R and R graphics on the companion website.
• Should you not be interested in the historical roots and the conceptual basis of the frequentist and Bayesian frameworks over which statisticians have spilled so much ink, you may skip Chapters 2 and 3. However, please have a look at the final table of Chapter 3 highlighting the main differences between the Bayesian and frequentist approaches that are worth knowing even if just for basic statistical literacy.
• If you are just interested in a specific data analysis (say, logistic regression, factorial analysis of variance, count regression), Table 2.1 points to the chapters you need depending on the probability distribution appropriate for modeling each type of response variable. Beware that you may need to have a look at parts of Chapter 8 to understand certain key features of the generalized linear models such as the link function. The main aspects of incorporating numerical and/or categorical explanatory variables in models are covered in Chapters 4 to 6, and they are valid for all models covered in this book.
• If you wish to learn either frequentist or Bayesian statistics, you may only read selected parts of specific chapters and simply dismiss the other half. But again, at this point in the twenty-first century it is becoming essential for scientists to possess at least a broad understanding of the theoretical/conceptual basis of both frequentist and Bayesian frameworks as discussed in Chapter 3. You will need the basics just to avoid getting lost and being fooled while reading papers.
• Readers only interested in Bayesian statistics may find it frustrating that there is no single chapter devoted to priors, the perennially debated feature of this framework. Starting in Chapter 4, the setting of priors is progressively built up in complexity in different chapters. There is a summary of the many non-exclusive steps or approaches to defining priors in the different chapters on page 323.
• Should you be interested in model selection in either the frequentist or Bayesian framework, you need to read parts of Chapter 7 to acquire at least a flavor of how it is done in either framework. Please read this chapter before doing any model selection with your specific data type, as unwritten and oral traditions have plagued too much of statistical model selection carried out by life scientists. Although the book has limited emphasis on model selection issues, there are specific examples in Chapters 11 and 12.
• Readers with data stemming from specific experimental designs should first read the chapter dealing with the type of data in Part II, then have at least a quick read on the theoretical basis of the mixed models (Chapter 13), and then carry out the data analysis perhaps inspired by one of the several examples given in the chapters of Part III.
• Finally, for readers wishing to acquire a broad and reasonably exhaustive overview of univariate statistics, the author suggests starting with Chapters 4 to 6, jumping to Chapter 8 to cover the basic theory of generalized linear models, and then going straight to the chapter(s) dealing with the types of data according to Table 2.1.
Whichever of the suggested (or other) paths you take through this book, it is very likely that you will have to flip back and forth to improve or check your understanding of a concept, an idea, or the interpretation of model results, or simply the code for an analysis or a figure. In this regard, while each chapter is self-contained, the book is heavily cross-referenced to allow you to find your way back and forth between chapters as needed.
CHAPTER 2
Statistical Modeling
A short historical background

2.1 What is a statistical model?

Using data to test statistical hypotheses, to fit empirical relations, or to explore suggestive patterns requires formulating statistical models. All statistical tests of hypotheses and statistical estimators of parameters are derived from statistical models. In very general terms, a statistical model can be defined as one or more mathematical equations having at least one variable exhibiting stochastic (i.e., probabilistic) variation to represent the inherent uncertainty of observing its potential values.
The statistical models considered in this book contain a single response variable Y reflecting the effect of, or the variation associated with, the explanatory variables X. The latter can be any number of numerical variables, categorical variables denoting groups, or combinations thereof (i.e., interactions between explanatory variables). In all the models considered in this book, the response variable is a random variable with an associated probability distribution whose parameters embody both the effect of the explanatory variables and the variability of its potential values. Statistical models are thus equations that can be seen as data-generating mechanisms. They contain explicit assumptions that may reproduce the data for some combination of their parameters and values of the explanatory variables.
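To make the idea of a model as a data-generating mechanism concrete, here is a minimal sketch in Python using only the standard library. It assumes, purely for illustration, a normally distributed response whose mean depends linearly on one numerical explanatory variable. The function name `simulate_response` and all parameter values are hypothetical and not taken from the book.

```python
import random
import statistics


def simulate_response(x_values, intercept, slope, sigma, seed=1):
    """Generate one response value per x from an assumed normal model.

    The mean of Y carries the effect of the explanatory variable,
    and sigma controls the variability of Y around that mean.
    """
    rng = random.Random(seed)
    return [rng.gauss(intercept + slope * x, sigma) for x in x_values]


# Hypothetical explanatory variable: 200 values from 0.0 to 19.9
x = [i / 10 for i in range(200)]

# Illustrative parameter values (assumptions, not estimates)
y = simulate_response(x, intercept=2.0, slope=0.5, sigma=1.0)

# Same parameters and same x, different seed: a different data set
y_again = simulate_response(x, intercept=2.0, slope=0.5, sigma=1.0, seed=2)

print(round(statistics.mean(y), 2), round(statistics.mean(y_again), 2))
```

The two simulated data sets share the same parameters and the same values of the explanatory variable, yet they differ in every individual observation. That is the sense in which the response variable is random while the model itself is fixed.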
You might recall from previous introductory courses the existence of probability mass functions (PMFs) and probability density functions (PDFs) that are associated with discrete and continuous random variables, respectively. PMFs and PDFs are collectively also termed “probability distributions,” and sometimes both are also subsumed under the term PDF. The names of some probability distributions that may spring to mind are binomial, Poisson, normal, and perhaps others. Which probability distribution could or should be used for each statistical model essentially depends on the main attributes of its response variable. Rather than showing a bestiary of the probability distributions that will be considered in this book along with their equations and their different shapes according to particular parameter values, we simply list them in relation to the type of data to which they apply (i.e., the domain of the response variable) in Table 2.1, and defer further details to the respective chapters where the analysis of each data type is explained. In addition, you can find such bestiaries of probability distributions in almost any statistics book on the shelf of the library of your institute, as well as on the internet.
Yet, why must the response variable Y of all statistical models be a random variable? There are several lines of argumentation for this (Blitzstein and Hwang 2014). One line of reasoning is that the randomness of the outcome variables arises because epistemic uncertainty (a fancy way of saying limited knowledge) and the measurement errors described before prevent us from precisely predicting them before actually measuring or estimating them during data collection. Whenever we repeatedly perform simple experiments and measure or estimate the values of an outcome variable that characterizes their result, we inevitably observe that variability is a pervasive feature. Every time you drive the same car 5 km at a constant speed it takes a different amount of time to reach its destination. After giving the same amount of water to identical genetic clones from the same original plant you will observe that plant height will vary among the pots. Variability, be it due to the uncertainty of the variables affecting an outcome or to the vagaries of measurement, is as important a part of reality as are the main trends observed in data. There would be no need for statistics in the absence of variability since, barring measurement error, a given set of inputs would then always render the same set of observable outputs.

Table 2.1 List of probability functions considered in this book.
We are interested in statistics because we need to understand how to explore, analyze, and interpret the data at hand in the context of our current research, or because we wish to understand some of the main principles involved in designing experiments and surveys to gather the data. True enough, the statistics involved in the exploration, analysis, and interpretation of data and in designing experiments and surveys requires a decent minimal background in probability theory. This much you already knew, which is why you acquired such basic knowledge before picking up this book. This book need not pretend to be a self-contained encyclopedia by repeating the introductory material on probability whose retelling has become an enduring ritual of statistics textbooks. Should you wish to refresh these fundamental concepts and ideas, consider consulting Wasserman (2004), Westfall and Henning (2014), and Blitzstein and Hwang (2014) among many, many others.
More interesting and useful to the goals of this book would be to recall the main interpretations of probability. This is because these interpretations of probability underlie and gave rise to the frequentist and Bayesian frameworks of statistical inference that are the subject matter of this book.
2.2 What is this thing called probability?

Probability is a principled way of quantifying uncertainty by assigning plausibility or credibility to a set of mutually exclusive possibilities or results of an experiment or observation. The concept of probability has a long, interesting, and convoluted history (see Tabak 2004, Stigler 1986, and Weisberg 2014). The origin of modern probability stems from a question that Antoine Gombaud (also known as the Chevalier de Méré) put to Blaise Pascal (1623–1662) in a Paris salon regarding the fair division of the stakes of an interrupted card game, accounting for the previous and potential gains of each player. Gombaud’s questions led to a brief exchange of letters between Pascal and Pierre de Fermat (1607–1665). Pascal and Fermat were mostly concerned with evaluating gambles and equity, not with evaluating either evidence (i.e., data) or truth in arguments. Their correspondence would probably have vanished from public view were it not for Christian Huygens’ 1657 book De Ratiociniis in Ludo Aleae (On Reasoning in Games of Dice). While still exclusively focusing on analyzing games of chance, Huygens’ book remained the reference for probability for the next 50 or so years.
Jacob Bernoulli’s posthumously published book The Art of Conjecturing (1713) marked a turning point in the history of probability for several reasons. First, Bernoulli (1654–1705) showed how to calculate probabilities as a frequency obtained by the ratio of the number of favorable events to the total number of events. By so doing, Bernoulli permanently defined probability as an index of uncertainty that is bounded by 0 and 1. He also related the calculation of (some aspects of) probability to data, and made the crucial link between probability and the long-term frequency of an event, later called the law of large numbers. Second, Bernoulli applied probability to model uncertainty in areas other than gambling, such as human mortality and criminal justice, and by so doing he created what came to be known as “subjective probability.” By making the crucial intellectual leap of viewing human existence as an existential lottery akin to a game of chance, Bernoulli was able to calculate mortality odds in order to price the lifetime yearly payments made by European states to those who lent them money at the time. Bernoulli was probably the first to practically apply probability theory outside of games of chance.
Following the chronological line, Thomas Bayes’ paper published posthumously in 1763 became the next key contribution to what in the twentieth century became known as statistical theory. Here’s the historical context. The Scottish philosopher David Hume had argued in 1748 that “causes and effects are discoverable not by reason, but by experience.” Hume stated that we can never be certain about the cause of a given effect as either or both of them may be due to an as yet unknown ultimate cause of both. In Hume’s view, inductive inference embraced the uncertainty of inferring causes from effects by referring to probable rather than to definite causes. Being a Presbyterian minister and trained in mathematics, Bayes (1702–1761) wanted to counter Hume’s view by finding a mathematical way based on probability theory to reliably infer the cause from an observed effect. His solution came to be known as Bayes’ theorem or rule. It allows us to compute so-called “inverse probabilities,” i.e., the probability of a cause given its observed effects.
Bernoulli’s monumental contribution was continued by Pierre-Simon Laplace (1749–1827) with two major works published in 1774 and 1814. In them, Laplace not only further developed the two views of probability contained in Bernoulli’s book, but also independently reached the same result as Bayes, using a clearer and more thorough analysis. It is Laplace’s results that form the current basis of what is called Bayes’ rule in modern statistics (Chapter 3).
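For orientation, the modern statement of Bayes’ rule can be written in the cause-and-effect language used above. The notation is an assumption made here for illustration and may differ from the one adopted in Chapter 3.

```latex
\[
P(\text{cause} \mid \text{effect})
  = \frac{P(\text{effect} \mid \text{cause}) \, P(\text{cause})}
         {P(\text{effect})}
\]
```

The left-hand side is the “inverse probability.” It reverses the conditioning of P(effect | cause), which is the quantity that a chance mechanism would naturally supply.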
At this point, it is useful to reconsider the two main interpretations of probability considered by Bernoulli and Laplace in more detail to help synthesize the major ideas. You should also be aware that there are other classifications and accounts of the historical developments and interpretations of probability (e.g., Tabak 2004, Howie 2004, Zabell 2005, Stigler 1986, Weisberg 2014), but for the sake of brevity it suffices to consider here the two main, broad interpretations of probability.
The first interpretation of probability is sometimes called aleatory probability (Spiegelhalter 2019). It describes either a chance or experimental setup involving the process of obtaining uncertain observations, or the intrinsic uncertainty in nature. Let’s also recall that the unpredictable chance events at subatomic levels can only be characterized using probability. Aleatory probability describes the propensity of the occurrence of events referring to an objective reality that is independent of an observer’s knowledge, and of the amount of information they possess to describe it. This interpretation of probability includes the frequentist interpretation present in the books of Bernoulli and Laplace. In it, the probability of an event is the long-run proportion of times that it occurs within a set of infinitely many identical potential repetitions of an experiment or observation. Calculating that proportion thus requires defining a reference set of hypothetical, replicated experiments whose cumulative results would reflect the true tendency or propensity to observe any of all the possible outcomes. Aleatory probability requires an actual or hypothesized chance mechanism capable of generating a set of uncertain results whose frequencies we could count in a large or infinite set of identical trials or observations. The objective or frequentist interpretation of probability was later formalized in John Venn’s (1866) The Logic of Chance. In Venn’s view, probability is objective, literal, and not a conceptual or personal belief. As a consequence, Venn dogmatically dismissed the use of probability to refer to single events or anything unrelated to frequency (Howie 2004). The frequentist view of probability gained traction in the natural and social sciences in the late nineteenth century as scientists inspired by Francis Galton and Adolphe Quetelet were amassing ever larger amounts of empirical data (see Clayton 2021). Fisher (1925, p. 25) credited Venn with “developing the concept of probability as an objective fact, verifiable by observations of frequency,” which came to be the dominant view for most scientists.
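The long-run proportion at the heart of the frequentist interpretation can be illustrated with a simulated chance mechanism. The following is a minimal sketch in Python, assuming a pseudo-random number generator as a stand-in for identical repetitions of a coin toss. The function name `running_proportion`, the seed, and the number of tosses are arbitrary choices made for illustration.

```python
import random


def running_proportion(p_heads, n_tosses, seed=42):
    """Return the proportion of heads after each of n_tosses simulated tosses."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for n in range(1, n_tosses + 1):
        # A toss comes up heads with probability p_heads
        if rng.random() < p_heads:
            heads += 1
        proportions.append(heads / n)
    return proportions


props = running_proportion(p_heads=0.5, n_tosses=100_000)

# Inspect the proportion of heads at increasingly long runs
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(props[n - 1], 4))
```

With few tosses the proportion can wander far from 0.5, while with many tosses it settles close to it. A finite simulation can only approximate the infinite reference set that the frequentist definition requires.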
The other main interpretation of probability can be called epistemic probability (Spiegelhalter 2019). Here, probability refers to a measure of a personal degree of belief in some proposition. Therefore, epistemic probability is a mental construct that does not directly apply to actual events, but to our imperfect knowledge of reality that may be progressively modified by information. By “knowledge of reality” we refer to any observable event such as flipping a coin, rolling a die, or observing the results of experiments. First considered by both Bernoulli and Laplace as a universal model of rationality, this view of probability was later regarded by Augustus De Morgan and George Boole as a logical relationship between evidence and belief that measures our ignorance about the true state of affairs in the world (Howie 2004). This view of probability as a measure of a degree of belief was later championed by John Maynard Keynes (1921), Frank Ramsey (1931), and Bruno de Finetti (1933). Jeffreys (1939) wrote: “The essence of the present theory is that no probability, direct, prior, or posterior, is simply a frequency.” In contrast to aleatory probability, epistemic probability quantifies the amount of information we possess regarding the occurrence of an event, or the degree of truth of a statement, or the degree of (un)certainty about an event. Additional information would generally decrease our ignorance and hence reduce our epistemic uncertainty.
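How information progressively modifies a degree of belief can be sketched with a small numerical example. The Python code below assumes, for illustration only, three candidate values for a coin’s probability of heads, an equal initial belief in each, and a made-up sequence of tosses. It applies Bayes’ rule after each toss. The function name `update_beliefs` is hypothetical.

```python
def update_beliefs(prior, likelihoods):
    """Apply Bayes' rule to a discrete set of hypotheses.

    Each prior belief is multiplied by the likelihood of the new
    observation, and the results are rescaled to sum to 1.
    """
    unnormalized = [p * lik for p, lik in zip(prior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical candidate values for the probability of heads
candidates = [0.25, 0.50, 0.75]

# Equal initial degree of belief in each candidate (an assumption)
belief = [1 / 3, 1 / 3, 1 / 3]

# Made-up observations: H = heads, T = tails
tosses = "HHTHHHTH"

for toss in tosses:
    # Probability of this toss under each candidate value
    likelihoods = [c if toss == "H" else 1 - c for c in candidates]
    belief = update_beliefs(belief, likelihoods)

print([round(b, 3) for b in belief])
```

After six heads in eight tosses, most of the belief has moved to the candidate value 0.75. The beliefs describe our state of knowledge about the coin, not a frequency of events.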
Consider the perennial example of tossing a fair coin. The adjective “fair” to describe a coin could come from the long-term sequence of tosses yielding similar numbers of heads and tails. But it could just as well stem from considering the physical constitution of the coin that allows an equal chance of obtaining heads or tails. Or from the absence of additional knowledge that warrants you believing otherwise. Or from our a priori personal considerations that it is an even or fair bet. The probability of heads will be the same regardless of one’s interpretation of probability. Following William Feller (1967, p. 3), “we shall no more attempt to explain the ‘true meaning’ of probability than the modern physicist dwells on the ‘real meaning’ of mass and energy or the geometer discusses the nature of a point.”
2.3 Linking probability with statistics

Our short historical account of probability ended sometime in the 1920s. This is when Ronald A. Fisher formulated a novel framework for a frequency-based general theory of parametric statistical inference. Fisher’s framework included, among other things, maximum likelihood, tests of significance, randomization methods, sampling theory, analysis of variance, and experimental design. In 1922 Fisher published one of the most influential papers in the history of statistics that fundamentally changed its theory and methods forever. In this paper he single-handedly coined fundamental concepts such as “parameter,” “statistic,” “variance,” “sufficiency,” “consistency,” “information,” “estimation,” “maximum likelihood estimate,” “efficiency,” and “optimality.” He was also the first to use Greek letters for unknown parameters and Latin letters for their estimates. Much like in classical physics, the founding father of modern statistics was also an ill-tempered genius from Cambridge. In the words of Hald (1999, p. 1): “There are three revolutions in parametric statistical inference due to Laplace (1774), Gauss and Laplace in 1809–1812, and Fisher (1922).” The first revolution formally introduced the method of inverse probability, the second developed linear statistical methods based on the normal distribution, and the third introduced maximum likelihood as the workhorse method for statistical inference (Hald 1999). Indeed, it might be said that statistics as the child of probability theory was born with Bayes’ posthumous 1763 paper, and was brought to maturity by Laplace who used inverse probability, via the now-standard Bayes’ theorem (Pawitan 2001). The second revolution involved the development of a theory of errors by Gauss (1809). It was inspired by the need to adjust and summarize observational data from astronomy or surveying. Gauss proposed use of the normal distribution and the principle of least squares as a general method of estimation (Fig. 2.1).
Fig. 2.1 Dates and theoretical landmarks. [Timeline arranged by century, with columns for the theory of probability, frequentist statistics, and Bayesian statistics.]
Prior to the third, Fisherian revolution, statistics mostly consisted of a collection of semi-independent, discipline-specific methods (Efron 1998) developed to analyze data in biology, agronomy, psychology, astronomy, etc. These methods included the least-squares method of estimation, linear regression and correlation, chi-square tables, and the t-test (see Stigler 1986, 1999 and Hald 1999 for historical overviews). These methods were applied to the large data sets that were amassed throughout the nineteenth century in the