Evidence-Based Statistics: An Introduction to the Evidential Approach - From Likelihood Principle to Statistical Practice

Peter M. B. Cahusac

This edition first published 2021
© 2021 John Wiley & Sons, Inc. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Peter M. B. Cahusac to be identified as the author of this work has been asserted in accordance with law.

Registered Office

John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office

111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty

While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data is applied for

ISBN: 9781119549802

Cover Design: Wiley

Cover Images: Inset graph courtesy of Peter M. B. Cahusac, Medicine abstract background © Zoezoe33/Shutterstock

Set in 9.5/12.5pt STIXTwoText by SPi Global, Chennai, India

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

Contents

Acknowledgements
About the Author
About the Companion Site
Introduction
References

1 The Evidence is the Evidence
1.1 Evidence-Based Statistics
1.1.1 The Literature
1.2 Statistical Inference – The Basics
1.2.1 Different Statistical Approaches
1.2.2 The Likelihood/Evidential Approach
1.2.3 Types of Approach Using Likelihoods
1.2.4 Pros and Cons of Likelihood Approach
1.3 Effect Size – True If Huge!
1.4 Calculations
1.5 Summary of the Evidential Approach
References

2 The Evidential Approach
2.1 Likelihood
2.1.1 The Principle
2.1.2 Support
2.1.3 Example – One Sample
2.1.4 Direction Matters
2.1.5 Maximum Likelihood Ratio
2.1.6 Likelihood Intervals
2.1.7 The Support Function
2.1.8 Choosing the Effect Size
2.2 Misleading and Weak Evidence
2.3 Adding More Data and Multiple Testing
2.4 Sequence of Calculations Using t
2.5 Likelihood Terminology
2.6 R Code for Chapter 2
2.6.1 Calculating the Likelihood Function for a One Sample t
2.7 Exercises
References

3 Two Samples
3.1 Basics Using the t Distribution
3.1.1 Steps in Calculations
3.2 Related Samples
3.3 Independent Samples
3.3.1 Independent Samples with Unequal Variances
3.4 Calculation Simplification
3.5 If Variance Is Known, or Large Sample Size, Use z
3.6 Methodological and Pro Forma Analyses
3.7 Adding More Data
3.8 Estimating Sample Size
3.8.1 Sample Size for One Sample and Related Samples
3.8.2 Sample Size for Independent Samples
3.9 Differences in Variances
3.10 R Code For Chapter 3
3.10.1 Calculating the Likelihood Function, the Likelihoods and Support for Independent Samples
3.10.2 Creating a Gardner–Altman Estimation Plot with Likelihood Function and Interval
3.11 Exercises
References

4 ANOVA
4.1 Multiple Means
4.1.1 The Modelling Approach
4.1.2 Model Complexity
4.2 Example – Fitness
4.2.1 Comparing Models
4.2.2 Specific Model Comparisons
4.2.2.1 A Non-Orthogonal Contrast
4.2.3 Unequal Sample Sizes
4.3 Factorial ANOVA
4.3.1 Example – Blood Clotting Times
4.3.2 Specific Analyses in Factorial ANOVA, Including Contrasts
4.4 Alerting r²
4.4.1 Alerting r² to Compare Contrasts for Effect Size
4.5 Repeated Measures Designs
4.5.1 Mixed Repeated Measures with Between Participant Designs
4.5.2 Contrasts in Mixed Designs
4.6 Exercise
References

5 Correlation and Regression
5.1 Relationships Between Two Variables
5.2 Correlation
5.2.1 Likelihood Intervals for Correlation
5.3 Regression
5.3.1 Obtaining Evidence from F values
5.3.2 Examining Non-linearity
5.4 Logistic Regression
5.5 Exercises
References

6 Categorical Data
6.1 Types of Categorical Data
6.1.1 How Is the χ² Test Used?
6.2 Binomial
6.2.1 Likelihood Intervals for Binomial
6.2.2 Comparing Different π
6.2.3 The Support Function
6.3 Poisson
6.4 Rate Ratios
6.5 One-Way Categorical Data
6.5.1 One-Way Categorical Comparing Different Expected Values
6.5.2 One-Way with More than Two Categories
6.6 2 × 2 Contingency Tables
6.6.1 Paired 2 × 2 Categorical Analysis
6.6.2 Diagnostic Tests
6.6.2.1 Sensitivity and Specificity
6.6.2.2 Positive and Negative Predictive Values
6.6.2.3 Likelihood Ratio and Post-test Probability
6.6.2.4 Comparing Sensitivities and Specificities of Two Diagnostic Procedures
6.6.3 Odds Ratio
6.6.3.1 Likelihood Function for the Odds Ratio
6.6.4 Likelihood Function for Relative Risk with Fixed Entries
6.7 Larger Contingency Tables
6.7.1 Main Effects
6.7.2 Evidence for Linear Trend
6.7.3 Higher Dimensions?
6.8 Data That Fits a Hypothesis Too Well
6.9 Transformations of the Variable
6.10 Clinical Trials – A Tragedy in 3 Acts
6.11 R Code for Chapter 6
6.11.1 One-Way Categorical Data Support Against Specified Proportions
6.11.2 Calculating the Odds Ratio Likelihood Function and Support
6.11.3 Calculating the Likelihood Function and Support for Relative Risk with Fixed Entries
6.11.4 Calculating Interaction and Main Effects for Larger Contingency Tables
6.11.5 Log-Linear Modelling for Multi-way Tables
6.12 Exercises
References

7 Nonparametric Analyses
7.1 So-Called ‘Distribution-Free’ Statistics
7.2 Hacking SM
7.3 One Sample and Related Samples
7.4 Independent Samples
7.5 More than Two Independent Samples
7.6 Permutation Analyses
7.7 Bootstrap Analyses for One Sample or Related Samples
7.7.1 Bootstrap Analyses for Independent Samples
7.8 R Code for Chapter 7
7.8.1 Calculating Relative Support for One Sample
7.8.2 Calculating Relative Support for Differences in Two Independent Samples
7.8.3 Calculating Relative Support for Differences in Three Independent Samples
7.8.4 Calculating Relative Support Using Permutations Analysis
7.8.5 Bootstrap Analyses for One Sample
7.8.6 Bootstrap Analyses for Two Independent Samples
7.9 Exercises
References

8 Other Useful Techniques
8.1 Other Techniques
8.2 Critical Prior Interval
8.3 False Positive Risk
8.4 The Bayes Factor and the Probability of the Null Hypothesis
8.4.1 Example
8.5 Bayesian t Tests
8.6 The Armitage Stopping Rule
8.7 Counternull Effect Size
References

Appendix A Orthogonal Polynomials
Appendix B Occam’s Bonus
Reference
Appendix C Problems with p Values
C.1 The Misuse of p Values
C.1.1 p Value Fallacies
C.2 The Use of p Values
C.2.1 Two Contradictory Traditions
C.2.2 Whither the p Value?
C.2.3 Remedies
References

Index

Acknowledgements

I would like to thank several people who influenced me during the writing of this book. I am fortunate and honoured to have made acquaintance with Professor A. W. F. Edwards (University of Cambridge) and thank him for his suggestions and reprints. I appreciate the replies to my questions from Professor P. Dixon (University of Alberta) and Dr Scott Glover (University of London). This book would have been difficult to complete without the support of my loving wife Annah Adero. Finally, this book is dedicated to the College of Medicine at Alfaisal University in Riyadh.

About the Author

Peter M. B. Cahusac graduated with BSc Hons in Psychology from St Andrews University in 1980, followed by a PhD in neuropharmacology from the Medical School Bristol University in 1984. During a post-doctoral position at Oxford, he became interested in statistics and subsequently obtained an MSc in Applied Statistics from Oxford University in 1992. He taught statistics at Stirling University from 1990 to 2012. He has been an elected ordinary member of the Physiological Society (UK) since 1993 and was elected Fellow (FTPS) in 2018. He has been a member of the British Pharmacological Society since 2006. He is a Fellow of the Royal Statistical Society and has held GradStat status since 2009. From 2008, he became particularly interested in the likelihood approach to statistical inference as it appeared to avoid some of the difficulties associated with other approaches. In 2014, along with Dr Patricia de Winter, he published an introductory book on statistics. Currently, he is Associate Professor in Biostatistics and Pharmacology at Alfaisal University, Riyadh, Saudi Arabia.

About the Companion Site

This book is accompanied by a companion website:

www.wiley.com/go/evidencebasedstatistics

The website includes materials for students (open access):

● R statistical code for likelihood ratio and support calculations

● Answers

Introduction

Likelihood is the central concept in statistical inference. Not only does it lead to inferential techniques in its own right, but it is as fundamental to the repeated-sampling theories of estimation advanced by the ‘classical’ statistician as it is to the probabilistic reasoning advanced by the Bayesian.

Thus begins Edwards’s remarkable book on Likelihood [1].

Fisher was responsible for much of the fundamental theory underlying the modern use of statistics. He developed methods of estimation and significance testing but also, according to Edwards [1, p. 3] ‘quietly and persistently espoused an alternative measure by which he claimed rival hypotheses could be weighed. He called it likelihood’. Neyman and Pearson were drawn to the use of the likelihood ratio, stating ‘there is little doubt that the criterion of likelihood is one which will assist the investigator in reaching his final judgement’ [2]. Eventually they turned away from using it, when they realized that it would not allow them to estimate the Type I error probability necessary for frequentist statistics. Edwards is not alone when he laments in his 1992 preface ‘Nevertheless, likelihood continues to be curiously neglected by mathematical statisticians’ [1].

Richard Dawkins (biologist and author) once said ‘Evidence is the only good reason to believe anything’. However, ‘evidence’ has become an over-used buzz word appropriated in expressions like ‘evidence-based education’. Overused and attached to statements on policy or practice, it is no doubt used with the intention of enhancing or validating their endeavours. Often ‘evidence-based’ statements appear to refer to statistics as providing the evidence. However, we are in the curious situation where the two most popular statistical approaches do not actually quantify evidence. Bayesian and frequentist statistics provide probabilities rather than any weight of evidence. The lesser known likelihood approach is alone in providing objective statistical evidence. All three approaches were developed in Britain (specifically England), yet only the likelihood approach provides admissible evidence in British courts of law.

Evidence-Based Statistics: An Introduction to the Evidential Approach – from Likelihood Principle to Statistical Practice, First Edition. Peter M. B. Cahusac. © 2021 John Wiley & Sons, Inc. Published 2021 by John Wiley & Sons, Inc. Companion Website: www.wiley.com/go/evidencebasedstatistics

Many excellent texts in applied statistics make mention of likelihood since it is a key concept in statistical inference. Despite this, few texts give practical examples to demonstrate its use. None are available at the introductory level to explain, step-by-step, how the likelihood ratio calculations for many different types of statistical analyses, such as comparisons of means, associations between variables, categorical data analyses, and nonparametric analyses, are done. The current text is an attempt to fill this gap. It is assumed that the reader has some basic knowledge of statistics, perhaps from an introductory university or school course. Otherwise, the reader can consult any one of a large number of excellent texts and online resources.

John Tukey, a mathematician who made huge contributions to statistical methodology, once said: ‘Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise’ [3]. A p value provides an exact answer, but often to the wrong question.

For historical reasons, likelihoods and their ratios will probably not replace analyses using other approaches, especially the well-entrenched p value. However, the likelihood approach can supplement or complement other approaches. For some, it will add another instrument to their statistical bag of tricks.

References

1 Edwards AWF. Likelihood. Baltimore: Johns Hopkins University Press; 1992.

2 Neyman J, Pearson ES. On the use and interpretation of certain test criteria for purposes of statistical inference: part I. Biometrika. 1928;20A(1/2):175–240.

3 Tukey JW. The future of data analysis. The Annals of Mathematical Statistics. 1962;33(1):1–67.

1 The Evidence is the Evidence

It is the simple suggestion that the only valid reason for rejecting a statistical hypothesis is that some alternative hypothesis explains the observed events with a greater degree of probability.¹

– E. S. Pearson on receiving a letter from W. S. Gosset [2, p. 242]

1.1 Evidence-Based Statistics

Science advances from evidence, and scientific evidence guides decision-making, practice, and policy. Evidence-based practice encompasses numerous fields: policy, design, management, medicine, education, etc. In medicine, practitioners and patients alike rightly demand and expect that treatments used are evidence-based. To say that the use of a particular therapy is evidence-based means that it has sufficient evidence to support the benefit of its use compared with other possible treatments.

In science, data is obtained in many different ways depending on the methodology. Often the methodology is dictated by the constraints peculiar to the research area. Data can provide evidence on a number of different levels. It may be anecdotal, or it may come from observational or experimental studies. Anecdotal evidence is regarded as the weakest, although it may be the starting point for more systematic research. At the next level, multiple observations provide observational evidence which is usually correlational in nature. A carefully designed study, such as a randomized controlled trial, can provide causal evidence for the effectiveness of a treatment. Finally, taking evidence from many research studies may be achieved by carrying out meta-analyses and systematic reviews. Each level in the pyramid of evidence has its advantages and drawbacks.

1 Taper and Lele (p. 545), emphasis added: ‘The evidential approach is alone in having its measure of evidence invariant to intent, belief, and time of hypothesis formulation. The evidence is the evidence. Both belief and error probabilities have been separated from evidence. This is not to say that belief and error probabilities are unimportant in making inferences, but only that belief, error probabilities, and evidence can be most effectively used for inference if they are not conflated’ [1].

Appropriate statistical practice is fundamental to doing good science. This book is different from most statistical texts. It is an introduction to the likelihood approach and provides practical instructions on how to convert data into statistical evidence. It uses the likelihood approach that is fully objective in producing statistical results that depend only on the observed data. As Taper and Lele said ‘… the use of the likelihood ratio as an evidence measure is that only the models and the actual data are involved. This is quite different from the classical frequentist and error-statistical approaches, where the strength of evidence is the probability of making an error, calculated over all possible configurations of potential data’ [1, p. 538].

The likelihood approach encompasses a range of techniques grounded in established statistical theory. These techniques allow us to express relative evidence as a ratio of likelihoods. The phrases evidential approach and likelihood approach will be used interchangeably. Using the evidential approach frees us from dependence on the subjective considerations that bedevil other approaches. Based only upon observed evidence, it always informs us correctly about the relative strength of evidence for one hypothesis versus another.

A fuller discussion of the difficulties with approaches associated with p values is relegated to Appendix C.

1.1.1 The Literature

The use of evidence based on likelihoods and likelihood ratios (LRs) strikes those unfamiliar with it as highly specialized and esoteric, even arcane. There is widespread belief, though misguided, that evidential methodology can only be used safely and credibly by highly experienced or professional statisticians. A contributing factor supporting this belief is the fact that, compared with other areas of statistical methodology, there are relatively few books and research papers on the evidential approach. However, the quality of the texts makes up for their quantity.

The most important book on the subject is Likelihood by Edwards. Originally published in 1972, it represented a highly original text. An expanded edition was subsequently published in 1992 [3]. A. W. F. Edwards (photo below) is a statistician and geneticist who did his PhD with R. A. Fisher, who was also a statistician and geneticist. Edwards’s ground-breaking book covers a remarkable range of topics. It is sometimes densely written, and at other times appears to cover important topics, such as the F ratio, in a cursory fashion. The succinct text, peppered with dry humour and understatement, repays careful reading and re-reading. Many glittering gems relevant to applied statistics await to be mined and polished.

Royall’s book [4], Statistical Evidence: A Likelihood Paradigm, published 25 years later is a remarkable monograph, providing a tour de force of carefully argued prose and examples to convince anyone still in doubt about the merits of the evidential approach. The book adds to Edwards’s work, for example by explaining how sample size calculations relevant to the evidential approach can be done.

The books by Edwards and Royall are outstanding sources of reference for theory and examples. They make an appeal to reason as to why statistical inferences based on statistical tests and Bayesian methods are flawed, and that only the likelihood approach is valid. These books may appear somewhat inaccessible to readers who lack sufficient mathematical or statistical expertise.

A deep theoretical and philosophical treatment of the likelihood approach is given by Hacking [5]. This may appeal to philosophers and theoreticians but there is little there for the applied statistician or researcher.

Professor A. W. F. Edwards FRS. Source: Photo from Gonville and Caius College, Cambridge.

There are some excellent books with large sections devoted to the evidential approach. First up is the book by Dienes with his excellent, cogent, and entertaining Understanding Psychology as a Science: An introduction to Scientific and Statistical Inference [6]. Then, there is the very solid and thorough treatment by Baguley in Serious Stats: A Guide to Advanced Statistics for the Behavioral Sciences [7]. Both these books offer limited computer code to perform LR calculations. Taper and Lele edited The Nature of Scientific Evidence: Statistical, Philosophical, and Empirical Considerations which consists of a compilation of chapters including some notable authors, such as Royall, Mayo, and others [1]. There are commentaries to the chapters, including by D. R. Cox who was critical of Royall’s approach, which was followed by a robust and memorable rejoinder by Royall.

The book by Aitken is a useful addition, but is limited in scope to forensic statistical evidence [8]. Pawitan’s In All Likelihood is a useful mathematical treatment of a range of likelihood topics [9]. Clayton and Hills’s Statistical Models in Epidemiology [10] is excellent but limits itself to epidemiological statistics. Lindsey’s book Introductory Statistics: A Modelling Approach [11] makes extensive use of the likelihood approach. Kirkwood and Sterne’s Medical Statistics [12] is a useful practical book that devotes a chapter to likelihood. Armitage et al’s Statistical Methods in Medical Research [13] is a solid standard reference work for medical statistics which makes passing references to the likelihood approach. There are some excellent books that use a modelling approach, although without likelihoods, for example Maxwell and Delaney’s Designing Experiments and Analyzing Data: A Model Comparison Perspective [14] and Judd et al’s Data Analysis: A Model Comparison Approach to Regression, ANOVA, and Beyond [15].

Perhaps the most concentrated account of likelihood, given in just a few pages, is by Edwards in a 2015 entry for an encyclopaedia [16]. There are a number of accessible research papers. Those by Goodman [17–21] (one of these jointly with Royall), and Dixon and Glover [22, 23] are exemplary in explaining and demonstrating a range of evidential techniques.

1.2 Statistical Inference – The Basics

Together with the data, statistical hypotheses and statistical models are essential components for us to be able to draw inferences. Hypotheses and models provide an adequate probabilistic explanation of the process by which the observed data were generated. By statistical hypothesis, we mean attributing a specified quantitative or qualitative value to an identified parameter of interest within the statistical model. A simple statistical hypothesis specifies a particular value. For example, the null hypothesis for a measured difference between two populations might be exactly 0. A hypothesis may also be a range of values, known as a composite hypothesis, for example the direction of difference in a measurement of two populations (e.g. A > B). By statistical model, we mean the mathematical assumptions we make about how sample data (and similar samples from the same population) were generated. Typically, a model is a convenient simplification of a more complex reality. Statistical inference and estimation are conditional on a model. For example, in comparing heights of malnourished and well-nourished adults, our model could assume that the measurements are normally distributed. We might specify two simple hypotheses to be compared, for example: the null of 0 difference and a population mean difference of more than 3 cm between the two populations. The distinction between hypothesis and model is not absolute since it is possible to consider one of these components to be part of the model on one occasion and then contested as a hypothesis on another. Hence, our model assumption of normally distributed data could itself be questioned by becoming a hypothesis.

1.2.1 Different Statistical Approaches

There are three main statistical approaches to data analysis. These are neatly summarized by Royall’s three questions that follow the collection and analysis of some data [24]:

1. What should I do?

2. What should I believe?

3. How should I interpret the evidence?

They describe the different ways in which the data are analyzed and interpreted. Each approach is important within its specific domain. The first of these is pragmatic, where a decision must be made on the basis of the analysis. It represents the frequentist approaches of statistical tests and hypothesis testing. Typically, either the null hypothesis is rejected (evidence for an effect is found) or not rejected (insufficient evidence found). The decision is based upon a critical probability, usually .05. The significance testing approach measures the strength of evidence against the null hypothesis by the diminutiveness of a calculated probability of obtaining the data (or more extreme) assuming the null hypothesis is true. This probability is known as a p value.

The second approach represents the strength of belief for a specified hypothesis. It too is based upon probability and is conditioned by the probability of the hypothesis prior to the collection of the data. If the prior probability is known, then the calculation using Bayes’ theorem logically provides the (posterior) probability for the specified hypothesis.
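As a minimal sketch of that calculation, assume there are only two competing simple hypotheses, H1 and H2, and that the data enter through the ratio of their likelihoods. The function name and the example prior values below are hypothetical, chosen only for illustration; the code is written in Python and is not taken from the book's companion R code.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Posterior probability of H1 when H1 and H2 are the only candidates.

    prior            -- prior probability of H1 (an assumed input)
    likelihood_ratio -- P(data | H1) / P(data | H2)
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood ratio must be positive")
    # Bayes' theorem in odds form: posterior odds = prior odds x LR.
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # The same data (LR = 4) lead to different beliefs under different priors.
    for prior in (0.1, 0.5, 0.9):
        print(prior, round(posterior_probability(prior, 4.0), 3))
```

The loop shows why the prior matters: the likelihood ratio is fixed by the data, but the resulting posterior probability moves with whatever prior is supplied.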

The third approach also uses probability but provides objective evidence which is expressed as the likelihood for one hypothesis versus another in the form of a LR. The LR is not a probability but a relative measure of evidence for competing hypotheses. The technical meaning of the word ‘likelihood’ in statistics is very similar to its use in common parlance by non-statisticians. For example we might say, seeing dark clouds in the sky, ‘there is a greater likelihood for rain than sunshine this afternoon’.

When the LR is transformed into the natural logarithm, it is known as the support, denoted S. The support quantifies the comparative evidence on a scale of −∞ to +∞, with midpoint 0 representing no evidence in favour of either hypothesis. Unlike the use of p, S is a graded measure of evidence without clear cutoffs or thresholds.
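The transformation between LR and S can be sketched in a few lines of Python. The function names are illustrative assumptions, not names used by the book.

```python
import math


def support(likelihood_ratio: float) -> float:
    """Support S: the natural logarithm of a likelihood ratio."""
    if likelihood_ratio <= 0.0:
        raise ValueError("a likelihood ratio must be positive")
    return math.log(likelihood_ratio)


def likelihood_ratio_from_support(s: float) -> float:
    """Inverse transformation: recover the LR from a support value."""
    return math.exp(s)


if __name__ == "__main__":
    # LR = 1 is the midpoint of the scale; reciprocal LRs give S of
    # equal size and opposite sign.
    for lr in (0.25, 1.0, 4.0):
        print(lr, round(support(lr), 3))
```

Because reciprocal ratios map to supports of equal magnitude and opposite sign, swapping the order of the two hypotheses simply flips the sign of S.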

If the collected data are not strongly influenced by prior considerations, it is somewhat reassuring that the three approaches usually reach the same conclusion. However, it is not difficult to find examples of where the likelihood evidence points one way and the hypothesis testing decision points the other (see Section 3.7, and de Winter and Cahusac [25], p. 89 and Dienes [6], p. 127).

1.2.2 The Likelihood/Evidential Approach

In advocating the evidential approach, Royall wrote in 2004 ‘Statistics today is in a conceptual and theoretical mess. The discipline is divided into two rival camps, the frequentists and the Bayesians, and neither camp offers the tools that science needs for objectively representing and interpreting statistical data as evidence’ [24], p. 127.

In making sense of data and to make inferences, it is natural to consider different rival hypotheses to explain how such a set of observations arose. Significance testing uses a single hypothesis to test, typically the null hypothesis. The top of Figure 1.1 illustrates the typical situation when testing a sample mean. The sampling distribution for the mean is located over the null value, see vertical dashed line down to the horizontal axis. The sample mean indicated by the continuous vertical line lies in the shaded rejection region. The shaded region represents 5% of the area under the sampling distribution curve, with 2.5% in each tail. Significance testing states a pre-specified significance level α, typically 5%. Since the value for the sample mean lies within the shaded area, we can say that p < .05 and we reject the null hypothesis given our α.
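The decision rule just described can be sketched as follows. This is a simplified illustration that assumes a normal sampling distribution with a known standard error (a z test), not a t distribution; the function names and the numbers in the usage lines are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(sample_mean: float, null_value: float,
                      std_error: float) -> float:
    """Two-tailed p value for a sample mean under a normal approximation.

    The sampling distribution is centred on the null value, and the
    p value is the area in both tails at least as far from the null
    as the observed sample mean.
    """
    z = (sample_mean - null_value) / std_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def reject_null(p_value: float, alpha: float = 0.05) -> bool:
    """Dichotomous decision at a pre-specified significance level alpha."""
    return p_value < alpha


if __name__ == "__main__":
    # Hypothetical data: sample mean 2.0, null value 0, standard error 1.0.
    p = two_sided_p_value(2.0, 0.0, 1.0)
    print(round(p, 4), reject_null(p))
```

Note that the tail areas include values more extreme than the one observed, which is the feature of p values the chapter goes on to question.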

Figure 1.1 From sampling distribution to likelihood function. The top curve shows the sampling distribution used for testing statistical significance. It is centred on the null hypothesis value (often 0) and the standard error used to calculate the curve comes from the observed data. Below this in the middle is shown the 95% confidence interval. This uses the sample mean and standard error from the observed data. At the bottom is shown the likelihood function, within which is plotted the S-2 likelihood interval. Both the likelihood function and the likelihood interval use the observed data, like the confidence interval.

Estimation, a key element in statistical analysis, has often been ignored in the face of dichotomous decisions reached from statistical tests. If results are reported as non-significant, it is assumed that there is no effect or difference between population parameters. Alternatively, highly significant results based on large samples are assumed to represent large effects. The increased use of confidence intervals [26] is a great improvement that allows us to see how large or small the magnitude of the effects are, and hence whether they are of practical/clinical importance. These advances have increased the credibility of well-reported studies and facilitated our understanding of research results. The confidence interval is illustrated in the middle portion of Figure 1.1. This is centred on the sample mean (shown by the end-stopped line) and gives a range of plausible values for the population mean [26]. The interval has a frequentist interpretation: 95% of such intervals, calculated from random samples taken from the population of interest, will contain the population parameter. The confidence interval focusses our attention on the obtained sample mean value, and the 95% limits indicate how far this value is from parameter values of interest, especially the null. The interval helps us determine whether the data we have is of practical importance.
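The frequentist interpretation of the interval can be checked by simulation. The sketch below is an illustration under stated assumptions: a normal population whose standard deviation is treated as known (so a z interval is used), with hypothetical population values, sample size, and random seed.

```python
import random
from statistics import NormalDist, mean


def coverage(pop_mean: float = 10.0, pop_sd: float = 2.0, n: int = 30,
             repeats: int = 2000, level: float = 0.95,
             seed: int = 1) -> float:
    """Proportion of simulated confidence intervals containing pop_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * pop_sd / n ** 0.5  # z interval with known SD
    hits = 0
    for _ in range(repeats):
        sample_mean = mean(rng.gauss(pop_mean, pop_sd) for _ in range(n))
        # The interval sample_mean +/- half_width covers pop_mean exactly
        # when the sample mean is within half_width of pop_mean.
        if abs(sample_mean - pop_mean) <= half_width:
            hits += 1
    return hits / repeats


if __name__ == "__main__":
    print(coverage())  # expected to be close to 0.95
```

The 95% describes the long-run behaviour of the procedure over repeated samples; it is not a probability statement about any single computed interval.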

At the bottom of Figure 1.1 is shown the likelihood function. This is none other than a rescaled sampling distribution that we saw around the null value. It is calculated from the data, specifically from the sample mean and variance. It contains all the information that we can extract from the data. It is centred on the sample mean which represents the maximum likelihood estimate (MLE) for the population mean. The likelihood function can then be used to compare different hypothesis parameter values. Using simply the height of the curve, the likelihood function allows us to calculate the relative likelihood, in terms of a ratio, for any two parameter values from competing hypotheses. We may compare any value of interest with the null. For example, we may take a value that is of practical importance. This might be situated above or below the sample mean value. If this value lies between the null and the sample mean, then the ratio relative to the null will be ≥1. If the value is less than the null, then the ratio will be <1. The same will be true on the other side of the sample mean until the counternull² value is reached, after which the ratio will be ≤1. The maximum LR is obtained at the sample mean value. For the illustrated data, this ratio was 13.4, giving an S of 2.6. The evidence represented by the likelihood function is centred on the observed data statistic. The same function centred on the null, as used in significance testing, now seems somewhat artificial.
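The comparison of curve heights can be sketched numerically. The code below assumes a normal-shaped likelihood function centred on the sample mean with a known standard error, which is a simplification (a likelihood based on the t distribution would differ somewhat); the function names and the data values (sample mean 2, standard error 1, null 0) are hypothetical.

```python
def log_likelihood(mu: float, sample_mean: float, std_error: float) -> float:
    """Log height of a normal-shaped likelihood curve at mu.

    Constant terms are dropped because they cancel in any ratio.
    """
    return -((sample_mean - mu) ** 2) / (2.0 * std_error ** 2)


def support(mu_1: float, mu_2: float, sample_mean: float,
            std_error: float) -> float:
    """Support S (log likelihood ratio) for mu_1 versus mu_2."""
    return (log_likelihood(mu_1, sample_mean, std_error)
            - log_likelihood(mu_2, sample_mean, std_error))


if __name__ == "__main__":
    mean_, se, null = 2.0, 1.0, 0.0
    counternull = 2.0 * mean_ - null  # mirror image of the null
    # Candidate values: below the null, between null and mean, the mean
    # itself, the counternull, and beyond the counternull.
    for value in (-1.0, 1.0, mean_, counternull, 5.0):
        print(value, support(value, null, mean_, se))
```

With these numbers the support versus the null is negative below the null, positive between the null and the counternull, largest at the sample mean, zero at the counternull, and negative beyond it, matching the pattern described above.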

The likelihood interval shown in Figure 1.1 represents that calculated for a support of 2 (S-2), which closely resembles the 95% confidence interval, although its interpretation is more direct: values within the interval are consistent with the collected data. For a value outside the interval, there is at least one other hypothesis value, here the sample mean, that is favoured over it by more than moderate evidence.
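Why the two intervals resemble each other can be seen under a normal approximation, which is an assumption of this sketch rather than the book's own calculation. With a normal-shaped likelihood, the support for a value relative to the sample mean is −z²/2, so the S-2 limits fall where z²/2 = 2, that is 2 standard errors either side of the mean, against about 1.96 for the 95% confidence interval. Function names and data values are hypothetical.

```python
import math
from statistics import NormalDist


def likelihood_interval(sample_mean: float, std_error: float,
                        m: float = 2.0) -> tuple:
    """S-m likelihood interval under a normal-shaped likelihood.

    Limits are the values whose support relative to the sample mean
    equals -m, i.e. sample_mean +/- sqrt(2 * m) standard errors.
    """
    half_width = math.sqrt(2.0 * m) * std_error
    return (sample_mean - half_width, sample_mean + half_width)


def confidence_interval(sample_mean: float, std_error: float,
                        level: float = 0.95) -> tuple:
    """Normal-approximation confidence interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (sample_mean - z * std_error, sample_mean + z * std_error)


if __name__ == "__main__":
    print(likelihood_interval(2.0, 1.0))   # limits at 2 standard errors
    print(confidence_interval(2.0, 1.0))   # limits at about 1.96
```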

The precise meaning of p values obtained in statistical tests is difficult to grasp by the average scientist. Even seasoned researchers misunderstand them. In contrast, the likelihood approach is conceptually simple. It uses the likelihood function, derived from the sampling distribution of the collected data, to provide comparative evidence for two specified hypotheses. The likelihood approach uses nothing other than the evidence obtained in the collected sample. For p values, the tail regions of the sampling distribution centred on the null are used. These regions include values beyond the sample statistic which were not observed. What can be the justification for including values that were not observed? Later in his career, Fisher [27], p. 71 admits that ‘This feature is indeed not very defensible save as an approximation’. ‘To what?’ replies Edwards [28]. It is interesting that Fisher then proceeds to compare likelihoods (pp. 71–73): ‘It would, however, have been better to have compared the different possible values of p, in relation to the frequencies with which the actual values observed would have been produced by them, as is done by the Mathematical Likelihood …’ He concludes ‘The likelihood supplies a natural order of preference among the possibilities under consideration’. The use of the LR is computationally simple and intuitively attractive. Tsou and Royall observe ‘Strong theoretical arguments imply that for directly representing and interpreting statistical data as evidence, the proper vehicle is the likelihood function’, adding pointedly ‘These arguments have had limited impact on statistical practice’ [29]. Perhaps the ritualized [30] and over-rehearsed use of p values has made them so ingrained in the scientific community that the conceptually simpler LR statistic has now become more difficult to grasp.

2 The counternull is the value on the other side of the sample mean that is equidistant from the sample mean as the null is from the sample mean. See Section 8.7.

1.2.3 Types of Approach Using Likelihoods

A key feature of the evidential approach is the use of an LR based upon two values selected by the researcher. The LR then reveals which value is best supported by the observations. Typically, one of these values is the null hypothesis and the other a value believed to represent an effect size (see Section 1.3 below). The use of explicit hypothesis values has the effect of strengthening the inferences made from the ratio of their likelihoods. This more useful approach can be used in many different types of analyses, notably in testing means with t, analysis of variance (ANOVA) using contrasts, correlation, and in categorical data using binomial, Poisson, and odds ratio. Where precise hypothesis values cannot easily be specified, it is useful to use likelihood intervals to show which values are supported by the data. When one of the chosen values is the MLE, for example the sample mean, then something called the maximum LR is calculated (see Section 2.1.5). This is equivalent to using p values since there is a direct transformation between it and a p value – they both measure evidence against a single value, usually represented by the null hypothesis. This can be useful in providing the maximum possible LR, and hence support, for an effect against another specified value such as the null.
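The direct transformation between a p value and the maximum LR can be illustrated for the simplest case. The sketch below assumes a two-sided p value from a z statistic with a normal sampling distribution, where the maximum support is z²/2. This is an assumption of the example; the transformation for t and other statistics will differ, and the function names are illustrative only.

```python
import math
from statistics import NormalDist

def max_support_from_p(p_two_sided):
    """Maximum support S_M implied by a two-sided p value from a z test (assumed model)."""
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return z ** 2 / 2

def p_from_max_support(s_max):
    """Inverse transformation: two-sided p value implied by a maximum support S_M."""
    z = math.sqrt(2 * s_max)
    return 2 * (1 - NormalDist().cdf(z))

for p in (0.05, 0.01, 0.001):
    s = max_support_from_p(p)
    print(f"p = {p}: maximum support = {s:.2f}, maximum LR = {math.exp(s):.1f}")
```

Because the mapping is one-to-one, the maximum LR carries the same information as the p value: both describe evidence against a single value and say nothing about a specific alternative chosen in advance.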

1.2.4 Pros and Cons of Likelihood Approach

Advantages:

1. Provides an objective measure of evidence between competing hypotheses, unaffected by the intentions of the investigator.

2. Calculations tend to be simpler, and are often based on commonly used statistics such as z, t, and F.

3. Likelihoods, LRs and support (log LR) can be calculated directly from the data. Statistical tables and critical values are not required.

4. The scale for support values is intuitively easy to understand, ranging from −∞ to +∞. Positive values represent support for the primary hypothesis, negative values represent support for the secondary hypothesis.³ Zero represents no evidence for either hypothesis.

5. Support values are proportional to the quantity of data, representing the weight of evidence. This means that support values from independent studies can simply be added together.

6. Collecting additional data in a study will tend to strengthen the support for the hypothesis closest to the true value. By contrast, with p values, even when the null hypothesis is true, additional data will always eventually give statistically significant results, to any level of α (0.05, 0.01, 0.001, etc.) required.

7. Categorical data analyses are not restricted by normality assumptions, and within a given model, support values sum algebraically.

8. It is versatile and has unlimited flexibility for model comparisons.

9. The stronger the evidence, the less likely it is to be misleading, because S has a universal bound of e^(−m) on the probability of observing misleading evidence, where m is the support for any two hypotheses.

10. Unlike other approaches based on probabilities, it is unaffected by transformations of variables.

³ Often the secondary hypothesis will be the null hypothesis.
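Two of the advantages listed above, the additivity of support and the universal bound on misleading evidence, lend themselves to a quick numerical check. The following is a small sketch; the three study support values are hypothetical and the function name is illustrative.

```python
import math

def misleading_evidence_bound(m):
    """Universal bound e^(-m): the probability of observing support of at least m
    in favour of the wrong one of two simple hypotheses cannot exceed this value."""
    return math.exp(-m)

bounds = {m: misleading_evidence_bound(m) for m in (1, 2, 3, 4)}
for m, bound in bounds.items():
    print(f"support of {m}: probability of misleading evidence <= {bound:.3f}")

# Hypothetical support values from three independent studies of the same two hypotheses
study_supports = [1.2, 0.7, 1.6]
combined_support = sum(study_supports)      # supports add ...
combined_lr = math.exp(combined_support)    # ... because the underlying LRs multiply
print(f"combined support = {combined_support:.1f}, combined LR = {combined_lr:.1f}")
```

Adding supports is the same operation as multiplying likelihood ratios, which is why evidence from independent studies combines so simply on the log scale.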

Disadvantages:

1. Does not have a specific threshold between statistically unimportant and important.

2. Few major statistical packages support the evidential approach.

3. Few statistics curricula include the evidential approach.

4. Hence, few researchers are familiar with the evidential approach.

1.3 Effect Size – True If Huge!

Breaking news! Massive story! Huge if true! These are phrases used in media headlines to report the latest outrage or scoop. How do we decide how big the story is? Well, there may be several dimensions: timing (e.g. novelty), proximity (cultural and geographical), prominence (celebrities), and magnitude (e.g. number of deaths). In science, the issues of effect size and impact may be more prosaic but are actually of great importance. Indeed, this issue has been sadly neglected in statistical teaching and practice. Too much emphasis has been put on whether a result is statistically significant or not. As Cohen [31] observed, ‘The primary product of a research inquiry is one or more measures of effect size, not p values’. We need to ask what the effect size is, and how we measure it.

The effect size, or size of effect, is simply the observed magnitude of a difference between two measurements or the strength of the association between two variables. For example, if there is a very obvious difference between the outcomes produced by two clinical treatments with a high proportion of patients cured, we would say the effect size is large. On the other hand, if the difference between the treatment outcomes were barely noticeable then we would say that the effect size is small. In general, the larger the effect size the larger will be the practical or clinical importance of the result. The effect size in clinical treatments is clearly important, but the effect size also impacts on the assessment of theories – where the observed effect size strongly influences the credibility of the theory.

A common question is how do we know what effect size is of practical importance? That depends. If we think of prices, a difference of $1 between two car insurance quotes would probably not be considered important. However, a $1 difference in the cost of a coffee offered by two similar cafés would likely influence our choice of café. Sometimes, it is more difficult than this. For example, a drug that produces an absolute risk reduction of 1% might appear to be a small effect size. The reduction means that of 100 people taking the drug there would be 1 fewer person suffering from the disease. If the baseline rate of the disease is 10% in people not taking the drug, then taking the drug would reduce this to 9%. Again this might appear small, but if we consider a million people, this would represent an extra 10 000 people being affected if they did not take the drug.
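The arithmetic in the drug example can be laid out explicitly. The short sketch below uses only the figures quoted in the text; the number needed to treat is not named in the text and is added here purely as a familiar restatement of the same 1% reduction.

```python
baseline_risk = 0.10   # disease rate in people not taking the drug
treated_risk = 0.09    # disease rate in people taking the drug
population = 1_000_000

absolute_risk_reduction = baseline_risk - treated_risk
cases_avoided = round(absolute_risk_reduction * population)

# Number needed to treat (an added illustration, not from the text):
# how many people must take the drug for one fewer to suffer the disease
number_needed_to_treat = round(1 / absolute_risk_reduction)

print(f"absolute risk reduction = {absolute_risk_reduction:.0%}")
print(f"cases avoided per {population:,} people = {cases_avoided:,}")
print(f"number needed to treat = {number_needed_to_treat}")
```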

Reporting the effect size for the results of a study is important, informing readers about the impact of the findings. Effect size is also used in the context of planning a study, since it influences the statistical power of the study. Here, it may be specified in three ways. First, the effect size may be that expected from similar previous published work or even from a pilot study. Second, it may be the minimum effect size that is of practical or clinical importance. Third, it may be simply an effect size that is considered to represent a useful effect. For example, in testing a new treatment in hypertensive patients with systolic blood pressure of 140 mmHg or more, a clinician might judge that a mean reduction of at least 10 mmHg would be clinically important and have a clear desirable health outcome. This would represent the minimum effect size. Alternatively, a clinician might not specify a minimum but merely judge that a mean decrease of 15 mmHg would be clinically important. (In practice, other aspects of a new treatment need to be considered – will the treatment be financially affordable and what might be the likely adverse side effects?)

In most areas of research, there is some effect, however small, for a treatment difference or a correlation. Tukey in 1991 [32] explained this in a forthright manner: ‘Statisticians classically asked the wrong question – and were willing to answer with a lie, one that was often a downright lie. They asked “Are the effects of A and B different?” and they were willing to answer “no.” All we know about the world teaches us that the effects of A and B are always different – in some decimal place – for any A and B. Thus asking “Are the effects different?” is foolish’. Hence, we generally need to think about how large an effect is, rather than whether one is present or not. The latter practice is encouraged by the statistical testing approach, where p is less than or greater than some significance level.

The habit of thinking about effect size forces the researcher to focus on the phenomenon under study. It places emphasis on practical/clinical importance of findings. One scientist, clearly interested in effect sizes, expresses their frustration: ‘Honestly, at some point I’d like to work on things where the effect size is grounded on a real-world measurable outcome, but if I’m just looking at difference between psych measures, I’m not sure how to define it other than that’ (Twitter @PaoloAPalma, 22 May 2019). This brings us on to how to define effect size.

Which metric should be used to define effect size? Baguley argues convincingly that the best measure of effect size uses the original units rather than standardized measures, and that the use of verbal labels such as ‘large’ or ‘small’ can sometimes be misleading [33]. What may be considered a large effect in one area (e.g. epidemiology) may be considered small in another (e.g. a drug treatment for hypertension). A popular standardized measure of effect size for a difference in means is d. This is actually Hedges’ standardized statistic using the sample standard deviation SD rather than Cohen’s using the population parameter σ.⁴

The relative effect sizes using d can be described as:

A more general measure is provided by the correlation coefficient r. However, the transform between r and d is not linear since r is restricted to between −1 and 1, while d varies between negative and positive infinity. For example, a medium effect r of 0.3 corresponds to a d of 0.63 (on the large side), and a large r of 0.5 corresponds to a very large effect in d of 1.15. Using d allows us to relate more naturally to the measurements that are made.
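The correspondence between r and d quoted above can be reproduced with the standard conversion d = 2r/√(1 − r²). The text does not state which formula was used; this one assumes two groups of equal size and is offered as a sketch because it recovers the values given.

```python
import math

def r_to_d(r):
    """Convert a correlation r to a standardized mean difference d (equal group sizes assumed)."""
    return 2 * r / math.sqrt(1 - r ** 2)

def d_to_r(d):
    """Inverse conversion from d back to r under the same assumption."""
    return d / math.sqrt(d ** 2 + 4)

for r in (0.1, 0.3, 0.5):
    print(f"r = {r}: d = {r_to_d(r):.2f}")
```

The non-linearity is visible in the output: equal steps in r produce increasingly large steps in d as r approaches its limit of 1.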

Effect size is generally unaffected by sample size, unlike the p value. If the null is not true then the p value obtained will vary according to the sample size: other things being equal, the larger the sample, the smaller the p value. When considering sample size and strength of evidence provided by p values, opposite conclusions are reached by different statisticians [4], p. 71. In Figure 1.2, the 95% confidence intervals around means are plotted for two sets of data. For each interval, the same standard deviation is used and the same p value is obtained for the mean’s difference from 0. However, the sample sizes vary, so that with N = 4, there is a 2.6 difference from 0, and for N = 80, there is a 0.6 difference from 0. Hence, the size of the effect is much larger for the interval using few observations, which might indicate that this result is of more practical importance than the result obtained with a larger data set. However, it is also argued that the data with larger N represents stronger evidence, although its effect size is much smaller and the p values identical.

⁴ There is an adjustment to Hedges’ statistic for small samples, which means multiplying the value by (N − 3)/(N − 2.25); see p. 244 in [7].

Figure 1.2 Effect size versus sample size: which provides most evidence against H0? For both scenarios: 95% confidence interval, SD = 2, p = .01 versus mean of 0.
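The two scenarios in Figure 1.2 can be checked numerically. The sketch below assumes the p value of .01 is two-sided and comes from a z test with the SD of 2 treated as known. The text does not say which test was used, but this assumption reproduces the quoted differences of 2.6 and 0.6.

```python
import math
from statistics import NormalDist

sd = 2.0
p_two_sided = 0.01
z = NormalDist().inv_cdf(1 - p_two_sided / 2)   # z statistic giving p = .01
z95 = NormalDist().inv_cdf(0.975)               # multiplier for a 95% confidence interval

results = {}
for n in (4, 80):
    se = sd / math.sqrt(n)
    difference = z * se                          # mean difference from 0 that yields p = .01
    results[n] = {
        "difference": difference,
        "d": difference / sd,                    # standardized effect size
        "ci95": (difference - z95 * se, difference + z95 * se),
    }

for n, res in results.items():
    low, high = res["ci95"]
    print(f"N = {n}: difference = {res['difference']:.1f}, d = {res['d']:.2f}, "
          f"95% CI = ({low:.2f}, {high:.2f})")
```

The same p value therefore corresponds to a standardized effect more than four times larger in the small sample than in the large one, which is the tension the figure is drawing attention to.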

1.4 Calculations

Evidence is measured by the natural logarithm of the LR, known as the support S. The words evidence and support will be used interchangeably.

Giving decimal places during calculations is tricky. The decimal places given for values in the text are usually given to an accuracy that allows one to check formulae and equations, often given in stages. Occasionally, there will be mismatches with the final answer, which will be based on the most accurate calculation possible. These can usually be checked from the raw data using Excel or R.

The support S will generally be expressed to only one decimal place. The use of S is merely a guide to the strength of evidence. It is graded rather than thresholded.

The evidential approach does not require any statistical tables. All calculations can be performed from first principles with a hand calculator, R or Excel spreadsheet.
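As a minimal illustration of how little machinery is needed, the conversion between an LR and the support S takes one line in each direction. The sketch uses Python rather than R or Excel, and the function names are illustrative; the value 13.4 is the maximum LR quoted earlier in the chapter.

```python
import math

def support_from_lr(lr):
    """Support S is the natural logarithm of the likelihood ratio."""
    return math.log(lr)

def lr_from_support(s):
    """Recover the likelihood ratio from a support value."""
    return math.exp(s)

s = support_from_lr(13.4)
print(f"LR of 13.4 gives S = {s:.1f}")
print(f"LR of 1/13.4 gives S = {support_from_lr(1 / 13.4):.1f}")
print(f"S of 2 corresponds to an LR of {lr_from_support(2):.1f}")
```

Swapping the two hypotheses inverts the LR and simply changes the sign of S, which is what makes the scale symmetric about zero.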

1.5 Summary of the Evidential Approach

1. Choose a parameter value for primary hypothesis H1. Either a value corresponding to practical importance, of minimum importance, or the expected value. Else use a medium effect size, e.g. d = ±0.5. Alternatively, use the MLE.

2. Choose a secondary hypothesis H2 to compare with H1. Often this is the null hypothesis H0.

3. Calculate S12, S10 for H0, or SM for MLE.

4. Assess the relative evidence for the two hypotheses on the graded scale from −∞ to +∞.

5. Always use likelihood intervals, typically for S-2 and S-3. Likelihood intervals are more flexible and may be more informative than examining S for particular hypotheses.

6. If possible and convenient, plot the likelihood function.

Figure 1.3 gives a flow diagram showing the sequence used to calculate and assess the evidence from a data sample.

Figure 1.3 A flow diagram illustrating the general procedure of calculating and assessing evidence. At the top, we start with defining hypotheses of interest. The primary hypothesis H1 is that specified by an effect size or the sample statistic (maximum likelihood estimate (MLE)). The secondary hypothesis H2 specifies another value of interest, often this is the null hypothesis. The support S is calculated from the logarithm of the LR for H1 versus H2. If the MLE is used then the maximum LR is calculated, which becomes SM on taking logs. The value of S indicates the strength of evidence for one of the hypotheses against the other. If the value is negative then this represents evidence in favour of H2. If the value is positive then this represents evidence in favour of the primary hypothesis H1. The magnitude of the negative or positive support values indicates the relative strength of the evidence, from ±1 meaning weak, ±2 moderate, ±3 strong, and ≥±4 extremely strong. An LR of 1 represents an S of 0, which is no evidence in favour of either hypothesis. The likelihood function should be calculated wherever possible and likelihood interval provided when presenting results. Thanks to Alfaisal student, Muhammad Affan Elahi, for the suggestion to use flow charts here and for Figure 2.12.

[Flow diagram stages: Hypotheses/models (H1: either of special interest, e.g. a clinically important effect, or the MLE; H2: either of additional interest or the null hypothesis H0; select as necessary) → Support (calculate S12, S10 or SM) → Assess evidence (assess the relative strength of evidence; if necessary make an inference about a hypothesis or model) → Likelihood function and interval (if possible, calculate the likelihood function and determine the S-2 or S-3 likelihood interval).]
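The graded scale described for Figure 1.3 can be turned into a small helper for labelling a calculated support value. This is a sketch only: the text stresses that S is graded rather than thresholded, so the hard cut-offs below are a simplification made for illustration, and the label used for magnitudes below 1 is an assumption, since the caption gives none.

```python
def grade_support(s):
    """Return (strength label, favoured hypothesis) for a support value S of H1 versus H2.

    Cut-offs follow the ±1, ±2, ±3, ±4 reference points in the Figure 1.3 caption;
    treating them as lower band edges is an assumption of this sketch.
    """
    if s == 0:
        return "none", None                 # LR of 1: no evidence either way
    favoured = "H1" if s > 0 else "H2"      # positive favours H1, negative favours H2
    magnitude = abs(s)
    if magnitude >= 4:
        label = "extremely strong"
    elif magnitude >= 3:
        label = "strong"
    elif magnitude >= 2:
        label = "moderate"
    elif magnitude >= 1:
        label = "weak"
    else:
        label = "less than weak"            # label assumed; not given in the caption
    return label, favoured

for s in (2.6, -3.2, 0.4, 0):
    print(s, grade_support(s))
```

In practice the numerical value of S should be reported alongside any such label, so that readers can judge the strength of evidence for themselves.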

1 Taper ML, Lele SR, editors. The Nature of Scientific Evidence: Statistical, Philosophical, and Empirical Considerations. Chicago: University of Chicago Press; 2004.

2 Pearson ES. ‘Student’ as statistician. Biometrika. 1939;30(3/4):210–50.

3 Edwards AWF. Likelihood. Baltimore: Johns Hopkins University Press; 1992.

4 Royall RM. Statistical Evidence: A Likelihood Paradigm. London: Chapman & Hall; 1997.

5 Hacking I. Logic of Statistical Inference. Cambridge: Cambridge University Press; 1965.

6 Dienes Z. Understanding Psychology as a Science: An Introduction to Scientific and Statistical Inference. Basingstoke: Palgrave Macmillan; 2008.

7 Baguley T. Serious Stats: A Guide to Advanced Statistics for the Behavioral Sciences. Basingstoke: Palgrave Macmillan; 2012.

8 Aitken CGG, Taroni F. Statistics and the evaluation of evidence for forensic scientists. In: Barnett V, editor. (2nd ed). Chichester: John Wiley & Sons; 2004.

9 Pawitan Y. In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford: Oxford University Press; 2001.

10 Clayton D, Hills M. Statistical Models in Epidemiology. Oxford: Oxford University Press; 2013.

11 Lindsey JK. Introductory Statistics: A Modelling Approach. Oxford: Clarendon Press; 1995.

12 Kirkwood BR, Sterne JAC. Essential Medical Statistics. 2nd ed. Oxford: Blackwell; 2003.

13 Armitage P, Berry G, Matthews JNS. Statistical Methods in Medical Research. Oxford: Wiley-Blackwell; 2002.

14 Maxwell SE, Delaney HD. Designing Experiments and Analyzing Data: A Model Comparison Perspective. Belmont: Wadsworth Publishing Company; 1990.

15 Judd CM, McClelland GH, Ryan CS. Data Analysis: A Model Comparison Approach to Regression, ANOVA, and Beyond (3rd ed): Routledge; 2017.

16 Edwards AWF. Likelihood in statistics. In: Wright JD, editor. International Encyclopedia of the Social and Behavioral Sciences (2nd ed). Oxford: Elsevier; 2015. p. 116–9.

17 Goodman SN. Toward evidence-based medical statistics. 1: The p value fallacy. Annals of Internal Medicine. 1999;130(12):995–1004.

18 Goodman SN. Toward evidence-based medical statistics. 2: The Bayes factor. Annals of Internal Medicine. 1999;130(12):1005–13.

19 Goodman SN, Royall R. Evidence and scientific research. American Journal of Public Health. 1988;78(12):1568–74.

20 Dixon P. The effective number of parameters in post hoc models. Behavior Research Methods. 2013;45(3):604–12.

21 Goodman SN. Meta-analysis and evidence. Controlled Clinical Trials. 1989;10(2):188–204.

22 Dixon P. The p-value fallacy and how to avoid it. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale. 2003;57(3):189–202.

23 Glover S, Dixon P. Likelihood ratios: a simple and flexible statistic for empirical psychologists. Psychonomic Bulletin and Review. 2004;11(5):791–806.

24 Royall R. The likelihood paradigm for statistical evidence. In: Taper ML, Lele SR, editors. The Nature of Scientific Evidence. Chicago: University of Chicago; 2004. p. 119–52.

25 de Winter P, Cahusac PMB. Starting Out in Statistics: An Introduction for Students of Human Health, Disease, and Psychology. Chichester: John Wiley & Sons; 2014.

26 Cumming G, Calin-Jageman R. Introduction to the New Statistics. New York: Routledge; 2017.

27 Fisher RA. Statistical Methods and Scientific Inference. Edinburgh: Oliver & Boyd; 1956.

28 Edwards AWF. Statistical methods in scientific inference. Nature. 1969;222(5200):1233–7.

29 Tsou T-S, Royall RM. Robust likelihoods. Journal of the American Statistical Association. 1995;90(429):316–20.

30 Salsburg DS. The religion of statistics as practiced in medical journals. American Statistician. 1985;39(3):220–3.

31 Cohen J. Things I have learned (so far). American Psychologist. 1990;45(12):1304–12.

32 Tukey JW. The philosophy of multiple comparisons. Statistical Science. 1991;6(1):100–16.

33 Baguley T. Standardized or simple effect size: what should be reported? British Journal of Psychology. 2009;100(3):603–17.
