
Exact Statistical Inference for Categorical Data

Guogen Shan
University of Nevada, Las Vegas, NV, USA

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Academic Press is an imprint of Elsevier

125 London Wall, London, EC2Y 5AS, UK

525 B Street, Suite 1800, San Diego, CA 92101-4495, USA

225 Wyman Street, Waltham, MA 02451, USA

The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK

Copyright © 2016 Elsevier Ltd. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

ISBN: 978-0-08-100681-8

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

For information on all Academic Press publications visit our website at http://store.elsevier.com/

LIST OF FIGURES

1.1 Tail probability plots for a binomial comparative study based on three test statistics.

1.2 Tail probability plots for a binomial comparative study based on partial maximization by using the Z-pooled test statistic. The two green lines are the exact 0.999 confidence interval for the nuisance parameter, and the purple dot is the value for the BB p-value.

1.3 Two-sided and one-sided type I error rate comparisons among the five exact approaches with a balanced sample size of 100 per group at the significance level of $\alpha = 0.05$, based on the test statistics $T_{PD}$, $T_{ZuP}$, and $T_{ZP}$ from the first row to the third row.

1.4 Two-sided and one-sided type I error rate comparisons among the five exact approaches with unbalanced sample size, $n_t = 100$ and $n_c = 50$ at the significance level of $\alpha = 0.05$, based on the test statistics $T_{PD}$, $T_{ZuP}$, and $T_{ZP}$ from the first row to the third row.

1.5 Power comparisons for two-sided and one-sided problems among the five exact approaches with balanced sample size, $n_t = 100$ and $n_c = 100$ at the significance level of $\alpha = 0.05$, under the alternative $p_c = 0.3$ and $p_t = p_c + \theta$, based on the test statistics $T_{PD}$, $T_{ZuP}$, and $T_{ZP}$ from the first row to the third row.

1.6 Power comparisons for two-sided and one-sided problems among the five exact approaches with unbalanced sample size, $n_t = 100$ and $n_c = 50$ at the significance level of $\alpha = 0.05$, under the alternative $p_c = 0.3$ and $p_t = p_c + \theta$, based on the test statistics $T_{PD}$, $T_{ZuP}$, and $T_{ZP}$ from the first row to the third row.

2.1 Plots of the tail probability as a function of the nuisance parameter for the animal carcinogenicity study with $K = 4$ and a sample size of 10 per group.

2.2 Type I error rate for the five exact approaches when $K = 3$ and a sample size of 30 per group at the significance level of $\alpha = 0.05$, the left plot with the score value $d = (0, 1, 2)$, and the right plot with the score value $d = (0, 1, 4)$.

2.3 Power comparison among the five exact approaches when $K = 4$, a sample size of 30 per group, and the score value $d = (0, 1, 2, 3)$.

2.4 Power comparisons among the C approach, the M approach, and the C+M approach, using the $\chi^2$ test with total sample sizes of 25, 50, 100, and 300 from the first row to the fourth row.

LIST OF TABLES

1.1 A 2 × 2 Contingency Table

1.2 A Randomized Placebo-Control Two-Arm Study for Patients with Chronic Noncancer-Related Pain

1.3 Association Between Smoking and UADT Cancer

1.4 Airway Hyper-Responsiveness (AHR) Status Before and After Stem Cell Transplantation (SCT)

2.1 A 2 × K Contingency Table

2.2 Data From the Animal Carcinogenicity Study with $K = 4$

PREFACE

With the development of computational techniques (e.g., super-computers, parallel computing) and statistical software packages (e.g., SAS, R, Stata, StatXact, SPSS, Matlab, PASS), exact statistical inference for categorical data analysis is increasingly available for use in practice. In the cases that traditional asymptotic approaches do not have satisfactory performance with regards to type I error control and accurate sample size determination, exact approaches should be utilized. This book provides an overview of exact approaches, including Fisher’s exact approach, which is also known as the exact conditional approach, and several efficient exact unconditional approaches. Real examples are provided to illustrate the application of these exact approaches, and these approaches are also comprehensively compared in many important statistical problems.

The first chapter reviews the three sampling methods to generate data, then presents the five exact approaches. Data that can be organized in a 2 × 2 contingency table is considered in this chapter. Among the five approaches, one is the exact conditional approach, and the remaining four are unconditional. This book is the first to comprehensively compare the performance of the five exact approaches for data from a binomial comparative study. Chapter 2 deals with data from a 2 × K table by applying the exact approaches. Such data is commonly obtained from a dose-response study, and a genetic study. The last chapter is given to sample size determination based on exact approaches. Power analysis is an essential part of a research proposal, and accurate sample size determination would increase the chance of the proposal being funded and finished in a timely manner.

I thank those who provided valuable comments and helpful suggestions about the book, including anonymous reviewers for this book. I am grateful to Prof. Chris Lloyd from Melbourne Business School for developing the most recent exact approach for categorical data analysis. I also thank Profs. Daniel Young, Weizhen Wang, Changxing Ma, Gregory Wilding, and Alan Hutson for their valuable comments. Finally, I would like to thank my wife, Yanjuan, for her continued support of my academic career, and our parents for taking care of us and our children.

CHAPTER 1

Exact Statistical Inference for a 2 × 2 Table

Exact Statistical Inference for Categorical Data. http://dx.doi.org/10.1016/B978-0-08-100681-8.00001-4

A 2 × 2 contingency table commonly arises in a comparative study (e.g., a study to compare the response rate between two treatments) or a cross-sectional study (e.g., a study to test the association between two dichotomous variables). Barnard [1] was the first to describe three distinct sampling methods to generate a 2 × 2 contingency table; see Table 1.1. They are often termed as a 2 × 2 independent study, a 2 × 2 comparative study, and a double dichotomy study with both marginal totals fixed, one marginal fixed, and no marginal fixed, respectively. In a double dichotomy study with no marginal fixed, the total sample size of a study, $N$, is considered fixed.

Table 1.1 A 2 × 2 Contingency Table

                       Factor A
                   Yes        No         Total
Factor B   Yes     $n_{11}$   $n_{12}$   $m_1$
           No      $n_{21}$   $n_{22}$   $m_2$
           Total   $n_1$      $n_2$      $N$

Independent Study

In the first sampling method for a 2 × 2 independent study, both marginal totals of a 2 × 2 table are considered as fixed, that is, sample sizes $(n_1, n_2)$ for the factor A and $(m_1, m_2)$ for the factor B, are known before the study. One classical example is the experiment described by Fisher [2]: a lady claimed that she could tell whether milk was poured before or after tea in a cup. In this interesting experiment, eight cups were prepared with four of each kind. The lady was informed that among these eight cups, four were prepared with tea first and the remaining four with milk first. After the lady tasted all eight cups, she reported which four cups she thought had milk added first. It is obvious in this experiment that both marginal totals are fixed, with four for each kind on both marginal totals from the truth and the lady’s answer. Such studies are relatively rare in practice due to the fact that subjects in the study were informed about the number of each kind, and for this reason, a 2 × 2 independent study will not be further discussed in detail here.

Comparative Study

A 2 × 2 comparative study involves two independent groups with sample sizes $n_1$ and $n_2$. At the end of the study, the associated number of events (e.g., response, survival) $x_1$ and $x_2$ are observed from the first group and the second group, respectively. It is often interesting to compare the response rate between the two groups. The following Phase III study is used to illustrate the setting of a 2 × 2 comparative study.

Example 1.1. A randomized, placebo-controlled two-arm Phase III clinical trial was conducted to evaluate oral lubiprostone for constipation associated with non-methadone opioids in patients with chronic noncancer-related pain [3]. Patients were randomized into either the treatment group treated with lubiprostone or the placebo group, and they were followed for 12 weeks in the study. The data is displayed in Table 1.2. At the end of the study, $x_1 = 58$ responders out of $n_1 = 214$ were recorded from the treatment group, while $x_2 = 41$ out of $n_2 = 217$ patients were observed from the placebo group.

Table 1.2 A Randomized Placebo-Control Two-Arm Study for Patients with Chronic Noncancer-Related Pain

                    Treatment Group   Placebo Group
Response   Yes            58               41
           No            156              176
           Total         214              217

Source: From Jamal et al. [3], with permission.

The response rates, the primary endpoint, are estimated to be 27.1% and 18.9% for the treatment group and the placebo group, respectively. To compare the response difference between two groups, Pearson’s $\chi^2$ test statistic

$$T_{\chi^2} = \frac{(n_{11} n_{22} - n_{12} n_{21})^2 N}{n_1 n_2 m_1 m_2}$$

is used for testing the null hypothesis $H_0: p_1 = p_2$ against the alternative hypothesis $H_a: p_1 \neq p_2$.

This test can be found in the function prop.test from statistical software R, and the freq procedure from SAS to compare two independent proportions. It should be noted that the $\chi^2$ test statistic is equivalent to the Z test statistic with a pooled variance estimate, which is given as

$$\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}},$$

where $\hat{p}_1 = x_1/n_1$, $\hat{p}_2 = x_2/n_2$, and $\hat{p} = (x_1 + x_2)/(n_1 + n_2)$. It is obvious that the Z test statistic can be applied to a one-sided problem, but the $\chi^2$ test statistic $T_{\chi^2}$ is only used for a two-sided problem.
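The equivalence between the two statistics can be checked numerically: the square of the pooled Z statistic should equal Pearson's $\chi^2$ statistic. The following is a minimal sketch using the counts from Table 1.2; the variable names are chosen here for illustration only.

```python
import math

# Counts from Table 1.2 (columns: treatment, placebo; rows: response yes, no)
n11, n12 = 58, 41
n21, n22 = 156, 176

n1, n2 = n11 + n21, n12 + n22      # column totals (group sizes)
m1, m2 = n11 + n12, n21 + n22      # row totals
N = n1 + n2

# Pearson chi-square statistic for a 2 x 2 table
chi2 = (n11 * n22 - n12 * n21) ** 2 * N / (n1 * n2 * m1 * m2)

# Z statistic with the pooled variance estimate
p1_hat, p2_hat = n11 / n1, n12 / n2
p_hat = (n11 + n12) / N
z = (p1_hat - p2_hat) / math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))

print(f"chi-square = {chi2:.4f}, Z = {z:.4f}, Z^2 = {z * z:.4f}")
```

The sign of Z carries the direction of the difference, which is why it can serve a one-sided problem, while the squared form discards that direction.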

The asymptotic limiting distributions of the $\chi^2$ test and the Z test are often used for statistical inference, and they are appropriate for use in practice only when cell frequencies are large enough. The $\chi^2$ test is not recommended for use when the lowest expected frequency among the four cells is less than 5 [4, 5]. However, Cochran [6] argued that the cutpoint value 5 is chosen arbitrarily, and this cutpoint may be modified when new evidence from data becomes available. For data with small cell frequencies, exact approaches (e.g., Fisher’s exact conditional approach) are generally recommended [2, 7, 8]. Several exact approaches [2, 8–15] will be discussed later in Section 1.1.
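The rule of thumb on expected frequencies is straightforward to check for a given table. The sketch below computes the expected counts under independence, (row total × column total) / N, for the data in Table 1.2; the threshold of 5 is the conventional cutpoint cited above, and the variable names are illustrative.

```python
# Observed counts from Table 1.2 (rows: response yes/no; columns: treatment/placebo)
observed = [[58, 41],
            [156, 176]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under independence: (row total * column total) / N
expected = [[r * c / total for c in col_totals] for r in row_totals]
min_expected = min(min(row) for row in expected)

print("expected counts:", [[round(e, 2) for e in row] for row in expected])
print("smallest expected count:", round(min_expected, 2))
print("rule of thumb (>= 5) satisfied:", min_expected >= 5)
```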

Double Dichotomy Study

In a double dichotomy study, only the total sum is fixed, which is common in a cross-sectional study for testing an association between two dichotomous variables. A sample of size $N$ is drawn from a population, and each member of the sample is classified according to the two dichotomous variables, A and B. For such studies, the row and column totals are not fixed in advance; only the total sum is fixed. One typical example is a cross-sectional study to test the association between smoking and cancer.

Example 1.2. Krishnatreya et al. [16] reported a retrospective study of upper aerodigestive tract (UADT) cancer patients from 2010 to 2011 from the hospital cancer registry. For the $N = 56$ patients documented with the occurrence or presence of synchronous primaries, each patient was asked about his/her smoking status, and was tested whether or not UADT appears at both index and synchronous. Data from this study is presented in Table 1.3. One of the main research questions from this study was to test the association between the smoking history and the presence of UADT synchronous cancers.

Table 1.3 Association Between Smoking and UADT Cancer

                          UADT tumors
                          Yes     No
Smoking history   Yes      45      1
                  No        5      5

Source: From Krishnatreya et al. [16], with permission.

The Pearson $\chi^2$ test was used for testing the association, and the p-value was found to be much less than 0.05. Then, the authors concluded that smoking was significantly associated with the occurrence of synchronous primaries in UADT. They also mentioned that the Yates’ correction was used in the $\chi^2$ test statistic as small expected frequencies were observed from the table.

In addition to the commonly used Pearson $\chi^2$ test statistic, the likelihood ratio $\chi^2$ test, often referred to as the G test, is an alternative to test the hypothesis of independence. Based on the standard maximum likelihood method, the likelihood ratio $\chi^2$ test will be close in results to the Pearson’s $\chi^2$ test for large samples. The likelihood ratio $\chi^2$ test statistic has the form

$$T_{LR} = 2 \sum_{i=1}^{2} \sum_{j=1}^{2} n_{ij} \log \frac{n_{ij} N}{m_i n_j},$$

where $0 \log 0 = 0$. Although these asymptotic tests perform well in the presence of large sample sizes, they could perform poorly in a small sample setting [17].

In addition to the cross-sectional study, a matched-pairs design is another important application of a double dichotomy study. With the total $N$ subjects enrolled in a study, each subject is measured twice, before and after an intervention.

Example 1.3. Bentur et al. [18, p. 847] conducted a study on airway hyper-responsiveness (AHR) status before and after stem cell transplantation (SCT) on 21 patients; see Table 1.4 for the data. The AHR status for each patient is assessed using a methacholine challenge test (MCT) twice, before and after SCT. A positive MCT (AHR yes) is defined by PC20 < 8 mg/ml.

Table 1.4 Airway Hyper-Responsiveness (AHR) Status Before and After Stem Cell Transplantation (SCT)

                          Before SCT
                          AHR yes   AHR no
After SCT   AHR yes          1         7
            AHR no           1        12

Source: From Bentur et al. [18], with permission.

In addition to a matched-pairs study where each subject is measured twice, it could be a study in which each subject is matched with an equivalent from another study. In practice, data from another experiment may already exist or be easy to obtain. Such designs can be used to reduce the influence of possible confounding factors. Traditionally, the $\chi^2$ test and the likelihood ratio test are used for testing the association between two dichotomous variables.

Let $p_{ij} = n_{ij}/N$ be the probability for the $i$-th level of the factor A and the $j$-th level of the factor B. Suppose $p_1 = p_{11} + p_{21}$ and $p_2 = p_{11} + p_{12}$ are the marginal probabilities. The difference between these two proportions is often the parameter of interest, $p_1 - p_2$, or equivalently $p_{21} - p_{12}$. To make a statistical inference for this parameter, the most commonly used test statistic is the McNemar test [19]:

$$T_{MC} = \frac{(n_{21} - n_{12})^2}{n_{21} + n_{12}}.$$

It should be noted that only the off-diagonal numbers, $n_{12}$ and $n_{21}$, from a 2 × 2 table are used in the test statistic, and the diagonal values, $n_{11}$ and $n_{22}$, have no influence on computing the test statistic and the p-value calculation. There has been a long-term debate over whether all values should be used in the test statistic.

1.1 EXACT TESTING PROCEDURES

When sample size in a study is increased from small to large, asymptotic approaches are traditionally used for data analysis. However, the significance value they provide is only an approximation, because the sampling distribution of the test statistic is only approximately equal to the theoretical limiting distribution, for example, a $\chi^2$ distribution or a standard normal distribution. The approximation is inadequate in cases where the total sample size is small, or the expected values for cells in the table are low.
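The asymptotic statistics used in the double dichotomy examples can be computed directly from the table counts. The sketch below evaluates the uncorrected Pearson statistic and the likelihood ratio statistic for Table 1.3, and the McNemar statistic for Table 1.4. The function names are illustrative, and the Pearson value is computed without Yates' correction, so it is not meant to reproduce the value reported by the original study.

```python
import math

def pearson_and_lr(table):
    """Return (Pearson chi-square, likelihood ratio chi-square) for a 2 x 2 table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    pearson = 0.0
    lr = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            obs = table[i][j]
            pearson += (obs - expected) ** 2 / expected
            if obs > 0:                      # convention: 0 * log(0) = 0
                lr += 2 * obs * math.log(obs / expected)
    return pearson, lr

def mcnemar(table):
    """McNemar statistic; uses only the off-diagonal counts n12 and n21."""
    n12, n21 = table[0][1], table[1][0]
    return (n21 - n12) ** 2 / (n21 + n12)

table_1_3 = [[45, 1], [5, 5]]   # smoking history (rows) by UADT tumors (columns)
table_1_4 = [[1, 7], [1, 12]]   # AHR after SCT (rows) by AHR before SCT (columns)

pearson, lr = pearson_and_lr(table_1_3)
t_mc = mcnemar(table_1_4)
print(f"Table 1.3: Pearson = {pearson:.3f}, likelihood ratio = {lr:.3f}")
print(f"Table 1.4: McNemar = {t_mc:.3f}")
```

Both tables have small cells, which is the setting where referring these statistics to a limiting $\chi^2$ distribution is least reliable.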

In discrete data analysis, unsatisfactory type I error control from traditionally used asymptotic approaches has been observed in many statistical problems. In a comparative binomial study, Pearson’s $\chi^2$ test is often associated with an inflated type I error rate, while the $\chi^2$ test based on Yates’ correction [20] is always conservative, with actual type I error rate being less than the nominal level, and often less than half of the nominal level [7, 11, 21–23]. Uncontrolled type I error rate in a study could lead to either under- or overestimated sample size calculation. Several modified $\chi^2$ test statistics were proposed to increase the performance of the Pearson’s $\chi^2$ test, for example, the uncorrected $\chi^2$ test [24] and re-corrected $\chi^2$ test [25]. Uncontrolled type I error occurs not only in a 2 × 2 table, but also in other types of data. For example, in a dose-response study to test a trend for data in a 2 × K table, the traditionally used test statistic, the Cochran-Armitage test [4, 26], always has an inflated type I error rate as the total sample size goes to infinity [27].
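Because the data are discrete, the actual type I error rate of an asymptotic test can be computed exactly by enumerating every possible outcome. The following is a minimal sketch for a comparative binomial study. The design (10 subjects per group), the grid of common response rates, and the treatment of tables with a pooled rate of 0 or 1 as non-rejections are all assumptions made here for illustration, not settings taken from the text.

```python
import math

def binom_pmf(x, n, p):
    """Binomial probability of x events in n trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def pearson_chi2(x1, n1, x2, n2):
    """Pearson chi-square for two binomial samples; 0 when the pooled rate is 0 or 1."""
    m1 = x1 + x2
    m2 = n1 + n2 - m1
    if m1 == 0 or m2 == 0:
        return 0.0
    return (x1 * (n2 - x2) - x2 * (n1 - x1)) ** 2 * (n1 + n2) / (n1 * n2 * m1 * m2)

def actual_size(n1, n2, p, critical=3.841459):
    """Exact rejection probability under H0: p1 = p2 = p (asymptotic 0.05 cutoff)."""
    size = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            if pearson_chi2(x1, n1, x2, n2) >= critical:
                size += binom_pmf(x1, n1, p) * binom_pmf(x2, n2, p)
    return size

# Hypothetical design: 10 subjects per group, common response rate on a coarse grid
n1 = n2 = 10
grid = [k / 100 for k in range(1, 100)]
sizes = [actual_size(n1, n2, p) for p in grid]
worst = max(sizes)
print(f"largest actual type I error rate on the grid: {worst:.4f} "
      f"at p = {grid[sizes.index(worst)]:.2f}")
```

Comparing the printed maximum with the nominal 0.05 shows whether the asymptotic cutoff preserves the level for that particular design.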

In light of the problems of type I error control, procedures based on exact probability calculations may be considered in order to preserve the nominal level of a test. Two basic exact approaches, the conditional approach and the unconditional approach, will be introduced first, followed by another three exact unconditional approaches. To avoid too many mathematical notations and symbols, we use a comparative binomial study to explain the computation of these five exact approaches.

1.1.1 Conditional Approach

For the cases where asymptotic approaches are not adequate (e.g., the total sample size is small, the expected sample size for some cells is too small), exact approaches should be considered to make proper statistical inference. Fisher [2] was among the first to propose an exact approach by fixing both marginal totals to control for any nuisance parameter in the tail probability. This is an exact conditional approach and is referred to as the C approach. This approach was originally developed for analyzing a 2 × 2 table, but it can be applied to a general R × C contingency table. Mehta and Patel [28] developed a network-based algorithm to implement Fisher’s exact approach for different types of categorical data. However, the main application of Fisher’s approach lies in a simple 2 × 2 table.

Generally speaking, a test statistic should be used in conjunction with the conditional approach to determine the tail area. A test statistic is used to order all possible tables in the sample space, all of which have the same marginal totals as the observed table. The limiting distribution of a test statistic and its properties are very important in asymptotic approaches for p-value calculation and efficiency comparison, but not in exact approaches, where the test statistic is only used for the purpose of ordering all tables from the sample space. The probability of each table can be calculated exactly from a hypergeometric distribution whose probability mass function is only based on the values from a table.
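A minimal sketch of this conditional calculation follows. It assumes the standard binomial-coefficient form of the hypergeometric probability for a 2 × 2 table with fixed margins, and it orders tables by the (1,1) cell itself, which is one simple choice of test statistic. The tea-tasting margins come from Fisher's experiment described earlier; the outcome used in the last line is hypothetical, since the text does not report the lady's answers.

```python
import math

def table_prob(n11, n1, n2, m1):
    """Hypergeometric probability of a 2 x 2 table given all marginal totals.

    n1, n2 are the column totals, m1 is the first row total, and n11 is the
    (1,1) cell, which determines the rest of the table.
    """
    return math.comb(n1, n11) * math.comb(n2, m1 - n11) / math.comb(n1 + n2, m1)

def conditional_p_value(n11_obs, n1, n2, m1):
    """One-sided p-value: total probability of tables with n11 >= observed."""
    lo = max(0, m1 - n2)          # smallest feasible value of n11
    hi = min(n1, m1)              # largest feasible value of n11
    return sum(table_prob(k, n1, n2, m1) for k in range(max(lo, n11_obs), hi + 1))

# Tea-tasting layout: 8 cups, 4 of each kind, 4 cups named as milk-first
n1, n2, m1 = 4, 4, 4
probs = [table_prob(k, n1, n2, m1) for k in range(0, 5)]
print("P(n11 = k):", [round(p, 4) for p in probs])
# Hypothetical outcome (not reported in the text): all four cups identified correctly
print("p-value if n11 = 4:", conditional_p_value(4, n1, n2, m1))
```

With both margins fixed, the sample space here contains only five tables, so the p-value can take only a handful of distinct values.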

Under the conditional framework with both marginal totals fixed, the value at the (1,1) cell, $n_{11}$, determines the other three values in a 2 × 2 contingency table. Therefore, we use $n_{11}$ to represent the complete data $(n_{11}, n_{12}, n_{21}, n_{22})$ for simplicity. The probability of a 2 × 2 table as in Table 1.1 under the conditional approach is computed as

$$P_C(n_{11}) = \frac{m_1!\, m_2!\, n_1!\, n_2!}{N!\, n_{11}!\, n_{12}!\, n_{21}!\, n_{22}!}. \tag{1.1}$$

Then, the p-value is computed by adding the probabilities of the given table and other more extreme tables. For example, in a one-sided hypothesis problem that rejects the null hypothesis with a large test statistic, all the tables with test statistic values larger than or equal to that of the given table are in the rejection region, and their probabilities are added together to compute the p-value. Although the assumption for the limiting distribution of a test statistic is not needed in exact approaches, some assumptions related to a study itself must be satisfied, for example, the independence assumption of participants. These assumptions can be checked from the study.

Fisher’s exact approach has been applied to many statistical problems, such as an association test between two categorical variables, a trend test with binary endpoints [14], and proportion comparison for binary clustered data [29]. While some theorists argued that Fisher’s exact test can only be applied to a study that was originally designed with both marginal totals fixed, Fisher’s idea is actually a general approach, and it has been applied to many studies whose marginal totals are not fixed. As mentioned above, a study with both marginal totals fixed is rarely found in practice. One frequent use of Fisher’s exact approach is where the traditionally used $\chi^2$ test cannot be applied due to a relatively small expected frequency. When this assumption of the $\chi^2$ test is not satisfied, the exact conditional
