
Exact Statistical Inference for Categorical Data

Guogen Shan

University of Nevada, Las Vegas, NV, USA

2.2 Type I error rate for the five exact approaches when K = 3 and a sample size of 30 per group at the significance level of α = 0.05, the left plot with the score value d = (0, 1, 2), and the right plot with the score value d

2.3 Power comparison among the five exact approaches when K = 4, a sample size of 30 per group, and the score value

2.4 Power comparisons among the C approach, the M approach, and the C+M approach, using the χ2 test with total sample sizes of 25, 50, 100, and 300 from the first row to the fourth row

Table 1.1 A 2 × 2 Contingency Table

rate between the two groups. The following Phase III study is used to illustrate the setting of a 2 × 2 comparative study.

Example 1.1. A randomized, placebo-controlled, two-arm Phase III clinical trial was conducted to evaluate oral lubiprostone for constipation associated with non-methadone opioids in patients with chronic noncancer-related pain [3]. Patients were randomized into either the treatment group treated with lubiprostone or the placebo group, and they were followed for 12 weeks in the study. The data are displayed in Table 1.2. At the end of the study, $x_1 = 58$ responders out of $n_1 = 214$ were recorded from the treatment group, while $x_2 = 41$ responders out of $n_2 = 217$ patients were observed from the placebo group.

The response rates, the primary endpoint, are estimated to be 27.1% and 18.9% for the treatment group and the placebo group, respectively. To compare the response difference between two groups, Pearson's $\chi^2$ test statistic

Table 1.2 A Randomized Placebo-Control Two-Arm Study for Patients with Chronic Noncancer-Related Pain

Source: From Jamal et al. [3], with permission.

$H_a : p_1 \neq p_2$.

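This test can be found in the function prop.test from the statistical software R, and the freq procedure from SAS, to compare two independent proportions. It should be noted that the $\chi^2$ test statistic is equivalent to the Z test statistic with a pooled variance estimate, which is given as

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}},$$

where $\hat{p}_1 = x_1/n_1$, $\hat{p}_2 = x_2/n_2$, and $\hat{p} = (x_1 + x_2)/(n_1 + n_2)$. It is obvious that the Z test statistic can be applied to a one-sided problem, but the $\chi^2$ test statistic $T_{\chi^2}$ is only used for a two-sided problem.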
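The equivalence between the two statistics can be checked numerically on the data of Example 1.1. The following is a minimal sketch, assuming the standard textbook forms of the pooled Z statistic and of Pearson's $\chi^2$ statistic without continuity correction; the function names are illustrative and are not taken from R or SAS.

```python
import math

def pooled_z(x1, n1, x2, n2):
    """Z statistic for two independent proportions with a pooled variance estimate."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def pearson_chi2(x1, n1, x2, n2):
    """Pearson chi-square statistic (no continuity correction) for the 2 x 2 table."""
    total = n1 + n2
    responders = x1 + x2
    observed = [x1, n1 - x1, x2, n2 - x2]
    expected = [
        n1 * responders / total, n1 * (total - responders) / total,
        n2 * responders / total, n2 * (total - responders) / total,
    ]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Data of Example 1.1: 58/214 responders (treatment) and 41/217 (placebo).
z = pooled_z(58, 214, 41, 217)
chi2 = pearson_chi2(58, 214, 41, 217)
print(f"Z = {z:.4f}, Z^2 = {z * z:.4f}, chi-square = {chi2:.4f}")
```

The squared Z statistic and the $\chi^2$ statistic agree up to floating-point error. Note that prop.test in R applies a continuity correction by default, so its reported statistic matches this value only when the correction is switched off.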

The asymptotic limiting distributions of the $\chi^2$ test and the Z test are often used for statistical inference, and they are appropriate for use in practice only when cell frequencies are large enough. The $\chi^2$ test is not recommended for use when the lowest expected frequency among the four cells is less than 5 [4, 5]. However, Cochran [6] argued that the cutpoint value 5 is chosen arbitrarily, and this cutpoint may be modified when new evidence from data becomes available. For data with small cell frequencies, exact approaches (e.g., Fisher's exact conditional approach) are generally recommended [2, 7, 8]. Several exact approaches [2, 8–15] will be discussed later in Section 1.1.

Double Dichotomy Study

In a double dichotomy study, only the total sum is fixed, which is common in a cross-sectional study for testing an association between two dichotomous variables. A sample of size N is drawn from a population, and each member of the sample is classified according to the two dichotomous variables, A and B. For such studies, the row and column totals are not fixed in advance; only the total sum is fixed. One typical example is a cross-sectional study to test the association between smoking and cancer.

Example 1.2. Krishnatreya et al. [16] reported a retrospective study of upper aerodigestive tract (UADT) cancer patients from 2010 to 2011 from the hospital cancer registry. For the N = 56 patients documented with the occurrence or presence of synchronous primaries, each patient was asked about his/her smoking status, and was tested whether or not UADT appears at both index and synchronous. Data from this study are presented in Table 1.3. One of the main research questions from this study was to test the association between the smoking history and the presence of UADT synchronous cancers.

Table 1.4 Airway

Source: From Bentur et al. [18], with permission.

In addition to a matched-pairs study where each subject is measured twice, it could be a study in which each subject is matched with an equivalent from another study. In practice, data from another experiment often already exist or are easy to obtain. Such designs can be used to reduce the influence of possible confounding factors. Traditionally, the $\chi^2$ test and the likelihood ratio test are used for testing the association between two dichotomous variables.

Let $p_{ij} = n_{ij}/N$ be the probability for the i-th level of the factor A and the j-th level of the factor B. Suppose $p_1 = p_{11} + p_{21}$ and $p_2 = p_{11} + p_{12}$ are the marginal probabilities. The difference between these two proportions is often the parameter of interest, $p_1 - p_2$, or equivalently $p_{21} - p_{12}$. To make a statistical inference for this parameter, the most commonly used test statistic is the McNemar test [19]:

$$T_{MC} = \frac{(n_{21} - n_{12})^2}{n_{21} + n_{12}}.$$

It should be noted that only the off-diagonal numbers, $n_{12}$ and $n_{21}$, from a 2 × 2 table are used in the test statistic, and the diagonal values, $n_{11}$ and $n_{22}$, have no influence on computing the test statistic and the p-value calculation. There has been a long-term debate over whether all values should be used in the test statistic.
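The role of the off-diagonal counts can be illustrated with a short sketch. The table counts below are hypothetical (they are not the data of Table 1.4), the function names are illustrative, and the asymptotic p-value assumes the usual $\chi^2$ reference distribution with one degree of freedom.

```python
import math

def mcnemar_statistic(table):
    """McNemar statistic for a 2 x 2 matched-pairs table [[n11, n12], [n21, n22]]."""
    n12, n21 = table[0][1], table[1][0]
    if n12 + n21 == 0:
        raise ValueError("no discordant pairs: the statistic is undefined")
    return (n21 - n12) ** 2 / (n21 + n12)

def chi2_1df_upper_tail(statistic):
    """P(X >= statistic) for a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))

# Hypothetical counts: changing the diagonal cells leaves the statistic unchanged.
table_a = [[10, 3], [9, 20]]
table_b = [[40, 3], [9, 1]]
stat_a = mcnemar_statistic(table_a)
stat_b = mcnemar_statistic(table_b)
print(stat_a, stat_b, chi2_1df_upper_tail(stat_a))
```

Both tables share the same discordant counts, so they yield the same statistic and the same asymptotic p-value, which is the property discussed above.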

1.1 EXACT TESTING PROCEDURES

When sample size in a study is increased from small to large, asymptotic approaches are traditionally used for data analysis. However, the significance value they provide is only an approximation, because the sampling distribution of the test statistic is only approximately equal to the theoretical limiting distribution, for example, a $\chi^2$ distribution or a standard normal distribution. The approximation is inadequate in cases where the total sample size is small, or the expected values for cells in the table are low.

In discrete data analysis, unsatisfactory type I error control from traditionally used asymptotic approaches has been observed in many statistical problems. In a comparative binomial study, Pearson's $\chi^2$ test is often associated with an inflated type I error rate, while the $\chi^2$ test based on Yates' correction [20] is always conservative, with the actual type I error rate being less than the nominal level, and often less than half of the nominal level [7, 11, 21–23]. An uncontrolled type I error rate in a study could lead to either an under- or overestimated sample size calculation. Several modified $\chi^2$ test statistics were proposed to improve the performance of Pearson's $\chi^2$ test, for example, the uncorrected $\chi^2$ test [24] and the re-corrected $\chi^2$ test [25]. Uncontrolled type I error occurs not only in a 2 × 2 table, but also in other types of data. For example, in a dose-response study to test a trend for data in a 2 × K table, the traditionally used test statistic, the Cochran-Armitage test [4, 26], always has an inflated type I error rate as the total sample size goes to infinity [27].
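The actual type I error rate of an asymptotic test can be computed exactly by enumerating the sample space. The sketch below does this for the two-sided Pearson $\chi^2$ test in a comparative binomial study; the sample sizes, the grid of values for the common proportion, and the function names are illustrative assumptions, and no particular result from the cited references is reproduced.

```python
import math

# Critical value of the chi-square distribution with 1 df at the 0.05 level,
# obtained as the square of the 0.975 standard normal quantile.
CRITICAL_VALUE = 1.959963984540054 ** 2

def binom_pmf(x, n, p):
    """Binomial probability of x successes out of n with success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def pearson_chi2(x1, n1, x2, n2):
    """Pearson chi-square statistic; set to 0 when a margin is empty."""
    total, responders = n1 + n2, x1 + x2
    if responders in (0, total):
        return 0.0
    cross = x1 * (n2 - x2) - x2 * (n1 - x1)
    return total * cross ** 2 / (n1 * n2 * responders * (total - responders))

def actual_size(n1, n2, p):
    """Probability of rejecting H0 when both groups share the response rate p."""
    return sum(
        binom_pmf(a, n1, p) * binom_pmf(b, n2, p)
        for a in range(n1 + 1)
        for b in range(n2 + 1)
        if pearson_chi2(a, n1, b, n2) >= CRITICAL_VALUE
    )

# Scan the nuisance parameter on a grid for a hypothetical design with 10 per group.
grid = [i / 100 for i in range(1, 100)]
worst_p = max(grid, key=lambda p: actual_size(10, 10, p))
print(worst_p, actual_size(10, 10, worst_p))
```

Comparing the printed maximum with the nominal level 0.05 shows whether the asymptotic test is liberal or conservative for the chosen design.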

In light of the problems of type I error control, procedures based on exact probability calculations may be considered in order to preserve the nominal level of a test. Two basic exact approaches, the conditional approach and the unconditional approach, will be introduced first, followed by another three exact unconditional approaches. To avoid too many mathematical notations and symbols, we use a comparative binomial study to explain the computation of these five exact approaches.

1.1.1 Conditional Approach

For the cases where asymptotic approaches are not adequate (e.g., the total sample size is small, the expected sample size for some cells is too small), exact approaches should be considered to make proper statistical inference. Fisher [2] was among the first to propose an exact approach by fixing both marginal totals to control for any nuisance parameter in the tail probability. This is an exact conditional approach and is referred to as the C approach. This approach was originally developed for analyzing a 2 × 2 table, but it can be applied to a general R × C contingency table. Mehta and Patel [28] developed a network-based algorithm to implement Fisher's exact approach for different types of categorical data. However, the main application of Fisher's approach lies in a simple 2 × 2 table.
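For a 2 × 2 table, conditioning on both margins reduces the null distribution to a hypergeometric distribution, which can be evaluated directly. The following is a minimal sketch of that calculation; the function name and the choice of ordering the two-sided tail by table probability are assumptions made for illustration, and this is not the network algorithm of Mehta and Patel.

```python
import math
from fractions import Fraction

def fisher_exact(x1, n1, x2, n2):
    """Exact conditional p-values for x1/n1 versus x2/n2 with both margins fixed.

    Returns (one_sided, two_sided) as exact fractions. The one-sided value is
    P(X1 >= x1 | X1 + X2 = x1 + x2); the two-sided value sums all tables whose
    conditional probability does not exceed that of the observed table.
    """
    s = x1 + x2
    lo, hi = max(0, s - n2), min(n1, s)
    # Unnormalized hypergeometric weights kept as integers to avoid rounding ties.
    weights = {k: math.comb(n1, k) * math.comb(n2, s - k) for k in range(lo, hi + 1)}
    denom = math.comb(n1 + n2, s)
    one_sided = Fraction(sum(w for k, w in weights.items() if k >= x1), denom)
    two_sided = Fraction(sum(w for w in weights.values() if w <= weights[x1]), denom)
    return one_sided, two_sided

# Data of Example 1.1: 58/214 responders versus 41/217 responders.
one_sided, two_sided = fisher_exact(58, 214, 41, 217)
print(float(one_sided), float(two_sided))
```

Working with integer weights and exact fractions keeps the comparison of table probabilities free of floating-point ties, at the cost of large intermediate integers.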

1.1.2 Unconditional Approach Based on Maximization

The exact conditional approach is the alternative to traditional asymptotic approaches when asymptotic approaches do not control for the type I error rate. However, the conditional approach is often criticized for being conservative from an unconditional framework, which is often based on results from studies with small sample sizes. The conservativeness of the exact conditional approach has been discussed in many research articles. Andrés and Tejedor [32] compared the conditional approach and the unconditional approach for binomial comparative studies under one-sided and two-sided alternatives with various sample size ratios between two groups. They found that the loss of power from the C approach, as compared with the exact unconditional approach, is often slight. Later, Crans and Shuster [33] continued the debate on which exact approach is the most powerful and the most appropriate for use in binomial comparative studies. The results indicated that the C approach is indeed conservative, as the actual significance level is less than 0.035 for a sample size up to 50 at the nominal level of 0.05.

To address the conservativeness of the exact conditional approach due to the small size of the sample space, Barnard [34] was the first to propose an unconditional approach where only the column totals ($n_1$ and $n_2$ as in Table 1.1) are fixed, for testing the hypotheses $H_0 : p_1 = p_2$ against $H_a : p_1 \neq p_2$. This study belongs to the comparative study mentioned before. The null hypothesis states that both groups have the same response rate, that is, $p_1 = p_2 = p$, which is a nuisance parameter in the table probability, specifically,

$$P(x \mid p) = b(n_{11}, n_1, p)\, b(n_{12}, n_2, p),$$

where $b(x, y, z) = \binom{y}{x} z^x (1 - z)^{y - x}$ is the probability mass function of a binomial distribution.

Before computing the exact unconditional p-value, the tail area needs to be determined, and a test statistic is often used in this procedure for ordering the sample space. For a given test statistic T, such as the $\chi^2$ test, the likelihood ratio test, or the score test, the tail area is calculated as $R_T(x^*) = \{x : T(x) \ge T(x^*)\}$, where $x = (n_{11}, n_{12}, n_{21}, n_{22})$ is a data point, and $x^*$ is the observed data. In a binomial comparative study, $x$ is equivalent to $(n_{11}, n_{12})$ as the column totals $n_1$ and $n_2$ are given. The unconditional p-value is then computed as

$$P_M(x^*) = \max_{p \in [0, 1]} \sum_{x \in R_T(x^*)} b(n_{11}, n_1, p)\, b(n_{12}, n_2, p).$$

Figure 1.1 Tail probability plots for a binomial comparative study based on three test statistics (panels: proportion difference, Z-unpooled, Z-pooled).

The monotonicity property is satisfied for all three test statistics [36]. Therefore, the unconditional p-value based on the M approach is obtained from the boundary of the null space, which is the common probability of the two groups, p.

In the M approach, the first step is to determine the tail area for the given data based on a test statistic. Different test statistics may lead to a different tail area. The tail probability curve is drawn as a function of the nuisance parameter, p. Figure 1.1 presents the three tail probability plots as a function of p based on the three test statistics. It can be seen that the plot based on the $T_{PD}$ is very different from those based on the two Z test statistics. The final p-value is computed as the maximum of each curve: 0.044, 0.022, and 0.022 based on the test statistic $T_{PD}$, $T_{ZuP}$, and $T_{ZP}$, respectively. These maximum values are found when p = 0.588, 0.803, and 0.602, respectively, and they are marked in the figure with big dots. The R package, Exact, can be used to compute these M p-values. It is noticeable that the tail probability curve is not smooth, with multiple local spikes as seen in the plots. The traditional
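The two steps of the M approach, building the tail area and maximizing the tail probability over the nuisance parameter, can be sketched as follows. This is an illustration under stated assumptions rather than the algorithm used by the Exact package: it uses the pooled Z statistic with a one-sided alternative $p_1 > p_2$, a plain grid search over p (which can miss a narrow spike between grid points), and hypothetical data with 10 subjects per group.

```python
import math

def binom_pmf(x, n, p):
    """Binomial probability of x successes out of n with success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def z_pooled(x1, n1, x2, n2):
    """Pooled Z statistic; set to 0 when the pooled estimate is 0 or 1."""
    pooled = (x1 + x2) / (n1 + n2)
    if pooled in (0.0, 1.0):
        return 0.0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def m_p_value(x1, n1, x2, n2, grid_size=1000):
    """Grid approximation of the M p-value and the p at which it is attained."""
    t_obs = z_pooled(x1, n1, x2, n2)
    # Step 1: tail area, all tables at least as extreme as the observed one.
    tail = [
        (a, b)
        for a in range(n1 + 1)
        for b in range(n2 + 1)
        if z_pooled(a, n1, b, n2) >= t_obs - 1e-12
    ]
    # Step 2: maximize the tail probability over the common proportion p.
    best_value, best_p = 0.0, 0.0
    for i in range(1, grid_size):
        p = i / grid_size
        pmf1 = [binom_pmf(a, n1, p) for a in range(n1 + 1)]
        pmf2 = [binom_pmf(b, n2, p) for b in range(n2 + 1)]
        value = sum(pmf1[a] * pmf2[b] for a, b in tail)
        if value > best_value:
            best_value, best_p = value, p
    return best_value, best_p

# Hypothetical data: 7/10 responders versus 2/10 responders.
print(m_p_value(7, 10, 2, 10))
```

Because the tail probability curve can have several local spikes, a finer grid or a local refinement around the best grid point may be needed in practice.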

$$P_{C+M}(x^*) = \max_{p \in [0, 1]} \sum_{x \in R_C(x^*)} P(x \mid p),$$

where $R_C(x^*) = \{x : P_C(x) \le P_C(x^*)\}$ is the tail area, and $P_C$ is the C p-value.

When $\lambda = \alpha$, $P(P_C \le \lambda) \le \alpha$ is always true as from the C approach. It follows that $\lambda \ge \alpha$. Therefore, the C+M approach is at least as powerful as the C approach.

The C+M approach has not been widely used in practice, possibly because of the confusion that comes from using the C p-value as the test statistic. The C p-value is often used as the p-value for statistical inference. The C+M approach was shown to be uniformly more powerful than the C approach in a binomial comparative study [40], and it was recommended for use. For the binomial comparative Example 1.1, the p-value based on the C+M approach is 0.023, which leads to the same conclusion as others. The C+M p-value may be computed from the R package, Exact.
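The idea of using the C p-value as the ordering statistic can be sketched in a few lines. The code below is an illustration under stated assumptions, not the implementation in the Exact package: it uses the one-sided Fisher p-value for the alternative $p_1 > p_2$, a grid search over the common proportion, and hypothetical data with 10 subjects per group.

```python
import math
from fractions import Fraction

def binom_pmf(x, n, p):
    """Binomial probability of x successes out of n with success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def fisher_one_sided(x1, n1, x2, n2):
    """Exact conditional P(X1 >= x1 | X1 + X2 = x1 + x2), returned as a fraction."""
    s = x1 + x2
    upper = min(n1, s)
    numerator = sum(
        math.comb(n1, k) * math.comb(n2, s - k) for k in range(x1, upper + 1)
    )
    return Fraction(numerator, math.comb(n1 + n2, s))

def cm_p_value(x1, n1, x2, n2, grid_size=1000):
    """Grid approximation of the C+M p-value."""
    p_obs = fisher_one_sided(x1, n1, x2, n2)
    # Tail area: tables whose C p-value does not exceed the observed C p-value.
    tail = [
        (a, b)
        for a in range(n1 + 1)
        for b in range(n2 + 1)
        if fisher_one_sided(a, n1, b, n2) <= p_obs
    ]
    best = 0.0
    for i in range(1, grid_size):
        p = i / grid_size
        pmf1 = [binom_pmf(a, n1, p) for a in range(n1 + 1)]
        pmf2 = [binom_pmf(b, n2, p) for b in range(n2 + 1)]
        best = max(best, sum(pmf1[a] * pmf2[b] for a, b in tail))
    return best

# Hypothetical data: 7/10 responders versus 2/10 responders.
c_value = float(fisher_one_sided(7, 10, 2, 10))
cm_value = cm_p_value(7, 10, 2, 10)
print(c_value, cm_value)
```

In this sketch the C+M p-value never exceeds the C p-value for the same table, which is consistent with the C+M approach being at least as powerful as the C approach.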

1.1.5 Unconditional Approach Based on Estimation and Maximization

The exact unconditional M approach could be computationally intensive when multiple nuisance parameters are present. For this reason, Liddell [48] was the first to propose an approach by computing the exact distribution of the proportional difference of two independent binomial distributions at a single point, the maximum likelihood estimate (MLE) for the common proportion under the null hypothesis. In Liddell's approach, one only needs to find the exact distribution at one point instead of the whole parameter space as in the M approach. It is computationally easy to obtain the p-value. Later, Storer and Kim [49] extended Liddell's approach to other commonly used test statistics. This approach is often called the approximate unconditional approach. Since the nuisance parameter in the table probability is replaced by an estimate of the parameter, this approach is referred to as the E approach.

If the null hypothesis is rejected for a large test statistic, then the tail area based on the test statistic, T, is computed as $R_T(x^*) = \{x : T(x) \ge T(x^*)\}$. It is often the case that a test statistic has a closed formula, and therefore it is computationally easy. For each data point in the tail area, its probability is a function of the nuisance parameter. In a binomial comparative study, the E p-value is computed as

$$P_E(x^*) = \sum_{x \in R_T(x^*)} b(n_{11}, n_1, \hat{p})\, b(n_{12}, n_2, \hat{p}),$$

where $\hat{p} = (n_{11} + n_{12})/(n_1 + n_2)$ is the MLE of the common proportion under the null hypothesis. The tail area does not depend on the MLE value. The E approach is a general approach, and it has been applied to other studies [50, 51].
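The E p-value requires only one evaluation of the tail probability, at the estimated common proportion. The following is a minimal sketch, assuming the pooled Z statistic with a one-sided alternative $p_1 > p_2$ as the ordering statistic; the function names and the data are hypothetical.

```python
import math

def binom_pmf(x, n, p):
    """Binomial probability of x successes out of n with success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def z_pooled(x1, n1, x2, n2):
    """Pooled Z statistic; set to 0 when the pooled estimate is 0 or 1."""
    pooled = (x1 + x2) / (n1 + n2)
    if pooled in (0.0, 1.0):
        return 0.0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def e_p_value(x1, n1, x2, n2):
    """Tail probability evaluated at the MLE of the common proportion."""
    t_obs = z_pooled(x1, n1, x2, n2)
    p_hat = (x1 + x2) / (n1 + n2)  # MLE under H0, from the observed table
    pmf1 = [binom_pmf(a, n1, p_hat) for a in range(n1 + 1)]
    pmf2 = [binom_pmf(b, n2, p_hat) for b in range(n2 + 1)]
    return sum(
        pmf1[a] * pmf2[b]
        for a in range(n1 + 1)
        for b in range(n2 + 1)
        if z_pooled(a, n1, b, n2) >= t_obs - 1e-12
    )

# Hypothetical data: 7/10 responders versus 2/10 responders.
print(e_p_value(7, 10, 2, 10))
```

Since the nuisance parameter is fixed at a single estimated value, no search over p is involved, which is what makes this calculation inexpensive.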

The E p-value only needs to evaluate the tail probability at a single point. For this reason, this approach was attractive in the days when computational resources were a problem for most practitioners. The E approach guarantees the test size at a single estimated value, but not for all the possible values of the nuisance parameter. For this reason, the E approach is not exact.

Lloyd [13, 31, 50, 52–55] proposed a new approach for the p-value calculation based on estimation and maximization. The estimation step is used to obtain a flatter p-value plot, and the maximization step is used to guarantee the nominal level. The p-value plot based on a test statistic in the M approach is generally erratic, and it is computationally difficult to search for the global maximum, especially for the case with multiple spikes. It was shown in Lloyd [52] that the p-value plot by using the E p-value as the test statistic tends to have a much flatter plot. This important step may allow one to avoid the situation where the maximum of the tail probability is obtained from unlikely values of the nuisance parameter, such as the values outside of a confidence interval. Although the E p-value is only approximately valid, the following maximization step makes the approach exact with the type I error rate guaranteed. The approach is referred to as the E+M approach. The E+M approach has been applied to many important statistical problems [15, 45, 50, 56–61].
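The estimation and maximization steps can be combined in a short sketch. The code below is an illustration under stated assumptions, not Lloyd's own implementation: the E p-value is built from the pooled Z statistic with a one-sided alternative $p_1 > p_2$, the maximization uses a plain grid over the common proportion, and the data are hypothetical.

```python
import math

def binom_pmf(x, n, p):
    """Binomial probability of x successes out of n with success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def z_pooled(x1, n1, x2, n2):
    """Pooled Z statistic; set to 0 when the pooled estimate is 0 or 1."""
    pooled = (x1 + x2) / (n1 + n2)
    if pooled in (0.0, 1.0):
        return 0.0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def e_p_value(x1, n1, x2, n2):
    """Estimation step: tail probability at the MLE of the common proportion."""
    t_obs = z_pooled(x1, n1, x2, n2)
    p_hat = (x1 + x2) / (n1 + n2)
    pmf1 = [binom_pmf(a, n1, p_hat) for a in range(n1 + 1)]
    pmf2 = [binom_pmf(b, n2, p_hat) for b in range(n2 + 1)]
    return sum(
        pmf1[a] * pmf2[b]
        for a in range(n1 + 1)
        for b in range(n2 + 1)
        if z_pooled(a, n1, b, n2) >= t_obs - 1e-12
    )

def em_p_value(x1, n1, x2, n2, grid_size=1000):
    """Maximization step with the E p-value used as the ordering statistic."""
    tables = [(a, b) for a in range(n1 + 1) for b in range(n2 + 1)]
    # The E p-value must be computed for every table in the sample space.
    e_values = {t: e_p_value(t[0], n1, t[1], n2) for t in tables}
    e_obs = e_values[(x1, x2)]
    tail = [t for t in tables if e_values[t] <= e_obs + 1e-12]
    best = 0.0
    for i in range(1, grid_size):
        p = i / grid_size
        pmf1 = [binom_pmf(a, n1, p) for a in range(n1 + 1)]
        pmf2 = [binom_pmf(b, n2, p) for b in range(n2 + 1)]
        best = max(best, sum(pmf1[a] * pmf2[b] for a, b in tail))
    return best

# Hypothetical data: 7/10 responders versus 2/10 responders.
print(em_p_value(7, 10, 2, 10))
```

The dictionary of E p-values is the costly part of the calculation, because one E p-value is needed for each table in the sample space; each entry is independent of the others, so this step is a natural candidate for parallel evaluation.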

The estimation step could be computationally difficult with a large sample space, since the E p-value for each data point needs to be computed. Parallel computing is a useful tool to reduce the computational time significantly by computing the E p-values at the same time for all tables. Several R packages have been developed for parallel computing, for example, multicore and parallel. For a study with a small to moderate sample size, a personal computer may be sufficient to serve this purpose.

The E p-value is used as a test statistic in the E+M approach to find the tail area including the tables whose E p-values are less than or equal to that

Figure 1.3, Continued (one-sided problem; vertical axis: type I error rate, from 0.00 to 0.06; horizontal axis: p, from 0.0 to 1.0).