Statistical Methods for Survival Data Analysis
Third Edition

ELISA T. LEE
JOHN WENYU WANG

Department of Biostatistics and Epidemiology and Center for American Indian Health Research
College of Public Health
University of Oklahoma Health Sciences Center
Oklahoma City, Oklahoma
Copyright © 2003 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, e-mail: permreq@wiley.com.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002. Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic format.
Library of Congress Cataloging-in-Publication Data:
Lee, Elisa T.
Statistical methods for survival data analysis.--3rd ed. / Elisa T. Lee and John Wenyu Wang.
p. cm.--(Wiley series in probability and statistics)
Includes bibliographical references and index.
ISBN 0-471-36997-7 (cloth : alk. paper)
1. Medicine--Research--Statistical methods. 2. Failure time data analysis. 3. Prognosis--Statistical methods. I. Wang, John Wenyu. II. Title. III. Series.
R853.S7 L43 2003
610'.72--dc21 2002027025
Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1
To the memory of our parents
Mr. Chi-Lan Tan and Mrs. Hwei-Chi Lee Tan (E.T.L.)
Mr. Beijun Zhang and Mrs. Xiangyi Wang (J.W.W.)
Preface

1 Introduction
1.1 Preliminaries
1.2 Censored Data
1.3 Scope of the Book
Bibliographical Remarks

2 Functions of Survival Time
2.1 Definitions
2.2 Relationships of the Survival Functions
Bibliographical Remarks
Exercises

3 Examples of Survival Data Analysis
3.1 Example 3.1: Comparison of Two Treatments and Three Diets
3.2 Example 3.2: Comparison of Two Survival Patterns Using Life Tables
3.3 Example 3.3: Fitting Survival Distributions to Remission Data
3.4 Example 3.4: Relative Mortality and Identification of Prognostic Factors
3.5 Example 3.5: Identification of Risk Factors
Bibliographical Remarks
Exercises

4 Nonparametric Methods of Estimating Survival Functions
4.1 Product-Limit Estimates of Survivorship Function
4.2 Life-Table Analysis
4.3 Relative, Five-Year, and Corrected Survival Rates
4.4 Standardized Rates and Ratios
Bibliographical Remarks
Exercises

5 Nonparametric Methods for Comparing Survival Distributions
5.1 Comparison of Two Survival Distributions
5.2 Mantel-Haenszel Test
5.3 Comparison of K (K > 2) Samples
Bibliographical Remarks
Exercises

6 Some Well-Known Parametric Survival Distributions and Their Applications
6.1 Exponential Distribution
6.2 Weibull Distribution
6.3 Lognormal Distribution
6.4 Gamma and Generalized Gamma Distributions
6.5 Log-Logistic Distribution
6.6 Other Survival Distributions
Bibliographical Remarks
Exercises

7 Estimation Procedures for Parametric Survival Distributions without Covariates
7.1 General Maximum Likelihood Estimation Procedure
7.2 Exponential Distribution
7.3 Weibull Distribution
7.4 Lognormal Distribution
7.5 Standard and Generalized Gamma Distributions
7.6 Log-Logistic Distribution
7.7 Other Parametric Survival Distributions
Bibliographical Remarks
Exercises

8 Graphical Methods for Survival Distribution Fitting
8.1 Introduction
8.2 Probability Plotting
8.3 Hazard Plotting
8.4 Cox-Snell Residual Method
Bibliographical Remarks
Exercises

9 Tests of Goodness of Fit and Distribution Selection
9.1 Goodness-of-Fit Test Statistics Based on Asymptotic Likelihood Inferences
9.2 Tests for Appropriateness of a Family of Distributions
9.3 Selection of a Distribution Using BIC or AIC Procedures
9.4 Tests for a Specific Distribution with Known Parameters
9.5 Hollander and Proschan's Test for Appropriateness of a Given Distribution with Known Parameters
Bibliographical Remarks
Exercises

10 Parametric Methods for Comparing Two Survival Distributions
10.1 Likelihood Ratio Test for Comparing Two Survival Distributions
10.2 Comparison of Two Exponential Distributions
10.3 Comparison of Two Weibull Distributions
10.4 Comparison of Two Gamma Distributions
Bibliographical Remarks
Exercises

11 Parametric Methods for Regression Model Fitting and Identification of Prognostic Factors
11.1 Preliminary Examination of Data
11.2 General Structure of Parametric Regression Models and Their Asymptotic Likelihood Inference
11.3 Exponential Regression Model
11.4 Weibull Regression Model
11.5 Lognormal Regression Model
11.6 Extended Generalized Gamma Regression Model
11.7 Log-Logistic Regression Model
11.8 Other Parametric Regression Models
11.9 Model Selection Methods
Bibliographical Remarks
Exercises

12 Identification of Prognostic Factors Related to Survival Time: Cox Proportional Hazards Model
12.1 Partial Likelihood Function for Survival Times
12.2 Identification of Significant Covariates
12.3 Estimation of the Survivorship Function with Covariates
12.4 Adequacy Assessment of the Proportional Hazards Model
Bibliographical Remarks
Exercises

13 Identification of Prognostic Factors Related to Survival Time: Nonproportional Hazards Models
13.1 Models with Time-Dependent Covariates
13.2 Stratified Proportional Hazards Models
13.3 Competing Risks Model
13.4 Recurrent Events Models
13.5 Models for Related Observations
Bibliographical Remarks
Exercises

14 Identification of Risk Factors Related to Dichotomous and Polychotomous Outcomes
14.1 Univariate Analysis
14.2 Logistic and Conditional Logistic Regression Models for Dichotomous Responses
14.3 Models for Polychotomous Outcomes
Bibliographical Remarks
Exercises
Preface

Statistical methods for survival data analysis have continued to flourish in the last two decades. Applications of the methods have been widened from their historical use in cancer and reliability research to business, criminology, epidemiology, and social and behavioral sciences. The third edition of Statistical Methods for Survival Data Analysis is intended to provide a comprehensive introduction to the most commonly used methods for analyzing survival data. It begins with basic definitions and interpretations of survival functions. From there, the reader is guided through methods, parametric and nonparametric, for estimating and comparing these functions and the search for a theoretical distribution (or model) to fit the data. Parametric and nonparametric approaches to the identification of prognostic factors that are related to survival are then discussed. Finally, regression methods, primarily linear logistic regression models, to identify risk factors for dichotomous and polychotomous outcomes are introduced.

The third edition continues to be application-oriented, with a minimum level of mathematics. In a few chapters, some knowledge of calculus and matrix algebra is needed. The few sections that introduce the general mathematical structure for the methods can be skipped without loss of continuity. A large number of practical examples are given to assist the reader in understanding the methods and applications and in interpreting the results. Readers with only college algebra should find the book readable and understandable.

There are many excellent books on clinical trials. We therefore have deleted the two chapters on the subject that were in the second edition. Instead, we have included discussions of more statistical methods for survival data analysis. A brief summary of the improvements made for the third edition is given below.
1. Two additional distributions, the log-logistic distribution and a generalized gamma distribution, have been added to the application of parametric models that can be used in model fitting and prognostic factor identification (Chapters 6, 7, and 11).
2. In several sections (Sections 7.1, 9.1, 10.1, 11.2, and 12.1), discussions of the asymptotic likelihood inference of the methods covered in the chapters are given. These sections are intended to provide a more general mathematical structure for statisticians.
3. The Cox-Snell residual method has been added to the chapter on graphical methods for survival distribution fitting (Chapter 8). In addition, the sections on probability and hazard plotting have been revised so that no special graphical papers are required to make the plots.
4. More tests of goodness of fit are given, including the BIC and AIC procedures (Chapters 9 and 11).
5. For Cox's proportional hazards model (Chapter 12), we have now included methods to assess its adequacy and procedures to estimate the survivorship function with covariates.
6. The concept of nonproportional hazards models is introduced (Chapter 13), which includes models with time-dependent covariates, stratified models, competing risks models, recurrent event models, and models for related observations.
7. The chapter on linear logistic regression (Chapter 14) has been expanded to cover regression models for polychotomous outcomes. In addition, methods for a general m : n matching design have been added to the section on conditional logistic regression for case-control studies.
8. Computer programming codes for software packages BMDP, SAS, and SPSS are provided for most examples in the text.
We would like to thank the many researchers, teachers, and students who have used the second edition of the book. The suggestions for improvement that many of them have provided are invaluable. Special thanks go to Xing Wang, Linda Hutton, Tracy Mankin, and Imran Ahmed for typing the manuscript. Steve Quigley of John Wiley convinced us to work on a third edition. We thank him for his enthusiasm.

Finally, we are most grateful to our families, Sam, Vivian, Benedict, Jennifer, and Annelisa (E.T.L.), and Alice and Xing (J.W.W.), for the constant joy, love, and support they have given us.

Oklahoma City, OK
April 18, 2001
CHAPTER 1

Introduction

1.1 PRELIMINARIES

This book is for biomedical researchers, epidemiologists, consulting statisticians, students taking a first course on survival data analysis, and others interested in survival time study. It deals with statistical methods for analyzing survival data derived from laboratory studies of animals, clinical and epidemiologic studies of humans, and other appropriate applications.

Survival time can be defined broadly as the time to the occurrence of a given event. This event can be the development of a disease, response to a treatment, relapse, or death. Therefore, survival time can be tumor-free time, the time from the start of treatment to response, length of remission, and time to death. Survival data can include survival time, response to a given treatment, and patient characteristics related to response, survival, and the development of a disease. The study of survival data has focused on predicting the probability of response, survival, or mean lifetime; comparing the survival distributions of experimental animals or of human patients; and identifying risk and/or prognostic factors related to response, survival, and the development of a disease. In this book, special consideration is given to the study of survival data in biomedical sciences, although all the methods are suitable for applications in industrial reliability, social sciences, and business. Examples of survival data in these fields are the lifetime of electronic devices, components, or systems (reliability engineering); felons' time to parole (criminology); duration of first marriage (sociology); length of newspaper or magazine subscription (marketing); and workers' compensation claims (insurance) and their various influencing risk or prognostic factors.
1.2 CENSORED DATA

Many researchers consider survival data analysis to be merely the application of two conventional statistical methods to a special type of problem: parametric if the distribution of survival times is known to be normal and nonparametric if the distribution is unknown. This assumption would be true if the survival times of all the subjects were exact and known; however, some survival times are not. Further, the survival distribution is often skewed, or far from being normal. Thus there is a need for new statistical techniques. One of the most important developments is due to a special feature of survival data in the life sciences that occurs when some subjects in the study have not experienced the event of interest at the end of the study or time of analysis. For example, some patients may still be alive or disease-free at the end of the study period. The exact survival times of these subjects are unknown. These are called censored observations or censored times and can also occur when people are lost to follow-up after a period of study. When there are no censored observations, the set of survival times is complete. There are three types of censoring.
Type I Censoring

Animal studies usually start with a fixed number of animals, to which the treatment or treatments is given. Because of time and/or cost limitations, the researcher often cannot wait for the death of all the animals. One option is to observe for a fixed period of time, say six months, after which the surviving animals are sacrificed. Survival times recorded for the animals that died during the study period are the times from the start of the experiment to their death. These are called exact or uncensored observations. The survival times of the sacrificed animals are not known exactly but are recorded as at least the length of the study period. These are called censored observations. Some animals could be lost or die accidentally. Their survival times, from the start of experiment to loss or death, are also censored observations. In type I censoring, if there are no accidental losses, all censored observations equal the length of the study period.

For example, suppose that six rats have been exposed to carcinogens by injecting tumor cells into their footpads. The times to develop a tumor of a given size are observed. The investigator decides to terminate the experiment after 30 weeks. Figure 1.1 is a plot of the development times of the tumors. Rats A, B, and D developed tumors after 10, 15, and 25 weeks, respectively. Rats C and E did not develop tumors by the end of the study; their tumor-free times are thus 30-plus weeks. Rat F died accidentally without tumors after 19 weeks of observation. The survival data (tumor-free times) are 10, 15, 30+, 25, 30+, and 19+ weeks. (The plus indicates a censored observation.)
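The type I example can be sketched in code. The following is a minimal illustration, not part of the book's own programs: the names `Observation`, `observe_type_one`, and `label` are hypothetical, and each subject is assumed to be described by the week a tumor appeared (or `None`) and the week of any accidental loss (or `None`).

```python
from typing import NamedTuple, Optional


class Observation(NamedTuple):
    subject: str
    weeks: int   # recorded tumor-free time
    exact: bool  # True = tumor observed; False = censored


def observe_type_one(subject: str, tumor_week: Optional[int],
                     loss_week: Optional[int], study_weeks: int) -> Observation:
    """Record one subject under type I censoring (fixed study length)."""
    # Follow-up stops at the study end or at an accidental loss, whichever is first.
    cutoff = study_weeks if loss_week is None else min(loss_week, study_weeks)
    if tumor_week is not None and tumor_week <= cutoff:
        return Observation(subject, tumor_week, True)
    return Observation(subject, cutoff, False)


def label(obs: Observation) -> str:
    """Format a time the way the text does: a trailing plus marks censoring."""
    return f"{obs.weeks}" if obs.exact else f"{obs.weeks}+"


# (subject, week tumor appeared or None, week of accidental loss or None)
rats = [("A", 10, None), ("B", 15, None), ("C", None, None),
        ("D", 25, None), ("E", None, None), ("F", None, 19)]
recorded = [observe_type_one(s, tumor, lost, study_weeks=30)
            for s, tumor, lost in rats]
labels = [label(o) for o in recorded]
print(", ".join(labels))
```

Storing each observation as a time plus an exact/censored flag is one common convention; the text's plus notation carries the same information.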
Type II Censoring

Another option in animal studies is to wait until a fixed portion of the animals have died, say 80 of 100, after which the surviving animals are sacrificed. In this case, type II censoring, if there are no accidental losses, the censored observations equal the largest uncensored observation. For example, in an experiment of six rats (Figure 1.2), the investigator may decide to terminate the study after four of the six rats have developed tumors. The survival or tumor-free times are then 10, 15, 35+, 25, 35, and 19+ weeks.

Figure 1.1 Example of type I censored data.

Figure 1.2 Example of type II censored data.
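A short sketch of the type II rule, under stated assumptions: the function name `type_two_censor` is hypothetical, and tumor times that were never observed (rats C and F) are represented by `math.inf` purely as a placeholder, since their true values are unknown.

```python
import math


def type_two_censor(tumor_week, loss_week, r):
    """Stop the study at the r-th observed event; censor everyone else."""
    # Events that can actually be observed: those occurring before any accidental loss.
    observable = sorted(t for s, t in tumor_week.items()
                        if t < loss_week.get(s, math.inf))
    stop = observable[r - 1]  # study ends when the r-th tumor appears
    recorded = {}
    for s, t in tumor_week.items():
        cutoff = min(stop, loss_week.get(s, math.inf))
        recorded[s] = (t, True) if t <= cutoff else (cutoff, False)
    return recorded


# Placeholder math.inf = tumor never observed (an assumption for illustration).
tumor_week = {"A": 10, "B": 15, "C": math.inf, "D": 25, "E": 35, "F": math.inf}
loss_week = {"F": 19}  # rat F is lost accidentally at week 19

recorded = type_two_censor(tumor_week, loss_week, r=4)
labels = [f"{t}" if exact else f"{t}+" for t, exact in recorded.values()]
print(", ".join(labels))
```

Without accidental losses, every censored time equals the largest uncensored time, here 35 weeks; rat F is the exception because it left the study early.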
Type III Censoring

In most clinical and epidemiologic studies the period of study is fixed and patients enter the study at different times during that period. Some may die before the end of the study; their exact survival times are known. Others may withdraw before the end of the study and are lost to follow-up. Still others may be alive at the end of the study. For "lost" patients, survival times are at least from their entrance to the last contact. For patients still alive, survival times are at least from entry to the end of the study. The latter two kinds of observations are censored observations. Since the entry times are not simultaneous, the censored times are also different. This is type III censoring. For example, suppose that six patients with acute leukemia enter a clinical study during a total study period of one year. Suppose also that all six respond to treatment and achieve remission. The remission times are plotted in Figure 1.3. Patients A, C, and E achieve remission at the beginning of the second, fourth, and ninth months, and relapse after four, six, and three months, respectively. Patient B achieves remission at the beginning of the third month but is lost to follow-up four months later; the remission duration is thus at least four months. Patients D and F achieve remission at the beginning of the fifth and tenth months, respectively, and are still in remission at the end of the study; their remission times are thus at least eight and three months. The respective remission times of the six patients are 4, 4+, 6, 8+, 3, and 3+ months.

Figure 1.3 Example of type III censored data.
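The arithmetic behind the six remission times can be checked with a small sketch. The helper `remission_time` and its parameter names are hypothetical; the only assumption is that "the beginning of the k-th month" means k - 1 months after the study starts.

```python
STUDY_MONTHS = 12  # total study period of one year


def remission_time(entry_month, relapse_after=None, lost_after=None):
    """Return (duration in months, exact?) for one patient."""
    start = entry_month - 1  # beginning of month k = k - 1 months elapsed
    if relapse_after is not None:
        return relapse_after, True          # relapse observed: exact time
    if lost_after is not None:
        return lost_after, False            # lost to follow-up: censored
    return STUDY_MONTHS - start, False      # still in remission at study end


patients = {
    "A": remission_time(2, relapse_after=4),
    "B": remission_time(3, lost_after=4),
    "C": remission_time(4, relapse_after=6),
    "D": remission_time(5),
    "E": remission_time(9, relapse_after=3),
    "F": remission_time(10),
}
labels = [f"{t}" if exact else f"{t}+" for t, exact in patients.values()]
print(", ".join(labels))
```

Because entry times differ, the two patients censored at the end of the study (D and F) have different censored times, which is what distinguishes type III from type I censoring.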
Type I and type II censored observations are also called singly censored data, and type III, progressively censored data, by Cohen (1965). Another commonly used name for type III censoring is random censoring. All of these types of censoring are right censoring or censoring to the right. There are also left censoring and interval censoring cases. Left censoring occurs when it is known that the event of interest occurred prior to a certain time t, but the exact time of occurrence is unknown. For example, an epidemiologist wishes to know the age at diagnosis in a follow-up study of diabetic retinopathy. At the time of the examination, a 50-year-old participant was found to have already developed retinopathy, but there is no record of the exact time at which initial evidence was found. Thus the age at examination (i.e., 50) is a left-censored observation. It means that the age of diagnosis for this patient is at most 50 years.

Interval censoring occurs when the event of interest is known to have occurred between times a and b. For example, if medical records indicate that at age 45, the patient in the example above did not have retinopathy, his age at diagnosis is between 45 and 50 years.
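One way to hold exact, right-, left-, and interval-censored observations in a single structure is to store each as a pair of bounds on the true event time. This is a sketch of that convention only; the function `as_bounds` and the use of 0 as the lower bound for left censoring are illustrative assumptions.

```python
import math


def as_bounds(kind, a, b=None):
    """Return (lower, upper) bounds on the true event time."""
    if kind == "exact":
        return (a, a)            # event observed at a
    if kind == "right":
        return (a, math.inf)     # event occurs after a
    if kind == "left":
        return (0.0, a)          # event occurred before a
    if kind == "interval":
        return (a, b)            # event occurred between a and b
    raise ValueError(f"unknown censoring type: {kind}")


# Retinopathy example: age at diagnosis is at most 50 years ...
left = as_bounds("left", 50)
# ... and between 45 and 50 once the age-45 record is taken into account.
interval = as_bounds("interval", 45, 50)
print(left, interval)
```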
We will study descriptive and analytic methods for complete, singly censored, and progressively censored survival data using numerical and graphical techniques. Analytic methods discussed include parametric and nonparametric. Parametric approaches are used either when a suitable model or distribution is fitted to the data or when a distribution can be assumed for the population from which the sample is drawn. Commonly used survival distributions are the exponential, Weibull, lognormal, and gamma. If a survival distribution is found to fit the data properly, the survival pattern can then be described by the parameters in a compact way. Statistical inference can be based on the distribution chosen. If the search for an appropriate model or distribution is too time consuming or not economical or no theoretical distribution adequately fits the data, nonparametric methods, which are generally easy to apply, should be considered.
1.3 SCOPE OF THE BOOK

This book is divided into four parts.

Part I (Chapters 1, 2, and 3) defines survival functions and gives examples of survival data analysis. A survival distribution is most commonly described by three functions: the survivorship function (also called the cumulative survival rate or survival function), the probability density function, and the hazard function (hazard rate or age-specific rate). In Chapter 2 we define these three functions and their equivalence relationships. Chapter 3 illustrates survival data analysis with five examples taken from actual research situations. Clinical and laboratory data are systematically analyzed in progressive steps and the results are interpreted. Section and chapter numbers are given for quick reference. The actual calculations are given as examples or left as exercises in the chapters where the methods are discussed. Four sets of data are provided in the exercise section for the reader to analyze. These data are referred to in the various chapters.
In Part II (Chapters 4 and 5) we introduce some of the most widely used nonparametric methods for estimating and comparing survival distributions. Chapter 4 deals with the nonparametric methods for estimating the three survival functions: the Kaplan and Meier product-limit (PL) estimate and the life-table technique (population life tables and clinical life tables). Also covered is standardization of rates by direct and indirect methods, including the standardized mortality ratio. Chapter 5 is devoted to nonparametric techniques for comparing survival distributions. A common practice is to compare the survival experiences of two or more groups differing in their treatment or in a given characteristic. Several nonparametric tests are described.

Part III (Chapters 6 to 10) introduces the parametric approach to survival data analysis. Although nonparametric methods play an important role in survival studies, parametric techniques cannot be ignored. In Chapter 6 we introduce and discuss the exponential, Weibull, lognormal, gamma, and log-logistic survival distributions. Practical applications of these distributions taken from the literature are included.
An important part of survival data analysis is model or distribution fitting. Once an appropriate statistical model for survival time has been constructed and its parameters estimated, its information can help predict survival, develop optimal treatment regimens, plan future clinical or laboratory studies, and so on. The graphical technique is a simple informal way to select a statistical model and estimate its parameters. When a statistical distribution is found to fit the data well, the parameters can be estimated by analytical methods. In Chapter 7 we discuss analytical estimation procedures for survival distributions. Most of the estimation procedures are based on the maximum likelihood method. Mathematical derivations are omitted; only formulas for the estimates and examples are given. In Chapter 8 we introduce three kinds of graphical methods: probability plotting, hazard plotting, and the Cox-Snell residual method for survival distribution fitting. In Chapter 9 we discuss several tests of goodness of fit and distribution selection. In Chapter 10 we describe several parametric methods for comparing survival distributions.

A topic that has received increasing attention is the identification of prognostic factors related to survival time. For example, who is likely to survive longest after mastectomy, and what are the most important factors that influence that survival? Another subject important to both biomedical researchers and epidemiologists is identification of the risk factors related to the development of a given disease and the response to a given treatment. What are the factors most closely related to the development of a given disease? Who is more likely to develop lung cancer, diabetes, or coronary disease? In many diseases, such as cancer, patients who respond to treatment have a better prognosis than patients who do not. The question, then, relates to what the factors are that influence response. Who is more likely to respond to treatment and thus perhaps survive longer?
Part IV (Chapters 11 to 14) deals with prognostic/risk factors and survival times. In Chapter 11 we introduce parametric methods for identifying important prognostic factors. Chapters 12 and 13 cover, respectively, the Cox proportional hazards model and several nonproportional hazards models for the identification of prognostic factors. In the final chapter, Chapter 14, we introduce the linear logistic regression model for binary outcome variables and its extension to handle polychotomous outcomes.

In Appendix A we describe a numerical procedure for solving nonlinear equations, the Newton-Raphson method. This method is suggested in Chapters 7, 11, 12, and 13. Appendix B comprises a number of statistical tables.
Most nonparametric techniques discussed here are easy to understand and simple to apply. Parametric methods require an understanding of survival distributions. Unfortunately, most survival distributions are not simple. Readers without calculus may find it difficult to apply them on their own. However, if the main purpose is not model fitting, most parametric techniques can be replaced by their nonparametric competitors. In fact, a large percentage of survival studies in clinical or epidemiological journals are analyzed by nonparametric methods. Researchers not interested in survival model fitting should read the chapters and sections on nonparametric methods. Computer programs for survival data analysis are available in several commercially available software packages: for example, BMDP, SAS, and SPSS. These computer programs are referred to in various chapters when applicable. Computer programming codes are given for many of the examples.
Bibliographical Remarks

Gross and Clark (1975) was the first book to discuss parametric models and nonparametric and graphical techniques for both complete and censored survival data. Since then, several other books have been published in addition to the earlier editions of this book (Lee, 1980, 1992). Elandt-Johnson and Johnson (1980) discuss extensively the construction of life tables, model fitting, competing risk, and mathematical models of biological processes of disease progression and aging. Kalbfleisch and Prentice (1980) focus on regression problems with survival data, particularly Cox's proportional hazards model. Miller (1981) covers a number of parametric and nonparametric methods for survival analysis. Cox and Oakes (1984) also cover the topic concisely with an emphasis on the examination of explanatory variables.

Nelson (1982) provides a good discussion of parametric, nonparametric, and graphical methods. The book is more suited for industrial reliability engineers than for biomedical researchers, as are Hahn and Shapiro (1967) and Mann et al. (1974). In addition, Lawless (1982) gives a broad coverage of the area with applications in engineering and biomedical sciences.

More recent publications include Marubini and Valsecchi (1994), Kleinbaum (1995), Klein and Moeschberger (1997), and Hosmer and Lemeshow (1999). Most of these books take a more rigorous mathematical approach and require knowledge of mathematical statistics.
CHAPTER 2

Functions of Survival Time

Survival time data measure the time to a certain event, such as failure, death, response, relapse, the development of a given disease, parole, or divorce. These times are subject to random variations, and like any random variables, form a distribution. The distribution of survival times is usually described or characterized by three functions: (1) the survivorship function, (2) the probability density function, and (3) the hazard function. These three functions are mathematically equivalent: if one of them is given, the other two can be derived.

In practice, the three functions can be used to illustrate different aspects of the data. A basic problem in survival data analysis is to estimate from the sampled data one or more of these three functions and to draw inferences about the survival pattern in the population. In Section 2.1 we define the three functions and in Section 2.2, discuss the equivalence relationship among the three functions.
2.1 DEFINITIONS

Let T denote the survival time. The distribution of T can be characterized by three equivalent functions.

Survivorship Function (or Survival Function)

This function, denoted by S(t), is defined as the probability that an individual survives longer than t:

$$S(t) = P(\text{an individual survives longer than } t) = P(T > t) \tag{2.1.1}$$

From the definition of the cumulative distribution function F(t) of T,

$$S(t) = 1 - P(\text{an individual fails before } t) = 1 - F(t) \tag{2.1.2}$$

Here S(t) is a nonincreasing function of time t with the properties

$$S(t) = \begin{cases} 1 & \text{for } t = 0 \\ 0 & \text{for } t = \infty \end{cases}$$

That is, the probability of surviving at least at the time zero is 1 and that of surviving an infinite time is zero.
The function S(t) is also known as the cumulative survival rate. To depict the course of survival, Berkson (1942) recommended a graphic presentation of S(t). The graph of S(t) is called the survival curve. A steep survival curve, such as the one shown in Figure 2.1a, represents low survival rate or short survival time. A gradual or flat survival curve such as in Figure 2.1b represents high survival rate or longer survival.

The survivorship function or the survival curve is used to find the 50th percentile (the median) and other percentiles (e.g., 25th and 75th) of survival time and to compare survival distributions of two or more groups. The median survival times in Figure 2.1a and b are approximately 5 and 36 units of time, respectively. The mean is generally used to describe the central tendency of a distribution, but in survival distributions the median is often better because a small number of individuals with exceptionally long or short lifetimes will cause the mean survival time to be disproportionately large or small.
In practice, if there are no censored observations, the survivorship function is estimated as the proportion of patients surviving longer than t:

$$\hat{S}(t) = \frac{\text{number of patients surviving longer than } t}{\text{total number of patients}} \tag{2.1.3}$$

where the circumflex denotes an estimate of the function. When censored observations are present, the numerator of (2.1.3) cannot always be determined. For example, consider the following set of survival data: 4, 6, 6+, 10+, 15, 20.
Figure 2.1 Two examples of survival curves.

Using (2.1.3), we can compute $\hat{S}(5) = 5/6 = 0.833$. However, we cannot obtain $\hat{S}(11)$ since the exact number of patients surviving longer than 11 is unknown. Either the third or the fourth patient (6+ and 10+) could survive longer than or less than 11. Thus, when censored observations are present, (2.1.3) is no longer appropriate for estimating S(t). Nonparametric methods of estimating S(t) for censored data are discussed in Chapter 4.
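The following sketch applies the proportion estimate (2.1.3) to the six observations above and reports when censoring leaves the numerator undetermined. The function name `naive_survival` and the choice to return `None` in the undetermined case are illustrative assumptions, not the book's procedure.

```python
def naive_survival(data, t):
    """Estimate S(t) by (2.1.3); return None when censoring leaves it undetermined.

    data is a list of (time, exact) pairs; exact=False marks a censored time.
    """
    surviving = 0
    for time, exact in data:
        if time > t:
            surviving += 1          # known to survive longer than t
        elif not exact:
            return None             # censored at or before t: status at t unknown
    return surviving / len(data)


# The data set from the text: 4, 6, 6+, 10+, 15, 20
data = [(4, True), (6, True), (6, False), (10, False), (15, True), (20, True)]

s5 = naive_survival(data, 5)    # every observation at or below 5 is exact
s11 = naive_survival(data, 11)  # 6+ and 10+ are censored before 11
print(round(s5, 3), s11)
```

At t = 5 the only observation not exceeding 5 is the exact time 4, so five of six patients are known to survive longer. At t = 11 the status of the patients censored at 6 and 10 is unknown, which is why the methods of Chapter 4 are needed.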
Probability Density Function (or Density Function)

Like any other continuous random variable, the survival time T has a probability density function defined as the limit of the probability that an individual fails in the short interval t to $t + \Delta t$ per unit width $\Delta t$, or simply the probability of failure in a small interval per unit time. It can be expressed as

$$f(t) = \lim_{\Delta t \to 0} \frac{P[\text{an individual dying in the interval } (t, t + \Delta t)]}{\Delta t} \tag{2.1.4}$$
The graph of f(t) is called the density curve. Figure 2.2a and b give two examples of the density curve. The density function has the following two properties:

1. f(t) is a nonnegative function:

$$f(t) \begin{cases} \geq 0 & \text{for all } t \geq 0 \\ = 0 & \text{for } t < 0 \end{cases}$$

2. The area between the density curve and the t axis is equal to 1.
In practice, if there are no censored observations, the probability density function f(t) is estimated as the proportion of patients dying in an interval per unit width:

$$\hat{f}(t) = \frac{\text{number of patients dying in the interval beginning at time } t}{(\text{total number of patients})(\text{interval width})} \tag{2.1.5}$$

Figure 2.2 Two examples of density curves.
Similar to the estimation of S(t), when censored observations are present, (2.1.5) is not applicable. We discuss an appropriate method in Chapter 4.

The proportion of individuals that fail in any time interval and the peaks of high frequency of failure can be found from the density function. The density curve in Figure 2.2a gives a pattern of high failure rate at the beginning of the study and decreasing failure rate as time increases. In Figure 2.2b, the peak of high failure frequency occurs at approximately 1.7 units of time. The proportion of individuals that fail between 1 and 2 units of time is equal to the shaded area between the density curve and the axis. The density function is also known as the unconditional failure rate.
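For complete (uncensored) data, the interval estimate (2.1.5) can be sketched as follows. The ten survival times and the interval width of 5 are hypothetical values chosen only for illustration, and `density_estimate` is an assumed helper name.

```python
def density_estimate(times, start, width):
    """Estimate f(t) on [start, start + width) by (2.1.5), complete data only."""
    dying = sum(1 for x in times if start <= x < start + width)
    return dying / (len(times) * width)


# Hypothetical complete data set (no censored observations).
times = [1, 2, 2, 3, 5, 8, 9, 12, 15, 20]
width = 5
f_hat = {start: density_estimate(times, start, width)
         for start in range(0, 25, width)}

for start, value in f_hat.items():
    print(f"[{start}, {start + width}): {value:.2f}")

# The estimates times the interval width sum to 1, mirroring the property
# that the area under the density curve equals 1.
total_area = sum(f_hat.values()) * width
```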
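Hazard Function

The hazard function h(t) of survival time T gives the conditional failure rate. This is defined as the probability of failure during a very small time interval, assuming that the individual has survived to the beginning of the interval, or as the limit of the probability that an individual fails in a very short interval, t to $t + \Delta t$, given that the individual has survived to time t:

$$h(t) = \lim_{\Delta t \to 0} \frac{P[\text{an individual fails in the time interval } (t, t + \Delta t) \text{ given the individual has survived to } t]}{\Delta t} \tag{2.1.6}$$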
The hazard function can also be defined in terms of the cumulative distribution function F(t) and the probability density function f(t):

$$h(t) = \frac{f(t)}{1 - F(t)} \tag{2.1.7}$$
The hazard function is also known as the instantaneous failure rate, force of mortality, conditional mortality rate, and age-specific failure rate. If t in (2.1.6) is age, it is a measure of the proneness to failure as a function of the age of the individual in the sense that the quantity $\Delta t \, h(t)$ is the expected proportion of age t individuals who will fail in the short time interval t to $t + \Delta t$. The hazard function thus gives the risk of failure per unit time during the aging process. It plays an important role in survival data analysis.
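A numerical sketch may help connect the hazard to the density and the cumulative distribution function through the standard relationship h(t) = f(t) / [1 - F(t)]. The exponential distribution is used here only as a convenient example, and the rate 0.2 is a hypothetical value; for this distribution the hazard is constant and equal to the rate.

```python
import math

RATE = 0.2  # hypothetical exponential rate, chosen for illustration


def pdf(t):
    """Exponential density f(t)."""
    return RATE * math.exp(-RATE * t)


def cdf(t):
    """Exponential cumulative distribution function F(t)."""
    return 1.0 - math.exp(-RATE * t)


def hazard(t):
    """h(t) = f(t) / [1 - F(t)]."""
    return pdf(t) / (1.0 - cdf(t))


def conditional_failure_prob(t, dt):
    """P(fail in (t, t + dt) | survived to t)."""
    return (cdf(t + dt) - cdf(t)) / (1.0 - cdf(t))


t, dt = 3.0, 0.01
approx = dt * hazard(t)                     # the quantity dt * h(t) from the text
exact = conditional_failure_prob(t, dt)     # the probability it approximates
print(hazard(t), approx, exact)
```

For a short interval the product of the interval width and the hazard is close to the conditional probability of failing in that interval, which is the interpretation given in the text.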
In practice, when there are no censored observations the hazard function is estimated as the proportion of patients dying in an interval per unit time, given that they have survived to the beginning of the interval:

$$\hat{h}(t) = \frac{\text{number of patients dying in the interval beginning at time } t}{(\text{number of patients surviving at } t)(\text{interval width})} = \frac{\text{number of patients dying per unit time in the interval}}{\text{number of patients surviving at } t} \tag{2.1.8}$$
Actuaries usually use the average hazard rate of the interval in which the number of patients dying per unit time in the interval is divided by the average number of survivors at the midpoint of the interval:

$$\hat{h}(t) = \frac{\text{number of patients dying per unit time in the interval}}{(\text{number of patients surviving at } t) - (\text{number of deaths in the interval})/2} \tag{2.1.9}$$
The actuarial estimate in (2.1.9) gives a higher hazard rate than (2.1.8) and thus a more conservative estimate.
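The two interval estimates can be compared side by side. This is a sketch under stated assumptions: the survival times and the interval width are hypothetical, the data are complete (no censoring), and `hazard_estimates` is an assumed helper that implements the simple estimate and the actuarial estimate with the midpoint correction of half the deaths in the interval.

```python
def hazard_estimates(times, start, width):
    """Return (simple, actuarial) hazard estimates for [start, start + width)."""
    at_risk = sum(1 for x in times if x >= start)            # surviving at t
    dying = sum(1 for x in times if start <= x < start + width)
    simple = dying / (at_risk * width)                        # (2.1.8)
    actuarial = (dying / width) / (at_risk - dying / 2)       # (2.1.9)
    return simple, actuarial


# Hypothetical complete data set (no censored observations).
times = [1, 2, 2, 3, 5, 8, 9, 12, 15, 20]
width = 5
results = {start: hazard_estimates(times, start, width)
           for start in range(0, 25, width)}

for start, (simple, actuarial) in results.items():
    print(f"[{start}, {start + width}): simple={simple:.3f} actuarial={actuarial:.3f}")
```

Because the actuarial denominator subtracts half the deaths in the interval, its estimate is never smaller than the simple one whenever at least one death occurs in the interval.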
Figure 2.3 Examples of the hazard function.

The hazard function may increase, decrease, remain constant, or indicate a more complicated process. Figure 2.3 is a plot of several kinds of hazard function. For example, patients with acute leukemia who do not respond to treatment have an increasing hazard rate, $h_1(t)$. $h_2(t)$ is a decreasing hazard function that, for example, indicates the risk of soldiers wounded by bullets who undergo surgery. The main danger is the operation itself and this danger decreases if the surgery is successful. An example of a constant hazard function, $h_3(t)$, is the risk of healthy persons between 18 and 40 years of age whose main risks of death are accidents. The bathtub curve, $h_4(t)$, describes the process of