Use R!
Wickham: ggplot2 (2nd ed. 2016)
Luke: A User’s Guide to Network Analysis in R
Monogan: Political Analysis Using R
Cano/M. Moguerza/Prieto Corcoba: Quality Control with R
Schwarzer/Carpenter/Rücker: Meta-Analysis with R
Gondro: Primer to Analysis of Genomic Data Using R
Chapman/Feit: R for Marketing Research and Analytics
Willekens: Multistate Analysis of Life Histories with R
Cortez: Modern Optimization with R
Kolaczyk/Csàrdi: Statistical Analysis of Network Data with R
Swenson/Nathan: Functional and Phylogenetic Ecology in R
Nolan/Temple Lang: XML and Web Technologies for Data Sciences with R
Nagarajan/Scutari/Lèbre: Bayesian Networks in R
van den Boogaart/Tolosana-Delgado: Analyzing Compositional Data with R
Bivand/Pebesma/Gòmez-Rubio: Applied Spatial Data Analysis with R (2nd ed. 2013)
Eddelbuettel: Seamless R and C++ Integration with Rcpp
Knoblauch/Maloney: Modeling Psychophysical Data in R
Lin/Shkedy/Yekutieli/Amaratunga/Bijnens: Modeling Dose-Response Microarray Data in Early Drug Development Experiments Using R
Cano/M. Moguerza/Redchuk: Six Sigma with R
Soetaert/Cash/Mazzia: Solving Differential Equations in R
Dirk F. Moore
Department of Biostatistics
Rutgers School of Public Health
Piscataway, NJ, USA
ISSN 2197-5736    ISSN 2197-5744 (electronic)
Use R!
ISBN 978-3-319-31243-9    ISBN 978-3-319-31245-3 (eBook)
DOI 10.1007/978-3-319-31245-3
Library of Congress Control Number: 2016940055
© Springer International Publishing Switzerland 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
Printed on acid-free paper
This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG Switzerland
Preface
This book serves as an introductory guide for students and analysts who need to work with survival time data. The minimum prerequisites are basic applied courses in linear regression and categorical data analysis. Students who also have taken a master’s level course in statistical theory will be well prepared to work through this book, since frequent reference is made to maximum likelihood theory. Students lacking this training may still be able to understand most of the material, provided they have an understanding of the basic concepts of differential and integral calculus. Specifically, students should understand the concept of the limit, and they should know what derivatives and integrals are and be able to evaluate them in some basic cases.
The material for this book has come from two sources. The first source is an introductory class in survival analysis for graduate students in epidemiology and biostatistics at the Rutgers School of Public Health. Biostatistics students, as one would expect, have a much firmer grasp of the more mathematical aspects of statistics than do epidemiology students. Still, I have found that those epidemiology students with strong quantitative backgrounds have been able to understand some mathematical statistical procedures such as score and likelihood ratio tests, provided that they are not expected to symbolically differentiate or integrate complex formulas. In this book I have, when possible, used the numerical capabilities of the R system to substitute for symbolic manipulation. The second source of material is derived from collaborations with physicians and epidemiologists at the Rutgers Cancer Institute of New Jersey and at the Rutgers Robert Wood Johnson Medical School. A number of the data sets in this text are derived from these collaborations. Also, the experience of training statistical analysts to work on these data sets provided additional inspiration for the book.
The first chapter introduces the concepts of survival times and how right censoring occurs, and describes several of the data sets that will be used throughout the book. Chapter 2 presents fundamentals of survival theory. This includes hazard, probability density, survival functions, and how they are related. The hazard function is illustrated using both life table data and using some common parametric distributions. The chapter ends with a brief introduction to properties of maximum
I would like to thank Rebecca Moss for permission to use the “pancreatic” data and Michael Steinberg for permission to use the “pharmacoSmoking” data. Both of these data sets are used repeatedly throughout the text. I would also like to thank Grace Lu-Yao, Weichung Joe Shih, and Yong Lin for years-long collaborations on using the SEER-Medicare data for studying the survival trajectories of prostate cancer patients. These collaborations led to the development of the “prostateSurvival” data set discussed in this text in Chapter 9. I thank the Division of Cancer Epidemiology and Genetics of the US National Cancer Institute for providing the “ashkenazi” data. I also thank Wan Yee Lau for making the “hepatoCellular” data publicly available in the online Dryad data repository and for allowing me to include it in the “asaur” R package.
Piscataway, NJ, USA    Dirk F. Moore
October 2015
5 Regression Analysis Using the Proportional Hazards Model
5.1 Covariates and Nonparametric Survival Models
5.2 Comparing Two Survival Distributions Using a Partial Likelihood Function
5.3 Partial Likelihood Hypothesis Tests
5.3.1 The Wald Test
5.3.2 The Score Test
5.3.3 The Likelihood Ratio Test
5.4 The Partial Likelihood with Multiple Covariates
5.5 Estimating the Baseline Survival Function
5.6 Handling of Tied Survival Times
5.7 Left Truncation
5.8 Additional Notes
6 Model Selection and Interpretation
6.1 Covariate Adjustment
6.2 Categorical and Continuous Covariates
6.3 Hypothesis Testing for Nested Models
6.4 The Akaike Information Criterion for Comparing Non-nested Models
6.5 Including Smooth Estimates of Continuous Covariates in a Survival Model
6.6 Additional Note
7.1 Assessing Goodness of Fit Using Residuals
7.1.1 Martingale and Deviance Residuals
7.1.2 Case Deletion Residuals
7.2 Checking the Proportional Hazards Assumption
7.2.1 Log Cumulative Hazard Plots
7.2.2 Schoenfeld Residuals
7.3 Additional Note
8 Time Dependent Covariates
8.2 Predictable Time Dependent Variables
8.2.1 Using the Time Transfer Function
8.2.2 Time Dependent Variables That Increase Linearly with Time
8.3 Additional Note
9 Multiple Survival Outcomes and Competing Risks
9.1 Clustered Survival Times and Frailty Models
9.1.1 Marginal Survival Models
9.1.2 Frailty Survival Models
9.1.3 Accounting for Family-Based Clusters in the “ashkenazi” Data
9.1.4 Accounting for Within-Person Pairing of Eye Observations in the Diabetes Data
9.2 Cause-Specific Hazards
9.2.1 Kaplan-Meier Estimation with Competing Risks
9.2.2 Cause-Specific Hazards and Cumulative Incidence Functions
9.2.3 Cumulative Incidence Functions for Prostate Cancer Data
9.2.4 Regression Methods for Cause-Specific Hazards
9.2.5 Comparing the Effects of Covariates on Different Causes of Death
9.3 Additional Notes
10.2 The Exponential Distribution
10.3 The Weibull Model
10.3.1 Assessing the Weibull Distribution as a Model for Survival Data in a Single Sample
10.3.2 Maximum Likelihood Estimation of Weibull Parameters for a Single Group of Survival Data
10.3.3 Profile Weibull Likelihood
10.3.4 Selecting a Weibull Distribution to Model Survival Data
10.3.5 Comparing Two Weibull Distributions Using the Accelerated Failure Time and Proportional Hazards Models
10.3.6 A Regression Approach to the Weibull Model
10.3.7 Using the Weibull Distribution to Model Survival Data with Multiple Covariates
10.3.8 Model Selection and Residual Analysis with Weibull Survival Data
10.4 Other Parametric Survival Distributions
11.1 Power and Sample Size for a Single Arm Study
11.2 Determining the Probability of Death in a Clinical Trial
11.3 Sample Size for Comparing Two Exponential Survival Distributions
11.4 Sample Size for Comparing Two Survival Distributions Using the Log-Rank Test
11.5 Determining the Probability of Death from a Non-parametric Survival Curve Estimate
11.6 Example: Calculating the Required Number of Patients for a Randomized Study of Advanced Gastric Cancer Patients
11.7 Example: Calculating the Required Number of Patients for a Randomized Study of Patients with Metastatic Colorectal Cancer
11.8 Using Simulations to Estimate Power
11.9 Additional Notes
12 Additional Topics
12.1 Using Piecewise Constant Hazards to Model Survival Data
12.2 Interval Censoring
12.3 The Lasso Method for Selecting Predictive Biomarkers
A A Basic Guide to Using R for Survival Analysis
A.1 The R System
A.1.1 A First R Session
A.1.2 Scatterplots and Fitting Linear Regression Models
A.1.3 Accommodating Non-linear Relationships
A.1.4 Data Frames and the Search Path for Variable Names
A.1.5 Defining Variables Within a Data Frame
A.1.6 Importing and Exporting Data Frames
A.2 Working with Dates in R
A.2.1 Dates and Leap Years
A.2.2 Using the “as.date” Function
A.3 Presenting Coefficient Estimates Using Forest Plots
A.4 Extracting the Log Partial Likelihood and Coefficient Estimates from a coxph Object
those who are not, an overview of R may be found in the appendix, and links to more extensive R guides and manuals may be found on the main R website. Readers who master the techniques in this book will be equipped to use R to carry out survival analyses in a practical setting, and those who are familiar with one of the many excellent commercial statistical packages should be able to adapt what they have learned to the particular command syntax and output style of that package.
1.2 What You Need to Know to Use This Book
Survival analysis resembles linear and logistic regression analysis in several ways: there is (typically) a single outcome variable and one or more predictors; testing statistical hypotheses about the relationship of the predictors to the outcome variable is of particular interest; adjusting for confounding covariates is crucial; and model selection and checking of assumptions through analysis of residuals and other methods are key requirements. Thus, readers should be familiar with basic concepts of classical hypothesis testing and with principles of regression analysis. Familiarity with categorical data analysis methods, including contingency tables, stratified contingency tables, and Poisson and logistic regression, is also important. However, survival analysis differs from these classical statistical methods in that censoring plays a central role in nearly all cases, and the theoretical underpinnings of the subject are far more complex. While I have strived to keep the mathematical level of this book as accessible as possible, many concepts in survival analysis depend on some understanding of mathematical statistics. Readers at a minimum must understand key ideas from calculus such as limits and the meaning of derivatives and integrals; the definition of the hazard function, for example, underlies everything we will do, and that definition depends on limits, while its connection to the survival function depends on an integral. Those who are already familiar with basic concepts of likelihood theory at the level of a master’s program in statistics or biostatistics will have the easiest time working through this book. For those who are less familiar with these topics I have endeavored to use the numerical capabilities of R to illustrate likelihood principles as they arise. Also, as already mentioned, the reader is expected to be familiar with the basics of using the R system, including such concepts as vectors, matrices, data structures and components, and data frames. He or she should also be sufficiently familiar with R to carry out basic data analyses and make data plots, as well as understand how to install R packages from the main CRAN (Comprehensive R Archive Network) repository.
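The two calculus facts mentioned above, that the hazard is defined through a limit and that the survival function is recovered from the hazard through an integral, can be checked numerically in R. The sketch below assumes, purely for illustration, a Weibull survival function $S(t) = \exp(-(\lambda t)^\alpha)$ with arbitrarily chosen values `alpha = 1.5` and `lambda = 0.1`; the helper names `surv_fn`, `hazard_approx`, and `hazard_exact` are hypothetical. It approximates the hazard by the conditional probability of failure in a shrinking interval divided by the interval width, and then recovers $S(t)$ from the hazard with base R's `integrate()`.

```r
# Illustrative (assumed) survival function: Weibull with shape alpha and rate lambda
alpha  <- 1.5
lambda <- 0.1
surv_fn <- function(t) exp(-(lambda * t)^alpha)

# Hazard as a limit: h(t) = lim_{delta -> 0} P(t <= T < t + delta | T >= t) / delta,
# where P(t <= T < t + delta | T >= t) = [S(t) - S(t + delta)] / S(t)
hazard_approx <- function(t, delta) {
  (surv_fn(t) - surv_fn(t + delta)) / (delta * surv_fn(t))
}

# Closed-form hazard of this Weibull, for comparison
hazard_exact <- function(t) alpha * lambda^alpha * t^(alpha - 1)

t0 <- 5
deltas <- c(1, 0.1, 0.01, 0.001)
# The approximation approaches the exact hazard as delta shrinks
print(cbind(delta  = deltas,
            approx = hazard_approx(t0, deltas),
            exact  = hazard_exact(t0)))

# Survival from the hazard via an integral: S(t) = exp(-H(t)), H(t) = integral of h from 0 to t
cum_haz <- integrate(hazard_exact, lower = 0, upper = t0)$value
print(c(from_hazard = exp(-cum_haz), direct = surv_fn(t0)))
```

Any other smooth survival function could be substituted for `surv_fn`; the Weibull is used only because its hazard has a simple closed form to compare against.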
1.3 Survival Data and Censoring
A key characteristic of survival data is that the response variable is a non-negative discrete or continuous random variable, and represents the time from a well-defined origin to a well-defined event. A second characteristic of survival analysis,
censoring, arises when the starting or ending events are not precisely observed. The most common example of this is right censoring, which results when the final endpoint is only known to exceed a particular value. Formally, if $T$ is a random variable representing the time to failure and $U$ is a random variable representing the time to a censoring event, what we observe is $T^* = \min(T, U)$ and a censoring indicator $\delta = I[T < U]$. That is, $\delta$ is 0 or 1 according to whether $T^*$ is a censored time or an observed failure time. Less commonly one may have left censoring, where events are known to have occurred before a certain time, or interval censoring, where the failure time is only known to have occurred within a specified interval of time. For now we will address the more prevalent right-censoring situation.
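In R, the observed data can be formed from latent failure and censoring times with `pmin()` and a logical comparison: the observed time, written here as $T^* = \min(T, U)$, and the indicator $\delta = I[T < U]$. The sketch below simulates both latent times from exponential distributions; that choice, the rates, and the variable names are assumptions made only for illustration, since in real data $T$ and $U$ are never both observed.

```r
set.seed(2016)
n <- 8
# Latent (unobservable in practice) failure and censoring times; the rates are assumptions
fail_time <- rexp(n, rate = 0.10)   # T
cens_time <- rexp(n, rate = 0.05)   # U

# What the analyst actually sees
obs_time <- pmin(fail_time, cens_time)          # T* = min(T, U)
delta    <- as.integer(fail_time < cens_time)   # 1 = failure observed, 0 = censored

print(data.frame(fail_T   = round(fail_time, 2),
                 cens_U   = round(cens_time, 2),
                 obs_time = round(obs_time, 2),
                 delta    = delta))
```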
Censoring may be classified into three types: Type I, Type II, or random. In Type I censoring, the censoring times are pre-specified. For example, in an animal experiment, a cohort of animals may start at a specific time, and all be followed until a pre-specified ending time. Animals which have not experienced the event of interest before the end of the study are then censored at that time. Another example, discussed in detail in Example 1.5, is a smoking cessation study, where by design each subject is followed until relapse (return to smoking) or 180 days, whichever comes first. Those subjects who did not relapse within the 180-day period were censored at that time.
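Type I censoring is easy to mimic in R: every subject whose event time exceeds the pre-specified end of follow-up is censored at exactly that time. The sketch below borrows the 180-day limit from the smoking cessation example but simulates the relapse times from an exponential distribution with an arbitrary mean of 120 days; those simulated times are hypothetical, not the study data.

```r
set.seed(1)
n <- 10
end_of_study <- 180                        # pre-specified censoring time (days)
relapse_time <- rexp(n, rate = 1 / 120)    # hypothetical latent relapse times

obs_time <- pmin(relapse_time, end_of_study)
relapse  <- as.integer(relapse_time <= end_of_study)   # 0 = still abstinent at day 180

print(data.frame(obs_time = round(obs_time, 1), relapse = relapse))
```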
Type II censoring occurs when the experimental objects are followed until a prespecified fraction have failed. Such a design is rare in biomedical studies, but may be used in industrial settings, where time to failure of a device is of primary interest. An example would be one where the study stops after, for instance, 25 out of 100 devices are observed to fail. The remaining 75 devices would then be censored. In this example, the smallest 25% of the ordered failure times are observed, and the remainder are censored.
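The 25-out-of-100 example can be sketched in R by stopping the experiment at the 25th smallest failure time. The device lifetimes below are simulated from an exponential distribution with an arbitrary rate, an assumption made only for illustration; with continuous lifetimes, ties occur with probability zero, so exactly 25 failures are observed.

```r
set.seed(42)
n_devices <- 100
r <- 25                                     # stop after the 25th failure
fail_time <- rexp(n_devices, rate = 0.01)   # hypothetical device lifetimes

stop_time <- sort(fail_time)[r]             # the r-th ordered failure time ends the test
obs_time  <- pmin(fail_time, stop_time)
failed    <- as.integer(fail_time <= stop_time)

print(table(failed))                        # observed failures versus censored devices
```

In contrast to Type I censoring, the censoring time here (`stop_time`) is random, while the number of observed failures is fixed by design.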
The last general category of censoring is random censoring. Careful attention to the cause of the censoring is essential in order to avoid biased survival estimates. In biomedical settings, one cause of random censoring is patient dropout. If the dropout occurs truly at random, and is unrelated to the disease process, such censoring may not cause any problems with bias in the analysis. But if patients who are near death are more likely to drop out than other patients, serious biases may arise. Another cause of random censoring is competing events. For instance, in Example 1.4, the primary outcome is time to death from prostate cancer. But when a patient dies of another cause first, then that patient will be censored, since the time he would have died of prostate cancer (had he not died first of the other cause) is unknown. The question of independence of the competing causes is, of course, an important issue, and will be discussed in Sect. 9.2.
In clinical trials, the most common source of random censoring is administrative censoring, which results because some patients in a clinical trial have not yet died at the time the analysis is carried out. This concept is illustrated in the following example.
Example 1.1. Consider a hypothetical cancer clinical trial where subjects enter the trial over a certain period of time, known as the accrual period, and are followed for an additional period of time, known as the follow-up period, to determine their survival times. That is, for each patient, we would like to observe the time between when a patient entered the trial and when that patient died. But unless the type of cancer being studied is quickly fatal, some patients will still be alive at the end of the follow-up time, and indeed many patients may survive long after this time. For these patients, the survival times are only partially observed; we know that these patients survived until the end of follow-up, but we don’t know how much longer they will survive. Such times are said to be right-censored, and this type of censoring is both the most common and the most easily accommodated. Other types of censoring, as we have seen, include left and interval censoring. We will discuss these briefly in the last chapter.
Figure 1.1 presents data from a hypothetical clinical trial. Here, five patients were entered over a 2.5-year accrual period which ran from January 1, 2000 until June 30, 2002. This was followed by 5.5 years of additional follow-up time, which lasted until December 31, 2007. In this example, the data were meant to be analyzed at this time, but three patients (Patients 1, 3 and 4) were still alive. Also shown in this example is the ultimate fate of these three patients, but this would not have been known at the time of analysis. Thus, for these three patients, we have incomplete information about their survival time. For example, we know that Patient 1 survived at least 7 years, but as of the end of 2007 it would not have been known how long the patient would ultimately live.
Fig. 1.1 Clinical trial accrual and follow-up periods. The vertical dashed lines indicate the trial start, end of accrual, and end of follow-up. The X’s denote deaths and the open circles denote censoring events
[Figure: period labels “Accrual” and “Follow-up”]
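The lengths of the accrual and follow-up periods in this hypothetical trial can be computed directly from the calendar dates given in the text, using base R's `as.Date()` and `difftime()`. Converting days to years by dividing by 365.25 is a convention assumed here, and the helper `years_between()` is a hypothetical name introduced for this sketch.

```r
trial_start  <- as.Date("2000-01-01")
accrual_end  <- as.Date("2002-06-30")
followup_end <- as.Date("2007-12-31")

# Elapsed years between two dates (365.25-day year is an assumed convention)
years_between <- function(from, to) {
  as.numeric(difftime(to, from, units = "days")) / 365.25
}

accrual_years  <- years_between(trial_start, accrual_end)
followup_years <- years_between(accrual_end, followup_end)
max_fu_first   <- years_between(trial_start, followup_end)  # patient entering on day one

print(round(c(accrual = accrual_years, follow_up = followup_years,
              max_follow_up = max_fu_first), 2))
```

The last quantity is the longest follow-up any patient could have had, which bounds what can be learned about long-term survival at the time of analysis.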
Informative censoring, by contrast, may (for example) result if individuals in a clinical trial tend to drop out of the study (and become lost to follow-up) for reasons related to the failure process. This type of censoring can introduce biases into the analysis that are difficult to adjust for. The methods we discuss will require the assumption that censoring is non-informative.
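A small simulation can make the danger of informative censoring concrete. The sketch below is illustrative only: it assumes exponential failure times with rate 0.1 (so the true $S(10) = e^{-1} \approx 0.37$), compares an independent dropout mechanism with a hypothetical informative one in which a random half of subjects drop out shortly before they would have failed, and estimates $S(10)$ with a hand-written product-limit (Kaplan-Meier) helper, `km_at()`, rather than a package function.

```r
set.seed(123)
n <- 5000
fail_time <- rexp(n, rate = 0.1)   # true S(10) = exp(-1)

# Product-limit (Kaplan-Meier) estimate of S(t0), written out for illustration
km_at <- function(time, status, t0) {
  ev <- sort(unique(time[status == 1 & time <= t0]))   # distinct event times up to t0
  n_risk  <- vapply(ev, function(u) sum(time >= u), integer(1))
  n_event <- vapply(ev, function(u) sum(time == u & status == 1), integer(1))
  prod(1 - n_event / n_risk)
}

# (a) Non-informative: dropout times independent of failure times
drop_a <- rexp(n, rate = 0.05)
# (b) Informative (hypothetical): half the subjects drop out just before failing
near_death <- runif(n) < 0.5
drop_b <- ifelse(near_death, 0.9 * fail_time, Inf)

km_a <- km_at(pmin(fail_time, drop_a), as.integer(fail_time <= drop_a), 10)
km_b <- km_at(pmin(fail_time, drop_b), as.integer(fail_time <= drop_b), 10)
print(round(c(true = exp(-1), non_informative = km_a, informative = km_b), 3))
```

Under independent dropout the estimate stays close to the true value; under the informative mechanism, subjects who are about to fail leave the risk set early, so the estimated survival is biased upward.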
The goals of survival analysis are to estimate the survival distribution, to compare two or more survival distributions, or (more generally) to assess the effects of a number of factors on survival. The techniques bear some resemblance to regression analysis, with the important distinctions that the outcome variable (time) is always non-negative and often censored.
1.4 Some Examples of Survival Data Sets
Following are a few examples of studies using survival analysis which we will refer to throughout the text. The data sets may be obtained by installing the text’s package “asaur” from the main CRAN repository. Data for these examples are presented in a number of different formats, reflecting the formats that a data analyst may see in practice. For example, most data sets present survival time in terms of time from the origin (typically entry into a trial). One contains specific dates (date of entry into a trial and date of death) from which we compute the survival time. All contain additional variables, such as censoring indicators, which show whether the survival time of each subject was fully observed or is only partially known. Most also contain treatment indicators and other covariate information.
Example 1.2. Xelox in patients with advanced gastric cancer
This is a Phase II (single sample) clinical trial of Xeloda and oxaliplatin (XELOX) chemotherapy given before surgery to 48 advanced gastric cancer patients with para-aortic lymph node metastasis (Wang et al. [74]). An important survival outcome of interest is progression-free survival, which is the time from entry into the clinical trial until progression or death, whichever comes first. The data, which have been extracted from the paper, are in the data set “gastricXelox” in the “asaur” package; a sample of the observations (for patients 23 through 27) is as follows:
> library(asaur)
> gastricXelox[23:27,]
   timeWeeks delta
23        42     1
24        43     1
25        43     0
26        46     1
27        48     0
The first column is the patient (row) number. The second is a list of survival times, rounded to the nearest week, and the third is “delta”, which is the censoring indicator. For example, for patient number 23, the time is 42 and delta is 1, indicating that the endpoint (progression or death) was observed 42 weeks after entry into the trial. For patient number 25, the time is 43 and delta is 0, indicating that the patient was alive at 43 weeks after entry and no progression had been observed. We will discuss this data set further in Chap. 3.
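The five rows printed above can be re-entered by hand to show how a time and a censoring indicator work together. The sketch below uses only those patients, so it needs neither the “asaur” package nor the full data. It marks censored times with a trailing "+", the same convention `print()` uses for `Surv` objects in the “survival” package, and computes the number of events, the total person-weeks, and the event rate per person-week for this small subset.

```r
# Patients 23-27 of gastricXelox, copied from the printed output
gx <- data.frame(timeWeeks = c(42, 43, 43, 46, 48),
                 delta     = c(1, 1, 0, 1, 0),
                 row.names = 23:27)

# Censored times (delta = 0) are marked with "+"
gx$display <- paste0(gx$timeWeeks, ifelse(gx$delta == 0, "+", ""))
print(gx)

n_events     <- sum(gx$delta)        # progressions or deaths observed
person_weeks <- sum(gx$timeWeeks)    # total follow-up time, events and censored alike
rate <- n_events / person_weeks      # events per person-week
print(c(events = n_events, person_weeks = person_weeks, rate = round(rate, 4)))
```

Censored patients contribute follow-up time to the denominator but no events to the numerator, which is why both columns are needed.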
Example 1.3. Pancreatic cancer in patients with locally advanced or metastatic disease
This is also a single sample Phase II study of a chemotherapeutic compound, and the main purpose was to assess overall survival and also “progression-free survival”, which is defined as the time from entry into the trial until disease progression or death, whichever comes first. A secondary interest in the study is to compare the prognosis of patients with locally advanced disease with that of patients with metastatic disease. The results were published in Moss et al. [51]. The data are available in the data set “pancreatic” in the “asaur” package. Here are the first few observations:
> head(pancreatic)
  stage    onstudy progression      death
1     M 12/16/2005    2/2/2006 10/19/2006
2     M   1/6/2006   2/26/2006  4/19/2006
3    LA   2/3/2006    8/2/2006  1/19/2007
4     M  3/30/2006           .  5/11/2006
5    LA  4/27/2006   3/11/2007  5/29/2007
6     M   5/7/2006   6/25/2006 10/11/2006
For example, Patient #3, a patient with locally advanced disease (stage = “LA”), entered the study on February 3, 2006. That person was found to have progressive disease on August 2 of that year, and died on January 19 of the following year. The progression-free survival for that patient is the difference between the progression date and the on-study date. Patient #4, a patient with metastatic disease (stage = “M”), entered on March 30, 2006 and died on May 11 of that year, with no recorded date of progression. The progression-free survival time for that patient is thus the difference between the death date and the on-study date. For both patients, the overall survival is the difference between the date of death and the on-study date. In this study there was no censoring, since none of these seriously ill patients survived for very long. In Chap. 3 we will see how to compare the survival of the two groups of patients.
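The date arithmetic described above can be sketched in base R. The code re-enters the six rows printed earlier (dates in month/day/year form, with "." for a missing progression date), converts them with `as.Date()` using an explicit format, and computes overall and progression-free survival in days. The object names are hypothetical; on the real data the columns would come from the “pancreatic” data frame instead.

```r
pan <- data.frame(
  stage       = c("M", "M", "LA", "M", "LA", "M"),
  onstudy     = c("12/16/2005", "1/6/2006", "2/3/2006", "3/30/2006", "4/27/2006", "5/7/2006"),
  progression = c("2/2/2006", "2/26/2006", "8/2/2006", ".", "3/11/2007", "6/25/2006"),
  death       = c("10/19/2006", "4/19/2006", "1/19/2007", "5/11/2006", "5/29/2007", "10/11/2006")
)

fmt <- "%m/%d/%Y"
onstudy     <- as.Date(pan$onstudy, format = fmt)
progression <- as.Date(pan$progression, format = fmt)   # "." does not match, so it becomes NA
death       <- as.Date(pan$death, format = fmt)

os_days   <- as.numeric(difftime(death, onstudy, units = "days"))
prog_days <- as.numeric(difftime(progression, onstudy, units = "days"))
# PFS: time to progression or death, whichever comes first; death alone if no progression
pfs_days  <- ifelse(is.na(prog_days), os_days, pmin(prog_days, os_days))

print(data.frame(stage = pan$stage, pfs_days, os_days))
```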
Example 1.4. Survival prospects of prostate cancer patients with high-risk disease
In this data set there are two outcomes of interest, death from prostate cancer and death from other causes, so we have what is called a competing risks survival analysis problem. In this example, we have simulated data from 14,294 prostate cancer patients based on detailed competing risks analyses published by Lu-Yao et al. [46]. For each patient we have grade (poorly or moderately differentiated), age at diagnosis (66-70, 71-75, 76-80, and 80+), cancer stage (T1c if screen-diagnosed using a prostate-specific antigen blood test, T1ab if clinically diagnosed without screening, or T2 if palpable at diagnosis), survival time (days from diagnosis to death or date last seen), and an indicator (“status”) for whether the patient died
> pharmacoSmoking[1:6, 2:8]
  ttr relapse         grp age gender     race employment
1 182       0   patchOnly  36   Male    white         ft
2  14       1   patchOnly  41   Male    white      other
3   5       1 combination  25 Female    white      other
4  16       1 combination  54   Male    white         ft
5   0       1 combination  45   Male    white      other
6 182       0 combination  43   Male hispanic         ft
The variable “ttr” is the number of days without smoking (“time to relapse”), and “relapse = 1” indicates that the subject started smoking again at the given time. The variable “grp” is the treatment indicator, and “employment” can take the values “ft” (full time), “pt” (part time), or “other”. The primary objectives were to compare the two therapies with regard to time to relapse, and to identify other factors related to this outcome.
Example 1.6. Prediction of survival of hepatocellular carcinoma patients using biomarkers
This study (Li et al. [42, 43]) focused on using expression of a chemokine known as CXCL17, and other clinical and biomarker factors, to predict overall and recurrence-free survival. This example contains data on 227 patients, each with a wide range of clinical and biomarker values. The “hepatoCellular” data are publicly available in the Dryad online data repository [43] as well as in the “asaur” R package that accompanies this text. Here, for illustration, is a small selection of cases and covariates.
> hepatoCellular[c(1,2,3,65,71), c(2,3,16:20,24,47)]
   Age Gender OS Death RFS Recurrence   CXCL17T CD4N     Ki67
1   57      0 83     0  13          1 113.94724    0  6.04350
2   58      1 81     0  81          0  54.07154   NA       NA
3   65      1 79     0  79          0  22.18883   NA       NA
65  38      1  5     1   5          1 106.78169    0 44.24411
71  57      1 11     1  11          1  98.49680    0 99.59232
The survival outcomes are “OS” (overall survival) and “RFS” (recurrence-free survival), and the corresponding censoring indicators are “Death” and “Recurrence”. The full data set has 48 columns. In columns 23 to 48 there are many patients with missing values, with only 117 patients having complete data.
1.5 Additional Notes
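Restricting an analysis to patients with complete data is typically done with base R's `complete.cases()`. The runnable part of the sketch below applies it to the CD4N and Ki67 values of the five patients printed above (both columns lie within columns 23 to 48); the commented lines show the corresponding call on the full data, which requires the “asaur” package and is not run here.

```r
# CD4N and Ki67 for the five patients shown above (rows 1, 2, 3, 65, 71)
hc <- data.frame(CD4N = c(0, NA, NA, 0, 0),
                 Ki67 = c(6.04350, NA, NA, 44.24411, 99.59232),
                 row.names = c(1, 2, 3, 65, 71))

keep <- complete.cases(hc)     # TRUE for rows with no NA in any column
print(hc[keep, ])

# Full data (not run here; requires the asaur package):
# library(asaur)
# cc <- complete.cases(hepatoCellular[, 23:48])
# sum(cc)   # the text reports 117 patients with complete data
```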
1. Another type of incomplete observation with survival data is truncation, which can arise, for example, from length-biased sampling. We discuss left truncation in Sect. 3.5. Right truncation is less common and more difficult to model. See Klein and Moeschberger [36] for further discussion.
2. The Healthcare Delivery Research Program of the Division of Cancer Control and Population Sciences, National Cancer Institute, USA maintains the SEER-Medicare Linked Database, which provided the data used in Lu-Yao et al. [46].
This NCI-based research program makes these data available for research only, and will not permit them to be distributed for educational purposes. Thus they cannot be used in this book. Fortunately, however, the Lu-Yao publication contains detailed cause-specific survival curves for patients cross-classified by four age groups, three stage categories, and two Gleason grade categories, as well as precise counts of the numbers of patients in each category. This information was used to simulate a survival data set that maintains many of the characteristics of the original SEER-Medicare data used in the paper. This simulated data set, “prostateSurvival”, is what is used in this book for instructional purposes.
3. Numerous excellent illustrative survival analysis data sets are freely available to all. The standard “survival” package that is distributed with the R system has a number of survival analysis data sets. Also, the “KMsurv” R package contains a rich additional set of data sets that were discussed in Klein and Moeschberger [36]. The “asaur” R package contains the data sets used in the current text.
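To see which data sets an installed package provides, base R's `data()` function can be queried with its `package` argument. The sketch below lists the data sets in the “survival” package; the same call with `package = "KMsurv"` or `package = "asaur"` should work once those packages are installed.

```r
# Names of the data sets shipped with the "survival" package
survival_data <- data(package = "survival")$results[, "Item"]
print(head(survival_data, 10))
cat("Number of data sets:", length(survival_data), "\n")
```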
Exercises
1.1. Consider a simple example of five cancer patients who enter a clinical trial as illustrated in the following diagram:
Re-write these survival times in terms of patient time, and create a simple data set listing the survival time and censoring indicator for each patient. How many patients died? How many person-years are there in this trial? What is the death rate per person-year?
1.2. For the “gastricXelox” data set, use R to determine how many patients had the event (death or progression), the number of person-weeks of follow-up time, and the event rate per person-week.