Administrative Records for Survey Methodology

Edited by

Asaph Young Chun
Statistics Research Institute
Statistics Korea, Republic of Korea

Michael D. Larsen
Department of Mathematics and Statistics
Saint Michael’s College, United States

Gabriele Durrant
Department of Social Statistics and Demography
Southampton University, UK

Jerome P. Reiter
Department of Statistical Science
Duke University, United States
This first edition first published 2021
© 2021 John Wiley and Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Asaph Young Chun, Michael D. Larsen, Gabriele Durrant, and Jerome P. Reiter to be identified as the authors of the editorial material in this work has been asserted in accordance with law.

Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office
111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging-in-Publication Data
Names: Chun, Asaph Young, editor. | Larsen, Michael D., 1977- editor.
Title: Administrative records for survey methodology / edited by Asaph Young Chun, Statistics Research Institute | Statistics Korea, Republic of Korea, Michael D. Larsen, St. Michael’s College, Colchester, United States, Gabriele Durrant, UK, Jerome P. Reiter, United States.
Description: First edition. | Hoboken, NJ: Wiley, 2021. | Series: Wiley series in survey methodology
Identifiers: LCCN 2020030571 (print) | LCCN 2020030572 (ebook) | ISBN 9781119272045 (cloth) | ISBN 9781119272052 (adobe pdf) | ISBN 9781119272069 (epub)
Subjects: LCSH: Surveys–Methodology. | Surveys–Quality control.
Classification: LCC HA31.2 .A36 2021 (print) | LCC HA31.2 (ebook) | DDC 001.4/33–dc23
LC record available at https://lccn.loc.gov/2020030571
LC ebook record available at https://lccn.loc.gov/2020030572

Cover Design: Wiley
Cover Image: © PopTika/Shutterstock

Set in 9.5/12.5pt STIX Two Text by SPi Global, Chennai, India

10 9 8 7 6 5 4 3 2 1
Contents

Preface
Acknowledgments
List of Contributors

Part I Fundamentals of Administrative Records Research and Applications
1 On the Use of Proxy Variables in Combining Register and Survey Data
Li-Chun Zhang
1.1 Introduction
1.1.1 A Multisource Data Perspective
1.1.2 Concept of Proxy Variable
1.2 Instances of Proxy Variable
1.2.1 Representation
1.2.2 Measurement
1.3 Estimation Using Multiple Proxy Variables
1.3.1 Asymmetric Setting
1.3.2 Uncertainty Evaluation: A Case of Two-Way Data
1.3.3 Symmetric Setting
1.4 Summary
References
2 Disclosure Limitation and Confidentiality Protection in Linked Data
John M. Abowd, Ian M. Schmutte, and Lars Vilhuber
2.1 Introduction
2.2 Paradigms of Protection
2.2.1 Input Noise Infusion
2.2.2 Formal Privacy Models
2.3 Confidentiality Protection in Linked Data: Examples
2.3.1 HRS–SSA
2.3.1.1 Data Description
2.3.1.2 Linkages to Other Data
2.3.1.3 Disclosure Avoidance Methods
2.3.2 SIPP–SSA–IRS (SSB)
2.3.2.1 Data Description
2.3.2.2 Disclosure Avoidance Methods
2.3.2.3 Disclosure Avoidance Assessment
2.3.2.4 Analytical Validity Assessment
2.3.3 LEHD: Linked Establishment and Employee Records
2.3.3.1 Data Description
2.3.3.2 Disclosure Avoidance Methods
2.3.3.3 Disclosure Avoidance Assessment for QWI
2.3.3.4 Analytical Validity Assessment for QWI
2.4 Physical and Legal Protections
2.4.1 Statistical Data Enclaves
2.4.2 Remote Processing
2.4.3 Licensing
2.4.4 Disclosure Avoidance Methods
2.4.5 Data Silos
2.5 Conclusions
2.A.1 Other Abbreviations
2.A.2 Concepts
Acknowledgments
References

Part II Data Quality of Administrative Records and Linking Methodology
3 Evaluation of the Quality of Administrative Data Used in the Dutch Virtual Census
Piet Daas, Eric S. Nordholt, Martijn Tennekes, and Saskia Ossen
3.1 Introduction
3.2 Data Sources and Variables
3.3 Quality Framework
3.3.1 Source and Metadata Hyper Dimensions
3.3.2 Data Hyper Dimension
3.4 Quality Evaluation Results for the Dutch 2011 Census
3.4.1 Source and Metadata: Application of Checklist
3.4.2 Data Hyper Dimension: Completeness and Accuracy Results
3.4.2.1 Completeness Dimension
3.4.2.2 Accuracy Dimension
3.4.2.3 Visualizing with a Tableplot
3.4.3 Discussion of the Quality Findings
3.5 Summary
3.6 Practical Implications for Implementation with Surveys and Censuses
3.7 Exercises
References
4 Improving Input Data Quality in Register-Based Statistics: The Norwegian Experience
Coen Hendriks
4.1 Introduction
4.2 The Use of Administrative Sources in Statistics Norway
4.3 Managing Statistical Populations
4.4 Experiences from the First Norwegian Purely Register-Based Population and Housing Census of 2011
4.5 The Contact with the Owners of Administrative Registers Was Put into System
4.5.1 Agreements on Data Processing
4.5.2 Agreements of Cooperation on Data Quality in Administrative Data Systems
4.5.3 The Forums for Cooperation
4.6 Measuring and Documenting Input Data Quality
4.6.1 Quality Indicators
4.6.2 Operationalizing the Quality Checks
4.6.3 Quality Reports
4.6.4 The Approach Is Being Adopted by the Owners of Administrative Data
4.7 Summary
4.8 Exercises
References
5 Cleaning and Using Administrative Lists: Enhanced Practices and Computational Algorithms for Record Linkage and Modeling/Editing/Imputation
William E. Winkler
5.1 Introductory Comments
5.1.1 Example 1
5.1.2 Example 2
5.1.3 Example 3
5.2 Edit/Imputation
5.2.1 Background
5.2.2 Fellegi–Holt Model
5.2.3 Imputation Generalizing Little–Rubin
5.2.4 Connecting Edit with Imputation
5.2.5 Achieving Extreme Computational Speed
5.3 Record Linkage
5.3.1 Fellegi–Sunter Model
5.3.2 Estimating Parameters
5.3.3 Estimating False Match Rates
5.3.3.1 The Data Files
5.3.4 Achieving Extreme Computational Speed
5.4 Models for Adjusting Statistical Analyses for Linkage Error
5.4.1 Scheuren–Winkler
5.4.2 Lahiri–Larsen
5.4.3 Chambers and Kim
5.4.4 Chipperfield, Bishop, and Campbell
5.4.4.1 Empirical Data
5.4.5 Goldstein, Harron, and Wade
5.4.6 Hof and Zwinderman
5.4.7 Tancredi and Liseo
5.5 Concluding Remarks
5.6 Issues and Some Related Questions
References
6 Assessing Uncertainty When Using Linked Administrative Records
Jerome P. Reiter
6.1 Introduction
6.2 General Sources of Uncertainty
6.2.1 Imperfect Matching
6.2.2 Incomplete Matching
6.3 Approaches to Accounting for Uncertainty
6.3.1 Modeling Matching Matrix as Parameter
6.3.2 Direct Modeling
6.3.3 Imputation of Entire Concatenated File
6.4 Concluding Remarks
6.4.1 Problems to Be Solved
6.4.2 Practical Implications
6.5 Exercises
Acknowledgment
References
7 Measuring and Controlling for Non-Consent Bias in Linked Survey and Administrative Data
Joseph W. Sakshaug
7.1 Introduction
7.1.1 What Is Linkage Consent? Why Is Linkage Consent Needed?
7.1.2 Linkage Consent Rates in Large-Scale Surveys
7.1.3 The Impact of Linkage Non-Consent Bias on Survey Inference
7.1.4 The Challenge of Measuring and Controlling for Linkage Non-Consent Bias
7.2 Strategies for Measuring Linkage Non-Consent Bias
7.2.1 Formulation of Linkage Non-Consent Bias
7.2.2 Modeling Non-Consent Using Survey Information
7.2.3 Analyzing Non-Consent Bias for Administrative Variables
7.3 Methods for Minimizing Non-Consent Bias at the Survey Design Stage
7.3.1 Optimizing Linkage Consent Rates
7.3.2 Placement of the Consent Request
7.3.3 Wording of the Consent Request
7.3.4 Active and Passive Consent Procedures
7.3.5 Linkage Consent in Panel Studies
7.4 Methods for Minimizing Non-Consent Bias at the Survey Analysis Stage
7.4.1 Controlling for Linkage Non-Consent Bias via Statistical Adjustment
7.4.2 Weighting Adjustments
7.4.3 Imputation
7.5 Summary
7.5.1 Key Points for Measuring Linkage Non-Consent Bias
7.5.2 Key Points for Controlling for Linkage Non-Consent Bias
7.6 Practical Implications for Implementation with Surveys and Censuses
7.7 Exercises
References

Part III Use of Administrative Records in Surveys

8 A Register-Based Census: The Swedish Experience
Martin Axelson, Anders Holmberg, Ingegerd Jansson, and Sara Westling
8.1 Introduction
8.2 Background
8.3 Census 2011
8.4 A Register-Based Census
8.4.1 Registers at Statistics Sweden
8.4.2 Facilitating a System of Registers
8.4.3 Introducing a Dwelling Identification Key
8.4.4 The Census Household and Dwelling Populations
8.5 Evaluation of the Census
8.5.1 Introduction
8.5.2 Evaluating Household Size and Type
8.5.2.1 Sampling Design
8.5.2.2 Data Collection
8.5.2.3 Reconciliation
8.5.2.4 Results
8.5.3 Evaluating Ownership
8.5.4 Lessons Learned
8.6 Impact on Population and Housing Statistics
8.7 Summary and Final Remarks
References
9 Administrative Records Applications for the 2020 Census
Vincent T. Mule Jr. and Andrew Keller
9.1 Introduction
9.2 Administrative Record Usage in the U.S. Census
9.3 Administrative Record Integration in 2020 Census Research
9.3.1 Administrative Record Usage Determinations
9.3.2 NRFU Design Incorporating Administrative Records
9.3.3 Administrative Records Sources and Data Preparation
9.3.4 Approach to Determine Administrative Record Vacant Addresses
9.3.5 Extension of Vacant Methodology to Nonexistent Cases
9.3.6 Approach to Determine Occupied Addresses
9.3.7 Other Aspects and Alternatives of Administrative Record Enumeration
9.4 Quality Assessment
9.4.1 Microlevel Evaluations of Quality
9.4.2 Macrolevel Evaluations of Quality
9.5 Other Applications of Administrative Record Usage
9.5.1 Register-Based Census
9.5.2 Supplement Traditional Enumeration with Adjustments for Estimated Error for Official Census Counts
9.5.3 Coverage Evaluation
9.6 Summary
9.7 Exercises
References
10 Use of Administrative Records in Small Area Estimation
Andreea L. Erciulescu, Carolina Franco, and Partha Lahiri
10.1 Introduction
10.2 Data Preparation
10.3 Small Area Estimation Models for Combining Information
10.3.1 Area-level Models
10.3.2 Unit-level Models
10.4 An Application
10.5 Concluding Remarks
10.6 Exercises
Acknowledgments
References

Part IV Use of Administrative Data in Evidence-Based Policymaking

11 Enhancement of Health Surveys with Data Linkage
Cordell Golden and Lisa B. Mirel
11.1 Introduction
11.1.1 The National Center for Health Statistics (NCHS)
11.1.2 The NCHS Data Linkage Program
11.1.3 Initial Linkages with NCHS Surveys
11.2 Examples of NCHS Health Surveys that Were Enhanced Through Linkage
11.2.1 National Health Interview Survey (NHIS)
11.2.2 National Health and Nutrition Examination Survey (NHANES)
11.2.3 National Health Care Surveys
11.3 NCHS Health Surveys Linked with Vital Records and Administrative Data
11.3.1 National Death Index (NDI)
11.3.2 Centers for Medicare and Medicaid Services (CMS)
11.3.3 Social Security Administration (SSA)
11.3.4 Department of Housing and Urban Development (HUD)
11.3.5 United States Renal Data System and the Florida Cancer Data System
11.4 NCHS Data Linkage Program: Linkage Methodology and Processing Issues
11.4.1 Informed Consent in Health Surveys
11.4.2 Informed Consent for Child Survey Participants
11.4.3 Adaptive Approaches to Linking Health Surveys with Administrative Data
11.4.4 Use of Alternate Records
11.4.5 Protecting the Privacy of Health Survey Participants and Maintaining Data Confidentiality
11.4.6 Updates Over Time
11.5 Enhancements to Health Survey Data Through Linkage
11.6 Analytic Considerations and Limitations of Administrative Data
11.6.1 Adjusting Sample Weights for Linkage-Eligibility
11.6.2 Residential Mobility and Linkages to State Programs and Registries
11.7 Future of the NCHS Data Linkage Program
11.8 Exercises
Acknowledgments
Disclaimer
References
12 Combining Administrative and Survey Data to Improve Income Measurement
Bruce D. Meyer and Nikolas Mittag
12.1 Introduction
12.2 Measuring and Decomposing Total Survey Error
12.3 Generalized Coverage Error
12.4 Item Nonresponse and Imputation Error
12.5 Measurement Error
12.6 Illustration: Using Data Linkage to Better Measure Income and Poverty
12.7 Accuracy of Links and the Administrative Data
12.8 Conclusions
12.9 Exercises
Acknowledgments
References
13 Combining Data from Multiple Sources to Define a Respondent: The Case of Education Data
Peter Siegel, Darryl Creel, and James Chromy
13.1 Introduction
13.1.1 Options for Defining a Unit Respondent When Data Exist from Sources Instead of or in Addition to an Interview
13.1.2 Concerns with Defining a Unit Respondent Without Having an Interview
13.2 Literature Review
13.3 Methodology
13.3.1 Computing Weights for Interview Respondents and for Unit Respondents Who May Not Have Interview Data (Usable Case Respondents)
13.3.1.1 How Many Weights Are Necessary?
13.3.2 Imputing Data When All or Some Interview Data Are Missing
13.3.3 Conducting Nonresponse Bias Analyses to Appropriately Consider Interview and Study Nonresponse
13.4 Example of Defining a Unit Respondent for the National Postsecondary Student Aid Study (NPSAS)
13.4.1 Overview of NPSAS
13.4.2 Usable Case Respondent Approach
13.4.2.1 Results
13.4.3 Interview Respondent Approach
13.4.3.1 Results
13.4.4 Comparison of Estimates, Variances, and Nonresponse Bias Using Two Approaches to Define a Unit Respondent
13.5 Discussion: Advantages and Disadvantages of Two Approaches to Defining a Unit Respondent
13.5.1 Interview Respondents
13.5.2 Usable Case Respondents
13.6 Practical Implications for Implementation with Surveys and Censuses
13.A Appendix
13.A.1 NPSAS:08 Study Respondent Definition
13.B Appendix
References

Index
Preface
Sample surveys are used by governments to describe the populations of their countries and provide estimates for use in policy decision making. Surveys can focus on individuals, households, businesses, students and schools, patients and hospitals, plots of land, or other entities. For surveys to be useful for official purposes they must cover the target population, represent the entirety of the population, collect information on key variables with accurate measurement methods, and have large enough sample sizes so that estimates are sufficiently precise at national and subnational levels. Achieving these four goals in a nationwide sample survey with a limited budget while being conducted in a short time interval is very challenging. The purpose of this book is to explore developments in the use of administrative records for improving sample surveys.
Sample surveys aim to gather information on a population. The target population is the specific part of the population that one aims to survey. Some parts of the broader population typically are excluded from the target population based on contact mode, data collection mode, the survey frame or list, or convenience. Individuals without a regular address, residing in some forms of group quarters, or without phone or Internet access, for example, might be effectively ineligible to serve as respondents. Survey frames record contact information and some other variables on members of a population, but of course they do not necessarily include all members of the population or have up-to-date information on everyone. Some individuals with accurate contact information in the frame will prove harder than others to contact or even refuse to participate. Surveys then are potentially limited to reporting about respondents and the population to which they are similar. Surveys cannot be overly long or else they risk deterring potential respondents and costing a lot of money per respondent. As a result, surveys can accommodate only so many questions. Self-report and less detailed questions, with their inherent limitations, for sensitive and complex items, often must be used for expediency. Budgets for national surveys compete with other government interests. Even large surveys typically have smaller-than-desired sample sizes in local areas and in subsets of the population. Despite these significant challenges, official statistical agencies around the world gather critically useful data on a myriad of topics.
The conditions for conducting sample surveys have changed immensely in the past 100 years. There is little chance that change will slow down. In-person surveys have been replaced and augmented by surveys by mail, by phone, and by Internet. Contact and data collection via multiple modes now are standard. The social environment, too, has evolved. Response rates are lower. Despite technological advances, people are increasingly busy. Official government surveys compete for attention with ever-more marketing and polling. Concerns over privacy and confidentiality have been elevated, rightly so, in the public consciousness. Simultaneously, government, researchers, and the public want more from data and surveys. Official surveys contribute to identifying challenges and to improvements in society. It is not practical, or maybe even possible, to get more out of old ways of conducting surveys.
Administrative records in a general sense are records kept for administrative purposes of the government. Administrative records can pertain to almost all aspects of life, including taxes, wages, education, health, residence, voting, crime, and property and business ownership. Does an individual have a license for a dog, for fishing at public lakes, to drive a car or motorcycle, or to own a gun? Does an individual receive public assistance through a government program? Administrative records, essential for government operations, contain a wealth of information on large segments of the population, but there are limitations. The records contain information on only some variables on subsets of the overall population. Information is collected so that a government can execute its program, but not typically for other purposes. Additional variables that might be interesting for study purposes likely are not recorded. Methods of recording variables might not be those that would be used in a scientific study. Those included in an administrative data file are not a random sample from the population. Some administrative records are collected over the course of several months or years, instead of only during a succinct time interval.
The use of administrative records has been part of the survey process for many decades. Survey textbooks since at least the 1960s (Cochran 1977; Kish 1967; Hansen, Hurwitz, and Madow 1953; Särndal, Swensson, and Wretman 1992) present methods for using auxiliary variables. It typically is assumed that values of auxiliary variables are available for all members of the population without error, or at least that aggregate totals are known. They might have come from a census, from a large survey at a previous time, or as part of the sample frame. Auxiliary variables are used for stratified surveys, probability proportional to size sampling, difference estimation, and ratio estimation. Often, they are treated in classic literature as known, fixed values.
Despite the limitations of administrative records, researchers, including the authors in this book, have been exploring how “adrecs” can be used to improve sample surveys in today’s world and build on the record of past successes. They have examined new possibilities for using administrative record information to address four goals (coverage, response, variables, and accuracy) of official surveys. Increasing timeliness and decreasing costs through use of administrative records also are of continuing interest.
The book is organized into four sections. The first section contains two chapters. Chapter 1, by Li-Chun Zhang, presents fundamental challenges and approaches to integrating survey and administrative data for statistical purposes. The chapter focuses on administrative data, also called register or registry data, as a source for proxy variables. The proxy variables obtained from administrative sources can, for example, enhance a survey by providing additional information, be used for quality assessment of responses, and provide substitutes for missing values. Chapter 2, by John Marion Abowd, Ian Schmutte, and Lars Vilhuber, addresses confidentiality protection and disclosure limitation in linked data. Linking data on population elements is an essential step for many uses of administrative records in conjunction with survey data. If individuals from a survey can be located uniquely in administrative records, then variables in those administrative records can be meaningfully associated with their originating units, thereby generating useful proxy variables. Data files from surveys, both from those linked to administrative information and those not, are made available to researchers and policy analysts. In standard practice, values of personally identifying information, such as names, fine-level geographic information including addresses, birth dates, and identification numbers, are suppressed. A data file containing a rich set of variables for analysis, however, increases the chance that someone could identify a unique individual from the survey in the population based on the values for several variables. The concern is that such an identification violates legal promises of confidentiality, causes harm to individuals who view their survey responses and administrative information as sensitive, and endangers future survey operations. Chapter 2 describes three applications, traditional statistical disclosure limitation methods, and new developments. The chapter includes discussion of how researchers access data (access modalities) and the usefulness (analytic validity) of data made available after modification for enhanced disclosure limitation.
Section 2 groups together five chapters on data quality and record linkage. Chapter 3, by Piet Daas, Eric Schulte Nordholt, Martijn Tennekes, and Saskia Ossen, examines the quality of administrative data used in the Dutch virtual census. A challenge in assessing quality of a data source is having better information on some variables for at least a subset of the population. Coen Hendriks, in Chapter 4, reports on improving the quality of data going into Norwegian register-based statistics. In Chapter 5, William Winkler considers a wide range of topics, from initial cleaning of data files to record linkage and integrated modeling, editing, and imputation. The impact of cleaning data files through standardizing variables, parsing variables such as addresses into separable components, and checking for logical errors cannot be overstated. Various approaches are in use for linking records from two files on the same population. Dr. Winkler reviews several enhancements, including variations in string comparator metrics and memory indexing, that have been put into practice at the U.S. Census Bureau. Jerry Reiter writes about assessing uncertainty when using administrative records in Chapter 6. Along with survey estimates, one typically needs to provide estimates of standard error. How do the quality of administrative records and the performance of the linkage to the survey impact the accuracy of estimates? Multiple imputation (Rubin 1986, 1987) could be one area for further exploration. In Chapter 7, Joseph Sakshaug addresses the specific question of measuring and controlling non-consent bias when surveys and administrative data are linked together. It is increasingly common for surveys that plan to link respondents to administrative data to ask for permission to do so. Some individuals refuse to give permission for linkage or cannot be linked due to other reasons, such as refusing to provide information on key linkage variables. Those whose records are not linkable can be different in many ways from those whose records are. Bias due to non-consent to linkage and failed linkage is therefore a novel contributing factor to total survey error.
Section 3 contains three articles on uses of administrative records in surveys and official statistics. Chapter 8 by Ingegerd Jansson, Martin Axelson, Anders Holmberg, Peter Werner, and Sara Westling describes experiences in the first Swedish register-based census of the population. In a register-based census, the population is counted and characteristics are gathered directly from administrative records, which, in this case, are referred to as population registers. Chapter 9 by Vincent Tom Mule and Andrew Keller of the U.S. Census Bureau presents research on administrative records applications for the U.S. 2020 Decennial Census of the population. In the U.S., there is no universal population register and the census involves enumerating and gathering basic information on every person in the country. Administrative records have been used to improve the data gathering process in the past. This chapter describes expanded options for improved design, quality and accuracy assessment, and dealing with missing information.
Chapter 10 by Andreea Erciulescu, Carolina Franco, and Partha Lahiri concerns methods for improving small area estimation using administrative records. Surveys are designed to provide accurate estimates at a national or large subnational level, but not typically for small geographic areas or groups. Small area estimation uses models that provide a rationale for borrowing strength of sample across small areas for local estimation. The methodology relies on an advantageous bias–variance trade-off and estimation admissibility ideas (e.g. Efron and Morris 1975). Administrative records can provide key variables for use in such models.
Section 4 looks beyond statistical methodology for use of administrative records with surveys and provides three articles about using administrative data in evidence-based policymaking. The applications are in health, economics, and education. Chapter 11, by Cordell Golden and Lisa Mirel, focuses on enhancement of health surveys at the U.S. National Center for Health Statistics through data linkage. Chapter 12, by Bruce Meyer and Nikolas Mittag, concerns economic policy analysis, with an emphasis on using administrative records to improve income measurements. Chapter 13, by Peter Siegel, Darryl Creel, and James Chromy, discusses combining data from multiple sources in the context of education studies.
The book is intended for a diverse audience. It should provide insight into developments in many areas and in many countries for those conducting surveys and their partners who manage and seek to improve administrative records. Several articles present theory as well as application and advice based on practical experience. Many chapters in the book include exercises for reflection on the material presented. The book could be of interest to students of statistics, survey sampling and methodology, and quantitative applications in government. Certainly, the book will have useful chapters for a variety of courses.
Data science has emerged as a term for the integration of statistics, mathematics, and computing in the effort to solve complex problems. Administrative records along with large-scale sample surveys provide a setting for the best applications in data science. This book hopefully will motivate those in the data science community to learn about survey sampling, official statistics, and a rich body of work aiming to utilize administrative records for sample surveys and survey methodology.
23 May 2020

Asaph Young Chun
Statistics Research Institute
Statistics Korea, Republic of Korea

Michael D. Larsen
Department of Mathematics and Statistics
Saint Michael’s College, United States

Gabriele Durrant
Department of Social Statistics and Demography
Southampton University, UK

Jerome P. Reiter
Department of Statistical Science
Duke University, United States
References

Cochran, W.G. (1977). Sampling Techniques, 3e. Wiley.
Efron, B. and Morris, C. (1975). Data analysis using Stein’s estimator and its generalizations. Journal of the American Statistical Association 70 (350): 311–319.
Hansen, M.H., Hurwitz, W.N., and Madow, W.G. (1953). Sample Survey Methods and Theory, Volume 1: Methods and Applications; Volume 2: Theory. Wiley.
Kish, L. (1967). Survey Sampling, 2e. Wiley.
Rubin, D.B. (1986). Statistical matching using file concatenation with adjusted weights and multiple imputations. Journal of Business and Economic Statistics 4: 87–94.
Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley.
Särndal, C.-E., Swensson, B., and Wretman, J. (1992). Model Assisted Survey Sampling. Springer Series in Statistics: Springer.
Acknowledgments
The origin of this book can be traced to the 2017 meeting of the European Survey Research Association and the session “Administrative Records for Survey Methodology” (https://www.europeansurveyresearch.org/conference/programme2017?sess=81). Dr. Asaph Young Chun (then of the U.S. Bureau of the Census) was the lead organizer and chair. Additional coordinators of that session included Drs. Michael Larsen (then at George Washington University, Washington, DC), Ingegerd Jansson (Statistics Sweden), Manfred Antoni (Institute for Employment Research, IAB, Germany), and Daniel Fuss and Corinna Kleinert (Leibniz Institute for Educational Trajectories, Germany). Papers presented at the conference included “Evaluation of the Quality of Administrative Data Used in the Dutch Virtual Census” (Schulte et al. 2017), “Evaluating the Accuracy of Administrative Data to Augment Survey Responses” (Berzofsky, Zimmer, and Smith 2017), and “Assessing Administrative Data Quality: The Truth is Out There” (Chun and Porter 2017).
Dr. Chun with Dr. Larsen proposed the book entitled Administrative Records for Survey Methodology to Wiley publishing. The intent of the book was to follow on the conference and reach further into topics and applications in additional countries and disciplines. Dr. Jerry Reiter (Duke University) and Dr. Gabriele Durrant (University of Southampton) joined the team as assistant editors. Since the inception of this book, Dr. Chun has joined Statistics Korea and Dr. Larsen has moved to Saint Michael’s College in Vermont.
The topics described by authors in this book have been described by these authors and others at international conferences since the 2017 ESRA meeting. Dr. Chun organized panel sessions at the Joint Statistical Meetings in 2019 entitled “Linked Data to Advance Evidence Building in Public Policy” (https://ww2.amstat.org/meetings/jsm/2019/onlineprogram/ActivityDetails.cfm?SessionID=218399) and in 2018 entitled “Administrative Records for Survey Methodology and Evidence Building” (https://ww2.amstat.org/meetings/jsm/2018/onlineprogram/ActivityDetails.cfm?SessionID=215012). Some contributors to the current book participated in these panels.
We wish to thank individuals who have contributed toward bringing this volume to fruition. Many people have worked together to make this book possible. First, editors made comments and suggestions to improve the several chapters in this book. Second, a few individuals served as anonymous reviewers on individual chapters. Third, eight anonymous reviews on the overall scheme of the book were provided by the publisher Wiley. Fourth, the individual authors have been attentive to comments and suggestions from the editors and reviewers and generous with their time in improving their contributions. Fifth, authors and reviewers have contributed to these efforts with the support of their government agencies, educational institutions, sponsored funding organizations, and companies. Together all involved have made the present work a reality. We apologize if we have failed to mention any contributors.
Finally, we wish to thank individuals at Wiley who agreed to publish this manuscript and who have helped us along the way. Their reminders of deadlines and encouragements have kept us going through some transitions. Specifically, we wish to thank Associate Editor Kathleen Santoloci, Project Editors Blesy Regulas and Linda Christina, support person Mindy Okura-Marszycki, Managing Editor Kimberly Monroe-Hill, and Content Refinement Specialist Viniprammia Premkumar of Wiley Knowledge & Learning.
We hope you find the chapters in this book interesting and useful. We look forward to new developments with the use of administrative records and other data sources with sample surveys.
References
Berzofsky, M., Zimmer, S., and Smith, T. (2017). Evaluating the accuracy of administrative data to augment survey responses. Presentation at the 7th Conference of the European Survey Research Association (ESRA).
Chun, A.Y. and Porter, S. (2017). Assessing administrative data quality: the truth is out there. Presentation at the 7th Conference of the European Survey Research Association (ESRA).
Schulte, E., Daas, P., Tennekes, M., and Ossen, S. (2017). Evaluation of the quality of administrative data used in the Dutch virtual census. Presentation at the 7th Conference of the European Survey Research Association (ESRA).
List of Contributors

John M. Abowd
U.S. Census Bureau
4600 Silver Hill Road
Washington, DC 20233
USA
Cornell University
Ithaca, NY 14853
USA

Martin Axelson
Statistics Sweden
Box 24300
Stockholm SE-10451
Sweden

James Chromy
RTI International
Research Triangle Park, NC 27709
USA

Darryl Creel
RTI International
Rockville, MD 20852
USA

Piet Daas
Statistics Netherlands
CBS-weg 11
Heerlen
the Netherlands, 6412 EX

Andreea L. Erciulescu
Westat
Rockville, MD 20850
USA

Carolina Franco
U.S. Census Bureau
4600 Silver Hill Road
Washington, DC 20233
USA

Cordell Golden
U.S. National Center for Health Statistics (NCHS)
3311 Toledo Road
Hyattsville, MD 20782
USA

Coen Hendriks
Statistics Norway
Akersveien 26
0177 Oslo
Norway

Anders Holmberg
Statistics Sweden
Box 24300
Stockholm SE-10451
Sweden

Ingegerd Jansson
Statistics Sweden
Box 24300
Stockholm SE-10451
Sweden

Andrew Keller
U.S. Census Bureau
4600 Silver Hill Road
Washington, DC 20233
USA

Partha Lahiri
University of Maryland
College Park
Maryland 20742
USA

Bruce D. Meyer
U.S. Census Bureau
4600 Silver Hill Road
Washington, DC 20233
USA
University of Chicago
1307 E. 60th Street
Chicago, IL 60637
USA

Lisa B. Mirel
U.S. National Center for Health Statistics
3311 Toledo Road
Hyattsville, MD 20782
USA

Nikolas Mittag
CERGE-EI
Politických vězňů 7
11121 Prague 1
Czech Republic

Vincent T. Mule Jr.
U.S. Census Bureau
4600 Silver Hill Road
Washington, DC 20233
USA

Eric S. Nordholt
Statistics Netherlands
CBS-weg 11
Heerlen
the Netherlands, 6412 EX

Saskia Ossen
Statistics Netherlands
CBS-weg 11
Heerlen
the Netherlands, 6412 EX

Jerome P. Reiter
Duke University
Durham, NC 27708
USA

Joseph W. Sakshaug
Institute for Employment Research
Regensburger Str. 104
90478 Nuremberg
Germany
Ludwig Maximilian University of Munich
Ludwigstr. 33
80539 Munich
Germany

Ian M. Schmutte
University of Georgia
Athens, GA 30602
USA

Peter Siegel
RTI International
Research Triangle Park, NC 27709
USA

Martijn Tennekes
Statistics Netherlands
CBS-weg 11
Heerlen
the Netherlands, 6412 EX

Lars Vilhuber
Cornell University
Ithaca, NY 14853
USA

Sara Westling
Statistics Sweden
Box 24300
Stockholm SE-10451
Sweden

William E. Winkler
U.S. Census Bureau
4600 Silver Hill Road
Washington, DC 20233
USA

Li-Chun Zhang
University of Southampton
Southampton SO17 1BJ
UK