Tille, Y: Sampling and Estimation from Finite Populations (Wiley Series in Probability and Statistics) Yves Tille

Sampling and Estimation from Finite Populations

WILEY SERIES IN PROBABILITY AND STATISTICS

Established by WALTER A. SHEWHART and SAMUEL S. WILKS

Editors: Noel Cressie, Garrett Fitzmaurice, David Balding, Geert Molenberghs, Geof Givens, Harvey Goldstein, David Scott, Adrian Smith, Ruey Tsay.

Sampling and Estimation from Finite Populations

Yves Tillé

Université de Neuchâtel

Switzerland

Most of this book has been translated from French by Ilya Hekimi

Original French title: Théorie des sondages: Échantillonnage et estimation en populations finies

This edition first published 2020

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Yves Tillé to be identified as the author of this work has been asserted in accordance with law.

Registered Offices

John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office

9600 Garsington Road, Oxford, OX4 2DQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty

While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data

Names: Tillé, Yves, author. | Hekimi, Ilya, translator.

Title: Sampling and estimation from finite populations / Yves Tillé; most of this book has been translated from French by Ilya Hekimi.

Other titles: Théorie des sondages. English

Description: Hoboken, NJ: Wiley, [2020] | Series: Wiley series in probability and statistics applied. Probability and statistics section | Translation of: Théorie des sondages: échantillonnage et estimation en populations finies. | Includes bibliographical references and index.

Identifiers: LCCN 2019048451 | ISBN 9780470682050 (hardback) | ISBN 9781119071266 (adobe pdf) | ISBN 9781119071273 (epub)

Subjects: LCSH: Sampling (Statistics) | Public opinion polls–Statistical methods. | Estimation theory.

Classification: LCC QA276.6 .T62813 2020 | DDC 519.5/2–dc23

LC record available at https://lccn.loc.gov/2019048451

Cover Design: Wiley

Cover Image: © gremlin/Getty Images

Set in 10/12pt WarnockPro by SPi Global, Chennai, India

Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY

10 9 8 7 6 5 4 3 2 1

Contents

List of Figures

List of Tables

List of Algorithms

Preface

Preface to the First French Edition

Table of Notations

1 A History of Ideas in Survey Sampling Theory

1.1 Introduction

1.2 Enumerative Statistics During the 19th Century

1.3 Controversy on the use of Partial Data

1.4 Development of a Survey Sampling Theory

1.5 The US Elections of 1936

1.6 The Statistical Theory of Survey Sampling

1.7 Modeling the Population

1.8 Attempt to a Synthesis

1.9 Auxiliary Information

1.10 Recent References and Development

2 Population, Sample, and Estimation

2.1 Population

2.2 Sample

2.3 Inclusion Probabilities

2.4 Parameter Estimation

2.5 Estimation of a Total

2.6 Estimation of a Mean

2.7 Variance of the Total Estimator

2.8 Sampling with Replacement

Exercises

3 Simple and Systematic Designs

3.1 Simple Random Sampling without Replacement with Fixed Sample Size

3.1.1 Sampling Design and Inclusion Probabilities

3.1.2 The Expansion Estimator and its Variance

3.1.3 Comment on the Variance–Covariance Matrix

3.2 Bernoulli Sampling

3.2.1 Sampling Design and Inclusion Probabilities

3.2.2 Estimation

3.3 Simple Random Sampling with Replacement

3.4 Comparison of the Designs with and Without Replacement

3.5 Sampling with Replacement and Retaining Distinct Units

3.5.1 Sample Size and Sampling Design

3.5.2 Inclusion Probabilities and Estimation

3.5.3 Comparison of the Estimators

3.6 Inverse Sampling with Replacement

3.7 Estimation of Other Functions of Interest

3.7.1 Estimation of a Count or a Proportion

3.7.2 Estimation of a Ratio

3.8 Determination of the Sample Size

3.9 Implementation of Simple Random Sampling Designs

3.9.1 Objectives and Principles

3.9.2 Bernoulli Sampling

3.9.3 Successive Drawing of the Units

3.9.4 Random Sorting Method

3.9.5 Selection–Rejection Method

3.9.6 The Reservoir Method

3.9.7 Implementation of Simple Random Sampling with Replacement

3.10 Systematic Sampling with Equal Probabilities

3.11 Entropy for Simple and Systematic Designs

3.11.1 Bernoulli Designs and Entropy

3.11.2 Entropy and Simple Random Sampling

3.11.3 General Remarks

Exercises

4 Stratification

4.1 Population and Strata

4.2 Sample, Inclusion Probabilities, and Estimation

4.3 Simple Stratified Designs

4.4 Stratified Design with Proportional Allocation

4.5 Optimal Stratified Design for the Total

4.6 Notes About Optimality in Stratification

4.7 Power Allocation

4.8 Optimality and Cost

4.9 Smallest Sample Size

4.10 Construction of the Strata

4.10.1 General Comments

4.10.2 Dividing a Quantitative Variable in Strata

4.11 Stratification Under Many Objectives

Exercises

5 Sampling with Unequal Probabilities

5.1 Auxiliary Variables and Inclusion Probabilities

5.2 Calculation of the Inclusion Probabilities

5.3 General Remarks

5.4 Sampling with Replacement with Unequal Inclusion Probabilities

5.5 Nonvalidity of the Generalization of the Successive Drawing without Replacement

5.6 Systematic Sampling with Unequal Probabilities

5.7 Deville’s Systematic Sampling

5.8 Poisson Sampling

5.9 Maximum Entropy Design

5.10 Rao–Sampford Rejective Procedure

5.11 Order Sampling

5.12 Splitting Method

5.12.1 General Principles

5.12.2 Minimum Support Design

5.12.3 Decomposition into Simple Random Sampling Designs

5.12.4 Pivotal Method

5.12.5 Brewer Method

5.13 Choice of Method

5.14 Variance Approximation

5.15 Variance Estimation

Exercises

6 Balanced Sampling

6.1 Introduction

6.2 Balanced Sampling: Definition

6.3 Balanced Sampling and Linear Programming

6.4 Balanced Sampling by Systematic Sampling

6.5 Method of Deville, Grosbras, and Roth

6.6 Cube Method

6.6.1 Representation of a Sampling Design in the form of a Cube

6.6.2 Constraint Subspace

6.6.3 Representation of the Rounding Problem

6.6.4 Principle of the Cube Method

6.6.5 The Flight Phase

6.6.6 Landing Phase by Linear Programming

6.6.7 Choice of the Cost Function

6.6.8 Landing Phase by Relaxing Variables

6.6.9 Quality of Balancing

6.6.10 An Example

6.7 Variance Approximation

6.8 Variance Estimation

6.9 Special Cases of Balanced Sampling

6.10 Practical Aspects of Balanced Sampling

Exercise

7 Cluster and Two-stage Sampling

7.1 Cluster Sampling

7.1.1 Notation and Definitions

7.1.2 Cluster Sampling with Equal Probabilities

7.1.3 Sampling Proportional to Size

7.2 Two-stage Sampling

7.2.1 Population, Primary, and Secondary Units

7.2.2 The Expansion Estimator and its Variance

7.2.3 Sampling with Equal Probability

7.2.4 Self-weighting Two-stage Design

7.3 Multi-stage Designs

7.4 Selecting Primary Units with Replacement

7.5 Two-phase Designs

7.5.1 Design and Estimation

7.5.2 Variance and Variance Estimation

7.6 Intersection of Two Independent Samples

Exercises

8 Other Topics on Sampling

8.1 Spatial Sampling

8.1.1 The Problem

8.1.2 Generalized Random Tessellation Stratified Sampling

8.1.3 Using the Traveling Salesman Method

8.1.4 The Local Pivotal Method

8.1.5 The Local Cube Method

8.1.6 Measures of Spread

8.2 Coordination in Repeated Surveys

8.2.1 The Problem

8.2.2 Population, Sample, and Sample Design

8.2.3 Sample Coordination and Response Burden

8.2.4 Poisson Method with Permanent Random Numbers

8.2.5 Kish and Scott Method for Stratified Samples

8.2.6 The Cotton and Hesse Method

8.2.7 The Rivière Method

8.2.8 The Netherlands Method

8.2.9 The Swiss Method

8.2.10 Coordinating Unequal Probability Designs with Fixed Size

8.2.11 Remarks

8.3 Multiple Survey Frames

8.3.1 Introduction

8.3.2 Calculating Inclusion Probabilities

8.3.3 Using Inclusion Probability Sums

8.3.4 Using a Multiplicity Variable

8.3.5 Using a Weighted Multiplicity Variable

8.3.6 Remarks

8.4 Indirect Sampling

8.4.1 Introduction

8.4.2 Adaptive Sampling

8.4.3 Snowball Sampling

8.4.4 Indirect Sampling

8.4.5 The Generalized Weight Sharing Method

8.5 Capture–Recapture

9 Estimation with a Quantitative Auxiliary Variable

9.1 The Problem

9.2 Ratio Estimator

9.2.1 Motivation and Definition

9.2.2 Approximate Bias of the Ratio Estimator

9.2.3 Approximate Variance of the Ratio Estimator

9.2.4 Bias Ratio

9.2.5 Ratio and Stratified Designs

9.3 The Difference Estimator

9.4 Estimation by Regression

9.5 The Optimal Regression Estimator

9.6 Discussion of the Three Estimation Methods

Exercises

10 Post-Stratification and Calibration on Marginal Totals

10.1 Introduction

10.2 Post-Stratification

10.2.1 Notation and Definitions

10.2.2 Post-Stratified Estimator

10.3 The Post-Stratified Estimator in Simple Designs

10.3.1 Estimator

10.3.2 Conditioning in a Simple Design

10.3.3 Properties of the Estimator in a Simple Design

10.4 Estimation by Calibration on Marginal Totals

10.4.1 The Problem

10.4.2 Calibration on Marginal Totals

10.4.3 Calibration and Kullback–Leibler Divergence

10.4.4 Raking Ratio Estimation

10.5 Example

Exercises

11 Multiple Regression Estimation

11.1 Introduction

11.2 Multiple Regression Estimator

11.3 Alternative Forms of the Estimator

11.3.1 Homogeneous Linear Estimator

11.3.2 Projective Form

11.3.3 Cosmetic Form

11.4 Calibration of the Multiple Regression Estimator

11.5 Variance of the Multiple Regression Estimator

11.6 Choice of Weights

11.7 Special Cases

11.7.1 Ratio Estimator

11.7.2 Post-stratified Estimator

11.7.3 Regression Estimation with a Single Explanatory Variable

11.7.4 Optimal Regression Estimator

11.7.5 Conditional Estimation

11.8 Extension to Regression Estimation

Exercise

12 Calibration Estimation

12.1 Calibrated Methods

12.2 Distances and Calibration Functions

12.2.1 The Linear Method

12.2.2 The Raking Ratio Method

12.2.3 Pseudo Empirical Likelihood

12.2.4 Reverse Information

12.2.5 The Truncated Linear Method

12.2.6 General Pseudo-Distance

12.2.7 The Logistic Method

12.2.8 Deville Calibration Function

12.2.9 Roy and Vanheuverzwyn Method

12.3 Solving Calibration Equations

12.3.1 Solving by Newton’s Method

12.3.2 Bound Management

12.3.3 Improper Calibration Functions

12.3.4 Existence of a Solution

12.4 Calibrating on Households and Individuals

12.5 Generalized Calibration

12.5.1 Calibration Equations

12.5.2 Linear Calibration Functions

12.6 Calibration in Practice

12.7 An Example

Exercises

13 Model-Based approach

13.1 Model Approach

13.2 The Model

13.3 Homoscedastic Constant Model

13.4 Heteroscedastic Model 1 Without Intercept

13.5 Heteroscedastic Model 2 Without Intercept

13.6 Univariate Homoscedastic Linear Model

13.7 Stratified Population

13.8 Simplified Versions of the Optimal Estimator

13.9 Completed Heteroscedasticity Model

13.10 Discussion

13.11 An Approach that is Both Model- and Design-based

14 Estimation of Complex Parameters

14.1 Estimation of a Function of Totals

14.2 Variance Estimation

14.3 Covariance Estimation

14.4 Implicit Function Estimation

14.5 Cumulative Distribution Function and Quantiles

14.5.1 Cumulative Distribution Function Estimation

14.5.2 Quantile Estimation: Method 1

14.5.3 Quantile Estimation: Method 2

14.5.4 Quantile Estimation: Method 3

14.5.5 Quantile Estimation: Method 4

14.6 Cumulative Income, Lorenz Curve, and Quintile Share Ratio

14.6.1 Cumulative Income Estimation

14.6.2 Lorenz Curve Estimation

14.6.3 Quintile Share Ratio Estimation

14.7 Gini Index

14.8 An Example

15 Variance Estimation by Linearization

15.1 Introduction

15.2 Orders of Magnitude in Probability

15.3 Asymptotic Hypotheses

15.3.1 Linearizing a Function of Totals

15.3.2 Variance Estimation

15.4 Linearization of Functions of Interest

15.4.1 Linearization of a Ratio

15.4.2 Linearization of a Ratio Estimator

15.4.3 Linearization of a Geometric Mean

15.4.4 Linearization of a Variance

15.4.5 Linearization of a Covariance

15.4.6 Linearization of a Vector of Regression Coefficients

15.5 Linearization by Steps

15.5.1 Decomposition of Linearization by Steps

15.5.2 Linearization of a Regression Coefficient

15.5.3 Linearization of a Univariate Regression Estimator

15.5.4 Linearization of a Multiple Regression Estimator

15.6 Linearization of an Implicit Function of Interest

15.6.1 Estimating Equation and Implicit Function of Interest

15.6.2 Linearization of a Logistic Regression Coefficient

15.6.3 Linearization of a Calibration Equation Parameter

15.6.4 Linearization of a Calibrated Estimator

15.7 Influence Function Approach

15.7.1 Function of Interest, Functional

15.7.2 Definition

15.7.3 Linearization of a Total

15.7.4 Linearization of a Function of Totals

15.7.5 Linearization of Sums and Products

15.7.6 Linearization by Steps

15.7.7 Linearization of a Parameter Defined by an Implicit Function

15.7.8 Linearization of a Double Sum

15.8 Binder’s Cookbook Approach

15.9 Demnati and Rao Approach

15.10 Linearization by the Sample Indicator Variables

15.10.1 The Method

15.10.2 Linearization of a Quantile

15.10.3 Linearization of a Calibrated Estimator

15.10.4 Linearization of a Multiple Regression Estimator

15.10.5 Linearization of an Estimator of a Complex Function with Calibrated Weights

15.10.6 Linearization of the Gini Index

15.11 Discussion on Variance Estimation

Exercises

16 Treatment of Nonresponse

16.1 Sources of Error

16.2 Coverage Errors

16.3 Different Types of Nonresponse

16.4 Nonresponse Modeling

16.5 Treating Nonresponse by Reweighting

16.5.1 Nonresponse Coming from a Sample

16.5.2 Modeling the Nonresponse Mechanism

16.5.3 Direct Calibration of Nonresponse

16.5.4 Reweighting by Generalized Calibration

16.6 Imputation

16.6.1 General Principles

16.6.2 Imputing From an Existing Value

16.6.3 Imputation by Prediction

16.6.4 Link Between Regression Imputation and Reweighting

16.6.5 Random Imputation

16.7 Variance Estimation with Nonresponse

16.7.1 General Principles

16.7.2 Estimation by Direct Calibration

16.7.3 General Case

16.7.4 Variance for Maximum Likelihood Estimation

16.7.5 Variance for Estimation by Calibration

16.7.6 Variance of an Estimator Imputed by Regression

16.7.7 Other Variance Estimation Techniques

17 Summary Solutions to the Exercises

Bibliography

Author Index

Subject Index

List of Figures

Figure 1.1 Auxiliary information can be used before or after data collection to improve estimations

Figure 4.1 Stratified design: the samples are selected independently from one stratum to another

Figure 5.1 Systematic sampling: example with inclusion probabilities $\pi_1 = 0.2$, $\pi_2$, …

Figure 5.2 Method of Deville

Figure 5.3 Splitting into two parts

Figure 5.4 Splitting in $M$ parts

Figure 5.5 Minimum support design

Figure 5.6 Decomposition into simple random sampling designs

Figure 5.7 Pivotal method applied on vector $\boldsymbol{\pi} = (0.3, 0.4, 0.6, 0.7)^\top$

Figure 6.1 Possible samples in a population of size $N = 3$

Figure 6.2 Fixed size constraint: the three samples of size $n = 2$ are connected by an affine subspace

Figure 6.3 None of the vertices of $K$ is a vertex of the cube

Figure 6.4 Two vertices of $K$ are vertices of the cube, but the third is not

Figure 6.5 Flight phase in a population of size $N = 3$ with a constraint of fixed size $n = 2$

Figure 7.1 Cluster sampling: the population is divided into clusters. Clusters are randomly selected. All units from the selected clusters are included in the sample

Figure 7.2 Two-stage sampling design: we randomly select primary units in which we select a sample of secondary units

Figure 7.3 Two-phase design: a sample $S_b$ is selected in sample $S_a$

Figure 7.4 The sample $S$ is the intersection of samples $S_A$ and $S_B$

Figure 8.1 In a $40 \times 40$ grid, a systematic sample and a stratified sample with one unit per stratum are selected

Figure 8.2 Recursive quadrant function used for the GRTS method with three subdivisions

Figure 8.3 Original function with four random permutations

Figure 8.4 Samples of 64 points in a grid of $40 \times 40 = 1600$ points using simple designs, GRTS, the local pivotal method, and the local cube method

Figure 8.5 Sample of 64 points in a grid of $40 \times 40 = 1600$ points and Voronoï polygons. Applications to simple, systematic, and stratified designs, the local pivotal method, and the local cube method

Figure 8.6 Interval corresponding to the first wave (extract from Qualité, 2009)

Figure 8.7 Positive coordination when $\pi_k^2 \le \pi_k^1$ (extract from Qualité, 2009)

Figure 8.8 Positive coordination when $\pi_k^2 \ge \pi_k^1$ (extract from Qualité, 2009)

Figure 8.9 Negative coordination when $\pi_k^1 + \pi_k^2 \le 1$ (extract from Qualité, 2009)

Figure 8.10 Negative coordination when $\pi_k^1 + \pi_k^2 \ge 1$ (extract from Qualité, 2009)

Figure 8.11 Coordination of a third sample (extract from Qualité, 2009)

Figure 8.12 Two survey frames $U_A$ and $U_B$ cover the population. In each one, we select a sample

Figure 8.13 In this example, the points represent contaminated trees. During the initial sampling, the shaded squares are selected. The borders in bold surround the final selected zones

Figure 8.14 Example of indirect sampling. In population $U_A$, the units surrounded by a circle are selected. Two clusters ($U_{B1}$ and $U_{B3}$) of population $U_B$ each contain at least one unit that has a link with a unit selected in population $U_A$. Units of $U_B$ surrounded by a circle are selected at the end

Figure 9.1 Ratio estimator: observations aligned along a line passing through the origin

Figure 9.2 Difference estimator: observations aligned along a line of slope equal to 1

Figure 10.1 Post-stratification: the population is divided in post-strata, but the sample is selected without taking post-strata into account

Figure 12.1 Linear method: pseudo-distance $G(w_k, d_k)$ with $q_k = 1$ and $d_k = 10$

Figure 12.2 Linear method: function $g(w_k, d_k)$ with $q_k = 1$ and $d_k = 10$

Figure 12.3 Linear method: function $F_k(u)$ with $q_k = 1$

Figure 12.4 Raking ratio: pseudo-distance $G(w_k, d_k)$ with $q_k = 1$ and $d_k = 10$

Figure 12.5 Raking ratio: function $g(w_k, d_k)$ with $q_k = 1$ and $d_k = 10$

Figure 12.6 Raking ratio: function $F_k(u)$ with $q_k = 1$

Figure 12.7 Reverse information: pseudo-distance $G(w_k, d_k)$ with $q_k = 1$ and $d_k = 10$

Figure 12.8 Reverse information: function $g(w_k, d_k)$ with $q_k = 1$ and $d_k = 10$

Figure 12.9 Reverse information: function $F_k(u)$ with $q_k = 1$

Figure 12.10 Truncated linear method: pseudo-distance $G(w_k, d_k)$ with $q_k = 1$, $d_k = 10$, $L = 0.2$, and $H = 2.5$

Figure 12.11 Truncated linear method: function $g(w_k, d_k)$ with $q_k = 1$, $d_k = 10$, $L = 0.2$, and $H = 2.5$

Figure 12.12 Truncated linear method: calibration function $F_k(u)$ with $q_k = 1$, $d_k = 10$, $L = 0.2$, and $H = 2.5$

Figure 12.13 Pseudo-distances $G^\alpha(w_k, d_k)$ with $\alpha = -1, 0, 1/2, 1, 2, 3$ and $d_k = 2$

Figure 12.14 Calibration functions $F_k^\alpha(u)$ with $\alpha = -1, 0, 1/2, 1, 2, 3$ and $q_k = 1$

Figure 12.15 Logistic method: pseudo-distance $G(w_k, d_k)$ with $q_k = 1$, $d_k = 10$, $L = 0.2$, and $H = 2.5$

Figure 12.16 Logistic method: function $g(w_k, d_k)$ with $q_k = 1$, $d_k = 10$, $L = 0.2$, and $H = 2.5$

Figure 12.17 Logistic method: calibration function $F_k(u)$ with $q_k = 1$, $L = 0.2$, and $H = 2.5$

Figure 12.18 Deville calibration: pseudo-distance $G(w_k, d_k)$ with $q_k = 1$, $d_k = 10$

Figure 12.19 Deville calibration: calibration function $F_k(u)$ with $q_k = 1$

Figure 12.20 Pseudo-distances $G^\alpha(w_k, d_k)$ of Roy and Vanheuverzwyn with $\alpha = 0, 1, 2, 3$, $d_k = 2$, and $q_k = 1$

Figure 12.21 Calibration function $\tilde{F}_k^\alpha(u)$ of Roy and Vanheuverzwyn with $\alpha = 0, 1, 2, 3$ and $q_k = 1$

Figure 12.22 Variation of the $g$-weights for different calibration methods as a function of their rank

Figure 13.1 Total taxable income in millions of euros with respect to the number of inhabitants in Belgian municipalities of 100 000 people or less in 2004 (Source: Statbel). The cloud of points is aligned along a line going through the origin

Figure 14.1 Step cumulative distribution function $\widehat{F}_1(x)$ with corresponding quartiles

Figure 14.2 Cumulative distribution function $\widehat{F}_2(y)$ obtained by interpolation of points $(y_k, F_1(y_k))$ with corresponding quartiles

Figure 14.3 Cumulative distribution function $\widehat{F}_3(x)$ obtained by interpolating the center of the risers with corresponding quartiles

Figure 14.4 Lorenz curve and the surface associated with the Gini index

Figure 16.1 Two-phase approach for nonresponse. The set of respondents $R$ is a subset of sample $S$

Figure 16.2 The reversed approach for nonresponse. The sample of nonrespondents $R$ is independent of the selected sample $S$

List of Tables

Table 3.1 Simple designs: summary table

Table 3.2 Example of sample sizes required for different population sizes and different values of $b$ for $\alpha = 0.05$ and $\widehat{P} = 1/2$

Table 4.1 Application of optimal allocation: the sample size is larger than the population size in the third stratum

Table 4.2 Second application of optimal allocation in strata 1 and 2

Table 5.1 Minimum support design

Table 5.2 Decomposition into simple random sampling designs

Table 5.3 Decomposition into $N$ simple random sampling designs

Table 5.4 Properties of the methods

Table 6.1 Population of 20 students with variables: constant, gender (1 = male, 2 = female), age, and a mark out of 20 in a statistics exam

Table 6.2 Totals and expansion estimators for balancing variables

Table 6.3 Variances of the expansion estimators of the means under simple random sampling and balanced sampling

Table 7.1 Block number, number of households, and total household income

Table 8.1 Means of spatial balancing measures based on Voronoï polygons $B(a_s)$ and modified Moran indices $I_B$ for six sampling designs on 1000 simulations

Table 8.2 Selection intervals for negative coordination and selection indicators in the case where the PRN falls within the interval. On the left, the case where $\pi_k^1 + \pi_k^2 \le 1$ (Figure 8.9). On the right, the case where $\pi_k^1 + \pi_k^2 \ge 1$ (Figure 8.10)

Table 8.3 Selection indicators for each selection interval for unit $k$

Table 9.1 Estimation methods: summary table

Table 10.1 Population partition

Table 10.2 Totals with respect to two variables

Table 10.3 Calibration, starting table

Table 10.4 Salaries in Euros

Table 10.5 Estimated totals using simple random sampling without replacement

Table 10.6 Known margins using a census

Table 10.7 Iteration 1: row total adjustment

Table 10.8 Iteration 2: column total adjustment

Table 10.9 Iteration 3: row total adjustment

Table 10.10 Iteration 4: column total adjustment

Table 12.1 Pseudo-distances for calibration

Table 12.2 Calibration functions and their derivatives

Table 12.3 Minima, maxima, means, and standard deviations of the weights for each calibration method

Table 14.1 Sample, variable of interest $y_k$, weights $w_k$, cumulative weights $W_k$, and relative cumulative weights $p_k$

Table 14.2 Table of fictitious incomes $y_k$, weights $w_k$, cumulative weights $W_k$, relative cumulative weights $p_k$, cumulative incomes $\widehat{Y}(p_k)$, and the Lorenz curve $\widehat{L}(p_k)$

Table 14.3 Totals necessary to estimate the Gini index

List of Algorithms

Algorithm 1 Bernoulli sampling

Algorithm 2 Selection–rejection method

Algorithm 3 Reservoir method

Algorithm 4 Sequential algorithm for simple random sampling with replacement

Algorithm 5 Systematic sampling with equal probabilities

Algorithm 6 Systematic sampling with unequal probabilities

Algorithm 7 Algorithm for Poisson sampling

Algorithm 8 Sampford procedure

Algorithm 9 General algorithm for the cube method

Algorithm 10 Positive coordination using the Kish and Scott method

Algorithm 11 Negative coordination with the Rivière method

Algorithm 12 Negative coordination with EDS method

Preface

The first version of this book was published in 2001, the year I left the École Nationale de la Statistique et de l’Analyse de l’Information (ENSAI) in Rennes (France) to teach at the University of Neuchâtel in Switzerland. That version grew out of several sets of course materials on sampling theory that I had taught in Rennes. At the ENSAI, the collaboration with Jean-Claude Deville was particularly stimulating.

The editing of this new edition was laborious and was done in fits and starts. I thank all those who reviewed the drafts and provided me with their comments. Special thanks to Monique Graf for her meticulous re-reading of some chapters.

The almost 20 years I spent in Neuchâtel were dotted with multiple adventures. I am particularly grateful to Philippe Eichenberger and Jean-Pierre Renfer, who successively headed the Statistical Methods Section of the Federal Statistical Office. Their trust and professionalism helped to establish a fruitful exchange between the Institute of Statistics of the University of Neuchâtel and the Swiss Federal Statistical Office.

I am also very grateful to the PhD students that I have had the pleasure of mentoring so far. Each thesis is an adventure that teaches both supervisor and doctoral student. Thank you to Alina Matei, Lionel Qualité, Desislava Nedyalkova, Erika Antal, Matti Langel, Toky Randrianasolo, Eric Graf, Caren Hasler, Matthieu Wilhelm, Mihaela Guinand-Anastasiade, and Audrey-Anne Vallée, who trusted me and whom I had the pleasure to supervise for a few years.

Neuchâtel, 2018

Yves Tillé

Preface to the First French Edition

This book contains teaching material that I started to develop in 1994. All chapters have indeed served as a support for teaching, a course, training, a workshop or a seminar. By grouping this material, I hope to present a coherent and modern set of results on the sampling, estimation, and treatment of nonresponses, in other words, on all the statistical operations of a standard sample survey.

In producing this book, my goal is not to provide a comprehensive overview of survey sampling theory, but rather to show that sampling theory is a living discipline, with a very broad scope. Where proofs have been omitted in several chapters, I have always been careful to refer the reader to bibliographical references. The abundance of very recent publications attests to the fertility of the 1990s in this area. All the developments presented in this book are based on the so-called “design-based” approach. In theory, there is another point of view based on population modeling. I intentionally left this approach aside, not out of disinterest, but to propose an approach that I deem consistent and ethically acceptable to the public statistician.

I would like to thank all the people who, in one way or another, helped me to make this book: Laurence Broze, who entrusted me with my first sampling course at the University Lille 3, Carl Särndal, who encouraged me on several occasions, and Yves Berger, with whom I shared an office at the Université Libre de Bruxelles for several years and who gave me a multitude of relevant remarks. My thanks also go to Antonio Canedo, who taught me to use LaTeX, to Lydia Zaïd, who has corrected the manuscript several times, and to Jean Dumais for his many constructive comments.

I wrote most of this book at the École Nationale de la Statistique et de l’Analyse de l’Information. The warm atmosphere that prevailed in the statistics department gave me a lot of support. I especially thank my colleagues Fabienne Gaude, Camelia Goga, and Sylvie Rousseau, who meticulously reread the manuscript, and Germaine Razé, who did the work of reproduction of the proofs. Several exercises are due to Pascal Ardilly, Jean-Claude Deville, and Laurent Wilms. I want to thank them for allowing me to reproduce them. My gratitude goes particularly to Jean-Claude Deville for our fruitful collaboration within the Laboratory of Survey Statistics of the Center for Research in Economics and Statistics. The chapters on the splitting method and balanced sampling also reflect the research that we have done together.

Bruz, 2001

Yves Tillé

Table of Notations

# cardinal (number of elements in a set)

≪ much less than

∖ $A \setminus B$ complement of $B$ in $A$

′ function $f'(x)$ is the derivative of $f(x)$

! factorial: $n! = n \times (n-1) \times \cdots \times 2 \times 1$

$\binom{N}{n} = \frac{N!}{n!(N-n)!}$ number of ways to choose $n$ units from $N$ units

$[a \pm b]$ interval $[a - b, a + b]$

≈ is approximately equal to

∝ is proportional to

∼ follows a specific probability distribution (for a random variable)

$\mathbf{1}\{A\}$ equals 1 if $A$ is true and 0 otherwise

$a_k$ number of times unit $k$ is in the sample

$\mathbf{a}$ vector of $a_k$

$B_0, B_1, B_2, \ldots$ population regression coefficients

$\mathbf{B}$ vector of population regression coefficients

$\beta_0, \beta_1, \beta_2, \ldots$ regression coefficients for model $M$

$\boldsymbol{\beta}$ vector of regression coefficients of model $M$

$\widehat{\mathbf{B}}$ vector of estimated regression coefficients

$\widehat{\boldsymbol{\beta}}$ vector of estimated regression coefficients of the model

$C$ cube whose vertices are samples

$\mathrm{cov}_p(X, Y)$ covariance between random variables $X$ and $Y$

$\mathrm{cov}(X, Y)$ estimated covariance between random variables $X$ and $Y$

$\mathrm{CV}$ population coefficient of variation

$\widehat{\mathrm{CV}}$ estimated coefficient of variation

$d_k$ $d_k = 1/\pi_k$, survey weights of the expansion estimator

$\mathrm{E}_p(\widehat{Y})$ mathematical expectation under the sampling design $p(\cdot)$ of estimator $\widehat{Y}$

$\mathrm{E}_M(\widehat{Y})$ mathematical expectation under the model $M$ of estimator $\widehat{Y}$

$\mathrm{E}_q(\widehat{Y})$ mathematical expectation under the nonresponse mechanism $q$ of estimator $\widehat{Y}$

$\mathrm{E}_I(\widehat{Y})$ mathematical expectation under the imputation mechanism $I$ of estimator $\widehat{Y}$

MSE mean square error

$f$ sampling fraction $f = n/N$

$g_k(\cdot,\cdot)$ pseudo-distance derivative for calibration

$g_k$ adjustment factor after calibration, called the $g$-weight: $g_k = w_k \pi_k = w_k / d_k$

$G_k(\cdot,\cdot)$ pseudo-distance for calibration

$h$ strata or post-strata index

$\mathrm{IC}(1 - \alpha)$ confidence interval with confidence level $1 - \alpha$

$k$ or $\ell$ indicates a statistical unit, $k \in U$ or $\ell \in U$

$K$ $C \cap Q$, intersection of the cube and constraint space for the cube method

$m$ number of clusters or primary units in the sample of clusters or primary units

$M$ number of clusters or primary units in the population

$n$ sample size (without replacement)

$n_i$ number of secondary units sampled in primary unit $i$

$n_S$ size of the sample $S$ if the size is random

$N$ population size

$n_h$ sample size in stratum or post-stratum $U_h$

$N_h$ number of units in stratum or post-stratum $U_h$

$N_i$ number of secondary units in primary unit $i$

$N_{ij}$ population totals when $(i, j)$ is a contingency table

$\mathbb{N}$ set of natural numbers

$\mathbb{N}_+$ set of positive natural numbers with zero

$p(s)$ probability of selecting sample $s$

$p_i$ probability of sampling unit $i$ for sampling with replacement

$P$ or $P_D$ proportion of units belonging to domain $D$

$\Pr(A)$ probability that event $A$ occurs

$\Pr(A|B)$ probability that event $A$ occurs, given $B$ occurred

$Q$ subspace of constraints for the cube method

$r_k$ response indicator

$\mathbb{R}$ set of real numbers

$\mathbb{R}_+$ set of positive real numbers with zero

$\mathbb{R}_+^*$ set of strictly positive real numbers

$s$ sample or subset of the population, $s \subset U$

$s_y^2$ sample variance of variable $y$

$s_{yh}^2$ sample variance of $y$ in stratum or post-stratum $h$

$s_{xy}$ covariance between variables $x$ and $y$ in the sample

$S$ random sample such that $\Pr(S = s) = p(s)$

$S_y^2$ variance of variable $y$ in the population

$S_{xy}$ covariance between variables $x$ and $y$ in the population

$S_h$ random sample selected in stratum or post-stratum $h$

$S_{yh}^2$ population variance of $y$ in the stratum or post-stratum $h$

$\top$ vector $\mathbf{u}^\top$ is the transpose of vector $\mathbf{u}$

$U$ finite population of size $N$

$U_h$ stratum or post-stratum $h$, where $h = 1, \ldots, H$

$v_k$ linearized variable

$v_{HT}(\widehat{Y})$ Horvitz–Thompson estimator of the variance of estimator $\widehat{Y}$

$v_{SYG}(\widehat{Y})$ Sen–Yates–Grundy estimator of the variance of estimator $\widehat{Y}$

$\mathrm{var}_p(\widehat{Y})$   variance of estimator $\widehat{Y}$ under the survey design

$\mathrm{var}_M(\widehat{Y})$   variance of estimator $\widehat{Y}$ under the model

$\mathrm{var}_q(\widehat{Y})$   variance of estimator $\widehat{Y}$ under the nonresponse mechanism

$\mathrm{var}_I(\widehat{Y})$   variance of estimator $\widehat{Y}$ under the imputation mechanism

$v(\widehat{Y})$   variance estimator of estimator $\widehat{Y}$

$w_k$ or $w_k(S)$   weight associated with individual $k$ in the sample after calibration

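$x$   auxiliary variable

$x_k$   auxiliary variable value of unit $k$

$\mathbf{x}_k$   vector in $\mathbb{R}^p$ of the $p$ values taken by the auxiliary variables on unit $k$

$X$   total value of the auxiliary variable over all the units of $U$

$\widehat{X}$   expansion estimator of $X$

$\overline{X}$   mean value of the auxiliary variable over all the units of $U$

$\widehat{\overline{X}}$   expansion estimator of $\overline{X}$

$y$   variable of interest

$y_k$   value of the variable of interest for unit $k$

$y_k^*$   imputed value of $y$ for $k$ (treating nonresponse)

$Y$   total value of the variable of interest over all the units of $U$

$Y_h$   total value of the variable of interest over all the units in stratum or post-stratum $U_h$

$Y_i$   total of $y_k$ in primary unit or cluster $i$

$\widehat{Y}$   expansion estimator of $Y$

$\overline{Y}$   mean value of the variable of interest over all the units of $U$

$\overline{Y}_h$   mean value of the variable of interest over all units of stratum or post-stratum $U_h$

$\widehat{\overline{Y}}_h$   estimator of the mean value of the variable of interest over all units of stratum or post-stratum $U_h$

$\widehat{\overline{Y}}$   expansion estimator of $\overline{Y}$

$\widehat{Y}_{\mathrm{BLU}}$   best linear unbiased estimator of total $Y$ under the model

$\widehat{Y}_{\mathrm{CAL}}$   calibrated estimator of total $Y$

$\widehat{Y}_{\mathrm{D}}$   difference estimator of total $Y$

$\widehat{Y}_h$   estimator of total $Y_h$ in stratum or post-stratum $U_h$

$\widehat{Y}_{\mathrm{HAJ}}$   Hájek estimator of $Y$

$\widehat{Y}_{\mathrm{HH}}$   Hansen–Hurwitz estimator of $Y$

$\widehat{Y}_{\mathrm{IMP}}$   estimator used when missing values are imputed

$\widehat{Y}_{\mathrm{OPT}}$   expansion estimator of the total in an optimal stratified design

$\widehat{Y}_{\mathrm{POST}}$   post-stratified estimator of the total $Y$

$\widehat{Y}_{\mathrm{PROP}}$   expansion estimator of the total in a stratified design with proportional allocation

$\widehat{Y}_{\mathrm{REG}}$   regression estimator of total $Y$

$\widehat{Y}_{\mathrm{REGM}}$   multiple regression estimator of total $Y$

$\widehat{Y}_{\text{REG-OPT}}$   optimal regression estimator of total $Y$

$\widehat{Y}_{\mathrm{RB}}$   Rao–Blackwellized estimator of $Y$

$\widehat{Y}_{\mathrm{Q}}$   ratio estimator of the total

$\widehat{Y}_{\mathrm{STRAT}}$   expansion estimator of the total in a stratified design

$z_p$   quantile of order $p$ of a standardized normal random variable

$\alpha$   probability that the parameter is outside the interval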

$\Delta_{k\ell}$   $\pi_{k\ell} - \pi_k \pi_\ell$

$\pi_k$   inclusion probability of unit $k$

$\pi_{k\ell}$   second-order inclusion probabilities for units $k$ and $\ell$: $\pi_{k\ell} = \Pr(k \text{ and } \ell \in S)$

$\phi_k$   response probability of unit $k$

$\sigma^2$   variance of variable $y$ in an infinite population, or variance under the model

$\rho$   correlation between $x$ and $y$ in the population

A History of Ideas in Survey Sampling Theory

1.1 Introduction

Looking back, the debates that animated a scientific discipline often appear futile. However, the history of sampling theory is particularly instructive. It is one of the specializations of statistics which itself has a somewhat special position, since it is used in almost all scientific disciplines. Statistics is inseparable from its fields of application since it determines how data should be processed. Statistics is the cornerstone of quantitative scientific methods. It is not possible to determine the relevance of the applications of a statistical technique without referring to the scientific methods of the disciplines in which it is applied.

Scientific truth is often presented as the consensus of a scientific community at a specific point in time. The history of a scientific discipline is the story of these consensuses and especially of their changes. Since the work of Thomas Samuel Kuhn (1970), we have considered that science develops around paradigms that are, according to Kuhn (1970, p. 10), “models from which spring particular coherent traditions of scientific research.” These models have two characteristics: “Their achievement was sufficiently unprecedented to attract an enduring group of adherents away from competing modes of scientific activity. Simultaneously, it was sufficiently open-ended to leave all sorts of problems for the redefined group of practitioners to resolve.” (Kuhn, 1970, p. 10).

Many authors have proposed a chronology of discoveries in survey theory that reflects the major controversies that have marked its development (see among others Hansen & Madow, 1974; Hansen et al., 1983; Owen & Cochran, 1976; Sheynin, 1986; Stigler, 1986). Bellhouse (1988a) interprets this timeline as a story of the great ideas that contributed to the development of survey sampling theory. Statistics is a peculiar science. With mathematics for tools, it allows the methodology of the other disciplines to be finalized. Because of the close correlation between a method and the multiplicity of its fields of action, statistics is based on a multitude of different ideas from the various disciplines in which it is applied.

The theory of survey sampling plays a preponderant role in the development of statistics. However, the use of sampling techniques has been accepted only very recently. Among the controversies that have animated this theory, we find some of the classical debates of mathematical statistics, such as the role of modeling and a discussion of estimation techniques. Sampling theory was torn between the major currents of statistics and gave rise to multiple approaches: design-based, model-based, model-assisted, predictive, and Bayesian.

1.2 Enumerative Statistics During the 19th Century

Several attempts made in the Middle Ages to extrapolate partial data to an entire population are described in Droesbeke et al. (1987). In 1783, in France, Pierre Simon de Laplace (see Laplace, 1847) presented to the Academy of Sciences a method to determine the number of inhabitants from birth registers using a sample of regions. He proposed to calculate, from this sample of regions, the ratio of the number of inhabitants to the number of births and then to multiply it by the total number of births, which could be obtained with precision for the whole population. Laplace even suggested estimating “the error to be feared” by referring to the central limit theorem. In addition, he recommended the use of a ratio estimator using the total number of births as auxiliary information. Survey methodology as well as probabilistic tools were known before the 19th century. However, never during this period was there a consensus about their validity.
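As a minimal sketch of the calculation Laplace proposed, the following Python snippet computes the ratio of inhabitants to births in a sample of regions and multiplies it by the total number of births. The function name and all figures are hypothetical and chosen only for illustration; they are not Laplace's data.

```python
def laplace_ratio_estimate(sample_inhabitants, sample_births, total_births):
    """Estimate a population total from a sample of regions (ratio method).

    Hypothetical helper written for illustration; not taken from the book.
    """
    if len(sample_inhabitants) != len(sample_births):
        raise ValueError("each sampled region needs both counts")
    total_sample_births = sum(sample_births)
    if total_sample_births == 0:
        raise ValueError("sampled regions must contain at least one birth")
    # Ratio of inhabitants to births observed in the sampled regions.
    ratio = sum(sample_inhabitants) / total_sample_births
    # Extrapolate using the known total number of births (auxiliary information).
    return ratio * total_births


# Made-up figures for three sampled regions (assumption, not historical data).
sample_inhabitants = [52_000, 31_000, 47_000]
sample_births = [2_000, 1_200, 1_800]
total_births = 1_000_000  # assumed to be known for the whole country

estimate = laplace_ratio_estimate(sample_inhabitants, sample_births, total_births)
print(f"Estimated population: {estimate:,.0f}")
```

With these made-up figures the sampled regions contain 26 inhabitants per birth, so the estimated population is 26 million. The estimate is only as good as the assumption that this ratio is roughly the same in the regions that were not sampled.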

The development of statistics (etymologically, from German: analysis of data about the state) is inseparable from the emergence of modern states in the 19th century. One of the most outstanding personalities in the official statistics of the 19th century is the Belgian Adolphe Quételet (1796–1874). He knew of Laplace’s method and maintained a correspondence with him. According to Stigler (1986, pp. 164–165), Quételet was initially attracted to the idea of using partial data. He even tried to apply Laplace’s method to estimate the population of the Netherlands in 1824 (which Belgium was a part of until 1830). However, it seems that he then rallied to a note from Keverberg (1827) which severely criticized the use of partial data in the name of precision and accuracy:

In my opinion, there is only one way to arrive at an exact knowledge of the population and the elements of which it is composed: it is that of an actual and detailed enumeration; that is to say, the formation of nominative states of all the inhabitants, with indication of their age and occupation. Only by this mode of operation can reliable documents be obtained on the actual number of inhabitants of a country, and at the same time on the statistics of the ages of which the population is composed, and the branches of industry in which it finds the means of comfort and prosperity.1

In one of his letters to the Duke of Saxe-Coburg Gotha, Quételet (1846, p. 293) also advocates for an exhaustive statement:

LaPlace had proposed to substitute for the census of a large country, such as France, some special censuses in selected departments where this kind of operation might have more chances of success, and then to carefully determine the ratio of the population to either births or deaths. By means of these ratios of the births and deaths of all the other departments, figures which can be ascertained with sufficient accuracy, it is then easy to determine the population of the whole kingdom. This way of operating is very expeditious, but it supposes an invariable ratio passing from one department to another. [···] This indirect method must be avoided as much as possible, although it may be useful in some cases, where the administration would have to proceed quickly; it can also be used with advantage as a means of control.2

1 Translated from French: “À mon avis, il n’existe qu’un seul moyen de parvenir à une connaissance exacte de la population et des élémens dont elle se compose: c’est celle d’un dénombrement effectif et détaillé; c’est-à-dire, de la formation d’états nominatifs de tous les habitans, avec indication de leur âge et de leur profession. Ce n’est que par ce mode d’opérer, qu’on peut obtenir des documens dignes de confiance sur le nombre réel d’habitans d’un pays, et en même temps sur la statistique des âges dont la population se compose, et des branches d’industrie dans lesquelles elle trouve des moyens d’aisance et de prospérité.”

It is interesting to examine the argument used by Quételet (1846, p. 293) to justify his position.

Not to obtain the means of verifying the documents that one collects is to break one of the principal rules of science. Statistics has value only through its accuracy; without this essential quality, it becomes null, even dangerous, since it leads to error.3

Again, accuracy is considered a basic principle of statistical science. Despite the existence of probabilistic tools and despite various applications of sampling techniques, the use of partial data was perceived as a dubious and unscientific method. Quételet had a great influence on the development of official statistics. He participated in the creation of a section for statistics within the British Association for the Advancement of Science in 1833 with Thomas Malthus and Charles Babbage (see Horvàth, 1974). One of its objectives was to harmonize the production of official statistics. He organized the International Congress of Statistics in Brussels in 1853. Quételet was well acquainted with the administrative systems of France, the United Kingdom, the Netherlands, and Belgium. He probably contributed to the idea that the use of partial data is unscientific.

Some personalities, such as Malthus and Babbage in Great Britain, and Quételet in Belgium, contributed greatly to the development of statistical methodology. On the other hand, the establishment of a statistical apparatus was a necessity in the construction of modern states, and it is probably not a coincidence that these personalities come from the two countries most rapidly affected by the industrial revolution. At that time, the statistician’s objective was mainly to make enumerations. The main concern was to inventory the resources of nations. In this context, the use of sampling was unanimously rejected as an inexact and fundamentally unscientific procedure. Throughout the 19th century, the discussions of statisticians focused on how to obtain reliable data and on the presentation, interpretation, and possibly modeling (adjustment) of these data.

2 Translated from French: “LaPlace avait proposé de substituer au recensement d’un grand pays, tel que la France, quelques recensements particuliers dans des départements choisis, où ce genre d’opération pouvait avoir plus de chances de succès, puis d’y déterminer avec soin le rapport de la population soit aux naissances soit aux décès. Au moyen de ces rapports des naissances et des décès de tous les autres départements, chiffres qu’on peut constater avec assez d’exactitude, il devient facile ensuite de déterminer la population de tout le royaume. Cette manière d’opérer est très expéditive, mais elle suppose un rapport invariable en passant d’un département à un autre. [···] Cette méthode indirecte doit être évitée autant que possible, bien qu’elle puisse être utile dans certains cas, où l’administration aurait à procéder avec rapidité; on peut aussi l’employer avec avantage comme moyen de contrôle.”

3 Translated from French: “Ne pas se procurer la faculté de vérifier les documents que l’on réunit, c’est manquer à l’une des principales règles de la science. La statistique n’a de valeur que par son exactitude; sans cette qualité essentielle, elle devient nulle, dangereuse même puisqu’elle conduit à l’erreur.”

1.3 Controversy on the Use of Partial Data

In 1895, the Norwegian Anders Nicolai Kiær, Director of the Central Statistical Office of Norway, presented to the Congress of the International Statistical Institute (ISI) in Bern a work entitled Observations et expériences concernant des dénombrements représentatifs (Observations and experiments on representative enumeration) for a survey conducted in Norway. Kiær (1896) first selected a sample of cities and municipalities. Then, in each of these municipalities, he selected only some individuals using the first letter of their surnames. He applied a two-stage design, but the choice of the units was not random. Kiær argued for the use of partial data if it is produced using a “representative method”. According to this method, the sample must be a reduced-size representation of the population. Kiær’s concept of representativeness is linked to the quota method. His speech was followed by a heated debate, and the proceedings of the Congress of the ISI reflect a long dispute. Let us take a closer look at the arguments from two opponents of Kiær’s method (see ISI General Assembly Minutes, 1896).

Georg von Mayr (Prussia). […] It is especially dangerous to call for this system of representative investigations within an assembly of statisticians. It is understandable that for legislative or administrative purposes such limited enumeration may be useful – but then it must be remembered that it can never replace complete statistical observation. It is all the more necessary to support this point, that there is among us in these days a current among mathematicians who, in many directions, would rather calculate than observe. But we must remain firm and say: no calculation where observation can be done.4

Guillaume Milliet (Switzerland). I believe that it is not right to give, by a resolution of the congress, to the representative method (which after all can only be an expedient) an importance that serious statistics will never recognize. No doubt, statistics made with this method, or, as I might call it, statistics, pars pro toto, has given us here and there interesting information; but its principle is so much in contradiction with the demands of the statistical method that, as statisticians, we should not grant to imperfect things the same right of citizenship, so to speak, that we accord to the ideal that scientifically we propose to reach.5

4 Translated from French: “C’est surtout dangereux de se déclarer pour ce système des investigations représentatives au sein d’une assemblée de statisticiens. On comprend que pour des buts législatifs ou administratifs un tel dénombrement restreint peut être utile – mais alors il ne faut pas oublier qu’il ne peut jamais remplacer l’observation statistique complète. Il est d’autant plus nécessaire d’appuyer là-dessus, qu’il y a parmi nous dans ces jours un courant au sein des mathématiciens qui, dans de nombreuses directions, voudraient plutôt calculer qu’observer. Mais il faut rester ferme et dire: pas de calcul là où l’observation peut être faite.”

5 Translated from French: “Je crois qu’il n’est pas juste de donner par un vœu du congrès à la méthode représentative (qui enfin ne peut être qu’un expédient) une importance que la statistique sérieuse ne reconnaîtra jamais. Sans doute, la statistique faite avec cette méthode ou, comme je pourrais l’appeler, la statistique, pars pro toto, nous a donné ça et là des renseignements intéressants; mais son principe est tellement en contradiction avec les exigences que doit avoir la méthode statistique, que, comme statisticiens, nous ne devons pas accorder aux choses imparfaites le même droit de bourgeoisie, pour ainsi dire, que nous accordons à l’idéal que scientifiquement nous nous proposons d’atteindre.”

The content of these reactions can again be summarized as follows: since statistics is by definition exhaustive, renouncing complete enumeration denies the very mission of statistical science. The discussion does not concern the method proposed by Kiær, but rather the definition of statistical science. However, Kiær did not let go, and continued to defend the representative method in 1897 at the congress of the ISI at St. Petersburg (see Kiær, 1899), in 1901 in Budapest, and in 1903 in Berlin (see Kiær, 1903, 1905). After this date, the issue was no longer mentioned at the ISI Congress. However, Kiær obtained the support of Arthur Bowley (1869–1957), who then played a decisive role in the development of sampling theory. Bowley (1906) presented an empirical verification of the application of the central limit theorem to sampling. He was the true promoter of random sampling techniques, developed stratified designs with proportional allocations, and used the law of total variance. It was necessary to wait for the end of the First World War and the emergence of a new generation of statisticians for the problem to be rediscussed within the ISI. On this subject, we cannot help but quote Max Planck’s reflection on the appearance of new scientific truths: “a new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it” (quoted by Kuhn, 1970, p. 151).

In 1924, a commission (composed of Arthur Bowley, Corrado Gini, Adolphe Jensen, Lucien March, Verrijn Stuart, and Frantz Zizek) was created to evaluate the relevance of using the representative method. The results of this commission, entitled “Report on the representative method of statistics”, were presented at the 1925 ISI Congress in Rome. The commission accepted the principle of survey sampling as long as the methodology is respected. Thirty years after Kiær’s communication, the idea of sampling was officially accepted. The commission laid the foundation for future research. Two methods are clearly distinguished: “random selection” and “purposive selection”. These two methods correspond to two fundamentally different scientific approaches. On the one hand, the validation of random methods is based on the calculation of probabilities that allows confidence intervals to be built for certain parameters. On the other hand, the validation of the purposive selection method can only be obtained through experimentation by comparing the estimates obtained to census results. Therefore, random methods are validated by a strictly mathematical argument while purposive methods are validated by an experimental approach.

1.4 Development of a Survey Sampling Theory

The report of the commission presented to the ISI Congress in 1925 marked the official recognition of the use of survey sampling. Most of the basic problems had already been posed, such as the use of random samples and the calculation of the variance of the estimators for simple and stratified designs. The acceptance of the use of partial data, and especially the recommendation to use random designs, led to a rapid mathematization of this theory. At that time, the calculation of probabilities was already known. In addition, statisticians had already developed a theory for experimental statistics. Everything was in place for the rapid progress of a fertile field of research: the construction of a statistical theory of survey sampling.
