Fundamental Statistical Inference
A Computational Approach
Marc S. Paolella
Department of Banking and Finance
University of Zurich
Switzerland
This edition first published 2018
© 2018 John Wiley & Sons Ltd
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Marc S. Paolella to be identified as the author of this work has been asserted in accordance with law.
Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Office
9600 Garsington Road, Oxford, OX4 2DQ, UK
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging-in-Publication Data applied for
Hardback ISBN: 9781119417866
Cover design by Wiley
Cover images: Courtesy of Marc S. Paolella
Set in 10/12pt Times LT Std by SPi Global, Chennai, India
PART I ESSENTIAL CONCEPTS IN STATISTICS
1 Introducing Point and Interval Estimation
1.1 Point Estimation
1.1.1 Bernoulli Model
1.1.2 Geometric Model
1.1.3 Some Remarks on Bias and Consistency
1.2 Interval Estimation via Simulation
1.3 Interval Estimation via the Bootstrap
1.3.1 Computation and Comparison with Parametric Bootstrap
1.3.2 Application to Bernoulli Model and Modification
1.3.3 Double Bootstrap
1.3.4 Double Bootstrap with Analytic Inner Loop
1.4 Bootstrap Confidence Intervals in the Geometric Model
1.5 Problems
2 Goodness of Fit and Hypothesis Testing
2.1 Empirical Cumulative Distribution Function
2.1.1 The Glivenko–Cantelli Theorem
2.1.2 Proofs of the Glivenko–Cantelli Theorem
2.1.3 Example with Continuous Data and Approximate Confidence Intervals
2.1.4 Example with Discrete Data and Approximate Confidence Intervals
2.2 Comparing Parametric and Nonparametric Methods
2.3 Kolmogorov–Smirnov Distance and Hypothesis Testing
2.3.1 The Kolmogorov–Smirnov and Anderson–Darling Statistics
2.3.2 Significance and Hypothesis Testing
2.3.3 Small-Sample Correction
2.4 Testing Normality with KD and AD
2.5 Testing Normality with $W^2$ and $U^2$
2.6 Testing the Stable Paretian Distributional Assumption: First Attempt
2.7 Two-Sample Kolmogorov Test
2.8 More on (Moron?) Hypothesis Testing
2.8.1 Explanation
2.8.2 Misuse of Hypothesis Testing
2.8.3 Use and Misuse of p-Values
2.9 Problems
3 Likelihood
3.1 Introduction
3.1.1 Scalar Parameter Case
3.1.2 Vector Parameter Case
3.1.3 Robustness and the MCD Estimator
3.1.4 Asymptotic Properties of the Maximum Likelihood Estimator
3.2 Cramér–Rao Lower Bound
3.2.1 Univariate Case
3.2.2 Multivariate Case
3.3 Model Selection
3.3.1 Model Misspecification
3.3.2 The Likelihood Ratio Statistic
3.3.3 Use of Information Criteria
3.4 Problems
4 Numerical Optimization
4.1 Root Finding
4.1.1 One Parameter
4.1.2 Several Parameters
4.2 Approximating the Distribution of the Maximum Likelihood Estimator
4.3 General Numerical Likelihood Maximization
4.3.1 Newton–Raphson and Quasi-Newton Methods
4.3.2 Imposing Parameter Restrictions
4.4 Evolutionary Algorithms
4.4.1 Differential Evolution
4.4.2 Covariance Matrix Adaption Evolutionary Strategy
4.5 Problems
5 Methods of Point Estimation
5.1 Univariate Mixed Normal Distribution
5.1.1 Introduction
5.1.2 Simulation of Univariate Mixtures
5.1.3 Direct Likelihood Maximization
5.1.4 Use of the EM Algorithm
5.1.5 Shrinkage-Type Estimation
5.1.6 Quasi-Bayesian Estimation
5.1.7 Confidence Intervals
5.2 Alternative Point Estimation Methodologies
5.2.1 Method of Moments Estimator
5.2.2 Use of Goodness-of-Fit Measures
5.2.3 Quantile Least Squares
5.2.4 Pearson Minimum Chi-Square
5.2.5 Empirical Moment Generating Function Estimator
5.2.6 Empirical Characteristic Function Estimator
5.3 Comparison of Methods
5.4 A Primer on Shrinkage Estimation
5.5 Problems
PART II FURTHER FUNDAMENTAL CONCEPTS IN STATISTICS
6 Q-Q Plots and Distribution Testing
6.1 P-P Plots and Q-Q Plots
6.2 Null Bands
6.2.1 Definition and Motivation
6.2.2 Pointwise Null Bands via Simulation
6.2.3 Asymptotic Approximation of Pointwise Null Bands
6.2.4 Mapping Pointwise and Simultaneous Significance Levels
6.3 Q-Q Test
6.4 Further P-P and Q-Q Type Plots
6.4.1 (Horizontal) Stabilized P-P Plots
6.4.2 Modified S-P Plots
6.4.3 MSP Test for Normality
6.4.4 Modified Percentile (Fowlkes-MP) Plots
6.5 Further Tests for Composite Normality
6.5.1 Motivation
6.5.2 Jarque–Bera Test
6.5.3 Three Powerful (and More Recent) Normality Tests
6.5.4 Testing Goodness of Fit via Binning: Pearson’s $X^2_P$ Test
6.6 Combining Tests and Power Envelopes
6.6.1 Combining Tests
6.6.2 Power Comparisons for Testing Composite Normality
6.6.3 Most Powerful Tests and Power Envelopes
6.7 Details of a Failed Attempt
6.8 Problems
7 Unbiased Point Estimation and Bias Reduction
7.1 Sufficiency
7.1.1 Introduction
7.1.2 Factorization
7.1.3 Minimal Sufficiency
7.1.4 The Rao–Blackwell Theorem
7.2 Completeness and the Uniformly Minimum Variance Unbiased Estimator
7.3 An Example with i.i.d. Geometric Data
7.4 Methods of Bias Reduction
7.4.1 The Bias-Function Approach
7.4.2 Median-Unbiased Estimation
7.4.3 Mode-Adjusted Estimator
7.4.4 The Jackknife
7.5 Problems
8 Analytic Interval Estimation
8.1 Definitions
8.2 Pivotal Method
8.2.1 Exact Pivots
8.2.2 Asymptotic Pivots
8.3 Intervals Associated with Normal Samples
8.3.1 Single Sample
8.3.2 Paired Sample
8.3.3 Two Independent Samples
8.3.4 Welch’s Method for $\mu_1 - \mu_2$ when $\sigma_1^2 \neq \sigma_2^2$
8.3.5 Satterthwaite’s Approximation
8.4 Cumulative Distribution Function Inversion
8.4.1 Continuous Case
8.4.2 Discrete Case
8.5 Application of the Nonparametric Bootstrap
8.6 Problems
PART III ADDITIONAL TOPICS
9 Inference in a Heavy-Tailed Context
9.1 Estimating the Maximally Existing Moment
9.2 A Primer on Tail Estimation
9.2.1 Introduction
9.2.2 The Hill Estimator
9.2.3 Use with Stable Paretian Data
9.3 Noncentral Student’s $t$ Estimation
9.3.1 Introduction
9.3.2 Direct Density Approximation
9.3.3 Quantile-Based Table Lookup Estimation
9.3.4 Comparison of NCT Estimators
9.4 Asymmetric Stable Paretian Estimation
9.4.1 Introduction
9.4.2 The Hint Estimator
9.4.3 Maximum Likelihood Estimation
9.4.4 The McCulloch Estimator
9.4.5 The Empirical Characteristic Function Estimator
9.4.6 Testing for Symmetry in the Stable Model
9.5 Testing the Stable Paretian Distribution
9.5.1 Test Based on the Empirical Characteristic Function
9.5.2 Summability Test and Modification
9.5.3 ALHADI: The $\alpha$-Hat Discrepancy Test
9.5.4 Joint Test Procedure
9.5.5 Likelihood Ratio Tests
9.5.6 Size and Power of the Symmetric Stable Tests
9.5.7 Extension to Testing the Asymmetric Stable Paretian Case
10 The Method of Indirect Inference
10.1 Introduction
10.2 Application to the Laplace Distribution
10.3 Application to Randomized Response
10.3.1 Introduction
10.3.2 Estimation via Indirect Inference
10.4 Application to the Stable Paretian Distribution
10.5 Problems
A Review of Fundamental Concepts in Probability Theory
A.1 Combinatorics and Special Functions
A.2 Basic Probability and Conditioning
A.3 Univariate Random Variables
A.4 Multivariate Random Variables
A.5 Continuous Univariate Random Variables
A.6 Conditional Random Variables
A.7 Generating Functions and Inversion Formulas
A.8 Value at Risk and Expected Shortfall
A.9 Jacobian Transformations
A.10 Sums and Other Functions
A.11 Saddlepoint Approximations
A.12 Order Statistics
A.13 The Multivariate Normal Distribution
A.14 Noncentral Distributions
A.15 Inequalities and Convergence
A.15.1 Inequalities for Random Variables
A.15.2 Convergence of Sequences of Sets
A.15.3 Convergence of Sequences of Random Variables
A.16 The Stable Paretian Distribution
A.17 Problems
A.18 Solutions
References
Preface
Young people today love luxury. They have bad manners, despise authority, have no respect for older people, and chatter when they should be working.
(Socrates, 470–399 BC)
This book on statistical inference can be viewed as a continuation of the author’s previous two books on probability theory (Paolella, 2006, 2007), hereafter referred to as Books I and II. Of those two, Book I (or any book at a comparable level) is more relevant, in establishing the basics of random variables and distributions as required to understand statistical methodology. Occasional use of material from Book II is made, though most of that required material is reviewed in the appendix herein in order to keep this volume as self-contained as possible. References to those books will be abbreviated as I and II, respectively. For example, Figure 5.1 in (Chapter 5 of) Paolella (2006) is referred to as Figure I.5.1; and similarly for equation references, where (I.5.22) and (II.4.3) refer to equations (5.22) and (4.3) in Paolella (2006) and Paolella (2007), respectively (and both are the Cauchy–Schwarz inequality).
Further prerequisites are the same as those for Book I, namely a solid command of basic undergraduate calculus and matrix algebra, and occasionally very rudimentary concepts from complex analysis, as required for working with characteristic functions. As with Books I and II, a solutions manual to the exercises is available.
Certainly, no measure theory is required, nor any previous exposure to statistical inference, though it would be useful to have had an introductory course in statistics or data analysis. The book is aimed at beginning master’s students in statistics, though it is written to be fully accessible to master’s students in the social sciences. In particular, I have in mind students in economics and finance, as I provide introductory coverage of some nonstandard topics, notably Chapter 9 on heavy-tailed distributions and tail estimation, and detailed coverage of the mixed normal distribution in Chapter 5.
Naturally, the book can also be used for undergraduates in a mathematics program. For the intended audience of master’s students in statistics or the social sciences, the instructor is welcome to skip material that uses concepts from convergence and limit theorems if the target audience is not ready for such mathematics. This is one of the points of this book: such material is included so that, for example, accessible, detailed proofs of the Glivenko–Cantelli theorem and the limiting distribution of the maximum likelihood estimator can be demonstrated at a reasonably rigorous level. The vast majority of the book only requires simple algebra and basic calculus.
In this book, I stick to the independent, identically distributed (i.i.d.) setting, using it as a platform for introducing the major concepts arising in statistics without the additional overhead and complexities associated with, say, (generalized) linear models, survival analysis, copula methods, and time series. This also allows for more in-depth coverage of important topics such as bootstrap techniques, nonparametric inference via the empirical c.d.f., numerical optimization, discrete mixture models, bias-adjusted estimators, tail estimation (as a nice segue into the study of extreme value theory), and the method of indirect inference. A future project, referred to as Book IV, builds on the framework in the present volume and is dedicated to the linear model (regression and ANOVA) and, primarily, time series analysis (univariate ARMAX models), GARCH, and multivariate distributions for modeling and predicting financial asset returns.
Before discussing the contents of this volume, it is important to mention that, similar to Books I and II, the overriding goals are:
(i) to emphasize the practical side of matters by addressing computation issues;
(ii) to motivate students to actively engage in the material by replicating and extending reported results, and to read the literature on topics of their interest;
(iii) to go beyond the standard topics and examples traditionally taught at this level, albeit still within the i.i.d. framework; and
(iv) to set the stage for students intending to pursue further courses in statistical/econometric inference (and quantitative risk management), as well as those embarking on careers as modern data analysts and applied quantitative researchers.
Regarding point (i), I explain to students that computer programming skills are necessary, but far from sufficient, to be successful in applied research. In an occasional lecture dedicated to programming issues, I emphasize (not sarcastically – I do not test computer skills) that it is fully optional, and those students who are truly mathematically talented can skip it, explaining that they will always have programmers in their team (in industry) or PhD students and co-authors (in academics) as resources to do the computer grunt work implementing their theoretical constructs. Oddly, nobody leaves the room.
With respect to point (ii), the reader will notice that some chapters have few (or no) exercises (some have many). This is because I believe the nature of the material presented is such that it offers the student a judicious platform for self-experimentation, particularly with respect to numerical implementation. Some of the material could have been packaged as exercises (and much is), though I prefer to illustrate important concepts, distributions, and methods in a detailed way, along with code and graphics, instead of banishing it to the exercises (or, far worse, littering the exercises with trite, useless algebraic manipulations devoid of genuine application), and to encourage the student to replicate, complement, and extend the presented material. The reader will no doubt tire of my occasional explicit suggestions for this (“The reader is encouraged … ”). One of my role model authors is Hamilton (1994), whose book has no exercises, is twice the size of this book, and has been praised as an outstanding presentation of time series. Hamilton clearly intended to teach the material in a straightforward, clear way, with highly detailed and accessible derivations. I aspire to a similar approach, as well as adding numeric illustrations and Matlab code.¹
Regarding point (iii), besides the obvious benefit of giving students a more modern viewpoint on methods and applications in statistics, having a large variety of such is useful for students (and instructors) looking for interesting, relevant topics for master’s theses. An example of a nonstandard topic of interest is in Chapter 5, giving a detailed discussion on the problems associated with, and solutions to, estimating the (univariate) discrete mixed normal, via a variety of non-m.l.e. methods (empirical m.g.f., c.f., quantile-based methods, etc.), and the use of the EM algorithm with shrinkage, with its immediate extension to the multivariate case. For the latter, I refer to recent work of mine using the minimum covariance determinant (MCD) for parameter estimation, this also serving as an example of (i) what can be done when, here, the multivariate normal mixture is surely misspecified, and (ii) use of a most likely inconsistent estimator (which outperforms the m.l.e. in terms of density forecasting and portfolio allocation for financial returns data).
Particularly with the less common topics developed in Part III of this book, the result is, like Books I and II, a substantially larger project than some similarly positioned books. It is thus essential to understand that not everything in the text is supposed to be (or could be) covered in the classroom, at least not in one semester. In my opinion, students (even in mathematics departments, but particularly those in the social sciences) benefit from having clearly laid out explanations, detailed proofs, illustrative examples, a variety of approaches, introductions to modern techniques, and discussions of important, possibly controversial topics (e.g., the irrelevance of consistent estimators in light of the notion that, in realistic settings, the model is wrong anyway, and changing through time or space; and the arguable superfluousness, if not danger, of the typical hypothesis testing framework), as well as topics that could initially be skipped in a first course, but returned to later or assigned as outside reading, depending on the interests and abilities of the students.
I wish to emphasize that this book is for teaching, as (obviously) opposed to being a research monograph, or (less obviously) a dry regurgitation of traditional concepts and examples. An anonymous reviewer of Book I, when I initially submitted it to the publisher Wiley, remarked “it’s too much material: It seems the author has written a brain dump.” While I like to think I have much more in my head than what was written in that book, he (his gender was indeed disclosed to me) apparently believes that students (let alone instructors) are incapable of assessing what material is core, and what can be deemed “extra,” or suitable for reading after the main concepts are mastered. It is trivial to just skip some material, whereas not having it at all results in an admittedly shorter book (who cares, besides arguably the publisher?) that accomplishes far less, and might even give the student a false sense of understanding and competence (which will be painfully revealed in a quant job interview). Fortunately, not everyone agrees with him: Besides heart-warming student feedback over the years on Book I (from master’s students) and Book II (from doctoral students), I cherish the detailed, insightful, and highly positive reviews of Books I and II, by Harvill (2008, 2009). (I still need to send her flowers.)
¹ While I am at it, Severini (2005) is another book I consider exemplary for teaching at the graduate level, as it is highly detailed and accessible, covers a range of important topics, and is at the same mathematical level as, and has some overlap with, my Book II. Though beware of the typos (which go far beyond his current errata sheet)!
The choice of precisely what material to cover (and what not to) is crucial. My decision is to blend “old” and “new,” helping to emphasize that the subject has important roots going back over a century, and continues to develop unabated. (The reader will quickly see my adoration of Karl Pearson and Ronald Fisher, the founders of our subject; both fascinating, albeit complicated personalities, polymaths, and, at times, adversaries.)
Chapter 1 starts modestly with basic concepts of point estimation, and includes my diatribe on the unnecessary obsession with consistent estimators in some contexts. The same chapter then progresses to a very basic development of the single and double bootstrap for computing confidence intervals. If one were to imagine that the field of statistics somehow did not exist, I argue that a student versed in basic probability theory and with access to, and skills with, modern computing power would immediately discover on his/her own the (percentile, single, parametric) bootstrap as a natural way of determining a confidence interval. As such, it is presented before the usual asymptotic Wald intervals and analytic methods. The latter are important, as conceptual entities, and work well when applicable, but their relevance to the tasks and goals faced by the new generation of students dealing with modern, sophisticated models and/or big data applications is difficult to motivate.
Chapter 2 spends more time than usual on the empirical c.d.f., and shows, among other things, two simple, instructive proofs of the Glivenko–Cantelli theorem, as opposed to not mentioning it at all, or, perhaps worse, the dreaded “it can be shown ....” Besides being a fundamental result of enormous importance, this serves as a primer for students interested in point processes. The chapter also introduces the major concepts associated with hypothesis testing and p-values, within the context of distribution testing. I argue in the chapter that this is a very good platform for use of hypothesis testing, and then provide yet another diatribe about why I shy away from presenting the standard material on the subject when applied to parameters of a model.
The rest of Part I consists of three related chapters on parameter estimation. The five chapters of Part I are what I consider to be the core of fundamental statistical inference, and are best read in the order presented, though Chapter 4 can be studied independently of other chapters and possibly assigned as outside reading.
The cornerstone Chapter 3 introduces likelihood, and contains many standard examples, but also some nonstandard material, such as the MCD method to emphasize the relevance of robust statistics and the pernicious issue of masking. Chapter 4 is about numerical optimization, motivating the development of multivariate Hessian-based techniques via repeated application of simple, univariate methods that every student understands, such as bisection. This chapter also includes discussions, with Matlab code, of genetic algorithms and why they are of such importance in many applications.
Chapter 5 is rather unique, using the mixed normal distribution (itself of great relevance, notably now in machine learning) as a platform for showing numerous other methods of point estimation that can outperform the m.l.e. in smaller samples, serve as starting values for computing the m.l.e., or be used when the likelihood is not accessible. Chapter 5 also introduces the use of shrinkage as a penalty factor in the likelihood, and the EM algorithm in the context of the discrete mixed normal distribution.
The chapters of Part II are written to be more or less orthogonal. The instructor (or student working independently) can choose among them, based on his/her interests. The lengthy Chapter 6, on Q-Q plots and distribution testing, builds on the material in Chapter 2. It emphasizes the distinction between one-at-a-time and simultaneous intervals, and presents various tests for composite normality, including a test of mine, conveniently abbreviated MSP: it is not the most powerful test against all alternatives (no such test yet exists), but its development illustrates numerous important concepts – and that is the point of the book.
Chapters 7 and 8 (and Section 3.2 on the univariate and multivariate Cramér–Rao lower bound) are the most “classic,” on well-worn results for point and interval estimation, respectively, though Chapter 7 contains some more modern techniques for bias reduction and new classes of estimators. As most of this is standard textbook material at this level, the goal was to develop it in the clearest way possible, with accessible, detailed (sometimes multiple) proofs, and a large variety of examples and end-of-chapter algebraic exercises. There are now several excellent advanced books on mathematical statistics: Schervish (1995), Lehmann and Casella (1998), Shao (2003), and Robert (2007) come to mind, and it is pointless to compete with them, nor is it the goal of this book to do so.
The two chapters of Part III are more associated with financial econometrics and quantitative risk management, though I believe the material should be of interest to a general statistics audience. Chapter 9 covers much ground. It introduces the basics of tail estimation, with a simple derivation of the Hill estimator, discussion of its problems (along with customary Hill horror plots), and enough of a literature review for the interested student to pursue. Also in this chapter (and in Section A.16), the (univariate, asymmetric) stable Paretian distribution receives much attention: I dispel myths about its inapplicability or difficulty in estimation, and discuss several methods for the latter, as well as including recent work on testing the stability assumption.
The relatively short Chapter 10 introduces the concept and methodology of indirect inference, a topic rarely presented at this level but of fundamental importance in a variety of challenging contexts. One of the examples used for its demonstration involves the randomized response technique for dealing with awkward questions in surveys (this being notably a topic squarely within statistics, as opposed to econometrics). This elegant solution for obtaining point estimators appears to be new.
The appendix is primarily a review of important and useful facts from probability theory, condensed from Books I and II (where more detail can obviously be found), with its equations being referenced throughout, thus helping to keep this book as self-contained as possible. It also includes a large section of exercises, many of which are not in Books I or II and some of which are challenging, enabling the student to refresh, extend, and self-assess his/her abilities, and/or enabling the instructor to give an initial exam to determine if the student has the requisite knowledge. All the solutions are provided at the end of the appendix.
This appendix also includes some new material not found in Books I and II, such as (i) more results, with proofs, on convergence in distribution (as required for proving the asymptotic properties of the m.l.e.); (ii) a detailed section on expected shortfall (ES), including Stein’s lemma, as required for illustrating the shrinkage estimator in Section 5.4; (iii) additional Matlab programs (not in Book II) for the p.d.f., c.d.f., quantiles and ES of the asymmetric stable; and (iv) among the exercises, some potentially useful ones, such as saddlepoint approximations and characteristic function inversion for computing the distribution and ES of a convolution of independent skew-normal random variables.
Numerous topics of relevance were omitted (and some notes deleted – which would delight my “brain dump” accuser), such as ancillarity, hierarchical models, rank and permutation tests, and, most notably, Bayesian methodology. For the latter, there are now many good textbooks on the topic, in both pure statistics and also econometrics, and the last thing I want is that the reader ignore the Bayesian approach. I think a solid grounding in basic principles, likelihood-based inference, and a strong command of computing serve as an excellent background for pursuing dedicated works on Bayesian methodology. Section 5.1.6 does introduce the idea of quasi-Bayesian estimation and its connection to shrinkage estimation, and illustrates (without needing to break the proverbial full Bayesian egg)² the effectiveness and importance of these methods.
With respect to computing, I chose (no doubt to the annoyance of some) Matlab as the vehicle for prototyping, though I strongly encourage readers versed in R to continue using R, or Python, or even to learn the relatively new and highly promising language Julia. Unlike with the Matlab codes in Book I, I do not (so far) provide R translations, though every attempt was made to use the most basic coding and data structures possible, so that translations should be straightforward, and also occasionally separating the very-specific-to-Matlab commands, such as for graphics.
No single book will ever cover every topic or aspect the author would like. As a complement to this book, I recommend students concurrently read some sections of Pawitan (2001) (with an updated and paperback version now available), Davison (2003), and Casella and Berger (2002), three books that I hold as exemplary; they cover additional topics I have omitted, and, in the case of the former two, contain far more examples with real data.
I recall a review of a book in financial econometrics (which I had best not name). Paraphrasing, the reviewer stated that academic books tend to have one of two purposes: (i) to teach the material; or (ii) to impress the reader and, particularly, colleagues with the authors’ knowledge. The reviewer then went on to say how the book accomplished neither. My hope is that the reader and instructor understand my goal to be the former, with little regard for the latter: As emphasized above, the book contains much material, computer codes, and touches upon some recent developments. When proofs are shown, they are simple and detailed. I wrote the book for motivated students who want straightforward explanations, clear demonstrations, and discussions of more modern topics, particularly in a non-Gaussian setting. My guiding principle was to write the book that I would have killed for as a graduate student.
Some acknowledgments are in order. I owe an enormous amount of gratitude to the excellent scientists and instructors I worked with during and after my graduate studies. Alphabetically, these include professors Peter Brockwell, Ronald Butler, Richard Davis, Hariharan (Hari) Iyer, Stefan Mittnik, and Svetlozar (Zari) Rachev. All of these individuals also have textbooks that I highly recommend, and some of which will be mentioned in the preface to Book IV. As the years go by, the proverbial circle starts to close, and I have my own doctoral students, all of whom have contributed in various ways to my book projects. Notable mention goes to Simon Broda, Pawel Polak (both of whom are now professors themselves) and (current PhD students) Marco Gambacciani and Patrick Walker, who, along with professors Kai Carstensen, Walter Farkas, Markus Haas, Alexander McNeil, Nuttanan (Nate) Wichitaksorn, and Michael Wolf, have read parts of this manuscript (and Book IV) and helped tease out mistakes and improve the presentation. Finally, I am indebted to my copy editor Richard Leigh from Wiley, who read every line of the book, checked every graphic and bibliography reference, and made uncountable corrections and suggestions to the scientific English presentation, as well as (embarrassingly) caught a few math mistakes. I have obviously suggested to the editor to have him work on my Book IV (and double his salary).
² This refers to the oft-quoted statement in Savage (1961, p. 578) that Fisher’s fiducial inferential method is “a bold attempt to make the Bayesian omelet without breaking the Bayesian eggs”.
My gratitude to these individuals cannot be overstated.
Introducing Point and Interval Estimation
The discussions of theoretical statistics may be regarded as alternating between problems of estimation and problems of distribution. In the first place a method of calculating one of the population parameters is devised from common-sense considerations: we next require to know its probable error, and therefore an approximate solution of the distribution, in samples, of the statistics calculated.
(R. A. Fisher, 1922, reproduced in Kotz and Johnson, 1992)
This chapter and the next two introduce the primary tools and concepts underlying almost all problems in statistical inference. We restrict ourselves here to the independent, identically distributed (i.i.d.) framework, in order to emphasize the fundamental concepts without the need for addressing the additional issues and complexities associated with the workhorse models of statistics, such as linear models, analysis of variance, design of experiments, and time series. The overriding goal is to extract relevant information from the available sample in order to learn about the underlying population from which it was drawn.
We begin with the basic definitions associated with point estimation, and introduce the maximum likelihood estimator (m.l.e.). We will have more to say about point estimation and m.l.e.s in Chapters 3 and 5. The remainder of the chapter is dedicated to individual parameter confidence intervals (c.i.s), restricting attention to the intuitive use of computer-intensive methods for their construction, as they are generally applicable and, for more complex problems, often the only available choice. In particular, a natural progression is made from simulation to the parametric bootstrap, to the nonparametric bootstrap, to the double nonparametric bootstrap, and finally to the double bootstrap with analytic inner loop, the latter using techniques from Chapter 8.
1.1 POINT ESTIMATION
To introduce the notion of parameter estimation from a sample of data, we make use of two simple models, the Bernoulli and geometric.
1.1.1 Bernoulli Model
Consider an idealized experiment that consists of randomly drawing a marble from an urn containing $R$ red and $W$ white marbles; its color is noted and it is then placed back into the urn. This is repeated $n$ times, whereby $n$ is a known, finite constant, but $R$ and $W$ are unknown. This corresponds to a sequence of Bernoulli trials with unknown probability $p = R/(R+W)$ or $p = W/(R+W)$, depending on what one wants to consider a “success.” Assuming the former, let $X_i$, $i = 1, \ldots, n$, denote the outcomes of the experiment, with $X_i \overset{\text{i.i.d.}}{\sim} \mathrm{Bern}(p)$, each with support $\{0, 1\}$. The ultimate goal is to determine the value of $p$. If $n$ is finite (as reality often dictates), this will be an impossible task. Instead, we content ourselves with attempting to infer as much information as possible about the value of $p$.
As a starting point, in line with Fisher’s “common-sense considerations” in the opening quote, it seems reasonable to examine the proportion of successes. That is, we would compute $s/n$, where $s$ is the observed number of successes. The value $s/n$ is referred to as a point estimate of $p$ and denoted $\hat{p}$, pronounced “p-hat.” Sometimes it is advantageous to write $\hat{p}_n$, where the subscript indicates the sample size. From the way in which the experiment is defined, it should be clear that $s$ is a realization from the binomial random variable (r.v.) $S = \sum_{i=1}^{n} X_i \sim \mathrm{Bin}(n, p)$. To emphasize this, we also write $\hat{p} = S/n$ and call this a point estimator of $p$, the distinction being that a point estimator is a random variable, while a point estimate is a realization of this random variable resulting from the outcome of a particular experiment.
Notethatthesamenotationofaddinga“hat”totheparameterofinterestisusedto denotebothestimateandestimator,asthisisthecommonstandard.However,thedistinction betweenestimateandestimatoriscrucialwhenattemptingtoassessthepropertiesof ̂ p (e.g., isitcorrectonaverage?)andcompareitsperformancetootherpossibleestimators(e.g.,is onepointestimatormorelikelytobecorrectthantheother?).Forinstance, ��[s∕n]= s∕n, thatis, s∕n isapost-experimentconstant,while ��[S∕n]=(np)∕n = p canbecomputed beforeoraftertheexperimenttakesplace.Inthiscase,estimator S∕n issaidtobe (mean) unbiased. Moreformally,let ̂ �� beapointestimatorofthefinite,fixed,unknownparameter �� ∈Θ ⊂ ℝ suchthat ��[ ̂ �� ] exists.Then:
The point estimator θ̂ is (mean) unbiased (with respect to the set Θ) if its expected value is θ (for all θ ∈ Θ); otherwise it is (mean) biased with bias(θ̂) = 𝔼[θ̂] − θ.
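As a quick check of this definition for the binomial example, the following Python sketch computes 𝔼[S/n] exactly. It sums (k/n)·Pr(S = k) over the Bin(n, p) support. The book's own listings use Matlab; Python with only its standard library is our substitution. The helper names and the (n, p) pairs are hypothetical and chosen only for illustration.

```python
from math import comb

# Illustrative sketch; helper names are hypothetical, not from the book.

def binomial_pmf(k, n, p):
    """Pr(S = k) for S ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mean_of_phat(n, p):
    """Exact E[S/n]: sum of (k/n) * Pr(S = k) over k = 0, ..., n."""
    return sum((k / n) * binomial_pmf(k, n, p) for k in range(n + 1))


def bias_of_phat(n, p):
    """bias(p-hat) = E[p-hat] - p, as in the definition above."""
    return mean_of_phat(n, p) - p


for n, p in [(10, 0.3), (40, 0.3), (25, 0.85)]:
    print(f"n={n:3d} p={p:.2f}  E[S/n]={mean_of_phat(n, p):.6f}  bias={bias_of_phat(n, p):+.1e}")
```

Up to floating-point rounding, every printed bias is zero, in line with 𝔼[S/n] = p.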
Generally speaking, mean unbiasedness is a desirable property because it implies that we are “correct on average.” Here the “average” refers to the hypothetical idea of repeating the experiment infinitely often, something that of course does not actually happen in reality. An impressive theoretical framework in mathematical statistics was developed, starting in the 1950s, for the derivation and study of unbiased estimators with minimum variance; see Chapter 7, especially Section 7.2. It is often the case, however, that estimators can be found
that are biased but, by virtue of having a lower variance, wind up having a lower mean squared error, as seen from (1.2) directly below. This concept is well known and is reflected, for example, in Shao and Tu (1995, p. 67), stating “We need to balance the advantage of unbiasedness against the drawbacks of a large mean squared error.” Another type of unbiasedness involves using the median instead of the mean. See Section 7.4.2 for details on median-unbiased estimators.
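For the binomial example, the variance of estimator p̂ is
$$\mathbb{V}(\hat p) = \mathbb{V}(S/n) = \frac{1}{n^2}\,\mathbb{V}(S) = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n},$$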
and it clearly goes to zero as the sample size increases. This is also desirable because, as more samples are collected, the amount of information from which p is to be inferred grows. This concept is referred to as consistency. Recalling the definition of convergence in probability from (A.254) and the weak law of large numbers (A.255), the following definition should seem natural:
An estimator θ̂ₙ based on a sample of n observations is weakly consistent (with respect to Θ) if, as n → ∞, Pr(|θ̂ₙ − θ| > ε) → 0 for any ε > 0 (and all θ ∈ Θ).
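For p̂ = S/n, the probability in this definition can be computed exactly from the binomial p.m.f. Here is a minimal Python sketch (a stand-in for the book's Matlab). It assumes the illustrative values p = 0.3 and ε = 0.05; the function name is hypothetical.

```python
from math import comb

# Illustrative sketch; the function name and the values of p and eps are hypothetical.

def prob_outside(n, p, eps):
    """Exact Pr(|S/n - p| > eps) for S ~ Bin(n, p).

    Adds up Pr(S = k) over those k whose estimate k/n misses p by more than eps.
    Kept to n <= 1000 so that comb(n, k) still fits in a float.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(k / n - p) > eps
    )


p, eps = 0.3, 0.05
for n in (10, 100, 1000):
    print(f"n={n:5d}  Pr(|p-hat - p| > {eps}) = {prob_outside(n, p, eps):.4f}")
```

The printed probabilities shrink towards zero as n grows, as weak consistency requires.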
Observe that an estimator can be (mean) unbiased but not consistent: if Xᵢ ∼ i.i.d. N(μ, 1), i = 1, …, n, then the estimator μ̂ = X₁ is unbiased, but it does not converge to μ as the sample size increases.
Another popular measure of the quality of an estimator is its expected squared deviation from the true value, called mean squared error, or m.s.e.:
The mean squared error of the estimator θ̂ is defined as m.s.e.(θ̂) = 𝔼[(θ̂ − θ)²].
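An important decomposition of the m.s.e. is as follows. With bias(θ̂) = 𝔼[θ̂] − θ as defined above, adding and subtracting 𝔼[θ̂] gives
$$\begin{aligned}
\text{m.s.e.}(\hat\theta) &= \mathbb{E}\big[(\hat\theta - \mathbb{E}[\hat\theta] + \mathbb{E}[\hat\theta] - \theta)^2\big] \\
&= \mathbb{E}\big[(\hat\theta - \mathbb{E}[\hat\theta])^2\big] + 2\big(\mathbb{E}[\hat\theta] - \theta\big)\,\mathbb{E}\big[\hat\theta - \mathbb{E}[\hat\theta]\big] + \big(\mathbb{E}[\hat\theta] - \theta\big)^2 \\
&= \mathbb{V}(\hat\theta) + \big[\text{bias}(\hat\theta)\big]^2.
\end{aligned} \tag{1.2}$$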
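The reader should quickly verify that the cross-term is indeed zero. Note that, for an unbiased estimator, its m.s.e. and variance are equal. As the estimator θ̂ is a function of the data, it is itself a random variable. With f = f_θ̂ the p.d.f. of θ̂, we can write Pr(|θ̂ − θ| > ε) for any ε > 0 as
$$\Pr(|\hat\theta - \theta| > \epsilon) = \int_{|t-\theta|>\epsilon} f(t)\,dt \le \int_{|t-\theta|>\epsilon} \frac{(t-\theta)^2}{\epsilon^2}\, f(t)\,dt \le \frac{1}{\epsilon^2}\int_{-\infty}^{\infty} (t-\theta)^2 f(t)\,dt = \frac{\text{m.s.e.}(\hat\theta)}{\epsilon^2},$$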
so that θ̂ is weakly consistent if m.s.e.(θ̂) → 0 as n → ∞.
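The decomposition (1.2) and the bound above can both be checked numerically for p̂ = S/n. For this estimator every expectation is a finite sum over the binomial support. The following Python sketch (again substituting for Matlab) assumes n = 50, p = 0.3, and ε = 0.1 purely for illustration; the helper names are hypothetical.

```python
from math import comb

# Illustrative sketch; helper names and the values of n, p, eps are hypothetical.

def pmf(k, n, p):
    """Pr(S = k) for S ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def moments_of_phat(n, p):
    """Exact (bias, variance, m.s.e.) of p-hat = S/n.

    The m.s.e. is computed directly as E[(p-hat - p)^2], not via (1.2),
    so that the decomposition can be checked.
    """
    support = [(k / n, pmf(k, n, p)) for k in range(n + 1)]
    mean = sum(v * w for v, w in support)
    var = sum((v - mean) ** 2 * w for v, w in support)
    mse = sum((v - p) ** 2 * w for v, w in support)
    return mean - p, var, mse


n, p, eps = 50, 0.3, 0.1
bias, var, mse = moments_of_phat(n, p)
tail = sum(pmf(k, n, p) for k in range(n + 1) if abs(k / n - p) > eps)
print(f"var + bias^2 = {var + bias**2:.8f}   m.s.e. = {mse:.8f}")
print(f"Pr(|p-hat - p| > {eps}) = {tail:.4f}  <=  m.s.e./eps^2 = {mse / eps**2:.4f}")
```

The bound is typically far from tight, but it is enough to show that m.s.e. → 0 forces the tail probability to zero.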
The estimator p̂ = S/n for the Bernoulli model is rather intuitive and virtually presents itself as being a good estimator of p. It turns out that this p̂ coincides with the estimator we obtain when applying a very general and powerful method of obtaining an estimator for an
unknown parameter of a statistical model. We briefly introduce this method now, and will have more to say about it in Section 3.1.
The likelihood function ℒ(θ; x) is the joint density of a sample X = (X₁, …, Xₙ) as a function of the (for now, scalar) parameter θ, for fixed sample values X = x. That is, ℒ(θ; x) = f_X(x; θ), where f_X is the p.m.f. or p.d.f. of X. Let ℓ(θ; x) = log ℒ(θ; x),¹ and write just ℓ(θ) when the data are clear from the context. Denote the first and second derivatives of ℓ(θ) with respect to θ by ℓ̇(θ) and ℓ̈(θ), respectively. The maximum likelihood estimate, abbreviated m.l.e. and denoted by θ̂ (or, to distinguish it from other estimates, θ̂_ML), is that value of θ that maximizes the likelihood function for a given data set x. The maximum likelihood estimator (as opposed to estimate) is the function of the Xᵢ, also denoted θ̂_ML, that yields the m.l.e. for an observed data set x.
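In many cases of interest (including the Bernoulli and geometric examples in this chapter), the m.l.e. satisfies ℓ̇(θ̂) = 0 and ℓ̈(θ̂) < 0. For example, with Xᵢ ∼ i.i.d. Bern(θ), i = 1, …, n, the likelihood is
$$\mathcal{L}(\theta; \mathbf{x}) = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i} = \theta^{s}(1-\theta)^{n-s},$$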
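where s = Σᵢ₌₁ⁿ xᵢ. Then ℓ(θ) = s log θ + (n − s) log(1 − θ), and
$$\dot\ell(\theta) = \frac{s}{\theta} - \frac{n-s}{1-\theta}, \qquad \ddot\ell(\theta) = -\frac{s}{\theta^2} - \frac{n-s}{(1-\theta)^2},$$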
from which it follows (by setting ℓ̇(θ̂) = 0 and confirming ℓ̈(θ̂) < 0) that θ̂_ML = S/n is the m.l.e. It is easy to see that θ̂_ML is unbiased.
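Maximizing the likelihood does recover s/n, and this can also be seen without calculus. One can maximize ℓ(θ) = s log θ + (n − s) log(1 − θ) by brute force over a fine grid. This Python sketch is only an illustration; grid search is a crude stand-in for the derivative argument above. The data (s = 13 successes in n = 40 trials) and the function names are hypothetical.

```python
from math import log

# Illustrative sketch; the data (s, n) and the function names are hypothetical.

def bern_loglik(theta, s, n):
    """l(theta) = s log(theta) + (n - s) log(1 - theta), for 0 < theta < 1."""
    return s * log(theta) + (n - s) * log(1 - theta)


def grid_mle(s, n, points=9999):
    """Maximize l over the grid 1/(points+1), 2/(points+1), ..., points/(points+1)."""
    grid = [(j + 1) / (points + 1) for j in range(points)]
    return max(grid, key=lambda t: bern_loglik(t, s, n))


s, n = 13, 40
print("grid maximizer:", grid_mle(s, n), " closed form s/n:", s / n)
```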
1.1.2 Geometric Model
As in the binomial case, independent draws with replacement are conducted from an urn with R red and W white marbles. However, the number of trials is now not fixed in advance; sampling continues until r red marbles have been drawn. What can be said about p = R/(R + W)? Let the r.v. X be the number of necessary trials. From the sampling structure, X follows a negative binomial distribution, X ∼ NBin(r, p), with p.m.f.
$$f_X(x; r, p) = \binom{x-1}{r-1} p^{r}(1-p)^{x-r}\,\mathbb{I}_{\{r, r+1, \ldots\}}(x). \tag{1.3}$$
Recall that X can be expressed as the sum of r i.i.d. geometric r.v.s, say X = Σᵢ₌₁ʳ Gᵢ, where Gᵢ ∼ i.i.d. Geo(p), each with support {1, 2, …}.
This decomposition is important because it lets us imagine sampling in a different way. Sampling need not occur consecutively in time until r successes occur; instead it can be viewed as r independent (and possibly concurrent) geometric trials using urns with the same red to white ratio, that is, the same p. For example, interest might center on how long it takes a woman to become pregnant using a particular method of assistance (e.g., temperature measurements or hormone treatment). This is worth turning into an example, as we will refer to it more than once.
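The decomposition X = G₁ + ⋯ + G_r can be checked by simulation: summing r independent geometric draws should reproduce the frequencies given by the p.m.f. (1.3). In this Python sketch the geometric draws are generated by inversion, G = 1 + ⌊log U / log(1 − p)⌋ with U uniform on (0, 1]. That sampler, the helper names, and the choices r = 5, p = 0.3, and 100,000 replications are our assumptions, not the book's code. The sketch requires 0 < p < 1.

```python
import random
from math import comb, floor, log

# Illustrative sketch; helper names, r, p, and the number of replications are hypothetical.

def geometric_trials(p, rng):
    """One Geo(p) draw on {1, 2, ...} (number of trials), by inversion; needs 0 < p < 1."""
    u = 1.0 - rng.random()                 # u lies in (0, 1], so log(u) is finite
    return 1 + floor(log(u) / log(1.0 - p))


def nbin_pmf(x, r, p):
    """Equation (1.3): Pr(X = x) for X ~ NBin(r, p)."""
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r) if x >= r else 0.0


def compare(r=5, p=0.3, reps=100_000, seed=1):
    """Simulate X = G_1 + ... + G_r and print empirical vs. exact probabilities."""
    rng = random.Random(seed)
    draws = [sum(geometric_trials(p, rng) for _ in range(r)) for _ in range(reps)]
    for x in range(r, r + 7):
        print(f"x={x:2d}  simulated={draws.count(x) / reps:.4f}  (1.3)={nbin_pmf(x, r, p):.4f}")
    return draws


draws = compare()
```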
1 Throughout this book, log refers to base e unless otherwise specified.
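Example 1.1 (Geometric) Let Gᵢ ∼ i.i.d. Geo(θ), i = 1, …, n, with typical p.m.f.
$$f_G(g; \theta) = \theta(1-\theta)^{g-1}\,\mathbb{I}_{\{1,2,\ldots\}}(g).$$
Then ℓ(θ; x), the log-likelihood of the sample x = (x₁, …, xₙ), and its first derivative, ℓ̇(θ; x), are, with s = Σᵢ₌₁ⁿ xᵢ,
$$\ell(\theta; \mathbf{x}) = n\log\theta + (s-n)\log(1-\theta), \qquad \dot\ell(\theta; \mathbf{x}) = \frac{n}{\theta} - \frac{s-n}{1-\theta}.$$
Solving the equation ℓ̇(θ; x) = 0 and confirming ℓ̈(θ̂) < 0 gives θ̂_ML = n/S = 1/Ḡ. We will see below and in Section 7.3 that the m.l.e. is not unbiased.² ◾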
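As a numerical cross-check of Example 1.1, the log-likelihood can also be maximized without calculus. The Python sketch below uses golden-section search on a hypothetical sample of n = 5 waiting times. Golden-section search is a standard one-dimensional optimizer, swapped in here for illustration and not the method of the text. Its result should agree with n/s to within the search tolerance.

```python
from math import log, sqrt

# Illustrative sketch; the sample x and the function names are hypothetical.

def geo_loglik(theta, s, n):
    """Example 1.1: l(theta; x) = n log(theta) + (s - n) log(1 - theta)."""
    return n * log(theta) + (s - n) * log(1 - theta)


def golden_max(f, a, b, tol=1e-10):
    """Golden-section search for the maximizer of a unimodal f on [a, b]."""
    invphi = (sqrt(5) - 1) / 2             # reciprocal of the golden ratio, about 0.618
    while b - a > tol:
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        if f(c) > f(d):
            b = d                          # the maximizer lies in [a, d]
        else:
            a = c                          # the maximizer lies in [c, b]
    return (a + b) / 2


x = [3, 1, 7, 2, 4]                        # hypothetical numbers of trials
n, s = len(x), sum(x)
numeric = golden_max(lambda t: geo_loglik(t, s, n), 1e-9, 1 - 1e-9)
print("golden-section:", numeric, "  closed form n/s:", n / s)
```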
Imagine a study in which each of r couples (independently of each other) attempts to conceive each month until they succeed. In the r = 1 case, X = G₁ ∼ Geo(p) and, recalling that 𝔼[G₁] = 1/p, an intuitive point estimator of p is 1/G₁. Interest centers on developing a point estimator for the r > 1 case. Of course, in this simple structure, one would just compute the m.l.e. However, we use this easy case to illustrate how one might proceed when simple answers are not immediately available, and some thinking and creativity are required.
Based on the result for r = 1, one idea for the r > 1 case would be to use the average of the 1/Gᵢ values, r⁻¹ Σᵢ₌₁ʳ Gᵢ⁻¹, which we denote by p̂₁. Another candidate is p̂₂ = 1/Ḡ = r/Σᵢ₌₁ʳ Gᵢ = r/X. This happens to be the m.l.e. from Example 1.1. Note that both of these estimators reduce to 1/G₁ when r = 1. We also consider the nonobvious point estimator p̂₃ = (r − 1)/(X − 1). It will be derived in Section 7.3, and is only useful for r > 1.
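The three candidates can be written as small functions of the observed waiting times G₁, …, G_r. Below is a Python sketch with hypothetical function names and a hypothetical sample of r = 3 couples.

```python
# Illustrative sketch; function names and the sample g are hypothetical.

def p_hat_1(g):
    """Average of the reciprocals 1/G_i."""
    return sum(1 / gi for gi in g) / len(g)


def p_hat_2(g):
    """Reciprocal of the sample mean, r / X (the m.l.e.)."""
    return len(g) / sum(g)


def p_hat_3(g):
    """(r - 1) / (X - 1); only defined for r > 1, which also guarantees X > 1."""
    r = len(g)
    if r < 2:
        raise ValueError("p_hat_3 needs r > 1")
    return (r - 1) / (sum(g) - 1)


g = [2, 3, 5]   # hypothetical months until conception for r = 3 couples
print(p_hat_1(g), p_hat_2(g), p_hat_3(g))
```

For this sample the three estimates are 31/90 ≈ 0.344, 0.3, and 2/9 ≈ 0.222. With r = 1, p_hat_1 and p_hat_2 coincide, as noted above.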
Instead of algebraically determining the mean and variance of the p̂ᵢ, i = 1, 2, 3, we will begin our practice of letting the computer do the work. The program in Listing 1.1 computes the three point estimators for a simulated set of Gᵢ. It repeats this sim = 10,000 times, and the resulting sample mean and variance of these simulated estimates approximate the true mean and variance.
To illustrate, Figure 1.1 shows the histograms of the simulated point estimators for the case with p = 0.3 and r = 5. From these, the large upward bias of p̂₁ is particularly clear.
function [p1vec,p2vec,p3vec]=geometricparameterestimate(p,r,sim)
p1vec=zeros(sim,1); p2vec=p1vec; p3vec=p1vec;
for s=1:sim
  % geornd counts failures before the first success; +1 gives the number of trials
  gvec=geornd(p,[r 1])+1;
  p1=mean(1./gvec); p2=1/mean(gvec); p3=(r-1)/(sum(gvec)-1);
  p1vec(s)=p1; p2vec(s)=p2; p3vec(s)=p3;
end
% no trailing semicolons below, so the results are printed
bias1=mean(p1vec)-p, bias2=mean(p2vec)-p, bias3=mean(p3vec)-p
var1=var(p1vec), var2=var(p2vec), var3=var(p3vec)
mse1=var1+bias1^2, mse2=var2+bias2^2, mse3=var3+bias3^2
Program Listing 1.1: Simulates three point estimators for p in the i.i.d. geometric model. Calling the function with p = 0.3 and r = 5 corresponds to the true probability of success being 0.3 and using five couples in the experiment.
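For readers without Matlab, here is a rough Python counterpart of Listing 1.1, using only the standard library. It is a sketch rather than a line-by-line translation. The geometric variates are drawn by inversion instead of with geornd, the function and key names are hypothetical, and statistics.variance (divisor sim − 1) is used to mirror Matlab's var.

```python
import random
from math import floor, log
from statistics import fmean, variance

# Illustrative sketch; names are hypothetical and this is not the book's code.

def geo_draw(p, rng):
    """Geo(p) on {1, 2, ...} by inversion (the role of geornd(p)+1 in Listing 1.1)."""
    return 1 + floor(log(1.0 - rng.random()) / log(1.0 - p))


def geometric_parameter_estimate(p, r, sim, seed=0):
    """Simulate p1, p2, p3 sim times (r > 1); return {name: (bias, variance, m.s.e.)}."""
    rng = random.Random(seed)
    est = {"p1": [], "p2": [], "p3": []}
    for _ in range(sim):
        g = [geo_draw(p, rng) for _ in range(r)]
        x = sum(g)
        est["p1"].append(fmean(1 / gi for gi in g))
        est["p2"].append(r / x)
        est["p3"].append((r - 1) / (x - 1))
    summary = {}
    for name, vals in est.items():
        bias = fmean(vals) - p
        var = variance(vals)               # divisor sim - 1, like Matlab's var
        summary[name] = (bias, var, var + bias**2)
    return summary


for name, (b, v, m) in geometric_parameter_estimate(0.3, 5, 10_000).items():
    print(f"{name}: bias={b:+.4f}  var={v:.4f}  m.s.e.={m:.4f}")
```

With p = 0.3 and r = 5 the output should be close to the first row of Table 1.1. These are simulated values, so the digits will differ somewhat with the seed.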
2 We use the symbol ◾ to denote the end of proofs of theorems, as well as examples and remarks, acknowledging that it is traditionally only used for the former, as popularized by Paul Halmos.
Figure 1.1 Distribution of point estimators p̂₁ (a), p̂₂ (b), and p̂₃ (c) using output from the program in Listing 1.1 with p = 0.3 and r = 5, based on simulation with 10,000 replications.
The discrete nature of p̂₂ and p̂₃ arises because these two estimators first compute the sum of the observations and then take reciprocals, so that computation of their p.m.f.s is easy. As an example, p̂₃ = 0.4 ⇔ X = 11, which, from (1.3), has probability 0.06. Approximately 600 of the simulated values depicted in the histogram of p̂₃ should therefore be 0.4; there are 624 in the histogram. Similarly, p̂₃ = 0.8 ⇔ X = 6, with probability 0.008505, so approximately 85 simulated values should equal 0.8; there are 94 in the histogram.
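The arithmetic in this paragraph can be reproduced directly from (1.3). The short Python sketch below (helper name hypothetical) inverts p̂₃ = (r − 1)/(X − 1) for the two values discussed and evaluates the p.m.f.

```python
from math import comb

# Illustrative sketch; the helper name is hypothetical.

def nbin_pmf(x, r, p):
    """Equation (1.3) for X ~ NBin(r, p)."""
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)


r, p, sim = 5, 0.3, 10_000
for target in (0.4, 0.8):
    x = round((r - 1) / target + 1)        # invert p3 = (r - 1)/(X - 1)
    prob = nbin_pmf(x, r, p)
    print(f"p3 = {target}: X = {x}, Pr(X = {x}) = {prob:.6f}, expected count = {sim * prob:.1f}")
```

The expected counts out of 10,000 replications come to roughly 600 and 85, respectively.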
As p increases towards one, the number of points in the supports of p̂₂ and p̂₃ that carry non-negligible probability decreases. This is illustrated in Figure 1.2, showing histograms of p̂₂ for r = 10 and four different values of p. The code used to make the plots is given in Listing 1.2. Observe how we avoid use of the FOR loop (as was used in Listing 1.1) for generating the 1 million replications, thus providing a significant speed increase. (The use of the eval command with concatenated text strings is also demonstrated.)
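The shrinking support can be quantified exactly. The estimator p̂₂ = r/X takes the value r/x with probability f_X(x; r, p) from (1.3), so one can count how many support points carry non-negligible probability. The cutoff 10⁻⁶ and the truncation point in this Python sketch are arbitrary assumptions, as are the helper names.

```python
from math import comb

# Illustrative sketch; the cutoff, truncation point, and names are hypothetical.

def nbin_pmf(x, r, p):
    """Equation (1.3) for X ~ NBin(r, p)."""
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)


def effective_support_size(r, p, cutoff=1e-6, x_max=2000):
    """Count the values r/x of p2 = r/X whose probability Pr(X = x) exceeds cutoff."""
    return sum(1 for x in range(r, x_max) if nbin_pmf(x, r, p) > cutoff)


r = 10
sizes = {p: effective_support_size(r, p) for p in (0.2, 0.4, 0.6, 0.8)}
print(sizes)
```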
For the simulation of p̂₁, p̂₂, and p̂₃ from Listing 1.1, with p = 0.3 and r = 5, the results are shown in the first numeric row of Table 1.1. We see that p̂₁ has almost five times the bias of p̂₂, while p̂₂ has over 100 times the bias of p̂₃. The variance of p̂₁ is slightly larger than those of p̂₂ and p̂₃, which are nearly the same. By combining these according to (1.2), it is clear that the m.s.e. will be smallest for p̂₃, as also shown in the table. The next row shows the results using a larger sample of 15 couples. While the bias of p̂₁ stays the same, those of p̂₂ and p̂₃ decrease. For all point estimators, the variance decreases. It turns out that, as the number of couples, r, tends towards infinity, the variance of all the estimators goes to zero, while the bias of p̂₁ stays at 0.22 (for p = 0.3) and that of p̂₂ goes to zero.
Figure 1.2 Histogram of point estimator p̂₂ for r = 10 and four values of p, based on simulation with 1 million replications.
B=1e6; r=10;
for p=0.2:0.2:0.8
  % the MLE, computed for all B replications at once (no loop over replications)
  phatvec=1./mean(geornd(p,[r B])+1);
  [histcount,histgrd]=hist(phatvec,1000);
  figure, h1=bar(histgrd,histcount); set(gca,'fontsize',16), xlim([0 1])
  title(['r=',int2str(r),', p=',num2str(p)])
  set(h1,'facecolor',[0.94 0.94 0.94],'edgecolor',[0.9 0.7 1])
  eval(['print -depsc phatforgeogetsmorediscretep',int2str(10*p)])
end
Program Listing 1.2: Generates the graphs in Figure 1.2.
Hence, we say that p̂₂ is asymptotically unbiased. We will see in Section 7.3 that p̂₃ is unbiased, not just asymptotically, but for all 0 < p ≤ 1 and any r > 1. This implies that the value 0.0004 in the p̂₃ bias column of the table just reflects sampling error resulting from using only 10,000 replications in the simulation. In comparison, then, point estimator p̂₃ seems to be preferred with respect to all three criteria.
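These statements about bias can be checked without simulation. For p̂₁, 𝔼[p̂₁] = 𝔼[1/G₁] = Σ_{g≥1} g⁻¹ p(1 − p)^{g−1} = −p log p/(1 − p) for every r. This is a standard series identity (it uses −log(1 − q) = Σ_{g≥1} q^g/g) and is not taken from the text. For p = 0.3 it gives a bias of about 0.216, matching the 0.22 reported above. For p̂₂ and p̂₃ the expectations are series over the p.m.f. (1.3), which this Python sketch truncates at an assumed cutoff; the helper names are hypothetical.

```python
from math import comb, log

# Illustrative sketch; truncation point and names are hypothetical. Requires 0 < p < 1, r > 1.

def nbin_pmf(x, r, p):
    """Equation (1.3) for X ~ NBin(r, p)."""
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)


def exact_means(p, r, x_max=3000):
    """Return (E[p1], E[p2], E[p3]); the series for p2 and p3 stop at x_max."""
    e1 = -p * log(p) / (1 - p)             # E[1/G_1], which does not depend on r
    xs = range(r, x_max)
    e2 = sum(r / x * nbin_pmf(x, r, p) for x in xs)
    e3 = sum((r - 1) / (x - 1) * nbin_pmf(x, r, p) for x in xs)
    return e1, e2, e3


p = 0.3
for r in (5, 15, 50):
    e1, e2, e3 = exact_means(p, r)
    print(f"r={r:2d}  bias1={e1 - p:.4f}  bias2={e2 - p:.4f}  bias3={e3 - p:+.1e}")
```

The bias of p̂₂ shrinks as r grows, while the bias of p̂₃ is zero up to rounding and truncation error.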
The lower portion of Table 1.1 shows similar results using p = 0.7. Again, p̂₁ is highly biased, while, comparatively speaking, the bias of p̂₂ is much smaller and diminishes with growing sample size r. The bias of p̂₃ appears very small and, as already mentioned, is theoretically zero. The interesting thing about this choice of p is that the variance of p̂₂ is smaller than that of p̂₃. In fact, this reduction in variance causes the m.s.e. of p̂₂ to be smaller than that of p̂₃, even though the bias of p̂₃ is essentially zero. This demonstrates two important points:
(i) An unbiased point estimator need not have the smallest m.s.e.
(ii) The relative properties of point estimators may change with the unknown parameter of interest.
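Both points can be confirmed with exact rather than simulated m.s.e. values, since m.s.e.(p̂) = Σₓ (p̂(x) − p)² f_X(x; r, p) with f_X from (1.3). Below is a Python sketch with r = 5; the truncation point and helper names are assumptions.

```python
from math import comb

# Illustrative sketch; names and the truncation point are hypothetical.

def nbin_pmf(x, r, p):
    """Equation (1.3) for X ~ NBin(r, p)."""
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)


def mle_p2(x, r):
    """The m.l.e. r/X."""
    return r / x


def unbiased_p3(x, r):
    """The unbiased estimator (r - 1)/(X - 1)."""
    return (r - 1) / (x - 1)


def exact_mse(estimator, p, r, x_max=3000):
    """Sum over x of (estimator - p)^2 * Pr(X = x), truncated at x_max."""
    return sum((estimator(x, r) - p) ** 2 * nbin_pmf(x, r, p) for x in range(r, x_max))


r = 5
for p in (0.3, 0.7):
    print(f"p={p}: m.s.e.(p2)={exact_mse(mle_p2, p, r):.4f}  m.s.e.(p3)={exact_mse(unbiased_p3, p, r):.4f}")
```

For p = 0.7 the m.l.e. p̂₂ should come out with the smaller m.s.e., while for p = 0.3 the unbiased p̂₃ does, illustrating (i) and (ii).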
TABLE 1.1 Comparison of three point estimators for the geometric model
p     r     bias (p̂₁, p̂₂, p̂₃)         variance (p̂₁, p̂₂, p̂₃)     m.s.e. (p̂₁, p̂₂, p̂₃)
0.3   5     0.22, 0.045, 0.00040       0.023, 0.018, 0.017         0.070, 0.020, 0.017
Having demonstrated these two facts using just the values in Table 1.1, it would be desirable to graphically depict the m.s.e. of estimators p̂₂ and p̂₃ as a function of p, for several sample sizes. This is shown in Figure 1.3. From it we see that m.s.e.(p̂₂) < m.s.e.(p̂₃) for (roughly) p > 0.5, but as the sample size increases, the difference in m.s.e. of the two estimators becomes negligible.
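The narrowing gap can likewise be computed exactly. This Python sketch evaluates m.s.e.(p̂₃) − m.s.e.(p̂₂) at p = 0.7 for a few values of r. The chosen r values, the truncation point, and the helper names are illustrative assumptions.

```python
from math import comb

# Illustrative sketch; r values, truncation point, and names are hypothetical.

def nbin_pmf(x, r, p):
    """Equation (1.3) for X ~ NBin(r, p)."""
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)


def mse_gap(p, r, x_max=3000):
    """m.s.e.(p3) - m.s.e.(p2), both computed exactly from (1.3)."""
    xs = range(r, x_max)
    mse2 = sum((r / x - p) ** 2 * nbin_pmf(x, r, p) for x in xs)
    mse3 = sum(((r - 1) / (x - 1) - p) ** 2 * nbin_pmf(x, r, p) for x in xs)
    return mse3 - mse2


for r in (5, 15, 50):
    print(f"r={r:2d}  gap at p=0.7: {mse_gap(0.7, r):.2e}")
```

The gap stays positive (p̂₂ better) but falls quickly as r grows. This is consistent with both m.s.e.s being dominated by a common leading term that vanishes as r → ∞.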
Facts (i) and (ii) mentioned above complicate the comparison of estimators. Some structure can be put on the problem if we restrict attention to unbiased estimators. Then minimizing the m.s.e. is the same as minimizing the variance; this gives rise to the following concepts:
An unbiased estimator, say θ̂_eff, is efficient (with respect to Θ) if it has the smallest possible variance of all unbiased estimators (for all θ ∈ Θ).
The efficiency of an unbiased estimator θ̂ is Eff(θ̂, θ) = 𝕍(θ̂_eff)/𝕍(θ̂).
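The efficiency ratio can be illustrated outside the geometric setting. In the Bernoulli model, p̂ = S/n is unbiased and attains the Cramér-Rao lower bound p(1 − p)/n, so it is efficient; this is a standard result, assumed here rather than taken from this section. The unbiased estimator X₁ then has efficiency 𝕍(S/n)/𝕍(X₁) = 1/n. The Python sketch below computes both variances from their p.m.f.s; the helper names are hypothetical.

```python
from math import comb

# Illustrative sketch; helper names are hypothetical. Assumes S/n is the efficient estimator.

def variance(support):
    """Variance of a discrete r.v. given as [(value, probability), ...]."""
    mean = sum(v * w for v, w in support)
    return sum((v - mean) ** 2 * w for v, w in support)


def efficiency_of_first_obs(n, p):
    """Eff(X_1, p) = V(S/n) / V(X_1) in the Bernoulli model."""
    v_eff = variance([(k / n, comb(n, k) * p**k * (1 - p) ** (n - k)) for k in range(n + 1)])
    v_x1 = variance([(0.0, 1 - p), (1.0, p)])
    return v_eff / v_x1


for n in (2, 5, 20):
    print(n, efficiency_of_first_obs(n, 0.3), 1 / n)
```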
We will see later (Chapter 7) that the estimator p̂₃ used above is efficient. In many realistic problems, there may be no unbiased estimators, or no efficient one; and if there is, like p̂₃ above, it might not have the smallest m.s.e. over all or parts of Θ. This somewhat diminishes the value of the efficiency concept defined above. All is not lost, however. In many cases of interest, the m.l.e. has the property that, asymptotically, it is (unbiased and) efficient. As such, it serves as a natural benchmark with which to compare competing estimators. We expect that, with increasing sample size, the m.l.e. will eventually be as good as, or better than, all other estimators, with respect to m.s.e., for all θ ∈ Θ. This certainly does not imply that the m.l.e. is the best estimator in finite samples, as we see in Figure 1.3 comparing the m.l.e. p̂₂ to the efficient estimator p̂₃. (Other cases in which the
Figure 1.3 The m.s.e. of estimators p̂₂ (lines) and p̂₃ (lines with circles) for parameter p in the geometric model, as a function of p, for three sample sizes, obtained by simulation with 100,000 replications.
m.l.e. is not the best estimator with respect to m.s.e. in finite samples are demonstrated in Example 9.3 and Section 7.4.3.)
Before leaving this section, it is worth commenting on other facts observable from Figure 1.3. Note that, for any r > 1, the m.s.e.s for both estimators p̂₂ and p̂₃ approach zero as p → 0 and p → 1. For the former case, as p → 0, 𝔼[X] = r/p → ∞, and p̂ → 0. For the latter case, as p → 1, Pr(X = r) = pʳ → 1, so that p̂ → 1. Also, the m.s.e. increases monotonically with p as p moves from 0⁺ to (roughly) p = 0.6, and decreases monotonically back towards zero as p → 1.
1.1.3 Some Remarks on Bias and Consistency
The most effective way to discourage an applied statistician from using a model or method is to say that it gives asymptotically inconsistent parameter estimates. This is completely irrelevant for a fixed small sample; the interval of plausible values, not a point estimate, is essential. … If the sample were larger, in a properly planned study, the model could be different, so the question of a parameter estimate in some fixed model converging asymptotically to a “true” value does not arise.
(J. K. Lindsey, 1999, p. 20)
Sections 1.1.1 and 1.1.2 introduced the fundamental concepts of point estimation, (mean) unbiasedness, consistency, m.s.e., likelihood, and efficiency. We demonstrated that a biased estimator, such as the m.l.e., might be preferred to an unbiased one with respect to m.s.e. Under certain conditions, fulfilled in the vast majority of situations, the m.l.e. is asymptotically unbiased and consistent (see Section 3.1.4 for the formalities of this).
While (mean) unbiasedness is an appealing property, in many modern applications it is not even a consideration. This is particularly so in the context of big data and models with a relatively large number of parameters, where biased estimates are actually preferred, via use of shrinkage estimation; see Chapter 5. Moreover, starting in the late twentieth century and continuing unabated, the Bayesian approach to inference has gained in attention and usage because of advances in computing power and recognition of some of its inferential benefits. In a sense, unbiasedness is the antithesis, or dual, of the Bayesian approach; see Noorbaloochi and Meeden (1983). So-called empirical Bayes methods form a link between pure Bayesian methods and shrinkage estimation, and yield a formidable approach to inference; see the references in Section 5.4.
As such, most researchers are now comfortable working with biased estimators, but will often still insist on such estimators being consistent. As consistency is an asymptotic notion, but reality deals with finite samples, one might also question its value, as suggested in the above quote from Lindsey (1999, p. 20). As a simple example of interest (particularly for anyone with a pension fund), consider a case from financial portfolio optimization. The basic framework of Markowitz (1952), which led to his receiving the 1990 Nobel Memorial Prize in Economic Sciences, is still used today in industry. However (as was well known to Markowitz), the method is problematic because it requires estimating the mean vector and covariance matrix of past asset returns. This has been researched in a substantial body of literature, resulting in the established finding that shrinking the optimized portfolio weights towards the equally weighted vector (referred to as “1/N,” where N is the number of assets
under consideration) not only improves matters substantially (in terms of a risk-reward tradeoff), but also that just taking the weights to be the shrinkage target 1/N often results in better performance.³ Alternatively, one can apply the Markowitz optimization framework, but in conjunction with shrinkage applied to the mean vector and/or the covariance matrix.⁴
Consider the humbling result that one is better off forgoing basic statistical modeling and just putting equal amounts of money in each available asset (roughly equivalent to just buying an exchange-traded fund). It arises because of (i) the high relevance and applicability of shrinkage estimation in this setting; and (ii) the gross misspecification of the model underlying the multivariate distribution of asset returns, and how it evolves over time. More statistically sophisticated models for asset returns do exist, such that portfolio optimization does result in substantially better performance than use of 1/N (let alone the naive Markowitz framework). Unsurprisingly, though, these are complicated for people not well versed in statistical theory, and require more mathematical and statistical prowess than is usually obtained from a course in introductory statistical methods for aspiring investors.⁵ Book IV will discuss some such models.
Clearly, 1/N is not a “consistent” estimator of the optimal portfolio (as defined by specifying some desired level of annual return, and then minimizing some well-defined risk measure, such as portfolio variance, in the Markowitz setting). More importantly, this example highlights the fact that the model used for the returns is so completely misspecified that the notion of consistency becomes vacuous in this setting. That model is an i.i.d. Gaussian or, more generally, an elliptic distribution, with constant unknown mean vector and covariance matrix throughout time.
Two further, somewhat less trivial cases in which inconsistent estimators are favored (and also in the context of modeling financial asset returns) are given in Krause and Paolella (2014) and Gambacciani and Paolella (2017).
1.2 INTERVAL ESTIMATION VIA SIMULATION
To introduce the concepts associated with interval estimation, and to show how simulating from the true distribution can be used to compute confidence intervals, we use the Bernoulli model from Section 1.1.1. For a fixed sample size n, we observe realizations of X₁, …, Xₙ, where
3 See, for example, DeMiguel et al. (2009a,b, 2013) and the references therein.
4 See, for example, Jorion (1986), Jagannathan and Ma (2003), Ledoit and Wolf (2003, 2004), Schäfer and Strimmer (2005), Kan and Zhou (2007), Fan et al. (2008), Bickel and Levina (2008), and the references therein.
5 This result is also anathema to supposedly professional investment consultants and mutual fund managers, with their techniques for “stock picking” and “investment strategies.” This was perhaps most forcefully and amusingly addressed by Warren Buffett (who apparently profits enormously from market inefficiency). “The Berkshire chairman has long argued that most investors are better off sticking their money in a low-fee S&P 500 index fund instead of trying to beat the market by employing professional stock pickers” (Holm, 2016). To quote Buffett: “Supposedly sophisticated people, generally richer people, hire consultants, and no consultant in the world is going to tell you ‘just buy an S&P index fund and sit for the next 50 years.’ You don’t get to be a consultant that way. And you certainly don’t get an annual fee that way. So the consultant has every motivation in the world to tell you, ‘this year I think we should concentrate more on international stocks,’ or ‘this manager is particularly good on the short side,’ and so they come in and they talk for hours, and you pay them a large fee, and they always suggest something other than just sitting on your rear end and participating in the American business without cost. And then, after they get their fees, they in turn recommend to you other people who charge fees, which cumulatively eat up capital like crazy” (Holm, 2016). See also Sorkin (2017) on (i) Buffett’s views; (ii) why many high-wealth individuals continue to seek highly paid consultants; and (iii) with respect to the concept of market efficiency, what would happen if most wealth were channeled into exchange-traded funds (i.e., the market portfolio).
Xᵢ ∼ i.i.d. Bern(p), and compute the mean of the Xᵢ, p̂ = S/n, as our estimator of the fixed but unknown p. Depending on n and p, it could be that p̂ = p, though if, for example, n is odd and p = 0.5, then p̂ ≠ p. Even if n is arbitrarily large but finite, if p is an irrational number in (0, 1), then with probability one (w.p.1), p̂ ≠ p.
The point is that, for almost all values of n and p, the probability that p̂ = p will be low or zero. As such, it would seem wise to provide a set of values such that, with a high probability, the true p is among them. For a univariate parameter such as p, the most common set is an interval, referred to as a confidence interval, or c.i.
Notice that a c.i. pertains to a parameter, such as p, and not to an estimate or estimator, p̂. We might speak of a c.i. associated with p̂, in which case it is understood that the c.i. refers to parameter p. It does not make sense to speak of a c.i. for p̂.
To get an idea of the uncertainty associated with p̂ for a fixed n and p, we can use simulation. This is easily done in Matlab, using its built-in routine binornd for simulating binomial realizations.
p=0.3; n=40; sim=1e4; phat=binornd(n,p,[sim,1])/n; hist(phat)
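The same experiment can be mimicked in plain Python. In this sketch each p̂ is obtained by summing n Bernoulli draws, and collections.Counter stands in for a histogram by tabulating the distinct values of p̂. The function name and the seed are hypothetical choices, not part of the book's code.

```python
import random
from collections import Counter

# Illustrative sketch; the function name and seed are hypothetical.

def simulate_phat(n=40, p=0.3, sim=10_000, seed=0):
    """sim realizations of p-hat = S/n, each S a sum of n Bernoulli(p) draws."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(sim)]


phat = simulate_phat()
table = Counter(phat)                      # value -> count, like a frequency table
for value in sorted(table):
    print(f"{value:.3f}  {table[value]:5d}  {100 * table[value] / len(phat):6.2f}%")
```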
The following code is a little fancier. It makes use of Matlab’s tabulate function, discussed in Section 2.1.4.
p=0.3; n=40; sim=1e4; phat=binornd(n,p,[sim,1])/n;
nbins=length(tabulate(phat)); [histcount,histgrd]=hist(phat,nbins);
h1=bar(histgrd,histcount); xlim([0 0.8])
set(h1,'facecolor',[0.64 0 0.24],'edgecolor',[0 0 0],'linewidth',2)
set(gca,'fontsize',16), title(['Using n=',int2str(n)])
Doing this with p = 0.3 and for sample sizes n = 20 and n = 40 yields the histograms shown in Figure 1.4. We see that, while the mode of p̂ is at 0.3 in both cases, there is quite some variation around this value. Particularly for n = 20, there is a small but nonnegligible chance (the exact probability of which you can easily calculate) that p̂ is zero or higher than 0.6.
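The exact probability mentioned above follows from the binomial p.m.f. When n = 20, p̂ = 0 means S = 0, and p̂ > 0.6 means S ≥ 13. Below is a Python sketch (helper name hypothetical) that evaluates this probability for both sample sizes in Figure 1.4.

```python
from math import comb

# Illustrative sketch; the helper name is hypothetical.

def tail_probability(n, p, upper=0.6):
    """Exact Pr(p-hat = 0) + Pr(p-hat > upper) for p-hat = S/n, S ~ Bin(n, p)."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    return pmf[0] + sum(pmf[k] for k in range(n + 1) if k / n > upper)


for n in (20, 40):
    print(f"n={n}: Pr(p-hat = 0 or p-hat > 0.6) = {tail_probability(n, 0.3):.5f}")
```

For n = 20 the result is roughly 0.002, and it is far smaller for n = 40.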
We first state some useful definitions, and then, based on our ability to easily simulate values of p̂, determine how to form a c.i. for p.
Consider a distribution (or statistical model) with unknown but fixed k-dimensional parameter θ ∈ Θ ⊆ ℝᵏ. A confidence set M(X) ⊂ Θ for θ with confidence level 1 − α is any set such that
$$\Pr(\theta \in M(\mathbf{X})) \ge 1 - \alpha, \quad \forall\, \theta \in \Theta, \quad 0 < \alpha < 1, \tag{1.4}$$
where M(X) depends on the r.v. X, a realization of which will be observed, but does not depend on the unknown parameter θ. Typical values of α are 0.01, 0.05, and 0.10. The quantity Pr(θ ∈ M(X)) is called the coverage probability and can depend on θ. Its greatest lower bound,
$$\inf_{\theta \in \Theta} \Pr(\theta \in M(\mathbf{X})),$$
is referred to as the confidence coefficient of M(X).
It is imperative to keep in mind how (1.4) is to be understood: As θ is fixed and M(X) is random, we say that, before the sample is collected, the set M(X) will contain (or capture)
Another random document with no related content on Scribd:
discovered. But even the victims of that malady find atmospheric and other conditions friendly to a prolongation of life in the salubrious air and sunshine of the South African tablelands.
On the whole, there can be no question as to the general good effect upon health of the South African climate. Europeans and Americans living therein pursue their athletic sports with all the zest experienced in their native climates, and the descendants of the original Dutch and Huguenot settlers—now in the sixth and seventh generations—have lost nothing of the stature nor of the physical energy that characterized their forefathers.
South Africa used to be the habitat of an unusually rich fauna. The lion, leopard, elephant, giraffe, rhinoceros, hippopotamus, antelope in thirty-one species, zebra, quagga, buffalo and various other wild creatures—some of them savage, and all of them beautiful after their kind—abounded. But of late years all this has been changed. Since firearms have been greatly improved and cheapened and the country has been opened to the Nimrods of the world and the [269]swarming natives have procured guns and learned to use them, the wild animals have been thinned out. There are now but two regions in South Africa where big game can be killed in any great numbers—the Portuguese territory from the Zambesi to Delagoa Bay, and the adjoining eastern frontier of the Transvaal.
Snakes of various kinds and sizes, from the poisonous black mamba to the python that grows to over twenty feet in length, used to infest many parts of the country, but they have almost disappeared from the temperate regions inhabited by the whites.
The farmers’ worst enemies are not now the great beasts and reptiles of former years, but the baboons, which gather in the more rocky districts and kill the lambs, and two species of insects—the white ants and the locusts—which sometimes ravage the eastern coast.
Beyond that of most countries in the world of equal extent the flora of South Africa is rich in both genera and species. The neighborhood of Cape Town and the warm, sub-tropical regions of eastern Cape Colony and Natal are specially affluent in beautiful flowers. In the Karoo district, and northeastward over the plateau into Bechuanaland and the Transvaal, vegetation presents [270]but little variety of aspect, owing in part to the general sameness of geological formations and in part to the prevailing dryness of the surface.
In general, South Africa is comparatively bare of forests—a fact for which denudation by man cannot account, for it is yet a country new to civilization. Some primitive forests are to be found on the south coast of Cape Colony and in Natal. These have been put under the care of a Forest Department of the government. In the great Knysna forest wild elephants still roam at large. The trees, however, even in the preserved forests, are small, few of them being more than fifty or sixty feet in height. The yellowwood grows the tallest, but the less lofty sneezewood is the most useful to man. Up the hillsides north of Graham’s Town and King William’s Town are immense tracts of scrub from four to eight feet high, with occasional patches of prickly pear—a formidable invader from America, through which both men and cattle make their passage at the cost of much effort and many irritating wounds from the sharp spines. A large part of this region, being suitable for little else, has been utilized for ostrich farming.
In the Karoo district and northward through [271]Cape Colony, western Bechuanaland and the German possessions in Namaqualand and Damaraland —a desert region—there are few trees except small and thorny mimosas. Farther east, where there is a greater rainfall, the trees are more numerous and less thorny. The plain around Kimberley, once well wooded, has been stripped of its trees to furnish props for the diamond mines and fuel.
The lack of forests is one of the principal drawbacks to the development of South Africa. Timber is everywhere costly; the rainfall is less than it would be if the country were well wooded; and when rains do come the moisture is more rapidly dissipated by absorption, evaporation and sudden freshets because of the absence of shade. Of late energetic measures have been taken to supply nature’s lack by artificial forestry On the great veldt plateau in the vicinity of Kimberley and of Pretoria and in other localities the people have planted the Australian gum tree, the eucalyptus and several varieties of European trees, including the oak, which, besides being useful, is very beautiful. If the practice be continued the country will reap an incalculable benefit, not only in appearance, but also in climatic conditions. [272]
The largest political division of South Africa is Cape Colony. The area is about 292,000 square miles and the population, white and native, is 2,011,305. The whites number about 400,000. But little of it is suitable for agriculture, and considerable portions of it are too arid for stock raising. Including the natives the population is only about seven to the square mile. On the lowlands skirting the sea on the south and west are some fruitful regions that give a profitable yield of
grapes and corn. On the tableland of the interior there is a rainfall of only from five to fifteen inches in the year. As a consequence the surface is dry and unfriendly to vegetable life. In an area of three hundred miles by one hundred and fifty there is not a stream having a current throughout the year, nor is there any moisture at all in the dry season except some shallow pools which are soon dried up by evaporation. Nevertheless, in this desert, bare of trees and of herbage, there is abundance of prickly shrubs, which are sufficiently succulent when they sprout under the summer rains to afford good browsing for goats and sheep. In the northwestern part of the interior and northward to Kimberley and Mafeking, the country is better watered than the more westerly regions, and [273]grazing animals find a generous growth of grass as well as nutritious shrubs. In the southeastern part the rainfall is still heavier. The foothills of the Quathlamba Range toward the sea are covered in places with forests, the grass is more abundant and much of the land can be tilled to profit without artificial irrigation. In 1899 there were about 3,000 miles of railway and nearly 7,000 miles of telegraph open in the colony. The number of vessels entering the ports of Cape Colony in 1897 was 1,093, with a total tonnage of 2,694,370 tons; in addition to this there were 1,278 vessels engaged in the coastwise trade, with a tonnage of 3,725,831 tons. The foreign commerce of Cape Colony is large, including, as it does, the bulk of the import and export trade of all South Africa. The total importation of merchandise for 1897 was $80,127,495, and the exports, including a large proportion of the gold and diamond products of Kimberley and the Transvaal, amounted, in 1898, to $123,213,458.
Natal, beyond any other part of South Africa, is favored by natural advantages. It lies on the seaward slope of the Quathlamba Mountains, and its scenery is charmingly diversified by some of the lesser peaks and the foothills of that range. It is well watered by perennial streams [274]fed by the snows and springs of the mountains. While the higher altitudes to the west are bare, there is abundance of grass lower down and toward the coast there is plenty of wood. The climate in general is much warmer than that of Cape Colony; in the low strip bordering the sea it is almost tropical. This high temperature is not caused so much by latitude as by the current in the Mozambique Channel, which brings from the tropical regions of the Indian Ocean a vast stream of warm water, which acts on the climate of Natal as does the Gulf Stream on that of Georgia and the Carolinas. Nearly the whole of Natal may be counted temperate; the soil is rich, the scenery is beautiful, and, with the exception of certain malarious districts at the north, the climate is healthful. Foreigners from Europe and America may
reasonably hope to enjoy long life and prosperity in it. The principal crop for export is sugar, but cereals of all kinds, coffee, indigo, arrowroot, ginger, tobacco, rice, pepper, cotton and tea are grown to profit. The coal fields of the colony are large, the output in 1897 being 244,000 tons. There are 487 miles of railway, built and operated by the government. The imports in 1897 amounted to nearly $30,000,000. Pop. 828,500; whites, 61,000. [275]
The Orange Free State, in its entire area of 48,000 square miles, is on the great interior plateau at an altitude of from 4,000 to 5,000 feet above the sea level. The surface is mostly level, but there are occasional hills—some of them rising to a height of 6,000 feet. The land is, for the most part, bare of trees, but affords good grazing for two-thirds of the year. The air is remarkably pure and bracing. There are no blizzards to encounter. There are, however, occasional violent thunderstorms, which precipitate enormous hailstones—large enough to kill the smaller animals, and even men. Notwithstanding the generally parched appearance of the country, the larger streams do not dry up in winter. The southeastern part of the Free State, particularly the valley of the Caledon River, is one of the best corn-growing regions in Africa. In the main, however, with the exception of the river valleys, the land is more suitable for pasture than for tillage. The grazing farms are large and require the services of but few men; as a consequence the population increases slowly. The Free State, corresponding in size to the State of New York, has only about 80,000 white inhabitants and 130,000 natives. The chief industry is agriculture and stock-raising. A railway, [276]constructed by the Cape Colony government, connects Bloemfontein, the capital of the Orange Free State, with the ports of Cape Colony and Natal, and with Pretoria, the capital of the South African Republic.
The South African Republic, commonly called the Transvaal, is 119,139 square miles in area. The white population, numbering 345,397, is largely concentrated in the Witwatersrand mining district. The native inhabitants number 748,759. All the Transvaal territory belongs to the interior plateau, with the exception of a strip of lower land on the eastern and northern borders. This lower section is malarious. It is thought, however, that drainage and cultivation will correct this, as they have done in other fever districts. Like the Free State, the Transvaal is principally a grazing country. The few trees that exist in the more sheltered parts are of little value, except those in the lower valleys. The winters are severely cold, and the burning sun of summer soon dries up the moisture and bakes the soil, causing the grass to be stunted and yellow during most of the year. Until about sixteen years ago there was little in the surface appearance and known resources of the Transvaal to attract settlers, and nothing to make it a desirable possession to any people other than its Africander inhabitants. In 1884 gold was discovered, the first find to excite the world being some rich auriferous veins on the Sterkfontein farm. Within a short time it became known that probably the richest deposit of gold in the world lay in the Witwatersrand district of the Transvaal. Later, in 1897, diamonds were discovered in the Transvaal, the first stone having been picked up at Reitfontein, near the Vaal River, in August of that year. Since then the precious crystals have been found in the Pretoria district, at Roodeplaats on the Pienaars River, at Kameelfontein, and at Buffelsduff. The output of gold in 1898 was $68,154,000, and that of diamonds $212,812.01. The total output of gold since it was first discovered amounts to over $300,000,000, with $3,500,000,000 “in sight,” as valued by experts. The commerce of the South African Republic, while necessarily great because of the large number of people employed by the mining industries, cannot be stated as accurately as that of states whose imports are all received through a given port or ports. Foreign goods reach it through several ports in Cape Colony, Natal, and Portuguese East Africa, and in smaller quantities through other ports on the coast. The total imports for 1897 are estimated at $107,575,000.
Griqualand West, a British possession bordering on Cape Colony on the south and on the Free State on the east, owes its chief importance to the Kimberley diamond mines, near the western boundary of the Free State and 600 miles from Cape Town. These mines were opened in 1868 and 1869. It is estimated that since that time $350,000,000 worth of diamonds in the rough (worth double that sum after cutting) have been taken out. This enormous production would have been greatly exceeded had not the owners of the various mines in the group formed an agreement by which the annual output was limited to a small excess over the annual demand in the world’s diamond markets. So plentiful is the supply, and so comparatively inexpensive the cost of mining, that other diamond-producing works have almost entirely withdrawn from the industry since the South African mines were opened. It has been estimated that ninety-eight per cent of the diamonds of commerce are now supplied by these mines.
The British protectorate of Bechuanaland, lying to the north of Cape Colony and Griqualand and to the west of the Transvaal, has an area of about 213,000 square miles, with a population of 200,000, mostly natives. A railway and telegraph line connect it with Cape Colony on the south and Rhodesia on the north.
Rhodesia includes the territory formerly known as British South Africa and a large part of that known as British East Africa. The area is about 750,000 square miles—equal to about one-fourth of the area of the United States of America, excluding Alaska. No exact statement of population can be made; estimates range from 1,000,000 to 2,000,000, of which only about 6,000 are whites. The entire territory is under the administration of the British South African Company, organized and incorporated in 1889, subject to the British High Commissioner at Cape Town. Rhodesia lies chiefly within the tablelands of South Africa and has large but yet undeveloped resources, including grazing and agricultural lands and important mining districts. Owing to the newness of the country to civilization no definite statement can be made relative to its commerce. In all probability Rhodesia will open a field wherein enterprise along the lines favored by its natural resources and conditions will be richly rewarded.
THE END.
Standard and Popular Books FOR SALE BY BOOKSELLERS OR WILL BE SENT POSTPAID ON RECEIPT OF PRICE.
RAND, McNALLY & CO., PUBLISHERS, CHICAGO AND NEW YORK.
A B C OF MINING AND PROSPECTORS’ HANDBOOK. By Charles A. Bramble, D. L. S. Baedeker style. $1.00.
ACCIDENTS, AND HOW TO SAVE LIFE WHEN THEY OCCUR. 143 pages; profusely illustrated; leatheroid, 25 cents.
ALASKA; ITS HISTORY, CLIMATE, AND NATURAL RESOURCES. By Hon. A. P. Swineford, Ex-Governor of Alaska. Illustrated. 12mo, cloth. $1.00.
ALL ABOUT THE BABY. By Robert N. Tooker, M. D., author of “Diseases of Children,” etc. Illustrated. 8vo, cloth. $1.50.
ALONG THE BOSPHORUS. By Susan E. Wallace (Mrs. Lew Wallace). Profusely illustrated; 12mo; cloth. $1.50.
AMBER GLINTS. By “Amber.” Uniform with “Rosemary and Rue.” Cloth. $1.00.
AMERICAN BOOK OF THE DOG. Edited by G. O. Shields (“Coquina”). Illustrated; 8vo; 700 pages. Plain edges, cloth, $3.50; half morocco, gilt top, $5.00; full morocco, gilt edges, $6.50.
AMERICAN GAME FISHES. Edited by G. O. Shields. Large 8vo; 155 illustrations and two colored plates. Cloth, $1.50; half morocco, $4.00; full morocco, gilt edged, $5.50.
AMERICAN NOBLEMAN, AN. By William Armstrong. 12mo, cloth, $1.00.
AMERICAN ROADSTERS AND TROTTING HORSES. Illustrated with photo views of representative stallions. By H. T. Helm. 8vo; 600 pages; cloth, $5.00.
AMERICAN STREET RAILWAYS. By Augustine W. Wright. Bound in flexible, seal-grained leather, with red edges and round corners; gold side-stamps; 200 pages; $5.00.
ARCTIC ALASKA AND SIBERIA; OR, EIGHT MONTHS WITH THE ARCTIC WHALEMEN. By Herbert L. Aldrich. Illustrated; 12mo. Cloth, gold and black, $1.00.
AN ARKANSAS PLANTER. By Opie Read. 12mo. Cloth, $1.00; paper, 25 cents.
ARMAGEDDON. By Stanley Waterloo, author of “Story of Ab,” “The Launching of a Man,” etc. 12mo, cloth. $1.00.
ART AND HANDICRAFT—ILLUSTRATED DESIGNS FOR THE NEEDLE, PEN, AND BRUSH. Edited by Maud Howe Elliott. Cloth; 8vo. $1.50.
ART OF WING SHOOTING. By W. B. Leffingwell. Paper cover, 50 cents; cloth, $1.00.
AT THE BLUE BELL INN. By J. S. Fletcher, author of “When Charles I was King,” etc. 16mo, cloth. 75 cents.
BALDOON. By Le Roy Hooker, author of “Enoch the Philistine.” 12mo, cloth. $1.25.
BANKING SYSTEM OF THE UNITED STATES. By Charles G. Dawes. Cloth. $1.00.
BATTLE OF THE BIG HOLE. By G. O. Shields. Illustrated; 12mo; 150 pages. Cloth. $1.00.
BIG GAME OF NORTH AMERICA. Edited by G. O. Shields (“Coquina”). Illustrated; 8vo; 600 pages. Cloth, $3.50; half morocco, gilt top, $5.00; full morocco, all gilt edges, $6.50.
BILLIARDS, OLD AND NEW. By John A. Thatcher. Vest Pocket Manual. Cloth, 75 cents; leather, $1.00.
BONDWOMAN, THE. By Marah Ellis Ryan, author of “Squaw Elouise,” “A Pagan of the Alleghanies,” etc. 12mo. Cloth. $1.25.
BONNIE MACKIRBY. By Laura Dayton Fessenden. 16mo. Cloth. 75 cents.
CAMPING AND CAMP OUTFITS. By G. O. Shields (“Coquina”). Illustrated; 12mo; 200 pages. Cloth. $1.25.
CHECKED THROUGH. By Richard Henry Savage. Paper, 25 cents; cloth, $1.00.
CHRISTOPHER COLUMBUS AND HIS MONUMENT COLUMBIA. Compiled by J. M. Dickey. Illustrated. 396 pages. Vellum, $2.00; cloth cover, $1.00.
COLONIAL DAME, A. By Laura Dayton Fessenden. Cloth. $1.00.
CONSTITUTIONAL HISTORY OF FRANCE. By Henry C. Lockwood. Illustrated; 8vo; 424 pages; cloth, $2.50; half morocco, gilt top, $3.50.
CRUISE UNDER THE CRESCENT, A. By Charles Warren Stoddard. 100 illustrations by Denslow. 12mo. Cloth. $1.50.
CRUISINGS IN THE CASCADES AND OTHER HUNTING ADVENTURES. By G. O. Shields (“Coquina”). Illustrated. 12mo; 300 pages. Cloth, $2.00; half morocco, $3.00.
CRULL’S TIME AND SPEED CHART. By E. S. Crull. Limp cloth cover; edges of pages indexed by speed in miles per hour; 50 cents.
CURSED BY A FORTUNE. By George Manville Fenn. 12mo. Cloth. $1.00.
DAUGHTER OF CUBA, A. By Helen M. Bowen. 12mo. Cloth. $1.00.
DAUGHTER OF EARTH, A. By E. M. Davy. 12mo. Cloth. $1.00.
DEVIL’S DICE. By Wm. Le Queux, author of “Zoraida,” etc. 12mo. Paper, 25 cents; cloth, $1.00.
DRAWING AND DESIGNING. By Charles G. Leland, A. M. 12mo; 80 pages; flexible cloth; 65 cents.
DREAM CHILD, A. By Florence Huntley. Cloth. 75 cents.
ENOCH THE PHILISTINE. By Le Roy Hooker. 12mo. Cloth. $1.25.
EVOLUTION OF DODD. By Wm. Hawley Smith. In neat cloth binding, gilt top. 75 cents.
EVOLUTION OF DODD’S SISTER. By Charlotte W. Eastman. In neat cloth binding. 75 cents.
EYE OF THE SUN, THE. By Edw. S. Ellis. 12mo; cloth. $1.00.
FASCINATION OF THE KING. By Guy Boothby, author of “Dr. Nikola.” 12mo. Cloth. $1.00.
FIFTH OF NOVEMBER, THE. By Charles S. Bentley and F. Kimball Scribner. 12mo. Cloth. $1.00.
FONTENAY, THE SWORDSMAN. A military novel. By Fortune du Boisgobey. 12mo. Cloth. $1.00.
FOR HER LIFE. A story of St. Petersburg. By Richard Henry Savage. Paper, 50 cents; cloth, $1.00.
GEMMA. By Alexander McArthur. 16mo. Cloth. $1.00.
GENTLEMAN JUROR, A. By Charles L. Marsh, author of “Opening the Oyster,” etc. 12mo. Cloth. $1.25.
GLIMPSES OF ALASKA AND THE KLONDIKE. 100 Photographic Views of the Interior, from originals, by Veazie Wilson, compiled by Esther Lyons. 25 cents.
GOLDEN NORTH, THE. By C. R. Tuttle. With maps and engravings. Paper, 50 cents; cloth, $1.00.
HERNANI, THE JEW. A story of Poland. By A. N. Homer. 12mo; cloth, gilt top. $1.00.
HONDURAS. By Cecil Charles. Cloth, with map and portraits, $1.50.
IN SATAN’S REALM. By Edgar C. Blum. 12mo. Cloth. $1.25.
IN THE DAYS OF DRAKE. By J. S. Fletcher, author of “When Charles I was King.” 16mo; cloth. 75 cents.
INCENDIARY, THE. By W. A. Leahy. 12mo; cloth. $1.00.
IN THE SHADOW OF THE PYRAMIDS. By Richard Henry Savage. Paper, 50 cents; cloth, $1.00.
IN THE SWIM. A story of Gayest New York. By Richard Henry Savage. Paper, 50 cents; cloth, $1.00.
JUDGE, THE. By Ella W. Peattie. Large 16mo; cloth. 75 cents.
KING OF THE MOUNTAINS. By Edmond About. 12mo; cloth. $1.00.
KIPLING BOY STORIES. By Rudyard Kipling. Illustrated, 8vo, cloth. $1.00.
KITCHEN, THE; OR, EVERY-DAY COOKERY. 104 pages; illustrated; leatherette. 25 cents.
LABOR, CAPITAL, AND A PROTECTIVE TARIFF. By John Vernon. 72 pages. Paper cover, pocket size, 10 cents.
LADY CHARLOTTE. By Adeline Sergeant. 12mo; cloth. $1.00.
LAST DAYS OF POMPEII. By Bulwer Lytton. 58 full-page monogravure illustrations from original photographs. Two vols., boxed. Library, $3.00; De luxe, $6.00.
LAUNCHING OF A MAN, THE. By Stanley Waterloo, author of “A Man and A Woman,” “Story of Ab.” 12mo; cloth. $1.25.
LOCUST, OR GRASSHOPPER. By Chas. V. Riley, M. A., Ph. D. Illustrated; 236 pages; cloth cover. $1.00.
LOST COUNTESS FALKA. By Richard Henry Savage. Paper, 50 cents; cloth, $1.00.
LORNA DOONE. By R. D. Blackmore. 40 illustrations. Two vols., boxed. Cloth, gilt top, $3.00; half-calf, $5.00.
MAID OF THE FRONTIER, A. By H. S. Canfield. Large 16mo; cloth. 75 cents.
MANUAL OF INSTRUCTION FOR THE ECONOMICAL MANAGEMENT OF LOCOMOTIVES. By George H. Baker. Limp cloth; gold side-stamp; pocket form; 125 pages. $1.00.
MARBEAU COUSINS. By Harry Stillwell Edwards, author of “Sons and Fathers.” 12mo; cloth. $1.00.
MARGARET WYNNE. By Adeline Sergeant, author of “A Valuable Life,” etc. 12mo; cloth. $1.00.
MARIPOSILLA. By Mrs. Charles Stewart Daggett. 12mo; cloth. $1.25.
MARRIED MAN, A. By Frances Aymar Matthews, author of “A Man’s Will and A Woman’s Way,” “Joan D’Arc,” etc. 12mo; cloth. $1.25.
MARSA. By Jules Clareti. Large 16mo; cloth. 75 cents.
MEMOIRS OF AN ARTIST. By Charles Gounod. Large 16mo; cloth. $1.25.
MILL OF SILENCE, THE. By B. E. J. Capes. Artistic cloth binding; gilt top. $1.00.
MISS NUME OF JAPAN. A Japanese-American romance. By Onoto Watanna, author of “Natsu-San,” etc. 12mo; cloth. $1.25.
MODERN CORSAIR, A. By Richard Henry Savage. Paper, 50 cents; cloth, $1.00.
MY BROTHER. By Vincent Brown. Neat cloth binding; gilt top. 75 cents.
MY INVISIBLE PARTNER. By Thomas S. Denison. Cloth. $1.00.
ORATIONS, ADDRESSES, AND CLUB ESSAYS. By Hon. George A. Sanders, M. A. Cloth binding. Price $1.25.
PACIFIC COAST GUIDE-BOOK. 8vo; 282 pages; cloth, $1.00; paper, 50 cents.
PHOEBE TILSON. By Mrs. Frank Pope Humphrey. A New England Tale. 12mo; cloth. $1.00.
POLITICS AND PATRIOTISM. By F. W. Schultz. 12mo; cloth. $1.00.
POLYGLOT PRONOUNCING HANDBOOK. By David G. Hubbard. Flexible cloth; 77 pages. 50 cents.
PREMIER AND THE PAINTER, THE. By I. Zangwill. Cloth. $1.00.
PROCEEDINGS OF THE WORLD’S CONGRESS OF BANKERS AND FINANCIERS. 615 pages; bound in half morocco, with gilt top, price $5.00; bound in cloth, price, $3.00.
PURE SAXON ENGLISH; OR, AMERICANS TO THE FRONT. By Elias Molee. 12mo; 167 pages; cloth. $1.00.
QUESTIONABLE MARRIAGE, A. By A. Shackleford Sullivan. 12mo; cloth. $1.00.
RAND, M’NALLY & CO.’S POCKET CYCLOPEDIA. 288 pages; leatherette. 25 cents.
REED’S RULES. By the Hon. Thomas B. Reed. With portrait of the author. The latest acknowledged standard manual for everyone connected in any way with public life. Price, in cloth cover, 75 cents; full seal grain flexible leather, $1.25.
REMINISCENCES OF W. W. STORY. By Miss M. E. Phillips. 8vo; cloth. $1.75.
REPUBLIC OF COSTA RICA. By Joaquin Bernardo Calvo. With maps and numerous illustrations; 8vo; 292 pages. Price $2.00.
ROMANCE OF A CHILD. By Pierre Loti. In neat cloth binding. 75 cents.
ROMANCE OF GRAYLOCK MANOR. By Louise F. P. Hamilton. 16mo; cloth. $1.25.
ROMOLA. By George Eliot. 56 monogravure illustrations; two volumes, boxed; 8vo; cloth, gilt top. $3.00.
ROSEMARY AND RUE. By “Amber.” With an introduction by Opie Read. 12mo; cloth. $1.00.
RULES OF ETIQUETTE AND HOME CULTURE; OR, WHAT TO DO AND HOW TO DO IT. By Prof. Walter R. Houghton. Illustrated; 430 pages; cloth. 50 cents.
SECRET OF SUCCESS; OR, HOW TO GET ON IN THE WORLD. By W. H. Davenport Adams. 338 pages; cloth cover. 50 cents.
SHIFTING SANDS. By Frederick R. Burton. 12mo, cloth. $1.00.
SHOOTING ON UPLAND, MARSH, AND STREAM. Edited by William Bruce Leffingwell, author of “Wild Fowl Shooting.” Profusely illustrated; 8vo; 473 pages. Cloth, $3.50; half morocco, gilt edges, $4.50; full morocco, gilt edges, $6.50.
SIMPLICITY. By A. T. G. Price. Neat cloth binding. 75 cents.
SINNER, THE. By Rita (Mrs. E. J. G. Humphreys). 12mo; cloth, gilt top. $1.00.
SONS AND FATHERS. By Harry Stillwell Edwards. Artistic cloth binding, gilt top. $1.00.
STRANGE STORY OF MY LIFE, THE. By John Strange Winter (Mrs. Stannard). 12mo, cloth. $1.50.
STRENGTH. A treatise on the development and use of muscle. By C. A. Sampson. A book specially suited for home use. Cloth, $1.00; paper, 50 cents.
SWEDEN AND THE SWEDES. By William Widgery Thomas, Jr. English edition: One volume, cloth, $3.75; two volumes, $5.00; one volume, half morocco, $5.00; two volumes, $7.00; one volume, full morocco, $7.50; two volumes, $10.00. Swedish edition: One volume, cloth, $3.75; one volume, half morocco, $5.00; one volume, full morocco, $7.50. Large 8vo; 750 pages; 328 illustrations.
THOSE GOOD NORMANS. By Gyp. Artistic cloth binding, designed by J. P. Archibald. $1.00.
TOLD IN THE ROCKIES. By A. M. Barbour. 12mo, cloth. $1.00.
UNDER THE BAN. By Teresa Hammond Strickland. 12mo; cloth. $1.00.
UNDER THREE FLAGS. By B. L. Taylor and A. T. Thoits. Artistic cloth binding, gilt top. $1.00.
UNKNOWN LIFE OF JESUS CHRIST. By Nicolas Notovitch. 12mo; cloth. $1.00.
VALUABLE LIFE, A. By Adeline Sergeant. 12mo, cloth. $1.00.
VALUE. An essay, with a short account of American currency. By John Borden. Cloth. $1.00.
VANISHED EMPEROR, THE. By Percy Andreae. 12mo, cloth. $1.25.
WATERS OF CANEY FORK. By Opie Read. 12mo; cloth. $1.00.
WHOM TO TRUST. By P. R. Earling. 304 pages. Cloth. $2.00.
WHOSE SOUL HAVE I NOW? By Mary Clay Knapp. In neat cloth binding. 75 cents.
WHOSO FINDETH A WIFE. By William Le Queux. 12mo, cloth. $1.00.
WILD FOWL SHOOTING. By William Bruce Leffingwell. Handsomely illustrated; 8vo; 373 pages. Cloth cover, $2.50; half morocco, $3.50; full morocco, gilt edges, $5.50.
WOMAN AND THE SHADOW. By Arabella Kenealy. 12mo, cloth. $1.00.
WORLD’S RELIGIONS IN A NUTSHELL. By Rev. L. P. Mercer. Price, bound in cloth, $1.00; paper, 25 cents.
YANKEE FROM THE WEST, A. A new novel by Opie Read. 12mo, cloth. $1.00.
YOUNG GREER OF KENTUCKY. By Eleanor Talbot Kinkead. 12mo, cloth. $1.25.
Colophon
Availability
This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or reuse it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org.
This eBook is produced by the Online Distributed Proofreading Team at www.pgdp.net.
Metadata
Title: The Africanders: A century of Dutch-English feud in South Africa
Author: Le Roy Hooker (1840–1906) Info https://viaf.org/viaf/104480090/
File generation date: 2023-09-23 11:48:34 UTC
Language: English
Original publication date: 1900
Keywords: Afrikaners, Transvaal (South Africa) -- History, South Africa -- Politics and government