Applied Statistics

Theory and Problem Solutions with R

Dieter Rasch

Rostock

Germany

Rob Verdooren

Wageningen

The Netherlands

Jürgen Pilz

Klagenfurt

Austria

This edition first published 2020

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Dieter Rasch, Rob Verdooren and Jürgen Pilz to be identified as the authors of this work has been asserted in accordance with law.

Registered Offices

John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office

9600 Garsington Road, Oxford, OX4 2DQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty

While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data

Names: Rasch, Dieter, author. | Verdooren, L. R., author. | Pilz, Jürgen, 1951- author.

Title: Applied statistics : theory and problem solutions with R / Dieter Rasch (Rostock, GM), Rob Verdooren, Jürgen Pilz.

Description: Hoboken, NJ : Wiley, 2020. | Includes bibliographical references and index. |

Identifiers: LCCN 2019016568 (print) | LCCN 2019017761 (ebook) | ISBN 9781119551553 (Adobe PDF) | ISBN 9781119551546 (ePub) | ISBN 9781119551522 (hardcover)

Subjects: LCSH: Mathematical statistics – Problems, exercises, etc. | R (Computer program language)

Classification: LCC QA276 (ebook) | LCC QA276 .R3668 2019 (print) | DDC 519.5–dc23

LC record available at https://lccn.loc.gov/2019016568

Cover design: Wiley

Cover Images: © Dieter Rasch, © whiteMocca/Shutterstock

Set in 10/12pt WarnockPro by SPi Global, Chennai, India

Preface

1 The R-Package, Sampling Procedures, and Random Variables

1.1 Introduction

1.2 The Statistical Software Package R

1.3 Sampling Procedures and Random Variables

References

2 Point Estimation

2.1 Introduction

2.2 Estimating Location Parameters

2.2.1 Maximum Likelihood Estimation of Location Parameters

2.2.2 Estimating Expectations from Censored Samples and Truncated Distributions

2.2.3 Estimating Location Parameters of Finite Populations

2.3 Estimating Scale Parameters

2.4 Estimating Higher Moments

2.5 Contingency Tables

2.5.1 Models of Two-Dimensional Contingency Tables

2.5.1.1 Model I

2.5.1.2 Model II

2.5.1.3 Model III

2.5.2 Association Coefficients for 2 × 2 Tables

References

3 Testing Hypotheses – One- and Two-Sample Problems

3.1 Introduction

3.2 The One-Sample Problem

3.2.1 Tests on an Expectation

3.2.1.1 Testing the Hypothesis on the Expectation of a Normal Distribution with Known Variance

3.2.1.2 Testing the Hypothesis on the Expectation of a Normal Distribution with Unknown Variance

3.2.2 Test on the Median

3.2.3 Test on the Variance of a Normal Distribution

3.2.4 Test on a Probability

3.2.5 Paired Comparisons

3.2.6 Sequential Tests

3.3 The Two-Sample Problem

3.3.1 Tests on Two Expectations

3.3.1.1 The Two-Sample t-Test

3.3.1.2 The Welch Test

3.3.1.3 The Wilcoxon Rank Sum Test

3.3.1.4 Definition of Robustness and Results of Comparing Tests by Simulation

3.3.1.5 Sequential Two-Sample Tests

3.3.2 Test on Two Medians

3.3.2.1 Rationale

3.3.3 Test on Two Probabilities

3.3.4 Tests on Two Variances

References

4 Confidence Estimations – One- and Two-Sample Problems

4.1 Introduction

4.2 The One-Sample Case

4.2.1 A Confidence Interval for the Expectation of a Normal Distribution

4.2.2 A Confidence Interval for the Variance of a Normal Distribution

4.2.3 A Confidence Interval for a Probability

4.3 The Two-Sample Case

4.3.1 A Confidence Interval for the Difference of Two Expectations – Equal Variances

4.3.2 A Confidence Interval for the Difference of Two Expectations – Unequal Variances

4.3.3 A Confidence Interval for the Difference of Two Probabilities

References

5 Analysis of Variance (ANOVA) – Fixed Effects Models

5.1 Introduction

5.1.1 Remarks about Program Packages

5.2 Planning the Size of an Experiment

5.3 One-Way Analysis of Variance

5.3.1 Analysing Observations

5.3.2 Determination of the Size of an Experiment

5.4 Two-Way Analysis of Variance

5.4.1 Cross-Classification (A × B)

5.4.1.1 Parameter Estimation

5.4.1.2 Testing Hypotheses

5.4.2 Nested Classification (A ≻ B)

5.5 Three-Way Classification

5.5.1 Complete Cross-Classification (A × B × C)

5.5.2 Nested Classification (C ≺ B ≺ A)

5.5.3 Mixed Classifications

5.5.3.1 Cross-Classification between Two Factors where One of Them Is Sub-Ordinated to a Third Factor ((B ≺ A) × C)

5.5.3.2 Cross-Classification of Two Factors, in which a Third Factor is Nested (C ≺ (A × B))

References

6 Analysis of Variance – Models with Random Effects

6.1 Introduction

6.2 One-Way Classification

6.2.1 Estimation of the Variance Components

6.2.1.1 ANOVA Method

6.2.1.2 Maximum Likelihood Method

6.2.1.3 REML – Estimation

6.2.2 Tests of Hypotheses and Confidence Intervals

6.2.3 Expectation and Variances of the ANOVA Estimators

6.3 Two-Way Classification

6.3.1 Two-Way Cross Classification

6.3.2 Two-Way Nested Classification

6.4 Three-Way Classification

6.4.1 Three-Way Cross-Classification with Equal Sub-Class Numbers

6.4.2 Three-Way Nested Classification

6.4.3 Three-Way Mixed Classifications

6.4.3.1 Cross-Classification Between Two Factors Where One of Them is Sub-Ordinated to a Third Factor ((B ≺ A) × C)

6.4.3.2 Cross-Classification of Two Factors in Which a Third Factor is Nested (C ≺ (A × B))

References

7 Analysis of Variance – Mixed Models

7.1 Introduction

7.2 Two-Way Classification

7.2.1 Balanced Two-Way Cross-Classification

7.2.2 Two-Way Nested Classification

7.3 Three-Way Layout

7.3.1 Three-Way Analysis of Variance – Cross-Classification A × B × C

7.3.2 Three-Way Analysis of Variance – Nested Classification A ≻ B ≻ C

7.3.2.1 Three-Way Analysis of Variance – Nested Classification – Model III – Balanced Case

7.3.2.2 Three-Way Analysis of Variance – Nested Classification – Model IV – Balanced Case

7.3.2.3 Three-Way Analysis of Variance – Nested Classification – Model V – Balanced Case

7.3.2.4 Three-Way Analysis of Variance – Nested Classification – Model VI – Balanced Case

7.3.2.5 Three-Way Analysis of Variance – Nested Classification – Model VII – Balanced Case

7.3.2.6 Three-Way Analysis of Variance – Nested Classification – Model VIII – Balanced Case

7.3.3 Three-Way Analysis of Variance – Mixed Classification – (A × B) ≻ C

7.3.3.1 Three-Way Analysis of Variance – Mixed Classification – (A × B) ≻ C Model III

7.3.3.2 Three-Way Analysis of Variance – Mixed Classification – (A × B) ≻ C Model IV

7.3.3.3 Three-Way Analysis of Variance – Mixed Classification – (A × B) ≻ C Model V

7.3.3.4 Three-Way Analysis of Variance – Mixed Classification – (A × B) ≻ C Model VI

7.3.4 Three-Way Analysis of Variance – Mixed Classification – (A ≻ B) × C

7.3.4.1 Three-Way Analysis of Variance – Mixed Classification – (A ≻ B) × C Model III

7.3.4.2 Three-Way Analysis of Variance – Mixed Classification – (A ≻ B) × C Model IV

7.3.4.3 Three-Way Analysis of Variance – Mixed Classification – (A ≻ B) × C Model V

7.3.4.4 Three-Way Analysis of Variance – Mixed Classification – (A ≻ B) × C Model VI

7.3.4.5 Three-Way Analysis of Variance – Mixed Classification – (A ≻ B) × C Model VII

7.3.4.6 Three-Way Analysis of Variance – Mixed Classification – (A ≻ B) × C Model VIII

References

8 Regression Analysis

8.1 Introduction

8.2 Regression with Non-Random Regressors – Model I of Regression

8.2.1 Linear and Quasilinear Regression

8.2.1.1 Parameter Estimation

8.2.1.2 Confidence Intervals and Hypotheses Testing

8.2.2 Intrinsically Non-Linear Regression

8.2.2.1 The Asymptotic Distribution of the Least Squares Estimators

8.2.2.2 The Michaelis–Menten Regression

8.2.2.3 Exponential Regression

8.2.2.4 The Logistic Regression

8.2.2.5 The Bertalanffy Function

8.2.2.6 The Gompertz Function

8.2.3 Optimal Experimental Designs

8.2.3.1 Simple Linear and Quasilinear Regression

8.2.3.2 Intrinsically Non-linear Regression

8.2.3.3 The Michaelis-Menten Regression

8.2.3.4 Exponential Regression

8.2.3.5 The Logistic Regression

8.2.3.6 The Bertalanffy Function

8.2.3.7 The Gompertz Function

8.3 Models with Random Regressors

8.3.1 The Simple Linear Case

8.3.2 The Multiple Linear Case and the Quasilinear Case

8.3.2.1 Hypotheses Testing - General

8.3.2.2 Confidence Estimation

8.3.3 The Allometric Model

8.3.4 Experimental Designs

References

9 Analysis of Covariance (ANCOVA)

9.1 Introduction

9.2 Completely Randomised Design with Covariate

9.2.1 Balanced Completely Randomised Design

9.2.2 Unbalanced Completely Randomised Design

9.3 Randomised Complete Block Design with Covariate

9.4 Concluding Remarks

References

10 Multiple Decision Problems

10.1 Introduction

10.2 Selection Procedures

10.2.1 The Indifference Zone Formulation for Selecting Expectations

10.2.1.1 Indifference Zone Selection, $\sigma^2$ Known

10.2.1.2 Indifference Zone Selection, $\sigma^2$ Unknown

10.3 The Subset Selection Procedure for Expectations

10.4 Optimal Combination of the Indifference Zone and the Subset Selection Procedure

10.5 Selection of the Normal Distribution with the Smallest Variance

10.6 Multiple Comparisons

10.6.1 The Solution of MC Problem 10.1

10.6.1.1 The F-test for MC Problem 10.1

10.6.1.2 Scheffé's Method for MC Problem 10.1

10.6.1.3 Bonferroni's Method for MC Problem 10.1

10.6.1.4 Tukey's Method for MC Problem 10.1 for $n_i = n$

10.6.1.5 Generalised Tukey's Method for MC Problem 10.1 for $n_i \neq n$

10.6.2 The Solution of MC Problem 10.2 – the Multiple t-Test

10.6.3 The Solution of MC Problem 10.3 – Pairwise and Simultaneous Comparisons with a Control

10.6.3.1 Pairwise Comparisons – The Multiple t-Test

10.6.3.2 Simultaneous Comparisons – The Dunnett Method

References

11 Generalised Linear Models

11.1 Introduction

11.2 Exponential Families of Distributions

11.3 Generalised Linear Models – An Overview

11.4 Analysis – Fitting a GLM – The Linear Case

11.5 Binary Logistic Regression

11.5.1 Analysis

11.5.2 Overdispersion

11.6 Poisson Regression

11.6.1 Analysis

11.6.2 Overdispersion

11.7 The Gamma Regression

11.8 GLM for Gamma Regression

11.9 GLM for the Multinomial Distribution

References

12 Spatial Statistics

12.1 Introduction

12.2 Geostatistics

12.2.1 Semi-variogram Function

12.2.2 Semi-variogram Parameter Estimation

12.2.3 Kriging

12.2.4 Trans-Gaussian Kriging

12.3 Special Problems and Outlook

12.3.1 Generalised Linear Models in Geostatistics

12.3.2 Copula Based Geostatistical Prediction

References

Appendix A List of Problems

Appendix B Symbolism

Appendix C Abbreviations

Appendix D Probability and Density Functions

Index

Preface

We wrote this book for people who have to apply statistical methods in their research but whose main interest is not in theorems and proofs. Because of such an approach, our aim is not to provide the detailed theoretical background of statistical procedures. While mathematical statistics as a branch of mathematics includes definitions as well as theorems and their proofs, applied statistics gives hints for the application of the results of mathematical statistics.

Sometimes applied statistics uses simulation results in place of results from theorems. An example is that the normality assumption needed for many theorems in mathematical statistics can be neglected in applications for location parameters such as the expectation; see Rasch and Tiku (1985). Nearly all statistical tests and confidence estimations for expectations have been shown by simulations to be very robust against the violation of the normality assumption needed to prove the corresponding theorems.

We gave the present book a structure analogous to that of Rasch and Schott (2018) so that the reader can easily find the corresponding theoretical background there. Chapter 11 'Generalised Linear Models' and Chapter 12 'Spatial Statistics' of the present book have no prototype in Rasch and Schott (2018). Further, the present book contains no exercises; lecturers can either use the exercises (with solutions in the appendix) in Rasch and Schott (2018) or the exercises in the problems mentioned below.

Instead, our aim was to demonstrate the theory presented in Rasch and Schott (2018) and that underlying the new Chapters 11 and 12 using functions and procedures available in the statistical programming system R, which has become the gold standard when it comes to statistical computing.

Within the text, the reader will often find the sequence problem – solution – example, with problems numbered within the chapters. Readers interested only in special applications may in many cases find the corresponding procedure in the list of problems in Appendix A.

We thank Alison Oliver (Wiley, Oxford) and Mustaq Ahamed (Wiley) for their assistance in publishing this book.

We are very interested in the comments of readers. Please contact: d_rasch@t-online.de, l.r.verdooren@hetnet.nl, juergen.pilz@aau.at.

Rostock, Wageningen, and Klagenfurt, June 2019, the authors.

References

Rasch, D. and Tiku, M.L. (eds.) (1985). Robustness of statistical methods and nonparametric statistics. In: Proceedings of the Conference on Robustness of Statistical Methods and Nonparametric Statistics, held at Schwerin (DDR), May 29 - June 2, 1983. Boston, Lancaster, Tokyo: Reidel Publ. Co. Dordrecht.

Rasch, D. and Schott, D. (2018). Mathematical Statistics. Oxford: Wiley.

1 The R-Package, Sampling Procedures, and Random Variables

1.1 Introduction

In this chapter we give an overview of the software package R and introduce basic knowledge about random variables and sampling procedures.

1.2 The Statistical Software Package R

In practical investigations, professional statistical software is used to design experiments or to analyse data already collected. Here we apply the software package R. Anybody can extend the functionality of R without any restrictions using free software tools; moreover, it is also possible to implement special statistical methods as well as certain procedures of C and FORTRAN. Such tools are offered on the internet in standardised archives. The most popular archive is probably CRAN (Comprehensive R Archive Network), a server network that is supervised by the R Development Core Team. This network also offers the package OPDOE (optimal design of experiments), which was thoroughly described in Rasch et al. (2011). Furthermore, it offers the following packages used in this book: car, lme4, DunnettTests, VCA, lmerTest, mvtnorm, seqtest, faraway, MASS, glm2, geoR, gstat. With only a few exceptions, R contains implementations for all statistical methods concerning analysis, evaluation, and planning. For details we refer to Crawley (2013).
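As a minimal sketch of how one might check which of the packages listed above are already installed, the following lines compare the list with the local library. The variable names pkgs and missing_pkgs are our own, and the installation call is commented out because it needs an internet connection and a reachable CRAN mirror.

```r
# Packages named in the text as being used in this book
pkgs <- c("car", "lme4", "DunnettTests", "VCA", "lmerTest", "mvtnorm",
          "seqtest", "faraway", "MASS", "glm2", "geoR", "gstat")

# Names of all packages found in the local library
installed <- rownames(installed.packages())

# Packages from the list that are not yet installed
missing_pkgs <- setdiff(pkgs, installed)
print(missing_pkgs)

# Uncomment to install the missing packages from CRAN:
# install.packages(missing_pkgs)
```

Once installed, a package is attached in a session with library(), for example library(MASS).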

The software package R is available free of charge from http://cran.r-project.org for the operating systems Linux, MacOS X, and Windows. The installation under Microsoft Windows takes place via 'Windows'. Choosing 'base', the installation platform is reached. Using 'Download R 2.X.X for Windows' (X stands for the required version number) the setup file can be downloaded. After this file is started the setup assistant runs through the installation steps. In this book, all standard settings are adopted. The interested reader will find more information about R at http://www.r-project.org or in Crawley (2013).

After starting R the input window opens, presenting the input prompt '>' in red. Commands can be typed here and are carried out by pressing the Enter key. The output is given directly below the command line. The user can also insert line breaks and indentation to increase clarity; none of this influences how the commands work. A command to read in, for instance, the data y = (1, 3, 8, 11) is as follows:

> y <- c(1, 3, 8, 11)

The assignment operator in R is the two-character sequence '<-'; alternatively, '=' can be used.

The Workspace is a special working environment in R. Certain objects obtained during the current work with R can be stored there. Such objects contain the results of computations and data sets. A Workspace is loaded using the menu

File – Load Workspace...

In this book the R-commands start with >. Readers who would like to use the R-commands need only type or copy the text after > into the R-window.

An advantage of R is that, as with other statistical packages like SAS and IBM-SPSS, we no longer need an appendix with tables in statistical books. Such appendices often contain tables of the density or distribution function of the standard normal distribution. However, these values can easily be calculated using R.

The notation of this and the following chapters is just that of Rasch and Schott (2018).

Problem 1.1 Calculate the value $\varphi(z)$ of the density function of the standard normal distribution for a given value z.

Solution

Use the command > dnorm(z, mean = 0, sd = 1). If mean or sd is not specified, they assume the default values of 0 and 1, respectively. Hence > dnorm(z) can be used in Problem 1.1.

Example

We calculate the value $\varphi(1)$ of the density function of the standard normal distribution using

> dnorm(1)

[1] 0.2419707
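As a small check of what dnorm() returns, the density of the standard normal distribution, $\varphi(z) = e^{-z^2/2}/\sqrt{2\pi}$, can be written out by hand and compared with the built-in function. The helper name phi is our own choice.

```r
# Density of the standard normal distribution written out explicitly
phi <- function(z) exp(-z^2 / 2) / sqrt(2 * pi)

z <- 1
phi(z)                        # value from the explicit formula
dnorm(z)                      # value from the built-in function
all.equal(phi(z), dnorm(z))   # both agree up to numerical precision
```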

Problem 1.2 Calculate the value $\Phi(z)$ of the distribution function of the standard normal distribution for a given value z.

Solution

Use the command > pnorm(z, mean = 0, sd = 1).

Example

We calculate the value $\Phi(1)$ of the distribution function of the standard normal distribution by > pnorm(1, mean = 0, sd = 1) or, using the default values, by > pnorm(1).

> pnorm(1)
[1] 0.8413447

For other continuous distributions we likewise obtain the value of the density function by using d with the R-name of the distribution, and the value of the distribution function by using p with the R-name of the distribution. We demonstrate this in the next problem for the lognormal distribution.
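The connection between the d- and p-functions can be checked numerically: the distribution function is the integral of the density. The following sketch does this for the standard normal distribution of Problems 1.1 and 1.2, using numerical integration with integrate(); the variable name area is our own.

```r
z <- 1

# Distribution function as the integral of the density from -Inf to z
area <- integrate(dnorm, lower = -Inf, upper = z)$value
area
pnorm(z)

# Symmetry of the standard normal distribution: Phi(-z) = 1 - Phi(z)
all.equal(pnorm(-z), 1 - pnorm(z))
```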

Problem 1.3 Calculate the value of the density function of the lognormal distribution whose logarithm has mean equal to meanlog = 0 and standard deviation equal to sdlog = 1 for a given value z.

Solution

Use the command > dlnorm(z, meanlog = 0, sdlog = 1) or use the default values meanlog = 0 and sdlog = 1 with > dlnorm(z).

Example

We calculate the value of the density function of the lognormal distribution with meanlog = 0 and sdlog = 1 for z = 1 using

> dlnorm(1)
[1] 0.3989423

Problem 1.4 Calculate the value of the distribution function of the lognormal distribution whose logarithm has mean equal to meanlog = 0 and standard deviation equal to sdlog = 1 for a given value z.

Solution

Use the command > plnorm(z, meanlog = 0, sdlog = 1) or use the default values meanlog = 0 and sdlog = 1 with > plnorm(z).

Example

We calculate the value of the distribution function for z = 1 of the lognormal distribution with meanlog = 0 and sdlog = 1 using

> plnorm(1)
[1] 0.5
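The lognormal values can be traced back to the normal distribution: if the logarithm of a variable is standard normal, its density at z > 0 is $\varphi(\ln z)/z$ and its distribution function is $\Phi(\ln z)$. The sketch below checks both relations; the value z = 2.5 is an arbitrary choice of ours.

```r
z <- 2.5   # arbitrary positive value chosen for illustration

# Density: f(z) = phi(log z) / z for z > 0
all.equal(dlnorm(z), dnorm(log(z)) / z)

# Distribution function: F(z) = Phi(log z)
all.equal(plnorm(z), pnorm(log(z)))

# For z = 1 we have log(z) = 0, which explains the values in the examples
dlnorm(1)   # equals dnorm(0) = 1 / sqrt(2 * pi)
plnorm(1)   # equals pnorm(0) = 0.5
```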

For most of the other distributions we need the quantiles (or percentiles) $q_P$, defined by $P(\boldsymbol{y} \le q_P) = P$. They are obtained by writing q followed by the R-name of the distribution.

Problem 1.5 Calculate the P%-quantile of the t-distribution with df degrees of freedom and optional non-centrality parameter ncp.

Solution

Use the command > qt(P, df, ncp), with P entered as a probability between 0 and 1 (e.g. 0.95 for the 95%-quantile), and for a central t-distribution use the default by omitting ncp.

Example

Calculate the 95%-quantile of the central t-distribution with 10 degrees of freedom.

> qt(0.95, 10)
[1] 1.812461
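That a quantile function inverts the distribution function can be verified directly: applying pt() to the result of qt() returns the probability we started with. This is a minimal sketch; the non-centrality value ncp = 1 is an arbitrary illustration of ours.

```r
P  <- 0.95
df <- 10

q <- qt(P, df)   # 95%-quantile of the central t-distribution
pt(q, df)        # distribution function at the quantile returns P

# Non-central case: the optional argument ncp is passed to both functions
q_nc <- qt(P, df, ncp = 1)   # ncp = 1 is an arbitrary illustrative value
pt(q_nc, df, ncp = 1)
```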

We demonstrate the procedure for the chi-square and the F-distribution.

Problem 1.6 Calculate the P%-quantile of the $\chi^2$-distribution with df degrees of freedom and optional non-centrality parameter ncp.

Solution

Use the command > qchisq(P, df, ncp) and for the central $\chi^2$-distribution with df degrees of freedom use > qchisq(P, df).

Example

Calculate the 95%-quantile of the central $\chi^2$-distribution with 10 degrees of freedom.

> qchisq(0.95, 10)
[1] 18.30704

Problem 1.7 Calculate the P%-quantile of the F-distribution with df1 and df2 degrees of freedom and optional non-centrality parameter ncp.

Solution

Use the command > qf(P, df1, df2, ncp), and for the central F-distribution with df1 and df2 degrees of freedom use > qf(P, df1, df2).

Example

Calculate the 95%-quantile of the central F-distribution with 10 and 20 degrees of freedom.

> qf(0.95, 10, 20)
[1] 2.347878

For the calculation of further values of probability functions of discrete random variables, or of distribution functions and quantiles, the commands can be found by using the help function in the toolbar of R; from there you may call up the 'manual', or you may use Crawley (2013).
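The same d, p, and q prefixes also work for discrete distributions. As a brief sketch, here are the binomial and Poisson distributions; the parameter values are arbitrary choices of ours and do not come from the book.

```r
# Binomial distribution with 10 trials and success probability 0.3
dbinom(3, size = 10, prob = 0.3)     # probability function P(y = 3)
pbinom(3, size = 10, prob = 0.3)     # distribution function P(y <= 3)
qbinom(0.5, size = 10, prob = 0.3)   # median

# The distribution function is the cumulative sum of the probability function
all.equal(pbinom(3, 10, 0.3), sum(dbinom(0:3, 10, 0.3)))

# Poisson distribution with lambda = 2: P(y = 0) = exp(-2)
dpois(0, lambda = 2)

# Help pages can also be opened from the command line, for example:
# ?dbinom
# help(Distributions)
```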

1.3 Sampling Procedures and Random Variables

Even if, in this book, we mainly discuss how to plan experiments and to analyse observed data, we still need basic knowledge about random variables because, without this, we could not explain unbiased estimators or the expected length of a confidence interval or how to define the risks of a statistical test.

Definition 1.1 A sampling procedure without replacement (wor) or with replacement (wr) is a rule for selecting a proper subset, named a sample, from a well-defined finite basic set of objects (population, universe). It is said to be at random if each element of the basic set has the same probability p to be drawn into the sample. We can also say that in a random sampling procedure each possible sample has the same probability to be drawn.

A (concrete) sample is the result of a sampling procedure. Samples resulting from a random sampling procedure are said to be (concrete) random samples or, in short, samples.

If we consider all possible samples from a given finite universe, then, from this definition, it follows that each possible sample has the same probability to be drawn.

There are several random sampling procedures that can be used in practice. Basic sets of objects are mostly called (statistical) populations or, synonymously, (statistical) universes.

Concerning random sampling procedures, we distinguish (among other cases):

• Simple (or pure) random sampling with replacement (wr), where each of the N elements of the population is selected with probability $1/N$.

• Simple random sampling without replacement (wor), where each unordered sample of n different objects has the same probability to be chosen.

• In cluster sampling, the population is divided into disjoint subclasses (clusters). Random sampling without replacement is done among these clusters. In the selected clusters, all objects are taken into the sample. This kind of selection is often used in area sampling. It is only random in the sense of Definition 1.1 if the clusters contain the same number of objects.

• In multi-stage sampling, sampling is done in several steps. We restrict ourselves to two stages of sampling, where the population is decomposed into disjoint subsets (primary units). Part of the primary units is sampled randomly without replacement (wor), and within them pure random sampling without replacement (wor) is done with the secondary units. Multi-stage sampling is favourable if the population has a hierarchical structure (e.g. country, province, towns in the province). It is at random in the sense of Definition 1.1 if the primary units contain the same number of secondary units.

• Sequential sampling, where the sample size is not fixed at the beginning of the sampling procedure. At first, a small sample with replacement is taken and analysed. Then it is decided whether the obtained information is sufficient, e.g. to reject or to accept a given hypothesis (see Chapter 3), or if more information is needed by selecting a further unit.

When, in cluster or two-stage sampling, the clusters or primary units have different sizes (numbers of elements or areas), more sophisticated methods are used (Rasch et al. 2008, Methods 1/31/2110, 1/31/3100).
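Cluster sampling and two-stage sampling with equally sized units, as described in the list above, can be sketched with sample(). The population below is hypothetical (6 primary units with 4 secondary units each), and the seed and all object names are our own choices.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Hypothetical population: 6 primary units (clusters) with 4 objects each
population <- data.frame(
  primary   = rep(1:6, each = 4),
  secondary = rep(1:4, times = 6)
)

# Cluster sampling: draw 2 clusters wor and take all of their objects
chosen_clusters <- sample(1:6, 2)
cluster_sample  <- population[population$primary %in% chosen_clusters, ]

# Two-stage sampling: draw 3 primary units wor, then 2 secondary units wor
# within each selected primary unit
chosen_primary <- sample(1:6, 3)
two_stage <- do.call(rbind, lapply(chosen_primary, function(p) {
  units <- population[population$primary == p, ]
  units[sample(nrow(units), 2), ]
}))

nrow(cluster_sample)  # 2 clusters x 4 objects = 8
nrow(two_stage)       # 3 primary units x 2 secondary units = 6
```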

Both a random sampling procedure and an arbitrary sampling procedure can result in the same concrete sample. Hence, we cannot prove by inspecting the concrete sample itself whether or not the sample was randomly chosen. We have to check the sampling procedure used instead.

In mathematical statistics, random sampling with replacement is modelled by a vector $\boldsymbol{Y} = (\boldsymbol{y}_1, \boldsymbol{y}_2, \ldots, \boldsymbol{y}_n)^T$ of random variables $\boldsymbol{y}_i$, $i = 1, \ldots, n$, which are independently distributed as a random variable $\boldsymbol{y}$, i.e. they all have the same distribution. The $\boldsymbol{y}_i$, $i = 1, \ldots, n$, are said to be independently and identically distributed (i.i.d.). This leads to the following definition.

Definition 1.2 A random sample of size n is a vector $\boldsymbol{Y} = (\boldsymbol{y}_1, \boldsymbol{y}_2, \ldots, \boldsymbol{y}_n)^T$ with n i.i.d. random variables $\boldsymbol{y}_i$, $i = 1, \ldots, n$, as elements.

Random variables are given in bold print (see Appendix A for motivation).

The vector $Y = (y_1, y_2, \ldots, y_n)^T$ is called a realisation of $\boldsymbol{Y} = (\boldsymbol{y}_1, \boldsymbol{y}_2, \ldots, \boldsymbol{y}_n)^T$ and is used as a model of a vector of observed values or values selected by a random selection procedure.

To explain this approach let us assume that we have a universe of 100 elements (the numbers 1–100). We would like to draw a pure random sample without replacement (wor) of size n = 10 from this universe and model this by $\boldsymbol{Y} = (\boldsymbol{y}_1, \boldsymbol{y}_2, \ldots, \boldsymbol{y}_{10})^T$. When a random sample has been drawn it could be the vector $Y = (y_1, y_2, \ldots, y_{10})^T = (3, 98, 12, 37, 2, 67, 33, 21, 9, 56)^T = (2, 3, 9, 12, 21, 33, 37, 56, 67, 98)^T$. This means that it is only important which elements have been selected and not in which position this has happened. All samples wor occur with probability $1/\binom{100}{10}$. The denominator $\binom{100}{10}$ can be calculated by R with the > choose() command

> choose(100, 10)
[1] 1.731031e+13

and from this the probability is approximately $\frac{1}{1731031 \times 10^{7}}$. We can now write

$P\{(\boldsymbol{y}_1, \boldsymbol{y}_2, \ldots, \boldsymbol{y}_{10})^T = (2, 3, 9, 12, 21, 33, 37, 56, 67, 98)^T\} = \frac{1}{1731031 \times 10^{7}}$
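The count of samples and the resulting probability can be reproduced in R as follows. This is a small sketch with variable names of our own; the factorial version uses logarithms to avoid overflow in 100!.

```r
M <- choose(100, 10)   # number of unordered samples of size 10 from 100 objects
M                      # printed in scientific notation by default
format(M, big.mark = ",", scientific = FALSE)   # full integer value
1 / M                  # probability of one particular sample

# Same count from the definition N! / (n! (N - n)!), computed on the log scale
exp(lfactorial(100) - lfactorial(10) - lfactorial(90))
```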

In a probability statement, something must always be random. To write

$P\{(y_1, y_2, \ldots, y_{10})^T = (2, 3, 9, 12, 21, 33, 37, 56, 67, 98)^T\}$

is nonsense because $(y_1, y_2, \ldots, y_{10})^T$, like the vector on the right-hand side, is a vector of specific numbers, and it is nonsense to ask for the probability that 5 equals 7.

To explain the situation again we consider the problem of throwing a fair die; this is a die where we know that each of the numbers 1, …, 6 occurs with the same probability $1/6$. We ask for the probability that an even number is thrown. Because one half of the six numbers are even, this probability is $1/2$. Assume we throw the die using a dice cup and let the result be hidden; then the probability is still $1/2$. However, if we take the dice cup away, a realisation occurs, let us say a 5. Now it is meaningless to ask for the probability that 5 is even or that an even number is even. Probability statements about realisations of random variables are senseless and not allowed. The reader of this book should only regard a formula as a probability statement if something in it is in bold print; only in such a case is a probability statement possible.
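The die example can be imitated by simulation. In the sketch below the relative frequency of even numbers over many simulated throws approximates the probability 1/2, whereas a single realised throw is simply a number that either is or is not even. The seed and the number of throws are arbitrary choices of ours.

```r
set.seed(123)  # arbitrary seed for reproducibility

# Many throws of a fair die: each number 1, ..., 6 has probability 1/6
throws <- sample(1:6, size = 100000, replace = TRUE)

# Relative frequency of even numbers, close to the probability 1/2
mean(throws %% 2 == 0)

# One realised throw is a fixed number; whether it is even is a fact,
# not a probability
x <- throws[1]
x %% 2 == 0
```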

We learn in Chapter 4 what a confidence interval is. It is defined as an interval with at least one random boundary and we can, for example, calculate with some small $\alpha$ the probability $1 - \alpha$ that the expectation of some random variable is covered by this interval. However, when we have realised boundaries, then the interval is fixed and it either covers or does not cover the expectation. In applied statistics, we work with observed data modelled by realised random variables. Then the calculated interval does not allow a probability statement. We do not know, by using R or otherwise, whether the calculated interval covers the expectation or not. Why did we fix this probability before starting the experiment when we cannot use it in interpreting the result?

The answer is not easy, but we will try to give some reasons. If a researcher has to carry out many similar experiments and in each of them calculates a $(1 - \alpha)$ confidence interval for some parameter, then the researcher can say that in about $(1 - \alpha) \cdot 100\%$ of all cases the interval has covered the parameter, but of course does not know when this happened. What should we do when only one experiment has to be done? Then we should choose $(1 - \alpha)$ so large (say 0.95 or 0.99) that we can take the risk of making an erroneous statement by saying that the interval covers the parameter. This is analogous to the situation of a person who has a severe disease and needs an operation in hospital. The person can choose between two hospitals and knows that in hospital A about 99% of people operated on survived a similar operation and in hospital B only about 80%. Of course (without further information) the person chooses A even without knowing whether she/he will survive. As in normal life, so also in science: we have to take risks and make decisions under uncertainty.
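The long-run interpretation can be illustrated by simulation. The sketch below repeats a hypothetical experiment many times, computes a 95% confidence interval for the expectation with t.test() each time, and records how often the interval covers the true value. The values of mu, sigma, n and the seed are assumptions chosen by us for illustration.

```r
set.seed(2020)  # arbitrary seed for reproducibility

mu <- 50; sigma <- 10; n <- 20; alpha <- 0.05   # hypothetical settings
n_experiments <- 10000

# For each simulated experiment: does the realised interval cover mu?
covered <- replicate(n_experiments, {
  y  <- rnorm(n, mean = mu, sd = sigma)
  ci <- t.test(y, conf.level = 1 - alpha)$conf.int
  ci[1] <= mu && mu <= ci[2]
})

mean(covered)   # proportion of covering intervals, close to 1 - alpha
```

Each single entry of covered is either TRUE or FALSE; only the proportion over many experiments is close to 0.95.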

We now show how R can easily solve simple problems of sampling.

Problem 1.8 Draw a pure random sample without replacement of size n < N from N given objects represented by the numbers 1, …, N.

There are $M = \binom{N}{n}$ possible unordered samples having the same probability $p = 1/M$ to be selected.

Solution

Enter in R a data vector y with N entries and continue in the next line with > sample(y, n, replace = FALSE) or > sample(y, n, replace = F) with n < N to create a sample of n < N different elements from y; when we insert replace = TRUE we get random sampling with replacement. The default is replace = FALSE, hence for sampling without replacement we can use > sample(y, n).
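The solution can be tried out on a small population, where all M unordered samples can also be listed with combn() to confirm the count. This is a sketch with N = 6, n = 3, and a seed chosen by us; sample outputs depend on the seed and the R version, so none are shown.

```r
y <- 1:6                 # hypothetical population with N = 6 objects
n <- 3

set.seed(42)             # arbitrary seed for reproducibility
s <- sample(y, n)        # pure random sample without replacement
s
anyDuplicated(s) == 0    # TRUE: all drawn elements are different

# All unordered samples of size n, one per column
all_samples <- combn(y, n)
ncol(all_samples)        # M = choose(6, 3) = 20
1 / choose(length(y), n) # probability p = 1/M of each unordered sample
```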

Example

We choose N = 9 and n = 5, with population values y = (1, 2, 3, 4, 5, 6, 7, 8, 9).

> y <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
> sample(y, 5)
[1] 7 6 5 1 3

Pure random sampling with replacement occurs if the random sample is obtained by replacing each object immediately after it is drawn, so that each object has the same probability of coming into the sample at every draw. Hence, the population always has the same number of objects before a new object is taken. This is only possible if the observation of objects works without destroying or changing them (examples where this is not the case are tensile breaking tests, medical examinations of killed animals, felling of trees, harvesting of food).

Problem 1.9 Draw with replacement a pure random sample of size n from N given objects represented by the numbers 1, …, N. There are $M_{rep} = \binom{N + n - 1}{n}$ possible unordered samples; each of the $N^n$ possible ordered samples has the same probability $1/N^n$ to be selected.

Solution

Insertin R adataﬁle y with N entriesandcontinueinthenextlinewith >sample (y,n,replace =TRUE) or >sample(y,n,replace=T) tocreateasample ofsize n notnecessarilywithdiﬀerentelementsfrom y.

Examples

Examplewith n < N

8 1The R-Package,SamplingProcedures,andRandomVariables

> y <- c(1,2,3,4,5,6,7,8,9)
> sample(y, 5, replace = T)
[1] 2 4 6 4 2

Example with $n > N$

> y <- c(1,2,3,4,5,6,7,8,9)
> sample(y, 10, replace = T)
[1] 3 9 5 5 9 9 8 7 6 3
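The following sketch contrasts the two sampling modes for the same population. The seed and the variable names are our own assumptions; the point is that a sample with $n > N$ is only possible with replacement.

```r
N <- 9
n <- 5

# Number of unordered samples with replacement, choose(N + n - 1, n)
M_rep <- choose(N + n - 1, n)

set.seed(1)                                  # arbitrary seed (assumption)
s_rep <- sample(1:N, n, replace = TRUE)      # elements may repeat

# Asking for more objects than the population holds fails without replacement
too_large_fails <- tryCatch({ sample(1:N, 10); FALSE },
                            error = function(e) TRUE)

# ... but works with replacement, necessarily with repeated values
s_big <- sample(1:N, 10, replace = TRUE)
```

For $N = 9$ and $n = 5$, M_rep evaluates to 1287.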

A method that can sometimes be realised more easily is systematic sampling with a random start. It is applicable if the objects of the finite sampling set are numbered from 1 to $N$, and the sequence is not related to the character considered. If the quotient $m = N/n$ is a natural number, a value $i$ between 1 and $m$ is chosen at random, and the sample is collected from the objects with numbers $i, m + i, 2m + i, \dots, (n - 1)m + i$. Detailed information about this case, and about the case where the quotient $m$ is not an integer, can be found in Rasch et al. (2008, method 1/31/1210).

Problem 1.10 From a set of $N$ objects, choose a random sample of size $n$ by systematic sampling with a random start.

Solution

We assume that there is no trend in the sequence $1, 2, \dots, N$. Let us assume that $m = \frac{N}{n}$ is an integer, and select by pure random sampling a value $x$ with $1 \le x \le m$ (a sample of size 1) from the $m$ numbers $1, \dots, m$. Then the systematic sample with random start contains the numbers $x, x + m, x + 2m, \dots, x + (n - 1)m$.

Example

We choose $N = 500$ and $n = 20$, so the quotient $\frac{500}{20} = 25$ is an integer. Analogous to Problem 1.1 we draw a random sample of size 1 from $(1, 2, \dots, 25)$ using R.

> y <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25)
> sample(y, 1)
[1] 9

The final systematic sample with random start, of size $n = 20$, starts with the number $x = 9$ and uses $m = 25$: (9, 34, 59, 84, 109, 134, 159, 184, 209, 234, 259, 284, 309, 334, 359, 384, 409, 434, 459, 484).
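The whole procedure can be wrapped in a small helper. This is a sketch under the assumption that $N/n$ is an integer; the function name systematic_sample and its start argument are our own, not part of R.

```r
# Systematic sampling with a random start (sketch; assumes N / n is an integer)
systematic_sample <- function(N, n, start = NULL) {
  m <- N / n
  stopifnot(m == round(m))                     # quotient must be an integer
  if (is.null(start)) start <- sample(m, 1)    # random start x in 1, ..., m
  start + (0:(n - 1)) * m                      # x, x + m, ..., x + (n - 1) m
}

# Reproduce the sample of the example by fixing the start at x = 9
s_fixed <- systematic_sample(500, 20, start = 9)

# A new sample with a randomly chosen start
s_random <- systematic_sample(500, 20)
```

With start = 9 the helper returns the 20 numbers 9, 34, ..., 484 listed above.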

Problem 1.11 By cluster sampling, a random sample has to be drawn from a population of size $N$ decomposed into $s$ disjoint subpopulations, so-called clusters (strata), of sizes $N_1, N_2, \dots, N_s$.

Solution

Partial samples of size $n_i$ are collected from the $i$th stratum ($i = 1, 2, \dots, s$), where pure random sampling procedures without replacement are used in each stratum. This leads to a random sampling procedure without replacement for the whole population if the numbers $n_i/n$ are chosen proportional to the numbers $N_i/N$. The final random sample contains $n = \sum_{i=1}^{s} n_i$ elements.

Example

Vienna, the capital of Austria, is subdivided into 23 municipalities. We repeat a table with the numbers of inhabitants $N_i^*$ in these municipalities from Rasch et al. (2011) and, to demonstrate the example, round the numbers to values $N_i$ such that $n_i = 1000 N_i/N$ is an integer, where $N = 1700000$.

Now we select by pure random sampling without replacement, as shown in Problem 1.8, $n_i$ of the $N_i$ inhabitants of each municipality, to reach a total random sample of 1000 inhabitants from the 1700000 people in Vienna.
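The proportional allocation can be sketched as follows, using the rounded sizes $N_i$ of Table 1.1. Labelling the inhabitants of a municipality by the numbers $1, \dots, N_i$ is a hypothetical device of ours for illustration.

```r
# Rounded stratum sizes N_i, in the order of Table 1.1
N_i <- c(17, 102, 85, 34, 51, 34, 34, 34, 34, 170, 85, 85,
         51, 85, 68, 102, 51, 51, 68, 85, 136, 153, 85) * 1000
n <- 1000                  # total sample size
N <- sum(N_i)              # population size

# Proportional allocation: n_i / n = N_i / N
n_i <- n * N_i / N

# Pure random sample without replacement within each stratum;
# inhabitants of stratum i carry the hypothetical labels 1, ..., N_i
strata_samples <- lapply(seq_along(N_i),
                         function(i) sample(N_i[i], n_i[i]))
```

The list strata_samples then holds one partial sample per municipality, and the partial sample sizes add up to 1000.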

While in stratified random sampling objects are selected without replacement from each subset, in two-stage sampling subsets or objects are selected at random without replacement at each stage, as described below. Let the population consist of $s$ disjoint subsets of size $N_0$, the primary units, in the two-stage case. Further, we suppose that the character values in the single primary units differ only at random, so that objects need not be selected from all primary units. If the desired sample size is $n = r n_0$ with $r < s$, then in the first step $r$ of the $s$ given primary units are selected using a pure random sampling procedure. In the second step, $n_0$ objects (secondary units) are chosen from each selected primary unit, again applying pure random sampling. The number of possible samples is $\binom{s}{r} \cdot \binom{N_0}{n_0}$, and each object of the population has the same probability $p = \frac{r}{s} \cdot \frac{n_0}{N_0}$ of reaching the sample, corresponding to Definition 1.1.

Problem 1.12 Draw a random sample of size $n$ in a two-stage procedure by first selecting exactly $r$ of the $s$ primary units having sizes $N_i$ ($i = 1, \dots, s$).

Solution

To draw a random sample of size $n$ without replacement we select a divisor $r$ of $n$, and from the $s$ primary units we randomly select $r$ with probabilities proportional to the relative sizes $\frac{N_i}{N}$, where $N = \sum_{i=1}^{s} N_i$ ($i = 1, \dots, s$). From each of the $r$ selected primary units we select by pure random sampling without replacement $\frac{n}{r}$ elements; together these form the total sample of secondary units.

Example

We take again the values of Table 1.1 and select $r = 5$ of the $s = 23$ municipalities to take an overall sample of $n = 1000$. For this we split the interval $(0, 1]$ into 23 sub-intervals $\left(\frac{c_{i-1}}{1000}, \frac{c_i}{1000}\right]$, $i = 1, \dots, 23$, where $c_i = \sum_{j=1}^{i} \frac{1000 N_j}{N}$ is the $i$th entry of the 'cum' column in Table 1.1 and $c_0 = 0$, and generate five uniformly distributed random numbers in $(0, 1]$. If a random number multiplied by 1000 falls into $(c_{i-1}, c_i]$, the $i$th municipality is selected. If a further random number falls into the same interval, it is replaced by another uniformly distributed random number. We generate five such random numbers as follows:

> runif(5)
[1] 0.18769112 0.78229430 0.09359499 0.46677904 0.51150546

The first number corresponds to Mariahilf, the second to Floridsdorf, the third to Landstraße, the fourth to Hietzing, and the last one to Penzing. To obtain a random sample of size 1000 we take pure random samples of size 200 from the people in Mariahilf, Floridsdorf, Landstraße, Hietzing, and Penzing, respectively.
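The look-up in the 'cum' column can be automated. The sketch below maps the five random numbers of the example to municipalities and adds a hypothetical helper, select_units (our own name), that redraws whenever a municipality is hit twice.

```r
municipality <- c("Innere Stadt", "Leopoldstadt", "Landstraße", "Wieden",
                  "Margarethen", "Mariahilf", "Neubau", "Josefstadt",
                  "Alsergrund", "Favoriten", "Simmering", "Meidling",
                  "Hietzing", "Penzing", "Rudolfsheim", "Ottakring",
                  "Hernals", "Währing", "Döbling", "Brigittenau",
                  "Floridsdorf", "Donaustadt", "Liesing")

# Cumulated n_i ('cum' column of Table 1.1)
cum <- c(10, 70, 120, 140, 170, 190, 210, 230, 250, 350, 400, 450,
         480, 530, 570, 630, 660, 690, 730, 780, 860, 950, 1000)

# The five random numbers printed in the example
u <- c(0.18769112, 0.78229430, 0.09359499, 0.46677904, 0.51150546)

# Index of the first cumulated value that is >= 1000 * u
idx <- sapply(1000 * u, function(x) which(x <= cum)[1])
selected <- municipality[idx]

# Draw r distinct primary units, redrawing on repeated hits (sketch)
select_units <- function(r, cum) {
  chosen <- integer(0)
  while (length(chosen) < r) {
    i <- which(1000 * runif(1) <= cum)[1]
    if (!(i %in% chosen)) chosen <- c(chosen, i)
  }
  chosen
}
```

For the five numbers above, idx points to rows 6, 21, 3, 13, and 14 of Table 1.1, that is, to the five municipalities named in the example.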


Table 1.1 Number $N_i^*$, $i = 1, \dots, 23$, of inhabitants in the 23 municipalities of Vienna.

Municipality     N*_i       N_i        n_i = 1000 N_i/N    cum
Innere Stadt     16958      17000      10                  10
Leopoldstadt     94595      102000     60                  70
Landstraße       83737      85000      50                  120
Wieden           30587      34000      20                  140
Margarethen      52548      51000      30                  170
Mariahilf        29371      34000      20                  190
Neubau           30056      34000      20                  210
Josefstadt       23912      34000      20                  230
Alsergrund       39422      34000      20                  250
Favoriten        173623     170000     100                 350
Simmering        88102      85000      50                  400
Meidling         87285      85000      50                  450
Hietzing         51147      51000      30                  480
Penzing          84187      85000      50                  530
Rudolfsheim      70902      68000      40                  570
Ottakring        94735      102000     60                  630
Hernals          52701      51000      30                  660
Währing          47861      51000      30                  690
Döbling          68277      68000      40                  730
Brigittenau      82369      85000      50                  780
Floridsdorf      139729     136000     80                  860
Donaustadt       153408     153000     90                  950
Liesing          91759      85000      50                  1000
Total            N* = 1687271   N = 1700000   n = 1000

Rounded numbers $N_i$, $n_i$, and cumulated $n_i$ (cum).

Source: From Statistik Austria (2009) Bevölkerungsstand inclusive Revision seit 1.1.2002, Wien, Statistik Austria.

References

Crawley, M.J. (2013). The R Book, 2nd edition. Chichester: Wiley.

Rasch, D. and Schott, D. (2018). Mathematical Statistics. Oxford: Wiley.

Rasch, D., Herrendörfer, G., Bock, J., Victor, N., and Guiard, V. (2008). Verfahrensbibliothek Versuchsplanung und -auswertung, 2. verbesserte Auflage in einem Band mit CD. R. Oldenbourg Verlag München Wien.

Rasch, D., Pilz, J., Verdooren, R., and Gebhardt, A. (2011). Optimal Experimental Design with R. Boca Raton: Chapman and Hall.

2 Point Estimation

2.1 Introduction

The theory of point estimation is described in most books about mathematical statistics, and we refer here, as in other chapters, mainly to Rasch and Schott (2018).

We describe the problem as follows. Let the distribution $P_\theta$ of a random variable $\boldsymbol{y}$ depend on a parameter (vector) $\theta \in \Omega \subseteq \mathbb{R}^p$, $p \ge 1$ (random variables are printed in bold, their realisations in normal type). With the help of a realisation, $Y$, of a random sample $\boldsymbol{Y} = (\boldsymbol{y}_1, \boldsymbol{y}_2, \dots, \boldsymbol{y}_n)^T$, $n \ge 1$, we have to make a statement concerning the value of $\theta$ (or a function of it). The elements of a random sample $\boldsymbol{Y}$ are independently and identically distributed (i.i.d.) like $\boldsymbol{y}$. Obviously the statement about $\theta$ should be as precise as possible. What this really means depends on the choice of the loss function defined in section 1.4 in Rasch and Schott (2018). We define an estimator $S(\boldsymbol{Y})$, i.e. a measurable mapping of $\mathbb{R}^n$ onto $\Omega$ taking the value $S(Y)$ for the realisation $Y = (y_1, y_2, \dots, y_n)^T$ of $\boldsymbol{Y}$, where $S(Y)$ is called the estimate of $\theta$. The estimate is thus the realisation of the estimator. In this chapter, data are assumed to be realisations $(y_1, y_2, \dots, y_n)$ of one random sample, where $n$ is called the sample size; the case of more than one sample is discussed in the following chapters. The random sample, i.e. the random variable $\boldsymbol{y}$, stems from some distribution, which is described when the method of estimation depends on the distribution, as in maximum likelihood estimation. For this distribution the $r$th central moment

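$$\mu_r = E[(\boldsymbol{y} - \mu)^r]$$

is assumed to exist, where $\mu = E(\boldsymbol{y})$ is the expectation and $\sigma^2 = E[(\boldsymbol{y} - \mu)^2]$ is the variance of $\boldsymbol{y}$. The $r$th central sample moment $m_r$ is defined as

$$m_r = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^r$$

with

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i .$$

An estimator $S(\boldsymbol{Y})$ based on a random sample $\boldsymbol{Y} = (\boldsymbol{y}_1, \boldsymbol{y}_2, \dots, \boldsymbol{y}_n)^T$ of size $n \ge 1$ is said to be unbiased with respect to $\theta$ if

$$E[S(\boldsymbol{Y})] = \theta \qquad (2.4)$$

holds for all $\theta$. The difference $b_n(\theta) = E[S(\boldsymbol{Y})] - \theta$ is called the bias of the estimator $S(\boldsymbol{Y})$.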


We show here how R can easily calculate estimates of location and scale parameters as well as higher moments from a data set. We first create a simple data set y in R. The following values are weights in kilograms and therefore non-negative.

> y <- c(5,7,1,7,8,9,13,9,10,10,18,10,15,10,10,11,8,11,12,13,15,22,10,25,11)

If we consider y as a sample, the sample size $n$ can be determined with R via

> length(y)
[1] 25

i.e. $n = 25$. We start with estimating the parameters of location.

In Sections 2.2, 2.3, and 2.4 we assume that we observe measurements on an interval scale or ratio scale; if they are on an ordinal or nominal scale we use the methods described in Section 2.5.

2.2 Estimating Location Parameters

When we estimate any parameter we assume that it exists; so, speaking about expectations, skewness $\gamma_1 = \mu_3/\sigma^3$, kurtosis $\gamma_2 = [\mu_4/\sigma^4] - 3$, and so on, we assume that the corresponding moments of the underlying distribution exist.
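As an illustration, central sample moments of the weights y can be computed with a one-line helper, and plugged into the definitions of $\gamma_1$ and $\gamma_2$. This is only a sketch: the helper name central_moment is our own, and the plug-in quantities g1 and g2 are one of several conventions, not necessarily the estimators used later in the text.

```r
y <- c(5,7,1,7,8,9,13,9,10,10,18,10,15,10,10,11,8,11,12,13,15,
       22,10,25,11)

# r-th central sample moment: mean of (y_i - mean(y))^r, with divisor n
central_moment <- function(y, r) mean((y - mean(y))^r)

m2 <- central_moment(y, 2)   # note: divisor n, whereas var(y) uses n - 1
m3 <- central_moment(y, 3)
m4 <- central_moment(y, 4)

# Plug-in analogues of gamma_1 = mu_3 / sigma^3 and gamma_2 = mu_4 / sigma^4 - 3
g1 <- m3 / m2^(3 / 2)
g2 <- m4 / m2^2 - 3
```

Because m2 divides by $n$, it equals var(y) multiplied by $(n - 1)/n$.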

The arithmetic mean, or briefly, the mean

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i \qquad (2.5)$$

is an estimate of the expectation $\mu$ of some distribution.

Problem 2.1 Calculate the arithmetic mean of a sample.

Solution

Use the command mean().

> mean(y)

Example

We use the sample y already defined above and obtain

> y <- c(5,7,1,7,8,9,13,9,10,10,18,10,15,10,10,11,8,11,12,13,15,22,10,25,11)
> mean(y)
[1] 11.2

i.e. $\bar{y} = \frac{1}{25} \sum_{i=1}^{25} y_i = 11.2$.

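The arithmetic mean is a least squares estimate of the expectation $\mu$ of $\boldsymbol{y}$, i.e. it minimises the sum of squares $\sum_{i=1}^{n} (y_i - \mu)^2$ over $\mu$. The corresponding least squares estimator is $\bar{\boldsymbol{y}} = \frac{1}{n} \sum_{i=1}^{n} \boldsymbol{y}_i$ and is unbiased.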
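The least squares property can be checked numerically. The sketch below minimises the sum of squared deviations with the base R function optimize; the names sum_sq and fit are our own, and the numerical minimiser agrees with the mean only up to the tolerance of optimize.

```r
y <- c(5,7,1,7,8,9,13,9,10,10,18,10,15,10,10,11,8,11,12,13,15,
       22,10,25,11)

# Sum of squared deviations of the observations from a candidate value m
sum_sq <- function(m) sum((y - m)^2)

# One-dimensional minimisation over the range of the data
fit <- optimize(sum_sq, interval = range(y))

fit$minimum   # numerically close to mean(y)
mean(y)
```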

Problem 2.2 Calculate the extreme values $y_{(1)} = \min(y)$ and $y_{(n)} = \max(y)$ of a sample.

Solution

We obtain the extreme values using the R commands min() and max().

Example

Again, we use the sample y defined above and obtain

> min(y)
[1] 1
> max(y)
[1] 25

i.e. $y_{(1)} = 1$ and $y_{(25)} = 25$ if we denote the $j$th element of the ordered set of $Y$ by $y_{(j)}$ such that $y_{(1)} \le \dots \le y_{(n)}$ holds. Note: you can get both values using the command > range(y).

Sometimes one or more elements of $\boldsymbol{Y} = (\boldsymbol{y}_1, \boldsymbol{y}_2, \dots, \boldsymbol{y}_n)^T$ do not have the same distribution as the others, and $\boldsymbol{Y}$ is then not a random sample.

If only a few of the elements of $\boldsymbol{Y}$ have a different distribution we call them outliers. Often the minimum and the maximum values of $y$ represent realisations of such outliers. If we conjecture the existence of such outliers we can use special L-estimators such as the trimmed or the Winsorised mean. Outliers in observed values can occur even if the corresponding element of $\boldsymbol{Y}$ is not an outlier. This can happen by incorrectly writing down an observed number or by an error in the measuring instrument.

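L-estimators are weighted means of order statistics (where L stands for linear combination). If we arrange the elements of the realisation $Y$ of $\boldsymbol{Y}$ according to their magnitude, and if we denote the $j$th element of this ordered set by $y_{(j)}$ such that $y_{(1)} \le \dots \le y_{(n)}$ holds, then $Y_{(\cdot)} = (y_{(1)}, \dots, y_{(n)})^T$ is a function of the realisation of $\boldsymbol{Y}$, and $S(\boldsymbol{Y}) = \boldsymbol{Y}_{(\cdot)} = (\boldsymbol{y}_{(1)}, \dots, \boldsymbol{y}_{(n)})^T$ is said to be the order statistic vector; the component $\boldsymbol{y}_{(i)}$ is called the $i$th order statistic and

$$L(\boldsymbol{Y}) = \sum_{i=1}^{n} c_i \boldsymbol{y}_{(i)}, \qquad \sum_{i=1}^{n} c_i = 1 \qquad (2.6)$$

is said to be an L-estimator, and $\sum_{i=1}^{n} c_i y_{(i)}$ is called an L-estimate.

If we put $c_1 = \dots = c_t = c_{n-t+1} = \dots = c_n = 0$ and $c_{t+1} = \dots = c_{n-t} = \frac{1}{n - 2t}$ in (2.6) with $t < \frac{n}{2}$, then

$$L_T(\boldsymbol{Y}) = \frac{1}{n - 2t} \sum_{i=t+1}^{n-t} \boldsymbol{y}_{(i)} \qquad (2.7)$$

is called the $\frac{t}{n}$-trimmed mean.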

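If we do not suppress the $t$ smallest and the $t$ largest observations, but concentrate them in the values $y_{(t+1)}$ and $y_{(n-t)}$, respectively, then we get the so-called $\frac{t}{n}$ Winsorised mean

$$L_W(\boldsymbol{Y}) = \frac{1}{n} \left[ \sum_{i=t+1}^{n-t} \boldsymbol{y}_{(i)} + t\,\boldsymbol{y}_{(t+1)} + t\,\boldsymbol{y}_{(n-t)} \right].$$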

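The median in samples of even size $n = 2m$ can be defined as the $\frac{1}{2}$ Winsorised mean

$$L_W(\boldsymbol{Y}) = \frac{1}{2} \left( \boldsymbol{y}_{(m)} + \boldsymbol{y}_{(m+1)} \right),$$

which results from concentrating the $t = m - 1$ smallest observations in $y_{(m)}$ and the $m - 1$ largest observations in $y_{(m+1)}$.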
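The weighted-mean view of L-estimates can be made concrete with a short sketch. All function names (l_estimate, trimmed_weights, winsorised_weights) are our own, hypothetical helpers, and the small even-sized sample y_even is invented for illustration.

```r
# L-estimate: weighted sum of the ordered sample, weights summing to 1
l_estimate <- function(y, w) {
  stopifnot(length(w) == length(y), isTRUE(all.equal(sum(w), 1)))
  sum(w * sort(y))
}

# Weights of the t/n-trimmed mean: 0 for the t extremes at each end
trimmed_weights <- function(n, t) {
  stopifnot(t < n / 2)
  c(rep(0, t), rep(1 / (n - 2 * t), n - 2 * t), rep(0, t))
}

# Weights of the t/n Winsorised mean (requires n - 2t >= 2)
winsorised_weights <- function(n, t) {
  stopifnot(n - 2 * t >= 2)
  c(rep(0, t), (t + 1) / n, rep(1 / n, n - 2 * t - 2), (t + 1) / n, rep(0, t))
}

y <- c(5,7,1,7,8,9,13,9,10,10,18,10,15,10,10,11,8,11,12,13,15,
       22,10,25,11)
trimmed <- l_estimate(y, trimmed_weights(length(y), 1))

# Even sample size n = 2m = 4: Winsorising with t = m - 1 = 1 gives the median
y_even <- c(3, 1, 4, 2)                    # hypothetical data
med <- l_estimate(y_even, winsorised_weights(4, 1))
```

In this sketch trimmed coincides with mean(y, trim = 1/25), and med coincides with median(y_even).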

To calculate the trimmed and Winsorised means using R we first order the sample of $n$ observations by magnitude.

Problem 2.3 Order a vector of numbers by magnitude.

Solution

Use the vector y of numbers and the command sort().

> sortedy <- sort(y)

Example

We again use the sample

> y <- c(5,7,1,7,8,9,13,9,10,10,18,10,15,10,10,11,8,11,12,13,15,22,10,25,11)

and obtain

> sortedy <- sort(y)
> sortedy
[1]  1  5  7  7  8  8  9  9 10 10 10 10 10 10 11 11 11 12 13 13 15 15 18 22 25

Problem 2.4 Calculate the $\frac{1}{n}$ trimmed mean of a sample.

Solution

We first order the sample using the command sort, as shown in Problem 2.3. Then we drop the smallest and the largest entry of the ordered sample and denote the result as x. With > mean(x) we obtain the $\frac{1}{n}$ trimmed mean of the sample.

Example

We use sortedy, the ordered sample of the 25 observations y from Problem 2.3,

[1]  1  5  7  7  8  8  9  9 10 10 10 10 10 10 11 11 11 12 13 13 15 15 18 22 25

and manually drop the smallest and the largest entry, calling the result x.

> x <- c(5,7,7,8,8,9,9,10,10,10,10,10,10,11,11,11,12,13,13,15,15,18,22)

However, this can be done directly with R as follows:

> x <- sortedy[-1]
> x <- x[-24]
> x
[1]  5  7  7  8  8  9  9 10 10 10 10 10 10 11 11 11 12 13 13 15 15 18 22
> length(x)
[1] 23

Then we calculate the mean of the entries in x,

> mean(x)
[1] 11.04348

and by rounding we obtain $L_T(Y) = \frac{1}{23} \sum_{i=2}^{24} y_{(i)} \approx 11.0$.

This is the $\frac{1}{25}$-trimmed mean of y.

Note: you can directly find the trimmed mean using the command > mean(y, trim = 1/25).

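Problem 2.5 Calculate the $\frac{1}{n}$ Winsorised mean of a sample of size $n$.

Solution

We first order the sample using the command sort, as shown in Problem 2.3. Then we replace the smallest value $y_{(1)}$ by $y_{(2)}$ and the largest value $y_{(n)}$ by $y_{(n-1)}$ and call the result z. The mean of z is the $\frac{1}{n}$ Winsorised mean.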

Example

We calculate the $\frac{1}{25}$ Winsorised mean of y in

> y <- c(5,7,1,7,8,9,13,9,10,10,18,10,15,10,10,11,8,11,12,13,15,22,10,25,11)

We first calculate the ordered sample using sort,

> sortedy <- sort(y)
> sortedy
[1]  1  5  7  7  8  8  9  9 10 10 10 10 10 10 11 11 11 12 13 13 15 15 18 22 25

and manually shift 1 to 5 and 25 to 22. The result is

> z <- c(5,5,7,7,8,8,9,9,10,10,10,10,10,10,11,11,11,12,13,13,15,15,18,22,22)

Of course this can be done directly in R using

> sortedy[1] <- 5
> sortedy[25] <- 22
> z <- sortedy
> z
[1]  5  5  7  7  8  8  9  9 10 10 10 10 10 10 11 11 11 12 13 13 15 15 18 22 22

We get the $\frac{1}{25}$ Winsorised mean via

> mean(z)
[1] 11.24

or by rounding as $L_W(Y) = \frac{1}{25} \left[ \sum_{i=2}^{24} y_{(i)} + y_{(2)} + y_{(24)} \right] = 11.2$.
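The base R function mean() has a trim argument but, as far as we are aware, no Winsorising option, so a small helper is convenient for general $t$. The following is a sketch; the function name winsorised_mean is our own and not part of R.

```r
# t/n Winsorised mean: the t smallest values are replaced by y_(t+1),
# the t largest values by y_(n-t), then the ordinary mean is taken
winsorised_mean <- function(y, t = 1) {
  s <- sort(y)
  n <- length(s)
  stopifnot(t >= 0, 2 * t < n)
  if (t > 0) {
    s[1:t] <- s[t + 1]
    s[(n - t + 1):n] <- s[n - t]
  }
  mean(s)
}

y <- c(5,7,1,7,8,9,13,9,10,10,18,10,15,10,10,11,8,11,12,13,15,
       22,10,25,11)
w1 <- winsorised_mean(y, t = 1)    # 1/25 Winsorised mean
w12 <- winsorised_mean(y, t = 12)  # all values concentrated in y_(13)
```

With t = 0 the helper returns the ordinary mean, and with t = 12 for these 25 observations every value is replaced by the middle order statistic, so the result equals median(y).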