Statistical Methods

Fourth Edition

DONNA L. MOHR

University of North Florida, Emeritus

RUDOLF J. FREUND

Academic Press is an imprint of Elsevier

125 London Wall, London EC2Y 5AS, United Kingdom

525 B Street, Suite 1650, San Diego, CA 92101, United States

50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

Copyright © 2022 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data

A catalog record for this book is available from the Library of Congress

ISBN: 978-0-12-823043-5

For Information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Content Strategist: Katey Birtcher

Content Development Specialist: Alice Grant

Publishing Services Manager: Shereen Jameel

Project Manager: Rukmani Krishnan

Typeset by MPS Limited, Chennai, India

Printed in India

Last digit is the print number: 9 8 7 6 5 4 3 2 1

1. Data and Statistics
1.1 Introduction
1.1.1 Data Sources
1.1.2 Using the Computer
1.2 Observations and Variables
1.3 Types of Measurements for Variables
1.4 Distributions
1.4.1 Graphical Representation of Distributions
1.5 Numerical Descriptive Statistics
1.5.1 Location
1.5.2 Dispersion
1.5.3 Other Measures
1.5.4 Computing the Mean and Standard Deviation from a Frequency Distribution
1.5.5 Change of Scale
1.6 Exploratory Data Analysis
1.6.1 The Stem and Leaf Plot
1.6.2 The Box Plot
1.6.3 Examples of Exploratory Data Analysis
1.7 Bivariate Data
1.7.1 Categorical Variables
1.7.2 Categorical and Interval Variables
1.7.3 Interval Variables
1.8 Populations, Samples, and Statistical Inference: A Preview
Data Collection

2.2.1 Definitions and Concepts
2.2.2 System Reliability
2.2.3 Random Variables
2.3 Discrete Probability Distributions
2.3.1 Properties of Discrete Probability Distributions
2.3.2 Descriptive Measures for Probability Distributions
2.3.3 The Discrete Uniform Distribution
2.3.4 The Binomial Distribution
2.3.5 The Poisson Distribution
2.4 Continuous Probability Distributions
2.4.1 Characteristics of a Continuous Probability Distribution
2.4.2 The Continuous Uniform Distribution
2.4.3 The Normal Distribution
2.4.4 Calculating Probabilities Using the Table of the Normal Distribution
2.5 Sampling Distributions
2.5.1 Sampling Distribution of the Mean
2.5.2 Usefulness of the Sampling Distribution
2.5.3 Sampling Distribution of a Proportion
2.6 Other Sampling Distributions
2.6.1 The χ² Distribution
2.6.2 Distribution of the Sample Variance
2.6.3 The t Distribution
2.6.4 Using the t Distribution
2.6.5 The F Distribution
2.6.6 Using the F Distribution
2.6.7 Relationships among the Distributions
2.7 Chapter Summary
2.8 Chapter Exercises
Concept Questions
Practice Exercises
Exercises

3. Principles of Inference
3.1 Introduction
3.2 Hypothesis Testing
3.2.1 General Considerations
3.2.2 The Hypotheses
3.2.3 Rules for Making Decisions
3.2.4 Possible Errors in Hypothesis Testing
3.2.5 Probabilities of Making Errors
3.2.6 Choosing between α and β
3.2.7 Five-Step Procedure for Hypothesis Testing
3.2.8 Why Do We Focus on the Type I Error?
3.2.9 Choosing α
3.2.10 The Five Steps for Example 3.3
3.2.11 p Values
3.2.12 The Probability of a Type II Error
3.2.13 Power
3.2.14 Uniformly Most Powerful Tests
3.2.15 One-Tailed Hypothesis Tests
3.3 Estimation
3.3.1 Interpreting the Confidence Coefficient
3.3.2 Relationship between Hypothesis Testing and Confidence Intervals
3.4 Sample Size
3.5 Assumptions
3.5.1 Statistical Significance versus Practical Significance
3.6 Chapter Summary
3.7 Chapter Exercises
Concept Questions
Practice Exercises
Multiple Choice Questions
Exercises

4. Inferences on a Single Population
4.1 Introduction
4.2 Inferences on the Population Mean
4.2.1 Hypothesis Test on μ
4.2.2 Estimation of μ
4.2.3 Sample Size
4.2.4 Degrees of Freedom
4.3 Inferences on a Proportion for Large Samples
4.3.1 Hypothesis Test on p
4.3.2 Estimation of p
4.3.3 Sample Size
4.4 Inferences on the Variance of One Population
4.4.1 Hypothesis Test on σ²
4.4.2 Estimation of σ²
4.5 Assumptions
4.5.1 Required Assumptions and Sources of Violations
4.5.2 Detection of Violations
4.5.3 Tests for Normality
4.5.4 If Assumptions Fail
4.5.5 Alternate Methodology
4.6 Chapter Summary
4.7 Chapter Exercises
Concept Questions
Practice Exercises
Exercises
Projects

5. Inferences for Two Populations
5.1 Introduction
5.2 Inferences on the Difference between Means Using Independent Samples
5.2.1 Sampling Distribution of a Linear Function of Random Variables
5.2.2 The Sampling Distribution of the Difference between Two Means
5.2.3 Variances Known
5.2.4 Variances Unknown but Assumed Equal
5.2.5 The Pooled Variance Estimate
5.2.6 The “Pooled” t Test
5.2.7 Variances Unknown but Not Equal
5.2.8 Choosing between the Pooled and Unequal Variance t Tests
5.3 Inferences on Variances
5.4 Inferences on Means for Dependent Samples
5.5 Inferences on Proportions for Large Samples
5.5.1 Comparing Proportions Using Independent Samples
5.5.2 Comparing Proportions Using Paired Samples
5.6 Assumptions and Remedial Methods
5.7 Chapter Summary
5.8 Chapter Exercises
Concept Questions
Practice Exercises

6. Inferences for Two or More Means
6.1 Introduction
6.1.1 Using Statistical Software
6.2 The Analysis of Variance
6.2.1 Notation and Definitions
6.2.2 Heuristic Justification for the Analysis of Variance
6.2.3 Computational Formulas and the Partitioning of Sums of Squares
6.2.4 The Sum of Squares between Means
6.2.5 The Sum of Squares within Groups
6.2.6 The Ratio of Variances
6.2.7 Partitioning of the Sums of Squares
6.3 The Linear Model
6.3.1 The Linear Model for a Single Population
6.3.2 The Linear Model for Several Populations
6.3.3 The Analysis of Variance Model
6.3.4 Fixed and Random Effects Model
6.3.5 The Hypotheses
6.3.6 Expected Mean Squares
6.4 Assumptions
6.4.1 Assumptions and Detection of Violations
6.4.2 Formal Tests for the Assumption of Equal Variance
6.4.3 Remedial Measures
6.5 Specific Comparisons
6.5.1 Contrasts
6.5.2 Constructing a t Statistic for a Contrast
6.5.3 Planned Contrasts with No Pattern: Bonferroni’s Method
6.5.4 Planned Comparisons versus Control: Dunnett’s Method
6.5.5 Planned All Possible Pairwise Comparisons: Fisher’s LSD and Tukey’s HSD
6.5.6 Planned Orthogonal Contrasts
6.5.7 Unplanned Contrasts: Scheffé’s Method
6.5.8 Comments
6.6 Random Models
6.7 Analysis of Means
6.7.1 ANOM for Proportions
6.7.2 ANOM for Count Data
6.8 Chapter Summary
6.9 Chapter Exercises
Concept Questions
Practice Exercises
Exercises
Projects

7. Linear Regression
7.1 Introduction
7.2 The Regression Model
7.3 Estimation of Parameters β₀ and β₁
7.3.1 A Note on Least Squares
7.4 Estimation of σ² and the Partitioning of Sums of Squares
7.5 Inferences for Regression
7.5.1 The Analysis of Variance Test for β₁
7.5.2 The (Equivalent) t Test for β₁
7.5.3 Confidence Interval for β₁
7.5.4 Inferences on the Response Variable
7.6 Using Statistical Software
7.7 Correlation
7.8 Regression Diagnostics
7.9 Chapter Summary
7.10 Chapter Exercises
Concept Questions
Practice Exercises
Exercises
Projects

8. Multiple Regression
8.1 The Multiple Regression Model
8.1.1 The Partial Regression Coefficient
8.2 Estimation of Coefficients
8.2.1 Simple Linear Regression with Matrices
8.2.2 Estimating the Parameters of a Multiple Regression Model
8.2.3 Correcting for the Mean, an Alternative Calculating Method
8.3 Inferential Procedures
8.3.1 Estimation of σ² and the Partitioning of the Sums of Squares
8.3.2 The Coefficient of Variation
8.3.3 Inferences for Coefficients
8.3.4 Tests Normally Provided by Statistical Software
8.3.5 The Equivalent t Statistic for Individual Coefficients
8.3.6 Inferences on the Response Variable
8.4 Correlations
8.4.1 Multiple Correlation
8.4.2 How Useful Is the R² Statistic?
8.4.3 Partial Correlation
8.5 Using Statistical Software
8.6 Special Models
8.6.1 The Polynomial Model
8.6.2 The Multiplicative Model
8.6.3 Nonlinear Models
8.7 Multicollinearity
8.7.1 Redefining Variables
8.7.2 Other Methods
8.8 Variable Selection
8.8.1 Other Selection Procedures
8.9 Detection of Outliers, Row Diagnostics
8.10 Chapter Summary
8.11 Chapter Exercises
Concept Questions
Practice Exercises

9. Factorial Experiments
9.1 Introduction
9.2 Concepts and Definitions
9.3 The Two-Factor Factorial Experiment
9.3.1 The Linear Model
9.3.2 Notation
9.3.3 Computations for the Analysis of Variance
9.3.4 Between-Cells Analysis
9.3.5 The Factorial Analysis
9.3.6 Expected Mean Squares
9.3.7 Unbalanced Data
9.4 Specific Comparisons
9.4.1 Preplanned Contrasts
9.4.2 Basic Test Statistic for Contrasts
9.4.3 Multiple Comparisons
9.5 Quantitative Factors
9.5.1 Lack of Fit
9.6 No Replications
9.7 Three or More Factors
9.7.1 Additional Considerations
9.8 Chapter Summary
9.9 Chapter Exercises
Concept Questions
Practice Exercises

10. Design of Experiments
10.1 Introduction
10.2 The Randomized Block Design
10.2.1 The Linear Model
10.2.2 Relative Efficiency
10.2.3 Random Treatment Effects in the Randomized Block Design
10.3 Randomized Blocks with Sampling
10.4 Other Designs
10.4.1 Factorial Experiments in a Randomized Block Design
10.4.2 Nested Designs
10.5 Repeated Measures Designs
10.5.1 One Between-Subject and One Within-Subject Factor
10.5.2 Two Within-Subject Factors
10.5.3 Assumptions of the Repeated Measures Model
10.5.4 Split Plot Designs
10.5.5 Additional Topics
10.6 Chapter Summary
10.7 Chapter Exercises
Concept Questions
Practice Exercises
Exercises

11. Other Linear Models
11.1 Introduction
11.2 The Dummy Variable Model
11.2.1 Factor Effects Coding
11.2.2 Reference Cell Coding
11.2.3 Comparing Coding Schemes
11.3 Unbalanced Data
11.4 Statistical Software's Implementation of the Dummy Variable Model
11.5 Models with Dummy and Interval Variables
11.5.1 Analysis of Covariance
11.5.2 Multiple Covariates
11.5.3 Unequal Slopes
11.5.4 Independence of Covariates and Factors
11.6 Extensions to Other Models
11.7 Estimating Linear Combinations of Regression Parameters
11.7.1 Covariance Matrices
11.7.2 Linear Combination of Regression Parameters
11.8 Weighted Least Squares
Correlated Errors

12. Categorical Data
12.1 Introduction
12.2 Hypothesis Tests for a Multinomial Population
12.3 Goodness of Fit Using the χ² Test
12.3.1 Test for a Discrete Distribution
12.3.2 Test for a Continuous Distribution
12.4 Contingency Tables
12.4.1 Computing the Test Statistic
12.4.2 Test for Homogeneity
12.4.3 Test for Independence
12.4.4 Measures of Dependence
12.4.5 Likelihood Ratio Test
12.4.6 Fisher's Exact Test
12.5 Specific Comparisons in Contingency Tables
12.6 Chapter Summary
12.7 Chapter Exercises
Concept Questions
Practice Exercises

13. Special Types of Regression
13.1 Introduction
13.1.1 Maximum Likelihood and Least Squares
13.2 Logistic Regression
13.3 Poisson Regression
13.3.1 Choosing between Logistic and Poisson Regression
13.4 Nonlinear Least-Squares Regression
13.4.1 Sigmoidal Shapes (S Curves)
13.4.2 Symmetric Unimodal Shapes
13.5 Chapter Summary
13.6 Chapter Exercises
Concept Questions

14. Nonparametric Methods
14.1 Introduction
14.1.1 Ranks
14.1.2 Randomization Tests
14.1.3 Comparing Parametric and Nonparametric Procedures
14.2 One Sample
14.3 Two Independent Samples
14.4 More Than Two Samples
14.5 Randomized Block Design
14.6 Rank Correlation
14.7 The Bootstrap
14.8 Chapter Summary
14.9 Chapter Exercises
Concept Questions
Practice Exercises

APPENDIX A Tables of Distributions
A.1 Table of the Standard Normal Distribution
A.1A Table of Critical Values for the Standard Normal Distribution
A.2 Student’s t Distribution: Values exceeded by a given probability α
A.3 The χ² Distribution: Values exceeded by a given probability α
A.4 The F Distribution: 10% in the upper tail, P(F > c) = 0.10
A.4A The F Distribution: 5% in the upper tail, P(F > c) = 0.05
A.4B The F Distribution: 2.5% in the upper tail, P(F > c) = 0.025
A.4C The F Distribution: 1% in the upper tail, P(F > c) = 0.01
A.5 Critical Values for Dunnett’s Two-Sided Test of Treatments versus Control
A.6 Critical Values of the Studentized Range, for Tukey’s HSD
A.7 Critical Values for Use with the Analysis of Means (ANOM)
A.8 Critical Values for the Wilcoxon Signed Rank Test
A.9 Critical Values for the Mann-Whitney Rank Sums Test

APPENDIX B A Brief Introduction to Matrices
B.1 Matrix Algebra (online only)
B.2 Solving Linear Equations (online only)

C.1 Florida Lake Data
C.2 State Education Data
C.3 National Atmospheric Deposition Program (NADP) Data
C.4 Florida County Data
C.5 Cowpea Data
C.6 Jax House Prices Data
C.7 Gainesville, FL, Weather Data
C.8 General Social Survey (GSS) 2016 Data

Hints for Selected Exercises

Preface

The goal of Statistical Methods, Fourth Edition, is to introduce the student both to statistical reasoning and to the most commonly used statistical techniques. It is designed for undergraduates in statistics, engineering, the quantitative sciences, or mathematics, or for graduate students in a wide range of disciplines requiring statistical analysis of data. The text can be covered in a two-semester sequence, with the first semester corresponding to the foundational ideas in Chapters 1 through 7 and perhaps Chapter 12. Throughout the text, techniques have almost universal applicability. They may be illustrated with examples from agriculture or education, but the applications could just as easily have occurred in public administration or engineering.

Our ambition is that students who master this material will be able to select, implement, and interpret the most common types of analyses as they undertake research in their own disciplines. They should be able to read research articles and in most cases understand the descriptions of the statistical results and how the authors used them to reach their conclusions. They should understand the pitfalls of collecting statistical data and the roles played by the various mathematical assumptions.

Statistics can be studied at several levels. On one hand, students can learn by rote how to plug numbers into formulas, or more often now, into statistical software, and draw a number with a neat circle around it as the answer. This limited approach rarely leads to the kind of understanding that allows students to critically select methods and interpret results. On the other hand, there are numerous textbooks that provide introductions to the elegant mathematical backgrounds of the methods. Although this is a much deeper understanding than the first approach, its prerequisite mathematical understanding closes it to practitioners from many other disciplines.

In this text, we have tried to take a middle way. We present enough of the formulas to motivate the techniques, and illustrate their numerical application in small examples. However, the focus of the discussion is on the selection of the technique, the interpretation of the results, and a critique of the validity of the analysis. We urge the student (and instructor) to focus on these skills.

Guiding Principles

• No mathematics beyond algebra is required. However, mathematically oriented students may still find the material in this book challenging, especially if they also participate in courses in statistical theory.

• Formulas are presented primarily to show the how and why of a particular statistical analysis. For that reason, there are a minimal number of exercises that plug numbers into formulas.

• All examples are worked to a logical conclusion, including interpretation of results. Where computer printouts are used, results are discussed and explained. In general, the emphasis is on conclusions rather than mechanics.

• Throughout the book we stress that certain assumptions about the data must be fulfilled for the statistical analyses to be valid, and we emphasize that although the assumptions are often fulfilled, they should be routinely checked.

• Examples of the statistical techniques, as they are actually applied by researchers, are presented throughout the text, both in the chapter discussions and in the exercises.

• Students will have opportunities to work with data drawn from a variety of disciplines.

New to this Edition

• Streamlined Presentation. Numerous sections have been completely rewritten with the goal of a more concise description of the methods.

• Practice Problems for Every Chapter. Every chapter now includes Practice Exercises, with full solutions presented at the end of the text.

• Additional Data Sets for Projects. We have added three new data sets that instructors can use in preparing assignments, and we have updated the old data sets.

Using this Book

Organization

The organization of Statistical Methods, Fourth Edition, follows the classical order. The formulas in the book are generally the so-called definitional ones that emphasize concepts rather than computational efficiency. These formulas can be used for a few of the very simplest examples and problems, but we expect that virtually all exercises will be implemented on computers using special-purpose statistical software. The first seven chapters, which are normally covered in a first semester, include data description, probability and sampling distributions, the basics of inference for one and two sample situations, the analysis of variance, and one-variable regression. The second portion of the book starts with chapters on multiple regression, factorial experiments, experimental design, and an introduction to general linear models including the analysis of covariance. We have separated factorial experiments and design of experiments because they are different applications of the same numeric methods.

The last three chapters introduce topics in the analysis of categorical data, logistic and other special types of regression, and nonparametric statistics. These chapters provide a brief introduction to these important topics and are intended to round out the statistical education of those who will learn from this book.

Coverage

This book contains more material than can be covered in a two-semester course. We have purposely done this for two reasons:

• Because of the wide variety of audiences for statistical methods, not all instructors will want to cover the same material. For example, courses with heavy enrollments of students from the social and behavioral sciences will want to emphasize nonparametric methods and the analysis of categorical data with less emphasis on experimental design.

• Students who have taken statistical methods courses tend to keep their statistics books for future reference. We recognize that no single book will ever serve as a complete reference, but we hope that the broad coverage in this book will at least lead these students in the proper direction when the occasion demands.

Sequencing

For the most part, topics are arranged so that each new topic builds on previous topics; hence course sequencing should follow the book. There are, however, some exceptions that may appeal to some instructors:

• In some cases it may be preferable to present the material on categorical data at an early stage. Much of the material in Chapter 12 (Categorical Data) can be taught anytime after Chapter 5 (Inference for Two Populations).

• Some instructors prefer to present nonparametric methods along with parametric methods. Again, any of the sections in Chapter 14 (Nonparametric Methods) may be extracted and presented along with their analogous parametric topic in earlier chapters.

Data Sets

Data files for all exercises and examples are available from the text Web site at https://www.elsevier.com/books-and-journals/book-companion/9780128230435 in ASCII (txt), EXCEL, and SAS format.

Appendix C fully describes eight data sets drawn from the geosciences, social sciences, and agricultural sciences that are suitable for a variety of small projects.

Computing

It is essential that students have access to statistical software. All the methods used in this text are common enough so that any multipurpose statistical software should suffice. (The single exception is the bootstrap, at the very end of the text.) For consistency and convenience, and because it is the most widely used single statistical computing package, we have relied heavily on the SAS System to illustrate examples in this text. However, we stress that the examples and exercises could as easily have been done in SPSS, Stata, R, Minitab, or any of a number of other software packages. As we demonstrate in a few cases, the various printouts contain enough common information that, with the aid of documentation, someone who can interpret results from one package should be able to do so from any other.

This text does not attempt to teach SAS or any other statistical software. Generic rather than software-specific instructions are the only directions given for performing the analyses. Most common statistical software has an increasing amount of independently published material available, either in traditional print or online. For those who wish to use the SAS System, sample programs for the examples within each chapter have been provided on the text Web site at https://www.elsevier.com/books-and-journals/book-companion/9780128230435. Students may find these of use as template programs that they can adapt for the exercises.

Acknowledgments

I was pleased when Rudy Freund and Bill Wilson invited me to help with the Third Edition, and honored to have the opportunity to become lead author on the Fourth Edition. Both experiences have left me with a tremendous respect for the erudition, time, and just plain hard work that Rudy and Bill put into writing the original text. My respect for them as statisticians, teachers, and mentors is unbounded. Sadly, Rudy Freund passed away in 2014. His reputation lives on with the numerous texts and research articles that he authored, and with the students that he inspired.

CHAPTER 1

Data and Statistics

1.1 Introduction

To most people, the word statistics conjures up images of vast tables of numbers referring to stock prices, population, or baseball batting averages. Statistics, however, actually denotes a system for reasoning based on data. The collection of the data, its description through appropriate summaries, and the methods for drawing conclusions from it all form the discipline of statistics. It is the fundamental tool for data-driven reasoning. It is appropriate, then, to begin with a discussion of the characteristics of data. The purpose of this chapter is to

1. provide the definition of a set of data,
2. define the components of such a data set,
3. present tools that are used to describe a data set, and briefly
4. discuss methods of data collection.

Definition 1.1: A set of data is a collection of observed values representing one or more characteristics of some objects or units.

Example 1.1 GSS: A Typical Data Set

Every year, the National Opinion Research Center (NORC) publishes the results of a personal interview survey of U.S. households. This survey is called the General Social Survey (GSS) and is the basis for many studies conducted in the social sciences. In the 1996 GSS, a total of 2904 households were sampled and asked over 70 questions concerning lifestyles, incomes, religious and political beliefs, and opinions on various topics. Table 1.1 lists the data for a sample of 50 respondents on four of the questions asked. This table illustrates a typical midsized data set. Each of the rows corresponds to a particular respondent (labeled 1 through 50 in the first column). The columns, starting with column two, are responses to the following four questions:

1. AGE: The respondent’s age in years

2. SEX: The respondent’s sex coded 1 for male and 2 for female

3. HAPPY: The respondent’s general happiness, coded:
1 for “Not too happy”
2 for “Pretty happy”
3 for “Very happy”

4. TVHOURS: The average number of hours the respondent watched TV during a day

This data set obviously contains a lot of information about this sample of 50 respondents. Unfortunately this information is hard to interpret when the data are presented as shown in Table 1.1. There are just too many numbers to make any sense of the data (and we are only looking at 50 respondents!). By summarizing some aspects of this data set, we can obtain much more usable information and perhaps even answer some specific questions. For example, what can we say about the overall frequency of the various levels of happiness? Do some respondents watch a lot of TV? Is there a relationship between the age of the respondent and his or her general happiness? Is there a relationship between the age of the respondent and the number of hours of TV watched?

We will return to this data set in Section 1.10 after we have explored some methods for making sense of data sets like this one. As we develop more sophisticated methods of analysis in later chapters, we will again refer to this data set.1

1 The GSS is discussed on the following Web site: http://www.gss.norc.org

Table 1.1 Sample of 50 responses to the 1996 GSS.

Respondent  AGE  SEX  HAPPY  TVHOURS
30          45   2    2      3
31          64   2    3      5
32          30   2    2      2
33          75   2    2      0
34          53   2    2      3
35          38   1    2      0
36          26   1    2      2
37          25   2    3      1
38          56   2    3      3
39          26   2    2      1
40          54   2    2      5
41          31   2    2      0
42          44   1    2      0
43          36   2    2      3
44          74   2    2      0
45          74   2    2      3
46          37   2    3      0
47          48   1    2      3
48          42   2    2      6
49          77   2    2      2
50          75   1    3      0
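The first two questions posed in Example 1.1 (how often each happiness level occurs, and how much TV respondents watch) can be explored with very simple summaries. The following is a minimal sketch in Python, not the SAS approach used in the text. It uses only the rows for respondents 30 through 50 as read from Table 1.1, and the names `rows`, `HAPPY_LABELS`, `happy_counts`, and `mean_tv` are illustrative choices, not from the book.

```python
from collections import Counter
from statistics import mean

# (respondent, AGE, SEX, HAPPY, TVHOURS) for respondents 30-50,
# as read from Table 1.1 (assumed transcription of the printed rows)
rows = [
    (30, 45, 2, 2, 3), (31, 64, 2, 3, 5), (32, 30, 2, 2, 2),
    (33, 75, 2, 2, 0), (34, 53, 2, 2, 3), (35, 38, 1, 2, 0),
    (36, 26, 1, 2, 2), (37, 25, 2, 3, 1), (38, 56, 2, 3, 3),
    (39, 26, 2, 2, 1), (40, 54, 2, 2, 5), (41, 31, 2, 2, 0),
    (42, 44, 1, 2, 0), (43, 36, 2, 2, 3), (44, 74, 2, 2, 0),
    (45, 74, 2, 2, 3), (46, 37, 2, 3, 0), (47, 48, 1, 2, 3),
    (48, 42, 2, 2, 6), (49, 77, 2, 2, 2), (50, 75, 1, 3, 0),
]

# Coding of HAPPY as given in Example 1.1
HAPPY_LABELS = {1: "Not too happy", 2: "Pretty happy", 3: "Very happy"}

# Frequency of each happiness level among these respondents
happy_counts = Counter(row[3] for row in rows)
for code, label in HAPPY_LABELS.items():
    print(f"{label}: {happy_counts.get(code, 0)}")

# Average number of TV hours per day among these respondents
mean_tv = mean(row[4] for row in rows)
print(f"Mean TVHOURS: {mean_tv:.2f}")
```

Because only part of the sample is used here, the counts describe these 21 respondents rather than the full sample of 50.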

Definition 1.2: A population is a data set representing the entire entity of interest.

For example, the decennial census of the United States yields a data set containing information about all persons in the country at that time (theoretically all households correctly fill out the census forms). The number of persons per household as listed in the census data constitutes a population of family sizes in the United States.

Notice that the point of interest determines whether a data set is a population. Consider the reading comprehension scores of all third graders at a specific elementary school. This would be a population, if we were only interested in this particular school. If we intend to make statements about a broader group, then it is only a portion of the population.

As we shall see in discussions about statistical inference, it is important to define the population that we intend to study very carefully.

Definition 1.3: A sample is a data set consisting of a portion of a population. Normally a sample is obtained in such a way as to be representative of the population.

The Census Bureau conducts various activities during the years between each decennial census, such as the Current Population Survey. This survey samples a small number of scientifically chosen households to obtain information on changes in employment, living conditions, and other demographics. The data obtained constitute a sample from the population of all households in the country. Similarly, if four reading comprehension scores were selected for third graders at a specific school, then this would be a sample of size four from the population of all third graders.
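The reading comprehension illustration can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are hypothetical values invented for the sketch, and simple random sampling without replacement is used as one common way of seeking a representative sample; the text does not prescribe a particular sampling method at this point.

```python
import random

# Hypothetical reading comprehension scores for all third graders
# at one school (the population of interest in this sketch)
population = [72, 85, 90, 66, 78, 81, 95, 70, 88, 74, 69, 83]

# Fixed seed so the sketch gives the same sample each time it runs
rng = random.Random(2022)

# Simple random sample of size four, drawn without replacement
sample = rng.sample(population, 4)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print("Sample:", sample)
print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
```

The sample mean will generally differ from the population mean, which is the kind of discrepancy that statistical inference is designed to quantify.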

1.1.1 Data Sources

Although the emphasis in this book is on the statistical analysis of data, we must emphasize that proper data collection is just as important as proper analysis. We touch briefly on issues of data collection in Section 1.9. There are many more detailed texts on this subject (for example, Scheaffer et al. 2012). Remember, even the most sophisticated analysis procedures cannot provide good results from bad data.

In general, data are obtained from two broad categories of sources:

• Primary data are collected as part of the study.

• Secondary data are obtained from published sources, such as journals, governmental publications, news media, or almanacs.

There are several ways of obtaining primary data. Data are often obtained from simple observation of a process, such as characteristics and prices of homes sold in a particular geographic location, quality of products coming off an assembly line, political opinions of registered voters in the state of Texas, or even a person standing on a street corner and recording how many cars pass each hour during the day.

This kind of a study is called an observational study. Observational studies are often used to determine whether an association exists between two or more characteristics measured in the study. For example, a study to determine the relationship between high school student performance and the highest educational level of the student’s parents would be based on an examination of student performance and a history of the parents’ educational experiences. No cause-and-effect relationship could be determined, but a strong association might be the result of such a study. Note that an observational study does not involve any intervention by the researcher.

Often data used in studies involving statistics come from designed experiments. In a designed experiment researchers impose treatments and controls on the process and then observe the results and take measurements. Designed experiments can be used to help establish causation between two or more characteristics. For example, a study could be designed to determine if high school student performance is affected by a nutritious breakfast. This study may use as few as 25 typical urban high school students. The results of the study could potentially show that changes in breakfast cause changes in performance. The results observed in the sample would be generalized, or inferred, to the population of all urban high school students. Chapter 10 provides an introduction to experimental designs.

1.1.2 Using the Computer

Basic statistical analyses, including many of the applications in Chapters 1 through 7, can be carried out in spreadsheet software or even graphing calculators. More advanced graphics and analyses are best done with dedicated statistical software. Because of its commercial importance, we have largely used the SAS System in this text, but a number of other packages are available.

One common feature of almost every package is the way files containing the data are organized. A good rule of thumb is “one observation equals one row”; another is “one type of measurement (or variable) is one column.” Consider the data in Table 1.1. Arranged in a spreadsheet or a text file, the data would appear much as in that table, except that the right half of the table would be pasted below the left, to make 50 rows. Each row would correspond to a different respondent. Each column would correspond to a different item reported on that respondent.
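The "one observation equals one row, one variable is one column" layout can be sketched as follows. This is a minimal Python illustration, not the SAS workflow used in the text: the comma-separated layout and the column name RESPONDENT are assumptions made for the sketch, and the three rows are respondents 30 through 32 as read from Table 1.1.

```python
import csv
import io

# A small text file in the one-row-per-observation layout:
# a header line naming the variables, then one line per respondent
text = """RESPONDENT,AGE,SEX,HAPPY,TVHOURS
30,45,2,2,3
31,64,2,3,5
32,30,2,2,2
"""

# Read each row as a mapping from variable name to value
reader = csv.DictReader(io.StringIO(text))
observations = [
    {name: int(value) for name, value in row.items()} for row in reader
]

# Each row is one observation; each column is one variable
variables = list(observations[0].keys())
print(len(observations), "observations;", len(variables), "variables")

# Selecting one column gives all values of a single variable
ages = [obs["AGE"] for obs in observations]
print("AGE column:", ages)
```

Most statistical packages read files arranged this way directly, which is why the layout is worth adopting before any analysis begins.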

Although the input files have a certain similarity, each software package has its own style of output. Most will contain the same results, but these may be arranged and even labeled differently. The software’s documentation should fully explain the interpretation of the results.

1.2 Observations and Variables

A data set is composed of information from a set of units. Information from a unit is known as an observation. An observation consists of one or more pieces of information about the unit; these are called variables. Some examples:

• In a study of the effectiveness of a new headache remedy, the units are individual persons, of which 10 are given the new remedy and 10 are given an aspirin. The resulting data set has 20 observations and two variables: the medication used and a score indicating the severity of the headache.

• In a survey for determining TV viewing habits, the units are families. Usually there is one observation for each of thousands of families that have been contacted to participate in the survey. The variables describe the programs watched and descriptions of the characteristics of the families.

• In a study to determine the effectiveness of a college admissions test (e.g., SAT) the units are the freshmen at a university. There is one observation per unit and the variables are the students’ scores on the test and their first year’s GPA.

Variables that yield nonnumerical information are called qualitative variables. Qualitative variables are often referred to as categorical variables. Those that yield numerical measurements are called quantitative variables. Quantitative variables can be further classified as discrete or continuous. The diagram below summarizes these definitions:

Variables
  Qualitative
  Quantitative
    Discrete
    Continuous

Definition 1.4: A discrete variable can assume only a countable number of values. Typically, discrete variables are frequencies of observations having specific characteristics, but not all discrete variables are frequencies.

Definition 1.5: A continuous variable is one that can take any one of an uncountable number of values in an interval. Continuous variables are usually measured on a scale and, although they may appear discrete due to imprecise measurement, they can conceptually take any value in an interval and cannot therefore be enumerated.

In the field of statistical quality control, the term variable data is used when referring to data obtained on a continuous variable and attribute data when referring to data obtained on a discrete variable (usually the number of defectives or nonconformities observed).

In the preceding examples, the names of the headache remedies and names of TV programs watched are qualitative (categorical) variables. The headache severity score is a discrete numeric variable, while the incomes of TV-watching families, and SAT and GPA scores are continuous quantitative variables.
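The classification just described can be written down as a small lookup table. The sketch below is only an illustration in Python: the type `VariableType`, the helper `is_quantitative`, and the variable names are hypothetical labels chosen here, while the classification of each variable follows the preceding paragraph.

```python
from enum import Enum


class VariableType(Enum):
    """The three kinds of variables distinguished in this section."""
    QUALITATIVE = "qualitative (categorical)"
    DISCRETE = "quantitative, discrete"
    CONTINUOUS = "quantitative, continuous"


# Classification taken from the text; the keys are illustrative names
variable_types = {
    "headache_remedy": VariableType.QUALITATIVE,
    "tv_program": VariableType.QUALITATIVE,
    "headache_severity_score": VariableType.DISCRETE,
    "family_income": VariableType.CONTINUOUS,
    "sat_score": VariableType.CONTINUOUS,
    "first_year_gpa": VariableType.CONTINUOUS,
}


def is_quantitative(kind: VariableType) -> bool:
    """Discrete and continuous variables are both quantitative."""
    return kind is not VariableType.QUALITATIVE


for name, kind in variable_types.items():
    print(f"{name}: {kind.value}")
```

Recording the type of each variable in this way is useful because the choice of summary or graph usually depends on it.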

We will use the data set in Example 1.2 to present greater detail on various concepts and definitions regarding observations and variables.

Example 1.2 Housing Prices

In the fall of 2001, John Mode was offered a new job in a midsized city in east Texas. Obviously, the availability and cost of housing will influence his decision to accept, so he and his wife Marsha go to the Internet, find www.realtor.com, and after a few clicks find some 500 single-family residences for sale in that area. In order to make the task of investigating the housing market more manageable, they
