Applied Univariate, Bivariate, and Multivariate Statistics: Understanding Statistics for Social and Natural Scientists, With Applications in SPSS and R 2nd Edition Daniel J. Denis

APPLIED UNIVARIATE, BIVARIATE, AND MULTIVARIATE STATISTICS:

UNDERSTANDING STATISTICS FOR SOCIAL AND NATURAL SCIENTISTS, WITH APPLICATIONS IN SPSS AND R

Second Edition

DANIEL J. DENIS

This second edition first published 2021 © 2021 John Wiley & Sons, Inc.

Edition History

John Wiley and Sons, Inc. (1e. 2016)

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Daniel J. Denis to be identified as the author of this work has been asserted in accordance with law.

Registered Office

John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office

111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty

While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data applied for: ISBN 978-1-119-58304-2

Cover Design: Wiley
Cover Image: © tatianazaets/Getty Images

Set in 10/12pt Times LT Std by SPi Global, Pondicherry, India

10 9 8 7 6 5 4 3 2 1

To Kaiser

About the Companion Website

1 Preliminary Considerations

1.1 The Philosophical Bases of Knowledge: Rationalistic Versus Empiricist Pursuits

1.2 What is a “Model”?

1.3 Social Sciences Versus Hard Sciences

1.4 Is Complexity a Good Depiction of Reality? Are Multivariate Methods Useful?

1.5 Causality

1.6 The Nature of Mathematics: Mathematics as a Representation of Concepts

1.7 As a Scientist, How Much Mathematics Do You Need to Know?

1.8 Statistics and Relativity

1.9 Experimental Versus Statistical Control

1.10 Statistical Versus Physical Effects

1.11 Understanding What “Applied Statistics” Means

Review Exercises

Further Discussion and Activities

2 Introductory Statistics

2.1 Densities and Distributions

2.1.1 Plotting Normal Distributions

2.1.2 Binomial Distributions

2.1.3 Normal Approximation

2.1.4 Joint Probability Densities: Bivariate and Multivariate Distributions

2.2 Chi-Square Distributions and Goodness-of-Fit Test

2.2.1 Power for Chi-Square Test of Independence

2.3 Sensitivity and Specificity

2.4 Scales of Measurement: Nominal, Ordinal, Interval, Ratio

2.4.1 Nominal Scale

2.4.2 Ordinal Scale

2.4.3 Interval Scale

2.4.4 Ratio Scale

2.5 Mathematical Variables Versus Random Variables

2.6 Moments and Expectations

2.6.1 Sample and Population Mean Vectors

2.7 Estimation and Estimators

2.8 Variance

2.9 Degrees of Freedom

2.10 Skewness and Kurtosis

2.11 Sampling Distributions

2.11.1 Sampling Distribution of the Mean

2.12 Central Limit Theorem

2.13 Confidence Intervals

2.14 Maximum Likelihood

2.15 Akaike’s Information Criteria

2.16 Covariance and Correlation

2.17 Psychometric Validity, Reliability: A Common Use of Correlation Coefficients

2.18 Covariance and Correlation Matrices

2.19 Other Correlation Coefficients

2.20 Student’s t Distribution

2.20.1 t-Tests for One Sample

2.20.2 t-Tests for Two Samples

2.20.3 Two-Sample t-Tests in R

2.21 Statistical Power

2.21.1 Visualizing Power

2.22 Power Estimation Using R and G*Power

2.22.1 Estimating Sample Size and Power for Independent Samples t-Test

2.23 Paired-Samples t-Test: Statistical Test for Matched-Pairs (Elementary Blocking) Designs

2.24 Blocking With Several Conditions

2.25 Composite Variables: Linear Combinations

2.26 Models in Matrix Form

2.27 Graphical Approaches

2.27.1 Box-and-Whisker Plots

2.28 What Makes a p-Value Small? A Critical Overview and Practical Demonstration of Null Hypothesis Significance Testing

2.28.1 Null Hypothesis Significance Testing (NHST): A Legacy of Criticism

2.28.2 The Make-Up of a p-Value: A Brief Recap and Summary

2.28.3 The Issue of Standardized Testing: Are Students in Your School Achieving More than the National Average?

2.28.4 Other Test Statistics

2.28.5 The Solution

2.28.6 Statistical Distance: Cohen’s d

2.28.7 What Does Cohen’s d Actually Tell Us?

2.28.8 Why and Where the Significance Test Still Makes Sense

2.29 Chapter Summary and Highlights

Review Exercises

Further Discussion and Activities

3 Analysis of Variance: Fixed Effects Models

3.1 What is Analysis of Variance? Fixed Versus Random Effects

3.1.1 Small Sample Example: Achievement as a Function of Teacher

3.1.2 Is Achievement a Function of Teacher?

3.2 How Analysis of Variance Works: A Big Picture Overview

3.2.1 Is the Observed Difference Likely? ANOVA as a Comparison (Ratio) of Variances

3.3 Logic and Theory of ANOVA: A Deeper Look

3.3.1 Independent-Samples t-Tests Versus Analysis of Variance

3.3.2 The ANOVA Model: Explaining Variation

3.3.3 Breaking Down a Deviation

3.3.4 Naming the Deviations

3.3.5 The Sums of Squares of ANOVA

3.4 From Sums of Squares to Unbiased Variance Estimators: Dividing by Degrees of Freedom

3.5 Expected Mean Squares for One-Way Fixed Effects Model: Deriving the F-ratio

3.6 The Null Hypothesis in ANOVA

3.7 Fixed Effects ANOVA: Model Assumptions

3.8 A Word on Experimental Design and Randomization

3.9 A Preview of the Concept of Nesting

3.10 Balanced Versus Unbalanced Data in ANOVA Models

3.11 Measures of Association and Effect Size in ANOVA: Measures of Variance Explained

3.11.1 η² Eta-Squared

3.11.2 Omega-Squared

3.12 The F-Test and the Independent Samples t-Test

3.13 Contrasts and Post-Hocs

3.13.1 Independence of Contrasts

3.13.2 Independent Samples t-Test as a Linear Contrast

3.14 Post-Hoc Tests

3.14.1 Newman–Keuls and Tukey HSD

3.14.2 Tukey HSD

3.14.3 Scheffé Test

3.14.4 Other Post-Hoc Tests

3.14.5 Contrast Versus Post-Hoc? Which Should I be Doing?

3.15 Sample Size and Power for ANOVA: Estimation With R and G*Power

3.15.1 Power for ANOVA in R and G*Power

3.15.2 Computing f

3.16 Fixed Effects One-Way Analysis of Variance in R: Mathematics Achievement as a Function of Teacher

3.16.1 Evaluating Assumptions

3.16.2 Post-Hoc Tests on Teacher

3.17 Analysis of Variance Via R’s lm

3.18 Kruskal-Wallis Test in R and the Motivation Behind Nonparametric Tests

3.19 ANOVA in SPSS: Achievement as a Function of Teacher

3.20 Chapter Summary and Highlights

Review Exercises

Further Discussion and Activities

4 Factorial Analysis of Variance: Modeling Interactions

4.1 What is Factorial Analysis of Variance?

4.2 Theory of Factorial ANOVA: A Deeper Look

4.2.1 Deriving the Model for Two-Way Factorial ANOVA

4.2.2 Cell Effects

4.2.3 Interaction Effects

4.2.4 Cell Effects Versus Interaction Effects

4.2.5 A Model for the Two-Way Fixed Effects ANOVA

4.3 Comparing One-Way ANOVA to Two-Way ANOVA: Cell Effects in Factorial ANOVA Versus Sample Effects in One-Way ANOVA

4.4 Partitioning the Sums of Squares for Factorial ANOVA: The Case of Two Factors

4.4.1 SS Total: A Measure of Total Variation

4.4.2 Model Assumptions: Two-Way Factorial Model

4.4.3 Expected Mean Squares for Factorial Design

4.4.4 Recap of Expected Mean Squares

4.5 Interpreting Main Effects in the Presence of Interactions

4.6 Effect Size Measures

4.7 Three-Way, Four-Way, and Higher Models

4.8 Simple Main Effects

4.9 Nested Designs

4.9.1 Varieties of Nesting: Nesting of Levels Versus Subjects

4.10 Achievement as a Function of Teacher and Textbook: Example of Factorial ANOVA in R

4.10.1 Comparing Models Through AIC

4.10.2 Visualizing Main Effects and Interaction Effects Simultaneously

4.10.3 Simple Main Effects for Achievement Data: Breaking Down Interaction Effects

4.11 Interaction Contrasts

4.12 Chapter Summary and Highlights

Review Exercises

5 Introduction to Random Effects and Mixed Models

5.1 What is Random Effects Analysis of Variance?

5.2 Theory of Random Effects Models

5.3 Estimation in Random Effects Models

5.3.1 Transitioning from Fixed Effects to Random Effects

5.3.2 Expected Mean Squares for MS Between and MS Within

5.4 Defining Null Hypotheses in Random Effects Models

5.4.1 F-Ratio for Testing H₀

5.5 Comparing Null Hypotheses in Fixed Versus Random Effects Models: The Importance of Assumptions

5.6 Estimating Variance Components in Random Effects Models: ANOVA, ML, REML Estimators

5.6.1 ANOVA Estimators of Variance Components

5.6.2 Maximum Likelihood and Restricted Maximum Likelihood

5.7 Is Achievement a Function of Teacher? One-Way Random Effects Model in R

5.7.1 Proportion of Variance Accounted for by Teacher

5.8 R Analysis Using REML

5.9 Analysis in SPSS: Obtaining Variance Components

5.10 Factorial Random Effects: A Two-Way Model

5.11 Fixed Effects Versus Random Effects: A Way of Conceptualizing Their Differences

5.12 Conceptualizing the Two-Way Random Effects Model: The Make-Up of a Randomly Chosen Observation

5.13 Sums of Squares and Expected Mean Squares for Random Effects: The Contaminating Influence of Interaction Effects

5.13.1 Testing Null Hypotheses

5.14 You Get What You Go In With: The Importance of Model Assumptions and Model Selection

5.15 Mixed Model Analysis of Variance: Incorporating Fixed and Random Effects

5.15.1 Mixed Model in R

5.16 Mixed Models in Matrices

5.17 Multilevel Modeling as a Special Case of the Mixed Model: Incorporating Nesting and Clustering

5.18 Chapter Summary and Highlights

Review Exercises

6 Randomized Blocks and Repeated Measures

6.1 What is a Randomized Block Design?

6.2 Randomized Block Designs: Subjects Nested Within Blocks

6.3 Theory of Randomized Block Designs

6.3.1 Nonadditive Randomized Block Design

6.3.2 Additive Randomized Block Design

6.4 Tukey Test for Nonadditivity

6.5 Assumptions for the Covariance Matrix

6.6 Intraclass Correlation

6.7 Repeated Measures Models: A Special Case of Randomized Block Designs

6.8 Independent Versus Paired-Samples t-Test

6.9 The Subject Factor: Fixed or Random Effect?

6.10 Model for One-Way Repeated Measures Design

6.10.1 Expected Mean Squares for Repeated Measures Models

6.11 Analysis Using R: One-Way Repeated Measures: Learning as a Function of Trial

6.12 Analysis Using SPSS: One-Way Repeated Measures: Learning as a Function of Trial

6.12.1 Which Results Should Be Interpreted?

6.13 SPSS Two-Way Repeated Measures Analysis of Variance Mixed Design: One Between Factor, One Within Factor

6.13.1 Another Look at the Between-Subjects Factor

6.14 Chapter Summary and Highlights

Review Exercises

7 Linear Regression

7.1 Brief History of Regression

7.2 Regression Analysis and Science: Experimental Versus Correlational Distinctions

7.3 A Motivating Example: Can Offspring Height Be Predicted?

7.4 Theory of Regression Analysis: A Deeper Look

7.5 Multilevel Yearnings

7.6 The Least-Squares Line

7.7 Making Predictions Without Regression

7.8 More about εᵢ

7.9 Model Assumptions for Linear Regression

7.9.1 Model Specification

7.9.2 Measurement Error

7.10 Estimation of Model Parameters in Regression

7.10.1 Ordinary Least-Squares (OLS)

7.11 Null Hypotheses for Regression

7.12 Significance Tests and Confidence Intervals for Model Parameters

7.13 Other Formulations of the Regression Model

7.14 The Regression Model in Matrices: Allowing for More Complex Multivariable Models

7.15 Ordinary Least-Squares in Matrices

7.16 Analysis of Variance for Regression

7.17 Measures of Model Fit for Regression: How Well Does the Linear Equation Fit?

7.18 Adjusted R²

7.19 What “Explained Variance” Means and More Importantly, What It Does Not Mean

7.20 Values Fit by Regression

7.21 Least-Squares Regression in R: Using Matrix Operations

7.22 Linear Regression Using R

7.23 Regression Diagnostics: A Check on Model Assumptions

7.23.1 Understanding How Outliers Influence a Regression Model

7.23.2 Examining Outliers and Residuals

7.23.3 Detecting Outliers

7.23.4 Normality of Residuals

7.24 Regression in SPSS: Predicting Quantitative from Verbal

7.25 Power Analysis for Linear Regression in R

7.26 Chapter Summary and Highlights

Review Exercises

Further Discussion and Activities

8 Multiple Linear Regression

8.1 Theory of Partial Correlation

8.2 Semipartial Correlations

8.3 Multiple Regression

8.4 Some Perspective on Regression Coefficients: “Experimental Coefficients”?

8.5 Multiple Regression Model in Matrices

8.6 Estimation of Parameters

8.7 Conceptualizing Multiple R

8.8 Interpreting Regression Coefficients: Correlated Versus Uncorrelated Predictors

8.9 Anderson’s Iris Data: Predicting Sepal Length From Petal Length and Petal Width

8.10 Fitting Other Functional Forms: A Brief Look at Polynomial Regression

8.11 Measures of Collinearity in Regression: Variance Inflation Factor and Tolerance

8.12 R-squared as a Function of Partial and Semipartial Correlations: The Stepping Stones to Forward and Stepwise Regression

8.13 Model-Building Strategies: Simultaneous, Hierarchical, Forward, Stepwise

8.13.1 Simultaneous, Hierarchical, Forward

8.13.2 Stepwise Regression

8.13.3 Selection Procedures in R

8.13.4 Which Regression Procedure Should Be Used? Concluding Comments and Recommendations Regarding Model-Building

8.14 Power Analysis for Multiple Regression

8.15 Introduction to Statistical Mediation: Concepts and Controversy

8.15.1 Statistical Versus True Mediation: Some Philosophical Pitfalls in the Interpretation of Mediation Analysis

8.16 Brief Survey of Ridge and Lasso Regression: Penalized Regression Models and the Concept of Shrinkage

8.17 Chapter Summary and Highlights

Review Exercises

Further Discussion and Activities

9 Interactions in Multiple Linear Regression

9.1 The Additive Regression Model With Two Predictors

9.2 Why the Interaction is the Product Term xᵢzᵢ: Drawing an Analogy to Factorial ANOVA

9.3 A Motivating Example of Interaction in Regression: Crossing a Continuous Predictor With a Dichotomous Predictor

9.4 Analysis of Covariance

9.4.1 Is ANCOVA “Controlling” for Anything?

9.5 Continuous Moderators

9.6 Summing Up the Idea of Interactions in Regression

9.7 Do Moderators Really “Moderate” Anything?

9.7.1 Some Philosophical Considerations

9.8 Interpreting Model Coefficients in the Context of Moderators

9.9 Mean-Centering Predictors: Improving the Interpretability of Simple Slopes

9.10 Multilevel Regression: Another Special Case of the Mixed Model

9.11 Chapter Summary and Highlights

Review Exercises

10 Logistic Regression and the Generalized Linear Model

10.1 Nonlinear Models

10.2 Generalized Linear Models

10.2.1 The Logic of the Generalized Linear Model: How the Link Function Transforms Nonlinear Response Variables

10.3 Canonical Links

10.3.1 Canonical Link for Gaussian Variable

10.4 Distributions and Generalized Linear Models

10.4.1 Logistic Models

10.4.2 Poisson Models

10.5 Dispersion Parameters and Deviance

10.6 Logistic Regression

10.6.1 A Generalized Linear Model for Binary Responses

10.6.2 Model for Single Predictor

10.7 Exponential and Logarithmic Functions

10.7.1 Logarithms

10.7.2 The Natural Logarithm

10.8 Odds and the Logit

10.9 Putting It All Together: Logistic Regression

10.9.1 The Logistic Regression Model

10.9.2 Interpreting the Logit: A Survey of Logistic Regression Output

10.10 Logistic Regression in R

10.10.1 Challenger O-ring Data

10.11 Challenger Analysis in SPSS

10.11.1 Predictions of New Cases

10.12 Sample Size, Effect Size, and Power

10.13 Further Directions

10.14 Chapter Summary and Highlights

Review Exercises

11 Multivariate Analysis of Variance

11.1 A Motivating Example: Quantitative and Verbal Ability as a Variate

11.2 Constructing the Composite

11.3 Theory of MANOVA

11.4 Is the Linear Combination Meaningful?

11.4.1 Control Over Type I Error Rate

11.4.2 Covariance Among Dependent Variables

11.4.3 Rao’s Paradox

11.5 Multivariate Hypotheses

11.6 Assumptions of MANOVA

11.7 Hotelling’s T²: The Case of Generalizing From Univariate to Multivariate

11.8 The Covariance Matrix S

11.9 From Sums of Squares and Cross-Products to Variances and Covariances

11.10 Hypothesis and Error Matrices of MANOVA

11.11 Multivariate Test Statistics

11.11.1 Pillai’s Trace

11.11.2 Lawley–Hotelling’s Trace

11.12 Equality of Covariance Matrices

11.13 Multivariate Contrasts

11.14 MANOVA in R and SPSS

11.14.1 Univariate Analyses

11.15 MANOVA of Fisher’s Iris Data

11.16 Power Analysis and Sample Size for MANOVA

11.17 Multivariate Analysis of Covariance and Multivariate Models: A Bird’s Eye View of Linear Models

11.18 Chapter Summary and Highlights

Review Exercises

Further Discussion and Activities

12 Discriminant Analysis

12.1 What is Discriminant Analysis? The Big Picture on the Iris Data

12.2 Theory of Discriminant Analysis

12.2.1 Discriminant Analysis for Two Populations

12.2.2 Substituting the Maximizing Vector into Squared Standardized Difference

12.3 LDA in R and SPSS

12.4 Discriminant Analysis for Several Populations

12.4.1 Theory for Several Populations

12.5 Discriminating Species of Iris: Discriminant Analyses for Three Populations

12.6 A Note on Classification and Error Rates

12.6.1 Statistical Lives

12.7 Discriminant Analysis and Beyond

12.8 Canonical Correlation

12.9 Motivating Example for Canonical Correlation: Hotelling’s 1936 Data

12.10 Canonical Correlation as a General Linear Model

12.11 Theory of Canonical Correlation

12.12 Canonical Correlation of Hotelling’s Data

12.13 Canonical Correlation on the Iris Data: Extracting Canonical Correlation From Regression, MANOVA, LDA

12.14 Chapter Summary and Highlights

Review Exercises

Further Discussion and Activities

13 Principal Components Analysis

13.1 History of Principal Components Analysis

13.2 Hotelling 1933

13.3 Theory of Principal Components Analysis

13.3.1 The Theorem of Principal Components Analysis

13.4 Eigenvalues as Variance

13.5 Principal Components as Linear Combinations

13.6 Extracting the First Component

13.6.1 Sample Variance of a Linear Combination

13.7 Extracting the Second Component

13.8 Extracting Third and Remaining Components

13.9 The Eigenvalue as the Variance of a Linear Combination Relative to its Length

13.10 Demonstrating Principal Components Analysis: Pearson’s 1901 Illustration

13.11 Scree Plots

13.12 Principal Components Versus Least-Squares Regression Lines

13.13 Covariance Versus Correlation Matrices: Principal Components and Scaling

13.14 Principal Components Analysis Using SPSS

13.15 Chapter Summary and Highlights

Review Exercises

Further Discussion and Activities

14 Factor Analysis

14.1 History of Factor Analysis

14.2 Factor Analysis at a Glance

14.3 Exploratory Versus Confirmatory Factor Analysis

14.4 Theory of Factor Analysis: The Exploratory Factor-Analytic Model

14.5 The Common Factor-Analytic Model

14.6 Assumptions of the Factor-Analytic Model

14.7 Why Model Assumptions are Important

14.8 The Factor Model as an Implication for the Covariance Matrix Σ

14.9 Again, Why is Σ = ΛΛ′ + ψ So Important a Result?

14.10 The Major Critique Against Factor Analysis: Indeterminacy and the Nonuniqueness of Solutions

14.11 Has Your Factor Analysis Been Successful?

14.12 Estimation of Parameters in Exploratory Factor Analysis

14.13 Principal Factor

14.14 Maximum Likelihood

14.15 The Concepts (and Criticisms) of Factor Rotation

14.16 Varimax and Quartimax Rotation

14.17 Should Factors Be Rotated? Is That Not Cheating?

14.18 Sample Size for Factor Analysis

14.19 Principal Components Analysis Versus Factor Analysis: Two Key Differences

14.19.1 Hypothesized Model and Underlying Theoretical Assumptions

14.19.2 Solutions are Not Invariant in Factor Analysis

14.20 Principal Factor in SPSS: Principal Axis Factoring

14.21 Bartlett Test of Sphericity and Kaiser-Meyer-Olkin Measure of Sampling Adequacy (MSA)

14.22 Factor Analysis in R: Holzinger and Swineford (1939)

14.23 Cluster Analysis

14.24 What is Cluster Analysis? The Big Picture

14.25 Measuring Proximity

14.26 Hierarchical Clustering Approaches

14.27 Nonhierarchical Clustering Approaches

14.28 K-Means Cluster Analysis in R

14.29 Guidelines and Warnings About Cluster Analysis

14.30 A Brief Look at Multidimensional Scaling

14.31 Chapter Summary and Highlights

Review Exercises

Further Discussion and Activities

15 Path Analysis and Structural Equation Modeling

15.1 Path Analysis: A Motivating Example Predicting IQ Across Generations

15.2 Path Analysis and “Causal Modeling”

15.3 Early Post-Wright Path Analysis: Predicting Child’s IQ (Burks, 1928)

15.4 Decomposing Path Coefficients

15.5 Path Coefficients and Wright’s Contribution

15.6 Path Analysis in R, A Quick Overview: Modeling Galton’s Data

15.6.1 Path Model in AMOS

15.7 Confirmatory Factor Analysis: The Measurement Model

15.7.1 Confirmatory Factor Analysis as a Means of Evaluating Construct Validity and Assessing Psychometric Qualities

15.8 Structural Equation Models

15.9 Direct, Indirect, and Total Effects

15.10 Theory of Statistical Modeling: A Deeper Look Into Covariance Structures and General Modeling

15.11 The Discrepancy Function and Chi-Square

15.12 Identification

15.13 Disturbance Variables

15.14 Measures and Indicators of Model Fit

15.15 Overall Measures of Model Fit

15.15.1 Root Mean Square Residual and Standardized Root Mean Square Residual

15.15.2 Root Mean Square Error of Approximation

15.16 Model Comparison Measures: Incremental Fit Indices

15.17 Which Indicator of Model Fit is Best?

15.18 Structural Equation Model in R

15.19 How All Variables Are Latent: A Suggestion for Resolving the Manifest-Latent Distinction

15.20 The Structural Equation Model as a General Model: Some Concluding Thoughts on Statistics and Science

15.21 Chapter Summary and Highlights

Review Exercises

Further Discussion and Activities

PREFACE

Technology is not progress. Empathy is. The dogs are watching us.

Now in its second edition, this book provides a general introduction and overview of univariate through to multivariate statistical modeling techniques typically used in the social, behavioral, and related sciences. Students reading this book will come from a variety of fields, including psychology, sociology, education, political science, biology, medicine, economics, business, forestry, nursing, chemistry, and law, among others. The book should be of interest to anyone who desires a relatively compact and succinct survey and overview of statistical techniques useful for analyzing data in these fields, while also wanting to understand and appreciate some of the theory behind these tools. Spanning several statistical methods, the focus of the book is naturally one of breadth rather than of depth into any one particular technique, focusing on the unifying principles as well as what substantively (scientifically) can or cannot be concluded from a method when applied to real data. These are topics usually encountered by upper division undergraduate or beginning graduate students in the aforementioned fields.

The first edition has also been used widely as a reference resource for both students and researchers working on dissertations, manuscripts, and other publications. The book aims to provide the student with a “big picture” overview of how applied statistical modeling works, while at the same time providing him or her the opportunity in many places to implement, to some extent at least, many of these models using SPSS and/or R software. References and recommendations for further reading are provided throughout the text for readers who wish to pursue these topics further. Each topic and software demonstration can literally be “unpacked” into a deeper discussion, and so long as the reader is aware of this, they will appreciate this book for what it is: a bird’s eye view of applied statistics, and not the “one and only” source they should refer to when conducting analyses. The book does not pretend to be a complete compendium of each statistical method it discusses, but rather is a survey of each method in hopes of conveying how these methods generally “work,” what technical elements unite virtually all of them, and the benefits and limitations of how they may be used in addressing scientific questions.

This second edition has been revised to make the book clearer and more accessible compared to the first edition. The book also contains a gentle introduction (“foot in the door”) to a variety of new topics that did not appear in the first edition. All chapters have been edited to varying degrees to improve clarity of prose and in places provide more information or clarification of the concept under discussion. The following is a summary of updates and revisions in the second edition:

• Significant revision and corrections of errata appearing in the first edition. The second edition is a stronger and better book because it has been thoroughly re-read and edited in places where rewording was required. In this sense, the second edition has undergone very much “vetting” since the first edition. At the same time, some sections have been entirely deleted from the first edition due to their explanations being too brief to make them worthwhile. These are sections that did not seem to “work” in the first edition, so they were omitted in the second. This hopefully will help improve the “flow” of the book without the reader stumbling across sections that are insufficiently explained.

• Bolded text is used quite liberally to indicate emphasis and signal areas that are key for a good understanding of applied statistics. “Accentuate” bold text when reading the book. These are the key words and themes around which the book was built.

• The images in many chapters have been reproduced to make them clearer and more detailed than in the first edition. This is thanks to Wiley’s team, who have reconstructed many of the figures and diagrams.

• Chapter 2 now includes a brief survey of psychometric validity and reliability, along with a simple demonstration of computing Cronbach’s alpha in SPSS.

• Chapter 3 features a bit more detail and better introduction on the nature of nonparametric statistics in the context of the analysis of variance.

• Chapters 7 and 8 on regression have been revised and edited in places to include expanded or new discussion, including a demonstration of power analysis using G*Power in addition to R. Chapter 8 now includes a more thorough and deeper discussion of model selection, and also features a new section that briefly introduces ridge and lasso regression, both penalized regression methods.

• Chapter 9 on interactions in regression now contains a brief software demonstration of the analysis of covariance (ANCOVA), conceptualized as a special case of the wider regression model. Some of the theory of the first edition has been removed as it did not seem to serve its intended goal. For readers who would like to delve into the subject of interactions in regression more deeply, additional sources and recommendations are provided.

• Chapter 11 now includes R and SPSS code for obtaining Hotelling’s T². While readers can simply use a MANOVA program to evaluate mean vector differences on two groups, the inclusion of the relevant software code for Hotelling’s T² is useful to make the MANOVA chapter a bit more complete.

• Chapter 14 on exploratory factor analysis now concludes with a brief introduction and overview of the technique of multidimensional scaling should readers wish to pursue this topic further. By relating the technique somewhat to previously learned techniques, the reader is encouraged to see the learning of new techniques as extending their current knowledge base. This is due to the book emphasizing foundations and fundamental principles of applied statistics, rather than a series of topics seemingly unrelated.

• Chapter 15 has been expanded slightly to include a basic demonstration of data analysis using AMOS software. Many users who perform SEM models use AMOS instead of R, and so it seemed appropriate to include a small sample of AMOS output in the context of building a simple path model. Additional references for learning and using AMOS are also provided for those who wish to venture further into structural equation models.

• The inclusion in select places of brief discussions of, and references to, “Big Data,” as well as data science and machine learning, and why understanding fundamentals and classical statistics is even more important today than ever before in light of these advancements. These fields are heavily computational, but for the most part, have technical origins in fundamental statistics and mathematics. We try our best to key the reader to where these topics “fit” in the wider data analytic landscape, so if they choose to embark on these topics in future study, or further their study of computer science, for example, they have a sense of how many of these techniques build on foundational elements.

• Select chapter exercises have been edited so as to clarify what they are asking, while a few others have been deleted since they did not seem to work well in the first edition of the book. The majority of the exercises remain conceptually based so as to encourage a deep and far-reaching understanding of the material. Select data-analytic exercises have been either edited or substituted for better ones.

• Additional references and citations have been added to supplement the book, which already features many classic references to pioneers in applied statistics.

• An on-line Appendix featuring a review of essential mathematics is available at www.datapsyc.com.

ACKNOWLEDGMENTS

I am indebted to all at Wiley who helped in the production of the book, both directly and indirectly. A sincere thank you to Mindy Okura-Marszycki, Editor at Wiley, who supported the writing of this second edition (the first edition was edited by Steve Quigley and Jon Gurstelle). Thank you as well to all other associates, both professional and unprofessional, who in one way or another influenced my own learning as it concerns statistics and research. Comments, criticism, corrections, and questions about the book are most welcome. Please e-mail your feedback to daniel.denis@umontana.edu or email@datapsyc.com. Data sets and errata are available at www.datapsyc.com.

DANIEL J. DENIS

ABOUTTHECOMPANIONWEBSITE

Thisbookisaccompaniedbyacompanionwebsite:

www.wiley.com/go/denis/appliedstatistics2e

The website contains the appendix and preface of the first edition.

1 PRELIMINARY CONSIDERATIONS

Still, social science is possible, and needs a strong empirical component. Even statistical technique may prove useful – from time to time.

(Freedman, 1987, As Others See Us: A Case in Path Analysis, p. 125)

Before we delve into the complexities and details of the field of applied statistics, we first lightly survey some germane philosophical issues that lie at the heart of where statistics fit in the bigger picture of science. Though this book is primarily about applied statistical modeling, the end goal is to use statistical modeling in the context of scientific exploration and discovery. To have an appreciation for how statistics are used in science, one must first have a sense of some essential foundations so that one can situate where statistics finds itself within the larger frame of scientific investigation.

1.1 THE PHILOSOPHICAL BASES OF KNOWLEDGE: RATIONALISTIC VERSUS EMPIRICIST PURSUITS

All knowledge can be said to be based on fundamental philosophical assumptions, and hence empirical knowledge derived from the sciences is no different. There have, historically, been two means by which knowledge is thought to be attained. The rationalist derives knowledge primarily from mental, cognitive pursuits. In this sense, “real objects” are those originating from the mind via reasoning and the like, rather than obtained empirically. The empiricist, on the other hand, derives knowledge from experience, that is, one might crudely say, “objective” reality. To the empiricist, knowledge is in the form of tangible objects in the “real world.”

Ideally, science should possess a healthy blend of both perspectives. On the one hand, science should, of course, be grounded in objective objects. The objects one studies should be independent of the psychical realm. A cup of coffee is a cup of coffee regardless of our belief or theory about the existence of the cup. On the other hand, void of any rationalist activity, science becomes the study of objects for which we are not allowed to assign meaning. For example, the behavior of a pigeon in a Skinner box¹ (see Figure 1.1) can be documented as to the number of times it presses on the lever for the reward of a food pellet. That the pigeon presses on the lever is empirical reality. Why the pigeon presses on the lever is theoretical speculation, of which there could be many competing possibilities. Observing data is fine, but without theory, we have very little “guidance” to either explain current observations or predict new ones. B. F. Skinner’s theory of operant conditioning, being such that the pigeon presses the lever because it is reinforced to do so, is a prime example of where a wedding of rationalism and empiricism takes place. The theory attempts to explain or account for the pigeon’s behavior. It is a narrative for why the pigeon does what it does.

FIGURE 1.1 Observing the behavior of a pigeon in a Skinner box. Source: Dtarazona (1998). https://commons.wikimedia.org/wiki/File:UNMSM_PsiExperimental_1998_2.jpg. Public Domain.

Applied Univariate, Bivariate, and Multivariate Statistics: Understanding Statistics for Social and Natural Scientists, With Applications in SPSS and R, Second Edition. Daniel J. Denis. © 2021 John Wiley & Sons, Inc. Published 2021 by John Wiley & Sons, Inc. Companion Website: www.wiley.com/go/denis/appliedstatistics2e

Of course, theorizing can go too far, much too far. One must be cautious not to “over-theorize” too emphatically without acknowledging the absence of empirical backing. Is there anything wrong with hypothesizing that cloudy days are associated with depressive moods? No, so long as you are prepared to state what evidence exists that may support or contradict your theory. If no evidence exists, you may still theorize, but you owe it to your audience to admit the lack of current empirical support for your hypothesis.

1 B. F. Skinner was a psychologist known for his theory of operant conditioning within the behaviorist tradition in psychology. One of Skinner’s primary investigatory tools was that of observing and recording the conditions that would lead a rat, pigeon, or other animal, to press a lever for a food pellet in a small chamber. This chamber came to be known as the Skinner box. For a read of Skinner, see Rutherford (2009) and Fancher and Rutherford (2011).

As an example of “heightened theorizing,” recall the missing Malaysia Airlines Flight 370, a Boeing 777 aircraft that vanished, apparently without a trace, en route from Kuala Lumpur to Beijing in March of 2014. Media were sometimes criticized for proposing numerous theories as to its disappearance, ranging from the plane being flown into a hidden location to it being hijacked or a result of pilot suicide. One theory even speculated that the plane was swallowed by a black hole! Speculation is fine and theorizing is a necessary scientific as well as human activity, so long as one is upfront about existent available evidence to support the theory one is advancing. Indeed, one could assign probabilities to competing theories and revise such probabilities as new data become available. This is precisely what Bayesian philosophers and statisticians are wont to do. A theory should only be considered credible, however, when empirical reality and the theory coincide (see Figure 1.2). The fit may not be perfect, and seldom if ever is, but when the rational coincides well with the empirical, credibility of the idea is at least tentatively assured, at least until potentially new evidence debunks it (e.g., the fall of Newtonian physics).
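The idea of assigning probabilities to competing theories and revising them as data arrive can be sketched in a few lines of Python. This is only a minimal illustration of Bayes’ rule, not a method taken from the text: the three theory names, the prior probabilities, and the likelihoods are all hypothetical numbers invented for the example.

```python
def update(priors, likelihoods):
    """Bayes' rule: posterior is proportional to prior times likelihood."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    # Divide by the total so the posterior probabilities sum to 1.
    return {h: value / total for h, value in unnormalized.items()}

# Hypothetical prior credences in three competing theories (made-up numbers).
priors = {"theory_A": 0.5, "theory_B": 0.3, "theory_C": 0.2}

# Hypothetical probability of the newly observed data under each theory
# (also made up for illustration).
likelihoods = {"theory_A": 0.1, "theory_B": 0.6, "theory_C": 0.3}

posteriors = update(priors, likelihoods)
for name in priors:
    print(f"{name}: prior = {priors[name]:.2f}, posterior = {posteriors[name]:.3f}")
```

In this made-up case, the theory under which the new data were most probable gains credibility at the expense of the others, even though it did not start out as the favorite.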

We must also ensure that our theories are not overly convenient narratives fit to data. If you have ever witnessed a sporting event where the deciding point occurred by the lucky bounce of a puck in hockey or the breezy push of a tennis ball in midair, only to hear post-match commentators laud the winning team or individual as suddenly so much better than the losing team, then you know what “convenient narratives” are all about. We must be careful not to exaggerate how well our given theory fits data simply because a few data points went “our way.” George Box once said that all models are wrong but some are useful. In any scientific endeavor, guard against falling in love with your theory or otherwise exaggerating it far beyond what the data suggest. Otherwise, it no longer is a legitimate theory, but rather is simply your brand and more a product of subjective bias and “career-building” than anything scientific. After 20 years of advocating a theory, is the researcher you are speaking to really prepared to “accept” evidence that contradicts his or her theory? They have a lot at stake in that theory; their whole career may have been built upon it. Are they really willing to accept “defeat” of it? Indeed, one reason I believe economic predictions, for instance, are often looked upon with suspicion is that economists, like psychologists (and theoretical physicists, for that matter), are far too quick to advance theories as though they were near facts. “Sexy theories” sound great and may be marketable to uncritical consumers and media (make an outlandish claim on cable, you’ll be a hero!), but to good scientists, theories are always only as good as the data that exist to support them. Science is exciting, to be sure, but should not be overly speculative. If you are looking for fireworks, then you are best to choose a field other than science.

1.2 WHAT IS A “MODEL”?

The word “model” is perhaps the most popular word featured in textbooks, tutorials, and lectures having anything to do with the application of quantitative methods. Attempting to define just what is a model in statistics can be a bit challenging. We discuss the concept by referring to Everitt’s definition:

A description of the assumed structure of a set of observations that can range from a fairly imprecise verbal account to, more usually, a formalized mathematical expression of the process assumed to have generated the observed data. The purpose of such a description is to aid in understanding the data. (Everitt, 2002, p. 247)

FIGURE 1.2 “Model fit” as an overlap of data with theory.

FIGURE 1.3 Hebbian Yerkes–Dodson performance–arousal curve. Source: Diamond et al. (2007). Licensed under CC BY 3.0.

Models are, essentially, and perhaps somewhat crudely, equations. They are equations fit to data that attempt to account for how the data came about or were generated in the first place. For example, if every hour a student studied for an exam corresponded to exactly a 1-point increase in that student’s grade, the model that would best explain how these data were generated would be a linear model. Even if the relationship between hours studied and student grade was not perfect, a straight line might still be the “best” summary. Models are often used to account for messy or imperfect data.
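The hours-studied example can be made concrete with a small least-squares sketch in Python. The data below are hypothetical: a baseline grade of 60 is assumed purely for illustration, and the “noisy” grades are invented to show that a line can still summarize imperfect data. The helper name fit_line is our own and not from any particular library.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

hours = [0, 1, 2, 3, 4, 5]

# Hypothetical "perfect" data: exactly one extra point per hour studied,
# starting from an assumed baseline grade of 60.
perfect_grades = [60 + h for h in hours]

# Hypothetical "messy" data: the same general trend, but not exact.
noisy_grades = [61, 60, 63, 62, 65, 64]

perfect_intercept, perfect_slope = fit_line(hours, perfect_grades)
noisy_intercept, noisy_slope = fit_line(hours, noisy_grades)

print(f"Perfect data: grade = {perfect_intercept:.2f} + {perfect_slope:.2f} * hours")
print(f"Noisy data:   grade = {noisy_intercept:.2f} + {noisy_slope:.2f} * hours")
```

With the perfect data the fitted slope recovers the 1-point-per-hour rule exactly. With the noisy data the line no longer passes through every point, yet it still offers a compact summary of the trend.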

Another example of a model is the classic Hebbian version of the Yerkes–Dodson curve expressing the relationship between performance and arousal, depicted in Figure 1.3.

The curve is an inverted “U” shape (an approximate parabola) that provides a useful model relating these two attributes (i.e., performance and arousal). If one exhibits very low arousal, performance will be minimal. If one exhibits a very high degree of arousal, performance will likely also suffer. However, if one exhibits a moderate range of arousal, performance will likely be optimal. The model in this case, as in most cases, does not account for all the data one might collect. The extent to which it accounts for most of the data is the extent to which the model may be, in general, deemed “useful.” The use of a model is also enhanced if it can make accurate predictions of future behavior.
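An inverted “U” of this kind can be written as a simple quadratic equation. The Python sketch below is only one possible way to express it, under stated assumptions: the arousal scale, the optimum of 5, the peak of 100, and the curvature of 4 are all hypothetical values chosen for illustration, not estimates from Yerkes–Dodson data.

```python
def performance(arousal, optimum=5.0, peak=100.0, curvature=4.0):
    """Inverted-U model: predicted performance drops off quadratically
    as arousal moves away from the optimum in either direction.
    All default parameter values are hypothetical."""
    return peak - curvature * (arousal - optimum) ** 2

# Evaluate the model at low, moderate, and high arousal (arbitrary units).
for label, level in [("low", 1.0), ("moderate", 5.0), ("high", 9.0)]:
    print(f"{label:>8} arousal ({level}): predicted performance = {performance(level):.1f}")
```

The model predicts the best performance at moderate arousal and equally poor performance at the two extremes, which is the pattern the curve is meant to summarize.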

As another example of a model, consider the number of O-ring incidents on NASA’s space shuttle (the fleet is officially, and sadly, retired now) as a function of temperature (Figure 1.4). At very low or high temperatures, the number of incidents appears to be elevated. A quadratic (squared) function seems to adequately model the relationship. Does it account for all points? No. But nonetheless, it provides a fairly good summary of the available data. Some have argued that had NASA had such a model (i.e., essentially the line joining the points) available before Challenger was launched on January 28, 1986, the launch may have been delayed and the shuttle and crew saved from disaster.² We feature this data in our chapter on logistic regression.

FIGURE 1.4 Number of O-ring incidents on boosters as a function of temperature.

Why did George Box say that all models are wrong, some are useful? The reason is that even if we obtain a perfectly fitting model, there is nothing to say that this is the only model that will account for the observed data. Some, such as Fox (1997), even encourage divorcing statistical modeling from the idea of accounting for deterministic processes. In discussing the determinants of one’s income, for instance, Fox remarks:

I believe that a statistical model cannot, and is not literally meant to, capture the social process by which incomes are “determined” … No regression model, not even one including a residual, can reproduce this process. The unfortunate tendency to reify statistical models – to forget that they are descriptive summaries, not literal accounts of social processes – can only serve to discredit quantitative data analysis in the social sciences.

(p. 5)
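The earlier point, that a perfectly fitting model need not be the only model that accounts for the data, can be illustrated with a small Python sketch. The four observations and both model functions below are hypothetical and were constructed only so that two different equations reproduce the same data exactly.

```python
# Hypothetical observations (made up for illustration): y happens to equal x.
xs = [0, 1, 2, 3]
ys = [0, 1, 2, 3]

def model_a(x):
    """A straight line: y = x."""
    return x

def model_b(x):
    """A quartic through the same four points.
    The added term is zero at x = 0, 1, 2, and 3."""
    return x + x * (x - 1) * (x - 2) * (x - 3)

def sum_squared_error(model):
    """Total squared discrepancy between a model and the observed data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))

print("SSE, model A:", sum_squared_error(model_a))
print("SSE, model B:", sum_squared_error(model_b))
print("Predictions at x = 4:", model_a(4), "versus", model_b(4))
```

Both equations fit the observed points with zero error, yet they disagree sharply about a new observation. Fit alone, then, cannot tell us which account of the data is the “correct” one.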

Indeed, psychological theory, for instance, has advanced numerous models of behavior just as biological theory has advanced numerous theories of human functioning. Two or more competing models may each explain observed data quite well. Sometimes, and unfortunately, the model we adopt may have more to do with our sociological (and even political) preferences than anything to do with whether one is more “correct” than the other. Science (and mathematics, for that matter) is a human activity, and often theories that are deemed valid or true have much to do with the spirit of the times (the so-called Zeitgeist) and what the scientific community will actually accept and tolerate as being true.³ Of course, this is not true in all circumstances, but you should be aware of the factors that make theories popular, especially in fields such as social science where “hard evidence” can be difficult to come by. The reason the experiment is often considered the “gold standard” for evidence is that it often (but not always) helps us narrow down narratives to a few compelling possibilities. In strictly correlational research, isolating the correct narrative can be exceedingly difficult or nearly impossible, regardless of which narrative we most wish upon our data. Good science requires a very critical eye. Whether the theory is that of the Big Bang, the determinants of cancer, or theories of bystander intervention, all of these are narratives to help account for observed data.

1.3 SOCIAL SCIENCES VERSUS HARD SCIENCES

There is often stated a distinction between the so-called “soft” sciences and the “hard” sciences (Meehl, 1967). The distinction, as is true in many cases of so many things, is fuzzy and blurry and requires deeper analysis to fully understand the issue. The difference between what is “soft” and what is “hard” science has usually only to do with the object of study, and not with the method of analytical inquiry.

For example, consider what distinguishes the scientist who studies temperature of a human organism compared to a scientist who studies the self-esteem of adolescents. Their analytical approaches, at their core, will be remarkably similar. They will both measure, collect data, and subject that data to curve-fitting or probabilistic analysis (i.e., statistical modeling). Their objects, however, are quite different. Indeed, some may even doubt the measurability of something called “self-esteem” in the first place. Is self-esteem real? Does it actually exist? At the heart of the distinction, really, is that of measurement. Once measurement of an object is agreed upon, the debate between the hard and soft sciences usually vanishes. Both scientists, natural and social, are generally aiming to do the same thing, and that is to understand and document phenomena, and to identify relations among these phenomena. As Hays (1994) put it so well, the overarching goal of science, at its core, is to determine what goes with what. Virtually every scientific investigation you read about has this underlying goal but may operationalize and express it in a variety of different ways.

2 See Friendly (2000, pp. 208–211) for an analysis of the O-ring data. See Vaughan (1996) for an account of the social, political, and managerial influences at NASA that were also purportedly responsible for the disaster.

3 The reader is strongly encouraged to consult Kuhn’s excellent book The Structure of Scientific Revolutions in which an eminent philosopher of science argues for what makes some theories more longstanding than others and why some theories drop out of fashion. So-called paradigm shifts are present in virtually all sciences. An awareness of such shifts can help one better put “theories of the day” into their proper context.

Social science is a courageous attempt. Hard sciences are, in many respects, much easier than the softer social sciences, not necessarily in their subject matter (organic chemistry is difficult), but rather in what they attempt to accomplish. Studying beats-per-minute in an organism is relatively easy. It is not that difficult to measure. Studying something called intelligence is much, much harder. Why? Because even arriving at a suitable and agreeable operational definition of what constitutes intelligence is difficult. Most more or less agree on what “heart rate” means. Fewer people agree on what intelligence really means, even if everyone can agree that some people have more of the mysterious quality than do others. But the study of an object of science should imply that we can actually measure it. Intelligence, unlike heart rate, is not easily measured largely because it is a construct open to much scientific criticism and debate. Even if we acknowledge its existence, it is a difficult thing to “tap into.”

Given the difficulty in measuring social constructs, should this then mean the social scientist give up and not study the objects of his or her craft? Of course not. But what it does mean is that she must be extremely cautious, conservative, and tentative regarding conclusions drawn from empirical observations. The social scientist must be upfront about the weaknesses of her research and must be very careful not to overstate conclusions. For instance, we can measure the extent to which melatonin, a popular sleep aid, reduces the time to sleep onset (i.e., the time it takes to fall asleep). We can perform experimental trials where we give some subjects melatonin and others none and record who falls asleep faster. If we keep getting the same results time and time again across a variety of experimental settings, we begin to draw the conclusion that melatonin has a role in decreasing the time to sleep onset. We may not know why this is occurring (maybe we do, but I am pretending for the moment we do not), but we can be reasonably sure the phenomenon exists, that “something” is happening.

Now, contrast the melatonin example to the following question: Do people of greater intelligence, on average, earn more money than those of lesser intelligence? We could correlate a measure of intelligence with income, and in this way, we are proceeding in a similar empirical (even if not experimental, in this case) fashion as would the natural scientist. However, there is a problem. There is a big problem. Since few consistently agree on what intelligence is or how to actually measure it, or even whether it “exists” in the first place, we are unsure of where to even begin. Once we agree on what IQ is, how it is measured, and how we will identify it and name it, the correlation between IQ and income is as reputable and respectable as the correlation between such variables as height and weight. It is getting to the very measurement of IQ that is the initial hard, and skeptics would argue, impossible part. But we know this already from experience. Convincing a parent that her son has an elevated heart rate is much easier than convincing her that her son has a deficit in IQ points. One phenomenon is measurable. The other, perhaps so, but not nearly as easily, or at minimum, agreeably.

Our point is that once we agree on the existence, meaning, and measurement of objects, soft science is just as “hard” as the hard sciences. If measurement is not on solid ground, no analytical method applied to its data will save it. All students of the social (and natural, to some extent) sciences should be exposed to in-depth coursework on the theory, philosophy, and importance of measurement to their field before advancing to statistical applications on these objects, since it is in the realm of measurement where the true controversies of scientific “reputability” usually lie. For general readable introductions to measurement in psychology and the social sciences, the reader is encouraged to consult Cohen, Swerdlik, and Sturman (2013), Furr and Bacharach (2013), and Raykov and Marcoulides (2011). For a deeper and philosophical treatment, which includes measurement in the physical sciences as well, consult Kyburg (2009). McDonald (1999) also provides a relatively technical treatment.
