Using multivariate statistics 7th edition barbara g. tabachnick download pdf

Page 1


Using Multivariate Statistics 7th Edition

Visit to download the full and correct content document: https://ebookmass.com/product/using-multivariate-statistics-7th-edition-barbara-g-tab achnick/

More products digital (pdf, epub, mobi) instant download maybe you interests ...

Applied Statistics: From Bivariate Through Multivariate Techniques Second

https://ebookmass.com/product/applied-statistics-from-bivariatethrough-multivariate-techniques-second/

Applied Statistics: From Bivariate Through Multivariate Techniques Second Edition – Ebook PDF Version

https://ebookmass.com/product/applied-statistics-from-bivariatethrough-multivariate-techniques-second-edition-ebook-pdf-version/

eTextbook 978-9351500827 Discovering Statistics Using IBM SPSS Statistics, 4th Edition

https://ebookmass.com/product/etextbook-978-9351500827discovering-statistics-using-ibm-spss-statistics-4th-edition/

Elementary Statistics 11th Edition Allan G. Bluman

https://ebookmass.com/product/elementary-statistics-11th-editionallan-g-bluman/

Statistics for People Who (Think They) Hate Statistics: Using Microsoft

https://ebookmass.com/product/statistics-for-people-who-thinkthey-hate-statistics-using-microsoft/

Discovering Statistics Using IBM SPSS Statistics: North American Edition 5th Edition, (Ebook PDF)

https://ebookmass.com/product/discovering-statistics-using-ibmspss-statistics-north-american-edition-5th-edition-ebook-pdf/

Introductory Statistics Using SPSS 2nd Edition, (Ebook PDF)

https://ebookmass.com/product/introductory-statistics-usingspss-2nd-edition-ebook-pdf/

Statistics: Informed Decisions Using Data 4th Edition, (Ebook PDF)

https://ebookmass.com/product/statistics-informed-decisionsusing-data-4th-edition-ebook-pdf/

Methods of Multivariate Analysis (Wiley Series in Probability and Statistics Book 709) 3rd Edition –Ebook PDF Version

https://ebookmass.com/product/methods-of-multivariate-analysiswiley-series-in-probability-and-statistics-book-709-3rd-editionebook-pdf-version/

Using Multivariate Statistics

California State University, Northridge

California State University, Northridge

Portf olio Manager: Tani maa Mehra

Content Pro d ucer: Kani Kapoor

Portf olio Manager Ass istant: Anna Austin

P roduct Mark eter: Jessicn Quazza

Art/D esigner: Integra Software Services Pvt. Ltd.

Fu ll-Service P r oject Manager : Integra Software Services Pvt. Ltd.

Compositor: Integra Software Services Pvt. Ltd

P rinter/Binder: LSC Communications, Inc.

Cover Printer: Phoenix Co/or/Hagerstown

Cover Design : Lumina Dntamntics, Inc.

Cover Art: Slnttterstock

Acknowledgments of thi rd p arty co ntent a ppear o n p ages within th e text, which constitutes an extens ion o f this co p yrigh t page.

Copyright © 2019, 2013, 2007 b y Pe arson Educatio n , I nc or i ts a f filiates. All Rights Reserved. Printed in th e United S tates of Ame rica This publication is protected b y copyr ig h t, an d pe rmiss ion s h o uld be obtained from th e publis her pri or to an y p rohibited reproduc ti on, storage in a retrieval system, or tra ns m ission in any form o r b y a ny m eans, electronic, m ech an ical, pho tocop ying, reco rding, o r otherw ise. For infom1ation rega rding p ermiss io ns, req uest forms a nd the a ppro p riate contacts within the Pearson Educatio n Globa l R;ghts & Permiss io ns d epar tm en t, please vis it www.pearson ed.com/ p e rmissions/.

PEARSON an d ALWAYS LEARNING are exclus ive trad em arks own ed by Pear son Education , Inc. or its a ffili ates, in th e U.S., and / or oth er countries .

Unless o th erwise ind icated herein, an y thir d -party trade marks th a t may appear in this wo rk are th e property o f th eir res pec ti ve owners an d an y refe rences to thir d -party trade marks, logos o r o ther trade d r ess are for dem o nstra ti ve o r descripti ve purp oses only. S ucll re fe rences ar e not inte nded to im p ly any sponsors h ip, e ndorsement, a u th oriza ti o n, o r pro m otio n o f Pearson's products b y the own ers o f s uch m ar ks, or an y rel a ti o ns h ip b etween th e owner a nd Pearson Educatio n, Inc. or its affiliates, a u thors, licensees or distrib utors.

Man y o f the d es igna ti o ns b y m a nufac ture rs an d seller to d istinguish th ei r p rod ucts are clain1ed as trademar ks. Where those designations a ppea r in th is book, an d th e pu b lish e r was aware o f a tra d emark cl aim , the d es ignations have been p rinted in initial caps o r all caps.

Lib r ary of Congr ess Catal oging-in-Publication D ata

Nam es: Tabachnick, Barb ar a G., a u th o r I Fidell, Linda S., author.

Ti tl e: Us in g mu l tivar ia te s tatis tics/Ba rbar a G. Taba cllnick, Ca liforni a Sta te U n iversity, North r i dge, Linda S. Fidell, Califo rni a Sta te U ni ver sity, Northri dge.

Descri p tion : Seventh edition I Boston : Pearso n, [2019) 1 Chapte r 14, b y Jod ie B. Ullm an

Id entifie rs : LCCN 2017040173 1 ISBN 9780134790541 I ISBN 0134790545

S ubjec ts : LCSH: Mu l tivar ia te an alysis. I Statistics.

C lassifica ti on : LCC Q A278 .T3 2019 I DOC 519 5/35--dc23

LC record ava il ab le a t h ttps :/ /lccn .loc.gov / 2017040173

1 18

Books a Ia Carte

ISBN -1 0: 0-13-479054-5

ISBN-13: 978 -0-13-479054-1

2.1 Research Questions and Associated Techniques 15

2 .1 .1 Degree of Relationship Among Varia bles 15

2.1.1.1 Bivariate r 16

2.1.1.2 Multiple R 16

2.1.1.3 Sequentia l R 16

2.1.1.4 Canonica l R 16

2.1.1.5 M ul tiway Freq uen cy Ana lys is 17

2.1.1.6 M ultileve l Mode ling 17

2. 1 2 S ig n ifica n ce o f C ro up Diffe rences 17

2.1.2.t One-Way ANOVA and I Tes t 17

2.1.2.2 One-Way ANCOVA 17

2.1.2.3 Facto ria l ANOVA 18

2.1.2.4 Factoria l ANCOVA 18

2.1.2.5 lfotelling's

2.1.2.6

2.1.2.7

2.1.2.8

2.1.2.9

2.1.3 Prediction of Group Membership

2.1.3.1 One-Way Discriminant Analysis

2.1.3.2 Sequential One-Way Discriminant Analysis

2.1.3.3 Multi way Frequency Analysis (Logit)

2.1.3.4 Logistic Regression

2.1.3.5 Sequential Logisbc Regression

2.1.3.6 Factorial Discriminant Analysis

2.1.3.7 Sequential Factorial Discriminant Analysis

2 .1.4 Structu re

2.1. 4.1 Principal Components

2.1 .4.2 Factor Ana lysis

2.1. 4.3 Stmc tural Equation Mode ling

2.1.5 Tim e Course of Eve nts

2.1.5.1 Survival/Failure Analys is

2.1.5.2 Tim e-Se ries A na lysis

2 2 So m e Furth e r Co m pa riso ns

2 .3 A Decision Tree

2 .4 Techn iq u e Chap ters

2 .5 Prelimina r y Check of the Data

Re view of Uni va riate and Bi variate S ta ti s ti cs

3.1 Hypothesis Tes ting

3.1.1 One-Sample z Test as Prototype

3.1.2 Power

3.1.3 Extensions ofthe Model

3.1.4 Controversy Surrounding Significance Testing

3.2 Analysis of Variance

3.2 1 One-Way Between-Subjects ANOVA

3.2 2 Factorial Between-Subjects ANOVA

3.2.3 Withi n -Subjects ANOVA

3.2 .4 M ixed Between-Within-S u bjects ANOVA

3.2 .5 Design Comp lexity

3 2 5.1 Nesting

3 2 5.2 Latin-Squa re Des igns

3 2 5.3 U nequa l 11 and Nonor thogona li ty

3 2 5.4 Fixed a nd Ra ndo m Eff<>cts

3.2 .6 Sp ecific Com pari so n s

3 2 6.1 We igh ting Coefficients for Comparisons

3 2 6.2 Orthogonality of We ighting Coefficients

3.2.6.3 Obtained F for Comparisons

3.2.6.4 Critical F for PLmned Comparisons

3.2.6.5 Critical F for Post Hoc Comparisons

3.3 Parameter Estimation

3.4 Effect Size

4.1.4.1

Varian ce-Covar iance Mntriccs 73

4.1.6 Common D a ta Transformations 75

4.1.7

4 .2.1 Screening Ungrouped Data 80

4.2.1.1 Accuracy of Input, Misi.ing Data,

Distribu tions, and Univariate Outliers 81

4.2.1.2 Lineari ty and Homoscedasticity

1 3 Transformation 84

4.2.1.5 Va riables Causing Cases to Be Ou tliers

Cen tering When Interactions

4.2.2.4 Variables Causing Cases to Be Outliers 94 and Powers of IVs Are Included

4.2.2.5 Multicollinearity

5.7

5.7.2

5.7.3

5.7.4

6.3.2.1

6.3.2.2 Absence of Outliers

6.3.2.3

6.3.2.4 Normality of Sampling Distributions 173

6.3.2.5 Homogeneity of Va riance 173

6.32.7 Homogeneity of Regression 173 of Variance and Covariance

6.3.2.8 Reliability o f Co ,•ariat cs 174

65.4.1 Wiiliin-Subjects and Mixed

7.4.2

7.5

7.5.1

7.5.3.2

7.5.3.3

7.5.5.2

9.3 Limitations to Discriminant Analysis 304

9.3 1 Theoretical issues 304

9.3 2 Practical issues 304

9.3.2.1 Unequal Sample Sizes, Missing Data, and Power 304

9.3.2..2 Multivariate Normality 305

9.3 2.3 Absence of Outliers 305

9.324 of Varianc:e--Covariance Matrices 305

9.3 2.5 Linearity 306

9.3 2 6 Absence of Multicollinearity and Stngularity 306

9.4 Fundamental Equations for Discriminant Analysis 306

9 4.1 Derivation and Test of Discriminant Functions

9.4.2 C lassification 309

9.4.3 Computer Analyses of Sma ll -Samp le Ex am pl e 311

9.5 Types of D iscrim ina nt Anal yses 3 15

9.5. 1 Di rect Discri min a nt An a lys is 3 15

9.5.2 Sequential D isc r imi n a n t An a lys is 3 15

9.5.3 S tepwise (S ta t is tical) Discri mina nt Ana lysis 3 16

9.6 Some Important Iss u es 316

9.6 1 Statistical l nference 316

9.6.1.1 Criteria for Overall Statistical Significance 317

9.6. 1.2 Stepping Methods 317

9.6.2 Number of Discriminant Functions 317

9.6.3 Interpreting Discriminant Functions 318

9 6 3 1 Discriminant Function Plots 318

9 6 .3 .2 Structure Matrix of Loadings 318

9.6.4 Evaluating Predictor Variables 320

9.6.5 Effect Size 321

9.6.6 Design Complexity: Factorial Designs 321

9.6.7 Use of Classification Procedures 322

9 6 7 1 Cross-Validation and New Cases 322

9 6 7 2 jackknifed Classification 323

9 6 7 3 Evaluating Improvement in Classification 323

9.7 Comp lete Examp le of Discriminan t Anal ysis 3 2 4

9.7.1 Evalua ti on of Assum pti o n s 3 25

9.7.1.1 Unequa l Sample S izes and Miss ing Da ta 325

9.7. 1.2 Mul tiva ria te No rma li ty

9.7.1.3 Linea rity

9.7.1.4 Outliers

9.7.1.5 l lomogeneity of Varian ceCovariance Matrices

9.7.1.6 Multicollinearity and Singular ity

9.7 2 Direct Discrin1inan t Ana lysis

9.8 Comparison of Programs

9.8.1

9.8.2

9.8.3

10.2 Kinds of Research Questions

12.3

12.2.1 Number of Canonical Variate Pairs

12.2.2 Interpretation of Canonical Variates

12.2.3 Importance of Canonical Variates

12.3.2.1 Ratio of Cases to IVs

12.3.2.2 Norma lity, Linearity, and

13.8 Comparison of Programs 525

13.8.1 IBM SPSS Package 527

13.8.2 SAS System 527

13.8.3 SYSTAT System 527

14 Structural Equation Mod e ling by Jod ie B. Ullman 528

14.1 Genera l Purpose and Description 528

14.2 Kinds of Research Questions 531

14.2.1 Adequacy of the Model 531

14.2.2 Testing Theory 532

14.2.3 Amount of Variance in the Variables Accounted for by the Factors 532

14.2.4 Reliability of the Indicators 532

14.2.5 Parameter Estimates 532

14.2.6 Intervening Variables 532

14.2.7 Croup Differences 532

14.2.8 Longitudinal Differences 533

14.2.9 Multilevel Modeling 533

14.2.10 Latent Class Ana lysis 533

14.3 Limitations to Structura l Equation Modeling 533

14.3.1 Theoretical Issues 533

14.3.2 Practical Issues 534

14.3.21 SampleSizeand Missing Data 534

14 3.2.2 Multivariate Norm.1lity

14 3.2.3 Linearity 535

14.3.2.4 Absence of Multi collinearity and Singularity

143.2.5 Residuals

14.4 Fundamen tal Equations for Slructura l Equat ions Modeling

14.4.1 Covariance Algebra

14.4.2 Model Hypotheses

14.4.3 Model Specification

14.4.4 Model Estimation 535 535 535 535 537 538 540

14.45 Model Evaluation 543

14.4.6 Computer Analysis of Small-Sample Example 545

14.5 Some Important Issues 555

14.5.1 Model Identification 555

14.5.2 Estimatio n Techniqu es 557

14.5.2.1 Estimation Methods and Sample Size 559

14.5.2.2 Estimatio n Methods and Nonnormality 559

14.5.2.3 Estimation Metllods and Dependence 559

14.5.24 Some Recommendations for Choice of Estim.1tion Metllod 560

14.5.3 Assessing the Fit of the Model 560

14.53.1 Comparative Fit Indices 560

14.5.3.2 Absolute Fit Ind ex 562

14 5 3 3 Indices of Proportion of Variance Accounted 562

t4.5.3 4 Degree of Parsimony Fit Indices 563

14.5.3.5 Residual-Based Fit Indices 563

14.5.3.6 Choosing Among Fit Indices 56-1

14.5.4 Model Modification 564

t4.5.4. L Chi-Squa re Difference Test 564

t4.5.4.2 Lilgrange Multiplier ( LM) Test 565

t4.5.4.3 Wald Test 569

14.5.4.4 Some Caveats a nd Hints on Model Modification 570

14.5.5 Reliability and Proportion of Variance 570

14.5.6 Discrete and Ordinal Data 571

1-1.5.7 Multiple Croup Models 572

1-1.5.8 Mean and Covariance Structure Models 573

14.6 Complete Examples of Structural Equation Modeling Analysis 574

14.6.1 Confirmatory Fac tor Analysis of the WISC 574

t4.6.1.1 Mode l Specifica tion for CFA 574

14.6.1.2 Eval u atio n of Assumptions forCFA 574

14 6 1.3 CFA Model Estimation and Preliminary Evaluation 576

14 6 1.4 Model Modification 583 1-1.6.2 SEM of Health Data 589

14 6 2 1 SE..\<1 Model Specification 589

14 6 2 2 E'•aluation of Assumptions forSEM 591

14.6.2.3 SEM Model Estimation and Preliminary Evaluation

t4.6.2.4 Mode l Modification

14.7 Compa rison of Programs

14.7.1 EQS

14.7.2 LlSREL

14.7.3 AMOS

14.7.4 SAS System

Multilevel Linear Modeling

15.1 General Purpose and Description

15.2 Kinds of Research Questions

15.2.1 Croup Differences in Means

15.2.2 Croup Differe nces in Slopes

15.2.3 Cross-Leve l In teracti ons

15.2.4 Me ta -Analysis

15.2.5 Re lative Strength of Predictors at Various Levels

15.2.6 Individual and Croup Structure

15.2.7 Effect Size

15.2.8 Path Analysis at Individual and Croup Levels

15.2.9 Analysis of Longitudinal Data

15.2.10 Multilevel Logistic Regression

15.2.11 Multiple Response Analysis

15.3 Limitations to Multilevel Linear Modeling 618

15.7.1.1 Samp le Sizes, Missing

15 3.1 Theoretical Issues 618 Data, and Distributions 656

15.3.2 Practical Issues 618 15.7.1.2 Outliers 659

15.3.2.1 Sample SUe, Unequal-11, and l\1issing Data 619

15.3.2.2 Independence of Errors 619

15.4 Fundamental Equations 620

15.4 .1 Intercepts-Only Model 623

15 4 1.1 The Model:

Level-l Equation 623

15 4 1.2 The Intercepts-Only Model: Level-2 Equation 623

15 4 1.3 Computer Analyses

of Intercepts-Only Model 624

15.4.2 Model with a First-Level Predictor 627

15 4 2 1 Level-l Equation fora Model with a Level-l 1'1\.>dictor 627

15.4.2 2 Level-2 Equations for a

Model with a Level-l Pl\.>dictor 628

15.4.2.3 Computer Ana lys is of a Model with a Level-l

15.4.3 Mode l with Predictors a t First

15.7.1.4 Independence of Errors: lntracLlss Correlations 659

Multilevel Modeling 661

15.3.2.3 Absence of Multicollinearity and Singularity 620 15.7.1.3 Multicollinearity and Singularity 659

MlwiN

and and Second Levels 633

15.4.3.1 Level-l Equation for Model with Predictors at Both Levels 633

15 4 .3.2 Level-2 Equations for Model with Predictors 16.3 Limitations to Multiway Frequency

1 Independence

at Both Levels 633 16.3.2 2 Ratio of Cases to Variables 675

15 4 .3.3 Computer Analyses of Model with Predictors at 16.3.2.3 Adequacy of Expected Frequencies 675 First and Second Le,·els 634 16.3.2.4 Absence of Outliers in the

15.5 Types of ML\11 638 Solution 676

15.5.1 Repeated Measures 638

15.5.2 Higher-Order ML\11 642

15.5.3 Latent Variables 642

16.4 Fundamental Equations for Multiway Frequency Analysis 676

16.4 1 Screening for Effects 678

15.5.4 Nonnormal Outcome Variables 643 16.4.1.1 Total Effect 678

15.5.5 Multiple Response Models 644

15 6 Some Important Issues 644

15.6.1 l ntraclass Cor relation 644

15.6.2 Cente ring Predic tors and Changes in Their In te r pretations 646

16 4.1.2 First-Order Effects 679

16.4.1.3 Second-Order Effects 679

16.4.1.4 Third-Order Effect 683

15.6.3 Interactions 648 16.4.2 Modeli ng 683 16.4.3 Eva luation and Interpretatio n 685 16.4.3 1 Residua ls 685

15.6.4 Rando m a nd Fixed In tercep ts and Slopes 648

15.6.5 Statistical Infere nce 651

15.6.5.1 Assessing Models 651

15.6.5.2 Tests of Individual Effects 652

15 6.6 Effect Size 653

15.6.7 Estimation Techniques and Convergence Problems 653

15.6.8 Exploratory Model Building 654

15.7 Complete Example of MLM 655

16.4.3.2 J>aramctcr Estimates 686

16.4.4 Compu te r Ana lyses of S ma ll -Sa mp le Example 690

16.5 Some Important Issues 695

16.5.1 H ierarchical and Nonhierarchical Models 695

16.5.2 Statistica l Criteria 696

16.5.2 1 Tests of Models 696

16.5.2.2 Tests of Individual Effects 696

16.5.3 Strategies for Choosing a Model 696

16.5.3 1 IBM SPSS Hll.OGLINEAR

15.7 1 Evaluation of Assumptions 656 ( Hierarchial) 697

16.5.3.2 SPSS GENLOG

(General Log-Linear)

SPSS LOCUNEAR (General

16.6 Complete Example of Multiway

Evaluation of Assumptions:

o f Expected Frequencies

Hierarchical Log-Linear Ana lys is

16.6.2.1 Preliminary Model Screening

16.6.2.2 Stepwise Model Se lection

Interpretation of the

Analysis

Effect Size and Power

17.3 Assumptions of Time-Series Ana lysis 718

17 3.1 Theoretical Issues 718

17.3.2 Practical Issues 718 An Overview of the General

17.3.2.1 Norma lity of Dis tributiOltS

of Residua Is 719

I7 .3.2.2 Homogeneity of Variance

and Zero Mean of Residuals 719

Bivariate to Multivariate Statistics

17.3.23 Independence of Residuals 719 and Overview of Techniques n5

17.3.24 Absence of Outliers 719 18.2.1 Bivariate Form n5

17.3.2.5 Sample Size and Missing Data 719

17.4 Fundamental Equations for

Tune-Series ARIMA Models no

17.4.1 Identification of A RIMA

(p, d, q) Models no

17.4.1.1 Trend Components, d: Making the Process Stati onary 721

Auto-Regressive Components 722

17.4.1.3 Moving Average Components 724 17.4.1.4 Mixed Models 724

17.4.1.5 ACFs and PACFs

17.4.3 Diagnosing a Model

17.4 4 Computer Analysis of Small-Sample

Types of Tune-Series Analyses

17.5.1 Models with Seasonal Components

Appendix A

y Introduction to

and

Multiplication, Transposes, and Square 1 7.5.2 Models with Interventions

A.6 Matrix "Division" (Inverses and B.7 Impact of Seat Belt Law

Determinants)

A.7 Eigenvalues and Eigenvectors: Procedures for Consolidating Variance

from a Matrix 788

The Selene Online Educational Game

C.2 Critical Val u es of th e t D istrib u tion

Research D es igns fo r Co mp le te fo r a = .05 a nd .01, Two-Tailed Test

Examples

B.1 Women's Health and Drug Study 791

C.3 Cri t ica l Va lues of the F Dis trib utio n

Crit ica l Va lues of Chi Sq uare (r)

c.s Critica l Values for Squares Multiple

B.2 Sexual Attraction Study 793 Correlation in Forward Stepwise

B.3 Learning Disabilities Data Bank

B.4 Reaction Ttme to Identify Figures 794 C.6 Critical Values for

B.S Field Studies of Noise-Induced Sleep Distribution for a = .05 and .01

Disturbance B.6 Clinical Trial for Primary Biliary

Cirrh osis

Preface

Some good thi n gs seem to go o n forever: frie ndship a nd u pdating this book. It is d ifficu l t to b e lieve th at the firs t e d i tion manuscript was t ypew r it ten , w i th rea l cu ttin g and pastin g. The pub lisher requ i red a p aper manuscrip t w ith numb e red pages- that was almos t o ur downfa l l. We cou ld write a book on mu l tiva ria te sta tis ti cs, bu t we couldn ' t get the same numbe r of pages (abou t 1200, doub le -spaced) t wice in a r ow. SPSS was in release 9.0, and the o ther p rogram we d emons trated was BMDP. There we re a mere 11 chap ters, o f which 6 of th em wer e describing techniques. Mu l tilevel and stru ctura l equ ati on mode l ing were not ye t ready for prim e tim e. Logis tic regression a nd s ur vival a n a l ysis we re n o t yet popu lar. Ma te ria l n ew to this ed i tion includes a redo of all SAS examp les, with a p retty new o utput fo rma t and replacement of inte r active ana lyses that are n o longer available. We've a lso re-run the IBM SPSS examples to s h ow the new o u tput format. We've tried to upda te the references in a ll ch apters, in cluding only classic citations if they d ate prior to 2000. New work on rela ti ve importance has been incorpora ted in mul tip le regressi on , canonical correla tion, and logistic regress ion ana lysis-comp le te w i th d emonstrations. Mul tip l e impu ta ti on procedu res for deal ing w i th missing da ta have been u pdated, and we've added a new time-se ri es example, ta king a d vantage o f an IBM SPSS exp er t modeler tha t replaces p revious tea - leaf read ing aspects of the analysis.

Our goals in writing the book remain the same as in all prev i ou s ed iti ons- to p resen t comp lex s tatistical procedures in a way tha t is maximally u seful a n d access ib le to researchers who a re n o t necessari ly s ta tistici ans. We s tri ve to be sho rt on theor y but long on conceptu al unders tanding. Th e sta tistica l packages have become in creasing ly easy to u se, making i t all the more cri tical to make sure th at th ey a re applied w ith a good understand ing of what they can and cannot do. But above a ll else-wh at does it a ll mean?

We have not changed the bas ic fo rmat underl ying all of the techni que ch ap ters, n ow 14 of them. We s tart w i th an over v iew of the techniq u e, followed b y the types of resear ch q u es ti on s the techniques are designed to answer We then p r ov i de the ca uti on ary tale-wha t you n eed to worry about and h ow to deal with those worries. Then come the fundamenta l equ a ti on s und erlying the techniqu e, which some readers truly enjoy wo rking through (we know because they h e lpfully point o ut any e rro rs a nd / or inconsistencies they fin d); but o ther rea d ers discover they can skim (or skip) the section wi thout any loss to the ir ab ility to conduct meaningful ana lysis o f their research Th e fundamental equations are in the context o f a small, made-up, usu ally sill y da ta set fo r which compu ter ana lyses are p rovided- usually IBM SPSS and SAS. Next, we del ve into iss u es surrounding the technique (such as different types o f the analys is, foll ow-u p procedures to the main analys is, and effect s ize, if i t is not amp ly covered e lsewh ere). Finally, we prov ide o n e or two full-bore analyses of an actu al rea l-life da ta se t together w i th a Results section app ropria te for a journa l. Data sets for these examp les are ava ilab le at www.pearsontug h ered.com in IBM SPSS, SAS, and ASCTI forma ts. We end each techniqu e chapte r w i th a comparison o f features avail ab le in IBM SPSS, SAS, SYSTAT and sometimes o ther specialized p r ograms. SYSTAT is a sta tis tical package that we relu ctan tl y h ad to d r op a f ew edi tions ago fo r lack of space.

We apo logize in advance for the h eft o f the book; it is not o ur intention to line the coffers of cruropractors, p h ys ical therap ists, acupunctur ists, and the l i ke, but there's rea ll y ju s t so much to say. As to o ur fr iendsh ip, it's still going s trong desp i te living in d iffe rent ci ti es. Art has taken the p lace o f creating b elly dance costum es fo r both o f u s, but we remain sill y in ou tl ook, a ltho u g h seri ou s in o ur analysis of research.

Th e lineup of p eople to thank grows wi th each ed i tion, far too ex tensi ve to lis t: s tu dents, rev iewers, ed itors, and readers w h o send u s correction s a nd point o u t areas of confus i on As a l ways, we ta ke full responsib ility fo r remaining errors and l ack o f clari ty.

Barbnrn G. Tabachnick

Linda S. Fidel/

Chapter 1 Introduction

Learning Objectives

1 .1 Explain the importance of multivariate techniques in analyzing research data

1 .2 Describe the basic statistical concepts used in multivariate analysis

1 3 Explain how multivariate analysis is used to determine relationships between variables

1 .4 Summarize the factors to be considered for the selection of variables in multivariate analysis

1.5 Summarize the importance of statistical power in research study design

1 .6 Describe the types of data sets used in multivariate statistics

1.7 Outline the organization of the text

1. 1 Multivar i ate Statis ti c s: Why?

Mu lti varia te s tatistics are in creasin g ly popular techniques u sed fo r analyzing comp licated da ta sets. They p r ovide ana lysis when there are many ind ependen t va riables (IVs) and/or many dependent va riables (DVs), all correla ted w i th one another to varyin g degrees. Because of th e difficul ty in addressing compl ica ted research q u esti on s w it h u nivaria te ana l yses and beca u se of the ava ilab il i ty of h ighly developed software for performing multiva ria te ana lyses, m ultivaria te statis ti cs have become widely u sed. Indeed, a standa rd u nivaria te s tatistics course only begins to prepare a stu dent to read resea rch li terature or a research e r to p roduce it.

But how much h arder are the m u lti varia te techniqu es? Compared w i th the mul ti varia te methods, u n i varia te statis ti ca l methods a re so s tra i ghtfo r ward and nea tl y struc tured tha t i t is ha rd to be lieve they once took so much e ffort to mas te r. Ye t many resea rchers app ly and correctl y in terp re t resu l ts of in trica te anal ysis of va riance before the grand structure is apparent to them. The same can be true of multiva ria te s tatistical me thods. A lthou g h we are delighted if you ga in in s ights in to th e full multivariate general linear model , 1 we have accomp lis h ed o u r goal if you feel comfortable se lectin g and setting u p multiva ri ate a n a lyses and in te rp reting the comp u te r o u tpu t.

Mu lti varia te methods are more complex than u nivar ia te by at leas t a n order of magnitude. However, fo r the most part, th e greater comple xity requires few con ceptual leaps. Famil iar con cepts s u ch as sampling distrib utions and homogen e ity of variance s imp l y become more elab ora te.

Mu lti varia te models have not ga ined popula ri t y by acci den t-or even by s iniste r design. Th eir grow ing populari ty pa rallels the g reate r comp lexity o f contemporary research

In psych o logy, for example, we are less and less enamo red of the s imple, clean, l abora to r y s tu dy, in which pliant, first-yea r college s tu den ts each prov i de u s w i th a s ingle behavior al measure on cue.

1.1.1 The Domain of Multivariate Statistics: Numbers of IVs and DVs

Mul ti varia te statis ti ca l me thods a re an extens ion of urtivar ia te and b i varia te s ta tistics. Mul tivaria te s ta tistics are the complete or gene ral case, w h ereas univa r iate and b i varia te s tatistics are specia l cases o f the mu lti varia te model. If your des ign has many variab les, mu lti varia te techniqu es o ften let you per fo r m a sin g le ana l ysis in s tead of a seri es o f u n iva r iate or b i varia te analyses.

Variab les are r oughly dichotomized into two majo r t ypes- independen t and depen den t. Indepen dent va r iables (IVs) are the diffe ri n g condi ti ons (treatmen t vs. placebo) to whi ch you expose your research pa rticipants o r the charac te r istics (tall o r short) tha t the pa rticipants themselves bring in to the research si tu a ti on IVs are u s u ally conside red predic to r variables beca u se they predic t th e DVs-t h e response o r ou tcome va riables. Note tha t IV and DV a re defined wi thin a research context; a DV in one resear ch set ting may be an IV in another

Additi on a l te r ms fo r IVs and DVs are p red i ctor-c ri ter ion, stim ulus-response, tas kperformance, o r simply inpu t-ou tput. We u se IV and DV t hrou g h ou t this book to identify variab les tha t belong o n one s ide of an equ ati on o r the o ther, witho u t causal imp lication. That is, the te r ms are u sed for convenience ra the r than to ind ica te tha t one of the variab les caused or determin ed the size o f th e other.

The term univariate statistics refers to analyses in w hi ch there is a s ing le DV. There may be, however, mo re than one IV. For example, the amount o f social behavi o r of gradua te students (the DV) is s tud ied as a functi on of course load (one IV) and type of training in social skills to which studen ts are exposed (another IV). Anal ysis of variance is a commonly used urtiva ria te statistic.

Bivariate statistics freq u en tl y re fers to analys is of two vari ab les, where nei th er is an expe rimenta l IV a n d the desire is simp l y to stu dy th e relati onship between th e var iab les (e.g., the re la ti onship between income and amou n t of ed u cati on). Bi varia te statisti cs, o f course, can be applied in an experimen ta l settin g, b u t u s u ally they a re not. Pro to t ypical examp les of b ivari ate statistics a re the Pearson produc t- moment cor rela ti on coefficient and chi-squa re analysis. (Chap ter 3 reviews univa ri ate a n d b ivaria te s tatistics.)

With m ul tiva ria te s tatistics, you s imu ltaneou sly ana l yze m ultip le dependent and m ul tiple in dependent va r iab les. This capabili ty is important in bo th nonexperimenta l (correl a ti on al or survey) and experimental research.

1.1.2 Experimental and Nonexperimental Research

A cr itical distinction bet ween e xperimenta l and non exper imental research is whether the research er manip u la tes the levels of t he IVs. In a n experiment, the researcher has control over the levels (or condi ti ons) of a t least one IV to whi ch a participant is exposed by dete rmining wha t the levels a re, how they are imp lemented, and how and w h en cases are ass igned and exposed to them. Furth er, the experimen te r randomly ass igns cases to levels of the IV and controls all o the r in fl uential fac to rs by h ol ding them con s ta nt, cou nterbalancing, o r ran domizing their infl u e n ce. Scores on the DV are expec ted to be the same, w ithin random varia ti on, except for the in fl uence o f the IV (Shad ish, Cook, and Campbell, 2002). If there are sys tematic differences in the DV associa ted w i th levels of the IV, th ese differences are attrib uted to the IV.

For example, if groups of undergradu a tes are randomly assigned to the same ma terial bu t differen t types o f teaching techniques, and afterwar d some grou ps o f undergrad u a tes perform be tter than others, the d iffe rence in performance is sa id, w ith some degree o f confidence, to be caused by the difference in teaching technique. In this type of research, the ter ms independent and dependent have obvious meaning: the value of the DV depends on the manipula ted level o f the IV. The IV is manipula ted by the experimen ter and the score on the DV depends on the leve l of the IV.

In nonexperimen tal (cor rela ti onal or sur vey) resea rch, the levels of th e IV(s) are no t manipu la ted by the research e r The resea r cher can define the IV, bu t has no con tro l over the assignmen t of cases to levels o f it. For example, groups o f people may be ca tegor ized in to geographi c a rea o f residence (Northeast, Mi dwest, etc.), b ut only the definition o f th e variable is u n der resear che r control. Excep t for the mili tary or p r ison , place of res iden ce is ra rely s u bject to manip ulation by a researche r Never theless, a naturally occurring d iffe rence like this is often cons idered an IV and is u sed to p red ict some other non exper imen tal (dependen t) variab le s u ch as in come. In t his type of research, the distinction bet ween IVs and DVs is u s u ally a r b i trary a n d many research e rs prefe r to call IVs predictors and DVs criterion variables.

In nonexperimen ta l resear ch, it is ve r y diffi cult to attrib u te ca u sa lity to an IV. lf the re is a systematic d iffe rence in a DV associated w i th levels of a n IV, the two va riables are sai d (w i th some degree o f confiden ce) to be rela ted, b ut the ca u se of the relationship is u nclea r Fo r examp le, income as a DV m i gh t be re la ted to geographic area, bu t n o ca u sa l associa ti on is implied.

Nonexperimenta l resea r ch takes many fo rms, b u t a common examp le is the s urvey. Typ ica ll y, many peop le a re s urveyed , and each responden t prov i des answers to many q u estions, producing a large number of variables. These variabl es are u s u ally in terrelated in high ly comp lex ways, bu t univaria te and b ivaria te s tatistics a re not sensitive to this comp lex ity. Biva riate correla ti on s between all pairs of va riab les, fo r examp le, cou ld n o t revea l tha t th e 20 to 25 va r iab les meas ured really represen t only two o r three "su perva ri ab les."

lf a research goal is to distinguis h among s u bgr o u ps in a samp le (e.g., between Ca th ol ics and P r otestan ts) on th e basis of a var iet y of atti tudin al va r iab les, we cou ld u se severa l u n iva riate I tes ts (o r ana lyses o f va riance) to examine grou p d iffe rences on each va r iable separately. But if the var iab les are rela ted, which is h i ghl y likel y, the resu lts of many t tests a re misleading and s tatisti ca ll y s u spec t.

Wi th the use of m ul tiva ria te s tatistical techniqu es, comp lex inte rrela ti onsh ips among va riab les a re revea led and assessed in s ta tistical inference. Fur ther, i t is poss ible to keep the overall Type I error ra te at, say, 5%, n o ma tter how many vari ab les a re tested.

Al thou g h mos t m u lti var ia te techniqu es we re developed fo r u se in nonexper imen tal research, they a re also u seful in e xpe rimen ta l research , in which there may b e m ultip le IVs a n d mul tiple DVs. With mul tiple IVs, the research is usually des igned so that the IVs are independent of each other and a str a ightfor ward correc ti on fo r n u mero u s statistica l tests is availab le (see Chap ter 3). With mul tiple DVs, a problem o f infla ted erro r ra te arises if each DV is tested separa tely. Further, a t least some of the DVs a re li kely to be correlated w ith each o th er, so separate tests of each DV reana lyze some of the same variance. Therefore, mul ti varia te tes ts are u sed. Experimen ta l resea rch des igns w i th m u ltiple DVs were un u s u al a t on e time. Now, however, w ith attempts to make experimenta l designs more rea listic, and w i th th e availab i l ity of comp ute r programs, experiments often h ave several DVs. It is dangerou s to run an expe riment w i th o n ly one DV a n d ris k missing the impact of the IV beca u se the mos t sensi tive DV is not measured. M u lti varia te s tatistics help the expe rimen ter des ign more effici en t and more realis ti c experimen ts by allowin g measu rement of m u ltip le DVs w ithout v i ola ti on of acceptab le l evels of Type I error.

One o f the few con s idera ti ons not relevan t to ch o ice of statistical techniqu e is whe ther th e da ta a re experimen tal o r cor rela ti onal. The statis ti ca l methods "wo r k" w h e the r the researcher manip u la ted the levels of th e IV or not. But attrib uti on of ca u sali ty to resu lts is crucially affected by the expe rimenta l- n onexperimen ta l d istinction.

1.1.3 Computers and Multivariate Statistics

One answer to the question "Why m u lti vari ate s ta tistics?" is that the techni ques are now accessib le by compu ter. Only the mos t dedica ted n u mber cr unch e r would cons ider doing rea l-li fe-sized prob lems in m u ltivariate statisti cs w itho u t a compu ter. Fortun ate ly, excellen t mu lti var ia te programs are available in a numbe r of compu te r packages.

Two pack ages a re demon s trated in this b ook. Examples are based o n p r ograms in IBM SPSS and SAS.

If you have access to b o th pack ages, you are ind eed fo rtun a te. P r ogra ms w ithin the packages d o not comp letely overlap, and som e p rob le ms a re b e tte r handle d through one p ackage than the other Fo r exam p le, do in g sever a l versions o f the sa m e b asic ana lysis on the sam e set of d a ta is par ticularly easy w i th IBM SPSS, w h e reas SAS has the m ost ex tensi ve capab iliti es for sav ing deri ved sco res fro m d a ta screening o r fro m in term ed i ate a n a lyses.

C hap ter s 5 thro u g h 17 (the chap te rs th a t cover th e specialized m ulti va ria te techni ques) offer expl anations an d ill ustra ti ons of a variety of p rogra m s 2 wi thi n each package and a co mpa r ison of the fea tu res o f the p r ogra ms. We hop e tha t once you un derstand the techniques, you wi ll b e a bl e to generalize to v i r tua ll y any multiva riate progra m

Recen t ve rsions o f th e progra m s a re availa b le in Win dows, with m en u s tha t im plem e n t m ost o f th e tech n i q u es ill u s tr a te d in this b ook. A ll of t he techniques m ay b e i mp le m en te d thr o u g h syn tax, and syn tax i tself is gene rated through m en u s. Then you m ay add o r ch ange syn tax as d es i red for your ana lysis. For examp l e, you may "pas te" m enu cho ices in to a syn tax win dow in IBM SPSS, e d i t the resu l ting tex t, and th en run th e progra m . Also, syn tax gen era ted by IB M SPSS m enus is saved in the "journ a l" fi le (s ta ti s t ics.jnl ), which may also b e accessed and cop ied in to a sy n tax window. Sy n ta x gen e ra te d b y SAS menu s is reco rded in a "log" file. Th e con ten ts m ay then b e copi ed to a n in te rac ti ve w in dow, e d ite d , a n d run . Do no t ove rl ook the he lp fi les in th ese p rograms. In d eed, SAS and IBM SPSS now p rovid e the en tire set of user m an u a ls o n l in e, often w i th m ore cu rren t info rmat ion th an is availa b le in printed ma n uals.

O u r IBM SPSS demons trati ons in this book a re b ased on syntax generated throu gh menus whenever feasib le. We wou l d love to show you the sequ ence of m en u choices, b ut space d oes no t permit. And , fo r the sake of pa rsimon y, we h ave ed ited p rogram o u tpu t to ill u s trate the m a te ria l that we feel is the m ost importa nt fo r in te rpre tation.

With comm e rcia l computer packages, you need to know whi ch vers ion o f the pack age you a re usin g. P rograms a re con tin u a ll y be in g changed, and n o t a ll changes a re immed iately implem en ted a t each faci li ty. Therefo re, m a n y versions o f the var iou s p rograms are s im u ltaneou sly in use a t different insti t utions; even a t one ins ti tuti on , mo re than one ve rs ion of a pack age is som e times availab le.

Progra m u p da tes are o ften correcti on s of err ors discovered in earl ier vers ion s. Som e tim es, a new vers ion w ill change the o utp u t fo rm at bu t n o t i ts info rma ti on Occasionally, thou gh, there a re m ajo r revis ions in o n e o r m o re programs or a new program is added to the p ackage. So me tim es d efau lts change w i th u p d a tes, so tha t the ou t pu t looks diffe rent although syntax is the sa m e. Check to find ou t whi ch vers i on of each package you a re u sin g. Th en, if you a re u s in g a p r in ted m anu al, be su re tha t the m anua l you a re using is consis ten t w i th the vers ion in u se a t you r facil ity. Also chec k u pdates for error cor recti on in prev i ou s releases th a t may be relevant to so m e of your p rev iou s run s.

Excep t whe re no ted, this b ook reviews Wind ows versions o f IBM SPSS Vers ion 24 and SAS Version 9.4. Info rm ation on avai lab ility and versions of software, macros, books, and the like changes a lmost dail y. We reco mme nd the In te rn e t as a sou rce of " keeping u p."

1.1.4 Garbage In, Roses Out?

The trick in m u l ti va ria te s tatisti cs is not in co mpu tation. This is easi ly d one as discu ssed above. The tri ck is to se lect re liab le and va lid m easure m ents, ch oose the appropria te p rogram, use it co rrectly, an d kn ow how to in te rp ret the ou tput. Ou tpu t fro m comme r cial co mpu te r p rogra m s, wi th the ir bea u tifull y fo rm atted tables, graphs, and m a trices, can m ake ga rb age look l i ke r oses. Thro u g h ou t t his b ook, we try to s u ggest cl ues that revea l when the tru e m essage in the o u tpu t m o re close ly resem b l es the fertilize r than the fl owe rs.

Second , when you use multivar ia te s tatis tics, you r a rely ge t as close to the raw da ta as you d o when you app ly uni va ria te statis ti cs to a re la ti vely few cases. Er ro rs and ano m alies

2 We have re ta ined d escrip tions of features of SYSTAT (Vers ion 13) in these sections, d esp ite the re moval of detailed dem ons trations of that progra m in this edition.

in the d a ta tha t wou ld b e ob v ious if th e data we re p rocessed by hand a re less easy to spot when processing is entire ly by comp u ter. Bu t the compu ter p ackages have progr ams to g r a p h and descr ibe you r da ta in the simp lest uni vari a te te rms and to disp lay b i varia te re la ti on shi ps a m on g your variab les. As discussed in Ch a p ter 4, these p rograms p rov ide p reliminary anal yses tha t ar e abso lu te ly necessa r y if the results of m u l ti varia te programs are to be bel i eved.

Th e re a re also certa in costs associ a ted w i th the benefi ts of usin g m u l ti variate p roced u res. Benefi ts o f in creased fle xi bi li ty in research des ign, for in s ta n ce, are som etim es pa ralleled b y in creased ambiguit y in in te rp retati on o f resul ts. In add iti on , mu l tivar ia te res u l ts can be q ui te sensi tive to whi ch a n a lyti c s trategy is chosen (cf. Section 1.2.4) and d o not a l ways prov ide better p rotection again s t s tatisti ca l errors than thei r u ni va riate cou nter par ts. Add to t his the fac t tha t occas iona ll y you s till cannot ge t a firm statis ti ca l a n swer to your research q u es ti ons, a nd you m ay wond er if th e increase in complex i ty and diffi cu lty is war ranted.

Frankly, we think it is. Sl ip per y as som e of the concepts and p rocedu res are, these s ta tistics p rovid e insig h ts into relati on shi ps amon g var iables that m ay more cl osely resemb le the comp lexity o f th e " real" worl d. And so m etimes you get at least par tia l answers to q u es ti ons tha t cou l d not b e as ked a t a ll in th e univa r iate fra m ewor k . Fo r a comp le te analysis, m a king sense of your da ta u sually requires a ju dici ous mix o f multiva r iate a n d u niva ria te statisti cs.

Th e a d diti on of m u lti varia te sta tis tica l m e th od s to your rep ertoi re makes da ta analysis a lot more fu n. If you l iked u n iva r iate s tatistics, you w ill l ove m u l ti va r ia te statistics!3

1.2 So me U s eful Definitions

In o rd er to des crib e m u l ti va ria te s tatistics easi ly, i t is u sefu l to review some co mm on terms in resear ch d es ign a n d b asic s ta tistics. Dis tincti ons were m ade b e t ween IVs and DVs and be tween exp erim enta l a n d nonexperimen ta l resea rch in precedin g sec ti ons. Ad d i tiona l terms tha t a re e n countered rep ea tedly in the book b u t n o t n ecessa rily rela ted to each other a re d escr ibed in this section.

1.2.1 Cont inuo u s, Discrete, and Dicho

t omous D a t a

In appl ying statis ti ca l techniqu es of any sort, it is impor ta n t to consid er the type of meas urement and the n a ture of the cor resp ondence be tween the nu m bers and the events tha t th ey rep resent. The distincti on m ad e here is a m on g con tinu ou s, discre te, and dich o tomous va ria b les; you m ay p refe r to s ubs ti tu te the te rms interval o r quantitative fo r continuous and nominal, categorical or qualitative for dichotomous and discrete.

C on tinuou s va r iab les a re m easured o n a sca le tha t ch anges valu es s m oothl y rather than in s te p s. Continu ou s va riab les ta ke on any va lu es wi thin t h e range of the sca le, and the size o f t h e n umber reflects th e am ou n t of the va riab le. Precision is limited b y the m easu ring ins trumen t, not by the nature of th e sca l e itself. Some exa mp les of con tin uous va riab les a re time as m eas ured on an o ld-fashion ed ana l og cl ock face, ann u a l in co m e, age, te mpera tu re, d is ta n ce, a nd grade poin t ave rage (GPA).

Discre te variab les take o n a fi n i te and u s u a ll y s m a ll nu m ber of va lu es, a nd th e re is n o s m ooth transi ti on fro m o n e va lu e o r category to the next. Examp les in clu de time as displayed b y a digital cl ock , con tinen ts, ca tego ries o f rel igiou s a ffilia ti on , and t ype of communi ty (ru ra l o r urban).

So m e tim es discrete va riab les a re u sed in mu l tiva ria te analyses as if con tinu ou s if th er e a re n ume r ous ca tego ries and the ca tegor ies represen t a q uan ti tative a t t r i bu te. Fo r ins tance, a va r ia b le tha t represents age ca tego ries (wh e re, say, 1 s tan ds fo r 0 to 4 years, 2 s tand s fo r 5 to 9 yea rs, 3 stands fo r 10 to 14 years, a n d so on u p throu gh t h e n o r m a l age s p an) can b e u sed beca u se there are a lot of ca tego ri es and the n umbe rs designa te a q u anti ta tive a ttrib u te (increasing age).

Bu t the sa m e num be rs used to designa te ca tego ries of religious affil ia ti on are not in a p p ropria te

3 Do n't e\'en think about it.

fo rm fo r analys is w ith many o f the techniqu es 4 because rel igions do n o t fall a l on g a q u an ti tative con tin u um

Discrete va riables comp osed of q u al ita ti ve ly differen t categories are so m e tim es ana lyzed after b e ing changed into a n umbe r of d ichotom o u s or t wo- level va riables (e.g., Ca th olic vs. non -Ca t h o l ic, Pro testant vs. non-P r otestan t, Jewish vs. non -Jew is h , and so on un ti l t h e degrees of freedom are u sed). Reca tego riza ti on of a d iscrete variab le in to a seri es of d icho tomous ones is ca lle d dummy variable coding T he conversion of a discre te varia b le in to a ser ies o f dich o tomous o n es is done to limi t th e rela ti on s h i p b e t ween the d icho to m ous vari ab les and o th ers to linear relationships. A discre te var iable w i th more than t wo ca tegor ies can have a rela ti onship of any s h ap e w ith anothe r var iable, and the rela ti onship is ch anged a rb itr ar il y if t h e assignm en t of nu m bers to categor ies is changed. D ich o tomo us varia b les, h owever, w i th o nl y two points, can have only linea r rel a ti on s h ips wi th o the r var iables; they are, th e refore, a p p rop ria tely a n a lyzed b y m ethods u sing co rre la tion in w hi ch o nl y linea r relationships are a n a lyzed

Th e disti ncti on between con tin u o u s and discre te variab les is not always clear. If you ad d en o u g h digi ts to the digital clock, fo r instance, i t becomes for all p rac ti ca l p u r p oses a contin uous m easu r ing dev ice, whereas time as m easu red b y the analog dev ice can a lso be rea d in discrete ca tegor ies s u ch as hou rs or half ho urs. In fact, any contin u ou s meas ure m en t m ay b e rend ered discre te (or dich otomous) w ith som e loss o f infor m a ti on, b y specifying cu toffs on the con tinu ous scal e.

Th e p roper ty of var iab les tha t is crucial to the appl icati on of m u lti var ia te p rocedures is no t the typ e o f m eas ure m en t so much as th e shape of dis tr ibu ti on, as d iscu ssed in Ch apter 4 and in discu ssi on s of tes ts o f assu mp tions in Chap ter s 5 thr o u g h 17. Non-n o rma ll y d is trib u ted con tinu ous va riab les a n d dich o tomo us va riables with very u neven sp li ts between the ca tegor ie s p resent prob lems to sever al o f th e mul ti vari a te ana lyses. This issu e and i ts resol u ti on a re discussed at some length in Cha p ter 4.

Another typ e of m eas urem en t that is u sed som etimes produces a rank o rde r sca le. This sca le assigns a nu m ber to each case to ind icate the case's posi tion vis -a -v is o th er cases along som e d im ensi on For ins tance, ranks are ass igned to con testa n ts (fi rs t p lace, second p lace, third place, e tc.) to p r ovid e a n ind ication of w h o is the bes t-b u t not b y h ow mu ch. A p roblem wi th rank order m easures is tha t the ir d is trib u tions a re rec ta n g ula r (one freq u e n cy per n umb e r ) instead o f n o rma l, u nless tied ranks a re permi t ted and they p il e up in the mi d dle of the distributi on

In p r ac tice, we often trea t varia b les as if th ey a re con tinu ous w h en the u n derl y ing scale is though t to be con tinu ous, bu t the m eas ured sca l e actu a ll y is r a nk o rd e r, the n um be r of ca tego ri es is large-s ay, seven o r m ore, a n d the da ta m eet o th e r assumption s o f the ana l ysis. For in s tance, th e n um be r o f correc t i te m s o n a n obj ec ti ve test is techni ca ll y not con tinu ou s beca u se fr actiona l va lu es a re n o t possib le, b u t i t is t h ou gh t to measure so m e u nd e rl yin g con tinu ous varia b le such as cou rse m as te r y. Another e xa m p le of a va riab le w i th a mbi guou s measurem en t is one m eas ure d on a Like rt - t ype sca le, in which cons ume rs ra te th e ir a ttitud es towa rd a p roduc t as "s tro n g ly l ike," " m ode ra te ly like," " mil d l y li ke," "nei the r like nor dis li ke," " mil d ly disl i ke," " m oderately disl i ke," o r "st r ong ly d isl ike." As m en tioned previou s ly, even d ichotom ou s va riab les may b e tr eated as if con tin u ou s und er som e con d i ti ons. Th us, we o fte n u se the te rm continuous, t h rou ghou t the rem aind er of t h is boo k, w h e ther the m eas ured sca le itself is con ti n u ou s o r the va riab le is to b e treated as if con ti n u o u s. We u se the te rm discrete fo r va r ia b les w i th a few ca tegor ies, whether th e categor ies differ in t ype o r q u antity.

1.2.2 Samples and Populations

Sam p les a re measure d to make generalizati ons abou t p o pu la ti on s. Idea lly, samp les a re se lected , usua lly b y some ran dom p rocess, so tha t they rep resen t the popu lati on of in terest. In rea l life, however, popula tions a re frequen tl y bes t defined in terms o f sam p les, ra t h e r th an vice ve rsa; the p opu la ti on is the grou p from w hi ch you were ab le to rand o mly sa mp le.

4 Som e multi varia te techniques ( e.g., logistic regression, SEM) are appropria te for all types o f va riables

Samp ling has somewhat differen t connotations in nonexperim en ta l and experim en ta l resear ch In n onexpe rimenta l research, you investi gate re la ti on s hi ps am on g varia b les in so me p redefined popu lation. Typ ica lly, you take el ab o r a te p recau ti ons to ensure tha t you h ave achieved a representa ti ve samp le of tha t popula ti on; you d efin e your popul ation, a nd then do your best to random l y sa mp le from i t.5

In expe rim enta l resea rch , you atte mp t to create differen t popu la ti ons b y trea ting subgrou ps fr o m a n originally ho m ogeneou s grou p diffe rentl y. The samp ling o bjecti ve here is to e n s ure tha t a ll cases co m e fro m the sa me popu lation b e fo re you tr ea t them differen tl y. Rand o m samp ling consists of randoml y assigning cases to treatmen t g rou p s (levels o f the IV) to ensu re tha t, b e fo re differen tial t reatmen t, a ll s ubsa m ples co m e fro m the sa m e popula ti on. Statisti cal tests p rovid e evid ence as to whether, afte r trea tm ent, a ll samp les still come from the same pop ulation. Generalizations a b o u t trea t m en t effec ti veness a re m ade to th e t ype of indiv idua ls w h o par ticipa ted in th e exper imen t.

1.2.3 Descriptive and Inferential Statistics

Descr iptive s ta tistics d escr ibe sa mp les o f cases in terms of va riables or combin a tions of variab les. Inferential statis ti ca l techniques tes t hypotheses abou t diffe rences in popul ations on the basis o f measure m ents m a d e on sa m ples of cases. If s ta tistically significan t d iffe rences are fou nd , descrip ti ve s tatisti cs a re then used to p r ovid e estima tions of cen tral tenden cy, a nd t h e l ike, in the pop ul ation. Descripti ve s ta tis tics u sed in t his way a re ca lled parameter estimates. Use of inferen ti al and d escr iptive s tatis tics is r a rely an e i ther-o r p rop osi ti on We a re usu a ll y in teres ted in both describ ing and ma king inferences abou t a d a ta set. We d escr ibe th e da ta, fi n d s ta tistica lly s ignifi can t d iffe rences o r rela ti onship s, and estim ate p opu la ti on valu es for those findings. H oweve r, the re are m o re res tr icti on s on inferen ce tha n there a re o n d escr i pti on. Many assump ti ons of multi va r iate statisti cal m e th od s a re necessary o nly for inferen ce. If simple d escr i p tion of the sam p le is t h e m ajo r goa l, ma n y ass ump tions a re relaxed , as discu ssed in Chap te rs 5 through 17.

1.2.4 Orthogonality: Standard and Sequential Analyses

O rthogona l it y is a p er fect nonassociation b e t ween varia b les. If two variab l es a re ort h ogon al, knowing the va l ue o f o n e va r iab le gives no clue as to th e valu e of th e o the r; t h e co rre l ation b e t ween the m is ze ro.

O rthogona l it y is often desi rab le in s tatisti cal applica ti ons. Fo r instance, fac to rial designs fo r e xp erim ents are orthogon al when two o r more IVs are completel y crossed w ith equ a l samp le sizes in each co mbina ti on of leve ls. Excep t for use of a common erro r term, tests of hypotheses a b o u t main effec ts and in te rac tions a re ind ependen t o f each o th e r; th e ou tcome o f each test gives n o hin t as to the ou tcome o f the o the rs. In o rt hogona l exp e rimen ta l d es igns wi th random assignm en t of cases, m anipul a ti on of the levels o f the IV, and good controls, changes in va l ue of the DV can be una m b ig u ously a ttrib u ted to var iou s ma in effec ts and in te rac tions.

Simila rl y, in mul ti varia te ana lyses, the re are a d vantages if sets o f IVs or DVs are o rthogonal. If a ll pairs of IVs in a set are o rthogonal, each IV a d ds, in a simp le fashion, to p red ic ti on of th e DV. C ons ider in com e as a DV w ith e d uca tio n and occu p ational p res ti ge as IVs. If ed u ca tion a nd occu pa tional p restige a re orthogonal, and if 35% of t h e va riabili t y in in come m ay be p red icted fro m ed u ca ti on a nd a d iffere n t 45% is pred icted fro m occu pa ti on a l prestige, then 80% of the var iance in income (the DV, Y) is predicted fr o m edu cation and occu p a tional p res ti ge together.

O rthogona l it y can eas il y be ill ustra ted in Venn diagra m s, as s h ow n in Figure 1.1. Venn d iagrams represent shared va riance (or correl a ti on ) as overlappin g a reas b e t ween two (or m o re) ci rcl es. Th e total vari ance fo r in come is one circle. The secti on w i th horizon ta l s trip es re p resents the pa rt o f income p red ictab le fr o m educa ti on (t h e firs t IV, X1), a nd t h e sec ti on w i th

5 Strategies fo r random sam pling are discu ssed in m any sources, inclu d ing Levy and Lem ens how (2009), Rea and Par ke r (1997), an d de Vaus (2002).

Fi g ure 1 . 1 Ven n di a gra m for Y (income), X, (ed u c atio n), a nd X2 (o cc upatio nal p rest ig e)

vertica l s tri p es represents the par t p red ictable from occu pa ti on al pres tige ( th e second IV, X2); the ci rcle fo r ed u cation overlaps the circle fo r in come 35% and the circle for occupation a l prestige overlaps 45% Toge the r, they accoun t for 80 % of th e variab ili t y in income beca u se education and occup a ti on a l p restige a re o rthogonal and do n o t themsel ves overl ap. The re are s imi lar a d van tages if a set o f DVs is o rthogona l. The overa ll effec t of an IV can be pa rtitioned in to effec ts on each DV in an add iti ve fashi on.

Usu ally, however, the variab les are cor related wi th each other (n on orthogon al). IVs in nonexper imental d es igns are often cor related na tur ally; in experimental designs, IVs become co rrel a ted wh en unequal numb ers of cases are measured in different cells of the design. DVs are u sually correla ted becau se indi vidual differences among particip ants tend to be con sis ten t over many attribu tes. When va riab les are corre l a te d , they have s h a red o r overlapping va r iance. In the e xample of Figu re 1.2, ed u cation and occupationa l p restige co rrel ate w i th each o the r. Alth ou g h th e in dependen t con tr ibuti on made by educa ti on is s till35% and th at by occupationa l prestige is 45%, their join t contrib u tion to p redic ti on of income is no t 80%, bu t ra the r somet hing s m alle r d u e to the overlapping a rea s h ow n by th e a rrow in Figure 1.2(a). A major dec is i on fo r the mu lti var ia te analyst is how to handle th e va ri ance tha t is predic ta bl e fr om m o re tha n o n e va riable. Many m u l tiva r ia te techrUqu es have at least two s trategies fo r handli n g it, bu t some have m o re. In s tanda rd analysis, the ove rlapping variance cont rib u tes to the size of s u mmary s ta tistics of the overa ll rela ti onship bu t is not ass igned to e ither variab le. Overlapping variance is disregarded in assessing the con trib u tion of each va r ia bl e to the sol u ti on. Figu re 1.2(a) is a Venn d iagra m of a standard ana lysis in w h ich overlapping var ian ce is shown as overlapping areas in circles; the u ni q u e con tr ibu ti ons o f X 1 and X 2 to p red icti on o f Ya r e s h ow n as hor izon tal and ver tica l a reas, respectivel y, and the to tal rel ationship between Y and the combin ation of X1 and X2 is those two a reas plus the a rea w ith the a r r ow. If X1 is ed u cation and X2 is occu pa ti on a l p res ti ge, th en in s tanda rd analy s is, X1 is "cred ited w ith" the area m arked b y the horizon ta l lines a n d X2 b y the a rea mark ed by ve rtica l l in es. Neithe r o f the IVs is assign ed the area d es ignated w ith the arr ow. When X 1 and X2 s ubs tantia lly overlap each o ther, very li tt le h o rizonta l or verti ca l a rea may be left fo r ei the r o f th em, desp ite the fact tha t th ey are both re la ted to Y. Th ey have essen tially knock ed each other o u t of the sol u ti on.

Ar ea represe nts va rian ce in relatio nship th at contr ibute s to solutio n but is assigned to neither x, nor x2

(a) Stand ard analysis

(b) Sequential analy s i s in which X 1 is g ive n p r iority ove r X2

Fi gure 1. 2 S tand a rd (a) a nd s eq uent ia l (b) a n a lys e s of t he re lat io nsh ip between Y. x, , and X2 Horizonta l s had in g d e picts varia nce as sig ned to X1 Vert ic a l s h a d ing dep icts va ri ance ass ig ned to X2

Sequ ential ana lyses differ, in tha t the researcher ass igns priority for en try of variab les in to equ a tions, and the firs t one to ente r is assigned both uniqu e variance a nd any overlapping variance it has w ith othe r variab les. Lower-pri ori ty variables a re th en assigned on en try their uniqu e and any remaining overlapping variance. Figure 1.2(b) sh ows a sequential analysis for the same case as Figure 1.2(a), w h ere X1 (ed u ca tion) is g i ven priority over X2 (occupation al p res tige). The to tal variance explained is the same as in Figure 1.2(a), b u t the rela ti ve contri b u tions of X1 and X2 h ave changed; education now shows a s tronger relationship wi th in come than in the standard anal ysis, whereas the rela tion b etween occupa ti ona l pres tige and income remains the same.

The choice o f stra tegy for dealing with ove rlapping variance is not trivial. If va ri abl es are correla ted, the overa ll rela ti on ship remain s the same, but the apparent importance of variab les to the sol u ti on changes dependin g on whe the r a s ta n dard or a sequ entia l s tra tegy is u sed. If the mul tiva ri a te p r ocedu res have a repu ta ti on for unreliability, it is b ecause sol u ti on s change, sometimes d r amati cally, w h en differen t s tra tegi es fo r entr y of var iables a re chosen. Howeve r, the s tra tegies also ask different q u estions of the da ta, and it is in cumbent on the research e r to de termine exactl y w hi ch ques ti on to as k . We try to make the ch o ices clear in the ch apte rs tha t follow.

1.3 Linear Combinations of Variable s

Mu lti varia te a n a lyses combine variables to do usefu l work, s u ch as p red ict scores o r p red i ct grou p memb e rship. The comb inati on that is fo r med depends on th e rela ti onships among the variab les and th e goals of analys is, bu t in most cases, th e combina ti on is lin ear. A linear combina ti on is one in w hi ch each variab le is ass ig n ed a we ight (e.g., W1), and then the p roduc ts of weights and the va r iab le sco res are s u mmed to pred ict a score on a combin ed va ri ab le. In Equ ati on 1.1, Y' ( the pred icted DV) is predic ted by a l in ea r combin a ti on of X1 and X2 ( the IVs).

(1.1)

If, for examp le, Y' is p red icted income, X1 is educa ti on, and X2 is occupationa l prestige, the best p red iction of income is obtain ed by weighting educa ti on (X 1) by W1 and occupa ti on al prestige (Xv by W2 before s u mming. No o the r va l ues of W1 and W2 prod u ce as good a p red icti on o f income.

Notice tha t Equa ti on 1. 1 includes nei the r X 1 nor X2 r aised to powe rs (expon en ts) n o r a prod u ct o f X 1 and X2 Th is seems to severely restric t mul ti va ria te sol u ti ons u n til o n e realizes tha t X1 cou ld i tself be a p r od u ct o f t wo diffe rent variab les or a single variab le r a ised to a power For examp le, X1 migh t be educa ti on squ a red. A mu lti var ia te sol uti on does no t prod u ce exponents o r cross-prod u cts of IVs to improve a so lu tion, but the resea r ch er can inclu de Xs tha t are cross-prod u cts of IVs or a re IVs ra ised to powers. Inclusion o f variables raised to powers o r cross-produc ts of vari ab les has bo th theoretica l and p rac ti ca l impl icati ons fo r the so lu tion. Ber ry (1993) p rov ides a useful discu ssion o f many of the issues.

Th e s ize o f the W va lu es (o r some function of them) o ften reveals a great deal abou t th e relati on ship between DVs and IVs. If, fo r ins tance, the W valu e fo r some IV is zer o, th e IV is not needed in th e bes t DV- IV re l ationship. Or if some IV has a large W va lue, th en the IV tends to be important to th e relationship. A lthou g h comp l ica ti on s (to be expl ained l a ter) prevent inte rpreta ti on of the multiva riate solu tion from the sizes o f the W valu es alone, they are nonetheless impo rtan t in most multiva r iate p r ocedures.

Th e combin ation o f variables can be consi dered a s u pervari ab le, n o t d irectly meas u red but worth y o f interpreta ti on. Th e s u pervariab le may rep resen t an unde rl y ing d imen s ion tha t pred i cts something or optimizes some rela ti onsh ip. Therefore, the attemp t to unders tand th e meaning of the comb ination o f IVs is worthwhi le in many m u l ti vari ate ana lyses.

In the search for th e b est weig hts to apply in comb ining variab les, compu ters do not try o ut all possib le se ts of weights. Va ri ou s a lgo ri t hms have been deve loped to compute the weigh ts. Most a l go ri thms in vol ve manipul ation of a correla ti on matrix, a va riance-covariance ma tr ix, o r a s um-of-squ a res and cross-produc ts ma trix. Section 1.6 descr ibes these ma trices in ver y

simp le te r ms an d s h ows th ei r deve lopmen t fr o m a ve r y s m all da ta set. Append ix A descr ibes some te rms and m anip u la ti ons appropria te to matrices. In the fo u r t h sections o f Cha p ters 5 through 17, a sma ll h yp o thetical sa mp le o f d a ta is analyzed b y h and to show h ow the wei ghts a re d er ived for each analysis. Thou g h this infor ma ti on is u seful for a basic unders tanding of m u l tiva ria te s ta tistics, it is not n ecessa r y fo r applying m u l ti va ria te techniqu es fru itfull y to your resea rch q u esti ons a n d may, sad ly, b e skipped by those w h o a re ma th aver se.

1.4 Number and Nature of Variable s to Include

Atten ti on to the numb er o f variab les incl uded in a n analys is is important. A genera l rule is to get the best so luti on with the fewes t variab les. As more and more var iab les are includ ed , the solu tion usually imp roves, bu t only sl ig htl y. So me times the imp rovement d oes no t compensa te for the cos t in degrees of freed om of including m ore variables, so the power of the a n alyses diminishes. A second p rob le m is averjitting. With overfitting, the solu tion is ver y good; so good , in fac t, tha t it is unlikel y to generalize to a population. Overfitting occurs when too many variab les are includ ed in an analysis rel a ti ve to the sam p le size. Wi th smalle r samples, ver y few variables can b e analyzed. Gen erally, a researcher sh o ul d incl ude o nl y a limited numb er o f uncorrela ted variables in each analysis,6 fewer w i th s mall er samples. We g i ve gui delines fo r the numb er of variab les that can b e incl uded rela tive to sam p le size in the third sections of Ch apters 5 through 17. Add iti on a l con s idera ti ons fo r incl usi on o f varia b les in a multiva riate ana lysis includ e cos t, ava i labil it y, mean i n g, and theoretica l re lati on shi ps amon g th e va ria b les. Except in a n alysis of s tru ct ure, one us u a ll y wan ts a sma ll n um be r o f valid , cheap ly ob ta ined , eas ily ava ilab le, uncorrela ted var iab les tha t assess a ll the th eore tica lly im portan t d im ensi on s of a resear ch area. Another impo rtan t con s ider a ti on is reliab il i ty. H ow s table is the p osi ti on of a g i ven sco re in a distributi on o f sco res when meas ured a t diffe rent times o r in differen t ways? Unre liab le va r iab les d egra d e an analysis, whereas relia b le o n es enhance it. A few reli ab le var iab les give a m o re m eaning ful sol u ti on th an a l ar ge n umbe r o f less rel ia b le va r iab les. Ind eed , if variab les a re s ufficien tl y unrel iab l e, th e en tire sol u ti on may reflect o nl y measurement e rror. Fur the r con s ide ra ti ons fo r variab le sel ection are m en tioned as they a p p ly to each analysis.

1.5 Statis tical Power

A c ri tical issue in designing any s tud y is whe ther there is adequ ate power. Power, as you m ay recall, rep resents the prob ab ilit y tha t effects tha t actually exis t h ave a chance o f prod ucing sta tistical s ignificance in your even tual da ta analysis. Fo r examp le, d o you h ave a lar ge eno ugh sample size to show a s ignificant rel ationship betw een GRE and GPA if the actual rel ationship is fairly large? Wha t if the rel a tionship is fai r ly s mall? Is your samp le large en ough to reveal s ignificant effects of trea tm ent on yo ur DV(s)? Rel a tionships among power and errors o f inference are d iscu ssed in Ch apte r 3. Iss u es of power a re b es t consid ered in the p lanning s ta te of a stud y w h en th e resea r cher dete rmi n es the required sa mp le size. The resea rcher es tima tes the size of the an ticip ated e ffect (e.g., a n expected m ean differ ence), the va riab il i t y e xp ec te d in assess m en t of the effect, t h e desired a lp h a leve l (ordin a ri l y .05), and the desired p owe r (often .80). These fo ur estim ates a re requ ired to d etermi n e the necessa r y sa mp le size. Fa il u re to consi der power in the p lanning s tage often resul ts in fa il u re to find a s ignifi can t effect (and an unp ub lishab le stu d y). The interes ted reader m ay wish to con s ult C oh en (1988), Ross i (1990), Sedlm e ier a n d G ige renzer (1989), o r Mu r p hy, M yo rs, and Wolach (2014) for m o re detail.

The re is a grea t d eal of softwa re ava ilab l e to help yo u es tima te the power avai lab le w i th var iou s sa mp le sizes for var iou s statisti cal techniques, and to help you de te rmin e n ecessa r y

6 The exceptio ns are ana ly s is of s truc ture, s uch as factor ana ly s is, in which num er ous correlated va riab les are measured .

sample size given a desired level of power (e.g., a n 80% probabi li ty o f acrueving a s ignifican t resu lt if an e ffect exists) and expected sizes o f rela ti onship s. One of t h ese p r ograms th at estim ates power fo r seve ra l tech niqu es is NCSS PASS (Hin tze, 2017). Many o th e r p rograms are rev iewed (and so me times avai lab le as s h areware) o n th e In tern et. Issues o f power re levan t to each o f th e s ta tistical tech niqu es a re d iscu ssed in Chap te rs 5 through 17.

1.6 Data Appropriate fo r Multivariate

Statis tic s

An appropria te d ata set fo r multivaria te s tatis tical methods con sis ts of valu es on a n u mber of variables fo r each of several participants o r cases. Fo r continuou s variab les, the valu es are scores on variab les. For example, if the continuous variable is the G RE, the valu es for the vari ou s participants a re scores such as 500, 420, and 650. For discre te variab les, valu es are n u mber codes for group membership o r trea tment. For examp le, if there are three teaching techniqu es, s tu den ts wh o recei ve one techniqu e are arb itraril y assigned a "1," those recei ving ano ther techniqu e are assigned a "2".

1.6.1 The Data Matrix

Th e d a ta ma t r ix is an organiza ti on of sco res in which rows (li n es) rep resen t pa r ti cipan ts a nd co lumns represent vari ab les. An examp le of a da ta m atrix w i th s ix pa rticipants 7 a nd fo ur va r ia b les is given in Tab le 1.1. For e xamp le, X1 migh t be t ype of teaching tech niqu e, X2 sco re o n t h e G RE, X3 GPA, and X4 gend er, wi th women coded 1 and men coded 2.

Data are en tered into a da ta fi le w i th long- term s torage accessible by comp u ter in ord er to a p p ly co mp u ter techniqu es to them. Each participant s tarts with a new row (lin e). Info rma tion id entifying the p a r ticipan t is t ypica lly entered fi rs t, foll owed b y the valu e of each variab le for tha t participant.

Scores for each variab le a re e n te red in the same o r de r fo r each s tu den t If there a re more da ta fo r each case tha t can be accommoda te d on a single line, th e d a ta a re con tinu ed on ad d i tional lines, b u t a ll of the d a ta fo r each case are kept toge the r. All of the co mpu te r package m anuals p r ovid e infor ma ti on o n setting up a da ta ma trix.

1n this examp le, th e re a re values fo r ever y va r iab le fo r each s tu den t. This is n o t a lways the case w i th research in t h e rea l worl d. Wi th l arge n umbers of cases and variab les, scores are frequ entl y missin g on so m e variab l es for so me cases. For instance, respondents may refuse to answe r some k inds o f ques ti ons, or some s tu den ts may be absen t the day when a parti cu l a r tes t is g i ven, a n d so fo r th . This creates missing va l ues in the da ta m atrix. To dea l w ith missing valu es, firs t b uil d a d a ta file in which some sy mbo l is u sed to ind icate tha t a va lu e o n a variable is missin g in d a ta for a case. The va r io us p r ograms have standard symbols, s u ch as a dot (.), fo r this p u rpose. You can also use other sy mbo ls, bu t it is o ften just as con veni en t to u se one o f th e defa ult symbols. O n ce th e d ata se t is avai lable, consu l t Cha p ter 4 for va r iou s o pti on s to d eal w i th this messy (b u t often u n avoid able) p rob lem.

Table 1. 1 A Data M atrix of Hypoth et ical Sc ores

Tabl e 1. 2 C orrelat ion M at rix fo r Part of Hy pot het ical Data f or Tab le 1.1

1.6.2 The Correlation Matrix

Mos t readers are familiar with R, a cor rel a tion matrix. R is a squ are, symmetrical ma trix. Each row (and each co lumn) rep resen ts a differen t variable, and the valu e at the intersection o f each r ow and column is the correlation b etween the t wo variables. For ins tance, the valu e a t the in tersec ti on of the second row, third column, is the co rrela ti on b etween the second and the third variab les. The sam e co rrel a ti on also appears a t the intersecti on of the third row, second column Thu s, cor rela tion ma trices are said to be symme trical abo u t the main d iagonal, which means they are mirror images o f themselves above and below the d iagonal from top left to bo ttom rig ht. Hence, it is common p racti ce to show only the bo tto m half or the top h alf o f an R ma trix The entries in the main d iagonal are often o mi tted as well , since they are all on es-cor rela tions of variab les with thernselves.8

Ta b le 1.2 shows the co rrelation matr ix for X2- X3 , and X4 of Table 1. 1. The va l ue .85 is the co rre lation between X2 and X3 a nd i t app ears tw ice in th e ma t r ix (as d o o the r valu es). Other co rre lations are as in d icated in the table.

Many p rograms a ll ow the resea rch er a choice between ana l ysis o f a corre lation ma trix and analysis of a variance-covari ance ma tr ix. If t h e co rrel ation matr ix is analyzed, a uni t-free resu lt is p rod u ced. Tha t is, the sol u ti on refl ects th e rel a ti on s h ips among the va r iab les b u t no t in the m etric in which th ey are measu red. If the metric o f the scores is somewha t a r b i tra r y, analysis of R is approp ria te.

1.6.3 The Variance-Cov ariance Matrix

If sco res a re measu red along a meaningful scale, i t is so m etimes appropr ia te to ana lyze a var iance-covarian ce ma tri x . A variance-covari ance m a trix, l: , is also squ are a nd symme trica l, but the elemen ts in the ma in diagonal a re the vari ances of each va riable, and the o ff-diagon a l e lements are covari ances between pairs of differen t va riab les.

Var ian ces, as you reca ll , are averaged squ a red dev ia ti on s of each score fr o m the mean of the scores. Since the dev i ations are averaged , the n umbe r o f scores inclu ded in comp u tation of a va riance is not re levan t, bu t the m etric in w hi ch the sco res a re measu red is re levant. Scores m easured in la rge numbers tend to h ave la rge n umbers as va riances, a n d sco res measured in small n umbers tend to have small vari ances.

Cova riances are averaged cross-p roducts (p roduct o f th e d evia ti on b e t ween on e va riable and i ts m ean and th e dev iati on between a second va r iab le a n d i ts mean). Covarian ces a re similar to correla ti ons excep t tha t th ey, l i ke varia n ces, re tain infor ma ti on conce rnin g the scales in which the variables are measured. The var iance-cova r iance ma trix fo r the con tin uous da ta in Ta b le 1.1 appears in Ta b le 1.3.

T ab l e 1 .3 Varian c ErCovarianc e M atrix for Part of Hypo th et ical Data of Ta ble 1 1

x2 X3 x

x2 7026 66 32.80 - 6.00

l: = X3 32 80 .2 1 - 12 x. - 6.00 - .12 .30 8

1.6.4 The Sum-o f-Squares and Cross-P rod uc t s Matrix

Th e ma trix, S, is a precursor to the va riance-cova riance ma tr ix in w h ich devia ti ons are n o t yet averaged. Thu s, the s ize of the entries depen ds on the number of cases as well as o n the metric in which the e lements were measured. The s um -of-s quares a n d cross-p r od u cts ma tr ix fo r Xz, X3 , and X4 in Table 1.1 appea rs in Table 1.4.

Th e en tr y in the majo r diagonal of the ma trix S is th e s um of squ ared dev iations o f scores fr om th e mean fo r tha t variab le, hence, "sum of squares," or SS. Th a t is, fo r each va r iab le, th e valu e in the major d i agona l is

where i = 1,2, ,N

N = the n u mb e r of cases

j = the variab le iden tifier

X1i = th e score on variab le j by case i

Xi = th e mean of a ll scores on the j th var iab le

Fo r example, fo r X4, th e mean is 1.5. The sum of squ ared dev iations a round th e mean a n d the diagon a l valu e for the va r iable is

6 2, ( Xi4 - X4 ) 2 = {1 - 1.5 ) 2 + {2 - 1.5)2 + {1 - 1.5) 2 + {2 - 1.5)2 + {1 - 1.5) 2 + {2 - 1.5) 2 = 1.50

The off-diagona l elements o f the sum-o f-squ a res and cross-products ma t r i x are th e c r ossprod u c ts- t h e sum o f products (SP)-o f the var iab les. For each pa ir of vari ab l es, rep resented by row and co lu mn labels in Tab le 1.4, the entry is the s u m of t h e p rod u c t o f the deviation of o n e va ri ab le aroun d its mean ti mes the dev i a ti on of the o th er variab le a round i ts mean. N

SP{X1 Xk) = 2, ( X1i- Xi )( X1k - X k) J=l (1.3)

w h e re j iden tifies t he firs t va r iab le, k iden ti fies t he second var i able, and all othe r terms a re as defin ed in Equa t ion 1.1. (Note tha t if j = k, Equation 1.3 becomes ident ica l to Equ at ion 1.2.)

Fo r examp le, the cross-p roduct term fo r va r iables X2 and X3 is

N 2, {X,1 - X2 ){ X13 - X3 ) = {500 - 533.33){3.20 - 3.275 ) + {420 - 533.33 ){ 2.50 - 3.275) + + {600 - 533.33 ){3.25 - 3.275) = 164.00

Most compu ta ti ons start wi th S and proceed to o r R. The progressi on from a s u m-ofsqu ares and cross -produc ts ma tr ix to a varian ce-covarian ce ma tr ix is s imple.

!. = --s N - 1

Table 1.4 Sum-of -Sq uares and CrossProd ucts Matri x f or Part of Hypothet ic al

Data of Tab le 1.1

(1.4)

Th e variance-covariance ma t rix is pro d u ced b y d i vi din g eve r y element in the s u m -ofsqu ares and cross-p roducts matr ix by N - 1, whe re N is the n umbe r of cases.

The correla ti on ma tr ix is der ived from a n S m atrix by d iv i ding each sum-of-squ ares by i tself (to prod u ce the 1s in the main d iagona l o f R ) and each cross-p roduct of the S m atrix b y t h e squ are r oo t o f th e produc t of the s um-of-squ a r ed dev iati on s a r ound the m ean fo r each of the var iab l es in the p a ir Tha t is, each cr oss-pr od u ct is d i vi ded by

D enominato r (X1 Xk) = Y!:(X;i- Xi ) 2 '£(X;k - Xk)2

where terms are defined as in Equ a ti on 1.3.

(1.5)

Fo r so m e m ul ti va r iate ope rations, i t is not necessary to feed the da ta m a trix to a comp uter p r ogr a m Instead , an S o r an R ma trix is en te r ed, w ith each r ow (repr esent ing a variab le) s ta rtin g a new line. Often, consid erab le com p u t ing t ime and expense a r e saved by ent e ring one or the o the r o f these m a trices rather than raw d a t a.

1.6.5 Residua ls

O ften a goa l of ana lysis o r test of i ts effi ciency is i ts ab ility t o r epr oduce the values of a D V o r the correla ti on matrix of a se t of vari abl es. Fo r exa mp le, we migh t wan t to p r ed ic t scor es on the GRE (X2 ) of Table 1. 1 fro m kn ow led ge of G PA (X3 ) and gen der (X4). After app ly ing the p roper sta tistica l oper a tions- a mu lt ip le r egression in this case- a p red icted GRE scor e fo r each s tu dent is computed b y applyin g the p roper weights fo r GPA and gender to the GPA, and gender scores fo r each stu d ent. Bu t b ecause we a lready ob tained G RE sco res fo r the sam p le of stu d en ts, we a r e able to compa r e the p red icted score with the ob t ained G RE score. The d ifference between t h e p r ed icted and ob t ained values is known as the residual and is a measur e o f error of p red iction. In m ost ana lyses, the residu als for th e ent ire sample su m to zero. Th a t is, som e tim es t h e p r ed icti on is too large and so me tim es it is too sma ll , b u t the aver age of a ll t h e err ors is zer o. The squ a r ed valu e o f the resi d u als, however, p rov id es a measu re of how good th e p redic ti on is. When th e p redic ti ons a re close to the ob tained va lues, th e squa r ed erro rs are sma ll . The way tha t the resi d u als a re d is trib u ted is of fu r the r interest in eva l ua t ing th e degree t o w h ich the d ata m eet the assump ti on s o f mu l ti va ri ate a n a lyses, as d iscu ssed in C h apter 4 and e lsewher e.

1. 7 Organ iza tion of the Book

Chap ter 2 gives a gui de t o the m u l ti va ria t e t echniqu es th at are cove red in this book and p l aces the m in con tex t wi th the m o r e famil iar u n i varia te an d b iva r iate statisti cs w h er e possib le. C hap ter 2 incl udes a fl ow chart tha t o r ganizes s tatis tical t echni ques on th e basis of the m ajor resea r ch q u esti on s ask ed. C h apter 3 provid es a br ief review of u n ivari ate a n d b i va r ia t e s ta tistical techni ques fo r those who are interest ed.

C hap te r 4 deals w i t h th e assump ti ons a n d li mi ta ti on s of m u lti var ia te s tatistical methods. Assessm ent and viola ti on of assu mpti on s ar e d iscu ssed, alon g w i th alterna ti ves for dealing w i th viola ti ons when th ey occur The reader is gui ded b ack to Chap te r 4 freq u en tly in Chapte r s 5 through 17.

C hap ters 5 throu gh 17 cover specific mul tivaria te techni qu es. They include d escr ip tive, conceptual section s as well as a guid ed tour throu gh a real-world data set fo r which the anal ysis is appropria t e. The t o ur includ es an exam p le of a Res ul ts sec ti on describ ing the outcome of the s t atis tical anal ysis approp riate for submiss ion to a p rofessional journal. Ea ch technique chap ter includ es a com parison of com p u ter p rogr ams. Yo u m ay want t o vary the o rder in whi ch you cover these chapter s.

C hapter 18 is an attem pt to integra te uni va riate, biva r ia te, and mul ti va riate statistics through the m u l tivari a te gener al linea r m odel. The co mmon e lemen ts u nderl ying all the techni ques are emph asized, r ather than the differences am ong them . Chapter 18 is m eant to p u ll togethe r the ma te ri al in the rem a in der of the b ook with a conceptu al r ather than pragma ti c e mp hasis. Som e may w ish t o con s id er this m ater ial earli er, for inst ance, immedi a tel y after C h a p te r 2.

Chapter 2

A Guide to Statistical Techniques Using the Book Learning Objectives

2 1 Determine statistical techniques based on the type of research questions

2 2 Determine when to use specific analytical strategies

2.3 Use a decision tree to determine selection of statistical techniques

2.4 Outline the organization of the technique chapters

2.5 Summarize the primary requirements before selecting a statistical technique

2.1 Re s earch Que s tion s and A ss ociated Technique s

This chapter o rganizes the s tatis tical techniques in this b ook by majo r resear ch questions. A decis ion tree a t th e end of this chap ter leads you to an a p p rop ria te analysis fo r your data On the basis o f your major research q u estion and a few characteris tics of yo ur d a ta set, you de termine which statistical techni qu e(s) is app ropria te. The first and the m ost impo rtant criterion fo r choosing a technique is the major research question to be answered b y the sta tis tical anal ysis. Here, the research question s are categorized in to degree of relation ship a m on g variab les, significance of g r o up differen ces, predicti on of group m em bership, s tru cture, and q u es ti ons that focus on the time course of even ts. This chapter em p h asizes d iffer en ces in research q u es ti ons answered b y the diffe rent techniqu es d escribed in n ontechnical ter ms, w h ereas Chapter 18 provides an in tegr a ted overvi ew of the techniqu es with som e basic equation s used in the m u l tivaria te general lin ear m od el. 1

2.1.1 D egree of Re lationship Among Variables

If the majo r p urpose of a n a lysis is to assess the associations a m ong two o r more va riab les, som e form of cor re la tion/regression or chi-squ a re is app rop ria te. Th e choice am on g five d iffe ren t s tatistical techniqu es is m ade by de te r mining the n umber of independ ent and depend ent va ria b les, th e n a ture o f the va riables (conti nuou s or discrete), and w h e th e r a n y o f th e indep e n den t variab l es (IVs) are bes t concep tu a l ized as cova ria tes.2

1 You may find it he lpful to read Chapter 18 now instead of wa iting for the end .

2 If the effects of so me IVs are assessed af ter the effects of other IVs are s ta tis tically removed, the latter a re ca lled covo riates

Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.