Assignment #4 – Jamie Hoag

1. An exploration of potential relationships among various combinations of predictor variables and an outcome variable (Use car sales, or a transformed version of car sales, as the outcome variable).

Table 1: Sales, Tests of Normality

                      Kolmogorov-Smirnov(a)       Shapiro-Wilk
                      Statistic   df    Sig.      Statistic   df    Sig.
Sales in thousands    .218        157   .000      .667        157   .000

a. Lilliefors Significance Correction

Table 2: Log of Sales, Tests of Normality

                      Kolmogorov-Smirnov(a)       Shapiro-Wilk
                      Statistic   df    Sig.      Statistic   df    Sig.
Natural Log of Sales  .066        157   .093      .964        157   .000

a. Lilliefors Significance Correction
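As a rough illustration of what the Kolmogorov-Smirnov statistic in Tables 1 and 2 measures, the sketch below computes the largest gap between a sample's empirical distribution and a normal curve fitted to the sample's own mean and standard deviation, first on raw values and then on their natural logs. The data are made up (an idealized right-skewed sample, not the car sales data), the function name `ks_statistic` is hypothetical, and no Sig. value is computed, because the Lilliefors correction used by SPSS needs its own critical values.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_statistic(values):
    """Largest gap between the empirical CDF and a normal CDF
    fitted to the sample mean and standard deviation."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    gap = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF steps from (i - 1) / n up to i / n at x.
        gap = max(gap, i / n - cdf, cdf - (i - 1) / n)
    return gap


# Made-up, idealized right-skewed sample: exp() of evenly spaced normal quantiles.
n = 50
z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
raw = [math.exp(v) for v in z]
logged = [math.log(v) for v in raw]

d_raw = ks_statistic(raw)
d_log = ks_statistic(logged)
print(f"K-S statistic, raw values: {d_raw:.3f}")
print(f"K-S statistic, log values: {d_log:.3f}")
```

On this toy sample the statistic shrinks sharply after the log transform, which is the same direction of change seen between Table 1 (.218) and Table 2 (.066).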

In Table 1 I tested whether Sales was normally distributed. Both the Kolmogorov-Smirnov (K-S) and Shapiro-Wilk (S-W) tests were significant (Sig. = .000), which means that Sales departs significantly from a normal distribution. I then repeated the tests using the Natural Log of Sales (ln_of_sales), shown in Table 2. This time the K-S test was not significant (Sig. = .093, which is above .05), so the normality assumption is met on that test, although the S-W test remained significant (Sig. = .000).

2. & 3. Choosing an appropriate regression “method”, identifying the “best fitting” regression model. Explain your rationale behind (a) your choice of the regression method, and (b) your choice of predictors. Specify the assumptions underlying linear regression, and explain, using logic and/or evidence whether each one of them is supported.

I then ran a linear regression with each of the methods (Enter, Stepwise, Remove, Backward, and Forward) to help determine the “best fitting” regression model. I wanted to test all of them in order to see which method would be the most logical to use. My dependent variable was Log of Sales (ln_of_sales) and the independent variables were Price in Thousands, Engine Size, Horsepower, Wheelbase, Width, Length, Curb Weight, Fuel Capacity, Fuel Efficiency, and 4-Year Resale Value. Below I show the Model Summary and Coefficients from each method to explain why the method is or is not the “best fitting”.

“Enter is a method in which all predictors are forced into the model simultaneously and relies on good theoretical reasons for including the chosen predictors” (Field, 2009, p. 212).

Table 3: Method - Entered, Model Summary(b)

Model  R        R Square  Adjusted  Std. Error of  R Square  F Change  df1  df2  Sig. F   Durbin-
                          R Square  the Estimate   Change                        Change   Watson
1      .634(a)  .402      .345      1.08253        .402      7.117     10   106  .000     1.404

a. Predictors: (Constant), Fuel efficiency, 4-year resale value, Length, Width, Engine size, Fuel capacity, Wheelbase, Curb weight, Horsepower, Price in thousands
b. Dependent Variable: Natural Log of Sales
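The relationship between the R Square, Adjusted R Square, and F Change values in Table 3 can be checked with a few lines of arithmetic. The sketch below uses the standard formulas for adjusted R Square and the overall F test; the function names are hypothetical, and the case count of 117 is inferred from the degrees of freedom in the table (df1 + df2 + 1), not stated in the output. Small differences from the table are expected because R Square is rounded to three decimals.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R Square for a model with k predictors fitted to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def overall_f(r2, k, df_resid):
    """F statistic testing whether all k predictors together explain variance."""
    return (r2 / k) / ((1 - r2) / df_resid)


# Values transcribed from Table 3 (Enter method).
r2, k, df_resid = 0.402, 10, 106
n = k + df_resid + 1  # inferred number of cases used in the regression

print(f"cases used: {n}")
print(f"adjusted R Square: {adjusted_r2(r2, n, k):.3f}")  # table reports .345
print(f"F: {overall_f(r2, k, df_resid):.3f}")             # table reports 7.117
```

The inferred 117 cases is smaller than the 157 in the normality tables, presumably because cases with missing values on any predictor were left out of the regression.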

In this model summary you can see that the ten independent variables (predictors) listed above together account for about 40% of the variation in the log of sales (R Square = .402), or about 34.5% once the number of predictors is taken into account (Adjusted R Square = .345).

Table 4: Method - Entered, Coefficients(a)

B and Std. Error are the unstandardized coefficients, Beta is the standardized coefficient, Lower Bound and Upper Bound give the 95.0% confidence interval for B, and Tolerance and VIF are the collinearity statistics.

Model 1              B        Std. Error  Beta    t        Sig.   Lower Bound  Upper Bound  Tolerance  VIF
(Constant)           -1.097   3.206               -.342    .733   -7.453       5.259
Price in thousands   -.036    .036        -.385   -1.015   .312   -.107        .035         .039       25.464
4-year resale value  -.012    .038        -.101   -.311    .757   -.086        .063         .053       18.890
Engine size          .310     .260        .244    1.190    .237   -.206        .826         .134       7.474
Horsepower           -.003    .006        -.118   -.470    .639   -.014        .009         .090       11.095
Wheelbase            .093     .030        .559    3.111    .002   .034         .152         .175       5.718
Width                -.026    .052        -.068   -.492    .624   -.129        .078         .298       3.352
Length               -.018    .018        -.188   -1.008   .316   -.054        .018         .163       6.149
Curb weight          .262     .495        .117    .530     .597   -.718        1.242        .116       8.633
Fuel capacity        -.059    .062        -.166   -.949    .345   -.181        .064         .184       5.437
Fuel efficiency      .026     .049        .087    .538     .592   -.071        .123         .217       4.602

a. Dependent Variable: Natural Log of Sales
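The Tolerance and VIF columns of Table 4 carry the same information, since VIF is the reciprocal of tolerance. The sketch below checks that relationship on the table's values and lists the predictors whose VIF exceeds 10. The cutoff of 10 is a commonly quoted rule of thumb that I am assuming here, not a value taken from the SPSS output, and the variable names are hypothetical.

```python
# (Tolerance, VIF) pairs transcribed from Table 4.
collinearity = {
    "Price in thousands": (0.039, 25.464),
    "4-year resale value": (0.053, 18.890),
    "Engine size": (0.134, 7.474),
    "Horsepower": (0.090, 11.095),
    "Wheelbase": (0.175, 5.718),
    "Width": (0.298, 3.352),
    "Length": (0.163, 6.149),
    "Curb weight": (0.116, 8.633),
    "Fuel capacity": (0.184, 5.437),
    "Fuel efficiency": (0.217, 4.602),
}

VIF_CUTOFF = 10  # assumed rule-of-thumb threshold

for name, (tolerance, vif) in collinearity.items():
    # 1 / tolerance differs slightly from VIF only because of rounding.
    print(f"{name:20s} VIF={vif:7.3f}  1/tolerance={1 / tolerance:7.3f}")

flagged = [name for name, (_, vif) in collinearity.items() if vif > VIF_CUTOFF]
print("VIF above cutoff:", flagged)
```

Under that assumed cutoff, Price in thousands, 4-year resale value, and Horsepower stand out as the predictors most affected by multicollinearity in the Enter model.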

In this coefficient table, the Sig. column shows that Wheelbase (Sig. = .002) is the only predictor with a value below .05, so it is the only predictor that makes a significant contribution to the outcome in this model.

“In stepwise regressions decisions about the order in which predictors are entered into the model are based on a purely mathematical criterion. The stepwise method in SPSS is the same as the forward method, except that each time a predictor is added to the equation, a removal test is made of the least useful predictor” (pp. 212-213).

Table 5: Method - Stepwise, Model Summary(c)

Model  R        R Square  Adjusted  Std. Error of  R Square  F Change  df1  df2  Sig. F   Durbin-
                          R Square  the Estimate   Change                        Change   Watson
1      .524(a)  .275      .268      1.14434        .275      43.544    1    115  .000
2      .607(b)  .369      .358      1.07206        .094      17.032    1    114  .000     1.380

a. Predictors: (Constant), 4-year resale value
b. Predictors: (Constant), 4-year resale value, Wheelbase
c. Dependent Variable: Natural Log of Sales
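The F Change column in Table 5 tests whether the predictor added at each step improves the model significantly. A minimal sketch of that calculation, using the usual formula for the F test on a change in R Square, is shown below; the function name is hypothetical and the results differ slightly from the table because the R Square values are rounded.

```python
def f_change(r2_change, r2_after, df1, df2):
    """F statistic for the increase in R Square when df1 predictors are added.

    r2_after is the R Square of the larger model and df2 its residual
    degrees of freedom.
    """
    return (r2_change / df1) / ((1 - r2_after) / df2)


# Values transcribed from Table 5 (Stepwise method).
step1 = f_change(r2_change=0.275, r2_after=0.275, df1=1, df2=115)
step2 = f_change(r2_change=0.094, r2_after=0.369, df1=1, df2=114)

print(f"Model 1 (4-year resale value entered): F change = {step1:.2f}")  # table: 43.544
print(f"Model 2 (Wheelbase added):             F change = {step2:.2f}")  # table: 17.032
```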

In this model summary, the Adjusted R Square shows that 4-year resale value and wheelbase together account for about 36% of the variation in the log of sales (model 2, .358), while 4-year resale value alone accounts for about 27% (model 1, .268). You can also see that the Adjusted R Square is slightly lower than R Square in both models (.268 versus .275, and .358 versus .369). A small decrease like this is expected, because the adjustment takes the number of predictors into account.

Table 6: Method - Stepwise, Coefficients(a)

B and Std. Error are the unstandardized coefficients, Beta is the standardized coefficient, Lower Bound and Upper Bound give the 95.0% confidence interval for B, and Tolerance and VIF are the collinearity statistics.

Model  Predictor            B        Std. Error  Beta    t        Sig.   Lower Bound  Upper Bound  Tolerance  VIF
1      (Constant)           4.491    .196                22.903   .000   4.102        4.879
       4-year resale value  -.060    .009        -.524   -6.599   .000   -.079        -.042        1.000      1.000
2      (Constant)           -1.028   1.350               -.762    .448   -3.702       1.646
       4-year resale value  -.059    .009        -.508   -6.812   .000   -.076        -.041        .997       1.003
       Wheelbase            .051     .012        .307    4.127    .000   .027         .076         .997       1.003

a. Dependent Variable: Natural Log of Sales

In this coefficient table, the Sig. column shows a value of .448 for the constant in model 2. This is above .05, so the constant is not significantly different from zero, but it says nothing about the predictors themselves. The rest of the Sig. column is .000, so both 4-year resale value and Wheelbase are significant predictors of the outcome. Also note that the VIF values are close to 1 (1.003), which indicates that multicollinearity is not a problem in this model.

I found that I could not use the Remove method on its own, because a warning popped up on my screen telling me that REMOVE cannot be used as the first method when building an equation. The warning box is reproduced below.

Warnings
Invalid REGRESSION METHOD subcommand specification--REMOVE cannot be used as the first method when building an equation. REGRESSION has inserted ENTER as the first method; REMOVE is now the second method.

“The backward method is opposite of the forward method in that the computer begins by placing all predictors in the model and then calculating the contribution of each one by looking at the significance value of the t-test for each predictor” (p. 213).

Table 7: Method - Backward, Model Summary(j)

Model  R        R Square  Adjusted  Std. Error of  R Square  F Change  df1  df2  Sig. F   Durbin-
                          R Square  the Estimate   Change                        Change   Watson
1      .634(a)  .402      .345      1.08253        .402      7.117     10   106  .000
2      .633(b)  .401      .351      1.07796        -.001     .097      1    106  .757
3      .632(c)  .400      .355      1.07419        -.001     .247      1    107  .620
4      .631(d)  .398      .360      1.07068        -.002     .289      1    108  .592
5      .629(e)  .396      .363      1.06766        -.002     .380      1    109  .539
6      .628(f)  .394      .367      1.06480        -.002     .405      1    110  .526
7      .621(g)  .386      .364      1.06719        -.008     1.504     1    111  .223
8      .618(h)  .382      .365      1.06590        -.004     .726      1    112  .396
9      .613(i)  .376      .365      1.06600        -.006     1.021     1    113  .314     1.379

a. Predictors: (Constant), 4-year resale value, Length, Fuel efficiency, Width, Engine size, Fuel capacity, Wheelbase, Curb weight, Horsepower, Price in thousands
b. Predictors: (Constant), Length, Fuel efficiency, Width, Engine size, Fuel capacity, Wheelbase, Curb weight, Horsepower, Price in thousands
c. Predictors: (Constant), Length, Fuel efficiency, Width, Engine size, Fuel capacity, Wheelbase, Curb weight, Price in thousands
d. Predictors: (Constant), Length, Fuel efficiency, Engine size, Fuel capacity, Wheelbase, Curb weight, Price in thousands
e. Predictors: (Constant), Length, Engine size, Fuel capacity, Wheelbase, Curb weight, Price in thousands
f. Predictors: (Constant), Length, Engine size, Fuel capacity, Wheelbase, Price in thousands
g. Predictors: (Constant), Engine size, Fuel capacity, Wheelbase, Price in thousands
h. Predictors: (Constant), Fuel capacity, Wheelbase, Price in thousands
i. Predictors: (Constant), Wheelbase, Price in thousands
j. Dependent Variable: Natural Log of Sales

In this model summary, the Adjusted R Square rounds to 37% for models 6, 8, and 9 (.367, .365, and .365). Removing eight of the ten predictors lowers R Square only from .402 to .376.

Table 8: Method - Backward, Coefficients(a)

B and Std. Error are the unstandardized coefficients, Beta is the standardized coefficient, and Lower Bound and Upper Bound give the 95.0% confidence interval for B.

Model  Predictor            B        Std. Error  Beta    t        Sig.   Lower Bound  Upper Bound
1      (Constant)           -1.097   3.206               -.342    .733   -7.453       5.259
       Price in thousands   -.036    .036        -.385   -1.015   .312   -.107        .035
       Engine size          .310     .260        .244    1.190    .237   -.206        .826
       Horsepower           -.003    .006        -.118   -.470    .639   -.014        .009
       Wheelbase            .093     .030        .559    3.111    .002   .034         .152
       Width                -.026    .052        -.068   -.492    .624   -.129        .078
       Length               -.018    .018        -.188   -1.008   .316   -.054        .018
       Curb weight          .262     .495        .117    .530     .597   -.718        1.242
       Fuel capacity        -.059    .062        -.166   -.949    .345   -.181        .064
       Fuel efficiency      .026     .049        .087    .538     .592   -.071        .123
       4-year resale value  -.012    .038        -.101   -.311    .757   -.086        .063
2      (Constant)           -1.301   3.125               -.416    .678   -7.496       4.894
       Price in thousands   -.046    .017        -.489   -2.793   .006   -.079        -.013
       Engine size          .323     .256        .255    1.264    .209   -.184        .830
       Horsepower           -.003    .006        -.124   -.497    .620   -.014        .008
       Wheelbase            .092     .030        .553    3.108    .002   .033         .151
       Width                -.027    .052        -.071   -.516    .607   -.129        .076
       Length               -.017    .017        -.175   -.968    .335   -.052        .018
       Curb weight          .317     .460        .141    .689     .493   -.595        1.229
       Fuel capacity        -.062    .060        -.176   -1.027   .307   -.182        .058
       Fuel efficiency      .029     .048        .095    .599     .551   -.067        .124
3      (Constant)           -1.344   3.113               -.432    .667   -7.514       4.826
       Price in thousands   -.053    .010        -.557   -5.065   .000   -.073        -.032
       Engine size          .238     .188        .187    1.262    .210   -.135        .611
       Wheelbase            .094     .029        .564    3.210    .002   .036         .152
       Width                -.028    .052        -.073   -.537    .592   -.130        .075
       Length               -.019    .017        -.199   -1.147   .254   -.052        .014
       Curb weight          .377     .442        .168    .853     .395   -.499        1.254
       Fuel capacity        -.062    .060        -.175   -1.024   .308   -.181        .058
       Fuel efficiency      .031     .048        .103    .653     .515   -.063        .126
4      (Constant)           -2.502   2.239               -1.117   .266   -6.940       1.936
       Price in thousands   -.052    .010        -.547   -5.062   .000   -.072        -.031
       Engine size          .204     .177        .161    1.153    .251   -.147        .555
       Wheelbase            .094     .029        .565    3.224    .002   .036         .152
       Length               -.022    .016        -.228   -1.380   .170   -.054        .010
       Curb weight          .354     .439        .158    .806     .422   -.516        1.223
       Fuel capacity        -.068    .059        -.192   -1.150   .253   -.185        .049
       Fuel efficiency      .029     .047        .096    .617     .539   -.065        .123
5      (Constant)           -1.553   1.622               -.958    .340   -4.767       1.661
       Price in thousands   -.051    .010        -.539   -5.039   .000   -.071        -.031
       Engine size          .167     .166        .132    1.006    .316   -.162        .496
       Wheelbase            .096     .029        .579    3.340    .001   .039         .153
       Length               -.021    .016        -.218   -1.331   .186   -.052        .010
       Curb weight          .262     .411        .117    .637     .526   -.553        1.077
       Fuel capacity        -.083    .053        -.236   -1.556   .123   -.189        .023
6      (Constant)           -1.771   1.581               -1.120   .265   -4.904       1.362
       Price in thousands   -.050    .010        -.524   -5.037   .000   -.069        -.030
       Engine size          .199     .158        .157    1.256    .212   -.115        .512
       Wheelbase            .098     .029        .589    3.425    .001   .041         .155
       Length               -.019    .015        -.196   -1.226   .223   -.049        .012
       Fuel capacity        -.063    .042        -.177   -1.473   .143   -.147        .022
7      (Constant)           -2.338   1.515               -1.542   .126   -5.340       .665
       Price in thousands   -.050    .010        -.525   -5.043   .000   -.069        -.030
       Engine size          .125     .147        .099    .852     .396   -.166        .415
       Wheelbase            .070     .017        .422    4.011    .000   .036         .105
       Fuel capacity        -.050    .041        -.141   -1.205   .231   -.131        .032
8      (Constant)           -2.593   1.484               -1.747   .083   -5.532       .347
       Price in thousands   -.045    .008        -.474   -5.595   .000   -.061        -.029
       Wheelbase            .073     .017        .441    4.292    .000   .039         .107
       Fuel capacity        -.040    .040        -.113   -1.011   .314   -.118        .038
9      (Constant)           -1.920   1.326               -1.448   .150   -4.548       .707
       Price in thousands   -.049    .007        -.515   -6.945   .000   -.063        -.035
       Wheelbase            .061     .012        .369    4.980    .000   .037         .086

a. Dependent Variable: Natural Log of Sales
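The order in which the backward method dropped predictors can be traced from the Sig. column of Table 8. The sketch below is only an illustration of the idea: at each step it picks the predictor with the largest Sig. value and removes it if that value is at least a removal threshold. The threshold of .10 is my assumption about the SPSS default (it is not shown in the output), and the names `SIG_BY_MODEL`, `POUT`, and `next_removal` are hypothetical.

```python
# Sig. values for each predictor, transcribed from Table 8 (constant omitted).
SIG_BY_MODEL = {
    1: {"Price in thousands": 0.312, "Engine size": 0.237, "Horsepower": 0.639,
        "Wheelbase": 0.002, "Width": 0.624, "Length": 0.316, "Curb weight": 0.597,
        "Fuel capacity": 0.345, "Fuel efficiency": 0.592,
        "4-year resale value": 0.757},
    2: {"Price in thousands": 0.006, "Engine size": 0.209, "Horsepower": 0.620,
        "Wheelbase": 0.002, "Width": 0.607, "Length": 0.335, "Curb weight": 0.493,
        "Fuel capacity": 0.307, "Fuel efficiency": 0.551},
    3: {"Price in thousands": 0.000, "Engine size": 0.210, "Wheelbase": 0.002,
        "Width": 0.592, "Length": 0.254, "Curb weight": 0.395,
        "Fuel capacity": 0.308, "Fuel efficiency": 0.515},
    4: {"Price in thousands": 0.000, "Engine size": 0.251, "Wheelbase": 0.002,
        "Length": 0.170, "Curb weight": 0.422, "Fuel capacity": 0.253,
        "Fuel efficiency": 0.539},
    5: {"Price in thousands": 0.000, "Engine size": 0.316, "Wheelbase": 0.001,
        "Length": 0.186, "Curb weight": 0.526, "Fuel capacity": 0.123},
    6: {"Price in thousands": 0.000, "Engine size": 0.212, "Wheelbase": 0.001,
        "Length": 0.223, "Fuel capacity": 0.143},
    7: {"Price in thousands": 0.000, "Engine size": 0.396, "Wheelbase": 0.000,
        "Fuel capacity": 0.231},
    8: {"Price in thousands": 0.000, "Wheelbase": 0.000, "Fuel capacity": 0.314},
    9: {"Price in thousands": 0.000, "Wheelbase": 0.000},
}

POUT = 0.10  # assumed removal threshold, not stated in the assignment


def next_removal(sig, pout=POUT):
    """Return the predictor with the largest Sig. if it is at least pout, else None."""
    name, p = max(sig.items(), key=lambda item: item[1])
    return name if p >= pout else None


removal_order = [next_removal(SIG_BY_MODEL[m]) for m in sorted(SIG_BY_MODEL)]
for model, removed in zip(sorted(SIG_BY_MODEL), removal_order):
    print(f"Model {model}: remove {removed}")
```

Under these assumptions the traced order (4-year resale value, Horsepower, Width, Fuel efficiency, Curb weight, Length, Engine size, Fuel capacity) matches the predictor lists in the footnotes of Table 7, and nothing is removed from model 9.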

In this coefficient table, the Sig. column shows that Wheelbase is a good predictor of the outcome in every model: its value is always below .05 (.002 in models 1 to 4, .001 in models 5 and 6, and .000 in models 7 to 9). Price in thousands also becomes significant from model 2 onward (.006, then .000), after 4-year resale value has been removed.

“In a forward method, an initial model only contains the constant (b0). The computer (SPSS) then searches for the predictor, which best predicts the outcome variable” (p. 212).

Table 9: Method - Forward, Model Summary(c)

Model  R        R Square  Adjusted  Std. Error of  R Square  F Change  df1  df2  Sig. F   Durbin-
                          R Square  the Estimate   Change                        Change   Watson
1      .524(a)  .275      .268      1.14434        .275      43.544    1    115  .000
2      .607(b)  .369      .358      1.07206        .094      17.032    1    114  .000     1.380

a. Predictors: (Constant), 4-year resale value
b. Predictors: (Constant), 4-year resale value, Wheelbase
c. Dependent Variable: Natural Log of Sales

In this model summary you can see that 4-year resale value and wheelbase together account for about 37% of the variation in the log of sales (R Square = .369, Adjusted R Square = .358), while 4-year resale value alone accounts for about 28% (R Square = .275, Adjusted R Square = .268). These values are the same as the stepwise results in Table 5.

Table 10: Method - Forward, Coefficients(a)

B and Std. Error are the unstandardized coefficients, Beta is the standardized coefficient, and Lower Bound and Upper Bound give the 95.0% confidence interval for B.

Model  Predictor            B        Std. Error  Beta    t        Sig.   Lower Bound  Upper Bound
1      (Constant)           4.491    .196                22.903   .000   4.102        4.879
       4-year resale value  -.060    .009        -.524   -6.599   .000   -.079        -.042
2      (Constant)           -1.028   1.350               -.762    .448   -3.702       1.646
       4-year resale value  -.059    .009        -.508   -6.812   .000   -.076        -.041
       Wheelbase            .051     .012        .307    4.127    .000   .027         .076

a. Dependent Variable: Natural Log of Sales

In this coefficient table, the Sig. column shows a value of .448 for the constant in model 2, which is above .05, so the constant is not significantly different from zero. The rest of the Sig. column is .000, so both 4-year resale value and Wheelbase are significant predictors.

After running all of the methods I also checked the normality assumption using their P-P plots and found that the models appeared both accurate for the sample and generalizable to the population. Each method also supported an assumption relating to linear regression, which was noted underneath each table.

As to which method best represented a fitted model, I found that Enter was the “best fitting” model. Compared with the other methods, it showed that one specific predictor, wheelbase, was the most significant even with all of the other predictors in the model. Another key factor is the Durbin-Watson statistic, found in Table 3: Method - Entered. The D-W is 1.404. Values close to 2 indicate uncorrelated residuals, so a value below 2 indicates a positive correlation between adjacent residuals.
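To make the Durbin-Watson reading more concrete, the sketch below computes the statistic from a list of residuals using its standard definition: the sum of squared differences between successive residuals divided by the sum of squared residuals. The two residual series are made up for illustration and are not taken from the car sales regression; the function name is hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 for uncorrelated residuals,
    below 2 for positive and above 2 for negative autocorrelation."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals: neighbours are similar, so the statistic falls below 2.
drifting = [1.0, 2.0, 3.0, 2.0, 1.0]
# Made-up residuals: neighbours flip sign, so the statistic rises above 2.
alternating = [1.0, -1.0, 1.0, -1.0]

print(f"drifting residuals:    D-W = {durbin_watson(drifting):.3f}")
print(f"alternating residuals: D-W = {durbin_watson(alternating):.3f}")
```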


4. Interpret the results of the “best fitting model” in terms of (a) predicting the outcome variable, and (b) generalizability of the results to the population at large, after defining what constitutes the population from which this sample may have been drawn. Support your responses using evidence drawn from the results of the regression.

“Generalization is a critical additional step and if we find that our model fits the observed data well we can draw conclusions beyond our sample. Generalization is a critical additional step and if we find that our model is not generalizable, then we must restrict any conclusions based on the model to the sample used” (p. 214).

After checking the assumptions (listed above), I found that the Enter method was the most generalizable, because it met more of the assumptions than the other methods. The Enter method supports treating the sample of 157 as representative. For wheelbase, the P-P plot indicates that the distribution is, for the most part, normal.

Figure 1: Entered Wheelbase P-Plot