Linear Regression Simulator 2018

Jonas Borgström∗, Teddy Edlund∗, Jesper Tinell∗, Daniel Olsson∗, Rasmus Säfvenberg∗

May 21, 2018

Contents

1 Introduction
1.1 Purpose
1.2 Background
2 Terminology
3 Instructions
3.1 Interpretations
4 Examples
5 Troubleshooting
6 We out!

∗ Umeå Universitet

1 Introduction

Regression is one of the most common methods in statistics, and linear regression is a simple yet effective choice for analysis and prediction. The main idea behind regression is “to examine two things: (1) does a set of predictor variables do a good job in predicting an outcome (dependent) variable? (2) Which variables in particular are significant predictors of the outcome variable, and in what way do they–indicated by the magnitude and sign of the beta estimates–impact the outcome variable? These regression estimates are used to explain the relationship between one dependent variable and one or more independent variables.” (Source: https://www.statisticssolutions.com/what-is-linear-regression/)

1.1 Purpose


1.2 Background


2 Terminology

• P-value - the probability of getting the observed value or a more extreme value, given that the null hypothesis is true.
• Residual - the difference between the observed value and the value predicted by the model (the fitted value).
• Response variable - the variable that is the basis for the research question.
• Explanatory variable - a variable that is used to explain change in the response variable.
• Correlation - the strength and direction of the linear relationship between two variables.
• Significance level - the probability of committing a type 1 error, i.e. rejecting the null hypothesis despite the fact that it is true.
• Fitted values - values predicted by a model for a specific dataset.
• Theoretical quantiles - quantiles of a normal distribution with mean 0 and standard deviation 1.
• Sample quantiles - quantiles of the model residuals, which can be plotted against the theoretical quantiles to check the normality assumption.
• F-test - tests whether at least one of the explanatory variables has an effect on the response variable.
• T-test - tests whether a single explanatory variable has an effect on the response variable. One t-test is calculated per explanatory variable.
• R-squared - the proportion of the variation in the response variable that is explained by the explanatory variable(s).
• Degrees of freedom - the number of values in the final calculation of a statistic that are free to vary.
• Estimate - the estimated coefficient of an explanatory variable, which shows how the response variable changes when the explanatory variable increases by 1.
• Standard error - the estimated standard deviation of an estimated coefficient.
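Several of the terms above can be made concrete with a small calculation. The following is a minimal sketch in Python, assuming a model with a single explanatory variable; the function name `simple_linear_regression` and the data are made up for illustration and are not taken from the simulator. P-values are left out because they require the t-distribution, which the Python standard library does not provide.

```python
import math

def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by least squares and return the
    quantities defined in the terminology list."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx                                   # estimate
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]       # fitted values
    residuals = [yi - fi for yi, fi in zip(y, fitted)]  # observed minus fitted

    sse = sum(r ** 2 for r in residuals)       # variation left unexplained
    sst = sum((yi - mean_y) ** 2 for yi in y)  # total variation in the response
    df = n - 2                                 # two parameters are estimated
    std_error = math.sqrt(sse / df / sxx)      # standard error of the slope
    return {
        "estimate": slope,
        "intercept": intercept,
        "fitted": fitted,
        "residuals": residuals,
        "correlation": sxy / math.sqrt(sxx * sst),
        "r_squared": 1 - sse / sst,
        "degrees_of_freedom": df,
        "std_error": std_error,
        "t_statistic": slope / std_error,
    }

# Made-up illustration data (hypothetical, not one of the simulator's datasets).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
result = simple_linear_regression(x, y)
print("estimate:", round(result["estimate"], 3))
print("R-squared:", round(result["r_squared"], 4))
print("t-statistic:", round(result["t_statistic"], 2))
```

With one explanatory variable, R-squared equals the squared correlation between the two variables, which is a quick way to sanity-check the output.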

3 Instructions

The tool is meant to be straightforward to use:
1. Begin by either uploading your own dataset or choosing one of the three available datasets.
2. Choose the variable you wish to examine as your response variable.
3. Choose one or more explanatory variables. These are used to construct a linear model with the chosen response variable as y and the explanatory variables as x_i, i = 1, 2, ..., p, where p is the number of explanatory variables chosen.
4. Once these are chosen, a Q-Q plot (used to check the normality assumption) and a residuals vs fitted values plot (used to check the mean 0 and equal variance assumptions) are shown. It is also possible to show a correlation matrix of the variables in the dataset.
5. Now it is your job to decide whether the linear model is an appropriate choice for the selected variables or whether there are better alternatives.
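What happens behind steps 2 to 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the simulator's actual implementation. The helper names (`solve`, `fit_linear_model`, `qq_points`) and the data are hypothetical, and the plotting positions (i + 0.5) / n are one common choice among several.

```python
from statistics import NormalDist

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_linear_model(columns, y):
    """Least-squares fit of y on the explanatory columns plus an intercept."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]  # design matrix
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return beta, fitted, residuals

def qq_points(residuals):
    """Pair standard normal quantiles with the sorted residuals (sample quantiles)."""
    n = len(residuals)
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(residuals)))

# Hypothetical data: response y and two explanatory variables x1, x2.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [6.1, 5.4, 9.2, 10.3, 13.9, 15.6]

beta, fitted, residuals = fit_linear_model([x1, x2], y)
print("coefficients (intercept, x1, x2):", [round(b, 3) for b in beta])
for f, r in zip(fitted, residuals):  # points of the residuals vs fitted plot
    print(round(f, 2), round(r, 2))
for t, s in qq_points(residuals):    # points of the Q-Q plot
    print(round(t, 2), round(s, 2))
```

Solving the normal equations directly keeps the sketch free of dependencies. A statistics package would normally use a numerically more stable method and also report standard errors and p-values.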

3.1 Interpretations

Normal Q-Q Plot: The normality assumption is fulfilled if the points approximately follow the normality line. If the points deviate from the line in the tails (“heavier” tails), the normality assumption is most likely not fulfilled.

Fitted values vs Residuals: The assumption of equal variance is fulfilled if the points do not follow a pattern such as the “banana pattern”, while the mean 0 assumption is fulfilled if the points are evenly distributed above and below the zero line.

P-value (F-test): If the p-value is below the given alpha level, the null hypothesis is rejected. One interpretation could be: “We have empirical evidence at the given alpha level that at least one of the explanatory variables has an effect on the response variable.”

P-value (T-test): If the p-value is below the given alpha level, the null hypothesis is rejected. One interpretation could be: “We have empirical evidence at the given alpha level that this explanatory variable has an effect on the response variable, given that the other variables are included in the model.”

Coefficients: Show how the response variable changes when an explanatory variable increases by one unit. “Each time the explanatory variable increases by one unit, the response variable changes by the value of the estimated coefficient, given that the other variables in the model are held constant.”
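The decision rule for the p-values and the wording of the interpretations can be written down as a short Python sketch. The function names and the default alpha level of 0.05 are assumptions made for this illustration. The simulator lets you choose your own alpha level, and the sentences are only one possible phrasing.

```python
def reject_null(p_value, alpha=0.05):
    """Return True when the p-value is below the chosen alpha level."""
    return p_value < alpha

def interpret_f_test(p_value, alpha=0.05):
    """Phrase the outcome of the F-test (all explanatory variables jointly)."""
    if reject_null(p_value, alpha):
        return (f"We have empirical evidence at alpha = {alpha} that at least one "
                "of the explanatory variables has an effect on the response variable.")
    return (f"We have no evidence at alpha = {alpha} that any of the explanatory "
            "variables has an effect on the response variable.")

def interpret_t_test(variable, p_value, alpha=0.05):
    """Phrase the outcome of the t-test for a single explanatory variable."""
    if reject_null(p_value, alpha):
        return (f"We have empirical evidence at alpha = {alpha} that {variable} "
                "has an effect on the response variable.")
    return (f"We have no evidence at alpha = {alpha} that {variable} "
            "has an effect on the response variable.")

def interpret_coefficient(variable, estimate):
    """Phrase the meaning of an estimated coefficient."""
    direction = "increases" if estimate >= 0 else "decreases"
    return (f"Each time {variable} increases by one unit, the response variable "
            f"{direction} by {abs(estimate)} units, given that the other "
            "variables in the model are held constant.")

# Hypothetical values, as they might be read off a regression summary.
print(interpret_f_test(0.003))
print(interpret_t_test("height", 0.21))
print(interpret_coefficient("height", -1.4))
```

Note that failing to reject the null hypothesis is phrased as a lack of evidence, not as proof that there is no effect.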

4 Examples

5 Troubleshooting

6 We out!
