CHAPTER 14: Using Regression

For this procedure to work, the initial probit model needs to include at least some variables that do not appear in the wage equation, so that the selection correction can be identified separately from the wage equation itself. This is not always easy, because the variables that affect what wage a person gets are also likely to influence whether that person works for a wage at all.
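To make the two-step logic concrete, here is a minimal Python sketch (not from the handbook) using synthetic data. The variable names are hypothetical; n_young_children plays the role of a variable that enters the participation probit but is excluded from the wage equation, and the inverse Mills ratio from the probit is then added to the wage regression.

```python
# Illustrative sketch with synthetic data: a two-step selection correction.
# n_young_children appears in the participation probit but NOT in the wage
# equation, providing the exclusion restriction discussed in the text.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 2000

educ = rng.normal(10, 3, n)             # years of schooling
exper = rng.normal(15, 8, n)            # years of work experience
n_young_children = rng.poisson(1.0, n)  # excluded from the wage equation

# Correlated errors in participation and wages create the selection problem
u, e = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], n).T
works = (0.5 + 0.15 * educ - 0.6 * n_young_children + u > 0)
log_wage = 1.0 + 0.08 * educ + 0.02 * exper + e

# Step 1: probit for labor-force participation, including the excluded variable
Z = sm.add_constant(np.column_stack([educ, n_young_children]))
probit_res = sm.Probit(works.astype(int), Z).fit(disp=False)

# Inverse Mills ratio computed from the probit linear index
zb = Z @ probit_res.params
imr = norm.pdf(zb) / norm.cdf(zb)

# Step 2: wage regression on the selected sample, adding the Mills ratio
X = sm.add_constant(np.column_stack([educ, exper, imr]))
ols_res = sm.OLS(log_wage[works], X[works]).fit()
print(ols_res.summary())
```

If n_young_children were dropped from the probit, the correction would be identified only by the nonlinearity of the Mills ratio, which is exactly the fragility the paragraph above warns about.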

Multicollinearity

One of the most common problems in regression analysis is that the right-hand-side variables may be correlated with one another. In this case we have multicollinearity, and the problem is that the estimated coefficients can be quite imprecise, even though the equation itself may fit well.

To see how this might occur, suppose that we are interested in modeling the determinants of the number of years of schooling that girls get (y). We believe that a girl will tend to get more schooling if her mother has more education (ME), if her father has more education (FE), or if she lives in an urban area (URB). A simple regression model would then look like this:

$$y_i = \beta_0 + \beta_1 ME_i + \beta_2 FE_i + \beta_3 URB_i + \varepsilon_i.$$

Many studies have found that the education of the mother has a strong influence on whether a girl goes to school. However, educated people tend to marry each other (assortative mating), so a high level of ME will be associated with a high level of FE. This makes it particularly difficult to disentangle the effect of ME on y from the effect of FE on y. If our only interest is in the value of β3, this may not be troublesome, but frequently we cannot get out of the dilemma so easily. And because ME and FE really do affect y, the fit of the equation is likely to be good. When an equation fits well but the coefficients are not statistically significant, it is appropriate to suspect that multicollinearity is at work. It is a good idea to look at the simple correlation coefficients among the various independent variables; if any of these are greater than about 0.5 in absolute value, then multicollinearity is likely to be a problem.

In the extreme case in which $ME_i = \gamma FE_i$ exactly, we are unable to measure either β1 or β2 correctly. Substituting for $ME_i$ gives the following:

$$y_i = \beta_0 + \beta_1(\gamma FE_i) + \beta_2 FE_i + \beta_3 URB_i + \varepsilon_i = \beta_0 + (\beta_1\gamma + \beta_2) FE_i + \beta_3 URB_i + \varepsilon_i.$$

In this regression we have left out ME, but the coefficient on FE now bundles together both parents' effects (β1γ + β2), so it no longer measures β2 alone. In other words, dropping a variable that is collinear with the others does not solve the problem. Indeed, there is no easy solution to multicollinearity; the best hope is more, or perhaps more accurate, data, but finding such data is easier said than done.
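As a concrete illustration, the following Python sketch (not from the handbook; all parameter values are invented) simulates schooling data in which ME and FE are highly correlated and then fits the regression above. The overall fit is good, yet the standard errors on ME and FE are inflated, which is the symptom described in the text.

```python
# Illustrative sketch with synthetic data: assortative mating makes ME and FE
# highly correlated, so the equation fits well but the individual coefficients
# on ME and FE are estimated imprecisely.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500

ME = rng.normal(8, 3, n)                 # mother's years of education
FE = 0.9 * ME + rng.normal(0, 1, n)      # father's education tracks mother's
URB = rng.binomial(1, 0.4, n)            # urban residence dummy
y = 2 + 0.3 * ME + 0.2 * FE + 1.5 * URB + rng.normal(0, 2, n)

X = pd.DataFrame({"ME": ME, "FE": FE, "URB": URB})
print(X.corr().round(2))     # pairwise correlations: corr(ME, FE) is near 1

res = sm.OLS(y, sm.add_constant(X)).fit()
print(res.summary())         # decent R-squared, but wide confidence intervals
                             # on ME and FE individually
```

Checking the pairwise correlations (or variance inflation factors) among the regressors before interpreting individual coefficients is the practical lesson here.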
