How to Harness the Power of Data and Inference | 413
There are various approaches in the machine learning literature. This section highlights four categories of models that are commonly used: robust models, penalized regression models, nonlinear models, and tree-based models. These categories rely on different optimization algorithms and can be contrasted with traditional PMT models. Box 6.8 presents a brief, high-level overview of the modeling philosophy behind each category, rather than an explanation of the differences between individual algorithms. A fuller treatment is provided in Areias et al. (forthcoming), based on James et al. (2013) and Kuhn and Johnson (2018).
BOX 6.8
Machine Learning Models That Are Commonly Used

Robust Models

Linear models (ordinary least squares [OLS] regressions, commonly used in proxy means testing) can be overly influenced by outliers. If these outliers violate the assumptions on which linear models are based, the resulting performance of the linear model can be poor. Robust models are "robust" (less sensitive) to outliers. While they are computationally more intensive, robust models have less restrictive assumptions and thus can perform better across a wider range of data.a The main algorithms are robust linear regression and various quantile regressions with different methods for variable and quantile selection.
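To make the contrast with OLS concrete, the following sketch fits a Huber-type robust linear regression by iteratively reweighted least squares in plain numpy. The function name, the synthetic "asset index" data, and the single injected outlier are illustrative assumptions, not from the source; production work would use a tested library implementation.

```python
import numpy as np

def huber_irls(x, y, delta=1.345, n_iter=50):
    """Illustrative sketch: robust linear fit with Huber weights via
    iteratively reweighted least squares (IRLS). Not production code."""
    X1 = np.column_stack([np.ones(len(y)), x])          # add intercept column
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]        # start from the OLS fit
    for _ in range(n_iter):
        r = y - X1 @ beta
        # Rough robust scale estimate (MAD-style); intercept absorbs the median
        scale = np.median(np.abs(r)) / 0.6745 + 1e-12
        u = r / scale
        # Huber weights: full weight for small residuals, downweight large ones
        w = np.where(np.abs(u) <= delta, 1.0, delta / np.abs(u))
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)[0]
    return beta

# Toy data (assumed): true relation y = 2 + 3 * asset_index, one extreme outlier
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = 2 + 3 * x + rng.normal(0, 0.1, 50)
y[0] += 20  # a single large outlier

ols = np.linalg.lstsq(np.column_stack([np.ones(50), x]), y, rcond=None)[0]
rob = huber_irls(x, y)
```

The robust fit downweights the outlying observation, so its coefficients stay close to the true values while the OLS coefficients are pulled away, which is the sensitivity the paragraph above describes.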
Penalized Regressions

Penalized regression models use shrinkage methods, in that they shrink the OLS coefficient estimates toward zero by introducing a penalty term on the coefficients. The objective of shrinkage methods is to reduce the variance of the models significantly with only a small increase in bias, thus reducing overall model error. This can be the case particularly when multicollinearity between explanatory variables is high. The basic approach is the same as OLS but introduces a penalty on the size of the coefficient (the degree to which a variable explains the outcome of interest). If a variable does not significantly improve the model, its role in the model is reduced. The main algorithms are lasso regression, ridge regression, and elastic net regression.
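As a minimal sketch of the shrinkage idea, the closed-form ridge estimator below adds a penalty term lam on the squared size of the coefficients, and is run on two deliberately collinear predictors (the high-multicollinearity case mentioned above). The data, the standardization choices, and the function name are illustrative assumptions, not from the source.

```python
import numpy as np

def ridge(X, y, lam):
    """Illustrative sketch: closed-form ridge regression minimizing
    ||y - X b||^2 + lam * ||b||^2. Assumes X and y are centered so the
    (unpenalized) intercept can be dropped."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Toy data (assumed): two nearly identical predictors, true coefficients (1, 1)
rng = np.random.default_rng(1)
n = 100
z = rng.normal(size=n)
X = np.column_stack([z + 0.01 * rng.normal(size=n),
                     z + 0.01 * rng.normal(size=n)])
X -= X.mean(axis=0)
y = X @ np.array([1.0, 1.0]) + rng.normal(0, 0.5, n)
y -= y.mean()

b_ols = ridge(X, y, 0.0)     # lam = 0 reduces to ordinary least squares
b_ridge = ridge(X, y, 10.0)  # the penalty shrinks the coefficients toward zero
```

With collinear predictors the OLS coefficients are unstable (high variance), while the penalty pulls the estimates toward zero and toward each other; increasing lam shrinks the coefficient vector further, trading a small bias for a large variance reduction, as the paragraph above explains.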