382 | Revisiting Targeting in Social Assistance
BOX 6.5
PMTs Are a Predictive Model Exercise, Not a Causal Effect One

In general, statistical models are powerful tools for causal explanation, prediction, and description of data. A great deal of research and econometric analysis uses statistical modeling to test causal claims. In a regression, the causal claim follows a simple structure in which each of the covariates (called independent variables) is assumed to have a causal influence (the regression coefficient) on the dependent variable (for example, income or consumption per capita). The models are based on the assumptions that covariates cannot have any causal influence on one another and that there is no reciprocal causal influence from the dependent variable to any of the covariates.

In the targeting world, similar models are used most of the time for their ability to predict what income or consumption would be when it is not measured. The inference is not causal but rather about association. Hence, the strong underlying assumptions needed to determine causality are either not needed or are incorporated in a less formal way. Consequently, the best model is not the one with high explanatory power, or R², but the one with high predictive power, which is quite different.

Shmueli (2010) highlights the main differences between explanatory and predictive modeling. First, predictive models tend to have higher predictive accuracy than explanatory statistical models. Second, predictive models (1) look for association between x (the covariates) and y (the dependent variable), (2) do not require direct interpretability of the relationship between x and y, (3) take a forward-looking approach instead of testing an existing set of hypotheses, and (4) seek to reduce jointly the combination of bias (the result of misspecification of the model) and estimation variance (the result of using a sample). Addressing these points in predictive models translates into a different approach for selecting the covariates.
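The distinction between explanatory power and predictive power can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from the text: the simulated data, the coefficient values, the sample sizes, and the function names (`fit_ols`, `simulate`, `compare`) are all hypothetical. The sketch fits ordinary least squares to a small training sample twice, once with three informative proxies and once with those proxies plus a set of irrelevant covariates, and reports R² both on the training sample and on a separate holdout sample.

```python
import random


def fit_ols(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda idx: abs(A[idx][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(beta, X):
    """Fitted values: intercept plus the weighted sum of covariates."""
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]


def r_squared(y, y_hat):
    """Share of the variance of y accounted for by y_hat (can be negative out of sample)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def simulate(n, n_noise, rng):
    """Hypothetical data: a welfare measure driven by 3 proxies, plus irrelevant columns."""
    X, y = [], []
    for _ in range(n):
        signal = [rng.gauss(0, 1) for _ in range(3)]
        noise = [rng.gauss(0, 1) for _ in range(n_noise)]
        y.append(1.0 + 0.8 * signal[0] + 0.5 * signal[1]
                 - 0.3 * signal[2] + rng.gauss(0, 1))
        X.append(signal + noise)
    return X, y


def compare(seed=0, n_train=60, n_test=2000, n_noise=25):
    """Fit a small and a large model on the training data; score both samples."""
    rng = random.Random(seed)
    X_tr, y_tr = simulate(n_train, n_noise, rng)
    X_te, y_te = simulate(n_test, n_noise, rng)
    results = {}
    for name, k in (("small", 3), ("large", 3 + n_noise)):
        tr_k = [r[:k] for r in X_tr]  # keep only the first k covariates
        te_k = [r[:k] for r in X_te]
        beta = fit_ols(tr_k, y_tr)
        results[name] = {
            "in_sample_r2": r_squared(y_tr, predict(beta, tr_k)),
            "holdout_r2": r_squared(y_te, predict(beta, te_k)),
        }
    return results


if __name__ == "__main__":
    for name, scores in compare().items():
        print(name, scores)
```

Because the small model is nested in the large one, the large model's in-sample R² can never be lower. Its holdout R² carries no such guarantee and, with few training observations and many irrelevant covariates, will often be lower. This is the sense in which a model chosen for in-sample fit need not be the one that predicts best for households outside the estimation sample.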
In building a model for proxy means testing (PMT), the aim is to find correlations and associations rather than to look for causal structure, endogeneity, or reverse causality. The main criteria for selecting the set of covariates are the quality of the association between them and the dependent variable, as well as preexisting knowledge of correlation or association that does not necessarily come from the data set but from other studies or local knowledge.ᵃ This procedure differs from that used for explanatory models, where researchers must (1) keep only significant variables in the model, (2) address