
BOX 6.6 (continued)
Table B6.6.1 Distribution of Winners and Losers (Intensive and Extensive Margin), by Demographic Group (percent)

| Demographic categories | Always ineligible | Newly eligible | Newly ineligible | Benefit reduction | Benefit increase | Total |
|---|---|---|---|---|---|---|
| Residence type | | | | | | |
| Urban | 52.7 | 23.4 | 28.3 | 38.4 | 34.1 | 49.2 |
| Rural | 47.3 | 76.6 | 71.7 | 61.6 | 65.9 | 50.8 |
| Female-headed household | | | | | | |
| No | 75.7 | 71.2 | 62.2 | 59.6 | 66.6 | 73.9 |
| Yes | 24.3 | 28.8 | 37.8 | 40.4 | 33.4 | 26.1 |
| Number of children | | | | | | |
| None | 45.3 | 21.6 | 78.5 | 42.6 | 7.6 | 43.8 |
| 1 or 2 | 47.8 | 50.8 | 17.0 | 51.8 | 63.8 | 47.8 |
| 3 and more | 6.9 | 27.6 | 4.5 | 5.6 | 28.6 | 8.5 |
| Highest level of education for any member of household | | | | | | |
| Lower secondary or less | 1.9 | 10.2 | 7.7 | 8.2 | 2.0 | 2.7 |
| Upper secondary | 18.5 | 48.5 | 46.8 | 42.9 | 47.5 | 23.1 |
| Secondary vocational | 24.2 | 25.2 | 36.3 | 28.0 | 34.2 | 25.2 |
| University | 55.5 | 16.1 | 9.2 | 20.8 | 16.2 | 48.9 |

Source: Baum, Mshvidobadze, and Posadas 2016.
For pensioners, persons with disability, and internally displaced persons, compensatory measures were designed, piloted, and implemented, which reduced the number of losers and increased the number of winners. By 2015, the government had reformed the TSA to implement a simplified and more effective PMT formula. Compensation measures to reduce the losses of the losers from the reform were introduced in August 2015. Another round of revision of the scoring formula, focused mostly on updating the coefficients and simplifying the formula, was implemented in 2020 (Honorati et al. 2020).
a. Most variables provided by households to the Social Services Agency are cross-verified against databases from several sources, including the Ministry of the Interior (car registration), gas and electricity companies, the revenue service, and customs control.
b. The pretest sample comes from the Social Services Agency database of TSA applicants who had applied for benefits since June 1, 2010, comprising 407,307 households. Using two-stage cluster sampling, 4,560 households were selected. Full interviews were conducted with 3,565 households.
because the household used the transfers to improve its living conditions, but that would not necessarily imply that it has a higher autonomous income. One approach to get around this issue is to allow households to remain in the program for a time even though their PMT score is above the entry threshold (for example, by having different entry and exit thresholds). This is somewhat analogous to means tests with income disregards. In Mexico, for example, the exit threshold of the Oportunidades program was conditional on achieving concrete outcomes in food security, health, and education. For this reason, there was a higher exit criterion to accommodate improvements in beneficiary household well-being. There is a logic to this, but it implies treating households with the same PMT score differently, which contravenes the usual notion of horizontal equity. There is thus a delicate balance to be found.
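To make the mechanics concrete, the following is a minimal sketch of an eligibility rule with differentiated entry and exit thresholds. The threshold values and function name are illustrative, not those of any particular program:

```python
def assess_eligibility(pmt_score: float, is_beneficiary: bool,
                       entry_threshold: float = 57_000.0,
                       exit_threshold: float = 65_000.0) -> bool:
    """Apply differentiated entry/exit thresholds (illustrative values).

    New applicants must score below the stricter entry threshold, while
    current beneficiaries remain enrolled until their score rises above
    the higher exit threshold, so a modest improvement in observed
    living conditions does not immediately trigger removal.
    """
    if is_beneficiary:
        return pmt_score < exit_threshold  # more lenient exit rule
    return pmt_score < entry_threshold     # stricter entry rule
```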
Finally, new variables for use in both traditional PMT and machine learning can be identified and collected. Programs often have monitoring and evaluation systems that draw on quantitative and qualitative assessments. Such evaluations can alert program administrators to relevant variables that are not currently used in the model or not available in national household surveys but that could improve prediction. These findings should be discussed with national statistical offices to evaluate the possibility of adding such data to upcoming surveys.
In summary, updating a traditional PMT or machine learning model implies not just running new regressions, but also thinking carefully about updating the full set of applicants and revisiting the overall implementation process. The timing of an update should consider the recertification cycle of the social registry or of the main programs that use the method to determine eligibility, and should include proper communications and a strategy for handling the households that gain or lose eligibility because of the changes. Moreover, improvements and investments in the interoperability of the information system may lead countries to move from PMT to HMT or means testing, as well as to define new strategies to reduce inclusion errors by applying other criteria, such as asset filters. Exclusion errors caused by flaws in the delivery system would remain unchanged by a model update (see chapter 3), but they could be addressed in the associated round of data collection.
Data-Related Limitations and Considerations
Building PMT and machine learning models requires good data and good analysis. Many countries face one or more of three principal challenges on the data side: (1) limited periodicity of the surveys, (2) small sample sizes, and (3) sample design.
The data most commonly used to determine proxies and formulae come from income and expenditure surveys collected by national statistical offices. These surveys are used to weight the consumer price index and to study poverty and inequality, as well as to produce indicators of human capital and social development. In poorer countries, surveys are more likely to be fielded every five years or so; in upper-middle-income countries, statistical offices field them every two or three years. Gaps of five or more years between surveys can degrade the quality of PMT models because the predictors of poverty may have changed in the interim.
The addition of ancillary data can provide a good idea of the importance of certain kinds of local conditions, local infrastructure, and vulnerability to shocks as covariates to explain welfare. The small area estimation modeling of Elbers, Lanjouw, and Lanjouw (2003) shows that the best predictions for small areas come from having a different model for each area. However, most household surveys' sample sizes do not allow for such a fine breakdown. In traditional PMT, at least, researchers should allow for different slopes across localized areas within regional or national models by using interaction terms. In addition, when small area models are not possible, ancillary data from the small areas can be added to higher-level models.
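The sketch below illustrates the interaction approach in Python. All variable and file names (log_pc_consumption, hh_size, rooms, owns_fridge, region, sample_weight, household_survey.csv) are hypothetical stand-ins for a real survey extract:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical survey extract with a welfare aggregate, candidate
# proxies, a region identifier, and sampling weights.
df = pd.read_csv("household_survey.csv")

# A single national model: one common slope for every predictor.
national = smf.wls(
    "log_pc_consumption ~ hh_size + rooms + owns_fridge + region",
    data=df, weights=df["sample_weight"]).fit()

# Interacting the predictors with region gives each region its own
# slope while still pooling the data in one model, a compromise when
# sample sizes do not permit a separate model per small area.
interacted = smf.wls(
    "log_pc_consumption ~ region * (hh_size + rooms + owns_fridge)",
    data=df, weights=df["sample_weight"]).fit()

print(national.rsquared, interacted.rsquared)
```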
Limited sample sizes and the sampling design57 of household surveys can also affect the accuracy of PMT formulae. Household surveys are designed to generate estimates for a particular stratum with enough observations to provide credible estimates with low standard errors, based on a representative number of clusters within a set of enumeration areas and a small number of observations per cluster (typically 12 to 15). Sampling (probabilistic) weights are assigned to each observation in each cluster and enumeration area for each stratum based on population and cluster characteristics, which can lead to different individual weights, so as to generate unbiased statistics at the stratum level. This approach is robust in that it minimizes intracluster variance and maximizes between-cluster variance. It is efficient for most statistical tests and models because it directly addresses the bias-variance issues that haunt researchers and helps in understanding the average characteristics of the population. Nevertheless, predictive modeling such as traditional PMT relies on having greater variability to capture within-cluster differences as well as differences between populations; that is, predictive power rests on the between-cluster variances.58 For this reason, it would be preferable (for a fixed sample size) to have a different partition that guarantees that individuals within clusters are no more similar than individuals in different clusters; that is, fewer clusters with more observations per cluster, so that within-cluster correlations can be exploited for prediction.
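A rough diagnostic for this issue is the intracluster correlation of the welfare measure. The sketch below, assuming hypothetical column names (cluster, log_pc_consumption) and ignoring sampling weights and unequal cluster sizes, decomposes the variance into within- and between-cluster components:

```python
import pandas as pd

# Hypothetical extract with a primary-sampling-unit identifier and a
# welfare aggregate.
df = pd.read_csv("household_survey.csv")

grand_mean = df["log_pc_consumption"].mean()
cluster_means = df.groupby("cluster")["log_pc_consumption"].transform("mean")

within_var = ((df["log_pc_consumption"] - cluster_means) ** 2).mean()
between_var = ((cluster_means - grand_mean) ** 2).mean()

# A high intracluster correlation means households in the same cluster
# look alike: many clusters with few observations each then contribute
# little of the within-cluster variation that predictive models need.
icc = between_var / (between_var + within_var)
print(f"Intracluster correlation: {icc:.3f}")
```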
The sample size, and the fact that there is often only a single data set, also have implications for how a model's predictive power is measured. Predictive power is measured by the model's capacity to predict household welfare using data other than those used to build it. Most frequently, researchers achieve this by partitioning the data. Traditionally, this has meant splitting the sample into two groups, one for modeling (or training) and one for testing. This is an attempt to overcome different sources of prediction error: model variance and model bias.59
Some researchers use resampling methods, such as bootstrap or jackknife resampling,60 because more data are preferable for predictive modeling to gain better control over bias and variance. More specifically, resampling allows better measurement of the variance, showing the variability of a model's prediction for a given data point. An approach from machine learning that is becoming increasingly popular in PMT development is to split the data into training and test sets. However, instead of splitting the data just once, a standard machine learning approach called k-fold cross-validation splits observations into multiple groups and takes turns estimating the models on some of the groups and testing them on the others. For example, the data might be split into 10 random, equal groups (or folds). The model is estimated using nine of the groups as the training set and the tenth as the test set. The process is then repeated, moving the test group into the training set and swapping one of the previous training groups out to be the new test set. Once every group has served in both the training and test sets, the model's performance is assessed by averaging the errors across all iterations. Compared with a single split, this ensures that every observation appears in both the training and test sets.61
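As an illustration, here is a minimal sketch of 10-fold cross-validation. Synthetic data stand in for a real survey; in practice X would hold the candidate PMT predictors and y the welfare measure (for example, log per capita consumption):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic stand-in for survey data: 3,000 households, 12 predictors,
# of which only the first 4 actually drive the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 12))
y = X[:, :4].sum(axis=1) + rng.normal(size=3000)

kfold = KFold(n_splits=10, shuffle=True, random_state=0)
errors = []
for train_idx, test_idx in kfold.split(X):
    # Fit on nine folds, test on the held-out tenth fold.
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    errors.append(mean_squared_error(y[test_idx], pred))

# Averaging across folds gives an out-of-sample error estimate in which
# every observation has served in both the training and test sets.
print(f"10-fold CV mean squared error: {np.mean(errors):.3f}")
```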
Nevertheless, there is always a trade-off between bias and variance. Bias is the error measuring the difference between the model's (average) prediction and the correct value for a household, and it is mainly caused by underfitting; variance reflects how widely the model's predictions vary across data sets and is mainly driven by overfitting. In predictive modeling, the aim is to have both low bias and low variance. When choosing a model and selecting variables, it must first be acknowledged that the predictions are mostly based on a single data set, meaning that there is greater control over bias and less control over variance without resampling. Moreover, Greene (2011) recommends the following: (1) acknowledging and respecting the estimator's properties; (2) addressing multicollinearity, which can be identified when small changes in the data produce wide changes in the parameter estimates, when coefficients have very high standard errors and low significance levels although they are jointly significant and the R2 for the regression is quite high, or when coefficients have the wrong sign or implausible magnitudes; (3) avoiding pretest estimators that try to address multicollinearity by adding a third estimator (pretest), which is not
recommended as an ad hoc remedy for multicollinearity; and (4) understanding data measurement errors, which leads to a better understanding of the data and modeling issues that influence the bias-variance trade-off. This matters because, at the core of this relationship, the analyst must manage the overfitting and underfitting of the model. As bias is reduced (better prediction of actual outcomes), variance is likely to increase (a wider range of predictions across data sets). In other words, in moving away from underfitting by adding more variables, variance rises sharply with model complexity. As more and more parameters are added, the complexity of the model rises, variance becomes the primary concern, and bias steadily falls. As Fortmann-Roe (2012) shows, for any model there is a sweet spot: the level of complexity at which the increase in bias from simplifying further is equivalent to the reduction in variance, meaning that more complexity would overfit the model while less complexity would underfit it. Finding the right complexity level is not a simple task, however. There is no right or wrong way to strike this balance, but given an acceptable prediction error, the analyst can simply explore different levels of complexity and choose the level that minimizes the overall error.
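That search for the sweet spot can be automated. The sketch below, again on synthetic data, treats the number of included predictors as the complexity dial and picks the level that minimizes cross-validated error; with a real PMT model the dial might instead be the set of candidate variables or a regularization penalty:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data: 30 candidate predictors, only the first 5 matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 30))
y = X[:, :5].sum(axis=1) + rng.normal(size=2000)

# Sweep model complexity (here, the number of predictors included) and
# record the cross-validated mean squared error at each level.
cv_mse = {}
for k in range(1, 31):
    scores = cross_val_score(LinearRegression(), X[:, :k], y,
                             scoring="neg_mean_squared_error", cv=10)
    cv_mse[k] = -scores.mean()

# Too few predictors underfits (high bias); too many overfits (high
# variance). The minimum of the CV error curve is the sweet spot.
best_k = min(cv_mse, key=cv_mse.get)
print(f"Complexity level minimizing CV error: {best_k} predictors")
```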
When using statistical methods to create "pseudo" testing samples, resampling, or partitioning the current data set for testing and model calibration, researchers must always pay attention to and respect the household sampling design. Household surveys include expansion factors (weights) that indicate how many units in the surveyed population each sampled unit represents. Hence, using these expansion factors as sampling weights is necessary when estimating proportions, means, and regression parameters. When partitioning or resampling to create pseudo test samples, researchers must first ensure that the sampling design is incorporated into the process. Choosing a procedure out of convenience means the results may not be representative of the population, adding bias to the estimates generated. Moreover, survey balance can disappear without proper handling of strata, clusters, and enumeration areas. Respecting the sampling design and adding a reweighting step to correct the sampling weights is needed to guarantee that the training and testing (holdout) samples, as well as any pseudo-samples generated by statistical methods, still represent the population for which the household survey was designed.
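One way to respect the design, sketched below with hypothetical column names (cluster, sample_weight), is to partition at the cluster level and then rescale the weights. For brevity this sketch groups only by cluster; a fuller implementation would also carry out the split within each stratum:

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical survey extract with a cluster identifier and weights.
df = pd.read_csv("household_survey.csv")

# Split whole clusters (not individual households) into training and
# holdout sets so the partition follows the two-stage sampling design.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(df, groups=df["cluster"]))
train, test = df.iloc[train_idx].copy(), df.iloc[test_idx].copy()

# Rescale the weights so each partition still sums to the population
# total the survey was designed to represent.
pop_total = df["sample_weight"].sum()
for part in (train, test):
    part["sample_weight"] *= pop_total / part["sample_weight"].sum()
```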
Hence, before running predictive models, it is key to test whether the distribution of the potential regressors/covariates in the household survey data set matches that of the population of interest. Comparing the descriptive statistics and covariate distributions (of both the training and holdout or pseudo-samples) with other data sources, such as censuses, Demographic and Health Surveys, labor force surveys, and so forth, can indicate how