
82 | Revisiting Targeting in Social Assistance
This chapter also provides a brief synopsis of the costs of targeting. On disincentive costs, the chapter summarizes the conceptual issues and a few of the seminal works from the wider impact evaluation literature; the summary is brief because that literature is extensive and this chapter does not attempt to add to it. On administrative costs, the chapter outlines the issues in measurement and provides illustrative numbers from a range of programs. The evidence in the literature on transaction costs, stigma, and social costs is much scarcer, but it is enough to signal the importance of such costs and of efforts to reduce them. Altogether, the evidence on administrative costs, transaction costs, and stigma motivates chapter 4, on delivery systems and how they can and should be improved. Essay 9, in chapter 1, covers the political costs of targeting.
Other chapters bring in other parts of the empirical story. Chapter 5 illustrates how simulations can help in comparing geographic, demographic, and household-specific targeting methods. It also catalogs the nascent literature on experiments with comparative treatment arms among household-specific methods, mostly comparing community-based methods and proxy means testing. Such experiments are important because community-based targeting is difficult to simulate credibly. Chapter 6 provides the evidence base—especially from simulations and deep process evaluations—that illuminates how each targeting method can be designed to best advantage.
This chapter focuses on the most proximate indicators for targeting in the Besley and Kanbur (1990) framework: coverage, benefit levels, incidence, and simple estimates of changes in poverty. Policy makers or social policy analysts may differ in the weights they give to these outcomes vis-à-vis others, but all programs have distributional outcomes. For countries with pledges to reduce poverty and inequality and provide universal social protection within this decade, the chapter discusses a relevant although not comprehensive set of indicators.³
Measurement and Interpretation
Measurement across the Welfare Spectrum
The traditional framing of the targeting problem as dichotomous is too simplistic. In a dichotomous framing, the value of a transfer is the same for anyone below the poverty line or eligibility threshold, and it is zero for anyone above the threshold. It has long been recognized that in most people’s minds and in most social welfare functions, a more continuous valuation makes sense. A transfer to someone at the very bottom of the welfare distribution, say the 5th centile, seems more important than a transfer to someone in the 19th centile. The value of a transfer to someone in the 19th centile does not seem to be much different from the value of a transfer to someone in the 21st centile, even if the 20th centile is the eligibility threshold. Even above the threshold, a transfer to someone in the 30th centile would be valued more than a transfer to someone in the 60th centile. Dichotomous measures exaggerate targeting errors by placing as much weight on failing to distinguish between people in the 19th and 21st centiles as on failing to distinguish between people in the 5th and 60th centiles. Under a continuous welfare function, the first kind of error, just around the threshold, is much more frequent but far less important than the second kind, between people with very different levels of welfare, which occurs less often.
Unpacking the Empirics of Targeting in Low- and Middle-Income Countries | 83
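A minimal sketch of the contrast between the two kinds of error, using the centile example from the text. The dichotomous rule follows the text (full value at or below a 20th-centile threshold, none above it); the continuous rule, a weight that falls linearly from 1 at the bottom to 0 at the top, is a hypothetical choice made only for illustration and is not a welfare function proposed in this chapter.

```python
THRESHOLD = 20  # eligibility threshold, in centiles, as in the example in the text


def dichotomous_value(centile: float) -> float:
    """Value of a transfer under a dichotomous framing: 1 at or below the threshold, 0 above."""
    return 1.0 if centile <= THRESHOLD else 0.0


def continuous_value(centile: float) -> float:
    """Hypothetical continuous valuation: the weight declines linearly with welfare rank."""
    return 1.0 - centile / 100.0


def error_cost(value_fn, served: float, missed: float) -> float:
    """Value lost when a transfer reaches `served` instead of the poorer `missed` person."""
    return value_fn(missed) - value_fn(served)


if __name__ == "__main__":
    for served, missed in [(21, 19), (60, 5)]:
        d = error_cost(dichotomous_value, served, missed)
        c = error_cost(continuous_value, served, missed)
        print(f"centile {served} served instead of {missed}: "
              f"dichotomous cost {d:.2f}, continuous cost {c:.2f}")
```

Under the dichotomous rule both mistakes cost 1.00. Under this particular declining weight, the near-threshold swap costs 0.02 and the 5th-versus-60th swap costs 0.55.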
Adopting a continuous welfare function has important consequences for measurement. It favors more continuous measures of coverage and incidence, or summary measures such as impacts on poverty or on the whole distribution, over the dichotomous formulation of errors of inclusion and exclusion. This theme is taken up in greater detail in chapter 7. Although the limitations of the dichotomous framing have long been understood, dichotomous measures of errors of inclusion and exclusion are still surprisingly prevalent.
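For reference, a sketch of how the dichotomous error rates are typically computed. It uses one common convention: exclusion error is the share of the target group not served, and inclusion error is the share of beneficiaries outside the target group. Definitions vary across studies. The households are hypothetical and reuse the centiles from the example above.

```python
from dataclasses import dataclass


@dataclass
class Household:
    centile: float   # welfare rank of the household, 0-100
    receives: bool   # whether the household participates in the program


def error_rates(households, threshold=20.0):
    """Return (exclusion error, inclusion error) under a dichotomous framing.

    exclusion error: share of the target group (centile <= threshold) not served
    inclusion error: share of beneficiaries who are outside the target group
    """
    target = [h for h in households if h.centile <= threshold]
    served = [h for h in households if h.receives]
    excluded = sum(not h.receives for h in target)
    included = sum(h.centile > threshold for h in served)
    exclusion = excluded / len(target) if target else 0.0
    inclusion = included / len(served) if served else 0.0
    return exclusion, inclusion


centiles = [5, 19, 21, 60]
# Program A misses the 19th centile and serves the 21st (a near-threshold error).
program_a = [Household(c, c in (5, 21)) for c in centiles]
# Program B misses the 5th centile and serves the 60th (a far more serious error).
program_b = [Household(c, c in (19, 60)) for c in centiles]
print(error_rates(program_a), error_rates(program_b))  # (0.5, 0.5) (0.5, 0.5)
```

Both programs report identical error rates, yet program B’s allocation is worse under any valuation that strictly declines with welfare. The rates record only which side of the threshold a mistake falls on, not how far from it.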
One difficulty in evaluating targeting is how to weight errors of inclusion against errors of exclusion. In poverty economics, this problem is often resolved through the choice of welfare function. For example, a common choice is the Foster-Greer-Thorbecke (FGT) poverty headcount (FGT0) for a dichotomous welfare function, and the FGT poverty gap (FGT1) or squared poverty gap (FGT2) for a continuous one. These measures can be applied to simulations of different transfer levels, budgets, and selection rules, and of the targeting errors they imply. Such exercises help in exploring trade-offs (Acosta, Leite, and Rigolini 2011; Brown, Ravallion, and van de Walle 2016; Knox-Vydmanov 2011, embedded in the ADePT software⁴). Other ways to weight one kind of error against another are described and used in chapters 6, 7, and 8.
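The FGT family can be written as P_α = (1/N) Σ over the poor (y_i < z) of ((z − y_i)/z)^α. Here y_i is welfare and z is the poverty line. α = 0, 1, and 2 give the headcount, the poverty gap, and the squared poverty gap. The toy sketch below applies the index to a hypothetical eight-person population and compares two transfers of equal total cost. It is an illustration only, not the ADePT implementation.

```python
def fgt(incomes, z, alpha):
    """Foster-Greer-Thorbecke index: (1/N) * sum over the poor (y < z) of ((z - y) / z) ** alpha."""
    return sum(((z - y) / z) ** alpha for y in incomes if y < z) / len(incomes)


def apply_transfer(incomes, recipients, amount):
    """Copy of `incomes` with `amount` added for every index in `recipients`."""
    return [y + amount if i in recipients else y for i, y in enumerate(incomes)]


z = 100                                          # hypothetical poverty line
incomes = [20, 40, 60, 80, 120, 150, 200, 300]   # hypothetical welfare, sorted
scenarios = {
    "baseline": incomes,
    "to the two just below the line": apply_transfer(incomes, {2, 3}, 30),
    "to the two poorest": apply_transfer(incomes, {0, 1}, 30),
}
for name, ys in scenarios.items():
    print(f"{name:32} FGT0={fgt(ys, z, 0):.3f} FGT1={fgt(ys, z, 1):.3f} FGT2={fgt(ys, z, 2):.3f}")
```

Both transfers cost 60. The headcount (FGT0) prefers the transfer just below the line, because it lifts one person over z. FGT1 and FGT2 prefer the transfer to the poorest. The choice of measure therefore determines how the two kinds of targeting error are traded off.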
The form of the welfare function determines how much more value is placed on the welfare of the very poorest than on that of the just poor, the not quite poor, or those who are better off. Most such continuous welfare functions imply some tolerance for errors of exclusion. If a few members of the intended population are excluded, but enough extra value can be given to the poorer people who are included, overall poverty or inequality will fall, and the outcome will be judged better for society as a whole.
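A minimal sketch of that tolerance, using hypothetical numbers. With a fixed budget, it compares spreading benefits over everyone below the line with concentrating them on the poorest while excluding the least-poor eligible person. The squared poverty gap (FGT2) is the measure that gives extra weight to the poorest.

```python
def fgt(incomes, z, alpha):
    """Foster-Greer-Thorbecke index computed over the whole population."""
    return sum(((z - y) / z) ** alpha for y in incomes if y < z) / len(incomes)


z = 100
incomes = [20, 40, 60, 80, 120, 150, 200, 300]  # hypothetical population of eight
budget = 60

# Option A: no error of exclusion; all four poor people receive budget / 4.
spread = [y + budget / 4 if y < z else y for y in incomes]
# Option B: one error of exclusion; the person at 80 is left out and the
# three poorest share the budget (budget / 3 each).
concentrated = [y + budget / 3 if y < 80 else y for y in incomes]

for alpha in (0, 1, 2):
    print(f"FGT{alpha}: spread={fgt(spread, z, alpha):.4f}, "
          f"concentrated={fgt(concentrated, z, alpha):.4f}")
```

With these numbers the headcount and the poverty gap are identical for the two options. FGT2 is lower for the concentrated option, so a welfare function of that shape judges the allocation with an error of exclusion to be the better one.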
For policy makers, the available budget can influence the relative importance of one type of error versus the other in a more practical way. Consider two thought experiments:
• With a firmly binding budget constraint. This scenario starts from a baseline of whatever poverty levels prevailed before the launch of a possible new poverty-targeted program, which will, at least initially, be budget rationed to cover fewer than all the poor in the country. In such a scenario, a poor person who is excluded by a targeting error ends up with the same welfare⁵ as those who are unserved because the program is small. A needy person served by the program is better off. And any nonpoor person who is included (an error of inclusion) takes up budget that could have helped a poor person in the target group. In this scenario of budget rationing, reducing errors of inclusion is a means to reduce errors of exclusion, and errors of exclusion are inevitable given the rationing.
This scenario fits the situation of poor countries that are just starting to build their social protection systems. Beegle, Coudouel, and Monsalve (2018) provide several African examples using the latest data available at the time of the study. In Ethiopia, the poverty rate (measured by the international standard of $1.90 purchasing power parity a day) was 34 percent, but social assistance covered only 8 percent of the population. In Kenya, the poverty rate was also 34 percent, but social assistance covered only 6 percent of the population. In Tanzania, the poverty rate was 47 percent, and social assistance covered 13 percent of the population. These three countries have significant, new flagship programs. In countries such as Sierra Leone or Madagascar, the disjuncture was much larger: in Sierra Leone, there were more than 10 times as many poor people as people served by social assistance; in Madagascar, more than 20 times as many. In such cases of budget rationing, including nonpoor people in the programs will crowd out the poor. To cover all the poor, increasing budgets is vital, but reducing errors of inclusion will help as well.
• With a less binding budget constraint. An alternative scenario starts with a budget that is sufficient at least to serve all those who are poor plus any nonpoor who are in the program by design or due to errors in eligibility assessment. In this case, as before, reducing errors of exclusion is vital to ending poverty and to realizing the principle of nondiscrimination as articulated in human rights frameworks. However, the ability to do so is rationed not by the budget but by potential deficiencies in the delivery system or targeting mechanism. Reducing an error of inclusion may save budget, but with a budget that is already sufficient to serve all the poor, it will not translate directly into fewer errors of exclusion.
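The logic of the two thought experiments can be sketched with a stylized program that pays a flat benefit. The function and all of its inputs are hypothetical. Only the poverty and coverage rates in the final loop come from the figures quoted above (Beegle, Coudouel, and Monsalve 2018). The ratio computed from them is simply the number of poor people per person served.

```python
def poor_served(budget, benefit, n_poor, nonpoor_included, delivery_exclusions=0):
    """Poor people reached by a stylized flat-benefit program.

    budget // benefit gives the number of slots the budget can pay for.
    Nonpoor beneficiaries (errors of inclusion) occupy slots;
    delivery_exclusions counts poor people missed for reasons other than budget.
    """
    slots = budget // benefit
    reachable = n_poor - delivery_exclusions
    return max(0, min(reachable, slots - nonpoor_included))


# Firmly binding budget: 100 slots for 400 poor people.
print(poor_served(1_000, 10, 400, nonpoor_included=20))  # 80
print(poor_served(1_000, 10, 400, nonpoor_included=0))   # 100
# Less binding budget: 600 slots; 30 poor people missed by the delivery system.
print(poor_served(6_000, 10, 400, nonpoor_included=20, delivery_exclusions=30))  # 370
print(poor_served(6_000, 10, 400, nonpoor_included=0, delivery_exclusions=30))   # 370

# Poor people per person served, from the poverty and coverage rates quoted above.
for country, poverty_rate, coverage_rate in [("Ethiopia", 34, 8), ("Kenya", 34, 6), ("Tanzania", 47, 13)]:
    print(f"{country}: {poverty_rate / coverage_rate:.1f}")
```

Under the binding budget, each error of inclusion removes one poor beneficiary. Under the less binding budget, eliminating errors of inclusion leaves the number of poor people served unchanged, because the delivery system is the binding constraint.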
The increasing use of human rights rather than economic perspectives and the prevalence of the goal of universal social protection seem to be building a consensus that errors of exclusion must be given greater weight. However, the concern about errors of exclusion is hardly new; it was prominently flagged decades ago in the United Nations Children’s Fund’s (UNICEF) work on the social costs of adjustment and in calls for adjustment with a human face (see, for example, Cornia, Jolly, and Stewart 1987). Economic welfare functions that place heavier weight on the welfare of the poor than on that of the less poor or nonpoor are consonant with the consensus that errors of exclusion are important.
Pros and Cons of Household Survey–Based, Cross-Country Comparisons
Looking across a wide range of countries and programs helps show whether findings are common or vary markedly. Knowing how far such inferences can be generalized helps to establish realistic expectations.
Household survey–based analysis of targeted social programs is highly sensitive to the method used, which puts a premium on using primary data that can be processed with comparable methods. Results are sensitive to how welfare aggregates are constructed, to whether households are ranked using pre- or posttransfer welfare, to the choice of poverty lines and eligibility thresholds, and so forth. The World Bank invested in the ADePT SP software⁶ and ASPIRE⁷ to improve comparability across some of these dimensions, and the analysis in this chapter relies on these strengths.
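A toy sketch of the sensitivity to pre- versus posttransfer ranking. A hypothetical program reaches exactly the two poorest of ten households, and the sketch computes the share of its beneficiaries in the bottom quintile under each ranking. The transfer is deliberately large relative to welfare so that the effect is visible. The numbers are not drawn from ASPIRE or any survey.

```python
def bottom_quintile(welfare):
    """Indices of the poorest 20 percent of households ranked by `welfare` (ties broken by index)."""
    k = len(welfare) // 5
    order = sorted(range(len(welfare)), key=lambda i: (welfare[i], i))
    return set(order[:k])


def share_of_beneficiaries_in_q1(welfare, beneficiaries):
    """Share of program beneficiaries who fall in the bottom quintile of `welfare`."""
    return len(beneficiaries & bottom_quintile(welfare)) / len(beneficiaries)


pre = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical per capita welfare before the transfer
beneficiaries = {0, 1}                 # the program reaches exactly the two poorest
transfer = 4
post = [y + transfer if i in beneficiaries else y for i, y in enumerate(pre)]

print("pretransfer ranking:", share_of_beneficiaries_in_q1(pre, beneficiaries))    # 1.0
print("posttransfer ranking:", share_of_beneficiaries_in_q1(post, beneficiaries))  # 0.0
```

The same, perfectly targeted program looks either perfectly targeted or entirely mistargeted, depending only on which welfare measure is used for the ranking.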
However, general-purpose household surveys such as those captured in ASPIRE may miss a portion of social assistance programming. The surveys are usually designed for multiple purposes and are often rooted in providing weights for the consumer price index. Such surveys may lack samples of the poor that are large enough, or questionnaires that are well tuned enough, to pick up participation in social protection programs, especially small ones. The problem may be most acute in low-income countries, where survey data tend to be scarcer and social protection programs have only recently emerged at scale. Until programs are well cemented in national policy and large enough to observe systematically in survey samples, it is unsurprising that questionnaire designers would not alter their traditional questionnaires, especially since a large body of survey design practice shows that measurements are sensitive to changes in instruments. Chapter 7 provides more on these issues, including some examples.
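A rough sketch of why small programs are hard to observe in a general-purpose survey. It assumes a hypothetical sample of 5,000 households drawn by simple random sampling and ignores misreporting. Real surveys are clustered, and design effects would make these errors larger.

```python
import math


def coverage_precision(n_households, coverage):
    """Expected number of sampled participants and relative standard error of the
    estimated coverage rate, assuming simple random sampling of households."""
    expected = n_households * coverage
    standard_error = math.sqrt(coverage * (1 - coverage) / n_households)
    return expected, standard_error / coverage


for coverage in (0.20, 0.05, 0.01, 0.002):
    expected, rse = coverage_precision(5_000, coverage)
    print(f"coverage {coverage:.1%}: about {expected:.0f} participants in the sample, "
          f"relative SE {rse:.0%}")
```

As coverage falls, the number of participants in the sample shrinks and the relative standard error grows. Any breakdown of those few participants by welfare group is correspondingly noisier.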
For international benchmarking, it is usual to report metrics that are useful for cross-country comparisons but that may differ from those used for specific countries or programs. Two such differences are especially common:
• The threshold. ASPIRE analysis often discusses the poorest quintile, as this chapter does as well. But, as discussed in chapter 1, eligibility thresholds for different programs can be set for smaller or larger shares of the population, and poverty rates will differ as well. In high-income countries, the poverty line or eligibility threshold for guaranteed minimum income programs may be set lower than the bottom quintile. For example, Chile designed its Chile Solidario program for the bottom 5 percent of the population because that was the share that was chronically poor by the country’s standards. In a low-income country (LIC), a program with a low budget may aim to cover only the poorest, say, 10 percent of the population at the start, even if this is a lower threshold than the absolute poverty line. Even if such programs perfectly meet their program-specific goals, they would show undercoverage in a cross-country comparison that uses the lowest quintile as the target group. To help counterbalance this problem, the next section presents findings for the whole distribution rather than treating the bottom quintile as the only threshold.
• Ancillary eligibility criteria. Simple cross-country, cross-program benchmarking does not account for all the other characteristics or criteria that influence eligibility for a given program. For example, many social assistance programs are for children (child grants and school feeding programs) or for families with children (conditional cash transfers and many unconditional cash transfers). However, a nontrivial share of households do not have children in the eligible age range (a much lower share in Africa than in Eastern Europe, owing to differences in demography and in the frequency of multigenerational households), and the converse holds for programs dedicated to the elderly. To help counterbalance this problem, the next section presents some findings for social assistance programming as a whole, rather than program by program.
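The threshold mismatch can be made concrete with a sketch. It assumes a program that reaches everyone below its own cutoff and no one else, with the population spread evenly across centiles. The cutoffs of 5 and 10 percent echo the Chile Solidario and LIC examples above. The function is hypothetical.

```python
def coverage_by_group(program_cutoff, groups):
    """Share of each centile group (lo, hi) covered by a program that reaches
    everyone below `program_cutoff` (in centiles) and no one above it.
    Assumes the population is spread evenly across centiles."""
    shares = []
    for lo, hi in groups:
        covered = max(0.0, min(hi, program_cutoff) - lo)
        shares.append(covered / (hi - lo))
    return shares


bottom_quintile = [(0, 20)]
deciles = [(10 * d, 10 * (d + 1)) for d in range(10)]

print(coverage_by_group(5, bottom_quintile))   # [0.25]  (a bottom-5-percent program)
print(coverage_by_group(10, bottom_quintile))  # [0.5]   (a bottom-10-percent program)
print(coverage_by_group(10, deciles))          # full coverage in the first decile only
```

Both programs reach their own target groups perfectly, yet a bottom-quintile benchmark would report coverage of only 25 and 50 percent. The decile view shows that the entire shortfall lies outside the program’s own target group.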
To allow cross-country comparability, the comparisons use a common welfare aggregate, which differs from the welfare aggregate used by some of the programs being compared. The income or consumption used in the analysis could differ from the country-specific or program-specific definition (for example, by not accounting for specific types of income). The harmonized welfare aggregate adjusts for household size on a per capita basis, whereas some programs use an adult equivalence scale to determine the operational welfare aggregate used for eligibility and/or define multiple assistance units within the household. Because of these factors, the harmonized welfare aggregate does not coincide with the operational welfare aggregate that some individual programs use, which biases the measured benefit incidence of those programs downward relative to their “true” incidence.
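A sketch of how the choice between a per capita aggregate and an adult-equivalent aggregate can reorder households. The OECD-modified scale (1.0 for the first adult, 0.5 for each additional member aged 14 or over, 0.3 for each child under 14) serves only as an example of an equivalence scale. The text does not say which scales particular programs use, and the two households are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Household:
    name: str
    consumption: float  # total household consumption per period
    adults: int         # members aged 14 and over (assumed to be at least 1)
    children: int       # members under 14


def per_capita(h: Household) -> float:
    """Harmonized-style aggregate: consumption divided by household size."""
    return h.consumption / (h.adults + h.children)


def oecd_modified(h: Household) -> float:
    """Adult-equivalent consumption under the OECD-modified equivalence scale."""
    scale = 1.0 + 0.5 * (h.adults - 1) + 0.3 * h.children
    return h.consumption / scale


households = [Household("A", 600, 2, 4), Household("B", 110, 1, 0)]
for measure in (per_capita, oecd_modified):
    ranking = sorted(households, key=measure)
    print(measure.__name__, [(h.name, round(measure(h), 1)) for h in ranking])
```

Per capita, the large family A looks poorer (100 against 110). On the adult-equivalent scale, the single adult B is poorer. A program that ranks households with the second measure can therefore look mistargeted when it is evaluated with the first.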
More precise assessments of misclassification in a specific program’s eligibility determinations are specialized and use a variety of methods. For means- or asset-tested programs, the assessments may rely on re-interviews of households, more extensive cross-checks with other databases, and triangulation between what is reported in application files and what is seen in representative national surveys (see box 6.7 in chapter 6 for an illustration). For proxy means tests or geographic targeting, ex ante simulations are often used to show how well the algorithms select from the larger pool represented in household surveys (see discussions in chapters 5, 6, and 7). For community-based methods, the results may be compared