How to Harness the Power of Data and Inference | 431
systematic prediction errors for a large number of different subgroups, such as those defined by gender, ethnicity, and religion, but without careful management, oversight, and audits, machine learning algorithms can encode bias. It will also be important to learn whether, and by how much, using big data to determine eligibility for benefits changes behavior. If financial transactions are used to model household welfare, will some transactions move back to cash to hide them? Will households avoid purchases of important goods just to remain eligible? Will households change or mask their calling behavior or even their movement patterns? Will they self-censor on social media or post misleading information? For example, when households in Kenya realized that GiveDirectly was determining eligibility based on satellite assessments of roofing, they held off on upgrading their roofs. As credit scoring systems based on mobile phone use have scaled, manipulation has become commonplace as borrowers learn which behaviors will increase their credit limits (see the references in Björkegren, Blumenstock, and Knight [2020]). Potential incentive issues are not unique to this type of big data (note 82), but they take on new forms that are not yet well explored or quantified. Efforts are being made to make machine learning approaches more robust to strategic behavior (Björkegren, Blumenstock, and Knight 2020; Hardt et al. 2015).
In summary, private big data offer exciting possibilities for improving eligibility determination, but work on technical, social, and legal fronts will be needed to fully gauge and understand these possibilities. The field has been developing quickly, with major advances even during the time the rest of this book was being conceived and written. Data science seems likely to advance as quickly as ground truth data can support, and this will call for further attention on the regulatory and policy fronts.
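One simple form the subgroup audits mentioned above could take is sketched below, assuming a validation sample in which survey-measured ("ground truth") welfare is available alongside the model's prediction for each household. For each subgroup it reports the mean prediction error (predicted minus actual welfare) and the exclusion rate, meaning the share of truly poor households whose predicted welfare places them above the poverty line. The record fields, group labels, poverty line, and numbers are all hypothetical and for illustration only; this is not a method taken from the studies cited here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    group: str        # hypothetical subgroup label (e.g., an ethnicity or religion category)
    actual: float     # ground-truth welfare from a survey (hypothetical units)
    predicted: float  # welfare predicted by a model trained on big data features


def audit_by_group(records, poverty_line):
    """Return mean prediction error and exclusion rate for each subgroup."""
    stats = defaultdict(lambda: {"n": 0, "err_sum": 0.0, "poor": 0, "excluded": 0})
    for r in records:
        s = stats[r.group]
        s["n"] += 1
        # Positive error: the model overstates this household's welfare
        s["err_sum"] += r.predicted - r.actual
        if r.actual < poverty_line:
            s["poor"] += 1
            # Truly poor but predicted non-poor: an exclusion error
            if r.predicted >= poverty_line:
                s["excluded"] += 1
    report = {}
    for group, s in stats.items():
        report[group] = {
            "mean_error": s["err_sum"] / s["n"],
            # Undefined when the group has no truly poor households in the sample
            "exclusion_rate": s["excluded"] / s["poor"] if s["poor"] else None,
        }
    return report


# Hypothetical toy data: the model is unbiased for group A but
# systematically overstates welfare for group B.
records = [
    Record("A", 80, 85), Record("A", 120, 115),
    Record("A", 90, 95), Record("A", 130, 125),
    Record("B", 80, 110), Record("B", 90, 105),
    Record("B", 120, 130), Record("B", 140, 150),
]
report = audit_by_group(records, poverty_line=100.0)
for group, row in sorted(report.items()):
    print(group, row)
```

In this toy sample the model's errors average out for group A, but it overstates welfare for group B by 16.25 units on average, and as a result both truly poor households in group B would be screened out. An aggregate accuracy figure over all eight households would hide this pattern, which is why the audit is broken out by subgroup.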
Key Elements for Community-Based Targeting
Traditional CBT takes a far different tack in discerning who is poorer. It forgoes interoperability among government databases, household-by-household quantitative surveys, and fancy algorithms. Instead, it uses a group of community members or leaders, en masse or in committees, as the main agents in selecting beneficiaries for social assistance programs. Community members are expected to know enough about their neighbors from day-to-day life (who buys how much of what in the market, how people work, what clothes or shoes they wear, and how they participate in community social interchange) to carry out some sort of needs assessment without special-purpose data collection. In the sense that the data used are already generated for other purposes (community life), it is like big data but maybe the term