
6.10 Machine Learning, Big Data, and Human Rights
Using big data for eligibility determination may raise social or ethical concerns that bear consideration. The use of data on personal actions, such as phone calls or social media posts, for eligibility determination takes on a larger cultural dimension of “Big Brother watching,” especially if people do not fully understand that what is monitored is not the content of the conversations but the somewhat less intimate details of their frequency, length, origin, whether incoming or outgoing, and whether text or voice (although it does include whom a person has been talking to). Will people feel comfortable if how they chat with friends by phone, Facebook, or Twitter; search on Google; or buy on Amazon influences their eligibility for social protection programs? And how will the outcomes be explained? Although this book is not the place, we believe it is important that a deeper look be taken not just at what legal protections are required governing data ownership and protection, but more broadly at what sociocultural considerations should be explicitly brought to bear on these issues.
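To make the distinction between content and metadata concrete, the sketch below aggregates a handful of call records into the kinds of features described above: counts, durations, direction, and channel, but never the words spoken or written. The record layout and feature names are illustrative assumptions, not those of any actual program.

```python
# Hypothetical call-record layout: one row per event for a single subscriber.
calls = [
    {"direction": "out", "kind": "voice", "seconds": 120, "contact": "X1"},
    {"direction": "in",  "kind": "voice", "seconds": 45,  "contact": "X2"},
    {"direction": "out", "kind": "sms",   "seconds": 0,   "contact": "X1"},
]

n_voice = sum(c["kind"] == "voice" for c in calls)
features = {
    "n_voice_calls": n_voice,
    "n_sms": sum(c["kind"] == "sms" for c in calls),
    "share_outgoing": sum(c["direction"] == "out" for c in calls) / len(calls),
    "mean_call_seconds": sum(c["seconds"] for c in calls if c["kind"] == "voice") / max(1, n_voice),
    # Even metadata reveals the social graph: who was contacted, and how often.
    "n_unique_contacts": len({c["contact"] for c in calls}),
}
print(features)
```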
Big data and machine learning may raise some significant human rights issues (see box 6.10), although work is underway to try to provide privacy guarantees while still allowing public good applications.81 In the case of Togo, the academic team carefully tested for demographic parity and
BOX 6.10
Machine Learning, Big Data, and Human Rights
Machine learning, private big data, and biometric technology may generate gains for the delivery systems and beneficiary selection of social protection programs, but they also pose risks and challenges that must be considered and minimized or mitigated.
Among the advantages, biometric identification systems help in uniquely identifying people for social protection through fingerprint, iris, and face recognition, allowing not only such identification but also deduplication and interoperability across systems. Machine learning and big data can help in understanding poverty and patterns of need, matching people to programs, and changing the interaction between people and the state (Gelb and Metz 2018).
However, there are multiple risks that have implications for human rights. Ohlenburg (2020a) and Sepulveda (2018) are thoughtful sources. Among the risks they cite are (1) inaccuracy of data or exclusions from data or algorithms, (2) identity theft/data protection, and (3) security risks and the misuse of data.
With biometric technology, data inaccuracy may occur when individuals enroll their biometric data. For example, biometric readers struggle with the fingerprints of a share of manual workers and elderly people, whose finger pads can be worn down, and with the irises of those with glaucoma or cataracts. Thus, when an individual’s biometric is matched against a template stored in a database, false matches or false nonmatches may occur.
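A minimal sketch of the matching step follows: a similarity score is compared with a threshold, and the choice of threshold trades false matches (impostors accepted) against false nonmatches (genuine beneficiaries rejected). The scores, labels, and threshold are invented for illustration and do not describe any particular biometric system.

```python
def match_error_rates(scores, same_person, threshold):
    """scores: similarity scores for comparison pairs (higher = more alike).
    same_person: True when the pair really is the same individual.
    Returns (false_match_rate, false_nonmatch_rate) at the given threshold."""
    impostor = [s for s, same in zip(scores, same_person) if not same]
    genuine = [s for s, same in zip(scores, same_person) if same]
    fmr = sum(s >= threshold for s in impostor) / len(impostor)  # impostors accepted
    fnmr = sum(s < threshold for s in genuine) / len(genuine)    # genuine users rejected
    return fmr, fnmr

# Worn finger pads or cataracts tend to lower genuine scores, pushing the
# false-nonmatch rate up unless the threshold is relaxed, which in turn
# raises the false-match rate.
scores      = [0.92, 0.81, 0.40, 0.35, 0.88, 0.30, 0.55, 0.77]
same_person = [True, True, False, False, True, False, True, True]
print(match_error_rates(scores, same_person, threshold=0.6))
```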
Machine learning algorithms also risk perpetuating inequalities and bias (exclusion) against certain groups.a Modeling may be based on data that reflect historical biases. When a learning algorithm is presented with data that reflect historical discrimination, it may learn to imitate the biased patterns of the past. For example, a machine learning algorithm could discern that variables such as ethnicity are good predictors of outcomes. But except for certain affirmative action programs, in most places, ethnicity-based targeting would not be acceptable. Or the machine learning algorithm could pick up other aspects of historical discrimination, for example, related to ethnic ghettos or redlining, a discriminatory practice in which services are withheld from potential customers who reside in neighborhoods classified as “hazardous” to investment; these residents largely belong to racial and ethnic minorities (Zou and Schiebinger 2018). As computers cannot distinguish between ethical and unethical decisions, data scientists need to be aware of historical biases and consider how they can be addressed.b Machine learning will also exclude everyone who is not captured by the data-generating technology on which the artificial intelligence system relies; mobile phone use is an obvious example. As access to digital services tends to rise with income, the poorest are the most likely to be data poor as well and consequently excluded.
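One routine safeguard, and the kind of check referred to in the main text for the Togo case, is to compare selection rates across groups (demographic parity). The sketch below computes group-level selection rates for a hypothetical targeting model; the group labels and records are invented for illustration.

```python
from collections import defaultdict

def selection_rates(records):
    """records: iterable of (group, selected) pairs; returns selection rate per group."""
    chosen = defaultdict(int)
    total = defaultdict(int)
    for group, selected in records:
        total[group] += 1
        chosen[group] += int(selected)
    return {g: chosen[g] / total[g] for g in total}

# Hypothetical model decisions tagged with a (possibly protected) group label.
records = [("group_a", True), ("group_a", True), ("group_a", False),
           ("group_b", False), ("group_b", False), ("group_b", True)]
rates = selection_rates(records)
print(rates)  # roughly {'group_a': 0.67, 'group_b': 0.33}
print("parity gap:", max(rates.values()) - min(rates.values()))
# A large gap between groups with similar levels of need suggests the model has
# learned a proxy for a protected attribute (ethnicity, neighborhood) from its features.
```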
A related risk is that machine learning algorithms learn best to predict the data set on which they are trained. The World Development Report 2021 on data highlights several issues (World Bank 2021b). For example, an algorithm to predict drug use in California was trained on police arrest data and predicted use predominantly in African American communities, despite survey data indicating widespread use across both Caucasian and African American communities (Smith IV 2016). Facial recognition algorithms were first trained on Caucasian males and so recognize them best; they perform less well on African Americans (Hill 2020) and worst on African American females, with error rates reaching 35 percent (Buolamwini and Gebru 2018). Such facial recognition errors have led to mistaken identifications and arrests.c Voice recognition also suffers from male bias (Tatman 2017).
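The disparities above are found by disaggregating performance by subgroup rather than reporting a single aggregate number. A minimal sketch of such an audit, with invented predictions, labels, and group names, follows.

```python
from collections import defaultdict

def error_rate_by_group(y_true, y_pred, groups):
    """Return the share of wrong predictions within each subgroup."""
    errors = defaultdict(int)
    counts = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        counts[group] += 1
        errors[group] += int(truth != pred)
    return {g: errors[g] / counts[g] for g in counts}

# Hypothetical audit data: errors concentrate in group B.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0, 1, 1]
groups = ["A", "A", "B", "B", "A", "B", "B", "A"]
print(error_rate_by_group(y_true, y_pred, groups))  # group B's error rate is far higher
```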
For data protection, proper safeguards must be in place to ensure that identifiers are hard to compromise, reducing the chance of imposters gaining access to data. The purpose of social registries is widespread use by different government agencies, and a strength is that they can draw together data from different agencies. However, this richness carries risk: without proper safeguards, systems may grant too many users too much access. Data protection and security must be core elements of system design, starting with the internal procedures and access controls of the organizations storing the information.
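One concrete expression of “not too many users, not too much access” is role-based, logged access to registry fields. The sketch below is illustrative only: the roles, field names, and audit log are invented, and a real registry would rely on its own access-control and logging infrastructure.

```python
from datetime import datetime, timezone

# Hypothetical mapping from staff role to the registry fields it may read.
ROLE_PERMISSIONS = {
    "eligibility_officer": {"household_size", "income_estimate"},
    "payment_agency": {"payment_account"},
}
audit_log = []  # every read attempt is recorded, allowed or not

def read_field(user, role, household_id, field, store):
    allowed = field in ROLE_PERMISSIONS.get(role, set())
    audit_log.append({"when": datetime.now(timezone.utc).isoformat(),
                      "user": user, "role": role, "household": household_id,
                      "field": field, "allowed": allowed})
    if not allowed:
        raise PermissionError(f"role '{role}' may not read '{field}'")
    return store[household_id][field]

store = {"HH-001": {"household_size": 5, "income_estimate": 1200, "payment_account": "…"}}
print(read_field("staff_42", "eligibility_officer", "HH-001", "household_size", store))
```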
Moreover, as World Development Report 2021 notes, data protection laws in principle limit the use of personal data, but exceptions generally exist. In most cases, these are limited to specific uses, such as national security, but in other cases, they are wide-ranging. Justifications for these exceptions are required in a third of high-income countries but in less than a tenth of low-income countries, which opens the door to unchecked state surveillance and thus undermines trust in data use (Ben-Avie and Tiwari 2019). In addition, World Development Report 2021 notes that the increasingly widespread practice of linking data sets stretches the limits of anonymization, creating the possibility of reidentifying deidentified data and blurring the boundary between personal and nonpersonal data (Lubarsky 2017).
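Reidentification through linkage typically requires nothing more than a join on quasi-identifiers. The sketch below, with entirely invented records, shows how a “deidentified” extract can be reattached to names using only birth year, sex, and district.

```python
deidentified = [  # names removed, sensitive attribute retained
    {"birth_year": 1984, "sex": "F", "district": "North", "benefit": "cash transfer"},
    {"birth_year": 1990, "sex": "M", "district": "South", "benefit": "none"},
]
public_list = [  # e.g. a voter roll or other named data set
    {"name": "A. Example", "birth_year": 1984, "sex": "F", "district": "North"},
    {"name": "B. Example", "birth_year": 1990, "sex": "M", "district": "South"},
]

def link(deid, named, keys=("birth_year", "sex", "district")):
    """Join two data sets on quasi-identifiers; no unique ID is needed."""
    index = {tuple(p[k] for k in keys): p["name"] for p in named}
    return [{**d, "name": index.get(tuple(d[k] for k in keys))} for d in deid]

for row in link(deidentified, public_list):
    print(row)  # quasi-identifiers alone are enough to reattach names to benefits
```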
On security risks and misuse of data, the manipulation of personal data raises risks of rights violations, such as (1) loss or unauthorized access, destruction, modification, or disclosure of data; (2) misuse of the information by governments or the private sector for systemic surveillance of individuals; and (3) vulnerability to hackers. The use of facial recognition to identify protesters, giving governments the ability to curtail rights such as freedom of assembly and expression, is a particularly worrisome concern.
a. Lindert et al. (2020), Sepulveda (2018), and Sepulveda and Nyst (2012) highlight that the exclusion of people from social programs or from obtaining IDs is due, among other reasons, to lack of awareness of enrollment; limited infrastructure or presence of enrollment offices or stations, mainly in rural and the poorest areas; limited physical mobility of individuals; costs or other administrative requirements; physical inability to provide reliable biometric information; and cultural barriers and gender norms.
b. Ohlenburg (2020a) indicates that when a learning algorithm is presented with data that reflect historical discrimination, it will learn to imitate the biased patterns of the past, hence perpetuating the historical discrimination.
c. See, for example, https://www.nytimes.com/2020/06/24/technology/facial-recognition-arrest.html and https://www.cnn.com/2020/06/24/tech/aclu-mistaken-facial-recognition/index.html.