
4.4 Checking data quality in real time: A case study from the Demand for Safe Spaces project

High-frequency data quality checks (HFCs) are data quality checks run in real time during data collection so that any issues can be addressed while the data collection is still ongoing. For more details, see the DIME Wiki at https://dimewiki.worldbank.org/High_Frequency_Checks.

High-frequency data quality checks (HFCs) should be scripted before data collection begins, so that data checks can start as soon as data start to arrive. A research assistant should run the HFCs on a daily basis for the duration of the survey. HFCs should include monitoring the consistency and range of responses to each question, validating survey programming, testing for enumerator-specific effects, and checking for duplicate entries and completeness of online submissions vis-à-vis the field log.
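The kinds of checks listed above can be scripted ahead of time against the expected survey structure. The sketch below is a minimal illustration, not the project's actual code: the field names (`respondent_id`, `age`, `enumerator`, `refused`) and the thresholds are hypothetical stand-ins for whatever the real instrument and protocols define.

```python
from collections import Counter

def run_hfcs(rows, age_range=(15, 99), refusal_threshold=0.2):
    """Return a list of (unit, issue) flags for one day's submissions."""
    issues = []

    # Duplicate entries: each respondent ID should appear exactly once
    counts = Counter(r["respondent_id"] for r in rows)
    for rid, n in counts.items():
        if n > 1:
            issues.append((rid, f"duplicate ID ({n} submissions)"))

    # Range check: flag implausible values for a numeric response
    lo, hi = age_range
    for r in rows:
        if not lo <= r["age"] <= hi:
            issues.append((r["respondent_id"], f"age {r['age']} outside {lo}-{hi}"))

    # Enumerator-specific effects: flag unusually high refusal rates
    by_enum = {}
    for r in rows:
        by_enum.setdefault(r["enumerator"], []).append(r["refused"])
    for enum_id, refusals in by_enum.items():
        rate = sum(refusals) / len(refusals)
        if rate > refusal_threshold:
            issues.append((enum_id, f"refusal rate {rate:.0%} above threshold"))

    return issues
```

Completeness against the field log would be checked the same way, by comparing submitted IDs to the list of interviews the field team reports as done.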

HFCs will improve data quality only if the issues they catch are communicated to the team collecting the data and if corrections are documented and applied to the data. This effort requires close communication with the field team, so that enumerators are promptly made aware of data quality issues and have a transparent system for documenting issues and corrections. There are many ways to communicate the results of HFCs to the field team; the most important consideration is to make the information actionable. ipacheck, for example, generates a spreadsheet of flagged errors, which can be sent directly to the data collection teams. Many teams display results in other formats, such as online dashboards created by custom scripts. It is also possible to automate the communication of errors to the field team with scripts that link the HFCs to a messaging platform. Any of these solutions can work: what works best for the team will depend on factors such as cellular networks in fieldwork areas, whether field supervisors have access to laptops, internet speed, and the coding skills of the team preparing the HFC workflow (see box 4.4 for how data quality assurance was applied in the Demand for Safe Spaces project).
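The simplest actionable output is a dated spreadsheet of flags with a blank column for the field team's resolution, in the spirit of the ipacheck reports mentioned above. This is a generic sketch, not ipacheck's actual output format; the flag contents are illustrative:

```python
import csv
from datetime import date

# Hypothetical flags; in practice these would come from the scripted checks
flags = [
    ("R-1042", "duplicate ID (2 submissions)"),
    ("enum-07", "refusal rate 35% above threshold"),
]

# One dated report per day gives the field team a concrete to-do list
report = f"hfc_report_{date.today().isoformat()}.csv"
with open(report, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["unit", "issue", "resolution (field team fills in)"])
    for unit, issue in flags:
        writer.writerow([unit, issue, ""])
```

A dashboard or messaging-platform integration would consume the same flag list; only the delivery channel changes.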

BOX 4.4 CHECKING DATA QUALITY IN REAL TIME: A CASE STUDY FROM THE DEMAND FOR SAFE SPACES PROJECT

The Demand for Safe Spaces project created protocols for checking the quality of the platform survey data. In this case, the survey instrument was programmed for electronic data collection using the SurveyCTO platform.

• Enumerators submitted surveys to the server upon completing interviews.

• The team’s field coordinator made daily notes of any unusual field occurrences in the documentation folder in the project folder shared by the research team.

• The research team downloaded data daily; after each download, the research assistant ran coded data quality checks. The code was prepared in advance of data collection, based on the pilot data.

• The data quality checks flagged any duplicated identifications, outliers, and inconsistencies in the day’s data. Issues were reported to the field team the next day. In practice, the only issues flagged were higher-than-expected rates of refusal to answer and wrongly entered identification numbers. The field team responded on the same day to each case, and the research assistant documented any resulting changes to the data set through code.
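Documenting changes through code, the last step above, means corrections live in a script rather than in hand edits to the raw data, so every fix is visible and reproducible. A minimal sketch, with an illustrative correction mapping and field names that are assumptions rather than the project's actual identifiers:

```python
# Wrongly entered ID -> verified ID from the field log (illustrative values)
corrections = {
    "R-1402": "R-1042",
    "R-2001": "R-2010",
}

def apply_id_corrections(rows, corrections):
    """Replace wrongly entered IDs without mutating the raw download."""
    fixed = []
    for row in rows:
        row = dict(row)  # copy, so the original data stay untouched
        row["respondent_id"] = corrections.get(
            row["respondent_id"], row["respondent_id"]
        )
        fixed.append(row)
    return fixed
```

Because the raw data are never edited directly, rerunning the script on a fresh download reproduces the same corrected data set.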

