the values for each variable fall within the expected range and whether related variables contradict each other. Electronic data collection systems often incorporate many quality control features, such as range restrictions and logical flows. Data received from systems that do not include such controls should be checked more carefully.

Consistency checks are project specific, so it is difficult to provide general guidance. Having detailed knowledge of the variables in the data set and carefully examining the analysis plan are the best ways to prepare. Examples of inconsistencies in survey data include cases in which a household reports having cultivated a plot in one module but does not list any cultivated crops in another. Response consistency should also be checked across data sets; because this task is much harder to automate, it deserves particular attention. For example, if two sets of administrative records are received, one with hospital-level information and one with data on each medical staff member, the number of entries in the second data set should match the number of employed personnel in the first.

Finally, no amount of preprogrammed checks can replace actually looking at the data. Of course, that does not mean checking each data point; it does mean plotting and tabulating distributions for the main variables of interest. Doing so will help to identify outliers and other potentially problematic patterns that were not foreseen. A common source of outlier values in survey data is typos, but outliers can also occur in administrative data if, for example, the reporting unit changed over time but the data were stored under the same variable name. Identifying unforeseen patterns in the distribution will also help the team to gather relevant information—for example, that there were no harvest data because a particular pest hit the community, or that unusual call records in a particular area were caused by the temporary downtime of a tower.

Analysis of metadata can also be useful in assessing data quality. For example, electronic survey software automatically generates timestamps and trace histories showing when data were submitted, how long enumerators spent on each question, and how many times answers were changed before or after the data were submitted. See box 5.4 for examples of data quality checks implemented in the Demand for Safe Spaces project.
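As an illustration, the following is a minimal sketch in Python of preprogrammed checks of the kinds described above. All data sets, variable names, and thresholds (hh_id, cultivated_plot, harvest_kg, hospital_id, n_staff, duration_minutes, the 0-10,000 kg range) are hypothetical stand-ins; in practice, they would come from the project's own codebook and survey instrument.

```python
# A minimal sketch of preprogrammed quality checks, assuming hypothetical
# data sets and variable names; adapt ranges and identifiers to the project.
import pandas as pd

def run_quality_checks(households: pd.DataFrame, crops: pd.DataFrame,
                       hospitals: pd.DataFrame, staff: pd.DataFrame) -> None:
    # Range check: flag values outside the expected interval.
    out_of_range = households[~households["harvest_kg"].between(0, 10_000)]
    if not out_of_range.empty:
        print(f"{len(out_of_range)} households report harvests outside 0-10,000 kg")

    # Cross-module consistency: a household that reports cultivating a plot
    # should list at least one cultivated crop in the crop module.
    cultivators = set(households.loc[households["cultivated_plot"] == 1, "hh_id"])
    no_crops = cultivators - set(crops["hh_id"])
    if no_crops:
        print(f"{len(no_crops)} cultivating households list no crops")

    # Cross-data-set consistency: staff records per hospital should match
    # the personnel count reported in the hospital-level file.
    reported = hospitals.set_index("hospital_id")["n_staff"]
    counted = staff.groupby("hospital_id").size().reindex(reported.index, fill_value=0)
    mismatch = reported[reported != counted]
    if not mismatch.empty:
        print(f"{len(mismatch)} hospitals where staff records do not match reported personnel")

    # Metadata check: flag submissions completed implausibly fast, which can
    # indicate skipped questions (assumes the software exports a duration).
    too_fast = households[households["duration_minutes"] < 15]
    if not too_fast.empty:
        print(f"{len(too_fast)} interviews completed in under 15 minutes")

    # No check replaces looking at the data: tabulate the main variable of
    # interest to surface patterns the preprogrammed checks did not foresee.
    print(households["harvest_kg"].describe())
```

A common workflow is to rerun checks like these on every incoming batch of data and to route flagged observations back to the field team for verification.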
BOX 5.4 ASSURING DATA QUALITY: A CASE STUDY FROM THE DEMAND FOR SAFE SPACES PROJECT

The Demand for Safe Spaces team adopted three categories of data quality assurance checks for the crowdsourced ride data. The first—completeness—made sure that each data point made technical sense: it contained the right elements, and the data as a whole covered all of the expected spaces. The second—consistency—made sure that real-world details were right: stations were on (Box continues on next page)
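The completeness category described in box 5.4 can be made concrete with a short sketch. The column names (line, station, time_block, car_type) and the expected coverage grid below are assumptions for illustration only, not the Demand for Safe Spaces project's actual schema or implementation.

```python
# Illustrative completeness check for crowdsourced ride data. Column names
# and the expected coverage grid are assumptions for this sketch.
import pandas as pd

REQUIRED_COLUMNS = ["line", "station", "time_block", "car_type"]

def check_completeness(rides: pd.DataFrame,
                       stations_by_line: dict[str, list[str]],
                       time_blocks: list[str]) -> None:
    # Each data point should make technical sense: no required field missing.
    incomplete = rides[REQUIRED_COLUMNS].isna().any(axis=1)
    if incomplete.any():
        print(f"{incomplete.sum()} observations are missing required fields")

    # The data as a whole should cover all of the expected spaces:
    # every station on every line, in every time block.
    expected = {
        (line, station, block)
        for line, stations in stations_by_line.items()
        for station in stations
        for block in time_blocks
    }
    observed = set(map(tuple, rides[["line", "station", "time_block"]].to_numpy()))
    uncovered = expected - observed
    if uncovered:
        print(f"{len(uncovered)} expected line-station-time combinations have no data")
```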