Development Research in Practice by World Bank Publications


BOX 5.1 SUMMARY: CLEANING AND PROCESSING RESEARCH DATA

After being acquired, data must be structured for analysis in accordance with the research design, as laid out in the data linkage tables and the data flowcharts discussed in chapter 3. This process entails the following tasks.

1. Tidy the data. Many data sets do not have an unambiguous identifier as received, and the rows in the data set often do not match the units of observation specified by the research plan and data linkage table. Preparing the data for analysis requires two steps:
   • Determine the unique identifier for each unit of observation in the data.
   • Transform the data so that the desired unit of observation uniquely identifies rows in each data set.

2. Validate data quality. Data completeness and quality should be validated upon receipt to ensure that the information is an accurate representation of the characteristics and individuals it is supposed to describe. This process entails three steps:
   • Check that the data are complete, that is, that all the observations in the desired sample were received.
   • Make sure that data points are consistent across variables and data sets.
   • Explore the distribution of key variables to identify outliers and other unexpected patterns.

3. De-identify, correct, and annotate the data. After the data have been processed and de-identified, the information must be archived, published, or both. Before publication, it is necessary to ensure that the processed version is highly accurate and appropriately protects the privacy of individuals:
   • De-identify the data in accordance with best practices and relevant privacy regulations.
   • Correct data points that are identified as being in error compared to ground reality.
   • Recode, document, and annotate data sets so that all of the content will be fully interpretable by future users, whether or not they were involved in the acquisition process.
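The identifier and validation checks in tasks 1 and 2 can be sketched in a few lines of code. The example below is a minimal illustration in Python using only the standard library; it is not the handbook's own implementation. The function names, the variable names `hh_id` and `income`, the household records, and the outlier rule (values more than three median absolute deviations from the median) are all assumptions made for this sketch.

```python
from collections import Counter
from statistics import median


def duplicate_ids(rows, id_var):
    """Return the ID values that appear on more than one row."""
    counts = Counter(row[id_var] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def missing_ids(rows, id_var, expected_ids):
    """Return the expected IDs that are absent from the received data."""
    received = {row[id_var] for row in rows}
    return sorted(set(expected_ids) - received)


def flag_outliers(rows, id_var, var, k=3.0):
    """Return IDs whose value is more than k median absolute deviations
    away from the median (an assumed rule of thumb, not a standard)."""
    values = [row[var] for row in rows if row[var] is not None]
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []
    return [
        row[id_var]
        for row in rows
        if row[var] is not None and abs(row[var] - center) > k * mad
    ]


# Hypothetical household-level records, invented for illustration only.
households = [
    {"hh_id": "H01", "income": 120},
    {"hh_id": "H02", "income": 135},
    {"hh_id": "H02", "income": 135},   # duplicated submission
    {"hh_id": "H03", "income": 110},
    {"hh_id": "H05", "income": 9000},  # implausibly large value
]
# Hypothetical list of sampled units, e.g. taken from a master data set.
sample = ["H01", "H02", "H03", "H04", "H05"]

dups = duplicate_ids(households, "hh_id")                 # ID is not yet unique
missing = missing_ids(households, "hh_id", sample)        # completeness check
outliers = flag_outliers(households, "hh_id", "income")   # distribution check

print("Duplicated IDs:", dups)
print("Sampled units not received:", missing)
print("Units to review for outliers:", outliers)
```

A flagged value is a prompt for review, not an automatic correction: in line with task 3, a data point would be changed only after it is confirmed to be in error compared to ground reality.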

Key responsibilities for task team leaders and principal investigators
   • Determine the units of observation needed for experimental design and supervise the development of appropriate unique identifiers.
   • Indicate priorities for quality checks, including key indicators and reference values.
   • Provide guidance on how to resolve all issues identified in data processing, cleaning, and preparation.
   • Publish or archive the prepared data set.

Key responsibilities for research assistants
   • Develop code, data, and documentation linking data sets with the data map and study design, and tidy all data sets to correspond to the required units of observation.

(Box continues on next page)


DEVELOPMENT RESEARCH IN PRACTICE: THE DIME ANALYTICS DATA HANDBOOK
