BOX 5.2 ESTABLISHING A UNIQUE IDENTIFIER: A CASE STUDY FROM THE DEMAND FOR SAFE SPACES PROJECT

All data sets have a unit of observation, and the first columns of each data set should uniquely identify which unit is being observed. In the Demand for Safe Spaces project, as should be the case in all projects, the first few lines of code that imported each original data set immediately ensured that this was true and applied any corrections from the field needed to fix errors related to uniqueness. The code segment below was used to import the crowdsourced ride data; it used the ieduplicates command to remove duplicate values of the uniquely identifying variable in the data set. The screenshot of the corresponding ieduplicates report shows how the command documents and resolves duplicate identifiers in data collection. After applying the corrections, the code confirms that the data are uniquely identified by the rider and ride identifiers and saves the data set in an optimized format.

// Import to Stata format ============================================================
import delimited using "${encrypt}/Baseline/07112016/Contributions 07112016", ///
    delim(",")         ///
    bindquotes(strict) ///
    varnames(1)        ///
    clear

* There are two duplicated values for obs_uid, each with two submissions.
* All four entries are demographic surveys from the same user, who seems to
* have submitted the data twice, each time creating two entries.
* Possibly a connectivity issue
ieduplicates obs_uid using "${doc_rider}/baseline-study/raw-duplicates.xlsx", ///
    uniquevars(v1) ///
    keepvars(created submitted started)

* Verify unique identifier, sort, optimize storage,
* remove blank entries and save data
isid user_uuid obs_uid, sort
compress
dropmiss, force
save "${encrypt}/baseline_raw.dta", replace
To access this code in do-file format, visit the GitHub repository at https://github.com/worldbank/dime-data-handbook/tree/main/code.
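For readers without the iefieldkit package that provides ieduplicates, a similar duplicate check can be approximated with built-in Stata commands. The sketch below is illustrative only: it assumes a hypothetical data set whose intended identifier is a variable named id, and, unlike ieduplicates, it does not generate the spreadsheet report used to document and apply corrections.

* Minimal sketch of flagging and verifying duplicates with built-in commands,
* assuming a hypothetical identifier variable named id
duplicates report id                 // count how many values of id are duplicated
duplicates tag id, generate(dup_id)  // flag every observation with a duplicated id
browse if dup_id > 0                 // inspect flagged rows before deciding on corrections

* after corrections have been applied
isid id, sort                        // stop with an error if id is still not unique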