Development Research in Practice

made directly to the original data set. Instead, any corrections must be made as part of data cleaning, applied through code, and saved to a new data set (see box 5.6 for a discussion of how data corrections were made for the Demand for Safe Spaces project).
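The principle above can be sketched in a few lines of Stata. This is a minimal illustration, not the project's code: the toy data, the `hh_id` identifier, and the corrected value are all hypothetical, and temporary files stand in for the original and cleaned data sets. Run it as a whole do-file so that the temporary file names stay defined.

```stata
* Minimal sketch; data, identifier, and corrected value are hypothetical
clear
input str4 hh_id age
"h001" 34
"h002" 340
"h003" 51
end

* This temporary file stands in for the original data set, which is never edited
tempfile original
save `original'

* Apply the correction through code, with a comment recording the reason
use `original', clear
replace age = 34 if hh_id == "h002"   // hypothetical: data-entry error

* Save the corrected data under a new name instead of overwriting the original
tempfile cleaned
save `cleaned'

* The original file still holds the uncorrected value
use `original', clear
assert age == 340 if hh_id == "h002"
```

Because the correction lives in code, anyone who reruns the script on the original data obtains the same cleaned data set and can see which values were changed.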

BOX 5.6 CORRECTING DATA POINTS: A CASE STUDY FROM THE DEMAND FOR SAFE SPACES PROJECT

Most of the issues that the Demand for Safe Spaces team identified in the raw crowdsourced data during data quality assurance were related to incorrect station and line identifiers. Two steps were taken to address this issue. The first was to correct data points. The second was to document the corrections made. The correct values for the line and station identifiers, as well as notes on how they were identified, were saved in a data set called station_correction.dta. The team used the command merge to replace the values in the raw data in memory (called the “master data” in merge) with the station_correction.dta data (called the “using data” in merge). The following options were used for the following reasons:

• update replace was used to update values in the “master data” with values from the same variable in the “using data.”

• keepusing(user_station) was used to keep only the user_station variable from the “using data.”

• assert(master match_update) was used to confirm that all observations were either only in the “master data” or were in both the “master data” and the “using data” and that the values were updated with the values in the “using data.” This quality assurance check was important to ensure that data were merged as expected.

To document the final contents of the original data, the team published supplemental materials on GitHub as well as on the World Bank Microdata Catalog.

* There was a problem with the line option for one of the stations.
* This fixes it:
* -----------------------------------------------------------------------

merge 1:1 obs_uuid ///
    using "${doc_rider}/compliance-pilot/station_corrections.dta", ///
    update replace ///
    keepusing(user_station) ///
    assert(master match_update) ///
    nogen

For the complete script, visit the GitHub repository at https://git.io/Jt2ZC.
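To see how the merge options described in this box behave, the following sketch applies them to toy data. Everything here is hypothetical: the identifiers, the numeric station codes, and the `note` variable are made up, and whether `user_station` is numeric in the project data is an assumption. In Stata's `_merge` coding, `master` is 1 and `match_update` is 4, where 4 means a missing value in the master data was filled in from the using data. The sketch therefore leaves the station missing for the observation being corrected. A nonmissing value that was overwritten would be coded `match_conflict` (5) and would fail the assertion as written. The `nogen` option is omitted so that `_merge` can be inspected.

```stata
* Hypothetical raw data: the station is missing for one observation
clear
input str4 obs_uuid user_station
"a001" 12
"a002" .
"a003" 7
end
tempfile raw
save `raw'

* Hypothetical corrections file: one row per observation to be fixed,
* plus a note on how the correct value was identified
clear
input str4 obs_uuid user_station str20 note
"a002" 15 "checked on line map"
end
tempfile corrections
save `corrections'

* Apply the corrections to the raw data held in memory
use `raw', clear
merge 1:1 obs_uuid using `corrections', ///
    update replace                      ///
    keepusing(user_station)             ///
    assert(master match_update)

* Inspect the result: the corrected row has _merge == 4, the others 1
list obs_uuid user_station _merge
assert user_station == 15 if obs_uuid == "a002"
```

If the corrections file contained an identifier that is absent from the raw data, that row would be coded `using` (2) and `merge` would stop with an error, which is the kind of unexpected result the assertion is meant to catch.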

DEVELOPMENT RESEARCH IN PRACTICE: THE DIME ANALYTICS DATA HANDBOOK


Development Research in Practice by World Bank Publications - Issuu