Development Research in Practice

validation code so the script will return an error if unexpected results show up in future runs. Paying close attention to merge results is necessary to avoid unintentional changes to the data. Two issues that require careful scrutiny are missing values and dropped observations. This scrutiny starts with reading the documentation for each command to learn how it treats unmatched observations: Are they dropped, or are they kept with missing values? Whenever possible, the script should include automated checks that throw an error if the result differs from what is expected. If this step is skipped, changes in the outcome may surface only after large chunks of code have run, and nothing will flag them. In addition, any change in the number of observations needs to be documented in the comments, including an explanation of why it happens. If a subset of the data is created by keeping only matched observations, it is helpful to document why the observations differ across data sets and why the team is interested only in the observations that match. The same applies when the merge adds new observations from the other data set.

Some merges of data with different units of observation are more conceptually complex. Examples include overlaying road location data on household data using a spatial match; combining school administrative data, such as attendance records and test scores, with student demographic characteristics from a survey; or linking a data set of infrastructure access points, such as water pumps or schools, with a data set of household locations. In these cases, figuring out a useful way to combine the data sets is a key contribution of the research.
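The checks described above can be sketched in Python (standard library only) using the school example. This is a minimal illustration, not the handbook's own code: the function `merge_on_key`, the data, and the expected counts are all hypothetical. It shows how the merge type decides whether unmatched observations are dropped (`how="inner"`) or kept with missing values (`how="left"`), and how the script can stop with an error when the match counts differ from what the team has documented.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def merge_on_key(left: List[Row], right: List[Row], key: str,
                 how: str = "left") -> Tuple[List[Row], Dict[str, int]]:
    """Merge two lists of records on `key` and count how rows matched.

    how="inner" drops left rows with no match; how="left" keeps them and fills
    the right-hand columns with None (missing values). `key` must uniquely
    identify rows in `right` (a 1:1 or many-to-one merge); counts are in rows.
    """
    if how not in ("left", "inner"):
        raise ValueError(f"unsupported merge type: {how!r}")

    # Index the right-hand data, failing on duplicate keys rather than
    # silently keeping one of them.
    right_index: Dict[object, Row] = {}
    for row in right:
        if row[key] in right_index:
            raise ValueError(f"duplicate key in right data: {row[key]!r}")
        right_index[row[key]] = row
    right_cols = {c for row in right for c in row if c != key}

    merged: List[Row] = []
    counts = {"matched": 0, "left_only": 0, "right_only": 0}
    used = set()
    for row in left:
        match = right_index.get(row[key])
        if match is not None:
            counts["matched"] += 1
            used.add(row[key])
            merged.append({**row, **match})
        else:
            counts["left_only"] += 1
            if how == "left":
                merged.append({**row, **{c: None for c in right_cols}})
    # Right-hand rows that never matched are dropped by both merge types.
    counts["right_only"] = len(right_index) - len(used)
    return merged, counts


# Hypothetical data: admin records (one row per student and term) and survey
# demographics (one row per student). Student 103 was not surveyed.
admin = [{"student_id": 101, "term": 1, "score": 62},
         {"student_id": 101, "term": 2, "score": 70},
         {"student_id": 102, "term": 1, "score": 55},
         {"student_id": 103, "term": 1, "score": 48}]
survey = [{"student_id": 101, "age": 11}, {"student_id": 102, "age": 12}]

merged, counts = merge_on_key(admin, survey, "student_id", how="left")

# Automated checks: stop the script if the merge differs from expectations.
EXPECTED_UNMATCHED_ADMIN_ROWS = 1  # student 103 not in survey sample (documented)
if counts["left_only"] != EXPECTED_UNMATCHED_ADMIN_ROWS:
    raise RuntimeError(f"unexpected unmatched admin rows: {counts['left_only']}")
if counts["right_only"] != 0:
    raise RuntimeError(f"{counts['right_only']} surveyed students have no admin records")
if len(merged) != len(admin):
    raise RuntimeError("left merge changed the number of observations")
```

Switching to `how="inner"` would silently drop student 103's record; the counts are what make that change visible. In pandas, `DataFrame.merge` offers comparable controls through its `how`, `indicator`, and `validate` arguments.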
Because the conceptual constructs that link observations from the two data sources are important and can take many forms, it is especially important to document the data integration extensively and separately from other construction tasks (see box 6.2 for an example of merges followed by automated tests from the Demand for Safe Spaces project).
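For the water-pump example above, the linking rule is itself a research choice. A minimal sketch of one possible rule, assuming households and access points come with latitude and longitude in degrees: link each household to its nearest point by great-circle (haversine) distance on a spherical Earth, and leave households beyond a cutoff unlinked. The field names, the 5 km cutoff, and the coordinates are hypothetical, and the brute-force search is only practical for small data sets; larger projects would more likely use a spatial index or a GIS library.

```python
import math
from typing import Dict, List, Optional

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to guard against floating-point values just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_access_point(households: List[Dict], points: List[Dict],
                         max_km: float) -> List[Dict]:
    """Link each household to its nearest access point within max_km.

    Returns one row per household, so the household count never changes.
    Households with no point within max_km get point_id=None, which makes the
    rule that creates missing values explicit and easy to document.
    """
    linked = []
    for hh in households:
        best_id: Optional[str] = None
        best_km = math.inf
        # Brute-force search over all points: fine for small data sets.
        for pt in points:
            d = haversine_km(hh["lat"], hh["lon"], pt["lat"], pt["lon"])
            if d < best_km:
                best_id, best_km = pt["point_id"], d
        if best_km > max_km:
            linked.append({"hh_id": hh["hh_id"], "point_id": None, "dist_km": None})
        else:
            linked.append({"hh_id": hh["hh_id"], "point_id": best_id, "dist_km": best_km})
    return linked


# Hypothetical coordinates near the equator; 0.01 degrees of longitude is about 1.1 km.
households = [{"hh_id": 1, "lat": 0.0, "lon": 0.0},
              {"hh_id": 2, "lat": 0.0, "lon": 1.0}]
pumps = [{"point_id": "P1", "lat": 0.0, "lon": 0.01}]

linked = nearest_access_point(households, pumps, max_km=5.0)
# Household 2 is about 110 km from P1, beyond the cutoff, so it stays unlinked.
unlinked = [row["hh_id"] for row in linked if row["point_id"] is None]
```

Because the function returns exactly one row per household, the number of observations cannot change silently, and the cutoff becomes the single parameter whose choice needs to be documented.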

BOX 6.2 INTEGRATING MULTIPLE DATA SOURCES: A CASE STUDY FROM THE DEMAND FOR SAFE SPACES PROJECT

The research team received the raw crowdsourced data acquired for the Demand for Safe Spaces study at a different level of observation than the one relevant for analysis. The unit of analysis was a ride, and each ride was represented in the crowdsourced data set by three rows: one for questions answered before boarding the train, one for those answered during the ride, and one for those answered after leaving the train. The Tidying data example in box 5.3 explains how the team created an intermediate data set for each of these tasks. To create the ride-level data set, the team combined the individual task data sets. The following code shows how the team ensured that all observations merged as expected, using two different approaches depending on what was expected.

(Box continues on next page)
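The team's own code appears in the continuation of the box and is not reproduced here. As a hedged illustration only, the Python sketch below combines three hypothetical task-level data sets (one row per ride per task) into one row per ride and applies two kinds of checks: a strict one that requires every ride to appear in both the before and during data, and a looser one that tolerates rides missing the after-ride task but rejects after-ride rows with no matching ride. Which expectation applies to which task, the function names, and the field names (`ride_id`, `wait_min`, `crowded`, `rating`) are assumptions made for the example.

```python
from typing import Dict, List

Row = Dict[str, object]


def index_by(rows: List[Row], key: str) -> Dict[object, Row]:
    """Index task-level rows by ride ID, failing if a ride appears twice."""
    out: Dict[object, Row] = {}
    for row in rows:
        if row[key] in out:
            raise ValueError(f"duplicate {key}: {row[key]!r}")
        out[row[key]] = row
    return out


def build_ride_level(before: List[Row], during: List[Row], after: List[Row],
                     key: str = "ride_id") -> List[Row]:
    """Combine three task-level data sets into one row per ride."""
    b, d, a = index_by(before, key), index_by(during, key), index_by(after, key)

    # Check 1 (assumed expectation): every ride has both the before and the
    # during task, so any mismatch is an error.
    mismatched = set(b) ^ set(d)
    if mismatched:
        raise ValueError(f"rides missing a before or during task: {sorted(mismatched)}")

    # Check 2 (assumed expectation): some rides may lack the after task, but an
    # after-task row with no matching ride points to a data problem.
    orphans = set(a) - set(b)
    if orphans:
        raise ValueError(f"after-task rows with no matching ride: {sorted(orphans)}")

    after_cols = {c for row in after for c in row if c != key}
    rides: List[Row] = []
    for ride_id in sorted(b):
        row = {**b[ride_id], **d[ride_id]}
        row.update({c: None for c in after_cols})  # missing unless matched
        row.update(a.get(ride_id, {}))
        row["has_after_task"] = ride_id in a
        rides.append(row)
    return rides


# Hypothetical task-level data: ride 2 has no after-ride answers.
before = [{"ride_id": 1, "wait_min": 4}, {"ride_id": 2, "wait_min": 7}]
during = [{"ride_id": 1, "crowded": True}, {"ride_id": 2, "crowded": False}]
after = [{"ride_id": 1, "rating": 3}]

rides = build_ride_level(before, during, after)
```

The `has_after_task` flag keeps the unmatched rides visible in the ride-level data, so the reason for their missing values can be documented rather than discovered later.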

DEVELOPMENT RESEARCH IN PRACTICE: THE DIME ANALYTICS DATA HANDBOOK