completed, these variables may be removed from the data set. In fact, starting from a minimal set of variables and adding new ones as they are cleaned can make the data easier to handle. Using commands such as compress in Stata so that the data are always stored in the most efficient format helps to ensure that the cleaned data set file does not get too big to handle. Although all of these tasks are key to making the data easy to use, implementing them can be quite repetitive and create convoluted scripts. The iecodebook command suite, part of the iefieldkit Stata package, is designed to make some of the most tedious components of this process more efficient. It also creates a self-documenting workflow, so the data-cleaning documentation is created alongside the code, with no extra steps (see box 5.7 for a description of how iecodebook was used in the Demand for Safe Spaces project). In R, the Tidyverse (https://www.tidyverse.org) packages provide a consistent and useful grammar for performing the same tasks and can be used in a similar workflow.
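As an illustration of the workflow described above, the following is a minimal Stata sketch rather than the procedure of any particular project. It assumes that iefieldkit is installed from SSC, uses Stata's built-in auto data set as a stand-in for raw survey data, and writes to hypothetical file names (codebook.xlsx and auto_clean.dta) in the current working directory.

```stata
* Minimal sketch: codebook-driven cleaning with iecodebook, then compress.
* The data set and file names below are placeholders, not from the handbook.

* Install iefieldkit once; iecodebook is part of this package
ssc install iefieldkit, replace

* Load a built-in example data set as a stand-in for the raw data
sysuse auto, clear

* Step 1 (run once): export a codebook template listing every variable.
* The spreadsheet is then edited by hand to set names, labels, and recodes.
iecodebook template using "codebook.xlsx"

* Step 2: after the spreadsheet has been filled in, apply it to the data.
* The spreadsheet doubles as documentation of what was changed.
iecodebook apply using "codebook.xlsx"

* Step 3: store each variable in the smallest storage type that holds it
compress

* Save the cleaned data set under a new name, leaving the raw data untouched
save "auto_clean.dta", replace
```

Because the template step creates the spreadsheet, it is typically run only once. Later runs of the script would start from the apply step, so that edits made in the spreadsheet are not overwritten.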
BOX 5.7 RECODING AND ANNOTATING DATA: A CASE STUDY FROM THE DEMAND FOR SAFE SPACES PROJECT
The Demand for Safe Spaces team relied mostly on the iecodebook command for this part of the data-cleaning process. The screenshot below shows the iecodebook form used to clean the crowdsourced ride data. This process was carried out for each task.
Column B contains the corrected variable labels, column D indicates the value labels to be used for categorical variables, and column I recodes the underlying numbers in those variables. The differences between columns E and A indicate changes to variable names. Typically, it is strongly recommended not to rename variables at the cleaning stage, because it is important to maintain correspondence with the original data set. However, that was not possible in this case, because the same question had inconsistent variable names across multiple transfers of the data from the technology firm managing the mobile application. In fact, this is one of the two cleaning tasks that
DEVELOPMENT RESEARCH IN PRACTICE: THE DIME ANALYTICS DATA HANDBOOK