also be able to reproduce those steps and recreate the constructed variables. Therefore, documentation is an output of construction as important as the code and data, and it is good practice for papers to have an accompanying data appendix listing the analysis variables and their definitions.

Developing construction documentation is a good opportunity for the team to have a wider discussion about creating protocols for defining variables: such protocols guarantee that indicators are defined consistently across projects. The code itself implements the detailed account of how each variable is created, but it also needs comments that explain in plain language what is being done and why. This step is crucial both to prevent mistakes and to guarantee transparency. To make these comments easier to navigate, it is wise to start writing a variable dictionary as soon as the team begins thinking about making changes to the data (for an example, see Jones et al. 2019). The variable dictionary can be saved in an Excel spreadsheet, a Word document, or even a plain-text file. Whatever format it takes, it should carefully record how specific variables have been transformed, combined, recoded, or rescaled. Whenever relevant, the documentation should point to the specific scripts in which the definitions are implemented.

The iecodebook export subcommand is a good way to ensure that the project has easy-to-read documentation. When all final indicators have been created, it can be used to list all variables in the data set in an Excel sheet. The variable definitions can then be added to that file to create a concise metadata document. This step provides a good opportunity to review the notes and make sure that the code is implementing exactly what is described in the documentation (see box 6.4 for an example of variable construction documentation).
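As a rough illustration of the variable dictionary described above, the following minimal sketch assumes the dictionary is kept as a plain-text CSV file. It is written in Python purely for illustration and is not the project's own tooling. The column layout, the variable names (`employed`, `high_ses`, `low_education`), the source-variable names, and the script name are all hypothetical; only the wording of the definitions follows box 6.4.

```python
import csv
import io

# Columns of the dictionary; this layout is an assumption, not a standard.
FIELDS = ["variable", "definition", "source_variables", "script"]

# Hypothetical entries: variable, source-variable, and script names are
# illustrative only. The definitions follow the wording used in box 6.4.
VARIABLE_DICTIONARY = [
    {
        "variable": "employed",
        "definition": "= 1 if rider had a part-time or full-time job",
        "source_variables": "employment_status",
        "script": "construct-demographics.do",
    },
    {
        "variable": "high_ses",
        "definition": "= 1 if rider reported being a member of classes A or B",
        "source_variables": "ses_class",
        "script": "construct-demographics.do",
    },
]


def write_dictionary(entries, handle):
    """Write the dictionary entries to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)


def undocumented_variables(dataset_columns, entries):
    """Return constructed variables that have no dictionary entry."""
    documented = {entry["variable"] for entry in entries}
    return sorted(set(dataset_columns) - documented)


def missing_script_reference(entries):
    """Return variables whose entry does not point to a script."""
    return [e["variable"] for e in entries if not e["script"].strip()]


# Usage: compare the dictionary against the columns of a constructed data set.
constructed_columns = ["employed", "high_ses", "low_education"]  # hypothetical
buffer = io.StringIO()
write_dictionary(VARIABLE_DICTIONARY, buffer)
print(buffer.getvalue())
print("Undocumented:", undocumented_variables(constructed_columns, VARIABLE_DICTIONARY))
```

A check of this kind could be run whenever a new indicator is added, so that any variable lacking a definition or a script reference is flagged before the data set is shared. To write the dictionary to disk instead of an in-memory buffer, the handle can be opened with `open(path, "w", newline="")`.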
BOX 6.4 DOCUMENTING VARIABLE CONSTRUCTION: A CASE STUDY FROM THE DEMAND FOR SAFE SPACES PROJECT

In an appendix to the working paper, the Demand for Safe Spaces team documented the definition of every variable used to produce the outputs presented in the paper:
Variable definitions for rider audit demographic survey

Variable | Definition
Age | Median age, in years, of the rider's age category at the time of the demographic survey
Employed | = 1 if rider had a part-time or full-time job at the time of the demographic survey
High self-reported socio-economic status | = 1 if rider reported being a member of classes A or B
Low education (middle school or less) | = 1 if the highest degree obtained by the rider at the time of the demographic survey was middle school or lower
CHAPTER 6: CONSTRUCTING AND ANALYZING RESEARCH DATA