BOX 7.1 SUMMARY: PUBLISHING REPRODUCIBLE RESEARCH OUTPUTS

Whether writing a policy brief or academic article or producing some other kind of research product, it is important to create three final outputs that are ready for public release (or internal archiving if not public).

1. The data publication package. If the researcher holds the rights to distribute data that have been collected or obtained, this information should be made available to the public as soon as feasible. This release should
• Contain all nonidentifying variables and observations originally collected in a widely accessible format, with a data codebook describing all variables and values;
• Contain original documentation about the collection of the data, such as a survey questionnaire, API script, or data license;
• Be modified or masked only to correct errors and to protect the privacy of people described in the data; and
• Be appropriately archived and licensed, with clear terms of use.

2. The research reproducibility package. Either researchers or their organization will typically have the rights to distribute the code for data analysis, even if access to the data is restricted. This package should
• Contain all code required to derive analysis data from the published data;
• Contain all code required to reproduce research outputs from analysis data;
• Contain a README file with documentation on the use and structure of the code; and
• Be appropriately archived and licensed, with clear terms of use.

3. The written research product(s). These products should be
• Written and maintained as a dynamic document, such as a LaTeX file;
• Linked to the locations of all code outputs in the code directory;
• Recompiled with all final figures, tables, and other code outputs before release; and
• Authored, licensed, and published in accordance with the policies of the organization or publisher.

Key responsibilities for task team leaders and principal investigators
• Oversee the production of outputs, and know where to obtain legal or technical support if needed.
• Have original legal documentation available for all data.
• Understand the team’s rights and responsibilities regarding data, code, and research publication.
• Decide among potential publication locations and processes for code, data, and written materials.
• Verify that replication material runs and replicates the outputs in the written research product(s) exactly.
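
One way to check that a reproducibility package replicates the published outputs exactly is to rerun the code into a fresh folder and compare each regenerated file with the released version. The Python sketch below is illustrative only and is not part of the handbook's own tooling: the function names `file_digest` and `compare_outputs` and the two-folder layout are assumptions made for this example. It compares files byte by byte through SHA-256 digests, so it suits plain-text outputs such as LaTeX tables or CSV files.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_outputs(published_dir: Path, rerun_dir: Path) -> dict[str, str]:
    """Compare each file under published_dir with its counterpart in rerun_dir.

    Both folder arguments are hypothetical: published_dir holds the outputs
    used in the written product, rerun_dir holds outputs from a fresh run.
    Returns a mapping from relative path to "match", "differs", or "missing".
    """
    report = {}
    for published in sorted(published_dir.rglob("*")):
        if not published.is_file():
            continue  # skip directories
        rel = published.relative_to(published_dir)
        rerun = rerun_dir / rel
        if not rerun.is_file():
            report[rel.as_posix()] = "missing"
        elif file_digest(published) == file_digest(rerun):
            report[rel.as_posix()] = "match"
        else:
            report[rel.as_posix()] = "differs"
    return report
```

Any entry other than "match" signals an output to investigate before release. Formats that embed timestamps or other run-specific metadata (many PDF and image files do) may be flagged as different even when their visible content is identical, so those outputs may need a manual or format-aware comparison instead.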
DEVELOPMENT RESEARCH IN PRACTICE: THE DIME ANALYTICS DATA HANDBOOK