ICPSR Poster Portfolio


Data Stewardship on Display

Designs by Jenna Tyson


One of the best ways ICPSR communicates the vast resources it has to offer is with its impactful posters, artfully designed to inform and inspire social scientists and data stewards at events around the world. From data management to open access to data curation, we’ve got it covered.


National Archive of Data on Arts and Culture: A Resource for Researchers, Policymakers, Practitioners, and the General Public

Alison Stroud and Amy Pienta, ICPSR, University of Michigan

The National Archive of Data on Arts and Culture (NADAC) is a repository that facilitates research on arts and culture by acquiring data, particularly those funded by federal agencies and other organizations, and sharing those data with researchers, policymakers, people in the arts and culture field, and the general public.


National Archive of Data on Arts & Culture

The NADAC website provides free and useful tools that enable data users to quickly and effectively explore arts and culture data. NADAC Data Exploration Tools include the following:

67 datasets in our repository ...and counting

Data User Services

NADAC provides the following services to assist data users:
• Identification of arts and culture data on specific topics
• Online analysis and customized subsetting of selected datasets using the Survey Documentation and Analysis (SDA) statistical software written by staff at the University of California, Berkeley
• Assistance with retrieval and use of data files from NADAC
• Bibliographic citations to publications based on research data in NADAC
• Information on understanding and using research data

Data Depositor Services

NADAC provides the following services to assist data depositors:
• Technical assistance with data preparation
• Consultation on data collection to enhance the quality of the data for analysis
• Confidentiality reviews of data

Featured Data
• 35529: American Community Survey, 2008–2012 [United States]: Public Use Microdata Sample: Artist Extract
• 35478: General Social Survey, 2012 Merged Data, Including a Cultural Module [United States]
• 35168: Survey of Public Participation in the Arts (SPPA), 2012 [United States]

NADAC Data Exploration Tools for the Public

Quick Facts is a tool that provides infographics and statistics on topics including artists and performers, funding issues, public attitudes, arts education, and arts organizations.

[Infographic: ARTS & CULTURE IN AMERICAN LIFE. Percent of U.S. Adults Who: read books; dance socially; access music via mobile devices; do artistic photography/photo editing; attend plays or musicals. Source: 2012 Survey of Public Participation in the Arts]

Quick Tables allows NADAC users to generate custom crosstabulations from a pre-selected subset of the variables in the study.

The National Archive of Data on Arts & Culture provides free and easy access to data on the value and impact of the arts in individuals and communities.

Affiliations

NADAC is one of several topical archives hosted by the Inter-university Consortium for Political and Social Research (ICPSR), the largest social science data archive in the world and part of the University of Michigan's Institute for Social Research. NADAC is funded by the National Endowment for the Arts (NEA). Because of NEA's support, users obtain data from NADAC at no charge.

Simple Crosstab Builder is a tool that presents a streamlined interface for generating frequencies for one variable or for building a table of two variables.
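The two operations named above, frequencies for one variable and a table of two variables, can be sketched in a few lines of Python. This is only an illustration of the idea, not how the NADAC tools are implemented. The records, the variable names (`age_group`, `reads_books`), and the helper functions are all hypothetical.

```python
from collections import Counter

# Hypothetical respondent records; the variable names and values are
# invented for illustration and do not come from any NADAC dataset.
records = [
    {"age_group": "18-34", "reads_books": "yes"},
    {"age_group": "18-34", "reads_books": "no"},
    {"age_group": "35-64", "reads_books": "yes"},
    {"age_group": "35-64", "reads_books": "yes"},
    {"age_group": "65+", "reads_books": "no"},
]


def frequencies(rows, var):
    """Count how often each value of one variable occurs."""
    return Counter(row[var] for row in rows)


def crosstab(rows, row_var, col_var):
    """Count each (row value, column value) pair for two variables."""
    return Counter((row[row_var], row[col_var]) for row in rows)


freq = frequencies(records, "reads_books")
table = crosstab(records, "age_group", "reads_books")

# Print one line per non-empty cell of the two-variable table.
for (age, reads), n in sorted(table.items()):
    print(f"{age:>6}  {reads:<3}  {n}")
```

A `Counter` returns 0 for combinations that never occur, so empty cells need no special handling.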

Contact Us
National Archive of Data on Arts & Culture
ICPSR, University of Michigan
P.O. Box 1248, Ann Arbor, MI 48106-1248
734.647.2200 • icpsr-nadac@umich.edu

NADAC Team
Amy Pienta, Archive Director
Alison Stroud, Archive Manager

Publication Search helps NADAC users find articles or reports related to the dataset they are interested in. Users can refine their search with filters for publication type, journal, and author.

www.icpsr.umich.edu/nadac


A Common Data Platform Serving Unique Data Needs

Common Platform
ICPSR is a collection of data collections related to one another by a common data platform, common guidelines for data ingest and processing, and a common presentation of data for discovery via a shared (common), searchable data catalog. Common ingest of data aids quality control; common presentation of data aids discovery and access for the data user.

Unique Approaches
Diverse data collections require unique approaches to sharing data and documentation. ICPSR data collections retain distinctive website imagery and branding, oftentimes diverse analysis functionality and presentation of data, varied methods in accessing restricted-use data, and unique outreach strategies.

Advancing Research Now and in the Future
The goal, of course, is to maximize the return on research data investment by aiding today’s and tomorrow’s scientists with transparent and new discovery via well-documented and preserved scientific data.

[Diagram: the common data platform. Labels: Data Depositor; Data Ingest; Curation (Cleaning & Processing, Metadata); Dissemination & Preservation; Discovery (ICPSR Data Catalog search); Download; Data User; Data Platform. Logo text visible: HMCA, open.]


From Data Sharing to Data Stewardship

Questions? Contact us! 734.647.2200 or netmail@icpsr.umich.edu

Meeting Federal Requirements Now and Into the Future

About Public Data Sharing

Federal Environment
Increased emphasis on sharing scientific research: publications and digital data are to be made available and useful to the public to maximize federal investment.
• NSF, January 2011: data management plan (DMP) submission requirements in proposals; NIH has similar requirements
• OSTP, February 2013: agencies to develop strategies for public access to scientific data

Keys to Good Data Sharing: Data Stewardship
• Data discoverable, accessible, usable
> Data available today and in future
> Research respondents protected

Sustainable Public Data Sharing Models
• Fee-for-access model: subscription
• Agency model: agency or foundation funds public access
• Fee-for-deposit model: researcher writes fee into grant, pays at deposit

Sharing Restricted-use Data with the Public
• Data with disclosure risk: might identify a subject or contain sensitive personal information
• Can be securely shared when:
> Virtual Data Enclave retains data safely on server
> Process and experience exist to garner IRB approval
> Tested system, technology, data professionals, and collaboration space are in place to disseminate

ICPSR’s Support of Public Data Sharing

Public Access at ICPSR

ICPSR’s Stewardship Role

ICPSR’s Fee-for-Deposit Data Sharing Model

• Over 50 years as global leader preserving and sharing research data
• Is a center within Institute for Social Research at University of Michigan
• ICPSR supports researchers who:
> Write research articles, books, and papers
> Teach or utilize quantitative methods
> Write grant/contract proposals (require DMPs)
• Data stewardship = data curation = ICPSR’s purpose!

Data Curation
• Curation (Latin "to care") is the process used to add value to data, maximize access, and ensure long-term preservation.
• Data curation is akin to work performed by an art or museum curator.
> Data are organized, described, cleaned, enhanced, and preserved, like work done on paintings or rare books to make the works accessible.

ICPSR, Curation, & Public Data Sharing
ICPSR strongly supports public data sharing that emphasizes data curation to ensure data are:
• Discoverable: ICPSR expertly tags scientific data for easy discovery by users.
• Used: Thousands of individuals search ICPSR’s data catalog seeking data to download and analyze. The catalog is also indexed by search engines.
• Sustainable: Depositors count on ICPSR’s longevity. They needn’t worry their data will suddenly disappear due, for example, to a loss of funding.
• Securely handled: ICPSR has the infrastructure and knowledge in place to store and disseminate restricted-use data securely.

Tips for Evaluating Public-Access Data Services

ICPSR has utilized sustainable fee-for-access and agency-funded models for decades. Recently, ICPSR launched openICPSR, a fee-for-deposit, public-access data sharing service.

Consider these key questions:

openICPSR uniquely includes elements of data curation within the fee-for-deposit environment, making it the only public-access data sharing service:
• Where the deposit is reviewed by professional social and behavioral sciences data curators
• With an immediate distribution network of over 750 institutions that has powerful search tools and a search-engine-indexed catalog
• Sustained by an organization with over 50 years of data stewardship experience
• Ready to accept and disseminate sensitive and/or restricted-use data

• How will the service sustain itself? Is there a long-term funding stream?
• How will the service care for my data in the long term should the service fail? Is there a safety net?
• Can the service quickly maximize discoverability of my data? How?
• Does it have a large network of researchers and students seeking data? Will my data get used?

openICPSR — Powering Public-Access Data Sharing The service provides for:

• Does the service understand international archiving standards?

• Self Deposit: Researchers deposit data and documentation on demand. Depositors receive a DOI and data citation; a metadata review is provided shortly afterward. Cost: $600 per project, usually paid for by research funds from the grant award.

• Does it provide a DOI, data citation, and version control for updating my files?
• Does the service have proven experience securing sensitive data upon intake and when sharing?

• Professional Curation: Enables a researcher to tap all aspects of ICPSR’s curation services. Fee depends on data complexity and curation services utilized. Cost: Call for a quote, preferably as grant proposal (specifically the DMP) is being prepared.
• Institutional or Journal Branded Data Sharing: Fully hosted public-access repository stored on openICPSR servers. In addition to the openICPSR self-curation tools and professional review, the institution or journal receives basic site branding and a custom URL. Cost: Annual fee is based on storage requirements and is paid by institution or journal.

Visit openICPSR.org


About ICPSR

ICPSR is the world's largest archive of behavioral and social science research data. We advance research by acquiring, curating, and preserving original behavioral and social science data. We are a vital partner in research and instruction.

We support researchers, students, instructors, and policy makers who conduct secondary research or generate new findings, preserve and disseminate primary research data, study or teach statistical methods in quantitative analysis, write articles or theses, or develop funding proposals for grants or contracts that require a data management plan. The Inter-university Consortium for Political and Social Research (ICPSR) is part of the Institute for Social Research at the University of Michigan.


Bridging the Data Divide — Economical Repository Management in the openICPSR Cloud

FOR INSTITUTIONS AND JOURNALS

Public-Access Data Repository Needs and Preferences expressed by Institutions and Journals

openICPSR Public-Access Data Sharing from ICPSR

What is openICPSR?
openICPSR is a research data-sharing service for the social and behavioral sciences. It allows the public to access research data at no charge. openICPSR was created to help research scientists be recognized, cited, and credited, to impart confidence that their data will be protected and sustained, and to provide a sharing service for those who require restricted-use data dissemination.

What is unique about openICPSR?
openICPSR provides:
• Immediate distribution through an established network of over 760 research institutions that has powerful search tools and a data catalog indexed by major search engines
• Reliability of a trusted, sustainable organization with over 50 years’ experience storing research data
• Metadata review by professional social and behavioral science data curators
• The ability to accept and disseminate sensitive or restricted-use data in the public-access environment

How are data accessed and preserved?
openICPSR provides bit-level preservation and public access. (Curated data preservation and public access are also available and highly encouraged.)

openICPSR for Institutions and Journals

Using the features of openICPSR to meet the unique needs of institutions and journals, openICPSR for Institutions and Journals:
• Shares research data with the public to fulfill government agency grant requirements
• Brands the research data-sharing service with an entity’s logo, colors, and unique URL
• Demonstrates research transparency by making data available for replication and providing live links to publications
• Provides research scientists with DOIs and data citations upon deposit
• Increases exposure and reach of institution’s research via professionally reviewed metadata, inclusion in ICPSR’s data catalog, and integration with the institution’s social media
• Administers the data repository economically and easily without the need for additional technical staff or equipment; a dashboard of usage reports is available on demand
• Shares and preserves restricted-use (sensitive) data securely and with confidence
• Promotes trust that organization’s research data are safe for the long term, backed by over 50 years of sustained experience from the world’s largest archive of social and behavioral science research data

How does openICPSR accept & disseminate restricted-use data?
openICPSR accepts restricted-use data for deposit using the same deposit process as for public-use data. Restricted-use data are disseminated via ICPSR’s Virtual Data Enclave.

What do institutions and journals need in a repository?

In May 2014, the openICPSR development team conducted executive interviews with representatives of universities and journals regarding their needs in a public-access data repository. Here is how they described their desired repository:
• Fully hosted repository stored on external servers (the cloud), requiring no technical or infrastructure systems to build or maintain
• Basic site branding and a unique URL
• Immediate DOI and data citation
• Basic usage statistics and reporting on deposits, views, shares, downloads
• Professional metadata review by expert staff to maximize discovery of data and usage
• Preservation and public access for the long term
• Inclusion in a highly used data catalog to increase exposure
• Compliance with federal requirements for public access to scientific data
• Capability (infrastructure) to accept and disseminate restricted-use data

www.openicpsr.org


Data Sharing Redesigned

www.openICPSR.org

[Screenshot: openICPSR home page (beta)]

Share your social and behavioral science research data

Three easy steps:
1. Name your project
2. Upload and describe files
3. Purchase a curation option

Maximize Access: Be recognized and cited
Store Safely: Store your data with confidence
Protect Confidentiality: Ensure confidentiality and privacy

© Inter-university Consortium for Political and Social Research, 2014

The Challenge Need for Data

The Solution

The Technology

New Platform for Publishing Data

Modern Technologies The site is built using Groovy on Grails.

openICPSR is a new product that allows instant publishing of research data via a self-deposit website. This move is radical for social science researchers; it’s not so radical for IT professionals.


Students, instructors, and researchers need access to a wide variety of datasets in order to learn statistics and good research methodology.

New Federal Guidelines Research funded by federal agencies must (often) be made available to the general public for free, per federal open data guidelines.

Powerful Frameworks The site uses the Twitter Bootstrap framework to ensure consistent display on different devices and to make it easy for developers. It also uses the Bootstrap Editable library to make data entry easier on depositors.

New Competitors There are new competitors in the marketplace that offer instant publishing of research data for free. Their efforts distract customers and threaten to divert funding to projects more focused on quantity, rather than quality.

Reduced Resources With limited funding, ICPSR needs to focus its resources on high-value data and find a way to minimize costs for niche data, while staying true to its preservation and dissemination mission.


Long-Term Preservation

Leveraging U-M Relationships

Data are preserved and disseminated for 10 years.

The site and its contents are hosted using Amazon Web Services, making the site very portable and stable. When U-M experienced the recent VPN firewall issues, openICPSR was completely unaffected.

Easy Citations Create Digital Object Identifiers (DOIs) and citations so that data downloaders can easily cite the survey data.

Broad Audience Published projects included in ICPSR site, which serves over 700 universities and research institutions worldwide.

Institutional Products Institutional/Journal Repositories created; openICPSR becomes a branded data management solution for other universities.

www.openicpsr.org/repoEntity/list


Browse Data This is a listing of all openICPSR collection titles. To do a keyword search, as well as to search the main ICPSR holdings of 7,000+ studies, go to the ICPSR website.

Study Title/Investigator

1. 2009 Federal Stimulus Package Certification Study
   Miller, Edward A.; Blanding, David
2. Bay Area Race and Politics Survey 1986
   Sniderman, Paul M.; Piazza, Thomas
3. California Work Force Survey 2001-2002
   Institute for Labor and Unemployment at the University of California; Survey Research Center, at the University of California, Berkeley
4. Chicago African American Survey 1997
   Sniderman, Paul; Piazza, Thomas
5. China Rice Theory Data
   Talhelm, Thomas
6. Circulation of US Daily Newspapers, 1924, Audit Bureau of Circulations.
   Gentzkow, Matthew; Shapiro, Jesse; Sinkinson, Michael
7. College Application Dataset: 2014 [United States]
   Hanek, Kathrin; Garcia, Stephen M.; Tor, Avishalom
8. Dutch Prejudice Survey 1998
   Hagendoorn, Louk; Sniderman, Paul; Piazza, Thomas; Nekuee, Shervin
9. Evaluating Impact of the 2009 American Recovery and Reinvestment Act on Social Science
   Pienta, Amy

Questions? Contact us at web-support@icpsr.umich.edu


High Value, High Risk: Options for Restricted Data Dissemination at ICPSR Johanna Davidson Bleckman, University of Michigan

The vast majority of ICPSR data holdings are public-use files with no restrictions on access. In some cases, however, ICPSR provides access — for approved researchers — to restricted-use data versions that retain confidential or sensitive data. We do this by imposing strict legal and electronic requirements for access off-site or on-site in a physical data enclave. ICPSR offers three main methods of restricted-use data access:

1. Secure, Encrypted Download
Data are encrypted and delivered via secure download to the researcher. When the request is approved, the Investigator will receive a temporary link and password to download files; alternatively, in some instances, data will be sent to researchers on encrypted data CDs through the postal service, signature required. Data must be destroyed at the completion of the project.

2. Physical Enclave
The researcher must be present in person at ICPSR to analyze data. All results are reviewed for disclosure before release to the researcher.

3. ICPSR’s Newest Secure Dissemination Vehicle: the Virtual Data Enclave (VDE)

What is the VDE?
The VDE is a virtual machine launched from the researcher's own desktop but operating on a remote server, similar to remotely logging into another physical computer.

How does it work?
The virtual machine is isolated from the user's physical desktop computer, restricting the user from downloading files or parts of files to their physical computer. The virtual machine is also restricted in its external access, preventing users from emailing, copying, or otherwise moving files outside of the secure environment, either accidentally or intentionally. Once the analytic work is complete, researchers submit a request for their output to be vetted. ICPSR staff review the output, ensuring established confidentiality standards are met. Once the output has been vetted, the researcher is free to share, publish, and present their results.

What are the benefits to researchers?
There is growing interest in establishing innovative ways of sharing research data in its original form while minimizing disclosure risk and honoring confidentiality assurances given to research subjects. The VDE offers full access to disclosive and sensitive restricted data with reduced need to coarsen, truncate, or otherwise alter the data, therefore increasing the analytic value of the data and maximizing the impact of the initial data collection investment.

What is the user experience like?
The VDE operates just like a standard Windows desktop and includes many of the most popular statistical analysis packages. All work with the data files is done within the VDE. The data files and software never leave the ICPSR servers. Software packages available in the VDE include:
• ArcGIS • Emacs • Adobe Acrobat • Textpad • Microsoft Office • WinZip • SPSS • R • SAS • HLM • Stata • MPlus • StatTransfer • WinBugs • SUDAAN (SAS-callable)


Local ICPSR Data Curation Workshop Pilot Project

Linda Detterman, Jennifer Doty, Jared Lyle, Amy Pienta, Lizzy Rolando, and Mandy Swygart-Hobaugh

Overview

Researchers are now increasingly encouraged or required to share and archive their data, yet training in good data practices is still lacking. In a 2009-10 survey of data sharing by scientists (Tenopir et al., 2011), nearly two-thirds (59 percent) of respondents reported that their organization or project does not provide training on best practices for data management. Only one-third (35 percent) of respondents said they “are provided with the necessary tools and technical support for long-term data management.”

Libraries are well-positioned to help researchers fulfill data policies and possess the skill sets, longevity, and infrastructure needed to manage, disseminate, preserve, and track usage of data (Heidorn, 2011). Yet, they, too, indicate a desire to train their own staff since many librarians enter the profession with minimal or no data experience. A recent analysis of iSchool and LIS programs, for instance, found less than a quarter offer a course focused on research data management and curation (Creamer et al., 2012).

Domain repositories have long-term expertise in data management and curation, and they are increasingly interested in connecting with and training their user communities, although they have limited opportunities to meet researchers locally.

References

Heidorn, P. Bryan (2011). "The Emerging Role of Libraries in Data Curation and E-science." Journal of Library Administration 51(7-8): 662-672. http://doi.org/10.1080/01930826.2011.601269

Creamer, Andrew T., Morales, Myrna E., Kafel, Donna, Crespo, Javier, and Martin, Elaine R. (2013). "Sample of Research Data Curation and Management Courses." Journal of eScience Librarianship 1.2. http://escholarship.umassmed.edu/cgi/viewcontent.cgi?article=1016&context=jeslib

Objectives

This pilot project teamed a social and behavioral science domain repository, ICPSR, with three local universities, Emory University, Georgia Tech, and Georgia State, to offer two day-long data curation trainings: one for faculty, graduate students, and research staff, and another day for librarians and library staff.

The goals:
• Raise awareness of funder requirements and journal policies to preserve and share data, and resources available to help.
• Educate both researchers and librarians in best practices for documenting, preparing, and curating data for long-term preservation and sharing.
• Provide guidance and support to researchers depositing their data with appropriate domain repositories (e.g., ICPSR, Dryad).
• Offer an opportunity to reach the researchers where they reside.

Agenda

Workshop content was patterned on the ICPSR Summer Program workshop, “Curating and Managing Research Data for Re-use.” Presentations were followed by hands-on exercises and discussion. Topics covered included:
• Identifying and Finding Data to Archive
• Reviewing Data
• Reviewing Data for Confidentiality Issues
• Cleaning Data
• Describing Data
• Depositing Data
• Disseminating and Publishing Data
• Local curation resources

Feedback

Positive:
• “This was a great workshop and I'm glad that I had the opportunity to attend. It made me want to learn more and provided me with great resources that I can return to and explore.”
• “Got both a broad yet detailed enough view of what ICPSR is, chances to ask my project specific questions, and helpful hands-on sessions.”
• “I learned a lot, and the topics were varied enough to give an overview, but not so in-depth as to be overwhelming.”

Suggestions:
• “I believe I misunderstood what the workshop was about. The description was perhaps too broad.”
• “Some workshops were hard to complete within a given time range. Workshop should be easier to comprehend (what we have to do) and more focused (e.g., fewer tasks/questions).”

Next Steps

For ICPSR:
• Revise agenda to vary the approach between researchers and librarians.
• Condense into one day with two distinct sessions:
> Researchers: emphasize data management planning, best practices in preparing data, and how to deposit data.
> Librarians: focus on curation topics and hands-on experiences.
• Plan for additional offerings in other locations.
• Make materials available for anyone to reuse and remix.

For local institutions:
• Identify related training to offer locally.
• Adopt methods to support researchers preparing data for archiving and sharing.
• Explore additional opportunities to partner with domain data archives.


Data on Children and Youth Available from NAHDAP for Possible Harmonization

National Addiction & HIV Data Archive Program

Amy Pienta and Robert Choate, ICPSR, University of Michigan

Data about Children and Youth Available for Possible Harmonization

Public-Use

NAHDAP’s (National Addiction and HIV Data Archive Program) mission is to facilitate research on drug addiction and HIV infection by acquiring, enhancing, preserving, and sharing data produced by research grants, particularly those funded by the National Institute on Drug Abuse. NAHDAP staff assist researchers using secondary data by providing technical assistance and specialized training for data depositors and data users. NAHDAP was created in 2009 as a topical archive within the Inter-university Consortium for Political and Social Research. ICPSR is a major center within the University of Michigan’s Institute for Social Research. NAHDAP is funded by the National Institute on Drug Abuse.

Compare Variables Compare (3)

• Boys Town Study of Youth Development • Drug Use Among Young American Indians: Epidemiology and Prediction, 2001–2006 and 2009–2013

Health Behavior in School-Aged Children (HBSC), 20092010: Student Survey

EVER HAD MARIJUANA: IN THE LAST 12 MONTHS Question

Accessing NAHDAP’s Restricted Data requires: • Online application completed (affiliation and credentials, members of research team specified, analysis plan) • Approved data safety and monitoring plan • IRB approval (or exemption) from the applicant’s institution • Restricted-data-use agreement signed by the University of Michigan and the researcher’s institution • Compliance with NAHDAP’s terms of access in the data use agreement NAHDAP provides a tutorial on the web site on how to complete the online application to request restricted-use data.

þ Q20

þ V7113

Drug Use Among Young American Indians: Epidemiology and Prediction: 2001-2006 and 2009-2013

Monitoring the Future: A Continuing Study of American Youth (8th- and 10th-Grade Surveys), 2012

Youth were selected for potential enrollment after a review of court files in each locale revealed that they had been adjudicated (found guilty) of a serious offense. Eligible crimes included all felony offenses with the exception of less serious property crimes, as well as misdemeanor weapons offenses and misdemeanor sexual assault.

2012 A01b #XMJ+HS/LAST12MO F1234

Last 12 months: Frequency used marijuana Question

# In the last 12 months

How often in the last 12 months have you used marijuana?

Item number: 00270

All females who met the age and adjudicated crime requirements, or any youth whose case was being considered for trial in the adult court system, were eligible for enrollment regardless if the charged crime was a drug offense.

#1234

On how many occasions (if any) have you used marijuana (weed, pot) or hashish (hash) during the last 12 months?

Time Method: Longitudinal: Panel Mode of Data Collection: computer-assisted personal interview (CAPI)

Value

Label

Unweighted Frequencies

%

83.5

1

None

10,499

63.3

3.5

2

Value

Label

Unweighted Frequencies

%

1

Never

10,553

2

Once or twice

442

Value 1

Label 0 OCCAS:(1)

Unweighted Frequencies

%

24,111

77.5

1-2 times

1,881

11.3

2

1-2X:(2)

1,918

3

3-5 times

195

1.5

3

3-9 times

1,300

7.8

3

3-5X:(3)

1,014

4

6-9 times

172

1.4

4

10-19 times

762

4.6

4

6-9X:(4)

714

2.3

5

10-19 times

115

0.9

5

20-49 times

643

3.9

5

10-19X:(5)

767

2.5

20-39 times

112

0.9

7

40 times or more

223

1.8

-9

6

Missing

830

6.6

12,642

100%

Missing Data

Total

Response Rates: During the enrollment period (November 2000 to January 2003) 10,461 individuals who met the age and petitioned charge criteria were processed in the court systems in Philadelphia and Phoenix. In 5,382 of the these cases (51 percent) the youth was found not guilty or had the charges reduced below a felony-level offense at adjudication. Another 1,272 cases were dropped (12 percent) from consideration because the court data were insufficient to determine the person's eligibility status at adjudication.

Responses

Responses

Responses

Maternal Lifestyle Study in Four Sites in the US, 1993–2011 National Survey of Parents and Youth (NSPY), 1998–2004 Northwestern Juvenile Project (Cook County, IL), 1995–1998 Oregon Youth Substance Use Project (OYSUP), 1998–2010 Research on Pathways to Desistance Series — (Official Arrests; Release Measures; Calendar Data) Strategic Prevention Framework State Incentive Grant (SPF-SIG) National Cross-Site Evaluation

6

50+ times

-9

Missing

-5

Not asked on this form

1,333

8.0

151

0.9

0

0.0

Missing Data

6

20-39X:(6)

7

40+X:(7)

-9

MISSING:(-9)

572

This resulted in 2,008 youths who were approached for inclusion into the study. Of those youths who were approached 1,354 consented and participated (67 percent).

3.3

Over the course of the 7-year follow-up period, there were 864 respondents (63.8 percent) were located and interviewed for 10 of 10 possible interviews. An additional 309 youths (22.8 percent) were located and interviewed for 8 or 9 out of 10 possible interviews. Conversely, there were 17 (1.3 percent) respondents who didn't participate in any additional surveys and another 22 (1.6 percent) who only were located and and interviewed for just 1 or 2 follow-up of the 10 possible follow-up interviews. These numbers do not adjust for 91 participants who either died (n=48) or refused continued participation (n=43) of the study over the course of the 7-year follow-up period.


Overall, the study achieved an average response rate of 89.5 percent across the follow-up interviews.


Variable comparison tool allows you to see how questions are asked across multiple studies.

Each study home page contains a detailed description of the study including summary, citation, subject terms, collection notes, and methodology. Comparison of this metadata will help guide data harmonization.

A Closer Look at Two NAHDAP Data Series

Online Analysis Tools

NAHDAP uses the Survey Documentation and Analysis (SDA) system to enable users to analyze the data online without having to download them into a statistical software package. Another option that runs through SDA is Quick Tables, which enables users to quickly produce cross-tabulations of high-value target variables. Quick Tables are currently available only for the HBSC series and the Drug Use Among Young American Indian study. They will soon be available for MTF as well. NAHDAP will also soon start displaying MTF data via interactive maps, where users will be able to view drug use by census region.

[Screenshot: SDA Frequencies/Crosstabulation Program setup for the Athletic Involvement Study (of Students in a Northeastern University in the United States), 2006. Row variable: DAI01 (past 12 months marijuana); column variable: RACE (student's race); control variable: SEX (student's gender); weight: none; percentaging: column; chart type: stacked bar chart; color coding enabled. Output shown for SEX = 0 (Male).]

The SDA system allows users to analyze data online. Users can also create or recode variables for use. The output is customizable and can be exported into other documents.

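Crosstabulation of DAI01 (past 12 months marijuana) by RACE, for SEX = 0 (Male). Cells contain column percent, with N of cases in parentheses.

| DAI01 | 2: Asian American or Pacific Islander | 3: Black or African American | 4: White or Caucasian | 5: Mixed Race | 6: American Indian/Native American/Other | Row total |
|---|---|---|---|---|---|---|
| 0: 0 times | 12.8 (6) | 16.7 (7) | 15.6 (42) | 5.6 (1) | 10.3 (4) | 14.5 (60) |
| 1: 1-2 times | 14.9 (7) | 11.9 (5) | 13.0 (35) | 11.1 (2) | 15.4 (6) | 13.3 (55) |
| 2: 3-11 times | 10.6 (5) | 14.3 (6) | 13.4 (36) | 11.1 (2) | 12.8 (5) | 13.0 (54) |
| 3: 12-50 times | 0.0 (0) | 4.8 (2) | 12.3 (33) | 16.7 (3) | 7.7 (3) | 9.9 (41) |
| 4: 51+ times | 6.4 (3) | 7.1 (3) | 20.8 (56) | 11.1 (2) | 10.3 (4) | 16.4 (68) |
| 8: Never in my life | 55.3 (26) | 45.2 (19) | 24.9 (67) | 44.4 (8) | 43.6 (17) | 33.0 (137) |
| Col total | 100.0 (47) | 100.0 (42) | 100.0 (269) | 100.0 (18) | 100.0 (39) | 100.0 (415) |

Color coding (Z): < -2.0, < -1.0, < 0.0 (N in each cell smaller than expected); > 0.0, > 1.0, > 2.0 (N in each cell larger than expected).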
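The column percentages in this SDA output are each cell's count divided by its column total. A minimal sketch of that calculation, using the White or Caucasian column counts from the crosstabulation above (the function and variable names are illustrative, not part of SDA):

```python
# Cell counts for the "White or Caucasian" column, as shown in the SDA output.
white_counts = {
    "0 times": 42,
    "1-2 times": 35,
    "3-11 times": 36,
    "12-50 times": 33,
    "51+ times": 56,
    "Never in my life": 67,
}


def column_percents(counts):
    """Return each category's share of the column total, to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


white_percents = column_percents(white_counts)
for label, share in white_percents.items():
    print(f"{label}: {share}")
```

The same function applies to any other column of the table; only the counts change.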

• Annual nationally representative survey conducted by the University of Michigan and funded by the National Institute on Drug Abuse
• 12th grade data since 1976; 8th–10th grade data since 1991
• Study provides systematic and accurate description of trends occurring over time
• Topics routinely asked about include personal substance use, attitudes towards substance use, relationship with peers and adults, educational aspirations and experiences, and many more
• New features coming soon: interactive maps and quick tables

Pathways to Desistance Series

Harmonization potential of NAHDAP data can be explored without necessarily downloading data (or applying for restricted data) by:

Of the remaining 3,807 eligible cases, 1,799 (47 percent) were excluded from consideration because of potential case overload of the local interviewer or because the 15 percent threshold of drug offenders was close to being breached.


Monitoring the Future Series


Many NAHDAP data sets share, at least partly, methodological approaches (sampling frame, data collection methods) and content and are possible candidates for harmonization and/or integrated data analysis.

Drug offenses constitute a large proportion of all offenses committed by youth, and males comprise the vast majority of youth who are charged with drug offenses. The study therefore capped the proportion of males with drug offenses at 15 percent of the sample at each site.

Question

Have you ever taken marijuana (pot, weed, hashish, joint)?

Weight: none

Accessing Public-use vs. Restricted-use Data

NAHDAP data are freely available for all users and, when possible, data are available for download from the NAHDAP web site to maximize public accessibility. However, it is not always possible to minimize disclosure risk to respondents by removing all direct (name, government ID) and indirect (e.g., detailed demographic information, geography) identifying information. In these instances, access to data is restricted.

NAHDAP has a wide range of data available to support secondary analysis of issues related to child and youth development across a broad range of outcomes.

Sample: Six potential cities/counties were investigated for selection before Phoenix and Philadelphia were finalized. These two areas were selected because they offered (a) high enough rates of serious crime committed by juveniles; (b) a diverse racial/ethnic mix of potential participants; (c) a sizable enough number of female offenders; (d) a contrast in the way the systems operate; (e) political support for the study and cooperation from the practitioners in the juvenile and criminal justice systems; and (f) the presence of experienced research collaborators to oversee the data collection.

Restricted

• Athletic Involvement Study (of Students in a Northeastern University in the United States), 2006
• Flint [Michigan] Adolescent Study (FAS): A Longitudinal Study of School Dropout and Substance Use, 1994–1997
• Health Behavior in School-Aged Children (HBSC) series (5 waves)
• Monitoring the Future (MTF) series (39 waves of 12th grade; 23 waves of 8th–10th grade)
• PACARDO: Data on Drug Use and Behavior in School-Aged Children and Teenagers in Panama, Central America, and the Dominican Republic, 1999–2000
• Research on Pathways to Desistance Series (Subject Measures, Collateral Measures)
• Strengthening Washington DC Families (SWFP) Project, 1998–2004

What is NAHDAP?

Ways to Use the NAHDAP Website to Find Data to Harmonize

• Investigating and comparing the study methodology of each data set (available on the “data home page”)
• Searching for variables across datasets and comparing how questions were worded
• Using our online analysis facility to investigate the distribution of particular variables across datasets

MARIJUANA USE & EDUCATIONAL OUTCOMES

Most marijuana use begins in adolescence.

Each day, 3,287 teens use marijuana for the first time.¹

78% of the 2.4 million people who began using in the last year were aged 12 to 20.¹

As perception of harm decreases, teen marijuana use increases.

Studies show that marijuana interferes with attention, motivation, memory, and learning.

Students who use marijuana regularly tend to get lower grades and are more likely to drop out of high school than those who don’t use. Those who use it regularly may be functioning at a reduced intellectual level most or all of the time.

¹ Data from the National Survey on Drug Use and Health (NSDUH) 2013 and Monitoring the Future (MTF) 2013.

For more information: www.icpsr.umich.edu/nahdap

• A longitudinal study conducted by the University of Pittsburgh following 1,354 youth located in either Phoenix, AZ, or Philadelphia, PA, between 2000 and 2010
• Sampled youth were found guilty of at least one serious offense between the ages of 14 and 18
• Youth were followed for seven years after enrollment into the study
• Data series contains five studies:
  > Subject Measures (public): Baseline interview with the youth and 10 waves of follow-up (every six months for the first three years, and then yearly).
  > Collateral Measures (public): Baseline interview with someone close to the youth (friend, parent, sibling) and three waves of follow-up (yearly for the first three years).
  > Official Arrest Records (restricted): Administrative data taken from court and FBI records for up to 15 petitions prior to the study and up to 24 re-arrests after the baseline interview.
  > Release Measures (restricted): Interview attempted each time a youth was released from an institutional stay. The goal was to assess the youth’s perceptions of the environment within the institution.
  > Calendar Data (restricted): Administrative records spanning the entire data collection period. Data are broken out into 13 topical domains by 3 to 4 reference time periods.

Phone: 734.647.2200
Email: nahdap@icpsr.umich.edu
Facebook: www.facebook.com/NAHDAP
Twitter: twitter.com/NAHDAP1

NAHDAP is funded by the National Institute on Drug Abuse. NAHDAP is hosted by ICPSR and is part of the University of Michigan's Institute for Social Research.


Cross National Comparison of Underestimation of Chronic Conditions in Surveys of Older Adults in the Developing World

Introduction

The rapid demographic changes of the 1930s–1960s improved life expectancy by reducing infant and child mortality and as a result may have produced a cohort of survivors who are more susceptible to the effects of poor early life conditions as they age. Recent results using extensive cross national data show an association between poor early life nutritional environment and older adult diabetes in the developing world which suggests that this conjecture may have merit (McEniry, 2014). Being born in a low caloric intake developing country that experienced rapid demographic changes in the 1930s–1940s was associated with a higher prevalence of adult diabetes (Figure 1).

Methods

Data

Figure 1: Caloric intake in early life and older adult diabetes

[Scatter plot. Vertical axis: age-standardized prevalence (%), 0 to 30. Horizontal axis: calories, 2000 to 3500. Points are labeled by group letter and country or study: A. Netherlands, A. UK, A. US-HRS, A. US-WLS; B. Argentina, B. Cuba, B. Uruguay; C. Chile, C. Costa Rica, C. Puerto Rico, C. South Africa, C. Taiwan; D. Brazil, D. Mexico, D. Mexico-MHAS, D. Mexico-SAGE, D. Russia; E. Bangladesh, E. China-CHNS, E. China-CLHLS, E. China-SAGE, E. India, E. Indonesia.]

Note: Best-fit equation: diabetes = -0.006011(calories) + 31.00; R-squared = 0.40.
Note: China-CHNS, China-CLHLS, China-SAGE, India, and Indonesia omitted from best-fit regression.

Sources: Age-standardized prevalence of diabetes based on RELATE (2013) for those born during the late 1920s and early 1940s. Daily caloric intake per capita based on Food and Agriculture Organization of the United Nations, 1946.
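To make the Figure 1 best-fit equation concrete, the sketch below evaluates it at a few caloric intake levels. The slope and intercept come from the figure note; the function name and the chosen input values are illustrative assumptions, and the line only summarizes the plotted countries (it is not a prediction for any individual country).

```python
# Coefficients from the Figure 1 note: diabetes = -0.006011(calories) + 31.00
SLOPE = -0.006011
INTERCEPT = 31.00


def fitted_prevalence(calories_per_day):
    """Fitted age-standardized diabetes prevalence (%) on the best-fit line."""
    return SLOPE * calories_per_day + INTERCEPT


# Illustrative inputs spanning the horizontal axis of the figure.
for calories in (2000, 2500, 3000, 3500):
    print(f"{calories} calories -> {fitted_prevalence(calories):.1f} percent")
```

The negative slope reflects the poster's point: lower early-life caloric intake is associated with higher older adult diabetes prevalence among the countries included in the regression.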

These results, however, are based on self-reported diabetes. Surveys of older adult health have, until recently, relied almost entirely on self-reported adult health questions to assess health status. Some studies show that, under certain conditions, there are few notable differences between self-reported health and health measured using biomarkers (Banks et al., 2006; Brenes, 2008; Goldman, Lin, Weinstein, & Lin, 2003). Even so, self-reported health questions from surveys of older adults can be problematic. They assume that people have a good understanding of their own health, either because they visit doctors on a regular basis or because they have access to good quality health care. Information obtained from surveys asking whether a medical doctor has ever diagnosed the respondent with a particular health condition may reflect respondents who infrequently go to a doctor or live in an area with restricted quality health care. Cultural idiosyncrasies could also lead respondents to understand and answer health questions differently.

A preliminary analysis of recent cross national data in low, middle and high income countries suggests that, to a certain degree, self-reported health questions have validity (McEniry, 2014). Strong associations appeared between self-reported heart disease and diabetes and conditions or behaviors that have been shown to be associated with these conditions: hypertension, stroke, poor self-reported health, and mortality. However, the prevalence of heart disease clearly increased in some developing countries when using well-validated symptom questions for angina in conjunction with self-reports (Rose, 1962), suggesting that underestimation may be problematic. Nevertheless, logistic regression models showed that although the magnitude of the associations differed in some cases, the direction of the associations was generally consistent whether modeling was based on self-reported heart disease or on a dependent variable based on symptom data (Table 1; McEniry, 2014).

More investigation is warranted using biomarkers for heart disease and diabetes. The purpose of this presentation is to further examine biomarkers in relation to adult diabetes.

Mary McEniry, University of Michigan Jacob McDermott, University of Michigan Institute for Social Research

Survey data come from a newly compiled cross national data set of low, middle and high income countries, RELATE (Research on Early Life and Aging Trends and Effects, 2013). The data are drawn from comprehensive and representative surveys of older adults or household surveys at either the national, regional or major city level. From Latin America there are the Mexican Health and Aging Study (MHAS, first wave, n=13,463), Puerto Rican Elderly: Health Conditions (PREHCO, first wave, n=4,291), Study of Aging Survey on Health and Well Being of Elders (SABE, n=10,597) and Costa Rican Study of Longevity and Healthy Aging (CRELES, first wave, n=2,827). From Asia there are the China Health and Nutrition Study (CHNS, n=6,452), Chinese Longitudinal Healthy Longevity Survey (CLHLS, n=16,064), WHO Study on Global Ageing and Adult Health Study in China (WHO-SAGE, n=12,284), Indonesia Family Life Survey (IFLS, wave 2000, n=13,260), the Bangladesh Matlab Health and Socio-Economic Survey (MHSS, n=3,721), WHO Study on Global Ageing and Adult Health Study in India (WHO-SAGE, first wave, n=6,559) and Social Environment and Biomarkers of Aging Study in Taiwan (SEBAS, n=1,023). From Africa there are the WHO Study on Global Ageing and Adult Health Survey in Ghana (WHO-SAGE, n=4,302) and South Africa (WHO-SAGE, first wave, n=3,830). From the developed world there are the Health and Retirement Study (HRS, wave 2000, n=12,527), Wisconsin Longitudinal Study (WLS, wave 2004, n=10,317), English Longitudinal Study of Ageing (ELSA, second wave, n=8,780) and Survey of Health, Ageing and Retirement-Netherlands (SHARE-Netherlands, first wave, n=2,979).

Measures

Adult health: Elderly adult health was defined by dichotomous variables using self-reported diabetes. The self-reports were based on questions asked of the respondent about whether a doctor had ever diagnosed them with diabetes. Using the biomarker for glycated hemoglobin, HbA1c, and using the definition of Yan et al. (2012) we created another variable to define high risk of diabetes: Not at risk (HbA1c <5.7%); at risk-impaired glucose control (HbA1c>=5.7% and <6.5%); high risk (HbA1c>6.5% or taking diabetes medication). Obesity was calculated using body mass index (BMI) based on height and weight measurements (BMI greater than or equal to 30). A harmonized measure of difficulties with activities of daily living (ADLs) and poor self-reported health were also used as adult health outcomes (McEniry, 2014).

Predictor variables: All statistical models controlled for age, gender, years of education, adult low height, and smoking. Smoking was defined according to whether a respondent ever smoked, smoked in the past or currently smokes based on self-reports.

Sample selection

We selected surveys from the RELATE data which have biomarkers collected through blood samples by which to ascertain the risk of diabetes. These biomarkers were obtained through blood samples based on overnight fasting and are publicly available. The selected surveys also have panel data: CRELES, CHNS, HRS, ELSA, and SEBAS. The CRELES study collected an array of biomarkers (Brenes, 2008). The CHNS study has recently released biomarker data on fasting blood measures collected in 2009, which include measures for heart disease and diabetes (Yan et al., 2012). The HRS collected biomarkers in 2006 and 2008 in a random subsample with a follow-up in 2010 and 2012 and we analyzed available data from 2006 and 2008. ELSA collected biomarkers in 2004 and 2008. We used biomarker data from the Taiwan SEBAS study of 2000.

Analysis

Biomarkers such as glucose and hemoglobin, cholesterol, C-reactive protein (CRP) in addition to measured blood pressure and BMI provide an indication of risk for chronic conditions such as diabetes and heart disease (Crimmins, Kim, & Vasunilashorn, 2010; Yan et al., 2012). We used the biomarker data to develop multiple measures to assess biological risk for heart disease and diabetes using cut off points reported in the literature (Yan et al., 2012). For the purposes of this presentation we present the results for diabetes only.

The first step was to determine the degree to which misclassification occurs and in particular the degree to which people who report no condition actually have the condition. We gauged the degree to which underestimation was occurring by comparing self-reports with measured risk and determined differences among respondents. The second step was to estimate multivariate models using both self-reported diabetes and diabetes based on biomarkers. These multivariate models estimate the likelihood of adult diabetes as a function of poor early life conditions (low height), adult SES (education), adult lifestyle (smoking, caloric intake, obesity) and health (functionality, poor self-reported health).

Results

Comparing prevalence of diabetes using self-reports and biomarkers (Table 2) shows that in most countries underestimation at a population level of older adults does not appear to be problematic, with the exception of China. However, even though China is of most concern, its higher prevalence still does not reach the much higher prevalence of diabetes in the Latin American region shown in Figure 1 for countries that experienced rapid demographic changes in the early to mid 20th century.

A series of logistic regression models predicting adult diabetes with biomarkers and self-reports reconfirms the idea from earlier results that even with gross underestimation of chronic conditions, the direction of the association in models may not change but the magnitude of the association may change (Table 3). The results suggest that in some circumstances broad inferences can be made about the determinants of older adult health even in the face of underestimation of chronic conditions such as diabetes.

Table 1: Comparison between self-reports, symptom and biomarker data

Panel A: Heart Disease in SAGE

| | Self-reports | Self-reports and Symptoms |
|---|---|---|
| Age | 1.02*** | 1.01** |
| Female | 1.36*** | 1.38*** |
| Education (years) | 1.06*** | 1.01 |
| Obesity | 1.21* | 1.19* |
| Functionality | 1.32*** | 1.72*** |
| Diabetes | 1.91*** | 1.46*** |
| Poor health | 2.56*** | 2.22*** |
| Ever smoke | 1.02 | 1.00 |
| Exercises | 0.89 | 0.99 |
| China | 3.93*** | 1.22* |
| Ghana (reference) | 1.00 | 1.00 |
| India | 1.61*** | 1.69*** |
| Mexico | 0.44*** | 0.64*** |
| Russian Federation | 13.97*** | 6.57*** |
| South Africa | 1.15 | 0.51*** |
| Log likelihood | -3886 | -5436 |
| Total observations | 12235 | 12235 |

Panel B: Diabetes in Costa Rica

| | Self-reports | Glucose | Hemoglobin |
|---|---|---|---|
| Age | 0.97*** | 0.98*** | 0.98*** |
| Female | 1.25 | 1.38** | 1.25 |
| Education (years) | 0.99 | 1.01 | 1.00 |
| Obesity | 2.16*** | 2.18*** | 2.46*** |
| Functionality | 1.37** | 1.30* | 1.44*** |
| Poor health | 1.70*** | 1.37** | 1.57*** |
| Ever smoke | 0.85 | 1.01 | 0.87 |
| Exercises | 0.70* | 0.75* | 0.73* |
| Log likelihood | -1005 | -1206 | -1084 |
| Total observations | 2197 | 2197 | 2197 |

Source: RELATE (2013); SAGE and CRELES surveys. * p<0.05 **p<0.01 *** p<0.001

Table 2: Prevalence of diabetes using self-reported questions & using biomarker

| Study (year) | Self-report | Biomarker |
|---|---|---|
| High caloric intake | | |
| England-ELSA (2004) | 8 | 9 |
| US-HRS (2006) | 22 | 24 |
| US-HRS (2008) | 25 | 24 |
| Mid caloric intake | | |
| Costa Rica-CRELES (2003) | 23 | 26 |
| Taiwan-SEBAS (2000) | 14 | 16 |
| Low caloric intake | | |
| China-CHNS (2009) | 5 | 15 |

Source: RELATE (2013), those born late 1920s–early 1940s. Notes: Using the biomarker for glycated hemoglobin, HbA1c, and using the definition of Yan et al. (2012) to define high risk of diabetes: Not at risk (HbA1c <5.7%); at risk-impaired glucose control (HbA1c>=5.7% and <6.5%); High risk (HbA1c>6.5% or taking diabetes medication). Prevalence is shown as a percentage. These numbers are not directly comparable with Figure 1 because in some cases the biomarkers and self-reports were collected at a different time period: HRS 2006, 2008 versus the original data HRS 2000; CHNS 2009 versus original data 2000. The studies are grouped according to country-level caloric intake in the 1930s.

Discussion

Examining conjectures regarding early life conditions and older adult health is dependent on surveys of older adult health which often rely on self-reported measures. Further analysis of self-reported diabetes and diabetes using biomarkers using extensive cross national data suggests that while underestimation may be problematic in some circumstances, broad inferences can be made. When biomarker data become available for the SAGE countries that are part of RELATE we will be able to further examine the merit of the conjecture regarding the cohorts born during the rapid demographic changes of the 1930s–1960s.
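The HbA1c risk categories and the obesity cutoff described in the Measures text and the Table 2 notes can be sketched as a small classifier. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and because the poster gives "at risk" as below 6.5% and "high risk" as above 6.5%, a value of exactly 6.5% is assumed here to count as high risk.

```python
def classify_diabetes_risk(hba1c_percent, takes_diabetes_medication=False):
    """Assign the three HbA1c-based risk categories described on the poster.

    Assumption: HbA1c of exactly 6.5% is treated as high risk, since the
    poster leaves that boundary value unassigned.
    """
    if takes_diabetes_medication or hba1c_percent >= 6.5:
        return "high risk"
    if hba1c_percent >= 5.7:
        return "at risk (impaired glucose control)"
    return "not at risk"


def is_obese(bmi):
    """Obesity as defined on the poster: BMI greater than or equal to 30."""
    return bmi >= 30


# Illustrative respondents (made-up values, not study data).
print(classify_diabetes_risk(5.2))
print(classify_diabetes_risk(6.0))
print(classify_diabetes_risk(5.4, takes_diabetes_medication=True))
print(is_obese(31.5))
```

Note that medication use overrides the measured value, which is why a respondent with a controlled HbA1c can still be counted as high risk.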

Table 3: Odds of diabetes according to self-reports and biomarker

SR = self-report; Bio = biomarker.

| | Costa Rica SR | Costa Rica Bio | Taiwan SR | Taiwan Bio | US SR | US Bio | England SR | England Bio | China SR | China Bio |
|---|---|---|---|---|---|---|---|---|---|---|
| Age | 1.01 | 1.01 | 1.04* | 1.03 | 1.00 | 1.00 | 1.05*** | 1.02 | 0.98 | 1.04* |
| Gender | 1.33* | 1.32* | 1.33 | 1.49 | 0.74*** | 0.74*** | 0.61*** | 0.57*** | 1.31* | 1.32 |
| Yrs education | 0.98 | 0.97 | 0.98 | 0.97 | 0.96** | 0.96** | 1.00 | 0.96 | 1.08* | 1.04 |
| Low height | 0.96 | 1.02 | 0.74 | 0.88 | 0.94 | 1.03 | 0.62** | 0.75 | 0.53 | 0.43*** |
| Never smoke | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Past smoker | 0.83 | 0.90 | 1.13 | 1.30 | 1.06 | 0.90 | 1.12 | 1.05 | 2.44 | 1.07 |
| Current smoker | 0.56* | 0.64 | 1.14 | 1.80 | 0.75* | 0.79* | 1.02 | 1.37 | 0.67 | 0.79 |
| Obese | 2.05*** | 2.32*** | 2.78*** | | 2.73*** | 2.86*** | 2.36*** | 2.61*** | 6.15*** | 7.35*** |
| Functionality | 0.88 | 1.02 | | | 0.97 | 1.05 | 1.06 | 1.32 | 0.95 | 1.33 |
| Poor health | 1.79*** | 1.46** | | | 2.41*** | 2.30*** | 3.11*** | 2.62*** | 1.02 | 1.04 |
| N | 1627 | 1627 | 732 | 732 | 3992 | 3992 | 3160 | 3160 | 1133 | 1133 |

Remaining Taiwan values (Obese Bio through Poor health), in printed order: 1.90*, 1.70, 1.72**

Source: RELATE (2013); those born in the late 1920s–early 1940s; imputed data. Note: Surveys represented here are CRELES (Costa Rica), SEBAS (Taiwan), HRS (US), ELSA (England), CHNS (China). US is HRS 2006; results were similar for HRS 2008. Low height is the lowest quartile of height; functionality is at least one difficulty with functionality (harmonized measure); poor health is poor/fair self-reported health. Taiwan had a slightly different measure for smoking. * p<0.05 **p<0.01 *** p<0.001
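For readers less familiar with odds ratios, a common way to read the entries in Table 3 is as a percent change in the odds of diabetes relative to the reference group. The sketch below applies that standard conversion to two US self-report values from the table (obese, 2.73; gender, 0.74); the function name is a hypothetical helper, not something used by the authors.

```python
def percent_change_in_odds(odds_ratio):
    """Convert an odds ratio into a percent change in odds versus the reference."""
    return (odds_ratio - 1) * 100


# US (HRS 2006) self-report odds ratios taken from Table 3.
us_self_report = {"Obese": 2.73, "Gender": 0.74}

for predictor, odds_ratio in us_self_report.items():
    change = round(percent_change_in_odds(odds_ratio))
    direction = "higher" if change > 0 else "lower"
    print(f"{predictor}: odds are {abs(change)} percent {direction}")
```

An odds ratio above 1 means higher odds and a ratio below 1 means lower odds, which is why the poster can compare the direction of associations across self-report and biomarker models even when magnitudes differ.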

Selected references

Banks, J., Marmot, M., Oldfield, Z., and Smith, J. (2006). Disease and disadvantage in the United States and in England. Journal of the American Medical Association, 295(17):2037–2045.
Brenes, G. (2008). The effect of early life events on the burden of diabetes mellitus among Costa Rican elderly: estimates and projections. PhD Dissertation. University of Wisconsin-Madison.
Crimmins, E., Kim, J.K., & Vasunilashorn, S. (2010). New Approaches to Understanding Trends and Differences in Population Health and Mortality. Demography, 47(Suppl), S41–S64.
Goldman, N., Lin, I., Weinstein, M., and Lin, Y. (2003). Evaluating the quality of self-reports of hypertension and diabetes. Journal of Clinical Epidemiology, 56:148–154.
McEniry, M. (2014). Early Life Conditions and Rapid Demographic Changes: Consequences for Older Adult Health. Dordrecht, The Netherlands: Springer Science+Business Media.
Research on Early Life and Aging: Trends and Effects (RELATE). (2013). PI M. McEniry. Produced and distributed by the University of Michigan: Institute for Social Research, ICPSR [distributor]. Release date: June 2013.
Rose, G.A. (1962). The Diagnosis of Ischaemic Heart Pain and Intermittent Claudication in Field Surveys. Bulletin of the World Health Organization, 27, 645–658.
Yan, S., Li, J., Li, S., Zhang, B., Du, S., Gordon-Larsen, P., et al. (2012). The expanding burden of cardiometabolic risk in China: the China Health and Nutrition Survey. Obesity Reviews, 13(9): 810–21.


National Archive of Criminal Justice Data

Background

• NACJD was founded in 1978 as part of ICPSR
• Supported by the National Institute of Justice, Bureau of Justice Statistics, and Office of Juvenile Justice & Delinquency Prevention
• NACJD holds more than 2,000 data collections, including more than
  - 800 NIJ sponsored studies
  - 900 studies collected by the Bureau of Justice Statistics
  - 50 OJJDP sponsored studies

NACJD’s mission is to facilitate research that will improve the efficacy, value, and fairness of the criminal justice system by
• Acquiring and preserving computerized crime and justice data from federal and state agencies, and from investigator initiated research projects.
• Distributing these data to practitioners, policy makers, evaluators, academics, and others for analysis.
• Training users in specialized areas of statistical analysis of crime and justice data.
• Extending the value of these data by facilitating the reproduction of original results, replication of others’ conclusions, and testing of new hypotheses.

NACJD Activities

• Archiving and distributing criminal justice data
  - Monitoring and updating data collections
  - Preparing studies for online data analysis
• Providing user support
  - Resource Guides assist with the retrieval and use of files obtained from the archive
• Bibliography of Data-Related Literature
• Addressing data-management issues for NIJ, BJS, and OJJDP staff
• Conducting educational programs

Online Analysis

A selection of NACJD studies is available for online data analysis. Researchers can
• Search for variables of interest in a dataset and among all datasets prepared in our online system.
• Produce simple summary analysis reports.
• Perform statistical procedures, such as listing values of individual cases, frequencies, crosstabulations, mean comparisons, correlation matrices, and OLS regression.
• Create a subset of cases or variables from a particularly large collection to save downloading time and space on a personal computer.

Summer Program

As part of the ICPSR Summer Program, NACJD organizes the following three courses. Courses offer hands-on training in the use of criminal justice data, and background information on the design of data collection projects.
• A four week BJS-sponsored seminar
• Three- and five-day NIJ-sponsored workshops
• An OJJDP Program of Research on the Causes and Correlates of Delinquency

NACJD Project Specific Web Sites


A Disciplinary Repository for Research on Social Dimensions of Emerging Technologies: Challenges and Opportunities

The Nanoscience and Emerging Technologies in Society: Sharing Research and Learning Tools (NETS) is an Institute of Museum and Library Services (IMLS)-funded project to investigate the development of a disciplinary repository for the Ethical, Legal and Social Implications (ELSI) of nanoscience and emerging technologies research. NETS partners will explore future integration of digital services for researchers studying ethical, legal, and social implications associated with the development of nanotechnology and other emerging technologies.

Investigate, Generate, Establish

NO NANO

Partners

Societal dimensions research investigating the impacts of new and emerging technologies in nanoscience is among the largest research programs of its kind in the United States, with an explicit mission to communicate outcomes and insights to the public. By 2015, scholars across the country affiliated with this program will have spent ten years collecting qualitative and quantitative data and developing analytic and methodological tools for examining the human dimensions of nanotechnology. The sharing of data and research tools in this field will foster a new kind of social science inquiry and ensure that the outcomes of research reach public audiences through multiple pathways.

University of Massachusetts Amherst Libraries

Inter-university Consortium for Political and Social Research

Center for Nanotechnology in Society, University of California Santa Barbara

Center for Nanotechnology in Society, Arizona State University

SPRING 2013 WORKSHOP

The central activity of this project involves a spring 2013 workshop that will gather key researchers in the field and digital librarians together to plan the development of a disciplinary repository of data, curricula, and methodological tools.

RESEARCH QUESTIONS
• How are disciplinary data repositories planned, developed, and assessed?
• What are methods to determine the readiness of a community to use and contribute to a disciplinary data repository?

People

Co-Principal Investigator, Rebecca Reznik-Zellen (UMass Amherst)
Co-Principal Investigator, Jessica Adamick (UMass Amherst)
Consultant, Gretchen Gano (Amherst College)
Consultant, Peter Granda (ICPSR)

PROJECT GOALS

GOAL 1: Project leads will conduct a full investigation into the feasibility and requirements for a digital registry and repository for nano ELSI research.

GOAL 2: Project leads will generate community support for and commitment to the development of a nano ELSI research repository.

GOAL 3: Project leads will establish the foundation for a smooth and efficient implementation process with additional funding.

IMLS Grant Award Number LG-51-12-0511-12

The Institute of Museum and Library Services is the primary source of federal support for the nation’s 123,000 libraries and 17,500 museums. Through grant making, policy development, and research, IMLS helps communities and individuals thrive through broad public access to knowledge, cultural heritage, and lifelong learning.


Data for Investigating Better Ways to Identify and Understand Effective Teaching

The Measures of Effective Teaching Longitudinal Database (MET LDB) preserves and disseminates data from the MET Project, the largest study of classroom teaching ever conducted in the United States.

Video Data from Real Classrooms
• Thousands of hours of rich video data: active classrooms, real teachers, current practices
• 360° panoramic view of the teaching session

Quantitative Data from Teachers and Students
• Data on more than 2,500 fourth- through ninth-grade teachers working in 317 schools
• Multiple levels of analysis: teacher, classroom, school
• Diverse set of surveys and assessments, including:
  - Standardized math and reading test scores for all students in all six districts
  - Student, teacher, and principal surveys
  - Value-added measures
  - And much more

met-ldb-inquiries@umich.edu
www.icpsr.umich.edu/metldb


Data Seal of Approval

DSA Guidelines
There are 16 guidelines, based on five criteria that together determine whether the data are sustainably archived. The data:
• Can be found on the Internet
• Are accessible (clear rights and licenses)
• Are in a usable format
• Are reliable
• Can be referred to (persistent identifier)

Objectives
• Support the long-term usability of data
• Increase the trustworthiness of digital data repositories
• Encourage transparency with respect to digital archiving policies and procedures

Procedures
The starting point is completing the DSA self-assessment in the online tool. Once the assessment is submitted, it is peer reviewed by the DSA Board. When granted the Data Seal, the repository displays the DSA logo on its web site along with the assessment and relevant documents.

DSA Board
• Initiated by DANS (The Netherlands)
• Established by a number of institutions committed to long-term archiving of research data
• Board members assigning the Data Seal: CINES (France), DANS (The Netherlands), ICPSR (USA), Max Planck Institute for Psycholinguistics (The Netherlands), NESTOR (Germany), United Kingdom Data Archive (United Kingdom)

Impact of the Data Seal
• Displaying the DSA logo demonstrates that the repository is considered a reliable long-term archive
• The DSA indicates that the repository complies with the 16 DSA guidelines, as determined through an assessment.

info at datasealofapproval.org

www.datasealofapproval.org


Video and quantitative data for investigating better ways to identify and understand effective teaching

What is the Measures of Effective Teaching Longitudinal Database (MET LDB)?
• It is the collection designed to consolidate, disseminate, and preserve data from the MET Project for use by researchers at institutions like yours.
• It is data from the largest study of classroom teaching ever conducted in the United States.
• The data allow researchers to quantify effective methods of teaching, a concept that has been historically difficult to measure.
• It contains variables of interest for researchers in several areas, including:
  • mathematics/reading education
  • sociology
  • curriculum and instruction
  • educational psychology
  • teacher education/development
  • research methodology/measurement
  • human development
  • pedagogy
  • educational policy
  • …and more!

Video Data from Real Classrooms
• Thousands of hours of rich video data: active classrooms, real teachers, current practices
• 360° panoramic video of the teaching session, concurrent display of the classroom’s main board, basic information about the session, teachers’ reflections, and uploaded instructional support materials

Quantitative Data from Teachers and Students
• Student achievement gains on state standardized tests and supplemental tests
• Classroom observations and teacher reflections
• Teachers’ pedagogical content knowledge
• Student perceptions of the classroom instructional environment
• Teachers’ perceptions of working conditions and support at their schools

Data available now!
Researchers at your institution can be among the first to access the MET quantitative and video data using secure online technical systems.

The documentation files are available for public download via the MET LDB website, www.icpsr.umich.edu/METLDB

Learn More
• Email us at met-ldb-inquiries@umich.edu
• Sign up for our email list at: www.icpsr.umich.edu/METLDB
• Visit this related link: www.metproject.org

Funding and Partners
• Funding is provided by the Bill & Melinda Gates Foundation.
• Access to data from the Measures of Effective Teaching project was coordinated by four units at the University of Michigan: Inter-university Consortium for Political and Social Research (ICPSR), Survey Research Center (SRC), School of Education, and University of Michigan Library.

www.icpsr.umich.edu/METLDB


Records Management in a Data Management World: Using a Retention Schedule as a Tool for Data Curation

Traditional records management programs have long used retention schedules as a tool for managing records from creation to final disposition. The same tool can be applied to research records and the data curation lifecycle, be it at the organization level, the unit level, or for a specific research project.

What Goes In?

DATA LIFECYCLE ANALYSIS
• Existing Documentation: Best Practices, Project Records
• Records Survey: Shared Drives, Storage Areas
• Researcher Interviews

LEGAL AND CONTRACTUAL REQUIREMENTS
• Federal and State Regulations
• Funder’s Requirements
• University Standard Practices

RESEARCH UNIT NEEDS
• Information Governance
• Institutional Memory
• Operational Documentation

Existing documentation, such as this Best Practices website, is an excellent source of information on the data lifecycle.

Project Records Retention Schedule

A well-designed retention schedule is an actionable document that defines the activities that produce records, what records are produced, how long they must be retained, and what their final disposition will be.

Activity | Transaction Records | Retention | Disposition | Contains PII?

Pre-Award | Estimates: Work Scope Memo, Budget Values, Budget Justification, Smoothing Documents, Correspondence, Other Documents | 15 Years or Permanent | Archive: Funded Projects; Review/Destroy: Nonfunded Projects | N

Pre-Award | Awarded: Contracts, Memo of Understanding, Contract Modifications, Project Award Notice, Form 7471, Financial Operations Notifications | Permanent | Archive | N

Management | Financial Setups: CRS Specifications, Correspondence, Other Documents | 7 Years | Destroy | N

Management | Project Management Plan: Scope Statement, Scope Changes, Schedule, Resource Management, Communication Plan, Risk Analysis, Quality Assurance | Permanent | Archive: Formal and Final | N

Management | Process Management Documents: Kickoff Meeting Documents, Project Review Documents, Monthly Project Reports, Cost Reports, Client Reports, Client Meeting Minutes, Other Documentation | 10 Years or Permanent | Archive: Formal and Final; Destroy: Interim and Working Versions; Review Other Documents for Archival Value | N

Sampling | Sample Design and Final Documentation: Correspondence and Minutes, Design Documentation, Final Report, File Index with Documentation | Permanent | Archive: Formal and Final; Destroy: Interim and Working | N

Sampling | Sample Listing: Segment-Level Documents, Update Logs, Field Notes and Observations, Maps, Shapefiles | 10 Years or Permanent | Archive: Final Segments, Field Notes and Observations, Updates, Maps; Destroy: Interim and Working Updates, Shapefiles, Training Documents | Y

IMPLEMENTATION
• Post on Intranet or Web site for Accessibility
• Give Presentations to Research Units for Training
• Participate in Project Kick-off Meetings to Encourage Adoption
• Conduct Annual Review for Continued Relevance

What Comes Out?

DATA MANAGEMENT
• Comprehensive Project Records
• Consistency of Classification and Documentation
• Data Provenance

INFORMATION GOVERNANCE
• Demonstrates Compliance
• Reduces Security and Access Risks
• Support from Administration

OPERATIONAL EFFICIENCY
• Accuracy in Search and Retrieval
• Defensible Destruction
• Reduced Cost of Records Maintenance

For More Information
JISC, RECORD RETENTION SCHEDULE: RESEARCH
www.jisc.ac.uk/publications/generalpublications/2002/recordssrlstructure/rrs02.aspx
ARMA INTERNATIONAL, CREATING A PROCESS-FOCUSED RETENTION SCHEDULE
www.arma.org/bookstore/files/Torres.pdf

Have Questions?
Contact Kelly Chatain, Associate Archivist, ISR SRC, Survey Research Operations, kchatain@umich.edu
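A retention schedule of the kind described on this poster can also be held as structured data. The following is a minimal Python sketch and not part of the poster: the `RetentionRule` class, its field names, and the `disposition_due` helper are hypothetical, and the sketch assumes the retention period is counted from a project close date, which the poster does not specify. The sample row is the Financial Setups entry (7 Years, Destroy, no PII).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    """One row of a retention schedule (hypothetical field names)."""
    activity: str
    records: str
    retention_years: Optional[int]  # None stands for "Permanent"
    disposition: str
    contains_pii: bool


def disposition_due(rule: RetentionRule, closed_on: date) -> Optional[date]:
    """Return the date the disposition action falls due, or None if permanent.

    Assumption: retention is counted from the project close date.
    """
    if rule.retention_years is None:
        return None
    target_year = closed_on.year + rule.retention_years
    try:
        return closed_on.replace(year=target_year)
    except ValueError:
        # closed_on was 29 February and the target year is not a leap year
        return closed_on.replace(year=target_year, day=28)


financial_setups = RetentionRule(
    activity="Management",
    records="Financial Setups",
    retention_years=7,
    disposition="Destroy",
    contains_pii=False,
)
due = disposition_due(financial_setups, date(2014, 3, 1))
print(due.isoformat())
```

Keeping "Permanent" as `None` rather than a large number makes it harder to schedule a permanent record for destruction by accident; rows with two retention options (for example "10 Years or Permanent") would need a richer representation than this sketch offers.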


Data Sharing for Demographic Research
A data archive for demography and population sciences

Director: Mary McEniry • Archive Manager: Tannaz Sabet • Research Technician Senior: Andrew Proctor

Background and Mission/Vision

Cooperative Agreement Funded by NICHD

Purpose
• Resource for users and producers of demographic and population science data
• Facilitate data sharing for NICHD-funded projects

Services
• Data preservation and dissemination
• Data sharing guidelines and assistance
• Restricted-use data sharing
• User support and outreach

Focus on NICHD-Funded Research
• Reproductive Health, Fertility, Family Planning, Sexual Behavior
• Families and Households
• Children and Youth
• Health and Mortality

Who Accesses Our Data?

Profile of Unique Data and Document Users (2010–2013)
[Pie chart of users by type: Undergraduate Student, Graduate Student, Faculty, Research Staff, Government Employee, Other. Due to rounding, chart does not total 100%.]

Files Downloaded: 1,529,474 (2010–2013)
• Data Files: 733,270 (48%)
• Other Files: 534,060 (35%)
• Codebooks: 262,144 (17%)

Departmental Affiliation (2010–2013)
• Economics (3,182)
• Sociology/Demography (2,267)
• Psychology (966)
• Public Health/Medicine (699)
• Political Science/Government (549)
• Public Policy/Public Administration (533)
• Social Science (524)
• Social Work (499)
• Criminal Justice (420)
• Statistics (404)
• Education (400)
• Business (294)
• Other Departments (1,703)

Top Five Downloaded Studies in Health
• National Longitudinal Study of Adolescent Health
• India Human Development Survey
• Welfare, Children, and Families: A Three-City Study
• Immigration and Intergenerational Mobility in Metropolitan Los Angeles (IMMLA), 2004
• National Couples Survey, 2005–2006

Moving into the Future...
We expect to make significant progress in the following areas:
• Increased awareness among researchers in the NICHD-funded population centers of the services we offer
• Increased engagement with demographers and population scientists in how to improve data sharing in the research community
• Increased presence in the international arena

Current Projects: Improvements in the Science of Data Sharing

Common Catalog
• We are working to compile into one catalog information about NICHD-funded data collection projects from the last 10 years. When this catalog is completed, investigators will be able to quickly identify projects of interest.

Complex Merge Tool
• We are in the developmental stages of a teaching tool for students who may not feel comfortable merging large data sets because they do not have a firm grasp of statistical software such as Stata or SPSS.

Automated System for Processing Restricted Data Applications
• We have worked over the last year to refine our automated system to fit the needs of one of our most important studies, the Add Health study.

Latin American Demography
• We are working with the Latin American demography association ALAP to hold a series of workshops on data sharing in the Latin American region.

Website and Contact Information
www.icpsr.umich.edu/DSDR

Photo (c) Russ Bowling


Using the www.icpsr.umich.edu/ICPSR/citations

What Is the ICPSR Bibliography of Data-related Literature?

This tool is a searchable database that contains over 66,000 citations of known published and unpublished works resulting from analyses of data held at ICPSR. The collection represents over 50 years of scholarship in the quantitative social sciences.

ICPSR links you to works that cite our holdings, providing new ways to discover datasets and track their use. Start a search at either the Bibliography’s Portal OR the Description Page of Any ICPSR Study.

[Screenshot callouts: Choose from Results; Filter, Sort, Export; Link to Data and Full Text; Link to Full List; Link Out to Full Text; Discover Data New to You!]

Who finds it useful?
• Instructors: teach students to learn about the data via the literature
• Researchers: can examine existing findings before reusing data
• Reporters and policy makers: look for processed statistics to explain studies
• Principal investigators and funders: track how data are used after deposit at ICPSR

The Bibliography makes it possible to:
• Understand, evaluate, and build upon others' findings
• Look at the usage patterns of data resources
• Investigate the life cycle of data and the types of analyses undertaken
• Avoid accidentally duplicating analyses that have already been done
• Identify cross-disciplinary implications and uses of the data
• Understand the limitations as well as the research potential of the data, by seeing the data in use and reading the observations and findings of other researchers
• Identify much of the research that uses a given ICPSR dataset
• Learn more about methodological issues, some of which are covered solely in the published literature

For More Information
Contact: Elizabeth Moss, Associate Librarian, eammoss@umich.edu
www.icpsr.umich.edu/ICPSR/citations

Why Does Citing Data Make Good Sense?

Citing datasets used in published research is just as important as citing journal articles, books, and other sources.

By citing your use of a dataset, you are supporting the reproducibility of your research and attributing credit to those who provided the data, including data you created! Citations also allow for tracking reuse and measuring data’s impact.

Just mentioning the title of a study or describing the sample in the text of your article is not enough for a reader to know the exact data you used. A formal citation resides in the reference section of a publication, and it provides enough information for the reader to identify, retrieve, and access the data.

What Should You Include in a Data Citation?
• Author: Name(s) of each individual or organizational entity responsible for the creation of the dataset.
• Date of Publication: Year the dataset was published or disseminated.
• Title: Complete title of the dataset, including the edition or version number, if applicable.
• Publisher and/or Distributor: Organizational entity that makes the dataset available by archiving, producing, publishing, and/or distributing the dataset.
• Electronic Location or Identifier: Web address or unique, persistent, global identifier used to locate the dataset (such as a DOI). Append the date retrieved if the title and locator are not specific to the exact instance of the data you used.

These are the minimum elements required for dataset identification and retrieval. Fewer or additional elements may be requested by author guidelines or style manuals. Be sure to include as many elements as needed to identify precisely the dataset you have used.
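The five citation elements can be assembled mechanically. The sketch below is illustrative only: the function name `format_data_citation`, the element order, and the punctuation are assumptions rather than a format prescribed by ICPSR or any style manual, and the author, title, distributor, and DOI in the example are made-up placeholders, not a real dataset.

```python
from typing import Optional, Sequence


def format_data_citation(
    authors: Sequence[str],
    year: int,
    title: str,
    distributor: str,
    identifier: str,
    retrieved: Optional[str] = None,
) -> str:
    """Join the minimum citation elements into one reference string.

    Hypothetical layout: Author (Year). Title. Distributor. Identifier
    """
    required = {
        "author": authors,
        "title": title,
        "distributor": distributor,
        "identifier": identifier,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError("missing citation element(s): " + ", ".join(missing))

    citation = f"{'; '.join(authors)} ({year}). {title}. {distributor}. {identifier}"
    if retrieved:
        # Only needed when title and locator do not pin down the exact instance used
        citation += f" (retrieved {retrieved})"
    return citation


# Placeholder values for illustration; not a real dataset
example = format_data_citation(
    authors=["Doe, Jane"],
    year=2012,
    title="Example Survey, 2012 (Version 1)",
    distributor="Example Data Archive [distributor]",
    identifier="https://doi.org/10.0000/example",
)
print(example)
```

Raising an error on a missing element mirrors the point made above: a citation that omits one of the minimum elements may not let a reader identify the exact data used.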


Strategic Plan 2013

ICPSR seeks to leverage its success, strong network, and leadership position to increase stakeholder value and support a vibrant field of social and behavioral research.
1. Internal and external synergy: building robust linkages and communication across activities and programs
2. Inclusion and diversity: expanding ICPSR’s work into additional regions, integrating new voices into our processes, and enlarging educational opportunities
3. Build on strengths: advancing from the 2008 Strategic Plan and drawing on current assets and successes

ICPSR advances and expands social and behavioral research, acting as a global leader in data stewardship and providing rich data resources and responsive educational opportunities for present and future generations.

The plan consists of three externally focused strategic directions and one designed to align organizational structures and processes with strategic priorities.

Plan Development

Since ICPSR released its prior Strategic Plan, new forms of data-based research have emerged, new mandates for open access have been put forward, the data archiving community has grown, and economic uncertainty has altered the funding picture. ICPSR saw the need for new strategies, and in September 2012, the planning process started. It included discussions and information gathering via phone interviews and meetings with ISR and ICPSR staff, ORs, and partner institutions. In November 2012, ICPSR senior leaders laid out a strategic-plan framework. More input was sought from ORs, the Council, and staff. The plan’s final version was published in September 2013.

ICPSR staff is developing achievement tactics and success measures. Plan execution is slated to begin in January 2014.

Direction 1: Enhancing Our Global Leadership

ICPSR will enhance its role as a leader in data stewardship, engaging the global community as a partner and supporter. Archives, universities, and others look to ICPSR for leadership in addressing new opportunities. There is also a need to introduce ICPSR to new audiences and to add diverse perspectives. By collaboratively engaging with the global data community, ICPSR can advance data-enabled science.

Direction 2: Developing New and Responsive Products and Services

ICPSR will create innovative data services through ongoing R&D, leading to new funding sources. ICPSR also will expand the data it provides to reflect new research methodologies. ICPSR is a leader in creating data services that respond to diverse data users’ needs. We will listen to communities of interest to deliver new forms of data and innovative services to them.

Direction 3: Advancing Knowledge, Skills, and Tools for the Research Community

ICPSR will expand its role in building understanding of the research process, including data management and analysis, to facilitate effective research. ICPSR will add knowledge-building opportunities to meet the challenges posed by the increasing abundance and complexity of data. Our Summer Program in Quantitative Methods and resources supporting data use in the classroom can serve as a springboard to new levels of excellence in education.

Direction 4: Expanding Organizational Capacity for Leadership and Innovation

ICPSR will invest in its staff and internal structures and systems. ICPSR will evolve to reflect an ongoing commitment to effective planning, communication, and transparency. Aligning internal systems with external priorities will leverage current strengths and create new capacities.

Read the full plan at www.icpsr.umich.edu/files/ICPSR/strategic/strategic-plan-2013.pdf
Please submit comments to: icpsrstrategicplan@umich.edu

Strategies for Enhancing Our Global Leadership (Direction 1)
• Develop national and international partnerships.
• Facilitate consensus building and advocacy.
• Support emerging capacity for international partners.
• Expand the global ICPSR membership network.

Strategies for Developing New and Responsive Data Products and Services (Direction 2)
• Heighten engagement with the community.
• Develop innovative data services through ongoing R&D.
• Develop revenue sources.

Strategies for Advancing Knowledge, Skills, and Tools for the Research Community (Direction 3)
• Engage new audiences with our educational programs.
• Widen access to training and related tools.
• Build educational partnerships.
• Increase integration of knowledge-building opportunities.

Strategies for Expanding Organizational Capacity for Leadership and Innovation (Direction 4)
• Foster a culture that encourages learning and innovation.
• Ensure a high-performing and inclusive staff.
• Build bridges across groups and excitement around our mission.


Exploring the Resource Center for Minority Data

Rich Resources of the RCMD

Valuable Data
Such as these popular downloads …
• Latino National Survey, 2006
• National Asian American Survey, 2008
• National Neighborhood Crime Study, 2000
• 21st Century Americanism: Nationally Representative Survey of the United States Population, 2004

What is the RCMD?

RCMD Provides Helpful Features

Search/Browse Functionality
• By Variable: Covers over 638,900 variables representing about 1,930 studies
• By Bibliography: Identifies publications resulting from the data
• By Geography: Finds research conducted in particular geographic areas

A smart archive — providing educators, researchers, and students with data resources for analyzing issues affecting racial and ethnic minority populations in the U.S.

• Immigrant Second Generation in Metropolitan New York

Data Analysis Functionality

A portal — seeking to assist in the public dissemination and preservation of quality data to generate more “good science” for years to come

… these recent releases …

• Sample Characteristics: Tool assesses survey sample sizes for minority populations for many RCMD data files.

A participant — contributing to the interactive community of persons interested in minority-related issues/investigations

• Gates Millennium Scholars Studies • Washington State Achievers Longitudinal Surveys, 2000–2007 • Latino National Survey Update • Behavioral Risk Factor Surveillance System, 2003 • County-Specific Net Migration by Five-Year Age Groups, Hispanic Origin, Race and Sex: 2000–2010

We Want to Hear From You

… and these future releases:

All ideas are welcome as RCMD grows into the future!

• Latino MSM Community Involvement: HIV Protective Effects

Email us at: ICPSR-RCMD@umich.edu www.icpsr.umich.edu/RCMD

• Analyze Data Online: Perform online data analysis on our website without downloading files by using SDA.

Learning and Instruction
• Webinars: Introduce data resources, methods, and quantitative literacy
• Teaching Modules: Instruct students how to perform sophisticated analysis
• Online Learning Resources: Data-Driven Learning Guides that supplement undergraduate social science coursework with learning exercises using RCMD data

• Latino National Survey — New England update • Hurricane Katrina Advisory Group (Waves 2, 3) • Behavioral Risk Factor Surveillance System, 2011

www.icpsr.umich.edu/RCMD

Really Cool

Resource Center for Minority Data


• Citizenship, Democracy, and Drug-Related Violence, 2011

(RCMD)


The use of DDI-Codebook as an interface for a browser-based variable editor

DDI-Codebook in Action

[Figure: data life cycle, "Metadata Powered by DDI": I Discovery & Planning; II Initial Data Collection; III Final Data Preparation & Analysis; IV Publication & Sharing; V Long-Term Management]

Metadata Flow

Variable Metadata is initially stored and managed in a series of Excel spreadsheets — a distinct spreadsheet for each group of variables and each type of metadata.

Background

DDI-Codebook is a metadata specification for the social and behavioral sciences and is the standard used by the NSFG Variable Editor for importing and exporting variable metadata.

ICPSR's General Archive and Computer & Network Support

DDI-Codebook Element Tables

The element tables generated by SAS' XML Mapper utility are based on the elements included in the XML document, not the entire DDI-Codebook specification.

The metadata is imported into SAS and marked up with DDI-Codebook tags by a series of DATA steps in order to generate the valid XML file.
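The markup step can be pictured with a short sketch. The poster's workflow uses SAS DATA steps; the Python below is a stand-in for illustration only and is not the authors' code. The function name `build_codebook_xml` and the sample variable `EXAMPLE_VAR` are hypothetical. The element names (`codeBook`, `dataDscr`, `var`, `labl`, `catgry`, `catValu`) come from the DDI-Codebook vocabulary, but the sketch omits the namespace and the other elements a schema-valid file would need.

```python
import xml.etree.ElementTree as ET


def build_codebook_xml(variables):
    """Wrap variable metadata rows in a skeletal DDI-Codebook-style tree.

    Illustrative only: no namespace, no document description, not schema-valid.
    """
    root = ET.Element("codeBook")
    data_dscr = ET.SubElement(root, "dataDscr")
    for variable in variables:
        var = ET.SubElement(data_dscr, "var", {"name": variable["name"]})
        ET.SubElement(var, "labl").text = variable["label"]
        # Each category carries its code value and its label
        for value, text in variable.get("categories", []):
            category = ET.SubElement(var, "catgry")
            ET.SubElement(category, "catValu").text = str(value)
            ET.SubElement(category, "labl").text = text
    return ET.tostring(root, encoding="unicode")


# Hypothetical spreadsheet row, standing in for the Excel-managed metadata
rows = [
    {
        "name": "EXAMPLE_VAR",
        "label": "Hypothetical yes/no item",
        "categories": [(1, "Yes"), (2, "No")],
    },
]
xml_text = build_codebook_xml(rows)
print(xml_text)
```

Building the tree with an XML library rather than string concatenation means special characters in labels are escaped automatically, which matters when the output has to be imported by another application.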

The DDI-Codebook XML file is imported into the NSFG Variable Editor using its import function.

ICPSR's Computer and Network Support staff developed a browser-based variable editor as part of ICPSR's partnership with the National Survey of Family Growth.

The variable metadata is then editable by both NSFG staff and ICPSR staff.

What About the Data?

The NSFG Variable Editor uses Grails as a framework utilizing the Groovy language. Grails utilizes several common open source frameworks such as Spring (for decoupling web/db layers), Hibernate (object-relational mapping), Velocity (HTML templates), and many others. The NSFG Variable Editor also has an export function — enabling it to be used to generate valid DDI-Codebook XML files.

Underneath it all, Java technology is utilized as it runs in a Tomcat 7 web container. Apache Shiro is used as the authentication/authorization framework, authentication is done against UM's Active Directory instance, and the AD instance is also used to govern user access to the application (through group associations). Oracle is used at the back end to store most of the data, while the Compass framework abstracts a Lucene index for search capabilities.

NSFG Variable Editor description and support provided by Derek Van Assche

Other DDI-Codebook elements, such as LRECL column locations, descriptive statistics, and frequencies, can be joined with the columns in the element tables. Once new elements are marked up properly, the new XML file can be re-imported into the NSFG Variable Editor or other XML application! Anybody want to render a codebook?

SAS' XML Mapper utility is capable of reading valid XML markup and generating code that can be used to generate related, numerically keyed tables comprised of the element values.

Poster elements and SAS processing provided by Philip A. Wright


Instructional Resources at ICPSR Build Students’ Quantitative Skills

ICPSR offers a variety of materials that support faculty efforts to use real social science data in undergraduate classrooms. These include exercises with preset analyses and tools intended for exploring data within the ICPSR collection that have great classroom applications.

Collection of External Resources

TeachingWithData.org (TwD)
TeachingWithData.org is a library of social science exercises; pedagogical strategies for incorporating data; and maps, tables, and graphs, tagged with searchable keywords. Also, the popular Data in the News feature provides descriptions of stories in current media that use data well, allowing students to improve critical thinking skills and quantitative literacy.

ICPSR Tools Turned Classroom Aids

Tools researchers use in exploring ICPSR data make great teaching resources as well!
• Bibliography of Data-related Literature: starting point for research papers and literature reviews
• Social Science Variables Database and Compare Variables feature: demonstrate operationalization of concepts, build a survey, compare responses by sample or across time, or find out whether students answer as survey respondents did
• Survey Documentation and Analysis online analysis package: explore data without the need for licensed statistical software
• Crosstab Assignment Builder: allows students to create contingency tables from a subset of variables predefined by the instructor
• Study codebooks and other documentation: compare modes of survey research or explore methods of collecting administrative data

ICPSR also links to teaching resources created by other data archives! Guides are available to help students learn how to read scientific journal articles effectively and to interpret SPSS output for different statistical tests.

Exercises with Preset Analyses

Data-Driven Learning Guides (DDLGs)
DDLGs are short, stand-alone modules that can be incorporated into lower-level social science classes. Topics represent focal concepts in history, political science, psychology, research methods, sociology, and related disciplines. Each contains a social science discussion of the concept, a description of data used, and pre-run statistical analyses with questions for each. Interpretation tips ensure students understand the story told by the data.

Exercise Sets

Exploring Data through Research Literature (EDRL)
"...but, what should I study?" Instructors who assign independent research often hear these words. Rachel Barlow created EDRL, based on the ICPSR Bibliography of Data-related Literature, to provide guidance. The activities demonstrate the breadth of research topics possible from a single study and show the interconnections between social scientists and their work. Instructors choose a focal article and students must find other articles using the same data, about the same topic, or by the same authors.

Investigating Community and Social Capital (ICSC)
Teaching about social capital? Lori M. Weber designed this series of activities based on Robert Putnam's Bowling Alone for use in research methods and other political science courses. Using ICPSR data and codebooks, students learn about social research while replicating and expanding upon Putnam’s work. For an additional learning opportunity, consider having students find comparable data for more recent years.

Supplementary Empirical Teaching Units in Political Science (SETUPS) 2012
SETUPS 2012 is the newest release in a series of teaching exercises based on the American National Election Studies (ANES). Students explore voting behaviors of population subgroups while learning about survey research! Developers Charles Prysby and Carmine Scavo provide a subset of the ANES data and codebook so students can work with the real data without being overwhelmed. The 10 analytic exercises are recommended for courses in American politics/government or research methods in political science. SETUPS 2004 and 2008 remain available, too.

www.icpsr.umich.edu/instructors

Opportunities for Students

Paper Competition
Each year, ICPSR hosts student paper competitions for undergraduate and master’s students who use ICPSR data. In addition to cash prizes, winners and their mentors are recognized on the ICPSR website and their papers are published in the ICPSR Bulletin. Submission deadline is in late January or early February.

Summer Internship
ICPSR hosts a highly competitive National Science Foundation Research Experience for Undergraduates. Interns spend 10 weeks on campus learning to process data for preservation and dissemination, taking courses in the ICPSR Summer Program in Quantitative Methods of Social Research, and working on their own research projects with a mentor. Housing, food, tuition, books, and a stipend are provided. Interns leave with a conference-ready poster about their research. Applications are due January 31.

For More Information
Contact ICPSR Instructional Resources at 763.615.5653 or email lhoelter@umich.edu


Introducing openICPSR

Can scientists still deposit with ICPSR?
Absolutely. Depositors can donate their data to the ICPSR membership for curation at no fee. Those data are accessible to the membership.

How will openICPSR accept and disseminate restricted-use data?
• Phase I: In January 2014, openICPSR will accept restricted-use data for deposit at the same fee levels as public-use data. Researchers will receive DOIs and related citations upon deposit. Restricted-use data will not be disseminated until Phase II.
• Phase II: In late 2014, openICPSR will begin to disseminate restricted-use data via its Virtual Data Enclave; data users will be charged an administration fee to access restricted-use data.

How do ICPSR members benefit from openICPSR?
openICPSR meets the needs of member data depositors who require public access yet desire to share and preserve their data with a trusted social and behavioral sciences repository.

ICPSR members using openICPSR also receive:
• 10X the storage space for openICPSR deposits
• Access to selected fully curated openICPSR deposits

ICPSR members also continue to receive:
• Access to over 65,000 datasets
• Access to fully curated datasets with professional metadata, stats package conversion, standardized codebook, variable-level search, bibliography search, and other data tools
• Teaching and instructional tools
• Discounted tuition for ICPSR Summer Program courses

Public Access Data Sharing at ICPSR
Recognized. Cited. Confident. Secure.

openICPSR was created to help research scientists be recognized and cited, to impart confidence that their data will be taken care of for the long term, and to provide a sharing service for those who require restricted-use data dissemination.

What is openICPSR?
openICPSR is a research data-sharing service for the social and behavioral sciences. It enables the public to access research data without charge or, in the case of restricted-use data, for a nominal charge.

Why is there a charge for deposits?
There are paid curation professionals involved and storage costs for multiple copies to ensure data are kept safe. openICPSR’s deposit fees sustain the service, assuring access to data.

What is unique about openICPSR vs. other data sharing services?
openICPSR is the only public data-sharing service:
• Where the deposit is reviewed by professional data curators who are experts in developing metadata for the social and behavioral sciences
• With an immediate distribution network of over 740 institutions looking for research data, that has powerful search tools, and a data catalog indexed by major search engines
• Sustained by a respected organization building on over 50 years of experience in reliably storing research data
• Eager to accept and disseminate sensitive and/or restricted-use data in the public-access environment

Why is ICPSR launching a public-access data-sharing service?
openICPSR was developed to assist in meeting requirements for public access to federally funded data. It can ensure data depositors fulfill public-access requirements found in grant and contract RFPs.

What are openICPSR’s deposit options?
1. Self Curation: Enables research scientists to deposit data on demand and provide immediate public access. Depositors prepare all files and metadata (meta tags). Once data are published, depositors get a DOI and a data citation; openICPSR conducts a metadata review to maximize exposure in ICPSR’s catalog. Package fee is $600.
2. Professional Curation: Enables a research scientist to tap all aspects of ICPSR’s curation services including full metadata generation and a bibliography search, stat package conversion, and user support. The fee to distributors depends upon data complexity.
3. Topic Archive: Enables an agency, foundation, or large project with many datasets to fully fund the dissemination of its data. While data undergo treatment as in the Professional Curation option, the Topic Archive option includes premium services such as dedicated staff specialists, an exclusive website and customized data tools, and acquisitions and compliance reporting. Project managers, officers, and agencies should call for a proposal.

What happens to data deposits?
openICPSR provides bit-level preservation and public access for at least five years. Depositors can provide further funding to extend preservation and public access. Thereafter, unfunded deposits transfer to the ICPSR membership.


www.icpsr.umich.edu/NACJD

About NACJD The National Archive of Criminal Justice Data (NACJD) facilitates research in criminal justice and criminology through the preservation, enhancement, and sharing of computerized data resources; the production of original research based on archived data; and specialized training workshops in quantitative analysis of crime and justice data. NACJD supports data users and researchers by: • Identifying appropriate criminological and criminal justice data collections on specific topics • Assisting with the retrieval and use of files obtained from the archive • Providing related literature citations for all data in the archive • Developing resource guides to facilitate data discovery and use • Offering technical user support • Hosting educational programs

Resource Guides Online resource guides that provide detailed information about complex or frequently accessed data collections are available for the following topics and data series: • Capital Punishment in the United States • Expenditure and Employment for the Criminal Justice System • Federal Justice Statistics Program • Geographical Information Systems • Homicide • Homicides in Chicago • National Corrections Reporting Program • National Juvenile Corrections Data • Violence Against Women See the website for a listing of additional resource guides.

National Archive of Criminal Justice Data Available Data NACJD maintains over 2,200 studies of data collected from NIJ, BJS, and OJJDP, with over 9,000 citations of works resulting from analyses of these holdings. NIJ is the research, development, and evaluation agency dedicated to improving knowledge and understanding of crime and justice issues through science. BJS collects, analyzes, publishes, and disseminates information on crime, criminal offenders, victims of crime, and the operation of the justice systems at all levels of government. OJJDP strives to strengthen the juvenile justice system’s efforts to protect public safety, hold offenders accountable, and provide services that address the needs of youth and their families.

Frequently Accessed Holdings National Institute of Justice • Arrestee Drug Abuse Monitoring (ADAM) Program/Drug Use Forecasting (DUF) Series • Project on Human Development in Chicago Neighborhoods (PHDCN) Series • Serious and Violent Offender Reentry Initiative (SVORI) Multi-site Impact Evaluation, 2004–2007 • Violence and Threats of Violence Against Women and Men in the United States, 1994–1996

Bureau of Justice Statistics • Law Enforcement Management and Administrative Statistics (LEMAS) Series • National Crime Victimization Survey (NCVS) Series • National Incident-Based Reporting System (NIBRS) Series • Survey of Inmates of State and Federal Correctional Facilities Series • Uniform Crime Reporting (UCR) Program Data Series

Contact Us Email: nacjd@icpsr.umich.edu Phone: 800-999-0960

Office of Juvenile Justice and Delinquency Prevention • Census of Juveniles in Residential Placement (CJRP) • Juvenile Residential Facility Census (JRFC) Series • Survey of Youth in Residential Placement (SYRP)


Summer Program The Bureau of Justice Statistics (BJS) sponsors the annual Quantitative Analysis of Crime and Criminal Justice Data summer workshop at ICPSR. This four-week intensive seminar features an overview of many of the data collections compiled by BJS and promotes the use of these data through individual projects. The seminar provides an issue-oriented examination of BJS data and their use to answer substantive criminal justice questions. It also offers a methodological exploration of data challenges such as sampling error, measurement error, and complex file structure. Participants obtain hands-on experience by designing, completing, and presenting a quantitative research project using a BJS dataset of their choosing. The workshop is designed for faculty and research professionals from academia, nonprofit organizations, and government agencies, including advanced social science graduate students who are familiar with data analysis and quantitative research.

NACJD is primarily sponsored by federal agencies within the United States Department of Justice: the National Institute of Justice (NIJ), the Bureau of Justice Statistics (BJS), and the Office of Juvenile Justice and Delinquency Prevention (OJJDP).




IASSIST Takes Action! What is a special interest group at IASSIST?

1 Creating Core Instructions for Citing Data Citing datasets used in published research is just as important as citing journal articles, books, and other sources that contributed to the research. Citing the use of a dataset supports the reproducibility of research and attributes credit to those who provided the data. Citations also enable tracking data reuse and measuring impact of the data. The SIGDC created the Quick Guide to Data Citation because instructions for citation styles do not consistently provide examples for dataset citations. The Guide will help you determine what citation elements to include. The point is to provide enough information in a citation so that the reader can identify, retrieve, and access the unique dataset used.
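As a rough illustration of assembling citation elements into a single dataset citation, here is a minimal Python sketch. The field list, the output style, and the example values are all assumptions made for illustration; they are not taken from the Quick Guide to Data Citation, and the dataset shown is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    """Illustrative citation elements (an assumed field list, not the Guide's own)."""
    author: str
    year: int
    title: str
    version: str
    distributor: str
    identifier: str  # persistent identifier such as a DOI


def format_citation(c: DatasetCitation) -> str:
    """Join the elements into one citation string in a hypothetical style."""
    return (
        f"{c.author} ({c.year}). {c.title} [dataset]. "
        f"Version {c.version}. {c.distributor}. {c.identifier}"
    )


# Hypothetical example values; no real dataset is being cited here.
example = DatasetCitation(
    author="Example, A.",
    year=2012,
    title="Hypothetical Survey of Reading Habits",
    version="1",
    distributor="Example Data Archive",
    identifier="doi:10.0000/example",
)
print(format_citation(example))
```

Whatever style is used, the point from the text holds: the citation should carry enough elements for a reader to identify, retrieve, and access the exact dataset.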

2 Updating Style Guides Style guides set the norms for citation, so the SIGDC advocates that they include robust data citation standards. We identified style guides in current use and determined their practices for the citation of data. Then we developed boilerplate language for a letter-writing campaign to style guide editors, tailoring each letter to reflect any specific issues with disciplines and style. This letter to the American Psychological Association (APA) acknowledges their current standard and provides suggestions for how they could align their practices with those established in our Quick Guide to Data Citation.

3 Changing Citation Management Software Most citation software was created for use with traditional print resources like journal articles and books. When users want to save citations to datasets, often there is no reference "type" available to create an entry for it. Members of IASSIST successfully lobbied EndNote to add "dataset" to the list of citation types offered. The SIGDC is working to influence other management software producers like Zotero, RefWorks, and Mendeley to make sure data references are included and reflect the unique characteristics of data.

[Figure: citation software screenshot. "Dataset" appears in the list of reference types, alongside types such as Computer Program, Conference Paper, Journal Article, Map, Online Database, and Report. Dataset fields shown: Study Number, Original Release Date, Series Title, Version, Date of Collection, Funding Agency, Short Title, Abbreviation, ISSN, DOI, Version History, Geographic Coverage, Time Period, Unit of Observation, Data Type.]

www.iassistdata.org/community/sigdc

4 Providing Links to Resources The SIGDC Web site lists a number of resources available on data citation. To suggest additional resources, please contact the SIGDC at www.iassistdata.org/community/sigdc.

At its most basic, an IASSIST special interest group is composed of at least four members plus a coordinator. The primary purpose is to develop collective knowledge about a topic relevant to IASSIST members and then convey it to the broader IASSIST community and beyond.

An interest group can: • Contribute content to the IASSIST Web site (contact the Web Editor) • Guest edit a special issue of the IASSIST Quarterly journal (contact the IQ Editor: kbr @ sam.sdu.dk) • Present a session or a poster at a future conference If you have an idea for a special interest group, find at least three other like-minded IASSISTers, decide on a coordinator, and get it approved by the IASSIST Administrative Committee. Interest groups are a great way to stay connected with other members who have similar professional interests.

What is the IASSIST SIGDC? The IASSIST Special Interest Group on Data Citation (SIGDC) was established in 2010 to promote awareness of data-related research and scholarship through data citation. Citing data supports the discovery and reuse of data, leading to better science through the validation of results. It also recognizes data as an essential part of the scientific record. Nearly 50 IASSISTers have joined the group, which is chaired by Michael Witt, Purdue University, and Mary Vardigan, ICPSR. IASSIST Special Interest Group on Data Citation poster creators: R. Downs, M. Edwards, M. Hayslett, B. Mento, H. Mooney, E. Moss, M. Vardigan, and M. Witt Special Interest Group on Data Citation, 2012


Building on 50 Years of Data Management and Looking Forward

Timeline • 1962 Founding • 1972 Purchases Mainframe • 1982 First Diskette • 1992 FTP Service • 2002 First Web site • Also shown: SDA, Bibliography, Online Deposit Form

1962 • Consortium members: 21 • Summer Program: 62 participants and 9 courses • Dissemination media: punch cards and magnetic tape

2012 • Consortium members: 700+ • Summer Program: 900+ participants and 65+ courses • Dissemination media: on-demand, direct to computer via the Internet • ICPSR Holdings: 8,300+ studies, 63,000+ datasets, 60,200+ bibliographic citations • Virtual Computer Labs (online restricted data dissemination: ICPSR Secure Data Services) • Video research data dissemination • Improved Online Analysis (SDA) Interface • New File Management System (FLAME: File Level Archive Management Engine)

ICPSR provides leadership and training in data access, curation, and methods of analysis for a diverse and expanding social science research community


ICPSR’s Strategic Plan: New Strategies and an Evolving Vision

The first goal is synergy. The second goal is inclusion and diversity. The third goal is to build on strengths. Where do you fit?

Direction 1: Enhancing Our Global Leadership ICPSR will enhance its role as a global leader in data stewardship, engaging the global community as a partner, convener, advocate, and supporter.

As the data landscape undergoes rapid transformation, archives, universities, publishers, libraries, and others around the world are looking to ICPSR for leadership in adapting to new challenges and seizing new opportunities. There is also a need to introduce ICPSR to new audiences and to bring diverse voices and perspectives to the table to enrich the global scholarship network. By fully engaging with the global data community in a collaborative way, ICPSR can advance data-enabled science and facilitate research for the broader good.

Strategy 1: Develop international partnerships. Increase outreach to groups involved in managing, curating, preserving, and providing access to research data, creating interdisciplinary partnerships based on shared goals of advancing science and building sustainable infrastructure.

Strategy 2: Facilitate consensus building and advocacy around issues of import across the full lifecycle of research data, from data collection to reuse and beyond. Take the lead in facilitating global discussion around key topics related to research data, with an emphasis on data access and sharing. Through convening groups and creating partnerships, ICPSR will stimulate the development of and compliance with standards and norms of the field, advocate for responsible data stewardship, and help to enhance and expand the shared infrastructure for data.

Strategy 3: Support emerging capacity around stewardship of research data for international partners. Offer capacity-building support to international partners, with a focus on developing countries that are in the process of building new data systems but face resource- and capacity-related barriers.

Strategy 4: Expand the ICPSR membership network to build the global community of scholars, ensuring an open and inclusive structure. Expand membership in parts of the world currently under-represented in the membership, enlarging the ICPSR network, facilitating more research, and supporting new and existing members with additional training and research opportunities.

this is where my job fits: strat #4


Data Sharing for Demographic Research A data archive for demography and population sciences Director: Mary McEniry Research Technician Senior: Andrew Proctor • Research Assistant: Yishi Wang

Background and Mission/Vision Cooperative Agreement Funded by NICHD Purpose • Resource for users and producers of demographic and population science data • Facilitate data sharing for NICHD-funded projects

Who Accesses Our Data?

Files Downloaded: 1,529,474 (2010–2013) • Data Files: 733,270 (48%) • Other Files: 534,060 (35%) • Codebooks: 262,144 (17%)

Profile of Unique Data and Document Users (2010–2013) 12,440 Unique Users • Graduate Student: 47% • Undergraduate Student: 8% • Other categories shown: Faculty, Research Staff, Government Employee, Other

Departmental Affiliation (2010–2013) • Sociology/Demography (2,267): 18% • Economics (3,182): 26% • Psychology (966): 8% • Public Health/Medicine (699): 6% • Political Science/Government (549): 4% • Public Policy/Public Administration (533): 4% • Social Science (524): 4% • Social Work (499): 4% • Criminal Justice (420): 3% • Statistics (404): 3% • Education (400): 3% • Business (294): 2% • Other Departments (1,703): 14%

*Due to rounding, chart does not total 100%

Top Five Downloaded Studies in Health • National Longitudinal Study of Adolescent Health • India Human Development Survey • Welfare, Children, and Families: A Three-City Study • Immigration and Intergenerational Mobility in Metropolitan Los Angeles (IMMLA), 2004 • National Couples Survey, 2005–2006

Services • Data preservation and dissemination • Data sharing guidelines and assistance • Restricted-use data sharing • User support and outreach

Focus on NICHD-Funded Research • Reproductive Health, Fertility, Family Planning, Sexual Behavior • Families and Households • Children and Youth • Health and Mortality

Moving into the Future... We expect to make significant progress in the following areas: • Increased awareness among researchers in the NICHD-funded population centers of the services we offer • Increased engagement with demographers and population scientists in how to improve data sharing in the research community • Increased presence in the international arena

Website and Contact Information www.icpsr.umich.edu/DSDR

Current Projects: Improvements in the Science of Data Sharing Common Catalog We are working to compile into one catalog information about NICHD-funded data collection projects from the last 10 years. When this catalog is completed, investigators will be able to quickly identify projects of interest. Complex Merge Tool We are developing a teaching tool for students who may not feel comfortable merging large data sets because they do not have a firm grasp of statistical software such as Stata or SPSS. Automated System for Processing Restricted Data Applications We have worked over the last year to refine our automated system to fit the needs of one of our most important studies, the Add Health study. Latin American Demography We are working with the Latin American demography association ALAP to hold a series of workshops on data sharing in the Latin American region.
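To make the idea behind the Complex Merge Tool concrete, here is a minimal Python sketch of the basic operation a student would be learning: combining two data files on a shared identifier. This is not the tool itself; the function name, the variable names, and the records are hypothetical, and a real merge in Stata or SPSS would involve more checks.

```python
def left_merge(left, right, key):
    """Attach fields from `right` to every record in `left` that shares `key`.

    Records in `left` with no match are kept unchanged. If `right` repeats
    a key, the last record with that key is the one used.
    """
    lookup = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = lookup.get(row[key], {})
        # Values from the left record win if both sides define a field.
        merged.append({**match, **row})
    return merged


# Hypothetical person-level and household-level files sharing `person_id`.
respondents = [
    {"person_id": 1, "age": 34},
    {"person_id": 2, "age": 51},
    {"person_id": 3, "age": 29},
]
households = [
    {"person_id": 1, "household_size": 4},
    {"person_id": 3, "household_size": 2},
]

merged = left_merge(respondents, households, "person_id")
for record in merged:
    print(record)
```

Respondent 2 has no household record, so that row comes through without a `household_size` value. Spotting and explaining unmatched cases like this is usually the first thing to check after any merge.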

Photo (c) Russ Bowling


Cross National Comparisons Across Low, Middle, and High Income Countries of Poor Early Life Nutrition and Diet and Older Adult Diabetes and Heart Disease Mary McEniry, Research Scientist Figure 1. Components of Daily Caloric Intake per Capita in the 1930s

Background The topic of early life conditions and older adult health in low and middle income countries continues to be of interest. Although data limitations exist, the recent increase in population-based studies of older adults combined with historical data may prove useful in better understanding the determinants of older adult health in these settings.

Methods A subset of cross national survey data on over 144,000 older adults in 20 low, middle and high income countries were used along with historical data on country-level daily caloric intake per capita from Latin America and the Caribbean, Asia, Africa, the US, England and the Netherlands (Figure 1) to estimate multivariate models for adult heart disease and diabetes as a function of childhood birthplace and nutrition and diet, adult education, obesity, smoking, health problems and visits to a doctor. The data are drawn from comprehensive national surveys of older adults or households. From Latin America there are the MHAS, PREHCO, SABE and CRELES. From Asia there are the CHNS, CLHLS, WHO-SAGE-China, IFLS, MHSS, WHO-SAGE-India and SEBAS. From Africa there are WHO-SAGE-Ghana and WHO-SAGE-South Africa. From the developed world there are the HRS, WLS, ELSA and SHARE-Netherlands.

[Figure 1 chart: composition of daily caloric intake per capita (percent, 0 to 100) by diet component: Cereals, F&V, Meat, Dairy, Sugars, Fats, Other. Countries are arranged by mortality regime (Very Early, Early, Mid, Late, Very Late), in chart order: England, Netherlands, US, Argentina, Uruguay, Cuba, Chile, Costa Rica, Puerto Rico, South Africa, Taiwan, Brazil, Mexico, Russia, China, Ghana, India, Indonesia. Caloric supplies, in chart order: 3005, 2958, 3249, 3275, 2902, 2918, 2481, 2014, 2219, 2300, 2153, 2552, 1909, 2827, 2201, 2311, 2021, 2040.]

Note: Graph shows mortality regimes, countries, caloric supplies and composition of diet. Cereals includes cereals, roots, and tubers; F&V includes fruits, vegetables, and pulses.

Results Two contrasting patterns emerged (Figures 2 and 3). The prevalence of diabetes was much higher in selected middle income countries whereas the prevalence of heart disease was higher in the higher income countries. Using multivariate models, as expected, poor early life nutrition was associated with a higher risk of adult diabetes in selected middle income countries (Table 1). In contrast, better early life nutrition was associated with adult heart disease in higher income countries (Table 2). The results remain consistent after controlling for adult lifestyle and socioeconomic status.

Figure 2. Proportion Reporting Diabetes in Relation to Demographic Transition and Early Life Caloric Intake in the Early 20th Century

[Figure 2 chart: studies plotted by demographic transition timing (Very Early, Early, Mid, Late, Very Late) and early life caloric intake (1 - High caloric intake, 2 - Mid-low caloric intake, 3 - Low caloric intake), with proportion levels marked at .1, .15, .2, .25, and .32. Labels shown: Eng, Neth, US-HRS, US-WLS, Arg, Uruguay, Cuba, Chile, CR, PR, Barb, Mex, Mex MHAS, Mex-SAGE, Brazil, S. Africa, Taiwan, Chi-SAGE, Chi-CLHLS, Chi-CHNS, Bang, Indon, India, Ghana.]

Figure 3. Proportion Reporting Heart Disease in Relation to Demographic Transition and Early Life Caloric Intake in the Early 20th Century

[Figure 3 chart: studies plotted by demographic transition timing (Very Early, Early, Mid, Late, Very Late) and early life caloric intake (1 - High caloric intake, 2 - Mid-low caloric intake, 3 - Low caloric intake), with proportion levels marked at .1, .15, .2, .25, and .32. Labels shown: Eng, Neth, US-HRS, US-WLS, Arg, Uruguay, Cuba, Chile, CR, PR, Barb, Mex, Mex MHAS, Mex-SAGE, Brazil, S. Africa, Taiwan, Chi-SAGE, Chi-CLHLS, Chi-CHNS, Indon, India, Ghana.]

Conclusions Differences between the impact of dietary volume and dietary quality (composition of early life diet) on health may partially explain the contrasting cross national patterns between older adult diabetes and heart disease in low and middle income countries. Further investigation is warranted to better understand the contrasting cross national patterns between adult heart disease and diabetes and early life nutrition and diet and to examine the long term consequences of demographic transitions on older adult health.



GIS Resources at the National Archive of Criminal Justice Data NACJD offers • software, workbooks, and tutorials that support geographical data analysis • geocoded data • data with geographical information that can be geocoded by the user • a step-by-step tutorial on how to use NACJD data with GIS software

Software, Workbooks and Tutorials developed with MAPS support • CrimeStat (www.icpsr.umich.edu/CrimeStat): a spatial statistics program for crime incident locations • School Crime Operations Package (School COP): an application for mapping incidents that occur in and around schools (www.schoolcopsoftware.com) • CrimeMap Tutorial: designed for self-paced instruction (www.icpsr.umich.edu/NACJD/cmtutorial.jsp)

Geographic Data in Mapping Software Files • ICPSR Study 4545, Development of Crime Forecasting and Mapping Systems for Use by Police in Pittsburgh, PA, and Rochester, NY • ICPSR Study 2895, Examination of Crime Guns and Homicide in Pittsburgh, PA • ICPSR Study 4546, Exploratory Spatial Data Approach to Identify the Context of Unemployment-Crime Linkages in Virginia • ICPSR Study 4547, Geographies of Urban Crime in Nashville, TN, Portland, OR, and Tucson, Arizona

Data with FIPS Codes include: • ICPSR Study 3398, Gangs in Rural America • Juvenile Court Statistics (series) • Law Enforcement Management and Administrative Statistics (series)

Data with XY Coordinates • ICPSR Study 9998, Arrests as Communications to Criminals in St. Louis • ICPSR Study 6644, Effects of the Baltimore County, MD, Police Department's Community-Oriented Drug Enforcement (CODE) Program

Data with Census Tracts include: • ICPSR Study 6399, Homicides in Chicago • ICPSR Study 3261, Neighborhood Revitalization and Disorder in Salt Lake City

Data with Zip Codes • ICPSR Study 3864, Public Opinion on the Courts in the United States • Annual Survey of Jails in Indian Country (series) • ICPSR Study 2895, Examination of Crime Guns and Homicide in Pittsburgh, PA

References for Linking Geographic Identifiers to Data • Law Enforcement Agency Identifiers Crosswalk [United States], 2005 • Uniform Crime Reports (UCR) and Federal Information Processing Standards (FIPS) State and County Geographic Codes 1990: United States

Our Sponsors The National Institute of Justice's Mapping and Analysis for Public Safety (MAPS) program sponsors grants and research projects to examine crime, law, and public disorder from a geographic perspective. Some MAPS projects led to the development of statistical and mapping software or provided data for use with the GIS software described above.

Other GIS-adaptable data were sponsored by the Bureau of Justice Statistics and the Office of Juvenile Justice and Delinquency Prevention. All of these GIS resources are located on the NACJD web site under the Geographical Information Systems Data Resource Guide.




Infusing Quantitative Literacy Throughout the Social Science Curriculum William Frey, George Alter, John Paul DeWitt, Suzanne Hodge, Lynette Hoelter (University of Michigan), and Flora McMartin (Broad-Based Knowledge, LLC)

This project is one of a pair of coordinated efforts focused on the use of quantitative data in undergraduate courses. Both projects aim to encourage instructors to use social science data to introduce concepts, examine phenomena, and test hypotheses as a way of engaging students with the social sciences in a deeper way. Exposure to real data not only gives students a better sense of how social scientists work, it also reinforces quantitative literacy skills such as reading tables, interpreting graphs, setting up and conducting basic analyses, and evaluating empirical arguments. These valuable skills will be useful in students’ everyday lives, even if they do not take another social science course. Including data early and often in the curriculum should also create a foundation for later research methods and statistics courses.

Main Objectives

Project Findings to Date

• Develop an operational definition of quantitative literacy as it is applied in the social sciences, match this definition to specific student learning outcomes, and create assessment tools based on those outcomes. Have faculty field test the tools. • Examine the ways in which faculty members find, choose, adopt, and use online data-driven learning modules. • Introduce faculty to the American Community Survey and its utility in the classroom through the development of new modules, Webinars, and workshops.

• A rubric defining the elements of quantitative literacy was derived and example items measuring each learning objective were identified. • Approximately ten faculty members incorporated the rubric and specified learning objectives for data-driven learning modules used in their own courses and created assessment strategies to measure changes in students’ quantitative literacy. All found that students were more confident in using tables and numbers than they were early in the course. Reading crosstabulation tables remained difficult for students, though the percentage interpreting them correctly rose substantially in each class. The faculty represented a variety of types of institutions (community college, liberal arts college, research university) and all found effects of module use. • The process of using the modules and assessing the effects made instructors more aware of pedagogical issues and some changed their approach to creating and grading assignments as a result. • Across the group of participants, some had a much easier time incorporating modules into their courses than others. That difference provided a window into unanticipated barriers to this teaching technique. • A large-scale survey of social science faculty provided a baseline for understanding the use of data-driven learning guides and other digital resources in the classroom.

Research Questions 1) Why do instructors include data-driven learning modules in their courses? Is quantitative literacy a specific learning objective? What factors influence decisions to adopt/use data-driven learning modules? 2) What does “quantitative literacy” mean to social science faculty? How can student learning be measured? 3) Do modes of dissemination (e.g., workshop, Website, partnering with other faculty) differ in their impact on long- and short-term teaching practices?

Next Steps • Participants are currently collecting a second semester’s worth of assessment data as well as reflecting on the process of incorporating the data-driven learning modules. • A survey and interviews with individual faculty members will be conducted to examine not only the effectiveness of teaching quantitative literacy using data-driven learning modules, but also whether the context of one’s initial exposure to doing so influences the potential to create, adapt, or adopt such modules. This will inform later dissemination efforts.

Dimension

1 Unacceptable

2 Acceptable

3 Accomplished

Calculation: Ability to perform mathematical operations

Performs few/less than half of calculations correctly.

Successfully performs many calculations but patterns of errors are evident.

Successfully performs most calculations, errors are rare

Interpretation: Ability to explain information presented in a mathematical form (e.g., tables, equations, graphs, or diagrams)

Incorrectly explains information in key forms of presentation or with many errors across types of data.

Correctly explains information in some forms correctly (but not others) or makes several errors across various data forms

Correctly explains information in most forms consistently or makes few errors across various data forms.

Representation: Ability to convert relevant information from one mathematical form to another (e.g., tables, equations, graphs or diagrams)

Unable to convert data from one mathematical form into any other form or makes significant errors when doing so.

Able to convert data from some mathematical forms into some, but not all, other forms or converts among all forms with several errors.

Able to convert data from most mathematical forms into other forms, or converts among all forms with a few errors.

Analysis: Ability to make judgments based on quantitative analysis

Rarely or never makes correct judgments based on data presented.

Generally makes correct judgments based on data presented.

Often makes correct judgments based on data presented.

Method selection: Ability to choose the mathematical operations required to answer a research question

Consistently unsure of the correct mathematical operation (e.g., the correct measure of centr tendency or bivariate tests appropriate to the level of measurement) to answer research questions.

Accurately chooses the correct mathematical operation to answer research questions some of the time.

Accurately chooses the correct mathematical operat answer research questions most of the time.

Estimation/Reasonableness Checks: Ability to recognize the limits of a method and to form reasonable predictions of unknown quantities

Unable to assess the limitations of a method or to predict quantities that are reasonable based on relevant data.

Able to assess the limitations of some methods or under some circumstances. Predicts reasonable quantities in many cases.

Able to assess the limitations of most methods under most circumstances and typically predict reasonable quantities based on relevant data.

Communication: Ability to use appropriate levels and types of quantitative information (data, reasoning, tools) to support a conclusion or explain a situation in a way that takes the audience into account.

Fails to develop an argument or bases it on weak or incorrect quantitative information. Presents the information without taking the audience into account.

Develops an argument using quantitative information that is incomplete, irrelevant, or somewhat misinterpreted, therefore weakening the argument. The argument may not

Develops an argument using quantitative information that is either slightly incomplete, not the most relevant, or with slight misinterpretations, or presents he argument in a way that does not fit the intended audience.

Find/Identify/Generate Data: Ability to identify or generate appropriate information to answer a question

Rarely able to determine what kind of data would be appropriate to answer specific questions.

Generally able to determine what kind of data would be appropriate to answer specific questions.

Almost always able to determine what kind of data would be appropriate to answer specific questions.

Research design: Understand the links between theory and data

Unable to identify dependent/independent/control variables, or to determine the variables that may be important in examining an issue. Cannot write proper research questions and/or is not able to critique the use of data in popular media.

Correctly performs some of the following: 1) identifies dependent/independent/control variables, 2) determines what variables may be important in examining an issue, 3) writes viable research questions, and/or 4) is able to critique the use of data in popular media, or performs several of them but with some errors.

Correctly performs most of the following: 1) identifies dependent/independent/control variables, 2) determines what variables may be important in examining an issue, 3) writes viable research questions, and/or 4) is able to critique the use of data in popular media, or performs them all with few errors.

Confidence: Level of comfort in performing and interpreting a method of quantitative analysis

Uneasy about completing most or all tasks related to quantitative data.

Expresses confidence in completing some tasks related to quantitative data.

Expresses confidence in completing most tasks related to quantitative data.

Content learning outcomes vary by module/course.


Exploring New Methods for Protecting and Distributing Confidential Research Data

Felicia B. LeClere, National Opinion Research Center

Bryan Beecher, University of Michigan

How can we better protect confidential research data? Digital repositories currently ask researchers to create and implement an IT security plan. Even with the best of intentions, creating such a plan carries many risks: Does the researcher have access to the IT resources to create a credible plan? What if the plan is not fully implemented? And if it is implemented at the start of the research project, is the plan updated over time?

How can we make confidential research data more accessible? Understandably, data producers are concerned about the security of the data they collect. They must balance the potential opportunities presented by secondary use of their data against the risk that the data may be lost or stolen. Data producers will not share their results if they are not confident that the data will be used in a safe and secure environment.

How can we lower the barriers to using confidential research data? Researchers have ready access to public-use data from many different sources: the transaction cost is low, typically requiring only that they log in to a web portal to download the desired content. The transaction cost of using restricted-use data is much higher, often including the need to construct a highly secure computing environment, an expensive task that slows the research process.

This work has been conducted under NIH grant 1RC1HD063792-01 from the Office of the Director of the Eunice Kennedy Shriver National Institute of Child Health and Human Development.

Steve Burling

University of Michigan

Cloudlet Chooser: Collects system & application selections from the researcher

Cloudlet Launcher: Automates cloud-based workstation and storage provisioning

Research Plan Research Credentials

Trusted Digital Repository

Cloudlet Eraser: Securely erases the researcher’s workspace

Stuart Hutchings University of Michigan

White Hat hacker to probe our system for vulnerabilities

Network Operations Center to measure performance and availability

http://www.flickr.com/photos/solo_with_others/558162687/

Traditional Sharing Protocol: Research Plan, Research Credentials, IT Security Plan; Trusted Digital Repository; Confidential Data

CI-enabled Sharing Protocol: Research Plan, Research Credentials; Trusted Digital Repository; Confidential Data, Secure Storage, Secure Compute

In this project, the Inter-university Consortium for Political and Social Research and partners at the RAND Corporation and the Survey Research Center at the University of Michigan will build and test a data storage and dissemination system for confidential data, which obviates the need for users to build and secure their own computing environments. Recent advances in public utility (or “cloud”) computing now make it feasible to provision powerful, secure data analysis platforms on demand. We will leverage these advances to build a system which collects “system configuration” information from analysts using a simple web interface, and then produces a custom computing environment for each confidential data contract holder. Each custom system will secure the data storage and usage environment in accordance with the confidentiality requirements of each data file. When the analysis has been completed, this custom system will be fed into a “virtual shredder” before final disposal. This prototype data dissemination system will be tested for (1) system functionality (i.e., does it remove the usual barriers to data access?); (2) storage and computing security (i.e., does it keep the data secure?); and (3) usability (i.e., is the entire system easier to use?). Contract holders of two major data systems (the Panel Study of Income Dynamics and the Los Angeles Family and Neighborhood Study) will be recruited to assess both the user interface and the analytic flexibility of the new customized computing environments.
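The lifecycle described above (collect the analyst's selections, provision a custom environment, shred it when the analysis ends) can be sketched in Python. This is only an illustration under stated assumptions, not the project's implementation: the names `Selections`, `Cloudlet`, `choose`, `launch`, and `erase` are hypothetical, and an in-memory byte buffer stands in for real cloud storage and provisioning.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Selections:
    """What a hypothetical Cloudlet Chooser collects from the researcher."""
    contract_id: str
    operating_system: str
    applications: List[str]


@dataclass
class Cloudlet:
    """In-memory stand-in for a provisioned workstation plus its storage."""
    selections: Selections
    storage: Optional[bytearray] = field(default_factory=lambda: bytearray(16))
    running: bool = True


def choose(contract_id: str, operating_system: str, applications: List[str]) -> Selections:
    # Chooser: validate and record the system and application selections.
    if not applications:
        raise ValueError("at least one application must be selected")
    return Selections(contract_id, operating_system, sorted(applications))


def launch(selections: Selections) -> Cloudlet:
    # Launcher: a real system would call a cloud provider here (assumption);
    # this sketch only builds an object that represents the environment.
    return Cloudlet(selections=selections)


def erase(cloudlet: Cloudlet) -> None:
    # Eraser: overwrite the workspace with zeros, release it, and stop the host.
    if cloudlet.storage is not None:
        for i in range(len(cloudlet.storage)):
            cloudlet.storage[i] = 0
    cloudlet.storage = None
    cloudlet.running = False


picks = choose("contract-001", "linux", ["stata", "sas"])
env = launch(picks)
env.storage[:4] = b"data"  # simulate confidential data written during analysis
erase(env)
```

Keeping the three stages as separate functions mirrors the Chooser, Launcher, and Eraser components named on the poster, so each stage could be tested or replaced independently.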


Using Quantitative Data in Teaching: ICPSR Resources

Online Learning Center (OLC) www.icpsr.umich.edu/OLC

The OLC is made up of flexible, stand-alone Data-driven Learning Guides that can be adapted to fit one’s teaching style. Meant primarily for intro-level courses, each Web-based Learning Guide focuses on a basic social science concept (e.g., political attitudes, deviance, power in intimate relationships) and utilizes survey data to help students connect what they are learning with real-world examples. In addition to actively learning about the focal topic, students are exposed to the research process and learn how to build data-based arguments. Statistical tests range from measures of central tendency to regression, with an emphasis on crosstabulation.

TeachingWithData.org www.teachingwithdata.org

TeachingWithData.org, funded by NSF and created in partnership with the Social Science Data Analysis Network, is a portal for faculty with links to resources that reduce the challenges of using data in the classroom. It contains readily available, user-friendly, data-driven teaching materials including exercises and games, interactive and static tables and maps, and pedagogical references. All resources are tagged with such metadata as topic, context of use, and a short description for easy searching.

Exploring Data Through Research Literature (EDRL) www.icpsr.umich.edu/EDRL

The EDRL presents a unique approach to teaching about research. Using ICPSR’s Bibliography of Data-related Literature, this series of exercises demonstrates that social science research is not conducted in a vacuum. Instructors choose a focal article from the Bibliography and students are asked to find additional articles that share a characteristic (topic, author, dataset) with that article. EDRL is particularly useful for methods or capstone courses where students plan their own research projects.

Investigating Community and Social Capital (ICSC) www.icpsr.umich.edu/ICSC

Based on Robert Putnam's Bowling Alone, the ICSC introduces students to quantitative research using a case study of social capital. Sequenced activities model the research process by helping students use a codebook to operationalize concepts and carry the project through to analysis and interpretation. Important concepts such as unit of analysis, levels of measurement, longitudinal vs. cross-sectional data, and replication are also highlighted.

Voting Behavior: The 2008 Election (SETUPS 2008) www.icpsr.umich.edu/SETUPS2008

The 2008 American National Election Study is used to explore characteristics of voters and their behaviors. The module provides information on survey research and the election as a foundation for the analyses. The 2008 edition continues the American Political Science Association’s Supplementary Empirical Teaching Units in Political Science (SETUPS) series.

Many of ICPSR’s value-added resources are useful in undergraduate instruction. For example, the Social Science Variables Database is a great way to introduce operationalization, survey design effects, and changes in question wording/importance of various topics over time. The Bibliography of Data-Related Literature is a good starting point for students who have trouble coming up with a research topic or the dataset with which to study it. Data are classified by subject into Thematic Collections and the Survey Documentation and Analysis (SDA) online statistical software allows for easy exploration of almost 600 of the datasets within ICPSR’s holdings.

STUDENT OPPORTUNITIES

TEACHING RESOURCES

Lynette F. Hoelter & Suzanne Hodge, University of Michigan

Research Paper Competition

www.icpsr.umich.edu/icpsrweb/ICPSR/prize/index.jsp

Each year, ICPSR sponsors competitions for undergraduate and Master’s-level research papers. Authors must use data within ICPSR’s holdings to critically analyze a social scientific topic. Cash prizes are awarded to winners in each category and both the student author and his/her mentor are recognized in printed and online materials.

Undergraduate Internships

www.icpsr.umich.edu/icpsrweb/ICPSR/careers/internship.jsp

Ten-week paid internship opportunities offer students firsthand experience with data-handling and the analysis of secondary quantitative data! Interns participate in processing a study for dissemination and preservation, conduct a research project with the help of a faculty mentor (resulting in a conference-ready poster), and take classes in the ICPSR Summer Program in Quantitative Methods of Data Analysis.


Online Learning Center: Supporting Quantitative Literacy

ICPSR’s Online Learning Center (OLC) tools and resources assist faculty who strive to help students open the door to the world of statistical inquiry and critical thinking. The OLC is:
• Informed by discussions with teaching faculty
• Focused on bringing data into the classroom
• Designed to meet faculty’s most common challenges: the ability to quickly locate relevant datasets that are easy to work with and clearly demonstrate the concepts or relationships
• Formatted to enable faculty to customize to their personal teaching approaches and incorporate into their individual syllabi
• Intended to provide a portal where faculty can locate resources dedicated to enhancing quantitative literacy in the social sciences

The OLC helps students:
• Test ideas and hypotheses
• Enter the world of statistics and methods without intimidation
• Engage in fact-based discussion

Lynette F. Hoelter, University of Michigan

OLC Content

Future Possibilities

Current OLC Instructional Resources

The OLC will continue to develop data-driven Learning Guides and provide resources for teaching faculty including:

Data-Driven Learning Guides
• Designed to introduce students to social science topics, research methods and statistics, and data interpretation through hands-on use of datasets held within ICPSR
• Common template across all guides that describes the focal concept and chosen datasets, demonstrates the concepts, and suggests interpretation issues and reference materials
• Substantive topics reflect those typically taught in introductory (100-200 level) courses, but the variety in coverage and statistical sophistication allows for use in more narrowly focused substantive classes
• Analysis is simplified through the recoding of variables to collapse categories or deal with missing data as appropriate

MyClass Tool: Enables instructors to quickly register students en masse for temporary ICPSR MyData accounts

Web Resources: Links to ICPSR and external resources for teaching quantitative skills

OLC Listserv: Instructors receive news about OLC teaching resource updates

OLC Blog: Public forum where instructors can share ideas about how to strengthen quantitative literacy in social science courses

• Instructions and templates for faculty to submit their own Learning Guides through the Faculty Contribution Center
• Learning Guides with a variety of topics, data, and statistical techniques
• Assessment Center will highlight the most frequently used Learning Guides and assist in measuring student learning outcomes

www.icpsr.umich.edu/olc Goal & Concept

• Enhanced MyClass registration system

About ICPSR

Example Results

ICPSR is the world’s largest archive of digital social science data. We acquire, preserve, and distribute original social science research data.

What We Do

ICPSR, a membership organization, is a vital partner in social science research and instruction. We support students, faculty, researchers, and policy makers who seek to:
• Write articles, papers, or theses to fulfill undergraduate or graduate requirements
• Conduct secondary research to better understand results of a study, support findings of primary research, or generate new findings
• Preserve and disseminate primary research data
• Study or teach statistical methods

Interpretation of Results


George Alter, John P. DeWitt, William Frey, Suzanne Hodge, and Lynette Hoelter University of Michigan

TeachingWithData.org is a portal where faculty can find resources to bring real data into post-secondary classes. Instructors are encouraged to infuse quantitative reasoning throughout the social science curriculum with readily available user-friendly, data-driven instructional materials. Students benefit by practicing quantitative skills, being exposed to the creativity and excitement of empirical research, minimizing the disconnect between substantive and methods courses, and gaining a better understanding of how social scientists work.

Data Translation

Making existing data available to less experienced users through the:
• Creation of Extracts and Tools to assist faculty in developing custom extracts for common packages such as SPSS, SAS, and Stata.
• Inclusion of Online Analysis Software such as SDA and WebCHIP to alleviate the need for additional statistical packages.
• Repackaging of the American Community Survey for Instruction with simple, easy-to-use extracts of the ACS for multiple levels of geography and covering a wide range of topics and applications.

Data-Related Tools

Assisting faculty to create new data-based learning modules for their own courses by:
• Automating Data Extraction to allow instructors or students to easily identify relevant data and create custom data extracts using only the variables needed for an exercise.
• Connecting Data With Analysis/Visualization Tools to display survey data as tables, graphics, and maps.
• Creating a Toolkit to simplify the process of creating and sharing data-driven learning modules.

Community Building

Training faculty and encouraging use and sharing of materials through workshops, webinars, and outreach activities. Web 2.0 features and involvement of editorial and advisory committees provide opportunities for interaction and collaboration among users.

Repository

Cataloging of existing learning tools and other resources to help incorporate data from government sources, opinion polls, and social science surveys into student learning. Resources are tagged with metadata to facilitate searching for appropriate materials. Examples of captured materials include:
• Tabular and downloadable data
• Analysis and visualization tools
• Student exercises
• Games and simulations
• Statistics information
• Interactive and static data-based maps
• Pedagogical/SoTL resources

About the Partners

ICPSR is the world’s largest archive of digital social science data. We acquire, preserve, and distribute original social science research data. ICPSR is a vital partner in social science research and instruction by supporting students, faculty, researchers, and policymakers working with primary and secondary data.

Project Partners TeachingWithData.org is a partnership between the Inter-university Consortium for Political and Social Research (ICPSR) and the Social Science Data Analysis Network (SSDAN), both at the University of Michigan. The project is funded by NSF Award 0840642, George Alter (ICPSR), PI and William Frey (SSDAN), co-PI.

Primary Functions of TeachingWithData.org

SSDAN is a university-based organization that creates demographic media, such as user guides, Web sites, and hands-on classroom computer materials that make U.S. census data accessible to policymakers, educators, the media, and informed citizens.

Additional Collaboration and Support From:
• American Economic Association Committee on Economics Education
• American Political Science Association
• American Sociological Association
• Association of American Geographers
• Consortium for the Advancement of Undergraduate Statistics Education
• The MAA Mathematical Sciences Digital Library
• National Numeracy Network
• Science Education Resource Center at Carleton College


Secure Data Services

Technologies focused on broadening access to sensitive research data while simultaneously reducing disclosure risk

broadening access. reducing risk.

Why “Restricted” Data?

ICPSR ensures respondent confidentiality by removing, masking, or collapsing variables in public-use versions of research datasets.

Sometimes protective measures taken to reduce disclosure risk significantly degrade the research utility of the data. In these cases, ICPSR provides access to restricted-use versions that retain confidential data by imposing stringent requirements for access.

Why Online Contract Ingest and Tracking?

• Enables data producer to establish custom terms of data use and contract behavior preferences
• Enables end users to apply for restricted data online and track progress
• Enables ICPSR user support to track hundreds of approved end users electronically

Why Online Analysis and Disclosure Risk Review?

• Sensitive data no longer ‘shipped’ on removable media
• Reduces data user’s data protection plan requirements
• Enables data producer, if desired, to conduct additional disclosure reviews prior to release of new analyses

ICPSR Secure Data Services

Flowchart phases: 1. Obtain Access; 2. Use the VDE; 3. Export Results; 4. Obtain Results

Flowchart steps (responsible party in parentheses, supporting system in brackets):

• Locate Desired Data (user) [ICPSR Web site]: Locate data of interest on an ICPSR archive Web site.
• Request Access to Data (user) [IDARS portal]: Find, complete, and submit the IDARS web form to begin the contracting process.
• Approval Process (icpsr, sponsor, drb): Review application for completeness. Review application for minimum requirements. Respond with Acceptance, simple Rejection, or Rejection with encouragement to revise and resubmit.
• Revise and Resubmit?: If no, the request is abandoned.
• Complete Training (users) [Online Training System]: Receive training for disclosure. Certification is granted for a specified time period (at most 1 year). Users will need to be recertified if they need access beyond that time period.
• Issue Training Reminders (icpsr): Send automated IDARS-triggered reminders to users and ICPSR User Support to ensure compliance with disclosure procedures, including user recertification as necessary.
• Prepare Data and Docs (icpsr): Prepare data and documentation files for the users to use in the VDE. This may include merging external data at the user’s request.
• Provide VDE Instance (cns): Create a virtual host machine accessible from the users’ IP address(es). Generate a user ID and password for each user. Future: Verify users’ identity via bio-identification. Utilize a worldwide network of sentinels (like notaries).
• Notify Users: VDE is Ready (icpsr): Notify users that their VDE instance is ready, and they can begin working with it. Convey to the users their IDs and passwords.
• Download VDE Client Software (users) [VDE Client]
• Begin Using VDE (users)
• Provide Help (icpsr user support)
• Conduct Site Visit
• Place Results in Export Finder
• Review Results (icpsr, sponsor, drb) [IDARS portal]: Conduct a disclosure review of the users’ results.
• Export Results (icpsr, icpsr data group supervisor): Export result files from the VDE to the user.
• Disable User Access (cns): Takes place when the contract ends.
• Delete User Workspace (cns): Takes place when the workspace storage commitment expires. ICPSR retains the user’s VDE workspace for 3 years after the end of their contract. Users may contact ICPSR to regain access.

Two elements of the flowchart are marked “to be constructed.”

ICPSR Virtual Data Enclave

• Applications: Microsoft Office, SAS, SPSS, Stata, SUDAAN
• U-M Information Technology Services’ Virtual Data Infrastructure
• ICPSR Computing and Network Services’ secure data environment
• User Workspace with export folder

Challenges & Future Developments

• New cost structures may include virtual data labs and/or data user fees
• Costs are variable and attributed to each data user machine, a challenge for the consortium’s traditional pooling of funds approach

COMING SOON: ICPSR video data dissemination technology!

icpsr: An ICPSR topical archive
cns: ICPSR’s Computing and Network Services Group
sponsor: An external data provider
drb: An external Disclosure Review Board
IDARS portal: ICPSR Data Agreement Request System Web Portal
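The two time limits stated in the flowchart (disclosure certification lasting at most 1 year, and workspaces retained for 3 years after a contract ends) lend themselves to a small date calculation. The sketch below is illustrative only: the function names are hypothetical, and treating "1 year" as 365 days is an assumption, since the poster does not say how the period is counted.

```python
from datetime import date, timedelta

MAX_CERTIFICATION_DAYS = 365  # "at most 1 year"; counting it as 365 days is an assumption
RETENTION_YEARS = 3           # workspace is retained 3 years after the contract ends


def certification_expiry(certified_on: date, valid_days: int = MAX_CERTIFICATION_DAYS) -> date:
    # Cap any requested validity period at the one-year maximum.
    return certified_on + timedelta(days=min(valid_days, MAX_CERTIFICATION_DAYS))


def needs_recertification(certified_on: date, today: date) -> bool:
    # A user must recertify once the certification period has passed.
    return today > certification_expiry(certified_on)


def workspace_deletion_date(contract_end: date) -> date:
    # Add three calendar years; Feb 29 falls back to Feb 28 in a non-leap year.
    target_year = contract_end.year + RETENTION_YEARS
    try:
        return contract_end.replace(year=target_year)
    except ValueError:
        return contract_end.replace(year=target_year, day=28)


expiry = certification_expiry(date(2012, 3, 19))
overdue = needs_recertification(date(2012, 3, 19), date(2013, 3, 20))
delete_on = workspace_deletion_date(date(2012, 6, 30))
```

A reminder job of the kind described under Issue Training Reminders could call `needs_recertification` for each approved user, though how IDARS actually schedules reminders is not specified in the source.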

Cole Whiteman • colew@umich.edu • 3/19/2012 Confidential and Proprietary • Not for Redistribution


Finding a Needle in a Haystack: The Theoretical and Empirical Foundations of Assessing Disclosure Risk for Contextualized Microdata

Kristine Witkowski, University of Michigan

Study #1: Risk Posed by Population Uniques Offset by Other Reidentification Factors

Motivation

• The scientific value of contextual data has led to a growing demand for spatial information.

• What factors should be considered when assessing disclosure risk for contextualized microdata?

• Limited vs. full search

• Search priorities

Haystack Composition

Population Size

Components of Risk

Hard-to-Count Composition

Not Named

Contextual Composition Known Respondent Location.

Many Named

• Unable to assign names, uncertainty from missing twins
• Coverage error of ID files
• Accuracy of haystack size

Respondent from Group-1 / Group-2

Reidentification Probabilities

Few Named

Haystack for Group-1 / Group-2 Under Counted Zero Probability of Respondent from Group-1 / Group-2

Data and Methods

• Simulations using an artificial set of survey respondents that reflects the spatial dispersion and composition of the U.S. population
• Compositional measures: Race/ethnicity, socioeconomic, social context, and labor market

Study #2 — Risk Rises with Small Study Area, Number of Geography Attributes, and Measurement Detail

• Spatial scale: Counties, tracts, and block groups

Absolute and Proportional Change in Standardized Number of Look-Alike Geographic Units

1,082

Obscured by Look-Alike Geographies.

• Study #2: Standardized Number of Look-Alike Geographies
» Averages for survey respondents with look-alike geographies having populations above and below 100,000
» Reflects average population size of counties
» Function of design elements

% Not Named

% Small Number of Look-Alike Individuals

Reidentification Probabilities

Reidentified Respondent

Many Unique

% Small Number of Look-Alike Geographies

• Ability to pinpoint location and respondent
• Number of look-alike geographies (i.e., context)
• Number of look-alike individuals (i.e., twins, haystack size)

Risk Measures

• Study #1: Number of Look-Alike Geographies and Individuals, Proportion of Tracts Undercounted
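The idea of counting look-alikes can be illustrated with a short Python sketch. It is not the study's simulation code: the geographic units, the attribute names (`region`, `density`, `income`), and the function names are all hypothetical. A unit's look-alikes are taken here to be the other units that share its values on a chosen set of key attributes, and a unit with no look-alikes is unique on those keys.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Unit = Dict[str, str]


def signature(unit: Unit, keys: Sequence[str]) -> Tuple[str, ...]:
    # The combination of attribute values an intruder could match on.
    return tuple(unit[k] for k in keys)


def look_alike_counts(units: List[Unit], keys: Sequence[str]) -> List[int]:
    # For each unit, count the other units that share its values on `keys`.
    totals = Counter(signature(u, keys) for u in units)
    return [totals[signature(u, keys)] - 1 for u in units]


def count_uniques(units: List[Unit], keys: Sequence[str]) -> int:
    # Units with zero look-alikes are unique on the chosen keys.
    return sum(1 for n in look_alike_counts(units, keys) if n == 0)


# Hypothetical geographic units described by three contextual attributes.
tracts = [
    {"region": "Midwest", "density": "low", "income": "mid"},
    {"region": "Midwest", "density": "low", "income": "high"},
    {"region": "Midwest", "density": "high", "income": "mid"},
    {"region": "South", "density": "low", "income": "mid"},
    {"region": "Midwest", "density": "low", "income": "mid"},
]

one_key = look_alike_counts(tracts, ["region"])
two_keys = look_alike_counts(tracts, ["region", "density"])
three_keys = look_alike_counts(tracts, ["region", "density", "income"])
```

In this toy data, each added key shrinks the pool of look-alikes and raises the number of unique units, which is the direction of the relationship named in the Study #2 heading for the number of geography attributes.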

Population Uniques

Few Unique

Population Uniques Haystack

Few

% Respondents

Intruder Search Behavior

Field of Look-Alike Geographies

• But to release geographically-rich microdata, one must protect both the utility of the collected information and the privacy of subjects.

Respondents From Context

Many

Amassed Look-Alike Individuals Not Named.

• Sources: 2000 U.S. Census of Population and Housing; Tract Level Planning Database with Census 2000 Data (Bruce and Robinson 2003)

Figure 1. Scope of Study and Reductions in Full Intruder Search Effort. Absolute and proportional change in standardized number of look-alike geographic units for county, tract, and block group, by scope of study: National, Population Density, Division, Division & Pop. Density, State, State & Pop. Density.

Figure 2. Number of Geographic Attributes and Reductions in Full Intruder Search Effort. Absolute and proportional change in standardized number of look-alike geographic units for county, tract, and block group, by number of keys: 1-Key, 2-Keys, 3-Keys, 4-Keys, 5-Keys.

Figure 3. Measurement Detail of Geographic Attributes and Increases in Full Intruder Search Effort. Absolute and proportional change in standardized number of look-alike geographic units for county, tract, and block group, by measurement detail: 1%, 5%, 10%, 15%, 20%, 25%.

Conclusions

• Contextualized microdata may be a viable method of safely distributing geographically rich information, particularly for county-level contexts.
• Coverage error has a potentially important role in ensuring the anonymity of respondents.

Research support from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), Grant 5 P01 HD045753 as a supplement to the project Human Subject Protection and Disclosure Risk Analysis, is gratefully acknowledged.


Identifying Measurement Error Introduced by Harmonization of Ancestry Data

What is the Integrated Fertility Survey Series?

The Integrated Fertility Survey Series (IFSS) is a project of the Population Studies Center and the Inter-university Consortium for Political and Social Research at the Institute for Social Research at the University of Michigan. With funding from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD, 5R01 HD053533), IFSS is developing a harmonized dataset of U.S. family and fertility surveys conducted between 1955 and 2002, including the Growth of American Families surveys, the National Fertility Surveys, and the National Surveys of Family Growth.

Harmonization Process

Five principles for measuring origin categories over time (Hahn & Stroup, 1994):

1. Conceptually valid

2. Mutually exclusive and exhaustive

3. Consistent (same concept over time, studies)

4. Flexible (captures different terms over time; e.g., “Negro” or “Black”; “Spanish” or “Hispanic”)

5. Construct validity (O’Leary-Kelly and Vokurka, 1998; Carmines and Zeller, 1979) Key • Unidimensionality • Convergent and discriminant validity

• Concurrent validity/measurement error

Discrete Origin Response Options by Study

Christopher Ward Jeremy Albright Felicia B. LeClere

Pamela J. Smock, P.I. Peter Granda Lynette F. Hoelter

University of Michigan

Background Ex-post data harmonization of ambiguously comparable variables poses challenges, including how best to harmonize ex-post data on respondent origin from seven surveys.

Summary of Relevant Questionnaire Differences

Response options, question structure, question text, and variable availability vary across studies. Availability of different origin groups in a given study depends on immigration patterns over time. The way those groups were captured depends on the structure of each questionnaire.

Questionnaire feature                           1955 GAF     1960 GAF     1965 NFS     1973 NSFG    1976 NSFG    1982 NSFG    1988 NSFG
# unique origin variables                       1            1            1            16           1            2            16
Multiple origins within some response options?  Yes          Yes          No           No           Yes          No           No
Open-ended responses possible?                  Yes          Yes          No           No           No           No           No
Question text: “origin” or “nationality”        Nationality  Nationality  Nationality  Origin       Origin       Origin       Origin
Father’s or respondent’s origin?                Father       Father       Father       Respondent   Respondent   Respondent   Respondent

Construct Validity

Test: A factor analysis was performed on all six harmonized variables to determine the unidimensionality of each origin construct.

Result: Each harmonized origin variable loads strongly onto one of five factors. W. European and African origin variables load onto a common factor, though oppositely.

Factor Analysis of Harmonized Origin Variables: Varimax-rotated Factor Loadings (1)

Variable       Loading
W. Europe      -0.9032
E. Europe      .9939
Asia           .9998
Africa         .9189
Latin America  .9956
All other      .9946

(1) Factor loadings less than .3 have been suppressed. Loadings are spread across Factors 1 to 5.

Concurrent Validity

Test: Tetrachoric correlations were computed for each pair of harmonized origin variables and previously established correlate variables.

Results: Tetrachoric correlations generally conform to expected relationships between harmonized origin variables and related religion, race, and region-of-residence variables (see handout).

Construct Validity: Convergent and Discriminant Validity Test: A logit regression was performed on each harmonized origin variable for the 1988 NSFG. Explanatory variables included race, religion, and region of residence. To address the classificatory power of the logit model, we produced a classification table indicating the proportion of cases in which predicted mention of origin matches observed mention of origin. Predicted probabilities of an origin’s mention were calculated for each harmonized origin variable from the logit model. Correlations between predicted probabilities and harmonized origin variables were calculated to test convergent and discriminant validity.

Percentage of Cases Correctly Classified

Origin         Percentage correctly classified
W. Europe      85%
E. Europe      95%
Latin America  93%
Asia           96%
Africa         95%
All other      90%

Pearson Correlations between Predicted Probabilities and Harmonized Origin Variables

Origin         Correlation with Pr (origin mentioned)
W. Europe      .67
E. Europe      .42
Latin America  .38
Asia           .56
Africa         .92
All other      .18

Results: The logit model has strong classificatory power, correctly classifying each origin at high rates (85 to 96 percent). Inspection of the distributions of predicted probabilities suggests strong discriminant validity and moderate-to-strong convergent validity (see handout). All but one of the harmonized origin variables have reasonably high correlations with the predicted probabilities established from the logit regression model, suggesting both high convergent validity and high discriminant validity.
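The two summary statistics used in this test, the percentage of cases correctly classified and the Pearson correlation between predicted probabilities and the observed origin variable, can be computed as in the Python sketch below. The data are made up for illustration and are not IFSS results, the function names are hypothetical, and the 0.5 classification cutoff is an assumption because the poster does not state the threshold used.

```python
from math import sqrt
from typing import Sequence


def percent_correctly_classified(observed: Sequence[int],
                                 predicted_prob: Sequence[float],
                                 cutoff: float = 0.5) -> float:
    # A case counts as correct when the prediction (probability at or above
    # the cutoff) matches the observed 0/1 mention of the origin.
    hits = sum(1 for y, p in zip(observed, predicted_prob) if (p >= cutoff) == bool(y))
    return 100.0 * hits / len(observed)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    # Pearson correlation: covariance divided by the product of the
    # standard deviations (the common 1/n factors cancel out).
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Illustrative data only: 1 means the origin was mentioned, 0 means it was not.
observed = [1, 0, 0, 1, 0, 0, 0, 1]
predicted = [0.9, 0.2, 0.1, 0.4, 0.3, 0.6, 0.1, 0.8]

rate = percent_correctly_classified(observed, predicted)
r = pearson(observed, predicted)
```

A high classification rate can coexist with a low correlation when an origin is rarely mentioned, because predicting "not mentioned" for every case is then usually correct. That is one possible reading of the "All other" row above, though the poster does not discuss it.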

Discussion Harmonized origin variables generally demonstrate strong validity over time. The use of harmonized data requires acknowledgement of error introduced. Tradeoff: Harmonization renders more efficient analysis at the cost of introducing error. A model is needed to estimate the extent to which some origin groups are un- or undercounted.


Building Partnerships Between Social Science Data Archives and Institutional Repositories

Jared Lyle (lyle@umich.edu), George Alter (altergc@umich.edu), Amy Pienta (apienta@umich.edu)

Fostering Partnerships

This project, which is supported by an award from the Institute of Museum and Library Services (IMLS), is a first step in fostering partnerships between domain archives and Institutional Repositories. ICPSR distributes social science data to more than 700 member universities, colleges, and research organizations.

As our members develop their own repositories and digital archiving capacities, we are eager to develop new services to help Institutional Repositories discover, evaluate, curate, preserve, and disseminate social science data.

We will follow the framework outlined by Myron Gutmann and Ann Green in their article “Building Partnerships Among Social Science Researchers, Institution-based Repositories, and Domain Specific Data Archives.”

I. Even at the early stages, research projects can benefit from conversations with repository experts about intellectual property issues, long-term digital preservation planning, access controls, confidentiality considerations, file format options, and metadata standards.

II. High-level metadata can be produced and passed from researchers to local repositories when initial work gets underway. This can trigger a local discussion about data processing requirements and what the research team needs to know to deposit data in IRs, with the later possibility of passing archive-ready versions of data to domain-specific repositories.

III. Repository experts can provide tools and support for long-term sharing and archiving, as well as guidelines for data processing and metadata production, confidentiality review, and other requirements.

IV. When the research project is at the publication and sharing stage, the data are moved into the repositories and shared services are developed and supported.

V. Access to and stewardship of data and metadata over the long term are commitments made by the Domain Repository. Ongoing contributions of research and development of domain-specific tools and standards for creation and preservation are made available by the Domain Repository to Institutional Repositories and the Research Community.

This project has three primary goals:
1. Form partnerships with Institutional Repositories to curate and archive a small number of social science datasets as pilot studies.
2. Use these experiences to develop best practices for archiving such materials, which we will publish in a guide designed for use by Institutional Repositories.
3. Identify and design services that ICPSR can offer to Institutional Repositories to assist them with specialized tasks in the archiving and dissemination of social science data.

We thank the Institute of Museum and Library Services (IMLS) for project funding. We also thank Myron Gutmann and Ann Green for allowing us to reference the illustrations and descriptions from their article, as well as Cole Whiteman for providing the Partnering with IRs illustrations and the ICPSR Pipeline Process diagram.

There is a need for guidelines and tools to inform Institutional Repositories about best practices in archiving social science data and specialized services for unusual or complex curatorial tasks. The case studies to be processed at ICPSR will be used to develop guidelines and decision rules for Institutional Repositories. ICPSR’s Guide to Social Science Data Preparation and Archiving is an example of providing guidance to data creators.

Institutional repositories have different capabilities and needs. We will provide a menu of services that ICPSR can provide to IRs for archiving social science data. These services may include: • Data recovery from obsolete file formats & media • Disclosure analysis • Codebook validation • Metadata construction and editing • Management of dissemination of confidential data • Links in the ICPSR catalog to data in IRs


Integrated Fertility Survey Series: Harmonizing Fertility and Family Data from 1955–2002

What is the IFSS?

Variable Selection Tool

The IFSS is a harmonized dataset created from ten nationally representative surveys of family and fertility. Researchers from many disciplines have collected a large body of data on family and fertility patterns, but there are difficulties in comparing such data over time. The harmonization carried out by the IFSS team overcomes these difficulties – including weighting, imputation, and changes in the respondent universe – and renders data collected in ten surveys over five decades comparable. Thus, researchers, policymakers, students, and others will be able to more easily and accurately analyze fertility and family trends over time.
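
As a rough illustration of what harmonizing one variable across surveys involves, the sketch below maps survey-specific codes onto a single shared set of categories. Everything in it is hypothetical: the survey labels, the numeric codes, the category names, and the `harmonize` function are invented for illustration and do not reproduce the IFSS codebooks or the team's actual procedure.

```python
# Hypothetical recode tables: each source survey uses its own numeric codes
# for religion, and each is mapped onto one shared set of categories.
RECODES = {
    "survey_a": {1: "Protestant", 2: "Catholic", 3: "Jewish", 4: "Other", 5: "None"},
    "survey_b": {10: "Catholic", 20: "Protestant", 30: "Jewish", 40: "None", 90: "Other"},
}

def harmonize(survey, code):
    """Return the harmonized category for a source code, or None if it has no mapping."""
    return RECODES.get(survey, {}).get(code)

# Hypothetical raw records as (survey, source code) pairs.
raw_records = [("survey_a", 2), ("survey_b", 10), ("survey_b", 40), ("survey_a", 9)]

harmonized = [harmonize(survey, code) for survey, code in raw_records]
# Codes with no mapping stay visible as None so they can be documented
# rather than silently dropped or forced into a category.
unmapped = [rec for rec, value in zip(raw_records, harmonized) if value is None]
print(harmonized, unmapped)
```

Keeping unmapped codes explicit is one way to track the error that harmonization can introduce when a source category has no clean equivalent.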

Faceted searches help guide users through data.

Variable groups released: • Adoption history • Children from unions • Cohabitation • Dates • Education • Family characteristics • Farm residence history • Future union expectations • Geography • Husband/partner children • Husband/partner information • Income & supplemental income • Menstruation history • Non-biological & adopted children • Origin/descent • Pregnancy summary • Race & ethnicity • Religion • Union histories with dates • Urbanicity

The harmonized file, composed of sociodemographic variables for the respondent and her current or most recent husband or partner, union history variables, pregnancy and adoption summary variables, and subsample filter variables, is available at the IFSS Web site: www.icpsr.umich.edu/ifss. The surveys included in the IFSS are: • 1955 and 1960 Growth of American Families studies • 1965 and 1970 National Fertility Surveys • 1973, 1976, 1982, 1988, 1995, and 2002 National Surveys of Family Growth

The IFSS data are made possible with funding from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (grant #5R01 HD053533). The project is a partnership between the Population Studies Center and the Inter-university Consortium for Political and Social Research, both centers of the University of Michigan's Institute for Social Research.

Custom extraction allows users to create tailored subsets.

Current Status • A harmonized file, composed of information regarding sociodemographic status, childbearing and adoption summaries, and marriage and relationship histories, is currently available with documentation for download and online analysis. • The remaining harmonized data products, addressing topics such as contraception, infertility, attitudes and expectations, and pregnancy interval files will be available between late 2011 and 2012. • All ten component surveys, as well as the harmonized file, are available from ICPSR in ASCII format with setup files; as SPSS, SAS, or Stata system files; and with online analysis capabilities.

Comparability notes provide information on variable creation and are available by clicking on the variable name.

Concept map indicates where variables are available.

• Work history

Variable groups coming soon: • Breastfeeding • Child/infant characteristics • Contraception & fertility attitudes • Contraceptive use • Fertility intentions & expectations • Infertility diagnoses & treatment • Pregnancy outcomes • Sexual behavior

