FROM DATA TO DISCOVERY: A PUBLIC-PRIVATE PARTNERSHIP TO ADVANCE DATA SCIENCE The National Consortium for Data Science (NCDS) is a collaboration of leaders in academia, industry and government formed to address the data challenges and opportunities of the 21st century. The NCDS was founded as a mechanism to help the U.S. take advantage of the ever-increasing flow of digital data in ways that result in new jobs and industries, new advances in healthcare, transformative discoveries in science, and competitive advantages for U.S. industry. The NCDS works to advance four key goals:
+ Engage broad communities of data experts
+ Coordinate data science research priorities that span disciplines and industries
+ Facilitate the development of education and training programs
+ Apply NCDS expertise to data challenges in science, business and government
NCDS 2015 Members

RESEARCH YEAR 2 DATA FELLOWS ATTACK NOVEL DATA PROBLEMS The NCDS named three faculty members at three different universities as 2015 NCDS Data Fellows. Each Data Fellow received $50,000 to support work that addresses data science research issues in novel and innovative ways. Their work is expected to advance the mission and vision of the consortium. Data Fellow positions are open to faculty members at all NCDS academic institutions, which include universities in the University of North Carolina system, Duke University, and Drexel University. A wide range of researchers from six different universities applied for the Fellowships. Their research proposals addressed many of the hot topics in data science, from cybersecurity to applying techniques used by online music databases to develop more precise search algorithms. In addition to final reports on their research, this year’s Data Fellows also presented webinars in the NCDS DataBytes Lunchtime Series. To view their presentations, visit the DataBytes archive. The 2015 NCDS Data Fellows are:
+ David Gotz, PhD, associate professor, School of Information and Library Science, UNC-Chapel Hill, and assistant director of the Carolina Health Informatics Program (CHIP). Visual Analytics for Large-scale Temporal Event Data.
+ Erik Saule, PhD, assistant professor, Department of Computer Science, UNC Charlotte. Toward Machine Oblivious Graph Analysis.
+ Erjia Yan, PhD, assistant professor, College of Computing and Informatics, Drexel University. Assessing the Impact of Data and Software on Science Using Hybrid Metrics.

David Gotz

Erik Saule

Erjia Yan

“The Data Fellows program enables corporate members to network with academics and their students around industry-related topics to foster future collaboration and employment opportunities.” - Steven Gustafson, PhD, GE Global Research


RESEARCH RENCI TEAMS WITH GEORGIA TECH ON BIG DATA HUB PROPOSAL A new effort to develop a Big Data Regional Innovation Hub covering 16 southern states and the District of Columbia will be led by NCDS member RENCI and the Georgia Institute of Technology. The South Big Data Regional Innovation Hub (South BD Hub) will be established through the National Science Foundation’s Big Data Regional Innovation Hubs (BD Hubs), an initiative to establish innovative public-private partnerships on the key challenges and opportunities related to big data. NSF announced Big Data hubs for the South, Northeast, Midwest and West on Oct. 29, 2015. Each of the NSF BD Hubs will engage businesses and research organizations in their region to develop common big data goals that would be impossible for individual members to achieve alone. The Hubs will develop community-driven governance structures as well as Hub “spokes” based on regional big data priorities and partnerships. NCDS members supported the RENCI-Georgia Tech BD Hub proposal, and as it develops they will have opportunities to participate in Hub activities.

PARTNERSHIP WITH NC STATE GOVERNMENT WILL INVESTIGATE DATA SCIENCE “GRAND OPPORTUNITY” The NCDS will work with the North Carolina Board of Science, Technology, and Innovation (NCBSTI) to determine the importance of data-driven research and business to the state’s economy, both now and in the future. The partnership was established after the NCBSTI identified data science as one of the six “grand opportunities” for North Carolina’s economy. The project, now in the planning stage, will involve a survey of North Carolina’s data science assets and an analysis to determine how the state compares to other states in terms of data science assets and employer needs for data science talent. The NCDS and the NCBSTI will deliver a report to the North Carolina Department of Commerce that summarizes the survey results, including the likely impact of data science jobs on the economy and recommendations for optimizing the potential of data-focused business and research. The partners will also develop metrics to track data science research and business endeavors as economic development engines for North Carolina. The NCDS Workforce Development Working Group will leverage the project as it works to understand the impact of data science on the state and national workforce and the skills that will be needed to prepare workers for success in a data-driven economy.


STATE FUNDING WILL HELP NCDS DATA OBSERVATORY GROW The NCDS Data Observatory was established to provide a diverse repository of very large data sets for NCDS members to use and share for data science research and education as well as data analytics. The Observatory gives NCDS members a platform for the exchange of tools, approaches, data, and other relevant information. In March, the Observatory became one of the key components of the North Carolina Data Science and Analytics (DSA) Initiative. The project teams NCDS – through its connection to RENCI – with UNC Charlotte and North Carolina State University and brings together ongoing research efforts at the three institutions to develop a data science and analytics technology infrastructure that will support strategic hubs of excellence. The project, led by researchers at UNC Charlotte, will receive about $2.1 million over three years through the UNC Research Opportunities Initiative (ROI) awards, funded by a targeted North Carolina legislative appropriation. For more information, read our press release.

NCDS WORKING GROUPS OVERVIEW NCDS Working Groups bring NCDS members together to identify and investigate specific data science questions and challenges important to their organizations. Each working group produces products that facilitate discussion and problem solving related to its data science topic. Working group products could include white papers, position papers, best practice documents, open forum lectures, panel discussions, or special events.

Current NCDS Working Groups:

Anonymizing Data Privacy and/or security concerns can limit user access to data and reduce the analysis that can be done on that data. Examples of such data sets include population use cases, medical records, and financial transactions. The limitations in techniques for masking data to protect privacy while still allowing access for analytics represent a hurdle for achieving the benefits of big data.

Internet of Things Many organizations have launched programs to create products, procedures, and solutions that will take advantage of the growing network of connected devices known as the IoT. However, the data produced by the IoT is not easily available or usable, and the underlying technologies needed to sustain a worldwide IoT are still developing. The IoT Working Group seeks to define the challenges and opportunities of this disruptive technology.

Workforce Development In 2011, the McKinsey Global Institute predicted an acute deficit of “deep analytical talent” in the workforce by 2018. The response to this deficit ranges from university master’s programs to vendor-driven training to MOOCs and other online resources. The Workforce Development Working Group addresses the gap between forecasted data science graduates and demand in the workforce, examines the skills needed to build the data workforce, and considers how to develop a data science core curriculum.

RESEARCH IOT FOR THE INDUSTRIAL INTERNET WORKSHOP DEFINES KEY RESEARCH QUESTIONS Internet-enabled sensors and devices that network and communicate with each other – often called the Internet of Things (IoT) – are the next grand challenge in big data. With this in mind, the NCDS established an Internet of Things Working Group focused on developing a clear research and problem-solving agenda on the IoT and its uses in business, government, research, and healthcare. Working group members participated in an NCDS community workshop, sponsored by the National Science Foundation (NSF), and hosted at Cisco Systems’ San Jose, CA global headquarters July 29 and 30. At the workshop, IoT and data science thought leaders from industry, government, and academia worked to define a thoughtful, cohesive research agenda specific to the IoT for the Industrial Internet, which will help the NSF guide future research priorities. Three keynote speakers offered a focus and structure for participants. Steve Lohr, technology reporter for The New York Times and author of Data-ism, spoke to the philosophical buildup of the IoT and its grand potential for the future. Lohr’s take-aways from the workshop are summarized in a recent article on The New York Times tech blog, Bits. Lance Donny, founder and CEO of OnFarm, an IoT platform for farmers and food producers, spoke of the practical IoT applications now used in the agricultural industry. He defined the ages of farming as pre-industrial, industrial, and the new age of Ag 3.0. Peggy Irelan of Intel delved into technical aspects of the IoT, including the safest and most efficient ways to move IoT data to the edge. According to Irelan, “When dealing with the IoT, the most important priorities are that it be trusted and actionable.” Without these characteristics, she says, the big data era is nothing more than too much information. 
Workshop participants produced a list of key IoT research questions designed to focus future attention on the most important challenges and opportunities related to the IoT for the Industrial Internet: security, analytics, privacy, standardization, movement, and future talent to drive the IoT workforce. The NCDS will publish a white paper summarizing the research recommendations developed at the workshop. To be notified of the white paper’s publication, please email us at info@data2discovery.org, sign up for our newsletter, or find us on Facebook. To view the workshop slides and keynote presentations, visit the NCDS website. Future workshops addressing the IoT as it relates to developing smart cities and mobile health applications are being planned for 2016.

“Big Data opens the door to revolution in measurement and makes possible a different mindset or point of view about decision making.” - Steve Lohr, The New York Times


From left: Bo Begole, Huawei Technologies, Jose Alvarez, Huawei Technologies, Noel Greis, UNC-Chapel Hill, Russ Gyurek, Cisco, and Ashok Krishnamurthy, RENCI, discuss IoT successes and challenges in opening night table sessions of the IoT for the Industrial Internet Workshop.


EDUCATION DATA MATTERS DRAWS OVER 100 STUDENTS TO CHAPEL HILL A second cohort of students spent part of their summer learning data science skills at the Data Matters Short Course Series, sponsored by the NCDS, RENCI, and the Odum Institute for Research in Social Science at UNC-Chapel Hill. A total of 132 business managers, data analytics specialists, academic researchers, and others who grapple with big data attended the short course series held during the last full week of June at the William and Ida Friday Center for Continuing Education. Students delved into topics such as information visualization, data curation, health informatics, open data, and machine learning taught by instructors from UNC-Chapel Hill, Duke University, the Georgia Institute of Technology, and the University of Massachusetts at Amherst. Wondwossen Lerebo, from Mekele University in Ethiopia, said he traveled all the way to Chapel Hill because of the need for data analysis skills in developing countries. “If you take HIV/AIDS, for the past 20 years the CDC/USAID were helping developing countries to collect data. And most of the time, those data are not properly analyzed. I had an interest in analyzing those data, and at that time, I had a limitation on data mining using the recent software like R and health informatics. That’s why I came here. The investment is rewarding, and I believe that I am going to apply what I’ve learned.”

Closer to home, UNC-Chapel Hill student Erica Brody studied information visualization and big health data. “I learned about eight different tools to do information visualization, and I think I’ll probably use some of those tools in my career. I also enjoyed meeting the other participants and learning about other people’s uses for data.” At an evening reception for students, instructors and guests, Stan Ahalt, director of RENCI, touted the importance of big data analytical talent in the modern economy. “Data has permeated every facet of our society, so really the question is how are we going to make use of all this data, and you’re the answer. Everywhere I go, this is the itch that people are trying to scratch, so the skills you’re learning in taking these courses are invaluable.” Tom Carsey, director of the Odum Institute, added that the data revolution, above all else, is about creating a better world. “Data is about people. To aspire to do things like lowering invasive procedures in neonatal care or to help manage environmental changes to make people’s lives better and safer, this is what we can do in data science and this is what we can do [in academic environments] where we might not otherwise.” The series will be held again June 20-24, 2016. For updates regarding this year’s classes and how to register, join our mailing list.


CAREER PANEL DRAWS FUTURE DATA SCIENTISTS TO UNC CAMPUS Networking with Data Science Professionals: A Panel Discussion and Meet-up with Triangle Business Leaders once again gave UNC students interested in data science careers the chance to interact with representatives from NCDS member institutions. The event, held April 9 on the UNC campus, included a panel of representatives from four corporations (NCDS members Deloitte, EMC, and IBM, plus MetLife) who shared their data science employment needs. The panel was followed by an informal meet-up where students, faculty, and business representatives had the chance to network. Dianne Fodell, program director for IBM Global University Programs, told the students that “big data, analytics, and cybersecurity are going to be skills for life,” while MetLife’s Director of Software Development Ajay Patel urged students to volunteer in order to see data science in action. Deloitte’s Bob Dalton and EMC’s Stephen Worth also talked about the data science expertise needed in their organizations and provided advice for landing jobs in data science fields.

Dianne Fodell, IBM, offers advice to a room of potential future data scientists at the spring career panel.


EVENTS DATABYTES WEBINAR SERIES PROVIDES FORUM FOR DISCUSSION OF ALL THINGS DATA April marked the start of the NCDS DataBytes Lunchtime Webinar Series. The series gives NCDS members and the broader data science community a chance to delve into a new topic each month. This year featured presentations from representatives of RTI International, Cisco, GE, and Deloitte in addition to the NCDS 2015 Data Fellows. Topics ranged from general overviews, such as Russ Gyurek’s (Cisco) talk on the Internet of Things, to more specific presentations like Bill Wheaton’s (RTI) discussion of synthetic populations in the healthcare realm. Visit the DataBytes web page to view archived videos and sign up for the next webinar in the monthly series.


Geospatially Explicit Synthetic Populations: Concepts, Developments, and Application Bill Wheaton, Director of Geospatial Science and Technology, RTI International

The Internet of Everything: Grand Challenges and the Next Big Wave in Data Russ Gyurek, Director of Cisco Innovation Labs

Leveraging Artificial Intelligence for Big Industrial Data Science Steve Gustafson, Knowledge Discovery Lab Manager, GE Global Research

Big Data, Small Data, and Behavioral Insights for Better Decision-Making James Guszcza, Chief Data Scientist, Deloitte Consulting

Toward Effective Analytics for Large-Scale Temporal Event Data David Gotz, NCDS Data Fellow and associate professor of information science, UNC School of Information and Library Science

Toward Machine Oblivious Graph Analysis Erik Saule, NCDS Data Fellow and assistant professor, department of computer science, UNC Charlotte

Assessing the Impact of Data and Software on Science Using Hybrid Methods Erjia Yan, NCDS Data Fellow and assistant professor, College of Computing and Informatics, Drexel University

NCDS CONTINUES CONFERENCE SUPPORT Thoughtful data science conferences often spur connection and conversation, two important facets of creating community. When Data 4 Decisions launched in Raleigh in 2015, the NCDS supported the conference, which assembled business and technical leaders to uncover the strategies that best harness the benefits of big data. The conference’s success has merited its return to Raleigh, March 22-24, 2016. The NCDS also supported the ASE International Conference on Data Science, held at Stanford University in August. This conference provided perspective on the current state of the art and recent advances in data science, cybersecurity, and social computing. The NCDS also helped support the unconference Analytics Forward, held in Durham, NC on Pi Day (March 14). The event brought together participants from various fields to learn the latest techniques, trends, and tools in data analytics.

