Annual Report 2014
NCDS: It’s about access and collaboration
The last year has been a busy one for the National Consortium for Data Science. As a new organization dedicated to bringing together data specialists in business, academia, government and the nonprofit sector, we operate in a space that is new, different, exciting, and sometimes challenging. While other groups interested in seizing the opportunity of big data have formed across the country, none is as focused on bringing together industry leaders and researchers to address these issues together. The NCDS believes that industry leaders who have access to world-class researchers in data science benefit by getting a glimpse at cutting-edge data research, understanding research challenges, and helping researchers focus their work in ways that lead to business innovation and new opportunities. We also believe researchers are enriched by access to people who use and create data in government and in the private sector: people who can help them understand the impact of their work on practical problems that affect efficiency, productivity, and competitiveness.
For those reasons, NCDS events in 2014 focused on bringing together people with different backgrounds and perspectives. Our Data Innovation Showcase gave members and promising students the chance to share their most interesting projects and ideas. Our Data Science Faculty Fellows were selected by a committee of mostly industry members in an effort to align Fellows’ research projects with industry interests.
Of course data science and innovation can only take place if we have talented data specialists to feed a hungry job market and if professionals in all sectors have the skills to succeed in a data-rich world. That’s why education was an important component of NCDS activities in 2014. From our data careers panel discussion, held in collaboration with UNC Career Services, to the Data Matters Short Course Series that attracted more than 110 working professionals, the NCDS is committed to programs that will help our nation meet the big data challenge head on.
NCDS Members
Corporate: Deloitte Consulting, LLP; General Electric; IBM; Cisco Systems, Inc.; SAS Institute, Inc.
Academic: Drexel University; Duke University; North Carolina State University; UNC Chapel Hill; UNC General Administration; UNC Charlotte; Texas A&M University
Nonprofit: MCNC
Government: National Institute of Environmental Health Sciences; U.S. Environmental Protection Agency
This report contains an overview of our activities during the last year. My sincere appreciation goes out to all our members and supporters who made these events successful. As the data deluge continues to grow and present new challenges and opportunities, the NCDS is also growing and building momentum. I invite you to join us on this unique journey. I guarantee it will be interesting.
Stanley C. Ahalt, PhD
Chair, NCDS Steering Committee
Director, Renaissance Computing Institute (RENCI)
Professor of Computer Science, UNC Chapel Hill
National Consortium for Data Science
100 Europa Drive, Suite 540
Chapel Hill, NC 27517
919-445-9640
www.data2discovery.org
email@example.com
@thencds
NCDS awards fellowships to faculty researchers advancing data science
The NCDS named five faculty members at North Carolina universities as its inaugural Data Science Faculty Fellows. The Faculty Fellows each received $30,000 to support research projects that address novel and innovative data science issues. The program aims to enable research, fund prototype development, and facilitate activities that support the NCDS vision of unleashing the power of big data by developing and mastering data science. The Fellows program also seeks to foster relationships between university researchers and NCDS members, to bridge gaps between research and practice, to promote innovative approaches to data science challenges, and to engage the next generation of data scientists. Twenty faculty members from seven institutions submitted proposals for the Fellowships, which were reviewed by a committee of NCDS members and supporters. The program was the first official NCDS effort to support scientists involved in research that shows promise for advancing data science. The NCDS expects to continue the program in 2015.
Congratulations to the 2014 Faculty Fellow awardees:
• Rajeev Agrawal, PhD, assistant professor, department of electronics, computer and information technology, North Carolina A & T State University. Designing Sustainable and Domain Neutral Next Generation Data Infrastructure to Advance Big Data Science.
• Jane Greenberg, PhD, professor, School of Information and Library Science, UNC-Chapel Hill, and Director, Metadata Research Center. The Metadata Capital Initiative.
• Blair Sullivan, PhD, assistant professor, department of computer science, North Carolina State University. Tracking Community Evolution in Dynamic Graph Data Using Tree-like Structure.
• Wlodek Zadrozny, PhD, associate professor, College of Computing and Informatics, UNC Charlotte. Searchable Repository of Resilience and Sustainability Technologies.
• Justin Zahn, PhD, department of computer science, North Carolina A & T State University. COMDET: A Novel Community Detection System for Large Networks.
In addition to furthering the NCDS vision, Data Fellows are expected over time to generate measurable deliverables such as new methods, models, applications, or prototypes that can be used to develop larger efforts supported with extramural funding. For more information, see http://data2discovery.org/data-fellows/.
Member working groups to launch in late 2014
Beginning in late 2014, NCDS member organizations will have the chance to collaborate on data challenges directly related to their organizational objectives through working groups that span disciplines, business sectors, and the public and private sectors. Working groups will give members opportunities to address data issues from a variety of perspectives and to develop outcomes that impact members, such as white papers, position papers, best practice documents, lectures, panel discussions, or special events.
Although details about structure and activities are still being finalized, tentative plans call for small groups of members, and possibly nonmember experts, to meet several times over the course of a year. The groups will give members a mechanism for interacting with data experts who are not part of the NCDS and possibly with data science students. More information about working groups will be presented at the NCDS Fall teleconference and in future newsletters.
First student-industry networking event focuses on data careers
Data-focused businesses are always on the hunt for bright young talent, and students studying curricula related to data science are eager to know how their educations can translate into rewarding careers. As an organization that spans industry and academia, the NCDS has the ability to connect these job providers with job-seeking students. The first NCDS student networking event, held in collaboration with the UNC-Chapel Hill Career Services Office, brought more than 100 students with a wide range of backgrounds and interests to UNC’s Hanes Hall on the evening of April 7 to discuss career opportunities in big data and informatics. The students learned from industry experts about career opportunities in data science and the skill sets that employers look for. The industry representatives also offered advice to students on furthering their education and pursuing internships and job opportunities. A Data Science Industry Panel from NCDS member institutions kicked off the event. Panelists talked about the data science expertise needed at their organizations and provided advice for landing jobs in data science fields. A Q and A session followed. Panelists included Pat Herbert, SAS; Dianne Fodell, IBM; Monique Morrow, Cisco; and Craig Hill, RTI. The NCDS plans to offer the event annually, possibly at different NCDS member campuses.
Above: Students listen to NCDS panelists at the UNC career event. Left: IBM’s Dianne Fodell talks to students at the event’s networking session.
Fall UNC Computer Science Tech Talks to feature NCDS speakers
This Fall, the UNC computer science department and University Career Services invite NCDS corporate representatives to take part in NCDS Tech Talks. The talks will allow industry members to share information about career and internship opportunities with their organizations and educate the UNC community about their business objectives and corporate culture.
Tech Talks provide an opportunity for NCDS members and students to network in a relaxed and convenient environment, discuss current data science interests and trends, and begin the relationship building process that leads to gainful and fruitful employment.
Tech Talk speakers will offer a general overview of their company, positions currently open with the company, and skills and characteristics they look for in employees. Additionally, they will share current research questions and challenges and discuss the qualities they look for in employees to help address these issues.
Featured speakers:
• October 9: Karen Davis, Vice President, Research Computing Division, RTI International
• October 23: Russ Gyurek, Director of Innovation, Office of the CTO, Cisco
Both talks will begin at 5:30 p.m. and last until 7:30 p.m., including time for networking between students and speakers. Both will take place in Sitterson Hall on the UNC Chapel Hill campus.
NCDS co-hosts Data Matters Summer Workshop Series
The NCDS co-sponsored a summer workshop series June 23 - 27 with RENCI and UNC-Chapel Hill’s Odum Institute. The Data Matters Summer Workshop Series was aimed at business leaders, academic researchers and government officials who could benefit by better understanding how to manage, use, share and store big data. The courses targeted people interested in learning how to leverage the so-called “data deluge” to their benefit, those looking to understand how data can be used in their work, and those interested in specific software and data challenges.
The week involved two-day courses on Monday/Tuesday and Thursday/Friday and one-day courses on Wednesday. Classes were conducted at the William and Ida Friday Center for Continuing Education in Chapel Hill, and most also included hands-on lab sessions on the UNC-Chapel Hill campus. Instructors included experts from the Odum Institute, RENCI, University of Massachusetts at Amherst, Saffron Technology, Pennsylvania State University, Duke University and Cisco. Topics covered during the packed week included data science and its goals, techniques and concepts; strategies for managing big data; social network analysis; data management tools such as Hadoop and SAS; using large-scale data networks; data mining and machine learning; and data visualization.
More than 110 students attended, representing 25 different organizations, the majority of them universities. The week also included a kick-off reception Monday night at Top of the Hill restaurant, where attendees, instructors and invited guests had the opportunity to network more informally with their classmates and experts in the field.
At the reception, UNC-Chapel Hill Executive Vice Chancellor and Provost James Dean welcomed participants and spoke about the value of data. In his words, data is the currency of the 21st century, and those who learn how to analyze, manage, share and glean knowledge from it will be the leaders of the 21st century. Dean ended by thanking the Data Matters sponsors and participants. A second Data Matters short course series is planned for June 2015. For additional information, visit: http://data2discovery.org/events/10/data-matters-summer-workshop-series/.
Photo: UNC Provost James Dean
NCDS provides support to data conferences
As part of its effort to advance data science, the NCDS partners with organizations planning conferences of interest to the field. In May, the NCDS was a gold-level sponsor of the second international conference on big data science and computing. Sponsored by the Academy of Science and Engineering, the conference, called BigDataScience, was held at Stanford University in Palo Alto, CA. Justin Zahn, an NCDS Faculty Fellow, served as the conference steering chair, and Stan Ahalt, head of the NCDS Steering Committee and director of RENCI, presented a talk on the NCDS and the importance of advancing data science. For more on the conference, visit the BigDataScience website.
In March 2015, the NCDS will participate in the Data4Decisions conference and exposition, a new national trade show that organizers hope to hold annually at the Raleigh Convention Center. Ahalt is a member of the conference planning committee, and the NCDS plans to participate as an exhibitor and sponsor. For more information, visit the Data4Decisions website at http://data4decisions2015.com/.
Data Innovation Showcase features presentations, student posters
The NCDS Data Innovation Showcase brought together NCDS members, NCDS Data Science Faculty Fellows, and talented data science students to share ongoing and new innovative data-related projects, activities and ideas. The event was held May 21 at RENCI headquarters in Chapel Hill.
The Showcase included three components. First, NCDS Faculty Fellows delivered short presentations about their NCDS-supported projects. Later in the day, NCDS members presented on a wide range of topics, including ongoing data science research, development of new products and services, case studies, and more. Presentations by NCDS members included:
• Judith Cone, special assistant to the Chancellor for Innovation & Entrepreneurship, UNC-Chapel Hill. Developing Data-literate Students.
• Bill Wheaton, director of the Geospatial Science and Technology program, RTI International. Concepts and Applications of Large-Scale Synthetic Human Populations.
• Stanislav Minsker, PhD, visiting assistant professor, Duke. Geometric Median-based Approach to Robust and Scalable Statistical Estimation.
• Russ Gyurek, director, Innovation Labs-CTO Group, Cisco. IoT.
• Pat Herbert, principal systems architect for big data, SAS International. From Interesting to Actionable…Data Science Yields Functional Results.
• Steve Gustafson, Knowledge Discovery Lab manager, GE Global Research. Big Industrial Data.
• Claire McPherson, SAS Global Alliances, Deloitte LLP. Deloitte Analytics - Embedding Analytics in Everything We Do.
Students from member institutions presented posters on topics related to the NCDS mission during the Student Poster Session. Posters were reviewed by an NCDS committee, and 14 posters were on display all day. Five students received Best Student Poster awards:
• Michael O’Brien, PhD (May 2017), Computer Science, NC State.
• Angela Murillo, PhD candidate (May 2015), Information & Library Science, UNC-Chapel Hill.
• Rebecca Lee, BS-Biology (May 2014), UNC-Chapel Hill.
• Muhammad Suleiman, MS-Information Technology (May 2014), NC A&T State.
• Kristin Garrett, PhD candidate (May 2016), Political Science, UNC-Chapel Hill.
All winning posters received monetary awards and all student participants received certificates of appreciation. The NCDS plans to replicate this event in 2015. For more, see http://data2discovery.org/innovation-showcase/.
Above: Student Poster Session participants
Data Observatory launches with Dataverse Network and data sets
The NCDS Data Observatory seeks to create a diverse repository of very large data sets for NCDS members to use and share in support of the mission of advancing data science. It will provide a place for those interested in the science of data to form a community to exchange tools, approaches, data and other relevant information.
Left: RENCI’s Erik Scott (left) shows UNC computer science students how to work with a data supercomputer designed for data-intensive computing using Hadoop.
Last fall the implementation team, including a graduate student from UNC Chapel Hill’s computer science department, set up the observatory’s computing environment and completed test installs of iRODS (the integrated Rule-Oriented Data System) and an instance of a Dataverse Network, a container for research data studies that is customized and managed by its owner. This year, the team created an NCDS-branded Dataverse Network (http://observatory.data2discovery.org/dvn/) and uploaded the first two data sets to the network: a North Carolina Digital Elevation Model, and storm surge and wind wave data from the ADCIRC modeling system. They also will document how users create their own Dataverse within the NCDS network and address issues such as account requests and creating use agreements.
As part of an effort to link development of the data observatory with data science education, RENCI Senior Research Software Developer Erik Scott and a graduate student supported big data courses at NC State and UNC-Chapel Hill during the 2013-2014 academic year. Their work focused on educating students on Hadoop, a framework for massively parallel data storage and analysis. Scott set up a machine environment and accounts for both classes, presented lectures, and collaborated on creating homework assignments and grading. The courses were: Statistics 810, Big Data: A Statistical Perspective, taught by Lexin Li, associate professor in the NC State department of statistics; and Information and Library Science 690 - 163, Introduction to Big Data and NoSQL, taught by Arcot Rajasekar, professor in the UNC School of Information and Library Science.
NCDS Mission
To advance data science to better enable the U.S. to utilize big data in ways that result in new jobs and industries, advances in healthcare, and transformative discoveries in science.
NCDS Goals
• Engage a broad community of data science experts in business, academia and government.
• Facilitate interaction among data specialists across disciplines and business sectors so that data challenges can be addressed strategically and holistically.
• Support the development of educational programs that will train a new generation of data scientists and develop a data-literate workforce.
• Encourage the development of technical, ethical and policy standards for data.
Stay in touch with the NCDS
Our electronic newsletter, Data Matters, is published quarterly and more often if needed. Anyone can sign up to receive the newsletter from the homepage of the NCDS website (data2discovery.org).