Roadmap for Research Libraries Involvement in eScience
by Oxana Smirnova, Bertil Dorch and Alfred Heller

The application of computers is a common premise of most contemporary science. It produces digital-born data, results and publications. Research libraries are affected in that their services have to meet the associated challenges in time. DEFF addresses e-Science by including it in its "Handlingsplan 2008" (action plan) and by appointing a work group to prepare a "Roadmap for e-Science", which can form the basis for a work plan on this topic. The remit of this Roadmap is to survey e-Science and to arrive at a set of recommendations to the Steering Committee and DEFF itself. The recommendations should support international efforts and at the same time respect the limited size of Denmark and the resources available.

The concept of "e-Science" is ambiguous; hence, as a compromise between precision and completeness, we choose the following definition:

"Science performed through collaboration enabled by the Internet, using electronic data collections and computing resources, and the technologies that enable such collaboration."

This definition is based on Taylor (2005), Hey (2006) and others. It follows that the related concepts include "Open Data", various forms of "Grid Technologies", "Virtual Laboratories", "Scientific Computing" and "Online Communities".
Brief international background

The use of computers has had a large impact on scientific methods. Theoretical and experimental work is supported by computational science, visualization, data mining, simulations and other methods. During the past decade, networks have changed the cooperative aspects of science, enabling global, distributed science that connects sensors, experiments, computational resources and scientists. Back in 2003, in "Revolutionizing Science and Engineering through Cyberinfrastructure", Atkins introduced the term Cyberinfrastructure, which seems to be the "tool" for the science that was named e-Science by Tony Hey (Hey and Hey, 2006), among others. Because the term "Science" covers only a limited number of research fields, other terms have been introduced, such as "Digital Scholarship" and "eResearch". A given setup of a cyberinfrastructure can also be called a "Virtual Research Environment" (VRE) or "Virtual Workspace", exemplified by the Virtual Observatory concept.

Within eResearch, a Data Life Cycle has been defined to consist of Data Acquisition, Data Ingest, Metadata, Annotation, Provenance, Data Storage, Data Cleansing, Data Mining, Curation and Preservation. Libraries have long performed all of these tasks for research publications. Hence, it seems obvious that research libraries can play a role in the emerging e-Science endeavor. "Subject and Institutional Repositories will be a key part of Cyberinfrastructure" (Hey, presentation).

A compiled list of projects and networks is presented at http://www.dresnet.net/relatednetworks. This homepage is managed by an international network under JISC that aims at connecting e-Science with Institutional Repositories and Grid. It could be valuable to join this network (cf. recommendations below).

Among institutional repository projects (Fedora-Commons, DSpace), DRIVER (http://www.driver-repository.eu/) is a European project that involves DTU. A relevant statement for the current work:
Innovation, resulting in a richer information space, is achieved by extending the focus from textual publications for research and higher education to "enhanced publications", i.e. publications that link to supplementary materials such as primary data or multimedia.

The German eSciDoc project by the Max Planck Institute (http://www.escidocproject.de/JSPWiki/en/Startpage) aims at developing an infrastructure based on FedoraCommons that handles some of the issues that have to be tackled: complex object modeling, relationships between resources, unique identification schemes, authorization and much more. The project outcome is open source and could be applied in Denmark too.

The Dataverse Network Project (http://thedata.org/) aims at solving the many topics within e-Science and libraries by establishing standards and web tools. The project points out eight requirements that must be addressed to meet its objectives:
1) Association between data and publications
2) Accessibility of data
3) Authorization for non open data
4) Validation of data sets and sources
5) Verification of data to ensure that the data is the "original"
6) Persistence
7) Ease of use
8) Legal protection
The open source, free Dataverse Network software has been developed to address these challenges.

The European DILIGENT project (http://www.diligentproject.org/) aims at developing a service-oriented infrastructure, creating an advanced test-bed to support virtual e-Science communities in knowledge sharing and collaboration.

National and international networks and projects are thus already working within the scope of this Roadmap. Denmark has a very limited financial framework and should join these activities. International cooperation ought to be coordinated through one of the above-mentioned projects or the Knowledge Exchange network, http://www.knowledge-exchange.info/.
e-Science in Denmark

e-Science is an emerging topic in Denmark. Until recently the term was not used frequently, but there are now a number of new, ambitious initiatives, e.g. the University of Copenhagen's cross-disciplinary Master's education in e-Science. e-Science is also described at the Danish Ministry of Education's website "UddannelsesGuiden.dk", and it has received exposure on Danish television in the popular science show "Viden Om", which reported on research performed using Danish supercomputers (November 2007) and featured representatives of IBM Denmark.

Similarly, several initiatives and much ongoing work fall within the definition of e-Science that we employ here. Practically all science using computing resources at the Danish Center for Scientific Computing (DCSC) is natural born e-Science, and includes research topics within the fields of astrophysics (both theory and observation), bioinformatics, chemistry, climate research, and economics. Additionally, DCSC builds and operates Grid infrastructures in Denmark.

The term "Grid computing" was coined and widely used somewhat earlier than "e-Science"; consequently, Grid efforts in Denmark precede those formally named e-Science. The first Grid-related developments were triggered by the needs of high-energy physics researchers participating in CERN experiments. They took place back in 2000, when the Niels Bohr Institute became actively involved in the CERN-led EU DataGrid project and became one of the co-founders of the NorduGrid collaboration. Back then these activities were supported through the Nordunet2 programme. Based on the success of NorduGrid, the Nordic Data Grid Facility (NDGF) was established in 2002 with funding from the Nordic Council of Ministers, and Brian Vinter (then at SDU - University of Southern Denmark) was appointed NDGF Director.
Both NorduGrid and NDGF are international activities that involve primarily Nordic countries, but they also have many affiliated participants from other countries. The Danish Center for Grid Computing (DCGC) followed, bringing together Danish Grid efforts. The contribution of Danish researchers to NorduGrid and related projects is essential; the NDGF headquarters is presently located in Kastrup (hosted by NORDUnet). The NorduGrid Certificate Authority, responsible for issuing Grid "passports" to all Nordic researchers, is also situated in Denmark. Danish researchers were among the first in the world to start using Grid facilities on a regular basis, and Danish universities (Copenhagen and Aalborg) are among the very few universities in the world that offer a Grid syllabus. Today Denmark has excellent Grid expertise and highly developed technology and infrastructure that is firmly integrated in international Grid activities.

In addition to these large-scale and international collaborations, there are a number of newer small-scale projects that are either independent or spring from larger, nationally oriented projects. Among these are a project to develop e-Science related to the Danish Galathea mission (funded by, among others, the Ministry of Science, Technology and Innovation), a project to further develop the Danish Astrophysical Virtual Observatory (funded in part by the Ministry of Culture), and various research centers that have participants from Danish universities, e.g. the Centre for Molecular Movies (funded by the Danish National Research Foundation).
Libraries and e-Science

The special issue on Connecting Digital Libraries to e-Science of the International Journal of Digital Libraries (2007), Volume 7, Issue 1, describes a number of real cooperation projects in which libraries are involved in e-Science. It is argued that the requirements of the research community are well known, and that "small science" in particular needs support from libraries. The issue presents a first investigation of e-Science cooperatives; "Pathways" (http://www.infosci.cornell.edu/pathways/), a project arguing for an interoperability framework connecting distributed, heterogeneous systems and repositories; "DART" (http://dart.edu.au), a very large-scale project on handling data sets and presenting a next-generation publication platform; cross applications between e-Science and e-learning; and a number of case studies.

e-Science will influence the relation between research libraries and their customers because of the very large volume of information content that modern research requires. The tasks involved are presently handled by other organizations, such as data centers, Grid infrastructures, and research utilities. On the one hand, the solutions and services developed so far serve researchers through short-term archiving and support for day-to-day work. On the other hand, certain services that already exist in more traditional literature-oriented contexts are currently lacking for e-Science, e.g. reference abilities, search and retrieval abilities, and more. We argue that key library competences are currently absent from the general e-Science landscape, and that these have the potential to improve the overall quality and efficiency of e-Science. We also argue that research libraries can play a natural role due to their close relations with the researchers. In the current section, we investigate the possible tasks where libraries would play a natural role within e-Science.
In a recent report, the ARL states that "There is a perception that science librarians, more than ever before, need to be actively engaged in their user communities. They need to understand not only the concepts of the domain, but also the methodologies and norms of scholarly exchange. This level of understanding and engagement goes well beyond knowledge of the literature. It requires being a trusted member of the community with recognized authority in information related matters. This new paradigm suggests a shift in focus from managing specialized collections (the “branch library” model) to one that emphasizes outreach and engagement."

• What are the library-like and library-suitable sides of e-Science (the library "optics" on e-Science)?
• What is being done by libraries on the international scene?
In the era of e-Science, libraries are developing digital repositories of ever-increasing scale and complexity. Research, development and integration activities in the area of digital repositories attract much attention in the EU FP7 work and in other initiatives. This is a rapidly developing area with plenty of research challenges.
Even a brief list of issues faced by digital repositories is rather long:

• Required services and facilities:
    • Dynamic Service-Oriented Infrastructures (SOI)
    • e-Infrastructures (machines, networks, application resources)
    • Virtual collaborations (dynamic, on-demand, collaborative environments)
    • Ontologies
    • Grid file systems
• Required functionalities:
    • Data replication, data partitioning
    • Data subscription, notification
    • Data access (secure, uniform, prompt)
    • Data preservation, sustainability
    • Data curation, indexing, metadata
    • Data provenance
    • Conformance to open standards
• Security and policy issues:
    • Access policies (open, restricted, time-limited...)
    • Contracts between data producers and consumers
    • Intellectual property, licensing issues
    • Privacy issues
    • Quality of data, certification
• Networking, education and outreach challenges
Some of these issues can be addressed using existing technologies, such as Grid technologies. Others still need the development of new standards and new tools. Still others require decisions and regulations at various administrative levels.

JISC, the UK counterpart of DEFF, presents its research findings on the application of research data sets in an extensive report, summarized on http://www.rin.ac.uk/data-publication. One of the main findings is that even though there is a demand for reuse of and access to data sets, these data sets are often not made available to others. The two main barriers to widening open access to data sets appear to be the lack of technical solutions and the fact that data sets are seldom published. Several obstacles to the sharing of data sets are reported, e.g. missing linking and accessing abilities, refusal to release data, inadequate metadata, need for licensing, lack of quality assurance and much more.

A paper by Jane Hunter, Scientific Publication Packages .., The International Journal of Digital Curation, Issue 1, Volume 1, 2006, presents a very down-to-earth description of the research process and related issues in data curation, and points out some of the issues to be addressed by the e-Science community and libraries.
Recommendations to DEFF

• DEFF chooses to position the research library community as a significant partner in the development of e-Science.
• DEFF establishes dedicated capacity within its program agenda over time and builds a shared understanding among the members of the component issues and challenges for library engagement.
• DEFF works actively for the generation of a community-wide cooperation/partnering for e-Science, re-visioning the tasks of research libraries.
• DEFF engages a part-time equivalent for ongoing observation within the wider range of e-Science relevant for research libraries, keeping the level of knowledge up to date.
• DEFF supports the establishment of an online resource on e-Science topics, pointing to current research, projects and resources.
• DEFF joins international networks and activities such as http://www.dresnet.net/related-networks to get relevant international connections regarding the topics.
• DEFF supports efforts on education for the research library community about scientific trends, the emergent role of data curation, characteristics of virtual organizations, relevant policy for data and research dissemination, and tools and infrastructure systems. DEFF will identify the skills made necessary by these developments.
• DEFF supports and encourages Open Data archiving, access to Open Data and other forms of e-Science initiatives that augment research material, supporting publishers' so-called "supplementary" material if it is Open Data (i.e. as described by Science Commons and Open Data Commons).
• DEFF supports case studies by research libraries on providing access to e-Science and Open Data material, especially in "small science".
• DEFF works towards providing access to e-Science resources through library OPACs, library websites and integrated search engines.