ELIXIR Annual Report 2017 by ELIXIR

ELIXIR Annual Report 2017

Contents

Foreword by Robert Gentleman, Chair of Scientific Advisory Board
Foreword by ELIXIR Director

Platforms
  Tools: Services and connectors to drive access and exploitation
  Data: Sustaining Europe’s life science data infrastructure
  Compute: Access, exchange and storage
  Interoperability: Integration of data and services
  Training: Professional skills for managing and exploiting data
  Platform leaders

Use Cases and Communities
  Human data
  Rare diseases
  Marine metagenomics
  Plant sciences
  Use Case leaders
  New ELIXIR Communities

Members

2017 Highlights

EU Grants
  ELIXIR-EXCELERATE
  Collaboration with other Research Infrastructures

Supporting activities
  Capacity building and Node development
  Industry engagement
  International collaboration
  Impact and sustainability
  Communications
  Governance
  ELIXIR Hub staff

Governance Committees and financial data
  ELIXIR Committees
  Implementation studies in 2017
  Financial data

Foreword by Robert Gentleman

The past year has seen a remarkable increase in the technical and scientific activities of ELIXIR, ranging from the selection of ELIXIR Implementation Studies by peer review for the first time, to the delivery of activities across all of the ELIXIR Platforms and Use Cases, to the results of ELIXIR collaborations with its European and international partners. As Chair of the ELIXIR Scientific Advisory Board (SAB), I have watched the development of ELIXIR in 2017 in great detail, and – as we stated in our regular SAB reports – the rate of progress in the implementation of the ELIXIR Scientific Programme has been phenomenal.

The selection and publication of the ELIXIR Core Data Resources was the highlight not only of 2017 but of the entire programming cycle of 2014–2018. By establishing the initial list of ELIXIR Core Data Resources, ELIXIR has defined the best practices in assessing the quality of biological data provision and has gained worldwide recognition as an authority in this field. This directly feeds into the global case to sustain core data resources.

Another important milestone in 2017 was the creation of new ELIXIR Communities. Although each of the three new communities created – the Proteomics, Metabolomics and Galaxy communities – has different requirements, they provide very good opportunities to develop strong technical activities that will benefit their respective user groups.

In 2017, we also saw the beginning of the development of the ELIXIR Scientific Programme for 2019–2023, the strategic document that lays out the priorities and objectives for the next five years. I’ve been pleased to see ELIXIR reaching out to ELIXIR Nodes in the development of this programme. It will be crucial in 2018 to maintain this vibrant dialogue to ensure the Programme responds to the needs of the life science community in Europe.


I have had the privilege of engaging with ELIXIR and its members through my role on the ELIXIR Scientific Advisory Board since the early days of ELIXIR. I am proud of the impact that the SAB has had and I continue to be impressed by the energy and effort of its members. It has been enormously interesting to watch and help guide the development of ELIXIR, and I am looking forward to seeing it build and develop further. I believe that the future of ELIXIR is very bright and that it will leverage the potential of life-science data for maximum impact on research, innovation and society at large.

Robert Gentleman
Chair of ELIXIR Scientific Advisory Board (2013-2017)

Foreword by ELIXIR Director

This Annual Report illustrates the growth and maturity of our research infrastructure, highlighting the efforts of the 650-plus national experts involved in ELIXIR through our 21 ELIXIR Nodes, which collectively span over 200 institutes across Europe. As I reflect on the major achievements of 2017, there are several events and themes that stand out: the publication of the initial list of ELIXIR Core Data Resources, the collective efforts in ELIXIR-EXCELERATE that resulted in a highly successful mid-term review, and the implementation of FAIR principles across our data resources and services.

Long-term sustainability

A key goal for ELIXIR, ever since the planning began for a distributed European data infrastructure, has been to develop a new model that ensures the long-term sustainability of the key life science databases and knowledgebases. In 2017, we made several important steps towards this goal. In Prague, during the ISMB conference in July, we announced the initial list of ELIXIR Core Data Resources. It was the culmination of many months of committed effort by our Data Platform, Heads of Nodes, Scientific Advisory Board, our external evaluators and – not least – the representatives of all the ELIXIR data resources who embraced the process and participated in it.

Our work to establish ELIXIR Core Data Resources, and the very positive reactions we have received about these resources from the community, also strengthened our position in the dialogue with global funders. Throughout 2017, ELIXIR was an active participant in the initiative to establish a global coalition to sustain core data resources, initially facilitated by the Human Frontier Science Program Organization.

ELIXIR-EXCELERATE

In May 2017, our flagship project ELIXIR-EXCELERATE received very positive feedback on the activities and results of the first half of the project. During the project’s mid-term review meeting in Brussels, we had a very fruitful discussion with our external reviewer and received many useful recommendations for our future work.

The quality of our work in ELIXIR-EXCELERATE was also recognised by the European Commission; ELIXIR-EXCELERATE was cited as an early success story in the analysis of the Horizon 2020 Research Infrastructures programme published in May 2017 by the European Commission.

FAIR principles

In September, we published a position paper on FAIR Data Management in the life sciences, setting out our view of the guiding principles of FAIR Data Management. The position paper confirmed our commitment to supporting the FAIR data principles within the framework of the European Open Science Cloud (EOSC). A good example of how we support FAIR data is the continued success of the Bioschemas Community. In 2017, Bioschemas held a series of workshops and hackathons to engage with the life-science community and encourage its members to adopt and use the Bioschemas specifications and make their data discoverable by others. As more ELIXIR resources adopt the Bioschemas markup, we are building a key component of the ELIXIR FAIR data infrastructure.


Looking ahead

The year 2018 is the last year of our current Scientific Programme and we are working hard on the next ELIXIR Scientific Programme for 2019–2023. As a strategic document to set the future direction of ELIXIR, this new programme reflects the priorities and activities of ELIXIR Members and the needs of our user communities. Development of the ELIXIR 2019–2023 Scientific Programme is a considerable undertaking that involves the large community of ELIXIR Node experts, the leaders of ELIXIR Platforms and Use Cases, and the Heads of ELIXIR Nodes.

I would like to take the opportunity to thank all of those involved in the work to develop the vision of ELIXIR in 2023, including those working on the technical roadmaps that will take us there, and on the development and delivery of services that have earned us the trust and commitment of users. I look forward to the continued success of ELIXIR in this transformative era in the life sciences.

Niklas Blomberg


Platforms

ELIXIR Platforms

ELIXIR activities are structured around five Platforms and a growing portfolio of Use Cases (see the Use Case section). The Platforms form the basic units of operation within ELIXIR, drawing on technical expertise and resources from ELIXIR Nodes. ELIXIR Platforms are built on the real and changing needs of established research communities. They are led by senior scientists from ELIXIR Nodes and are supported by a Platform coordinator at the ELIXIR Hub. The ELIXIR Platforms comprise:

• Data: Sustaining Europe’s life-science data infrastructure
• Tools: Services and connectors to drive access and exploitation
• Interoperability: Supporting the discovery, integration and analysis of biological data
• Compute: Storage, computing and authentication / access services
• Training: Professional skills for managing and exploiting data

The activities of the Platforms are primarily funded by the ELIXIR-EXCELERATE project, in which each Platform is represented by a Work Package. Each Platform also manages an expanding set of ELIXIR Implementation Studies, funded through the ELIXIR Hub budget. Additional activities are funded through other EU grants (CORBEL, EOSC and others).

[Figure: ELIXIR Platforms (Data, Training, Compute, Interoperability, Tools) and Communities (Rare diseases, Human data, Marine metagenomics, Plant science, Proteomics, Metabolomics, Galaxy)]

Tools

Services and connectors for access and exploitation

The ELIXIR Tools Platform supports the discovery, quality and sustainability of software resources, initially from ELIXIR Nodes, and later from the wider life sciences community. The main objectives of this Platform are: to help users find, access, re-use, deploy and benchmark software tools, including workflows; and to help software providers and developers to better describe and develop software tools, including workflows.

The ELIXIR Tools Platform coordinates technical activities, and engages end-users across seven technical groups: (1) bio.tools; (2) Scientific benchmarking and technical monitoring; (3) Software deployment; (4) Workflows and workbenches; (5) Software development best practices; (6) Tools interoperability; and (7) Galaxy Workflow Management system. In 2017, the Platform focused its activities on bio.tools, Scientific benchmarking, Software deployment, and Software development best practices, which are highlighted below.

bio.tools: ELIXIR tools and services registry

This technical group aims to deliver a world-leading discovery portal (bio.tools) for bioinformatics software information. In 2017, the content of bio.tools expanded to over 10,000 entries and the registry migrated to a new version of its data model (biotoolsSchema 2.0). The usability of the portal was also improved by the development of new features, such as content sharing, and the enrichment of results with literature data. In September, the Platform published the first (beta) versions of the Tool Information Standards and the bio.tools Curation Guidelines [1].

The bio.tools developers published two new research papers on the integration of bio.tools and workbench environments. The first paper was published in April in GigaScience and presented ReGaTE, a software utility to automate the registration in bio.tools of services available in a Galaxy instance [2]. The second paper presented ToolDog (Tool DescriptiOn Generator), which facilitates the integration of tools registered in bio.tools into workbench environments [3].


The Tools Platform also started a joint Implementation Study with the ELIXIR Training Platform to integrate ELIXIR portals from a user perspective [4]. In addition, the Galaxy Working Group was selected in October 2017 to become an ELIXIR Community and is planning its first community meeting in March 2018 (see the New ELIXIR Communities chapter).

Scientific benchmarking and technical monitoring

Scientific benchmarking and technical monitoring of bioinformatics tools and services provides an objective way to ensure that bioinformatics tools are available, stable, robust and fit for purpose. In 2017, the first OpenEBench prototype was released at http://elixir.bsc.es. This benchmarking platform will provide guidance and software infrastructure for benchmarking and technical monitoring of bioinformatics tools, webservers and workflows. The first workshop organised around OpenEBench was held in September 2017 in Basel. The experience gained from establishing the ELIXIR Benchmarking Platform was published in a working paper, Lessons Learned: Recommendations for Establishing Critical Periodic Scientific Benchmarking [5].

Software deployment

The main goal of the Software deployment group is to support existing community efforts with their work on bioinformatics software deployments using traditional packages and containers. In October 2017, the ELIXIR Tools Platform organised an ELIXIR hackathon in Paris, France, on Biocontainers to accelerate and consolidate the Containers platform as a Service, and to integrate it with other groups within the ELIXIR Tools Platform. As a result of this event, a new ELIXIR Biocontainers Implementation Study was approved to begin in January 2018.

The Workflows and workbenches group was established by the ELIXIR Tools Platform to support the integration of bio.tools with integrated environments (Galaxy, Taverna, etc.) and to develop the link between discovery and execution. This group developed the ToolDog tool to facilitate the integration of tools registered in bio.tools into workbench environments [3].

[Figure: Number of entries in the bio.tools registry, January 2017 to January 2018]

Software development best practices

In July 2017, the Software Development Best Practice Working Group published a new paper to encourage developers, research institutes, and companies to adopt four best practices for open source development of life science research software [6].

The paper is the outcome of year-long discussions and deliberations driven by the Platform, together with the Software Sustainability Institute and the Netherlands eScience Center. They involved a wide range of researchers and developers, representing over 40 different institutes and organisations. As such, the recommendations present a broad consensus of the life-science research community.

The ELIXIR Tools Platform is funded through the ELIXIR-EXCELERATE project (Work Package 1: Tools Interoperability and Service Registry, and Work Package 2: Benchmarking) and Implementation Studies commissioned by the ELIXIR Hub.

1. http://biotools.readthedocs.io/en/latest/curators_guide.html
2. Doppelt-Azeroual O, Mareuil F et al. ReGaTE: Registration of Galaxy Tools in ELIXIR. GigaScience 2017, 6(6): 1-4 (doi: 10.1093/gigascience/gix022)
3. Hillion KH, Kuzmin I, Khodak A et al. Using bio.tools to generate and annotate workbench tool descriptions. F1000Research 2017, 6(ELIXIR):2074 (doi: 10.12688/f1000research.12974.1)
4. https://www.elixir-europe.org/activities/elixir-integration-user-perspective
5. Capella-Gutierrez S et al. Lessons Learned: Recommendations for Establishing Critical Periodic Scientific Benchmarking. 2017. URI: http://hdl.handle.net/2117/107279
6. Jiménez RC, Kuzak M, Alhamdoosh M et al. Four simple recommendations to encourage best practices in research software. F1000Research 2017, 6:876 (doi: 10.12688/f1000research.11407.1)

Data

Sustaining Europe’s life-science data infrastructure

The ELIXIR Data Platform provides a framework for developing ELIXIR’s sustainability strategy for life-science data resources. The main goal of the Platform is to establish quality metrics for data resources, to identify the key data resources across all life science domains, and to make data resources easier to find and access. In 2017, the main focus of the Platform was the selection of ELIXIR Core Data Resources and ELIXIR Recommended Deposition Databases. The Data Platform also piloted, for the first time, the selection of Implementation Studies through peer review involving external reviewers.

ELIXIR Core Data Resources and Recommended Deposition Databases

ELIXIR Core Data Resources are the primary focus of the Data Platform’s activities. Core Data Resources are a set of European life-science data resources (deposition archives and knowledgebases) that are of fundamental importance to life-science research and to the long-term preservation of biological data. In 2017, the Data Platform concluded the first round of the process to select the ELIXIR Core Data Resources and announced the initial list.

In addition to this list of Core Data Resources, ELIXIR compiled a list of databases that it recommends be used for the deposition of experimental data. The goal was to provide guidance to journals and funders on the appropriate repositories in which to publish open data in the life sciences. Many funding agencies have already shown an interest in recommending that their grantees use ELIXIR’s Deposition Databases.

The Core Data Resources form the backbone of ELIXIR’s sustainability strategy. The monitoring and evaluation of their usage will provide reliable measures of their scientific and economic value and will highlight the benefits of generating a sustainable infrastructure for open biological data.

Following the selection process, the Data Platform and the Core Data Resources leads agreed on a set of metrics on the Core Data Resources to be gathered annually, as a way of demonstrating their value and managing their life cycles. The plan for collating these quality indicators of ELIXIR’s Core Data Resources was published in August 2017 [1]. In September 2017, the Platform began the second round of the selection process for both ELIXIR Core Data Resources and ELIXIR Deposition Databases. The selection and evaluation will take place regularly and further resources will be included as the ELIXIR data infrastructure evolves.

Global coalition to sustain core data resources

ELIXIR is an active member of the initiative to establish a global coalition to sustain core data resources, which brings together senior managers of key databases and leaders of major funding organisations across the world. In March 2017, the group published a call for action for a global coalition to sustain core data resources [2], which identified the main shortcomings of the current funding of life-science data infrastructures and called for more fit-for-purpose infrastructure funding models. The work done by ELIXIR in establishing Core Data Resources in Europe and in identifying indicators to assess the importance of data resources has provided this global coalition with a framework that can be developed globally.

Data Platform Implementation Studies

In 2017, the Data Platform ran a peer review process to select a portfolio of Implementation Studies linked to the Platform to begin in 2018. The Platform published a Request for Proposals, along with the Criteria for Evaluation. The selection process relied on the assessment of independent experts in bioinformatics and bioinformatics service provision. Of the 17 applications received, seven studies were selected, involving 13 ELIXIR Nodes. This is the first time ELIXIR Implementation Studies have been selected via independent peer review. The goal of this approach was to drive excellence in the proposed work, to increase the transparency of ELIXIR resource allocation, and to provide feedback to ELIXIR Nodes regarding their service provision plans and objectives.

ELIXIR Core Data Resources

• ArrayExpress: Functional genomics data from high-throughput functional genomics experiments.
• CATH: A hierarchical domain classification of protein structures in the Protein Data Bank.
• ChEBI: Dictionary of molecular entities focused on ‘small’ chemical compounds.
• ChEMBL: Database of bioactive drug-like small molecules; it contains 2-D structures, calculated properties and abstracted bioactivities.
• EGA: A database of personally identifiable genetic and phenotypic data resulting from biomedical research projects.
• ENA: A database of nucleotide sequencing information, covering raw sequencing data, sequence assembly information, and functional annotation.
• Ensembl: Genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation, and transcriptional regulation.
• Ensembl Genomes: Genome browser for non-vertebrate genomes that supports comparative analysis, data mining, and visualisation of non-vertebrate genomes.
• Europe PMC: A repository that provides access to world-wide life sciences articles, books, patents and clinical guidelines.
• Human Protein Atlas: A knowledgebase that covers most human protein-coding genes, with information about the expression and localization of the corresponding proteins based on both RNA and protein data.
• The IMEx Consortium, represented by IntAct and MINT: IntAct provides a freely available, open source database system and analysis tools for molecular interaction data. MINT focuses on experimentally verified protein-protein interactions mined from the scientific literature by expert curators.
• InterPro: An umbrella resource to which many collaborating databases contribute. It enables the functional analysis of protein sequences, performed by classifying sequences into families and predicting the presence of domains and important sites.
• PDBe: A database of biological macromolecular structures.
• PRIDE: A database of mass spectrometry-based proteomics data, including peptide and protein expression information (identifications and quantification values), and the supporting mass spectra evidence.
• STRING-db: A database of known and predicted protein-protein interactions.
• UniProt: A comprehensive repository and resource for protein sequence and annotation data.

Long-term funding models for bioinformatics databases

To better understand the challenges in developing and securing long-term sustainable funding for key bioinformatics resources, ELIXIR funded an Implementation Study to explore and review sustainable funding models for a specific case: the UniProt knowledgebase. The Implementation Study ran from January to December 2017 and was carried out by ELIXIR Switzerland (SIB, Swiss Institute of Bioinformatics). The study resulted in a paper, published in the ELIXIR F1000Research channel in November 2017 [3]. The article presented and reviewed twelve funding models for data resources and applied them to the specific case of UniProt. It showed that most of the models lead to inconsistencies with open access or equity policies and proposed a new Infrastructure Model whereby funding agencies would set aside a fixed percentage of their research grant volumes, which would subsequently be redistributed to core data resources.

The ELIXIR Data Platform is funded through the ELIXIR-EXCELERATE project (Work Package 2: Data Resources and Services) and through ELIXIR Implementation Studies commissioned by the ELIXIR Hub.

1. Stockinger H, Barlow M, Cook C et al. Plan for collation of metrics and quality data at the ELIXIR Hub. Zenodo 2018 (doi: 10.5281/zenodo.1194123)
2. Data management: A global coalition to sustain core data. Nature 543, 179 (09 March 2017) (doi: 10.1038/543179a)
3. Gabella C, Durinx C and Appel R. Funding knowledgebases: Towards a sustainable funding model for the UniProt use case. F1000Research 2017, 6(ELIXIR):2051 (doi: 10.12688/f1000research.12989.1)

Compute

Access, exchange and storage

The ELIXIR Compute Platform is developing a robust technical infrastructure for accessing, transferring, exchanging and analysing biological data. It aims to provide cloud, computing, storage and access services for the research community. The objective is to integrate the individual technical components provided by the ELIXIR Nodes into a seamless service provision system for the life-science research community. The Compute Platform works closely with the ELIXIR Use Cases and with the ELIXIR Training Platform to ensure that technical solutions address their specific needs. In 2017, the development of the Compute Platform services resulted in the first release of services to end-users.

ELIXIR Authentication and Authorisation Infrastructure (AAI)

The ELIXIR AAI allows a user to create an ELIXIR identity based on a pre-existing identity (e.g. Google, ORCID or the researcher’s home university) to authenticate access to multiple infrastructure services. This allows users to use their existing accounts to access ELIXIR services and helps operators of ELIXIR resources to manage user access to their services.

In 2017, several resources and services adopted ELIXIR AAI, including the AAI Gateway of the European Grid Infrastructure (EGI) to provide access to EGI services. The relying service provider network using the ELIXIR identity that underpins ELIXIR AAI also includes two commercial cloud service providers within the Helix Nebula Science Cloud project, and integrates with EUDAT’s B2ACCESS [1] service, which allows access to EUDAT services.

In addition to receiving ELIXIR-EXCELERATE funding, the AAI tasks were also supported through an ELIXIR Implementation Study in 2017, which accelerated the development and integration of the AAI services needed to support human data. The ELIXIR AAI is currently a production service used by an increasing number of service providers within the ELIXIR community, with three hundred identity providers, more than 1000 users, and tens of services connected to date.

1. https://www.eudat.eu/services/b2access


Transfer of large volumes of sensitive human data – demonstrators

Moving large volumes of data between sites, while maintaining confidentiality and security, is a key capability of the Compute Platform. In 2017, the Platform deployed GridFTP servers and integrated them with the ELIXIR AAI to enable regular file transfers between nine GridFTP servers, whilst providing a record of reliability and network performance. The transfer of sensitive data between the GridFTP servers was demonstrated in November 2017 during an ELIXIR Webinar, which gave an overview of improved support for researchers in accessing and processing sensitive data from the European Genome-phenome Archive (EGA, http://www.ega-archive.org). An ELIXIR Implementation Study, which started in November 2017, will support a number of ELIXIR Nodes in testing and integrating this service into production.

Cloud resources integration

The goal of the cloud integration task within the Compute Platform is to integrate the cloud resources affiliated with the ELIXIR Nodes. The Platform has been evaluating the EGI Federated Cloud model, especially within the context of the emerging European Open Science Cloud (EOSC) initiative, where it may be used as the federation model. The ELIXIR Compute Platform is engaging with EOSC through an ELIXIR Competency Centre, funded as part of the EOSC-Hub project.

The work of the ELIXIR Compute Platform was funded through the ELIXIR-EXCELERATE project (Work Package 4: Compute, Data access and exchange services) and Implementation Studies commissioned by the ELIXIR Hub. The development of the ELIXIR AAI was informed by close collaboration with the AARC and AARC2 (Authentication and Authorisation for Research and Collaboration) projects.

[Figure: Number of ELIXIR AAI users and number of research institutions enabled for ELIXIR AAI login, Jan 17 to Apr 18. Series: ELIXIR users; Home Organisation Identity Providers enabled for login (eduGAIN).]

[Figure: Number of registered resource providers connected to ELIXIR AAI at Jan 17, Oct 17 and Apr 18; data labels 9, 61 and 79.]

Interoperability

Integration of data and services

The ELIXIR Interoperability Platform aims to support people and machines in finding, combining and reusing datasets, as well as individual data records from different sources, across institutional, geographical and scientific domains. Implementing and promoting the ‘FAIR’ principles [1] (Findable, Accessible, Interoperable and Reusable) of data stewardship, the Platform encourages the life-science community to adopt standardised formats, metadata, vocabularies and identifiers. The Platform is driven by the needs of ELIXIR’s Use Cases and collaborates globally through a number of initiatives, including the Research Data Alliance, FORCE11, U.S. NIH data commons projects, and others.

Emerging interoperability services

In 2016, the major outcome of the Platform was the publication of the Interoperability Platform Roadmap, which defines the Platform’s technical and scientific strategy and outlines requirements for ELIXIR’s Interoperability services. In 2017, the Interoperability Platform started to implement this Roadmap and presented the first components of the emerging portfolio of ELIXIR interoperability services.

The activities that fall within the Interoperability Platform are divided into seven projects: (1) Service Framework; (2) Resource Markup (Bioschemas); (3) Identifiers (including mapping services); (4) Metadata services and Standards Registry; (5) Linked Open Data; (6) Workflow and Tools Interoperability (Common Workflow Language); and (7) Interoperability Knowledge Hub. In 2017, the Interoperability Platform focused its attention and efforts on Service Framework, Bioschemas, Common Workflow Language, and Identifiers. The Platform has also worked within the EOSCpilot project to pilot Data Catalogue interoperability, Common Workflow Language adoption, and identifier schemes within the European Open Science Cloud (see the EU Grants section).


Service Framework

The Interoperability Platform Roadmap defined general components of the interoperability service stack and identified existing resources – both within and outside ELIXIR – that can deliver these components most effectively. The needs analysis was captured in the Service Framework for the Interoperability Backbone. The first ELIXIR interoperability services selected as part of the Service Framework are as follows: Identifiers.org (https://identifiers.org); Ontology Lookup Service (https://www.ebi.ac.uk/ols); and FAIRsharing (https://fairsharing.org/, formerly BioSharing). The Platform developed a work plan to improve their interoperability with other Platform registries and to facilitate their longer-term sustainability. The Interoperability Platform also drafted criteria for FAIR Interoperability services and registries, resulting in the development of standard procedures and processes (SOPs) through which other candidate registries and resources can be incorporated into the Platform.

Bioschemas – universal markup for datasets and biological entities

Bioschemas is a community-driven metadata vocabulary and markup based on schema.org, which is tailored for the life sciences. In 2017, Bioschemas mobilised the life-science community to develop description profiles for data resources and data types. Bioschemas also encouraged researchers to develop, adopt and use the specifications through a series of workshops in Hinxton, UK, and at various international meetings, including the Open Science Fair in Athens. In October 2017, in Hinxton, representatives from over 30 different biological resources, including major international resources such as UniProt and PDBe, tested and adopted at least one of the Bioschemas specifications. As part of this effort, Bioschemas also expanded its scope and developed specifications for more types of life-science data.

By the end of 2017, twelve different specifications [2] had been developed, describing the general properties of datasets, as well as specific markup for data types, samples, proteins, and markup for other bio resources, such as laboratory protocols and tools. Tools to assist with markup, validation and indexing are currently being piloted.
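To make the idea of schema.org-based markup more concrete, the following is a minimal sketch in Python of how a data resource might generate JSON-LD for a dataset page. It uses only generic schema.org properties (`name`, `description`, `url`, `identifier`) on the `Dataset` type; it is not a validated Bioschemas profile, and the function names and all example values are hypothetical.

```python
import json


def dataset_jsonld(name: str, description: str, url: str, identifier: str) -> str:
    """Serialise a minimal schema.org Dataset description as JSON-LD.

    Assumption: only generic schema.org properties are used here; a real
    Bioschemas profile may require or recommend further properties.
    """
    markup = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
    }
    return json.dumps(markup, indent=2)


def as_script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in the script element used to embed it in a web page."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


if __name__ == "__main__":
    # Hypothetical dataset; the values are placeholders, not a real resource.
    print(as_script_tag(dataset_jsonld(
        name="Example protein dataset",
        description="Illustrative record used to show the markup shape.",
        url="https://example.org/datasets/1",
        identifier="example:1",
    )))
```

Embedding the markup in the page itself, rather than publishing it separately, is what allows generic web crawlers to discover the dataset description alongside the human-readable content.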

Common Workflow Language

The Common Workflow Language (CWL) supports the reproducibility and interoperability of workflows and analysis tools. It helps scientists and bioinformaticians to describe analysis tools and workflows, which can then be used across a variety of platforms. The Interoperability Platform works in close partnership with the CWL grassroots community and has adopted CWL’s approach to improve reproducibility and interoperability of life science data and research.

In 2017, the Platform worked with the Marine Metagenomics Use Case on the description of their analysis pipelines. This led to a funded ELIXIR Implementation Study in 2018 to support the integration of CWL with ELIXIR tools, and the interoperability of pipelines with international partners. Along with the Tools and Training Platforms, the Interoperability Platform initiated a programme of workshops, best practice guides, and community engagement efforts, notably with the Galaxy Community. Support for CWL in Galaxy is on the official project roadmap. It is currently being implemented, and a significant part of the CWL standard is already implemented in Galaxy’s backend. The next important step is to adapt its web frontend.

Identifiers

The vast majority of data collections in the life sciences are now accessible online. It is therefore crucial to have a stable and persistent means to identify and reference data. Identifiers.org, run by EMBL-EBI, is an established resolving system, which provides resolvable identifiers for life-science resources (databases). In 2017, the Interoperability Platform ran an ELIXIR Implementation Study to establish Identifiers.org as a core ELIXIR service to provide stable and resolvable identifiers for life-science data. This Implementation Study updated the registry that underpins the service, to include the majority of resources run by ELIXIR Nodes.

To address the needs of scientific journals, Identifiers.org has implemented compact identifiers, providing an easy and human-readable way of referencing data in scientific papers. This work was organised through a Force11 identifiers group and carried out in collaboration with an equivalent meta-resolver, name-2-thing (n2t), which is based at the California Digital Library, USA. This collaboration resulted in the global resolution of compact identifiers for biomedical data, which was presented in Nature Scientific Data (2018)3.

Experts within ELIXIR’s Interoperability Platform lead the data interoperability task within the EOSCpilot project. In 2017, this work culminated in a first draft of the strategy and recommendations to help users and services find and access datasets across several scientific disciplines (see the EU Grants section).
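The compact identifiers described above take the form `prefix:accession` and can be turned into a resolvable link by appending them to a resolver's base URL. The sketch below shows that idea only. The function names are hypothetical, the example identifier `taxonomy:9606` is used purely for illustration, and the code builds URLs without contacting either resolver.

```python
# Base URLs of the two meta-resolvers mentioned in the text.
RESOLVERS = {
    "identifiers.org": "https://identifiers.org/",
    "n2t.net": "https://n2t.net/",
}


def parse_compact_identifier(compact_id):
    """Split 'prefix:accession' at the first colon.

    Splitting at the first colon only means accessions that themselves
    contain a colon are kept whole.
    """
    prefix, separator, accession = compact_id.partition(":")
    if not separator or not prefix or not accession:
        raise ValueError(f"not a compact identifier: {compact_id!r}")
    return prefix, accession


def resolution_url(compact_id, resolver="identifiers.org"):
    """Build the URL at which a resolver would be asked for this identifier."""
    prefix, accession = parse_compact_identifier(compact_id)
    return f"{RESOLVERS[resolver]}{prefix}:{accession}"


print(resolution_url("taxonomy:9606"))
# prints: https://identifiers.org/taxonomy:9606
print(resolution_url("taxonomy:9606", resolver="n2t.net"))
# prints: https://n2t.net/taxonomy:9606
```

Because both resolvers accept the same `prefix:accession` string, an author can cite the compact form in a paper and leave the choice of resolver to the reader or publisher.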

Looking ahead

The overall goal of the Interoperability Platform remains to enable scientists to find, access, combine, and analyse multiple datasets. It also aims to enable data and service providers to deliver findable, accessible, interoperable and reusable datasets and services, and to facilitate the adoption of global standards. In 2018, the Interoperability Platform will build on the work of 2017. It will also focus on establishing its first recommended portfolio of services, and on improving its guidelines and knowledge hub for ELIXIR members. It will propose criteria and a selection procedure for ELIXIR recommended interoperability services, and it will organise a call for proposals for ELIXIR interoperability services. An ELIXIR Implementation Study will also focus on validation services for metadata and common formats for the Platform’s partners to use. Proof-of-concept validation services will also be implemented for the ELIXIR Plant Use Case, among others.

The work of the ELIXIR Interoperability Platform was funded through the ELIXIR-EXCELERATE project (Work Package 5: the ELIXIR Interoperability Backbone). The work on Identifiers.org and Bioschemas was funded by the ELIXIR Hub through ELIXIR Implementation Studies. Other sources of grant funding, including the CORBEL, EOSCpilot, and BioExcel projects, have contributed to activities within the Platform.

1. Wilkinson MD, Dumontier M et al. The FAIR Guiding Principles for scientific data. Scientific Data 2016, 3 (doi: 10.1038/sdata.2016.18)
2. http://bioschemas.org/specifications/
3. Wimalaratne SM et al. Uniform resolution of compact identifiers for biomedical data. Scientific Data 2018, 5:180029 (doi: 10.1038/sdata.2018.29)


[Figure: Interoperability Platform Services Framework. Labels recoverable from the diagram: Standards and APIs; Applications; Integration; Pipelines; Identifier, resolution, versioning, provenance; Standards registry; Ontology; API description; Identifier mapping; Tools registry; Linked data; Tools and workflow descriptions; Citation implementation; Workflows registry; Annotation and curation; Dataset description; Identifier authority; Search and Query; Data integration; Validation services; BYOD; BYOW; BYOAPI]

Training

Professional skills for managing and exploiting data

The provision of bioinformatics training to help life-science researchers work effectively with ELIXIR resources is a key priority. The ELIXIR Training Platform is therefore building a sustainable training infrastructure that aims to provide access to training courses, tools, resources, and expertise, including quality assurance and monitoring. In 2017, with the delivery and completion of the basic elements of the Training Infrastructure, the ELIXIR Training Platform entered the second phase of its work under the ELIXIR-EXCELERATE grant. This second phase aims to extend, strengthen and consolidate current efforts, and to apply the Platform’s defined standards and descriptors to all content it produces.

The activities of the Training Platform are made up of the following components: (1) ELIXIR Courses for researchers, developers and trainers, (2) Training evaluation, and (3) Training infrastructure, including the ELIXIR Training Portal TeSS, the ELIXIR-SI e-learning platform, and the Virtual Coffee Room. These activities are implemented in the ELIXIR Nodes under the responsibility of the Training Coordinators Group (TrCG).

ELIXIR Training Courses

In 2017, the ELIXIR Training Platform organised 44 training events for researchers, trainers and developers under the umbrella of ELIXIR-EXCELERATE; several hundred further training events were organised by the Nodes themselves. One example of such training for researchers is the Software and Data Carpentry workshops. After a successful pilot1, ELIXIR set up an agreement with the Carpentry Foundation to roll out Software and Data Carpentry courses within ELIXIR, to build an instructor pool, and thus to help researchers access the ELIXIR infrastructure more easily.

The Platform also concluded the Train-the-Trainer (TtT) Pilot programme, which consisted of seven pilot courses organised in 2016 and 2017, the development of training materials, and the systematic collection of feedback from course participants. The results and lessons learnt were published in a paper in the ELIXIR F1000R channel2, and all materials produced are available online3. The TtT programme has been running at full speed since then, and will be further expanded from 2018 on with the ELIXIR TtT Exchange, which will facilitate the participation of ELIXIR members in TtT courses.

To make sure ELIXIR courses effectively exploit computing resources, the Training Platform started an Implementation Study to define the best mechanisms to request and use ELIXIR Node cloud resources for bioinformatics training. Ready-to-run virtual machines that contain an operating system and pre-installed analysis software should improve the portability and reproducibility of ELIXIR courses.

1. Pawlik A, van Gelder CWG, Nenadic A et al. Developing a strategy for computational lab skills training through Software and Data Carpentry: Experiences from the ELIXIR Pilot action. F1000Research 2017, 6:1040 (doi: 10.12688/f1000research.11718.1)
2. Morgan SL, Palagi PM, Fernandes PL et al. The ELIXIR-EXCELERATE Train-the-Trainer pilot programme: empower researchers to deliver high-quality training. F1000Research 2017, 6:1557 (doi: 10.12688/f1000research.12332.1)
3. https://github.com/TrainTheTrainer/EXCELERATE-TtT


Figure 1: Overall satisfaction of participants of ELIXIR Training courses
Excellent: 10%
Very good: 68%
Good: 16%
Satisfactory: 5%
Poor: 1%

Training evaluation

Using a consistent set of Key Performance Indicators (KPIs), the Training Platform started to systematically collect feedback from courses organised with the support of the ELIXIR-EXCELERATE project and of ELIXIR Nodes. The data collected to date come from 2,200 respondents and cover 144 courses organised between September 2016 and July 2017 across twelve participating ELIXIR Nodes. According to the survey, nearly 80% of respondents rated the course they attended as ‘excellent’ or ‘very good’ (Figure 1), and 87% of them would recommend their course to colleagues. The data collected six months after participants had attended a course indicate that attending a course has improved their ability to handle data or has improved their work efficiency (Figure 2). Following a detailed evaluation of the KPIs used and the results collected to date, the evaluation methodology has been officially adopted by ELIXIR and will be used to collect information for all ELIXIR training courses. The next iteration of ELIXIR training course feedback data collection began in August 2017 and will continue until July 2018.

Figure 2: Concrete ways ELIXIR training courses helped their participants
It improved my ability to handle data: 44%
It improved my interactions with the bioinformatician analysing my data: 28%
It improved my overall efficiency: 22%
It did not help as I do not use the resources covered in the course: 6%

Training Infrastructure

TeSS: ELIXIR Training portal

The TeSS portal allows scientists to browse, discover and organise life-science training events and materials that have been aggregated from ELIXIR Nodes and third-party providers (e.g. RI-Train, BioEXCEL, GOBLET). By the end of 2017, it included nearly 300 events and over 800 training materials from 45 providers. In 2017, the TeSS team started an ELIXIR Implementation Study to cross-link materials in TeSS with relevant tools registered in the ELIXIR Tools and Service Registry (bio.tools). The ultimate goal is to enable researchers to discover and use ELIXIR resources across domains and platforms through an intuitive graphical interface, based on diagrams of the most commonly used bioinformatics workflows.

ELIXIR e-Learning

Throughout the year, ELIXIR courses were broadcast via a video conferencing system to allow numerous, geographically distributed users to attend training. Bioinformatics tools and services were also embedded into the ELIXIR-SI eLearning Platform, easing course participants’ access to HPC, cloud-based resources, and containers, and overcoming the technical problems that participants can experience when trying to access training resources remotely. Overall, eleven courses and over 320 course participants benefited from the ELIXIR e-learning resources.

Virtual Coffee Room

The Virtual Coffee Room (VCR), a web-based platform, was released in March 2017 to ease the exchange of information among ELIXIR developers and trainers, to let developers share questions, tasks and issues about software development, and to identify training needs more quickly. In the future, other ELIXIR communities will explore additional uses for the VCR, for instance as a help-desk platform for ELIXIR services.

Outlook

In 2018 and the remainder of the first ELIXIR programming cycle (2014–2019), the Training Platform will further develop a coherent portfolio of Train-the-Developer, Train-the-Researcher, and Train-the-Trainer courses, and it will further expand the Platform’s training activities in areas identified as training needs, such as data stewardship, data management, and training in ELIXIR resources. Another area of focus will be to connect individual training courses into learning paths that are tailored to the competencies and needs of individual researchers, including in industry.

The ELIXIR Training Platform is funded through the ELIXIR-EXCELERATE project (Work Package 11: ELIXIR Training Programme). The Platform actively collaborates with partner initiatives and projects (GOBLET, Software and Data Carpentries, CORBEL, RITrain and others).


Platform leaders

Tools

Søren Brunak

Data

Jo McEntyre

Alfonso Valencia

Compute

Ludek Matyska

Steven Newhouse

Tommi Nyrönen

Chris Evelo

Helen Parkinson

Interoperability

Carole Goble

Training

Patricia Palagi

Celia van Gelder


Gabriella Rustici

Christine Durinx

Use Cases and Communities

ELIXIR Use Cases and Communities

The ELIXIR Use Cases drive the work of the ELIXIR Platforms by defining their bioinformatics needs and requirements. This close collaboration ensures that the services developed by the ELIXIR Platforms are fit for purpose and serve the needs of their research communities. The activities of the existing ELIXIR Use Cases have so far been funded principally through the ELIXIR-EXCELERATE grant. They bring together experts to develop specialised standards and services in their respective domains, and also provide feedback on the Platform services, helping to ensure that they are practical and useful.

In September 2017, the ELIXIR Heads of Nodes agreed to continue the four existing ELIXIR Use Cases, and also to establish three new ‘Communities’, each serving a specific group of researchers. The current portfolio of ELIXIR Use Cases and Communities thus now consists of:
• Human data: Developing long-term strategies for managing and accessing sensitive human data
• Rare diseases: Supporting the development of new therapies for rare diseases
• Marine metagenomics: Developing a sustainable metagenomics infrastructure to nurture research and innovation in marine science
• Plant science: Developing an infrastructure to facilitate genotype-phenotype analyses for crop and tree species
• Proteomics: Supporting research on the expression and interaction of proteins
• Metabolomics: Providing infrastructure services for metabolite identification
• Galaxy: Integrating the Galaxy platform with ELIXIR resources and services


Use Cases as emerging Communities

With the first programme cycle coming to an end in 2018, the ELIXIR Head of Nodes Committee reviewed in 2017 the existing model of four Use Cases: Human Data, Plant Sciences, Rare Diseases, and Marine Metagenomics. Following this evaluation, the Head of Nodes Committee decided to change the name of Use Cases to better reflect how they are organised and how they contribute to ELIXIR’s development. Starting in 2019, when ELIXIR’s next Scientific Programme (2019–2023) begins, Use Cases will be referred to as “Communities”. This will avoid ambiguity, especially in systems and software engineering, where the term ‘use case’ has a well-established meaning. The word ‘community’ also better characterises how ELIXIR coordinates its expertise and services for scientists within a particular domain. Communities will function similarly to Use Cases, and their activities will be funded through a variety of sources, including Hub-funded Commissioned Services, project-based funding from the European Commission, and commitments from ELIXIR Nodes.

Human data Use case

The ELIXIR Human Data Use Case is building the technical infrastructure required for researchers to discover, combine, and exchange controlled-access human data, while complying with data-privacy and data-security requirements. The backbone of the Human Data Use Case is the European Genome-phenome Archive (EGA)1. The Use Case extends and generalises the EGA system of access authorisation and secure data transfer, and makes it available to researchers across the ELIXIR Nodes. The Use Case works closely with the Global Alliance for Genomics and Health (GA4GH) in developing and establishing global standards for the sharing and exchange of genomics data.

ELIXIR Beacons

The GA4GH Beacon is a lightweight web platform that allows any genomic data centre in the world to make its data discoverable. Users can ask Beacons straightforward yes-or-no questions like, ‘Do any of these data resources have genomes with this allele at that position?’ The search result tells a researcher whether it is worth making a data access request for their research, saving valuable time and resources.

The collaboration between ELIXIR and GA4GH on the ELIXIR Beacon Project expanded in 2017 to develop the network of ELIXIR Beacons and to improve the discoverability of European genomics data. Additional goals of the collaboration were to develop new features and to add security measures to attract stakeholders with more sensitive data sets while minimising risks to individual privacy. The new ELIXIR Beacon Network will allow users to query all ELIXIR Beacons simultaneously. The integration of the Beacon network with the ELIXIR Authentication and Authorisation Infrastructure (AAI) will also enable streamlined access to sensitive human data in a secure and safe manner.

In October 2017, during the GA4GH 5th Plenary Meeting in Orlando, USA, the Beacon project was named one of the GA4GH Driver projects, with the aim of driving, and providing guidance on, the development of global standards for genomics data. The development of the Beacon network continues in 2018, and the first release of the ELIXIR Beacon specifications is planned for the last quarter of 2018.
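The yes-or-no query model described above, and the idea of asking every Beacon in a network at once, can be sketched with an in-memory toy. This is not the GA4GH Beacon API: the class and function names, the Beacon names and the variant data are all hypothetical, and a real Beacon is queried over the web and may apply access controls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlleleQuery:
    """'Do you have genomes with this allele at that position?'"""
    reference_name: str   # chromosome, e.g. "1"
    position: int         # coordinate of the variant
    alternate_bases: str  # the allele being asked about


class ToyBeacon:
    """A data centre that only ever answers True or False."""

    def __init__(self, name, variants):
        self.name = name
        # Each variant is stored as (chromosome, position, allele).
        self._variants = set(variants)

    def has_allele(self, query):
        key = (query.reference_name, query.position, query.alternate_bases)
        return key in self._variants


def query_network(beacons, query):
    """Ask every Beacon the same question; return one yes/no per Beacon."""
    return {beacon.name: beacon.has_allele(query) for beacon in beacons}


# Hypothetical Beacons holding made-up variants.
network = [
    ToyBeacon("node-a", [("1", 1000, "T"), ("2", 2500, "G")]),
    ToyBeacon("node-b", [("1", 1000, "C")]),
]

answers = query_network(network, AlleleQuery("1", 1000, "T"))
print(answers)
# prints: {'node-a': True, 'node-b': False}
```

Only the boolean answer leaves each Beacon, which is why a researcher can learn where a data access request would be worthwhile without seeing any individual-level data.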

The transfer of large volumes of sensitive human data

In 2017, the Use Case worked closely with the ELIXIR Compute Platform to allow the secure transfer of sensitive human data stored within the European Genome-phenome Archive (EGA), using ELIXIR AAI. A demonstration of this sensitive data transfer was presented to researchers in November 2017 via an ELIXIR Webinar, which gave an overview of the improved support for accessing and processing sensitive data from the EGA.

Facilitating the re-use of human data

Following up on the results of the ELIXIR Implementation Study ‘Genomic data management for TraIT using the EGA’, the Human Data Use Case published a second research paper, presenting a technical solution for linking the Dutch data portal (TranSMART) and Galaxy with EGA, for the reuse of human translational research data2. TraIT (Translational research IT) is a project by the Dutch Center for Translational Molecular Medicine (CTMM) to implement an IT infrastructure for translational biomedical research. Linking TraIT’s TranSMART portal and the Galaxy platform with EGA enabled Dutch researchers to use EGA as the long-term storage solution for raw data.

1. http://www.ega-archive.org/
2. Zhang C, Bijlard J, Staiger C et al. Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data. F1000Research 2017, 6:14889 (doi: 10.12688/f1000research.12168.1)


Rare diseases Use case

The ELIXIR Rare Diseases Use Case aims to build a portfolio of ELIXIR resources to address the needs of the rare diseases research community. The goal of this Use Case is to create a federated infrastructure that will enable researchers to discover, access, and analyse different rare disease repositories across Europe. It is doing this in partnership with other European infrastructures and projects, namely RD-Connect, BBMRI-ERIC and E-Rare.

Catalogue of Rare Disease data resources and tools

Based on a user survey carried out in 2016 and 2017, the Rare Diseases Use Case prioritised the first resources to include in the ELIXIR infrastructure and published a catalogue of resources, data sources and methods for the rare disease communities. The first version of the resulting catalogue – released in February 2017 – comprised 51 resources and analysis tools. Throughout 2017, the catalogue expanded to over 100 different resources, most of which are now registered in the ELIXIR Service and Tools registry (bio.tools) as part of a specific Rare Diseases collection1. The rare disease portfolio of resources will be regularly evaluated using a selection of datasets and benchmarking strategies. The development of the evaluation procedure was informed by the ELIXIR Tools Platform as part of the ELIXIR benchmarking strategy.

Rare Disease training capacity and needs survey

In the first half of 2017, the Rare Diseases Use Case carried out a survey to collect data about the training capacity and needs of the rare diseases research community. The goal was to use the data to design new training courses and workshops, to be developed in collaboration with the ELIXIR Training Platform, that address the existing skills gap within the community.

Rare Diseases Implementation Studies

An ELIXIR Implementation Study was set up to connect the RD-Connect platform for rare disease research2 with the European Genome-phenome Archive (EGA)3. This Implementation Study developed a solution that allows authorized users to visualize files stored in the EGA through a genome browser (such as Genome Maps or the Integrated Genome Browser) that is integrated with the RD-Connect platform. The results of this Implementation Study were presented to researchers in an ELIXIR webinar in early 20184.

The second ELIXIR Implementation Study in 2017 aimed to map out the requirements for making sources of data FAIR, with a particular focus on the interoperability of molecular rare disease data, as well as on enabling federated queries5. This Implementation Study improved variant-phenotype mapping and visualization, and introduced better ontology-based annotation of the data. It also identified problems and challenges in making rare disease data FAIR.

1. https://goo.gl/LSfv9k
2. https://platform.rd-connect.eu
3. Visualization of aligned genomics data for rare diseases (RD-Connect) as a driver for real-time access of controlled data at the EGA: https://www.elixir-europe.org/visualisation-aligned-rd-data
4. https://www.elixir-europe.org/events/elixir-webinar-visualisation-raredisease-genomics-data
5. Interpretation of phenotypic and genotypic variation for rare diseases in terms of biological pathways: https://www.elixir-europe.org/about-us/implementation-studies/interpretation-phenotypic-and-genotypic-variation


Marine metagenomics Use case

The Marine Metagenomics Use Case aims to develop a sustainable metagenomics infrastructure to enhance research and industrial innovation within the marine domain. Marine metagenomics resources range from deposition archives with research data output to highly dynamic knowledgebases that aggregate and process research data through manual curation and complex analysis pipelines.

Data standards for the marine research community

In June 2017, the Marine Metagenomics Use Case published its first paper on best practices1. The paper proposes best practice as a foundation for a community standard to enable reproducibility and better sharing of metagenomics datasets, leading ultimately to greater reuse and repurposing of metagenomics data. It outlines best practice for the reporting of metagenomics workflows across four essential steps: (1) sampling, (2) sequencing, (3) data analysis, and (4) data archiving, and highlights essential variable parameters and common data formats in each step.

Marine-specific data resources

In March 2017, the Use Case launched the Marine Metagenomics Portal2, which contains three contextual and sequence reference databases: MarRef, MarDB and MarCat. MarRef3 is a database of completely sequenced marine prokaryotic genomes, MarDB4 is a database of sequenced marine prokaryotic genomes regardless of the level of completeness, and MarCat5 is a catalogue of marine genes and proteins derived from metagenomics samples. This Use Case also released two additional databases: ITSoneDB6, a comprehensive collection of marine fungal ribosomal RNA Internal Transcribed Spacer 1 (ITS1) sequences to support metabarcoding surveys of fungal and other microbial eukaryotic environmental communities, and the Eukaryotic gene catalogue7, a unigene catalogue built from the samples collected by the Tara Oceans expedition.

Metagenomics analysis pipelines

Working under the umbrella of the ELIXIR Marine Metagenomics Use Case, EMBL-EBI significantly updated its Metagenomics resource (EMG)8, including an overhaul of the taxonomic profiling section of the pipeline and updates to the underlying reference databases and tools. These updates facilitated higher-resolution taxonomic assignments, which helped to classify over 70% of previously unclassified 16S rRNAs. EMG also became the first adopter of the Common Workflow Language (CWL) within the Marine Metagenomics Use Case. The adoption of CWL enabled the existing in-house pipeline to be replaced with considerably simpler CWL code. All EMG workflow descriptions are now available in a public repository9.

The Marine Metagenomics Portal also improved its META-pipe pipelines to enhance the precision and accuracy of biodiversity and function analysis, and released MAR BLAST10, a search engine for interrogating marine metagenomics datasets. This tool enables BLAST searches to be performed on all genes and protein-coding sequences from the marine databases MarRef, MarDB and MarCat.
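To give a sense of what a CWL description looks like, the sketch below builds a minimal `CommandLineTool` document for the `echo` command and serialises it as JSON, which CWL accepts alongside YAML. It is a generic illustration under that assumption, not one of the EMG workflow descriptions; the function name `echo_tool` is hypothetical.

```python
import json


def echo_tool():
    """Return a minimal CWL v1.0 CommandLineTool description as a dict."""
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        # The program to run.
        "baseCommand": "echo",
        # One string input, passed as the first command-line argument.
        "inputs": {
            "message": {
                "type": "string",
                "inputBinding": {"position": 1},
            }
        },
        # This toy tool declares no output files.
        "outputs": [],
    }


tool = echo_tool()
document = json.dumps(tool, indent=2)
print(document)
```

Saved to a file, a document like this could be handed to a CWL runner together with a value for `message`. The same description is then reusable on any platform that implements the standard, which is the portability the text refers to.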

1. ten Hoopen P, Finn RD et al. The metagenomic data life-cycle: standards and best practices. GigaScience 2017, 6(8):1–11 (doi: 10.1093/gigascience/gix047)
2. https://mmp.sfb.uit.no
3. https://mmp.sfb.uit.no/databases/marref/#/
4. https://mmp.sfb.uit.no/databases/mardb/
5. https://mmp.sfb.uit.no/databases/marcat/
6. http://itsonedb.cloud.ba.infn.it/
7. https://www.ebi.ac.uk/ena/data/view/ERZ480625
8. https://www.ebi.ac.uk/metagenomics/
9. https://github.com/EBI-Metagenomics/ebi-metagenomics-cwl
10. https://mmp.sfb.uit.no/blast/


Plant science Use case

The ELIXIR Plant Science Use Case is building a common technical infrastructure and associated social practices to support plant genotype-phenotype analysis based on the widest available public datasets. The goal is to make plant genotypic and phenotypic data easier to find, integrate and analyse, by making them FAIR.

Standards for plant data and metadata

The Plant Sciences Use Case developed and proposed an extension of the Minimal Information about Plant Phenotyping Experiments (MIAPPE) v1.0 specification1. This extension is now out for wider community consultation, and the Use Case continues to work on the model. During the subsequent stages of MIAPPE’s development, special attention was given to biosource attributes for non-crop species, such as forest trees, which were not well represented by the previous specification. The specification was also complemented with a list of proposed ontologies and expected data types to help users annotate their data. This revised specification is now being considered for potential adoption as MIAPPE v1.1. A workshop organised in Oeiras, Portugal, in September 2017 also produced the backbone of the MIAPPE specification integration within RDF (Resource Description Framework), which is compliant with RDA-WDI standards (Research Data Alliance – Wheat Data Interoperability).

To ensure the broad adoption of the data and the extended standards, Plant Sciences Use Case members played an active role in establishing the formal governance structure for MIAPPE, to serve as a persistent body to integrate, validate and promote efforts to develop standards in this area. Two members of the Plant Science Use Case were selected to serve on the initial steering committee. This Committee will ensure that information is exchanged between relevant projects. It will also identify future priority areas and establish working groups to address these priorities, and it will identify opportunities for outreach and funding.

1. http://www.miappe.org
2. https://www.ebi.ac.uk/ols/ontologies/co_357
3. https://bitbucket.org/PlantExpAssay/ontology


The Use Case also developed new ontology lists to address gaps in existing vocabularies. The Woody Plant Ontology2 was released in August 2017. It provides all variables used for woody plant observations, collected from various past and ongoing projects at national and international levels. The Plant Experimental Assay Ontology3 focuses on describing the pipelines of manipulations performed from specimens to data. It contains entities from three distinct realms (biological, physical and data), including experimental products, their relations, and the protocols describing the manipulation of the products.

Plant data discovery and access – Breeding API

The Plant Science Use Case participated in the Breeding API (BrAPI) project, which aims to develop and implement a Web Service API for exchanging data on plant material, and on phenotyping and genotyping, mainly for breeding purposes. The Plant Science Use Case also organised three ‘Bring Your Own Data’ hackathons to showcase the potential of FAIR data in the context of plant research and to demonstrate how to ensure the interoperability of plant domain data, using MIAPPE and ELIXIR Interoperability Platform resources.

Use Case leaders

Human data

Serena Scollen

Thomas Keane

Jordi Rambla

Ivo Gut

Marco Roos

Rare diseases

Serena Scollen

Marine metagenomics

Nils Peder Willassen

Rob Finn

Plant sciences

Paul Kersey

Celia Miguel


New ELIXIR Communities

Following the review of the existing ELIXIR Use Cases, the Head of Nodes Committee invited research communities in ELIXIR to submit proposals to become recognised ELIXIR Communities.

From the submitted proposals, the Head of Nodes Committee selected three new Communities, covering Proteomics, Metabolomics and Galaxy. In 2017, these three new Communities established their structure and leadership and prepared their work programme for 2018. A further seven Community proposals will undergo evaluation during 2018.

Proteomics

The Proteomics Community aims to align ELIXIR activities with the needs of scientists researching protein expression and interactions. Merging currently available and future sustainable proteomics resources into existing ELIXIR Platforms and Use Cases will help to integrate proteomics data with multi-omics data. The Proteomics Community will also improve data processing and analysis pipelines, and create guidelines for proteomics data management and annotation. Representatives from eleven Nodes discussed ELIXIR’s activities in proteomics at a strategic meeting held in March 2017. Evidence-based recommendations arising from this meeting were published in ELIXIR’s F1000 Research channel1.

Metabolomics

The Metabolomics Community will facilitate an ELIXIR infrastructure for metabolite identification, in order to help scientists to better understand the biochemistry of organisms. The tools used for metabolite identification produce large data sets that are more efficiently analysed, reported and stored using resources connected by ELIXIR. Representatives from ten Nodes met in April 2017 to discuss metabolomics within the scope of ELIXIR. As with the Proteomics Community, the outcomes of this meeting have been described in a paper published in ELIXIR’s F1000 Research channel2.

Galaxy

Galaxy is a workflow management system that removes the need for users to compile and install tools, while also facilitating the sharing of data and results so that science is reproducible. The Galaxy Working Group formed within the Tools Platform. As a recognised Community, Galaxy will continue to support other Communities, such as the Proteomics and Metabolomics Communities. It will also develop a strategy with the Nodes to increase the availability of data visualisation tools, to integrate the Galaxy resources with ELIXIR AAI, and to further develop training materials and events.

1. Vizcaíno JA, Walzer M, Jiménez RC et al. A community proposal to integrate proteomics activities in ELIXIR. F1000Research 2017, 6:875 (doi: 10.12688/f1000research.11751.1)
2. van Rijswijk M, Beirnaert C, Caron C et al. The future of metabolomics in ELIXIR. F1000Research 2017, 6(ELIXIR):1649 (doi: 10.12688/f1000research.12342.2)

Members

ELIXIR Nodes updates

Members: Belgium, Czech Republic, Denmark, EMBL-EBI, Estonia, Finland, France, Germany, Hungary, Ireland, Israel, Italy, Luxembourg, Netherlands, Norway, Portugal, Slovenia, Spain, Sweden, Switzerland, UK

Observers: Greece

Belgium

• ELIXIR Belgium officially launched in Ghent in February with a one-day event, including a data management workshop
• Finalised and approved the ELIXIR Collaboration Agreement
• Started the Implementation Study ‘ELIXIR Integration from a User perspective’ and is due to participate in five more Implementation Studies
• Organised a BYOD hackathon together with ELIXIR Netherlands, as well as a workshop on one of ELIXIR Belgium’s Node services (PLAZA), and two training courses on data-mining and data-processing
• Co-organised the European Galaxy Developer Workshop (Strasbourg), the ELIXIR/GOBLET/GTN hackathon for Galaxy training material re-use (Cambridge), and the data hackathon of the Galaxy Community Conference (Montpellier)
• Hosted the ELIXIR Innovation and SME Forum on ‘Data-Driven Innovation in Food, Nutrition and Microbiome’ in Brussels
• Supported the Belgian Metabolomics Day and the Benelux Bioinformatics Conference, presenting ELIXIR Belgium at these events
• Implemented an automated procedure for Belgian training events to aggregate training information on TeSS

Czech Republic

• Secured an additional grant from the national research infrastructure development fund
• Successfully passed an external interim evaluation of the ELIXIR Czech Republic infrastructure project
• Completed the ELIXIR Implementation Study for ELIXIR AAI Production 2017 (with ELIXIR Finland)
• Started work on one new ELIXIR Implementation Study (DataMovement: ELIXIR Proof of concept study on the availability of big datasets on remote compute infrastructure), and was involved in two more ELIXIR Implementation Studies in 2018 (AAI Production 2018, and Towards Data Stewardship in ELIXIR)
• Organised workshops and training events related to ELIXIR CZ services, including: Advanced in silico drug design workshop, RNA-seq Chipster online course, and a Repeat Explorer workshop
• Presented ELIXIR CZ at a day on national Research Infrastructures organised by the Czech Ministry of Education, Youth and Sport
• Ran an ELIXIR Staff Exchange project with EMBL-EBI to work on the 3DPATCH project
• Organised an annual ELIXIR CZ conference for infrastructure partners and users

Denmark

• Expanded the ELIXIR registry of bioinformatics tools and data services (https://bio.tools) to over 10,000 entries in total from 681 contributors and 293 domains, representing most major European service providers
• Published schema (https://github.com/bio-tools/biotoolsSchema/) and ontology (https://github.com/edamontology/) releases for the formalised syntactic and semantic description of tools
• Organised or participated in five events throughout 2017 around the development and use of bio.tools
• Ran a studentship scheme supporting students to work on curation-focused mini-projects with an impact on bio.tools content growth and quality
• Co-initiated ELIXIR Implementation Studies for “Architecture for Software Containers” and “ELIXIR Integration from a User perspective”
• Helped create a community to integrate proteomics activities throughout ELIXIR
• Organised the third annual Danish Bioinformatics Conference (August 2017)
• Co-authored a PLOS Biology paper on the design, provision and re-use of identifiers in the life sciences¹
• Co-authored a GigaScience paper to present the ReGaTE utility for registration of Galaxy tools in bio.tools (see also the Tools Platform section)²
• Co-authored an F1000R paper on encouraging best practices in research software³
• Co-authored an F1000R paper to present the ToolDog (Tool DescriptiOn Generator) to facilitate the integration of tools registered in the ELIXIR tools registry (see also the Tools Platform section)⁴

1. Julie A. McMurry et al., Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data. PLOS Biology 2017 (doi: 10.1371/journal.pbio.2001414)


EMBL-EBI

• 13 EMBL-EBI resources were named as ELIXIR Core Data Resources; 12 EMBL-EBI resources were named as ELIXIR Deposition Databases
• Implemented a system for replicating data sets between ELIXIR Nodes
• Integrated the ELIXIR Authentication and Authorization Infrastructure at EMBL-EBI
• Registered EMBL-EBI tools in bio.tools
• Commenced an Implementation Study on the development of architecture for software containers at ELIXIR and their use by EXCELERATE Use Cases
• Led (with ELIXIR Netherlands) the creation of a Community to integrate metabolomics activities throughout ELIXIR
• Contributed to development of the MIAPPE data standards for plant phenotyping data, and BrAPI, the plant breeding API, and the integration of phenotypic data with EMBL-EBI data resources
• Implemented Bioschemas in a number of ELIXIR-affiliated EMBL-EBI data resources
• Led the creation of a Community to integrate proteomics activities throughout ELIXIR
• Started two ELIXIR Implementation Studies: (1) on plant phenotyping data validation in collaboration with the University of Gent Center for Plant Systems Biology (VIB); and (2) to create open proteomics analysis pipelines for Data Dependent Acquisition approaches (in collaboration with ELIXIR-DE)
• Deployed ELIXIR AAI for the BioSamples database
• Introduced the Human Cell Atlas project into the ELIXIR Human Data Use Case
• Collaborated with the Ontology Lookup Service (OLS) to resolve CURIEs used for ontology terms
• Implemented the ELIXIR Beacon network for relevant EMBL-EBI data resources
• Began implementation of the htsget secure streaming API with RD-Connect
• Organised four training courses in Europe, five courses at EMBL-EBI, and two webinars in data analysis and data management

Estonia

• Started national actions (service development and maintenance, and training) supported through EU Structural Funds (€1.3M, until 2022)
• Co-hosted the ELIXIR Innovation and SME Forum in Helsinki, Finland, in February 2017 (with ELIXIR Finland)
• Initiated work on the Implementation Study ‘ELIXIR Integration from a User perspective’ together with ELIXIR UK and ELIXIR Belgium as main co-contributors
• Deployed ELIXIR AAI for the Virtual Coffee Room (https://cafe.elixir.ut.ee)
• Launched funcExplorer (https://biit.cs.ut.ee/funcexplorer/), a web tool for fast data clustering coupled with enrichment analysis
• Co-authored an ELIXIR F1000R paper to present the ToolDog (Tool DescriptiOn Generator) to facilitate the integration of tools registered in the ELIXIR tools registry (see also the Tools Platform section)⁵

Finland

• Secured a €1.6 million grant from the national research infrastructure roadmap to expand the secure cloud, e.g. for cancer research
• Organised a metagenomics course with trainers from ELIXIR Norway, EMBL-EBI, and ELIXIR Finland. Developed a training course on RNA-seq data analysis with Chipster via the eLearning platform with the ELIXIR Czech Republic and ELIXIR Slovenia Nodes
• Developed a protocol to launch a copy of the META-pipe metagenomics annotation pipeline in OpenStack clouds and in the EGI-federated cloud (in collaboration with ELIXIR Norway and ELIXIR Czech Republic). The annotation service based on this protocol is now being used in ELIXIR Finland
• Integrated tools for single-cell RNA-seq data analysis in the Chipster platform and organised a training course on it
• Started the Implementation Study on ‘Using clouds and VMs for training’
• Continued developing and operating the ELIXIR AAI together with ELIXIR Czech Republic; represented ELIXIR in the AARC and AARC2 projects
• Co-hosted the ELIXIR Innovation and SME Forum in Helsinki, Finland, in February 2017 (with ELIXIR Estonia)
• Contributed to the development of training on national data management planning and to the development of guidance on national data management planning for sensitive data to be added to DMPTuuli (the Finnish instance of DMPonline, an online platform to help researchers create, review, and share data management plans that meet institutional and funder requirements)
• Ran an ELIXIR Staff Exchange project with ELIXIR Spain to develop Local EGA technologies, and to develop secure human data discovery and transfer

France

• Formally launched the new National Research Infrastructure Roadmap for ensuring the long-term sustainability of the national Node
• Held the first European Galaxy Administrator Workshop
• Held the ELIXIR Innovation and SME Forum on ‘Data Driven Innovation in Rare Diseases and Personalised Medicine’
• Organised and hosted the Galaxy Community Conference 2017 (June)
• Organised the first ELIXIR BioContainers Hackathon (October 2017)
• Hosted the ELIXIR Board meeting (November 2017)
• Launched and offered nearly 80 applications and around 2,000 containers in the Cloud Federation Biosphere
• Trained over 1,200 people through more than 110 training courses

Germany

• Received further support from the Federal Ministry of Education and Research (BMBF) to run the national ELIXIR Node for another two years until February 2020
• Organised the strategic workshop on ‘The Future of Proteomics in ELIXIR’ (March 2017) and published a white paper in the ELIXIR F1000R channel⁶
• Organised the strategic workshop on ‘The Future of Metabolomics in ELIXIR’ (April 2017) and published a white paper in the ELIXIR F1000R channel⁷
• Organised the second International de.NBI Symposium “The Future Development of Bioinformatics in Germany and Europe” (October 2017)
• Organised 69 training courses with a total of 1,489 participants
• Established a de.NBI cloud at five universities in Bielefeld, Freiburg, Gießen, Heidelberg, and Tübingen, and integrated it with ELIXIR’s AAI system

Israel

• Developed an ELIXIR Staff Exchange project with EMBL-EBI in Structural Biology (to start in January 2018)
• Hosted a visit of the ELIXIR Director to the ELIXIR Israel Node
• Announced the physical location of the Node as being at the Nancy and Stephen Grand Israel National Center for Personalized Medicine

2. Doppelt-Azeroual O, Mareuil F et al. ReGaTE: Registration of Galaxy Tools in ELIXIR, GigaScience 2017, 6(6): 1-4 (doi: 10.1093/gigascience/gix022)
3. Rafael C. Jiménez et al. (2017), Four simple recommendations to encourage best practices in research software (doi: 10.12688/f1000research.11407.1)
4. Hillion KH, Kuzmin I, Khodak A, Rasche E, Crusoe M, Peterson H, Ison J, Ménager H et al. Using bio.tools to generate and annotate workbench tool descriptions. F1000Research 2017, 6(ELIXIR):2074 (doi: 10.12688/f1000research.12974.1)
5. Hillion KH, Kuzmin I, Khodak A, Rasche E, Crusoe M, Peterson H, Ison J, Ménager H et al. Using bio.tools to generate and annotate workbench tool descriptions. F1000Research 2017, 6(ELIXIR):2074 (doi: 10.12688/f1000research.12974.1)
6. Vizcaíno JA, Walzer M, Jiménez RC et al. A community proposal to integrate proteomics activities in ELIXIR. F1000Research 2017, 6:875 (doi: 10.12688/f1000research.11751.1)
7. van Rijswijk M, Beirnaert C, Caron C et al. The future of metabolomics in ELIXIR. F1000Research 2017, 6(ELIXIR):1649 (doi: 10.12688/f1000research.12342.2)


Ireland

• Submitted ELIXIR Node application, which was approved by the ELIXIR SAB in September 2017
• Developed the ELIXIR Service Delivery Plan with a selection of ELIXIR Ireland services to be offered through ELIXIR, including: BioOpener (resource to access and work with fragmented biomedical repositories); CancerGD.org (resource for analysing and interpreting genetic dependencies in cancer); Clustal Omega (Multiple Sequence Alignment package); Riboseq.Org (for ribosome profiling (RiboSeq) data analysis); SLiMs (suite of tools and repositories for the analysis, visualisation and dissemination of short linear motifs in proteins); and Whatizit (text-processing system linking biomedical terms to publicly available databases)

Italy

• Launched or significantly updated 11 databases: DisProt, RepeatsDB, ITSoneDB, REDIportal, eDGAR, Galactosemia Proteins Database, MobiDB (for which we obtained a persistent Identifiers.org prefix), MINT, Signor 2.0, PeachVarDB, and HmtDB
• Launched or updated 12 tools and pieces of software: MetaShot, A-GAME, RNentropy, CoVaCS, ISPRED 4, SChloro, BAR 3.0, Disnor, MToolBox, BEAM, MobiDB-lite, and SODA
• Included MobiDB (part of InterPro) and MINT (within IMEx Consortium) in the list of ELIXIR Core Data Resources
• Re-organised the internal structure of ELIXIR Italy to mirror ELIXIR’s overall structure. ELIXIR Italy now has five platforms with leads and deputies to better integrate the national activities within ELIXIR Platforms and Use Cases
• Developed and approved a quality management policy for ELIXIR Italy database and tool services
• Registered or updated ELIXIR Italy bioinformatics services in the ELIXIR Tools and Services Registry: a total of 174 ELIXIR Italy tools and databases are registered in bio.tools
• Secured Horizon 2020 funding as part of the MSCA-RISE European grant
• Organised 10 courses and workshops all over Italy, including in Rome, Naples, Padua, Bari, Milan, Cagliari, Trento, Salerno, and Palermo. Ran five training courses, three workshops, one tutorial and one ELIXIR-EXCELERATE Train the Trainer course
• Held the first Summer School in advanced computational metagenomics (June 2017)
• Co-organised a nine-day EMBO Practical course on Population genomics (May 2017)
• Developed training course materials and made them available in an open repository: https://github.com/ELIXIR-IIB-training
• Launched a new web page for the ELIXIR-IT Training Platform at: https://elixir-iib-training.github.io/website
• Provided access to ownCloud services hosted at INFN Cloud infrastructure
• Provided HPC resources to 20 projects through the ELIXIR-IT HPC@CINECA initiative
• Worked on the “ELIXIR-IT integration” ELIXIR Implementation Study
• Secured a national grant with a total budget of €400,000
• Co-authored over 20 publications about released or updated services and best practices

Luxembourg

• Finalised and approved the ELIXIR Collaboration Agreement
• Officially launched ELIXIR Luxembourg on 7 September 2017, with a half-day symposium
• Started the Implementation Study on ‘Integrating ELIXIR-Luxembourg into ELIXIR Activities’
• Engaged in three additional Implementation Studies with the Training Platform that were approved for funding
• Organised and hosted a training course in Luxembourg: Data processing with R tidyverse (four days in November 2017)
• Deployed OpenStack-based IT infrastructure for Data and Compute services
• Launched a new high performance database cluster for data hosting at the Node
• Developed the Translational Medicine Data Catalogue⁸ in collaboration with eTRIKS
• Established a sustainability solution for the Innovative Medicine Initiative project eTRIKS, which is transferable to other projects (data hosting agreements in progress)
• Co-authored ‘The Future of metabolomics in ELIXIR’⁷ as a member of the new ELIXIR Community on Metabolomics
• Co-organised the international conference Impact of Big Data Analytics on Healthcare⁹ with the Luxembourg Centre for Systems Biomedicine (LCSB) on 4–5 October 2017

Netherlands

• Organised the Health-RI conference 2017, together with other Dutch research infrastructures, presenting a business plan for the Health-RI initiative. Received commitment of ~20 organisations, including funders, towards this initiative
• Organised an ELIXIR track at the Dutch Bioinformatics conference BioSB2017 in April
• Organised two Data Carpentry Genomics workshops and one Software Carpentry / Data Carpentry Instructor training
• Organised three BYOD hackathons on ELIXIR-BrAPI (with ELIXIR Belgium), Cancer Genomics, and at the Rare Disease Summer school (with ELIXIR Italy), as well as a FAIR/BYOD workshop at Bio IT World 2017 in Boston, USA
• Co-authored two F1000R papers in the ELIXIR channel, one about Software and Data Carpentry¹², and the other about Linking EGA, Galaxy & tranSMART¹³
• Organised a workshop about FAIR Data and Data Stewardship in ELIXIR and a workshop to receive feedback on the Data Management Plan wizard during the ELIXIR All Hands Meeting in Rome, March 2017
• Organised the ELIXIR Netherlands Roadmap partner meeting in Utrecht, January 2017
• Organised the ELIXIR Netherlands Kick-off Meeting in Utrecht, October 2017. This meeting brought together important parties for a FAIR infrastructure for data and services (e.g. DRE and euroCAT), leading to collaborations on the Personal Health Train
• Participated as an ELIXIR representative in RDA meetings in Barcelona and Montreal
• Presented an ELIXIR webinar on FAIR data tooling
• Published a Briefings in Bioinformatics paper entitled: ‘Bioinformatics in the Netherlands: the value of a nationwide community’¹⁴
• Organised two workshops with the Health Research Board in Ireland to teach FAIR data stewardship to their internal organisation and to their researchers
• Organised a three-day meeting of the ELIXIR Interoperability Platform at the Vrije Universiteit Amsterdam, January 2017
• Expanded ELIXIR Netherlands to include 55 partners, of which 10 are currently active in ELIXIR projects
• Participated in an ELIXIR Implementation Study on Data Management Planning (together with other ELIXIR Nodes)
• Participated in three new Use Case proposals (Metabolomics, Nutrition and Toxicology)
• Ran an ELIXIR Staff Exchange project in Plant Breeding (BrAPI), with EMBL-EBI

Norway

• Secured national funding for ELIXIR Norway for 2017–2021 (€15 m)
• Launched the Marine Metagenomics Portal¹⁰ (March 2017), which contains the MAR databases (MarRef, MarDB and MarCat) and META-pipe
• Contributed to the GigaScience paper ‘The metagenomic data life-cycle: standards and best practices’¹¹
• Upgraded the Norwegian e-infrastructure for Life Science (NeLS)
• Put into production an integrated service provision with the NorSeq sequencing consortium, providing end-users’ data through NeLS
• Secured continued Nordic funding for handling sensitive bioinformatics data (Tryggve2, 2017–2020)
• Organised 11 training workshops on the NeLS platform, on NGS data analysis, meta-analysis, and data storage
• Worked with FAIRDOM UK to integrate their SEEK system with the NeLS platform for handling project meta-data and data, initially for Digital Life Norway projects
• Contributed to data management hands-on courses organised by Digital Life Norway
• Co-organised the “Metagenomics data analysis” workshop, Helsinki, Finland (April 2017)

Portugal

• Organised an ELIXIR-EXCELERATE Plant Sciences Use Case workshop to build a domain ontology for plant phenotyping data (September 2017)
• Participated in an ELIXIR Staff Exchange project with ELIXIR Netherlands
• Contributed to the development of the MIAPPE specifications
• Presented ELIXIR Portugal at Fórum de Gestão de Dados de Investigação (Research Data Management Forum) and at the Bioinformatics Open Days in Braga
• Launched a Community Service Registration (as part of bio.tools) to help the Portuguese community to share and promote their bioinformatics services
• Started the execution of the BioData project, the national project that supports the ELIXIR Portuguese Node
• Organised 10 training courses, integrated into the (proposed) GTPB Node Service
• Further improved the (proposed) Yeastract Node Service with: updated Gene Ontology terms, updated Gene information from Saccharomyces Genome Database, and new curated regulatory information
• Launched the ELIXIR Portugal Node Computing Service (proposal), providing virtual machines to researchers in bioinformatics using a cloud-based OpenStack platform (available images include: Genomics Virtual Lab, Galaxy Docker, IGC Galaxy)

Slovenia

• The ELIXIR-SI entity (BEE) established new collaborations in 2017 with the Agricultural Institute of Slovenia, with the Biotechnical Faculty and the Faculty for Computer and Information Science in Ljubljana, and with the University Medical Centre Ljubljana, building on pre-existing collaborations with the National Institute of Biology, Arnes (Geant, EGI and PRACE members) and the Institute Jožef Stefan
• Used the ELIXIR-SI eLearning Platform (EeLP) to support bioinformatics courses and webinars that were run across Europe: https://elixir.mf.uni-lj.si (available also via http://elearning.elixir-slovenia.org)
• (Co-)organised eleven training courses and workshops (courses are in EeLP), in collaboration with several ELIXIR Nodes, including Czech, German, Finnish, Spanish, French, Italian, Portuguese and Swedish
• Co-authored an ELIXIR F1000R paper on ‘Ten steps to get started in genome assembly and annotation’ (as part of ELIXIR Capacity Building and Training/eLearning activity)¹⁵
• Co-authored an F1000R paper to present the ToolDog (Tool Description Generator) to facilitate the integration of tools registered in the ELIXIR tools registry (see also the Tools Platform section)
• Became new partners for training in two ELIXIR Implementation Studies (Beacons and CWL)
• Partnered with ELIXIR-ES in a Staff Exchange project approved for 2018 (Enhanced Cloud Computing with Resource Auto-Scaling for Educational Software)
• Received an €88,000 research infrastructure grant from the Slovenian Research Agency

8. http://datacatalog.elixir-luxembourg.org
9. https://bigdata.uni.lu
10. https://mmp.sfb.uit.no
11. Petra ten Hoopen, Robert D. Finn et al. The metagenomic data life-cycle: standards and best practices, GigaScience, Volume 6, Issue 8, 1 August 2017, Pages 1–11, https://doi.org/10.1093/gigascience/gix047
12. Pawlik A, van Gelder CWG, Nenadic A et al. Developing a strategy for computational lab skills training through Software and Data Carpentry: Experiences from the ELIXIR Pilot action. F1000Research 2017, 6:1040 (doi: 10.12688/f1000research.11718.1)
13. Zhang C, Bijlard J, Staiger C et al. Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data. F1000Research 2017, 6:1488 (doi: 10.12688/f1000research.12168.1)

Spain

• Formally joined ELIXIR in October 2017; Spain had previously been a Provisional Member
• Secured national funding for 2018–2020. The Spanish National Institute of Bioinformatics (ELIXIR Spain) expanded from 10 to 19 groups from 13 different institutions
• Launched a national network of bioinformatics groups working at Health Research Institutes associated with Hospitals (TransBioNet)
• The European Genome-phenome Archive (EGA), jointly managed and developed by EMBL-EBI and ELIXIR Spain, was named an ELIXIR Core Data Resource and included in the recommended ELIXIR Deposition Databases
• Hosted an ELIXIR Innovation and SME Forum event in June 2017 in Barcelona around data driven innovation in health and personalised genomics
• Contributed to three Implementation Studies concerning the Human Data and Rare Diseases Use Cases: ELIXIR Beacon 2017, Implementation of phenotypic and genotypic variation for rare diseases in terms of biological pathways, and Remote real-time visualization of human rare disease genomics data (RD-Connect) stored at EGA
• Co-organised, in collaboration with the Centre of Excellence for Molecular Biology (BioExcel), a Bring Your Own Workflow (BYOF) hackathon/meeting in Amsterdam, November 2017
• Co-organised a training course on High Performance Computing together with ELIXIR Slovenia (April 2017 in Malaga, Spain)
• Participated in the Staff Exchange programme by hosting colleagues from Sweden, Finland and Italy, working on EGA, ELIXIR AAI, and OpenEBench
• Co-authored and published a position paper on scientific benchmarking activities and how benchmarking should be taken into account when designing infrastructures to support life science research¹⁶

Sweden

• Secured funding from the Swedish Research Council of €5.7 m (57 MSEK) for national and European activities, from 2018–2020; a substantial increase compared to previous years
• Secured Nordic funding from NordForsk for sensitive data as part of the Tryggve project, with €600,000 (6 MSEK) allocated to Sweden for 2018–2020
• Organised two Capacity Building Workshops on Genome Annotation and Assembly (in Slovenia and Portugal)
• Participated in three Implementation Studies: Data movement, ELIXIR Beacons, and Local Ensembl
• The Human Protein Atlas was named as an ELIXIR Core Data Resource
• Launched the new Pathology Atlas with an analysis of all human genes in all major cancers, which showed the consequence of protein levels for overall patient survival (August 2017)
• Contributed to the development of Local/Federated EGA, in collaboration with other Nordic ELIXIR Nodes and with ELIXIR Spain
• Organised 20 national advanced training events in bioinformatics
• Organised a distributed course in Linux using the e-learning system from the Slovenian ELIXIR Node
• Ran weekly drop-in sessions at all major sites in Sweden where researchers could meet bioinformaticians to discuss projects

Switzerland

• Two core resources of ELIXIR Switzerland (SIB – Swiss Institute for Bioinformatics) were named as ELIXIR Core Data Resources: UniProt, the world reference resource for protein sequence and functional information; and STRING, the knowledgebase of in-depth information on protein-protein interactions
• Launched two new core resources as recommended by the SIB Scientific Advisory Board: V-Pipe, an emerging tool to help research on virus genomics, and SwissLipids, a comprehensive knowledgebase of lipids
• Conducted a study, supported by ELIXIR and published on the F1000R channel, that identified a sustainable funding model for core data resources in the life sciences¹⁷
• Initiated the implementation of a national secure and interoperable infrastructure, as part of the Swiss Personalized Health Network (SPHN)¹⁸
• Launched the first Swiss Certificate of Advanced Studies in personalized molecular oncology, together with the University Hospital of Basel and the University Hospital of Lausanne
• Organised 53 courses in bioinformatics-related topics, spanning 94 days of teaching and training nearly 1,200 researchers
• Hosted an ELIXIR Train the Trainer workshop in Lausanne (January 2017) and contributed to developing training materials
• Contributed to the ELIXIR paper to present the ELIXIR-EXCELERATE Train-the-Trainer pilot programme¹⁹
• Contributed to the ELIXIR Training Platform work to define metrics for measuring quality/impact of training
• Contributed to the development of SourceData²⁰ by EMBO, an open-access platform to bring to the surface information buried in the figures of scientific papers
• Organised the first Swiss bioinformatics public hackathon
• Co-organised, together with the University of Basel, the 13th Basel Computational Biology Conference [BC]², which had more than 500 participants from 24 countries

14. Celia W. G. van Gelder, Rob W. W. Hooft, Merlijn N. van Rijswijk, Linda van den Berg, Ruben G. Kok, Marcel Reinders, Barend Mons, Jaap Heringa; Bioinformatics in the Netherlands: the value of a nationwide community, Briefings in Bioinformatics (doi: https://doi.org/10.1093/bib/bbx087)
15. Dominguez del Angel V, Hjerde E, Sterck L, Capella-Gutierrez S, Notredame C, Vinnere Pettersson O, Amselem J, Bouri L, Bocs S, Leskošek B, et al. Ten steps to get started in genome assembly and annotation. F1000Research, 2018, vol. 7, https://f1000research.com/articles/7-148/v1
16. Capella-Gutierrez S, De la Iglesia D, Haas J, Lourenco A, Fernandez Gonzalez JM, Repchevsky D, Dessimoz C, Schwede T, Notredame C, Gelpi JL, Valencia A. Lessons Learned: Recommendations for Establishing Critical Periodic Scientific Benchmarking. bioRxiv 181677; https://doi.org/10.1101/181677
17. Gabella C, Durinx C and Appel R. Funding knowledgebases: Towards a sustainable funding model for the UniProt use case [version 1; referees: 3 approved]. F1000Research 2017, 6(ELIXIR):2051 (doi: 10.12688/f1000research.12989.1)

United Kingdom

• Finalised the ELIXIR-UK Consortium Agreement
• The Pathogen-Host Interactions database joined as an ELIXIR-UK Node Resource, and the CATH database was awarded ELIXIR Core Data Resource status
• Led two 2017 ELIXIR Implementation Studies (Bioschemas and Learning Paths) and will lead the 2018 Data Implementation Study (CATH, SwissModel, PDBe, and InterPro)
• Partnered in three 2018 ELIXIR Implementation Studies in Interoperability (Validation, Workflow interoperability) and Data (FAIRmetrics)
• Ran ELIXIR Portals including: TeSS, which saw a 32% increase in users; and FAIRsharing, which now includes over 1,000 standards, 1,000 databases, and 100 data policies
• Published three ELIXIR F1000R publications²², and co-authored an additional three papers²³
• Partnered with the Software Sustainability Institute and BBSRC to develop guidance on software outputs in funding awards, and ran a national workshop on licensing
• Trained 2,470 researchers through 108 face-to-face training courses, and 2,936 active learners in four online courses (run by the two ELIXIR-UK training Node resources, the Bioinformatics Training Programme of the University of Cambridge, and The Birmingham Metabolomics Training Centre)
• Organised the ELIXIR/GOBLET/GTN hackathon for Galaxy training material re-use (May 2017)
• Secured funding for three Node resources: PHI-base (from Smart Crop Protection Institute); InterMine and ISA Tools (from The Wellcome Trust); and CATH/Gene3D (from the BBSRC)

Greece (Observer)

• Secured national funding for ELIXIR Greece for 2018–2020, with an overall budget of €4 million
• Funds are for a shared national compute resource, and for: 11 databases, 23 tools, a middleware, and five community pilot actions (Marine bioinformatics, Computational Metabolomics, Protein interactomics, NcRNA biomarker identification, and Pathogen metagenomics)
• Funded partners are from six universities, eight research centers, and a national IT infrastructure provider
• Designed a website for ELIXIR Greece, which includes a national registry of bioinformatics/computational biology resources (to be launched in the first quarter of 2018)
• Installed a monitoring system for resource usage
• Conducted a survey on computing demands for the design of the ELIXIR Greece compute resources


18. https://www.sphn.ch/en.html
19. Morgan SL, Palagi PM, Fernandes PL et al. The ELIXIR-EXCELERATE Train-the-Trainer pilot programme: empower researchers to deliver high-quality training [version 1; referees: 2 approved]. F1000Research 2017, 6:1557 (doi: 10.12688/f1000research.12332.1)
20. http://sourcedata.embo.org/
21. https://www.bc2.ch/2017
22. Pawlik A, van Gelder CWG, Nenadic A et al. Developing a strategy for computational lab skills training through Software and Data Carpentry: Experiences from the ELIXIR Pilot action. F1000Research 2017, 6:1040 (doi: 10.12688/f1000research.11718.1)
Larcombe L, Hendricusdottir R, Attwood TK et al. ELIXIR-UK role in bioinformatics training at the national level and across ELIXIR. F1000Research 2017, 6:952 (doi: 10.12688/f1000research.11837.1)
Hancock JM, Game A, Ponting CP and Goble CA. An open and transparent process to select ELIXIR Node Services as implemented by ELIXIR-UK. F1000Research 2017, 5(ELIXIR):2894 (doi: 10.12688/f1000research.10473.2)
23. Morgan SL, Palagi PM, Fernandes PL et al. The ELIXIR-EXCELERATE Train-the-Trainer pilot programme: empower researchers to deliver high-quality training. F1000Research 2017, 6:1557 (doi: 10.12688/f1000research.12332.1)
Jiménez RC, Kuzak M, Alhamdoosh M et al. Four simple recommendations to encourage best practices in research software. F1000Research 2017, 6:876 (doi: 10.12688/f1000research.11407.1)
van Rijswijk M, Beirnaert C, Caron C et al. The future of metabolomics in ELIXIR. F1000Research 2017, 6(ELIXIR):1649 (doi: 10.12688/f1000research.12342.2)

2017 highlights

Hungary becomes a Member of ELIXIR

Hungary became the 21st Member to join ELIXIR! The ELIXIR Node in Hungary is led by the MTA Research Centre for Natural Sciences and is coordinated by Professor Laszlo Patthy of the Institute of Enzymology within the Research Centre for Natural Sciences of the Hungarian Academy of Sciences. The Node will focus on novel tools, services and databases in the fields of protein sequence and structure investigation, DNA sequence analysis, and translational medicine.

Jan

ELIXIR and GA4GH Beacon Team Up to Advance Genomic Data Sharing

The Beacon Project of the Global Alliance for Genomics and Health (GA4GH) and ELIXIR expanded their partnership to develop the Beacon project and to improve the discoverability of European genomic data. The partnership aims to establish a European network of Beacons that allows users to query all ELIXIR Beacons simultaneously and to introduce new security measures. In October, the Beacon project was named one of the GA4GH driver projects, an initiative to help guide GA4GH development efforts and to establish standard tools for genomic data sharing.

ELIXIR takes part in a call for action for a global coalition to sustain core data resources

The leaders of major international life-science data resources issued a call for a global coalition to support biological data resources that are essential for the work of life-science researchers, educators, and innovators. ELIXIR actively participated in the development of this call, using the experience gained in developing the selection procedure for ELIXIR Core Data Resources.

Feb

Mar

ELIXIR 2017 All Hands community meeting in Rome

The third ELIXIR All Hands meeting was held in Rome, Italy, on 21–23 March, with nearly 300 attendees. The meeting presented the full breadth of EXCELERATE activities; the keynotes by R. Gentleman and F. Ouellette presented the research activities of 23andMe and the Training Programme of Genome Canada, respectively.

“ELIXIR is helping data stewards make their data discoverable for re-use. Working with the GA4GH, ELIXIR is helping to scale up existing standards and good practices in data discovery.” Peter Goodhand, GA4GH Executive Director


New ELIXIR website design

The ELIXIR Hub launched a new design for the ELIXIR website at www.elixir-europe.org. The new design improves the navigation and organisational structure of the site and updates its overall layout.

Apr

ELIXIR launches staff exchange programme

ELIXIR launched the first Call for Staff Exchange projects between ELIXIR Nodes. The purpose of the ELIXIR Staff Exchange programme is to support capacity building in ELIXIR Nodes and to exchange best practices in bioinformatics service provision. The programme will also strengthen the links between ELIXIR Nodes and support the interoperability and sustainability of ELIXIR services and data resources.

May

ELIXIR-EXCELERATE receives a very positive mid-term review

The ELIXIR-EXCELERATE mid-term review assessed the activities and results of the first half of the project. According to the assessment report, ELIXIR-EXCELERATE “... fully achieved its objectives and milestones for the period and has delivered exceptional results with significant immediate or potential impact.”

Building Galaxy training capacity

ELIXIR-EXCELERATE, GOBLET and the Galaxy Training Network organised a joint hackathon on Galaxy training material re-use in Cambridge, UK. The main objective was to extend the existing collection of Galaxy training materials, which was initiated at the 2016 Galaxy Community Conference. ELIXIR also supported the Galaxy community by sponsoring the Galaxy Community Conference in Montpellier, France, in June 2017.


ELIXIR-EXCELERATE was also recognised in the analysis of the Horizon 2020 Research Infrastructures programme published in May by the European Commission. In the document, the European Commission presented ELIXIR-EXCELERATE as a success story and an illustration of the added value provided by the Horizon 2020 research infrastructure programme.

Jun

Jul

ELIXIR at ISMB/ECCB, Prague, Czech Republic, 21–25 July

ELIXIR presented its activities in a dedicated programme track at the ISMB/ECCB Conference, the biggest and most important event in bioinformatics and computational biology. Held in Prague, Czech Republic, the conference featured the activities of the ELIXIR Platforms and Use Cases, as well as ELIXIR’s Industry Programme and Capacity Building Programme. The event also saw the official launch of the ELIXIR Core Data Resources.

ELIXIR announces initial list of Core Data Resources and Deposition Databases

ELIXIR published the initial list of ELIXIR Core Data Resources: data resources considered to be of fundamental importance to the life-science community and to the long-term preservation of biological data. These resources provide a benchmark for high quality concerning the infrastructure of service provision, and they drive ELIXIR’s discussions with funders and policy-makers on the sustainability of life-science data resources. “We are already seeing the first benefits of the process to identify the Core Data Resources in terms of improving ELIXIR’s capacity to deliver data resources that meet the scientific need. For example, as a result of the evaluation process, we have seen data resources change their license to align with ELIXIR’s Open Access principles, allowing more extensive data reuse not only for basic research but for industry too.” Jo McEntyre (EMBL-EBI), co-Leader of the ELIXIR Data Platform.

Aug

ELIXIR publishes position paper on FAIR Data Management in life sciences

This position paper highlights the key role of research infrastructure in helping researchers to make published life-science data FAIR (Findable, Accessible, Interoperable and Reusable). This statement also voices ELIXIR’s commitment to enabling the availability of FAIR data within the framework of the European Open Science Cloud (EOSC), an initiative of the European Commission and Member States to connect big data across Europe. In support of the EOSC Declaration from July 2017, ELIXIR published a set of guiding principles on FAIR Data Management.

Sep

ELIXIR establishes new Communities for Proteomics, Metabolomics and Galaxy

Following agreement with the Heads of Nodes committee, ELIXIR established three new Communities, covering Proteomics, Metabolomics and Galaxy, while continuing to operate its four current Use Cases on Marine Metagenomics, Plant Sciences, Rare Diseases and Human Data.

ELIXIR-EXCELERATE Train-the-Trainer programme helps close skills gap in bioinformatics training

The ELIXIR Training Platform published a strategic paper on the delivery and outcomes of seven ELIXIR-EXCELERATE Train-the-Trainer courses organised in 2016 and 2017. Suggestions from this paper have already contributed to the development of an ELIXIR Train-the-Trainer programme that can be hosted by any ELIXIR Node. “Scientists at all career stages need ‘point of need’ training to help them make the best use of bioinformatics tools and resources in their research. ELIXIR is contributing to expand the provision of courses and also of new instructors able to deliver high-quality courses, and cope with the high demand for training.” Patricia Palagi, ELIXIR Training Platform Co-Lead.


ELIXIR joins new initiative to improve security in drug development

ELIXIR became a partner in eTRANSAFE, a new €40 million project funded by the Innovative Medicines Initiative (IMI), which began in September. This five-year project aims to develop an advanced data integration infrastructure and new computational methods to improve security in the drug development process. ELIXIR joined the project consortium, which consists of eight academic institutions, six SMEs and twelve pharmaceutical companies, coordinated by the Fundació Institut Mar d'Investigacions Mèdiques (IMIM, part of ELIXIR Spain) and led by the pharmaceutical company Novartis.

Oct

Bioschemas meeting

The Bioschemas workshop was organised in October at the Wellcome Genome Campus in Hinxton, UK, to engage with the life-science community and encourage its members to adopt and use the Bioschemas specifications. The event brought together representatives from 33 different biological resources, including major international resources like UniProt and PDBe. Each of them tested and adopted at least one of the Bioschemas specifications. “The life-science community is beginning to see the benefits of the Bioschemas markup and we will expand our reach to new data types and new data sets. By getting Core and Node-supported data resources to embed markup in their sites, ELIXIR will build a crucial component of our FAIR metadata infrastructure; enough to be indexable by search engines.” Carole Goble, ELIXIR UK Head of Node and one of the Bioschemas leaders

Nov

ELIXIR finished its 2017 Innovation and SME programme in Paris

Paris, France, saw the last event in the ELIXIR Innovation and SME programme in 2017. The programme involved four thematic events in Helsinki, Barcelona, Brussels and Paris, each of which presented a range of ELIXIR bioinformatics resources to R&D companies and SMEs. These four events attracted a total of 382 participants, the largest number of attendees we have so far supported in a single year. Furthermore, the average level of satisfaction of the participants exceeded 95%, as demonstrated in the post-event feedback survey.

Dec

EU Grants

ELIXIR-EXCELERATE

ELIXIR-EXCELERATE is a €19 million Horizon 2020 project to fast-track the implementation of ELIXIR by coordinating national data infrastructures and by ensuring the delivery of life-science data services through its Platforms and Use Cases.

Basic facts about this project
• €19.8 million in funding
• Project term of four years (2015–2019)
• Involving 48 partners in 18 countries

Overall goals

ELIXIR-EXCELERATE fast-tracks the implementation of key scientific and organisational aspects of ELIXIR and facilitates the integration of Europe’s bioinformatics resources. It aims to deliver ‘excellence’ to ELIXIR’s users by ‘accelerating’ the implementation of one of Europe’s three priority Research Infrastructures, as considered by ESFRI and the European Council. The goals of ELIXIR-EXCELERATE are to:
• Deliver world-leading data services for academia and industry
• Increase bioinformatics capacity and competence across Europe
• Complete the management and organisational processes for an efficient, distributed infrastructure

The ELIXIR-EXCELERATE project is fully embedded into ELIXIR’s operations, meaning that all EXCELERATE activities and objectives reflect and complement the objectives of ELIXIR’s Scientific Programme, 2014–2018. The ELIXIR Platforms and Use Cases are represented in EXCELERATE as Work Packages (WPs), complemented by dedicated Work Packages on Capacity Development, Operations, Communications and Ethics.

The ELIXIR-EXCELERATE Work Packages are as follows:
• WP1: Tools Platform: Tools Interoperability and Service Registry
• WP2: Tools Platform: Benchmarking
• WP3: Data Platform: Data Resources and Services
• WP4: Compute Platform: Compute, Data access and exchange services
• WP5: Interoperability Platform: The ELIXIR Interoperability Backbone
• WP6: Marine Metagenomics Use Case: Marine metagenomics infrastructure as a driver for research and industrial innovation
• WP7: Plant Sciences Use Case: Integrating Genomic and Phenotypic Data for Crop and Forest Plants
• WP8: Rare Disease Use Case: ELIXIR infrastructure for Rare Disease research
• WP9: Human Data Use Case: Secure archiving, dissemination and analysis of human access-controlled data
• WP10: ELIXIR Node Capacity Building Programme
• WP11: Training Platform: ELIXIR Training Programme
• WP12: Excellence in ELIXIR Management and Operations
• WP13: Communications, Industry and Community Engagement
• WP14: Ethics requirements

In 2017, the EXCELERATE project held its second Annual General Meeting in Rome on 21–23 March; the three-day event was held in conjunction with the ELIXIR All Hands meeting. On 10 May 2017, halfway through the duration of the grant, the mid-term review took place to assess the activities and results of the first half of the ELIXIR-EXCELERATE project. During the mid-term review meeting in Brussels, each Work Package presented its activities and achievements to date and their impact on life-science user communities. The feedback from the external review was very positive, and according to the assessment report submitted to the European Commission, ELIXIR-EXCELERATE “... fully achieved its objectives and milestones for the period and has delivered exceptional results with significant immediate or potential impact.”

ELIXIR-EXCELERATE was also recognised in the analysis of the Horizon 2020 Research Infrastructures programme1 published in May by the European Commission. In the document, the European Commission presented ELIXIR-EXCELERATE as an early success story and as an illustration of the added value provided by Horizon 2020 research infrastructure interventions.

The main ELIXIR-EXCELERATE outputs in 2017
• Published the initial list of ELIXIR Core Data Resources as fundamental resources for life-science research and for the long-term preservation of biological data (WP3)
• Organised four ELIXIR Innovation and SME Forums (in Helsinki, January; Barcelona, June; Brussels, October; Paris, November) (WP13)
• Launched the ELIXIR Scientific Benchmarking and Technical Monitoring platform (OpenEBench) (WP2)
• Developed a technical demonstrator for the secure transfer of large volumes of sensitive human data from the European Genome-phenome Archive (WP9)
• Published version 2 of the ELIXIR Handbook of Operations, which is the main source of information on ELIXIR procedures, recommendations and guidelines, and which is released annually (WP12)
• Developed, through the ELIXIR-EXCELERATE Marine Metagenomics Backbone, new tools, pipelines and reference databases that helped to classify over 70% of previously unclassified 16S rRNAs (WP6)
• Extended the previously proposed MIAPPE standard (Minimum Information About a Plant Phenotyping Experiment) (WP7)
• Developed and presented a demonstrator for the secure transfer of large volumes of sensitive human data (WP4)

1. https://ec.europa.eu/research/evaluations/pdf/archive/h2020_evaluations/swd(2017)221-annex-2-interim_evaluation-h2020.pdf


Collaboration with other Research Infrastructures

AARC / AARC2

The AARC (Authentication and Authorisation for Research and Collaboration) project and its successor, AARC2, are e-infrastructure projects that focus on the authentication of researchers and on managing their access rights to services. The AARC projects develop reference architectures that enable research infrastructures and e-infrastructures to take similar approaches to user authentication and authorisation (AAI), removing obstacles to cross-infrastructure interoperability.

In 2017, ELIXIR participated in the AARC and AARC2 projects through ELIXIR Finland (CSC – IT Center for Science) and ELIXIR Czech Republic (CESNET). The work focused on developing a level-of-assurance framework for authenticating users that meets the needs of the life-science community. ELIXIR also worked on a pilot project within AARC on the integration of CILogon (an integrated open source identity and access management platform for research collaborations) and VOMS (Virtual Organization Membership Service) into the ELIXIR AAI. Together with CORBEL, which forms part of WP5, the AARC2 project pulled together the requirements for a Life Science AAI and ran a pilot on this AAI using the e-infrastructures. In April 2017, the AARC project also organised an AAI training event for the ELIXIR community. (See also ELIXIR Compute Platform section)

eTRANSAFE

eTRANSAFE is a €40 million project funded by the Innovative Medicines Initiative (IMI), which began in September 2017. The five-year project aims to develop an advanced data integration infrastructure and new computational methods to improve security in the drug development process.

ELIXIR is part of a project consortium that consists of eight academic institutions, six SMEs and twelve pharmaceutical companies, coordinated by the Fundació Institut Mar d'Investigacions Mèdiques (IMIM, part of ELIXIR Spain) and led by the pharmaceutical company Novartis. ELIXIR leads two main tasks within this project: (1) creating a policy framework that allows industry and other organisations to share drug safety data and to adhere to consistent guidelines for predictive toxicology models; and (2) data interoperability and integration. The technical and scientific work involved in these tasks is being carried out by three ELIXIR Nodes: EMBL-EBI, ELIXIR Denmark (Technical University of Denmark), and ELIXIR Spain (through the Barcelona Supercomputing Centre and IMIM).

EOSCpilot (2017–2018)

The European Open Science Cloud for Research pilot project (EOSCpilot) is supporting the first phase of the development of the European Open Science Cloud (EOSC). It is a consortium of 33 pan-European organisations and 15 third parties that aims to reduce the fragmentation of, and improve the interoperability between, European data infrastructures. The objectives of the project are to: (1) develop and trial the governance framework for the EOSC and contribute to the development of European open science policy and best practice; and (2) launch a number of demonstrators that will function as high-profile pilots that integrate services and infrastructures. These pilots aim to demonstrate interoperability and its benefits in a number of scientific domains. ELIXIR is a partner in the Data Interoperability and Governance Work Packages of EOSCpilot. In 2017, the project’s Data Interoperability Work Package produced a first draft of the strategy and recommendations to help users and services to find and access datasets across several scientific disciplines. This work involves ELIXIR interoperability services such as Bioschemas, DATS1, FAIRsharing and OmicsDI, and ELIXIR Core Data Resources such as PRIDE.

The two major outcomes of the EOSCpilot data interoperability task are: (1) EDMI (EOSC Dataset Minimum Information), a set of crosswalk metadata guidelines on minimum information for finding and accessing datasets; and (2) recommendations on how to support an ecosystem of metadata catalogues in EOSC. These recommendations aim to promote the reuse of existing domain-specific registries (such as ELIXIR registries in the life sciences); the goal is to use existing standards and Bioschemas to show how ELIXIR resources are compliant with EDMI. This can be used to evaluate the FAIRness of datasets and data resources, with a special emphasis on Findability and Accessibility. In the EOSC Governance Work Package, ELIXIR has co-led the task to develop a first set of Principles of Engagement, a set of basic principles that would apply to service providers and users of the EOSC. The first phase of this task involved mapping the principles and rules that current domain-specific infrastructures and e-Infrastructures apply when providing access to users. ELIXIR led several panel discussions at EOSC meetings and workshops to get feedback from service providers and to test concepts with potential users.

1. Sansone SA, Gonzalez-Beltran A et al. DATS, the data tag suite to enable discoverability of datasets. Sci Data. 2017 Jun 6;4:170059. doi: 10.1038/sdata.2017.59.


ENVRIplus (2015–2019)

ENVRIplus is a Horizon 2020 project linking Environmental and Earth System Research Infrastructures, projects and networks together with technical specialist partners to create a more coherent, interdisciplinary and interoperable cluster of Environmental Research Infrastructures. ELIXIR, represented by EMBL-EBI, provides expertise and resources in the Biodiversity and Ecosystem field and in the ‘Data for science’ theme.

EMBRIC (2015–2019)

EMBRIC, initiated in 2015 and financed with €9 million from the Horizon 2020 programme, connects marine biotechnology initiatives that focus on science, industry and regional growth. In EMBRIC, ELIXIR partners with research infrastructures such as the European Marine Biological Resource Centre (EMBRC), the Microbial Resource Research Infrastructure (MIRRI) and EU-OPENSCREEN to forge stronger connections between science and industry through a number of “workflows”, including bioproduct discovery, the leverage of microbiological culture collections, and aquaculture breeding strategies informed by genomics. In 2017, the project further developed its 'Configurator Service'1 and made it available to the marine biotechnology community. The configurator assists marine scientists in planning their data management, sharing, analysis and publication needs.

CORBEL (2015–2019)

CORBEL is a collaboration project between 13 ESFRI Biological and Medical Research Infrastructures, funded through the EU’s Horizon 2020 programme. The goal of CORBEL is to establish a framework of shared services between the participating infrastructures (BMS RI) that enhances the efficiency, productivity and impact of European biomedical research and its translation into medicine. The CORBEL consortium is led by ELIXIR as the coordinator and the Biobanking and Biomolecular Resources Research Infrastructure (BBMRI) as co-coordinator. In the first half of 2017, the project finalised the first periodic report, the contractual financial and technical report submitted to the European Commission, which was the basis of the project's mid-term review. The mid-term review was then held in Brussels in June 2017. According to the evaluation review prepared by the external evaluator appointed by the European Commission, "the project has delivered exceptional results with significant immediate or potential impact (...). Each of the eight work packages has made substantial progress in achieving its stated goals." In the second half of 2017, CORBEL published its Catalogue of Services, providing an overview of services that can be accessed via the participating research infrastructures (access to samples and technologies, data, tools, expertise and others). At the second Annual General Meeting in October, the consortium decided to publish a second Open Call that additionally includes services from medical infrastructures. The coordinator has therefore started a formal contract amendment process, asking for a cost-neutral prolongation of the project by nine months.

1. http://www.embric.eu/node/1371


Supporting activities

Capacity Building and Node development

Staff Exchange Programme

The purpose of the ELIXIR Staff Exchange programme is to support capacity building in ELIXIR Nodes and the exchange of best practice in bioinformatics service provision. The programme also strengthens the links between ELIXIR Nodes and supports the interoperability and sustainability of ELIXIR services and data resources. The first set of Staff Exchange projects was selected in August 2017 through an internal peer-review process. The seven selected staff exchanges started in October 2017 and covered a diverse set of projects: for example, integrating services, tools or workflows from Nodes, developing ontologies, and improving computing resources for training. Based on the initial feedback and experience from this first Call, staff exchange appears to be a powerful mechanism for bringing the Nodes of the distributed ELIXIR infrastructure together. Given this initial success, ELIXIR has allocated funding for two new calls in 2018.

Building an annotation infrastructure in ELIXIR Nodes

The ELIXIR Staff Exchange Programme builds on the experience gained during a pilot Staff Exchange scheme between EMBL-EBI and Masaryk University (ELIXIR Czech Republic). The project trained six graduate students from Masaryk University (MU) in the annotation of data in the Protein Data Bank in Europe (PDBe), with the goals of building local capacity for the annotation of PDBe data in the Czech Republic and of establishing a longer-term partnership between EMBL-EBI and MU. In addition to the immediate benefits of building annotation expertise in ELIXIR Czech Republic and of improving the annotation data in PDBe (close to 300 macromolecular complexes were annotated as part of this project), the project also served as a successful proof-of-concept for establishing annotation capacity at, and for strengthening links between, ELIXIR Nodes. The collaboration between PDBe and ELIXIR Czech Republic now continues with support from an EU Regional Development Funds grant and has expanded to other institutes within ELIXIR Czech Republic.

[Figure: map of ELIXIR Nodes participating in ELIXIR Staff Exchange projects. Nodes shown: Sweden, Netherlands, Spain, Portugal, France, Slovenia, Italy, Czech Republic, EMBL-EBI and Finland; each Node is labelled with between one and four projects.]

ELIXIR Industry Engagement

As the costs of generating '-omics' data decrease, the bioinformatics industry will continue to grow. This will, in turn, increase the demand for a robust and sustainable infrastructure for public life-science data. ELIXIR’s industry engagement supports the use of public bioinformatics resources by research-intensive companies and by Small to Medium-Sized Enterprises (SMEs). The ELIXIR Industry Strategy has set five objectives to increase the awareness of public bioinformatics resources and to promote open innovation:
• Increase industry usage of ELIXIR resources and ensure the name is synonymous with quality
• Enable open innovation by Europe’s SMEs
• Build effective partnerships with key industry stakeholders and initiatives
• Ensure effective communication between industry and ELIXIR
• Support the bioinformatics training needs of industry

ELIXIR SME and Innovation Programme

The main focus of ELIXIR’s industry engagement activities in 2017 was the Innovation and SME programme of four events:
• Helsinki, Finland, in February
• Barcelona, Spain, in June
• Brussels, Belgium, in October
• Paris, France, in November

ELIXIR Finland and ELIXIR Estonia, together with the Global Alliance for Genomics and Health, hosted the ELIXIR Innovation and SME Forum in Helsinki. This forum was aimed at companies in the genomics and health domains that use public bioinformatics resources and are looking to further streamline this process by using global data resources available through ELIXIR.

The forum held in Barcelona, hosted by ELIXIR Spain, focused on public-private partnerships in genomics, bioinformatics and health. The programme featured presentations on some of the key resources in ELIXIR and ELIXIR Spain. It also included a technical seminar that introduced funding opportunities in the pre-competitive space, such as those offered by the Innovative Medicines Initiative, as well as a hands-on demonstration of resources that are available through ELIXIR and Bioinformatics Barcelona.

Panel discussion at the ELIXIR Innovation and SME Forum in Barcelona, June 2017.


The Brussels event, hosted by ELIXIR Belgium, brought together companies that are active in the probiotics, food and health sectors, to stimulate interactions between companies and academic partners. It showcased ELIXIR resources in the microbial research space, such as ENA, PRIDE and Metagenomics Use Case tools. The programme also featured talks by Unilever and DSM, by a selection of SMEs, and by the European Commission. The event was co-located with a workshop titled ‘Personalised nutrition for better health – targeting the microbiome’, which was co-organised by the OECD and the Department of Economy, Science and Innovation of the Flemish government.

The final ELIXIR Innovation and SME Forum of 2017 was held in Paris, France, and focused on Rare Diseases and Personalised Medicine. In this lunch-to-lunch event, attendees were immersed in the world of data-driven innovation, illustrated through talks by innovative companies and by presentations of ELIXIR’s open data resources and services, such as Orphanet, and of resources developed by the ELIXIR Rare Diseases Use Case.

These four events attracted a total of 382 participants, the largest number of attendees we have so far supported in a single year. The average level of satisfaction of the participants exceeded 95%, as demonstrated in the post-event feedback survey. Furthermore, four collaborative projects and proposals involving eight different companies (all SMEs) and ELIXIR Nodes were initiated as a result of the Innovation and SME Forums organised in 2017.

ELIXIR’s Industry Advisory Committee

The ELIXIR Industry Advisory Committee (IAC) met for the third time in early 2017. Based on their discussion, the IAC presented a set of high-level recommendations for ELIXIR, including the need to keep developing good links with other successful industry–academia initiatives, such as the Innovative Medicines Initiative. The IAC also recommended continuing to work with regional SME associations, to maximise the number of companies that can find out about and access ELIXIR’s services.

Members of the ELIXIR Industry Advisory Committee. From left to right: Filip Pattyn (ONTOFORCE), Andreas Kremer (ITTM, Luxembourg, appointed in November 2017), Abel Ureta-Vidal (Eagle Genomics, appointed in November 2017), Ian Barrett (AstraZeneca UK), Elizabeth Reynolds (General Bioinformatics, Vice Chair) and Iain Hrynaszkiewicz (Springer Nature, Chair). Members of the ELIXIR Hub: Niklas Blomberg (ELIXIR Director), Andrew Smith (ELIXIR External Relations Manager) and Pablo Roman (ELIXIR Industry Officer). Not pictured: Martin Ebeling, Anita Eliasson, Natalia Jiménez Lozano, Christian Paulitz, Angel Pizarro, Philippe Sanseau, Sándor Szalma, Sara Paulina de Oliveira Monteiro, Claus Stie Kallesøe, and Belinda Clarke.


International collaboration

Global collaboration is a cornerstone of ELIXIR’s implementation; users come from all over the world, many databases are run as part of global collaborations, and to ensure an effective, integrated data infrastructure, relevant initiatives must work in close cooperation. ELIXIR’s global collaborations are mapped out in the International Strategy1, which underwent a revision and update in 2017. The International Strategy is aimed at a broad range of stakeholders, including: users and communities across the world; global informatics and data initiatives; policy makers and funders; and ELIXIR partners. It acts as a roadmap that showcases how ELIXIR engages with key global initiatives and countries.

In April 2017, ELIXIR’s Board approved a formal document that set out the principles that ELIXIR should follow when considering Membership applications from countries outside of ESFRI3. Whilst any country in the world can use the services run by ELIXIR, the agreed strategy when considering a formal application to become a Member from outside of ESFRI is to consider the relative maturity of that country’s bioinformatics landscape and the benefits to existing Members of that country joining ELIXIR. This document now acts as a guide in ELIXIR’s dialogue with countries outside of Europe. Other key work globally in 2017 included the effort, led by the Data Platform, to support the long-term sustainability of Core Data Resources through engagement in the global coalition to support life-science data resources (see Data Platform section for further details).

The objectives of the International Strategy are to:
• Ensure that ELIXIR serves life-science users and communities across the globe
• Support collaboration between ELIXIR and relevant global bioinformatics and data initiatives
• Shape global science policy discussions on data and research infrastructures
• Develop formal collaborations with those countries outside of Europe where there is mutual benefit

In 2017, activities took place to support the implementation of all of the above objectives. Collaboration Strategies were concluded with key partners, including with the Global Alliance for Genomics and Health (GA4GH) and the BioExcel project2. ELIXIR’s Training Platform continued to collaborate closely with the GOBLET training initiative, with which ELIXIR already has a pre-established Joint Training Strategy.

1. https://www.elixir-europe.org/elixir-international-strategy 2. https://bioexcel.eu/ 3. See ESFRI Member States: http://www.esfri.eu/delegates


Impact and sustainability

Throughout 2017, ELIXIR undertook a range of activities in order to understand and demonstrate the impact of public life-science data. By showcasing the value of investments in ELIXIR and of a public bioinformatics infrastructure generally, important steps can be taken towards ensuring the sustainability of these publicly-funded resources. ELIXIR was interviewed as part of the OECD’s work to develop a methodology for assessing the socio-economic impact of research infrastructures across all disciplines. ELIXIR was also invited to present at a number of OECD workshops on the theme of business models for databases and international data infrastructures, both of which ensured that ELIXIR’s visibility remained high with policy makers. A proposal to Horizon 2020 on impact assessment was also prepared and submitted in early 2017. The project proposal, RI Impact Pathways, brought together economists and impact-assessment experts to study four existing infrastructures (ELIXIR, CERN, DESY and ALBA) and to attempt to develop a model for impact assessment. The proposal was selected for funding in autumn 2017, and the project began in early 2018.

1. Roman Garcia P, Smith A and Blomberg N. Public data resources as a business model for SMEs. The Role of Public Bioinformatics Infrastructure in supporting innovation in the life sciences. F1000Research 2018, 7(ELIXIR):590 (document) (doi: 10.7490/f1000research.1115445.1)
2. Blomberg N and ELIXIR Consortium. ELIXIR position paper on FAIR data management in the life sciences. F1000Research 2017, 6(ELIXIR):1857 (document) (doi: 10.7490/f1000research.1114985.1)


Desk-based research and interviews with many Small to Medium-Sized Enterprises (SMEs) were carried out in late 2017. These interviews were performed to describe the role, and to understand the importance, of public life-science databases to industry. This work culminated in the report, published in early 2018, entitled ‘Public Data Resources as a Business Model for SMEs’1. In order to shape the science policy and funding landscape, ELIXIR continued to produce a range of Position Papers and responses to relevant consultations. This included formal submissions to the Interim Evaluation of Horizon 2020, where ELIXIR presented the case for appropriate, long-term funding for data infrastructures, and a Position Paper on ‘FAIR Data Management in the Life Sciences’, which also doubled as ELIXIR’s endorsement of the European Open Science Cloud (EOSC) Declaration2.

Communications

Communications strategy review

The ELIXIR Communications strategy was first released in May 2016. In 2017, the Communications strategy was reviewed against: (1) the mission and objectives of ELIXIR; and (2) the communications activities of other ESFRI research infrastructures and of other organisations similar to ELIXIR. The resulting report presented qualitative and quantitative data to measure the effectiveness of ELIXIR Communications and provided a series of recommendations for specific communications channels and/or audiences.

The available data showed that ELIXIR's communication presence in general outperforms other ESFRI research infrastructures. In particular, ELIXIR has a strong presence in social media and has built a strong audience. This digital communication is complemented by a significantly improved ELIXIR website, both in terms of content and design (see below). Another strong communications asset is ELIXIR’s Newsletter, which is considered to be the most useful communications channel, particularly for ELIXIR’s internal audience.

ELIXIR videos

In 2017, the ELIXIR Hub released two more promotional videos on the ELIXIR YouTube channel, which complement the ELIXIR profile video published in October 2017. The first video in 2017 was released in March and focused on the impact of ELIXIR. It presents how life-science researchers use three particular resources that are available through ELIXIR (Human Protein Atlas, SwissProt/UniProt, and Europe PMC), and also how industry can benefit from open bioinformatics data. The second video – published in November – features ELIXIR activities in Human Genomics and Translational Data. Through interviews with senior scientists from ELIXIR Nodes and members of the ELIXIR leadership team, the video presents a portfolio of services to facilitate the sharing, exchange and re-use of genetics data.

The three videos will be complemented by one additional video to be released in 2018 to present the ELIXIR Training Programme. In 2017, the ELIXIR Hub continued with the ELIXIR webinar series and organised 16 webinars to present the work of ELIXIR Implementation studies, ELIXIR Platforms and Use Cases.

ELIXIR video on Human Genomics and Translational Data. All ELIXIR videos are available on the ELIXIR YouTube channel: https://youtu.be/stTY6fxwonY

New ELIXIR website

The layout, design and content structure of the ELIXIR website were redesigned in Spring 2017. The main goal of this redesign was to improve the user experience of the site by simplifying the navigation structure throughout the website, by updating its content, and by improving the site’s layout and design. The feedback on the redesigned ELIXIR website, and the data collected during the Communications review (see above), suggest that the website’s new content is well-aligned with the needs of the ELIXIR stakeholders. Overall, the website was rated as good or very good by over 63% of the survey respondents.


ELIXIR at ISMB/ECCB 2017

In 2017, the major conference for ELIXIR was the joint ISMB-ECCB Conference (Intelligent Systems for Molecular Biology – European Conference for Computational Biology), which was held on 21–25 July in Prague, the Czech Republic.

The programme of the conference featured a dedicated track to showcase ELIXIR activities and services. The ELIXIR Special Track presented ELIXIR Platforms and Use Cases, the ELIXIR Capacity building programme, and ELIXIR Industry support. Participants could also learn about ELIXIR services at the ELIXIR demonstration stand, where they could talk directly to developers and operators of selected ELIXIR resources. ISMB-ECCB 2017 was also organised with the support of ELIXIR Czech Republic.

ELIXIR Gateway on F1000Research

The ELIXIR Gateway on F1000Research was launched in December 2015 as a platform to collect and capture ELIXIR’s research and technical outputs. In 2017, the ELIXIR channel published 12 articles, all transparently peer-reviewed through the F1000Research invited post-publication peer review process.

The most successful article published in the ELIXIR Gateway in 2017 was by Jiménez, Kuzak et al,1 which presented recommendations to encourage best practices in research software development. This has been viewed by over 2,200 readers. Other popular publications include articles by Morgan, Palagi et al. presenting the ELIXIR Train-the-Trainer programme2 and by Gabella, Durinx and Appel exploring a sustainable funding model for life science data resources3. The editorial oversight of the ELIXIR Gateway is provided by an Advisory Board, who review all papers submitted to the Gateway to ensure all materials are relevant to the ELIXIR community. The members of the Advisory Board of the ELIXIR Gateway in 2017 were:
• Niklas Blomberg, ELIXIR Director
• Inge Jonassen, University of Bergen, Head of ELIXIR Norway
• Arlindo Oliveira, Instituto Superior Técnico, Head of ELIXIR Portugal
• Bengt Persson, Uppsala University, Sweden, Head of ELIXIR Sweden
• Graziano Pesole, The University of Bari Aldo Moro, Head of ELIXIR Italy

1. Jiménez RC, Kuzak M, Alhamdoosh M et al. Four simple recommendations to encourage best practices in research software [version 1; referees: 3 approved]. F1000Research 2017, 6:876 (doi: 10.12688/f1000research.11407.1)
2. Morgan SL, Palagi PM, Fernandes PL et al. The ELIXIR-EXCELERATE Train-the-Trainer pilot programme: empower researchers to deliver high-quality training [version 1; referees: 2 approved]. F1000Research 2017, 6:1557 (doi: 10.12688/f1000research.12332.1)
3. Gabella C, Durinx C and Appel R. Funding knowledgebases: Towards a sustainable funding model for the UniProt use case [version 2; referees: 3 approved]. F1000Research 2018, 6(ELIXIR):2051 (doi: 10.12688/f1000research.12989.2)


Governance

To ensure the integration of bioinformatics services into a coherent distributed infrastructure, it is critical to create effective links between the national institutes that make up the ELIXIR Nodes, and between those institutes and the ELIXIR Hub. This is done through the signing of Collaboration Agreements, which allow the ELIXIR Nodes to receive funding from the ELIXIR Hub for Commissioned Services.

In 2017, the ELIXIR Hub worked closely with the ELIXIR Nodes on the development of Collaboration Agreements. During this period, four Collaboration Agreements (between the ELIXIR Hub and the ELIXIR Nodes for Belgium, Italy, Luxembourg and the UK) were signed. By the end of 2017, a total of fifteen ELIXIR Nodes had their Collaboration Agreements in place. Negotiations will continue with the remaining Nodes in 2018. ELIXIR Node Service Delivery Plans (SDPs) describe the scientific and service provision content that each ELIXIR Node provides through ELIXIR. In 2017, the Hub prepared an SDP with ELIXIR Luxembourg and submitted it for review and approval to the ELIXIR Board during its 2017 Spring meeting. The portfolio of ELIXIR services, as put forward by ELIXIR Nodes, is available on the ELIXIR website (http://elixir-europe.org/services).

[Figure: Number of completed governance agreements by year, 2013–2017. Series: ELIXIR Consortium Agreements; Service Delivery Plans; Collaboration Agreements; First Commissioned Service Contracts.]

The growth of ELIXIR in its first four years (2014–2017) is demonstrated by the number of ELIXIR members (ELIXIR Consortium Agreements). The growing number of completed Collaboration Agreements, Service Delivery Plans and Commissioned Service Contracts illustrates the successful implementation of the ELIXIR Governance structure. These contracts allow ELIXIR Nodes to collaborate and the ELIXIR Hub to commission technical services.


ELIXIR Scientific Processes Working Group

The ELIXIR Heads of Nodes established a Working Group to develop scientific processes within ELIXIR, with the specific aim for 2017 of establishing a process for the selection of new Communities. The Working Group also developed recommendations for processes concerning the Programme, annual Work Plans, and Implementation Studies, and will continue its work through to 2018.

Handbook of Operations

The ELIXIR Handbook of Operations, which underwent an update in 2017, continues to be the authoritative source of information on ELIXIR procedures, recommendations and guidelines, strategies and reference documents. It is aimed at the whole ELIXIR community, including all staff in ELIXIR Nodes, ELIXIR Hub staff, ELIXIR Board members and national funders. The topics covered by the Handbook include Governance, Nodes and Service provision, ELIXIR Programme cycle, Project management, Communications and External relations, and Technical operations.


The ELIXIR 2019–2023 Programme

In 2017, we also initiated the development of the next ELIXIR Scientific Programme. ELIXIR operates through quinquennial Programmes, the first of which started in 2014. The next Programme will span the years 2019–2023. In order to be ready for approval by the ELIXIR Board in November 2018, the first draft of the next Programme was prepared in 2017 and reviewed by the Heads of Nodes. This Programme will also be reviewed by the ELIXIR Scientific Advisory Board in early 2018. The Programme is developed along with the 2019–2023 Financial Plan, which lays out how the ELIXIR Budget is planned to be used, both for coordination and Commissioned Services activities.

ELIXIR Hub staff

In 2017, the ELIXIR Hub significantly expanded its team and strengthened its coordination capacity. The recruitment of the new ELIXIR Chief Technical Officer (CTO) and the appointment of ELIXIR Platform Coordinators were a major step in terms of establishing technical support and coordination, both within and between ELIXIR Platforms and Use Cases. This structural change allowed the technical and domain experts in the Nodes to focus on their technical work, and provided greater capacity to drive the development of technical strategy across ELIXIR Platforms and Use Cases.

Jerry Lanfear joined the ELIXIR Hub in September 2017 as its new Chief Technical Officer (CTO). As CTO, Jerry’s main task is to lead the design and implementation of an ELIXIR-wide technical strategy and to oversee the development of ELIXIR Platforms and the interaction between them. Jerry joined ELIXIR from Pfizer, where he was a Senior Director within the IT group with responsibility for IT service delivery across R&D within the UK and for data management globally.

Jerry took over the CTO position from Rafael Jiménez, who had served as ELIXIR CTO from 2014. Following Jerry’s appointment, Rafael moved to the post of Chief Data Architect, working on key projects within ELIXIR, such as Bioschemas and the European Open Science Cloud. In the ELIXIR Hub Project Management Unit, Juan Arenas replaced Steffi Suhr as the EXCELERATE project manager in March 2017, and became the Head of the Project Management Unit in July. Juan came to ELIXIR with over 15 years of experience in project management. Before joining ELIXIR, Juan worked as Portfolio Manager & Technology Officer at the University of Sheffield, UK. Prior to that, Juan led ICT projects for global companies in a variety of sectors and technologies in Spain. Rachel Drysdale was appointed ELIXIR Data Platform Coordinator in March, working closely with the Data Platform leadership team, coordinating the activities of the Platform, notably the process of identifying ELIXIR Core Data Resources. Rachel joined ELIXIR from PLOS, where she had worked as Manager of Taxonomy Systems and Analysis, and before that as Consulting Editor for PLOS ONE.

Members of the ELIXIR Hub (from left to right): Juan Arenas, Rafael Jiménez, Niklas Blomberg, Laura Mangan, Jerry Lanfear, Kayla Wiles, Phyllida Hallidie, Rachel Drysdale, Andrew Smith, Susanna Repo, John Hancock, Sheena Lee, Martin Cook, Premysl Velek, Dana Černošková, Pablo Roman, Friederike Schmidt-Tremmel, David Lloyd and Pascal Kahlem (external consultant).


In June, Susheel Varma joined the ELIXIR Hub as Technical Coordinator for Human Genomics and Translational Data, and worked to develop and coordinate services within the Human Genomics and Translational Data portfolio, headed by Serena Scollen, Head of Human Genomics and Translational Data in ELIXIR. Susheel left the ELIXIR Hub at the end of 2017 and took up a new position at EMBL-EBI, where he works as ELIXIR Competency Centre Project Manager. Norman Morrison left the position of ELIXIR Interoperability Platform Coordinator in September 2017 to take up a new post in the Human Cell Atlas project. The Technical Team also appointed David Lloyd as a Project Coordinator and John Hancock as Communities and Services Coordinator. David Lloyd works with the ELIXIR Nodes in developing their ELIXIR Service Delivery Plans and also supports the ELIXIR Beacons project. David previously worked at the Sanger Institute, where he was a coordinator for the Global Alliance for Genomics and Health.


John Hancock leads the development of research communities associated with ELIXIR’s Use Cases and their integration with ELIXIR Platforms. John has broad experience in various research and management positions in computational biology. His previous posts include being a Group Leader at the MRC Clinical Sciences Centre, a Reader in Computational Biology at Royal Holloway University of London, and Head of Bioinformatics at the MRC Mammalian Genetics Unit. In 2015, he moved to the Earlham Institute in Norwich to work as ELIXIR UK Node Coordinator. Dana Černošková joined the Hub in January as ELIXIR Events Officer, covering for Melissa Balzano, who went on maternity leave. To support the overall administration of the ELIXIR Hub office, the ELIXIR Hub recruited Laura Mangan as Administrative Assistant. The ELIXIR Hub also hosted two interns. Guillermo Calderon Mantilla spent six months (March – August) supporting the Bioschemas initiative; Kayla Wiles joined the External Relations team in June for six months to support the communications activities of the Hub, namely the evaluation of the Communications Strategy and development of the ELIXIR Node Toolkit.

Governance committees and financial data

ELIXIR Committees

ELIXIR Board

Chair: Prof Rein Aasland, Norway

Vice Chairs: Dr Ruben Kok, Netherlands; Prof Rita Casadio, Italy

| Country | Scientific delegate | Administrative delegate |
|---|---|---|
| Belgium | Laurence Lenoir | Michele Oleo, Didier Flagothier |
| Czech Republic | Jaroslav Koča | Jan Buriánek |
| Denmark | Anders Krogh | Troels Tvedegaard Rasmussen |
| Estonia | Pärt Peterson | Toivo Räim, Priit Tamm |
| Finland | Per Öster | Riina Vuorento, Jarmo Wahlfors |
| France | Claudine Médigue | Eric Guittet |
| Germany | Roland Eils, Alexander Goesmann | Johannes Mohr |
| Hungary | László Patthy | Gábor Tóth |
| Ireland | Marion Boland, Dara Dunican | TBA |
| Israel | Yossi Kalifa | Ilana Lowi |
| Italy | Rita Casadio (from April 2017) | Salvatore La Rosa |
| Luxembourg | Rudi Balling, Regina Becker (both from December 2017) | Lynn Wenandy, Pierre Misteri |
| Netherlands | Ruben Kok | Bea Pauw |
| Norway | Rein Aasland, Stig Omholt | |
| Portugal | Ana Teresa Freitas | Andreia Feijão, Tiago Saborida |
| Slovenia | Damjana Rozman | Albin Kralj |
| Spain | Ferran Sanz | Cristina Bauluz, Dr Rafael de Andres-Medina |
| Sweden | Björn Andersson | Karl Gertow (from October 2016); Anna Wetterbom (stepped down end of 2016) |
| Switzerland | Christian von Mering (from August 2017) | Isabella Beretta |
| UK | Chris Rawlings | Mark Palmer, Amanda Collis |
| EMBL | Iain Mattaj, Janet Thornton | Silke Schumacher |


Heads of Node Committee

Chair: Niklas Blomberg, ELIXIR Director

| Country | Head of Node |
|---|---|
| Belgium | Yves Van de Peer |
| Czech Republic | Jiří Vondrášek |
| Denmark | Søren Brunak |
| Estonia | Jaak Vilo |
| Finland | Tommi Nyrönen |
| France | Claudine Médigue, Jacques van Helden (from March 2017) |
| Germany | Alfred Pühler |
| Hungary | Balázs Gyorffy (from January 2018) |
| Ireland | Walter Koch |
| Israel | Michal Linial |
| Italy | Graziano Pesole |
| Luxembourg | Reinhard Schneider |
| Netherlands | Jaap Heringa |
| Norway | Inge Jonassen |
| Portugal | Arlindo Oliveira |
| Slovenia | Brane Leskošek |
| Spain | Alfonso Valencia |
| Sweden | Bengt Persson |
| Switzerland | Ron Appel |
| UK | Carole Goble |
| EMBL-EBI | Rolf Apweiler, Ewan Birney |


Scientific Advisory Committee

Chair: Robert Gentleman, 23andMe, USA

Vice Chair: Dr Janet Kelso, Max Planck Institute for Evolutionary Anthropology, Germany

Members:
• Prof Pascal Borry, University of Leuven, Belgium
• Prof Elina Ikonen, University of Helsinki, Finland
• Prof Larry Hunter, University of Colorado, USA
• Prof Nicola Mulder, UCT Computational Biology Group (NBN), South Africa
• Dr Francis Ouellette, Ontario Institute for Cancer Research, Canada
• Prof Juni Palmgren, Karolinska Institutet, Sweden
• Dr Susan E. Wallace, University of Leicester, UK
• Dr Jérôme Wojcik, Quartz Bio, Switzerland

Members of the SAB (from left to right): Janet Kelso, Larry Hunter, Francis Ouellette, Robert Gentleman (Chair), Nicola Mulder, Elina Ikonen and Juni Palmgren (not pictured: Jérôme Wojcik, Susan E. Wallace, Pascal Borry and Alan Archibald).

Industry Advisory Committee

Chair: Iain Hrynaszkiewicz, Springer Nature

Vice Chair: Elizabeth Reynolds, General Bioinformatics, UK

Members:
• Martin Ebeling, Hoffmann-La Roche, Switzerland
• Anita Eliasson, Biocomputing Platforms Ltd, Finland
• Natalia Jiménez Lozano, Atos, Spain
• Christian Paulitz, Bayer CropScience, Germany
• Angel Pizarro, Amazon Web Services, USA
• Philippe Sanseau, GlaxoSmithKline, UK
• Sándor Szalma, Takeda Pharmaceuticals, USA
• Sara Paulina de Oliveira Monteiro, P-BIO, Portugal
• Claus Stie Kallesøe, Gritsystems A/S, Denmark
• Belinda Clarke, Agri-Tech East, UK

Implementation studies in 2017

Implementation Studies are short-term projects carried out by ELIXIR Nodes that address key scientific and technical issues within ELIXIR. The outcome of an Implementation Study might be a description of service requirements, a piece of software, or a technical deliverable with an accompanying report. Implementation Studies are funded through the budget of the ELIXIR Hub and form part of ELIXIR’s ongoing activities in a particular Platform or Use Case. They are proposed by Platforms and agreed with the ELIXIR Heads of Nodes Committee, and the contracts are approved by the ELIXIR Board. In 2017, ELIXIR ran 13 Implementation Studies, with one additional study approved in 2017 with a start date of early 2018. All ongoing and completed Implementation Studies are listed at https://www.elixir-europe.org/about-us/implementation-studies

In total, the work put into these studies came to 250 Person Months and required an approximate budget of €2.6 million, demonstrating the rapid maturity of ELIXIR and the increasing focus within the ELIXIR budget on supporting technical activities that are carried out by the ELIXIR Nodes. On average, 2.5 Nodes participated in each study and a total of nine Nodes were involved across all studies. In addition, during 2017, the ELIXIR Platforms laid the groundwork for a large portfolio of Implementation Studies to be carried out in 2018. They did this by planning and developing project ideas for a further eight projects, which have been approved to start on 1 January 2018.

| Name | Sector | Nodes | Leads |
|---|---|---|---|
| Solutions for IMI data management | Human Data | EMBL-EBI, ELIXIR Hub | Dylan Spalding, EMBL-EBI; Susanna Repo, ELIXIR Hub; David Henderson, IMI OncoTrack |
| Data Resource Implementations for the Global Alliance for Genomics and Health Data Schema | Human Data | Switzerland, France, EMBL-EBI | Michael Baudis, CH |

Completed in 2017


| Name | Sector | Nodes | Leads |
|---|---|---|---|
| ELIXIR Beacons | Human Data | Belgium, EMBL-EBI, Spain, Switzerland, Netherlands, Finland, Sweden | Serena Scollen, ELIXIR Hub; Susheel Varma, ELIXIR Hub; Ilkka Lappalainen, FI; Jordi Rambla, ES; Michael Baudis, CH; Dylan Spalding, EMBL-EBI |
| Assessment of the operation of the ELIXIR AAI | Compute | Czech Republic, Finland | Michal Prochazka, CZ; Mikael Linden, FI |
| Interpretation of phenotypic and genotypic variation for rare diseases in terms of biological pathways | Rare Disease | Netherlands, Spain, Italy | Friederike Ehrhart, NL; Chris Evelo, NL; Marco Roos, NL |

| Name | Sector | Nodes | Leads |
|---|---|---|---|
| Implementation Study on Data Identification and Interoperability | Interoperability | EMBL-EBI | Sarala Wimalaratne, EMBL-EBI |
| The scientific and economic impact of ELIXIR Data Resources – Towards a sustainable funding model for the UniProt-SwissProt use case | Data | Switzerland | Christine Durinx, CH |
| Integrating distributed resources in Ensembl Genomes | Data | EMBL-EBI, Sweden, Norway | Paul Kersey, EMBL-EBI |
| Microbial metabolism resource for Systems Biology | Data | France, Switzerland, EMBL-EBI | Claudine Médigue, FR |
| Proteomics infrastructure service | Data | EMBL-EBI, Germany | Juan Antonio Vizcaino, EMBL-EBI |
| Bioschemas | Interoperability | UK, EMBL-EBI, Netherlands | Alasdair Gray, UK; Carole Goble, UK; Rafael Jiménez, ELIXIR Hub |

Started and finished in 2017

Ongoing in 2017


| Name | Sector | Nodes | Leads |
|---|---|---|---|
| Visualization of aligned genomics data for rare diseases (RD-Connect) as a driver for real-time access of controlled data at the EGA | Rare Diseases | Spain, EMBL-EBI | Sergi Beltran, Jordi Rambla, Joaquín Dopazo, Alfonso Valencia and Salvador Capella, ES |
| Integrating ELIXIR Luxembourg into ELIXIR activities | Human Data | Luxembourg | Reinhard Schneider, LU |
| Integrating ELIXIR Italy into ELIXIR activities | Data, Tools, Interoperability, Human Data, Rare Disease, Marine | Italy | Graziano Pesole, IT |
| Using clouds and VMs for bioinformatics training (Workshop as a Service) | Training, Compute | Finland, Netherlands, Switzerland, France, UK, Belgium, Spain, Slovenia, Germany | Eija Korpelainen, FI |
| ELIXIR integration from a user perspective | Training & Tools | UK, Estonia, Belgium, Denmark, Switzerland, EMBL-EBI, Norway, France | Frederik Coppens, BE |
| ELIXIR Proof of concept study on the availability of big datasets on remote compute infrastructure | Compute | Sweden, Germany, Czech Republic, EMBL-EBI, Finland | Mikael Borg, SE |

Ongoing in 2017 (cont.)


[Figure: Timeline of ELIXIR Implementation studies in 2017. Time axis labelled 2016 Q3, 2017 Q4 and 2018 Q2. Studies shown: OncoTrack feasibility study; Data Identification and interoperability; Data Resource Implementation for GA4GH schema; Sustainability of data resources; Towards distributed Ensembl; Mining the proteome; Systems biology; ELIXIR AAI; ELIXIR Beacons; Bioschemas; Visualisation of Rare Disease data; Making RD data sources FAIR; Integration of ELIXIR Luxembourg; Integration of ELIXIR Italy; Using Clouds in Training; Integration from User Perspective; Big Data in remote compute infrastructure. Legend (Implementation studies led by): Human Data Use Case; Training Platform; Rare Disease Use Case; Compute Platform; Node driven study; Data Platform; Interoperability Platform.]


Financial data

Appendix 1. ELIXIR Income and Expenditure for 2017 (Note 30 to EMBL annual accounts)

In its 2014 Summer meeting, EMBL Council unanimously approved ELIXIR’s legal framework, including its status within EMBL as a "Special Project" as well as EMBL's membership of ELIXIR (EMBL/2013/16/Rev 1).

As of 31 December 2016, the total number of signatories to the ECA stood at 20 (with France and Spain as Provisional Members) with Greece additionally making contributions to its financing as an Observer.

As of 31 December 2015, the total number of signatories to the ECA stood at 15 (with France and Spain as Provisional Members) with Slovenia additionally making contributions to its financing.

The budget of ELIXIR is set annually by the ELIXIR Board and all funds related to its activities, including its surplus, are ring-fenced within EMBL's accounts.

| €000 | 2017 Actual | 2017 Budget (Revised) | 2017 Budget (Original) | 2016 Actual |
|---|---|---|---|---|
| Income | | | | |
| ELIXIR Member state contributions: Ordinary contributions (a) | 5,035 | 5,059 | 5,033 | 3,710 |
| Foreign exchange (loss)/gain on sterling contributions (b) | (3) | | | |
| Grant income (c) | 984 | 1,301 | 1,244 | 899 |
| Other income | | | | |
| Net income | 6,016 | 6,360 | 6,277 | 4,660 |
| Expenditure | | | | |
| Technological activities | | | | |
| Salaries | 405 | 680 | 610 | 195 |
| Running costs | | 259 | 217 | 151 |
| Equipment and depreciation | | | | |
| Commissioned services | 1,311 | 3,493 | 3,671 | 435 |
| Total expenditure Technological Activities | 1,776 | 4,441 | 4,507 | 781 |
| Directorate and Administrative expenditure | | | | |
| Salaries | 686 | 1,085 | 882 | 640 |
| Running costs | 372 | 564 | 497 | 260 |
| Equipment and depreciation | | | | |
| Total expenditure Directorate and Administration | 1,072 | 1,658 | 1,388 | 900 |
| Support and Admin Infrastructure costs | 626 | 1,439 | 1,119 | 342 |
| Grant expenditure incurred | 987 | 1,301 | 1,244 | 899 |
| Total Expenditure | 4,461 | 8,839 | 8,258 | 2,922 |
| Surplus/(Deficit) | 1,555 | (2,479) | (1,981) | 1,738 |


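a) ELIXIR Member state contributions

| €000 | 2017 Actual | 2016 Actual |
|---|---|---|
| Belgium | 145 | 123 |
| Czech Republic | | |
| Denmark | | |
| Estonia | | |
| Finland | | |
| France | 846 | 719 |
| Germany | 1,061 | 376 |
| Greece | | |
| Hungary | 34 | - |
| Ireland | | |
| Israel | | |
| Italy | 625 | 531 |
| Luxembourg | | |
| Netherlands | 241 | 205 |
| Norway | 133 | 113 |
| Portugal | | |
| Slovenia | | |
| Spain | 438 | 372 |
| Sweden | 147 | 125 |
| Switzerland | 188 | 159 |
| United Kingdom | 717 | 630 |
| Total | 5,035 | 3,710 |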


b) The ELIXIR Board approved that, from January 2016, the UK will pay its member state contributions in Sterling (Elixir/2015128). The difference between the value of these contributions in Euros at the date of payment and at the date of the approval of the 2017 budget was a loss of €3k (2016: gain of €50k).

c) Grant income

| €000 | 2017 Actual | 2016 Actual |
|---|---|---|
| Grant funding awarded | 4,096 | 5,832 |
| Grant income earned in the current year | 984 | 899 |
| Grant expenditure incurred in the current year | (987) | (899) |
| Unutilised grant income | 2,060 | 4,892 |

(d) The following countries have amounts due or prepaid at 31 December 2016 (values in €000). Columns: Contribution 2017; Contribution 2016/2015/2014; Total; Prepayments for 2018.

Greece: 45 (e)
Israel
Total

(e) A provision has been raised against the 2013/2014 contributions, refer note 12.3.


Credits and acknowledgements

Produced on the direction of the ELIXIR Board in May 2018. With special thanks to all of those who contributed to the development of ELIXIR infrastructure in 2017, most notably Heads of Nodes, Platform and Use Case leads, Technical and Training Coordinators and members of the various Working Groups.

© 2018 ELIXIR

This publication was produced by the External Relations team at the ELIXIR Hub. For more information about ELIXIR, please contact info@elixir-europe.org

Art direction and design: Design Science
Cover and divider pages: Keith Peters

ELIXIR is building a sustainable European Infrastructure for biological information, supporting life science research and its translation to:
• Medicine
• Environment
• Bioindustries
• Society

Contact
Niklas Blomberg, Director
ELIXIR
Wellcome Genome Campus
Hinxton, Cambridgeshire CB10 1SD, United Kingdom
+44 (0)1223 492 670
+44 (0)1223 494 468
info@elixir-europe.org
www.elixir-europe.org