ELIXIR Scientific Programme 2019-2023



Contents

Foreword by ELIXIR Director

Bioinformatics in 2023 – the opportunity of digital life science
  The changing landscape of life science research
  Preserving trust: Responsible access and reuse of human research data
  Navigating a sea of data
  European Open Science Cloud
  Delivering value to wider society from Research Infrastructures

ELIXIR, Europe’s life science research infrastructure
  Established transnational service operations for data integration, data analysis, compute and storage
  ELIXIR develops in close partnership with research communities
  ELIXIR in the European Research Infrastructure landscape

Strategic objectives of the ELIXIR 2019–23 Programme
  ELIXIR operates a portfolio of integrated services that meet the data needs of life scientists at a European scale
  ELIXIR Communities drive service uptake, support standards development, and connect ELIXIR’s experts in life science disciplines
  ELIXIR Core Data Resources are the global standard for bioinformatics resource management and are the foundation for an international funding and life cycle management strategy that secures the long-term sustainability of those resources
  ELIXIR is the recognised and trusted life science foundation of the European Open Science Cloud
  All ELIXIR Nodes connect life science users in academia and industry to our open, federated service network

Delivering the ELIXIR Programme
  Programme structure and resource allocation
  Organisation and Governance
  Data Platform
  Tools Platform
  Compute Platform
  Interoperability Platform
  Training Platform
  Communities
  Human Data Communities
  Interactions between Platforms, Communities and Nodes
  Cross-platform strategic priorities
  Industry
  International outlook
  Management


ELIXIR in 2023: a federation that enables users to access and extract knowledge from large and distributed life science datasets

Life science is a data science: it rests on the generation, sharing and integrated analysis of vast quantities of digital data. It is now possible to connect life science data across species, over scales from atoms to ecosystems, and over times that range from split-second molecular reactions to the decades of observation in long-term cohort studies. The knowledge created by connecting these data is transforming biological research and driving a new era of integrative biology.

Research addressing major challenges facing our society – such as food security, changing ecosystems and provision of sustainable health care – rests on our ability to connect and compare data from many countries, disciplines and experiments. In research on rare genetic diseases, for instance, international collaborations bringing together genomic and clinical data from centres throughout Europe have achieved new paradigms in diagnosis and care.

Digital biology is transforming environmental, agricultural and animal health research. The combination of large-scale laboratory data with field measurements helps to improve crops to withstand drought. Global surveys of marine ecosystems help us to understand how our oceans respond to a changing climate. In all these fields, data sharing, data reuse and integration of diverse data types fuel new ground-breaking discoveries.

Advancing the understanding of life and disease over all these domains requires that research data, analysis tools, standards and computational services are FAIR [1] – findable, accessible, interoperable, and reusable – for researchers across scientific disciplines and national boundaries.

1. FAIR principles for data stewardship. Nature Genetics 2016, 48, 343–343, https://doi.org/10.1038/ng.3544; Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 2016, 3, 160018, https://doi.org/10.1038/sdata.2016.18


As a distributed organisation, ELIXIR is uniquely placed to drive the transformation of the European life science data landscape into a long-term, standards-based infrastructure. After five years of operation, the ELIXIR data infrastructure links more than 200 institutes in 22 Nodes that jointly develop and operate foundation services that enable data access, integration, and analysis for the research community. Building on national best practice contributed by our Nodes, the ELIXIR 2019–23 Programme is an unprecedented effort to align the national and international services from these national centres of excellence around common standards. More than 100 scientists from all ELIXIR Nodes have developed the scientific and technical roadmaps that tie together national services into the Europe-wide infrastructure set out in this Programme.

Ultimately, the ambition is a truly international ecosystem of joint services where federated resources for life science data are used by national and international projects across all life science disciplines, with widespread support for standard components securing their long-term sustainability. Provision of a Europe-wide toolkit for FAIR data management will drive good data stewardship, data reproducibility and data reuse. Connecting distributed datasets via common standards will give researchers unprecedented opportunities to detect rare signals in complex datasets and lay the ground for widespread application of advanced data analysis methods in the life sciences.

Data discovery and integration in the life sciences rely on data resources that provide the authoritative reference for key entities such as gene and protein sequences, employing unambiguous naming conventions, universal identifiers for molecular entities and agreed genome coordinates. Knowledgebases are critical components of basic and translational research: databases of human genomes and variation are fundamental for cancer genomics; data from model organisms drives translational research; and research into plant breeding, biodiversity and ecology relies on the careful annotation of genomes from many plant and animal species. These critical data resources must be secured for the long term. This can only happen by collaboration at a global scale. Accordingly, a key goal of the ELIXIR 2019–23 Programme is to establish the ELIXIR Core Data Resources as the global standard for bioinformatics resource management. With international partners we aim to develop a model for long-term international funding of Core Data Resources with a life-cycle management strategy that secures global sustainability.

Niklas Blomberg Director

‘Well-connected and collaborating Nodes allow users to access and extract knowledge from large and distributed life science datasets’

Life science, and perhaps computational biology in particular, is a rapidly developing field where change can be brisk, with new technologies disrupting established ways of working. ELIXIR must stay agile. Annual reviews of the Programme direction and outcomes will drive the strategic development of ELIXIR and form the basis for annual work plans and detailed project proposals.

In summary, the ELIXIR 2019–23 Programme will transform the European life science data landscape by aligning national and international services into a standards-based infrastructure operating at a continental scale. The collective development of, and agreement on, the strategic objectives and outcomes described in the ELIXIR 2019–23 Programme will bring together the work of over 700 national experts towards a common set of key results. ELIXIR Nodes will drive new scientific discoveries by providing a joint set of tools, cloud services and data resources in a federated ecosystem used by national and international research projects.

Through our national Nodes, ELIXIR has the geographical spread, service portfolio and expertise required to fulfil our ambitions for FAIR data stewardship based on common standards within every European life science project. This will enable researchers to access, analyse and re-use large, complex and geographically distributed datasets effectively, and maximise the potential of both the data and the researchers.

Niklas Blomberg, ELIXIR Director



Bioinformatics in 2023 – the opportunity of digital life science

Barcelona Supercomputing Center (BSC, part of ELIXIR Spain) specialises in high performance computing and manages the MareNostrum supercomputer, one of the most powerful supercomputers in Europe, located in the Torre Girona chapel.



The changing landscape of life science research

Computational biology is at the centre of the changing life science research landscape. Large international collaborations have provided details of the genome and its variations, structure and modification. The human proteome has been mapped, and efforts are underway to map every cell in the human body with integrated functional and imaging data to create a comprehensive atlas. These projects have made large-scale biology routine.

The integration and analysis of the resulting open access resources have relied on, and have driven, the rapid development of bioinformatics methods and services. Scientists in all fields of life science increasingly work in multidisciplinary collaborations with a common need to find and share data, exchange expertise, and access advanced bioinformatics tools and large-scale computational facilities wherever they are and without the boundaries that hinder international collaborative projects.

[Figure labels: Open biological data drives research and the bio-economy; Refining the landscape for secure data sharing; Genomes linked to phenotypes for large populations; Access to FAIR data enabling reuse; Translating open data through bioinformatics for the bio-economy]

Describing rich biological data at all scales

Genomics technologies have transformed the life sciences from a hypothesis-driven activity to data-led science. In most life science disciplines, large-scale nucleotide sequencing is now routine, with 60 million human genomes estimated to have been sequenced by 2023 [2]. Large-scale genomic data will also be available in agriculture, microbiology and biodiversity research, and the data volumes generated by DNA-sequencing projects will be among the largest datasets in science [3]. Ambitious projects aim to capture the genomic diversity of all eukaryotic organisms, which would truly change biological research and position genomics as the basis for discovering and understanding the Earth’s ecology [4].

However, life science data is much more than genomics. Experiments and observations that aim to understand living organisms will generate rich and complex biomolecular and phenotypic descriptions, as well as high-throughput metabolomics, proteomics and imaging data that describe living organisms in unprecedented detail. These molecular descriptions will be linked to large-scale analyses of cells, tissues, whole organisms and populations. In the environmental and agricultural sciences, these molecular descriptions need to be linked to data on environment, yields and climate; in medicine, to diagnosis, treatment outcomes and social data.

The increasingly rich life science datasets pose many challenges for life science research. High-content phenotypic data is heterogeneous, recorded using different standards both within and between domains, and can be difficult to find and re-use. Datasets are becoming too large and complex to manage in central archives; the cost and complexity of downloading and managing local versions necessitate a new approach. Privacy and regulatory considerations preclude copies of sensitive human research data being accessed outside of the originating institute. Emerging national data federations will need to develop standardised discovery protocols in order for data to be found by the research community. Additionally, a secure authentication and authorisation process, alongside guidelines and compliance processes, is required to enable the community to use these data without compromising privacy and informed consent.

Services will need to cater for the data sharing that is needed to operate transnational collaborative projects, as well as to support broader, long-term sharing and access by the wider research community. To enable data integration and reuse, data producers must have compatible (interoperable) interfaces and, where data cannot be downloaded but must be analysed locally, they will require access to the necessary computational services. Data standards that support federated data access and provide a basis for collaboration between laboratories must be developed in close alignment with the research communities in the many life science domains. Data federations will be needed in many life science disciplines with a large research base, growing data needs, and wide geographical spread. Indeed, data federations are emerging in agriculture, biodiversity, biomedicine, marine and microbiome research.

‘Vast datasets in all areas of life science: integrating genomic, high-content phenotype, imaging, biochemical and environmental data across time and spatial scales’

2. Birney E, Vamathevan J, Goodhand P. Genomics in healthcare: GA4GH looks to 2022. bioRxiv 2017, 203554, https://doi.org/10.1101/203554

3. Stephens ZD, Lee SY, Faghri F, Campbell RH, Zhai C, Efron MJ, et al. Big Data: Astronomical or Genomical? PLoS Biol. 2015, 13(7):e1002195, https://doi.org/10.1371/journal.pbio.1002195

4. Lewin HA, Robinson GE, Kress WJ, Baker WJ, Coddington J, Crandall KA, Durbin R, et al. Earth BioGenome Project: Sequencing Life for the Future of Life. Proceedings of the National Academy of Sciences 2018, 115(17), 4325–4333, https://doi.org/10.1073/pnas.1720115115



Preserving trust: Responsible access and reuse of human research data

Genomics has the potential to provide diagnostic, economic, and efficiency benefits to the healthcare services of EU member states, particularly those that are increasingly being challenged by aging populations. Rapid advancements in genomics hold the promise of improved clinical outcomes with tailored treatments and health information. As a result, national efforts are underway to sequence human genomes, with plans to generate exabytes of data from millions of participants (see Table 1). Additionally, European countries recently signed a Declaration to sequence and provide transnational access to at least 1M human genomes by 2022 [5]. The responsible sharing of these sensitive data will create virtual cohorts with tens of millions of participants and will offer unprecedented possibilities to identify the genetic drivers of cancer, of rare diseases and of other highly prevalent diseases, as well as improvements to lifestyle. Large cohorts, with potentially millions of participants, are needed to understand the genetic and molecular signatures of complex diseases, and they provide a cornerstone for the creation of personalised treatments.

Securely accessing personal genomic data on this scale, and across national borders, is an enormous challenge that will require significant investments in national and international infrastructure. The implementation of the General Data Protection Regulation (GDPR) in the EU requires a framework for the secure archiving, discovery, dissemination, and analysis of personal data with audit and reporting capabilities. Supporting services are needed to manage policies, access, and user support. The scope and complexity of this infrastructure is beyond any single organisation: international collaboration is needed to provide recognised, secure, standardised, documented, and interoperable services. Realising the vision of a federated ecosystem of interoperable services for population-scale genomic and biomolecular data will accelerate biomedical research and improve the health of European residents.

Managing data of such a sensitive nature is a challenge, and whilst infrastructure to store, access, and share research data already exists (such as the European Genome-phenome Archive; EGA), the shift from biomedical research to healthcare will bring new challenges. As an example, it is likely that data generated through healthcare will not be shared as widely as research data, as the healthcare sector does not yet have the tools to facilitate sharing. Beyond that, healthcare operates at a national level and is subject to national laws: it is often unacceptable for health data from one country to be exported outside of regional or national jurisdictions. Therefore, in order to truly translate genomics from biomedical research into routine applications in healthcare systems, a federated data management system must be adopted that exploits the advantages of scale across Europe.

This system will require tools that work in environments where datasets are not directly available for integration and where, in many cases, even metadata is not disclosed in detail (as disclosure would result in a breach of privacy or consent). Protocols to share and link data across multiple infrastructures will need to be transparent, robust, and consistently implemented across organisational and national borders. Additionally, training and education to inform participants, researchers, and clinicians about how to manage sensitive human data will be essential to alleviate concerns and to handle data effectively.

[Figure labels: Scientific research at the healthcare interface; Genomics: from sampling to populations; Real world data form integral part of research projects; Manage trust and regulatory requirements]

5. Declaration of Cooperation: Towards access to at least 1 million sequenced genomes in the European Union by 2022, http://ec.europa.eu/newsroom/dae/document.cfm?doc_id=50964

Table 1: Healthcare-focussed genomics-based national initiative projects across the EU [6]

National initiative | Funder(s) | Focus area(s)
A-C-G-T Analysis of Czech Genomes for Theranostics (2018–2022) | Public-private: Funded by European Regional Development Fund, Czech State Budget, and private partner contributions | General
FarGen (Denmark) (2013–ongoing) | Public: Some government funding already promised | General
France Médecine Génomique 2025 (2018–2025) | Public: Government funding | General
Genome Denmark (2012–ongoing) | Public-private: Funded primarily by the Innovation Fund Denmark, but with contributions from the private partners too | Two pilot studies: one has a general research aim, the other is focused on cancer research
Genome of the Netherlands (GoNL) (dates unknown, but findings published 2014) | Public: Netherlands Organization for Scientific Research | General
Genomics England (2012–ongoing) | Public: Genomics England is a registered company entirely owned by the UK Department of Health | Rare diseases and cancer
FinnGen and the Sequencing Initiative Suomi (Finland) (2014–ongoing) | Public-private: Funded by Business Finland (Public Innovation Research agency) and international pharma industry | General, with a focus on common, chronic diseases
The Estonian Biobank (EGCUT) (2000–ongoing) | Public: Estonian Ministry of Social Affairs and Ministry of Education and Research | General
The Scottish Genomes Partnership (2015–ongoing) | Public: the Chief Scientist Office of the Scottish Government Health Directorates and the Medical Research Council Whole Genome Sequencing for Health and Wealth Initiative | General, cancer, rare diseases
UK Biobank (2006–ongoing) | Public-charity: Various UK government agencies; the Wellcome Trust; British Heart Foundation; Diabetes UK | Middle- and old-age related diseases
National Centre for Excellence in Research in Parkinson’s Disease (Luxembourg) (2015–ongoing) | Public: Luxembourg National Research Fund | Parkinson's disease
National Center for Medical Genomics – Czech national research infrastructure (2014–ongoing) | Public: Funded by Czech State Budget and the European Regional Development Fund | General
National contact point and network for Rare Diseases in Slovenia (2016–ongoing) | Public: Ministry of Health and Ministry of Education, Science and Sports | Rare diseases
Swiss Personalised Health Network (SPHN) (2017–ongoing) | Public: Swiss government, distributed through a National Steering Board, the Swiss Academy of Medical Sciences and the Swiss Institute of Bioinformatics | General

The Heidelberg Institute for Theoretical Studies (HITS), part of ELIXIR Germany. HITS focuses on the processing, structuring, and analyzing of large amounts of complex data and the development of computational methods and software. © HITS/Keskin

6. Dubow T, Marjanovic S. Population-scale sequencing and the future of genomic medicine: Learning from past and present efforts. RAND Corporation 2016, https://doi.org/10.7249/RR1520. Supplemented by national genomics initiatives across ELIXIR Nodes.



Navigating a sea of data – ELIXIR Core Data Resources are the reference points

Life science research relies upon an extensive ecosystem of open data resources that archive, curate, integrate, analyse and provide access to data generated worldwide. Some of these resources are repositories of primary data (such as nucleic acid sequences and protein structures), while others are knowledge bases that contain curated, high-quality, reference data about, for example, proteins, biomolecular interactions, model organisms or Mendelian diseases. Researchers across the world are dependent on the data integration and analyses that these open-access resources enable.

‘Researchers around the world rely on the ability to freely deposit into and download data from these resources. If for any reason we were to lose access to these Core Data Resources, it would have a devastating effect not only on science, but also on medicine, industry and innovation’

7. https://www.elixir-europe.org/platforms/data/core-data-resources

8. Zheng-Bradley X, Flicek P. Applications of the 1000 Genomes Project resources. Briefings in Functional Genomics 2017, 16(3), 163–70, https://doi.org/10.1093/bfgp/elw027


ELIXIR has spearheaded the definition of Core Data Resources [7], the subset of ELIXIR data resources that are of fundamental importance to researchers in academia and industry. Because these data resources are foundational to life science, their impact and influence underpin their value to teachers, students and clinicians, and reach through to the interested wider public. They are highly valued and form a research infrastructure that is critical for ensuring the reproducibility and integrity of the research enterprise.

The ability to deposit into, and download data and information from, these resources freely and without restrictions is essential to progress in life science research. The significant loss of data from these resources or the introduction of barriers to data access would likely have devastating consequences for science, medicine, and for society as a whole. Molecular medicine, genome editing and the large-scale assessment of microbial biomes would be unthinkable without our ability to compare and annotate data with reference collections of genome sequences and other data types. For instance, over five petabytes of human genome data are distributed every year from EMBL-EBI resources, such as EGA and the European Nucleotide Archive (ENA; both ELIXIR Core Data Resources), as a basis for the identification and imputation of pathogenic variants [8].

European Open Science Cloud: discoverability, access, sharing and analysis

In 2016, the European Commission presented its plans for a European Open Science Cloud (EOSC) as a major priority for science policy, and research and e-Infrastructure investments in Europe. The EOSC aims to provide “1.7 million European researchers and 70 million professionals in science and technology a virtual environment with free at the point of use, open and seamless services for storage, management, analysis and re-use of research data, across borders and scientific disciplines.” [9] The EOSC is a large-scale effort that cuts across many of the EC-funded programmes and policy initiatives, and is aimed at integrating Open Data and Open Science policies with a federation of unified e-Infrastructure and research infrastructure services. The developing EOSC model describes a pan-European federation of data infrastructures that provides access to a wide range of publicly funded services, supplied at national, regional and institutional levels, and to complementary commercial services. The vision is a ‘digital single market’ for research assets.

The development of EOSC is a major driver in the ELIXIR 2019–23 Programme. The EOSC requires that data resources and computational tools are Findable, Accessible, Interoperable and Reusable (FAIR), secure and operate in the cloud. ELIXIR has actively contributed to the development of open data policies and the FAIR concept: ELIXIR co-sponsored the development of FAIR [10], and has published a set of guidelines for the practical implementation of FAIR data management in the life sciences [11]. ELIXIR Nodes manage significant resources for cloud compute and storage and, through the ELIXIR Compute Platform, we are actively developing models for transnational access, as well as the data-distribution mechanisms needed for effective interoperability between national clouds.

Thus, ELIXIR is well positioned to act as a major driver for the development of EOSC in the life sciences, and has services and data resources that reach a large proportion of Europe’s life science researchers. The vision of EOSC will influence the provision of compute resources through the national life science clouds managed by our Nodes. The emerging EOSC architecture is built around science domains (life sciences, earth and environment, etc.) and will drive the development and implementation of the data discovery, indexing and cataloguing services for the life sciences (such as Bioschemas, TeSS and omicsDI, for which more information is available in the section "Delivering the ELIXIR Programme").

Recognising the strategic importance of EOSC in the development of a federated European data landscape, ELIXIR has engaged in the H2020 projects that are developing the first phase of EOSC. ELIXIR is leading work on data interoperability and the EDMI [12] data model, science demonstrators, and principles of engagement in the EOSCpilot project (2017–2018). ELIXIR also participates in the EOSC-hub project, where ELIXIR is deploying a reference data set distribution service to mirror key data resources in a distributed cloud environment.

ELIXIR is coordinating the EOSC-Life project, where the 13 Biological and Medical Research Infrastructures (BMS RIs) in Europe will join forces to create an open collaborative digital space for life science in the EOSC. The project aims to: publish data from the BMS RI facilities and centres as FAIR Data Resources in the cloud; link reusable Tools and Workflows to standardised compute services in national life science clouds; connect all BMS RI users across Europe to a single login authentication and resource authorisation system; and develop joint data policies to preserve and deepen the trust given by research participants and patients volunteering their data and samples.

9. European Commission. Implementation Roadmap for the European Open Science Cloud, http://ec.europa.eu/research/openscience/pdf/swd_2018_83_f1_staff_working_paper_en.pdf

10. https://www.lorentzcenter.nl/lc/web/2014/602/info.php3?wsid=602

11. Blomberg N and ELIXIR Consortium. ELIXIR position paper on FAIR data management in the life sciences. F1000Research 2017, 6(ELIXIR):1857 (document), doi: 10.7490/f1000research.1114985.1

12. https://eoscpilot.eu/edmi-metadata-guidelines



Open, high-quality tools and workflows drive reuse and reproducible computational science

Scientists need software tools to access, study and compare data. They need clear descriptions of the use, performance, and licensing of these tools, and they need to integrate the tools into robust scientific workflows. The long-term use and uptake of such tools by the life science community requires that tools and workflows are findable and accessible, in line with the FAIR principles. They must also be citable, with their metadata profiles managed and available for review and analytics, and with the authors’ credit attributed fairly and accurately.

Running complex data analysis workflows from the command line often requires IT and programming experience, which makes such workflows inaccessible to many scientists. Constant changes from new software versions and different operating systems and cloud installations add to the complexity. To address this, it is key to work not only with tool developers, but also with users and cloud providers, to enable users to access workflows encapsulated in containers that can be easily reused and deployed in both academic and commercial clouds. Well-documented, containerised workflows are also inherently reproducible, thus addressing one of the key challenges in computational life science.

‘A data federation for European biology: Connecting, accessing and reusing data over geographical borders and across life science disciplines’

Connecting national clouds

As life science datasets grow larger and become multidimensional, the computational resources required to process and to analyse the data frequently outgrow the capabilities of individual researchers and even of large research institutes. Providing compute resources from a large shared pool, and giving open access to data by cloud computation technologies, empowers researchers, whether they work in small laboratories and remote locations or in Europe’s largest research centres. Consequently, in the last few years, specialised genomics and biomedical cloud environments have been created for individual projects in Europe and globally.

However, many challenges remain to fully utilise cloud services across Europe to enable their use in seamless workflows. Resource allocation and cost models must be developed to allow transnational access. Collaborative projects and cloud interoperability standards need further development and widespread adoption with harmonisation of task and workflow execution systems. Security standards and user access protocols must be established with the necessary mutual recognition processes. Ultimately, the vision is that national life science clouds are compatible with life science services and data access protocols, and operate in a cloud ecosystem that spans local private clouds, national community clouds, European research clouds, as well as public/commercial cloud providers.

Delivering value to wider society from Research Infrastructures

ELIXIR operates at the interface between life science data generation and the application of bioinformatics in the fields of health, food security, and the environment (Figure 1). Research activities in these fields have substantial socio-economic benefits and support the collective commitment of countries to the UN Sustainable Development Goals [13], in particular Goals 2 (Zero Hunger), 3 (Good Health and Wellbeing), 9 (Industry, Innovation and Infrastructure), 14 (Life Below Water), and 15 (Life On Land).

The applications of bioinformatics in the health sector are well recognised, ranging from disease diagnosis (e.g. rare diseases) and prevention (e.g. certain cancers), to personalised medicine (or precision medicine), and to the development of new treatments. For instance, a test based on genetic variants and using only a saliva sample has recently been developed to help identify men who have a vastly increased chance of developing prostate cancer [14]. Successful applications of bioinformatics to food production can be found in agriculture, farming and aquaculture, and are key to ensuring security of food supplies globally. In particular, resistance to pests and pathogens has been shown to help reduce the cost of food production [15].

The environment, including biodiversity and ecosystem services, is probably the area where bioinformatics applications have the greatest, unrealised potential to date. Bioinformatics can, for instance, provide solutions for pollution control and remediation, whilst metagenomics is about to revolutionise our biogeographic knowledge of species and habitats, having already provided hope for the development of new antibiotics.16 Undoubtedly, many of the open biological data services developed and coordinated under ELIXIR have resulted in applications of clear socio-economic benefit – the challenge is often to uncover which resources have been used. Looking at citations within patent documents is one effective way to link open biological data resources with bio-industry applications (from drugs and diagnostics, to consumer products, such as washing powders that use enzymatic activity), thereby demonstrating the usefulness of scientific research for human society.17 Also, an ELIXIR report on the use of public data resources by SMEs18 shows examples of companies that fundamentally rely on public data for their operation and to generate revenue.

Figure 1: Translating the value of ELIXIR resources into research and socio-economic impact. The value of open biological data resources and associated research infrastructure translates into research and socio-economic impacts. [Figure labels: Knowledge generation; ELIXIR’s Platforms & Communities; R&D applications: health, food security, environment; Scientific impact; Socio-economic impact; Sustainable development goals: Zero hunger, Good health and well-being, Industry, innovation and infrastructure, Life below water, Life on land.]

13. https://sustainabledevelopment.un.org
14. Schumacher FR, Al Olama AA, Eeles RA et al. Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci. Nature Genetics 2018, 50, 928–936, https://doi.org/10.1038/s41588-018-0142-8
15. University of Edinburgh. "Gene-edited pigs are resistant to billion-dollar virus." ScienceDaily 20 June 2018, https://goo.gl/9uYUJt
16. Adetunji J. Marine compound first new natural antibiotic in decades. The Conversation 26 July 2013, https://goo.gl/FC6qXc


17. Bousfield D, McEntyre J, Velankar S et al. Patterns of database citation in articles and patents indicate long-term scientific and industry value of biological data resources. F1000Research 2016, 5(ELIXIR):160, https://doi.org/10.12688/f1000research.7911.1
18. Roman Garcia P, Smith A and Blomberg N. Public data resources as a business model for SMEs. F1000Research 2018, 7(ELIXIR):590 (document), https://doi.org/10.7490/f1000research.1115445.1



ELIXIR, Europe's life science research infrastructure

Rachel Drysdale (ELIXIR Hub) and Gos Micklem (ELIXIR UK) at the ELIXIR All Hands meeting 2017 in Rome.



ELIXIR’s mission is to operate a sustainable European infrastructure for biological information, supporting life science research and its translation to society, the bio-industries, environment and medicine.

Figure 2: Distributed infrastructure with joined-up services

ELIXIR is the European life sciences infrastructure for biological information. ELIXIR brings together bioinformatics tools, resources and experts from its Nodes into a distributed infrastructure that enables researchers across Europe to share and store their life science research data. Within ELIXIR, each Member State establishes an ELIXIR Node, often as a nation-wide consortium. The Nodes provide services (such as data management, clouds, and workflows) to local users, and the bioinformatics facilities in the Nodes are tightly linked to research projects within their national life science communities. Indeed, one of the lasting benefits achieved in the initial phase of ELIXIR is the formation of well-organised national infrastructures. It is important to note, however, that the role of the Node in each Member State is based on national priorities, and there is considerable diversity in the scope, organisation and maturity of national Nodes.

ELIXIR Nodes contribute to and run all the bioinformatics services that form the ELIXIR Research Infrastructure and are accessed by users both nationally and internationally. The Nodes provide the majority of the capacity needed to deliver the ELIXIR operational activities, programmes and governance.

The ELIXIR Hub is responsible for coordinating organisational and technical interactions with the Nodes. In addition, the Hub is responsible for the day-to-day operational, financial and administrative management of ELIXIR, in accordance with the decisions by the ELIXIR Board. A primary function of ELIXIR is to support Nodes by spreading good practice through focused capacity building, including building capacity for programme management and resource development, as well as the skills needed to manage and exploit data. The ELIXIR Hub funds, through Commissioned Services, a technical programme that supports ELIXIR-wide infrastructure services that underpin and connect scientific services and data resources from ELIXIR Nodes.

This joint ELIXIR technical and scientific programme is funded from a combination of internal projects (funded by ELIXIR Member States), joint European research infrastructure projects (funded by the European Commission and other international funders), and by Node contributions of services and capacity (funded by national, European Commission (EC) and international funds). The Programme, through the detailed technical roadmaps provided by the ELIXIR Platforms, will thus drive the implementation of ELIXIR’s joint scientific strategy and will also guide coordinated grant applications.

ELIXIR has, since the early preparatory phase, had a strong focus on industry’s use of bioinformatics data, with a strategy based on strengthening the industry and SME partnership capabilities of ELIXIR Nodes. During the first ELIXIR Programme we developed an SME & Innovation programme that, by fostering collaboration with national “bio-innovation networks”, enabled local biotechnology and life science informatics companies to access expertise across ELIXIR via focussed SME workshops. Our experience from the programme has been very positive: typically, over half of the participants are from local biotechnology companies, and by focusing on themes of high national interest, we provide opportunities to align outcomes with strategic research investments. This programme is unique among ESFRI research infrastructures and a strong differentiator for ELIXIR.

[Figure labels: ELIXIR operates as a virtual organisation and executes a joint technical programme, funded by ELIXIR’s core budget; Bilateral agreement between a Node and the ELIXIR Hub; International agreement between countries.]

As ELIXIR begins its second scientific Programme for 2019–23, it does so as the largest biomedical research infrastructure of the European Strategy Forum on Research Infrastructures (ESFRI). Since its launch in 2014 with five Member States, ELIXIR has grown to bring together national bioinformatics infrastructures – ELIXIR Nodes – from 22 Members and one Observer country. Collectively, our ELIXIR Nodes represent over 200 institutions, and our weekly update to all Node staff reaches over 700 national experts dedicated to infrastructure operations and to advanced user support and training.

The ELIXIR Programme consolidates priorities across the ELIXIR Nodes for the coordinated infrastructure needs that drive shared service operations. The Programme describes the jointly operated services used by national Nodes and the life science community at large.



• 22 Members and 1 Observer
• Over 200 institutes
• Over 700 experts dedicated to infrastructure operations
• 18 Core Data Resources
• 145 services offered by ELIXIR Nodes
• 52 Implementation Studies
• Over 400 companies and SMEs engaged

[Figure labels: Data; Tools; Compute; Interoperability; Training; Node]


Established transnational service operations for data integration, data analysis, compute and storage

Organisationally, ELIXIR has now reached critical mass and has established the implementation mechanisms and legal framework that allow its infrastructure to drive coordinated, transnational actions. ELIXIR has established a portfolio of Commissioned Services, i.e. services or studies funded by the ELIXIR Budget, and has built the foundation for the joint infrastructure services set out in this Programme. The main Commissioned Services mechanism deployed by ELIXIR to date has been the Implementation Studies. Commissioned Services are funded through the ELIXIR Budget to drive the integration and the ELIXIR-wide availability of the ELIXIR Node-funded services provided through national funding schemes. The development of robust processes for Commissioned Services was one of the strategic objectives of the first ELIXIR Programme, and the establishment of a portfolio with around twenty concurrent service projects is one of the main deliverables. In the ELIXIR 2019–23 Programme, Commissioned Services will involve investments over a number of years and will provide the long-term, essential Infrastructure Services upon which other services can be built.

The ELIXIR Hub has also built up the project management capability required to lead large infrastructure grants: the CORBEL and ELIXIR-EXCELERATE projects each have over 40 beneficiaries and linked third party participants. ELIXIR has established the agreements necessary to participate as an infrastructure provider in large programmes where it acts as a single beneficiary with Nodes as linked third parties.

ELIXIR coordinates the services provided by its national Nodes by organising these into five technical Platforms (Data, Tools, Compute, Interoperability and Training) that drive harmonisation and service development. The Platforms also identify and operate joint transnational services. Each ELIXIR Platform is co-chaired by senior experts from Nodes appointed by the ELIXIR Heads of Nodes (HoN). Across the Platforms, around 40 different groups work on specific service areas. As the first ELIXIR Programme comes to an end, the ELIXIR Platforms each have long-term roadmaps19 with a portfolio of services in production. For example, the ELIXIR AAI20 has a rapidly growing number of relying services,21 such as the de.NBI cloud, and has now also been adopted by other BMS RIs and by commercial cloud brokering services.

A Europe-wide FAIR data management programme

The ELIXIR principles for FAIR data management provide a set of recommendations to help users to find and access datasets and services across scientific disciplines. Our strategy relies on the exposure of common and minimal metadata by data resources, and on the creation of a coordinated set of catalogues that efficiently manage and exchange the metadata between different data types. Through our national Nodes and the associated data resources, ELIXIR will support the adoption of lightweight, minimal metadata standards, and will operate a network of data discovery and access services that form the basis of a life science data federation. ELIXIR will operate the computational services that implement the data and metadata standards: the life science registries, identifier and ontology services that underpin strong data stewardship with common standards that make data FAIR. By linking joint services with Node data management programmes, ELIXIR provides the infrastructure needed for a Europe-wide data federation. ELIXIR, via its Nodes, is also involved in the FREYA22 project on persistent identifiers (a key component of FAIR) and collaborates with OpenAIRE23 to ascertain the compatibility of registries and catalogues.

ELIXIR has addressed the discoverability of tools by developing tool and workflow registries, by increasing the quality and sustainability of software through the development of open source software best practices and by partnering in community driven initiatives such as the Common Workflow Language (CWL) standards and the Galaxy workflow environment. For instance, the Marine Metagenomics Community have worked with the Interoperability Platform to adopt the CWL standard and benchmark workflows between Nodes, thus laying the groundwork for the subsequent cloud deployment and integration with the ELIXIR AAI using ELIXIR Compute Platform services. Similarly, the Tools Platform has worked with the Rare Disease Community to establish tool annotations and benchmarking pipelines for common data analysis tools. However, challenges remain: while there are individual examples that show how tools and algorithms can access large, distributed datasets, much work remains to be done to make distributed analysis in a federated data ecosystem the reality for European life scientists.

Software Carpentry Workshop in October 2018, organised by ELIXIR Belgium in Gent. © Mauricio Macossay Castillo.

19. E.g. ELIXIR Compute Platform’s ‘A Technical Services Roadmap for Life Science Research in Europe’
20. https://www.elixir-europe.org/services/compute/aai
21. https://perun.elixir-czech.cz/services/
22. https://www.project-freya.eu/en
23. https://www.openaire.eu


Achievements of ELIXIR Platforms and Communities in 2014–2018

Platforms: Data, Tools, Compute, Interoperability, Training
Communities: Human data, Rare diseases, Marine, Plants

• 18 Core Data Resources identified
• Over 10,000 tools registered in bio.tools
• Demonstrated technology to transfer sensitive human data to secure cloud service
• 113 policies, 1128 databases, and 1199 standards metadata in FAIRsharing
• Developed Training portal giving access to over 1000 training materials
• Coordinating federated data management
• Developed registry of over 100 data resources for rare disease research
• Released ITSoneDB Marine section
• Upgraded Minimum information about plant phenotyping experiments: MIAPPE
• 12 ELIXIR recommended Deposition Databases
• 4,000 containers registered in BioContainers
• ELIXIR AAI deployed with 67 services and over 1,800 registered users
• Over 5,500,000 terms from 212 ontologies in Ontology Lookup Service
• Developed Quality and impact assessment framework
• Developed ELIXIR Beacon implementation
• Comparison and standardisation of services
• Updated MGnifyDB: EBI Metagenomics Portal
• Developed ELIXIR Plant Data lookup service
• Pioneered first external-review-based selection of Implementation Studies
• Driving benchmarking within over 10 scientific communities
• Drive development of European cloud capacities for life science (through EOSC)
• 655 collections, 822 resources with resolvable URIs and CURIE prefixes in Identifiers.org
• Organised 16 e-learning courses
• Increased sensitive data accessibility and re-use
• Developed analysis pipelines for marine metagenomics (MGnifyDB and MMP)
• Drive development of Breeding API & BrAPPS
• 44 resources using Bioschemas metadata specifications
• Organised 570 training events
• Released Ocean Gene Atlas
• Trained 146 new bioinformatics trainers
• Launched Marine Metagenomics Portal (MMP)
• Drive development of Galaxy platforms



ELIXIR develops in close partnership with research communities

Successful life science research communities often develop strong conventions and standards for managing data. For example, since 2002, the Proteomics Standards Initiative24 has developed a range of standards to facilitate data comparison, exchange and verification in proteomics. Similarly, the Global Alliance for Genomics and Health (GA4GH) oversees and develops standards and protocols for the international, responsible sharing of human genomics data. These initiatives are powered by national experts who commonly lead national centres of excellence and who, in Europe, are most often linked to ELIXIR Nodes. As an infrastructure, it is critical for ELIXIR services to be closely aligned with and to respond to the requirements of scientific communities. ELIXIR’s portfolio of cross-cutting resources will need to adapt to the demands from specific research domains, and it is vital that ELIXIR actively identifies these with reference to the user communities.

In the ELIXIR 2019–23 Programme, we develop the concept of scalable and sustainable partnerships, called ELIXIR Communities, which bring together our Node experts with users in specific research fields (e.g. clinical genomics, plant and agricultural research, or proteomics). The ELIXIR Communities are formally recognised groupings with published roadmaps25 that set out the goals for each community and that describe how to link the long-term services of ELIXIR Platforms with the research community’s needs. These community roadmaps allow ELIXIR to identify and implement sustainable solutions that are reusable across fields, and that gain from synergies and pave the way to new infrastructure services. The partnership with ELIXIR allows the research communities to develop solutions on top of stable, long-term services, and to access best practice and expertise in infrastructure challenges, such as data management, cloud compute and transnational data sharing.

For example, ELIXIR Node bioinformaticians in the field of agriculture and plant sciences have worked together with large-scale plant phenotyping centres to develop a metadata standard (MIAPPE) and a shared API for data exchange (the Breeding API, BrAPI). Together with the ELIXIR Interoperability Platform, the ELIXIR Plant Community is adopting a suite of generic data-validation tools and the implementation of BrAPI in national phenotyping centres to drive an emerging data federation.26

The goal of the Federated Human Data Community is to create a federated ecosystem of interoperable services that enables population-scale genomic and biomolecular data to be made accessible across international borders. By providing internationally recognised, secure, standardised, documented and interoperable services, ELIXIR will accelerate biomedical research and will improve the health of European residents. The Federated Human Data model for developing a data federation could also be applied to other major life science disciplines that have a large research base. ELIXIR will monitor other research communities, such as those in agriculture, biodiversity and microbiomes, for opportunities to create partnerships that support long-term data federations. Stable, sustainable resources and services with clear terms of use and management plans are the foundation for such partnerships: when research infrastructures, projects and user communities embed and extend ELIXIR services, long-term relationships are forged that drive the development of an integrated FAIR data and knowledge management infrastructure.

The Earlham Institute (EI) in Norwich, part of ELIXIR UK. EI explores living systems using genomics, bioinformatics and biotechnology.

24. http://www.psidev.info
25. See e.g. Vizcaíno JA, Walzer M, Jiménez RC et al. A community proposal to integrate proteomics activities in ELIXIR. F1000Research 2017, 6(ELIXIR):875, https://f1000research.com/articles/6-875/v1 and van Rijswijk M, Beirnaert C, Caron C et al. The future of metabolomics in ELIXIR. F1000Research 2017, 6(ELIXIR):1649, https://doi.org/10.12688/f1000research.12342.2
26. https://brapi.org



ELIXIR in the European Research Infrastructure landscape

ELIXIR, as Europe’s life science data infrastructure, has an important role in connecting the 13 ESFRI BMS RIs. The ELIXIR 2019–23 Scientific Programme builds on the well-established national collaborations and European partnerships between the BMS RIs to ensure that our services are widely available to researchers as they access Europe’s advanced life science facilities. Our goal is to ensure that life science data is FAIR and readily available for reuse and that users have easy access to the computational tools and resources. There are many strong national collaborations between BMS RIs: for instance, the HealthRI collaboration in the Netherlands brings together the national nodes of BBMRI, EATRIS and ELIXIR, and IMIM in Spain co-hosts the nodes of Instruct, EU-OPENSCREEN and ELIXIR. International alignment is therefore a key requirement and we expect that joint solutions in the EOSC that link data, tools and workflows will drive further collaboration and harmonisation.

Building on established collaborative projects with BMS RIs and with European e-Infrastructures

ELIXIR has coordinated the cluster projects that bring together the ESFRI research infrastructures within the EC research infrastructure programme. We have made our different data types interoperable (e.g. in the EC FP7-funded cluster project, BioMedBridges) and have enabled researchers to access any combination of our services in an integrated manner (e.g. in the EC H2020-funded cluster project CORBEL). Through these projects, the BMS RIs have mapped out their data management practices and generated metadata standards catalogues.27 These practices and standards, through the strong connection with data experts and users in other infrastructures, guide and embed the ELIXIR services throughout the community.

The CORBEL project developed the concept of shared innovation pipelines: advanced biomedical research (e.g. biomarker discovery) needs to make extensive use of services across infrastructures (such as imaging, biobanks, and molecular structures) and to put in place effective interfaces, as well as the joint services needed for open, excellence-driven calls for access across research infrastructures. The CORBEL data management work has been embedded into the ELIXIR Interoperability Platform and has delivered the services, identifiers and ontologies needed for the integration and interoperability of BMS RI data resources.

The BMS RIs have also jointly engaged in the e-Infrastructure project, AARC2, to develop a specification of AAI suitable for the life science community. The ELIXIR AAI, the work planned on AAI in AARC2 (2017–2019), and the EOSC all build on the foundation laid in ELIXIR-EXCELERATE (2015–2019) and in ELIXIR’s Implementation Studies. We now move towards the operations of a joint Life Science AAI, with component e-Infrastructure services provided by a GEANT, EGI and EUDAT consortium.

27. ELIXIR, EU-OPENSCREEN, BBMRI, EATRIS, ECRIN, INFRAFRONTIER, Suhr S (editor). Principles of data management and sharing at European Research Infrastructures. Zenodo 2014, http://doi.org/10.5281/zenodo.8304


ELIXIR leads the joint BMS RI consortium in the EOSC (EOSC-Life), and we expect that this will be a key driver for the continued collaboration between BMS RIs in the context of the developing EOSC. ELIXIR is also a partner in EOSCpilot and in the e-Infrastructure-led EOSC-hub projects, and is thus actively involved in shaping the European open data policy and data service landscape in close collaboration with other stakeholders.

Deepening the long-term partnership with European Research and e-Infrastructures

The BMS RI partnership builds on joint long-term planning. In 2015, the BMS RIs, through a joint memorandum of understanding, established the BMS RI Strategy Board comprising the Directors of the established ESFRI BMS RIs together with the coordinators of the ESFRI preparatory phase infrastructures. The BMS RI Strategy Board actively monitors the overall development of the BMS RI landscape and provides an interface between the individual research infrastructures, emerging new infrastructures, and design studies. The BMS RI Strategy Board contributes to long-term sustainability by identifying synergies and by articulating joint priorities between infrastructures, and by ensuring that ELIXIR maintains strong political, scientific and technical links with the other BMS RIs. In particular it provides the high-level strategic discussion of common issues concerning data, where ELIXIR is considered the lead.

‘All ESFRI Biomedical Science Research Infrastructures generate data and these need to flow into sustainable databases that are interoperable, safe and secure.’

The BMS RIs are at different stages of development, from established ESFRI Landmarks to early preparatory phase projects. Aside from the work in our joint H2020 project, ELIXIR’s interactions and partnerships with the other BMS RIs will therefore consist of multiple mechanisms. For instance, in 2014 ELIXIR and EuroBioImaging agreed an Image Data Strategy to jointly work on data resources to enable Imaging Archives to be linked to genomics and other biomolecular data. ELIXIR’s Marine Community is working in close collaboration with the marine research stations in EMBRC; likewise, the work of the ELIXIR Plant Community is closely aligned with the developing data strategy for the plant phenotyping centres in EMPHASIS.

ELIXIR Nodes and resources, covering molecular and cellular biology, chemical biology, translational research, and environmental and agricultural sciences, form the research data backbone for life science in the European Research Area. Through our distributed organisation we bring together experts and users from all over Europe and aim to form strong national collaborations with the nodes of other BMS RIs. The ELIXIR Communities provide an important mechanism for these partnerships, where the domain experts from ELIXIR Nodes, who often work in close collaboration with national facilities from other BMS RIs, can identify and implement the data services needed by the users of these facilities. ELIXIR will also seek to establish further joint data strategies, where necessary supported by high-level collaborative agreements or MoUs, to drive the development of a shared European data landscape rooted in FAIR principles and based on shared data and metadata standards. As ELIXIR’s portfolio of Communities evolves we expect many of them to be closely aligned with other BMS RIs, facilitating closer interactions and co-working. Examples of possible future interactions at this level include the INSTRUCT and Infrafrontier BMS RIs.


Strategic objectives of the ELIXIR 2019–23 Programme

Plant Sciences core facility at CEITEC, The Central European Institute of Technology in Brno (part of ELIXIR Czech Republic).



In the 2019–23 Programme ELIXIR will extend our established Europe-wide infrastructure of key reference databases, FAIR data services and standards validators, and common platforms for tools, workflows and clouds, into a federated data ecosystem in which life scientists – across all our Member States – routinely access our joint infrastructure for national projects and international collaborations. We will track progress towards our objectives with a set of ELIXIR-wide key results that cascade to all of our individual groups. Taken together, these objectives will drive the implementation of a Europe-wide infrastructure for federated data access and analysis across all life science disciplines based on common international standards and long-term sustainable services and data resources, and will make this infrastructure available at scale for the 500,000 life scientists working in Europe.

The five strategic objectives of the 2019–2023 Programme

In 2023:

1. ELIXIR operates a portfolio of integrated services that meet the data needs of life scientists at a European scale
2. ELIXIR Communities drive service uptake, support standards development, and connect ELIXIR’s experts in life science disciplines
3. ELIXIR Core Data Resources are the global standard for bioinformatics resource management and are the foundation for an international funding and life cycle management strategy that secures the long-term sustainability of those resources
4. ELIXIR is the recognised and trusted life science foundation of the European Open Science Cloud
5. All ELIXIR Nodes connect life science users in academia and industry to our open, federated service network

1. ELIXIR operates a portfolio of integrated services that meet the data needs of life scientists at a European scale

A Europe-wide infrastructure for federated data access and analysis across all of the life science disciplines will require a bioinformatics infrastructure that encompasses data resources, metadata services, tools, workflows and computational services that scale to meet the needs of large research communities. The services provided need to be dependable and maintained for the long-term so users can trust that dependencies will not break. The provision of services to users in different countries and organisations requires robust user authentication and management, effective protocols for data distribution, and cloud computing services that can support international collaborations. Users need to be trained and to have access to support and help desks. Providing a portfolio of integrated services with recommendations on use, training and advanced support is at the core of ELIXIR’s mission.

ELIXIR will partner with open collaborative frameworks (e.g. Galaxy and CWL), and it will bring together the large community of tools developers and workflow experts to promote standards, to link to our cloud deployment frameworks and to drive the findability, availability, interoperability and reuse of life science tools. Building a common infrastructure and engaging with European and international experts (e.g. from NIH Cloud Commons projects, CyVerse, GA4GH) will help to form a strong developer community aligned around open standards in active reuse.

The ELIXIR registries are a key part of this infrastructure: they serve as discovery portals that help end users find distributed services. Registries also allow developers of complex workflows to access and integrate tools and data provided by our many national centres.28 Thus, registries are a critical part of a federated cloud infrastructure for science. The ELIXIR 2019–23 Programme will develop a common registry strategy that provides coordinated registries of datasets, tools and containers that help users to find the right resources for the research problem at hand. ELIXIR will lead on defining the relationship, dependencies and maintenance of our life science specific registries within the EOSC architecture (cf. objective 4 below). ELIXIR, through Nodes and individual services, will also lead on global collaborations to harmonise and interoperate registries, in particular with BD2K and NIH Data Commons.

Europe has a rich ecosystem of data resources, many of which provide high quality curation and insights into specific areas of biology. ELIXIR will support the development of excellence and capacity in data resource management and will provide services for the sustainable, high-quality, manual curation of biological facts. ELIXIR will also drive the integration of literature and data in the life sciences to capitalise on the changes in scientific publishing and the rapid development of preprints alongside data publishing. ELIXIR also aims to support the management – and international linking – of the whole landscape of research data resources that connect ELIXIR’s data resources to the cloud environments and interoperable workflows that drive reuse.

28. The importance of registries in modern bioinformatics is highlighted by the many examples led and supported by ELIXIR Nodes: BioContainers (integrated with BioShaDock) for software containers; bio.tools for tools and databases; identifiers.org for the resolution of database identifiers; FAIRsharing for standards, policies and databases; OmicsDI for datasets; the ELIXIR registry for Beacon interfaces; OLS for ontologies; TeSS for training materials and events; BioSamples for biological samples; MyExperiment for workflows; Biocatalogue for web services; and so on.



The ELIXIR Training Platform – which links national training efforts in all our Nodes – provides a pan-European, comprehensive set of courses via ELIXIR Node training programmes. The ELIXIR TeSS training portal forms part of this Platform and provides open access training materials. In the 2019–23 Programme ELIXIR will strengthen national training programmes so that they can train a large user community to use the ELIXIR services; through training and staff exchange programmes it will also support the professional development of Node staff to deliver coordinated and interoperable services. The ELIXIR group of Node training coordinators will continue to work on quality benchmarking and impact assessment to further develop our training programmes.

Expected outcomes

By the end of the ELIXIR 2019–23 Programme, the critical need for harmonised data infrastructure in the life sciences will be met through a portfolio of European-wide services operated by ELIXIR Nodes. In a distributed infrastructure such as ELIXIR, this requires agreement between our many service providers on key performance indicators (KPIs) for adoption, delivery and impact. Delivering these KPIs is an important milestone towards a harmonised service portfolio. Throughout the ELIXIR Member States scientists will have access to advanced services with the in-depth user support necessary for use of data management services, interoperable workflows, tools and computational resources at scale. This will enable the wide adoption of FAIR principles, ensuring that data is openly accessible for reuse in full compliance with all ethical, regulatory and legal requirements. The ELIXIR Training programme will have developed the expertise and human capacity needed to use and operate a Europe-wide, federated data infrastructure, with a suite of learning options for users and with regular training and staff exchange to continuously develop the staff in our Nodes.

Key results
• By the end of 2021, all ELIXIR Platforms have defined a portfolio of services as part of their service architecture with service management and sustainability plans and ongoing monitoring of usage and impact.
• By the end of 2023, ELIXIR routinely provides the life science research data infrastructure for European (FP9) projects, supporting data stewardship, providing advanced analysis tools, and access and authentication services to life science resources.
• By the end of 2023, ELIXIR will have four long-term Infrastructure Services in operation.


2

ELIXIR Communities drive service uptake, support standards development, and connect ELIXIR’s experts in life science disciplines

A major new development in the ELIXIR 2019–23 Programme is ELIXIR Communities. These Communities consist of groupings of Node experts and advanced users in bioinformatic domains where there is a strategic interest in a specific domain, based on subject area or technology, as well as critical mass across ELIXIR Nodes. ELIXIR Communities establish formal links where Node experts actively collaborate across Nodes to define and use solutions within specific research areas (e.g. rare heritable diseases, proteomics). The partnership between Node experts and advanced users formed within ELIXIR Communities will drive the adoption of standards-based, interoperable data management and the deployment of sustainable tools and workflows in national and European projects.

The ELIXIR 2019–23 Programme fosters a close collaboration between our ELIXIR Platforms and the ELIXIR Communities and will drive the usage and impact of ELIXIR’s service portfolio through targeted Commissioned Services (Community-led Implementation Studies). These partnerships allow user needs – from different fields of life science – to be captured into formal requirements by the ELIXIR Platforms. They will also support the uptake of standard solutions such as transferable workflows using standardised components and containers, or collaborations to access ELIXIR-associated clouds.

Through ELIXIR Communities, the experts and advanced users in specific fields also have an opportunity to engage in and drive global standardisation efforts. ELIXIR – as a coalition of the national European bioinformatics infrastructures – is well positioned to drive the development and implementation of global, community-defined standards for life science data management. By supporting ELIXIR Communities, we have an opportunity to foster the development of community standards (e.g. BrAPI/MIAPPE, Marine metagenomics recommendations, CWL); to link to standards organisations (e.g. HuPO/PSI); and to use our distributed organisations to drive the uptake of these standards in national data management programmes. Building on the established strategic partnership, ELIXIR will also continue to work closely with the GA4GH to support the development and implementation of global solutions for the management and sharing of personal biomedical research data. ELIXIR, through coordinated approaches as well as aligned Node services, will also engage in other international research and data sharing projects to develop and operate a data sharing and analysis infrastructure.

The focus on ELIXIR Communities signals ELIXIR’s shift from coordinating and building its infrastructure to the operation of long-term sustainable services in partnership with scientific communities, to drive their usage and impact in research projects.

Expected outcomes

The key outcome is that the data management, access and analysis needs of diverse user communities are captured by ELIXIR Platforms and are used to target a portfolio of bioinformatic infrastructure services that are tuned to the needs of research communities. White papers,29 developed as part of establishing an ELIXIR Community, will help to prioritise service delivery and will highlight gaps in the ELIXIR portfolio. The expected outcome will be captured by performance indicators that measure usage and impact. Another key outcome is that ELIXIR helps the Community to address future challenges and enables international collaborations. Ultimately each Community adopts ELIXIR services as the default for new, large-scale initiatives.

Key results
• By the end of 2021, five ELIXIR Communities have established ELIXIR Platform services as a key component of the Community’s scientific work with an agreed mechanism for including ELIXIR services as the infrastructure component for collaborative international projects.
• By the end of 2023, ELIXIR has established its infrastructure for the secure handling of federated human research data and the usage of the infrastructure is demonstrated by participating in several large-scale international collaborative projects.

29. See e.g. Vizcaíno JA, Walzer M, Jiménez RC et al. A community proposal to integrate proteomics activities in ELIXIR. F1000Research 2017, 6:875, https://doi.org/10.12688/f1000research.11751.1; and van Rijswijk M, Beirnaert C, Caron C et al. The future of metabolomics in ELIXIR. F1000Research 2017, 6(ELIXIR):1649, https://doi.org/10.12688/f1000research.12342.2



3

ELIXIR Core Data Resources are the global standard for bioinformatics resource management and are the foundation for an international funding and life cycle management strategy that secures the long-term sustainability of those resources

A principal aim of ELIXIR is to provide a stable means to manage and maintain reliable and high-quality life science data resources that are accessible to the widest possible user community. This is a challenging goal: the distributed nature of biological data resources does not conform to the traditional ‘single-site research facility model’ familiar to most scientists, funders or science policy makers. In our first Programme, ELIXIR defined a methodology for identifying and describing data resources as a research infrastructure via the ELIXIR Core Data Resources. However, the need to secure the long-term sustainability of ELIXIR Core Data Resources can only be addressed through global collaboration.

The long-term maintenance of open research data is the foundation for all other ELIXIR activities; the preservation of open research data is a key challenge for ELIXIR. Core Data Resources are not funded directly by ELIXIR; they are supported by largely dispersed investments from national and charitable funders, with no international scheme for equitable, priority-driven long-term access. A particular challenge in Europe is the development of models that support the data deposition and curation costs that arise from Open Data policies in international funding programmes, such as IMI and H2020. There is an emerging international recognition of the importance of establishing equitable, long-term funding for open-access data resources, and a coalition of the world’s biomedical and life sciences research funders is being established to coordinate support for the most essential data resources in a more strategic and sustainable way.30, 31 ELIXIR has played a leading role in establishing this coalition, and will continue to drive the development of a global framework as part of this work. In particular, ELIXIR will work with resource owners on indicators of use and impact for the Core Data Resources, to describe the long-term value of these resources, and will work with scientific bodies internationally to adapt and adopt the ELIXIR Core Data Resources model as a global standard.

Expected outcomes

The outcome of this strategic objective is clear and ambitious: we aim to have an international framework for sustaining Core Data Resources in place by 2023.

Key results
• By the end of 2021, ELIXIR participates in a global (pilot) scheme for developing long-term management of Core Data Resources.
• By the end of 2023, ELIXIR Core Data Resources are established globally as the standard for bioinformatics resource management, and form the basis for an international funding and life cycle management strategy that secures long-term sustainability.

30. Anderson WP. Data management: A global coalition to sustain core data. Nature 2017 543(7644):179–9, https://doi.org/10.1038/543179a
31. Anderson, W. et al. Towards coordinated international support of core data resources for the life sciences. bioRxiv 2017, 110825, https://doi.org/10.1101/110825.


4

ELIXIR is the recognised and trusted life science foundation of the European Open Science Cloud

ELIXIR is ideally placed to drive the development of the European Open Science Cloud because of its focus on the data, tools, interoperability, compute and training needs of the life science community and its status as a mature data infrastructure with the necessary scale. ELIXIR’s aim for the EOSC is to ensure that European scientists will have access to advanced data services, technology platforms, samples and support services throughout the European Research Area. All of these resources will be openly accessible for reuse through the EOSC in full compliance with all ethical, regulatory and legal requirements. To establish EOSC for the life sciences, ELIXIR will work closely with other BMS RIs and user communities in the EOSC-Life consortium to develop a joint approach for the EOSC based on the substantial investments into life science clouds in our ELIXIR Nodes. EOSC-Life aims to publish data resources from the BMS RIs as FAIR data resources and to link reusable tools and workflows to standardised compute services in national life science clouds. This will connect our users across Europe to a single login authentication and resource authorisation system. EOSC-Life will also develop the data policies needed to preserve and deepen the trust given by research participants and patients volunteering their data and samples. ELIXIR’s 2019–23 Programme activities in this area will build on the developments of the first Programme to provide a coordinated gateway for life scientists accessing the EOSC. Our internal programme will align closely with, and will complement, EOSC-Life, such that EOSC interoperability and user access will be a core theme in all ELIXIR Platforms. The ELIXIR Core Data Resources and recommended Deposition Databases bring vast quantities of life science data into EOSC. ELIXIR’s metadata services (e.g. identifier and ontology resolution and mapping services) provide the critically needed connectivity between the sources of data generation and data storage, analysis and exploitation. ELIXIR’s expertise in tools and workflows will ensure that high quality, well-maintained cloud-based workflows are available across the life science community. ELIXIR Registries will ensure workflow discovery and reuse. The ELIXIR AAI, clouds and secure data management services represent major service offerings from ELIXIR that will shape EOSC for the life sciences.

The availability of open, national and international cloud infrastructures for workflow implementation is exemplified by the usegalaxy.eu platform, which was recently launched by ELIXIR-DE. This is supported by the de.NBI cloud with support for the ELIXIR AAI. usegalaxy.eu is an EU mirror of Galaxy Main (usegalaxy.org), with an additional EU focus. Additional usegalaxy.xx services are in the pipeline, supported by other cloud services, providing access at the national level. This work is driven by the ELIXIR Galaxy Community. An additional key aspect will be the development of cloud-agnostic containerised workflows, allowing the workflows we develop to be run in any cloud environment. The ELIXIR Galaxy Community is active in the international BioContainers initiative and recently published recommendations for the packaging and containerizing of bioinformatics software.

The EOSC provides a strategic opportunity for ELIXIR to influence the development of generic e-Infrastructure services that are supported and accessible throughout Europe. Thus, ELIXIR will engage with the EC, ESFRI, e-Infrastructures, and with other European stakeholders, to ensure that the EOSC drives the establishment of a set of integrated services from multiple non-commercial and commercial providers that can support the needs of the life science community, and that of science and science users generally. ELIXIR has developed a position paper on EOSC that will guide our engagement with EOSC.32 As EOSC is in its early stages, it is still undergoing active development. In a first phase, ELIXIR will engage with EOSC stakeholders, in concerted, cross-platform work, to connect ELIXIR resources to EOSC and will deliver a roadmap with clear guidance for Nodes on how to engage and connect services with EOSC. A key aim for ELIXIR in the early development of EOSC is the positioning of ELIXIR Registries within the overall data architecture for EOSC. Life sciences, through ELIXIR Node registries, have a well-established set of catalogues that, combined with Bioschemas, lays the foundation for federation across Nodes. This must now be linked to the cross-cutting EOSC registries via the EDMI data model.

32. ELIXIR Recommendation for European Open Science Cloud, http://bit.ly/2QTja9L



Expected outcomes

By 2023 ELIXIR, in partnership with BMS RIs and other stakeholders, will have defined and developed EOSC for the life sciences, allowing users to access fully operational services via ELIXIR portals and to be supported by ELIXIR Node experts. This requires that ELIXIR Recommended Resources, such as Core Data Resources and Deposition Databases,33 become core parts of the EOSC for life sciences. ELIXIR will provide sustainable, cloud-based workflows that represent the gold standard for workflow-based biological data analysis, demonstrating the central role of ELIXIR in the European life science data landscape. The EOSC will transform, in many ways, how European science communities work, and through training and other capacity building efforts, ELIXIR will build the capabilities of the user communities and infrastructure operators required to make the best use of the EOSC across all our Nodes.

Key results
• By the end of 2019, ELIXIR AAI and Registries have a defined role in the EOSC architecture.
• By the end of 2021, ELIXIR can routinely deploy containerised workflows for federated data in harmonised clouds that support trans-national user access.
• By the end of 2023, all ELIXIR Nodes can routinely support their users with access to EOSC via a set of established ELIXIR Infrastructure Services that provide access to clouds, workflows and ELIXIR data resources.

33. https://www.elixir-europe.org/platforms/data
34. Roman Garcia P, Smith A and Blomberg N. Public data resources as a business model for SMEs. The Role of Public Bioinformatics Infrastructure in supporting innovation in the life sciences. F1000Research 2018, 7(ELIXIR):590 (document), https://doi.org/10.7490/f1000research.1115445.1


5

All ELIXIR Nodes connect life science users in academia and industry to our open, federated service network

Our vision of a federated, interconnected system for life science data rests on ELIXIR’s capacity for coordinated action through national Nodes. The ELIXIR Nodes, as national centres that provide services and support to life science users, reach out across our member states and provide the tools, computational services, data management planning, and harmonised data handling that are needed for FAIR data in European life science projects. The establishment of 22 ELIXIR Nodes across Europe provides an unprecedented opportunity to align practices and to drive the coordinated use of standards, services and long-term data management practice. ELIXIR Nodes are well positioned to support the “last mile/first mile” of data management with local user support. Many ELIXIR Nodes also provide consultancy services and advanced support resources for national projects. By supporting the consulting services, for example through joint training and staff exchange, we strengthen the Nodes’ services, and drive the standardisation and sharing of good practice. By linking data management tools and Deposition Databases with analysis workflows and national cloud environments, ELIXIR provides an incentive for researchers to submit data early in the project cycle and to adopt agreed data management standards as the default. Node experts within ELIXIR Communities will have a key role in delivering this strategy. For instance, ELIXIR has created a Europe-wide, expert network across Nodes in genome annotation and assembly through a targeted capacity building effort. ELIXIR’s resources are also well used by industry and small companies. 
In the ELIXIR 2019–23 Programme we will further develop our interactions with national bio-incubators and with similar organisations, using the insights into successful business models and interactions gained during the first ELIXIR Programme.34 The goal is to develop ELIXIR Nodes as an enabler for open data innovation in small companies that spin out of national research environments. Open data from scientific research is a public good and, as an open research infrastructure, ELIXIR has a duty to make sure that open research data is not just an asset restricted to academic scientific use but is also accessible and used in education and to create value in the private sector.

ELIXIR is a distributed organisation, in which services are delivered from our national Nodes. The impact and delivery of these services rely on well organised and inclusive Nodes that continuously engage with national research communities. The ELIXIR 2019–23 Programme will continue to develop our Nodes to provide excellence in delivering scientific services nationally and will contribute to international development within the Node’s focus areas. ELIXIR will also continue to develop indicators of use and impact, to capture and describe value to the national scientific community, as well as to demonstrate the broader societal impact of these scientific services. As ELIXIR grows, the organisational capabilities of the Nodes will also need to grow to meet the demands of increasing usage and the requirements arising from close collaborations with other ESFRI BMS RIs. This will require robust operational processes to be developed that allow the Nodes to participate in Commissioned Services and ELIXIR grants, in addition to ensuring that national strategies and communications are in place that allow the Nodes to effectively represent their national communities. With ELIXIR Platforms established as the long-term foundation of the infrastructure, the Training Platform (with strong representation in all Nodes) forms the basis of ELIXIR’s capacity and capability building activities. By reserving dedicated funding for capacity building activities throughout the 2019–23 Programme, we will ensure a continued focus and the required resources for Node development (e.g. in sustainable organisation and funding, including accessing structural funds, strengthening national bioinformatics and data management capabilities, and in promoting local innovation environments). 
In the 2019–23 Programme, we will allocate resources to Nodes for strategic projects to build capacity and to strengthen national environments by alignment with other ELIXIR initiatives, based on externally evaluated Requests for Proposals (see the section below “Programme structure and resource allocation”). The ELIXIR Staff Exchange Programme will be used to build scientific, technical and management capacity in the ELIXIR Nodes. ELIXIR recently published an Equal Opportunity Strategy,35 and in the 2019–23 Programme, we will develop actions to support a diverse and inclusive workforce that provides equal opportunities for all based on merit.

Expected outcomes

The ELIXIR Nodes connect local scientific environments to ELIXIR’s federated data ecosystem. ELIXIR’s trans-national coordination (e.g. uptake of standards and the use of joint services) adds value to the investments in national data management programmes, including advanced support to national users. The expected outcome of these investments is the further strengthening of ELIXIR Nodes as national centres of excellence for bioinformatics and life science data. The ELIXIR Nodes will also act as enablers for open data innovation in small companies that spin out of national research environments. We will further develop the ELIXIR Node processes of service selection and delivery, at a national level, to share best practice within the infrastructure. These best practices include transparent mechanisms for the life-cycle management of Nodes’ services, as well as service maturity and continuous improvement models that respond to the needs of the user community. We will establish and monitor jointly agreed indicators of use and impact for ELIXIR’s services. The success of ELIXIR depends on our network of diverse and well-trained national experts. The Staff Exchange Programme and the planned project and service management training will support the personal development of our experts. Strong collaboration between national Nodes drives excellence in scientific services and the annual ELIXIR All Hands meeting is positioned as the meeting point for bioinformatics service delivery in Europe. Finally, we will assess the outcomes, using performance indicators, for: the overall operations and management capabilities of ELIXIR (e.g. an effective Commissioned Services process); diversity and inclusion (e.g. geographical and gender representation); and the representation and community engagement of ELIXIR Nodes, including in SME and industry programmes.

35. Gater C. ELIXIR Equal Opportunities Strategy. F1000Research 2018, 7(ELIXIR):1234 (document), https://doi.org/10.7490/f1000research.1115874.1



Key results
• By the end of 2021, at least 8 ELIXIR Nodes will have established national initiatives for data management and will have hosted events to support industry and SME usage of open, public data resources.
• By the end of 2023, ELIXIR Nodes will provide FAIR data management support for life science including actively supporting submissions to the ELIXIR Deposition Databases.
• By the end of 2023, ELIXIR Nodes will understand the national landscape of open data use in SMEs and will actively support the reuse of data by SMEs in partnership with national innovation programmes.

Operations of ELIXIR Finland are provided by CSC, the Finnish IT Center for Science, based in Espoo.



Delivering the ELIXIR Programme

Community of experts from across ELIXIR Nodes at the ELIXIR All Hands meeting in Berlin, June 2018



Programme structure and resource allocation

The ELIXIR 2019–23 Programme is developed together with the ELIXIR 2019–23 Financial Plan, which details the indicative ELIXIR Budget with resources allocated towards specific challenges based on the ambitions set out in this project. The Platforms (Data, Tools, Compute, Interoperability and Training) and the Communities (including the cross-Platform services for access-controlled human data) have their own section in the Programme that outlines participation, resources, specific objectives and tasks funded through Commissioned Services. In addition, these sections will also outline the broad ambition of the respective Platforms with a set of high-level objectives that can guide aligned national Node funding and joint, externally funded projects. The ELIXIR 2019–23 Financial Plan is based on the scope and scale of activities described in the following sections. The allocation of resources will reflect the priority and strengths of each ELIXIR Node, the size of the Node, the research community supported by the Node, and how each Node can contribute to drive the Platform roadmaps (service excellence). The resource allocation will be agreed by the HoN Committee. The process and outcomes will be reviewed by the ELIXIR SAB as part of Programme development. In addition to the Hub funding, ELIXIR activities in 2019–23 will be funded via external grants. Our approach to building up a collective ELIXIR grant portfolio will be laid out in the section on the funding strategy of the Programme. The budget for technical activities in ELIXIR Nodes (Platforms and Communities) that are funded through the ELIXIR Budget is planned through the following categories of financial instruments:

• Infrastructure services
Operational resources identified through the ELIXIR Platforms. This budget will be allocated over the whole ELIXIR 2019–23 Programme, although resourcing in the second phase of the Programme is contingent on the outcome of the Programme Mid-Term Review.

• ELIXIR Platform funding
For the continued development, service coordination and inclusion of additional Node services in the ELIXIR Platforms: Data, Tools, Compute, Interoperability and Training. The funding will be allocated over the full length of the ELIXIR 2019–23 Programme, based on detailed Platform work plans (analogous to a work package in an H2020 grant). Plans and resourcing to Platforms may be revised following the Programme Mid-Term Review.

• Strategy-driven Implementation Studies
Resources for ELIXIR to drive cross-Platform activities and to drive initiatives to tackle strategic priorities ELIXIR-wide. The challenge and selection criteria are developed and agreed with the ELIXIR HoN, with resources concentrated on a small number of projects to tackle the identified challenge.

• Funding to ELIXIR Communities
In the ELIXIR 2019–23 Programme the recognised ELIXIR Communities will have base-level funding each year to help organise and bring the community together, e.g. to sponsor an annual meeting or community hackathons. Following the acceptance of the Community roadmap by the HoN, a recognised ELIXIR Community will also run an initial Implementation Study to initiate the implementation of the roadmap and to link the Community to the ELIXIR Platform services. We will also reserve funding for a workshop to produce the Community roadmap as an F1000R white paper for emerging communities.

• The Staff Exchange Programme
This programme is identified as a high priority by the ELIXIR HoN. Staff Exchange in the first ELIXIR Programme (2014–18) has been well evaluated by the European Commission and the ELIXIR SAB, recognising the importance of this programme for building capacity across ELIXIR Nodes.

• Industry engagement activities
The ELIXIR SME and Innovation Programme (supported by the EXCELERATE grant in 2015–2019) has been highly successful in promoting interactions between academia and industry. Maintaining the success of our industry outreach, and supporting the further development of an innovation ecosystem built on open data, is a high priority for ELIXIR. The ELIXIR 2019–23 Programme plans continued engagement activities with two SME events per year to be held in different ELIXIR Nodes, with one event to support the newly established annual ELIXIR Bioinformatics Suppliers Forum (EBSF).

• Community-led Implementation Studies
Funding for Implementation Studies to support the linking of ELIXIR Communities with ELIXIR Platforms. This resource will be allocated via a regular Request for Proposals (RFP) mechanism targeting recognised ELIXIR Communities, together with ELIXIR Platform services, for an Implementation Study to adapt and adopt services to meet the needs of the research community. Note that only ELIXIR Nodes (as defined by the existence of active Collaboration Agreements) can be funded through ELIXIR Commissioned Services.



Organisation and Governance

The ELIXIR Board is the highest decision-making body in ELIXIR. The Scientific Advisory Board (SAB) advises the Board on ELIXIR’s scientific strategy and reviews ELIXIR Node applications. The Director is responsible to the ELIXIR Board for implementing ELIXIR’s scientific programme. The Heads of Nodes Committee consists of the heads of the ELIXIR national infrastructures (Nodes) and is chaired by the Director. The Committee develops ELIXIR’s scientific and technical strategy, and agrees on the ELIXIR Programme, Annual Work Plan, and Commissioned Services project plans. ELIXIR has five technical Platforms that coordinate services across our Nodes: Data, Tools, Compute, Interoperability and Training. Each Platform is led by a leadership team (‘Platform ExCo’ or co-chairs), which is appointed by the Heads of Nodes Committee. The Platform membership consists of task leads, technical staff and a Platform Coordinator based at the ELIXIR Hub. Each Platform Coordinator provides support to the Platform leaders, oversees implementation projects, and ensures that the Platform’s activities are aligned to its goals. Similarly, the Coordinators for the ELIXIR Communities and Human Data work across these groupings to facilitate cross-Node working and to drive the implementation of Platform roadmaps. ELIXIR has two permanent Working Groups. The Technical Coordinators Group (TeCG) comprises technical representatives from each ELIXIR Node. The group identifies gaps and promotes connections among Nodes and Platforms. It is the responsibility of the TeCG to explore technical opportunities and issues and to provide advice and recommendations. The TeCG connects to the technical Platforms through Node technical representatives, who are actively involved in the implementation of Platforms.

36. ELIXIR Handbook of Operations. 2018, https://www.elixir-europe.org/aboutus/governance/handbook-operations


The Training Coordinators Group (TrCG) is embedded in the Training Platform and consists of training representatives from each of the ELIXIR Nodes. The TrCG meets regularly to share information and expertise and to coordinate and lead the implementation of the ELIXIR training strategy across Europe. The ELIXIR Handbook of Operations36 is the authoritative and up-to-date source for all aspects of ELIXIR Organisation, Governance and internal processes.

Data Platform

The ELIXIR Data Platform will deliver a sustainably funded portfolio of Core Data Resources that exemplify excellence within a coordinated and vibrant ecosystem of Node data resources, to meet the needs of the European life science research community.

Platform organisation

Platform co-Leaders: Johanna McEntyre (EMBL-EBI) and Christine Durinx (ELIXIR-CH)

Platform Coordinator: Rachel Drysdale (ELIXIR Hub)

Platform scope and ambition

Challenge
Bioinformaticians and life science researchers in both academic and industrial settings need open access to technically and scientifically excellent data resources for effective data discovery, deposition, and re-use, with confidence in the sound governance, life cycle management, and long-term sustainability of those data resources.

General objective
The ELIXIR Data Platform aims to drive use, re-use, and the value of life science data by providing users with robust, long-term sustainable data resources within a coordinated, scalable and connected data ecosystem. The Data Platform will run the processes required to maintain the portfolio of Core Data Resources and Deposition Databases as flagships of excellence. Based on quantitative and qualitative indicators of excellence, the Platform will provide materials to support discussions of sustainable funding models, working together with funding organisations and science policy makers at global levels, such as the Global Coalition for Life Sciences Data Resources.

The Data Platform will build a connected data ecosystem by crosslinking articles in Europe PMC to all ELIXIR data resources, and by providing mechanisms for deep linking to specific locations in full text research articles. The ecosystem will be extended to include “orphan data” that are related to the ELIXIR data landscape but currently unhoused by existing data resources. For knowledgebases specifically, the Data Platform will maintain infrastructure (developed as part of EXCELERATE WP3) to support scalable curation workflows for both professional curators and community curators; this will use a combination of software engineering and the outputs of the text mining community to deliver richly annotated articles and triage systems. This work will be carried out in four groups (see Table 2).

The Data Platform in 2023
• By 2023, the ELIXIR Core Data Resources will be established, identified and recognized globally as a gold standard for data resources.
• The Core Data Resources will contribute to the dissemination of good practice in Data Resource management, collaboration, sustainability and communication.
• The long-term sustainability of ELIXIR Core Data Resources will be assured through an international funding and life cycle management strategy.
• The Data Platform will promote an authoritative list of mandated ELIXIR Deposition Databases, to support the Data Management policies of funding agencies and publishers.
• Via a combination of Data Platform activities and competitive Implementation Studies, the Data Platform will build a connected ecosystem across all ELIXIR data resources listed in Node Service Delivery Plans (including restricted-access human data repositories, such as EGA, unstructured supplementary [orphan] data housed in BioStudies, and specialist knowledgebases), linking data with the literature.
• The open access full text literature will be used as the basis of an infrastructure for supporting human curation via automated and computational approaches, with the objective to make critical curation workflows more scalable. The adoption of community curation mechanisms will be better understood and supported with best practice guidelines.

Table 2: Data Platform groups and their objectives

Data Platform goal for 2019–2023: Build a connected, sustainable ecosystem of data resources across ELIXIR

Group: Core Data Resources and Deposition Databases
Aim: Manage the Core Data Resource and Deposition Database portfolio
• Ensuring the maintenance, monitoring and ongoing selection process for the ELIXIR Core Data Resource and ELIXIR Deposition Database lists.
• Implementing, and collecting annually, quantitative and qualitative indicators for ELIXIR Data Resources; aggregated reports will be generated and disseminated to stakeholders.
• Fostering coherence in approaches for demonstrating value of life science resources via the Core Data Resource Forum.

Group: Literature-Data Integration
Aim: Build a connected ELIXIR data ecosystem
• Through outreach activities such as staff placements and hackathons, provide support for deep integration between ELIXIR data resources: (1) support reproducibility of research by linking papers to underlying data in ELIXIR data resources; (2) ensure reference datasets in ELIXIR data resources, both deposited and curated, comprehensively link to Europe PMC; (3) provide technology to support deep linking between Europe PMC and databases; (4) provide support for ORCID integration into data resources.

Group: Scalable Curation
Aim: Maximise support for human curation
• Develop infrastructure around full text article resources to support curator workflows. This will be done by semantically enriching research articles and exploring the development of article triage systems as infrastructure. For example, daily text mining of biological concepts from full text research articles and sharing the annotations for use in search, triage, and crosslinking. The opportunities and role for community curation will also be explored.

Group: Long Term Sustainability
Aim: Global partnerships for sustainable Core Data Resources
• Contribute to the establishment of a global, internationally shared, sustainable funding model for Core Data Resources.
• Share the experience gained with the European life science data infrastructure from the ELIXIR Core Data Resource selection process, as considerations of global priorities and resource allocation proceed.

Task 1 Administration and support for Core Data Resource (CDR) and Deposition Database (EDD) portfolio

SubTask 1.1 Core Data Resource and Deposition Database selection

Objective
To add new data resources to the ELIXIR Core Data Resource and Deposition Database lists in light of maturation of data resource provision across the ELIXIR Nodes, and to reflect the evolution of life science needs.
• Outcome: Decisions to add (or not) new CDRs and EDDs to the lists, for 2020 and 2022.
• How will it be accomplished: Using the previously used process, undertake a third and fourth round of CDR selection.

Deliverables
D1.1 Updated CDR/EDD list (Y2)
D1.2 Updated CDR/EDD list (Y4)

SubTask 1.2 Core Data Resource indicator collection and service monitoring

Objective
To gather indicator data on an annual basis, to support periodic review of data resources and the generation of infographics and data visualisations for ELIXIR outreach and policy activities (see “Long Term Sustainability” Group).
• Outcome: Complete, confidential collection of indicator statistics gathered annually (Q2-Q3).
• How will it be accomplished: Using the processes previously developed as part of EXCELERATE WP3 Task 3.2.

Deliverables
D1.3 All Indicator data annually updated and provided to the HoN Committee (Q3 of each year)

SubTask 1.3 Core Data Resource and Deposition Database lists formal periodic review

Objective
To ensure that the resources on the CDR and EDD lists remain state-of-the-art and reflect the needs and practices of the life science community.
• Outcome: Consensus on the make-up of the ongoing portfolio of CDRs and EDDs.
• How will it be accomplished: Based on the indicator data (Subtask 1.2), the HoN Committee reviews all ELIXIR Core Data Resources and recommended ELIXIR Deposition Databases every two to three years. This activity will start in Q3:2020, i.e. 3 years after the publication of the first set of ELIXIR CDRs. An extraordinary evaluation of an individual resource can also be carried out, at the request of at least three Heads of Nodes, on the basis of the monitoring data.
• Risks & Opportunities: Removing a data resource from the CDR and/or EDD list may have far-reaching consequences for the resource in question.

Deliverables
D1.4 Reviewed/Updated CDR and EDD lists, as required (systematic review starting Y2)

• Influence the development of Data Management Plans to ensure best practice and adoption of ELIXIR Core Data Resources and ELIXIR Deposition Databases.



Task 2 Literature-Data Integration

ELIXIR is a distributed data infrastructure that meets the scientific needs of the life science research community for data deposition and informatics service provision, and as a source of reference for best practices in data resource operation and management. The goal of this group is to ensure that the links between the distributed data components of this infrastructure are made consistently and extensively, providing deep integration between those components, enabling synergies across the data ecosystem, and supporting initiatives aimed at documenting and demonstrating reproducibility of research.

Objective
To extend the connected ecosystem by linking any ELIXIR data resource with the scientific literature, incorporating orphan data and human data, and to provide connectivity with related infrastructures.

How will it be accomplished
Through a combination of core platform support and competitive, strategic Implementation Studies and workshops/hackathons that build on the previous work from Task 3.3, ELIXIR-EXCELERATE, to enable the following:
Outreach and technical support in the form of presentations, webinars, hackathons and staff exchange on the following topics:
• Comprehensive cross-linking at deep level between data resources, Europe PMC and researchers, via their ORCiD identifiers.
• Analytic services and API use (for example, for citation data, and crosslink information)
• Methods to deep-link curation statements
• Integration of orphan data related to ELIXIR data resources and/or Data Management Plans via BioStudies; use of BioStudies as a part of long-term sustainability planning
• Encouraging schema.org and Bioschemas adoption in CDRs (with Interoperability Platform)
• At least two hackathons/workshops to engage and technically support data resources to strengthen their integration with other elements of the ELIXIR infrastructure.
• Outcome: Increased number of ELIXIR data resources linked to each other, the scientific literature via Europe PMC, and integrated with orphan data as appropriate.

Risks & opportunities
Extensive crosslinking across related resources will maximise the return on investment in those resources, help foster a sense of community, and engender a coherent infrastructure. The risk of not doing this will be to fracture ELIXIR, as well as make it harder on a practical level for related resources to engage with ELIXIR resources (such as journals and data resources that are not within ELIXIR, but have points of contact).

Deliverables
D2.1 Periodic quantitative and qualitative reporting on activities such as hackathons and the status of data resource crosslinking with Europe PMC, orphan data integration, uptake of Schema.org and Bioschemas (Y1, Y2, Y3, Y4, Y5).

Task 3 Scalable Curation In the future, the research literature will be increasingly open access, with new communication mechanisms such as preprints requiring versions management and new peer review mechanisms. Managing full text article corpora for text mining will be much more challenging than managing just abstracts, and it is unlikely that each and every text mining group will want to invest the necessary time and effort when there are public resources already available. Bringing the compute to the data is commonplace in most informatics workflows and there is no reason why text mining operations will be different in the long term. The process of curation, performed by expert biologists, is the life-blood of knowledgebases. Curators need to identify key papers, read the full text of the articles to weigh up the evidence, then extract the most pertinent information. A growing corpus of open access full text articles provides new opportunities to enhance article triage and browsing systems; at the same time, many text mining workflows are mature enough to support curation activities. This group of tasks aims to build community and infrastructure based on the open full-text research literature. By providing a platform for doing text mining and sharing the outputs, developing standards, and then combining the semantic enrichment with rich article metadata and software tools, we expect to provide scalable support for curation across multiple knowledgebases.

Objective

Subtask 3.1 Semantically annotated Europe PMC documents, linked to underlying data resources. –– Infrastructure support for 3rd party providers of text-mined or manual annotations to upload and share annotations both programmatically and with end users. This will be based on existing ontology and annotations infrastructure, annotations APIs and Europe PMC interfaces. Providers will be expected to deliver annotations by a choice of specific protocols (as a VM, API calls or data) and according to defined standards (e.g. Web Annotation Format). –– Routine extraction of additional key entities central to multiple curation groups, as identified by user research. For example, experimental methods, mutations, sequence fragments, or biological processes for the Gene Ontology. –– Validation of text-mined annotations by curators, providing feedback on annotation quality, feeding into algorithms; adaptation of existing public tools in Europe PMC to support validation workflows; mapping existing curation statements back to the source literature.
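To make the idea of sharing annotations "according to defined standards" concrete, the following is a minimal sketch of how a third-party provider might serialise one text-mined annotation. The field layout follows the W3C Web Annotation Data Model; the function names, the example article URL and the provider name are hypothetical, and the validation rules are an assumption of this sketch, not a description of the Europe PMC annotations API.

```python
import json

# JSON-LD context defined by the W3C Web Annotation Data Model.
WEB_ANNOTATION_CONTEXT = "http://www.w3.org/ns/anno.jsonld"


def make_annotation(article_url, exact, prefix, suffix, term_uri, provider):
    """Build one text-mined annotation as a Web Annotation (JSON-LD dict).

    All argument names are illustrative; only the key names in the
    returned dict come from the W3C model.
    """
    return {
        "@context": WEB_ANNOTATION_CONTEXT,
        "type": "Annotation",
        "motivation": "tagging",
        "creator": {"type": "Software", "name": provider},
        # The body points at the ontology term the text span was tagged with.
        "body": {"type": "SpecificResource", "source": term_uri, "purpose": "tagging"},
        # The target locates the tagged span by quoting it with its context.
        "target": {
            "source": article_url,
            "selector": {
                "type": "TextQuoteSelector",
                "exact": exact,
                "prefix": prefix,
                "suffix": suffix,
            },
        },
    }


def validate_annotation(annotation):
    """Return a list of problems; an empty list means the sketch's checks pass.

    These checks are assumptions of this example (e.g. requiring a selector),
    not requirements taken from any ELIXIR or Europe PMC specification.
    """
    problems = []
    if annotation.get("@context") != WEB_ANNOTATION_CONTEXT:
        problems.append("missing or unexpected @context")
    if annotation.get("type") != "Annotation":
        problems.append("type must be 'Annotation'")
    target = annotation.get("target")
    if not isinstance(target, dict) or "source" not in target:
        problems.append("target.source is required")
    elif not target.get("selector", {}).get("exact"):
        problems.append("target.selector.exact is required")
    return problems


if __name__ == "__main__":
    ann = make_annotation(
        "https://example.org/articles/PMC0000000",  # hypothetical article URL
        "apoptosis", "induces ", " in",
        "http://purl.obolibrary.org/obo/GO_0006915",
        "demo-miner",  # hypothetical provider name
    )
    print(json.dumps(ann, indent=2))
```

Quoting the span together with its prefix and suffix, rather than relying on character offsets alone, is one way to keep an annotation resolvable when article versions change, which matters for the preprint and versioning scenario described under Task 3.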

Subtask 3.2 Scalable article triage systems (“triage systems as infrastructure”) –– Provide public customized search results using a pre-annotated document collection, enriched as required through Subtask 3.1 above, in combination with the rich metadata and search tools provided by Europe PMC. –– Tune and evaluate effectiveness of custom search engines with sample curated data and user research.
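As an illustration of what triage over a pre-annotated document collection could look like, the sketch below ranks articles by a weighted count of the entity types a curation group cares about. The `Article` structure, the entity-type names and the weights are all hypothetical; a real triage system would combine such scores with the metadata and search tools the text mentions, and would be tuned against curated data.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """A pre-annotated article: identifier plus counts per annotated entity type."""
    article_id: str
    annotations: dict = field(default_factory=dict)  # entity type -> count


def triage(articles, weights, top_n=10):
    """Rank articles by weighted annotation counts.

    `weights` maps an entity type to its importance for one curation group.
    Articles with no relevant annotations (score 0) are dropped. Ties are
    broken by article_id so the ordering is deterministic.
    """
    scored = []
    for art in articles:
        score = sum(weights.get(kind, 0.0) * count
                    for kind, count in art.annotations.items())
        if score > 0:
            scored.append((score, art.article_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(article_id, score) for score, article_id in scored[:top_n]]


if __name__ == "__main__":
    corpus = [
        Article("A", {"mutation": 3, "method": 1}),
        Article("B", {"method": 5}),
        Article("C", {"organism": 2}),
    ]
    # A hypothetical curation group that mainly cares about mutations.
    print(triage(corpus, {"mutation": 2.0, "method": 0.5}))
```

Because the weights are supplied per curation group, the same annotated corpus can serve several knowledgebases, which is the sense in which triage becomes shared infrastructure.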

Subtask 3.3 Community annotation
Explore workflows, incentives and challenges for community curation. Identify and recommend best practices through workshops with both professional and community expert curators. (With Training Platform)

Outcome Use of the infrastructure by multiple curation groups. Curators able to report efficiency savings (either time, or increased throughput); description of current landscape of community curation practice, standards and incentives.

Risks & opportunities The opportunity is to build infrastructure around article full text that anticipates future requirements. In turn this will maximise manual curation efforts across all ELIXIR data resources, in particular knowledgebases, building community and elevating the quality of data resources. The risk in not taking this approach is that the opportunity to capitalise on open access content for public infrastructure will be missed, making it more challenging to sustain knowledgebases in the future. It is possible that the outputs from automated methods and tools do not meet curator needs, but this will be mitigated through significant amounts of user research and user feedback.

Milestones
M3.1 Inclusion of experimental methods tagging in Europe PMC daily pipeline (Y1)
M3.2 Initial back-mapping for specific UniProt annotations such as mutations (Y1)
M3.3 Inclusion of other key entity tagging in Europe PMC daily pipeline, as determined by user research (Y2)
M3.4 Extension of tools for curator feedback (Y3)

Deliverables
D3.1 Richly annotated Europe PMC that meets curation needs for CDR knowledgebases (Y1)
D3.2 Report on community curation practices and opportunities (Y2)
D3.3 Demonstration of scalable article triage system based on public infrastructure (Y2)
D3.4 Combined scalable triage, pre-annotated Europe PMC, 3rd party annotations infrastructure, demonstration of use by community curators in resources such as IntAct, Disprot and neXtProt (Y4)

Objective (Task 3): To maximise the ability of expert human curators to enrich the ELIXIR knowledgebases through providing trans-resource, scalable curation solutions.



Task 4 Long term sustainability

Supporting the establishment of global partnerships and business cases for the long-term financial sustainability of Core Data Resources.

Objective
To ensure the long-term financial sustainability of the ELIXIR Core Data Resources by contributing to the establishment of a global, internationally shared, sustainable funding model for Core Data Resources.

Outcome
ELIXIR contributes significantly to the awareness-building and international efforts related to a global, internationally shared funding model for Core Data Resources.

How will it be accomplished
By supporting the work of the Global Coalition through the work of the Data Platform, the data related to the ELIXIR CDRs, and through our publications.

Risks & opportunities
Accomplishment is dependent on many external factors.

When will it be accomplished?
Y4.

Subtask 4.1 Generate a suite of materials that support the business case and related activities towards the sustainability of CDRs
• Build CDR/EDD data infographics and visualisations of CDR data, reviewed annually (aligned with Group 1 activities)
• Generate impact stories that span Core Data Resources and other ELIXIR resources, via user stories from the CDR Forum and elsewhere, and linked data/text mining analysis of use of CDRs (aligned with Groups 2 and 3)
• Road test and seek support for these activities with the CDR community via the running of and engagement with the CDR Forum via quarterly or biannual teleconferences, or co-located face-to-face meetings

Subtask 4.2 Outreach activities: Build awareness on Core Data Resources and the need for long-term sustainability
• Via Nodes, encourage the use of Core Data Resources and ELIXIR Deposition Databases by life science researchers, through fostering best practices in data management
• With journals, establish best practices around data deposition, data linking and data citation
• Build a programme of outreach around the need to ensure the financial long-term sustainability of the CDRs by
–– giving presentations
–– writing publications
• Conduct surveys of funding guidelines from other European funding bodies (in addition to ERC) to find additional opportunities for publicising ELIXIR Deposition Databases.
• Continue our collaboration with the Global Coalition for data resource sustainability.

Deliverables
D4.1 A suite of outreach materials (visualisations, infographics) made available for public use by ELIXIR and its partners; collections of impact stories; publication of analyses of scientific use across Core Data Resources (Q3 of each year)
D4.2 Strategy document addressing the need for outreach message for financial long-term sustainability of the Core Data Resources, including methodology for targeting funders and publishers for promoting the ELIXIR Deposition Databases; lists of meetings attended, presentations given and outcomes, etc. (Y4)

Tools Platform

The Tools Platform aims to improve the discovery, quality, interoperability and sustainability of software resources to enable scientists to analyse life sciences data.

Platform organisation

Platform co-Leaders: Alfonso Valencia / Salvador Capella (ELIXIR-ES), Søren Brunak / Jon Ison (ELIXIR-DK)

Platform Coordinator: Jen Harrow (ELIXIR Hub)

Platform scope and ambition

Challenge
There is a growing need to support and enable researchers to find, use and integrate bioinformatics software resources in order to study biological phenomena from a broad range of data sources.

General objective
The ELIXIR Tools Platform will be the entry point for the emerging ELIXIR Communities by facilitating the migration of their tools, workflows and resources under the ELIXIR umbrella. Such migration can easily happen using the software containers infrastructure of the Tools Platform, the integration of reference datasets for benchmarking activities, and the registry of all of those bioinformatics resources. This foundation will facilitate: the use of the existing computational resources provided for running analytical workflows e.g. using Galaxy (ELIXIR Compute Platform); the adoption of interoperability standards and principles e.g. FAIR data (Interoperability Platform); the identification of relevant training material and activities, as well as the development of new ones where needed (Training Platform); and use of relevant ELIXIR Core Data Resource and Deposition Databases (Data Platform). This will be achieved by following the three main objectives of the Tools Platform: 1) help software users to find, compare, access, reuse and deploy benchmarked software tools, including workflows; 2) help software providers and developers to better describe, develop and monitor their productions; and 3) help establish and host continuous benchmarking efforts across a scientific community.



Figure 3: ELIXIR Tools Platform and its position within the ELIXIR structure.
[Figure not reproduced. Labels recoverable from the figure: An integrated environment for life science software driven by established and emerging ELIXIR Communities. Tools Platform components: Registry of basic information & identifiers (bio.tools); Packaging, containerisation & deployment (BioContainers); Performance benchmarking & technical monitoring (OpenEBench); Integration, execution & interoperability (e.g. Galaxy platform). Supporting layers: Information standards (bioToolsSchema, CWL, OpenAPI, Bioschemas, EDAM); Software development best practices; (Scientific) Community support & engagement; User experience. Connections: Crosslinks (TeSS, Identifiers.org, MyExperiment etc.); Tools interoperability; Training materials; Publications; ELIXIR Training & Interoperability Platforms; ELIXIR Data & Compute Platforms; Cloud services (EOSC); Data repositories including ELIXIR Core Data Resources.]

The Tools Platform in 2023
The Tools Platform will:
• Be the central hub for European life science software, providing a cohesive ecosystem of integrated resources to enable tool end-users, developers and communities to access, reuse and benchmark software tools and workflows.
• Maintain a registry – fully integrated with other ELIXIR resources – of high-quality, validated “canonical” descriptions of tools, enabling end-users worldwide to find, compare and connect appropriate tools, workflows and data resources, based on benchmarking and on the technical monitoring of data, publications and the user-driven filtering of metadata.
• Be the entry point for the deployment of scientific tools and workflows, with seamless provision via any computational resource (e.g. Galaxy and/or BioContainers); a global architecture for container deployment on ELIXIR will provide thousands of “ready to deploy” bioinformatics tools, containerized, interoperable and linked. Tools will also be usable for training as Open Educational Resources (OERs), integrated with tools and services in the Learning Management Systems (LMS). This effort will be strongly tied to the ELIXIR Compute Platform and will be coordinated with similar efforts in international initiatives, such as the GA4GH.
• Raise the quality and sustainability of software by enabling better research software development through information standards, best practices and training resources, with a special focus on helping developers to develop and describe their software tools, workflows and containers in a way that benefits end-users.
• Promote and support tool end-users, developers, providers and communities to engage with, contribute to and use the Tools Platform infrastructure in strong collaboration with Training and Capacity Building.

The work of the Tools Platform includes five technical groups that remain responsive to community suggestions and events:

Table 3: Tools Platform groups and their objectives

Group: Packaging, containerisation & deployment
Aim: To support community efforts on bioinformatics software deployment methods, in particular the Bioconda/BioContainers initiative, and to support the sustainable integration of these projects with bio.tools and OpenEBench, especially with respect to the benchmarking and monitoring of bioinformatics software.

Group: Performance benchmarking & technical monitoring
Aim: To deliver the OpenEBench service, an emerging infrastructure that provides services ranging from the hosting of community-driven scientific benchmarking activities to the technical monitoring of bioinformatics tools and services. Scientific benchmarking helps scientists to select the best-suited bioinformatics tool for a given scientific question; technical monitoring encompasses uptime, sustainability and software quality.

Group: Registry of basic information & identifiers
Aim: To deliver bio.tools, a discovery portal for bioinformatics software information. Bio.tools will provide a persistent reference: to high-quality (curated and verified) “canonical” descriptions of tools and data services, with links to the places they can be downloaded, deployed or run (Bioconda/BioContainers, Galaxy); to the results of performance benchmarking and technical monitoring (OpenEBench); to the scientific literature; and to training materials (TeSS). It will also interconnect training materials with tools and services.

Group: Integration, execution & interoperability
Aim: To drive the development of workflow execution platforms (notably Galaxy) and ensure their integration with bio.tools and workflow systems and standards, such as CWL and OpenAPI. This task will support the sustainable upkeep of tool metadata and the deployment of tools in all of these environments, and will provide specific training for users, developers and system administrators to encourage their adoption by the ELIXIR Platforms and Communities.

Group: Software development best practices
Aim: Raise the quality and sustainability of software by producing, adopting, promoting and measuring information standards and best practices applied to the life cycle stages of software development.

Tools Platform goals for 2019–2023
ELIXIR activities are being focused and organised around scientific communities. The Tools Platform will play an increasingly pivotal role, helping the communities identify, register and benchmark tools in order to access, analyse and integrate biological data, driving scientific discovery across the spectrum of the life sciences. We will focus on integration activities to develop components of the Tools Platform into a cohesive environment for end-users. Tools and data services will be connected: to the places where they can be downloaded, deployed or run; to results of performance benchmarking and technical monitoring; to the scientific literature; and to training materials, thus providing a more cohesive experience for end-users. The emphasis is on putting tools in the context of their scientific application, the common workflows in bioinformatics and practical deployment and execution solutions, notably container technologies and Galaxy. Technically, the core products of the Platform (bio.tools, BioContainers and OpenEBench) will be integrated and cross-linked to other ELIXIR resources, such as TeSS, in a sustainable way. We will maintain information standards for tools, and produce, adopt, promote and measure best practices for software development for the benefit of end-users, supporting them and developers, providers and communities to engage with, contribute to, and use the Tools Platform infrastructure.

Overarching task Scientific communities depend upon workflows for the flexible utilisation of life science tools that are needed to optimally convert data into knowledge. The Tools Platform has laid the foundation to describe, access, benchmark and deploy these resources, and now seeks to systematically develop and apply the infrastructure to the essential workflows of existing and emerging communities across the spectrum of life sciences. Working with the ELIXIR Communities, we will provide state-of-the-art benchmarked workflows available in Galaxy for each community, integrating data repositories, tools and training materials.



Task 1 Packaging, containerisation & deployment
We have, via the ongoing BioContainers Implementation Study, already provided 4,300 containers for open source software, which can be installed and executed in an isolated and controllable environment.37 We will maintain the BioContainers project, enhance the technical infrastructure and foster the community around it, putting it on a more sustainable footing, and increasing content by working closely with ELIXIR Communities and projects such as EOSC-Life.

Deliverable Provide containerised tools and state-of-the-art benchmarked workflows available in Galaxy for scientific communities, integrating data repositories, tools, and training materials, and ensuring all workflows and tools are curated to a high standard, rendered FAIR, and follow agreed standards with/by initiatives like GA4GH and/or EOSC to ensure long-term sustainability and impact. This depends upon a robust infrastructure for build and packaging (BioContainers), benchmarking (OpenEBench) and description/ discovery (bio.tools), which will be enhanced and maintained for the purpose.

Cross-platform Compute

Subtask 1.1 Maintenance, new functionality & community building
Enhance and maintain the build and packaging infrastructure to create BioContainers containers, while strengthening the BioContainers community to make it more sustainable.
• 1.1.1 Maintain & improve BioContainers infrastructure: The package and container build infrastructure will be extended, maintained and adapted to new technologies (e.g. new container formats, new specifications, and enhanced tests). Updating of packages and containers will be handled by an automatic bot as far as possible, followed by extensive testing and reviewing. The plan is to also build containers for conda-forge automatically, and to support other communities, such as big-data machine learning. All artefacts (tarballs, packages, containers and others) will be stored, mirrored and made accessible to everyone via HTTP. The build system will leverage existing efforts using Continuous Integration Systems, e.g. Travis, and will be supported by partners from the ELIXIR Compute Platform when needed.
• 1.1.2 Improving and extending BioContainers API for community workflows: The BioContainers API should provide more entry points for Tool Descriptors/tests and connections to workflows (task 1.1.3), increasing the integration with TRS (GA4GH) and similar initiatives, which might emerge from global initiatives such as EOSC. This effort includes releasing an API specification and reference implementation.
• 1.1.3 Fostering and evolving community participation: We will invest in community building by organising online hackathons, helping people to build upon BioContainers and Bioconda, and through community support and training. We will increase the implementation of best practices for container development in collaboration with other communities such as Dockstore, Bioconda and Singularity.
• 1.1.4 Containers for community workflows: Working with the ELIXIR Communities, we will make containerised tools and state-of-the-art benchmarked workflows available in Galaxy and CWL, integrating data repositories, tools and training materials, and ensuring all workflows and tools are curated to a high standard and rendered FAIR. Every community will be presented with at least one workflow on usegalaxy.eu that is well described in a training, benchmarked in OpenEBench, runs completely with support of BioContainers and is tested automatically and regularly on usegalaxy.eu.
• 1.1.5 Design and implement an initial pilot for federated software container images: A specific mechanism for having a local BioContainers-like version, which might be private and/or public and contains (at least partially) the software container images referenced in that local implementation.
• 1.1.6 Monitor the use of software containers: Propose mechanisms and implement a pilot to track how software containers are used across the community and which platforms are represented, e.g. academic, private, industrial, or Galaxy. This will contribute to a better picture of the software containers that are being used across and beyond the ELIXIR ecosystem. This data will also be accessible via an API.

Milestones
M1.1. Monitor the use of software containers (Y2)
M1.2. Fostering and evolving community participation (Y1-5)

Deliverables
D1.1. Maintain & improve BioContainers infrastructure (Y1-5)
D1.2. Improving, extending BioContainers API for community workflows (Y1-2)
D1.3. Delivering containers for the community workflows (Y1-2)
D1.4. Design and implement an initial pilot for federated software container images (Y2)

Task 2 Performance benchmarking & technical monitoring
OpenEBench has been designed to host Community-led scientific benchmarking efforts of bioinformatics software resources, including workflows. Moreover, OpenEBench technically monitors the performance of those software resources. One of the main goals of OpenEBench is to become an observatory of bioinformatics software quality regarding openness. OpenEBench is strongly committed to promoting the adoption of all guidelines and technologies promoted by ELIXIR, including but not limited to ELIXIR AAI, software development and containerization best practices, FAIR data principles for reference datasets, as well as the principles of Open Data, Open Source and Open Science. The long-term objective of OpenEBench is to become an integral part of the ELIXIR Tools Platform ecosystem, providing a central hub for researchers, developers, communities and funding agencies to find, use and deploy the best bioinformatics resources for their research questions.

Deliverable
Provide a stable infrastructure (OpenEBench) to support community-led benchmarking activities and to technically monitor registered services, servers and tools at various ELIXIR infrastructures like bio.tools, Bioconda/BioContainers and Galaxy. OpenEBench will adjust its support according to the maturity of the scientific community, from the well-established to the newly formed ones, by offering a three-level entry point to the platform. OpenEBench will provide guidelines for communities, especially the existing and emerging ELIXIR ones, to easily join the platform and the whole ELIXIR ecosystem.

Cross-platform
ELIXIR Communities, Galaxy, Compute, Training, Data, Interoperability

Subtask 2.1 The core benchmarking service
Establish the core ELIXIR Service for benchmarking to lower the benchmarking start-up hurdle, provide basic tests of tool operability, and alleviate reimplementation of abstractable workflows.
• 2.1.1 Execution environment for benchmarking workflows: Provide an environment for tool and workflow execution to allow a fair technical comparison and to guarantee data provenance from input to output. This will be done in strong collaboration with the ELIXIR Compute Platform to leverage cloud resources provided by Nodes, and with the ELIXIR Interoperability Platform to guarantee that data follows the FAIR principles and uses the most appropriate standards.
• 2.1.2 Automated data import: Provide stable mechanisms, e.g. APIs, for the import of scientific benchmarking data, including reference datasets from existing efforts. Tool test sets and/or the results of running such tests can be gathered from community-driven initiatives like Galaxy.
• 2.1.3 Scientific benchmarking test case: Collaborate with established scientific communities devoted to benchmarking of tools, web-servers and/or workflows (e.g. CAFA and CAMEO), by integrating the results of 2.1 and 2.2 in the benchmarking service to demonstrate its feasibility to a broader audience.

37. http://biocontainers.pro/

Subtask 2.2 Curation & rendering of benchmarking data
Curate and render information about the technical and/or scientific performance of tools and workflows, and share that information with relevant ELIXIR resources (e.g. bio.tools, BioContainers, Training Platform and Capacity Building) and beyond, as a mechanism for engaging and increasing visibility.
• 2.2.1 Semi-automated Curation: Exploit scientific publication data to identify 1) technically monitored tools that are not registered in bio.tools; 2) highly used but un-benchmarked workflows; and 3) highly used tools without containers and/or training materials. Provide links from benchmarked workflows to their software containers (BioContainers), their availability on ELIXIR Cloud services (e.g. Galaxy) and training materials (TeSS).
• 2.2.2 Extended Widgets Gallery: Provide a set of widgets that can be deployed anywhere to render scientific benchmarking and/or technical monitoring information. Increase the number of available widgets as new information becomes available and/or new needs are identified by developers and/or end-users.
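The gap detection described in 2.2.1 can be illustrated with a minimal sketch. It assumes a hypothetical, already-merged record per tool (the `ToolRecord` fields, the tool names and the citation threshold are illustrative and not part of OpenEBench or bio.tools), and it covers only cases 1) and 3) of the subtask.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolRecord:
    """Hypothetical merged view of one tool; field names are assumptions."""
    name: str
    citations: int       # usage proxy mined from publication data
    monitored: bool      # technically monitored by the benchmarking service
    in_biotools: bool    # has a registry entry
    has_container: bool
    has_training: bool


def curation_gaps(records, min_citations=100):
    """Return tools needing curator attention, grouped by kind of gap."""
    # Case 1: monitored tools that are missing from the registry.
    unregistered = sorted(
        r.name for r in records if r.monitored and not r.in_biotools
    )
    # Case 3: highly used tools lacking a container and/or training material.
    missing_support = sorted(
        r.name for r in records
        if r.citations >= min_citations
        and not (r.has_container and r.has_training)
    )
    return {"unregistered": unregistered, "missing_support": missing_support}


records = [
    ToolRecord("tool-a", 250, True, False, True, True),
    ToolRecord("tool-b", 400, True, True, False, True),
    ToolRecord("tool-c", 12, False, True, False, False),
]
gaps = curation_gaps(records)
print(gaps)
```

The output is a worklist for human curators rather than an automatic fix, which matches the "semi-automated" framing of the subtask.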

Milestones
M2.1. Produce a community benchmarking test case (Y2)

Deliverables
D2.1. Deliver an environment for tool and workflow execution (Y2)
D2.2. Deliver an automated data import (Y1)

Task 3 Registry of bioinformatic tools metadata and identifiers
bio.tools provides persistent identifiers for 10,000+ verified tool descriptions and has attracted 20 new contributors per month over the last year. We will implement a raft of developments: 1) to ensure the curation effort, including integration with other projects, is much more sustainable; 2) to engage with scientific experts including ELIXIR Communities and ELIXIR Nodes to improve the scientific quality and national representation; and 3) to deliver the content on the Semantic Web.

Deliverable
Establish and support a network of thematic editors for scientific areas and Nodes, to oversee and drive improvements of the scientific quality of bio.tools and EDAM, ensuring the interests of scientific communities (including existing and emerging ELIXIR Communities) and ELIXIR Nodes are adequately represented. This will be delivered via a combination of best-practice guidelines, extension of the bio.tools studentship scheme, community-led workshops, and adherence to the bio.tools Tool Information Standards.

Subtask 3.1 Sustainable curation
Provide richer and more consistent tool metadata, ensuring bio.tools is comprehensive and up-to-date. Enable better and easier scientific annotation (EDAM) of tools, workflows and data services, and make curation more flexible and sustainable.
• 3.1.1 Entry ownership: systematically promote the adoption and maintenance of bio.tools entries by their rightful owners via a seamless email-based mechanism, and drive improvements according to the tool information standard38
• 3.1.2 Curation tooling: further develop utilities that assist manual curation (e.g. edamBrowser,39 edamToolAnnotator,40 and edamMap41), integrating these with bio.tools to provide a consistent framework for efficient curation
• 3.1.3 Registration mechanisms: implement stable, automated import mechanisms where tool metadata is already (partially) maintained elsewhere (e.g. BioConductor, BioJS). Support pulling metadata from repositories such as GitHub or from marked-up web pages
• 3.1.4 Cross-linking: implement stable mechanisms for cross-linking bio.tools and other ELIXIR resources (e.g. BioContainers, TeSS, and OpenEBench) via bio.tools tool identifiers or EDAM annotations as applicable

Cross-platform

• 3.2.2 Curation: Improve coverage and scientific quality of EDAM & tool descriptions in thematic areas, via studentships and workshops overseen by thematic editors. This will include high-priority, community-specific tools and workflows, and production of common workflow diagrams for purposes of training (TeSS) and of browsing bio.tools.

D3.3. Establishing a network of community thematic editors, (Y2-4)

• 3.2.3 ELIXIR Node editors: Establish a network of editors for each Node (by default Technical Coordinators) to represent the national interests in software. Drive the improvement of ELIXIR Node software and databases in bio.tools as per the tool information standards, and publish an article promoting these national offerings.

Integration, execution & interoperability

Cross-platform ELIXIR Communities, ELIXIR Nodes

Subtask 3.3 bio.tools triple store & Linked Data
We will serve the entire bio.tools content as a triple store and in new serialisation formats to enable novel Semantic Web / Linked Data applications.
• 3.3.1 bio.tools triple store & serialisations: Provide a triple store (RDF/XML) of the bio.tools data and appropriate serialisations (e.g. JSON-LD); this will entail encapsulating controlled vocabularies from biotoolsSchema in a formalised ontology

Subtask 3.2

• 3.3.2 Novel visualisations and query: explore powerful new visualisation, query and integration capabilities, using the triple store, that are currently not convenient or even possible, in collaboration with bio.tools end-users
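To make the triple store and JSON-LD serialisation of Subtask 3.3 more concrete, here is a minimal sketch using only the Python standard library. Everything specific in it is an assumption for illustration: the entry (`example-tool`), the choice of schema.org terms, the EDAM topic identifier and the mapping itself are not the formalised biotoolsSchema ontology the subtask refers to.

```python
import json

# Prefixes used for compact IRIs in this sketch (an illustrative choice).
CONTEXT = {
    "schema": "http://schema.org/",
    "edam": "http://edamontology.org/",
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def to_jsonld(entry):
    """Serialise a registry entry (plain dict) as a JSON-LD document."""
    return {
        "@context": CONTEXT,
        "@id": f"https://bio.tools/{entry['biotoolsID']}",
        "@type": "schema:SoftwareApplication",
        "schema:name": entry["name"],
        "schema:about": [{"@id": f"edam:{t}"} for t in entry["topics"]],
    }


def expand(term):
    """Expand a compact IRI such as 'schema:name' using CONTEXT."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in CONTEXT:
        return CONTEXT[prefix] + local
    return term  # already absolute, or a plain literal


def to_triples(doc):
    """Flatten the one-level JSON-LD document into (s, p, o) triples."""
    subject = doc["@id"]
    triples = [(subject, RDF_TYPE, expand(doc["@type"]))]
    for key, value in doc.items():
        if key.startswith("@"):
            continue
        for item in value if isinstance(value, list) else [value]:
            obj = expand(item["@id"]) if isinstance(item, dict) else item
            triples.append((subject, expand(key), obj))
    return triples


entry = {"biotoolsID": "example-tool", "name": "Example Tool",
         "topics": ["topic_0080"]}
doc = to_jsonld(entry)
triples = to_triples(doc)
print(json.dumps(doc, indent=2))
```

A production service would rely on an RDF library and a real triple store; the point here is only that one registry entry maps naturally onto a handful of subject-predicate-object statements.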

Cross-platform

Thematic editors

ELIXIR Communities, ELIXIR Nodes, Data, Training

Milestones

Establish a network of thematic editors42 for scientific areas and Nodes, to oversee improvements in the scientific quality of EDAM and bio.tools, and ensure national interests are adequately represented. This will represent existing and emerging ELIXIR Communities, e.g. proteomics, metabolomics, structural biology and marine metagenomics.

38. https://github.com/bio-tools/biotoolsSchemaDocs/blob/master/information_standard.rst
39. https://github.com/IFB-ElixirFr/edam-browser

ELIXIR Communities, ELIXIR Nodes

• 3.2.1 Thematic editors network & guidelines: Establish a network of thematic editors in key scientific areas, with a lead editor to provide oversight. Develop and publish practical guidelines and recipes for use by thematic editors around scientific community liaison, EDAM ontology development, bio.tools curation and registration.

M3.1. Publish guidance for thematic editors to contribute to the Tools Registry infrastructure, (Y1)

Deliverables
D3.1. Develop tools to assist manual curation within bio.tools (Y1)
D3.2. Implement automated import methods to obtain tool metadata (Y3)

43. Doppelt-Azeroual O, Mareuil F, Deveaud F, Kalaš M, Soranzo N, van den Beek M, Grüning B, Ison J, Ménager H. ReGaTE: Registration of Galaxy Tools in ELIXIR, GigaScience, 2017, 6(6), https://doi.org/10.1093/gigascience/gix022
44. Ménager H, Kalaš M, Rapacki K et al. Using registries to integrate bioinformatics tools and services into workbench environments. Int J Softw Tools Technol Transfer 2016, 18: 581, https://doi.org/10.1007/s10009-015-0392-z

40. https://github.com/bio-tools/edamToolAnnotator

45. Willighagen E, Mélius J. Automatic OpenAPI to Bio.tools Conversion. bioRxiv 2017, 170274, https://doi.org/10.1101/170274

41. https://github.com/edamontology/edammap

46. http://planemo.readthedocs.io/en/latest/

42. http://biotools.readthedocs.io/en/latest/editors_guide.html

47. e.g. https://github.com/common-workflow-language/workflows/


Cross-platform Interoperability

Task 4

During EXCELERATE we developed and published the technical foundation to add links to bio.tools to demonstrate which tools may be run in Galaxy43,44 and which APIs are available for biological databases:45 we will now apply these methods to provide a major and sustainable boost to the usefulness of bio.tools. This task is currently unfunded in the Tools Platform 5-year funding plan, but resources may be supplied via a Community-led Implementation Study.

Deliverable Integrate bio.tools with systems for tool execution, including: (1) published Galaxy tools for users and (2) Galaxy and CWL “tool descriptions” for platform maintainers, via new sustainable mechanisms for the annotation of Galaxy and CWL tools with bio.tools identifiers ensuring coverage/annotations in bio.tools.

Subtask 4.1 Integration of bio.tools with tool/workflow execution environments
Integrate bio.tools with systems for tool execution including (1) published Galaxy tools for users and (2) Galaxy and CWL “tool descriptions” for platform maintainers. To do so, we will provide sustainable mechanisms for the annotation of Galaxy and CWL tools with bio.tools identifiers and coverage/annotation in bio.tools.
• 4.1.1 Coverage of Galaxy & CWL-enabled tools: Ensure coverage in bio.tools of all online services available in public Galaxy instances and major sources of CWL-enabled tools: a bio.tools entry (unique tool) will link to a CWL file and all the available places where that tool can be run
• 4.1.2 Crosslinking: Develop Galaxy & CWL formats, tooling (e.g. planemo46), documentation and housekeeping mechanisms to enable community bio.tools ID annotations, adding bio.tools IDs to existing best-practice Galaxy tools (Galaxy-IUC provided) and community-provided CWL tools47



• 4.1.3 Maintenance: Provide a sustainable monitoring and upkeep mechanism (based on further development of the ReGaTE48 utility) to ensure coverage and that crosslinks are maintained (e.g. for new services, service updates and deletion of old ones)

Cross-platform Galaxy Community, Interoperability

Subtask 4.2 Promotion of OpenAPI-enabled APIs for access to biological databases
Integrate bio.tools with OpenAPI services, through semantic enrichment and registration of exemplar APIs.
• 4.2.1 Extension of OpenAPI specification: we will extend the OpenAPI specification to enable semantic annotation (for EDAM, but generically)
• 4.2.2 API registration: Annotate available OpenAPI-enabled APIs with EDAM, and further develop tooling49 for OpenAPI-to-bio.tools conversion, applying this to register the exemplar APIs
• 4.2.3 Promotion: Publish and promote the adoption of OpenAPI in collaboration with EMBL-EBI and other major data providers
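One way the semantic annotation of 4.2.1 and the conversion of 4.2.2 could fit together is sketched below. The OpenAPI specification allows vendor extension fields whose names start with `x-`; the field name `x-edam-annotations`, the example API and the shape of the registry entry are all assumptions made for illustration and do not describe the actual extension or the swagger2BioTools tooling.

```python
# Hypothetical extension field; only the "x-" prefix is standard OpenAPI.
EXTENSION_FIELD = "x-edam-annotations"


def annotate(spec, path, method, edam_uri):
    """Attach an EDAM concept URI to one operation of an OpenAPI document."""
    operation = spec["paths"][path][method]
    annotations = operation.setdefault(EXTENSION_FIELD, [])
    if edam_uri not in annotations:
        annotations.append(edam_uri)
    return spec


def to_registry_entry(spec):
    """Derive a minimal registry-style record from an annotated document."""
    concepts = set()
    for path_item in spec["paths"].values():
        for operation in path_item.values():
            if isinstance(operation, dict):  # skip path-level lists
                concepts.update(operation.get(EXTENSION_FIELD, []))
    return {
        "name": spec["info"]["title"],
        "version": spec["info"]["version"],
        "edam": sorted(concepts),
    }


spec = {
    "openapi": "3.0.0",
    "info": {"title": "Example Sequence API", "version": "1.0"},
    "paths": {
        "/sequences/{id}": {
            "get": {"responses": {"200": {"description": "A sequence"}}},
        },
    },
}
# The EDAM identifier below is used purely as an illustrative value.
annotate(spec, "/sequences/{id}", "get", "http://edamontology.org/operation_2422")
registry_entry = to_registry_entry(spec)
print(registry_entry)
```

Keeping the annotation inside the API description itself means the registry record can be regenerated whenever the provider updates the API, rather than being curated by hand.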

Cross-platform Data, Interoperability

Other important strategic areas for the Tools Platform
These strategic tasks are currently unfunded in the Tools Platform but are still regarded as important. Resources may therefore be sought via Community-led Implementation Studies or other external funding.

Information standards The EDAM ontology provides the semantics essential for consistent scientific description of workflows and their components. biotoolsSchema provides a rigorous syntax and validation for software metadata. The two are combined within a practical information standard yielding useful metrics for life science software. We will maintain EDAM and support its use by communities, developing further biotoolsSchema and the information standard with community requested features.

Software development best practices We have published four simple recommendations to encourage best practices in research software.50 We will now adopt, promote, and recognise these practices, by developing comprehensive guidelines for software curation, and through workshops for training best practices, and improving the usability of the Tools Platform products.

Scientific community support & engagement
The Tools Platform is underpinned by a vibrant foundation of technical products and expertise. We will reach out to all end-users, especially the ELIXIR Communities, to ensure they are in the driving seat of the Platform.

Proposed events
• 1 Thematic Editors/EDAM joint workshop per theme per year
• 1 UX/usability workshop per year
• 1 best practice (training materials) workshop per year, joint with the Training Platform or the Carpentries
• 1 Tools Platform and end-users meeting per year

48. Doppelt-Azeroual O, Mareuil F, Deveaud F, Kalaš M, Soranzo N, van den Beek M, Grüning B, Ison J, Ménager H. ReGaTE: Registration of Galaxy Tools in ELIXIR, GigaScience, 2017, 6(6), https://doi.org/10.1093/gigascience/gix022
49. https://github.com/BiGCAT-UM/swagger2BioTools
50. Jiménez RC, Kuzak M, Alhamdoosh M et al. Four simple recommendations to encourage best practices in research software. F1000Research 2017, 6:876, https://doi.org/10.12688/f1000research.11407.1


Compute Platform

The Compute Platform aims to build, deploy and provision cloud, compute, containers, storage and access services for the life science research community.

Modern biology requires computational access to data and versatile scientific software tool environments. The ELIXIR Compute Platform (ECP) brings together key expertise, resources and resource managers in ELIXIR. It provides a process to implement common technical standards for life science in the European research infrastructure ecosystem. Technical integration driven by the ELIXIR Compute Platform ensures that life scientists accessing the distributed ELIXIR Nodes can analyse permitted data through a sustained reliance on services and protocols, e.g. via a single sign-on to the compute and data resources that underpin bioinformatics tools running in the environments needed by scientific communities.

Platform organisation
Platform co-Leaders
Luděk Matyska (ELIXIR-CZ), Steven Newhouse (EMBL-EBI), Tommi Nyrönen (ELIXIR-FI)

Platform Coordinator
Jonathan Tedds (ELIXIR Hub)

The ELIXIR Compute Platform (ECP) technical integration and design is driven by the scientific communities’ technical use cases, which provide proof that the technologies work for science as intended. New technical use cases are chosen in discussion with the ELIXIR Communities and will be integrated as part of the ECP technical architecture using existing ELIXIR Node resources, which are made available, for example, through Implementation Studies. The ECP work in the ELIXIR Programme is organised into teams around four tasks, in which experts from the ELIXIR Nodes join forces. Each team will have co-leads to help coordinate tasks. These teams engage together in projects (e.g. Implementation Studies and other ELIXIR EU project activities), representing the ECP as a whole. Expertise within Nodes is connected and coordinated by the ECP via regular meetings, and the teams can consult with the ECP community at any time. Where possible the work is carried out in collaboration with national e-Infrastructures and with other trusted service providers.

Platform scope and ambition
Challenge
Currently, thousands of scientific laboratories across the world generate massive amounts of data, analytic tools and compute environments. In this situation, the traditional method of downloading and analysing data locally is no longer viable. Data and compute resources thus need to be managed as a federation, where providers work as a single, interconnected infrastructure. The ELIXIR Compute Platform infrastructure enables life scientists to access, share, and analyse data using standardised compute resources provided by distributed ELIXIR Nodes.

General objective
To enable researchers to combine technical components of the ELIXIR Compute Platform services into a seamless ecosystem, thereby creating a science-ready, standardised interface to the key resources and technological capabilities that are available for life sciences. In addition, ELIXIR services, such as software tools and data access, are supported by ECP Compute services (middleware and other lower-level infrastructure services). ECP coordinates the dependencies between these components for the communities, with a focus on service development, service production and training.
• ELIXIR TECHNICAL EXPERTS: Our group of experts in AAI, cloud platforms, storage, data transfers, and containers from ELIXIR Nodes will meet in biannual ECP F2F meetings and in weekly/bi-weekly teleconferences to ensure Compute Platform coordination.
• SERVICE DEVELOPMENT: ECP uses projects (e.g. EU projects) to fund service development (i.e. cloud, AAI), extended with technical implementation studies when needed (e.g. quick actions to create new functionalities).
• SERVICE PRODUCTION: We make a clear division between development-level and production-level services. ELIXIR services that rely on prototype/development services are not planned and will not enter production with routine delivery to end-user researchers. To drive overall infrastructure stability, we will sustain Compute services using ELIXIR Commissioned Services within a federated service management framework.
• TRAINING: This comprises both training in the access and use of standardised compute environments and resources, and training in the provision of such compute environments.

Strategy for building relations with other ELIXIR Platforms and Communities
ECP experts engage in technical work related to the four tasks (see Table 4) to empower other ELIXIR Platforms and ELIXIR Communities. Successful collaboration requires a technical lead contact to be named by both the ECP and the relying party side. The availability of an ECP expert to work in a collaboration depends on the individual Node’s priorities and funding.

ELIXIR Infrastructure Services operated by the ECP partners will provide a point of contact for support requests from relying parties (e.g. ELIXIR AAI). These might run independently for an interim period (2019–2020), but will actively seek to integrate with a more general ELIXIR single point of contact or helpdesk. This will integrate with the wider ELIXIR Communities support, as it is developed and becomes available.

The Platform work is focussed on four key technical groups, listed in the table below, with a fifth stream of work embedded throughout that is dedicated to providing training in the use and provision of cloud environments and associated ECP resources:

Table 4: Compute Platform groups and their objectives

Group | Aim
Identity and access management | Provide a centralised and sustained user identity and access management infrastructure (Life Science AAI, LS AAI). » All Platforms & Communities
Data transfer and availability | Coordinate the easy movement and synchronisation of large public and sensitive datasets across ELIXIR clouds and other e-Infrastructures. » Data Platform, Plant & Human Data Communities
ELIXIR hybrid cloud | Coordinate technical, operational and funding aspects of cloud, data and compute services across Europe for the ELIXIR and larger life science community within a seamless hybrid cloud ecosystem. » All Platforms & Communities
Container orchestration | Coordinate the provision and operation of a GA4GH-compatible container platform to allow the execution of containerised software workflow loads, supporting public and sensitive data. » Tools and Training Platforms & Human Data Communities

The Compute Platform in 2023 will
1. Deploy ELIXIR-wide Identity and access management (Task 1) through the ELIXIR AAI Service that will underpin the LifeScience AAI (to be deployed across European research infrastructures and beyond):
   • Defining and continuously upgrading the access and user management system for the life science community (the LS AAI)
   • Implementing AAI services compatible with the EOSC and interoperable with standards supported by the e-infrastructures
   • Providing access control to sensitive resources (data, compute, tools)
   • Operating the LifeScience AAI
   • Coordinating training for users, service and resource providers
2. Host and provide standardised compute access to important public reference (Task 2a) and sensitive data sets (Task 2b) in relevant (cloud) providers such that:
   • Datasets can be accessed locally via consistent, versatile interfaces (e.g. S3 or POSIX)
   • Local copies are updated as new versions are released
   • Users and cloud managers are able to manage their own data set subscriptions
   • Access control mechanisms are secure but easy to use
   • Network connectivity (bandwidth and security) is assured through software-defined networking
3. Implement a hybrid cloud ecosystem interoperable with key resources that is accessible to researchers (Task 3), spanning:
   • Local, private clouds (e.g. EMBL-EBI Embassy, CyVerse UK)
   • National community clouds (e.g. cPouta)
   • European research and innovation oriented clouds (e.g. EOSC)
   • Public/commercial compliant clouds (e.g. Google, Azure, AWS)
4. Enable containers to be deployed and operated at scale and across cloud systems in standardised formats (Task 4) which:
   • Comprise scientific software and workflows from the ELIXIR Tools Platform ecosystem, e.g. bio.tools, BioContainers, and OpenEBench
   • Comprise data analysis platforms to support researchers (e.g. Spark)
   • Connect with existing relevant ELIXIR Communities, e.g. Galaxy
   • Provide an easy-to-use and transparent way to automate service deployment, orchestration, and configuration supporting the ELIXIR use cases
5. Coordinate the support and provision of standardised cloud and compute environments for the training of providers, technologists, developers, and researchers across Europe
6. Design KPIs and monitor impact across the Tasks to measure success, to include: service usage, number of deployments, community engagement metrics, citations, use of cloud and compute-specific metrics including Help Desk and responsiveness, and updates to services.

The Compute Platform is organised around four key tasks where experts from Nodes can join forces. Embedded in each will be a commitment to working across Platforms and Communities to enable access and training in advanced compute facilities, environments, tools, and datasets. Each task builds on previous tasks, with the goal of a commissioned ELIXIR AAI Service under Task 1 which then progressively enables access to and transfer of key life science datasets. Users can then authenticate with an interoperable ELIXIR hybrid cloud ecosystem developed through Task 3, in which they can deploy container environments (Task 4) for generic and self-provided tools for management, sharing, and analysis.

Task 1 Identity and access management, including AAI
ELIXIR will lead collaborations across the BMS RIs to provide a convergent identity and access management system. Such a system enables access to data, clouds, applications and workflows used by the BMS community. This then enables the necessary setting and control of access rights for all users through their digital identities associated with ELIXIR and other BMS RIs. Support for secure access to sensitive data, with its specific requirements, is a high priority. To achieve this goal, ELIXIR will contribute to/lead the implementation of the federated life science authentication and authorization infrastructure (LS AAI) and the access proposal/control system supporting access. This will be achieved through enabling different user entry points, managing the user life cycle, and controlling and providing fine-grained access control to the resources. The system will be fully operational and will provide the necessary infrastructure to which all the ELIXIR and BMS services will be connected. In this way, we will allow users to access and share data and resources across a wide range of resources provided by and through the BMS community and their RIs. ELIXIR will support management and operation of such an access and user management system and its components.


• E-infrastructures fail to deliver, deliver a poor service or do not listen to or understand the needs coming from the BMS RIs.

Subtask 1.3

Task 2

Training in AAI

Timing

Outcome

Making datasets available in relevant cloud providers

In collaboration with the Training Platform, this Task will provide experts to run the training for the whole pipeline for life science RIs resource providers, developers and users.

• 2020: Transition from ELIXIR AAI to the more complex LS AAI • Explicit Involvement of e-infrastructures in 2019 as part of the EOSC-Life project • Continuous service delivery through the whole timeframe of the Programme • Gradual updates of the AA infrastructure (once or twice per year since 2020)

• ELIXIR (and LS) researchers know the LS AAI and are able to use it properly to access all LS data and compute resources. • Developers are aware of LS AAI APIs and can configure services as appropriate to make them interoperable with their own RIs and products. • Resource providers are able to make their resources available through LS AAI in a controlled way (including access to sensitive data).

Subtask 1.1

Subtask 1.2

Description

Define, operate and continuously upgrade the access and user management system to meet the requirements as described

Access to sensitive data

A set of training events dedicated to different target groups (primary researchers, developers and resource owners/providers/implementers) will be organized through the timeframe of the Programme.

ELIXIR will work in partnership with other BMS RI on sustainability planning to ensure long-term operation and thus to provide guarantees for the depending services. The whole access and user management system will be made fully interoperable with the EOSC ecosystem and with the EOSC AAI framework as we help define and implement it.

Outcome An up to date access and user identity management system (the Life Science AAI)

Description This work will determine the appropriate point when the move (upgrade) from the current ELIXIR AAI can be decided, and moved into the wider Life Science AAI, such that the required components can be operated by the e-infrastructures. When the decision to upgrade is made, the transition will be carried out based on the outcomes of the current LS AAI pilot run within the AARC2 project. The selection of involved e-infrastructures is planned as a part of the EOSC-Life project.

Risk & opportunities • The selection of involved e-infrastructures fails: In this case ELIXIR will work with the BMS RIs to deploy the current ELIXIR AAI into LS AAI. • Relying parties (SPs, users) refuse to trust a service operated by e-Infrastructures. • Relying parties (SPs) find it cumbersome to move from ELIXIR AAI to LS AAI. This can be mitigated by allocating enough ELIXIR and e-Infra human data work to smooth the transition. • Challenges of GDPR compliance, defining the data controller and data processor role. • EOSC interoperability, which is dependent on the development of the wider LS AAI and EOSC AAI. • Opportunity to continue as the representative of the LS community needs with respect to the AAI infrastructure. • BMS RIs fail to agree on the governance and sustainability model for the LS AAI. 66

ELIXIR Scientific Programme 2019–23

Outcome LS AAI provides all the features needed for access to sensitive and human data: strong identity vetting; identity freshness checks; second factor authentication; and well-established electronic tools for managing researchers’ access to datasets incorporating several levels of granularity.

Description Adding and strengthening features and components that are needed to provide strong authentication and trusted authorization and the access control needed to control access to sensitive data. Implementing the resultant services for a range of Human Data requirements and use cases.

Risk & opportunities • Reluctance of service/data providers with sensitive data to use the LS AAI. • Enhanced security AAI features are not accepted by the community (researchers). • Conflicting access and security requirements with other Human Data Communities and their existing solutions. • Co-operation with GA4GH data access models.

Timing

Risk & opportunities

Many bioinformatics analysis activities are dependent on reference data sets to undertake their work. Transferring the data sets on demand will delay the start of any analysis activity, as moving large data set does not happen instantly. Instead, pre-positioning of relevant data sets on popular cloud resources means that they are already available when they are needed. This involves two tasks: • Provisioning of ELIXIR-wide federated namespace, where storage locations of important biological datasets are available for transfer, backed by ELIXIR Compute Platform providers and authn/authZ by AAI. • “Site-2-Site” Data Transfer services for data access to manage the distribution of these data sets to the ELIXIR Nodes and other cloud and compute resources that life scientists can access. • “Site-2-User” Data Transfer services for data access to desktop devices.

• Low interest in the provided training • Insufficient human resource available to provide the training

Subtask 2.1

Timing • Targeted training events prioritised, e.g. for implementers and topics such as AAI AuthZ. • Online training materials, policy templates available starting from 2020.

Provisioning of federated storage namespace Outcome Provide a location-independent mechanism to identify data, which can then be resolved to the location(s) of the data in the physical infrastructure. This will allow a researcher to find where a specific data set is located and to decide whether they can move their workload to the data or whether a Site-2-Site data transfer is needed before starting computations.

Description The same data set will inevitably exist in multiple locations within ELIXIR.
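As an illustration only, the location-independent lookup described for this subtask could be sketched as follows. The class names, identifier format, site labels and URLs are all hypothetical and are not taken from any ELIXIR service; the sketch shows only the idea of resolving one identifier to several replicas and then choosing between moving the workload and moving the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str  # hypothetical site label, e.g. an ELIXIR Node
    url: str   # physical location of this copy


class FederatedNamespace:
    """Maps location-independent dataset identifiers to known replicas."""

    def __init__(self):
        self._replicas = {}  # dataset_id -> list of Replica

    def register(self, dataset_id, replica):
        self._replicas.setdefault(dataset_id, []).append(replica)

    def resolve(self, dataset_id):
        # Return a copy so callers cannot mutate the registry.
        return list(self._replicas.get(dataset_id, []))

    def plan(self, dataset_id, compute_site):
        """Decide whether the workload can run next to the data."""
        replicas = self.resolve(dataset_id)
        if not replicas:
            return "unknown dataset"
        if any(r.site == compute_site for r in replicas):
            return "run in place"
        # No local copy: a Site-2-Site transfer would be needed first.
        return f"transfer from {replicas[0].site}"


ns = FederatedNamespace()
ns.register("refdata:genome-x", Replica("node-a", "https://a.example.org/genome-x"))
print(ns.plan("refdata:genome-x", "node-a"))
print(ns.plan("refdata:genome-x", "node-b"))
```

A production namespace would also need authorisation checks and replica selection by cost or proximity, which this sketch leaves out.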

Risk & opportunities Only a few Nodes have processes and resources for international compute service access.

• The first phase will be accomplished in 2019/2020 and will provide all the basic components. • During the second phase (2021 to 2022), feedback from the first LS AAI deployment for sensitive data environments will be collected and improvements and additions will be identified and implemented.



Subtask 2.2

Subtask 2.3

Task 3

Subtask 3.1

Site-2-Site Data Transfers

Site-2-User Data Transfer

Outcome

Outcome

Defining and coordinating an ELIXIR hybrid cloud ecosystem

Definition and evolution of ELIXIR hybrid cloud ecosystem

Public and sensitive reference data sets are available on the cloud sites that have subscribed to receive them. Sensitive data sets are then made available locally for authorised users to access the data.

Public data sets can be downloaded from their source to a researcher’s desktop environment.

e-Infrastructures are becoming more flexible via virtualisation. This allows contemporary biological research projects with large processing and storage requirements to leverage their capacities. The technical challenges of large-scale cloud development for research will be addressed within the EOSC umbrella, where the ELIXIR Compute Platform, while already involved, will be a participant only. However, there are non-technical challenges of the cloud infrastructure that should be the primary focus of the ELIXIR Compute Platform: what is actually needed across the ELIXIR Nodes; how cloud provisioning and resource access (and allocation) should be organised; who will provide the cloud resources; and how the providers should be reimbursed/sustained. The acute problem of cloud resource allocation will be investigated, including the proposal to establish a Resource Allocation Committee (RAC) in the case of commissioned central ELIXIR resources.

Outcome

Description The Reference Data Set Distribution Service (RDSDS) for site-to-site transfer will allow sites to subscribe to specific public data sets, which are then provisioned onto their cloud resources at a specified location and refreshed when a new version of a given data set is made available. Sensitive data sets will be made available through a secure cloud environment in which the data set is hosted securely in the remote cloud and remains encrypted in situ. The rights to access the sensitive data are verified each time the user accesses the data. An ELIXIR webinar in May 2018 introduced the proposed technologies for secure data transfer between two ELIXIR Nodes.
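The subscription model described above can be pictured with a small sketch. This is not the RDSDS interface: the class, method names, dataset labels and paths are assumptions made for illustration. It shows only the bookkeeping needed so that a newly published version triggers a transfer to every subscribed site that does not yet hold it.

```python
class DistributionService:
    """Toy subscription registry for reference data sets (hypothetical API)."""

    def __init__(self):
        self.subscribers = {}  # dataset -> set of (site, target_path)
        self.latest = {}       # dataset -> most recent published version
        self.delivered = {}    # (dataset, site) -> version held by that site

    def subscribe(self, dataset, site, target_path):
        self.subscribers.setdefault(dataset, set()).add((site, target_path))

    def publish(self, dataset, version):
        self.latest[dataset] = version

    def mark_delivered(self, dataset, site, version):
        self.delivered[(dataset, site)] = version

    def pending_transfers(self):
        """List (dataset, version, site, path) for sites that lag behind."""
        jobs = []
        for dataset, version in self.latest.items():
            for site, path in sorted(self.subscribers.get(dataset, ())):
                if self.delivered.get((dataset, site)) != version:
                    jobs.append((dataset, version, site, path))
        return jobs


svc = DistributionService()
svc.subscribe("ref-a", "site-1", "/data/ref-a")
svc.publish("ref-a", "v1")
print(svc.pending_transfers())
```

Access verification for sensitive data sets, which the text requires on every access, would sit in a separate layer and is not modelled here.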

Risk & opportunities • Reference data sets need to be available and maintained such that the cloud providers remain as attractive places for researchers to analyse their data, thus minimising the barriers to entry for dealing with large data set management and local replication. • Strategic replication of reference data sets will make cloud providers in ELIXIR and commercial cloud providers attractive to users. • Storage/compute availability within cloud sites does not meet user demand.

Timing • 2018: Work undertaken through the Data Transfer Implementation Study and the ELIXIR Competency Centre in EOSC-hub will bring the Reference Data Set Distribution Service into operation. • 2019: Implement key WP9 ELIXIR-EXCELERATE developments and bring these into production, including, e.g. ELIXIR Human Data through Federated EGA. • 2020: Have reference data sets available in both ELIXIR and public commercial clouds.


Description Provides a means for large data sets to be delivered asynchronously from their source to where a user needs them for their analysis.

Risk & opportunities • Installation of download software to the desktop is too complicated for users to do without support. • Asynchronous delivery of data to the user means that longer latency and more economical storage systems can be used instead of expensive low-latency storage systems.

Timing • 2019: Prototype the download environment using Globus Transfer. • 2020: Prototype ‘cold’ cloud storage as the back-end server for delivering data asynchronously to a user’s desktop.

This task, following current and future technology constraints, will organise ELIXIR cloud experts and focus them around the problem of defining the ELIXIR hybrid cloud ecosystem and how it should be coordinated. Initially concerned with ELIXIR’s own requirements, the task will gradually expand to cover the larger life science community, similar to the transition of ELIXIR AAI to LS AAI. The task will also work on piloting the proposed approaches, as a part of the EOSC-Life project in collaboration with the EOSC-hub and similar future projects; and as a standalone ELIXIR activity, using resources and resource providers of individual ELIXIR Nodes.

To determine and establish the strategic principles for the ELIXIR hybrid cloud ecosystem, including resourcing, service levels and sustainability. Pilot development will focus on: • Shifting from resource control to resource orchestration • Focussing on enabling user interactions (user needs) • Shifting to greater openness • Development of accretive strategies to engage producers and consumers within the platform ecosystem • Development of user-centric and ecosystem-centric metrics The proposal must deal with the different funding strategies while implementing the EOSC governance work on cloud provisioning. In the second iteration, the specific requirements of work with sensitive data must also be included, i.e. enhanced information governance, higher cost base, specific technologies deployed, etc.

Risk & opportunities This task opens an opportunity for ELIXIR to play a leading role in hybrid research cloud ecosystem provisioning for the life science community and also to be a strong partner for the EOSC in defining and implementing the EOSC hybrid cloud ecosystem.

Timing • 2019: Deliverable “ELIXIR/LS hybrid cloud ecosystem”; the first draft. • 2020/2021: The second version, using feedback from the first pilots. This version also incorporates clouds for sensitive data storage and processing. • 2022/2023: The third version, aiming to define a production hybrid cloud ecosystem for the whole LS community (not only for ELIXIR).



Subtask 3.2

Subtask 4.1

Piloting the proposal

ELIXIR container platform to allow execution of containerised software workflow loads developed with the ELIXIR Tools Platform ecosystem

Outcome Several user-centric and inter-platform pilots using the resources provided by the involved ELIXIR Nodes and also by external EOSC-related resources, if provided. The pilots will demonstrate the usability of the proposal to set up a cloud ecosystem for ELIXIR (using the hybrid cloud environment to support (semi)production operation of use cases that are defined in the other parts of the Programme). Feedback from these pilots will be used to improve the ELIXIR hybrid cloud ecosystem definition.

Risk & opportunities • Insufficient interest of the cloud resource providers to participate in the pilots (i.e. to provide the actual cloud resources). • Insufficient push from use cases for the new hybrid cloud resources (not sufficiently ambitious use cases). • Insufficient technology (to accomplish the vision described in the proposal for the ELIXIR hybrid cloud ecosystem) provided through EOSC-related activities.

Timing • 2019: Work with use cases to define the first round of pilots. • 2020: First pilots, feedback provided to the team working on the framework. • 2021: New set of use cases (including use cases from outside the ELIXIR Communities); second round of pilots and feedback. • 2022: Third round of pilots, including work with extensive sets of sensitive data use cases.

Task 4 Community containers being deployed and operated at scale This Task coordinates ECP expertise and resources to leverage the technology development in other projects, in order to provide high-level, community-driven orchestration of containers. This will allow standardised, containerised and community-chosen software applications to be hosted on ELIXIR Nodes and cloud infrastructures. The target is to provide a high-level abstraction layer to the underlying integrated technological components provided by Tasks 1, 2 and 3.


Outcome Provide a Container as a Service (CaaS) offer for a user to execute and manage the lifecycle of their containers on platforms made available from ELIXIR Nodes and later on European e-Infrastructures. This CaaS service will be offered to end-users (application developers in this example) through GA4GH compliant APIs.

• 2020: Resource allocation principles, expansion to other Nodes, emerging service at European level, production services at Node level. • 2021: European coordinated platform for life science containers is defined with ELIXIR Tools Platform to allow moving into production, discussion with supporting projects, e.g. EOSC. • 2022: Platform service ready for production, process for federating with EOSC. • 2023: Trans-national user access to distributed resource with EOSC.

Milestones for Tasks 1-4 M1-5 Technical Demonstrator, (Y1, Y2, Y3, Y4, Y5) M5 ELIXIR/LS AAI training material available, (Y2) M6 Resource allocation policy proposal published, (Y3)

How will it be accomplished

Deliverables for Tasks 1-4

First, the ELIXIR container execution platform design will consider work provided on interfaces, protocols, tools and file standards across the ELIXIR Platforms (Compute, Tools, Interoperability, Data) and will leverage GA4GH Cloud-WS standards (WES/TES/TRS/DOS). Implementations will utilise the GA4GH Data Use and Researcher IDs (DURI) standards to enable common authentication and data authorisation solutions (e.g. ELIXIR AAI) across cloud and HPC environments. The idea is that the same credentials used to log in to these services and to access datasets are honoured by all of them. This leverages the work in Tasks 1 and 2.
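To give a feel for what a GA4GH-style container request might look like, the sketch below builds a task description in the shape used by the GA4GH Task Execution Service (TES) together with a bearer-token header. The field names reflect our reading of the TES task schema and should be checked against the current specification; the image name, URLs and token are placeholders, and nothing here is taken from an ELIXIR deployment.

```python
import json


def build_tes_task(name, image, command, input_url, output_url):
    """Assemble a TES-style task document (field names assumed from the TES schema)."""
    return {
        "name": name,
        "inputs": [{"url": input_url, "path": "/data/input"}],
        "outputs": [{"url": output_url, "path": "/data/output"}],
        "executors": [{"image": image, "command": command}],
        "resources": {"cpu_cores": 1, "ram_gb": 2.0},
    }


def auth_headers(access_token):
    """The same AAI-issued token would accompany every service call."""
    return {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }


task = build_tes_task(
    name="line-count",
    image="alpine:3",  # placeholder container image
    command=["sh", "-c", "wc -l /data/input > /data/output"],
    input_url="https://data.example.org/input.txt",   # placeholder URL
    output_url="https://data.example.org/output.txt",  # placeholder URL
)
body = json.dumps(task)                    # request body for a TES endpoint
headers = auth_headers("PLACEHOLDER_TOKEN")
print(body)
```

The sketch stops short of sending the request, because the endpoint path and accepted token format depend on the service and specification version in use.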

D1-5 Technical Roadmap (Living document), (Y1, Y2, Y3, Y4, Y5) D5 ELIXIR/LS hybrid cloud ecosystem, (Y3) D6 ELIXIR/LS AAI sustainability model, (Y3) D7 ELIXIR/LS container execution platform prototype, (Y4) D8 ELIXIR/LS data transfer & distribution services, (Y2)

Second, the aim is to provide high-level interfaces for containerised software tools available through ELIXIR. Users will be able to execute those tools on cloud platforms available for, e.g. ELIXIR Community workloads. Container workload orchestration can be achieved by developing a scheduler to execute containers on the target platforms. The success of the Task will be measured by the ability to provide the analysis pipelines used by the ELIXIR Communities as a service in collaboration with the ELIXIR Tools Platform.

Outcome

Risk & opportunities • Supporting EU framework projects for the life science part of the work are not accepted. • EOSC and HPC projects are also working on container technologies and this could reduce the required technical work into adoption of the most suitable components. • Experts in container technologies are in high demand at the Node level and might therefore be unable to engage at European level.

Subtask 4.2 Access to sensitive datasets with containers Provision of analysis pipelines as a service using containers that are able to process sensitive data using the ELIXIR security guidelines.

How will it be accomplished The task will design and provide interfaces in which the user can interact with the overall integrated scientific workflows of the use cases defined in ELIXIR Communities. This includes sensitive human data and unifies the underlying heterogeneous ICT infrastructure, relying on the APIs developed across the ELIXIR Compute Platform. Interfaces will not only give access to the heterogeneous infrastructure, but also to the management and application level that is required to request access to sensitive data, as developed through the systems implemented in Task 1 (e.g. OAuth2).

Risk & opportunities Need to define and support the development of an ELIXIR-wide Operational Security and Trust Framework for handling Sensitive Data.

When will it be accomplished 2020–2022: Requires stable AAI operation, sensitive data distribution service, and computational access to sensitive data from the Compute Platform.

Task 5 Training Provide support for cloud environments for the training of technologists, developers and researchers in utilisation of ELIXIR compute resources. The Platform will provide key expertise to enable the provision of compute resources for training of researchers and other ELIXIR users in the secure access, data transfer, cloud enabled processing of user and community containers at scale. Training for the providers of cloud enabled workshops will be supported through the continued development of Workshops as a Service. The programme will be co-designed with the Training Platform in order to support the wider strategic goals of ELIXIR including the timing of key provision as it becomes available via the platforms.

Events To be supported and co-designed through the lead of the Training Platform and in support of the wider ELIXIR Platform milestones.

Milestones For AAI Training we will incorporate the milestones from EOSC-Life project proposal. Workshop provision will be addressed by EOSC-Life Task 7.2 (Cloud Training Programme).

When will it be accomplished • 2019: Proof of concept service with 2–3 ELIXIR Nodes emerging from the ELIXIR compatible cloud container Implementation Study.



South Building on the Wellcome Genome Campus in Hinxton, UK, seat of the ELIXIR Hub.



Interoperability Platform

General objective The Interoperability Platform aims to provide Services, Standards and Expertise in order to maximise value and benefit by integrating data from disparate resources across disciplines and borders, and to align with activities in other Platforms. We work across the whole of ELIXIR to deliver a sustainable portfolio of FAIR Recommended Interoperability Resources (RIRs), including those embedded within national data management practices and Node Service Delivery Plans (SDPs), and to provide the infrastructure that will aid the discovery, exploration, interoperability and reuse of scientific data. We review our RIRs against a service maturity and life-cycle model (Figure 4). We adopt, adapt and drive emerging practices and technologies, avoiding ad-hoc and proprietary implementations, and align with relevant and related key global efforts (Research Data Alliance, NIH BD2K Data Commons, GA4GH, and other international activities). This work will be carried out in four groups (see Table 5).

The ELIXIR Interoperability Platform’s mission is to maximise the interoperability of bioscience services, datasets and knowledgebases, building on the concept of making data FAIR and directed at its actual reuse. We will work in partnerships ranging from individual data providers and interoperability service developers to international standardisation initiatives and communities. Through sustainable interoperability products and services, combined with a philosophy across European life science of ‘standards as the default at data source’, we aim to maximise the value and benefit of disparate resources across disciplines, communities and borders. This work is community driven to deliver “Interoperability with a Purpose”.

Platform organisation Platform co-Leaders Chris Evelo (ELIXIR-NL), Carole Goble (ELIXIR-UK), Helen Parkinson (EMBL-EBI)

Platform Coordinator Sirarat Sarntivijai (ELIXIR Hub)


Platform scope and ambition Challenge The ELIXIR Interoperability Platform (EIP) has been established to deal with the challenge of delivering FAIR data: to make available the services needed to make data FAIR, to work with FAIR data, and to enable its actual reuse. The FAIR data challenge spans the different complexity levels and the variety of life science data types; across the datasets, data catalogues, data tools and services; across the multitude of biological disciplines and organisational boundaries and, at the EOSC level, across disciplines and to support e-Infrastructure services. The adoption of standards, services and stewardship best practice by data providers will provide scientists with the tools they need to do research efficiently.

Figure 4: Recommended Interoperability Resource Maturity Model
• Idea: Gap in the Interoperability architecture highlights an idea for a new resource.
• Prototype: Explore technical options for new resource. May produce early prototype.
• Alpha: First working version of resource. Used by subset of community.
• Beta: Established working version of resource. Used by wider community. Architecture gap seen to be filled albeit not with production grade.
• Resource deployment: Resource becomes a key part of the interoperability Architecture. Resource evolves sustainable funding method.
• Resource retirement: Resource is retired as no longer needed or is replaced by improved technology.



Table 5: Interoperability Platform groups and their objectives

Group: FAIR Service Architecture
Aim: The technical framework and supporting processes for Recommended Interoperability Resources (RIRs) of the EIP: (1) identify and sustain RIRs, identify and act on gaps; aid service improvements with providers; (2) widespread adoption of RIRs by Core Data Resources and recommended Deposition Databases, Node resources and pipelines.

Group: Interoperability with a Purpose at Source
Aim: Identify, source and adopt appropriate interoperability services for: (1) ELIXIR Communities such as Human Data, Plant/Crop Phenotyping, and Marine Metagenomics, remaining responsive to the evolving needs of those communities; (2) the needs, capabilities and services of the Nodes, which are increasingly supporting researchers with project data management. Those researchers need to capture standards-compliant metadata at the time of data generation (first mile interoperability) and to analyse and integrate local data with public datasets using analysis platforms such as Galaxy (last mile interoperability).

Group: Capacity Building
Aim: Dissemination and sustainable knowledge sharing through the documentation and distribution of know-how; and development of best practices, with examples made available through an online knowledge hub (linked to the TeSS training web portal) and supplemented with workshop/tutorial events.

Group: Interoperability Services for the Cloud
Aim: Cloud-based deployment of interoperability resources initially requires cost/benefit evaluation on a per resource level. Implementation and deployment of cloud-enabled EIP Interoperability RIRs and services to support EOSC projects: metadata exchange between catalogues and for e-Infrastructure services; containerisation of RIR services; EIP services to coordinate FAIR data storage and processing for Core Data Resources and recommended Deposition Databases.

2019–23 Goal Maximise the interoperability of bioscience services, datasets and knowledgebases.

Task 1 FAIR Service Architecture Objective Delivery of a FAIR service infrastructure through the stepwise incorporation of components necessary for a functional platform. This requires process specifications and the identification of gaps in services, and will be implemented through the ability to adapt, adopt or solicit additional services.

How will it be accomplished Delivery of a FAIR service architecture for ELIXIR requires a set of Recommended Interoperability Resources (RIRs) covering services and technological solutions (e.g. Bioschemas) driven by community and Node requirements. Key EIP resources were identified during the 2014–2018 Programme to be sustained and migrated into the ELIXIR Infrastructure Services. These include resources such as Identifiers.org, Ontology Lookup Service, FAIRsharing, and Bioschemas. By providing an architecture rather than a set of technical specifications, we provide space for the addition of new services, and scope to innovate through design and by adoption of emerging technologies. We require collaborations with the other ELIXIR Platforms for delivery.

Subtask 1.2 Recommended Interoperability Resources • Continue to identify, assess and monitor Recommended Interoperability Resources (RIRs), supported by an iterative (e.g. yearly) selection and transparent review, and develop an ongoing monitoring process ensuring resources remain fit for purpose. • Identify and assist with areas of service improvements and sustainability options for Recommended Interoperability Resources in collaboration with service owners, using the RIR maturity model (Figure 4) and feedback from Subtask 3.2.

Subtask 1.3 Recommended Interoperability Resources adoption • Universal adoption and live deployment of Bioschemas by all ELIXIR Core Data Resources and Deposition Databases and 50% of Node databases/knowledgebases, in partnership with the Data Platform. • Registration of all Core Data Resources, Deposition Databases and Node databases in Identifier resolution services, databases and standards portal and other relevant Interoperability services. • Adoption of CWL by an ELIXIR pipeline in partnership with the Tools Platform. • Produce best practice guidelines for all adoption stakeholders with Subtask 3.3.
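For readers unfamiliar with what "live deployment of Bioschemas" means in practice, the sketch below generates schema.org Dataset markup as JSON-LD wrapped in a script tag that a resource could embed in a dataset landing page. The property names are standard schema.org terms; which of them a given Bioschemas profile treats as mandatory is not stated in this document and should be checked against the profile itself. The dataset details, URLs and the helper function name are hypothetical.

```python
import json


def dataset_markup(name, description, url, identifier, keywords, license_url):
    """Return an HTML script element carrying schema.org Dataset JSON-LD."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
        "keywords": keywords,
        "license": license_url,
    }
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(doc, indent=2)
        + "\n</script>"
    )


html = dataset_markup(
    name="Example expression dataset",  # hypothetical dataset
    description="Illustrative record used to show the markup shape.",
    url="https://data.example.org/datasets/1",
    identifier="https://identifiers.org/example:0001",  # placeholder identifier
    keywords=["transcriptomics", "example"],
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(html)
```

Because the markup is plain JSON-LD in the page, indexing and integration services can harvest it without a resource-specific API, which is the adoption benefit the subtask is aiming for.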

Subtask 1.1 The Interoperability Platform in 2023 • By 2023, we anticipate that a researcher who needs to find, combine or analyse multiple data sets, from different life science disciplines, will encounter far fewer technical challenges than today, and critically will instead be able to focus on extracting insights and knowledge from that data, rather than on, for example, data-wrangling activities. • KPIs to illustrate success include: service usage, number of deployments, community engagement metrics, citations, use of nascent FAIR metrics, scope and diversity of interoperability services, responsiveness and updates to services.


Roadmaps, maturity models and gap analysis • Facilitate an ‘Interoperability for interoperability’ framework, ensuring a coordinated set of resources in line with the FAIR Service Architecture roadmap. • Maintain a FAIR service architecture roadmap and resource landscape against the RIR Maturity Model, published on the Knowledge Hub (Subtask 3.1). • Identify and assess gaps necessary for a fully functional service infrastructure, prioritise activities to address these gaps and commission new services where appropriate. • Develop a process to review the quality and coverage of Node Interoperability resources and services, using feedback from Subtask 3.2.

Subtask 1.4 Implementation Study calls and management • Strategic Implementation Studies and systematic evaluation of ongoing Implementation Studies to define a long-term sustainable course of support. • Bioschemas has been an innovation of the Platform in the area of discovery. We aim to help Bioschemas become a “spun-out”, independent and self-sustaining community initiative by 2021, with a significant push in 2019–2021. Bioschemas is well placed to be a Strategic Implementation Study with the Data Platform. • Other Strategic Implementation Studies will emerge.



Outcome

When will it be accomplished?

Task 2

Subtask 2.1

• A maintained landscape and roadmap of interoperability resources, and a process for identifying, selecting, monitoring and supporting RIRs. • Portfolio of ELIXIR Recommended Interoperability Resources; Portfolio of Resources to support interoperability needs of Nodes and of creators/users of the Data and other Platforms; standards and databases. • Recommendation to migrate mature RIRs that are identified as universally critical to the ELIXIR mission into the ELIXIR Infrastructure Service Plan. • Universal coverage of live Bioschemas markup of ELIXIR Core Data Resources and Deposition Databases, and full live adoption in the RIRs. Adoption in indexing, integration and other dataset consuming services and demonstrated value. Wide coverage of database registration against identifier and metadata standards. 50% of Node datasets adopt Bioschemas. Widespread adoption of CWL for workflow interoperability.

The delivery is a staged plan with multiple components, each of which will follow the documented schedule. We aim to help Bioschemas become a “spun-out”, independent and self-sustaining community initiative by 2021.

Interoperability with a purpose

Interoperability for ELIXIR Communities

Risks & opportunities Opportunities: lead on the FAIR services and technology solutions needed for life sciences and position ourselves with EOSC and cloud models in general; provide the linkage and semantic infrastructure needed for datasets and tools (Data/Tools Platform); use Bioschemas for scalable and universal markup and a “Knowledge Graph” for biosciences; drive a community toolkit beyond ELIXIR to promote the processes and the services we deliver; interoperability is an added value of ELIXIR. Risks: the more complex the FAIR services architecture becomes, the harder it will be to sustain; as the service portfolio grows, ensuring cross-service interoperability becomes harder; as technologies change, the FAIR service architecture may become fragile or hard to maintain, hence the adoption of robust and community approaches such as Bioschemas.

Events 2 Bioschemas workshops/bio service hackathons, (Y1, Y2)

Milestones M1.1 Completed periodic gap analysis, (Y2 and Y4) M1.2 Call for interoperability resources and review, (Y1, 2, 3, 4, 5) M1.3 New interoperability resource adoption, (annual based on M2 outcomes) M1.4 Established Interoperability Resource Maturity Model, (Y2) M1.5 Implementation Studies call and execution, (Y2, Y4)

Deliverables D1.1 Recommended Interoperability Resources, gap analysis, landscape and roadmap, (Y2, Y4) D1.2 Bioschemas live deployment and sustainability plan, (Y1, Y3) D1.3 Report on Implementation Studies, (Y3, Y5)

Tool interoperability: FAIR approaches using standard APIs, standards for APIs, and metadata to describe workflows using the Common Workflow Language (CWL) referenced in Tools Platform.

Objective Identify needs for interoperability services that allow the use of FAIR data to answer research questions. This includes the mapping and linking of different types of data and the connections to tools often used in ELIXIR Communities. On that basis, ensure that RIRs and EIP services are appropriately selected and fit for purpose for existing and emerging ELIXIR Communities, and for the needs of Nodes. Establish and promote connections among Communities and Nodes through the deployment of EIP resources and community engagement. Operate a practical “Just Enough, Just in Time” approach (rather than high-cost perfection) with respect to service provision and to Community and Node engagement. This Task drives Task 1 and complements Task 3.

How will it be accomplished To identify, source, develop, adopt and communicate appropriate interoperability services for Communities and Nodes, we will focus on dedicated activities that delve deeper into their requirements and facilitate sustained co-working. Previously EIP has focused on metadata validation services for the Plant Community, Linked Data services for the Rare Disease Community and workflow interoperability for the Marine Metagenomics Community. Nodes are increasingly supporting researchers with well-managed project data management as well as engaging in community activities. We will therefore support both these cases and engage with two or more community use cases, for example, ‘Federated Human Data’. To improve downstream efficiencies, those researchers need to capture standards-compliant study metadata at the time of data generation (first mile interoperability), and to analyse and integrate local data with public datasets using integrative analysis platforms such as R and Cytoscape, making the analyses available as workflows in, e.g., Galaxy (last mile interoperability). This requires both the linking and mapping of the data to be integrated and the connection of these services to these tools (often as plugins). Thus, the first and last mile interoperability needs of Node-supported project data management are to be addressed to allow optimal use in analysis.

51. Koureas D, Arvanitidis C, Belbin L et al. Community engagement: The ‘last mile’ challenge for European research e-infrastructures. Research Ideas and Outcomes 2016, 2: e9933, https://doi.org/10.3897/rio.2.e9933


• Working with Communities to gather users’ interoperability service requirements (a start of that is already available in the project plan service framework). • Facilitating communications and planning between the EIP technical community and use case communities to develop interoperability services that cater to the use-case communities’ needs, and the retrospective FAIR application when the available services specify FAIR needs more precisely. • Identify service and requirement commonalities and synergies across Communities enabling interoperability. • Stimulate and support the development of the identified ‘needed’ services and promote the usage of existing ones, by showing examples and by stimulating the development of the necessary connections in analysis platforms. • Promotion of Community and technology choices (e.g. Linked data, CWL, Standard APIs, etc.) through the Knowledge Hub (Subtask 3.2) and BYODs focussing on reuse (second stage BYODs) (Subtask 3.3).

Subtask 2.2 Interoperability for ELIXIR Nodes • Assemble a Node-based Interoperability Working Group to focus on the Interoperability User Experience: First Mile (into community users)/Last Mile51 (from developers), report on User Experience and propose guidelines and protocols. • Define a Strategic Implementation Study to bring Nodes together to investigate and share common developments with their data management platforms and data analysis needs with respect to Interoperability services, notably upload and linkage support to ELIXIR Deposition Databases (with the Data Platform) and use of analysis tools (with the Tools Platform). • Identify key interoperability services provided by, and needed by, Nodes to support First and Last Mile Interoperability and Node-based Data Management Planning, both FAIR at source and retrospectively FAIR. Facilitate communications and planning between the EIP technical community and Nodes to develop these interoperability services. • Deployment and take-up of RIR (e.g. Bioschemas) and EIP services (e.g. lookup and mapping tools) in Nodes, for Node DMP and for Node services included in the Service Delivery Plans.



Subtask 2.3 International strategic collaborations for Interoperability Services • Strategic collaborations with flagship international organisations such as the Research Data Alliance, NIH BD2K Data Commons, and GA4GH to define/refine FAIRification protocols. • Promotion of FAIRMetrics, FAIR capability maturity model (FAIRplus), and related developments to support resource identification/evaluation and pathways to FAIR adoption whether ‘at source’ or ‘post hoc’ in Communities and across Nodes. Dissemination through Subtasks 3.1–3.3.

Outcome A national data management plan that aims to deliver a demonstrable and improved level of usage support for FAIR data by key interoperability resources and services, including RIRs. The improved usage support is demonstrated by consistent annotation of data at source that conforms to the requirements (and future needs) of users, as determined through deep dives into use cases with Communities. A better-integrated portfolio of resources and services that crosses domain and community barriers by virtue of cross-Platform/Node activities, e.g. the delivery of software implementations (Tools Platform) equipped with interoperability functionality. Resources that support the use of FAIR data, and improved specifications and corresponding metrics for FAIR data itself, identified through this service development, so that data are available to new processes or workflows without requiring ad hoc manipulation. A well-documented strategy to assist in adopting the most suitable methodology for FAIRifying new sources/services, with the strategy to be accepted into recommendations by other organisations. 50% of Node data resources adopt Bioschemas live, and 80% use identifiers (e.g. ontology terms and database IDs) and descriptors (e.g. chemical structures as InChIs) recognised by interoperability services and thus usable in major tools.

Risks & opportunities Opportunities – Create an environment for research where FAIR data can actually be used for analysis and combined with other data, leading to a situation where FAIRification becomes one of the initial concepts built into delivery, rather than requiring retrospective data wrangling; Deliver a procedure through which resources can be evaluated for FAIRness; RIRs and EIP services have a real purpose and deliver critical capability to generators and users of FAIR data; Identify the most needed EIP services and stimulate their development; Deliver a best practice guideline which will assess individual resources for the most appropriate FAIRification methodology. Risks – Too many Communities to engage with meaningfully; Reluctance of resources to submit to some formal appraisal; Inertia in getting buy-in from resources with regard to the advantages of FAIRification (mismatched with current process, no advantage to resource only to user, etc.); International efforts having different priorities or being resource constrained, slowing development of common approaches and resulting in tangential outputs.

When will it be accomplished? The delivery is a staged plan with multiple components, each of which will follow a documented schedule developed with the Communities, other Platforms and Nodes.

Events
Community workshops, co-located with Community events (Y1, Y2)
2 joint Node workshops to pump-prime working group (Y1, Y2)

Milestones
M2.1 Identified EIP service portfolio and RIR/EIP service landscape for Communities (Y1)
M2.2 Established Node Interoperability Working Group and preliminary requirements (Y2)
M2.3 Update of best practice guidelines on FAIRmetrics and use of maturity model (Y3)
M2.4 Strategic Implementation Study submitted for Node interoperability / Community (Y2, Y3, Y4, Y5)

Deliverables
D2.1 Interoperability Services Portfolio periodic report (Y2, Y4)
D2.2 Revised FAIRmetrics and CMMI based on practical experiences (Y5)
D2.3 FAIRification and FAIR-based analysis best practice document (Y4)

Task 3 Capacity Building Objective Dissemination and sustainable knowledge sharing by the documentation and distribution of know-how, identification of standard development best practices, and examples through the EIP online Knowledge Hub (linked to TeSS and other registries identified by ELIXIR and workshop/tutorial events).

Subtask 3.1 Knowledge Dissemination through the EIP Knowledge Hub
• Develop and maintain the EIP Knowledge Hub to its full functionality by extending and updating the current EIP website to cover public information and communication channels and a non-public internal information dissemination portal, and to provide an EIP service registry and a service consultancy.
• Work with a UX expert and web developers to turn the analysis of user requirements into an agile development of the EIP Knowledge Hub, improving the existing EIP website to include: i) a better browsing experience (users find what they are looking for); ii) a Help Desk mechanism for EIP resource consultancy, including TeSS linking; and iii) a seamless transition between the public and private (intranet) portals (integrated public-private access to content, mediated by AAI single sign-on, so that an internal user sees specific content based on their permission level).

Subtask 3.2 Community Outreach and Training
• Coordinate workshops and tutorials with the ELIXIR Training Platform addressing specific needs based on input from the ELIXIR technical and use-case communities (T2), and consistently link and register Recommended Interoperability Resources and EIP services (T1), training material and events in the TeSS portal.
• Identify initial user requirements, to scope the specific needs and support that users require from EIP services, through the Help Desk mechanism established in Subtask 3.1 and communications with the ELIXIR Communities (T2).
• Work with the ELIXIR Training Platform to organise workshops and tutorials that cater to the specific needs identified, cross-list training activities on the TeSS portal, and reach a wider audience by presenting the EIP service framework and architecture at conferences and meetings internal and external to ELIXIR.
• Collect feedback from each event and circulate it back to the Recommended Interoperability Resources and EIP service developers (Subtask 1.1, Subtask 1.2) and other relevant ELIXIR Platforms. Feed it into the RIR Maturity Model, identify areas of improvement or missing services, and make appropriate recommendations of new candidate tools or improvements to tools to be developed.
• Represent EIP by presenting at key international conferences and at key internal/external user dissemination venues (webinars, research roundtables, etc.), and collectively document feedback from participants.

52. Open Research Data and Data Management Plans. Information for ERC grantees by the ERC Scientific Council, https://goo.gl/Z2YSd8

Subtask 3.3 Interoperability Best Practices Implementation/Recommendation
• Identify areas needing improvements to standard practices and develop a guideline for interoperability best practices to promote the methodology for data reusability and reproducibility, by recommending and routinely reviewing best practices of standards and methodologies development that reflect the dynamics of technology evolution.
• Survey across Recommended Interoperability Resources and EIP services to identify the FAIR activities that should be formalised into a generalisable method rather than a service, e.g. evolving BYODs into a guideline for FAIR best practice, promoting Bioschemas implementation for Core Data Resources.
• Work with relevant parties (EIP service developers, ELIXIR Communities, ELIXIR Technical Coordinators) to develop an Interoperability best-practice handbook for FAIR implementation and raise awareness of the behind-the-scenes interoperability activities to a wider audience through Knowledge Dissemination (Subtask 3.1) and Community Outreach (Subtask 3.2).

Outcome An improved content and EIP service registry organisation for public and internal communications on the EIP Knowledge Hub website for 2020, transitioning towards the comprehensive EIP Knowledge Hub online portal by the end of the planning period (2023). A sustainable knowledge-sharing mechanism based on pay-it-forward training (train-the-trainers), real-time knowledge dissemination to end operational users (train-the-trainees), and a user feedback collection process that distributes new requirement findings to relevant Platforms such as Tools and Training. Formalisation of FAIR activities from a service into a methodology when appropriate, and recognition of the cross-cutting interoperability best practices underlying scientific research.

Risks & opportunities Opportunity – ELIXIR Interoperability Platform is recommended as a metadata standards reference in the ERC Open Research Data and Data Management Plan,52 where future collaboration with ERC grantees may present itself in the form of DMP consultation provided by EIP through the capacity building activities; Collaborations with other ELIXIR Platforms as a response to new insight into a missing service or a service needing improvement; wider adoption of Recommended Interoperability Resources; Education/early engagement possible through consultancy services and availability of documentation/guides, making success more likely. Risk – Maintenance and improvement of content and knowledge dissemination is a crucial activity in the dynamic landscape of evolving technologies; Guidelines (e.g. Bioschemas) become over-specified (too cumbersome) or underspecified (too general); CDRs and/or other ELIXIR resources fail to comply with guidelines.

When will it be accomplished? The delivery is a staged plan with multiple components, each of which will follow the documented schedule.

Events
8 Training events, 2 per annum (with Training Platform) (Y2, Y3, Y4, Y5)
Potential venues for presentations and training outreach:
ELIXIR-organised meetings: ELIXIR All-hands Meeting, ELIXIR-Biohackathon, Webinars
Non-ELIXIR meetings: ISCB-ISMB, ISCB-ECCB, ICBO, NETTAB, EOSC Events

Milestones
M3.1 EIP Knowledge Hub UX (Y2)
M3.2 Training workshop (Y2, Y3, Y4, Y5)
M3.3 Periodic review of FAIR implementation best practices (Y2, Y4)

Deliverables
D3.1 Knowledge Hub Phase I (Y2), Knowledge Hub Phase II (Y3), Knowledge Hub Phase III (Y5)
D3.2 Reports on training workshops (Y3, Y5)
D3.3 Gap analysis of user requirement findings and recommendations of RIR resource and tool improvement within EIP and ELIXIR Platforms (Y4)

Task 4 Interoperability Services for the Cloud

There is a shift to international resource provision in the cloud and concomitant changes to the delivery of data and databases necessary to ensure data and databases are interoperable going forward to meet the FAIR principles. As cloud-based deployment becomes more desirable and an increasingly standard practice, there is a need to ensure that interoperability services are ported to cloud deployments and available as cloud-hosted services for use by ELIXIR Communities, as well as the Core Data Resources and Deposition Databases. This task relates ultimately to the EOSC objectives (Interoperability Platform has provided EOSC registries) together with extension through use of commercial and national clouds, enabling us to leverage EC and national funding in this space.

Objective Implementation and deployment of cloud-enabled EIP Interoperability RIRs and services to support EOSC projects: metadata exchange between catalogues and for e-Infrastructure services; containerisation of RIR services.

How will it be accomplished

Subtask 4.1 Assessment of needs
Assessment of needs with the identified stakeholders, capacity and training assessment with the provider groups, and design of a verification strategy with the Interoperability Platform members to determine success criteria. Identify any RIRs and EIP services needed and needs for FAIR data storage and processing in EOSC-Infrastructure Cloud.

Subtask 4.2 RIR Services on the Cloud
• Identify interoperability resources working towards cloud hosting. Track their approach and progress, use as exemplars and advise from both technical and strategic perspectives.
• Containerisation of Interoperability services, with Compute/Tools Platform.
• Support deployments necessary for EOSC and other Clouds, e.g. the nascent Health Research and Innovation Cloud.
• Share best practice across the EIP, across the ELIXIR infrastructure, and potentially further afield across different infrastructures in collaboration with the Compute Platform.
• Collaboration with Compute Platform to provide a funding model allowing ELIXIR Node resources to be made EOSC compatible, to address use cases from ELIXIR Communities.

Subtask 4.3 EDMI deployment
ELIXIR RIR registries that are EOSC EDMI compliant and support inter-catalogue metadata exchange and EOSC e-Infrastructure services.

Outcome Establish the need for cloud-deployed instances of individual RIRs through consultation with ELIXIR Communities and through established Use Cases; perform cost-benefit analysis for each such RIR (in collaboration with the FAIR SA activities above and Compute Platform). Evaluate the landscape (broad coverage) and extent (depth) of current cloud-enabled interoperability resources within ELIXIR, and identify ‘glue’ services (cloud and non-cloud) to facilitate or support those existing services (process improvements for robustness, or to improve their cooperativity, etc.). Deliver a needs assessment and best-practice guidelines for cloud-deployed services for the service consumers from ELIXIR Communities and use cases (in collaboration with the FAIR SA activities above and Compute Platform). Where indicated, documentation of the design of a transition plan for cloud deployment will be provided, both for prioritised services identified in the needs assessment and for new services appearing throughout the Programme; this is a critical task for the FAIR SA Task above.

Risks & opportunities Opportunity – improved robustness of services available on more than one cloud, e.g. improved uptime, increased portability; increased skill set for interoperability technical personnel; services will be aligned with international efforts such as Genomic Data Commons and GA4GH Cloud workstream; portable services available to restricted-access data, e.g. scenarios where data cannot move for ELSI reasons and services are required to move to the data. Risks – delivering a cloud-ready architecture adds development overhead for each new service; unknown long-term cost models for cloud operations.

When will it be accomplished? Verified success is when all required (identified as ‘necessary’) Recommended Interoperability Resources are cloud hosted, evidenced through a test deployment. This is dependent on the resource adoption process and will be added as a requirement after the first two services are available as verified cloud deployed services.

Events Interoperability cloud hackathon, (M18)

Milestones
M4.1 Cloud deployment best practice document (Y1)
M4.2 Cloud deployment best practice document updated (Y2)

Deliverables
D4.1 First two interoperability services cloud deployed (Y2)
D4.2 Cloud deployment as a criterion for interoperability resource selection (Y3)
D4.3 Implementation of the EDMI metadata guidelines for cloud deployed services (Y4)

Training Platform

Table 6: Training Platform groups and their objectives

The mission of the ELIXIR Training Platform is to establish an interactive training infrastructure that is supported and adopted by all member states. The infrastructure will strengthen national training programmes through the delivery of coherent, high-quality and impactful ELIXIR training in order to grow bioinformatics capacity and competence throughout Europe, and to empower researchers across Europe to use the ELIXIR services and tools.

Platform organisation Platform co-Leaders Celia van Gelder (ELIXIR-NL), Patricia Palagi (ELIXIR-CH), Gabriella Rustici (ELIXIR-UK)

Platform Coordinator Pascal Kahlem (ELIXIR Hub)

Scope and ambition The ELIXIR Training Platform was established to set up the training strategy within ELIXIR, with the objective to develop an interactive, ELIXIR-wide training community spanning all member states, in order to strengthen national training programmes, to grow bioinformatics capacity and competence across Europe, and to empower researchers to use the ELIXIR services and tools. The Training Platform establishes and implements best practices in bioinformatics training; supports training providers across Europe in developing and delivering training to three major target audiences – Developers, Researchers and Trainers; and aims to build a sustainable training infrastructure. It is important to note that the need for bioinformatics training is continually evolving, and that keeping up with the constant development of new technologies and infrastructure services is difficult, particularly for early-career researchers exposed to big data analysis, usually for the first time.

The major challenges of the Training Platform are: boosting training capacity; extending the scope and geographical reach of training; identifying gaps and providing training in relevant, newly-emerging life science areas; and securing adequate funding to safeguard the sustainability of the established Training Infrastructure. These challenges can be tackled by: (i) delivering to stakeholders a timely, up-to-date and impactful portfolio of training resources; (ii) further developing and exploiting ELIXIR’s large community of well-connected trainers; (iii) increasing the use of the e-learning ecosystem; and (iv) continually raising awareness of the existing training solutions available to the life science community.

The Training Platform in 2023
By 2023 we aim for an ELIXIR Training Infrastructure that:
• will be a well-established point of reference for those who either seek training or seek to develop new training.
• will offer a seamlessly-integrated technical infrastructure (including the flagship portal TeSS); a training toolkit that contains guidelines, metrics, training descriptors and Train the Trainer (TtT) materials; and a course portfolio to support the training needs of the ELIXIR Community.
• will be a sustainable, organizational framework dedicated to training and will be adopted and implemented by all ELIXIR Nodes.

| Groups | Aim |
|---|---|
| Training Toolkit | Expand the ELIXIR Training Toolkit, which includes established training guidelines, best practices, quality and impact metrics, training descriptors and competences. It will be implemented throughout the ELIXIR Nodes and made available to all course providers. |
| Gap analysis, training development and delivery | Ensure that any emerging training gaps are identified and tackled in this ever-evolving field. Develop training materials and deliver the corresponding training. Rerun and roll out mature courses. |
| Training technical infrastructure | Consolidate the current efforts to maintain TeSS as the ELIXIR Training Platform’s flagship resource. Ensure that the ELIXIR Training Toolkit and e-learning objects can be found in TeSS, and that TeSS also properly connects to the other ELIXIR Platform resources (e.g. bio.tools, FAIRsharing, BioContainers, cloud/HPC/grid). |
| Training Capacity Building | Continue to build training capacity across ELIXIR Nodes by implementing and rolling out the Train the Trainer (TtT) framework built in EXCELERATE, and assess the training needs for management and operations across the Nodes. |

Task 1 Training Toolkit

Subtask 1.1 Publication and adoption of the ELIXIR Training Toolkit

During the course of the EXCELERATE grant, the ELIXIR Training Platform has developed a series of training best practices and guidelines, covering several aspects of training development and delivery, including, but not limited to: how to design and implement training courses for different audiences (Train the Researcher, Train the Developer, etc.); how to make existing training discoverable using the Bioschemas specifications; how to use competency frameworks to design learning paths; how to collect feedback and determine the quality and impact of training activities, as well as building training capacity through the Train the Trainer programme. These practices are the building blocks of the ‘ELIXIR Training Toolkit’, a collection of resources for developing, delivering, and evaluating training that we wish all ELIXIR training providers to adopt. In particular, this represents an important resource for new training providers wishing to align their training efforts to the ELIXIR training programme, whether they are already part of ELIXIR or not. All providers adopting the Toolkit are required to report on the ELIXIR training that they put in place, utilizing the quality and impact framework developed in EXCELERATE.

The ELIXIR training best practices and guidelines will be made available to everyone involved in training organisation, in the form of a Handbook that collates all materials developed so far. The Handbook will include resources covering course development and organization; instructor training; impact assessment; online documentation for implementing Bioschemas specifications, etc., aiming to provide a comprehensive reference resource for developing training capacity, rolling out new training programs within Nodes, as well as expanding existing ones, following the strategies developed by the ELIXIR Training Platform. The Handbook will be made available through TeSS (in collaboration with T3) and will be a one-stop shop that consolidates all best practices and guidelines so far published, as well as the ones yet to be released.
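
As a hedged illustration of what "making existing training discoverable using the Bioschemas specifications" can look like in practice, the sketch below builds a JSON-LD description of a single training event. It assumes the schema.org CourseInstance type, on which Bioschemas training profiles build; the helper names, the example dates and venue, and the example.org URL are hypothetical, and the properties that a given Bioschemas profile version requires should be checked against the profile itself.

```python
import json


def course_instance_jsonld(name, description, url, start_date, end_date, venue):
    """Return a schema.org CourseInstance description as a plain dict.

    Dates are ISO 8601 strings supplied by the caller; nothing is validated here.
    """
    return {
        "@context": "https://schema.org",
        "@type": "CourseInstance",
        "name": name,
        "description": description,
        "url": url,
        "startDate": start_date,
        "endDate": end_date,
        "location": {"@type": "Place", "name": venue},
    }


def to_jsonld_string(doc):
    """Serialise the description so it can be embedded in a course web page."""
    return json.dumps(doc, indent=2, sort_keys=True)


# Hypothetical event details, for illustration only.
example = course_instance_jsonld(
    name="Genome assembly and annotation",
    description="Hands-on course for researchers new to genome assembly.",
    url="https://example.org/courses/genome-assembly",
    start_date="2020-03-02",
    end_date="2020-03-06",
    venue="Example training room",
)
print(to_jsonld_string(example))
```

The resulting string would typically be placed in a `<script type="application/ld+json">` element on the course page, so that training registries and search engines can harvest it.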

Outcome Handbook compiled and made available through TeSS

How will it be accomplished Create a set of dedicated pages for the Handbook on TeSS and populate them with materials, links and appropriate references to all ELIXIR training best practices and guidelines; some content development might be needed but most content should already exist.

Risks & Opportunities Opportunity: to engage with Nodes, ELIXIR Communities and ELIXIR partners. Risks: Low adoption of the Toolkit by Nodes and by the national programmes.

When will it be accomplished? Y3.

Subtask 1.2 Toolkit expansion – Impact assessment
During EXCELERATE, the TP developed a strategy and sets of core metrics to measure quality and impact of the entire ELIXIR training programme. Quality, such as overall satisfaction, is measured in the short term via surveys at the end of a training event, while impact, including changes in trainees’ confidence in handling biological data, is measured in the long term via surveys 6 months to 1 year after an event. In addition, demographic data is captured for individual events (reported by the Training Coordinators and training providers, when not captured in the survey). To date, a good percentage of Nodes have implemented the strategy developed for collecting demographic information and measuring quality in the short term, but only a few Nodes were in a position to implement the long-term strategy regarding impact assessment. More work is therefore needed to ensure that all Nodes adopt it and collect uniform, comparable data for long-term feedback.

During the 2019–23 Programme, we wish to tailor the current strategy to also measure the impact that training is having on ELIXIR Services, Platforms and Communities. Although training on some ELIXIR Services, or targeting the needs of a specific Community, is already part of the ELIXIR training programme, we have not yet customized our impact strategy to measure the impact that training is having on the adoption of ELIXIR Services or the Communities at large. Similarly, although events such as hackathons and ‘Bring Your Own’ (BYO) type events have been organized in the past in collaboration with other Platforms, we have not yet formalized a way to capture demographic and quality information for these events or the effect that such training is having on the participants and on the Platforms.

Furthermore, data collection will continue to be simplified and streamlined to facilitate the reporting and subsequent access to the data. Towards the end of EXCELERATE, the TP will have released a platform for collecting training statistics and impact data; this work is ongoing at the moment but we foresee that the functionality of the platform will need further development during the 2019–23 Programme, to improve its usability. In particular, we aim to provide a tool that training providers can use, not just for reporting, but also to access the data available for their training activities, as well as visualize their training contribution against the data collected for the entire ELIXIR training programme. Additionally, this platform will be linked to TeSS, pulling training events from TeSS and pushing additional event metadata back to TeSS, in turn introducing curation of the events displayed there.

It has been proposed that the impact assessment strategy developed during EXCELERATE is utilized by other European projects and some funding has been secured to contribute to this in two recently submitted EC grant applications, the European Joint Programme on Rare Diseases and EOSC-Life. These external grants will provide additional funding to support this activity from 2020 onwards.

Outcome Adoption of the short and long-term feedback strategies by the majority of ELIXIR Nodes; tailoring of the current strategy to measure impact of training on ELIXIR services, Platforms and Communities; mature platform for submission and visualization of training statistics, quality and impact data by training providers, cross-linked to TeSS.

How will it be accomplished Collection of all statistics and feedback data from the majority of ELIXIR Nodes through the abovementioned platform; publication of a report presenting long-term impact within ELIXIR, including impact on ELIXIR Services, Platforms and Communities.

Risks & opportunities Risk: Strategy developed for assessing impact on Services, Platforms and Communities fails to provide a meaningful measure of such impact; insufficient funding available to support development and maintenance of the data collection platform. Opportunity: Expansion of impact assessment strategy to measure impact of training on Services, Platforms and Communities.

When will it be accomplished? Y3.

Milestones
M1.1 A set of dedicated pages for the Handbook on TeSS to be populated with materials, links and appropriate references to all ELIXIR training best practices and guidelines (Y2)
M1.2 Adoption of the short and long-term feedback strategies by the majority of ELIXIR Nodes (Y2)

Deliverables
D1.1 Handbook for ELIXIR Training Toolkit compiled and made available through TeSS (Y3)
D1.2 Mature platform for submission and visualization of training statistics, quality and impact data that is cross-linked to TeSS (Y3)

Task 2 Gap analysis, training materials development and training delivery

Objective The objectives of this task, from the perspective of the TP, are to: 1) identify training gaps; 2) facilitate the development of training materials; and 3) support the organization of training events. The need for bioinformatics training evolves constantly, due to the continuous development of new technologies, as well as the increasing number of ELIXIR Services and Communities. The ELIXIR Training Platform, jointly with all other ELIXIR Platforms, Communities and Nodes, will continue to identify emerging gaps in training provision across Europe, and to ensure that appropriate training solutions are developed and delivered, either by ELIXIR or the Nodes, in order to tackle such gaps. The ELIXIR Platforms, Communities and Nodes are composed of experts (scientists, developers, platform managers, etc.) who master the ELIXIR Services, resources and scientific domains. In the context of this task, they have the role of content providers, producers of training materials, and trainers of ELIXIR courses. The ELIXIR TP is composed of experts in course organization, course development (identification of target audiences, curriculum design, learning design, etc.), training material development, training new trainers (TtT), training coordination and impact assessment. All the tasks associated with a training event, from conception to delivery, need a tight collaboration between these two expert groups.

Subtask 2.1 Gap analysis in ELIXIR Platforms, Communities, Nodes and Industry
The ELIXIR TP surveyed the training needs of the scientific communities in 2013 and 2016, and consequently took measures to fill the gaps identified with appropriate training. To ensure that new training gaps are identified and appropriate training courses are put in place to fill such gaps, the needs of the ELIXIR Platforms, Communities, Industry, Resource providers and Nodes will be investigated in a continuous and joint effort with all stakeholders.

Outcome A biannual report on the training needs and gaps covering all ELIXIR Platforms and Communities with a prioritization of training courses.

How will it be accomplished The training needs for each Platform and Community will be identified through dedicated surveys and in a limited number of dedicated meetings between the TP and representatives of the Platforms, Communities, Nodes, resource providers and Industry; such meetings could be co-located with the All Hands Meeting or other ELIXIR events. It is foreseen that the Training Platform Coordinator will execute this task.

Risks & opportunities The main risks are inadequate surveys or meaningless questions (surveys must be designed carefully) and failure to reach the right target audiences for those surveys. The lack of manpower for this task is another risk: we plan to assign this task to the Training Platform Coordinator and will evaluate after the first year whether it can be executed as foreseen in the plan. This task is a good opportunity to engage with ELIXIR Platforms, ELIXIR Communities, resource providers, Nodes and industrial partners (all ELIXIR users).

When will it be accomplished? Y2, Y5.

Subtask 2.2 Development of Training materials for identified gaps and for ELIXIR Services, Resources, and Communities In recent Implementation Studies (CWL, Beacons, and Bioschemas), the ELIXIR TP has allocated resources to support the development of training materials and to ensure their availability through TeSS. In the past, the TP has also organized hackathons where domain experts got together to produce training materials; two examples are the hackathons for metagenomics and Galaxy, whose training materials are now available in TeSS for anyone to use. For the topics identified as training gaps in Task 2.1, and for those prioritized directly by the ELIXIR Platforms and Communities, the TP aims to organise hackathons, BYOD and BYOL (Bring your own lesson) events to collaboratively develop training materials, to consolidate existing training materials according to best practices, and at the same time to empower Nodes to develop skills in preparing such materials. Some topics already prioritized are AAI, proteomics and metabolomics. The TP will then promote the use of the developed training materials in courses provided by the Nodes or by ELIXIR Platforms and Communities, organized in collaboration with the TP (Task 2.3). As indicated above, all hackathons will be co-productions. Considering the budget for this task, the TP will launch annual competitive calls for Platforms, Communities and Nodes to prioritise topics for a limited number of hackathons to develop training materials. In addition, the TP will participate in, and provide expertise to, other hackathons for developing training materials that are organized and funded by other Platforms, Communities or Nodes.

Outcome Training materials produced for topics identified in the gap analysis and/or by the Platforms or Communities. Training materials available in TeSS and tracking of their reuse available.

How will it be accomplished The TP will organize hackathons to gather the experts in the identified topics and together produce training materials. These materials should be available to the Nodes, Platforms and Communities, exposed in TeSS, and their usage promoted in dedicated courses and in Train the Trainer courses. Trainers will also be trained in delivering the content of the training materials.

Risks & opportunities The main risk is to ensure commitment (manpower and funding) of the Platforms and Communities to co-organize the hackathons.


When will it be accomplished? Y1, Y2, Y3, Y4, Y5

Subtask 2.3 Scaling up ELIXIR Training To be able to reach ELIXIR users in all Nodes and Communities we need to scale up ELIXIR Training, both by rolling out mature courses and by running courses on new topics (e.g. for those topics where training materials have been created in T2.2). Mature courses are, for example, those that have been successfully developed during EXCELERATE, such as ‘Genome assembly and annotation’, Galaxy, Chipster, or SWC/DC, to name a few. New topics such as biocuration, BioContainers, and Data Management & Data Stewardship are currently being considered in specific Implementation Studies, and the respective training materials are under development. For mature courses it is expected that Nodes and Communities can run these courses rather easily, with minimal support from the TP. For newly developed courses (e.g. those from T2.2) more support from the TP is needed. Because resources are limited, the training courses to be organized will be prioritized through competitive calls. Priority will be given to activities which can be co-funded by other Nodes, ELIXIR Platforms or Communities. The TP will collaborate with the other Platforms, Communities and Nodes to co-organize the courses selected through competitive calls. The Nodes, Platforms and Communities will contribute to this Task by bringing in their specific expertise and manpower related to a certain topic for the teaching part. The TP will support them with expertise in organizing courses, best practices, usage of the Training Toolkit (T1), etc. This approach will align the Node activities with the overarching goals of the ELIXIR training programme. It will also empower the Nodes to build and expand their national training portfolios and communities, and it will increase training capacity within the Nodes.
It is to be expected that in particular the Nodes that have Training included in their Node Service Delivery Plan (SDP) will contribute to this Task, but the TP offer is open to all Nodes who want to organize training.

It is the ambition that in due time, established courses, identified as core courses that are in constant need (such as AAI, Genome Assembly and Annotation, etc.), will continue to be run by the Nodes and evolve into a set of training resources, representing what we consider fundamental training for all European Life Scientists and could become part of future “ELIXIR Core Training Resources” (Task 2.4).

Outcome • New courses delivered based on the training materials developed in T2.2, following the T1 best practices and guidelines. All course deliveries will follow the quality and impact ELIXIR KPI and best practices guidelines (Toolkit, T1). • Mature courses delivered based on identified needs and with contribution of Platforms, Communities and Nodes.

How will it be accomplished The TP will collaborate with the other Platforms, Communities and Nodes to co-organize the courses that need to be rolled-out. The TP will launch a competitive call for Platforms, Communities and Nodes to select topics for a limited number of training courses. In addition, the TP will support running of new or mature courses which are organized and funded by other Platforms, Communities or Nodes.

Risks & opportunities The main risk is to ensure commitment of the Platforms, Nodes and Communities to invest their expertise, manpower and funding. The opportunities are to align the Node training activities with the overarching ELIXIR training programme goals, to empower the Nodes to provide high-quality training independently, and to extend the ELIXIR training programme with the active participation of Nodes.

When will it be accomplished? Y1, Y2, Y3, Y4, Y5.

Subtask 2.4 Define ELIXIR Core Training Resources ELIXIR and ELIXIR Nodes are delivering high-quality, timely training courses. In this task, the TP, together with the TrCG, will explore the creation of an ELIXIR Training Certificate. This will entail: (i) establishing what such a Certificate means; (ii) defining criteria for providing and awarding the Certificate; and (iii) identifying ‘training resources’ that could be recipients of such a Certificate, including key training topics, courses and ELIXIR resources/services/tools relevant for the ELIXIR Nodes and European Life Scientists. This approach would be similar to the process that was developed by the Data Platform for the identification of ELIXIR Core Data Resources, and will establish a mechanism for defining ELIXIR Core Training Resources and an ELIXIR Training Curriculum.

Outcome A proposal defining ELIXIR Core Training Resources, ELIXIR Training Certificate and an ELIXIR Curriculum.

How will it be accomplished The TP and TrCG will create a working group who will be in charge of defining these concepts.

Risks & opportunities • Risk: to ensure commitment of Nodes to compose the working group; the strategy developed to define ELIXIR Core Training Resources and an ELIXIR training curriculum may fail to provide meaningful definitions that would be adopted/accepted by all ELIXIR Nodes. • Opportunities: engage with Nodes, increase the visibility of ELIXIR and raise ELIXIR's standing among the top training stakeholders worldwide.

When will it be accomplished? Y3.

Milestones
M2.1 Training needs identified across representatives of Communities, Nodes, Platforms and Industry (Y1, Y3)
M2.2 Launch a competitive call for Platforms, Communities and Nodes to select topics for new courses to be developed (Y1, Y2, Y3, Y4, Y5)
M2.3 Hackathons to gather the experts and produce training materials (Y1, Y2, Y3, Y4, Y5)
M2.4 TP and TrCG to create a working group who will be in charge of defining the concept of ELIXIR Core Training Resources (Y1)



Deliverables
D2.1 Report on the training needs and gaps for all ELIXIR Platforms and Communities with a prioritization of training courses (Y2, Y5)
D2.2 Training materials produced for topics identified in the gap analysis and materials made available in TeSS to ensure tracking of their reuse (Y1, Y2, Y3, Y4, Y5)
D2.3 New and mature courses delivered based on identified needs and with contribution of Platforms, Communities and Nodes (Y1, Y2, Y3, Y4, Y5)

Task 3 Training Technical Infrastructure – TeSS The focus of this task is TeSS, the ELIXIR Training Platform’s flagship resource and the key reference training portal for ELIXIR; it is the principal channel of communication and aggregator of information about ELIXIR training events. TeSS began as an ELIXIR-UK pilot project prior to EXCELERATE, and has matured during EXCELERATE. As of July 2018, TeSS had displayed more than 8,000 training events, with a monthly average of at least 200 upcoming events; in addition, TeSS currently holds more than 1,000 training materials, aggregated from 51 training providers, of which 13 are ELIXIR Nodes. In the last 3 years, TeSS has been visited by around 13,000 unique users across Europe and beyond. This task has 4 main objectives: 1) to maintain and sustain the standard minimal features of TeSS, including critical dependencies with other ELIXIR registries and Nodes, with 3rd-party data and training content providers, etc.; 2) to develop TeSS (i.e. add functional capabilities); 3) to showcase, within TeSS, the training solutions available from all Nodes, both increasing their visibility and expanding the portfolio of the TP’s e-learning activities; and 4) to make the ELIXIR Training Handbook available through TeSS.

Subtask 3.1 Maintenance of TeSS Like any technical service, TeSS needs to be routinely maintained. The minimal activities to ensure the viability of the service include both basic sysadmin functions (security patches, back-ups, hosting and domain renewal, etc.) and support and maintenance of core features and services. These include: i) current automatic ingestions from content providers, to ensure that the information disseminated by TeSS is up-to-date; ii) user authorisation controls, to prevent predatory conference spam; iii) current widget code-base and services dependent on TeSS widgets, to sustain those dependencies; iv) integrations between TeSS and other registries (bio.tools, FAIRSharing, Impact platform, etc.) and 3rd-party services (e.g. Nominatim, Google Maps, Google Calendar, Bioportal); and v) the Bioschemas efforts, to ensure continued community work on interoperable training specifications. For 2020–2023, these activities are proposed to be carried out under the umbrella of the proposal for TeSS to become an ELIXIR Infrastructure Service. During 2019, it will be critical for the TP, working with the UK Node, to articulate the requirements that TeSS has for its routine maintenance to be funded as an Infrastructure Service, building on the following:

Outcome TeSS maintained as the flagship for the ELIXIR Training Platform and ELIXIR.

How will it be accomplished Routine sysadmin of TeSS, maintenance of integrations between TeSS & other registries (bio.tools, FAIRSharing, Impact platform, etc.), maintenance of integrations between TeSS & 3rd-party services; continued community work on interoperable training specifications with the Bioschemas group.

Risks & opportunities • Not being able to fully integrate with other ELIXIR resources (mitigation: develop decision-support tools to help curators decide which links to be integrated); dependencies on automatic scrapers, which break when source websites are no longer maintained (mitigation: provision of troubleshooting documents to help providers fix broken scrapers, and Bioschemas adoption to reduce broken scrapers in the long term). • To ensure integration with resources from the Interoperability, Tools and Compute Platforms, to reach out to ELIXIR users to make ELIXIR events, and training opportunities and materials discoverable.

When will it be accomplished? Y1 If TeSS does transition to an ELIXIR Infrastructure Service in late 2019, this will release funds allocated to the Training Platform, currently for the maintenance of TeSS, to do development work associated with TeSS or other work of priority to the Training Platform and ELIXIR.

Subtask 3.2 Future technical developments of TeSS It will be important to ensure that TeSS is adopted by all Nodes, and kept up to date to remain the key reference training portal for the ELIXIR TP and Community. In the building stages, several consultations took place to shape TeSS development, mainly with the TrCG and mostly in the format of a number of workshops. New developments will be guided by the feedback from the Training Coordinators, as well as from TeSS users participating in a series of UX activities to be launched in the last year of EXCELERATE and continued in the period 2020–2023. Several new developments and features are already envisioned:
• Develop new automatic ingestions for content providers outside ELIXIR and new Nodes joining ELIXIR.
• Implement features for providers to access dynamic reports on TeSS traffic for each of their registered resources.
• Implement features to automate discovery & ingestion of resources, searching websites for structured data containing training content. In particular, TeSS will be linked to the platform for collecting training statistics and impact data developed in T1.
• Develop new tools to automatically extract categorical information (topics, keywords, authors, difficulty levels, prerequisites, etc.) from TeSS content.
• Develop techniques to semi-automatically form associations between TeSS resources & those in other ELIXIR registries.
• Conduct and implement recommendations from a 2nd UX study.
• Integrate with other registries relevant to training:
–– containers registered in BioContainers, designed to offer the correct computer environment for a class
–– computational infrastructures that allow classes to run educational example code such as Dodona (ELIXIR-BE).
• Develop Node ‘shop windows’, displaying details of their upcoming events, their resources, specialisations, personnel, etc.
• Implement ‘calendar view’ to facilitate organisation of events & avoid scheduling conflicts – provide overviews of historic & upcoming events around target dates using event data in TeSS.
• Allow organisations wishing to set up a training portal to deploy an instance of TeSS. Modify codebase to make settings & styles easily configurable by developers wishing to set up training catalogues across different disciplines.
• Ensure TeSS and bio.tools are adequately crosslinked to ensure information about the tools (bio.tools), deployments (BioContainers, Bioconda), composition of tools into workflows and provision in online services (CWL, Galaxy), and the scientific and technical performance (OpenEbench) are available.
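As an illustration of the envisioned automated discovery of structured data containing training content, the following minimal Python sketch scans a web page for embedded schema.org JSON-LD blocks and keeps those that describe training content. It is not the TeSS implementation: the names `JsonLdExtractor`, `find_training_records`, `TRAINING_TYPES` and `SAMPLE_PAGE` are hypothetical, and the set of schema.org types treated as training content is an assumption made for this example only.

```python
import json
from html.parser import HTMLParser

# Assumption for this sketch: schema.org types treated as training content.
TRAINING_TYPES = {"Course", "CourseInstance", "Event", "LearningResource"}


class JsonLdExtractor(HTMLParser):
    """Collect parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # skip malformed markup instead of aborting the scan
            # A block may hold a single object or a list of objects.
            self.items.extend(parsed if isinstance(parsed, list) else [parsed])


def find_training_records(html):
    """Return the JSON-LD objects in `html` whose @type is a training type."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    records = []
    for item in parser.items:
        if not isinstance(item, dict):
            continue
        types = item.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if not isinstance(types, list):
            continue
        if TRAINING_TYPES.intersection(t for t in types if isinstance(t, str)):
            records.append(item)
    return records


# Hypothetical provider page with one training record and one unrelated record.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Course",
 "name": "Genome assembly and annotation"}
</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "name": "Example Node"}
</script>
</head><body></body></html>
"""

if __name__ == "__main__":
    for record in find_training_records(SAMPLE_PAGE):
        print(record["@type"], "-", record["name"])
```

A provider page that exposes such markup can be read without a site-specific scraper, which is the long-term benefit of Bioschemas adoption mentioned among the risks of Subtask 3.1.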

Subtask 3.3 Expand the e-learning portfolio of the ELIXIR Platform In the first ELIXIR Programme, the e-learning activities of the TP mostly focused on the ELIXIR e-Learning platform (EeLP), a Moodle-based environment run by ELIXIR-SI. Several courses on topics identified by the Nodes as training needs have been successfully implemented and run in a synchronous manner. In several Nodes (EBI, BE, CH, etc.), additional e-learning activities are taking place, using different Learning Management Systems and offering asynchronous e-learning modules on a large variety of subjects. This task will ensure that those modules become an integral part of the ELIXIR e-learning strategy, and are visible and easily findable in TeSS. In order to further expand its e-learning portfolio, the TP will also explore the possible extension of the ELIXIR e-learning activities to produce training videos for a selection of topics/courses identified in T2 and T4.

Outcome e-learning section in TeSS.

How will it be accomplished e-learning activities of the Nodes will be tagged as e-learning in TeSS, and a dedicated section in TeSS will be created to make these e-learning activities visible and findable in TeSS.

Risks & opportunities Risks – no clear definition of e-learning is used and users become confused by the information displayed in TeSS; the activities in EeLP are not accessible in TeSS. Opportunities – to engage with all ELIXIR Nodes, and to attract more users to TeSS.

When will it be accomplished? Y1



Subtask 3.4 Integrate the ELIXIR Training Handbook in TeSS The ELIXIR Training Handbook, described in Task 1.1 as the collection of resources for developing, delivering and evaluating training, should be made available through TeSS. This work broadly involves setting up a specific data-curation procedure, performing the data curation itself, and making some minimal developments to integrate the Handbook into TeSS, thus extending the functionalities of TeSS as the central landing page for all ELIXIR Training activities.

Outcome ELIXIR Training Handbook available in TeSS for all ELIXIR Nodes.

How will it be accomplished New developments will be made to integrate the ELIXIR Training Handbook in TeSS.

Risks & opportunities Opportunities – attract more Nodes to use TeSS systematically.

When will it be accomplished? Y1

Milestones
M3.1 Complete integrations between TeSS & other registries (bio.tools, FAIRSharing, platform for training statistics, quality and impact data) (Y1)
M3.2 Yearly release of the TeSS updates (Y1, Y2, Y3, Y4, Y5)
M3.3 Tag e-learning activities of the Nodes as e-learning in TeSS (Y1)
M3.4 Develop TeSS in order to integrate the ELIXIR Training Handbook (Y1)

Deliverables
D3.1 Report on maintenance of TeSS (Y1)
D3.2 Report on usage of TeSS by all ELIXIR Nodes (Y4)
D3.3 A new e-learning section in TeSS is established (Y1)
D3.4 ELIXIR Training Handbook draft available in TeSS for all ELIXIR Nodes (Y1)

Task 4 Training Capacity Building The ELIXIR network has grown in recent years, with new Nodes and Communities joining. Empowering the new ELIXIR members to grow their skills and capacities is necessary in order to maintain the quality of ELIXIR training, services and resources. The TP will continue to expand the Train the Trainer framework, which was developed in EXCELERATE, to increase the skills of ELIXIR trainers as well as the number of trainers across Nodes. The Nodes’ capacity in management and operations will be assessed and the appropriate measures will be taken to fill the identified training gaps.

Subtask 4.1 Train the Trainer As one of the flagship programmes of the TP, the TtT programme will continue to promote the development of materials and the delivery of courses. The impact of this programme will be measured, in collaboration with T1, by creating a dedicated set of metrics, which should include the number of new trainers trained, the number of events at which they have taught after participating in a TtT event, and whether this was facilitated by the ELIXIR Train the Trainers’ Exchange Programme, as well as new training partnerships established across Nodes and changes in trainers’ practices. During the first year we will also assess the impact of the ELIXIR Train the Trainers’ Exchange Programme and will provide a recommendation on whether this should be continued or not. We will also consolidate the best practices guidelines of the TtT programme and establish a mechanism for certifying ELIXIR trainers. By 2023, we aim to have a TtT coordinating point in each ELIXIR Node (which could, but does not have to be, the national TrC).

Outcome At least 4 TtT courses per year (leading to a growing number of trainers, courses and students across ELIXIR), trainers exchange among Nodes, training materials constantly improved, assessment of the ELIXIR Train the Trainers’ Exchange Programme.

How will it be accomplished The TP will organize the TtT events, and ELIXIR trainers will volunteer to teach in the TtT events. The Nodes involved in the task will contribute to the development of the training materials.

Risks & opportunities Risk – to ensure commitment of Nodes to volunteer to train in TtT courses. Opportunities – engage with Nodes and training community, improve training capacity in Nodes.

When will it be accomplished? Y1, Y2, Y3, Y4, Y5.

Subtask 4.2 Management and operations – Node development A gap analysis will be undertaken in 2019 to assess the needs of ELIXIR Node staff for training in management and operations. A working group will be formed to design and implement this analysis. According to the needs, we will evaluate existing programmes and provide recommendations on suitable training available to ELIXIR Nodes’ personnel. It is expected that training will not be provided by ELIXIR, which does not have such expertise, but we will look for external providers. Once suitable training is identified, training resources on management and operations will be made available through TeSS, where possible. In 2018, ELIXIR sponsored a team formed by personnel involved in the coordination of ELIXIR Nodes (e.g. Heads of Nodes, Technical, Training and Node Coordinators) to participate in the RItrain Programme on Training in the Management of Research Infrastructures. The TP will assess whether the experience is successful and scalable, and will provide a recommendation to ELIXIR on the continuation of this training.

Outcome Report on the assessment of the needs in training of management and operations across Nodes, recommendations on suitable training available to ELIXIR Nodes’ personnel.

How will it be accomplished A working group will be formed to design and implement the needs analysis, and be responsible for producing the assessment and recommendations reports.

Risks & opportunities • Risk – to ensure commitment of Nodes to compose the working group; strategy developed for assessing training needs fails to provide a meaningful measure of such needs. • Opportunities – engage with Nodes and empower them to build intra-Node capacity.

When will it be accomplished? Y1.

Milestones
M4.1 TP to organize at least 4 TtT events per year. ELIXIR trainers will volunteer to teach in the TtT events. The Nodes involved in the task will contribute to the development of the training materials. (Y1, Y2, Y3, Y4, Y5)
M4.2 Form a working group to design and implement the needs analysis, and be responsible for producing the assessment and recommendations reports (Y1)

Deliverables
D4.1 Deliver at least 4 Train the Trainer courses per year (Y1, Y2, Y3, Y4, Y5)
D4.2 Report on the assessment of the needs in training of management and operations across Nodes, recommendations on suitable training available to ELIXIR Nodes’ personnel (Y1)

Other important strategic areas for the Training Platform This strategic task is currently unfunded in the Training Platform 5-year plan but is still regarded as important. Therefore, resources may be sought via Community-led Implementation Studies or external funding.

Plans for FAIR Training Working Group The FAIR principles are slowly but surely being widely adopted by the scientific community, and their implications for training need to be identified. A first wide discussion about ELIXIR Training & FAIR took place at the workshop “How to make training FAIR” held at the All Hands Meeting in Berlin, where several actions were identified, including the need to establish a working group with members both from ELIXIR and external to ELIXIR. This working group should be the forum to discuss the challenges, issues, and needed developments for FAIR training. It will prepare the next steps and a roadmap to ensure FAIRness of ELIXIR training, as well as facilitate fruitful and long-lasting partnerships.



Communities

Communities are ELIXIR’s means to capture user needs into formal requirements, driving the development of standards, tools and workflows, and to enable alignment between ELIXIR and other Research Infrastructures. They drive the development of the ELIXIR Platforms by providing use cases for strategically important application areas.

Communities Coordinator John Hancock (ELIXIR Hub)

Scope and ambition The ELIXIR Communities will, by their breadth of focus and relevance to strategic scientific goals, drive the evolution of ELIXIR over the 2019–23 Programme period. They will do this by capturing the requirements of important communities with a need to make use of and advance ELIXIR’s infrastructure and by cementing links between the communities and with the ELIXIR Platforms. ELIXIR recognises different kinds of user communities: those dealing with a specific research area (e.g. rare heritable diseases), those dealing with a major technology (e.g. proteomics), and those providing specialist user support (e.g. Galaxy). ELIXIR Communities are broad-based across ELIXIR Nodes and interact across ELIXIR Platforms, enabling them to recommend standard solutions such as transferable workflows using standardised components and containers, or collaborations to access ELIXIR-associated clouds. Through ELIXIR, the expert communities also have the opportunity to engage in and drive global standardisation efforts, e.g. via ELIXIR’s collaboration with GA4GH. Robust services to securely manage, discover, share and analyse access-controlled data for human genomics and translational research are integral to the success of ELIXIR. Through the newly developed ‘Human Data Communities’ umbrella, which covers the Human Data, Rare Diseases, and the proposed Human Copy Number Variation Communities, ELIXIR will provide a joined-up solution of Data, Tools, Compute and Interoperability services for access-controlled human data to the different translational research communities. As well as playing a critical role in the evolution of ELIXIR, the ELIXIR Communities facilitate formal collaboration with other BMS RIs, reflecting the many existing collaborations between expert staff in our Nodes and colleagues in the national Nodes of other research infrastructures. Finally, the Communities allow ELIXIR to collaborate with research communities in Nodes engaged with research topics that did not form part of our portfolio of Use Cases in the first ELIXIR Programme.

ELIXIR Communities are expected to make a long-term contribution to the evolution of ELIXIR, but we understand that they are likely to have a life cycle, with some being retired when they no longer make an active contribution to ELIXIR, and with the option to merge or split Communities if appropriate. A review of the Communities portfolio will take place every two years, scheduled for 2020 and 2022. The ELIXIR Communities will form a central plank of ELIXIR activities and will be well-integrated with the ELIXIR Platforms. We expect to expand the portfolio of ELIXIR Communities during the Programme to fill any strategic gaps we identify.

Our Communities are typically well-established groupings with a significant critical mass (ELIXIR partners with established communities; we do not develop Communities de novo). The incorporation of new Communities into ELIXIR follows formal calls for Expressions of Interest (EoIs), which lead to EoIs being prioritised for further action by the HoN (see Figure 5). Following prioritisation, selected new Communities will receive the support they need to hold a face-to-face (F2F) meeting and to develop a roadmap, to be published in F1000R. Once a roadmap has been developed, it will be evaluated by the HoN and, if agreed, the new Community will become part of ELIXIR. It will then be eligible to receive Implementation Study funding to begin its integration into ELIXIR activities and to enable linking with ELIXIR Platforms. Formally accepted Communities are also eligible to lead or participate in Community-led Implementation Studies. These will be driven by Communities’ needs and will additionally involve two Platforms as participants. The resource for these Implementation Studies will be allocated via a regular Request for Proposals (RFP) mechanism, the expected timetable for which is illustrated in the timeline on the next page. ELIXIR’s ongoing support of Community activities also provides support for meetings (typically annually).

This outcome has two components: the enlargement of the Communities portfolio, and the development and implementation of processes to support the integration of the Communities and Platforms.

• A combination of bottom-up and top-down processes will be used to bring new Communities into the portfolio. The portfolio will be reviewed at the mid-term of the Programme to evaluate its appropriateness, identify potential new Communities from a strategic perspective, and re-shape the portfolio if necessary.
• RFP calls will be used to fund scientifically excellent Implementation Studies, led by Communities and integrating Platforms.

Risks We do not have an appropriate portfolio of Communities to respond to significant funding calls, e.g. from the EC. The strategic review is intended to address this.

Opportunities Closer linking of Communities and Platforms; closer links to other RIs; keeping ELIXIR at the cutting edge of computational requirements.

[Figure 5 flowchart: IDEA → Expression of interest → Approval by HoNs → Hold inaugural face-to-face meeting (possible attendance by Platform Coordinators and Platform Leaders) → F1000 paper to describe plans for the community → Approval by HoNs, Implementation study to establish the community → Mature community: annual meetings, eligible for RFPs]

Figure 5: Process of establishing ELIXIR Communities



Timeline of the planned development of ELIXIR Communities and indicative timing for RFPs for Community-led Implementation Studies
• 2017 Q4 Review: selection of communities (aligned with RFP 1)
• 2018 Q4 RFP1. Call for RFP open (Selected projects duration June 2019 – May 2021)
• 2020 Q2 Review: roadmap of Communities and possibly strategic addition of new Community/ies
• 2020 Q4 RFP2. Call for RFP open (Selected projects duration June 2021 – May 2023)
• 2021 Q2 Mid-term review (MTR): Review all Communities as part of the MTR of the Programme
• 2020 Q4 Review: Update roadmap (where appropriate, based on MTR) to prepare for next Programme (2024–2028)
• 2023 Q4 RFP3. Possibility for a third RFP to bridge to and prepare for the next Programme

Figure 6: Planned development of ELIXIR Communities

Table 7: ELIXIR Communities at the start of the 2019–23 Programme

Community | Focus
Federated Human Data | Develops long-term strategies for managing and accessing sensitive human data
Rare Diseases | Supports the development of new therapies for rare diseases
Marine Metagenomics | Develops a sustainable metagenomics infrastructure to nurture research and innovation in the marine domain
Plant Sciences | Develops an infrastructure to facilitate genotype-phenotype analysis for crop and tree species
Galaxy | Monitors and fosters the use of Galaxy in ELIXIR
Metabolomics | Identifies and addresses the principal challenges within metabolomics within the scope and mission of ELIXIR
Proteomics | Develops and maintains sustainable proteomics tools and data resources

Human Data Communities

The ELIXIR Human Data Communities (HDCs) aim to create a mechanism where ELIXIR Nodes, Communities, and key projects can co-develop a long-term European strategy for sharing sensitive human research data based on a set of flexible, production-quality and sustainable components. This is not intended to detract from the identities of the participating communities, but instead to provide a forum for coordination and communication across these powerful entities. It is important that the HDCs maintain the flexibility and agility to remain dynamic over time, in order to construct and operate a sustainable infrastructure for Human Data in Europe that supports life science research and its translation into medicine at scale. It is critically important to ensure that data that can be shared will be shared, responsibly and in line with the General Data Protection Regulation and ethical policies, to preserve the trust given by people participating in research and volunteering their data.

Head of Human Genomics and Translational Data Serena Scollen (ELIXIR Hub)

Human Data Coordinator Gary Saunders (ELIXIR Hub)

The ELIXIR Human Data Communities (HDCs) are composed of the Federated Human Data and Rare Diseases Communities. The HDCs were identified as a critical point of investment early in the ELIXIR 2014–18 Programme. Consequently, the HDCs are the only Communities with dedicated staff employed at the Hub, and their plans for the 2019–23 Programme are therefore detailed below.

Scope and ambition Challenge Genomic technologies have advanced over the last decade to the extent that the generation of sequence data, even whole-genome sequence data, is no longer prohibited by cost and time. This has led to a significant increase in the generation of genomics data by research communities and healthcare institutes. Indeed, it is projected that most DNA sequencing in the future is likely to be generated for healthcare and not directly for research, and that by 2025, we will have a virtual cohort of genomics data from millions of human participants. However, neither the scientific nor medical communities are on a path to use these data effectively. Currently, human genomics data are geographically distributed and difficult to find, often stored in project-specific databases. In addition, an infrastructure to provide secure access to sensitive human data has not been developed, and the inconsistent annotation of data between datasets blocks their aggregation, presenting a major hurdle for the integration, knowledge mining, and advanced modelling of human genomic data. There is, all too often, little opportunity for the reuse of data outside of, or beyond, the core project. Furthermore, scientists are frequently inclined not to share or archive data, for a variety of reasons, such as to maintain a research advantage, and owing to issues relating to consent, proprietary constraints, time, cost, and possible attempts to identify participants.

By 2023
• The HDCs will be established as the key mechanism by which ELIXIR Nodes and Communities can co-develop the long-term European strategy for the sharing of sensitive human data, whilst enabling alignment and input into global and national requirements and projects.
• The HDCs will establish of funding and a strategic workflow to secure additional ELIXIR and external funding for novel and innovative projects, and will position the HDCs to continue their role as a global lead of infrastructure development to share human data, leading to the translation of genomics research into medicine.

[Figure: Overview of the ELIXIR Human Data Communities. Labels: ELIXIR Platforms; ELIXIR Node services; Key partners; Nodes projects and EU grants; Flagship projects (e.g. EJP-RD, FAIR+, EOSC-Life, CINECA); EXCELERATE; Federated Human Data; 2019–23 Goal: Develop a long-term strategy for sharing sensitive human data]

Figure 6: ELIXIR Human Data Communities and their interactions within and outside ELIXIR.

Opportunities

The HDCs aim to develop a framework to facilitate the secure discovery, dissemination, archiving and analysis of human data (thereby increasing FAIR data services). Increased opportunities to reuse data will increase the rate of scientific discovery and validation, leading to greater impact in the scientific and medical/healthcare fields. A secure authentication and authorisation process is essential to enable users to access human data without compromising privacy and/or informed consent. Incentives, procedures and standards need to be developed so that the human data that can be shared will be shared, responsibly and in line with GDPR and ethical policies. By working as an integrated community alongside global and national initiatives (such as GA4GH, the Nordic Tryggve Project, national biobanks and national sequencing projects), the HDCs will be able to demonstrate how the discoverability, access, and analysis of genomic and linked phenotypic data at a scale of millions of participants will drive the translation of genomics research into medicine.

General objective

The overall objective is to establish a robust, European-level infrastructure (standards, tools and services) that enables human data research communities and healthcare institutes to discover, access, manage and re-use human data to advance genomics and health. Providing a sustainable infrastructure for users that manages data identifiers, secure data archiving and access, and ensures mappings between resources will enable long-term, cost-effective data management and will drive “standards as the default” across the European life science and health data landscape. The work will be carried out across five groups (see Table 8).

Federated Human Data

Recently, European countries signed an EU Declaration to sequence and share transnationally at least 1M human genomes by 2022.[53] This initiative will catalyse the transition of genomics from bench to bedside in Europe. We envisage that a significant subset of these data will be made available for secondary research. In the Strategic Implementation Study on Federated Human Data, we aim to provide the necessary infrastructure to meet the aims of the Declaration by coordinating the delivery of FAIR-compliant metadata standards, interfaces, and a reference implementation to support the federated ELIXIR network of human data resources. Our vision is to provide secure, standardized, documented and interoperable services under the framework of the European Genome-phenome Archive (EGA). The three-year plan, 2019–21, includes a structured roadmap for ELIXIR Nodes to join the EGA federated network, supported by the necessary technical, logistical, and training coordination across the network. By the end of 2021, the overall goal for the ELIXIR Federated Human Data Community is to create a federated ecosystem of interoperable services that enables population-scale genomic and biomolecular data to be accessible across international borders, accelerating research and improving the health of individuals resident across Europe.

53. Declaration of Cooperation: Towards access to at least 1 million sequenced genomes in the European Union by 2022, http://ec.europa.eu/newsroom/dae/document.cfm?doc_id=50964

This project builds on earlier work in the ELIXIR-EXCELERATE, CORBEL and Tryggve projects. It will be led by the European Genome-phenome Archive to ensure that the work described in this proposal is aligned with the policies, legal agreements, and governance model for establishing the Federated EGA. A specific work package will build on work in EXCELERATE WP9 to create a reference software implementation, the Local EGA, that Nodes can use to operate their federated Node. Having established a federated network to share sensitive human data across the ELIXIR Nodes by the end of 2021, the Community will be ready to focus on the interoperable analysis of these data in the remaining two years of the project (2022–23).

Table 8: ELIXIR Human Data Communities groups and their objectives

Group | Aim
Federated Human Data Community | Sensitive human data will be internationally accessible (e.g. through the federated EGA)
Rare Diseases Community | Rare disease resources will be networked and analyses made reproducible
ELIXIR-Beacon | The Beacon will be a networked framework that supports queries on other types of genomic variation data and offers options for quantitative and qualitative responses
New/emerging Human Data Communities | Expand the breadth of focus across the collective Human Data Communities in alignment with the ELIXIR 2019–23 Programme goals
National and Global Initiatives & Key European Commission funded projects | Alignment to key projects and initiatives to demonstrate ELIXIR added value, access best practice, contribute to developing infrastructure, and to link to, as well as input into, GA4GH standards and national infrastructure

Rare Diseases

In 2017, the International Rare Diseases Research Consortium (IRDiRC) announced its 10-year vision for Rare Diseases (RD) research (2017–2027): to enable all people living with an RD to receive an accurate diagnosis, care, and available therapy within one year of coming to medical attention. It is critical that the ELIXIR RD Community aligns its strategy and specific objectives for adapting and adopting ELIXIR infrastructure with these IRDiRC goals, in order to support the RD community in meeting its challenges. The ELIXIR RD infrastructure will underpin the required increase in the efficient and effective use of data in the RD domain. An important goal of the RD user community is to interconnect the available European (and international) RD infrastructure with the general ELIXIR infrastructure, aligning with the ELIXIR Platforms on common solutions, and building upon the services provided by ELIXIR Nodes. In addition, the Community will evaluate models for the long-term sustainability of key European infrastructures dedicated to RD research and provide a multifunctional RD training environment for RD projects and researchers.

The Strategic Implementation Study during 2019–21 will provide foundational infrastructure building blocks for the RD community, building on ELIXIR-supporting infrastructure and objectives, organised in three pillars:

• Analysis of ‘-omics’ data
• Ensuring data are findable, accessible, interoperable and re-usable (FAIR)
• Training

For the first pillar, RD research has benefited from the development of new molecular biology techniques and methodologies provided by -omics approaches. With support from ELIXIR, BBMRI, and others, the RD-Connect platform[54] is emerging as one of the main European online platforms for RD research, complementing the RD information hub Orphanet.[55] It currently supports analysis of harmonized genome data through the Genome-Phenome Analysis Platform (GPAP)[56] and provides access to a range of resources such as the Biobank and Registry Finder,[57] the RD Biosample Catalogue[58] and other tools already listed in ELIXIR bio.tools.[59] Many of the RD-Connect resources, including the GPAP, require complex computational and integrative bioinformatics strategies and infrastructures to facilitate analysis and interpretation of data. This proposal will support the RD community by aligning and securely interconnecting existing international infrastructures (RD-Connect, EGA, and TranSMART) with the general ELIXIR infrastructure. Tasks will build upon services provided by ELIXIR Nodes and international standards such as those from the Global Alliance for Genomics and Health (GA4GH).

For the second pillar, ‘FAIR at source’, the RD community, with support from ELIXIR and ELIXIR-EXCELERATE, has been at the forefront of developing and applying FAIR principles. Indeed, the FAIR guiding principles have been accepted as an IRDiRC recognised resource,[60] and pilots are running to fine-tune FAIRification protocols and tooling through the cross-project rare disease data linkage plan (RDDLP), which links the ELIXIR Rare Disease Use Case and the RD community with the ELIXIR Interoperability Platform and services. An important challenge for the RD community is now to scale up FAIRification and define specific RD evaluation metrics. This proposal will build on the FAIR metrics described by Wilkinson et al.[61] to establish a framework for RD data FAIRification by providing RD-specific recommendations and FAIRness scores. This framework will be evaluated by estimating the FAIRness of key RD resources such as Orphanet[62] and the RD-Connect platform.[63] Once defined, the RD-adapted FAIR metrics will be prepared for IRDiRC recognition to serve the entire RD community.

Finally, for the third pillar, training, RD research has benefited from multiple training activities organised by or in collaboration with ELIXIR-EXCELERATE and the ELIXIR Rare Disease Use Case. However, the RD community is missing a tailored digital environment for the collection, sharing and reuse of the training materials produced. This proposal aims to bring the ELIXIR training infrastructure to the RD community through the integration of RD training materials into the TeSS[64] environment. We will start developing this RD digital environment using training materials from courses and workshops organised to promote the FAIRification of Rare Disease registries. This procedure entails creating a framework for organising, structuring, and annotating training material for easy access and deployment in training events.

This work on the FAIRification of Rare Diseases resources during 2019–21 will allow the Community to focus on the interoperability and update of these services in the two remaining years of the Programme (2022–23).

54. http://rd-connect.eu/

ELIXIR-Beacons for human data

The 2018 Beacon Implementation Study established the ELIXIR Beacons as a GA4GH Driver Project with full alignment to the GA4GH Technical Work Streams, and consolidated the process of “Lighting a Beacon” for any ELIXIR Node, based on the ELIXIR reference implementation. Additionally, this Implementation Study designed and prototyped Beacon Network interoperability, while evaluating related security concerns and preparing for future scalability. It also started to evaluate how to procure the ELIXIR Beacon Network and the Node Beacons as an ELIXIR Infrastructure Service.

During 2019–21 the main aims of the Strategic Implementation Study on ELIXIR Beacons for human data are to:

• Extend the Beacon protocol to become the reference ELIXIR Data Discovery product, by expanding query options and providing richer responses, with a view to biomedical applications, and in alignment with developing GA4GH standards
• Deliver the ELIXIR Beacon Network as an established ELIXIR service
• Leverage ELIXIR Nodes to increase data flow through Beacon services
• Actively support the integration of the Beacon API with human data resources throughout ELIXIR, with a particular view to National Genomics Initiatives, biobanks, and Human Data Communities (such as Rare Diseases and Human CNV)

To achieve these goals, it is imperative to continue a strategic partnership with the GA4GH Work Streams as a Driver Project and to increase coordination with the ELIXIR Platforms. The focus is to prioritise and deliver future European requirements on Beacon and Beacon Network API development, continue to develop the overall security framework for this service type, and contribute to the GA4GH Discovery Work Stream goals towards a global genomic query language. We also believe that the expectations for true interoperability between the ELIXIR and GA4GH services will increase during the project lifetime. The ELIXIR Beacon Driver Project status within GA4GH will permit this service to become the testbed for interoperability, and hence the coordination between ELIXIR AAI and the GA4GH Data Use and Researcher Identities Work Stream will be important for the ELIXIR Beacon project.

By the end of 2021, we will have a Beacon Reference Implementation with an updated GA4GH specification v.2.0 that supports enhanced metadata response types and query interfaces. The project will also develop and deploy the Beacon and Beacon Network to address more real-world use cases, for example, through use in biomedical genomic discovery applications. This will ensure that the Beacon, Registry, and Network APIs evolve and expand to address the needs of both scientific research and clinical research users. The Programme mid-term review in 2021 will provide ELIXIR with an opportunity to assess the outcomes of the Beacon project and plan the next steps for the remaining two years of the Programme (2022–23).
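As an illustration only, the Beacon idea described above (a service that answers whether a given variant has been observed, optionally with a richer response such as a count, and a network that forwards one query to several Beacons) can be sketched in a few lines of Python. This is a toy in-memory model, not the GA4GH Beacon specification and not the ELIXIR reference implementation: the names `Variant`, `ToyBeacon`, `query_network`, the `granularity` parameter, the response fields and all data are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A single-allele query; the coordinate convention is an assumption."""
    chrom: str
    pos: int
    ref: str
    alt: str


class ToyBeacon:
    """Hypothetical stand-in for one Node's Beacon, backed by a dictionary."""

    def __init__(self, name, observations):
        # observations maps each Variant to the number of carriers seen.
        self.name = name
        self._counts = dict(observations)

    def query(self, variant, granularity="boolean"):
        # Qualitative answer: has the variant been observed at all?
        count = self._counts.get(variant, 0)
        response = {"beacon": self.name, "exists": count > 0}
        # Quantitative answer: only added when explicitly requested.
        if granularity == "count":
            response["count"] = count
        return response


def query_network(beacons, variant, granularity="boolean"):
    """Send the same query to every Beacon and collect the responses."""
    return [beacon.query(variant, granularity) for beacon in beacons]


# Hypothetical variant and Nodes, for illustration only.
variant = Variant("17", 43045712, "G", "A")
node_a = ToyBeacon("node-a", {variant: 3})
node_b = ToyBeacon("node-b", {})
print(query_network([node_a, node_b], variant, granularity="count"))
```

In this sketch the default response reveals only presence or absence, and the count is opt-in. A real deployment would additionally tie the level of detail to the authentication and authorisation of the requester, which this toy model does not attempt.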

New/emerging Human Data Communities

As ELIXIR becomes larger and more complex, the Communities facilitate the broadening of the range of its application domains. The ability to remain dynamic over time is therefore critical to the success of the HDCs. During 2019–23 the number, breadth, and cross-linkage of the HDCs will grow in order to reflect ELIXIR’s organisational maturation. In addition to the two existing Communities (Federated Human Data and Rare Diseases), the HDCs shall support the provisionally accepted emerging Community, Human Copy Number Variation, which is due to complete official ELIXIR endorsement during 2019. Additionally, the HDCs shall work to coordinate proposals for the next ELIXIR Communities call, which is expected during 2020; it is envisaged that at least two successful applications for new HDCs will come out of that call. Although each of the constituent Communities within the HDCs will be a well-established entity with a number of funding streams, the key to success will be cross-community development, coordination and communication. This cross-community collaboration will result in the identification of a set of key ELIXIR services that must be supported and maintained to meet the challenges of sharing and analysing human data at scale.

55. https://www.orpha.net/consor/cgi-bin/index.php
56. https://rd-connect.eu/analysis-platform
57. http://catalogue.rd-connect.eu
58. https://rd-connect.eu/biosamples-data/sample-catalogue
59. https://bio.tools
60. www.irdirc.org/activities/irdirc-recognized-resources
61. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6018520
62. https://www.orpha.net/consor/cgi-bin/index.php
63. https://platform.rd-connect.eu/



National and Global Initiatives & key European Commission funded projects

Recent years have seen the emergence of many large cohorts of human samples from research and national healthcare initiatives. A number of ELIXIR member countries have nascent personalised medicine programmes (see also Table 1 on page 9), meaning that human genomics is undergoing a step change from being a predominantly research-driven activity to one funded through healthcare. For example, France Médecine Génomique 2025 has an investment of €670 million over the first five years to sequence 235,000 genomes annually by 2020. Another example, FinnGen, plans to analyse up to 500,000 unique blood samples collected by a nation-wide network of Finnish biobanks, with the goal of increasing understanding of the origins of diseases and their treatment. Furthermore, in April 2018, the health ministries of a number of European Union member states (including countries that host ELIXIR Nodes) signed a joint declaration committing to sequence and share at least one million human genomes by 2022. Establishing and building upon the touch points that ELIXIR Nodes already have, and potentially coordinating the infrastructure and pilot projects required to ensure that data are FAIR wherever possible, will be important during this shift in the genomics field. In addition, during 2019–23, each ELIXIR Node is likely to be involved in a number of independent, Horizon 2020 European Commission funded projects. Moreover, Human Data Communities members will continue to collaborate through EC funded projects such as Neuromics, Eurenomics, Eurobiobank, CINECA, EOSC-Life, RD-Connect, ELIXIR-EXCELERATE, EJP-RD, Tryggve, and Solve-RD.


Working together, the HDCs will build and strengthen collaboration; align Node infrastructures; promote the reuse of existing components; collaborate with GA4GH to input into and disseminate standards; jointly build on existing services and identify gaps and needs for new ones; and generate coordinated funding applications in order to carry out this work. To avoid unnecessarily disparate efforts, it will be key for the ELIXIR HDCs to communicate with National and Global Initiatives and projects. Specifically, during 2019–23 the HDCs will continue to coordinate and promote monthly teleconference meetings of National Initiative and key EC funded project representatives, plus one annual face-to-face meeting, in order to meet the challenges of working internationally with sensitive human data. Furthermore, the HDCs will continue to promote publication of relevant material in our ELIXIR F1000R channel, annually sponsor a minimum of two international workshops, and promote the addition of training materials to our Training eSupport System (TeSS). Throughout 2019–23, the HDCs will develop a greater understanding of the requirements of National Initiatives and projects in order to co-develop a long-term strategy for the development of infrastructures for data sharing, which will result in expedited translation of genomics into medicine.

Interactions between Platforms, Communities and Nodes

Interaction and coordination among individual Platforms, and between Platforms and other ELIXIR groups such as Communities and Nodes, is a key theme in the 2019–23 Programme, and provides the foundation for service uptake, use and impact by users in academia and industry. The aim is to build on the success of the individual Platforms that was realised in the first Programme by creating a more scientifically and technically cohesive plan, with projects that arise from multiple Platforms and Communities and that directly deliver value to bioinformatics users. In the 2014–18 Programme, this interaction was built into the EXCELERATE project; going forward, strategic drivers need to be established to ensure that such interactions happen, because ELIXIR activities will be funded from a portfolio of grants and other sources. Hence, in the 2019–23 Programme cross-Platform interactions are engineered through a variety of funding and organisational strategies. Operationally, for instance, a number of Inter- and Intra-Platform activities will drive the alignment of services, such as clouds with workflows and reference data provision. There will be a strict requirement for the Implementation Studies proposed by Communities to include at least two Platforms. Organisationally, a rota of Platform All ExCo meetings, with agendas focussed on cross-Platform interactions and on building the relationships between Platforms, Communities and Nodes, will drive the strategic alignment of the individual Platform roadmaps.

In addition, operating efficiently across the Platforms, Communities and Nodes will directly support some of ELIXIR’s key strategic drivers, as encapsulated by the proposed set of Strategic Implementation Studies. For instance, ELIXIR can only realise the required ecosystem of Services that support secure but FAIR Human Data through a coordinated approach that involves not only the Human Data Communities but also the Compute and Interoperability Platforms. Similarly, if ELIXIR is to realise an integrated suite of interoperable and standardised Registries across the life science domain, then all Platforms will need to be involved, since all Platforms have at least one Registry as a key infrastructural component. The same argument applies to most of the projects proposed for strategic investment, e.g. Deployment of Containers, Bioschemas, Data management plans, etc. To this end, the Hub proposes to take a proactive role in coordinating some of these key projects, via the Platform coordinators, to ensure that the strategic direction across Platforms is on track. The overall vision for the Platforms by 2023 will therefore be to realise a portfolio of key Infrastructure Services, built on the needs of the Communities, and realised through a coordinated set of Strategic investments that bring together the Nodes and the Platforms.



Cross-Platform strategic priorities

As ELIXIR has evolved, it has identified a number of areas that cut across Platforms and that represent topics of particular strategic importance. The most mature of these is the Human Data Communities (HDCs, described in detail above), which is now the subject of allocated headcount at the Hub and receives significant ELIXIR and EC project funding. Whilst not currently planned, it is possible that in the 2019–23 Programme other current Communities or other strategic initiatives (such as Plant sciences, Marine metagenomics or Data Management) could evolve into a sufficiently mature position to operate in a fashion analogous to that of the HDCs. In addition, as the organisation has matured, several other areas of strategic interest have emerged, not directly aligned to the Communities, which will also be areas of focus moving forward. This focus could take the form either of direct funding or of investment of time by ELIXIR staff to develop the necessary thinking and collaborative alignments. Through these mechanisms, ELIXIR can provide focussed, enabling funds that advance a strategic area and that also drive its adoption across the Nodes. During the 2019–2023 Programme, we plan to focus on the following through direct funding of activities in the Nodes via Implementation Studies:

Bioschemas

Bioschemas is a community initiative supported by ELIXIR that aims to improve the annotation of resources with structured metadata, using Schema.org markup, to enhance discoverability in the life sciences. The goals of Bioschemas in 2019–2023 are: to achieve wide adoption of Bioschemas markup across bioinformatics resources; to build applications that use Bioschemas metadata; and to establish the community with a sustainable governance and funding model. The planned work to support Bioschemas will be funded through a Strategic Implementation Study, starting in mid-2019. One aspect of the planned study will be to help lay the foundation of the activities needed for Bioschemas to become a “spun-out” independent and self-sustaining initiative.
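To make the idea of Schema.org markup concrete, the following minimal Python sketch builds a JSON-LD description of a dataset and wraps it in the script element in which such markup is normally embedded in a web page. It is an illustration under stated assumptions, not an official Bioschemas profile: the choice of properties is illustrative, the helper names `dataset_markup` and `as_script_tag` are invented for this sketch, and the resource described is a placeholder.

```python
import json


def dataset_markup(name, description, url, identifier, keywords):
    """Build a Schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
        "keywords": keywords,
    }


def as_script_tag(markup):
    """Wrap the JSON-LD in the script element used to embed it in HTML."""
    body = json.dumps(markup, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical resource: every value below is a placeholder.
example = dataset_markup(
    name="Example variant dataset",
    description="Illustrative record, not a real ELIXIR resource.",
    url="https://example.org/datasets/42",
    identifier="example:42",
    keywords=["genomics", "example"],
)
print(as_script_tag(example))
```

Because the markup is ordinary structured data on the resource's own page, search engines and registries can harvest it without a resource-specific API, which is the discoverability gain that the paragraph above refers to.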


Registries

Registries (also known as metadata catalogues, metadata registries or metadata indexes) are databases of curated metadata descriptions. They facilitate the discovery of scattered digital resources (databases, datasets, publications, software, services, standards, etc.). Registries collect the necessary metadata to help users to find digital resources and direct them to the original source of information. The goal of the ELIXIR Registries working group in 2019–2023 is to implement a common registry strategy that interlinks registries to drive findability and contextualisation, and to exploit registries and their applications. The major topics identified by the ELIXIR registries strategy are Data, Tools and Community. “Data” includes work to improve the content consistency of the data registries, a plan to better connect the content of our registries, as well as actions to improve the description and registration of metadata. “Tools” is about the common development of tools and standards that support the operation of our registries. It includes the development of registry components that could be reused across registries, as well as within third party services, and a plan to better harmonise our interfaces, including the adoption of guidelines and best practices. “Community” focuses on knowledge exchange, community building, training and promotion, including a task targeting how to support the sustainability of the ELIXIR ecosystem of registries. The work to support Registries is planned to be funded through a Strategic Implementation Study, starting in mid-2019.
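The role of a registry described above (hold curated metadata, help users find a resource, then direct them to its original source) can be sketched as a small in-memory lookup. This is a hedged illustration, not the data model of any ELIXIR registry: the `RegistryEntry` fields, the `find` function and all entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RegistryEntry:
    """One curated metadata record; the fields are assumptions of this sketch."""
    name: str
    resource_type: str   # e.g. "database" or "software"
    description: str
    source_url: str      # where the resource itself lives


def find(entries, term, resource_type=None):
    """Return entries whose name or description mentions the search term."""
    term = term.lower()
    hits = []
    for entry in entries:
        if resource_type is not None and entry.resource_type != resource_type:
            continue
        if term in entry.name.lower() or term in entry.description.lower():
            hits.append(entry)
    return hits


# Hypothetical registry content, for illustration only.
REGISTRY = [
    RegistryEntry("Example Variant Archive", "database",
                  "Hypothetical archive of human variant data",
                  "https://example.org/variants"),
    RegistryEntry("Example Aligner", "software",
                  "Hypothetical read alignment tool",
                  "https://example.org/aligner"),
    RegistryEntry("Example Variant Caller", "software",
                  "Hypothetical tool that calls variants from reads",
                  "https://example.org/caller"),
]

for hit in find(REGISTRY, "variant", resource_type="software"):
    # The registry stores only the description; the user is sent to the source.
    print(hit.name, "->", hit.source_url)
```

Keeping the same small set of fields across registries is one way to picture the "content consistency" and interlinking goals in the text: records that share a structure can be searched and connected with the same code.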

Transnational access to containers

Software containers have rapidly evolved to become a standard way of deploying not only the software tools used and/or developed by researchers, but also the crucial environment and libraries utilised by these tools. Typically, only technical specialists can easily redeploy research-oriented software in their own workflows. Containers simplify this process to allow wider reuse. To benefit from container functionality, users typically need access to large-scale compute resources and the availability of key datasets, such as the Core Data Resources, which will be provided via ELIXIR. The ELIXIR Compute Platform therefore intends to pilot the provisioning of standardised container(s), as provided by the Tools Platform, for deployment across a number of Cloud environments, including ELIXIR Nodes and, potentially, commercial-based endpoints. The work will be aligned with the developing GA4GH standards around containerised workflow deployment in the cloud, in which ELIXIR plays an important role. The goal is for ELIXIR to lead on the standards for Tools Registry Service integration and the Common Workflow Language, and on how these should be implemented. This, in turn, will enable maximum reuse of ELIXIR-provisioned resources by researchers and external partners.

Data Management in Nodes

Interoperability works best when data and metadata are managed at source. This includes activities such as (meta)data curation, schematic markup, software implementation, and standard file-handling protocols. Achieving FAIRness requires cross-Platform technical support, and we plan to explore these assertions via a Strategic Implementation Study in the 2021–22 time frame. This Strategy-driven Implementation Study aims to systematically evaluate ongoing Community-led Implementation Studies across all Nodes to define a long-term sustainable course of support to assist the establishment of Data Management in Nodes. The data managers of each Node will, in turn, make data sharing across the ELIXIR network more robust, sustainable, and FAIR. This study will involve various tasks, such as gathering users’ interoperability service requirements, identifying their Data/Tools/Compute/Training support needs, facilitating communications, identifying commonalities and synergies across communities, stimulating and encouraging the “reuse” of existing resources and/or the development of new “needed” resources, and promoting Community and technology choices (e.g. Linked data, CWL, Standard APIs, Containerisation). As stated above, we plan to perform this work by using Strategic Implementation Study funding. However, in advance of that, we will pursue a broader funding effort through an application, focussed on distributed data management, in response to the Infradev-3 EC funding call in early 2019. If this proposal is funded, we will adapt our plans for Data Management within the Programme, which will be described in the 2020 Work Plan.



During the 2019–23 Programme we plan on focussing on the following through investment of time by ELIXIR staff to develop the thinking and collaborative alignments:

Recommended Services

A core activity that has emerged for ELIXIR is to provide recommendations on good practice, resources and services for the life science user community. These services will form part of ELIXIR’s strategy to support users by providing them with clear recommendations on which resources to use, for example, for depositing, linking or analysing their data. For instance, ELIXIR Deposition Databases provide an authoritative reference for long-term deposition of biomolecular data. A user who wants to develop data management plans, for example, will be able to navigate the ELIXIR universe using our Recommended Services on Data and Interoperability. ELIXIR Recommended Services undergo a selection process within ELIXIR and are agreed by the ELIXIR Heads of Nodes Committee. By identifying classes of recommended services tailored to specific tasks, ELIXIR will ensure that users can easily find and use those services that best meet their needs. Over the next Programme we will continue to supplement and enhance the portfolio of Recommended Services, including the Deposition Databases, the Recommended Interoperability Resources and the ELIXIR Registries. Other classes of Services may be considered as we evolve our thinking in this area. As these efforts evolve, we envisage that the recommended services will be surfaced through the ELIXIR website in a way that helps and guides the wider life science community as they, for instance, develop data management plans.

64. Regev A, Teichmann SA, Lander ES, Amit I, Benoist C, Birney E et al. Human Cell Atlas. eLife 2017, 6:e27041, https://doi.org/10.7554/eLife.27041, http://www.humancellatlas.org


Human Cell Atlas

The Human Cell Atlas project[64] (HCA) aims “To create comprehensive reference maps of all human cells, as a basis for both understanding human health and diagnosing, monitoring, and treating disease.” The HCA is gathering transcriptomic data from a comprehensive set of cell types (human and non-human) with the aim of making the data freely available and accessible to third party applications via a standardised API. The data it makes available will be of major value to ELIXIR participants, particularly within the Human Data Communities. EMBL-EBI is contributing to the HCA by providing the data ingestion service for the project’s databases. All HCA software will be open source. The HCA received significant initial funding from the Chan Zuckerberg Initiative and has recently received additional funding from the Wellcome Trust. ELIXIR proposes to establish a strategic partnership with the HCA to aid collaboration between the HCA and ELIXIR partners. This will involve encouraging the re-use of HCA software, supporting the development of third-party applications that make use of HCA data, and building links that lead to joint funding applications.

Biodiversity

Biodiversity informatics, which is concerned with the generation and aggregation of biodiversity-related data, both from traditional observational methodologies and from emerging technologies such as metagenomic sequencing or remote sensing, is anticipated to emerge as a key global priority over the next few years. The work within the Marine Metagenomics Community in the first Programme has already been particularly significant in this area. There are efforts underway to expose novel species taxonomic and occurrence data, derived from metagenomic sequencing of sea water samples, in the Global Biodiversity Information Facility (GBIF). It is anticipated that this could be the foundation for a more significant engagement between ELIXIR and the wider biodiversity community, including GBIF itself, the Ocean Biogeographic Information System (OBIS) and the Distributed System of Scientific Collections (DiSSCo), as ELIXIR enters this Programme. In the first instance, ELIXIR Hub staff will engage with potential collaborators and with Nodes to continue efforts to populate GBIF with data from the Marine Metagenomics Community.

Industry

As a publicly funded infrastructure, one of the principal ways ELIXIR aims to deliver a return to funders is through the jobs and growth that come from companies using the services and resources run by Nodes. The bioinformatics sector is projected to reach USD 16–18 billion in value by 2021,65 making it a major contributor to wealth and prosperity globally. In recent years, there has been a move from fully integrated to open innovation, which realises the mutual benefits that arise from public-private partnerships. Innovation in this sector is often a collaborative effort that involves multiple stakeholders, including universities, technology, biotech and pharmaceutical companies, governmental institutions and funders. The creation of this open innovation ecosystem is the only way to tackle the grand challenges in the life sciences, and at the heart of this innovation is public life science data. We recognise the added value of collaborative approaches and aim to build strong and long-lasting relationships with the private sector. Within ELIXIR, different models are employed to ensure that activities and services are aligned with the needs of industry, for example, through seeking high-level input via the Industry Advisory Committee. ELIXIR will continue in this Programme to work closely with industry partners through multi-stakeholder projects (such as the IMI-funded FAIRplus project), informal collaborations (e.g. Pistoia Alliance) and through our Innovation and SME programme to ensure the uptake of ELIXIR services and to assure industry partners that ELIXIR services are of industry standard. By the end of 2023, ELIXIR aims to have become recognised as an essential part of the bioinformatics ecosystem, supporting open innovation in academia and industry by providing services, resources and collaboration opportunities that facilitate data-driven innovation.

ELIXIR’s Industry Strategy ELIXIR’s current industry engagement is based on the objectives of the ELIXIR Industry Strategy, a stand-alone document, first published in 2016 under the ELIXIR-EXCELERATE grant. In order to ensure a better alignment between the Industry Strategy and this ELIXIR Scientific Programme, the timeline for the new Industry Strategy will be brought into line with the Programme cycle of 2019–2023. Thus, a review of the Industry Strategy will take place alongside the mid-term review of the Scientific Programme, and will include an assessment of its effectiveness against indicators and metrics. The new ELIXIR Industry Strategy builds on the successful initiatives of the previously published document, but offers more concrete options for industrial partners to engage with ELIXIR Nodes, Platforms and Communities. Figure 7 presents an overview of the types and levels of engagement between ELIXIR and the private sector for 2019–2023.

User innovation: the Innovation and SME Forum A key activity within the ELIXIR Industry Strategy is the Innovation and SME Forum, a series of specialised events that provide a space for networking between the experts in ELIXIR and industry. The 2019–23 Programme will see several events per year taking place in ELIXIR Member States hosted by ELIXIR Nodes. Themes of the events will be based on topics relevant to ELIXIR Communities and Platforms and aim to bring together expertise from across the ELIXIR infrastructure, serving as a focal point for further interactions between ELIXIR Nodes, SMEs, large multinational companies and industry clusters/associations.

65. Markets and Markets: Bioinformatics Market by Sector (Molecular Medicine, Agriculture, Forensic, Animal, Research & Gene Therapy), Product (Sequencing Platforms, Knowledge Management & Data Analysis) & Application (Genomics, Proteomics & Metabolomics) – Global Forecast to 2021, https://goo.gl/YmBVCm

Figure 7: Different levels of engagement in ELIXIR interactions with industry. Overview of the different levels of engagement and commitment in ELIXIR-industry interactions, ordered by engagement intensity (commitment level):

Level 1 (Low): Awareness & visibility • Conferences • Accessing Training (TeSS) • Web presence, social media, print media

Level 2: Low-level involvement & small-scale project • Innovation & SME Forum • Bioinformatics Suppliers Forum • SME report • One-off small-scale projects

Level 3: Joint collaboration & support • Joint grant application/research project • ELIXIR Community membership (observer status) • Industry Staff Exchange • Joint workshops on topics relevant to ELIXIR

Level 4 (High): Strategic partnership. Long-term projects of mutual benefit to ELIXIR and industry partner (with financial involvement on both sides). Both parties involved shape the direction of a resource-intensive and challenging project • Industry as a partner in a Node • Industry as a formal partner in a research initiative (e.g. IMI FAIRplus) • Industry as a steering partner in a community

Supply-side innovation: the ELIXIR Bioinformatics Suppliers Forum

To date, ELIXIR’s industry engagement has mostly centred on users of ELIXIR services. For the 2019–23 Programme, we plan to launch an activity focussed on industry providers of bioinformatics services (including HPC, commercial compute, software, etc.) through the ELIXIR Bioinformatics Suppliers Forum. Working closely with the ELIXIR Compute Platform and others, the new forum will provide suppliers with access to an open forum and networking space on neutral ground, where they can present technical information relating to their products and services and propose opportunities for consuming organisations to use these services. This new industry activity will be launched with a first event in Q2 of 2019.

Ensuring visibility in industry engagement efforts To showcase ELIXIR’s efforts in engaging with industry, a descriptive report will be published at the end of 2018, covering the many ways in which ELIXIR engages with, provides services to, or formally collaborates with industry. This report will act as a starting point to capture the entirety of ELIXIR’s industry engagement more formally and will provide best practice examples for other ELIXIR members to adopt. The report will be updated on a regular basis throughout the period 2019–2023.

Further, building on the report Public data resources as a business model for SMEs66 published in 2018, which describes the existence of an ecosystem of companies that fundamentally rely on public data resources for their operation, we will publish regular updates showing how the landscape has developed. Future reports will include a more economically focused impact assessment of this entrepreneurial ecosystem, which will in turn highlight the economic importance of ELIXIR and emphasise the importance to industry of the services that ELIXIR members provide to the wider research community. Both reports will require active engagement of ELIXIR Nodes. Strengthening the internal ELIXIR industry group is therefore key to achieving this, and to developing capacity within ELIXIR Nodes for running successful Node-level industry engagement activities. By the end of 2023, the aim is to have at least one industry contact person from each Node as part of this group.

66. Roman Garcia P, Smith A and Blomberg N. Public data resources as a business model for SMEs. The Role of Public Bioinformatics Infrastructure in supporting innovation in the life sciences. F1000Research 2018, 7(ELIXIR):590 (document), https://doi.org/10.7490/f1000research.1115445.1

New activities and projects to support industry engagement

The 2019–23 Programme will see the inclusion of several new concrete measures and projects to engage industry more formally on different levels.

Figure 8: Overview of novel industry engagement initiatives. New ELIXIR initiatives with industry engagement: ELIXIR Industry Staff Exchange, ELIXIR Community partner, FAIRplus project. The colour scheme matches the engagement levels in Figure 7.

New ELIXIR Hub-funded activities for industry engagement

ELIXIR-industry staff exchange programme As a novel way of engaging with industry, we will extend the current ELIXIR Staff Exchange Programme to allow for staff exchanges between industry and staff in ELIXIR Nodes. Exchanges will be for a short period of time and focussed on a project related to ELIXIR services, Communities or Implementation Studies. The aim of this staff exchange initiative is to embed ELIXIR more closely into the industrial ecosystem, foster open innovation, and help Nodes to build sustainable partnerships with industry partners. The ELIXIR-industry Staff Exchange Programme will see a pilot exchange take place in 2019 to establish a framework for future exchanges. Due to ELIXIR’s rules on funding partners – only institutes in ELIXIR Nodes will be eligible to receive funding – industry will co-finance its involvement in this scheme. In addition to this staff exchange programme, which will be funded through the Scientific Programme, industry and Nodes will also be encouraged to seek complementary funding from alternative staff exchange schemes, such as the Marie Skłodowska-Curie Research and Innovation Staff Exchange (RISE) scheme funded through Horizon 2020, and the FAIR Fellowship Programme, which will be funded as part of the FAIRplus project.

ELIXIR Community partnership We also plan to create ways for companies to participate in ELIXIR Communities as partners. We propose two tiers of engagement:

Tier 1: Observer (knowledge transfer, no financial contribution)
– Community partner is able to join ELIXIR Community communication events (e.g. TC, F2F, Workshops, etc.)
– Community partner is able to contribute to ELIXIR Community literature (e.g. internal/external documents, publications, meeting agendas, etc.)
– Community partner has no influence on project proposals or research topics.

Tier 2: Strategic partnership (knowledge transfer, financial contribution)
– Community partner commits to contribute financially to an ELIXIR Community.
– Community partner is able to influence project proposals and direct research topics in concordance with ELIXIR aims and strategies.
– Community partner is able to join ELIXIR Community communication events and to contribute to ELIXIR Community literature.

A pilot project with an industry partner as a formal partner included in an ELIXIR Community is planned for 2019 to establish a framework for future engagement opportunities.

Collaboration on external multi-stakeholder grants – the FAIRplus project Under the scope of the IMI FAIRplus grant starting in 2019, we will directly engage with IMI and EFPIA consortia, and with academic and industry partners, to develop the guidelines and tools needed to make data FAIR. The project proposal includes FAIR Innovation and SME Events, which will enable wide data reuse and will foster an innovation ecosystem around these data that powers future reuse, knowledge generation, and societal benefit. We will work closely with our partners in this project, ranging from large multinational companies to SMEs and academic institutes, to attract the participation of other SMEs to our FAIR Innovation and SME Events, with the goal of developing data analysis tools and services that facilitate future collaboration.

International outlook

In this section, the term ‘International’ means ‘beyond Europe’, with ‘Europe’ defined as the countries of the European Research Area.67

An international perspective of ELIXIR Bioinformatics is a global science, as evidenced by the geographic location of its data producers, service providers, and users. ELIXIR’s services have a truly global user base: for instance, Orphanet, the international rare disease and orphan drug database, has around 782,000 visitors per month from most countries around the globe.68 A number of ELIXIR services, such as UniProt69 (a database of protein sequence and functional information), are run as part of international (i.e. reaching beyond Europe) collaborations and funding arrangements. Research infrastructures, such as ELIXIR, underpin access to life science data, and hence are at the core of bioinformatics research and its many applications in the fields of health (e.g. personalised medicine), food security (e.g. aquaculture) and the environment (e.g. pollution). These applications are of significant societal and economic benefit. ELIXIR has established formal agreements with international bioinformatics-related initiatives: for instance, it collaborates with GA4GH in developing and promoting standards and frameworks for the responsible sharing and reuse of genomics data.70 Finally, in the policy sphere, ELIXIR is recognised as a Research Infrastructure of Global Interest by the Group of Senior Officials of the G7.71

67. ERA progress: http://ec.europa.eu/research/era/eraprogress_en.htm
68. The international rare disease and orphan drug database: bridging healthcare & research, 2017 https://www.orpha.net/orphacom/cahiers/docs/GB/ActivityReportLeaflet2017.pdf
69. UniProt is a collaboration between the European Bioinformatics Institute (EMBL-EBI), the SIB Swiss Institute of Bioinformatics and the Protein Information Resource (PIR). See list of donors at: https://www.uniprot.org/help/about

ELIXIR’s International Strategy Similar to ELIXIR’s Industry Strategy, ELIXIR’s International Strategy72 was first developed and published as part of the ELIXIR-EXCELERATE project. It is a stand-alone document that describes the current and future collaborations between ELIXIR Nodes, Platforms and Communities, and key bioinformatics initiatives and countries outside the European Research Area. The International Strategy supports the vision of standards-based and well-annotated life science data that are internationally shared and accessed.

Table 9: The four objectives of ELIXIR's International Strategy and the stakeholders they target

Objectives | Target stakeholder(s)
1. Scale up the international user base of ELIXIR’s services | Users of ELIXIR’s bioinformatics services in countries beyond Europe
2a. Improve bioinformatics services and promote global standards through formal/concrete collaborations with: | International bioinformatics-related initiatives and organisations (e.g. GA4GH)
2b. Improve bioinformatics services and promote global standards through formal/concrete collaborations with: | National-level bioinformatics service providers in countries beyond Europe (in support of Objective 3)
3. Expand Membership in ELIXIR beyond Europe | Government officials in Ministries of countries beyond Europe, with support from the country’s bioinformatics community (outcome of Objective 2b)
4. Ensure that ELIXIR is recognised as an infrastructure of global relevance, and a partner of choice by intergovernmental organisations | Intergovernmental organisations, including policy-makers (e.g. OECD and the G7’s Group of Senior Officials)

The International Strategy describes the current ELIXIR activities of international relevance, and planned implementation actions to reinforce ELIXIR’s global significance and impact. These actions, which are articulated around the four strategic objectives (Table 9), are annexed to the Strategy. The Strategy will continue to be updated on an annual basis, based on progress and to account for the evolution of the international bioinformatics landscape. The international audience of the Strategy encompasses users of bioinformatics services, national-level bioinformatics infrastructures, and government officials in Ministries; it also targets international bioinformatics-related initiatives and organisations, and intergovernmental organisations. Closer to home, the Strategy aims to raise awareness of ELIXIR’s global significance among ELIXIR participants in the Member countries, and to guide efforts to ensure coherent actions in support of the objectives of the Strategy.

70. https://www.elixir-europe.org/news/elixir-and-ga4gh-agree-collaborationstrategy
71. Group of Senior Officials on Global Research Infrastructures. Progress Report 2015. Meeting of the G7 Science Ministers, 8-9 October 2015, https://www.bmbf.de/files/151109_G7_Broschere.pdf
72. ELIXIR International Strategy 2018: https://www.elixir-europe.org/elixirinternational-strategy

Management

The ELIXIR Hub will continue to reduce, wherever possible, the bureaucracy that Nodes face when entering and running EC and IMI grants. In particular, in large grants where funders and the leads of consortia wish for ELIXIR to engage as a single entity, there is an increasing need to use the Linked Third Party mechanism to bring in institutes involved in ELIXIR Nodes. The Hub will work with Nodes to try to ensure this is done as efficiently as possible, including through improving the Collaboration Agreement template.

Approach for external grants The ELIXIR Hub will continue to seek additional external funding from Horizon 2020 (H2020), its successor programme Horizon Europe, the Innovative Medicines Initiative (IMI) and also from other sources where appropriate (for example, COST) to ensure an external portfolio of funding to support the activities and services run by ELIXIR Nodes. The ELIXIR Hub aims to support the whole project life-cycle, from influencing and shaping the landscape of life science infrastructure funding, through to the identification of forthcoming and current opportunities (the External Relations team at the Hub), to assisting with the coordination of joint ELIXIR applications and the preparation of materials for ELIXIR Nodes,73 and the management of awarded grants (the Project Management Unit at the Hub). The major, high-profile ELIXIR-EXCELERATE grant will come to an end in 2019. This single grant provided substantial funding for ELIXIR Nodes, and helped to establish ELIXIR Platforms and Use Cases, the latter of which have now become ELIXIR Communities. The end of this grant – dedicated solely to implementing ELIXIR – necessitates a change in approach for external funding; rather than one single grant, ELIXIR’s future grant portfolio will comprise numerous grants across many parts of H2020 and Horizon Europe, from the Research Infrastructure Programme to the Health Work Programme, and to new areas that ELIXIR has not yet engaged in as closely, such as the Food, Agriculture, Fisheries and Biotech Work Programme. During the ELIXIR 2019–23 Programme, we therefore expect more engagement from ELIXIR Communities (supported by the ELIXIR Hub when appropriate) in the submission and implementation of successful proposals to scientific calls, building on the experience of the European Joint Programme Cofund in Rare Diseases. These grants will provide long-term sustainability for the ELIXIR Communities, and for the ELIXIR Platforms and services, enabling them to accomplish their scientific goals. The Hub will continue to coordinate proposals when necessary and appropriate (such as EOSC-Life and FAIRplus) and will also prioritise submitting a successful proposal to the INFRADEV-3 topic, which closes in spring 2019 and which, if funded, would provide a dedicated grant to implement ELIXIR, albeit at a far smaller scale than EXCELERATE. The current ELIXIR Hub portfolio for external projects is listed below:

Table 10: ELIXIR portfolio of external projects

Hub role | Funding | Project | Start date | End date
Coordinator | EC | ELIXIR-EXCELERATE | 1/9/2015 | 31/8/2019
Coordinator | EC | CORBEL | 1/9/2015 | 31/5/2020
Coordinator | EC | EOSC-Life* | 1/3/2019** | 28/2/2023**
Coordinator | IMI | FAIRplus* | 1/1/2019** | 30/6/2022**
Beneficiary | EC | AARC2 | 1/5/2017 | 30/4/2019
Beneficiary | EC | RI IMPACT PATHWAYS | 1/1/2018 | 30/6/2020
Beneficiary | EC | EU-STANDS4PM* | 1/1/2019** | 31/12/2021**
Beneficiary | EC | EJP RD* | 1/1/2019** | 31/12/2023**
Beneficiary | IMI | eTRANSAFE | 1/9/2017 | 31/8/2022

* In the negotiation phase; Grant Agreement to be signed before the end of 2018.
** Estimated

Long-term funding strategy The ELIXIR 2014–2018 Programme included several activities aimed at supporting long-term sustainability, both of ELIXIR itself and of the individual services run by Nodes. For example, ELIXIR Switzerland carried out an Implementation Study on Funding Models for Knowledgebases;74 the Data Platform’s work on Core Data Resources has led directly to the establishment of a Global Coalition to support the sustainability of life science resources; and ELIXIR’s External Relations team has worked to ensure that the funding and policy landscape is conducive to ELIXIR engaging in grant proposals. However, there is not yet a formal Strategy document that describes these actions in detail, who owns them, or what metrics will be used to assess whether the actions have been effective. In the ELIXIR 2019–23 Programme, the ELIXIR Hub will produce, publish and implement a formal Long-term Funding Strategy for ELIXIR. This will build on the seven recommendations made by ELIXIR’s Long Term Sustainability Working Group, providing greater visibility to the work ELIXIR will undertake against each of those recommendations. The Long-term Funding Strategy will be owned by the External Relations team, and will touch on aspects of other relevant Strategies, such as ELIXIR’s International Strategy (particularly in relation to its interaction with relevant global initiatives), ELIXIR’s Industry Strategy (in relation to showcasing the benefits of ELIXIR to industry) and ELIXIR’s Communication Strategy (in relation to communicating the benefits of ELIXIR, and harnessing vocal bioinformatics champions).

Programme review The ELIXIR 2019–23 Programme will include a formal mid-term review in 2021. Drawing on the evaluation by external experts in the ELIXIR SAB, the outcomes of this review will provide an opportunity to realign work within the Platforms and the allocation of funding in the Commissioned Services.

Overall risk register The ELIXIR risk management process applies a distributed approach that allows for the risk to be managed and monitored by the appropriate body at each moment in time. ELIXIR risks are logged into the ELIXIR risk register and reviewed according to the management process.

Key results, milestones and deliverables For each strategic objective of the 2019–23 Programme, ELIXIR has defined key results to follow the progress towards these objectives. The Platform work packages define the expected outcomes as high-level milestones and deliverables, while the exact delivery schedule for each task will be defined in the detailed project plans with the participating Nodes, to be prepared alongside the Commissioned Services Contracts.

73. ELIXIR Handbook of Operations: https://www.elixir-europe.org/about-us/governance/handbook-operations
74. Gabella C, Durinx C and Appel R. Funding knowledgebases: Towards a sustainable funding model for the UniProt use case. F1000Research 2018, 6(ELIXIR):2051, https://doi.org/10.12688/f1000research.12989.2

Credits and acknowledgements

With special thanks to all of those who contributed to the development of the ELIXIR 2019–23 Scientific Programme, most notably Heads of Nodes, Platform and Community Leaders, Technical and Training Coordinators and members of the various Working Groups. © 2018 ELIXIR. Photos: ELIXIR Hub unless stated otherwise. This publication was produced by the ELIXIR Hub. For more information about ELIXIR please contact: info@elixir-europe.org

Publication design: design-science.org.uk
