SIB Profile 2024 - Data scientists for life


SPECIAL FOCUS

GENERATIVE AI: NEW HORIZONS

ONE HEALTH, MULTIPLE DATA CHALLENGES

Our mission as a federal public service

SIB is one of the federal government’s instruments to support research and innovation in Switzerland, and to promote the country internationally. SEE P. 14

Activity pillar I – Developing world-class, open biodata resources

Our open software and databases are used across the world to accelerate research, boost innovation and tackle global challenges, from food security to climate change. SEE P. 32

Forewords

With the increasing pressure of humans on the planet, addressing health challenges requires a comprehensive and collaborative approach. In response, the One Health concept recognizes the intricate links between animal welfare, environmental sustainability and human health.

At SIB, advancing One Health is part of our strategic roadmap for the coming years. We will leverage our expertise in open research data to study and integrate a multitude of information from all compartments of life, from their molecular building blocks to their large-scale interactions.

We are also privileged to contribute to moving the One Health theme forward globally. At the European level, for instance, we co-lead ELIXIR’s priority area on “Biodiversity, Food Security, and Pathogens”. In Switzerland, we provide SPSP as a platform to monitor, at the molecular level, microorganisms that occur in humans, animals and the environment.

By committing our expertise, as well as our collaborative spirit, to the One Health concept, we want to make a significant and lasting impact on human, animal and planetary health. •

SIB has established itself as a pioneer in life sciences, extending its influence from biotech companies to global business leaders. Through its coordination activities, the institute drives the definition of standards and best practices to ensure optimal use of biological data. SIB embodies the collaborative spirit that characterizes the Swiss public research-industry partnership. It maintains its autonomy while fostering connections worldwide.

Its influential role facilitates not just technological strides and the establishment of valuable benchmarks; it also champions the “open research data” model, which is integral to SIB’s philosophy and resources. These contributions, in turn, serve as the backbone of its more service-oriented activities, nurturing innovation for local life science companies and global pharmaceutical or cosmetics front-runners alike.

In 2023, I renewed my commitment as a member of SIB’s Board of Directors. I am very proud to keep on supporting such an excellence-driven institution. •

In an era marked by unprecedented global challenges—from climate change and biodiversity loss to pandemic threats—data scientists offer crucial insights to inform the decision-making process. Investing in knowledgeable and reliable data-centric institutions is thus essential to securing our collective independence.

As discussions about the ethical use of AI arise at the Council of Europe and in the Swiss Parliament, underlying training datasets need to be structured and dependable. This is particularly true in the life sciences, where AI applications have far-reaching societal implications in health and beyond. SIB’s leading role as a domain expert and provider of high-quality open databases is thus an invaluable asset for the advancement of data science.

The provision of open databases has also been recognized as one of the unique contributions of the institute in a collaborative report by representatives of leading Swiss institutions sitting on the SIB Foundation Council. Among the other contributions highlighted are: offering specialized training to life scientists, federating bioinformatics research here in Switzerland and connecting us to international life science data infrastructures.

Twenty-six years ago, Swiss politicians demonstrated foresight and vision by championing the creation of SIB. Today, as we enter a new age defined by AI, its mission to push the boundaries of data science through in-depth knowledge of biological and biomedical data is more meaningful than ever. As we confront the challenges of tomorrow, let us recognize the pivotal role of institutions like SIB in bringing innovative solutions for the future. •

Simone de Montmollin, President of the Foundation Council

Table of contents

06 Data scientists for life

08 Bioinformatics: a definition

10 SIB in brief

18 Organization and governance

30 Activity pillar I: Developing world-class, open biodata resources

34 Activity pillar II: A centre of excellence delivering life science data solutions

38 Activity pillar III: Coordination – bringing key partners and data together in large-scale projects

42 Spotlight on the science produced by our community

46 Generative AI: new horizons

48 Generative AI combined with bioinformatics: a wide range of applications

52 Taming the AI beast and its challenges through key expertise

54 One Health, multiple data challenges

56 Why approaching health from multiple perspectives matters

60 Unlocking the potential of data for One Health

63 Index of SIB Group and Team Leaders

67 Acknowledgements

Data scientists for life

We are multidisciplinary experts who curate data and make it speak to solve biological questions. Discover how we work and the highlights of the year from our members and teams.


Bioinformatics: a definition

Thanks to computer-based approaches, researchers can improve their understanding of complex systems.

Life scientists and clinicians have long tried to assemble data and evidence to find the right answers to fundamental questions. Today, however, there is no shortage of data, and we find ourselves with a different issue. New technologies are producing data at an unprecedented speed, and in such quantity and variety that they can no longer be interpreted by the human mind alone.

Enter bioinformatics.

Bioinformatics is the application of computer technology to better understand and effectively use biological and biomedical data. It is the discipline that stores, analyses and interprets the big data generated by life-science experiments, or collected in a clinical context. This multidisciplinary field is driven by experts from a variety of backgrounds: biologists, computer scientists, mathematicians, statisticians and physicists.

Bioinformatics encompasses:

DATABASES for storing, retrieving and organizing curated information to maximize the value of biological data;

SOFTWARE TOOLS for modelling, visualizing, interpreting and comparing biological data;

ANALYSIS of complex biological datasets or systems using novel statistical approaches or machine learning techniques;

RESEARCH harnessing computational methods in a wide variety of biological fields to develop solutions in diverse areas, from agriculture to precision medicine (SEE P. 28);

COMPUTING AND STORAGE INFRASTRUCTURE to process and safeguard large amounts of biological data.

What kinds of data are we talking about?

Bioinformatics deals with a broad spectrum of complex data types:

DNA, RNA or proteins

Expression data, such as the level of expression of a gene in a sample

Imaging data

Text

And more...

BRINGING BIOINFORMATICS TO SOCIETY

From precision medicine to drug design and DNA testing: bioinformatics is increasingly tied to health and societal issues.

Through outreach activities, SIB regularly communicates on advances made in this field and on their importance to the general public. A special emphasis is placed on young people and girls in particular, to foster their interest in STEM disciplines (Science, Technology, Engineering and Mathematics).

The website In the Light of Evolution (EN, FR, DE, IT), an SNSF Agora-funded project, shows the real-life implications of evolutionary biology through interactive stories. This year, new topics included: “Did Archaea give us sex?” and “The surprising effects of genetic variation”. The project served as the basis for the development of new workshops, such as “Decrypting evolution”, in collaboration with the University of Geneva. It was also described in the publication “Bringing science to the public in the light of evolution”.

DOI: 10.1093/biomethods/bpad040

Explore our other interactive outreach websites, such as ChromosomeWalk.ch and PrecisionMed.ch.

In 2023, 2,300 participants took part in over 120 activities and events, including:

Pint of Science at the University of Lausanne

Digital Week Lausanne and Neuchâtel

TecDays in Basel, Sion, Geneva and Bülach, organized by the Swiss Academy of Engineering Sciences for students aged 15-20

Mystères de l’UNIL, the University of Lausanne’s open day, with the theme “Health in all its forms”

“Dare to Choose Any Career Day” aimed at pupils in Years 7 to 9 in the canton of Vaud, to combat gender stereotypes in career choices

More activities and news on Facebook, our dedicated outreach channel: goo.gl/4c6xCZ

In 2024

Meet us at:

MYSTÈRES DE L’UNIL

Lausanne, 30 May - 2 June

Theme “Earth”

NUIT DE LA SCIENCE

Geneva, 6-7 July

Theme “Cycles!”

EXPANDING YOUR HORIZONS

Geneva, 16 November

Event dedicated to providing young girls with experiences in STEM

The Protein Spotlight comics are an adaptation of a selection of articles written by Vivienne Baillie Gerritsen. Intended for a lay audience, they are drawn by the artist aloys lolo. A book was published in 2023 by éditions Antipodes.

www.proteinspotlight.org/comics/

www.lightofevolution.org

www.chromosomewalk.ch

www.precisionmed.ch

SIB in brief

A flagship organization in biological and biomedical data science, SIB shares its ambitious vision for our society with its partners. Its mission is supported by three complementary activity pillars.

Our vision

At SIB, we know that expertise in life science data is key to solving many of the world’s most pressing challenges. By unlocking the potential of biological and biomedical data, we aim to generate knowledge and innovate for a better future.

Our mission

Our mission is to push the boundaries of data science through in-depth knowledge of biological data, cutting-edge technologies and interdisciplinary collaborations.

We provide researchers and clinicians with outstanding resources, services and training, to accelerate innovation in many fields, from medicine and health to agriculture, the preservation of biodiversity and the environment.

We represent and federate Swiss bioinformatics. By fostering a culture of scientific excellence and collaboration, we contribute to ensuring Switzerland remains one of the most innovative countries in the world.

88 groups

913 members, including 192 employees

28 institutional partners across Switzerland

160 databases and software tools developed by our members and employees, accessible via the Expasy web portal

494 peer-reviewed articles published in 2023*

As of 1 January 2024

* Source: Web of Science


Three pillars of activity enable us to fulfil our mission and are detailed in this report:

ACTIVITY PILLAR I

Open databases and software tools

Life scientists and clinicians across the world need to deal with large quantities of datasets of various types to perform their research. SIB is developing essential software tools and databases to accelerate their work and develop new applications.

Read more on p. 30

ACTIVITY PILLAR II

Centre of excellence

We believe that good data science requires a detailed understanding of the nature of the data and the processes behind their generation. Our teams of data science experts, software developers, biocurators and computational biologists provide their expertise to academic, clinical, governmental and industry partners in Switzerland and abroad.

Read more on p. 34

ACTIVITY PILLAR III

Coordination

Making data and knowledge universally understandable and available is crucial to enable discoveries and maximize public investments in research. We facilitate collaboration between disciplines, scientific communities, institutions and across borders, to build complex infrastructure at the national and international levels.

Read more on p. 38


Figure: how SIB converts biological questions into answers

Life sciences and health actors: private sector, governmental institutions, research institutes, hospitals and clinics

Massive amount of data of various types: genetics, text, biochemical, imaging, etc.

SIB Swiss Institute of Bioinformatics: dedicated multidisciplinary experts

Open databases and software tools: sharing access all over the world to the most reliable and precise biodata resources made in Switzerland

Centre of excellence: leveraging data science expertise, resources, national network and independent non-profit status

Coordination: developing common standards to facilitate collaboration and allow data to be reused meaningfully by humans and machines

Converting biological questions into answers with various applications: tailoring treatment to cancer patients, developing green pesticides, preventing obesity, tracking pathogens

Fields shown: basic research, hospitals and clinics, research institutes, private sector, agriculture, medicine, environmental sciences

COLLABORATIVE REPORT ON THE UNIQUE CONTRIBUTIONS OF SIB

The Foundation Council is SIB’s highest governance body and consists of representatives of its partner institutions (SEE P. 18). A 2023 report instigated by its President, Simone de Montmollin, highlights SIB’s unique contributions to Switzerland. A summary follows.

Context and objectives of the report

A working group was created at the initiative of the Foundation Council’s president Simone de Montmollin, who took office at the end of 2022. Its task was to identify the unique contributions of SIB for Switzerland and its life science and health community. It identified SIB’s most impactful, unique and synergetic contributions in four key areas, as well as the institute’s overall positioning within the Swiss landscape. These contributions, stably funded by the State Secretariat for Education, Research and Innovation, are nested within SIB’s three activity pillars described on p. 11.

“SIB’s blend of bioinformatics proficiency, networking and training is key to its success, serves as a model in other countries and could also serve as a model for other domains within Switzerland.”

The report authors

Simone de Montmollin, President of the SIB Foundation Council, Member of the National Council

Hugues Abriel, Vice Rector for Research, University of Bern

Edouard Bugnion, Vice President for Information Systems, EPFL

Christophe Dessimoz, SIB Executive Director

Antoine Geissbühler, Dean of the Faculty of Medicine, University of Geneva

Elisabeth Stark, Vice President Research, University of Zurich

UNIQUE CONTRIBUTION

Provision of open biodata resources

SIB has pioneered the provision of open databases and software tools in life sciences. Among the rich ecosystem of resources developed in Switzerland, SIB carefully selects a subset to be part of its portfolio. In so doing, SIB supports the promotion of excellence in resource development and operation. MORE ON P. 30.

UNIQUE CONTRIBUTION

Training in bioinformatics and biological and health data science

SIB’s highly successful training activity builds on the institute’s national network and is characterized by its emphasis on practical, tailored instruction on the latest methods, tools and languages. Now a leading provider of bioinformatics training, SIB complements the BSc/MSc teaching programmes of universities by concentrating on hands-on and applied teaching for postgraduate scientists.

UNIQUE CONTRIBUTION

Connecting Switzerland to international life science data infrastructure

SIB is a co-founder of, and trusted partner in, several European and international initiatives, such as: ELIXIR, the European infrastructure for life-science information; the Global Biodata Coalition (GBC), a forum for funders to better coordinate approaches for the efficient management of biodata resources worldwide; and the Global Alliance for Genomics & Health (GA4GH).

UNIQUE CONTRIBUTION

Federation of Swiss bioinformatics research

The common affiliation of a national community behind the SIB brand reinforces Switzerland’s visibility in the world. By fostering collaboration and knowledge sharing within its national network, and by bringing together the resources and expertise in bioinformatics across the country, SIB contributes to driving progress and innovation in this rapidly evolving field. MORE ON P. 38

Contribution to innovative national life science research infrastructure

Besides the four unique contributions outlined above, SIB has a successful track record of establishing nationwide infrastructure for life science data, in partnership with other stakeholders. This includes:

The BioMedIT network as the secure computing infrastructure for medical data, enabling Swiss researchers to work on the data patients have consented to share;

The Swiss Pathogen Surveillance Platform as a national instrument to improve pandemic preparedness (SEE P. 58);

The SwissBioData ecosystem as a project part of the SERI infrastructure roadmap 2023 to enable the sharing and interoperability of biological


CELEBRATING 25 YEARS OF SIB AND LOOKING AHEAD

A quarter of a century after its creation, SIB is a central actor for life science data. The milestone was celebrated in various ways during the year, including with the release of our roadmap for the next funding period.

Reinforcing Swiss bioinformatics visibility with a new brand identity and website

Launched at the [BC]2 conference in September 2023, SIB’s new website and graphic design aim to better reflect our positioning: human-based, data-driven and impact-oriented. Its revisited content presents our flagship projects, national community of experts, key societal impacts of our activities and much more, for scientists, politicians and journalists alike. Visit www.sib.swiss

Celebrating in Basel and Geneva

The SIB community gathered to celebrate the institute’s 25 years on the Basel waterfront and at Geneva’s Museum of Art and History. The strategic roadmap was also presented on this occasion as part of the consultation process.

A special edition of [BC]2: “Big data in biology: promises and challenges”

A record number of over 500 participants from around the world convened in Basel for the [BC]2 conference, celebrating its own milestone of 20 years. Topics ranged from scientific fraud and AI-based single-cell modelling, covered in the keynote speeches by Elisabeth Bik and Fabian J. Theis respectively, to the latest breakthroughs in computational biology and how to create bridges between academia and industry. The event also included, for the first time, an art competition sponsored by Roche. Participants were asked to imagine and represent their work 20 years into the future of bioinformatics.

KEY FIGURES OF [BC]2

+23% attendees compared with the previous event

+104% participants in the tutorials and workshops

+52% abstract submissions

Winning artwork of the [BC]2 competition: “Bionauts of life science data: expanding the horizons of biocomputation”, Anastasia Sveshnikova, Swiss-Prot

Our strategic roadmap

SIB’s roadmap, endorsed by the SIB community, sets the course of the institute’s activities for the next five years. Here are our five strategic objectives and the enablers that will help us get there.

OBJECTIVE 1

Enable life science advances through open resources and open research data

Every year, more scientists and clinicians rely on SIB’s databases, tools and data for their discoveries and innovations. A trend we aspire to sustain.

OBJECTIVE 2

Unlock the potential of -omics data for better health

SIB played a key role in the rise of personalized health by setting up the SPHN infrastructure and BioMedIT national secure IT network. Further developments are already in motion.

OBJECTIVE 3

Contribute to the environmental conservation effort

The biodiversity and environmental crises call for urgent action from us all and we are furthering our efforts on this front.

OBJECTIVE 4

Remain at the leading edge of new technological developments

Our role as specialized data scientists is to contribute to and make the most of the AI revolution for the life sciences. Read more in the dedicated chapter on P. 47.

OBJECTIVE 5

Represent Swiss interests internationally in life science research infrastructure

SIB will continue to be a key asset for Switzerland in terms of resource efficiency, visibility and weight on the international scene.

Key enablers of our strategic goals

Some of the most important factors and functions enabling our institute to achieve its objectives are:

Our funding model, with approximately 50% of funding provided by SERI and 50% from other sources, including competitive ones (SEE P. 20);

Shared principles for sustainable development, including on environmental impact and equal opportunities;

Professional and domain-specific support functions.


Organization and governance

The unique structure of SIB’s governance forms the basis of its strength.

SIB is anchored in the country’s research ecosystem at several levels, from its highest governing body to its national network of affiliated members. This ensures a close connection to the needs of life scientists.

Foundation Council

Highest authority in the institute with supervisory powers, its responsibilities include changes to SIB’s statutes, nomination of Group Leaders and approval of the annual budget and financial report.

Board of Directors (two external members, one Executive Director, two Group Leaders)

Validates decisions necessary to achieve the aims of the institute, such as the scientific strategy and internal procedures, and allocating federal funds to service and infrastructure activities.

Council of Group Leaders

Discusses all matters relating to SIB Groups as a whole, and proposes new Group Leaders for nomination. The Council also elects its representatives on the Board of Directors.

Scientific Advisory Board

Acts as an independent consultative body, providing recommendations to the Board of Directors and the Council of Group Leaders. Its main tasks consist in monitoring service and infrastructure activities, such as the SIB Resources (SEE P. 30).

SIB Hub

Staffed and headed by SIB employees, focuses on SIB’s three core activities: coordination, open resources and centre of excellence. Includes: six scientific groups (SEE P. 24); executive management defining and implementing the institute’s strategic goals; and support functions including scientific relations, finance & grant services, legal & technology transfer, human resources, information technology, cyber security, biodata resources and communication & scientific events.

GOVERNING BODIES

The Foundation Council

Each of SIB’s partner institutions is represented on the Council. Composition as of 1 January 2024

President

Simone de Montmollin, Member of the National Council, President of the Commission for Science, Education and Culture

Founding Members

Prof. Ron Appel, Former SIB Executive Director

Prof. Amos Bairoch, Group Leader, SIB and University of Geneva

Dr Philipp Bucher, Associate Group Leader, SIB

Prof. Denis Hochstrasser, Former Vice Rector, University of Geneva

Prof. C. Victor Jongeneel, Carl R. Woese Institute for Genomic Biology, University of Illinois, USA

Prof. Manuel Peitsch, Honorary Professor, University of Basel

Ex officio Members

Prof. Hugues Abriel, Vice Rector for Research, University of Bern

Mr Thomas Baenninger, Chief Financial Officer, Ludwig Institute for Cancer Research (LICR)

Prof. Claudia Bagni, Vice Dean for Research and Innovation, Faculty of Biology and Medicine, University of Lausanne

Prof. Thomas Bieber, Director, Cardio-CARE AG

Prof. Enrica Bordignon, Vice Dean of the Faculty of Science, University of Geneva

Prof. Edouard Bugnion, Vice President for Information Systems, EPFL

Prof. Emmanuele Carpanzano, Director, Department of Innovative Technologies, University of Applied Sciences and Arts of Southern Switzerland (SUPSI)

Prof. Carlo Catapano, Director, IOR Institute of Oncology Research

Prof. Andrea Cavallaro, Director, IDIAP

Prof. Estelle Doudet, Vice Rector Research, University of Lausanne

Prof. Katharina Fromm, Vice Rector, University of Fribourg

Prof. Patrick Gagliardini, Vice Rector for Research, Università della Svizzera italiana (USI)

Prof. Brigitte Galliot, Vice Rector, University of Geneva

Prof. Antoine Geissbühler, Dean of the Faculty of Medicine, University of Geneva

Prof. Doron Merkler, Department of Pathology and Immunology, University of Geneva

Dr Vincent Peiris, Dean, School of Business and Engineering Vaud (HEIG-VD), HES-SO

Prof. Jean-Marc Piveteau, President, Zurich University of Applied Sciences (ZHAW)

Prof. Alexandre Reymond, Centre for Integrative Genomics (CIG), Faculty of Biology and Medicine, University of Lausanne

Prof. Davide F. Robbiani, Director, Institute for Research in Biomedicine (IRB)

Prof. Patrick Ruch, Head of Research, School of Business Administration (HEG-Geneva), HES-SO

Prof. Gebhard Schertler, Division Head of Biology and Chemistry, Paul Scherrer Institute (PSI)

Prof. Falko Schlottig, Director, Fachhochschule Nordwestschweiz (FHNW) School of Life Sciences

Prof. Dirk Schübeler, Co-Director, Friedrich Miescher Institute for Biomedical Research (FMI)

Prof. Torsten Schwede, Vice Rector of Research and Talent Promotion, University of Basel

Prof. Elisabeth Stark, Vice President Research, University of Zurich

Prof. Jürg Utzinger, Director, Swiss Tropical and Public Health Institute (Swiss TPH)

Prof. Dr Christian Wolfrum, Vice President for Research, ETH Zurich

Co-opted Member

Prof. Alfonso Valencia, Life Sciences Department Director, Barcelona Supercomputing Centre, Spain

The Board of Directors (BoD)

The BoD consists of two Group Leaders elected jointly by the Council of Group Leaders and the BoD, two external members elected by the Foundation Council on the recommendation of the BoD, and the SIB Executive Director. Members of the BoD are appointed for a renewable five-year period.

Dr Jérôme Wojcik (Chairman), Industrial Data Scientist & Entrepreneur

Prof. Christophe Dessimoz, SIB Executive Director

PD Dr Katja Bärenfaller, Group Leader, SIB and Swiss Institute of Allergy and Asthma Research (SIAF)

Prof. Richard Neher, Group Leader, SIB and University of Basel

Dr Marie Owens Thomsen, SVP Sustainability & Chief Economist, IATA

The Scientific Advisory Board (SAB)

The SAB is made up of at least five members, who are internationally renowned scientists from the institute’s fields of activity.

Prof. Alfonso Valencia (Chairman), Life Sciences Department Director, Barcelona Supercomputing Centre, Spain

Prof. Melissa Haendel, Director of Precision Health & Translational Informatics and Professor of Genetics, University of North Carolina-Chapel Hill School of Medicine, USA

Prof. Oliver Kohlbacher, Director of the Institute for Translational Bioinformatics, University Medical Center, Tübingen, Germany

Prof. Claudine Médigue, Head of the Laboratory of Bioinformatics Analyses for Genomics and Metabolism (LABGeM), Génoscope & CNRS, Evry, France

Prof. Alexey I. Nesvizhskii, Department of Pathology and Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, USA

Prof. Christine Orengo, Department of Structural and Molecular Biology, University College London, United Kingdom

Prof. Ron Shamir, Computational Genomics Group at the Blavatnik School of Computer Science, Tel Aviv University, Israel

Council of Group Leaders

The Council consists of the Group Leaders (SEE P. 63).

SIB is a not-for-profit organization combining employees and affiliated members from 28 partner institutions (SEE P. 26).

FINANCES

In 2023, SIB achieved healthy financial results with a total income of CHF 32 million from both national and international funders supporting its three activities.

Powered by the Swiss Confederation

The largest and most stable funder of SIB is the State Secretariat for Education, Research and Innovation (SERI). This base contribution accounts for 36% of our institute’s income.

A unique funding model to sustain open resources

The largest part of SERI’s base contribution is devoted to our open resources (58%, CHF 6.8 million). This subsidy is complemented by many additional funding sources, including competitive grants and contributions from the industry. In addition, most resources are also supported by short-term research funds held at partner institutions.

One third of SIB’s income comes from competitive funds

In 2023, CHF 10.5 million (33% of SIB’s income) originated from competitive grants, collaborations and services. This amount is up by 7% compared with 2022. SIB’s independent status, the diversity of skills of its employees, and the full professional support they offer, make the institute a desirable partner. This includes innovative projects with the industry (e.g. Innosuisse) and long-term collaborations in European public-private consortia (e.g. Innovative Medicines/Health Initiatives).
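As a quick sanity check, the share quoted above can be recomputed from the two rounded figures given in this section (a total income of CHF 32 million and CHF 10.5 million from competitive funds). The following is a minimal sketch in Python; the variable names are illustrative only, and because the inputs are themselves rounded, the result is approximate.

```python
# Figures quoted in this section, in millions of Swiss francs (rounded in the report).
total_income = 32.0
competitive_funds = 10.5

# Share of income coming from competitive grants, collaborations and services.
share_pct = competitive_funds / total_income * 100

print(f"Competitive funds: {share_pct:.1f}% of income")
```

Rounded to the nearest whole number, this matches the 33% (about one third) stated in the text.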

    

All figures are in millions of Swiss francs.

Keeping up with increasing demands

The loss of CHF 0.2 million (0.6% of expenses) is funded by reserves to support additional open resources, in accordance with the investment strategy for the 2021-2024 period.

A centre of excellence that delivers

Our centre of excellence includes training and bioinformatics services to universities, private companies and hospitals (SEE P. 34), amounting to a total of CHF 8.5 million. It is mostly funded through collaborations and competitive grants, many of them international.

Lean and effective management & support

The management and support function expenses (12%, CHF 4.3 million) cover day-to-day administration, people and culture, finance, communication, legal and IT support. These support teams also contribute to projects in the key pillars, for which the funds are allocated to the relevant activities (e.g. legal expertise for SPHN is allocated to Coordination).

Investing in people at the heart of research

71% (CHF 24.5 million) of SIB’s expenses are devoted to our people, of which 73% is for employees at the SIB Hub and 27% for employees embedded in partner institutions. This reflects our unique and efficient model anchoring open science infrastructure in research.

Figure: Bioinformatics: a variety of competences and shared life science data expertise

PEOPLE AND CULTURE

SIB relies on the expertise and commitment of its employees to deliver its value and impact.

There are 93 women (48%) and 99 men (52%) working at SIB

The median age at SIB is 45 years old, with a balanced pyramid of ages favouring knowledge exchange between early career scientists and senior experts.

The median length of service is 8 years

Geneva

72 employees

6 groups

Lausanne

92 employees

9 groups* including Management and support

Basel

23 employees

5 groups

Zurich

5 employees

2 groups

* Some of our groups have members in different cities, such as the Training group, with members in Lausanne, Bern and Basel.

Matthieu Fillon Senior Software Developer at SIB
“What motivates me most is thinking about how the solution that I contribute to developing can help improve users’ daily tasks and ultimately, have a positive impact on patients.”
Figures as of 1 January 2024

We prioritize people, foster inclusion, and embrace diversity and fairness. Our workplace encourages personal development, exposure to diverse projects and proactive contributions from employees.

A stimulating workplace with continuous learning and development opportunities

Bioinformatics is a fast-moving field, at the interface between industry, academia and the health sector. To keep up to date with the latest developments, foster exchanges across our community and acquire new skills, our experts benefit from a range of initiatives. This includes taking part in:

SIB-wide focus groups on cutting-edge scientific and transversal topics (SEE P. 40)

The training programme, as well as the tutorials, workshop sessions and talks organized at SIB’s yearly conferences.

SIB has 192 employees of 22 different nationalities

Additional initiatives were launched in 2023:

A SIB-wide Competency Sharing Platform to increase opportunities for employees to participate in different projects across the Swiss network

Learn@Lunch sessions about key scientific concepts of SIB’s main activities (e.g. theoretical concepts of bioinformatics, FAIR data, drug design)

Curated monthly selection of LinkedIn courses (e.g. public communication, project management, conflict resolution)


MEET OUR TEAMS

Comprising and headed by employees, our teams at the SIB Hub harness their expertise to collaborate with partners and other SIB Groups across our three pillars of activity.

Clinical Bioinformatics

“We support hospitals, public health and veterinary federal departments, as well as the private sector, to make the most of an exponential flow of data, to enhance diagnostics and pathogen surveillance, and foster optimal patient care and well-being. We do this through the software tools and AI-based methods we develop.”

EXAMPLES

Innovative AI-based approaches for molecular imaging analysis. Diagnostic applications (cancer, genetic diseases, etc.) for the medical and pharmaceutical domain. Collaborative platforms to enable data sharing for epidemic surveillance, research or clinical purposes.

TAGS human genetics; interoperability; infectious disease; oncology; outreach; personalized medicine; training; pathogen surveillance

Environmental Bioinformatics

“We recognize the need to coordinate efforts to address environmental challenges in a One Health perspective, from data collection and ingestion into open repositories, through to data analysis and the provision of tools and services that take full advantage of the data ecosystem.”

EXAMPLES

Actively participating in international initiatives such as the European Reference Genome Atlas (ERGA), Biodiversity Genomics Europe (BGE) and the ELIXIR Biodiversity Community.

TAGS agriculture; biodiversity; comparative genomics; open research data; data mining; evolutionary biology; functional genomics; text mining

Personalized Health Informatics

“To ensure high-quality care and patient safety in the long term, healthcare and research must go hand-in-hand. In the context of the Swiss Personalized Health Network (SPHN), we have thus made health-related data in Switzerland available for research in a lasting way. This is done through their FAIRification* and the national secure IT infrastructure BioMedIT.”

EXAMPLE

The framework to easily link data from all five university hospitals to accelerate biomedical research and facilitate data sharing within the SPHN consortium is in place. The onboarding of cantonal hospitals into the network is now ongoing.

TAGS information security; interoperability; personalized medicine; training

Robert Waterhouse

Supporting data science: a range of specialized teams

Other teams at the SIB Hub also provide non-scientific support to the community, including legal & technology transfer, communication or IT: see full list P. 18.

Swiss-Prot Knowledgebases

“As a competence centre for biocuration and knowledge management we develop, annotate and maintain internationally renowned knowledge resources. Our activities are thus crucial to the AI revolution in biology.”

EXAMPLES

Some of our flagship resources include: UniProtKB/Swiss-Prot, ENZYME, Rhea, SwissLipids, HAMAP, PROSITE and ViralZone.

TAGS biochemistry; database curation; lipidomics; metabolomics; ontology; proteomics; proteins and proteomes; semantic web; systems biology

Training

“Our training offer at SIB is unique: taught by key experts, constantly up-to-date and hands-on. It thus complements the teaching available at Swiss universities, with whom we collaborate. By bringing together attendees from academia and the industry, it also fosters collaboration.”

KEY FIGURES 2023

60 courses offered to the community by groups from across Switzerland and bioinformatics domains

66 experts and trainers

1,445 participants

Vital-IT – Computational biology

“As both computational biologists and software developers, we understand data and how to manage them, as well as the underlying biological questions. Our focus is on finding innovative approaches to data analysis, such as overcoming constraints related to sensitive data, or interconnecting data through knowledge graphs.”

EXAMPLE

Setting up federated data analysis systems across several countries to enable access to large patient cohorts while addressing legal, ethical and FAIR principles*.

TAGS data management; data mining; genome reconstruction; knowledge representation; machine learning; mass spectrometry; next-generation sequencing; personalized medicine; software engineering; systems biology

*FAIR: a set of guiding principles to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets

Alan Bridge Patricia Palagi

A GATEWAY TO SWISS-WIDE EXPERTISE

Through institutional partnerships and close collaborations with their affiliated academic groups, SIB unlocks access to national expertise.

Figures as of 1 January 2024

913 SIB Members, incl. 192 employees (SEE P. 22), across 28 partner institutions

GENEVA 9 GROUPS 109 MEMBERS INCL. 72 EMPLOYEES

YVERDON 1 GROUP 6 MEMBERS

LAUSANNE 23 GROUPS 310 MEMBERS INCL. 92 EMPLOYEES

FRIBOURG 4 GROUPS 18 MEMBERS

MARTIGNY 1 GROUP 6 MEMBERS

BASEL 16 GROUPS 192 MEMBERS INCL. 23 EMPLOYEES

BERN 4 GROUPS 39 MEMBERS

VILLIGEN 1 GROUP 5 MEMBERS

ZURICH 19 GROUPS 167 MEMBERS INCL. 5 EMPLOYEES

WÄDENSWIL 2 GROUPS 21 MEMBERS

ST GALLEN 1 GROUP 3 MEMBERS

Collaborative by design

SIB’s strength lies in its unique structure: in addition to 192 employees, over 700 members are affiliated to both SIB and major academic institutions in Switzerland. These 28 institutions are themselves partners that sit on the organization’s Foundation Council, SIB’s highest governing body. These strong partnerships across the country enable SIB to coordinate national initiatives and international projects with independence and efficiency. They also guarantee that the best expertise in the country will be found to meet the needs of the private partners that consult us.

DAVOS 3 GROUPS 17 MEMBERS

BELLINZONA 2 GROUPS 12 MEMBERS

LUGANO 2 GROUPS 8 MEMBERS

SCIENTIFIC FIELDS

The specific activities of our teams at the SIB Hub are fed by and benefit from those of our network, which cover the many areas of bioinformatics.

Genes and genomes

Life’s instruction manual

14

A genome is the sum of genetic material of an organism, including all of its genes. It is composed of DNA and contains all the information needed to create and maintain an organism, as well as the instructions on how this information should be expressed.

Bioinformatics develops tools to read genomes, store, analyse and interpret the resulting data.

Proteins and proteomes

The building blocks of life

10

A proteome is the sum of proteins expressed by a cell, a tissue or an organism, at a given time. Proteins are the products of genes, and are involved in nearly every task carried out within an organism – from transporting oxygen to fighting off pathogens.

Bioinformatics develops tools to understand the role of proteins.

Evolution and phylogeny

From ancestors to descendants

14

Changes that occur in genomes tell life scientists how an organism has evolved over time. Comparisons made between genomes from different species or populations tell them how they are related to one another – this is the field of phylogenetics.

Bioinformatics develops tools to compare the genomes of organisms, as well as computational methods to reconstruct their past and build their ‘family’ trees.

Number of groups per domain (only the groups that gave these themes as their main activities are listed)

Key resources on Expasy.ch (160+ tools and databases developed)

54 76 28

Structural biology

The third dimension

10

Macromolecules such as DNA and proteins have specific 3D structures that are dictated by their sequence. A protein’s function is defined by its 3D structure, which in turn defines the way it interacts with other molecules.

Bioinformatics develops software to create 3D models of proteins to study their interactions with other molecules, such as drugs.

Systems biology

Never alone

16

Life occurs and is sustained by a mesh of interactions within and between cells, tissues, organisms and their environment. Understanding how these complex systems function allows scientists to predict what happens if one of the components changes or the conditions are altered.

Bioinformatics methods help to predict metabolic pathways.

Machine learning and text mining

Rise of the machines

6

Machine learning (ML) techniques allow computers to learn from data without explicit instructions, and to draw inferences from data patterns. Text mining algorithms, often based on ML, are designed to recognize patterns within text, such as biomedical terms.

Bioinformatics is supported by and feeds into ML algorithms, with diverse applications including drug design, biomarker discovery and text mining to facilitate literature triage.

5 16 43

Core facilities

The means to an end

11

The quantity of data generated by the life sciences has grown exponentially over the years and needs to be stored and processed. Researchers also need support in making sense of their data. Core facilities centralize research resources, and provide tools, technologies, services and expert consultation to this end.

Bioinformatics core facilities support researchers with specific issues relating to the management and analysis of their datasets to make the most of them.

“Working alongside researchers in international collaborations allows us to measure the significant impact made by SIB in terms of scientific excellence, quality of governance and effectiveness in bringing Swiss bioinformaticians together under the same banner.”
Vincent Zoete, SIB Group Leader, University of Lausanne

ACTIVITY PILLAR I

Developing world-class, open biodata resources

Providing and maintaining essential databases and software to serve the fast-evolving needs of life scientists and clinicians is at the heart of our mission. Why do we need to foster best-in-class open science resources?

Bioinformatics resources, such as those developed at SIB, enable scientists worldwide to study life and to foster discoveries. Find out about the stringent selection process guiding their inclusion in the institute’s portfolio of leading open biodata resources and the professional support they benefit from throughout their life cycle. Our unique model helps them to reach and maintain the highest level of usefulness, accuracy and reliability for their over 10 million yearly users. Several SIB Resources have been recognized as essential for the international community.

International recognition as essential biodata infrastructure

Six of our 14 SIB Resources are recognized beyond our borders as being of fundamental importance to the worldwide life sciences community. This is a great tribute to the SIB Groups developing them, to SIB’s commitment to life science infrastructure and to Switzerland’s leading expertise in biological data.

EUROPE

Five SIB Resources are recognized as ELIXIR Core Data Resources (Cellosaurus, Rhea, SWISS-MODEL*, UniProt and STRING)

WORLDWIDE

Five SIB Resources are recognized as Global Core Biodata Resources (GCBRs), with Bgee*, Cellosaurus*, Rhea, UniProt and STRING.

*New in 2023

Massive volumes of data are produced by life science research and biomedical activities. The preservation of scientific knowledge and the reproducibility, impact and quality of science – all of which are key to maximizing public investments in research – rely on open data resources. Recognizing this, since 2000 the State Secretariat for Education, Research and Innovation has ensured stable funding for SIB to identify, support and develop essential bioinformatics resources.

Assessment and monitoring of the SIB Resources is done across a set of 30 indicators

“SIB’s funding is one of the few means available to sustain such bioinformatics services. The institute also plays an important role in the identification of connections and complementarities between resources and the presentation of a coherent portfolio of leading Swiss bioinformatics resources to the global research community.”

Becoming an SIB Resource: a robust selection process involving international experts

Every four years, our independent Scientific Advisory Board (SAB) recommends which resources from the SIB community should be included in the institute’s portfolio. Its recommendations are based on external peer reviews and its appreciation of the resources’ scientific impact and alignment with the institute’s mission. The SAB also provides guidelines for their continuous development, every two years. The level of support in terms of allocation of SIB employees to implement these guidelines is then decided by the institute’s Board of Directors. A new selection process for the financial period 2025-2028 will take place in 2024: a difficult task given the number of proposals and the tightness of available resources.

Professional support to help SIB Resources reach long-term sustainability and remain at the cutting edge

Our resources are supported throughout their life cycle to reinforce their impact and quality with a dedicated service offering that is unique in academia. Coordinated by the Biodata Resources team, it includes technical support, best practices and knowledge sharing, infrastructure hosting, user experience, communication, legal advice (e.g. best practice guidelines for open licensing), data protection and grant management.

A virtuous circle is thus created, where it is more likely that they will receive additional funding from other sources, thereby increasing their sustainability.

Facilitating access and reuse of computer-predicted protein structures

The SIB Resource ModelArchive, an open repository for sharing computationally determined protein structure models, has been awarded a grant by swissuniversities to expand open research data (ORD) principles. No fewer than six SIB Group Leaders are collaborating with international experts to develop this data resource, which complements the Protein Data Bank (PDB) dedicated to experimentally determined protein structures:

Andrea Cavalli, Institute for Research in Biomedicine, Bellinzona

Matteo Dal Peraro, EPFL

Markus Lill, University of Basel

Olivier Michielin, Geneva University Hospital

Torsten Schwede, University of Basel

Vincent Zoete, University of Lausanne

Marc Robinson-Rechavi, SIB Group Leader, University of Lausanne, SIB Resource Bgee

Artificial intelligence tools shed light on millions of proteins

Embracing the recent deep-learning revolution, SIB scientists have uncovered a treasure trove of uncharacterized proteins, leading to a publication in Nature. They constructed an interactive network of 53 million proteins with high-quality AlphaFold (AI-predicted) structures. Building on the expertise of the Schwede group in the SIB Resource SWISS-MODEL, they made the network available online as the “Protein Universe Atlas”. This work benefited from a dedicated kickstarter grant from SIB to encourage the adoption of AI in life science resources. (SEE QUOTE BELOW)

DOI: 10.1038/s41586-023-06622-3

“SIB provides incentives for collaborations among its resources, through its Biodata Resource team, the ‘SIB Resource day’, or funding for collaborative projects such as a funding call for innovative AI projects.”

TACKLING GLOBAL CHALLENGES THANKS TO OUR DATABASES AND SOFTWARE

Our biodata resources enable scientists to study life at different scales to tackle global challenges.

Scientific impact of SIB Resources in 2023

Measuring the impact of databases and software is no mean feat. The stories on this page highlight a few examples of how our resources enable discoveries with a clear societal impact, while usage figures tell another part of the story, showing their relevance for millions of users and how they boost innovation through patents.

AGRICULTURE

Advancing agricultural biotechnology

Understanding how genes control plant growth is crucial for improving agricultural technology, especially in dealing with environmental changes. This knowledge helps develop crop varieties that can thrive in different conditions and continue to produce enough food for a growing population.

EXAMPLE

OMA (part of SwissOrthology) helped the identification of genes in rice and other model species which correspond to barley genes important for seed germination. This enabled researchers to explore when and where genes are expressed during germination, which ultimately could lead to barley grain improvement.

DOI: 10.1093/nar/gkad521

17,041 citations* in peer-reviewed papers

11.3 million users

9,000 mentions in patents

* Source: Web of Science

MEDICINE & HEALTH

Enhancing anti-tumour immunity

Cytotoxic T cells are powerful actors in the anticancer immune response and as such, play a key role in current successful cancer immunotherapies. However, these cells are also prone to ‘exhaustion’ in the context of chronic stimulation of the immune system in human cancer.

EXAMPLE

The authors found that high levels of the protein SNX9 play a role in T-cell exhaustion. Using ISMARA (part of SwissRegulon) to investigate the cause of these high expression levels, they discovered that two other proteins, NFATC and NR4A1/3, are key drivers of T-cell exhaustion. This finding could help develop better treatments for cancer.

DOI: 10.1038/s41467-022-35583-w

From atomic interactions and molecules, to cells, organs and populations, SIB software and databases enable scientists from around the world to make discoveries that help address the world’s most pressing challenges.

ENVIRONMENT

Understanding carbon fixation processes

Cyanobacteria are special organisms that can perform photosynthesis, like plants. They could play an important role in reducing the levels of CO2 produced by humans. Understanding how they capture carbon could help us enhance this process.

EXAMPLE

With the help of SWISS-MODEL, researchers were able to figure out the shape of an enzyme regulating the process. In that shape they were able to identify specific sites that control how much carbon the cyanobacteria absorb from the air.

DOI: 10.1038/s42255-023-00831-w

MEDICINE & HEALTH

The biological processes of Autism Spectrum Disorder (ASD)

ASD is a neurodevelopmental disorder characterized by communication deficits and repetitive behavioural patterns. While genetic factors are known to play a role, a better understanding of the mechanisms in play is needed.

EXAMPLE

Using TopAnat (part of Bgee), researchers were able to pinpoint specific brain regions associated with biological processes disrupted in ASD. These are thus plausible targets for drug development.

DOI: 10.3390/biomedicines11112971

The 14 SIB Resources, representing 37 tools and databases in total

ASAP Automated single-cell analysis portal

BGEE Gene expression expertise

CELLOSAURUS Cell lines knowledge resource

GLYCO@EXPASY Zooming in on web-based glycoinformatics resources

NEXTSTRAIN Real-time tracking of pathogen evolution

RHEA Knowledgebase of biochemical reactions

STRING Protein-protein interaction networks and functional enrichment analysis

SWISSDRUGDESIGN Widening access to computer-aided drug design

SWISSLIPIDS Knowledge resource for lipids

SWISS-MODEL Protein structure homology-modelling

SWISSORTHOLOGY One-stop shop for orthologs

SWISSREGULON PORTAL Tools and data for regulatory genomics

UNIPROTKB/SWISS-PROT Protein knowledgebase

V-PIPE Viral genomics pipeline


ACTIVITY PILLAR II

A centre of excellence delivering life science data solutions

We offer data solutions fostering innovation, health advances and societal impact on a day-to-day basis to academic, clinical, governmental and industry partners.

As a leading independent scientific foundation specialized in life science data, our teams of data scientists, software developers, biocurators and computational biologists provide professional services and robust solutions to partners in Switzerland and abroad.

Our service offering spans across sectors

We have over 25 years of experience in successfully delivering data science services to the academic and private sectors, in large European public-private partnerships as well as contracts or Innosuisse schemes with biotech/biomed start-ups, SMEs, large companies and hospitals. Our solutions can be applied to any life science field, from agriculture to environment and medicine. In particular, we have extensive experience in managing and analysing data in projects around infectious diseases, diabetes and precision oncology.

We boost innovation across a range of applications

Here are some examples of how we support our partners with solutions at different stages of innovation:

– IN PHARMACEUTICAL R&D: discovery of novel biomarkers for Type 2 diabetes.

– IN ONCOLOGY DIAGNOSTICS: development of Oncobench®, a platform used routinely at the Geneva University Hospital to help interpret genetic variants in patients’ tumours.

– IN MANUFACTURING PROCESS OPTIMIZATION: development of a bacterial genotyping analysis tool.

– IN PUBLIC HEALTH SURVEILLANCE: development of the SARS-CoV-2 Swiss Data Hub to support the Federal Office of Public Health.

New European public-private partnerships

In 2023, our Vital-IT Computational Biology group joined several new pan-European projects to provide expertise in data management, analysis and AI:

PREVENTING CHILDHOOD OBESITY with the OBELISK project, funded by Horizon Europe, UK Research and Innovation and the Swiss State Secretariat for Education, Research and Innovation (SERI). The project brings together 15 partners, including SIB, from 9 countries across Europe.

PERSONALIZED STRATEGIES TO BETTER TACKLE CARDIOVASCULAR DISEASES with the iCARE4CVD project, funded by the Innovative Health Initiative. The project brings together 33 international partners to better understand cardiovascular diseases and optimize future prevention and treatment. SIB brings its expertise in data management and harmonization to enable the AI-supported analysis of multiple cohorts, representing data from over 1 million patients.

EXPLORING THE CAUSES OF INFLAMMATION IN ARTHRITIS with the Endotarget project, funded by the European Union and SERI.

Renewed partnership with Lunaphore: a second Innosuisse grant awarded

An initial Innosuisse grant enabled a collaboration between our Clinical Bioinformatics group, Lunaphore – a Bio-Techne brand – and the Geneva University Hospitals. The aim was to generate machine-learning solutions to power the analysis of molecular imagery and better understand the tumour environment. Thanks to a second Innosuisse grant, a new collaboration started in 2023 to develop AI-based assay development tools. These tools will enhance the automated approach of Lunaphore’s COMET™ platform and further accelerate spatial biology adoption in any research project.

“This is a unique project that will empower scientists to reach their research goals more efficiently and develop the next generation of personalized therapies.”

By bringing value to data, we remove barriers to innovation and ensure the societal impact of research

Data produced in the lab

Specialized context

Stored in ad hoc format

Low added value

Organized and connected data thanks to SIB

Understandable and reusable by all (FAIR) Maximized value

Ready for trustworthy AI

Innovation Health advances Societal impact

OUR END-TO-END SERVICE OFFERING

We provide expertise in a wide variety of services, from knowledge graphs to AI modelling. This is combined with an in-depth understanding of the underlying life science data and professional support, such as data protection and project management.

We are experts in all kinds of life sciences data:

Clinical data

Text, including scientific literature and clinical reports

Molecular imaging

Multiomics and spatial analysis

Biostatistics and bioinformatics analysis

Making sense of life science data

We harmonize, integrate and analyse all kinds of data from a range of technologies to enable discoveries. Our areas of expertise include molecular imaging analysis, de novo assembly of sequencing data, functional analysis, multiomics data integration, machine learning and more.

Data stewardship and management

Organizing data for long-term reuse

We assist with defining and implementing Data Management Plans (DMP) for research proposals and funding applications, as well as reaching data interoperability targets, from local to international scales. We do this within academic or regulated environments and ensure long-term management, expert annotation and storage of biological data. This involves making data FAIR and harmonizing datasets.


We also offer professional support

Working with SIB also means benefiting from added value: in addition to scientific expertise, we support you to ensure the solutions we deliver help you in the most efficient way and over the long term:

Legal and technology transfer

IT infrastructure

Data security

Data protection and GDPR

Project management and coordination


Coordination – bringing key partners and data together in large-scale projects

Maximizing the discoveries that can be made from life science data requires massive coordination efforts among disciplines and institutions, and alignment between Swiss and international practices.

Thanks to its multidisciplinary nature, bioinformatics is at the crossroads of life science needs and developments. Our coordination activities thus aim to facilitate collaboration between disciplines, institutions and across borders, and optimize investments in science. This involves, for instance, establishing common standards to make open data and knowledge universally understandable and usable. We do it thanks to our strong national network, by connecting Swiss bioinformatics with the world and by implementing impactful research infrastructure.

“Acting as the coordinator of such a European project is a recognition of both our long-term expertise in open research data and of our capability in bringing together multidisciplinary actors on a large scale. TRIPLE aims to build a cornerstone on which integrated searches can be performed over public-private research data.”

Leading an international project to integrate public-private data for discoveries

WHAT

TRIPLE is a newly funded initiative that will enable an unprecedented level of interoperable data sharing between researchers from any science domain.

HOW

Access to both public and selectively shared private research data will be facilitated through innovative solutions. Coordinated by SIB, harnessing its expertise in data FAIRification, knowledge representation and open databases, the project brings together partners from Belgium and the Czech Republic.

EXAMPLE OF IMPACT

Through the discoveries it will enable, TRIPLE will benefit the scientific community at large as well as society, with one of its first applications being the search for organisms that could help degrade pollutants.

ACTIVITY PILLAR III

Coordinating international alignment in research data management

WHAT

A new swissuniversities-funded project, the Swiss Research Data Support Network, was accepted in 2023. Its aim is to establish a robust and inclusive research data support community. SIB’s Training group is coordinating the international alignment efforts.

HOW

The Network brings together all Swiss universities and higher schools, as well as all the key actors, from researchers to librarians and data stewards. It will further the aim of the Swiss Data Stewardship Environment, funded by swissuniversities in 2022, to set up a national training programme for data stewards. SIB Training is leading the life science track there.

EXAMPLE OF IMPACT

The Swiss Research Data Support Network will facilitate collaboration, knowledge sharing and best-practice development among Swiss and international institutions to make scientific data ready for reuse. This will optimize public investments in research, and therefore its benefits for society.

Our data coordination role in a large pan-European biomedical project against obesity

WHAT

As part of the public-private research consortium ‘SOPHIA’ (Stratification of obese phenotypes to optimize future obesity therapy), we coordinate the federation of the project’s clinical data from across Europe to enable collaborative analysis by researchers and accelerate discoveries, while preserving patients’ data privacy.

HOW

We created a federated database of 16 harmonized cohorts (i.e. groups of patients with the same disease who are monitored over time), enabling remote or local analysis on individual or multiple cohorts – introducing a new way of working thanks to FAIR data management.

DOI: 10.3390/LIFE14020262

EXAMPLE OF IMPACT

A first study by project partners using the federated database aimed at predicting weight loss after bariatric surgery using a statistical model. We are involved in a follow-up study to further refine the prediction by integrating omics data collected within the cohort, to identify molecular biomarkers of weight loss and help prioritize surgical interventions.

SIB Bioinformatics Awards 2023: worldwide recognition for excellence in the field

Every other year, SIB coordinates the selection and recognition of excellence in bioinformatics on the international level. The 2023 winners are:

Viktor Petukhov (University of Copenhagen), PhD Paper Award for “Innovative approaches for imaging-based transcriptomics”

Maria Brbić (EPFL), Early Career Award for her dedication to research in biomedicine with a commitment to Equality, Diversity and Inclusion

OpenGenomeBrowser, Innovative Resource Award for facilitating the visualization of complex genomic data



A LIVELY SWISS-WIDE COMMUNITY

Our Swiss network is characterized by a strong community spirit enabling innovation and collaboration. Discover the community initiatives making progress or newly launched.

First Bike to Work event and launch of a carbon assessment study

The EcoImpact focus group addresses questions such as “What is the environmental impact of SIB’s activities?” and “How can current practices in computer science be more sustainable, while preserving scientific competitiveness?”. In 2023, it notably:

Organized the first SIB-wide ‘Bike to Work’ event, where 16 teams and 60 participants rode 17,681 km, saving 2,546 kg of CO2 emissions;

Launched a carbon assessment study to evaluate the carbon footprint of our IT infrastructure.

Fostering exchanges and bioinformatics know-how among doctoral students

The PhD Training Network provides a supportive community for doctoral students undertaking bioinformatics research in Switzerland. In 2023:

The very first “Alumni Career Path” retreat took place, with 13 SIB PhD Training Network alumni invited, bringing together a record number of over 50 participants;

A Summer School was also organized in the Alps together with the French Institute of Bioinformatics (IFB), about multiomics data analysis and integration, bringing together 40 participants from across Europe.

Coming together to advance cutting-edge and specific topics

Focus groups have been established across the community to foster exchanges of knowledge and encourage collaborations across disciplines. In addition to ‘Epigenomics’, ‘Single-cell omics’, ‘Diversity’ and ‘EcoImpact’, a new focus group was created: Semantic Web of data. Its aim is to work towards solutions for the seamless integration of life science datasets and databases.

The group led a 2023 paper highlighting the interoperability among SIB Resources, a key to making new discoveries.

DOI: 10.1093/nar/gkad902


Initiatives to promote diversity and inclusion

SIB is committed to fostering equality, diversity and inclusion (EDI) for members of all backgrounds. Dedicated to these questions, the Diversity focus group leads activities around three main pillars: science; institutional and organizational; and academic advocacy. In 2023, it notably:

Contributed to SIB’s Gender Equality Plan (GEP), part of the eligibility criteria of the Horizon Europe programme;

Organized a “Lunch & Learn” on sex and gender and a group outing to the “Queer – Diversity is in our nature” exhibition at the Natural History Museum in Bern, and published regular communications in SIB’s newsletters to members and employees.

Increasing Swiss bioinformatics’ visibility at scientific conferences

In addition to SIB’s scientific conference [BC]², organized in Basel and bringing together about 500 scientists from Switzerland and abroad (SEE P. 16), the institute supports and participates in a range of events. This is part of SIB’s mission to promote knowledge exchange across sectors, its service offering and the widespread expertise of its national network. It is also a way to foster new collaborations with academia and industry. In 2023, we took part in the following events:

ORGANIZED BY OUR SCIENTISTS:

Perspectives on AI Symposium Series: Data Science in Oncology, Martigny

Clinics meets Data Science symposium, Zurich

SWAT4HCLS, Basel

ORGANIZED BY OUR PARTNERS, WITH THE PARTICIPATION OF SIB SCIENTISTS:

International Society for Biocuration Annual Conference, Padua (Italy)

LS2 Annual Meeting “Life on earth: coping with challenges”, Zurich

ISMB / ECCB, Lyon (France)

BioAlps Networking Day, Geneva

OTHER CONFERENCES:

Life Science Zurich Impact Conference “Data for Health”, Zurich

BioTechX Europe, Basel


Spotlight on the science produced by our community

Enlightening findings of our members that were shared with the global scientific community and general public through science news, press releases and talks.

248,000

visitors to the SIB website

18,850

followers on LinkedIn

Streamlining analysis of data generated from mass spectrometry

Mass spectrometry is an essential tool for measuring the amounts of different proteins in biological samples. To streamline the analysis of data obtained with this method, the software package “prolfqua” was developed and presented in this in silico talk*.

GROUP INVOLVED

Proteome Informatics, led by Christian Panse

Published in the Journal of Proteome Research

DOI: 10.1021/acs.jproteome.2c00441

WATCH THE IN SILICO TALK ABOUT THE PAPER

* in silico talks inform bioinformaticians, life scientists and clinicians about the latest advances led by SIB Scientists on a wide range of topics in bioinformatics methods, research and resources.

Single-cell data science enables recent breakthroughs in cancer research

As the possibilities of data acquisition to study cancer widen, so do the methods used to investigate and reveal the nature of this complex disease. Recent illustrations of this are the advances in the field of immunotherapy led by clinical researchers in Switzerland, empowered by the expertise of SIB scientists in single-cell expression data analysis.

GROUP INVOLVED

Translational Data Science, led by Raphael Gottardo

Published in

Science DOI: 10.1126/science.abl7207

Nature DOI: 10.1038/s41586-022-05605-0

Immunity DOI: 10.1016/j.immuni.2022.12.006

Machine learning to accelerate discovery of antimalarial properties in plants

Researchers, supported by ML models developed at SIB, reviewed 21,000 species from three plant families. They estimate that over a third could have antimalarial properties and merit further investigation using a proposed predictive framework. They also believe at least 1,300 active anti-Plasmodium species may have been missed using conventional approaches.

GROUP INVOLVED

Computational Evolutionary Paleobiology, led by Daniele Silvestro

Published in Frontiers in Plant Science DOI: 10.3389/fpls.2023.1173328


Unravelling why partners often share similar traits

From education to blood pressure, partners in couples tend to present striking trait similarities. For the first time, the reasons for this are teased apart thanks to advanced statistical methods and large public datasets. The study showed how a combination of factors, from initial partner choice to convergence over time, contributes to couples sharing similar traits.

GROUP INVOLVED

Statistical Genetics, led by Zoltán Kutalik

Published in Nature Human Behaviour

DOI: 10.1038/s41562-022-01500-w

Novel insights into kidney metabolism

The processes which lead to the production of urine follow a rhythm driven by molecular mechanisms called the “circadian clock”. Researchers set out to investigate how this circadian clock controls metabolism in the kidney, and found evidence for a rhythmical expression of portions of genes, proteins and metabolites in the mouse throughout the day. The study also provides a unique resource for the understanding of rhythmic kidney physiology.

GROUP INVOLVED

Vital-IT, led by Mark Ibberson

Published in The Journal of Clinical Investigation

DOI: 10.1172/JCI167133

A lethal fungus with unprecedented abilities to evade the human immune system

The Pneumocystis fungus causes lethal pneumonia in immunocompromised patients, such as those receiving organ transplants. An international study shows its unprecedented ability to outwit the human immune system, through a new molecular mechanism. New therapeutic strategies to combat the disease could derive from these insights.

GROUP INVOLVED

Vital-IT, led by Mark Ibberson

Published in Nature Communications

DOI: 10.1038/s41467-023-42685-6

Explore all our news and subscribe to our newsletter to keep up-to-date throughout the year.


SIB REMARKABLE OUTPUTS 2023

Discover the best achievements and work produced by our scientists over the last year.

Staying abreast of the latest advances and bright ideas emerging

Decoding the link between tissue architecture and cell plasticity

DOI: 10.1038/s41588-023-01588-4

GROUP INVOLVED

Computational Systems Oncology, led by Giovanni Ciriello, Lausanne

DESCRIPTION

CellCharter aims to uncover and analyse communities of cells within tissues, providing deeper insights into cellular behaviour and interactions in various biological environments, such as cancer.

WHAT THE COMMITTEE SAID ABOUT THE WORK

“CellCharter is a remarkable tool to identify, characterize and compare cell communities in spatial -omics datasets. It integrates existing approaches and provides a generally applicable, performant method.”

GROUP INVOLVED

Cellular Consequences of Genetic Variation, led by Pedro Beltrao, Zurich

DESCRIPTION

AlphaFold Clusters is a user-friendly web interface, which allows access to around 2 million clusters of related proteins to study their novelty and evolution.

WHAT THE COMMITTEE SAID ABOUT THE WORK

“Recognized by nearly 50 citations since its publication, this work has significantly contributed to the field of life science by providing a meticulously organized, readily accessible and freely available dataset of protein clusters.”

GROUPS INVOLVED

Training, led by Patricia Palagi & IT, Lausanne

DESCRIPTION

GitHub and GitLab contain many repositories with excellent bioinformatics training materials. Glittr.org enables users to find, compare and reuse these effortlessly.

WHAT THE COMMITTEE SAID ABOUT THE WORK

“Glittr.org provides a unique portal to search for and access a variety of publicly available training material. Making training resources FAIR is a particularly important contribution, which Glittr.org managed to achieve in a noticeably short amount of time.”


Mining tumour mutation trees for evolution-guided precision oncology

DOI: 10.1038/s41467-023-39400-w

GROUP INVOLVED

Computational Biology Group, led by Niko Beerenwinkel, Basel

DESCRIPTION

By learning reproducible evolutionary patterns and complex interactions among cancer mutations, TreeMHN can better predict tumour progression and facilitate evolution-guided precision oncology.

WHAT THE COMMITTEE SAID ABOUT THE WORK

“TreeMHN’s computational method represents outstanding progress in the life science field. It enables better processing of single cell-based tumour mutation data for the representation and, potentially, prediction of cancer trajectories.”

Navigate the vast universe of proteins

www.uniprot3d.org/

GROUP INVOLVED

Computational Structural Biology, led by Torsten Schwede, Basel

DESCRIPTION

The Protein Universe Atlas is a web service that allows users to navigate a universe populated with millions of known proteins. It also contains proteins whose function cannot yet be predicted.

WHAT THE COMMITTEE SAID ABOUT THE WORK

“The Protein Universe Atlas is a groundbreaking resource for exploring the diversity of proteins. Its user-friendly web interface empowers researchers, biocurators and students in navigating the “dark matter” to explore proteins of unknown function.”

Improved predictions of antigen presentation reveal new structural insights

DOI: 10.1016/j.immuni.2023.03.009

GROUPS INVOLVED

Computational Cancer Biology, led by David Gfeller, in collaboration with Computer-aided Molecular Engineering, led by Vincent Zoete, Lausanne

DESCRIPTION

The study improves predictions of ligands targeted by specific immune cells and reveals a unique way some of these ligands bind to their receptor. These results may contribute to accelerating personalized immuno-

Predicting how single cells react to perturbations

DOI: 10.1038/s41592-023-01969-x

GROUP INVOLVED

Biomedical Informatics, led by Gunnar Rätsch, Zurich

DESCRIPTION

CellOT is a mathematical framework that can learn and predict the complex and heterogeneous perturbation responses of individual cells, opening the path for a better understanding of cell therapies.

WHAT THE COMMITTEE SAID ABOUT THE WORK

“The paper is a significant step forward in terms of understanding the heterogeneous response of different cell


Generative AI: new horizons

ChatGPT and a wealth of other generative AI models are not only disrupting our everyday lives, but also science. Find out how SIB is embracing this revolution across domains, while applying its unique expertise to tackling the new challenges and shortcomings these technologies present.


Generative AI combined with bioinformatics: a wide range of applications

What happens when generative AI is trained on scientific texts from the biological literature, databases, genetic sequences or code? Explore its applications across bioinformatics.

Remaining at the leading edge of new technological developments is among SIB’s strategic objectives (SEE P. 17). Disrupters such as generative AI have been very much in the spotlight since the release of ChatGPT. But even before then, our scientists had been embracing cutting-edge AI techniques to boost research, to improve and innovate on tools, and to accelerate discovery across life science domains. These approaches are now part of our toolbox: discover how they open new horizons, from studying biodiversity to supporting medical treatment planning.

“While these models show promise for supporting clinical work, they can still provide incorrect information, suggesting caution in their use for medical purposes. Nonetheless, as these models continue to improve, they hold potential for influencing clinical practice.”
Janna Hastings, SIB Group Leader, University of Zurich and University of St. Gallen

Of generative AI and LLMs

Generative Artificial Intelligence (AI) encompasses systems capable of creating new content, from text and images to videos, music and much more. Large Language Models (LLMs), a key type of generative AI, are trained on extensive text data, which can include genetic sequences or computer code, to summarize, generate and predict new content. Models such as ChatGPT and BioBERT exemplify this, with ChatGPT excelling in generating text for chatbots and creative writing, while BioBERT focuses on (i.e. is pre-trained on) biomedical text. LLMs employ deep-learning techniques, particularly transformers, to analyse and understand language patterns from vast datasets, and to predict the next ‘word’ or sequence of words based on context.
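
The principle of predicting the next ‘word’ from context can be shown with a deliberately tiny Python sketch. It is not how an LLM works internally (there is no transformer or deep learning here): it simply counts which word most often follows another in a made-up sentence. The corpus and the functions `train_bigrams` and `predict_next` are hypothetical, for illustration only.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count which word follows which in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    candidates = following.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

# Made-up training text (illustrative only)
corpus = "the protein binds the receptor and the protein folds"
model = train_bigrams(corpus)
prediction = predict_next(model, "the")
```

Here ‘the’ is followed twice by ‘protein’ and once by ‘receptor’, so the sketch predicts ‘protein’. LLMs replace these simple counts with learned parameters and take much longer contexts into account.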

Answering medical questions in radiation oncology

In an exploratory study involving the SIB Group of Janna Hastings, the remarkable ability of ChatGPT to answer questions in the medical field was tested in the specialized case of radiation therapy. It answered most multiple-choice questions accurately (94%), but fared less well on open-ended questions, as assessed by oncologists (48%). This inconsistency makes such models unsuited as a self-contained source of medical information, but their language capabilities make them an exciting new user interface for databases and guidelines.

DOI: 10.1016/j.adro.2023.101400


“Integrating data from databases with different naming standards can be daunting for researchers aiming to access comprehensive knowledge on a subject. Our approach to broadening access to open data stands out by its flexibility, as it employs various AI models tailored to specific tasks.”

Understanding how insects shed their skin

Arthropods, such as insects and spiders, are Earth’s most diverse creatures, vital for nature, farming and health. The periodic shedding of their outer shell, called moulting, is key to their adaptability. However, to study this process, an integrated reference for arthropod naming is lacking. As part of a Sinergia collaboration, the SIB Groups of Marc Robinson-Rechavi and Frédéric Bastian and of Robert Waterhouse integrated species name data with sequence data from different public databases using generative AI methods into the MoultDB resource, serving as a reference for the field.

Fast generation of custom antibodies to fight disease

Monoclonal antibodies are special proteins produced in the laboratory. By cloning a single type of immune cell, it is possible to obtain a large quantity of identical antibodies that can recognize and bind to their target with high precision. These targets include, for instance, germs or diseased cells. However, their traditional discovery is very time-consuming. The SIB Group of Andrea Cavalli is working on AntibodyGPT, a language model to predict the chemical structure of an antibody with a desired property, to accelerate their development.

Liberating knowledge on species from the literature

As part of their resource BiodiversityPMC, the SIB team of Patrick Ruch used LLMs to read digitized publications and to answer users’ questions about species’ names, genes and interactions (ALSO READ P. 61).

www.sibils.text-analytics.ch/search

A game changer for bioinformatics teaching

At the 2023 Intelligent Systems for Molecular Biology (ISMB) conference, SIB’s Head of Training, Patricia Palagi, presented the benefits (e.g. self-learning, and discovering resources) and challenges (e.g. changes in teaching practices and critical thinking) posed by the generative AI revolution in bioinformatics teaching. The bottom line: “Trainers and students need to be prepared to deal with these new technologies and to adopt them in teaching and learning”.

www.zenodo.org/records/8187885


Deciphering the hidden role of RNA in cancer

The SIB group of Raphaëlle Luisier is teaming up with experts in Natural Language Processing at SIB and IDIAP to study RNA, molecules which carry genetic instructions and help make proteins in living cells. They are interested in parts of RNA that do not directly code for proteins, and how they affect complex human disorders, such as neurodegeneration and cancer. In melanoma, a type of skin cancer, some treatments do not work well over time, especially drugs called BRAF inhibitors, and RNA could play a role.

“LLMs possess a unique capacity to autonomously comprehend relationships within biological data. This is particularly promising in the field of genomics, where the complex nature of RNA and DNA presents formidable challenges for conventional analysis techniques.”
Raphaëlle Luisier, SIB Group Leader, IDIAP

Conversing with complex biological databases

Can technologies like ChatGPT support life science researchers in exploring data they are not familiar with? This is the question our new Knowledge Representation unit investigated, through concrete examples from SIB’s leading open databases and software tools. They showed the potential of conversational AI to describe biological datasets, as well as generate and explain complex queries across them. While the benefits include leveraging the wealth of open data, authors also stressed that caution should be exercised in the process.

DOI: 10.48550/arXiv.2304.1042

Fancy a chat with Expasy, the Swiss bioinformatics resource portal?

Expasy brings together over 160 databases and software developed by SIB Groups on a platform enabling life scientists to search, filter and get suggestions as to which tool(s) could best help them in their research. A new project to integrate LLMs into Expasy aims to make queries on specific biological questions possible in natural language for the user (e.g. “Which are the genes, expressed in the rat, corresponding to human genes associated with cancer?”), by seamlessly retrieving information from the various underlying resources. SIB’s Semantic Web focus group is working on this together with the Biodata Resource team.


Generative AI and biocuration: a virtuous cycle

The interplay between the possibilities offered by AI, and LLMs in particular, and the importance of human expertise is well illustrated in the context of biocuration, where SIB is a recognized leader. Biocuration is the art of expertly extracting knowledge from the biological and biomedical literature to build an accurate, reliable and up-to-date encyclopedia serving science at large.

“LLMs have the potential to help the scale-up of literature curation and guide it. However, domain expertise is essential to benchmark the models and to develop filters for the knowledge that LLMs can generate from the literature.”
Alan Bridge, Director of SIB’s Swiss-Prot group

Domain experts train general AI models

AI supports biocuration, removing the more mundane tasks to let the experts focus on what they do best. The high-quality, structured and labelled datasets produced by biocuration are key to serving as training sets for AI models.

AI models support knowledgebase creation

Supporting expert curation thanks to AI

LLMs and other AI tools are used to support biocurators in identifying nuggets of relevant knowledge from the vast and ever-increasing biomedical literature to integrate into databases. The cover of this Profile features an example of how LLMs support biocuration at the SIB Swiss-Prot group. The image illustrates a map of all enzyme-driven biochemical reactions extracted from the PubMed literature database using LLMs, overlaid with those already documented in SIB’s Rhea reaction knowledgebase, to help biocurators identify those to annotate as a priority. Colours denote whether the activity is described in Rhea (red), extracted from PubMed using LLMs (green) or both (pink). Enzymatic activities are grouped based on their similarity in terms of compound structure change.

The ability of LLMs to extract this type of knowledge was “fine-tuned” using a newly curated collection of scientific literature about enzymes. This work was performed in collaboration with the PubMed team at NCBI.

Predicting protein structure, function and sequence thanks to high-quality data

The function of a protein is a pivotal piece of information to understand molecular processes involved in disease, drug development or enzymatic activity. This function results from the protein’s 3D structure, itself determined by its sequence of amino acids. Today, generative AI models can be used to predict:

A protein structure from its sequence, which could be used to design new drugs that bind it.

A protein function from its sequence, which could help to annotate a newly assembled genome, the blueprint for life.

A protein sequence that could carry out a specific function, such as degrading an environmental pollutant.

For this, many models, from Google DeepMind’s AlphaFold to ProtGPT2, are trained on the universal protein knowledgebase UniProt, which is co-developed by SIB and in which proteins are extensively and reliably curated.


Taming the AI beast and its challenges through key expertise

At times seen as a threat, at others as an opportunity, the advent of generative AI is profoundly impacting our society and science. SIB scientists are aware of the challenges and are actively tackling them.

Applications of generative AI in bioinformatics already span a wide diversity of topics. However, one message cuts across these examples: there are no one-size-fits-all models, and caution must be exercised to ensure the benefits outweigh the costs. The road to trustworthy and ethical AI is indeed paved with challenges, from inaccuracies and toxic biases to environmental impact. SIB is the ideal environment where domain expertise and high-quality data come together to lead to AI models that benefit research and society alike.

Need for large quantities of high-quality data

To generate accurate predictions and outputs, but also to avoid biases that can lead to inequalities and ethical issues, models need to be trained on reliable, structured, labelled data.

Democratizing data to make them accessible and understandable by both humans and machines is at the heart of our work. We do this by ensuring our datasets follow the FAIR (Findable, Accessible, Interoperable and Reusable) principles, such as through knowledge graphs, i.e. maps showing how different pieces of knowledge are connected to each other (for instance a species, its genes, proteins and their bioactivity), helping us understand relationships and find useful insights more easily.
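
As a minimal sketch of what such a knowledge graph looks like to a machine, the Python example below stores a handful of facts as (subject, relationship, object) triples and walks from a species to its genes, proteins and their activities. The relationship names and the helper functions `objects` and `activities_for_species` are hypothetical, and the few facts are chosen for illustration only. Production knowledge graphs typically rely on standards such as RDF and SPARQL rather than Python lists.

```python
# Each triple links two entities through a named relationship.
triples = [
    ("Homo sapiens", "has_gene", "INS"),
    ("INS", "encodes", "Insulin"),
    ("Insulin", "has_activity", "hormone activity"),
    ("Homo sapiens", "has_gene", "HBB"),
    ("HBB", "encodes", "Hemoglobin subunit beta"),
    ("Hemoglobin subunit beta", "has_activity", "oxygen binding"),
]

def objects(subject, predicate):
    """Follow one kind of edge out of a node."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def activities_for_species(species):
    """Walk species -> gene -> protein -> activity."""
    found = {}
    for gene in objects(species, "has_gene"):
        for protein in objects(gene, "encodes"):
            found[protein] = objects(protein, "has_activity")
    return found

result = activities_for_species("Homo sapiens")
```

Because every fact has the same simple shape, new sources can be added as further triples without changing the query logic, which is what makes this representation convenient for integration.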

The Swiss AI initiative aims to leverage the new Alps supercomputer of the National Supercomputing Centre, to build academic instances of ChatGPT-like models. SIB scientists, including the group of Fabio Rinaldi, our Knowledge Representation Unit and Swiss-Prot group, are contributing data and use cases to the project, such as the universal protein knowledgebase UniProt. Incorporating such authoritative sources of knowledge will help ensure advances towards trustworthy AI.

“Knowledge graphs are a powerful tool to connect and integrate insights from various sources. LLMs have the potential to revolutionize the way we interact with data by enabling us to directly interrogate them. Thus, knowledge graphs are an ideal complement to LLMs, in that they provide accurate and up-to-date information, covering domain-specific data that is not otherwise captured.”
Ana-Claudia Sima, Knowledge Representation Manager at SIB

Environmental impact

The larger the model, the more computing power and time it needs to run, with a distinct impact on our carbon footprint.

Our teams fine-tune models to ensure the best fit depending on the needs, from domain-specific models trained on datasets such as PubMed with relatively few parameters, to general language models like GPT-4 with much larger training datasets and many more parameters. An SIB-wide focus group is also dedicated to studying the environmental impact of our IT activity (SEE P. 40).

Finding the appropriate model

Researchers need to navigate a maze of increasingly diverse LLMs, each with its own specificities and prior training sets.

The benchmarking performed by SIB experts among models in specific domains (e.g. biodiversity, proteins and clinical) serves as a guide to researchers worldwide.

DOI: 10.48550/arXiv.2404.14209

Hallucinations

We have all witnessed mistakes in ChatGPT’s answers. But they may not be obvious if you are not an expert on the topic.

Critical evaluations are done by SIB’s domain experts, who excel at evaluating the models and who are able to interpret and detect mistakes in their answers. This is done, for instance, by developing specific tests to check the model’s output, such as mapping LLM-extracted biochemical reactions onto known ones to identify hallucinations.
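
The kind of test mentioned above can be sketched in a few lines of Python: reactions extracted by a model are compared with a curated reference set, and anything that does not match is set aside for expert review. This is a simplified illustration, not the procedure used at SIB. The reference set, the extracted reactions and the `normalize` function are hypothetical, and stoichiometry is ignored.

```python
def normalize(substrates, products):
    """Order-independent, case-insensitive key for a reaction."""
    return (frozenset(s.lower() for s in substrates),
            frozenset(p.lower() for p in products))

# Hypothetical reference set of curated reactions
known = {
    normalize(["ATP", "H2O"], ["ADP", "phosphate"]),
    normalize(["urea", "H2O"], ["CO2", "NH3"]),
}

# Hypothetical reactions extracted from text by a language model
extracted = [
    (["H2O", "ATP"], ["phosphate", "ADP"]),
    (["urea", "H2O"], ["glucose"]),
]

confirmed, to_review = [], []
for substrates, products in extracted:
    target = confirmed if normalize(substrates, products) in known else to_review
    target.append((substrates, products))
```

An unmatched reaction is not necessarily a hallucination: it may also be genuine knowledge that has not yet been curated, which is why the final decision stays with a domain expert.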

Privacy concern for sensitive data

Unwanted third-party access to sensitive data such as personal information is a concerning aspect of the widespread use of LLMs.

The SIB Group of Janna Hastings, working with sensitive clinical data (e.g. historical clinical notes), is, for instance, setting up local instances of open-source models to enable clinicians to use the technology for real-world studies, without publicly sharing sensitive information.

“Specialized knowledge is essential when using LLMs, to interpret results accurately, maintain relevance and understand the intricacies of life science data. Model fine-tuning such as that performed at SIB is also key.”

Fabio Rinaldi, SIB Group Leader, IDSIA USI-SUPSI

Interdisciplinary work between model developers and domain experts

To improve the explainability and accuracy of LLMs, it is crucial that developers and domain experts work hand-in-hand.

As bioinformaticians and computational biologists, we have both the biological domain expertise and the ability to evaluate which algorithms are appropriate in a given context. This makes us strategic partners in the dialogue with LLM engineers on life science topics.

The evolutionary tree of modern LLMs traces the development of language models in recent years and highlights some of the most well-known models. Adapted from Yang J et al., Harnessing the Power of LLMs in Practice: A survey on ChatGPT and beyond, 27 April 2023, DOI: 10.48550/arXiv.2304.13712


One Health, multiple data challenges

One Health is a crucial concept, acknowledging the intricate connections between human, animal and environmental health. Discover how bioinformatics serves as its backbone.


Why approaching health from multiple perspectives matters

How does climate change influence animal health? What will the next epidemic threat be? To answer these questions, knowledge on human, animal and environmental health must be leveraged. Bioinformatics is one of the keys.

The One Health concept recognizes how human, animal and environmental health are interconnected.

COVID-19 highlighted, for instance, how environmental destruction can bring new epidemics by increasing the risk of transmission of pathogens from animals. However, harnessing vast amounts of information from various sources (e.g. DNA, climatic and clinical) is not straightforward. Find out how our science helps researchers and governments to improve pandemic preparedness, assess the impact of poor environmental health or join forces internationally.

To better grasp cross-impacts between humans, animals and the environment

The pangenome: insights into species’ adaptability to environmental change

Pangenomes represent the collective genetic diversity found within a species, inferred from the DNA of multiple individuals of the same species. Studying them allows scientists to identify traits that allow species to adapt to environmental changes or that confer resistance to pathogens. In 2023, SIB’s Robert Waterhouse co-authored a study of the first pangenome of the ecologically and economically important pollinator, the Asian honeybee. It revealed functional features related to the climatic adaptive capacity of the insect.

DOI: 10.1111/1755-0998.13905
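
The notion of a pangenome can be made concrete with a small Python sketch based on set operations. It assumes that the gene content of each individual is already known, which in practice is the result of extensive sequencing and analysis. The individuals and gene names below are hypothetical.

```python
# Hypothetical gene content of three individuals of one species
individuals = {
    "bee_1": {"geneA", "geneB", "geneC"},
    "bee_2": {"geneA", "geneB", "geneD"},
    "bee_3": {"geneA", "geneC", "geneD"},
}

gene_sets = list(individuals.values())
pangenome = set.union(*gene_sets)        # every gene seen in any individual
core = set.intersection(*gene_sets)      # genes shared by all individuals
accessory = pangenome - core             # genes present in only some individuals
```

Genes in the accessory set vary between individuals, which makes them natural candidates when looking for traits linked to adaptation or resistance.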

Unlocking the potential of microbes to make plants climate resilient

SIB is part of a newly launched pan-European project looking at how microorganisms can improve the climate resiliency of plants and crops. MICROBES4CLIMATE will provide researchers with an integrated network of infrastructure to boost our knowledge on the interaction between soil microorganisms, plants and the environment. In this ambitious endeavour, 30 partners from 13 countries are coming together to upscale the sharing of data on the topic, with SIB representing ELIXIR and Switzerland. The findings will uncover hidden ecosystem services that can potentially mitigate climate change.


Leveraging data on human, animal and environmental health with bioinformatics, to be better prepared against the next pandemic.

Anticipating human-pet virus transmission

The interaction between human hosts and animal reservoirs is of crucial importance both for understanding current epidemics and for better anticipating future zoonotic pandemics. The SIB Resource V-pipe has been used on samples provided by VetSuisse from viral tests on companion pets living in proximity with SARS-CoV-2-infected owners. Results have helped gain a better understanding of the transmission chains between humans and their pets, and among the animals themselves.

DOI: 10.3390/v15010245

Tracing mosquito evolution to better understand disease transmission patterns

Researchers used the SIB Resource OMA as a starting point to build a complete list of similar genes found in different species of mosquitoes. This was instrumental in tracing the evolution of mosquitoes and revealing their history of host use. The findings have significant implications for understanding disease transmission patterns and informing both medical and ecological strategies.

DOI: 10.1038/s41467-023-41764-y

Overcoming marine plastics pollution

Researchers discovered that a few tiny sea creatures can break down a type of plastic called PBS. Using the SIB Resource SwissDock, part of SwissDrugDesign, they found enzymes that might be responsible for this process and identified a particularly promising one, PBSase. This finding could contribute to a more sustainable society through further utilization of PBS.

DOI: 10.1111/1462-2920.16512

To be better prepared for a new epidemic

A nationwide One Health data exchange platform to improve pandemic preparedness

The Swiss Pathogen Surveillance Platform (SPSP) is a secure surveillance platform shared between human and veterinary medicine, which also covers environmental and foodborne pathogens. It is managed by SIB in collaboration with the University Hospitals of Basel, Lausanne and Geneva, as well as the Universities of Bern and Zurich.

The platform enables rapid and detailed monitoring of pathogen transmission and epidemics using whole genome sequencing data and associated metadata from bacteria, viruses and fungi. It features controlled data access, complex dynamic queries, dedicated dashboards and automated data sharing with international repositories, providing actionable results for public health.

DOI: 10.1099/mgen.0.001001


SARS-CoV-2 sequences deposited with the Swiss Pathogen Surveillance Platform: breakdown by age of samples since 24 February 2020.

Bringing Swiss public health authorities to the table

SPSP was already partly funded and mandated by the Federal Office of Public Health (FOPH) for its role as the SARS-CoV-2 Data Hub.

In 2023, two new endeavours started to develop SPSP’s mandate on further pathogens of interest:

Influenza and Respiratory Syncytial Virus (RSV), with renewed funding from the FOPH, initially on human strains but already working with other labs, including VetSuisse, on integrating animal strains;

Listeria, Salmonella and more, with funding from the FOPH and the Federal Food Safety and Veterinary Office (FSVO) to optimize the analysis workflow.

SPSP is now part of the FOPH’s infectious diseases dashboard, and is mandated to foster exchanges with European authorities and to become a hub for pathogen genomics data, from animal, environmental and human sources.

A collaborative consortium

SPSP brings together all major stakeholders and associated experts in Swiss healthcare and public health, including human and veterinary microbiology laboratories, infectious disease and hospital epidemiology, cantonal physicians and laboratories, the Federal Office of Public Health (FOPH), the Federal Food Safety and Veterinary Office (FSVO), several central laboratories with reference functions, and regional and private laboratories.

“SPSP serves as a crucial platform for the molecular monitoring of microorganisms occurring in humans, animals and the environment, fostering a One Health approach at a national level.”
Aitana Neves
Associate Group Leader, Chair of the SPSP consortium

Unlocking the potential of data for One Health

One Health represents a change of paradigm, bringing together data from very different realms. With this come new challenges.

With a large part of our knowledge of the world’s species buried in the literature, accessing accurate information (e.g. species name, genetics and ecological information) is not a trivial task. Reliable and accessible references are thus crucial, whether for monitoring species through their DNA footprint or for tracking pathogen evolution. They also underpin community efforts to fulfil the biodiversity restoration objectives from COP15, for instance. Find out how resources and methods developed at SIB tackle such roadblocks to advance knowledge on human, animal and environmental health.

Some of the major roadblocks hindering One Health projects

Data heterogeneity: data vary in type, scope and scale, requiring harmonization

Data paucity: data suffer from many gaps, requiring imputation or modelling

Data quality/resolution: production technologies vary, requiring curation and standards

Data integration: incompatibility between data requires their FAIRification

“However crucial for One Health, the development and adoption of standards, protocols or guidelines should not be on the shoulders of individual researchers, but of a dedicated research infrastructure. This is part of the ambition of the Biodiversity Genomics Europe project, in which SIB participates. It is also what SIB’s Environmental Bioinformatics group is meant to contribute to.”


Detailed information about viruses according to their host organism

The expertly curated resource ViralZone, developed by SIB’s Swiss-Prot group, includes factsheets for viruses infecting animals, plants, microorganisms, etc. These include 64 viruses that are transmitted from animals to humans (i.e. zoonotic viruses). Such viruses are the most likely to spill over and develop human-to-human transmission before becoming endemic. This is what happened with SARS-CoV-2, which most likely originated in bats.

Monitoring species DNA in the environment: not without the right infrastructure

Environmental DNA (eDNA), collected for instance from water or droppings, allows researchers to detect and monitor the presence of various organisms, including pathogens, plants, animals and microbes, without the need for direct observation. But because the amounts of DNA are sometimes very low, producing reliable results requires a solid bioinformatics infrastructure (e.g. analysis pipelines and modelling approaches), such as that developed at SIB. It also relies on good-quality reference data, which is part of what the European Reference Genome Atlas (ERGA) project, chaired by SIB’s Robert Waterhouse, is set to establish.

A new resource to query scientific papers on One Health issues: filling a key gap

To address the current gap in literature libraries covering ecological and environmental papers, a dedicated resource, BiodiversityPMC, was developed by the SIB group led by Patrick Ruch, together with Plazi and the publisher Pensoft. This resource mines the literature to enable users to answer a wide range of biodiversity questions related to human health, such as:

Which evidence supports the idea that pangolins and bats interact?

Which species are reservoir hosts to the tick?

DOI: 10.3897/biss.7.111660

Examples of viruses from animals infecting humans. From top to bottom: H5N1, tick-borne encephalitis, monkeypox.
“We are looking forward to developing computational technologies allowing the expansion of the genomic monitoring and real-time analysis currently conducted on SARS-CoV-2 to other important pathogens such as influenza and RSV, by leveraging the SIB network of expertise and multidisciplinary partnerships.”
Tanja Stadler
SIB Group Leader, ETH Zurich

Monitoring pathogens in Swiss wastewater: a collaborative effort

The passive monitoring of SARS-CoV-2 in wastewater is a cornerstone of the current surveillance strategy of the FOPH, in the absence of broad clinical testing. This monitoring enables early warning of the introduction of new variants, provides estimates of their spread and evaluates epidemiological characteristics, earlier than traditional clinical surveillance and at a fraction of the cost. This stream of data feeds into SPSP for open publication on the European Nucleotide Archive according to international standards (SEE P. 58).

SIB GROUPS INVOLVED

Computational Biology group (ETH Zurich), developing the SIB Resource V-pipe

Computational Evolution group (ETH Zurich)

Functional Genomics Center Zurich (ETH Zurich / University of Zurich)

NEXUS Personalized Health Technologies (ETH Zurich)

OTHER INSTITUTIONS INVOLVED

EAWAG, ETH Zurich / University of Basel, EPFL, Biosafety Laboratory (Basel-Stadt) and Microsynth AG.

A suite of leading complementary data analysis tools to track COVID-19 and other pathogens

Nextstrain (developed by Richard Neher’s group at SIB) was a key surveillance tool both before and during the pandemic. It is also notably used to track influenza variants and supports yearly vaccine development.

covSPECTRUM (developed by Tanja Stadler’s group at SIB) enables the extraction of temporal analytics on the circulating variants and mutations.

CoVariants (developed by SIB’s Emma Hodcroft) is a trusted reference for the overview of SARS-CoV-2 variants and mutations and was the first variant-tracking website.

V-pipe (developed by Niko Beerenwinkel’s group at SIB) allows for early detection and quantification of SARS-CoV-2 genomic variants in wastewater and estimation of their relative fitness advantages.

ViralZone (developed by SIB’s Swiss-Prot group) presents curated information on the virus’s proteins.

Microbes have no borders: international cooperation is key

In Switzerland and internationally, there is growing recognition of the need to bring together researchers and practitioners across institutions to coordinate efforts and address environmental challenges. In addition to supporting international research, SIB is playing an active role in international initiatives related to One Health:

SIB co-leads ELIXIR’s priority area on “Biodiversity, Food Security and Pathogens” for the Scientific Programme 2024-2028.

SIB co-led the ELIXIR CONVERGE WP9 initiative, which aimed to mobilize European SARS-CoV-2 genomic data from 20+ national genomic surveillance programmes and plans to federate national data hubs beyond COVID-19.

Drawing on its European coordination experience, SIB was asked to lead an NIH grant proposal for pathogen surveillance with a global impact involving 11 countries and 12 institutions.


INDEX OF SIB GROUP AND TEAM LEADERS

As of 1 January 2024

NAME FIELDS OF ACTIVITY LOCATION

A

Ahrens Christian Proteins and proteomes Agroscope

Anisimova Maria Evolution and phylogeny Zurich University of Applied Sciences (ZHAW)

B

Baerenfaller Katja Proteins and proteomes SIAF – University of Zurich

Bairoch Amos Proteins and proteomes University of Geneva

Bank Claudia Evolution and phylogeny University of Bern

Barbié Valérie Core facilities and services SIB Hub – Geneva

Bastian Frédéric Evolution and phylogeny University of Lausanne

Baudis Michael Genes and genomes University of Zurich

Beerenwinkel Niko Evolution and phylogeny ETH Zurich, D-BSSE

Bergmann Sven Genes and genomes University of Lausanne

Beltrao Pedro Systems biology ETH Zurich

Bitbol Anne-Florence Evolution and phylogeny EPFL

Boeva Valentina Systems biology ETH Zurich

Bridge Alan Proteins and proteomes SIB Hub – Geneva

Bruggmann Rémy Core facilities and services University of Bern

Buljan Marija Systems biology Empa

C

Carmona Santiago Systems biology University of Lausanne

Cascione Luciano Core facilities and services Institute of Oncology Research

Cavalli Andrea Structural biology Università della Svizzera italiana

Chopard Bastien Systems biology University of Geneva

Ciriello Giovanni Systems biology University of Lausanne

Correia Bruno Structural biology EPFL

Crameri Katrin Core facilities and services SIB Hub – Basel

D

Dal Peraro Matteo Structural biology EPFL

Deupi Xavier Structural biology Paul Scherrer Institute (PSI)

Deplancke Bart Genes and genomes EPFL

Dessimoz Christophe Evolution and phylogeny University of Lausanne

E

Excoffier Laurent Evolution and phylogeny University of Bern

F

Falquet Laurent Genes and genomes University of Fribourg

Fellay Jacques Genes and genomes EPFL

G

Gervasio Francesco Luigi Structural biology University of Geneva

Gfeller David Proteins and proteomes University of Lausanne

Glover Natasha Evolution and phylogeny University of Lausanne

Gonnet Gaston Evolution and phylogeny ETH Zurich

Goudet Jérôme Evolution and phylogeny University of Lausanne

Gottardo Raphael Core facilities and services University of Lausanne

H

Hastings Janna NEW Machine learning and text mining University of Zurich

I

Ibberson Mark Core facilities and services SIB Hub – Lausanne

Iber Dagmar Systems biology ETH Zurich, D-BSSE

Ivanek Robert Systems biology University of Basel & University Hospital Basel

J

Jutzeler Catherine NEW Genes and genomes ETH Zurich

K

Kahraman Abdullah Core facilities and services University Hospital Zurich

Kriventseva Evgenia Genes and genomes University of Geneva

Kutalik Zoltán Genes and genomes University of Lausanne


L

Lane Lydie Proteins and proteomes University of Geneva

Lill Markus NEW Structural biology University of Basel

Lisacek Frédérique Proteins and proteomes University of Geneva

Luisier Raphaëlle Genes and genomes IDIAP

M

Malaspinas Anna-Sapfo Genes and genomes University of Lausanne

Mazza Christian Systems biology University of Fribourg

Messner Christoph NEW Proteins and proteomes University of Zurich, Davos

Michielin Olivier Structural biology University of Lausanne

Miho Enkelejda Systems biology FHNW University of Applied Sciences and Arts Northwestern Switzerland

Milinkovitch Michel Systems biology University of Geneva

Mitri Sara Evolution and phylogeny University of Lausanne

N

Neher Richard Evolution and phylogeny University of Basel

Ng Charlotte Systems biology University of Bern

P

Palagi Patricia Core facilities and services SIB Hub – Lausanne

Panse Christian Core facilities and services ETH Zurich

Pedrioli Patrick Proteins and proteomes ETH Zurich

Peña-Reyes Carlos-Andrés Text mining and machine learning HEIG-VD

Pivkin Igor Systems biology Università della Svizzera italiana

R

Rätsch Gunnar Text mining and machine learning ETH Zurich

Rehrauer Hubert Core facilities and services ETH Zurich, University of Zurich

Riedi Marcel Core facilities and services University of Zurich

Rinaldi Fabio Text mining and machine learning SUPSI

Rinn Bernd Core facilities and services ETH Zurich, D-BSSE

Robinson Mark Genes and genomes University of Zurich

Robinson-Rechavi Marc Evolution and phylogeny University of Lausanne

Ruch Patrick Text mining and machine learning HES-SO - Geneva School of Business Administration (HEG)

S

Schütz Frédéric Core facilities and services University of Lausanne

Schwede Torsten Structural biology, Core facilities and services University of Basel

Sengstag Thierry Core facilities and services University of Basel

Silvestro Daniele Evolution and phylogeny University of Fribourg

Snijder Berend Systems biology ETH Zurich

Stadler Michael Genes and genomes Friedrich Miescher Institute for Biomedical Research

Stadler Tanja Evolution and phylogeny ETH Zurich, D-BSSE

Stekhoven Daniel Core facilities and services ETH Zurich

Stelling Jürg Systems biology ETH Zurich, D-BSSE

Sunagawa Shinichi Genes and genomes ETH Zurich

V

van Nimwegen Erik Genes and genomes University of Basel

Vogt Julia Text mining and machine learning ETH Zurich

von Mering Christian Proteins and proteomes University of Zurich

W

Wagner Andreas Evolution and phylogeny University of Zurich

Waterhouse Robert Core facilities and services SIB Hub – Lausanne

Wegmann Daniel Evolution and phylogeny University of Fribourg

Wollscheid Bernd Proteins and proteomes ETH Zurich

Z

Zavolan Mihaela Systems biology University of Basel

Zdobnov Evgeny Genes and genomes University of Geneva

Ziegler Andreas Core facilities and services Cardio-CARE AG

Zoete Vincent Structural biology University of Lausanne


ACKNOWLEDGEMENTS

We gratefully acknowledge the following funders, sponsors and partners for their financial support and encouragement in helping us fulfil our mission in 2023.

The Swiss government and in particular:

The State Secretariat for Education, Research and Innovation (SERI)

The Federal Office of Public Health (FOPH)

The Swiss National Science Foundation (SNSF)

Innosuisse

Our institutional partners:

The European Commission

The National Institutes of Health (NIH)

The Research for Life Foundation

Our [BC]2 sponsors:

Agilent Technologies

Biozentrum

Kanton Basel-Stadt

Novartis

Novigenix

PHRT Personalized Health and Related Technologies

Roche

SCNAT Swiss Academy of Sciences

SPHN Swiss Personalized Health Network

We also thank all industrial and academic partners who trust SIB’s expertise – and all employees and members who contributed to this edition of the SIB Profile.

IMPRESSUM

© 2024 – SIB Swiss Institute of Bioinformatics

ILLUSTRATION BY Davide Bonazzi / Salzmanart www.davidebonazzi.com

DESIGN AND LAYOUT BY Bogsch & Bacco, www.bogsch-bacco.ch

IMAGE CREDITS

(from top to bottom and from left to right)

Cover

The cover image represents a map of potential enzymatic reactions extracted from PubMed using LLMs that were fine-tuned to perform this task using the EnzChemRED corpus.

DOI: 10.48550/arXiv.2404.14209

Credit: Dr Anastasia Sveshnikova (Swiss-Prot group).

Inside cover

Keystone / Peter Klaunzer Bogsch & Bacco

Pages

P. 1 Alamy Stock Photo / Mark Shenley

P. 2 Nicolas Righetti / lundi13

P. 3 Valentin Luggen

P. 4 Adobe Stock / Guido Parmiggiani

Davide Bonazzi

Ute Röhrig - SIB

P. 8 Franziska Gruhl – SIB. All rights reserved

Sutthaburawonk / iStock

Fabio Rinaldi – SIB. All rights reserved

P. 11 Nicolas Righetti / lundi13

P. 14 Valentin Luggen

University of Bern – Dres Hubacher

EPFL

Nicolas Righetti / lundi13

University of Geneva – Jacques Erard

University of Zurich

P. 16 Marie-Claude Blatter – SIB

Anastasia Sveshnikova – SIB

P. 24 Nicolas Righetti / lundi13

Felix Imhof

P. 25 Nicolas Righetti / lundi13

P. 30 Indlekoferw - CC BY-SA 4.0

P. 31 Felix Imhof

P. 33 Davide Bonazzi

P. 35 Bio-Techne

P. 37 Pierre Fabre

P. 39 Portra / iStock

Petukhov, V et al. Nat Biotechnol 2022. DOI:10.1038/s41587-021-01044-w

P. 40 Olivier Martin – SIB

P. 41 ISMB / ECCB, Lyon

NMBE / Rodriguez

DR

P. 42 Adobe Stock

Keystone / Nikolai Kurzenko

P. 43 Alamy Stock Photo

Keystone / Michael Abbey

P. 44 A0A1S3QU81, AlphaFold Clusters

CellCharter

P. 45 Alamy Stock Photo

P. 49 Viola Beghini

Dan Rieck / Alamy Stock Photo

P. 50 Ute Röhrig - SIB

Nicolas Righetti / lundi13

P. 51 Nicolas Righetti / lundi13

Figure adapted from Alan Bridge - SIB

P. 56 Adobe Stock

P. 57 Davide Bonazzi

P. 58 Adobe Stock / Alexander Potapov

P. 59 Nicolas Righetti / lundi13

P. 60 Felix Imhof

P. 61 Keystone / Dr Gopal Murti

Keystone / A. Dowsett

Keystone / Hazel Appleton

P. 62 ETH Zurich

Keystone / Gaetan Bally

P. 68 Keystone / Rupert Oberhaeuser


Activity pillar II – A centre of excellence delivering life science data solutions

We deliver data solutions fostering innovation, health advances and societal impact on a day-to-day basis to academic, clinical, governmental and industry partners. SEE P. 46

Activity pillar III – Coordination: bringing key partners and data together in large-scale projects

We maximize the discoveries from life science data through coordination among disciplines and institutions, and alignment between Swiss and international practices.

SEE P. 38

Do you know how scientists at Swiss-Prot use AI to support their biocuration work? See P. 51 for more.
SIB Swiss Institute of Bioinformatics
www.sib.swiss
