THE AI REVOLUTION CONTINUES
Supplying the foundation of AI
AlphaFold, the AI model recognized in the 2024 Nobel Prize in Chemistry, was built on bioinformatics expertise – including decades of work by SIB scientists.
To effectively tackle complex challenges, AI models must be trained with accurate and relevant data. Their predictions must also be tested and fine-tuned. Our work in three key areas provides these essential foundations for trustworthy AI: democratizing data and knowledge through biocuration; describing and connecting data and information through knowledge representation; and assessing performance through benchmarking. These building blocks enable SIB scientists and other researchers to develop reliable AI systems that deliver meaningful biological insights – from elucidating cellular processes to developing new biomedicines and biotechnologies.
John Jumper, Distinguished Scientist, DeepMind, AlphaFold Team Lead
“High-quality, publicly available databases and benchmarking systems – including UniProt, CAMEO and CASP – were essential to AlphaFold's development and validation.”
Gold-standard training data through curated databases
Our curated databases provide highly reliable data and knowledge from which AI models can learn to recognize patterns and make relevant predictions. One of these, UniProt, was crucial in training AlphaFold (see p. 53). Many others are available for, and being used in, AI applications across the life sciences.
Transforming data into AI-ready knowledge
SIB databases follow the FAIR principles (findable, accessible, interoperable, reusable), are machine-readable, and are open and of high quality. We achieve this by: harmonizing and organizing data according to international standards; annotating data with high-quality metadata to remove ambiguities, provide context and enable connections with other datasets; and, in several databases, removing duplicates and errors. Expertly curated databases additionally include relevant and continuously updated information from the scientific literature and other sources. Biocurators in our Swiss-Prot group, for example, combine deep expertise in bioinformatics and protein biology to annotate protein sequences in UniProt with knowledge of the proteins’ structure, function and more. This explicit encoding of complex information in machine-readable formats underpins the democratization of data and knowledge – which is essential for AI advances in the scientific domain.
DOI: 10.1038/S41597-024-03099-1