
DATA SCIENCE, MACHINE LEARNING AND STATISTICS AND THEIR IMPACT ON FUTURE MOLECULAR BIOLOGY AND MEDICAL SCIENCE
Prof. Yakhini and PhD student Inbal Preuss with a screen showing the progress of the process; a DNA sample sequenced on a nanopore device. Sequencing is an important component of modern molecular biology. Much of the group’s activity involves analysis of sequencing data.
Led by Prof. Zohar Yakhini, the Yakhini Research Group, an active bioinformatics and data science research collective, consists of scientists from the Efi Arazi School of Computer Science at Reichman University and the Faculty of Computer Science at the Technion. Together, the group uses data science, machine learning, statistics and algorithmics to develop data analysis methods and tools, and apply them to molecular biology, medical science and other domains.
One of the studies the group is currently working on is led by PhD student Inbal Preuss. The study, a follow-up to earlier work by Prof. Yakhini and his group at the Technion that was led by Dr. Leon Anavy, aims to improve and further optimize the compression mechanism that allows data to be stored in synthetic DNA molecules. The goal: to allow more data to be stored using a less expensive synthesis process. According to Preuss, “Storing data in DNA is a promising new research field that holds the potential to resolve the data storage challenge for several specific use scenarios, making it more space-efficient by about ten orders of magnitude.” Prof. Yakhini adds that while there are a number of advantages to storing data in DNA, the most important is that DNA is timeless.
“DNA is central to all things life sciences and medical sciences, and therefore, will never go obsolete. Its biggest disadvantage, however,” he explains, “is that the materials and the process itself are currently very expensive, and the data is not as accessible as, for example, flash memory.” Prof. Yakhini further notes that “specific applications can still be attractive in the short term, despite the cost.” Work on storing data in DNA is done in collaboration with other academic and industry groups, including the groups of Prof. Roee Amit and Prof. Eitan Yaakobi at the Technion.
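For readers curious about what "storing data in DNA" means at the simplest level, the following is a minimal sketch, not the group's method. It assumes a naive, hypothetical mapping of two bits to each of the four nucleotides (A, C, G, T) and ignores everything that makes real systems hard, such as compression, error correction, and synthesis constraints. The function names `encode` and `decode` are illustrative only.

```python
# Hypothetical 2-bits-per-base mapping, chosen only for illustration.
# Real DNA storage schemes add compression and error correction.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn raw bytes into a string of nucleotides, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Recover the original bytes from a nucleotide string (as if sequenced)."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

In this toy scheme each byte becomes four bases, and decoding stands in for reading the strand back by sequencing and reversing the mapping.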
Another study that Preuss is leading uses genome editing technology (CRISPR, or Clustered Regularly Interspaced Short Palindromic Repeats) to promote the efficient production of lab-grown meat. According to Preuss, “Lab-grown meat will greatly contribute to resolving environmental and social challenges stemming from escalating food prices and the depletion of crucial resources during food production.” In cooperation with the Rak Lab at the Volcani Center, this study aims to make synthetic meat production more effective and cost-efficient.
Data storage using synthetic DNA. Text or image is converted into a binary file which is then encoded in synthetic DNA. The original content can be reconstructed by sequencing the DNA.
Yet another study to note was led by MSc student Ido Amit, who used Machine Learning and statistics to identify and measure the side effects of genome editing. During the editing process, genomic areas that are not targeted may be affected, in addition to the desired target regions. Amit’s work, done in collaboration with the Hendel Lab at Bar Ilan University, aimed to measure and quantify the side effects and to minimize them, in order to make the editing process safe for use in clinical practice. The group continues this investigation, with method development work led by Guy Gozlan, Guy Assa and others.
Bioinformatics and computational biology are also being used to develop tools for medical research and practice. For example, using Machine Learning, the group is working on improving the use of pathology in cancer treatment. Today, a pathologist sits in front of the microscope and examines every sample, looking for the relevant signature markers of each type of cancer. However, with Machine Learning, data and significant labels can be used by a computer, which will learn predictive features from scanned biopsy samples.
This process, an aspect of digital pathology, is aimed at making pathology testing more time- and cost-efficient. More importantly, the group has already shown that the computer will be able to recognize morphology markers that are not currently known, and use these discoveries to improve cancer treatment. This line of research is led by PhD students Alona Levy-Jurgenssen, Ben Galili, and Roy Shafir and MSc students Chloe Benmoussa, Ilan Gefen, and Shuli Finley, in collaboration with Prof. Ariel Shamir at Reichman University. The group also collaborates with international research groups in Oslo, Stockholm, Erlangen and Barcelona, among others.
Prof. Yakhini concludes, “Modern life science has undergone a revolution in recent decades, driven by the development of data-intensive technologies that allow for deeper scrutiny and greater understanding of samples and processes. For this greater resolution of information to materialize into useful knowledge and into improved clinical practice, scientists use efficient and effective computational tools and methodologies. In our group, we are excited to be part of this revolution and to work together in this dynamic field, pursuing algorithmic and statistical innovation and enabling important life science discoveries and technologies.”
Standard biopsy slides (left) are analyzed using Machine Learning to produce maps of over- and under-expression of important biological markers (right).
