OMICS POSTER 1

Metabolomics-based machine learning method for the discovery and characterization of biomarkers implicated in Non-Alcoholic Fatty Liver Disease

Ambrin Farizah Babu1, Sara Leal Siliceo2, Howell Leung2, Emmanouil Nychas2, Gianni Panagiotou2, Kati Hanhineva1

1 Institute of Public Health and Clinical Nutrition, University of Eastern Finland, Kuopio, Finland
2 Leibniz Institute for Natural Product Research and Infection Biology – Hans Knöll Institute (HKI), Systems Biology and Bioinformatics, Jena, Germany

ABSTRACT
➢ Nonalcoholic fatty liver disease (NAFLD) is the most common chronic liver disease worldwide. However, diagnosing NAFLD is challenging because few non-invasive biomarkers are available. Metabolomics coupled with machine learning can pave the way to identifying diagnostic biomarkers, understanding disease mechanisms, and evaluating treatments for various diseases.
➢ Here, targeted metabolomics was performed by liquid chromatography-mass spectrometry on healthy adults and adults with NAFLD. Six machine learning approaches were applied to the metabolomics dataset: Artificial Neural Network (ANN), K-Nearest Neighbor (KNN), Logistic Regression, Support Vector Machine (SVM), Decision Tree, and Ensemble. The data were randomly split into training, validation, and test sets, and the analysis included dimension reduction, feature selection, and classification model development. The accuracies of the six models were tested. The ANN pattern recognition model had the highest area under the curve (AUC) in classifying subjects with and without NAFLD.
➢ The study demonstrates the potential of ANN for classifying NAFLD metabolomics data in realistic situations. Further model development and independent validation in other cohorts are warranted.

METHODOLOGY

Fig 1: The general workflow used in this study. Samples from healthy subjects and NAFLD patients were prepared and analysed with UPLC-QTOF-MS. The data analysis included peak picking, alignment, and biostatistical analysis to identify the differential metabolites. The list of identified metabolites was randomly split into training-validation and test datasets and fed into various machine learning platforms. Based on the metabolomics data, the machine learning method classifies each subject as NAFLD or non-NAFLD.
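The random split described in the workflow can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the poster does not state the split ratios or whether the split was stratified, so the 60/20/20 fractions, the stratification by class, the function name `stratified_split`, and the cohort sizes are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, validation, test) index lists.

    Each class is shuffled and split separately so that the NAFLD/control
    ratio is roughly preserved in every subset. The fractions are assumed.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical cohort: 30 NAFLD subjects (1) and 20 controls (0).
labels = [1] * 30 + [0] * 20
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Keeping the test indices untouched until the final evaluation is what allows the reported performance to reflect unseen subjects.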

RESULTS & DISCUSSION
➢ Of the 123 measured metabolites, 32 were identified as optimal for discriminating between NAFLD and control.
➢ The ANN pattern recognition model had the highest area under the curve (AUC) in classifying subjects with and without NAFLD.
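For readers unfamiliar with the AUC metric used to rank the models, the sketch below computes it directly from its probabilistic definition: the chance that a randomly chosen NAFLD subject receives a higher score than a randomly chosen control. The labels and scores are made-up illustrative values, not study data, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    sample has the higher score; tied pairs count as half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative values only: 1 = NAFLD, 0 = control.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(y_true, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.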

Fig 2. A) Annotated metabolites and their chemical classes. B) Top-ranked differential metabolites between the two groups (NAFLD and control). C) Prediction accuracies of the six machine learning methods. D) Confusion plot and ROC curves of the best-performing machine learning method.
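A confusion plot such as the one in panel D summarizes four counts, from which accuracy, sensitivity, and specificity follow. The sketch below shows that arithmetic on made-up predictions; the values are not taken from the study, and `confusion_metrics` is a hypothetical helper name.

```python
def confusion_metrics(y_true, y_pred):
    """Return confusion counts and derived metrics for binary labels (1 = NAFLD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # NAFLD correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # control correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # control flagged as NAFLD
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # NAFLD missed
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Illustrative values only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = confusion_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when the NAFLD and control groups differ in size, since accuracy alone can hide poor performance on the smaller group.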

➢ The study demonstrates the potential of ANN for classifying NAFLD metabolomics data in realistic situations.

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 813781.

Contact: ambrin.babu@uef.fi
