
The use of Machine Learning in predicting Urinary Tract Infections and minimising the use of antibiotics.
from DC QUANTUM
by MahaNawaz
Problem statement
Urinary Tract Infections (UTIs) are infections caused by bacteria that enter the urethra and then infect the urinary tract. These bacteria may also travel upwards and infect the kidneys [1]. Furthermore, the excessive prescription of antibiotics is the driving force behind the present-day surge in antibiotic resistance, which is becoming one of the world’s greatest healthcare problems. Many studies have shown that doctors may prescribe antibiotics based on symptoms alone, without consulting the results of urine bacterial cultures. While many educational courses and regulations already aim to improve this prescribing behaviour, this project focuses on using a ‘logistic regression model’, a machine learning algorithm that uses the patient’s history and data extracted from electronic health records to predict UTIs before the doctor prescribes medication.
Aim of the project
This project revolves around developing and evaluating a machine learning model that predicts urinary tract infections in patients in outpatient care and thereby helps to minimise the prescription and use of antibiotics.

Understanding a logistic regression model
Logistic regression is a supervised machine learning algorithm that is used to predict the probability of a binary outcome. For example, a logistic regression model could be used to determine whether a person is obese or not, or whether a person is infected with Covid-19 or not. In this project, we use a logistic regression model to predict whether a patient is likely or unlikely to have a UTI.
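To make the idea concrete, here is a minimal sketch of how a logistic regression model turns input features into a probability and then into a binary prediction. The weights, bias, features and the 0.5 decision threshold are hypothetical values chosen for illustration; they are not taken from the project.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    """Logistic regression: sigmoid of a weighted sum of the input features."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)

def predict_label(features, weights, bias, threshold=0.5):
    """Return 1 (UTI likely) if the probability reaches the threshold, else 0."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0

# Hypothetical weights and patient features, for illustration only.
example_weights = [0.8, -0.3]
example_bias = -0.5
example_patient = [1.0, 0.5]

probability = predict_probability(example_patient, example_weights, example_bias)
label = predict_label(example_patient, example_weights, example_bias)
print(f"Predicted probability of UTI: {probability:.3f}, label: {label}")
```

In a real model the weights are learned from the training data rather than set by hand.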
Pre-processing of the data
Patients’ data from 2015 to 2021 from a large hospital with primary, secondary and tertiary care facilities was used for this project. The dataset included details of adult outpatients with at least one urine bacterial culture test result. As input features in which the logistic regression model will find patterns, we pre-processed each patient’s vital signs (e.g., heart rate, temperature, oxygen level), ICD-10 codes, which physicians use to classify diagnoses and symptoms into categories, and data gathered from the patient’s previous hospital encounter. The output was defined as a binary label (0 or 1) indicating whether the patient had a negative or positive urine culture test result. A test result was classified as positive if the concentration of the urine pathogen was higher than 10,000 colony-forming units per millilitre (CFU/ml).
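The labelling rule described above can be sketched in a few lines. The function and constant names are hypothetical, and the sketch assumes the culture result is available as a single concentration in CFU/ml.

```python
# Threshold stated in the text: a result is positive above 10,000 CFU/ml.
CFU_THRESHOLD_PER_ML = 10_000

def culture_label(cfu_per_ml: float) -> int:
    """Return 1 for a positive urine culture, 0 for a negative one."""
    return 1 if cfu_per_ml > CFU_THRESHOLD_PER_ML else 0

# Made-up concentrations, for illustration only.
sample_concentrations = [0, 5_000, 10_000, 50_000]
labels = [culture_label(c) for c in sample_concentrations]
print(labels)
```

Note that a concentration of exactly 10,000 CFU/ml is labelled negative here, because the text says "higher than".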
Division of the dataset and rationale
The dataset was split randomly into a training set containing 70% of the data, a validation set containing 10%, and a test set containing the remaining 20%. The test set consists of data the algorithm has never been exposed to, so it can be used to determine the true accuracy of the model.
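A random 70/10/20 split of this kind could look like the sketch below. It uses only the Python standard library; the function name, the fixed seed and the placeholder records are assumptions made for illustration, not details from the project.

```python
import random

def split_dataset(records, train_frac=0.7, val_frac=0.1, seed=0):
    """Shuffle the records and split them into train, validation and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder, never seen during training
    return train, validation, test

# Placeholder patient identifiers, for illustration only.
records = list(range(1000))
train, validation, test = split_dataset(records)
print(len(train), len(validation), len(test))
```

The validation set is typically used to compare model settings, while the test set is held back until the end so that its score reflects performance on unseen patients.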
Results
After applying the logistic regression model to the dataset, we found that 80% of patients who were truly positive were also predicted by the machine learning model to be positive for a urinary tract infection (a measure known as sensitivity, rather than overall accuracy). In addition, of the 1,518 patients the model predicted were unlikely to have the infection, 29 were prescribed antibiotics. Of these 29 patients, 12 had a negative urine culture test result. This suggests that these 12 patients were given antibiotics unnecessarily, and that such prescriptions could have been avoided by using this machine learning algorithm.
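The figures above can be worked through with some simple arithmetic. The counts 1,518, 29 and 12 come from the text; the true-positive and false-negative counts used to illustrate the 80% figure are hypothetical, since the article does not report them.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of truly positive patients that the model also predicts as positive."""
    return true_positives / (true_positives + false_negatives)

# Hypothetical counts that would give the reported 80% figure.
example_sensitivity = sensitivity(true_positives=80, false_negatives=20)

# Counts reported in the text.
predicted_negative = 1518
prescribed_antibiotics = 29
prescribed_with_negative_culture = 12

share_prescribed = prescribed_antibiotics / predicted_negative
share_unnecessary = prescribed_with_negative_culture / prescribed_antibiotics

print(f"Sensitivity (hypothetical counts): {example_sensitivity:.0%}")
print(f"Predicted-negative patients given antibiotics: {share_prescribed:.1%}")
print(f"Of those, with a negative culture: {share_unnecessary:.1%}")
```

So roughly 1.9% of the predicted-negative patients received antibiotics, and about 41% of those prescriptions went to patients whose culture turned out negative.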
Conclusion
In this project, we used an optimised machine learning model that can decrease the number of false positives and reduce the unnecessary prescription of antibiotics. This work is important for the future of healthcare, given the emergence of antibiotic resistance as a global threat, especially when dealing with urinary tract infections.
Next steps
Further improve the machine learning model by fine-tuning its hyperparameters, and seek support to apply it in hospitals.
Project Team
- Farah E Shamout (Assistant Professor, Emerging Scholar of Computer Engineering at NYU Abu Dhabi) – Mentor and Project Lead
- Zaki Almallah – Doctor advising on the health-related side of the project
- Nasir Hayat, Terrence Lee St John, Phillip Wang, Vee Nis Ling, Lelan Orquiola, Vansh Gadhia – Project Members
This project was selected as one of the best abstracts and was offered a podium presentation at the Faculty of Clinical Informatics Scientific Conference, taking place in the summer of 2022 in the UK.
References
[1] CDC. Urinary tract infection [Internet]. Centers for Disease Control and Prevention. 2022 [cited 2022 Jun 18]. Available from: http://cdc.gov/antibiotic-use/uti.html
Vansh Gadhia 11LRU