
AI Measures COVID-19 Lung Disease Severity on Chest X-Rays
Early in 2020, a team of researchers at Massachusetts General Hospital was developing an artificial intelligence (AI) algorithm that could measure disease severity and detect change from medical images, which could help inform clinical decision-making. By March, they had described two potential applications of the algorithm. Monitoring disease progression in retinopathy of prematurity on retinal photographs showed a great deal of promise, as did evaluating disease severity in knee osteoarthritis on x-rays.
The researchers started thinking about other possible uses; because the algorithm architecture is flexible, it could be generalized to other clinical applications. But then the COVID-19 outbreak emerged as a global pandemic, scrambling the work of researchers and clinicians.
Matthew D. Li is a resident physician in the Department of Radiology at Mass General and a member of the Quantitative Translational Imaging in Medicine (QTIM) Lab at the Martinos Center for Biomedical Imaging, the group that developed the algorithm. When the hospital experienced a surge of patients with COVID-19, he volunteered to redeploy to a COVID-19 inpatient service to help with the effort to treat them. “While I was there,” he says, “I noticed there is a lot of information in a chest x-ray that isn’t really extracted or communicated in the report,” information that could convey to clinicians the severity of the findings. To help clinicians take advantage of this information, he and colleagues in the QTIM group set out to create an AI tool that could extract it from chest x-rays. They reported the tool in late July in the journal Radiology: Artificial Intelligence.

Objective Measures of Severity and Change
Clinicians often use imaging to evaluate both the severity and progression of disease, in many cases by assigning severity to one of several categories based on the imaging findings and seeing whether and how the classification changes on follow-up. This approach has limitations, though. Because manual readings are inherently subjective, there is inevitably variability in clinicians’ interpretations of the severity.
In their work done before the pandemic, the members of the QTIM group addressed this challenge by developing an automated algorithm for disease severity evaluation and change detection on a continuous spectrum. Based on the Siamese neural network, a type of deep learning architecture originally deployed in the 1990s for verification of credit card signatures, this algorithm takes two medical images as inputs and outputs a quantitative measure of difference in disease severity between the two images.
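The two-input design described above can be illustrated with a minimal sketch. It is not the QTIM group's implementation: the tiny random linear encoder, the function names `make_encoder` and `siamese_distance`, and the four-value "images" are all assumptions chosen to keep the example self-contained. The one idea it is meant to show is that both images pass through the same encoder with the same weights, and the distance between the two resulting embeddings serves as the continuous measure of difference.

```python
import math
import random


def make_encoder(n_inputs, n_features, seed=0):
    """Build one toy encoder: a fixed random linear layer followed by tanh.

    A real Siamese network would use a trained convolutional network here;
    this stand-in only exists so the sketch runs without extra libraries.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
               for _ in range(n_features)]

    def encode(pixels):
        # Map a flat list of pixel values to a short feature vector.
        return [math.tanh(sum(w * p for w, p in zip(row, pixels)))
                for row in weights]

    return encode


def siamese_distance(encode, image_a, image_b):
    """Pass both images through the SAME encoder, then compare embeddings."""
    emb_a = encode(image_a)
    emb_b = encode(image_b)
    # Euclidean distance between the two embeddings.
    return math.dist(emb_a, emb_b)


# Hypothetical flattened "images" (four pixel values each).
encode = make_encoder(n_inputs=4, n_features=3)
baseline = [0.1, 0.2, 0.1, 0.0]
follow_up = [0.6, 0.7, 0.5, 0.4]

print(siamese_distance(encode, baseline, baseline))   # identical inputs
print(siamese_distance(encode, baseline, follow_up))  # differing inputs
```

Because the encoder is shared, two identical images always yield a distance of zero, and the output does not depend on the order in which the images are supplied.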
Having decided to apply it to imaging of COVID-19, the researchers set about training the algorithm—that is, feeding it imaging data so it could learn to extract patterns from other, similar data. For pretraining, they used a publicly available chest x-ray data set, CheXpert, from Stanford Hospital in Palo Alto, with 224,316 chest x-rays. For subsequent training, they used 314 admission chest x-rays from patients hospitalized at Mass General for COVID-19 during the period of April 1 to April 10, 2020. Further testing was done with additional COVID-19 data sets.
Once trained, the algorithm can take a new chest x-ray and extract a quantified measure of COVID-19 lung disease severity called the pulmonary x-ray severity (PXS) score. When the researchers compared PXS scores with ground-truth interpretations by multiple Mass General radiologists, the scores correlated with the severity scores the radiologists had annotated manually, a tedious process.
Informing Clinical Decision-Making
In the wake of the Radiology: Artificial Intelligence study, Jayashree Kalpathy-Cramer, senior author of the study and director of the QTIM Lab, described several possible use cases for the algorithm. The first, she said, was enabling standardized radiologist reporting and thus addressing the challenge of inter-rater and intra-rater variability. When a frontline clinician reads the radiologist’s report, a descriptor is only useful if it is reproducible (severe means severe, mild means mild, etc.). The PXS score may help improve this reproducibility. The algorithm could also be helpful in clinical decision-making, as the PXS score can help predict subsequent intubation (needing a ventilator) or death. For example, by setting a PXS score threshold, frontline clinicians could use it to help assess the risk of patient decompensation, along with other clinical data like lab values and vital signs. To this end, the researchers were working to validate the algorithm in several additional cohorts so it could better generalize to broader patient populations.
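The threshold idea can be sketched in a few lines. Everything specific here is hypothetical: the `Admission` record, the `flag_for_review` helper, the example scores, and above all the threshold value, which is a placeholder and not a figure from the study. The sketch only shows the mechanics of flagging scores at or above a cutoff; as the researchers note, such a flag would be weighed alongside other clinical data.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    patient_id: str
    pxs_score: float


def flag_for_review(admissions, threshold):
    """Return admissions at or above the threshold, highest score first."""
    flagged = [a for a in admissions if a.pxs_score >= threshold]
    return sorted(flagged, key=lambda a: a.pxs_score, reverse=True)


# Placeholder cutoff for illustration only; not a value from the study.
HYPOTHETICAL_THRESHOLD = 5.0

# Made-up example records.
admissions = [
    Admission("A", 1.2),
    Admission("B", 6.8),
    Admission("C", 5.0),
]

for admission in flag_for_review(admissions, HYPOTHETICAL_THRESHOLD):
    print(admission.patient_id, admission.pxs_score)
```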
Another possible application in patients with COVID-19 is monitoring the progression of the disease, providing a quantifiable answer to the question of whether or not a patient is getting better. Here, the authors of the study were working with Bruce Fischl and Adrian Dalca, both of the Laboratory for Computational Neuroimaging, also at the Martinos Center, to incorporate novel registration techniques to improve change detection with the algorithm.
