
Artificial Intelligence-based Approaches to Clinical Text Mining

Future Outlook

Dr. Sophie Laurenson (BSc Hons., Ph.D. Cantab.)

Human-driven codification and annotation of clinical text is time- and cost-intensive. AI-based approaches using Natural Language Processing (NLP) and Machine Learning (ML) can aid in entity identification and codification.

Introduction

The information stored in electronic health records (EHRs) is often recorded as clinical narratives in free-text format. This data is essentially unstructured, which makes extracting and analysing useful data features difficult. To access the wealth of clinical data stored in EHRs, order and structure must be imposed on free-text data. This can be achieved by codification: applying rules and algorithms to unstructured information. Codification can be performed manually, using human-driven codification systems based on established standards and rules-based guidance. Alternatively, Artificial Intelligence (AI)-based approaches leveraging Machine Learning (ML) and Natural Language Processing (NLP) techniques can convert clinical documents into data elements that can be identified and analysed 31.
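As an illustration of rules-based codification, the minimal Python sketch below applies a small set of regular-expression rules to a free-text note and returns the matched phrases with their codes in order of appearance. The patterns and the code labels (`CODE_HTN`, `CODE_T2DM`, `CODE_SOB`) are hypothetical placeholders, not entries from a real terminology; a production system would draw its codes from an established standard.

```python
import re

# Hypothetical rule base: each regular expression maps a free-text phrase
# to an illustrative placeholder code (not a real terminology code).
RULES = [
    (re.compile(r"\b(hypertension|high blood pressure)\b", re.IGNORECASE), "CODE_HTN"),
    (re.compile(r"\b(type 2 diabetes|t2dm)\b", re.IGNORECASE), "CODE_T2DM"),
    (re.compile(r"\bshortness of breath\b", re.IGNORECASE), "CODE_SOB"),
]


def codify(note: str) -> list[tuple[str, str]]:
    """Return (matched text, code) pairs in order of appearance in the note."""
    hits = []
    for pattern, code in RULES:
        for match in pattern.finditer(note):
            hits.append((match.start(), match.group(0), code))
    hits.sort()  # order matches by character offset in the note
    return [(text, code) for _, text, code in hits]


if __name__ == "__main__":
    note = "Pt with T2DM and hypertension, reports shortness of breath on exertion."
    for text, code in codify(note):
        print(f"{text!r} -> {code}")
```

Rules like these are transparent and easy to audit, but every synonym and abbreviation has to be anticipated by hand, which is part of why manual codification is costly.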

AI-Based Approaches to Clinical Text Mining

Some aspects of digital health information are recorded in structured formats. Even records that include unstructured free-text fields can retain an overall structured format, which aids data extraction and analysis. Indeed, many computerized provider order entry (CPOE) systems use controlled vocabularies to avoid unstructured narrative. However, where an obvious structure cannot be superimposed by annotation, AI-based approaches can be used to detect and extract key terms from narrative data.

NLP techniques derive from the disciplines of computer science and computational linguistics. They include processing tasks such as tokenisation, named entity recognition and gazetteer (dictionary) lookup. Advanced NLP systems map recognised words or phrases to medical terms that represent domain concepts, and also capture the relationships between those concepts. Modern NLP methods combine rule-based and supervised machine learning approaches. Once developed, NLP techniques have the advantage of scalability and can be adapted and applied to a range of datasets.
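Tokenisation and gazetteer-based entity recognition can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the gazetteer entries and the concept identifiers (`CONCEPT_MI` and so on) are hypothetical placeholders standing in for a real knowledge base, and the matcher uses a simple greedy longest-match strategy.

```python
import re

# Hypothetical gazetteer: lower-cased token sequences mapped to placeholder
# concept identifiers (a real system might map to UMLS concepts instead).
GAZETTEER = {
    ("myocardial", "infarction"): "CONCEPT_MI",
    ("heart", "attack"): "CONCEPT_MI",  # a synonym maps to the same concept
    ("chest", "pain"): "CONCEPT_CHEST_PAIN",
    ("aspirin",): "CONCEPT_ASPIRIN",
}
MAX_LEN = max(len(key) for key in GAZETTEER)


def tokenise(text: str) -> list[str]:
    """Split text into lower-cased word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag_entities(text: str) -> list[tuple[str, str]]:
    """Greedy longest-match lookup of gazetteer entries over the token stream."""
    tokens = tokenise(text)
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest possible span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in GAZETTEER:
                entities.append((" ".join(key), GAZETTEER[key]))
                i += n
                break
        else:
            i += 1  # no gazetteer entry starts at this token; move on


    return entities


if __name__ == "__main__":
    print(tag_entities("Admitted with chest pain; ?heart attack. Given aspirin."))
```

Mapping both "heart attack" and "myocardial infarction" to one identifier shows how synonyms collapse onto a single domain concept, which is the step that makes downstream analysis possible.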

Text Mining in EHR Analytics

Narrative or free-text data represent a large fraction of the patient data contained within an EHR. Examples of free-text data include physicians' notes describing physical examinations, symptoms and medical interventions. These unstructured data are difficult to extract by automated computer processing. Several frameworks have been developed to facilitate clinical language processing and to link data to scientific and medical knowledge bases. Examples include the National Library of Medicine's Unified Medical Language System (UMLS) 32, the General Architecture for Text Engineering (GATE) 33, the Unstructured Information Management Architecture (UIMA) 34 and the tools provided by the Open Health Natural Language Processing (OHNLP) Consortium 35.

Recently, several NLP techniques have been developed to facilitate information extraction from the free text in EHRs. Applications have included diagnostic classification, identifying patient cohorts, identifying co-morbidities and postoperative complications, reporting of notifiable diseases, syndromic surveillance, medication event extraction, adverse event detection and disease management 24. NLP has been used in a number of proof-of-concept studies to extract and analyse data from EHRs. Examples include extracting cancer staging data 36, formulating oncology treatment summaries 37, automating patient risk stratification 38, and predicting outcomes from radiology reports 39 and from EHR data for oncology patients 13.

Unsupervised NLP methods, such as word2vec- and cui2vec-based approaches, have recently gained popularity. These approaches use contextual language information to make determinations about specific terms and their relationships, and have been applied to parsing free text in pathology reports 40,41. This trend marks a significant shift from earlier approaches that based classifications on predetermined labels applied by content experts.
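The core idea behind word2vec-style methods, that terms used in similar contexts receive similar representations, can be shown with a simpler count-based stand-in. The Python sketch below is not word2vec (which trains dense vectors with a shallow neural network); it builds co-occurrence count vectors within a fixed window and compares them with cosine similarity. The toy corpus and the window size are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy corpus of invented pathology-style sentences (illustrative only).
CORPUS = [
    "biopsy shows invasive carcinoma",
    "biopsy shows invasive adenocarcinoma",
    "specimen shows benign tissue",
    "specimen shows normal tissue",
]
WINDOW = 2  # neighbouring tokens counted on each side of a word


def cooccurrence_vectors(sentences: list[str]) -> dict[str, Counter]:
    """Map each word to a Counter of the words seen within WINDOW positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    vecs = cooccurrence_vectors(CORPUS)
    for w1, w2 in [("carcinoma", "adenocarcinoma"), ("carcinoma", "benign")]:
        print(f"{w1} ~ {w2}: {cosine(vecs[w1], vecs[w2]):.3f}")
```

No labels are supplied anywhere: "carcinoma" and "adenocarcinoma" score 1.000 purely because they appear in identical contexts, while "carcinoma" and "benign" share only "shows" and score about 0.408.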

Application of AI-Based Approaches in CDS Systems

Clinical decision support (CDS) systems have been shown to improve practitioner performance in approximately 60% of the cases reviewed in the literature 42. A CDS platform can contribute to improved patient care in several ways: by providing decision support automatically and proactively within clinician workflows, by offering recommendations, and by delivering support at the time and location of decision making 43. The key functions of CDS systems require an understanding of the context within the clinical narrative from which an entity is extracted, and of the relationships between entities. The explicit application of NLP tools to extract clinical data for CDS has lagged behind, because CDS requires high accuracy in entity identification 6. Extracted data may also suffer from quality issues, including problems with validity and accuracy. As a result, the information contained in free-text data has not been widely adopted within CDS systems, which represents a missed opportunity for improving healthcare. Concerns over the accuracy of CDS systems have largely stemmed from the lack of explicit guidelines for decision making. These concerns could be remedied by data-driven decision models that draw on the wealth of clinical data captured in narrative format.
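Negation is one of the simplest examples of why the surrounding context of an extracted entity matters: "denies chest pain" and "reports chest pain" contain the same entity but mean opposite things. The Python sketch below is loosely inspired by trigger-based negation detection such as NegEx, but it is far simpler than any published algorithm. The trigger list and the fixed scope of five tokens are assumptions chosen for illustration.

```python
import re

# Hypothetical, deliberately small trigger list; real systems use larger curated
# lists and also handle post-entity triggers and scope-terminating words.
NEGATION_TRIGGERS = ("no", "denies", "without", "negative for")
SCOPE = 5  # number of tokens after a trigger treated as negated


def tokenise(text: str) -> list[str]:
    """Split text into lower-cased word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_negated(sentence: str, entity: str) -> bool:
    """Return True if `entity` occurs within SCOPE tokens after a negation trigger."""
    tokens = tokenise(sentence)
    target = tokenise(entity)
    triggers = [tokenise(t) for t in NEGATION_TRIGGERS]
    for i in range(len(tokens)):
        for trig in triggers:
            if tokens[i:i + len(trig)] == trig:
                window = tokens[i + len(trig):i + len(trig) + SCOPE]
                # Look for the (possibly multi-token) entity inside the scope window.
                for k in range(len(window) - len(target) + 1):
                    if window[k:k + len(target)] == target:
                        return True
    return False


if __name__ == "__main__":
    print(is_negated("Patient denies chest pain.", "chest pain"))
    print(is_negated("Patient reports chest pain.", "chest pain"))
```

A CDS rule that fired on every mention of "chest pain" without a check like this would raise false alerts, which illustrates the accuracy requirement described above.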

A recent example applied NLP to address poor provider compliance with cervical cancer screening guidelines: a CDS platform combining a free-text rule base and a guideline rule base was built to analyse Pap smear reports 24. In this instance, the explicit decision-making guidelines and the well-structured format of Pap smear reports enabled NLP to extract key data for use in clinical decision support.
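The two-layer design described above, a free-text rule base that extracts a finding and a guideline rule base that turns it into a recommendation, might be sketched as follows. The result categories (NILM, ASC-US, LSIL, HSIL) are real Bethesda System reporting terms, but the extraction patterns and every action label in `GUIDELINE_RULES` are hypothetical placeholders; they are not the rules used in the cited study or in any screening guideline.

```python
import re
from typing import Optional

# Layer 1: hypothetical free-text rules, ordered from most to least severe.
# This sketch does no negation handling.
FREE_TEXT_RULES = [
    (re.compile(r"high[- ]grade squamous intraepithelial lesion|\bhsil\b", re.I), "HSIL"),
    (re.compile(r"low[- ]grade squamous intraepithelial lesion|\blsil\b", re.I), "LSIL"),
    (re.compile(r"atypical squamous cells of undetermined significance|\basc-us\b", re.I), "ASC-US"),
    (re.compile(r"negative for intraepithelial lesion or malignancy|\bnilm\b", re.I), "NILM"),
]

# Layer 2: hypothetical guideline rules; the action labels are placeholders only.
GUIDELINE_RULES = {
    "NILM": "ACTION_ROUTINE_RECALL",
    "ASC-US": "ACTION_FOLLOW_UP_TEST",
    "LSIL": "ACTION_SPECIALIST_REVIEW",
    "HSIL": "ACTION_URGENT_REFERRAL",
}


def classify_report(report: str) -> Optional[str]:
    """Return the most severe category whose pattern matches the report, if any."""
    for pattern, category in FREE_TEXT_RULES:
        if pattern.search(report):
            return category
    return None


def recommend(report: str) -> str:
    """Chain the two rule bases: extract a category, then look up its action."""
    category = classify_report(report)
    if category is None:
        return "ACTION_MANUAL_REVIEW"  # unrecognised wording is routed to a human
    return GUIDELINE_RULES[category]


if __name__ == "__main__":
    print(recommend("Interpretation: Negative for intraepithelial lesion or malignancy."))
```

Keeping the two layers separate means the guideline table can be updated when recommendations change without touching the text-extraction rules, and vice versa.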

For diseases that lack clear evidence-based decision-making guidelines, AI-based text mining could help unlock hidden patterns within clinical data. However, integrating these insights into routine clinical care is a slow process. AI algorithms must be developed in a manner consistent with legal and regulatory frameworks, which requires full transparency.

CDS systems built on AI approaches would require reliable, accurate algorithms integrated within flexible and fast systems 43. Passive NLP CDS applications require input from the user to generate output, whereas active NLP CDS applications use existing data to push information proactively to users as alerts or reminders. Advanced NLP engines could also provide computer-assisted coding solutions to support human-driven annotation and codification.
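The difference between passive and active NLP CDS applications comes down to who initiates the interaction, as this Python sketch shows. Everything in it is hypothetical: `extract_findings` stands in for a real NLP pipeline (here it is a trivial keyword match over an invented `WATCHED_TERMS` list), the passive function answers only when a user asks, and `ActiveMonitor` scans each incoming note and pushes alerts to a callback without being asked.

```python
from typing import Callable

# Stand-in extractor: a trivial keyword match in place of a real NLP pipeline.
WATCHED_TERMS = ("sepsis", "anaphylaxis")


def extract_findings(note: str) -> list[str]:
    """Return the watched terms mentioned in a note (no negation handling)."""
    lowered = note.lower()
    return [term for term in WATCHED_TERMS if term in lowered]


def passive_query(notes: list[str], term: str) -> list[int]:
    """Passive mode: the user asks a question; return indices of matching notes."""
    return [i for i, note in enumerate(notes) if term in extract_findings(note)]


class ActiveMonitor:
    """Active mode: every incoming note is scanned and alerts are pushed unprompted."""

    def __init__(self, notify: Callable[[str], None]):
        self.notify = notify  # e.g. a function that posts to a clinician's inbox

    def on_new_note(self, patient_id: str, note: str) -> None:
        for finding in extract_findings(note):
            self.notify(f"ALERT patient={patient_id}: note mentions {finding}")


if __name__ == "__main__":
    monitor = ActiveMonitor(print)
    monitor.on_new_note("p2", "Concern for sepsis, start bundle.")
```

The same extractor serves both modes; what changes is whether the system waits for a query or reacts to new data as it arrives, which is why active applications place heavier demands on accuracy and speed.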

Conclusion

Human-driven annotation and codification of unstructured free-text data is a time- and cost-intensive process, often limited by deficiencies in clinical knowledge and prone to error. AI-based methods can identify entities within free-text data and map them to concepts and to the relations between concepts. The process of developing an automated codification system differs depending on its goal: processing free text for any task, or solving a specific clinical task. Current research suggests that NLP-based clinical decision support systems may be achievable using tools and rule bases developed for specific tasks. However, as AI methods advance, the potential to develop more autonomous and generalised systems will have an impact on clinical practice.
