AMSJ Volume 8, Issue 1 2017


Records identified through database searching n=164

Additional records identified through other sources n=1

Records after duplicates removed n=165

Records screened n=165

Records excluded n=119

Full-text articles assessed for eligibility n=46

Full-text articles excluded with reasons n=23

Studies included in literature review n=23

Figure 1. PRISMA flow diagram demonstrating search strategy.

Challenges of Big Data

in keying in patient ID numbers into electronic medical records to access patient information [7]. In contrast, unstructured data include traditional print records, electronic free text, radiographic films, or survey data collected from patients [9].

c. Veracity concerns the true representativeness of the data. It refers to the goal of achieving validity and credibility in the data set [7].

d. Velocity represents the rate at which data is recorded and generated to allow timely retrieval for analysis and decision-making [7].

Advantages of using big data

(i) Accessibility and availability

Big data are readily available [10]. Patient records, such as admission history, investigations, diagnostic results, and medications, are all electronically documented on hospital databases. Because these databases are installed on staff computers, health professionals working in the hospital can easily access the data for review or, increasingly, for clinical research. The integration of multi-pathway patient records¹ in big data provides a convenient, comprehensive pool of information for researchers [11]. This integration facilitates retrospective cohort studies and therefore helps researchers to identify patterns in disease progression and compare the effectiveness of treatments [11].

(ii) Cost- and time-efficiency

Given the convenience of data collection using electronic patient registers, obtaining the information needed for clinical research takes less time than manually collecting patient data [12]. Big data are useful in minimising logistical impediments in prospective and retrospective, longitudinal, population-based studies [13,14]. Researchers who require large sample sizes can also easily extract information from the available pool of data in these databases, potentially increasing the study power of their research [13]. Because computerised techniques can be used to analyse the unstructured data within big data, finer data acquisition is possible than with laborious, manual extraction from traditional datasets [14]. In addition, this information is available at low cost, if not free, to clinical research staff, avoiding the additional costs that might be incurred through manual data collection [15].
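The point about larger samples increasing study power can be illustrated numerically. The sketch below uses a normal approximation for a two-sided, two-sample comparison of means with equal group sizes. The effect size of 0.5 standard deviations, the 5% significance level, and the function name `approximate_power` are assumptions chosen for illustration; none of them come from the cited studies.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(n_per_group: int, effect_size: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test on means.

    effect_size is the standardised mean difference (Cohen's d).
    The very small chance of rejecting in the wrong direction is ignored.
    """
    if n_per_group <= 0:
        raise ValueError("n_per_group must be positive")
    standard_normal = NormalDist()
    # Critical value for a two-sided test at significance level alpha.
    critical_value = standard_normal.inv_cdf(1 - alpha / 2)
    # Expected test statistic under the assumed effect size.
    expected_statistic = abs(effect_size) * sqrt(n_per_group / 2)
    return standard_normal.cdf(expected_statistic - critical_value)


# Hypothetical group sizes: power rises as more patients are included.
for n in (20, 64, 200):
    print(f"n per group = {n:>3}: approximate power = {approximate_power(n, 0.5):.2f}")
```

Under these assumptions, power grows with the number of patients per group, which is why a large pool of existing records can help when an effect is modest. A real study would use a power calculation matched to its design and outcome measure.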

¹ Multi-pathway patient records refer to thorough documentation of a patient’s journey, from admission into the emergency department to referral to various specialties and notes from allied health. These records include documentation on the patient’s presentation, treatment, progress and recovery.

Kaplan et al. [16] suggest that several biases can arise when analysing big data, including, but not limited to, sampling bias and lack of scope in the information recorded. Secondly, the validity of big data is highly dependent on the context in which it is being used [17]. Lastly, minor data security issues may arise from the utilisation of big data.

(i) Sampling bias and lack of scope

Sampling bias of big data can be discussed in terms of its standardisation and completeness. Completeness of data encompasses both its comprehensiveness and whether it is a good representation of the population of interest [17]. Clinical research often requires data collection from a large sample of patients. As every patient will have different investigations, diagnoses, and treatment plans, the type, amount, and level of detail of clinical documentation will vary from patient to patient. It is therefore difficult to standardise a method of data collection across the array of available patient information that ensures completeness of the data. It is crucial to ensure that the data is complete; otherwise, the research results could be subject to information bias [8]. Typically, the ideal method to achieve this is to conduct prospective data collection, minimising omissions [8]. However, as big data is retrospective, it is often difficult to agree on whether to include data, or on how to retrieve missing data, when medical records are not available [8]. In those situations, the clinical researcher will be required to design algorithms to clean and correct the available data; however, it is difficult to design an objective method to validate the choices made in this process. In addition, the coding of information is heavily skewed towards documenting and following up the primary diagnoses [17].
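The completeness concern can be made concrete with a small sketch. Assuming a hypothetical extract of patient records (the field names `primary_diagnosis`, `secondary_diagnoses`, and `medications`, and all values, are invented for illustration), the Python below reports the proportion of records in which each field is populated. It shows one possible way to quantify missingness before analysis, including in secondary diagnoses, which the coding skew noted above tends to leave under-recorded. It is not a validated data-cleaning method.

```python
from typing import Any

# Hypothetical extract; field names and values are illustrative assumptions,
# not taken from any real hospital database.
records = [
    {"patient_id": "A1", "primary_diagnosis": "pneumonia",
     "secondary_diagnoses": ["diabetes"], "medications": ["amoxicillin"]},
    {"patient_id": "A2", "primary_diagnosis": "fracture",
     "secondary_diagnoses": None, "medications": []},
    {"patient_id": "A3", "primary_diagnosis": "asthma",
     "secondary_diagnoses": [], "medications": ["salbutamol"]},
    {"patient_id": "A4", "primary_diagnosis": None,
     "secondary_diagnoses": None, "medications": ["paracetamol"]},
]


def is_populated(value: Any) -> bool:
    """Treat None, empty strings, and empty lists as missing."""
    return value is not None and value != "" and value != []


def field_completeness(rows, fields):
    """Return the fraction of rows in which each field is populated."""
    if not rows:
        return {field: 0.0 for field in fields}
    return {
        field: sum(is_populated(row.get(field)) for row in rows) / len(rows)
        for field in fields
    }


completeness = field_completeness(
    records, ["primary_diagnosis", "secondary_diagnoses", "medications"]
)
for field, fraction in completeness.items():
    print(f"{field}: {fraction:.0%} populated")
```

One design choice here is itself an example of the validation problem described above: an empty list is counted as missing, although in a real record it might mean that the patient has no co-morbidities rather than that none were documented. Telling those cases apart requires knowledge of how the data was coded.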
Because of this skew towards primary diagnoses, secondary diagnoses are often missed or poorly recorded, resulting in a lack of well-documented secondary patient information, such as co-morbidities.

(ii) Validity of big data

Joppe defines validity in quantitative research as a criterion that determines whether a study truly measures what it was intended to measure [18]. The validity of big data varies between clinical specialties and the circumstances in which the data is being used [17]. Occasionally, big data may contain incomplete data sets, or even incorrect data, due to errors in transcription or abstraction [8]. Data is also sometimes misclassified during the coding process [17]. This may occur when a patient undergoes a procedure that treats more than one condition, or when a hospital admission is recorded on the basis of the presenting complaint [17]. As a result of these systematic errors, the data may be misrepresentative [17]. A 2013 literature review by Talbert and Lou Sole found substantial research suggesting that administrative databases, a subset of big data, have only moderate sensitivities and specificities for correct data coding and may underreport procedures [8]. The increasing trend of activity-based funding of hospitals in some countries, such as the United States and Australia, may also influence the information recorded in big data at discharge [19]. Activity-based funding is a policy intervention targeted at restructuring incentives across healthcare systems through a fixed

Australian Medical Student Journal


