Analysis of predictive patterns between historical site issues and audit findings

Background:
Periodic process and data audits provide an assessment of quality and compliance. Audit observations provide insight into trial conduct, both internally and at our investigational sites. These audits use a classification ranking of Critical, Major, and Minor, and provide recommendations to support management of future quality and compliance risks, as well as actions to mitigate risks for conduct that has already occurred. An audit plan is developed based on risk criteria pre-defined by our quality group and applied to the portfolio of ongoing clinical trials. The consequences of poor quality and lack of compliance can be as severe as exclusion of patient data from an FDA submission.

The volume of trials and investigative sites where we conduct research is significant; it is therefore unreasonable and resource-inefficient to audit every trial site. However, the consistency of our trial conduct processes and activities allows us to use data sampling to determine risk trends in compliance and quality, with a reasonable expectation of extrapolation across the company. This use of data aligns with the concepts of risk-based monitoring, whereby predefined risk thresholds allow for a dynamic monitoring design per trial site. For example, at-risk sites may require more frequent or more detailed monitoring, whereas low-risk sites may require less. To facilitate a dynamic monitoring design, real-time analysis of risk indicators will be critical for success, along with inclusion of all ongoing and completed trials conducted at the investigator site.

Proposal:
To validate the hypothesis that ongoing trial issues can provide an indication of potential risk at investigator sites where we are currently conducting, or planning to conduct, further research.
I will accomplish this by evaluating cross-trial historical site issues from monitoring reports against sample audit findings from 2011 and 2012. Using supervised machine learning, I will classify the historical data and identify observable patterns, which can then be used in a predictive fashion to adopt a targeted monitoring approach. This methodology should be a component of our risk-based monitoring plan and will further increase quality and compliance, thereby improving resource utilization and preventing the loss of patient data.

Conclusion:
I evaluated all issues documented in the 12 months prior to each site audit and found that 78% of major findings in 2011 and 2012 were preceded by a combination of >3 CRF issues AND >1 protocol compliance issue.
Figure 1: Decision Tree (n=46 Audit Findings)
Applying this one finding in a predictive fashion suggests that institutions where, across all ongoing trials, we have documented >3 CRF issues AND >1 protocol compliance issue would carry a 78% probability of a major audit finding. While this single finding is valuable, it triggers additional questions, such as "What types of major findings?" and "What about patterns among critical and minor findings?" Unfortunately, there are limitations with our current data sets and their quality that must be addressed to support these and other important business questions. We must address the following issues:
- Inconsistent or nonexistent classification of site issues
- Inconsistent classification of audit grouping
- Updates and/or system changes which have redefined classifications within the analysis period
- Lack of correlation between site issue and site audit categorizations in CTMS and audit records
- Lack of site issues documented prior to site audit
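For illustration, the headline pattern from the conclusion (>3 CRF issues AND >1 protocol compliance issue across ongoing trials) could be encoded as a simple site-screening check. This is a minimal sketch only; the function name, the dictionary structure, and the example site counts are hypothetical, not drawn from CTMS.

```python
# Sketch: flag sites matching the pattern that preceded 78% of the
# 2011-2012 major audit findings in this sample. All names and example
# counts below are hypothetical placeholders.

def flag_at_risk(site_issues):
    """Return True when a site's cumulative issue counts match the
    at-risk pattern: >3 CRF issues AND >1 protocol compliance issue."""
    crf = site_issues.get("CRF", 0)
    protocol = site_issues.get("Protocol Compliance", 0)
    return crf > 3 and protocol > 1

# Hypothetical per-site issue counts aggregated across ongoing trials
sites = {
    "Site A": {"CRF": 5, "Protocol Compliance": 2},
    "Site B": {"CRF": 2, "Protocol Compliance": 4},
}
at_risk = [name for name, issues in sites.items() if flag_at_risk(issues)]
print(at_risk)  # ['Site A']
```

A check like this could feed a risk-based monitoring plan by routing flagged sites to more frequent or more detailed monitoring visits.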
Further data collection and analysis of key site and clinical data would allow us to answer the question "Is there a relationship between CRF issues and protocol compliance at an institution?", further reinforcing the use of data as a means to support better decision making and risk mitigation. Finally, when applying a machine-learning model, more data is better, and I would therefore expect greater clarity from a larger data set. Expanding the analysis to include other countries or a greater time period, if appropriate, may lead to a better-trained model and therefore better results.

Analysis Methodology

Understanding the data
Audit findings: A sample of 13 site audits performed in 2011 and 2012 across 11 institutions with 86 findings, classified as Critical (6), Major (66), or Minor (44).
Site issues: We included issues documented within 1 year prior to each audit. A total of 346 site issues were utilized as input variables. These are categorized with the following labels:
- CRF
- Delegation/Oversight
- Essential Documentation
- Protocol Compliance
- Source Documentation
- Trial Medication

Data Preparation
Updates made to ClinAdmin during 2010-2011 resulted in inconsistencies in issue categories, producing an apples-to-oranges situation. Manual mapping of the old issue categories to the new ones was required to properly analyze the data. In addition, a significant number of issues were categorized as "other" or "not classified"; I reviewed the narrative for each of these cases and re-classified them accordingly. Figure 2 is a representation of the data set classifications.
Figure 2: Sample Dataset
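The mapping step can be sketched as a simple lookup table from legacy ClinAdmin categories to the current labels. The old-category names below are invented examples (the actual pre-2011 labels are not listed in this report), and the handling of "Other"/"Not Classified" issues mirrors the manual narrative review described above.

```python
# Sketch of normalizing legacy issue categories to the current labels.
# The legacy names on the left are hypothetical; the real mapping was
# built by hand from the ClinAdmin data.
CATEGORY_MAP = {
    "Case Report Form": "CRF",
    "Site Delegation": "Delegation/Oversight",
    "Regulatory Documents": "Essential Documentation",
    "Protocol Deviation": "Protocol Compliance",
    "Source Records": "Source Documentation",
    "IP Handling": "Trial Medication",
}

def normalize(issue):
    """Map an issue's legacy category to the current label. Issues logged
    as 'Other'/'Not Classified' carry a manual_category assigned during
    narrative review; returns None if no re-classification was made."""
    old = issue["category"]
    if old in ("Other", "Not Classified"):
        return issue.get("manual_category")
    return CATEGORY_MAP.get(old, old)

print(normalize({"category": "Protocol Deviation"}))  # Protocol Compliance
```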
Modeling
Since I believe our input variables are related, my preferred approach was C4.5, so that the decision tree was not limited to binary splits. Unlike models such as a neural network, C4.5 provided the most direct conclusions by displaying a clear pattern of behavior. Using 66% of the data for training, the model achieved 87.5% classification accuracy.
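The workflow above can be sketched in a few lines. Note this is an approximation, not the analysis itself: scikit-learn's DecisionTreeClassifier implements CART (binary splits) rather than C4.5, so criterion="entropy" is used here as the closest analogue to C4.5's information-gain splitting, and the issue-count data is synthetic since the audit data set is not reproduced in this report.

```python
# Sketch of the modeling step with a 66% training split.
# CART with entropy splitting stands in for C4.5; the data is synthetic.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Hypothetical per-site issue counts: [CRF, protocol compliance, source doc]
X = rng.integers(0, 6, size=(100, 3))
# Toy label imitating the observed pattern: "major finding" when a site
# has >3 CRF issues AND >1 protocol compliance issue
y = ((X[:, 0] > 3) & (X[:, 1] > 1)).astype(int)

# 66% of samples for training, as in the analysis
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.66, random_state=0
)
model = DecisionTreeClassifier(criterion="entropy", random_state=0)
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```

On this clean synthetic rule the tree recovers the thresholds almost exactly; the 87.5% figure reported above reflects the noisier real audit data.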
Andrew Chen 7 May 2013