

Safety in cardiac surgery Sabrina Siregar


Safety in cardiac surgery
PhD thesis, Utrecht University, The Netherlands, with a summary in Dutch
ISBN 978-94-6108-455-2
Author: Sabrina Siregar
Cover design: Simon van Anken
Lay-out: Nicole Nijhuis, Gildeprint Drukkerijen, Enschede, The Netherlands
Printed by: Gildeprint Drukkerijen, Enschede, The Netherlands
Printed on FSC certified paper
© Sabrina Siregar, Utrecht, The Netherlands, 2013


Safety in cardiac surgery
Veiligheid en hartchirurgie (with a summary in Dutch)

Thesis

to obtain the degree of doctor at Utrecht University, on the authority of the rector magnificus, Prof. dr. G.J. van der Zwaan, in accordance with the decision of the Board for the Conferral of Doctoral Degrees, to be defended in public on Thursday 13 June 2013 at 2.30 p.m. by Sabrina Siregar, born on 24 October 1985 in 's-Gravenhage


Supervisors:

Prof. dr. L.A. van Herwerden
Prof. dr. Y. van der Graaf

Co-supervisor:

Dr. R.H.H. Groenwold

The publication of this thesis was made possible in part by the support of the Dutch Heart Foundation (Nederlandse Hartstichting) and the Hart & Long Stichting Utrecht. In addition, generous financial support was provided by the Begeleidingscommissie Hartinterventies Nederland.


For my parents


Table of Contents

Introduction
Chapter 1: Preface: In charge of your own outcomes
Chapter 2: Rationale and outline of this thesis
Chapter 3: Data Resource Profile: Adult cardiac surgery database of the Netherlands Association for Cardio-Thoracic Surgery

Part 1: Measuring Safety in Cardiac Surgery
Chapter 4: Performance indicators for hospitals
Chapter 5: Statistical methods to monitor risk factors in a clinical database: example of a national cardiac surgery registry
Chapter 6: Evaluation of cardiac surgery mortality rates: 30-day mortality or longer follow-up?

Part 2: Comparing Safety in Cardiac Surgery
Chapter 7: Performance of the original EuroSCORE
Chapter 8: Limitations of ranking lists based on cardiac surgery mortality rates
Chapter 9: Gaming in risk-adjusted mortality rates: Effect of misclassification of risk factors in the benchmarking of cardiac surgery risk-adjusted mortality rates
Chapter 10: The Dutch Hospital Standardized Mortality Ratio (HSMR) method and cardiac surgery: benchmarking using hospital administration data versus a clinical database

Synthesis
Chapter 11: General discussion

Supplement
Chapter S1: Trends and outcomes of valve surgery: 16-year results of The Netherlands Adult Cardiac Surgery Database
Chapter S2: Trends and outcomes of coronary artery bypass grafting: 16-year results of The Netherlands Adult Cardiac Surgery Database

Closing pages
Summary
Samenvatting (summary in Dutch)
Cardiac surgery centers and Data Registration Committee
Manuscripts based on the studies presented in this thesis
Dankwoord (acknowledgements)
Curriculum Vitae


Introduction


1 Preface: In charge of your own outcomes

Siregar S, Versteegh MIM, van Herwerden LA Ned Tijdschr Geneeskd. 2011;155(50):A4103



Background

The demand for transparency in the quality of healthcare has increased: quality indicators are reviewed and expanded, the quality of care is measured in many different ways and the demand for publication of mortality rates is increasing.(1-3) The public, led by the media, political influences and health insurance companies, calls for transparency. Invoking the Freedom of Information Act (Wet Openbaarheid van Bestuur), Elsevier Magazine requested the publication of the Hospital Standardized Mortality Ratios (HSMR) of hospitals in The Netherlands in 2009.(3) However, the Inspectorate of Health claimed it did not have access to this information and disclosure did not follow. The public debate that followed exposed why publication was not desired: the HSMR does not adequately reflect the quality of healthcare, which leads to an invalid comparison of hospitals.(2) This example clearly illustrates the potential risk of the appeal for transparency: the publication of incorrect results may distort the perceived quality of care in hospitals.

This risk has been emphasized before. In 2004, the marketing research institute Prismant published unadjusted mortality rates of hospitals in The Netherlands.(4) It reported large differences in the mortality rates across hospitals. The publication caused upheaval among healthcare providers, who emphasized the fact that differences in case-mix were not accounted for. In other words, the patients' a priori risk of mortality was not considered.(5) For example, patients in a poor physical condition or those who are undergoing complex surgery have a higher baseline risk of mortality than the general population. Hospitals performing surgery on high-risk patients will be falsely discredited when case-mix is not, or insufficiently, taken into account. The core of the problem thus lies in the fact that outcomes are not or inadequately adjusted for the severity of patients and disease, which invalidates the comparison across hospitals.
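The case-mix problem described above can be made concrete with a small sketch. The numbers below are hypothetical and chosen only for illustration: two hospitals have identical mortality within each risk group, yet their unadjusted mortality rates differ widely because one of them operates mainly on high-risk patients.

```python
# Hypothetical numbers, chosen only to illustrate the case-mix problem.
# Each entry: risk group -> (number of operations, number of deaths).
hospital_a = {"low_risk": (900, 9), "high_risk": (100, 10)}  # mostly low-risk patients
hospital_b = {"low_risk": (300, 3), "high_risk": (700, 70)}  # mostly high-risk patients


def crude_mortality(hospital):
    """Unadjusted mortality: all deaths divided by all operations."""
    operations = sum(n for n, _ in hospital.values())
    deaths = sum(d for _, d in hospital.values())
    return deaths / operations


def stratum_mortality(hospital):
    """Mortality within each risk group separately."""
    return {group: d / n for group, (n, d) in hospital.items()}


crude_a = crude_mortality(hospital_a)
crude_b = crude_mortality(hospital_b)
strata_a = stratum_mortality(hospital_a)
strata_b = stratum_mortality(hospital_b)

print(f"Crude mortality A: {crude_a:.1%}, B: {crude_b:.1%}")
print("Per risk group A:", strata_a)
print("Per risk group B:", strata_b)
```

In this constructed example the unadjusted comparison suggests that hospital B performs far worse, whereas within each risk group the two hospitals are indistinguishable.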
What can healthcare providers do against oversimplified publications such as those described above?

Measuring risk-adjusted outcomes

External parties commonly use administrative databases to measure outcomes in healthcare. Such databases were developed for purposes other than the evaluation of the outcomes of care. For example, a hospital's administration office collects International Classification of Diseases (ICD) codes for administration purposes and Diagnosis Treatment Combination (Diagnose Behandel Combinaties) codes for billing purposes. An administrative database contains limited clinical information, which restricts the possibilities for case-mix adjustment. This issue is illustrated by the HSMR.(6) The comorbidity of patients is deduced from the recorded ICD-9 codes, but no information is available on the severity of disease. For example, the codes for anemia do not indicate the hemoglobin value or the acuteness of onset. Such clinical factors have prognostic value and are necessary for case-mix adjustment. In addition, a distinction between risk factors (present on admission) and complications (e.g. pneumonia) is in general not possible, because all codes are only recorded upon discharge. Finally, as individuals who were not involved in the clinical care perform the coding, the accuracy may be questionable. Ideally, a clinical database contains objective variables, such as laboratory values and clinical measurements, to limit variability between centers.

Apart from these limitations, administrative databases have many advantages: they are cheap, the data are readily available and all patients and interventions are recorded. Clinical databases require maintenance and are therefore expensive. Nonetheless, the necessity of clinical databases is indisputable; quality should not be assessed based on incomplete and inaccurate data. In short, healthcare providers can improve risk adjustment by recording outcomes and risk factors in a clinical database.(7) Such databases allow for risk adjustment of results and substantiate the protest against oversimplified reports. In other words, healthcare providers can maintain control over their results by actively collecting their own data.

Besides the external demand for transparency, there is another, and much more important, reason to register and evaluate outcomes: the monitoring of quality by means of trend analyses and benchmarking. After all, the numbers tell the tale. Timely intervention in case of deviations from the trend in, for example, mortality as outcome measure, may prevent the occurrence or continuation of underperformance.

The database of The Netherlands Association for Cardio-Thoracic Surgery

A good example of healthcare providers who apply internal quality control by means of an outcomes registration is The Netherlands Association for Cardio-Thoracic Surgery (Nederlandse Vereniging voor Thoraxchirurgie, NVT).
Since 1995, Dutch cardiac surgeons have registered all cardiac surgical interventions in adults in the Supervisory Committee for Cardiac Interventions (Begeleidingscommissie Hartinterventies Nederland, BHN) registry. The BHN is an umbrella foundation set up by cardiologists, pediatric cardiologists, cardiac anesthesiologists and cardiac surgeons. The intervention registry appeared to be insufficient to allow for the monitoring of outcomes. In 2005, the media reported that the mortality after cardiac surgery was twice as high in the University Medical Center St. Radboud as in other centers. Investigations showed that mortality rates had been conceivably elevated since the beginning of 2004. The department was temporarily shut down for adult cardiac surgery.(8) Subsequently, the NVT acknowledged its responsibility and established a database. In addition to demographic factors and type of intervention, the database also includes in-hospital mortality and 18 risk factors for mortality, as defined by the EuroSCORE model.(9)

The EuroSCORE is a widely used risk model that estimates the risk of mortality during and after cardiac surgery. The use of this model resulted in relative uniformity of data across the cardiac surgery centers. The score is calculated based on 18 patient and operation characteristics. For example, a patient with poor left ventricular function and pulmonary hypertension has a relatively high risk of mortality and will thus have a high EuroSCORE. The EuroSCORE of patients is accounted for in the comparison of outcomes across centers.

Improvement through feedback

The purpose of the NVT database is to improve the cardiac surgical care provided by the 16 centers. The registration of outcomes enables the identification of an increase or decrease in postoperative mortality, while taking into account the changes in case-mix. Thus, the database allows for monitoring of risk-adjusted mortality. In addition, the centers are benchmarked against national results. The strength of the initiative lies in the frequent feedback of results to the centers. All 16 cardiac surgery centers in The Netherlands participate in the database. One surgeon from each center is seated on the Data Registration Committee of the NVT. This committee assembles every quarter to discuss results, trends, deviations in the data and other issues, and to benchmark their own center against the national results. The frequent evaluation of results allows for early identification of potential problems.

Database maintenance

Database maintenance is performed by the department of Medical Informatics in the Academic Medical Center in Amsterdam. Data on risk factors and outcomes are collected in the 16 centers and are submitted every quarter. Data collection, transfer, storage and cleaning are funded by the centers themselves. Some employ data managers, whereas in other centers cardiac surgeons maintain the local database. On a national level, data are entered into the national database and checked for errors and improbable values. Feedback on data accuracy is then provided to the centers, corrected data are entered and a final report is generated (quarterly and annually), along with national results to compare with. The national processing of data is also funded by the centers.
As a result, the NVT currently possesses comprehensive information on over 80,000 surgical interventions, which have been performed since 2007 (www.nvtnet.nl). The completeness of data is high: all centers in The Netherlands participate and less than 1% of the data on demographics, risk factors and intervention type are missing. In 2009, an auditing committee was installed to obtain insight into the process of data collection. During on-site visits the auditing committee compared data from the national database to those collected from medical records. In 2010, the NVT consulted the Julius Center Utrecht (University Medical Center Utrecht) for clinical epidemiological analyses of the data, with regard to both the data quality and the reported outcomes. The first reports showed that the quality of data was adequate: no signs of structural errors were found. Currently, research is aimed at the comparison of outcomes in order to benchmark centers. This means that the risk-adjusted mortality rates of the 16 centers are compared to a national benchmark (for example the average mortality rate), as opposed to each other. Centers with a mortality rate higher than expected might be notified and, if necessary, interventions may be planned in order to safeguard the quality of care. The effect of possible interventions can be measured using the subsequent outcomes. Ranking lists will not be constructed.

National outcomes registry for colon cancer

Other healthcare providers have also started to use outcomes registries to improve the quality of care. For example, registries have been launched for the surgical treatment of colorectal, breast, esophageal and gastric cancer. The databases are run by The Dutch Institute for Clinical Auditing (DICA) and include risk factors and outcomes. Again, the goal is to monitor risk-adjusted outcomes, provide feedback and identify deviant results. The number of participating hospitals and the completeness of data are increasing.(10)

Conclusion

Unadjusted mortality rates lead to an invalid reflection of the quality of care. Therefore, healthcare providers should strive for risk adjustment of outcomes by creating their own clinical databases. In addition, the measuring of results provides insight into trends and outcomes of the provided care, enabling immediate improvement. The comprehensive database of The Netherlands Association for Cardio-Thoracic Surgery could serve as an example. Hopefully, many other medical societies and professionals will follow and stay in charge of their own outcomes.

 


Reference List

1. Veiligheidsindicatoren ziekenhuizen 2010 t/m 2012. Utrecht: Inspectie voor de Gezondheidszorg; 2010.
2. Geelkerken RH, Mastboom WJB, Bertelink BP, Van der Palen J, Berg M, Kingma JH. Een onrijp instrument, Sterftecijfer niet geschikt als maat voor ziekenhuiskwaliteit. Medisch Contact 2008;3704.
3. van Leeuwen A. Sterftecijfers: ja of nee? Elsevier 2009 Oct 24;18-9.
4. Meegdes J. Regioverschillen in mortaliteit ziekenhuizen. Prismant, 8 Apr 2004. www.prismant.nl
5. van Dam FSAM, van Maanen H. Voortijdige en ongenuanceerde publicatie over sterfteverschillen tussen ziekenhuizen in de media. Ned Tijdschr Geneeskd 2004;1077-8.
6. van den Bosch WF, Spreeuwenberg P, Wagner C. Gestandaardiseerd ziekenhuissterftecijfer (HSMR): correctie voor ernst hoofddiagnose kan beter. Ned Tijdschr Geneeskd 2011;A3299.
7. Shahian DM, Silverstein T, Lovett AF, Wolf RE, Normand SL. Comparison of clinical and administrative data sources for hospital coronary artery bypass graft surgery report cards. Circulation 2007 Mar 27;115:1518-27.
8. Een tekortschietend zorgproces. Een onderzoek naar de kwaliteit en veiligheid van de cardiochirurgische zorgketen voor volwassenen in het UMC St Radboud te Nijmegen. Inspectie voor de Gezondheidszorg; 2006 Apr 24.
9. Nashef SA, Roques F, Michel P, Gauducheau E, Lemeshow S, Salamon R. European system for cardiac operative risk evaluation (EuroSCORE). Eur J Cardiothorac Surg 1999 Jul;16:9-13.
10. The Dutch Institute for Clinical Auditing, www.clinicalaudit.nl.



2 Rationale and outline of this thesis



In 2005 the media reported that the mortality rate after cardiac surgery in one of the 16 cardiac surgery centers was twice as high as in other clinics in The Netherlands.(1;2) External investigation showed that this could have been detected one year earlier if mortality rates had been measured and evaluated. The incident exposed the necessity of proper outcomes evaluation in cardiac surgery. The Netherlands Association for Cardio-Thoracic Surgery (Nederlandse Vereniging voor Thoraxchirurgie, NVT) acknowledged its responsibility in safeguarding the cardiac surgical care and subsequently constituted the NVT Adult Cardiac Surgery Database.(3) The goal of this database was to evaluate risk-adjusted mortality rates and to monitor safety as an elementary component of the quality of care.

The history and rationale of the clinical database in The Netherlands defined its purpose and current form. Whereas patients, health insurance companies and the media mainly focused on the identification of the best healthcare providers, the purpose of the NVT database was to monitor the absence of danger, i.e. safety. Safety is an elementary component of quality, but does not cover its meaning in the full range. The absence of danger (i.e. adverse outcomes) does not imply high-quality care, but it is indisputably a prerequisite for high-quality care. This nuance may seem trivial at first. However, it does put into perspective why in-hospital mortality was the only outcome collected in the database for several years. In the beginning of 2012, other outcomes were added to the data set, thereby broadening the scope of the database towards the assessment of quality. As the studies in this thesis are based on data from 2007 until 2010, it is primarily safety that can be discussed. Nonetheless, the conclusions drawn from this thesis are applicable to both safety and quality assessment in cardiac surgery.

Since its constitution in 2007, the database has grown and evolved considerably.
At the start of this research the database contained information on approximately 45,000 cardiac surgery interventions. Currently, this number has grown to over 80,000 interventions. The exemplary database has a nationwide coverage, a high completeness of data and is run by professionals themselves. However, recapitulating the goal of the database, the monitoring of safety is not achieved by merely the collection of mortality rates. Several steps are required in order to achieve this goal.

The first step is to measure. As safety and quality are concepts, they cannot be measured directly. Thus, a wide variety of variables are used as proxies. In practice, outcomes, and in particular mortality, are the most commonly used measures in cardiac surgery.(4-8) Yet, even an outcome as unambiguous as mortality can be used in many different ways. For example, it can be measured upon discharge, after 30 days and after one year. On the other hand, variables that describe the processes and structure of care are also often used to measure safety and quality.(9) Typically, this information is easier to obtain than patient outcomes. However, do they adequately reflect the safety and quality of cardiac surgery?

The next challenge is to compare outcomes across hospitals. High mortality rates do not necessarily imply an unsafe situation. There are many other factors that may affect mortality, such as patient severity and the complexity of the performed cardiac operations.(10) Evaluation of mortality rates thus requires the consideration of this so-called case-mix. For this purpose, risk adjustment models have been developed, such as the Parsonnet score, the Society of Thoracic Surgeons Models, the EuroSCORE and the EuroSCORE 2.(4-6;8) By taking into account patient risk factors for mortality and the type of performed interventions, these models level the playing field across hospitals. If risk adjustment is insufficient, high mortality rates may be attributable to the patient risk or the type of surgery performed, as opposed to an unsafe situation. Therefore, risk adjustment is a fundamental element in the comparison of outcomes.

The NVT uses EuroSCORE risk factors for risk adjustment. The EuroSCORE estimates a patient's risk of mortality after cardiac surgery using 18 variables.(4) It was developed in 1999 and its publication in 2000 was warmly welcomed in Europe: the EuroSCORE was adopted in many registries and became the most commonly used risk adjustment model in Europe and The Netherlands. Now, in 2013, the data used to develop the EuroSCORE are over 18 years old. It should be questioned whether this model is still appropriate to use for benchmarking.

Safety and quality are frequently assessed using administrative data, such as billing data and data from the hospital administration. As vast amounts of administrative data are readily available, they are becoming increasingly popular for purposes other than administration.
Many have expressed their concern that risk adjustment may not be sufficient when administrative data are used.(11-15) Hence, this thesis includes a study on the possibilities and pitfalls of administrative data for the evaluation of outcomes in cardiac surgery.

Another concern in the comparison of outcomes is data accuracy. Both unintentional and intentional data errors may affect prevalence rates of risk factors. Risk factors may be intentionally upcoded to exaggerate patient risk, in order to "improve" risk-adjusted outcomes. This phenomenon is also termed gaming.(7;16;17) This thesis includes the results of a simulation study that quantifies the effect of gaming of risk factors on benchmarking. In addition, methods are presented to control and identify gaming in future data. Likewise, data inaccuracies may occur in outcome variables. Up to now, it was unknown whether this presented a tangible issue in the database and, if so, how it might be averted.
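The mechanism behind gaming can be sketched in a few lines. The example below is not the simulation study presented in this thesis; it is a minimal illustration under stated assumptions: a logistic risk model with a single binary risk factor, a hypothetical intercept and coefficient (not EuroSCORE values), and a hypothetical center of 1000 patients. "Observed" mortality is approximated by the deaths expected under the true risk factor status, so that the result is deterministic.

```python
import math


def predicted_risk(has_risk_factor, intercept=-4.0, coefficient=1.0):
    """Logistic risk model with one binary risk factor.

    The intercept and coefficient are hypothetical, not EuroSCORE values.
    """
    linear = intercept + coefficient * (1 if has_risk_factor else 0)
    return 1.0 / (1.0 + math.exp(-linear))


def oe_ratio(true_flags, recorded_flags):
    """Observed/expected (O/E) mortality ratio.

    'Observed' is approximated by the deaths expected under the true risk
    factor status; 'expected' is what the model predicts from the recorded
    (possibly upcoded) status.
    """
    observed = sum(predicted_risk(flag) for flag in true_flags)
    expected = sum(predicted_risk(flag) for flag in recorded_flags)
    return observed / expected


# Hypothetical center: 1000 patients, 200 of whom truly have the risk factor.
true_flags = [True] * 200 + [False] * 800

# Accurate coding: recorded status equals true status.
honest = list(true_flags)

# Upcoding: 80 patients without the risk factor are recorded as having it.
upcoded = [True] * 280 + [False] * 720

oe_honest = oe_ratio(true_flags, honest)
oe_upcoded = oe_ratio(true_flags, upcoded)
print(f"O/E with accurate coding: {oe_honest:.2f}")
print(f"O/E with upcoding:        {oe_upcoded:.2f}")
```

Under these assumptions the number of deaths is unchanged, but the inflated expected mortality pushes the O/E ratio below 1, which makes the center appear to perform better than it does.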


The abovementioned aspects are merely some highlights to illustrate the complexity of the monitoring of safety in cardiac surgery; the process involves many clinical, practical, methodological and statistical issues. The objective of this thesis was to monitor safety in cardiac surgery in The Netherlands using the NVT Adult Cardiac Surgery Database. The specific aims can be defined as follows:
1. To investigate methods to measure safety in cardiac surgery
2. To investigate methods to compare safety across cardiac surgery centers

The outline of this thesis follows these goals and is also partitioned in two. Chapter 3 of the introductory part provides an in-depth analysis of the content of the database. Part 1 elaborates on how to measure safety in cardiac surgery. The basic concepts are discussed in Chapter 4. The monitoring of data accuracy is discussed in Chapter 5 and the question of when to measure mortality is answered in Chapter 6. In Part 2 the comparison of safety across hospitals is studied. The performance of the EuroSCORE as a risk adjustment tool is investigated in a systematic review in Chapter 7. The use of ranking lists is evaluated in Chapter 8 and the impact of gaming is examined in Chapter 9. Finally, the question of whether an administrative database can be used to measure and compare safety is answered in Chapter 10.

Two supplementary chapters have been included in this thesis. Chapters S1 and S2 provide a comprehensive overview of trends and outcomes of valve surgery and isolated CABG surgery in The Netherlands from 1995 to 2010.


Reference List

1. Een tekortschietend zorgproces. Een onderzoek naar de kwaliteit en veiligheid van de cardiochirurgische zorgketen voor volwassenen in het UMC St Radboud te Nijmegen. Inspectie voor de Gezondheidszorg; 2006 Apr 24.
2. Een onvolledig bestuurlijk proces: hartchirurgie in UMC St Radboud. Onderzoek naar aanleiding van berichtgeving op 28 september 2005 over te hoge mortaliteit. Den Haag: de onderzoeksraad voor veiligheid; 2008 Apr.
3. Nederlandse Vereniging voor Thoraxchirurgie. www.nvtnet.nl.
4. Nashef SA, Roques F, Michel P, Gauducheau E, Lemeshow S, Salamon R. European system for cardiac operative risk evaluation (EuroSCORE). Eur J Cardiothorac Surg 1999 Jul;16(1):9-13.
5. Nashef SA, Roques F, Sharples LD, Nilsson J, Smith C, Goldstone AR, et al. EuroSCORE II. Eur J Cardiothorac Surg 2012 Apr;41(4):734-44.
6. Parsonnet V, Dean D, Bernstein AD. A method of uniform stratification of risk for evaluating the results of surgery in acquired adult heart disease. Circulation 1989 Jun;79(6 Pt 2):I3-12.
7. Shahian DM, Normand SL, Torchiana DF, Lewis SM, Pastore JO, Kuntz RE, et al. Cardiac surgery report cards: comprehensive review and statistical critique. Ann Thorac Surg 2001 Dec;72(6):2155-68.
8. Shahian DM, O'Brien SM, Filardo G, Ferraris VA, Haan CK, Rich JB, et al. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 1--coronary artery bypass grafting surgery. Ann Thorac Surg 2009 Jul;88(1 Suppl):S2-22.
9. Donabedian A. Explorations in quality assessment and monitoring. The definition of quality and approaches to its assessment. Michigan: Health Administration Press; 1980.
10. Iezzoni LI. Risk adjustment for measuring healthcare outcomes. Ann Arbor, Mich: Health Administration Press; 1994.
11. Brinkman S, Abu-Hanna A, van der Veen A, de JE, de Keizer NF. A comparison of the performance of a model based on administrative data and a model based on clinical data: effect of severity of illness on standardized mortality ratios of intensive care units. Crit Care Med 2012 Feb;40(2):373-8.
12. Glance LG, Dick AW, Osler TM, Mukamel DB. Accuracy of hospital report cards based on administrative data. Health Serv Res 2006 Aug;41(4 Pt 1):1413-37.
13. Hannan EL, Racz MJ, Jollis JG, Peterson ED. Using Medicare claims data to assess provider quality for CABG surgery: does it work well enough? Health Serv Res 1997 Feb;31(6):659-78.
14. Mack MJ, Herbert M, Prince S, Dewey TM, Magee MJ, Edgerton JR. Does reporting of coronary artery bypass grafting from administrative databases accurately reflect actual clinical outcomes? J Thorac Cardiovasc Surg 2005 Jun;129(6):1309-17.
15. Shahian DM, Silverstein T, Lovett AF, Wolf RE, Normand SL. Comparison of clinical and administrative data sources for hospital coronary artery bypass graft surgery report cards. Circulation 2007 Mar 27;115(12):1518-27.
16. Califf RM, Jollis JG, Peterson ED. Operator-specific outcomes. A call to professional responsibility. Circulation 1996 Feb 1;93(3):403-6.
17. Green J, Wintfeld N. Report cards on cardiac surgeons. Assessing New York State's approach. N Engl J Med 1995 May 4;332(18):1229-32.


3 Data Resource Profile: Adult cardiac surgery database of the Netherlands Association for Cardio-Thoracic Surgery

Siregar S, Groenwold RHH, Versteegh MIM, Takkenberg JJM, Bots ML, van der Graaf Y, van Herwerden LA Int J Epidemiol. 2013;42(1):142-149



Abstract

In 2007 the Netherlands Association for Cardio-Thoracic Surgery (Nederlandse Vereniging voor Thoraxchirurgie, NVT) instituted the Adult Cardiac Surgery Database. The dataset comprises demographic factors, type of intervention, in-hospital mortality and 18 risk factors for mortality after cardiac surgery, according to EuroSCORE definitions. Currently, this procedural database contains over 60,000 interventions. Completeness of data is excellent and national coverage of all 16 Dutch cardio-thoracic surgery centers has been achieved since the start. The primary goal of the database is to control and maintain the quality of care by evaluation of outcomes. This is accomplished by regular feedback and comparison of outcomes. For a subset of the database (procedures from 10 out of 16 centers) longer-term follow-up has been established by means of data linkage to two national registries. This provides information on survival status, causes of death and readmissions. The database has recently been used for research, resulting in methodological papers aimed at optimizing comparison of outcomes. In the future, clinical issues will also be addressed, for example survival after coronary artery bypass grafting and valve surgery.


Background of the database

Cardiac surgery has a long tradition of outcomes evaluation. Since the public release of cardiac surgery mortality rates in the US in the mid-eighties, many efforts have been made to allow and improve the comparison of outcomes.(1) In the following years, local as well as national databases were established and many risk models were developed.(2-5) In The Netherlands, cardiac interventions, both surgical and catheter based, have been registered since 1995 by an umbrella foundation set up by cardiologists, pediatric cardiologists, cardiac anesthesiologists and cardiac surgeons, called the Supervisory Committee for Cardiac Interventions in The Netherlands (Begeleidingscommissie Hartinterventies Nederland, BHN). Demographic details and the type of intervention were registered. However, the register appeared to be insufficient to allow the monitoring and evaluation of outcomes. This became evident in 2005, when the media reported that the mortality rate in one of the 16 cardio-thoracic surgery hospitals in The Netherlands was twice as high as in other clinics. External investigations indicated that this could have been detected a year earlier if the national database had been more complete. The hospital was temporarily not allowed to perform adult cardiac surgery, until the department was reorganized.

This event strengthened the belief among members of The Netherlands Association for Cardio-Thoracic Surgery (NVT) that it is a societal responsibility to ensure the safety of cardiac surgery in The Netherlands. The NVT subsequently established a database which, in addition to demographic factors and type of intervention, also includes in-hospital mortality and risk factors for mortality after cardiac surgery. Participation was made compulsory by societal decree. The goal of this database was to evaluate risk-adjusted mortality rates and monitor safety as an elementary component of the quality of care.
An outline of the database is provided in the Box below.

Country: All 16 cardiac surgery centers in The Netherlands
Procedures: All adult cardiac surgery procedures
Purpose of database: Outcomes evaluation
Data collection frequency: Periprocedural: patient demographics, type of intervention and risk factors for mortality. Periodical follow-up: survival status, cause of death and readmissions.
Topic headings: Outcomes evaluation, monitoring safety in cardiac surgery, trends and outcomes in cardiac surgery.
Funding sources: Collection, transfer, storage and cleaning of the data funded by the centers; additional funding by the Dutch Department of Health, Welfare and Sport for extension of the dataset, first series of analyses and additional research aimed at quality improvement.


Data resource area and population coverage

The prospective database includes all adult cardiac surgery procedures in all 16 cardio-thoracic surgery centers in the Netherlands from 1 January 2007 onwards. Procedures were included in the database if open heart surgery was performed in a patient of 18 years or older. Open heart surgery is defined as a surgical intervention to the heart with opening of the pericardium. The baseline data are collected prospectively during hospital stay. Patient characteristics and risk factors for mortality after cardiac surgery are shown in Table 1. The interventions that were included in this database are shown in Table 2. The majority of interventions involved coronary artery bypass grafting (69.4%), and over one third of the cases involved valve surgery (38.8%).

Table 1: Baseline characteristics of all procedures performed from 2007 to 2010 included in the Adult Cardiac Surgery Database of the Netherlands Association for Cardio-Thoracic Surgery

Risk factors for mortality after cardiac surgery (b)    N (%), N = 63061 (a)
Age (continuous)                                        66.0 (±11.2)
Female                                                  18960 (30.1)
Serum creatinine >200 μmol/l                            1252 (2.0)
Extracardiac arteriopathy                               7703 (12.2)
Pulmonary disease                                       7251 (11.5)
Neurological dysfunction                                1998 (3.2)
Previous cardiac surgery                                4668 (7.4)
Recent myocardial infarct                               7950 (12.6)
LVEF 30–50%                                             12048 (19.1)
LVEF <30%                                               3463 (5.5)
Systolic pulmonary pressure >60 mmHg                    1967 (3.1)
Active endocarditis                                     937 (1.5)
Unstable angina                                         3659 (5.8)
Emergency operation                                     4196 (6.7)
Critical preoperative state                             2949 (4.7)
Ventricular septal rupture                              138 (0.2)
Other than isolated coronary bypass surgery             29045 (46.1)
Thoracic aortic surgery                                 3535 (5.6)
Logistic EuroSCORE                                      mean 7.3 (±10.1), median 4.0

Categorical variables are displayed as numbers with percentages between brackets; continuous variables are displayed as means with standard deviations between brackets.
(a) Records with complete EuroSCORE variables.
(b) Definitions according to the EuroSCORE risk model.

Data Resource Profile: Adult cardiac surgery database

Table 2: In-hospital mortality in the Dutch Adult Cardiac Surgery Database 2007-2010, specified by type of intervention

Intervention             N (%), N = 63061    In-hospital mortality (%) [95% CI]    Logistic EuroSCORE (a) (%) [95% CI]
All interventions        63061 (100.0)       3.02 [2.89;3.16]                      7.30 [7.22;7.38]
CABG                     43775 (69.4)        2.49 [2.37;2.62]                      6.06 [6.00;6.13]
Isolated CABG            33401 (53.0)        1.37 [1.28;1.46]                      4.66 [4.61;4.72]
Valve                    24462 (38.8)        4.43 [4.27;4.59]                      9.72 [9.63;9.81]
Isolated valve           11974 (19.0)        3.40 [3.26;3.54]                      8.74 [8.66;8.82]
Aortic valve             16662 (26.4)        4.39 [4.23;4.55]                      9.88 [9.79;9.96]
Isolated aortic valve    7372 (11.7)         2.61 [2.48;2.73]                      8.54 [8.46;8.61]
Mitral valve             9237 (14.6)         5.72 [5.53;5.90]                      10.09 [10.00;10.19]
Isolated mitral valve    2798 (4.4)          3.18 [3.04;3.32]                      8.13 [8.04;8.22]
Pulmonary valve          155 (0.2)           1.29 [1.20;1.38]                      6.90 [6.85;6.95]
Tricuspid valve          2293 (3.6)          7.94 [7.73;8.15]                      11.30 [11.21;11.39]
CABG and valve           7620 (12.1)         5.39 [5.22;5.57]                      9.82 [9.73;9.90]

(a) The logistic EuroSCORE gives an estimate of the risk of operative mortality (within 30 days after operation or within the same hospital admission).

Data are collected at the participating centers (Figure 1). Some centers have chosen to employ data managers, whereas others rely on the surgeons to do the administrative work. Every three months anonymized data are sent to the Department of Clinical Informatics at the Academic Medical Center in Amsterdam, where they are further processed at a national level. First, data are entered into the database and error checks are run. In case of errors, centers are notified and requested to provide corrected data. A special Data Registration Committee, consisting of one delegate from each center, assembles every three months to discuss errors and results generated using the anonymized data. Results are presented to the Data Registration Committee in reports that include national results and anonymized peer results for comparison. The national database is stored in the Academic Medical Center in Amsterdam.

Figure 1: Cardio-thoracic surgery centers in The Netherlands

Map of The Netherlands showing all 16 centers where cardiac surgery is performed (blue squares). All 16 centers participate in the Adult Cardiac Surgery Database of the Netherlands Association for Cardio-Thoracic Surgery.

Survey frequency
If a surgical procedure is registered in the database, the patient is followed for the duration of the hospital stay. After surgery, intervention-related information is collected and survival status is recorded upon discharge. Information on mortality after discharge is not routinely collected by the centers. To obtain reliable information on survival status after discharge, survival information is obtained from two national registries. The first follow-up took place in 2012. Ten out of the 16 participating centers agreed to participate in the linkage to the Dutch Population Registry (PR) and subsequently to two other national registries, which were made available by Statistics Netherlands (CBS): the Cause of Death Registry and the Hospital Discharge Registry (HDR).(6-8) The former contains information on survival status and cause of death of all Dutch residents and is extracted from

the municipal registries. Currently the Cause of Death Registry is complete until 2011. This means that a minimum follow-up of one year was achieved for all procedures performed until 2010 in ten out of 16 hospitals. Using the details on hospital admissions, we aimed to determine hospital mortality after transfer from the primary center (where surgery was performed), readmission rates and causes of readmission. To obtain information on readmissions after discharge, data from the HDR were used.(7) The HDR contains information on hospital admissions of a vast majority of the hospitals in The Netherlands.(9;10) Currently the registry is complete until 2010. This means that follow-up data on hospital readmissions during the first postoperative year were available for procedures in the database performed from 2007 to 2009 in ten out of 16 hospitals. A structured periodical follow-up to obtain information on readmissions and survival status of the patients through linkage of national registries is desired. However, a protocol for the frequency and methods of follow-up has not been established yet. The goal is to accomplish this in the near future.

The Adult Cardiac Surgery Database is a procedural database. This means one person can be included multiple times in case of cardiac reoperations. Because all data are anonymized, analyses could only be performed at the procedural level. For the follow-up study CBS matched all procedures in the database to a personal record in the municipal registries, after which the database could be linked to the Hospital Discharge Registry and the Cause of Death Registry. To maintain consistency of methods, analyses were continued at the procedural level. As a consequence, on the rare occasion that a patient died after multiple operations in a short period of time, mortality might be counted multiple times. Considering that only 620 procedures (1.9%) were performed in patients who were already in the database, this issue is not likely to have influenced our results.

Measures
Baseline data
Upon inclusion in the database, administrative and demographic variables are recorded: date of acceptance for surgery, hospital name, sex, postal code, date of birth and age. Preoperatively, variables from the EuroSCORE risk model and additional risk factors are measured: age, sex, chronic pulmonary disease, extracardiac arteriopathy, neurological dysfunction, previous cardiac surgery, serum creatinine >200 μmol/l, active endocarditis, critical preoperative state, unstable angina, LV dysfunction (moderate or poor), recent myocardial infarction, pulmonary hypertension, emergency intervention, other than isolated CABG, surgery on thoracic aorta, postinfarction septal rupture.(3) Variables on previous cardiac interventions include a history of percutaneous catheter interventions

(PCI) of the coronary arteries, coronary artery surgery, valve surgery, surgery on the aorta, or other cardiac surgery. After surgery the following information is collected (if applicable): the type of bypass graft used (arterial, venous or other) and the number of anastomoses; specification of the valve (aortic, mitral, pulmonary, tricuspid) and the type of valve surgery (repair, stentless bioprosthesis, stented bioprosthesis, mechanical prosthesis, homograft, autograft or other); specification of (concomitant) aortic surgery type (ascending aorta, arch or descending aorta); and other cardiac surgery types (left ventricular aneurysm correction, ventricular septal rupture, heart transplantation, surgery for cardiac rhythm disturbances, unspecified other cardiac surgery). Figure 2 illustrates the distribution of the types of intervention, types of grafts and position of valves operated on in the database. Finally, in-hospital mortality including the date of death is recorded. This is shown in Table 2, specified for each type of intervention. The hospital mortality rate in the entire database was 3.0%.

Figure 2: Intervention characteristics of the Dutch Adult Cardiac Surgery Database 2007-2010
[Figure: Panel A, Interventions (isolated CABG 53.0%, isolated valve 19.0%, CABG + valve 12.1%, other 16.0%); Panel B, Type of grafts; Panel C, Valve surgery]
Panel A: Types of intervention. CABG: coronary artery bypass graft surgery. Panel B: Type of grafts used in CABG. Art.: arterial, ven.: venous. Panel C: Type of valve surgery. AV: single valve surgery on aortic valve. MV: single valve surgery on mitral valve.

The EuroSCORE risk model
The NVT applies the definitions of risk factors for mortality used by the EuroSCORE.(3) This is the most commonly used model in The Netherlands and Europe to predict the risk of mortality after cardiac surgery and it is an internationally accepted tool for benchmarking.(3) The use of such a widely known model has several advantages. Firstly, it maximizes the uniformity of data. Secondly, it enables merger or comparison with data from other parts of the world. The score is calculated using 18 patient characteristics and intervention-related variables. For example, a patient with a poor left ventricular function and pulmonary hypertension will have a high EuroSCORE, which corresponds to a high mortality risk. This allows outcomes to be corrected for the a priori risk of mortality (i.e. the expected mortality). The mean and median logistic EuroSCORE for the entire database are 7.3% and 4.0%, respectively. For most procedures in the database (59.3%) the calculated logistic EuroSCORE was below 5% (Figure 3).
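
For orientation, a logistic risk model of this kind converts a weighted sum of risk factors into a predicted probability of death. The expression below is only the general logistic form, given as a sketch: the symbols $\beta_0$, $\beta_i$ and $x_i$ are notation introduced here for illustration, and the published EuroSCORE coefficients are not reproduced in this chapter.

```latex
\[
  \text{predicted mortality}
  = \frac{\exp\left(\beta_0 + \sum_{i} \beta_i x_i\right)}
         {1 + \exp\left(\beta_0 + \sum_{i} \beta_i x_i\right)}
\]
% beta_0 : intercept of the logistic model
% beta_i : coefficient of risk factor i
% x_i    : value of risk factor i for the patient (for example 1 if present, 0 if absent)
```

Each additional risk factor with a positive coefficient increases the sum in the exponent, and therefore the predicted mortality, which is why a patient with several risk factors receives a high score.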

Figure 3: Preoperative risk of mortality in the Dutch Adult Cardiac Surgery Database 2007-2010 according to the logistic EuroSCORE (%)
[Figure: x-axis, logistic EuroSCORE (0 to 100); y-axis, proportion]

Mortality
Variables extracted from the Cause of Death Registry were: survival status, date of death, primary cause of death according to the 10th revision of the International Classification of Diseases (ICD-10)(11), up to three secondary causes of death according to ICD-10, and location of death (hospital/institution or at home, type of hospital/institution).(8) Results of the first linkage to the Cause of Death Registry are shown in Table 3.

Table 3: Survival status in the Dutch Adult Cardiac Surgery Database

Subset of database (10 centers),
procedures performed 2007-2010 (a)             N = 33094    (%)
In-hospital mortality primary center           972          2.9
30-day mortality                               998          3.0
One-year mortality                             2052         6.2
Causes of death in first postoperative year
  Cardiac                                      1335         65.1
  Cerebrovascular                              233          11.4
  Other                                        474          23.1
  Missing                                      10           0.4

(a) Procedures performed between 1 January 2007 and 31 December 2010 in 10 out of 16 centers were linked to the Cause of Death Registry to obtain the survival status of the patients. In total 33094 of 34011 procedures were successfully linked (97.3%). Cardiac causes are defined by the following ICD-10 codes: I.01, I.05-I.09, I.11, I.13, I.20-I.27, I.30-I.52. Cerebrovascular causes are defined by the following ICD-10 codes: I.10, I.12, I.60-I.99.
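
The grouping of causes of death used in Table 3 can be expressed as a small lookup. The sketch below is an illustration only: the function name and the accepted code spellings (for example `I21`, `I21.0` or `I.21`) are assumptions, and the ranges are copied from the footnote of Table 3 rather than from the software used for the actual linkage.

```python
import re

# Inclusive ranges within ICD-10 chapter "I", copied from the footnote of Table 3.
CARDIAC_RANGES = [(1, 1), (5, 9), (11, 11), (13, 13), (20, 27), (30, 52)]
CEREBROVASCULAR_RANGES = [(10, 10), (12, 12), (60, 99)]


def _in_ranges(number, ranges):
    """Return True if number falls inside any of the inclusive (low, high) ranges."""
    return any(low <= number <= high for low, high in ranges)


def classify_cause_of_death(code):
    """Map an ICD-10 code to 'cardiac', 'cerebrovascular', 'other' or 'missing'.

    Hypothetical helper: the accepted spellings (with or without dots) are an
    assumption about how codes might be written in an export file.
    """
    if code is None or not code.strip():
        return "missing"
    # Read the two digits that follow the chapter letter "I".
    match = re.match(r"I\.?(\d{2})", code.strip().upper())
    if match is None:
        return "other"  # codes outside chapter I, or not recognizable
    number = int(match.group(1))
    if _in_ranges(number, CARDIAC_RANGES):
        return "cardiac"
    if _in_ranges(number, CEREBROVASCULAR_RANGES):
        return "cerebrovascular"
    return "other"


if __name__ == "__main__":
    for example in ["I21.0", "I.64", "C34", ""]:
        print(repr(example), "->", classify_cause_of_death(example))
```

Keeping the ranges in two module-level lists means the definition can be checked against the table footnote at a glance and changed in one place if the grouping is revised.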

Readmissions
From the Hospital Discharge Registry the following information was extracted: hospital where the patient was admitted during follow-up, date and time of hospital admission, insurance, reason for admission, urgency of admission, date and time of discharge, destination after discharge, primary diagnosis according to the 9th revision of the International Classification of Diseases (ICD-9)(11), primary intervention, and primary department admitted to.(8) Results of the linkage to the Hospital Discharge Registry are shown in Table 4.

Audit
In 2009 an audit committee was installed by the NVT to gain insight into the process of data collection in participating centers, and into the completeness, uniformity and reliability of the collected data. Five centers volunteered to participate in pilot audits that took place in 2010. During on-site visits the audit committee evaluated the process of data collection through a standardized interview. Additionally, a random sample of 2.5% of the surgical procedures performed between 1.5 and 0.5 years before the audit was audited: the auditors collected the data anew and compared them with the data in the national database. The audit committee concluded that an increased awareness of the process of data collection and management was achieved, and initiatives to improve this process were stimulated. A new cycle of audits was proposed.

Table 4: Hospital admission and readmissions in the Dutch Adult Cardiac Surgery Database

Subset of database (10 centers),
procedures performed 2007-2009 (a)                                       N = 21112    (% or s.d.)
In-hospital mortality primary center according to national registry      677          3.2
In-hospital mortality primary center according to database               647          3.1
Transfer to secondary centers                                            10742        50.9
In-hospital mortality including transfer to secondary centers            749          3.5
Mean post-operative length of stay in index hospital (days)              8.7          10.7
Mean post-operative length of stay including subsequent stay
  in secondary hospital (days, SD)                                       12.4         12.7
Readmissions one year after intervention
  None                                                                   12437        58.9
  1                                                                      4374         20.7
  2                                                                      2089         9.9
  3                                                                      940          4.5
  4                                                                      497          2.4
  5 or more                                                              596          2.8
  Unknown                                                                182          0.9
Causes first readmission
  Cardiac                                                                2410         28.4
  Cerebrovascular                                                        468          5.5
  Other                                                                  5618         66.1
Causes all readmissions
  Cardiac                                                                4777         27.3
  Cerebrovascular                                                        907          5.2
  Other                                                                  11838        67.6

(a) Results are based on procedures performed between 1 January 2007 and 31 December 2009 in 10 out of 16 centers that could be linked to the Hospital Discharge Registry. In total 21115 of 25044 procedures were successfully linked (84.3%). Cardiac causes are defined by the following ICD-9 codes: 391, 393-398, 402, 404, 410-416, 420-429. Cerebrovascular causes are defined by the following ICD-9 codes: 430-459.

Data Resource use
The initial purpose of the database was to monitor and maintain the quality of care by evaluation of outcome parameters. This has been accomplished by regular, confidential discussion of results: mortality rates of the participating centers are reported to the corresponding delegate in the Data Registration Committee, along with the national results and anonymous results of all peers. Recently the database has also been used for research. To serve the initial purpose of the database, research was first aimed at the methodological issues regarding comparison of outcomes and benchmarking. This resulted in several methodological papers.(12-14) The first study showed that ranking lists are an imprecise statistical method to report cardiac surgery mortality rates and are prone to random fluctuation. Therefore, ranking lists should not be used and comparison against a benchmark is recommended instead.(12) A second study

showed that benchmarking based on risk-adjusted mortality rates can be manipulated by misclassification of risk factors. Limited upcoding of multiple risk factors in high-risk patients can greatly influence benchmarking. Therefore, the prevalence of all risk factors should be carefully monitored.(13) For this purpose Statistical Process Control in the form of Shewhart control charts, exponentially weighted moving average (EWMA) charts and cumulative sum (CUSUM) charts can be used.(15) The most recent study showed that the course of early mortality after cardiac surgery differs across interventions and continues up to approximately 120 days. As a consequence, follow-up should be prolonged to capture early mortality of all types of interventions [data not published]. In the near future, clinical issues will be addressed as well. The database will be used to investigate the survival after CABG and valvular surgery.

Strengths and weaknesses
Nation-wide participation and completeness of data
The main strength of this database is that interventions from all hospitals in the Netherlands are included and completeness of data is very high (99% of the data is complete). This ensures complete coverage of all cardiac surgery procedures in a country. In other countries we have seen organizations struggling to achieve this.(16-18) Often, participation cannot be made compulsory for all hospitals and particularly private hospitals do not contribute data. The first follow-up could only be performed in a subset of the database. Centers were requested to send the full six-digit zip code of all patients operated on since 2007 to the national database. We expected this to be the major reason for non-participation in the follow-up. Since 1 January 2012 the full zip code has been collected upon inclusion in the database, thereby eliminating this issue.
Quarterly meetings of the Data Registration Committee provide the opportunity to give feedback on the quality of the data, so that centers can (re)submit missing or incorrect information. However, the largest advantage of these frequent meetings is that results and trends are analyzed and discussed. Because the exact methods to analyze and publish the risk-adjusted outcomes have not been fully developed, the process is currently performed in a confidential environment.

Completeness of follow-up and data accuracy
One of the challenges this database faces is the lack of a structured follow-up protocol. Ideally, the linkage to the national registries would be performed systematically after a predefined period. Completeness of follow-up depends on the completeness of the national registries and the sensitivity of the matching procedure (using date of birth, sex and zip code). Fortunately, 97.3% of procedures could be matched with a patient in the Cause of Death

Registry. This means that the patients of 33085 out of 34011 interventions in total were traced for follow-up of survival status. Although all deaths are registered in the Cause of Death Registry, the Hospital Discharge Registry is less complete. The HDR receives administrative data on a voluntary basis. For the years 2007, 2008 and 2009, in total 14.3, 14.3 and 14.6% of all admissions in The Netherlands were not registered in the HDR or could not be matched to a personal record.(19) This is likely to have led to an underestimation of readmission rates. Despite this shortcoming, the HDR has proven to be a valuable tool for the follow-up of a large number of procedures such as in a national database. Another challenge to the adult cardiac surgery database, as to all other large databases, is to ensure data accuracy. Efforts in this area include the quarterly meetings, the audits and verification of mortality using the national registries.

Data Resource access
Collaboration with other large databases on cardiac surgery is encouraged. Some hospitals in The Netherlands have participated in the database of the European Association for Cardio-Thoracic Surgery (EACTS). Data supplied to the EACTS as well as the NVT database remain the property of the original data owner (being the hospitals). All decisions regarding the data (including data sharing) are taken by the individual centers, assembled in the Data Registration Committee. Interested researchers and directors can visit www.nvtnet.nl for more information on the database and contact the Netherlands Association for Cardio-Thoracic Surgery at secretariaat@nvtnet.nl to submit proposals for collaboration. Variable lists are available upon request.

Reference List
1. Healthcare Financing Administration. Medicare Hospital Information Report. Washington, DC: Government Printing Office; 1992.
2. Edwards FH, Clark RE, Schwartz M. Coronary artery bypass grafting: the Society of Thoracic Surgeons National Database experience. Ann Thorac Surg 1994;57:12-9.
3. Nashef SA, Roques F, Michel P, Gauducheau E, Lemeshow S, Salamon R. European system for cardiac operative risk evaluation (EuroSCORE). Eur J Cardiothorac Surg 1999;16:9-13.
4. Parsonnet V, Dean D, Bernstein AD. A method of uniform stratification of risk for evaluating the results of surgery in acquired adult heart disease. Circulation 1989;79:I3-12.
5. Shahian DM, O’Brien SM, Filardo G, Ferraris VA, Haan CK, Rich JB, Normand SL, DeLong ER, Shewan CM, Dokholyan RS, Peterson ED, Edwards FH, Anderson RP. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 1--coronary artery bypass grafting surgery. Ann Thorac Surg 2009;88:S2-22.
6. Centraal Bureau voor de Statistiek. www.cbs.nl. Den Haag, The Netherlands.
7. Dutch Hospital Data. www.dutchhospitaldata.nl. Utrecht, The Netherlands.
8. Vaartjes I, Hoes AW, Reitsma JB, de BA, Grobbee DE, Mosterd A, Bots ML. Age- and gender-specific risk of death after first hospitalization for heart failure. BMC Public Health 2010;10:637.
9. Slobbe LC, Arah OA, de BA, Westert GP. Mortality in Dutch hospitals: trends in time, place and cause of death after admission for myocardial infarction and stroke. An observational study. BMC Health Serv Res 2008;8:52.
10. Wong A, Boshuizen HC, Schellevis FG, Kommer GJ, Polder JJ. Longitudinal administrative data can be used to examine multimorbidity, provided false discoveries are controlled for. J Clin Epidemiol 2011;64:1109-17.
11. World Health Organization. www.who.int.
12. Siregar S, Groenwold RH, Jansen EK, Bots ML, van der Graaf Y, van Herwerden LA. Limitations of ranking lists based on cardiac surgery mortality rates. Circ Cardiovasc Qual Outcomes 2012;5:403-9.
13. Siregar S, Groenwold RHH, Versteegh MIM, Noyez L, Ter Burg WJPP, Bots ML, van der Graaf Y, van Herwerden LA. Gaming in risk-adjusted mortality rates: Effect of misclassification of risk factors in the benchmarking of cardiac surgery risk-adjusted mortality rates. J Thorac Cardiovasc Surg 2012;145:781-9.
14. Siregar S, Versteegh MI, van Herwerden LA. [Risk-adjusted hospital mortality rates]. Ned Tijdschr Geneeskd 2011;155:A4103.
15. Siregar S, Roes K, van Straten AHM, Bots ML, van der Graaf Y, van Herwerden LA, Groenwold RH. Statistical methods to monitor risk factors in a clinical database. Circ Cardiovasc Qual Outcomes 2013;6:110-8.
16. The Society for Cardio-thoracic Surgery in Great Britain & Ireland. Sixth National Adult Cardiac Surgical Database Report 2008, Demonstrating Quality. Dendrite Clinical Systems Ltd.; 2009 Jul.
17. Jacobs JP, Edwards FH, Shahian DM, Haan CK, Puskas JD, Morales DL, Gammie JS, Sanchez JA, Brennan JM, O’Brien SM, Dokholyan RS, Hammill BG, Curtis LH, Peterson ED, Badhwar V, George KM, Mayer JE Jr, Chitwood WR Jr, Murray GF, Grover FL. Successful linking of the Society of Thoracic Surgeons adult cardiac surgery database to Centers for Medicare and Medicaid Services Medicare data. Ann Thorac Surg 2010;90:1150-6.
18. Dinh DT, Lee GA, Billah B, Smith JA, Shardey GC, Reid CM. Trends in coronary artery bypass graft surgery in Victoria, 2001-2006: findings from the Australasian Society of Cardiac and Thoracic Surgeons database project. Med J Aust 2008;188:214-7.
19. Centraal Bureau voor de Statistiek. Documentatierapport Landelijke Medische Registratie (LMR) 2009V1. 2011 Jul 25.


Part 1 Measuring safety in cardiac surgery


4 Performance indicators for hospitals

Siregar S, Groenwold RHH, Versteegh MIM, van Herwerden LA Ned Tijdschr Geneeskd. 2012;156(49):A5487


Chapter 4

Abstract
A good performance indicator reflects the quality of care and often uses clinical outcomes of patients. When outcomes are difficult to obtain, process and structure variables can provide a picture of the quality of care. A single performance indicator for the entire hospital is unattainable and efforts should focus on performance indicators for specific diseases and populations. There are many pitfalls and limitations in the use of performance indicators to compare hospitals; these include correction for casemix, the statistical precision of the measurements and the validation of performance indicators. Well-considered performance indicators can be used to monitor trends and to identify providers that perform below a certain benchmark, but they cannot be used to rank them.

Introduction
The quality of care in Dutch hospitals is measured in many different ways. The Algemeen Dagblad newspaper and Elsevier magazine publish yearly league tables of best performing hospitals, the debate on the Hospital Standardized Mortality Ratio (HSMR) has been going on for years and recently the Inspectorate of Health expanded its set of quality indicators.(1-4) Measuring quality starts with determining the goal of the assessment. Subsequently the appropriate methods may be selected and finally, limitations and other considerations should be acknowledged to ensure a correct interpretation of results. This article will elaborate on these and other issues that are related to the quality assessment of hospitals.

The purpose of quality assessment
There are two important reasons to measure the quality of care provided in hospitals: internal quality monitoring and public accountability. Internal quality monitoring is initiated by a department, a hospital, or healthcare professionals themselves. Improvement or deterioration of quality can be identified and policies may be adjusted wherever necessary. For example, when the number of wound infections in surgical units is monitored, an increase may trigger changes in hygiene measures. The quality of care can also be compared to other hospitals or to a common benchmark. Both underperformers and best practices may be identified; the latter could serve as an example to other hospitals. The results of internal quality assessments are usually not published. Some examples of internal quality assessment and monitoring programs are the Netherlands Association for Cardio-Thoracic Surgery (NVT, www.nvtnet.nl) and the Dutch Institute for Clinical Auditing (DICA, www.dica.nl). The NVT has recorded all cardiac surgery interventions in The Netherlands since 2007 and the DICA registries include interventions performed on patients with colorectal, breast, gastric, esophageal and lung cancer.(5)
External quality assessment and monitoring is performed in the light of public accountability. The Inspectorate of Health measures and monitors the quality of care to ensure all healthcare providers meet a set of predefined requirements. In contrast, patients, health insurance companies and the media will mainly focus on the identification of top performers.

Measuring quality
In Dutch hospitals performance equals the quality of the care delivered to patients. Unlike in business, efficiency, profit, growth and expansion are generally not taken into account when quality is assessed. This means that the terms performance indicator and quality measure are interchangeable in healthcare. The quality of care is a complex concept. It is defined by the National Healthcare Council as “the extent to which the entirety of characteristics (as accomplished) of a product, process or service meets the requirements (predefined or


expected) that follow from its purpose”.(6) Thus, the definition of the quality of care reflects the purpose and the demands of the delivered care. In order to measure whether care conforms to these demands, concrete aspects of care can be appraised: so-called performance indicators. Thus, the purposes and perspectives of the quality assessment influence the selection of the performance indicators. In general, the aim of medical treatment is to improve the clinical situation or the quality of life of a patient. It follows that the quality of care is primarily measured according to the clinical outcomes of a patient: outcome measures. In case these are difficult to obtain, or for any other reason are not at hand, the structure and processes of care can help form a picture of the quality of care. This triad of outcome, structure and process variables reflects the complexity of the concept of quality of care.(7)
Structure indicators provide an outline of the setting or system in which care is delivered and describe the hospital as an organization. The most commonly used structure indicator in healthcare is volume.(8;9) Other examples include the availability of specialized units, machinery and personnel (such as a coronary care or intensive care unit), the status of teaching hospital and the number of patients per nurse. Structure indicators are difficult for care providers to influence and are for that reason less useful in programs targeted at direct quality improvement. Structure indicators are relatively easy to obtain, as the information is gathered on a hospital level.
Process indicators describe the process of care, for example expressed as time on the waiting list or compliance with guidelines.(8) Examples include the proportion of patients receiving beta-blockers after coronary bypass surgery and the use of thrombo-embolic prophylaxis in immobilized patients. As opposed to the structure of care, processes of care can be improved. Process variables can usually be collected on a hospital level as well.
Outcomes are the most commonly used performance indicators. They include mortality, complications, percentage of successful treatments and quality of life. The disadvantage of the use of outcomes is that they are collected on a patient level, which in general requires time and financial resources.

Considerations when selecting a performance indicator
Is the performance indicator relevant?
Structure, process and outcome measures for one disease may not be relevant for another disease. Every diagnosis, treatment, or group of comparable treatments requires the selection of a new set of relevant parameters: in every group the relation between the performance indicator and the quality of care must be reconsidered. For example, in-hospital mortality is used as a performance indicator in cardiac surgery. However, the measure clearly has no relation with quality in palliative care. Another example

is patient satisfaction. The satisfaction of a patient is based on a wide range of aspects of care. Some aspects (such as friendliness of staff and room facilities) are not associated with clinical outcomes, which makes the validity of patient satisfaction as a performance indicator doubtful.(10) Finally, the relation between volumes and clinical outcomes is often debated and differs across interventions.(9) This means that volume can only be used as a performance indicator in certain situations. A performance indicator should have face validity (are the parameters relevant at first sight?) and construct validity (is there a relation between the measured aspects and the quality of care, i.e. outcomes?).(11;12) As the latter is uncertain for structure and process variables, so is the appropriateness of their use as performance indicators.

Is the definition of the performance indicator unambiguous?
Next, the definition of the performance indicator should be unambiguous. Mortality is unambiguous and relatively easy to determine. However, limb functionality, patient satisfaction and a complicated clinical course require clear definitions. Even with mortality further specification is needed, for example on the time period of measurement: often in-hospital mortality in the primary center is used, but 30-day and 1-year mortality might also be applied.

Does the performance indicator cover all aspects of quality?
Composite performance indicators combine multiple parameters, for example mortality, volume and time on the waiting list. Summing these parameters into one score requires them to be independent. A correlation between the parameters would mean that they were measuring the same aspect and that they should not simply be summed. In addition, the weighting of the parameters relative to each other should be determined. Varying methods of combining the individual parameters could greatly influence the composite score.(13-15) Lastly, the largest concern is whether the performance indicator covers all aspects of quality. This relates to the content of the performance indicator and is thus called content validity.

Limitations of performance indicators
Can quality be compared across hospitals?
The performance indicator can be affected by factors other than quality, for example patient characteristics such as age and stage of disease.(11) The characteristics of the treated population are also called casemix (Box 1). In the example of cardiac surgery, the risk of mortality is calculated using risk models based on patient characteristics.(16-18) Adjustment for differences in the preoperative risk (i.e. casemix) allows for comparison of hospitals. If adjustment for differences in casemix is insufficient, the performance indicator does not


adequately reflect the quality of care. In other words, an adequate performance indicator must have attributable validity (are the differences in the performance indicator attributable to differences in quality?).(11;12) If the patient population across hospitals is completely different, adjustment for casemix is not possible and a comparison is inappropriate.(19) For example, it would be meaningless and incorrect to compare a children's hospital with a hospital performing mainly cataract surgery. This example shows why the quality of a hospital cannot be measured using a hospital-wide performance indicator. For most diseases there is no consensus on the adequate performance indicators, or on the appropriate risk factors to adjust for.

Box 1: differences in casemix
A hospital's casemix is described by the characteristics of its patient population. For example, the typical casemix of an academic center consists of complex surgery in relatively ill patients, which is in general different to the casemix of non-academic centers. Differences in casemix affect the comparison between hospital outcomes. Therefore, performance indicators cannot be evaluated without consideration of a hospital's casemix. Risk models (such as the HSMR) can be used to summarize the range of patient characteristics into one overall score.(16-18) Such a risk model calculates an expected rate, which can be compared to the observed outcome rate, for example using the observed:expected ratio (O:E ratio). An O:E ratio of 1 means that the observed mortality is exactly equal to the expected mortality. An O:E ratio above 1 means that the observed mortality is higher than expected and an O:E ratio below 1 means that the observed mortality is lower than expected.
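The O:E ratio described in Box 1 can be illustrated with a minimal sketch. It assumes that a risk model has already produced a predicted mortality risk for each patient and that the expected number of deaths is the sum of those risks; the function name and the example numbers are hypothetical and do not come from the registry.

```python
def oe_ratio(observed_deaths, predicted_risks):
    """Observed:expected mortality ratio for one hospital.

    predicted_risks holds one model-based mortality risk (0 to 1) per patient;
    their sum is taken as the expected number of deaths (an assumption of this sketch).
    """
    expected_deaths = sum(predicted_risks)
    if expected_deaths <= 0:
        raise ValueError("expected number of deaths must be positive")
    return observed_deaths / expected_deaths


# Hypothetical hospital: 200 patients with a predicted risk of 3% each,
# so 6 deaths are expected; 9 deaths were observed.
example_ratio = oe_ratio(9, [0.03] * 200)
print(f"O:E ratio = {example_ratio:.2f}")
```

In this made-up example the ratio is above 1, which in the terms of Box 1 means that observed mortality is higher than expected.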

Administrative or clinical data?
Administrative data, for example collected by hospital administrations, contain only a few patient characteristics, such as age, sex, primary diagnosis and secondary diagnoses. The HSMR is a score that is calculated using risk adjustment models based on such data (Box 2).(4) There is concern that administrative data might not contain sufficient information for adequate risk adjustment. For example, the severity of disease and the physical condition of a patient are usually not recorded in an administrative database. In general, more elaborate clinical information is required for proper risk adjustment. However, a clinical database requires both time and financial resources.

Box 2: HSMR
The Hospital Standardized Mortality Ratio (HSMR) is a relative measure for hospital mortality and is composed of Standardized Mortality Ratios (SMRs) of 50 diagnosis groups.(4) One diagnosis group comprises multiple comparable diseases. For each diagnosis group, the mortality rate is calculated and compared to the expected mortality rate based on the national results, while


accounting for casemix differences. The information used to perform risk adjustment is derived from hospital administration systems. A major concern regarding the HSMR is that administrative data are incomplete and inaccurate. In addition, they may not contain sufficient information to adequately adjust for differences in casemix.

Uncertainty of a performance indicator
Hospital performance may vary due to chance.(20) A hospital will rarely have exactly the same mortality rate in two consecutive years. If this variation due to chance is not accounted for, the differences in performance will incorrectly be attributed to differences in the quality of care.(20) The larger the number of patients included in the analysis of a performance indicator, the smaller the variation due to chance. For this reason, performance indicators are preferably applied to large numbers of patients. If necessary, data from multiple years may be aggregated. A study on the quality of care in the treatment of breast cancer showed that the local recurrence rate is not suitable for use as a performance indicator across hospitals, because of the low incidence.(21) A second point of consideration is the clustering of patients within hospitals. Patients treated in the same hospital are more alike than patients treated in different hospitals. As a result, the true difference in the quality of care between two hospitals is likely to be smaller than what is measured. So-called random effect models are statistical models that account for this form of clustering.(22;23) The uncertainty of a performance indicator (due to clustering within hospitals) should always be considered in the interpretation of differences between hospitals.(14)

Limitations of ranking lists
Ranking lists are an appealing way to report results. One can see at a glance which hospital performs best. A ranking suggests that hospitals actually differ (as they are ranked in different positions) and that the quality of care increases or decreases proportionally to the ranks. In reality, hospitals ranked in different positions will sometimes be comparable with regard to the quality of care. In addition, a ranking list is a relative method of comparison, which does not provide information on the absolute quality of a hospital.
A good alternative to a ranking list is comparison against a common value or benchmark (also called benchmarking), while taking into account the statistical uncertainty of the indicator, as shown in Figure 1.(24)

Positive and negative consequences of performance indicators
Merely measuring outcomes is thought to have a positive effect on the quality of care.(25-28) The evaluation of results is likely to trigger quality improvement, irrespective of public release. For example, in cardiac surgery, the reporting of performance indicators led to a decrease in mortality rates, both after public release and when results were reported


internally.(26;27) Some claim that publication of results leads to market forces that may be beneficial to the quality of care: poor performance could lead to a decrease in revenue if this is known to the public, while top performers might see an increase in patients. Lastly, quality assessment is necessary to prevent poorly performing hospitals from continuing to provide healthcare. However, quality assessment may also have negative consequences. One of these is risk-averse behavior, which means that hospitals may refuse to perform complex surgery or operate on high-risk patients. In addition, hospitals may report lower mortality or a worse patient severity than in reality, as this improves risk-adjusted results. Also, the focus on the performance indicators may result in other, unmeasured aspects being neglected. Finally, an inadequate performance indicator may lead to undue criticism or sanctions, which could have a catastrophic effect on the market position of a hospital or healthcare provider.

Figure 1: Example of a performance indicator: estimates with accompanying confidence intervals
[Plot: centers 1 to 5 against the performance indicator, e.g. mortality (%); legend: lower than benchmark, higher than benchmark.]
The performance indicator of a center is compared to the benchmark (dotted line). The performance indicator is lower than the benchmark (green) or higher than the benchmark (red) when the confidence interval does not cover the benchmark.
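The comparison in Figure 1 can be sketched in a few lines of code. The sketch below assumes a Wilson score interval for a proportion and a 95% confidence level; the text does not state which interval was used, so both are assumptions, and the function names and counts are hypothetical.

```python
import math


def wilson_interval(events, n, z=1.96):
    """Wilson score confidence interval for a proportion (95% when z = 1.96)."""
    p = events / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def classify(events, n, benchmark):
    """Label a center by whether its interval lies below, above or across the benchmark."""
    low, high = wilson_interval(events, n)
    if high < benchmark:
        return "lower than benchmark"
    if low > benchmark:
        return "higher than benchmark"
    return "not distinguishable from benchmark"


# Two hypothetical centers with the same observed mortality (6%) but different volumes,
# compared against a benchmark of 3%.
small_center = classify(3, 50, 0.03)
large_center = classify(60, 1000, 0.03)
print(small_center, "|", large_center)
```

With the same observed rate, only the larger center is flagged, because its interval is narrower. This is the dependence on the number of patients discussed under "Uncertainty of a performance indicator".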

Criticism
Ranking lists, such as those constructed by the Algemeen Dagblad Newspaper and Elsevier Magazine, appear to reduce the complexity of the concept of quality into something simple.(2;3) These lists assess the quality of care in hospitals inadequately, using mainly structure and process indicators. Cardinal outcomes, such as mortality or complications after major surgical interventions, are not considered. Further concerns relate to the


selection of the indicators, the weighting of the individual indicators, the lack of error margins in the comparison across hospitals and the absence of an absolute benchmark. As a result, these ranking lists are not correlated and seem to be measuring something other than the quality of care, as shown in Figure 2.(29) However, perhaps the strongest argument against such ranking lists, and also against the HSMR, is that the performance across a very heterogeneous group of diseases is reduced into one hospital-wide score. A hospital-wide score is uninformative for the quality of care for specific diseases. After all, the relevant performance indicators differ across the spectrum of treatments and diseases. Combining a wide variety of performance indicators will eventually lead to a score that is not useful for either the patient or the healthcare provider.

Figure 2: Correlation between the ranking lists of Algemeen Dagblad Newspaper and Elsevier Magazine

The correlation coefficient between the two scores was r=0.14. A perfect correlation would result in a diagonal line (r=1). From Giard RW. Ned Tijdschr Geneeskd 2006;150(43):2355-8. Used with permission.(29)

Conclusion An adequate performance indicator measures the quality of the delivered care, usually expressed as clinical patient outcomes. If outcomes are not available, the process and structure of care may help describe the quality of care. Considering the wide variety of diseases and treatments, the use of one uniform hospital-wide performance indicator is unattainable. Instead, we should focus on performance indicators for specific diseases. There are many issues to consider in the comparison of hospitals. Carefully selected performance


indicators may be used to monitor trends and to identify underperformance. However, an assessment of the quality of care in hospitals is too complex to allow for hospital ranking lists, let alone to identify the best hospital in The Netherlands.


Reference List

1. Veiligheidsindicatoren ziekenhuizen 2010 t/m 2012. Utrecht: Inspectie voor de Gezondheidszorg; 2010.
2. AD Ziekenhuis Top 100. Algemeen Dagblad www.ad.nl/ziekenhuistop100/
3. De beste ziekenhuizen. Elsevier Magazine http://www.elsevier.nl/web/Weekblad-78/De-besteziekenhuizen-2011.htm
4. Jarman B, Pieter D, van d, V, Kool RB, Aylin P, Bottle A, et al. The hospital standardised mortality ratio: a powerful tool for Dutch hospitals to assess their quality of care? Qual Saf Healthcare 2010 Feb;19(1):9-13.
5. Siregar S, Versteegh MI, van Herwerden LA. Risicogewogen ziekenhuismortaliteit. Ned Tijdschr Geneeskd 2011;155(50):A4103.
6. Advies kwaliteit van zorg: terreinverkenning en prioriteiten voor wetenschappelijk onderzoek kwaliteit van zorg. Den Haag: Raad voor Gezondheidsonderzoek, RGO; 1990.
7. Donabedian A. Explorations in quality assessment and monitoring. The definition of quality and approaches to its assessment. Michigan: Health Administration Press; 1980.
8. Birkmeyer JD, Dimick JB, Birkmeyer NJ. Measuring the quality of surgical care: structure, process, or outcomes? J Am Coll Surg 2004 Apr;198(4):626-32.
9. Birkmeyer JD, Siewers AE, Finlayson EV, Stukel TA, Lucas FL, Batista I, et al. Hospital volume and surgical mortality in the United States. N Engl J Med 2002 Apr 11;346(15):1128-37.
10. Kupfer JM, Bond EU. Patient satisfaction and patient-centered care: necessary but not equal. JAMA 2012 Jul 11;308(2):139-40.
11. Iezzoni LI. Risk adjustment for measuring healthcare outcomes. Ann Arbor, Mich: Health Administration Press; 1994.
12. Shahian DM, Edwards FH, Ferraris VA, Haan CK, Rich JB, Normand SL, et al. Quality measurement in adult cardiac surgery: part 1--Conceptual framework and measure selection. Ann Thorac Surg 2007 Apr;83(4 Suppl):S3-12.
13. O’Brien SM, DeLong ER, Dokholyan RS, Edwards FH, Peterson ED. Exploring the behavior of hospital composite performance indicators: an example from coronary artery bypass surgery. Circulation 2007 Dec 18;116(25):2969-75.
14. Jacobs R, Goddard M, Smith PC. How robust are hospital ranks based on composite performance indicators? Med Care 2005 Dec;43(12):1177-84.
15. O’Brien SM, Shahian DM, DeLong ER, Normand SL, Edwards FH, Ferraris VA, et al. Quality measurement in adult cardiac surgery: part 2--Statistical considerations in composite measure scoring and provider rating. Ann Thorac Surg 2007 Apr;83(4 Suppl):S13-S26.
16. Shahian DM, O’Brien SM, Filardo G, Ferraris VA, Haan CK, Rich JB, et al. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 1--coronary artery bypass grafting surgery. Ann Thorac Surg 2009 Jul;88(1 Suppl):S2-22.
17. Nashef SA, Roques F, Michel P, Gauducheau E, Lemeshow S, Salamon R. European system for cardiac operative risk evaluation (EuroSCORE). Eur J Cardiothorac Surg 1999 Jul;16(1):9-13.
18. Parsonnet V, Dean D, Bernstein AD. A method of uniform stratification of risk for evaluating the results of surgery in acquired adult heart disease. Circulation 1989 Jun;79(6 Pt 2):I3-12.
19. Shahian DM, Normand SL. Comparison of “risk-adjusted” hospital outcomes. Circulation 2008 Apr 15;117(15):1955-63.
20. van Dishoeck AM, Looman CM, van der Wilden-van Lier EC, Mackenbach JP, Steyerberg EW. Prestatie-indicatoren voor ziekenhuizen. De invloed van onzekerheid. Ned Tijdschr Geneeskd 2009 Apr 25;153(17):804-11.
21. van der Heiden-van der Loo, Ho VK, Damhuis RA, Siesling S, Menke MB, Peeters PH, et al. Weinig lokaal recidieven na mammachirurgie: goede kwaliteit van de Nederlandse borstkankerzorg. Ned Tijdschr Geneeskd 2010;154:A1984.
22. Lingsma HF, Steyerberg EW, Eijkemans MJ, Dippel DW, Scholte Op Reimer WJ, Van Houwelingen HC. Comparing and ranking hospitals based on outcome: results from The Netherlands Stroke Survey. QJM 2010 Feb;103(2):99-108.
23. Austin PC, Tu JV, Alter DA. Comparing hierarchical modeling with traditional logistic regression analysis among patients hospitalized with acute myocardial infarction: should we be analyzing cardiovascular outcomes data differently? Am Heart J 2003 Jan;145(1):27-35.
24. Siregar S, Groenwold RH, Jansen EK, Bots ML, van der GY, van Herwerden LA. Limitations of ranking lists based on cardiac surgery mortality rates. Circ Cardiovasc Qual Outcomes 2012 May 1;5(3):403-9.
25. Bradley EH, Holmboe ES, Mattera JA, Roumanis SA, Radford MJ, Krumholz HM. Data feedback efforts in quality improvement: lessons learned from US hospitals. Qual Saf Healthcare 2004 Feb;13(1):26-31.
26. Hannan EL, Sarrazin MS, Doran DR, Rosenthal GE. Provider profiling and quality improvement efforts in coronary artery bypass graft surgery: the effect on short-term mortality among Medicare beneficiaries. Med Care 2003 Oct;41(10):1164-72.
27. O’Connor GT, Plume SK, Olmstead EM, Morton JR, Maloney CT, Nugent WC, et al. A regional intervention to improve the hospital mortality associated with coronary artery bypass graft surgery. The Northern New England Cardiovascular Disease Study Group. JAMA 1996 Mar 20;275(11):841-6.
28. Scales DC, Dainty K, Hales B, Pinto R, Fowler RA, Adhikari NK, et al. A multifaceted intervention for quality improvement in a network of intensive care units: a cluster randomized trial. JAMA 2011 Jan 26;305(4):363-72.
29. Giard RW. Ziekenhuizentop-100: wisselende ranglijsten, wisselende reputaties. Ned Tijdschr Geneeskd 2006 Oct 28;150(43):2355-8.


5 Statistical methods to monitor risk factors in a clinical database: example of a national cardiac surgery registry

Siregar S, Roes KCB, van Straten AHM, Bots ML, van der Graaf Y, van Herwerden LA, Groenwold RHH Circ Cardiovasc Qual Outcomes. 2013 Jan 1;6(1):110-8


Chapter 5

Abstract

Background
Comparison of outcomes requires adequate adjustment for differences in patient risk and the type of intervention performed. Both unintentional and intentional misclassification (also called gaming) of risk factors might lead to incorrect benchmark results. Therefore, misclassification of risk factors should be detected. We investigated the use of Statistical Process Control (SPC) techniques to monitor the frequency of risk factors in a clinical database.

Methods and results
A national population-based study was performed using simulation and statistical process control. All patients undergoing cardiac surgery between 1 January 2007 and 31 December 2009 in all 16 cardio-thoracic surgery centers in The Netherlands were included. Data on 46,883 consecutive cardiac surgery interventions were extracted. The expected risk factor frequencies were based on 2007 and 2008 data. Monthly frequency rates of 18 risk factors in 2009 were monitored using Shewhart control charts, exponentially weighted moving average (EWMA) and cumulative sum (CUSUM) charts. Upcoding (i.e. gaming) in random patients was simulated and detected in 100% of the simulations. Subtle forms of gaming, involving specifically high-risk patients, were more difficult to identify (detection rate of 44%). However, the accompanying rise in mean logistic EuroSCORE was detected in all simulations.

Conclusions
SPC in the form of Shewhart control charts, EWMA and CUSUM charts provides a means to monitor changes in risk factor frequencies in a clinical database. Surveillance of the overall expected risk in addition to the separate risk factors ensures a high sensitivity to detect gaming. The use of SPC for risk factor surveillance is recommended.


Introduction
Outcomes evaluation is an essential part of maintaining and improving the quality of care.(1;2) Often, mortality rates or other outcomes are collected and compared across hospitals or against a common benchmark. Fair comparison of results requires adequate adjustment for differences in patient risk and the type of intervention performed. Together, these are called “casemix” and determine the “risk profile” of a hospital. For this purpose so-called risk adjustment methods are used. By correcting for patient characteristics and variables describing the intervention, these methods “level the playing field” and enable the comparison of results. Risk adjustment models thus constitute a fundamental element in outcomes evaluation.(2) However, much concern has been expressed about the accuracy of coding of risk factors included in such models.(3;4) After all, errors in the coding of risk factors could invalidate the comparison of outcomes. In most databases different forms of checks have been implemented to reduce erroneous coding. Inter-observer variability, ambiguous risk factor definitions and random errors could cause unintentional undercoding and upcoding. In addition, intentional misclassification of risk factors (also called “gaming”) aims to increase the apparent patient severity and thereby “improve” the risk-adjusted outcome. The possibility of gaming has been a concern ever since outcomes have been evaluated.(3) To investigate its occurrence, audits might be performed that compare reported data to patient records. Unfortunately, such audits are expensive, time-consuming and laborious. In the manufacturing industry methods have been developed to monitor variables that reflect the manufacturing process of a product, such as the length of a bolt or the content of a can of soft drink.(5;6) This encompasses a wide range of tools that are collectively referred to as Statistical Process Control (SPC). In healthcare some of these techniques have already been adopted for the monitoring of outcomes, e.g. CUSUM techniques in cardiac surgery.(7-10) However, to our knowledge, we are the first to describe the use of SPC techniques to monitor risk factors in clinical databases. The aim of this study was to illustrate and evaluate the use of different SPC methods for the purpose of monitoring variables in clinical databases, using empirical data on cardiac surgery in The Netherlands. Cardiac surgery has a longstanding history of outcomes evaluation, with advanced risk models that are widely used for this purpose.(11-13)


Methods

Data
Data were obtained from the adult national cardiac surgery database of the Netherlands Association for Cardio-Thoracic Surgery. This database has national coverage, with participation of all 16 centers performing cardiac surgery in The Netherlands. In total, 46,883 consecutive cardiac surgeries were included, performed between 1 January 2007 and 31 December 2009. The anonymized dataset consisted of risk factors that were defined according to the EuroSCORE definitions(11): age, gender, serum creatinine > 200 micromole/l, extracardiac arteriopathy, pulmonary disease, neurological dysfunction, previous cardiac surgery, recent myocardial infarction, left ventricular ejection fraction (LVEF) 30-50%, LVEF <30%, systolic pulmonary pressure > 60 mmHg, active endocarditis, unstable angina, emergency operation, critical preoperative state, ventricular septal rupture, other than isolated coronary surgery and thoracic aortic surgery. The baseline characteristics of our study population are described in Table 1. The majority of interventions comprised CABG, either isolated or with concomitant surgery.

Statistical process control (SPC) methods
The proportion of patients with a specific risk factor will be referred to as the “frequency” of that risk factor. All frequencies are calculated per month. Data from 2007 and 2008 (n=30,971) were used to calculate the reference frequency of the risk variables. This will be referred to as the “expected frequency” in all further analyses. Subsequently, analyses were performed and plots were constructed on data from 2009 (n=15,912). We chose to apply the three most commonly used types of SPC charts: the Shewhart control chart, the exponentially weighted moving average (EWMA) chart and the cumulative sum (CUSUM) chart.

Shewhart control charts
In a Shewhart control chart the observed frequency is plotted for consecutive time intervals. Based on pre-existing knowledge or prior observations, the expected value and the accepted range around it are marked as boundaries. The expected value was set at the expected frequency. The accepted range around it was defined in terms of the standard deviation (s.d.) of the monthly frequency rates in 2007 and 2008, weighted by sample size for each month in 2009. We set the warning threshold at 2 s.d. and the alarm threshold at 3 s.d. from the expected frequency, which corresponds approximately to p-values of 0.05 and 0.003, respectively.(5;6) In a second analysis we applied extra rules to these limits: the Western Electric Rules.(5) This means an alarm is signalled when: 1) one or more points are outside the alarm limits of 3 s.d., 2) two of three consecutive points are between the 2 and 3 s.d. limits, 3) four of five consecutive points are beyond the 1 s.d. limit, 4) eight consecutive points are above or below the expected frequency.
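The four rules listed above can be sketched as follows. This is an illustration under simplifying assumptions, not the code used in the study: a single s.d. is used for all months (the study weighted the s.d. by monthly sample size), rules 2 to 4 are required to hold on one side of the expected frequency, and rule 2 counts any point beyond 2 s.d. All names and example values are hypothetical.

```python
import math


def two_sided_p(k):
    """Two-sided normal tail probability beyond k standard deviations."""
    return math.erfc(k / math.sqrt(2))


def western_electric_alarms(observed, expected, sd):
    """Return the indices of the months at which at least one of the four rules fires."""
    z = [(x - expected) / sd for x in observed]
    alarms = []
    for i in range(len(z)):
        fired = abs(z[i]) > 3  # rule 1: a point outside the 3 s.d. alarm limit
        for sign in (1, -1):   # check the upper side, then the lower side
            last3 = [sign * v for v in z[max(0, i - 2): i + 1]]
            last5 = [sign * v for v in z[max(0, i - 4): i + 1]]
            last8 = [sign * v for v in z[max(0, i - 7): i + 1]]
            # rule 2: two of three consecutive points beyond 2 s.d.
            fired |= len(last3) == 3 and sum(v > 2 for v in last3) >= 2
            # rule 3: four of five consecutive points beyond 1 s.d.
            fired |= len(last5) == 5 and sum(v > 1 for v in last5) >= 4
            # rule 4: eight consecutive points on the same side of the expected frequency
            fired |= len(last8) == 8 and all(v > 0 for v in last8)
        if fired:
            alarms.append(i)
    return alarms


# Hypothetical monthly frequencies around an expected frequency of 30% (s.d. 2%).
stable_months = [0.30, 0.31, 0.29, 0.30]
drifting_months = [0.305] * 8
print(western_electric_alarms(stable_months, 0.30, 0.02))
print(western_electric_alarms(drifting_months, 0.30, 0.02))
```

The helper `two_sided_p` gives the tail probabilities behind the 2 s.d. and 3 s.d. thresholds under a normal approximation. The second example shows why the extra rules matter: a small but persistent shift never crosses the 3 s.d. limit, yet it is flagged by rule 4.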



Table 1: Baseline characteristics of study population (N = 46883)

                                              N (%)
Risk factors
  Age (continuous)                            65.9 (± 11.2)
  Female                                      14049 (30)
  Serum creatinine >200 μmol/l                915 (2.0)
  Extracardiac arteriopathy                   5723 (12.2)
  Pulmonary disease                           5312 (11.3)
  Neurological dysfunction                    1634 (3.5)
  Previous cardiac surgery                    3438 (7.3)
  Recent myocardial infarct                   5757 (12.3)
  LVEF 30–50%                                 9098 (19.4)
  LVEF <30%                                   2569 (5.5)
  Systolic pulmonary pressure >60 mmHg        1514 (3.2)
  Active endocarditis                         668 (1.4)
  Unstable angina                             2889 (6.2)
  Emergency operation                         3056 (6.5)
  Critical preoperative state                 2206 (4.7)
  Ventricular septal rupture                  100 (0.2)
  Other than isolated coronary surgery        21333 (45.5)
  Thoracic aortic surgery                     2530 (5.4)
Interventions
  CABG                                        32956 (70.3)
  Isolated valve                              17973 (38.3)
  CABG and valve                              6722 (14.3)
Logistic EuroSCORE                            mean 7.2 (± 10.0), median 3.9
Mortality                                     1447 (3.1)

For dichotomous variables the number of patients and percentage of total population is reported; for continuous variables the mean and standard deviation. CABG indicates coronary artery bypass graft; EuroSCORE, European system for cardiac operative risk evaluation; and LVEF, left ventricular ejection fraction.

Exponentially weighted moving average charts
Mean charts depict the mean of reported frequency measurements within the designated period in time. The exponentially weighted moving average (EWMA) is a weighted mean of the current and all past frequencies. The weights are based on time: current and recent frequency rates have an exponentially larger weight than older ones. The appropriate boundaries in the EWMA chart are based on the average run length (ARL) properties of the chart. The ARL0 is the average time it takes until the chart crosses a boundary when the mean of the measurements has not changed (a false alarm, equivalent to a type I error). The ARL differs for each possible change in frequency and can be estimated using simulation techniques. We set the alarm limit at 2.86 s.d. from the expected frequency. In our plots this yielded the same ARL0 properties as a Shewhart control chart with 3 s.d. limits.
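The EWMA described above can be sketched as follows. The recursion and the time-varying control limits are the standard textbook form; the smoothing weight `lam` is not reported in the text, so the value 0.2 is purely an assumption, as are the function name and the example data. The limit multiplier of 2.86 is taken from the text.

```python
import math


def ewma_chart(observed, expected, sd, lam=0.2, limit=2.86):
    """Return one (ewma, lower, upper, signal) tuple per month.

    lam is the weight given to the current month (assumed value);
    limit is the distance of the alarm boundary in s.d. units.
    """
    points = []
    ewma = expected  # the chart starts at the expected frequency
    for t, x in enumerate(observed, start=1):
        ewma = lam * x + (1 - lam) * ewma
        # standard deviation of the EWMA statistic after t months
        width = limit * sd * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        points.append((ewma, expected - width, expected + width, abs(ewma - expected) > width))
    return points


# Hypothetical example: expected frequency 30%, s.d. 2%.
stable_chart = ewma_chart([0.30] * 12, 0.30, 0.02)
shifted_chart = ewma_chart([0.36] * 12, 0.30, 0.02)
print(any(signal for *_, signal in stable_chart))
print(any(signal for *_, signal in shifted_chart))
```

Because older months receive exponentially smaller weights, a sustained shift accumulates in the moving average, whereas a single unusual month is damped.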



CUSUM charts
The cumulative sum (CUSUM) analysis uses the cumulative difference between the observed frequency and the expected frequency of a risk factor. The expected frequency is the reference value, which in our case was the frequency during the years 2007 and 2008. This value is subtracted from every observed frequency for consecutive time intervals. The cumulative sum of all these differences, also called the CUSUM score, is what is plotted in a CUSUM chart. A tabular CUSUM chart is designed such that it does not deviate from the horizontal axis when there is no difference between observed and expected frequency. Before constructing a tabular CUSUM chart, an arbitrary maximum tolerated value should be specified. It is common to use 1 s.d. above the expected frequency as the limit value. The curve will rise when the observed frequency is higher than the limit value. A progression above the upper limit means that the mean frequency of the risk factor has changed by more than 1 s.d. When the CUSUM drops below zero it restarts at zero again to maintain sensitivity. The same was performed to detect a decrease in frequency. This resulted in a chart with a CUSUM to detect an increase in the upper half and one to detect a decrease in the lower half.(5;6) The appropriate boundaries for detection of a change in the tabular CUSUM chart, denoted as h, are based on the ARL properties. Just as for the EWMA chart, the ARL of the tabular CUSUM chart can be estimated using simulation techniques. We used h=4.77, because in general this value will give approximately the same ARL0 properties as a Shewhart control chart with 3 s.d. limits.(5)

Example of graphs
Of the 288 graphs plotted using each SPC method we use the chart of one risk factor in one hospital to illustrate and further explain the applied methods. Figure 1 shows the Shewhart, EWMA and CUSUM charts of the variable female gender in hospital A. The mean frequency in 2007 and 2008 was 29.6% (standard error 0.9%). These measures were then used to calculate the expected frequency and the accepted range of 3 s.d. around it for 2009. As can be seen in the graph, the frequency in each month of 2009 falls between the limits. Moreover, no frequency rates crossed the warning limit of 2 s.d. The EWMA and CUSUM show similar results: no limits were crossed. Figure 2 shows the same charts for the logistic EuroSCORE in hospital B. The mean logistic EuroSCORE in 2007 and 2008 was 8.3% (standard error 0.15%). In the last month of 2009 the logistic EuroSCORE was higher than expected (Shewhart), resulting in an alarm for a significantly increased mean as well (EWMA and CUSUM).
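The tabular CUSUM from the section "CUSUM charts" can be sketched as a two-sided recursion on standardized monthly frequencies. The reference offset `k` is set to 1 s.d., following the limit value described in the text, and is exposed as a parameter because other choices are possible; `h` is the decision boundary of 4.77 from the text. The function name and data are hypothetical, and the downward CUSUM is returned as a positive magnitude rather than plotted below the axis.

```python
def tabular_cusum(observed, expected, sd, k=1.0, h=4.77):
    """Return one (upper, lower, signal) tuple per month.

    upper accumulates evidence for an increase, lower for a decrease;
    both restart at zero whenever they would become negative.
    """
    upper, lower = 0.0, 0.0
    rows = []
    for x in observed:
        z = (x - expected) / sd              # standardized monthly frequency
        upper = max(0.0, upper + z - k)      # rises only when z exceeds k
        lower = max(0.0, lower - z - k)      # rises only when z is below -k
        rows.append((upper, lower, upper > h or lower > h))
    return rows


# Hypothetical example: expected frequency 30%, s.d. 2%.
stable_cusum = tabular_cusum([0.30] * 12, 0.30, 0.02)
shifted_cusum = tabular_cusum([0.36] * 12, 0.30, 0.02)
first_signal = next(i for i, row in enumerate(shifted_cusum) if row[2])
print(first_signal)
```

With no change in frequency both sums stay at zero, so the chart stays on the horizontal axis. A sustained shift of 3 s.d. adds about 2 to the upper sum each month in this example and crosses `h` in the third month.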


Figure 1: Shewhart control chart, exponentially weighted moving average (EWMA) chart and CUSUM chart for the variable female gender during 2009 in center A

The Shewhart control, exponentially weighted moving average (EWMA) and CUSUM charts indicate that the frequency is stable over 2009. Panel A: Shewhart control chart showing the frequency each month, the expected frequency and the warning (2 s.d.) and alarm (3 s.d.) boundaries as dotted lines. The chart does not signal during the whole year, meaning no observed monthly frequency was significantly higher than expected. Panel B: EWMA chart showing the average frequency of each month and all previous months, using exponentially smaller weights for the past. The chart does not cross the upper or lower boundary, meaning no observed frequency was significantly higher or lower than expected. Crosses designate the monthly measures, dots indicate the moving average. Panel C: Tabular CUSUM chart designed to detect a 1 s.d. difference with the expected frequency. The chart does not cross the upper or lower boundary, meaning that at no point during the year the mean frequency was significantly higher than expected.


Figure 2: Shewhart control chart, exponentially weighted moving average (EWMA) chart and CUSUM chart for the mean logistic EuroSCORE during 2009 in center B

The charts cross the upper boundaries at the 12th month, meaning that the logistic EuroSCORE is higher than expected in that month (Panel A) and that the mean has changed significantly by the end of 2009 (Panel B and C). Panel A: Shewhart control chart. Panel B: EWMA chart. Panel C: Tabular CUSUM chart.

Simulation of “gaming”
To study the sensitivity of the SPC methods to “gaming” of risk factors, we simulated upcoding of patients in our database. For this analysis we assumed there was no misclassification in the current database (i.e. reference). We upcoded selected variables in a number of patients in the reference database. The number of upcoded patients was based on the results of our previous study.(14) In that study we simulated “gaming” and investigated how much upcoding is needed in one center to affect the results of a benchmarking procedure, that is, to change the outlier status of the specific center. We performed this analysis in four centers: the two high-mortality outliers were upcoded until they became average centers and two average centers were upcoded until they became low-mortality outliers. We simulated misclassification in one center, while the risk factors in all other centers remained unchanged. Variables were chosen based on the clinical probability of misclassification, the weight of the variable in the EuroSCORE model and the frequency in the database. The simulated scenarios are described along with the results in Figures 3 and 4. Upcoding was performed in random patients (non-differential misclassification) and in patients with the highest risk (differential misclassification). In the latter, upcoding was started in patients


with the highest EuroSCORE, until the desired frequency of the risk factor was reached.(14) After “gaming” was simulated in the database, the Shewhart, EWMA and CUSUM charts were constructed again. The expected frequencies and s.d. were based on the original database (2007 to 2009). It was counted how many times the methods could detect “gaming”, i.e. in how many simulations the SPC charts signalled an increase in frequency. The detection rate was averaged over the centers and risk factors.

Figure 3: Detection of “gaming” by the Shewhart control chart, EWMA and CUSUM charts when random patients are upcoded
[Chart: detection rate of gaming (%) for centres A to D, grouped as high mortality to average and average to low mortality, for three upcoding scenarios (1: poor LVF; 2: moderate LVF and poor LVF; 3: moderate LVF, poor LVF, recent MI and arteriopathy), with monitoring of the separate risk factors and of the EuroSCORE.]
One, two or four risk factors were upcoded in random patients. This was simulated 1000 times. Green represents the proportion of simulations in which upcoding was detected; red the proportion in which it was not detected. The EWMA and CUSUM charts signalled each time upcoding was performed. When four risk factors were upcoded simultaneously, the increase in risk factor frequency was smaller and could not always be detected by the Shewhart method. Monitoring of the EuroSCORE as an overall score led to a 100% detection of gaming by every method. LVF: left ventricular function; MI: myocardial infarction.


Figure 4: Detection of “gaming” by the Shewhart control chart, EWMA and CUSUM charts when high-risk patients were upcoded
[Figure: detection rate of gaming (%) per method for Centres A to D, grouped as “High mortality → average” and “Average → low mortality”. Rows show monitoring of one risk factor (poor LVF), of two risk factors (moderate LVF and poor LVF), of the combination of moderate LVF, poor LVF, recent MI and arteriopathy, and of the EuroSCORE. Bar labels give the risk factor frequency before and after upcoding and the resulting fold increase.]


One, two or four risk factors were upcoded in patients with the highest logistic EuroSCORE. Green represents the proportion of simulations in which upcoding was detected; red the proportion in which it was not detected. The increase in the frequency of the individual risk factors was limited, which led to low detection rates when monitored separately. However, the resulting rise in mean EuroSCORE was always detected when all methods were used. LVF: left ventricular function; MI: myocardial infarction.
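
The two upcoding schemes compared in Figures 3 and 4 can be illustrated with a minimal Python sketch. It is not the simulation code used in this study: the function name `upcode`, the field names `poor_lvf` and `euroscore`, and the fixed random seed are assumptions made for illustration, and the sketch does not recalculate the EuroSCORE after upcoding.

```python
import random

def upcode(patients, factor, target_freq, differential=False, seed=0):
    """Return a copy of `patients` with `factor` upcoded up to `target_freq`.

    patients     : list of dicts holding a 0/1 value for `factor` and a
                   numeric 'euroscore' (hypothetical field names)
    differential : False -> upcode randomly chosen patients
                   True  -> upcode patients with the highest EuroSCORE first
    """
    upcoded = [dict(p) for p in patients]          # leave the reference data intact
    current = sum(p[factor] for p in upcoded)
    needed = max(round(target_freq * len(upcoded)) - current, 0)
    # Only patients without the risk factor can be upcoded.
    candidates = [i for i, p in enumerate(upcoded) if p[factor] == 0]
    if differential:
        candidates.sort(key=lambda i: upcoded[i]["euroscore"], reverse=True)
    else:
        random.Random(seed).shuffle(candidates)
    for i in candidates[:needed]:
        upcoded[i][factor] = 1
    return upcoded
```

Calling `upcode(reference, "poor_lvf", 0.20, differential=True)` on a hypothetical reference list raises the frequency of the risk factor to 20% by recoding the highest-risk patients first, which is the scheme that proved hardest to detect in the separate risk factors.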

Analysis
The frequency for each month was calculated as the mean for continuous variables and the proportion for dichotomous variables. The s.d. of binomial variables was calculated using the Score method.(15) EWMA and CUSUM charts were constructed on the log-odds scale. For binomial variables the s.d. used for the EWMA and CUSUM charts was derived from the upper limit of 1 s.d. (z=1) calculated using the Score method.(15) Two iterations were performed to exclude outliers in the frequencies measured in 2007 and 2008, based on the 3 s.d. limits. The analysis was then repeated to calculate the final expected frequency and accompanying limits. This process was performed to minimize the effect of outliers on the final expected value and to ensure the expected value was based on an in-control process. Charts were constructed for all 16 hospitals and all 18 variables.
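
As an illustration of the steps above, the following Python sketch computes Score (Wilson) limits for a proportion and derives an in-control expected frequency by repeatedly dropping months outside the 3 s.d. limits. It is a sketch under assumptions rather than the analysis code of this study: the function names are hypothetical, and centring the limits of each month on the pooled proportion is one plausible reading of the procedure.

```python
import math

def wilson_interval(successes, n, z):
    """Score (Wilson) interval for a binomial proportion at z standard deviations."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def in_control_baseline(events, totals, iterations=2, z=3.0):
    """Expected proportion after dropping months outside the z s.d. limits.

    events, totals : monthly counts of patients with the risk factor and of
                     all patients in the baseline period
    """
    months = list(zip(events, totals))
    for _ in range(iterations):
        expected = sum(e for e, _ in months) / sum(t for _, t in months)
        kept = []
        for e, t in months:
            # Limits for a month of size t around the pooled expected proportion.
            low, high = wilson_interval(expected * t, t, z)
            if low <= e / t <= high:
                kept.append((e, t))
        months = kept
    # Final expected frequency, based on the months considered in control.
    return sum(e for e, _ in months) / sum(t for _, t in months)
```

With eleven months at 10 of 100 patients and one month at 40 of 100, the outlying month is excluded in the first iteration and the expected frequency settles at 10%.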


Simulations of upcoding in random patients were repeated 1000 times to reduce simulation error. Differential misclassification was simulated once. All analyses were performed in R version 2.10.(16) Simulation codes are available on request.

Results
Detection of an increased or decreased frequency
During 2009 there were 87 alarms for an increased or decreased frequency of a risk factor (54 from the Shewhart control chart, 73 from the EWMA chart and 62 from the CUSUM chart), which is 2.5% of all 3456 reported monthly frequencies. Of these, 18 alarms referred to an increased frequency, with a range of 0 to 5 per month. Table 2 shows the alarms for an increased frequency sorted by method. During the whole year, the CUSUM chart most frequently detected an increased frequency (15 alarms), followed by the EWMA chart (14 alarms) and the Shewhart control chart (1 alarm). Most of the alarms were signalled by two or more methods. An increase in frequency was signalled on average 8.3 months after the beginning of the year by the EWMA, whereas the CUSUM took 9.1 months. This suggests that the EWMA is slightly faster in the detection of a deviant frequency. Addition of the Western Electric rules to the Shewhart chart more than doubled the number of alarms fired by this method, from 54 to 114. The rule detecting 8 consecutive points above or below the expected frequency caused the most extra alarms (45 alarms).

Detection of “gaming” in simulated databases
When “gaming” is performed, the results are very different from the stable situation shown in the example. Figure 3 shows the sensitivity of each method to gaming of risk factors. When gaming in random patients was restricted to one or two risk factors, extensive misclassification was required (2 to 13-fold increase in risk factor frequency). This was detected by all SPC methods. However, when gaming was performed in four risk factors concurrently, less extensive upcoding (a 1.8 to 2.7-fold increase in frequency) was required. This could not always be detected by the Shewhart control chart. The CUSUM and EWMA chart maintained a 100% detection rate.
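
The run rule responsible for most of the extra Shewhart alarms (8 consecutive points on the same side of the expected frequency) can be sketched in Python as follows. The function name and arguments are hypothetical, and a point lying exactly on the expected frequency is assumed to interrupt the run.

```python
def run_rule_alarms(values, expected, run_length=8):
    """Indices at which `run_length` consecutive values lie on one side of `expected`."""
    alarms = []
    run, side = 0, 0
    for i, v in enumerate(values):
        s = (v > expected) - (v < expected)   # +1 above, -1 below, 0 on the line
        if s != 0 and s == side:
            run += 1                          # the run continues on the same side
        else:
            run, side = (1, s) if s != 0 else (0, 0)
        if run >= run_length:
            alarms.append(i)
    return alarms
```

Because the rule fires on small but persistent deviations, it adds sensitivity to the Shewhart chart at the cost of more alarms.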


Table 2: All alarms for an increased frequency fired by the Shewhart, EWMA and CUSUM charts

                              Shewhart               EWMA                   CUSUM                  Any method
Variable                      Alarms (%)   Time      Alarms (%)   Time      Alarms (%)   Time      Alarms (% of total)
Age                           0            -         2 (1.0%)     12.0      2 (1.0%)     12.0      2 (1.0%)
Gender                        0            -         0            -         0            -         0
Previous cardiac surgery      0            -         2 (1.0%)     6.0       2 (1.0%)     6.0       2 (1.0%)
Pulmonary disease             0            -         2 (1.0%)     9.5       2 (1.0%)     9.5       3 (1.6%)
Extracardiac arteriopathy     0            -         0            -         1 (0.5%)     11.0      1 (0.5%)
Neurological dysfunction      0            -         0            -         0            -         0
Serum creatinine              0            -         0            -         0            -         0
Active endocarditis           0            -         0            -         0            -         0
Critical preoperative state   0            -         0            -         0            -         0
Unstable angina               0            -         0            -         0            -         0
LVEF 30-50%                   0            -         0            -         1 (0.5%)     7.0       1 (0.5%)
LVEF <30%                     1 (0.5%)     10        1 (0.5%)     10.0      1 (0.5%)     11.0      1 (0.5%)
Recent MI                     0            -         2 (1.0%)     8.5       2 (1.0%)     8.0       2 (1.0%)
Pulmonary hypertension        0            -         0            -         0            -         0
Emergency operation           0            -         1 (0.5%)     9.0       0            -         1 (0.5%)
Other than isolated CABG      0            -         4 (2.1%)     6.2       4 (2.1%)     9.0       5 (2.6%)
Thoracic aorta surgery        0            -         0            -         0            -         0
Ventricular septal rupture    0            -         0            -         0            -         0
Overall                       1 (0.03%)    10        14 (0.41%)   8.3       15 (0.43%)   9.1       18 (0.52%)

Time: mean time to alarm (months).

The number of alarms signalled by a method is shown for each variable. Each risk factor was plotted 192 times: 12 monthly frequency rates for each of the 16 centers in 2009. The variables are risk factors for mortality after cardiac surgery according to the EuroSCORE.(11) CABG: coronary artery bypass graft; EuroSCORE: European System for Cardiac Operative Risk Evaluation; LVEF: left ventricular ejection fraction; MI: myocardial infarction.

Upcoding in high-risk patients was more difficult to detect. This specific way of gaming leads to an efficient rise in the mean logistic EuroSCORE, with only a limited increase in the frequency of risk factors. As can be seen in Figure 4, sensitivity was 75% when gaming was performed in one risk factor. Again, when four risk factors were upcoded concurrently, the increase in risk factor frequency was limited. This is due to the fact that addition of an extra risk factor has a larger effect on the expected risk in high-risk patients than in those with a low risk. The small increase in the separate risk factors was difficult to identify (detection rate 44%). However, the SPC methods did detect a clear rise in the mean logistic EuroSCORE in all simulated scenarios. This is illustrated in Figure 5. The alarms in the EWMA and CUSUM charts at the end of the first year indicate that the mean logistic EuroSCORE has significantly increased.



Figure 5: Shewhart control chart, exponentially weighted moving average (EWMA) chart and the CUSUM chart for the logistic EuroSCORE during 2009 in center C when differential “gaming” was introduced

Panel A: Shewhart control chart showing the frequency each month, the expected frequency and the warning and alarm boundaries (dotted lines). The alarm boundary is crossed at the 11th month, meaning the logistic EuroSCORE in that month was significantly higher than expected and the gaming was detected. Panel B: EWMA chart showing the average frequency of each month and all previous months, using exponentially smaller weights for previous months. Crosses designate the monthly measures, dots indicate the moving average. From the 11th month onwards the curve clearly shows that the mean logistic EuroSCORE is higher than expected. The simulated “gaming” of risk factors was identified. Panel C: Tabular CUSUM chart designed to detect a 1 s.d. difference with the expected frequency. The upper line rises and crosses the upper limit at the 11th month, indicating that the logistic EuroSCORE has increased significantly. The simulated “gaming” of risk factors was identified.

Discussion
Principal findings
This paper demonstrates the use of three statistical process control tools to monitor risk variables in a cardiac surgery database: the Shewhart, EWMA and CUSUM chart. These methods of graphical display of variables provide a means to follow fluctuations in the reported frequencies. Upper and lower limits of the accepted range can be based on preceding years. To assess the sensitivity of the monitoring methods to “gaming”, we simulated upcoding of risk factors. The results of the simulations show that these SPC methods are capable of detecting all forms of gaming of risk factors. Although upcoding


in high-risk patients was more difficult to identify in the separate risk factors, the evident increase in the mean logistic EuroSCORE was clearly demonstrated by the SPC methods.

The importance of surveillance
Evaluation of outcomes constitutes a fundamental element of quality maintenance and improvement in healthcare.(1;2) It is therefore not surprising that the focus of monitoring lies on the outcome measures. SPC methods such as the CUSUM have been applied many times in different fields of healthcare in order to monitor mortality or another outcome measure.(7-10;17-22) However, for most interventions the outcome measure in itself is not sufficient to enable evaluation of results and risk adjustment using risk factors is required.(2) This means that changes in the outcome as well as in patient severity (i.e. the risk factors) influence the benchmarking results. To improve apparent clinical performance, risk factors might be intentionally upcoded to exaggerate patient severity. This phenomenon is also called “gaming”.(3;23;24) Audits that are performed to check data accuracy usually verify only a small part of the data. Moreover, they are expensive, time consuming and laborious. Therefore, much could be gained with methods that allow a central, yet strict surveillance of risk factors in large databases. To our knowledge, we are the first to describe the use of SPC techniques to monitor risk factors in clinical databases.

Using a monitoring system
Changes in risk factors could reflect three possible mechanisms: 1) actual trends and changes in a risk profile, 2) coding variability and 3) invalidity of data. With regard to the first point, a new treatment option or a new indication for a treatment could for instance affect the patient risk profile of a center. For example, the increasing transcatheter implantation of heart valves is likely to have increased the risk profile of the performing centers.
In addition, a change of frequency might also be caused by chance, possible seasonal fluctuations of risk factors, summer recess and so on. Coding variability between hospitals can be caused by different practices in risk factor detection (e.g. routine or targeted pulmonary hypertension testing), differences in devices (e.g. interlaboratory variability in the measurement of creatinine), interobserver variability (in variables such as neurological dysfunction) and differences in standard care (e.g. intravenous nitroglycerine for angina, resulting in unstable angina according to the EuroSCORE definition). Importantly, however, changes in risk factor frequency could reflect invalidity of data. Causes include software errors, unintentional erroneous coding and gaming. In order to identify the actual cause of the increased or decreased frequency, the first step is to signal the change. The monitoring of variables in any database is therefore crucial to guard the accuracy of data. Whenever an increased frequency is identified, the underlying cause remains to be investigated and the situation has to be explored beyond statistics.


Advantages and disadvantages of the three methods
When the theories behind the methods are appreciated, it can be reasoned what added value each method has in the monitoring of variables. The Shewhart control chart has the advantage of ease of application and interpretation. Every month is considered as a separate measurement, which is tested against the values of the boundaries. The chart is capable of identifying an isolated odd measure immediately, whereas this is likely to be averaged out by the CUSUM and EWMA method. The disadvantages include the resulting multiple testing (every month the frequency is tested) and the fact that subtle changes are not likely to be detected, because measurements are not cumulated in any way. The CUSUM chart, on the other hand, does take into account all previous measurements. This allows identification of subtle changes that occur over a longer period. Studies using CUSUM techniques to monitor outcomes have shown that the method can detect increased mortality rates earlier than standard statistical techniques.(25) Although all three methods have the advantage of continuous monitoring (i.e. increases in frequency are detected during the year instead of at the end of an arbitrary time frame), the CUSUM takes into account the fact that data is accumulated over time and multiple testing is avoided.(26;27) EWMA bears some resemblance to the CUSUM method. The larger weighting of the frequency in the most recent month causes the chart to remain sensitive to changes, irrespective of the number of past measurements. However, this also makes the method less sensitive to subtle but consistent changes compared with the CUSUM. Taking these characteristics into account, it is advisable to use either of the two methods in addition to a Shewhart control chart.
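
The different behaviour of the three charts can be made concrete with a minimal Python sketch that operates on standardized monthly values, z = (observed - expected) / s.d. This is a textbook-style illustration and not the code used in this study. The EWMA weight `lam=0.2`, the limit width of 3 s.d. and the CUSUM decision interval `h=4.0` are assumed values. The reference value `k=0.5` corresponds to a tabular CUSUM tuned to a 1 s.d. shift.

```python
import math

def shewhart_alarms(z, limit=3.0):
    """Months in which a single standardized value falls outside the alarm limits."""
    return [i for i, v in enumerate(z) if abs(v) > limit]

def ewma_alarms(z, lam=0.2, width=3.0):
    """Months in which the exponentially weighted moving average leaves its limits."""
    alarms, ewma = [], 0.0
    for i, v in enumerate(z, start=1):
        ewma = lam * v + (1 - lam) * ewma
        # Exact standard deviation of the EWMA statistic after i observations.
        sd = math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * i)))
        if abs(ewma) > width * sd:
            alarms.append(i - 1)
    return alarms

def cusum_alarms(z, k=0.5, h=4.0):
    """Months in which the upper or lower tabular CUSUM exceeds the interval h."""
    alarms, upper, lower = [], 0.0, 0.0
    for i, v in enumerate(z):
        upper = max(0.0, upper + v - k)   # accumulates evidence for an increase
        lower = max(0.0, lower - v - k)   # accumulates evidence for a decrease
        if upper > h or lower > h:
            alarms.append(i)
    return alarms
```

Under these assumed settings, a single month at 4 s.d. surrounded by in-control months is flagged only by the Shewhart chart, whereas a sustained shift of 1.5 s.d. over twelve months is missed by the Shewhart chart and flagged by both the EWMA and the CUSUM chart from the fifth month onwards. This mirrors the argument for combining a Shewhart chart with one of the cumulative methods.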
Monitoring in practice
In practice, the monitoring process can be simplified by focussing on a composite measure (in this case the logistic EuroSCORE), allowing surveillance of multiple risk factors at the same time. When the goal is to detect gaming of risk factors, upcoding is the only issue of interest and monitoring of increases would suffice. However, when trends and changes in risk profiles are of interest, decreases are valuable information as well. In practice, if the logistic EuroSCORE is increased or decreased, the charts of the separate risk factors can be further investigated. The hospital concerned could be requested to present possible causes for the change in risk profile. If no plausible explanation is provided, a comparison between the medical files and the database might be made by means of an on-site audit. Limits can be maintained until the changes in prevalence are confirmed. In addition, limits should be recalculated periodically, for example every year, to maintain sensitivity and up-to-date expected frequencies of all risk factors. Lastly, the efficiency of a planned audit could be optimized using the results of the monitoring procedure. For example, records coded with certain risk factors could specifically be audited.


Possible limitations and strengths
The extent of misclassification needed to affect benchmark results depends on many factors, such as the model used for risk adjustment, the distribution of risk factors and the dispersion of between-hospital differences. This means that the extent of upcoding of risk factors (deduced from our previous study on misclassification of data) might be specific to this database. Therefore, the exact results of the simulation study may not apply to other data either. However, considering the national coverage of our database and the large amount of individual patient data that was used for our simulations, we expect the general conclusions from this study to be comparable in other large clinical databases. Ideally the SPC methods presented in this article should be externally validated in their ability to detect “gaming”. However, this is unattainable because the true extent of gaming practices will always be unknown. For this reason, we used different scenarios and simulated data to validate the methods (internal validation). SPC methods have extensively been studied in their ability to detect changes in many types of other processes, including those in healthcare.(7-10) Therefore, we believe that the methods we applied yielded valid conclusions. Although many have described the use of SPC in healthcare, none of the previous studies have focussed on the monitoring of risk factors.(7-10;17-22) To our knowledge we are the first to describe the use of these efficient methods to improve and maintain data accuracy of clinical databases. Other strengths of this paper are its wide applicability to clinical databases in all medical fields and the possible implications for the costs and maintenance efforts of databases. SPC could potentially cut the expenses of database maintenance by maximizing the efficiency of laborious and expensive on-site audits.

Conclusion
Statistical Process Control in the form of Shewhart control, EWMA and CUSUM charts provides a means to monitor changes in risk factor frequencies in a clinical database. Surveillance of the overall expected risk in addition to the separate risk factors ensures a high sensitivity to detect gaming using these methods. The use of SPC for risk factor surveillance is recommended.


Reference List
1. Neuhauser D. Ernest Amory Codman MD. Qual Saf Healthcare 2002;11:104-5.
2. Iezzoni LI. Risk adjustment for measuring healthcare outcomes. Ann Arbor, Mich: Health Administration Press; 1994.
3. Green J, Wintfeld N. Report cards on cardiac surgeons. Assessing New York State’s approach. N Engl J Med 1995;332:1229-32.
4. Schaff HV, Brown ML, Lenoch JR. Data entry and data accuracy. J Thorac Cardiovasc Surg 2010;140:960-1.
5. Montgomery DC. Introduction to statistical quality control. Third edition. United States: John Wiley & Sons, Inc.; 1997.
6. Oakland JS. Statistical process control. Fifth edition. Burlington, MA: Elsevier Butterworth-Heinemann; 2003.
7. Calsina L, Clara A, Vidal-Barraquer F. The use of the CUSUM chart method for surveillance of learning effects and quality of care in endovascular procedures. Eur J Vasc Endovasc Surg 2011;41:679-84.
8. Grigg OA, Farewell VT, Spiegelhalter DJ. Use of risk-adjusted CUSUM and RSPRT charts for monitoring in medical contexts. Stat Methods Med Res 2003;12:147-70.
9. Grunkemeier GL, Wu YX, Furnary AP. Cumulative sum techniques for assessing surgical results. Ann Thorac Surg 2003;76:663-7.
10. Sibanda T, Sibanda N. The CUSUM chart method as a tool for continuous monitoring of clinical outcomes using routinely collected data. BMC Med Res Methodol 2007;7:46.
11. Nashef SA, Roques F, Michel P, Gauducheau E, Lemeshow S, Salamon R. European system for cardiac operative risk evaluation (EuroSCORE). Eur J Cardiothorac Surg 1999;16:9-13.
12. Parsonnet V, Dean D, Bernstein AD. A method of uniform stratification of risk for evaluating the results of surgery in acquired adult heart disease. Circulation 1989;79:I3-12.
13. Shahian DM, O’Brien SM, Filardo G, Ferraris VA, Haan CK, Rich JB, Normand SL, DeLong ER, Shewan CM, Dokholyan RS, Peterson ED, Edwards FH, Anderson RP. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 1 - coronary artery bypass grafting surgery. Ann Thorac Surg 2009;88:S2-22.
14. Siregar S, Groenwold RHH, Versteegh MIM, Noyez L, Ter Burg WJPP, Bots ML, van der Graaf Y, van Herwerden LA. Gaming in risk-adjusted mortality rates: Effect of misclassification of risk factors in the benchmarking of cardiac surgery risk-adjusted mortality rates. J Thorac Cardiovasc Surg 2012;145:781-9.
15. Newcombe RG. Two-sided confidence intervals for the single proportion: comparison of seven methods. Stat Med 1998;17:857-72.
16. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2011.
17. Grunkemeier GL, Jin R, Wu Y. Cumulative sum curves and their prediction limits. Ann Thorac Surg 2009;87:361-4.
18. Poloniecki J, Sismanidis C, Bland M, Jones P. Retrospective cohort study of false alarm rates associated with a series of heart operations: the case for hospital mortality monitoring groups. BMJ 2004;328:375.
19. Sherlaw-Johnson C, Morton A, Robinson MB, Hall A. Real-time monitoring of coronary care mortality: a comparison and combination of two monitoring tools. Int J Cardiol 2005;100:301-7.
20. Spiegelhalter D, Grigg O, Kinsman R, Treasure T. Risk-adjusted sequential probability ratio tests: applications to Bristol, Shipman and adult cardiac surgery. Int J Qual Healthcare 2003;15:7-13.
21. Steiner SH, Cook RJ, Farewell VT, Treasure T. Monitoring surgical performance using risk-adjusted cumulative sum charts. Biostatistics 2000;1:441-52.
22. Williams SM, Parry BR, Schlup MM. Quality control: an application of the cusum. BMJ 1992;304:1359-61.
23. Califf RM, Jollis JG, Peterson ED. Operator-specific outcomes. A call to professional responsibility. Circulation 1996;93:403-6.
24. Shahian DM, Normand SL, Torchiana DF, Lewis SM, Pastore JO, Kuntz RE, Dreyer PI. Cardiac surgery report cards: comprehensive review and statistical critique. Ann Thorac Surg 2001;72:2155-68.
25. Novick RJ, Stitt LW. The learning curve of an academic cardiac surgeon: use of the CUSUM method. J Card Surg 1999;14:312-20.
26. Altman DG, Royston JP. The hidden effect of time. Stat Med 1988;7:629-37.
27. McPherson K. Statistics: the problem of examining accumulating data more than once. N Engl J Med 1974;290:501-2.


6 Evaluation of cardiac surgery mortality rates: 30-day mortality or longer follow-up?

Siregar S, Groenwold RHH, de Mol BAJM, Speekenbrink RGH, Versteegh MIM, Brandon Bravo Bruinsma GJ, Bots ML, van der Graaf Y, van Herwerden LA Eur J Cardiothorac Surg (accepted)


Chapter 6

Abstract

Background
There is no consensus on the optimal period during which to assess mortality related to cardiac surgery. The aim of our study was to investigate early mortality after cardiac surgery and to determine the most adequate follow-up period for evaluation of mortality rates.

Methods
Information on all adult cardiac surgery procedures in 10 out of 16 cardio-thoracic centers in The Netherlands from 2007 until 2010 was extracted from the database of the Netherlands Association for Cardio-Thoracic Surgery (n=33,094). Survival up to one year after surgery was obtained from the national death registry. Survival analysis was performed using Kaplan-Meier and Cox regression analysis. Benchmarking was performed using logistic regression with mortality rates at different time points as dependent variable, the logistic EuroSCORE as covariate and a random intercept per center.

Results
In-hospital mortality was 2.94% (n=972), 30-day mortality 3.02% (n=998), operative mortality 3.57% (n=1181), 60-day mortality 3.84% (n=1271), 6-month mortality 5.16% (n=1707) and 1-year mortality 6.20% (n=2052). The survival curves showed a steep initial decline followed by stabilization after approximately 60 to 120 days, depending on the intervention performed, e.g. 60 days for isolated coronary artery bypass grafting (CABG) and 120 days for combined CABG and valve surgery. Benchmark results were affected by the choice of follow-up period: 4 hospitals changed outlier status when follow-up was increased from 30 days to 1 year. In the isolated CABG subgroup benchmark results were unaffected: no outliers were found using 30-day or 1-year follow-up.

Conclusions
The course of early mortality after cardiac surgery differs across interventions and continues up to approximately 120 days. Thirty-day mortality reflects only a part of early mortality after cardiac surgery and should only be used for benchmarking of isolated CABG procedures. Follow-up should be prolonged to capture early mortality of all types of interventions.


Background
Mortality is the most commonly used outcome measure in cardiac surgery. The fact that it is unambiguous and relatively easy to determine makes it an appealing measure for outcomes evaluation. There is no consensus on the optimal period during which to assess mortality related to cardiac surgery. Often in-hospital or 30-day mortality is used, but some have opted for longer follow-up periods varying from 60 days up to six months.(1-5) Previous studies comparing these outcome measures led to varying conclusions. Whereas some studies conclude that in-hospital and 30-day mortality are nearly identical, others show that evidently lower rates are measured when in-hospital mortality is compared to 30-day mortality.(3,6) If differences between mortality measures exist, the results of outcomes evaluation or benchmarking might depend on which mortality measure is compared across hospitals, i.e. after which time interval mortality is measured. There have been studies implying that the use of 30-day or 180-day mortality after coronary artery bypass grafting (CABG) would not alter benchmarking results.(7) However, the topic remains frequently debated whenever outcomes are evaluated. To our knowledge, our study is the first to investigate the impact of the use of different outcome measures on benchmarking using clinical data. The aim of our study was to investigate early mortality after cardiac surgery and determine the most adequate follow-up period for evaluation of mortality rates.

Methods
Data
Information was extracted from the database of the Netherlands Association for Cardio-Thoracic Surgery (NVT). All records of adult cardiac surgery in 10 out of 16 cardio-thoracic centers in The Netherlands from 1 January 2007 until 31 December 2010 were used, which comprised 34234 surgical procedures. There were 213 (0.6%) records with one or more missing EuroSCORE variables, which were excluded, leaving 34021 complete cases for further analyses. The dataset consisted of demographic characteristics, details on the intervention, in-hospital mortality and risk factors for mortality after cardiac surgery, notably EuroSCORE variables.(8)

Patient follow-up
Survival status was obtained from the national death registry by linkage of data. Linkage was facilitated by Statistics Netherlands.(9) All 10 centers consented to the linkage. All deaths occurring up to 31 December 2011 were extracted, including cause and place of death. Linkage between the NVT data and the death registry was performed using matching based


on date of birth, sex and postal code at the time of surgery. The sensitivity of this matching procedure was 97.3% (927 patients could not be matched). This means that a minimum follow-up of one year could be performed for a total of 33094 interventions (including 620 reoperations in patients previously included in the database). All analyses were performed on intervention level, meaning one patient could be counted multiple times in case of reoperations.

Early mortality measures:
The following early mortality measures were calculated: in-hospital mortality (mortality in the hospital where cardiac surgery was performed and in the same admission), 30-day mortality (mortality within 30 days after cardiac surgery, regardless of place of death), operative mortality (in-hospital mortality or mortality within 30 days after cardiac surgery), 60-, 90-, 120-, 180-day and one-year mortality (mortality within 60, 90, 120, 180 days and one year after cardiac surgery, regardless of place of death). Mortality at fixed time intervals includes all mortality up to that point, including all causes and irrespective of location.

Survival and hazards:
The risk of mortality after surgery at any given time can be expressed as the instantaneous hazard. It can be calculated by dividing the number of deaths by the number at risk at any time during follow-up and thus represents the risk of mortality at that moment. Hazards and survival functions are different ways to describe time-to-event data (in this case time-to-death), but in fact give the same information: the survival at any point in time is the exponential of minus the cumulative hazard up to that point, which for small hazards is approximately 1 minus the cumulative hazard. The instantaneous hazard after cardiac surgery varies with time. Survival functions are calculated using all deaths as events, including all causes and irrespective of location.

Risk adjustment and benchmarking
For the benchmarking procedure, risk adjustment was performed using the logistic EuroSCORE. This model is the most commonly used risk adjustment method in Europe and its definitions are used in the NVT database. The logistic EuroSCORE was calculated for each patient. Subsequently, benchmarking was performed using each early mortality measure. A random effects model was fitted with one of the mortality measures as the outcome variable and the logistic EuroSCORE as covariate. A random effects model accounts for within-hospital variability and between-hospital variability and is the preferred type of regression model for comparison between centers.(10,11) This regression model thus assumes that mortality is partly explained by patient characteristics (i.e. disease severity quantified by the EuroSCORE) and partly by a center effect, which is specific to each center and can be compared across centers.(12,13)
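
The definitions of the early mortality measures can be expressed in a short Python sketch. The function and key names are hypothetical; "within d days" is assumed to mean a death at most d days after the date of surgery, and one year is assumed to be 365 days.

```python
from datetime import date

def early_mortality_flags(surgery, death=None, died_in_hospital=False):
    """Return the eight early mortality measures for one intervention.

    surgery          : date of cardiac surgery
    death            : date of death from the registry, or None if alive
    died_in_hospital : True if death occurred in the operating hospital
                       during the same admission
    """
    days = None if death is None else (death - surgery).days

    def within(limit):
        return days is not None and days <= limit

    flags = {"in_hospital": died_in_hospital, "30_day": within(30)}
    # Operative mortality combines both definitions.
    flags["operative"] = flags["in_hospital"] or flags["30_day"]
    for limit in (60, 90, 120, 180):
        flags[f"{limit}_day"] = within(limit)
    flags["1_year"] = within(365)
    return flags
```

A patient who dies in hospital 45 days after surgery counts towards in-hospital and operative mortality but not towards 30-day mortality, whereas a patient who dies at home after 20 days counts towards 30-day and operative mortality but not towards in-hospital mortality.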


Analyses
All abovementioned early mortality measures were calculated. Non-parametric survival analysis was performed using the Kaplan-Meier method.(14) The survival rate in the age-matched general population was calculated and compared to the survival rate in our study population. Survival functions were calculated for all cardiac surgery and for different strata of preoperative risk, quantified by quartiles of the logistic EuroSCORE values. In addition, survival functions stratified by type of intervention were calculated. Lastly, the survival function for all cardiac surgery was calculated using only cardiac mortality. Risk-adjusted survival functions were calculated using the Cox proportional hazards method with the logistic EuroSCORE as a covariate.(15) Survival functions corrected for logistic EuroSCORE were calculated while stratifying for type of intervention and center. In addition, time-dependency of the effect of the logistic EuroSCORE and the effect of intervention type was investigated by testing if the coefficients were constant in time (slope=0), indicating proportional hazards. A random effects model with one fixed intercept and a random intercept for each center was fitted. The random intercepts were compared to the overall random intercept value of 0. Statistical uncertainty was addressed by estimating 95% confidence intervals (CI) of the random intercepts for all centers using the posterior variances.(16) Centers with a CI of the random intercept that does not cover 0 are identified as statistical outliers. Random intercepts above 0 reflect higher than expected mortality rates; those below 0 lower than expected mortality. Benchmarking using regression analysis was repeated for each of the eight mortality measures. All analyses were performed in R version 2.12.(17) Code is available on request.
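
The outlier rule described above can be sketched in Python. The sketch assumes that the random intercepts and their posterior variances have already been estimated by the random effects model; the function name, the normal approximation with z = 1.96 and the status labels are illustrative assumptions.

```python
import math

def classify_centres(intercepts, posterior_variances, z=1.96):
    """Classify centers by whether the CI of their random intercept covers 0.

    intercepts          : dict mapping center -> estimated random intercept
    posterior_variances : dict mapping center -> posterior variance
    """
    result = {}
    for centre, b in intercepts.items():
        half = z * math.sqrt(posterior_variances[centre])
        low, high = b - half, b + half
        if low > 0:
            status = "high-mortality outlier"   # CI entirely above 0
        elif high < 0:
            status = "low-mortality outlier"    # CI entirely below 0
        else:
            status = "average"                  # CI covers 0
        result[centre] = (low, high, status)
    return result
```

Repeating this classification for each of the eight mortality measures shows whether a center changes outlier status when the follow-up period is changed.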

Results

Overall mortality rates and survival

In total, data on 33094 interventions were extracted from the NVT national database. The study population is described in Table 1. Total follow-up time after intervention was 90386.6 years and the mean follow-up time was 996.9 days. Early mortality rates using the different measures are presented in Figure 1. Mortality more than doubled between discharge from the primary hospital and one year after surgery: from 972 deaths (2.94%) to 2052 deaths (6.20%). In-hospital and 30-day mortality rates were nearly identical. However, Table 2 shows that these two outcome measures do not count the same deaths. Approximately 20% of all deaths during admission occur after 30 days. The reverse holds true as well: approximately 20% of all deaths within 30 days occur at home or at another care facility.


Table 1: Characteristics of study population (N = 33094)

                                                      N (%)
Risk factors for mortality after cardiac surgery
  Age (continuous)                                    65.8 (±11.3)
  Female                                              9911 (29.9)
  Serum creatinine >200 μmol/l                        654 (2.0)
  Extracardiac arteriopathy                           4043 (12.2)
  Pulmonary disease                                   3731 (11.3)
  Neurological dysfunction                            991 (3.0)
  Previous cardiac surgery                            2354 (7.1)
  Recent myocardial infarct                           4098 (12.4)
  LVEF 30–50%                                         5043 (15.2)
  LVEF <30%                                           1699 (5.1)
  Systolic pulmonary pressure >60 mmHg                800 (2.4)
  Active endocarditis                                 538 (1.6)
  Unstable angina                                     1894 (5.7)
  Emergency operation                                 1899 (5.7)
  Critical preoperative state                         1352 (4.1)
  Ventricular septal rupture                          61 (0.2)
  Other than isolated coronary surgery                15324 (46.3)
  Thoracic aortic surgery                             1682 (5.1)
  Logistic EuroSCORE                                  mean 6.8 (±9.3), median 3.7
Types of intervention
  CABG                                                22696 (68.6)
    Isolated CABG                                     17711 (53.5)
  Valve                                               12262 (37.1)
    Isolated valve                                    6653 (20.1)
      Aortic                                          4128 (12.5)
      Mitral                                          1466 (4.4)
      Double valve                                    861
      Other                                           198
    CABG and valve (and other cardiac surgery)        4500 (13.6)
      Aortic                                          2746 (8.3)
      Mitral                                          1198 (3.6)
      Double valve                                    476
      Other                                           80
  Aortic surgery                                      1685 (5.1)



The Kaplan-Meier survival analysis of all cardiac surgery is shown in Figure 1. The survival curves of the cardiac surgery population and of the general Dutch population run parallel from approximately 120 days onwards. The mortality rate in the remainder of the first year is 0.065 (95% CI 0.060-0.071) deaths per 1000 person-days, which is comparable to the mortality rate of 0.06 deaths per 1000 person-days in the age-matched general population. The hazard function in Panel B seems to stabilize after the same period. Analyses using only cardiac mortality yielded similar results.

Figure 1: Kaplan-Meier survival curve with 95% confidence interval after cardiac surgery

[Figure not reproduced. Panel A: survival (%) against days after surgery (up to 365 days). Panel B: hazard against days after surgery. Legend, number of deaths (%) per mortality measure: in-hospital mortality 972 (2.94); 30-day mortality 998 (3.02); operative mortality 1181 (3.57); 120-day mortality 1549 (4.68); 6-months mortality 1707 (5.16); 1-year mortality 2052 (6.20).]

The green line represents the survival rate of the age-matched general population in The Netherlands. The survival rate of the cardiac surgery population equals that of the general population from approximately 120 days after surgery onwards. The hazard (risk of mortality) after cardiac surgery continues to decline well after 30 days postoperatively. The constant phase of the hazard seems to start after approximately 120 days.

Table 2: Comparison of in-hospital and 30-day mortality

                           30-day mortality
In-hospital mortality      No                Yes             Total
No                         31913 (96.4 %)    209 (0.6 %)     32122 (97.1 %)
Yes                        183 (0.6 %)       789 (2.4 %)     972 (2.9 %)
Total                      32096 (97.0 %)    998 (3.0 %)     33094 (100 %)

The discordant cells (209 and 183, printed in bold in the original) indicate the number of deaths that are included in 30-day mortality but not in in-hospital mortality, and the other way around.
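The "approximately 20%" statements in the text follow directly from the discordant cells of Table 2. The short sketch below redoes that arithmetic; the counts are copied from the table, while the variable names are ours.

```python
# Counts copied from Table 2 (rows: in-hospital mortality, columns: 30-day mortality).
in_hospital_only = 183  # died during admission, but later than 30 days after surgery
thirty_day_only = 209   # died within 30 days, but after leaving the primary hospital
both_measures = 789     # died during admission and within 30 days

in_hospital_deaths = in_hospital_only + both_measures  # 972
thirty_day_deaths = thirty_day_only + both_measures    # 998

share_in_hospital_after_30d = in_hospital_only / in_hospital_deaths
share_30d_outside_hospital = thirty_day_only / thirty_day_deaths

print(f"In-hospital deaths occurring after 30 days: {share_in_hospital_after_30d:.1%}")
print(f"30-day deaths occurring outside the hospital: {share_30d_outside_hospital:.1%}")
```

This prints 18.8% and 20.9% respectively, so the two measures have almost the same size but each misses roughly one in five of the deaths counted by the other.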


EuroSCORE and survival

Figure 2 shows the survival curves for each quartile of the logistic EuroSCORE (quartile boundaries at 1.94%, 3.74% and 7.48%) and the accompanying hazard functions. The figure shows that the risk of dying is higher with higher logistic EuroSCOREs. This holds true for the whole follow-up period of one year. In the lowest EuroSCORE stratum, most mortality occurs in the first 60 days postoperatively, whereas in the stratum with the highest EuroSCORE most mortality occurs in the first 120 days. This means that the duration of the early phase of the hazard after cardiac surgery depends on the preoperative risk. The effect of the EuroSCORE appeared to be time-dependent (slope of coefficient -0.212, p<0.0001). This means that the effect of the logistic EuroSCORE (i.e. the preoperative risk) on mortality decreases with time.

Figure 2: Kaplan-Meier survival functions for each quartile of logistic EuroSCORE and accompanying hazard functions

[Figure not reproduced. Panel A: survival (%) against days after surgery (up to 365 days). Panel B: hazard against days after surgery. One curve per group: 1st, 2nd, 3rd and 4th quartile of the logistic EuroSCORE.]

Panel A: Kaplan-Meier survival functions for each quartile of logistic EuroSCORE; Panel B: Hazard functions for each quartile of logistic EuroSCORE. In the low EuroSCORE strata, most mortality occurs in the early period after surgery. The hazard is nearly stable after 30 days. In contrast, in the high risk strata survival continues to drop well after 30 days, as also evident by the continuously declining hazard functions.

Survival across types of interventions and across centers

Figure 3 shows the risk-adjusted survival and hazard functions, stratified by the following intervention groups: isolated CABG, isolated valve, CABG and valve, and other cardiac surgery. The curves all correspond to a patient with the median logistic EuroSCORE value of 3.74%. Stabilization of hazards is seen after a varying period of time. The hazard in the isolated CABG subgroup appears to reach the constant phase much earlier than in the other intervention groups, after approximately 60 days. For the isolated valve subgroup and for the CABG and valve subgroup this takes approximately 90 and 120 days respectively. The effect of intervention group appeared to be time-dependent (slopes of coefficients -0.04, -0.06 and -0.12; p=0.01, p<0.001 and p<0.001, respectively). This means that the effect of the type of intervention decreased with time.

Figure 3: Risk-adjusted survival functions for different types of interventions and accompanying hazard functions

[Figure not reproduced. Panel A: survival (%) against days after surgery (up to 365 days). Panel B: hazard against days after surgery. One curve per group: isolated CABG, isolated valve, CABG and valve, other.]

Panel A: Risk-adjusted survival curves (corrected for the logistic EuroSCORE), stratified by the following intervention groups: isolated CABG, isolated valve, CABG and valve and other cardiac surgery; Panel B: Risk-adjusted hazard functions, stratified by the same intervention groups. The curves correspond to a patient with the median logistic EuroSCORE value of 3.74%. Even after risk adjustment, stabilization of hazards is seen after a varying period of time. The hazard in the isolated CABG subgroup appears to reach the constant phase much earlier than the other intervention groups.

Figure 4 shows the risk-adjusted survival and hazard functions for the 10 centers. The curves all correspond to a patient with the median logistic EuroSCORE value of 3.74%. Again, stabilization of hazards is seen after a varying period of time, even when risk adjustment is performed. For example, in the center corresponding with the dark blue line, the hazard appears to stabilize earlier than in the other centers. Overall hazards reached the constant phase at approximately 120 days.


Figure 4: Risk-adjusted survival functions of the ten hospitals and accompanying hazard functions

[Figure not reproduced. Panel A: survival (%) against days after surgery (up to 365 days). Panel B: hazard against days after surgery. One curve per center.]

Panel A: Risk-adjusted survival functions for the 10 centers; Panel B: Hazard functions for the ten centers. The curves all correspond to a patient with the median logistic EuroSCORE value of 3.74%. Stabilization of hazards is seen after a varying period of time, even when risk adjustment is performed. For example, in the center corresponding with the dark blue line, the hazard appears to stabilize earlier than in the other centers. Overall, hazards reached the constant phase after approximately 120 days.

Benchmarking using different outcome measures

The effect of using different outcome measures on the benchmarking procedure is shown in Figure 5. When in-hospital mortality (blue symbols) is used as outcome, one low-mortality outlier (center A) and two high-mortality outliers (centers I and J) are found. However, when 30-day mortality is used as outcome measure, two other centers are identified as outliers as well: center B as a low-mortality outlier and center E as a high-mortality outlier. Prolonging follow-up from 30 days to one year leads to changes in outlier status in four hospitals (centers B, C, H, and J). When the same is done for a subset of isolated CABG procedures, benchmarking results remain unchanged with different follow-up periods. This is shown in Figure 6.
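The outlier rule behind these results (a 95% CI of the random intercept that does not cover 0) can be sketched as follows. The function name `classify_center`, the normal approximation with z = 1.96, and the example intercepts and posterior variances are assumptions for illustration; they are not values from the study.

```python
def classify_center(intercept, posterior_variance, z=1.96):
    """Classify a center from its random intercept and posterior variance.

    A center is a statistical outlier when the confidence interval of its
    random intercept does not cover 0 (assumed normal approximation).
    """
    half_width = z * posterior_variance ** 0.5
    lower = intercept - half_width
    upper = intercept + half_width
    if lower > 0:
        return "high-mortality outlier"
    if upper < 0:
        return "low-mortality outlier"
    return "average"


# Hypothetical center effects: (random intercept, posterior variance).
example_centers = {
    "X": (-0.45, 0.03),
    "Y": (0.10, 0.02),
    "Z": (0.40, 0.02),
}
for name, (intercept, variance) in example_centers.items():
    print(name, classify_center(intercept, variance))
```

Because the classification depends on both the point estimate and its uncertainty, a center can move between "average" and "outlier" when the outcome measure changes, even if its intercept shifts only slightly.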

Figure 5: Benchmarking of all cardiac surgery in 10 hospitals using different mortality measures, from 2007 to 2010

[Figure not reproduced. Center effect (vertical axis, -1.0 to 1.0) for centers A to J (horizontal axis), with one symbol per mortality measure: in-hospital, 30-day, operative, 60-day, 90-day, 120-day, 180-day and 1-year mortality.]

Figure 6: Benchmarking of isolated CABG in 10 hospitals using different mortality measures, from 2007 to 2010

[Figure not reproduced. Center effect (vertical axis, -1.0 to 1.0) for centers A to J (horizontal axis), with one symbol per mortality measure: in-hospital, 30-day, operative, 60-day, 90-day, 120-day, 180-day and 1-year mortality.]

Discussion

Main findings

We used survival status after all adult cardiac surgery procedures in the Netherlands from 2007 to 2010 to study early mortality after cardiac surgery. The differences between mortality rates during hospital stay, after 30 days or after longer intervals were assessed, and benchmarking was performed using these different outcomes. The survival function after cardiac surgery continues to decline well beyond the usual 30-day cut-off point for evaluation. The rate of decline depends on the performed intervention: isolated CABG procedures seem to reach a stable phase after approximately 60 days, whereas valvular or combined interventions maintain a higher hazard for a longer period. Similarly, the rate of decline depends on the preoperative risk of mortality, as measured with the logistic EuroSCORE. Benchmarking using in-hospital mortality, 30-day mortality or longer fixed-period mortality rates yields different results. Average-mortality centers could change into outliers when follow-up is extended up to one year, and vice versa. From approximately 120 days onwards the hazards had reached a constant phase for all types of interventions. Benchmarking across different types of interventions should therefore not be performed before this period. When follow-up is shorter, it is recommended that benchmarking is only performed in isolated CABG procedures.

Differences in outcome measures

Our results show that mortality rates can vary largely depending on the cut-off point used for follow-up. Mortality at discharge or at 30 days is more than doubled after one year. Previous studies found similar results. Edwards et al. studied over 80,000 patients in the UK Heart Valve Registry and found almost a doubling of the mortality rate after one year, when compared to 30 days.(18) In-hospital mortality and 30-day mortality were nearly equal in our database. A large study comparing hospital mortality, 30-day mortality and operative mortality rates in CABG reported similar findings.(6) The authors concluded that because the numbers are nearly identical, the more convenient outcome in-hospital mortality could be used for outcomes evaluation. However, although in-hospital and 30-day mortality are equivalent in numbers, they do not refer to the same patients: approximately 20% of the patients counted in each measure are not included in the other measure. This difference is relevant, because patients who die within 30 days (either in the hospital or elsewhere) are likely to be different from the patients who remain in the hospital for a long time and eventually die. The latter type of mortality is more likely to be influenced by preoperative comorbidities than the former.(2) Thus, in-hospital mortality and 30-day mortality measure two different types of end points and are not interchangeable.


Hospital or fixed-interval mortality?

Hospital mortality rates depend on the postoperative transfer policy of patients to other healthcare facilities or back to the referring hospitals. A hospital with a policy of relatively early discharge or transfer will have lower in-hospital mortality rates than a similar hospital with a policy of late discharge or transfer. The fact that the moment of transfer is at the discretion of providers leaves room for "gaming" of results: mortality rates can be kept low by early transfer of patients to other healthcare facilities.(19) Carey et al. investigated the exact impact of discharge to other healthcare facilities on in-hospital mortality.(20) They concluded that a substantial percentage of in-hospital deaths occur after discharge from the primary institution and that the reported in-hospital death rate might therefore be an underestimation of the true in-hospital death rate. Other studies have also shown the discrepancies between hospital mortality and 30-day mortality and similarly concluded that the former relates to institution-specific discharge policy rather than to outcomes useful for benchmarking.(2) These problems relating to in-hospital mortality can be avoided by using mortality rates at a fixed period after surgery, independent of the place of death.

The effect of casemix on survival and benchmarking

The duration of follow-up (i.e. the use of different outcome measurements) has clear consequences for the benchmarking results. In our empirical data, benchmarking of in-hospital mortality yields other outliers than that of 30-day or 1-year mortality. Center H is initially benchmarked as an average center, but becomes an outlier when follow-up is extended to 60 days or more. The opposite occurs in Center J. This hospital is initially identified as a high-mortality outlier, but its status changes to average after 120 days.
Changing positions relative to the benchmark reflect crossing hazard curves (i.e. the hazards have not yet reached a steady state). Our results show that survival differs for each type of intervention. As a result, the total hazard curve of a hospital depends on the types of interventions performed. Thus, the underlying mechanism causing the observed changes in outlier status is probably the difference in the types of interventions performed. When benchmarking is performed only with isolated CABG procedures, results are unaffected by the choice of follow-up period. This is best illustrated using the following example: a hospital performing mainly isolated CABG procedures will have 60- and 90-day mortality rates that are nearly comparable. After all, the hazard for isolated CABG has nearly reached a steady state after 60 days, meaning mortality rates will not rise much after 60 days in such a hospital. In contrast, a hospital where mainly combined CABG and valve procedures are performed will have a 90-day mortality rate that is considerably higher than its 60-day mortality rate. As shown in Figure 3, this is because the hazard is still declining steeply between 60 and 90 days. For this hospital, an evaluation of mortality rates across centers is therefore much more favorable after 60 days than after 90 days. When the goal is to evaluate early mortality, a follow-up of 90 days or even longer is clearly more adequate in this example to ensure early mortality is captured as completely as possible. If only isolated CABG procedures were compared, most mortality would already have occurred in the first 30 days and the difference with longer follow-up periods is expected to be small. This is confirmed in Figure 6, where benchmarking results are similar using a follow-up of 30 days and longer periods. A Health Services Research study following 5000 CABG patients for six months reached a similar conclusion. When observed minus expected mortality rates at 30 days and at 180 days were compared, the ranking lists composed using these two outcomes hardly differed.(7)

What is the adequate follow-up interval for benchmarking?

Having shown the differences between the mortality measures and the effects on benchmarking, the next question is which measure to choose. Considering the arguments mentioned above, it would be logical to take the longest follow-up possible before benchmarking is performed. However, a long follow-up has several downsides. First, follow-up of patients after discharge is time-consuming and requires effort and money. This problem should not be underestimated, since incomplete follow-up might lead to biased results.(21) Secondly, the question is what it is that needs to be measured; different mortality rates might reflect other processes. For instance, patient compliance with medication, quality of home care, extent of involvement of the cardiologist and many other factors have an increasing effect on the risk of mortality after discharge. In contrast, the effect of the initial care provided around the intervention in the hospital is likely to decrease in time. Therefore, it seems less adequate to measure mortality after one year or longer, when it is the process surrounding the cardiac surgery that one is interested in. In 1986, Blackstone et al.
suggested that the hazard can be subdivided into an early, a constant and a late phase.(22,23) In benchmarking of outcomes, the early phase reflects the part of the process of care that we want to evaluate. Sergeant and Blackstone found an early phase hazard that lasted for 6 months after CABG, suggesting that follow-up should be extended to half a year after surgery.(4) Other studies evaluating early mortality after CABG found that early mortality occurred up to approximately 60 days.(2) The authors expected this interval to be even longer in procedures performed in more recent years. We found similar results for isolated CABG procedures. For other interventions, stabilization of hazards was seen after approximately 120 days. Based on these findings, it seems advisable to extend follow-up to a period beyond the commonly used 30 days. In case longer follow-up is not attainable, evaluation of mortality rates should only be performed within specific procedures.
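The argument that a fixed 30-day window captures a different share of early mortality depending on how long the early phase lasts can be made concrete with a toy hazard. The sketch below assumes a hazard of the form h(t) = a·exp(−t/τ) + c, i.e. an exponentially decaying early phase on top of a constant phase. This functional form and all parameter values are simplifying assumptions for illustration; they are neither Blackstone's parametric decomposition nor estimates from the NVT data.

```python
import math


def cumulative_mortality(t, early_amplitude, early_timescale, constant_hazard):
    """Mortality by day t under h(t) = a * exp(-t / tau) + c (assumed form)."""
    cumulative_hazard = (
        early_amplitude * early_timescale * (1 - math.exp(-t / early_timescale))
        + constant_hazard * t
    )
    return 1 - math.exp(-cumulative_hazard)


def early_fraction_captured(t, early_timescale):
    """Share of the early-phase cumulative hazard that has accrued by day t."""
    return 1 - math.exp(-t / early_timescale)


# Hypothetical time scales: a short early phase and a long early phase.
for label, tau in (("short early phase", 15.0), ("long early phase", 40.0)):
    shares = {day: early_fraction_captured(day, tau) for day in (30, 60, 120)}
    summary = ", ".join(f"day {d}: {s:.0%}" for d, s in shares.items())
    print(f"{label} (tau = {tau:g} days): {summary}")
```

Under these assumed values, a 30-day cut-off captures most of the early-phase hazard when the early phase is short but only about half of it when the early phase is long, which mirrors the contrast the text draws between isolated CABG and combined procedures.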


Final notes

Although we assume the effect of the intervention to decrease over time in general, some procedural factors do influence hazards even years after surgery.(5) For example, the beneficial effects of the use of arterial grafts are seen in an improved late survival. Long-term follow-up is difficult to accomplish. Ideally, it is performed using structured follow-up methods that are incorporated as fixed elements in the whole process of care. It must be stressed that our study restricts analyses and conclusions to early mortality. It is questionable whether long-term outcomes are useful for the purpose of benchmarking, considering that increased mortality rates should be identified as soon as possible to allow immediate action and, where possible, to prevent further excess deaths. Secondly, it must be stressed that benchmarking in these analyses is performed on all cardiac surgery interventions, meaning complex and very specific interventions are included as well. It is questionable whether the logistic EuroSCORE provides adequate risk adjustment in this very heterogeneous group. Consequently, residual casemix might very well have influenced results. This study illustrates the effect of various outcome measures on the benchmarking results and does not focus on the specific results in themselves. Outliers in these analyses should thus be interpreted as statistical outliers: further investigations on residual casemix should follow and results should be interpreted with caution.

Possible limitations

Ten of the 16 centers performing cardio-thoracic surgery in The Netherlands consented to the linkage of data with the national death registry. This resulted in a comprehensive multicenter database of all types of cardiac surgery, including a follow-up of one year or longer. However, this also means that approximately a third of all cardiac surgery procedures in The Netherlands from 2007 to 2010 (those in the non-participating centers) were not included in our analyses. Differences between the populations treated by the participating and non-participating centers could theoretically affect the generalizability of the results. However, a comparison of risk factors showed no significant differences between our study population and the six other cardiac surgery centers (results not shown, available on request). Results are therefore likely to be generalizable to other populations. In addition, the matching performed to establish a linkage to the national death registry had a sensitivity of 97.3%. Unmatched individuals were removed from further analyses. Baseline characteristics of these patients were comparable to those of the matched patients, with no significant difference in in-hospital mortality (p=0.494) and logistic EuroSCORE (p=0.174). We therefore assume that these data were missing completely at random and that their removal is unlikely to have biased our results.


Analyses were performed at an intervention level, meaning that the death of one patient was counted twice in case of a second heart operation. The NVT database does not contain person-identifying variables and analyses at a patient level are therefore not possible. Although in this study interventions could be linked to individual patients using the national death registry, the NVT database would otherwise not have had this option. To maintain consistency of methods, we chose to perform survival analyses at an intervention level as well. In total 620 reoperations were performed. Considering our large study population, we assume the influence on our results was minimal.

Conclusion

The course of early mortality after cardiac surgery differs across interventions and continues up to approximately 120 days. Thirty-day mortality reflects only a part of early mortality after cardiac surgery and should only be used for benchmarking of isolated CABG procedures. To capture early mortality of all types of interventions, follow-up must be prolonged.


Reference List

1. Noyez L, Verheugt FW, Peppelenbosch AG, Skotnicki SH, Brouwer MH. [Aortocoronary bypass surgery; at least 6 months follow-up required for assessment of postoperative course]. Ned Tijdschr Geneeskd 2000 Sep 23;144:1874-7.
2. Osswald BR, Blackstone EH, Tochtermann U, Thomas G, Vahl CF, Hagl S. The meaning of early mortality after CABG. Eur J Cardiothorac Surg 1999 Apr;15:401-7.
3. Osswald BR, Tochtermann U, Schweiger P, Gohring D, Thomas G, Vahl CF, et al. Minimal early mortality in CABG--simply a question of surgical quality? Thorac Cardiovasc Surg 2002 Oct;50:276-80.
4. Sergeant P, Blackstone E, Meyns B. Validation and interdependence with patient-variables of the influence of procedural variables on early and late survival after CABG. K.U. Leuven Coronary Surgery Program. Eur J Cardiothorac Surg 1997 Jul;12:1-19.
5. Sergeant PT, Blackstone EH. Closing the loop: optimizing physicians' operational and strategic behavior. Ann Thorac Surg 1999 Aug;68:362-6.
6. Likosky DS, Nugent WC, Clough RA, Weldner PW, Quinton HB, Ross CS, et al. Comparison of three measurements of cardiac surgery mortality for the Northern New England Cardiovascular Disease Study Group. Ann Thorac Surg 2006 Apr;81:1393-5.
7. Garnick DW, DeLong ER, Luft HS. Measuring hospital mortality rates: are 30-day data enough? Ischemic Heart Disease Patient Outcomes Research Team. Health Serv Res 1995 Feb;29:679-95.
8. Nashef SA, Roques F, Michel P, Gauducheau E, Lemeshow S, Salamon R. European system for cardiac operative risk evaluation (EuroSCORE). Eur J Cardiothorac Surg 1999 Jul;16:9-13.
9. Centraal Bureau voor de Statistiek. www.cbs.nl. 2012.
10. Thomas N, Longford NT, Rolph JE. A statistical framework for severity adjustment of hospital mortality rates. Rand, Santa Monica (CA); 1992.
11. Lingsma HF, Steyerberg EW, Eijkemans MJ, Dippel DW, Scholte Op Reimer WJ, Van Houwelingen HC. Comparing and ranking hospitals based on outcome: results from The Netherlands Stroke Survey. QJM 2010 Feb;103:99-108.
12. Siregar S, Groenwold RH, Jansen EK, Bots ML, van der Graaf Y, van Herwerden LA. Limitations of ranking lists based on cardiac surgery mortality rates. Circ Cardiovasc Qual Outcomes 2012 May 1;5:403-9.
13. Siregar S, Groenwold RHH, Versteegh MIM, Noyez L, Ter Burg WJPP, Bots ML, van der Graaf Y, van Herwerden LA. Gaming in risk-adjusted mortality rates: Effect of misclassification of risk factors in the benchmarking of cardiac surgery risk-adjusted mortality rates. J Thorac Cardiovasc Surg. 2012 Mar;145:781-9.
14. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 1958;53:457-81.
15. Cox DR. Regression models and life-tables. Journal of the Royal Statistical Society 1972;34:187-220.
16. Lingsma HF, Steyerberg EW, Eijkemans MJ, Dippel DW, Scholte Op Reimer WJ, Van Houwelingen HC. Comparing and ranking hospitals based on outcome: results from The Netherlands Stroke Survey. QJM 2010 Feb;103:99-108.
17. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2011.
18. Edwards MB, Taylor KM. Is 30-day mortality an adequate outcome statistic for patients considering heart valve replacement? Ann Thorac Surg 2003 Aug;76:482-5.
19. Shahian DM, Normand SL, Torchiana DF, Lewis SM, Pastore JO, Kuntz RE, et al. Cardiac surgery report cards: comprehensive review and statistical critique. Ann Thorac Surg 2001 Dec;72:2155-68.
20. Carey JS, Parker JP, Robertson JM, Misbach GA, Fisher AL. Hospital discharge to other healthcare facilities: impact on in-hospital mortality. J Am Coll Surg 2003 Nov;197:806-12.
21. Drolette ME. The effect of incomplete follow-up. Biometrics 1975 Mar;31:135-44.
22. Blackstone EH, Naftel DC, Turner ME. The decomposition of time-varying hazard into phases, each incorporating a separate stream of concomitant information. Journal of the American Statistical Association 1986;81:615-24.
23. Blackstone EH. Outcome analysis using hazard function methodology. Ann Thorac Surg 1996 Feb;61:S2-S7.


Part 2 Comparing safety across cardiac surgery centers


7 Performance of the original EuroSCORE

Siregar S, Groenwold RHH, de Heer F, Bots ML, van der Graaf Y, van Herwerden LA Eur J Cardiothorac Surg. 2012 Apr;41(4):746-54


Chapter 7

Abstract

Background
The European system for cardiac operative risk evaluation (EuroSCORE) is a commonly used risk score for operative mortality following cardiac surgery. We aimed to conduct a systematic review of the performance of the additive and logistic EuroSCORE.

Methods
A literature search was performed in PubMed. Studies that applied the EuroSCORE to patients undergoing cardiac surgery and reported early mortality were included. Performance was defined as the observed:expected (O:E) mortality ratio and the area under the curve (AUC) of the ROC curve. Weighted meta-regression analysis was used to study trends in performance, including trends over time.

Results
In total 67 articles were included for analysis. The mean O:E ratio was 0.47 for the additive model and 0.43 for the logistic model. This indicates that the EuroSCORE overestimated mortality. However, the performance of the models depended on the risk profile of patients: in high-risk patients, the additive model actually underestimated mortality. Discriminative performance was good: the mean AUC was 0.8 and no trend over time was seen.

Conclusions
Given the poor predictive performance, the EuroSCORE may not be suitable as a tool for patient selection or for benchmarking.


Introduction

In the past decades multiple risk scores have been developed to estimate the outcome of cardiac surgery.(1) A commonly applied risk score for operative mortality after cardiac surgery is the European system for cardiac operative risk evaluation (EuroSCORE).(2) It was developed by a European steering group in 1999 with information on approximately 15,000 consecutive adult patients undergoing cardiac surgery under cardiopulmonary bypass. The underlying database, formed in 1995 with information from 128 centers in eight European countries, was divided into a derivation subset (n=13,302) and a validation subset (n=1,497). Statistical analyses identified 17 characteristics that were included in the model. Each of these variables was then given a weight based on the logistic regression beta coefficients to form an additive risk score for operative mortality. The final model was tested on both the construction and the validation subset and showed a satisfactory performance.(2) However, as stated by the authors, the true test lies in the widespread use of such a risk score.

The purpose of the EuroSCORE is to help in the assessment of the quality of cardiac surgical care.(2) This can be done by comparing observed mortality rates against the expected rates between care providers, so-called benchmarking. For this purpose, however, the presence of other factors that might influence outcome, for example postoperative care, should always be kept in mind. The quality of predictions of a widely used risk score has major consequences. Poor performance leads to inadequate prediction and potentially to invalid benchmarking. An appropriate evaluation of the performance should include all available evidence. So far, one systematic review, limited to six articles, has been written on this topic.(3) It does not fully cover the large amount of literature on this subject. The new EuroSCORE is published simultaneously in this issue.(4) This is a crucial moment to understand which improvements have been made, and should have been made, to the model. Therefore, the objective of this study was to systematically review all available literature on the performance of the additive and logistic EuroSCORE.

Methods

Search strategy and study selection

On 6 June 2010 a search was conducted in PubMed. The search terms “EuroSCORE” and “Euro SCORE” were used in the fields of title and abstract. Based on title and abstract, all studies applying the additive or logistic EuroSCORE to patients undergoing cardiac surgery were included for further evaluation.


The following exclusion criteria were applied: articles written in languages other than English, studies conducted in the original EuroSCORE database, and editorial comments or letters. Articles with very specific domains such as endocarditis, thoracic aortic surgery and prosthesis dysfunction were excluded as well. Remaining studies focused on coronary artery bypass grafting (CABG), valvular surgery and combined surgery. Both prospective and retrospective studies were included. Only studies that reported early mortality were included in the final selection. Early mortality was defined as either 30-day mortality, in-hospital mortality or operative mortality (30-day and in-hospital mortality). Articles were excluded if expected and observed mortality could neither be extracted from the text nor calculated from the presented data. When multiple articles reported on the same dataset, the first published article was selected. The predictive performance of the EuroSCORE was quantified by means of calibration and discrimination.

Calibration

Calibration refers to the ability of a test or a model to estimate the probability of the occurrence of the outcome, in this case mortality.(5) A well-calibrated model for mortality is one with a high agreement between the actual (observed) number of deaths and the predicted (expected) number of deaths. In this review we focused on aggregate data instead of individual patient data, i.e. the mean observed mortality and the mean expected mortality of included studies. The calibration of the EuroSCORE was assessed by the observed:expected ratio (O:E ratio) of mortality. This ratio was obtained by dividing the observed mortality by the expected mortality within a population. Ideally this ratio equals one, since in that case the observed mortality equals the expected mortality and hence the predictive model is optimally calibrated. A value below one corresponds to overestimation of mortality and a value above one to underestimation of mortality. The confidence interval (CI) of the ratio was then estimated using the method by Breslow and Day.(6) Whenever the expected mortality or mean EuroSCORE was not explicitly mentioned in an article, the mean additive EuroSCORE was estimated from the reported distributions of preoperative patient characteristics, if possible. Expected mortality was then calculated by entering the mean values of the patient characteristics into the additive EuroSCORE model. The calibration of the EuroSCORE was also evaluated for different risk groups. For this, articles were used that stratified patients by their preoperative risk (in EuroSCORE), provided that observed mortality, expected mortality and size of each stratum were reported. The O:E ratio calculated from each stratum of patients represented one measurement in the analyses.
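The O:E ratio and its confidence interval described above can be sketched in a few lines. This is a minimal illustration, not the review's own code: it uses Byar's approximation to the exact Poisson limits and treats the expected number of deaths as fixed, which is an assumption about which variant of the rate-standardization methods applies here. The function name and the example counts are hypothetical.

```python
import math


def oe_ratio_with_ci(observed_deaths, expected_deaths, z=1.96):
    """Return (ratio, lower, upper) for an observed:expected mortality ratio.

    Assumption: Byar's approximation to the Poisson limits is used, with
    the expected number of deaths treated as a fixed quantity.
    """
    if expected_deaths <= 0:
        raise ValueError("expected_deaths must be positive")
    if observed_deaths < 0:
        raise ValueError("observed_deaths must be non-negative")

    o = observed_deaths
    ratio = o / expected_deaths

    # Lower limit for the observed count (0 when no deaths were observed).
    if o > 0:
        lower_count = o * (1 - 1 / (9 * o) - z / (3 * math.sqrt(o))) ** 3
        lower_count = max(lower_count, 0.0)
    else:
        lower_count = 0.0

    # Upper limit for the observed count, based on o + 1.
    o1 = o + 1
    upper_count = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3

    return ratio, lower_count / expected_deaths, upper_count / expected_deaths


# Hypothetical study: 39 observed deaths against 88 expected deaths.
ratio, lower, upper = oe_ratio_with_ci(39, 88)
print(f"O:E = {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A ratio whose whole interval lies below one would indicate overestimation of mortality in that study or stratum.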



Discrimination

Discrimination refers to the ability of the EuroSCORE to differentiate between postoperative survivors and non-survivors.(5) This was quantified using the reported areas under the receiver operating characteristic (ROC) curve, or c-statistics. A c-statistic of 0.5 indicates no ability of the model to discriminate and a c-statistic of 1.0 indicates a perfect ability to discriminate.

Surgical categories

Studies or cohorts described within studies were divided according to the performed procedure. The following categories were identified: cardiac surgery, isolated CABG, isolated valve, and mixed CABG and valve. The cardiac surgery category contained studies that included all forms of cardiac surgery. Some of these articles excluded specific procedures, such as thoracic aortic surgery, off-pump surgery and surgery for congenital anomalies. The mixed CABG and valve category comprised all studies that could not be allocated to the surgical categories cardiac surgery, isolated CABG or isolated valve.

Analyses

All analyses were conducted using PASW Statistics 17.0(7) and R for Windows.(8) To evaluate the calibration of the EuroSCORE, we assessed the relation between predicted mortality and the O:E ratio using meta-regression analysis. Additionally, the relation between year of surgery (defined as the median of the years of surgery included in a study) and O:E ratio was assessed, also by means of meta-regression analysis. All meta-regression analyses were performed using a random effects model, in which the O:E ratio for each study was weighted by the inverse of the variance of the O:E ratio. In practice, this means that larger studies tend to get more weight. Since the O:E ratio was not normally distributed, we log-transformed this ratio, after which it was normally distributed. Furthermore, studies that presented O:E ratios stratified by risk score (i.e. stratified by expected risk) were analyzed separately using a mixed random effects model, thus accounting for the dependency of the stratified observations within studies. To evaluate the discriminative performance of the EuroSCORE, the c-statistic of each study was weighted according to study size, because the variance of the c-statistic was not routinely reported and could not be calculated from the reported aggregate data. Sensitivity analyses were performed to evaluate whether the findings were mainly determined by the largest studies. This was done by excluding the studies with the largest weight and repeating all analyses. These results were then compared to the original results.
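The core of the meta-regression described above, a regression of the log-transformed O:E ratio on a study-level covariate with inverse-variance weights, can be sketched as follows. This is a simplified illustration under stated assumptions: the between-study variance `tau2` is supplied by the caller rather than estimated (the review's software would have estimated it), and the function name and inputs are hypothetical.

```python
import numpy as np


def weighted_meta_regression(x, oe_ratio, var_log_oe, tau2=0.0):
    """Fit log(O:E) = slope * x + intercept by inverse-variance weighting.

    x           study-level covariate (e.g. mean EuroSCORE or year of surgery)
    oe_ratio    observed:expected ratio per study
    var_log_oe  within-study variance of log(O:E) per study
    tau2        assumed between-study variance (0 gives fixed-effect weights)
    Returns (slope, intercept, standard error of the slope).
    """
    x = np.asarray(x, dtype=float)
    y = np.log(np.asarray(oe_ratio, dtype=float))
    weights = 1.0 / (np.asarray(var_log_oe, dtype=float) + tau2)

    # Design matrix with the covariate and a constant term.
    design = np.column_stack([x, np.ones_like(x)])
    xtwx = design.T @ (weights[:, None] * design)
    coefficients = np.linalg.solve(xtwx, design.T @ (weights * y))

    # Model-based standard errors from the inverse of X'WX.
    standard_errors = np.sqrt(np.diag(np.linalg.inv(xtwx)))
    slope, intercept = coefficients
    return slope, intercept, standard_errors[0]
```

Back-transforming gives the fitted curve in the form used throughout the Results: O:E ratio = exp(slope × covariate + intercept). Studies with a small variance, typically the larger ones, dominate the fit, which is why the sensitivity analyses without the largest studies matter.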


Results

Overview of available literature

The search resulted in 686 articles (Figure 1). After applying the inclusion and exclusion criteria, 102 articles remained. Among these studies, seven were not available in full text.(9-15) Another 10 studies had a domain that was too specific(16-25) and 11 studies did not report an observed or an expected mortality rate (nor could these be calculated based on what was reported).(26-36) Seven articles drew their data from a dataset used for a previously published article.

Figure 1: Strategy and results of literature search

Flow chart showing search criteria and results of literature search.


The final evaluation thus included 67 articles(1;37-102), which were based on surgery performed from 1992 to 2009 on 462,243 patients. The studies varied from large multicenter studies conducted to assess the overall performance of the EuroSCORE to single-center studies using the EuroSCORE for internal quality control. Studies applied either the additive or the logistic EuroSCORE, or both. The mean expected EuroSCORE was not reported in 9 articles and had to be estimated from the reported data.

Calibration of the EuroSCORE

The results of the data extraction and analysis are shown in Table 1. Most studies applied the additive EuroSCORE and concerned the general domain of cardiac surgery (21 articles). The mean observed mortality was 3.9% and 3.1% for studies applying the additive (53 articles) and logistic EuroSCORE (47 articles) respectively. Mean expected mortality was approximately twice as high and resulted in O:E ratios of 0.47 and 0.43 respectively. The overestimation is more evident for the logistic score than for the additive score. Although calibration of the EuroSCORE varied considerably between the surgical categories, all the means of the O:E ratios were below 1.0. This is clearly illustrated in Figure 1 and Figure 2 of the supplementary data.

Table 1: Observed and expected mortality rates for the EuroSCORE by surgical category

Model     Surgical category  N. articles  N. patients  Observed mortality* (%)   Expected mortality* (%)   O:E ratio
                                                       Mean†   Min.   Max.       Mean†   Min.   Max.       Mean†   Min.   Max.
Additive  All studies        53           373531       3.9     0.8    10.6       8.8     1.9    9.9        0.47    0.24   2.12
Additive  Cardiac surgery    21           268928       4.1     1.1    5.9        9.3     3.2    9.9        0.45    0.31   1.18
Additive  Isolated CABG      18           75972        2.0     0.8    4.9        3.6     1.9    5.4        0.59    0.24   1.1
Additive  Isolated valve     7            8633         2.9     2.2    4.8        6.5     5.2    7.5        0.47    0.36   0.92
Additive  CABG and valve     9            26502        4.5     3.5    10.6       4.7     3.5    8.8        0.94    0.77   2.12
Logistic  All studies        44           193814       3.1     0.6    13.9       7.5     2.3    16.1       0.43    0.10   2.7
Logistic  Cardiac surgery    11           80613        3.5     2.5    7.5        7.5     5.7    13.0       0.48    0.37   0.85
Logistic  Isolated CABG      12           96062        2.3     0.8    5.0        6.2     2.3    10.9       0.40    0.22   2.05
Logistic  Isolated valve     11           16703        2.8     0.6    3.9        9.7     5.3    13.2       0.32    0.10   0.57
Logistic  CABG and valve     12           12773        5.7     3.52   13.9       9.9     2.9    16.1       0.61    0.38   2.70

* Studies that included only high-risk or high-age patients were excluded
† Means are weighted (see Methods)


Overall, O:E ratios decreased with increasing score ($\text{O:E ratio} = e^{-0.067 \times \text{additive EuroSCORE} - 0.199}$, 95% CI of coefficient [-0.090;-0.044]; $\text{O:E ratio} = e^{-0.041 \times \text{logistic EuroSCORE} - 0.568}$, 95% CI of coefficient [-0.072;-0.010]). This means that overestimation is stronger in high-risk patients. When the analyses were restricted to studies that stratified patients into risk groups,(39;40;44;45;47;48;50;57;59;64-68;71;73;75;76;85;90;95-100) the O:E ratios of the additive model increased with increasing score (Figure 2). An O:E ratio of more than 1.0 was seen in scores above 15.2. This means that the degree of overestimation is actually less in high-risk patients and that even an underestimation of mortality is found above an additive score of 15.2. The logistic EuroSCORE demonstrated O:E ratios between 0.4 and 0.5, with decreased ratios in high-risk patients, which is illustrated in Figure 3. Sensitivity analysis showed similar results when the largest studies were excluded. For the separate surgical categories, the direction of the regression lines corresponded with those in Figure 2 and Figure 3, although confidence intervals were wider because fewer studies were available.

Figure 2: Weighted regression analysis of additive EuroSCORE and calibration

Observed:expected ratios with linear regression line for additive EuroSCORE. Only risk stratified studies were taken into account.(39,44,47,50,59,65-68,71,73,75,76,85,90,95,97,98,100) $\text{O:E ratio} = e^{0.062 \times \text{additive EuroSCORE} - 0.947}$, 95% CI of the coefficient [0.033;0.092]. One circle represents one risk stratum and the size indicates its weight.
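The score at which the fitted O:E ratio of the stratified additive analysis crosses 1.0 follows directly from the regression line: the exponent is zero when the score equals minus the intercept divided by the slope. The sketch below evaluates this with the rounded coefficients from the caption of Figure 2; the function names are hypothetical. With these rounded values the crossing point comes out near 15.3, close to the threshold of 15.2 given in the text, and the small difference is presumably due to rounding of the published coefficients.

```python
import math


def fitted_oe(score, slope, intercept):
    """Fitted O:E ratio from a log-linear meta-regression line."""
    return math.exp(slope * score + intercept)


def score_where_oe_equals_one(slope, intercept):
    """Score at which the fitted O:E ratio equals 1.0 (exponent equals 0)."""
    if slope == 0:
        raise ValueError("a flat regression line never crosses 1.0")
    return -intercept / slope


# Rounded coefficients reported for the risk-stratified additive analysis.
SLOPE, INTERCEPT = 0.062, -0.947

crossing = score_where_oe_equals_one(SLOPE, INTERCEPT)
print(f"O:E reaches 1.0 at an additive EuroSCORE of about {crossing:.1f}")
print(f"Fitted O:E at a score of 5: {fitted_oe(5, SLOPE, INTERCEPT):.2f}")
```

Below the crossing point the fitted ratio is under 1.0 (overestimation of mortality); above it the model underestimates mortality.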


Figure 3: Weighted regression analysis of logistic EuroSCORE and calibration.

Observed:expected ratios with linear regression line for logistic EuroSCORE. Only risk stratified studies were taken into account.(40,45,48,50,57,64,75,90,95,96,98-100) $\text{O:E ratio} = e^{-0.002 \times \text{logistic EuroSCORE} - 0.897}$, 95% CI of the coefficient [-0.003;-0.001]. One circle represents one risk stratum and the size indicates its weight.

The calibration of the additive EuroSCORE throughout the years is depicted in Figure 4. The O:E ratio has been below 1.0 (i.e. overestimation of mortality) since 1994 and is slowly increasing ($\text{O:E ratio} = e^{0.049 \times (\text{year of surgery} - 1994) - 0.900}$, 95% CI of coefficient [0.030;0.067]). This means that the degree of overestimation has declined. When the largest study(72) was excluded from the analysis, a similar increasing trend was found ($\text{O:E ratio} = e^{0.010 \times (\text{year of surgery} - 1994) - 0.582}$, 95% CI of coefficient [-0.031;0.051]), although the confidence interval then included zero. Most results for the logistic EuroSCORE were non-significant because of the small number of studies. A small but significant increase in the O:E ratio over the years was found in the isolated CABG category ($\text{O:E ratio} = e^{0.075 \times (\text{year of surgery} - 1997) - 1.417}$, 95% CI of coefficient [0.032;0.118]). The expected mortality over time was also analyzed with regression analysis. The results were significant and showed a decrease of the reported expected mortality over the years, which is depicted in Figure 5. Studies using the additive model showed the same decrease of expected mortality over time ($\text{additive EuroSCORE} = e^{-0.101 \times (\text{year of surgery} - 1994) + 2.362}$, 95% CI of the coefficient [-0.121;-0.082]).


Figure 4: Weighted regression analysis of the calibration of the additive EuroSCORE and year of surgery

All studies applying the additive EuroSCORE were included in the analysis. $\text{O:E ratio} = e^{0.049 \times (\text{year of surgery} - 1994) - 0.900}$, 95% CI of the coefficient [0.030;0.067]. One circle represents one study and the size indicates its weight. The largest circle, representing the largest study(72), is scaled down by a factor of 15 to fit into the graph.

Figure 5: Weighted regression analysis of the mean logistic EuroSCORE and year of surgery

$\text{Logistic EuroSCORE} = e^{-0.044 \times (\text{year of surgery} - 1997) + 2.213}$, 95% CI of coefficient [-0.081;-0.007]. One circle represents one study and the size indicates its weight. Studies with only high risk or high age inclusion were not taken into account.



Discrimination of the EuroSCORE

The discriminative ability of the EuroSCORE was good, with average c-statistics between 0.7 and 0.8 in all surgical categories, except for the logistic EuroSCORE in the isolated valve category (Table 2). Over the years, the mean c-statistic of all studies remained close to 0.8 ($\text{c-statistic of additive EuroSCORE} = 0.002 \times (\text{year of surgery} - 1994) + 0.772$, 95% CI of coefficient [-0.0004;0.004], and $\text{c-statistic of logistic EuroSCORE} = -0.005 \times (\text{year of surgery} - 1997) + 0.803$, 95% CI of coefficient [-0.010;0.0004]).

Table 2: C-statistics of the EuroSCORE by surgical category

Model     Surgical category  N. patients  N. articles  C-statistic
                                                       Min.   Max.   Mean*
Additive  All studies        367039       47           0.64   0.89   0.78
Additive  Cardiac surgery    268800       20           0.70   0.86   0.78
Additive  Isolated CABG      71192        16           0.70   0.89   0.78
Additive  Isolated valve     7264         5            0.68   0.84   0.77
Additive  CABG and valve     26287        8            0.64   0.81   0.79
Logistic  All studies        194570       35           0.62   0.95   0.77
Logistic  Cardiac surgery    86347        10           0.70   0.84   0.80
Logistic  Isolated CABG      95652        12           0.71   0.95   0.77
Logistic  Isolated valve     11956        6            0.62   0.76   0.69
Logistic  CABG and valve     12952        9            0.65   0.81   0.73

* Means are weighted according to study size
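The size-weighted mean used for the c-statistics can be written out in a few lines. This is only an illustration of the weighting scheme named in the footnote: the function name is hypothetical and the two example studies are invented, not taken from the reviewed articles.

```python
def size_weighted_mean(values, sizes):
    """Mean of study-level values, each weighted by its number of patients."""
    if len(values) != len(sizes) or len(values) == 0:
        raise ValueError("values and sizes must be non-empty and equally long")
    total_size = sum(sizes)
    if total_size <= 0:
        raise ValueError("total study size must be positive")
    return sum(v * n for v, n in zip(values, sizes)) / total_size


# Hypothetical example: a small study with c = 0.70 and a larger one with c = 0.80.
pooled_c = size_weighted_mean([0.70, 0.80], [100, 300])
print(f"Size-weighted mean c-statistic: {pooled_c:.3f}")
```

The larger study pulls the pooled value towards its own c-statistic, which is the intended behavior when no variance estimates are available for inverse-variance weighting.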

Discussion

Principal findings

This systematic review shows that both the additive and the logistic EuroSCORE overestimate mortality in all surgical categories. The degree of overestimation depends on the preoperative risk of the patients. Furthermore, the overestimation is more prominent with the logistic than with the additive EuroSCORE. Previous reports on the performance of the EuroSCORE similarly concluded that the EuroSCORE overestimated mortality.(3;45) In high-risk patients, an underestimation of mortality by the additive EuroSCORE has previously been shown as well.(3;46;62;66;100) It was suggested that this phenomenon is inherent to the additive model(62) and thus can be resolved by using a logistic method.(103) This systematic review shows that the logistic model overestimates mortality in all risk groups.


Indeed, calibration of the logistic model appears to be less dependent on the preoperative risk of the patient, which means that it is more stable across risk groups. For these reasons, it is advisable that the logistic model is used whenever possible.

Trends in past years

A common explanation for the described overestimation is the hypothesis that the EuroSCORE is an outdated risk score. Therefore, a good calibration at the start and deterioration with progressing years were expected in the analyses. The EuroSCORE was constructed using data from patients operated on in 1995. Changes in indication for cardiac surgery and the increasing role of percutaneous intervention might have had an effect on patient characteristics. Technological improvements of pre-, peri- and postoperatively used equipment have likely reduced the risk of mortality. These changes could partly have accounted for the poor calibration in current practice.(3) However, our analysis demonstrated an opposite trend. The additive and logistic EuroSCORE showed O:E ratios below 1.0 from the beginning and a slow increase of O:E over the years. In other words, overestimation was present from the beginning (Figure 4) and deterioration of the score over time could not be demonstrated despite the 16 years that have passed. Another remarkable trend was found when studies were observed over time. Our results indicate a decrease in expected risk when all studies were pooled. Subgroup analyses for all surgical categories showed similar trends, although results were not significant. Previous reports from other large cohorts are ambiguous regarding this matter. The database of the Society of Thoracic Surgeons (STS) in the United States demonstrated that isolated CABG patients are currently older and sicker than before.(104) The Society of Cardio-thoracic Surgeons (SCTS) in Great Britain and Ireland reported an increase of patients with high risk in all surgical categories.(105) However, other large cohorts in Europe report no or a minimal increase in patient risk over time. In the Danish Heart Register a marginal increase of 0.02 is detected in the mean additive EuroSCORE from 2006 to 2010 (from 5.16 to 5.18 for isolated CABG and from 7.03 to 7.05 for valvular surgery).¹ The Netherlands Association for Cardio-Thoracic Surgery reports no rise in the logistic EuroSCORE from 2007 to 2009 (both 4.9% for isolated CABG and from 8.0% to 7.8% for aortic valve replacement).²

Valvular surgery

The EuroSCORE derivation dataset consisted of 63.6% isolated CABG and 29.8% valvular operations. It has previously been discussed that the type of procedure affects the outcome and that the influence of a risk factor differs across the types of procedures.(105) We found that the EuroSCORE performed worst in the isolated valve category regarding both calibration and discrimination. In a comment by Nashef(106) a less favorable discrimination

¹ The Danish Heart Register: http://www.dhreg.dk/
² Netherlands Association for Cardio-Thoracic Surgery: http://www.nvtnet.nl/


in valvular surgery was explained by exclusion of certain items of the EuroSCORE that allow the model to discriminate. However, studies that do not exclude EuroSCORE items also found a low c-statistic of 0.69.(57;75) Despite these examples of poor discrimination, the mean c-statistic for both EuroSCORE models remains above 0.7 (Table 2) and is thus sufficient. Some authors advocated the use of separate risk stratification models for valvular surgery.(93) However, the strength of the EuroSCORE is its general applicability to various kinds of cardiac surgery. Extra attention to risk factors related to valvular surgery should make it possible to keep one model with acceptable performance for cardiac surgery.

Purpose of the EuroSCORE

A model that is constructed for patient selection must meet other requirements than a risk model that is constructed for benchmarking. The first type of model should be simple enough for clinical use, and both calibration and discrimination are important. For the latter type, the main concern is a scrupulous calibration in all risk groups, so that fair comparisons between providers can be made.(107) The reported aim of the EuroSCORE was to aid quality assessment in surgical care. For this purpose the demonstrated good discrimination alone is not adequate and calibration is of higher importance. Unfortunately, the weakness of the EuroSCORE model is its calibration. The overestimation cannot simply be resolved by multiplying the expected risk by a certain factor (e.g. in this case multiplying the expected mortality by a factor of 0.5). Because the calibration differs across risk groups, the correction factor would have to be different for every risk group. Consequently, the model may not be accurate enough for benchmarking. It remains debatable whether the calibration difference across risk groups in the logistic EuroSCORE is clinically relevant for the sole purpose of patient selection.
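The argument that one correction factor cannot repair calibration when the O:E ratio differs across risk groups can be made concrete with a small numerical sketch. The three risk strata below are invented for illustration and do not come from the reviewed studies; the function name is hypothetical.

```python
def recalibrate_with_single_factor(strata):
    """Apply one overall correction factor and return the leftover O:E per stratum.

    strata: list of (observed_deaths, expected_deaths) tuples, one per risk group.
    Returns (factor, residual_oe), where factor is the overall O:E ratio and
    residual_oe holds each stratum's O:E ratio after the expected deaths
    have been multiplied by that factor.
    """
    total_observed = sum(o for o, _ in strata)
    total_expected = sum(e for _, e in strata)
    factor = total_observed / total_expected
    residual_oe = [o / (e * factor) for o, e in strata]
    return factor, residual_oe


# Hypothetical low-, medium- and high-risk strata with rising O:E ratios.
strata = [(10, 30), (20, 40), (30, 30)]
factor, residual_oe = recalibrate_with_single_factor(strata)
print(f"Overall correction factor: {factor:.2f}")
for label, ratio in zip(["low", "medium", "high"], residual_oe):
    print(f"{label} risk: O:E after correction = {ratio:.2f}")
```

After the correction the overall O:E ratio is 1.0 by construction, yet the low-risk stratum is still overestimated and the high-risk stratum is now underestimated. A provider treating mainly one risk group would therefore still be judged unfairly.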
In this issue EuroSCORE II is presented.(4) The goal of improving both calibration and discrimination fits the initial purpose of the score well.

Improving the model: EuroSCORE II

It is evident that improvement was required and has indeed been made with the updated model, EuroSCORE II.(4) Firstly, the score was developed using more patients from a diverse collection of countries all over the world. This increases the generalizability of the model. Secondly, the authors omitted the additive score. The problems regarding the calibration of the additive score, as discussed previously in this paper, are therefore no longer an issue. Thirdly, in order to enhance the performance of the model in valvular (and other concomitant) surgery, the number of major cardiac procedures performed is incorporated in the score calculation. Although other specific risk factors for valvular surgery were not added to EuroSCORE II, this modification is likely to improve the performance of the model


in valvular surgery. At the same time, the convenience of having one model for different types of cardiac surgery is maintained. Considering the calibration problems discussed in this article, the largest improvement is probably made with the total recalibration of the model. The intercept as well as all coefficients have been updated. This caused the calibration of the model to improve dramatically: the observed:expected ratio in the validation set is 0.94. Whether this can be reproduced in other datasets remains to be seen, yet this result is promising. In addition, several risk factors have been added or altered in some way (e.g. left ventricular function is now divided into four categories). Although the c-statistic of the former EuroSCORE was satisfactory, these changes are likely to further improve the discrimination of the model. Taking these changes together, the EuroSCORE has been improved in many ways and we expect the performance of EuroSCORE II to be superior to the former model.

Limitations and strengths

Potential limitations of this review include publication bias, quality of the reviewed studies and ecological bias. Publication bias is a potential problem in every meta-analysis, but particularly in meta-analyses of observational studies when compared to those of randomized trials.(108;109) Since statistically significant results are more likely to be published, the reviewed papers might not reflect the actual situation. Recently, Nashef commented on the performance of the EuroSCORE in valvular operations and also stated that publication bias might explain the overestimation.(106) In this review, however, funnel plots of the O:E ratios were not suggestive of publication bias. Nevertheless, as can be seen from the comparison of trends between our results and those from other cohorts, it remains unclear whether the reviewed papers are representative of all cardiac surgery. Therefore, publication bias should always be considered in the interpretation of results. The validity of this review is directly affected by the methodological quality of the included studies. According to guidelines on the conduct of prognostic studies, this feature is related to the representativeness of the study populations, loss to follow-up and measurement of prognostic factors and outcomes. Regarding the first point, it is impossible to evaluate whether the studied populations are representative of the entire cardiac surgery population. As to the second point, it is unlikely that loss to follow-up materially affected our results, because the studied outcome was early postoperative mortality, which means that follow-up time is short. Lastly, the measurement of predictors and outcome was inconsistent across articles. Some studies extracted data from existing databases that used other definitions than those in the EuroSCORE. The outcome ranged from in-hospital mortality to 30-day mortality and classic operative mortality. The impact of these differences in methodological quality is difficult to evaluate.


In this review including meta-regression analysis, aggregate data were used, i.e. no individual patient data. This may have led to ecological bias: the effect observed in the aggregate data is biased and would not be present if individual patient data were analyzed. To evaluate the calibration of the EuroSCORE across risk groups, an additional analysis was therefore performed using O:E ratios derived from each risk group within a study. The advantage of this analysis is that smaller numbers of patients are aggregated, which likely reduces possible ecological bias. Indeed, for the additive EuroSCORE the overestimation of mortality risk appeared constant when considering aggregate data only, whereas this overestimation appeared to be stronger in high-risk patients when considering stratified data. For the logistic EuroSCORE such discrepancy was not observed. Reported data were usually not stratified in time periods or years. Therefore, similar additional analyses for trends in time could not be performed with time-stratified data. Ecological bias is a common pitfall in the analysis of aggregate data and its possible occurrence should always be considered. One of the strengths of this review is its comprehensiveness. In this study 67 articles were used to evaluate the performance of the EuroSCORE. Hypotheses and results of individual studies could therefore be reliably confirmed or rejected. The only previous review on the EuroSCORE included six studies. Another advantage of this study is that extracted data were divided according to the surgical category. This gave a clear view on the poor performance of the EuroSCORE in different types of procedures and answered the question whether a separate score is needed for valvular surgery. Finally, all analyses performed in this review were weighted according to a random effects model. Consequently, data from large studies have more impact on the results than data from small studies. This made it possible to use all available studies, while limiting the influence of small studies.

Conclusions

This comprehensive review shows that neither the additive nor the logistic EuroSCORE adequately predicts operative mortality following cardiac surgery. The discrepancy between the expected and observed mortality differs across risk groups. Therefore, the EuroSCORE may not be suitable as a tool for patient selection and benchmarking of healthcare providers.


Supplemental Figure 1

Supplemental Figure 2

Reference List 1. Nilsson J, Algotsson L, Hoglund P, Luhrs C, Brandt J. Comparison of 19 pre-operative risk stratification models in open-heart surgery. Eur Heart J 2006 Apr;27(7):867-74. 2. Nashef SA, Roques F, Michel P, Gauducheau E, Lemeshow S, Salamon R. European system for cardiac operative risk evaluation (EuroSCORE). Eur J Cardiothorac Surg 1999 Jul;16(1):9-13. 3. Gogbashian A, Sedrakyan A, Treasure T. EuroSCORE: a systematic review of international performance. Eur J Cardiothorac Surg 2004 May;25(5):695-700. 4. Nashef SA, Roques F, Sharples LD, Nilsson J, Smith C, Goldstone AR, et al. EuroSCORE II. Eur J Cardiothorac Surg 2011;X(X):X. 5. Royston P, Moons KG, Altman DG, Vergouwe Y. Prognosis and prognostic research: Developing a prognostic model. BMJ 2009;338:b604. 6. Breslow NE, Day NE. Rates and rate standardization. In: Heseltine E, editor. Statistical Methods in Cancer Research: Vol. II - The Design and Analysis of Cohort Studies.Lyon: International Agency for Research on Cancer; 1987. p. 48-80. 7. PASW Statistics 17.0. Version 17.0.2. 2009. 8. R: A Language and Environment for Statistical Computing. Version 2.8.1. Vienna, Austria: R Foundation for Statistical Computing; 2008. 9. Engebretsen KV, Friis C, Sandvik L, Tonnessen T. Survival after CABG--better than predicted by EuroSCORE and equal to the general population. Scand Cardiovasc J 2009 Apr;43(2):123-8. 10. Gurler S, Gebhard A, Godehardt E, Boeken U, Feindt P, Gams E. EuroSCORE as a predictor for complications and outcome. Thorac Cardiovasc Surg 2003 Apr;51(2):73-7. 11. Holinski S, Claus B, Christ T, Kasperiunaite R, Konertz W. Overestimation of the operative risk by the EuroSCORE also in high-risk patients undergoing aortic valve replacement with a stentless biological prosthesis. Heart Surg Forum 2010 Feb 1;13(1):E13-E16. 12. Klinceva M, Widimsky P, Dohnalova A. 
Prospective use of EuroSCORE for the short-term risk evaluation of consecutive cardiac surgery candidates: are there any differences in prediction of perioperative risk versus risk of nonsurgical treatments? Vnitr Lek 2006 Dec;52(12):1156-61. 13. Mujicic E, Ivanusa M, Omerbasic E, Straus S, Perva O, Granov N. Application of EuroSCORE in “Heart center Sarajevo”. Bosn J Basic Med Sci 2007 Feb;7(1):52-4. 14. Sciangula A, Puddu PE, Schiariti M, Acconcia MC, Missiroli B, Papalia U, et al. Comparative application of multivariate models developed in Italy and Europe to predict early (28 days) and late (1 year) postoperative death after on- or off-pump coronary artery bypass grafting. Heart Surg Forum 2007;10(4):E258-E266. 15. Vanagas G, Kinduris S, Buivydaite K. Assessment of validity for EuroSCORE risk stratification system. Scand Cardiovasc J 2005 Apr;39(1-2):67-70. 16. Bakaeen FG, Chu D, Huh J, LeMaire SA, Soltero ER, Petersen NJ, et al. Contemporary outcomes of open thoracic aortic surgery in a veteran population: do risk models exaggerate mortality? Am J Surg 2009 Dec;198(6):889-94. 17. Danner BC, Didilis VN, Stojanovic T, Popov A, Grossmann M, Seipelt R, et al. A three-group model to predict mortality in emergent coronary artery bypass graft surgery. Ann Thorac Surg 2009 Nov;88(5):1433-9. 18. Jaussaud N, Gariboldi V, Giorgi R, Grisoli D, Chalvignac V, Thuny F, et al. Risk of reoperation for aortic bioprosthesis dysfunction. J Heart Valve Dis 2009 May;18(3):256-61. 19. Matsuura K, Ogino H, Matsuda H, Minatoya K, Sasaki H, Yagihara T, et al. Limitations of EuroSCORE for measurement of risk-stratified mortality in aortic arch surgery using selective cerebral perfusion: is advanced age no longer a risk? Ann Thorac Surg 2006 Jun;81(6):2084-7. 20. Mestres CA, Castro MA, Bernabeu E, Josa M, Cartana R, Pomar JL, et al. Preoperative risk stratification in infective endocarditis. Does the EuroSCORE model work? Preliminary results.
Eur J Cardiothorac Surg 2007 Aug;32(2):281-5. 21. Nishida T, Masuda M, Tomita Y, Tokunaga S, Tanoue Y, Shiose A, et al. The logistic EuroSCORE predicts the hospital mortality of the thoracic aortic surgery in consecutive 327 Japanese patients better than the additive EuroSCORE. Eur J Cardiothorac Surg 2006 Oct;30(4):578-82.


22. Piazza N, Wenaweser P, van GM, Pilgrim T, Tsikas A, Otten A, et al. Relationship between the logistic EuroSCORE and the Society of Thoracic Surgeons Predicted Risk of Mortality score in patients implanted with the CoreValve ReValving system--a Bern-Rotterdam Study. Am Heart J 2010 Feb;159(2):323-9. 23. Putman LM, van GM, Meijboom FJ, de Jong PL, Roos-Hesselink JW, Witsenburg M, et al. Seventeen years of adult congenital heart surgery: a single center experience. Eur J Cardiothorac Surg 2009 Jul;36(1):96-104. 24. Rasmussen RV, Bruun LE, Lund J, Larsen CT, Hassager C, Bruun NE. The impact of cardiac surgery in native valve infective endocarditis: Can euroSCORE guide patient selection? Int J Cardiol 2011 Jun;149(3):304-9. 25. Toumpoulis IK, Anagnostopoulos CE, Ioannidis JP, Toumpoulis SK, Chamogeorgakis T, Swistel DG, et al. The importance of independent risk-factors for long-term mortality prediction after cardiac surgery. Eur J Clin Invest 2006 Sep;36(9):599-607. 26. Barmettler H, Immer FF, Berdat PA, Eckstein FS, Kipfer B, Carrel TP. Risk-stratification in thoracic aortic surgery: should the EuroSCORE be modified? Eur J Cardiothorac Surg 2004 May;25(5):691-4. 27. Biancari F, Kangasniemi OP, Luukkonen J, Vuorisalo S, Satta J, Pokela R, et al. EuroSCORE predicts immediate and late outcome after coronary artery bypass surgery. Ann Thorac Surg 2006 Jul;82(1):57-61. 28. Huijskes RV, Wesselink RM, Noyez L, Rosseel PM, Klok T, van Straten BH, et al. Predictive models for thoracic aorta surgery. Is the Euroscore the optimal risk model in the Netherlands? Interact Cardiovasc Thorac Surg 2005 Dec;4(6):538-42. 29. Kurki TS, Jarvinen O, Kataja MJ, Laurikka J, Tarkka M. Performance of three preoperative risk indices; CABDEAL, EuroSCORE and Cleveland models in a prospective coronary bypass database. Eur J Cardiothorac Surg 2002 Mar;21(3):406-10. 30. Lurati Buse GA, Koller MT, Grapow M, Bruni CM, Kasper J, Seeberger MD, et al. 
12-month outcome after cardiac surgery: prediction by troponin T in combination with the European system for cardiac operative risk evaluation. Ann Thorac Surg 2009 Dec;88(6):1806-12. 31. Sastry P, Theologou T, Field M, Shaw M, Pullan DM, Fabri BM. Predictive accuracy of EuroSCORE: is end-diastolic dysfunction a missing variable? Eur J Cardiothorac Surg 2010 Feb;37(2):261-6. 32. Syed AU, Fawzy H, Farag A, Nemlander A. Predictive value of EuroSCORE and Parsonnet scoring in Saudi population. Heart Lung Circ 2004 Dec;13(4):384-8. 33. Vanagas G, Kinduris S. Assessing the validity of cardiac surgery risk stratification systems for CABG patients in a single center. Med Sci Monit 2005 May;11(5):CR215-CR218. 34. Langanay T, Verhoye JP, Ocampo G, Vola M, Tauran A, De La TB, et al. Current hospital mortality of aortic valve replacement in octogenarians. J Heart Valve Dis 2006 Sep;15(5):630-7. 35. Florath I, Albert A, Hassanein W, Arnrich B, Rosendahl U, Ennker IC, et al. Current determinants of 30-day and 3-month mortality in over 2000 aortic valve replacements: Impact of routine laboratory parameters. Eur J Cardiothorac Surg 2006 Nov;30(5):716-21. 36. Hirose H, Noguchi C, Inaba H, Tambara K, Yamamoto T, Yamasaki M, et al. The role of EuroSCORE in patients undergoing off-pump coronary artery bypass. Interact Cardiovasc Thorac Surg 2010 May;10(5):771-6. 37. Abildstrom SZ, Hvelplund A, Rasmussen S, Nielsen PH, Mortensen PE, Kruse M. Prognostic information in administrative co-morbidity data following coronary artery bypass grafting. Eur J Cardiothorac Surg 2010 Nov;38(5):573-6. 38. Ad N, Barnett SD, Speir AM. The performance of the EuroSCORE and the Society of Thoracic Surgeons mortality risk score: the gender factor. Interact Cardiovasc Thorac Surg 2007 Apr;6(2):192-5. 39. Al-Ruzzeh S, Asimakopoulos G, Ambler G, Omar R, Hasan R, Fabri B, et al. 
Validation of four different risk stratification systems in patients undergoing off-pump coronary artery bypass surgery: a UK multicenter analysis of 2223 patients. Heart 2003 Apr;89(4):432-5. 40. Antunes PE, Eugenio L, Ferrao de OJ, Antunes MJ. Mortality risk prediction in coronary surgery: a locally developed model outperforms external risk models. Interact Cardiovasc Thorac Surg 2007 Aug;6(4):437-41. 41. Asimakopoulos G, Al-Ruzzeh S, Ambler G, Omar RZ, Punjabi P, Amrani M, et al. An evaluation of existing risk stratification models as a tool for comparison of surgical performances for coronary artery bypass grafting between institutions. Eur J Cardiothorac Surg 2003 Jun;23(6):935-41. 109


Chapter 7

42. Au WK, Sun MP, Lam KT, Cheng LC, Chiu SW, Das SR. Mortality prediction in adult cardiac surgery patients: comparison of two risk stratification models. Hong Kong Med J 2007 Aug;13(4):293-7. 43. Barili F, Di GO, Capo A, Ardemagni E, Rosato F, Argenziano M, et al. Aortic valve replacement: reliability of EuroSCORE in predicting early outcomes. Int J Cardiol 2010 Oct 8;144(2):343-5. 44. Berman M, Stamler A, Sahar G, Georghiou GP, Sharoni E, Brauner R, et al. Validation of the 2000 Bernstein-Parsonnet score versus the EuroSCORE as a prognostic tool in cardiac surgery. Ann Thorac Surg 2006 Feb;81(2):537-40. 45. Bhatti F, Grayson AD, Grotte G, Fabri BM, Au J, Jones M, et al. The logistic EuroSCORE in cardiac surgery: how well does it predict operative risk? Heart 2006 Dec;92(12):1817-20. 46. Bridgewater B, Grayson AD, Jackson M, Brooks N, Grotte GJ, Keenan DJ, et al. Surgeon specific mortality in adult cardiac surgery: comparison between crude and risk stratified data. BMJ 2003 Jul 5;327(7405):13-7. 47. Calafiore AM, Di MM, Canosa C, Di GG, Iaco AL, Contini M. Early and late outcome of myocardial revascularization with and without cardiopulmonary bypass in high-risk patients (EuroSCORE > or = 6). Eur J Cardiothorac Surg 2003 Mar;23(3):360-7. 48. Campagnucci VP, Pinto E Silva AM, Pereira WL, Chamlian EG, Gandra SM, Rivetti LA. EuroSCORE and the patients undergoing coronary bypass surgery at Santa Casa de Sao Paulo. Rev Bras Cir Cardiovasc 2008 Jun;23(2):262-7. 49. Chen CC, Wang CC, Hsieh SR, Tsai HW, Wei HJ, Chang Y. Application of European system for cardiac operative risk evaluation (EuroSCORE) in coronary artery bypass surgery for Taiwanese. Interact Cardiovasc Thorac Surg 2004 Dec;3(4):562-5. 50. Collart F, Feier H, Kerbaul F, Mouly-Bandini A, Riberi A, Mesana TG, et al. Valvular surgery in octogenarians: operative risks factors, evaluation of Euroscore and long term results. Eur J Cardiothorac Surg 2005 Feb;27(2):276-80. 51. 
Dâ&#x20AC;&#x2122;Errigo P, Seccareccia F, Rosato S, Manno V, Badoni G, Fusco D, et al. Comparison between an empirically derived model and the EuroSCORE system in the evaluation of hospital performance: the example of the Italian CABG Outcome Project. Eur J Cardiothorac Surg 2008 Mar;33(3):325-33. 52. Dewey TM, Brown D, Ryan WH, Herbert MA, Prince SL, Mack MJ. Reliability of risk algorithms in predicting early and late operative outcomes in high-risk patients undergoing aortic valve replacement. J Thorac Cardiovasc Surg 2008 Jan;135(1):180-7. 53. Geissler HJ, Holzl P, Marohl S, Kuhn-Regnier F, Mehlhorn U, Sudkamp M, et al. Risk stratification in heart surgery: comparison of six score systems. Eur J Cardiothorac Surg 2000 Apr;17(4):400-6. 54. Ghazy T, Kappert U, Ouda A, Conen D, Matschke K. A question of clinical reliability: observed versus EuroSCORE-predicted mortality after aortic valve replacement. J Heart Valve Dis 2010 Jan;19(1):1620. 55. Grant SW, Grayson AD, Jackson M, Au J, Fabri BM, Grotte G, et al. Does the choice of risk adjustment model influence the outcome of surgeon-specific mortality analysis? A retrospective analysis of 14,637 patients under 31 surgeons. Heart 2008 Aug;94(8):1044-9. 56. Grossi EA, Schwartz CF, Yu PJ, Jorde UP, Crooke GA, Grau JB, et al. High-risk aortic valve replacement: are the outcomes as bad as predicted? Ann Thorac Surg 2008 Jan;85(1):102-6. 57. Gummert JF, Funkat A, Osswald B, Beckmann A, Schiller W, Krian A, et al. EuroSCORE overestimates the risk of cardiac surgery: results from the national registry of the German Society of Thoracic and Cardiovascular Surgery. Clin Res Cardiol 2009 Jun;98(6):363-9. 58. Heikkinen J, Biancari F, Satta J, Salmela E, Mosorin M, Juvonen T, et al. Predicting immediate and late outcome after surgery for mitral valve regurgitation with EuroSCORE. J Heart Valve Dis 2007 Mar;16(2):116-21. 59. Hirose H, Inaba H, Noguchi C, Tambara K, Yamamoto T, Yamasaki M, et al. 
EuroSCORE predicts postoperative mortality, certain morbidities, and recovery time. Interact Cardiovasc Thorac Surg 2009 Oct;9(4):613-7. 60. Huijskes RV, Rosseel PM, Tijssen JG. Outcome prediction in coronary artery bypass grafting and valve surgery in the Netherlands: development of the Amphiascore and its comparison with the Euroscore. Eur J Cardiothorac Surg 2003 Nov;24(5):741-9. 61. Iyem H. Evaluation of the reliability of the EuroSCORE risk analysis prediction in high-risk older patients undergoing CABG. Cardiovasc J Afr 2009 Nov;20(6):340-3.

110


Performance of the original EuroSCORE

62. Jin R, Grunkemeier GL. Additive vs. logistic risk models for cardiac surgery mortality. Eur J Cardiothorac Surg 2005 Aug;28(2):240-3. 63. Kaartama T, Heikkinen L, Vento A. An evaluation of mitral valve procedures using the European system for cardiac operative risk evaluation. Scand J Surg 2008;97(3):254-8. 64. Kalavrouziotis D, Li D, Buth KJ, Legare JF. The European System for Cardiac Operative Risk Evaluation (EuroSCORE) is not appropriate for withholding surgery in high-risk patients with aortic stenosis: a retrospective cohort study. J Cardiothorac Surg 2009;4:32. 65. Karabulut H, Toraman F, Alhan C, Camur G, Evrenkaya S, Dagdelen S, et al. EuroSCORE overestimates the cardiac operative risk. Cardiovasc Surg 2003 Aug;11(4):295-8. 66. Karthik S, Srinivasan AK, Grayson AD, Jackson M, Sharpe DA, Keenan DJ, et al. Limitations of additive EuroSCORE for measuring risk stratified mortality in combined coronary and valve surgery. Eur J Cardiothorac Surg 2004 Aug;26(2):318-22. 67. Kawachi Y, Nakashima A, Toshima Y, Arinaga K, Kawano H. Risk stratification analysis of operative mortality in heart and thoracic aorta surgery: comparison between Parsonnet and EuroSCORE additive model. Eur J Cardiothorac Surg 2001 Nov;20(5):961-6. 68. Kobayashi KJ, Williams JA, Nwakanma LU, Weiss ES, Gott VL, Baumgartner WA, et al. EuroSCORE predicts short- and mid-term mortality in combined aortic valve replacement and coronary artery bypass patients. J Card Surg 2009 Nov;24(6):637-43. 69. Kuduvalli M, Grayson AD, Au J, Grotte G, Bridgewater B, Fabri BM. A multi-center additive and logistic risk model for in-hospital mortality following aortic valve replacement. Eur J Cardiothorac Surg 2007 Apr;31(4):607-13. 70. Leontyev S, Walther T, Borger MA, Lehmann S, Funkat AK, Rastan A, et al. Aortic valve replacement in octogenarians: utility of risk stratification with EuroSCORE. Ann Thorac Surg 2009 May;87(5):14405. 71. 
Mesquita ET, Ribeiro A, Araujo MP, Campos LA, Fernandes MA, Colafranceschi AS, et al. Indicators of healthcare quality in isolated coronary artery bypass graft surgery performed at a tertiary cardiology center. Arq Bras Cardiol 2008 May;90(5):320-3. 72. Nashef SA, Roques F, Hammill BG, Peterson ED, Michel P, Grover FL, et al. Validation of European System for Cardiac Operative Risk Evaluation (EuroSCORE) in North American cardiac surgery. Eur J Cardiothorac Surg 2002 Jul;22(1):101-5. 73. Nilsson J, Algotsson L, Hoglund P, Luhrs C, Brandt J. Early mortality in coronary bypass surgery: the EuroSCORE versus The Society of Thoracic Surgeons risk algorithm. Ann Thorac Surg 2004 Apr;77(4):1235-9. 74. Nissinen J, Biancari F, Wistbacka JO, Loponen P, Teittinen K, Tarkiainen P, et al. Is it possible to improve the accuracy of EuroSCORE? Eur J Cardiothorac Surg 2009 Nov;36(5):799-804. 75. Osswald BR, Gegouskov V, Badowski-Zyla D, Tochtermann U, Thomas G, Hagl S, et al. Overestimation of aortic valve replacement risk by EuroSCORE: implications for percutaneous valve replacement. Eur Heart J 2009 Jan;30(1):74-80. 76. Ouattara A, Niculescu M, Ghazouani S, Babolian A, Landi M, Lecomte P, et al. Predictive performance and variability of the cardiac anesthesia risk evaluation score. Anesthesiology 2004 Jun;100(6):140510. 77. Parolari A, Pesce LL, Trezzi M, Loardi C, Kassem S, Brambillasca C, et al. Performance of EuroSCORE in CABG and off-pump coronary artery bypass grafting: single institution experience and metaanalysis. Eur Heart J 2009 Feb;30(3):297-304. 78. Pinna-Pintor P, Bobbio M, Colangelo S, Veglia F, Giammaria M, Cuni D, et al. Inaccuracy of four coronary surgery risk-adjusted models to predict mortality in individual patients. Eur J Cardiothorac Surg 2002 Feb;21(2):199-204. 79. Pitkanen O, Niskanen M, Rehnberg S, Hippelainen M, Hynynen M. Intra-institutional prediction of outcome after cardiac surgery: comparison between a locally derived model and the EuroSCORE. 
Eur J Cardiothorac Surg 2000 Dec;18(6):703-10. 80. Ranucci M, Castelvecchio S, Menicanti LA, Scolletta S, Biagioli B, Giomarelli P. An adjusted EuroSCORE model for high-risk cardiac patients. Eur J Cardiothorac Surg 2009 Nov;36(5):791-7. 81. Ranucci M, Guarracino F, Castelvecchio S, Baldassarri R, Covello RD, Landoni G. Surgical and transcatheter aortic valve procedures. The limits of risk scores. Interact Cardiovasc Thorac Surg 2010 Aug;11(2):138-41. 111


Chapter 7

82. Ranucci M, Castelvecchio S, Menicanti L, Frigiola A, Pelissero G. Accuracy, calibration and clinical performance of the EuroSCORE: can we reduce the number of variables? Eur J Cardiothorac Surg 2010 Mar;37(3):724-9. 83. Ribera A, Ferreira-Gonzalez I, Cascant P, Pons JM, Permanyer-Miralda G. The EuroSCORE and a local model consistently predicted coronary surgery mortality and showed complementary properties. J Clin Epidemiol 2008 Jul;61(7):663-70. 84. Schenk S, Fritzsche D, Atoui R, Koertke H, Koerfer R, Eitz T. EuroSCORE-predicted mortality and surgical judgment for interventional aortic valve replacement. J Heart Valve Dis 2010 Jan;19(1):5-15. 85. Sergeant P, de WE, Meyns B. Single center, single domain validation of the EuroSCORE on a consecutive sample of primary and repeat CABG. Eur J Cardiothorac Surg 2001 Dec;20(6):1176-82. 86. Stoica SC, Sharples LD, Ahmed I, Roques F, Large SR, Nashef SA. Preoperative risk prediction and intraoperative events in cardiac surgery. Eur J Cardiothorac Surg 2002 Jan;21(1):41-6. 87. Suojaranta-Ylinen RT, Kuitunen AH, Kukkonen SI, Vento AE, Salminen US. Risk evaluation of cardiac surgery in octogenarians. J Cardiothorac Vasc Anesth 2006 Aug;20(4):526-30. 88. Swart MJ, Joubert G. The EuroSCORE does well for a single surgeon outside Europe. Eur J Cardiothorac Surg 2004 Jan;25(1):145-6. 89. Tan JI, Allsopp TJ, Paterson HS, Byth K, Maclennan DR. Maintenance of quality in a small cardiac surgical unit. Heart Lung Circ 2008 Dec;17(6):484-7. 90. Toumpoulis IK, Anagnostopoulos CE, Derose JJ, Swistel DG. European system for cardiac operative risk evaluation predicts long-term survival in patients with coronary artery bypass grafting. Eur J Cardiothorac Surg 2004 Jan;25(1):51-8. 91. Toumpoulis IK, Anagnostopoulos CE, Swistel DG, DeRose JJ, Jr. Does EuroSCORE predict length of stay and specific postoperative complications after cardiac surgery? Eur J Cardiothorac Surg 2005 Jan;27(1):128-33. 92. Toumpoulis IK, Anagnostopoulos CE. 
Does EuroSCORE predict length of stay and specific postoperative complications after heart valve surgery? J Heart Valve Dis 2005 Mar;14(2):243-50. 93. van Gameren M, Kappetein AP, Steyerberg EW, Venema AC, Berenschot EA, Hannan EL, et al. Do we need separate risk stratification models for hospital mortality after heart valve surgery? Ann Thorac Surg 2008 Mar;85(3):921-30. 94. Vanagas G, Kinduris S, Leveckyte A. Comparison of various score systems for risk stratification in heart surgery. Medicina (Kaunas ) 2003;39(8):739-44. 95. Wang C, Yao F, Han L, Zhu J, Xu ZY. Validation of the European system for cardiac operative risk evaluation (EuroSCORE) in Chinese heart valve surgery patients. J Heart Valve Dis 2010 Jan;19(1):217. 96. Wendt D, Osswald BR, Kayser K, Thielmann M, Tossios P, Massoudy P, et al. Society of Thoracic Surgeons score is superior to the EuroSCORE determining mortality in high-risk patients undergoing isolated aortic valve replacement. Ann Thorac Surg 2009 Aug;88(2):468-74. 97. Yap CH, Reid C, Yii M, Rowland MA, Mohajeri M, Skillington PD, et al. Validation of the EuroSCORE model in Australia. Eur J Cardiothorac Surg 2006 Apr;29(4):441-6. 98. Youn YN, Kwak YL, Yoo KJ. Can the EuroSCORE predict the early and mid-term mortality after offpump coronary artery bypass grafting? Ann Thorac Surg 2007 Jun;83(6):2111-7. 99. Zheng Z, Li Y, Zhang S, Hu S. The Chinese coronary artery bypass grafting registry study: how well does the EuroSCORE predict operative risk for Chinese population? Eur J Cardiothorac Surg 2009 Jan;35(1):54-8. 100. Zingone B, Pappalardo A, Dreas L. Logistic versus additive EuroSCORE. A comparative assessment of the two models in an independent population sample. Eur J Cardiothorac Surg 2004 Dec;26(6):113440. 101. Farrokhyar F, Wang X, Kent R, Lamy A. Early mortality from off-pump and on-pump coronary bypass surgery in Canada: a comparison of the STS and the EuroSCORE risk prediction algorithms. Can J Cardiol 2007 Sep;23(11):879-83. 102. 
Xu J, Ge Y, Hu S, Song Y, Sun H, Liu P. A simple predictive model of prolonged intensive care unit stay after surgery for acquired heart valve disease. J Heart Valve Dis 2007 Mar;16(2):109-15. 103. Michel P, Roques F, Nashef SA. Logistic or additive EuroSCORE for high-risk patients? Eur J Cardiothorac Surg 2003 May;23(5):684-7.

112


Performance of the original EuroSCORE

104. Ferguson TB, Jr., Hammill BG, Peterson ED, DeLong ER, Grover FL. A decade of change--risk profiles and outcomes for isolated coronary artery bypass grafting procedures, 1990-1999: a report from the STS National Database Committee and the Duke Clinical Research Institute. Society of Thoracic Surgeons. Ann Thorac Surg 2002 Feb;73(2):480-9. 105. The Society for Cardio-thoracic Surgery in Great Britain & Ireland. Sixth National Adult Cardiac Surgical Database Report 2008, Demonstrating quality. 2008. 106. Nashef SA. EuroSCORE and heart valve surgery. J Heart Valve Dis 2010 Jan;19(1):1-4. 107. Omar RZ, Ambler G, Royston P, Eliahoo J, Taylor KM. Cardiac surgery risk modeling for mortality: a review of current practice and suggestions for improvement. Ann Thorac Surg 2004 Jun;77(6):22327. 108. Altman DG. Systematic reviews of evaluations of prognostic variables. BMJ 2001 Jul 28;323(7306):2248. 109. Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR. Publication bias in clinical research. Lancet 1991 Apr 13;337(8746):867-72.

113


Chapter 8: Limitations of ranking lists based on cardiac surgery mortality rates

Siregar S, Groenwold RHH, Jansen EK, Bots ML, van der Graaf Y, van Herwerden LA.
Circ Cardiovasc Qual Outcomes. 2012 May;5(3):403-9.

Abstract

Background
Ranking lists are a common way of reporting performance in cardiac surgery. Rankings have been shown to be imprecise, yet the extent of this imprecision is unknown. We aimed to determine the precision of, and fluctuations in, ranking lists in the comparison of cardiac surgery mortality rates.

Methods
Information on all adult cardiac surgery patients in all 16 cardio-thoracic centers in The Netherlands from 1 January 2007 until 31 December 2009 was extracted from the database of the Netherlands Association for Cardio-Thoracic Surgery (n=46883). Ranks were assessed using crude mortality rates and using adjusted mortality rates from a random effects logistic regression model. Risk adjustment was performed using the logistic EuroSCORE. Statistical precision of ranks was assessed with 95% confidence intervals. Additional analyses were performed for isolated coronary artery bypass grafting (CABG) patients.

Results
The ranking lists based on mortality rates in three consecutive years showed considerable reshuffling. When all data were pooled, the mean width of the 95% confidence intervals was 10 ranks using crude and 8 ranks using adjusted mortality rates. The large overlap of the confidence intervals across hospitals indicates that rank statistics were not materially different. Results were similar in the isolated CABG subgroup.

Conclusions
Rankings are an imprecise statistical method to report cardiac surgery mortality rates and are prone to random fluctuation. Hence, reshuffling of ranks can be expected solely due to chance. Therefore, we strongly discourage the use of ranking lists in the comparison of mortality rates.



Limitations of ranking lists

Introduction

Football leagues, college and university rankings, the Thomson Reuters league tables in business: in every sector, teams, institutions or companies are ranked on their performance. Ranking lists are convenient because they present results simply: one can see at a glance who is performing well and who is not. The history of ranking lists in cardiac intervention outcomes goes back to 1987, when the Health Care Financing Administration published Medicare cardiac surgery mortality rates in the US.(1;2) It was the start of public scrutiny in the field of cardiac surgery, which continues to this day.

However, such lists could have major consequences in cardiac surgery and the rest of healthcare. After all, feedback on institutions, regulatory interventions, marketing strategies and, of course, the choices of physicians and patients might all be influenced by the results. As an example, previously published ranking lists of cardiac surgery mortality rates in New York State led 20% of the bottom-quartile surgeons to relocate or cease practicing within two years.(3)

Given these potential consequences, ranking lists should be precise and reliable. However, rankings have been criticized before for their limitations in the comparison of institutional performance. In 1996 Goldstein and Spiegelhalter showed that ranks are misleading when they are interpreted without taking into account their statistical imprecision, i.e. chance variation.(4) The New York State cardiac surgery lists showed massive reshuffling of ranks from year to year. Almost half of the surgeons had moved to the other half of the ranking list in one year, suggesting a substantial impact of chance variation.(5) However, it is unknown to what extent differences in ranking lists reflect real differences and to what extent they merely reflect random variation.

Since cardiac intervention outcomes are increasingly being evaluated using peer comparison, it is crucial to know whether ranking lists are a suitable format for doing so. Although the statistical limitations have been discussed before(4), ranking lists of cardiac intervention outcomes have never been evaluated using patient data. Since 2007, the Netherlands Association for Cardio-Thoracic Surgery has collected data on all adult cardiac surgery. Using this clinical database, our aim was to determine the precision of, and fluctuations in, ranking lists in the comparison of cardiac surgery mortality rates across centers.


Methods

Data
Information was extracted from the database of the Netherlands Association for Cardio-Thoracic Surgery. All records of adult patients undergoing cardiac surgery in all 16 cardio-thoracic centers in The Netherlands from 1 January 2007 until 31 December 2009 were used, comprising 46,883 surgical procedures. The dataset consisted of demographic characteristics, details on the intervention, in-hospital mortality and risk factors for mortality after cardiac surgery, notably the EuroSCORE variables.(6)

Within-hospital and between-hospital variability
When a variable is compared across hospitals, two sources of variability must be distinguished: variability due to chance (i.e. within-hospital variability) and variability due to systematic differences between hospitals (i.e. between-hospital variability). To study these, distribution plots of the variable can be drawn from the collected data for each center. These can then be used to calculate the mean for each center and its corresponding 95% confidence interval. Accordingly, one can decide whether the differences seen across centers can mainly be attributed to within-hospital variability or to between-hospital variability. For example, wide and overlapping distribution plots or confidence intervals indicate large within-hospital variability and small between-hospital variability.

Confidence intervals around ranks
In contrast to variables such as age, distribution plots and confidence intervals for ranking statistics cannot be constructed from the data in such a straightforward way. Instead, a simulation technique called bootstrapping was used, which is a flexible way of evaluating the random variation in empirical data.(7) Within each center, samples as large as the original sample were drawn from the database with replacement. Resampling was performed 1000 times, yielding 1000 simulated databases. A ranking list was constructed in each of these new databases, resulting in 1000 simulated ranks for each center. The distributions of the simulated ranks were then used to calculate the mean rank and 95% confidence interval (i.e. the interval between the 2.5% and 97.5% quantiles) for each center.

Analysis
Ranking lists for each year (2007, 2008 and 2009) were constructed based on crude mortality rates as well as risk-adjusted mortality rates using the logistic EuroSCORE. For the latter, a random effects logistic regression model was used, with mortality as the dependent variable, the logistic EuroSCORE as independent variable and hospital as grouping factor. This random effects model accounts for within-hospital variability and between-hospital variability.(8-10) Hospitals were ranked according to their random intercepts, which reflect the between-hospital variation. We updated the logistic EuroSCORE model in our data by including it in the regression model as an independent variable (equivalent to an adjustment of the original intercept).(11)

To assess the precision of ranks, all data of 2007, 2008 and 2009 were combined and bootstrapping was applied. This resulted in mean ranks and accompanying 95% confidence intervals for each center. To investigate the possible effects of the heterogeneity of procedures on the precision of ranks, all analyses were repeated in a subgroup of only isolated coronary artery bypass grafting (CABG) procedures.
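The random effects model described in the Analysis can be written compactly as below. This is only a sketch: the notation is introduced here for illustration and does not appear in the original text. Y_ij denotes in-hospital death of patient i in hospital j, x_ij that patient's logistic EuroSCORE (the scale on which it enters the model is not stated), and u_j the random intercept of hospital j. The normal distribution of u_j is the conventional assumption for such models rather than something the text specifies.

```latex
\operatorname{logit}\,\Pr(Y_{ij} = 1) = \beta_0 + \beta_1 x_{ij} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Under this formulation, hospitals are ordered by their estimated u_j to obtain the risk-adjusted ranking list.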

Results

Table 1 shows the types of cardiac surgery included in the database, the mean logistic EuroSCORE and the mortality rates for all years separately and combined. Approximately half of the interventions were isolated CABG and over a third of all procedures involved valvular surgery. The mean mortality rate over three years was 3% and the mean EuroSCORE was 7%. Hospital volumes ranged from approximately 500 to 2000 patients per year and roughly 1600 to 5700 for the three years combined. When hospital volume was included in the benchmarking model, no significant effect was found (when categorized into 5 volume classes, all p-values were above 0.15), which suggests that volume had no effect on the risk of mortality in our data.

Table 1. Characteristics of data set

                      2007              2008              2009              All years
                      N = 15195         N = 15776         N = 15912         N = 46883
CABG                  71.8% (7.7)       69.8% (8.3)       68.4% (8.4)       69.8% (7.8)
Isolated CABG         54.6% (8.8)       54.2% (8.7)       52.8% (8.4)       53.8% (8.3)
Valvular surgery      38.5% (7.8)       38.2% (8.5)       39.1% (7.0)       38.6% (7.5)
Valve and CABG        14.8% (2.9)       13.8% (2.6)       13.9% (2.2)       14.1% (2.2)
Logistic EuroSCORE    7.1% (1.1)        7.0% (0.9)        7.1% (1.0)        7.1% (0.9)
Mortality             3.1% (0.8)        3.3% (0.8)        2.7% (0.8)        3.1% (0.6)
Center volume
  range               446 - 1953        523 - 1933        583 - 1969        1599 - 5730
  quartiles           621 / 843 / 1133  721 / 837 / 1165  745 / 857 / 1136  2176 / 2388 / 3434

Analyzed on hospital level: values indicate the means of 16 hospitals, with standard deviations between brackets. Volumes are reported as ranges and quartiles. CABG: coronary artery bypass grafting.


Figure 1 shows the ranking lists based on crude and risk-adjusted mortality rates for the separate years 2007, 2008 and 2009. Reshuffling of ranks is observed across the years using both methods: not a single hospital maintains its rank throughout the three years.

Figure 1: Ranking lists based on crude and risk-adjusted mortality rates of all 16 cardio-thoracic surgery centers in The Netherlands for the years 2007, 2008 and 2009 separately.
[Figure: "Ranking lists 2007-2009", two panels (crude mortality, risk-adjusted mortality); axes: year (2007, 2008, 2009) and rank (1 to 16); centers are labeled A to P.]
Risk adjustment was performed using a random-effects model. Reshuffling of the ranks is seen across the years in both panels.

The distributions of the simulated ranks are presented in Figure 2. There is large overlap in the distributions of ranks both when crude and adjusted mortality rates are used. A few narrow and peaked curves at the high and low end of the ranking list can be distinguished in the figure. However, most hospitals contribute to the agglomeration of curves in the wide middle segment of the plot. This illustrates that the highest and lowest ranked hospitals are consistently ranked in high and low positions respectively, despite random variability (due to chance). However, most hospitals are in the middle part of the ranking lists, where the flat and wide distribution curves indicate that the hospital ranks are likely to fluctuate due to chance.
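The bootstrap procedure from the Methods that produces these rank distributions can be sketched as follows. This is a minimal illustration for crude mortality only, not the code used in the study: the function names, the toy data and the convention that rank 1 means the lowest mortality rate are all assumptions made here. The risk-adjusted variant would refit the random effects model in every resample, which is not shown.

```python
import random
from collections import Counter


def rank_centers(rates):
    """Map each center to its rank; rank 1 = lowest mortality rate (assumed convention)."""
    ordered = sorted(rates, key=lambda center: (rates[center], center))
    return {center: position + 1 for position, center in enumerate(ordered)}


def bootstrap_ranks(outcomes, n_boot=1000, seed=2009):
    """Resample patients within each center and re-rank the centers each time.

    outcomes: dict mapping center label -> list of 0/1 in-hospital deaths.
    Returns a dict mapping center label -> list of n_boot simulated ranks.
    """
    rng = random.Random(seed)
    simulated = {center: [] for center in outcomes}
    for _ in range(n_boot):
        rates = {}
        for center, deaths in outcomes.items():
            # Sample as large as the original, drawn with replacement.
            resample = rng.choices(deaths, k=len(deaths))
            rates[center] = sum(resample) / len(resample)
        for center, rank in rank_centers(rates).items():
            simulated[center].append(rank)
    return simulated


def rank_distribution(ranks, n_centers):
    """Proportion of simulations in which a center holds each rank."""
    counts = Counter(ranks)
    return [counts.get(rank, 0) / len(ranks) for rank in range(1, n_centers + 1)]


# Hypothetical toy data: three centers with made-up volumes and death counts.
toy = {
    "A": [1] * 20 + [0] * 980,
    "B": [1] * 30 + [0] * 970,
    "C": [1] * 33 + [0] * 967,
}
simulated = bootstrap_ranks(toy, n_boot=200)
for center in toy:
    print(center, rank_distribution(simulated[center], n_centers=len(toy)))
```

Each printed list corresponds to one curve of the kind shown in Figure 2: the proportion of resamples in which that center occupied each rank.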


Figure 2: The distributions of the ranks of all 16 cardio-thoracic surgery centers in The Netherlands using pooled data from 2007 to 2009.
[Figure: "Distribution of ranks (2007-2009)", two panels (crude mortality, risk-adjusted mortality); axes: rank (1 to 16) and proportion (0.0 to 0.3).]
Each curve represents the distribution of the simulated ranks in one center. Ranks are assessed with crude mortality rates and adjusted mortality rates (using a random effects logistic regression model). Much overlap in the distribution of the ranks is seen, indicating that most ranks do not significantly differ.

The distribution plots of ranks can be translated into 95% confidence intervals, as seen in Figure 3. Wide intervals are seen, with much overlap across hospitals. For the ranking list constructed using crude mortality rates the average width of all confidence intervals was 10 ranks, and for the list using risk-adjusted mortality it was 8 ranks. This indicates that the ranks are imprecise. The two highest ranked hospitals have ranks that differ significantly from those of the two lowest ranked, because their confidence intervals do not overlap. As with the distribution plots in Figure 2, this means that hospitals at the top of the list are unlikely to end up at the bottom of the list, or vice versa, merely on account of chance. However, all other kinds of reshuffling of ranks are very likely to happen due to chance variability, because of the strong overlap of confidence intervals.

We identified 25,095 isolated CABG procedures performed from 2007 until 2009. Subgroup analysis of only isolated CABG procedures showed similar results. Wide confidence intervals of the ranks were seen with both crude and adjusted mortality: the mean width of the confidence intervals was 11 ranks, as shown in Figure 4.
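As an illustration of how simulated ranks translate into confidence intervals, widths and overlap, consider the sketch below. It is not the study's code: the four centers W to Z and their simulated ranks are made-up values, the function names are hypothetical, and the quantile method (`method="inclusive"`) is one of several reasonable choices that the text does not specify.

```python
import statistics


def rank_interval(ranks):
    """Return (mean rank, 2.5% quantile, 97.5% quantile) of simulated ranks."""
    # n=40 gives cut points at multiples of 2.5%; the first and last bound the 95% interval.
    cuts = statistics.quantiles(ranks, n=40, method="inclusive")
    return statistics.mean(ranks), cuts[0], cuts[-1]


def intervals_overlap(first, second):
    """True when two (lower, upper) intervals share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]


# Hypothetical simulated ranks for four centers over 100 resamples (made-up values).
# Position k in every list belongs to the same resample, so each resample is a full ranking.
simulated = {
    "W": [1] * 48 + [1] * 48 + [2] * 2 + [1] * 2,
    "X": [2] * 48 + [3] * 48 + [1] * 2 + [2] * 2,
    "Y": [3] * 48 + [2] * 48 + [3] * 2 + [4] * 2,
    "Z": [4] * 48 + [4] * 48 + [4] * 2 + [3] * 2,
}

intervals = {center: rank_interval(ranks)[1:] for center, ranks in simulated.items()}
widths = [upper - lower for lower, upper in intervals.values()]
mean_width = sum(widths) / len(widths)

print(intervals)
print("mean width:", mean_width)
print("X and Y overlap:", intervals_overlap(intervals["X"], intervals["Y"]))
print("W and Z overlap:", intervals_overlap(intervals["W"], intervals["Z"]))
```

In this toy example the middle centers X and Y have overlapping intervals and so cannot be distinguished, whereas the top and bottom centers W and Z can, which mirrors the pattern described above.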


Figure 3: Ranks of all 16 cardio-thoracic surgery centers in The Netherlands and their 95% confidence intervals using pooled data from 2007 to 2009.
[Figure: "95% Confidence intervals 2007-2009", two panels (crude mortality, risk-adjusted mortality); axes: center (A to P) and rank (1 to 16).]
Each point represents one center; bars indicate 95% confidence intervals. Ranks are assessed with crude mortality rates and adjusted mortality rates (using a random effects logistic regression model). Much overlap in the confidence intervals of the ranks is seen, indicating that most ranks do not significantly differ.

Figure 4: Ranks of all 16 cardio-thoracic surgery centers in The Netherlands and their 95% confidence intervals using pooled data from 2007 to 2009, based on isolated CABG procedures only.
[Figure: "Isolated CABG, 95% Confidence intervals 2007-2009", two panels (crude mortality, risk-adjusted mortality); axes: center (A to P) and rank (1 to 16).]
Each point represents one center; bars indicate 95% confidence intervals. Ranks are assessed with crude mortality rates and adjusted mortality rates (using a random effects logistic regression model). As with all procedures combined, much overlap in the confidence intervals of the ranks is seen, indicating that most ranks do not significantly differ.


Discussion

Principal findings

We used data on cardiac surgery in all 16 centers in The Netherlands to investigate the precision of ranking lists of cardiac surgery mortality rates. This study showed that ranking statistics were very imprecise. Ranks were likely to fluctuate merely due to chance and were thus unstable. The results held true for both crude and risk-adjusted mortality rates.

Statistical imprecision and relativity of ranks

When mortality rates are considered, a distinction must be made between variability caused by systematic differences in the mortality rates (between-hospital variability) and variability caused by chance (within-hospital variability). If this chance variability is not taken into account, differences between hospitals are exaggerated and do not reflect the true between-hospital variability. In addition to this within-hospital variability, rank statistics have another source of variability due to chance: correlation between ranks. Therefore, the confidence intervals of ranks are even wider than those of mortality rates, which are not correlated. The problem is best illustrated by the following example: when a hospital moves from rank 6 to rank 1, all hospitals ranked from 1 to 5 will go down one rank even without any changes in their underlying mortality rates. In other words, in a ranking list a hospital can move in rank without any change in its own underlying mortality rate, simply because another hospital changed. This also means that the mortality rate of a center is always directly compared to that of other hospitals (relative scale) and cannot be interpreted on its own (absolute scale). Even when a center has a significantly higher or lower rank than other centers, this merely indicates a relative performance. High and low ranks do not necessarily imply absolute high or low performance. Moreover, it has been shown that this form of direct comparison of hospitals is only valid when case-mix between hospitals is comparable and should otherwise not be performed.(12)

The width of the confidence intervals represents the extent of chance variation that should be taken into account. The wide confidence intervals of ranks thus indicate a large amount of random variation, which is likely to cause reshuffling of ranks merely by chance. In other words, the statistical imprecision of ranks causes the ranks to reflect random variation instead of systematic differences in mortality rates. Without knowledge of the imprecision of the estimates, one would be unaware of the fact that most values actually do not differ significantly. By definition, a ranking list requires centers to be ranked even when the differences are negligible. Hence, simple ranking lists ignore both the uncertainty around the estimates and the magnitude of the differences.
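The effect of chance on ranks can be made concrete with a small simulation. The following is a minimal sketch in Python, not the analysis code used in this study. It assumes that the number of deaths in each hospital follows a binomial distribution around the observed rate, redraws the deaths many times, ranks the hospitals in every draw and reads the 95% interval of each rank from the percentiles of the simulated ranks. The function name `rank_intervals` and the five hospitals with their volumes and death counts are hypothetical.

```python
import numpy as np


def rank_intervals(deaths, volumes, n_sim=2000, seed=0):
    """Simulate 95% intervals for hospital ranks (rank 1 = lowest mortality)."""
    rng = np.random.default_rng(seed)
    deaths = np.asarray(deaths)
    volumes = np.asarray(volumes)
    rates = deaths / volumes
    # Redraw the number of deaths in every hospital from a binomial distribution.
    sim_deaths = rng.binomial(volumes, rates, size=(n_sim, len(volumes)))
    sim_rates = sim_deaths / volumes
    # Rank the hospitals within each simulated data set.
    # Ties are broken by position in the list, a simplification.
    order = sim_rates.argsort(axis=1, kind="stable")
    ranks = order.argsort(axis=1, kind="stable") + 1
    lower = np.percentile(ranks, 2.5, axis=0)
    upper = np.percentile(ranks, 97.5, axis=0)
    return lower, upper


# Hypothetical example: five hospitals with 1000 procedures each.
deaths = [10, 25, 27, 30, 60]
volumes = [1000] * 5
lower, upper = rank_intervals(deaths, volumes)
for name, lo, hi in zip("VWXYZ", lower, upper):
    print(f"hospital {name}: rank interval {lo:.0f} to {hi:.0f}")
```

In this hypothetical setting the rank intervals of the first and the last hospital do not overlap, whereas the three hospitals in the middle, with rates of 2.5%, 2.7% and 3.0%, can exchange ranks by chance alone.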


Previous studies on this topic reached similar conclusions. Ranstam showed that even small amounts of missing data can considerably increase the margin of error around ranks, and Feudtner also found wide confidence intervals around ranks based on mortality rates.(13;14) When only one type of procedure was analyzed, in this case isolated CABG, results were similar. The average confidence interval was even slightly wider than that resulting from all procedures, because of the smaller sample sizes. This indicates that the fluctuations and imprecision of ranks cannot be attributed to the heterogeneity of the population.

Consequences for the use of ranking lists

The extent of imprecision and fluctuation of ranking lists depends on the sample sizes and on the differences in the underlying mortality rates between hospitals. Regarding the first, nearly 47,000 procedures in 16 centers over a period of 3 years were included in our study. In reality even larger sample sizes are hard to realize, and more stable ranking lists will be difficult to accomplish for that reason alone. In addition, the variation in hospital volumes is not likely to have affected our results, considering that hospital volume had no significant effect when it was included in the benchmarking model. Regarding the second, larger differences between the hospital mortality rates will likely result in less overlap of distributions and confidence intervals. For example, between the highest and lowest ranked hospitals there was a large difference in the underlying mortality rate, which resulted in fairly stable ranks. One could hypothesize that in a population with greatly diverging mortality rates between hospitals, ranking lists could be more stable than our results might suggest. The relation between within-hospital and between-hospital variance is described as “rankability” by Van Dishoeck.(15) Rankability is large when the differences between hospitals dominate the within-hospital variance. Yet, even in that case our general conclusions would hold: 1) the interpretation of ranking lists always requires knowledge about variability due to chance, because this makes it possible to distinguish systematic differences from random variation, and 2) chance variability is larger in ranking statistics than in mortality rates, because ranks represent a relative scale and are correlated with each other.

Considering that simple ranking lists are never reported with confidence intervals or any other measure of precision, we strongly discourage their use in the comparison of cardiac surgery mortality rates. The importance of reporting the margin of error around rank estimates is emphasized by other authors as well.(13;14;16-19) Misinterpretation or plain neglect of the uncertainty surrounding ranks or any other measure will lead to flawed conclusions. Considering the unmerited consequences this might have for some centers, this must be avoided at all times.


Alternatives to ranking lists

The limitations of ranking lists should not be an impediment to outcomes evaluation and provider profiling. Whether outcomes are publicly reported or compared in peer confidentiality, data collection and feedback seem to be associated with improved outcomes and should therefore be pursued.(20;21) Fortunately, other approaches have been proposed that avoid the issues inherent to ranking, yet still report differences in mortality rates. Lingsma and Steyerberg propose to use expected ranks.(22) These are rank statistics based on the probability that a hospital performs worse than any other hospital in the ranking list. The expected ranks incorporate the magnitude of the difference and thus allow for subtle differences between hospitals (e.g. rank 4, 5 and 6 vs. expected rank 4.1, 4.2 and 5.9). The advantage of this type of “ranking” is that the probabilities of hospitals performing better than any other (i.e. the expected ranks) are not as strongly correlated as usual ranks. When the performance of one hospital changes, the expected ranks of other hospitals do not necessarily have to shift as well. This makes the expected “rank” not so much a rank as another derived measure to compare hospitals. The disadvantage of expected ranks is that statistical imprecision cannot be read from the ranks, nor is it shown as a confidence interval. Again, this makes it difficult to distinguish systematic differences from random variation. Similar to usual ranks, expected ranks enable evaluation of outcomes in a relative way, but not in an absolute way.

A more common approach to avoid ranking lists is to compare each hospital against one value. This method is based on the identification of statistical outliers. For example, the STS national database uses a three-tiered rating system for its composite quality scores. Usually, identification of outliers is achieved by assessing the confidence intervals of mortality rates and determining their overlap with the overall average mortality rate. When no overlap exists, the mortality rate is significantly different from the overall rate and the hospital concerned is considered to be an outlier.(12) Other, more statistically advanced techniques include Bayesian analysis to investigate statistical difference with an overall value. When the main goal of evaluating mortality rates is quality control and improvement, these types of methods might be more suitable than ranking lists.

Possible limitations

Because the goal of this study was to investigate the stability of ranking lists and not to find the optimal approach to compare hospitals, many other issues in the comparison of hospital-specific mortality rates were not discussed. This complex subject is extensively discussed in many other papers.(20) The major concerns are in the area of risk adjustment models, differences in treated patients (case-mix), and unmeasured risk factors. The importance of risk adjustment was apparent from the major reshuffling of ranks when crude mortality rates were adjusted for risk. Because the logistic EuroSCORE model is known to have poor calibration, we recalibrated the model in our data and achieved adequate


model performance.(23) However, it can be debated whether the EuroSCORE model is the best method for risk adjustment and whether unmeasured risk factors have caused differences in mortality rates as well. Although much discussion continues on this topic, there is no reason to assume that another risk adjustment model would lead to different conclusions concerning the large fluctuations in ranks. Both unadjusted and adjusted mortality rates yielded the same results in this matter.

Conclusion

In conclusion, rankings are an imprecise statistical method to report cardiac surgery mortality rates. The 95% confidence intervals of most ranks in the ranking list strongly overlap. As a consequence, reshuffling of ranks can be expected solely due to chance, and this was indeed observed over a period of three years. Therefore, we strongly discourage the use of ranking lists for the purpose of comparison of risk-adjusted cardiac surgery mortality rates.


Reference List

1. Adult cardiac surgery in New York State 2005-2007. Albany, New York: New York State Department of Health; 2010 Apr.
2. Hospital Guide 2010. What makes a good hospital? Dr Foster Intelligence; 2010 Nov.
3. Jha AK, Epstein AM. The predictive accuracy of the New York State coronary artery bypass surgery report-card system. Health Aff (Millwood) 2006;25:844-55.
4. Goldstein H, Spiegelhalter DJ. League tables and their limitations: statistical issues in comparisons of institutional performance. Journal of the Royal Statistical Society Series A (Statistics in Society) 1996;159:385-443.
5. Green J, Wintfeld N. Report cards on cardiac surgeons. Assessing New York State’s approach. N Engl J Med 1995;332:1229-32.
6. Nashef SA, Roques F, Michel P, Gauducheau E, Lemeshow S, Salamon R. European system for cardiac operative risk evaluation (EuroSCORE). Eur J Cardiothorac Surg 1999;16:9-13.
7. Metropolis N, Ulam S. The Monte Carlo Method. Journal of the American Statistical Association 1949;44:335-41.
8. Lingsma HF, Steyerberg EW, Eijkemans MJ, Dippel DW, Scholte Op Reimer WJ, Van Houwelingen HC. Comparing and ranking hospitals based on outcome: results from The Netherlands Stroke Survey. QJM 2010;103:99-108.
9. Thomas N, Longford NT, Rolph JE. A statistical framework for severity adjustment of hospital mortality rates. Rand, Santa Monica (CA); 1992.
10. Thomas N, Longford NT, Rolph JE. Empirical Bayes methods for estimating hospital-specific mortality rates. Stat Med 1994;13:889-903.
11. Janssen KJ, Moons KG, Kalkman CJ, Grobbee DE, Vergouwe Y. Updating methods improved the performance of a clinical prediction model in new patients. J Clin Epidemiol 2008;61:76-86.
12. Shahian DM, Normand SL. Comparison of “risk-adjusted” hospital outcomes. Circulation 2008;117:1955-63.
13. Ranstam J, Wagner P, Robertsson O, Lidgren L. Health-care quality registers: outcome-orientated ranking of hospitals is unreliable. J Bone Joint Surg Br 2008;90:1558-61.
14. Feudtner C, Berry JG, Parry G, Hain P, Morse RB, Slonim AD, Shah SS, Hall M. Statistical uncertainty of mortality rates and rankings for children’s hospitals. Pediatrics 2011;128:e966-e972.
15. van Dishoeck AM, Lingsma HF, Mackenbach JP, Steyerberg EW. Random variation and rankability of hospitals using outcome indicators. BMJ Qual Saf 2011;20:869-74.
16. Marshall EC, Spiegelhalter DJ. Reliability of league tables of in vitro fertilisation clinics: retrospective analysis of live birth rates. BMJ 1998;316:1701-4.
17. Parry GJ, Gould CR, McCabe CJ, Tarnow-Mordi WO. Annual league tables of mortality in neonatal intensive care units: longitudinal study. International Neonatal Network and the Scottish Neonatal Consultants and Nurses Collaborative Study Group. BMJ 1998;316:1931-5.
18. van Dishoeck AM, Looman CW, van der Wilden-van Lier EC, Mackenbach JP, Steyerberg EW. Displaying random variation in comparing hospital performance. BMJ Qual Saf 2011;20:651-7.
19. Jacobs R, Goddard M, Smith PC. How robust are hospital ranks based on composite performance measures? Med Care 2005;43:1177-84.
20. Shahian DM, Normand SL, Torchiana DF, Lewis SM, Pastore JO, Kuntz RE, Dreyer PI. Cardiac surgery report cards: comprehensive review and statistical critique. Ann Thorac Surg 2001;72:2155-68.
21. Hannan EL, Sarrazin MS, Doran DR, Rosenthal GE. Provider profiling and quality improvement efforts in coronary artery bypass graft surgery: the effect on short-term mortality among Medicare beneficiaries. Med Care 2003;41:1164-72.
22. Lingsma HF, Eijkemans MJ, Steyerberg EW. Incorporating natural variation into IVF clinic league tables: The Expected Rank. BMC Med Res Methodol 2009;9:53.
23. Siregar S, Groenwold RH, de HF, Bots ML, van der GY, van Herwerden LA. Performance of the original EuroSCORE. Eur J Cardiothorac Surg 2012;41:746-54.



9 Gaming in risk-adjusted mortality rates: Effect of misclassification of risk factors in the benchmarking of cardiac surgery risk-adjusted mortality rates

Siregar S, Groenwold RHH, Versteegh MIM, Noyez L, Ter Burg WJPP, Bots ML, van der Graaf Y, van Herwerden LA J Thorac Cardiovasc Surg. 2012 Mar;145(3):781-9


Chapter 9

Abstract

Background
Upcoding or undercoding of risk factors could affect benchmarking of risk-adjusted mortality rates. The aim was to investigate the impact of misclassification of risk factors on the benchmarking of mortality rates after cardiac surgery.

Methods
A prospective cohort was used comprising all adult cardiac surgery patients in all 16 cardiothoracic centers in The Netherlands from 1 January 2007 until 31 December 2009. A random effects model including the logistic EuroSCORE was used to benchmark in-hospital mortality rates. We simulated upcoding and undercoding of five selected variables in patients from one center. These patients were selected randomly (nondifferential misclassification) or by EuroSCORE (differential misclassification).

Results
In random patients substantial misclassification was required to affect benchmarking: a 1.8-fold increase in the prevalence of four risk factors changed an underperformer into an average performing center. Upcoding of a single variable required even more. When patients with the highest EuroSCORE were upcoded (i.e. differential misclassification), a 1.1-fold increase was sufficient: moderate LVF from 14.2 to 15.7%, poor LVF from 8.4 to 9.3%, recent myocardial infarction from 7.9 to 8.6% and extracardiac arteriopathy from 9.0 to 9.8%.

Conclusions
Benchmarking based on EuroSCORE-adjusted mortality rates can be manipulated by misclassification of risk factors. Misclassification of random patients or of single variables has little effect. However, limited upcoding of multiple risk factors in high-risk patients can greatly influence benchmarking. To minimize “gaming”, the prevalence of all risk factors should be carefully monitored.


Introduction

As early as 1911 Ernest A. Codman, a surgeon at the Massachusetts General Hospital, recorded and publicly reported errors and outcomes of his patients. He made annual reports on the errors and outcomes in his center and sent them all over the US in order to stimulate others to do the same.(1) In more recent years, interest in the performance of healthcare providers grew rapidly with the 1987 Healthcare Financing Administration publication of Medicare cardiac surgery mortality rates in the US.(2) It caused uproar among cardiac surgeons, who claimed that the figures were inappropriately adjusted for patient severity. This in turn fuelled existing efforts to establish a comprehensive national database and generally applicable risk adjustment models, enabling fair comparison of outcomes between centers. Many risk adjustment models for cardiac surgery were developed in the following years, such as the EuroSCORE, the Parsonnet score and the STS score.(3-5)

Before mortality rates across centers can be compared using these models, the validity of the data must be assessed. A commonly discussed issue is data accuracy. Inter-observer variability and ambiguous variable definitions could cause unintentional undercoding and upcoding, which lead to false prevalence rates of risk factors. In addition, in order to “improve” apparent clinical performance, risk factors might be intentionally upcoded to exaggerate patient severity. This phenomenon is also called “gaming”.(6-8) Considering that usually only a small part of the data is audited,(9) if any at all, risk factors used for adjustment are particularly prone to intended misclassification. Although this is a major concern, the exact effect of this form of gaming on the benchmarking of centers by risk-adjusted mortality rates is unknown. Therefore, the aim of this study was to investigate and quantify the effect of misclassification of risk factors for mortality after cardiac surgery on benchmarking based on EuroSCORE-adjusted mortality rates.

Methods

Data

The national database of the Netherlands Association for Cardio-Thoracic Surgery (NVT) was used for analysis. Data on all adult cardiac surgery in all 16 cardio-thoracic centers in The Netherlands from 1 January 2007 to 31 December 2009 were extracted from the database. In total 47,539 surgical procedures were included. The dataset comprises date and type of intervention, anonymized patient information, risk factors and outcome. Risk factors for cardiac mortality were defined according to the EuroSCORE model.(4) Outcome was measured as in-hospital mortality.


The completeness of the NVT database is exceptionally high, with all 16 centers participating and approximately 99% complete cases. The very low percentage of missing values is unique for a database of this size. Currently, audits in the form of site visits are held to further investigate and improve the quality of the data. Trend analysis showed no signs of improper data collection across the years [results available on request].

Risk adjustment and benchmarking

To make a fair comparison of mortality rates between centers, ideally all centers would have to treat exactly the same patients. The differences in mortality could then be attributed completely to differences in the medical care that is offered, instead of the type of patients that are treated. As this cannot happen in reality, so-called risk adjustment methods are used to make mortality risks across centers comparable. They adjust mortality rates for preoperative patient risk (patient severity) and constitute a fundamental element in the comparison of outcome between centers.(10) We applied the logistic EuroSCORE model to adjust for preoperative risk. The EuroSCORE model is the most commonly used risk adjustment method in the Netherlands and its definitions are used in the NVT database.

In the comparison between centers, an important distinction should be made between random (chance) variability and systematic differences between the centers. If this distinction is not accounted for, random variation due to chance may be considered a systematic difference, thereby overestimating the between-center variation.(11) In our analysis we used a random effects model, which separates chance variation from systematic variation between centers.(8;12) Several authors have recommended this method in the comparison of outcomes between centers.(8;12) A random effects model was fit with in-hospital mortality as outcome variable and the logistic EuroSCORE as covariate. One fixed intercept and a random intercept for each center were modelled. This regression model thus assumed that mortality is partly explained by patient characteristics (i.e. disease severity quantified by the EuroSCORE) and partly by a center effect, which is specific to each center and can be compared across centers (for more details we refer to Appendix 1). From each center effect (i.e. random intercept) a risk of mortality can be calculated for any value of the EuroSCORE. To improve interpretation we report the risk of mortality for the different centers based on a (hypothetical) patient with a median value of the EuroSCORE.

Variables to misclassify

Variables to misclassify were chosen based on the clinical probability of misclassification, the weight of the variable in the EuroSCORE model and the prevalence in the database. The variables age and gender were not taken into account, because of the minimal likelihood of misclassification of these variables. These considerations resulted in the selection of


the following variables: moderate and poor left ventricular function (LVF), recent myocardial infarction, extracardiac arteriopathy and pulmonary disease.

Upcoding and undercoding

When “gaming”, the aim is to increase expected mortality rates by upcoding, thus artificially increasing the prevalence of risk factors. However, unintentional misclassification might also involve undercoding. Therefore, we chose to analyze upcoding as well as undercoding. In our analysis we assumed there was no misclassification in the current database (i.e. the reference). To simulate gaming, we upcoded selected variables in random patients in the reference database, also called nondifferential misclassification. In addition, we introduced differential misclassification by upcoding variables in patients with the highest EuroSCORE, since the impact of upcoding was expected to be largest in these patients. The number of patients that are upcoded is expressed as a multiplicative factor. It refers to the relative increase in prevalence compared to the original situation. For example, if the prevalence of a risk factor is increased from 10 to 15%, this is denoted as a factor 1.5, or a 1.5-fold increase in prevalence. Conversely, when the prevalence is decreased from 10 to 5%, this is denoted as a factor 0.5 (see Appendix 2).

Outliers were defined as the centers that differ significantly from the overall risk of mortality (for a patient with the median value of the EuroSCORE). Centers with a risk of mortality of which the 95% confidence interval did not cover the overall risk of mortality (for a patient with the median value of the EuroSCORE) were considered to be outliers. We simulated misclassification in one center, while the risk factors in all other centers remained unchanged. Misclassification of a single risk factor as well as concurrent misclassification of multiple risk factors was simulated. The extent of misclassification (i.e. the number of patients upcoded or undercoded) was increased until the benchmarking results were affected. For upcoding this was the case when a high-mortality outlier became an average center or an average center became a low mortality outlier. For undercoding this was the case when a low mortality outlier became an average center or an average center became a high-mortality outlier. Finally, we simulated nondifferential misclassification in all centers. This refers to the situation in which all centers upcode four variables concurrently in random patients. We chose to simulate an increase in prevalence by a factor 1.3.

Analyses

First, benchmarking in the original database was performed. The logistic EuroSCORE was calculated for each patient.(13) Then, a nonlinear random effects model with logit-link was fit with in-hospital mortality as the dependent variable, the logistic EuroSCORE as an independent variable, and a random intercept for each center.(12) The random intercepts


are used to calculate center-specific risks of a patient with the median logistic EuroSCORE (Appendix 1). The regression coefficient of the logistic EuroSCORE variable can be considered as a correction factor to recalibrate the EuroSCORE in our data.(14) The fixed part of the intercept refers to the overall risk of mortality. Statistical uncertainty was addressed by estimating 95% confidence intervals (CI) of the random intercepts for all centers using the posterior variances.(12) All simulations were repeated 1000 times, yielding 1000 new datasets for each scenario. Each dataset yielded random intercepts for the centers. These were averaged over the 1000 simulations. All simulations and analyses were performed in R version 2.10.(15) Simulation code is available on request.
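The two forms of upcoding described above can be sketched as follows. This is an illustration in Python under stated assumptions, not the R code used for the study. The function `upcode`, its arguments and the simulated patients are hypothetical. The sketch only changes the coded risk factor; in a full analysis the logistic EuroSCORE would be recalculated from the upcoded variables and the random effects model refitted. We assume here that differential upcoding selects, among the patients without the risk factor, those with the highest EuroSCORE.

```python
import numpy as np


def upcode(risk_factor, factor, score=None, rng=None):
    """Return a copy of a 0/1 risk factor with its prevalence multiplied by `factor`.

    score=None  -> nondifferential: patients to upcode are chosen at random.
    score given -> differential: patients with the highest score are upcoded first.
    """
    if rng is None:
        rng = np.random.default_rng()
    coded = np.asarray(risk_factor, dtype=int).copy()
    n_original = int(coded.sum())
    n_target = int(round(n_original * factor))
    candidates = np.flatnonzero(coded == 0)  # patients without the risk factor
    n_extra = min(n_target - n_original, candidates.size)
    if n_extra <= 0:
        return coded
    if score is None:
        chosen = rng.choice(candidates, size=n_extra, replace=False)
    else:
        by_score = np.argsort(np.asarray(score)[candidates])[::-1]
        chosen = candidates[by_score[:n_extra]]
    coded[chosen] = 1
    return coded


# Hypothetical center: 1000 patients, 10% with the risk factor.
rng = np.random.default_rng(1)
n_patients = 1000
risk_factor = np.zeros(n_patients, dtype=int)
risk_factor[rng.choice(n_patients, size=100, replace=False)] = 1
euroscore = rng.uniform(1.0, 40.0, size=n_patients)  # made-up scores

random_upcoded = upcode(risk_factor, 1.5, rng=rng)             # nondifferential
high_risk_upcoded = upcode(risk_factor, 1.5, score=euroscore)  # differential
print(risk_factor.mean(), random_upcoded.mean(), high_risk_upcoded.mean())
```

Both calls raise the prevalence from 10 to 15%, that is, by a factor 1.5; they differ only in which patients receive the additional risk factor.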

Results

The prevalence of the risk factors in all centers is summarized in Table 1. Some EuroSCORE variables had a large variation in prevalence. For example, the center-specific prevalence of moderate LVF ranged from 8.2% to 40.9%. For unstable angina and other than isolated CABG the prevalence varied from 2.2% to 13.8% and from 29.7% to 62.2%, respectively. The risk of mortality for a patient with the median logistic EuroSCORE (3.9%) in each center is shown in Figure 1. The dotted line shows the overall risk of mortality for a patient with the median logistic EuroSCORE. Four outliers could be identified, meaning that the confidence intervals of these centers did not cover the overall risk of mortality (for a patient with the median value of the EuroSCORE): centers A and B were low mortality outliers and centers O and P were high-mortality outliers. The prevalence of risk factors in these outliers is listed in Table 1. For about one half of the risk factors, the prevalence in the high and low outliers was significantly different from that in the rest of the centers.

The results of the simulated nondifferential upcoding (i.e. in random patients) are shown in Table 2. Centers H and I are average centers in which upcoding was performed to such an extent that they became low mortality outliers. The upcoding of one or two variables requires a 4- to 13-fold increase in prevalence to achieve this. When only extracardiac arteriopathy or pulmonary disease was upcoded, the benchmark result of center H could not be affected. Concurrent upcoding of four variables by a factor 2.7 and 2.4, respectively, led these average centers to become low mortality outliers. In the high-mortality outliers results are comparable. With single variables, at least a doubling of the prevalence of risk factors is needed to turn these centers into average centers.
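The outlier rule applied in these results, where a center is an outlier when the 95% confidence interval of its risk excludes the overall risk, can be written down in a few lines. The sketch below is in Python and rests on assumptions: `fixed_lp` stands for the fixed part of the linear predictor for a patient with the median EuroSCORE, the uncertainty of this fixed part is ignored, and the center effects and standard errors are made-up numbers rather than estimates from the NVT database.

```python
import math


def expit(x):
    """Inverse logit: convert a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def classify_center(center_effect, se, fixed_lp, z=1.96):
    """Classify a center from its random intercept and the standard error of it."""
    overall = expit(fixed_lp)  # overall risk for the median patient
    lower = expit(fixed_lp + center_effect - z * se)
    upper = expit(fixed_lp + center_effect + z * se)
    if upper < overall:
        status = "low-mortality outlier"
    elif lower > overall:
        status = "high-mortality outlier"
    else:
        status = "average"
    return status, (lower, upper)


# Hypothetical values: overall risk of 2% and three invented centers.
fixed_lp = math.log(0.02 / 0.98)
centers = {"X": (-0.40, 0.15), "Y": (0.10, 0.15), "Z": (0.50, 0.20)}
results = {name: classify_center(u, se, fixed_lp) for name, (u, se) in centers.items()}
for name, (status, (lower, upper)) in results.items():
    print(f"center {name}: {status}, 95% CI {lower:.3f} to {upper:.3f}")
```

Because the transformation to the probability scale is monotone, the rule is equivalent to checking whether the interval of the random intercept excludes zero.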

134


Gaming in risk-adjusted mortality rates

Table 1: Center-specific prevalence of risk factors, overall and in outliers

| Risk factor | All centers | Low mortality outliers (A and B) | P* | High-mortality outliers (O and P) | P* | Coefficient in EuroSCORE (13) |
|---|---|---|---|---|---|---|
| Age (continuous) a | 65.9 (63.0-67.2) | 66.4 | <0.01 | 65.7 | 0.10 | 0.067 |
| Female | 30.0 (24.8-33.1) | 30.6 | 0.26 | 29.7 | 0.57 | 0.330 |
| Serum creatinine >200 μmol/l | 2.0 (0.8-3.3) | 2.0 | 0.97 | 2.1 | 0.40 | 0.652 |
| Extracardiac arteriopathy | 12.2 (9.0-16.5) | 11.9 | 0.42 | 13.4 | <0.01 | 0.656 |
| Pulmonary disease | 11.3 (7.4-17.3) | 13.6 | <0.01 | 11.0 | 0.35 | 0.493 |
| Neurological dysfunction | 3.5 (1.1-8.8) | 4.8 | <0.01 | 3.8 | 0.19 | 0.842 |
| Previous cardiac surgery | 7.3 (2.5-14.4) | 7.5 | 0.55 | 9.0 | <0.01 | 1.003 |
| Recent myocardial infarct | 12.3 (3.4-17.8) | 12.9 | 0.13 | 15.1 | <0.01 | 0.546 |
| LVEF 30-50% | 19.4 (8.2-40.9) | 23.2 | <0.01 | 15.1 | <0.01 | 0.419 |
| LVEF <30% | 5.5 (2.7-8.4) | 5.9 | 0.12 | 5.7 | 0.31 | 1.094 |
| Systolic pulmonary pressure >60 mmHg | 3.2 (1.2-9.3) | 2.1 | <0.01 | 1.9 | <0.01 | 0.768 |
| Active endocarditis | 1.4 (0.7-2.4) | 1.7 | 0.04 | 1.4 | 0.89 | 1.101 |
| Unstable angina | 6.2 (2.2-13.8) | 5.9 | 0.44 | 4.9 | <0.01 | 0.568 |
| Emergency operation | 6.5 (2.4-10.1) | 8.3 | <0.01 | 6.3 | 0.46 | 0.713 |
| Critical preoperative state | 4.7 (2.2-8.7) | 7.4 | <0.01 | 2.8 | <0.01 | 0.906 |
| Ventricular septal rupture | 0.2 (0.1-0.5) | 0.2 | 0.64 | 0.2 | 0.63 | 1.462 |
| Other than isolated coronary surgery | 45.5 (29.7-62.2) | 44.4 | 0.01 | 45.5 | 0.99 | 0.542 |
| Thoracic aortic surgery | 5.4 (1.2-14.4) | 3.6 | <0.01 | 5.6 | 0.33 | 1.160 |

Data presented as median, with range in parentheses, unless otherwise noted. EuroSCORE, European system for cardiac operative risk evaluation; LVEF, left ventricular ejection fraction. *P value of difference with other centers.

Figure 1: Benchmarking of 16 centers performing cardio-thoracic surgery in The Netherlands

Black diamonds indicate risk of mortality of a patient with the median European system for cardiac operative risk evaluation (EuroSCORE) value (3.9) in each center. Lines indicate corresponding 95% confidence intervals (CIs).


Table 2: Upcoding required to affect benchmarking: random patients upcoded

Upcoding of random patients until outlier status was converted. Centers H and I: from average to low mortality outlier. Centers O and P: from high-mortality outlier to average.

| Sc. | Risk factor upcoded | Center H | Center I | Center O | Center P |
|---|---|---|---|---|---|
| 1 | moderate LVF | 12.6% → 65.8% | 14.3% → 67.8% | 15.4% → 44.8% | 14.2% → 31.3% |
| | poor LVF | 2.7% → 14.4% | 2.7% → 12.9% | 4.7% → 13.7% | 8.4% → 18.6% |
| | Factor | 5.25 | 4.75 | 2.9 | 2.2 |
| 2 | poor LVF | 2.7% → 37.1% | 2.7% → 32.6% | 4.7% → 27.1% | 8.4% → 27.4% |
| | Factor | 13.5 | 12.0 | 5.75 | 3.25 |
| 3 | recent MI | 9.6% → 98.5% | 15.7% → 94.5% | 17.8% → 80.3% | 7.9% → 57.2% |
| | Factor | 10.25 | 6.0 | 4.5 | 7.25 |
| 4 | extracardiac arteriopathy | 14.8% → 100% a | 11.7% → 80.5% | 15.1% → 62.1% | 9.0% → 48.4% |
| | Factor | Not possible | 6.9 | 4.1 | 5.4 |
| 5 | pulmonary disease | 11.5% → 100% a | 11.9% → 100% | 11.2% → 77.4% | 10.4% → 67.8% |
| | Factor | Not possible | 8.4 | 6.9 | 6.5 |
| 6 | moderate LVF | 12.6% → 33.9% | 14.3% → 34.3% | 15.4% → 27.8% | 14.2% → 25.6% |
| | poor LVF | 2.7% → 7.4% | 2.7% → 6.5% | 4.7% → 8.5% | 8.4% → 15.2% |
| | recent MI | 9.6% → 25.9% | 15.7% → 37.7% | 17.8% → 32.1% | 7.9% → 14.2% |
| | extracardiac arteriopathy | 14.8% → 39.9% | 11.7% → 28.0% | 15.1% → 27.3% | 9.0% → 16.1% |
| | Factor | 2.7 | 2.4 | 1.8 | 1.8 |

Centers H and I were average centers in which random patients were upcoded to convert the centers to low-mortality outliers; centers O and P were high-mortality outliers in which random patients were upcoded to convert outlier status to average. Data presented as percentages for original prevalence and minimum prevalence needed to convert from average to low outlier or high outlier to average center. LVF, left ventricular function; MI, myocardial infarction. a Upcoding of specific variables in all patients did not result in a change in benchmarking result; the center remained an average center.

Table 3 presents the results of differential upcoding, in which patients with the highest EuroSCORE are misclassified. The extent of upcoding required to affect benchmarking is considerably lower for all variables. For the high-mortality outliers, centers O and P, a 25% increase (relative to the original prevalence) in moderate and poor LVF will falsely change their benchmark status into average. When four variables were upcoded, a limited increase of 7% and 10% for centers O and P, respectively, was sufficient. Upcoding in center P is illustrated in Figure 2.


Table 3: Upcoding required to affect benchmarking: high-risk patients upcoded

Upcoding of high-risk patients until outlier status was converted. Centers H and I: from average to low-mortality outlier; centers O and P: from high-mortality outlier to average.

| Sc. | Risk factor upcoded | Center H | Center I | Center O | Center P |
|---|---|---|---|---|---|
| 1 | moderate LVF | 12.6% → 27.6% | 14.3% → 24.3% | 15.4% → 19.3% | 14.2% → 17.8% |
|   | poor LVF | 2.7% → 6.0% | 2.7% → 4.6% | 4.7% → 5.9% | 8.4% → 10.5% |
|   | Factor | 2.2 | 1.7 | 1.25 | 1.25 |
| 2 | poor LVF | 2.7% → 6.0% | 2.7% → 5.6% | 4.7% → 6.6% | 8.4% → 10.5% |
|   | Factor | 2.2 | 2.05 | 1.4 | 1.25 |
| 3 | recent MI | 9.6% → 75.4% | 15.7% → 46.4% | 17.8% → 33.0% | 7.9% → 14.2% |
|   | Factor | 7.85 | 2.95 | 1.85 | 1.80 |
| 4 | extracardiac arteriopathy | 14.8% → 100%a | 11.7% → 25.7% | 15.1% → 20.4% | 9.0% → 13.8% |
|   | Factor | Not possible | 2.2 | 1.35 | 1.55 |
| 5 | pulmonary disease | 11.5% → 100%a | 11.9% → 80.9% | 11.2% → 24.7% | 10.4% → 20.3% |
|   | Factor | Not possible | 6.8 | 2.2 | 1.95 |
| 6 | moderate LVF | 12.6% → 14.9% | 14.3% → 16.3% | 15.4% → 16.5% | 14.2% → 15.7% |
|   | poor LVF | 2.7% → 3.2% | 2.7% → 3.1% | 4.7% → 5.0% | 8.4% → 9.3% |
|   | recent MI | 9.6% → 11.4% | 15.7% → 17.9% | 17.8% → 19.1% | 7.9% → 8.6% |
|   | extracardiac arteriopathy | 14.8% → 17.6% | 11.7% → 13.3% | 15.1% → 16.2% | 9.0% → 9.8% |
|   | Factor | 1.19 | 1.14 | 1.07 | 1.1 |

Centers H and I were average centers in which high-risk patients were selectively upcoded to convert the centers to low-mortality outliers; centers O and P were high-mortality outliers in which high-risk patients were selectively upcoded to convert outlier status to average. Data presented as percentages for original prevalence and minimum prevalence needed to convert from average to low outlier or high outlier to average center. LVF, Left ventricular function; MI, myocardial infarction. a Upcoding of specific variables in all patients did not result in a change in benchmarking result; the center remained an average center.

Undercoding is presented in Table 4. It shows that undercoding in one variable is not likely to affect benchmark results, as all or nearly all patients would have to be coded as not having the risk factor. Center A can be falsely benchmarked as an average center when all patients with moderate LVF, poor LVF, recent MI and extracardiac arteriopathy are misclassified as not having these risk factors. For center B undercoding of these four variables in one third of all patients is required. The average center H cannot be converted into a high-mortality outlier by undercoding. For center I this can only be achieved when there is concurrent undercoding in the four previously mentioned variables, leaving only 6% of the original number of patients with the risk factor coded as such (scenario 6). This means that average centers are unlikely to be falsely benchmarked as high-mortality outliers merely due to undercoding.


Figure 2: Upcoding in center P results in the center being falsely benchmarked as average

Gray diamonds and black squares indicate the risk of mortality of a patient with the median European system for cardiac operative risk evaluation (EuroSCORE) value (3.9) in each center. Gray diamonds were determined from reference data; black squares from upcoded data in center P. The hospital was initially benchmarked as a high-mortality outlier, but changed into an average center after upcoding. This was achieved by increasing the prevalence of the variables moderate and poor LVF in high-risk patients by a factor of 1.25 (relative to the original prevalence). The extent of upcoding required for other centers and variables is listed in Table 2.

Figure 3 illustrates the results of the scenario in which all centers upcode to some extent (30% relative to the original prevalence of a risk factor). It shows that misclassification in all centers does not change the results of benchmarking. When the risk of mortality decreases in all centers, the overall risk of mortality against which we benchmark decreases as well. As a result, the benchmark results remain approximately the same.


Table 4: Undercoding required to affect benchmarking: random patients undercoded

Undercoding of random patients until outlier status was converted. Centers A and B: from low-mortality outlier to average; centers H and I: from average to high-mortality outlier.

| Sc. | Risk factor undercoded | Center A | Center B | Center H | Center I |
|---|---|---|---|---|---|
| 1 | moderate LVF | 8.2% → 0% | 31.7% → 14.2% | 12.6% → 0% | 14.3% → 0%a |
|   | poor LVF | 2.9% → 0%a | 7.6% → 3.4% | 2.7% → 0%a | 2.7% → 0%a |
|   | Factor | Not possible | 0.45 | Not possible | Not possible |
| 2 | poor LVF | 2.9% → 0%a | 7.6% → 0.8% | 2.7% → 0%a | 2.7% → 0%a |
|   | Factor | Not possible | 0.1 | Not possible | Not possible |
| 3 | recent MI | 12.6% → 0%a | 13.0% → 0%a | 9.6% → 0%a | 15.7% → 0%a |
|   | Factor | Not possible | Not possible | Not possible | Not possible |
| 4 | extracardiac arteriopathy | 14.9% → 0%a | 10.2% → 0%a | 14.8% → 0%a | 11.7% → 0%a |
|   | Factor | Not possible | Not possible | Not possible | Not possible |
| 5 | pulmonary disease | 12.9% → 0%a | 14.1% → 0%a | 11.5% → 0%a | 11.9% → 0%a |
|   | Factor | Not possible | Not possible | Not possible | Not possible |
| 6 | moderate LVF | 8.2% → 0% | 31.7% → 20.6% | 12.6% → 0%a | 14.3% → 0.8% |
|   | poor LVF | 2.9% → 0% | 7.6% → 4.9% | 2.7% → 0%a | 2.7% → 0.2% |
|   | recent MI | 12.6% → 0% | 13% → 8.5% | 9.6% → 0%a | 15.7% → 0.9% |
|   | extracardiac arteriopathy | 14.9% → 0% | 10.2% → 6.6% | 14.8% → 0%a | 11.7% → 0.6% |
|   | Factor | 0.0 | 0.65 | Not possible | 0.06 |

Centers A and B were low-mortality outliers in which random patients were undercoded until outlier status converted to average; centers H and I were average centers in which random patients were undercoded to convert the centers to high-mortality outliers. Data presented as percentages for original prevalence and prevalence needed to convert low outliers to average centers or average centers to high outliers. LVF, Left ventricular function; MI, myocardial infarction. a Undercoding of the specific variables in all patients did not result in a change in benchmarking result; the center retained its original status.


Figure 3: Upcoding of the variables moderate left ventricular function (LVF), poor LVF, recent myocardial infarction and extracardiac arteriopathy in all centers

Gray diamonds and black squares indicate the risk of mortality of a patient with the median European system for cardiac operative risk evaluation (EuroSCORE) value (3.9) in each center. Gray diamonds were determined from reference data; black squares from upcoded data in all centers. The prevalence of the mentioned risk factors was increased by 30% in all centers. The benchmarking results remained approximately the same.

Discussion

Principal findings
Misclassification of variables used for risk adjustment has an effect on benchmarking of centers in the Netherlands based on mortality rates. Extensive misclassification is required to cause small changes when random patients are upcoded. However, benchmarking can be severely distorted by limited upcoding of multiple variables in high-risk patients.

Gaming
Many have expressed their concern about “gaming” of risk factors in the evaluation of risk-adjusted outcome.(6;7;16) After implementation of the Cardiac Surgery Reporting System in New York State, the prevalence of risk factors increased. This caused the predicted mortality to increase and the risk-adjusted mortality to decrease state-wide.(7) In addition, there was a 73% increase in high-risk cases from 1990 to 1992 in New York State.(17) The question, however, was whether “gaming” was involved and how this may have affected the benchmark results of centers, or of surgeons in the New York State case. This study shows that a change in risk profile is difficult to accomplish by upcoding random patients. However, when patients who already have a high risk are upcoded, a hospital’s risk profile is exaggerated more effectively. This can be explained by the logistic model which
was used for risk adjustment (logistic EuroSCORE). The relation between the calculated score and the expected risk follows an S-shaped curve. This means that a similar increase in score (by upcoding variables) will augment the expected risk more in a high-risk patient than in a low-risk patient. Considering that most current risk adjustment models rely on logistic regression analysis, this phenomenon can also be expected with the use of other models. The finding that benchmarking is more sensitive to misclassification in high-risk patients could direct future audits. For example, larger samples of high-risk patients could be audited. Our results also illustrate that limited misclassification of multiple variables has a more profound effect on benchmarking than extensive misclassification of one risk factor. Gaming in multiple variables is more difficult to detect than gaming in a single variable. Scrupulous monitoring of all variables is therefore crucial to minimize “gaming”. When monitoring variables in a database, it should be taken into account that small but structural changes in multiple risk factors are even more treacherous than single odd measures. Other methods of gaming fall outside the scope of this article, but should nonetheless not be forgotten as a possible cause of erroneous benchmarking of mortality rates. In a review on the subject of report cards, Shahian mentioned as examples the changing of operative class and the transfer of critically ill patients.(8)

Unintentional misclassification and inaccurate definitions
It has been discussed previously that inter-observer differences can be a source of data variability.(18) However, our results suggest that unintentional, small amounts of misclassification introduced by inaccuracy of data collection are not likely to affect the results of benchmarking based on risk-adjusted mortality rates.
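The S-shaped relation between score and expected risk, noted in the discussion of gaming, can be illustrated numerically. The sketch below is not the EuroSCORE implementation: the added log-odds of 0.5 and the two baseline risks are hypothetical values chosen only to show that the same increase in the linear predictor yields a larger absolute risk gain in a higher-risk patient.

```python
import math


def expit(x: float) -> float:
    """Logistic function: maps a linear predictor (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Inverse of expit: maps a probability to log-odds."""
    return math.log(p / (1.0 - p))


def risk_gain(baseline_risk: float, added_log_odds: float) -> float:
    """Absolute increase in predicted risk when the linear predictor
    rises by added_log_odds (for example through upcoded risk factors)."""
    return expit(logit(baseline_risk) + added_log_odds) - baseline_risk


# Hypothetical sum of coefficients for the upcoded variables
# (an assumption for illustration, not a EuroSCORE value).
ADDED_LOG_ODDS = 0.5

low = risk_gain(0.02, ADDED_LOG_ODDS)   # low-risk patient, 2% predicted mortality
high = risk_gain(0.20, ADDED_LOG_ODDS)  # high-risk patient, 20% predicted mortality

print(f"gain for low-risk patient:  {low:.4f}")
print(f"gain for high-risk patient: {high:.4f}")
```

With these assumed inputs the gain is roughly 1 percentage point for the low-risk patient and roughly 9 percentage points for the high-risk patient. The asymmetry holds while predicted risks stay below 50%, the range in which the logistic curve is still steepening.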
Upcoding and undercoding of random patients will only alter a hospital’s benchmarking position when it is performed to a large extent or systematically in high-risk patients. Another issue related to the coding of risk factors is the (in)accuracy of the definitions used. In the EuroSCORE model, variables such as moderate LVF and chronic pulmonary disease include a wide spectrum of severity (e.g. from asthma on bronchodilators to end-stage COPD, and left ventricular ejection fractions ranging from 30% to 50%). Although this is not an issue of misclassification, it may invalidate the comparison of risk-adjusted outcomes. This is particularly the case when the effect on mortality differs across the spectrum of severity of disease (e.g. the risk of mortality is likely to be higher for patients with end-stage COPD than for those with asthma) and the distribution differs across hospitals (e.g. some hospitals have mainly asthma patients while others have mainly end-stage COPD patients). Improvement in the accuracy of definitions should always be strived for. For example, the Cardiac Surgery Reporting System in New York State has its definitions refined periodically
to make them as objective as possible.(19) Recently, the EuroSCORE model was updated as well. In the EuroSCORE II, variables such as poor LVF and pulmonary artery pressure have been refined (www.euroscore.org). The accuracy of definitions is reflected by the predictive performance of a model. After all, an inaccurately defined risk factor will lose its ability to predict a patient’s risk of mortality, which in turn will decrease model performance. In this study, we had no reason to believe that this was the case, because both discrimination and calibration of the model were adequate (after recalibration in our data). Because definitions are used in all centers and for all patients, we expect the effect of broad definitions to be comparable to random misclassification in all centers (Figure 3) and thus small. A study investigating the impact of undercoding in administrative databases on the accuracy of hospital report cards concluded that undercoding in random patients, in one variable, or in multiple hospitals has minimal effects on the outlier status of hospitals.(20) Our findings seem to be in concordance and suggest that outlier status cannot be completely attributed to the abovementioned points. Nonetheless, both unintentional misclassification and the accuracy of definitions should be taken into account in the evaluation of mortality rates. Although a confidence interval reflects the statistical margin of error, it does not deal with possible imprecision due to the abovementioned issues. Therefore, marginal outliers and marginal average performing centers should be considered as being in a grey area: they should be warned and carefully observed. In addition, the possibility of gaming should be considered in these centers as well. The best way to manage misclassification is obviously to avoid it. In addition to studies like the present one, audits are performed to evaluate the quality of the data and to motivate centers to collect data correctly. This will likely reduce the amount of misclassification. However, it would be an illusion to believe that misclassification can be fully eradicated. Therefore, analyses as performed here are a valuable tool for evaluating the robustness of the benchmarking procedure to misclassification.

Final notes
The impact of upcoding and undercoding depends on the model used for risk adjustment, the distribution of risk factors and the dispersion of the center-specific effects. For example, when a model is used which includes other risk factors, the extent of misclassification required will differ from our results. Also, when the between-center variation is larger, outliers will deviate more from the overall expected risk of mortality. This means more misclassification is probably needed to change the benchmarking status of the outlier. The exact results presented in this article might for these reasons be specific to this database. However, considering the large sample size, the wide range of surgical interventions and the commonly
used type of risk adjustment model, we expect the general conclusions from this study to be comparable in other large cardiac surgery databases. Patient severity captured in reported risk factors is not the only factor accounting for differences in mortality rates. Differences are also partly explained by other effects or other risk factors. The presented center effects are produced by a combination of factors that influence any part of the treatment. It should be stressed that by benchmarking mortality after cardiac surgery we evaluate the whole treatment as one entity. The results can therefore not merely be ascribed to, for example, surgical skills. The referral process and workup by the cardiologist before admission, the pre- and postoperative care on the ward, anaesthetic care, the process on the intensive care, technical equipment and even patient compliance to the treatment can all account for the between-center differences. Lastly, misclassified data could affect not only the results of outcomes evaluation, but also the results of secondary research using the data. After all, incorrect prevalence rates will lead to corrupted effect estimates of risk factors. For example, upcoding will cause an overestimation of the effect of the risk factor concerned on mortality.

Conclusion
Benchmarking based on risk-adjusted mortality rates can be manipulated by misclassification of EuroSCORE risk factors. Misclassification in random patients or in single variables has little effect. However, limited upcoding of multiple risk factors in high-risk patients can greatly influence benchmarking. To minimize “gaming”, the prevalence of all risk factors should be carefully monitored.


Reference List

1. Neuhauser D. Ernest Amory Codman MD. Qual Saf Healthcare 2002;11:104-5.
2. Healthcare Financing Administration. Medicare Hospital Information Report. Washington, DC: Government Printing Office; 1992.
3. Edwards FH, Clark RE, Schwartz M. Coronary artery bypass grafting: the Society of Thoracic Surgeons National Database experience. Ann Thorac Surg 1994;57:12-9.
4. Nashef SA, Roques F, Michel P, Gauducheau E, Lemeshow S, Salamon R. European system for cardiac operative risk evaluation (EuroSCORE). Eur J Cardiothorac Surg 1999;16:9-13.
5. Parsonnet V, Dean D, Bernstein AD. A method of uniform stratification of risk for evaluating the results of surgery in acquired adult heart disease. Circulation 1989;79:I3-12.
6. Califf RM, Jollis JG, Peterson ED. Operator-specific outcomes. A call to professional responsibility. Circulation 1996;93:403-6.
7. Green J, Wintfeld N. Report cards on cardiac surgeons. Assessing New York State’s approach. N Engl J Med 1995;332:1229-32.
8. Shahian DM, Normand SL, Torchiana DF, Lewis SM, Pastore JO, Kuntz RE, et al. Cardiac surgery report cards: comprehensive review and statistical critique. Ann Thorac Surg 2001;72:2155-68.
9. Fine LG, Keogh BE, Cretin S, Orlando M, Gould MM. How to evaluate and improve the quality and credibility of an outcomes database: validation and feedback study on the UK Cardiac Surgery Experience. BMJ 2003;326:25-8.
10. Iezzoni LI. Risk adjustment for measuring healthcare outcomes. Ann Arbor, Mich: Health Administration Press; 1994.
11. Normand SL, Glickman ME, Gatsonis CA. Statistical methods for profiling providers of medical care: issues and applications. Journal of the American Statistical Association 1997;92:803-14.
12. Lingsma HF, Steyerberg EW, Eijkemans MJ, Dippel DW, Scholte Op Reimer WJ, Van Houwelingen HC. Comparing and ranking hospitals based on outcome: results from The Netherlands Stroke Survey. QJM 2010;103:99-108.
13. Roques F, Michel P, Goldstone AR, Nashef SA. The logistic EuroSCORE. Eur Heart J 2003;24:881-2.
14. Peterson ED, DeLong ER, Muhlbaier LH, Rosen AB, Buell HE, Kiefe CI, et al. Challenges in comparing risk-adjusted bypass surgery mortality results: results from the Cooperative Cardiovascular Project. J Am Coll Cardiol 2000;36:2174-84.
15. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2011.
16. Burack JH, Impellizzeri P, Homel P, Cunningham JN, Jr. Public reporting of surgical mortality: a survey of New York State cardio-thoracic surgeons. Ann Thorac Surg 1999;68:1195-200.
17. Hannan EL, Siu AL, Kumar D, Racz M, Pryor DB, Chassin MR. Assessment of coronary artery bypass graft surgery performance in New York. Is there a bias against taking high-risk patients? Med Care 1997;35:49-56.
18. Brown ML, Lenoch JR, Schaff HV. Variability in data: the Society of Thoracic Surgeons National Adult Cardiac Surgery Database. J Thorac Cardiovasc Surg 2010;140:267-73.
19. Chassin MR, Hannan EL, DeBuono BA. Benefits and hazards of reporting medical outcomes publicly. N Engl J Med 1996;334:394-8.
20. Austin PC, Tu JV, Alter DA, Naylor CD. The impact of under coding of cardiac severity and comorbid diseases on the accuracy of hospital report cards. Med Care 2005;43:801-9.


Appendix 1: Calculation and benchmarking of center-specific risks of mortality

1. Apply the logistic EuroSCORE model to each patient to calculate that patient’s risk of mortality.

2. Fit the model described in the graph.

3. Fill in the formula: use the random intercept (specific to each center) of the concerning center, the fixed intercept (equal for all centers) yielded by the model fit in 2, the β yielded by the model fit in 2 and use the median logistic EuroSCORE of all patients in all centers. Interpretation: this value represents the risk of mortality in the concerning center for a patient with the median logistic EuroSCORE.

4. Repeat the abovementioned procedure for all centers to calculate all center-specific risks of mortality.

5. Use the variance of the random intercept to calculate the confidence interval of the risks of mortality.

6. Calculate the overall risk of mortality by filling in the formula and using 0 as the random intercept.

7. Compare the risk of mortality in a specific center with the overall risk of mortality. Interpretation: when the confidence interval of the risk of mortality in a specific center overlaps the overall risk of mortality, there is no significant difference between the risk in that center and the overall risk. When there is no overlap, the risk of mortality in the center is significantly higher or lower than the overall risk. In that case the center was considered to be an outlier.
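The steps above can be sketched in code. This is a minimal sketch under stated assumptions, not the authors' implementation: the fitted model is assumed to be a random-intercept logistic model with the logit of the logistic EuroSCORE as its only covariate, the confidence interval is assumed to be a normal approximation on the random intercept, and the intercept, slope, standard errors and center names are hypothetical. Only the median EuroSCORE of 3.9% is taken from the text.

```python
import math


def expit(x: float) -> float:
    """Logistic function: log-odds to probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Probability to log-odds."""
    return math.log(p / (1.0 - p))


def center_risk(fixed_intercept, beta, random_intercept, median_euroscore):
    """Step 3: risk of mortality in one center for a patient with the
    median logistic EuroSCORE (covariate form is an assumption)."""
    linear_predictor = fixed_intercept + random_intercept + beta * logit(median_euroscore)
    return expit(linear_predictor)


def benchmark(fixed_intercept, beta, median_euroscore, centers, z=1.96):
    """Steps 4 to 7: compare every center with the overall risk.

    centers maps a center name to (random intercept, standard error).
    """
    # Step 6: overall risk uses 0 as the random intercept.
    overall = center_risk(fixed_intercept, beta, 0.0, median_euroscore)
    results = {}
    for name, (u, se) in centers.items():
        risk = center_risk(fixed_intercept, beta, u, median_euroscore)
        # Step 5: interval from the uncertainty of the random intercept.
        lower = center_risk(fixed_intercept, beta, u - z * se, median_euroscore)
        upper = center_risk(fixed_intercept, beta, u + z * se, median_euroscore)
        # Step 7: an interval that excludes the overall risk marks an outlier.
        if lower > overall:
            status = "high-mortality outlier"
        elif upper < overall:
            status = "low-mortality outlier"
        else:
            status = "average"
        results[name] = (risk, lower, upper, status)
    return overall, results


# Hypothetical fitted values and centers, for illustration only.
overall_risk, table = benchmark(
    fixed_intercept=0.0,
    beta=1.0,
    median_euroscore=0.039,
    centers={"X": (0.0, 0.10), "Y": (0.5, 0.15), "Z": (-0.5, 0.15)},
)
for name, (risk, lower, upper, status) in table.items():
    print(f"{name}: {risk:.3f} ({lower:.3f} to {upper:.3f}) {status}")
```

Computing the interval on the log-odds scale and transforming it back keeps both limits between 0 and 1, which is why the sketch perturbs the random intercept rather than the risk itself.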


Appendix 2: Simulation of misclassification



10 The Dutch Hospital Standardized Mortality Ratio (HSMR) method and cardiac surgery: benchmarking using hospital administration data versus a clinical database

Siregar S*, Pouw ME*, Moons KGM, Versteegh MIM, Bots ML, van der Graaf Y, Kalkman CJ, van Herwerden LA, Groenwold RHH

* Both authors contributed equally to this work


Chapter 10

Abstract

Background
Benchmarking is often performed using hospital administration data. The aim of this study was to compare the accuracy of data from hospital administration databases and a national clinical cardiac surgery database, and to compare the performance of the Dutch Hospital Standardized Mortality Ratio method and the logistic EuroSCORE, for the purpose of benchmarking mortality across hospitals.

Methods
All patients undergoing cardiac surgery between 1 January 2007 and 31 December 2010 in 10 cardio-thoracic surgery centers in The Netherlands were included. Data were extracted from the Netherlands Association for Cardio-Thoracic Surgery database, containing data collected by cardiac surgeons, and from the Hospital Discharge Registry, containing administrative hospital data, and were compared with regard to the number of cardiac surgery interventions performed. Risk adjustment models based on the two databases were updated in our study population and compared with regard to discrimination (c-statistic) and calibration (calibration plots and Brier score).

Results
The number of cardiac surgery interventions performed could not be assessed using the administrative database, as the intervention code was incorrect in 1.4 to 26.3%, depending on the type of intervention. In addition, in 7.3% of all cardiac interventions no intervention code was registered. The updated administrative model was inferior to the clinical model with respect to discrimination (c-statistic of 0.77 versus 0.85, p-value for difference <0.001) and calibration (Brier score of 2.8% versus 2.6%, p-value for difference <0.001, maximum score 3.0). Two outliers according to the updated clinical model became average performing hospitals when benchmarking was performed using the updated administrative model.

Conclusions
In cardiac surgery, administrative data are less suitable than clinical data for the purpose of benchmarking. The use of either administrative or clinical risk adjustment models can affect the outlier status of hospitals. Risk adjustment models including procedure-specific clinical risk factors are recommended.



The Dutch HSMR method and cardiac surgery

Introduction

A valid comparison of outcomes between hospitals or healthcare providers (benchmarking) requires adjustment for the severity of the health condition of patients and the interventions performed, often referred to as case-mix differences.(1-3) For this purpose, prediction models have been developed to estimate risk-adjusted outcomes across hospitals. Most of these models are based on routinely collected administrative hospital data. For example, the hospital standardised mortality ratio (HSMR), first developed by Jarman in 1999 for the United Kingdom, is a risk-adjusted mortality rate calculated using prediction models based on administrative data.(4) Because administrative data are collected for other purposes, they are easily available, and thus the use of these data for benchmarking is cheap and requires relatively little extra effort. However, administrative databases are often criticised for being inaccurate, incomplete, and containing limited information.(5-9) As a consequence, comparisons of risk-adjusted outcome rates between healthcare providers that are based on administrative data might be unreliable, leading to unjustified criticism. For that reason, clinical databases with corresponding clinical prediction models have been developed (e.g. the European System for Cardiac Operative Risk Evaluation and the Society of Thoracic Surgeons risk models in cardiac surgery) that include multiple clinical predictors for mortality.(10-12) The European System for Cardiac Operative Risk Evaluation (EuroSCORE) is a prediction model that was specifically designed to predict the risk of operative mortality related to cardiac surgery using 18 demographic and risk factors. The EuroSCORE can thus be used to adjust for differences in case-mix in the comparison between healthcare providers.
Models based on clinical risk factors are claimed to have a better predictive performance, resulting in improved risk adjustment, and enable valid comparison of outcomes across centres.(5-7;13) The downside is that clinical databases are more expensive; they comprise information that is obtained by active data collection by dedicated individuals and thus require continuous maintenance. Previous studies have not come to a conclusive answer to the question if clinical risk factors are necessary for adequate risk adjustment. Some concluded that administrative data are sufficient to enable benchmarking, whereas others show a clear inferiority and insufficiency when compared to clinical data.(6-8;13-20) The aim of our study was to analyse whether a risk adjustment model based on administrative data allows for adequate benchmarking in cardiac surgery. Using a nationwide cohort of cardiac surgery patients, we assessed the accuracy of an administrative database and the predictive performance of administrative models in comparison to a clinical database and the clinical EuroSCORE model.(21)


Methods

Data
Both EuroSCORE and administrative variables of a national cohort of cardiac surgery patients in The Netherlands have been collected in two separate databases: (1) the adult national cardiac surgery database of the Netherlands Association for Cardio-Thoracic Surgery (NVT) and (2) the National Hospital Discharge Registry (HDR) of The Netherlands.(21-23)

The adult national cardiac surgery database of the Netherlands Association for Cardio-Thoracic Surgery
This clinical database has national coverage, with participation of all 16 centres performing cardiac surgery in The Netherlands.(21) All patients undergoing cardiac surgery, excluding transfemoral aortic valve implantation, circulatory assist devices and pacemakers, are included in the database. Ten out of 16 cardiac centres participated in our study, in which 34229 consecutive procedures were performed between 1 January 2007 and 31 December 2010. Procedures with incomplete data were excluded (N=218, 0.6%), resulting in 34011 procedures for further analyses. The dataset consisted of predictors for mortality as listed in Table 1, defined according to the EuroSCORE.(10) The EuroSCORE was developed to estimate the operative risk of mortality related to cardiac surgery (within 30 days and/or during the same hospital admission).(11) In this study, the EuroSCORE was used to estimate the risk of in-hospital mortality.

The Hospital Discharge Registry
The Hospital Discharge Registry contains administrative data of all 10 hospitals included in this study. The dataset consists of patient characteristics and admission details such as age, comorbidity, sex and urgency of admission. For interventions the International Classification of Health Interventions (ICHI) coding system is used, and for diagnoses the International Classification of Diseases (ICD-9).(24) The Dutch HSMR method is based on the HDR database and uses 50 risk adjustment models, each for one specific group of diagnoses. The models estimate the risk of mortality for patients with a diagnosis belonging to the specific diagnosis group.(22)

Linkage of datasets
In order to compare the HDR and the NVT database and the models based on them, information on cardiac surgery interventions was required from both databases. Therefore, the HDR and NVT databases were linked to identify matching records. Both the HDR and NVT database contain anonymised data, meaning no directly identifying information is stored. Records from both databases were linked to the municipal registries based on date of birth,
gender and zipcode, and were subsequently linked to each other. The linkage was performed by Statistics Netherlands and is described in previous publications.(21;25;26) In total 26,178 (77%) records from the NVT database could be linked to a record in the HDR database and were used for further analyses. The predicted mortality according to the logistic EuroSCORE did not differ between the linked and the non-linked population (median 3.7%). Reasons for failed linkage were: the HDR record could not be linked to the municipal registries or no HDR record existed for the specific intervention (18.7%), the NVT record could not be linked to the municipal registries (2.7%), or no administrative model was available for the record (1.6%). The linkage of the HDR database to the municipal registries caused most linkage failures, as only 4 digits (out of 6) of the zipcode were available in the HDR database.

Comparison of data between the NVT and HDR database: intervention and in-hospital mortality
The type of intervention and the outcome in-hospital mortality were compared between the registries. Because the NVT and the HDR registries use different risk factors for risk adjustment, these were not compared. The NVT database was used as the reference for the type of intervention, because this information is collected by the surgeons themselves. The HDR database was used as the reference for in-hospital mortality, as the date of mortality is extracted directly from the up-to-date municipality registers. The comparison of in-hospital mortality between both databases was performed on patient level (as opposed to intervention level), to avoid persons being counted multiple times for mortality.
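The linkage step described above, matching on date of birth, gender and zipcode, can be sketched as an exact-key join that accepts a link only when the key identifies one person. This is an illustrative assumption and not the procedure used by Statistics Netherlands: the field names, the example records and the uniqueness rule are all hypothetical. The sketch shows one plausible way in which a shortened zipcode could make otherwise linkable records ambiguous.

```python
from collections import defaultdict


def link(records, registry, zip_chars):
    """Link each record to a registry person when the key
    (date of birth, sex, first zip_chars characters of the zipcode)
    matches exactly one person. Ambiguous or missing keys fail."""
    index = defaultdict(list)
    for person in registry:
        key = (person["dob"], person["sex"], person["zipcode"][:zip_chars])
        index[key].append(person["person_id"])

    linked, failed = {}, []
    for rec in records:
        key = (rec["dob"], rec["sex"], rec["zipcode"][:zip_chars])
        matches = index.get(key, [])
        if len(matches) == 1:
            linked[rec["record_id"]] = matches[0]
        else:
            failed.append(rec["record_id"])
    return linked, failed


# Hypothetical registry: two people share date of birth, sex and the
# first four zipcode characters, and differ only in the last two.
registry = [
    {"person_id": "p1", "dob": "1950-03-14", "sex": "M", "zipcode": "3584CX"},
    {"person_id": "p2", "dob": "1950-03-14", "sex": "M", "zipcode": "3584AB"},
    {"person_id": "p3", "dob": "1948-11-02", "sex": "F", "zipcode": "2511VA"},
]
records = [
    {"record_id": "r1", "dob": "1950-03-14", "sex": "M", "zipcode": "3584CX"},
    {"record_id": "r2", "dob": "1948-11-02", "sex": "F", "zipcode": "2511VA"},
]

full_linked, full_failed = link(records, registry, zip_chars=6)
short_linked, short_failed = link(records, registry, zip_chars=4)
print(full_linked, full_failed)
print(short_linked, short_failed)
```

With the full zipcode both example records link; with only four characters the first record matches two people and is rejected.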
Comparison of risk adjustment models

The administrative and clinical model
The Dutch Hospital Standardized Mortality Ratio (HSMR) method (models based on administrative data) and the logistic EuroSCORE (model based on clinical data) were applied in their original form to our study population to predict the risk of in-hospital mortality.(10;22) These models are referred to below as Administrative.1 and Clinical.1. Existing risk adjustment models can be updated to a new study population. Updated models are adjusted to the characteristics of that population and are likely to show improved generalisability.(27) For this reason, the Administrative.1 and Clinical.1 models were updated in our study population (i.e. the linked data from both databases). There are several methods to update a risk adjustment model.(27) As cardiac surgery interventions are incorporated in multiple Dutch HSMR models (i.e. several diagnosis groups), one model for cardiac surgery was constructed using stepwise backward selection based on Akaike’s Information Criterion.(28) This means that the intercept and the coefficients of all included covariates were re-estimated in our study population and only relevant risk factors were
included in the updated model. To update the EuroSCORE model, the intercept and the coefficients of all included covariates were also re-estimated in our study population. This resulted in the models Administrative.2 and Clinical.2. The models can be updated even more thoroughly by inclusion of interaction terms, in order to maximise risk adjustment in our study population. Thus, first-order interaction terms between all covariates were added to the models Administrative.2 and Clinical.2, resulting in the models Administrative.3 and Clinical.3.(28)

Comparison of model performance
The predictive performance of a risk adjustment model is quantified by means of calibration and discrimination. Discrimination refers to the ability of a model to differentiate between subjects with and without the outcome and depends on the variables included in the model. A model is able to discriminate if subjects with the outcome tend to have higher predicted probabilities than those without the outcome. The discrimination of the models was quantified using the area under the ROC curve, which is equivalent to the c-statistic. The 95% CI of the c-statistic was estimated and the difference between two c-statistics was tested using DeLong’s method.(29) The calibration of a risk model refers to the ability of a model to predict how many patients will have the outcome. It is measured on an aggregated level, in this case on hospital level. Calibration was assessed by inspection of calibration plots and the Brier score. The Brier score measures model accuracy on patient level by averaging the squared difference between the predicted and the observed outcome per patient. The method by Redelmeier was used to estimate the 95% CI of the Brier score and to test the difference between two Brier scores.(30)

Benchmarking
Ultimately, differences in the performance of risk adjustment models are only relevant to the comparison of hospitals if they actually affect the results of benchmarking.
In this study, benchmarking is performed by calculating the standardised mortality ratio (SMR) for all hospitals. The SMR is calculated by dividing the observed mortality with the expected mortality within a hospital. SMRs of the administrative and clinical models were compared. Centres with a SMR for which the 95% confidence intervals (CI) did not cover the value 1 were considered to be outliers. The 95% CI of the SMRs were estimated using the method described by Breslow and Day.(31) All analyses were performed in R version 2.15.(32)
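The benchmarking step described above can be illustrated with a minimal sketch. The analyses in this study were performed in R; the Python code below is only an illustration and not the code used in the study. It assumes that the expected mortality of a hospital is the sum of the predicted probabilities of its patients, and it uses Byar's approximation to the exact Poisson limits, which is one of the interval methods described by Breslow and Day; the text does not state which of their methods was applied. The function names are hypothetical.

```python
import math


def expected_deaths(predicted_risks):
    """Expected number of deaths: sum of the predicted probabilities (assumption)."""
    return sum(predicted_risks)


def smr_with_ci(observed, expected, z=1.96):
    """Return (SMR, lower, upper) with Byar's approximation to the Poisson limits.

    observed: observed number of deaths in the hospital
    expected: expected number of deaths according to the risk model
    z: standard normal quantile (1.96 for a 95% confidence interval)
    """
    if expected <= 0:
        raise ValueError("expected number of deaths must be positive")
    smr = observed / expected
    if observed > 0:
        lower = observed * (
            1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))
        ) ** 3 / expected
    else:
        lower = 0.0  # with zero observed deaths the lower limit is zero
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3 / expected
    return smr, lower, upper


def is_outlier(lower, upper):
    """A centre is an outlier if its confidence interval does not cover 1."""
    return lower > 1.0 or upper < 1.0


if __name__ == "__main__":
    # Hypothetical hospital: 30 observed deaths, 20 expected deaths.
    smr, lower, upper = smr_with_ci(30, 20.0)
    print(f"SMR {smr:.2f} [{lower:.2f}-{upper:.2f}], outlier: {is_outlier(lower, upper)}")
```

Because the expected number of deaths is the denominator, any difference between two risk models in the predicted probabilities translates directly into a different SMR and possibly a different outlier status.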



The Dutch HSMR method and cardiac surgery

Results

Risk factor coding
The risk factors in the linked subset from both the administrative and clinical database are presented in Table 1. Mean age was 66.6 years (± 10.7) and 29.5% of patients were female. A comparison of the prevalence of risk factors could not be made, as the definitions differed between the administrative and the clinical database.

Number of cardiac interventions performed (by type of intervention)
In total, 14300 (54.6%) isolated CABG procedures were performed according to the NVT database. Other frequently performed interventions were aortic valve replacement without or with concomitant CABG (12.1% and 8.3%, respectively) and mitral valve repair without or with concomitant CABG (3.1% and 2.7%, respectively). The proportion of isolated CABG, isolated aortic valve replacement, isolated mitral valve repair and isolated mitral valve replacement procedures that was coded with the correct main intervention code in the HDR ranged from 64.6% to 92.2% (Table 2). The intervention code in the HDR was missing in 1923 (7.3%) procedures. As a result, the number of cardiac surgery interventions could not be accurately assessed using HDR data.

In-hospital mortality
In-hospital mortality in the HDR database is derived from the municipal registries and is highly accurate. In the NVT database, 42 of 762 (5.5%) patients who died during admission were not coded as such. Conversely, 36 of 25005 (0.1%) survivors were incorrectly coded as having died during the same hospital admission.

Table 2: Comparison of intervention type and in-hospital mortality

| Intervention type | NVT database (clinical data) | Hospital Discharge Registry (administrative data): correct main intervention code | Hospital Discharge Registry: incorrect main intervention code | Hospital Discharge Registry: no code |
|---|---|---|---|---|
| Isolated CABG | 14300 (100%) | 13185 (92.2%) | 197 (1.4%) | 918 (6.4%) |
| Isolated AoV replacement | 3157 (100%) | 2461 (78.0%) | 457 (14.5%) | 239 (7.6%) |
| Isolated MV repair | 820 (100%) | 625 (76.2%) | 134 (16.3%) | 61 (7.4%) |
| Isolated MV replacement | 316 (100%) | 204 (64.6%) | 83 (26.3%) | 29 (9.2%) |

NVT: Netherlands Association for Cardio-Thoracic Surgery, AoV: aortic valve, MV: mitral valve, CABG: coronary artery bypass grafting.


Table 1: Variables recorded in the administrative database (HDR) and the clinical database (NVT)

Administrative variables (N = 26178)

| Variable | N (%) | OR in updated model |
|---|---|---|
| Age <25 years (categories of 5 years up to >85) | 66.5 (± 10.7) | reference; 0.29-1.01 |
| Female sex | 7714 (29.5) | 1.44*** |
| Acute myocardial infarction | 1899 (7.3) | 1.25 |
| Congestive heart failure | 696 (2.7) | 4.11*** |
| Pulmonary disease | 623 (2.4) | |
| Renal disease | 293 (1.1) | 3.62*** |
| Urgency | 3292 (12.6) | 2.20*** |
| Peripheral vascular disease | 551 (2.1) | 2.17*** |
| Cerebral vascular accident | 241 (0.9) | 2.75*** |
| Peptic ulcer | 51 (0.2) | 4.36*** |
| Social economic status: lowest | 5450 (16.5%) | reference |
| Social economic status: below average | 5379 (16.3%) | 0.89 |
| Social economic status: average | 4999 (15.1%) | 0.79* |
| Social economic status: above average | 5801 (17.5%) | 0.70** |
| Social economic status: highest | 4541 (13.7%) | 0.95 |
| Social economic status: unknown | 6925 (20.9%) | |
| Year of discharge: 2007 | 6829 (20.6%) | reference |
| Year of discharge: 2008 | 6697 (20.2%) | 1.04 |
| Year of discharge: 2009 | 6941 (21.0%) | 0.83 |
| Year of discharge: 2010 | 5711 (17.3%) | 0.69*** |
| Year of discharge: unknown | 6917 (20.9%) | |
| Admission from: home | 19907 (60.2%) | reference |
| Admission from: nursing home | 145 (0.4%) | 3.41*** |
| Admission from: general hospital | 4952 (15.0%) | 1.27** |
| Admission from: academic center | 1174 (3.5%) | 1.38* |
| Admission from: unknown | 6917 (20.9%) | |

Clinical variables (N = 26178)

| Variable | N (%) | OR in updated model |
|---|---|---|
| Age (continuous) | 66.6 (± 10.7) | 1.06*** |
| Female sex | 7714 (29.5) | 1.33*** |
| Recent myocardial infarction (<90 days) | 3191 (12.2) | 1.57*** |
| LVEF 30–50% | 4165 (15.9) | 1.69*** |
| LVEF <30% | 1324 (5.1) | 2.95*** |
| Pulmonary disease | 3019 (11.5) | 1.79*** |
| Serum creatinine >200 μmol/l | 464 (1.8) | 2.79*** |
| Emergency operation | 1317 (5.0) | 2.38*** |
| Extracardiac arteriopathy | 3202 (12.2) | 1.83*** |
| Neurological dysfunction | 780 (3.0) | 1.26 |
| Previous cardiac surgery | 1709 (6.5) | 2.78*** |
| Systolic pulmonary pressure >60 mmHg | 606 (2.3) | 1.97*** |
| Active endocarditis | 216 (0.8) | 1.45 |
| Unstable angina | 15776 (6.0) | 1.95*** |
| Critical preoperative state | 983 (3.8) | 2.51*** |
| Ventricular septal rupture | 47 (0.2) | 3.93*** |
| Other than isolated CABG | 11809 (45.1) | 3.43*** |
| Thoracic aortic surgery | 1258 (4.8) | 2.75*** |

For dichotomous variables the number of patients and percentage of total population is reported; for continuous variables the mean and standard deviation. OR: odds ratio in multivariable logistic regression risk adjustment model re-estimated in the study population; CABG: coronary artery bypass grafting; IQR: interquartile range; LVEF: left ventricular ejection fraction. *p < 0.05; **p < 0.01; ***p < 0.001.


Calibration of the administrative models and the clinical models
Calibration of the risk models is shown in Figure 1. Both original models (Administrative.1 and Clinical.1) were poorly calibrated. Administrative.1 underestimated the risk of mortality, whereas Clinical.1 overestimated the risk of mortality. Updating improved calibration of both models, as the difference between observed and predicted mortality became smaller. However, in both pairs of updated models the Brier Score of the administrative model remained significantly higher than that of the clinical model, indicating inferior calibration of the administrative models (Table 3). The maximum Brier Score in these data was 3.0%. Rescaling of the Brier Score on a scale from 0 to 100% would result in a score of 93.8% for Administrative.3 and 87.8% for Clinical.3. Results were comparable in the subgroup analyses on isolated CABG procedures (Figure 1 and Table 3), where the maximum possible Brier Score was 1.3%.

Figure 1: Calibration plot of the three clinical models and the three administrative models

[Figure 1: calibration plots in two panels (A and B), each with three sub-plots (original models, updated models, updated + interaction models) comparing Clinical.1 to Administrative.1, Clinical.2 to Administrative.2 and Clinical.3 to Administrative.3. X-axis: predicted mortality; y-axis: observed mortality; both axes run from 0.00 to 0.20.]

The calibration plots of the clinical models are depicted in red and the calibration plots of the administrative models in blue. Panel A: models fitted on all cardiac surgery. Panel B: models fitted on isolated coronary artery bypass grafting procedures.


Table 3: Brier Score of the three clinical models and the three administrative models, for all cardiac surgery and for only isolated coronary artery bypass surgery

All cardiac surgery

| Brier Scores | Administrative | Clinical | P value difference |
|---|---|---|---|
| Original models | 2.9% [2.8-3.0] | 3.0% [2.8-3.2] | 0.093 |
| Updated models | 2.9% [2.7-3.1] | 2.7% [2.5-2.9] | <0.001 |
| Updated + interaction terms | 2.8% [2.6-3.0] | 2.6% [2.5-2.8] | <0.001 |

Isolated CABG surgery

| Brier Scores | Administrative | Clinical | P value difference |
|---|---|---|---|
| Original models | 1.3% [1.2-1.5] | 1.4% [1.1-1.7] | 0.030 |
| Updated models | 1.3% [1.1-1.4] | 1.2% [1.0-1.3] | <0.001 |
| Updated + interaction terms | 1.2% [1.1-1.4] | 1.2% [1.0-1.3] | 0.026 |

Brier Scores range from 0 to a value depending on the prevalence of the outcome. The maximum Brier Score that was possible in these data was 3.0% for all cardiac surgery and 1.3% for isolated CABG. A lower Brier Score indicates better calibration. Brackets denote 95% confidence intervals. CABG: coronary artery bypass grafting.
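The dependence of the maximum Brier Score on the prevalence of the outcome can be made concrete with a small sketch. It assumes that the maximum refers to the Brier Score of a non-informative model that assigns every patient the overall mortality rate, which equals prevalence × (1 − prevalence), and that the rescaled score is the Brier Score divided by this maximum. Both assumptions are consistent with the values reported here but are not stated explicitly in the text. The function names are hypothetical and the code is an illustration in Python, not the R code used in the study.

```python
def brier_score(predicted, observed):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("predicted and observed must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)


def max_brier_score(observed):
    """Brier Score of a non-informative model predicting the overall event rate.

    Assumption: this is what 'maximum Brier Score' refers to in the table note.
    """
    prevalence = sum(observed) / len(observed)
    return prevalence * (1 - prevalence)


def rescaled_brier_score(predicted, observed):
    """Brier Score as a fraction of its maximum (0 = perfect, 1 = non-informative)."""
    return brier_score(predicted, observed) / max_brier_score(observed)


if __name__ == "__main__":
    # Hypothetical toy data: four patients, two of whom died.
    predicted = [0.1, 0.9, 0.2, 0.8]
    observed = [0, 1, 0, 1]
    print(brier_score(predicted, observed))
    print(max_brier_score(observed))
    print(rescaled_brier_score(predicted, observed))
```

With a mortality rate of about 3%, the maximum under this assumption is close to 0.03 × 0.97, which is why absolute Brier Scores of 2.6% to 2.9% correspond to rescaled scores that remain close to 100%.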

Figure 2: Area under the ROC-curve of the clinical and the administrative models for the prediction of in-hospital mortality

[Figure 2: ROC-curves in two panels (A and B), each with three sub-plots (original models, updated models, updated + interaction models) comparing Clinical.1 to Administrative.1, Clinical.2 to Administrative.2 and Clinical.3 to Administrative.3. X-axis: 1 − specificity; y-axis: sensitivity. The c-statistics [95% CI] printed in the sub-plots are, for the clinical and the administrative model respectively: in panel A, 0.838 [0.825−0.851] and 0.788 [0.772−0.804], 0.838 [0.825−0.852] and 0.756 [0.739−0.772], 0.846 [0.833−0.860] and 0.773 [0.756−0.789]; in panel B, 0.844 [0.815−0.873] and 0.778 [0.744−0.811], 0.829 [0.798−0.860] and 0.750 [0.715−0.784], 0.828 [0.798−0.859] and 0.756 [0.719−0.793].]

The ROC-curves of the clinical models are depicted in red and the ROC-curves of the administrative models in blue. Panel A: models fitted on all cardiac surgery. Panel B: models fitted on isolated coronary artery bypass grafting procedures.
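The c-statistic reported with the ROC-curves equals the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor. A minimal sketch of this pairwise definition is given below; it is an illustration in Python with a hypothetical function name, not the R code used in the study, and it does not cover the confidence intervals or the DeLong comparison of two c-statistics.

```python
def c_statistic(predicted, observed):
    """Proportion of (death, survivor) pairs in which the death has the higher risk.

    Ties in predicted risk count as half a concordant pair. The double loop is
    quadratic in the number of patients, which is fine for an illustration.
    """
    deaths = [p for p, y in zip(predicted, observed) if y == 1]
    survivors = [p for p, y in zip(predicted, observed) if y == 0]
    if not deaths or not survivors:
        raise ValueError("both outcome classes must be present")
    concordant = 0.0
    for risk_death in deaths:
        for risk_survivor in survivors:
            if risk_death > risk_survivor:
                concordant += 1.0
            elif risk_death == risk_survivor:
                concordant += 0.5
    return concordant / (len(deaths) * len(survivors))


if __name__ == "__main__":
    # Hypothetical toy data: two survivors followed by two deaths.
    print(c_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

A value of 0.5 corresponds to a model that cannot separate deaths from survivors and a value of 1.0 to perfect separation, so the difference between roughly 0.76 and 0.84 reflects a substantial share of patient pairs that only the clinical models rank correctly.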


Figure 3: Benchmarking using Standardized Mortality Ratio (SMR) calculated by the clinical models and the administrative models

[Figure 3: SMR per hospital (hospitals A to J) in two panels (A and B), each with three sub-plots (original models, updated models, updated + interaction models) comparing Clinical.1 to Administrative.1, Clinical.2 to Administrative.2 and Clinical.3 to Administrative.3. X-axis: SMR; y-axis: hospitals.]

The Standardized Mortality Ratios (SMRs) of the clinical models are depicted in red and the SMRs of the administrative models in blue. Panel A: models fitted on all cardiac surgery. Panel B: models fitted on isolated coronary artery bypass grafting procedures.

Discrimination of the administrative models and the clinical models
Discrimination of the models is shown in Figure 2. The c-statistics of the administrative models (0.756–0.788) are substantially lower than those of the clinical models (0.838–0.846), indicating inferior discrimination of the administrative models (p < 0.001 for all three model pairs). Updating of the administrative model did not improve the discrimination (Figure 2).

The effect on benchmarking
The effect of the use of administrative versus clinical models on benchmarking is shown in Figure 3. The majority of standardised mortality ratios (SMRs) calculated using the original administrative model were higher than 1, which indicates that the model underestimated the risk of mortality. For the original clinical model the opposite was found: the model overestimated the risk of mortality.



Updating of the models resulted in better predictions on hospital level (SMRs closer to 1). However, a considerable difference was found between the updated administrative and the updated clinical models, for example in hospital B and hospital C (Figure 3). The expected number of deaths per hospital according to the updated administrative models ranged from 35% less to 34% more than that calculated according to the updated clinical models. The SMRs calculated using the clinical and administrative models yielded different outliers. Hospital C and hospital J changed outlier status depending on whether the updated model Administrative.3 or Clinical.3 was used. The analyses using only isolated CABG surgery yielded results comparable to those based on all cardiac surgery data (Figure 3).

Discussion

Principal findings
This study compared 1) data accuracy in the administrative HDR database to that in the clinical cardiac surgery database of the Netherlands Association for Cardio-Thoracic Surgery (NVT) and 2) the predictive performance of administrative models to that of the clinical EuroSCORE model. The reported intervention code in the administrative database was incorrect in up to 26% of procedures, depending on the type of surgery. As a result, the number of cardiac surgery interventions could not be accurately assessed. After updating of the models to our data, the calibration of the administrative model was inferior to that of the clinical model. The importance of this shortcoming is marked by the identification of different outliers when the models are used for benchmarking of hospitals.

Why models based on administrative data have inferior calibration and discrimination
When developing a risk prediction model, the first logical step is to consider which variables could be predictors for the outcome. In this search for predictors, administrative models are limited to the routinely collected variables, which might not necessarily be the strongest predictors of the outcome. In our study, several strong predictors for mortality (shown in Table 1) were not available in the administrative database. Conversely, administrative risk factors that were not in the clinical database and were strongly associated with mortality had a low prevalence in our study population. The lack of several strong predictors for mortality is likely to have affected the calibration and discrimination of the administrative models. Previous studies reported that much of the predictive performance of risk models is derived from a relatively small number of clinical variables and that the predictive performance of administrative models could be improved with the addition of a limited number of clinical variables.(7;13;19;33)


Why administrative data are inferior to clinical data for benchmarking purposes
The requirements of a risk adjustment model depend on its goal. For benchmarking an adequate calibration is required: the model should adequately predict the expected mortality rate in a hospital. It can be seen as a scale that should weigh correctly. The performance of a scale mainly depends on its ability to weigh a (kilo)gram. If this feature is adequate, but the weighing is off par, the scale can be reset to zero to adjust it to any new situation. Similarly, the performance of a model depends on the strength of the predictors in the model (i.e. discrimination), as the model can be re-calibrated to update it in time or to make it suitable for a new population. It follows from the above that the inferior discrimination of administrative models (in comparison to clinical models) will result in inferior calibration. At first sight, the clinical importance of this difference in calibration and discrimination may not be readily apparent. However, it is shown in this study that the choice for either the administrative or the clinical model could very well affect the outlier status of a hospital.

Other issues in the use of administrative data
There are other reasons why the HDR database with routinely collected data turned out to be unsuitable for analyses of outcomes in cardiac surgery. Firstly, the accuracy of intervention codes was unsatisfactory. For a considerable number of records in our study population the intervention code was incorrect, unspecified (e.g. "cardiac surgery") or missing. Consequently, the number of cardiac surgery interventions performed could not be reliably assessed. Previous studies have also reported discrepant counts of operations in administrative data versus clinical data.(8;17;34) Inaccurate coding could be attributed to the fact that data were collected by persons who were not actively involved in the clinical care and thus were dissociated from clinical information that could be necessary for correct reporting of data.(35) In addition, occasionally not all interventions and diagnoses are recorded. Also, admission and discharge dates are collected, instead of dates of intervention. This has been reported before as an important reason for variance in cardiac surgery volumes between administrative and clinical databases.(17) Furthermore, the HSMR-method uses administrative models for specific diagnosis codes. However, in cardiac surgery the analysis of outcomes is performed by intervention type, as risk is considered to be mainly related to the performed intervention.

Implications for practice
The use of administrative data has many advantages over the use of clinical data. The data are routinely collected and stored, making them cheaper and readily available. Not surprisingly, many have opted to use administrative models as a rough indicator for quality, such as seen in the HSMR-method. However, the apparent benefits should be carefully weighed against


the limitations and drawbacks of administrative models, when compared to clinical models. Public benchmarking in general can be dangerous in the sense that the general public cannot be expected to understand the limitations and the prerequisites under which the results should be interpreted. As the limitations are more pronounced for administrative data, this aspect of benchmarking should be considered even more thoroughly when administrative data are used. This is particularly important because poor results in benchmarking could have far-reaching consequences when known to healthcare consumers, the media, health insurance companies or governmental bodies. In this context, the development of models with a high predictive performance, which might include clinical risk factors, should be pursued at all times. If clinical data are already collected, their availability for benchmarking should be encouraged.

On the other hand, clinical data appeared to have an evident weakness as well. In the clinical database used in this study, nearly 6% of the patients who died during admission were not recorded as in-hospital deaths. For outcomes such as vital state and readmissions, administrative databases were highly accurate, as information was derived from municipal registries. Administrative data sources could be used to verify outcomes data, thus complementing clinical databases. In this way, the strengths of both types of data are combined in order to optimise benchmarking in healthcare.

The findings in this study are likely to hold true for populations other than cardiac surgery patients and in other countries. Most probably, other specific surgical interventions, such as oesophageal or hepato-biliary surgery, also require adjustment for risk factors not commonly included in administrative databases. Consequently, benchmarking in those populations will result in issues similar to those encountered in this study.

Possible limitations
These analyses were based on data from 10 out of 16 cardiac surgery centres in The Netherlands. In general, the population of the six hospitals not participating in this study did not differ from the study population with regard to age, sex and the median logistic EuroSCORE. However, it is unknown whether the results with regard to data accuracy are generalisable to all centres. Secondly, the sensitivity of the linkage between the clinical and the administrative database was 77%. Although we did not find a difference in the overall risk profile between the linked and non-linked records, we do acknowledge that a substantial part of the total population was excluded from the analyses. We have no reason to believe that administrative models would perform any differently in the non-linked records or that data accuracy was better in the non-linked records. The conclusions of our study are thus unlikely to be affected by this limitation.


The goal of this study was to assess the accuracy of administrative data and the predictive performance of the accompanying models. As such, it was not our intention to design a new model for risk prediction in cardiac surgery. Thus, we chose to stay in line with the methods used to construct the original models and to refrain from more sophisticated methods such as hierarchical modelling and shrinkage of coefficients.

The outcome in this study is in-hospital mortality. Several publications have previously shown why mortality at fixed time intervals is a more appropriate measure in outcomes evaluation. We acknowledge the limitations of this outcome and we are aware that mortality is one of several indicators that can be used to measure quality, but certainly not the only one. We chose this outcome because the original administrative models were developed using in-hospital mortality. For the purpose of our study, we have no reason to believe this has affected our results, as both the clinical and the administrative models were fitted on this outcome.

Conclusion
Although there are advantages to the use of administrative models for benchmarking in cardiac surgery, their calibration and discrimination (and thus their performance in benchmarking) are inferior to those of clinical models. The use of either an administrative or a clinical model may affect the outlier status of hospitals. Therefore, in specific populations such as cardiac surgery, the use of prediction models including clinical risk factors is recommended.


Reference List

1. Iezzoni LI. Risk adjustment for measuring healthcare outcomes. Ann Arbor, Mich: Health Administration Press; 1994.
2. Shahian DM, Normand SL, Torchiana DF, Lewis SM, Pastore JO, Kuntz RE, et al. Cardiac surgery report cards: comprehensive review and statistical critique. Ann Thorac Surg 2001 Dec;72(6):2155-68.
3. Heijink R, Koolman X, Pieter D, van d, V, Jarman B, Westert G. Measuring and explaining mortality in Dutch hospitals; the hospital standardized mortality rate between 2003 and 2005. BMC Health Serv Res 2008;8:73.
4. Jarman B, Gault S, Alves B, Hider A, Dolan S, Cook A, et al. Explaining differences in English hospital death rates using routinely collected data. BMJ 1999 Jun 5;318(7197):1515-20.
5. Bohensky MA, Jolley D, Pilcher DV, Sundararajan V, Evans S, Brand CA. Prognostic models based on administrative data alone inadequately predict the survival outcomes for critically ill patients at 180 days post-hospital discharge. J Crit Care 2012 May 15.
6. Brinkman S, Abu-Hanna A, van der Veen A, de JE, de Keizer NF. A comparison of the performance of a model based on administrative data and a model based on clinical data: effect of severity of illness on standardized mortality ratios of intensive care units. Crit Care Med 2012 Feb;40(2):373-8.
7. Hannan EL, Racz MJ, Jollis JG, Peterson ED. Using Medicare claims data to assess provider quality for CABG surgery: does it work well enough? Health Serv Res 1997 Feb;31(6):659-78.
8. Shahian DM, Silverstein T, Lovett AF, Wolf RE, Normand SL. Comparison of clinical and administrative data sources for hospital coronary artery bypass graft surgery report cards. Circulation 2007 Mar 27;115(12):1518-27.
9. Glance LG, Dick AW, Osler TM, Mukamel DB. Accuracy of hospital report cards based on administrative data. Health Serv Res 2006 Aug;41(4 Pt 1):1413-37.
10. Nashef SA, Roques F, Michel P, Gauducheau E, Lemeshow S, Salamon R. European system for cardiac operative risk evaluation (EuroSCORE). Eur J Cardiothorac Surg 1999 Jul;16(1):9-13.
11. Roques F, Michel P, Goldstone AR, Nashef SA. The logistic EuroSCORE. Eur Heart J 2003 May;24(9):881-2.
12. Shahian DM, O'Brien SM, Filardo G, Ferraris VA, Haan CK, Rich JB, et al. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 1--coronary artery bypass grafting surgery. Ann Thorac Surg 2009 Jul;88(1 Suppl):S2-22.
13. Geraci JM, Johnson ML, Gordon HS, Petersen NJ, Shroyer AL, Grover FL, et al. Mortality after cardiac bypass surgery: prediction from administrative versus clinical data. Med Care 2005 Feb;43(2):149-58.
14. Bratzler DW, Normand SL, Wang Y, O'Donnell WJ, Metersky M, Han LF, et al. An administrative claims model for profiling hospital 30-day mortality rates for pneumonia patients. PLoS One 2011;6(4):e17401.
15. Gordon HS, Johnson ML, Wray NP, Petersen NJ, Henderson WG, Khuri SF, et al. Mortality after noncardiac surgery: prediction from administrative versus clinical data. Med Care 2005 Feb;43(2):159-67.
16. Hall BL, Hirbe M, Waterman B, Boslaugh S, Dunagan WC. Comparison of mortality risk adjustment using a clinical data algorithm (American College of Surgeons National Surgical Quality Improvement Program) and an administrative data algorithm (Solucient) at the case level within a single institution. J Am Coll Surg 2007 Dec;205(6):767-77.
17. Mack MJ, Herbert M, Prince S, Dewey TM, Magee MJ, Edgerton JR. Does reporting of coronary artery bypass grafting from administrative databases accurately reflect actual clinical outcomes? J Thorac Cardiovasc Surg 2005 Jun;129(6):1309-17.
18. Parker JP, Li Z, Damberg CL, Danielsen B, Carlisle DM. Administrative versus clinical data for coronary artery bypass graft surgery report cards: the view from California. Med Care 2006 Jul;44(7):687-95.
19. Ugolini C, Nobilio L. Risk adjustment for coronary artery bypass graft surgery: an administrative approach versus EuroSCORE. Int J Qual Healthcare 2004 Apr;16(2):157-64.
20. Hall BL, Hirbe M, Waterman B, Boslaugh S, Dunagan WC. Comparison of mortality risk adjustment using a clinical data algorithm (American College of Surgeons National Surgical Quality Improvement Program) and an administrative data algorithm (Solucient) at the case level within a single institution. J Am Coll Surg 2007 Dec;205(6):767-77.
21. Siregar S, Groenwold RH, Versteegh MI, Takkenberg JHH, Bots ML, van der GY, et al. Data Resource Profile: Adult cardiac surgery database of the Netherlands Association for Cardio-Thoracic Surgery. Int J Epidemiology 2013 Feb 9.
22. Jarman B, Pieter D, van der Veen AA, Kool RB, Aylin P, Bottle A, et al. The hospital standardised mortality ratio: a powerful tool for Dutch hospitals to assess their quality of care? Qual Saf Healthcare 2010 Feb;19(1):9-13.
23. Dutch Hospital Data. www.dutchhospitaldata.nl. Utrecht, The Netherlands.
24. World Health Organization. www.who.int. 2012.
25. Centraal Bureau voor de Statistiek. www.cbs.nl. Den Haag, The Netherlands.
26. Vaartjes I, Hoes AW, Reitsma JB, de BA, Grobbee DE, Mosterd A, et al. Age- and gender-specific risk of death after first hospitalization for heart failure. BMC Public Health 2010;10:637.
27. Toll DB, Janssen KJ, Vergouwe Y, Moons KG. Validation, updating and impact of clinical prediction rules: a review. J Clin Epidemiol 2008 Nov;61(11):1085-94.
28. Steyerberg EW. Clinical prediction models. New York: Springer Science+Business Media; 2009.
29. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988 Sep;44(3):837-45.
30. Redelmeier DA, Bloch DA, Hickam DH. Assessing predictive accuracy: how to compare Brier scores. J Clin Epidemiol 1991;44(11):1141-6.
31. Breslow NE, Day NE. Rates and rate standardization. In: Heseltine E, editor. Statistical Methods in Cancer Research: Vol. II - The Design and Analysis of Cohort Studies. Lyon: International Agency for Research on Cancer; 1987. p. 48-80.
32. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2011.
33. Jones RH, Hannan EL, Hammermeister KE, DeLong ER, O'Connor GT, Luepker RV, et al. Identification of preoperative variables needed for risk adjustment of short-term mortality after coronary artery bypass graft surgery. The Working Group Panel on the Cooperative CABG Database Project. J Am Coll Cardiol 1996 Nov 15;28(6):1478-87.
34. Aylin P, Bottle A, Majeed A. Use of administrative data or clinical databases as predictors of risk of death in hospital: comparison of models. BMJ 2007 May 19;334(7602):1044.
35. Zhan C, Miller MR. Administrative data based patient safety research: a critical review. Qual Saf Healthcare 2003 Dec;12 Suppl 2:ii58-ii63.



Synthesis


11 General Discussion


Chapter 11



General discussion

The objective of this thesis was to monitor safety in cardiac surgery in The Netherlands using the Adult Cardiac Surgery Database of the Netherlands Association for Cardio-Thoracic Surgery. The database has come a long way since its development only six years ago. As the only national clinical outcomes registry in The Netherlands with full coverage, this initiative should be applauded, admired and treasured. It is exactly for this reason that improvement is even more desired; the potential of this database has not been exploited to the fullest. Recapitulating the goal of the database, monitoring of safety is not achieved by merely measuring mortality rates. The process consists of several steps and involves many clinical, practical, methodological and statistical issues. The research that has led to this thesis provided knowledge on the possibilities for improvement of the database, in order to optimize the monitoring of safety in cardiac surgery. In practical terms, this can be broken down into recommendations on specific aspects of the database, which follow from the specific aims of this thesis (see Box): to investigate methods to measure and to compare safety in cardiac surgery.

In addition, valuable knowledge and experience from two exemplary registries are related to the current situation in The Netherlands. The New York State (NYS) Cardiac Surgery Reporting System (CSRS) is the oldest registry of its kind; it initiated the era of public accountability in cardiac surgery in 1991, when it published surgeon-specific and hospital-specific risk-adjusted mortality rates.(1) Since then the CSRS has continuously improved with regard to data collection, data validation, risk adjustment and many other issues. More importantly, it is one of the few clinical registries that are funded and mandated by the state. The CSRS thus has both the resources and the authority to function as a supervisory body, thereby creating favorable conditions for safety and quality monitoring.
In this General Discussion different aspects of safety and quality monitoring are reviewed, thereby comparing the situation in The Netherlands to that in NYS. Finally, suggestions are provided of how the NVT can approach these aspects in order to achieve the initial goals of the NVT database. On a hospital level, the Cleveland Clinic is one of the frontrunners of quality improvement. Recently designated as a three-star cardiac surgery program (out of three) by the Society of Thoracic Surgeons, this program has made an enormous investment to optimize data collection, data accuracy, data flow, feedback and eventually quality improvement. A close analysis of the structure of the Cleveland Clinic quality improvement program may be beneficial to all centers in The Netherlands and is for this reason included in this discussion. Close examination of these two registries led to the addition of two paragraphs: effectuate improvement and improving quality on hospital level. Although these issues have not been discussed in the studies included in this thesis, they contribute to the next step: improving safety.



Chapter 11

Box. Aspects of a national database to optimize for the purpose of safety (and quality) monitoring

Measuring safety (and quality)
    Data accuracy and completeness
    Audits
    Verification of outcomes
    Measuring quality
    Measuring mortality
    Ranking lists
Comparing safety (and quality)
    Risk factors for risk adjustment
    Hierarchical modeling
    Monitoring trends in outcomes
    Best practices versus underperformers
    Hospital-specific and surgeon-specific results
    Generic models versus intervention-specific models
Effectuate improvement
    Feedback to centers
    Outlier management
    Public accountability
    Generating knowledge
    Collaboration with other databases and registries
Improving quality on a hospital level
    A comparison with Cleveland Clinic

The New York State Cardiac Surgery Reporting System (CSRS)

The CSRS is a clinical registry that was developed by the NYS Department of Health in 1989. In-hospital mortality, 30-day mortality and risk factors for mortality after cardiac surgery are collected. Risk-adjusted mortality rates are calculated for isolated CABG, isolated valve and valve with CABG procedures, for hospitals as well as for individual surgeons. Results are published in annual reports. The Cardiac Advisory Committee (consisting of 25 cardiac surgeons and cardiologists) provides clinical guidance for reporting and analyses and serves as an advisory body to the Department of Health. The complete Cardiac Services Program (CSP) covers the CSRS, the PCI registry, the pediatric CSRS and a cardiac catheterization registry. The State Department of Health makes 10 FTE available to run the CSP. The total costs of the CSP are 1.38 million dollars, which cover the salaries of these 10 FTE. Approximately a quarter of these resources go to the CSRS. All cardiac surgery programs are obliged to have a data coordinator. Costs for the data coordinator are not covered by the Health Department.



General discussion

Measuring safety (and quality)

Data accuracy and completeness

The importance of data accuracy is evident. In order to draw credible inferences about the quality of care, the data on which these are based must be credible; misclassification of risk factors can severely affect the comparison of risk-adjusted outcomes (Chapter 9). Ideally, data are checked for accuracy and completeness in the centers, before submission to the national database. Nevertheless, in the light of the possibility of either intentional (gaming) or unintentional misclassification of variables, validation of data in the national database is required. In order to ensure high data completeness and accuracy, a data coordinator at each center is strongly desired. There should be frequent contact between the national database data coordinators and the local data coordinators.

The current situation in The Netherlands: Currently over 99% of the NVT data is complete. Data collection starts at the 16 cardiac surgery centers. In each center, one surgeon is responsible for adequate collection and submission of data. Every three months data are submitted to the central database, situated in one of the academic medical centers in The Netherlands (Academic Medical Center Amsterdam). Subsequently, error checks are run to identify out-of-range values and other forms of incorrect entries. An error report is sent to the centers and corrections to the data are requested. In addition, all centers are asked to verify the records coded with the outcome in-hospital mortality. All data are then stored for further analyses.

CSRS: This process of data collection and cleaning is to some extent comparable to that described above. However, in addition to the basic error checks, extensive effort is made to verify that the collected data are accurate. Frequent contact between the CSP staff and the data coordinators in the centers ensures that missing and out-of-range values do not occur in the database.
Data coordinators are requested to confirm key risk factors and rare or improbable values. In some cases medical files and operative notes are requested (e.g. if a case is excluded from analyses because of a preoperative state of shock) to verify the accuracy of the reported data. Every year a data coordinator meeting is organized to discuss practical issues regarding data collection. Verification of data covers a substantial part of the CSP staff activities. These efforts, combined with on-site audits, led to changes of up to 0.6% in the expected mortality rate in some hospitals.

Audits

Audits are a way to check the validity of the reported data. During an audit the medical files of a sample of the surgical cases are compared to the reported data, an activity that is both laborious and time consuming. For this reason, audits should be made as efficient as possible. They can be supported by monitoring the prevalence of risk factors and the risk profile of the audited hospital over time. In the manufacturing industry, methods have been developed to monitor variables that reflect the manufacturing process of a product, such as the length of a bolt or the content of a can of soft drink.(2;3) This encompasses a wide range of tools that are collectively described as Statistical Process Control (SPC). SPC methods can be used to monitor changes in risk factor prevalence (Chapter 5) and may identify areas of possible misclassification in the data. SPC methods allow central monitoring of overall patient risk and could potentially cut the expenses of database maintenance by maximizing the efficiency of on-site audits. Depending on the available resources, the process of on-site auditing could be outsourced to an external organization, or centrally coordinated (e.g. central sample selection) round-robin site visits could be performed by data coordinators. Whichever method is chosen, the necessity of structural on-site audits is beyond doubt.

The current situation in The Netherlands: In 2009 an audit committee (consisting of a cardiac surgeon, an epidemiologist and a data manager) was installed by the NVT to obtain insight into the data collection in participating centers, and into the completeness, uniformity and reliability of the collected data. Five centers volunteered to participate in pilot audits in 2010. A new cycle of audits was proposed, but has not been performed yet.

CSRS: Audits for the CSRS are performed by IPRO, a national not-for-profit organization that provides a variety of healthcare assessments. IPRO nurses are provided with data element definition materials and training by CSRS staff before the start of each review cycle. Hospitals are selected based on unusual risk factor reporting, time since last review and the total number of hospitals that can be reviewed given the budget. CSRS staff selects a sample of 50 cases from each selected hospital to be reviewed, mostly cases with a higher prevalence of risk factors.
Incorrect coding has major consequences; a cardiac surgery center may be requested to provide medical record documentation for every remaining case in its database with a risk factor that was found to be structurally misclassified in the sample. A center is audited approximately once every 4 years. The costs of the IPRO audits are estimated to be approximately 200,000 dollars per year for the 40 cardiac surgery centers in NYS.

Verification of outcomes

Misclassification of mortality affects risk-adjusted mortality rates. This is especially true when the incidence of the outcome is low, as is the case for mortality in cardiac surgery. Administrative data provide a means to verify outcomes such as vital status, length of stay and readmissions in a relatively easy manner. Linkage of the NVT database with administrative data sources (Chapter 10) showed 6% underreporting of in-hospital mortality in the NVT database. As this corrupts the comparison of outcomes, periodic linkage to the administrative data sources is absolutely necessary to verify that all cases are included and to verify vital status. Ideally, after verification using administrative databases, the data would also be sent back to the centers. In this way, the corrected data would also be available to the centers for their own use and local databases could be updated with the vital status of patients.

The current situation in The Netherlands: In the beginning of 2012 a subset of procedures performed in 10 out of 16 centers in the NVT database was linked to the national Death Registry and the national Discharge Registry. This study showed that annual linkage to administrative data sources is feasible. Unfortunately, information on mortality date and cause could only be sent as aggregated data, as the national Death Registry can only be used for research purposes.

CSRS: Underreporting of mortality was also experienced in the early years of the CSRS, which for that reason now uses the NYS Vital Statistics to assess vital status up to 30 days after surgery. In addition, data are linked to the State-wide Planning and Research Cooperative System (SPARCS) to ensure that all patients undergoing cardiac surgery are in the registry and that their discharge disposition is accurate. After all analyses are performed, the cardiac surgery centers receive their corrected data, including 30-day survival status.

Assessing safety and quality

Safety and quality are difficult concepts to define. In healthcare, safety is usually described as the absence of adverse outcomes. The quality of care is commonly assessed using outcomes such as mortality, complications, length of stay and residual symptoms. If outcome measures are difficult to define or to obtain, process and structure measures may function as a proxy for quality. The association between most of the available process and structure measures and outcome remains doubtful.
For example, the volume-outcome relation in CABG surgery has frequently been studied and appears to be weak.(4-8) On the other hand, the use of beta-blockers and the use of internal mammary artery grafts are proven to be associated with better long-term outcomes and could be used as a proxy for quality.(9-11) However, as there are numerous outcome measures available, the necessity for the use of process and structure measures is unclear. The process variables that are related to long-term outcomes (such as the use of beta-blockers) provide a means to help assess quality in the absence of long-term outcomes. For this reason, these variables are indeed valuable for quality assessment. As opposed to outcomes, process measures are particularly useful for quality improvement initiatives, as they are actionable. Processes can be changed in order to improve outcomes. Processes should therefore be measured so that intervention can be targeted in case of poor outcomes. In addition, processes of best practices may be related to good outcomes and could serve as an example to other centers. The following is an example of how processes may be related to poor outcomes. If a high prevalence of renal failure is observed, a center could seek to find potential suboptimal processes of care that may be related to the occurrence of this complication: in patients with preexisting poor renal function a higher arterial pressure could be maintained during surgery and the use of diuretics in the postoperative period may be adapted to this specific patient population.



The current situation in The Netherlands: Up to the end of 2012, the NVT data set included only in-hospital mortality as an outcome measure. The NVT data set currently also contains complications and several process and structure variables from the STS Quality measures.

CSRS: The CSRS uses in-hospital and/or 30-day mortality as outcome. Recently, readmissions were obtained from SPARCS. Additionally, other outcomes could be recorded in order to enable better quality assessment. However, these other outcomes, for example complications, are often difficult to audit. In the CSRS, process and structure variables are not used for the purpose of benchmarking. Instead, they are used for the subsequent step: processes and structures of best practices and underperformers are studied and used to identify possible relations with the quality of care. These results can then be used by other centers in order to improve their own structures and processes. For example, the mortality of a center identified as a high-mortality outlier appeared to be mainly attributable to emergency cases. Further investigations into the process of care for emergency patients revealed that patients were not sufficiently stabilized before going to the operating room. Changes were made to the stabilization process of emergency patients and mortality rates dropped spectacularly.

Measuring mortality

Mortality is the most commonly used outcome measure in cardiac surgery. The fact that it is unambiguous and relatively easy to determine makes it an appealing measure for outcomes evaluation. Mortality after cardiac surgery is often measured as in-hospital mortality, 30-day mortality or operative mortality. In-hospital mortality rates depend on the postoperative transfer policy of patients to other healthcare facilities or back to the referring hospitals. The fact that the moment of transfer is at the discretion of providers leaves room for "gaming" of results.
These problems can be avoided by using 30-day mortality. However, the 30-day period for measurement of cardiac surgery-related mortality is shown to be an arbitrary cut-off value. Early mortality continues to decline far beyond 30 days and stabilization of the hazard after cardiac surgery occurs after approximately 120 days (Chapter 6). The question that needs to be answered is what it is that needs to be measured; 30-day and one-year mortality might each reflect different processes. Patient compliance with medication, quality of home care, extent of involvement of the cardiologist and many other factors have an increasing effect on the risk of mortality after discharge. In contrast, the effect of the initial care provided around the intervention in the hospital is likely to decrease over time. Comparison of mortality should cover as much of the surgery-related mortality as possible, while keeping the effect of other factors limited. Considering the findings of Chapter 6, surgery-related mortality can be evaluated from approximately 120 days postoperatively.

The current situation in The Netherlands: The NVT collects in-hospital mortality. Currently, assessment of survival status after discharge by the individual centers does not seem feasible. However, the linkage to administrative data sources in 2012 showed that these registries provide a means to obtain this information.



CSRS: The CSRS recognizes that surgery-related mortality continues beyond the arbitrary cut-off of 30 days. Therefore, in addition to in-hospital and 30-day mortality, CSRS is adding one-year mortality to the analyses shortly.
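
As a small illustration of how the choice of follow-up window changes the reported mortality rate, the sketch below computes cumulative mortality at 30, 120 and 365 days for a toy cohort. The function name `mortality_within`, the data layout (days from surgery to death, or `None` for patients alive at the end of follow-up) and the cohort itself are assumptions made for this example and are not taken from the NVT or CSRS data. Complete follow-up through each window is assumed.

```python
from typing import Optional, Sequence


def mortality_within(days_to_death: Sequence[Optional[int]], window_days: int) -> float:
    """Cumulative mortality within `window_days` after surgery.

    Each entry is the number of days between surgery and death, or None
    if the patient was alive at the end of follow-up. Assumes follow-up
    is complete for the whole window (no censoring before `window_days`).
    """
    if not days_to_death:
        raise ValueError("cohort is empty")
    deaths = sum(1 for d in days_to_death if d is not None and d <= window_days)
    return deaths / len(days_to_death)


# Hypothetical cohort of 10 patients (illustrative values only).
cohort = [None, None, None, None, None, None, 5, 20, 45, 200]

for window in (30, 120, 365):
    print(f"{window}-day mortality: {mortality_within(cohort, window):.0%}")
```

In this toy cohort the deaths at day 45 and day 200 are not counted by a 30-day window, which is the kind of difference that a longer follow-up period is meant to capture.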

Comparing safety (and quality)

Risk factors for risk adjustment

Risk adjustment is essential for the comparison and evaluation of outcomes. In Europe, the most commonly used risk model is the EuroSCORE. As shown in Chapter 7, both the logistic and the additive EuroSCORE greatly overestimate the risk of mortality (poor calibration), although the discrimination of the model has remained good over the years. The EuroSCORE should therefore not be used in its original form, as its calculation of expected risks is inadequate; an annual update of the model on new data is essential. Independent of which risk model is used, risk adjustment can be performed with a limited number of variables. For isolated CABG surgery, it has previously been shown that the addition of variables has a limited effect on risk adjustment in models that already include 7 risk factors.(12) A second reason for using a limited number of variables is that an exhaustive data set could overburden hospitals. Considering the limited resources for data collection, there is an actual risk of overburdening the cardiac surgery centers in The Netherlands. The consequences could be considerable: it could affect the completeness of the data and, even more troublesome, it could endanger the voluntary participation of some hospitals in the database. Therefore, restriction of the number of data elements is desired and further expansion of the data set is discouraged.

The current situation in The Netherlands: The NVT data set includes EuroSCORE risk factors.

CSRS: The CSRS has a relatively small data set when compared to the STS. Only risk factors that are thought to be relevant for risk adjustment are collected, to limit the burden of data collection.

Ranking lists

Ranking lists are an imprecise statistical method to report cardiac surgery mortality rates (Chapter 8). Unfortunately they remain appealing for media and patients, as one can see at a glance which center supposedly performs best and which performs poorest.
Professional societies currently hardly ever use ranking lists. A good alternative to ranking lists is the comparison against a common benchmark, also called benchmarking. With mortality as the outcome measure, this means that a hospital can have a mortality that is higher than, lower than or not significantly different from the benchmark. A three-star system, such as that used by the Society of Thoracic Surgeons, simplifies reporting to make it interpretable to all who are less acquainted with benchmarking.
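
To illustrate what comparison against a common benchmark amounts to, the sketch below classifies a hospital's mortality as higher than, lower than or not significantly different from the benchmark. It is a deliberately simplified version: expected deaths are the sum of the patients' predicted risks from some risk model, and a normal approximation is used for the test. This is not the hierarchical approach applied by the NVT, and the function name `benchmark_hospital`, the 1.96 cut-off and the example figures are assumptions made for illustration only.

```python
import math
from typing import Dict, Sequence, Union


def benchmark_hospital(observed_deaths: int,
                       predicted_risks: Sequence[float],
                       z_cutoff: float = 1.96) -> Dict[str, Union[float, str]]:
    """Compare observed deaths with the number expected under the benchmark.

    `predicted_risks` holds one predicted probability of death per patient,
    taken from a benchmark risk model (hypothetical here).
    """
    if not predicted_risks:
        raise ValueError("no patients supplied")
    expected = sum(predicted_risks)
    # Variance of a sum of independent Bernoulli outcomes.
    variance = sum(p * (1.0 - p) for p in predicted_risks)
    if variance <= 0.0:
        raise ValueError("at least one predicted risk must lie strictly between 0 and 1")
    z = (observed_deaths - expected) / math.sqrt(variance)
    if z > z_cutoff:
        label = "higher than benchmark"
    elif z < -z_cutoff:
        label = "lower than benchmark"
    else:
        label = "not significantly different from benchmark"
    return {
        "expected": expected,
        "oe_ratio": observed_deaths / expected,  # observed / expected deaths
        "z": z,
        "label": label,
    }


# Hypothetical hospital: 1000 patients, each with a predicted risk of 2%.
risks = [0.02] * 1000
for deaths in (8, 20, 35):
    result = benchmark_hospital(deaths, risks)
    print(deaths, round(result["oe_ratio"], 2), result["label"])
```

The three categories returned here correspond to the three possible benchmark results described above; unlike a ranking list, the output says nothing about the order of hospitals within a category.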


The current situation in The Netherlands: The NVT compares hospitals against a common benchmark and strongly discourages the use of ranking lists.

CSRS: Rankings and league tables are not used in the CSRS. To avoid even the slightest suggestion of presenting ranks, hospitals are ordered alphabetically by name instead of by mortality rate.

Hierarchical modeling

When mortality rates are compared, a distinction must be made between variability caused by systematic differences (between-hospital variability) and variability caused by chance (within-hospital variability). If this chance variability is not taken into account, differences between hospitals are exaggerated and do not reflect the true between-hospital variability. Hierarchical modeling and Bayesian modeling account for the statistical impact of clustering of patients within hospitals, which relates to within-hospital variability. Their use for the comparison of outcomes has been advocated by many.(11;13-15) In hierarchical modeling, estimates of small volume hospitals are moved towards the mean. The advantage is that differences between hospitals are not likely to be over-interpreted, as may be the case in regular logistic regression models. However, it also makes persistent high mortality rates in small volume hospitals more difficult to detect as outliers when compared to regular logistic regression models. Nevertheless, it remains doubtful whether this justifies the use of models that exaggerate between-hospital differences.

The current situation in The Netherlands: The NVT applies hierarchical models for the comparison of outcomes across the 16 cardiac surgery centers. There are 16 cardiac surgery programs serving approximately 16 million inhabitants in The Netherlands. Thus, low-volume cardiac surgery programs do not exist in the current Dutch health system. In addition, benchmarking is performed on surgery performed during a three-year period, thus aggregating more data for analyses.

CSRS: The CSRS uses regular (i.e.
fixed effects) logistic regression models, because it is believed that the philosophy behind hierarchical modeling does not fit the purpose of benchmarking. The main concern is that hierarchical modeling is meant to predict future results, as opposed to explaining the variation seen in the present data. Furthermore, the assumption that the center effects of hospitals originate from one underlying distribution may not always be true. Moreover, NYS has 40 cardiac surgery programs for approximately 20 million inhabitants. Small volume hospitals with high mortality rates might remain undetected if hierarchical modeling were used.

How to monitor trends in outcomes

In addition to the benchmarking procedure once a year or once every two or three years, outcomes can be monitored continuously. Continuous monitoring allows early detection of changes in the outcome rate. Statistical process control (SPC), as described previously, comprises statistical methods that allow the monitoring of a certain process, such as the outcome rate. The most commonly used SPC method is the cumulative sum (CUSUM) graph. Studies using CUSUM techniques to monitor outcomes have shown that the method can detect increased mortality rates earlier than standard statistical techniques.(16) Other forms of SPC that can be used for outcomes monitoring are the Shewhart chart, moving average charts and exponentially weighted moving average charts. The use of risk-adjusted mortality rates (calculated with models of the previous year) and SPC methods will aid the interpretation of mortality trends over time and is recommended in any outcomes database.

The current situation in The Netherlands: The NVT constructs CUSUM graphs to monitor in-hospital mortality. These CUSUM graphs are risk-adjusted using the original EuroSCORE. The use of the EuroSCORE in its original form is not appropriate for this purpose, as both the logistic and the additive EuroSCORE greatly overestimate the risk of mortality (see discussion on risk adjustment). "Continuous" monitoring in practice comes down to a quarterly evaluation of outcomes, as data are sent to the NVT database on a quarterly basis.

CSRS: The CSRS performs analyses on crude mortality after every quarterly data transfer. This information is sent to the centers. Quarterly alert letters are sent to cardiac surgery centers with a crude mortality rate more than 2.5 times the last reported statewide crude mortality rate. Risk adjustment models of previous years are used to calculate quarterly risk-adjusted mortality rates on request.

Best practices or underperformers?

Benchmarking can be used to identify both underperformers and best practices.
The identification of hospitals with a high adverse outcome rate is necessary because it may reflect a potentially hazardous situation, which could then be intervened upon. The identification of hospitals with a low adverse outcome rate is also desired, as this may reflect a best practice. The exact underlying mechanism of either underperformance or best practice cannot be directly deduced from the outcomes. Further investigations on processes of care may provide insight into potential causes of underperformance or best practice. As quality and safety are such complex concepts to define, identifying the best cardiac surgery centers for the overall quality or safety of care should be avoided. Rather, identifying best practices for specific outcomes (e.g. deep sternal wound infection, neurological complications or readmissions) seems more appropriate and more useful.

The current situation in The Netherlands: The NVT database was built to safeguard the quality of cardiac surgical care in The Netherlands. It follows from this perspective that the focus was on identifying underperformers, i.e. potentially dangerous situations. However, all centers could be driven to improve quality by comparing results to best practices.


CSRS: The CSRS focuses on both best practices and underperformers. As discussed in a previous section, the approach of the CSRS is to study the processes and structure of care in best practices and underperformers in order to improve all cardiac surgery centers.

Hospital-specific or surgeon-specific results?

Within many cardiac surgery centers outcomes are evaluated for the whole department and by surgeon. Some methodological concerns have been raised about performing analyses by surgeon. Lower numbers lead to less precision when compared to results on hospital level. In addition, a surgeon's case mix might be very specific, which makes comparison of surgeon-specific outcomes more difficult than comparison of hospital-specific outcomes. As outcomes are not only attributable to the surgical skills of a surgeon, but to the quality of care provided during the complete hospital stay (and even in the pre-admission and post-discharge stage), the value of surgeon-specific outcomes should not be overestimated. Past examples of underperformance in The Netherlands have shown that processes of care were the key causes, as opposed to the performance of individual surgeons.(17)

The current situation in The Netherlands: The NVT does not collect surgeon-specific outcomes. There is little, if any, support among surgeons for analyses of surgeon-specific results.

CSRS: In NYS, surgeon-specific reporting magnifies the strengths and weaknesses of public reporting of risk-adjusted mortality rates. All the limitations of benchmarking due to small numbers are magnified when benchmarking is performed on surgeon level. To account for this, only results of surgeons who performed more than 200 cases per year are released in NYS. In addition, limitations of the risk adjustment method are magnified when applied on surgeon level, because some surgeons might almost exclusively perform one type of surgery.
To account for this, the CSRS frequently evaluates new risk factors for the risk adjustment model and new criteria for exclusion (e.g. cardiogenic shock and cardiac arrest). The Cardiac Advisory Committee provides clinical recommendations in case a potential unaccounted risk factor is identified that might have affected the results.

Generic models or intervention-specific models?

The EuroSCORE is a generic model that can be applied to all cardiac surgery. Although the simplicity might seem convenient at first sight, it is not the preferred method for benchmarking. The concept of analyzing completely different types of interventions in one model seems inappropriate.(11;18-20) For example, a risk factor could have no effect or a much stronger effect on mortality in isolated CABG procedures than in valve procedures. Therefore, all future risk models should be intervention specific.


The current situation in The Netherlands: The NVT collects variables according to the EuroSCORE definitions. The comparison of outcomes in the NVT database is currently performed in three intervention categories: isolated CABG, isolated aortic valve replacement (AVR) and AVR combined with CABG. Using the EuroSCORE risk variables, different models are fitted in each of these categories. Approximately one third of all cardiac surgery is excluded from the intervention categories applied in the NVT database. This "other" category comprises a variety of interventions, ranging from CABG plus MAZE to triple valve and thoracic aortic surgery. Subdivision of this category (specifying for example mitral valves and multiple valves) would provide more insight into this heterogeneous group.

CSRS: The CSRS uses different models for isolated CABG, isolated valve and valve with CABG surgery. A generic model is strongly discouraged in NYS, as such a model would be based on incomparable types of interventions.
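
The continuous monitoring of outcomes discussed earlier in this section can be made concrete with a risk-adjusted CUSUM. The sketch below uses one common log-likelihood-ratio formulation, in which each patient adds a weight that depends on the predicted risk and the observed outcome, and a signal is raised when the cumulative score crosses a threshold. The predicted risks would come from a risk model of a previous year; the odds ratio of 2 to be detected, the threshold values and the function name `risk_adjusted_cusum` are hypothetical choices for this example, not the settings used by the NVT.

```python
import math
from typing import List, Optional, Sequence, Tuple


def risk_adjusted_cusum(outcomes: Sequence[int],
                        predicted_risks: Sequence[float],
                        odds_ratio: float = 2.0,
                        threshold: float = 4.5) -> Tuple[List[float], Optional[int]]:
    """Risk-adjusted CUSUM for detecting an increase in the odds of death.

    outcomes        -- 1 if the patient died, 0 otherwise, in order of surgery
    predicted_risks -- predicted probability of death for each patient
    odds_ratio      -- increase in odds the chart is tuned to detect (assumed)
    threshold       -- control limit at which a signal is raised (assumed)

    Returns the CUSUM path and the index of the first patient at which the
    path reaches the threshold (None if it never does).
    """
    if len(outcomes) != len(predicted_risks):
        raise ValueError("outcomes and predicted_risks must have equal length")
    path: List[float] = []
    score = 0.0
    first_signal: Optional[int] = None
    for i, (died, p) in enumerate(zip(outcomes, predicted_risks)):
        if not 0.0 < p < 1.0:
            raise ValueError("predicted risks must lie strictly between 0 and 1")
        denom = 1.0 - p + odds_ratio * p
        # Log-likelihood ratio of "odds multiplied by odds_ratio" versus "odds as predicted".
        weight = math.log(odds_ratio / denom) if died else math.log(1.0 / denom)
        score = max(0.0, score + weight)  # the chart never drops below zero
        path.append(score)
        if first_signal is None and score >= threshold:
            first_signal = i
    return path, first_signal


# Hypothetical series: two survivors followed by four deaths, all with 5% predicted risk.
example_path, example_signal = risk_adjusted_cusum(
    [0, 0, 1, 1, 1, 1], [0.05] * 6, threshold=2.0)
print([round(s, 2) for s in example_path], example_signal)
```

Because the weights depend on the predicted risk, the death of a low-risk patient raises the chart more than the death of a high-risk patient. If the predicted risks are systematically too high, as described above for the original EuroSCORE, the chart rises more slowly than it should, which illustrates why an adequately calibrated model is needed for this purpose.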

Effectuate improvement

Feedback to centers

Measuring and reporting of outcomes and processes raises awareness of potential underperformance and identifies possibilities for improvement. This was acknowledged by the NVT in 2007 when the database for adult cardiac surgery was founded. However, improvement requires a target, which can be specified using the data. This fundamental element of feedback can be found in nearly every other registry for quality improvement.(21-24) Feedback should be standardized, by reporting outcomes and the prevalence of risk factors and preferably process variables. Equally important, periodic reporting of benchmark results on all outcomes is essential. Centers should be informed of their performance in comparison to a national benchmark. After all, a center cannot be expected to improve if it is unaware of its poor outcomes. A second point that needs to be addressed is the focus on actionable data. The provided feedback should be used to identify certain process measures to intervene on. Actionable data will help centers in closing the loop of quality improvement, by providing direction as to where changes should be made. Timely and actionable feedback on outcomes allows for focused intervention and is an absolute necessity for effective quality improvement.

Current status in The Netherlands: Every three months a committee consisting of one fixed representative of each center assembles to discuss reported data and results. At this meeting every center receives its own crude mortality rate, EuroSCORE and prevalence of risk factors, as well as the national results to compare with. The mortality rates are not shared among peer centers. The NVT believes that frequent feedback and evaluation of results is the driving force behind quality improvement.


CSRS: Every quarter the CSRS sends reports to the centers informing them of their risk profile and crude mortality rate in comparison to the statewide results. In addition, quality improvement would be encouraged if centers were also provided with feedback on their processes and structures in comparison to those of the centers identified as best practices. Risk-adjusted mortality rates are published in an annual report, including hospital and surgeon names.

What to do with outliers?

Each situation of potential underperformance should be carefully appraised. Letters should be sent to notify the particular center of its potential underperformance. However, feedback alone does not suffice; improvement can only be made if adequate actions are taken. Further investigations by the center itself might be desired and, in case of persisting outlier status, an on-site visit by external experts might be needed. The decision on when and how to act is difficult, but cardinal. An advisory committee could serve to increase objectivity and consistency on delicate issues such as outlier management. This committee, consisting of experienced and respected experts, could be consulted to evaluate problems varying from outlier management to risk adjustment issues, and could tailor the approach to each specific situation. This advisory body should be recognized by all NVT members and the Inspectorate of Health.

Current status in The Netherlands: The NVT has only recently performed benchmarking of mortality rates. It is currently seeking suitable methods for outlier management.

CSRS: The CSRS has no predefined outline of the procedure for centers and surgeons identified as high-mortality outliers. Persistent outlier status triggers on-site visits by the Cardiac Advisory Committee (CAC) to investigate possible reasons for poor outcomes. The CAC reports its findings to the Department of Health, which then decides upon the appropriate actions to be taken.
This could include recommendations on how to improve processes, a change of certain members of the staff, and on rare occasions temporarily closing a center for all cardiac surgery or for specific types of interventions. Quality improvement in NYS is to a great extent effectuated by the actions taken upon high mortality rates. This underlines the importance of adequate outlier detection.

Public accountability

Besides the feedback to the centers, some national registries (Sweden) and some US states (New York State, California, Pennsylvania, Massachusetts, New Jersey) publicly release risk-adjusted results on hospital level and, for some, surgeon level. Important arguments in favor of public accountability include the right of patients to be informed of the results in healthcare and the possible market forces that might benefit the quality of provided care.(25) Possible disadvantages are risk-averse behavior, "gaming" of risk factors and outcomes, and neglect of non-measured outcomes.


General discussion

It is presumed that healthcare providers have an intrinsic desire to improve the quality of care. However, the history in the US has shown that most existing quality improvement programs were either initiated or accelerated when other strong incentives arose. Proper data collection, data validation and adequate risk adjustment were only valued after it became clear that the data and results were going to affect the reputation, the financial status and even the mere existence of a cardiac surgery program. Currently, the merits of data collection and outcomes reporting are reflected by the resources made available for these purposes.

Current status in The Netherlands: the NVT recently published an outcomes booklet, including the anonymized benchmarking results of the 16 cardio-thoracic surgery centers.

CSRS: In 1986 the Health Care Financing Administration (HCFA) in NYC, the predecessor of the Centers for Medicare & Medicaid Services (CMS), publicly reported mortality rates that were adjusted for patient risk using administrative data (created for reimbursement). Many criticized that administrative data would not suffice for this purpose, and clinical registries were created in reaction. Although public release caused much commotion among cardiac surgeons, some admitted that they would not have taken results seriously if they had not been published. New York State reports show that risk-averse behavior has not been a problem after the implementation of the public report cards, as the expected mortality in NYS hospitals remained unchanged.(26;27) Nowadays, the feared negative aspects of public reporting seem to have become less relevant as the attentiveness to the published results has faded.

Generating knowledge
A database is a well of information that should be used not only for benchmarking, but also for quality improvement and outcomes evaluation in its broadest sense.
The collected data can be used to generate knowledge and provide answers to questions on, for example, the safety of new surgical techniques or products. For research purposes, not only early outcomes, but also long-term follow-up of patients is much desired. On a national level this can be accomplished using administrative data sources. Current status in The Netherlands: To serve the initial purpose of the database, research was first aimed at optimizing the methodological issues regarding comparison of outcomes and benchmarking. However, soon clinical issues were also addressed. The database has for example been used to investigate the survival after CABG and valve surgery and will in the near future be used to study gender differences in cardiac surgery. CSRS: The CSRS has extensively been used to generate knowledge in order to improve quality. An example among many published studies on CSRS data is a study on the relation of body temperature and outcomes.(28)



Chapter 11

Collaboration with other databases/registries
In Europe, the European Association for Cardio-Thoracic Surgery (EACTS) should be applauded for its efforts in developing a European adult cardiac surgery database. Transnational databases encounter many issues, including the legal aspect of communicating personal information, financial support and data ownership. The NVT is in the phase of consolidating knowledge, effort, experience and resources to improve and maximize the use of its own database. Therefore, for the purpose of quality improvement in The Netherlands, the first priority should be to adequately benchmark and improve the quality of care in cardiac surgery centers in The Netherlands. Also, incomparable healthcare systems and varying quality of data make the advantage of benchmarking on a European level questionable. However, for the purpose of generating knowledge, collaboration of databases would undoubtedly give many advantages. Taking the above into consideration, a logical first step on a European level would be to align definitions and outcomes across databases. A European (and preferably a global) consensus on a uniform core dataset for CABG and valve surgery would increase the opportunities for generating knowledge.

Current status in The Netherlands: In the past, some cardiac surgery centers in The Netherlands contributed data to the EACTS database. Currently, the EACTS database is being reviewed and pilots are planned for the new version of the database. Discrepancies between the EACTS data set and that collected for the NVT caused inconvenience and may have been a reason for non-participation in the past years.

CSRS: In New York State, participation in the CSRS is compulsory for all non-federal cardiac surgery programs. Some NYS hospitals also report to the nationwide STS database.
The CSRS has many definitions that are identical to those of the STS, and has aligned other definitions with the STS data set to facilitate data collection and reporting. In NYS, within-state benchmarking of 40 cardiac surgery programs is regarded as sufficient to improve the quality of care. The added value and the validity of nationwide benchmarking are questioned, especially because data quality in other states might vary from that in the CSRS. Data validation efforts in the CSRS are exceptional, as is the linking to National Vital Statistics data.

Improving quality on a hospital level

The Cleveland Clinic Heart and Vascular Institute
The Heart and Vascular Institute (HVI) covers all care provided for heart and vascular conditions at Cleveland Clinic, both surgical and medical. Programs include cardiovascular surgery, thoracic surgery, congenital heart surgery, vascular surgery, cardiac electrophysiology and


pacing, cardiovascular imaging, clinical cardiology, heart failure and cardiac transplant medicine, invasive cardiology and many more. The HVI registry comprises multiple registries, among which are clinical registries for adult cardiac surgery, pediatric cardiac surgery and transcatheter aortic valve replacement. HVI submits data to the Society of Thoracic Surgeons (STS), the STS-EACTS congenital heart surgery registry and the transcatheter valve implantation (TVT) registry. The HVI performed over 4,000 cardiac surgery interventions in 2012.

Registry structure
HVI has 4 nurse abstractors who abstract data from electronic medical records. A quality manager runs regular error checks on the data before it is used for further analyses or submitted to the STS. In addition, 10% of all records are audited on key variables by another person within the HVI (internal validation). The HVI adult cardiac surgery registry has not yet been through an external validation of data by the STS. The STS uses an independent audit firm and has now audited approximately 5% of all adult cardiac surgery registry participants. In-hospital mortality and complications are collected. Thirty-day mortality is difficult to obtain, as vital status cannot be retrieved from the National Vital Statistics System or the Social Security Death Master Index. The final data is checked against the hospital surgery schedule and billing data, to ensure all interventions are reported. The annual costs of all the HVI registries combined are estimated to be between 2 and 3 million dollars. This includes the salary for 22 FTE registry personnel.

Direct quality improvement within the center
The HVI registries are used to improve the quality of care by direct utilization of data to intervene on patient-level and on process-level. An example of the first is the early follow-up of patients. Cardiac surgery patients used to be contacted 2 and 10 days after discharge to enquire about their clinical situation.
These fixed intervals for follow-up were changed into 1 and 6 days after discharge, when data showed that most readmissions occurred in the first week, peaking at the second and third day. By expediting the follow-up moments, complications could be detected in an earlier stage. Also, when unintentional noncompliance to medical treatment is identified during data abstraction 1 day after discharge, patients are contacted to ensure the proper medical treatment is prescribed. In addition to using feedback to intervene on patient-level care, it is also used to intervene on process-level. For example, physicians are provided with weekly emails reporting on Patient Safety Indicators (specific complications potentially linked to revenues) and medication non-compliance. Monthly emails are sent to the physicians to report on complications and mortality. Timely and actionable data feedback to clinicians has helped them focus on the day-to-day processes that need to be improved. Also, monthly aggregate reports


are presented to the Quality Review Officers (physicians) of sections, departments and the HVI. Quality reports are discussed annually by the Executive Council of Cleveland Clinic. The philosophy of this system is that feedback on results should be provided to the whole chain of personnel. After all, leadership facilitates process changes, while physicians and other clinical personnel drive and implement process changes. The current approach to quality improvement using timely and actionable data started approximately two years ago. Back then, Cleveland Clinic was rated a two-star (out of three stars) hospital in the STS composite quality rating. The quality improvement initiative started with a focus on the use of beta blockers in isolated CABG patients. After that other process measures were scrutinized and improved. Soon, the focus on outcomes and processes led to a culture change: the value of measuring and improving quality using data was finally acknowledged. The focus on outcomes has helped the Cleveland Clinic cardiac surgery program prosper as never before: the adverse outcomes of cardiac surgery have decreased spectacularly and the cardiac surgery program is now rated with three stars. Generating knowledge The clinical registries are extensively used for research purposes. Currently HVI is developing software to integrate data from various data sources in the hospital, ranging from catheter laboratory systems, surgical administration systems, billing administration and so on, in order to form one database. One of the challenges is the merger of data that could be inconsistent or even contradictory. Another great challenge is to store data in such a way that information can be deduced from already collected data. This comprehensive database could be used for both reporting and research. This project is currently under construction. The HVI research department is one of the leading research groups in cardiac surgery. 
Again, clinical data is not only collected, it is actively used to generate knowledge.

Public accountability
Cleveland Clinic publishes its outcomes in annual Outcomes booklets and is one of the few hospitals that publish their STS results. There is an increasing tendency towards public accountability in healthcare in the United States: rankings of best performing hospitals or programs are published by magazines, newspapers and websites (e.g. US News, USA Today, healthgrades.com). They are highly regarded by the public and by health insurance companies, which produces a great financial incentive for data collection, accuracy of data and quality improvement. Not only do rankings directly affect consumers in their choice of a healthcare provider, CMS and insurance companies also use these public rankings to determine their policy on reimbursements and which hospitals to contract. Consequently, the items that are scored in the ranking systems, which often include STS quality measures, are scrutinized.


Concluding remarks: eating the elephant one forkful at a time
Having compared the NVT database to the quality improvement programs in the US, it seems that the financial burden strongly limits the NVT database. The NVT receives financial support from the Dutch Department of Health to cover the costs of analyses and research aimed at quality improvement. However, all costs of on-site data collection and storage are borne by the centers. Resources are limited and most centers cannot afford a data manager. In the US, the constant scrutiny of outcomes and processes is likely to have contributed to the dramatic decrease in mortality and other complications, which in turn led to lower expenses and an increase in the value of cardiac surgery programs. The seemingly excessive amount of resources put into data collection, validation and feedback appears to be a sagacious investment. This very important aspect of quality programs should be emphasized to policy makers in healthcare: quality improvement requires a substantial investment of resources, but it will, as history shows, eventually lead to great financial benefits. One of the lessons to be learned from the arduous journey US hospitals have been through is that additional stimuli are needed to achieve scrutiny of outcomes. Although the situation in the US may not be directly applicable to The Netherlands, the scrutiny of outcomes is much desired there as well. Firstly, the quality of collected data may benefit greatly from an increased interest in the results generated from them. Although data quality does not directly improve the quality of healthcare, it is crucial to the success of a database that is used for quality improvement. After all, if the data are not credible, then neither are the results generated from the data. In addition, there is another important reason why the NVT should seize the opportunity to increase data accuracy at once.
Currently, there is a great discrepancy between the demand for reliable outcomes data and its availability. The government, healthcare insurance companies, the media and healthcare consumers are forced to turn to data sources that we, as clinicians, claim to be inferior, as long as no alternative is provided. When the time arrives for public accountability, this should be based on data that adequately represent the care delivered by the 16 cardiac surgery centers. The NVT can meet this need. More importantly, the quality of care may benefit greatly from scrutiny of outcomes and processes related to outcomes. In the US, the publication of results, in particular dissatisfaction over the results, forced healthcare providers to take this step. It is unknown what is needed to initiate the same effect in The Netherlands. Following the US, the notion that quality improvement must be driven by public accountability or financial incentives is increasingly adopted in The Netherlands as well. For example, the Hospital Standardized Mortality Ratios have recently been published, hospital rankings are increasingly popular and


pay-for-performance arrangements have already been implemented by the government. The NVT has the opportunity to anticipate future developments in healthcare by creating its own incentive for quality improvement. The question of which incentive is most suitable or most powerful cannot be answered at this time. It could be that mere feedback on outlier status (and dissatisfaction over outlier status) suffices, but it is more likely that peer accountability (or even public accountability) is required before cardiac surgeons start acting on their results. Besides suggestions to optimize the monitoring of cardiac surgery safety in The Netherlands, this thesis also exposed the gaps in our present-day knowledge. Currently, every unsafe situation must be carefully studied to identify processes of care that could have contributed to the hazardous situation. An exemplary clinical registry could detect underperformance, but we may still be in the dark about where the danger lies. The reason for this is that crucial processes of care related to important outcomes in cardiac surgery have yet to be identified. Currently in CABG surgery the only operative care process measure that is associated with outcome is the use of single or multiple internal mammary arteries (IMA).(29;30) Specific preoperative and postoperative medications have been identified as being beneficial in patients who have undergone CABG.(31-36) However, there are still many process and structure variables in cardiac surgical care that may have a large impact on short-term and long-term outcomes. An example of processes of care that together, as a package, have a significant impact on surgical mortality rates are the items that comprise the SURPASS checklist.
The nearly 100 items on the checklists address the availability of imaging information, equipment and materials, patient and operative-site verification, communication of postoperative instructions between caregivers, and discharge instructions.(37) Implementation of SURPASS led to a decrease in the proportion of patients with one or more complications (from 15.4% to 10.6%) and a decrease in mortality rate (from 1.5% to 0.8%).(38) In cardiac surgery, even after risk adjustment, mortality after isolated CABG shows variability in The Netherlands. This suggests that there is room for improvement, but the knowledge on what exactly to improve is apparently lacking. Future research should therefore be aimed at identifying high-leverage process measures in cardiac surgery. These efforts can then be translated into collaborative quality and safety improvement efforts in all cardiac surgery centers.

To conclude, the NVT database has grown tremendously and continues to evolve; it is now time to use the collected data and actually effectuate changes. The task of closing the loop of quality improvement may seem as difficult as eating an elephant. Where to start? The lesson I recently learned is to eat the elephant one forkful at a time. Let us prioritize goals. Let us eat the first forkful by using the data, despite its imperfections, to act on potentially unsafe situations. The rest will follow.



Reference List

1. Hannan EL, Kilburn H Jr, O'Donnell JF, Lukacik G, Shields EP. Adult open heart surgery in New York State. An analysis of risk factors and hospital mortality rates. JAMA 1990 Dec 5;264(21):2768-74.
2. Montgomery DC. Introduction to statistical quality control. 3rd ed. United States: John Wiley & Sons, Inc.; 1997.
3. Oakland JS. Statistical process control. 5th ed. Burlington, MA: Elsevier Butterworth-Heinemann; 2003.
4. Birkmeyer JD, Siewers AE, Finlayson EV, Stukel TA, Lucas FL, Batista I, et al. Hospital volume and surgical mortality in the United States. N Engl J Med 2002 Apr 11;346(15):1128-37.
5. Peterson ED, Coombs LP, DeLong ER, Haan CK, Ferguson TB. Procedural volume as a marker of quality for CABG surgery. JAMA 2004 Jan 14;291(2):195-201.
6. Shahian DM, Normand SL. The volume-outcome relationship: from Luft to Leapfrog. Ann Thorac Surg 2003 Mar;75(3):1048-58.
7. Shahian DM. Improving cardiac surgery quality--volume, outcome, process? JAMA 2004 Jan 14;291(2):246-8.
8. Shroyer AL, Marshall G, Warner BA, Johnson RR, Guo W, Grover FL, et al. No continuous relationship between Veterans Affairs hospital coronary artery bypass grafting surgical volume and operative mortality. Ann Thorac Surg 1996 Jan;61(1):17-20.
9. Berwick DM. Public performance reports and the will for change. JAMA 2002 Sep 25;288(12):1523-4.
10. Landon BE, Normand SL, Blumenthal D, Daley J. Physician clinical performance assessment: prospects and barriers. JAMA 2003 Sep 3;290(9):1183-9.
11. Shahian DM, Normand SL, Torchiana DF, Lewis SM, Pastore JO, Kuntz RE, et al. Cardiac surgery report cards: comprehensive review and statistical critique. Ann Thorac Surg 2001 Dec;72(6):2155-68.
12. Tu JV, Sykora K, Naylor CD. Assessing the outcomes of coronary artery bypass graft surgery: how many risk factors are enough? Steering Committee of the Cardiac Care Network of Ontario. J Am Coll Cardiol 1997 Nov 1;30(5):1317-23.
13. Goldstein H, Spiegelhalter DJ. League tables and their limitations: statistical issues in comparisons of institutional performance. Journal of the Royal Statistical Society Series A (Statistics in Society) 1996;159(3):385-443.
14. Normand SL, Glickman ME, Gatsonis CA. Statistical methods for profiling providers of medical care: issues and applications. Journal of the American Statistical Association 1997 Sep;92(439):803-14.
15. Thomas N, Longford NT, Rolph JE. Empirical Bayes methods for estimating hospital-specific mortality rates. Stat Med 1994 May 15;13(9):889-903.
16. Novick RJ, Stitt LW. The learning curve of an academic cardiac surgeon: use of the CUSUM method. J Card Surg 1999 Sep;14(5):312-20.
17. Een tekortschietend zorgproces. Een onderzoek naar de kwaliteit en veiligheid van de cardiochirurgische zorgketen voor volwassenen in het UMC St Radboud te Nijmegen. Inspectie voor de Gezondheidszorg; 2006 Apr 24.
18. Hannan EL, Wu C, Bennett EV, Carlson RE, Culliford AT, Gold JP, et al. Risk index for predicting in-hospital mortality for cardiac valve surgery. Ann Thorac Surg 2007 Mar;83(3):921-9.
19. Hannan EL, Racz M, Culliford AT, Lahey SJ, Wechsler A, Jordan D, et al. Risk Score for Predicting In-Hospital/30-Day Mortality for Patients Undergoing Valve and Valve/Coronary Artery Bypass Graft Surgery. Ann Thorac Surg 2013 Jan 24.
20. van GM, Kappetein AP, Steyerberg EW, Venema AC, Berenschot EA, Hannan EL, et al. Do we need separate risk stratification models for hospital mortality after heart valve surgery? Ann Thorac Surg 2008 Mar;85(3):921-30.
21. Swedeheart Annual Report 2011. http://www.ucr.uu.se/swedeheart/index.php/arsrapporter/doc_download/178-swedeheart-annual-report-2011-english. 2012 September 14.
22. Sixth National Adult Cardiac Surgical Database Report 2008. Dendrite Clinical Systems Ltd; 2009 Jul.
23. Adult Cardiac Surgery in New York State 2007 - 2009. New York State Department of Health; 2012 Feb.
24. Shahian DM, Jacobs JP, Edwards FH, Brennan JM, Dokholyan RS, Prager RL, et al. The Society of Thoracic Surgeons National Database. Heart 2013 Jan 18.
25. Shahian DM, Edwards FH, Jacobs JP, Prager RL, Normand SL, Shewan CM, et al. Public reporting of cardiac surgery performance: Part 1--history, rationale, consequences. Ann Thorac Surg 2011 Sep;92(3 Suppl):S2-11.
26. Chassin MR, Hannan EL, DeBuono BA. Benefits and hazards of reporting medical outcomes publicly. N Engl J Med 1996 Feb 8;334(6):394-8.
27. Hannan EL, Siu AL, Kumar D, Racz M, Pryor DB, Chassin MR. Assessment of coronary artery bypass graft surgery performance in New York. Is there a bias against taking high-risk patients? Med Care 1997 Jan;35(1):49-56.
28. Hannan EL, Samadashvili Z, Wechsler A, Jordan D, Lahey SJ, Culliford AT, et al. The relationship between perioperative temperature and adverse outcomes after off-pump coronary artery bypass graft surgery. J Thorac Cardiovasc Surg 2010 Jun;139(6):1568-75.
29. Loop FD, Lytle BW, Cosgrove DM, Stewart RW, Goormastic M, Williams GW, et al. Influence of the internal-mammary-artery graft on 10-year survival and other cardiac events. N Engl J Med 1986 Jan 2;314(1):1-6.
30. Lytle BW, Blackstone EH, Sabik JF, Houghtaling P, Loop FD, Cosgrove DM. The effect of bilateral internal thoracic artery grafting on survival during 20 postoperative years. Ann Thorac Surg 2004 Dec;78(6):2005-12.
31. Denton TA, Fonarow GC, LaBresh KA, Trento A. Secondary prevention after coronary bypass: the American Heart Association "Get with the Guidelines" program. Ann Thorac Surg 2003 Mar;75(3):758-60.
32. Ferguson TB Jr, Coombs LP, Peterson ED. Preoperative beta-blocker use and mortality and morbidity following CABG surgery in North America. JAMA 2002 May 1;287(17):2221-7.
33. Ferguson TB Jr, Peterson ED, Coombs LP, Eiken MC, Carey ML, Grover FL, et al. Use of continuous quality improvement to increase use of process measures in patients undergoing coronary artery bypass graft surgery: a randomized controlled trial. JAMA 2003 Jul 2;290(1):49-56.
34. Ferraris VA, Ferraris SP, Moliterno DJ, Camp P, Walenga JM, Messmore HL, et al. The Society of Thoracic Surgeons practice guideline series: aspirin and other antiplatelet agents during operative coronary revascularization (executive summary). Ann Thorac Surg 2005 Apr;79(4):1454-61.
35. Lazar HL. Role of statin therapy in the coronary bypass patient. Ann Thorac Surg 2004 Aug;78(2):730-40.
36. Okrainec K, Platt R, Pilote L, Eisenberg MJ. Cardiac medical therapy in patients after undergoing coronary artery bypass graft surgery: a review of randomized controlled trials. J Am Coll Cardiol 2005 Jan 18;45(2):177-84.
37. de Vries EN, Hollmann MW, Smorenburg SM, Gouma DJ, Boermeester MA. Development and validation of the SURgical PAtient Safety System (SURPASS) checklist. Qual Saf Health Care 2009 Apr;18(2):121-6.
38. de Vries EN, Prins HA, Crolla RM, den Outer AJ, van AG, van Helden SH, et al. Effect of a comprehensive surgical safety system on patient outcomes. N Engl J Med 2010 Nov 11;363(20):1928-37.


Supplement


S1 Trends and outcomes of valve surgery: 16-year results of The Netherlands Adult Cardiac Surgery Database

Siregar S*, de Heer F*, Groenwold RHH, Versteegh MIM, Bekkers JA, Brinkman ES, Bots ML, van der Graaf Y, van Herwerden LA

* Both authors contributed equally to this work


Chapter S1

Abstract

Objective
To describe procedural volumes, patient risk profile and outcomes of heart valve surgery in the past 16 years in The Netherlands.

Methods
The Dutch Adult Cardiac Surgery Database includes approximately 200,000 cardiac operations performed between 1995 and 2010. Information on all valve surgery (56,397 operations) was extracted. We determined trends for changes in procedural volume, demographics, risk profile, and in-hospital mortality of valve operations. Because of incomplete data in the first years of registration, the total number of operations in those years was estimated using Poisson regression. For a subset from 2007 to 2010, follow-up data were available. Survival status was obtained through linkage with the national Cause of Death Registry and survival analysis was performed using Kaplan-Meier analysis. Information on discharge and readmissions was obtained from the National Hospital Discharge Registry.

Results
The annual volume of heart valve operations increased by more than 100%, from an estimated 2,431 in 1995 to 5,906 in 2010. Adjusted for population size in The Netherlands, the number of operations per 100,000 adults increased from 20 in 1995 to 43 in 2010. In aortic valve surgery, an increasing use of bioprostheses in all age categories was observed. In 2010, 75.4% of mitral valve surgery was performed by repair instead of replacement. In-hospital mortality for all valve surgery decreased from 4.2% in 2007 to 3.6% in 2010, whereas the logistic EuroSCORE remained stable (median 5.8, p=1.000). Thirty-day mortality after all valve surgery was 3.9% and 120-day mortality was 6.5%. At one year, survival after all valve surgery was 91.6% and a reoperation had been performed in 1.6%. The median postoperative length of stay was 7 days (interquartile range 5-11) in the primary hospital and 11 days (interquartile range 8-16) including subsequent stay in the secondary hospital.

Conclusion
The results of this study provide a comprehensive overview of valve surgery trends and outcomes in The Netherlands. The number of heart valve operations performed in The Netherlands has increased since 1995. The decrease in mortality and unchanged EuroSCORE between 2007 and 2010 might reflect a general improvement of the safety of valve surgery.



Trends and outcomes of valve surgery in The Netherlands

Introduction
Heart valve surgery has changed dramatically over time. The first reports describing heart valve surgery originate from the late 1940s, when Bailey and Harken performed closed mitral valve reconstruction in a patient with mitral stenosis.(1) A decade later, the introduction of cardiopulmonary bypass facilitated the first open heart valve surgery and another decade later the first heart valve replacements were performed. Ever since, thanks to technical improvements, increasing knowledge on the etiology of heart valve disease, risk prediction models and guidelines on valvular heart disease, postoperative mortality has decreased dramatically.(2-4) Even in recent years, early mortality has continued to drop, despite the fact that the patient risk profile has increased.(5) Previous studies using large databases have provided insight into changes in patient demographics, risk factors and outcomes in particular regions in the world.(5-10) The majority of these studies were performed using data from North America. In Europe, valve surgery trends and outcomes on a national level are only rarely published.(6;9;11) In The Netherlands, information on heart valve surgery has been collected in a national database since 1995. The aim of our study was to describe trends in procedural volumes, patient risk profile and outcomes of heart valve surgery since 1995 using national registry data from The Netherlands.

Methods

Data
The data used for this study were derived from four data sources (Figure 1) and are partly described in previously published papers.(12)

Source 1: The Supervisory Committee for Heart Interventions Netherlands (BHN) database
In The Netherlands, since 1995 all cardiac surgery has been registered by an umbrella foundation set up by cardiologists, pediatric cardiologists, cardiac anesthesiologists and cardiac surgeons, called the Supervisory Committee for Heart Interventions Netherlands (Begeleidingscommissie Hartinterventies Nederland, BHN).(13) Demographic details and the type of intervention are registered in this procedural database. Six of the 13 cardiac surgery centers participated at the start (in 1995) and the remaining centers followed in subsequent years. All records of adult patients undergoing valve surgery from January 1995 until December 2010 were included in this study (n = 62,464). Records with unknown intervention type (1.6%), concomitant aortic surgery (7.3%) and transcatheter aortic valve implantations (0.7%) were excluded. This resulted in information on 56,397 operations.


Figure 1: Overview of data sources for valve surgery in The Netherlands

[Figure not reproduced: a timeline from 1995 to 2011 covering the four data sources listed below.]
Source 1, BHN database: data on N = 56,397 operations; operation rates and demographic factors; 6/13 centers in 1995, 15/16 centers in 2004*, 16/16 centers from 2007.
Source 2, NVT database: data on N = 21,967 operations; 16/16 centers; demographic factors, risk factors for mortality, in-hospital mortality.
Source 3, Cause of Death Registry: data on N = 11,996 operations; 10/16 centers; survival status, date of death, cause of death (linked with sources 1 and 2 in April 2012).
Source 4, Discharge Registry: data on N = 7,724 operations; 10/16 centers; discharge information, readmission rate, cause of readmission (linked with sources 1 and 2 in April 2012).

* In 2004 three new cardio-thoracic centers were opened in The Netherlands

Source 2: The Netherlands Association for Cardio-Thoracic Surgery (NVT) database
In 2007 the Netherlands Association for Cardio-Thoracic Surgery (NVT, www.nvtnet.nl) expanded the BHN dataset by adding in-hospital mortality and risk factors for mortality after cardiac surgery, as defined by the EuroSCORE.(14) Since 2007 all 16 cardiac surgery hospitals in The Netherlands have participated in both databases.(12) All records of adult cardiac valve surgery performed from January 2007 until December 2010 were included. Records with one or more missing EuroSCORE variables were excluded (n = 795, 1.2%). Again, operations with concomitant aortic surgery and transcatheter aortic valve implantations were excluded. This resulted in 21,967 surgical heart valve operations.



Trends and outcomes of valve surgery in The Netherlands

Source 3: The Cause of Death Registry Ten out of 16 centers participated in the follow-up of valve surgery performed from 2007 to 2010 (n=12,286 heart valve operations). The linkage between the NVT database and the Cause of Death Registry has previously been described in detail.(12;15) Records were matched by date of birth, sex and postal code. At the time of analysis, survival status, date of death and cause of death up to 31 December 2011 was available. In total 11,996 records (97.6%) of valve surgery could successfully be linked. Source 4: The Hospital Discharge Registry The Hospital Discharge Registry contains information on a majority of the hospital admissions in The Netherlands. The linkage between the NVT database and the Hospital Discharge Registry has previously been described in detail.(12;16) At the time of analysis, information on hospital discharges up to 31 December 2010 was available. A subset of valve surgery performed from 2007 to 2009 was linked to the registry (n = 9,126 heart valve operations), to ensure a minimum follow-up of one year for all records.(12) The primary admission of 7724 (84.6%) heart valve operations could be found in the Hospital Discharge Registry and 7654 (83.9%) records could be followed for one year after surgery. Non-linkage was mainly attributable to the fact that linking was performed on date of birth, sex and postal code. This method has been applied in previous studies.(17) Analyses Volumes The annual number of heart valve operations was determined using the BHN database [source 1]. Due to incomplete registration in the first years, a Poisson regression model was used to impute the number of valve operations from 1995 to 2006. The Poisson regression model used institution, age and gender as categorical covariates to model the procedural volume in time, with the size of the population at risk as an offset term. Time was defined as calendar year and modeled as a continuous covariate. 
Age was divided into 15 categories of five years each. The size of the Dutch adult population in each year between 1995 and 2006 was obtained from the National Statistics Bureau.(15) The estimated procedural volumes are presented with 95% confidence intervals between brackets. Annual changes in age, gender and prosthesis type were assessed in the observed dataset. Overall, age was missing for 0.04% and gender for 0.5%. The prosthesis type was unknown for 1.04% in the aortic valve group and for 0.8% in the mitral valve group. Additional surgery for cardiac arrhythmia is included in all subgroups and additional tricuspid valve surgery is included in all mitral valve subgroups, as both are considered to be part of the treatment for (mitral) valve disease. All volumes were calculated on intervention level, which means that the volumes refer to the number of performed operations.
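As an illustration of the imputation model described above, the following is a minimal Python sketch of a Poisson regression with the population at risk as an offset term. It is a simplification under stated assumptions: only calendar year is used as a covariate (the study model also used institution, age and gender), the function names `fit_poisson_offset` and `expected_count` are hypothetical, the data are synthetic, and the study itself was analysed in R.

```python
import math

def fit_poisson_offset(years, counts, population, max_iter=50):
    """Fit log(E[count]) = b0 + b1 * (year - first year) + log(population)
    by Newton's method. The log(population) term is the offset."""
    t = [y - years[0] for y in years]
    # Start from the overall log rate and no time trend.
    b0 = math.log(sum(counts) / sum(population))
    b1 = 0.0
    for _ in range(max_iter):
        mu = [p * math.exp(b0 + b1 * ti) for p, ti in zip(population, t)]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(c - m for c, m in zip(counts, mu))
        g1 = sum(ti * (c - m) for ti, c, m in zip(t, counts, mu))
        # Information matrix entries.
        h00 = sum(mu)
        h01 = sum(ti * m for ti, m in zip(t, mu))
        h11 = sum(ti * ti * m for ti, m in zip(t, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1

def expected_count(b0, b1, first_year, year, population):
    """Expected number of operations in a given year (used for imputation)."""
    return population * math.exp(b0 + b1 * (year - first_year))

# Synthetic illustration (not study data): 20 operations per 100,000 adults
# in 1995, with a constant log-linear increase of 0.05 per year.
years = list(range(1995, 2011))
population = [12_000_000] * len(years)
counts = [p * 20e-5 * math.exp(0.05 * (y - 1995)) for y, p in zip(years, population)]
b0, b1 = fit_poisson_offset(years, counts, population)
```

In this sketch, years with incomplete registration would be imputed with `expected_count`, using coefficients estimated from the centers and years that were observed.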



Patient demographics and risk factors for mortality Patient demographics of all heart valve surgery in adult patients from 1995 to 2010 were extracted from the BHN database [source 1]. Additional risk factors for in-hospital mortality for the years 2007 to 2010 were extracted from the NVT database [source 2]. A trend in the annual prevalence of risk factors and in-hospital mortality was tested using linear regression for continuous variables, logistic regression for binary variables and quantile logistic regression for the logistic EuroSCORE, with time modeled in years as an ordinal variable. All analyses on patient demographics and risk factors were performed on intervention level, which means that the results refer to the number of operations instead of patients. Mortality In-hospital mortality could be obtained from the NVT database [source 2]. Survival status after discharge was derived from the Cause of Death Registry [source 3]. Mortality was thus only available for surgery performed from 2007 to 2010. Thirty-day, operative- and 120-day mortality rates were assessed. Non-parametric survival analysis was performed using the Kaplan-Meier method to obtain survival up to 5 years.(18) Survival analysis was stratified by type of surgery. Causes of death were assessed for cases of 30-day and one-year mortality and were defined by the following ICD-10 codes: for cardiac causes I.01, I.05-I.09, I.11, I.13, I.20-I.27, I.30-I.52, R.001, for pulmonary causes all J-codes and R.090, R.093, for infectious causes all A and B codes, for neurological causes all G-codes, for vascular and renal causes all other I-codes except those defined as cardiac and R.048. All analyses on mortality and cause of death were performed on intervention level. Reoperations Reoperations could be identified using the Cause of Death Registry [source 3], as this provided information on person-level. 
A second valve operation recorded in a patient in the cohort was designated as a valve-related reoperation if the same valve was involved in the reoperation. In the analyses of the reoperation rate the competing risk of mortality was accounted for. Analyses on reoperation rates were the only analyses performed on patient level, which means that the results refer to the number of patients. Postoperative period The length of stay and information on readmissions of surgery performed from 2007 to 2009 were obtained from the Hospital Discharge Registry [source 4]. The length of stay is reported as the median with the interquartile range (IQR) in days. The number of readmissions and the causes of readmission were assessed for each type of surgery. The causes of readmission were defined by the following ICD-9 codes: for cardiac causes 391, 393-398, 402, 404, 410-416, 420-429 and for cerebrovascular causes 430-459. All analyses on length of stay and readmissions were performed on intervention level. All analyses were performed in R version 2.15.(19)
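The Kaplan-Meier method named under Mortality can be sketched as follows. This is an illustrative Python version, not the R code used in the study. The function name `kaplan_meier` and the example follow-up times are hypothetical, and stratification by type of surgery would amount to calling the function once per stratum.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    times:  follow-up time per operation
    events: True if death was observed at that time, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Handle all records sharing the same follow-up time together.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical follow-up in years for five operations (not study data).
curve = kaplan_meier([1, 2, 2, 3, 4], [True, False, True, True, False])
```

Records censored at the same time as a death are still counted as at risk at that time, which is the usual convention.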

Results All valve surgery Volume In the past 16 years an estimated total of 62,524 valve operations were performed (Figure 2) [source 1]. A substantial increase in the annual number of valve operations was seen over this period: from an estimated 2,431 (95% CI 2,402-2,459) in 1995 to 5,690 (observed) in 2010 (+143%). Per 100,000 Dutch adults the volume increased from 20 operations in 1995 to 43 operations in 2010 (+115%). The types of operations are shown in Figure 3. The majority of operations were aortic valve (AoV) surgery with or without coronary artery bypass grafting (CABG) (56.4% in 2010). The largest shift in annual proportion was seen in the mitral valve surgery group: valve replacement decreased by 8%, while valve repair increased by 8.6%, over the studied period. An increase in concomitant rhythm surgery was seen from 0.2% in 1995 to 7.4% in 2010.

Figure 2: Annual number of valve surgery per 100,000 adults from 1995 to 2010 in The Netherlands

The dotted line (grey) represents the observed number of operations per 100,000 adults in the BHN database. The solid line (black) shows the expected number of operations per 100,000 adults. The dashed line (black) is the expected number of operations adjusted for the ageing population. It represents the estimated number of operations if the age distribution in the Dutch population would have stayed the same since 1995.
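The per-capita figures reported above (20 and 43 operations per 100,000 adults, a change of +115%) follow from two simple calculations, sketched here in Python. The function names are hypothetical and the population size in the example is an illustrative assumption, not a figure from the study.

```python
def rate_per_100k(operations, adults):
    """Number of operations per 100,000 adults."""
    return operations * 100_000 / adults

def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) * 100 / old

# Relative change in the reported per-capita volume, 1995 versus 2010.
change = percent_change(20, 43)

# Hypothetical example: 2,400 operations in a population of 12 million adults.
example_rate = rate_per_100k(2_400, 12_000_000)
```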


Figure 3: Annual distribution of type of valve surgery from 1995 to 2010

The annual distribution of type of valve surgery is shown from 1995 to 2010. AoV: aortic valve, MV: mitral valve (with or without tricuspid valve), AoM: aortic + mitral valve, iso: isolated procedure without concomitant surgery. CABG: coronary artery bypass grafting. Concomitant rhythm surgery is included in all groups, concomitant tricuspid surgery is included in the mitral valve groups.

Patient demographics and risk factors for mortality Mean age in the operations performed from 1995 to 2010 increased significantly, by 1.2 years every four years (p < 0.001). Patients were 64.5±12.7 years in the first four years of the registry (1995 to 1998) and 68.2±11.7 in the last four years of the registry (2007 to 2010). Over 40% of all patients were female; this proportion decreased slightly over the years (45.2% in 1995 versus 41.4% in 2010) [source 1]. Other risk factors for mortality after valve surgery are shown in Table 1 for the period 2007 to 2010. Frequently encountered risk factors were moderate left ventricular dysfunction (20.5%) and chronic pulmonary disease (14.3%). Active endocarditis was present in 3.6%. Overall, the risk profile of patients did not change from 2007 to 2010, as the logistic EuroSCORE remained stable (p=1.000) [source 2]. When subdivided into risk categories, the patient had a logistic EuroSCORE of <5% in 44% of the operations, of 5-10% in 31%, of 10-20% in 17% and of more than 20% in 9%. This distribution in risk categories did not change between 2007 and 2010 (p = 0.071) [source 2].
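The subdivision into risk categories can be illustrated with a short Python sketch. The handling of the boundary values (5, 10 and 20) is an assumption, since the text only names the categories. The function names and the example scores are hypothetical.

```python
from collections import Counter

def risk_category(logistic_euroscore):
    """Map a logistic EuroSCORE (in %) to one of the four risk categories.

    Assumed boundaries: <5, 5 to <10, 10 to 20 inclusive, >20.
    """
    if logistic_euroscore < 5:
        return "<5%"
    if logistic_euroscore < 10:
        return "5-10%"
    if logistic_euroscore <= 20:
        return "10-20%"
    return ">20%"

def category_shares(scores):
    """Percentage of operations in each risk category, rounded to whole percent."""
    counts = Counter(risk_category(s) for s in scores)
    return {cat: round(100 * n / len(scores)) for cat, n in counts.items()}

# Hypothetical scores for ten operations (not study data).
example = [2.1, 3.3, 4.9, 4.0, 5.8, 9.5, 7.0, 12.0, 20.0, 35.4]
shares = category_shares(example)
```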


Table 1: EuroSCORE risk factors of all valve surgery from 2007-2010 in The Netherlands

Source 2: NVT database
Risk factors for mortality after cardiac surgery    Total N = 21,967 (%)
Mean age (sd)                                       67.7 (+/- 11.7)
Female                                              8904 (40.5)
Serum creatinine >200 μmol/l                        517 (2.4)
Extracardiac arteriopathy                           2253 (10.3)
Pulmonary disease                                   3146 (14.3)
Neurological dysfunction                            714 (3.3)
Previous cardiac surgery                            2341 (10.7)
Recent myocardial infarct                           1041 (4.7)
LVEF 30-50%                                         4500 (20.5)
LVEF <30%                                           1437 (6.5)
Systolic pulmonary pressure >60 mmHg                1430 (6.5)
Active endocarditis                                 785 (3.6)
Unstable angina                                     307 (1.4)
Emergency operation                                 731 (3.3)
Critical preoperative state                         805 (3.7)
Ventricular septal rupture                          17 (0.1)
Logistic EuroSCORE, mean                            8.9 (+/- 10.3)
Logistic EuroSCORE, median                          5.8 (IQR 3.3-9.5)
In-hospital mortality                               919 (4.2)

NVT database: The Netherlands Association for Cardio-Thoracic Surgery (NVT) database; s.d.: standard deviation; IQR: interquartile range.

Outcomes In-hospital mortality for all valve surgery performed from 2007 to 2010 was 4.2% and showed a significantly decreasing trend from 4.6% in 2007 to 3.6% in 2010 (p = 0.003). In-hospital mortality, 30-day mortality, operative mortality (i.e. in-hospital and/or 30-day mortality) and 120-day mortality, specified by type of operation, are shown in Table 2. Mortality rates were lower for single valve surgery than for surgery involving two valves. Mortality was higher when concomitant CABG surgery was performed in AoV, MV and multiple-valve surgery [source 2 and source 3]. Survival up to 4.5 years after surgery is shown in Figure 4. One-year survival after all types of valve surgery was 91.6%, 2-year survival 88.6%, 3-year survival 85.8% and 4-year survival 82.6%. The reoperation rate on the same valve was 1.6% after one year, 1.8% after two years and 2.0% after three years, with the mitral valve being the most frequently reoperated valve (Figure 5) [source 3].
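The early mortality definitions used above can be made explicit with a small Python sketch. It assumes that an in-hospital death is a death on or before the discharge date of the index admission and that the 30-day and 120-day windows are counted from the date of surgery. These operational details, the function name and the example dates are assumptions, not taken from the study.

```python
from datetime import date

def early_mortality(surgery, discharge, death=None):
    """Classify one operation according to the four early mortality outcomes."""
    if death is None:
        return {"in_hospital": False, "thirty_day": False,
                "operative": False, "day_120": False}
    days = (death - surgery).days
    in_hospital = death <= discharge
    thirty_day = days <= 30
    return {
        "in_hospital": in_hospital,
        "thirty_day": thirty_day,
        # Operative mortality: in-hospital and/or 30-day mortality.
        "operative": in_hospital or thirty_day,
        "day_120": days <= 120,
    }

# Hypothetical patient who died in hospital 46 days after surgery.
flags = early_mortality(date(2008, 1, 10), date(2008, 3, 1), date(2008, 2, 25))
```

In this example the death counts toward in-hospital, operative and 120-day mortality, but not toward 30-day mortality.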


Table 2: Early mortality by type of valve and type of surgery

Logistic EuroSCORE and mortality are given in %. In-hosp.: in-hospital mortality; Oper.: operative mortality.

                         Source 2: NVT database            Source 3: Cause of Death Registry
                                 Logistic EuroSCORE                Logistic EuroSCORE   Mortality
                         N       Mean   Median  In-hosp.   N       Mean   Median   30-day  Oper.  120-day
All valve surgery        21967   8.9    5.8     4.2        11996   8.1    5.5      3.9     4.8    6.5
Type of valve
Single valve: AoV        12423   8.2    5.8     3.1        6838    7.6    5.5      3.0     3.6    5.1
Single valve: MV         5863    9.1    5.1     4.1        3055    8.3    4.7      4.1     5.2    6.4
Double valve             3039    10.8   7.0     7.9        1718    9.5    6.2      6.2     7.9    10.7
Other                    642     11.8   6.7     9.3        385     10.1   5.5      7.5     8.8    11.7
Type of surgery
Isolated AoV repl.       7121    7.8    5.1     2.3        3984    7.3    4.9      2.3     2.7    3.6
AoV repl. + CABG         4889    8.7    6.3     4.0        2688    8.0    6.1      3.8     4.8    7.1
Isolated MV repl.        1240    12.7   7.2     6.9        655     12.6   7.0      6.9     8.5    11.3
MV repl. + CABG          474     13.8   8.1     13.3       248     13.2   7.7      12.5    15.3   18.1
Isolated MV repair       2911    5.9    3.7     1.6        1654    5.4    3.6      1.6     1.9    3.2
MV repair + CABG         2137    10.4   7.0     4.4        1105    9.0    6.3      3.8     5.2    6.8
AoM                      856     11.3   6.7     8.5        438     10.0   5.9      7.5     8.4    10.7
AoM + CABG               411     11.7   7.6     11.9       182     10.5   7.1      11.5    13.7   17.0
Other                    1928    11.2   6.6     7.8        1042    9.9    5.9      7.1     9.0    11.1
Concomitant surgery
Isolated valve           12744   8.2    5.1     3.2        7062    7.6    4.8      3.0     3.6    4.9
Valve + CABG             8151    9.8    6.6     5.4        4334    8.8    6.2      5.0     6.3    8.5
Valve + CABG + other     343     13.2   8.2     10.5       159     12.7   7.9      11.3    14.5   15.7
Valve + other            729     10.5   5.6     5.2        441     8.2    4.8      5.0     6.1    8.2

NVT: The Netherlands Association for Cardio-Thoracic Surgery, AoV: aortic valve, MV: mitral valve (with or without tricuspid valve), AoM: aortic + mitral valve, repl.: replacement, CABG: coronary artery bypass grafting. Concomitant rhythm surgery is included in all groups, concomitant tricuspid surgery is included in the mitral valve groups.



Figure 4: Survival after valve surgery stratified by type of surgery

The Kaplan-Meier survival curve after valve surgery is shown, stratified by type of surgery. AoV: aortic valve, MV: mitral valve, AoM: aortic + mitral valve, repl.: replacement, CABG: coronary artery bypass grafting. Concomitant rhythm surgery is included in all groups, concomitant tricuspid surgery is included in the mitral valve groups.

Figure 5: Risk of valve-related reoperation stratified by type of surgery

The cumulative incidence curve is shown for the risk of valve-related reoperation, stratified by type of surgery. AoV: aortic valve, MV: mitral valve, AoM: aortic + mitral valve, repl.: replacement, CABG: coronary artery bypass grafting. Concomitant rhythm surgery is included in all groups, concomitant tricuspid surgery is included in the mitral valve groups.


In the first 30 days 86.7% of all deaths can be attributed to a cardiac cause (Table 3). After one year, the proportion had decreased to 71.3%, whereas pulmonary, infectious, neurologic, renal, vascular and other noncardiac causes played a larger role than in the first postoperative days [source 3].

Table 3: Cause of mortality after valve surgery

Source 3: Cause of Death Registry (a)
                            30-day mortality   1-year mortality   Cause of death in age-matched
                            N=11996            N=11996            general population, 2007-2011(45)
Causes of death             467 (100.0)        1006 (100.0)       100 %
Cardiac                     405 (86.7)         719 (71.3)         16.2 %
Pulmonary                   10 (2.1)           45 (4.5)           7.7 %
Infection                   9 (1.9)            35 (3.5)           1.4 %
Neurologic                  0 (0)              5 (0.5)            2.9 %
(Cerebro-)vascular/renal    9 (1.9)            46 (4.6)           7.8 %
Other                       32 (6.9)           156 (15.5)         63.9 % (b)
Unknown                     2 (0.4)            3 (0.3)            -

(a) Results are based on operations performed between 1 January 2007 and 31 December 2010 in 10 out of 16 centers that were linked to the Cause of Death Registry; (b) Neoplasms are the leading cause of death among the causes defined as "other".
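The grouping of causes of death used in Table 3 follows the ICD-10 code lists given in the Methods. A minimal Python sketch of such a grouping is shown below. It rests on two assumptions: the codes written in the Methods as I.01, R.001, R.090, R.093 and R.048 are read here as I01, R00.1, R09.0, R09.3 and R04.8, and every code not covered by a list is assigned to "other". The function name is hypothetical and missing codes are not handled.

```python
# ICD-10 I-categories defined as cardiac in the Methods:
# I01, I05-I09, I11, I13, I20-I27, I30-I52.
CARDIAC_I = {1, 11, 13} | set(range(5, 10)) | set(range(20, 28)) | set(range(30, 53))

def cause_group(icd10):
    """Assign an ICD-10 code such as 'I21.4' to a cause-of-death group."""
    code = icd10.upper().replace(".", "")
    letter = code[0]
    number = int(code[1:3])   # two-digit category, e.g. 21 for I21.4
    four = code[:4]           # letter plus three digits, e.g. 'R001'
    if (letter == "I" and number in CARDIAC_I) or four == "R001":
        return "cardiac"
    if letter == "J" or four in {"R090", "R093"}:
        return "pulmonary"
    if letter in {"A", "B"}:
        return "infection"
    if letter == "G":
        return "neurologic"
    # All remaining I-codes, plus R04.8 (assumed reading of the Methods).
    if letter == "I" or four == "R048":
        return "vascular/renal"
    return "other"

groups = [cause_group(c) for c in ["I21.4", "I63.9", "J18.9", "A41.9", "C34.9"]]
```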

Postoperative period In 48.2% of valve surgery performed in 2007 to 2009 the patient was directly discharged to home or another facility (such as a nursing home); the remainder were discharged to another hospital (Table 4). The median postoperative length of stay in the primary hospital was 7 days (IQR 5-11). However, when subsequent stay in secondary hospitals (51.8% of patients) was included, the median total length of stay was 11 days (IQR 8-16). In the first postoperative year 43.2% of all patients were readmitted to a hospital. The cause of readmission was cardiac in 30.1% (Table 4) [source 4].


Table 4: Hospital admission and readmission data in patients undergoing valve surgery

Source 4: Hospital Discharge Registry (a)               N = 7724    % or s.d. or IQR
In-hospital mortality primary center                    353         4.6
Transfer to secondary centers                           4003        51.8
In-hospital mortality including secondary centers       395         5.1
Post-operative length of stay
  In primary hospital, mean (days)                      10.6        13.1
  In primary hospital, median (days)                    7           5-11
  Including secondary hospital, mean (days)             15.1        14.9
  Including secondary hospital, median (days)           11          8-16
Readmissions one year after surgery
  None                                                  4318        55.9
  1-2                                                   2525        32.7
  3-4                                                   583         7.5
  5 or more                                             228         3.0
  Unknown                                               70          0.9
Causes all readmissions
  Cardiac                                               2066        30.1
  Cerebrovascular                                       329         4.8
  Other                                                 4478        65.2

S.d.: standard deviation; IQR: interquartile range. (a) Results are based on operations performed between 1 January 2007 and 31 December 2009 in 10 out of 16 centers that were linked to the Hospital Discharge Registry.

Aortic valve surgery Volume The procedural volume of isolated aortic valve (AoV) surgery increased from 952 (95% CI 932-974) operations in 1995 to 2,008 operations in 2010. The number of AoV operations with concomitant CABG showed an increasing trend as well: from an estimated 639 (95% CI 616-667) operations in 1995 to 1,248 in 2010. AoV repair accounted for a small proportion of the total (less than 1%). In 1995, 61% of the patients undergoing AoV replacement with or without CABG received a mechanical prosthesis. This proportion decreased to 21% in 2010, reflecting the increasing use of bioprostheses in all age categories, as shown in Figure 6 [source 1].


Figure 6: Prosthetic valve type in patients undergoing aortic valve surgery with or without coronary artery bypass grafting (CABG), from 1995 to 2010

The type of prostheses implanted in patients undergoing isolated aortic valve surgery with or without CABG. Panel A: all ages over 18 years, N=31,225 (100%); Panel B: 18 to 55 years, N=3,629 (12%); Panel C: 55 to 65 years, N=5,875 (19%); Panel D: 65 to 70 years, N=4,913 (16%); Panel E: 70 years and older, N=16,793 (54%). Only mechanical and biological prostheses are displayed (97.6%).


Patient demographics The mean age in AoV replacement surgery with or without CABG increased significantly from 66.3±11.5 in the first four years of the registry (1995 to 1998) to 70.2±10.8 in the last four years of the registry (2007 to 2010) (p<0.001) (Figure 7) [source 1]. A significant increase in the median logistic EuroSCORE was found: from 5.5% in 2007 to 5.9% in 2010 (p=0.008) [source 2].

Figure 7: Age and gender distributions of patients undergoing valve surgery from 1995 to 2010

Upper panel: Age distribution for male and female patients undergoing valve surgery in 1995 compared to 2010. Lower panel: Trend in mean age (with 95% CI) for aortic valve surgery (A) and mitral valve surgery (B) with (red) and without CABG (blue) over 16 years, shown in four-year intervals.


Outcomes In-hospital mortality for AoV surgery with or without CABG decreased significantly from 3.5% in 2007 to 2.4% in 2010 (p = 0.004) [source 2]. One-year survival after isolated AoV replacement and AoV replacement with CABG was 94.8% and 90.6% respectively, slowly decreasing to a 4-year survival of 85.8% and 80.5% respectively (Figure 4). With a reoperation rate of 1.5% after three years, reoperations were rarely performed in AoV surgery [source 3]. Mitral valve surgery Volume The procedural volume of isolated mitral valve surgery more than doubled from an estimated 453 (95% CI 437-469) operations in 1995 to 1,099 operations in 2010. Mitral valve surgery combined with CABG increased from 251 (95% CI 238-264) in 1995 to 605 (observed) operations in 2010. A clear shift was seen towards valve repair: while in 1995 approximately half of the isolated operations were valve repairs (47.1%), in 2010 this proportion had increased to 75.6%. Mechanical prostheses are less often implanted; this led to a decrease from 43.4% in 1995 to 13.8% in 2010 (Figure 8) [source 1].

Figure 8: Type of operation in patients undergoing isolated mitral valve surgery from 1995 to 2010

The annual distribution of type of operation is displayed for patients undergoing isolated mitral valve surgery with or without tricuspid valve and/or rhythm surgery (N=17,445).


Patient demographics The mean age in mitral valve surgery (repair or replacement) with and without concomitant CABG increased slightly over the years from 63.8±11.7 in the first four years of the registry (1995 to 1998) to 65.8±11.4 in the last four years of the registry (2007 to 2010) (p < 0.001) (Figure 7) [source 1]. No trend in logistic EuroSCORE between 2007 and 2010 could be detected (median logistic EuroSCORE 5.5%, IQR 2.9-10.6, p = 0.917) [source 2]. Outcomes In-hospital mortality for mitral valve surgery (repair or replacement) with or without concomitant CABG did not change over the years 2007 to 2010 (4.3%, 4.5%, 3.9%, 4.2%, p = 0.626) [source 2]. Mitral valve reconstruction was associated with a considerably lower operative risk than mitral valve replacement according to the EuroSCORE. Accordingly, a lower early mortality rate was found: 2.7% in-hospital mortality and a median logistic EuroSCORE of 4.9% for repair surgery, versus 8.6% in-hospital mortality and a median logistic EuroSCORE of 7.5% for mitral valve replacement (Table 2). Survival up to four years after surgery showed similar results (Figure 4). The reoperation rate after three years was 2.6% for mitral valve repair and 3.6% for mitral valve replacement with or without CABG (Figure 5) [source 3].

Discussion Primary findings This is the first time that procedural volumes and trends, patient characteristics, survival, cause of death, reoperation rates, readmission rates and other information have been gathered for a variety of heart valve operations in a national database. The findings help us understand the developments that have been taking place in the rapidly evolving field of valve surgery and provide indispensable knowledge for clinicians, governing bodies and policy makers in healthcare. In the past 16 years there has been a spectacular increase in the number of heart valve operations performed in The Netherlands: from 20 operations in 1995 to 43 operations in 2010, per 100,000 adults. Despite no apparent change in patient risk profile, in-hospital mortality of all valve surgery has decreased significantly over the last five years. This suggests that the trend of improved outcomes and safer surgery has continued up to this day. In addition, scientific evidence might have changed our view over the years: the age at which clinicians start to consider a bioprosthetic valve has decreased and mitral valve surgery is currently dominated by repair instead of replacement. Mitral valve replacement did not result in lower valve-related reoperation rates in the first four years after surgery. Although survival after all valve surgery was satisfactory, the number of readmissions in the first year


after surgery seemed alarmingly high. Analyses of the cause of readmission revealed that less than a third was cardiac related. Further investigation of the exact reasons is desirable. Increasing number of valve operations The increasing number of valve operations seen in The Netherlands, even after correction for the demographic change in the general population, is in concordance with the trend observed in many other countries in Europe.(11;20-28) A large study based on data from the Society for Cardio-thoracic Surgery (SCTS) in Great Britain and Ireland demonstrated a 26% increase in the number of patients undergoing AoV surgery from 2004 to 2009.(6) In the Society of Thoracic Surgeons (STS) database in North America more than a doubling of heart valve operations was reported in the period 2003 to 2007 when compared to the period 1993 to 1997.(5) This trend can partly be attributed to an increased prevalence of valvular heart disease and partly to an increasing proportion of diseased patients diagnosed as such. Because the mean age of the patients has risen, both are likely to have played an important role.(5;6;8;29) In addition, improvement of treatment in terms of short-term and long-term complications has broadened the indication for surgery over the years.(30) Mitral valve repair versus replacement Mitral valve repair has been shown to be associated with superior long-term freedom from stroke and infection, improved left ventricular function, and lower operative and long-term mortality.(31-35) Most large studies comparing these techniques were published in the last decade. Accordingly, in The Netherlands, mitral valve repair gained momentum approximately 10 years ago. Currently, more than three quarters (75.6%) of all mitral valve surgery is performed by repair. This proportion is substantially higher than reported in other European countries: 65% in Germany, 67% in Sweden and 67% in the United Kingdom.(11;18;24) Biological and mechanical prostheses Our findings showed that the age at which a bioprosthesis is considered for aortic valve replacement has shifted to somewhere between 55 and 65 years (Figure 6). Where less than 20% of these patients received a bioprosthesis in the nineties, a rapid increase in popularity in the last years resulted in the implantation of a bioprosthesis in almost half of the patients in 2010. Concurrently, in patients of 65 years and older, the choice for a bioprosthesis seems to have become more obvious nowadays. Long-term studies and simulation studies showed fewer neurological and functional complications with biological than with mechanical prostheses, without compromising on durability and survival.(36-38) This evidence is likely to have stimulated the use of bioprostheses in patients under 65. The increased use of bioprostheses was also observed in several other studies.(6;10;32)


Patient risk profile The patients treated with valve surgery have become older over the years. This trend was also observed in other studies and other European national databases.(5;6;8;29) In the UK and Ireland, the United States and Canada, an overall worsening of the patient risk profile has been shown.(5-8;29) In our study, this could only be confirmed in the population undergoing aortic valve surgery. Besides the aging of the population, a broadening of the indication for valve surgery may also have influenced the patient risk profile over the years. (31) The reported mean EuroSCORE was comparable across the national databases in Europe: a mean additive EuroSCORE between 6.2 and 6.7 for isolated valve operations and a mean additive EuroSCORE between 6.1 and 6.3 for isolated AoV operations.(11;20-22) Most reports on European databases did not publish preoperative risk factors or the EuroSCORE (23;24;26;39). Mortality Over the years 2007 to 2010 a declining trend in crude in-hospital mortality for all types of valve surgery was seen: annual in-hospital mortality rates decreased from 4.6% to 3.6%. A similar trend was found for operative mortality in most other studies and databases, which could reflect a combination of improved healthcare in general, more healthy aging and gradual improvements over time in cardiac surgical care.(5-7;10;20;29) Mortality rates in our study were relatively low in comparison to those reported in North America. 
For example, the STS reported an operative mortality of 5.6% for all types of valve surgery in the most recent period, versus 4.8% in our study.(5;40;41) In New York State valve surgery with or without CABG surgery resulted in an operative mortality rate of 5.0% in 2010 (4.6% in our population).(42) In Canada in-hospital mortality of 2.5% was reported for isolated AoV replacement and 5.3 % for AoV with concomitant CABG (versus 2.3% and 4.0% respectively in our cohort).(7) In Europe, the national databases reported comparable results after isolated AoV replacement: 2.3% in-hospital and 30-day mortality in The Netherlands, 2.8% in-hospital mortality in the UK and Ireland, and 2.4% and 1.6% 30-day mortality in 2010 and 2011 in Sweden (additive EuroSCORE between 6.2 and 6.4).(11;20-22) Previous European studies showed comparable results. In a more recent study using SCTS data the in-hospital mortality for AoV replacement was 4.1%, compared to 3.0% in our study.(6) The results of the Euro Heart Survey closely resembled our findings, with operative mortality of 2.7 and 4.3% for AoV replacement and AoV replacement with CABG (versus 2.3% and 4.0% in our study).(9) In general, mortality after valve surgery seems to be comparable in Europe. Transatlantic evaluation of mortality after valve surgery is difficult, as risk profiles cannot be compared and crude mortality rates should be interpreted with caution.


Comparison with other cohorts

Data from several other cohorts of heart valve patients have been collected and published previously, mainly in Europe and North America.(5;10;11;20;21;23;25;26;29;39;42) Remarkably, a comparison of volumes, patient risk profiles and outcomes was difficult to accomplish, if possible at all. Firstly, valve surgery can be categorized in many different ways: by operated valve, by type of surgery on the valve, by the number of valves, by the presence of concomitant surgery, and so on. Secondly, risk factor definitions for valve surgery differ across databases, impeding a comparison of risk profiles. Lastly, reported outcomes vary from in-hospital mortality and 30-day mortality to postoperative complications defined in many different ways. Although the guidelines for reporting mortality and morbidity after heart valve operations have been available for some years, data are not commonly collected according to them.(43) This is not surprising, considering that it is laborious to collect these data for large cohorts such as ours. In the future, a global consensus on the categorization methods for valve surgery, risk factor definitions and outcomes would maximize the possibilities for comparison and merger of valve surgery data across the world.

Strengths and possible limitations

The main strength of this study is the nationwide coverage. Demographic factors and type of surgery have been recorded for all valve surgery performed since 1995. Risk factors for mortality have been collected since 2007, allowing a more complete picture of the study population. However, the dataset also indicates the limitations of this study. Firstly, registration of risk factors over four years is not likely to be sufficient to identify subtle trends in time. Furthermore, the collected outcome variable is limited to in-hospital mortality. Linkage to national registries provided information on survival status over a longer period of time. However, six out of 16 centers did not participate in the linkage, forcing us to accept follow-up information at the expense of national coverage. In addition, no information is collected on postoperative complications. For this reason, even with a longer follow-up period, inferences on valve durability cannot be made. The NVT expanded the dataset in 2012 with complications and other variables to allow better evaluation of outcomes. Finally, it must be stressed that the goal of this study was to describe observed trends. For this reason, risk adjustment was not performed and no inferences should be drawn with regard to the differences in outcomes across the types of valve surgery.

Conclusion

The results of this study provide a comprehensive view of trends and outcomes of valve surgery in The Netherlands. The number of heart valve operations performed in The Netherlands has strongly increased since 1995. Mortality after valve surgery has decreased, without indication of a decrease in patient risk, which suggests improved outcomes and safer surgery. Bioprostheses are increasingly implanted at the expense of mechanical valves and mitral valve surgery is currently dominated by repair instead of replacement. Outcomes after all valve surgery were satisfactory and results were comparable to those found in other large national databases in Europe.


Reference List

1. Glover RP, O’Neill TJ, Bailey CP. Commissurotomy for mitral stenosis. Circulation 1950 Mar;1(3):329-42.
2. Bonow RO, Carabello BA, Chatterjee K, de LA, Jr., Faxon DP, Freed MD, et al. 2008 focused update incorporated into the ACC/AHA 2006 guidelines for the management of patients with valvular heart disease: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (Writing Committee to revise the 1998 guidelines for the management of patients with valvular heart disease). Endorsed by the Society of Cardiovascular Anesthesiologists, Society for Cardiovascular Angiography and Interventions, and Society of Thoracic Surgeons. J Am Coll Cardiol 2008 Sep 23;52(13):e1-142.
3. Carabello BA, Paulus WJ. Aortic stenosis. Lancet 2009 Mar 14;373(9667):956-66.
4. Edwards FH, Peterson ED, Coombs LP, DeLong ER, Jamieson WR, Shroyer ALW, et al. Prediction of operative mortality after valve replacement surgery. J Am Coll Cardiol 2001 Mar 1;37(3):885-92.
5. Lee R, Li S, Rankin JS, O’Brien SM, Gammie JS, Peterson ED, et al. Fifteen-year outcome trends for valve surgery in North America. Ann Thorac Surg 2011 Mar;91(3):677-84.
6. Dunning J, Gao H, Chambers J, Moat N, Murphy G, Pagano D, et al. Aortic valve surgery: marked increases in volume and significant decreases in mechanical valve use--an analysis of 41,227 patients over 5 years from the Society for Cardio-thoracic Surgery in Great Britain and Ireland National database. J Thorac Cardiovasc Surg 2011 Oct;142(4):776-82.
7. Hassan A, Quan H, Newman A, Ghali WA, Hirsch GM. Outcomes after aortic and mitral valve replacement surgery in Canada: 1994/95 to 1999/2000. Can J Cardiol 2004 Feb;20(2):155-63.
8. Thourani VH, Weintraub WS, Craver JM, Jones EL, Mahoney EM, Guyton RA. Ten-year trends in heart valve replacement operations. Ann Thorac Surg 2000 Aug;70(2):448-55.
9. Iung B, Baron G, Butchart EG, Delahaye F, Gohlke-Barwolf C, Levang OW, et al. A prospective survey of patients with valvular heart disease in Europe: The Euro Heart Survey on Valvular Heart Disease. Eur Heart J 2003 Jul;24(13):1231-43.
10. Brown JM, O’Brien SM, Wu C, Sikora JA, Griffith BP, Gammie JS. Isolated aortic valve replacement in North America comprising 108,687 patients in 10 years: changes in risks, valve types, and outcomes in the Society of Thoracic Surgeons National Database. J Thorac Cardiovasc Surg 2009 Jan;137(1):82-90.
11. Sixth National Adult Cardiac Surgical Database Report 2008. Dendrite Clinical Systems Ltd; 2009 Jul.
12. Siregar S, Groenwold RH, Versteegh MI, Takkenberg JJ, Bots ML, van der GY, et al. Data Resource Profile: Adult cardiac surgery database of the Netherlands Association for Cardio-Thoracic Surgery. Int J Epidemiol 2013 Feb 9.
13. Begeleidingscommissie Hartinterventies Nederland. http://www.bhn-registratie.nl/2013
14. Nashef SA, Roques F, Michel P, Gauducheau E, Lemeshow S, Salamon R. European system for cardiac operative risk evaluation (EuroSCORE). Eur J Cardiothorac Surg 1999 Jul;16(1):9-13.
15. Centraal Bureau voor de Statistiek. www.cbs.nl. Den Haag, The Netherlands.
16. Dutch Hospital Data. www.dutchhospitaldata.nl. Utrecht, The Netherlands.
17. Vaartjes I, Hoes AW, Reitsma JB, de BA, Grobbee DE, Mosterd A, et al. Age- and gender-specific risk of death after first hospitalization for heart failure. BMC Public Health 2010;10:637.
18. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 1958;53:457-81.
19. R: A Language and Environment for Statistical Computing [computer program]. Vienna, Austria: R Foundation for Statistical Computing; 2011.
20. Swedeheart Annual Report 2011. http://www.ucr.uu.se/swedeheart/index php/arsrapporter/doc_ download/178-swedeheart-annual-report-2011-english. 2012 September 14.
21. The Danish Heart Register. http://www.dhreg.dk. 2010 April 8.
22. Swedeheart Årsrapporter 2010. http://www.ucr.uu.se/hjartkirurgi/index php/arsrapporter. 2011 December 2.
23. BACTS Cardiac Surgical Database Report FINAL REPORT 2008. 2012.
24. Les bases de données de la Société Francaise de Chirurgie Thoracique et Cardio-Vasculaire, Neuvième Rapport: Juin 2012. 2012 Jun 1.
25. Hjertekirurgiregisteret (Norwegian Register for Cardiac Surgery) 2010. 2012.
26. Funkat AK, Beckmann A, Lewandowski J, Frie M, Schiller W, Ernst M, et al. Cardiac Surgery in Germany during 2011: A Report on Behalf of the German Society for Thoracic and Cardiovascular Surgery. Thorac Cardiovasc Surg 2012 Sep;60(6):371-82.
27. Gummert JF, Funkat A, Beckmann A, Schiller W, Hekmat K, Ernst M, et al. Cardiac surgery in Germany during 2009. A report on behalf of the German Society for Thoracic and Cardiovascular Surgery. Thorac Cardiovasc Surg 2010 Oct;58(7):379-86.
28. Gummert JF, Funkat AK, Beckmann A, Ernst M, Hekmat K, Beyersdorf F, et al. Cardiac surgery in Germany during 2010: a report on behalf of the German Society for Thoracic and Cardiovascular Surgery. Thorac Cardiovasc Surg 2011 Aug;59(5):259-67.
29. Birkmeyer NJ, Marrin CA, Morton JR, Leavitt BJ, Lahey SJ, Charlesworth DC, et al. Decreasing mortality for aortic and mitral valve surgery in Northern New England. Northern New England Cardiovascular Disease Study Group. Ann Thorac Surg 2000 Aug;70(2):432-7.
30. Vahanian A, Alfieri O, Andreotti F, Antunes MJ, Baron-Esquivias G, Baumgartner H, et al. Guidelines on the management of valvular heart disease (version 2012). Eur Heart J 2012 Oct;33(19):2451-96.
31. Braunberger E, Deloche A, Berrebi A, Abdallah F, Celestin JA, Meimoun P, et al. Very long-term results (more than 20 years) of valve repair with Carpentier’s techniques in nonrheumatic mitral valve insufficiency. Circulation 2001 Sep 18;104(12 Suppl 1):I8-11.
32. Gammie JS, Sheng S, Griffith BP, Peterson ED, Rankin JS, O’Brien SM, et al. Trends in mitral valve surgery in the United States: results from the Society of Thoracic Surgeons Adult Cardiac Surgery Database. Ann Thorac Surg 2009 May;87(5):1431-7.
33. Jokinen JJ, Hippelainen MJ, Pitkanen OA, Hartikainen JE. Mitral valve replacement versus repair: propensity-adjusted survival and quality-of-life analysis. Ann Thorac Surg 2007 Aug;84(2):451-8.
34. Milano CA, Daneshmand MA, Rankin JS, Honeycutt E, Williams ML, Swaminathan M, et al. Survival prognosis and surgical management of ischemic mitral regurgitation. Ann Thorac Surg 2008 Sep;86(3):735-44.
35. Rankin JS, Hammill BG, Ferguson TB, Jr., Glower DD, O’Brien SM, DeLong ER, et al. Determinants of operative mortality in valvular heart surgery. J Thorac Cardiovasc Surg 2006 Mar;131(3):547-57.
36. Chan V, Jamieson WR, Germann E, Chan F, Miyagishima RT, Burr LH, et al. Performance of bioprostheses and mechanical prostheses assessed by composites of valve-related complications to 15 years after aortic valve replacement. J Thorac Cardiovasc Surg 2006 Jun;131(6):1267-73.
37. Lund O, Bland M. Risk-corrected impact of mechanical versus bioprosthetic valves on long-term mortality after aortic valve replacement. J Thorac Cardiovasc Surg 2006 Jul;132(1):20-6.
38. Puvimanasinghe JP, Takkenberg JJ, Edwards MB, Eijkemans MJ, Steyerberg EW, Van Herwerden LA, et al. Comparison of outcomes after aortic valve replacement with a mechanical valve or a bioprosthesis using microsimulation. Heart 2004 Oct;90(10):1172-8.
39. Primo report Database Società Italiana di Cardio Chirurgia 2003. 2004.
40. O’Brien SM, Shahian DM, Filardo G, Ferraris VA, Haan CK, Rich JB, et al. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 2--isolated valve surgery. Ann Thorac Surg 2009 Jul;88(1 Suppl):S23-S42.
41. Shahian DM, O’Brien SM, Filardo G, Ferraris VA, Haan CK, Rich JB, et al. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 3--valve plus coronary artery bypass grafting surgery. Ann Thorac Surg 2009 Jul;88(1 Suppl):S43-S62.
42. Adult Cardiac Surgery in New York State 2007 - 2009. New York State Department of Health; 2012 Feb.
43. Akins CW, Miller DC, Turina MI, Kouchoukos NT, Blackstone EH, Grunkemeier GL, et al. Guidelines for reporting mortality and morbidity after cardiac valve interventions. Ann Thorac Surg 2008 Apr;85(4):1490-5.
44. Statline Statistics Netherlands. http://statline.cbs.nl/statweb/. 2013.



S2 Trends and outcomes of coronary artery bypass grafting: 16-year results of The Netherlands Adult Cardiac Surgery Database

Siregar S*, de Heer F*, Groenwold RHH, Versteegh MIM, Bots ML, van der Graaf Y, van Herwerden LA

* Both authors contributed equally to this work


Chapter S2

Abstract

Objective

To describe procedural volume, patient risk profile and outcomes of isolated coronary artery bypass grafting (CABG) surgery over the past 16 years in The Netherlands.

Methods

The procedural database for cardiac surgery in The Netherlands includes approximately 200,000 cardiac surgical operations performed between 1995 and 2010. Information on all isolated CABG surgery (115,848 operations) was extracted. We determined trends in procedural volume, demographics, risk profile and in-hospital mortality of isolated CABG operations. Because of incomplete data in the first years of registration, the total number of operations in those years was estimated using Poisson regression. For a subset from 2007 to 2010, survival status was obtained through linkage with the national Cause of Death Registry and survival was analyzed using the Kaplan-Meier method. Information on discharge and readmissions was obtained from the National Hospital Discharge Registry.

Results

The annual volume of isolated CABG operations decreased from an estimated 9,200 (95% CI 9,125-9,276) operations in 1995 to 8,368 in 2010. Adjusted for population size in The Netherlands, the number of operations per 100,000 adults decreased from 76.4 in 1995 to 64.1 in 2010. Off-pump surgery is increasingly performed (17% in 2010) and arterial grafts are now used in nearly all operations (95.8% in 2010). In-hospital mortality showed a decreasing trend from 1.5% in 2007 to 1.2% in 2010 (p=0.055), whereas the logistic EuroSCORE remained stable (median 2.5, p=0.637). Thirty-day mortality was 1.3% and 120-day mortality was 2.3%. At one year, survival after isolated CABG surgery was 96.7% and a reoperation had been performed in 0.6%. The median postoperative length of stay was 5 days (interquartile range 4-7) in the primary hospital and 8 days (interquartile range 6-11) including subsequent stay in the secondary hospital.

Conclusion

The results of this study provide a comprehensive view of trends and outcomes of isolated CABG surgery in The Netherlands. The number of isolated CABG operations in The Netherlands has declined since 1995. A decreasing trend in in-hospital mortality might reflect a general improvement of safety in coronary surgery.



Trends and outcomes of CABG in The Netherlands

Introduction

In the past decades many developments have occurred in the field of coronary revascularization. Improved medical care to prevent and reduce the risk of coronary artery disease, such as statins, antihypertensive drugs, beta-blockers and ACE-inhibitors, and general health measures are likely to have had a profound influence on resource utilization.(1;2) Technical innovations were manifold and expanded the revascularization options. For instance, in 2003 the US Food and Drug Administration approved the drug-eluting stent (DES), after which a substantial increase in the utilization rate of percutaneous coronary intervention (PCI) was seen, followed by a decrease in recent years.(3) The use of percutaneous techniques in myocardial infarction has had a major influence on the number of procedures performed. The balance between the utilization of coronary artery bypass grafting (CABG) and PCI in chronic and acute coronary syndromes is continuously redefined by the formulation of new practice guidelines.(4-6) In 2009 the results of the SYNTAX trial showed superiority of CABG over PCI in three-vessel or left-main coronary artery disease, which has been confirmed by the recently published 5-year results.(7;8) This new evidence is likely to influence clinicians in their choice of revascularization therapy. In addition, changes in the population, referral pattern and technical aspects of coronary revascularization have influenced trends in patient demographics, risk profile and outcomes.(3;9-13) In this continuously developing field of cardiac surgery, information from past years provides a valuable tool in helping us understand and prepare for the future. Since 1995 information on CABG operations in The Netherlands has been collected in a national database. The aim of this study is to describe trends in the procedural volume, the patient risk profile and the outcomes of CABG surgery over the past 16 years in The Netherlands, using this national database.

Methods

Data

The data used for this study were derived from four data sources (Figure 1) and are partly described in previously published papers.(14)

Source 1: The Supervisory Committee for Heart Interventions Netherlands (BHN) database

In The Netherlands, all cardiac surgery has been registered since 1995 by an umbrella foundation set up by cardiologists, pediatric cardiologists, cardiac anesthesiologists and cardiac surgeons, called the Supervisory Committee for Heart Interventions Netherlands (Begeleidingscommissie Hartinterventies Nederland, BHN).(15) Demographic details and the type of intervention are registered in this procedural database. Six of the 13 cardiac surgery centers participated at the start in 1995 and the remaining centers followed in subsequent years. All records of adult patients undergoing isolated CABG surgery from January 1995 until December 2010 were included in this study. Records with unknown intervention type (4.1%) and records with concomitant surgery were excluded. This resulted in information on 115,848 interventions.

Figure 1: Overview of data sources for coronary artery bypass surgery in The Netherlands

Source 1, BHN database: data on N = 115,848 operations. Operation rates and demographic factors; 6/13 centers in 1995, 15/16 centers in 2004*, 16/16 centers from 2007 to 2010.
Source 2, NVT database: data on N = 33,403 operations from 16/16 centers. Demographic factors, risk factors for mortality, in-hospital mortality.
Source 3, Cause of Death Registry: data on N = 17,666 operations from 10/16 centers. Survival status, date of death, cause of death (linked with sources 1 and 2 in April 2012).
Source 4, Discharge Registry: data on N = 11,395 operations from 10/16 centers. Discharge information, readmission rate, cause of readmission (linked with sources 1 and 2 in April 2012).

* In 2004 three new cardio-thoracic centers were opened in The Netherlands.

Source 2: The Netherlands Association for Cardio-Thoracic Surgery (NVT) database

In 2007 the Netherlands Association for Cardio-Thoracic Surgery (NVT, www.nvtnet.nl) expanded the BHN dataset by adding in-hospital mortality and risk factors for mortality after cardiac surgery, as defined by the EuroSCORE.(16) Since 2007 all 16 cardiac surgery hospitals in The Netherlands have participated in both databases.(14) All records of isolated CABG surgery in adults performed from January 2007 until December 2010 were included. Records with one or more missing EuroSCORE variables were excluded (n=245, 0.7%). This resulted in information on 33,402 isolated CABG operations.

Source 3: The Cause of Death Registry

Ten out of 16 centers participated in the follow-up of isolated CABG surgery performed from 2007 to 2010 (n=18,201 CABG operations). The linkage between the NVT database and the Cause of Death Registry has previously been described in detail.(14;17) Records were matched by date of birth, sex and postal code. At the time of analysis, survival status, date of death and cause of death up to 31 December 2011 were available. In total, 17,666 records (97.1%) of isolated CABG surgery could successfully be linked.

Source 4: The Hospital Discharge Registry

The Hospital Discharge Registry contains information on a majority of the hospital admissions in The Netherlands. The linkage between the NVT database and the Hospital Discharge Registry has previously been described in detail.(14;18) At the time of analysis, information on hospital discharges up to 31 December 2010 was available. A subset of CABG surgery performed from 2007 to 2009 was linked to the registry (n=13,519 CABG operations), to ensure a minimum follow-up of one year for all records. The primary admission of 11,395 isolated CABG operations (84.3%) could be found in the Hospital Discharge Registry and 11,305 records (83.6%) could be followed for one year after surgery. Non-linkage was mainly attributable to the fact that linking was performed on date of birth, sex and postal code. This method has been applied in previous studies.(19)

Analyses

Volumes

The annual number of isolated CABG operations was determined using the BHN database [source 1]. Due to incomplete registration in the first years, a Poisson regression model was used to impute the number of isolated CABG operations from 1995 to 2006. The Poisson regression model used institution, age and gender as categorical covariates to model the procedural volume in time, with the size of the population at risk as an offset term. Time was defined as calendar year and modeled as a continuous covariate. Age was divided into 15 categories of 5 years each. The size of the Dutch adult population in each year between 1995 and 2006 was obtained from the National Statistics Bureau.(17) The estimated procedural volumes are presented with 95% confidence intervals between brackets. Annual changes in age and graft type were assessed in the observed dataset. Age was missing for 34 records (0.03%) and gender for 635 records (0.6%). All volumes were calculated on intervention level, which means that the volumes refer to the number of performed operations.
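The role of the offset term in the Poisson model for procedural volume can be illustrated with a deliberately reduced sketch. It fits only an intercept and a linear calendar-year term, with the logarithm of the population size as offset, whereas the model in this study also contained institution, age and gender. The function names and the Newton-Raphson fitting routine are assumptions made for illustration and are not the code used for this study, whose analyses were performed in R.

```python
import math


def fit_poisson_trend(years, counts, population, n_iter=25):
    """Fit log E[count] = log(population) + b0 + b1 * (year - first year).

    Hypothetical helper: a two-parameter Poisson model with an offset,
    fitted by Newton-Raphson on the log-likelihood.
    """
    t = [y - years[0] for y in years]
    # Start from the pooled rate and no time trend.
    b0 = math.log(sum(counts) / sum(population))
    b1 = 0.0
    for _ in range(n_iter):
        mu = [p * math.exp(b0 + b1 * ti) for p, ti in zip(population, t)]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(c - m for c, m in zip(counts, mu))
        g1 = sum((c - m) * ti for c, m, ti in zip(counts, mu, t))
        # Fisher information matrix (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(m * ti for m, ti in zip(mu, t))
        h11 = sum(m * ti * ti for m, ti in zip(mu, t))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def rate_per_100k(b0, b1, year, first_year):
    """Expected number of operations per 100,000 adults in a given year."""
    return 100_000 * math.exp(b0 + b1 * (year - first_year))
```

Because the population enters as an offset, the fitted coefficients describe a rate rather than a count, so multiplying the fitted rate by the population of a year in which registration was incomplete yields an expected number of operations for that year.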


Patient demographics and risk factors for mortality

Patient demographics of all isolated CABG surgery in adult patients from 1995 to 2010 were extracted from the BHN database [source 1]. Additional risk factors for in-hospital mortality for the years 2007 to 2010 were extracted from the NVT database [source 2]. A trend in the annual prevalence of risk factors and in-hospital mortality was tested using linear regression for continuous variables, logistic regression for binary variables and quantile logistic regression for the logistic EuroSCORE, with time modeled in years as an ordinal variable. All analyses on patient demographics and risk factors were performed on intervention level, which means that the results refer to the number of operations instead of patients.

Mortality

In-hospital mortality was obtained from the NVT database [source 2]. Survival status after discharge was derived from the Cause of Death Registry [source 3]. Mortality was thus only available for surgery performed from 2007 to 2010. Thirty-day, operative and 120-day mortality rates were assessed. Non-parametric survival analysis was performed using the Kaplan-Meier method to obtain survival up to 5 years.(20) Survival analysis was stratified by type of surgery. Causes of death were assessed for cases of 30-day and one-year mortality and were defined by the following ICD-10 codes: for cardiac causes I.01, I.05-I.09, I.11, I.13, I.20-I.27, I.30-I.52 and R.001; for pulmonary causes all J-codes, R.090 and R.093; for infectious causes all A- and B-codes; for neurological causes all G-codes; for vascular and renal causes all I-codes not defined as cardiac, and R.048. All analyses on mortality and cause of death were performed on intervention level.

Reoperations

Reoperations could be identified using the Cause of Death Registry [source 3], as this provided information on person level. In the analyses of the reoperation rate the competing risk of mortality was accounted for. Analyses on reoperation rates were the only analyses performed on patient level, which means that the results refer to the number of patients.

Postoperative period

The length of stay and information on readmissions for surgery performed from 2007 to 2009 were obtained from the Hospital Discharge Registry [source 4]. The length of stay is reported as the median with the interquartile range (IQR) in days. The number of readmissions and the causes of readmission were assessed for each type of surgery. The causes of readmission were defined by the following ICD-9 codes: for cardiac causes 391, 393-398, 402, 404, 410-416, 420-429 and for cerebrovascular causes 430-459. All analyses on length of stay and readmissions were performed on intervention level.

All analyses were performed in R version 2.15.(21)
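The Kaplan-Meier method used for the survival analysis can be summarized in a few lines. The sketch below is a minimal product-limit estimator written for illustration only; the function name and the data layout (follow-up times with an indicator that is 1 for death and 0 for censoring) are assumptions, and the analyses in this study were carried out in R.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  : follow-up time of each record
    events : 1 if the record ended in death, 0 if it was censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all records that leave the risk set at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Records censored at t still count as at risk at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

Censored records contribute to the number at risk until their last follow-up time and then leave the risk set without lowering the survival estimate, which is what allows records with different lengths of follow-up to be analyzed together.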


Results

Isolated coronary artery bypass graft (CABG) operation volume

The volume of isolated CABG decreased from an estimated 9,200 (95% CI 9,125-9,276) operations in 1995 to 8,368 operations in 2010. The annual Dutch surgery rate of isolated CABG decreased by 27% from 76.4 operations per 100,000 adults per year in 1995 to 64.1 operations per 100,000 adults per year in 2010 (Figure 2). The proportion of isolated CABG in all open-heart operations in The Netherlands decreased from 73.8% to 51.3% over the studied period.

Figure 2. Annual number of isolated coronary artery bypass operations per 100,000 adults from 1995 to 2010 in The Netherlands

The dotted line (grey) represents the observed number of operations per 100,000 adults in the BHN database. The solid line (black) shows the expected number of operations per 100,000 adults. The dashed line (black) is the expected number of operations adjusted for the ageing population. It represents the number of operations if the age distribution of the Dutch population had stayed the same since 1995.

Patient demographics and risk factors for mortality

The mean age in the operations performed from 1995 to 2010 increased from 66.9 (95% CI 66.4-67.4) to 69.8 years (95% CI 69.4-70.2) for female patients and from 62.4 (95% CI 62.1-62.7) to 65.7 years (95% CI 65.5-66.0) for male patients (p-value < 0.05 for both trends) (Figure 3). In 1995 24.2% of operations involved female patients; this proportion decreased by 2.4% to 21.7% in 2010 (p=0.005). Other risk factors for mortality after CABG are shown in Table 1 for the period 2007 to 2010. Frequently encountered risk factors for mortality after cardiac surgery were recent myocardial infarction (19.3%), moderate left ventricular dysfunction (19.2%), extracardiac arteriopathy (13.8%) and unstable angina (9.5%). Overall, the risk profile of patients did not change from 2007 to 2010: the logistic EuroSCORE remained stable (p=0.637). When subdivided into risk categories, 76% of patients had a logistic EuroSCORE of <5%, 15% had a logistic EuroSCORE of 5-10%, 6% had a logistic EuroSCORE of 10-20% and 3% of all records had a logistic EuroSCORE higher than 20%. This distribution across risk categories did not change between 2007 and 2010 (p=0.601).

Table 1: Patient characteristics of isolated coronary artery bypass surgery from 2007 to 2010

Source 2: NVT database, N = 33,403. Values are % (N) unless indicated otherwise.

Age, mean (s.d.)                          66.0 (9.8)
Female                                    22.2 (7420)
Serum creatinine >200 μmol/l              1.6 (551)
Extracardiac arteriopathy                 13.8 (4594)
Pulmonary disease                         10.0 (3350)
Neurological dysfunction                  3.0 (1013)
Previous cardiac surgery                  3.3 (1096)
Recent myocardial infarct                 19.3 (6461)
LVEF 30-50%                               19.2 (6399)
LVEF <30%                                 4.2 (1404)
Systolic pulmonary pressure >60 mmHg      0.8 (255)
Active endocarditis                       0.0 (3)
Unstable angina                           9.5 (3184)
Emergency operation                       6.6 (2214)
Critical preoperative state               4.4 (1454)
Logistic EuroSCORE, mean                  4.7 (± 6.8)
Logistic EuroSCORE, median                2.5 (1.5-4.9)

NVT: Netherlands Association for Cardio-Thoracic Surgery; s.d.: standard deviation.

Figure 3: Age and gender of isolated coronary artery bypass surgery from 1995 to 2010

The age of patients undergoing isolated coronary artery bypass surgery is shown for the period 1995 to 2010. Panel A: age is stratified by sex; Panel B: age is stratified by type of graft.

Operative details

Arterial conduits were increasingly used in CABG surgery (Figure 4). In patients under 65 years, all-arterial grafting was performed in approximately 40% of operations in 2010. In older patients, the proportion of bypass surgery with both arterial and venous grafts increased. In the majority of interventions four or more distal anastomoses were performed (Table 2). Off-pump surgery showed a rising trend, from 14.0% in 2007 to 16.8% in 2010 (p<0.001). The proportion of emergency operations was constant over the years (6.6%, p=0.642).

Figure 4: Venous and arterial grafting in isolated coronary artery bypass surgery from 1995 to 2010

The type of grafts used in patients undergoing isolated coronary artery bypass surgery is shown. Panel A: all ages over 18 years; Panel B: 18 to 65 years; Panel C: 65 years and older.

Table 2: Operative characteristics of isolated coronary artery bypass surgery from 2007 to 2010

Source 2: NVT database

                             2007     2008     2009     2010     P value    Total
                             N=8214   N=8524   N=8377   N=8288              N = 33,403
                             (%)      (%)      (%)      (%)                 % (N)
Any arterial graft           94.8     95.6     95.3     95.8     0.009      95.4 (31859)
Only arterial grafts         17.7     19.9     23.4     25.2     < 0.001    21.5 (7194)
Only venous grafts           5.2      4.4      4.7      4.2      0.012      4.6 (1537)
Arterial and venous graft    76.8     75.2     71.9     70.6     < 0.001    73.6 (24587)
Number of anastomoses        3.7      3.5      3.6      3.6      < 0.001    3.6 (±1.2)
  1 anastomosis              3.3      3.7      4.2      4.3                 3.9 (1296)
  2 anastomoses              11.5     12.3     11.9     12.6                12.1 (4035)
  3 anastomoses              28.3     27.2     30.0     30.2                28.9 (9657)
  4 or more anastomoses      56.5     53.7     53.9     52.8                54.2 (18118)
Off-pump CABG                14.0     14.4     15.6     16.8     < 0.001    15.2 (5073)
Emergency operation a        6.7      6.5      6.3      7.0      0.642      6.6 (2214)

NVT: Netherlands Association for Cardio-Thoracic Surgery; CABG: coronary artery bypass grafting.
a Surgery performed before the beginning of the next workday.

Outcomes of isolated CABG surgery

In-hospital mortality for isolated CABG surgery performed from 2007 to 2010 was 1.4% and showed a decreasing trend from 1.5% in 2007 to 1.2% in 2010 (p=0.055). In-hospital mortality, 30-day mortality, operative mortality (i.e. in-hospital and/or 30-day mortality) and 120-day mortality, specified by type of operation, are shown in Table 3. Survival rates up to 4.5 years after surgery are shown in Figure 5, stratified by on-pump versus off-pump surgery. Survival rates did not differ significantly between on-pump and off-pump surgery. The reoperation rate after three years was 0.8% in on-pump CABG and 0.4% in off-pump CABG (Figure 5). Most reoperations were performed on the same day as the primary surgery (67 of 123 reoperations).

Figure 5: Survival and risk of reoperation after isolated coronary artery bypass surgery

The Kaplan-Meier survival curve and the cumulative incidence curve for reoperation are shown for isolated coronary artery bypass surgery, stratified by on-pump and off-pump surgery.
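A cumulative incidence curve that accounts for the competing risk of mortality, as used here for reoperation, can be sketched as follows. This is an illustrative nonparametric estimator and not the code used in this study; the function name and the status coding (0 for censored, 1 for reoperation, 2 for death without reoperation) are assumptions.

```python
def cumulative_incidence(times, status, event=1):
    """Cumulative incidence of one event type with competing events.

    times  : follow-up time of each patient
    status : 0 = censored, any other integer = type of first event
    event  : code of the event of interest
    Returns a list of (time, cumulative incidence) at each time the
    event of interest occurs.
    """
    data = sorted(zip(times, status))
    at_risk = len(data)
    event_free = 1.0   # probability of no event of any type before t
    incidence = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        of_interest = any_event = removed = 0
        while i < len(data) and data[i][0] == t:
            s = data[i][1]
            of_interest += (s == event)
            any_event += (s != 0)
            removed += 1
            i += 1
        if of_interest:
            # Only patients still free of any event can have the event at t.
            incidence += event_free * of_interest / at_risk
            curve.append((t, incidence))
        event_free *= 1 - any_event / at_risk
        at_risk -= removed
    return curve
```

The difference with one minus the Kaplan-Meier estimate is that patients who die are not treated as censored: after death they can no longer undergo a reoperation, so they lower the event-free probability instead of remaining implicitly at risk. Treating deaths as censored would generally overstate the reoperation risk.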


Table 3. Early mortality after isolated coronary artery bypass surgery

Source 2: NVT database

                       N        Logistic EuroSCORE    Logistic EuroSCORE    In-hospital
                                median                mean                  mortality (%)
All isolated CABG      33402    2.5                   4.7                   1.4
ECC
  On-pump              28330    2.6                   4.8                   1.4
  Off-pump             5073     2.4                   4.1                   1.4
Type of grafts
  All arterial         7194     1.6                   3.0                   0.9
  Arterial + venous    24587    2.8                   4.8                   1.3
  All venous           1537     5.2                   11.0                  5.1

Source 3: Cause of Death Registry

                       N        Logistic EuroSCORE    Logistic EuroSCORE    30-day           Operative        120-day
                                median                mean                  mortality (%)    mortality (%)    mortality (%)
All isolated CABG      17666    2.5                   4.3                   1.3              1.6              2.3
ECC
  On-pump              14341    2.5                   4.4                   1.4              1.6              2.3
  Off-pump             3325     2.4                   4.0                   1.0              1.4              2.1
Type of grafts
  All arterial         4016     1.5                   2.9                   0.8              1.0              1.4
  Arterial + venous    12858    2.7                   4.3                   1.2              1.4              2.2
  All venous           762      5.3                   10.5                  5.8              6.8              8.3

NVT: Netherlands Association for Cardio-Thoracic Surgery; CABG: coronary artery bypass grafting; ECC: extracorporeal circulation.

In the first 30 days 85.5% of all deaths could be attributed to a cardiac cause (Table 4). After one year, this proportion had decreased to 62.1%, whereas pulmonary, infectious, neurologic, renal, vascular and other noncardiac causes played a larger role than in the first postoperative period.

Table 4: Cause of mortality after isolated coronary artery bypass surgery

Source 3: Cause of Death Registry a

                             30-day mortality    1-year mortality    Cause of death in age-matched
                             N = 17666           N = 17666           general population, 2007-2011 (41)
Causes of death              228 (100%)          593 (100%)          100 %
Cardiac                      195 (85.5)          368 (62.1)          14.9 %
Pulmonary                    2 (0.9)             25 (4.2)            5.8 %
Infection                    4 (1.8)             15 (2.5)            1.4 %
Neurologic                   0 (0.0)             1 (0.2)             2.7 %
Cerebro(vascular) / renal    9 (3.9)             58 (9.8)            6.5 %
Other                        16 (7.0)            123 (20.7)          68.8 % b
Unknown                      2 (0.9)             3 (0.5)             -

a Results are based on operations performed between 1 January 2007 and 31 December 2010 in 10 out of 16 centers that were linked to the Cause of Death Registry.
b Neoplasms are the leading cause of death among the causes defined as "other".



Trends and outcomes of CABG in The Netherlands

Post-operative period
Of all patients undergoing isolated CABG surgery, 53.3% were transferred to another hospital for recovery (Table 5); the remainder were discharged directly to home or to another facility. The median postoperative length of stay in the primary hospital was 5 days (interquartile range, IQR 4-7). However, when the subsequent stay in secondary hospitals was included, the median hospital length of stay was 8 days (IQR 6-11). Approximately 38% of all patients were readmitted to the hospital in the first postoperative year. A cardiac cause of readmission was reported in 24.6% of all readmissions (Table 5).

Table 5. Hospital admission and readmission data in patients undergoing isolated coronary artery bypass surgery
Source 4: Hospital Discharge Registry a

                                                               N = 11395   % or s.d. or IQR
In-hospital mortality primary center                           153         1.3
Transfer to secondary centers                                  6096        53.3
In-hospital mortality including transfer to secondary centers  176         1.5
Post-operative length of stay
  In index hospital, mean (days)                               6.7         7.1
  In index hospital, median (days)                             5           4-7
  Including secondary hospital, mean (days)                    10.1        8.7
  Including secondary hospital, median (days)                  8           6-11
Readmissions one year after intervention
  None                                                         7017        61.6
  1-2                                                          3315        29.1
  3-4                                                          672         5.9
  5 or more                                                    301         2.6
  Unknown                                                      90          0.8
Causes all readmissions
  Cardiac                                                      2137        24.6
  Cerebrovascular                                              475         5.5
  Other                                                        6083        70.0

a Results are based on operations performed between 1 January 2007 and 31 December 2009 in 10 out of 16 centers that were linked to the Hospital Discharge Registry. S.d.: standard deviation; IQR: interquartile range.
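The percentages quoted in the text can be traced back to the counts in Table 5. The short Python sketch below redoes that arithmetic; the counts are copied from the table, while the variable names are only illustrative. It also shows that "approximately 38%" holds whether or not the patients with unknown readmission status are counted as readmitted.

```python
# Counts copied from Table 5 (N = 11395 patients).
n_patients = 11395
readmissions = {"none": 7017, "1-2": 3315, "3-4": 672, "5 or more": 301, "unknown": 90}
causes = {"cardiac": 2137, "cerebrovascular": 475, "other": 6083}

# The readmission categories add up to the full cohort.
assert sum(readmissions.values()) == n_patients

# Patients not known to be free of readmission (includes "unknown").
pct_not_free = 100 * (n_patients - readmissions["none"]) / n_patients

# Patients with at least one recorded readmission (excludes "unknown").
known = readmissions["1-2"] + readmissions["3-4"] + readmissions["5 or more"]
pct_known = 100 * known / n_patients

# Causes are expressed per readmission, not per patient.
total_readmissions = sum(causes.values())
pct_cardiac = 100 * causes["cardiac"] / total_readmissions

print(f"not readmission-free: {pct_not_free:.1f}%")
print(f"at least one recorded readmission: {pct_known:.1f}%")
print(f"cardiac share of {total_readmissions} readmissions: {pct_cardiac:.1f}%")
```

Note that the denominator for the causes is the number of readmissions (the sum of the three cause rows), which is why it differs from the number of patients.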

Discussion

Primary findings
This study combines information on procedural volumes, patient characteristics, survival, cause of death, reoperation rate and readmission rate for all isolated CABG operations in a comprehensive national database in Europe. The findings quantify CABG usage in the past years and facilitate the understanding of the interaction between CABG and PCI utilization. This knowledge is essential for everyone who is involved in the treatment and policy-making process of coronary artery disease. The number of isolated CABG operations in The Netherlands decreased between 1995 and 2010. From 2007 to 2010 in-hospital mortality showed a decreasing trend, without an indication of a decreasing patient risk profile. This suggests that the trend of improved outcomes and safer surgery has continued up to this day. In addition, several surgical practices changed considerably over the years: all-arterial grafting was increasingly performed and comprised over a quarter of all isolated CABG surgery in 2010. Likewise, off-pump surgery has gained popularity and comprised 17% of all isolated CABG operations in 2010.

The interaction of PCI and CABG use
The annual number of isolated CABG operations in the Netherlands decreased in the past 16 years. This trend has also been found in other national databases in Europe and North America.(3;13;22-31) Changes in the CABG rate could be related to the increasing utilization of percutaneous coronary intervention (PCI). As a previous study on coronary intervention trends in the US shows, drug-eluting stents (DES) gained rapid popularity after their Food and Drug Administration approval in 2003.(3) Although the relation between the increase in DES utilization and the decrease in CABG surgery cannot be demonstrated in our data, an interaction of PCI and CABG use can be expected. Similar to the US, PCI rates in the Netherlands have risen rapidly since 2003.(32) The results of the SYNTAX trial published in 2009 show that CABG surgery is associated with better outcomes than PCI among patients with previously untreated 3-vessel or left main coronary artery disease.(7) Undoubtedly, this new evidence has had an impact on the choice of revascularization therapy for many physicians.
In addition, CABG and PCI rates might be affected by decreased short-term complication rates of CABG such as stroke, reoperation for bleeding and sternal wound infection.(10) However, the decreasing trend in CABG surgery continues up to 2010. Other European cohorts have shown similar trends in the annual CABG operation rate.(22-31)

Arterial grafts
The use of one or more arterial conduits, as well as all-arterial grafting, has gained ground at the expense of combined arterial and venous grafting. This development should be welcomed, as the long-term patency of mammary artery and radial artery grafts has been shown to be superior to that of venous grafts.(33;34) The trend towards more arterial revascularization is likely to result in improved long-term graft patency.


Off-pump surgery
The debate on possible harms and benefits of off-pump surgery and the target population for this type of CABG continues up to this day.(35-40) Despite this, a strong increase was found in the annual number of off-pump operations since 2007. Follow-up showed that the 4-year survival for off-pump surgery was 89.4% and for on-pump surgery 91.2%, whilst the median logistic EuroSCOREs were 2.4% and 2.5% respectively.

Patient risk profile
The patient characteristics of the population undergoing isolated CABG surgery in The Netherlands have changed since 1995. The population has become significantly older, with a mean age of 63.5 years in 1995 and 66.6 years in 2010. The overall predicted risk, expressed as the logistic EuroSCORE, has remained constant. This study illustrates that the mortality after isolated CABG in The Netherlands shows a decreasing trend, despite an aging population and unchanged predicted risks. This trend has also been reported in other national databases in Europe and in North America.(10;22;23;26;27)

Mortality
A comparison between other cohorts and our study population is difficult to make, due to differences in definitions and variables. Crude mortality in our cohort is lower than in the STS cohort: an operative mortality of 2.4% in the year 2000 and 1.9% in the year 2009 was reported for the STS cohort versus 1.6% in our cohort.(10) Other cohorts in Europe reported the following: 1.5% 30-day mortality in Sweden, 1.8% in-hospital mortality in the UK and Ireland, 2.9% in-hospital mortality in Germany and 2.2% in-hospital mortality in France.(22;23;26;28) It must be stressed that in most cases valid comparisons across databases could not be made, because risk adjustment was not possible.

Strengths and possible limitations
For this study a comprehensive database was used with complete national coverage from 2007 onwards.
The findings from this unique database are likely to be generalizable to other populations in Europe with comparable cardiac surgery facilities. A possible limitation of this study is that we had no access to data on PCIs. Considering the constant interaction between trends in the field of percutaneous coronary interventions and CABG, some of the results require knowledge of the population treated with PCI. For this purpose we used information from other publications. Furthermore, the collected outcome variable is limited to in-hospital mortality. Linkage to national registries provided information on survival status over a longer period of time. However, six out of 16 centers did not participate in the linkage, forcing us to accept follow-up information at the expense of national coverage. Finally, the dataset was initially set up for the evaluation of risk-adjusted outcomes. Therefore, surgical details are limited and some information, such as the type of conduit and postoperative complications, was not collected. Recently, the dataset of the Netherlands Association for Cardio-Thoracic Surgery has been expanded with more operative details.

Conclusion
The results of this study provide a comprehensive view of trends and outcomes of isolated CABG surgery in the Netherlands. The number of isolated CABG operations in The Netherlands has declined since 1995. In-hospital mortality showed a decreasing trend, without indication of a decrease in patient risk profile. Off-pump surgery is increasingly performed and arterial grafts are now used in nearly all operations.


Reference List

1. Pignone M, Alberts MJ, Colwell JA, Cushman M, Inzucchi SE, Mukherjee D, et al. Aspirin for primary prevention of cardiovascular events in people with diabetes: a position statement of the American Diabetes Association, a scientific statement of the American Heart Association, and an expert consensus document of the American College of Cardiology Foundation. Circulation 2010 Jun 22;121(24):2694-701.
2. Smith SC, Jr., Allen J, Blair SN, Bonow RO, Brass LM, Fonarow GC, et al. AHA/ACC guidelines for secondary prevention for patients with coronary and other atherosclerotic vascular disease: 2006 update: endorsed by the National Heart, Lung, and Blood Institute. Circulation 2006 May 16;113(19):2363-72.
3. Epstein AJ, Polsky D, Yang F, Yang L, Groeneveld PW. Coronary revascularization trends in the United States, 2001-2008. JAMA 2011 May 4;305(17):1769-76.
4. ACC/AHA guidelines and indications for coronary artery bypass graft surgery. A report of the American College of Cardiology/American Heart Association Task Force on Assessment of Diagnostic and Therapeutic Cardiovascular Procedures (Subcommittee on Coronary Artery Bypass Graft Surgery). Circulation 1991 Mar;83(3):1125-73.
5. Eagle KA, Guyton RA, Davidoff R, Ewy GA, Fonger J, Gardner TJ, et al. ACC/AHA Guidelines for Coronary Artery Bypass Graft Surgery: A Report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (Committee to Revise the 1991 Guidelines for Coronary Artery Bypass Graft Surgery). American College of Cardiology/American Heart Association. J Am Coll Cardiol 1999 Oct;34(4):1262-347.
6. Hillis LD, Smith PK, Anderson JL, Bittl JA, Bridges CR, Byrne JG, et al. 2011 ACCF/AHA Guideline for Coronary Artery Bypass Graft Surgery. A report of the American College of Cardiology Foundation/American Heart Association Task Force on Practice Guidelines. Developed in collaboration with the American Association for Thoracic Surgery, Society of Cardiovascular Anesthesiologists, and Society of Thoracic Surgeons. J Am Coll Cardiol 2011 Dec 6;58(24):e123-e210.
7. Serruys PW, Morice MC, Kappetein AP, Colombo A, Holmes DR, Mack MJ, et al. Percutaneous coronary intervention versus coronary-artery bypass grafting for severe coronary artery disease. N Engl J Med 2009 Mar 5;360(10):961-72.
8. Mohr FW, Morice MC, Kappetein AP, Feldman TE, Stahle E, Colombo A, et al. Coronary artery bypass graft surgery versus percutaneous coronary intervention in patients with three-vessel disease and left main coronary disease: 5-year follow-up of the randomised, clinical SYNTAX trial. Lancet 2013 Feb 23;381(9867):629-38.
9. Aldea GS, Mokadam NA, Melford R, Jr., Stewart D, Maynard C, Reisman M, et al. Changing volumes, risk profiles, and outcomes of coronary artery bypass grafting and percutaneous coronary interventions. Ann Thorac Surg 2009 Jun;87(6):1828-38.
10. ElBardissi AW, Aranki SF, Sheng S, O'Brien SM, Greenberg CC, Gammie JS. Trends in isolated coronary artery bypass grafting: an analysis of the Society of Thoracic Surgeons adult cardiac surgery database. J Thorac Cardiovasc Surg 2012 Feb;143(2):273-81.
11. Ko DT, Tu JV, Samadashvili Z, Guo H, Alter DA, Cantor WJ, et al. Temporal trends in the use of percutaneous coronary intervention and coronary artery bypass surgery in New York State and Ontario. Circulation 2010 Jun 22;121(24):2635-44.
12. Nallamothu BK, Young J, Gurm HS, Pickens G, Safavi K. Recent trends in hospital utilization for acute myocardial infarction and coronary revascularization in the United States. Am J Cardiol 2007 Mar 15;99(6):749-53.
13. Riley RF, Don CW, Powell W, Maynard C, Dean LS. Trends in coronary revascularization in the United States from 2001 to 2009: recent declines in percutaneous coronary intervention volumes. Circ Cardiovasc Qual Outcomes 2011 Mar;4(2):193-7.
14. Siregar S, Groenwold RH, Versteegh MI, Takkenberg JJ, Bots ML, van der GY, et al. Data Resource Profile: Adult cardiac surgery database of the Netherlands Association for Cardio-Thoracic Surgery. Int J Epidemiol 2013 Feb 9.
15. Begeleidingscommissie Hartinterventies Nederland. http://www.bhn-registratie.nl/ 2013.
16. Nashef SA, Roques F, Michel P, Gauducheau E, Lemeshow S, Salamon R. European system for cardiac operative risk evaluation (EuroSCORE). Eur J Cardiothorac Surg 1999 Jul;16(1):9-13.
17. Centraal Bureau voor de Statistiek. www.cbs.nl. 2012.
18. Dutch Hospital Data. Landelijke Medische Registratie. www.dutchhospitaldata.nl. 2012.
19. Vaartjes I, Hoes AW, Reitsma JB, de BA, Grobbee DE, Mosterd A, et al. Age- and gender-specific risk of death after first hospitalization for heart failure. BMC Public Health 2010;10:637.
20. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 1958;53:457-81.
21. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2011.
22. Swedeheart Annual Report 2011. http://www.ucr.uu.se/swedeheart/index php/arsrapporter/doc_download/178-swedeheart-annual-report-2011-english. 2012 September 14.
23. Sixth National Adult Cardiac Surgical Database Report 2008. Dendrite Clinical Systems Ltd; 2009 Jul.
24. The Danish Heart Register. http://www.dhreg.dk/ 2010 April 8.
25. Swedeheart Årsrapporter 2010. http://www.ucr.uu.se/hjartkirurgi/indexphp/arsrapporter. 2011 December 2.
26. Les bases de données de la Société Francaise de Chirurgie Thoracique et Cardio-Vasculaire, Neuvième Rapport: Juin 2012. 2012 Jun 1.
27. Hjertekirurgiregisteret (Norwegian Register for Cardiac Surgery) 2010. 2012.
28. Funkat AK, Beckmann A, Lewandowski J, Frie M, Schiller W, Ernst M, et al. Cardiac Surgery in Germany during 2011: A Report on Behalf of the German Society for Thoracic and Cardiovascular Surgery. Thorac Cardiovasc Surg 2012 Sep;60(6):371-82.
29. Gummert JF, Funkat A, Beckmann A, Schiller W, Hekmat K, Ernst M, et al. Cardiac surgery in Germany during 2009. A report on behalf of the German Society for Thoracic and Cardiovascular Surgery. Thorac Cardiovasc Surg 2010 Oct;58(7):379-86.
30. Gummert JF, Funkat AK, Beckmann A, Ernst M, Hekmat K, Beyersdorf F, et al. Cardiac surgery in Germany during 2010: a report on behalf of the German Society for Thoracic and Cardiovascular Surgery. Thorac Cardiovasc Surg 2011 Aug;59(5):259-67.
31. BACTS Cardiac Surgical Database Report FINAL REPORT 2008. 2012.
32. Vaartjes I, van Dis I, Bots ML. Coronary interventions in the Netherlands. Netherlands Heart Foundation 2011. Available from: http://webshop.hartstichting.nl/Producten/download.aspx?pID=4118
33. Lytle BW, Loop FD, Cosgrove DM, Ratliff NB, Easley K, Taylor PC. Long-term (5 to 12 years) serial studies of internal mammary artery and saphenous vein coronary bypass grafts. J Thorac Cardiovasc Surg 1985 Feb;89(2):248-58.
34. Cao C, Manganas C, Horton M, Bannon P, Munkholm-Larsen S, Ang SC, et al. Angiographic outcomes of radial artery versus saphenous vein in coronary artery bypass graft surgery: A meta-analysis of randomized controlled trials. J Thorac Cardiovasc Surg 2012 Aug 4.
35. Shroyer AL, Grover FL, Hattler B, Collins JF, McDonald GO, Kozora E, et al. On-pump versus off-pump coronary-artery bypass surgery. N Engl J Med 2009 Nov 5;361(19):1827-37.
36. Caplan LR. On-pump versus off-pump CABG. N Engl J Med 2010 Mar 4;362(9):852-3.
37. Augoustides JG. On-pump versus off-pump CABG. N Engl J Med 2010 Mar 4;362(9):852-4.
38. Kieser TM. On-pump versus off-pump CABG. N Engl J Med 2010 Mar 4;362(9):852-4.
39. Taggart DP. On-pump versus off-pump CABG. N Engl J Med 2010 Mar 4;362(9):852-4.
40. Puskas JD, Mack MJ, Smith CR. On-pump versus off-pump CABG. N Engl J Med 2010 Mar 4;362(9):8514.
41. Statline Statistics Netherlands. http://statline.cbs.nl/statweb/. 2013.


Closing pages

Summary
Samenvatting (summary in Dutch)
Cardiac surgery centers and Data Registration Committee
Manuscripts based on the studies presented in this thesis
Dankwoord (acknowledgements)
Curriculum Vitae




Summary

Introduction
In 2007, the Netherlands Association for Cardio-Thoracic Surgery (NVT) established the NVT Adult Cardiac Surgery Database. The goal of this database was to evaluate risk-adjusted mortality rates and to monitor safety as an elementary component of the quality of care. The monitoring of safety in cardiac surgery is a complex process, which involves many clinical, practical, methodological and statistical issues. The objective of this thesis was to monitor safety in cardiac surgery in The Netherlands using the NVT Adult Cardiac Surgery Database. The specific aims can be defined as follows:
1. To investigate methods to measure safety in cardiac surgery
2. To investigate methods to compare safety across cardiac surgery centers

Part 1 offers an elaborate discussion of issues related to measuring safety in cardiac surgery. In Chapter 4 an overview of the basic concepts of performance indicators in healthcare is provided. An adequate performance indicator measures the quality of the delivered care, usually expressed as clinical patient outcomes. If outcomes are not available, the process and structure of care may help to describe the quality of care. Considering the wide variety of diseases and treatments, one uniform hospital-wide performance indicator is unattainable. Instead, we should focus on performance indicators for specific diseases. Carefully selected performance indicators may be used to monitor trends and to identify underperformance (benchmarking). However, the complexity of quality assessments in health care does not allow for the identification of the best performing hospitals in The Netherlands.

Chapter 5 focuses on the data accuracy of risk factors. Statistical methods are presented to study changes in risk factor frequencies in a clinical database. Using the NVT database, monthly frequency rates of 18 risk factors were monitored using Statistical Process Control (SPC): Shewhart control charts, exponentially weighted moving average (EWMA) charts and cumulative sum (CUSUM) charts. Using simulations, the methods were tested for their sensitivity to gaming, i.e. intentional upcoding of risk factors. Upcoding in random patients was detected in 100% of the simulations. Subtle forms of upcoding, involving specifically high-risk patients, were more difficult to identify. However, surveillance of the overall expected risk (e.g. the logistic EuroSCORE) in addition to the separate risk factors resulted in a high detection rate of gaming. The use of SPC for risk factor surveillance is recommended in all clinical databases.

Chapter 6 addresses the question of when mortality after cardiac surgery should be measured for the purpose of comparing, or benchmarking, cardiac surgery centers. Survival up to one year after surgery was obtained from the national Cause of Death Registry for patients operated on in 10 out of 16 cardiac centers. The survival curves showed a steep initial decline followed by stabilization after approximately 60 to 120 days, depending on the intervention performed, e.g. 60 days for isolated coronary artery bypass grafting (CABG) and 120 days for combined CABG and valve surgery. Benchmark results were affected by the choice of follow-up period. Therefore, follow-up should be prolonged to a minimum of 120 days to capture early mortality of all types of interventions.

In Part 2 several issues are reviewed that are related to the comparison of safety across cardiac surgery centers. The comparison of outcomes requires adjustment for patient and intervention characteristics (risk factors), also called risk adjustment. The European system for cardiac operative risk evaluation (EuroSCORE) is a commonly used risk adjustment model and is the current benchmarking tool in the NVT database. The comprehensive review described in Chapter 7 was performed to assess the performance of the additive and logistic EuroSCORE. A literature search resulted in 67 articles, which showed that the EuroSCORE overestimated mortality. The performance of the EuroSCORE depended on the risk profile of patients: in high-risk patients the additive EuroSCORE model actually underestimated mortality. The discriminative performance of the model was good. Given the poor predictive performance, the original EuroSCORE may not be suitable as a tool for patient selection or for benchmarking.

The quality of care is often reported using ranking lists. In Chapter 8 the precision of and fluctuations in ranking lists in the comparison of cardiac surgery mortality rates are studied. Ranking lists for the years 2007 to 2009 were constructed using data from the NVT database. The ranking lists showed considerable reshuffling. The mean width of the 95% confidence intervals was 10 ranks using crude and 8 ranks using risk-adjusted mortality rates.
The large overlap of the confidence intervals across hospitals indicates that rank statistics were not materially different. Hence, rankings are an imprecise statistical method to report cardiac surgery mortality rates, and reshuffling of ranks can be expected solely due to chance.

Upcoding or undercoding of risk factors may affect the comparison of outcomes across cardiac surgery centers. In the study presented in Chapter 9, intentional upcoding, often called gaming, was simulated to quantify the impact on the comparison of outcomes. In random patients (i.e. nondifferential misclassification) substantial upcoding was required to affect benchmarking: a 1.8-fold increase in the prevalence of four risk factors changed an underperformer into an average performing center. However, when patients with the highest EuroSCORE were upcoded (i.e. differential misclassification), a 1.1-fold increase was sufficient. Thus, benchmarking based on risk-adjusted mortality rates can be manipulated by misclassification of EuroSCORE risk factors, and the prevalence of all risk factors should be carefully monitored.
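The monitoring of risk factor prevalence mentioned here can be illustrated with the simplest of the SPC tools named for Chapter 5, a Shewhart chart for proportions (p-chart). The Python sketch below is illustrative only and is not the implementation used in this thesis: the function name `p_chart`, the monthly counts and the choice of the first five months as baseline are all assumptions.

```python
import math


def p_chart(events, totals, baseline_months, sigma=3.0):
    """Flag months in which a risk factor's prevalence leaves the control limits.

    events          -- number of patients with the risk factor, per month
    totals          -- number of operations, per month
    baseline_months -- number of initial months used to estimate the centre line
    Returns (p_bar, flags), where flags[i] is True if month i is out of control.
    """
    p_bar = sum(events[:baseline_months]) / sum(totals[:baseline_months])
    flags = []
    for x, n in zip(events, totals):
        se = math.sqrt(p_bar * (1.0 - p_bar) / n)
        lower = max(0.0, p_bar - sigma * se)
        upper = min(1.0, p_bar + sigma * se)
        flags.append(not (lower <= x / n <= upper))
    return p_bar, flags


# Hypothetical data: 200 operations per month; the last month shows a jump
# in recorded prevalence, as might occur with upcoding.
totals = [200, 200, 200, 200, 200, 200]
events = [40, 38, 44, 41, 37, 70]
p_bar, flags = p_chart(events, totals, baseline_months=5)
print(f"baseline prevalence: {p_bar:.3f}")
print("out-of-control months:", [i + 1 for i, f in enumerate(flags) if f])
```

EWMA and CUSUM charts follow the same idea but accumulate information over consecutive months, which makes them more sensitive to small, sustained shifts than the month-by-month limits used in this sketch.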


In Chapter 10 we use administrative data to compare mortality across cardiac surgery centers. For 10 out of 16 centers, data from the hospital administration database and the NVT database were extracted. The number of cardiac surgery interventions performed could not be assessed using the administrative database, due to incorrect or missing intervention codes. Risk models were developed using the administrative data (according to the Dutch Hospital Standardized Mortality Ratio method) and the clinical data (with EuroSCORE variables). The administrative model was inferior to the clinical model with respect to discrimination (c-statistic of 0.77 versus 0.85, p-value for difference <0.001) and calibration (Brier Score of 2.8 versus 2.6, p-value for difference <0.001, maximum score 3.0). Two hospitals changed outlier status when benchmarking was performed using the administrative model instead of the clinical model. In cardiac surgery, administrative data are less suitable than clinical data for the purpose of benchmarking. Risk adjustment models including procedure specific clinical risk factors are recommended. Supplementary Chapters S1 and S2 provide a comprehensive overview of trends and outcomes of valve surgery and isolated CABG surgery in the Netherlands from 1995 to 2010. The number of heart valve operations performed in The Netherlands has strongly increased since 1995. Bio-prostheses are increasingly implanted at the expense of mechanical valves and mitral valve surgery is currently dominated by repair instead of replacement. The number of isolated CABG operations in The Netherlands has declined since 1995. Off-pump surgery is increasingly performed and arterial grafts are now used in nearly all operations. The mortality after isolated CABG surgery and valve surgery has decreased, without signs of a decrease in patient risk. This suggests that outcomes have improved and cardiac surgery has become safer.



Samenvatting

Summary in Dutch
In 2007 the Netherlands Association for Cardio-Thoracic Surgery (NVT) established the Risico- en Interventieregistratie (Risk and Intervention Registry). The goal of this database was to evaluate risk-adjusted mortality and to monitor safety as an essential component of the quality of care. Monitoring safety in cardiac surgery is a complex process that involves many clinical, practical, methodological and statistical issues. The aim of this thesis was to monitor safety in cardiac surgery in the Netherlands, using the Risico- en Interventieregistratie of the NVT. The specific aims are as follows:
1. To investigate methods to measure safety in cardiac surgery
2. To investigate methods to compare safety in cardiac surgery between centers

Part 1 addresses issues that are relevant to measuring safety in cardiac surgery. In Chapter 4 an overview is given of the basic concepts of performance indicators in healthcare. A good performance indicator measures the quality of care, for which clinical outcomes can be used. If outcomes are not available, the processes and structure of care can give an impression of the quality of care. Given the variety of diseases and treatments, a hospital-wide performance indicator is not feasible and one should focus on performance indicators for specific diseases. Carefully considered performance indicators can be used to monitor trends and to identify practices that perform below an established standard (benchmarking). However, measuring quality is too complex to allow the best hospitals in the Netherlands to be identified.

Chapter 5 focuses on the accuracy of the data, in particular the risk factors. Statistical methods are presented that make it possible to track changes in the prevalence of risk factors. The monthly prevalence of 18 risk factors in the NVT database was tracked using Statistical Process Control (SPC): Shewhart control charts, exponentially weighted moving average (EWMA) charts and cumulative sum (CUSUM) charts. By means of simulations, the methods were tested for their sensitivity to gaming, i.e. the deliberate misclassification of risk factors. Gaming in random patients was detected in 100% of the simulations. Subtle forms of gaming, specifically involving high-risk patients, were more difficult to detect. However, when the expected mortality (e.g. the logistic EuroSCORE) was tracked in addition to the individual risk factors, the sensitivity to gaming was high. The use of SPC to track the prevalence of risk factors is recommended in all clinical databases.



In Hoofdstuk 6 wordt de vraag beantwoord wanneer sterfte na hartchirurgie gemeten dient te worden om hartchirurgische centra te kunnen vergelijken. Aan de hand van de Doodsoorzaken Registratie werd de overleving van patiënten uit 10 van de 16 hartchirurgische centra vastgesteld tot één jaar na de operatie. Er was een steile afname van de overleving zichtbaar, waarna de overlevingscurve stabiliseerde. Dit gebeurde na 60 tot 120 dagen, afhankelijk van de ingreep, bijvoorbeeld 60 dagen voor geïsoleerde coronaire bypass chirurgie (CABG) en 120 dagen voor gecombineerde hartklep en CABG chirurgie. De keuze voor het gebruik van de 30-dagen sterfte of de 1-jaars sterfte beïnvloedt de vergelijking van sterfte tussen ziekenhuizen. De follow-up na hartchirurgie dient derhalve verlengd te worden naar 120 dagen, om de vroege sterfte van alle typen hartchirurgie te meten. In Deel 2 worden verschillende zaken bestudeerd die gerelateerd zijn aan de vergelijking van de veiligheid in de hartchirurgische centra. Bij de vergelijking van uitkomsten dient men rekening te houden met de patient- en operatiekenmerken (risico factoren), wat ook wel risico-correctie wordt genoemd. De European system for cardiac operative risk evaluation (EuroSCORE) is een veelvuldig toegepast risico-correctiemodel en wordt ook gebruikt in de NVT database. In de systematische review beschreven in Hoofdstuk 7, wordt de voorspellende waarde van de additieve en logistische EuroSCORE onderzocht. De literatuurstudie omvatte 67 artikelen, die aantoonden dat de EuroSCORE de sterfte overschat. De overschatting was afhankelijk van het risicoprofiel van de patiënten: in hoogrisicopatiënten onderschatte de additieve EuroSCORE de sterfte juist. De discriminatie van de EuroSCORE was goed. De EuroSCORE is in de originele vorm niet geschikt voor gebruik als risicoscore voor patiëntselectie of het benchmarken van centra. De kwaliteit van zorg wordt vaak weergegeven in de vorm van ranglijsten. 
Chapter 8 examines the accuracy of ranking lists for comparing mortality after cardiac surgery. Ranking lists were constructed for the years 2007 through 2009 using data from the NVT database. Many shifts in the ranks of the centers were observed. The mean 95% confidence interval of a hospital rank spanned 10 ranks for the list based on unadjusted mortality and 8 ranks for the list based on risk-adjusted mortality. The overlapping confidence intervals show that there is no substantial difference between the ranks. Ranking lists are an imprecise way to present mortality after cardiac surgery, and shifts in rank are mainly attributable to chance.

The comparison of risk-adjusted outcomes can be affected by incorrect reporting of risk factors. In the study described in Chapter 9, data simulations were used to quantify the effect of gaming (i.e. deliberate misclassification of risk factors) on the comparison of centers. When gaming was applied to randomly selected patients (i.e. non-differential misclassification), substantial misclassification was needed to influence benchmark results: the prevalence of four risk factors had to be increased 1.8-fold to turn an underperforming center into an average one. When gaming was applied to the patients with the highest EuroSCORE (i.e. differential misclassification), a 1.1-fold increase was sufficient. The comparison of risk-adjusted mortality can be influenced by misclassification of EuroSCORE risk factors, and the prevalence of risk factors should be monitored closely.

In Chapter 10, we use administrative data to compare mortality between cardiac surgery centers. For 10 of the 16 centers, data from both the hospital administration and the NVT database were used. The number of cardiac surgery procedures could not be determined from the hospital administration because of incorrect or missing intervention codes. A risk model was developed on the administrative data (following the Dutch Hospital Standardized Mortality Ratio method) and on the clinical data (using EuroSCORE variables). The administrative model had poorer discrimination (C-statistic of 0.77 versus 0.85, p-value for difference <0.001) and calibration (Brier score of 2.8 versus 2.6, p-value for difference <0.001, maximum score 3.0). For two centers, the result relative to the national standard changed when benchmarking was performed with the administrative model instead of the clinical model. In cardiac surgery, administrative data are less suitable than clinical data for comparing mortality rates. The use of risk models with procedure-specific clinical risk factors is recommended.

The supplementary Chapters S1 and S2 provide an overview of trends and outcomes of heart valve surgery and isolated CABG procedures in The Netherlands for the years 1995 to 2010. The number of heart valve operations has increased sharply since 1995. Bioprostheses are implanted more often than mechanical valve prostheses, and mitral valve surgery now consists mainly of valve repair rather than valve replacement. The number of isolated CABG operations in The Netherlands has decreased since 1995. Off-pump surgery is increasing, and arterial grafts are now used in almost all CABG procedures. Mortality after isolated CABG and heart valve surgery has decreased, with no indication of a decrease in the risk profile of patients. This points to improved outcomes and increased safety in cardiac surgery.
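
The contrast between non-differential and differential misclassification summarized for Chapter 9 can be illustrated with a toy calculation. The sketch below is not the simulation used in the thesis: the case mix, the risk-factor weight `beta`, the number of gamed patients and the 30% excess mortality are all hypothetical, and a single generic risk factor stands in for the EuroSCORE risk factors. It only shows why recording an extra risk factor in the highest-risk patients lowers the observed/expected (O/E) mortality ratio more than recording it in the same number of randomly chosen patients.

```python
import math
import random


def risk(logit):
    """Convert a log-odds value into a predicted probability of death."""
    return 1.0 / (1.0 + math.exp(-logit))


def oe_ratio(observed_deaths, logits):
    """Observed deaths divided by the deaths expected under the risk model."""
    expected = sum(risk(x) for x in logits)
    return observed_deaths / expected


def upcode(logits, indices, beta):
    """Copy of logits with one extra risk factor (weight beta) recorded
    for the chosen patients; all other patients are left unchanged."""
    chosen = set(indices)
    return [x + beta if i in chosen else x for i, x in enumerate(logits)]


def simulate(n_patients=1000, n_gamed=50, beta=1.0, seed=1):
    # Hypothetical case mix: log-odds spread evenly from -5 to -1
    # (predicted risk from below 1% to about 27%), sorted low to high.
    logits = [-5.0 + 4.0 * i / (n_patients - 1) for i in range(n_patients)]

    # Hypothetical underperforming center: 30% more deaths than expected.
    observed = 1.3 * sum(risk(x) for x in logits)

    rng = random.Random(seed)
    # Non-differential gaming: upcode randomly selected patients.
    random_patients = rng.sample(range(n_patients), n_gamed)
    # Differential gaming: upcode the patients with the highest predicted risk.
    highest_risk = range(n_patients - n_gamed, n_patients)

    return {
        "honest": oe_ratio(observed, logits),
        "random": oe_ratio(observed, upcode(logits, random_patients, beta)),
        "highest_risk": oe_ratio(observed, upcode(logits, highest_risk, beta)),
    }


result = simulate()
for label, value in result.items():
    print(f"{label:>12}: O/E = {value:.3f}")
```

In a logistic risk model, a fixed increase in log-odds raises the predicted risk more for patients who already have a high predicted risk (as long as that risk stays well below 50%), so the same number of upcoded patients adds more expected deaths when they are selected from the top of the risk distribution.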



Closing pages



Cardiac surgery centers and Data Registration Committee

All 16 cardiac surgery centers in The Netherlands contribute to the Netherlands Association for Cardio-Thoracic Surgery database. The centers are listed below, including the names of the members of the Data Registration Committee (DRC).

Academic Medical Center (Amsterdam). DRC member: Dr. P. Symersky
Amphia Hospital (Breda). DRC member: Drs. M. Bentala
Catharina Hospital (Eindhoven). DRC member: Dr. A.H.M. van Straten
Erasmus Medical Center (Rotterdam). DRC member: Dr. J.A. Bekkers
Haga Hospital (The Hague). DRC member: Dr. K. Khargi
Isala Klinieken (Zwolle). DRC member: Dr. A.L.P. Markou
Leiden University Medical Center (Leiden). DRC member: Drs. M.I.M. Versteegh
Maastricht University Medical Center (Maastricht). DRC member: Drs. P.J.C. Barenbrug
Medical Center Leeuwarden (Leeuwarden). DRC member: Dr. L. Jekel
Medisch Spectrum Twente (Enschede). DRC member: Dr. R.G.H. Speekenbrink
Onze Lieve Vrouwe Gasthuis (Amsterdam). DRC member: Dr. W. Stooker
St. Antonius Hospital (Nieuwegein). DRC member: Drs. E.J. Daeter
University Medical Center Groningen (Groningen). DRC member: Drs. I.J. Wijdh-den Hamer
University Medical Center St. Radboud (Nijmegen). DRC member: Dr. L. Noyez
University Medical Center Utrecht (Utrecht). DRC member: Prof. dr. L.A. van Herwerden (President)
VU Medical Center (Amsterdam). DRC member: Dr. E.K. Jansen



Manuscripts based on the studies presented in this thesis

Chapter 1
Siregar S, Versteegh MIM, van Herwerden LA. Risk-adjusted hospital mortality rates [Risicogewogen ziekenhuismortaliteit]. Ned Tijdschr Geneeskd. 2011;155(50):A4103.

Chapter 3
Siregar S, Groenwold RHH, Versteegh MIM, Takkenberg JJM, Bots ML, van der Graaf Y, van Herwerden LA. Data Resource Profile: Adult cardiac surgery database of the Netherlands Association for Cardio-Thoracic Surgery. Int J Epidemiol. 2013;42(1):142-149.

Chapter 4
Siregar S, Groenwold RHH, Versteegh MIM, van Herwerden LA. Performance indicators for hospitals [Prestatie-indicatoren voor ziekenhuizen]. Ned Tijdschr Geneeskd. 2012;156(49):A5487.

Chapter 5
Siregar S, Roes KCB, van Straten AHM, Bots ML, van der Graaf Y, van Herwerden LA, Groenwold RHH. Statistical methods to monitor risk factors in a clinical database: example of a national cardiac surgery registry. Circ Cardiovasc Qual Outcomes. 2013 Jan 1;6(1):110-8.

Chapter 6
Siregar S, Groenwold RHH, de Mol BAJM, Speekenbrink RGH, Versteegh MIM, Brandon Bravo Bruinsma GJ, Bots ML, van der Graaf Y, van Herwerden LA. Evaluation of cardiac surgery mortality rates: 30-day mortality or longer follow-up? Eur J Cardiothorac Surg. Accepted.

Chapter 7
Siregar S, Groenwold RHH, de Heer F, Bots ML, van der Graaf Y, van Herwerden LA. Performance of the original EuroSCORE. Eur J Cardiothorac Surg. 2012 Apr;41(4):746-54.

Chapter 8
Siregar S, Groenwold RHH, Jansen EK, Bots ML, van der Graaf Y, van Herwerden LA. Limitations of ranking lists based on cardiac surgery mortality rates. Circ Cardiovasc Qual Outcomes. 2012 May;5(3):403-9.

Chapter 9
Siregar S, Groenwold RHH, Versteegh MIM, Noyez L, Ter Burg WJPP, Bots ML, van der Graaf Y, van Herwerden LA. Gaming in risk-adjusted mortality rates: Effect of misclassification of risk factors in the benchmarking of cardiac surgery risk-adjusted mortality rates. J Thorac Cardiovasc Surg. 2012 Mar;145(3):781-9.

Chapter 10
Siregar S*, Pouw ME*, Moons KGM, Versteegh MIM, Bots ML, van der Graaf Y, Kalkman CJ, van Herwerden LA, Groenwold RHH. The Dutch Hospital Standardized Mortality Ratio (HSMR) method and cardiac surgery: benchmarking using hospital administration data versus a clinical database.

Supplementary Chapter S1
Siregar S*, de Heer F*, Groenwold RHH, Versteegh MIM, Bekkers JA, Brinkman ES, Bots ML, van der Graaf Y, van Herwerden LA. Trends and outcomes of valve surgery: 16-year results of The Netherlands Adult Cardiac Surgery Database.

Supplementary Chapter S2
Siregar S*, de Heer F*, Groenwold RHH, Versteegh MIM, Bots ML, van der Graaf Y, van Herwerden LA. Trends and outcomes of coronary artery bypass grafting: 16-year results of The Netherlands Adult Cardiac Surgery Database.

* Both authors contributed equally to this work



Acknowledgements

I would like to thank everyone who contributed to my doctoral research.

Prof. dr. L.A. van Herwerden, dear Professor, you have been my mentor from the very start of my work in cardio-thoracic surgery: during my clinical clerkships, my research internship and my time as a resident not in training (ANIOS), but above all during the past three years of my PhD. I am incredibly grateful for the opportunities you offered me and the trust you placed in me. Your critical clinical eye and your dedication to the database and to this research were indispensable to this thesis. I have learned an enormous amount from you. Many thanks for everything.

Prof. dr. Y. van der Graaf, dear Yolanda, thanks to your efforts I was able to bring this subject to the attention of the general public: you emphasized the societal importance of this research, you gave me opportunities to write about it, and you put me in touch with people who share the interest in quality and safety. Many thanks for your supervision.

Prof. dr. M.L. Bots, dear Michiel, even though your name is not on the second page of this thesis, many thanks for your supervision throughout the research. Your comments reminded me time and again not to make things too complex. Thank you so much.

Dr. R.H.H. Groenwold, dear Rolf, you are my ideal co-promotor. You sensed exactly when to spell something out for me and when I could generate and work out ideas myself. I admire your ability to solve every problem I had. I could always come to you, for anything. Many thanks for your time, your knowledge, your patience and everything you taught me.

The members of my assessment committee, Prof. dr. G.J. Rinkel, Prof. dr. D. van Dijk, Prof. dr. J.H.H. Takkenberg, Prof. dr. R.J.M. Klautz and Prof. dr. B. Bridgewater, thank you very much for your willingness to assess the scientific content of my thesis. Dear Professor Bridgewater, thank you for your willingness to serve on my dissertation committee. There is much to learn from the experiences in your country and I am keen to do so.

The other members of my doctoral committee, Prof. dr. P. Sergeant, drs. M.I.M. Versteegh, Prof. dr. M.L. Bots, Prof. dr. C. Kalkman and Prof. dr. C.K. Roes, thank you very much for your willingness to serve on my doctoral committee.


The board of the Netherlands Association for Cardio-Thoracic Surgery (Nederlandse Vereniging voor Thoraxchirurgie, NVT) and the board of the Begeleidingscommissie Hartinterventies Nederland (BHN), thank you very much for your support of this research and for your trust in me as a researcher. Dr. Versteegh, I would like to thank you in particular for your contribution to my articles. Despite your many duties, you read and commented on abstracts and articles within a few days every time, even at weekends if need be. Many thanks.

Members of the Data Registration Committee, without your dedication to this wonderful database this research would not have been possible. I am very grateful that you entrusted your data to me. It was a privilege to carry out this research. Thank you very much.

Thanks to all cardiac surgeons in The Netherlands. This research is based on the data you have collected over many years. Thank you very much for the contribution each of you has made to the database.

The central data management team of the NVT database (Klinische Informatiekunde, Academic Medical Center), namely Willem-Jan Ter Burg, Emile Brinkman and Eva Tsjapanova, thank you very much for receiving, storing and processing the gigantic amounts of data every quarter. In particular, I want to thank you for processing the postal codes that were supplied afterwards for this research.

Dear staff members of Cardio-Thoracic Surgery at UMC Utrecht, Guido van Aarnhem, Marc Buijsrogge, Faiz Ramjankhan, Jolanda Kluin, Professor Lahpor and Ronald Meijer, I greatly enjoyed working with you and it was with a heavy heart that I left the warm nest. Your dedication to the profession is fantastic and infectious. Thank you for everything I have been able to learn from you.

Dear staff members of Clinical Epidemiology and Biostatistics at the Julius Center, UMC Utrecht, thank you for all the lectures, presentations, meetings and help with epidemiological and statistical problems. In two years I have built up an enormous store of scientific knowledge thanks to your expertise. Thank you very much.

Dear Josephine and Joke, thank you for scheduling impossible appointments, for arranging everything that needs arranging and, of course, for all the good company and treats. Josephine, I want to thank you in particular for your help in organizing the symposium. I am extremely happy with your efforts for the symposium and with everything else you have done for me since I left the UMC. For me, the symposium is already a success.


Dear Frederiek, thank you for your involvement in my research. It was great that I could always spar with you about data, epidemiological issues, R and other nerdy matters. I am therefore delighted that you are willing to be my paranymph. Many thanks, too, for the good company and the great time I had with you. I already miss the science drinks and the researchers' lunch!

Dear (former) researchers Jerson, Linda, Paul, Hanna, David and Jesper, and dear residents and physician assistants of Cardio-Thoracic Surgery at UMC Utrecht, you were my second home. What a good time I had with you. Thank you for the wonderful time and for your continued interest in my research. It was wonderful that even the thousandth practice talk was still listened to critically. I hope that all of you achieve your goals and that we will see each other often in the future, on the work floor and at conferences, but also outside work at dinners, drinks and concerts!

Dear researchers of the Julius Center, in particular Annemieke, Miranda, Joppe, Marleen, Janneke, Bas, Carla, Pauline, Frederieke, Anne, Floriaan and Ewoud, thank you for the good company and for sharing your knowledge. It was a pleasure to work with you for two years. Good luck in your further careers, and keep me posted on your PhD defenses!

Professor E. Hannan, Kim Cozzens and all staff members of the Cardiac Surgery Program Service of New York State, thank you for your hospitality and for showing me the ins and outs of your program. Dear Ed, thank you for the interesting discussions on health care, the differences between The Netherlands and New York, public accountability and much more. I hope we meet again.

Professor E. Blackstone, Pam Goepfarth and everyone at the Clinical Investigations Unit of the Heart and Vascular Institute at Cleveland Clinic, thank you for hosting me. I had a great time. I learned a lot about your quality program and, most of all, I learned the importance of feedback on actionable data. Thank you.

My friends, in particular jaarclub Aqua, thank you for your interest in my research and for all the fun dinners, trips, parties and so on, but above all many thanks for your friendship. It is always a joy to be with you again. Simon, thank you for designing the cover.

Dear family and family-in-law, thank you for your interest in my research.


Dear papa and mama, this thesis is for you. After years of sowing, it is now time to reap. The discipline and ambition you instilled in me, and your constant support and love, made it possible for me to carry out this research. I am so enormously grateful. I hope you are proud.

Ir. Sjahdian Siregar, dear Abang, I have always looked up to you. As my big brother you set the bar high, so that I could keep striving for more. Future Ir. Sjarifa Siregar, dear Koekie, I am delighted that you are willing to be my paranymph. I hope it inspires you for your own career. Your support and love are priceless. Thank you.

Dear Maurits, thank you that I can share this whole research and everything else with you. Your contribution to this thesis is enormous. Thank you for everything.

Sabrina Siregar, April 2013



Curriculum Vitae

Sabrina Siregar was born on 24 October 1985 in The Hague, The Netherlands. After graduating from secondary school (Maerlant Lyceum, The Hague) in 2002, she started her Medicine studies at Utrecht University. She obtained her medical degree in 2009 and subsequently worked as a resident at the Department of Cardio-Thoracic Surgery at the University Medical Center Utrecht. In 2010, she started her research work on outcomes and safety in cardiac surgery. She explored and analyzed the Adult Cardiac Surgery Database of the Netherlands Association for Cardio-Thoracic Surgery under supervision of Prof. dr. L.A. van Herwerden (Department of Cardio-Thoracic Surgery), Prof. dr. Y. van der Graaf and Dr. R.H.H. Groenwold (both from the Julius Center for Health Sciences and Primary Care). During her research work she obtained a Master of Science degree in Epidemiology at Utrecht University. In March 2013, she resumed her clinical work at the Department of Cardio-Thoracic Surgery at Leiden University Medical Center.
