Acta Orthopaedica, Vol. 92, Issue 5, October 2021


Vol. 92, No. 5, 2021 (pp. 501–632)



Acta Orthopaedica is owned by the Nordic Orthopaedic Federation and is the official publication of the Nordic Orthopaedic Federation

EDITORIAL OFFICE

Acta Orthopaedica, Department of Orthopedics, Lund University Hospital, SE–221 85 Lund, Sweden
E-mail: acta.ort@med.lu.se
Homepage: http://www.actaorthop.org

THE FOUNDATION BOARD OF THE NORDIC ORTHOPAEDIC FEDERATION AND ACTA ORTHOPAEDICA

Peter Frandsen, Denmark
Ragnar Jonsson, Iceland
Heikki Kröger, Finland
Anders Rydholm, Sweden
Kees Verheyen, the Netherlands

EDITOR

Anders Rydholm, Lund, Sweden

DEPUTY EDITOR

Peter A Frandsen, Odense, Denmark

CO-EDITORS

Li Felländer-Tsai, Stockholm, Sweden
Nils Hailer, Uppsala, Sweden
Ivan Hvid, Oslo, Norway
Søren Overgaard, Copenhagen, Denmark
Cecilia Rogmark, Malmö, Sweden
Urban Rydholm, Lund, Sweden
Bart A Swierstra, Wageningen, The Netherlands
Eivind Witsø, Trondheim, Norway

WEB EDITOR

Magnus Tägil, Lund, Sweden

STATISTICAL EDITOR

Jonas Ranstam, Lund, Sweden
Philippe Wagner, Västerås, Lund

PRODUCTION MANAGER

Kaj Knutson, Lund, Sweden


SUBSCRIPTION INFORMATION

Acta Orthopaedica [print 1745-3674, online 1745-3682] is a peer-reviewed journal, published six times a year plus supplements by Taylor & Francis on behalf of the Nordic Orthopaedic Federation.

Airfreight and mailing in the USA by agent named WN Shipping USA, 156-15, 146th Avenue, 2nd Floor, Jamaica, NY 11434, USA. Periodicals postage paid at Jamaica NY 11431.

US Postmaster: Send address changes to Acta Orthopaedica, WN Shipping USA, 156-15, 146th Avenue, 2nd Floor, Jamaica, NY 11434, USA.

Annual Institutional Subscription, Volume 92, 2021: $1,291; £798; €1,035

The subscription fee purchases an online subscription. The price includes access to current content and back issues to January 1997 (if available). Printed copies of the journal are provided on request as a free supplementary service accompanying an online subscription. Supplements to the journal are also included in the subscription price. For more information, visit the journal’s website: http://www.tandfonline.com/IORT

Manuscripts should be uploaded at http://www.manuscriptmanager.com/ao/ for further handling at: Acta Orthopaedica Editorial Office, Department of Orthopaedics, Lund University Hospital, SE-221 85 Lund, Sweden

Correspondence concerning copyright and permissions should be sent to: Maria Montzka, Portfolio Manager – Medicine, P.O. Box 3255, SE-103 65 Stockholm, Sweden. Tel: +46 (0)760 14 24 68. Fax: +46 (0)8 440 80 50. E-mail: maria.montzka@informa.com

Ordering information: Please contact your local Customer Service Department to take out a subscription to the Journal: USA, Canada: Taylor & Francis, Inc., 530 Walnut Street, Suite 850, Philadelphia, PA 19106, USA. Tel: +1 800 354 1420; Fax: +1 215 207 0050. UK/Europe/Rest of World: T&F Customer Services, Informa UK Ltd, Sheepen Place, Colchester, Essex, CO3 3LP, United Kingdom. Tel: +44 (0) 20 7017 5544; Fax: +44 (0) 20 7017 5198; Email: subscriptions@tandf.co.uk

Dollar rates apply to all subscribers outside of Europe. Euro rates apply to all subscribers in Europe except the UK and Republic of Ireland. If you are unsure which applies, contact Customer Services. All subscriptions are payable in advance and all rates include postage. Journals are sent by air to the USA, Canada, Mexico, India, Japan and Australasia. Subscriptions are entered on an annual basis, i.e., January to December. Payment may be made by sterling check, US dollar check, euro check, international money order, National Giro, or credit card (Amex, Visa and Mastercard).

Back issues: Taylor & Francis retains a two-year back issue stock of journals. Older volumes are held by our official stockists to whom all orders and enquiries should be addressed: Periodicals Service Company, 351 Fairview Ave., Suite 300, Hudson, New York 12534, USA. Tel: +1 518 537 4700; fax: +1 518 537 5899; e-mail: psc@periodicals.com.

Subscription records are maintained at Taylor & Francis Group, 4 Park Square, Milton Park, Abingdon, OX14 4RN, United Kingdom.

Copyright © 2021 The Author(s). Published by Taylor & Francis on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution-Non-Commercial License (https://creativecommons.org/licenses/by-nc/3.0).

Informa UK Limited, trading as Taylor & Francis Group makes every effort to ensure the accuracy of all the information (the “Content”) contained in its publications. However, Informa UK Limited, trading as Taylor & Francis Group, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Informa UK Limited, trading as Taylor & Francis Group. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Informa UK Limited, trading as Taylor & Francis Group shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions

Indexed/abstracted in: Allied and Complementary Medicine Library (Amed); ASCA (Automatic Subject Citation Alert); Biological Abstracts; Chemical Abstracts; Cumulative Index to Nursing and Allied Health Literature (CINAHL); Current Advances in Ecological and Environmental Sciences; Current Contents/Clinical Medicine; Current Contents/Life Sciences; Developmental Medicine and Child Neurology; Energy Research Abstracts; EMBASE/Excerpta Medica; Faxon Finder; Focus On: Sports Science & Medicine; Health Planning and Administration; Index Medicus/MEDLINE; Index to Dental Literature; Index Veterinarius; INIS Atomindex; Medical Documentation Service; Nuclear Science Abstracts (Ceased); Periodicals Scanned and Abstracted; Life Sciences Collection; Research Alert; Science Citation Index; SciSearch; SportSearch; Uncover; Veterinary Bulletin.

Printed in England by Henry Ling


Acta Orthopaedica

ISSN 1745-3674

Vol. 92, No. 5, October 2021

Guest editorial

When the placebo effect is not an effect, 501
  I Harris
Bone quality makes a difference, 503
  H T Aro

Annotation

Evidence-based orthopedics and the myth of restoring the anatomy, 505
  S Brorson

Placebo

Surgeons’ behaviors and beliefs regarding placebo effects in surgery, 507
  A Rosén, L Sachs, A Ekdahl, A Westberg, P Gerdhem, T J Kaptchuk, and K Jensen

AI

Presenting artificial intelligence, deep learning, and machine learning studies to clinicians and healthcare stakeholders: an introductory reference with a guideline and a Clinical AI Research (CAIR) checklist proposal, 513
  J Olczak, J Pavlopoulos, J Prijs, F F A Ijpma, J N Doornberg, C Lundström, J Hedlund, and M Gordon
Wide range of applications for machine-learning prediction models in orthopedic surgical outcome: a systematic review, 526
  P T Ogink, O Q Groot, A V Karhade, M E R Bongers, F C Oner, J-J Verlaan, and J H Schwab

Spine

Properties of SF-6D when longitudinal data from 16,398 spine surgery procedures is applied to 9 national SF-6D value sets, 532
  A Joelson, F G Sigmundsson, and J Karlsson

Hip

Preoperative BMD does not influence femoral stem subsidence of uncemented THA when the femoral T-score is > –2.5, 538
  K Dyreborg, M S Sørensen, G Flivik, S Solgaard, and M M Petersen
Poor adherence to guidelines in treatment of fragile and cognitively impaired patients with hip fracture: a descriptive study of 2,804 patients, 544
  C F Frandsen, E N Glassou, M Stilling, and T B Hansen
Physical capability after total joint arthroplasty: long-term population-based follow-up study of 6,462 women, 551
  V Turppo, R Sund, J Huopio, H Kröger, and J Sirola
No increase in postoperative contacts with the healthcare system following outpatient total hip and knee arthroplasty, 557
  C E Husted, H Husted, C S Nielsen, M Mikkelsen, A Troelsen, and K Gromov
Rapid decline of yearly number of hip arthroscopies in Sweden: a retrospective time series of 6,105 hip arthroscopies based on a national patient data register, 562
  T Wörner, F Eek, J Kraus-Schmitz, M Sansone, and A Stålman
Highly cross-linked polyethylene still outperforms conventional polyethylene in THA: 10-year RSA results, 568
  H Bergvinsson, V Zampelis, M Sundberg, and G Flivik
Hip dysplasia is not uncommon but frequently overlooked: a cross-sectional study based on radiographic examination of 1,870 adults, 575
  R Leide, A Bohman, D Wenger, S Overgaard, C J Tiderius, and C Rogmark
Impact of socioeconomic status on the 90- and 365-day rate of revision and mortality after primary total hip arthroplasty: a cohort study based on 103,901 patients with osteoarthritis from national databases in Denmark, 581
  N M Edwards, C Varnum, S Overgaard, and A B Pedersen

Knee

Less improvement following meniscal repair compared with arthroscopic partial meniscectomy: a prospective cohort study of patient-reported outcomes in 150 young adults at 1- and 5-years’ follow-up, 589
  K Pihl, M Englund, R Christensen, L S Lohmander, U Jørgensen, B Viberg, J V Fristed, and J B Thorlund
Reasons for revision are associated with rerevised total knee arthroplasties: an analysis of 8,978 index revisions in the Dutch Arthroplasty Register, 597
  M Belt, G Hannink, J Smolders, A Spekenbrink-Spooren, B W Schreurs, and K Smulders
Short-term functional outcome after fast-track primary total knee arthroplasty: analysis of 623 patients, 602
  J C van Egmond, B Hesseling, H Verburg, and N M C Mathijssen

Children

A roadmap to surgery in osteogenesis imperfecta: results of an international collaboration of patient organizations and interdisciplinary care teams, 608
  R J Sakkers, K Montpetit, A Tsimicalis, T Wirth, M Verhoef, R Hamdy, J A Ouellet, R M Castelein, C Damas, G J Janus, W H Nijhuis, L Panzeri, S Paveri, D Mekking, K Thorstad, and R W Kruse
Compensation claims in pediatric orthopedics in Norway between 2012 and 2018: a nationwide study of 487 patients, 615
  J Horn, H Rasmussen, I R K Bukholm, O Røise, and T Terjesen
The STRYDE limb lengthening nail is susceptible to mechanically assisted crevice corrosion: an analysis of 23 retrieved implants, 621
  M S Jellesen, T N Lomholt, R Q Hansen, T Mathiesen, C Gundlach, S Kold, T Nygaard, M Mikuzis, U K Olesen, and J D Rölfing

Bibliometrics

The rise of registry-based research: a bibliometric analysis, 628
  E Romanini, I Schettini, M Torre, M Venosa, A Tarantino, V Calvisi, and G Zanoli

Information to authors (see http://www.actaorthop.org/)


Acta Orthopaedica 2021; 92 (5): 501–502


Guest editorial

When the placebo effect is not an effect

In this issue of the journal, Rosén et al. report the results of a survey of surgeons regarding placebo effects [1]. In particular, they show that surgeons consider “non-specific” effects (including aspects of the surgeon-patient interaction) and “placebo effects” to be important. Their definitions of these terms are clearly stated but not universally accepted. A discussion of these terms is necessary to allow surgeons to better understand the reasons why people improve after surgery.

The terms placebo effect, non-specific effects and contextual effects all refer to responses that are separate from the specific effects of surgery: the effects that result from the anatomical and physiological changes brought about by the surgical procedure. The specific surgical effect is best measured in a high-fidelity placebo trial in which all things are equal except the specific part of the procedure in question [2]. If the surgical group has better outcomes than the placebo group, it can be inferred that the difference between the groups was due entirely to the specific effect of the surgery. However, in many placebo surgical trials in orthopaedics, the difference in outcome between placebo surgery and the surgical procedure is not significant, i.e., there is no specific effect of surgery [3-7]. Important, however, is the observation that in all of these studies, both groups (surgery and placebo surgery) show significant improvements after surgery. What is of interest, and what needs defining, is what causes the observed improvement when that improvement is not due to the specific effect of surgery.

Using the term “placebo effect” to describe the improvement seen after a placebo procedure is common. It is the definition used by Rosén et al. [1] and the definition used in my own book [8]. The problem with using that definition is that it suggests that the improvement seen after placebo surgery is due to the placebo. In reality, the improvement is likely to have occurred without the placebo surgery. In fact, placebos, by definition, have no direct effect themselves.

Some of the improvement seen after placebo surgery may be due to contextual effects: what Rosén et al. refer to as non-specific effects and others have referred to as “ritual” effects [9]. These include patient expectations, the confidence and personality of the surgeon, and even the cost of the procedure. While contextual effects may explain some of the improvement after surgery, these effects are often short-lived and are unlikely to explain larger, sustained improvements.
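The logic of the placebo-controlled comparison can be written out as a rough sketch. It assumes, purely for illustration, that the contributions to improvement simply add up; the editorial states no formula, and the symbols are introduced here: s is the specific surgical effect, c the contextual effects, and b the improvement that would have occurred anyway.

```latex
\begin{align}
\Delta_{\text{surgery}} &= s + c + b \\
\Delta_{\text{placebo}} &= c + b \\
\Delta_{\text{surgery}} - \Delta_{\text{placebo}} &= s
\end{align}
```

Under this assumed additive model, a trial in which both groups improve by a similar amount corresponds to s close to zero, with the shared improvement c + b appearing in both arms.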

Three other, often overlooked factors are likely to explain most or all of the improvement after surgery seen in placebo surgical trials, and in much of the surgery we perform in clinical practice. These factors are natural history, regression to the mean and concomitant treatment.

Natural history (what would happen regardless of treatment) explains, for example, the eventual resolution of symptoms from the common cold. It also explains most or all of the improvement in pain that occurs after injuries. Natural history also confounds the treatment of fluctuating conditions such as multiple sclerosis. Without considering natural history in trials of surgery, we may falsely attribute improvement that occurs post-surgery to the surgery (the fallacy of post hoc ergo propter hoc: after this, therefore because of this).

Regression to the mean occurs when we select patients who are currently at one end of a spectrum and follow them. Over time, they will fall closer to the mean. A good example is provided by Daniel Kahneman in his book, Thinking, Fast and Slow. He described an experienced flight instructor who recommended punishing bad performance (those at the lower end of the spectrum) and not praising good performance (those at the high end of the spectrum), saying: “On many occasions I have praised well performing cadets… the next time they usually do worse. On the other hand, I have often screamed into a cadet’s earphone for bad execution, and in general he does better on his next try” [10]. Both groups were simply regressing to the mean. Similarly, only selecting people with severe knee pain from a pool of people with osteoarthritis of the knee (a condition in which symptoms fluctuate widely) will make any treatment look good. The average pain in that group will fall closer to the mean over time and, similarly, others in the pool who were not selected will have severe pain later.
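The knee-pain example can be illustrated with a small simulation. This is a minimal sketch under assumed conditions, not an analysis from the editorial: the function name, the pain scale, the threshold, and the amount of fluctuation are all hypothetical choices made for illustration. Nobody in the simulation is treated, yet the group selected for severe baseline pain looks better at follow-up.

```python
import random
import statistics


def simulate_regression_to_mean(n_people=10_000, threshold=7.0, seed=0):
    """Simulate untreated, fluctuating pain scores at two time points.

    Hypothetical model (an assumption, not from the editorial): each person
    has a stable long-run pain level, and every measurement adds independent
    day-to-day fluctuation. No treatment is given between measurements.
    """
    rng = random.Random(seed)
    baseline, follow_up = [], []
    for _ in range(n_people):
        usual_pain = rng.gauss(5.0, 1.0)                     # long-run average
        baseline.append(usual_pain + rng.gauss(0.0, 2.0))    # day of selection
        follow_up.append(usual_pain + rng.gauss(0.0, 2.0))   # later, untreated

    # "Enrol" only people whose pain happened to be severe at baseline.
    selected = [i for i, pain in enumerate(baseline) if pain >= threshold]
    # People who were not selected but have severe pain at follow-up.
    later_severe = [
        i for i, pain in enumerate(baseline)
        if pain < threshold and follow_up[i] >= threshold
    ]

    return {
        "n_selected": len(selected),
        "selected_baseline_mean": statistics.mean(baseline[i] for i in selected),
        "selected_follow_up_mean": statistics.mean(follow_up[i] for i in selected),
        "overall_mean": statistics.mean(baseline),
        "n_unselected_severe_later": len(later_severe),
    }


if __name__ == "__main__":
    for name, value in simulate_regression_to_mean().items():
        print(f"{name}: {value:.2f}")
```

In this sketch the selected group's average pain at follow-up falls back toward the overall mean without any intervention, and some people who were not selected cross the severity threshold later, which is the pattern a before-and-after study without a control group would misread as a treatment effect.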
Regression to the mean is likely to explain the similar improvement reported in knee arthritis symptoms for the multitude of therapies in which a before-and-after analysis is performed. The phenomena of natural history and regression to the mean are strong reasons for a control group in any study and why caution is advised when interpreting non-comparative studies.

Concomitant treatment is also often overlooked. In a landmark trial comparing (the then new) bone morphogenetic protein (BMP) to (traditional) bone grafting in the treatment of ununited tibia fractures, BMP (with a union rate of 75%) was deemed as effective as bone grafting (with a union rate of 84%). However, all patients also received intramedullary nailing, which produces similar union rates when used alone as a treatment for ununited tibia fractures [11]. It is quite possible that BMP (and bone grafting) added nothing, and that the perceived effectiveness was entirely due to the concomitant treatment of intramedullary nailing.

Placebo surgical trials have a great advantage in determining the effectiveness of surgery because they provide the same contextual effects in both groups, patients are blinded and all other factors are equal between groups. It is my opinion, however, that the use of the term “placebo effect” to describe all the improvement seen after surgery has led to the widespread misinterpretation that the placebo treatment causes the improvement.

Surgeons need to understand all the factors that contribute to improvements seen after surgery. These include, apart from any specific effects of surgery, contextual effects around the procedure, the effect of concomitant treatments, and what would have happened anyway (natural history and regression to the mean), and are summarized in the Figure. Surgeons should also understand that components of the Figure are not to scale. In other words, the relative role of each component will vary widely depending on the procedure. For example, in the surgical management of proximal humerus fractures, it is likely that the specific surgical effect is small, there may be some contextual effects, and the effects of natural history and concomitant treatment (post-operative physical therapy) may be strong. Conversely, the specific surgical effect of surgery to stabilize an unstable knee may be very large, and the role of other effects minor. While the definitions of some or all of these effects vary, it is important for surgeons to understand the factors contributing to the observed improvement after surgery.

Figure. The contribution of different effects to observed improvement after surgery. Labels in the figure: Observed improvement; Specific surgical effect; Non-specific effects; Context/ritual; Concomitant treatment; Regression to the mean; Natural history.

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1969155

1. Rosén A, Sachs L, Ekdahl A, Westberg A, Gerdhem P, Kaptchuk T J, Jensen K. Surgeons’ behaviors and beliefs regarding placebo effects in surgery. Acta Orthop 2021; 92(x): x-x. Epub ahead of print.
2. Beard D J, Campbell M K, Blazeby J M, et al. Considerations and methods for placebo controls in surgical trials (ASPIRE guidelines). The Lancet 2020; 395(10226): 828-38.
3. Sihvonen R, Paavola M, Malmivaara A, et al. Arthroscopic partial meniscectomy versus sham surgery for a degenerative meniscal tear. N Engl J Med 2013; 369(26): 2515-24.
4. Moseley J B, O’Malley K, Petersen N J, et al. A controlled trial of arthroscopic surgery for osteoarthritis of the knee. N Engl J Med 2002; 347(2): 81-8.
5. Beard D J, Rees J L, Cook J A, et al. Arthroscopic subacromial decompression for subacromial shoulder pain (CSAW): a multicentre, pragmatic, parallel group, placebo-controlled, three-group, randomised surgical trial. The Lancet 2018; 391(10118): 329-38.
6. Paavola M, Malmivaara A, Taimela S, et al. Subacromial decompression versus diagnostic arthroscopy for shoulder impingement: randomised, placebo surgery controlled clinical trial. BMJ 2018; 362: k2860.
7. Schrøder C P, Skare Ø, Reikerås O, Mowinckel P, Brox J I. Sham surgery versus labral repair or biceps tenodesis for type II SLAP lesions of the shoulder: a three-armed randomised clinical trial. Br J Sports Med 2017; 51(24): 1759-66.
8. Harris I. Surgery, the ultimate placebo. Sydney: NewSouth Publishing; 2016.
9. Green S A. Surgeons and shamans: the placebo value of ritual. Clin Orthop Relat Res 2006; 450: 249-54.
10. Kahneman D. Thinking, fast and slow. New York: Farrar, Straus and Giroux; 2011.
11. Brinker M R, O’Connor D P. Exchange nailing of ununited fractures. J Bone Joint Surg Am 2007; 89(1): 177-88.

Ian Harris
Ingham Institute for Applied Medical Research, South Western Sydney Clinical School, UNSW Sydney, Australia
Email: iaharris1@gmail.com


Acta Orthopaedica 2021; 92 (5): 503–504


Guest editorial

Bone quality makes a difference

In total hip arthroplasty, poor bone quality is a major risk factor for complications. There is increasing evidence that osteoporotic patients, and particularly osteoporotic females, should receive a cemented prosthesis to avoid peroperative or early postoperative periprosthetic fractures and to reduce the risk of poor fixation, which results in pronounced subsidence and eventually in clinical loosening.

In cementless total hip arthroplasty, adequate initial femoral stem stability is necessary for clinical success. Clinical subsidence, measurable from routine radiographs, predisposes femoral stems to early failure (Warth et al. 2020). Avoiding undersizing of the femoral stem with a meticulous broaching technique may minimize the risk of subsidence (Warth et al. 2020). The acceptable level of subsidence of cementless stems, as determined by radiostereometric analysis, is still unknown (van der Voort et al. 2015). Minimal subsidence (mean 0.8 mm) has resulted in faster recovery of walking speed and walking activity with improved patient-reported outcomes (Aro et al. 2021).

In this issue of Acta Orthopaedica, Dyreborg et al. (2021) present results from a secondary analysis of clinical trial data. Based on dual-energy X-ray absorptiometry (DXA), the cohort (age ≤ 75 years) involved subjects with normal or osteopenic bone mineral density (BMD) of the femoral neck. The authors found no relationship between hip BMD and postoperative femoral stem subsidence.

In line with the results of Dyreborg et al. (2021), total hip BMD measured by DXA failed to identify women prone to stem subsidence (Nazari-Farsani et al. 2020). DXA seems to overestimate BMD of osteoarthritic hips, explaining its inability to predict stem migration. Naturally, DXA measurements from other sites (contralateral hip, lumbar spine, and distal radius) are important in the detection of undiagnosed osteoporosis.

Interestingly, peripheral DXA evaluation of bone quality seems to work better than hip DXA. BMD of the distal radius, measured by DXA, and cortical-bone thickness of the distal radius, measured by pulse-echo ultrasonometry, may help discriminate women at high risk of stem subsidence (Nazari-Farsani et al. 2020). Explaining the risk of subsidence, women with low cortical-bone thickness of the distal radius had lower total hip BMD and reduced thickness of the medial cortex of the proximal femur with lower stem-to-canal fill ratios.

Cementless stems differ in the means of obtaining cortical contact and initial stability (Khanuja et al. 2011). Stem design influences the risk of periprosthetic femoral fracture (Thien et al. 2014), postoperative bone remodeling (Karachalios et al. 2019), and stem subsidence (de Vries et al. 2014). Parallel-sided femoral stems are designed to engage the metaphyseal cortical bone in the mediolateral plane only. The stem type requires adequate bone stock and unaltered femoral geometry (Grayson and Meneghini 2017). This important instructional notice raises questions. What are the criteria for normal bone stock? Is DXA-measured osteopenia a contraindication?

A parallel-sided femoral stem showed subsidence (mean 1.8 mm) even in postmenopausal women with normal hip BMD and Dorr A/B femur anatomy (Nazari-Farsani et al. 2021). Analyses with quantitative computed tomography (QCT) revealed that the stem subsidence occurred in women with high bone turnover and decreased volumetric BMD of the intertrochanteric region (Aro et al. 2021). This region is critical for stem stability. In ageing women, endosteal trabeculation and increased intracortical porosity of the proximal femur (Zebaze et al. 2010) pose natural difficulties in achieving stem stability.

In men, factors causing stem subsidence seem to be different, including young age, high bodyweight, and increased early postoperative activity (Bottner et al. 2005). Even in women, physical activity seems to dictate the direction of stem rotation. Femoral stems are affected by high torsional moments during daily activities (Bergmann et al. 2001) and postoperative walking activity, aside from total hip BMD, is associated with stem rotation in postmenopausal women (Nazari-Farsani et al. 2021). Early walking activity creates the typical pattern of slight internal stem rotation, while stems do not rotate in women with low walking activity. Stem migration does not seem to be a random event.

DXA is insensitive to critical, albeit subtle, structural changes of the proximal femur. In the future, emerging robotic techniques may allow routine measurement of local volumetric BMD by means of preoperative QCT. Such an approach could improve screening of bone quality and help patient selection for the use of cementless techniques.

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits ­unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1941632



Potential conflicts of interest The author has received a consultation fee from UCB Biopharma Sprl (Belgium) and institutional research grants from the Academy of Finland and Amgen Inc. (USA).

Hannu T. Aro
Department of Orthopaedic Surgery and Traumatology, Turku University Hospital and University of Turku, Turku, Finland
Email: hannu.aro@utu.fi

Aro H T, Engelke K, Mattila K, Löyttyniemi E. Volumetric bone mineral density in cementless total hip arthroplasty in postmenopausal women: effects on primary femoral stem stability and clinical recovery. J Bone Joint Surg Am 2021; Epub ahead of print. doi: 10.2106/JBJS.20.01614
Bergmann G, Deuretzbacher G, Heller M, Graichen F, Rohlmann A, Strauss J, Duda G N. Hip contact forces and gait patterns from routine activities. J Biomech 2001; 34: 859-71.
Bottner F, Zawadsky M, Su E P, Bostrom M, Palm L, Ryd L, Sculco T P. Implant migration after early weightbearing in cementless hip replacement. Clin Orthop Relat Res 2005; 436: 132-7.
de Vries L M, van der Weegen W, Pilot P, Stolarczyk P A, Sijbesma T, Hoffman E L. The predictive value of radiostereometric analysis for stem survival in total hip arthroplasty: a systematic review. Hip Int 2014; 24(3): 215-22.
Dyreborg K, Sørensen M S, Flivik G, Solgaard S, Petersen M M. Preoperative BMD does not influence femoral stem subsidence of uncemented THA when the femoral T-score is > –2.5. Acta Orthop 2021; 92.
Grayson C, Meneghini R M. Parallel-sided femoral stems. In: Lieberman J R, Berry D J, editors. Advanced Reconstruction: Hip 2. Rosemont, IL: American Academy of Orthopaedic Surgeons and the Hip Society; 2017. p. 119-25.
Karachalios T, Palaiochorlidis E, Komnos G. Clinical relevance of bone remodelling around conventional and conservative (short-stem) total hip arthroplasty implants. Hip Int 2019; 29(1): 4-6.
Khanuja H S, Vakil J J, Goddard M S, Mont M A. Cementless femoral fixation in total hip arthroplasty. J Bone Joint Surg Am 2011; 93(5): 500-9.
Nazari-Farsani S, Vuopio M E, Aro H T. Bone mineral density and cortical-bone thickness of the distal radius predict femoral stem subsidence in postmenopausal women. J Arthroplasty 2020; 35(7): 1877-84.
Nazari-Farsani S, Vuopio M, Löyttyniemi E, Aro H T. Contributing factors to the initial femoral stem migration in cementless total hip arthroplasty of postmenopausal women. J Biomech 2021; 117: 110262.
Thien T M, Chatziagorou G, Garellick G, Furnes O, Havelin L I, Mäkelä K, Overgaard S, Pedersen A, Eskelinen A, Pulkkinen P, Kärrholm J. Periprosthetic femoral fracture within two years after total hip replacement: analysis of 437,629 operations in the Nordic arthroplasty register association database. J Bone Joint Surg Am 2014; 96(19): e167.
van der Voort P, Pijls B G, Nieuwenhuijse M J, Jasper J, Fiocco M, Plevier J W M, Middeldorp S, Valstar E R, Nelissen R G H H. Early subsidence of shape-closed hip arthroplasty stems is associated with late revision. Acta Orthop 2015; 86(5): 575-85.
Warth L C, Grant T W, Naveen N B, Deckard E R, Ziemba-Davis M, Meneghini R M. Inadequate metadiaphyseal fill of a modern taper-wedge stem increases subsidence and risk of aseptic loosening: technique and distal canal fill matter. J Arthroplasty 2020; 35(7): 1868-76.
Zebaze R M, Ghasem-Zadeh A, Bohte A, Iuliano-Burns S, Mirams M, Price R I, Mackie E J, Seeman E. Intracortical remodelling and porosity in the distal radius and post-mortem femurs of women: a cross-sectional study. Lancet 2010; 375(9727): 1729-36.


Acta Orthopaedica 2021; 92 (5): 505–506


Annotation

Evidence-based orthopedics and the myth of restoring the anatomy

Stig BRORSON

Centre for Evidence-Based Orthopaedics, Department of Orthopaedic Surgery, Zealand University Hospital, Denmark, and Department of Clinical Medicine, University of Copenhagen, Denmark
Correspondence: sbror@regionsjaelland.dk
Submitted 2021-03-21. Accepted 2021-03-25.

As orthopedic surgeons we have a strong inclination towards bringing broken bones together. Traditionally, in displaced fractures the anatomy should be restored and the success of surgery should subsequently be documented by postoperative imaging. In some common upper limb fractures, for example in displaced fractures of the proximal humerus, best evidence challenges our intuitions. On the one hand, current evidence has failed to demonstrate any benefits to patients in bringing the displaced fragments together by means of plating or nailing or even by replacing the joint (Aspenberg 2015). The only difference is an increased risk of additional surgery in the surgical group (Handoll and Brorson 2015). On the other hand, by following evidence-based recommendations we shall face a substantial number of displaced fractures healing in malunion. Passed-down knowledge and practice are challenged.

As doctors we aim to offer the patient optimal treatment. Should this patient be offered surgery to restore the anatomy (Figure), even when randomized trials (Rangan et al. 2015, Launonen et al. 2019) have been unable to document any benefits to the patient in terms of function, quality of life, pain, or any other outcome?

Figure. 68-year-old female suffering an impacted 2-part fracture of the proximal humerus. Radiographs were taken on admission and after 5 months. Clinical photos were taken after 5 months. Pain-free shoulder function was obtained after 3 months. In this case intuition will tell most orthopedic surgeons to restore the anatomy, most likely by open reduction and internal fixation with a locking plate. Best evidence from randomized trials does not support this decision (Rangan et al. 2015, Launonen et al. 2019). How to act?

David Sackett in orthopedics

Let’s revisit David Sackett’s (1934–2015) definition of evidence-based practice: Evidence-based practice includes the integration of best available evidence, clinical expertise, and patient values (Sackett 1997).

1st, best available evidence does not support surgery in this case.

2nd, clinical expertise is limited when it comes to severely displaced fractures managed nonoperatively. Experience is mainly surgical, and the vast majority of studies read in orthopedic departments are clinical series concerning surgical techniques and implants. 67% of the literature on proximal humerus fractures concerns operative treatments compared with 4% including nonoperative treatments (Slobogean et al. 2015).

3rd, patient values need to be explored. They are not accessible from radiographs or demographic data yet often form the basis for operative decisions. Many elderly patients with displaced fractures of the proximal humerus have limited interest in surgical interventions unless the surgeon states that this is the only way to regain function and quality of life. However, this answer is no longer compatible with best evidence. Surprisingly, and for unknown reasons, in many countries the use of surgery for fractures of the proximal humerus has increased for decades and seems to be continuing to increase (Huttunen et al. 2012, Sumrein et al. 2017, Sabesan et al. 2017, Jo et al. 2019, Klug et al. 2019).

The challenge to the modern orthopedic surgeon

As orthopedic surgeons we need to reconsider current practice for certain common fracture patterns like 2-, 3-, and 4-part fractures of the proximal humerus. Similar considerations can be made for other upper limb fractures. In some displaced fractures of the clavicle (Lenza et al. 2019), the humeral shaft (Rämö et al. 2020), the olecranon (Chen et al. 2021), and the distal radius (Mulders et al. 2018) even severe radiological malunion seems to be well tolerated, at least in older adults. Patient selection is crucial. We should ask for the patient’s values and preferences instead of focusing exclusively on radiographic appearance and surgical techniques as the basis for shared decision-making. Outcome measures should be patient reported as radiographic measures poorly reflect patient preferences.

Orthopedic surgeons conducting evidence-based practice should be aware that “bringing the bones together” may be intuitively right but in some cases is not supported by the best available evidence. Before the era of radiology and evidence-based medicine the pioneer of surgery and surgical pathology, R.W. Smith (1807–1873), clearly stated the point:

“The impacted fracture of the neck of the humerus always unites with a certain amount of deformity … it would be imprudent to restore to the joint its natural form, even were it in our power to accomplish it, for we would thus materially diminish the chance of the occurrence of osseous consolidation … but the prudent surgeon will never omit to announce to the patient that a certain degree of impairment of the motions of the joint will be a permanent result of the injury” (Smith 1847).

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1916686


Aspenberg P. Why do we operate proximal humeral fractures? Acta Orthop 2015; 86(3): 279. doi: 10.3109/17453674.2015.1042321
Chen M J, Campbell S T, Finlay A K, Duckworth A D, Bishop J A, Gardner M J. Surgical and nonoperative management of olecranon fractures in the elderly: a systematic review and meta-analysis. J Orthop Trauma 2021; 35(1): 10-16. doi: 10.1097/BOT.0000000000001865
Handoll H H, Brorson S. Interventions for treating proximal humeral fractures in adults. Cochrane Database Syst Rev 2015; (11): CD000434. doi: 10.1002/14651858.CD000434.pub4
Huttunen T T, Launonen A P, Pihlajamäki H, Kannus P, Mattila V M. Trends in the surgical treatment of proximal humeral fractures: a nationwide 23-year study in Finland. BMC Musculoskelet Disord 2012; 13(1): 261. doi: 10.1186/1471-2474-13-261
Jo Y-H, Lee K-H, Lee B-G. Surgical trends in elderly patients with proximal humeral fractures in South Korea: a population-based study. BMC Musculoskelet Disord 2019; 20(1): 136. doi: 10.1186/s12891-019-2515-2
Klug A, Gramlich Y, Wincheringer D, Schmidt-Horlohé K, Hoffmann R. Trends in surgical management of proximal humeral fractures in adults: a nationwide study of records in Germany from 2007 to 2016. Arch Orthop Trauma Surg 2019; 139(12): 1713-21. doi: 10.1007/s00402-019-03252-1
Launonen A P, Sumrein B O, Reito A, Lepola V, Paloneva J, Jonsson K B, et al. Operative versus non-operative treatment for 2-part proximal humerus fracture: a multicenter randomized controlled trial. PLoS Med 2019; 16(7): e1002855. doi: 10.1371/journal.pmed.1002855
Lenza M, Buchbinder R, Johnston R V, Ferrari B A S, Faloppa F. Surgical versus conservative interventions for treating fractures of the middle third of the clavicle. Cochrane Database Syst Rev 2019; 1: CD009363. doi: 10.1002/14651858.CD009363.pub3
Mulders M A M, Detering R, Rikli D A, Rosenwasser M P, Goslings J C, Schep N W L. Association between radiological and patient-reported outcome in adults with a displaced distal radius fracture: a systematic review and meta-analysis. J Hand Surg Am 2018; 43(8): 710-719.e5. doi: 10.1016/j.jhsa.2018.05.003
Rämö L, Sumrein B O, Lepola V, Lähdeoja T, Ranstam J, Paavola M, et al. Effect of surgery vs functional bracing on functional outcome among patients with closed displaced humeral shaft fractures: the FISH Randomized Clinical Trial. JAMA 2020; 323(18): 1792-801. doi: 10.1001/jama.2020.3182
Rangan A, Handoll H, Brealey S, Jefferson L, Keding A, Martin B C, et al. Surgical vs nonsurgical treatment of adults with displaced fractures of the proximal humerus: the PROFHER randomized clinical trial. JAMA 2015; 313(10): 1037-47. doi: 10.1001/jama.2015.1629
Sabesan V J, Lombardo D, Petersen-Fitts G, Weisman M, Ramthun K, Whaley J. National trends in proximal humerus fracture treatment patterns. Aging Clin Exp Res 2017; 29(6): 1277-83. doi: 10.1007/s40520-016-0695-2
Sackett D L. Evidence-based medicine and treatment choices. Lancet 1997; 349(9051): 570. doi: 10.1016/S0140-6736(97)80122-5
Slobogean G P, Johal H, Lefaivre K A, MacIntyre N J, Sprague S, Scott T, et al. A scoping review of the proximal humerus fracture literature: orthopedics and biomechanics. BMC Musculoskelet Disord 2015; 16(1). doi: 10.1186/s12891-015-0564-8
Smith R W. A treatise on fractures in the vicinity of joints and on certain forms of accidental and congenital dislocations. Dublin: Hodges & Smith; 1847: p 191.
Sumrein B O, Huttunen T T, Launonen A P, Berg H E, Felländer-Tsai L, Mattila V M. Proximal humeral fractures in Sweden-a registry-based study. Osteoporos Int 2017; 28(3): 901-7. doi: 10.1007/s00198-016-3808-z


Acta Orthopaedica 2021; 92 (5): 507–512


Surgeons’ behaviors and beliefs regarding placebo effects in surgery

Annelie ROSÉN 1, Lisbeth SACHS 1, Amanda EKDAHL 1, Andreas WESTBERG 4, Paul GERDHEM 2,3, Ted J KAPTCHUK 5, and Karin JENSEN 1,5

1 Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
2 Department of Reconstructive Orthopedics, Karolinska University Hospital, Stockholm, Sweden
3 Department of Clinical Science Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
4 Department of Orthopedic Surgery, Capio Sankt Göran Hospital, Stockholm, Sweden
5 Program in Placebo Studies, Beth Israel and Deaconess Hospital, Harvard Medical School, Boston, MA, USA
Correspondence: Karin.Jensen@ki.se
Submitted 2020-08-26. Accepted 2021-04-01.

Background and purpose: Emerging evidence from sham-controlled trials suggests that surgical treatment entails substantial non-specific treatment effects in addition to specific surgical effects. Yet, information on surgeons’ actual behaviors and beliefs regarding non-specific treatment and placebo effects is scarce. We determined surgeons’ clinical behaviors and attitudes regarding placebo effects.

Methods: A national online survey was developed in collaboration with surgeons and administered via an electronic link.

Results: All surgical clinics in Sweden were approached and 22% of surgeons participated (n = 105). Surgeons believed it was important for them to interact and build rapport with patients before surgery rather than perform surgery on colleagues’ patients (90%). They endorsed the importance of non-specific treatment effects in surgery generally (90%) and reported that they actively harness non-specific treatment effects (97%), including conveying confidence and calm (87%), building a positive interaction (75%), and making eye contact (72%). In communication regarding the likely outcomes of surgery, surgeons emphasized accurate scientific information on benefits/risks (90%) and complete honesty (63%). A majority felt that the improvement after some currently performed surgical procedures might be entirely explained by placebo effects (78%). Surgeons saw benefits with sham-controlled surgery trials; nevertheless, they were reluctant to refer patients to sham-controlled trials (46%).

Interpretation: Surgeons believe that their words and behaviors are important components of their professional competence. Surgeons saw the patient–physician relationship, transparency, and honesty as critical. Understanding the non-specific components of surgery has the potential to improve the way surgical treatment is delivered and lead to better patient outcomes.

Accumulating evidence suggests that the elaborate context surrounding surgical treatment may contribute to considerable placebo effects (Kallmes et al. 2009, Beard et al. 2018). Hence, factors other than the surgical intervention itself could contribute to positive health outcomes, and a recent meta-analysis (including 53 trials and 4,000 patients) reported that in 51% of all placebo-controlled surgical trials there was no statistically better result in the surgical arm compared with the placebo arm (Wartolowska et al. 2014b).

Placebo surgery, or sham surgery, is an invasive procedure that has the appearance of a therapeutic intervention, but during which the essential therapeutic maneuver is omitted (Wartolowska et al. 2014). Sham controls can be compared with active treatment in order to elucidate the specific effect of a surgical intervention. Any factors outside the active intervention that affect treatment outcomes can be referred to as non-specific treatment factors. These include, for example, explanations of a treatment (Kam-Hansen et al. 2014), prior experience (Kessner et al. 2013), and the doctor–patient relationship (Kaptchuk et al. 2008). There is significant evidence that non-specific treatment factors impact treatment outcomes, as demonstrated in many different health problems, using a range of different treatment modalities, including placebo pills, creams, and injections (Finniss et al. 2010, Wager and Atlas 2015).

As yet, relatively little is known about non-specific treatment effects and placebos in surgery, and in particular about surgeons’ own behaviors and beliefs. Understanding the non-specific components of surgery has the potential to improve the way surgical treatment is delivered, and lead to better outcomes.

The aim of the present study was to describe surgeons’ real-world behaviors and attitudes towards placebo effects. There are no previous studies on placebo attitudes among healthcare professionals in Sweden and, in contrast to previous studies (Wartolowska et al. 2014a, Baldwin et al. 2016), the present study reports specific behaviors that surgeons engage in when they harness placebo effects.

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1941627

Methods

In a national online survey, Swedish surgeons were asked questions about their clinical behaviors and attitudes regarding non-specific treatment effects and placebo effects.

Survey preparation

A focus-group meeting was organized to prepare for the survey, including 9 surgeons at the Karolinska University Hospital (8 male, 1 female). A first version of the survey was sent to medical students (n = 11) in order to ensure face validity. To compare our results with other surveys on placebo effects in surgery we adapted some questions to the present study (Tilburt et al. 2008, Raz et al. 2011, Wartolowska et al. 2014a, Baldwin et al. 2016).

Survey participants

Participants were surgeons, of any surgical specialty, affiliated to a surgical clinic or surgical department in Sweden. Inclusion criteria required that participants were (1) licensed surgeons operating in Sweden, and (2) able to read and understand Swedish. All heads of surgery clinics in Sweden were contacted via an email list provided by the Swedish Surgical Society (www.svenskkirurgi.se), a body that organizes surgeons working in Sweden. Heads of surgery were asked if they would be willing to provide the individual email addresses of their surgeons. If yes, each surgeon received an email containing information on the study and a link to the survey.

Survey procedure

The survey was created in the software Survey&Report version 4.2.33.5 (https://sunet.se). The final survey consisted of 32 questions, including demographics, and took approximately 10 minutes to answer. To minimize conceptual ambiguity, definitions of specific and non-specific treatment effects were included on the first page. A specific treatment effect was defined as “an effect related to the specific medical treatment, i.e., the surgical procedure or a pharmacological substance to relieve symptoms.” A non-specific treatment effect was defined as “an effect related to the context surrounding the delivery of treatment.” In order to facilitate comparisons with previous surveys, we used a wide definition of the placebo effect, i.e., any improvement in response to a placebo, including natural history and regression to the mean. Participants were instructed to answer the questions based on their everyday work in the clinic, so as to reflect real-world medical behavior.

The survey started with general questions regarding the impact of non-specific treatment effects before introducing more specific (and potentially more controversial) questions regarding sham surgery. A summary of the survey content can be found in Table 1 and a full translation is provided as Supplementary data. The data collection was open between April and September 2018. 2 reminder emails were sent to anyone who did not respond to the initial invitation.

Table 1. Survey content: short description of the questions in the national survey to surgeons in Sweden, divided by topic ᵃ

The impact of the doctor–patient relationship (patient scenario)
• Do you believe that the described scenario may impact the treatment outcome? (Y/N) ᵇ

Non-specific treatment effects in surgery
• Do you believe that non-specific treatment effects play a role in surgical treatment? (Y/N)
• Which of the following factors do you believe affect treatment outcomes (multiple choice)? ᵇ
• Do you deliberately harness non-specific treatment effects in treatment of patients? (Y/N)
• If yes, which non-specific treatment factors have you used? (multiple choice) ᵇ

Framing and communication of treatment outcomes
• How would you describe the information you give to your patients regarding expected outcomes of the surgical treatment? (multiple choice) ᵇ

Performing surgery that includes placebo components
• How often do you perform surgery that you believe includes a placebo component? (multiple choice)
• Are there, in your view, surgical treatments that have no specific component, where the treatment outcome is entirely due to the placebo effect? (Y/N) ᵇ

Placebo-controlled surgical trials
• Sham surgery can only be used if there is no other effective treatment to compare a new treatment with (agree/disagree)
• Sham surgery can only be used if there is no risk for adverse events in the placebo group (agree/disagree)
• Sham surgery can only be used in trials of non-life-threatening conditions (agree/disagree)
• Sham surgery is permissible when there is uncertainty about a specific treatment effect (agree/disagree)
• What can, in your view, be problematic when using sham-controlled study designs? (multiple choice) ᵇ
• Would you personally be able to recruit patients to a sham-controlled surgical trial? (Y/N)

Sham surgery in clinical routine
• Are there, in your view, clinical situations when sham surgery might be warranted as it has been proven effective in sham-controlled trials? (Y/N)

Placebo definition, clinical value, and possible mechanisms
• Do you agree with the following definition of the placebo effect? (Y/N) ᵇ
• Do you agree with the following definition of placebo effects in surgery? (Y/N) ᵇ
• What mechanisms do you believe explain the placebo effect? (multiple choice) ᵇ
• Do you believe that the placebo effect is true, i.e., has a scientific explanation? (Y/N)
• Do you believe that the placebo effect may have a therapeutic benefit? (Y/N)

ᵃ For a full version of the survey, see Supplementary data.
ᵇ Space provided for additional comments.
(Y/N) = Yes or No question; forced choice



Table 2. Demographics ᵃ

Mean age (SD) [range]                                     47 (11) [31–69]
Years since MD license was obtained, mean (SD) [range]    18 (10) [2–40]
Mean number of patients seen/week (SD) [range]            24 (14) [0–60]
Sex (men / women) (%)                                     68 / 32
Type of surgical unit (%)
  City hospital / Small community / Rural setting         62 / 34 / 4

ᵃ Basic information regarding the surgeons who responded to the survey (n = 105), obtained via self-report. The “number of patients seen per week” is an approximation of each surgeon’s degree of patient contact.

Ethics, funding, and potential conflicts of interest

Ethical approval for the focus group and survey was obtained from the regional ethical review board in Stockholm, Sweden (Dnr 2018/514-31/1). All surgeons, both in the focus group and survey, gave written informed consent. The authors have no conflicts of interest. The present work was supported by the Pro Futura Grant from the Bank of Sweden Tercentenary Foundation.

Results

Demographics

The survey was sent to 478 surgeons and 105 (22%) responded, which is similar to the response rate in a previous survey study among surgeons (Baldwin et al. 2016). None of the 478 emails bounced back due to invalid email addresses, and together with the two reminder emails we hope that the invitations were properly received. According to the National Board of Health and Welfare (Socialstyrelsen 2019) the surgeons who responded to the survey are representative of all Swedish surgeons (listed as specialists in surgery, pediatric, hand, or plastic surgery) in terms of age (median 45–49 years, range 30–70 years) and gender (31% women), except for neurosurgery and thoracic surgery where there are fewer women (21% and 25%, respectively). We have no descriptive information for those who did not respond to our survey. For demographics, see Table 2.

Characteristics

The surgical specialties among the study participants were general surgery (75%), orthopedic surgery (19%), and other (12%). A wide range of surgery types was represented, e.g., thoracic surgery, breast surgery, endocrine surgery, cancer surgery, and gastrointestinal surgery.

Placebo definition

9 out of 10 surgeons reported that they believe the placebo effect is genuine (91%) and has a therapeutic benefit (87%). When asked what mechanisms underlie placebo effects, the top 5 answers were: psychological (99%), physiological (45%), natural history (41%), unexplained factors (32%), and regression to the mean (24%). The survey reflects a wider definition of placebo where all improvements seen among patients treated with placebo are referred to as placebo effects, hence the inclusion of options such as natural history and regression to the mean (see Part 3 in the survey).

Impact of the doctor–patient relationship

9 out of 10 surgeons (90%) believed that operating on another surgeon’s patient may have an impact on the treatment outcome, i.e., when pre-surgical assessments are made by one surgeon but surgery is performed by an equally skilled colleague. The most commonly cited reason for the influence on treatment outcomes was the reduced effect of the doctor–patient relationship (65%), as the interaction between a patient and clinician may build trust prior to the planned surgery (see Part 1 of the survey). In the space for additional comments, one surgeon wrote: “My experience is that any change of doctor involves a risk of problems arising. The established alliance between the patient and doctor is always affected by the change of doctor, as described, and unfortunately often negatively.”

Non-specific treatment effects in surgery

Reported behavior

The survey asked whether surgeons use techniques that may enhance non-specific treatment effects (see Part 2 of the survey). Almost 9 out of 10 surgeons deliberately use techniques aimed at harnessing non-specific treatment effects (87%). The most common strategy was to “communicate calm and confidence” (76%), followed by “to offer a positive social interaction” (75%), “make eye contact” (72%), “listen with interest” (67%), “look well-groomed” (45%), “confident handshake” (44%), “communicate positive expectations” (41%), and “treatment room clean” (31%). Several choices were allowed (Question 5 in the survey).
In the space for additional comments one surgeon wrote: “I use my voice, which is slow with a strong local (northern) accent, I also move slowly in order to create the illusion that we have plenty of time.”

Reported attitudes

The survey asked questions about attitudes towards the importance of non-specific treatment effects. A vast majority of surgeons believe that non-specific treatment effects play a role in surgical treatments (97%). Only 3% stated that non-specific treatment factors have no effect in surgical treatments. The top 5 factors that surgeons believe may affect the treatment outcome, in addition to the specific effects of surgical interventions, were (several options possible): patient believes in treatment (91%), the doctor–patient relationship (85%), doctor conveys calm and assertiveness (85%), doctor believes in treatment (82%), and the interaction and care from healthcare providers other than the surgeon (79%) (see Part 2 of the survey). In the space for additional comments several surgeons expressed a need to balance high expectations. One stated that: “Above all, I convey realistic and honest expectations. I never oversell.”

Framing and communication of treatment outcomes

Reported behavior

When asked how surgeons would characterize the information they give to patients regarding the possible outcomes of surgery, 9 out of 10 prefer to give what they believe is an accurate description of risks and benefits of the treatment (91%). Information described as “completely honest” was used by 63%, “calming/reducing anxiety” by 63%, “hopeful” by 33%, and “involves positive expectations” by 29%. One quote from the space for additional comments read: “I give the patient realistic expectations, both when it comes to time frames and the result of the operation. If one is too positive it can have the opposite effect”.

Performing surgery with a placebo component

Reported behavior

When asked how often surgeons themselves perform surgery that they believe has a placebo component, there were 5 options ranging from regularly (more than once per week) to never. About half of the respondents indicated that it is part of their normal practice: 15% say they do it “regularly” (more than once per week), 17% “often” (more than once per month), and 21% “sometimes” (more than once per year). The other half of respondents say they do it “rarely” (less than once per year) (26%) or “never” (23%) (see Part 2 of the survey).

Reported attitudes

When asked if surgeons believe that there are surgical treatments where the entire treatment effect is due to placebo, 78% of surgeons said yes. In the space for additional comments surgeons gave examples, e.g., varicose vein surgery, orthopedic surgery (not specified), gallbladder surgery for pain, or hernia surgery. One responded that “it may happen in other countries” (see Part 4 of the survey).
Placebo-controlled surgical trials

Reported behavior

When asked if they would personally be willing to recruit patients to sham-controlled surgical trials, less than half of surgeons said yes (47%) (see Part 4 of the survey).

Reported attitudes

When asked about their beliefs towards placebo-controlled clinical trials, 71% responded that placebo-controlled surgical trials can be used when there is uncertainty about the mechanism of an established surgical procedure. 74% responded that placebo should only be used in conditions that are not life-threatening. 71% answered that placebo-controlled surgery should only be used if there are no risks involved, such as general anesthesia. 51% answered that it should only be used if there is no other effective treatment with which to compare the intervention. 33% answered that placebo surgery should only be used in designs where all patients can cross over and get real surgery (see Part 4 of the survey).

When asked for their attitudes towards complications around sham surgery, ethical considerations were most commonly mentioned (88%), followed by the potential side effects, e.g., from general anesthesia (81%). Half of the participants responded that patients’ trust in doctors may be affected negatively (49%). The use of concealment in placebo-controlled trials was mentioned as an obstacle by 40% of surgeons, and the lack of effectiveness of placebo surgery by 35%.

When asked about usage of sham surgery in clinical routine, 36% of surgeons find the use of sham surgery permissible as it has been proven effective in sham-controlled trials (see Part 4 of the survey).
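The headline percentages in the Results follow directly from the counts and category shares quoted in the text. As a minimal sketch in Python, assuming only the figures reported above (the helper name `percent` is hypothetical and not part of the study's analysis), the response rate and the two halves of the placebo-component question can be re-derived as follows:

```python
# Illustrative re-derivation of figures quoted in the Results.
# The helper `percent` is a hypothetical name introduced for this sketch.

def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)

# Surgeons invited and surgeons who responded, as reported in the text.
invited = 478
responded = 105
response_rate = percent(responded, invited)

# Reported shares (%) for how often surgeons perform surgery that they
# believe has a placebo component.
placebo_frequency = {
    "regularly": 15,
    "often": 17,
    "sometimes": 21,
    "rarely": 26,
    "never": 23,
}

# Group the five options into the two halves described in the text.
at_least_yearly = sum(
    placebo_frequency[key] for key in ("regularly", "often", "sometimes")
)
rarely_or_never = placebo_frequency["rarely"] + placebo_frequency["never"]

print(f"Response rate: {response_rate}%")
print(f"At least once per year: {at_least_yearly}%")
print(f"Rarely or never: {rarely_or_never}%")
```

Because the published shares are rounded to whole percentages, the grouped totals need not add up to exactly 100.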

Discussion

We examined whether, and to what extent, surgeons acknowledge and implement non-specific treatment effects in their clinical routine. Additionally, this study examined surgeons’ attitudes towards sham-controlled surgical trials. In contrast to 2 previous survey studies among surgeons (Wartolowska et al. 2014a, Baldwin et al. 2016), our survey assessed specific behaviors that surgeons engage in when harnessing placebo components of surgery, aiming towards a more concrete understanding of surgeons’ practices. Moreover, we asked specific questions about the way surgeons shape their patient information regarding likely outcomes of the surgery. This was a way to address the role of expectations and how they may shape surgery outcomes. To our knowledge, this has not been reported anywhere before. Finally, the present study included concrete questions about surgeons’ willingness to refer patients to sham-controlled surgical trials and found an interesting conflict between what surgeons say and what surgeons are willing to do.

Surgeons in this survey reflect self-awareness and endorse the importance of non-specific treatment effects in surgery, with emphasis on the patient–clinician relationship. Half of the surgeons believe they perform surgical procedures with a placebo component, and deliberately use techniques aimed at harnessing non-specific treatment effects. Thus, surgeons explicitly acknowledge the placebo effect in their own clinical practice.

One question is whether surgeons’ attitudes and behaviors can be compared to medical doctors in other medical disciplines. In a study from the UK, 77% of primary care doctors reported using “impure placebos” regularly (at least once per week) (Howick et al. 2013), which means they are using active drugs as placebos (such as treating a viral infection with antibiotics). Similar results were found in the USA, where around half of internists and rheumatologists reported usage of impure placebos (Tilburt et al. 2008).
Our data indicates that surgeons are self-aware and deliberately seek to maximize non-specific treatment effects. In particular, they strive to build a good rapport with their patients, for example by making eye contact, conveying calm, and using attentive listening. In spite of the strong belief that non-specific components play a central role in surgical treatment, surgeons are generally not fostering positive treatment expectations. Instead, they prefer to use accurate information about potential benefits of surgery when informing their patients. Our data reflects a notion among surgeons where trust is the most important non-specific treatment component, as they prefer to understate the potential of a treatment rather than risk the patient’s trust in the surgeons’ clinical judgment. This is contrary to a commonly repeated opinion suggesting that surgeons overestimate the effects of surgery, and that their judgment of when to operate (or not) is biased (Perezgonzalez 2018), sometimes referred to as the law of the instrument (Kaplan 1964).

Furthermore, our data indicates that surgeons prefer to perform surgery on their own patients, rather than colleagues’ patients, so as to maintain the patient–clinician relationship and thereby obtain better surgical outcomes. Surgeons mentioned this as an important part of their professional competence, even if they were not aware of any scientific evidence to support its clinical advantage. Our study suggests that surgeons’ behaviors (or at least their reported behaviors, as we did not measure actual decisions in the surgical clinics) are directed by strong beliefs regarding non-specific treatment factors. Empirical studies should, if ethically feasible, be performed in order to verify some of these longstanding assumptions.

There have been few previous studies investigating doctors’ attitudes towards placebos in general (Tilburt et al. 2008, Fassler et al. 2010, Howick et al. 2013), and very little about surgery in particular (Campbell et al. 2011, Wartolowska et al. 2014a). In this first study to focus on surgeons’ actual behaviors in the clinic, we found that not all reported behaviors were congruent with surgeons’ attitudes. In line with a previous report of surgeons’ attitudes towards sham-controlled clinical trials in the UK (Campbell et al. 2011), our survey indicates that surgeons seek support for surgical procedures from sham-controlled trials. A large majority saw the potential value of comparing real and sham surgery. Nevertheless, less than half of surgeons (47%) were willing to recruit their own patients to sham-controlled trials (the comparable number in the UK study was 43% [Campbell et al. 2011]). This indicates a conflict between what surgeons think and what they would do. It should be emphasized that holding positive attitudes towards sham-controlled surgical studies is not the same thing as referring one’s patients to such trials. It is still unclear whether surgeons who were familiar with results from sham-controlled trials were more willing to recruit patients than surgeons with less knowledge about placebo-controlled surgery. The understanding of surgeons’ attitudes and behaviors may be of importance for predicting advances in surgical medicine.


Contrary to assessments of pharmacological treatments, the use of placebo controls is not considered the gold standard in surgery. Based on the results here, and 2 other studies (Campbell et al. 2011, Wartolowska et al. 2014a), it is unlikely that sham-controlled treatment trials—for legitimate ethical, scientific, and feasibility reasons—will become customary in the near future, in spite of surgeons’ understanding of the scientific benefits. In both the Swedish and UK samples, surgeons have concerns about potential side effects from sham surgery and are also apprehensive about the risk to patient–surgeon trust. Yet, there is indication that patients in the placebo arm of sham-controlled studies have less serious adverse events compared with patients in the active treatment arm (Wartolowska et al. 2014b). It is possible that surgeons’ willingness to contribute to sham-controlled trials will change in the future when there is better characterization of the potential risks. In general, a broader discussion of factors that may contribute to patient improvement in surgical trials, including non-specific treatment factors, spontaneous remission, and regression to the mean, will improve the understanding of the specific mechanisms of surgery. Some surgeons (36%) find it ethically permissible to use sham surgery outside the scope of a clinical trial, and we included this question as the topic was raised in a previous study (Wartolowska et al. 2014a). Using sham surgery as a clinical tool may seem puzzling, but can be compared to so-called “open-label” placebos, an increasing treatment approach where patients are aware they are receiving inactive pills. The popularity (and increased acceptance) of open-label placebos may lead to a shift in attitudes towards inactive treatment in general and lead to clinical applications of sham surgery in the future. While there are previous reports on the use of placebo treatments in clinical practice (Tilburt et al. 2008, Howick et al. 
2013) there is a paucity of data on the comportment physicians adopt in order to engage non-specific treatment effects. The behaviors reported by surgeons here may thus help understand patient–clinician relationships and medical practice in general. As reported here, some behaviors are paradoxical by nature, for example the reluctance among surgeons to induce positive expectations about treatment outcomes. Surgeons explicitly avoid giving positive suggestions regarding treatment outcomes, as maintained trust between patient and clinician is more important for patient outcomes than the potential effect of inducing positive expectations. This indicates a delicate interaction between treatment expectancy and patient–clinician trust that needs to be studied in more detail. Our findings build on self-reports, and even if surgeons were asked to report real-world behaviors the answers might not be validated objectively. In contrast to questions regarding placebo effects, questions concerning non-specific treatment effects did not explicitly mention natural history or regression-to-the-mean. Thus, it is unclear whether surgeons include these in the concept of non-specific treatment



effects. Also, the response rate was only 22%, which may seem low. Yet, the respondents in this study were drawn from the general surgeon population in Sweden, which is a major strength. In contrast, all respondents in the previous survey on placebo attitudes among surgeons (100 respondents) (Wartolowska et al. 2014a) attended a meeting where surgeons were aware of, or involved in, a national sham-controlled trial of shoulder surgery, which restricted the sample. The present sample is representative of surgeons in Sweden belonging to the Swedish Surgical Society and may be generalized within the reasonable limits inherent to survey methodologies in general.

Supplementary data
The full version of the survey is available as supplementary data in the online version of this article, http://dx.doi.org/10.1080/17453674.2021.1941627

The authors wish to thank Dr Kristofer Bjerså and Dr Jeremy Howick for invaluable help with the preparation of the survey. Conceptualization: KBJ, AR, LS. Data curation and formal analysis: AR, AE. Funding acquisition: KBJ. Methodology: AR, LS, TJK. Project administration: KBJ, AR, AE. Validation: KBJ, AR, AE, PG, AE. Writing original draft: AR, AE, LS, TJK, KBJ. Review of writing and editing: AR, AE, LS, PG, AW, TJK, KBJ. Acta thanks Ian Harris and Karolina Wartolowska for help with peer review of this study.

Baldwin M J, Wartolowska K, Carr A J. A survey on beliefs and attitudes of trainee surgeons towards placebo. BMC Surg 2016; 16(1): 27. doi: 10.1186/s12893-016-0142-5.
Beard D J, Rees J L, Cook J A, Rombach I, Cooper C, Merritt N, Shirkey B A, Donovan J L, Gwilym S, Savulescu J, Moser J, Gray A, Jepson M, Tracey I, Judge A, Wartolowska K, Carr A J, Group C S. Arthroscopic subacromial decompression for subacromial shoulder pain (CSAW): a multicentre, pragmatic, parallel group, placebo-controlled, three-group, randomised surgical trial. Lancet 2018; 391(10118): 329-38. doi: 10.1016/S0140-6736(17)32457-1.
Campbell M K, Entwistle V A, Cuthbertson B H, Skea Z C, Sutherland A G, McDonald A M, Norrie J D, Carlson R V, Bridgman S, group Ks. Developing a placebo-controlled trial in surgery: issues of design, acceptability and feasibility. Trials 2011; 12: 50. doi: 10.1186/1745-6215-12-50.

Acta Orthopaedica 2021; 92 (5): 507–512

Fassler M, Meissner K, Schneider A, Linde K. Frequency and circumstances of placebo use in clinical practice: a systematic review of empirical studies. BMC Med 2010; 8: 15. doi: 10.1186/1741-7015-8-15.
Finniss D, Kaptchuk T, Miller F, Benedetti F. Placebo effects: biological, clinical and ethical advances. Lancet 2010; 375: 686-95.
Howick J, Bishop F L, Heneghan C, Wolstenholme J, Stevens S, Hobbs F D, Lewith G. Placebo use in the United Kingdom: results from a national survey of primary care practitioners. PLoS One 2013; 8(3): e58247. doi: 10.1371/journal.pone.0058247.
Kallmes D F, Comstock B A, Heagerty P J, Turner J A, Wilson D J, Diamond T H, Edwards R, Gray L A, Stout L, Owen S, Hollingworth W, Ghdoke B, Annesley-Williams D J, Ralston S H, Jarvik J G. A randomized trial of vertebroplasty for osteoporotic spinal fractures. N Engl J Med 2009; 361(6): 569-79. doi: 10.1056/NEJMoa0900563.
Kam-Hansen S, Jakubowski M, Kelley J M, Kirsch I, Hoaglin D C, Kaptchuk T J, Burstein R. Altered placebo and drug labeling changes the outcome of episodic migraine attacks. Sci Transl Med 2014; 6(218): 218ra5. doi: 10.1126/scitranslmed.3006175.
Kaplan A. The conduct of inquiry: methodology for behavioral science. San Francisco: Chandler Publishing; 1964.
Kaptchuk T J, Kelley J M, Conboy L A, Davis R B, Kerr C E, Jacobson E E, Kirsch I, Schyner R N, Nam B H, Nguyen L T, Park M, Rivers A L, McManus C, Kokkotou E, Drossman D A, Goldman P, Lembo A J. Components of placebo effect: randomised controlled trial in patients with irritable bowel syndrome. BMJ 2008; 336(7651): 999-1003. doi: 10.1136/bmj.39524.439618.25.
Kessner S, Wiech K, Forkmann K, Ploner M, Bingel U. The effect of treatment history on therapeutic outcome: an experimental approach. JAMA Intern Med 2013; 173(15): 1468-9. doi: 10.1001/jamainternmed.2013.6705.
Perezgonzalez J. Book review: Surgery, the ultimate placebo. Frontiers in Surgery 2018; 5(38): 1-7. doi: 10.3389/fsurg.2018.00038.
Raz A, Campbell N, Guindi D, Holcroft C, Dery C, Cukier O. Placebos in clinical practice: comparing attitudes, beliefs, and patterns of use between academic psychiatrists and nonpsychiatrists. Can J Psychiatry/Revue canadienne de psychiatrie 2011; 56(4): 198-208. doi: 10.1177/070674371105600403.
Socialstyrelsen. http://www.socialstyrelsen.se/statistik/statistikdatabas; 2019.
Tilburt J C, Emanuel E J, Kaptchuk T J, Curlin F A, Miller F G. Prescribing “placebo treatments”: results of national survey of US internists and rheumatologists. BMJ 2008; 337(oct23_2): a1938. doi: 10.1136/bmj.a1938.
Wager T D, Atlas L Y. The neuroscience of placebo effects: connecting context, learning and health. Nat Rev Neurosci 2015; 16(7): 403-18. doi: 10.1038/nrn3976.
Wartolowska K, Beard D J, Carr A J. Attitudes and beliefs about placebo surgery among orthopedic shoulder surgeons in the United Kingdom. PLoS One 2014a; 9(3): e91699. doi: 10.1371/journal.pone.0091699.
Wartolowska K, Judge A, Hopewell S, Collins G S, Dean B J, Rombach I, Brindley D, Savulescu J, Beard D J, Carr A J. Use of placebo controls in the evaluation of surgery: systematic review. BMJ 2014b; 348: g3253. doi: 10.1136/bmj.g3253.


Acta Orthopaedica 2021; 92 (5): 513–525


Presenting artificial intelligence, deep learning, and machine learning studies to clinicians and healthcare stakeholders: an introductory reference with a guideline and a Clinical AI Research (CAIR) checklist proposal

Jakub OLCZAK 1, John PAVLOPOULOS 2, Jasper PRIJS 3,4, Frank F A IJPMA 4,5, Job N DOORNBERG 3–5, Claes LUNDSTRÖM 6, Joel HEDLUND 6, and Max GORDON 1

1 Institute of Clinical Sciences, Danderyd University Hospital, Karolinska Institute, Sweden; 2 Department of Computer and System Sciences, Stockholm University, Sweden; 3 Flinders University, Adelaide, Australia; 4 Department of Trauma Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands; 5 The Machine Learning Consortium; 6 Center for Medical Image Science and Visualization, Linköping University, Sweden
Correspondence: Jakub.Olczak@ki.se
Submitted 2020-12-12. Accepted 2021-03-26.

Background and purpose — Artificial intelligence (AI), deep learning (DL), and machine learning (ML) have become common research fields in orthopedics and medicine in general. Engineers perform much of the work. While they gear the results towards healthcare professionals, the difference in competencies and goals creates challenges for collaboration and knowledge exchange. We aim to provide clinicians with a context and understanding of AI research by facilitating communication between creators, researchers, clinicians, and readers of medical AI and ML research. Methods and results — We present the common tasks, considerations, and pitfalls (both methodological and ethical) that clinicians will encounter in AI research. We discuss the following topics: labeling, missing data, training, testing, and overfitting. Common performance and outcome measures for various AI and ML tasks are presented, including accuracy, precision, recall, F1 score, Dice score, the area under the curve, and ROC curves. We also discuss ethical considerations in terms of privacy, fairness, autonomy, safety, responsibility, and liability regarding data collecting or sharing. Interpretation — We have developed guidelines for reporting medical AI research to clinicians in the run-up to a broader consensus process. The proposed guidelines consist of a Clinical Artificial Intelligence Research (CAIR) checklist and specific performance metrics guidelines to present and evaluate research using AI components. Researchers, engineers, clinicians, and other stakeholders can use these proposal guidelines and the CAIR checklist to read, present, and evaluate AI research geared towards a healthcare setting.

Key concepts presented in this review
● Introduction to artificial intelligence (AI) and machine learning (ML) and how these relate to traditional clinical research statistics
● Common pitfalls in AI research
● How to measure AI and ML performance and how to interpret these measures
● Ethical considerations related to AI and ML in medicine
● Introduction of a Clinical Artificial Intelligence Research (CAIR) checklist, which helps to facilitate understanding, reporting, and interpreting of AI research in medicine

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1918389

Machine learning (ML), deep learning (DL), and artificial intelligence (AI) have become increasingly common in orthopedics and other medical fields. Artificial intelligence, defined in 1955, is “the science and engineering of making intelligent machines,” where intelligence is “the ability to learn and perform suitable techniques to solve problems and achieve goals, appropriate to the context in an uncertain, ever-varying world” (Manning 2020). Machine learning implies models and algorithms that learn from data rather than following explicit rules. Deep learning (DL) is a form of ML that uses large and multilayered artificial neural networks. Neural networks are computational algorithms influenced by biological networks for information processing. They consist of several layers of “neurons” that communicate. By training the neurons how to communicate, interactions develop that solve a particular problem. DL is currently the most successful and general ML approach (Michie et al. 1994, Manning 2020).

Recent technological breakthroughs in computational hardware (like specialized graphics processors [GPUs] and cloud computing), software, and new algorithms have paved the way for a revolution in applications and utility. Together these have resulted in new and exciting developments. Examples range from new drug discoveries (Fleming 2018, Paul et al. 2020) to skin cancer detection (Esteva et al. 2017), automated screening of diabetic retinopathy (Gulshan et al. 2016, 2019), fracture detection in radiographs (Badgeley et al. 2019, Qi et al. 2020, Olczak et al. 2021), and the detection of rotator cuff tears in MRI (Shim et al. 2020) or vertebral fractures in CT scans (Nicolaes et al. 2019).

As many methods require a deeper understanding of computer science, we see engineers perform much of the research geared towards healthcare professionals. This creates a tension between absolute correctness from a technical perspective and a presentation that all stakeholders, including regular clinicians, can understand and benefit from. This paper aims to give clinicians a context and greater understanding of these AI methods and their results.

Machine learning, deep learning, and artificial intelligence
At its core, AI involves automating complex algorithms, which often depend heavily on statistics. Computation allows for calculations and modeling on a scale that humans could theoretically perform but that is too large and complicated to be feasible by hand. AI has its own language and different names for concepts familiar to medical professionals. Using different names obfuscates the fact that these are familiar concepts. In ML, a model is akin to a test. For example, an ML model could test whether a radiograph fulfills the conditions for containing a fracture. Depending on how well the image meets these conditions, it will calculate a probability for a fracture in the image. In contrast, regular statistics investigate individual features’ contribution to a particular outcome, e.g., how much smoking or alcohol contributes to the risk of having a fracture.
The core difference is that AI models generally have a much richer set of features, often in the thousands. Individual features merge into patterns and regularly lose their interpretability. AI models can mostly be considered “black boxes” as the path from input to output is often unclear. The main objective is, therefore, usually predicting a specific outcome. Another difference from traditional statistics is that many ML models guess the correct answer and improve by studying the errors they make. For example, suppose we present the model with an image. In that case, the model could guess that the probability is 80% that there is a fracture. If we agree that there is a fracture, we can calculate that the error is 20%. By investigating what parameters were not suggesting fracture, we can nudge those parameters into the fracture category to be more likely to predict a fracture the next time. When reporting on AI interventions, the clinical setting is crucial for understanding performance. Clinicians must understand the tasks and performance measures and whether the outcomes are relevant to the clinical setting.
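The guess, error, and nudge cycle described above can be sketched in a few lines. This is a minimal illustration only, assuming a toy logistic model with two made-up image features rather than a deep network; the names `predict` and `training_step`, the feature values, and the learning rate are all hypothetical.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Probability of a fracture for one case (toy logistic model)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def training_step(weights, bias, features, label, learning_rate=0.5):
    """Nudge each parameter in the direction that reduces the error."""
    probability = predict(weights, bias, features)
    error = label - probability  # e.g., 1.0 - 0.8 = 0.2
    new_weights = [w + learning_rate * error * x
                   for w, x in zip(weights, features)]
    new_bias = bias + learning_rate * error
    return new_weights, new_bias, error


# Hypothetical case: two made-up features, ground truth "fracture" (1).
features = [1.0, 0.5]
weights = [0.0, 0.0]
bias = math.log(4)  # chosen so that the first guess is 80%

before = predict(weights, bias, features)
weights, bias, error = training_step(weights, bias, features, label=1)
after = predict(weights, bias, features)
print(f"guess before: {before:.2f}, error: {error:.2f}, guess after: {after:.2f}")
```

After one step the same case receives a higher fracture probability than before, which is the "nudge" in the text; real models repeat this over many cases and many more parameters.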


Classification
A classification task assigns observations to one of a set of known outcomes/classes, for example, type of fracture, normal vs. pathological ECG, or staging of a malignancy. Typically, the model produces a probability score per outcome. When there are 2 outcomes, e.g., “fracture” or “no fracture,” we have a dichotomous outcome, i.e., a binary classification problem. If there are several possible outcomes, e.g., hip fracture Garden 1–4, it is a multi-class classification task. The model’s core function is to separate the groups, i.e., it is preferred that the network provides probabilities close to 0% and 100% instead of around 50%. For example, an algorithm might state that there is a 20% probability of a fracture. For some purposes, this could be considered sufficient to decide on the absence of pathology, e.g., a suspected type A ankle fracture. In contrast, a scaphoid fracture may be unfortunate to miss, due to the risk of non-union when left untreated. Even at a 20% likelihood of a fracture, we might proceed with an MRI. Therefore, a reliable classifier’s key feature is better separation between groups, with few cases in the conflicted region; in this example, between 20% and 80%.

Image analysis, segmentation, and localization
Image analysis (also called “computer vision”) has gathered much attention and success. It entails analyzing and classifying the contents of images, for example, fractures (Olczak et al. 2021). Sometimes the task is to classify the contents of the image and specify a feature’s location in an image. If the objective is to locate an object in an image, for example, a femur fracture, and mark it with a bounding box (Qi et al. 2020), the task is image localization or object detection. A similar task is image segmentation. The task is then to mask out regions of interest, for example, marking out the actual boundaries of individual bones or fracture lines.

Predicting continuous values (regression modeling)
Predicting continuous outcomes is done using regression modeling. Continuous outcomes could be predicting the angle of fracture displacement, the medial clear-space in an ankle radiograph, or the ulna-plus in a wrist radiograph.

Other tasks
Natural language processing (NLP) deals with language and text management. It could entail translating between languages, interpreting a written journal, generating a journal, or describing an image’s contents. Clustering is a form of ML where the AI groups data into classes without prior knowledge. For example, given a collection of radiographs, it is given the task of sorting them into groups. For instance, we could let the algorithm find which fractures are similar instead of manually choosing the groups.
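For the regression tasks described above, predictions are commonly scored with the mean squared error (MSE), its square root (RMSE), and the mean absolute error (MAE), all of which are listed in Table 1. The sketch below computes them for three displacement angles; the angle values and the function name `regression_errors` are made up for illustration.

```python
import math


def regression_errors(true_values, predictions):
    """Return MSE, RMSE, and MAE for paired true and predicted values."""
    n = len(true_values)
    differences = [t - p for t, p in zip(true_values, predictions)]
    mse = sum(d * d for d in differences) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(d) for d in differences) / n
    return mse, rmse, mae


# Hypothetical fracture displacement angles in degrees.
true_angles = [10.0, 20.0, 30.0]
predicted_angles = [12.0, 18.0, 33.0]

mse, rmse, mae = regression_errors(true_angles, predicted_angles)
print(f"MSE: {mse:.2f}, RMSE: {rmse:.2f}, MAE: {mae:.2f}")
```

RMSE and MAE are expressed in the same unit as the measurement (here degrees), which usually makes them easier to relate to clinical tolerances than MSE.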



Pitfalls in the classification task for medical data

Outcome imbalances
Medical data is often skewed, with some outcomes being much more common than others. In general, we are more likely to find a healthy individual than an unhealthy individual. Hence, a negative test for a disease is the most likely outcome. In situations where there are multiple possible outcome types, e.g., an elaborate classification scheme, each outcome becomes less likely. That is, if we have 30 fractures and 3 groups, each subgroup will, on average, contain 10 fractures. In most cases, it becomes even more skewed, as some subgroups are more common than others. By emphasizing rare cases, i.e., giving them more weight in computations, we can alleviate the imbalances during training.

Training and testing
Training is the process of an AI model iterating through a database of annotated cases, thereby learning from many examples. One iteration through the entire training set is called an “epoch.” Training is time-consuming and can take hours to weeks to reach optimal performance. During this process, the model learns the important features of the data. The testing phase is when the model examines examples it has not seen before and for which it has to provide an outcome. Predictions of the model are compared with the ground truth in order to evaluate performance. During testing, measured performance may not reflect true performance, as rare cases will be underrepresented. Unfortunately, recruiting more of the underrepresented classes is labor-intensive. For example, if a class occurs once in 300 cases, we need to review about 600 images, on average, to find 2 examples. These rare cases are usually clinically interesting, and the effort needs to be balanced against the clinical importance.
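The idea of giving rare cases more weight can be sketched with inverse-frequency class weights. This is one common heuristic and an assumption here, not necessarily the weighting the authors used. The class counts are the type A, B, and C totals reported in the caption of Figure 1; the function names are hypothetical. The second helper repeats the arithmetic behind the once-in-300 example.

```python
def balanced_class_weights(class_counts):
    """Weight each class by total / (number of classes * class count)."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {name: total / (n_classes * count)
            for name, count in class_counts.items()}


def expected_cases_to_review(examples_needed, prevalence):
    """Average number of cases to review to find the wanted examples."""
    return examples_needed / prevalence


# Type A, B, and C malleolar fracture counts from the Figure 1 caption.
counts = {"A": 26, "B": 137, "C": 47}
weights = balanced_class_weights(counts)

for name in counts:
    print(f"type {name}: {counts[name]} cases, weight {weights[name]:.2f}")
print(expected_cases_to_review(2, 1 / 300))
```

With these weights every class contributes the same total weight during training, so the rare type A cases are not drowned out by the common type B cases.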
A common practice is to manipulate images at random, e.g., rotating images, to force the network to find features independent of the manipulations applied; this is a form of data augmentation.

Missing data
Some outcomes are so rare that they will not be present in the data, preventing the algorithm from detecting them. This is a fundamental difference from humans, who can learn what a class looks like before seeing it, e.g., a Pipkin fracture, and recognize it when first encountering it.

Overfitting
An ML model learns by looking at examples. If it learns the training examples too well, it learns the individual patients instead of the problem’s general features, i.e., it learns to recognize the individual training cases and the expected outcome, rather than the common traits that make up the outcome. Due to the size and flexibility of the ML models used, this is a common problem and the reason why the gold standard is to split the data into at least a training and a test set. The test set is kept separate for final evaluation and is not used for training the model; performance on it is thus a more objective measure. The same training case, or patient, must not be included in both the training and test set, as this would overestimate the accuracy. A validation set is similar to the test set, but is used to optimize settings for the model during training, and is not always reported. Confusingly, the test set is sometimes called the validation set.
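The requirement that the same patient must not appear in both sets can be met by splitting on patient identifiers rather than on individual images. The sketch below is a minimal illustration under assumed names: the `patient_id` and `image` fields, the file names, and the 20% test fraction are all hypothetical.

```python
import random


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split records so that each patient ends up in exactly one set."""
    patient_ids = sorted({record["patient_id"] for record in records})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return train, test


# Hypothetical data: 10 patients with 2 radiographs each.
records = [
    {"patient_id": f"P{i:02d}", "image": f"img_{i:02d}_{k}.png"}
    for i in range(10)
    for k in range(2)
]

train, test = split_by_patient(records)
train_patients = {r["patient_id"] for r in train}
test_patients = {r["patient_id"] for r in test}
print(len(train), len(test), train_patients & test_patients)
```

Splitting at the image level instead would let two radiographs of the same patient land on opposite sides of the split, which is the leakage the text warns about.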

Performance measures
When reporting on ML algorithms, the clinical setting is essential for understanding the actual performance, and clinicians and readers must understand the results. There are many methods to measure performance, all with their strengths and weaknesses. In Table 1, we present common and widely accepted outcome measures or metrics.

The confusion table
Many of the performance measures presented here are familiar to clinicians from diagnostic testing, e.g., accuracy, specificity, and sensitivity. ML contains a large number of additional performance measures that researchers can report. There is a need to strike a balance between measures that clinicians are familiar with and achieving methodological perfection. When assessing an experiment’s outcome or a diagnostic test, it is common practice to present outcomes in a confusion table. A binary test has 2 possible outcomes. For a binary diagnostic test (e.g., presence of a fracture in a radiograph or plantar flexion in Simmonds–Thompson’s test for Achilles tendon rupture), we have 4 possible combinations of test result and ground truth. Table 2 will serve as the reference for understanding performance measures. Suppose we are using a model to classify an ankle fracture into 1 of 3 outcomes: type A, type B, or type C malleolar fracture, excluding the “no fracture” outcome. We present the resulting 3-by-3 confusion table in Figure 1. The usual way to deal with the data is to divide it into subparts, where we look at each outcome separately, as in Table 3.

There is an underlying decision-making process that determines when an outcome is called positive or negative. For example, we might decide that if there is a > 50% chance of the presence of a certain condition, the test is considered positive, and we would get one confusion table. However, if we decided that we need > 90% certainty to decide that a test is positive, we would get fewer positive test results and fewer false positives (FP). The resulting confusion table would look very different. A screening test might consider a > 20% likelihood as a positive outcome, resulting in many FP but very few false negatives (FN). The threshold where we decide that a test is negative or positive is called the decision threshold or classification threshold.
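The effect of the decision threshold on the confusion table can be made concrete with a small sketch. The probabilities and labels below are made up for illustration, and `confusion_counts` is a hypothetical helper; the three thresholds mirror the > 20%, > 50%, and > 90% examples in the text.

```python
def confusion_counts(probabilities, labels, threshold):
    """Count TP, FP, TN, and FN when scores above the threshold are positive."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for probability, label in zip(probabilities, labels):
        predicted_positive = probability > threshold
        if predicted_positive and label == 1:
            counts["TP"] += 1
        elif predicted_positive and label == 0:
            counts["FP"] += 1
        elif label == 1:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Hypothetical model outputs for 10 radiographs (1 = fracture, 0 = none).
probabilities = [0.95, 0.85, 0.60, 0.30, 0.70, 0.40, 0.25, 0.15, 0.10, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

for threshold in (0.2, 0.5, 0.9):
    print(threshold, confusion_counts(probabilities, labels, threshold))
```

On this toy data the screening-style threshold of 0.2 misses no fracture but raises 3 false alarms, while the strict threshold of 0.9 raises none and misses 3 of the 4 fractures.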



Table 1. Evaluation metrics

Measure | Calculation or description
Accuracy | (TP+TN)/(TP+TN+FN+FP)
Sensitivity, true positive rate (TPR), recall | TP/(TP+FN)
Specificity | TN/(TN+FP)
Youden’s J | sensitivity + specificity – 1
False-positive rate (FPR) | FP/(TN+FP) = 1 – specificity
Precision, positive predictive value (PPV) | TP/(TP+FP)
Negative predictive value (NPV) | TN/(TN+FN)
F1-score, Dice score | 2•precision•sensitivity/(precision + sensitivity) = 2•TP/(2•TP+FP+FN)
Model performance curves:
Receiver operating characteristic (ROC) curve | Sensitivity (y-axis) against 1 – specificity (x-axis), i.e., TPR against FPR
Precision-recall (PR) curve | Precision (y-axis) against sensitivity (x-axis)
Area under the curve:
AUC of the ROC curve (AUC) | Statistic of model performance
AUC of the PR-curve (AUPR) | Statistic of model performance
Object detection and localization, image segmentation (localization in an image):
Intersection over union (IoU) | TP/(TP+FP+FN)
Region of interest (ROI) | Used in 2D and 3D image segmentation
Continuous data (regression modeling):
Mean squared error (MSE) | ∑(true value – prediction)²/number of cases
Root mean squared error (RMSE) | √MSE
Mean absolute error (MAE) | ∑|true value – prediction|/number of cases
Text data:
Bilingual evaluation understudy (BLEU) | Compares generated text with reference texts
Recall-oriented Understudy for Gisting Evaluation (ROUGE) | Compares generated text with reference texts
Multiple measurements:
Frequency weighted average | Summarizes many different outcomes

TP = true positive, FP = false positive, TN = true negative, FN = false negative.

Table 2. A 2-by-2 confusion table for a binary test (2 possible outcomes)

Ground truth | Prediction: positive (detected) | Prediction: negative (not detected)
Positive (disease) | True positive (TP) | False negative (FN)
Negative (normal) | False positive (FP) | True negative (TN)

“Positive” and “negative” do not refer to benefit. Positive (P) refers to a condition’s presence and negative (N) to the condition’s absence.

Table 3. Dividing the 3-by-3 confusion matrix from Figure 1 into 3 binary submatrices

Ground truth | Predicted Type A: true | Predicted Type A: false
True | TP (18) | FN (8)
False | FP (15) | TN (169)

Ground truth | Predicted Type B: true | Predicted Type B: false
True | TP (116) | FN (21)
False | FP (15) | TN (58)

Ground truth | Predicted Type C: true | Predicted Type C: false
True | TP (33) | FN (14)
False | FP (13) | TN (150)

Measuring performance

Accuracy
Accuracy is defined as the correct classification rate, i.e., the rate of correct findings. For instance, if we had a data set containing 5 fractures in 100 radiographs and all those fractures were detected without any false positives, we would have 100% accuracy. A second test that always indicates “no fracture” would still be 95% accurate. The second test is very accurate but has no clinical value. Accuracy has limited value for imbalanced data sets.

Sensitivity (recall) and specificity
Sensitivity (also known as recall) and specificity are properties well known to clinicians. Together, they measure how likely a test is to detect or exclude a condition correctly. We can

True malleolar class (rows) against predicted malleolar class (columns):

True class | Predicted A | Predicted B | Predicted C
C | 3 | 11 | 33
B | 12 | 116 | 9
A | 18 | 4 | 4

Figure 1. Confusion matrix for an ankle fracture classification experiment, according to Danis-Weber (AO Foundation/Orthopedic Trauma Association (AO/OTA)) classification. There are 26 type A fractures, 137 type B fractures, and 47 type C fractures. Data reproduced from Olczak et al. (2020).

always achieve 100% sensitivity by saying that everyone has the condition. We would spot every case with the condition, but also get many false positives. Specificity represents the true negative (TN) rate, which should usually be high in medical tests, and must be balanced against the sensitivity. In a more complex task where we want to differentiate among multiple outcomes, the number of true negatives will dominate for most outcomes, and specificity should generally be high. As with accuracy and other performance measures that consider the TN rate, specificity contains little information of value in imbalanced datasets. Sensitivity and specificity represent the proportion of TP among cases with the condition and of TN among cases without it, respectively, and not the probability of a condition.

A typical use case for high sensitivity is fracture detection. The Ottawa Knee Rule has a sensitivity of 98%, and a negative test allows us to forgo further imaging studies. Specificity is roughly 50%, so about half of the patients without a fracture will have false-positive tests. Conversely, we care more about specificity in meniscal tears as these are usually less acute, and the bottleneck is often the availability of MRI. Apley’s maneuver has a sensitivity of 20%. However, its specificity of 90% suggests that requesting an MRI after a positive test will result in few unnecessary exams. Ideally, a test or algorithm should provide high sensitivity and specificity. However, depending on the clinical setting, we can choose to sacrifice one for the other. Youden’s J summarizes specificity and sensitivity in a single value, ranging from 0 (no better than chance) to 1 (perfect).

False positive rate (FPR)
The FPR is the proportion of negative outcomes that have been incorrectly predicted as positive and should be considered the complement of specificity (FPR = 1 – specificity).

Positive predictive value (PPV) and negative predictive value (NPV)
Given a prediction, we want to know how likely it is for that prediction to be correct. PPV (also known as precision) answers the question: if we have a set of positive outcomes (cases predicted as positive), what proportion of those outcomes were truly positive? NPV measures the same for negative cases, i.e., if we have a set of negative outcomes, what proportion of those outcomes are genuinely negative? PPV and NPV, in contrast to specificity and sensitivity, give the probability of an outcome based on the prevalence in the sample.

Precision and recall
Precision and recall, as terms, are commonly used in ML studies but relatively unknown in medicine. In epidemiology, precision is the PPV, while recall is the sensitivity. Neither precision nor sensitivity takes into account TNs and, as such, they are less affected by class imbalances in data. Figure 2 illustrates their relationship.

[Figure 2 diagram not reproduced: cases grouped into true positives, false positives, false negatives, and true negatives, with precision and sensitivity shown as ratios of these groups.]

Figure 2. Graphical illustration of precision and sensitivity (or recall). Circles, “○,” represent cases without the disease/class. Bullets, “●,” represent cases with the disease/class.
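The measures discussed above can be recomputed directly from the 3-by-3 confusion matrix in Figure 1 by treating each fracture type in turn as the "positive" class. The sketch below is an illustration only; the function names `one_vs_rest` and `summarize` are hypothetical, while the counts are those shown in Figure 1 and the formulas are those of Table 1.

```python
# Counts from Figure 1: outer key = true class, inner key = predicted class.
CONFUSION = {
    "A": {"A": 18, "B": 4, "C": 4},
    "B": {"A": 12, "B": 116, "C": 9},
    "C": {"A": 3, "B": 11, "C": 33},
}


def one_vs_rest(confusion, target):
    """Collapse a multi-class matrix into TP, FN, FP, TN for one class."""
    total = sum(sum(row.values()) for row in confusion.values())
    tp = confusion[target][target]
    fn = sum(confusion[target].values()) - tp
    fp = sum(row[target] for row in confusion.values()) - tp
    tn = total - tp - fn - fp
    return tp, fn, fp, tn


def summarize(tp, fn, fp, tn):
    """Compute the Table 1 measures that derive from a 2-by-2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden_j": sensitivity + specificity - 1,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


for name in CONFUSION:
    counts = one_vs_rest(CONFUSION, name)
    rounded = {k: round(v, 2) for k, v in summarize(*counts).items()}
    print(f"type {name}: TP, FN, FP, TN = {counts}; {rounded}")
```

For type A, for example, accuracy is high because true negatives dominate, whereas sensitivity and the F1 score, which ignore true negatives, are clearly lower. This is the class-imbalance effect described in the text.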

F1 score or the Dice score
Class imbalance has become more recognized in medical AI, and it has become more common to use performance measures that take class imbalance into account. Precision and sensitivity are less affected by class imbalance. The F1 score, or Dice score, is a way to combine precision and sensitivity, and can be understood in terms of data overlap, as in Figure 2. The F1 score is well suited for imbalanced class problems. It is also used in image segmentation and localization tasks; see section “Image segmentation or localization”. Other good performance measures exist but are not commonly encountered, e.g., the Matthews correlation coefficient (MCC) and other F-scores. See the supplement for details.

Performance curves and area under the curve (AUC)
We derived the previous performance measures from the confusion table based on classification outcomes. AI classification systems usually yield a probability score as output and classify data according to a decision threshold (e.g., a score of 99% could result in a positive prediction and a score of 3% in a negative one). These thresholds are often chosen arbitrarily (e.g., at 50%). However, they can also be tuned on a separate development dataset or derived from the literature. We constructed the confusion table based on whether we detected a condition or not, depending on whether it was present. The outcome of the decision-making process depends on the classification threshold. To assess the predictions without relying on a single classification threshold, we can compute the performance measures (e.g., the true-positive and false-positive rates) across all thresholds (i.e., from 0 to 1) and plot them as a curve. It is not feasible to compute the confusion matrix and outputs for all possible thresholds. Instead, we compute the confusion matrix for some thresholds, combine them into a curve, and estimate the area under the curve (AUC). By computing AUC, we can estimate generic model performance.
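The threshold sweep described above can be sketched as follows for the ROC curve, i.e., sensitivity against false-positive rate (Table 1). This is a minimal illustration, assuming made-up probabilities, a grid of 22 thresholds, and the trapezoidal rule for the area; the function names are hypothetical.

```python
def roc_points(probabilities, labels, thresholds):
    """Return (FPR, TPR) pairs, one per threshold, sorted along the x-axis."""
    n_positive = sum(labels)
    n_negative = len(labels) - n_positive
    points = []
    for threshold in thresholds:
        tp = sum(1 for p, y in zip(probabilities, labels)
                 if p >= threshold and y == 1)
        fp = sum(1 for p, y in zip(probabilities, labels)
                 if p >= threshold and y == 0)
        points.append((fp / n_negative, tp / n_positive))
    return sorted(points)


def area_under_curve(points):
    """Estimate the area with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical model outputs for 10 radiographs (1 = fracture, 0 = none).
probabilities = [0.95, 0.85, 0.60, 0.30, 0.70, 0.40, 0.25, 0.15, 0.10, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Thresholds from 0 to 1.05; the last one exceeds every score,
# so the curve starts at the point (0, 0).
thresholds = [i / 20 for i in range(22)]

auc = area_under_curve(roc_points(probabilities, labels, thresholds))
print(f"AUC: {auc:.3f}")
```

A model that separates the two groups perfectly reaches an area of 1.0, and a model that gives every case the same score reaches 0.5, which is why 0.5 is usually read as chance level.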
The two curves most commonly studied are the receiver operating characteristic (ROC) and precision-recall (PR) curves. In Figure 3, we see the ROC and PR curves for the 3 types of malleolar fractures.

Receiver operating characteristic (ROC) curve
When research literature mentions AUC, it usually refers to the area under the ROC curve. We will use AUC for the area under the ROC curve unless otherwise explicitly stated. The ROC curve plots the sensitivity (the y-axis) against the FPR (the x-axis) for all decision thresholds. Computing the area under that curve gives us the AUC, which measures the model’s overall accuracy. The idea of the ROC curve is to measure the model’s ability to separate the groups by penalizing based on how wrong the probabilities are.

Interpretation
As AUC depends on the specificity, which includes the TN outcomes, it is sensitive to imbalanced data. For a clinical trial or practical application, a high AUC risks overestimating performance, because it is related to the accuracy, which is sensitive to data imbalance. One should consider a different performance measure for imbalanced data sets. However, we usually encounter AUC during research and development, where it is used to measure the overall model performance. It does not confine the model to a specific decision threshold, as it is computed over all thresholds. The AUC is well understood, easy to interpret, and has nice properties. See Supplementary material.

Acta Orthopaedica 2021; 92 (5): 513–525

Figure 3. ROC and PR curves for malleolar class predictions (malleolar categories A, B, and C). The ROC curves (left) are monotonically growing functions of sensitivity (y-axis) and the FPR (x-axis). The AUC of the ROC curve corresponds to overall model accuracy. The PR curves (right) have precision on the y-axis and sensitivity on the x-axis. Unlike the ROC, we see that it can oscillate and tends towards zero. The differences between the outcomes are also greater.

Precision-recall (PR) curve
Precision is the same as PPV, and recall is the same as sensitivity. The PR curve illustrates the tradeoff between precision and sensitivity and measures the model’s ability to separate between the groups. As neither precision nor sensitivity depends on TN, it is considered well suited to class-imbalanced data. Using the area under the PR curve (AUPR) to assess a model’s performance, as with the AUC, will measure the model’s performance in a way that is not affected by the classification threshold (Saito and Rehmsmeier 2015). Although it is a valid alternative to AUC, methodological issues with AUPR as a performance measure do exist. There is no clear, intuitive interpretation of AUPR or its properties (unlike AUC, which corresponds to overall accuracy). There is no consensus on what a good AUPR is. AUPR, and similar performance measures, comprise an active research field. However, most of these performance measures still need more research and are not well established. Figure 3 illustrates the differences between AUC and AUPR.

Image segmentation or localization
Sometimes the research problem is to detect a pathological lesion and locate it in an image, training the model to mark out the areas of interest as a human would. If there is sufficient overlap between the model and human reviewers, it is considered a success. The measures to evaluate segmentation and localization tasks presented next are equally valid for both 2D and 3D data sets. The F1 score is a commonly used performance measure based on its alternate interpretation as overlapping sets; however, the intersection over the union (IoU) is more intuitive (Figure 4).

Figure 4. Comparing the IoU and the F1 score in terms of data overlap. The overlapping sets illustrate why both are commonly used performance measures in object detection and image segmentation. The IoU is the percentage of area overlap of correct detection. The F1-score is the “harmonic mean” where the TPs are given additional importance. We can transform one into the other (see supplement). See Table 1 for how to compute IoU and F1 score. The example overlaps shown in the figure are F1 = 0.40 with IoU = 0.25, F1 = 0.80 with IoU = 0.66, and F1 = 1.0 with IoU = 1.0.

Intersection over union (IoU) or Jaccard index
IoU is used to determine an image segmentation task’s performance in image data, such as radiographs, CT or MRI slices, or pathology slides. It is a measure of pixel overlap that compares the area of overlap with the combined area of the predicted and actual location (or ground truth), expressed as a percentage.
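The overlap interpretation can be made concrete with sets of pixel coordinates. In the sketch below, which is an illustration under stated assumptions rather than the authors' implementation, the helper names and the two rectangles are hypothetical; the formulas used are the standard ones, IoU = TP / (TP + FP + FN) and F1 = 2·TP / (2·TP + FP + FN), and the conversion F1 = 2·IoU / (1 + IoU).

```python
def iou(prediction: set, truth: set) -> float:
    """Intersection over union (Jaccard index) of two pixel sets."""
    union = prediction | truth
    return len(prediction & truth) / len(union) if union else 1.0


def f1_dice(prediction: set, truth: set) -> float:
    """F1 (Dice) score: twice the overlap divided by the summed set sizes."""
    total = len(prediction) + len(truth)
    return 2 * len(prediction & truth) / total if total else 1.0


def box(x0, y0, x1, y1):
    """All pixel coordinates in the half-open rectangle [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


# Hypothetical masks: 100 ground-truth pixels, 100 predicted, 40 shared.
truth = box(0, 0, 10, 10)
prediction = box(6, 0, 16, 10)

overlap_iou = iou(prediction, truth)
overlap_f1 = f1_dice(prediction, truth)
print(f"IoU = {overlap_iou:.2f}, F1 = {overlap_f1:.2f}")
print(f"F1 from IoU = {2 * overlap_iou / (1 + overlap_iou):.2f}")
```

With 40 shared pixels out of 160 in the union, this toy case gives the same pair of values as one of the Figure 4 examples (IoU 0.25, F1 0.40); the same functions work for 3D data if voxel coordinates are used instead.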



Depending on the application and source, it is common to consider > 50% pixel overlap sufficient for success.

Presence/absence measures
A localization task can also be used to determine the presence or absence of pathology. As this is a type of classification, the same measures used for other classification tasks are suitable, e.g., ROC analysis (Chakraborty 2013). However, if we are interested in locating a lesion, the ROC or PR curve cannot measure the model’s ability to locate that region. An alternative measure that incorporates the localization aspect is the free-response operating characteristic (FROC) (see Supplementary material). Another option is region of interest (ROI) analysis, where the image is divided into regions. For example, a brain scan could be divided into its respective cortexes. For each region, the rater assigns a probability that a lesion is located in that region. By plotting the ROC curve using the number of regions falsely assigned as having a lesion, the performance can then be studied using ordinary ROC analysis. In ordinary ROC analysis, the patient or image is the unit of observation. In contrast, in ROI analysis each region is the unit of observation (Obuchowski et al. 2000, Bandos and Obuchowski 2018).

Continuous measurements
Examples of continuous measurements could be estimating the tibiofibular and medial clear spaces in ankle radiographs to assess for syndesmotic injury. As these are continuous values, usually measured in millimeters, an AI model measuring these distances would use regression models to estimate the distance.

Root mean squared error (RMSE)
Mean squared error (MSE) is a common performance metric for continuous data. It computes the average squared error between the predicted and actual value. Squaring the error penalizes large errors, and the MSE is thus more sensitive to outliers.
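To make the outlier sensitivity concrete, the sketch below computes the MSE, its square root (the RMSE), and, for comparison, the mean absolute error (MAE) covered in the text that follows. The measurements are invented clear-space values in millimeters and the function names are hypothetical.

```python
import math


def mse(predicted, actual):
    """Mean squared error: average of the squared differences."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error, in the same unit as the measurements."""
    return math.sqrt(mse(predicted, actual))


def mae(predicted, actual):
    """Mean absolute error: average of the absolute differences."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Invented measurements in millimeters.
actual = [3.0, 3.5, 4.0, 3.2, 5.0]
predicted = [3.2, 3.4, 4.1, 3.0, 5.1]
# Same predictions, except that the last one misses by 3 mm (an outlier).
with_outlier = [3.2, 3.4, 4.1, 3.0, 8.0]

for name, pred in [("no outlier", predicted), ("one outlier", with_outlier)]:
    print(f"{name}: RMSE = {rmse(pred, actual):.2f} mm, "
          f"MAE = {mae(pred, actual):.2f} mm")
```

In this toy data a single large miss raises the RMSE considerably more than the MAE, which is the behavior the text describes.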
Usually, the square root of the MSE is taken, giving the RMSE, which has the same unit as the original value and is therefore easy to relate to it.

Mean absolute error (MAE)
MAE, or mean absolute deviation, is the average absolute distance between the predicted and actual values. MAE is less affected by outliers than MSE, as it does not square the difference in values.

Multiple measurements
Getting an AI model to detect the presence of pathology (2 outcomes) with high accuracy is generally easy, and a trivial task with limited utility. For example, most orthopedic surgeons or radiologists are good at quickly spotting fractures or other pathologies. Rather, the use cases where an AI model will be useful are classifying, locating, or detecting many different outcomes or making difficult classifications.


A model will perform differently for each outcome, and we have to take this into account. As the number of outcomes increases, we will have to summarize multiple performance measures for all outcomes. As we would do with a group of individuals where we report a mean, we need to merge multiple outcomes into meaningful summary statistics. Frequency weighted average (FWA) Taking averages of the individual groups would give excessive importance to small groups. In Figure 2 we noticed that type B fractures were more prevalent than type A fractures and it makes sense that they should contribute more to the overall accuracy. Weighting according to frequency (FWA), excluding true negatives when they are very dominant, can be written as:

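$$\mathrm{FWA} = \frac{\sum_{\mathrm{case}=1}^{\mathrm{last}} n_{\mathrm{case}} \cdot \mathrm{measure}_{\mathrm{case}}}{\sum_{\mathrm{case}=1}^{\mathrm{last}} n_{\mathrm{case}}}$$

where $n_{\mathrm{case}}$ is the number of cases. For example, frequency weighted average AUC (from Figure 1) would become:

$$\mathrm{AUC}_{\mathrm{FWA}} = \frac{24 \cdot 0.8 + 137 \cdot 0.93 + 47 \cdot 0.86}{24 + 137 + 47} = 0.90.$$

FWA can be applied to any metric, for instance:

$$\mathrm{AUPR}_{\mathrm{FWA}} = \frac{24 \cdot 0.27 + 137 \cdot 0.87 + 47 \cdot 0.63}{24 + 137 + 47} = 0.75.$$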
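The frequency weighted average is straightforward to compute. The sketch below reproduces the two worked examples from the text; the function name is hypothetical, and the counts and per-class values are those quoted in the article.

```python
def frequency_weighted_average(counts, measures):
    """Average of per-class measures, weighted by the number of cases."""
    if len(counts) != len(measures):
        raise ValueError("counts and measures must have the same length")
    return sum(n * m for n, m in zip(counts, measures)) / sum(counts)


# Case counts and per-class values as quoted in the text.
counts = [24, 137, 47]
auc_per_class = [0.80, 0.93, 0.86]
aupr_per_class = [0.27, 0.87, 0.63]

auc_fwa = frequency_weighted_average(counts, auc_per_class)
aupr_fwa = frequency_weighted_average(counts, aupr_per_class)
unweighted_auc = sum(auc_per_class) / len(auc_per_class)

print(f"AUC_FWA = {auc_fwa:.2f}, AUPR_FWA = {aupr_fwa:.2f}")
print(f"Unweighted mean AUC = {unweighted_auc:.2f}")
```

For comparison, the unweighted mean AUC of these three values is about 0.86, lower than the weighted 0.90, because the smallest class would count as much as the largest.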

Medical language generation
Medical language generation involves the generation of medical text (e.g., diagnostic text or discharge summaries), with or without the use of input (e.g., radiographs). For example, Gale et al. (2018) trained a system to produce descriptive sentences to clarify deep learning classifiers’ decisions when detecting hip fractures from frontal pelvic radiographs. The most common word-overlap measures in medical text generation are BLEU (Papineni et al. 2002) and ROUGE (Lin 2004). BLEU measures content overlap between the model and ground truth texts and penalizes short generated captions using a brevity penalty. BLEU-1 considers single words, while BLEU-2, -3, and -4 also consider word sequences (n-grams) of up to 2, 3, and 4 words, respectively. ROUGE-L (Recall) is the most common ROUGE variant in biomedical captioning. It is the ratio of the length of the longest word subsequence shared by the human-generated and system-generated texts to the length of the human-generated text. The measure complements BLEU by focusing on the length of the human-generated text instead of the system-generated text. We also note that various other language generation evaluation measures exist, such as METEOR (Banerjee and Lavie 2005), CIDEr (Vedantam et al. 2015), and SPICE (Anderson et al. 2016). It is important to remember that human language complexity is vast and cannot be captured fully by these measures. Human evaluation of text is therefore commonly required as a supplement.
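A minimal sketch of the two word-overlap measures, applied to the example sentences from Table 5 of this article, is given below. It assumes lowercasing and whitespace tokenization and implements only BLEU-1 (clipped single-word precision with the brevity penalty) and ROUGE-L recall; the function names are hypothetical, and the evaluation code actually used for Table 5 is not described in the text, so this should be read as an illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def bleu1(candidate, reference):
    """BLEU-1: clipped unigram precision times the brevity penalty."""
    cand, ref = tokenize(candidate), tokenize(reference)
    ref_counts = Counter(ref)
    clipped = sum(min(n, ref_counts[w]) for w, n in Counter(cand).items())
    precision = clipped / len(cand)
    # Candidates no longer than the reference are penalized.
    penalty = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return penalty * precision


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """Shared subsequence length divided by the reference length."""
    cand, ref = tokenize(candidate), tokenize(reference)
    return lcs_length(cand, ref) / len(ref)


gt = "Subtle impacted intertrochanteric hip fracture"
h1 = "No subtle impacted intertrochanteric hip fracture"
h2 = "There is a hip fracture clearly apparent on the radiograph"

for name, text in [("H1", h1), ("H2", h2)]:
    print(f"{name}: BLEU-1 = {100 * bleu1(text, gt):.1f}, "
          f"ROUGE-L recall = {100 * rouge_l_recall(text, gt):.1f}")
```

Under these assumptions the output matches the B1 and ROU columns of Table 5, and it shows the same problem: the clinically incorrect sentence H1 scores far higher than the clinically correct H2.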



Ethical considerations and methodological biases
New technology comes with new ethical dilemmas, and AI is no exception. The potential benefits of AI are real, as are the ethical considerations. As we invest resources in research and then in the software, hardware, and other logistics, those resources come from elsewhere. The ramifications of AI are considerable, but clinicians are poorly informed (Felländer-Tsai 2020). We briefly describe common ethical dilemmas that clinicians should be aware of and take into account. The fundamental ethical principles that concern medical practice and patient care and treatment comprise beneficence, non-maleficence, respect for patient autonomy, and justice.

Data and privacy
ML and AI are powerful methods often described as “data-hungry,” as large amounts of data are needed to learn the desired patterns and capture rare or unusual cases. AI models, at their core, infer statistical relationships and therefore thrive on large amounts of data during training, which encourages large-scale data collection. Data, even in the right hands, can constitute a risk to patient integrity. For example, oversharing, overuse of personal data, or data theft all constitute risks to patient privacy and risk the data falling into the wrong hands or being used for the wrong purposes. Medical data is sensitive and cannot always be shared, causing problems for reproducibility and reporting on models’ outcomes. However, there are ways to anonymize and share data legally and responsibly (Hedlund et al. 2020), and this is highly encouraged.

Bias and fairness
Bias in AI mainly originates from the input data, the development process, and design decisions. These biases transfer to the output data, and an AI model will learn the data’s prejudice (Mittelstadt et al. 2016). Clinicians, biased by the AI interpretation, risk perpetuating that bias. Commonly acknowledged biases and confounders are gender, socioeconomic status, and race.
For example, a skin cancer detector trained on a dataset dominated by fair skin can have problems detecting melanoma in dark-skinned patients (Adamson and Smith 2018, Kamulegeya et al. 2019). Badgeley et al. (2019) successfully predicted hip fractures from radiographs. However, when they compensated for socioeconomic and logistical factors and healthcare process data (e.g., different scanners), model performance fell to random. Bias comes from the source and handling of data as well as the design choices during algorithm creation. Above all, it is important to recognize, examine, and reflect on AI studies’ biases (Beil et al. 2019). Informed consent and autonomy AI poses a risk to patient autonomy and integrity. When AI models produce difficult-to-explain outcomes based on


unknown data, it becomes difficult to base decisions on their output. AI models also pose a risk to clinician autonomy. As AI systems become more prevalent, there is a risk that society will divert the responsibility for decision-making to algorithms that are incompletely understood. Clinicians and healthcare systems might implicitly become forced to implement and follow them against better judgment, which will also implicitly force patients to subject themselves to AI (Lupton 2018). Safety and interpretability The power of AI systems comes from their ability to use large amounts of data to create complex models that consider thousands of parameters. However, AI models, as developed today, are difficult to understand and interpret. AI models are mostly “black boxes.” What happens inside the model is usually unknowable. However, other medical technology and even many human analyses can also be considered black boxes. It is impossible to back-track the process fully in practice. Understanding ML models is an active field of research. One way to address the challenge is to learn to create interpretable models from the start (Rudin 2019). One popular way to understand AI models is to visualize the activating regions, i.e., the regions that lead to the classification decision get mapped in vivid colors. These can be called heat, saliency, or class activation maps. Another method is to produce bounding boxes that constitute the region of interest. However, whether the correct or incorrect region is displayed, they still do not explain why the model reacted to that region (Rudin 2019). Such auxiliary maps can capture some AI mispredictions, but far from all. Other methods to achieve interpretability include showing similar reference cases or deriving uncertainty measures (Pocevičiūtė et al. 2020). Transparency in AI is crucial for actual clinical implementations where errors could have critical implications. 
To critically assess AI results in the clinical workflow, we could supply standardized “model facts labels” along with the AI tool (Sendak et al. 2020); this is similar to the facts labels accompanying drugs to inform practitioners on suitable usage. Transparency could also help compensate for the sensitive nature of the data used to train and test them, which usually cannot be shared. Responsibility and liability Who is responsible and liable for AI interventions is not always clear. A model that is 95% accurate is wrong 5% of the time. It is common for an AI model that is excellent at a task to fail at examples obvious to a human observer. Some errors are within normal parameters. If the patient accepted the AI intervention, we might consider this an unfortunate but acceptable risk. However, if an AI model suggests a course of action, but the underlying rationale is not clear, clinicians might not follow it. Suppose the recommendation was correct, and not following them caused harm to the patient. Are clinicians responsible? Suppose they followed the AI recommendation, and it turned out to be a critical error, constituting


malpractice. Who is liable and responsible, legally but also morally? Currently, most AI interventions are tools that assist clinicians, rather than replacing them, and then the physician remains responsible.

Proposed guidelines for evaluating and presenting AI/ML research
Based on the previous discussion, we propose guidelines and a checklist for reporting and presenting AI and ML to clinicians and other non-machine learning experts. We first state our recommendations (Figure 5) on reporting and presenting AI and ML research to clinicians and provide a checklist for reporting (Table 4).

Recommendations for reporting outcomes
Figure 5 comprises recommendations for choosing outcome metrics suitable for clinicians. We choose these measures as they are (1) suitable and, in general, (2) most interpretable to a clinician. While the discussion regarding what makes a good choice is still ongoing, and deviations from our suggested metrics are possible, we expect that our suggestions will assist the indecisive clinician. We recommend including these metrics alongside any other metrics.

Classification. AUC is a standard measure that most clinicians are familiar with or have at least encountered. It is, though, ill-suited for heavily imbalanced data sets, where AUPR should accompany the AUC measure. If the performance in AUC is low, the additional information from AUPR is less relevant.

Accuracy. Accuracy is an easily understood and often requested performance measure. Even its weakness, overestimating performance, is easy to understand. If the data is heavily imbalanced, however, the F1 score is the preferred choice.

Measurements. For continuous variables, e.g., angles, coordinates, or VAS pain, we can use root mean squared error (RMSE) or mean absolute error (MAE). Both translate to values interpretable on the original scale and are familiar to many clinicians from traditional statistics. Historically, we have been more interested in RMSE as outliers tend to be a major concern. For example, after wrist fracture surgery, most patients will have low VAS-pain levels. We are then primarily interested in identifying failures that risk high levels of VAS. Machine learning allows for new applications and, under some circumstances, we will at times prefer the MAE. For example, if the system draws a bounding box around a fracture, the box must be close to the fracture site most of the time. If the objective is to enhance efficiency while quickly viewing images, we are less concerned with rare, complex fracture cases. These will, regardless of the bounding box, require more attention.

Area or volumes. The F1 score is a common performance measure for segmentation performance in images. However, we argue that its interpretation is non-intuitive compared with the IoU, as shown in Figure 4. We therefore recommend using IoU. As an alternative, used in particular for 3D imaging, we recommend using ROI, which is more intuitive than most alternate performance measures.

Medical text. If we compare to a known text, such as in biomedical image captioning, we can use BLEU and ROUGE-L (Kougia et al. 2019). However, we observe that these 2 measures do not assess clinical correctness (Table 5). A single word could change the meaning of the text, for example, changing “presence” to “absence,” or adding “no” to a sentence, and could potentially cause adverse outcomes for patients. A human review will be necessary to ascertain clinical correctness. If we had a very accurate clinical tagger (i.e., tagging text with clinical keywords), we would estimate clinical correctness via accuracy or F1 score, e.g., by tagging both the generated and the reference clinical texts and measuring accuracy and F1 over the extracted tags.

Figure 5. Recommendations for choosing outcome metrics suitable for clinicians. The measures are selected for (1) their suitability and (2) their interpretability to a clinician. Deviations from these are possible; however, they need to be motivated, and we recommend also reporting these metrics. IoU (Intersection over Union); ROI (Region of Interest); MAE (Mean Absolute Error); RMSE (Root Mean Squared Error); AUC (Area Under the Receiver Operating Characteristic curve); AUPR (Area Under the Precision-Recall curve).
Flowchart content, by type of model output:
Measurements (e.g., angles). Do we want to penalize outliers? Yes: RMSE. No: MAE.
Classes/labels (e.g., fracture yes/no). Is the classification threshold pre-determined? Yes: accuracy; if classes are heavily imbalanced, F1 score. No: AUC; if the AUC is high (e.g., > 0.8) and classes are heavily imbalanced, AUPR.
Area/volume (e.g., area of fracture). Is the task to detect presence in regions? Yes: ROI. No: IoU.
Text (e.g., report). Is the text tagger very accurate? Yes: accuracy; if classes are heavily imbalanced, F1 score. No: manual scoring.
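The decision logic of Figure 5 can also be written down as a small function, which may help when checking that a study reports the recommended measure. The sketch below is one possible reading of the flowchart and accompanying text, not part of the article: the function name, argument names, and string labels are all hypothetical, and treating the F1 score as replacing accuracy for heavily imbalanced classes follows the sentence that F1 is then "the preferred choice".

```python
def recommend_metrics(output_type, *, penalize_outliers=False,
                      threshold_fixed=False, imbalanced=False,
                      auc=None, detect_in_regions=False,
                      accurate_tagger=False):
    """Return the suggested metrics for a model output type (Figure 5)."""
    if output_type == "measurement":
        return ["RMSE"] if penalize_outliers else ["MAE"]
    if output_type == "classes":
        if threshold_fixed:
            return ["F1 score"] if imbalanced else ["accuracy"]
        metrics = ["AUC"]
        # AUPR accompanies a high AUC on heavily imbalanced classes.
        if imbalanced and auc is not None and auc > 0.8:
            metrics.append("AUPR")
        return metrics
    if output_type == "area":
        return ["ROI"] if detect_in_regions else ["IoU"]
    if output_type == "text":
        if not accurate_tagger:
            return ["manual scoring"]
        return ["F1 score"] if imbalanced else ["accuracy"]
    raise ValueError(f"unknown output type: {output_type!r}")


print(recommend_metrics("classes", imbalanced=True, auc=0.93))
print(recommend_metrics("measurement", penalize_outliers=True))
```

As the text notes, deviations from these suggestions are possible when motivated, so such a function is a reminder of the defaults rather than a rule.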



Table 4. Clinical AI Research (CAIR) Checklist Proposal

Section: Reporting recommendations

TITLE AND ABSTRACT

Include that the method contains or uses an AI/ML. Broad terms such as “artificial intelligence” or “machine learning” are encouraged, but “deep learning” or similarly broadly specific terms would also work well. More precise terms are best suited in the abstract.

State the AI tool’s intended use or purpose, in a disease context in the title and/or the abstract. What is the targeted condition?

INTRODUCTION

The introduction should focus on the clinical problem. The AI component is the tool used to solve the clinical problem. If possible, explain the AI’s intended part within the clinical pathway.

METHODS

State inclusion and exclusion criteria, at the participant and the input data level, separately.
● State why these criteria were used.
● How were rare pathologies handled?

Describe how the input data was acquired, selected, and handled, and include any form of preprocessing before analysis. If there were some specific considerations in handling the data, this should also be specified.
● Was the data split into separate train, validation, and test sets?
● Are there any differences in how test and training sets were selected and processed?
● How were patients or cases that occur more than once handled? Can they be found in both the test and training set? For example, same patient at different points in time or duplicate data.
● Are positive and negative cases from different sources? (For example, perhaps different machines are used in high- or low-probability settings, and the algorithm learns this pattern instead?)
● If there were minimum requirements on the data, state what those requirements were.

Specify if there was a human–AI interaction handling input data and level of expertise of the people handling it.
● How was the ground truth established (e.g., double review with consensus, consensus review, single review, secondary sources)? What was the level of expertise of the source or reviewers? What level of noise was present (e.g., Cohen’s kappa).
● If there was training involved in handling the data, this should be specified.

Describe how missing or poor-quality data was handled.
● Were extreme values or outliers handled separately? Explain how and why.

State the AI model’s specifications, design, and the parameters used in training it.
● The model’s data requirements, to serve its purpose, need to be clearly stated (e.g., data format, dimensions, time, etc.).
● How was the data preprocessed? It should be stated separately for training and test sets.
● What was the model architecture? Was a pre-trained model used? Was it pre-trained for the current study?
● If it was a pre-trained model, is the data the model was pre-trained on also part of the current data sets?
● What regularizers were used? (For example, dropout, white noise, batch normalization, stochastic weight averaging, etc.)
● How was the loss calculated? If a non-standard loss function was used, why was this particular loss chosen?
● What model-specific parameters were used in training the model? For example, learning rate, number of epochs, etc.

State the specific version of the AI model used in the study. AI models are likely to undergo many iterations. It is important for reproducibility and tracking changes in the model if reused or implemented in a later study.

Specify the output of the AI. The output affects the model interpretation and post-processing.
● What was the type of output? For example, probabilities, bounding boxes, text, segmented images, models?

Explain how the output contributed to decision-making and evaluation of the model.
● In what way was the output decided upon? Sometimes the reason for deciding on that output needs to be specified. For example, when the output depends on a decision threshold, and the model used non-standard thresholds, or when different thresholds are used for different outcomes, it might be necessary to explain why they are different.
● If the output was used in later steps, how was it used? Include explanations of how the outputs informed, or led to, subsequent steps. For example, was the output used in subsequent steps by a user to inform an action or was it combined with a different model?

How was outcome performance measured? At times it could be necessary to state why a performance measure was chosen over a different performance measure.
● The performance measure most likely familiar to the clinicians should be the primary reporting measure. Sometimes, alternate or additional measures are required but it is important to ensure that these are adequately explained.
● In the statistical section, specify the exact version of the measure used, e.g., ROUGE-L-Recall, Rouge-L-Precision, or Rouge-L-F1.
● How was confidence evaluated? Bootstrapping, Monte Carlo simulation, p-value?
● For suggestions on how to choose performance measures, see Figure 5.

RESULTS

Describe the results of analysis and performance errors. If no such analysis was performed, justify why not. Performance errors and failure analysis are important for AI models and help communicate the limitations of the model.

DISCUSSION AND OTHER INFORMATION

State if and how the AI model/data can be accessed, including any restrictions to access or reuse. If it is not possible, state why. Include any details and license. While this is highly desirable, it is not always possible to make data or models readily available or make them available online.

Describe ethical considerations and implications of the model, and/or research. Biases and limitations, the input data or output, that impact generalizability should also be considered.

Guidelines for publishing, reviewing, and evaluating reporting of AI and ML content to clinicians. Clinical trials and clinical trial protocols, including AI interventions, should adhere to the CONSORT-AI (Liu et al. 2020a, 2020b) and SPIRIT-AI (Rivera et al. 2020) checklists. However, those contain minimal reporting requirements. Besides, most studies are not in a clinical trial stage, and some of those recommendations are not necessarily applicable. The table elaborates on some important parts of reporting on studies utilizing AI/ML components.
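One checklist question above asks whether the same patient can appear in both the training and the test set. A common way to rule this out is to split at the patient level rather than the image level. The sketch below shows one possible approach under stated assumptions; the function name, the record layout with a `patient_id` key, and the example records are all hypothetical and not taken from the article.

```python
import random


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split records so that no patient appears in both train and test."""
    patient_ids = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return train, test


# Hypothetical records: some patients contribute several radiographs.
records = [
    {"patient_id": "p1", "image": "p1_visit1.png"},
    {"patient_id": "p1", "image": "p1_visit2.png"},
    {"patient_id": "p2", "image": "p2_visit1.png"},
    {"patient_id": "p3", "image": "p3_visit1.png"},
    {"patient_id": "p3", "image": "p3_visit2.png"},
    {"patient_id": "p4", "image": "p4_visit1.png"},
    {"patient_id": "p5", "image": "p5_visit1.png"},
]

train, test = split_by_patient(records)
shared = {r["patient_id"] for r in train} & {r["patient_id"] for r in test}
print(f"{len(train)} training images, {len(test)} test images, "
      f"{len(shared)} patients in both sets")
```

Whatever method is used, reporting it explicitly answers the corresponding checklist item.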



Table 5. Example sentences for medical text analysis using BLEU and ROUGE

     Sentence                                                      B1     B2     B3     B4     ROU
GT   Subtle impacted intertrochanteric hip fracture
H1   No subtle impacted intertrochanteric hip fracture             83.3   81.6   79.4   76.0   100.0
H2   There is a hip fracture clearly apparent on the radiograph    20.0   14.9   28.1   38.6   40.0

H1 scored higher than H2 compared with the ground truth (GT, human-generated), using BLEU-1/-2/-3/-4 (B1, B2, B3, B4) and ROUGE-L (ROU). However, given the ground truth (GT), H2 is clinically correct, while H1 is not.

Clinical Artificial Intelligence Research (CAIR) Checklist See Table 4.

Discussion
AI and ML will most likely impact medicine in more ways than we can imagine. Arguing that clinicians need to be involved, we began by describing different tasks and pitfalls in machine learning and shared some ways to address them. We followed by presenting the related concept of performance measures. Performance measures describe the result of the study. Using the right performance measure will give a correct context to the outcome. However, the choice of performance measure is not always clear and occasionally depends on an experiment’s stage and the audience. For a prospective study or development of an AI model, a measure such as AUC is appropriate. When we use an AI model as an intervention in a clinical trial or in a production setting, where we implement a specific AI system, the actual expected performance is more important. In that setting, MCC or precision-recall analysis with AUPR and F1-score are more suitable, as AUC could overestimate the model’s performance (Chicco and Jurman 2020). We discussed some of the fundamental ethical problems and consequences of algorithmic medicine and AI interventions. We believe that this is essential for understanding and evaluating AI studies, including their limitations. Ethical considerations can limit individual AI systems, but those limitations are sometimes necessary to safeguard the patients, who are the ultimate beneficiaries of medical AI. The Enhancing the Quality and Transparency of Health Research (EQUATOR) network (Pandis and Fedorowicz 2011) defines an AI intervention as an intervention that relies on an AI/DL/ML component (Liu et al. 2020a, 2020b, Rivera et al. 2020). In line with the growing importance of AI research in healthcare, the SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials) 2013 and the CONSORT (Consolidated Standards of Reporting Trials) 2010 statements were amended in 2020 with the SPIRIT-AI and CONSORT-AI checklists.
SPIRIT-AI and CONSORT-AI are additional checklists meant to deal with the particulars of AI studies, especially the biases involved. They do not specify how to conduct AI studies but give minimal recommendations for reporting on them. Similar protocols for other study types are under development. For diagnostic and prognostic studies, STARD-AI (Standards for Reporting Diagnostic Accuracy-Artificial Intelligence) reporting guidelines are in development. Moreover, TRIPOD-ML (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis–Machine Learning) is in development (Liu et al. 2020b). However, CONSORT-AI and SPIRIT-AI are minimal reporting checklists. We presented a proposal for recommendations and guidelines on reporting AI and ML research to clinicians and other healthcare stakeholders. We also proposed the CAIR checklist to facilitate these recommendations. We envision this proposal as the starting point of a broader consensus process on reporting, presenting, and understanding AI studies’ outcomes. We also hope to help healthcare professionals and other healthcare stakeholders interpret these studies. We have not fully covered all aspects of AI and ML in medicine or orthopedics, which would be an impossible task. This paper focuses on 3 important areas for understanding and evaluating AI research in medicine. We have picked tasks commonly found in medical AI studies that are most likely to be encountered in orthopedics research. The selection can, and will, change as the field and clinicians’ familiarity with it evolve. For example, while drafting this paper, the CONSORT-AI and SPIRIT-AI guidelines were published, but TRIPOD-ML and STARD-AI have not yet been.

Conclusion
With the advancement of technology, computational power, and a great deal of research, AI will be an important clinical tool. For some this is a cause of concern, while for others this is an opportunity to improve health outcomes. What matters, in the end, is what is best for the patient. New tools can help clinicians do a better and more reliable job and automate tedious and trivial tasks, allowing them to focus on complex tasks. There are also risks associated with AI. One risk is that clinicians do not understand or take part in the process around them. Alternatively, they may embrace AI without understanding its implications. If low-quality or wrongly guided research dominates, the implementation of meaningful outcomes might suffer. Ethical considerations that clinicians face every day are


524

not always shared or understood by the developers behind the tools, who could have a different agenda than clinicians. While we are very far from a time when AI will replace clinicians, we are in a time when clinicians must deal with and benefit from AI. Clinicians need to understand the changes, research, and results that are happening every day. To guide those developments, what is most needed is for clinicians to be part of this development. In order to do that, they need to understand it. The goal of the CAIR checklist is to facilitate this. Ethics, funding, and potential conflicts of interest MG is supported by grants provided by Region Stockholm (ALF project) and JO by grants provided by the Karolinska Institute. MG is a co-founder and shareholder in DeepMed AB. CL is an employee and shareholder of Sectra AB. Supplementary data Supplementary data are available in the online version of this article, http://dx.doi.org/10.1080/17453674.2021.1918389

The authors would like to thank Professor Ion Androutsopoulos, Professor of Artificial Intelligence, Department of Informatics, Athens University of Economics and Business, for his support and valuable comments.

Author contributions (according to CRediT (https://casrai.org/credit/))
JO: Conceptualization, data curation, formal analysis, investigation, methodology, visualization, writing – original draft, writing – review & editing. JPavlov: Conceptualization, formal analysis, investigation, methodology, writing – review & editing. JPrijs: Writing – review & editing. FIJ: Writing – review & editing. JD: Writing – review & editing. CL: Writing – review & editing. JH: Writing – review & editing. MG: Conceptualization, data curation, formal analysis, investigation, methodology, visualization, writing – review & editing.

Acta thanks Sebastian Mukka and Anders Troelsen for help with peer review of this study.

Adamson A S, Smith A. Machine learning and health care disparities in dermatology. JAMA Dermatol 2018; 154(11): 1247.
Anderson P, Fernando B, Johnson M, Gould S. SPICE: Semantic Propositional Image Caption Evaluation. arXiv:1607.08822 [cs] [Internet] 2016 Jul 29 [cited 2020 Nov 30]. Available from: http://arxiv.org/abs/1607.08822
Badgeley M A, Zech J R, Oakden-Rayner L, Glicksberg B S, Liu M, Gale W, et al. Deep learning predicts hip fracture using confounding patient and healthcare variables. NPJ Digit Med 2019; 2(1): 1-10.
Bandos A I, Obuchowski N A. Evaluation of diagnostic accuracy in free-response detection-localization tasks using ROC tools. Stat Methods Med Res 2019; 28(6): 1808-25. doi: 10.1177/0962280218776683.
Banerjee S, Lavie A. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization [Internet]. Ann Arbor, MI: Association for Computational Linguistics; 2005 [cited 2020 Nov 30]. p. 65-72. Available from: https://www.aclweb.org/anthology/W05-0909

Acta Orthopaedica 2021; 92 (5): 513–525

Beil M, Proft I, van Heerden D, Sviri S, van Heerden P V. Ethical considerations about artificial intelligence for prognostication in intensive care. Intensive Care Med Exp 2019; 7(1): 70. doi: 10.1186/s40635-019-0286-6.
Chakraborty D P. A brief history of free-response receiver operating characteristic paradigm data analysis. Academic Radiology 2013; 20(7): 915-9.
Chicco D, Jurman G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 2020; 21(1): 6.
Esteva A, Kuprel B, Novoa R A, Ko J, Swetter S M, Blau H M, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017; 542(7639): 115.
Felländer-Tsai L. AI ethics, accountability, and sustainability: revisiting the Hippocratic oath. Acta Orthop 2020; 91(1): 1-2. doi: 10.1080/17453674.2019.1682850.
Fleming N. How artificial intelligence is changing drug discovery. Nature 2018; 557(7707): S55-7.
Gale W, Oakden-Rayner L, Carneiro G, Bradley A P, Palmer L J. Producing radiologist-quality reports for interpretable artificial intelligence. arXiv:1806.00340 [cs] [Internet] 2018 Jun 1 [cited 2020 Nov 30]. Available from: http://arxiv.org/abs/1806.00340
Gulshan V, Peng L, Coram M, Stumpe M C, Wu D, Narayanaswamy A, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 2016; 316(22): 2402.
Gulshan V, Rajan R P, Widner K, Wu D, Wubbels P, et al. Performance of a deep-learning algorithm vs manual grading for detecting diabetic retinopathy in India. JAMA Ophthalmol 2019; 137(9): 987-93.
Hedlund J, Eklund A, Lundström C. Key insights in the AIDA community policy on sharing of clinical imaging data for research in Sweden. Scientific Data 2020; 7(1): 331.
Kamulegeya L H, Okello M, Bwanika J M, Musinguzi D, Lubega W, Rusoke D, et al. Using artificial intelligence on dermatology conditions in Uganda: a case for diversity in training data sets for machine learning. bioRxiv 2019 Oct 31; 826057.
Kougia V, Pavlopoulos J, Androutsopoulos I. A survey on biomedical image captioning. arXiv:1905.13302 [cs] [Internet] 2019 May 26 [cited 2020 Dec 1]. Available from: http://arxiv.org/abs/1905.13302
Lin C-Y. ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out [Internet]. Barcelona, Spain: Association for Computational Linguistics; 2004 [cited 2020 Nov 30]. p. 74-81. Available from: https://www.aclweb.org/anthology/W04-1013
Liu X, Cruz Rivera S, Moher D, Calvert M J, Denniston A K. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nat Med 2020a; 26(9): 1364-74.
Liu X, Rivera S C, Faes L, Keane P A, Moher D, Calvert M, et al. CONSORT-AI and SPIRIT-AI: new reporting guidelines for clinical trials and trial protocols for artificial intelligence interventions. Invest Ophthalmol Vis Sci 2020b; 61(7): 1617-1617.
Lupton M. Some ethical and legal consequences of the application of artificial intelligence in the field of medicine. Trends Med [Internet] 2018 [cited 2020 Oct 17]; 18(4). Available from: https://www.oatext.com/some-ethical-and-legal-consequences-of-the-application-of-artificial-intelligence-in-the-field-of-medicine.php
Manning C. Artificial intelligence definitions [Internet] 2020 [cited 2020 Nov 26]. Available from: https://hai.stanford.edu/sites/default/files/2020-09/AI-Definitions-HAI.pdf
Michie D, Spiegelhalter D J, Taylor C C. Machine learning, neural and statistical classification. 1994 [cited 2016 Dec 7]. Available from: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.27.355
Mittelstadt B D, Allo P, Taddeo M, Wachter S, Floridi L. The ethics of algorithms: mapping the debate. Big Data & Society 2016; 3(2): 2053951716679679.
Nicolaes J, Raeymaeckers S, Robben D, Wilms G, Vandermeulen D, Libanati C, et al. Detection of vertebral fractures in CT using 3D convolutional neural networks. arXiv:1911.01816 [cs, eess] [Internet] 2019 Nov 5 [cited 2020 Oct 9]. Available from: http://arxiv.org/abs/1911.01816



Obuchowski N A, Lieber M L, Powell K A. Data analysis for detection and localization of multiple abnormalities with application to mammography. Acad Radiol 2000; 7(7): 516-25.
Olczak J, Emilson F, Razavian A, Antonsson T, Stark A, Gordon M. Ankle fracture classification using deep learning: automating detailed AO Foundation/Orthopedic Trauma Association (AO/OTA) 2018 malleolar fracture identification reaches a high degree of correct classification. Acta Orthop 2021; 92(1): 102-8. doi: 10.1080/17453674.2020.1837420.
Pandis N, Fedorowicz Z. The International EQUATOR Network: enhancing the quality and transparency of health care research. J Appl Oral Sci 2011; 19(5): 0.
Papineni K, Roukos S, Ward T, Zhu W-J. BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics [Internet]. Stroudsburg, PA: Association for Computational Linguistics; 2002 [cited 2016 Nov 3]. p. 311-318. (ACL ’02). Available from: http://dx.doi.org/10.3115/1073083.1073135
Paul D, Sanap G, Shenoy S, Kalyane D, Kalia K, Tekade R K. Artificial intelligence in drug discovery and development. Drug Discovery Today [Internet] 2020 Oct 21 [cited 2020 Nov 26]. Available from: http://www.sciencedirect.com/science/article/pii/S1359644620304256
Pocevičiūtė M, Eilertsen G, Lundström C. Survey of XAI in digital pathology. In: Holzinger A, Goebel R, Mengel M, Müller H, editors. Artificial intelligence and machine learning for digital pathology. Lecture Notes in Computer Science, vol. 12090, 2020. https://doi.org/10.1007/978-3-030-50402-1_4
Qi Y, Zhao J, Shi Y, Zuo G, Zhang H, Long Y, et al. Ground truth annotated femoral X-ray image dataset and object detection based method for fracture types classification. IEEE Access 2020; 8: 189436-44.
Rivera S C, Liu X, Chan A-W, Denniston A K, Calvert M J, Ashrafian H, et al. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Lancet Digital Health 2020; 2(10): e549-60.
Rudin C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 2019; 1(5): 206-15.
Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One 2015; 10(3): e0118432. doi: 10.1371/journal.pone.0118432.
Sendak M P, Gao M, Brajer N, et al. Presenting machine learning model information to clinical end users with model facts labels. NPJ Digit Med 2020; 3(41). https://doi.org/10.1038/s41746-020-0253-3
Shim E, Kim J Y, Yoon J P, Ki S-Y, Lho T, Kim Y, et al. Automated rotator cuff tear classification using 3D convolutional neural network. Scientific Reports 2020; 10(1): 15632.
Vedantam R, Zitnick C L, Parikh D. CIDEr: consensus-based image description evaluation. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2015. p. 4566-75.



Acta Orthopaedica 2021; 92 (5): 526–531

Wide range of applications for machine-learning prediction models in orthopedic surgical outcome: a systematic review

Paul T OGINK 1, Olivier Q GROOT 2, Aditya V KARHADE 2, Michiel E R BONGERS 2, F Cumhur ONER 1, Jorrit-Jan VERLAAN 1, and Joseph H SCHWAB 2

1 Department of Orthopedic Surgery, University Medical Center Utrecht – Utrecht University, Utrecht, The Netherlands; 2 Department of Orthopedic Surgery, Orthopedic Oncology Service, Massachusetts General Hospital – Harvard Medical School, Boston, USA
Correspondence: ptogink@gmail.com
Submitted 2020-11-15. Accepted 2021-04-14.

Background and purpose – Advancements in software and hardware have enabled the rise of clinical prediction models based on machine learning (ML) in orthopedic surgery. Given their growing popularity and their likely implementation in clinical practice we evaluated which outcomes these new models have focused on and what methodologies are being employed.

Material and methods – We performed a systematic search in PubMed, Embase, and Cochrane Library for studies published up to June 18, 2020. Studies reporting on non-ML prediction models or non-orthopedic outcomes were excluded. After screening 7,138 studies, 59 studies reporting on 77 prediction models were included. We extracted data regarding outcome, study design, and reported performance metrics.

Results – Of the 77 identified ML prediction models the most commonly reported outcome domain was medical management (17/77). Spinal surgery was the most commonly involved orthopedic subspecialty (28/77). The most frequently employed algorithm was neural networks (42/77). Median size of datasets was 5,507 (IQR 635–26,364). The median area under the curve (AUC) was 0.80 (IQR 0.73–0.86). Calibration was reported for 26 of the models and 14 provided decision-curve analysis.

Interpretation – ML prediction models have been developed for a wide variety of topics in orthopedics. Topics regarding medical management were the most commonly studied. Heterogeneity between studies is based on study size, algorithm, and time-point of outcome. Calibration and decision-curve analysis were generally poorly reported.

Surgical decision-making in orthopedic surgery involves weighing the benefits of an intervention against its inherent risks. Prognostic scoring tools have been devised to individualize risk prediction and thus improve surgical decision-making (Janssen et al. 2015, Pereira et al. 2016, Shah et al. 2018). Although clinical prediction models are not new, recent advancements in artificial intelligence have created a host of prediction models based on machine learning (ML) (Cabitza et al. 2018). ML is a branch of artificial intelligence that enables computer algorithms to learn from experience with large datasets without explicit programming. Figure 1 shows 3 commonly employed algorithms.

Existing reviews of machine learning studies have provided a broad overview of applications ranging from vision to natural language processing and predictive analytics (Cabitza et al. 2018). To our knowledge, no study has critically assessed the body of studies focused on ML prediction models for surgical outcome in orthopedics. These types of prediction models are most likely the first branch of artificial intelligence to be employed in clinical practice (Staartjes et al. 2020). Therefore, familiarizing practicing orthopedic surgeons with ML’s concepts and the topics these new methods have focused on can optimize their implementation in the clinic. As such, the purpose of this systematic review is to (1) evaluate which surgical outcomes orthopedic clinical prediction models have focused on, and (2) determine which techniques current prediction models use for development and validation.

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1932928

Figure 1. (A) Decision trees are hierarchical structures in which each node performs a test on the input value with the subsequent branches representing the outcomes. Their graphical representation as seen here makes them easy to understand and interpret. However, they are prone to overfitting. (B) Neural networks are based on interconnected nodes. The input features are represented by the first (blue) layer. The designated outcome is represented by the final (green) layer. The middle, hidden layers (blue and orange) base their output on the input they get from prior layers. Neural networks have been around for a long time and offer good discriminative abilities, but interpretation of the relationships between the different layers remains difficult. (C) Support vector machines (SVMs) perform classification by determining the optimal separating hyperplane between datapoints, which maximizes the distance between the 2 closest points of either group. They can be used for both linear and nonlinear relationships. While they remain effective in data with a great number of features, they do not work well in larger datasets.

Material and methods
Systematic literature search
Adhering to the 2009 PRISMA guidelines, a systematic search was performed in PubMed, Embase, and the Cochrane Library for articles published up to June 18, 2020. 2 different domains of medical subject headings (MeSH) terms and keywords were combined with “AND” and within the 2 domains the terms were combined with “OR.” The 1st domain included words related to ML and the 2nd domain words related to possible orthopedic specialties (Appendix 1, see Supplementary data). Terms were restricted to MeSH, title, abstract, and keywords. Two reviewers (PTO, OQG) independently screened all titles and abstracts for eligible articles based on predefined criteria. Eligible full-text articles were evaluated and cross-referenced for potentially relevant articles not identified by the initial search (Figure 2). Discrepancies between the 2 reviewers were adjudicated by the senior author (JHS).

Figure 2. Flowchart of study inclusions and exclusions.
Records identified through: PubMed 6,036; Embase 2,819; Cochrane 315
Records after duplicates removed: n = 7,138
Records excluded after screening of title and abstract: n = 6,380
Full-text articles assessed for eligibility: n = 758
Excluded (n = 695): no prediction model, 225; no surgical outcome, 470
Studies included in qualitative synthesis: n = 63

Eligibility criteria
Studies reporting on ML-based prediction models addressing orthopedic surgical outcomes were included, as were all intraoperative and postoperative outcomes. The surgical orthopedic population was defined as disorders of the bones, joints, ligaments, tendons, or muscles treated by any type of operation. We excluded (1) studies that did not include at least 1 ML-based prediction model for surgical outcome (e.g., studies with only logistic regression-based models), (2) non-English studies, (3) studies without available full text, and (4) non-relevant study types such as animal studies, letters to the editors, and case reports.

Assessment of methodological quality
Quality assessment was performed based on a modified nine-item Methodological Index for Non-Randomized Studies (MINORS) checklist (Slim et al. 2003). We made it applicable to our systematic review by including disclosure, study aim, input feature, output feature, validation method, dataset distribution, performance metric, and explanation of the used AI model (Langerhuizen et al. 2019). These 9 items were scored on a binary scale: 0 (not reported or unclear) and 1 (reported and adequate).

Data extraction
Table 1 lists the data we extracted from each study. For this review, 6 main orthopedic surgical outcome domains were identified, consisting of (1) intraoperative complications (e.g., blood transfusion, prolonged operative time), (2) postoperative complications (e.g., venous thromboembolism), (3) survival, (4) patient reported outcome measures (PROMs), (5) medical management (e.g., hospitalization), and (6) other. For studies reporting the performance of multiple ML models, the best performing ML model was used. 13 studies provided multiple models for multiple surgical outcomes; these were extracted separately, resulting in more ML models than studies. Only the 2 performance measures AUC and accuracy were extracted as they were the most commonly reported results.

Table 1. Data extracted from each study
1 Year of publication
2 First author
3 Disease condition
4 Type of surgery
5 Input feature
6 Number of features in final model
7 Type of outcome
8 Time points of outcome
9 Number of output classes
10 ML algorithm used
11 Number of patients
12 Distribution between training, validation, and test set
13 Validation method
14 AUC and accuracy of model
15 Reporting of calibration and Brier score
16 Decision-curve analysis
17 Digital application of the model

Study characteristics
After screening of titles and abstracts, 758 full-text articles were assessed for eligibility and ultimately 59 articles were included reporting on 77 ML prediction models (Table 1). Median sample size was 5,818 (IQR 635–26,869). Using the MINORS criteria, all 59 articles were found to be of similar quality. All included a minimum of 8 out of 9 appraisal items (Appendix 2, see Supplementary data).

Statistics
AUC scores and accuracies in tables are expressed as they were originally reported. For studies that reported multiple results within a single outcome domain (e.g., multiple different postoperative PROMs, each with an independent AUC) averages were taken. The sizes of the training, validation, and test sets are reported as percentages of the total dataset. No meta-analysis was performed because of obvious heterogeneity between studies and in orthopedic applications. However, to summarize the findings in some quantitative form, the median AUC and accuracy of the prediction performance were calculated for all studies. We used Microsoft Excel (Version 16.31; Microsoft Inc, Redmond, WA, USA) for standardized forms for data extraction and quality assessment, and Mendeley as reference management software.

Ethics, funding, and potential conflicts of interest
Institutional review board approval was not required for this systematic review. No external funding was received. The authors have no conflicts of interest to declare.
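As an illustration of the summary step described under Statistics, the sketch below first averages multiple AUCs reported for one model within a single outcome domain and then takes the median and IQR across models. It is not the authors' procedure (they report using Microsoft Excel): the AUC values, the model names, and the function name `summarize` are hypothetical, and the inclusive quartile method is an assumption, since the review does not state which quartile convention was used.

```python
from statistics import mean, median, quantiles

# Hypothetical extracted AUCs (illustrative values only): one list per
# prediction model, holding every AUC reported within one outcome domain.
reported_aucs = {
    "model_a": [0.78, 0.82],        # e.g., 2 PROMs, each with its own AUC
    "model_b": [0.91],
    "model_c": [0.70, 0.74, 0.72],
    "model_d": [0.85],
    "model_e": [0.80],
}

def summarize(values_per_model):
    """Average within each model, then return (median, Q1, Q3) across models."""
    per_model = [mean(values) for values in values_per_model.values()]
    # Assumption: inclusive quartiles; other conventions give slightly different IQRs.
    q1, _, q3 = quantiles(per_model, n=4, method="inclusive")
    return median(per_model), q1, q3

med, q1, q3 = summarize(reported_aucs)
print(f"median AUC {med:.2f} (IQR {q1:.2f}-{q3:.2f})")
```

Averaging before taking the median keeps a model with many reported results from counting more than once in the summary.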

Results
Study design
Table 2 lists the characteristics of all included studies. More than half of the 77 models were developed with data from national databases or registries (42) (Table 3). The median number of predictor variables used in the ML model was 10 (IQR 8–15). Models using national data did not include more variables: 10 (IQR 8–13). 68 of the models had a binary distribution of the outcome variable. Most frequently employed algorithms were neural networks (42) and random forests (30). 36 of the neural networks were single-layer, 5 deep learning, and 1 convolutional. The median number of patients used was 5,507 (IQR 635–26,364). Median AUC was 0.80 (IQR 0.73–0.86) and median accuracy was 79% (IQR 75–88). Calibration was reported for 26 of the models and 23 provided Brier scores. Decision-curve analysis was employed in 14 studies. 18 provided a digital application for their prediction model.
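For readers less familiar with the 2 extracted performance measures, the sketch below shows what they quantify for a binary outcome. The AUC is computed as the proportion of positive-negative case pairs in which the positive case receives the higher predicted score (ties count as one half), and accuracy as the proportion of correct classifications at a chosen threshold. The labels, scores, function names, and the 0.5 threshold are illustrative assumptions and are not taken from any included study.

```python
def auc(labels, scores):
    """Proportion of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Proportion of cases classified correctly when score >= threshold means 'event'."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Hypothetical outcomes (1 = event) and predicted risks for 8 patients.
labels = [1, 0, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.3, 0.6, 0.4, 0.7, 0.8, 0.2, 0.4]

print(auc(labels, scores), accuracy(labels, scores))
```

Unlike accuracy, the AUC does not depend on a single threshold, which is one reason the two measures can rank models differently.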

Table 3. Characteristics of studies (n = 77). Values are count (%) unless otherwise specified

Sample size, median (IQR)                            5,818 (635–26,364)
Predictors included in final model, median (IQR) a   10 (8–15)
Outcome domain
  Medical management                                 17 (22)
  Survival                                           16 (21)
  Complication                                       15 (19)
  PROMs                                              12 (16)
  Intraoperative complication                        3 (3.9)
  Other                                              14 (18)
Orthopedic subspecialty
  Spine                                              28 (36)
  Arthroplasty                                       21 (27)
  Trauma                                             13 (17)
  Oncology                                           6 (7.8)
  Other                                              9 (12)
National/Registry database b                         42 (55)
Split sample
  70:30                                              22 (29)
  80:20                                              19 (25)
  Other                                              36 (46)
ML algorithm c
  Neural network                                     42 (55)
    Single layer                                     36 (47)
    Deep learning                                    6 (8)
    Convolutional                                    1 (1)
  Random forest                                      30 (39)
  Support vector machine                             19 (25)
  Naive Bayes                                        11 (14)
  Stochastic gradient boosting                       10 (13)
Performance metric c
  AUC                                                74 (96)
  Accuracy                                           39 (51)
  Brier score                                        23 (30)
  Calibration                                        26 (34)
Model explanation
  Global                                             34 (44)
  Local                                              17 (22)
Decision curve analysis                              14 (18)
Digital application available                        18 (23)

AUC = area under the curve, IQR = interquartile range, ML = machine learning, PROM = patient reported outcome measure.
a Amount of predictors that were included in the final, best performing machine learning algorithm. In 16% (13/81) this could not be extracted from the study or was unclear.
b This includes databases such as Surveillance, Epidemiology, and End Results (SEER) or American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP).
c Not mutually exclusive.
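The split-sample ratios in Table 3 refer to how a dataset is divided into a training set and a held-out test set; the Discussion also mentions 10-fold cross-validation as a common method of internal validation. The sketch below illustrates both ideas on patient indices only. The dataset size of 1,000, the fixed seed, and the function names are hypothetical, and the included studies may have split their data differently (e.g., stratified by outcome).

```python
import random

def split_indices(n_patients, train_fraction, seed=0):
    """Shuffle patient indices and split them into a training and a test set."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    cut = round(n_patients * train_fraction)
    return indices[:cut], indices[cut:]

def k_fold(indices, k=10):
    """Yield (training, held_out) index lists; each case is held out exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        training = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield training, held_out

# 70:30 split of a hypothetical dataset of 1,000 patients.
train, test = split_indices(1000, 0.70)
# 10-fold cross-validation within the training set only.
folds = list(k_fold(train, k=10))
print(len(train), len(test), len(folds))
```

Cross-validating within the training set only leaves the test set untouched until the final performance estimate.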

Outcome
The most commonly reported outcome domains were medical management (17) and survival (16). Medical management mostly focused on discharge destination (7) and hospitalization (4). The studies on survival all addressed patient survival. 6 survival studies were in orthopedic oncology and 5 in orthopedic trauma. Both medical management and survival had a higher median AUC (0.82 and 0.84, respectively) than the overall median AUC. Spinal surgery was the most commonly involved subspecialty (28).



Discussion
Recent years have seen an increasing interest in artificial intelligence and ML in orthopedics (Bini 2018, Jayakumar et al. 2019). With this systematic review we aimed to provide an introduction to the main concepts of developing ML models for orthopedic surgeons and analyze the current application and design of these models in orthopedic surgery. We found a wide range of potential applications, including prediction of survival in spinal metastases, clinical outcome after shoulder arthroplasty, and hospitalization after hip fracture surgery.

This systematic review has a number of limitations. 1st, due to the relative novelty of this field of research in orthopedic surgery, the variety in study designs renders comparisons and comprehensive quantitative analysis difficult. We therefore opted to perform a qualitative analysis of the current publications. Hopefully, the increasing familiarity with these types of studies will lead to better reporting and open up the possibility to perform quantitative analyses. 2nd, this review is likely influenced by publication bias. ML prediction models with good performance are more likely to be published than models with mediocre or poor performance. This positive publication bias has been shown both in medicine and computational sciences (Boulesteix et al. 2015). The performance measures presented here are therefore likely to be more favorable than those of all developed models. 3rd, despite our efforts to perform a search across multiple online libraries, we have missed a number of studies reporting ML prediction models. Whilst unfortunate, we do not think these omissions will significantly alter our findings on research topics or most utilized methodology as this review included nearly 60 studies.

This systematic review shows that ML models have been developed for a wide variety of topics across all subspecialties within orthopedics. Perhaps surprisingly, medical management was the most studied domain with the majority of models focusing on readmissions and discharge placement. Both readmissions and discharge delays impose a heavy burden on healthcare costs (Wan et al. 2016). Healthcare expenditure has risen steadily throughout the developed world in recent decades (OECD 2019). While there is enormous variation in healthcare systems, government institutions in virtually all countries have looked at improving medical management to help curb costs (Schwierz 2016). Papanicolas et al. (2018) found that activities relating to planning, regulating, and managing health services were a major factor in the difference in healthcare expenditure between the United States and 10 other high-income countries. Shrank et al. (2019) concluded that failure of care coordination, leading to unnecessary readmissions among other things, amounts to $78 billion of waste in the United States. To address this problem the Centers for Medicare and Medicaid Services started the Hospital Readmissions Reduction Program in 2012, incentivizing hospitals to lower readmission rates. Knowing in advance which patients are at risk of being readmitted within 30 days after discharge is crucial, which may explain why so many prediction models focus on this topic. Similarly, knowing in advance where patients are likely to be discharged to makes preventing delayed discharge considerably easier than with the other interventions tried over the years (Bryan 2010, Ou et al. 2011). Furthermore, the databases available in the studies on medical management appear to be larger, enabling researchers to include more variables and create better performing prediction models. These models are more likely to be published, as evidenced by the higher AUC for medical management compared with the overall AUC.

Survival was the other commonly studied outcome domain. Accurately estimating remaining life-expectancy is an important feature in medical decision-making in orthopedic oncology (Pereira et al. 2016). In a patient group with only a limited life-span remaining, the aim of treatment is to preserve quality of life. Accurate survival estimations can guide decision-making on whether or not to perform surgery and, if so, which operative treatment should be opted for (Quinn et al. 2014). With an ageing population and cancer patients surviving longer, the incidence of bone metastases will continue to rise and prediction models will likely play an increasing role in this field (Quinn et al. 2014).

The AAOS Census 2018 showed that the spine was the primary specialty area of only 8.3% of orthopedic surgeons, while one-third of the prediction models were linked to spinal surgery (AAOS Department of Clinical Quality and Value 2019). Cost reduction may also be the driving factor in the overrepresentation of spinal surgery prediction models; the economic cost of spinal surgery is large and growing, with spinal fusions alone costing $30 billion annually in the United States (Johnson and Seifi 2018).
Prediction models could play a role in curbing costs by improving patient selection and surgical decision-making, although this could be said for all other subspecialties. Another possible explanation for the disproportionate number is the overlap with neurosurgery. The neurosurgical field was relatively quick to adopt ML for developing prediction models and had developed several models in spinal surgery earlier on (Senders et al. 2018). Finally, the field of prediction models is expanding but still small. A significant proportion of the prediction models are developed by a few research groups that happen to focus on spine surgery. With the field expanding rapidly and new prediction models being published every month, we expect the overrepresentation of spine surgery to be a temporary feature of a field in its infancy.

While there is wide variation in study design, certain study design elements are fairly similar across most studies. The most common design comprises a binary outcome, either a 70:30 or an 80:20 split between training and test sets, and 10-fold cross-validation (10-FCV) as the method of internal validation. Wide variety exists in study size, time-point of outcome, and choice of ML algorithm. Study size is mostly defined by whether a national database or registry was used for model development. These quality improvement databases offer a large number of datapoints with a variety of variables from a diverse group of hospitals, enabling the creation of prediction models. However, these databases are sometimes flawed by errors and their generalizability is yet to be assessed (Rolston et al. 2017). External validation remains crucial because generalizability outside the geographical origin of the database is not ensured (Janssen et al. 2018). Institutional databases offer the advantage of more accurate data, for instance including PROM data, which can extend over longer periods of time, but often lack adequate size.

The choice of ML algorithm seems largely arbitrary. While studies do list the pros and cons of certain algorithms, no study elaborates on why those algorithms were specifically chosen. A potential reason neural networks and random forests are selected so often is the familiarity of these algorithms. Neural networks have been around for decades, but were limited by lagging computational power (Hopfield 1988). The increase in computational power has led to a significant expansion of what neural networks can process, and scientists have been able to build on the work of previous decades (Schmidhuber 2015). Future research should report on multiple ML algorithms and provide the performance measures of all models, thus enabling comparison between different approaches.

Despite the importance of performance metrics, a mere one-third of prediction models included information on calibration, similar to prior studies assessing prediction models in multiple medical domains (Bouwmeester et al. 2012, Heus et al. 2018). Calibration is important to evaluate whether the model is under- or overestimating the risk regardless of the discriminative abilities. Systematically underestimating risk can lead to undertreatment, while overestimating risk can cause overtreatment (Van Calster and Vickers 2015, Van Calster et al. 2019).
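To make the distinction between discrimination and calibration concrete, the sketch below computes 3 simple quantities for a set of predicted risks: the Brier score, the difference between mean predicted risk and observed event rate, and a grouped comparison of predicted versus observed risk. The outcomes, predicted risks, function names, and the choice of 2 risk groups are illustrative assumptions and do not come from any included study; published models typically use more groups or a smoothed calibration curve.

```python
def brier_score(outcomes, risks):
    """Mean squared difference between predicted risk and observed outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, risks)) / len(outcomes)

def mean_calibration(outcomes, risks):
    """Mean predicted risk minus observed event rate; positive means overestimation."""
    return sum(risks) / len(risks) - sum(outcomes) / len(outcomes)

def calibration_groups(outcomes, risks, groups=2):
    """Sort cases by predicted risk; return (mean predicted, observed rate) per group."""
    pairs = sorted(zip(risks, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    table = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        predicted = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        table.append((predicted, observed))
    return table

# Hypothetical outcomes (1 = event) and predicted risks for 8 patients.
outcomes = [0, 0, 0, 1, 0, 1, 0, 1]
risks = [0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.7, 0.8]

brier = brier_score(outcomes, risks)
difference = mean_calibration(outcomes, risks)
table = calibration_groups(outcomes, risks)
print(brier, difference, table)
```

In this hypothetical example the mean predicted risk exceeds the observed event rate in both risk groups, the pattern of systematic overestimation that could lead to overtreatment.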
To improve the quality of reporting of clinical prediction models, Collins et al. (2015) published the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement. While not tailored for ML prediction models, this guideline can provide a framework for researchers to use during development. Hopefully, a more widespread adoption of the TRIPOD statement can lead to less variation in study designs and better reporting of performance metrics.

Only one-fifth of prediction models have a digital application available. The purpose of prediction models is to aid clinicians and patients in decision-making, which can be achieved only if the models are available for use. Otherwise, predictive analytics based on ML will remain a mere theoretical exercise. Furthermore, researchers should be encouraged not only to provide a digital application of their prediction model, but to share their code as well. In a field in its infancy, code shared by more experienced researchers can guide beginning research groups in their endeavors. Additionally, this can greatly increase the small number of external validation studies being performed.

Acta Orthopaedica 2021; 92 (5): 526–531

In conclusion, ML prediction models have been developed for a wide variety of topics in orthopedic surgery. Topics regarding medical management and survival were the most commonly studied and spine surgery was the most involved subspecialty. Heterogeneity between studies is mostly based on study size, choice of ML algorithm, and time-point of outcome. Most published prediction models showed fair to good discriminative abilities, while calibration was poorly reported. Future studies should preferably include more multi-institutional, prospective databases and develop multiple models enabling comparison between different ML approaches. Also, important performance measures such as calibration should be reported to evaluate the prediction model accurately.

Supplementary data
Table 2 and appendices 1 and 2 are available as supplementary data in the online version of this article, http://dx.doi.org/10.1080/17453674.2021.1932928

All authors made a substantial contribution to the study. PTO, OQG, CO, JJV, and JHS contributed to the conception of the study. PTO and OQG screened all the titles and abstracts. PTO, OQG, AVK, and MB participated in data collection. PTO and OQG conducted the statistical analyses and prepared the manuscript. All authors contributed to interpretation of the data and participated in revision of the manuscript.

Acta thanks Max Gordon and Christoph Hubertus Lohmann for help with peer review of this study.

AAOS Department of Clinical Quality and Value. Orthopaedic Practice in the US 2018. 2019 (January): 1-68. Bini S A. Artificial intelligence, machine learning, deep learning, and cognitive computing: what do these terms mean and how will they impact health care? J Arthroplasty 2018; 33(8): 2358-61. Boulesteix A L, Stierle V, Hapfelmeier A. Publication bias in methodological computational research. Cancer Inform 2015; 14(Suppl. 5): 11-19. Bouwmeester W, Zuithoff N P A, Mallett S, Geerlings M I, Vergouwe Y, Steyerberg E W, Altman D G, Moons K G M. Reporting and methods in clinical prediction research: a systematic review. PLoS Med 2012; 9(5): e1001221. https://doi.org/10.1371/journal.pmed. Bryan K. Policies for reducing delayed discharge from hospital. Br Med Bull 2010; 95(1): 33-46. Cabitza F, Locoro A, Banfi G. Machine learning in orthopedics: a literature review. Front Bioeng Biotechnol 2018; 6: 75. Collins G S, Reitsma J B, Altman D G, Moons K G M. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement. Eur Urol 2015; 67(6): 1142-51. Heus P, Damen J A A G, Pajouheshnia R, Scholten R J P M, Reitsma J B, Collins G S, Altman D G, Moons K G M, Hooft L. Poor reporting of multivariable prediction model studies: towards a targeted implementation strategy of the TRIPOD statement. BMC Med 2018; 16(1): 120. Hopfield J J. Artificial neural networks. IEEE Circuits Devices 1988; 4(5): 3-10. Janssen S J, van der Heijden A S, van Dijke M, Ready J E, Raskin K A, Ferrone M L, Hornicek F J, Schwab J H. 2015 Marshall Urist Young Investigator Award: Prognostication in patients with long bone metastases: does a boosting algorithm improve survival estimates? Clin Orthop Relat Res 2015; 473(10): 3112-21.


Janssen D M C, van Kuijk S M J, D’Aumerie B B, Willems P C. External validation of a prediction model for surgical site infection after thoracolumbar spine surgery in a Western European cohort. J Orthop Surg Res 2018; 13(1): 114. Jayakumar P, Moore M L G, Bozic K J. Value-based healthcare: can artificial intelligence provide value in orthopaedic surgery? Clin Orthop Relat Res 2019; 477(8): 1777-80. Johnson W C, Seifi A. Trends of the neurosurgical economy in the United States. J Clin Neurosci 2018; 53(2018): 20-6. Langerhuizen D W G, Janssen S J, Mallee W H, Van Den Bekerom M P J, Ring D, Kerkhoffs G M M J, Jaarsma R L, Doornberg J N. What are the applications and limitations of artificial intelligence for fracture detection and classification in orthopaedic trauma imaging? A systematic review. Clin Orthop Relat Res 2019; 477(11): 2482-91. OECD 2019. Health at a Glance 2019. Available at: https://www.oecd-ilibrary.org/social-issues-migration-health/health-at-a-glance-2019_4dd50c09-en. Ou L, Chen J, Young L, Santiano N, Baramy L-S, Hillman K. Effective discharge planning: timely assignment of an estimated date of discharge. Aust Heal Rev 2011; 35(3): 357. Papanicolas I, Woskie L R, Jha A K. Health care spending in the United States and other high-income countries. JAMA 2018; 319(10): 1024-39. Pereira N R P, Janssen S J, Van Dijk E, Harris M B, Hornicek F J, Ferrone M L, Schwab J H. Development of a prognostic survival algorithm for patients with metastatic spine disease. J Bone Joint Surg Am 2016; 98(21): 1767-76. Quinn R H, Randall R L, Benevenia J, Berven S H, Raskin K A. Contemporary management of metastatic bone disease: tips and tools of the trade for general practitioners. Instr Course Lect 2014; 63: 431-41. Rolston J D, Han S J, Chang E F. Systemic inaccuracies in the National Surgical Quality Improvement Program database: implications for accuracy and validity for neurosurgery outcomes research. J Clin Neurosci 2017; 37(2017): 44-7.

Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks 2015: 85-117. Schwierz C. Cost-containment in the European Union 2016. https://ec.europa.eu/info/publications/economy-finance/cost-containment-policies-hospital-expenditure-european-union_en. Senders J T, Staples P C, Karhade A V, Zaki M M, Gormley W B, Broekman M L D, Smith T R, Arnaout O. Machine learning and neurosurgical outcome prediction: a systematic review. World Neurosurg 2018; 109: 476-486.e1. Shah A A, Ogink P T, Nelson S B, Harris M B, Schwab J H. Nonoperative management of spinal epidural abscess: development of a predictive algorithm for failure. J Bone Joint Surg Am 2018; 100(7): 546-55. Shrank W H, Rogstad T L, Parekh N. Waste in the US health care system: estimated costs and potential for savings. JAMA 2019; 322(15): 1501-9. Slim K, Nini E, Forestier D, Kwiatkowski F, Panis Y, Chipponi J. Methodological index for non-randomized studies (Minors): development and validation of a new instrument. ANZ J Surg 2003; 73(9): 712-16. Staartjes V E, Stumpo V, Kernbach J M, Klukowska A M, Gadjradj P S, Schröder M L, Veeravagu A, Stienen M N, van Niftrik C H B, Serra C, Regli L. Machine learning in neurosurgery: a global survey. Acta Neurochir (Wien) 2020; 162(12): 3081-91. Van Calster B, Vickers A J. Calibration of risk prediction models: impact on decision-analytic performance. Med Decis Mak 2015; 35(2): 162-9. Van Calster B, McLernon D J, Van Smeden M, Wynants L, Steyerberg E W, Bossuyt P, Collins G S, MacAskill P, McLernon D J, Moons K G M, Steyerberg E W, Van Calster B, Van Smeden M, Vickers A J. Calibration: the Achilles heel of predictive analytics. BMC Med 2019; 17(1): 1-7. Wan H, Zhang L, Witz S, Musselman K J, Yi F, Mullen C J, Benneyan J C, Zayas-Castro J L, Rico F, Cure L N, Martinez D A. A literature review of preventable hospital readmissions: preceding the Readmissions Reduction Act. IIE Trans Healthc Syst Eng 2016; 6(4): 193-211.


Acta Orthopaedica 2021; 92 (5): 532–537

Properties of SF-6D when longitudinal data from 16,398 spine surgery procedures is applied to 9 national SF-6D value sets

Anders JOELSON 1, Freyr Gauti SIGMUNDSSON 1, and Jan KARLSSON 2

1 Department of Orthopedics, Orebro University School of Medical Sciences and Orebro University Hospital, Orebro; 2 University Health Care Research Center, Faculty of Medicine and Health, Orebro University, Orebro, Sweden
Correspondence: anders@joelson.se
Submitted 2020-10-26. Accepted 2021-03-22.

Background and purpose — There are several national value sets for SF-6D. For studies conducted in countries without a country-specific value set, the authors may use a value set from a neighboring or culturally similar country. We evaluated the consequences of using different national value sets in SF-6D index-based outcome analyses.

Patients and methods — Patients surgically treated for lumbar spinal stenosis or lumbar disk herniation between 2007 and 2017 were recruited from the national Swedish spine register. 16,398 procedures were eligible for analysis. The SF-6D health states were coded to SF-6D preference indices using value sets for 9 countries. The SF-6D index distributions were then estimated with kernel density estimation. The change in SF-6D index before and after treatment was evaluated with the standardized response mean (SRM).

Results — There was a marked variability in mean and shape for the resulting SF-6D index distributions. There were considerable differences in SF-6D index distribution shape before and after treatment using the same value set. The effect sizes of 2-year change (SRM) were in most cases similar when the 9 value sets were applied to pre- and post-treatment data.

Interpretation — We found a marked variability in SF-6D index distributions when a single large data set was applied to 9 national SF-6D value sets. Consequently, we recommend that SF-6D index data from studies conducted in countries without country-specific SF-6D value sets is interpreted with caution.

The Short Form 6-dimensional instrument (SF-6D) (Brazier et al. 1998, 2002, Brazier and Roberts 2004) and the EuroQol 5-dimensional instrument (EQ-5D) (EuroQol Group 1990) are 2 similar multilevel preference-based measures for assessment of general health. The instruments are primarily used for calculation of quality adjusted life years (QALYs) in economic evaluation of health interventions. The two instruments use different national value sets (also called tariffs) to adjust for national differences in experience of health-related quality of life (HRQoL). For EQ-5D, previous studies have raised the concern that data derived from different national value sets is not fully comparable. Van Dongen et al. (2021) estimated EQ-5D index values for 16 country-specific value sets and found that the use of different country-specific value sets has an impact on cost–utility outcomes. This finding is of particular importance when conducting studies in countries without country- or region-specific value sets, as the results may depend on the choice of value set. For SF-6D, data on national variations in SF-6D index distribution is lacking. We evaluated the consequences of using different national value sets in SF-6D index-based outcome analyses. We applied a single large longitudinal SF-6D data set to several national SF-6D value sets to explore differences in SF-6D index distributions and effect sizes before and after treatment. We used SF-36 data (collected before and 2 years after surgery) from the national Swedish spine register (Swespine) for 2 of the most common spinal surgery diagnoses: spinal stenosis and disk herniation.

Patients and methods

Study design
This study was a register study with prospectively collected longitudinal data from the national Swedish spine register, Swespine. Swespine was launched in 1992, the national coverage is 90% of the spine units in Sweden, and the follow-up rate is 75–80% (Strömqvist et al. 2013).

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1915524


SF-6D
SF-6D is a multilevel preference-based measure for assessment of general health (Brazier et al. 1998, 2002, 2020, Brazier and Roberts 2004). The 6 dimensions are: physical functioning (PF), role limitations (RL), social functioning (SF), pain (P), mental health (MH), and vitality (VT). SF-6D is based on 11 items of the Medical Outcomes Study 36-item short-form health survey (SF-36) (Ware and Sherbourne 1992). PF and P have 6 response options, SF, MH, and VT have 5 response options, and RL has 4 response options. The response options are coded on an ordinal scale from 1 to 4–6 (1 being the best). The answers are assembled into a 6-digit health state reflecting the score on each dimension (in total 6×4×5×6×5×5 = 18,000 states, 111111 being the best and 645655 the worst). There are 2 major versions of SF-6D, the SF-36 version (Brazier et al. 2002) (used in this study) and the SF-12 version (Brazier and Roberts 2004).

Each health state can be coded to a preference-based index (hereafter denoted as SF-6D index) using a value set (tariff). The SF-6D index usually ranges between 0 (a state equivalent to death) and 1 (full health), but some SF-6D indices also include values less than 0 (health states worse than death).

There are several national value sets for coding health states to SF-6D indices. For our study, we wanted broad coverage of several continents and we selected the following 9 national value sets for our investigation: Australia (Norman et al. 2014), Brazil (Cruz et al. 2011), China (Lam et al. 2008), Spain (Abellan Perpinan et al. 2012), Hong Kong (McGhee et al. 2011), Lebanon (Kharroubi et al. 2020), Portugal (Ferreira et al. 2010), the UK (Brazier et al. 2002), and the USA (Craig et al. 2013). Our literature review also identified a Dutch value set (Jonker et al. 2018) and a Japanese value set (Brazier et al. 2009). The Dutch value set was excluded from our analysis since it was based on SF-12.
The Japanese value set was excluded because of inferior performance compared with the UK value set in terms of inconsistencies (worsening in dimension did not lower the index) and prediction errors (capability to predict the index of a health state).
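The count of 6×4×5×6×5×5 = 18,000 health states can be checked with a short enumeration. This is a minimal sketch: the dimension order (PF, RL, SF, P, MH, VT) is inferred from the worst state 645655 quoted in the text, and the names `LEVELS` and `all_health_states` are illustrative only. The authors' own implementation was written in R and is not reproduced here.

```python
from itertools import product

# Number of response levels per SF-6D dimension, in the order implied by
# the worst state 645655 (PF, RL, SF, P, MH, VT); the order is an assumption.
LEVELS = {"PF": 6, "RL": 4, "SF": 5, "P": 6, "MH": 5, "VT": 5}


def all_health_states():
    """Enumerate every 6-digit SF-6D health state as a string."""
    ranges = [range(1, n + 1) for n in LEVELS.values()]
    return ["".join(str(level) for level in combo) for combo in product(*ranges)]


states = all_health_states()
print(len(states), states[0], states[-1])
```

A list like this is what a value set would be applied to in order to obtain the index for every possible state.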

Patient data set
Patients were recruited from the national Swedish spine register (Swespine). 52,560 procedures for surgical treatment of lumbar spinal stenosis and lumbar disk herniation between 2007 and 2017 are included in the register. Preoperative or 2-year postoperative SF-6D data was incomplete for 36,162 procedures, which gave 16,398 (31%) procedures eligible for analysis (Table 1).

Table 1. Characteristics of the study population

Factor            Spinal stenosis (n = 8,888)   Disk herniation (n = 7,510)
Age, mean (SE)    66.6 (0.11)                   45.7 (0.15)
BMI (SE)          27.8 (0.05)                   26.4 (0.05)
Females, n (%)    3,999 (45)                    3,345 (45)
Smokers, n (%)    826 (9)                       950 (13)

Ethics, data sharing, funding, and potential conflicts of interest
The study was approved by the regional ethical review board (registration number: 2020-03557). Data are available from the national Swedish spine register (Swespine) after approval by a Swedish regional ethical review board and approval by the Swespine board. There was no external source of funding for this study. The authors declare no conflicts of interest.

Data transformation
SF-36 data was collected from Swespine (preoperative and 2-year postoperative data). SF-36 data was converted to SF-6D and then coded to SF-6D indices using the 9 national value sets. The conversion from health state to index was implemented in the language R (R Foundation for Statistical Computing, Vienna, Austria) using the models given in the references for the 9 value sets (see Supplementary Appendix for notes on implementation).

General properties
To illustrate general properties of the 9 national value sets, we computer-generated a data set consisting of all 18,000 SF-6D health states (111111 to 645655) and then estimated the SF-6D index distribution for all 9 national value sets with kernel density estimation.

Statistics
The effect size (the difference in means in terms of standard deviations) was evaluated using the standardized response mean (SRM) for paired data (the difference in means divided by the standard deviation of the difference) (Fayers and Machin 2016, Table 20.1, p. 535). An approximate 95% confidence interval (CI) for SRM is given by Fayers and Machin (2016, eq. 20.4, p. 542 and Table 20.1, p. 535). The SRM was interpreted as follows (Fayers and Machin 2016, p. 499): < 0.2 no effect, 0.2–0.4 small effect, 0.5–0.7 moderate effect, > 0.7 large effect.

The distribution of a random variable describes how probability is spread over the possible values of that variable. We used kernel density estimation with Gaussian kernels to estimate the distribution of the SF-6D index (hereafter denoted as the SF-6D index distribution). See Supplementary Appendix for details on kernel density estimation.
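The SRM described under Statistics can be sketched in a few lines. This is a hedged illustration rather than the authors' R code. The use of the sample standard deviation (n − 1 denominator), the handling of the unassigned 0.4–0.5 band, the function names, and the five paired index values are all assumptions made for the example. The confidence interval from Fayers and Machin is not reproduced because its formula is not given in the text.

```python
from statistics import mean, stdev


def standardized_response_mean(pre, post):
    """SRM for paired data: mean of the differences divided by their SD."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [after - before for before, after in zip(pre, post)]
    return mean(diffs) / stdev(diffs)  # stdev = sample SD (n - 1), an assumption


def interpret_srm(srm):
    """Map an SRM to the bands quoted in the text.

    The text leaves 0.4-0.5 unassigned; treating it as 'small' is an assumption.
    """
    size = abs(srm)
    if size < 0.2:
        return "no effect"
    if size < 0.5:
        return "small effect"
    if size <= 0.7:
        return "moderate effect"
    return "large effect"


# Hypothetical index values for five patients, for illustration only
pre_index = [0.50, 0.60, 0.55, 0.65, 0.70]
post_index = [0.60, 0.65, 0.70, 0.70, 0.85]
srm = standardized_response_mean(pre_index, post_index)
print(f"SRM = {srm:.2f} ({interpret_srm(srm)})")
```

Because the SRM divides by the standard deviation of the change, it is unitless, which is why it can stay similar across value sets whose absolute index values differ.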

Results
SF-36 data preoperatively and 2 years postoperatively are presented in Figure 1. Outcome was improved for all domains except for general health (GH).


Figure 1. SF-36 scores for spinal stenosis (n = 8,888) and disk herniation (n = 7,510) preoperatively (green dots) and 2 years postoperatively (blue triangles). The standard errors were less than 0.6 for all domains. PF = physical functioning, RP = role limitation due to physical problems, BP = bodily pain, GH = general health, VT = vitality, SF = social functioning, RE = role limitations due to emotional problems, MH = mental health.
[Figure not reproduced. Panels: SF-36 score spinal stenosis; SF-36 score disk herniation. Horizontal axis: SF-36 domains; vertical axis: score, 0 to 100.]

Figure 2. Frequency of SF-6D states (triangles) before and after surgical treatment for spinal stenosis (n = 8,888) and disk herniation (n = 7,510).
[Figure not reproduced. Panels: SF-6D states preop. and postop. for spinal stenosis and for disk herniation. Horizontal axis: SF-6D states, 111111 to 645655; vertical axis: frequency.]

Figure 3. Kernel density estimates of the SF-6D index distributions for 9 different national SF-6D value sets based on data from patients treated for spinal stenosis (n = 8,888) and patients treated for disk herniation (n = 7,510).
[Figure not reproduced. Panels for Australia, Brazil, China, Hong Kong, Lebanon, Portugal, Spain, UK, and USA, each for spinal stenosis and disk herniation preoperatively and postoperatively. Horizontal axis: SF-6D index, −0.5 to 1.0.]

Figure 4. Kernel density estimates of the SF-6D index distributions for 9 different national SF-6D value sets based on a computer-generated data set consisting of all possible 18,000 SF-6D health states (111111 to 645655).
[Figure not reproduced. One panel per value set: Australia, Brazil, China, Hong Kong, Lebanon, Portugal, Spain, UK, and USA. Horizontal axis: SF-6D index, −0.5 to 1.0.]
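The kernel density estimates in Figures 3 and 4 rest on a simple idea: each observed index value contributes a small Gaussian bump, and the bumps are averaged. The sketch below is a minimal pure-Python version of that idea, not the authors' implementation. The bandwidth rule (1.06 × SD × n^(−1/5), a common normal-reference rule of thumb) is an assumption because the paper refers to its Supplementary Appendix for details, and the ten index values are hypothetical.

```python
import math
from statistics import stdev


def rule_of_thumb_bandwidth(data):
    """Normal-reference bandwidth h = 1.06 * SD * n^(-1/5) (an assumed choice)."""
    return 1.06 * stdev(data) * len(data) ** (-1 / 5)


def gaussian_kde(data, bandwidth):
    """Return a function x -> estimated density using Gaussian kernels."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centred on each observation
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


# Hypothetical SF-6D index values, for illustration only
indices = [0.31, 0.35, 0.42, 0.55, 0.58, 0.61, 0.64, 0.72, 0.80, 0.83]
h = rule_of_thumb_bandwidth(indices)
density = gaussian_kde(indices, h)

# Evaluate on the index range used in the figures (-0.5 to 1.0)
grid = [i / 100 for i in range(-50, 101)]
curve = [(x, density(x)) for x in grid]
```

The bandwidth controls how smooth the curve is, so the apparent number of modes in a distribution partly depends on this choice.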

The SF-6D state distributions preoperatively and postoperatively are presented in Figure 2. The distribution of health states showed some clustering both preoperatively and postoperatively. The health states shifted towards lower (better) values postoperatively for both spinal stenosis and disk herniation. The best possible health state (111111) was the most common state 2 years postoperatively for both spinal stenosis (n = 189, 2.1%) and disk herniation (n = 323, 4.3%). The worst possible health state (645655) was uncommon (< 0.5% of the health states).

The estimation of the SF-6D index distributions for the 9 national value sets is given in Figure 3. There were marked differences in distribution shapes (unimodal, bimodal, and multimodal). The estimation of the SF-6D index distributions for all 18,000 SF-6D health states (111111 to 645655) is shown in Figure 4. There were substantial differences primarily in widths but also in shapes of the distributions.

Table 2 summarizes the mean and median SF-6D indices for the different national SF-6D value sets. The mean and median values were similar for a given national value set. There were, however, substantial differences between the national value sets. The effect sizes of 2-year change (SRM) were in most cases similar when the 9 value sets were applied to pre- and post-treatment data. For spinal stenosis the patients had moderate treatment effects and for disk herniation the treatment effects were large.

Table 2. Preop and postop SF-6D indices for 16,398 spine surgery patients based on different national value sets

Country      Preop            Preop               Postop            Postop               Effect size
             mean (SE)        median (IQR)        mean (SE)         median (IQR)         SRM (95% CI)

Spinal stenosis (n = 8,888)
Australia    0.27 (0.0024)    0.26 (0.10–0.44)    0.48 (0.0031)     0.50 (0.24–0.70)     0.76 (0.74–0.78)
Brazil       0.58 (0.0010)    0.57 (0.51–0.63)    0.67 (0.0014)     0.67 (0.57–0.78)     0.70 (0.68–0.73)
China        0.54 (0.0015)    0.55 (0.45–0.64)    0.67 (0.0019)     0.68 (0.55–0.81)     0.72 (0.70–0.75)
Spain        0.37 (0.0023)    0.38 (0.24–0.51)    0.57 (0.0029)     0.58 (0.38–0.81)     0.75 (0.73–0.78)
Hong Kong    0.59 (0.0013)    0.55 (0.50–0.69)    0.70 (0.0017)     0.72 (0.54–0.84)     0.71 (0.69–0.73)
Lebanon      0.70 (0.0012)    0.70 (0.63–0.78)    0.79 (0.0014)     0.81 (0.69–0.92)     0.71 (0.69–0.74)
Portugal     0.75 (0.0010)    0.74 (0.69–0.81)    0.82 (0.0012)     0.83 (0.74–0.92)     0.67 (0.65–0.69)
UK           0.61 (0.0013)    0.61 (0.54–0.70)    0.72 (0.0015)     0.73 (0.61–0.84)     0.73 (0.71–0.76)
USA          0.81 (0.0006)    0.81 (0.77–0.85)    0.86 (0.0008)     0.86 (0.81–0.93)     0.69 (0.66–0.71)

Disk herniation (n = 7,510)
Australia    0.18 (0.0025)    0.15 (0.02–0.33)    0.59 (0.0034)     0.64 (0.40–0.83)     1.33 (1.29–1.36)
Brazil       0.54 (0.0011)    0.54 (0.48–0.59)    0.71 (0.0016)     0.69 (0.61–0.78)     1.18 (1.15–1.21)
China        0.47 (0.0018)    0.47 (0.36–0.58)    0.73 (0.0020)     0.75 (0.63–0.84)     1.29 (1.26–1.32)
Spain        0.27 (0.0026)    0.30 (0.12–0.43)    0.67 (0.0030)     0.74 (0.51–0.87)     1.34 (1.31–1.38)
Hong Kong    0.54 (0.0014)    0.50 (0.46–0.62)    0.76 (0.0018)     0.81 (0.66–0.87)     1.30 (1.27–1.33)
Lebanon      0.65 (0.0013)    0.65 (0.57–0.73)    0.84 (0.0015)     0.88 (0.76–0.94)     1.30 (1.27–1.33)
Portugal     0.71 (0.0012)    0.72 (0.65–0.78)    0.86 (0.0012)     0.88 (0.81–0.93)     1.21 (1.18–1.24)
UK           0.56 (0.0014)    0.56 (0.47–0.64)    0.77 (0.0016)     0.80 (0.68–0.86)     1.29 (1.26–1.32)
USA          0.79 (0.0007)    0.79 (0.74–0.83)    0.89 (0.0009)     0.90 (0.84–0.96)     1.17 (1.14–1.20)
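The contrast between similar effect sizes and differing absolute changes can be made concrete using the spinal stenosis means reported in Table 2. The sketch below only subtracts the preoperative mean from the postoperative mean for each value set; it is not a QALY calculation, and the dictionary and function names are illustrative.

```python
# Mean SF-6D index for spinal stenosis, taken from Table 2
PRE_MEAN = {
    "Australia": 0.27, "Brazil": 0.58, "China": 0.54, "Spain": 0.37,
    "Hong Kong": 0.59, "Lebanon": 0.70, "Portugal": 0.75, "UK": 0.61,
    "USA": 0.81,
}
POST_MEAN = {
    "Australia": 0.48, "Brazil": 0.67, "China": 0.67, "Spain": 0.57,
    "Hong Kong": 0.70, "Lebanon": 0.79, "Portugal": 0.82, "UK": 0.72,
    "USA": 0.86,
}


def mean_change(pre, post):
    """Difference in mean index (post minus pre) per value set, to 2 decimals."""
    return {country: round(post[country] - pre[country], 2) for country in pre}


changes = mean_change(PRE_MEAN, POST_MEAN)
for country, delta in sorted(changes.items(), key=lambda item: item[1]):
    print(f"{country:<10} {delta:+.2f}")
```

On these figures the change in mean index ranges from 0.05 (USA) to 0.21 (Australia), roughly a fourfold spread, while the corresponding SRMs in Table 2 all lie between 0.67 and 0.76.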

Discussion The primary purpose of this study was to investigate whether the choice of national value set had any impact on the SF-6D indices. To our knowledge our study is the first comparison of several different SF-6D national value sets based on a large longitudinal data set. We found a marked variability in SF-6D index distributions when a single large data set was applied to 9 national SF-6D value sets (Figure 3). There were differences between the different value sets and also differences between distributions before and after surgery using the same value set. This means that it is not only the mean/median SF-6D index that may change after a medical intervention: the entire shape of the distribution may be different after an intervention. This finding has consequences for the statistical inference on paired data when SF-6D index before and after a medical intervention is evaluated (assumptions on normality and/or variance equality are violated). There were marked differences in the SF-6D indices between the different value sets, both before and after surgery (Table 2). The effect sizes of 2-year change (SRM), however, were in most cases similar when the 9 value sets were applied to pre- and post-treatment data (see Table 2). This means that evaluations of treatment effects, in some cases, seem to be less
sensitive to differences in value sets than the absolute index values. The SRM is often used to evaluate responsiveness to changes in psychometric evaluations of HRQoL instruments. The SRMs of our study are similar to the SRM reported by Carreon et al. (2013) for a cohort of 1,104 patients who underwent lumbar decompression and fusion. Angst et al. (2017) suggested that the SRM can be used as an approximate estimate of the minimal clinically important difference (MCID) (cut off SRM 0.3–0.5) if an MCID evaluation study is not available for reasons of cost, time, or other constraints. For our data, the minimum SRM was 0.67 (Portuguese value set), which suggests that the improvements in SF-6D, irrespective of choice of value set, are clinically significant for spinal stenosis surgery and disk herniation surgery. QALY gain calculation is different from effect size calculation because QALY gain, as opposed to effect size, is often calculated in terms of differences in means and not in terms of differences in standard deviations (Sassi 2006). For example, in our data for spinal stenosis (see Table 2), the UK mean SF-6D index increases 0.11 points (from 0.61 to 0.72) while the corresponding US index increment is 0.05 points. Consequently, using different SF-6D value sets on exactly the same data set might result in substantial differences in QALY gain. This finding is of particular importance when conducting studies in countries without country- or region-specific value sets, as the results may depend on the choice of value set. Few previous studies have compared different national SF-6D value sets. The validation of the US SF-6D value set (Craig et al. 2013) showed good correlation when applying data of 8,428 respondents to the US and the UK value sets. In contrast, our distribution estimates showed marked differences
in shape and widths between the US and UK distributions. A validation of the Lebanese SF-6D value set (Kharroubi et al. 2020) found marked differences between the Lebanese SF-6D model and the UK SF-6D model. Also, the predictive ability of the Lebanese model was superior to the UK model when applying Lebanese data. The differences in models are confirmed by our distribution estimates (see Figure 3). Ferraz et al. (2019) applied the Brazilian and the UK preference weights on a Brazilian urban population and found only small quantitative differences. In contrast, our study found marked differences in distributions (see Figure 3) and effect sizes (see Table 2) when comparing the Brazilian and UK value sets. One possible explanation for the marked differences in our study is that patients from different countries may fill out the SF-6D differently while being in the same health condition. Consequently, there may be an imbalance between the response pattern of Swedish patients and, e.g., the UK value set. This imbalance might partly explain the differences in SF-6D index distributions found in our study. The estimated SF-6D index distributions based on the computer-generated data set consisting of all 18,000 SF-6D health states (Figure 4) explain some of the properties of the distributions given in Figure 3, e.g., the large width of the Australian distribution and the limited width of the US distribution. Some properties, however, are more difficult to understand, e.g., the bimodality of the Hong Kong distribution. The clustering of SF-6D states illustrated in Figure 2 seems to have only a minor impact on the SF-6D index distribution. Our findings should be evaluated in the light of several limitations. First, we recognize the inherent limitations of register data, e.g., lack of confounder information, missing data, or unknown data quality (Thygesen and Ersbøll 2014). 
Second, the data were limited to spine surgery patients, i.e., persons with problems mainly related to the musculoskeletal system. Third, the conversion of the 18,000 SF-6D health states to the SF-6D index represents a nonlinear multivariate transformation on discrete, sometimes clustered, data. The analysis of such a model is mathematically challenging. Rather than applying more complex mathematical methods, we used descriptive statistics and graphical representations to explore our data. Fourth, we implemented SF-6D using the specification given in the paper by Brazier et al. (2002). The specification has inconsistencies that may introduce systematic errors in our SF-6D data (cf. Supplementary Appendix). Fifth, data were complete for only 31% of the procedures.

In conclusion, we found a marked variability in SF-6D index distributions when a single large data set was applied to 9 national SF-6D value sets. Consequently, studies that aggregate international data, e.g., meta-analyses, may produce misleading results if the underlying differences in SF-6D index distributions are inadequately handled. On the basis of the results of our study we recommend that SF-6D index data from studies conducted in countries without country- or region-specific SF-6D value sets is interpreted with caution.

Supplementary data
The Appendix is available as supplementary data in the online version of this article, http://dx.doi.org/10.1080/17453674.2021.1915524

Study design: AJ, FGS, JK. Analysis of data: AJ. Interpretation of data: AJ, FGS, JK. Drafting the manuscript: AJ. Critically revising the manuscript: FGS, JK.

Acta thanks Filip C Dolatowski and Ivar Rossvoll for help with peer review of this study.

Abellan Perpinan J M, Sanchez Martinez F I, Martinez Perez J E, Méndez I. Lowering the ‘floor’ of the SF-6D scoring algorithm using a lottery equivalent method. Health Econ 2012; 21(11): 1271-85. Angst F, Aeschlimann A, Angst J. The minimal clinically important difference raised the significance of outcome effects above the statistical level, with methodological implications for future studies. J Clin Epidemiol 2017; 82: 128-36. Brazier J E, Roberts J. The estimation of a preference-based measure of health from the SF-12. Med Care 2004; 42(9): 851-9. Brazier J, Usherwood T, Harper R, Thomas K. Deriving a preference-based single index from the UK SF-36 Health Survey. J Clin Epidemiol 1998; 51(11): 1115-28. Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF-36. J Health Econ 2002; 21(2): 271-92. Brazier J E, Fukuhara S, Roberts J, Kharroubi S, Yamamoto Y, Ikeda S, Doherty J, Kurokawa K. Estimating a preference-based index from the Japanese SF-36. J Clin Epidemiol 2009; 62(12): 1323-31. Brazier J E, Mulhern B J, Bjorner J B, Gandek B, Rowen D, Alonso J, Vilagut G, Ware J E, SF-6Dv2 International Project Group. Developing a new version of the SF-6D health state classification system from the SF-36v2: SF-6Dv2. Med Care 2020; 58(6): 557-65. Carreon L Y, Berven S H, Djurasovic M, Bratcher K R, Glassman S D. The discriminative properties of the SF-6D compared with the SF-36 and ODI. Spine (Phila Pa 1976) 2013; 38(1): 60-4. Craig B M, Pickard A S, Stolk E, Brazier J E. US valuation of the SF-6D. Med Decis Making 2013; 33(6): 793-803. Cruz L N, Camey S A, Hoffmann J F, Rowen D, Brazier J E, Fleck M P, Polanczyk C A. Estimating the SF-6D value set for a population-based sample of Brazilians. Value Health 2011; 14(5 Suppl. 1): S108-14. EuroQol Group. EuroQol—a new facility for the measurement of health-related quality of life. Health Policy 1990; 16(3): 199-208. Fayers P M, Machin D.
Quality of life: The assessment, analysis and reporting of patient-reported outcomes. 3rd ed. Chichester: Wiley; 2016. Ferraz M B, Nardi E P, Campolina A G. A comparison of UK and Brazilian SF-6D preference weights when applied to a Brazilian urban population. Value Health Reg Issues 2019; 20: 21-7. Ferreira L N, Ferreira P L, Pereira L N, Brazier J, Rowen D. A Portuguese value set for the SF-6D. Value Health 2010; 13(5): 624-30. Jonker M F, Donkers B, de Bekker-Grob E W, Stolk E A. Advocating a paradigm shift in health-state valuations: the estimation of time-preference corrected QALY tariffs. Value Health 2018; 21(8): 993-1001. Kharroubi S A, Beyh Y, Harake M D E, Dawoud D, Rowen D, Brazier J. Examining the feasibility and acceptability of valuing the Arabic version of SF-6D in a Lebanese population. Int J Environ Res Public Health 2020; 17(3): 1037. Lam C L K, Brazier J, McGhee S M. Valuation of the SF-6D health states is feasible, acceptable, reliable, and valid in a Chinese population. Value Health 2008; 11(2): 295-303. McGhee S M, Brazier J, Lam C L K, Wong L C, Chau J, Cheung A, Ho A. Quality-adjusted life years: population-specific measurement of the quality component. Hong Kong Med J 2011; 17(Suppl. 6): 17-21.


Acta Orthopaedica 2021; 92 (5): 532–537

Norman R, Viney R, Brazier J, Burgess L, Cronin P, King M, Ratcliffe J, Street D. Valuing SF-6D health states using a discrete choice experiment. Med Decis Making 2014; 34(6): 773-86. Rizzo M L. Statistical computing with R. 2nd ed. Boca Raton, FL: CRC Press Taylor & Francis Group; 2019. Sassi F. Calculating QALYs, comparing QALY and DALY calculations. Health Policy Plan 2006; 21(5): 402-8. Strömqvist B, Fritzell P, Hägg O, Jönsson B, Sandén B, Swedish Society of Spinal Surgeons. Swespine: the Swedish spine register: the 2012 report. Eur Spine J 2013; 22(4): 953-74.

537

Thygesen L C, Ersbøll A K. When the entire population is the sample: strengths and limitations in register-based epidemiology. Eur J Epidemiol 2014; 29(8): 551-8. Van Dongen J M, Jornada Ben A, Finch A P, Rossenaar M M M, Biesheuvel-Leliefeld K E M, Apeldoorn A T, Ostelo R W J G, van Tulder M W, van Marwijk H W J, Bosmans J E. Assessing the impact of EQ-5D countryspecific value sets on cost-utility outcomes. Med Care 2021; 59(1): 82-90. Ware J E Jr, Sherbourne C D. The MOS 36-item short-form health survey (SF-36), I: Conceptual framework and item selection. Med Care 1992; 30(6): 473-83.


538

Acta Orthopaedica 2021; 92 (5): 538–543

Preoperative BMD does not influence femoral stem subsidence of uncemented THA when the femoral T-score is > –2.5

Karen DYREBORG 1,2, Michala S SØRENSEN 1, Gunnar FLIVIK 3, Søren SOLGAARD 2, and Michael M PETERSEN 1

1 Department of Orthopaedic Surgery, Rigshospitalet, København, Denmark; 2 Department of Hip and Knee Surgery, Herlev-Gentofte Hospital, Hellerup, Denmark; 3 Department of Orthopaedics, Skåne University Hospital, Lund, Sweden
Correspondence: karendyreborg@hotmail.com
Submitted 2020-12-15. Accepted 2021-03-30.

Background and purpose — It is believed that in uncemented primary total hip arthroplasty (THA) the anchorage of the stem is dependent on the level of bone mineral density (BMD) of the femoral bone. This is one of the reasons for the widely accepted agreement that a cemented solution should be selected for people with osteoporosis or age > 75 years. We evaluated whether preoperative BMD of the femur bone is related to femoral stem migration in uncemented THA. Patients and methods — We enrolled 62 patients (mean age 64 years (range 49–74), 34 males) scheduled for an uncemented THA. Before surgery we undertook DEXA scans of the proximal femur including calculation of the T- and Z-scores for the femoral neck. Evaluation of stem migration by radiostereometric analysis (RSA) was performed with 24 months of follow-up. In 56 patients both preoperative DEXA data and RSA data were available with 24 months of follow-up. Results — None of the patients had a T-score below –2.5. We found no statistically significant relationship between preoperative BMD and femoral stem subsidence after 3 or 24 months. When comparing the average femoral stem subsidence between 2 groups with T-score > –1 and T-score ≤ –1, respectively, we found no statistically significant difference after either 3 or 24 months when measured with RSA. Interpretation — In a cohort of people ≤ 75 years of age and with local femur T-score > –2.5 we found no relationship between preoperative BMD and postoperative femoral stem subsidence of a cementless THA.

Early migration of total hip arthroplasty (THA) femoral stems is expected to some extent (Alfaro-Adrian et al. 2001). Cemented stems migrate less than uncemented stems do, because the initial stabilization is secured with bone cement, but both migrate in a similar pattern (Nysted et al. 2014, Van Der Voort et al. 2015, Teeter et al. 2018). The fixation of the stem and the risk of fracture are believed to rely on the density of the surrounding bone, which is why it is considered rational to fixate THAs in elderly people and/or people with osteoporosis (or other disorders affecting the bone) by using bone cement (Piarulli et al. 2013, Troelsen et al. 2013, Gulati and Manktelow 2017). The BMD of the hip is the most reliable estimate to predict hip fracture risk and is interpreted by using the World Health Organization’s definition of T- and Z-score (Johnell et al. 2005, Blake and Fogelman 2007).

Radiostereometric analysis (RSA) is used to measure the rotations and translations. The migration of interest is primarily translation along the Y-axis (Y-translation), where a negative value is distal migration, i.e., subsidence (Li et al. 2014, Weber et al. 2014, Matejcic et al. 2015).

There are few studies comparing the local BMD with the migration of an uncemented THA stem, but some show that lower femoral BMD leads to increased subsidence (Mears et al. 2009), while other studies cannot demonstrate such a relationship (Moritz et al. 2011). Women with low systemic BMD have been reported to have a tendency to higher migration (Aro et al. 2012, Nazari-Farsani et al. 2020).

Our study is partly based on secondary endpoint data from a randomized controlled trial (RCT) (Dyreborg et al. 2020). The main aim of the present study was to evaluate whether preoperative BMD of 3 regions in the femoral bone is related to femoral stem subsidence in uncemented THA. Furthermore, we determined whether a standard hip dual-energy X-ray absorptiometry (DEXA) scan, normally used for diagnosis of osteoporosis, could be used for the above purpose.
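For readers less familiar with densitometry, the T- and Z-scores referred to above are standard scores: the measured BMD minus a reference mean, divided by the reference standard deviation (young-adult reference for T, age- and sex-matched reference for Z). The Python sketch below illustrates this under stated assumptions. The reference mean and SD are hypothetical placeholders, not the values of the densitometer's reference database, and the function names are ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferencePopulation:
    """Mean and SD of BMD (g/cm^2) in a reference group (hypothetical values)."""
    mean_bmd: float
    sd_bmd: float


def standard_score(bmd: float, ref: ReferencePopulation) -> float:
    """Number of reference SDs by which a measured BMD differs from the reference mean.

    With a young-adult reference this is a T-score; with an age- and
    sex-matched reference it is a Z-score.
    """
    return (bmd - ref.mean_bmd) / ref.sd_bmd


def who_category(t_score: float) -> str:
    """WHO densitometric category for a T-score."""
    if t_score <= -2.5:
        return "osteoporosis"
    if t_score < -1.0:
        return "osteopenia"
    return "normal"


def study_group(t_score: float) -> str:
    """Dichotomy used in this study: T-score > -1 versus T-score <= -1."""
    return "T-score > -1" if t_score > -1.0 else "T-score <= -1"


if __name__ == "__main__":
    # Hypothetical young-adult reference; not taken from the study.
    young_adults = ReferencePopulation(mean_bmd=1.00, sd_bmd=0.12)
    t = standard_score(0.82, young_adults)
    print(round(t, 2), who_category(t), study_group(t))
```

Note that a T-score of exactly –1 counts as normal in the WHO scheme but falls into the lower group under the study's cut-off of ≤ –1.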

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1920163



[Figure 1 plot not reproduced. Axes: stem subsidence (mm) against months after surgery (0, 3, 6, 12, 24).]

Figure 1. Mean subsidence (error bars = standard error of mean) for the 2 groups of uncemented THA femoral stem (n = 56) combined into 1 group.

Figure 2. DEXA scan of the femoral neck.

We hypothesized that low preoperative femoral BMD is related to higher stem subsidence.

Patients and methods
The study is a cohort study with 24 months of follow-up after primary THA. All patients included took part in a prospective randomized clinical RSA trial with 2-year follow-up, where the patients were randomly allocated to receive 1 of 2 uncemented femoral stems (Dyreborg et al. 2020). We could not demonstrate a statistically significant difference between the migrations of the 2 groups, thus we consider them as 1 group for the present study (Figure 1).

Study questions
Primary question
Is there a linear relation between the preoperative BMD in the femur and the degree of postoperative subsidence of the femoral component in primary uncemented THA?

Secondary question
Can a preoperative standard osteoporosis DEXA scan be used in predicting the preoperative BMD of the trochanter area and the shaft region, respectively?

Because the answer to our primary question was negative, i.e., there is no linear relation between preoperative BMD and the degree of postoperative subsidence, we went on and asked: Could there be a relationship between the preoperative BMD when data is dichotomized into subgroups of T-score > –1 or ≤ –1 and Z-score > 0 or ≤ 0 and postoperative stem migration at 3 and 24 months?

Implants
All patients received an uncemented Echo Bi-Metric Full Proximal Profile THA stem or an uncemented Bi-Metric Porous Primary THA stem, a 32 mm chrome-cobalt head and an Exceed ABT RingLoc-X acetabular shell with a highly cross-linked polyethylene liner (Zimmer Biomet Inc, Warsaw, IN, USA). Both stems are press-fit titanium alloy stems with a proximal plasma spray porous titanium coating designed for bone ingrowth and proximal load and weightbearing. The distal part of the stems has a roughened titanium surface for bone ongrowth. None of the implants were coated with hydroxyapatite (HA).

Dual energy X-ray absorptiometry (DEXA)
DEXA scans were performed before surgery at the Department of Orthopaedic Surgery, Rigshospitalet, Denmark. The hips to be operated on were first scanned using the research scan option, starting from the level of the acetabulum and ending 25 cm distally. Sandbags secured stable and neutral rotation of the leg. Additionally, we made a preoperative standardized osteoporosis scan of the hip, with calculation of BMD of the femoral neck and the corresponding T- and Z-scores (normal population: Fem Neck Caucasian Copenhagen 93 v 2.3) (Figure 2). For these scans we used the manufacturer’s special fixation device to fixate the pelvis and lower limbs to ensure a reproducible hip BMD measurement. The results of these scans were not used to exclude any patient from inclusion in the RSA study. The research scans were not analyzed until after 24 months of follow-up had been completed for all patients.

2 regions of interest (ROI) were placed manually on the computerized scan plots to represent the trochanteric region (ROI(t)) and the shaft region (ROI(s)), respectively (Figure 3). ROI(s) was marked first, beginning just distal to the trochanter minor and ending 10 cm more distal. ROI(t) was then placed, beginning proximal to the ROI(s) and including both trochanters up to a line representing the cut-off angle 1 cm proximal to the trochanter minor (45°). In these 2 separate regions the local BMD was automatically calculated by the software.

A Norland XR-46 bone densitometer (Norland Corp, Fort Atkinson, WI, USA) was used for measurements of BMD (g/cm²). For the research scans, scan speed was set at 45 mm/s and the pixel size at 1.0×1.0 mm, and for the standard osteoporosis scans, scan speed was set at 90 mm/s and the pixel size at 1.0×1.0 mm. Quality control of the machine was performed using daily calibration before the first scan. All the DEXA scans were carried out by trained health professionals.

Figure 3. Placement of the regions of interest (ROI) on the femur DEXA scan: the trochanteric region (ROI(t)) and the shaft region (ROI(s)).

Radiostereometric analysis (RSA)
During THA surgery 8 to 10 tantalum markers (Ø = 0.8 mm) were inserted into the regions of both the trochanter major and minor. After mobilization, the patients had their baseline RSA radiographs taken at the Department of Diagnostic Radiology at Rigshospitalet, Copenhagen, Denmark (median = 6 days postoperatively). All RSA pictures were analyzed at the Biomechanics and RSA laboratory at Skåne University Hospital, Lund, Sweden, and with 24 months of follow-up the Y-translation (subsidence) was evaluated with model-based RSA software (version 4.1; RSAcore, Department of Orthopaedic Surgery, LUMC, the Netherlands).

Statistics
Demographic data (age, sex, height, weight, BMI, implant) was found to be normally distributed. No stratification for implant was done. We used linear regression to analyze for a potential relationship between preoperative BMD measured in the femoral neck region, the ROI(t), or ROI(s) and femoral stem migration expressed as the numeric value of the Y-translation at 3 and 24 months. We refer to Y-translation as subsidence unless otherwise stated. Additionally, we used linear regression analysis to evaluate whether preoperative BMD of the femoral neck from a standard osteoporosis DEXA scan could be used to predict the preoperative BMD of the specially designed regions ROI(t) and ROI(s), respectively.

All data is presented as mean with range or 95% confidence intervals (CI) unless otherwise reported, and results of the regression analysis are presented graphically with a scatter plot and the regression line with CI and the 95% prediction limits, the p-value, and the coefficient of correlation (R).

To test whether a possible non-linear relationship between stem migration and BMD existed, the Y-translation data was divided into subgroups based on 2 clinically relevant parameters from the preoperative standardized osteoporosis scan of the hip: T-score > –1 or ≤ –1 and Z-score > 0 or ≤ 0. A possible difference in subsidence between groups of dichotomized data was evaluated using an unpaired t-test. The statistical software RStudio version 1.0.136 was used for all calculations (R Foundation for Statistical Computing, Vienna, Austria).
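The regression step described above can be illustrated with a short Python sketch. It is not the authors' R code: the function names are ours, and the plain least-squares formulas stand in for whatever routine was actually called. It converts Y-translations to subsidence (numeric value), fits a straight line, and computes the correlation coefficient R together with the t statistic used to test R against zero.

```python
import math


def subsidence_from_y_translation(y_translation):
    """Subsidence expressed as the numeric (absolute) value of the Y-translation, in mm."""
    return [abs(y) for y in y_translation]


def _sums_of_squares(x, y):
    """Centered sums of squares and cross-products used by both functions below."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length, at least 3")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return mean_x, mean_y, sxx, syy, sxy


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    _, _, sxx, syy, sxy = _sums_of_squares(x, y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    mean_x, mean_y, sxx, _, sxy = _sums_of_squares(x, y)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing a correlation r against zero."""
    if abs(r) >= 1:
        raise ValueError("t is undefined for |r| >= 1")
    return r * math.sqrt((n - 2) / (1 - r * r))
```

As a plausibility check, one of the femoral neck correlations reported in Figure 5 is R = –0.032. Assuming all 56 analyzed patients contributed to that regression, `t_from_r(-0.032, 56)` gives a t statistic of about –0.24 on 54 degrees of freedom, far from the two-sided 5% critical value of roughly 2.0 and therefore in line with a non-significant p-value.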

Assessed for eligibility, n = 116
  Excluded (n = 54): declined, 48; pilots, 4; disease affecting bone metabolism, 2
Preoperative DEXA and THA, n = 62
  Excluded: revised before 3 months, n = 2
3-months RSA, n = 60
  Excluded (n = 3): dead from other causes, 2; discontinued, 1
3-months RSA, n = 57
  Not analyzed due to technical shortcomings, n = 1
Analyzed, n = 56

Figure 4. Flowchart.

Ethics, registration, funding, and potential conflicts of interests
The study was approved by the local Ethical Committee (H-42014-079), by the Danish Data Protection Agency (GEH2015-079, I-Suite no. 03764) and registered at ClinicalTrials.gov (NCT02656771). The study was carried out in accordance with the principles of the Helsinki Declaration. All patients were informed orally and in writing as prescribed in the recommendations and requirements of the local Scientific Ethical Committees. This work was supported by Zimmer Biomet (grant number C004287X), but the company did not take part in the planning, data collection, analysis, interpretation of the results, or writing of the manuscript. The authors declare no conflict of interests.

Results
From February 2016 to September 2017, we enrolled 62 patients (mean age = 64 years [49–74], 34 males) (Figure 4). Of the 116 patients assessed for eligibility, 56 patients were included for analysis (Table 1). Femoral stem subsidence, expressed as the numeric average value of the Y-translation, was 1.2 mm (0.0–5.8) after 3 months and 1.2 mm (0.0–5.8) after 24 months of follow-up (Figure 1 and Table 1). Linear regression analysis showed no statistically significant relationship between subsidence (after 3 or 24 months) and preoperative BMD measured in the femoral neck region, ROI(t), or ROI(s), respectively (Figure 5).



We found a statistically significant relationship between the preoperative BMD of the femoral neck region measured by standard osteoporosis DEXA scan and BMD measured by research DEXA scans of the ROI(t) (p < 0.001) and the BMD of the ROI(s) (p < 0.001) (Figure 6). None of the patients in the study had a preoperative T-score of the femoral neck below –2.5, but 21 of 56 had a femoral T-score of –1 or lower. 10 of 56 had BMD of the femoral neck region that was at or below the average of individuals of the same age and sex (Z-score ≤ 0) (Table 1). When comparing the average femoral stem subsidence between the 2 groups with T-score > –1 and T-score ≤ –1, we found no statistically significant difference in subsidence between the groups after 3 or 24 months. Likewise, when dividing the material based on the Z-score (Z-score > 0 and Z-score ≤ 0) no statistically significant difference was found between groups after 3 or 24 months (Table 2).

Table 1. Baseline data and results of DEXA and RSA data. Values are mean unless otherwise specified

Variable | Value
Implant (Bi-Metric/Echo Bi-Metric) | 27/29
Sex (male/female) | 29/27
Height (range) | 1.8 (1.6–2)
Weight (range) | 83 (50–124)
BMI (range) | 27 (18–38)
Age (range) | 64 (49–74)
Age, median (IQR) | 67 (11)
T-score (range) | –0.3 (–2.3 to 3.5)
Z-score (range) | 1.2 (–1.3 to 4.5)
BMD, g/cm² (range), femoral neck | 0.9 (0.7–1.5)
BMD, g/cm² (range), femoral shaft | 1.8 (1.2–2.4)
BMD, g/cm² (range), trochanter region | 1.0 (0.7–1.4)
Subsidence, mm (range), at 3 months | 1.2 (0.0–5.8)
Subsidence, mm (range), at 24 months | 1.2 (0.0–5.8)
T-score (> –1/≤ –1) | 35/21
Z-score (> 0/≤ 0) | 46/10

[Figure 5 plots not reproduced. Panels show subsidence at 3 months and at 24 months (mm) against BMD of the femoral neck region (R = –0.032, p = 0.8; R = –0.073, p = 0.6), BMD of ROI(t) (R = –0.017, p = 0.9; R = –0.077, p = 0.6), and BMD of ROI(s) (R = –0.17, p = 0.2; R = –0.12, p = 0.4).]

Figure 5. Linear regression analysis of BMD in the femoral neck region, the ROI(s), and ROI(t) versus subsidence at 3 and 24 months, respectively. The shaded area represents the 95% confidence limits and the red broken lines the 95% prediction limits.
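The comparison of dichotomized groups (for example T-score > –1 versus ≤ –1) amounts to splitting the subsidence values at a threshold and running an unpaired t-test. The Python sketch below shows one way to do this. The article does not state whether equal variances were assumed, so the sketch uses Welch's unequal-variance form, which is the default of R's `t.test`. That choice, the function names, and the illustrative numbers in the usage example are our assumptions, not study data.

```python
import math
from statistics import mean, variance


def split_by_threshold(scores, values, threshold):
    """Split values into (score > threshold, score <= threshold) groups."""
    above = [v for s, v in zip(scores, values) if s > threshold]
    at_or_below = [v for s, v in zip(scores, values) if s <= threshold]
    return above, at_or_below


def welch_t(a, b):
    """Welch's unpaired t statistic and its approximate degrees of freedom."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least 2 observations")
    # Squared standard error of each group mean.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


if __name__ == "__main__":
    # Made-up illustrative values, not patient data.
    t_scores = [0.5, -1.0, -1.5, 0.2, -2.0, 1.1]
    subsidence_mm = [1.0, 2.0, 3.0, 2.0, 4.0, 3.0]
    high, low = split_by_threshold(t_scores, subsidence_mm, threshold=-1.0)
    print(welch_t(high, low))
```

The p-value then follows from the t distribution with the returned degrees of freedom. It is left out here to keep the sketch free of third-party dependencies.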

[Figure 6 plots not reproduced. Panels show BMD of ROI(t) against BMD of the femoral neck region (R = 0.78, p < 0.001) and BMD of ROI(s) against BMD of the femoral neck region (R = 0.59, p < 0.001).]

Figure 6. Linear regression analysis of BMD of the femoral neck region versus BMD of ROI(t) and ROI(s), respectively. The shaded area represents the 95% confidence limits and the red broken lines the 95% prediction limits.

Table 2. P-values for comparison between groups divided by clinically relevant T- and Z-scores and sex

Variable | n | BMD femoral neck, mean (range) | p-value | Subsidence 3 months, mean (range) | p-value | Subsidence 24 months, mean (range) | p-value
T-score > –1 | 35 | 1.0 (0.8–1.5) | < 0.001 | 1.3 (0.0–4.7) | 0.9 | 1.2 (0.0–4.7) | 1
T-score ≤ –1 | 21 | 0.8 (0.7–0.9) | | 1.2 (0.0–5.8) | | 1.2 (0.0–5.8) |
Z-score > 0 | 46 | 1.0 (0.7–1.5) | 0.01 | 1.3 (0.0–5.8) | 0.6 | 1.3 (0.0–5.8) | 0.3
Z-score ≤ 0 | 10 | 0.8 (0.8–0.9) | | 1.1 (0.0–2.4) | | 0.9 (0.8–1.8) |
Female | 27 | 0.9 (0.7–1.2) | < 0.001 | 1.1 (0.0–5.8) | < 0.001 | 1.2 (0.0–5.8) | < 0.001
Male | 29 | 1.0 (0.8–1.5) | | 1.3 (0.0–4.7) | | 1.3 (0.0–4.7) |

Discussion
The main aim of this cohort study was to evaluate whether preoperative BMD of the femoral bone was related to femoral stem migration in uncemented THA up to 24 months after surgery. We found no statistically significant linear relationship between BMD and subsidence in any of the femoral regions investigated at either 3 months or 24 months, and when dividing BMD into clinically relevant groups of either normal or osteopenic femur (femoral T-score between –2.5 and –1) no difference in subsidence between the groups was found.

Previous RSA studies have shown that uncemented THA femoral stems subside more than cemented ones, but in a comparable pattern with the subsidence occurring within the first 3 months followed by relative stabilization with minimal subsidence afterwards (Teeter et al. 2018). Both cemented and cementless fixation of THA femoral stems report good long-term survivorship; nevertheless, the use of the uncemented fixation method is increasing in many countries (Bunyoz et al. 2019). The 2019 annual report from the Danish Hip Arthroplasty Register (2020) shows that for people undergoing THA because of primary osteoarthritis, uncemented THA shows better implant survival when looking at revision due to aseptic loosening. And when the endpoint changes to “all revision causes,” the cementless THA still shows better implant survival for patients younger than 70 years. This may be explained by the increased risk of dislocation of the THA and the increased risk of periprosthetic fracture for patients > 70 years of age operated on with an uncemented THA (Solgaard and Kjersgaard 2014).

In the study by Troelsen et al. (2013), registry data from Australia, New Zealand, Denmark, and England and Wales suggests that cemented fixation for patients older than 75 years results in the lowest risk of revision. This age limit is in accordance with the finding from the National Health And Nutrition Examination Study (NHANES), which shows that the mean T-score of the hip for healthy females is –2.5 at the age of 75 years (Blake and Fogelman 2007). Our results, with no influence of local BMD on migration in patients aged below 75 and a T-score of the femoral neck above –2.5, are considered in good agreement with results of the above-mentioned register study. However, in the study by Mäkelä et al. (2014) the limit for uncemented fixation for THAs is suggested as being as low as 65 years of age, based on data from the Nordic Arthroplasty Register Association database.

Age and osteoporosis reduce the mechanical strength of the bone, lower the bone mass, and affect the regulation of biological factors important for healing (Russell 2013). Although the latter is not fully understood, it is believed that bone cells in osteoporotic bone are likely to have an altered responsiveness to mechanical stimuli and that physical-strength exercise can prevent declining BMD or even lead to an increase in BMD (Augat et al. 2005). When people with osteoporosis need a total hip prosthesis (or any other implant surgery) the anchorage of the implant is impaired and there is a longer period of healing, probably because of slower bone metabolism and cell turnover (Konstantinidis et al. 2016). However, it seems from our results that the threshold for this is T-score ≤ –2.5, since we do not find any linear association suggesting that osteopenic bone has inferior quality to support an uncemented THA.

After ending the study, we looked into the fracture history of the patients. We found that in a 10-year period before hip surgery until January 2021, 4 of the patients have had a fracture: 1 patellar fracture, 1 ankle fracture, 1 clavicular fracture, and 1 fracture of the distal radius. Hence, we found no obvious clinical signs of poor bone quality in this cohort.

Moritz et al. (2011) reported that local intertrochanteric cancellous bone architecture is not a good predictor for RSA migration of anatomically designed cementless femoral stems. This was rather surprising because the rational expectation was that patients with impaired quality of intertrochanteric cancellous bone would reveal more implant migration than patients with normal cancellous bone. Our research group has previously identified a relationship between low preoperative BMD and high postoperative migration of the tibia component in patients with uncemented total knee arthroplasty (Andersen et al. 2017).

Aro et al. (2012) reported that women with low systemic BMD (T ≤ –1) showed higher subsidence of an uncemented femoral stem than women with normal systemic BMD. However, they also included patients with T-score < –2.5 in their group of patients with low BMD, thus making a relationship more probable. Recently, Nazari-Farsani et al. (2020) found that BMD and cortical-bone thickness of the distal radius predict 1-year stem subsidence in postmenopausal women. They used DEXA of the hip, lumbar spine, and distal radius along with pulse-echo ultrasonometry of the distal radius to determine the systemic BMD and cortical thickness. When we compare our results based on sex, it seems men have increased subsidence compared with women, even though their femoral neck BMD preoperatively is higher (Table 2). Based on our study and the above-mentioned studies (Aro et al. 2012, Nazari-Farsani et al. 2020) it seems systemic BMD is a better predictor of subsidence than local femoral BMD.

Often there is no evaluation of bone quality before THA surgery even though the advantages of an enhanced focus make sense (Russell 2013). In our study, it seems there is no influence of preoperative local BMD on migration and no threshold of T-scores at which cemented fixation should be considered to avoid excessive migration (provided the T-score is > –2.5). But, if in doubt, an osteoporosis DEXA scan of the hip prior to surgery could be the answer; it takes less than 10 minutes and gives good visualization of the quality of the bone as it compares the individual patient to a larger number of people. And if the T-score is measured to be > –2.5, cementless fixation probably should be preferred. We have found proof that the femoral neck region BMD obtained by the fast osteoporosis DEXA scan is closely related to the BMD of both the trochanters (where porous surfaces of femoral components are often located) and the shaft of the femoral bone (where the stem is fixed).

It is a limitation that this study has been conducted only on secondary data from an RCT of the Bi-Metric and the Echo Bi-Metric uncemented THA stems. It could be argued that RCT studies with greater power and different design are needed to make more confident conclusions that could be used for other uncemented hip stems. Furthermore, it is also a limitation that our study population did not include patients with hip T-scores below –2.5. Therefore, we cannot identify whether there is an even lower hip BMD threshold for safe use of uncemented hip stems. This would be an interesting topic for a future randomized controlled trial.

In conclusion, we found no association between femoral neck BMD and 24-month subsidence of an uncemented primary THA femoral stem in a population with femoral T-score > –2.5 and age < 75 years.

KD, MS, and MMP designed the study. KD and SS carried out the study. KD, GF, and MS analyzed the data and KD wrote the first article draft. MS, GF, SS, and MMP revised the article.

The authors would like to thank nurse Marina Golemac for her thorough and careful efforts when performing DEXA-scans, and Håkan Leijon of the RSA laboratory in Lund, Skåne University Hospital, for indispensable teaching in how to computerize the RSA pictures and data.

Acta thanks Hannu T Aro and Lene Bergendal Solberg for help with peer review of this study.

Alfaro-Adrian J, Gill H S, Murray D W. Should total hip arthroplasty femoral components be designed to subside? A radiostereometric analysis study of the Charnley Elite and Exeter stems. J Arthroplasty 2001; 16(5): 598-606.
Andersen M R, Winther N S, Lind T, Schroder H M, Flivik G, Petersen M M. Low preoperative BMD is related to high migration of tibia components in uncemented TKA-92 patients in a combined DEXA and RSA study with 2-year follow-up. J Arthroplasty 2017; 32(7): 2141-6.
Aro H T, Alm J J, Moritz N, Mäkinen T J, Lankinen P. Low BMD affects initial stability and delays stem osseointegration in cementless total hip arthroplasty in women: a 2-year RSA study of 39 patients. Acta Orthop 2012; 83(2): 107-14.
Augat P, Simon U, Liedert A, Claes L. Mechanics and mechano-biology of fracture healing in normal and osteoporotic bone. Osteoporos Int 2005; 16(Suppl. 2): 36-43.
Blake G M, Fogelman I. The role of DXA bone density scans in the diagnosis and treatment of osteoporosis. Postgrad Med J 2007; 83: 509-17.
Bunyoz K I, Malchau E, Malchau H, Troelsen A. Has the use of fixation techniques in THA changed in this decade? The uncemented paradox revisited. Clin Orthop Relat Res 2019; 00: 1-8.
Danish Hip Arthroplasty Register (DHR). 2019 National Annual Report; 2020.
Dyreborg K, Andersen M R, Winther N, Solgaard S, Flivik G, Petersen M M. Migration of the uncemented Echo Bi-Metric and Bi-Metric THA stems: a randomized controlled RSA study involving 62 patients with 24-month follow-up. Acta Orthop 2020; 91(6): 693-8.
Gulati A, Manktelow A R J. Even “Cementless” surgeons use cement. J Arthroplasty 2017; 32(9): S47-53.
Johnell O, Kanis J A, Oden A, Johansson H, De Laet C, Delmas P, Eisman J A, Fujiwara S, Kroger H, Mellstrom D, Meunier P J, Melton L J 3rd, O’Neill T, Pols H, Reeve J, Silman A, Tenenhouse A. Predictive value of BMD for hip and other fractures. J Bone Miner Res 2005; 20(7): 1185-94.
Konstantinidis L, Helwig P, Hirschmüller A, Langenmair E, Südkamp N P, Augat P. When is the stability of a fracture fixation limited by osteoporotic bone? Injury 2016; 47: S27-32.
Li Y, Röhrl S M, Bøe B, Nordsletten L. Comparison of two different radiostereometric analysis (RSA) systems with markerless elementary geometrical shape modeling for the measurement of stem migration. Clin Biomech 2014; 29(8): 950-5.
Mäkelä K T, Matilainen M, Pulkkinen P, Fenstad A M, Havelin L, Engesaeter L, Furnes O, Pedersen A B, Overgaard S, Kärrholm J, Malchau H, Garellick G, Ranstam J, Eskelinen A. Failure rate of cemented and uncemented total hip replacements: register study of combined Nordic database of four nations. BMJ 2014; 348(January): 1-10.
Matejcic A, Vidovic D, Nebergall A, Greene M, Bresina S, Tepic S, Malchau H, Hodge W A. New cementless fixation in hip arthroplasty: a radiostereometric analysis. Hip Int 2015; 25(5): 477-83.
Mears S C, Richards A M, Knight T A, Belkoff S M. Subsidence of uncemented stems in osteoporotic and non-osteoporotic cadaveric femora. Proc Inst Mech Eng Part H J Eng Med 2009; 223(2): 189-94.
Moritz N, Alm J J, Lankinen P, Mäkinen T J, Mattila K, Aro H T. Quality of intertrochanteric cancellous bone as predictor of femoral stem RSA migration in cementless total hip arthroplasty. J Biomech 2011; 44(2): 221-7.
Nazari-Farsani S, Vuopio M E, Aro H T. Bone mineral density and cortical-bone thickness of the distal radius predict femoral stem subsidence in postmenopausal women. J Arthroplasty 2020; 35(7): 1877-84.e1.
Nysted M, Foss O A, Klaksvik J, Benum P, Haugan K, Husby O S, Aamodt A. Small and similar amounts of micromotion in an anatomical stem and a customized cementless femoral stem in regular-shaped femurs. Acta Orthop 2014; 85(2): 152-8.
Piarulli G, Rossi A, Zatti G. Osseointegration in the elderly. Aging Clin Exp Res 2013; 25(1 Suppl.): 59-60.
Russell L A. Osteoporosis and orthopedic surgery: effect of bone health on total joint arthroplasty outcome. Curr Rheumatol Rep 2013; 15(11): 371.
Solgaard S, Kjersgaard A G. Increased risk for early periprosthetic fractures after uncemented total hip replacement. Dan Med J 2014; 61(2): 1-4.
Teeter M G, McCalden R W, Yuan X, MacDonald S J, Naudie D D. Predictive accuracy of RSA migration thresholds for cemented total hip arthroplasty stem designs. Hip Int 2018; 28(4): 363-8.
Troelsen A, Malchau E, Sillesen N, Malchau H. A review of current fixation use and registry outcomes in total hip arthroplasty: the uncemented paradox. Clin Orthop Relat Res 2013; 471(7): 2052-9.
Van Der Voort P, Pijls B G, Nieuwenhuijse M J, Jasper J, Fiocco M, Plevier J W M, Middeldorp S, Valstar E R, Nelissen R G H H. Early subsidence of shape-closed hip arthroplasty stems is associated with late revision. Acta Orthop 2015; 86(5): 575-85.
Weber E, Sundberg M, Flivik G. Design modifications of the uncemented Furlong hip stem result in minor early subsidence but do not affect further stability: a randomized controlled RSA study with 5-year follow-up. Acta Orthop 2014; 85(6): 556-61.



Acta Orthopaedica 2021; 92 (5): 544–550

Poor adherence to guidelines in treatment of fragile and cognitively impaired patients with hip fracture: a descriptive study of 2,804 patients

Christina F FRANDSEN 1, Eva N GLASSOU 1,2, Maiken STILLING 1,3, and Torben B HANSEN 1,3

1 University Clinic of Hand, Hip and Knee Surgery, Department of Orthopaedics, Regional Hospital West Jutland; 2 Department of Quality, Regional Hospital West Jutland; 3 Department of Clinical Medicine, Aarhus University, Denmark
Correspondence: chfroe@rm.dk
Submitted 2020-09-29. Accepted 2021-03-30.

Background and purpose — Following a hip fracture, most patients will encounter poorer functional outcomes and an increased risk of death. Treatment-monitoring of hip fracture patients is in many countries done by national audits. However, they do not allow for a deeper understanding of treatment limitations. We performed a local evaluation study to investigate adherence to 7 best-practice indicators, and to investigate patient groups at risk of suboptimal treatment. Patients and methods — 2,804 patients were surgically treated for a hip fracture from 2011 to 2017 at our institution. Data regarding admission, hospital stay, and discharge was prospectively collected, and adherence to the 7 best practice indicators (nerve block, surgical delay, antibiotics, implant choice, thromboprophylaxis, mobilization, and blood transfusions) was analyzed. Patient groups with lower adherence were identified. Results — 34% of patients received all 7 best practice indicators after considering contraindications; in particular, nerve blocks and thromboprophylaxis displayed low adherence at 61% and 91% respectively. Nursing home residents and patients with cognitive impairment, multiple comorbidities, or low functional levels were at risk of having a lower adherence. Interpretation — The most dependent patients with cognitive impairment, comorbidities, or low functional levels had lower guideline adherence. This large patient subgroup needs a higher treatment focus and more resources. Our findings are likely similar to those in other national and international institutions.

Hip fractures are a leading cause of disability and mortality among seniors worldwide, with 1-year mortality surpassing 20%. Survivors often experience diminished walking ability, reduced activities of daily living, and loss of independence (Bentler et al. 2009, Dyer et al. 2016). Recent years have seen only minimal improvements in outcomes, such as mortality, which suggest that hip fracture treatment needs improvement (Rogmark 2020). However, patients with hip fracture represent a heterogeneous and fragile patient group with multiple comorbidities, which complicates treatment. Evidence-based treatment is fundamental to modern medicine, and previous research has demonstrated improved outcomes for patients receiving best practice indicators (Nielsen et al. 2009, Kristensen et al. 2016, Oakley et al. 2017, Farrow et al. 2018). However, most studies are based on process indicators, which give no information on the actual treatment provided; this includes national audits (Sweden’s National Quality Register 2018, Danish Multidisciplinary Hip Fracture Registry 2019, Royal College of Physicians 2019, Australian & New Zealand Hip Fracture Registry 2019). To our knowledge, only a few studies have evaluated direct local adherence to guidelines for patients with hip fracture (Seys et al. 2018, Mcglynn et al. 2003, Sunol et al. 2015). Continuous monitoring through national audits and local studies might detect gaps in the treatment of patients with hip fracture and hopefully secure improvement. We assessed the degree of adherence to 7 best practice indicators in a local evidence-based guideline for treatment of hip fractures. We expected adherence to increase during the study period as the guideline was incorporated better over time. Furthermore, the study aimed to clarify whether particular patient groups are at risk of significantly lower guideline adherence and hence suboptimal treatment at our institution.

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1925430



Figure 1. Patient inclusion flowchart.
Patients admitted or transferred with a hip fracture 2011–2017: n = 3,047
Excluded, not surgically treated: n = 44
Surgically treated for a hip fracture: n = 3,003
Excluded (n = 199): missing information regarding the fracture, 11; pathological or peri-prosthetic fracture, 17; patients with a second hip fracture, 171
Patients included in the analysis: n = 2,804

Figure 2. Protocol for implant choice based on fracture type and patient age.
Femoral neck fracture
  Garden type I–II: screws or DHS (posterior tilt > 20°: THA) a
  Garden type III–IV, age < 60 years: screws or DHS
  Garden type III–IV, age 60–70 years: screws, DHS or THA b
  Garden type III–IV, age > 70 years: THA
Basocervical fracture: 2- or 4-hole DHS
Intertrochanteric fracture
  Evans type I–II: 4-hole DHS
  Evans type III–IV: 4-hole DHS (or IMN)
  Evans type V or reverse: 4- or 6-hole DHS (± lateral support plate) or IMN
Subtrochanteric fracture: IMN or 6-hole DHS
a Posterior tilt > 20° in the lateral view of Garden I–II fractures resulted in the recommended treatment changing to a THA. However, as this was part of the adjustment after contraindications it is shown in dotted lines but included in the figure for clarification.
b Individual assessment of each patient’s comorbidity, pre-fracture mobility, and radiograph to determine best treatment option, favoring screws or DHS in fractures that can be anatomically reduced, and patients without severe comorbidities and severely impaired pre-fracture mobility.
DHS = dynamic hip screw. IMN = intramedullary nail. THA = total hip arthroplasty.

Patients and methods

Design
The study is a retrospective analysis of prospectively collected data from a cohort of patients with hip fracture conducted at a department of orthopedic surgery.

Patients
All patients admitted to our hospital or transferred from other hospitals with a hip fracture between January 2011 and December 2017 were examined for inclusion (n = 3,047). Hip fracture was defined as a femoral neck fracture, an inter-trochanteric fracture, or a sub-trochanteric fracture. Only surgically treated patients were included. Patients with pathological hip fractures or peri-prosthetic fractures were excluded (n = 17). 11 patients with missing data at the start of the study period were also excluded. For patients who suffered a second hip fracture during the study period, only the first hip fracture was included in the analysis (n = 171). 2,804 patients were included in the study (Figure 1).

Data
All patients were treated according to a well-defined hip fracture guideline at our hospital, which was introduced in January 2011. Simultaneously with the implementation of the guideline, all patients admitted with a hip fracture were prospectively included in our Hip Fracture Database. The database was established in January 2011 to study mortality and morbidity among hip fracture patients at our institution. During admission, patient characteristics and clinical measures were recorded by a nurse on specified forms for the Hip Fracture Database. Data included weight, height, comorbidity, residency, cognitive impairment, preoperative walking aid, and pre-fracture functional level. At discharge, nurses reported prospectively collected data to the database regarding blood samples, blood transfusions, surgery (date, time, and choice of treatment), pain management (regional block and oral analgesics), and discharge placement. Comorbidities were assessed by ASA classification. Pre-fracture functional level was estimated using the New Mobility Score (NMS) and was dichotomized into a low pre-fracture functional level (0–5 points) and a high functional level (6–9 points) (Kristensen et al. 2005).

Retrospectively, one researcher (CFF) classified all fractures on the preoperative radiographs (anterior-posterior, lateral view, and pelvic). The radiographs were classified according to the Garden classification for femoral neck fractures and the Evans classification for inter-trochanteric fractures. Posterior tilt was measured on the lateral view for all Garden I–II fractures. No sub-classification for sub-trochanteric fractures was used.

Prior to analysis, we outlined 7 best practice indicators of particular importance in our local hip fracture guideline. National and international recommendations, national audits, and previous literature were reviewed for important indicators (Dansk Ortopædisk selskab 2008, NICE 2017, Seys et al. 2018, Danish Multidisciplinary Hip Fracture Registry 2019). Indicators were chosen to mirror the different procedural steps and diverse care groups involved in the treatment. The 7 best practice indicators were as follows:
1. Preoperative block. Defined as the use of either epidural or peripheral nerve block prior to surgery.
2. Surgical delay. Defined as surgery within 24 or 36 hours from admission.



Table 1. Contraindications for each best practice indicator

Factor                                                        Number (%)
1. Preoperative block (n = 1,171)
   Patient declined                                           155 (13)
   No valid contraindications a                               1,016 (87)
2. Surgical delay
   Within 24 hours (n = 738)
      Medical complications b                                 128 (17)
      Anticoagulation treatment                               97 (13)
      Death                                                   1 (0.1)
      Others                                                  36 (4.9)
      No valid contraindication a                             476 (65)
   Within 36 hours (n = 376)
      Medical complications b                                 99 (26)
      Anticoagulation treatment                               81 (22)
      Death                                                   1 (0.3)
      Others                                                  27 (7.2)
      No valid contraindication a                             168 (45)
3. Perioperative antibiotics (n = 126)
   Irrelevant antibiotic treatment                            33 (26)
   No valid contraindication a                                93 (74)
4. Implant choice (n = 349)
   Fracture characteristics                                   104 (30)
   Patient morbidity                                          61 (18)
   Pre-fracture mobility                                      9 (2.6)
   Others                                                     8 (2.3)
   No valid contraindication a                                167 (48)
5a. Thromboprophylaxis for 7 days after surgery (n = 211)
   Renal failure                                              3 (1.4)
   Former HIT c                                               0 (0.0)
   Former bleeding                                            10 (4.7)
   Bridging                                                   88 (42)
   Others                                                     39 (20)
   No valid contraindication a                                71 (34)
5b. Thromboprophylaxis given 6–8 h after surgery (n = 1,175)
   Given too early                                            223 (19)
   Given too late                                             691 (59)
   Not given the first day postoperatively                    136 (12)
   Others                                                     22 (1.9)
   No valid contraindication a                                103 (8.8)
6. Postoperative mobilization (n = 464)
   No standing abilities prior to surgery                     32 (6.9)
   Others d                                                   91 (20)
   No valid contraindication a,e                              341 (73)
7. Blood transfusions (n = 90)
   Patient declined                                           6 (6.7)
   Asymptomatic                                               18 (20)
   Others                                                     7 (7.8)
   No valid contraindication a                                59 (66)

a Including no reasons given in the patient record or invalid contraindications given.
b For example, cardiac arrhythmias and strokes.
c Heparin-induced thrombocytopenia.
d For example, patient died within 24 h or was transferred to another hospital within 24 h.
e Including patients only mobilized to a sitting position within 24 h.

3. Perioperative use of antibiotics.
4. Implant choice. Defined from fracture type and age (Figure 2).
5. Thromboprophylaxis. Defined as injections of low-molecular-weight heparin (LMWH) for at least 7 days with the 1st injection given 6–8 hours after surgery.
6. Postoperative mobilization to standing within 24 hours of surgery.
7. Blood transfusions if postoperative hemoglobin was below 6 mmol/L.

Implant choice was based on recommendations from the Danish Orthopedic Society, and primarily dictated by the fracture type; however, especially for femoral neck fractures the patient’s age was also a determining factor (Dansk Ortopædisk selskab 2008). Dual mobility total hip arthroplasties (THAs) were used as standard treatment for patients over 70 years with a Garden III and IV fracture. Internal fixation was standard care for younger patients under 60 years due to superior healing potential and to postpone possible revision of a THA in the future. Garden III–IV fractures in patients between 60 and 70 years could be treated with screws, dynamic hip screws (DHSs), or THAs, based on an assessment by the surgeon. Screws or DHSs were used for fractures that could be anatomically reduced and patients without severe comorbidities or severely impaired mobility. For inter-trochanteric fractures, the DHS has been our standard treatment choice; however, for more unstable and complex fractures, intramedullary nails (IMNs) or DHSs with lateral support plate were used, with increasing use of IMNs during the study period. Hemiarthroplasty and external fixation were not performed for hip fractures at our institution.

Data regarding perioperative antibiotics, thromboprophylaxis, and postoperative mobilization was obtained from patient records; data on preoperative pain management, surgical delay, implant choice, and blood transfusions was obtained from our Hip Fracture Database. Patient records were screened for pre-defined contraindications for each indicator (Table 1).

To investigate whether patient characteristics affected adherence, patients were grouped based on commonly known risk factors for increased mortality and morbidity: age, sex, ASA score, residence, cognitive impairment, fracture type, pre-fracture functional level, and walking aids (Bentler et al. 2009, Smith et al. 2014).
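The implant-choice rules described above and summarized in Figure 2 amount to a lookup on fracture type, with patient age deciding only for Garden III–IV fractures. A minimal Python sketch of that lookup follows. The function name, the string labels for fracture types, and the pairing of each Evans type with its implant (as read from Figure 2) are illustrative assumptions; the sketch is not clinical decision software and ignores the individual contraindications listed in Table 1.

```python
from typing import Optional

# Fracture types whose standard implant does not depend on age.
# The keys are hypothetical labels chosen for this sketch.
FIXED_PROTOCOL = {
    "basocervical": "2- or 4-hole DHS",
    "evans_1_2": "4-hole DHS",
    "evans_3_4": "4-hole DHS (or IMN)",
    "evans_5_or_reverse": "4- or 6-hole DHS (± lateral support plate) or IMN",
    "subtrochanteric": "IMN or 6-hole DHS",
}


def recommended_implant(fracture: str,
                        age: Optional[int] = None,
                        posterior_tilt_deg: Optional[float] = None) -> str:
    """Return the protocol's standard implant for a fracture type."""
    if fracture == "garden_1_2":
        # Posterior tilt > 20 degrees on the lateral view changes the
        # recommendation to a THA (footnote a of Figure 2).
        if posterior_tilt_deg is not None and posterior_tilt_deg > 20:
            return "THA"
        return "screws or DHS"
    if fracture == "garden_3_4":
        if age is None:
            raise ValueError("age is required for Garden III-IV fractures")
        if age < 60:
            return "screws or DHS"
        if age <= 70:
            # Individual assessment by the surgeon (footnote b of Figure 2).
            return "screws, DHS or THA (surgeon's assessment)"
        return "THA"
    if fracture not in FIXED_PROTOCOL:
        raise ValueError(f"unknown fracture type: {fracture}")
    return FIXED_PROTOCOL[fracture]


print(recommended_implant("garden_3_4", age=82))
print(recommended_implant("garden_1_2", posterior_tilt_deg=25))
```

Keeping the age-independent rules in a dictionary makes the age-dependent branch for Garden III–IV fractures stand out as the only place where the patient, rather than the fracture, drives the recommendation.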
Table 2. Characteristics of the study population at the time of hip fracture (n = 2,804). Values are observed numbers (%) unless otherwise stated

Variables                              Observed values
Mean age (SD)                          80 (11)
Female sex                             2,029 (72)
ASA score
   ASA 1                               233 (8.3)
   ASA 2                               1,311 (47)
   ASA 3                               1,090 (39)
   ASA 4                               102 (3.6)
   ASA 5                               1 (0.1)
   Missing                             67 (2.4)
Pre-fracture residence
   Independent living                  2,064 (74)
   Institutionalized                   736 (26)
   Missing                             4 (0.1)
Cognitive function
   Cognitively impaired                552 (20)
   Not cognitively impaired            2,233 (80)
   Missing                             19 (0.7)
Fracture type
   Garden type I and II                431 (15)
   Garden type III and IV              977 (35)
   Stable intertrochanteric            572 (20)
   Unstable intertrochanteric          680 (24)
   Subtrochanteric                     57 (2.0)
   Basocervical                        81 (2.9)
   Missing                             6 (0.2)
Pre-fracture mobility a
   Low NMS                             1,405 (50)
   High NMS                            1,274 (45)
   Missing                             125 (4.5)
Walking aids
   None                                1,047 (37)
   Assisted walking                    1,484 (53)
   No walking ability                  79 (2.8)
   Missing                             194 (6.9)

a Pre-fracture mobility was assessed by New Mobility Score (NMS) with 0–5 points labelled as low and 6–9 points as high.

Statistics
An all-or-none test was performed to clarify the percentages of patients receiving all 7 best practice indicators. Furthermore, adherence was calculated as the proportion of patients who achieved a given number of indicators. A chi-square test was used to assess the hypothesis of no difference in adherence between patient groups to identify groups with statistically significantly lower adherence.

In the statistical analysis, patients with a valid contraindication or missing data were excluded from the adherence analysis for that particular indicator. They remained in the analysis for the other indicators. However, analysis for indicators 3 (perioperative antibiotics) and 5 (thromboprophylaxis) were executed differently. For perioperative antibiotics, the only valid contraindication was if the patient was already in a relevant antibiotic treatment regimen at the time of surgery. These patients were labelled “correctly treated” and remained in the adherence analysis corrected for contraindications. For thromboprophylaxis, patients who were given their 1st injection of LMWH prior to 6 hours or later than 8 hours after surgery were labeled “correctly treated” if they had also received LMWH for 7 days. This was chosen as recent studies have shown that the timing of thromboprophylaxis is not as crucial as had been presumed earlier (Liu et al. 2016, Leer-Salvesen et al. 2018). Only patients with data regarding all 7 best practice indicators were included in the all-or-none test. Data analyses were performed using STATA 16 computer software (StataCorp, College Station, TX, USA).

Ethics, funding, and potential conflicts of interest
The study was conducted in accordance with the Declaration of Helsinki and registered by the Danish Data Protection Agency (number 2007-58-0010), which stated no need for written consent according to Danish law. The study has not received any funding. None of the authors has any conflicts of interest to declare.
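The per-indicator exclusion rule and the all-or-none test described under Statistics can be illustrated with a short Python sketch. The record layout, the indicator keys, the helper `make_record`, and the four example patients are all hypothetical. The sketch also assumes that a valid contraindication is handled like missing data (coded as `None`) in the all-or-none calculation, which the text does not spell out.

```python
from typing import Dict, List, Optional

INDICATORS = [
    "preoperative_block", "surgical_delay", "antibiotics", "implant_choice",
    "thromboprophylaxis", "mobilization", "blood_transfusion",
]

# One record per patient: True = indicator fulfilled, False = not fulfilled,
# None = excluded for that indicator (valid contraindication or missing data).
Record = Dict[str, Optional[bool]]


def indicator_adherence(records: List[Record], indicator: str) -> float:
    """Proportion fulfilling one indicator among patients not excluded for it."""
    eligible = [r[indicator] for r in records if r[indicator] is not None]
    if not eligible:
        raise ValueError(f"no eligible patients for {indicator}")
    return sum(eligible) / len(eligible)


def all_or_none(records: List[Record]) -> float:
    """Proportion fulfilling all indicators among patients with data on all."""
    complete = [r for r in records
                if all(r[i] is not None for i in INDICATORS)]
    if not complete:
        raise ValueError("no patients with data on all indicators")
    fulfilled = sum(all(r[i] for i in INDICATORS) for r in complete)
    return fulfilled / len(complete)


def make_record(**overrides: Optional[bool]) -> Record:
    """Hypothetical helper: a patient fulfilling everything unless overridden."""
    record: Record = {name: True for name in INDICATORS}
    record.update(overrides)
    return record


# Made-up patients, for illustration only.
patients = [
    make_record(),
    make_record(preoperative_block=False),
    make_record(blood_transfusion=None),
    make_record(),
]

print(indicator_adherence(patients, "preoperative_block"))
print(all_or_none(patients))
```

The third patient counts toward six of the per-indicator proportions but drops out of the all-or-none denominator, which is why the all-or-none analysis covers fewer patients than any single indicator.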

Table 3. Observed adherence to the guideline for each of the 7 best practice indicators and all-or-none adherence to all 7 best practice indicators, listed as total number and observed numbers (%)

                                   Adherence to guideline      Corrected for contraindications
Factor                             Total     Observed (%)      Total     Observed (%)
1. Preoperative block              2,793     1,607 (57)        2,638     1,607 (61)
2. Surgical delay
   within 24 hours                 2,786     2,048 (74)        2,524     2,048 (81)
   within 36 hours                 2,787     2,411 (87)        2,579     2,411 (93)
3. Perioperative antibiotics       2,759     2,633 (95)        2,759     2,666 (97)
4. Implant choice                  2,804     2,446 (87)        2,613     2,446 (94)
5. Thromboprophylaxis              2,787     1,538 (55)        2,640     2,394 (91)
6. Postoperative mobilization      2,675     2,211 (83)        2,552     2,211 (87)
7. Blood transfusions              718       628 (87)          687       628 (91)
All-or-none                        2,629     442 (17)          2,028     684 (34)
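The corrected denominators in Table 3 are consistent with subtracting the valid contraindications listed in Table 1 from the number of patients with data. This is an inference from the published numbers rather than a formula stated in the text. The worked example below uses the counts for surgical delay within 24 hours; the variable and function names are illustrative.

```python
def adherence_pct(fulfilled: int, eligible: int) -> float:
    """Adherence as a percentage of eligible patients."""
    return 100 * fulfilled / eligible


# Surgical delay within 24 hours, counts taken from Tables 1 and 3.
total_24h = 2786          # patients with data on the indicator
fulfilled_24h = 2048      # operated within 24 hours

# Valid contraindications from Table 1: medical complications,
# anticoagulation treatment, death, and others.
valid_contraindications_24h = 128 + 97 + 1 + 36

# Patients with a valid contraindication leave the denominator.
corrected_total_24h = total_24h - valid_contraindications_24h

observed = adherence_pct(fulfilled_24h, total_24h)
corrected = adherence_pct(fulfilled_24h, corrected_total_24h)

print(f"corrected denominator: {corrected_total_24h}")
print(f"observed adherence:  {observed:.1f}%")
print(f"corrected adherence: {corrected:.1f}%")
```

Only the denominator changes for this indicator. For perioperative antibiotics and thromboprophylaxis the numerator changes instead, because patients with an accepted deviation were relabelled as correctly treated.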

Table 4. Percentage of patients fulfilling 3, 4, 5, 6, or 7 best practice indicators (n = 1,946)

Number of indicators fulfilled     Observed numbers (%)
3                                  7 (0.4)
4                                  53 (2.6)
5                                  319 (17)
6                                  883 (45)
7                                  684 (35)

Results
2,804 patients were treated for a hip fracture. The mean age was 80 years, and females predominated. The majority lived independently. Almost one-fifth of the patients had cognitive impairment, and half of the population had a low pre-fracture functional level (Table 2).

Total study period
17% of patients received all 7 best practice indicators. The lowest adherence was found for preoperative block and thromboprophylaxis. The indicators with the highest degree of adherence were perioperative antibiotics and implant choice (Table 3). Overall adherence increased to 34% after considering contraindications, primarily due to increased adherence to thromboprophylaxis (Table 3). Furthermore, 65% of patients in fact fulfilled 6 or more indicators (Table 4).



Figure 3. Adherence to the individual best practice indicators and overall adherence in percentages divided into the different years of the study period after taking contraindications into consideration. [Bar chart. Horizontal axis: Adherence (%), 0–100. Categories: preoperative block, surgical delay (within 24 hours), perioperative antibiotics, implant choice, thromboprophylaxis, postoperative mobilisation, blood transfusion, and all-or-none. Series: total and each year 2011–2017.]

Annual adherence
Adherence to individual best practice indicators and overall adherence for each year in the study period are displayed in Figure 3 (the data is shown after considering contraindications). Preoperative block showed a decline in adherence during the 7-year period, and blood transfusions dropped in 2016 and 2017. Both declines had an impact on overall adherence, which decreased in 2016 and 2017 to 24% and 26%, respectively. Data is also shown in Table 5 (see Supplementary data).

Adherence in subgroups
No difference in overall adherence was found when comparing adherence in patient groups in relation to age groups, sex, or fracture type. However, nursing home residents, cognitively impaired patients, patients with a low pre-fracture functional level, and patients with high comorbidity (ASA > 4) were at risk of receiving insufficient treatment (Table 6, see Supplementary data).
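The subgroup comparisons above rest on chi-square tests of adherence between patient groups. As a minimal sketch of such a test for two groups, the Python function below computes the Pearson chi-square statistic for a 2 × 2 table without continuity correction, and its p-value for 1 degree of freedom. The counts in the example are invented for illustration; the actual subgroup counts are in the supplementary Table 6 and are not reproduced here. Whether the original analysis used exactly this variant of the test is an assumption.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test for a 2 x 2 table.

    a, b: adherent and non-adherent patients in group 1
    c, d: adherent and non-adherent patients in group 2
    Returns the statistic and the p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one patient")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts: 30 of 100 adherent in one group, 50 of 100 in the other.
chi2, p_value = chi_square_2x2(30, 70, 50, 50)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

The closed form for the p-value holds only for 1 degree of freedom, so comparisons across more than two groups (for example ASA classes) would need a general chi-square distribution function from a statistics package.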

Discussion
Our most important finding was that only one-third of hip fracture patients at our institution fulfilled all 7 best practice indicators; however, the majority of patients received 6 or more indicators. The study also suggests lower adherence among patients with multiple comorbidities, cognitive impairment, low pre-fracture functional level, and nursing home residency.

Surprisingly, adherence did not increase during the study period because of declining adherence to the indicators “preoperative pain management” and “blood transfusion” in 2016 and 2017. The decrease in adherence to preoperative pain management might be explained by a major organizational change in 2016, whereby admissions past 10 pm were performed by medical doctors rather than orthopedic surgeons or ER doctors. Medical doctors have less experience in administering regional blocks at our institution, which may cause fewer blocks to be administered. For blood transfusions, decreasing adherence over the study period might be due to the 2015 introduction of a new guideline setting lower hemoglobin limits (< 5.6 mmol/L) for transfusion (Danish Healthcare System 2015). Although the transfusion guideline was not specific for patients with hip fracture, it likely influenced the use of blood transfusions for this patient group as well.

Other local evaluation studies have similarly found suboptimal care for patients with hip fracture, some with an overall adherence of 0% (Mcglynn et al. 2003, Sunol et al. 2015, Farrow et al. 2018, Seys et al. 2018). Contrary to local evaluations, most national audits show a high level of adherence (Sweden’s National Quality Register 2018, Danish Multidisciplinary Hip Fracture Registry 2019, Australian & New Zealand Hip Fracture Registry 2019). Local evaluation studies should be seen as a supplement to national audits as they can provide a deeper understanding of treatment gaps. Surgical delay is a good illustration. National audits can only give the results, whereas local studies can point to capacity issues or patient comorbidities as the reason for low adherence. This will identify what steps are needed to increase adherence. The same is true for implant choice as national audits can only give the proportion of patients receiving the different implants; they cannot determine whether the choice of implant was the right one.

In previous years, our institution has demonstrated high adherence to the national Danish audit, and similar adherence was found in the present study for matching indicators (Danish Multidisciplinary Hip Fracture Registry 2019). However, when investigating overall adherence, only one-third of patients obtained full treatment. This indicates that care for patients with hip fracture may also be suboptimal at other hospitals despite high adherence to the national audit, underlining the need for local evaluation. Nevertheless, national audits play an important role in monitoring treatment as adherence to national indicators has shown reduced mortality and readmission (Nielsen et al. 2009, Kristensen et al. 2016). Supplementing national audits with local studies may inform future initiatives to improve hip fracture treatment.

For the individual best practice indicators, the most surprising results were low adherence for preoperative pain management and thromboprophylaxis. Preoperative pain management will be a future focus area because optimal pain management, especially the use of preoperative blocks, may improve recovery by reducing the use of opioids and by reducing nausea and dizziness, while helping to improve mobilization and nutrition (Guay et al. 2017).

Thromboprophylaxis had low adherence before considering contraindications. Low adherence was especially due to bridging, where patients did not receive LMWH for 7 days, and patients receiving the 1st injection before 6 hours or later than 8 hours after surgery. However, anticoagulation therapy has changed since the start of the study. Previous studies have shown that the timing of the thromboprophylaxis is of less importance (Liu et al. 2016, Leer-Salvesen et al. 2018), and with the emergence of new anticoagulation strategies, bridging has become more frequent. Consequently, a revision of the guideline is required.

The change in anticoagulation therapy also had an impact on the best practice indicator “surgical delay.” Especially in the early stages of the study, vitamin K antagonists (VKA) and novel oral anticoagulants (NOAC) posed a challenge. Surgery was delayed when patients did not respond to vitamin K within 24 hours or due to the initial recommendation of a 24-hour pause from NOAC. However, more recent studies have demonstrated that operating regardless of anticoagulation therapy is, indeed, safe (Schuetze et al. 2019). Despite these delays, our study shows an impressive level of adherence compared with international standards, where most guidelines have a 36- or 48-hour deadline and most national audits show lower fulfilment than in this study (Sweden’s National Quality Register 2018, Australian & New Zealand Hip Fracture Registry 2019, Royal College of Physicians 2019). The increased adherence is probably due to an organizational change with more experienced surgeons and operating rooms functioning outside standard working hours.

Implant choice adherence was in line with that reported in other studies (Palm et al. 2012). Most patients had valid contraindications when the guideline was not followed. Contraindications for implant choice were fracture characteristics, primarily a posterior tilt above 20 degrees on the lateral radiograph for Garden I and II fractures (Table 1). Here we opted for a THA instead of screws to reduce the risk of reoperation (Palm et al. 2009). Patient morbidity and mobility describe situations such as young patients with severe osteoporosis or mental handicaps, or patients with no standing or walking ability. In such cases, the guideline was deliberately not followed in order to reduce the risk of reoperation or of having to perform extensive surgery. Other contraindications were patients declining the recommended implant.

Our study has several strengths. A major strength is a high level of external validity owing to inclusion of all consecutive patients with a hip fracture admitted to the department, including patients with severe cognitive impairment and multiple comorbidities, reducing selection bias. Another strength is the use of prospectively collected or documented data, reducing the risk of recall bias.

As with most studies, the design of our study is subject to limitations. First, we have missing data in relation to some variables. If information concerning antibiotics, thromboprophylaxis, postoperative mobilization, and the predefined contraindications was not documented during admission and therefore not available in patient records, these variables were interpreted as missing. This interpretation would have led to an underestimation of the adherence to the guideline. Despite this approach, we had a high degree of data completeness.

Second, our study was limited by being a single-center study. While this ensured that patients were treated similarly, it also meant, together with the descriptive nature of the study, that we can only be sure the results are valid for our institution. However, these results could be true for other institutions, as previous studies have found similar results and the national audit had comparable adherence for matching indicators (Mcglynn et al. 2003, Sunol et al. 2015, Farrow et al. 2018, Seys et al. 2018, Danish Multidisciplinary Hip Fracture Registry 2019).

Third, a lack of consensus on which best practice indicators to use as predictors for adherence in hip fracture treatment hampers comparison with other results. Previous studies have used a wide variety of indicators from procedures (orthopedic or geriatric assessment of patients), timing (of surgery, postoperative mobilization, admission to orthopedic wards, or assessment by senior doctors) and medical indicators (antibiotics, thromboprophylaxis, and pain management). A Delphi study was conducted by Seys et al. (2018) to identify indicators of importance in patients with hip fracture. 4 of the 7 best practice indicators in our study were found to be important for treatment of patients with hip fracture in the Delphi study (surgical delay, antibiotics, thromboprophylaxis, and postoperative mobilization) and 1 was found to be of less importance (preoperative pain management). Further research should be conducted to establish general consensus on which best practice indicators to use, which will ease comparison between studies. Risk assessment for pressure ulcers and malnutrition may be important indicators in improving treatment; furthermore, indicators regarding the period after discharge, such as osteoporosis treatment, fall prophylaxis, and rehabilitation, should be considered in future studies.

Conclusion
In summary, we found that despite high adherence to individual best practice indicators, overall adherence is surprisingly low at our institution, especially among fragile and cognitively impaired patients.
A local evaluation study, such as ours, can be used in the clinic to identify patient groups or treatment steps that need improvement and to deepen our understanding of treatment gaps.

Supplementary data
Tables 5 and 6 are available as supplementary data in the online version of this article, http://dx.doi.org/10.1080/17453674.2021.1925430

All the authors contributed to the study design, including defining the best practice indicators. CFF collected the data. CFF performed the analysis of data with support from ENG. All authors reviewed the results and discussed them. CFF wrote the manuscript draft and all authors revised and approved it.

Acta thanks Aare Märtson and Sebastian Mukka for help with peer review of this study.

Australian & New Zealand Hip Fracture Registry. Annual Report 2019; 2019. https://anzhfr.org/2019-annual-report/


550

Bentler S E, Liu L, Obrizan M, Cook E A, Wright K B, Geweke J F, Chrischilles E A, Pavlik C E, Wallace R B, Ohsfeldt R L, Jones M P, Rosenthal G E, Wolinsky F D. The aftermath of hip fracture: discharge placement, functional status change, and mortality. Am J Epidemiol 2009; 170 (10): 1290-9. doi: 10.1093/aje/kwp266. Danish Healthcare System. Vejledning om blodtransfusion. doi: j.rn: 572109/2; 2015. Danish Multidisciplinary Hip Fracture Registry. Danish Multidisciplinary Hip Fracture Registry, Nationalrapport; 2019. https://www.sundhed.dk/ content/cms/62/4662_hofterapport.pdf. Dansk Ortopædisk selskab. Reference program for Patienter Med Hoftebrud; 2008. https://www.ortopaedi.dk/fileadmin/Guidelines/Referenceprogrammer/Referenceprogram_for_patienter_med_hoftebrud2008.pdf. Dyer S M, Crotty M, Fairhall N, Magaziner J, Beaupre L A, Cameron I D, Sherrington C. A critical review of the long-term disability outcomes following hip fracture. BMC Geriatrics 2016; 16: 158. doi: 10.1186/ s1287701603320. Farrow L, Hall A, Wood A D, Smith R, James K, Holt G, Hutchison J, Myint P K. Quality of care in hip fracture patients: the relationship between adherence to national standards and improved outcomes. J Bone Joint Surg Am 2018; 100: 751-57. doi: 10.2106/JBJS.17.00884. Guay J, Parker M J, Griffiths R, Kopp S. Peripheral nerve blocks for hip fractures (review). Cochrane Database Syst Rev 2017; 5(5): CD001159. doi: 10.1002/14651858.CD001159.pub2. Kristensen M T, Foss N B, Kehlet H. Timed Up & Go og New Mobility Score til prædiktion af funktion seks måneder efter hoftefraktur [Timed Up and Go and New Mobility Score as predictors of function six months after hip fracture]. Ugeskr Laeger 2005; 167(35): 3297-3300. PMID: 16138973. Kristensen P K, Thillemann T M, Søballe K, Johnsen S P. Are process performance measures associated with clinical outcomes among patients with hip fractures? A population-based cohort study. Int J Qual Health Care 2016; 28(6): 698-708. 
doi: 10.1093/intqhc/mzw093. Leer-Salvesen S, Dybvik E , Engesaeter L B, Dahl O E, Gjertsen J. Low-molecular-weight heparin for hip fracture patients treated with osteosynthesis: should thromboprophylaxis start before or after surgery? An observational study of 45,913 hip fractures reported to the Norwegian Hip Fracture Register. Acta Orthop 2018; 89(6): 615-21. doi: 10.1080/17453674.2018.1519101. Liu Z, Han N, Xu H, Fu Z, Zhang D, Wang T, Jiang B. Incidence of venous thromboembolism and hemorrhage related safety studies of preoperative anticoagulation therapy in hip fracture patients undergoing surgical treatment: a case-control study. BMC Musculoskeletal Disorders 2016; 17(76): 1-8. doi: 10.1186/s128910160917y. Mcglynn E A, Asch S M, Adams J, Keesey J, Hicks J, Decristofaro A, Kerr E A. The quality of health care delivered to adults in


the United States. N Engl J Med 2003; 348: 2635-45. doi: 10.1056/ NEJMsa022615 NICE. National Institute for Health and Clinical Excellence (NICE) (2012) Management of hip fractures in adults; 2017. https://www.nice.org.uk/ guidance/qs16. Nielsen K A, Jensen N C, Jensen C M, Thomsen M, Pedersen L, Johnsen S P, Ingeman A, Bartels P D, Thomsen R W. Quality of care and 30-day mortality among patients with hip fractures: a nationwide cohort study. BMC Health Serv Res 2009; 9: 186. doi: 10.1186/147269639186. Oakley B, Nightingale J, Moran C G, Moppett I K. Does achieving the best practice tariff improve outcomes in hip fracture patients? An observational cohort study. BMJ Open 2017; 7(2): e014190. doi: 10.1136/bmjopen2016014190. Palm H, Gosvig K, Krasheninnikoff M, Jacobsen S, Gebuhr P. A new measurement for posterior tilt predicts reoperation in undisplaced femoral neck fractures 113 consecutive patients treated by internal fixation and followed for 1 year. Acta Orthop 2009; 80(3): 303-7. doi: 10.3109/17453670902967281. Palm H, Krasheninnikoff M, Holck K, Lemser T, Foss N B, Jacobsen S, Kehlet H, Gebuhr P. A new algorithm for hip fracture surgery: reoperation rate reduced from 18% to 12% in 2,000 consecutive patients followed for 1 year. Acta Orthop 2012; 83(1): 26-30. doi: 10.3109/17453674.2011.652887. Rogmark C. Further refinement of surgery will not necessarily improve outcome after hip fracture. Acta Orthop 2020; 91(2): 123-24. doi: 10.1080/ 17453674.2019.1706936. Royal College of Physicians. The National Hip Fracture Database; 2019. https://www.nhfd.co.uk/. Schuetze K, A Eickhoff, Dehner C, Gebhard F, Richter P H. Impact of oral anticoagulation on proximal femur fractures treated within 24 h: a retrospective chart review. Injury 2019; 50(11): 2040-4. doi: 10.1016/j. injury.2019.09.011. Seys D, Sermon A, Sermeus W, Panella M, Bruyneel L, Boto P. Recommended care received by geriatric hip fracture patients: where are we now and where are we heading? 
Arch Orthop Trauma Surg 2018; 138: 1077-87. doi: 10.1007/s0040201829394. Smith T, Pelpola K, Ball M, Ong Al, Myint P K. Pre-operative indicators for mortality following hip fracture surgery: a systematic review and metaanalysis. Age Ageing 2014; 43(4): 464-71. doi: 10.1093/ageing/afu065. Sunol R, Wagner C, Arah O A, Kristensen S. Implementation of departmental quality strategies is positively associated with clinical practice: results of a multicenter study in 73 hospitals in 7 European countries. PLoS One 2015; 10(11): :e0141157. doi: 10.1371/journal. pone.0141157. Sweden’s National Quality Register. Annual Report 2018; 2018. https:// rikshoft.se/wpcontent/uploads/2019/11/rikshoft_rapport2018_191023.pdf.


Acta Orthopaedica 2021; 92 (5): 551–556


Physical capability after total joint arthroplasty: long-term population-based follow-up study of 6,462 women

Ville TURPPO 1, Reijo SUND 1, Jukka HUOPIO 2, Heikki KRÖGER 2, and Joonas SIROLA 2

1 Kuopio Musculoskeletal Research Unit (KMRU), Institute of Clinical Medicine, University of Eastern Finland (UEF), Kuopio; 2 Department of Orthopaedics, Traumatology and Hand Surgery, Kuopio University Hospital, Kuopio, Finland
Correspondence: ville.turppo@gmail.com
Submitted 2021-01-24. Accepted 2021-04-05.

Background and purpose: There is a lack of knowledge concerning long-term patient-reported outcomes after arthroplasty. Therefore, we investigated patients' self-reported physical capability (PC) and subjective well-being (SW) up to 20 years after total hip (THA) or knee (TKA) arthroplasty.

Subjects and methods: The self-reports from postal questionnaires at the study checkpoints (baseline, 10-year follow-up, 20-year follow-up) were provided by the Kuopio OSTPRE study, which includes only women, aged 52–62 years (n = 6,462). The Finnish Arthroplasty Register and the Care Register for Health Care provided data on arthroplasties in the OSTPRE population. The results of women with THA/TKA were compared with those of women without arthroplasty (control group).

Results: In subjects with THA performed before the 10-year follow-up, the proportion with good PC initially decreased by 0.6 percentage points (pp) at the 10-year follow-up and later by 19 pp at the 20-year follow-up. After TKA, the proportion of subjects with good PC decreased by 4.1 pp (10-year follow-up) and 27 pp (20-year follow-up), respectively. The proportion of controls reporting good PC decreased by 1.4 pp at the 10-year follow-up and 14 pp at the 20-year follow-up compared with the baseline. After THA, the proportion of subjects with good SW stayed at the same level at the 10-year follow-up and decreased by 2.3 pp at the 20-year follow-up. After TKA, the proportion with good SW increased by 9.0 pp (10-year follow-up) and decreased by 14 pp (20-year follow-up). The proportion of controls reporting good SW increased by 4.0 pp (10-year follow-up) and decreased by 8.8 pp (20-year follow-up).

Interpretation: THA and TKA maintain PC and SW. The overall PC and SW are lower in women with arthroplasty than in controls without arthroplasty. THA seems to outperform TKA in maintaining PC.

In recent years, more attention has been focused on patient-reported outcomes after total hip (THA) and knee (TKA) arthroplasty. Most studies on patient-reported outcome measures (PROM) have relatively short follow-ups (Ethgen et al. 2004). As implants usually survive longer than these follow-ups, there is a need to investigate long-term patient satisfaction and functioning. We found only a few PROM studies reporting long-term results on THA and/or TKA. THA seems to have high patient satisfaction and good functional outcomes up to at least 16 years after operation (Mariconda et al. 2011, Gould et al. 2012). TKA seems to maintain patient functioning and activity up to 20 years postoperatively (Meding et al. 2012).

Patients often inquire about the performance of THA and TKA in activities of daily living. Patients also compare the performance of THA and TKA with that of non-operated knees and hips. However, there are no studies available that have compared physical capability and subjective well-being between THA and TKA patients and non-operated patients. Also, the long-term changes in PC and SW after THA and TKA remain largely unknown.

We assessed long-term patient self-reported physical capability (PC) and subjective well-being (SW) in women up to 20 years after a primary THA or TKA. We compared THA/TKA patients with a control group, and postoperative scores with preoperative scores.

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1922039

Subjects and methods

This study is based on the long-term follow-up of the female population in the Kuopio Osteoporosis Risk Factors and Prevention study (OSTPRE). The self-reports on participants' PC and SW were provided by OSTPRE. Supplementary data on all THAs and TKAs in the OSTPRE study population were obtained from the Finnish Arthroplasty Register (FAR) and the Care Register for Health Care (CRHC).

The original purpose of the OSTPRE study was to investigate osteoporosis in the female population in a prospective study setting. However, it has expanded from its start in 1989 into an overall health and subjective well-being cohort, still including only a female population (http://www.uef.fi/en/web/kmru/ostpre). The original study cohort included all 47–56-year-old women (n = 14,220) living in Kuopio Province in Eastern Finland in 1989. The study is based on self-reports via postal questionnaires, which have been repeated every 5 years. In the current study, the OSTPRE 1994 questionnaire (n = 11,954) is used as baseline. Follow-ups are the 2004 (10-year follow-up, n = 10,912) and 2014 (20-year follow-up, n = 7,765) questionnaires. We chose these questionnaires to achieve sufficiently long follow-up times for the participants. We focused on questions concerning self-reported PC and SW. These questions have remained essentially the same since the questionnaire in 1994. Only those who had returned all 3 questionnaires were included in the study. The self-reported hip fractures in OSTPRE, also included in the present study, were supplemented with the hip fractures found in the CRHC, and all were also checked against the medical records.

The questions asked for self-reports in OSTPRE were as follows (originally in Finnish): "Describe your current physical capability?" and "How would you describe your current well-being?". Self-reported original PC included the following answer options: 1, capable of moving without limitations; 2, no running, without other limitations; 3, can move less than 1,000 meters; 4, can move less than 100 meters independently; 5, can move only indoors; 6, I'm temporarily immobilized; 7, I'm permanently immobilized.
For statistical purposes (group size), answers 1 and 2 were combined as the group "walking without limitations" and are referred to later as "good PC." Also, answers 4–7 were considered as one group, "can move less than 100 meters independently." This classification works well in clinical settings too, since being able to walk less than 1,000 meters supports the indication for arthroplasty. Originally, SW answers formed 5 groups: very good, good, moderate, poor, and bad. Again, for statistical purposes, very good and good were combined as "good." Poor and bad were combined as "poor."

THA/TKA register data were collected from the FAR and CRHC. We used 2 different data sources, since combining them has previously been found to cover all arthroplasties more comprehensively (Turppo et al. 2018). The CRHC records all special healthcare hospital admissions. It holds records of arthroplasty operations since 1987. The FAR has recorded data from arthroplasties since 1980 (National Institute of Health and Welfare 2019). The data were collected until 31 December 2016. Any anomalies in the data were manually checked against the questionnaire forms and medical reports and corrected when possible.

There were 2,444 women with THA or TKA before the final return date of the 20-year follow-up questionnaire (December 31, 2014). 921 participants who failed to return at least 1 of the 3 questionnaires were excluded, of whom 293 had died during follow-up. 92 women underwent arthroplasty before baseline and 612 women had more than 1 operated joint. Eventually, there were 819 women with a THA or TKA who met the inclusion criteria. These women formed groups according to the time of their THA or TKA. The following subgroups of women were created (Figure 1, and see Tables 2 and 3): (1) the control group included all OSTPRE participants without arthroplasty until the end of follow-up; (2) women with hip or knee arthroplasty between baseline and 10-year follow-up; (3) women with hip or knee arthroplasty between 10-year and 20-year follow-up.

Figure 1. Flowchart of study population (N = 6,462). [Flowchart content: data sources are OSTPRE data (baseline 1994, 10-year follow-up 2004, 20-year follow-up 2014) and Finnish register data (FAR + CRHC). Excluded women with arthroplasty (n = 704): operated before baseline, 92; multiple arthroplasties, 612. Control group with no arthroplasty, n = 5,643. Arthroplasty between baseline and 10-year follow-up: THA, 61; TKA, 75. Arthroplasty between 10-year and 20-year follow-up: THA, 231; TKA, 452.]

Statistics

We used the chi-square test to compare, between the control group and the different groups of women with THA/TKA, the proportions of women in a given physical capability state at the different follow-up points. We used 1-way analysis of variance (ANOVA) to compare means of, e.g., height, weight, and BMI. We used propensity score matching to select the most suitable controls for women operated on with THA or TKA. The variables listed in Characteristics (Table 1) were used as covariates. Statistical analysis was conducted with the Statistical Package for the Social Sciences (SPSS), version 27 (IBM Corp, Armonk, NY, USA).

Ethics, funding, and potential conflicts of interest

The Research Ethics Committee of the Northern Savo Hospital District has given permission for the OSTPRE study (3/11/2014//78/2004). Written consent has been provided by every study participant. The Finnish Institute for Health and Welfare has granted permission to use the CRHC and FAR data (THL/20/5.05.00/2016). This study was supported by the Finnish Arthroplasty Association, the Päivikki and Sakari Sohlberg Foundation, and the Academy of Finland. The authors have no conflicting interests to report.

Results

The overall study population consisted of 6,462 women, 292 of whom had THA and 527 of whom had TKA. Hip fracture was the indication for THA in 9 women.
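The cohort figures quoted in the text and in Figure 1 can be cross-checked with a few lines of arithmetic. The sketch below is not part of the original analysis. It assumes that the 921 women who had not returned all questionnaires are counted among the 2,444 women with THA or TKA, which is the reading under which the reported totals add up. All variable names are illustrative.

```python
# Counts as reported in the text and in Figure 1.
arthroplasty_before_2014 = 2444   # women with THA or TKA before 31 December 2014
missing_questionnaire = 921       # assumed to be a subset of the 2,444 above
operated_before_baseline = 92
multiple_arthroplasties = 612
controls = 5643                   # women with no arthroplasty

# Arthroplasty groups by joint and OSTPRE follow-up period.
groups = {
    ("THA", "baseline to 10-year"): 61,
    ("TKA", "baseline to 10-year"): 75,
    ("THA", "10-year to 20-year"): 231,
    ("TKA", "10-year to 20-year"): 452,
}

# Women with arthroplasty remaining after the three exclusions.
included = (arthroplasty_before_2014 - missing_questionnaire
            - operated_before_baseline - multiple_arthroplasties)

tha_total = sum(n for (joint, _), n in groups.items() if joint == "THA")
tka_total = sum(n for (joint, _), n in groups.items() if joint == "TKA")
study_population = controls + included

print(included, tha_total, tka_total, study_population)
```

Under that assumption the exclusions leave 819 women with arthroplasty, split into 292 THA and 527 TKA, and adding the 5,643 controls gives the study population of 6,462.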


Among women with arthroplasty between baseline and 10-year follow-up, the mean age at the time of arthroplasty was 64 (THA)/65 (TKA) years. The median follow-up time for the groups was THA 13 (10–20)/TKA 12 (9–19) years. Good PC was reported by 83% of women with THA and 80% of women with TKA at the 10-year follow-up (1st postoperative questionnaire). At the 20-year follow-up (2nd postoperative questionnaire), good PC was reported by 64% (THA)/53% (TKA) of women. The changes in good PC of women with a THA were not statistically significantly different from the control group (p = 0.2), whereas the changes in good PC in the TKA group were significantly different (p = 0.01) (Table 2 and Figures 2–3 and Table 5, see Supplementary data). Both THA and TKA women reported maintained or improved good SW after operation, at the 10-year follow-up 33% (THA)/41% (TKA). Later, at the 20-year follow-up, 31%/27% reported good SW. Again, there were statistically significant differences in good SW between THA (p = 0.01)/TKA (p = 0.005) and the control group (Table 3 and Figures 4–5, see Supplementary data). The proportion of women with revision arthroplasties until the end of follow-up was 21% for THA. Their results for good PC were: 92% (baseline), 77% (1st postoperative questionnaire), and 62% (2nd postoperative questionnaire). In this revised group, the proportions of women reporting good SW at the same follow-up points were 46%, 15%, and 18%. Only 6.7% of women with a TKA had revision arthroplasties.

Among women with THA or TKA between 10-year and 20-year follow-up, the mean age at the time of arthroplasty was 70 years for both THA and TKA. The median follow-up time for these women was THA 3 (0–9)/TKA 3 (0–10) years. Good PC was reported by 76% of women with THA and 71% with TKA at the 20-year follow-up (postoperative questionnaire) (Table 2 and Figures 2–3 and Table 5, see Supplementary data). The changes in good PC of women with THA were not significantly different (p = 0.6) when compared with the control group. For TKA there was a statistically significant difference (p = 0.04). Across the follow-up checkpoints participants reported a steady decrease of SW. Eventually, at the 20-year follow-up, good SW was reported by 37% (THA) and 29% (TKA) (Table 3 and Figures 4–5, see Supplementary data). The changes in the proportion of women with good SW differed statistically significantly from controls for both THA (p = 0.004) and TKA (p < 0.001). Only 3.0% of the women with THA and 2.2% of women with TKA had experienced a revision arthroplasty by the end of follow-up.

Among OSTPRE participants without THA or TKA during follow-up, good PC was reported by 95% at baseline, 94% at 10-year follow-up, and 80% at 20-year follow-up (Table 2). SW remained almost the same throughout the follow-up (Table 3). At baseline 48% of these women reported good, 42% moderate, and 10% poor SW. The women in the control group and in the groups with a THA or TKA are similar in terms of age, height, number of low trauma energy fractures, osteoporosis/osteopenia, and the mean number of chronic diseases (Table 1). However, there are differences in the proportions of self-reports of some important groups of chronic diseases and in the number of self-reported hip fractures by THA patients versus other participants.

Good PC was reported by 94–97% (baseline), 93–95% (10-year follow-up), and 80–81% (20-year follow-up) of the propensity score matched controls. Good SW was reported by 52–61% (baseline), 52–56% (10-year follow-up), and 47–53% (20-year follow-up) (Table 4).

Analysis for women with THA or TKA within a 1-year period of any questionnaire showed that preoperatively 54% of THA and 65% of TKA participants reported good PC. Postoperatively good PC was reported by 62% of THA and 69% of TKA participants. Similarly, good SW was reported by 21% of THA and 24% of TKA participants preoperatively, but 37% (THA)/31% (TKA) postoperatively.

Table 1. Characteristics of the study population (N = 6,462)

| Factor | No arthroplasty during follow-up (n = 5,643) | Women with hip prostheses during follow-up (n = 292) | Women with knee prostheses during follow-up (n = 527) | p-value |
|---|---|---|---|---|
| Age at baseline (years) | 57 (52–62) | 57 (52–62) | 57 (52–62) | < 0.001 a |
| Height (cm) | 161 (136–179) | 162 (147–176) | 162 (143–178) | 0.005 a |
| Weight (kg) | 69 (38–125) | 70 (47–103) | 74 (48–120) | < 0.001 a |
| BMI | 26 (16–53) | 27 (19–40) | 28 (20–48) | < 0.001 a |
| Mean number of chronic diseases at baseline | 1.6 (0–10) | 1.7 (0–8) | 1.8 (0–9) | 0.002 a |
| Mean number of chronic diseases at end of follow-up | 6.8 (0–36) | 7.2 (0–26) | 7.2 (0–26) | < 0.001 a |
| Self-reported diseases at end of follow-up (%) b | | | | |
| Osteoporosis/osteopenia | 11 | 8.6 | 12 | 0.3 |
| Rheumatoid arthritis/ankylosing spondylitis | 4.1 | 6.2 | 8.3 | < 0.001 |
| Chronic back pain | 24 | 30 | 29 | 0.004 |
| Ischemic heart disease | 18 | 16 | 19 | 0.7 |
| Hypertension | 58 | 59 | 66 | 0.002 |
| Other heart disease | 15 | 18 | 18 | 0.1 |
| Asthma | 14 | 15 | 14 | 0.9 |
| Emphysema | 2.6 | 2.7 | 2.5 | 1.0 |
| Diabetes | 17 | 14 | 22 | 0.006 |
| Stroke | 9.8 | 9.6 | 8.3 | 0.6 |
| Cancer | 14 | 13 | 12 | 0.6 |
| Self-reported fractures (%) b | | | | |
| Hip fracture at baseline | 0.1 | 0.0 | 0.0 | 0.8 |
| Hip fracture at end of follow-up | 0.5 | 5.1 | 0.8 | < 0.001 |
| Any low trauma energy fracture at baseline | 8.2 | 7.5 | 6.6 | 0.4 |
| Any low trauma energy fracture at end of follow-up | 12 | 14 | 13 | 0.4 |

a One-way analysis of variance (ANOVA).
b Pearson's chi-square.

Table 2. Self-reported physical capability (PC) assessed by walking capability, in controls (women with no arthroplasty) and in women with total hip arthroplasty (THA) or total knee arthroplasty (TKA) at baseline, 10-year, and 20-year follow-ups (%)

| Group | Follow-up point | n a | Walking without limitations b | < 1 km c | < 100 m d | p e |
|---|---|---|---|---|---|---|
| Control group | Baseline | 5,356 | 95 | 4.0 | 1.0 | |
| | 10-year follow-up | 5,557 | 94 | 4.0 | 2.0 | |
| | 20-year follow-up | 5,497 | 80 | 11 | 9.0 | |
| THA between baseline and 10-year follow-up | Baseline | 56 | 84 | 13 | 4.0 | 0.2 |
| | 10-year follow-up | 60 | 83 | 12 | 5.0 | |
| | 20-year follow-up | 59 | 64 | 12 | 24 | |
| TKA between baseline and 10-year follow-up | Baseline | 73 | 84 | 15 | 1 | 0.01 |
| | 10-year follow-up | 73 | 80 | 16 | 4 | |
| | 20-year follow-up | 73 | 53 | 27 | 19 | |
| THA between 10-year and 20-year follow-up | Baseline | 218 | 95 | 4 | 1 | 0.6 |
| | 10-year follow-up | 225 | 90 | 8 | 3 | |
| | 20-year follow-up | 222 | 76 | 12 | 12 | |
| TKA between 10-year and 20-year follow-up | Baseline | 427 | 94 | 5 | 0 | 0.04 |
| | 10-year follow-up | 441 | 89 | 9 | 3 | |
| | 20-year follow-up | 431 | 71 | 17 | 12 | |

Arthroplasties are stratified by OSTPRE study follow-up periods (between baseline and 10-year follow-up, between 10-year and 20-year follow-up).
a Participants with valid answer in each individual follow-up point.
b "Good PC."
c Can move < 1 km independently but > 100 m.
d Can move < 100 m independently.
e Chi-square was used to study the statistical significance of the changes in good PC during the follow-up between the control group and women with THA/TKA.

Table 3. Subjective well-being (SW) in controls (women with no arthroplasty) and in women with total hip arthroplasty (THA) or total knee arthroplasty (TKA) at baseline, 10-year, and 20-year follow-ups (%)

| Group | Follow-up point | n a | Good | Moderate | Poor | p-value b |
|---|---|---|---|---|---|---|
| Control group | Baseline | 5,520 | 48 | 42 | 10 | |
| | 10-year follow-up | 5,593 | 52 | 45 | 3 | |
| | 20-year follow-up | 5,577 | 43 | 50 | 7 | |
| THA between baseline and 10-year follow-up | Baseline | 60 | 33 | 50 | 17 | 0.01 |
| | 10-year follow-up | 60 | 33 | 62 | 5 | |
| | 20-year follow-up | 58 | 31 | 50 | 19 | |
| TKA between baseline and 10-year follow-up | Baseline | 73 | 32 | 51 | 18 | 0.005 |
| | 10-year follow-up | 74 | 41 | 55 | 4 | |
| | 20-year follow-up | 74 | 27 | 57 | 16 | |
| THA between 10-year and 20-year follow-up | Baseline | 227 | 45 | 46 | 9 | 0.004 |
| | 10-year follow-up | 227 | 38 | 56 | 5 | |
| | 20-year follow-up | 224 | 37 | 56 | 7 | |
| TKA between 10-year and 20-year follow-up | Baseline | 435 | 40 | 49 | 11 | < 0.001 |
| | 10-year follow-up | 445 | 36 | 60 | 4 | |
| | 20-year follow-up | 449 | 29 | 61 | 10 | |

Arthroplasties are stratified by OSTPRE study follow-up periods (between baseline and 10-year follow-up, between 10-year and 20-year follow-up).
a Participants with valid answer in each individual follow-up point.
b Chi-square was used to study the statistical significance of the changes in good SW during the follow-up between the control group and women with THA/TKA.
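Tables 2 and 3 report chi-square p-values, but the exact contingency tables behind them are not given. As a rough illustration of how a Pearson chi-square comparison of two proportions works, the sketch below compares the share of women with good PC at the 20-year follow-up between the control group (80% of 5,497) and the group with TKA between baseline and 10-year follow-up (53% of 73). This is a cross-sectional comparison chosen for illustration, not the authors' test of changes over follow-up. The cell counts are reconstructed from rounded percentages, so the result is approximate and is not expected to reproduce the published p-values. The function names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom the upper-tail probability equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def approx_count(n, percent):
    """Approximate a cell count from a group size and a rounded percentage."""
    return round(n * percent / 100)


# Table 2, 20-year follow-up: group size and percentage with good PC.
control_n, control_good_pct = 5497, 80
tka_n, tka_good_pct = 73, 53

control_good = approx_count(control_n, control_good_pct)
tka_good = approx_count(tka_n, tka_good_pct)

stat, p_value = chi_square_2x2(
    control_good, control_n - control_good,
    tka_good, tka_n - tka_good,
)
print(f"chi-square = {stat:.1f}, p = {p_value:.2g}")
```

The pure-Python formula is used here only to keep the example dependency-free. A statistics package would normally be preferred, and a test on more than two categories or on changes between follow-up points would need a larger table and more degrees of freedom.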

Discussion

Elderly women who had undergone THA or TKA maintained their self-reported PC approximately 10 years after the procedure. However, at the end of follow-up the PC and SW among women with arthroplasty generally seemed to decrease more than in women without arthroplasty. 2 prior studies reported worse physical functioning 12 years after THA than in a control group without arthroplasty (Mariconda et al. 2011, Gould et al. 2012). Another study reported good yet deteriorating results from TKA patients 20 years after TKA (Meding et al. 2012).

The women with arthroplasty between baseline and 10-year follow-up may have been affected more by osteoarthritis or other comorbidities before the operation than those who underwent the operation later, because before operation there were notably fewer women reporting good PC than in the control group. Postoperatively, the THA group seemed to benefit more from the operation. Previous reports on THA outperforming TKA support this finding (Ethgen et al. 2004). SW was improved or maintained with both THA/TKA at the first postoperative follow-up (10-year follow-up). However, at the 20-year follow-up, SW deteriorated and was a little worse than at baseline. The exact cause for deteriorating results during the longer follow-up (about 13 years postoperatively) remains unclear. Age and comorbidities related to aging may be the main factors, as at the 20-year follow-up the results decreased in all other groups too. Women with arthroplasty may be more prone to these factors.

Women who had arthroplasty between baseline and 10-year follow-up were 5–6 years younger than those with arthroplasty later in life. Previous studies have reported that younger patients may be less satisfied with their THA or TKA operation. Regardless of good clinical results, they report more residual symptoms and their health-related quality of life may be more impaired than amongst older patients (Gotze et al. 2006, Parvizi et al. 2014). It may be that arthrosis worsens physical capability in an otherwise more physically capable young population, and arthroplasty restores capability later. Furthermore, changing social demands, i.e., working life or doing sports without worrying about a prosthesis, may have an influence on the improvement of PC. There are also patient-related factors that can influence postoperative patient-reported outcomes, e.g., comorbidities, obesity, psychological status, and expectations (Hofstede et al. 2016, Canovas and Dagneaux 2018).

Women who underwent THA/TKA later in life (between 10-year and 20-year follow-up) seemed to have a quite similar PC to the control group at baseline, 10 years prior to arthroplasty. Before arthroplasty, at the 10-year follow-up, there was a slight decrease in PC results, probably due to progression of osteoarthritis of the index joint. The postoperative scores in PC were close to those reported by the control group, and age-related factors may decrease patients' physical capabilities even more than arthrosis. However, neither THA nor TKA completely restored a person's ability to walk. The SW of these older women was good throughout the follow-up. Previous data have shown that age is not an obstacle for an effective THA or TKA and elderly people report improved quality of life scores after THA/TKA operations (March et al. 1999, Ethgen et al. 2004).

The positive effects of THA and TKA on pain, physical functioning, and health are known to mostly increase from months to up to 2 years postoperatively (Ethgen et al. 2004, Williams et al. 2013). In our study, both PC and SW improved in participants with THA or TKA within 1 year of the questionnaire, when compared with results prior to operation.

At baseline, controls and women with arthroplasty had almost the same number of doctor-diagnosed chronic diseases. At the end of follow-up, women with arthroplasty had a slightly higher average number of chronic diseases. The greater burden of diseases may also affect PC and SW among women with THA or TKA. Furthermore, women with TKA had the highest average BMI as compared with controls and the THA group, which may have affected their PC negatively. Obesity has been shown to be strongly related to knee osteoarthritis but less to osteoarthritis of the hip (Hunter and Bierma-Zeinstra 2019).

We additionally performed propensity score matching, which gave PC results similar to the original control group. The difference in SW results between women with arthroplasty and propensity score matched controls was larger than with the original control group.

Table 4. PC and SW results (%) for propensity score matched controls (no arthroplasty) for women with THA (n = 61) and TKA (n = 75) between baseline and 10-year follow-up, and for women with THA (n = 231) and TKA (n = 452) between 10-year and 20-year follow-up

| Period | Matched controls | Follow-up point | PC: walking without limitations a | PC: < 1 km b | PC: < 100 m c | PC: p-value d | SW: good | SW: moderate | SW: poor | SW: p-value d |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline–10-year follow-up | THA controls | Baseline | 95 | 6 | 0 | | 61 | 31 | 9 | |
| | | 10-year follow-up | 95 | 5 | 0 | 0.2 | 54 | 39 | 7 | < 0.001 |
| | | 20-year follow-up | 80 | 10 | 10 | | 53 | 35 | 12 | |
| | TKA controls | Baseline | 94 | 6 | 0 | | 59 | 32 | 9 | |
| | | 10-year follow-up | 95 | 4 | 1 | < 0.001 | 56 | 39 | 5 | < 0.001 |
| | | 20-year follow-up | 81 | 8 | 11 | | 53 | 37 | 11 | |
| 10–20-year follow-up | THA controls | Baseline | 97 | 3 | 0 | | 54 | 38 | 8 | |
| | | 10-year follow-up | 95 | 4 | 1 | 0.5 | 53 | 44 | 3 | < 0.001 |
| | | 20-year follow-up | 80 | 8 | 12 | | 48 | 46 | 6 | |
| | TKA controls | Baseline | 97 | 3 | 1 | | 52 | 40 | 9 | |
| | | 10-year follow-up | 93 | 5 | 3 | 0.02 | 52 | 44 | 4 | < 0.001 |
| | | 20-year follow-up | 81 | 10 | 9 | | 47 | 47 | 6 | |

a "Good PC."
b Can move < 1 km independently.
c Can move < 100 m independently.
d Chi-square was used to study the statistical significance of the changes in good PC and SW through the follow-up between the propensity score matched controls and women with THA or TKA.

Strengths of this study are the large cohort combined with the national registers and the long-term data. Weaknesses of our study are that we did not have conclusive data on symptomatic joint diseases in the study population, and our results may not be generalizable to men. Also, no validated patient-reported outcome measures were used. However, scores used to evaluate clinical results of arthroplasty (e.g., Knee Society Score, Harris Hip Score, and Oxford Knee and Hip Score) include walking distance as a variable. Thus, our end point variable may be considered feasible for evaluation of functional status. In addition, there are prior studies validating different self-reports in OSTPRE. Recently, we have reported the validity of self-reported physical capability with functional tests in the OSTPRE cohort (Juopperi et al. 2021). Also, self-reported fractures (Honkanen et al. 1999) as well as all hip fractures (Sund et al. 2014) have been validated.

During follow-up there were many dropouts. The OSTPRE cohort is one of the rare true population-based cohorts of aging women with very long follow-up time and is also a part of the national roadmap infrastructure (Finnish Research Infrastructure for Population Based Surveys, FIRI-PBS). In an aging population there will be natural reasons for "dropout" among those answering questionnaires, such as mortality or long-term institutionalization. In the OSTPRE cohort this has been compensated for with record linkage to national registers. PC and SW are not available in the registers, so without assuming some values for observable events (mortality, hospitalization, long-term institutionalization) we are limited to the people who have answered the questionnaires. In this situation there may be some selection bias because of dropout. However, part of the dropout is not relevant to our question, because we are interested in the population who can live a normal life with THA/TKA, not in those who have already died (17%) or ended up in an institution (10%), who together account for 27% at the time of the 20-year follow-up (i.e., at the OSTPRE 25-year follow-up in 2014). Excluding women with these reasons from the dropout population makes the dropout rates much more tolerable. It is still possible and likely that the women who have answered are relatively healthier than the ones unwilling to participate anymore, but it is difficult to control for this kind of non-random bias.

In conclusion, THA and TKA maintain self-reported PC and SW. Yet, the overall PC and SW are lower in women with prior arthroplasty, in comparison with age-matched controls without arthroplasty. THA seems to outperform TKA in maintaining PC.

Supplementary data

Figures 2–5 and Table 5 are available as supplementary data in the online version of this article, http://dx.doi.org/10.1080/17453674.2021.1922039

TV, SR, HJ, KH, SJ: writing the manuscript. TV, SR, SJ: data analysis and interpretation of the results. SR, HJ, KH, SJ: supervision of the study and proofreading.

The authors thank research secretary Miss Seija Oinonen for her help in the management of the questionnaire data.

Acta thanks Thomas Jakobsen and Per Kjærsgaard-Andersen for help with peer review of this study.

Canovas F, Dagneaux L. Quality of life after total knee arthroplasty. Orthop Traumatol Surg Res 2018; 104 (1S): S41-6.
Ethgen O, Bruyere O, Richy F, Dardennes C, Reginster J Y. Health-related quality of life in total hip and total knee arthroplasty: a qualitative and systematic review of the literature. J Bone Joint Surg Am 2004; 86-A(5): 963-74.
Gotze C, Tschugunow A, Gotze H G, Bottner F, Potzl W, Gosheger G. Long-term results of the metal-cancellous cementless Lubeck total hip arthroplasty: a critical review at 12.8 years. Arch Orthop Trauma Surg 2006; 126(1): 28-35.
Gould V C, Blom A W, Wylde V. Long-term patient-reported outcomes after total hip replacement: comparison to the general population. Hip Int 2012; 22(2): 160-5.
Hofstede S N, Gademan M G, Vliet Vlieland T P, Nelissen R G, Marang-van de Mheen P J. Preoperative predictors for outcomes after total hip replacement in patients with osteoarthritis: a systematic review. BMC Musculoskelet Disord 2016; 17: 212-16.
Honkanen K, Honkanen R, Heikkinen L, Kröger H, Saarikoski S. Validity of self-reports of fractures in perimenopausal women. Am J Epidemiol 1999; 150(5): 511-6.
Hunter D J, Bierma-Zeinstra S. Osteoarthritis. Lancet 2019; 393(10182): 1745-59.
Juopperi S, Sund R, Rikkonen T, Kröger H, Sirola J. Cardiovascular and musculoskeletal health disorders associate with greater decreases in physical capability in older women. BMC Musculoskelet Disord 2021; 22(1): 192-021.
March L M, Cross M J, Lapsley H, Brnabic A J, Tribe K L, Bachmeier C J, Courtenay B G, Brooks P M. Outcomes after hip or knee replacement surgery for osteoarthritis: a prospective cohort study comparing patients' quality of life before and after surgery with age-related population norms. Med J Aust 1999; 171(5): 235-8.
Mariconda M, Galasso O, Costa G G, Recano P, Cerbasi S. Quality of life and functionality after total hip arthroplasty: a long-term follow-up study. BMC Musculoskelet Disord 2011; 12: 222.
Meding J B, Meding L K, Ritter M A, Keating E M. Pain relief and functional improvement remain 20 years after knee arthroplasty. Clin Orthop Relat Res 2012; 470(1): 144-9.
National Institute of Health and Welfare. Far; 2019. Available from: http://www.thl.fi/far/#index
Parvizi J, Nunley R M, Berend K R, Lombardi A V, Ruh E L, Clohisy J C, Hamilton W G, Della Valle C J, Barrack R L. High level of residual symptoms in young patients after total knee arthroplasty. Clin Orthop Relat Res 2014; 472(1): 133-7.
Sund R, Honkanen R, Johansson H, Odén A, McCloskey E, Kanis J, Kröger H. Evaluation of the FRAX model for hip fracture predictions in the population-based Kuopio osteoporosis risk factor and prevention study (OSTPRE). Calcif Tissue Int 2014; 95(1): 39-45.
Turppo V, Sund R, Sirola J, Kröger H, Huopio J. Cross-validation of arthroplasty records between arthroplasty and hospital discharge registers, self-reports, and medical records among a cohort of 14,220 women. J Arthroplasty 2018; 33(12): 3649-54.
Williams D P, Blakey C M, Hadfield S G, Murray D W, Price A J, Field R E. Long-term trends in the Oxford knee score following total knee replacement. Bone Joint J 2013; 95-B(1): 45-51.


Acta Orthopaedica 2021; 92 (5): 557–561


No increase in postoperative contacts with the healthcare system following outpatient total hip and knee arthroplasty

Christian E HUSTED, Henrik HUSTED, Christian Skovgaard NIELSEN, Mette MIKKELSEN, Anders TROELSEN, and Kirill GROMOV

Department of Orthopedic Surgery, Copenhagen University Hospital, Hvidovre, Denmark
Correspondence: christianhusted@live.dk
Submitted 2021-03-20. Accepted 2021-04-06.

Background and purpose — Discharge on the day of surgery (DDOS) after total hip arthroplasty (THA) and total knee arthroplasty (TKA) has been shown to be safe in selected patients. Concerns have been raised that discharging patients on the day of surgery (DOS) could lead to an increased burden on other parts of the healthcare system when compared with patients not discharged on the DOS (nDDOS). Therefore, we investigated whether discharging patients on the day of surgery (DOS) after THA and TKA leads to increased contacts with the primary care sector or other departments within the secondary care sector. Patients and methods — Prospective data on 261 consecutive patients scheduled for outpatient THA (n = 135) and TKA (n = 126) were collected as part of a previous cohort study. 33% of THA patients and 37% of TKA patients were discharged on the DOS. Readmissions within 3 months after surgery were recorded. Contacts with the discharging department, other departments, and primary care physicians within 3 weeks were registered. Results — No statistically significant differences were found when comparing DDOS patients and patients not discharged on the DOS (nDDOS) with regard to readmissions, physical contacts with the discharging department, and contacts with other departments as well as general practitioners. THA DDOS patients had significantly fewer contacts with the discharging department by telephone than THA nDDOS patients. TKA DDOS patients had significantly more contacts with the discharging department by telephone than TKA nDDOS patients. Interpretation — Patients discharged on the DOS following THA or TKA generally have similar postoperative contacts with the healthcare system when compared with patients not discharged on the DOS.

Total hip arthroplasty (THA) and total knee arthroplasty (TKA) are surgical procedures that have improved continuously perioperatively for many years as a result of implementation of fast-track principles (Husted 2012, Petersen et al. 2019). These changes have led to a reduced length of stay in hospital following THA and TKA while also limiting cost, morbidity, and mortality (Khan et al. 2014, Andreasen et al. 2017, Jørgensen et al. 2017, Burn et al. 2018, Petersen et al. 2020). The epitome of fast-track surgery is outpatient surgery, where patients are discharged from the hospital on the day of surgery (DOS) to their own homes. This has proven to be beneficial in several ways for selected patients, as these patients spend less time in the hospital while still having similar outcomes when compared with patients not discharged on the DOS with regard to both patient-reported outcome measurements (Husted et al. 2021) and safety (Goyal et al. 2017, Vehmeijer et al. 2018, Gromov et al. 2019). Finally, outpatient THA and TKA come with additional financial benefits (Lovald et al. 2014, Husted et al. 2018, Gibon et al. 2020). Although an early small study indicated that the reduction in number of hospital days from fast-track did not increase the number of patient contacts with the primary healthcare sector (Andersen et al. 2009), concerns exist that the reduced time patients discharged on the DOS spend in hospital has led to an increased potential burden on other parts of the healthcare system—specifically the primary healthcare system, an increase in readmissions, and/or more contacts with the discharging department as well as other departments (Shah et al. 2019). Therefore, we aimed to investigate whether discharging patients on the DOS after THA and TKA leads to an increased burden on other parts of the healthcare system when compared with patients not discharged on the DOS. This was achieved by comparing readmissions within 3 months, contacts with the

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits ­unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1922966



discharging department, the surgeon, or other departments (both physical turnouts and telephone contacts), as well as contacts with primary care physicians within 3 weeks.

Patients and methods

The cohort of patients included in this study is the same as in a previously published study (Husted et al. 2021). That study investigated whether patient-reported outcomes were affected by DDOS after THA and TKA. Patients with an ASA score of 1–2 without sleep apnea requiring treatment, undergoing primary unilateral THA or TKA as 1st or 2nd in the surgical theater between January 2016 and June 2017 were scheduled for same-day discharge as previously described (Gromov et al. 2017, Husted et al. 2021). All THA and TKA patients had surgery performed by experienced surgeons in a standardized fast-track setup (Husted 2012). Perioperative treatment for the included patients did not differ from standard treatment for all patients operated on during the same period.

All patients (also patients not suitable for DDOS who were not included in the study) participated in a preoperative interdisciplinary seminar, covering all aspects of surgery and hospital stay, including information on surgical procedure, postoperative treatment, and discharge criteria. Patients included in the study and eligible for DDOS did not receive any additional information.

Spinal anesthesia was used for both THA and TKA patients as well as preoperative single-shot high-dose methylprednisolone (Lunn et al. 2011, 2013). THA and TKA patients were given 2 doses of intravenous tranexamic acid (TXA) preoperatively, and TKA patients received an additional intra-articular dose during surgery. No drains were used for either type of surgery. THAs were performed using a standard posterolateral approach. A standard medial parapatellar approach without the use of a tourniquet was employed for all TKA patients and they all received local infiltration analgesia (LIA) (Andersen and Kehlet 2014). Oral thromboprophylaxis in the form of rivaroxaban was administered 6–8 hours after surgery and continued until discharge.
Postoperatively, patients had a short stay in the postoperative recovery unit after which they were transferred to the orthopedic ward where full weight-bearing mobilization was attempted as quickly as possible. Celecoxib 200 mg/12 hours and paracetamol 1 g/6 hours were used as pain medication for the first postoperative week. Oral morphine 10 mg as needed was used as a rescue analgesic only. Patients received physiotherapy from the DOS until discharge and were referred for public outpatient physiotherapy after discharge for as long as was deemed appropriate.

Specified discharge criteria had to be fulfilled before discharge. These included intraoperative blood loss < 500 mL, pain scores < 3 during rest and < 5 during mobilization (VAS 0–10), spontaneous urination, and successful mobilization. Lastly, all these discharge criteria had to be fulfilled before 8 pm on the DOS, and an adult had to be present with the patient for the first day after discharge. Information on time of discharge was recorded.

Readmissions and complications within 3 months after surgery were recorded using a regional database covering all contacts with the hospital (Gromov et al. 2019). At 3 weeks’ follow-up patients were asked whether or not they had contacted their primary care providers regarding any aspects of the surgery and/or hospital stay, as contacts with primary care are not registered in the regional database. In addition to the readmissions described above, all telephone contacts with the department and/or the surgeon within 3 weeks following surgery were recorded.

Statistics

Statistical analyses were performed in IBM SPSS Statistics 25 (IBM Corp, Armonk, NY, USA). Normality was tested using the Shapiro–Wilk test, after which Pearson’s chi-square test and an independent samples t-test were used to compare data. A p-value < 0.05 was considered statistically significant when comparing 2 groups.

Ethics, funding, and potential conflicts of interest

No approval from the National Ethics Committee was necessary as this was a non-interventional observational study. The study was approved by the Danish Data Protection Agency (entry no. 20047-58-0015). This work was sponsored by grants from the Lundbeck Foundation and Zimmer-Biomet, which had no influence on any part of the study or on the content of the paper. The authors declare no conflicts of interest.

Results

From December 2015 through June 2017, 275 patients were scheduled for outpatient THA and TKA. 14 of these were not included in this study because of incomplete data. Among these, 4 were DDOS patients and 10 were nDDOS patients, resulting in 96% data completeness for DDOS and 94% data completeness for nDDOS patients. Therefore, 261 patients remained and were included in this study, consisting of 135 THA patients and 126 TKA patients. 33% (n = 45) of THA patients were discharged on the DOS, whereas 37% (n = 47) of TKA patients managed the same. The remaining patients were all discharged from the ward to their own homes the day after surgery. These results have previously been published (Husted et al. 2021).

In order to compare patients accurately, all patients were divided into two groups depending on the type of surgical procedure they underwent: THA or TKA. Furthermore, they were sub-grouped based on whether they were discharged on the DOS or not. The results of this study can be seen in Tables 1 and 2.

No statistically significant differences were found between DDOS THA patients and nDDOS THA patients with regard



to readmissions within the first 3 months after surgery, physical turnouts to the discharging department, contact with other departments, or contact with their general practitioners. 11% (n = 5) of DDOS THA patients were readmitted to hospital within 3 months, whereas this was the case for 17% (n = 15) of nDDOS THA patients (p = 0.4). Within the first 3 postoperative weeks, 13% (n = 6) of DDOS THA patients had physical turnouts to the discharging department compared with 10% (n = 9) of nDDOS THA patients (p = 0.6). Only one of the DDOS THA patients (2%) had contact with another department in the first 3 weeks after surgery, whereas this was the case for 6% (n = 5) of nDDOS THA patients (p = 0.4). Among DDOS THA patients, 11% (n = 5) contacted their own general practitioners during the first 3 weeks following surgery, while 17% (n = 15) of nDDOS THA patients did the same (p = 0.4). None of the patients discharged on the DOS following THA had any telephone contact with the discharging department for the first 3 postoperative weeks, whereas 19% (n = 17) of nDDOS THA patients had telephone contact with the department during this time (p = 0.002).

Among TKA patients, there were no statistically significant differences between DDOS and nDDOS patients regarding readmissions within the first 3 postoperative months, physical turnouts to the discharging department, contact with other departments, or contact with their general practitioners. 10% (n = 5) of DDOS TKA patients were readmitted to hospital during the first 3 postoperative months and the same was the case for 22% (n = 17) of nDDOS TKA patients (p = 0.1). The discharging department was physically contacted by 35% (n = 17) of DDOS TKA patients and 22% (n = 17) of nDDOS TKA patients within the first 3 weeks after surgery (p = 0.1). 10% (n = 5) of DDOS TKA patients contacted other departments before 3 postoperative weeks had passed, while 17% (n = 13) of nDDOS TKA patients did the same (p = 0.3). Among TKA DDOS patients 40% (n = 19) had contact with their general practitioners during the first 3 weeks after surgery, whereas this was the case for 32% (n = 25) of nDDOS TKA patients (p = 0.4). A statistically significant difference between DDOS and nDDOS TKA patients was found (p = 0.01) when comparing telephone contact with the discharging department during the first 3 postoperative weeks. 29% (n = 14) of DDOS TKA patients had contacted the discharging department by telephone, whereas this was the case for only 12% (n = 9) of nDDOS TKA patients.

Table 1. Demographics, numbers, and statistical significance

Factor                      DDOS          Not DDOS      p-value
THA patients, n             45            90
  Male, n                   33            48            0.03 a
  Mean age (SD)             60 (12)       63 (12)       0.2 b
  Mean blood loss, L (SD)   0.31 (0.13)   0.41 (0.28)   0.02 b
TKA patients, n             48            78
  Male, n                   20            32            0.9 a
  Mean age (SD)             60 (11)       60 (11)       0.9 b
  Mean blood loss, L (SD)   0.23 (0.10)   0.22 (0.15)   0.8 b

DDOS = discharge on the day of surgery.
a Pearson’s chi-square test.
b Independent samples t-test.

Table 2. Readmissions and contacts with the healthcare system. Values are count

Factor                                                        DDOS   Not DDOS   p-value a
THA patients, n                                               45     90
  Readmissions within 3 months                                5      15         0.4
  Physical turnouts to discharging department within 3 weeks  6      9          0.6
  Contact within 3 weeks with
    discharging department by telephone                       0      17         0.002
    other departments                                         1      5          0.4
    general practitioner                                      5      15         0.4
TKA patients, n                                               48     78
  Readmissions within 3 months                                5      17         0.1
  Physical turnouts to discharging department within 3 weeks  17     17         0.1
  Contact within 3 weeks with
    discharging department by telephone                       14     9          0.01
    other departments                                         5      13         0.3
    general practitioner                                      19     25         0.4

DDOS = discharge on the day of surgery.
a Pearson’s chi-square test.
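The telephone-contact comparisons can be re-derived from the counts reported in Table 2. The following is a minimal sketch, not the authors' analysis (which was run in SPSS): it assumes Pearson's chi-square test without continuity correction on a 2x2 table, and the function name `pearson_chi2_2x2` is hypothetical. The p-value uses the fact that the chi-square survival function with 1 degree of freedom equals erfc(sqrt(chi2 / 2)).

```python
import math


def pearson_chi2_2x2(events_a, n_a, events_b, n_b):
    """Pearson's chi-square test for a 2x2 table (assumption: no continuity correction).

    Compares events_a / n_a with events_b / n_b and returns (chi2, p).
    """
    a, b = events_a, n_a - events_a  # group A: event / no event
    c, d = events_b, n_b - events_b  # group B: event / no event
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero; the test is undefined")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Telephone contact with the discharging department within 3 weeks (Table 2):
# (events in DDOS, n DDOS, events in nDDOS, n nDDOS)
telephone_contacts = {
    "THA": (0, 45, 17, 90),
    "TKA": (14, 48, 9, 78),
}

for label, counts in telephone_contacts.items():
    chi2, p = pearson_chi2_2x2(*counts)
    print(f"{label}: chi2 = {chi2:.2f}, p = {p:.3f}")
```

Under these assumptions the sketch gives p-values of roughly 0.002 for THA and 0.01 for TKA, in line with the values reported in Table 2. With a continuity correction or an exact test the values would differ somewhat, which matters most for the THA comparison because one cell is zero.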

Discussion

We found that patients discharged on the DOS following THA or TKA did not differ statistically significantly from patients not discharged on the DOS with regard to contacts with the healthcare system. Readmissions within 3 months were similar between groups. Furthermore, the 2 groups of patients did not differ statistically when comparing physical contact with the discharging department and contact with other departments. Finally, DDOS and nDDOS patients did not differ statistically in terms of contact with general practitioners. These results indicate that discharging patients on the DOS after THA or TKA does not lead to an extra burden on other parts of the healthcare system.

This study is, to our knowledge, one of the first to investigate contacts with primary care following outpatient THA and TKA. Previous studies on outpatient arthroplasty have focused on safety, measured as hospital contacts and readmissions, as patients with major complications requiring treatment will most often be readmitted to hospital. We found no difference regarding readmissions when comparing patients discharged on the DOS and patients scheduled for DOS discharge who ended up staying overnight. This is in line with several previous studies that did not find any increased risk of readmissions in selected patients following outpatient THA and TKA (Pollock et al. 2016, Gromov et al. 2019, Xu et al. 2019, Coenders et al. 2020).



Few studies have investigated and compared postoperative contacts between total joint arthroplasty (TJA) outpatients and inpatients. Goyal et al. (2017) compared DDOS and nDDOS THA patients and found a similar number of contacts between the two groups of patients and the office staff. They also found that DDOS and nDDOS patients were similar with regard to healthcare provider visits before their 4-week follow-up. Both findings are consistent with our results.

A statistically significant difference was found when comparing telephone contact with the discharging department among DDOS and nDDOS patients. DDOS THA patients had significantly less telephone contact with the discharging department than nDDOS THA patients, whereas the opposite was the case for TKA patients. As we did not register the reasons for telephone contacts, we can only speculate as to why. One could argue that the shortened length of stay (LOS) in hospital associated with outpatient surgery might lead to less patient education, ultimately resulting in a need for more post-discharge contact with the discharging department. However, this would explain only the results for TKA patients in this study. At the same time, the patients who are discharged on the DOS might be “fitter” and as a result have fewer complications post-discharge, including readmissions and contact with both the primary and secondary healthcare sector. This may be a confounding factor. Why DDOS THA patients had less telephone contact with the discharging department than nDDOS THA patients remains unexplained, although it is a welcome finding. Not many studies have previously investigated telephone contact after THA and TKA; to our knowledge, this has been investigated only by Shah et al. (2019).
They compared outpatients with one-night inpatients following TJA and found that the two groups of patients had similar telephone contact with the surgical team post-discharge. They also investigated the subject matters of the phone calls and found that pain, nausea, medication, sleep problems, urination, leg swelling, and physical therapy were the main topics of discussion. These are important findings as they could help facilitate better education of patients both pre- and postoperatively. We did not identify reasons for telephone contacts in our study and therefore cannot compare them directly with the study by Shah et al.

This study has limitations. 1st, our data are quantitative and therefore do not allow for qualitative analyses. 2nd, this study was conducted at a department where outpatient THA and TKA have been performed for a long time. Therefore, our results may not be directly transferable to another department where outpatient surgery is not the tradition. However, this could also be seen as a strength, as the department’s routine in outpatient TJA reduces other confounding factors that may affect the results. 3rd, patients included in this study represent a subgroup of patients who were prepared both mentally and physically for outpatient surgery. Therefore, these patients may have been keener on same-day discharge and more tolerant of postoperative complications than other groups of patients. 4th, we only registered patient-reported contacts with primary care, allowing for recall bias. However, we believe it is safe to assume that patients can remember correctly whether they have contacted their primary care physician within the last 3 weeks or not. Finally, it is important to highlight that we investigated selected patients scheduled for outpatient TJA; thus our findings may not be transferable to a wider group of patients.

In conclusion, we found that patients discharged on the DOS following THA or TKA were generally similar with regard to postoperative contacts with the healthcare system when compared with patients not discharged on the DOS; hence these patients do not require extra resources from the discharging department or primary care, even with just a few hours in hospital after surgery. In the future, it would be of great value to investigate the reasons leading to DDOS and nDDOS patients contacting the discharging department by telephone and whether these reasons differ between the two groups of patients. An increased focus on patient education and information could further decrease the number of telephone contacts with the discharging department, especially following TKA. This study adds to the mounting evidence that outpatient surgery is a good option for a subgroup of patients.

CEH, HH, and KG planned the study. CEH, HH, CSN, MM, AT, and KG were responsible for the logistical setup and collected the data. CH and KG analyzed the data. CH wrote the first draft of the paper; all authors revised the paper. Acta thanks David Houlihan-Burne for help with peer review of this study.   Andersen L Ø, Kehlet H. Analgesic efficacy of local infiltration analgesia in hip and knee arthroplasty: a systematic review. Br J Anaesth 2014; 113(3): 360-74. Andersen S H, Husted H, Kehlet H. Economic consequences of accelerated care pathways in total knee-arthroplasty. Ugeskr Laeger 2009; 171(45): 3276-80. Andreasen S E, Holm H B, Jørgensen M, Gromov K, Kjærsgaard-Andersen P, Husted H. Time-driven activity-based cost of fast-track total hip and knee arthroplasty. J Arthroplasty 2017; 32(6): 1747-55. Burn E, Edwards C J, Murray D W, Silman A, Cooper C, Arden N K, Pinedo-Villanueva R, Prieto-Alhambra D. Trends and determinants of length of stay and hospital reimbursement following knee and hip replacement: evidence from linked primary care and NHS hospital records from 1997 to 2014. BMJ Open 2018; 8(1): e019146. Coenders M J, Mathijssen N M C, Vehmeijer S B W. Three and a half years’ experience with outpatient total hip arthroplasty. Bone Joint J 2020; 102-B(1): 82-9. Gibon E, Parvataneni H K, Prieto H A, Photos L L, Stone W Z, Gray C F. Outpatient total knee arthroplasty: is it economically feasible in the hospital setting? Arthroplast Today 2020; 6(2): 231-5. Goyal N, Chen A F, Padgett S E, Tan T L, Kheir M M, Hopper R H Jr, Hamilton W G, Hozack W J. Otto Aufranc Award: A multicenter, randomized study of outpatient versus inpatient total hip arthroplasty. Clin Orthop Relat Res 2017; 475(2): 364-72.



Gromov K, Kjærsgaard-Andersen P, Revald P, Kehlet H, Husted H. Feasibility of outpatient total hip and knee arthroplasty in unselected patients. Acta Orthop 2017; 88(5): 516-21. Gromov K, Jørgensen C C, Petersen P B, Kjaersgaard-Andersen P, Revald P, Troelsen A, Kehlet H, Husted H. Complications and readmissions following outpatient total hip and knee arthroplasty: a prospective 2-center study with matched controls. Acta Orthop 2019; 90(3): 281-5. Husted H. Fast-track hip and knee arthroplasty: clinical and organizational aspects. Acta Orthop Suppl. 2012; 83(346): 1-39. Husted C E, Husted H, Holm Ingelsrud L, Skovgaard Nielsen C, Troelsen A, Gromov K. Are functional outcomes and early pain affected by discharge on the day of surgery following Total Hip and Knee Arthroplasty? Acta Orthop 2021; 92(1): 62-66 Husted H, Kristensen B B, Andreasen S E, Skovgaard Nielsen C, Troelsen A, Gromov K. Time-driven activity-based cost of outpatient total hip and knee arthroplasty in different set-ups. Acta Orthop 2018; 89(5): 515-21. Jørgensen C C, Kehlet H, Lundbeck Foundation Centre for Fast-track Hip and Knee Replacement Collaborative group. Time course and reasons for 90-day mortality in fast-track hip and knee arthroplasty. Acta Anaesthesiol Scand 2017; 61(4): 436-44. Khan S K, Malviya A, Muller S D, Carluke I, Partington P F, Emmerson K P, Reed M R. Reduced short-term complications and mortality following Enhanced Recovery primary hip and knee arthroplasty: results from 6,000 consecutive procedures. Acta Orthop 2014; 85(1): 26-31. Lovald S T, Ong K L, Malkani A L, Lau E C, Schmier J K, Kurtz S M, Manley M T. Complications, mortality, and costs for outpatient and short-stay total knee arthroplasty patients in comparison to standard-stay patients. J Arthroplasty 2014; 29(3): 510-15.


Lunn T H, Husted H, Solgaard S, Kristensen B B, Otte K S, Kjersgaard A G, Gaarn-Larsen L, Kehlet H. Intraoperative local infiltration analgesia for early analgesia after total hip arthroplasty: a randomized, double-blind, placebo-controlled trial. Reg Anesth Pain Med 2011; 36(5): 424-9. Lunn T H, Andersen L Ø, Kristensen B B, Husted H, Gaarn-Larsen L, Bandholm T, Ladelund S, Kehlet H. Effect of high-dose preoperative methylprednisolone on recovery after total hip arthroplasty: a randomized, double-blind, placebo-controlled trial. Br J Anaesth 2013; 110(1): 66-73. Petersen P B, Jørgensen C C, Kehlet H, Lundbeck Foundation Center for Fast-track Hip and Knee Replacement collaborative group. Temporal trends in length of stay and readmissions after fast-track hip and knee arthroplasty. Dan Med J 2019; 66(7): A5553. Petersen P B, Kehlet H, Jørgensen C C, Lundbeck Foundation Centre for Fast-track Hip and Knee Replacement Collaborative Group. Improvement in fast-track hip and knee arthroplasty: a prospective multicentre study of 36,935 procedures from 2010 to 2017. Sci Rep 2020; 10(1): 21233. Pollock M, Somerville L, Firth A, Lanting B. Outpatient total hip arthroplasty, total knee arthroplasty, and unicompartmental knee arthroplasty: a systematic review of the literature. JBJS Rev 2016; 4(12): 01874474201612000-00004. Shah R P, Karas V, Berger R A. Rapid discharge and outpatient total joint arthroplasty introduce a burden of care to the surgeon. J Arthroplasty 2019; 34(7): 1307-11. Vehmeijer S B W, Husted H, Kehlet H. Outpatient total hip and knee arthroplasty. Acta Orthop 2018; 89(2): 141-4. Xu J, Cao J Y, Chaggar G S, Negus J J. Comparison of outpatient versus inpatient total hip and knee arthroplasty: a systematic review and metaanalysis of complications. J Orthop 2019; 17: 38-43.



Acta Orthopaedica 2021; 92 (5): 562–567

Rapid decline of yearly number of hip arthroscopies in Sweden: a retrospective time series of 6,105 hip arthroscopies based on a national patient data register

Tobias WÖRNER 1,2, Frida EEK 1, Jesper KRAUS-SCHMITZ 3,4, Mikael SANSONE 5, and Anders STÅLMAN 2,3

1 Department of Health Sciences, Lund University, Lund; 2 Capio Artro Clinic, Stockholm; 3 Stockholm Sports Trauma Research Center, Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm; 4 Department of Orthopaedics, Skåne University Hospital, Malmö; 5 Gothenburg Sports and Trauma Research Center, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
Correspondence: tobias.worner@med.lu.se
Submitted 2021-02-09. Accepted 2021-04-19.

Background and purpose — Hip arthroscopies (HAs) have increased exponentially worldwide and are expected to continue rising. We describe time trends in HA procedures in Sweden (10 million inhabitants) between 2006 and 2018 with a focus on procedure rates, surgical procedures, and patient demographics such as age and sex distribution. Patients and methods — We retrospectively collected data from the Swedish National Patient Register (NPR) for all surgeries including surgical treatment codes considered relevant for HA from 2006 to 2018. Surgical codes were validated through a multiple-step procedure and classified into femoroacetabular impingement syndrome (FAIS) related or non-FAIS related procedure. Frequencies, sex differences, and time trends of surgical procedures and patient demographics are presented. Results — After validation of HA codes, 6,105 individual procedures, performed in 4,924 patients (mean age 34 years [SD 12]) were confirmed HAs and included in the analysis. Yearly HA procedure rates increased from 15 in 2006 to 884 in 2014, after which a steady decline was observed with 469 procedures in 2018. The majority (65%) of HAs was performed in males. Male patients were younger, and surgeries on males more frequently included an FAIS-related procedure. Interpretation — Similar to previous studies in other parts of the world, we found dramatic increases in HA procedures in Sweden between 2006 and 2014. Contrary to existing predictions, HA rates declined steadily after 2014, which may be explained by more restrictive patient selection based on refined surgical indications, increasing evidence, and clinical experience with the procedure.

Hip arthroscopy was long deemed impossible due to anatomic constraints. Easier arthroscopic access to knee and shoulder joints led to an increasing arthroscopy rate in these joints during the 1990s and 2000s (Kim et al. 2011, Colvin et al. 2012a). During the 1990s, improved surgical equipment and techniques enabled surgeons to gain easier access to the hip joint for diagnosis and treatment of a variety of pathologies (Griffiths and Khanduja 2012), including femoroacetabular impingement syndrome (FAIS), acetabular labrum tears, and chondral lesions (Bedi et al. 2013). Arthroscopic hip surgery has been one of the fastest emerging fields within orthopedics and might be at a tipping point for even wider use (Khan et al. 2016a). An exponential worldwide increase in performed HAs has been documented between 2000 and 2013, based on data from private insurance databases (Sing et al. 2015, Maradit Kremers et al. 2017, Bonazza et al. 2018), performance data from surgical trainees (Colvin et al. 2012b, Bozic et al. 2013) and data from national health services (Palmer et al. 2016). While exponentially more patients received HA, evidence for its effectiveness has been questioned (Reiman and Thorborg 2015). In recent years, RCTs have indicated that hip arthroscopy may be more effective than structured rehabilitation in the treatment of FAIS (Griffin et al. 2018, Palmer et al. 2019). The clinical relevance of the statistical superiority for HA found in these trials is debated (Ferreira et al. 2021); however, a continued rise in HA rates has been predicted worldwide (Khan et al. 2016a, Palmer et al. 2016). The only study assessing HA rates beyond 2013 reports declining rates in Finland after 2014 (Karelson et al. 2020). In Sweden, time trends regarding HA have not been investigated. It is therefore unknown whether the rise in HA has continued, or if surgical practice has changed over the years.

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits ­unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1928396



Therefore, we describe frequency and time trends in performance of hip arthroscopies, with regard to performance rates, surgical procedures, and patient demographics (age and gender distribution) in Sweden.

Patients and methods

Study design
A retrospective analysis of national patient register data was undertaken.

Data source
Data was retrieved from the Swedish National Patient Register (NPR). The NPR is a national register, established by the National Board of Health and Welfare in 1960. Since 2001, all in- and outpatient services in Sweden have been obliged to provide patient, geographical, administrative, and medical data to the NPR. Surgical procedures in the NPR are coded according to the Swedish version of the NOMESCO Classification of Surgical Procedures (NCSP-S). Each surgery can contain several different surgical codes. Diagnoses are coded according to the International Classification of Diseases (ICD), version 10.

Data collection and data validation
During data collection we applied a multiple-step procedure to validate the data and to yield a final cohort of surgeries with the best possible specificity.

Step 1: Initial data request from the NPR
We requested data for all surgeries that included any NCSP-S code potentially indicating HA, that is, codes starting with NE (“Musculoskeletal system, Pelvis”) or NF (“Musculoskeletal system, Hip joint and thigh”), and that were performed between 2001 and 2018. The collected data for each surgery included all surgical codes as well as diagnostic codes, clinic, date, patient age, and sex.

Step 2: Primary selection and classification based on national coding practice
Next, we categorized surgical codes and selected cases according to coding practices, which, as agreed upon during meetings of Swedish orthopedic surgeons performing HA, were expected to be used nationally. We thus selected all cases with surgical codes that potentially represented HA. Surgeries with combinations of codes including simultaneous codes for hip replacement surgery were excluded.

Step 3: Validation of codes through personal communication with clinics
Contrary to our expectations, we did not find uniform coding practices but large variation between clinics. We therefore presented all surgical units known to perform HA (N = 18) with the preliminary results for their clinic and asked them to verify the number of performed HAs and the codes used. Data for all clinics that, to our knowledge, were not performing HA were excluded. In case of discrepancies between the coding practices applied in our data set and the coding practice reported by the clinics, we re-categorized the codes in our data set to fit the practice actually applied. During this process we discovered that the 2 clinics with the highest general HA rates had a complete gap in reporting (1 and 2.5 years, respectively). Therefore, we requested and received the unavailable data directly from these clinics. With updated coding practices and previously unavailable data added to the file, we then categorized codes as FAIS surgery (cam resection, pincer resection, unspecified FAIS surgery) or non-FAIS surgery (psoas tenotomy, cartilage procedure, synovectomy, removal of loose bodies, labral procedures).

Step 4: Final selection after exclusion and qualitative review of selected cases
Surgeries with diagnosis codes indicating simultaneous fracture, or surgical codes indicating knee surgery, were excluded from the data set. In the last step, an orthopedic surgeon (AS) reviewed all surgeries with any HA code and excluded surgeries with unrealistic code combinations or other indications of open surgery as opposed to arthroscopic procedures. Because of few potential HAs and uncertain coding during the 1st years of the period, we finally included, and report on, surgeries performed between 2006 and 2018.

Ethics, funding, and potential conflicts of interest
This study was approved by the ethical board of Karolinska Institute (Dnr: 2019-04514). We received no external funding for the performance of the study. While 3 of the authors are clinically involved with performing HA or treating patients following the procedure, none declare any conflicts of interest that could affect the results of this study.
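The rule-based parts of the selection procedure (the NE/NF code request in Step 1, the exclusion of simultaneous hip replacement in Step 2, and the fracture, knee, and study-period exclusions in Step 4) can be sketched in code. The following Python sketch is illustrative only. The record layout (`Surgery`), the function names, and the specific prefixes used for hip replacement (`NFB`, `NFC`), knee surgery (`NG`), and fracture diagnoses (`S32`, `S72`) are assumptions made for this example; the study does not list the exact codes it used, and the clinic-level validation and surgeon review cannot be reduced to a rule like this.

```python
from dataclasses import dataclass

# All prefixes below are assumptions for illustration, not the study's code lists.
HA_CANDIDATE_PREFIXES = ("NE", "NF")          # pelvis; hip joint and thigh (Step 1)
HIP_REPLACEMENT_PREFIXES = ("NFB", "NFC")     # assumed hip replacement codes (Step 2)
KNEE_PREFIXES = ("NG",)                       # assumed knee surgery codes (Step 4)
FRACTURE_DIAGNOSIS_PREFIXES = ("S32", "S72")  # assumed ICD-10 fracture codes (Step 4)


@dataclass(frozen=True)
class Surgery:
    """Hypothetical record for one surgery in the register extract."""
    year: int
    surgical_codes: tuple
    diagnosis_codes: tuple


def has_prefix(codes, prefixes):
    """True if any code starts with any of the given prefixes."""
    return any(code.startswith(prefixes) for code in codes)


def is_candidate(surgery, first_year=2006, last_year=2018):
    """Apply the rule-based inclusion and exclusion criteria to one surgery."""
    if not first_year <= surgery.year <= last_year:
        return False
    if not has_prefix(surgery.surgical_codes, HA_CANDIDATE_PREFIXES):
        return False
    if has_prefix(surgery.surgical_codes, HIP_REPLACEMENT_PREFIXES):
        return False
    if has_prefix(surgery.surgical_codes, KNEE_PREFIXES):
        return False
    if has_prefix(surgery.diagnosis_codes, FRACTURE_DIAGNOSIS_PREFIXES):
        return False
    return True


def select_candidates(surgeries):
    """Keep only the surgeries that pass the rule-based criteria."""
    return [s for s in surgeries if is_candidate(s)]
```

A filter of this kind only narrows the data set. In the study, the decisive steps were the re-coding after feedback from the clinics and the case-by-case review by an orthopedic surgeon.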

Results
We received 18,148 individual cases with any NE or NF surgical codes from the NPR. Each surgery case included between 1 and 30 unique surgical codes. After the selection and validation process, 6,105 individual procedures were included in the final analysis (Figure 1). The 6,105 final procedures were performed in 4,924 individual patients (Table). The majority of HAs (65%) were performed on male patients. Mean age at first surgery was 34 years (SD 12). The majority of patients (82%) had only 1 HA, 15% had 2 HAs, and 4% had 3 or more HAs performed during the study period. Because the NPR does not record the side of surgery, we could not differentiate between reoperations and bilateral surgery in cases where surgery was performed on different occasions.



Acta Orthopaedica 2021; 92 (5): 562–567

Table. Procedures performed among all identified hip arthroscopies (n = 6,105). Values are count (%)

Procedure                                   Count (%)
FAIS surgery                                5,226 (86)
  with unspecific surgical codes            4,127 (68)
  with specific surgical codes              1,099 (18)
    Cam resection                             946 (16)
    Pincer resection                          279 (5)
Non-FAIS-related surgery a                    879 (14)
  Psoas tenotomy                              367 (6)
  Synovectomy                                 243 (4)
  Removal of free body                         78 (1)
  Diagnostic arthroscopy                      127 (2)
  Labral reconstruction                        64 (1)
Cartilage procedures                        4,924 (81)
  During FAIS surgery                       4,402 (72)
  During non-FAIS-related surgery             522 (9)
Number of HA codes in same surgery
  1                                         1,335 (22)
  2                                         4,500 (74)
  3                                           252 (4)
  4                                            18 (0.3)

FAIS = femoroacetabular impingement syndrome.
a Non-FAIS-related procedures may occasionally have been performed during FAIS-related surgery.

Figure 1. Flowchart of the selection and validation process.
  Cases received from the NPR (all cases with NE/NF code): n = 18,148
  Case selection according to national coding practices: n = 16,771
  Case selection based on clinics that unquestionably performed HA over the study period: n = 7,705
  Case selection based on re-coding following code validation by respective clinics: n = 6,436
  Inclusion of additional cases not reported to the NPR: n = 909
  Exclusion of cases with codes confirming non-HA surgery or based on qualitative surgeon review of all cases: n = 1,173
  Case selection after review: n = 6,172
  Exclusion of cases prior to 2006: n = 67
  Cases included in the final cohort: n = 6,105
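The counts in the flowchart and in the Table can be cross-checked with simple arithmetic. The short Python sketch below uses only numbers reported in Figure 1 and the Table; the function and variable names are hypothetical and chosen for this example.

```python
def apply_adjustments(start, adjustments):
    """Return the running totals after applying signed adjustments in order."""
    totals = []
    current = start
    for delta in adjustments:
        current += delta
        totals.append(current)
    return totals


# Figure 1: cases after code validation by the clinics, then
# + cases not reported to the NPR, - cases excluded after review,
# - cases performed before 2006.
after_validation = 6_436
running = apply_adjustments(after_validation, [+909, -1_173, -67])

# Table: sub-rows that should add up to their parent rows or to the cohort size.
table_consistent = (
    4_127 + 1_099 == 5_226                         # FAIS surgery, unspecific + specific
    and 5_226 + 879 == 6_105                       # FAIS + non-FAIS-related surgery
    and 367 + 243 + 78 + 127 + 64 == 879           # non-FAIS-related procedures
    and 4_402 + 522 == 4_924                       # cartilage procedures
    and 1_335 + 4_500 + 252 + 18 == 6_105          # number of HA codes per surgery
)
```

The last two running totals reproduce the reported 6,172 cases after review and the final cohort of 6,105. Cam and pincer resections are not expected to add up to the number of surgeries with specific codes, presumably because both can be coded in the same surgery.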

Figure 2. Number of hip arthroscopies (HAs) performed between 2006 and 2018. FAI = Femoroacetabular impingement. [Chart; x-axis: year, 2006–2018; y-axis: annual number of procedures; series: all HAs, HAs with FAI code, HAs without FAI code.]

Figure 3. Number of hip arthroscopies between 2006 and 2018 by sex. [Chart; x-axis: year, 2006–2018; y-axis: annual number of procedures; series: male, female.]

A FAIS procedure was included in 90% of HAs performed on male and 77% of HAs performed on female patients. The total number of performed HAs increased from 15 in 2006 to 884 in 2014, after which it declined steadily to 469 in 2018 (Figure 2). During the first years, equal proportions of males and females received HA; however, from 2009 onwards, men comprised 60–70% of the patients undergoing the procedure (Figure 3). Average age at the time of surgery remained between 30 years (SD 12) and 36 years (SD 13) throughout the study period. Male patients were generally younger (mean 33 years [SD 12]) than female patients (mean 36 years [SD 12]) (Figure 4).
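The size of the rise and the subsequent decline can be expressed with two simple quantities computed from the three annual counts given above (15 in 2006, 884 in 2014, 469 in 2018). The Python sketch below is an illustration, not part of the study's analysis. In particular, the mean annual growth factor assumes constant geometric growth between 2006 and 2014, which is a simplification introduced here; the function names are hypothetical.

```python
def relative_change(start, end):
    """Relative change from start to end, e.g. -0.25 for a 25% decline."""
    return (end - start) / start


def mean_annual_growth_factor(start, end, years):
    """Constant yearly factor that would turn start into end over `years` years."""
    return (end / start) ** (1 / years)


# Annual numbers of HAs reported in the text.
ha_2006, ha_2014, ha_2018 = 15, 884, 469

growth_2006_2014 = mean_annual_growth_factor(ha_2006, ha_2014, years=8)
decline_2014_2018 = relative_change(ha_2014, ha_2018)
```

Under these assumptions the number of procedures grew by a factor of about 1.66 per year on average up to the peak in 2014, and then fell by roughly 47% by 2018.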

Figure 4. Age at time of hip arthroscopy for men and women. [Chart; x-axis: year, 2006–2018; y-axis: mean age (SD); series: male, female.]

Discussion This retrospective analysis of a national patient registry describes an exponential increase in arthroscopic procedures with treatment codes for hip arthroscopy between 2006 and 2014. The increase in HA rates appears to be driven by an increase in diagnosis and treatment of FAIS, which was more often performed in male than in female patients. After 2014, procedure rates of HA began to drop every year, with this decrease continuing until the end of the study period in 2018.



Hip arthroscopy in Sweden between 2006 and 2014 Hip arthroscopy gained popularity in the early 2000s and its uptake increased rapidly during the first decade of the new millennium (Colvin et al. 2012b, Bozic et al. 2013, Sing et al. 2015, Palmer et al. 2016, Maradit Kremers et al. 2017, Bonazza et al. 2018). We observed the same pattern in Sweden with few procedures in 2006, an exponential increase in HA rates between 2008 and 2012, and the peak in 2014. The life cycle of a surgical technique can be described in several stages (McCulloch et al. 2009), which have also been discussed in the context of HA practice (Khan et al. 2016a). After an innovation stage, in which a new treatment is used by pioneering surgeons as a solution to a clinical problem, the treatment is developed and explored further by early adopters, and larger numbers of patients thus receive surgery with broadened indications. FAIS-related procedures (cam and pincer resection) were the driving indications behind the increased HA rates in our study. In 2003, surgeons from Switzerland coined the terms cam and pincer morphology, reporting observed impingement between articular surfaces leading to femoroacetabular impingement (Ganz et al. 2003). This mechanical phenomenon is suggested to be a cause for symptoms and an etiological factor for the development of osteoarthritis (Ganz et al. 2003), which serves as theoretical framework for surgical treatment for FAIS (Griffin et al. 2018). FAIS has been defined by a recent consensus as a clinical disorder with a triad of typical symptoms, clinical findings, and radiological evidence of cam and/or pincer morphology (Griffin et al. 2016). Between 2005 and 2010 the number of scientific publications (mainly case series and expert opinion) on arthroscopic treatment of FAIS increased rapidly. During this period, which can be considered the development and exploration stage of HA in the treatment of FAIS, we observed the most rapid increase in HA rates in Sweden. 
Our data shows a continued increase in rates between 2010 and 2014, a time during which the number of scientific publications on FAIS continued to increase, including an increasing number of prospective studies (Khan et al. 2016b). During the same period, national and local hip arthroscopy registries were developed in Denmark and Sweden (Sansone et al. 2014, Mygind-Klavsen et al. 2016). Hip arthroscopy in Sweden between 2014 and 2018 Our study is one of the first studies to describe HA rates beyond 2013. In the 1st years following the peak in 2014, HA rates declined steeply, but this decline was less marked towards 2018. A similar pattern has been observed in a recent study from Finland, where HA rates declined after a peak in 2013 (Karelson et al. 2020). Due to the lack of comparable studies from other parts of the world we do not know if HA rates follow a similar pattern in other countries during this time period or if they keep on rising as predicted by previous studies (Khan et al. 2016a, Palmer et al. 2016). The decline in surgery rates may be explained by the natural development of surgical practice after its innovation, development, and


exploration stage (McCulloch et al. 2009). In the exploration phase, a new technique is adopted by increasing numbers of surgeons and indications for the procedure are explored and broadened. Through a prospective learning process based on surgical outcomes, surgeons likely refine their surgical indications over time. This increased awareness of patient selection, potentially facilitated by 1st results of register-based studies (Sansone et al. 2014, Mygind-Klavsen et al. 2016), may be a potential explanation for the decrease in HA rates after 2014. HA practice can be considered to have only been on the verge of the assessment stage of surgical innovation (McCulloch et al. 2009) once the number of performed procedures started to decline. HA rates in Sweden and Finland started to decline 4 years before the first randomized trials tested effectiveness of the procedure in comparison with non-surgical treatments (Griffin et al. 2018, Palmer et al. 2019). Based on our data we can only judge the development of HA procedure rates until 2018 and are therefore not able to identify potential effects of emerging RCT evidence. This evidence points towards superior outcomes in patients with FAIS when treated using HA compared with non-surgical treatment (Ferreira et al. 2021). It is reasonable to assume that more RCT evidence, better knowledge about which patients benefit most from HA (e.g., increased treatment effect for resection of cam morphology) (Griffin et al. 2018), and improved non-surgical treatment strategies will lead to more evidence-based clinical practice. In turn, HA rates may reach a steady level in coming years. Surgical procedures and patient demographics The rates of performed HA observed in our study were driven by FAIS-related surgery. We also found that 2/3 of all HA were performed on male patients, which is in contrast to previous database studies where the majority of patients were female (Palmer et al. 2016, Maradit Kremers et al. 2017, Bonazza et al. 
2018). We believe that these 2 findings are related to each other and reflect the Swedish approach to HA. The main indication for HA is resection of cam and pincer morphology with the aim to treat FAIS. Resection of cam morphology is the most frequently performed HA procedure in Sweden (Sansone et al. 2014). This is also in line with our data, which shows that cam resections were 3 times more prevalent than pincer resections among all procedures with a specific FAIS code. Cam morphology is far better understood than pincer morphology. Cam morphology develops during adolescence and there is a dose–response relationship with athletic activity (Palmer et al. 2018). Further, cam morphology is associated with future risk of hip osteoarthritis (Agricola et al. 2013) and the treatment effect of removing a cam morphology is likely greater than that of removing a pincer morphology (Griffin et al. 2018). Cam morphology is found to be more common in men than in women (van Klij et al. 2018). This, in combination with the Swedish approach, focusing on the resection of cam morphology as primary indication for HA, may offer a possible explanation for our finding that the majority



of patients receiving HA were men. With a mean age of 35 years at the time of index HA, patients in our study were also younger than patients in previous studies (> 40 years) (Maradit Kremers et al. 2017, Bonazza et al. 2018). In our study, male patients receiving HA were also younger (mean 33 years) than female patients (mean 37 years) receiving HA. The age and sex distribution observed in our study is, however, similar to other Swedish studies on HA patients (Sansone et al. 2014). As judged by our data, relatively young male patients treated for FAIS and likely receiving cam resection are the primary group of patients undergoing HA in Sweden.

Methodological considerations
While previous studies exploring time trends in HA have predominantly been based on data sources such as insurance data sets (Sing et al. 2015, Maradit Kremers et al. 2017, Bonazza et al. 2018), databases for surgical trainee performance (Colvin et al. 2012b, Bozic et al. 2013), or national health services excluding the private sector (Palmer et al. 2016), our study is based on a population-wide patient registry including all surgeries performed in Sweden during the study period. Because of inconsistency and variability in national coding practice in our data set, we took extra measures to validate treatment codes and improve the interpretability of our data. Through direct contact with surgical units and feedback regarding individual coding practices, we believe we have reached high specificity in our final selection of cases. Only a small share of FAIS-related procedures used specific surgical codes indicating cam or pincer resection, leaving us no other choice than coding many procedures under the umbrella term FAIS surgery. Furthermore, we did not review patient records to confirm the individual procedures that had been coded.

Conclusion
The number of HA procedures performed in Sweden increased exponentially between 2006 and 2014. After 2014, HA rates declined steadily until 2018. The rise and fall of HA rates appear to be driven by treatment for FAIS, which is most frequently performed on male patients.

All authors contributed to the planning of the study. TW, JKS, and AS prepared and placed the order of data from the NPR. FE performed data management and analysis. AS and MS were responsible for communication with clinics regarding validating surgical codes and surgery rates. TW drafted the manuscript, which was critically appraised and revised by all co-authors. Acta thanks Enrico De Visser and Bent Lund for help with peer review of this study.

Agricola R, Heijboer M P, Bierma-Zeinstra S M, Verhaar J A, Weinans H, Waarsing J H. Cam impingement causes osteoarthritis of the hip: a nationwide prospective cohort study (CHECK). Ann Rheum Dis 2013; 72(6): 918-23. doi: 10.1136/annrheumdis-2012-201643.
Bedi A, Kelly B T, Khanduja V. Arthroscopic hip preservation surgery: current concepts and perspective. Bone Joint J 2013; 95-B(1): 10-9. doi: 10.1302/0301-620X.95B1.29608.
Bonazza N A, Homcha B, Liu G, Leslie D L, Dhawan A. Surgical trends in arthroscopic hip surgery using a large national database. Arthroscopy 2018; 34(6): 1825-30. doi: 10.1016/j.arthro.2018.01.022.
Bozic K J, Chan V, Valone F H, 3rd, Feeley B T, Vail T P. Trends in hip arthroscopy utilization in the United States. J Arthroplasty 2013; 28(8 Suppl.): 140-3. doi: 10.1016/j.arth.2013.02.039.
Colvin A C, Egorova N, Harrison A K, Moskowitz A, Flatow E L. National trends in rotator cuff repair. J Bone Joint Surg Am 2012a; 94(3): 227-33. doi: 10.2106/JBJS.J.00739.
Colvin A C, Harrast J, Harner C. Trends in hip arthroscopy. J Bone Joint Surg Am 2012b; 94(4): e23. doi: 10.2106/JBJS.J.01886.
Ferreira G E, O’Keeffe M, Maher C G, Harris I A, Kwok W S, Peek A L, Zadro J R. The effectiveness of hip arthroscopic surgery for the treatment of femoroacetabular impingement syndrome: a systematic review and meta-analysis. J Sci Med Sport 2021; 24(1): 21-9. doi: 10.1016/j.jsams.2020.06.013.
Ganz R, Parvizi J, Beck M, Leunig M, Notzli H, Siebenrock K A. Femoroacetabular impingement: a cause for osteoarthritis of the hip. Clin Orthop Relat Res 2003; (417): 112-20. doi: 10.1097/01.blo.0000096804.78689.c2.
Griffin D R, Dickenson E J, O’Donnell J, Agricola R, Awan T, Beck M, Clohisy J C, Dijkstra H P, Falvey E, Gimpel M, Hinman R S, Holmich P, Kassarjian A, Martin H D, Martin R, Mather R C, Philippon M J, Reiman M P, Takla A, Thorborg K, Walker S, Weir A, Bennell K L. The Warwick Agreement on femoroacetabular impingement syndrome (FAI syndrome): an international consensus statement. Br J Sports Med 2016; 50(19): 1169-76. doi: 10.1136/bjsports-2016-096743.
Griffin D R, Dickenson E J, Wall P D H, Achana F, Donovan J L, Griffin J, Hobson R, Hutchinson C E, Jepson M, Parsons N R, Petrou S, Realpe A, Smith J, Foster N E, FASHIoN Study Group. Hip arthroscopy versus best conservative care for the treatment of femoroacetabular impingement syndrome (UK FASHIoN): a multicentre randomised controlled trial. Lancet 2018; 391(10136): 2225-35. doi: 10.1016/S0140-6736(18)31202-9.
Griffiths E J, Khanduja V. Hip arthroscopy: evolution, current practice and future developments. Int Orthop 2012; 36(6): 1115-21. doi: 10.1007/s00264-011-1459-4.
Karelson M C, Jokihaara J, Launonen A P, Huttunen T, Mattila V M. Lower nationwide rates of arthroscopic procedures in 2016 compared with 1997 (634925 total arthroscopic procedures): has the tide turned? Br J Sports Med 2020. Online ahead of print. doi: 10.1136/bjsports-2019-101844.
Khan M, Ayeni O R, Madden K, Bedi A, Ranawat A, Kelly B T, Sancheti P, Ejnisman L, Tsiridis E, Bhandari M. Femoroacetabular impingement: have we hit a global tipping point in diagnosis and treatment? Results from the InterNational Femoroacetabular Impingement Optimal Care Update Survey (IN FOCUS). Arthroscopy 2016a; 32(5): 779-87 e4. doi: 10.1016/j.arthro.2015.10.011.
Khan M, Oduwole K O, Razdan P, Phillips M, Ekhtiari S, Horner N S, Samuelsson K, Ayeni O R. Sources and quality of literature addressing femoroacetabular impingement: a scoping review 2011–2015. Curr Rev Musculoskelet Med 2016b; 9(4): 396-401. doi: 10.1007/s12178-016-9364-5.
Kim S, Bosque J, Meehan J P, Jamali A, Marder R. Increase in outpatient knee arthroscopy in the United States: a comparison of National Surveys of Ambulatory Surgery, 1996 and 2006. J Bone Joint Surg Am 2011; 93(11): 994-1000. doi: 10.2106/JBJS.I.01618.
Maradit Kremers H, Schilz S R, Van Houten H K, Herrin J, Koenig K M, Bozic K J, Berry D J. Trends in utilization and outcomes of hip arthroscopy in the United States between 2005 and 2013. J Arthroplasty 2017; 32(3): 750-5. doi: 10.1016/j.arth.2016.09.004.
McCulloch P, Altman D G, Campbell W B, Flum D R, Glasziou P, Marshall J C, Nicholl J, Balliol C, Aronson J K, Barkun J S, Blazeby J M, Boutron I C, Campbell W B, Clavien P A, Cook J A, Ergina P L, Feldman L S, Flum D R, Maddern G J, Nicholl J, Reeves B C, Seiler C M, Strasberg S M, Meakins J L, Ashby D, Black N, Bunker J, Burton M, Campbell M, Chalkidou K, Chalmers I, de Leval M, Deeks J, Ergina P L, Grant A, Gray M, Greenhalgh R, Jenicek M, Kehoe S, Lilford R, Littlejohns P, Loke Y, Madhock R, McPherson K, Meakins J, Rothwell P, Summerskill B, Taggart D, Tekkis P, Thompson M, Treasure T, Trohler U, Vandenbroucke J. No surgical innovation without evaluation: the IDEAL recommendations. Lancet 2009; 374(9695): 1105-12. doi: 10.1016/S0140-6736(09)61116-8.
Mygind-Klavsen B, Gronbech Nielsen T, Maagaard N, Kraemer O, Holmich P, Winge S, Lund B, Lind M. Danish Hip Arthroscopy Registry: an epidemiologic and perioperative description of the first 2000 procedures. J Hip Preserv Surg 2016; 3(2): 138-45. doi: 10.1093/jhps/hnw004.
Palmer A J, Malak T T, Broomfield J, Holton J, Majkowski L, Thomas G E, Taylor A, Andrade A J, Collins G, Watson K, Carr A J, Glyn-Jones S. Past and projected temporal trends in arthroscopic hip surgery in England between 2002 and 2013. BMJ Open Sport Exerc Med 2016; 2(1): e000082. doi: 10.1136/bmjsem-2015-000082.
Palmer A, Fernquest S, Gimpel M, Birchall R, Judge A, Broomfield J, Newton J, Wotherspoon M, Carr A, Glyn-Jones S. Physical activity during adolescence and the development of cam morphology: a cross-sectional cohort study of 210 individuals. Br J Sports Med 2018; 52(9): 601-10. doi: 10.1136/bjsports-2017-097626.
Palmer A J R, Ayyar Gupta V, Fernquest S, Rombach I, Dutton S J, Mansour R, Wood S, Khanduja V, Pollard T C B, McCaskie A W, Barker K L, Andrade T, Carr A J, Beard D J, Glyn-Jones S, FAIT Study Group. Arthroscopic hip surgery compared with physiotherapy and activity modification for the treatment of symptomatic femoroacetabular impingement: multicentre randomised controlled trial. BMJ 2019; 364: l185. doi: 10.1136/bmj.l185.
Reiman M P, Thorborg K. Femoroacetabular impingement surgery: are we moving too fast and too far beyond the evidence? Br J Sports Med 2015; 49(12): 782-4. doi: 10.1136/bjsports-2014-093821.
Sansone M, Ahlden M, Jonasson P, Thomee C, Sward L, Baranto A, Karlsson J, Thomee R. A Swedish hip arthroscopy registry: demographics and development. Knee Surg Sports Traumatol Arthrosc 2014; 22(4): 774-80. doi: 10.1007/s00167-014-2840-9. Epub 2014 Jan 25.
Sing D C, Feeley B T, Tay B, Vail T P, Zhang A L. Age-related trends in hip arthroscopy: a large cross-sectional analysis. Arthroscopy 2015; 31(12): 2307-13. doi: 10.1016/j.arthro.2015.06.008.
van Klij P, Heerey J, Waarsing J H, Agricola R. The prevalence of cam and pincer morphology and its association with development of hip osteoarthritis. J Orthop Sports Phys Ther 2018; 48(4): 230-8. doi: 10.2519/jospt.2018.7816.



Acta Orthopaedica 2021; 92 (5): 568–574

Highly cross-linked polyethylene still outperforms conventional polyethylene in THA: 10-year RSA results

Halldor BERGVINSSON, Vasilis ZAMPELIS, Martin SUNDBERG, and Gunnar FLIVIK

Department of Orthopedics, Skåne University Hospital, Clinical Sciences, Lund University, Lund, Sweden
Correspondence: halldor.bergvinsson@med.lu.se
Submitted 2020-12-11. Accepted 2021-04-23.

Background and purpose — Cup wear in total hip arthroplasty (THA) can be affected by different manufacturing processes of the polyethylene (PE). We report the long-term wear pattern differences, as well as early creep behavior, between conventional PE and highly cross-linked PE (HXLPE) liners, as measured with radiostereometry (RSA) up to 10 years. We also compare migration and clinical outcome of 2 similar uncemented cups with different backside surface roughness.

Patients and methods — We included 45 patients with primary osteoarthritis. 23 received a conventional liner and 22 an HXLPE liner in a similar uncemented cup, but with a slightly rougher surface. The patients were followed up with RSA and a hip-specific outcome questionnaire (HOOS) at 3 months, 1, 2, 5, and 10 years.

Results — During the first 3 months both liners showed expected deformation with mean proximal head penetration of 0.39 mm (conventional PE) and 0.21 mm (HXLPE). Between 3 months and 10 years there was a difference in annual wear with 0.12 mm/year for the conventional liner and 0.02 mm/year for the HXLPE liner. The cup with rougher surface had less initial migration but both types had stabilized after 3 months. The HOOS scores improved after surgery and remained high for both groups throughout the study period.

Interpretation — Up to 10 years the HXLPE has consistently lower annual wear, possibly contributing to longer survival of the THA, compared with conventional PE. All patients reported good results regardless of liner type.

Osteolysis, attributed to polyethylene wear debris, is one of the main causes of aseptic loosening in THA (Jacobs et al. 2001). Since highly cross-linked polyethylene (HXLPE) was introduced, several studies have shown reduced wear compared with ultra-high-molecular-weight polyethylene (UHMWPE), hereafter called conventional PE (Kuzyk et al. 2011, van Loon et al. 2020). Conventional PE liners demonstrate a mean wear rate of around 0.1 mm/year, which has been considered the generally accepted osteolysis threshold. However, according to Dumbleton et al. (2002), a wear rate threshold of 0.05 mm/year eliminates the risk of osteolysis. The wear for HXLPE is reported to be substantially lower, down to 0.002 mm/year (Thomas et al. 2011, Snir et al. 2014, Glyn-Jones et al. 2015). Its improved wear resistance is related to differences in the manufacturing process of the liners: different amounts of radiation, annealing or remelting of the polyethylene, and even different sterilizing techniques. To date, there is no clear evidence for superiority regarding wear for any of the manufacturing processes. Even when fundamental wear improvements occur, clinical effects require many years before becoming obvious, which underlines the importance of conducting long-term clinical studies that involve different processing techniques and manufacturers.

Although wear of conventional PE and HXLPE has previously been compared in several studies, indicating superiority of HXLPE, there are to our knowledge only 2 comparable long-term prospective RSA studies (Johanson et al. 2012, Glyn-Jones et al. 2015). These studies, however, evaluate not only products from other manufacturers but different cross-linking processes as well. Furthermore, it is still debatable when the initial deformation (creep phase) ends, and when the actual wear phase begins for the different types of polyethylene. RSA is a reliable, validated method of assessing wear (Stilling et al. 2012).
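To make the two thresholds concrete, the annual wear rates reported in the abstract (0.12 mm/year for the conventional liner and 0.02 mm/year for the HXLPE liner) can be compared with them. The Python sketch below is a simple illustration under stated assumptions: the function name and the three category labels are hypothetical, and treating a rate exactly equal to a threshold as "at or above" is a choice made for this example.

```python
# Thresholds cited in the text (mm/year).
ACCEPTED_OSTEOLYSIS_THRESHOLD = 0.1   # generally accepted osteolysis threshold
STRICT_OSTEOLYSIS_THRESHOLD = 0.05    # threshold according to Dumbleton et al. (2002)


def classify_wear_rate(rate_mm_per_year):
    """Place an annual wear rate relative to the two cited thresholds."""
    if rate_mm_per_year < STRICT_OSTEOLYSIS_THRESHOLD:
        return "below both thresholds"
    if rate_mm_per_year < ACCEPTED_OSTEOLYSIS_THRESHOLD:
        return "between the thresholds"
    return "at or above the generally accepted threshold"


# Annual wear between 3 months and 10 years, as reported in the abstract.
conventional_pe = classify_wear_rate(0.12)
hxlpe = classify_wear_rate(0.02)
```

With these numbers the conventional liner falls at or above the generally accepted threshold, whereas the HXLPE liner stays below both.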

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1932140



The CSF cup with standard conventional PE liner (JRI Orthopaedics Ltd, London, UK) has been on the market since 1991 showing satisfactory results (Datir and Angus 2010, Raman et al. 2012). The CSF Plus cup (JRI Orthopaedics Ltd, London, UK) was introduced in 2006, as an evolution of the CSF cup, with a slightly rougher and improved surface in combination with a new HXLPE liner. We measured and compared the possible differences between the 2 generations of this manufacturer’s polyethylene liners in terms of creep, wear, cup migration, and clinical outcome up to 10 years. Our hypothesis was that HXLPE would result in less wear and that the rougher cup surface would yield better cup stability. Additionally, the patients were evaluated with the clinical Hip Disability and Osteoarthritis Outcome Score (HOOS) throughout the follow-up period.

Patients and methods Study group This is a single-center prospective cohort study conducted at Skåne University Hospital of 50 patients who had surgery performed between April 2007 and June 2008. Mean age was 63 years (50–75), 25 were men, all had primary hip OA, Charnley class A or B (Table 1), and had been included in a published randomized controlled trial comparing 2 versions of the Furlong stem (Weber et al. 2014). Of the 50 patients, the first 25 were allocated to have a CSF cup with conventional PE liner and the following 25 patients a CSF Plus cup with HXLPE liner. The reason for this consecutive allocation on the cup side is that the CSF Plus cup was not available to us when the study was initiated but was obtainable later in the study period. Although all patients met the inclusion criteria and were suitable for an uncemented stem, 5 were considered unsuitable for an uncemented cup (women ≥ 70 years old with radiographical doubt as to bone quality in the acetabulum). Thus, 2 patients in the CSF group, and 3 in the CSF Plus group were excluded from the cup part of the study (Figure 1, Table 1). In the cup migration analysis part of the study, due to the absence of adequate visible markers in the acetabulum affecting RSA cup migration measurements but not wear, 5 patients were excluded: 1 from the CSF group, 4 from the CSF Plus group. Liner analysis was conducted in all 35 patients. Surgery Surgery was performed by 2 experienced hip surgeons (GF and MS) using a posterolateral incision. The patients were blocked randomized to have either the Furlong HAC or the Furlong Active stem (Weber et al. 2014). All patients received a 28 mm CoCr head (JRI Orthopaedics Ltd, London, UK). The cup liners were marked by the surgeons by insertion of 4–6 tantalum markers (dependent on liner size) using drilled holes in standardized positions (0.8 mm diameter) in the periphery


Table 1. Patient characteristics

                      CSF           CSF Plus      Total
                      n = 23        n = 22        n = 45
Mean age (range)      64 (50–74)    62 (53–75)    63 (50–75)
Male/female sex       14/9          11/11         25/20
Mean BMI (SD)         27 (4.1)      29 (6.0)      28 (5.1)

SD = standard deviation.

Table 2. Precision values of the cup wear and migration analysis

Axis                Liner wear          Cup                 Cup
                    translation (mm)    translation (mm)    rotation (°)
Transverse (X)      –                   0.14                0.76
Longitudinal (Y)    0.08                0.09                0.72
Sagittal (Z)        –                   0.38                0.22
3D                  0.23                –                   –

The value given represents the smallest migration considered as statistically significant and is based on mean + 2 SD of the error obtained. This corresponds to the 95% confidence limit.
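The precision definition in the footnote of Table 2 (mean + 2 SD of the error) can be sketched as follows. This Python example is a minimal illustration under assumptions that are not stated in the article: the error is taken as the absolute difference between the two measurements of a double examination, the SD is the sample standard deviation, and the function names and the example values are hypothetical.

```python
from statistics import mean, stdev


def precision_value(first_exam, second_exam):
    """Mean + 2 SD of the absolute differences between paired double examinations.

    Assumption: the error of one double examination is the absolute difference
    between its two measurements, and the SD is the sample standard deviation.
    """
    if len(first_exam) != len(second_exam):
        raise ValueError("double examinations must be paired")
    errors = [abs(a - b) for a, b in zip(first_exam, second_exam)]
    return mean(errors) + 2 * stdev(errors)


def exceeds_precision(migration, precision):
    """True if a measured migration is at least as large as the precision value."""
    return abs(migration) >= precision


# Hypothetical paired measurements (mm) from four double examinations.
first = [0.10, 0.20, 0.30, 0.40]
second = [0.12, 0.17, 0.31, 0.44]
example_precision = precision_value(first, second)
```

A measured translation or rotation smaller than the precision value for its axis would, by this definition, not be distinguishable from measurement error.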

of the liner. Additionally, 6–9 tantalum markers were placed in the periacetabular bone in the pelvis. The liner in the CSF cup is made from Ticona grade GUR 1050 resin, which is ram extruded and sterilized with 2.5 Mrad. The HXLPE liner in the CSF Plus cup is made from the same material, followed by gamma irradiation with 7.5 Mrad to produce the cross-linking in the polyethylene. The liners are free from calcium stearate, a compound that has been associated with fusion defects and increased oxidation (McKellop et al. 1999). The polyethylene is then remelted to closely restore its mechanical properties. The product is finally sterilized with 2.5–4 Mrad in vacuum. The CSF Plus cup metal shell, compared with its precursor CSF, has a thicker and rougher layer of titanium coating. The complete coating includes the same outer layer of hydroxyapatite, Supravit, 100–170 µm thick for both cups. This makes the total thickness for CSF Plus 365 µm with a roughness of 60–100 Rz, compared with a total thickness of 200 µm and a roughness of 30–50 Rz for the CSF cup shell.

RSA
RSA examinations were performed according to the guidelines for standardization of radiostereometry (Valstar et al. 2005). The reference RSA examination was performed on the 1st postoperative day, before full weight-bearing, and then at 3, 12, 24, 60, and 120 months, with a time interval of ±5% for each follow-up examination. During the follow-up period, all patients had a double examination for calculation of the precision value (Table 2). An upper limit for the condition number (CN) is normally set at 150 (Valstar et al. 2005). We had a mean of 25 for all



Figure 1. Consort flow chart.

ENROLLMENT
  Assessed for eligibility: n = 159
  Excluded, did not meet inclusion criteria: n = 109
  Randomized: n = 50

ALLOCATION
  Allocated to CSF (n = 25): received allocated intervention, 23; did not receive allocated intervention, unsuitable for uncemented cup, 2
  Allocated to CSF Plus (n = 25): received allocated intervention, 22; did not receive allocated intervention, unsuitable for uncemented cup, 3

POSTOPERATIVE
  CSF: eligible for analysis (n = 23)
  CSF Plus: eligible for analysis (n = 21); radiographs suboptimal (n = 1)

FOLLOW-UP
  CSF: examined at 3 months, 1 year and 2 years (n = 23)
  CSF Plus: examined at 3 months, 1 year and 2 years (n = 21)
  CSF: examined at 5 years (n = 22); lost to follow-up, deceased (n = 1)
  CSF Plus: examined at 5 years (n = 20); lost to follow-up, deceased (n = 1)
  CSF: examined at 10 years (n = 20); lost to follow-up, deceased (n = 2)
  CSF Plus: examined at 10 years (n = 15); lost to follow-up (n = 5): deceased, 1; health issues, 3; revised, infection, 1

examinations, and none of the accepted examinations exceeded 125. The upper limit for mean error of rigid body fitting (ME) was set at 0.30, with a mean for all examinations of 0.05. The RSA examinations were performed using a uniplanar technique with the patient in a supine position and the calibration cage below the patient (Selvik 1989, Kärrholm et al. 1997). We used UmRSA software for the analysis (version 6.0; RSA Biomedical, Umeå, Sweden) and a type 41 calibration cage (Tilly Medical AB, Lund, Sweden). Point motion of the femoral head in relation to the cup segment was used for wear analysis. The cup segment was defined by the cup opening and back shell as definitive points of the cup, combined with the markers from the liner periphery (Börlin et al. 2006). The femoral head penetration into the cup liner could be measured along the 3 axes in an orthogonal coordinate system, as signed values of X-, Y-, and Z-translation, as well as total penetration as the total point motion (3D vector). Proximal head penetration (Y-translation) and 3D penetration were selected as primary effect variables as these are the most representative of the wear direction. For cup migration analysis, motion of the cup segment was measured in relation to the pelvis segment. The proximal migration (Y-translation) and change of inclination (Z-rotation) were chosen as primary effect variables for cup migration, with the others as secondary. Cup inclination was measured for all patients on the first postoperative radiographs as this can affect the wear of the liner (Tian et al. 2017). Clinical assessment The Self-administered Hip Disability and Osteoarthritis Outcome Score (HOOS) (Nilsdotter et al. 2003) was filled

out by all patients before surgery and at 12, 24, 60, and 120 months. Statistics Power analysis was performed based on previously published RSA data on stems and cups. Assuming that the true difference in head penetration at 2 years is 0.1 mm with a common standard deviation (SD) of 0.1 mm, 21 patients in each group would yield a power of 90% to find a statistically significant difference between the groups, using alpha = 0.05. To cover possible dropouts, 25 patients were included in each group. Continuous variables are presented using mean and SD or range, and categorical variables are presented using counts and percentages. A significance level of 0.05 was used for all statistical tests, and 95% confidence intervals (CI) are reported. Comparisons between CSF and CSF Plus at single time-points were performed using two-sample t-tests. Linear regression was used to evaluate the effect of cup slope on wear. Wear over time was analyzed using a piecewise linear mixed-effect model with a knot (breaking point) at 3 months after surgery, where a clear pattern change from creep to wear has been shown in an earlier study (Bergvinsson et al. 2020). The models included 3 fixed effects (group, time starting from surgery, and time starting from 3 months after surgery) and 2 interaction terms between group and the time variables. Subject was included as a random effect. These models gave the opportunity to compare the wear slopes before and after the breaking point between the 2 cup types. Before performing the actual analyses, data was reviewed to confirm the assumption that the breaking point is at 3 months after surgery. The HOOS data was analyzed using Mann–Whitney U-test for comparison between groups. Ethics, funding, and potential conflicts of interests The study was approved by the Ethics Committee of Lund University, Sweden (Dnr 2007/33). All patients gave informed written consent to participate in the study including follow-ups. 
The study was carried out according to the Helsinki Declaration of 1975, as revised in 2000. Data is available on reasonable request. JRI Orthopaedics Ltd have financially supported part of the RSA examinations but had no influence on how this study was conducted or how the results were interpreted. The authors have no conflict of interest.
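The sample-size reasoning and the piecewise time coding described under Statistics can be illustrated with a short sketch. This is not the authors' analysis code: the function names are hypothetical, the sample size uses a normal approximation for a two-sided two-sample comparison (an assumption, as the formula is not stated in the paper), and the second function only builds the fixed-effect terms for a knot at 3 months without fitting a mixed model.

```python
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.90):
    """Approximate patients per group for a two-sided two-sample comparison.

    Normal approximation (an assumption made for this sketch):
    n = 2 * ((z_{1 - alpha/2} + z_{power}) * sd / delta) ** 2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return 2 * ((z_alpha + z_power) * sd / delta) ** 2


def fixed_effect_terms(is_hxlpe, months, knot=3.0):
    """Fixed-effect terms of a piecewise linear model with one knot.

    Returns (group, time, time_after_knot, group*time, group*time_after_knot),
    where time_after_knot is zero until the knot is reached.
    """
    group = 1.0 if is_hxlpe else 0.0
    time_after_knot = max(0.0, months - knot)
    return (group, months, time_after_knot, group * months, group * time_after_knot)


# Difference of 0.1 mm and SD of 0.1 mm, as stated in the power analysis.
print(n_per_group(delta=0.1, sd=0.1))  # close to 21 patients per group
print(fixed_effect_terms(True, 12))
```

With this coding, the group-by-time term is the between-group difference in slope before the knot, and the sum of the two interaction terms is the difference in slope after it.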

Results RSA Both groups showed head penetration into the liner occurring during the first 3 months, known as initial polyethylene deformation or creep. The mean Y-penetration at 3 months was 0.39 (CI 0.21–0.60) mm for the conventional PE group and 0.21 (CI 0.10–0.32) mm for the HXLPE group. After this there is



Table 3. Wear measured with RSA as translation of femoral head. Values are mean (mm) and (95% confidence intervals)

Months   Conventional PE      HXLPE               p-value a
Y-axis translation                                0.07; < 0.01
3        0.39 (0.22–0.57)     0.21 (0.09–0.32)
12       0.51 (0.32–0.70)     0.21 (0.04–0.38)
24       0.60 (0.42–0.80)     0.24 (0.09–0.38)
60       0.92 (0.68–1.16)     0.27 (0.10–0.45)
120      1.56 (1.21–1.92)     0.40 (0.20–0.60)
3D translation                                    0.09; < 0.01
3        0.62 (0.38–0.87)     0.40 (0.26–0.55)
12       0.71 (0.47–0.96)     0.50 (0.32–0.67)
24       0.81 (019–1.04)      0.50 (0.33–0.67)
60       1.12 (0.86–1.37)     0.54 (0.38–0.70)
120      1.69 (1.30–2.08)     0.56 (0.31–0.81)

a Mixed models analysis between 0 and 3 months and 3 months to 10 years, respectively

a clear change in the wear pattern, indicating a transition from the initial deformation phase to the wear phase. Based on this observation, a mixed-model analysis was performed with a knot at the 3-month follow-up. Between 3 months and 10 years the mean femoral head penetration in the 2 groups showed different patterns. The head penetration in the conventional PE group continued (p < 0.001, mixed models) whilst the HXLPE group experienced minimal penetration (p = 0.3, mixed models); at 10 years the total Y-translation was 1.56 (CI 1.21–1.91) mm and 0.40 (CI 0.20–0.60) mm, respectively. This results in a yearly wear rate of 0.12 mm for conventional PE and 0.02 mm for HXLPE after the initial creep period (Table 3 and Figure 2). The values for the total penetration (3D vector) were similar: at 3 months, the wear was 0.62 (CI 0.37–0.87) mm for the conventional PE group and 0.40 (CI 0.26–0.54) mm for the HXLPE group. The total wear at 10 years was 1.69 (CI 1.30–2.08) mm for the conventional PE group and 0.56 (CI 0.31–0.81) mm for the HXLPE group. Thus, the yearly wear rate was 0.11 mm/year for the conventional PE group compared with 0.02 mm/year for the HXLPE group (Table 3 and Figure 3). The CSF cup in the conventional group migrated cranially (Y-translation) 0.28 mm (CI 0.13–0.43) during the first 3 months and then seemed to have stabilized, with a migration of 0.34 mm at 10 years. The CSF Plus cups in the HXLPE group had a Y-translation of 0.09 (CI 0.01–0.17) mm at 3 months and –0.04 (CI –0.30 to 0.22) mm at 10 years. After initial settling-in, measured up to 3 months, there was generally very little translation and rotation of the cups in both groups up to 10 years (Figures 4 and 5 and Table 4). Radiography The mean cup inclination, as measured on the postoperative radiographs, was 43° (CI 41–46) for the conventional PE group and 44° (CI 42–46) for the HXLPE group.
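The yearly wear rates quoted in the RSA results can be checked approximately from the 3-month and 10-year means. The sketch below is only a two-point calculation over the 9.75 years between those visits. The reported rates presumably come from the mixed models, so agreement after rounding is a plausibility check rather than a reproduction. The function name is hypothetical.

```python
def yearly_rate(mm_at_3_months, mm_at_10_years):
    """Mean penetration per year between the 3-month and 120-month visits."""
    years = (120 - 3) / 12  # 9.75 years after the initial creep period
    return (mm_at_10_years - mm_at_3_months) / years


# Mean values (mm) taken from the text and Table 3.
rates = {
    "Y, conventional PE": yearly_rate(0.39, 1.56),
    "Y, HXLPE": yearly_rate(0.21, 0.40),
    "3D, conventional PE": yearly_rate(0.62, 1.69),
    "3D, HXLPE": yearly_rate(0.40, 0.56),
}
for label, rate in rates.items():
    print(f"{label}: {rate:.2f} mm/year")
```

Rounded to 2 decimals, the four values are 0.12, 0.02, 0.11, and 0.02 mm/year, matching the rates given in the text.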

Table 4. Cup migration. Values are mean (mm/°) and (95% confidence intervals)

Months   CSF                       CSF Plus
X-axis translation, medial (+) or lateral (–)
3        0.37 (0.16 to 0.59)       0.17 (–0.06 to 0.40)
12       0.35 (0.12 to 0.57)       0.17 (–0.05 to 0.40)
24       0.37 (0.14 to 0.59)       0.25 (–0.01 to 0.50)
60       0.31 (0.04 to 0.58)       0.31 (0.03 to 0.60)
120      0.31 (0.02 to 0.60)       0.30 (0.04 to 0.56)
Y-axis translation, proximal (+) or distal (–)
3        0.28 (0.13 to 0.43)       0.09 (0.01 to 0.16)
12       0.34 (0.17 to 0.50)       0.03 (–0.15 to 0.20)
24       0.32 (0.16 to 0.49)       0.04 (–0.15 to 0.22)
60       0.33 (0.16 to 0.49)       –0.01 (–0.23 to 0.21)
120      0.34 (0.13 to 0.54)       –0.04 (–0.31 to 0.22)
Z-axis translation, anterior (+) or posterior (–)
3        0.03 (–0.15 to 0.21)      0.19 (–0.08 to 0.46)
12       –0.01 (–0.20 to 0.17)     0.35 (–0.03 to 0.74)
24       0.10 (–0.09 to 0.30)      0.38 (–0.04 to 0.80)
60       0.11 (–0.09 to 0.32)      0.13 (–0.27 to 0.53)
120      0.04 (–0.21 to 0.28)      0.23 (–0.27 to 0.73)
X-axis rotation, anterior (+) or posterior (–) tilt
3        0.26 (–0.20 to 0.71)      0.10 (–0.13 to 0.33)
12       0.40 (–0.01 to 0.80)      0.21 (–0.24 to 0.67)
24       0.28 (–0.14 to 0.71)      0.08 (–0.43 to 0.58)
60       0.25 (–0.26 to 0.77)      0.17 (–0.37 to 0.72)
120      0.20 (–0.34 to 0.74)      0.23 (–0.60 to 1.06)
Y-axis rotation, internal (+) or external (–) rotation
3        –0.15 (–0.60 to 0.29)     0.05 (–0.33 to 0.44)
12       –0.11 (–0.49 to 0.27)     0.00 (–0.42 to 0.42)
24       –0.16 (–0.57 to 0.25)     –0.12 (–0.57 to 0.33)
60       –0.13 (–0.51 to 0.25)     –0.29 (–0.77 to 0.20)
120      –0.03 (–0.56 to 0.51)     –0.27 (–0.76 to 0.22)
Z-axis rotation, decreased (+) or increased (–) inclination
3        –0.02 (–0.49 to 0.46)     –0.16 (–0.46 to 0.14)
12       0.00 (–0.52 to 0.53)      –0.23 (–0.44 to –0.02)
24       0.07 (–0.44 to 0.58)      –0.12 (–0.37 to 0.12)
60       0.06 (–0.46 to 0.58)      –0.06 (–0.32 to 0.19)
120      0.01 (–0.61 to 0.62)      0.06 (–0.21 to 0.32)

Clinical outcome (HOOS) The HOOS was similar for both groups preoperatively and at 6, 12, 24, 60, and 120 months. All patients had improved HOOS scores compared with preoperatively and the improvement remained up to 10 years (Figure 6). 1 cup was revised due to late hematogenic infection. 10 years after THA, none of the remaining cups had any clinical or radiological signs of loosening requiring revision.

Discussion This study was conducted in order to investigate the long-term difference between conventional PE and HXLPE. Our results confirm that the superiority of the HXLPE continues up to 10 years. The curves indicate that this pattern will continue and, so far, we cannot see any disadvantages with the change from conventional PE to HXLPE. Furthermore, we conclude that the deformation process of the PE liner can be divided


Figure 2. Y-translation of the femoral head for conventional PE (CSF) and HXLPE (CSF Plus) with 95% CI bars. [Plot: Y-translation (mm) of the femoral head against time (0–120 months); series CSF, CSF Plus, and precision.]

Figure 3. 3D-translation of the femoral head for conventional PE (CSF) and HXLPE (CSF Plus) with 95% CI bars. [Plot: 3D-translation (mm) of the femoral head against time (0–120 months); series CSF, CSF Plus, and precision.]

Figure 4. Y-translation of the CSF and CSF Plus cups with 95% CI bars. [Plot: Y-translation (mm) of the cup against time (0–120 months); series CSF, CSF Plus, and precision.]

Figure 5. Z-rotation of the CSF and CSF Plus cups with 95% CI bars. Plus (+) rotation indicates decreased and minus (–) rotation increased inclination. [Plot: Z-rotation (°) of the cup against time (0–120 months); series CSF, CSF Plus, and precision.]

Figure 6. HOOS questionnaire outcome. HOOS outcome measures: Pain; Symptoms including stiffness and range of motion; Activity limitations – daily living (ADL); Sport and recreation function (Sport/Rec.); and Hip-related quality of life (QoL). A score of 0 indicates poor function/high number of symptoms, a score of 100 indicates excellent function/low number of symptoms. [Plot: HOOS score (0–100) for each subscale preoperatively and at 1, 2, 5, and 10 years; series CSF and CSF Plus.]

into 2 phases: the initial deformation phase, known as creep (Sychterz et al. 1999), and the later true wear phase of the liner. It has been proposed that most of the early deformation occurs within the 1st postoperative year (Sychterz et al. 1999, Hopper et al. 2004). Our results indicate that most of the creep has already happened within the first 3 months and that after this initial phase the wear pattern changes from creep to actual wear. However, this is probably a gradual and overlapping process, which we speculate ends within the 1st year. Based on our results we have chosen to calculate wear rate from 3 months onwards. The wear of the conventional PE is steady during the whole study period from 3 months to 10 years, with an annual wear rate of 0.12 mm/year. The HXLPE shows less initial creep and then also exhibits a steady wear pattern, but with remarkably less wear than the conventional PE, with a mean annual wear rate of only 0.02 mm/year. Thus, the HXLPE is less prone to wear than its precursor and has a wear rate well below the threshold limit for wear-induced osteolysis of 0.1

mm/y (Dowd et al. 2000, Dumbleton et al. 2002). However, the older conventional PE, with a wear rate above this threshold, can be at risk. Our results are consistent with previously published studies indicating that the conventional PE has a higher wear rate than HXLPE, while the latter has a wear rate ranging from 0.002 to 0.15 mm/year that continues to be low even at long-term follow-up (Engh et al. 2012, Reynolds et al. 2012, Babovic and Trousdale 2013, Glyn-Jones et al. 2015, Teeter et al. 2017, Tsukamoto et al. 2017). Our 10-year follow-up RSA data indicate a lower risk of later osteolysis and aseptic loosening for HXLPE. There are several ways of producing HXLPE, differing in radiation intensity, annealing or remelting, and sterilization technique, which result in variations in the characteristics of the liners. Although the superiority of each technique is debatable, studies to date indicate no increased risk in use of HXLPE compared with their precursors. To our knowledge, this is the 1st study presenting wear data on this specific manufacturer’s HXLPE.



Our secondary aim was to investigate possible differences in migration behavior, as well as the time required until osseointegration occurred, for the CSF and the CSF Plus cups. The newer design, with a rougher surface, showed less migration during initial bedding-in. However, both seemed to have osseointegrated within 3 months, and neither showed further signs of migration and/or associated loosening throughout the 10-year follow-up. This is considered a good migration pattern for acetabular cups and indicates minimal risk of aseptic loosening and revision in the long term (Pijls et al. 2012). The mean pain score in HOOS was 92 (100 being no pain) for both groups 1 year after surgery and 88 after 10 years. Hence, patients experienced their hips still performing well after 10 years. A limitation of the study is that it was randomized only for the stem part of the study, not for the cup part. Instead, the patients were operated on consecutively, with the 1st half of the patients receiving the CSF cup and conventional PE liner and the other half receiving CSF Plus cups and HXLPE liner. The reason for this was that the CSF Plus cups had not been released when the study started. It should be noted that there was the same proportion of the different stems in each group. Another potential limitation is that we are comparing the 2 different kinds of polyethylene in 2 slightly different cup shells. It might be speculated that the HXLPE liner is somewhat affected by the slightly rougher surface of the CSF Plus shell compared with the conventional PE of the CSF shell. However, we find this unlikely as the migration behavior of the cups was very similar, except for slightly less early migration of the CSF Plus cup. A further potential limitation is that, due to loss to follow-up after 10 years, the remaining 20 and 15 patients in each respective group do not meet our initial criteria for power in the study. 
However, the differences in wear already at 5 years are far greater than the values used for power calculations, leading us to believe that these results have sufficient power. In conclusion, our results indicate that this HXLPE has the wear characteristics expected from a modern HXLPE, with markedly less wear compared with the older conventional PE. Both the older cup, with conventional PE, and the newer cup with its slightly rougher surface and an HXLPE liner indicate very good stability up to 10 years.

HB: study conduct, data analysis, writing of the manuscript. GF and MS: study design and conduct, performing surgery, data analysis, and critical revision of the manuscript. VZ: critical revision of the manuscript. The authors would like to thank Håkan Leijon at the RSA laboratory, Skåne University Hospital, Lund University, for computerizing and analyzing the RSA pictures, and Helene Jacobsson at Clinical Studies Sweden – Forum South, Skåne University Hospital, Lund, for statistical guidance. Acta thanks Lennard Koster and Matthew Teeter for help with peer review of this study.

Babovic N, Trousdale R T. Total hip arthroplasty using highly cross-linked polyethylene in patients younger than 50 years with minimum 10-year follow-up. J Arthroplasty 2013; 28(5): 815-17.
Bergvinsson H, Sundberg M, Flivik G. Polyethylene wear with ceramic and metal femoral heads at 5 years: a randomized controlled trial with radiostereometric analysis. J Arthroplasty 2020; 35(12): 3769-76.
Börlin N, Röhrl S M, Bragdon C R. RSA wear measurements with or without markers in total hip arthroplasty. J Biomech 2006; 39(9): 1641-50.
Datir S P, Angus P D. Long term survival of an hydroxyapatite-coated threaded cup in the presence of a high polythene wear rate. Hip Int 2010; 20(3): 327-34.
Dowd J E, Sychterz C J, Young A M, Engh C A. Characterization of long-term femoral-head-penetration rates: association with and prediction of osteolysis. J Bone Joint Surg Am 2000; 82(8): 1102-7.
Dumbleton J H, Manley M T, Edidin A A. A literature review of the association between wear rate and osteolysis in total hip arthroplasty. J Arthroplasty 2002; 17(5): 649-61.
Engh C A Jr, Hopper R H Jr, Huynh C, Ho H, Sritulanondha S, Engh C A Sr. A prospective, randomized study of cross-linked and non-cross-linked polyethylene for total hip arthroplasty at 10-year follow-up. J Arthroplasty 2012; 27(8 Suppl.): 2-7.e1.
Glyn-Jones S, Thomas G E, Garfjeld-Roberts P, Gundle R, Taylor A, McLardy-Smith P, Murray D W. The John Charnley Award: Highly crosslinked polyethylene in total hip arthroplasty decreases long-term wear: a double-blind randomized trial. Clin Orthop Relat Res 2015; 473(2): 432-8.
Hopper R H Jr, Young A M, Engh C A Jr, McAuley J P. Assessing the pattern of femoral head penetration after total hip arthroplasty. J Arthroplasty 2004; 19(7 Suppl. 2): 22-9.
Jacobs J J, Roebuck K A, Archibeck M, Hallab N J, Glant T T. Osteolysis: basic science. Clin Orthop Relat Res 2001; (393): 71-7.
Johanson P E, Digas G, Herberts P, Thanner J, Kärrholm J. Highly crosslinked polyethylene does not reduce aseptic loosening in cemented THA: 10-year findings of a randomized study. Clin Orthop Relat Res 2012; 470(11): 3083-93.
Kärrholm J, Herberts P, Hultmark P, Malchau H, Nivbrant B, Thanner J. Radiostereometry of hip prostheses: review of methodology and clinical results. Clin Orthop Relat Res 1997; (344): 94-110.
Kuzyk P R, Saccone M, Sprague S, Simunovic N, Bhandari M, Schemitsch E H. Cross-linked versus conventional polyethylene for total hip replacement: a meta-analysis of randomised controlled trials. J Bone Joint Surg Br 2011; 93(5): 593-600.
McKellop H A, Shen F W, Campbell P, Ota T. Effect of molecular weight, calcium stearate, and sterilization methods on the wear of ultra high molecular weight polyethylene acetabular cups in a hip joint simulator. J Orthop Res 1999; 17(3): 329-39.
Nilsdotter A K, Lohmander L S, Klassbo M, Roos E M. Hip disability and osteoarthritis outcome score (HOOS): validity and responsiveness in total hip replacement. BMC Musculoskelet Disord 2003; 4: 10.
Pijls B G, Nieuwenhuijse M J, Fiocco M, Plevier J W, Middeldorp S, Nelissen R G, Valstar E R. Early proximal migration of cups is associated with late revision in THA: a systematic review and meta-analysis of 26 RSA studies and 49 survival studies. Acta Orthop 2012; 83(6): 583-91.
Raman R, Dickson D, Sharma H, Angus P, Shaw C, Johnson G, Graham A. Long-term results of cementless fully HAC coated acetabular cups in primary hip arthroplasty. Orthopaedic Proceedings 2012; 94 (Suppl. 23): 224.
Reynolds S E, Malkani A L, Ramakrishnan R, Yakkanti M R. Wear analysis of first-generation highly cross-linked polyethylene in primary total hip arthroplasty: an average 9-year follow-up. J Arthroplasty 2012; 27(6): 1064-8.
Selvik G. Roentgen stereophotogrammetry: a method for the study of the kinematics of the skeletal system. Acta Orthop Scand 1989; (Suppl 232): 1-51.
Snir N, Kaye I D, Klifto C S, Hamula M J, Wolfson T S, Schwarzkopf R, Jaffe F F. 10-year follow-up wear analysis of first-generation highly crosslinked polyethylene in primary total hip arthroplasty. J Arthroplasty 2014; 29(3): 630-3.
Stilling M, Kold S, de Raedt S, Andersen N T, Rahbek O, Søballe K. Superior accuracy of model-based radiostereometric analysis for measurement of polyethylene wear: a phantom study. Bone Joint Res 2012; 1(8): 180-91.
Sychterz C J, Engh C A Jr, Yang A, Engh C A. Analysis of temporal wear patterns of porous-coated acetabular components: distinguishing between true wear and so-called bedding-in. J Bone Joint Surg Am 1999; 81(6): 821-30.
Teeter M G, Yuan X, Somerville L E, MacDonald S J, McCalden R W, Naudie D D. Thirteen-year wear rate comparison of highly crosslinked and conventional polyethylene in total hip arthroplasty: long-term follow-up of a prospective randomized controlled trial. Can J Surg 2017; 60(3): 212-6.
Thomas G E, Simpson D J, Mehmood S, Taylor A, McLardy-Smith P, Gill H S, Murray D W, Glyn-Jones S. The seven-year wear of highly cross-linked polyethylene in total hip arthroplasty: a double-blind, randomized controlled trial using radiostereometric analysis. J Bone Joint Surg Am 2011; 93(8): 716-22.
Tian J L, Sun L, Hu R Y, Han W, Tian X B. Correlation of cup inclination angle with liner wear for metal-on-polyethylene in hip primary arthroplasty. Orthop Surg 2017; 9(2): 186-90.
Tsukamoto M, Mori T, Ohnishi H, Uchida S, Sakai A. Highly cross-linked polyethylene reduces osteolysis incidence and wear-related reoperation rate in cementless total hip arthroplasty compared with conventional polyethylene at a mean 12-year follow-up. J Arthroplasty 2017; 32(12): 3771-6.
Valstar E R, Gill R, Ryd L, Flivik G, Börlin N, Kärrholm J. Guidelines for standardization of radiostereometry (RSA) of implants. Acta Orthop 2005; 76(4): 563-72.
van Loon J, Hoornenborg D, Sierevelt I, Opdam K T, Kerkhoffs G M, Haverkamp D. Highly cross-linked versus conventional polyethylene inserts in total hip arthroplasty, a five-year Roentgen stereophotogrammetric analysis randomised controlled trial. World J Orthop 2020; 11(10): 442-52.
Weber E, Sundberg M, Flivik G. Design modifications of the uncemented Furlong hip stem result in minor early subsidence but do not affect further stability: a randomized controlled RSA study with 5-year follow-up. Acta Orthop 2014; 85(6): 556-61.


Acta Orthopaedica 2021; 92 (5): 575–580

Hip dysplasia is not uncommon but frequently overlooked: a cross-sectional study based on radiographic examination of 1,870 adults

Rebecka LEIDE 1,2, Anna BOHMAN 3, Daniel WENGER 1,4, Søren OVERGAARD 5,6, Carl Johan TIDERIUS 1,4,a, and Cecilia ROGMARK 1,4,a

1 Department of Clinical Sciences, Lund University, Lund, Sweden; 2 Department of Orthopedics, Halland Hospital, Halmstad, Sweden; 3 Department of Emergency Medicine, Central Hospital, Kristianstad, Sweden; 4 Department of Orthopedics, Skåne University Hospital, Lund and Malmö, Sweden; 5 Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark; 6 Department of Orthopaedic Surgery and Traumatology, Copenhagen University Hospital, Bispebjerg, Denmark
a Equal senior authors.
Correspondence: rebecka.leide@med.lu.se
Submitted 2021-01-17. Accepted 2021-05-03.

Background and purpose — Hip dysplasia in adults is a deformity in which the acetabulum inadequately covers the femoral head. The prevalence is sparingly described in the literature. We investigated the prevalence in Malmö (Sweden) and assessed whether the condition was recognized in the radiology reports. Subjects and methods — All pelvic radiographs performed in Malmö during 2007–2008 on subjects aged 20–70 years with a Swedish personal identity number were assessed. 1,870 digital radiographs were eligible for analysis. The lateral center-edge angle (LCEA) and acetabular index angle (AIA) were measured. Hip dysplasia was defined as an LCEA ≤ 20°. Intraclass correlation coefficients (ICC) for intra-observer measurements ranged from 0.87 (AIA, 95% CI 0.78–0.93) to 0.98 (LCEA, CI 0.97–0.99). Results — The prevalence of hip dysplasia (LCEA ≤ 20°) was 5.2% (CI 4.3–6.3), (98/1,870). There was no statistically significant difference between the sexes for either prevalence of hip dysplasia or mean LCEA. The mean AIA was 0.9° (CI 0.3–1.3) higher in men (4.1 SD 5.5) compared with women (3.2 SD 5.4). The radiologists had reported hip dysplasia in 7 of the 98 cases. Interpretation — The prevalence of hip dysplasia in Malmö (Sweden) is similar to previously reported data from Copenhagen (Denmark) and Bergen (Norway). Our results indicate that hip dysplasia is often overlooked by radiologists, which may influence patient treatment.

Hip dysplasia is an anatomical deformity defined by a reduced lateral center-edge angle (LCEA) expressing insufficient acetabular coverage of the femoral head. An angle ≤ 20° is considered pathologic, whereas an angle between 21° and 25° is said to be “borderline” (Wiberg 1939, Fredensborg 1976, Ogata et al. 1990, Jacobsen and Sonne-Holm 2005). The acetabular index angle (AIA) describes the slope of the acetabular roof (Tönnis 1976) and a normal range has been suggested as 3° to 13° (Tannast et al. 2015a). Adult hip dysplasia ranges from being an asymptomatic anatomic variation to a painful disease. Diagnosis requires referral for an anteroposterior (AP) radiograph of the pelvis. Although the radiographic measurements have been known for decades, a diagnostic delay is common as radiologists and clinicians often overlook the deformity (Nunley et al. 2011). The prevalence of hip dysplasia varies from 2% to 8% in the few previous studies and the definition of the diagnosis based on the LCEA is inconsistent (Croft et al. 1991, Smith et al. 1995, Inoue et al. 2000, Jacobsen and Sonne-Holm 2005, Engesaeter et al. 2013). The prevalence has not been studied in Sweden before. In an international comparison, we perceive adult hip dysplasia to be a seldom discussed diagnosis in Sweden. Therefore, we determined the prevalence of hip dysplasia in Malmö, an urban area in southern Sweden, and investigated whether hip dysplasia was recognized in radiologists’ reports.

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1936918

Subjects and methods Study design and population For this retrospective cross-sectional study, all AP pelvic radiographs performed during 2007–2008 at Skåne University



AP pelvic radiographs performed during 2007–2008 at Skåne University Hospital in Malmö (n = 10,658)
Excluded (n = 8,788):
– no Swedish personal identity number, 39
– age < 20 years, 757
– age > 70 years, 6,553
– all but first AP radiographs, 268
– various reasons during radiographic assessment, 1,171
Included in the analysis (n = 1,870)

Figure 1. Flowchart of exclusion steps resulting in the study population of 1,870 subjects. For detailed exclusion criteria see “Study design and population”.

Hospital in Malmö were assessed for eligibility (n = 10,658). Inclusion criteria were a Swedish personal identity number and age 20–70 years. The age span was chosen to ensure full skeletal maturity and to diminish the risk of age-associated degenerative changes that could influence the measurement quality. The study period (2007–2008) and the requirement of a Swedish personal identity number were chosen to enable future long-term follow-up of cases identified with hip dysplasia. We included only the 1st image in subjects with repeated radiographic examinations during the study period. The following exclusion criteria were applied during radiographic assessment of the remaining radiographs: foramen obturator index outside 0.7–1.8 (see section “Radiographic measurements”), osteoarthritis (OA), hip implants, hip fracture, acetabular fracture, major skeletal tumor, grossly displaced pelvic fracture, a history of childhood hip disorder, inflammatory joint disease, avascular necrosis of the femoral head, skeletal deformity in the hip joint due to neurological disorder, and poor imaging quality. OA was considered present if there was joint space narrowing on the right and/or left side. The assessment was based on information available in the referral, the radiology report, and the observer’s (RL: MD, resident orthopedic surgeon and PhD student in orthopedics) own assessment of the radiographic image. In uncertain cases, images were discussed with a senior orthopedic consultant (CJT, SO, or CR). Subjects’ age, sex, the reason for referral and the radiologist’s statements were collected. Referral reports were read to divide the material into trauma (n = 928) and non-trauma (n = 2,113). The referral reports and radiology statements were digital and available in immediate connection with the radiographs, i.e., no reports were lost. During assessment, 117 subjects were found to only have a referral for radiographic examination. 
As these subjects never underwent radiographic examination, the issue was considered as an error in the archive rather than missing data. After finalizing the exclusion steps, 1,870 subjects were eligible for inclusion (Figure 1).

Figure 2. Example of output using the 2D dysplasia guide of the Sectra Planning System. AI = acetabular index angle (termed AIA in the text); CE = centre edge angle (termed LCEA in the text).

Radiographic measurements The normal routine at Skåne University Hospital in Malmö is to include an AP projection of the pelvis when performing radiographic examination of the hip. An AP of the pelvis is necessary to enable measurement of the LCEA. The LCEA was defined as the angle between 2 lines drawn through the center of the femoral head, the 1st line perpendicular to the horizontal line and the 2nd line drawn to the lateral subchondral sclerotic zone of the acetabular roof, the so-called “sourcil.” The horizontal line was defined as the line between the center of the femoral heads, according to Wiberg’s original description of the LCEA (Wiberg 1939) (Figure 2). The AIA was defined as the angle between the horizontal line and a line between the lateral and medial margin of the sourcil (Tönnis 1976), and was used as a complementing description of the anatomy. To determine the degree of pelvic rotation in the axial plane, the foramen obturator index (FOI) was used. FOI equals the widest horizontal diameter of the right foramen obturator divided by the widest horizontal diameter of the left foramen obturator (Tönnis 1976). A FOI between 0.7 and 1.8 is recommended when assessing the LCEA, as greater pelvic rotation may affect LCEA measurements (Jacobsen et al. 2004). Measurements were performed by a single observer (RL) using a hip dysplasia guide in the radiography software (see below), where the LCEA and AIA are obtained in whole degrees.  Prevalence of hip dysplasia Prevalence was defined as the proportion of the study population with hip dysplasia in 1 or both hips. Hip dysplasia was defined as an LCEA ≤ 20° and borderline hip dysplasia was defined as an LCEA of ≤ 25°.
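The definitions above translate directly into a few lines of code. The following is a minimal sketch, not the Sectra software or the authors' procedure. All function names are hypothetical, and borderline is coded here as an LCEA of 21° to 25°, as in the introduction. The Wilson score interval is an assumed choice, since the method behind the reported confidence interval is not stated in this section.

```python
from math import sqrt
from statistics import NormalDist


def foi(right_width, left_width):
    """Foramen obturator index: widest horizontal diameter, right / left."""
    return right_width / left_width


def foi_acceptable(value, low=0.7, high=1.8):
    """True when pelvic rotation is within the recommended FOI range."""
    return low <= value <= high


def classify_hip(lcea_degrees):
    """Classify one hip from its LCEA, measured in whole degrees."""
    if lcea_degrees <= 20:
        return "dysplasia"
    if lcea_degrees <= 25:
        return "borderline"
    return "normal"


def subject_has_dysplasia(lcea_right, lcea_left):
    """A subject counts as a case if 1 or both hips have an LCEA <= 20."""
    return min(lcea_right, lcea_left) <= 20


def wilson_interval(cases, n, level=0.95):
    """Wilson score interval for a proportion (assumed method)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = cases / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Case count and sample size as reported in the abstract.
low, high = wilson_interval(98, 1870)
print(f"{98 / 1870:.1%} ({low:.1%} to {high:.1%})")
```

With the 98 cases among 1,870 subjects reported in the abstract, this gives about 5.2% (4.3% to 6.3%), in line with the published prevalence and interval.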


Acta Orthopaedica 2021; 92 (5): 575–580


Table 1. Intra-observer reliability for the LCEA and AIA

                  Systematic error (°)   Random error (°)   ICC (95% CI)
LCEA right hip     0.05                  0.62               0.98 (0.97–0.99)
LCEA left hip      0.06                  0.57               0.98 (0.97–0.99)
AIA right hip     –0.33                  1.56               0.87 (0.79–0.93)
AIA left hip      –0.35                  1.57               0.87 (0.78–0.93)

LCEA = lateral center-edge angle; AIA = acetabular index angle; ICC = intraclass correlation coefficient, CI = confidence interval.
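The systematic and random errors in Table 1 follow the formulas given under Statistics and software. A minimal Python sketch of those two formulas is given here, assuming paired repeated readings in degrees. The function name and the example readings are hypothetical (they are not study data), and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify it.

```python
import math
import statistics


def intra_observer_errors(first, second):
    """Return (systematic_error, random_error) for paired repeated readings."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least 2 readings")
    # Systematic error: (mean of measurement 1 - mean of measurement 2) / 2
    systematic = (statistics.mean(first) - statistics.mean(second)) / 2
    # Random error: SD of the paired differences, each divided by sqrt(2)
    scaled = [(a - b) / math.sqrt(2) for a, b in zip(first, second)]
    random_error = statistics.stdev(scaled)  # sample SD (assumption)
    return systematic, random_error


# Hypothetical LCEA readings in degrees, 1st and 2nd assessment
reading_1 = [30, 32, 28, 35]
reading_2 = [30, 31, 29, 33]
systematic, random_error = intra_observer_errors(reading_1, reading_2)
```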

Mention of hip dysplasia in radiology reports
Radiology reports were read to register whether the presence of hip dysplasia was mentioned, either in exact terms or by describing typical features.

Intra-observer reliability
To assess the intra-observer reliability of the LCEA and AIA measurements, repeated measures on 50 randomly selected subjects were performed by RL 2 months after finalizing the 1st assessment. The intraclass correlation coefficient (ICC) for the LCEA was excellent (> 0.9) according to Koo and Li (2016); 0.98 (CI 0.97–0.99) for both right and left hips. The ICC for the AIA was good (0.75–0.9) according to Koo and Li (2016) for both hips; 0.87 (CI 0.79–0.93) for right hips and 0.87 (CI 0.78–0.93) for left hips (Table 1). No 2nd reading was performed to assess inter-observer reliability.

Statistics and software
Before data collection, a power analysis showed that 1,400 subjects were needed to estimate the dysplasia prevalence with a precision of ±1 percentage unit. Means (SD) are presented for normally distributed variables, and medians (range) are presented for non-normally distributed variables. 95% confidence intervals (CI) were calculated. Continuous variables were considered normally distributed based on visual appearance of histograms, similarity between median and mean value, and skewness between –3 and 3. The normal distributions of the LCEA and AIA are presented together with a range of minus and plus 2 SDs from the mean. For inferential statistics, a significance level of < 0.05 was chosen. The Mann–Whitney U-test was used for group comparison of non-normally distributed variables between independent groups. For group comparison of normally distributed variables, Student's t-test for independent samples (2-tailed) was used for independent groups and paired t-test (2-tailed) for dependent groups. Pearson's correlation coefficient was calculated to estimate correlation between the LCEA and AIA. A chi-square test was used to compare prevalences between groups. The Clopper–Pearson method was used to calculate 95% CI for proportions. To describe the intra-observer reliability for LCEA and AIA measurements, the systematic error = (mean of measurement 1 – mean of measurement 2)/2, the random error = SD((measurement 1 – measurement 2)/√2), and the ICC were calculated. ICC estimates were calculated with 95% CI and interpreted according to Koo and Li (2016). Statistical analyses were performed with IBM SPSS Statistics version 25 (IBM Corp, Armonk, NY, USA). Radiographs were stored and viewed using Sectra PACS (Sectra IDS7 v21.1, Sectra AB, Linköping, Sweden). Radiographic measurements were performed using the dysplasia guide of the Sectra 2D Planning System (Sectra Orthostation Package, version 10.1).

Ethics, funding, and potential conflicts of interest
The study was approved by the Regional Ethics Review Board on 2016-01-26 (2015/910). Subjects' consent was of opt-out type. Funding was received from the Greta and Johan Kock Foundation, Erik and Angelica Sparre Foundation, Swedish Research Council funding for clinical research in medicine (ALF), and Skåne University Hospital Foundation. None of the authors have any conflicts of interest to declare.

Report
The STROBE guidelines for cross-sectional studies were used for the reporting of this study.
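The Clopper–Pearson interval named under Statistics and software can be sketched in plain Python as follows. This is an illustration only: the study used SPSS, and the bisection approach, function names, and iteration count here are assumptions chosen to keep the example free of third-party libraries.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_coef = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        total += math.exp(log_coef + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def _solve_increasing(fn, target, iterations=60):
    """Bisection on (0, 1) for an increasing function fn with fn(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for the proportion k / n."""
    # Lower limit: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper limit: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1.0 - binom_cdf(k, n, p), 1.0 - alpha / 2)
    return lower, upper


# Counts taken from the Results: 98 subjects with hip dysplasia among 1,870
lower, upper = clopper_pearson(98, 1870)
print(f"{98 / 1870:.1%} ({lower:.1%} to {upper:.1%})")
```

Rounded to one decimal, the printed interval can be compared with the prevalence and CI reported for the total cohort in Table 2.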

Results

Demographics
The median age of the 1,870 included subjects was 53 years (20–70) and 63% (n = 1,171) were female. The 1,171 subjects who were excluded after radiographic assessment were older, 58 years (20–70), p < 0.001. 28% (n = 530) of the included subjects were examined due to trauma and the rest for other reasons.

Prevalence of hip dysplasia and mention of hip dysplasia in radiology reports
We found 98 subjects with hip dysplasia, resulting in a prevalence of 5.2%. 23% (n = 23) of these had bilateral findings (Table 2). There was no statistically significant difference in prevalence between women and men, 5.6% vs. 4.6% (Table 2). Nor was there any statistically significant difference in prevalence between subjects who were examined due to trauma compared with subjects who were examined for other causes, 6.4% (CI 4.5–8.8) vs. 4.8% (CI 3.7–6.1). 21% (n = 400) had borderline hip dysplasia (Table 2). In 91 of the 98 cases with hip dysplasia, there was no comment on the condition in the radiology report.

Radiographic measurements
There was no statistically significant sex-related difference regarding the mean LCEA; female mean LCEA 33° (SD 6.6), male mean LCEA 32° (SD 5.8), mean difference 0.5 (CI –0.1 to 1.0). Right hips had 1.6° (CI 1.4–1.8) lower LCEA than left hips, 32° (SD 6.9) vs. 33° (SD 6.6) (Figure 3). The mean AIA was 4.2° (SD 4.7) in male subjects and 3.3° (SD 5.1) in female subjects; mean difference was 0.9° (CI 0.4–1.3). Right hips had 0.9° (CI 0.7–1.1) higher AIA compared with left hips; 4.1° (SD 5.5) vs. 3.2° (SD 5.4) (Figures 3 and 4). There was a strong, negative correlation between the LCEA and AIA; Pearson's correlation coefficient was –0.77 (p < 0.001) for right hips and –0.76 (p < 0.001) for left hips. Among hips with hip dysplasia (LCEA ≤ 20°), the mean AIA was 13.3° (SD 4.3) for right hips (n = 82), and 13.9° (SD 3.8) for left hips (n = 39), respectively.

Table 2. Prevalence of hip dysplasia (LCEA ≤ 20°), and borderline hip dysplasia (LCEA ≤ 25°) in 1,870 subjects (699 men/1,171 women). Values are count, prevalence (%) with 95% confidence interval

              Either               Bilateral            Right                Left
LCEA ≤ 20°
  Total       98  5.2 (4.3–6.3)    23  1.2 (0.8–1.8)    82  4.4 (3.5–5.4)    39  2.1 (1.5–2.8)
  Men         32  4.6 (3.2–6.4)     5  0.7 (0.2–1.7)    25  3.6 (2.3–5.2)    12  1.7 (0.9–3.0)
  Women       66  5.6 (4.4–7.1)    18  1.5 (0.9–2.4)    57  4.9 (3.7–6.3)    27  2.3 (1.5–3.3)
LCEA ≤ 25°
  Total      400  21 (20–23)      150  8.0 (6.8–9.3)   331  18 (16–20)      219  12 (10–13)
  Men        147  21 (18–24)       48  6.9 (5.1–9.0)   120  17 (14–20)       75  11 (8.5–13)
  Women      253  22 (19–24)      102  8.7 (7.2–11)    211  18 (16–20)      144  12 (11–14)

LCEA = lateral center-edge angle

Figure 3. Distribution of the right and left LCEA in 1,870 adults. For right hips, the mean LCEA was 32 (SD 6.9) and the range of 2 SDs 18.1–45.6. For left hips, the mean LCEA was 33 (SD 6.6) and the range of 2 SDs 20.2–46.7.

Figure 4. Distribution of the right and left AIA in 1,870 adults. For right hips, the mean AIA was 4.1 (SD 5.5) and the range of 2 SDs –6.9 to 15.0. For left hips, the mean AIA was 3.2 (SD 5.4) and the range of 2 SDs –7.7 to 13.9.

Discussion

A long tradition of clinical screening for developmental dysplasia of the hip (DDH) in neonates (Wenger et al. 2019) may have led to the misconception that adult hip dysplasia is not a concern in Sweden. Adult hip dysplasia and DDH have pathoanatomic similarities, but the link between them, if there is one, is not clearly understood. We found a prevalence of 5.2% for adult hip dysplasia in our cohort and only 7% of the identified cases were described in the radiology reports. Our results were based on an LCEA ≤ 20°, which is the same definition as in a Danish study that reported results in line with ours; 5.4% (Jacobsen and Sonne-Holm 2005). A Norwegian study, which used a cut-off of an LCEA < 20°, reported a prevalence of 3.3% among 19-year-olds (Engesaeter et al. 2013). Comparisons with other studies are more difficult because their radiographic measurements were assessed on urograms and in most cases also with higher cut-off values: Croft et al. (1991) reported a prevalence of an LCEA ≤ 20° of 1.6% in a male population, Smith et al. (1995) a prevalence of an LCEA < 25° of 3.8% in a female population and Inoue et al. (2000) a prevalence of an LCEA < 25° of 8.1% among Japanese subjects and 2.9% among French subjects. Regarding the AIA, we found a clear
negative correlation with the LCEA, which is in accordance with previous studies (Werner et al. 2012, Zhang et al. 2020). The cut-off values 20° and 25° were originally proposed by Wiberg (1939) in relation to hip development following childhood hip disease. If borderline dysplasia (LCEA ≤ 25°) is to be considered pathologic, the prevalence in both our cohort and the Norwegian cohort (Engesaeter et al. 2013) would have been around 20%. Inoue et al. (2000) reported women as having a higher prevalence than men in French and Japanese adults, using a cut-off value of LCEA < 25°. Engesaeter et al. (2013) concluded the same in Norwegian 19-year-olds, suggesting the prevalence for women to be 4.3% and for men 2.4%, using the cut-off value LCEA < 20°. We did not find any statistically significant difference between men and women in our study. This may be explained by different inclusion criteria, real differences between populations, and/or a type II error. In the case of a falsely accepted null hypothesis, a prevalence that differs only a few percentage units between sexes would not be clinically significant. As suggested before (Engesaeter et al. 2013), we found right hips more often to be dysplastic than left hips. Along with that, we found a higher AIA in right hips, indicating a steeper acetabulum. In addition, re-directional periacetabular osteotomy (PAO) was more frequently performed on right hips compared with left hips (770 vs. 615) in a recent prospective study of PAO outcome (Larsen et al. 2020). Together, these results indicate that adult hip dysplasia may be more common in right hips. We investigated whether or not the hip dysplasia was detected during radiographic examination in standard care. To our knowledge, this has not been previously studied. Our results suggest that a vast majority of hip dysplasia cases may be overlooked by Swedish radiologists. 
Underreading error has been shown to be among the most common of radiological errors, and delayed diagnosis due to radiological errors is most frequent in the musculoskeletal section (Kim and Mansfield 2014). In the current aspect, underreading of hip dysplasia may be reduced by education and raised awareness of the diagnosis among radiologists. Another important factor is for the referring clinician to ask for signs of dysplasia in relevant cases, i.e., awareness of the diagnosis needs to be increased amongst general practitioners and orthopedic surgeons as well. We acknowledge some limitations to this study. We did not adjust for pelvic tilt (Wiberg 1939, Jacobsen et al. 2004). The distance between the coccyx and the symphysis can be used for this purpose, but has been considered difficult to identify on radiographs (Laborie et al. 2011) and does not affect the LCEA to a clinically relevant extent (Tannast et al. 2015b). OA was not scored according to a common classification system such as Kellgren and Lawrence (K&L). It is therefore possible that some individuals with a K&L grade 1 were included in the study. However, we have no reason to assume that this has influenced our results. Moreover, as hip dysplasia is a risk factor for OA (Jacobsen and Sonne-Holm


2005), exclusion of subjects with OA might lower the prevalence of hip dysplasia. However, it was a deliberate choice of exclusion as measurement of the LCEA has been shown to be affected by degenerative changes (Ipach et al. 2014). Furthermore, the radiographic assessments and measurements were performed by a resident orthopedic surgeon and not a radiologist. Studies have shown good inter-observer reliability for radiographic measurements between observers with varying experience (Tiderius et al. 2004, Herngren et al. 2018) and that radiologic technologists with appropriate training can interpret radiographs accurately (Piper et al. 2005, Woznitza et al. 2018). In our study, the observer had extensive training prior to the data collection and continuous support from senior orthopedic consultants during the readings. Lastly, the cohort consists of subjects who actively sought medical care, and not a random sample of the population. We believe our finding of similar prevalence amongst trauma and nontrauma radiographs speaks in favor of our sample being close to the general population and thus has not biased our results. By assessing the referrals, a number of the non-trauma cases were found to have back pain, knee pain, and screening for bone metastases as cause for their pelvic radiographic examination. If the group with non-trauma radiographs had only consisted of individuals with apparent hip symptoms, we might have found a higher prevalence of hip dysplasia compared with the group with trauma-related radiographs. Conclusion Hip dysplasia was present in 5.2% of our cohort consisting of Swedish adults. The condition was mentioned in the radiology report in less than 1 in 10 cases with hip dysplasia, indicating that neither the referring clinicians nor the patients were informed of this potential cause of symptoms from the hip area. 
In perspective, our findings indicate a need for raised awareness of how common adult hip dysplasia is, so that general practitioners, orthopedic surgeons, and radiologists take it into account as a differential diagnosis during examination, referral, and radiographic evaluation. Among Swedish patients with adult hip dysplasia, there is probably an unmet need for proper information concerning their condition, including treatment alternatives for those with symptoms.

RL: Study design, data collection, radiographic measurements, statistical analysis, and manuscript writing. AB: Study design, data collection, and manuscript writing. DW: Study design and critical review of manuscript. SO, CJT: Study design, consultation on uncertain cases, and critical review of manuscript. CR: Study design, manuscript writing, consultation on uncertain cases, and critical review of manuscript. The authors thank Jan Åke Nilsson for statistical support. Acta thanks Marc Romijn and Matti Seppänen for help with peer review of this study.



Croft P, Cooper C, Wickham C, Coggon D. Osteoarthritis of the hip and acetabular dysplasia. Ann Rheum Dis 1991; 50(5): 308-10.
Engesaeter I O, Laborie L B, Lehmann T G, Fevang J M, Lie S A, Engesaeter L B, Rosendahl K. Prevalence of radiographic findings associated with hip dysplasia in a population-based cohort of 2081 19-year-old Norwegians. Bone Joint J 2013; 95-B(2): 279-85.
Fredensborg N. The CE angle of normal hips. Acta Orthop Scand 1976; 47(4): 403-5.
Herngren B, Lindell M, Hägglund G. Good inter- and intraobserver reliability for assessment of the slip angle in 77 hip radiographs of children with a slipped capital femoral epiphysis. Acta Orthop 2018; 89(2): 217-21.
Inoue K, Wicart P, Kawasaki T, Huang J, Ushiyama T, Hukuda S, Courpied J. Prevalence of hip osteoarthritis and acetabular dysplasia in French and Japanese adults. Rheumatology (Oxford) 2000; 39(7): 745-8.
Ipach I, Rondak I C, Sachsenmaier S, Buck E, Syha R, Mittag F. Radiographic signs for detection of femoroacetabular impingement and hip dysplasia should be carefully used in patients with osteoarthritis of the hip. BMC Musculoskelet Disord 2014; 15: 150.
Jacobsen S, Sonne-Holm S. Hip dysplasia: a significant risk factor for the development of hip osteoarthritis. A cross-sectional survey. Rheumatology (Oxford) 2005; 44(2): 211-18.
Jacobsen S, Sonne-Holm S, Lund B, Soballe K, Kiaer T, Rovsing H, Monrad H. Pelvic orientation and assessment of hip dysplasia in adults. Acta Orthop Scand 2004; 75(6): 721-9.
Kim Y W, Mansfield L T. Fool me twice: delayed diagnoses in radiology with emphasis on perpetuated errors. AJR Am J Roentgenol 2014; 202(3): 465-70.
Koo T K, Li M Y. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 2016; 15(2): 155-63.
Laborie L B, Lehmann T G, Engesaeter I O, Eastwood D M, Engesaeter L B, Rosendahl K. Prevalence of radiographic findings thought to be associated with femoroacetabular impingement in a population-based cohort of 2081 healthy young adults. Radiology 2011; 260(2): 494-502.
Larsen J B, Mechlenburg I, Jakobsen S S, Thilleman T M, Søballe K. 14-year hip survivorship after periacetabular osteotomy: a follow-up study on 1,385 hips. Acta Orthop 2020; 91(3): 299-305.
Nunley R M, Prather H, Hunt D, Schoenecker P L, Clohisy J C. Clinical presentation of symptomatic acetabular dysplasia in skeletally mature patients. J Bone Joint Surg Am 2011; 93(Suppl. 2): 17-21.
Ogata S, Moriya H, Tsuchiya K, Akita T, Kamegaya M, Someya M. Acetabular cover in congenital dislocation of the hip. J Bone Joint Surg Br 1990; 72(2): 190-6.
Piper K J, Paterson A M, Godfrey R C. Accuracy of radiographers' reports in the interpretation of radiographic examinations of the skeletal system: a review of 6796 cases. Radiography 2005; 11(1): 27-34.
Smith R W, Egger P, Coggon D, Cawley M I, Cooper C. Osteoarthritis of the hip joint and acetabular dysplasia in women. Ann Rheum Dis 1995; 54(3): 179-81.
Tannast M, Hanke M S, Zheng G, Steppacher S D, Siebenrock K A. What are the radiographic reference values for acetabular under- and overcoverage? Clin Orthop Relat Res 2015a; 473(4): 1234-46.
Tannast M, Fritsch S, Zheng G, Siebenrock K A, Steppacher S D. Which radiographic hip parameters do not have to be corrected for pelvic rotation and tilt? Clin Orthop Relat Res 2015b; 473(4): 1255-66.
Tiderius C J, Tjörnstrand J, Akeson P, Södersten K, Dahlberg L, Leander P. Delayed gadolinium-enhanced MRI of cartilage (dGEMRIC): intra- and interobserver variability in standardized drawing of regions of interest. Acta Radiol 2004; 45(6): 628-34.
Tönnis D. Normal values of the hip joint for the evaluation of X-rays in children and adults. Clin Orthop Relat Res 1976; (119): 39-47.
Wenger D, Düppe H, Nilsson J, Tiderius C J. Incidence of late-diagnosed hip dislocation after universal clinical screening in Sweden. JAMA Netw Open 2019; 2(11): e1914779.
Werner C M, Ramseier L E, Ruckstuhl T, Stromberg J, Copeland C E, Turen C H, Rufibach K, Bouaicha S. Normal values of Wiberg's lateral center-edge angle and Lequesne's acetabular index: a coxometric update. Skeletal Radiol 2012; 41(10): 1273-8.
Wiberg G. Studies on dysplastic acetabula and congenital subluxation of the hip. With special reference to the complication of osteoarthritis. Acta Chir Scand 1939; 83(58): 5-135.
Woznitza N, Piper K, Burke S, Bothamley G. Chest X-ray interpretation by radiographers is not inferior to radiologists: a multireader, multicase comparison using JAFROC (Jack-knife Alternative Free-response Receiver Operating Characteristics) Analysis. Acad Radiol 2018; 25(12): 1556-63.
Zhang D, Pan X, Zhang H, Luo D, Cheng H, Xiao K. The lateral center-edge angle as radiographic selection criteria for periacetabular osteotomy for developmental dysplasia of the hip in patients aged above 13 years. BMC Musculoskelet Disord 2020; 21(1): 493.


Acta Orthopaedica 2021; 92 (5): 581–588


Impact of socioeconomic status on the 90- and 365-day rate of revision and mortality after primary total hip arthroplasty: a cohort study based on 103,901 patients with osteoarthritis from national databases in Denmark Nina M EDWARDS 1, Claus VARNUM 2,3, Søren OVERGAARD 3,4, and Alma B PEDERSEN 1 1 Department of Clinical Epidemiology, Aarhus University Hospital; 2 Department of Orthopaedic Surgery, Lillebaelt Hospital—Vejle, and Department of Regional Health Research, University of Southern Denmark; 3 Danish Hip Arthroplasty Register; 4 Department of Orthopaedic Surgery and Traumatology, Copenhagen University Hospital, Bispebjerg, University of Copenhagen, and Department of Clinical Medicine, Faculty of Health and Medical Sciences, Denmark Correspondence: nme@clin.au.dk Submitted 2021-01-12. Accepted 2021-05-16.

Background and purpose: Socioeconomic inequality in health is recognized as an important public health issue. We examined whether socioeconomic status (SES) is associated with revision and mortality rates after total hip arthroplasty (THA) within 90 and 365 days.

Patients and methods: We obtained SES markers (cohabitation, education, income, and liquid assets) on 103,901 THA patients from Danish health registers (year 1995–2017). The outcomes were any revision (all revisions), specified revision (due to infection, fracture, or dislocation), and mortality. We used Cox regression analysis to estimate adjusted hazard ratio (aHR) of each outcome with 95% confidence interval (CI) for each SES marker.

Results: Within 90 days, the aHR for any revision was 1.3 (95% CI 1.1–1.4) for patients living alone vs. cohabiting. The aHR was 2.0 (CI 1.4–2.6) for low-income vs. high-income among patients < 65 years. The aHR was 1.2 (CI 0.9–1.7) for low liquid assets among patients > 65 years. Results were consistent for any revision within 365 days as well as for revisions due to infection, fracture, and dislocation. The aHR for mortality was 1.4 (CI 1.2–1.6) within 90 days and 1.3 (CI 1.2–1.5) within 365 days for patients living alone vs. cohabiting. Low education, low income, and low liquid assets were associated with increased mortality rate within both 90 and 365 days.

Interpretation: Our results suggest that living alone, low income, and low liquid assets were associated with increased revision and mortality up to 365 days after THA surgery. Optimizing medical conditions prior to surgery and implementing different post-THA support strategies with a focus on vulnerable patients may reduce complications associated with inequality.

Socioeconomic inequality in health is increasingly recognized as an important public health issue (Agabiti et al. 2007). Socioeconomic status (SES) is associated with access to total hip arthroplasty (THA) and with greater vulnerability to complications after THA, all in favor of high status (Agabiti et al. 2007, Weiss et al. 2019, Edwards et al. 2021). However, few studies have investigated the impact of SES on the risk of revision and mortality and they all present contradicting results, from showing that low SES was associated with a higher risk of early mortality after a THA, to finding no association among SES, revision, and mortality (Mahomed et al. 2003, Agabiti et al. 2007, Jenkins et al. 2009, Peltola and Järvelin 2014, Maradit Kremers et al. 2015). Previous research is limited by assessing SES by only a single marker, by lack of adjustment for important confounders and hospital factors, and by a lack of clinically relevant differentiation between time periods regarding risk assessment. Disparities in the risk of revision and mortality are important as social inequality is a growing problem in Denmark despite universal tax-supported healthcare (Sundhedsstyrelsen and Folkesundhed 2020). Even though the inequality in Denmark and most Nordic countries is less than the inequality seen in the United States (OECD 2019), our hypothesis is that even within this smaller spectrum of inequality, we shall find SES disparities concerning the risk of revision and mortality. By examining and identifying these disparities, we may be able to improve patient outcome by more focused risk assessment with proper counselling and optimization of medical risk factors prior to surgery and by implementing different postoperative strategies. We examined the association between multiple SES markers and the rates of any revision as well as revisions due to infection, fracture, and dislocation, and mortality within 90 and 365 days after THA.

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1935487



Patients and methods

Study design and setting
All Danish citizens are assigned a unique civil registration number at birth, which is included in all Danish registries, allowing for unambiguous record linkage on an individual level between multiple registers and almost complete long-term follow-up of all Danish inhabitants (Schmidt et al. 2014). For this study, we linked data from the Danish Civil Registration System (DCRS), which tracks vital status, migrations, and cohabitation status (Schmidt et al. 2014); the Danish Hip Arthroplasty Registry (DHR), which holds information on primary THA and revision surgeries with high completeness (91–98%) (Gundtoft et al. 2016); the Danish National Patient Registry (DNPR), which contains discharge dates and diagnoses from all hospitalizations since 1977, and outpatient clinic and emergency room contacts since 1995 (Schmidt et al. 2015); and Statistics Denmark, which contains detailed individual-based information on socioeconomic characteristics for all Danish citizens. This study is reported following the STROBE and RECORD guidelines.

Study population and outcome
We conducted a population-based cohort study using prospectively collected data from the DHR. We identified all patients over the age of 45 undergoing primary THA in Denmark from January 1, 1995 to December 31, 2017 with the primary diagnosis idiopathic osteoarthritis (OA). Only the first THA during 1995–2017 was included in the study cohort; if the patient received bilateral THA on the same date, only the right THA was included in the study. The outcomes were revision divided into any revision or revision specified as due to infection, fracture, or dislocation. Revisions were identified in the DHR and defined as any later surgical procedure involving the primary THA, including change of any component or debridement without removal of any part of the prosthesis (Gundtoft et al. 2016).
In addition, we studied mortality, defined by date of death due to any cause from the Danish Civil Registration System. All outcomes were evaluated within 90 or 365 days after the date of the primary THA procedure. Socioeconomic status For each THA patient, we retrieved information on SES using the following markers: cohabitating status, highest obtained education, mean family income, and mean family liquid assets. Cohabitating status was classified into 2 categories: living alone and cohabiting. Highest obtained education was classified into 3 categories: low, defined as none or elementary school; medium, defined as more than elementary school, but less than university completed; and high, defined as university degree completed. Since a large proportion of the THA patients are


senior citizens (> 65 years of age) with a state pension, family liquid asset was used as an SES marker in patients > 65 years of age, whereas family income was used as an SES marker in patients < 65 years of age (Robert and House 1996). This provides a more accurate estimate of overall socioeconomic stratification than using income and liquid assets through all ages (Robert and House 1996). To account for yearly variations in income and liquid assets, we calculated the average yearly total income and liquid assets in the 5 years prior to primary THA surgery for the patient and the patient’s cohabiting partner. According to tertiles, the family mean income and liquid assets were categorized into 3 groups of increasing amount: low, medium, and high (income: < €31,400, €31,400–49,400, ≥ €49,400; liquid assets: < €82,653, €82,653–240,068, ≥ €240,068, respectively). Covariates We collected information on the following variables recorded at the time of primary THA: 1) From the DCRS, we collected information on age and sex. 2) Data on comorbidities was obtained from the DNPR. Based on discharge diagnoses codes 10 years before primary THA, we calculated the Charlson Comorbidity Index (CCI) score adapted to administrative data for every patient. We defined 3 levels of comorbidity: a CCI score of 0 (low) was given to patients with no known comorbidities included in the CCI; a CCI score of 1–2 (medium); and a CCI score of 3 or more (high) (Charlson et al. 1987, Schmolders et al. 2015). Statistics We tabulated the patients’ characteristics by SES markers and calculated the cumulative incidence with 95% confidence intervals (CI) of revisions, starting follow-up at the date of primary THA and treating death as competing risk. Cumulative incidence curves were plotted for any revision; and mortality was calculated within 1 year by cohabitation, education, income, and liquid assets. 
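The cumulative incidence of revision with death treated as a competing risk, as described above, can be illustrated with a short Python sketch. The text does not name the estimator or software routine used for this step, so the nonparametric (Aalen–Johansen type) estimator below is an assumption, and the function name, status coding, and example follow-up data are hypothetical.

```python
from collections import Counter


def cumulative_incidence(times, statuses, cause=1):
    """Nonparametric cumulative incidence of `cause` with competing risks.

    statuses: 0 = censored, 1 = event of interest (e.g. revision),
              2 = competing event (e.g. death).
    Returns a list of (time, cumulative incidence) at each distinct time.
    """
    at_risk = len(times)
    by_time = {}
    for t, s in zip(times, statuses):
        by_time.setdefault(t, Counter())[s] += 1

    surv = 1.0  # probability of being free of any event just before time t
    cif = 0.0
    curve = []
    for t in sorted(by_time):
        counts = by_time[t]
        events_all = sum(n for s, n in counts.items() if s != 0)
        # Increment: event-free probability times the cause-specific hazard
        cif += surv * counts[cause] / at_risk
        surv *= 1 - events_all / at_risk
        at_risk -= sum(counts.values())
        curve.append((t, cif))
    return curve


# Hypothetical follow-up in days after primary THA
days = [30, 60, 90, 200, 365, 365]
status = [1, 2, 0, 1, 0, 0]
curve = cumulative_incidence(days, status)
```

Treating death as a competing event rather than as censoring keeps the estimate from overstating the revision risk, because patients who die can no longer be revised.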
We used a Cox proportional hazard model to calculate time to event estimating hazard ratios (HRs) with CI for each SES marker and evaluated with a distinction within 90 days and 365 days after THA. The association between the SES markers and revision rate was assessed by using a multilevel model with inclusion of random effects into the Cox proportional hazards model. Subjects who are nested within the same higher-level unit are likely to have outcomes that are correlated with one another. This within-cluster homogeneity may be induced by unmeasured cluster characteristics, here hospitals, that affect the outcome. The Cox model is enhanced when random effects are incorporated through terms to account for within-cluster homogeneity in the outcomes and allows the intercept to vary randomly across clusters. This denotes an increased or decreased hazard by clustering at hospital level (Austin 2017). The HRs were adjusted for potential confounders: age, sex, calendar year, and CCI. We considered the SES markers’ inter-


Acta Orthopaedica 2021; 92 (5): 581–588

Total hip arthroplasties recorded in the Danish Hip Arthroplasty Registry January 1, 1995 to December 31, 2017 in patients older than 45 years n = 168,094 Excluded (n = 64,193): – left THAs due to bilateral operations, 35,519 – other primary diagnosis than osteoarthritis, 28,674 Study cohort n = 103,901

583

Sex Cohabitation

Male

Age

Charlson comorbidity index

Female

Low

Medium

High

Cohabiting (n =63,659) Living alone (n =33,105) 0%

25%

50%

75%

100% 40

60

80

100

0%

25%

50%

75%

100%

0%

25%

50%

75%

100% 40

60

80

100 0%

25%

50%

75%

100%

50%

75%

100% 40

60

80

100

0%

25%

50%

75%

100%

75%

100% 40

60

80

100

0%

25%

50%

75%

100%

Education High (n =15,718) Medium (n =36,818) Low (n =43,260)

Figure 2. Flowchart of total hip arthroplasty (THA) cohort.

Income, tertiles High (n =36,344) Medium (n =35,455) Low (n =32,023) 0%

25%

Liquid assets, tertiles

Figure 3. Demographics with distribution of sex, age and Charlson comorbidity index score in the 4 SES markers. Sex and Charlson comorbity index score distribution is given in percent on the x-axis. Age distribution is shown with age on the x-axis, green line is median age and white lines marks the first and third quartile.

Gender

High (n =35,899) Medium (n =34,038) Low (n =30,623)

dependency, since a mutual adjustment for each SES marker would assume no effect of the common aspects of the SES markers, but that all effects are due to the unique characteristics of the different SES markers (Green and Popham 2019). By applying the directed acyclic graph method, only cohabiting status was evaluated as a true confounder when calculating the HR for revision by income and liquid assets (Figure 1, see Supplementary data). Forest plots were plotted including HRs for revision and mortality for each SES marker. The assumption of proportionality of hazards was fulfilled by calculation of scaled Schoenfeld residuals and by graphically assessing by plotting the residuals against time. The study period was from 1995 to 2017. We performed a sensitivity analysis to account for yearly variations. The period was divided into 2: 1995–2005 and 2006–2017. The statistical analyses were performed in STATA version 15 (StataCorp, College Station, TX, USA) and R version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria) with use of the coxme package (https://cran.r-project.org/ web/packages/coxme/index.html[AQ2]) to compute the Cox model and to estimate the HRs. Ethics, funding, and potential conflicts of interest The study was approved by the Danish Data Protection Agency (journal number 2015-57-0002) and Aarhus University (journal number 2016-051-000001). We would like to acknowledge the support from Helsefonden, the Orthopaedic Research Fund, the AP Møller Fund, and the Aase and Ejnar Danielsens Fund. The funders had no role in the study design, data collection and analysis, or in the preparation of the manuscript. The authors report no conflict of interest.


Results

Study population

We identified 168,094 THAs in the DHR. We excluded 35,519 left THAs due to bilateral THA surgery (keeping the right THA), and 28,674 THA patients due to a primary diagnosis other than hip OA. The final study population included 103,901 THA patients (Figure 2). The median follow-up was 7 years (0–23) for revision and 9 years (0–23) for mortality. Some patient characteristics were unevenly distributed across the different SES groups. The proportion of females ranged from 46% to 74%, with the highest proportion when patients lived alone, had the lowest and highest education, had the lowest income, and had the lowest liquid assets. The mean age in the different SES groups ranged from 65 to 74 years, lowest in the highest income category and highest in the lowest income category (Figure 3 and Table 1, see Supplementary data).

Socioeconomic status and revision

Within 90 days after primary THA, 1,364 (1.3% of the study population) revisions were identified, whereas within 365 days, 2,092 (2.0% of the study population) revisions were identified. The cumulative incidence of any revision at 1 year was highest among patients who lived alone (2.2%; CI 2.1–2.4), had the highest education (2.1%; CI 1.9–2.9), had the highest income (2.1%; CI 2.0–2.3), and had the lowest liquid assets (2.3%; CI 2.1–2.4) (Figure 4 and Table 2, see Supplementary data). In the stratified analysis, the cumulative incidence from 1995 to 2005 varied little in the SES categories. In the stratification from 2006 to 2017, the trends from the main analysis were enhanced except within level of income. Here the cumulative incidence of any revision was highest among patients who had the lowest income (Figures 5 and 6, see Supplementary data).

The 90-day results showed that living alone was associated with higher revision rates than cohabiting: the adjusted HRs were 1.3 (CI 1.1–1.4) for any revision, 1.3 (CI 1.1–1.7) for revision due to infection, 1.3 (CI 1.0–1.7) for revision due to fracture, and 1.2 (CI 1.0–1.5) for revision due to dislocation. Education was not associated with the 90-day revision rate. In contrast, the HR was 2.0 (CI 1.4–2.9) for any revision for patients with low income compared with patients with high income in the age group under 65 years. This association was not present in the age group over 65 years. Low liquid assets were associated with higher rates of any revision than high liquid assets. This was seen both among patients younger than 65 years of age (HR 1.2; CI 0.9–1.7) and those older than 65 years of age (HR 1.3; CI 1.1–1.5) (Figure 7 and Tables 3 and 4, see Supplementary data). In the stratified analysis, the HRs from the 1995–2005 and 2006–2017 periods were similar to the HRs from the overall 1995–2017 period (Tables 5 and 6, see Supplementary data). Similar results were seen for the 365-day adjusted HR for any revision and for revisions due to infection, fracture, and dislocation (Figure 8 and Tables 3 and 4, see Supplementary data).

[Figure 4 panels: Cohabitation (Alone, Cohabitant); Level of education, Level of income, and Level of liquid assets (Low, Medium, High). Y-axis: cumulative incidence (%) of any revision, 0.0–2.5; x-axis: days after total hip arthroplasty, 0–365.]

Figure 4. Cumulative incidence of any revision for the 4 SES markers.

Socioeconomic status and mortality

Within the first 90 days after surgery, 942 died (0.9% of the study population), whereas within the first 365 days, 2,251 died (2.2% of the study population). The cumulative incidence for mortality at 365 days was highest when patients lived alone (3.0%; CI 2.8–3.2), had the lowest education (2.4%; CI 2.3–2.5), had the lowest income (3.9%; CI 3.7–4.1), and had the lowest liquid assets (1.9%; CI 1.8–2.1) (Figure 9 and Table 2, see Supplementary data). The adjusted HR for the 90-day mortality was 1.4 (CI 1.1–1.6) among patients living alone compared with cohabiting patients. The adjusted HR for the 90-day mortality was 1.5 (CI 1.2–2.0) for patients with low education compared with patients with high education. Among patients younger than 65 years of age, the adjusted HR for the 90-day mortality was 2.4 (CI 1.4–4.2) for patients with low income compared with patients with high income. Similar results were seen in the age group over 65 years. Low liquid assets were associated with higher 90-day mortality rates than high liquid assets. This was seen both among patients younger than 65 years of age (HR 3.8; CI 1.8–7.5) and those older than 65 years of age (HR 1.5; CI 1.1–2.0). Similar results were obtained for mortality within 1 year (Figure 10 and Table 7, see Supplementary data).

Discussion

In this large nationwide cohort study of 103,901 patients, we observed substantial socioeconomic inequality in terms of revision and death after THA. Living alone, low income, and low liquid assets were associated with increased rates of revision and mortality after both 90 days and 365 days. In addition, low education was associated with an increased mortality rate.

SES markers

Social support can be defined as those resources in a person’s environment that enable the person to deal with life’s physical and psychological stress; a crude measurement may be obtained by dichotomizing patients according to cohabitation status (Brembo et al. 2017). We found a higher rate of revision due to fracture, infection, and dislocation when patients were living alone. This was seen especially within the first 90 days, where social support is important to maintain the household and everyday living arrangements and thereby reduce the risk of falling. In line with this, the presence of social support is, according to Brembo et al. (2017), associated with improved bodily pain and physical function outcomes in general, and a study by Weiss et al. (2019) found increased risk of readmission when living alone. Our findings indicate that low education, which is a non-modifiable factor, was only weakly associated with increased revision rate. This contrasts with findings by Maradit Kremers et al. (2015) and Weiss et al. (2019). Others argue that there is a positive correlation between education and psychological health and well-being



Figure 7 data. Each entry gives the number of events followed by the adjusted HR (95% CI); cohabitant and high are the reference categories.

Hazard ratio at 90 days for revision due to any cause
Cohabitation status: Alone 511, 1.27 (1.13–1.44); Cohabitant 768, 1 (ref)
Education: Low 557, 1.09 (0.93–1.29); Medium 504, 1.02 (0.87–1.20); High 213, 1 (ref)
Income, age < 65: Low 79, 1.99 (1.39–2.85); Medium 101, 1.31 (0.98–1.75); High 188, 1 (ref)
Income, age ≥ 65: Low 329, 0.96 (0.78–1.18); Medium 367, 0.94 (0.79–1.11); High 285, 1 (ref)
Liquid assets, age < 65: Low 115, 1.24 (0.91–1.69); Medium 107, 1.03 (0.77–1.36); High 122, 1 (ref)
Liquid assets, age ≥ 65: Low 344, 1.29 (1.09–1.53); Medium 266, 1.06 (0.89–1.26); High 319, 1 (ref)

Hazard ratio at 90 days for revision due to infection
Cohabitation status: Alone 144, 1.32 (1.05–1.65); Cohabitant 232, 1 (ref)
Education: Low 169, 1.25 (0.91–1.70); Medium 147, 1.07 (0.79–1.46); High 57, 1 (ref)
Income, age < 65: Low 19, 2.28 (1.14–4.54); Medium 31, 1.60 (0.94–2.74); High 54, 1 (ref)
Income, age ≥ 65: Low 90, 1.00 (0.68–1.47); Medium 113, 0.98 (0.73–1.33); High 86, 1 (ref)
Liquid assets, age < 65: Low 35, 1.63 (0.92–2.87); Medium 31, 1.21 (0.72–2.03); High 31, 1 (ref)
Liquid assets, age ≥ 65: Low 100, 1.34 (0.99–1.83); Medium 75, 0.93 (0.68–1.27); High 98, 1 (ref)

Hazard ratio at 90 days for revision due to fracture
Cohabitation status: Alone 132, 1.32 (1.04–1.68); Cohabitant 182, 1 (ref)
Education: Low 127, 1.20 (0.87–1.65); Medium 145, 1.24 (0.91–1.69); High 58, 1 (ref)
Income, age < 65: Low 23, 2.10 (1.09–4.05); Medium 24, 1.03 (0.58–1.83); High 53, 1 (ref)
Income, age ≥ 65: Low 70, 0.89 (0.58–1.35); Medium 81, 0.78 (0.56–1.09); High 87, 1 (ref)
Liquid assets, age < 65: Low 35, 1.54 (0.86–2.75); Medium 27, 1.17 (0.68–2.04); High 32, 1 (ref)
Liquid assets, age ≥ 65: Low 81, 1.29 (0.91–1.82); Medium 60, 1.24 (0.87–1.76); High 83, 1 (ref)

Hazard ratio at 90 days for revision due to dislocation
Cohabitation status: Alone 164, 1.20 (0.97–1.49); Cohabitant 257, 1 (ref)
Education: Low 187, 0.96 (0.72–1.29); Medium 153, 0.97 (0.72–1.31); High 63, 1 (ref)
Income, age < 65: Low 31, 2.01 (1.06–3.79); Medium 25, 1.04 (0.60–1.80); High 49, 1 (ref)
Income, age ≥ 65: Low 133, 1.05 (0.74–1.49); Medium 122, 0.98 (0.73–1.32); High 81, 1 (ref)
Liquid assets, age < 65: Low 30, 0.89 (0.49–1.63); Medium 34, 0.86 (0.51–1.44); High 35, 1 (ref)
Liquid assets, age ≥ 65: Low 121, 1.29 (0.96–1.73); Medium 95, 1.00 (0.74–1.34); High 101, 1 (ref)

Figure 7. Hazard ratios for revision due to any cause, infection, fracture, and dislocation at 90 days. Hazard ratios are adjusted for age, sex, calendar year, and CCI. Income and liquid assets were also adjusted for cohabiting status.
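The adjusted hazard ratios in Figure 7 come from Cox regression. As a minimal sketch of how such an estimate arises, the following Python code fits a Cox model with a single binary covariate by Newton-Raphson on the partial likelihood. The ten follow-up records are invented for illustration and contain no tied event times, and the sketch leaves out the covariate adjustment and the random effects (coxme) used in the study.

```python
import math

# Hypothetical toy records (time, event, exposed); these are NOT study data.
# event: 1 = revision observed, 0 = censored. There are no tied times.
RECORDS = [
    (1, 1, 1), (2, 1, 1), (4, 1, 1), (5, 1, 1), (9, 0, 1),
    (3, 1, 0), (6, 1, 0), (7, 1, 0), (8, 0, 0), (10, 0, 0),
]


def score_and_information(beta, records):
    """Return the score (first derivative) and the information (negative
    second derivative) of the Cox partial log-likelihood at beta."""
    score, info = 0.0, 0.0
    for t_i, event, x_i in records:
        if not event:
            continue  # censored subjects contribute only through risk sets
        risk_set = [x for (t, _, x) in records if t >= t_i]
        s0 = sum(math.exp(beta * x) for x in risk_set)
        s1 = sum(x * math.exp(beta * x) for x in risk_set)
        s2 = sum(x * x * math.exp(beta * x) for x in risk_set)
        mean_x = s1 / s0
        score += x_i - mean_x
        info += s2 / s0 - mean_x ** 2
    return score, info


def fit_cox(records, tol=1e-10, max_iter=50):
    """Newton-Raphson for one covariate; returns (log HR, standard error)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, records)
        if abs(score) < tol:
            break
        beta += score / info
    _, info = score_and_information(beta, records)
    return beta, 1.0 / math.sqrt(info)


beta, se = fit_cox(RECORDS)
hr = math.exp(beta)
ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
print(f"HR {hr:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f})")
```

With so few records the interval is wide, which is the expected behavior for a toy example; the point is only that the HR is the exponentiated coefficient that maximizes the partial likelihood.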

as well as income and standard of living, a correlation that nurtures the expectation of a correlation between education and risk of revision (Edgerton et al. 2012, Weiss et al. 2019). Education is a widely used international marker of socioeconomic status, since it captures both the long-term influence of early life circumstances on adult health and the influence of adult resources on health. Moreover, it remains relatively constant throughout life (Galobardes et al. 2006). However, recent decades have seen considerable changes in educational opportunities for specific subgroups (Galobardes et al. 2006). This applies in particular to women and the elderly, which leads to an over-representation of these subgroups among the less educated. These same subgroups have a decreased risk of revision (Bayliss et al. 2017), which would lead to an underestimation of the association in our results and may hence explain the weak association seen in our study. An interaction between health behavior and the different SES markers is evident through a variety of mediating factors such as lifestyle factors (Robert and House 1996). With higher income and liquid assets comes the possibility of a better diet, better rehabilitation, and better-suited accommodation. Our finding that low income in the age group under 65, and low liquid assets in the age group over 65, were associated with a higher rate of revision contrasts with previous findings (Peltola and Järvelin 2014), which describe the opposite effect of income. However, they found a U-shaped association


[Figure 9 panels: Cohabitation (Alone, Cohabitant); Level of education, Level of income, and Level of liquid assets (Low, Medium, High). Y-axis: cumulative incidence (%) of mortality, 0–5; x-axis: days after total hip arthroplasty, 0–365.]

Figure 9. Cumulative incidence of mortality for the 4 SES markers.
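As a quick arithmetic check on the counts reported in the Results, the sketch below recomputes the study population and the crude proportions of revisions and deaths. All counts are taken from the text; the variable names are illustrative. These crude proportions ignore censoring, so they need not equal the cumulative incidences plotted in Figures 4 and 9.

```python
# Counts as reported in the Results section.
identified = 168_094
excluded_left_bilateral = 35_519
excluded_other_diagnosis = 28_674
study_population = identified - excluded_left_bilateral - excluded_other_diagnosis


def percent(count, total=study_population):
    """Crude percentage of the study population, rounded to 1 decimal."""
    return round(100 * count / total, 1)


revisions_90d, revisions_365d = 1_364, 2_092
deaths_90d, deaths_365d = 942, 2_251

# Share of first-year deaths that occurred within the first 90 days.
share_early_deaths = round(100 * deaths_90d / deaths_365d)

print(study_population)
print(percent(revisions_90d), percent(revisions_365d))
print(percent(deaths_90d), percent(deaths_365d))
print(share_early_deaths)
```

The last figure is the basis for the statement in the Discussion that nearly half of the first-year deaths occurred within 90 days.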

with a higher HR in the lower income categories as well as in the higher income categories. This supports our findings, and the missing stratification in age in their study may explain the U-shaped tendency. Most studies have a median age above 65 years when examining effects in respect of income (Agabiti et al. 2007, Peltola and Järvelin 2014, Maradit Kremers et al. 2015, Weiss et al. 2019); however, an age above 65 is above the age of retirement, leaving a population with an income with less fluctuation and perhaps skewed values. Dividing the results concerning income and liquid assets into 2 age groups, as we did in the present study, allows us to assess the effect in the relevant subpopulation.

Mortality

Our findings of an association between the low strata in all of our SES markers and a higher mortality in both the 90-day and the 365-day follow-up are in accordance with findings in earlier studies and are similar to the associations reported for the general population (Maradit Kremers et al. 2015, Ullits et al. 2015, Weiss et al. 2019). In particular, the findings by Weiss et al. (2019) support this, since they have a similar study setup and data quality. However, they had a different aim and chose to mutually adjust for the SES markers. Nearly half of the deaths that occurred within the first year occurred within the first 90 days after surgery. Some of this 90-day mortality would be caused by the surgery, as terminally ill patients are rarely offered elective surgery. This inequality in mortality even in this short period of time after THA indicates underlying diseases, health-care behavior, or social network as possible explanations rather than surgery itself.

Methodological considerations

The strengths of our study include prospective data collection where information on SES markers was collected from registers on an individual level with few missing data (data not

Figure 10 data. Each entry gives the number of events followed by the adjusted HR (95% CI); cohabitant and high are the reference categories.

Hazard ratio at 90 days for mortality
Cohabitation status: Alone 432, 1.41 (1.21–1.63); Cohabitant 465, 1 (ref)
Education: Low 425, 1.53 (1.19–1.97); Medium 252, 1.33 (1.02–1.72); High 75, 1 (ref)
Income, age < 65: Low 41, 2.39 (1.36–4.2); Medium 47, 1.57 (0.99–2.49); High 49, 1 (ref)
Income, age ≥ 65: Low 483, 1.88 (1.46–2.41); Medium 217, 1.10 (0.86–1.41); High 103, 1 (ref)
Liquid assets, age < 65: Low 41, 3.67 (1.79–7.54); Medium 20, 1.32 (0.62–2.80); High 12, 1 (ref)
Liquid assets, age ≥ 65: Low 143, 1.50 (1.11–2.03); Medium 99, 1.09 (0.80–1.48); High 79, 1 (ref)

Hazard ratio at 365 days for mortality
Cohabitation status: Alone 974, 1.33 (1.20–1.46); Cohabitant 1,112, 1 (ref)
Education: Low 1,009, 1.20 (1.03–1.39); Medium 574, 1.05 (0.89–1.23); High 215, 1 (ref)
Income, age < 65: Low 83, 1.83 (1.22–2.74); Medium 87, 1.23 (0.88–1.71); High 109, 1 (ref)
Income, age ≥ 65: Low 1,149, 1.65 (1.40–1.93); Medium 554, 1.09 (0.93–1.27); High 267, 1 (ref)
Liquid assets, age < 65: Low 82, 2.59 (1.61–4.15); Medium 59, 1.28 (0.80–2.05); High 33, 1 (ref)
Liquid assets, age ≥ 65: Low 482, 1.45 (1.22–1.72); Medium 387, 1.24 (1.04–1.47); High 256, 1 (ref)

Figure 10. Hazard ratios of mortality for the 4 SES markers. Hazard ratios are adjusted for age, sex, calendar year, and CCI. Income and liquid assets were also adjusted for cohabiting status.
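Confidence intervals for hazard ratios are usually constructed on the log scale, so the point estimate should sit close to the geometric mean of its limits. The sketch below applies this check to the 90-day mortality estimate for living alone in Figure 10, 1.41 (1.21–1.63), and backs out an approximate standard error of the log hazard ratio. It assumes a Wald-type 95% interval, which the article does not state explicitly, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def log_scale_summary(lower, upper, level=0.95):
    """Return the geometric mean of the CI limits and the implied
    standard error of log(HR), assuming a symmetric interval on the log scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    center = math.sqrt(lower * upper)
    se_log = (math.log(upper) - math.log(lower)) / (2 * z)
    return center, se_log


center, se_log = log_scale_summary(1.21, 1.63)
print(f"geometric mean {center:.3f}, SE of log HR {se_log:.3f}")
```

A geometric mean close to the reported point estimate indicates that the extracted limits and estimate belong together, which is a useful sanity check when figure data have been recovered from a garbled layout.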



shown as counts < 5 for each marker). Including liquid assets as an SES marker is also a clear advantage compared with other studies, because it provides us with a more accurate estimation of SES for individuals over the age of 65. We used a novel approach to evaluate the relationship between the rate of revision and SES by assessing the rate using a multilevel model with inclusion of random effects in the Cox proportional hazards model. Conventional survival models do not account for the loss of independence that arises from the clustering of patients in higher-level units (Austin 2017). Another strength of our study is the calendar year stratification. From this we can conclude that the HRs are not driven by changes in SES markers over time, but remain despite differences seen in the cumulative incidence. A limitation of our study is the contradictory results regarding cumulative incidence and HR. Because of the differences in the risk of THA-related mortality between the SES groups, different numbers of patients are at risk of revision over time in the different groups. These risks are implicitly incorporated when modelling the cumulative incidence function, whereas the Cox model considers the risk of revision only among those still at risk, which can give opposing results (Logan et al. 2006). This paradox also hampers interpretation of the results, as the hazard cannot be directly translated into relative risk. Another limitation is that we have no information regarding lifestyle-related confounders, such as BMI, smoking, alcohol, and physical activity. These confounders differ between socioeconomic strata, influencing the observed associations between socioeconomic variables and our outcome (Weiss et al. 2019). Risk factors for revision and mortality include heart failure, diabetes, obesity, anemia, and malnutrition, and, as seen in Figure 3, the comorbidity burden in the most disadvantaged patients is higher.
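The contradiction between cumulative incidence and hazard ratios described above can be reproduced with a small numerical sketch. Assuming constant cause-specific hazards (an assumption made here purely for illustration, with invented rates), the cumulative incidence of revision with death as a competing risk has the closed form h_r / (h_r + h_d) × (1 − exp(−(h_r + h_d) t)). A group with the higher revision hazard can then still show the lower cumulative incidence when its mortality is much higher.

```python
import math


def cumulative_incidence(h_revision, h_death, t):
    """Cumulative incidence of revision by time t under constant
    cause-specific hazards, with death as a competing risk."""
    total = h_revision + h_death
    return h_revision / total * (1.0 - math.exp(-total * t))


# Hypothetical yearly hazards; these are NOT estimates from the study.
advantaged = {"h_revision": 0.020, "h_death": 0.10}
disadvantaged = {"h_revision": 0.025, "h_death": 0.40}

hazard_ratio = disadvantaged["h_revision"] / advantaged["h_revision"]
cif_advantaged = cumulative_incidence(**advantaged, t=10)
cif_disadvantaged = cumulative_incidence(**disadvantaged, t=10)

print(f"HR for revision: {hazard_ratio:.2f}")
print(f"10-year cumulative incidence: {cif_advantaged:.3f} vs {cif_disadvantaged:.3f}")
```

In this invented scenario the disadvantaged group has a revision hazard ratio above 1 but a lower cumulative incidence of revision, because many of its members die before a revision can occur.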
Some of the effects seen may therefore be driven by pre-existing risk factors, and proper optimization of these medical conditions prior to surgery may therefore minimize the inequality seen (Baek 2014, Romero et al. 2017). The mechanisms explaining the effect of SES on health outcomes are complex and not always clear. However, we examined important socioeconomic factors and found that the correlation between our markers and the rate of revision and mortality was not consistent, showing that treating these markers as indicators of the same fundamental cause ignores their sometimes sizeable independent and distinct contributions to health (Geyer et al. 2006). In conclusion, we found socioeconomic inequality in the rates of revision and mortality after THA. Living alone, low income, and low liquid assets were associated with increased revision and mortality up to 365 days after THA surgery and were associated with inequality both when examining the 90-day and the 365-day hazard rate. Living alone is the most noticeable marker, as this is an easily measured factor with multiple options for intervention. We cannot change patients’ cohabitation status, but by optimizing pre-existing risk factors


prior to surgery in patients living alone, and offering better rehabilitation to these patients, we may secure a better minimal level of function, improving their outcome and reducing complications associated with inequality in this respect. In addition, when assessing patient frailty and evaluating implant choice prior to surgery, the surgeon should also consider patient SES. However, our study does not support choosing any particular implant over another. Current evidence hence supports implementation of different pre- and post-THA strategies for patients who are living alone and in a lower SES group in general and for patients with increased vulnerability in particular. Further research is needed to clarify the mechanism leading to increased revision and mortality rates among patients with lower SES.

Supplementary data

Tables 1–7 and Figures 1, 5, 6, and 8 are available as supplementary data in the online version of this article, http://dx.doi.org/10.1080/17453674.2021.1935487

NME drafted the manuscript. NME, CV, SO, and AP conceived and designed the study, interpreted the results, and revised the manuscript. Acta thanks Keijo T Mäkelä and Rüdiger J Weiss for help with peer review of this study.

Agabiti N, Picciotto S, Cesaroni G, Bisanti L, Forastiere F, Onorati R, Pacelli B, Pandolfi P, Russo A, Spadea T, Perucci C A. The influence of socioeconomic status on utilization and outcomes of elective total hip replacement: a multicity population-based longitudinal study. Int J Quality Health Care 2007; 19: 37-44.
Austin P C. A tutorial on multilevel survival analysis: methods, models and applications. Int Stat Rev 2017; 85: 185-203.
Baek S H. Identification and preoperative optimization of risk factors to prevent periprosthetic joint infection. World J Orthop 2014; 5: 362-7.
Bayliss L E, Culliford D, Monk A P, Glyn-Jones S, Prieto-Alhambra D, Judge A, Cooper C, Carr A J, Arden N K, Beard D J, Price A J. The effect of patient age at intervention on risk of implant revision after total replacement of the hip or knee: a population-based cohort study. Lancet 2017; 389: 1424-30.
Brembo E A, Kapstad H, Van Dulmen S, Eide H. Role of self-efficacy and social support in short-term recovery after total hip replacement: a prospective cohort study. Health Qual Life Outcomes 2017; 15: 68.
Charlson M E, Pompei P, Ales K L, MacKenzie C R. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Diseases 1987; 40: 373-83.
Edgerton J D, Roberts L W, von Below S. Education and quality of life. In: Handbook of social indicators and quality of life research. Dordrecht: Springer; 2012.
Edwards N M, Varnum C, Overgaard S, Pedersen A B. The impact of socioeconomic status on the utilization of total hip arthroplasty during 1995–2017: 104,055 THA cases and 520,275 population controls from national databases in Denmark. Acta Orthop 2021; 92(1): 29-35.
Galobardes B, Shaw M, Lawlor D A, Lynch J W, Davey Smith G. Indicators of socioeconomic position (part 1). J Epidemiol Community Health 2006; 60: 7-12.
Geyer S, Hemstrom O, Peter R, Vagero D. Education, income, and occupational class cannot be used interchangeably in social epidemiology: empirical evidence against a common practice. J Epidemiol Community Health 2006; 60: 804-10.
Green M J, Popham F. Interpreting mutual adjustment for multiple indicators of socioeconomic position without committing mutual adjustment fallacies. BMC Public Health 2019; 19: 10.
Gundtoft P H, Varnum C, Pedersen A B, Overgaard S. The Danish Hip Arthroplasty Register. Clin Epidemiol 2016; 8: 509-14.
Jenkins P J, Perry P R, Yew Ng C, Ballantyne J A. Deprivation influences the functional outcome from total hip arthroplasty. Surgeon 2009; 7: 351-6.
Logan B R, Zhang M J, Klein J P. Regression models for hazard rates versus cumulative incidence probabilities in hematopoietic cell transplantation data. Biol Blood Marrow Transplant 2006; 12: 107-12.
Mahomed N N, Barrett J A, Katz J N, Phillips C B, Losina E, Lew R A, Guadagnoli E, Harris W H, Poss R, Baron J A. Rates and outcomes of primary and revision total hip replacement in the United States Medicare population. J Bone Joint Surg Am 2003; 85: 27-32.
Maradit Kremers H, Kremers W K, Berry D J, Lewallen D G. Social and behavioral factors in total knee and hip arthroplasty. J Arthroplasty 2015; 30: 1852-4.
OECD. Inequalities. In: Society at a glance 2019: OECD social indicators. Paris: OECD Publishing; 2019.
Peltola M, Järvelin J. Association between household income and the outcome of arthroplasty: a register-based study of total hip and knee replacements. AOTS 2014; 134: 1767-74.
Robert S, House J S. SES differentials in health by age and alternative indicators of SES. J Aging Health 1996; 8: 359-88.
Romero J A, Jones R E, Brown T. Modifiable risk factors and preoperative optimization of the primary total arthroplasty patient. Curr Orthop Pract 2017; 28: 272-75.
Schmidt M, Pedersen L, Sorensen H T. The Danish Civil Registration System as a tool in epidemiology. Eur J Epidemiol 2014; 29: 541-9.
Schmidt M, Schmidt S A, Sandegaard J L, Ehrenstein V, Pedersen L, Sorensen H T. The Danish National Patient Registry: a review of content, data quality, and research potential. Clin Epidemiol 2015; 7: 449-90.
Schmolders J, Friedrich M J, Michel R, Strauss A C, Wimmer M D, Randau T M, Pennekamp P H, Wirtz D C, Gravius S. Validation of the Charlson comorbidity index in patients undergoing revision total hip arthroplasty. Int Orthop 2015; 39: 1771-7.
Sundhedsstyrelsen and Statens Institut for Folkesundhed. Social ulighed i sundhed og sygdom: Udvikling i Danmark i perioden 2010–2017; 2020. https://www.sst.dk/-/media/Udgivelser/2020/Ulighed-i-sundhed/Socialulighed-i-sundhed-og-sygdom-tilgaengelig.ashx?la=da&hash=CB63CAD067D942FE54B99034085E78BE9F486A92 (Accessed September 9, 2020).
Ullits L R, Ejlskov L, Mortensen R N, Hansen S M, Kraemer S R, Vardinghus-Nielsen H, Fonager K, Boggild H, Torp-Pedersen C, Overgaard C. Socioeconomic inequality and mortality: a regional Danish cohort study. BMC Public Health 2015; 15: 490.
Weiss R J, Kärrholm J, Rolfson O, Hailer N P. Increased early mortality and morbidity after total hip arthroplasty in patients with socioeconomic disadvantage: a report from the Swedish Hip Arthroplasty Register. Acta Orthop 2019; 90: 264-69.


Acta Orthopaedica 2021; 92 (5): 589–596


Less improvement following meniscal repair compared with arthroscopic partial meniscectomy: a prospective cohort study of patient-reported outcomes in 150 young adults at 1- and 5-years’ follow-up

Kenneth PIHL 1, Martin ENGLUND 2, Robin CHRISTENSEN 3,4, L Stefan LOHMANDER 5, Uffe JØRGENSEN 6, Bjarke VIBERG 7, Jakob Vium FRISTED 8, and Jonas B THORLUND 1,9

1 Department of Sports Science and Clinical Biomechanics, University of Southern Denmark, Odense, Denmark; 2 Lund University, Faculty of Medicine, Department of Clinical Sciences Lund, Orthopedics, Clinical Epidemiology Unit, Lund, Sweden; 3 Musculoskeletal Statistics Unit, the Parker Institute, Bispebjerg and Frederiksberg Hospital, Denmark; 4 Research Unit of Rheumatology, Department of Clinical Research, University of Southern Denmark, Odense University Hospital, Odense, Denmark; 5 Lund University, Faculty of Medicine, Department of Clinical Sciences Lund, Orthopedics, Lund, Sweden; 6 Department of Orthopedics and Traumatology, Odense University Hospital, Odense, Denmark; 7 Department of Orthopedics, Lillebaelt Hospital, Kolding, Denmark; 8 Department of Orthopedics, Lillebaelt Hospital, Vejle, Denmark; 9 Research Unit for General Practice, Department of Public Health, University of Southern Denmark, Odense, Denmark
Correspondence: jthorlund@health.sdu.dk
Submitted 2020-11-30. Accepted 2021-03-25.

Background and purpose: Meniscal repair may reduce long-term risk of knee osteoarthritis compared with arthroscopic partial meniscectomy (APM), whereas patient-reported outcomes may be poorer at short term than for APM. We compared patient-reported outcomes in young adults undergoing meniscal repair or APM up to ~5 years after surgery.

Patients and methods: We included 150 patients aged 18–40 years from the Knee Arthroscopy Cohort Southern Denmark (KACS) undergoing meniscal repair or APM. Between-group differences in change in a composite of 4 of 5 Knee injury and Osteoarthritis Outcome Score (KOOS) subscales (pain, symptoms, sport and recreation, and quality of life; KOOS4) from baseline to 12 and 52 weeks and to a median of 5 years (range 4–6 years) were analyzed using adjusted mixed linear models, with 52 weeks being the primary endpoint.

Results: 32 patients had meniscal repair (mean age 26 [SD 6]), and 118 patients underwent APM (mean age 32 [SD 7]). The repair and APM groups improved in KOOS4 from before to 52 weeks after surgery (least square means 7 and 19, respectively; adjusted mean difference –12 [95% CI –19 to –4] in favor of APM). Both groups improved further from 52 weeks to 5 years after surgery with the difference in KOOS4 scores between the groups remaining similar.

Interpretation: Patients having meniscal repair experienced less improvement in patient-reported outcomes from baseline to 52 weeks and 5 years post-surgery. The findings highlight the need for randomized trials comparing these interventions in terms of patient-reported outcomes and knee OA development.

Recent studies have reported that arthroscopic partial meniscectomy (APM) is associated with increased risk of osteoarthritis (OA) development and knee replacement surgery as compared with knees with meniscal tears left in situ (Roemer et al. 2017, Rongen et al. 2017). Consequently, meniscal repair, which aims to preserve the meniscal tissue and thereby reduce knee OA risk, has been strongly advocated in recent years, especially for younger individuals with traumatic meniscal tears (Kopf et al. 2020). However, meniscal repair often requires longer rehabilitation time, and has a higher reoperation rate compared with APM, suggesting a trade-off between the 2 procedures (Paxton et al. 2011, Cavanaugh and Killian 2012). Currently, the evidence of the protective ability of meniscal repair against OA compared with APM is limited to retrospective observational data (Stein et al. 2010, Lutz et al. 2015, Persson et al. 2018). Similarly, reliable information on differences in patient-reported outcomes between meniscal repair and APM is scarce, and results from the few retrospective studies are conflicting and lack assessment of change over time (Stein et al. 2010, Paxton et al. 2011, Lutz et al. 2015). The number of meniscal repairs is increasing in accordance with current guidelines (Kopf et al. 2020). While awaiting a randomized trial evaluating knee OA development and patient-reported outcomes following meniscal repair compared with APM, we used a prospective study design with pre-specified outcomes to compare change in patient-reported outcomes in patients aged 18–40 years undergoing meniscal repair or APM at multiple time points up to between 4 and 6 years after surgery.

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1917826



Patients and methods

This prospective cohort study is described in a published protocol (Thorlund et al. 2013) and registered at ClinicalTrials.gov (NCT01871272). We followed the STROBE guideline for reporting the study.

Patient selection

We included patients from the Knee Arthroscopy Cohort Southern Denmark (KACS) (Thorlund et al. 2013). Patients in KACS were consecutively recruited at 4 public hospitals in the region of Southern Denmark between February 1, 2013 and January 31, 2014, and at 1 of the initial 4 hospitals in the period February 1, 2014 to January 31, 2015. To be included in this study, patients needed to be 18–40 years of age, assigned for knee arthroscopy on suspicion of a meniscal tear by an orthopedic surgeon (i.e., based on history of injury, clinical examination, and magnetic resonance imaging [MRI] if considered necessary), able to read and understand Danish, and have an e-mail address. Patients were excluded if they did not have a meniscal tear at surgery, had previous or planned anterior or posterior cruciate ligament (ACL or PCL) reconstruction surgery in either knee, had fracture(s) in the lower extremities within 6 months before recruitment, or were unable to reply to an online questionnaire due to mental impairments.

Patient-reported outcomes

Participant characteristics and symptom information were collected using online questionnaires before surgery (median 6 days, IQR 2–9 days) and at 12 and 52 weeks, and median 5 years (range 4–6 years) after surgery. The main outcome was a composite score of 4 of the 5 subscales from the Knee injury and Osteoarthritis Outcome Score (KOOS), defined as KOOS4. The 4 subscales were: pain, symptoms, sport and recreation function (Sport/Rec), and knee-related quality of life (QOL), excluding the activities of daily living (ADL) subscale because of ceiling effects in younger active populations (Collins et al. 2016).
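To make the composite concrete, the following sketch computes KOOS4 as the unweighted mean of the 4 subscale scores. The article states only that KOOS4 is a composite of the 4 subscales, so the simple average is an assumption here, and the subscale values and key names are invented for illustration.

```python
from statistics import mean

# Hypothetical key names for the 4 subscales included in KOOS4.
KOOS4_SUBSCALES = ("pain", "symptoms", "sport_rec", "qol")


def koos4(scores):
    """Unweighted mean of the 4 KOOS subscales (each scored 0-100).
    Any ADL score present in the input is ignored."""
    missing = [name for name in KOOS4_SUBSCALES if name not in scores]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    values = [scores[name] for name in KOOS4_SUBSCALES]
    if any(not 0 <= value <= 100 for value in values):
        raise ValueError("subscale scores must lie between 0 and 100")
    return mean(values)


# Invented example patient; the ADL score does not enter the composite.
example = {"pain": 70, "symptoms": 65, "sport_rec": 40, "qol": 45, "adl": 85}
print(koos4(example))
```

Leaving ADL out of the average means a patient near the ceiling on daily activities does not mask poorer scores on the more demanding subscales.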
The KOOS is a knee-specific patient-reported outcome and each subscale ranges from 0 to 100, with 0 representing extreme knee problems and 100 representing no knee problems (Roos et al. 1998). It has been validated in individuals with traumatic knee injuries, including individuals undergoing arthroscopic meniscal surgery (Roos et al. 1998), and KOOS4 has been used in a previous trial assessing the effect of APM surgery (Kise et al. 2016). The main outcome was assessed at 52 weeks (Thorlund et al. 2013), while KOOS4 scores 5 years after surgery and all 5 KOOS subscales were included as additional outcomes. Other additional outcomes were Patient Acceptable Symptom State (PASS), treatment failure, knee problems after surgery, and subsequent surgery. PASS was assessed with the question: “When you think of your knee function, would you consider your current condition as satisfying? By knee function, you should take into account your activities of daily living, sport and recreational activities, your pain and other symptoms and your quality of life” with response options “yes” or “no” (Ingelsrud et al. 2015). Patients unsatisfied with their current knee function after surgery were then asked a second question relating to treatment failure: “Would you consider your current state as being so unsatisfactory that you consider the treatment to have failed?” with the response options “yes” or “no.” Subsequent surgery on the index knee was assessed using 2 questions in combination: “Have you had problems with your knee after the operation?” and “Have you had additional knee surgery because of your knee problems?” both with the response options “yes” or “no.” The latter question was only asked of those replying “yes” to having had knee problems.

Acta Orthopaedica 2021; 92 (5): 589–596

Surgical information
Surgical information was recorded at arthroscopy. A modified version of the International Society of Arthroscopy, Knee Surgery and Orthopaedic Sports Medicine (ISAKOS) classification of meniscal tears (Anderson et al. 2011) was used for the description of the surgical procedure (i.e., repair and/or APM) and the classification of meniscal pathology (i.e., tear type, tear location, etc.), while the International Cartilage Repair Society (ICRS) grading system (Brittberg and Winalski 2003) was used for classification of compartment-specific cartilage damage (ranging from 0 [normal cartilage] to 4 [very severe cartilage damage]).

Statistics
As reported in the study protocol, a sample size of 67 in the APM group and 33 in the repair group would yield a power of 0.88 to detect a difference between groups of 10 points in KOOS4, assuming a common standard deviation of 15 and a significance level of 0.05 (Thorlund et al. 2013). Under the same assumptions the actual sample of 150 patients (118 having APM and 32 having repair) yielded a power of 0.91 to detect a 10-point difference.
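The power figures above can be checked approximately with a short calculation. The sketch below uses a normal approximation for a two-sided, two-sample comparison of means; this is an assumption, because the exact method used in the protocol is not stated here, and the function names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def two_sample_power(n1: int, n2: int, diff: float, sd: float) -> float:
    """Approximate power of a two-sided, two-sample comparison of means.

    Assumes a common SD, a two-sided significance level of 0.05, and the
    normal approximation (critical value 1.96).
    """
    z_crit = 1.959964  # two-sided 5% critical value of the standard normal
    se = sd * sqrt(1.0 / n1 + 1.0 / n2)  # standard error of the difference
    shift = diff / se
    # Probability of rejecting in either tail under the alternative.
    return normal_cdf(shift - z_crit) + normal_cdf(-shift - z_crit)


planned = two_sample_power(67, 33, diff=10, sd=15)   # sample size in the protocol
actual = two_sample_power(118, 32, diff=10, sd=15)   # sample actually included
print(round(planned, 2), round(actual, 2))
```

Under this approximation the planned sample gives a power of about 0.88 and the actual sample about 0.92, close to the reported 0.91; a small discrepancy of this kind is expected if the original calculation was based on the t distribution.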
To reach a sufficient number of patients with repair, the original recruitment period was extended from 1 to 2 years. For the main outcome, the difference between groups in KOOS4 change from baseline to 52 weeks was analyzed using a mixed linear model (restricted maximum likelihood [REML] estimation) with patients as random effects and with group (repair vs. APM), time (pre-surgery, 12 weeks, 52 weeks, and 5 years), and the interaction between group and time as fixed effects. The main model was adjusted for the following pre-surgery covariates: age, sex, BMI, and preoperative KOOS4 score. The same analysis approach was repeated for all additional KOOS subscales separately. The underlying assumptions for the mixed linear models were assessed using residual plots and kernel density plots. Results are reported as least squares means, and differences between these, with 95% confidence intervals (CI).
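One way to write the model described above is sketched below. The notation is illustrative and not taken from the protocol: $Y_{ij}$ denotes the KOOS4 score of patient $i$ at time point $j$, $G_i$ indicates the group (repair vs. APM), $\mathbf{x}_i$ collects the pre-surgery covariates, and $u_i$ is the patient-level random intercept. The exact parameterization used by the authors is an assumption.

```latex
Y_{ij} = \beta_0 + \beta_G\, G_i
       + \sum_{t} \beta_{T,t}\, \mathbb{1}[T_{ij} = t]
       + \sum_{t} \beta_{GT,t}\, G_i\, \mathbb{1}[T_{ij} = t]
       + \boldsymbol{\gamma}^{\top} \mathbf{x}_i + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

In this form the group-by-time coefficients $\beta_{GT,t}$ carry the between-group difference in change from baseline at each follow-up, which corresponds to the quantity reported as the adjusted mean difference.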


Additional sensitivity analyses included: (1) repeating the main analyses including only patients with traumatic meniscal tears as originally protocolized (Thorlund et al. 2013); (2) repeating the main analyses excluding patients having both repair and APM performed; (3) repeating the main analyses excluding patients with partial or total ACL rupture; (4) repeating the main analyses excluding patients who had had subsequent surgery on the index knee during the 5-year follow-up; and (5) repeating the main analyses adjusted for covariates with a potential difference in distributions between groups larger than 0.50 SD units (based on standardized mean differences) (Imbens and Rubin 2015). We applied the following pragmatic definition of a confounding variable (C): it is likely an ancestor (cause) of the outcome (Y), and it probably causes the exposure (i.e., group). Finally, to avoid adjusting for pre-existing differences (i.e., Lord’s paradox) or introducing collider bias, a potential deconfounding covariate (C) cannot be a descendant (i.e., an effect) of the exposure (group) or outcome (e.g., KOOS4) (Greenland 2003). We also conducted a subgroup analysis repeating the main analyses in which patients considered ineligible for repair were excluded (i.e., patients whose tears were not non-degenerative longitudinal-vertical tears located in the red–red or red–white zone). As for the main analyses, this subgroup analysis was repeated excluding patients who had had subsequent surgery on the index knee. Lastly, in patients with complete data, the differences between the repair and APM groups in the proportions of patients who were unsatisfied after surgery (assessed with PASS), who indicated treatment failure, or who had subsequent surgery were tested by calculating risk ratios and risk differences with CI.
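The risk ratios and risk differences mentioned above can be illustrated with a short sketch. It assumes Wald-type 95% confidence intervals (on the log scale for the risk ratio), which is a common choice but is not stated in the text; the function names are hypothetical. The example uses the re-surgery counts at median 5 years from Table 4 (repair 12 of 22, APM 14 of 66).

```python
from math import exp, log, sqrt

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def risk_ratio(a: int, n1: int, c: int, n2: int):
    """Risk ratio (group 1 vs. group 2) with a Wald CI on the log scale.

    a / n1 and c / n2 are events / totals in each group. Both event
    counts must be non-zero for the log-scale standard error to exist.
    """
    rr = (a / n1) / (c / n2)
    se_log = sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    return rr, exp(log(rr) - Z_95 * se_log), exp(log(rr) + Z_95 * se_log)


def risk_difference(a: int, n1: int, c: int, n2: int):
    """Risk difference (group 1 minus group 2) with a Wald CI."""
    p1, p2 = a / n1, c / n2
    rd = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return rd, rd - Z_95 * se, rd + Z_95 * se


# Re-surgery at median 5 years (Table 4): repair 12 of 22, APM 14 of 66.
rr, rr_lo, rr_hi = risk_ratio(12, 22, 14, 66)
rd, rd_lo, rd_hi = risk_difference(12, 22, 14, 66)
print(f"RR {rr:.1f} ({rr_lo:.1f}-{rr_hi:.1f})")
print(f"RD {rd:.2f} ({rd_lo:.2f} to {rd_hi:.2f})")
```

For these counts the sketch gives a relative risk of 2.6 (1.4 to 4.7) and a risk difference of 0.33 (0.10 to 0.56), in agreement with the corresponding row of Table 4.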
Ethics, funding, and potential conflicts of interest
Written informed consent to participate in KACS was obtained from all patients, while the Regional Scientific Ethics Committee waived the need for ethical approval after reviewing the outline of KACS (Thorlund et al. 2013). This study was supported by an individual postdoctoral grant (JBT) from the Danish Council for Independent Research/Medical Sciences and funds from the Region of Southern Denmark. RC acknowledges that the Parker Institute, Bispebjerg and Frederiksberg Hospital, is supported by a core grant from the Oak Foundation (OCAY-18-774-OFIL). BV reports personal fees from Osmedic Swemac and Zimmer Biomet outside the submitted work. JBT reports grants from Pfizer outside the submitted work. ME reports personal fees from Pfizer outside the submitted work. RC reports honorarium to employer from (1) Lectures: Research Methods (Pfizer, DK; 2017); GRADE Lecture (Celgene, DK; 2017); Ad Board Lecture: CAM (Orkla Health, DK; 2017); Diet in RMD (Novartis, DK; 2019); Ad Board Lecture: GRADE (Lilly, DK; 2017); Network MAs (LEO; 2020) and (Mundipharma, 2019); and (3) Consultancy Reports: Network MAs (Biogen, DK; 2017); GRADE (Celgene, 2018), all outside the submitted work. The remaining authors have nothing to declare.

Patients assessed for eligibility and invited to KACS (n = 1,259)
  Excluded, did not fit inclusion criteria (n = 138):
    – Previous ACL/PCL surgery, 112
    – Fracture on lower extremities, 5
    – No e-mail address, 18
    – Did not understand Danish, 2
    – Not mentally able to reply, 1
  Excluded, other reasons (n = 213):
    – No time to participate, 8
    – No reason/declined, 50
    – Consented, but no reply prior to surgery, 155
Replied to questionnaire before surgery (n = 908)
  Excluded (n = 70):
    – Surgery cancelled, 51
    – Re-scheduled to other hospital, 19
Had knee arthroscopy (n = 838)
  Excluded (n = 197):
    – ACL/PCL reconstruction at surgery, 15
    – No meniscal tear at surgery, 176
    – Missing data/misclassified as ‘no tear’, 6
Patients with full dataset at baseline and a meniscal tear at surgery (n = 641)
  Excluded: patients aged ≥ 41 years (n = 491)
Patients aged ≤ 40 years (n = 150)
  Meniscal repair (n = 32)
    12 weeks assessment (n = 26); no reply to questionnaire (n = 6)
    52 weeks assessment (n = 26); no reply to questionnaire (n = 0)
    5 years assessment (n = 22); no reply to questionnaire (n = 4)
  Meniscal resection (n = 118)
    12 weeks assessment (n = 109); no reply to questionnaire (n = 9)
    52 weeks assessment (n = 95); no reply to questionnaire (n = 14)
    5 years assessment (n = 76); no reply to questionnaire (n = 19)

Figure 1. Flowchart of inclusion.

Results
A total of 150 KACS patients (repair: n = 32 and APM: n = 118) aged 40 years or younger were included in this study (Figure 1). At the 52-week assessment, 29 (19%) patients were lost to follow-up (nRepair = 6 and nAPM = 23). Among patients who had APM, those lost to follow-up were slightly younger and had worse KOOS scores, whereas among patients who had repair the KOOS scores of those lost to follow-up did not differ from those assessed at follow-up (Supplementary Table A1). Patients who had repair were marginally younger than the APM group, had less cartilage damage (Table 1), and differed in most meniscal pathologies (Table 2), whereas KOOS scores were comparable between the 2 groups.



Table 1. Baseline patient characteristics. Values are count (%) unless otherwise specified

Factor | Repair (n = 32) | APM (n = 118) | Comparability: SMD a
Age, mean (SD) [range] | 26 (6) [18–38] | 32 (7) [18–40] | 0.76
Female | 10 (31) | 40 (34) | 0.05
BMI, mean (SD) [range] | 26 (3.2) [21–33] | 27 (4.4) [19–44] | 0.17
Participation in physical activity prior to injury | | | 0.34
  Sport at competitive level | 11 (34) | 24 (20) |
  Recreational sport | 9 (28) | 36 (31) |
  Light sport | 5 (16) | 18 (15) |
  Heavy household work | 3 (9) | 15 (13) |
  Light household work | 4 (13) | 20 (17) |
  Minimal household work | 0 (0) | 4 (3) |
  No household work | 0 (0) | 1 (1) |
Symptom onset b | | | 0.12
  Slowly evolved over time | 6 (19) | 23 (19) |
  Semi-traumatic | 9 (28) | 42 (36) |
  Traumatic | 17 (53) | 53 (45) |
Duration of symptoms | | | 0.18
  0–3 months | 9 (28) | 32 (27) |
  4–6 months | 8 (25) | 16 (14) |
  7–12 months | 6 (19) | 25 (21) |
  13–24 months | 3 (9) | 17 (14) |
  > 24 months | 6 (19) | 28 (24) |
KOOS scores, mean (SD) [range] | | |
  KOOS4 | 50 (18) [13–83] | 47 (16) [3–87] | 0.12
  Pain | 62 (21) [8–100] | 58 (20) [0–97] |
  Symptoms | 60 (20) [21–93] | 61 (19) [11–100] |
  ADL | 73 (17) [34–99] | 69 (20) [7–100] |
  Sport/Rec | 34 (27) [0–90] | 30 (22) [0–90] |
  QoL | 45 (17) [0–75] | 39 (16) [0–75] |

APM: arthroscopic partial meniscectomy, SD: standard deviation, BMI: body mass index, KOOS: Knee injury and Osteoarthritis Outcome Score, ADL: activities of daily living, Sport/Rec: sport and recreational activities, QoL: knee-related quality of life.
a SMD = Standardized mean difference. Comparability is measured in SD units (derived from Kruskal–Wallis 2-sample test). An SMD of 0.5 or higher indicates the variable may be a confounding factor.
b Symptom onset defined by patient as: “The pain/problems have slowly developed over time” or “As a result of a specific incident (i.e., kneeling, sliding, and/or twisting of the knee or the like)” (i.e., semi-traumatic) or “As a result of a violent incident (i.e., during sports, a crash, or collision or the like)” (i.e., traumatic).

In the main analysis, both the repair and APM groups improved in KOOS4 scores from before to 52 weeks after surgery (least squares means of change 7 and 19, respectively; adjusted mean difference –12 [CI –19 to –4]) (Table 3). Both groups improved further from 52 weeks to 5 years after surgery, with the difference in KOOS4 scores between the 2 groups remaining constant (Figure 2 and Supplementary Table A2). Similar findings were observed for all KOOS subscales (Table 3). All sensitivity analyses yielded results essentially similar to the main analyses (Supplementary Tables A3 to A6). However, when those who had had subsequent surgery on the index knee were excluded, the between-group difference in change from before to 52 weeks after surgery was considerably larger (adjusted mean difference –22 [CI –34 to –9]) and was reduced at 5 years (adjusted mean difference –9 [CI –21 to 3]) (Supplementary Table A7).

Table 2. Surgical procedures and findings. Values are count (%) unless otherwise specified

Factor | Repair (n = 32) | APM (n = 118) | Comparability: SMD a
Type of repair surgery | | |
  Rasping | 1 (3) | – |
  Suture | 7 (22) | – |
  Arrow | 1 (3) | – |
  Anchor + suture | 5 (16) | – |
  More than one | 18 (56) | – |
Type of repair technique b | | |
  All-inside | 17 (53) | – |
  Inside-out | 1 (3) | – |
  Outside-in | 1 (3) | – |
Amount resected (%), median (IQR) c | 5 (5–10) | 20 (10–29) |
Compartment | | |
  Medial | 23 (72) | 69 (58) | 0.23
  Lateral | 3 (9) | 44 (37) | 0.49
  Both | 6 (19) | 5 (4) | 0.25
Tear depth d | | | 0.06
  Partial | 10 (31) | 40 (34) |
  Complete | 22 (69) | 75 (64) |
Tear type | | |
  Longitudinal-vertical | 28 (88) | 33 (28) | 0.80
  Horizontal | 1 (3) | 5 (4) | 0.02
  Radial | 0 (0) | 4 (3) | 0.06
  Vertical flap | 0 (0) | 26 (22) | 0.38
  Horizontal flap | 0 (0) | 10 (8) | 0.15
  Complex | 0 (0) | 21 (18) | 0.31
  More than one tear type | 3 (9) | 19 (16) | 0.12
Circumferential location d,e | | | 0.81
  Zone 1 | 25 (78) | 30 (25) |
  Zone 2 | 5 (16) | 69 (58) |
  Zone 3 | 1 (3) | 16 (14) |
Radial location f | | |
  Posterior | 21 (66) | 60 (51) | 0.33
  Posterior + mid-body | 4 (13) | 24 (20) | 0.03
  Mid-body | 3 (9) | 14 (12) | 0.04
  Anterior + mid-body | 0 (0) | 6 (5) | 0.12
  Anterior | 1 (3) | 7 (6) | 0.09
  All | 1 (3) | 7 (6) | 0.04
Meniscal tissue quality | | | 0.50
  Non-degenerative | 32 (100) | 84 (71) |
  Degenerative | 0 (0) | 28 (24) |
  Undetermined | 0 (0) | 6 (5) |
ACL status e | | | 0.42
  Intact | 19 (61) | 99 (84) |
  Partial rupture g | 2 (6) | 7 (6) |
  Total rupture g | 10 (32) | 12 (10) |
ICRS cartilage grade | | |
  Medial compartment ≥ 2 h | 0 (0) | 17 (14) | 0.30
  Lateral compartment ≥ 2 f | 0 (0) | 12 (10) | 0.45
  Patellofemoral compartment ≥ 2 f,h | 1 (3) | 11 (11) | 0.34

APM: arthroscopic partial meniscectomy, ACL: anterior cruciate ligament, ICRS: International Cartilage Repair Society grading system.
a See Table 1.
b Missing data for 13 observations in repair group.
c Missing data for 5 observations in APM group. Data for repair group is only for the 8 patients who also had APM.
d Missing data for 3 observations in APM group.
e Missing data for 1 observation in repair group.
f Missing data for 2 observations in repair group.
g Non-reconstructed.
h Missing data for 2 observations in APM group.



Table 3. Change (95% confidence interval) in Knee injury and Osteoarthritis Outcome Scores (KOOS) from baseline prior to surgery to 12 weeks, 52 weeks, and median 5 years’ follow-up for patients who had had meniscal repair or APM performed

Factor | Repair | APM | Difference
KOOS scores, unadjusted
Change from baseline at 12 weeks
  n | 26 | 109 |
  KOOS4 | 5.8 (–1.0 to 13) | 14 (10 to 17) | –7.9 (–16 to –0.3)
  Pain | 6.9 (0.1 to 14) | 15 (11 to 18) | –7.9 (–16 to –0.3)
  Symptoms | 5.4 (–1.7 to 13) | 11 (7.0 to 14) | –5.2 (–13 to 2.8)
  ADL | 7.0 (1.1 to 13) | 13 (9.5 to 15) | –5.4 (–12 to 1.2)
  Sport/Rec | 8.0 (–2.0 to 18) | 18 (13 to 23) | –10 (–21 to 1.2)
  QoL | 3.0 (–5.0 to 11) | 12 (7.5 to 16) | –8.5 (–18 to 0.4)
Change from baseline at 52 weeks
  n | 26 | 95 |
  KOOS4 | 7.1 (0.3 to 14) | 20 (16 to 23) | –12 (–20 to –4.7)
  Pain | 7.5 (0.6 to 14) | 18 (14 to 22) | –11 (–18 to –2.7)
  Symptoms | 3.5 (–3.7 to 11) | 15 (11 to 19) | –12 (–20 to –3.5)
  ADL | 6.6 (0.7 to 13) | 15 (12 to 18) | –8.3 (–15 to –1.6)
  Sport/Rec | 14 (4.0 to 24) | 27 (21 to 32) | –13 (–24 to –1.2)
  QoL | 3.6 (–4.4 to 12) | 18 (14 to 23) | –15 (–24 to –5.7)
Change from baseline at 5 years
  n | 22 | 76 |
  KOOS4 | 13 (5.3 to 20) | 25 (21 to 29) | –13 (–21 to –4.5)
  Pain | 13 (6.0 to 21) | 23 (20 to 27) | –10 (–18 to –1.8)
  Symptoms | 10 (2.4 to 18) | 19 (15 to 23) | –8.7 (–17 to –0.2)
  ADL | 12 (5.3 to 18) | 19 (15 to 22) | –6.9 (–14 to 0.3)
  Sport/Rec | 19 (7.9 to 29) | 33 (27 to 38) | –14 (–26 to –2.1)
  QoL | 8.6 (0.1 to 17) | 26 (22 to 31) | –18 (–27 to –7.9)
KOOS scores, adjusted a
Change from baseline at 12 weeks
  n | 26 | 109 |
  KOOS4 | 6.1 (–0.6 to 13) | 14 (10 to 17) | –7.7 (–15 to –0.2)
  Pain | 7.1 (0.4 to 14) | 15 (12 to 18) | –7.7 (–15 to –0.2)
  Symptoms | 5.6 (–1.4 to 13) | 11 (7.1 to 14) | –5.0 (–13 to 2.8)
  ADL | 7.1 (1.3 to 13) | 12 (9.5 to 15) | –5.3 (–12 to 1.2)
  Sport/Rec | 8.2 (–1.7 to 18) | 18 (13 to 23) | –9.8 (–21 to 1.3)
  QoL | 3.3 (–4.6 to 11) | 12 (7.7 to 16) | –8.4 (–17 to 0.4)
Change from baseline at 52 weeks
  n | 26 | 95 |
  KOOS4 | 7.2 (–0.5 to 14) | 19 (16 to 23) | –12 (–19 to –4.3)
  Pain | 7.4 (0.7 to 14) | 18 (14 to 21) | –10 (–18 to –2.6)
  Symptoms | 3.6 (–3.4 to 11) | 15 (11 to 18) | –11 (–19 to –3.0)
  ADL | 6.7 (0.9 to 13) | 15 (12 to 18) | –8.0 (–15 to –1.4)
  Sport/Rec | 15 (4.6 to 24) | 26 (21 to 31) | –12 (–23 to –0.5)
  QoL | 3.5 (–4.4 to 11) | 18 (14 to 22) | –15 (–24 to –5.9)
Change from baseline at 5 years
  n | 22 | 76 |
  KOOS4 | 13 (5.6 to 20) | 25 (21 to 29) | –12 (–20 to –4.4)
  Pain | 14 (6.4 to 21) | 23 (20 to 27) | –9.8 (–18 to –1.7)
  Symptoms | 10 (2.9 to 18) | 19 (15 to 23) | –8.2 (–17 to 0.1)
  ADL | 12 (5.9 to 18) | 19 (15 to 22) | –6.6 (–14 to 0.4)
  Sport/Rec | 19 (8.1 to 29) | 33 (27 to 38) | –14 (–26 to –2.3)
  QoL | 8.7 (0.4 to 17) | 26 (22 to 30) | –17 (–27 to –7.7)

a Adjusted for age, sex, BMI, and preoperative KOOS score. For abbreviations, see Table 1.

Figure 2. Least squares means for the Knee injury and Osteoarthritis Outcome Score (KOOS)4 assessed before arthroscopic meniscal surgery, and at 12 weeks, 52 weeks, and median 5 years’ follow-up for patients having had meniscal repair or arthroscopic partial meniscectomy (APM). Data from main analysis and subgroup analysis (harmonized tear types) were adjusted for age, sex, body mass index (BMI), and preoperative KOOS4 score. Bars indicate 95% confidence intervals.
[Panels: KOOS4 score, main analysis; KOOS4 score, subgroup analysis. Horizontal axis: follow-up (Pre, 12 w, 52 w, 5 years); vertical axis: KOOS4 score, 0 to 100; curves for APM and Repair.]

In the subgroup analysis aiming to compare patients with similar meniscal pathology in the 2 groups, the difference in improvement between the 2 groups from before to 52 weeks after surgery was larger than in the main analysis, in favor of the APM group (adjusted mean difference –21 [CI –31 to –11]), which was sustained at 5 years (Figure 2 and Supplementary Tables A8 and A9). As in the main analysis, the difference when excluding those having had subsequent surgery was most pronounced at 52 weeks, while the repair group improved more from 52 weeks to 5 years than the APM group (Supplementary Table A10).

Patients who had repair were more likely to report having had knee problems and subsequent surgery at 52 weeks and 5 years after surgery. For satisfaction (PASS) and treatment failure, wide confidence intervals precluded interpretation of possible differences in proportions between the 2 groups (Table 4).

Table 4. Proportion of patients reporting acceptable symptom state and treatment failure among those with unsatisfactory symptom state, and patients reporting having had subsequent knee surgery among those reporting knee problems after surgery

At 52 weeks after initial surgery
Outcome | Repair (n = 26), yes/no | APM (n = 95), yes/no | Relative risk (95% CI) | Risk difference (95% CI)
Satisfied with current knee function (PASS) | 10/16 | 56/39 | 0.65 (0.39–1.1) | –0.20 (–0.42 to 0.01)
Treatment failure | 6/10 | 17/22 | 0.86 (0.41–1.8) | –0.06 (–0.34 to 0.22)
Knee problems after surgery | 26/0 | 74/21 | 1.3 (1.2–1.4) | 0.22 (0.14 to 0.30)
Re-surgery | 6/20 | 9/65 | 1.9 (0.75–4.8) | 0.11 (–0.07 to 0.29)

At median 5 years after initial surgery
Outcome | Repair (n = 22), yes/no | APM (n = 75 a), yes/no | Relative risk (95% CI) | Risk difference (95% CI)
Satisfied with current knee function (PASS) | 12/10 | 50/25 | 0.82 (0.54–1.2) | –0.12 (–0.36 to 0.11)
Treatment failure | 5/5 | 5/20 | 2.5 (0.92–6.8) | 0.30 (–0.05 to 0.65)
Knee problems after surgery | 22/0 | 66/9 | 1.1 (1.1–1.2) | 0.12 (0.05 to 0.19)
Re-surgery | 12/10 | 14/52 | 2.6 (1.4–4.7) | 0.33 (0.10 to 0.56)

APM: arthroscopic partial meniscectomy, PASS: patient acceptable symptom state, CI: confidence interval.
a Data missing for 1 observation.

Discussion
We found that patients undergoing repair improved less in patient-reported outcomes from before to around 5 years after surgery than patients having APM. The difference was mainly driven by larger improvements in the APM group within the first year after surgery, while the groups improved equally in the period from 1 to approximately 5 years post-surgery. These results were consistent in all subgroup and sensitivity analyses. More patients in the repair group reported knee problems after the initial surgery and subsequent surgery to the index knee at 1- and 5-year follow-up compared with the APM group, although the difference in subsequent surgery at 1 year was not statistically significant.

To our knowledge, this is the first prospective study with prespecified outcomes investigating changes in patient-reported outcomes after meniscal repair compared with APM, providing the most solid data so far in the absence of randomized trials. Previous attempts to compare meniscal repair and APM in patients with an isolated meniscal tear have been limited to small retrospective observational studies (Stein et al. 2010, Paxton et al. 2011, Lutz et al. 2015). They found no difference in absolute scores of self-reported symptoms or function at 2–5 years after surgery between the 2 procedures (Stein et al. 2010), but the repair group was found to have better scores at 6–13 years’ follow-up (Stein et al. 2010, Lutz et al. 2015). These results contrast with the present study, where the APM group had better patient-reported outcomes than the repair group at all follow-ups (Figure 2). It is likely that more patients in the APM groups of previous studies, which included older patients, had clinical knee OA at follow-up compared with our study on young adults (Lutz et al. 2015). Any differences in patient-reported outcomes post-surgery between groups in previous retrospective studies might already have been present pre-surgery (Stein et al. 2010, Lutz et al. 2015).
The repair and APM groups had similar baseline KOOS4 scores, while the APM group had higher scores than the repair group at all follow-ups because of an approximately 12-point larger improvement from pre-surgery to 52 weeks and 5 years after surgery. A difference of this size is typically considered clinically relevant (Devji et al. 2017). Notably, neither group had reached population-based KOOS scores, especially in the subscales Sport/Rec and QOL (Paradowski et al. 2006).

Meniscal repair is a more complex procedure than APM and often requires an extended rehabilitation period. Previous studies have reported a reoperation rate for repair patients between 17% and 30% depending on time of follow-up, compared with a rate between 1% and 5% for APM patients (Paxton et al. 2011). Our findings are consistent with this, although the proportions who had subsequent surgery were larger at 5 years than previously reported. In the present study, the type of subsequent surgery to the index knee was not recorded, so some of the subsequent surgery may not be related to the meniscus. In the sensitivity analyses excluding patients who had had subsequent surgery, the difference observed in all analyses between the repair and APM groups in improvement from before to 1 year after surgery diminished from 1 to 5 years as a consequence of a larger improvement in the repair group. This supports the notion that the poorer outcomes from repair compared with APM might be due to the larger proportion having complications and subsequent surgery. While APM may have better outcomes and fewer complications in the short term, the procedure likely increases structural joint deterioration and the risk of subsequent joint replacement (Collins et al. 2020). Therefore, meniscal repair is typically preferred when viable despite the risk of poorer short-term outcomes and complications (Kopf et al. 2020). The biomechanical advantages of procedures that preserve intra-articular contact area and stress have been described (Baratz et al. 1986), but these theoretical benefits regarding the risk of OA have not yet been confirmed by clinical trial data. The limited evidence from observational studies supports the hypothesized benefits but suffers from the same limitations as the present study, mainly confounding by indication.
A recent Swedish registry study reported the incidence of OA after meniscal repair to be substantially elevated compared with the general population (Persson et al. 2018).

Limitations
We are unable to draw conclusions regarding causality between the surgical procedures and the degree of improvement as this was an observational study. Like all previous studies, the surgical procedure was not randomized but determined by the pathology (i.e., tear type), leading to selection bias. Although our findings were consistent and robust even after repeated adjustments (attempting to deal with prognostic imbalance to reduce the risk of bias in this observational setting), none of these adjustments can replace the lack of systematic bias in the distribution of both known and unknown prognostic factors offered by randomization. At 52 weeks, loss to follow-up was 19% in both the repair and APM groups. Those lost to follow-up in the APM group self-reported poorer KOOS scores before surgery compared with patients who remained in the study. However, the use of mixed models that include all patients with and without missing data at any time point should give unbiased results under the assumption of missing at random. To explore the robustness of the results to deviations from the missing at random assumption, we assessed the impact of missing data in sensitivity analyses using different single-imputation techniques, such as nonresponder imputation (i.e., baseline observation carried forward) and best- and worst-case scenarios, which yielded similar results for the main outcome (Supplementary Table A6). In the study protocol the intent was to conduct this study on patients with traumatic tears only. However, as no clear consensus exists on the definition of traumatic and degenerative tears, we changed this to including all patients aged 18–40 years. A sensitivity analysis was performed using the planned definition of traumatic tears, but this did not change the interpretation of the results (Supplementary Table A11). Although the type of repair surgery and technique varied considerably and may have affected the outcomes in the repair group, it is unlikely that this has had a substantial impact, as previous studies have reported comparable results between the different repair methods (Nepple et al. 2012).
We believe the results are generalizable to the majority of patients undergoing arthroscopic meniscal surgery, as the demographics of the included patients with regard to age and sex are similar to what has been reported for patients having meniscal surgery in Denmark and the United States (Montgomery et al. 2013, Thorlund et al. 2014). However, the results are not generalizable to patients with ACL/PCL reconstruction and a meniscal tear, as these patients were excluded from this study. The proportion of individuals 40 years old or younger in the KACS cohort is a little lower than the corresponding proportion among all patients having had meniscal surgery in Denmark (Thorlund et al. 2014), and only a small proportion in the present study were active at competitive level, indicating that we might have missed some young elite athletes.

Conclusion
Patients who had had meniscal repair or APM improved in patient-reported outcomes after surgery; however, the repair group experienced smaller improvements at 1 year and 5 years post-surgery than patients who had had APM, and the difference was of clinically important size. The results highlight the need for randomized controlled trials comparing the short- and long-term effects of meniscal repair and APM on patient-reported outcomes and knee OA development.

Supplementary data
Tables A1–A11 are available as supplementary data in the online version of this article, http://dx.doi.org/10.1080/17453674.2021.1917826

Conception and design: KP, ME, RC, SL, UJ, BT. Data analysis: KP, RC. Data interpretation: All authors. Drafting the first version of the manuscript: KP, BT. Feedback and editing of manuscript: all authors. Approval of final version of manuscript: all authors. Acta thanks Franky Steenbrugge and Asbjørn Årøen for help with peer review of this study.

Anderson A F, Irrgang J J, Dunn W, Beaufils P, Cohen M, Cole B J, Coolican M, Ferretti M, Glenn R E, Jr., Johnson R, Neyret P, Ochi M, Panarella L, Siebold R, Spindler K P, Ait Si Selmi T, Verdonk P, Verdonk R, Yasuda K, Kowalchuk D A. Interobserver reliability of the International Society of Arthroscopy, Knee Surgery and Orthopaedic Sports Medicine (ISAKOS) classification of meniscal tears. Am J Sports Med 2011; 39(5): 926-32. Baratz M E, Fu F H, Mengato R. Meniscal tears: the effect of meniscectomy and of repair on intraarticular contact areas and stress in the human knee. A preliminary report. Am J Sports Med 1986; 14(4): 270-5. Brittberg M, Winalski C S. Evaluation of cartilage injuries and repair. J Bone Joint Surg Am 2003; 85-A(Suppl. 2): 58-69. Cavanaugh J T, Killian S E. Rehabilitation following meniscal repair. Curr Rev Musculoskelet Med 2012; 5(1): 46-58. Collins N J, Prinsen C A, Christensen R, Bartels E M, Terwee C B, Roos E M. Knee Injury and Osteoarthritis Outcome Score (KOOS): systematic review and meta-analysis of measurement properties. Osteoarthritis Cartilage 2016; 24(8): 1317-29. Collins J E, Losina E, Marx R G, Guermazi A, Jarraya M, Jones M H, Levy B A, Mandl L A, Martin S D, Wright R W, Spindler K P, Katz J N. Early magnetic resonance imaging-based changes in patients with meniscal tear and osteoarthritis: eighteen-month data from a randomized controlled trial of arthroscopic partial meniscectomy versus physical therapy. Arthritis Care Res 2020; 72(5): 630-40. Devji T, Guyatt G H, Lytvyn L, Brignardello-Petersen R, Foroutan F, Sadeghirad B, Buchbinder R, Poolman R W, Harris I A, Carrasco-Labra A, Siemieniuk R A C, Vandvik P O. Application of minimal important differences in degenerative knee disease outcomes: a systematic review and case study to inform BMJ Rapid Recommendations. BMJ Open 2017; 7(5): e015587. Greenland S. Quantifying biases in causal models: classical confounding vs collider-stratification bias. 
Epidemiology 2003; 14(3): 300-6. Imbens G W, Rubin D B. Causal inference for statistics, social, and biomedical sciences: an introduction. Cambridge: Cambridge University Press; 2015. Ingelsrud L H, Granan L P, Terwee C B, Engebretsen L, Roos E M. Proportion of patients reporting acceptable symptoms or treatment failure and their associated KOOS values at 6 to 24 months after anterior cruciate ligament reconstruction: a study from the Norwegian Knee Ligament Registry. Am J Sports Med 2015; 43(8): 1902-7. Kise N J, Risberg M A, Stensrud S, Ranstam J, Engebretsen L, Roos E M. Exercise therapy versus arthroscopic partial meniscectomy for degenerative meniscal tear in middle aged patients: randomised controlled trial with two year follow-up. BMJ 2016; 354: i3740. Kopf S, Beaufils P, Hirschmann M T, Rotigliano N, Ollivier M, Pereira H, Verdonk R, Darabos N, Ntagiopoulos P, Dejour D, Seil R, Becker R. Management of traumatic meniscus tears: the 2019 ESSKA meniscus consensus. Knee Surg Sports Traumatol Arthrosc 2020; 28(4): 1177-94.



Lutz C, Dalmay F, Ehkirch F P, Cucurulo T, Laporte C, Le Henaff G, Potel J F, Pujol N, Rochcongar G, Salledechou E, Seil R, Gunepin F X, Sonnery-Cottet B. Meniscectomy versus meniscal repair: 10 years radiological and clinical results in vertical lesions in stable knee. Orthop Traumatol Surg Res 2015; 101(8 Suppl.): S327-31. Montgomery S R, Zhang A, Ngo S S, Wang J C, Hame S L. Cross-sectional analysis of trends in meniscectomy and meniscus repair. Orthopedics 2013; 36(8): e1007-13. Nepple J J, Dunn W R, Wright R W. Meniscal repair outcomes at greater than five years: a systematic literature review and meta-analysis. J Bone Joint Surg Am 2012; 94(24): 2222-7. Paradowski P T, Bergman S, Sunden-Lundius A, Lohmander L S, Roos E M. Knee complaints vary with age and gender in the adult population: population-based reference data for the Knee injury and Osteoarthritis Outcome Score (KOOS). BMC Musc Dis 2006; 7: 38. Paxton E S, Stock M V, Brophy R H. Meniscal repair versus partial meniscectomy: a systematic review comparing reoperation rates and clinical outcomes. Arthroscopy 2011; 27(9): 1275-88. Persson F, Turkiewicz A, Bergkvist D, Neuman P, Englund M. The risk of symptomatic knee osteoarthritis after arthroscopic meniscus repair vs partial meniscectomy vs the general population. Osteoarthritis Cartilage 2018; 26(2): 195-201.


Roemer F W, Kwoh C K, Hannon M J, Hunter D J, Eckstein F, Grago J, Boudreau R M, Englund M, Guermazi A. Partial meniscectomy is associated with increased risk of incident radiographic osteoarthritis and worsening cartilage damage in the following year. Eur Radiol 2017; 27(1): 404-13. Rongen J J, Rovers M M, van Tienen T G, Buma P, Hannink G. Increased risk for knee replacement surgery after arthroscopic surgery for degenerative meniscal tears: a multi-center longitudinal observational study using data from the Osteoarthritis Initiative. Osteoarthritis Cartilage 2017; 25(1): 23-9. Roos E M, Roos H P, Lohmander L S, Ekdahl C, Beynnon B D. Knee Injury and Osteoarthritis Outcome Score (KOOS): development of a self-administered outcome measure. J Orthop Sports Phys Ther 1998; 28(2): 88-96. Stein T, Mehling A P, Welsch F, von Eisenhart-Rothe R, Jager A. Longterm outcome after arthroscopic meniscal repair versus arthroscopic partial meniscectomy for traumatic meniscal tears. Am J Sports Med 2010; 38(8): 1542-8. Thorlund J B, Christensen R, Nissen N, Jorgensen U, Schjerning J, Porneki J C, Englund M, Lohmander L S. Knee Arthroscopy Cohort Southern Denmark (KACS): protocol for a prospective cohort study. BMJ Open 2013; 3(10): e003399. Thorlund J B, Hare K B, Lohmander L S. Large increase in arthroscopic meniscus surgery in the middle-aged and older population in Denmark from 2000 to 2011. Acta Orthop 2014; 85(3): 287-92.


Acta Orthopaedica 2021; 92 (5): 597–601


Reasons for revision are associated with rerevised total knee arthroplasties: an analysis of 8,978 index revisions in the Dutch Arthroplasty Register

Maartje BELT 1,2, Gerjon HANNINK 3, José SMOLDERS 4, Anneke SPEKENBRINK-SPOOREN 5, Berend W SCHREURS 5,6, and Katrijn SMULDERS 1,2

1 Research Department, Sint Maartenskliniek, Nijmegen; 2 Interdisciplinary Consortium for Clinical Movement Sciences & Technology (ICMS); 3 Department of Operating Rooms, Radboud University Medical Center, Radboud Institute for Health Sciences, Nijmegen; 4 Department of Orthopedics, Sint Maartenskliniek, Nijmegen; 5 Dutch Arthroplasty Register (Landelijke Registratie Orthopedische Implantaten), 's-Hertogenbosch; 6 Department of Orthopaedics, Radboud University Medical Center, Radboud Institute for Health Sciences, Nijmegen, The Netherlands

Correspondence: maartjebelt@gmail.com
Submitted 2020-10-20. Accepted 2021-04-06.

Background and purpose: From previous studies, we know that clinical outcomes of revision total knee arthroplasty (rTKA) differ among reasons for revision. Whether the prevalence of repeat rTKAs differs depending on the reason for index rTKA is unclear. Therefore, we (1) compared the repeat revision rates between the different reasons for index rTKA, and (2) evaluated whether the reason for repeat rTKA was the same as the reason for the index revision.

Patients and methods: Patients (n = 8,978) who underwent an index rTKA between 2010 and 2018 as registered in the Dutch Arthroplasty Register were included. Reasons for revision, as reported by the surgeon, were categorized as: infection, loosening, malposition, instability, stiffness, patellar problems, and other. Competing risk analyses were performed to determine the cumulative repeat revision rates after an index rTKA for each reason for revision.

Results: Overall, the cumulative repeat revision rate was 19% within 8 years after index rTKA. Patients revised for infection had the highest cumulative repeat revision rate (28%, 95% CI 25–32) within 8 years after index rTKA. Recurrence of the index reason was more common than any other reason for repeat revision after index rTKA for infection (18%), instability (8%), stiffness (7%), and loosening (5%).

Interpretation: Poorest outcomes were found for rTKA for infection: over 1 out of 4 infection rTKAs required another surgical intervention, mostly due to infection. Recurrence of other reasons for revision (instability, stiffness, and loosening) was also considerable. Our findings also emphasize the importance of a clear diagnosis before doing rTKA to avert second revision surgeries.

The number of revision total knee arthroplasties (rTKA) has increased over the past years, and projections predict further increases in the coming decades (Kurtz et al. 2007, Patel et al. 2015, LROI 2019). The outcome of these rTKAs is in general inferior to the outcome of the primary total knee arthroplasty (Greidanus et al. 2011, Baker et al. 2012, Nichols and Vose 2016). Evidence suggests that one of the determinants of the outcome of rTKA is the indication for the revision. To illustrate, several studies have shown a poor prognosis when the rTKA is performed for infection or stiffness compared with revisions for aseptic loosening (Sheng et al. 2006, Pun and Ries 2008, Baker et al. 2012, Van Kempen et al. 2013, Leta et al. 2015). Poor results were reported in terms of complication rates, patient satisfaction, and survival of the prosthesis. However, the majority of these studies based their findings on small samples and single-center cohorts.

A repeat revision indicates that either the initial problem was not resolved despite the index revision, or that another problem occurred. Possible reasons for a failed index rTKA include inaccurate diagnosis, the choice of operative over nonoperative treatment, surgical failure, the occurrence of complications, and insufficient rehabilitation protocols. Insight into whether the reason for index rTKA is related to the same reason for the repeat rTKA might provide a basis for improving treatment choices in these revision surgeries. Therefore, we (1) compared the repeat revision rates among the different reasons for index rTKA, and (2) evaluated how often the reason for repeat rTKA was the same as the reason for the index revision.

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1925036



Patients and methods

Data were obtained from the Dutch Arthroplasty Register (LROI), a nationwide register of all arthroplasties performed in the Netherlands that started in 2007. The data completeness for rTKAs is 97% up to 2018 (LROI 2019). The completeness was first assessed in 2012, yielding 86% coverage. Thus, coverage of the rTKAs performed in the Netherlands between 2010 and 2018 is not complete. All hospitals in the Netherlands report patient characteristics, surgical specifications of each knee arthroplasty procedure, and patient-reported outcomes to the LROI (LROI 2019).

To ensure all revision cases were revisions after primary TKA, we retrieved data of all patients who had a primary TKA in the Netherlands between 2007 and 2018. Next, we excluded all cases without rTKA or with an rTKA registered before 2010, because completeness of rTKA registration was limited before 2010. The first revision after primary TKA was defined as the index rTKA. The second revision after primary TKA was defined as the repeat rTKA. Patients who had received a hinged-type prosthesis as primary implant, or who had a primary TKA performed because of a tumor, were excluded.

Reasons for revision were registered in the LROI as infection, patellar dislocation, patellar pain, wear of the insert, periprosthetic fracture, malalignment, instability, loosening of the femoral component, loosening of the tibial component, loosening of the patellar button, revision after removal of prosthesis, arthrofibrosis, and other reason for revision. The surgeon could report multiple reasons for one revision procedure. When multiple reasons for revision were registered for one patient, we used a hierarchy tree to define the main reason for the revision. This hierarchy is based on the Australian Orthopaedic Association National Joint Replacement Registry (AOANJRR 2020). The hierarchy was: infection, malposition, loosening (component loosening of femur and/or tibia), patellar problems, instability, stiffness (arthrofibrosis), and other (fracture, wear insert, other non-specified).

An rTKA was defined as a report of any change (insertion, replacement and/or removal) of one or more components of the prosthesis in the register. Time to event was defined as the time between the index revision surgery and repeat rTKA or death. In case of a 2-stage revision (n = 367), we used the reimplantation date as index revision. The study was conducted and reported according to STROBE guidelines.

Statistics

The median follow-up time was calculated using reverse Kaplan–Meier. Competing risk analysis was performed to determine the cumulative incidence of repeat revision after index rTKA, with death considered as competing event, stratified by the reason for index revision. Log-rank tests were used to test differences in repeat revision rate between the reasons for index revision. To evaluate the probability of having a repeat rTKA for the same reason as the index revision, we conducted a competing risk analysis. In this analysis, the competing events were death and a repeat rTKA for any reason other than the reason for the index revision. Differences in repeat revision rate were tested with a log-rank test. 95% confidence intervals (CI) were calculated for the cumulative incidences. All analyses were performed using R version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria) using the packages “rms” and “survival” (Harrell 2020, Therneau 2020).

Ethics, funding, data sharing, and potential conflict of interest

Ethical approval for the current study was not applicable according to the Dutch Medical Research Involving Human Subjects Act. Data are available from the LROI (Dutch Arthroplasty Register). This study received no funding, and the authors declare that they have no competing interests.
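The hierarchy rule for selecting one main reason per revision can be illustrated with a short Python sketch. This is not the authors' code (their analyses were run in R). The function name `main_reason` and the lowercase category labels are assumptions introduced here, and the sketch assumes the registered LROI items have already been mapped to the 7 hierarchy categories.

```python
from typing import Iterable, Optional

# Hierarchy as listed in the text, highest priority first.
HIERARCHY = (
    "infection",
    "malposition",
    "loosening",
    "patellar problems",
    "instability",
    "stiffness",
    "other",
)
_RANK = {reason: rank for rank, reason in enumerate(HIERARCHY)}


def main_reason(reported: Iterable[str]) -> Optional[str]:
    """Return the highest-ranked reported reason (hypothetical helper).

    Returns None when no known reason was reported, which corresponds
    to revisions with an unreported reason.
    """
    known = [r for r in reported if r in _RANK]
    if not known:
        return None
    return min(known, key=_RANK.__getitem__)


if __name__ == "__main__":
    # A revision registered with both instability and infection
    # is analyzed as an infection revision.
    print(main_reason(["instability", "infection"]))
```

Because each case contributes exactly 1 reason, the choice of ordering can shift cases between groups, which is why a sensitivity analysis of the hierarchy is informative.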

Results

Characteristics of index revisions

Between January 2010 and December 2018, a total of 8,868 patients underwent 8,978 index rTKAs as registered in the LROI (110 bilateral rTKA cases). 432 (4%) patients died during the follow-up period. The mean age at the time of the index revision surgery was 67 years (SD 9.6), and 65% were females (Table 1). A patellar problem (n = 2,058, 23%) was the most common reason for index revision; 93% of the index revisions for patellar problems were isolated patellar resurfacings. In 700 rTKAs (8%) the reason for index revision was classified as “other,” and in 354 rTKAs (4%) the reason for revision was not reported.

Repeat revision TKA (Table 2)

1,123 repeat rTKAs following the index rTKA were registered. The most common reasons for repeat rTKA were infection (n = 366, 33%), instability (n = 208, 18%), and loosening (n = 195, 17%). The cumulative repeat revision rate of all index rTKA was 6% (CI 5–6) within 1 year after surgery, and 19% (CI 18–20) within 8 years. A log-rank test showed a statistically significant difference in repeat revision rate between reasons for index revision. The highest cumulative repeat revision rate within 8 years was observed for index revision for infection (28%; CI 25–32) (Figure). Patients revised for instability and stiffness had lower repeat revision rates compared with the infection group. The cumulative repeat revision rate for an index rTKA for instability was 23% (CI 18–28) at 8 years. In rTKAs revised for stiffness the cumulative repeat revision rate was 23% (CI 16–32) at 6 years after index rTKA, the maximum observed follow-up for this group. rTKAs revised for loosening, malposition, or patellar problems had the lowest rate



Table 1. Patient characteristics by reason for index rTKA. Values are count (%) unless otherwise specified

                            Reason for revision a
Factor                      Infection    Loosening    Malposition  Patellar     Instability  Stiffness    Other        Overall
                                                                   problems
                            (n = 1,538)  (n = 1,422)  (n = 1,241)  (n = 2,043)  (n = 1,452)  (n = 228)    (n = 700)    (n = 8,978)
Age, mean (SD)              69 (9.6)     67 (8.9)     66 (9.4)     68 (9.5)     65 (9.5)     64 (9.2)     68 (10.4)    67 (9.6)
Female sex                  787 (51)     955 (67)     871 (70)     1,390 (68)   964 (66)     135 (59)     451 (64)     5,787 (65)
  Missing                   2 (0.1)      1 (0.1)      2 (0.1)      4 (0.2)      3 (0.2)      2 (0.9)      2 (0.3)      17 (0.2)
ASA
  I                         157 (10)     158 (11)     176 (14)     224 (11)     204 (14)     38 (17)      95 (14)      1,081 (12)
  II                        853 (56)     972 (68)     837 (67)     1,439 (70)   989 (68)     154 (68)     445 (64)     5,807 (63)
  III–IV                    504 (33)     273 (19)     205 (17)     341 (17)     233 (16)     30 (13)      138 (20)     1,780 (20)
  Missing                   24 (1.6)     19 (1.3)     23 (1.9)     39 (1.9)     26 (1.8)     6 (2.6)      22 (3.1)     310 (3.5)
Diagnosis of primary TKA
  Osteoarthrosis            1,435 (93)   1,344 (95)   1,169 (94)   1,950 (95)   1,350 (93)   212 (93)     655 (94)     8,446 (94)
  Osteonecrosis             6 (0.4)      8 (0.6)      2 (0.2)      4 (0.2)      3 (0.2)      0 (0)        2 (0.3)      26 (0.3)
  Posttraumatic             37 (2.4)     27 (1.9)     25 (2.0)     32 (1.6)     45 (3.1)     10 (4.4)     11 (1.6)     191 (2.1)
  Rheumatoid arthritis      33 (2.1)     19 (1.3)     16 (1.3)     33 (1.6)     27 (1.9)     4 (1.8)      17 (2.4)     151 (1.7)
  Inflammatory arthritides  3 (0.2)      0 (0)        0 (0)        0 (0)        2 (0.1)      1 (0.4)      1 (0.1)      7 (0.1)
  Other                     10 (0.7)     5 (0.4)      8 (0.6)      6 (0.3)      8 (0.6)      1 (0.4)      4 (0.6)      49 (0.5)
  Missing                   14 (0.9)     19 (1.3)     21 (1.7)     18 (0.9)     17 (1.2)     0 (0)        10 (1.4)     108 (1.2)
Follow-up years, median     3.0          3.2          3.5          3.7          2.9          2.7          3.7          3.4
  IQR                       1.5–5.2      1.6–5.6      1.8–5.5      1.9–5.8      1.4–4.9      1.5–4.1      1.8–6.3      1.7–5.5

a Reasons for revision in the table are those from the hierarchy.

Table 2. Cumulative repeat revision rate after rTKA by reason for revision. Values are repeat revision rate (95% CI)

Factor              At 1 year             At 8 years            At 8 years, same reason a
Overall             0.06 (0.05–0.06)      0.19 (0.18–0.20)      –
Infection           0.16 (0.14–0.18)      0.28 (0.25–0.32)      0.18 (0.15–0.21)
Loosening           0.03 (0.02–0.04)      0.16 (0.13–0.19)      0.05 (0.03–0.06)
Malposition         0.03 (0.02–0.04)      0.15 (0.12–0.19)      0.02 (0.01–0.03)
Patellar problems   0.04 (0.03–0.05)      0.15 (0.13–0.17)      0.02 (0.02–0.03)
Instability         0.04 (0.03–0.06)      0.23 (0.17–0.28)      0.07 (0.05–0.09)
Stiffness           0.07 (0.04–0.11)      0.23 (0.15–0.31) b    0.07 (0.03–0.14) b
Other               0.06 (0.04–0.08)      0.20 (0.16–0.24)      0.01 (0.00–0.04)

a For the same reason as the index revision.
b At 6-year follow-up.

of repeat revision surgeries. The cumulative repeat revision rate within 8 years for loosening was 17% (CI 14–20), and for malposition and patellar problems 15% (CI 11–19).

Reason for repeat revision by reason for index revision

Among patients whose index revision was for infection, the most common reason for repeat rTKA within 8 years was infection (18%; CI 15–21; Table 2). Similar results were observed when an index revision was performed for instability, stiffness, or loosening. The cumulative incidence of a repeat revision for the same reason as the index revision was 8% (CI 6–10) for instability, 7% (CI 3–14) for stiffness, and 5% (CI 4–7) for loosening. See Supplementary data for the cumulative repeat revision rates and specified reason for repeat rTKA.
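The cumulative incidences reported here treat death as a competing event. A minimal Python sketch of the standard nonparametric estimator for this setting (Aalen–Johansen) is given below, next to a naive 1 minus Kaplan–Meier estimate that censors deaths. It is an illustration only, not the authors' R code; the event codes (0 = censored, 1 = repeat revision, 2 = death), the function names, and the toy data are assumptions made for this example.

```python
def cumulative_incidence(times, events, cause=1):
    """Nonparametric cumulative incidence of `cause` with competing risks.

    events: 0 = censored, any other integer = an event type
    (assumed coding: 1 = repeat revision, 2 = death).
    Returns a list of (time, cumulative incidence) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    event_free = 1.0  # probability of no event of any type before time t
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_cause = d_any = n_here = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            e = data[i][1]
            n_here += 1
            if e != 0:
                d_any += 1
            if e == cause:
                d_cause += 1
            i += 1
        if d_any:
            cif += event_free * d_cause / n_at_risk
            event_free *= 1 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= n_here
    return curve


def one_minus_km(times, events, cause=1):
    """Naive estimate: competing events are recoded as censoring."""
    recoded = [e if e == cause else 0 for e in events]
    return cumulative_incidence(times, recoded, cause)


if __name__ == "__main__":
    # Toy data: years to event and event type for 4 hypothetical cases.
    years = [1, 2, 3, 4]
    status = [1, 2, 0, 1]
    print(cumulative_incidence(years, status)[-1])  # approximately (4, 0.75)
    print(one_minus_km(years, status)[-1])          # approximately (4, 1.0)
```

In the toy data the naive estimate is higher because a patient who died is treated as if a repeat revision could still occur, which is why censoring deaths tends to overestimate revision rates.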

Figure. Cumulative repeat revision rate of index rTKA by reason for revision. Y-axis: cumulative repeat revision rate (0 to 0.4); x-axis: years from index revision surgery (0 to 9).

Numbers at risk, by years from index revision surgery

Year                0      1      2      3      4      5      6      7      8
Infection           1,464  991    744    513    355    250    156    93     45
Stiffness           227    172    118    82     47     16     1      0      0
Instability         1,452  1,142  835    594    391    267    160    82     28
Other               699    568    444    346    263    200    144    75     23
Patellar problems   2,040  1,690  1,348  1,053  757    521    348    201    86
Loosening           1,422  1,153  877    663    501    358    232    141    57
Malposition         1,238  1,037  812    612    447    311    191    104    32

Discussion

Poorest outcomes in terms of a repeat rTKA were observed in patients who had had an rTKA for infection. More than 1 in 4 cases revised for infection needed repeat rTKA for any



reason; almost 1 in 5 had a repeat rTKA due to a new or recurrent infection, within 8 years after index surgery. The lowest repeat revision rates were observed in index rTKAs for aseptic loosening, malposition, or patellar problems. However, repeat revision rates in these groups were still substantial, with a cumulative repeat revision rate between 15% and 23%. Consistent with infection, in index rTKAs revised for loosening, instability, or stiffness the most prevalent reason for the repeat revision was the same as the index revision. The most common reason for index rTKA was patellar problems (23%), while in other registries infection and loosening are reported as most common reasons for revision (National Joint Registry 2020). An explanation for this finding may stem from the relatively low percentage of primary TKAs with resurfaced patellae in Dutch clinical practice (18%) compared with most other registries (4–82%) (Fraser and Spangehl 2017). This increases the likelihood that in the case of poor outcomes in non-resurfaced primary TKAs, a first step is to resurface the patella in a reoperation (Teel et al. 2019). Indeed, in our dataset most index rTKAs in patients with patellar problems were isolated resurfacings (> 92%). A large body of literature has consistently shown that periprosthetic joint infections are difficult to treat (Mortazavi et al. 2011, Kurtz et al. 2018, Leta et al. 2019). Our findings of the repeat revision rate after revision for infection are comparable to the Norwegian Arthroplasty Registry. 5 years after rTKA for infection, 21% of the patients had a repeat rTKA (Leta et al. 2019). The majority of these patients underwent a repeat rTKA due to infection (85 of the 104 repeat revision cases). The large number of infections in index and repeat rTKAs shows that we should keep focusing on the treatment and prevention of joint infections. 
It is worth mentioning that more patients revised for infection were classified as ASA class 3+4 compared with the other reasons for revision (33% vs. 20% overall). Whether patients with high ASA class are more susceptible to infection, patients with an infection are more likely to receive revision surgery even if they are ASA 3+, or patients with a high ASA class are more likely to need repeat rTKA cannot be concluded from our data. We observed a higher repeat revision rate after index rTKA for instability and stiffness compared with the NJR (NJR number of subsequent repeat rTKA: 10% after instability, 12% after stiffness) (National Joint Registry 2020). These differences might be explained by the method of reporting the incidence (cumulative incidence versus percentage by the NJR), due to different definitions of the indications, or due to the willingness to reoperate. Nonetheless, the NJR reported that instability, infection, and stiffness are more common indications for repeat rTKA than for index rTKA, which corresponds to the results of our study. The NJR hypothesizes that repeat rTKA for instability, infection, and stiffness reflects the complexity and soft tissue element that contribute to the outcome of rTKA (National Joint Registry 2020). The latter is consistent with the generally poor results that are reported


after rTKA for stiffness and instability (Kim et al. 2010, Malviya et al. 2012, Luttjeboer et al. 2016).

Lowest repeat revision rates were found in patients revised for loosening, malposition, and patellar problems. This is in line with multiple previous studies (Sheng et al. 2006, Baker et al. 2012, Leta et al. 2015). However, the majority of the index revisions for patellar problems were isolated patellar resurfacings (93%). In 10% of the cases this isolated patellar resurfacing was followed by a subsequent repeat revision for, among other causes, infection, malposition, and instability. This suggests that the initial patellar resurfacing did not address the original failure diagnosis or induced a new one.

Our findings should be regarded in the context of a number of strengths and limitations. The use of nationwide registry data has benefits, including the large sample size and high generalizability. Another strength is that we accounted for death as a competing event in the survival analysis of revision TKA, which potentially provides a more accurate estimate of the repeat revision rate than Kaplan–Meier analysis. Also, we did not limit the inclusion of rTKAs to patients who had a primary TKA for osteoarthritis (OA), to make the results generalizable to all revision TKA patients. We performed an additional analysis where we included only patients with OA. This additional analysis showed cumulative repeat revision rates similar to those reported in the current manuscript.

A limitation of our analysis method is that a subject can only have 1 reason for revision in the analysis, while multiple reasons were reported in some cases. Therefore, we used a hierarchy in the reasons for revision to rank cases with more than 1 reason for revision. A sensitivity analysis showed this resulted in slightly different cumulative repeat revision rate estimates (see Supplementary data).
Second, to ensure that all cases in our study were the first revision after primary TKA, we included only cases with the primary TKA registered. As a consequence, the follow-up time of the patients was limited. Complications that often present shortly after surgery, such as infection, are therefore better represented in the data than long-term complications such as loosening, resulting in higher repeat rTKA estimates for the short-term reasons for revision compared with the reasons that present long term. Thus, the repeat revision surgeries were mostly due to short- to mid-term complications. Third, the reason for revision was registered by orthopedic surgeons who may use different interpretations of the definitions for the reasons. Another limitation related to the registry data is that the registry forms are filled in once, directly following the surgery. A (suspected) infection might not be proven at that point; thus cases of infection might still be underreported despite the already high proportion of revision due to infection (Gundtoft et al. 2016, Afzal et al. 2019). Also, the registry does not have complete coverage of all primary and rTKA procedures performed in the Netherlands between 2007 and 2018. Fourth, we did not correct for correlated bilateral cases in the analysis, while the methods of our statistical analysis do assume independent



observations, although previous studies have shown bilateral surgeries do not introduce significant dependency problems in register studies (Robertsson and Ranstam 2003, Park et al. 2010). Finally, we acknowledge the ongoing discussion of survival analysis in arthroplasty registers considering ease of interpretation versus accuracy of survival. Kaplan–Meier and competing risk analysis each have their advantages and disadvantages. We chose to report cumulative incidences of repeat rTKA.

In conclusion, the reason for index revision seems to be associated with the incidence of repeat rTKA at 8 years’ follow-up. Poorest outcomes were found for rTKA for infection: more than 1 in 4 infection rTKAs required another surgical intervention, often due to a new or persistent infection. Recurrence of other reasons for revision (instability, stiffness, and loosening) was also considerable. This study confirms that periprosthetic infections are complex to treat. Our findings also emphasize the importance of a clear diagnosis before doing rTKA to avert second revision surgeries.

Note

Please note that there is a relatively large difference between the number of index rTKAs included in this study and the number reported in the annual report of the LROI. This difference is due to a difference in selection of patients. In the annual report of the LROI, 2-stage revisions and isolated patellar resurfacing revisions were not included. These cases are, however, included in this study. Also, in the present study the selection period was limited to 2010–2018.

Supplementary data

The cumulative repeat revision rates per reason for repeat rTKA and the sensitivity analysis are available as supplementary data in the online version of this article, http://dx.doi.org/10.1080/17453674.2021.1925036

MB, GH, JS, BS, KS: concept and design. MB, AS: collection and assembly of data. MB, GH: data analysis. MB, GH, JS, AS, BS, KS: interpretation of the data. MB: drafting of manuscript. MB, GH, JS, AS, BS, KS: critical revision and final approval of the version submitted.

Acta thanks Geir Hallan and Annette W-Dahl for help with peer review of this study.

Afzal I, Radha S, Smoljanovi T, Stafford G H, Twyman R, Field R E. Validation of revision data for total hip and knee replacements undertaken at a high volume orthopaedic centre against data held on the National Joint Registry. J Orthop Surg Res 2019; 14(1): 318.
AOANJRR. AOANJRR annual report 2020. https://aoanjrr.sahmri.com/annualreports-2020
Baker P, Cowling P, Kurtz S, Jameson S, Gregg P, Deehan D. Reason for revision influences early patient outcomes after aseptic knee revision knee. Clin Orthop Relat Res 2012; 470(8): 2244-52.
Fraser J F, Spangehl M J. International rates of patellar resurfacing in primary total knee arthroplasty, 2004–2014. J Arthroplasty 2017; 32(1): 83-6.
Greidanus N V, Peterson R C, Masri B A, Garbuz D S. Quality of life outcomes in revision versus primary total knee arthroplasty. J Arthroplasty 2011; 26(4): 615-20.
Gundtoft P H, Pedersen A B, Schønheyder H C, Overgaard S. Validation of the diagnosis “prosthetic joint infection” in the Danish Hip Arthroplasty Register. Bone Joint J 2016; 98-B(3): 320-5.
Harrell Jr F E. rms: Regression modeling strategies. R package version 6.0-0; 2020.
Kim G K, Mortazavi S M J, Purtill J J, Sharkey P F, Hozack W J, Parvizi J. Stiffness after revision total knee arthroplasty. J Arthroplasty 2010; 25(6): 844-50.
Kurtz S, Ong K, Lau E, Mowat F, Halpern M. Projections of primary and revision hip and knee arthroplasty in the United States from 2005 to 2030. J Bone Joint Surg Am 2007; 89(4): 780-5.
Kurtz S M, Lau E C, Son M-S, Chang E T, Zimmerli W, Parvizi J. Are we winning or losing the battle with periprosthetic joint infection: trends in periprosthetic joint infection and mortality risk for the Medicare population. J Arthroplasty 2018; 33(10): 3238-45.
Leta T H, Lygre S H L, Skredderstuen A, Hallan G, Furnes O. Failure of aseptic revision total knee arthroplasties: 145 revision failures from the Norwegian Arthroplasty Register, 1994–2011. Acta Orthop 2015; 86(1): 48-57.
Leta T H, Lygre S H L, Schrama J C, Hallan G, Gjertsen J E, Dale H, Furnes O. Outcome of revision surgery for infection after total knee arthroplasty: results of 3 surgical strategies. JBJS Rev 2019; 7(6): 1-10.
LROI. LROI rapportage; 2019. https://www.lroi-report.nl/
Luttjeboer J S, Bénard M R, Defoort K C, van Hellemondt G G, Wymenga A B. Revision total knee arthroplasty for instability: outcome for different types of instability and implants. J Arthroplasty 2016; 31(12): 2672-6.
Malviya A, Brewster N T, Bettinson K, Holland J P, Weir D J, Deehan D J. Functional outcome following aseptic single-stage revision knee arthroplasty. Knee Surgery, Sport Traumatol Arthrosc 2012; 20(10): 1994-2001.
Mortazavi J S M, Molligan J, Austin M S, Purtill J J, Hozack W J, Parvizi J. Failure following revision total knee arthroplasty: infection is the major cause. Int Orthop 2011; 35(8): 1157-64.
National Joint Registry. 17th Annual Report for England, Wales and Northern Ireland. Bristol: NJR; 2020. https://reports.njrcentre.org.uk/
Nichols C I, Vose J G. Clinical outcomes and costs within 90 days of primary or revision total joint arthroplasty. J Arthroplasty 2016; 31(7): 1400-1406.e3.
Park M, Kim S, Chung C, Choi I, Lee S, Lee K. Statistical consideration for bilateral cases in orthopaedic research. J Bone Joint Surg Am 2010; 92(8): 1732-7.
Patel A, Pavlou G, Mújica-Mota R E, Toms A D. The epidemiology of revision total knee and hip arthroplasty in England and Wales: a comparative analysis with projections for the United States. A study using the National Joint Registry dataset. Bone Joint J 2015; 97-B(8): 1076-81.
Pun S Y, Ries M D. Effect of gender and preoperative diagnosis on results of revision total knee arthroplasty. Clin Orthop Relat Res 2008; 466(11): 2701-5.
Robertsson O, Ranstam J. No bias of ignored bilaterality when analysing the revision risk of knee prostheses: analysis of a population based sample of 44,590 patients with 55,298 knee prostheses from the national Swedish Knee Arthroplasty Register. BMC Musculoskelet Disord 2003; 4: 1-4.
Sheng P-Y, Konttinen L, Lehto M, Ogino D, Jämsen E, Nevalainen J, Pajamäki J, Halonen P, Konttinen Y T. Revision total knee arthroplasty: 1990 through 2002. A review of the Finnish arthroplasty registry. J Bone Joint Surg Am 2006; 88(7): 1425-30.
Teel A J, Esposito J G, Lanting B A, Howard J L, Schemitsch E H. Patellar resurfacing in primary total knee arthroplasty: a meta-analysis of randomized controlled trials. J Arthroplasty 2019; 34(12): 3124-32.
Therneau T. A package for survival analysis in R. R package version 3.2-7; 2020.
Van Kempen R W T M, Schimmel J J P, Van Hellemondt G G, Vandenneucker H, Wymenga A B. Reason for revision TKA predicts clinical outcome: prospective evaluation of 150 consecutive patients with 2-years followup knee. Clin Orthop Relat Res 2013; 471(7): 2296-302.



Acta Orthopaedica 2021; 92 (5): 602–607

Short-term functional outcome after fast-track primary total knee arthroplasty: analysis of 623 patients

Jeroen C VAN EGMOND, Brechtje HESSELING, Hennie VERBURG, and Nina M C MATHIJSSEN

Department of Orthopaedics, Reinier Haga Orthopedisch Centrum, Zoetermeer, the Netherlands
Correspondence: j.vanegmond@rhoc.nl
Submitted 2021-02-25. Accepted 2021-04-15.

Background and purpose: Early functional outcome after total knee arthroplasty (TKA) has been described before, but without focus on the presence of certain functional recovery patterns. We investigated patterns of functional recovery during the first 3 months after TKA and determined characteristics of non-responders in functional outcome.

Patients and methods: All primary TKAs in a fast-track setting with complete patient-reported outcome measures (PROMs) preoperatively, at 6 weeks, and at 3 months postoperatively were included. Included PROMs were the Oxford Knee Score (OKS), the Knee injury and Osteoarthritis Outcome Score Physical Function Short-Form (KOOS-PS), and the EuroQol 5 dimensions (EQ-5D) including the self-rated health Visual Analogue Scale (VAS). Patients with an improvement on OKS smaller than the minimal clinically important difference (MCID) were classified as non-responders at that time point. Characteristics of responders and non-responders in functional recovery were tested for differences: we defined 4 groups a priori, based on the responder status at each time point.

Results: 623 patients were included. At 6 weeks OKS, KOOS-PS, and EQ-5D self-rated health VAS were statistically significantly improved compared with preoperative scores. The mean improvement was clinically relevant at 6 weeks for KOOS-PS and at 3 months for OKS. Patient characteristics in non-responders were higher BMI and worse scores on EQ-5D items: mobility, self-care, usual activities, and anxiety/depression.

Interpretation: Both statistically significant and clinically relevant functional improvement were found in most patients during the first 3 months after primary TKA. Presumed modifiable patient characteristics in non-responders on early functional outcome were BMI and anxiety/depression.

Most arthroplasty research has focused on long-term functional outcomes and survival of the prosthesis. These outcomes have frequently been used for quality assessments and performance outcomes of the prosthesis itself. Because around 20% of patients remain unsatisfied after total knee arthroplasty (TKA) (Baker et al. 2007, Bourne et al. 2010), studying early functional outcome patterns more closely might provide important information to further optimize rehabilitation and patient satisfaction. In a recent article by van Egmond et al. (2021), 3 distinct recovery trajectories were found after TKA, using preoperative, 6 months, and 12 months postoperative Oxford Knee Scores (OKS); 2 of these trajectories were approximately the same up to 6 months and subsequently diverged. Relatively similar patterns have been seen in total hip arthroplasty (THA) (Hesseling et al. 2019).

Several studies on early function, pain, and quality of life outcomes after TKA have been published (Andersen et al. 2009, Larsen et al. 2012, Jakobsen et al. 2014, Castorina et al. 2017, Schotanus et al. 2017, Husted et al. 2021). Moreover, Canfield et al. (2020) concluded that most improvement in function and pain is gained during the first 6 months postoperatively. Although functional rehabilitation in TKA and THA patients before 6 months has been studied (Van Egmond et al. 2015, Klapwijk et al. 2017), the question remains whether differences in functional recovery patterns exist before the 6-month mark in TKA patients. We expect that rehabilitation might be further optimized with knowledge of early functional rehabilitation patterns.

Therefore, the primary objective of this study was to determine patterns in functional outcome at 6 weeks and 3 months after primary TKA. Secondary objectives were a non-responder analysis and to determine characteristics of non-responders in early functional recovery.

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1925412


Acta Orthopaedica 2021; 92 (5): 602–607

Patients and methods

This is a retrospective exploratory cohort study. Data, all prospectively collected, was gathered from the digital PROMs database of our institution.

Patients

As standard procedure in our institution, during the study period from January 2015 to August 2017, all patients with primary TKA were asked to complete PROMs preoperatively, and received digital PROMs questionnaires at 6 weeks and 3 months postoperatively (OnlinePROMs, Amsterdam, the Netherlands). All patients who underwent primary TKA with fast-track recovery at our institution during the study period were eligible for inclusion. Patients with completed PROMs at all 3 time points were included for analysis. In patients with bilateral TKA during the inclusion period only the results of the first TKA were analyzed.

Measurements

Included PROMs were OKS, Knee disability and Osteoarthritis Outcome Score Physical Function Short-Form (KOOS-PS), and EuroQol 5 dimensions (EQ-5D-3L). The EQ-5D-3L questionnaire comprises 5 questions on the dimensions of health: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. The second part of the EQ-5D contains a self-rated health score on a visual analogue scale (VAS) from 0 to 100, where 0 represents the worst imaginable health and 100 the best imaginable health. The EQ-5D self-rated health VAS from every administration was used to determine general health improvement (Devlin et al. 2010). The KOOS-PS score ranges from 0 to 100%, where 0% represents no difficulty in physical functioning (Perruccio et al. 2008). The minimal clinically important difference (MCID) is 4% for KOOS-PS, while a moderate improvement is stated at 32% (Singh et al. 2014). The OKS is based on 12 questions regarding pain and function of the knee. The total score ranges from 0 to 48, with higher scores indicating better function and less pain (Dawson et al. 1998). Anchor-based methods showed that a change in score of approximately 9 points on the OKS indicates a meaningful improvement at the group level (Beard et al. 2015). Missing data was handled according to the specific questionnaire rules (Murray et al. 2007).

For non-responder analysis we used the OKS, mainly to ensure our results could be compared with our previous study. Moreover, we find the OKS to cover a broader range of functional outcome than the KOOS-PS. Patients were rated as responders based on the MCID of the OKS; an improvement on OKS above the MCID of 9 points labelled patients as responders. Improvement was determined at both 6 weeks and 3 months. Consequently, 4 groups were formed: (1) responder at 6 weeks, responder at 3 months; (2) non-responder at 6 weeks, responder at 3 months; (3) non-responder at 6 weeks, non-responder at 3 months; and (4) responder at 6 weeks, non-responder at 3 months.

Statistics

Normally distributed outcomes were presented as mean and 95% confidence interval (CI). Not normally distributed outcomes were presented as median, total range, and interquartile range (IQR). Repeated-measures ANOVA was used to determine changes in outcome over time for OKS, KOOS-PS, and EQ-5D self-rated health VAS separately, using all 3 time points. If there was a statistically significant change over time, a priori planned post-hoc ANOVA analysis was performed to compare preoperative scores with 6 weeks, and scores at 6 weeks with 3 months, to determine at which point in time the scores improved (Twisk 2003). For responder analysis only the OKS was used to determine whether a patient was a responder. Groups of responders and non-responders were compared and tested for differences on their characteristics using chi-square, Kruskal–Wallis, and ANOVA. If there was an overall statistically significant difference between the groups, a priori planned post-hoc Mann–Whitney U analysis with Bonferroni correction was performed to test which groups differed. Characteristics of interest were dichotomized for analysis: age (≤ 75 vs. > 75), ASA (class I–II vs. III–IV), and EQ-5D scores (no problems vs. moderate-to-severe problems). For statistical analyses IBM SPSS statistics version 25 (IBM Corp, Armonk, NY, USA) was used. A p-value of 0.05 or lower was considered statistically significant.

Ethics, funding, and potential conflicts of interest

According to the local ethical committee, this study did not fall under the scope of the research with human subjects Act, as it placed no additional burden on the patient. This study was conducted according to the Declaration of Helsinki (version 64, October 2013). No funding was received for this study. The authors have no conflicts of interest to declare.
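The responder grouping described under Measurements can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis (which was run in SPSS): the function names are hypothetical, and the sketch assumes that an improvement equal to the 9-point MCID already counts as responding, which the text does not state explicitly. The example scores are the per-group OKS medians reported in the Results, used here purely as sample input.

```python
from collections import Counter

OKS_MCID = 9  # points; group-level threshold cited from Beard et al. (2015)


def is_responder(preop, followup, mcid=OKS_MCID):
    """Assumption: reaching the MCID (>=) counts as responding."""
    return (followup - preop) >= mcid


def recovery_group(preop, week6, month3, mcid=OKS_MCID):
    """Map a patient's OKS at the 3 time points to the a priori groups 1-4."""
    r6 = is_responder(preop, week6, mcid)
    r3 = is_responder(preop, month3, mcid)
    if r6 and r3:
        return 1  # responder at 6 weeks, responder at 3 months
    if not r6 and r3:
        return 2  # non-responder at 6 weeks, responder at 3 months
    if not r6 and not r3:
        return 3  # non-responder at both time points
    return 4      # responder at 6 weeks, non-responder at 3 months


# Sample input: (preoperative, 6 weeks, 3 months) OKS values.
patients = [(19, 35, 40), (23, 26, 37), (27, 26, 29), (19, 32, 25)]
groups = [recovery_group(*p) for p in patients]
counts = Counter(groups)
print(groups, dict(counts))
```

Both time points are compared with the preoperative score, as in the responder analysis; changing `>=` to `>` would implement the stricter reading of "above the MCID".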

Results

623 patients with unilateral primary TKA in a fast-track setting were included (Table 1). Median age was 70 years, and 420 (67%) patients were female. 437 patients (70%) were classified as ASA 2. Median BMI was 29 (IQR 26–36) for the total group.

Primary outcome

Both the function scores (OKS and KOOS-PS) and EQ-5D self-rated health VAS improved during the first 3 months as presented in Table 2. Since none of the scores were normally distributed, the median, total range, and IQR were presented. Repeated-measures ANOVA of both function scores and EQ-5D self-rated health VAS over the 3-month postoperative period showed statistically significant improvement. For OKS, KOOS-PS, and EQ-5D self-rated health VAS the Wilks’ λ was < 0.001 for the preoperative to 6 weeks period and from 6 weeks to 3 months as well. The improvement on KOOS-PS at 6 weeks postoperatively was 11%, which is clinically relevant. At 3 months, compared with preoperatively, an improvement of 16% was found (Figure 1). The OKS improved 7 points during the first 6 weeks, which is statistically significant but not clinically relevant. At 3 months a statistically significant and clinically relevant improvement of 12 points was found (Figure 2). The EQ-5D self-rated health VAS showed improvement both at 6 weeks and 3 months postoperatively, of respectively 4 and 8 points compared with preoperative levels (Figure 3).

Secondary outcome

Responder analysis was performed based on MCID of OKS at 6 weeks and 3 months postoperatively, compared with preoperative scores. The percentage of responders improved from 44% at 6 weeks to 67% at 3 months. The predefined 4 groups comprised: (1) responder at 6 weeks, responder at 3 months (41%); (2) non-responder at 6 weeks, responder at 3 months (23%); (3) non-responder at 6 weeks, non-responder at 3 months (33%); and (4) responder at 6 weeks, non-responder at 3 months (3%). Groups 1 and 2 were determined as responders versus groups 3 and 4 as non-responders. There was a statistically significant difference between the groups regarding BMI and EQ-5D items: mobility, self-care, usual activities, and anxiety/depression (Table 3). In the planned post-hoc analysis groups 1 and 2 were mostly comparable (Table 4). On EQ-5D anxiety/depression, group 4 differed from the other groups (Table 4). Finally, the distribution of normal and high BMI was different between groups 1 and 2 compared with group 4 (Table 3). The post-hoc pairwise analysis presented in Table 4 showed a statistically significant difference between groups 3 and 4 for BMI. The median OKS in group 1 improved from 19 preoperatively to 40 at 3 months and group 2 improved from 23 to 37 (Table 5). This is in contrast to groups 3 and 4 where median OKS at 3 months showed only minimal improvement from 27 preoperatively to 29 at 3 months for group 3, and 19 to 25 for group 4 (Table 5).

Table 1. Patient demographics (N = 623). Values are count (%) unless otherwise specified

| Demographics | 623 TKA |
|---|---|
| Age, median [IQR] (range) | 70 [64–77] (32–93) |
| Female sex | 420 (67) |
| Smoking yes | 64 (10) |
| ASA score I | 100 (16) |
| ASA score II | 437 (70) |
| ASA score III | 86 (14) |
| BMI: normal weight (< 25) | 100 (16) |
| BMI: overweight (25–30) | 267 (43) |
| BMI: obesity (> 30) | 256 (41) |
| LOS, median [IQR] (range) | 2 [2–3] (0–9) |

LOS = length of stay by hospital nights.

Table 2. Median function scores at the 3 time points. Values are median [IQR] (range)

| Item | Preoperative | 6 weeks | 3 months |
|---|---|---|---|
| OKS | 23 [17–28] (2–45) | 30 [25–36] (4–48) a | 35 [29–41] (9–48) b |
| KOOS-PS | 51 [42–62] (15–100) | 40 [34–46] (0–100) a | 35 [28–44] (0–100) b |
| EQ-5D VAS | 71 [60–81] (3–100) | 75 [60–86] (0–100) a | 79 [65–88] (6–100) b |

a Significant improvement between preoperative and 6 weeks, Wilks’ λ < 0.001.
b Significant improvement between 6 weeks and 3 months, Wilks’ λ < 0.001.

Table 3. Responder analysis. Values are count (%) unless otherwise specified

| Factor | Group 1 | Group 2 | Group 3 | Group 4 | p-value |
|---|---|---|---|---|---|
| Total number | 218 (41) | 119 (23) | 177 (33) | 14 (3) | |
| Age ≤ 75 years | 142 (65) | 90 (76) | 121 (68) | 10 (71) | 0.3 a |
| Age > 75 years | 76 (35) | 29 (24) | 56 (32) | 4 (29) | |
| BMI < 25 | 26 (12) | 16 (13) | 42 (24) | 1 (7) | 0.02 b |
| BMI 25–30 | 101 (46) | 55 (46) | 71 (40) | 3 (21) | |
| BMI > 30 | 91 (42) | 48 (40) | 64 (36) | 10 (71) | |
| Smoking: yes | 21 (10) | 11 (9) | 19 (11) | 1 (7) | 1.0 a |
| Smoking: no | 197 (90) | 108 (91) | 158 (89) | 13 (93) | |
| ASA I–II | 197 (90) | 102 (86) | 155 (88) | 11 (79) | 0.4 b |
| ASA III–IV | 21 (10) | 17 (14) | 22 (12) | 3 (21) | |
| Preoperative EQ-5D mobility: no problems | 8 (4) | 2 (2) | 18 (10) | 0 (0) | 0.004 b |
| Preoperative EQ-5D mobility: some problems in walking or confined to bed | 210 (96) | 117 (98) | 159 (90) | 14 (100) | |
| Self-care: no problems | 165 (76) | 101 (84) | 151 (85) | 7 (50) | 0.001 b |
| Self-care: some problems or unable to wash or dress | 53 (24) | 18 (15) | 26 (15) | 7 (50) | |
| Usual activities: no problems | 32 (15) | 18 (15) | 50 (28) | 0 (0) | 0.001 b |
| Usual activities: some problems or unable to perform usual activities | 186 (85) | 101 (85) | 127 (72) | 14 (100) | |
| Pain/discomfort: no pain or discomfort | 16 (7) | 10 (8) | 25 (14) | 1 (7) | 0.1 b |
| Pain/discomfort: moderate or extreme pain or discomfort | 202 (93) | 109 (92) | 152 (86) | 13 (93) | |
| Anxiety/depression: not anxious or depressed | 173 (79) | 91 (76) | 137 (77) | 4 (29) | < 0.001 b |
| Anxiety/depression: moderate or extremely anxious or depressed | 45 (21) | 28 (24) | 40 (23) | 10 (71) | |
| VAS health, median | 70 | 74 | 73 | 67 | 0.3 c |
| VAS health, IQR | 60–81 | 60–82 | 60–82 | 50–79 | |
| VAS health, range | 11–100 | 25–100 | 13–100 | 3–100 | |

Group 1: responders at 6 weeks, responders at 3 months. Group 2: non-responders at 6 weeks, responders at 3 months. Group 3: non-responders at 6 weeks, non-responders at 3 months. Group 4: responders at 6 weeks, non-responders at 3 months.
a Chi-square
b Kruskal–Wallis
c ANOVA

Table 4. Post-hoc pairwise analysis

| Item | Group 1 vs. 2 | Group 1 vs. 3 | Group 1 vs. 4 | Group 2 vs. 3 | Group 2 vs. 4 | Group 3 vs. 4 |
|---|---|---|---|---|---|---|
| BMI | 1.0 | 0.2 | 0.4 | 0.8 | 0.3 | 0.05 |
| EQ-5D mobility | 1.0 | 0.03 | 1.0 | 0.008 | 1.0 | 0.6 |
| EQ-5D self-care | 0.3 | 0.1 | 0.1 | 1.0 | 0.01 | 0.008 |
| EQ-5D usual activities | 1.0 | 0.004 | 1.0 | 0.03 | 1.0 | 0.06 |
| EQ-5D anxiety/depression | 1.0 | 1.0 | < 0.001 | 1.0 | < 0.001 | < 0.001 |

Group 1–4: See Table 3.

Table 5. OKS per group for each time-point. Values are median [IQR] (range)

| Group | Preoperative | 6 weeks | 3 months |
|---|---|---|---|
| 1 | 19 [14–24] (3–36) | 35 [30–40] (12–48) | 40 [34–44] (21–48) |
| 2 | 23 [19–27] (3–37) | 26 [22–31] (11–42) | 37 [34–42] (16–48) |
| 3 | 27 [22–31] (5–45) | 26 [21–31] (5–46) | 29 [24–34] (9–47) |
| 4 | 19 [16–25] (9–28) | 32 [27–36] (19–39) | 25 [20–28] (15–33) |

Group 1–4: See Table 3.

Figure 1. Median KOOS-PS course in 3 months.

Figure 2. Median OKS course in 3 months.

Figure 3. Median EQ-5D VAS health course in 3 months.

[Figures 1–3 plot the median score against the time points (preoperative, 6 weeks, 3 months) for Group 1 (responders at 6 weeks, responders at 3 months), Group 2 (non-responders at 6 weeks, responders at 3 months), Group 3 (non-responders at 6 weeks, non-responders at 3 months), Group 4 (responders at 6 weeks, non-responders at 3 months), and the total group.]

Discussion

The primary goal of this study was to determine patterns in early functional outcome after primary TKA. The most important finding was the statistically significant and clinically relevant early improvement of both function scores at 6 weeks and 3 months postoperatively for the sample as a whole. Moreover, we examined 4 a priori defined subgroups. Patient characteristics for non-responders were higher BMI and worse scores on EQ-5D items: mobility, self-care, usual activities, and anxiety/depression.

With the knowledge that subgroups in TKA recovery exist, based on this study and previous studies, we have to use this knowledge to further improve rehabilitation and outcomes. For example, expectation management can be used in patients at risk of non-responding. Recently, preoperative education and expectation modification was found to increase fulfillment of expectations and concomitant higher satisfaction (Tolk et al. 2021). Therefore more individual rehabilitation might be needed instead of the usual generic type. Preoperative education and the outpatient physical therapist might play a major role in this, as patients are admitted to the hospital relatively soon after this. In addition, further studies are needed on how to identify non-responding patients preoperatively and provide better selection criteria. Further research is also needed to find what will help non-responding patients preoperatively and during the postoperative rehabilitation. There might, for example, be a need for more support or guidance in the rehabilitation by a physical therapist.
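The comparison of reported medians against the MCID thresholds is simple arithmetic, and it can be reproduced as a rough check. The sketch below uses only the medians given in Tables 2 and 5 and the thresholds cited in the Measurements section (9 points for OKS, 4% for KOOS-PS). It rests on two assumptions: that reaching the threshold counts as clinically relevant, and that a difference between group medians is an acceptable stand-in for the median of individual improvements, which it is not in general. All names are hypothetical.

```python
# MCID thresholds as cited in the text (OKS: 9 points, KOOS-PS: 4%).
MCID = {"OKS": 9, "KOOS-PS": 4}
# On the OKS a higher score is better; on the KOOS-PS a lower score is better.
HIGHER_IS_BETTER = {"OKS": True, "KOOS-PS": False}
# Total-group medians from Table 2: (preoperative, 6 weeks, 3 months).
TABLE_2 = {"OKS": (23, 30, 35), "KOOS-PS": (51, 40, 35)}
# Per-group OKS medians from Table 5, same order of time points.
TABLE_5 = {1: (19, 35, 40), 2: (23, 26, 37), 3: (27, 26, 29), 4: (19, 32, 25)}


def improvement(scale, preop, later):
    """Change from baseline, signed so that a positive value means better."""
    return later - preop if HIGHER_IS_BETTER[scale] else preop - later


def reaches_mcid(scale, preop, later):
    """Assumption: an improvement equal to the MCID already counts."""
    return improvement(scale, preop, later) >= MCID[scale]


# For each scale: (improvement, reaches MCID?) at 6 weeks and at 3 months.
total_group = {
    scale: [(improvement(scale, pre, x), reaches_mcid(scale, pre, x))
            for x in (w6, m3)]
    for scale, (pre, w6, m3) in TABLE_2.items()
}
# Difference in median OKS between baseline and 3 months, per group.
per_group_3m = {g: improvement("OKS", pre, m3)
                for g, (pre, _w6, m3) in TABLE_5.items()}

for scale, rows in total_group.items():
    print(scale, rows)
print(per_group_3m)
```

With these inputs the OKS median change is 7 points at 6 weeks and 12 points at 3 months, and the KOOS-PS change is 11 and 16 percentage points, matching the figures stated under Primary outcome.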



Besides the improvement on both function scores, there was also a statistically significant improvement in EQ-5D self-rated health VAS. This is in line with the findings of Larsen et al. (2012), who found improved health-related quality of life scores in knee arthroplasty patients with no or mild pain and good function. To the best of our knowledge, no MCID has been determined for EQ-5D self-rated health VAS; it is therefore unknown whether the improvement was clinically relevant as well. In this study the EQ-5D self-rated health VAS was used instead of the index score of the EQ-5D, because we were not interested in estimating quality-adjusted life years (QALYs). Moreover, index scores are not comparable internationally, as converting EQ-5D to an index score is referenced nationally. Our findings were also in accordance with Husted et al. (2021), who found a median OKS 3 months postoperatively of 32 and 31 in the group discharged on day of surgery and not discharged on day of surgery, respectively. We found in our analysis of fast-track TKA patients a median OKS at 3 months of 35 (IQR 29–41).

We used the OKS for non-responder analysis, as this validated score was previously used in the study by van Egmond et al. (2021). Therefore our results would be more easily compared with the results from that study. Moreover, we find the OKS covers a broader range of functional outcome than the KOOS-PS. Characteristics of interest were dichotomized, including age (≤ 75 vs. > 75), ASA (1–2 vs. 3–4), and EQ-5D (no problems vs. moderate-to-severe problems), and BMI was divided into 3 groups, to prevent small group sizes in analysis. The post-hoc pairwise analysis in Table 4 shows a statistically significant difference in BMI between groups 3 and 4. However, Table 3 presents an obvious difference in percentages of high and normal BMI between groups 1 and 2 compared with group 4. Even though these differences did not reach statistical significance, we find these differences large enough to be of clinical relevance.

In our non-responder analysis, non-responders differed on the EQ-5D items mobility, self-care, usual activities, and anxiety/depression, compared with the other groups. In several studies, poor mental health is related to poor functional outcome (Sorel et al. 2019, Melnic et al. 2021, Hafkamp et al. 2021). Other studies are less distinct and did not find a relationship between anxiety and suboptimal outcomes (Wood et al. 2021). Nevertheless, previous studies showed that psychological support might lead to a lower incidence of pain and anxiety/depression, and to faster recovery (Tristaino et al. 2016, Sorel et al. 2020). Preoperative analysis of the presence of these factors and concomitant treatment might be an effective way to improve the satisfaction rate of TKA. Currently we perform no preoperative screening for psychological status in our institution. This might be feasible with the Pain Catastrophizing Scale (PCS) or Hospital Anxiety and Depression Scale (HADS) (Mercurio et al. 2020). The recently published systematic review by Sorel et al. is promising and described various interventions for psychological distress in TKA patients with good effect on postoperative pain, quality of life, and function (Sorel et al. 2020). Therefore further studies are needed to identify these patients preoperatively and to examine in which way adequate therapy can be provided in this setting.

A major strength of this study is the relatively large number of included patients. However, there are some limitations of this study. The most important is the retrospective design with all its known forms of bias. However, all data was collected prospectively and validated questionnaires have been used. Furthermore, the results were based on single institutional data, which might make the results less generalizable. Given that our results were comparable to previously published studies from other countries, we feel this is a minor limitation. Initially, multinomial logistic regression analysis was performed to test for patient characteristics in the 4 determined groups. Because errors occurred due to small group sizes, these analyses were not valid. Therefore, descriptive statistics were performed, resulting in a more exploratory study. No causal relations can be drawn from our non-responder analysis. However, this is the first study that presents patterns in early functional outcome after TKA. New studies are needed to confirm and further define our findings.

We used PROMs to determine early functional recovery after TKA. Previous studies concluded that improvement in PROMs does not correlate with objectively assessed function (Luna et al. 2017, Fransen et al. 2019). We are aware that our findings based on PROMs might not fully represent objective function. However, as the subjective PROMs relate to how patients themselves experience their function, we regard this as a highly valuable outcome. Finally, no PROMs data was available at further time points up to 1 year to present a detailed course of functional outcomes during the first postoperative year.
In conclusion, orthopedic surgeons and patients can expect improved functional outcomes early after TKA surgery at 6 weeks postoperatively and substantial improvement at 3 months. Concomitant health status improvement was detected as well in this early postoperative phase. Modifiable patient characteristics for non-responders on early functional outcome were BMI and anxiety/depression. Preoperative treatment of these factors might improve postoperative outcomes.

The authors would like to thank all patients for completing the questionnaires. JE designed the study, performed data collection, data analysis, and wrote and revised the manuscript. BH supported data analysis and critically reviewed the manuscript. HV operated on several of the included patients, and critically reviewed the manuscript. NM designed the study, supported data analysis, and critically reviewed the manuscript.   Acta thanks Henrik Husted for help with peer review of this study.



Andersen L O, Gaarn-Larsen L, Kristensen B B, Husted H, Otte K S, Kehlet H. Subacute pain and function after fast-track hip and knee arthroplasty. Anaesthesia 2009; 64(5): 508-13. doi: 10.1111/j.1365-2044.2008.05831.x.
Baker P N, van der Meulen J H, Lewsey J, Gregg P J, National Joint Registry for England and Wales. The role of pain and function in determining patient satisfaction after total knee replacement: data from the National Joint Registry for England and Wales. J Bone Joint Surg Br 2007; 89(7): 893-900. doi: 10.1302/0301-620X.89B7.19091.
Beard D J, Harris K, Dawson J, Doll H, Murray D W, Carr A J, Price A J. Meaningful changes for the Oxford hip and knee scores after joint replacement surgery. J Clin Epidemiol 2015; 68(1): 73-9. doi: 10.1016/j.jclinepi.2014.08.009.
Bourne R B, Chesworth B M, Davis A M, Mahomed N N, Charron K D. Patient satisfaction after total knee arthroplasty: who is satisfied and who is not? Clin Orthop Relat Res 2010; 468(1): 57-63. doi: 10.1007/s11999-009-1119-9.
Canfield M, Savoy L, Cote M P, Halawi M J. Patient-reported outcome measures in total joint arthroplasty: defining the optimal collection window. Arthroplast Today 2020; 6(1): 62-7. doi: 10.1016/j.artd.2019.10.003.
Castorina S, Guglielmino C, Castrogiovanni P, Szychlinska M A, Ioppolo F, Massimino P, Leonardi P, Maci C, Iannuzzi M, Di Giunta A, Musumeci G. Clinical evidence of traditional vs fast track recovery methodologies after total arthroplasty for osteoarthritic knee treatment: a retrospective observational study. Muscles Ligaments Tendons J 2017; 7(3): 504-13. doi: 10.11138/mltj/2017.7.3.504.
Dawson J, Fitzpatrick R, Murray D, Carr A. Questionnaire on the perceptions of patients about total knee replacement. J Bone Joint Surg Br 1998; 80(1): 63-9. doi: 10.1302/0301-620x.80b1.7859.
Devlin N J, Parkin D, Browne J. Patient-reported outcome measures in the NHS: new methods for analysing and reporting EQ-5D data. Health Econ 2010; 19(8): 886-905. doi: 10.1002/hec.1608.
Fransen B L, Mathijssen N M C, Slot K, de Esch N H H, Verburg H, Temmerman O P P, Hoozemans M J M, van Dieen J H. Gait quality assessed by trunk accelerometry after total knee arthroplasty and its association with patient related outcome measures. Clin Biomech (Bristol, Avon) 2019; 70: 192-6. doi: 10.1016/j.clinbiomech.2019.10.007.
Hafkamp F J, de Vries J, Gosens T, den Oudsten B L. The relationship between psychological aspects and trajectories of symptoms in total knee arthroplasty and total hip arthroplasty. J Arthroplasty 2021; 36(1): 78-87. doi: 10.1016/j.arth.2020.07.071.
Hesseling B, Mathijssen N M C, van Steenbergen L N, Melles M, Vehmeijer S B W, Porsius J T. Fast starters, slow starters, and late dippers: trajectories of patient-reported outcomes after total hip arthroplasty: results from a Dutch nationwide database. J Bone Joint Surg Am 2019; 101(24): 2175-86. doi: 10.2106/JBJS.19.00234.
Husted C E, Husted H, Ingelsrud L H, Nielsen C S, Troelsen A, Gromov K. Are functional outcomes and early pain affected by discharge on the day of surgery following total hip and knee arthroplasty? Acta Orthop 2021; 92(1): 62-6. doi: 10.1080/17453674.2020.1836322.
Jakobsen T L, Kehlet H, Husted H, Petersen J, Bandholm T. Early progressive strength training to enhance recovery after fast-track total knee arthroplasty: a randomized controlled trial. Arthritis Care Res (Hoboken) 2014; 66(12): 1856-66. doi: 10.1002/acr.22405.
Klapwijk L C, Mathijssen N M, Van Egmond J C, Verbeek B M, Vehmeijer S B. The first 6 weeks of recovery after primary total hip arthroplasty with fast track. Acta Orthop 2017; 88(2): 140-4. doi: 10.1080/17453674.2016.1274865.
Larsen K, Hansen T B, Soballe K, Kehlet H. Patient-reported outcome after fast-track knee arthroplasty. Knee Surg Sports Traumatol Arthrosc 2012; 20(6): 1128-35. doi: 10.1007/s00167-012-1919-4.
Luna I E, Kehlet H, Peterson B, Wede H R, Hoevsgaard S J, Aasvang E K. Early patient-reported outcomes versus objective function after total hip and knee arthroplasty: a prospective cohort study. Bone Joint J 2017; 99-B(9): 1167-75. doi: 10.1302/0301-620X.99B9.BJJ-2016-1343.R1.
Melnic C M, Paschalidis A, Katakam A, Bedair H S, Heng M, Committee MGBAP-ROW. Patient-reported mental health score influences physical function after primary total knee arthroplasty. J Arthroplasty 2021; 36(4): 1277-83. doi: 10.1016/j.arth.2020.10.031.
Mercurio M, Gasparini G, Carbone E A, Galasso O, Segura-Garcia C. Personality traits predict residual pain after total hip and knee arthroplasty. Int Orthop 2020; 44(7): 1263-70. doi: 10.1007/s00264-020-04553-6.
Murray D W, Fitzpatrick R, Rogers K, Pandit H, Beard D J, Carr A J, Dawson J. The use of the Oxford hip and knee scores. J Bone Joint Surg Br 2007; 89(8): 1010-4. doi: 10.1302/0301-620X.89B8.19424.
Perruccio A V, Stefan Lohmander L, Canizares M, Tennant A, Hawker G A, Conaghan P G, Roos E M, Jordan J M, Maillefert J F, Dougados M, Davis A M. The development of a short measure of physical function for knee OA KOOS-Physical Function Shortform (KOOS-PS): an OARSI/OMERACT initiative. Osteoarthritis Cartilage 2008; 16(5): 542-50. doi: 10.1016/j.joca.2007.12.014.
Schotanus M G M, Bemelmans Y F L, Grimm B, Heyligers I C, Kort N P. Physical activity after outpatient surgery and enhanced recovery for total knee arthroplasty. Knee Surg Sports Traumatol Arthrosc 2017; 25(11): 3366-71. doi: 10.1007/s00167-016-4256-1.
Singh J A, Luo R, Landon G C, Suarez-Almazor M. Reliability and clinically important improvement thresholds for osteoarthritis pain and function scales: a multicenter study. J Rheumatol 2014; 41(3): 509-15. doi: 10.3899/jrheum.130609.
Sorel J C, Veltman E S, Honig A, Poolman R W. The influence of preoperative psychological distress on pain and function after total knee arthroplasty: a systematic review and meta-analysis. Bone Joint J 2019; 101-B(1): 7-14. doi: 10.1302/0301-620X.101B1.BJJ-2018-0672.R1.
Sorel J C, Overvliet G M, Gademan M G J, den Haan C, Honig A, Poolman R W. The influence of perioperative interventions targeting psychological distress on clinical outcome after total knee arthroplasty. Rheumatol Int 2020; 40(12): 1961-86. doi: 10.1007/s00296-020-04644-y.
Tolk J J, Janssen R P A, Haanstra T M, van der Steen M C, Bierma-Zeinstra S M A, Reijman M. The influence of expectation modification in knee arthroplasty on satisfaction of patients: a randomized controlled trial. Bone Joint J 2021; 103-B(4): 619-26. doi: 10.1302/0301-620X.103B4.BJJ-2020-0629.R3.
Tristaino V, Lantieri F, Tornago S, Gramazio M, Carriere E, Camera A. Effectiveness of psychological support in patients undergoing primary total hip or knee arthroplasty: a controlled cohort study. J Orthop Traumatol 2016; 17(2): 137-47. doi: 10.1007/s10195-015-0368-5.
Twisk J W R. Applied longitudinal data analysis for epidemiology: a practical guide. Cambridge: Cambridge University Press; 2003.
van Egmond J C, Verburg H, Mathijssen N M. The first 6 weeks of recovery after total knee arthroplasty with fast track. Acta Orthop 2015; 86(6): 708-13. doi: 10.3109/17453674.2015.1081356.
van Egmond J C, Hesseling B, Melles M, Vehmeijer S B W, van Steenbergen L N, Mathijssen N M C, Porsius J T. Three distinct recovery patterns following primary total knee arthroplasty: Dutch arthroplasty register study of 809 patients. Knee Surg Sports Traumatol Arthrosc 2021; 29(2): 529-39. doi: 10.1007/s00167-020-05969-8.
Wood T J, Gazendam A M, Kabali C B, Hamilton Arthroplasty Group. Postoperative outcomes following total hip and knee arthroplasty in patients with pain catastrophizing, anxiety, or depression. J Arthroplasty 2021. Online ahead of print. doi: 10.1016/j.arth.2021.02.018.



Acta Orthopaedica 2021; 92 (5): 608–614

A roadmap to surgery in osteogenesis imperfecta: results of an international collaboration of patient organizations and interdisciplinary care teams

Ralph J SAKKERS 1, Kathleen MONTPETIT 2, Argerie TSIMICALIS 2,3, Thomas WIRTH 4, Marjolein VERHOEF 1, Reginald HAMDY 2, Jean A OUELLET 2, Rene M CASTELEIN 1, Chantal DAMAS 2, Guus J JANUS 5, Wouter H NIJHUIS 1, Leonardo PANZERI 6, Simona PAVERI 6, Dagmar MEKKING 7, Kelly THORSTAD 2, and Richard W KRUSE 8

1 University Medical Center Utrecht, Utrecht, The Netherlands; 2 Shriners Hospitals for Children®-Canada, Montreal, Canada; 3 Ingram School of Nursing, Faculty of Nursing and Health Sciences, McGill University, Montreal, Canada; 4 Olga Hospital, Klinikum Stuttgart, Stuttgart, Germany; 5 Isala Clinics, Zwolle, The Netherlands; 6 Osteogenesis Imperfecta Federation Europe, Eindhoven, The Netherlands; 7 Care4BrittleBones Foundation, Wassenaar, The Netherlands; 8 Nemours/A.I. duPont Hospital for Children, Wilmington, DE, USA

Correspondence: r.sakkers@umcutrecht.nl
Submitted 2020-12-02. Accepted 2021-05-03.

Background and purpose — Involvement of patient organizations is steadily increasing in guidelines for treatment of various diseases and conditions for better care from the patient’s viewpoint and better comparability of outcomes. For this reason, the Osteogenesis Imperfecta Federation Europe and the Care4BrittleBones Foundation convened an interdisciplinary task force of 3 members from patient organizations and 12 healthcare professionals from recognized centers for interdisciplinary care for children and adults with osteogenesis imperfecta (OI) to develop guidelines for a basic roadmap to surgery in OI. Methods — All information from 9 telephone conferences, expert consultations, and face-to-face meetings during the International Conference for Quality of Life for Osteogenesis Imperfecta 2019 was used by the task force to define themes and associated recommendations. Results — Consensus on recommendations was reached within 4 themes: the interdisciplinary approach, the surgical decision-making conversation, surgical technique guidelines for OI, and the feedback loop after surgery. Interpretation — The basic guidelines of this roadmap for the interdisciplinary approach to surgical care in children and adults with OI are expected to improve standardization of clinical practice and comparability of outcomes across treatment centers.

Expert consensus remains the best available method for guiding surgical care in most rare diseases, due to the relative lack of evidence-based practices. With a prevalence between 1:10,000 and 1:20,000, osteogenesis imperfecta (OI) is a rare genetic disease affecting the quality and quantity of collagen I. Not only bone with frequent fractures and deformities, but all tissues containing collagen I are affected (Marini et al. 2017, Chougui et al. 2020). The somewhat unpredictable phenotypic variability of the disease is often grouped according to the clinical Sillence classification I–V (Van Dijk and Sillence 2014). However, each patient is unique not only in impairments but also in treatment needs. The most severe type III has the weakest bone and not all these individuals reach the level of standing and walking. Many patients undergo surgery more than once. On the initiative of the Osteogenesis Imperfecta Federation Europe (OIFE) and the Care4BrittleBones (Care4BB) Foundation, an international interdisciplinary task force was invited to create a roadmap for a standardized, integrated approach for optimal outcomes of surgery, not only from a surgical view, but also from the patient’s perspective.

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1941628

Methods

The international interdisciplinary task force included members from European patient organizations and 12 healthcare professionals (HCPs) in orthopedic surgery, rehabilitation medicine, and nursing from centers recognized worldwide as leaders in the interdisciplinary care of OI. The task force developed a survey on issues around OI surgery (defined and discussed by the members), who then consulted other experts worldwide. All the responses, and the subsequent group discussions among the task force members via 9 conference calls, formed the consensus expert opinion. A set of recommendations for surgical care was then drafted and discussed at a day-long workshop during the International Conference for Quality of Life for Osteogenesis Imperfecta in Amsterdam, the Netherlands in November 2019. The recommendations were subsequently circulated to members of the Study Group on Genetics and Metabolic Diseases of the European Paediatric Orthopaedic Society and the OIFE board for endorsement (Figure 1).

Figure 1. Process to develop a roadmap to surgery in osteogenesis imperfecta with international collaboration of patient organizations and interdisciplinary care teams.

[Figure 1 labels, in order: Osteogenesis Imperfecta Federation Europe and Care4BrittleBones Foundation; Task force; Members and experts respond to survey; 1st draft recommendations; Patient’s view; Spine; Lower extremity; Upper extremity; Decision making; Feedback loop; International Conference for Quality of Life for Osteogenesis Imperfecta in Amsterdam, Netherlands in November 2019; 2nd draft recommendations; Interdisciplinary approach; Decision making conversation; Surgical technique guidelines; Feedback loop; Endorsement by reference organisations; Roadmap to surgery in OI.]

[Labels from an adjacent figure whose caption is not recoverable here: Patient; Health care professional; Surgery; Language and comprehension; Medical and life needs; Realistic expectations; Social, culture, and beliefs; Readiness and commitment; Infrastructure and support.]

Results
4 themes were identified to guide this roadmap for surgical care in OI: the interdisciplinary approach, the surgical decision-making conversation, surgical technique guidelines for OI, and the feedback loop after surgery.

1. The interdisciplinary approach
Decisions around any surgical procedure are best made involving the patient/family, the team of physicians (preferably an orthopedic surgeon, a pediatrician and/or an endocrinologist, a consultant in rehabilitative medicine, and an anesthesiologist), and other HCPs (nurses, occupational therapist, physiotherapist, psychologist, and social worker) (Figure 2).

Figure 2. Elements of an interdisciplinary approach for surgical decision-making. [Diagram labels: patient side: language and comprehension; medical and life needs; realistic expectations; social, culture, and beliefs; readiness and commitment; infrastructure and support. Healthcare professional side: preoperative medical treatment; comorbidities; nutrition; infrastructure assessment; surgeon’s experience. Center: surgery.]

Patient view:
• The patient’s ability to obtain, understand, and use healthcare information for making decisions (health literacy), and the concerns of the patient/family with regard to their functional activity limitations should be evaluated. The patient must feel free to express doubts, fears, and goals. The expected outcome from any intervention must be balanced with what is achievable surgically and how the expected and achievable results relate to autonomy, well-being, and esthetics.
• Integrate the social/cultural/belief system in the process and outcomes.
• Evaluate patient/parent compliance with regard to postoperative rehabilitation needs.
• Evaluate support, infrastructure, and home environment with regard to extended family and rehabilitation, schooling/vocational training, and employment-related factors. Support from an OI patient organization might be beneficial.

HCP checklist:
• Bisphosphonates: Optimize bone quality in growing children if needed. For adults, bisphosphonate treatment should be customized to the individual.
• Comorbidities: Any medical conditions that require monitoring should be addressed first.
• Nutrition: Strive for an optimal nutritional/metabolic status before surgery.
• Infrastructure assessment: Necessary pre- and postoperative infrastructure (medical/psychosocial/environmental) should be verified.
• Specific perioperative care for patients with OI (Rothschild et al. 2018, Beethe et al. 2020) should include:
  – screening for cardiac issues (risk of mitral valve prolapse/aortic root dilatation) and chronic pain;
  – careful positioning with adequate padding;
  – avoidance of succinylcholine (fasciculations can cause fractures);
  – assessment for dentinogenesis imperfecta;
  – use of videolaryngoscopy if needed;
  – avoidance of hyperextension of the neck;
  – adequate postoperative pain management to avoid chronic pain syndrome (higher risk in patients with OI; Beethe et al. 2020);
  – consideration of tranexamic acid (10 mg/kg bolus + 10 mg/kg/hour) for surgeries at risk of blood loss (osteotomies), with cross-matched blood available.
• Surgeon’s experience: All surgeons must ask themselves if they have the knowledge and infrastructure to handle all possible complications.



Figure 3. a. 15-year-old female with OI type III, treated with bisphosphonates, ambulatory with multiple fractures over a pre-existing flexible nail. b. Lateral closing wedge at the CORA, 4.5 mm flexible nail together with 6-hole 3.5 mm locking plate for length and rotational stability. c. Union without recurrent fractures or deformity at 2 months’ follow-up. d. Union without recurrent fractures or deformity at 2 years and 5 months’ follow-up.

2. The surgical decision-making conversation
Individual characteristics of the patient, the family structure, phase of life, and availability of an infrastructure for surgical care should lead to an individual approach for each case. The following key points were formulated based on the expertise and experience of the task force and experts, preliminary research, discussions in adult patient groups, and the outcomes of interviews with patients and involved family members.
• Transparent, respectful partnership in shared decision-making with the patient, family (for minors), and the surgical care team.
• A thorough evaluation of fractures, deformities, and functional needs, together with a structured questionnaire or interview to explore the importance of the goals, the patient’s satisfaction, and expectations, is needed (Law et al. 1990).
• Seek support and information from specialized consults, second opinions, and patient advocacy groups in case the patient’s/family’s goals, expectations, and priorities cannot be matched.
• Decision-making drivers and structures might vary in cultural environments with different values, including the role of the decision-maker. If necessary, use a professional cultural interpreter/translator to include the patient’s cultural values and the local regulations in the informed consent procedures.
• The final decision taken by the patient/family should be respected. In cases of harm and neglect, the appropriate pathways should be followed.

Surgical technique guidelines for OI: disease-specific details
General
• Use single or multilevel multidirectional osteotomies at the apices of deformities of long bones and implantation of intramedullary (IM) rods for stabilization.
• For acute fractures, IM implants are preferred for fixation after closed or open reduction.
• Avoid oversizing rods to prevent stress shielding and subsequent bone loss. Elongating implants or constructs for stable longitudinal growth are usually preferred. Fixed-length devices can be used as an alternative, especially in adults or when bone size is small or lengthening devices are not available.
• Use bony shortening with correction of deformities.
• Occasionally, additional soft-tissue release or muscle lengthening is necessary to allow correction of excessive contractures.
• Closed osteoclasis might minimize blood loss and periosteal disruption.
• Consider supplemental plating of bones after IM nailing for rotational control and non-unions, especially in older patients (Figure 3).
• Avoid plates/screws as stand-alone implants to prevent stress fractures at the edges of the plates/screws.
• Consider bone grafting for any bony defects. Allograft bone is preferable to autogenous graft to preserve the maximum amount of bone.


Figure 4. a. Preoperative radiographs of a 6-year-old female with OI type III with bilateral anterolateral deformity of the femur and tibia and tibial pseudoarthrosis. Repeated fractures, inability to weight bear or ambulate with unaffected upper extremities indicated candidacy for IM rodding. b. Postoperative radiographs. Femur: correction of malalignment with telescoping rods. Proximal female threads are inserted in greater trochanter not crossing the trochanter apophysis. Tibia: female threads are positioned into the proximal tibial epiphysis, not crossing the physis. Threads of the male rod are inserted in the distal tibia epiphysis not protruding into the ankle.

• Immobilization postoperatively in backslab or plaster cast is followed by initiation of mobilization as soon as healing permits. Children with no previous experience ambulating may require some bracing and/or walking aids. In adult patients who have solid intramedullary nailing after transverse fractures or osteotomies, full weight-bearing is allowed without the use of a brace or plaster. Oblique fractures with nonsolid fixation can be treated with partial weight-bearing with or without brace or plaster.

Lower extremity (LE) surgery in OI
• Recurrent fractures, anticipated progressive deformity, and/or expected improved function and ambulation after surgery are indications for surgical alignment and increasing bone stability with IM implants (Figure 4).
• Operating on a maximum of 2 bone segments in the same session is preferred by most surgeons.

Upper extremity (UE) surgery in OI
• In case of deformity in both the humerus and forearm, correct and stabilize the humerus first before correcting the forearm.

Figure 5. a. Severe deformity of the upper extremity in an 8-year-old patient with OI type III. Double osteotomy of the humerus and telescopic rodding was planned plus a double corrective radius and ulna osteotomy with a K-wire fixation inserted from opposite sides through the growth plates, thus allowing for telescoping. b. Postoperative follow-up at 5 years. The amount of telescoping of the nail in the humerus corresponds with length without K-wire in radius and ulna due to growth.

• When bilateral surgery is indicated, a “one-side-after-the-other” approach is good practice.
• The distal humerus always needs a 3-dimensional correction. If the construct lacks stability, one option is to immobilize the extremity for 3–4 weeks in a plaster to keep the correction during consolidation.
• Sandwich constructions with allograft are advised for humeri with a very small diaphyseal diameter and nonunions.
• In forearm surgery, the application of K-wires from opposite directions (antegrade in the ulna, retrograde in the radius) allows for better stabilization during growth (Figure 5).
• With radial head dislocation, operate only for major impairments, severe pain, or if function is markedly limited.

Spine surgery in OI
• Evaluate occiput C1–C2 anatomy before any surgical treatment of the spine.
• 3D planning of the spine with CT scan is helpful for surgical planning.


Figure 6. a. Preoperative radiograph of a 15-year-old patient with OI type III with progressive scoliosis. b. Note the decreased space for pedicle screws in the codfish vertebrae and increased diameter of the intervertebral disks. Augmentation with a sublaminar wire at level L4.

• A Cobb angle of 45°–50° is usually an indication for surgery. In severe OI, early surgery with Cobb angles still under 40° might be advisable as progression of scoliosis will most likely occur. Surgery at this earlier stage is relatively easier for the patient and the surgeon, with better outcomes and lower risk of complications.
• Pulmonary function is often compromised in scoliosis and should be measured longitudinally.
• Fusion of at least the entire curve should be considered to prevent progressive residual deformity outside the fused segment.
• Decision-making on whether to include pelvic fixation should take into account the preoperative ambulatory status, the presence or absence of an L5 pars fracture (spondylolisthesis), and the quality of the distal fixation.
• Preoperative (4–12 weeks) and intraoperative traction (intraoperative includes cranial and skin leg traction) can optimize deformity correction and reduce the stress on the spinal implants during surgical reduction.
• Cranial traction, when used, should involve 6–10 pins.
• Augmentation of the construct using sublaminar wires can help reduce the stress over pedicle screws, or the wires can add fixation at points where pedicle screw insertion is impossible (Figure 6).
• Use of a postoperative brace (6–12 weeks) might be considered.
• In symptomatic cranio-cervical deformities, neurological structures should be decompressed, and occipital-cervical fusion should be done. Realignment and fusion for basilar invagination is usually done through a posterior approach. If symptomatic spinal cord compression persists, additional direct anterior decompression might be considered.

Figure 7. Feedback loop for OI surgery. [Diagram labels: preoperatively (measure, set goals with patient, plan); surgery; postoperatively (measure, compare pre and post); review goals and outcomes with team and patient; feedback (analyze, modify).]

Feedback loop after surgery (Figure 7)
Using pre- and postoperative standardized outcome measurements to link the expected outcomes to the actual outcomes in a feedback loop not only measures the success of patient and clinical outcomes, but also regularly monitors the efficiency and effectiveness of the team’s processes and practices and contributes to global research efforts to continuously establish best practices. A feedback loop with both clinically reported and patient-reported outcomes should include:
• Preoperatively: standardized measurements of deformities, pain, function, and quality of life (QoL), and realistic goals and expectations with the patient/family.
• Postoperatively: reassessment of deformities, pain, function, and QoL for comparison with preoperative measures, and a determination of whether goals were met. Time should be taken to discuss with the patient his/her overall satisfaction and compare this with the views of the surgical team.
• The recently published Key4OI Standard Set of Core Outcome Measures for OI is highly recommended to ensure global research is possible (Nijhuis et al. 2021).
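The pre/post comparison in this feedback loop can be illustrated with a short sketch. It is only an illustration under stated assumptions: the measure names, scales, scores, and goals below are hypothetical placeholders, not items taken from the Key4OI set or from the task force recommendations.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    """One standardized outcome recorded before and after surgery (hypothetical)."""
    name: str
    pre: float
    post: float
    higher_is_better: bool


def change(m: Measure) -> float:
    """Signed change: positive means movement in the desired direction."""
    delta = m.post - m.pre
    return delta if m.higher_is_better else -delta


def review(measures, goals_met):
    """Compare pre and post values and list which patient goals were met."""
    return {
        "improved": [m.name for m in measures if change(m) > 0],
        "not_improved": [m.name for m in measures if change(m) <= 0],
        "goals_met": [g for g, met in goals_met.items() if met],
        "goals_open": [g for g, met in goals_met.items() if not met],
    }


# Hypothetical example values, for illustration only.
measures = [
    Measure("pain (0-10 scale)", pre=6, post=3, higher_is_better=False),
    Measure("function score", pre=40, post=55, higher_is_better=True),
    Measure("quality-of-life score", pre=60, post=60, higher_is_better=True),
]
goals = {"stand with support": True, "walk 50 m": False}
summary = review(measures, goals)
print(summary)
```

The resulting summary is the kind of material that could be reviewed with the team and the patient in the "review goals and outcomes" step; how each measure is scored and weighted would depend on the instruments actually chosen.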

Discussion
These recommendations for the surgical care of patients with OI, recognizing both patient and HCP concerns, represent a summary of the expert opinion of an international task force of 15 “OI experts,” including representatives from patient organizations.

For optimizing bone quality, bisphosphonates have routinely been used in patients with OI in the last 20 years (Dwan et al. 2014), and a consensus for their use in children and adolescents was recently published (Simm et al. 2018). Positive and negative effects of bisphosphonates on fracture rate, osteotomy healing, curve behavior in scoliosis, and cranial base pathology, and the effect of discontinuation of bisphosphonates around surgery, remain a topic for discussion due to possible study design biases (Anissipour et al. 2014, Dwan et al. 2014, Ng et al. 2014, Arponen et al. 2015, Simm et al. 2018, Ralston and Gaston 2020). Nevertheless, the task force agreed that ensuring optimal bone quality is critical for surgical intervention and treatment should be tailored to the surgical plan. Bisphosphonate regimes should be different in a growing skeleton compared with treating an adult skeleton (Brizola and Shapiro 2015, Simm et al. 2018).

Health literacy assessment varies in different parts of the world, due to diverse social cultures and educational levels. Understanding the social and cultural backgrounds of the patient/family, avoiding violation of societal norms as much as possible, is considered paramount. The hierarchy in families, specific gender roles in different cultures, social acceptance of disease, and attribution of disease to various non-medical causes are important factors (Kahissay et al. 2017). Thresholds for patients/families to seek medical help as a result of these factors might be a major influence in the surgical process. When physicians or the primary sources of medical care are not available, effective primary medical care might be organized by mothers, elders, or alternative medical practitioners.

Surgery-specific guidelines for OI are focused on disease-specific fractures, deformities, and pain and should lead to improvement of function, participation, and QoL. The severity of bony deformity in the lower limb is associated with
fracture risk (Caouette et al. 2014). Adequate timing of surgery for deformities of the skeleton is always a challenge. There are no distinct age limitations on surgery. Lower extremity rodding is often indicated when a child with significant bowing attempts to stand. Operating on children less than 2 years old with small bone size is technically demanding and may result in complications but may be indicated due to a high fracture rate. The number of long bones that are operated on in one session should depend on the strategy and implants that work in the surgeon’s infrastructure. In the growing skeleton, telescoping rods are usually preferred over solid rods. However, the frequency of reoperation due to rod migration and telescoping failure has not improved much in recent decades. To date, revision rates, between 30% and 50%, and re-revision rates around 30% within 5 years of follow-up, have been reported with the current elongating devices (Azzam et al. 2018). Upper extremity surgery in OI should specifically focus on self-care activities (dressing, hygiene), daily activities (propelling wheelchair or walker, computer access) and participation in school/work/leisure life for increased autonomy of the individual (Khoshhal and Ellis 2001, Mueller et al. 2018). Prior to UE surgery, evaluation by an occupational therapist can assist in identifying strengths and impairments. This is useful for clarifying surgical goals. Upper extremity surgery in OI is more demanding due to the size of the bones, making them more difficult to align with intramedullary devices (Wirth 2019). The use of assistive devices or environmental modifications could resolve functional problems and should be tried prior to surgical options. The interdisciplinary team approach as described is especially indicated in the management of spinal disorders in OI. 
Progressive scoliosis, cranio-cervical deformities, and spondylolisthesis are the most common spine deformities in OI, for which ample experience and training are needed (Wallace et al. 2017, Castelein et al. 2019). One of the indications for scoliosis surgery is to prevent deterioration of lung function. Yet the correlation between scoliosis (Cobb angle) and pulmonary function is known to be weak, and OI intrinsic factors and chest wall deformations play a more important role (Bronheim et al. 2019). The risk for progression of scoliosis in hyperlax patients remains unclear (Engelbert et al. 1998). Basilar invagination is the most common cranio-cervical deformity in patients with OI (Arponen et al. 2015). Sleep studies can help detect nocturnal episodes of apnea in cases with potential brain stem compression. Surgery increasingly plays an important role, but controversy remains as to whether asymptomatic patients with radiological basilar invagination should be operated on to prevent neurological deficits (Wallace et al. 2017, Castelein et al. 2019).

In general, adaptability of the surgical plan, skill, and experience in the wide variability of OI is critical. Defining “success” is patient specific. Assessment of improvement, both as perceived by the patient and as measured, should include objective clinical measures, patient-reported measures, and goal-attainment tools (Kiresuk et al. 1994) to implement a feedback loop for evidence-based care improvement. The new Key4OI Standard Set of Core Outcome Measures for OI was designed for this purpose and its broad implementation will allow comparative research and value-based healthcare reform (Nijhuis et al. 2021). Quantitative outcome measures and qualitative insights should be weighed in defining improvement or success.

Conclusion
This roadmap to surgery in OI, initiated by and created with patient organizations and in collaboration with international interdisciplinary expert care teams, is a set of guidelines for optimizing surgical care in OI.

RS, KM, AT, TW, RK: conceptualization, data curation, investigation, methodology, visualization, writing, review, and editing. MV, RH, JO, RC, CD, GJ, WN, LP, SP, DM, KT: conceptualization, data curation, investigation, methodology, review.

The authors gratefully acknowledge the following surgeons for their comments representing OI surgical practices from North America, Europe and Asia: D Anticevic, D Eastwood, P Esposito, F Fassier, R Ganger, C Hasler, F Hellenius, I Hvid, C Janelle, V Kenis, M Kruyt, P Lascombes, D Popkov, and M To.

This roadmap to surgery in OI is endorsed by the Study Group Genetics & Metabolic Diseases of the European Paediatric Orthopaedic Society and by the Executive Committee of the Osteogenesis Imperfecta Federation Europe and the Care4BrittleBones Foundation.

Acta thanks Klaus Dieter Parsch for help with peer review of this study.

Anissipour A K, Hammerberg K W, Caudill A, Kostiuk T, Tarima S, Zhao H S, Krzak J J, Smith P A. Behavior of scoliosis during growth in children with osteogenesis imperfecta. J Bone Joint Surg Am 2014; 96: 237-43. Arponen H, Vuorimies I, Haukka J, Valta H, Waltimo-Sirén J, Mäkitie O. Cranial base pathology in pediatric osteogenesis imperfecta patients treated with bisphosphonates. J Neurosurg Pediatr 2015; 15: 313-20. Azzam K A, Rush E T, Burke B R, Nabower A M, Esposito P W. Midterm results of femoral and tibial osteotomies and Fassier-Duval nailing in children with osteogenesis imperfecta. J Pediatr Orthop 2018; 38: 331-6. Beethe A R, Bohannon N A, Ogun O A, Wallace M J, Esposito P W, Lockhart T J, Hamlin R J, Williams J R, Goeller J K. Neuraxial and regional anesthesia in surgical patients with osteogenesis imperfecta: a narrative review of literature. Reg Anesth Pain Med 2020; 45: 993-9. Brizola E, Shapiro J R. Bisphosphonate treatment of children and adults with osteogenesis imperfecta: unanswered questions. Calcif Tissue Int 2015; 97: 101-3. Bronheim R, Khan S, Carter E, Sandhaus R A, Raggio C. Scoliosis and cardiopulmonary outcomes in osteogenesis imperfecta patients. Spine 2019; 44: 1057-63.

Caouette C, Rauch F, Villemure I, Arnoux P J, Gdalevitch M, Veilleux L N, Heng J L, Aubin C É. Biomechanical analysis of fracture risk associated with tibia deformity in children with osteogenesis imperfecta: a finite element analysis. J Musculoskelet Neuronal Interact 2014; 14: 205-12. Castelein R M, Hasler C, Helenius I, Ovadia D, Yazici M; EPOS Spine Study Group. Complex spine deformities in young patients with severe osteogenesis imperfecta: current concepts review. J Child Orthop 2019; 13: 22-32. Chougui K, Addab S, Palomo T, Morin S N, Veilleux L N, Bernstein M, Thorstad K, Hamdy R, Tsimicalis A. Clinical manifestations of osteogenesis imperfecta in adulthood: an integrative review of quantitative studies and case reports. Am J Med Genet A 2020; 182: 842-65. Dwan K, Phillipi CA, Steiner R D, Basel D. Bisphosphonate therapy for osteogenesis imperfecta. Cochrane Database Syst Rev 2014; (7): CD005088. Engelbert R H, Gerver W J, Breslau-Siderius L J, van der Graaf Y, Pruijs H E, van Doorne J M, Beemer F A, Helders P J. Spinal complications in osteogenesis imperfecta: 47 patients 1–16 years of age. Acta Orthop Scand 1998; 69: 283-6. Kahissay M H, Fenta T G, Boon H. Beliefs and perception of ill-health causation: a socio-cultural qualitative study in rural North-Eastern Ethiopia. BMC Public Health 2017; 17: 124. Khoshhal K I, Ellis R D. Functional outcome of Sofield procedure in the upper limb in osteogenesis imperfecta. J Pediatr Orthop 2001; 21: 236-7. Kiresuk T J, Smith A, Cardillo J E. Goal Attainment scaling: applications theory, and measurement. Hillsdale, NJ: Lawrence Erlbaum Associates; 1994. Law M, Baptiste S, McColl M, Opzoomer (Carswell) A, Polatajko H, Pollock N. The Canadian occupational performance measure: an outcome measure for occupational therapy. Can J Occup Ther 1990; 57: 82-7. Marini J C, Forlino A, Bächinger H P, Bishop N J, Byers P H, Paepe A, Fassier F, Fratzl-Zelman N, Kozloff K M, Krakow D, Montpetit K, Semler O. 
Review: Osteogenesis Imperfecta. Nat Rev Dis Primers 2017; 3: 17052. Mueller B, Engelbert R, Baratta-Ziska F, Bartels B, Blanc N, Brizola E, Fraschini P, Hill C, Marr C, Mills L, Montpetit K, Pacey V, Molina M R, Schuuring M, Verhille C, de Vries O, Yeung E H K, Semler O. Consensus statement on physical rehabilitation in children and adolescents with osteogenesis imperfecta. Orphanet J Rare Dis 2018; 13: 158. Ng A J, Yue B, Joseph S, Richardson M. Delayed/non-union of upper limb fractures with bisphosphonates: systematic review and recommendations. ANZ J Surg 2014; 84: 218-24. Nijhuis W, Franken A, Ayers K, Damas C, Folkestad L, Forlino A, Fraschini P, Hill C, Janus G, Kruse R, Wekre L, Michiels L, Montpetit K, Panzeri L, Porquet-Bordes V, Rauch F, Sakkers R, Salles J P, Semler O, Sun J, To M, Tosi L, Yangyang Y, Yeung E, Zhitnik L, Zillekens C, Verhoef M. A standard set of outcome measures for the comprehensive assessment of osteogenesis imperfecta. Orphanet J Rare Dis 2021; 16(1): 140. Ralston S H, Gaston M S. Management of osteogenesis imperfecta. Front Endocrinol (Lausanne) 2020; 10: 924. Rothschild L, Goeller J K, Voronov P, Barabanova A, Smith P. Anesthesia in children with osteogenesis imperfecta: retrospective chart review of 83 patients and 205 anesthetics over 7 years. Pediatr Anesth 2018; 28: 1050–5. Simm P J, Biggin A, Zacharin M R, Rodda C P, Tham E, Siafarikas A, Jefferies C, Hofman P L, Jensen D E, Woodhead H, Brown J, Wheeler B J, Brookes D, Lafferty A, Munns C F; APEG Bone Mineral Working Group. Consensus guidelines on the use of bisphosphonate therapy in children and adolescents. J Paediatr Child Health 2018; 54: 223-33. Van Dijk F S, Sillence D O. Osteogenesis imperfecta: clinical diagnosis, nomenclature and severity assessment. Am J Med Genet A 2014; 164A: 1470-81. Wallace M J, Kruse R W, Shah S A. The spine in patients with osteogenesis imperfecta. J Am Acad Orthop Surg 2017; 25: 100-9. Wirth T. 
The orthopaedic management of long bone deformities in genetically and acquired generalized bone weakening conditions. J Child Orthop 2019; 13: 12-23.


Acta Orthopaedica 2021; 92 (5): 615–620

Compensation claims in pediatric orthopedics in Norway between 2012 and 2018: a nationwide study of 487 patients

Joachim HORN 1,2, Hanne RASMUSSEN 3, Ida Rashida Khan BUKHOLM 4,5, Olav RØISE 1,2,6, and Terje TERJESEN 1

1 Division of Orthopaedic Surgery, Section of Children’s Orthopaedics and Reconstructive Surgery, Oslo University Hospital; 2 Institute of Clinical Medicine, University of Oslo, Oslo; 3 Department of Orthopaedics, University of Northern Norway; 4 The Norwegian System of Patient Injury Compensation; 5 The Norwegian University of Life Science; 6 University of Oslo and Faculty of Health Sciences, SHARE – Centre for Resilience in Healthcare, University of Stavanger, Stavanger, Norway
Correspondence: jhorn@ous-hf.no
Submitted 2021-01-18. Accepted 2021-04-26.

Background and purpose — In Norway all compensation claims based on healthcare services are handled by a government agency (NPE, Norsk Pasientskade Erstatning). We provide an epidemiological overview of claims within pediatric orthopedics in Norway, and identify the most common reasons for claims and compensations. Patients and methods — All compensation claims handled by NPE from 2012 to 2018 within pediatric orthopedics (age 0 to 17 years) were reviewed. Data were analyzed with regard to patient demographics, diagnoses, type of injury, type of treatment, reasons for granted compensation, and total payouts. Results — 487 compensation claims (259 girls, 228 boys) within orthopedic surgery in patients younger than 18 years at time of treatment were identified. Mean age was 12 years (0–17). 150 out of 487 claims (31%) resulted in compensation, including 79 compensations for inadequate treatment, 58 for inadequate diagnostics, 12 for infections, and 1 based on the exceptional rule. Total payouts were US$ 8.45 million. The most common primary diagnoses were: upper extremity injuries (26%), lower extremity injuries (24%), congenital malformations and deformities (12%), spine deformities (11%), disorders affecting peripheral joints (9%), chondropathies (6%), and others (12%). Interpretation — Most claims were submitted and granted for mismanagement of fractures in the upper and lower extremity, and mismanagement of congenital malformations and disorders of peripheral joints. Knowledge of the details of malpractice claims should be implemented in educational programs and assist pediatric orthopedic surgeons to develop guidelines in order to improve patient safety and quality of care.
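As a quick consistency check, the counts reported in the Results of this abstract can be recomputed in a few lines. This is only a sketch: the variable names are ours, and the numbers are copied from the abstract rather than from the NPE database.

```python
# Figures copied from the abstract; names are illustrative only.
claims_total = 487
claims_by_sex = {"girls": 259, "boys": 228}
granted_by_reason = {
    "inadequate treatment": 79,
    "inadequate diagnostics": 58,
    "infections": 12,
    "exceptional rule": 1,
}
diagnosis_share_pct = {
    "upper extremity injuries": 26,
    "lower extremity injuries": 24,
    "congenital malformations and deformities": 12,
    "spine deformities": 11,
    "disorders affecting peripheral joints": 9,
    "chondropathies": 6,
    "others": 12,
}

granted_total = sum(granted_by_reason.values())
granted_pct = round(100 * granted_total / claims_total)

print(f"granted: {granted_total} of {claims_total} ({granted_pct}%)")
print(f"sex counts add up: {sum(claims_by_sex.values()) == claims_total}")
print(f"diagnosis shares total: {sum(diagnosis_share_pct.values())}%")
```

The reasons for compensation add up to the 150 granted claims, which is 31% of 487, and the diagnosis shares total 100%.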

Patient injuries due to medical care are a large burden for patients and the healthcare system (OECD 2020), leading to increasing attention on patient safety and prevention of medical errors. In a study by de Vries et al. (2008) the median overall incidence of in-hospital adverse events was 9%, many of these being preventable. In Norway all compensation claims based on public and private healthcare services are handled by a government agency, the Norwegian System of Patient Injury Compensation (NPE). One of the tasks of NPE is to contribute with statistical data to improve quality of care and to prevent patient harm. Recent annual reports show that NPE received 5,695 claims in 2020. In the same year 4,917 decisions were made by NPE, and of these 1,481 (30%) resulted in compensation with total payouts of US$ 135 million. Orthopedic surgery accounted for nearly one-third (n = 1,443) of all claims that were decided in 2020 and 30% (n = 445) of all claims that were granted, resulting in total payouts of US$ 30 million. Thus, orthopedic surgery is at especially high risk of claims and there is evidence that claims within the field of pediatric orthopedics are more likely to result in payment compensation than adult cases (Orosco et al. 2012, Oetgen and Parikh 2016). The substantial number of recent publications on medical errors reflects the increasing attention on patient safety, and in 2011 the Norwegian Ministry of Health and Care Services launched “The Norwegian Patient Safety Program: In safe Hands,” a campaign with the aim of reducing patient harm and improving patient safety. This campaign emphasizes that patient injuries are preventable, that increasing attention should be given to patient safety, and that hazards and risks should be identified. Several papers within different medical subspecialties have been published based on data from the registry provided by the Norwegian System of Patient Injury Compensation (Kongsgaard et al. 2016, Desserud et al. 2017, Fornebo et al. 2017, Norum et al. 2018, Randsborg et al. 2018). So far, no studies have emerged, based on data from NPE, to analyze compensation claims within the field of pediatric orthopedics. In literature indexed in Medline, only 2 studies evaluated patient injuries in pediatric orthopedics (Oetgen and Parikh 2016, Galey et al. 2019). However, these studies provide only an epidemiological overview of the claims. None of the studies evaluated the specific problems that caused the patient injuries, and no concrete recommendations could be given on how to prevent patient harm. Thus, further knowledge of the details of compensation claims is required to help the pediatric orthopedic surgeon to improve patient safety and quality of care. This study analyzes all malpractice claims within pediatric orthopedics during a certain time period in order to provide epidemiological data, identify the most common reasons for claims and compensations, provide data on specific diagnoses and procedures that led to claims, and give some recommendations that might help to prevent future patient injury events.

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1932922

Patients and methods

All compensation claims in Norway are handled by NPE. The treating healthcare professional is obliged to inform the patient of the right to seek compensation from NPE. According to information provided by NPE (npe.no), claims are eligible for compensation if 3 conditions are met: (1) the patient injury must be due to treatment failure (treatment error or omission), caused by either examination, diagnosis, or treatment (including follow-up); (2) the patient injury must have resulted in financial loss of more than US$ 1,165. Even without financial loss, compensation can be awarded if the patient has sustained a “permanent” and “significant” injury. “Permanent” means that the injury lasts for at least 10 years and “significant” means that medical impairment is at least 15% based on a dedicated invalidity table for injuries (Invalidity table 2020); (3) the patient injury must not be too old. The main rule is that the patient must file the claim within 3 years after realizing that an alleged injury has occurred. All claims based on medical care more than 20 years ago are considered to be expired, regardless of whether the patient was aware of the injury or not. However, compensation can also be granted in some exceptional cases even if no error or omission occurred. This rule applies when either an infection that is not caused by the patient’s condition or pre-existing illness, or a particularly severe and unexpected complication, occurs.

One of the tasks of NPE is to provide a database of all compensation claims and the basic data on each case. Detailed data on the patient’s treatment and complications and the decision on the matter are prospectively entered into a database in NPE. This database was the basis for this study.

Acta Orthopaedica 2021; 92 (5): 615–620

The data was provided in an anonymized version, and no sensitive patient data was included in the database. The database was searched with the terms “orthopedics” and age “0–17 years” for the period 2012–2018. 487 compensation claims within the field of pediatric orthopedics fulfilled the criteria to be evaluated. This number represents all claims within pediatric orthopedics that led to a decision (granted or refused) within the period 2012–2018. The claims were analyzed for the following data: (1) overall number of complaints, patient demographics, age and sex, and number of complaints that resulted in compensation; (2) reason for compensation granted (inadequate examination/diagnostics, inadequate treatment, infection, extensive/major unexpected complication); (3) type of disease/primary injury; (4) conditions with high incidence of compensation claims; (5) geographical region (Northern, Central, Western, South East health region); (6) total payouts.

There is no national registry for all pediatric orthopedic conditions in Norway. However, there is a registry for pediatric hip disorders (Barnehofteregisteret 2021) and there is published data available describing the epidemiology of pediatric fractures in one area of the country (Randsborg et al. 2013). Data from the pediatric hip registry, extrapolated epidemiological data from the study on pediatric fractures, and population data from Statistics Norway (2020) were used to estimate the incidence and risk of claims for certain conditions. Data by Randsborg et al. (2013) is provided for the age group 0–15 years. Therefore, only claims for the same age group were considered for the risk calculation.

Ethics, funding, and conflicts of interest

Approval by the Regional Ethical Committee (REK) was not required since all data was based on already anonymized records from NPE, which provides such data for quality control studies. The study received no external funding. The authors declare no conflicts of interest.

Results

From 2012 to 2018 (inclusive) NPE received 37,584 compensation claims. Data from NPE shows a continuous increase in compensation claims from 1988, when NPE was established, until 2015, and no further increase in the last 5 years (Figure 1). Almost one-third (n = 10,861, 29%) of the claims from 2012 to 2018 were within the field of orthopedics, and the second highest number of claims was received for cancer treatment (n = 5,183, 14%) (Table 1). 31,440 claims were processed and completed, and 10,264 (33%) of the claims were granted. Within orthopedics 9,596 decisions on claims were made and 3,548 of these were granted (37%). 487 of the claims within orthopedic surgery decided by NPE in this time period apply to children from 0 to 17 years at the time of treatment, including 259 girls and 228 boys. 478 (98%) claims were based on diagnostics and treatment within



Table 2. All claims listed in groups according to the ICD 10 classification system and number of claims granted. Values are count

[Figure 1: line chart of compensation claims received per year. Y-axis: count (0–7,000); x-axis: year (1988–2020); the period of the current study is marked.]

Figure 1. Compensation claims received by NPE from 1988 until 2020. White area represents the time period of the current study (2012–2018).

Table 1. Compensation claims received by NPE 2012–2018 according to the 5 most common medical subspecialties

| Medical subspecialty | Number (%) |
|---|---|
| Orthopedics | 10,861 (29) |
| Oncology | 5,183 (14) |
| Dentistry | 3,245 (9) |
| Psychiatry | 2,179 (6) |
| Gastroenterological surgery | 1,643 (4) |

the public health system, whereas 9 claims applied to the private health system. The claims refer to possible patient injuries that occurred between 1967 and 2017, and the patients submitted their claims at a mean of 6 years (0–49) after injury. Mean age at time of injury was 12 years (0–17). 150 out of 487 claims (31%) resulted in compensation, including 79 compensations for inadequate treatment, 58 for inadequate diagnostics, 12 for infection, and 1 based on the exceptional rule. No compensation was granted for inadequate follow-up. 16 claims were submitted more than 20 years after the possible patient injury occurred. Total payouts were US$ 8.45 million and mean payout for each claim was US$ 56,331. The most common primary diagnoses (Table 2) were acute injuries.

Conditions with a particularly high rate of compensation claims

Fractures in the distal humerus, femoral fractures, scoliosis, developmental dysplasia of the hip (DDH), and slipped capital femoral epiphysis (SCFE) are among those conditions with the highest number of claims submitted to NPE (Table 3). In relation to the incidence of the conditions, femoral fractures show the highest rate of claims granted within fracture

| Diagnoses (groups) | Claims | Granted |
|---|---|---|
| S40–S69 Acute injuries in the upper extremity and T92 sequela (n = 129) | | |
| S40–S49 Injuries of shoulder and upper arm | 51 | 13 |
| S50–S59 Injuries of the elbow and forearm | 49 | 14 |
| S60–S69 Injuries to the wrist and hand | 18 | 10 |
| T92 Sequelae | 11 | 2 |
| S70–S99 Injuries in the lower extremity and T93 sequela (n = 115) | | |
| S70–S79 Injuries to the hip and thigh | 24 | 9 |
| S80–S89 Injuries to the knee and lower leg | 65 | 19 |
| S90–S99 Injuries to the ankle and foot | 24 | 6 |
| T93 Sequelae | 2 | 1 |
| Q65–Q79 Congenital malformations and deformations of the musculoskeletal system (n = 59), including M16.2–M16.3 secondary arthritis | | |
| Q65 Hip dysplasia and sequelae | 36 | 9 |
| Q66 Congenital deformities of feet | 8 | 1 |
| Q68–Q79 | 15 | 2 |
| M40–M54 Deforming dorsopathies, spondylopathies, and other dorsopathies (n = 52) | | |
| M40–M43 Scoliosis | 44 | 14 |
| M45–M54 | 8 | 1 |
| M20–M25 Arthropathies (n = 46) | | |
| M20 Acquired deformities fingers/toes | 8 | 5 |
| M21–M22 Acquired deformities, patella | 13 | 0 |
| M23 Internal derangement of knee | 11 | 7 |
| M24–M25 Other joint disorders | 14 | 5 |
| M91–M94 Chondropathies (n = 30) | | |
| M91 Juvenile osteochondrosis pelvis | 5 | 1 |
| M92 Other juvenile osteochondrosis | 5 | 1 |
| M93 Other osteochondropathies | 20 | 12 |
| Others (n = 56): D16, L60, M00, M67, M84, M86, P14, T01, T14 (benign neoplasm, superficial injuries, wounds, pelvic and spine fractures, osteomyelitis) | 56 | 18 |

care (0.4%) and all pediatric hip diseases show a relatively high incidence of accepted claims compared with the total occurrence of the specific conditions (DDH = 2%, SCFE = 3.3%, Perthes disease = 0.2%) (Tables 4 and 5).

Geographical distribution of claims (frequency of cases by health region)

In Norway a state enterprise consisting of 4 regional health authorities is responsible for specialist care, including patient treatment, education of medical staff, and research. The 4 health regions represent geographical regions: South and Eastern, Western, Central, and Northern Norway Regional Health Authority. The distribution of claims according to health region, based on the population within these regions, was in decreasing order: Northern Health Region (13 per 100,000), Western Health Region (9 per 100,000), Central Health Region (7.7 per 100,000), and South and Eastern Health Region (6.6 per 100,000). Thus, the number of claims per 100,000 inhabitants in Northern Norway was about twice that of the South and Eastern Health Region.
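The regional comparison above rests on a simple rate ratio. The following minimal Python sketch reproduces it from the per-100,000 figures reported in the text; the variable and function names are illustrative assumptions, and the underlying claim counts and population sizes are not given in the article, so only the published rates are used.

```python
# Claims per 100,000 inhabitants by health region, as reported in the text.
# The names below are illustrative; the article gives only these rates.
claims_per_100k = {
    "Northern": 13.0,
    "Western": 9.0,
    "Central": 7.7,
    "South and Eastern": 6.6,
}


def rate_ratio(rates, region, reference):
    """Ratio of the claim rate in `region` to the rate in `reference`."""
    return rates[region] / rates[reference]


# Regions ordered from highest to lowest claim rate.
ranked = sorted(claims_per_100k, key=claims_per_100k.get, reverse=True)

# Northern vs South and Eastern: 13 / 6.6, i.e. "about twice".
ratio = rate_ratio(claims_per_100k, "Northern", "South and Eastern")

print(ranked)
print(round(ratio, 2))
```

A ratio of rates like this describes how often claims are filed relative to population size; as the article itself notes later, it does not explain why the rates differ.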



Table 3. Most common specific diagnoses and information on the number of submitted and granted claims, and reason for granting in children aged 0–17 years. Values are count

| Most common specific diagnoses | Claims | Granted | Reasons for granted compensation (Diagnostics / Treatment / Exceptional rule), as listed |
|---|---|---|---|
| S40–S69 Acute injuries in the upper extremity and T92 sequela (n = 129) | | | |
| S42.2 Fractures proximal humerus | 5 | 0 | |
| S42.4 Fractures distal humerus | 33 | 9 | 1; 7; 1 A |
| S52.5 Distal radius fracture | 12 | 1 | 1 |
| S52.4 Antebrachii fractures | 11 | 2 | 2 |
| S53.1 Dislocation of elbow | 5 | 2 | 2 C |
| S62.0 Scaphoid fracture | 4 | 3 | 2 D; 1 |
| S62.3–S62.6 Finger/metacarpal fractures | 5 | 4 | 1 D; 3 (2 E, 1 F) |
| S63 Subluxation/dislocation, ligament rupture | 5 | 3 | 2 D; 1 F |
| S70–S99 Injuries in the lower extremity | | | |
| S72 Femur fractures | 19 | 7 | 2; 5 (3 G, 2 H) |
| S82.2 Fracture of shaft of tibia | 6 | 2 | 2 |
| S82.3 Fracture lower end of tibia | 7 | 4 | 2 (1 D, 1 I); 2 (1 J, 1 K) |
| S82.5; S82.6 Fracture medial or lateral malleolus | 5 | 1 | 1 |
| S83.5 Sprain of cruciate ligament | 13 | 3 | 1; 1; 1 A |
| S93 Dislocation and sprain of joints and ligaments at ankle/foot/toe | 6 | 1 | 1 |
| M40–M54 Deforming dorsopathies, spondylopathies, and other dorsopathies | | | |
| M41.1 Juvenile idiopathic scoliosis | 5 | 1 | 1 |
| M41.2 Other idiopathic scoliosis | 14 | 5 | 3 (1 L, 2 M); 2 A |
| M41.4 Neuromuscular scoliosis | 8 | 2 | 2 |
| Q65–Q79 Congenital malformations and deformations of the musculoskeletal system | | | |
| Q65.0–Q65.9 Uni- or bilateral DDH | 25 | 8 | 6; 2 |
| M20–M25 Arthropathies | | | |
| M20.1 Hallux valgus | 8 | 5 | 5 (4 N, 1 O) |
| M22.0 Recurrent dislocation patella | 5 | 0 | |
| M23.2 Old meniscus tear | 4 | 2 | 1 D; 1 A |
| M23.5 Chronic instability | 6 | 4 | 3 P; 1 A |
| M91–M94 Chondropathies | | | |
| M91.1 Perthes disease | 5 | 1 | 1 |
| M93.0 Slipped capital femoral epiphysis (SCFE) | 15 | 10 | 9 D; 1 Q |
| M93.2 Osteochondritis dissecans | 5 | 2 | 2 R |

Detailed information on the causes of compensation for all specific injuries/conditions with > 3 compensation claims submitted to NPE within the time period 2012–2018.
A: infection, B: unexpected severe injury, C: delayed diagnosis of accompanying injuries, D: delayed diagnosis, E: malrotation, F: iatrogenic nerve injury, G: malalignment, H: compartment syndromes in contralateral lower leg due to peroperative hemilithotomy positioning, I: delayed diagnosis compartment syndrome, J: inadequate surgery, K: inadequate conservative treatment, L: iatrogenic spinal cord injury, M: lack of follow-up, N: lack of indication, O: delayed secondary surgery, P: surgical errors, Q: not specified, R: inadequate removal of foreign body, 1 inadequate follow-up.

Table 4. Incidence of claims granted in relation to extrapolated occurrence of pediatric fractures in children aged 0–15 years

| Type of fracture (a) | Claims granted in children 0–15 years | Incidence per 10,000 children 0–15 years (a) | Extrapolated no. per year (b) | Extrapolated no. for the study period | Incidence of claims granted (%) |
|---|---|---|---|---|---|
| S42.4 Fractures of the distal humerus | 8 | 14 | 1,384 | 9,688 | 0.08 |
| S62.3 Finger/metacarpal fractures | 2 | 32 | 3,110 | 21,770 | 0.009 |
| S52.4 Antebrachii fractures | 2 | 9.5 | 932 | 6,525 | 0.03 |
| S72 Femur fractures | 3 | 1 | 98 | 686 | 0.4 |
| S82.2 Fractures of shaft of tibia | 2 | 8.7 | 853 | 5,971 | 0.03 |

a Epidemiological data published by Randsborg et al. (2013) based on children aged 0–15 years. To allow for extrapolation of data, only claims in children of the same age (0–15 years) are considered in this table.
b Incidence of pediatric fractures per 10,000 children (age 0–15 years) was extrapolated to the population of the whole country (981,342 children aged 0–15 years) based on data provided by Statistics Norway (Statistics Norway 2020).
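The extrapolation behind Table 4 can be sketched in a few lines of Python. This is a reading of the table rather than the authors' own procedure: it assumes the incidence is expressed per 10,000 children (as footnote b states), that the yearly estimate is multiplied by the 7 study years (2012–2018), and that the percentage is claims granted divided by the extrapolated number of fractures. The function names are hypothetical. Because the published incidences are rounded, rows other than femur fractures reproduce only approximately.

```python
# Assumptions (not stated as a formula in the article): incidence is per
# 10,000 children, and the study period covers 7 years (2012-2018 inclusive).
POPULATION_0_15 = 981_342  # children aged 0-15 years, as cited in footnote b
STUDY_YEARS = 7


def cases_per_year(incidence_per_10k, population=POPULATION_0_15):
    """Extrapolate a per-10,000 incidence to whole-country cases per year."""
    return incidence_per_10k * population / 10_000


def granted_percent(claims_granted, cases_in_period):
    """Claims granted as a percentage of the extrapolated number of cases."""
    return 100 * claims_granted / cases_in_period


# Femur fractures: incidence 1 per 10,000, 3 claims granted.
femur_per_year = round(cases_per_year(1.0))       # about 98 fractures per year
femur_period = femur_per_year * STUDY_YEARS       # 686 over the study period
femur_percent = granted_percent(3, femur_period)  # about 0.4%

# Distal humerus fractures, using the published extrapolated count directly.
humerus_percent = granted_percent(8, 9_688)       # about 0.08%

print(femur_per_year, femur_period, round(femur_percent, 1))
print(round(humerus_percent, 2))
```

Using the published period counts directly, as in the distal humerus line, avoids compounding the rounding of the incidence figures.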

Discussion

Our study showed that pediatric orthopedics is associated with a high incidence of compensation claims, which confirms findings by other authors from the United States (Oetgen and Parikh 2016, Galey et al. 2019). The granting percentage in our material was 31%, comparable to findings by Oetgen and Parikh (2016), who found 33%. Galey et al. (2019) in a US study found a granting percentage of 51%. In the United States there is no centralized, comprehensive malpractice reporting, and numerous databases exist that differ in scope and reporting details (Galey et al. 2019). Although both studies originated from the United States, Galey et al. (2019) found a much higher median indemnity payment than reported by Oetgen and Parikh (2016), a finding which was attributed to the fact that different databases were used in these studies, which might also explain differences in granting percentage.

Most claims were granted for inadequate treatment or inadequate diagnostics. In total numbers, upper and lower extremity injuries, arthropathies, chondropathies, and deforming dorsopathies were the most common diagnoses resulting in compensation claims. For those conditions where epidemiological data was available or could be extrapolated, femoral fractures and common pediatric hip disorders (DDH, SCFE, Perthes disease) showed the highest number of claims in relation to the occurrence of these disorders. These findings confirm those by other authors (Oetgen and Parikh 2016, Galey et al. 2019). Pediatric femoral fractures resulted in a high incidence of



Table 5. Incidence of claims for pediatric hip diseases in relation to data provided by the Norwegian National Pediatric Hip Registry

| Pediatric hip disease | Claims | Claims granted | Total no. of reported cases (a) | Incidence of claims granted (%) |
|---|---|---|---|---|
| Q65.0–Q65.9 DDH | 23 (b) | 8 | 390 (b) | 2.0 |
| M91.1 Perthes disease | 5 | 1 | 352 | 0.2 |
| M93.0 Slipped capital femoral epiphysis | 15 | 10 | 298 | 3.3 |

a Total number of reported cases for the study period are provided by the Norwegian National Pediatric Hip Registry (Barnehofteregisteret 2021).
b The Norwegian National Pediatric Hip Registry provides data for late detected DDH (> 3 months). Only late detected cases are included from both the claim registry and the Norwegian National Pediatric Hip Registry.

claims in relation to their occurrence and in a high rate of granted compensations. The data provided by NPE does not provide sufficient details to draw further conclusions from this finding. However, pediatric femur fractures are rare compared with other pediatric orthopedic fractures (Randsborg et al. 2013), and their treatment might therefore be considered challenging. It is important to be aware of the risk of compartment syndrome of the contralateral side due to malpositioning (hemilithotomy position) during surgery for femoral fractures. This is an avoidable complication that has been described in the literature (Brouze et al. 2019).

In DDH and SCFE delayed diagnosis was the main reason for granted compensation. Late detected DDH is a frequent problem and there is no consensus on whether clinical examination combined with universal ultrasound or targeted ultrasound improves early diagnostics and treatment outcome (Shorter et al. 2013). Late diagnosis in SCFE is still frequent (Millis 2017). According to the Norwegian National Pediatric Hip Registry, SCFE is diagnosed > 6 weeks after onset of symptoms in 70% of patients (Barnehofteregisteret 2021). Healthcare providers should have a high degree of suspicion of SCFE in patients with hip and/or knee and/or thigh pain.

Although the number of cases for the different conditions is low, the data might highlight conditions that deserve special attention. Technical errors were the cause of granted compensation in cases of cruciate ligament surgery and surgery for unstable patella, indicating that this type of surgery might be technically demanding in this age group. In surgery for juvenile hallux valgus, lack of indication was the main cause of compensation. Juvenile hallux valgus is rarely accompanied by symptoms (Hefti 2007) and the management of the condition remains controversial. Surgery might be required only in symptomatic cases. Because the recurrence rate is high (Coughlin 1995), surgery should preferably be postponed until skeletal maturity.

Delayed diagnosis of scaphoid fractures and of injuries accompanying elbow dislocation occurred, indicating that diagnostics in these upper extremity injuries remain challenging (Rasool 2004, Glad et al. 2010). The fact that several claims were submitted more than 20 years after a possible patient injury reflects that overlooked pediatric orthopedic conditions, such as DDH, might not become apparent and symptomatic until many years later. In these particular cases, the Norwegian rule that treats compensation claims based on incidents more than 20 years old as expired might be considered questionable.

We found a mean payout for each claim of US$ ~56,300. Oetgen and Parikh (2016) found an average indemnity payout of US$ ~190,000 in pediatric orthopedic patients and Galey et al. (2019) found a median indemnity payment of US$ 675,000, when only the orthopedic surgeon was named as the defendant. Hence, payouts in the United States were 3 to 12 times higher than in Norway. The large difference in payouts could have several reasons. First, in Norway the public sector covers much of the financial needs of an injured patient through public benefits such as social security, sickness benefits, and pensions. The compensation from NPE will cover the difference between income without the injury and income with the injury, when other benefits from the public sector have been taken into account. In the United States, public benefits are much more restricted and vary according to the individual's insurance coverage. In addition, compensation in the United States may also be paid as “punitive damages,” compensation that is intended to act as punishment where the tortfeasor has acted negligently. In Norway, such compensation exists only to a very limited extent (personal communication J. Storvik, Senior Advisor, Legal Department, Norsk Pasientskadeerstatning 2020).
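The "3 to 12 times higher" comparison follows from dividing the reported US payouts by the Norwegian mean payout. The Python sketch below shows the arithmetic using only the rounded figures quoted in the text; the variable names are illustrative assumptions. Note that one US figure is a mean and the other a median, so the ratios are a rough comparison rather than a like-for-like one.

```python
# Rounded payout figures (US$) as quoted in the text; names are illustrative.
norway_mean_payout = 56_300    # present study, mean per granted claim
us_mean_payout = 190_000       # Oetgen and Parikh (2016), average indemnity
us_median_payout = 675_000     # Galey et al. (2019), median indemnity


def payout_ratio(other, reference=norway_mean_payout):
    """How many times larger `other` is than the reference payout."""
    return other / reference


ratio_low = payout_ratio(us_mean_payout)     # roughly 3.4
ratio_high = payout_ratio(us_median_payout)  # roughly 12

# Cross-check of the Norwegian mean from the totals reported in the Results:
# US$ 8.45 million paid out over 150 granted claims.
mean_from_totals = 8_450_000 / 150

print(round(ratio_low, 1), round(ratio_high, 1))
print(round(mean_from_totals))
```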
Our study showed a higher frequency of compensation claims from the Northern Health Region than from the other regions. In fact, the number of claims per 100,000 inhabitants from the Northern Health Region was twice the number from the South and Eastern Health Region. The NPE data does not offer any reliable explanation for this difference. However, lack of access to specialized pediatric orthopedic care in more remote areas of the country might affect quality of care.

Our study has several limitations. First, the data provided by the registry is limited. Further in-depth analysis of each case would require written consent from the patients. Second, the number of cases within certain diagnoses is relatively low, which limits the strength of the conclusions that can be drawn. Another weakness might be the fact that claims submitted to NPE are evaluated by only one specialist within the field of expertise. The evaluation by the specialist should consider national and international guidelines. However, such guidelines do not exist for many conditions and treatments. Thus, the conclusion concerning the possible existence of patient injuries would to a great extent depend on the personal judgment of only one specialist.



It is a definite strength of the study that it is population-based, because all compensation claims in Norway are handled by NPE. Nevertheless, it must be emphasized that the NPE system does not capture treatment harm when no claim is filed. When NPE was established in 1988, only about 200 patients complained, while in 2018 there were 5,676 complaints. The rise was linear until 2015 and numbers have been more stable in recent years. There is reason to believe that there is still under-reporting. To our knowledge, no previous study on compensation claims within pediatric orthopedics could provide a complete epidemiological overview for a defined time period. Bukholm (2016) found that only 20–35% of claims granted by NPE had been reported to the hospital local registry for adverse events, which means either that hospitals were unaware of approximately 70% of patient injuries or that no reliable system for reporting of patient harm had been established. This underlines the importance of systematically analyzing NPE data for patient safety improvement work.

Our study not only provides epidemiological data, but also analyzes the granted cases in detail to search for recurring patterns of failure as a basis for developing strategies for prevention. This contributes to an increased awareness of patient injuries within pediatric orthopedic care. Knowledge of the details of compensation claims should be part of educational programs for pediatric orthopedic surgeons and assist them to develop and implement guidelines and to improve patient safety and quality of care. On the other hand, efforts to avoid adverse legal outcomes might lead to defensive medicine: a behavior that reduces physician liability without providing increased benefit and that could potentially harm patients (Calikoglu and Aras 2020). Guidelines for diagnostics and treatment would assist healthcare workers to keep a balance between anxiety and risk perception.

JH, HR, and TT initiated and designed the study, analyzed the data, and critically revised the manuscript. JH and HR prepared the manuscript. IRKB and OR analyzed the data and critically revised the manuscript.

The authors would like to thank Mette Willumstad Thomsen, statistician in NPE, for providing and analyzing data for the study.

Acta thanks Pelle Gustafson and Kim Lyngby Mikkelsen for help with peer review of this study.

Barnehofteregisteret. Retrieved from https://www.kvalitetsregistre.no/registers/nasjonalbarnehofterregister (Accessed 2021).
Brouze I F, Steinmetz S, McManus J, Borens O. Well leg compartment syndrome in trauma surgery: femoral shaft fracture treated by femoral intramedullary nailing in the hemilithotomy position: case series and review of the literature. Ther Clin Risk Manag 2019; 15: 241-50. doi: 10.2147/tcrm.S177530
Bukholm I. Use of the national registry of patient harms to improve patient safety at local hospitals. Paper presented at the Conference on Health Promoting Hospitals and Health Services 2016, Connecticut, USA.
Calikoglu E O, Aras A. Defensive medicine among different surgical disciplines: a descriptive cross-sectional study. J Forensic Leg Med 2020; 73: 101970. doi: 10.1016/j.jflm.2020.101970
Coughlin M J. Juvenile hallux valgus: etiology and treatment. Foot Ankle Int 1995; 16(11): 682-97. doi: 10.1177/107110079501601104
Desserud K F, Bukholm I, Soreide J A. Compensation claims for substandard care of patients with gastroentero-pancreatic neuroendocrine tumors: a nationwide descriptive study of cases between 2005–2016 in Norway. Anticancer Res 2017; 37(10): 5667-71. doi: 10.21873/anticanres.12002
de Vries E N, Ramrattan M A, Smorenburg S M, Gouma D J, Boermeester M A. The incidence and nature of in-hospital adverse events: a systematic review. Qual Saf Health Care 2008; 17(3): 216-23. doi: 10.1136/qshc.2007.023622
Fornebo I, Simonsen K A, Bukholm I R K, Kongsgaard U E. Claims for compensation after injuries related to airway management: a nationwide study covering 15 years. Acta Anaesthesiol Scand 2017; 61(7): 781-9. doi: 10.1111/aas.12914
Galey S A, Margalit A, Ain M C, Brooks J T. Medical malpractice in pediatric orthopaedics: a systematic review of US case law. J Pediatr Orthop 2019; 39(6): e482-e486. doi: 10.1097/bpo.0000000000001348
Glad T H, Melhuus K, Svenningsen S. Use of MRI for diagnosing scaphoid fracture. Tidsskr Nor Laegeforen 2010; 130(8): 825-8. doi: 10.4045/tidsskr.09.0396
Hefti F. Juvenile hallux valgus. In: Pediatric orthopedics in practice. Berlin, Heidelberg: Springer-Verlag; 2007. p. 418-22.
Invalidity table. Retrieved from https://lovdata.no/dokument/SF/forskrift/1997-0421-373 (Accessed 2020).
Kongsgaard U E, Fischer K, Pedersen T E, Bukholm I R K, Warncke T. Complaints to the Norwegian System of Patient Injury Compensation 2001–14 following nerve blockade. Tidsskr Nor Laegeforen 2016; 136(23-24): 1989-92. doi: 10.4045/tidsskr.16.0368
Millis M B. SCFE: clinical aspects, diagnosis, and classification. J Child Orthop 2017; 11(2): 93-8. doi: 10.1302/1863-2548-11-170025
Norum J, Balteskard L, Thomsen M W, Kvernmo H D. Wrist malpractice claims in Northern Norway 2005–2014: lessons to be learned. Int J Circumpolar Health 2018; 77(1): 1483690. doi: 10.1080/22423982.2018.1483690
OECD. The Economics of Patient Safety (Report); 2020. https://www.oecd.org/health/health-systems/Economics-ofPatientSafetyOctober-2020.pdf
Oetgen M E, Parikh P D. Characteristics of orthopaedic malpractice claims of pediatric and adult patients in private practice. J Pediatr Orthop 2016; 36(2): 213-7. doi: 10.1097/bpo.0000000000000412
Orosco R K, Talamini J, Chang D C, Talamini M A. Surgical malpractice in the United States, 1990–2006. J Am Coll Surg 2012; 215(4): 480-8. doi: 10.1016/j.jamcollsurg.2012.04.028
Randsborg P H, Gulbrandsen P, Saltytė Benth J, Sivertsen J E A, Hammer O-L, Fuglesang H F S, Arøen A. Fractures in children: epidemiology and activity specific fracture rates. J Bone Joint Surg Am 2013; 95(7): e42. doi: 10.2106/jbjs.L.00369
Randsborg P H, Bukholm I R K, Jakobsen R B. Compensation after treatment for anterior cruciate ligament injuries: a review of compensation claims in Norway from 2005 to 2015. Knee Surg Sports Traumatol Arthrosc 2018; 26(2): 628-33. doi: 10.1007/s00167-017-4809-y
Rasool M N. Dislocations of the elbow in children. J Bone Joint Surg Br 2004; 86(7): 1050-8. doi: 10.1302/0301-620x.86b7.14505
Shorter D, Hong T, Osborn D A. Cochrane Review: Screening programmes for developmental dysplasia of the hip in newborn infants. Evid Based Child Health 2013; 8(1): 11-54. doi: 10.1002/ebch.1891
Statistics Norway. Retrieved from https://www.ssb.no (Accessed 2020).
Storvik J. 2020. Personal communication: Senior advisor, Legal Department, Norsk Pasientskadeerstatning.


Acta Orthopaedica 2021; 92 (5): 621–627


The STRYDE limb lengthening nail is susceptible to mechanically assisted crevice corrosion: an analysis of 23 retrieved implants

Morten Stendahl JELLESEN 1, Trine Nybo LOMHOLT 2, Rikke Quist HANSEN 1,2, Troels MATHIESEN 2, Carsten GUNDLACH 3, Søren KOLD 4, Tobias NYGAARD 5, Mindaugas MIKUZIS 4, Ulrik Kähler OLESEN 5, and Jan Duedal RÖLFING 6

1 Department of Mechanical Engineering, Technical University of Denmark, Lyngby; 2 Materials and Product Testing, FORCE Technology, Brøndby; 3 Department of Physics, Technical University of Denmark, Lyngby; 4 Department of Orthopaedics, Interdisciplinary Orthopaedics, Aalborg University Hospital, Aalborg; 5 Department of Orthopaedics, Limb Lengthening and Bone Reconstruction Unit, Rigshospitalet, Copenhagen; 6 Orthopaedic Reconstruction and Children’s Orthopaedics, Aarhus University Hospital, Aarhus, Denmark
Correspondence: jan.roelfing@clin.au.dk
Submitted 2021-03-30. Accepted 2021-05-03.

Background and purpose: We noted several adverse events in patients in whom the first version of the STRYDE limb-lengthening nail (NuVasive Specialized Orthopaedics, San Diego, CA) had been implanted. Pain, osteolysis, periosteal reactions, and cortical hypertrophy at the nail junction were noted. Here, we present the analysis of 23 retrieved STRYDE implants.

Materials and methods: We undertook visual inspection of the retrieved nails and screws, mechanical evaluation of the junction, micro-CT analyses, and microscopic inspection of the bushing, screws, screw holes, and separated parts of the implants. Positive material identification (PMI) and energy-dispersive X-ray spectroscopy (EDS) were used to analyze the chemical composition. The hardness of the material was also investigated.

Results: 20/23 retrieved nails had visible signs of corrosion, i.e., discoloration at the telescopic junction. Micro-CT verified corrosion attacks in 12/12 scanned bushings. Corrosion, predominantly mechanically assisted crevice corrosion, was observed at the locking screws and screw holes in 20/23 nails. Biological material inside the nail was observed in addition to oozing from the junction of 2 nails during hardware removal, which was experimentally reproducible. Notably, the mechanical construction of the bushing changed from PRECICE P2 to STRYDE nails.

Interpretation: STRYDE nails are not hermetically sealed, and liquid can pass the bushing. Biodur 108 itself is corrosion resistant; however, mechanically assisted crevice corrosion of the bushing, locking screws, and screw holes may be aggravated due to manufacturing aiming for increased strength and hardness of the alloy.

Having observed several adverse events, we recently published a nationwide cross-sectional analysis of all 30 STRYDE limb-lengthening nails (NuVasive Specialized Orthopedics, San Diego, CA) that were implanted in Denmark (Rölfing et al. 2021a). 27/30 STRYDE nails have now been removed and we present data from metallurgical analysis of 23 of the retrieved implants.

Materials and methods

We performed an analysis of all STRYDE nails, removed either routinely, due to complications, or preemptively due to our recent discovery of adverse events with this implant (Rölfing et al. 2021a). STRYDE nails from the 4 centers (Aarhus University Hospital, Aalborg University Hospital, Odense University Hospital, and Rigshospitalet, Copenhagen, Denmark) were visually inspected and photo documented (Figure 1, see Supplementary data). Based on these initial observations, the engineers (MSJ, TNL, RQH, TM) decided together with JDR which analyses should be performed for the individual nails. Representative samples of “worst” and “best” cases in terms of discoloration at the telescoping nail junction were further analyzed by FORCE Technology (FT) and/or the Technical University of Denmark (DTU).

Visual inspection and mechanical testing

All nails were inspected and photo documented by the surgeons, and again after FT and DTU had received them. Handling of the implants was not uniform; some implants were wiped with an ethanol cloth only; others were also washed in a

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1927506



surgical instrument dishwasher before photo documentation. Movement of the telescoping nail parts was assessed with manual compression and distraction of the nail (Video at https://youtu.be/CZiqmwgI_tI).

Microscopic inspection

Microscopic inspection for signs of uniform and local corrosion was performed with a VHX-S650E digital microscope (Keyence Corporation, Osaka, Japan) with 20–100× magnification at DTU and a Leica MC 120 HD camera at FT. Microstructure examination was performed on cut sections of selected nails, followed by mounting in resin and subsequent grinding and polishing down to 1 µm diamond polishing. After etching in Vogels Sparbeize etchant, the microstructure was examined using a Leica DMI 5000 M metallographic microscope.

Micro-CT

14 nails (12 STRYDE and 2 PRECICE P2.2 nails) were micro-CT scanned with a Nikon XT H 225 (Nikon Metrology Inc, Brighton, MI, USA; 160 kV, 125 µA, 1.0 mm copper filter, 801 views, 0.5 s exposure). Overview images of entire nails were recorded at 119.7 µm pixel size, while detailed scans were reconstructed at 19 or 23 µm isotropic resolution at the bushing for 12 nails, the O-rings for 5 nails, and the ball bearing for 4 nails (Figures 2 and 3, see Supplementary data).

Mechanical sectioning

5 STRYDE nails were mechanically sectioned at either FT or DTU.

Hardness analysis

Hardness measurements were performed according to the Vickers hardness test method described in DS/EN ISO 6507-1:2018 using a Struers Durascan 80A with a load of 2 kg (HV2).

Positive materials identification (PMI)

The element weight percentage (wt.%) of the different components of the nail was assessed with X-ray fluorescence using an X-MET 8000 Optimum (Oxford Instruments, Abingdon, UK).

Scanning electron microscopy (SEM)

SEM was performed using a ZEISS EVO MIK-015 BRUKER X-Flash DET—011 with 15 kV (Carl Zeiss Microscopy Deutschland GmbH, Oberkochen, Germany). Suspected biological material was evaluated with energy-dispersive X-ray spectroscopy (EDS) to verify high levels of oxygen. Data are presented as median (range).

Ethics, funding, and conflicts of interest

The study was conducted with consent of the patients according to the Declaration of Helsinki and was approved by the local institutional review boards. The work conducted by FORCE Technology was financially supported by the Danish Agency for Higher Education and Science, Denmark. No

Figure 4. Digital microscopy of the junctional discoloration, i.e., corrosion products.

other external financing was received. The authors declare no conflicts of interest.

Results

30 STRYDE limb-lengthening nails (19 femoral and 11 tibial nails) were eligible for the study. The median lengthening of the 30 bone segments in 27 patients (median age 20 [11–65] years) was 35 (15–80) mm. For further details, please see Rölfing et al. (2021a). 3 nails are still implanted. Median time from implantation to hardware removal of the 27 retrieved nails was 12 (3.5–20) months. 23/27 nails were available for analysis, because 1 implant was lost to sonication and 3 nails were removed and discarded before we became aware of the observed adverse events. 1 of the 23 nails was returned to the manufacturer for further analysis after visual inspection and photo documentation had been performed. 2 surgeons observed "oozing" of a brown substance from the telescopic nail junction during hardware removal in addition to the observation of junctional discoloration (Figure 4). Moreover, 4 of the 27 retrieved STRYDE nails fractured, 3 before and 1 during hardware removal. All 4 patients were within the weight limit of the applied nail (Robbins and Paley 2020, Rölfing et al. 2021a, 2021b). Notably, the bone regenerate was deemed to be sufficiently healed in 1 patient, while 3 regenerates were insufficient. 2 PRECICE titanium nails (P2.2) were also analyzed to compare these with the STRYDE nails.

Macroscopic inspection and photo documentation

Macroscopic inspection and photo documentation revealed discoloration at the telescoping junction in 20/23 implants (Figure 4) and 20/23 had corrosion at the locking screw holes (Table 1). Corrosion near the telescopic junction was seen as corrosion products leaking from the nail. The distraction rod itself did not show signs of uniform or pitting corrosion when inspected with light optical microscopy at 200× magnification. Attacks on the rod within the bushing were noted. Likewise, empty screw holes were unaffected, whereas most screw holes in contact with screws suffered from material degradation. Degradation happened predominantly inside and on the load-bearing side of the hole with a smooth and hemispherical corrosion morphology. Screws thus showed indications of mechanically assisted crevice corrosion primarily on one side of the screw, but degradation on both sides was also observed (Figure 1, see Supplementary data).

Table 1. Symptoms, radiographic changes, and metallurgical characteristics of the 30 STRYDE-lengthened bone segments

Symptoms
  Late onset of pain a                        8/30
  Swelling a                                  3/30
Radiographic changes
  Junctional osteolysis a                     19/30
  Periosteal reaction a                       12/30
  Cortical hypertrophy a                      12/30
Blood samples
  Elevated Cr blood levels b                  1/15
Metallurgical features
  Naked eye visible junctional corrosion      20/23
  Micro-CT verified junctional corrosion      12/12
  Corrosion of screw hole/screws              20/23

Not all nails were available for all analyses, therefore n/N varies.
a Data previously reported (Rölfing et al. 2021a).
b Chromium (Cr) levels at the time of nail removal. The wt.% of Cr and Mn are approximately the same in Biodur 108. However, Mn (and Mo) blood samples are not readily available in Denmark.

Figure 6. Micro-CT of 2 STRYDE and 1 PRECICE P2. Overview and detailed imaging of the bushing/crown, O-rings and ball bearing. Corrosion is seen around the bushing/crown in 12/12 nails scanned at this location (middle panel); corrosion at the O-ring was observed in 1/5 nails scanned at this location (top panel). Notably, no signs of corrosion were seen around the ball bearing in 4/4 scanned nails at this location. No corrosion was seen in the reference PRECICE P2 nail at any location. The mechanical components inside the housing appeared to be similar between STRYDE and PRECICE P2.

Manual compression and distraction

Manual compression and distraction of the devices showed that 13/20 nails could be telescoped several millimeters with little force (Figure 5, see Supplementary data).

Micro-CT

Micro-CT of 12 STRYDE nails documented corrosion attacks at the telescopic junction, primarily at the bushing, but also at the distraction rod and internal components of the nail surrounding the bushing (Figure 6). Comparison of the scanned STRYDE and 2 scanned PRECICE P2.2 nails showed a similar mechanical construction; only the bushing seemed to be constructed differently (Figure 6). Close 360° contact between the housing and bushing was observed in PRECICE P2, while none of the 12 STRYDE nails had that feature (Figure 6, lower vs. top panel).

Figure 8. Examples of a sectioned STRYDE (top panel), corroded bushing (middle panel), and presence of biological material on internal components. Biological origin was verified with EDS analysis showing a high oxygen content (bottom panel).

Figure 9. Corrosion attack at the bushing/crown. Digital microscopy (upper panels) and microscopy of the microstructure (middle and lower panels).

Mechanical sectioning

Mechanical sectioning of 5 STRYDE nails, which were selected based on either visual inspection or micro-CT findings, i.e., worst and best cases, showed an internal titanium actuator pin with signs of dry lubricant of a crystalline appearance (Figure 7, see Supplementary data). EDS analysis of the crystalloid structure showed high contents of fluorine (35 wt.%) and aluminum (24 wt.%). The presence of biologic material within the nail was confirmed by visual inspection and a high oxygen content determined by EDS analysis (Figure 8). The O-ring sealing within the internal compartments of the nail appeared intact, except in 1 nail, which suffered from corrosion attack at this location as well. Interestingly, the bushing of that nail was not severely attacked by corrosion.

Microscopic inspection

Microscopic inspection verified the mechanical damage and local corrosion of the microstructure both at the bushing (Figure 9) and at the screw and screw holes (Figure 10).

PMI

PMI showed that the housing body, bushing, and locking screws fulfilled the Biodur 108 alloy ASTM F2229 specifications (ASTM 2021), i.e., content of manganese (Mn), chromium (Cr), molybdenum (Mo), and iron (Fe) (Table 2, see Supplementary data). Moreover, the visible discoloration, i.e., corrosion products, contained the same elements.

Figure 10. Mechanically assisted crevice corrosion/fretting and crevice corrosion at a locking screw, screw hole including microscopic image of the corrosion-attacked microstructure. For SEM imaging see Figures 12 and 13 in Supplementary data.

Hardness analysis

Hardness analysis of the bushing, distraction rod, housing, and screws showed a hardness in the 410–445 HV range. This means that the manufactured nail has a higher hardness, i.e., enhanced mechanical properties, compared with the annealed condition of the alloy (Figure 11, see Supplementary data).
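For readers less familiar with the Vickers scale, the relation between test load, indentation diagonal, and hardness number can be sketched as follows. This is an illustrative calculation only, not the authors' analysis code: the function names are hypothetical, the constant follows from the 136° pyramidal indenter geometry of the standard Vickers test, and the diagonal lengths in the usage lines are made-up values chosen to fall in the reported 410–445 HV range.

```python
import math

# Geometric constant for a 136-degree pyramidal indenter: 2 * sin(136 deg / 2),
# which is approximately 1.8544.
VICKERS_CONSTANT = 2.0 * math.sin(math.radians(136.0 / 2.0))


def vickers_hardness(load_kgf, d1_mm, d2_mm):
    """Return the Vickers hardness number from the test load (kgf) and the
    two measured indentation diagonals (mm)."""
    d_mean = (d1_mm + d2_mm) / 2.0
    if load_kgf <= 0 or d_mean <= 0:
        raise ValueError("load and diagonals must be positive")
    return VICKERS_CONSTANT * load_kgf / d_mean ** 2


def expected_diagonal_mm(load_kgf, hv):
    """Return the mean indentation diagonal (mm) expected for a given
    hardness number and test load (kgf)."""
    if load_kgf <= 0 or hv <= 0:
        raise ValueError("load and hardness must be positive")
    return math.sqrt(VICKERS_CONSTANT * load_kgf / hv)


if __name__ == "__main__":
    load = 2.0  # kgf, corresponding to the HV2 designation in the Methods
    # Hypothetical diagonal readings, for illustration only.
    print(round(vickers_hardness(load, 0.0951, 0.0951)))
    for hv in (410, 445):
        print(hv, round(expected_diagonal_mm(load, hv) * 1000, 1), "µm")
```

With a 2 kgf load, the reported hardness range corresponds to indentation diagonals of a little under 0.1 mm, which gives a sense of how local each measurement is.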



Distraction–retraction–distraction

Distraction–retraction–distraction of 2 nails using the fast distractor showed that the nails were not hermetically sealed. Air was pressed out of the nail, forming bubbles during distraction underwater. After subsequent retraction underwater and drying of the nail, the nail was distracted, which led to brown liquid leaking out of the telescoping junction (Figure 5, see Supplementary data).

Discussion

The main findings of our study were:
1. 20/23 nails had visible signs of corrosion, i.e., discoloration at the telescopic junction. Micro-CT verified corrosion attacks in 12/12 scanned bushings.
2. Mechanically assisted crevice corrosion was also observed at the locking screws and screw holes in 20/23 nails.
3. STRYDE nails are not hermetically sealed. We found biological material and corrosion inside the nail and observed oozing from the junction of 2 nails during hardware removal, and this observation was experimentally reproducible.

In agreement with our results, Vogt et al. (2021) and Iliadis et al. (2021a) also reported corrosion after metallurgical analysis of retrieved STRYDE nails. While reports of implant failures in limb-lengthening nails are rare (Frost et al. 2021), the related magnetically controlled spinal growing rod devices (MAGEC, NuVasive, San Diego, CA, USA) have been extensively investigated in recent years (Tang et al. 2019, Akbarnia and Mundis 2019, Agarwal et al. 2020, Joyce et al. 2020, Rushton et al. 2020, Tsirikos and Roberts 2020, Wei et al. 2020, Rushton et al. 2021). In these spinal implants, wear debris causing metallosis, i.e., discoloration of the surrounding soft tissue, originally drew surgeons' attention to the matter. Later, after extensive analyses, increased blood levels of metal ions and breakage of internal components as well as corrosion were reported. Nonetheless, the lengthening process with MAGEC growing rods is almost pain-free and may be a gentler treatment option with high patient satisfaction compared with sequential open spinal lengthening procedures (Skov et al. 2019, 2020a). Many surgeons therefore still advocate the use of MAGEC implants, despite the call for evaluation of the cost-effectiveness and safety of these devices (Rushton et al. 2020, Skov et al. 2020b, Tsirikos and Roberts 2020). In contrast to spinal growing rods, limb-lengthening nails are surrounded by bone rather than soft tissue.

Our current working hypothesis regarding STRYDE is that internal and junctional corrosion and its products cause a toxic environment leading to osteolysis. In an effort to protect its mechanical integrity, the bone forms a periosteal reaction (predominantly onion-skin layered) before cortical hypertrophy occurs, which is likely the end result, unless the cortical destruction is substantial (Rölfing et al. 2021a).


Whereas MAGEC and PRECICE are made of a titanium alloy, STRYDE is made of stainless steel Biodur 108. In general, austenitic stainless steel (as in the AISI 300 series) is known to offer appropriate corrosion resistance for many applications including orthopedic implants. The resistance to uniform corrosion is explained by the existence of a dense chromium oxide forming a passive film on the surface. Nevertheless, the passive film can locally degrade, e.g., by small-amplitude wearing movements or due to the formation of a local aggressive environment in occluded regions prone to depletion of oxygen (crevice corrosion). High nitrogen containing austenitic stainless steel (HNS) such as Biodur 108 maintains the austenitic structure by the addition of nitrogen and manganese without having a major nickel content. Having nitrogen dissolved in the austenitic stainless steel results in higher strength compared with nickel containing austenitic stainless steel. HNS is also reported to offer improved pitting and crevice corrosion resistance in the solution-annealed state compared with the AISI 300 series (Lim et al. 2001, Baba and Katada 2006). Cold-working (metalworking at ambient temperature) of HNS further increases the strength and hardness of the alloy; however, cold-working has proven to have an unfavorable effect on pitting and crevice corrosion resistance (Kamachi Mudali et al. 2002, Wang et al. 2017). Increasing the amount of manganese at the expense of nickel in the stainless-steel alloy lowers the pitting and crevice corrosion resistance as well as the repassivation rate if the passive film is broken down (Lim et al. 2001, Toor et al. 2007). Other important factors affecting the corrosion resistance of HNS alloys are the metallurgical structure (a fine-grained austenitic structure that is free of ferrite, chi, and sigma phases), the inclusion content, and the risk of carbonitride formation (ASTM F2229).
A possible explanation for corrosion near the telescopic junction is that corrosion initiates at the interfaces of the bushing where small-amplitude movements can produce disruption of the passive film causing metal degradation and crevice formation. As the environment in the crevice becomes more aggressive it leads to further degradation of the bushing and other parts of the telescopic junction. Mechanically assisted crevice corrosion may also explain the material degradation inside the screw holes. Weightbearing may initiate this process by causing small-amplitude movements between the screw and screw hole, initiating fretting. Subsequently, a larger crevice is formed where the distraction rod inside the screw hole is attacked due to increased aggressivity of the environment inside the crevice between screw and screw hole. Moreover, galvanic corrosion could be ruled out, because the housing, bushing, and screw were made from the same metal, i.e., Biodur 108.

An analysis of retrieved PRECICE nails documented improvements from the first version to the current version (Panagiotopoulou et al. 2018). However, in line with our findings regarding STRYDE, these authors also noted biological material within PRECICE nails, while internal corrosion was confined to the early versions. They also compared their findings with MAGEC spinal growing rods, highlighting that there are fundamental differences including the planned removal of limb-lengthening nails after bony consolidation (Panagiotopoulou et al. 2018, Rushton et al. 2020). They therefore state that it is "unlikely that a corrosive process will have sufficient time to cause an actuator pin fracture or other internal mechanism damage that compromises the use of the implant." This notion is partially supported by Eltayeby et al. (2021) reporting on 102 retrieved nails, whereof 57/65 PRECICE P2 and 29/37 PRECICE P1 were still functioning after a median implantation time of 15 (4–47) months prior to hardware removal and testing. Albeit not the primary outcome of that study, the authors did not mention any signs of corrosion and only excluded 1 broken nail. Neither do they mention significant damage or corrosion of the tested nails. Interestingly, Lee et al. (2017) describe broken bushings of PRECICE nails, but no corrosion. However, based on these studies and the body of evidence, clinical application of the titanium PRECICE nail is relatively safe (Alrabai et al. 2017, Calder et al. 2019, Horn et al. 2019, Hammouda et al. 2020, Morrison et al. 2020, Nasto et al. 2020, Frost et al. 2021, Iliadis et al. 2021b).

To our knowledge, we have reported 4 of the 5 broken STRYDE nails in the literature (Johnson et al. 2021, Rölfing et al. 2021a, 2021b). The present study included analysis of our 4 broken nails, but we were unable to determine the cause of breakage, which occurred through locking screw holes (n = 3) and where the magnet resides (n = 1).

Limitations of our study include that we chose which nail underwent which analyses based on an initial macroscopic inspection. Ideally, all nails should have undergone a standardized sequence of all analyses.
Furthermore, handling of the implants immediately after removal was not standardized at the different centers, i.e., cleansing with saline water and/or ethanol wipes or cleaning in a surgical dishwasher. Nonetheless, all available nails underwent visual inspection and 12 STRYDE and 2 PRECICE P2.2 nails were analyzed with micro-CT. The external validity of our study is therefore more substantial than a previous micro-CT investigation of 2 P1, 1 P2.0, and 1 P2.1 PRECICE nails (Panagiotopoulou et al. 2018). Another limitation is that we did not systematically evaluate whether the retrieved nails were still functioning (Eltayeby et al. 2021). The major strength of our study is that we can document correlation of the metallurgical analyses with the previously reported clinical and radiographic findings (Rölfing et al. 2021a).

In conclusion, we report corrosion causing internal and external damage to the nail, extrusion of the bushing, biological material within the nail, and the fact that fluid is able to pass the corroded bushing. The latter observations underline that the STRYDE nail is not hermetically sealed. Finally, the findings of our present study correlate with the radiographic changes and clinical symptoms noted in our previous study within the same cohort (Rölfing et al. 2021a). While our observational and experimental studies may not be able to determine a causal relation between corrosion and the clinical symptoms and radiological findings, the covariation and temporal precedence make causality likely.

Supplementary data

Table 2 and Figures 1–3, 5, 7 and 11–13 are available as supplementary data in the online version of this article, http://dx.doi.org/10.1080/17453674.2021.1927506

MSJ, TNL, RQH, TM, JDR: study design. MSJ and JDR: first draft. All authors: data collection, critical review, final approval of the manuscript.

Thanks are offered to the orthopedic surgeons Michael Brix and Christian Faergemann, Odense University Hospital, and Martin Gottliebsen, Michael Davidsen, Mathias Bünger, Ahmed Abood, and Juozas Petruskevicius, Aarhus University Hospital, for providing clinical data and retrieving nails for analyses; to Torben Haugaard Jensen, Characterisation and Special Testing, FORCE Technology, Denmark, for performing micro-CT analyses; and to Gitte Pedersen, Helle Andersen and Allan Vest, Characterisation and Special Testing, FORCE Technology, Denmark, for preparation of samples, microstructure analyses, hardness measurements and chemical analyses.

Acta thanks Janet D Conway and Tom Joyce for help with peer review of this study.

Agarwal A, Kelkar A, Agarwal A G, Jayaswal D, Jayaswal A, Shendge V. Device-related complications associated with MAGEC rod usage for distraction-based correction of scoliosis. Spine Surg Relat Res 2020; 4(2): 148-51.
Akbarnia B A, Mundis G M. Magnetically controlled growing rods in early onset scoliosis. Orthopade 2019; 48(6): 477-85.
Alrabai H M, Gesheff M G, Conway J D. Use of internal lengthening nails in post-traumatic sequelae. Int Orthop 2017; 41(9): 1915-23.
ASTM F2229-21, Standard Specification for Wrought, Nitrogen Strengthened 23Manganese-21Chromium-1Molybdenum Low-Nickel Stainless Steel Alloy Bar and Wire for Surgical Implants (UNS S29108). West Conshohocken, PA: ASTM International; 2021.
Baba H, Katada Y. Effect of nitrogen on crevice corrosion in austenitic stainless steel. Corros Sci 2006; 48(9): 2510-24.
Calder P R, McKay J E, Timms A J, Roskrow T, Fugazzotto S, Edel P, Goodier W D. Femoral lengthening using the Precice intramedullary limb-lengthening system: outcome comparison following antegrade and retrograde nails. Bone Joint J 2019; 101-B(9): 1168-76.
Eltayeby H H, Alrabai H M, Jauregui J J, Shabtai L Y, Herzenberg J E. Post-retrieval functionality testing of PRECICE lengthening nails: the "sleeper" nail concept. J Clin Orthop Trauma 2021; 14: 151-5.
Frost M, Rahbek O, Traerup J, Ceccotti A A, Kold S. Systematic review of complications with externally controlled motorized intramedullary bone lengthening nails (FITBONE and PRECICE) in 983 segments. Acta Orthop 2021; 91(1): 120-7.
Hammouda A I, Szymczuk V L, Gesheff M G, Mohamed N S, Conway J D, Standard S C, McClure P K, Herzenberg J E. Acute deformity correction and lengthening using the PRECICE magnetic intramedullary lengthening nail. J Limb Lengthen Reconstr 2020; 6: 20-7.
Horn J, Hvid I, Huhnstock S, Breen A B, Steen H. Limb lengthening and deformity correction with externally controlled motorized intramedullary nails: evaluation of 50 consecutive lengthenings. Acta Orthop 2019; 90(1): 81-7.
Iliadis A D, Wright J, Stoddart M T, Goodier W D, Calder P. Early results from a single centre's experience with the STRYDE nail. Bone Joint J. 2021a. In press (personal communication).
Iliadis A D, Palloni V, Wright J, Goodier D, Calder P. Pediatric lower limb lengthening using the PRECICE nail: our experience with 50 cases. J Pediatr Orthop 2021b; 41(1): e44-9.
Johnson M A, Karkenny A J, Arkader A, Davidson R S. Dissociation of a femoral intramedullary magnetic lengthening nail during routine hardware removal. JBJS Case Connect 2021; 11(1): 1-5.
Joyce T J, Smith S L, Kandemir G, Rushton P R P, Fender D, Bowey A J, Gibson M J. The NuVasive MAGEC rod urgent field safety notice concerning locking pin fracture: how does data from an independent explant center compare? Spine 2020; 45(13): 872-6.
Kamachi Mudali U, Shankar P, Ningshen S, Dayal R K, Khatak H S, Raj B. On the pitting corrosion resistance of nitrogen alloyed cold worked austenitic stainless steels. Corros Sci 2002; 44(10): 2183-98.
Lee D H, Kim S, Lee J W, Park H, Kim T Y, Kim H W. A comparison of the device-related complications of intramedullary lengthening nails using a new classification system. Biomed Res Int 2017; 2017: 8032510. doi: 10.1155/2017/8032510
Lim Y S, Kim J S, Ahn S J, Kwon H S, Katada Y. The influences of microstructure and nitrogen alloying on pitting corrosion of type 316L and 20 wt.% Mn-substituted type 316L stainless steels. Corros Sci 2001; 43(1): 53-68.
Morrison S G, Georgiadis A G, Huser A J, Dahl M T. Complications of limb lengthening with motorized intramedullary nails. J Am Acad Orthop Surg 2020; 28(18): e803-9.
Nasto L A, Coppa V, Riganti S, Ruzzini L, Manfrini M, Campanacci L, Palmacci O, Boero S. Clinical results and complication rates of lower limb lengthening in paediatric patients using the PRECICE 2 intramedullary magnetic nail: a multicentre study. J Pediatr Orthop B 2020; 29(6): 611-17.
Panagiotopoulou V C, Davda K, Hothi H S, Henckel J, Cerquiglini A, Goodier W D, Skinner J, Hart A, Calder P R. A retrieval analysis of the Precice intramedullary limb lengthening system. Bone Joint Res 2018; 7(7): 476-84.
Robbins C, Paley D. Stryde Weight-bearing Internal Lengthening Nail. Tech Orthop 2020; 35(3): 201-8.
Rölfing J D, Kold S, Nygaard T, Mikuzis M, Brix M, Faergemann C, Gottliebsen M, Davidsen M, Petruskevicius J, Olesen U K. Pain, osteolysis, and periosteal reaction are associated with the STRYDE limb lengthening nail: a nationwide cross-sectional study. Acta Orthop 2021a; epub ahead of print. doi.org/10.1080/17453674.2021.1903278
Rölfing J D, Bünger M, Petruskevicius J, Abood A. Removal of broken Stryde limb lengthening nails. Orthop Traumatol Surg Res 2021b; in press. doi.org/10.1016/j.otsr.2021.102958
Rushton P R P, Smith S L, Kandemir G, Forbes L, Fender D, Bowey A J, Gibson M J, Joyce T J. Spinal lengthening with magnetically controlled growing rods. Spine 2020; 45(3): 170-6.
Rushton P R P, Smith S L, Fender D, Bowey A J, Gibson M J, Joyce T J. Metallosis is commonly associated with magnetically controlled growing rods: results from an independent multicentre explant database. Eur Spine J 2021; epub ahead of print. doi.org/10.1007/s00586-021-06750-2
Skov S T, Bünger C, Rölfing J D, Hansen E S, Høy K, Valencius K, Hemig P, Li H. High global satisfaction in magnetically controlled elongations in 29 early-onset scoliosis patients versus primary spinal fusion in 20 adolescent idiopathic scoliosis patients. Open J Orthop Rheumatol 2019; 4(1): 005-9.
Skov S T, Bünger C, Li H, Vigh-Larsen M, Rölfing J D. Lengthening of magnetically controlled growing rods caused minimal pain in 25 children: pain assessment with FPS-R, NRS, and r-FLACC. Spine Deform 2020a; 8(4): 763-70.
Skov S T, Li H, Hansen E S, Høy K, Helmig P, Rölfing J D, Bünger C. New growth rod concept provides three dimensional correction, spinal growth, and preserved pulmonary function in early-onset scoliosis. Int Orthop 2020b; 44(9): 1773-83.
Tang N, Zhao H, Shen J-X, Zhang J-G, Li S-G. Magnetically controlled growing rod for early-onset scoliosis: systematic review and meta-analysis. World Neurosurg 2019; 125: e593-601.
Toor I-H, Park K J, Kwon H. Manganese effects on repassivation kinetics and SCC susceptibility of high Mn-N austenitic stainless steel alloys. J Electrochem Soc 2007; 154(9): C494.
Tsirikos A I, Roberts S B. Magnetic controlled growth rods in the treatment of scoliosis: safety, efficacy and patient selection. Med Devices Evid Res 2020; 13: 75-85.
Vogt B, Rödl R, Gosheger G, Schulze M, Hasselmann J, Fuest C, Toporowski G, Laufer A, Frommer A. Focal osteolysis and corrosion at the junction of Precice Stryde intramedullary lengthening device: preliminary clinical, radiographic and metallurgic analysis of 57 lengthened segments. Bone Joint Res 2021 (under review/personal communication).
Wang Q, Zhang B, Ren Y, Yang K. Eliminating detrimental effect of cold working on pitting corrosion resistance in high nitrogen austenitic stainless steels. Corros Sci 2017; 123: 351-5.
Wei J Z, Hothi H S, Morganti H, Bergiers S, Dal Gal E, Likcani D, Henckel J, Hart A J. Mechanical wear analysis helps understand a mechanism of failure in retrieved magnetically controlled growing rods: a retrieval study. BMC Musculoskelet Disord 2020; 21(1): 519.



Acta Orthopaedica 2021; 92 (5): 628–632

The rise of registry-based research: a bibliometric analysis

Emilio ROMANINI 1,2, Irene SCHETTINI 3, Marina TORRE 4, Michele VENOSA 1, Alessio TARANTINO 5, Vittorio CALVISI 5, and Gustavo ZANOLI 2,6

1 RomaPro Center for Hip and Knee Arthroplasty, Polo Sanitario San Feliciano, Rome, Italy; 2 GLOBE, Italian Working Group on Evidence Based Orthopaedics, Rome; 3 Department of Management and Law, University of Rome Tor Vergata, Rome; 4 Scientific Secretariat of the Presidency, Istituto Superiore di Sanità, Rome; 5 MeSVA Department, University of L'Aquila; 6 Casa di Cura Santa Maria Maddalena, Occhiobello, RO, Italy
Correspondence: emilio.romanini@gmail.com
Submitted 2021-02-09. Accepted 2021-05-04.

Background and purpose — The main purpose of arthroplasty registries is to collect information on patients, techniques, and devices to monitor and improve the outcome of the specific procedure. This study analyses the role played by registries in the orthopedic research community and describes publication trends, characteristics, and patterns of this field of research. Patients and methods — A descriptive-bibliometric review was conducted. Scopus was the database used for the research. All articles published from 1991 to December 2020 containing keywords related to registries and arthroplasty were considered. In particular, the following dimensions were analyzed in detail: (i) papers/year; (ii) journals; (iii) countries; (iv) research growth rate; (v) collaboration among countries. VOSviewer software was used to perform the bibliometric analysis. Finally, the 50 most cited papers of the last 10 years were briefly analyzed. Results — 3,933 articles were identified. There has been growing interest in the topic since 2010. Acta Orthopaedica ranked first for the number of articles published. The country with the largest number of articles citing registries was the United States, followed by the United Kingdom and Sweden. The relative number of articles per 100,000 inhabitants is 0.60 for Europe and 0.38 for the United States. The literature in this research area has an average yearly growth rate of 28%. Interpretation — The publication rate in the field of arthroplasty registries is constantly growing with a noteworthy impact in the evolution of this research and clinical area. The growth rate is significantly higher than that of arthroplasty literature (28% vs. 10%) and the collaboration among countries is strong and increasing with time.
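The population-normalized figure quoted above (articles per 100,000 inhabitants) is a simple ratio. A minimal sketch of that normalization follows; the function name and the input numbers are hypothetical placeholders for illustration, not the article counts or population figures used by the authors.

```python
def articles_per_100k(n_articles, population):
    """Return the number of articles per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return n_articles * 100_000 / population


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the calculation.
    example = {
        "Region A": (50, 10_000_000),
        "Region B": (120, 60_000_000),
    }
    for name, (articles, population) in example.items():
        print(name, round(articles_per_100k(articles, population), 2))
```

Normalizing by population makes regions of very different size comparable, which is why a region can lead in absolute article counts and still rank lower on this measure.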

Randomized controlled trials (RCTs) are crucial in gaining knowledge regarding treatment effectiveness and supporting clinical decisions with evidence-based data. However, RCTs in orthopedic surgery present ethical, economic, and organizational challenges; their number is therefore limited and their methodological quality is often modest (Campbell et al. 2010, Mundi et al. 2014). Observational studies provide a valuable alternative method for clinical investigation in orthopedic surgery in settings in which RCTs are not feasible and when increased generalizability of findings is desired (Morshed et al. 2009, Castillo et al. 2012). Moreover, given the large amount of data collected, they offer increased power to capture rare events (e.g., complications and failure) that inadequately powered RCTs are potentially prone to miss (beta error). Joint registries are high-quality observational, prospective cohort studies designed to collect all primary and revision cases from a specific country or geographical area, without having to rely on extrapolation from a sample. The main purpose of joint registries is to collect information on patients, implants, and procedures in order to monitor and improve the outcome of the specific procedure (Lübbeke et al. 2019). Research using data from the national registries is increasingly applied as a source of information in arthroplasty and is influencing surgical practice in many ways (Varnum et al. 2019). The arthroplasty registry community has a culture of publishing annual reports of its results (Hughes et al. 2017), but an increasing number of original studies utilizing registry data are also published in peer-reviewed journals. Moreover, many arthroplasty-related papers refer to registry data as the basic source of information, directing research projects and resources.
We evaluated the role played by registries in the orthopedic literature by means of a descriptive and bibliometric analysis of the published research and its evolution in the last 30 years. Further, we compared the growth of registry-based literature with that of the general literature on joint replacement.

© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group, on behalf of the Nordic Orthopedic Federation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI 10.1080/17453674.2021.1937459



List of the most productive journals (> 50 articles about registries)

Source title                                        Publications   Impact factor   5-year impact   Country
                                                    1979–2020      2019            factor a
Acta Orthopaedica                                   395            3.0             3.5             NOF
Journal of Arthroplasty                             369            3.7             3.7             USA
Clinical Orthopaedics and Related Research          260            4.3             4.7             USA
Journal of Bone and Joint Surgery American Volume   209            4.6             5.7             USA
Bone and Joint Journal                              198            4.3             4.1             England
BMC Musculoskeletal Disorders                       124            1.9             2.4             England
International Orthopaedics                          85             2.9             2.8             Germany
Journal of Shoulder and Elbow Surgery               78             2.8             3.3             USA
HIP International                                   72             1.3             1.3             Italy
Knee Surgery Sports Traumatology Arthroscopy        69             3.2             3.2             Germany
Archives of Orthopaedic and Trauma Surgery          58             2.0             2.1             Germany

a Source: Authors' elaboration from https://jcr.clarivate.com
NOF = Nordic Orthopaedic Federation

[Figure 1 axes: no. of articles on registries (0–1,000); no. of articles on arthroplasty (0–10,000); year (1980–2020).]

Figure 1. Temporal trends of articles on registries (red line) and on arthroplasty in general (blue line). Source: Authors’ elaboration from Scopus.

Materials and methods

We conducted a descriptive-bibliometric review in the field of arthroplasty surgery with a focus on registries. The search was initially performed on several biomedical databases; Scopus was finally selected, as it returned the largest number of published articles. All languages and all document types were considered eligible for this search. All articles published from 1979 (date of first publication of an article based on registry data) to December 2020 were considered. Details of the search strategies are provided in Supplementary data.

A quantitative-descriptive and bibliometric analysis was conducted on the final dataset obtained. In particular, we analyzed the following dimensions: (i) number of papers/year; (ii) journals; (iii) countries; (iv) research growth rate; (v) journals’ impact factor; (vi) collaboration among countries. Technical details concerning bibliometrics, i.e., methodology and software used for the analysis, can also be found in Supplementary data. A comparison was then performed between the articles in our dataset and the larger sample obtained without restricting the search to registry-based research, to compare the growth rate of papers citing registries with that of papers dedicated to joint replacement in general. Moreover, following common practice in the bibliometric literature (Lobo et al. 2020, Yakkanti et al. 2020), the 50 most cited papers related to registries were analyzed; to avoid retrieving mostly “classical” papers, we limited the analysis to the last 10 years. Finally, articles published in the top 5 ranked journals in general and internal medicine (“the Big Five”: NEJM, Lancet, BMJ, JAMA, Annals of Internal Medicine) were analyzed separately to offer an estimate of the diffusion of registry-based research outside of the orthopedic field.

Results

3,933 articles were retrieved through Scopus. The descriptive analysis showed a continuous growth of research citing the keywords related to registries throughout the entire timespan (Figure 1). The average annual growth rate of literature concerning registries is about 30%. We compared this with the average annual growth rate of the entire literature (123,126 articles) related to joint replacement, which is about 8%. 11 journals published more than 50 papers and they are listed in the Table; Acta Orthopaedica ranked first for the number of articles published. 53 articles were published in the Big Five journals (1.5%): BMJ 25; Lancet 20; Annals of Internal Medicine 4; JAMA 4; NEJM 0. The country with the largest number of articles citing registries was the United States, followed by the United Kingdom and Sweden. Europe as a whole is by far the most prolific geographical area, largely due to the contribution of northern European countries that were pioneers in this type of research (Figure 2).

Among the 50 most cited papers, 39 were original studies and 11 were reviews (8 systematic, 3 narrative). In 36 papers the study was performed using original registry data; in 3 cases the analysis was performed by authors not directly involved with one specific registry. 11 studies were published in 2 of the “Big Five” journals: Lancet 7, BMJ 4.

The bibliometric analysis shows the collaboration among countries. The size of the label and of the country circle indicates the number of citations: the higher the number of citations, the larger the circle. Distances between circles represent the correlation of countries in terms of scientific collaboration links (Figure 3). The same analysis was conducted on the subsample of the 50 most cited papers (Figure 4).
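The average annual growth rates quoted in the Results can be illustrated with a short calculation. The exact formula used for the article is given only in its Supplementary data, so the sketch below is an assumption: it shows two common definitions (the arithmetic mean of year-over-year changes, and the compound rate between the first and last year). The function names and the yearly counts in the usage example are hypothetical and are not taken from the study.

```python
def mean_annual_growth(counts):
    """Arithmetic mean of year-over-year relative changes, in percent.

    `counts` is a list of article counts, one per consecutive year.
    Years that follow a zero count are skipped, because the relative
    change from zero is undefined.
    """
    changes = [
        (curr - prev) / prev
        for prev, curr in zip(counts, counts[1:])
        if prev > 0
    ]
    if not changes:
        raise ValueError("need at least two consecutive years with a non-zero start")
    return 100 * sum(changes) / len(changes)


def compound_annual_growth(first, last, years):
    """Constant yearly rate (percent) that turns `first` into `last` over `years` years."""
    if first <= 0 or years <= 0:
        raise ValueError("first and years must be positive")
    return 100 * ((last / first) ** (1 / years) - 1)


if __name__ == "__main__":
    # Hypothetical yearly counts, for illustration only.
    yearly_counts = [100, 130, 169]
    print(round(mean_annual_growth(yearly_counts), 1))
    print(round(compound_annual_growth(yearly_counts[0], yearly_counts[-1], 2), 1))
```

The two definitions agree when growth is steady but can differ markedly for short or irregular series, which is why the definition matters when two bodies of literature are compared.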



Acta Orthopaedica 2021; 92 (5): 628–632

Figure 2. Most productive countries (Scopus). The number of articles per 100,000 inhabitants is 0.60 for Europe and 0.38 for the United States. [Bar chart; axis: number of articles citing registries, 0 to 1,200. Countries in the order shown: United States, United Kingdom, Sweden, Australia, Denmark, Norway, China, Germany, Finland, Netherlands, Canada, Switzerland, Undefined, New Zealand, Italy, France, Austria, Spain, Singapore, South Korea, Japan, Ireland, Belgium, India, Greece, Taiwan.]

Discussion

Figure 3. Map of the 15 most productive countries and of the relationships among the international research groups. Source: Authors’ elaboration from VOSviewer software.
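The links in a country collaboration map of this kind rest on counts of co-authored articles. The following is a minimal sketch of such a count, assuming each article is available as a list of the countries in its author affiliations. The function name and the sample input are hypothetical, and the sketch does not attempt to reproduce VOSviewer's layout or any normalisation it may apply.

```python
from collections import Counter
from itertools import combinations


def country_links(articles):
    """Count co-authored articles for every unordered pair of countries.

    `articles` is an iterable of country lists, one list per article.
    A country repeated within one article is counted once for that article.
    """
    links = Counter()
    for countries in articles:
        unique = sorted(set(countries))  # sorted so each pair has one canonical key
        for pair in combinations(unique, 2):
            links[pair] += 1
    return links


if __name__ == "__main__":
    # Hypothetical affiliation data, for illustration only.
    sample = [
        ["Sweden", "Denmark"],
        ["Denmark", "Sweden", "Norway"],
        ["Sweden"],
    ]
    for pair, n in country_links(sample).most_common():
        print(pair, n)
```

Single-country articles contribute no links, so a country can be highly productive and still appear only weakly connected.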

Figure 4. Map of the 15 most productive countries and of the relationships among the international research groups among the 50 most cited articles. Source: Authors’ elaboration from VOSviewer software.

Arthroplasty registries were started over 40 years ago in Sweden. Since then, many countries have adopted the concept and started national registries (Varnum et al. 2019). More recently, several organizations have been created to promote collaboration among registries and to develop international standards and harmonization of data collection, such as the International Society of Arthroplasty Registries (ISAR) (https://www.isarhome.org/), International Consortium of Orthopaedic Registries (ICOR) (Sedrakyan et al. 2011), Nordic Arthroplasty Register Association (NARA) (Van Steenbergen et al. 2021), and Network of Orthopaedic Registries of Europe (NORE) (Havelin et al. 2009, Robertsson et al. 2010, Malchau et al. 2018, Pijls et al. 2019). An integration of Registry Evidence within Cochrane Reviews has been suggested as “difficult but necessary” (Zanoli 2012), and a special interest group (COchrane Unified Group on Arthroplasty Registries) has been proposed, even though this has remained for the moment episodic and needs further methodological and “political” development.

The evaluation of scientific production by means of a bibliometric analysis is useful to provide evidence of the dissemination of registry-based knowledge within the orthopedic scientific community. In this particular area of research, no formal bibliometric analysis has been published to our knowledge, though Boyer et al. (2011) performed an interesting descriptive analysis of scientific production.

The descriptive databases analysis showed a continuous growth of research citing the keywords related to registries throughout the entire timespan, and this effect is larger than the growth observed evaluating the keywords related to joint replacement in general. It is important to underline that in the last year analyzed, 2020, 6% of articles published with keywords related to joint replacement also cite the keywords concerning registries (it was 1.5% in 2000).

Acta Orthopaedica hosted the largest number of papers, reflecting the pioneer role played by the Scandinavian countries in this research and clinical area and reaffirming a well-deserved achievement (Hailer 2015).

Despite the fact that the American Joint Replacement Registry still shows low coverage (Heckmann et al. 2019), and that only some regional/institutional registries are well established, the country with the largest number of articles on registries in our sample is the United States. This might reflect, of course, the large number of orthopedic surgeons and institutions producing research in a nation that is much more populous than any European country, where registries were started. Besides, the US-published research production is possibly explained by the large number of papers published in US-based journals, such as the Journal of Arthroplasty and Clinical Orthopaedics and Related Research. The aim of our bibliometric analysis is not to rank countries, as for instance it does not take into account the quality of published articles. A qualitative analysis of published papers was beyond the scope of this article; it is in any case well known that only a few well-established registries have enough coverage and completeness to provide useful data (Herberts and Malchau 2000, Van Steenbergen et al. 2015, 2021). Our search strategy retrieves not only original reports from existing registries but also papers that discuss or quote registry data; for this reason, we do not expect to find a link between quality of registry data collection and number of published articles for each country. Again, the number of articles as calculated in our analysis reflects the interest in the subject of registries rather than the quality of original data provided. We find it reassuring that our data seem to point to an increasing attention to registry-based research even from countries where no national registry is active.

The bibliometric analysis shows an increasing number of papers collecting contributions from different countries, as a result of the international collaborative initiatives. Most of the collaboration was between Sweden and Denmark: the number of publications concerning registries that these 2 countries have co-authored is 67; Sweden has a high number of co-authored articles with the United States too (59) and the United States has a strong link with the United Kingdom (56).

Conclusion

The increasing role played by local, regional, and national registries in the development of arthroplasty is well documented by the growing body of literature depicted by this bibliometric analysis. More recently, international collaboration across registries at patient-level data (Ranstam et al. 2011) as well as meta-analyses (Keurentjes et al. 2014, Nieuwenhuijse et al. 2014, Paxton et al. 2018) added new research perspectives and contributed to the constant growth of scientific production and international collaboration in this field, which appears as fruitful as ever. It is hoped that the growing interest in high-quality registry research will give new strength to the “quest for phased introduction of new implants” (Nelissen et al. 2011), and ultimately will lead to improved patient care.

Conflicts of interest

The authors have no conflicts of interest to declare.
Supplementary data

The search strategy is available as supplementary data in the online version of this article, http://dx.doi.org/10.1080/17453674.2021.1937459

ER: conception of study, interpretation of data, and manuscript preparation. IS, MT: statistical analyses, interpretation of data, and manuscript preparation. GZ, MV, AT, RG, VC: interpretation of data and manuscript preparation. The study was conducted in the framework of the Italian Arthroplasty Registry (RIAP) supported by the Medical Devices and Pharmaceutical Service General Directorate of the Italian Ministry of Health. Acta thanks Keijo T Mäkelä, Rob Nelissen, and Ola Rolfson for help with peer review of this study.


Boyer P, Boutron I, Ravaud P. Scientific production and impact of national registers: the example of orthopaedic national registers. Osteoarthritis Cartilage 2011; 19(7): 858-63.
Campbell A J, Bagley A, Van Heest A, James M A. Challenges of randomized controlled surgical trials. Orthop Clin North Am 2010; 41(2): 145-55.
Castillo R C, Scharfstein D O, MacKenzie E J. Observational studies in the era of randomized trials: finding the balance. J Bone Joint Surg Am 2012; 94(Suppl. 1): 112-7.
Hailer N P. Orthopedic registry research: limitations and future perspectives. Acta Orthop 2015; 86(1): 1-2.
Havelin L I, Fenstad A M, Salomonsson R, Mehnert F, Furnes O, Overgaard S, Pedersen A B, Herberts P, Kärrholm J, Garellick G. The Nordic Arthroplasty Register Association: a unique collaboration between 3 national hip arthroplasty registries with 280,201 THRs. Acta Orthop 2009; 80(4): 393-401.
Heckmann N, Ihn H, Stefl M, Etkin C D, Springer B D, Berry D J, Lieberman J R. Early results from the American Joint Replacement Registry: a comparison with other national registries. J Arthroplasty 2019; 34(7): S125-34.
Herberts P, Malchau H. Long-term registration has improved the quality of hip replacement: a review of the Swedish THR Register comparing 160,000 cases. Acta Orthop Scand 2000; 71(2): 111-21.
Hughes R E, Batra A, Hallstrom B R. Arthroplasty registries around the world: valuable sources of hip implant revision risk data. Curr Rev Musculoskelet Med 2017; 10(2): 240-52.
Keurentjes J C, Pijls B G, Van Tol F R, Mentink J F, Mes S D, Schoones J W, Fiocco M, Sedrakyan A, Nelissen R G. Which implant should we use for primary total hip replacement?: a systematic review and meta-analysis. J Bone Joint Surg Am 2014; 96(Suppl. 1): 79-97.
Lobo S, Zargaran D, Zargaran A. The 50 most cited articles in ankle surgery. Orthop Rev 2020; 12(4).
Lübbeke A, Carr A J, Hoffmeyer P. Registry stakeholders. EFORT Open Rev 2019; 4(6): 330-6.
Malchau H, Garellick G, Berry D, Harris W H, Robertsson O, Kärrholm J, Lewallen D, Bragdon C R, Lidgren L, Herberts P. Arthroplasty implant registries over the past five decades: development, current, and future impact. J Orthop Res 2018; 36(9): 2319-30.
Morshed S, Tornetta III P, Bhandari M. Analysis of observational studies: a guide to understanding statistical methods. J Bone Joint Surg Am 2009; 91(Suppl. 3): 50-60.
Mundi R, Chaudhry H, Mundi S, Godin K, Bhandari M. Design and execution of clinical trials in orthopaedic surgery. Bone Joint Res 2014; 3(5): 161-8.
Nelissen R G, Pijls B G, Kärrholm J, Malchau H, Nieuwenhuijse M J, Valstar E R. RSA and registries: the quest for phased introduction of new implants. J Bone Joint Surg Am 2011; 93(Suppl. 3): 62-5.
Nieuwenhuijse M J, Nelissen R G H H, Schoones J W, Sedrakyan A. Appraisal of evidence base for introduction of new implants in hip and knee replacement: a systematic review of five widely used device technologies. BMJ 2014; 349(Sep 9): g5133.
Paxton E W, Mohaddes M, Laaksonen I, Lorimer M, Graves S E, Malchau H, Namba R S, Kärrholm J, Rolfson O, Cafri G. Meta-analysis of individual registry results enhances international registry collaboration. Acta Orthop 2018; 89(4): 369-73.
Pijls B G, Meessen J M, Tucker K, Stea S, Steenbergen L, Marie Fenstad A, Mäkelä K, Cristian Stoica I, Goncharov M, Overgaard S. MoM total hip replacements in Europe: a NORE report. EFORT Open Rev 2019; 4(6): 423-9.
Ranstam J, Kärrholm J, Pulkkinen P, Mäkelä K, Espehaug B, Pedersen A B, Mehnert F, Furnes O, for the NARA study group. Statistical analysis of arthroplasty data, I: Introduction and background. Acta Orthop 2011; 82(3): 253-7.
Robertsson O, Bizjajeva S, Fenstad A M, Furnes O, Lidgren L, Mehnert F, Odgaard A, Pedersen A B, Havelin L I. Knee arthroplasty in Denmark, Norway and Sweden: a pilot study from the Nordic Arthroplasty Register Association. Acta Orthop 2010; 81(1): 82-9.



Sedrakyan A, Paxton E W, Phillips C, Namba R, Funahashi T, Barber T, Sculco T, Padgett D, Wright T, Marinac-Dabic D. The international consortium of orthopaedic registries: overview and summary. J Bone Joint Surg Am 2011; 93(Suppl. 3): 1-12.
Van Steenbergen L N, Denissen G A, Spooren A, Van Rooden S M, Van Oosterhout F J, Morrenhof J W, Nelissen R G. More than 95% completeness of reported procedures in the population-based Dutch Arthroplasty Register: external validation of 311,890 procedures. Acta Orthop 2015; 86(4): 498-505.
Van Steenbergen L N, Mäkelä K T, Kärrholm J, Rolfson O, Overgaard S, Furnes O, Pedersen A B, Eskelinen A, Hallan G, Schreurs B W. Total hip arthroplasties in the Dutch Arthroplasty Register (LROI) and the Nordic Arthroplasty Register Association (NARA): comparison of patient and procedure characteristics in 475,685 cases. Acta Orthop 2021; 92(1): 15-22.
Varnum C, Pedersen A B, Rolfson O, Rogmark C, Furnes O, Hallan G, Mäkelä K, de Steiger R, Porter M, Overgaard S. Impact of hip arthroplasty registers on orthopaedic practice and perspectives for the future. EFORT Open Rev 2019; 4(6): 368-76.
Yakkanti R, Greif D N, Wilhelm J, Allegra P R, Yakkanti R, Hernandez V H. Unicondylar knee arthroplasty: a bibliometric analysis of the 50 most commonly cited studies. Arthroplasty Today 2020; 6(4): 931-40.
Zanoli G. The Cochrane Collaboration on arthroplasty reviews and the role of evidence coming from registries: a difficult but necessary way forward. 1st ISAR Congress, Bergen, Norway, May 20-22; 2012.

