Abstract
Objectives To facilitate the stratification of patients with osteoarthritis (OA) for new treatment development and clinical trial recruitment, we created an automated machine learning (autoML) tool predicting the rapid progression of knee OA over a 2-year period.
Methods We developed autoML models integrating clinical, biochemical, X-ray and MRI data. Using two data sets within the OA Initiative (the Foundation for the National Institutes of Health OA Biomarker Consortium for training and hold-out validation, and the Pivotal Osteoarthritis Initiative MRI Analyses study for external validation), we employed two distinct definitions of clinical outcomes: multiclass (categorising OA progression as pain and/or radiographic) and binary. Key predictors of progression were identified through advanced interpretability techniques, and subgroup analyses were conducted by age, sex and ethnicity with a focus on early-stage disease.
Results Although the most reliable models incorporated all available features, simpler models including only clinical variables achieved robust external validation performance, with an area under the precision-recall curve (AUC-PRC) of 0.727 (95% CI: 0.726 to 0.728) for multiclass predictions and 0.764 (95% CI: 0.762 to 0.766) for binary predictions. Multiclass models performed best in patients with early-stage OA (AUC-PRC 0.724–0.806), whereas binary models were more reliable in patients younger than 60 (AUC-PRC 0.617–0.693). Patient-reported outcomes and MRI features emerged as key predictors of progression, though subgroup differences were noted. Finally, we developed web-based applications to visualise our personalised predictions.
Conclusions Our novel tool’s transparency and reliability in predicting rapid knee OA progression distinguish it from conventional ‘black-box’ methods and are more likely to facilitate its acceptance by clinicians and patients, enabling effective implementation in clinical practice.
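As an illustration of the AUC-PRC metric and 95% CIs reported in the Results, the following is a minimal sketch for the binary case only. It assumes the step-wise (average precision) estimate of the area under the precision-recall curve and a percentile bootstrap for the interval; the abstract does not state how the authors computed either, and the function names and toy data are hypothetical.

```python
import random


def average_precision(y_true, y_score):
    """Step-wise estimate of the area under the precision-recall curve.

    y_true holds 0/1 labels (1 = progressor); y_score holds predicted risks.
    Tied scores are ranked in sort order (an assumption of this sketch).
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive case is required")
    # Rank cases from highest to lowest predicted risk.
    order = sorted(range(len(y_true)), key=lambda i: y_score[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def bootstrap_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for average precision (illustrative choice)."""
    if sum(y_true) == 0:
        raise ValueError("at least one positive case is required")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if sum(yt) == 0:
            continue  # metric undefined without positives; draw again
        ys = [y_score[i] for i in idx]
        stats.append(average_precision(yt, ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data, not taken from the study.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
ap = average_precision(y_true, y_score)
ci_low, ci_high = bootstrap_ci(y_true, y_score)
```

Unlike the area under the ROC curve, this metric has a baseline equal to the prevalence of progressors, so values should be compared with the outcome rate in the relevant subgroup.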
- Knee Osteoarthritis
- Machine Learning
- Arthritis
- Osteoarthritis
Data availability statement
Data are available in a public, open access repository. Data and/or research tools used in the preparation of this manuscript were obtained and analysed from the controlled access data sets distributed from the Osteoarthritis Initiative (OAI), a data repository housed within the National Institute of Mental Health (NIMH) Data Archive. OAI is a collaborative informatics system created by the NIMH and the National Institute of Arthritis, Musculoskeletal and Skin Diseases to provide a worldwide resource to quicken the pace of biomarker identification, scientific investigation and OA drug development (DOI: 10.15154/1vhq-h028). Data provided from the FNIH OA Biomarkers Consortium Project (available at https://nda.nih.gov/oai/) were made possible through grants and direct or in-kind contributions by: AbbVie; Amgen; Arthritis Foundation; Artialis; Bioiberica; BioVendor; DePuy; Flexion Therapeutics; GSK; IBEX; IDS; Merck Serono; Quidel; Rottapharm | Madaus; Sanofi; Stryker; the Pivotal OAI MRI Analyses study, NIH HHSN268201000021C; and the Osteoarthritis Research Society International. The OAI is a public-private partnership comprised of five contracts (N01-AR-2-2258; N01-AR-2-2259; N01-AR-2-2260; N01-AR-2-2261; N01-AR-2-2262) funded by the National Institutes of Health. Funding partners include Merck Research Laboratories; Novartis Pharmaceuticals; GlaxoSmithKline; and Pfizer. Private sector funding for the consortium and OAI is managed by the Foundation for the National Institutes of Health.
Code availability. The AutoPrognosis V.2.0 open-source software package is available at https://www.autoprognosis.vanderschaar-lab.com/.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Supplementary materials
Supplementary Data
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
- Data supplement 1
- Data supplement 2
- Data supplement 3
- Data supplement 4
- Data supplement 5
- Data supplement 6
- Data supplement 7
- Data supplement 8
- Data supplement 9
- Data supplement 10
- Data supplement 11
- Data supplement 12
- Data supplement 13
- Data supplement 14
- Data supplement 15
- Data supplement 16
- Data supplement 17
- Data supplement 18
- Data supplement 19
- Data supplement 20
- Data supplement 21
- Data supplement 22
- Data supplement 23
- Data supplement 24
- Data supplement 25
Footnotes
Handling editor Josef S Smolen
Contributors All authors contributed to the conceptualisation and design of the study. SC contributed to the curation and analysis of the data. MB, MvdS and AM supervised the study. All authors contributed to the interpretation of the data, the drafting of the article and final approval of the version to be submitted. AM is the guarantor of the study. ChatGPT, an AI language model developed by OpenAI, was used exclusively to assist in improving the clarity and legibility of a few sentences in the initial drafting of the manuscript, though these sections have been substantially revised by the authors to generate the final version. It did not contribute to the creation of content or the analysis of data.
Funding SC is supported by the Louis and Valerie Freedman Studentship in Medical Sciences from Trinity College Cambridge, the ORUK/Versus Arthritis: AI in MSK Research Fellowship (G124606) and the Addenbrooke’s Charitable Trust (ACT) Research Advisory Committee grant (G123290). At the start of the study, SC was also supported by the National Institute for Health and Care Research (NIHR) (ACF-2021-14-003). AM and MB are supported by the NIHR Cambridge Biomedical Research Centre (NIHR203312) and receive funding from Versus Arthritis (grant 21156). The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care. The funders of the study were not involved in the design, data collection, analysis, interpretation or writing of this study.
Competing interests None declared. We confirm that we have read the journal’s position on issues involved in ethical publication and affirm that this report is consistent with those guidelines.
Patient and public involvement Patients and the public were involved early in our research, contributing to the development of our research questions and outcome measures. Their input, gathered through a focus group with the Patient and Public Involvement team at Addenbrooke’s Hospital, Cambridge, UK, informed the design of our study and our clinical demonstrators. While direct involvement in recruitment and study conduct was not applicable due to the nature of our data, their perspectives on the usability and implications of our research were integral. Our dissemination strategy includes regular interactions with this group, collaborations with patient groups and relevant charities (such as Osteoarthritis Research UK (ORUK) and Versus Arthritis), and public-friendly summaries of our findings to ensure ongoing, reciprocal communication and feedback.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.