SORA

Can ChatGPT outperform a neurosurgical trainee? A prospective comparative study.

Williams, SC; Starup-Hansen, J; Funnell, JP; Hanrahan, JG; Valetopoulou, A; Singh, N; Sinha, S; Muirhead, WR; Marcus, HJ (2024) Can ChatGPT outperform a neurosurgical trainee? A prospective comparative study. Br J Neurosurg. pp. 1-10. ISSN 1360-046X https://doi.org/10.1080/02688697.2024.2308222
SGUL Authors: Singh, Navneet

Abstract

PURPOSE: This study aimed to compare the performance of ChatGPT, a large language model (LLM), with human neurosurgical applicants in a neurosurgical national selection interview, to assess the potential of artificial intelligence (AI) and LLMs in healthcare and provide insights into their integration into the field. METHODS: In a prospective comparative study, a set of neurosurgical national selection-style interview questions was posed to eight human participants and to ChatGPT in an online interview. All participants were doctors currently practising in the UK who had applied for a neurosurgical National Training Number. Interviews were recorded, anonymised, and scored by three neurosurgical consultants with experience as interviewers for national selection. Answers provided by ChatGPT were used as a template for a virtual interview, and the resulting transcript was subsequently scored by the neurosurgical consultants using the criteria applied in real national selection interviews. Overall interview scores and subdomain scores were compared between the human participants and ChatGPT. RESULTS: On overall score, ChatGPT fell behind six of the eight human competitors and did not achieve a mean score higher than that of any individual who secured a training position. Several factors, including factual inaccuracies and deviations from the expected structure and style, may have contributed to ChatGPT's underperformance. CONCLUSIONS: LLMs such as ChatGPT have considerable potential for integration into healthcare. However, this study emphasises the need for further development to address current limitations and challenges. While LLMs have not yet surpassed human performance, collaboration between humans and AI systems holds promise for the future of healthcare.

Item Type: Article
Additional Information: © 2024 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The terms on which this article has been published allow the posting of the Accepted Manuscript in a repository by the author(s) or with their consent.
Keywords: AI, Artificial intelligence, ChatGPT, healthcare, large language model, natural language processing, neurosurgery, 1103 Clinical Sciences, 1109 Neurosciences, Neurology & Neurosurgery
SGUL Research Institute / Research Centre: Academic Structure > Institute of Medical & Biomedical Education (IMBE)
Journal or Publication Title: Br J Neurosurg
ISSN: 1360-046X
Language: eng
Dates:
2 February 2024: Published
16 January 2024: Accepted
Publisher License: Creative Commons: Attribution 4.0
Projects:
Project ID: NS/A000050/1 | Funder: WEISS | Funder ID: UNSPECIFIED
Project ID: UNSPECIFIED | Funder: National Institute for Health and Care Research | Funder ID: http://dx.doi.org/10.13039/501100000272
PubMed ID: 38305239
Web of Science ID: WOS:001155437200001
URI: https://openaccess.sgul.ac.uk/id/eprint/116490
Publisher's version: https://doi.org/10.1080/02688697.2024.2308222