Year: 2019 | Volume: 2 | Issue: 3 | Page: 115-125
Deducing differential diagnoses in movement disorders: Neurology residents versus a novel mobile medical application (Neurology Dx)
Pulikottil W Vinny1, Roopa Rajan2, Vinay Goyal2, Madakasira V Padma2, Vivek Lal3, Padmavathy N Sylaja4, Lakshmi Narasimhan5, Sada N Dwivedi6, Pradeep P Nair7, Dileep Ramachandran8, Anu Gupta9, Venugopalan Y Vishnu2
1 Department of Neurology, Indian Naval Hospital Ship Asvini, Mumbai, Maharashtra, India
2 Department of Neurology, All India Institute of Medical Sciences, New Delhi, India
3 Department of Neurology, Postgraduate Institute of Medical Education and Research, Chandigarh, India
4 Department of Neurology, Sree Chitra Tirunal Institute of Medical Sciences and Technology, Thiruvananthapuram, Kerala, India
5 Department of Neurology, Madras Medical College, Chennai, Tamil Nadu, India
6 Department of Biostatistics, All India Institute of Medical Sciences, New Delhi, India
7 Department of Neurology, Jawaharlal Institute of Postgraduate Medical Education and Research, Puducherry, India
8 Department of Neurology, Government Medical College Trivandrum, Thiruvananthapuram, Kerala, India
9 Department of Neurology, Govind Ballabh Pant Institute of Postgraduate Medical Education and Research, New Delhi, India
|Date of Submission||27-Aug-2019|
|Date of Decision||02-Oct-2019|
|Date of Acceptance||12-Nov-2019|
|Date of Web Publication||04-Dec-2019|
Dr. Venugopalan Y Vishnu
Room No.704, CN Centre, Seventh Floor, All India Institute of Medical Sciences, New Delhi.
Source of Support: None, Conflict of Interest: None
AIM: The aim of this study was to detect the diagnostic accuracy of a novel app (Neurology Dx) vis-à-vis neurology residents. METHODS: A multicenter cross-sectional study involving seven leading teaching neurology institutes in India was conducted by recruiting 100 neurology residents. Primary outcome was proportion of correctly identified high likely gold standard differential diagnoses. Secondary outcomes were proportions of correctly identified first high likely, first three high likely, first five high likely, and combined moderate plus high likely gold standard differentials. RESULTS: Four sets comprising 15 movement disorder vignettes each (total 60) were tested on 100 neurology residents (one set for each resident) and also on the app (60 vignettes). Residents correctly identified the gold standard “high likely” differentials with a frequency of 13.6% as against 41.5% by the app (95% confidence interval [CI]: 21.9–34.1). On combining “high” and “moderate likely” differentials, residents could accurately identify gold standard differentials with a frequency of 10.8% as against 37.9% by the app (95% CI: 22.6–31.9). The residents correctly identified first five high likely gold standard differentials with a frequency of 13.5% versus 23.7% by the app (95% CI: 5.3–15.9). The residents correctly identified first three high likely gold standard differentials with a frequency of 13.0% versus 15.8% by the app (95% CI: -1.2–7.9). Residents correctly identified the first “high likely” gold standard differential in 32.3% as against 35% by the app (95% CI: -8.4–15.6). CONCLUSIONS AND RELEVANCE: This study suggests that an app (Neurology Dx) is capable of generating differential diagnoses to complement clinical reasoning of neurology residents (CTRI/2017/06/008838).
Keywords: Artificial intelligence, medical education, movement disorders, resident training
|How to cite this article:|
Vinny PW, Rajan R, Goyal V, Padma MV, Lal V, Sylaja PN, Narasimhan L, Dwivedi SN, Nair PP, Ramachandran D, Gupta A, Vishnu VY. Deducing differential diagnoses in movement disorders: Neurology residents versus a novel mobile medical application (Neurology Dx). Ann Mov Disord 2019;2:115-25
|How to cite this URL:|
Vinny PW, Rajan R, Goyal V, Padma MV, Lal V, Sylaja PN, Narasimhan L, Dwivedi SN, Nair PP, Ramachandran D, Gupta A, Vishnu VY. Deducing differential diagnoses in movement disorders: Neurology residents versus a novel mobile medical application (Neurology Dx). Ann Mov Disord [serial online] 2019 [cited 2020 Apr 9];2:115-25. Available from: http://www.aomd.in/text.asp?2019/2/3/115/272284
Introduction
Artificial intelligence has found many applications in solving problems peculiar to movement disorders. Clinical presentations in movement disorders often elicit an overwhelmingly large number of differential diagnoses (DDx). Mastering the art of drawing up a complete list of relevant DDx from a constellation of patient attributes is a daunting task, especially for budding neurologists. A potentially treatable medical diagnosis may be missed for lack of experience, deficient deductive heuristics, or cognitive bias. Artificial intelligence, with its powerful algorithms, holds the promise of filling this gap in deductive reasoning. Apps for deriving differential diagnoses in movement disorders are lacking on mobile platforms. This study was designed to compare diagnostic accuracy between neurology residents and a novel app (created by the first and corresponding authors) in formulating a comprehensive list of DDx for a clinical scenario related to movement disorders.
Methods
A multicenter, cross-sectional study was designed, involving seven tertiary care neurology teaching centers across India (July 15 to August 15, 2017). The aim of the study was to assess the diagnostic accuracy of Neurology Dx vis-à-vis neurology residents by comparing their differentials with differentials reasoned by experts (gold standard differentials).
Inclusion criteria
- Neurology residents possessing a Doctor of Medicine (MD)/equivalent degree in internal medicine from a university/board recognized by the Medical Council of India (MCI), obtained after completing medical school (total 8 years).
- Registered as a licensed medical practitioner in the Indian Medical Register and currently pursuing a postdoctoral residency program in neurology.
- Formal consent to participate in the study.
- Relinquishing any access to phone/computer/books or Internet services while answering clinical vignettes under the supervision of two independent investigators.
Making a mobile medical app: The first and corresponding authors prepared an exhaustive list (knowledge base) of neurological diagnoses, symptoms, signs, and imaging findings from freely available resources. These disease characteristics served as unique data points. Associations between data points were defined by assigning a frequency to each pair of data points. The resulting knowledge base comprised more than 60,000 unique data points with assigned associations. A unique algorithm was then written to derive a list of differential diagnoses from the combination of user-selected patient attributes by using fuzzy logic. The knowledge base and algorithm were converted to an iOS app using the Swift programming language and uploaded to the App Store [Figure 1]. Each clinical query entered in the app generates up to 11 DDx, a number that was dictated by the limitation of mobile screen size. The final version (Neurology Dx, Version 2.1) was uploaded to the App Store on June 13, 2017. No changes were made to the knowledge base or algorithm of the app via further updates in the App Store after the commencement of this study, to prevent bias and ensure reproducibility of results. A detailed description of the creation of the app has already been published.
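The general idea of ranking diagnoses from pairwise associations can be illustrated with a minimal sketch. It is not the Neurology Dx algorithm, whose scoring rule is not described here: the toy knowledge base, the association values, the function name `rank_differentials`, and the choice of averaging association strengths over the selected attributes are all assumptions made for illustration. Only the ceiling of 11 differentials is taken from the text.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (diagnosis, attribute) -> association
# strength in [0, 1]. The values are invented for illustration only.
KNOWLEDGE_BASE = {
    ("Parkinson disease", "rest tremor"): 0.8,
    ("Parkinson disease", "bradykinesia"): 0.9,
    ("Essential tremor", "postural tremor"): 0.9,
    ("Essential tremor", "rest tremor"): 0.2,
    ("Wilson disease", "postural tremor"): 0.5,
    ("Wilson disease", "dystonia"): 0.6,
}

MAX_DDX = 11  # ceiling on listed differentials, as stated in the text


def rank_differentials(attributes, kb=KNOWLEDGE_BASE, limit=MAX_DDX):
    """Rank diagnoses by mean association with the selected attributes.

    Averaging is an assumed, simple aggregation rule; diagnoses with no
    association to any selected attribute are left out of the list.
    """
    selected = set(attributes)
    if not selected:
        return []
    totals = defaultdict(float)
    for (diagnosis, attribute), strength in kb.items():
        if attribute in selected:
            totals[diagnosis] += strength
    scored = [(dx, total / len(selected)) for dx, total in totals.items()]
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:limit]


if __name__ == "__main__":
    for dx, score in rank_differentials({"rest tremor", "bradykinesia"}):
        print(f"{dx}: {score:.2f}")
```

A graded score of this kind lets a diagnosis that matches only some of the entered attributes still appear lower in the list, which suits a tool meant to produce a broad differential rather than a single answer.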
Figure 1: UML (unified modeling language) component and activity diagram of the medical mobile application, app (Neurology Dx)
Seven tertiary care teaching neurology centers across India participated in the study [Figure 2]. Experts created 60 clinical vignettes related to movement disorders [Supplemental Methods S1]. These clinical vignettes were divided into four sets of 15 clinical vignettes each. Each set comprised clinical vignettes of varying difficulty: mild (5), moderate (5), and severe (5). Each resident was asked to attempt one set of these clinical vignettes.
Figure 2: Participating teaching neurology centers with number of residents/year of residency and the set of movement disorder clinical vignettes (MDCV series I–IV)
The residents were denied access to electronic databases, books, mobile phones, computers, and the Internet while attempting the clinical vignettes. They were asked to write down any number of DDx relevant to a particular clinical vignette in descending order of priority. The differentials to clinical vignettes were documented in duplicate, and any errors found were eliminated. The patient attributes from each clinical vignette were fed into the app to obtain app-generated differentials. Up to 11 DDx generated by the app for each clinical vignette were recorded. DDx given by the residents and generated by the app were later pooled for each clinical vignette and shown to experts. Experts were blinded to the source of the differentials in the pooled list and could add new differentials to the pool. Experts derived gold standard DDx for all the clinical vignettes and divided them into four groups: “high likely,” “moderate likely,” “less likely,” and “wrong” differentials. These gold standard differentials were independently verified by another expert in the field. Disagreements between experts with respect to gold standard differentials and their order of relevance were resolved by mutual consultation. These gold standard differentials formed the reference against which the DDx generated by neurology residents and the app were compared. The study protocol was prospectively registered in the Clinical Trial Registry of India (CTRI/2017/06/008838) and approved by the All India Institute of Medical Sciences (AIIMS) Institute Ethics Committee (IEC 241/05.05.2017).
The primary outcome was the diagnostic accuracy of the residents and the app, measured as the proportion of correctly identified high likely gold standard DDx. The secondary outcomes were the proportions of correctly identified first high likely, first three high likely, first five high likely, and combined moderate plus high likely gold standard differentials by the residents and the app. The prespecified subgroup analysis was diagnostic accuracy among first-, second-, and third-year senior residents. Post hoc subgroup analysis was also carried out based on the difficulty level of clinical vignettes (mild, moderate, and severe).
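One plausible way to compute such a proportion for a single vignette is sketched below. The function name `proportion_identified`, the choice of denominator (all gold standard high likely differentials for the vignette), the reading of "first k" as the first k entries of the respondent's ordered list, and the example diagnoses are assumptions for illustration; the exact scoring rules used in the study are not spelled out in this text.

```python
def _normalize(name):
    """Compare diagnosis names ignoring case and surrounding whitespace."""
    return name.strip().casefold()


def proportion_identified(answers, gold_high, top_k=None):
    """Fraction of gold standard 'high likely' differentials that appear
    in the respondent's ordered answers.

    answers   -- differentials in descending order of priority
    gold_high -- gold standard 'high likely' differentials for the vignette
    top_k     -- if given, only the first top_k answers are considered
    """
    if not gold_high:
        return 0.0
    considered = answers if top_k is None else answers[:top_k]
    considered_set = {_normalize(a) for a in considered}
    hits = sum(1 for dx in gold_high if _normalize(dx) in considered_set)
    return hits / len(gold_high)


if __name__ == "__main__":
    # Hypothetical vignette: four gold standard differentials, three answers.
    gold = ["Wilson disease", "Huntington disease",
            "Neuroacanthocytosis", "Drug-induced chorea"]
    given = ["Huntington disease", "Sydenham chorea", "Wilson disease"]
    print(proportion_identified(given, gold))
    print(proportion_identified(given, gold, top_k=1))
```

Under this reading, a respondent who lists few differentials is penalized when the gold standard list is long, which is relevant to the interpretation of the residents' scores.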
Statistical analysis and sample size
On the basis of a previous study comparing the diagnostic accuracy of physicians and computer algorithms, a sample size of 1500 (100 residents, 15 clinical vignettes each) movement disorder clinical vignettes (MDCVs) was estimated. The proportion of correctly identified differentials was compared between neurology residents and the app using a test of proportions, and the absolute difference with 95% confidence interval (CI) was reported.
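A comparison of two proportions with an absolute difference and 95% CI can be sketched as follows. The text does not state which interval or test variant was used, so the Wald interval and the pooled two-sided z-test shown here are assumptions, as are the function name `compare_proportions` and the counts in the usage example, which are not study data.

```python
import math


def compare_proportions(x1, n1, x2, n2, z_crit=1.959964):
    """Compare two independent proportions x1/n1 and x2/n2.

    Returns (difference, (ci_low, ci_high), p_value), where the difference
    is p2 - p1, the interval is a Wald 95% CI by default, and the p-value
    comes from a two-sided z-test with a pooled standard error.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p2 - p1

    # Unpooled standard error for the confidence interval.
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)

    # Pooled standard error under the null hypothesis of equal proportions.
    pooled = (x1 + x2) / (n1 + n2)
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se_null == 0:
        return diff, ci, 1.0
    z = diff / se_null
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, ci, p_value


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    diff, (low, high), p = compare_proportions(30, 100, 45, 100)
    print(f"difference = {diff:.3f}, 95% CI {low:.3f} to {high:.3f}, P = {p:.3f}")
```

An interval that includes zero corresponds to a nonsignificant difference at the 5% level, which is how the CIs reported in the Results can be read.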
Results
In total, 122 residents were found eligible for the study, of whom 22 did not participate (7 were on emergency duty, and the rest were on leave). Of the 22 who did not participate, 14 were third-year residents, 6 were second-year residents, and 2 were first-year residents. The remaining 100 residents participated in the study after providing informed consent. Neurology residents answered one set of 15 vignettes each. At the end of the study, a total of 1500 completed MDCVs were available [Figure 3]. Disease characteristics from all 60 movement disorder vignettes were entered into the app. The mean number of DDx generated per clinical vignette was 1 (range: 0–9) for residents and 10.68 (range: 1–11) for the app. The gold standard differentials were generated by experts (maximum high likely: 11, maximum moderate likely: 8, and maximum combined high and moderate likely: 14).
Residents correctly identified the gold standard high likely differentials with a frequency of 13.6% as against 41.5% by the app (95% CI: 21.9–34.1, P < 0.001) [Table 1]. On combining high and moderate likely differentials, residents could accurately identify gold standard differentials with a frequency of 10.8% as against 37.9% by the app (95% CI: 22.6–31.9, P < 0.001). Residents correctly identified the first high likely gold standard differential as their first high likely differential in 32.3% as against 35% by the app (95% CI: -8.4–15.6, P = 0.66). In correctly identifying the single first-ranked “high likely” gold standard differential as their first differential, residents showed a consistently improving pattern across their years in residency [Table 2], namely 26.2% by the first-year resident compared to 35% by the app (95% CI: -2.6–21.8, P = 0.14), 31.5% by the second-year resident compared to 35% by the app (95% CI: -8.2–16.7, P = 0.59), and 45.8% by the third-year resident compared to 35% by the app (95% CI: -2.9–22.9, P = 0.12). The diagnostic accuracy of the residents to identify high likely gold standard differentials among their first three differentials was 13.0% as against 15.8% by the app (95% CI: -1.2–7.9, P = 0.19). When the first five differentials were considered together, residents correctly identified the gold standard differentials with a frequency of 13.5% as against 23.7% by the app (95% CI: 5.3–15.9, P < 0.0001).
Table 1: Neurology residents versus app (Neurology Dx): evaluation of differential diagnoses of movement disorder clinical vignettes
Table 2: Prespecified subgroup analysis of differential diagnoses generated on movement disorder clinical vignettes based on the year of neurology residency
When performance was compared specifically for first high likely and first three high likely differentials, no difference was observed between the residents and the app [Table 1]. Further, when comparisons were made between first-year residents and the app, no difference was observed in first high likely differentials [Table 2]. Similar comparisons between second-year residents and the app showed no difference in performance for first high likely and first three high likely differentials. Comparisons between third-year residents and the app showed that the residents outperformed the app for first three high likely differentials, whereas no difference was observed for first high likely and first five high likely differentials. A post hoc subgroup analysis comparing the differentials reasoned by all neurology residents and the app, based on the difficulty grade of the clinical vignettes, was also performed [Supplemental Table S1]. For clinical vignettes graded with a difficulty level of “severe,” no difference in performance was observed for first three high likely differentials between the residents and the app. Similar comparisons for “moderate” difficulty clinical vignettes showed no difference in performance for first high likely and first three high likely differentials. When only “mild” difficulty clinical vignettes were compared, no difference in performance was observed between the residents and the app for first high likely, first three high likely, or first five high likely differentials.
Discussion
Computer programs are formidable tools to close gaps in knowledge and cognitive reasoning in clinical practice. Synergistic effects have been noted in prior publications when humans and artificial intelligence tools were compared in interpreting clinical data or discerning the correct diagnosis from histopathology slides, skin lesions, and diabetic retinopathy. In one such prospective trial involving primary care clinics, an independent algorithm (IDx) detected retinopathy in diabetics with a sensitivity and specificity of 87% and 91%, respectively. This trial led to the Food and Drug Administration approval of the algorithm (IDx) for the detection of diabetic retinopathy without human intervention.
Diagnostic errors have been defined as diagnoses that are missed, wrong, or delayed, as detected by some subsequent definitive test or finding. Drawing DDx from clinical scenarios is prone to errors arising out of cognitive bias, incorrect reasoning, and lack of sufficient knowledge. Studies have shown that computer-based clinical decision support systems can often mitigate these errors. According to a recently published systematic review, DDx generators have the potential to improve diagnostic practice among clinicians.
We report that an artificial intelligence–based mobile platform application (app) can derive DDx to movement disorder clinical scenarios with performance metrics superior to that offered by trainee neurologists. All residents participating in the study were enrolled in postdoctoral neurology residency programs in the participating centers, after having earned their MD in internal medicine (a minimum of 8 years in medical school), and were registered with the Indian Medical Register as licensed medical practitioners. The app generated accurate gold standard differentials with greater frequency for a given clinical vignette compared to the neurology residents across years in residency and difficulty level of clinical vignettes.
The performance among residents improved consistently from first- through third-year residency as well as with decreasing difficulty of the clinical vignettes [Table 2] and [Supplemental Table S1]. The third-year residents significantly outperformed the app in getting the first three high likely differentials of the gold standard right. Although the third-year residents performed better than the app for the first high likely differential, this difference in performance was not significant. The residents consistently performed better compared to the app as the difficulty of the clinical vignettes dropped from severe to mild.
Supplemental Table S1: Post hoc subgroup analysis of differential diagnoses generated on movement disorder clinical vignettes based on the level of difficulty of vignettes
The differentials given by the residents factor in the regional prevalence of the derived differentials, the resident's prior clinical experience, and the presence of potentially treatable differentials in the derived list. The app algorithm focuses on generating a broad differential list, which is often the initial goal of clinical evaluation, rather than a single correct diagnosis. The clinical vignettes were intentionally designed to elicit a broad list of DDx rather than a single correct diagnosis. Consequently, experts also prepared a list of gold standard DDx in order of priority for each clinical vignette, instead of a single correct diagnosis. Drawing up a comprehensive list of DDx for a given clinical scenario in time-constrained settings can be a challenge for most budding neurologists. Clinical ward rounds mitigate this challenge through collective cognitive reasoning and discussion (residents with consultants), a pooled knowledge base, and ready access to the Internet for quick literature searches. Experts backed by many years of clinical experience carried out extensive discussions without constraints of time before finalizing the gold standard differentials. A large number of gold standard differentials were arrived at for a given clinical vignette. Expressing diagnostic accuracy as a percentage of this comprehensive list of gold standard differentials may explain the low accuracy of the participating residents, who were limited by time and had access to neither clinical discussions nor external sources of knowledge while solving the clinical vignettes. In addition, the low representation of final-year residents (22%) may explain the lower performance among the residents. The app, in contrast, was not limited by the aforementioned human constraints on performance. A fast processing speed, a large knowledge base curated by academic neurologists, and a novel algorithm helped it to outperform the residents.
When differentials to 60 cognitive neurology vignettes were compared between the same group of residents and Neurology Dx, the app performed consistently better. Cognitive neurology might have a steeper learning curve for neurology residents, especially in the initial years of training. Movement disorders are different from cognitive neurology due to overtly manifest signs. Neurology residency training in India, at least in these seven participating centers, is probably skewed toward movement disorder cases as compared to cognitive neurology.
The app offered the unique advantage of portability on a mobile platform and full functionality in areas without Internet access. It, in many instances, was also able to list relevant DDx that were not considered by residents. These features can be exploited to complement human reasoning while deriving DDx in time-constrained settings at the point of care. For every derived DDx, the app also fetched links to published case series, case reports, and review articles from indexed medical databases (subject to availability of Internet). These links could be useful to quickly verify correctness of DDx derived by the app at the point of care.
Limitations: The app had a limited set of symptoms, signs, radiological findings, and lab values to choose from while entering data from clinical vignettes. On occasion, exact patient attributes in the clinical vignettes were missing from the app, necessitating the use of the closest keywords approximating the symptoms. The app also generated more wrong differentials compared to residents, a limitation of most predefined knowledge-based medical applications. The app was further at a disadvantage as an arbitrary ceiling of the first 11 differentials was enforced on it (due to screen size limitation). The onset and course of presentation in each clinical vignette were obvious to the participating residents and served as crucial information in pinpointing neurological localization before formulating DDx. The app, however, was not designed to factor the onset and course of clinical presentation or localization in the neuraxis into its algorithm. The app, with its curated knowledge base and algorithm, was also frozen before commencement of the study to prevent bias and to ensure reproducibility of results in future. These limitations could be alleviated by subject experts constantly updating the knowledge base of the app and its algorithm. The first and corresponding authors have already created a new app (Neurology Pro) with an updated knowledge base and a completely redesigned algorithm to overcome the limitations brought out in this study. The new algorithm features machine-learning characteristics wherein users can teach the algorithm in real time by reordering the DDx priority and by adding missing DDx.
Conclusion
The twenty-first century has seen an explosion in the use of artificial intelligence in the field of medicine. The vast number of DDx derived clinically and based on phenomenology in movement disorders makes it an ideal field for the application of artificial intelligence in diagnostic neurology. The results shown in this study hold the promise of paving the way for the creation of apps in the field of movement disorders with better accuracy. The limitations of the app highlighted in this study can be corrected in future updates to improve its performance. We expect the predictive heuristics provided by the app in formulating DDx to be relevant to trainee and practicing neurologists as an adjunctive tool for clinical decision making. A complementary relationship between neurologists and Neurology Dx® can be one of the many answers in reducing diagnostic errors in clinical practice. We are confident that algorithms of tomorrow will be able to meaningfully ingest and process massive amounts of clinical data to usher in an era of symbiotic relationship between man and his machines.
All relevant data are within the article and its online supplement files. Any further information is available on personal communication with corresponding author.
We would like to thank Ms. Abhilasha Krishnamurthy (December Design Studio) for the illustrations in the article.
Financial support and sponsorship
Travel and stay expenses of the principal investigator (PI) (VVY), who went to each center to conduct the study with the coinvestigator from that center, were borne by the PI himself. No external funding was received.
VPW: Copyright owner of App is spouse of VPW. Dr. RR reports no disclosures. Dr. VG reports no disclosures. Dr. MVP reports no disclosures. Dr. LV reports no disclosures. Dr. SPN reports no disclosures. Dr. LN reports no disclosures. Dr. NPP reports no disclosures. Dr. RD reports no disclosures. Dr. GA reports no disclosures. Dr. VVY reports no disclosures.
Conflicts of interest
There are no conflicts of interest.
References
LeMoyne R, Mastroianni T. Use of smartphones and portable media devices for quantifying human movement characteristics of gait, tendon reflex response, and Parkinson’s disease hand tremor. Methods Mol Biol 2015;1256:335-58.
Lee CY, Kang SJ, Hong SK, Ma HI, Lee U, Kim YJ. A validation study of a smartphone-based finger tapping application for quantitative assessment of bradykinesia in Parkinson’s disease. PLoS One 2016;11:e0158852.
Balla J, Heneghan C, Goyder C, Thompson M. Identifying early warning signs for diagnostic errors in primary care: A qualitative study. BMJ Open 2012:pii: e001539
Berner ES, Graber ML. Overconfidence as a cause of diagnostic error in medicine. Am J Med 2008;121:S2-23.
Kostopoulou O, Oudhoff J, Nath R, Delaney BC, Munro CW, Harries C, et al. Predictors of diagnostic accuracy and safe management in difficult diagnostic problems in family medicine. Med Decis Making 2008;28:668-80.
Graber ML, Kissam S, Payne VL, Meyer AN, Sorensen A, Lenfestey N, et al. Cognitive interventions to reduce diagnostic error: A narrative review. BMJ Qual Saf 2012;21:535-57.
Rajkomar A, Dean J, Kohane I. Machine learning in medicine. N Engl J Med 2019;380:1347-58.
Cohen AB, Nahed BV, Sheth KN. Mobile medical applications in neurology. Neurol Clin Pract 2013;3:52-60.
Alotaibi NM, Sarzetto F, Guha D, Lu M, Bodo A, Gupta S, et al. Impact of smartphone applications on timing of endovascular therapy for ischemic stroke: A preliminary study. World Neurosurg 2017;107:678-83.
Neurology Dx. Available from: https://apps.apple.com/us/app/neurology-dx/id1224462861. [Last accessed on 27 November 2019].
Vinny PW, Gupta A, Modi M, Srivastava MVP, Lal V, Sylaja PN, et al. Head to head comparison between neurology residents and a mobile medical application for diagnostic accuracy in cognitive neurology. QJM 2019;112:591-8.
Semigran HL, Levine DM, Nundy S, Mehrotra A. Comparison of physician and computer diagnostic accuracy. JAMA Intern Med 2016;176:1860-1.
Topol EJ. High-performance medicine: The convergence of human and artificial intelligence. Nat Med 2019;25:44-56.
Steiner DF, MacDonald R, Liu Y, Truszkowski P, Hipp JD, Gammage C, et al. Impact of deep learning assistance on the histopathologic review of lymph nodes for metastatic breast cancer. Am J Surg Pathol 2018;42:1636-46.
Haenssle HA, Fink C, Schneiderbauer R, Toberer F, Buhl T, Blum A, et al; Reader study level-I and level-II Groups. Man against machine: Diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann Oncol 2018;29:1836-42.
De Fauw J, Ledsam JR, Romera-Paredes B, Nikolov S, Tomasev N, Blackwell S, et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med 2018;24:1342-50.
Keane PA, Topol EJ. With an eye to AI and autonomous diagnosis. NPJ Digit Med 2018;1:40.
Abràmoff MD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med 2018;1:39.
Meyer AN, Payne VL, Meeks DW, Rao R, Singh H. Physicians’ diagnostic accuracy, confidence, and resource requests: A vignette study. JAMA Intern Med 2013;173:1952-8.
Yuen T, Derenge D, Kalman N. Cognitive bias: Its influence on clinical diagnosis. J Fam Pract 2018;67:366, 368, 370, 372.
Riches N, Panagioti M, Alam R, Cheraghi-Sohi S, Campbell S, Esmail A, et al. The effectiveness of electronic differential diagnoses (DDX) generators: A systematic review and meta-analysis. PLoS One 2016;11:e0148991.
Neurology Pro. Available from: https://apps.apple.com/us/app/neurology-pro-a-ddx-app/id1407300412. [Last accessed on 27 November 2019].