Skip to yearly menu bar Skip to main content

Affinity Workshop: Women in Machine Learning

Can deep learning models understand natural language descriptions of patient symptoms following cataract surgery?

Mohita Chowdhury · Oliver Gardiner · Ernest Lim · Aisling Higham · Nick de Pennington


Demands on the healthcare system are rising, and healthcare staff are a finite resource. This study set out to explore which machine learning techniques are best able to understand clinical conversations and enable automation of activity previously performed by clinical staff. We used patient descriptions of their symptoms recorded during a cataract surgery follow-up conversation with our autonomous clinical assistant, Dora, which asks patients symptom-based questions to elicit postoperative concerns. We compare the ability of different machine learning techniques to understand patients’ descriptions of their symptoms and show how state-of-the-art natural language classifiers have the ability to understand routine patient intents and benefit clinical medicine. The training and test datasets were collected from two non-overlapping patient populations who used Dora as part of ethically approved research studies across 3 diverse UK hospitals. The training dataset was augmented with members of the public using the Dora platform. All participants consented to their data being used and the data was fully anonymised. The datasets consist of transcribed utterances of patients describing their symptoms in response to questions like “is your eye red?”. Each utterance was labelled with an intent from a total of 24 different intents. Two ophthalmologists independently labelled the dataset and resolved conflicts to establish ground truth labels. We compared 5 different deep learning and traditional machine learning models; the optimal hyperparameters for each were determined using a grid search and 4-fold cross-validation on the training set. The models were then trained on the entire training dataset and their performance was evaluated on the test set. The best performing model was the Dual Intent and Entity Transformer (DIET) classifier using word embeddings from BERT.

Chat is not available.