Natural Language Processing as a Tool in Supporting Clinical Decision-Making

  • Laurence Jones

    Student thesis: Doctoral Thesis


    While the amount of unstructured text data continues to grow within the clinical domain, little modelling is carried out in comparison to other industries. My research goal in this thesis is to present machine learning models that can effectively discern the relationships within medical notes, tying symptoms and other elements to an associated medical speciality.

    Many studies in the clinical domain have used natural language processing successfully for document classification. However, the proposed solutions often rely on an external medical dictionary to annotate the data. My goal is to develop a classifier that shows these relationships can be extracted directly from the original, unstructured text. Furthermore, published research in this area typically focuses on a single approach, whether that is the method of feature generation or the specific machine learning model chosen for the task.

    The results shown in this thesis address this issue by providing a comparative demonstration of multiple feature generation methods alongside a range of traditional machine learning and neural network-based models for classification. Lastly, existing research encounters issues with the procurement of suitable medical data, often defaulting to datasets that have been curated for a specific task. This research instead uses real patient data from Digital Health and Care Wales (DHCW), selected randomly from cases between 2018 and 2019.

    The results show that frequency-based feature generation performed substantially better than word embeddings when paired with a traditional machine learning model such as logistic regression. However, using word embeddings with a neural network architecture yielded more comparable results. Among the machine learning models themselves, the support vector machine (91%) and two transformer deep learning models (93%) produced the best results.
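
    To make the idea of frequency-based feature generation concrete, the following is a minimal sketch using only the Python standard library. It is not the thesis's implementation: the abstract does not name the specific frequency-based method, so TF-IDF with a smoothed inverse document frequency is assumed here as one common example. The notes in `TOY_NOTES` and the helper names `tfidf_vectors` and `cosine` are invented for illustration and are unrelated to the DHCW data.

```python
import math
from collections import Counter

# Invented toy notes for illustration only; they are NOT from the thesis data.
TOY_NOTES = [
    "chest pain shortness of breath",   # hypothetical cardiology-style note
    "irregular heartbeat chest pain",   # hypothetical cardiology-style note
    "itchy rash dry skin",              # hypothetical dermatology-style note
    "red rash itchy skin forearm",      # hypothetical dermatology-style note
]


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for a list of strings."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of notes containing each term at least once.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency (one common variant; an assumption here).
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        raw = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0
        vectors.append([x / norm for x in raw])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two vectors that are already L2-normalised."""
    return sum(a * b for a, b in zip(u, v))


vocab, vectors = tfidf_vectors(TOY_NOTES)
same_specialty = cosine(vectors[0], vectors[1])
different_specialty = cosine(vectors[0], vectors[2])
print(f"vocabulary size: {len(vocab)}")
print(f"similar notes: {same_specialty:.3f}, dissimilar notes: {different_specialty:.3f}")
```

    In this toy setting, the two notes that share terms score a higher similarity than the pair with no terms in common. Vectors of this kind could then be passed to a classifier such as logistic regression or a support vector machine, which is the sort of pairing the abstract compares against word embeddings.
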
    Date of Award: 2023
    Original language: English
    Sponsors: Digital Health and Care Wales, NHS Wales
    Supervisor: Ian Wilson (Supervisor), Andrew Ware (Supervisor) & Penny Holborn (Supervisor)
