Determining Geographical Causal Relationships through the Development of Spatial Cluster Detection and Feature Selection Techniques

    Student thesis: Doctoral Thesis


    Spatial datasets contain information about the locations of incidents of a disease or other phenomenon. Appropriate analysis of such datasets can reveal how cases of the phenomenon are distributed. Areas with a higher-than-expected incidence, given the background population, are of particular interest. Such clusters of cases may be influenced by external factors. By analysing the locations of potential influences, it may be possible to establish whether a cause-and-effect relationship is present within the dataset.
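    One standard way to make "higher than expected" precise is to compare the observed count O_i in area i with the count E_i expected from that area's population P_i. Here C is the total number of cases and P the total population of the study region, and O_i is treated as Poisson-distributed under the null hypothesis of no clustering. This formulation is assumed for illustration; it is not necessarily the statistic used by CLAP.

```latex
E_i = P_i \,\frac{C}{P}, \qquad
O_i \sim \mathrm{Poisson}(E_i) \ \text{under } H_0, \qquad
p_i = \Pr(X \ge O_i) = 1 - \sum_{k=0}^{O_i - 1} \frac{e^{-E_i}\,E_i^{k}}{k!}
```

    A small p_i marks an area whose incidence is unlikely to arise from the background population alone.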

    This thesis describes research that has led to the development and application of cluster detection and feature selection techniques to determine whether causal relationships are present within generic spatial datasets. The techniques are described and demonstrated, and their effectiveness is established by testing them on synthetic datasets. The techniques are then applied to a dataset supplied by the Welsh Leukaemia Registry that details all cases of leukaemia diagnosed in Wales between 1990 and 2000.

    Cluster detection techniques can be used to provide information about case distribution. A novel technique, CLAP, has been developed that scans the study region and assesses the statistical significance of the incidence levels in specific areas.
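    The scanning idea can be illustrated with a simple circular scan in the spirit of established scan statistics. This is a minimal sketch and not the CLAP algorithm itself. The hypothetical Area record, the use of area centroids as circle centres, the fixed list of radii and the Poisson tail probability are all assumptions made for illustration. The usage at the end plants an excess of cases in a synthetic grid, echoing the idea of testing on synthetic data.

```python
import math
from dataclasses import dataclass


@dataclass
class Area:
    """Hypothetical record: centroid coordinates, case count and population."""
    x: float
    y: float
    cases: int
    population: int


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k  # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def scan_circles(areas, radii):
    """Score every circle (centred on an area centroid) by its Poisson tail probability.

    Returns (centre_index, radius, observed, expected, p_value) tuples,
    most significant first. The p-values are NOT corrected for multiple testing.
    """
    rate = sum(a.cases for a in areas) / sum(a.population for a in areas)
    results = []
    for i, centre in enumerate(areas):
        for r in radii:
            inside = [a for a in areas
                      if math.hypot(a.x - centre.x, a.y - centre.y) <= r]
            observed = sum(a.cases for a in inside)
            expected = rate * sum(a.population for a in inside)
            if expected == 0:
                continue  # circle contains no population at risk
            results.append((i, r, observed, expected,
                            poisson_upper_tail(observed, expected)))
    results.sort(key=lambda t: t[4])
    return results


# Synthetic check: 5 x 5 grid of equal-population areas with a planted excess at the centre.
grid = [Area(float(x), float(y), 2, 1000) for y in range(5) for x in range(5)]
grid[12] = Area(2.0, 2.0, 20, 1000)  # centre cell (x=2, y=2)
ranked = scan_circles(grid, radii=[0.5, 1.0])
best_centre, best_radius, observed, expected, p_value = ranked[0]
```

    Because many overlapping circles are tested, the raw p-values overstate significance. In practice a multiple-testing correction or a Monte Carlo comparison would be needed before declaring any circle a cluster.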

    Feature selection techniques can be used to identify the extent to which a set of inputs affects a given output. Results from CLAP are combined with the locations of potential causal factors to form a numerical dataset that can be analysed using feature selection techniques. Both established techniques and a newly developed technique are used for the analysis. The results allow conclusions to be drawn about whether geographical causal relationships are apparent.
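    The step of turning cluster results into a numerical dataset can be sketched as follows. The sketch assumes one row per area, one column per type of potential causal factor holding the distance from the area to the nearest site of that type, and a target column holding a per-area cluster score (for example, minus log10 of a p-value). The factor names and the correlation-based filter ranking are hypothetical stand-ins. They are not the established or newly developed techniques used in the thesis.

```python
import math


def nearest_distance(point, sites):
    """Euclidean distance from point (x, y) to the closest site in sites."""
    return min(math.hypot(point[0] - sx, point[1] - sy) for sx, sy in sites)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def build_dataset(area_points, scores, factor_sites):
    """One row per area: distance to the nearest site of each (hypothetical) factor type."""
    names = sorted(factor_sites)
    X = [[nearest_distance(p, factor_sites[name]) for name in names]
         for p in area_points]
    return names, X, list(scores)


def rank_features(names, X, y):
    """Filter-style ranking: order factors by |correlation| with the cluster score."""
    ranked = []
    for j, name in enumerate(names):
        column = [row[j] for row in X]
        ranked.append((name, pearson(column, y)))
    ranked.sort(key=lambda t: abs(t[1]), reverse=True)
    return ranked


# Toy usage: five areas on a line; cluster scores fall with distance from x = 0.
points = [(float(x), 0.0) for x in range(5)]
scores = [5.0, 4.0, 3.0, 2.0, 1.0]
sites = {"site_near": [(0.0, 0.0)], "site_far": [(2.0, 10.0)]}
names, X, y = build_dataset(points, scores, sites)
ranking = rank_features(names, X, y)
```

    A filter method like this scores each factor independently. A strong negative correlation (scores rising as distance falls) would flag a factor for closer study; it would not by itself demonstrate causation.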
    Date of Award: 2006
    Original language: English
    Supervisors: Ian Wilson, Dave Kidder and Gary Higgs


    • Cluster analysis
