Cluster detection and analysis with geo-spatial datasets using a hybrid statistical and neural networks hierarchical approach

  • Salar Majeed

    Student thesis: Doctoral Thesis


    Spatial datasets contain information relating to the locations of incidents of phenomena such as crime and disease. Areas that contain a higher than expected incidence of the phenomenon, given background population and census datasets, are of particular interest. By analysing the locations of potential influence, it may be possible to establish where a cause and effect relationship is present in the observed process. Cluster detection techniques can be applied to such datasets in order to reveal information relating to the spatial distribution of the cases. Research in these areas has mainly concentrated on either computational or statistical aspects of cluster detection. Each clustering algorithm has its own strengths and weaknesses. The main weaknesses that make existing algorithms unreliable are estimating the number of clusters, testing the number of components, selecting initial seeds (centroids), running time and memory requirements. Consequently, a new cluster detection methodology has been developed in this thesis based on knowledge drawn from both the statistical and computing domains. This methodology is based on a hybrid of statistical methods that uses properties of probability rather than distance to associate data with clusters. No previous knowledge of the dataset is required and the number of clusters is not predetermined. It performs efficiently in terms of memory requirements, running time and cluster quality. The algorithm for determining both the centres of clusters and the existence of the clusters themselves was applied and tested on simulated and real datasets. The results obtained from the identification of hotspots were compared with the results of other available algorithms such as CLAP (Cluster Location Analysis Procedure), SaTScan and GAM (Geographical Analysis Machine). The outputs are very similar.
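    The idea of a "higher than expected incidence given the background population" can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes that case counts in each area follow a Poisson distribution whose mean is proportional to the area's population, and it flags areas whose observed count is improbably high under that assumption. The function names (`poisson_tail`, `flag_hotspots`), the threshold `alpha` and the area data are hypothetical and chosen only for illustration.

```python
import math


def poisson_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # Accumulate P(X = k) for k = 0 .. observed - 1, then take the complement.
    term = math.exp(-expected)
    cdf = term
    for k in range(1, observed):
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)


def flag_hotspots(areas, alpha=0.01):
    """Flag areas whose case count is unusually high for their population.

    Assumption (not from the thesis): a single region-wide rate applies to
    every area, so expected cases = overall rate * area population.
    """
    total_cases = sum(a["cases"] for a in areas)
    total_pop = sum(a["population"] for a in areas)
    rate = total_cases / total_pop
    flagged = []
    for a in areas:
        expected = rate * a["population"]
        p_value = poisson_tail(a["cases"], expected)
        if p_value < alpha:
            flagged.append((a["name"], expected, p_value))
    return flagged


# Made-up illustrative data: area C has far more cases than its population suggests.
areas = [
    {"name": "A", "population": 1000, "cases": 10},
    {"name": "B", "population": 1200, "cases": 11},
    {"name": "C", "population": 900, "cases": 30},
    {"name": "D", "population": 1100, "cases": 9},
]

for name, expected, p_value in flag_hotspots(areas):
    print(f"{name}: expected {expected:.1f} cases, tail probability {p_value:.2g}")
```

    A probability-based criterion of this kind does not need the number of clusters to be fixed in advance, which is the property the abstract emphasises. The sketch tests each area independently and ignores spatial adjacency and multiple testing.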
    The GIS presented in this thesis combines the SCS algorithm, statistics and neural networks to develop a hybrid predictive crime model, to map and visualize crime data and the corresponding population in the study region, and to visualize the locations of the clusters and burglary incidence concentrations ('hotspots') identified by the SCS clustering algorithm. Naturally, the quality of the results depends on the accuracy of the data used. GIS is used in this thesis to develop a methodology for modelling data containing multiple functions. The census data used throughout this construction provided a useful source of geo-demographic information. The resulting datasets were used for predictive crime modelling. This thesis has drawn on several existing methodologies to develop a hybrid modelling approach. The methodology was applied to real data on the distribution of burglary incidence in the study region. Relevant principles of statistics, Geographical Information Systems, neural networks and the SCS algorithm were used to analyse the observed data. Regression analysis was used to build a predictive crime model and was combined with neural networks with the aim of developing a new hierarchical neural network approach that generates more reliable predictions. The promising results were compared with those of a non-hierarchical back-propagation (BP) neural network and of multiple regression analysis. The average percentage accuracy achieved by the new methodology at the testing stage increased by 13% compared with the non-hierarchical BP performance. In general, the analysis reveals a number of predictors that increase the risk of burglary in the study region. Specifically, the risk is higher for 'one person' and 'lone parent' households and for households whose occupants are unemployed or work in elementary or intermediate occupations. With regard to household space, the results indicate that the risk of burglary is higher for households living in shared houses.
    Date of Award: Mar 2010
    Original language: English
    Supervisor: Ian Wilson (Supervisor), Jamal Ameen (Supervisor) & Andrew Ware (Supervisor)


    • Cluster analysis
