Dendrogram seriation in data visualisation: algorithms and applications
Earle, Denise (2010) Dendrogram seriation in data visualisation: algorithms and applications. PhD thesis, National University of Ireland Maynooth.
Seriation is a data analytic tool for obtaining a permutation of a set of objects with the goal of revealing structural information within the set of objects. The purpose of this thesis is to investigate and develop tools for seriation with the goal of using these tools to enhance data visualisation. The particular focus of this thesis is on dendrogram seriation algorithms. A dendrogram is a tree-like structure used for visualising the results of a hierarchical clustering, and the order of the leaves in a dendrogram provides a permutation of a set of objects. Dendrogram seriation algorithms rearrange the leaves of a dendrogram in order to find a permutation that optimises a given criterion. Dendrogram seriation algorithms are widely used; however, the research in this area is often confusing because of inconsistent or inadequate terminology. This thesis proposes new notation and terminology with the goal of better understanding and comparing dendrogram seriation algorithms. Seriation criteria measure the goodness of a permutation of a set of objects. Popular seriation criteria include the path length of a permutation and measures of anti-Robinson form in a symmetric matrix. This thesis proposes two new seriation criteria, lazy path length and banded anti-Robinson form, and demonstrates their effectiveness in improving a variety of visualisations. The main contribution of this thesis is a new dendrogram seriation algorithm. This algorithm improves on other dendrogram seriation algorithms and is also flexible because it allows the user either to choose from a variety of seriation criteria, including the new criteria mentioned above, or to input their own criteria. Finally, this thesis performs a comparison of several seriation algorithms, the results of which show that the proposed algorithm performs competitively against other algorithms. This leads to a set of general guidelines for choosing the most appropriate seriation algorithm for different seriation interests and visualisation settings.
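As a rough illustration of the two popular criteria named in the abstract, the following Python sketch computes the path length of a permutation and counts anti-Robinson violations in a reordered symmetric dissimilarity matrix. The function names, the toy distance matrix and the simple unweighted violation count are assumptions made for illustration. They are not taken from the thesis, and the sketch does not implement the proposed lazy path length or banded anti-Robinson criteria.

```python
def path_length(dist, order):
    """Sum of dissimilarities between consecutive objects in the permutation."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def anti_robinson_violations(dist, order):
    """Count adjacent pairs in each row that break anti-Robinson form.

    A symmetric dissimilarity matrix is in anti-Robinson form when entries
    never decrease while moving away from the main diagonal along a row.
    This is a simple unweighted count (an illustrative assumption).
    """
    n = len(order)
    # Reorder rows and columns of the matrix according to the permutation.
    d = [[dist[i][j] for j in order] for i in order]
    violations = 0
    for i in range(n):
        # Moving right, away from the diagonal: values should not decrease.
        for j in range(i + 1, n - 1):
            if d[i][j] > d[i][j + 1]:
                violations += 1
        # Moving left, away from the diagonal: values should not decrease.
        for j in range(1, i):
            if d[i][j] > d[i][j - 1]:
                violations += 1
    return violations


# Toy data (hypothetical): four points on a line at positions 0, 1, 2, 3.
positions = [0, 1, 2, 3]
dist = [[abs(p - q) for q in positions] for p in positions]

for order in ([0, 1, 2, 3], [0, 2, 1, 3]):
    print(order, path_length(dist, order), anti_robinson_violations(dist, order))
```

Under both criteria a lower value indicates a better ordering, so the natural ordering of the points on the line scores better than the shuffled one.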