Iterative Label Spreading (ILS) is an unsupervised learning algorithm that addresses the challenges of applying clustering methods appropriately and effectively to high-dimensional scientific data. It was developed specifically for small (<10e5) materials science data sets and is based on a general definition of a cluster and of cluster result quality. Even in simple two-dimensional cases, where clustering results can be inspected visually, common clustering methods (including k-Means clustering, Ward hierarchical clustering, and DBSCAN) can fail to give the expected result. ILS, on the other hand, can be used both to perform clustering and to assess a clustering result found by any method, and it does not require pre-defined hyperparameters. The trade-off is scaling, which is limited by the iterative nature of the algorithm. The goal of this project is to combine agglomerative clustering with ILS to improve scaling while maintaining the integrity and quality of the clustering result.
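As a starting point, the behaviour of the three common methods named above can be checked on a small two-dimensional data set. The following is a minimal sketch using scikit-learn, not part of the ILS software: the two-moons data set, the helper name `compare_baselines`, and every hyperparameter value (`n_clusters`, `eps`, `min_samples`, the noise level) are assumptions chosen for illustration only. It scores each method against the known labels with the adjusted Rand index.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score


def compare_baselines(n_samples=300, seed=0):
    """Run three common clustering methods on a 2D toy data set.

    Hypothetical helper for illustration; returns a dict mapping each
    method name to its adjusted Rand index against the true labels.
    """
    # Two interleaved half-circles: easy to inspect visually in 2D.
    X, y_true = make_moons(n_samples=n_samples, noise=0.05, random_state=seed)

    # Every value below is an assumed, hand-picked hyperparameter.
    models = {
        "k-means": KMeans(n_clusters=2, n_init=10, random_state=seed),
        "ward": AgglomerativeClustering(n_clusters=2, linkage="ward"),
        "dbscan": DBSCAN(eps=0.2, min_samples=5),
    }

    scores = {}
    for name, model in models.items():
        labels = model.fit_predict(X)
        scores[name] = adjusted_rand_score(y_true, labels)
    return scores


if __name__ == "__main__":
    for name, score in compare_baselines().items():
        print(f"{name}: adjusted Rand index = {score:.2f}")
```

Each of these methods needs at least one hyperparameter to be chosen in advance (the number of clusters, or `eps` and `min_samples`), and the scores depend on those choices. A comparison of this kind is only possible here because the true labels are known, which is not the case for the unlabelled scientific data the project targets.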
The primary supervisor for this project is Dr Amanda Parker, who can be contacted at email@example.com.
To create a new version of the ILS software and profile the improvements in performance.
Python programming skills and experience in data science and machine learning (such as from COMP3720, COMP4660, COMP4670, COMP6670, or COMP8420) are essential. Familiarity with libraries such as scikit-learn is desirable.
This can be a 12cp or a 24cp project.
machine learning, clustering, software engineering, data science