Feature extraction is an important part of data-driven design, but it can be time-consuming when done manually. This is particularly challenging in areas such as computational nanotechnology, where a detailed description of the atomic structure is essential for predicting functional properties relevant to catalysis and sensing. The structure of the surface determines energy efficiency, chemical sensitivity and potential toxicity, but under certain conditions surface structures become indistinguishable and the predictive capabilities of most models break down. Using machine learning to determine the geometric features of each atom in a nanoparticle would reduce the need for manual characterisation and allow more accurate and reliable analysis of its properties.
Iterative label spreading (ILS) is an unsupervised machine learning algorithm specifically suited to clustering materials science data, including nanoparticle structures. Preliminary results show that ILS can distinguish inner (bulk) atoms from surface atoms based on structural features, but significant challenges remain. ILS no longer recognises this separation of clusters when a disordered nanoparticle is investigated, and the size, shape and level of disorder at which this breakdown occurs are unknown. It is also possible, though not yet established, that ILS can distinguish more detailed characteristics, such as different surface orientations, edges, corners and surface defects.
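To make "structural features" concrete, the sketch below computes one common per-atom descriptor, the coordination number, for an idealised gold cube and flags under-coordinated atoms as surface atoms. It is not ILS and not the provided feature-extraction pipeline. The helper names `fcc_cube` and `coordination_numbers` are hypothetical, the lattice constant (4.08 Å, approximately that of gold) and the 3.2 Å neighbour cutoff are assumptions, and the fixed "fewer than 12 neighbours" rule is a hand-set baseline. An unsupervised method such as ILS would instead be expected to find this kind of separation from the features without a manually chosen threshold.

```python
import numpy as np


def fcc_cube(n_cells: int, a: float = 4.08) -> np.ndarray:
    """Hypothetical test structure: atom positions (in Å) of an n x n x n block of FCC unit cells."""
    # Four-atom conventional FCC basis, in units of the lattice constant
    basis = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]])
    cells = np.array(
        [[i, j, k] for i in range(n_cells) for j in range(n_cells) for k in range(n_cells)],
        dtype=float,
    )
    # Add every basis vector to every cell origin, then scale to Å
    return (cells[:, None, :] + basis[None, :, :]).reshape(-1, 3) * a


def coordination_numbers(positions: np.ndarray, cutoff: float = 3.2) -> np.ndarray:
    """Count neighbours of each atom within `cutoff` Å (assumed cutoff, between the 1st and 2nd FCC shells)."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Exclude each atom's zero distance to itself
    return ((dist > 0.0) & (dist < cutoff)).sum(axis=1)


positions = fcc_cube(3)
cn = coordination_numbers(positions)
# Atoms in a perfect FCC bulk have 12 nearest neighbours; fewer suggests a surface, edge or corner site
is_surface = cn < 12
print(len(positions), cn.min(), cn.max(), int(is_surface.sum()))
```

In this idealised cube, corner atoms have the lowest coordination and interior atoms the full 12, so a simple threshold suffices. Real and disordered nanoparticles do not produce such clean values, which is the situation the project asks ILS to handle.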
In this project, you will explore the ability of ILS to distinguish the geometric features of gold nanoparticles across a range of sizes, shapes and levels of disorder. All relevant raw nanoparticle datasets, as well as pipelines for feature extraction and data processing, will be provided. The source code of ILS is publicly available at https://researchdata.edu.au/iterative-label-spreading/1426058.