New developments in radio data post-processing are underway within the Square Kilometre Array (SKA) precursor community, aiming to facilitate the extraction of scientific results from survey images through semi-automated approaches. Several of these developments leverage deep learning methodologies for diverse tasks, including source detection, object or morphology classification, and anomaly detection. Despite substantial progress, the full potential of these methods often remains untapped because of the challenges of training large supervised models, particularly when only small and class-imbalanced labelled datasets are available.
Self-supervised learning has recently emerged as a powerful methodology to address some of these challenges, by directly learning a lower-dimensional representation from large samples of unlabelled data. The resulting model and data representation can then be used for data inspection and for various downstream tasks, provided a small subset of labelled data is available.
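To make the contrastive flavour of this idea concrete, the sketch below implements the NT-Xent objective used by methods such as SimCLR: two augmented views of each unlabelled image form a positive pair, and the loss pulls their embeddings together while pushing apart all other pairs in the batch. This is a minimal PyTorch illustration; the batch size, embedding dimension, and temperature are placeholder values, not those used in this work.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over projected embeddings z1, z2 of shape (N, D)."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit-norm rows
    sim = z @ z.t() / temperature                        # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # exclude self-similarities
    # The positive for view i is its counterpart at index i+N (mod 2N).
    targets = (torch.arange(2 * n, device=z.device) + n) % (2 * n)
    return F.cross_entropy(sim, targets)

# Example usage with random placeholder embeddings (batch 256, dim 128):
z1, z2 = torch.randn(256, 128), torch.randn(256, 128)
print(nt_xent_loss(z1, z2))
```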
In this work, we explored contrastive learning methods to learn suitable radio data representations by training the SimCLR model on large collections of unlabelled radio images taken from the ASKAP EMU and SARAO MeerKAT GPS surveys. The resulting models were fine-tuned on smaller labelled datasets, including annotated images from various radio surveys, and evaluated on radio source detection and classification tasks. Additionally, we employed the trained self-supervised models to extract features from radio images, which were used in an unsupervised search for objects with peculiar morphology in the ASKAP EMU pilot survey data. For all considered downstream tasks, we reported the model performance metrics and discussed the benefits brought by self-supervised pre-training, paving the way for building radio foundation models in the SKA era.
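As an illustration of the unsupervised search step, the following sketch extracts features from image cutouts with a frozen encoder and ranks them by an outlier score. The stand-in convolutional encoder and the scikit-learn IsolationForest detector are illustrative assumptions for this sketch, not the exact pipeline applied to the EMU pilot data.

```python
import torch
import torch.nn as nn
from sklearn.ensemble import IsolationForest

# Stand-in encoder: any frozen backbone mapping an image cutout to a feature
# vector (in practice, the pre-trained SimCLR backbone without its projection head).
encoder = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, stride=2),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

@torch.no_grad()
def extract_features(model, images):
    """Run cutouts through the frozen encoder; return an (N, D) NumPy array."""
    model.eval()
    return model(images).numpy()

images = torch.randn(500, 1, 64, 64)               # placeholder radio cutouts
features = extract_features(encoder, images)
detector = IsolationForest(contamination=0.01, random_state=0).fit(features)
scores = detector.score_samples(features)          # lower score = more anomalous
candidates = scores.argsort()[:20]                 # top candidates for visual inspection
print(candidates)
```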