With the fast development of modern microscopes and bioimaging techniques, an unprecedentedly large amount of imaging data is being generated, stored, analyzed, and shared through networks. The size of these data poses great challenges for current data infrastructure. One common way to reduce data size is image compression. This study analyzes multiple classic and deep-learning-based image compression methods and presents an empirical study of their impact on downstream deep-learning-based image processing models. We used deep-learning-based label-free prediction models (i.e., predicting fluorescent images from bright-field images) as an example downstream task for comparing and analyzing the impact of image compression. Different compression techniques are compared in terms of compression ratio, image similarity, and, most importantly, the prediction accuracy of label-free models on original and compressed images. We found that artificial intelligence (AI)-based compression techniques largely outperform the classic ones, with minimal influence on downstream 2D label-free tasks. We hope this study sheds light on the potential of deep-learning-based image compression and raises awareness of the potential impacts of image compression on downstream deep-learning models for analysis.
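As a rough illustration of the comparison protocol, the sketch below round-trips a stand-in bright-field image through JPEG at several quality levels and reports compression ratio and SSIM; the study's actual compression methods and label-free models are not reproduced here.

```python
import io
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

# Stand-in for a bright-field microscopy image (hypothetical data)
original = np.random.randint(0, 256, (512, 512), dtype=np.uint8)

def jpeg_roundtrip(img, quality):
    """Compress to JPEG in memory; return decoded image and compression ratio."""
    buf = io.BytesIO()
    Image.fromarray(img).save(buf, format="JPEG", quality=quality)
    ratio = img.nbytes / buf.getbuffer().nbytes
    return np.asarray(Image.open(buf)), ratio

for q in (90, 70, 50):
    decoded, ratio = jpeg_roundtrip(original, q)
    ssim = structural_similarity(original, decoded)
    # In the study, `decoded` would also be fed to the label-free model and its
    # prediction compared against the prediction on `original`.
    print(f"quality={q}: ratio={ratio:.1f}x, SSIM={ssim:.3f}")
```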
This Element covers the interaction of two research areas: linguistic semantics and deep learning. It focuses on three phenomena central to natural language interpretation: reasoning and inference; compositionality; extralinguistic grounding. Representation of these phenomena in recent neural models is discussed, along with the quality of these representations and ways to evaluate them (datasets, tests, measures). The Element closes with suggestions on possible deeper interactions between theoretical semantics and language technology based on deep learning models.
This paper explores alternative representations of physical architecture derived from its real-world sensory data through artificial neural networks (ANNs). In the project developed for this research, a detailed 3-D point cloud model is produced by scanning a physical structure with LiDAR. Then, the point cloud data and mesh models are divided into parts according to architectural references and part-whole relationships, using various techniques, to create datasets. A deep learning model is trained on these datasets, and new 3-D models produced by deep generative models are examined. These new 3-D models, embodied in different representations such as point clouds, mesh models, and bounding boxes, are used as a design vocabulary from which combinatorial formations are generated.
Environmental enrichment programmes are widely used to improve the welfare of captive and laboratory animals, especially non-human primates. Monitoring enrichment use over time is crucial, as animals may habituate and reduce their interaction with it. In this study we aimed to monitor interaction with enrichment items in groups of rhesus macaques (Macaca mulatta), each consisting of an average of ten individuals, living in a breeding colony. To streamline the time-intensive task of assessing enrichment programmes, we automated the evaluation process using machine learning technologies. We built two computer vision-based pipelines to evaluate monkeys’ interactions with different enrichment items: a white drum containing raisins and a non-food-based puzzle. The first pipeline analyses usage of the drum in nine groups, both when it contains food and when it is empty. The second pipeline counts the number of monkeys interacting with a puzzle across twelve groups. The data derived from the two pipelines reveal that the macaques consistently express interest in the food-based white drum enrichment, even several months after its introduction. The puzzle enrichment was monitored for one month, showing a gradual decline in interaction over time. These pipelines are valuable for assessing enrichment by minimising the time spent on animal observation and data analysis; this study demonstrates that automated methods can consistently monitor macaque engagement with enrichments, systematically tracking habituation responses and long-term effectiveness. Such advancements have significant implications for enhancing animal welfare, enabling the discontinuation of ineffective enrichments and the adaptation of enrichment plans to meet the animals’ needs.
Optical microrobots are activated by a laser in a liquid medium using optical tweezers. To create visual control loops for robotic automation, this work describes a deep learning-based method for orientation estimation of optical microrobots, focusing on detecting 3-D rotational movements and localizing microrobots and trapping points (TPs). We integrated and fine-tuned You Only Look Once (YOLOv7) and Deep Simple Online Real-time Tracking (DeepSORT) algorithms, improving microrobot and TP detection accuracy by $\sim 3$% and $\sim 11$%, respectively, at the 0.95 Intersection over Union (IoU) threshold on our test set. Fine-tuning also increased mean average precision (mAP) by 3% at the 0.5:0.95 IoU threshold during training. Our results showed a 99% success rate in trapping events with no false-positive detections. We introduced a model that employs EfficientNet as a feature extractor combined with custom convolutional neural networks (CNNs) and feature fusion layers. To demonstrate its generalization ability, we evaluated the model on an independent in-house dataset comprising 4,757 image frames, in which microrobots executed simultaneous rotations across all three axes. Our method yielded mean rotation angle errors of $1.871^\circ$, $2.308^\circ$, and $2.808^\circ$ for the X (yaw), Y (roll), and Z (pitch) axes, respectively. Compared to pre-trained models, our model achieved the lowest error in the Y and Z axes while offering competitive results for the X axis. Finally, we demonstrated the explainability and transparency of the model’s decision-making process. Our work contributes to the field of microrobotics by providing an efficient 3-axis orientation estimation pipeline, with a clear focus on automation.
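A minimal sketch of the orientation-regression component, assuming a torchvision EfficientNet-B0 backbone and an invented small head; the paper's exact layers and feature fusion are not specified in this abstract.

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

class OrientationNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = efficientnet_b0(weights=None)   # load ImageNet weights in practice
        self.features = backbone.features          # EfficientNet feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Sequential(                 # assumed custom regression head
            nn.Flatten(),
            nn.Linear(1280, 256), nn.ReLU(),
            nn.Linear(256, 3),                     # X (yaw), Y (roll), Z (pitch)
        )

    def forward(self, x):
        return self.head(self.pool(self.features(x)))

angles = OrientationNet()(torch.randn(1, 3, 224, 224))   # (1, 3) angle estimates
```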
Precise and efficient grasping detection is vital for robotic arms to execute stable grasping tasks in industrial and household applications. However, existing methods neither refine features at different scales nor attend to critical regions, resulting in coarse grasping rectangles. To address these issues, we propose a real-time coarse and fine granularity residual attention (CFRA) grasping detection network. First, to enable the network to detect objects of different sizes, we extract and fuse coarse and fine granularity features. Then, we refine these fused features with a feature refinement module, which enables the network to distinguish between object and background features effectively. Finally, we introduce a residual attention module that adaptively handles objects of different shapes, achieving refined grasping detection. We train and test on both the Cornell and Jacquard datasets, achieving detection accuracies of 98.7% and 94.2%, respectively. Moreover, the grasping success rate on a real-world UR3e robot reaches 98%. These results demonstrate the effectiveness and superiority of CFRA.
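The abstract names a residual attention module; a generic version of that idea, with assumed layer shapes, might look like the following (the real CFRA design may differ).

```python
import torch
import torch.nn as nn

class ResidualAttention(nn.Module):
    """Learned spatial gate with a residual connection, so attention
    refines features rather than replacing them."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),                    # per-pixel gate in [0, 1]
        )

    def forward(self, x):
        return x + x * self.attn(x)          # residual: x * (1 + attention)

feats = torch.randn(1, 64, 56, 56)
refined = ResidualAttention(64)(feats)       # same shape, attention-refined
```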
Since its outbreak, the COVID-19 epidemic has posed a grave threat to global health and economies. The objective of this work is to provide a simple deep-learning approach for predicting, modelling, and evaluating the temporal evolution of the COVID-19 epidemic. The Dove Swarm Search (DSS) algorithm is integrated with the echo state network (ESN) to optimize the network weights, and the resulting ESN-DSS model is used to predict the evolution of the COVID-19 time series. Specifically, a self-driven ESN-DSS is created by feeding the output back as the input, forming a closed feedback loop. The prediction results, covering COVID-19 temporal evolutions of multiple countries worldwide, indicate the excellent prediction performance of our model compared with several artificial intelligence methods from the literature (e.g., recurrent neural networks, long short-term memory, gated recurrent units, variational autoencoders) at the same time scale. Moreover, the model parameters of the self-driven ESN-DSS have a significant impact on prediction performance, so the network parameters are tuned to improve prediction accuracy. The prediction results can serve as proposals to help governments and medical institutions formulate pertinent precautionary measures against further spread. In addition, this study is not limited to COVID-19 time series forecasting and is applicable to other nonlinear time series prediction problems.
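A minimal sketch of the self-driven (closed-loop) ESN idea, using a toy series and a ridge-regression readout in place of the DSS weight optimization; reservoir sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, leak, ridge = 300, 0.3, 1e-6
W_in = rng.uniform(-0.5, 0.5, (n_res, 1))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))    # keep spectral radius < 1

def step(x, u):
    # Leaky-integrator reservoir update
    return (1 - leak) * x + leak * np.tanh(W_in @ u + W @ x)

series = np.sin(np.linspace(0, 20, 400))[:, None]  # toy stand-in for a case-count curve
x, states = np.zeros((n_res, 1)), []
for u in series[:-1]:
    x = step(x, u[:, None])
    states.append(x.ravel())
X = np.array(states)
# Ridge-regression readout (teacher forcing): predict the next value
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ series[1:]).T

# Self-driven forecasting: the output becomes the next input, closing the loop
u, preds = series[-1][:, None], []
for _ in range(50):
    x = step(x, u)
    u = W_out @ x
    preds.append(u.item())
```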
We propose a neural network architecture and a training procedure to estimate blurring operators and deblur images from a single degraded image. Our key assumption is that the forward operators can be parameterized by a low-dimensional vector. The models we consider include a description of the point spread function with Zernike polynomials in the pupil plane or product-convolution expansions, which incorporate space-varying operators. Numerical experiments show that the proposed method can accurately and robustly recover the blur parameters even for large noise levels. For a convolution model, the average signal-to-noise ratio of the recovered point spread function ranges from 13 dB in the noiseless regime to 8 dB in the high-noise regime; in comparison, the tested alternatives yield negative values. This operator estimate can then be used as input to an unrolled neural network to deblur the image. Quantitative experiments on synthetic data demonstrate that this method outperforms other commonly used methods both perceptually and in terms of SSIM. The algorithm can process a 512 $ \times $ 512 image in under a second on a consumer graphics card and does not require any human interaction once the operator parameterization has been set up.
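To make the key assumption concrete, here is a hypothetical regressor mapping a degraded image to a low-dimensional blur parameter vector; the dimension and layers are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

N_PARAMS = 10   # assumed dimension of the blur parameterization

estimator = nn.Sequential(
    nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, N_PARAMS),            # predicted Zernike-like coefficients
)

degraded = torch.randn(1, 1, 512, 512)  # single degraded observation
theta = estimator(degraded)             # low-dimensional blur estimate
# `theta` would then parameterize the operator fed to the unrolled deblurring network.
```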
In this work, we propose a novel approach for tomato pollination that utilizes visual servo control to guide the pollination process, addressing the growing demand for automated robotic pollinators driven by the decline in bee populations. The proposed method leverages deep learning to estimate the orientations and depth of detected flowers, incorporating CAD-based synthetic images to ensure dataset diversity. Using a 3D camera, the system accurately estimates flower depth information for visual servoing. The robustness of the approach is validated through experiments conducted in a laboratory environment with a 3D-printed tomato flower plant. The results demonstrate a high detection rate, with a mean average precision of 91.2%. Furthermore, the average depth error for localizing the pollination target is small, measuring only 1.1 cm. This research presents a promising solution for tomato pollination, showcasing the effectiveness of visually guided servo control and its potential to address the challenges posed by diminishing bee populations in greenhouses.
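For the depth-based servo target, a standard pinhole deprojection converts a detected flower's pixel coordinates and depth reading into a camera-frame 3D point; the intrinsics below are assumed example values, not the paper's calibration.

```python
def deproject(u, v, z, fx=615.0, fy=615.0, cx=320.0, cy=240.0):
    """Pixel (u, v) at depth z (meters) -> camera-frame (X, Y, Z) via the
    pinhole model; fx, fy, cx, cy are camera intrinsics."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

# Hypothetical flower center from the detector, with depth from the 3D camera
target = deproject(402.5, 188.0, 0.35)   # metric servo target in meters
```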
The axisymmetric nozzle mechanism is the core component for thrust vectoring in an aero engine. It is a complex rigid-flexible coupled multibody system with joint clearance, which significantly reduces the efficiency of modeling and calculation; we therefore propose a kinematics and dynamics analysis of the axisymmetric vectoring nozzle mechanism based on a deep neural network. The deep neural network model of the axisymmetric vector nozzle is established from limited training data generated by the physical dynamics model and is then used to predict the kinematic and dynamic responses of the nozzle. This study analyses the effects of joint clearance on the kinematics and dynamics of the axisymmetric vector nozzle mechanism using the data-driven model. We find that the angular acceleration of the expanding blade and the driving force are most affected by joint clearance, followed by the angle, angular velocity, and position of the expanding blade. Larger joint clearance results in more pronounced fluctuations in the dynamic response of the mechanism, owing to the greater relative velocity and contact force between the bushing and the pin. Since axisymmetric vector nozzles are highly complex nonlinear systems, traditional numerical methods for their dynamics are extremely time-consuming. Our work indicates that the data-driven approach greatly reduces the computational cost while maintaining accuracy, and can be used for rapid evaluation and iterative computation of the complex multibody dynamics of engine nozzle mechanisms.
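A sketch of the surrogate-modeling idea under assumed input/output dimensions: a small feed-forward network, trained on samples from the physical model, stands in for the expensive multibody simulation.

```python
import torch
import torch.nn as nn

# Assumed inputs: [time, actuator stroke, joint clearance]
# Assumed outputs: [blade angle, angular velocity, angular acceleration, driving force]
surrogate = nn.Sequential(
    nn.Linear(3, 128), nn.Tanh(),
    nn.Linear(128, 128), nn.Tanh(),
    nn.Linear(128, 4),
)

# Trained on limited samples from the physical multibody model, the surrogate
# then replaces the simulation for rapid parameter sweeps:
response = surrogate(torch.tensor([[0.5, 0.02, 0.001]]))
```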
Classical approaches for flood prediction apply numerical methods for the solution of partial differential equations that capture the physics of inundation processes (e.g., the 2D Shallow Water equations). However, traditional inundation models are still unable to satisfy the requirements of many relevant applications, including early-warning systems, high-resolution (or large spatial domain) simulations, and robust inference over distributions of inputs (e.g., rainfall events). Machine learning (ML) approaches are a promising alternative to physics-based models due to their ability to efficiently capture correlations between relevant inputs and outputs in a data-driven fashion. In particular, once trained, ML models can be tested and deployed much more efficiently than classical approaches. Yet, few ML-based solutions for spatio-temporal flood prediction have been developed, and their reliability and accuracy are poorly understood. In this paper, we propose FloodGNN-GRU, a spatio-temporal flood prediction model that combines a graph neural network (GNN) and a gated recurrent unit (GRU) architecture. Compared to existing approaches, FloodGNN-GRU (i) employs a graph-based model (GNN); (ii) operates on both spatial and temporal dimensions; and (iii) processes the water flow velocities as vector features, instead of scalar features. We evaluate FloodGNN-GRU using a LISFLOOD-FP simulation of Hurricane Harvey (2017) in Houston, Texas. Our results, based on several metrics, show that FloodGNN-GRU outperforms several data-driven alternatives in terms of accuracy. Moreover, training our model is 100x faster, and inference 1000x faster, than running a comparable simulation. These findings illustrate the potential of ML-based methods to efficiently emulate physics-based inundation models, especially for short-term predictions.
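A minimal sketch of combining per-node graph aggregation with a GRU over time, using an assumed mean-neighbor layer; FloodGNN-GRU's actual layers and velocity-vector handling are richer than this.

```python
import torch
import torch.nn as nn

class GraphGRUStep(nn.Module):
    """One time step: aggregate spatial neighbors, then update a per-node GRU."""
    def __init__(self, in_dim, hidden):
        super().__init__()
        self.gnn = nn.Linear(2 * in_dim, hidden)       # self + neighbor features
        self.gru = nn.GRUCell(hidden, hidden)
        self.out = nn.Linear(hidden, 1)                # next water depth per node

    def forward(self, x, adj, h):
        deg = adj.sum(1, keepdim=True).clamp(min=1)
        neigh = adj @ x / deg                          # mean over graph neighbors
        z = torch.relu(self.gnn(torch.cat([x, neigh], dim=-1)))
        h = self.gru(z, h)
        return self.out(h), h

n_nodes, feat = 100, 4                                 # e.g. depth + 2D velocity + rain
model = GraphGRUStep(feat, 64)
adj = (torch.rand(n_nodes, n_nodes) < 0.05).float()    # toy mesh adjacency
x, h = torch.randn(n_nodes, feat), torch.zeros(n_nodes, 64)
depth_next, h = model(x, adj, h)                       # rolled forward per time step
```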
New advancements in radio data post-processing are underway within the Square Kilometre Array (SKA) precursor community, aiming to facilitate the extraction of scientific results from survey images through a semi-automated approach. Several of these developments leverage deep learning methodologies for diverse tasks, including source detection, object or morphology classification, and anomaly detection. Despite substantial progress, the full potential of these methods often remains untapped due to challenges associated with training large supervised models, particularly in the presence of small and class-unbalanced labelled datasets.
Self-supervised learning has recently established itself as a powerful methodology to deal with some of the aforementioned challenges, by directly learning a lower-dimensional representation from large samples of unlabelled data. The resulting model and data representation can then be used for data inspection and various downstream tasks if a small subset of labelled data is available.
In this work, we explored contrastive learning methods to learn suitable radio data representations by training the SimCLR model on large collections of unlabelled radio images taken from the ASKAP EMU and SARAO MeerKAT GPS surveys. The resulting models were fine-tuned over smaller labelled datasets, including annotated images from various radio surveys, and evaluated on radio source detection and classification tasks. Additionally, we employed the trained self-supervised models to extract features from radio images, which were used in an unsupervised search for objects with peculiar morphology in the ASKAP EMU pilot survey data. For all considered downstream tasks, we reported the model performance metrics and discussed the benefits brought by self-supervised pre-training, paving the way for building radio foundational models in the SKA era.
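The SimCLR objective at the heart of this pre-training is the NT-Xent contrastive loss: two augmented views of each radio image are embedded, matching views are pulled together, and all other pairs in the batch are pushed apart. A compact reference implementation of the standard formulation (not the authors' code) is below.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss. z1, z2: (N, d) projections of two views of the same N images."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)        # (2N, d), unit-norm
    sim = z @ z.t() / tau                              # temperature-scaled cosine sims
    n = z1.shape[0]
    sim.fill_diagonal_(float("-inf"))                  # exclude self-pairs
    # Positive for row i is its other view: i+n for the first half, i-n for the second
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

loss = nt_xent(torch.randn(32, 128), torch.randn(32, 128))
```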
Recommender systems (RSs) are one of the most important examples of how AI can be used to improve the experience of consumers, as well as to increase revenues for companies. The chapter presents a short survey of the main approaches. The manipulation of consumer behavior by RSs is less a legal issue than an ethical one, which should be considered when designing these types of systems.
The purpose of this chapter is to determine how the emergence of digital delegates would affect the process of contract conclusion and how consumer law might need to be supplemented to strike an appropriate balance between utilising the potential for automation, where desired, and the ability of consumers to remain in control.
Using convolutional neural networks (CNNs) for image recognition is effective for early weed detection. However, the impact of training data curation, specifically concerning morphological changes during the early growth phases of weeds, on recognition robustness remains unclear. We focused on four weed species (giant ragweed [Ambrosia trifida L.], red morningglory [Ipomoea coccinea L.], pitted morningglory [Ipomoea lacunosa L.], and burcucumber [Sicyos angulatus L.]) with varying cotyledon and true leaf shapes. Creating 16 models in total, we employed four dataset patterns with different growth stage combinations, two image recognition algorithms (object detection: You Only Look Once [YOLO] v5; image classification: Visual Geometry Group [VGG] 19), and two conditions regarding the number of species treated (four and two species). We evaluated the effects of growth stage training on weed recognition success using two datasets. One evaluation revealed superior results with a single class/species training dataset, achieving >90% average precision for detection and classification accuracy under most conditions. The other revealed that merging different growth stages with different shapes into a single class effectively prevented misrecognition among different species when using YOLOv5. Both results suggest that integrating the different shapes of a plant species into a single class is effective for maintaining robust recognition amid temporal morphological changes during the early growth stage. This finding not only enhances early detection of weed seedlings but also bolsters the robustness of general plant species identification.
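On the classification side, the curation choice lives entirely in the labels. A sketch of fine-tuning VGG19 with one merged class per species, under an assumed class count (weights would be loaded in practice):

```python
import torch.nn as nn
from torchvision import models

n_species = 4                                    # one merged class per species
vgg = models.vgg19(weights=None)                 # load ImageNet weights in practice
vgg.classifier[6] = nn.Linear(4096, n_species)   # replace the 1000-way final layer

# The merging strategy is implemented in the dataset labels: images of a species
# at any growth stage (cotyledon or true leaf) all map to the same class index.
```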
The global increase in observed forest dieback, characterized by the death of tree foliage, heralds widespread decline in forest ecosystems. This degradation causes significant changes to ecosystem services and functions, including habitat provision and carbon sequestration, which can be difficult to detect using traditional monitoring techniques, highlighting the need for large-scale and high-frequency monitoring. Contemporary developments in the instruments and methods to gather and process data at large scales mean this monitoring is now possible. In particular, the advancement of low-cost drone technology and deep learning on consumer-level hardware provide new opportunities. Here, we use an approach based on deep learning and vegetation indices to assess crown dieback from RGB aerial data without the need for expensive instrumentation such as LiDAR. We use an iterative approach to match crown footprints predicted by deep learning with field-based inventory data from a Mediterranean ecosystem exhibiting drought-induced dieback, and compare expert field-based crown dieback estimation with vegetation index-based estimates. We obtain high overall segmentation accuracy (mAP: 0.519) without the need for additional technical development of the underlying Mask R-CNN model, underscoring the potential of these approaches for non-expert use and proving their applicability to real-world conservation. We also find that color-coordinate-based estimates of dieback correlate well with expert field-based estimation. Substituting Mask R-CNN model predictions for ground-truth crown footprints had negligible impact on dieback estimates, indicating robustness. Our findings demonstrate the potential of automated data collection and processing, including the application of deep learning, to improve the coverage, speed, and cost of forest dieback monitoring.
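The color-coordinate dieback estimate can be illustrated with a common RGB vegetation index evaluated inside a predicted crown mask; the Green Leaf Index below is an assumed stand-in for the paper's exact index and thresholds.

```python
import numpy as np

def green_leaf_index(rgb, mask):
    """Mean GLI = (2G - R - B) / (2G + R + B) over a crown mask;
    low values suggest defoliated or dead canopy."""
    r, g, b = [rgb[..., i].astype(float) for i in range(3)]
    gli = (2 * g - r - b) / np.clip(2 * g + r + b, 1e-6, None)
    return float(gli[mask].mean())

rgb = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)  # stand-in tile
mask = np.zeros((256, 256), dtype=bool)
mask[64:192, 64:192] = True                                     # predicted crown footprint
score = green_leaf_index(rgb, mask)   # compared against field dieback estimates
```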
Despite the superior performance deep neural networks have demonstrated in thousands of applications over the past few years, addressing the over-sensitivity of these models to noise and/or intentional slight perturbations remains an active area of research. In the computer vision domain, perturbations can be applied directly to the input images. The task is considerably harder in the natural language processing domain due to the discrete nature of natural languages. Considerable effort has been devoted to this problem in high-resource languages like English; however, there is still an apparent lack of such studies for Arabic, and we aim to be the first to conduct one in this work. We start by training seven different models on a sentiment analysis task. Then, we propose a method to attack our models by means of worst-case synonym replacement, where the synonyms are selected automatically via the gradients of the input representations. After demonstrating the effectiveness of the proposed adversarial attack, we design frameworks that enable the development of models robust to attacks. Three different frameworks are proposed in this work, and a thorough comparison of their performance is presented. The three scenarios revolve around training the proposed models either on adversarial samples only or on clean samples alongside the adversarial ones, and on whether or not to include weight perturbation during training.
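A first-order version of the gradient-guided worst-synonym selection can be sketched as follows; the embedding vectors and synonym sets here are stand-ins, not the study's models or lexicon.

```python
import torch

def worst_synonym(grad, word_vec, synonym_vecs):
    """grad, word_vec: (d,); synonym_vecs: (k, d).
    First-order loss increase ~ grad . (synonym - word); pick the maximizer."""
    scores = (synonym_vecs - word_vec) @ grad
    return int(scores.argmax())

d = 8
grad = torch.randn(d)            # d(loss)/d(embedding), obtained via backprop
word = torch.randn(d)            # embedding of the word under attack
candidates = torch.randn(5, d)   # embeddings of 5 candidate synonyms
pick = worst_synonym(grad, word, candidates)
```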
There has been a steep rise in user-generated content on the Web and social media platforms in recent years. While the ease of content creation allows anyone to create content, it is at the same time difficult to monitor and control the spread of detrimental content. Recent research in natural language processing and machine learning has shown promise for this purpose: approaches and methods are now being developed for the automatic flagging of problematic textual content, namely hate speech, cyberbullying, or fake news, though mostly for English-language texts. This paper presents an algorithmic approach based on deep learning models for the detection of violent incidents in Spanish-language tweets (binary classification) and categorizes them further into five classes – accident, homicide, theft, kidnapping, and none (multi-label classification). The performance is evaluated on a recently shared benchmark dataset, where the proposed approach outperforms several deep learning models, with a weighted average precision, recall, and F1-score of 0.82, 0.81, and 0.80, respectively, for the binary classification. Similarly, for the multi-label classification, the proposed model reports a weighted average precision, recall, and F1-score of 0.54, 0.79, and 0.64, respectively, also superior to existing results reported in the literature. The study thus makes a meaningful contribution to the detection of violent incidents in Spanish-language social media posts.
This manuscript introduces deep learning models that simultaneously describe the dynamics of several yield curves. We aim to learn the dependence structure among the different yield curves induced by the globalization of financial markets and exploit it to produce more accurate forecasts. By combining the self-attention mechanism and nonparametric quantile regression, our model generates both point and interval forecasts of future yields. The architecture is designed to avoid quantile crossing issues affecting multiple quantile regression models. Numerical experiments conducted on two different datasets confirm the effectiveness of our approach. Finally, we explore potential extensions and enhancements by incorporating deep ensemble methods and transfer learning mechanisms.
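One standard device for avoiding quantile crossing, which the abstract alludes to, is to predict the lowest quantile plus non-negative increments, so the quantiles are monotone by construction; whether the paper uses exactly this parameterization is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotoneQuantileHead(nn.Module):
    """Outputs n_quantiles non-decreasing values per sample."""
    def __init__(self, hidden, n_quantiles):
        super().__init__()
        self.base = nn.Linear(hidden, 1)                 # lowest quantile
        self.deltas = nn.Linear(hidden, n_quantiles - 1)

    def forward(self, h):
        gaps = F.softplus(self.deltas(h))                # strictly positive increments
        return torch.cumsum(torch.cat([self.base(h), gaps], dim=-1), dim=-1)

q = MonotoneQuantileHead(64, 5)(torch.randn(2, 64))      # (2, 5), non-decreasing rows
```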
Chapter 13 discusses neural networks and deep learning, including a presentation of deep convolutional networks, which show great potential in the classification of medical images.