Introduction
The use of unmanned aerial vehicles (UAVs) and machine learning is increasingly important in precision agriculture, particularly for weed management (Boursianis et al. 2020; Fernández-Quintanilla et al. 2018; Shaner and Beckie 2014; Tsouros et al. 2019). Herbicides are essential for weed management but incur economic and environmental costs that are compounded by the traditional approach of tractor-based bulk application to entire fields, even in weed-free areas. A more efficient method is site-specific weed management (Christensen et al. 2009; López-Granados 2011), which involves creating customized weed maps for each field. Herbicide application by small ground robots can then focus on areas with weeds, saving time and reducing herbicide use. While cameras on ground robots may also detect weeds, they are slower and require initial travel across the entire field to detect and remove weeds (Binch et al. 2018; Binch and Fox 2017; Hall et al. 2017; Lottes et al. 2016; Milioto et al. 2018; Sheikh et al. 2020; Wendel and Underwood 2016; Wu et al. 2019). Traditionally, the most cost-effective methods to capture aerial images for weed detection and mapping (especially at scale) have been light aircraft or satellites (Brown and Noble 2005; Christensen et al. 2009; Lamb and Brown 2001), but they struggle to detect small patches of weeds due to relatively low spatial resolution. However, recent UAV and camera technology has improved and become cheaper, allowing UAVs to capture higher spatial resolution images for weed mapping (Boursianis et al. 2020; Huang et al. 2018; López-Granados 2011). This has led to a growing commercial sector of UAV companies specializing in aerial imagery for agricultural purposes (Drone AG 2023; Droneflight 2023; Drone Photography Services 2023; iRed 2023; Skeye Train 2023; SkyCam 2023).
We aim to investigate the accuracy of machine learning methods when applied to UAV RGB and multispectral aerial images for predicting the presence of black-grass (Alopecurus myosuroides Huds.) in fields that were not in the training data. Accurate predictions for out-of-sample fields are vitally important to the widespread deployment of UAV imagery in weed detection, as collecting ground-truth samples for every new field is not feasible. Previous research has predicted A. myosuroides density at a lower resolution (20 m by 20 m) with models that performed poorly on out-of-sample fields (Lambert et al. 2018, 2019), but out-of-sample weed detection may still be viable in other crops (Kim et al. 2019). We explore methods to improve the prediction resolution to approximately 1 m. To achieve this, ground-truth samples are collected within 1 m by 1 m quadrats, and high-resolution cameras with pixel sizes ranging from 22 to 27 mm are used for the aerial imagery.
Related Work
UAVs are becoming increasingly popular in agriculture as an inexpensive, agile, and time-efficient platform for a wide range of applications (Kim et al. 2019; Maddikunta et al. 2021; Tsouros et al. 2019). Weed detection using UAVs and machine learning methods has been developed across many different crop and weed species (Lambert et al. 2018, 2019; Lottes et al. 2017; Mohidem et al. 2021; Popović et al. 2017; Su et al. 2022). To date, two main categories of classification have been studied: pixel-based and object-based classification. Generally, studies using pixel-based classification have captured images at higher altitudes, around 20 m and higher, and studies using object-based classification at lower altitudes, as these images have the resolution required to recognize small objects (e.g., leaves or whole plants). Rozenberg et al. (2021) investigated both pixel- and object-based classification using two different classifiers in onion (Allium cepa L.) fields. They found that the pixel and object methods produced very similar results, with average overall accuracies between 94% and 96%. However, the pixel method performed better than the object method on lower-resolution images.
Other studies have focused on detecting specific weed species such as A. myosuroides in wheat (Triticum aestivum L.). Lambert et al. (2018) used a pixel-based random forest method across 18 fields using RGB and red-edge (RE) cameras, achieving accuracies of 61% to 87%. They expanded their work by using near-infrared (NIR) cameras, vegetation indices, and a convolutional neural network, resulting in an area under the curve of 0.825 (Lambert et al. 2019). Su et al. (2022) used a pixel-based random forest classifier on RGB, NIR, and RE images, as well as 18 vegetation indices, achieving an accuracy of 93%, but only used one field for sampling and testing.
In this study, we investigate the potential for using UAVs to capture images and machine learning methods to create weed maps. We focus on UAV systems that are readily available for hire by agricultural users and well-known machine learning libraries that can be used for weed distribution analysis. Our first aim was to investigate whether images captured by UAVs are capable of detecting weeds, and if so, the size of weeds that can be detected. To do this, we captured aerial images of twelve wheat and barley (Hordeum vulgare L.) fields and created ground-truth data sets of A. myosuroides weights. Second, we wanted to determine whether the use of a multispectral camera improves the detection of the weeds or whether the use of a cheaper RGB camera would produce accurate results. To do this, we captured both RGB and multispectral images of the fields and compared them. Third, to establish which machine learning method produces the most accurate results, we evaluated four machine learning methods frequently used for this task. Finally, to investigate whether these models could be transferred across fields, we removed one field from the data set and trained the machine learning applications on the remaining eleven fields to detect weeds in new, previously unobserved fields.
Materials and Methods
To collect the image and ground-truth data, twelve cereal fields with A. myosuroides were selected in Bourne, Lincolnshire, and Peterborough, Cambridgeshire, UK. The aerial images and ground-truth data were collected from July 27 to 31, 2020. The fields included six winter wheat, one spring wheat, and five spring barley, with varying sizes ranging from 10 to 40 ha. The spring wheat and spring barley fields were at growth stages GS85 to GS89 and the winter wheat fields were GS90 to GS95, near to being harvested.
Aerial Images
Aerial images were taken by a commercial UAV imagery company (iRed, Emsworth, UK) (iRed 2023) using a MicaSense RedEdge3 multispectral camera. The camera was calibrated, and all bands were stitched and orthorectified using ground control points by iRed, so that the processed files were the starting point for our analysis. The images had pixel sizes ranging from 22 to 27 mm for the RGB images and from 57 to 76 mm for the normalized difference vegetation index (NDVI) and normalized difference red-edge index (NDRE). Images were collected flying at a height of 75 to 100 m and captured five bands: blue (455 to 495 nm), green (540 to 580 nm), red (658 to 678 nm), RE (707 to 727 nm), and NIR (800 to 880 nm). The last two bands are used to calculate the NDVI (Equation 1) and NDRE (Equation 2) (Gitelson et al. 2002; Torres-Sánchez et al. 2014). These are ratios of reflected light over incoming light to detect vegetation and are based on the fact that plant leaves strongly reflect NIR and RE.

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}} \tag{1}$$

$$\mathrm{NDRE} = \frac{\mathrm{NIR} - \mathrm{RE}}{\mathrm{NIR} + \mathrm{RE}} \tag{2}$$

Their values range from −1 to 1: values near −1 correspond to water, values near 0 to rock and sand, and values from 0.2 to 1 to vegetation. RGB sensors are much cheaper and have higher resolution than NIR or RE sensors (note the resolution difference in our images). Thus, it would be beneficial to train classifiers on RGB and indices that only use RGB data; a commonly used index is the visible atmospherically resistant index (VARI) (Equation 3) (Gitelson et al. 2002), which only needs RGB data:

$$\mathrm{VARI} = \frac{\mathrm{Green} - \mathrm{Red}}{\mathrm{Green} + \mathrm{Red} - \mathrm{Blue}} \tag{3}$$

Its values range from −1 to 1, as with NDVI and NDRE. Figure 1 shows the color aerial image and the three indices of one field sampled.
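As a rough illustration of Equations 1 to 3, the three indices can be computed per pixel from arrays of band reflectance. This is a minimal NumPy sketch and not the processing pipeline used for the images in this study; the function names and the choice to return 0 where a denominator is 0 are assumptions made for the example.

```python
import numpy as np

def safe_ratio(numerator, denominator):
    """Element-wise division that returns 0 where the denominator is 0."""
    numerator = np.asarray(numerator, dtype=float)
    denominator = np.asarray(denominator, dtype=float)
    out = np.zeros_like(numerator)
    np.divide(numerator, denominator, out=out, where=denominator != 0)
    return out

def ndvi(nir, red):
    """Normalized difference vegetation index (Equation 1)."""
    nir, red = np.asarray(nir, dtype=float), np.asarray(red, dtype=float)
    return safe_ratio(nir - red, nir + red)

def ndre(nir, red_edge):
    """Normalized difference red-edge index (Equation 2)."""
    nir, red_edge = np.asarray(nir, dtype=float), np.asarray(red_edge, dtype=float)
    return safe_ratio(nir - red_edge, nir + red_edge)

def vari(red, green, blue):
    """Visible atmospherically resistant index (Equation 3), RGB bands only."""
    red, green, blue = (np.asarray(b, dtype=float) for b in (red, green, blue))
    return safe_ratio(green - red, green + red - blue)
```

Each function accepts whole image bands of equal shape, so an index image can be produced in a single call per index.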
In the field, the most obvious difference between the crop and the weed was the level of senescence, with the crops exhibiting greater senescence than the weed. For this reason, we focused on indices that are designed to pick up living photosynthetic material (i.e., green leaves) and that are obtainable with widely available, cheaper, drone-mounted sensors (i.e., five-band multispectral imagery), and that are simple to calculate with only spectral data. Even with these restrictions, there are several other indices we could have used (e.g., Triangular Greenness Index). However, as all these indices focus on the relative amount of green reflectance in different ways, they tend to correlate with each other to varying degrees.
Ground-Truth Data Collection
Ground-truth data were obtained by field walking through the twelve fields and collecting all A. myosuroides seed heads and stalks within 1 m by 1 m quadrats. Quadrats were placed wherever A. myosuroides was encountered, and in cases where there was a large dense patch, we randomly placed the quadrat in two to four locations within the patch, depending on its size. In addition, we sampled no-weed quadrats where we visually verified no weeds were present. The center coordinates of the quadrat were recorded using a Leica CS20 and GS07 RTK GNSS unit (Leica Geosystems 2023) with an accuracy of 10 to 20 mm. The harvested samples were then dried and weighed to create the ground-truth data set, which includes the coordinate at the center of the quadrat and the sample weight. There are 406 A. myosuroides and 649 no-weed quadrats, a total of 1,055 quadrats sampled across the twelve fields. Figure 2 shows the distribution of A. myosuroides sample weights.
Data Set Definition
Data sets were created by matching up the ground-truth sample coordinates with the coordinates of the orthorectified aerial images. To increase the resolution of the NIR and RE images, the pixels were split up, the new pixels were assigned the same values as the original pixel, and all images were upscaled to the same resolution as the RGB images. For each 1 m by 1 m quadrat, the RGB and multispectral pixels were extracted from the aerial images centered around the GPS location, and the indices were calculated for each pixel. The data set contains the red, green, blue, NDVI, NDRE, and VARI values for each pixel within each 1 m by 1 m quadrat. Each pixel was labeled with the sample weight or zero where the quadrat contained no weeds (i.e., all pixels in the quadrat are assigned the same A. myosuroides weight). The data sets consist of pixel and weight data from both wheat and barley fields. Training the classifiers on a data set containing both crop types has the potential to enhance the overall applicability of the classification method, allowing it to develop a more robust understanding of the relevant features and patterns necessary for accurate weed identification and classification.
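The pixel splitting and quadrat extraction described above can be sketched as follows. This is an illustrative NumPy example under simplifying assumptions that are not stated in the text: the upscaling factor is an integer, the quadrat center has already been converted from GPS coordinates to a pixel row and column, and the function names are hypothetical.

```python
import numpy as np

def upsample_nearest(band, factor):
    """Split each pixel into factor x factor pixels that keep the original value."""
    band = np.asarray(band)
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)

def extract_quadrat(image, center_row, center_col, pixel_size_m, quadrat_m=1.0):
    """Return the square window of pixels covering a quadrat centered on a pixel."""
    half = int(round(quadrat_m / pixel_size_m / 2))
    r0, r1 = max(center_row - half, 0), center_row + half + 1
    c0, c1 = max(center_col - half, 0), center_col + half + 1
    return image[r0:r1, c0:c1]

def label_pixels(window, weight_g):
    """Flatten a (rows, cols, bands) window to one row per pixel, all with the same weight."""
    pixels = window.reshape(-1, window.shape[-1])
    labels = np.full(len(pixels), float(weight_g))
    return pixels, labels
```

Because every pixel in a quadrat receives the quadrat's weight, the label is a property of the quadrat rather than of the individual pixel.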
To investigate the sizes of A. myosuroides that classifiers can detect, we created four data sets, three with two classes and one with four classes. The aim is to determine whether the classifiers can simply distinguish between a threshold of plant weights or differentiate between several classes (C1, C2, etc.) of weights at the same time:
- two classes: threshold at 0 g, C1 = 0 g, C2 > 0 g
- two classes: threshold at 3 g, C1 ≤ 3 g, C2 > 3 g
- two classes: threshold at 5 g, C1 ≤ 5 g, C2 > 5 g
- four classes: C1 = 0 g, 0 g < C2 < 5 g, 5 g ≤ C3 < 10 g, C4 ≥ 10 g
The machine learning classifiers are predicting the probability that a pixel comes from a quadrat in a given weed density class. We tested several other class structures and thresholds. The results showed the same pattern and the conclusions were unaffected. In this paper, we present the most relevant results.
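The class definitions of the four data sets can be written as a small labeling step that maps each quadrat weight to a class number. The sketch below is illustrative only; the function names are hypothetical, and classes are returned as the integers 1 to 4 to match C1 to C4.

```python
import numpy as np

def two_class_labels(weights_g, threshold_g):
    """Return 1 (C1) where weight <= threshold and 2 (C2) where weight > threshold."""
    w = np.asarray(weights_g, dtype=float)
    return np.where(w > threshold_g, 2, 1)

def four_class_labels(weights_g):
    """C1 = 0 g, 0 g < C2 < 5 g, 5 g <= C3 < 10 g, C4 >= 10 g."""
    w = np.asarray(weights_g, dtype=float)
    # digitize gives 0 for w < 5, 1 for 5 <= w < 10, 2 for w >= 10
    labels = np.digitize(w, bins=[5.0, 10.0]) + 2
    labels[w == 0] = 1  # empty quadrats form their own class
    return labels
```

With a threshold of 0 g, `two_class_labels` separates empty quadrats from any quadrat containing A. myosuroides, which corresponds to the first data set.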
Machine Learning Classification
Four classifiers from scikit-learn (scikit-learn 2023a) were trained and tested on the data: random forest, support vector machine (SVM), gradient boosting, and multilayer perceptron (MLP). They estimate P(quadrat | pixel data), the probability that a quadrat centered on the pixel would be labeled as a weed or not a weed, given the data from the pixel only. For each field, the classifiers are run on ground-truth samples for testing and then across the whole aerial image, classifying every pixel to generate weed prediction maps of each field. The random forest and SVM classifiers performed very poorly, and thus we present here only the more relevant data from the gradient-boosting and MLP classifiers. The features the classifiers were trained on were the red, green, blue, NDVI, NDRE, and VARI pixel data. The training data were randomly split, with 10% used for training and 90% for testing. The parameters used for the gradient-boosting classifier were: n_estimators = 500, max_depth = 4, min_samples_split = 5, and learning_rate = 0.01 (scikit-learn 2023b). For the MLP classifier, the parameters used were: activation = identity and learning_rate = invscaling (scikit-learn 2023c).
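For readers who want to reproduce the configuration, the stated parameters map onto scikit-learn constructors roughly as follows. This is a sketch under assumptions: all parameters not named in the text are left at their library defaults, and the `random_state` argument and the `make_classifiers` helper are additions for reproducibility, not part of the described method.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

# Per-pixel features named in the text
FEATURES = ["red", "green", "blue", "ndvi", "ndre", "vari"]

def make_classifiers(random_state=0):
    """Build the two classifiers with the parameters stated in the text."""
    gradient_boosting = GradientBoostingClassifier(
        n_estimators=500,
        max_depth=4,
        min_samples_split=5,
        learning_rate=0.01,
        random_state=random_state,  # assumption: fixed seed for repeatability
    )
    # Note: in scikit-learn, learning_rate="invscaling" only takes effect with
    # solver="sgd". The text does not state the solver, so the default is kept.
    mlp = MLPClassifier(
        activation="identity",
        learning_rate="invscaling",
        random_state=random_state,
    )
    return {"gradient_boosting": gradient_boosting, "mlp": mlp}
```

After fitting with `clf.fit(X_train, y_train)`, the per-pixel class probabilities are available from `clf.predict_proba(X_new)`.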
The data sets used to train the machine learning classifiers contained samples from eleven of the twelve fields. Samples from the excluded field were used to test the classifiers and generate the performance metrics. This process was repeated with each field excluded in turn to test the classifiers on unseen fields. The data sets are unbalanced, as overall, 62% of the data is empty quadrats and 38% contains A. myosuroides. To improve the classification accuracy, the data are resampled to balance the number of data points in each class. Several resampling methods were tried, including random oversampling, which randomly duplicates samples in the minority classes, and random undersampling, which randomly deletes samples in the majority class. Two variants of the synthetic minority oversampling technique (SMOTE) (Batista et al. 2004; Chawla et al. 2002) were also tried to rebalance the data set. SMOTE-ENN combines oversampling (SMOTE) with edited nearest neighbors (ENN) (Wilson 1972); SMOTE generates new samples, some of which may overlap between classes, and ENN locates and removes misclassified samples by comparing the samples to their nearest neighbors. SMOTE-Tomek also involves SMOTE oversampling but combines it with Tomek links (Tomek 1976). Tomek links are pairs of samples from opposite classes that are each other's nearest neighbors. The majority instance of the pair is removed, which removes unwanted overlap and more clearly defines a border between the classes. For both resampling methods, only the minority classes were resampled, and the default parameters were used as detailed in scikit-learn (2023d, 2023e).
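The leave-one-field-out procedure and the rebalancing step can be sketched with NumPy alone. Note that this example uses plain random oversampling, one of the simpler methods mentioned above, and not SMOTE-ENN or SMOTE-Tomek; the function names are hypothetical, and the resampling is applied to the training fields only so that the held-out field keeps its natural class balance.

```python
import numpy as np

def leave_one_field_out(field_ids):
    """Yield (held_out_field, train_idx, test_idx) with each field excluded in turn."""
    field_ids = np.asarray(field_ids)
    for field in np.unique(field_ids):
        test_mask = field_ids == field
        yield field, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        # Draw the missing number of samples with replacement (none for the majority class)
        extra = rng.choice(idx, size=target - count, replace=True)
        keep.append(np.concatenate([idx, extra]))
    keep = np.concatenate(keep)
    return X[keep], y[keep]
```

In use, each iteration of `leave_one_field_out` would call `random_oversample(X[train_idx], y[train_idx])`, fit a classifier on the result, and score it on `X[test_idx]`.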
Performance Metrics
To investigate the performance of classification, four metrics were calculated: accuracy, balanced accuracy (BA), Cohen’s kappa, and Matthews correlation coefficient (MCC). Overall accuracy represents the proportion of all correct predictions,

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP is true positive (correctly classified weeds), TN is true negative (correctly classified no weeds), FP is false positive (incorrectly classified weeds), and FN is false negative (incorrectly classified no weeds). However, accuracy is a poor measure when the testing data set is unbalanced. Consider the extreme case where 99% of samples are positive. A classifier can achieve 99% accuracy by always predicting positive cases, but such a classifier has no predictive skill. We use three other performance measures that attempt to account for class imbalance in different ways. BA (Equation 4) is the average of the proportion of positive and negative samples that are correctly classified.

$$\mathrm{BA} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right) \tag{4}$$

Cohen’s kappa, κ (Equation 5) (Cohen 1960), compares observed accuracy, $p_o$, with expected accuracy, $p_e$.

$$\kappa = \frac{p_o - p_e}{1 - p_e} \tag{5}$$

κ ranges between −1 and 1, where 1 indicates perfect agreement, values between 0 and 1 indicate some agreement beyond that expected by chance, and values below 0 indicate no agreement, with the classification being worse than random guessing. Finally, we use a balance-corrected measure of the correlation between the observed data and the predictions, MCC (Equation 6) (Boughorbel et al. 2017; Matthews 1975).

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{6}$$
MCC ranges between −1 and 1, where 1 is perfect correlation, 0 is no better than random guessing, and −1 is an inverse prediction (total disagreement between truth and prediction). We are interested in how well our classifiers work on unseen fields, as this is the most likely use case for aerial images, where a pretrained classifier is applied to images from a new field for which there are no ground data. Thus, we calculate TP, TN, FN, and FP on the excluded fields, and where a single value for a classifier and resampling method is given, it is the total performance across all twelve fields when excluded from the training set.
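The four metrics can be computed directly from the pooled confusion counts. The sketch below uses the standard binary definitions and assumes that both classes are present in the test data (otherwise BA is undefined); the function name is hypothetical, and MCC is set to 0 when its denominator is 0, which is a common convention rather than something specified in the text.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Accuracy, balanced accuracy, Cohen's kappa and MCC from confusion counts."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    balanced_accuracy = 0.5 * (tp / (tp + fn) + tn / (tn + fp))
    # Agreement expected by chance, from the marginal totals
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (accuracy - p_e) / (1 - p_e)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "kappa": kappa,
        "mcc": mcc,
    }

# The extreme case from the text: 99% positive samples, classifier always predicts positive
always_positive = binary_metrics(tp=99, tn=0, fp=1, fn=0)
```

For the always-positive classifier, accuracy is 0.99 while BA is 0.5 and both kappa and MCC are 0, which is why the balance-aware metrics are reported alongside accuracy.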
Results and Discussion
Our aims were: first, to determine the size of weeds that can be accurately detected and whether the classifiers can distinguish between several weed size classes; second, to investigate the accuracy of the classification when using multispectral images as compared to RGB color images; third, to evaluate the performance of machine learning classifiers in detecting A. myosuroides in cereal fields from images captured using a UAV; and finally to investigate if these models could be transferred across fields. Accurate predictions for weed detection are important for the widespread deployment of UAV imagery for precision weed detection and management.
The results of the best-performing classification methods tested are shown in Tables 1 and 2. The tables illustrate the different classifiers, data sets, resampling methods and image sets used for each test, with data from all twelve fields where samples from the field being classified were removed in turn from the data set.
a SMOTE-ENN, SMOTE-Tomek: resampling techniques that generate or remove samples in the data set to balance the classes.
b MCC, Matthews correlation coefficient.
In our tests, the MLP classifier consistently outperformed gradient boosting across all data and image sets. Additionally, MLP classification required considerably less time to process the data, taking between a half and a twentieth of the time required by gradient boosting. Our performance results indicate that this method of classification is effective at identifying larger A. myosuroides plants (larger than 3 g or 5 g). The accuracy for the two-classes threshold at 3 g ranges between 64% and 67%, and with the threshold at 5 g, the accuracies improve to 64% to 72%. However, the classifiers encountered difficulty distinguishing between no–A. myosuroides quadrats and quadrats with very small A. myosuroides plants. With the threshold at 0 g, the accuracies are around 50% across both classifiers and all resampling methods and image sets, which is no better than guessing. Increasing the threshold from 0 g to 5 g produces a large increase in the performance of the classifier: accuracy increases by 25% and BA by 40%. The classifiers also have difficulty with the four-classes data set. Table 3 shows this result: the classifiers have trouble separating no–A. myosuroides samples from small A. myosuroides plants (class 1 and class 2) but have more success with larger plants (class 4). As shown in Figure 2, a majority of the A. myosuroides samples collected are small plants (<3 g). The classifier has difficulty with these, because a 1 m by 1 m quadrat that contains a small A. myosuroides plant (e.g., <3 g) looks almost identical to a 1 m by 1 m quadrat with no A. myosuroides. These small A. myosuroides plants occupy only a few pixels within the quadrat, with the rest of the pixels being soil and crop. This can cause the classifiers to discriminate on features not only of the weeds but also of the surrounding soil and crop. Larger weeds are easier to detect, as more pixels in the quadrat have features that relate only to the weed, and not the crop or soil. That is, quadrats with larger weeds are more distinct from the background crop than quadrats with smaller weeds, and thus the classification task is easier.
a MCC, Matthews correlation coefficient.
Table 4 shows two of the best-performing results from the classifiers (Tables 1 and 2). The table shows the true and false rates and the three metrics, comparing the RGB, NDVI, and NDRE images with only RGB and RGB-derived VARI. The two results are: (a) the MLP classifier with SMOTE-Tomek resampling on the two-classes 5 g data set using the RGB images, NDRE, and NDVI; and (b) the MLP classifier with SMOTE-ENN resampling on the two-classes 5 g data set using the RGB images and VARI. Using this method, the classifier effectively ignores any A. myosuroides less than 5 g and treats the area as having no A. myosuroides; however, this produces the best-performing results of the metrics. With the RGB images, NDRE, and NDVI (a), the classifier correctly predicts the absence of A. myosuroides (class 1) in 69.46% of cases and the presence of A. myosuroides above 5 g (class 2) in 74.79% of cases. A surprising result is (b), which only uses the RGB images and the RGB-derived VARI; its performance is almost the same as that of (a). Using the RGB and VARI images (b), the classifier correctly predicts the absence of A. myosuroides (class 1) in 67.74% of cases and the presence of A. myosuroides above 5 g (class 2) in 77.51% of cases.
a NDVI, normalized difference vegetation index.
b NDRE, normalized difference red-edge index.
Across all twelve fields, the classifiers' performance metrics are high; however, the performance varies when looking at individual fields. Figure 3 and Table 5 show the accuracy and true and false rates of the twelve fields individually using the best-performing classifier for each data set. (We do not show BA per field because this estimator is not meaningful for small class sample sizes.)
As expected from the overall metrics (Table 1), the results for the two-classes threshold at 0 g (Figure 3A) are poor, with accuracies around or below 50%. As the threshold is increased to 3 g (Figure 3B) and 5 g (Table 5), the accuracy and true and false rates vary but are on the whole higher than those for the 0 g data set, with most field accuracies greater than 70%. With the latter two sets of results, the accuracy varies so much because very few fields contained abundant A. myosuroides with sample weights above the 3 g and 5 g thresholds on which to test the classifiers.
Fields 3 and 6 in Table 5 have a reasonable balance of ground-truth data, representing most of the A. myosuroides sample data overall. When these fields are being classified and their data are removed from the training data, most of the A. myosuroides data is removed from the training data. As a result, the resampling methods have a very small number of samples to generate from. The classifier also has high accuracy at predicting no–A. myosuroides areas across most of the fields, apart from fields 3, 6, and 9 (Table 5), because the colors of the images in these three fields are darker and browner, whereas the other nine fields are lighter and greener; the training data contain mostly lighter areas for no A. myosuroides, so the classifier confuses these darker areas. The performance of the classifiers exhibits minimal variation based on the crop type, as both wheat and barley fields demonstrate both high and low accuracies. There is no single crop type that consistently exhibits either high or low accuracy across the classifiers.
The main result is that UAV images and machine learning methods are capable of accurately predicting the presence and absence of A. myosuroides in out-of-sample fields. Across twelve fields, the classifiers correctly predicted the absence of A. myosuroides in 69.46% of cases and its presence in 77.51% of cases. However, the accuracy for individual fields is low. Overall, the classifier is accurate with out-of-sample new fields, but this could be improved by increasing the number of fields sampled in the data set. Some of the fields sampled contained high densities of A. myosuroides, whereas others contained very little or no A. myosuroides, and this skews the data set. A wider variety of fields that contain more and various sizes of A. myosuroides should be sampled to improve the classification. Even predictions of lower reliability and accuracy are useful for indicating the areas with and without weeds, so that weed management can be focused on the areas with predicted weeds. A robotic weed management platform can spend more time in the areas with predicted weeds rather than covering the entire field, which will be mainly weed-free.
The use of RGB-only images and the calculated VARI results in comparably good predictions: the absence of A. myosuroides is predicted correctly in 67.74% of cases and its presence in 77.51% of cases (Table 4).
The data sets with RGB, NDRE, and NDVI images have five features on which the classifiers are trained: the five pixel values taken from the aerial images, that is, the RGB colors and the two indices, NDVI and NDRE. The data sets with RGB and VARI images contain four features. Figure 4 shows the importance of these four and five features to the gradient-boosting classifier and how they influence its predictions. The most important features are the NDVI and VARI indices, followed by green light; as expected, the two indices highlight the A. myosuroides, and the green of the A. myosuroides plant is picked out against the background of the soil and crop. Previous studies of multispectral images on weed detection have reported that NIR light is important for this task (López-Granados et al. 2008; Smith and Blackshaw 2003). The classifiers can be used to generate prediction maps of A. myosuroides locations. Figure 5 shows one of the fields surveyed using the two-classes MLP classifier with the threshold at 5 g. Figure 5A is the prediction map overlaid on the aerial image; the red areas are where the classifier predicts the location of A. myosuroides with a probability greater than 75%, and the locations have been grouped into clusters to more clearly show the areas with weeds. Figure 5B shows the map of the classifier’s prediction probability, demonstrating that the classifier is confident across most of the field. These maps could be used to remove A. myosuroides more efficiently by focusing weed management methods on the highlighted areas rather than the whole field.
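Turning per-pixel probabilities into a prediction map of this kind amounts to reshaping the classifier output to the image grid and applying the 75% probability threshold. The following is a minimal sketch with hypothetical names; it assumes the probabilities are ordered row by row and omits the clustering step used for the figure.

```python
import numpy as np

def weed_prediction_map(probabilities, height, width, threshold=0.75):
    """Reshape per-pixel weed probabilities to the image grid and flag likely weeds."""
    prob_map = np.asarray(probabilities, dtype=float).reshape(height, width)
    weed_mask = prob_map > threshold  # True where predicted weed probability exceeds 75%
    return prob_map, weed_mask

def flagged_fraction(weed_mask):
    """Fraction of the mapped area flagged for weed management."""
    return float(np.mean(weed_mask))
```

The flagged fraction gives a quick indication of how much of the field a targeted treatment would need to cover compared with treating the whole field.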
We have shown that UAV imagery and machine learning can be used to accurately classify and predict the presence and absence of A. myosuroides. Across twelve out-of-sample fields, our classifier correctly predicts the absence (69.46%) and presence (77.51%) of A. myosuroides, with Cohen’s kappa from 0.177 to 0.186. Other studies report higher kappas of 0.22 to 0.32. Direct comparison of metrics is difficult, as these studies used visual observation to classify A. myosuroides abundance (Lambert et al. 2018, 2019) and larger sample areas. In our study, larger weed plants (>3 g) can be more accurately detected, as smaller plants are harder to distinguish against the background of soil and crops.
Our results show that using only an RGB color camera and deriving the VARI yields results as accurate as those produced using multispectral cameras. This may be useful where only RGB color images are available, and it can reduce costs and weight by avoiding the use of multispectral cameras on UAVs. Of the two machine learning methods shown, MLP was slightly more accurate than gradient boosting at classifying pixels as either a weed or not a weed for both multispectral and RGB images.
The classifier’s accuracy is low in individual fields, as in other studies (Lambert et al. 2018, 2019), due to fields not containing abundant and varying amounts and sizes of weed plants. As with all machine learning tasks, the main limitation is the availability of a large amount of relevant, labeled data, in this task collected across multiple fields at different growth stages of the crops and weeds. The method may work better if samples and images are collected slightly earlier in the growing season (early June to July), when previous studies collected data (Lambert et al. 2018, 2019; Su et al. 2022), or over a longer period of time across the growing season. Like other studies in this area, our classifiers are unlikely to work well outside the context in which they were trained (Lambert et al. 2018, 2019). This is especially true with respect to the growth stage of the crop. The data we trained on were collected late in the season (late July) when the crop was yellower (more senesced) than the weed. As a result, this approach is highly unlikely to work before senescence has begun without additional data and model training. We also targeted fields with known A. myosuroides infestations, so most green weeds in these fields were A. myosuroides. This is not always going to be the case, and in fields with large populations of other weed species, greener areas are more likely to be patches of other species. Collecting data to train classifiers across multiple species would be a valuable, if expensive, project.
However, our classifiers worked successfully even very late in the season just before harvesting. The efficacy of our method in detecting larger weeds highlights its suitability for utilizing UAV imagery and implementing weed management later in the growing season when the weeds are larger. Most common herbicides used to stop A. myosuroides are preemergence herbicides, applied before the crop and weeds emerge. As these images are taken at the end of the season, the model presented can be used to predict weed distributions at the end of the growing season. Late-season images can be used to track the progress of weed control programs both in season and over longer time periods. Also, if only large patches require control, weed maps from the previous year can be used for variable herbicide application in the following rotation, because large weed patches do not move much from year to year. But if very high levels of weed control, with even small individual plants targeted, are required, the classifier used would have to have very low FN rates, even for small plants, much lower than the best classifier we present here. Such a classifier could be built but will require higher-quality images (possibly collected from ground vehicles) and much larger and more diverse annotated data sets. Future work could also involve agricultural economics surveys to determine the best time to capture UAV images, use the classification method, and apply herbicides. There are no major technical barriers to achieving this, but it would require sustained investment over multiple years.
Acknowledgments
This research was supported by Innovate UK Project 105137, Autonomous Robotics Weeding Arable Crops. We thank ARWAC ltd. for their support and access to farms, equipment and facilities. No conflicts of interest have been declared.