Introduction
Poppy (also common poppy or corn poppy; Papaver rhoeas L., PAPRH) is one of the most harmful weed species, in terms of its infestation index, in winter cereals in Spain, particularly in the Guadalquivir Valley (Andalusia, southern Spain) (Saavedra et al. 1989). It is also the most important dicotyledonous (broadleaved) weed in Cataluña (northeastern Spain), where P. rhoeas can decrease wheat (Triticum spp.) yields by up to 32% (Torra et al. 2008). This is due to biological traits related to its capacity to colonize, persist, and compete with crops for water and light. These capacities are associated with prolific seed production and, consequently, a persistent seedbank with an extended germination period that is the primary source of P. rhoeas in winter cereals (Cirujeda et al. 2006, 2008; Holm et al. 1997; Torra and Recasens 2008; Wilson et al. 1995). However, P. rhoeas infestations can also be related to poor agronomic practice: crops are not rotated, weed control usually consists of a single mix of postemergence herbicides sprayed between November and March, and herbicides are rarely alternated from year to year (Torra et al. 2010). All these characteristics make P. rhoeas a difficult-to-control weed. The situation is worsening owing to herbicide-resistant biotypes, especially in cereal monocrop systems, together with the reduction of tillage (adopted to cut costs and soil degradation) and the consequent greater dependence on herbicides (Cirujeda et al. 2001; Torra et al. 2010).
Traditionally, weed management has been addressed by applying control measures to the entire field. However, management programs, including herbicide use, can be optimized by targeting weed-infested areas and leaving weed-free areas untreated (Heijting et al. 2007; Izquierdo et al. 2009, 2020; Jurado-Expósito et al. 2003). This is the basis of site-specific weed management (SSWM), which applies weed control measures only where and when they are truly needed, that is, rational weed management within a crop field matched to the variation in location, density, and composition of the weed population (Fernández-Quintanilla et al. 2020). One of the crucial components of SSWM is accurate and timely postemergence control based on early weed infestation maps, which reduces competition between weeds and the crop during the early growth phases. Remote or proximal (on-ground) sensing technologies can capture information that, once processed and analyzed, can be used to detect and classify weeds and crop and to provide maps on which an SSWM strategy can be based, as reviewed by several authors (Fernández-Quintanilla et al. 2018; López-Granados 2011; Mulla 2013; Peteinatos et al. 2014).
In relation to remote sensing, reliable results have been published on imagery acquired by unmanned aerial vehicles (UAVs) to provide accurate and early weed infestation maps at the field scale in vineyards and wide-row crops (e.g., sunflower (Helianthus annuus L.), cotton (Gossypium hirsutum L.), and maize (Zea mays L.), which are usually sown 70 cm apart), providing significant herbicide savings (Jiménez-Brenes et al. 2019; López-Granados et al. 2016; Peña et al. 2015; Pérez-Ortiz et al. 2016). These authors emphasized that crop and weed plants at the early phenological stage (4 to 6 leaves unfolded on the Biologische Bundesanstalt, Bundessortenamt, and Chemical industry [BBCH] scale; Meier 2001) generally have similar color, appearance, and reflectance characteristics and that weeds can occur as isolated plants or in small patches, indicating the need to work with the very high spatial resolution imagery generated by UAVs (<5-cm pixels). However, for narrow-row crops such as winter cereals (e.g., wheat or barley (Hordeum spp.)), usually sown in rows 15 cm apart, an additional difficulty arises in generating accurate orthomosaicked UAV imagery in which the crop rows are correctly aligned (Gómez-Candón et al. 2014; Mesas-Carrascosa et al. 2015), a prerequisite for the first discrimination between bare soil and the vegetation fraction (Torres-Sánchez et al. 2014, 2015). Accordingly, and given that weeds grow intermixed with crop plants, early P. rhoeas identification in winter cereals is challenging: P. rhoeas must be detected before cereal plants cover the soil (to avoid complete occlusion of weeds by the crop), and P. rhoeas plants are very small (2 to 4 leaves unfolded, BBCH scale; Meier 2001) at the time when weed control (e.g., herbicide application) is recommended. Detection of P. rhoeas was addressed by Peña-Barragán et al. (2017) using UAVs flying at 30-m altitude and images with a spatial resolution of 0.60 cm pixel⁻¹, achieving good discrimination between bare soil and wheat but limited accuracy in weed detection. Additionally, Pflanz et al. (2018) and de Camargo et al. (2021) detected P. rhoeas at a more advanced growth stage using a UAV flying 1 to 6 m above the ground.
One potential alternative is to use on-ground imagery of the winter cereal under field conditions, as this imagery allows millimeter resolution. However, to our knowledge, only a few papers have pursued early weed detection in winter cereals. Tellaeche et al. (2011) discriminated between wild oat (Avena sterilis L.) and wheat plants at early growth phases using an automatic computer vision system involving image segmentation and decision making. Pérez et al. (2000) detected broadleaved weeds in cereal crops by analyzing RGB imagery, locating plant material between the rows of a small-grain cereal crop in images recorded in video format. Andújar et al. (2012) used an ultrasonic sensor in a wheat field infested by grass and broadleaved weeds and found that sensor readings were well correlated with weed density and coverage. However, no studies have yet reported the use of proximal sensing for the early and timely detection of P. rhoeas in winter cereals as a basis for further development of an SSWM strategy. This objective could now be achievable thanks to improved, more powerful, and more efficient computing capacity, together with advances in graphics processing units (GPUs), for accurately processing large volumes of data in a short time.
Different image processing techniques have been applied for weed and crop classification (Hemming and Rath 2002; Woebbecke et al. 1995). The main challenge is that crops and weeds can have similar visual characteristics, such as color or texture. By extracting shape, color, and texture features (Kazmi et al. 2015; Meyer et al. 1998), it is possible to identify weeds and crops. However, weed identification faces issues with lighting; image resolution; soil type; and small variations between weeds and crops in shape, texture, color, and position (i.e., overlapping) (Dyrmann et al. 2016). The use of deep learning (DL) models for weed classification and detection offers a new framework capable of successfully dealing with the particularities of early-season weed detection. Depending on how training and validation data are labeled, DL methods are classified as supervised, unsupervised, or semi-supervised (Hasan et al. 2021). Supervised DL methods, in which training and validation data sets are manually labeled, are used in most studies to detect and classify weeds in crops (Khan et al. 2021; Sharpe et al. 2020). Unsupervised methods are those in which the training data set is not labeled; although used less often, they can achieve good-quality weed discrimination while reducing the time cost of manual data labeling (dos Santos Ferreira et al. 2019). Finally, semi-supervised methods take the middle ground between supervised and unsupervised learning (Shorewala et al. 2021).
Su et al. (2021) proposed real-time segmentation of interrow rigid ryegrass (Lolium rigidum Gaudin) plants in wheat using on-ground sensing. The performance of different DL techniques has recently been reviewed for detecting, localizing, and classifying a wide range of broadleaved and grass weed species in an extensive set of herbaceous crops, including wide- and narrow-row crops, using proximal and remote images (Hasan et al. 2021). Among existing DL models, the “You Only Look Once” (YOLO) architecture stands out because of its fast inference and high accuracy, and it is also able to calculate and predict all feature images at the same time (Thuan 2021). YOLO is an end-to-end DL-based detection model that determines the bounding boxes of the objects present in an image and classifies them in a single pass. Although two-stage models exist, they are not effective for detecting vegetation (crops or weeds), mainly because of their slow detection speed (Jin et al. 2022). In addition, YOLO models use a combination of HSV (hue, saturation, value) color space modification and classical methods (rotation, translation, scale, shear, perspective, flip, mosaic, or blending, among others) for data augmentation; this increase in data variability improves results (Perez and Wang 2017). Since the first publication of YOLO (Redmon et al. 2016), five versions of this DL model have appeared, each including new features (Thuan 2021). Among these versions, modified architectures of YOLOv3 and YOLOv4 have been successfully applied for weed detection in different wide-row crops (Czymmek et al. 2019; Gao et al. 2020; Partel et al. 2019; Ying et al. 2021). Despite all this work, the use of YOLO for P. rhoeas discrimination in winter cereals has not been reported.
The main motivation for the present study is to evaluate whether it is possible to detect P. rhoeas at the early stage, which involves small object detection in a field-scale (real-world) scenario. In this context, the objective of this work was to assess the accuracy and inference speed of different YOLO versions (v3 to v5) for early P. rhoeas detection in proximal RGB images acquired in a wheat field. To the best of the authors’ knowledge, this is the first time YOLOv5 has been proposed for weed detection.
Materials and Methods
Figure 1 summarizes the process carried out to detect P. rhoeas in wheat at an early stage under real field conditions using proximal RGB sensors. First, data acquisition and subsequent annotation of each weed plant were carried out. Six object-detection models based on different YOLO architectures were studied because of their strong feature-learning capabilities. Each DL model was evaluated considering both object-detection quality and performance in terms of hardware requirements. Quality was evaluated through precision, recall, F1-score, mean average precision at an intersection-over-union threshold of 0.5 (mAP@0.5), and accuracy, whereas GPU and central processing unit (CPU) capabilities were analyzed to determine the most efficient model in the inference phase.
Data Collection
In this study, images were acquired at noon in a commercial winter wheat field located at Artesa de Segre, Spain (41.908572°N, 1.012275°E, WGS-84). Because of the importance of weed identification at the early growth stage, images were registered at growth stages 12 to 14 on the BBCH scale (Meier 2001), corresponding to 2 to 4 leaves unfolded, and covering different situations. For example, as shown in Figure 2, P. rhoeas appeared isolated (Figure 2, no. 4) or partially hidden by the crop (Figure 2, no. 2), together with crop residues (Figure 2, no. 3), or on soil with different tonalities (Figure 2, nos. 1 and 4) and plant densities, thus providing a robust data set including all situations to be considered in weed detection. Images were recorded with a Sony FDR-AX100E (Sony Corporation, Tokyo, Japan) RGB (red, green, blue) sensor at approximately 1.5-m height above ground level with a focal length of 9 mm. The sensor measured 13.2 by 8.8 mm and had a resolution of 14.2 megapixels. This sensor configuration and acquisition height resulted in a ground sample distance of approximately 0.5 mm.
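The reported ground sample distance can be cross-checked from the sensor geometry using the standard pinhole-camera approximation. The short sketch below is ours (variable names are illustrative); the input values come from the setup described above.

```python
# Cross-check of the ~0.5 mm ground sample distance (GSD) reported above,
# using the pinhole-camera approximation: GSD = height * pixel_pitch / focal.
SENSOR_WIDTH_MM = 13.2   # Sony FDR-AX100E sensor width
IMAGE_WIDTH_PX = 5024    # image width in pixels
FOCAL_LENGTH_MM = 9.0    # focal length
HEIGHT_MM = 1500.0       # acquisition height (~1.5 m above ground)

pixel_pitch_mm = SENSOR_WIDTH_MM / IMAGE_WIDTH_PX
gsd_mm = HEIGHT_MM * pixel_pitch_mm / FOCAL_LENGTH_MM
print(f"GSD = {gsd_mm:.2f} mm per pixel")  # ~0.44 mm, i.e., roughly 0.5 mm
```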
One of the challenges of detecting weeds is the low detection accuracy caused by their small size (Oh et al. 2020). In computer vision, a small object–detection task is one in which the object is no larger than 32 by 32 pixels in an image of 640 by 640 pixels (Tong et al. 2020). In the present study, the registered RGB images measured 5,024 by 2,824 pixels, while P. rhoeas plants ranged from 5 by 5 to 18 by 18 pixels, which makes this a small object–detection task. Because the object-detection model requires RGB input images of 416 by 416 pixels, the original images would normally be resized automatically by the detection algorithm. However, because the P. rhoeas plants were at the seedling growth stage and therefore very small, this resampling would cause a significant loss of information. Therefore, to work with the images at full spatial resolution, we cropped them to 416 by 416 pixels using OpenCV (Open Source Computer Vision Library; OpenCV 2021), so that each individual RGB image was cropped into 82 smaller images. These smaller image sizes allowed faster identification (Jiang et al. 2022), and maintaining the spatial resolution ensured that no information was lost during the training process. Subsequently, the cropped images were mosaicked back to the original size of the input image. In this study, 77 images were cropped to 416 by 416 pixels, resulting in a data set of 6,319 images.
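The cropping step can be sketched as follows. This is a minimal, hypothetical tiler (the authors used OpenCV, and the exact number of tiles per image depends on how overlap and edge tiles are handled); the function below simply enumerates full-resolution 416 by 416 windows, shifting the last row and column inward so that no pixels are lost.

```python
def tile_windows(img_w, img_h, tile=416):
    """Enumerate (x1, y1, x2, y2) crop windows covering the image.

    Edge windows are shifted inward so every window is a full tile and
    the full spatial resolution is preserved (no resampling).
    """
    xs = list(range(0, img_w - tile, tile)) + [img_w - tile]
    ys = list(range(0, img_h - tile, tile)) + [img_h - tile]
    return [(x, y, x + tile, y + tile) for y in ys for x in xs]

# For the 5,024 by 2,824 pixel images used here, this non-overlapping
# strategy yields 13 columns x 7 rows = 91 candidate tiles:
windows = tile_windows(5024, 2824)
print(len(windows))  # 91
```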
Object-Detection Algorithms
Among all YOLO architecture versions, the following models and levels were analyzed: YOLOv3, Scaled-YOLOv4 (YOLOv4-CSP and YOLOv4-P5 levels), and YOLOv5 (YOLOv5-s, YOLOv5-m, and YOLOv5-l). Although YOLOv4-P7 and YOLOv4-P9 have deeper neural networks, they could not be evaluated because of the limited memory of the computer. Earlier YOLO architectures were not considered, as they reuse classifiers or locators to perform detection, whereas YOLOv3, released in 2018, was the first version to divide the image into regions and predict the bounding boxes and weighted probabilities of each region (Redmon and Farhadi 2018). YOLOv3 replaced the softmax classifier with independent logistic regression classifiers, offering a fast neural network processing 30 images or frames per second (FPS) with a mAP@0.5 of 57.9% on the Microsoft COCO data set (Redmon and Farhadi 2018). In addition, predictions in YOLOv3 are made on the whole image rather than by region, allowing it to exploit global context. However, YOLOv3 showed problems with multiscale features, and Scaled-YOLOv4 was subsequently developed in 2020. It is a lightweight version that reduces equipment requirements, offering an accuracy on the Microsoft COCO data set of 55.8% (Bochkovskiy et al. 2020) with the best speed-to-precision ratios, ranging from 15 FPS to 1,774 FPS. In addition, Scaled-YOLOv4 improves on YOLOv3 through the inclusion of cross stage partial (CSP) connections (Wang et al. 2020), which allow the network to be scaled up. To date, YOLOv5, published in 2020 (Jocher et al. 2020), is the latest version of YOLO. It adds focus and bottleneck CSP modules to improve and merge image features, addressing the problem of missed and incorrect detections of multiscale targets.
Training Data Set Labeling
Because the YOLO architecture is a supervised learning method, it requires manually labeled images, from which it learns through the definition of regions of interest (ROIs). ROIs must also be manually defined for the testing set to evaluate the quality of the model. Therefore, before training the models, ROIs in the images were manually delimited using the open-source image annotation tool LabelImg (https://github.com/heartexlabs/labelImg). Because our experiment relies on accurate labeling of the P. rhoeas class in the image data set, a group of three weed experts carried out the labeling. First, one expert performed preliminary labeling of the data set. Then, a second expert checked the annotations and corrected possible mislabeling. The third expert checked all the annotation work to ensure consistency. From the image data set, a total of 11,170 P. rhoeas samples were marked according to several criteria. First, each ROI should cover as little background as possible. In addition, ROIs should capture P. rhoeas in its different situations: (1) isolated, (2) partially hidden (occluded) by the crop, (3) together with crop residues, and (4) against soil backgrounds of different tonalities. Subsequently, the samples were divided into three groups: 70% for training, 20% for testing, and 10% for validating the model.
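The 70/20/10 partition can be reproduced with a simple seeded shuffle. The helper below is our illustration, not the authors' code; the seed value is arbitrary.

```python
import random

def split_dataset(items, train=0.7, test=0.2, seed=42):
    """Shuffle and split items into train/test/validation subsets.

    The remainder after the train and test fractions (here 10%) is
    used for validation, mirroring the 70/20/10 split in the text.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic, reproducible shuffle
    n_train = round(len(items) * train)
    n_test = round(len(items) * test)
    return (items[:n_train],
            items[n_train:n_train + n_test],
            items[n_train + n_test:])

# Applied to the 11,170 labeled P. rhoeas samples:
train_set, test_set, val_set = split_dataset(range(11170))
print(len(train_set), len(test_set), len(val_set))  # 7819 2234 1117
```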
Model Parameter Selection
The training was carried out on the Google Colab Pro service with a 16-GB Tesla P100 GPU and 16 GB of RAM. To increase the accuracy of P. rhoeas detection, the six YOLO architectures were trained on the P. rhoeas data set, and the newly learned features were coupled, via transfer learning, with weights pretrained on the COCO data set.

The hyperparameters of the six YOLO architectures were optimized, and the initial weights provided by the architectures' authors from COCO pretraining were used. The batch size selected for training depended on the YOLO architecture: 32 for YOLOv3 and YOLOv4 and 64 for YOLOv5. Finally, each model was trained for up to 500 epochs, ensuring that learning converged properly (Oppenheim et al. 2019).
Evaluation
First, to assess the P. rhoeas object-detection model for each architecture, true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) were counted from the combination of true and predicted labels and used to calculate the precision, recall, F1-score, mAP@0.5, and accuracy metrics defined in Table 1.
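For reference, these metrics reduce to the standard confusion-matrix formulas. The sketch below is our own, with purely illustrative counts; mAP@0.5 is not computed here because it additionally averages precision over recall levels at an intersection-over-union threshold of 0.5 and requires per-box confidence scores.

```python
def detection_metrics(tp, fp, fn, tn=0):
    """Precision, recall, F1-score, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)                          # fraction of detections that are correct
    recall = tp / (tp + fn)                             # fraction of true objects detected
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Illustrative counts only (not taken from the paper's results):
p, r, f1, acc = detection_metrics(tp=75, fp=25, fn=25)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")  # 0.75 for all three
```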
Second, we evaluated which of the six YOLO architectures was the most efficient in its use of computational resources. This was assessed by determining the percentage of GPU usage, the number of epochs per second processed during training, and the FPS achieved during inference on both the GPU and CPU in testing.
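Inference speed in FPS can be measured by timing repeated forward passes. The generic harness below is ours, with a trivial placeholder standing in for the YOLO model; a few warm-up calls are discarded so that one-off initialization costs (e.g., GPU kernel compilation) do not bias the measurement.

```python
import time

def measure_fps(infer, frames, warmup=2):
    """Average frames per second of `infer` over a list of frames."""
    for frame in frames[:warmup]:   # warm-up passes, not timed
        infer(frame)
    start = time.perf_counter()
    for frame in frames[warmup:]:   # timed passes
        infer(frame)
    elapsed = time.perf_counter() - start
    return (len(frames) - warmup) / elapsed

# Placeholder "model": a trivial function standing in for YOLO inference.
fps = measure_fps(lambda f: sum(f), [[1, 2, 3]] * 102)
print(f"{fps:.0f} FPS")
```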
Results and Discussion
The optimized hyperparameters for YOLO architectures 3, 4, and 5 are shown in Table 2. These results were obtained taking into account 300 iterations every 15 epochs and using as initial weights those provided by the authors of YOLO from the pretraining of the COCO network. Subsequently, each model was trained for 500 epochs to guarantee learning convergence by using these optimized hyperparameters.
a HSV, hue, saturation, value. A dash (—) indicates a hyperparameter not used by the YOLOv3, YOLOv4, and YOLOv5 models.
Table 3 shows the precision, recall, F1-score, mAP@0.5, and accuracy values obtained in the testing phase for P. rhoeas detection. Both YOLOv4 versions presented the lowest precision values, below 70%, while YOLOv3 and the three YOLOv5 architectures exceeded this threshold. Among these, YOLOv3 achieved the highest precision (78.6%), followed by YOLOv5l (77.8%) and YOLOv5s (76%). However, YOLOv4-CSP showed the highest true positive rate, with a recall of 81.1%, while YOLOv3 presented the lowest (64.4%), followed by YOLOv5l (67.2%); the remaining architectures displayed recall values above 75%. Overall, the YOLOv5 models balanced precision and recall better than the other architectures, as summarized by the F1-score. Among them, YOLOv5s achieved the highest F1-score (75.3%), while YOLOv4-CSP achieved the lowest (70.0%). In addition, YOLOv5s showed the highest mAP@0.5 (76.2%), while YOLOv3 showed the lowest (62.2%), followed by YOLOv5l (65%); all other architectures obtained a value of 71%. Finally, YOLOv4-P5 and YOLOv5s showed similar accuracies in P. rhoeas detection, 79% and 77%, respectively, whereas YOLOv3 and YOLOv5l generated the worst results, 51% and 57%, respectively. Thus, in terms of precision, recall, F1-score, mAP@0.5, and accuracy, YOLOv5 achieved better results than YOLOv3 and YOLOv4. In addition, based on the F1-score and mAP@0.5, YOLOv5s offered the best results, in agreement with Ayachi et al. (2020) and Wang and Yan (2021). These results therefore show that the quality of P. rhoeas detection depends on both the architecture and the YOLO version selected.
Table 4 compares the computational resources used by the YOLO models during training and testing. First, a batch size of 32 had to be used for the YOLOv3 and YOLOv4 architectures versus 64 for YOLOv5 in the training phase, because with larger batches the virtual machine was not able to finish the training process. Regarding GPU memory usage, YOLOv5l was the most demanding, at 98.8%, whereas YOLOv5s was the least demanding, followed by YOLOv5m, at 40% and 44.8%, respectively. In addition, YOLOv4-P5 and YOLOv5l showed the highest number of epochs processed per second, 65 and 60, respectively, whereas YOLOv5s achieved the lowest, with only 18. From an inference point of view, YOLOv5s offered the highest FPS on both GPU and CPU, with 83 and 7 frames, respectively, while YOLOv4-P5 showed the least inference capacity, with values of 28 and 0.2 FPS. Therefore, YOLOv5s offered the best quality in the weed-detection process together with the highest analytical capacity.
Figure 3 illustrates an example of the results of P. rhoeas detection in the testing phase using different YOLO architectures; weed plants are marked by bounding boxes with corresponding probabilities. These samples show how P. rhoeas detection can be successfully obtained in different real field situations, such as the presence or absence of crop plants, soil tonalities, or the presence of crop residues. The results changed according to the YOLO model assessed.
Weed control in winter cereals usually relies on a set of practices: plowing; maintenance of permanent soil cover with zero or minimum tillage (e.g., 25% soil disturbance); crop rotation (avoiding a monocrop system); use of pre- or postemergence herbicides at the early crop and weed growth stages, alternating the kind of herbicide; and delaying the sowing date so that weeds emerge before the crop, increasing the effect of herbicides. Nevertheless, P. rhoeas is a hard-to-control weed with evidence of herbicide-resistant biotypes arising from poor agronomic practice in many areas, namely the absence of crop rotation (mono–cereal crop systems) and a control tactic based on a single mix of postemergence herbicides without diversification or year-to-year alternation of herbicides, practices that also help prevent susceptible weed populations from becoming resistant.
To improve P. rhoeas control in wheat, and given the relevance of timely and accurate weed maps at the seedling phase, a set of innovative, powerful, and efficient DL models based on different YOLO architectures was studied. Table 3 shows all the quality metrics of the six object-detection models based on the YOLO architecture for locating and identifying young P. rhoeas plants in wheat. The results indicated that YOLOv5s was suitable for early-season detection of P. rhoeas from on-ground imagery acquired in a wheat crop. The proposed DL model detected approximately 75% of the P. rhoeas plants in the testing images, and approximately 75% of the detected objects were actual P. rhoeas plants. Previous works have used other DL models, such as VGGNet, GoogleNet, DetectNet, and Mask R-CNN (Mini et al. 2020; Peteinatos et al. 2020; Ying et al. 2021; Yu et al. 2019a, 2019b), to detect weeds with accuracies ranging from 70% to 99%. The present work falls within the accuracy range of those studies, but it must be noted that in most of them the weeds appeared isolated from the crop and/or at an advanced stage of development, which simplifies the problem. For example, de Camargo et al. (2021) achieved precision and recall values of approximately 90% using the ResNet-18 model. However, their P. rhoeas plants were at growth stages 17 to 19 on the BBCH scale (Meier 2001), with 6 to 9 true leaves according to Pflanz et al. (2018), who used the same data set, whereas in the present work, P. rhoeas plants were at growth stages 12 to 14 of the BBCH scale, with 2 to 4 true leaves.
Papaver rhoeas plants at those growth stages are more difficult to detect, but it is a suitable period for herbicide application. Furthermore, although both references reported the use of UAV imagery, the good accuracy they achieved could also be related to the fact that they used imagery with higher spatial resolution (between 0.1 and 0.5 mm) than the images used in the present work (approximately 0.5 mm).
Focusing the discussion specifically on the use of different YOLO versions for weed detection, the present work achieved accuracies similar to those reached by other studies. For example, modified YOLOv3 models have been used for hedge bindweed [Calystegia sepium (L.) R. Br.] detection in sugar beet (Beta vulgaris L.) fields with a mAP of around 0.80 (Gao et al. 2020); for weed detection in organic carrot (Daucus carota L. var. sativus Hoffm.) fields with an F1-score of 0.88 using relatively large input images (832 by 832 pixels) (Czymmek et al. 2019); and for purslane (Portulaca spp.) detection with precision and recall of 71% and 78%, respectively (Partel et al. 2019). More recently, a modified version of YOLOv4 was used to detect broadleaved and grass weeds (crabgrass [Digitaria spp.], water plantain [Alisma spp.], and persicaria [Polygonum spp.]) in carrot fields, achieving a mAP of 87.80% (Ying et al. 2021).
The application of DL models in smart weeding technologies must consider both the accuracy of weed detection and the speed of inference. Among DL models, YOLO was used in this work because it is one of the fastest neural networks for real-time execution environments that lack high-capacity hardware (Kim et al. 2020). All the DL models evaluated in this work, except YOLOv4-P5, reached satisfactory inference speeds, with frame rates greater than 40 FPS when implemented on a GPU. Consequently, they could be considered for on-the-go P. rhoeas detection (e.g., on a robotic vehicle) by means of video cameras, as the standard frame rate for most off-the-shelf cameras is 30 FPS.
Our results are in the same range as those of previous works on smart herbicide sprayers and weed detection (Hussain et al. 2020; Partel et al. 2019). Because the DL model developed in our work is based on object detection, its results are suited to a smart herbicide sprayer able to target individual weed plants using nozzles with narrow spray-distribution patterns. This would allow the implementation of SSWM strategies, with the associated economic and environmental benefits, and would follow the guidelines in European legislation addressing the sustainable use of pesticides (European Commission 2009, 2019), which are compatible with SSWM. Beyond spraying, object detection could also support mechanical or alternative weed control using knives (Raja et al. 2020), laser beams (Marx et al. 2012), or flaming (Gonzalez-de-Santos et al. 2017). Now that a detection model for P. rhoeas has been developed, future research should consider a system that registers RGB video in real time, detects the presence of P. rhoeas with a DL model such as the proposed one, and sends a signal to a smart weeding implement. Such a variable-rate sprayer should include RGB image sensors for capturing images, a computing unit for image processing, a microcontroller board to control the operations, several spray nozzles with valves, and a real-time kinematic GNSS receiver. Once images are acquired, the DL-based detection models would be run by the computing unit. If a P. rhoeas plant were detected, its relative position would be calculated and the valves actuated to spray the target. Simultaneously, its position would be registered by the GNSS receiver to generate an infestation map.
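As a sketch of the actuation step described above: given a detected bounding box and a boom of evenly spaced nozzles spanning the camera's field of view, the nozzle covering the weed's center could be selected as follows. All names and the nozzle geometry are hypothetical; a real sprayer would also need the camera-to-boom calibration.

```python
def nozzle_for_detection(box, image_width_px, n_nozzles):
    """Map a detection box (x1, y1, x2, y2) to the index of the nozzle
    whose vertical strip of the image contains the box center.

    Assumes nozzles are evenly spaced across the field of view.
    """
    x_center = (box[0] + box[2]) / 2
    strip_width = image_width_px / n_nozzles
    # Clamp to the last nozzle in case the center lies on the right edge.
    return min(int(x_center // strip_width), n_nozzles - 1)

# A detection centered at x = 2,600 px in a 5,024-px-wide image,
# with 8 nozzles, falls in strip index 4:
print(nozzle_for_detection((2550, 100, 2650, 200), 5024, 8))  # 4
```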
In summary, an early-stage P. rhoeas detection model was developed based on YOLO DL models (v3 to v5) using proximal RGB images taken under real wheat field conditions. To the best of the authors’ knowledge, this is the first time YOLOv5 has been proposed for weed seedling detection. This research evaluated the quality of the models as well as their speed and use of hardware resources in the detection process. The detection of P. rhoeas was carried out in a commercial field with natural infestation (i.e., under uncontrolled conditions), which reinforces the robustness of the results. Our results show that the quality and speed of detection varied according to the version of YOLO used, with YOLOv5s providing the best results. Therefore, the developed model can be integrated into a more complex scheme involving other systems, such as GNSS sensors, smart sprayers, or mechanical tools. Such an integrated system would allow the generation of infestation maps or the real-time application of herbicides to improve the management of this hard-to-control weed.
Acknowledgments
This research was financed by project PID2020-113229RB-C44 (AEI/10.13039/501100011033) (Spanish MCIN funds). The authors thank J. Recasens, J. M. Peña, and A. I. de Castro for their help in field surveys. No conflicts of interest have been declared.