A Comparison of YOLO Family for Apple Detection and Counting in Orchards

In agricultural production, deploying automatic picking robots in orchards to reduce human labour and error remains challenging. Their core function is automatic fruit identification based on machine vision. This paper focuses on apple detection and counting in orchards and implements several deep learning methods. Extensive datasets are used, and a semi-automatic annotation method is proposed. The models evaluated belong to the state-of-the-art YOLO family. Given the differences among the models' backbones, a detailed multi-dimensional comparison is made in terms of counting accuracy, mAP and model memory footprint, laying a foundation for automatic precision agriculture.
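The abstract lists counting accuracy among the comparison metrics but does not define it here. As an illustrative sketch only (the exact definition used by the authors is an assumption), per-image counting accuracy is often taken as one minus the relative error between predicted and ground-truth fruit counts:

```python
def counting_accuracy(predicted: int, actual: int) -> float:
    """Illustrative counting-accuracy metric: 1 - relative count error.

    This is a common convention, not necessarily the paper's definition.
    The result is floored at 0 so large over-counts do not go negative.
    """
    if actual == 0:
        # Edge case: an empty image is counted correctly only if the
        # model also predicts zero apples.
        return 1.0 if predicted == 0 else 0.0
    return max(0.0, 1.0 - abs(predicted - actual) / actual)

# Hypothetical example: a detector reports 47 apples in an image
# that actually contains 50.
acc = counting_accuracy(47, 50)  # relative error 3/50 -> accuracy 0.94
```

In practice the predicted count would come from the number of YOLO detections above a confidence threshold, and the per-image accuracies would be averaged over the test set.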
