6D Posture Estimation of Road Vehicles from Color Images

In the field of object pose estimation, existing research estimates the position and angle of an object by storing a 3D model of the target object in a computer in advance and matching the observed object against that model. In this research, by contrast, we have created a module that is much simpler, smaller in scale, and faster in operation. Our 6D pose estimation model consists of two networks: a classification network and a regression network. From a single RGB image, the trained model estimates the class of the object in the image, the coordinates of the object, and its rotation angle in 3D space. In addition, we compared the estimation accuracy across camera positions, i.e., the angles from which the object was captured. The highest accuracy was recorded at a camera position of 75°, where the classification accuracy was about 87.3% and the regression accuracy was about 98.9%.
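The per-camera-position comparison described above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the `PoseEstimate` record, its field layout (three coordinates and three rotation angles), the function name `classification_accuracy_by_angle`, and the toy samples are all assumptions made for illustration. Only classification accuracy is computed, because the abstract does not define how regression accuracy is measured.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class PoseEstimate:
    """One model output: a class label plus a 6D pose (layout is an assumption)."""
    label: str
    position: Tuple[float, float, float]  # x, y, z coordinates (units assumed)
    rotation: Tuple[float, float, float]  # rotation angles in degrees (assumed)


def classification_accuracy_by_angle(
    samples: Iterable[Tuple[int, str, PoseEstimate]]
) -> Dict[int, float]:
    """Group samples by camera angle and return the fraction classified correctly.

    Each sample is (camera_angle_deg, true_label, estimate).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for angle, true_label, estimate in samples:
        total[angle] += 1
        if estimate.label == true_label:
            correct[angle] += 1
    return {angle: correct[angle] / total[angle] for angle in total}


# Toy, made-up samples purely to exercise the function (not data from the paper).
samples = [
    (45, "sedan", PoseEstimate("sedan", (0.0, 0.0, 5.0), (0.0, 10.0, 0.0))),
    (45, "truck", PoseEstimate("sedan", (1.0, 0.0, 6.0), (0.0, 20.0, 0.0))),
    (75, "sedan", PoseEstimate("sedan", (0.0, 0.0, 5.0), (0.0, 10.0, 0.0))),
    (75, "truck", PoseEstimate("truck", (1.0, 0.0, 6.0), (0.0, 20.0, 0.0))),
]

per_angle = classification_accuracy_by_angle(samples)
best_angle = max(per_angle, key=per_angle.get)  # camera angle with highest accuracy
print(per_angle, best_angle)
```

Keeping the class label and the pose in one record mirrors the description of a single trained model that returns all three quantities from one RGB image, even though two separate networks produce them.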




