Gaits Stability Analysis for a Pneumatic Quadruped Robot Using Reinforcement Learning

Deep reinforcement learning (deep RL) algorithms automate the design of complex controllers by mapping sensory inputs directly to low-level actions. Deep RL removes the need to model complex robot dynamics and requires minimal engineering. However, applying deep RL directly in real-world scenarios carries high risk, and the algorithms are highly sensitive to hyperparameters. Tuning hyperparameters on a pneumatic quadruped robot through trial-and-error learning is very expensive, and long training hours may degrade the pneumatic cylinders because of the jerky actions produced by stochastic weights. This paper presents an automated learning controller for a pneumatic quadruped robot based on sample-efficient deep Q-learning, which needs minimal tuning and very few trials to train the neural network. We applied this method to the pneumatic quadruped robot, which resulted in a hopping gait. In the process, we eliminated the use of a simulator and still acquired a stable gait. As learning proceeds, the resulting gait becomes more robust to stochastic changes in the environment. We further show that our algorithm performs well compared with a gait programmed using the robot dynamics.
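For readers unfamiliar with Q-learning, the following is a minimal sketch of the temporal-difference update that deep Q-learning approximates with a neural network. It is not the controller described in this paper: it uses a lookup table instead of a network, a hypothetical four-state chain instead of the robot, and a uniform replay buffer as one common (assumed) way to reuse each real-world trial several times. All function names and parameter values are illustrative assumptions.

```python
import random
from collections import deque


def td_target(reward, next_q_values, gamma, done):
    """Bellman target r + gamma * max_a' Q(s', a'); no bootstrapping at terminal states."""
    return reward if done else reward + gamma * max(next_q_values)


def q_update(q, transition, alpha=0.5, gamma=0.9):
    """Move Q(s, a) a fraction alpha of the way toward the TD target."""
    state, action, reward, next_state, done = transition
    target = td_target(reward, q[next_state], gamma, done)
    q[state][action] += alpha * (target - q[state][action])


def train(num_states=4, num_actions=2, episodes=200, seed=0):
    """Toy chain (hypothetical): action 1 moves right, action 0 stays in place.

    Reaching the last state ends the episode with reward 1.
    """
    rng = random.Random(seed)
    q = [[0.0] * num_actions for _ in range(num_states)]
    replay = deque(maxlen=500)  # stored transitions, reused across updates
    for _ in range(episodes):
        state = 0
        for _ in range(20):  # cap on steps per episode
            # Epsilon-greedy exploration: random action 20% of the time.
            if rng.random() < 0.2:
                action = rng.randrange(num_actions)
            else:
                action = max(range(num_actions), key=lambda a: q[state][a])
            next_state = min(state + action, num_states - 1)
            done = next_state == num_states - 1
            reward = 1.0 if done else 0.0
            replay.append((state, action, reward, next_state, done))
            # Replaying a small batch extracts more learning from each trial.
            batch = rng.sample(list(replay), min(len(replay), 8))
            for transition in batch:
                q_update(q, transition)
            state = next_state
            if done:
                break
    return q


if __name__ == "__main__":
    for state, values in enumerate(train()):
        print(state, [round(v, 3) for v in values])
```

In this toy setting the learned table should prefer moving right in every non-terminal state. On hardware, the table would be replaced by a network and the chain by the robot's sensor readings and valve commands, which this sketch does not attempt to model.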




