P-ISSN 0974-6846 E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2021, Volume: 14, Issue: 30, Pages: 2460-2471

Original Article

Control and Simulation of a 6-DOF Biped Robot based on Twin Delayed Deep Deterministic Policy Gradient Algorithm

Abstract

Objectives: To study an algorithm for controlling a bipedal robot so that it walks with a gait close to that of a human. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is a highly efficient algorithm that makes a few changes to the widely used Deep Deterministic Policy Gradient (DDPG) algorithm for continuous-action-space problems in Reinforcement Learning. Methods: Unlike the commonly used sparse reward model, this study proposes a reward model that combines a sparse reward function with a dense reward function. The TD3 algorithm, together with the proposed reward model, is applied to control a bipedal robot model with 6 degrees of freedom. The training process is simulated in the Gazebo/Robot Operating System (ROS) environment. Findings: The results show that a combined sparse-and-dense reward model suited to the robot model helps it learn faster and achieve better results. The biped robot can walk straight with an almost human-like gait. The results of the TD3 algorithm combined with the proposed reward model are also compared with those of other algorithms. Novelty: The TD3 algorithm combined with the proposed reward model is applied to a 6-DOF biped robot, and the robot's gait is simulated in the Gazebo/ROS environment; ROS is middleware that can later be used to control the robot in a real environment.
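The abstract does not list the paper's exact reward terms. Purely as an illustration of the combined sparse-and-dense model described above, the Python sketch below pairs dense per-step shaping terms (forward progress, upright posture, an effort penalty) with sparse terminal bonuses; every field name and weight (forward_velocity, torso_pitch, w_forward, and so on) is hypothetical, not taken from the paper.

    import numpy as np

    def combined_reward(state, fell, reached_goal,
                        w_forward=1.0, w_upright=0.5, w_effort=0.01):
        """Illustrative sparse-plus-dense reward; all terms and weights
        are hypothetical, not the paper's published reward function."""
        # Dense shaping, given at every time step: reward forward
        # progress and an upright torso, penalize joint effort.
        dense = (w_forward * state["forward_velocity"]
                 + w_upright * np.cos(state["torso_pitch"])
                 - w_effort * np.sum(np.square(state["joint_torques"])))

        # Sparse terms: large one-off signals on rare terminal events.
        sparse = 0.0
        if fell:            # the robot has fallen over
            sparse -= 100.0
        if reached_goal:    # the robot walked the target distance
            sparse += 100.0

        return dense + sparse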

Keywords: TD3; biped robot; reinforcement learning; ROS; Gazebo
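For readers unfamiliar with the keyword TD3, the "few changes" it makes to DDPG are clipped double-Q learning with twin critics, target-policy smoothing noise, and delayed actor and target-network updates. The PyTorch sketch below shows a single update step as a minimal illustration; the network classes, the optimizers (critic_opt is assumed to cover both critics), and the replay-batch layout are assumptions, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    def td3_update(batch, actor, actor_target, critic1, critic2,
                   critic1_target, critic2_target, actor_opt, critic_opt,
                   step, gamma=0.99, tau=0.005, policy_noise=0.2,
                   noise_clip=0.5, policy_delay=2):
        """One TD3 update step; networks, optimizers, and the batch
        layout (s, a, r, s2, done) are assumed placeholders."""
        s, a, r, s2, done = batch

        with torch.no_grad():
            # Change 1: target-policy smoothing -- clipped noise on the
            # target action regularizes the value estimate.
            noise = (torch.randn_like(a) * policy_noise).clamp(-noise_clip,
                                                               noise_clip)
            a2 = (actor_target(s2) + noise).clamp(-1.0, 1.0)
            # Change 2: clipped double-Q -- take the minimum of the two
            # target critics to curb overestimation.
            q_min = torch.min(critic1_target(s2, a2), critic2_target(s2, a2))
            y = r + gamma * (1.0 - done) * q_min

        # Both critics regress toward the same target y.
        critic_loss = (F.mse_loss(critic1(s, a), y)
                       + F.mse_loss(critic2(s, a), y))
        critic_opt.zero_grad()
        critic_loss.backward()
        critic_opt.step()

        # Change 3: delayed updates -- move the actor and all target
        # networks only every `policy_delay` critic updates.
        if step % policy_delay == 0:
            actor_loss = -critic1(s, actor(s)).mean()
            actor_opt.zero_grad()
            actor_loss.backward()
            actor_opt.step()
            for net, tgt in ((actor, actor_target),
                             (critic1, critic1_target),
                             (critic2, critic2_target)):
                for p, p_t in zip(net.parameters(), tgt.parameters()):
                    p_t.data.mul_(1.0 - tau).add_(tau * p.data)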

Copyright

© 2021 Khoi et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)
