P-ISSN 0974-6846 E-ISSN 0974-5645

Indian Journal of Science and Technology



Year: 2021, Volume: 14, Issue: 30, Pages: 2460-2471

Original Article

Control and Simulation of a 6-DOF Biped Robot based on Twin Delayed Deep Deterministic Policy Gradient Algorithm


Objectives: To study an algorithm that controls a bipedal robot so that it walks with a gait close to that of a human. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is known to be highly efficient; it makes a few changes to the Deep Deterministic Policy Gradient (DDPG) algorithm, which is commonly used for continuous action space problems in reinforcement learning.

Methods: In contrast to the sparse reward function model that is usually used, this study proposes a reward model that combines a sparse reward function with a dense reward function. The TD3 algorithm is applied together with the proposed reward model to control a bipedal robot model with 6 degrees of freedom. The training process is simulated in the Gazebo/Robot Operating System (ROS) environment.

Findings: The results show that a reward model combining a sparse reward function and a dense reward function, chosen to suit the robot model, helps the robot learn faster and achieve better results. The biped robot can walk straight with an almost human-like gait. The results of the TD3 algorithm combined with the proposed reward model are also compared with results from other algorithms.

Novelty: The TD3 algorithm combined with the proposed reward model is applied to a 6-DOF biped robot, and the robot's gait is simulated in the Gazebo/ROS environment. ROS is a middleware that can also be used to control a robot in a real environment in the future.

Keywords: TD3; biped robot; reinforcement learning; ROS; Gazebo
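
The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the reward terms, so `combined_reward`, its weights, and the events it reacts to (falling, reaching a goal) are hypothetical. The function `td3_target` shows two of the changes TD3 makes to DDPG, target policy smoothing and clipped double Q-learning, for a single transition with a scalar action; a 6-DOF robot would use an action vector. The actor and critic arguments are placeholder callables, not trained networks.

```python
import random


def combined_reward(forward_velocity, fell, reached_goal,
                    velocity_weight=1.0, fall_penalty=-100.0, goal_bonus=100.0):
    """Hypothetical reward: a dense per-step term plus sparse event terms."""
    # Dense term: feedback at every step, here proportional to forward speed.
    reward = velocity_weight * forward_velocity
    # Sparse terms: fire only on rare events (assumed: a fall or reaching a goal).
    if fell:
        reward += fall_penalty
    if reached_goal:
        reward += goal_bonus
    return reward


def clip(x, low, high):
    """Clamp x to the closed interval [low, high]."""
    return max(low, min(high, x))


def td3_target(reward, next_state, done, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0,
               rng=random):
    """TD3 critic target for one transition with a scalar action."""
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise, -max_action, max_action)
    # Clipped double Q-learning: take the smaller of the two target critics.
    q_next = min(target_q1(next_state, next_action),
                 target_q2(next_state, next_action))
    # Do not bootstrap past a terminal state.
    return reward + gamma * (0.0 if done else 1.0) * q_next


if __name__ == "__main__":
    # Toy usage with constant placeholder actor and critics.
    r = combined_reward(forward_velocity=0.5, fell=False, reached_goal=False)
    y = td3_target(r, next_state=None, done=False,
                   target_actor=lambda s: 0.3,
                   target_q1=lambda s, a: 2.0,
                   target_q2=lambda s, a: 1.0)
    print(r, y)
```

The third change in TD3, delayed policy updates, concerns the training loop rather than the target: the actor and the target networks are updated less often than the critics, for example once every two critic updates.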




© 2021 Khoi et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)

