Reinforcement learning controller for an experimental inertia wheel pendulum

Antonio Concha-Sánchez

https://orcid.org/0000-0001-9005-3584

Mexico

Universidad de Colima

Facultad de Ingeniería Mecánica y Eléctrica

Brandom J. Jiménez-Hernández

Mexico

Universidad de Colima

Facultad de Ingeniería Mecánica y Eléctrica

Suresh Thenozhi

https://orcid.org/0000-0002-0288-9953

Mexico

Autonomous University of Queretaro

Facultad de Ingeniería

Ramón Jiménez-Betancourt

https://orcid.org/0000-0002-0171-7279

Mexico

Universidad de Colima

Facultad de Ingeniería Electromecánica

Suresh K. Gadi

https://orcid.org/0000-0001-7974-7825

Mexico

Universidad de Colima

Facultad de Ingeniería Electromecánica

Accepted: 14-06-2025

Published: 20-06-2025

DOI: https://doi.org/10.4995/riai.2025.23001

Keywords:

Inertia wheel pendulum, linear quadratic regulator, reinforcement learning, DDPG algorithm, system modeling and identification

Supporting agencies:

This research received no funding.

Abstract:

This article addresses the stabilization control of an inverted pendulum with an inertia wheel through the design of two control approaches. The first is a conventional controller based on the Linear Quadratic Regulator (LQR). The second, proposed as a non-conventional alternative, is a controller based on reinforcement learning (RL). The mechanical design, the mathematical model, and the parametric identification of a low-cost experimental platform developed to validate the controllers are presented. In addition, a state observer is designed using the Sylvester method to estimate the pendulum and wheel velocities, which both controllers require. The RL controller uses an actor-critic agent trained with the Deep Deterministic Policy Gradient (DDPG) algorithm, based on the mathematical model of the system. Finally, the performance of both controllers is compared through experimental results, concluding that the RL controller achieves a lower steady-state error, whereas the LQR exhibits a better transient response.


