Multimodal Control in Uncertain Environments Using Reinforcement Learning and Gaussian Processes

Mariano De Paula, Luis O. Ávila, Carlos Sánchez Reinoso, Gerardo G. Acosta

Abstract

Control of complex systems can be accomplished by decomposing the control task into a sequence of control modes, or simply modes. Each mode implements a feedback law until a termination condition is triggered in response to an exogenous/endogenous event indicating that execution of the mode must end. This work presents a novel approach for finding an optimal switching policy that solves the control problem by optimizing some cost/benefit measure. An optimal policy implements an optimal multimodal control program, which consists of a chaining of control modes. The proposal includes the development and formulation of an algorithm based on the idea of dynamic programming, integrating Gaussian processes and active Bayesian learning. The proposed approach makes efficient use of data to improve the exploration of solutions over continuous state spaces. A representative case study is addressed to demonstrate the performance of the proposed algorithm.
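To make the abstract's core idea concrete, the following is a minimal sketch (not the authors' algorithm) of Gaussian-process dynamic programming over a set of control modes: a GP generalizes the value function from a finite set of support states to a continuous state space, a Bellman backup picks the best mode at each support state, and the resulting value model induces a greedy switching policy. The scalar system, the three feedback modes, the quadratic cost, and all function names (`gp_posterior`, `policy`) are illustrative assumptions.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Squared-exponential kernel between 1-D state arrays."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """GP regression: posterior mean and variance at query states Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xs, X)
    mean = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, var

# Three hypothetical control modes (feedback laws) on a scalar state.
modes = [lambda x: 0.9 * x, lambda x: 0.9 * x + 0.5, lambda x: 0.9 * x - 0.5]
cost = lambda x: x ** 2                     # stage cost penalizes distance to 0
support = np.linspace(-2.0, 2.0, 15)        # support states for the GP value model
V = np.zeros_like(support)
gamma = 0.9

for _ in range(40):                         # value-iteration sweeps over the mode set
    nxt = np.array([[m(x) for m in modes] for x in support])   # successor states
    Vn, _ = gp_posterior(support, V, nxt.ravel())              # GP evaluates V off-grid
    Q = cost(support)[:, None] + gamma * Vn.reshape(nxt.shape)
    V = Q.min(axis=1)                       # Bellman backup: best mode per support state

def policy(x):
    """Greedy switching policy induced by the GP value model."""
    nxt = np.array([m(x) for m in modes])
    Vn, _ = gp_posterior(support, V, nxt)
    return int(np.argmin(cost(x) + gamma * Vn))
```

Here the support states sit on a fixed grid; the active Bayesian learning component described in the abstract would instead add support states where the GP posterior variance (also returned by `gp_posterior`) is largest, which is what makes the data usage efficient.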

Keywords

Multimodal control; Dynamic programming; Gaussian processes; Uncertainty; Policy




Cited by (articles indexed in Crossref)

This journal is a Crossref Cited-by Linking member. The following list shows articles that cite this one, retrieved automatically from Crossref where available. For more information about the system, please visit the Crossref site.

1. Henry Diaz, Leopoldo Armesto, Antonio Sala. 2020. Fitted Q-Function Control Methodology Based on Takagi–Sugeno Systems. IEEE Transactions on Control Systems Technology 28(2), 477. doi: 10.1109/TCST.2018.2885689



Creative Commons License

This journal is published under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

Universitat Politècnica de València     https://doi.org/10.4995/riai

e-ISSN: 1697-7920     ISSN: 1697-7912