Volume-weighted Bellman error method for adaptive meshing in approximate dynamic programming

Armesto, Leopoldo; Sala, Antonio

doi:10.4995/riai.2021.15698

dc.contributor.author	Armesto, Leopoldo	es_ES
dc.contributor.author	Sala, Antonio	es_ES
dc.date.accessioned	2021-12-21T10:27:13Z
dc.date.available	2021-12-21T10:27:13Z
dc.date.issued	2021-12-17
dc.identifier.issn	1697-7912
dc.identifier.uri	http://hdl.handle.net/10251/178687
dc.description.abstract	[EN] Optimal control and reinforcement learning have an associated "value function" that must be suitably approximated. Value function approximation problems usually have different precision requirements in different regions of the state space. A uniform grid wastes resources in regions where the value function is smooth and, on the other hand, lacks resolution in zones with abrupt changes. The present work proposes an adaptive meshing methodology that adapts to these changing requirements without excessively increasing the number of parameters of the approximator. The proposal is based on simplicial meshes and the Bellman error, with criteria to add and remove points from the mesh: modifications to proposals in earlier literature that include the volume of the affected simplices are proposed, along with methods to manipulate the mesh triangulation.	es_ES
dc.description.abstract	[ES] Optimal control and reinforcement learning have an associated "value function" that must be suitably approximated. These value function approximation problems usually have different precision requirements in different regions of the state space. A uniform mesh is problematic because it wastes resources in regions where the value function is smooth, while lacking sufficient resolution in zones where that function changes sharply. The present work proposes an approximate dynamic programming methodology with adaptive meshing, so as to adapt to these changing requirements without excessively increasing the number of parameters of the approximator. The proposal is based on simplicial meshes and on the error in the Bellman equation, with criteria to add and remove points from the mesh: proposals from the literature are modified by including the volume of the affected simplices in the criteria, and the necessary manipulations of the triangulation are detailed.	es_ES
dc.description.sponsorship	This article was funded by the Agencia Española de Investigación through the Plan Nacional project PID2020-116585GB-I00.	es_ES
dc.language	Spanish	es_ES
dc.publisher	Universitat Politècnica de València	es_ES
dc.relation.ispartof	Revista Iberoamericana de Automática e Informática industrial	es_ES
dc.rights	Attribution - NonCommercial - ShareAlike (by-nc-sa)	es_ES
dc.subject	Control inteligente	es_ES
dc.subject	Programación Dinámica Aproximada	es_ES
dc.subject	Control Óptimo	es_ES
dc.subject	Aprendizaje	es_ES
dc.subject	Intelligent control	es_ES
dc.subject	Approximate dynamic programming	es_ES
dc.subject	Optimal control	es_ES
dc.subject	Neural learning	es_ES
dc.title	Método de error de Bellman con ponderación de volumen para mallado adaptativo en programación dinámica aproximada	es_ES
dc.title.alternative	Volume-weighted Bellman error method for adaptive meshing in approximate dynamic programming	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.4995/riai.2021.15698
dc.relation.projectID	info:eu-repo/grantAgreement/AEI//PID2020-116585GB-I00/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Instituto Universitario de Automática e Informática Industrial - Institut Universitari d'Automàtica i Informàtica Industrial	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Escuela Técnica Superior de Ingeniería del Diseño - Escola Tècnica Superior d'Enginyeria del Disseny	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Instituto de Diseño para la Fabricación y Producción Automatizada - Institut de Disseny per a la Fabricació i Producció Automatitzada	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Escuela Técnica Superior de Ingenieros Industriales - Escola Tècnica Superior d'Enginyers Industrials	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Ingeniería de Sistemas y Automática - Departament d'Enginyeria de Sistemes i Automàtica	es_ES
dc.description.bibliographicCitation	Armesto, L.; Sala, A. (2021). Método de error de Bellman con ponderación de volumen para mallado adaptativo en programación dinámica aproximada. Revista Iberoamericana de Automática e Informática industrial. 19(1):37-47. https://doi.org/10.4995/riai.2021.15698	es_ES
dc.description.accrualMethod	OJS	es_ES
dc.relation.publisherversion	https://doi.org/10.4995/riai.2021.15698	es_ES
dc.description.upvformatpinicio	37	es_ES
dc.description.upvformatpfin	47	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	19	es_ES
dc.description.issue	1	es_ES
dc.identifier.eissn	1697-7920
dc.relation.pasarela	OJS\15698	es_ES
dc.contributor.funder	Agencia Estatal de Investigación	es_ES
dc.description.references	Albertos, P., Sala, A., 2006. Multivariable control systems: an engineering approach. Springer, London, U.K.	es_ES
dc.description.references	Allgower, F., Zheng, A., 2012. Nonlinear model predictive control.	es_ES
dc.description.references	Antos, A., Szepesvári, C., Munos, R., 2008. Learning near optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning 71 (1), 89-129. https://doi.org/10.1007/s10994-007-5038-2	es_ES
dc.description.references	Ariño, C., Pérez, E., Querol, A., Sala, A., 2014. Model predictive control for discrete fuzzy systems via iterative quadratic programming. In: Fuzzy Systems (FUZZ-IEEE), 2014 IEEE International Conference on. IEEE, pp. 2288-2293. https://doi.org/10.1109/FUZZ-IEEE.2014.6891633	es_ES
dc.description.references	Ariño, C., Pérez, E., Sala, A., 2010. Guaranteed cost control analysis and iterative design for constrained Takagi-Sugeno systems. Engineering Applications of Artificial Intelligence 23 (8), 1420-1427. https://doi.org/10.1016/j.engappai.2010.03.004	es_ES
dc.description.references	Armesto, L., Girbés, V., Sala, A., Zima, M., Smídl, V., 2015. Duality-based nonlinear quadratic control: Application to mobile robot trajectory-following. IEEE Transactions on Control Systems Technology 23 (4), 1494-1504. https://doi.org/10.1109/TCST.2014.2377631	es_ES
dc.description.references	Athans, M., Falb, P. L., 2013. Optimal control: an introduction to the theory and its applications. Courier Corporation.	es_ES
dc.description.references	Bertsekas, D. P., 2018. Abstract dynamic programming. Athena Scientific.	es_ES
dc.description.references	Bertsekas, D. P., Tsitsiklis, J. N., 1996. Neuro-Dynamic Programming. Athena Scientific, Belmont, MA, USA.	es_ES
dc.description.references	Busoniu, L., Babuska, R., De Schutter, B., Ernst, D., 2010. Reinforcement learning and dynamic programming using function approximators. CRC press, Boca Raton, FL, USA.	es_ES
dc.description.references	Busoniu, L., Ernst, D., De Schutter, B., Babuska, R., 2010. Approximate dynamic programming with a fuzzy parameterization. Automatica 46 (5), 804-814. https://doi.org/10.1016/j.automatica.2010.02.006	es_ES
dc.description.references	Camacho, E. F., Bordons, C., 2010. Control predictivo: Pasado, presente y futuro. Revista Iberoamericana de Automática e Informática Industrial 1 (3), 5-28.	es_ES
dc.description.references	De Farias, D. P., Van Roy, B., 2003. The linear programming approach to approximate dynamic programming. Operations research 51 (6), 850-865. https://doi.org/10.1287/opre.51.6.850.24925	es_ES
dc.description.references	Deisenroth, M. P., Neumann, G., Peters, J., et al., 2013. A survey on policy search for robotics. Foundations and Trends in Robotics 2 (1-2), 1-142. https://doi.org/10.1561/2300000021	es_ES
dc.description.references	Díaz, H., Armesto, L., Sala, A., 2019. Metodología de programación dinámica aproximada para control óptimo basada en datos. Revista Iberoamericana de Automática e Informática industrial 16 (3), 273-283. https://doi.org/10.4995/riai.2019.10379	es_ES
dc.description.references	Díaz, H., Armesto, L., Sala, A., 2020. Fitted Q-function control methodology based on Takagi-Sugeno systems. IEEE Transactions on Control Systems Technology 28 (2), 477-488. https://doi.org/10.1109/TCST.2018.2885689	es_ES
dc.description.references	Díaz, H., Sala, A., Armesto, L., 2020. A linear programming methodology for approximate dynamic programming. International Journal of Applied Mathematics and Computer Science 30 (2).	es_ES
dc.description.references	Duarte-Mermoud, M., Milla, F., 2018. Estabilizador de sistemas de potencia usando control predictivo basado en modelo. Revista Iberoamericana de Automática e Informática industrial. https://doi.org/10.4995/riai.2018.10056	es_ES
dc.description.references	Fairbank, M., Alonso, E., 2012. The divergence of reinforcement learning algorithms with value-iteration and function approximation. In: The 2012 International Joint Conference on Neural Networks (IJCNN). pp. 1-8. https://doi.org/10.1109/IJCNN.2012.6252792	es_ES
dc.description.references	Grüne, L., 1997. An adaptive grid scheme for the discrete Hamilton-Jacobi-Bellman equation. Numerische Mathematik 75, 319-337. https://doi.org/10.1007/s002110050241	es_ES
dc.description.references	Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2 (5), 359 - 366. https://doi.org/10.1016/0893-6080(89)90020-8	es_ES
dc.description.references	Inc, T. M., 2021. Matlab delaunay documentation. URL: https://www.mathworks.com/help/matlab/ref/delaunay.html	es_ES
dc.description.references	Lewis, F. L., Liu, D., 2013. Reinforcement learning and approximate dynamic programming for feedback control. Wiley, Hoboken, NJ, USA. https://doi.org/10.1002/9781118453988	es_ES
dc.description.references	Lewis, F. L., Vrabie, D., 2009. Reinforcement learning and adaptive dynamic programming for feedback control. Circuits and Systems Magazine, IEEE 9 (3), 32-50. https://doi.org/10.1109/MCAS.2009.933854	es_ES
dc.description.references	Li, W., Todorov, E., 2007. Iterative linearization methods for approximately optimal control and estimation of non-linear stochastic system. International Journal of Control 80 (9), 1439-1453. https://doi.org/10.1080/00207170701364913	es_ES
dc.description.references	Liberzon, D., 2011. Calculus of variations and optimal control theory: a concise introduction. Princeton university press. https://doi.org/10.2307/j.ctvcm4g0s	es_ES
dc.description.references	Munos, R., Moore, A., 2002. Variable resolution discretization in optimal control. Machine learning 49 (2-3), 291-323. https://doi.org/10.1023/A:1017992615625	es_ES
dc.description.references	Rubio, F. R., Navas, S. J., Ollero, P., Lemos, J. M., Ortega, M. G., 2018. Control óptimo aplicado a campos de colectores solares distribuidos. Revista Iberoamericana de Automática e Informática industrial.	es_ES
dc.description.references	Santos, M., 2011. Un enfoque aplicado del control inteligente. Revista Iberoamericana de Automática e Informática Industrial RIAI 8 (4), 283-296. https://doi.org/10.1016/j.riai.2011.09.016	es_ES
dc.description.references	Sherstov, A. A., Stone, P., 2005. Function approximation via tile coding: Automating parameter choice. In: International Symposium on Abstraction, Reformulation, and Approximation. Springer, pp. 194-205. https://doi.org/10.1007/11527862_14	es_ES
dc.description.references	Sutton, R. S., Barto, A. G., 1998. Reinforcement learning: An introduction. Vol. 1. MIT press Cambridge.	es_ES
dc.description.references	Ziogou, C., Papadopoulou, S., Georgiadis, M. C., Voutetakis, S., 2013. On-line nonlinear model predictive control of a PEM fuel cell system. Journal of Process Control 23 (4), 483-492. https://doi.org/10.1016/j.jprocont.2013.01.011	es_ES
dc.relation.references	10.1007/s10994-007-5038-2	es_ES
dc.relation.references	10.1109/FUZZ-IEEE.2014.6891633	es_ES
dc.relation.references	10.1016/j.engappai.2010.03.004	es_ES
dc.relation.references	10.1109/TCST.2014.2377631	es_ES
dc.relation.references	10.1016/j.automatica.2010.02.006	es_ES
dc.relation.references	10.1287/opre.51.6.850.24925	es_ES
dc.relation.references	10.1561/2300000021	es_ES
dc.relation.references	10.4995/riai.2019.10379	es_ES
dc.relation.references	10.1109/TCST.2018.2885689	es_ES
dc.relation.references	10.4995/riai.2018.10056	es_ES
dc.relation.references	10.1109/IJCNN.2012.6252792	es_ES
dc.relation.references	10.1007/s002110050241	es_ES
dc.relation.references	10.1016/0893-6080(89)90020-8	es_ES
dc.relation.references	10.1002/9781118453988	es_ES
dc.relation.references	10.1109/MCAS.2009.933854	es_ES
dc.relation.references	10.1080/00207170701364913	es_ES
dc.relation.references	10.2307/j.ctvcm4g0s	es_ES
dc.relation.references	10.1023/A:1017992615625	es_ES
dc.relation.references	10.1016/j.riai.2011.09.016	es_ES
dc.relation.references	10.1007/11527862_14	es_ES
dc.relation.references	10.1016/j.jprocont.2013.01.011	es_ES
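
The abstract in this record describes mesh refinement criteria that combine the Bellman error with the volume of the affected simplices, but the record does not give the actual formula. The following is only a minimal sketch of one plausible reading, assuming a 2-D triangular mesh and a score equal to simplex volume times the absolute Bellman error at the centroid. The function names, the score, and the toy error function are all hypothetical and are not taken from the article.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Triangle = Tuple[int, int, int]  # three indices into the vertex list


def triangle_area(a: Point, b: Point, c: Point) -> float:
    """Volume of a 2-D simplex (triangle area) via the cross-product formula."""
    return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def centroid(a: Point, b: Point, c: Point) -> Point:
    """Centroid of a triangle, used here as the error sampling point."""
    return ((a[0] + b[0] + c[0]) / 3.0, (a[1] + b[1] + c[1]) / 3.0)


def refinement_scores(
    vertices: Sequence[Point],
    triangles: Sequence[Triangle],
    bellman_error: Callable[[Point], float],
) -> List[float]:
    """ASSUMED criterion: score = volume * |Bellman error at the centroid|."""
    scores = []
    for i, j, k in triangles:
        a, b, c = vertices[i], vertices[j], vertices[k]
        scores.append(triangle_area(a, b, c) * abs(bellman_error(centroid(a, b, c))))
    return scores


def pick_triangle_to_refine(
    vertices: Sequence[Point],
    triangles: Sequence[Triangle],
    bellman_error: Callable[[Point], float],
) -> int:
    """Index of the triangle with the largest volume-weighted score."""
    scores = refinement_scores(vertices, triangles, bellman_error)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: one large triangle (area 8) and one small triangle (area 0.5).
vertices = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (5.0, 0.0), (4.0, 1.0)]
triangles = [(0, 1, 2), (1, 3, 4)]


def toy_error(p: Point) -> float:
    """Hypothetical stand-in for a Bellman error: larger for x > 4."""
    return 3.0 if p[0] > 4.0 else 1.0


if __name__ == "__main__":
    print(refinement_scores(vertices, triangles, toy_error))
    print(pick_triangle_to_refine(vertices, triangles, toy_error))
```

In this toy case the small triangle has the larger pointwise error, yet the large triangle obtains the higher score once the volume is included, which illustrates how a volume weighting can change which simplex is selected for refinement. In practice `toy_error` would be replaced by the error of the approximated value function in the Bellman equation, and the triangulation would come from a meshing library.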
