TCC (Undergraduate Thesis) - Bachelor's Degree in Computer Science (Main Campus)
Permanent URI for this collection: https://arandu.ufrpe.br/handle/123456789/415
A reinforcement learning curriculum for shaped rewards in Lunar Lander (2021-07-19)
Albuquerque, Renilson da Silva; Sampaio, Pablo Azevedo; http://lattes.cnpq.br/8865836949700771; http://lattes.cnpq.br/3364503614448061

Reinforcement learning is a machine learning paradigm in which an agent learns to solve problems by interacting with an environment, executing actions through trial and error. For each action performed, the agent receives a reward from the environment indicating how much that action contributed to solving the overall problem, and the agent's objective is to maximize the total reward received. In some reinforcement learning problems, however, the agent must learn complex tasks while receiving uninformative rewards, which leads to the credit assignment problem and slows down training. Reward shaping and curriculum learning are techniques that can shorten training: curriculum learning splits the problem into smaller tasks that are solved in sequence, and reward shaping gives the agent smaller, more informative rewards for each action it performs. Lunar Lander is a simplified 2D simulator used as a benchmark for reinforcement learning solutions to the optimization problem of controlling the landing of a lunar module. Its standard reward system, however, assigns heavily punitive rewards for engine use, which gives the agent little constructive feedback and can lead to the credit assignment problem. This work therefore proposes a curriculum built on two additional shaped reward models and runs experiments aimed at minimizing the learning time in Lunar Lander. The experiments found that both the new reward models and the curriculum were more effective at training the Lunar Lander agent than the standard reward model.
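The interplay of sparse rewards, reward shaping, and a curriculum can be illustrated with a small sketch. The abstract does not describe the thesis's two shaped reward models or its curriculum, so everything below is an assumption: a hypothetical one-dimensional "landing" toy (not the Lunar Lander simulator), a sparse reward paid only on touchdown, a potential-based shaped reward r + γΦ(s') − Φ(s) with an assumed potential Φ(s) = −s, and a curriculum that runs tabular Q-learning on progressively higher start heights while reusing one Q-table. Potential-based shaping is a standard form known to leave the optimal policy unchanged; it is not necessarily the form used in this work, and all function names are hypothetical.

```python
import random

# Hypothetical toy stand-in for a landing task (NOT the Lunar Lander
# simulator): the lander sits `height` cells above the pad (state 0) and
# may climb, hover, or descend; the ceiling is the stage's start height.
ACTIONS = (1, 0, -1)  # climb, hover, descend; ties in argmax favor climbing


def step(state, action, ceiling):
    """Apply an action, clamp to [0, ceiling], report whether we landed."""
    nxt = min(max(state + action, 0), ceiling)
    return nxt, nxt == 0


def sparse_reward(state, nxt, done):
    # Uninformative signal: only the landing is rewarded, so early moves
    # get no direct credit (the credit assignment problem).
    return 1.0 if done else 0.0


def potential(state):
    # Assumed potential: higher when closer to the pad; Phi(pad) = 0.
    return -float(state)


def shaped_reward(state, nxt, done, gamma=0.99):
    # Potential-based shaping r + gamma * Phi(s') - Phi(s); gamma should
    # match the discount used by the learner.
    return sparse_reward(state, nxt, done) + gamma * potential(nxt) - potential(state)


def greedy(q_row):
    # Index of the best action; max() keeps the first one on ties.
    return max(range(len(ACTIONS)), key=lambda i: q_row[i])


def q_learning(q, reward_fn, start, episodes, rng,
               alpha=0.5, gamma=0.99, eps=0.1, max_steps=200):
    """Epsilon-greedy tabular Q-learning on the task 'land from `start`'."""
    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            a = rng.randrange(len(ACTIONS)) if rng.random() < eps else greedy(q[s])
            s2, done = step(s, ACTIONS[a], ceiling=start)
            target = reward_fn(s, s2, done)
            if not done:
                target += gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break


def train(reward_fn, stages, episodes_per_stage, seed=0):
    """Curriculum: solve easier (lower) starts first, reusing the same Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(max(stages) + 1)]
    for start in stages:
        q_learning(q, reward_fn, start, episodes_per_stage, rng)
    return q


def greedy_landing_steps(q, start, limit=100):
    """Steps the greedy policy needs to land from `start`, or None."""
    s = start
    for t in range(1, limit + 1):
        s, done = step(s, ACTIONS[greedy(q[s])], ceiling=start)
        if done:
            return t
    return None


if __name__ == "__main__":
    H = 10
    setups = {
        "sparse, no curriculum": (sparse_reward, [H]),
        "sparse, curriculum": (sparse_reward, list(range(1, H + 1))),
        "shaped, no curriculum": (shaped_reward, [H]),
        "shaped, curriculum": (shaped_reward, list(range(1, H + 1))),
    }
    for name, (reward_fn, stages) in setups.items():
        # Same total episode budget (20 * H) for every setup.
        q = train(reward_fn, stages, episodes_per_stage=20 * H // len(stages))
        print(f"{name}: greedy landing steps = {greedy_landing_steps(q, H)}")
```

Running the script trains the four setups with the same total episode budget and prints how many steps the greedy policy needs to land from the top height (None if it never lands within the limit). The numbers depend on the seed, budget, and hyperparameters, so they illustrate the mechanics only and do not reproduce this work's results.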