A project by Joseph Flaherty and Aaron Jimenez

Machine Learning aims to create intelligent systems that can perform specific tasks without being explicitly programmed how to do so. Reinforcement Learning is an extremely popular sub-field of machine learning in which an agent learns through trial and error, much like humans do. Reinforcement Learning has been used extensively, and with great success, to create game-playing AI. Despite the popularity of the Pokémon series, no approach, Reinforcement Learning or otherwise, has been capable of conquering any of the games. In our capstone project, we aimed to create an agent that can tackle Pokémon Red.
Between its average length of over 25 hours, its contrasting overworld and battle systems, its inherent randomness, and its non-linear gameplay, Pokémon Red is without a doubt an immensely daunting task for current Reinforcement Learning algorithms. The game is considered beaten once the player defeats the Elite Four and the Champion in a series of back-to-back Pokémon battles. To accomplish this, the player must first overcome numerous challenges in the overworld and collect 8 gym badges by defeating 8 of the strongest Pokémon trainers across the region.
Image: The player traveling through Route 1 in the overworld
To allow our agent to learn how to play the game, we first needed to create an environment for Pokémon Red. To do so, we created an integration for the Gym Retro library, which supports environments for various game consoles via emulation, although very few games are fully integrated. Creating an integration involved locating the memory addresses that hold useful information and using those values to define a reward function so the agent knows how well it's doing.
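As a rough illustration of what loading a finished integration looks like (the folder and game names below are placeholders for whatever the integration directory is actually called, not our exact setup), Gym Retro lets you register a custom integration path and then create the environment like any other:

```python
import os
import retro

# Register the directory holding the custom integration. That folder contains
# data.json (named RAM addresses), scenario.json (reward/done conditions),
# a save state, and rom.sha. "PokemonRed-GameBoy" is a placeholder name.
retro.data.Integrations.add_custom_path(
    os.path.join(os.path.dirname(os.path.abspath(__file__)), "custom_integrations")
)

env = retro.make("PokemonRed-GameBoy", inttype=retro.data.Integrations.ALL)

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
print(reward, info)  # info exposes the variables named in data.json
```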
The Reinforcement Learning algorithm we ultimately used was Advantage Actor-Critic (A2C). A2C features two key components: the actor and the critic. The actor is the policy that decides which action to take in each situation, while the critic estimates how good the situations the actor ends up in actually are. The gap between the critic's estimate and what actually happens, called the advantage, is the feedback used to adjust the actor's behavior.
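To make the actor and critic concrete, here is a minimal, illustrative sketch of the A2C loss in PyTorch. This is not our exact training code, just the standard form of the update:

```python
import torch
import torch.nn.functional as F

def a2c_loss(logits, values, returns, actions,
             value_coef=0.5, entropy_coef=0.01):
    """One A2C update, given a rollout of action logits, value estimates,
    observed returns, and the actions that were actually taken."""
    # Advantage: how much better the rollout turned out than the critic expected
    advantages = returns - values.detach()

    log_probs = F.log_softmax(logits, dim=-1)
    taken_log_probs = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)

    # Actor: increase the probability of actions with positive advantage
    actor_loss = -(taken_log_probs * advantages).mean()
    # Critic: move the value estimates toward the observed returns
    critic_loss = F.mse_loss(values, returns)
    # Entropy bonus keeps the policy from collapsing too early
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()

    return actor_loss + value_coef * critic_loss - entropy_coef * entropy
```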
We compared several different design options in order to improve the performance of our agent. One notable addition was the use of Imitation Learning, a technique in which the agent learns a task by watching demonstrations of another person or AI performing it. To accomplish this, we created a script that records and packages our own gameplay, and we then used those demonstrations of ourselves playing to teach the agent. We used this technique for some initial pre-training so that the agent had a rough starting point; when it then began learning through reinforcement learning, it already had some idea of what works. This is extremely beneficial given the game's massive state space, because the agent is much more likely to find a viable strategy within a reasonable amount of training.
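As a sketch of what that pre-training step amounts to (the names here are illustrative, not our actual code), behavioral cloning simply treats the recorded demonstrations as a supervised dataset of observation/action pairs:

```python
import torch.nn.functional as F

def pretrain_on_demos(policy, demo_obs, demo_actions, optimizer, epochs=10):
    """Behavioral cloning: fit the policy network to recorded
    (observation, action) pairs before any reinforcement learning happens."""
    for _ in range(epochs):
        logits = policy(demo_obs)                     # (N, n_actions)
        loss = F.cross_entropy(logits, demo_actions)  # match the demonstrator's choices
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return policy
```

The pre-trained weights then serve as the starting point for the A2C training described above.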
Hi Joseph and Aaron,
This looks like such a fun way to learn about Reinforcement Learning. I am curious how difficult it was for you to integrate this game into Gym. Was that process pretty well documented as well?
Prof. Isaacs
Hi Dr. Isaacs,
Sorry for the late reply; WordPress's default setting that requires comments to be approved before they are displayed caused us to miss this earlier! We've removed that setting now, so hopefully no more headaches will ensue.
The process of creating the integration was moderately difficult, but after some time we were able to get everything into the correct configuration. The most challenging part was creating both the data for the reward function and the reward function itself. In a simple game, one might be able to tie the reward function to the score the agent achieves, a linear relationship to just one value in RAM. In Pokémon Red, however, there is no high score, so we had to be more creative with how we rewarded the agent. Getting this right is critical to success because the loss function the algorithm tries to minimize derives directly from the reward function we created.
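As a rough illustration of the idea (the variable names below are placeholders for values an integration might expose, not our actual implementation), the reward ends up looking less like a score and more like a sum of progress signals:

```python
import gym

class ProgressRewardWrapper(gym.Wrapper):
    """Rewards *increases* in progress variables instead of a raw score.
    The info keys ("badges", "party_level") stand in for whatever variables
    the integration's data.json actually exposes."""

    def reset(self, **kwargs):
        self.prev = None
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        current = {"badges": info.get("badges", 0),
                   "party_level": info.get("party_level", 0)}
        if self.prev is None:
            self.prev = current
        reward = (10.0 * (current["badges"] - self.prev["badges"])
                  + 1.0 * (current["party_level"] - self.prev["party_level"]))
        self.prev = current
        return obs, reward, done, info
```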
The need for more data and a better reward function meant that our integration was updated regularly throughout the project. If you're curious about all of the parts involved, see this page of the documentation: https://retro.readthedocs.io/en/latest/integration.html
That page was quite helpful, but it assumes a much simpler reward function than our task required, so we ended up expanding on it and writing a more complex reward function in Lua. This portion was not documented as well, but the documentation did at least bring us to the point where we had an emulated game acting as an environment for an RL agent, which was enough to set us in the right direction.
Thank you for your reply!
-Joseph