Guide to making AI agents that are so good, it's actually scary šŸ¤–

Luis Gonzalez
8 min read · Mar 13, 2021

A blog post for AI and game development enthusiasts who want to use leading-edge technology to create amazing AIs: better, faster, and scarily good, by learning from humans.

Introduction to Reinforcement Learning (RL)

Before we dive into creating AI agents that can perform complex tasks, it's worth introducing some concepts that will be used throughout this blog post. So let's start with: what is Reinforcement Learning (RL)? RL is an area of Machine Learning focused on creating intelligent agents that can complete tasks on their own using reinforcement signals (punishments or positive rewards). RL algorithms have been used in industries ranging from robotics, manufacturing, and energy to video games, with notable examples being DeepMind's StarCraft II and Go agents and OpenAI's Dota 2 agents, which beat some of the best players in their respective games. A straightforward way to describe an RL agent and its relationship with its environment can be seen in Figure 1: the agent continually receives a state and a reward from the environment, takes an action, and thereby changes the environment's state and the reward it gets next.

Figure 1: Basic diagram of how the main components of a reinforcement learning algorithm interact with each other. https://www.kdnuggets.com/2018/03/5-things-reinforcement-learning.html
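
If you prefer code to diagrams, the loop in Figure 1 can be sketched in a few lines of Python, here using OpenAI Gym's CartPole as a stand-in environment (purely illustrative: the reset/step signatures vary slightly between Gym versions, and this "agent" just picks random actions):

# A minimal sketch of the agent-environment loop from Figure 1.
import gym

env = gym.make("CartPole-v1")
state = env.reset()                      # environment hands the agent an initial state
total_reward = 0.0

for step in range(200):
    action = env.action_space.sample()   # a real agent would choose an action from its policy
    state, reward, done, info = env.step(action)  # environment returns the next state and a reward
    total_reward += reward               # this reward signal is what RL algorithms learn from
    if done:                             # episode ends (e.g., the pole falls over)
        break

print("Episode return:", total_reward)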

Drawbacks of Reinforcement Learning

Unfortunately, RL algorithms are not the solution for every problem, as they have drawbacks that can make some implementations too computationally expensive to run. For instance, high-dimensional environments, very delayed rewards, and difficulty understanding the correlation between tasks are hurdles that can make training an agent very tedious and expensive. Even a seemingly simple relationship, like understanding that completing action A unlocks the ability to do action B, might be too complicated for a simple agent to learn. A great example of this A-B relationship appears in the famous Atari game "Montezuma's Revenge," where picking up a key (A) gives an agent the ability to open a door and collect a reward (B). Although this might sound like a simple task, if we let our agent explore the game on its own without "helping" it understand the correlation between tasks, it might never discover the key-door relationship and thus never reach its goal. But don't worry: there are ways of solving this, and in this blog post we will use one possible solution, known as imitation learning, to solve a similar problem.

Figure 2: A scene from Montezuma's Revenge showing the room with the key and the closed doors. https://www.theverge.com/2016/6/9/11893002/google-ai-deepmind-atari-montezumas-revenge

Imitation Learning

Imitation Learning is a set of techniques that help agents mimic human behavior by mapping the state-action pairs found in human demonstrations or recordings and letting an agent learn from those recordings. Imitation Learning is not so different from how humans learn specific tasks: we often imitate others' behavior when we find ourselves in completely novel environments. For instance, if you are driving in a foreign city, you might imitate how the drivers in front of you move on the road, since they probably have more experience with where the potholes are.

It's worth mentioning that imitation learning encompasses a family of distinct algorithms that all aim to do a similar job. For this particular blog post, we will be using a Generative Adversarial Imitation Learning (GAIL) approach, which is similar to Direct Policy Learning. In short, both methods feed human-created data to an agent so it can learn a policy from it, then let the agent continue learning and updating its policy iteratively on its own, leveraging the human-created data as well as the agent's own data. For more information on different imitation learning algorithms, SmartLab created a great blog post on the topic here and a GAIL-specific post here. In GAIL we also have a discriminator that assesses whether a sample came from a human or was created by the agent itself; this will be explained further later in the post.
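
To make the discriminator idea concrete, here is a toy Python/numpy sketch of it (this is my own illustration, not how ML-Agents implements GAIL; all dimensions, data, and the simple logistic-regression discriminator are made up for the example):

# Toy GAIL sketch: a discriminator learns to tell expert (state, action) pairs
# from the agent's own pairs, and its output is turned into a reward for the agent.
import numpy as np

rng = np.random.default_rng(0)
feat_dim = 6                               # size of a concatenated (state, action) vector
w = np.zeros(feat_dim)                     # weights of a logistic-regression discriminator

def discriminator(sa):
    """Probability that each (state, action) pair came from the human expert."""
    return 1.0 / (1.0 + np.exp(-sa @ w))

def gail_reward(sa):
    """Reward the agent for pairs the discriminator mistakes for expert data."""
    return -np.log(1.0 - discriminator(sa) + 1e-8)

# Fake batches: expert pairs would come from a human demo, agent pairs from rollouts.
expert_sa = rng.normal(1.0, 1.0, size=(64, feat_dim))
agent_sa = rng.normal(-1.0, 1.0, size=(64, feat_dim))

# Train the discriminator: push expert pairs toward 1 and agent pairs toward 0.
for _ in range(500):
    grad = expert_sa.T @ (1.0 - discriminator(expert_sa)) - agent_sa.T @ discriminator(agent_sa)
    w += 0.05 * grad / len(expert_sa)

print("mean GAIL reward on agent samples :", gail_reward(agent_sa).mean())
print("mean GAIL reward on expert samples:", gail_reward(expert_sa).mean())

In the real algorithm, the agent's policy is then updated (with an RL algorithm such as PPO) to maximize this discriminator-based reward, so its behavior drifts toward the human demonstrations.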

Now that we have most of the definitions out of the way, we can create an agent using Imitation Learning.

Step 1: Download Unity

Unity is a game engine that is free for students and individuals, and it offers a vast ecosystem of tools with a lot of support from game developers around the world. Download the Unity Hub application from the main Unity website here. Unity also has external packages that can be downloaded to import certain assets, scripts, and animations. For our AI examples, we will only use ML-Agents (the Unity ML-Agents Toolkit), an official Unity package that includes various examples and code snippets to train and visualize AI agents. You can also run multiple agents in parallel with ML-Agents, which greatly speeds up training.

Once you have successfully downloaded Unity, you will see an application called Unity Hub, where you can manually install specific versions of Unity on your computer. For this blog post, we will be using Unity 2018.4.31f1.

Step 2: Download Python

ML-Agents needs Python to be installed. I will be using Python 3.9.2 and pip 21.0.1 (Python's package installer), but feel free to try this blog post with a different Python configuration. Figure 3 shows the configuration options that can be used with ML-Agents.

Figure 3: Possible ML-Agents, Unity, and Python configurations that can be used

With Python, pip, and Unity installed, we can now run the following command in our computer's terminal to install the ML-Agents Python package:

pip install mlagents
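
To double-check that the installation worked, you can ask the toolkit's training command for its help text; if it prints a usage message, the package is available on your path:

mlagents-learn --help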

Step 3: Clone the GitHub Repository

Now clone the ML-Agents repository to your computer from its official GitHub page. If you don't have a GitHub account, feel free to make one here.
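
If you prefer the command line, cloning the repository (the same URL listed at the end of this post) looks like this:

git clone https://github.com/Unity-Technologies/ml-agents.git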

It's quite possible that you will encounter errors just installing these packages, so feel free to search Stack Overflow for ML-Agents-related questions, or contact me if you have trouble running these packages.

Step 4: Running ML-Agents Demo Scenes

Assuming you were able to download all the relevant packages and clone the GitHub repository, you can now create a new Unity 3D project. (Make sure to install the specific Unity version you want to use before creating the project.) Now open your Unity project, and it should load an empty world like the one in Figure 4.

Figure 4: Screenshot of an empty Unity 3D world

Now import the ML-Agents packages from the Package Manager, found under the "Window" tab at the top left. Search for ML Agents and add the relevant packages to your project. You should also be able to import the example assets from your cloned ml-agents repository into your Unity 3D project. If you got to this step, your setup is done.

Figure 5: A collage of some of the games/scenes found in the default ML-Agents examples folder. Once a scene is initialized, the default AI agents provided in the package are used. (Collage created using screenshots from my computer)

There are a bunch of example AIs worth exploring; feel free to go to the Examples folder and browse the game scenes that are already present. We will focus on creating an agent for the Pyramids game, as it is quite complicated and best shows how Imitation Learning can considerably speed up our agent's training. (You can open this scene by going to ML-Agents > Examples > Pyramids > Scenes > Pyramids.)

Figure 6: The Pyramids game (screenshot from my computer)

Step 5: Start Recording

Once your scene is set up, try running the game and see how the default AI agent behaves. It's alright at the game, nothing too extraordinary. Now, while in edit mode, select the agent in the scene and add an ML-Agents component using the Inspector tab on the right (Figure 7 helps for reference). Scroll down, press the Add Component button, select ML Agents, and add a "Demonstration Recorder" script. Once the script is added, toggle the "Record" checkbox and give your recording a name (optionally, also name the folder you want to save your recording to).

Now, in the same Inspector tab, find the "Behavior Parameters" component and change the "Behavior Type" property to "Heuristic Only" (it is usually set to Default). These steps set up our environment to record our movements in the game. Now start the game and notice that you are controlling the agent with your keyboard. Make sure to do multiple runs with the agent (the controls are WASD). Once you finish recording, a file will appear wherever you set your recordings to be saved. You can view stats about your gameplay by clicking on the recording, as seen in Figure 8.

Now look for the project's "config" folder, which might be outside your Unity folder (it lives in the cloned ml-agents repository).

Inside that folder, go into imitation and then Pyramids (recap: config > imitation > Pyramids). Open the YAML file in a code or text editor and copy the following lines into it:

gail:
  strength: 0.01
  gamma: 0.99
  encoding_size: 128
  demo_path: "Full Path of your Game Recording"

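For context, in recent ML-Agents releases this gail block sits inside the reward_signals section of the Pyramids behavior, roughly like the sketch below. Treat the surrounding keys as an illustration of where the snippet goes rather than the exact contents of your file, since they vary between versions:

behaviors:
  Pyramids:
    trainer_type: ppo
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      gail:
        strength: 0.01
        gamma: 0.99
        encoding_size: 128
        demo_path: "Full Path of your Game Recording"
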
Feel free to play around with properties like strength, gamma, encoding_size, and so forth. There is a great official Unity blog post about GAIL and its parameters here.
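
To actually train with this config, the toolkit's training command is launched from the root of the cloned repository. The config filename and run ID below are assumptions based on recent ML-Agents releases (adjust them to whatever your version uses):

mlagents-learn config/imitation/Pyramids.yaml --run-id=PyramidsImitation

Once the terminal reports that it is listening and waiting for the editor, pressing Play in Unity should connect the scene to the trainer.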

Now press Play again and you will notice a difference in how your agent plays. The difference in behavior is because the agent will now imitate some of the actions that you recorded. I leave you with the following GIF of how my agent played.

GIF 1: Recording of how our agent becomes able to finish the Pyramids game by leveraging the recordings made by a human player.

Congratulations, you were able to record yourself playing a game in order to help an AI agent navigate a very complicated task šŸ„³šŸ„³šŸ„³.

I plan to do another tutorial later on about how to create an agent brain from scratch, as well as how to submit your trained models to an AI model aggregator that is coming soon. Thank you for your time, and I hope you found this helpful.

Links mentioned for quick access.

Unity blog post on GAIL (Training your agents 7 times faster with ML-Agents)

https://blogs.unity3d.com/2019/11/11/training-your-agents-7-times-faster-with-ml-agents/

Great Intro to Reinforcement Learning

https://www.kdnuggets.com/2018/03/5-things-reinforcement-learning.html

Montezuma's Revenge explanation and why it's so complicated to solve

https://www.theverge.com/2016/6/9/11893002/google-ai-deepmind-atari-montezumas-revenge

Imitation Learning Post from SmartLab

https://smartlabai.medium.com/a-brief-overview-of-imitation-learning-8a8a75c44a9c

Unity ML-Agents GitHub

https://github.com/Unity-Technologies/ml-agents
