Effective talk: Making neural networks in games
Year of Talk: 2024
Reinforcement learning in-game
When the team initially worked on neural networks to act as players for a prospective game, they found that fully training them typically required 16+ hours, and the results often did not work as intended. This was partly because they were limited by the hardware available at the time, and because much of the work had to be done in a very non-visual way while programming.
This, combined with the long process of installing and debugging various software, meant that not many people on the team could try it out and test new ideas. So they needed to find a way to make it more accessible.
A solution the team worked on, and turned into a game mechanic, was real-time reinforcement learning training. The agent initially appears unable to do any tasks, but the player can enter a space called “The Training Cloud” to train it. There, the player chooses from a list of interactable items to place in the area, as well as the reward attached to each item. The agent then trains itself to complete the task. Given the small test area, the agent can only be trained on a handful of items and tasks, which keeps training time limited; this meant it could solve single problems in under 10 minutes.
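The talk doesn't show the actual data structures behind the Training Cloud, but the setup it describes (pick interactable items, assign rewards, keep sessions short) could be sketched roughly like this; all names and values below are assumptions, not the team's code.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a "Training Cloud" session; the talk does not show
# the real data structures, so every name here is an assumption.
@dataclass
class RewardRule:
    item: str          # interactable item placed in the training area
    reward: float      # reward granted when the agent interacts with it

@dataclass
class TrainingSession:
    items: list[RewardRule] = field(default_factory=list)
    max_minutes: int = 10   # keep single problems trainable in under 10 minutes

session = TrainingSession(items=[
    RewardRule(item="lever", reward=1.0),
    RewardRule(item="spike_trap", reward=-1.0),
])
```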
Designing a world for training
The first problem to solve was designing the game in a way that would let players better visualize and work on training the different agents.
For example, the team wanted to make sure the player could understand how the agents learn and could visually see them getting better over time. The team also wanted the agents to be able to interact with each other in the game space. At the same time, they planned to make sure each agent wouldn't over-optimize, so the agents could handle more general tasks.
Making a more general-purpose agent
The way this was done was by extracting each agent's abilities into different items that can be equipped or unequipped, where each item lets the agent complete a certain task on a given square. This was simplified even further by representing the map as a 2D grid where only one item could appear per tile, which made the information easier for the agent to process.
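As a rough illustration of that grid representation (not the team's actual code), the world can be stored as a 2D array with one item id per tile and handed to the agent as a flat observation; the item ids and grid size below are assumptions.

```python
import numpy as np

# Hypothetical sketch of the 2D grid world described above: one item id per
# tile keeps the observation the agent has to process very small.
EMPTY, LEVER, DOOR, CHEST = 0, 1, 2, 3   # assumed item ids

grid = np.zeros((8, 8), dtype=np.int8)   # assumed 8x8 test area
grid[2, 3] = LEVER
grid[5, 5] = CHEST

# The agent's observation could simply be the flattened grid plus its position.
def observe(grid, agent_pos):
    return np.concatenate([grid.flatten(), np.array(agent_pos, dtype=np.int8)])
```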
Splitting the different actions into a turn-based system lets the agent spend more time deciding which action to take. The agent could choose from a maximum of 8 actions on each of its turns. Initially, the team planned on having more options, but found that limiting themselves to only 8, with certain actions chaining when used one after another, still allowed for more behaviors. In the long term this sped up decision-making, as more time was spent recognizing patterns instead of choosing among actions. Much of the information passed to the agents was a string of booleans indicating which actions could be performed on each tile. This was later condensed: certain actions had similar effects, so combining them into larger groups sped up the training process significantly.
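A minimal sketch of that per-tile information, assuming 8 placeholder action names (the real action list isn't given in the talk): each tile carries one boolean per action, and actions with similar effects are merged into groups.

```python
import numpy as np

# Hypothetical per-tile action flags: one boolean per (tile, action) saying
# whether that action can be performed there.
ACTIONS = ["move", "push", "pull", "pick_up", "drop", "use", "attack", "wait"]  # assumed names
valid = np.zeros((8, 8, len(ACTIONS)), dtype=bool)

# Grouping actions with overlapping effects shrinks what the network must learn.
GROUPS = {"move": "move", "push": "manipulate", "pull": "manipulate",
          "pick_up": "inventory", "drop": "inventory",
          "use": "interact", "attack": "interact", "wait": "wait"}

def condensed(valid_tile):
    """Collapse the per-action flags for one tile into one flag per action group."""
    out = {}
    for action, flag in zip(ACTIONS, valid_tile):
        group = GROUPS[action]
        out[group] = out.get(group, False) or bool(flag)
    return out
```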
The devs also spent considerable time on the character designs and their animations as a way to communicate to players how the agents slowly become smarter as they train more.
The team was able to have the neural networks build themselves out to between 128 neurons (in 2 layers) and 512 neurons (in 4 layers). They determined that this range allowed the agents to be complex enough for the tasks in the game while still being simple to create.
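For scale, a network in that range could be built like the sketch below. The talk doesn't say which framework the Python side uses, so PyTorch here, along with the observation and action sizes, is an assumption.

```python
import torch.nn as nn

# Sketch of the network sizes mentioned above: 2 layers of 64 neurons (128 total)
# up to 4 layers of 128 neurons (512 total). Framework and sizes are assumptions.
def build_agent_net(obs_size, n_actions, layers=2, width=64):
    blocks, in_size = [], obs_size
    for _ in range(layers):
        blocks += [nn.Linear(in_size, width), nn.ReLU()]
        in_size = width
    blocks.append(nn.Linear(in_size, n_actions))
    return nn.Sequential(*blocks)

# Example: smallest configuration, assuming a 66-value observation and 8 actions.
net = build_agent_net(obs_size=66, n_actions=8, layers=2, width=64)
```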
Designing the training experience
A problem early on was that many of the agents would over-specialize on certain levels, which caused complications when they moved to new levels. The solution was to train the agents on one set of levels and then evaluate them on a separate evaluation set.
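A minimal sketch of that split, assuming simple named levels and an arbitrary ratio:

```python
import random

# Hypothetical train/evaluation split; the level format and ratio are assumptions.
levels = [f"level_{i:02d}" for i in range(20)]
random.shuffle(levels)

train_levels = levels[:15]   # used while the agent is learning
eval_levels = levels[15:]    # held out to check the agent hasn't over-specialized
```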
The training process in the game consists of observing the training progress, how the agent reacts to the rewards, and how its behavior changes. Training can then be stopped, modifications can be made to the environment or rewards to improve training, and training can start again.
Something else the team implemented was having the player set end conditions. This sped up testing, since there was less chance of agents getting stuck trying to acquire final items or continuing after the main task was already complete. Players could also control how much the map would change in each iteration, which let them push agents to specialize on certain types of levels or become better across a wider array of levels. Some of the variations included player starting positions, item positions, and terrain positions.
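These player-facing options could be represented by something like the following config sketch; the field names and defaults are assumptions, not the team's implementation.

```python
from dataclasses import dataclass

# Hypothetical player-facing training options: end conditions that stop an
# episode early, and how much the level is randomized between iterations.
@dataclass
class EpisodeConfig:
    end_on_task_complete: bool = True          # stop once the main task is done
    end_on_item_collected: str | None = None   # e.g. stop once a final item is held
    max_steps: int = 200

@dataclass
class VariationConfig:
    randomize_start_position: bool = True
    randomize_item_positions: bool = True
    randomize_terrain: bool = False   # low variation -> more specialized agents
```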
During testing, most of the runs are calculated in the backend of the game, and the average is shown in a graph, which lets players see how all the agents are performing over time. The level data and all inputs are gathered in Unity and transferred to a Python program running in the background; since Unity had very poor performance when running multiple ML agents at the same time, the ML was extracted and run in Python. During testing, one of the runs is shown to the player to illustrate training over time and help them see how the agents are progressing.
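The talk doesn't describe the transport between Unity and the Python process. One common way to build such a bridge is a local socket carrying JSON messages, sketched below from the Python side; everything here is an assumption rather than the team's implementation.

```python
import json
import socket

# Hypothetical bridge: Unity sends one JSON message per line with level data and
# inputs; the Python process replies with an action per agent.
HOST, PORT = "127.0.0.1", 9000

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind((HOST, PORT))
    server.listen(1)
    conn, _ = server.accept()
    with conn:
        reader = conn.makefile("r")
        for line in reader:                  # one JSON message per line from Unity
            state = json.loads(line)         # level data + inputs for each agent
            reply = {"agent_actions": [0] * len(state.get("agents", []))}
            conn.sendall((json.dumps(reply) + "\n").encode())
```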
By having the ML processed separately in Python, it can also be sent to cloud services and processed there instead of on the player's computer. This is a feature the team has considered if they port the game to phone and tablet devices, since using the cloud can improve training speeds even more.
Review for speeding up training:
Use a discrete space
Reduce the number of dimensions used (3D → 2D)
Limit the amount of time per level
Limit the number of possible actions
Exploit overlapping actions
Exploit symmetric information about the world (or make a symmetric world)
Optimize the code as much as possible
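One way to collect these knobs in a single place is a small config object like the sketch below; the names and default values are illustrative, not from the talk.

```python
from dataclasses import dataclass

# Hypothetical summary of the speed-up tips above as one training config.
@dataclass
class SpeedupConfig:
    grid_size: tuple[int, int] = (8, 8)   # discrete 2D space instead of 3D
    max_steps_per_level: int = 200        # limit time per level
    n_actions: int = 8                    # limit possible actions
    merge_overlapping_actions: bool = True
    symmetric_world: bool = True          # exploit world symmetry
```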
Making it into a game:
An important design pillar for the game was to incentivize the player to use unique combinations of tools and mechanics to train the agents. The game also gives the player quests as a way to challenge them to find new combinations of tools that can be used to train the agents.
Posing challenges forces players to spend more time thinking of different ways to use the training mechanics, especially how to produce agents that are smart enough to complete the quests but still general enough to be useful elsewhere.
Another feature of the game is that as it progresses, more actions open up for the player. This helps players learn about the different ways RL agents are normally trained and the different considerations that need to be taken into account.