Talk Title: 'FIFA 22's' HyperMotion: Full-Match Mocap Driving Machine Learning Technology
In short: a talk about machine learning for motion capture.
Year of Talk: 2022
The goal of the project was to use real-world match data and machine learning to drastically improve in-game animations.
The problem:
Originally, when doing typical motion capture for the games, the available space was quite small, leaving little room for performers to comfortably run around and act out what is normally seen in professional play.
At the same time, the skill level of the local talent at the Vancouver studio isn't as high as what is seen in matches worldwide. This means many of the harder skill moves might not be possible to capture because no performer has the skill to execute them. There needed to be a way to expand the skill pool of the people being recorded to allow for those harder moves.
Professional matches also have a level of competition and a certain edge that is hard to recreate in a small capture space.
The initial prototype used Xsens inertial mocap suits to track 4 players in a scrimmage game to test how it would work and look. The prototype showed positive signs: it captured a lot of useful data, and it also surfaced the problems that would need to be fixed. Two problems stood out: player drift and missing ball data. Player drift occurred because the inertial sensors accumulate error over time; a player's root position could drift about 12 feet over 10 minutes. And since no ball data had been recorded, animators would have to spend time recreating the ball's motion from reference video of the match.
Solving the drift was the team's initial priority, as the data would be useless by the end of a take if it wasn't accurate. This was done by setting up a local positioning system: various cameras throughout the stadium plus a beacon on each player to differentiate them, which increased positional accuracy to within 5 cm. This was then combined with the Xsens data to correct the drift in the captured motion.
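The talk doesn't detail the fusion math, but the core idea can be sketched as continually nudging the inertial root position toward the beacon reading, a simple complementary filter. The blend factor and function names here are illustrative assumptions, not EA's implementation:

```python
import numpy as np

def correct_root_drift(xsens_root, beacon_pos, alpha=0.05):
    """Blend each frame's inertial root position toward the beacon reading.

    xsens_root: (N, 3) root positions from the inertial suit (drifts over time)
    beacon_pos: (N, 3) positions from the local positioning system (~5 cm accurate)
    alpha:      per-frame blend factor (illustrative; tune for capture rate)
    """
    corrected = np.empty_like(xsens_root)
    offset = np.zeros(3)  # running estimate of the accumulated drift
    for i in range(len(xsens_root)):
        # Drift is the gap between where the suit thinks the root is
        # and where the beacon says the player actually is.
        drift = (xsens_root[i] - offset) - beacon_pos[i]
        offset += alpha * drift  # slowly absorb the drift into the offset
        corrected[i] = xsens_root[i] - offset
    return corrected
```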
Meanwhile, to solve the missing ball data, several static cameras recorded the whole field, and computer vision combined with a uniquely colored ball allowed the system to find and track the ball wherever it was.
The actual shoot was later done with two teams in Spain. The process had all the players start on designated taped spots and clap at the start of the match to sync all the cameras together. There were 8 static cameras recording 8K footage at around 50 FPS, which works out to around 270 MB per second of footage. The image below is a representation of how the cameras were set up to best cover the match. Watermarking a world-time timecode onto the recordings also helped with syncing the cameras.
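Those numbers add up quickly; a back-of-the-envelope calculation from the talk's own figures shows why file handling became a real concern:

```python
# Rough storage math from the talk's figures.
mb_per_second = 270          # ~270 MB/s of 8K @ ~50 FPS footage, per camera
take_minutes = 15            # upper end of a 10-15 minute take
cameras = 8

per_camera_gb = mb_per_second * take_minutes * 60 / 1000
total_gb = per_camera_gb * cameras
print(f"~{per_camera_gb:.0f} GB per camera per take")    # ~243 GB
print(f"~{total_gb / 1000:.1f} TB across all 8 cameras")  # ~1.9 TB
```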
During the match, the plan was to have each take last 10-15 minutes, controlled by the referee. This kept the games feeling high intensity while also keeping the data at a high standard of fidelity. The time limit also helped cap the file sizes the team would need to work with, as uploading or moving the video files around takes a long time.
The challenges that then appeared with the recorded data revolved around deciding which animations would be the most useful, and how to properly contextualize each player's data with where the ball was at that moment.
Animation Workflow:
Originally, animators would create a shot list of all the types of movements they needed the mocap performers to act out in small groupings. This made it easy for animators to jump straight to a recording and use it for animation. The full-match data, however, arrived as one giant blob and required a lot of processing to add the appropriate context and determine which animations could be used.
When going through the match, once a useful action is spotted, the appropriate player is tagged on the frame where it started with a label describing the action. Jumping to that player's capture at the correct frame count lets animators grab the corresponding animation clip. At that point the clip is the same as one produced by typical motion capture, requiring only minor cleanup and retargeting.
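The talk doesn't show the tagging tool itself, but a minimal sketch of the data involved might look like this (the class and field names are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class ActionTag:
    player_id: int    # which performer's capture to pull from
    start_frame: int  # frame in the full-match capture where the action begins
    end_frame: int    # frame where the action ends
    label: str        # e.g. "sprint_turn", "volley", "header"

def extract_clip(full_capture, tag):
    """Slice one player's frames out of the full-match capture.

    full_capture: dict mapping player_id -> list of per-frame poses
    Returns a clip ready for the usual mocap cleanup and retargeting.
    """
    frames = full_capture[tag.player_id]
    return frames[tag.start_frame:tag.end_frame + 1]
```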
Ball Data:
Another important factor was acquiring data from the ball. This data gives animators a better idea of how animations should look when in contact with the ball, and it adds context, for example explaining why a player turns in a certain direction when a ball flies nearby. This was done with projection calculations from the 8 cameras: mapping the 3D world into 2D image information, and then properly projecting the ball back into the 3D scene.
The pipeline to track the ball started with a distinctly colored ball that stood out against the surrounding background. The presenter then built a video player using Python and OpenCV to track the ball throughout the match. The player converts every frame's RGB values into their corresponding HSV values, and any pixel whose HSV value doesn't match the ball's is masked out and no longer rendered.
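A minimal sketch of this kind of HSV mask in OpenCV follows; the HSV bounds and filename are assumptions that would be tuned to the real ball's color:

```python
import cv2
import numpy as np

# Hypothetical HSV range for the distinctly colored ball (tune to the real ball).
LOWER = np.array([20, 120, 120])
UPPER = np.array([35, 255, 255])

def find_ball(frame_bgr):
    """Return the (x, y) pixel center and radius of the ball, or None."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER, UPPER)  # keep only ball-colored pixels
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    (x, y), radius = cv2.minEnclosingCircle(largest)
    return (x, y), radius

cap = cv2.VideoCapture("match_cam_01.mp4")  # hypothetical filename
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hit = find_ball(frame)
    if hit:
        (x, y), r = hit
        cv2.circle(frame, (int(x), int(y)), int(r), (0, 255, 0), 2)
    cv2.imshow("tracker", frame)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
```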
Following that, the cameras were calibrated to relate 3D world coordinates to 2D image coordinates. This was done by placing various markers at known 3D positions on the field and recording their corresponding positions on the 2D screen. An algorithm could then determine the projection needed to quickly move data points from 3D to 2D space and vice versa.
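The talk doesn't name the algorithm, but this is the standard camera-pose problem; a sketch using OpenCV's solvePnP, assuming the marker positions and camera intrinsics are already known (all numeric values are illustrative):

```python
import cv2
import numpy as np

# Known 3D marker positions on the pitch (meters) and where they appear
# in one camera's image (pixels). Values here are illustrative.
object_points = np.array([[0, 0, 0], [10, 0, 0], [10, 5, 0], [0, 5, 0]],
                         dtype=np.float64)
image_points = np.array([[312, 840], [1505, 835], [1490, 410], [330, 420]],
                        dtype=np.float64)

# Intrinsics from a standard checkerboard calibration (assumed done already).
camera_matrix = np.array([[3500, 0, 1920],
                          [0, 3500, 1080],
                          [0,    0,    1]], dtype=np.float64)
dist_coeffs = np.zeros(5)

# Solve for this camera's pose relative to the pitch.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points,
                              camera_matrix, dist_coeffs)

def projection_matrix(camera_matrix, rvec, tvec):
    """Build the 3x4 matrix that maps 3D pitch points to 2D pixels."""
    R, _ = cv2.Rodrigues(rvec)
    return camera_matrix @ np.hstack([R, tvec])

# With projection matrices for two (or more) cameras, a 2D ball detection
# in each view can be triangulated back into one 3D point
# (e.g. via cv2.triangulatePoints).
```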
Ball Physics:
An unforeseen advantage of capturing the ball data is that it let the team create more accurate ball trajectories. They could place the real recorded trajectory next to the in-game trajectory for a similar kick and compare how the two differ, then tune the physics parameters where necessary and retest until the simulation matched reality more closely.
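The talk describes this as visual comparison and hand-tuning, but the same idea can be sketched as a parameter fit; here a hypothetical drag coefficient is fitted to a recorded trajectory with SciPy (the physics model and all values are illustrative, not EA's):

```python
import numpy as np
from scipy.optimize import least_squares

G = np.array([0.0, 0.0, -9.81])  # gravity, m/s^2

def simulate(v0, drag, dt=0.02, steps=100):
    """Euler-integrate a kicked ball with simple quadratic air drag."""
    pos, vel = np.zeros(3), np.asarray(v0, dtype=float)
    out = []
    for _ in range(steps):
        vel += (G - drag * np.linalg.norm(vel) * vel) * dt
        pos = pos + vel * dt
        out.append(pos)
    return np.array(out)

# recorded_xyz: (steps, 3) ball positions triangulated from the match footage.
# Synthesized here with a "true" drag of 0.012 just to make this runnable.
v0 = [18.0, 0.0, 9.0]
recorded_xyz = simulate(v0, drag=0.012)

def residuals(params):
    return (simulate(v0, drag=params[0]) - recorded_xyz).ravel()

fit = least_squares(residuals, x0=[0.05], bounds=(0.0, 1.0))
print(f"fitted drag coefficient: {fit.x[0]:.4f}")  # ~0.012
```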
Machine Learning Flow:
The team had very little experience working with ML, so they started by applying it to small, well-scoped problems where it would be a perfect fit.
An important factor for the ML work was having a generic runtime, since it allows faster iteration when testing different combinations and supports a wider range of operations.
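The talk doesn't name the runtime; one common way to get this kind of engine-agnostic inference is exporting the trained PyTorch model to a portable format such as ONNX. That choice is an assumption here, shown only to illustrate the idea:

```python
import torch
import torch.nn as nn

# Placeholder stand-in for the trained network (illustrative sizes).
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 32))
model.eval()

# Exporting decouples the trained network from PyTorch, so the runtime
# embedded in the engine can evolve independently of the training code.
dummy_input = torch.randn(1, 64)
torch.onnx.export(model, dummy_input, "pose_transition.onnx",
                  input_names=["pose_and_path"],
                  output_names=["next_pose"])
```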
Initially, when there was no ML system implemented, many animations had to transition abruptly into other animations to react to a new situation. This could look like the player's pose popping or jumping some amount to line up with the start of the next animation.
The inputs to the ML system are the player's current pose, the target pose the player is meant to end on, and the planned path. The output is the pose for the following frame, which can be fed back into the system to generate the transition frame by frame.
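The talk doesn't give the network architecture, but a minimal sketch of the input/output loop described above, with made-up layer and feature sizes, could look like this:

```python
import torch
import torch.nn as nn

POSE = 24 * 3 * 2  # hypothetical: 24 joints, position + velocity each
PATH = 10 * 2      # hypothetical: next 10 planned root positions on the ground

class TransitionNet(nn.Module):
    """Predicts the next-frame pose from current pose, target pose, and path."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(POSE + POSE + PATH, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, POSE),
        )

    def forward(self, current, target, path):
        return self.net(torch.cat([current, target, path], dim=-1))

model = TransitionNet()
current = torch.randn(1, POSE)
target = torch.randn(1, POSE)
path = torch.randn(1, PATH)

# Autoregressive rollout: each predicted pose becomes the next frame's input.
frames = []
with torch.no_grad():
    for _ in range(30):
        current = model(current, target, path)
        frames.append(current)
```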
The following graphic visually explains how the data flows through the pipeline: mocap/Xsens data is exported to FBX, run through a Python exporter, and combined with data from the Frostbite engine to create the final CSV file used by the system.
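A sketch of that final merge step, assuming the FBX poses and engine context have already been dumped to per-frame tables (all filenames and column names here are hypothetical):

```python
import pandas as pd

# Per-frame pose data extracted from the FBX export (hypothetical columns).
poses = pd.read_csv("xsens_poses.csv")         # frame, joint rotations, ...
# Per-frame context exported from the Frostbite engine (hypothetical columns).
engine = pd.read_csv("frostbite_context.csv")  # frame, ball position, ...

# Join on the shared frame index so every training row has pose + context,
# then write the single CSV the training system consumes.
training = poses.merge(engine, on="frame", how="inner")
training.to_csv("training_data.csv", index=False)
```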
The system is trained primarily on AWS using PyTorch, which lets the team monitor how the data is being used by the ML system. After training, the model can be brought back into the engine and tested to see if the results are what the team wanted.
Initially, it would take multiple days to run through all the data and train the network, which was unacceptable, as even minor changes could mean a lot of lost work. Various optimisations were needed to shorten the wait time, and the team brought it down to around 2-3 hours per run, partly by optimizing the distributed training and partly by running the data transfer across multiple cores to speed up sending and receiving information.
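The talk doesn't detail these optimisations, but in PyTorch the simplest version of "parallelise the data transfer" is multi-worker data loading; a sketch under that assumption:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class PoseDataset(Dataset):
    """Minimal stand-in for the CSV-backed training data (illustrative)."""
    def __init__(self, n=100_000, dim=164):
        self.x = torch.randn(n, dim)
        self.y = torch.randn(n, dim)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        return self.x[i], self.y[i]

loader = DataLoader(
    PoseDataset(),
    batch_size=256,
    shuffle=True,
    num_workers=8,     # several processes prepare batches in parallel
    pin_memory=True,   # faster host-to-GPU transfer
)
```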
A stress test was created to see how the final system would run: the ball was kicked in random directions with different strengths while the player executed an assortment of animations. This verified that the system would create an appropriate transition animation no matter what the player was doing or where the ball was. After hundreds of tests were run (usually overnight), the team could review all the results in tables to find any bugs occurring in the animations. The tables also recorded which buttons would have been pressed to reach any point, letting the team recreate a scenario to see whether a failure was a fluke, or what other information could be gathered by reproducing the problem.
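A minimal sketch of that kind of seeded, replayable stress harness follows; the function names and the seeding scheme are assumptions based on the description:

```python
import random

def run_stress_test(seed, frames=600):
    """Run one randomized kick scenario; the seed makes it fully replayable."""
    rng = random.Random(seed)
    scenario = {
        "seed": seed,
        "kick_dir_deg": rng.uniform(0, 360),
        "kick_strength": rng.uniform(5, 40),  # hypothetical units
        "animation": rng.choice(["sprint", "tackle", "header", "idle"]),
        # Simulated button presses per frame, logged so any run can be
        # reproduced exactly to check whether a failure was a fluke.
        "inputs": [rng.choice(["pass", "shoot", "sprint", "none"])
                   for _ in range(frames)],
    }
    # ... drive the game with this scenario and record the outcome ...
    return scenario

results = [run_stress_test(seed) for seed in range(500)]  # overnight batch
```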
What worked:
Captured a more realistic and competitive match.
More high-quality animations.
A bunch of unique animations that might have been hard to recreate.
Ball data which helped improve physics.
A lot of data to train ML networks.
Improvements needed:
There needs to be a better way to find, select, and export animations from the recordings, as doing so is quite time-consuming.
At the same time, the file sizes end up being quite large and very taxing on computers, especially when moving them around or trying to edit them.