Real-time Character Control: Optimizing Mobile Games with TVM

Table of Contents

  1. Introduction
  2. Background on Game Character Control
    • Animation and Character Control
    • Real-time Character Control
    • Latency Requirements
  3. The Algorithm: Phase-Functioned Neural Network (PFNN)
    • Architecture of PFNN
    • Problems with Multiple Characters
  4. Optimizing the Algorithm with TVM
    • Operator Fusion
    • Pseudocode and Vector-Matrix Multiplication
  5. Approaches for Leveraging TVM
    • Approach 1: TVM with Manual Scheduler
    • Approach 2: TVM with External Kernels
    • Approach 3: Ansor, an End-to-End Approach
  6. Deployment and Performance Analysis
    • Workflow for Deployment
    • Performance Analysis Comparison
    • Lessons Learned
  7. Usability Challenges and Recommendations
    • TVM Installation and Compatibility
    • Static Library Integration
    • Dynamic Shape Support
  8. Conclusion
  9. FAQ

Introduction

In this article, we explore how Tencent AI Lab used TVM (Tensor Virtual Machine) to optimize a mobile game model for real-time speeds. The project focused on game character control and its real-time computation requirement. We discuss the background of game character control and the algorithm used, the Phase-Functioned Neural Network (PFNN), and then delve into the optimization techniques employed with TVM, including operator fusion and vector-matrix multiplication. We also cover the different approaches used to leverage TVM: manual scheduling, external kernels, and the end-to-end auto-scheduling approach, Ansor. Finally, we examine the deployment process and performance results, the lessons learned, and the usability challenges faced, along with recommendations for improvement.

Background on Game Character Control

Animation and Character Control

In game development, animation plays a crucial role in creating lifelike characters. Character motion is driven by a control system that predicts the character's anchor points for the next frame, which the game engine then uses to render the character. The entire control algorithm runs inside the mobile game itself, making it a real-time computation problem, and the low-latency requirement becomes a significant challenge when multiple characters share the scene.

Real-time Character Control

Real-time character control involves predicting the anchor points of a game character in the next frame based on its current status. This prediction is crucial for rendering the character accurately in real time. The algorithm used here is based on the Phase-Functioned Neural Network (PFNN) architecture.

Latency Requirements

The latency requirements for real-time character control are strict, especially when there are multiple characters in the game scene. To ensure a smooth gaming experience, the inference latency should stay within two milliseconds per character; at 30–60 FPS the entire frame budget is only about 16–33 milliseconds, so even a handful of characters consumes a large share of it. Meeting these performance requirements poses significant challenges.

The Algorithm: Phase-Functioned Neural Network (PFNN)

The Phase-Functioned Neural Network (PFNN) is a neural network architecture for real-time character control. It consists of four layers of fully connected neurons. What makes PFNN unique is its phase function, which dynamically generates the weight matrices: given the character's current phase (a scalar describing its motion state), the phase function interpolates between a set of control weight matrices. This interpolation allows for smooth transitions and realistic character animation.
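As a concrete illustration, here is a minimal NumPy sketch of such a phase function, assuming the cyclic Catmull-Rom cubic spline over four control weight matrices used in the original PFNN paper; the function and variable names are illustrative, not taken from the team's code.

```python
import numpy as np

def interpolate_weights(phase, W_control):
    """Blend four control weight matrices with a Catmull-Rom cubic
    spline indexed by the character's phase in [0, 1)."""
    p = 4.0 * phase                       # position along the cyclic spline
    mu = p % 1.0                          # fraction within the current segment
    k1 = int(p) % 4                       # control point starting the segment
    k0, k2, k3 = (k1 - 1) % 4, (k1 + 1) % 4, (k1 + 2) % 4
    w0, w1, w2, w3 = W_control[k0], W_control[k1], W_control[k2], W_control[k3]
    # Standard Catmull-Rom basis, applied elementwise to whole matrices.
    return (w1
            + mu * (0.5 * w2 - 0.5 * w0)
            + mu**2 * (w0 - 2.5 * w1 + 2.0 * w2 - 0.5 * w3)
            + mu**3 * (1.5 * w1 - 1.5 * w2 + 0.5 * w3 - 0.5 * w0))

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8, 8))        # four control weight matrices
x = rng.standard_normal(8)
y = interpolate_weights(0.3, W) @ x       # one interpolated layer applied
```

At phase values that land exactly on a control point (for example 0.25 here), the spline returns that control matrix unchanged, which gives the smooth cyclic blending the animation relies on.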

However, when there are multiple characters in the game scene, the algorithm faces difficulties. Each character has its own set of interpolated weight matrices, making batch computation inefficient. To address this challenge, the team fused the interpolation function with matrix multiplication, eliminating the need for interpolated weight matrices. This fusion greatly improved the computation efficiency.

Optimizing the Algorithm with TVM

Operator Fusion

Operator fusion played a crucial role in reaching real-time speeds. By fusing the interpolation function with the matrix multiplication, the team eliminated the interpolated weight matrices entirely: the fusion reduced the interpolation result from a full R×C matrix to a local variable, yielding significant performance gains. The loop order of the fused operator was also chosen to match the shape of the matrix multiplication, further improving performance.

Pseudocode and Vector-Matrix Multiplication

The optimization centered on two pieces of pseudocode: one for the phase function and one for the vector-matrix multiplication loop. The phase-function pseudocode performed the weight-interpolation loop, while the vector-matrix multiplication pseudocode computed the fully connected layer. By fusing these two loops and reordering them to match the matrix multiplication, the team achieved high performance: the fusion eliminated the materialized interpolation matrix and enabled far more efficient computation.
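The fused computation described above can be sketched in plain Python (illustrative only; the team expressed it as a TVM tensor expression and compiled it). Instead of materializing the interpolated R×C weight matrix, each weight is interpolated on the fly inside the vector-matrix multiplication loop, so it never leaves a scalar local variable. The spline blend coefficients are assumed to be precomputed for the current phase.

```python
import numpy as np

def fused_layer(x, W_control, coeffs):
    """Vector-matrix multiply with the weight interpolation fused in.
    W_control: (4, rows, cols) control weights; coeffs: the four spline
    blend coefficients already evaluated for the current phase."""
    rows, cols = W_control.shape[1], W_control.shape[2]
    y = np.zeros(cols)
    for c in range(cols):
        acc = 0.0
        for r in range(rows):
            # The interpolated weight lives only in this scalar local --
            # the full rows x cols interpolated matrix never exists.
            w = sum(coeffs[k] * W_control[k, r, c] for k in range(4))
            acc += x[r] * w
        y[c] = acc
    return y

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 5, 3))
x = rng.standard_normal(5)
coeffs = np.array([0.1, 0.2, 0.3, 0.4])
# Matches the unfused version: interpolate the matrix first, then multiply.
reference = x @ np.tensordot(coeffs, W, axes=1)
assert np.allclose(fused_layer(x, W, coeffs), reference)
```

In compiled form this trades a little redundant arithmetic per element for the elimination of an entire intermediate buffer, which is what makes per-character weights affordable.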

Approaches for Leveraging TVM

Approach 1: TVM with Manual Scheduler

The first approach used TVM with a manual scheduler. The team defined the tensor expressions by hand and wrote large schedules themselves, which required extensive high-performance computing (HPC) expertise. Although this approach provided flexibility and full control over the process, it demanded significant manual effort, and automatic generation and deployment were difficult.

Approach 2: TVM with External Kernels

The second approach used TVM with external kernels. The team had already developed highly efficient inner kernels for various hardware platforms, so they composed schedule templates for AutoTVM tuning: defining the tensor expressions, preparing the kernels, and letting the system tune the remaining parameters. This approach achieved high performance, but the tuning process could be time-consuming.
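The flavor of template-based tuning can be sketched without TVM: define a parameterized kernel, then empirically search the parameter space. The toy example below grid-searches the tile size of a blocked vector-matrix multiply; it illustrates the process, not AutoTVM's actual API.

```python
import timeit
import numpy as np

def blocked_matvec(W, x, tile):
    """x @ W with the reduction dimension split into tiles of size `tile`."""
    rows, cols = W.shape
    y = np.zeros(cols)
    for r0 in range(0, rows, tile):
        y += x[r0:r0 + tile] @ W[r0:r0 + tile]
    return y

def tune(W, x, candidates=(8, 16, 32, 64)):
    """Empirically pick the fastest tile size, AutoTVM-style."""
    best, best_t = None, float("inf")
    for tile in candidates:
        t = min(timeit.repeat(lambda: blocked_matvec(W, x, tile),
                              number=20, repeat=3))
        if t < best_t:
            best, best_t = tile, t
    return best

rng = np.random.default_rng(2)
W = rng.standard_normal((128, 64))
x = rng.standard_normal(128)
best = tune(W, x)          # fastest tile size on this machine
```

AutoTVM does the same thing at scale, with a learned cost model to prune the search, which is why its results approach the hardware peak but take longer to find.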

Approach 3: Ansor, an End-to-End Approach

The third approach used Ansor, TVM's auto-scheduler, as an end-to-end solution. With Ansor, the team only defined the tensor expressions; Ansor analyzed the network, generated the schedules, and deployed them onto the target hardware. Its advantages were a short search time and strong performance: the optimization took only a couple of hours and required minimal effort, making it a highly efficient and effective solution.

Deployment and Performance Analysis

The deployment workflow involved packing the TVM runtime into the game package: the runtime was integrated into the game project folder and compiled alongside the game. For performance tuning, the team followed an offline pipeline. They defined the algorithm, wrote schedules (or let the auto-scheduler generate them), and produced optimized libraries containing the kernel functions for TVM. The libraries were loaded dynamically depending on the game's runtime requirements.

Performance analysis compared the different approaches, using the Eigen library as the baseline across varying character counts. The results showed that the TVM approaches, in particular the hand-written schedules, met the latency requirements, while AutoTVM tuning achieved even higher performance, approaching the hardware peak. The analysis also revealed the significant impact of operator fusion and the effectiveness of the TVM optimization techniques.

Lessons Learned

During the project, several lessons were learned:

  1. The importance of operator fusion: operator fusion played a crucial role in achieving high performance and, despite often being underestimated, proved essential to reaching real-time speeds.
  2. The power of TVM's optimization techniques: techniques such as operator fusion and optimized vector-matrix multiplication delivered remarkable performance gains.
  3. The effectiveness of Ansor: the Ansor auto-scheduler provided an end-to-end solution with short search times and strong performance, proving an efficient and effective way to optimize the algorithm.
  4. The need for better interoperability: limitations within TVM prevented even higher performance, and handwritten operators did not yet cooperate smoothly with generated schedules.
  5. Usability challenges and recommendations: integrating TVM as a static library was difficult, and the team recommended official support for static libraries. Installing TVM via pip or Anaconda was also cumbersome, and dynamic shape support was highlighted as a desirable feature for easier library generation.

Usability Challenges and Recommendations

The project highlighted several usability challenges with TVM. To make TVM more accessible for developers, the following recommendations are suggested:

  1. Official support for static library integration: static libraries are crucial for efficient game deployment, so the team recommended official support and an easier integration process for static libraries in TVM.
  2. Improved installation process: the team faced difficulties installing TVM via pip or Anaconda and suggested making installation more user-friendly and compatible with popular development environments.
  3. Dynamic shape support: TVM assumed static shapes at the time, which is challenging when the number of characters changes at runtime. The team recommended better dynamic shape handling to make library generation more efficient and straightforward.
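A common workaround for static-shape compilation is shape bucketing: compile static-shape kernels for a few character counts and pad each live batch up to the nearest bucket. The sketch below uses invented names and a trivial stand-in for a compiled kernel.

```python
import bisect
import numpy as np

BUCKETS = [1, 2, 4, 8, 16, 32]   # character counts with precompiled kernels

def pick_bucket(n_characters):
    """Smallest precompiled batch size that fits the live character count."""
    i = bisect.bisect_left(BUCKETS, n_characters)
    if i == len(BUCKETS):
        raise ValueError("more characters than the largest compiled kernel")
    return BUCKETS[i]

def run_batch(features, static_kernel):
    """Pad the batch to a bucketed size, run the static-shape kernel,
    and slice the padding back off."""
    n = features.shape[0]
    b = pick_bucket(n)
    padded = np.zeros((b,) + features.shape[1:])
    padded[:n] = features
    return static_kernel(padded)[:n]

# Trivial stand-in for a compiled static-shape kernel: doubles its input.
doubled = run_batch(np.ones((5, 3)), lambda batch: batch * 2.0)
```

The cost of this scheme is wasted compute on the padding rows, which is exactly why first-class dynamic shape support would make library generation simpler.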

Conclusion

Optimizing a mobile game model for real-time speeds is a challenging task, but with the right tools and techniques, it is achievable. This article explored how Tencent AI Lab utilized TVM to optimize a game character control algorithm. The team leveraged operator fusion, vector-matrix multiplication, and TVM's optimization approaches to achieve high-performance results. They also highlighted the usability challenges faced and provided recommendations for improving TVM's usability. By continuously improving optimization techniques and addressing usability challenges, developers can create even more immersive and responsive gaming experiences.

FAQ

Q: Which approach provided the best performance results?
A: AutoTVM tuning offered the highest performance, achieving roughly 30% faster speeds than the hand-written schedules.

Q: Can TVM be used with different processors?
A: Yes. The team ran comparisons and auto-tuning on different processors to achieve optimal performance on each.

Q: What were the main challenges faced during deployment?
A: Integrating the TVM runtime into the game package and generating optimized libraries for the different hardware platforms.

Q: Did the team face any challenges with compatibility and integration?
A: Yes, the team faced challenges installing TVM and making it compatible with popular development environments such as Anaconda, and recommended making the installation process more user-friendly.

Q: What were the key lessons learned from the project?
A: The importance of operator fusion, the effectiveness of TVM's optimization techniques, the power of the Ansor approach, and the need for better interoperability between handwritten operators and generated schedules.

Q: How long did the optimization process take using Ansor?
A: Approximately two hours, including defining the tensor expressions and letting Ansor generate the schedules.

Q: What were the performance results compared to the baseline?
A: The TVM approaches, particularly the hand-written schedules, met the latency requirements, and AutoTVM tuning achieved even higher performance, approaching the hardware peak.

Q: What were the key recommendations for improving TVM's usability?
A: Official support for static library integration, an easier installation process, and better dynamic shape handling.

Q: How long did it take to achieve the desired level of performance?
A: Approximately three weeks, including testing and tuning different parameters.
