Science behind Tencent’s AI that beats 99.81% of human opponents in Honor of Kings
Tencent recently revealed an AI system that could beat teams of pros in its MOBA hit Honor of Kings (aka Arena of Valor).
The company rolled out a paper this week that details how the technology works. It uses a self-improving actor-critic architecture.
Honor of Kings is an example of a real-time strategy game with a complicated environment (roughly 10^600 possible states and 10^18,000 possible actions) and complex objectives. Players must learn to plan, attack, and defend. At the same time, they have to control skill combos and lure and deceive opponents.
Tencent’s architecture consists of four modules:
- Reinforcement Learning (RL) Learner
- Artificial Intelligence (AI) Server
- Dispatch Module
- Memory Pool
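To make the division of labor concrete, here is a minimal Python sketch of how four such modules could be wired together. Only the module names come from Tencent's description. The roles assigned to them here are assumptions modeled on common actor-learner setups: the AI Server generates gameplay samples, the Dispatch Module forwards them, the Memory Pool buffers them, and the RL Learner trains on them. Every class, method, and parameter name is hypothetical. The "training step" is a placeholder that only bumps a version counter, not a real actor-critic update.

```python
import random
from collections import deque


class MemoryPool:
    """Bounded sample buffer (assumed role); oldest samples are dropped when full."""

    def __init__(self, capacity):
        self._buffer = deque(maxlen=capacity)

    def add(self, samples):
        self._buffer.extend(samples)

    def sample(self, batch_size, rng):
        items = list(self._buffer)
        return rng.sample(items, min(batch_size, len(items)))

    def __len__(self):
        return len(self._buffer)


class AIServer:
    """Plays games with the current policy and emits samples (assumed role)."""

    def __init__(self, rng):
        self.policy_version = 0
        self._rng = rng

    def play_episode(self, steps):
        # Stand-in for real gameplay: each sample records which policy produced it.
        return [
            {"policy_version": self.policy_version, "reward": self._rng.random()}
            for _ in range(steps)
        ]


class DispatchModule:
    """Moves samples from AI servers into the memory pool (assumed role)."""

    def __init__(self, pool):
        self._pool = pool

    def forward(self, samples):
        self._pool.add(samples)


class RLLearner:
    """Consumes batches and produces new policy versions (assumed role)."""

    def __init__(self):
        self.version = 0

    def train_step(self, batch):
        # Placeholder for a gradient update: a non-empty batch yields a new version.
        if batch:
            self.version += 1
        return self.version


def run_loop(iterations, steps_per_episode=8, capacity=32, batch_size=16, seed=0):
    """Run the generate -> dispatch -> store -> learn cycle a fixed number of times."""
    rng = random.Random(seed)
    pool = MemoryPool(capacity)
    server = AIServer(rng)
    dispatch = DispatchModule(pool)
    learner = RLLearner()
    for _ in range(iterations):
        dispatch.forward(server.play_episode(steps_per_episode))
        batch = pool.sample(batch_size, rng)
        # The updated policy is handed back to the server, closing the loop.
        server.policy_version = learner.train_step(batch)
    return learner.version, len(pool)
```

Calling `run_loop(10)` runs ten cycles. The point of the sketch is the feedback loop: the policy that generates the data is itself refreshed by the learner, which is what "self-improving" refers to.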
Researchers pitted their AI against five professional players (“QGhappy.Hurt,” “WE.762,” “TS.NuanYang,” “QGhappy.Fly,” and “eStarPro.Ca”) as well as other players of various backgrounds.
In public matches, the system's win rate was 99.81% across 2,100 matches, and five of the eight AI-controlled heroes achieved a 100% win rate.
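As a quick sanity check on those figures, the short Python snippet below finds every whole number of wins that would round to the reported percentage. The function name is hypothetical, and the resulting win and loss counts are inferred from the two published numbers, not reported by Tencent.

```python
def consistent_win_counts(matches, reported_pct, digits=2):
    """Return every win count whose rounded win rate matches the reported percentage."""
    return [
        wins
        for wins in range(matches + 1)
        if round(100 * wins / matches, digits) == reported_pct
    ]


# Figures from the article: 99.81% over 2,100 matches.
candidates = consistent_win_counts(2100, 99.81)
implied_losses = [2100 - wins for wins in candidates]
```

Only one win count fits, which implies the AI lost just a handful of its 2,100 public matches.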
Honor of Kings gives players only incomplete information, meaning they can only guess at the actions their opponents take. As VentureBeat points out, “the endgame, then, isn’t merely AI that achieves Honor of Kings superhero performance, but insights that might be used to develop systems capable of solving some of society’s toughest challenges.” To that end, the Tencent researchers plan to make both their architecture and algorithms open source.