CFR-MIX: Solving Imperfect Information Extensive-Form Games with Combinatorial Action Space

- Novel Scalable Algorithm for Solving Team-Adversary Games

Team-Adversary games can model many real-world scenarios. For example, many policemen should coordinate with each other to catch an attacker. The solution of this game can help these policemen catch the attacker more efficiently.

Our new algorithm framework, CFR-MIX, was developed under Counterfactual Regret Minimization (CFR) framework to solve games with combinatorial action space.  The advantages of our method are as follows:

  1. Proposed a new strategy representation to reduce the strategy space and a consistency relationship between new and original strategy representations to maintain the cooperation between team members
  2. Transformed this consistency relationship to the consistency relationship between the cumulative regret values
  3. Proposed a novel decomposition method over cumulative regret values to guarantee the consistency relationship between the cumulative regret values
  4. Proposed the CFR-MIX algorithm which employs a mixing layer to implement the decomposition method under the CFR framework

One of the most popular algorithms to solve imperfect-information extensive-form games is CFR. The joint action space of the team is exponential with the number of team members. CFR and its sampling-based variants use tabular-form to record the joint action which results in impracticality due to the limited memory. Furthermore, it is also ineffective to train the strategy network over the large joint action space if we use deep neural networks to replace the tabular form.

This figure shows the cumulative regret neural network architecture used by the team player. It is the core of the CFR-MIX algorithm.    In the experiments, we evaluate CFR-MIX in two games in different domains.

  • Goofspiel Game: a poker game (C: number of cards, P: number of players, R: round number)
  • Pursuit-Evasion Game: a team of pursuers aims to capture the evader, while the evader aims for the opposite

This algorithm can be applied to solve the problems of stopping speeding vehicles, chasing criminals, and other real-world scenarios involving several players playing against an opponent. 

Shuxin Li, Youzhi Zhang, Xinrun Wang, Wanqi Xue, Bo An. "CFR-MIX: Solving imperfect information extensive-form games with combinatorial action space." In Proceedings of the 30th  International Joint Conference on Artificial Intelligence (IJCAI'21), pp.3663-3669. (Link)

Tags:

Security game
Deep learning

For more details on the above research and its applications, please contact
Singtel Cognitive and Artificial Intelligence Lab for Enterprises@NTU
(SCALE@NTU)

Nanyang Technological University
School of Computer Science and Engineering
Nanyang Avenue, Block N4 #B3A-02, Singapore 639798