Learning to Simulate Self-driven Particles System with Coordinated Policy Optimization

NeurIPS 2021

Zhenghao Peng1, Quanyi Li3, Chunxiao Liu2, Bolei Zhou1
1The Chinese University of Hong Kong, 2SenseTime Research
3Centre for Perceptual and Interactive Intelligence
Webpage | Code | Talk | Poster | Paper | Results&Models
CoPO-trained agents exhibit social behaviors.

Coordinated Policy Optimization (CoPO)

We develop a novel MARL method, Coordinated Policy Optimization (CoPO), which facilitates bi-level coordination among agents to learn controllers for Self-Driven Particles (SDP) systems, especially traffic flows.

CoPO consists of two parts: Local Coordination, a mechanism that coordinates agents' objectives within a neighborhood by using a local coordination factor (LCF) to weight the individual reward against the neighborhood reward, and Global Coordination, which updates the LCF via meta-gradient, as sketched below.
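
For concreteness, below is a minimal sketch of the LCF-weighted objective, assuming (as in the paper) that the LCF is an angle that mixes the individual reward and the mean neighborhood reward through cosine and sine weights. The function name and the neighborhood-averaging step are illustrative; in CoPO the LCF is not hand-picked but tuned by the meta-gradient of Global Coordination.

```python
import numpy as np

def coordinated_reward(r_individual, neighbor_rewards, lcf_radians):
    """Mix an agent's own reward with the mean reward of its neighbors.

    lcf_radians: local coordination factor (LCF), treated here as an angle.
    """
    r_neighborhood = float(np.mean(neighbor_rewards)) if len(neighbor_rewards) else 0.0
    return np.cos(lcf_radians) * r_individual + np.sin(lcf_radians) * r_neighborhood

# LCF = 0 deg recovers a purely selfish objective;
# LCF = 90 deg optimizes only the neighbors' reward;
# LCF = 45 deg weights both equally.
print(coordinated_reward(1.0, [0.5, 0.8], np.deg2rad(45.0)))
```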

CoPO can learn realistic crowd behaviors as well as safe and socially compliant driving skills.

Experiment Result

We develop five multi-agent traffic simulation environments based on MetaDrive; a usage sketch is shown below.
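
As a rough usage sketch, one of these environments can be driven with a dictionary of per-agent actions. The class name `MultiAgentIntersectionEnv`, the constructor config, and the gym-style `reset()`/`step()` signatures are assumptions based on MetaDrive's multi-agent API and may differ across versions.

```python
# Hedged sketch: exact class names and return signatures may vary by MetaDrive version.
from metadrive.envs.marl_envs import MultiAgentIntersectionEnv

env = MultiAgentIntersectionEnv(dict(num_agents=10))
obs = env.reset()
done = {"__all__": False}
while not done["__all__"]:
    # Each active vehicle takes a [steering, throttle] action in [-1, 1].
    actions = {agent_id: [0.0, 0.5] for agent_id in obs}
    obs, rewards, done, infos = env.step(actions)
env.close()
```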

Compared to the baselines, the proposed CoPO method achieves superior performance in success rate (the ratio of vehicles reaching their destinations), efficiency (the frequency of successes), and safety (the total number of crashes; lower is better), as shown in the figure below.

The following video shows the population behaviors in the Intersection environment. Red dots indicate crashes. The CoPO population is coordinated and high-performing.

Talk
A Chinese version of the talk can be found on Bilibili.
Reference
@article{peng2021learning,
  title={Learning to Simulate Self-Driven Particles System with Coordinated Policy Optimization},
  author={Peng, Zhenghao and Hui, Ka Ming and Liu, Chunxiao and Zhou, Bolei and others},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}