Safe Driving via Expert Guided Policy Optimization

The Conference on Robot Learning (CoRL) 2021

Zhenghao Peng1*,  Quanyi Li3*,  Chunxiao Liu2,  Bolei Zhou1
1The Chinese University of Hong Kong, 2SenseTime Research
3Centre for Perceptual and Interactive Intelligence
Webpage | Code | Poster | Paper
The EGPO-trained agent exhibits safer driving behaviors than the PPO agent and incurs much lower cost.

Expert Guided Policy Optimization (EGPO)

We develop Expert Guided Policy Optimization (EGPO), a novel method that integrates a guardian into the RL training loop. The guardian consists of an expert policy and a switch function that decides when the expert should intervene. We employ constrained optimization to avoid trivial solutions and offline RL techniques to improve learning from the expert's demonstrations. Safe-driving experiments show that our method achieves superior training-time and test-time safety, sample efficiency, and generalizability.
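The guardian mechanism above can be illustrated with a minimal sketch. This is a hypothetical, simplified switch rule (not the paper's exact criterion): the expert takes over whenever it assigns low probability to the agent's proposed action, modeling the expert as a Gaussian policy. The names `guardian_step`, `expert_prob`, and the threshold values are illustrative assumptions.

```python
import math

def expert_prob(action, expert_mean, expert_std):
    # Gaussian density of the agent's action under the expert policy
    z = (action - expert_mean) / expert_std
    return math.exp(-0.5 * z * z) / (expert_std * math.sqrt(2 * math.pi))

def guardian_step(agent_action, expert_mean, expert_std=0.3, threshold=0.1):
    """Switch function (illustrative): intervene when the expert policy
    assigns low probability to the agent's action.

    Returns (executed_action, intervened).
    """
    if expert_prob(agent_action, expert_mean, expert_std) < threshold:
        # Takeover: execute the expert's action and record an intervention,
        # which can feed the constrained-optimization objective.
        return expert_mean, True
    return agent_action, False
```

With this rule, an agent action near the expert's passes through unchanged, while a far-off (potentially unsafe) action is replaced by the expert's, keeping training-time behavior safe while interventions are logged as a cost signal.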

Experiment Result
We show that the proposed framework achieves superior training efficiency and performance, as well as extremely low training-time safety violations, on the safe driving tasks of the MetaDrive simulator.
Demo Video
Reference
@inproceedings{peng2021safe,
  title={Safe Driving via Expert Guided Policy Optimization},
  author={Peng, Zhenghao and Li, Quanyi and Liu, Chunxiao and Zhou, Bolei},
  booktitle={5th Annual Conference on Robot Learning},
  year={2021}
}