We develop a novel method, Expert Guided Policy Optimization (EGPO), which integrates a guardian into the reinforcement learning loop. The guardian consists of an expert policy and a switch function that decides when the expert should intervene. To avoid trivial solutions and to improve learning from the expert's demonstrations, we employ constrained optimization and offline RL techniques. Safe-driving experiments show that our method achieves superior training-time and test-time safety, sample efficiency, and generalizability.
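To make the guardian mechanism concrete, the following is a minimal sketch of a single guarded environment step. The names `agent_policy`, `expert_policy`, and the distance-based switch criterion with threshold `eta` are illustrative assumptions, not the paper's exact definitions; the recorded intervention flag is included because such a signal can feed both the constrained objective and the offline learning on expert demonstrations described above.

```python
import numpy as np

def switch_function(agent_action, expert_action, eta=0.5):
    """Decide whether the expert should take over at this step.

    Hypothetical criterion for illustration: intervene when the agent's
    action deviates from the expert's by more than `eta`. The actual
    switch function in EGPO may use a different condition."""
    return np.linalg.norm(agent_action - expert_action) > eta

def guarded_step(env, obs, agent_policy, expert_policy):
    """Execute one environment step under the guardian.

    Returns the transition plus a flag recording whether the expert
    intervened, so that intervention data can later be used as a cost
    signal for constrained optimization and as expert demonstrations
    for offline learning."""
    agent_action = agent_policy(obs)
    expert_action = expert_policy(obs)
    intervened = switch_function(agent_action, expert_action)
    # Execute the expert's action when the switch triggers,
    # otherwise let the learning agent act freely.
    action = expert_action if intervened else agent_action
    next_obs, reward, done, info = env.step(action)
    return next_obs, reward, done, action, intervened
```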