Efficient Learning of Safe Driving Policy via Human-AI Copilot Optimization
| Webpage | Code | Video | Talk | Paper | Poster |
We develop an efficient Human-AI Copilot Optimization method (HACO), which incorporates a human into the Reinforcement Learning (RL) training loop to boost learning efficiency and ensure safety. Sidestepping the need for complex reward engineering, HACO injects human knowledge extracted from partial demonstrations into a proxy value function via an offline RL technique. In addition, entropy regularization and intervention minimization are used to encourage exploration and to reduce the human intervention budget, respectively. Comprehensive experiments demonstrate the superior sample efficiency and safety guarantee of the proposed method.
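For illustration, below is a minimal sketch of one HACO-style update step, assuming a SAC-like actor-critic with a reparameterized policy. All names (`haco_update`, `proxy_q`, `cost_q`, the `takeover` flag) and coefficients are assumptions for exposition, not the released implementation.

```python
# A minimal, illustrative sketch of one HACO-style gradient step.
# Assumptions: policy.sample(s) returns a reparameterized (action, log_prob);
# proxy_q and cost_q are state-action critics; q_opt optimizes both critics.
import torch

def haco_update(batch, policy, proxy_q, cost_q, q_opt, pi_opt,
                gamma=0.99, entropy_coef=0.01, intervention_coef=1.0):
    s, a_b = batch["obs"], batch["action"]   # a_b: human action on takeover steps
    s_next = batch["next_obs"]
    takeover = batch["takeover"]             # 1.0 where the human intervened

    # Proxy value learning (offline-RL style): without any environmental
    # reward, rank the human's action above the agent's on takeover steps.
    a_pi, log_pi = policy.sample(s)
    proxy_loss = (takeover * (proxy_q(s, a_pi.detach()) - proxy_q(s, a_b))).mean()

    # Intervention cost critic: TD-learn the expected future takeover cost.
    with torch.no_grad():
        a_next, _ = policy.sample(s_next)
        cost_target = takeover + gamma * cost_q(s_next, a_next)
    cost_loss = ((cost_q(s, a_b) - cost_target) ** 2).mean()

    q_opt.zero_grad(); (proxy_loss + cost_loss).backward(); q_opt.step()

    # Policy improvement: maximize the proxy value, keep exploring via the
    # entropy bonus, and minimize the expected human intervention cost.
    a_pi, log_pi = policy.sample(s)
    pi_loss = (entropy_coef * log_pi
               - proxy_q(s, a_pi)
               + intervention_coef * cost_q(s, a_pi)).mean()
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
```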
We summarize our core technical contribution in this talk.
HACO is tested in the MetaDrive simulator, which is efficient and supports generating diverse scenarios. Here, we provide the full training process of HACO and compare it with RL, IL, and offline RL baselines. HACO achieves superior sample efficiency with a safety guarantee and outperforms all baselines.
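As a minimal sketch, a MetaDrive environment can be rolled out as follows; the config keys shown are a small subset of what MetaDrive supports, and the reset/step signatures follow the Gymnasium-style API of recent releases (older versions return an `(obs, reward, done, info)` 4-tuple).

```python
# Minimal MetaDrive rollout with random actions (illustrative sketch).
from metadrive import MetaDriveEnv

env = MetaDriveEnv(dict(
    use_render=False,     # set True to visualize the scene
    traffic_density=0.1,  # vary to generate different traffic scenarios
))

obs, info = env.reset()
for _ in range(100):
    action = env.action_space.sample()  # replace with the learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```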
Furthermore, we benchmark HACO in the CARLA simulator, where the agent takes a semantic top-down view as observation. Equipped with a 3-layer convolutional neural network, the HACO agent learns not only the feature extractor but also the driving policy with human involvement within 10 minutes.
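For illustration, a 3-layer convolutional encoder for such top-down observations could look like the sketch below; the channel counts and the 84x84 input resolution are assumptions for exposition, not the exact architecture used in the experiments.

```python
# Illustrative 3-layer convolutional feature extractor for top-down frames.
import torch
import torch.nn as nn

class TopDownEncoder(nn.Module):
    def __init__(self, in_channels=5, feature_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer the flattened size from a dummy input
            n_flat = self.conv(torch.zeros(1, in_channels, 84, 84)).shape[1]
        self.head = nn.Linear(n_flat, feature_dim)

    def forward(self, x):  # x: (batch, channels, 84, 84) semantic frames
        return self.head(self.conv(x))
```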
@inproceedings{li2022efficient,
  title={Efficient Learning of Safe Driving Policy via Human-AI Copilot Optimization},
  author={Quanyi Li and Zhenghao Peng and Bolei Zhou},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=0cgU-BZp2ky}
}