Reward Learning From Very Few Demonstrations

December 10, 2020

This article introduces a novel skill learning framework that learns rewards from very few demonstrations and uses them in policy search (PS) to improve the skill. The demonstrations are used to learn a parameterized policy that executes the skill and a goal model that monitors executions. The rewards are learned from the structure of the goal model and its monitoring capability. The goal model and the rewards are then combined to compute execution returns, which PS uses to improve the policy. In addition to reward learning, a black-box PS method with an adaptive exploration strategy is adopted. The resulting framework is evaluated in simulation with five PS approaches and two skills. The results show that the learned dense rewards lead to better performance than sparse monitoring signals, and that adaptive exploration leads to faster convergence with higher success rates and lower variance. The efficacy of the framework is validated in real-robot settings, where learned rewards improve three skills from complete failure to complete success while sparse rewards fail completely.
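To make the contrast between dense rewards and sparse monitoring signals concrete, the following is a minimal sketch and not the method from the paper. It assumes a toy two-dimensional policy parameter vector, a hypothetical target `GOAL`, and a hypothetical success radius `TOLERANCE`. The dense return is a hand-written stand-in for a learned reward, and the search is a generic cross-entropy-style black-box method whose exploration width is re-estimated every iteration, swapped in here purely for illustration.

```python
import math
import random

GOAL = [0.8, -0.3]   # hypothetical target policy parameters (assumption)
TOLERANCE = 0.05     # hypothetical success radius (assumption)


def distance(theta):
    """Euclidean distance between policy parameters and the toy goal."""
    return math.sqrt(sum((t - g) ** 2 for t, g in zip(theta, GOAL)))


def sparse_return(theta):
    """Binary monitoring signal: 1.0 on success, 0.0 otherwise."""
    return 1.0 if distance(theta) < TOLERANCE else 0.0


def dense_return(theta):
    """Stand-in for a learned dense reward: higher when closer to the goal."""
    return -distance(theta)


def policy_search(return_fn, iterations=50, samples=30, elites=6, seed=0):
    """Cross-entropy-style black-box search with adaptive exploration."""
    rng = random.Random(seed)
    mean = [0.0, 0.0]   # initial policy parameters
    std = [0.1, 0.1]    # initial exploration width per dimension
    for _ in range(iterations):
        # Sample perturbed policies and rank them by their return.
        candidates = [
            [rng.gauss(m, s) for m, s in zip(mean, std)]
            for _ in range(samples)
        ]
        candidates.sort(key=return_fn, reverse=True)
        best = candidates[:elites]
        new_mean = [sum(c[i] for c in best) / elites for i in range(len(mean))]
        # Adapt exploration: the spread is measured around the previous mean,
        # so it widens while the elites keep moving and narrows once they settle.
        std = [
            max(1e-3, math.sqrt(sum((c[i] - mean[i]) ** 2 for c in best) / elites))
            for i in range(len(mean))
        ]
        mean = new_mean
    return mean
```

Calling `policy_search(dense_return)` and `policy_search(sparse_return)` and comparing `distance(...)` of the two results shows the intuition: the dense return ranks every sample, whereas the sparse signal gives the search nothing to rank until a sample happens to land inside the success region.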

Read the full paper.
