Reward Learning From Very Few Demonstrations
December 10, 2020
This article introduces a novel skill learning framework that learns rewards from very few demonstrations and uses them in policy search (PS) to improve the skill. The demonstrations are used to learn a parameterized policy that executes the skill and a goal model that monitors executions. The rewards are learned from the structure of the goal model and its monitoring capability. The goal model and the rewards are then combined to obtain execution returns, which PS uses to improve the policy. In addition to reward learning, a black-box PS method with an adaptive exploration strategy is adopted. The resulting framework is evaluated in simulation with five PS approaches and two skills. The results show that the learned dense rewards lead to better performance than sparse monitoring signals, and that adaptive exploration leads to faster convergence with higher success rates and lower variance. The efficacy of the framework is validated in real-robot settings, where learned rewards improve three skills from complete failure to complete success, whereas sparse rewards fail completely.
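To make the contrast between dense returns and sparse monitoring signals concrete, the following is a minimal sketch of a black-box policy search loop with a simple adaptive exploration rule. It is not the method from the article: the distance-based `dense_return`, the thresholded `sparse_return`, the step-size rule (grow the exploration noise after an improvement, shrink it otherwise), and all parameter values are assumptions chosen only for illustration.

```python
import math
import random


def dense_return(params, goal):
    # Hypothetical dense return: negative Euclidean distance to the goal.
    return -math.dist(params, goal)


def sparse_return(params, goal, tol=0.05):
    # Hypothetical sparse monitoring signal: 1 on success, 0 otherwise.
    return 1.0 if math.dist(params, goal) < tol else 0.0


def policy_search(return_fn, start, goal, iters=200, samples=10,
                  sigma=0.5, seed=0):
    """Greedy black-box search over policy parameters.

    The exploration noise sigma is adapted with an assumed rule:
    it grows after an improving iteration and shrinks otherwise.
    """
    rng = random.Random(seed)
    mean = list(start)
    best = return_fn(mean, goal)
    for _ in range(iters):
        # Sample perturbed parameter vectors around the current policy.
        candidates = [[m + rng.gauss(0.0, sigma) for m in mean]
                      for _ in range(samples)]
        top_ret, top = max(((return_fn(c, goal), c) for c in candidates),
                           key=lambda rc: rc[0])
        if top_ret > best:
            mean, best = top, top_ret
            sigma *= 1.2  # improvement found: explore more widely
        else:
            sigma *= 0.8  # no improvement: explore more locally
        sigma = max(sigma, 1e-6)
    return mean, best


if __name__ == "__main__":
    start, goal = [2.0, 2.0], [0.0, 0.0]
    for name, fn in (("dense", dense_return), ("sparse", sparse_return)):
        _, ret = policy_search(fn, start, goal)
        print(name, ret)
```

In this toy setting the dense return gives the search a gradient-like signal at every sample, while the sparse signal is informative only once a sample already lands inside the success region, which is one intuition for why dense learned rewards can help PS.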