LEARNING FROM DEMONSTRATIONS WITH SACR2: SOFT ACTOR-CRITIC WITH REWARD RELABELING

Jesus Bujalance; Raphael Chekroun; Fabien Moutarde

Conference paper. Year: 2021



Jesus Bujalance

Role: Author

Centre de Robotique

Raphael Chekroun

Role: Author

Centre de Robotique

Fabien Moutarde

Role: Author
PersonId : 905
IdHAL : fabien-moutarde
ORCID : 0000-0003-4799-7285
IdRef : 188916172

Centre de Robotique

Abstract

In recent years, deep reinforcement learning (DRL) has made successful incursions into complex decision-making applications such as robotics, autonomous driving and video games. Off-policy algorithms tend to be more sample-efficient than their on-policy counterparts, and can additionally benefit from any off-policy data stored in the replay buffer. Expert demonstrations are a popular source of such data: the agent is exposed to successful states and actions early on, which can accelerate learning and improve performance. Multiple ideas have been proposed to make good use of the demonstrations in the buffer, such as pretraining on demonstrations only or minimizing additional cost functions. We carry out a study that evaluates several of these ideas in isolation, to see which have the most significant impact. We also present a new method for sparse-reward tasks, based on a reward bonus given to demonstrations and successful episodes. First, we give a reward bonus to the transitions coming from demonstrations, to encourage the agent to match the demonstrated behaviour. Then, upon collecting a successful episode, we relabel its transitions with the same bonus before adding them to the replay buffer, encouraging the agent to also match its previous successes. The base algorithm for our experiments is the popular Soft Actor-Critic (SAC), a state-of-the-art off-policy algorithm for continuous action spaces. Our experiments focus on manipulation robotics, specifically a 3D reaching task for a robotic arm in simulation. We show that our method, SACR2, which is based on reward relabeling, improves performance on this task, even in the absence of demonstrations.
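The two relabeling steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Transition`, `ReplayBuffer`, `relabel`, `store_demonstration` and `store_agent_episode` are hypothetical, the bonus is assumed to be added to the environment reward (the abstract does not say how the bonus is combined with it), and the success flag is assumed to be supplied by the caller. The SAC update itself is omitted.

```python
from collections import deque
from dataclasses import dataclass, replace
from typing import Deque, Iterable, List, Tuple


@dataclass(frozen=True)
class Transition:
    """One environment step (hypothetical container, not from the paper)."""
    state: Tuple[float, ...]
    action: Tuple[float, ...]
    reward: float
    next_state: Tuple[float, ...]
    done: bool


def relabel(episode: Iterable[Transition], bonus: float) -> List[Transition]:
    """Return a copy of the episode with `bonus` added to every reward.

    Assumption: the bonus is additive; the originals are left untouched.
    """
    return [replace(t, reward=t.reward + bonus) for t in episode]


class ReplayBuffer:
    """Fixed-capacity FIFO buffer; the oldest transitions are dropped first."""

    def __init__(self, capacity: int) -> None:
        self.storage: Deque[Transition] = deque(maxlen=capacity)

    def add_episode(self, episode: Iterable[Transition]) -> None:
        self.storage.extend(episode)


def store_demonstration(buffer: ReplayBuffer,
                        episode: Iterable[Transition],
                        bonus: float) -> None:
    """Step 1: demonstration transitions always receive the bonus."""
    buffer.add_episode(relabel(episode, bonus))


def store_agent_episode(buffer: ReplayBuffer,
                        episode: Iterable[Transition],
                        success: bool,
                        bonus: float) -> None:
    """Step 2: agent episodes receive the same bonus only if successful."""
    buffer.add_episode(relabel(episode, bonus) if success else list(episode))
```

In this sketch an unsuccessful agent episode is stored with its original rewards, so only demonstrations and the agent's own successes carry the bonus when the off-policy learner samples from the buffer.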

Domains

Machine Learning [cs.LG]; Automatic Control / Robotics; Artificial Intelligence [cs.AI]

Main file

sac-R2_paper_NeurIPS-DRLworkshop_FINAL.pdf (4.37 MB)

Origin: Files produced by the author(s)

Contributor: Fabien Moutarde

https://minesparis-psl.hal.science/hal-03519790

Submitted on: Monday, 10 January 2022, 16:22:57

Last modified on: Friday, 19 April 2024, 16:18:56

Long-term archiving on: Tuesday, 12 April 2022, 00:34:29

Dates and versions

hal-03519790 , version 1 (10-01-2022)

Identifiers

HAL Id : hal-03519790 , version 1

Cite

Jesus Bujalance, Raphael Chekroun, Fabien Moutarde. LEARNING FROM DEMONSTRATIONS WITH SACR2: SOFT ACTOR-CRITIC WITH REWARD RELABELING. 'Deep Reinforcement Learning' workshop of the 35th Conference on Neural Information Processing Systems (NeurIPS'2021), Dec 2021, Virtual, United States. ⟨hal-03519790⟩


Collections

INSTITUT-TELECOM ENSMP ENSMP_CAOR PARISTECH TDS-MACS PSL ENSMP_DR

79 views

157 downloads
