What training techniques were used to fit the DDPG model present in the repository? Mainly, I didn't understand how the actor and critic networks were used and combined during training, and why two instances were created for each of them. Thanks in advance!
Replies: 1 comment
The question is a bit too general; you should really consult a DDPG tutorial first. In short, you have two networks: the "actor", which implements a policy returning a probability distribution over actions parametrized by the current state, and the "critic", which evaluates the actor's choice, in the case of DDPG via a Q-value function. The critic is trained with the usual Bellman equation (see the slides); the actor is trained to maximize the cumulative expected reward. The expected reward under the actor policy is computed by composing the critic with the actor, so the training objective of the actor is to maximize the output of the composition of the two networks (freezing the critic's weights).
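For concreteness, here is a minimal sketch of that update step in PyTorch. The `actor`, `critic`, optimizer, and batch objects are hypothetical and the hyperparameters illustrative; none of it is taken from the repository. It also illustrates the usual reason for having two instances of each network: in standard DDPG, each online network is paired with a slowly updated target copy that is used to compute the Bellman target.

```python
import torch
import torch.nn as nn

def ddpg_update(actor, critic, actor_target, critic_target,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    # batch sampled from a replay buffer (all tensors)
    state, action, reward, next_state, done = batch

    # --- Critic update: regress Q(s, a) toward the Bellman target ---
    with torch.no_grad():
        next_action = actor_target(next_state)
        target_q = reward + gamma * (1.0 - done) * critic_target(next_state, next_action)
    critic_loss = nn.functional.mse_loss(critic(state, action), target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # --- Actor update: maximize Q(s, actor(s)), i.e. the composition of the
    # two networks. Only actor_opt steps here, so the critic's weights stay
    # frozen even though gradients flow through the critic into the actor. ---
    actor_loss = -critic(state, actor(state)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # --- Soft-update the target copies (the "second instance" of each network) ---
    with torch.no_grad():
        for net, target in ((actor, actor_target), (critic, critic_target)):
            for p, tp in zip(net.parameters(), target.parameters()):
                tp.data.mul_(1.0 - tau).add_(tau * p.data)
```

This is only a sketch of the generic DDPG recipe, not of the repository's exact code, but it shows the two pieces the reply describes: the critic minimizes a Bellman (TD) error, and the actor minimizes the negative critic output, which is the same as maximizing the expected return estimated by the critic.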