Log ddpo reward as float to fix numpy conversion during bf16 training #1391

skavulya · 2024-03-01T20:28:53Z

Fixes `TypeError: Got unsupported ScalarType BFloat16`, which is raised when the reward is logged during bf16 fine-tuning with DDPO.
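
For readers unfamiliar with the error, here is a minimal sketch of the failure and of the kind of fix the title describes. It is not the actual diff from this PR: the helper name `rewards_to_numpy` and the sample values are assumptions made for illustration. NumPy has no bfloat16 dtype, so the tensor is upcast to float32 before conversion.

```python
import torch


def rewards_to_numpy(rewards: torch.Tensor):
    """Hypothetical helper (not from the PR): upcast to float32, then convert."""
    # .float() casts bf16 -> float32, which NumPy can represent.
    return rewards.detach().float().cpu().numpy()


# Sample rewards in bf16; the values are chosen to be exactly representable.
rewards = torch.tensor([0.5, 1.5, 2.0], dtype=torch.bfloat16)

# Direct conversion fails because NumPy has no bfloat16 scalar type.
try:
    rewards.numpy()
    direct_conversion_failed = False
except TypeError as err:
    direct_conversion_failed = True
    error_message = str(err)

# Upcasting first makes the conversion, and therefore the logging, succeed.
reward_array = rewards_to_numpy(rewards)
mean_reward = float(reward_array.mean())
```

Upcasting only the logged copy leaves the training tensors in bf16, so the change should not affect the training computation itself.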

younesbelkada

Thank you for the fix!

HuggingFaceDocBuilderDev · 2024-03-04T01:41:14Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


Log ddpo reward as float to fix numpy conversion during bf16 training

abcd888

skavulya mentioned this pull request Mar 1, 2024

DDPO finetuning without LoRA fails with upscaling error #1330

Closed

younesbelkada approved these changes Mar 4, 2024


younesbelkada merged commit 3bd0238 into huggingface:main Mar 4, 2024
9 checks passed

kashif pushed a commit to fe1ixxu/trl that referenced this pull request Mar 15, 2024

Log ddpo reward as float to fix numpy conversion during bf16 training (…

d3fc9b6

…huggingface#1391)

lapp0 pushed a commit to lapp0/trl that referenced this pull request May 10, 2024

Log ddpo reward as float to fix numpy conversion during bf16 training (…

54a5b6c

…huggingface#1391)
