fix 8-bit multi-gpu training bug #1353

fancyerii · 2024-02-22T13:16:14Z

younesbelkada

Thanks very much @fancyerii! I left one comment about making `gradient_checkpointing_kwargs` configurable; I think after that we can ship your PR! 🚀

younesbelkada · 2024-02-22T14:08:05Z

examples/research_projects/stack_llama_2/scripts/dpo_llama2.py

+    device_map_local_process_idx: Optional[bool] = field(
+        default=True, metadata={"help": "whether to device map model to local process index, see "
+                                "https://github.com/huggingface/trl/issues/1348"}
+    )


Suggested change: delete the `device_map_local_process_idx` field above.

I think we can remove that and always create the model with `device_map={"": Accelerator().local_process_index}`.
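The reviewer's suggestion amounts to placing each process's complete copy of the 8-bit model on the GPU that matches its local rank. A minimal sketch of that mapping follows. The helper names are hypothetical, and the sketch assumes the launcher (e.g. `torchrun` or `accelerate launch`) exports `LOCAL_RANK` for each process; in the real script the index comes from `Accelerator().local_process_index`.

```python
import os
from typing import Dict


def local_process_index() -> int:
    # Hypothetical stand-in for accelerate's Accelerator().local_process_index.
    # Assumes the launcher sets LOCAL_RANK per process; defaults to 0 (single GPU).
    return int(os.environ.get("LOCAL_RANK", "0"))


def per_process_device_map(index: int) -> Dict[str, int]:
    # The empty-string key refers to the root module, so the whole model is
    # placed on GPU `index` rather than being split across all visible GPUs.
    return {"": index}


# Intended use (not executed here; requires transformers, bitsandbytes and a GPU):
# model = AutoModelForCausalLM.from_pretrained(
#     model_name,
#     load_in_8bit=True,
#     device_map=per_process_device_map(local_process_index()),
# )
```

With this mapping, each data-parallel process holds a full replica on its own device, which is the layout DDP expects.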

Can you just make gradient_checkpointing_kwargs configurable here? 🙏
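One way to make `gradient_checkpointing_kwargs` configurable is to expose the setting as a script argument and build the kwargs from it. The sketch below assumes a hypothetical field name, `gradient_checkpointing_use_reentrant`. It also assumes the resulting dict is passed to `transformers.TrainingArguments(gradient_checkpointing_kwargs=...)`, which forwards it to `torch.utils.checkpoint.checkpoint`, where `use_reentrant` is a real keyword.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ScriptArguments:
    # Hypothetical field: lets the user choose reentrant vs non-reentrant
    # activation checkpointing from the command line.
    gradient_checkpointing_use_reentrant: Optional[bool] = field(
        default=False,
        metadata={"help": "whether to use reentrant gradient checkpointing"},
    )


def build_gradient_checkpointing_kwargs(args: ScriptArguments) -> Dict[str, Any]:
    # Dict intended for TrainingArguments(gradient_checkpointing_kwargs=...).
    return {"use_reentrant": args.gradient_checkpointing_use_reentrant}


# Intended use (not executed here):
# training_args = TrainingArguments(
#     ...,
#     gradient_checkpointing=True,
#     gradient_checkpointing_kwargs=build_gradient_checkpointing_kwargs(script_args),
# )
```

Keeping the choice as a plain boolean field means it can be parsed like the script's other arguments rather than hard-coded in the `TrainingArguments` call.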


younesbelkada · 2024-02-22T15:08:22Z

Thanks @fancyerii!
Could you run the styling checks? `make precommit`

HuggingFaceDocBuilderDev · 2024-02-22T15:11:23Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

fancyerii · 2024-02-22T20:23:46Z

> Thanks @fancyerii! Could you run the styling checks? `make precommit`

Done.
By the way, when I run `make test`, there are many warnings about "wait for wandb.init".

younesbelkada

Thanks again!

* fix 8-bit multi-gpu training bug see huggingface#1348
* Update dpo_llama2.py: make gradient_checkpointing_kwargs configurable.
* Update dpo_llama2.py: remove unnecessary config of device_map
* format with make precommit

---------

Co-authored-by: ubuntu <[email protected]>

fix 8-bit multi-gpu training bug see huggingface#1348

a2098c7

fancyerii mentioned this pull request Feb 22, 2024

Update dpo_llama2.py to fix 8-bit multi-gpu training bug #1352

Closed

younesbelkada reviewed Feb 22, 2024


fancyerii added 2 commits February 22, 2024 22:55

Update dpo_llama2.py

75fdf20

make gradient_checkpointing_kwargs configurable.

Update dpo_llama2.py

dcffdc3

remove unnecessary config of device_map

format with make precommit

64e23f5

younesbelkada approved these changes Feb 23, 2024


younesbelkada merged commit ca90cba into huggingface:main Feb 23, 2024
9 checks passed
