Fix DDP logging #2826
Labels: bug (something isn't working), distributed (generic distributed-related topic), priority: 0 (high priority task)
Add a global_zero_only=true flag; if false, create individual files, each prefixed with the machine number
Write a logging callback that does a map-reduce
Can we do this in the metrics?
Aggregate all tensors on global zero first (might run into memory issues)
Gather each output individually in CPU memory
We want to preserve the fact that logging happens on rank 0
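A minimal sketch of the two ideas above, not the actual implementation. It assumes each process knows its global rank. The helper names `log_path_for_rank` and `reduce_on_zero`, the `rank{N}_` file prefix, and the choice of averaging as the reduce step are all hypothetical. In a real DDP run the per-rank list would presumably come from a collective such as `torch.distributed.all_gather_object`, with values moved to CPU first; here it is passed in directly so the example runs without a process group.

```python
from pathlib import Path
from typing import Dict, List, Optional


def log_path_for_rank(log_dir: str, name: str, global_rank: int,
                      global_zero_only: bool = True) -> Optional[Path]:
    """Return the file this process should log to, or None if it should not log."""
    if global_zero_only:
        # Default behaviour: only global rank 0 writes a log file.
        return Path(log_dir) / name if global_rank == 0 else None
    # Otherwise every process writes its own file, prefixed with its rank
    # (hypothetical naming scheme).
    return Path(log_dir) / f"rank{global_rank}_{name}"


def reduce_on_zero(per_rank_metrics: List[Dict[str, float]]) -> Dict[str, float]:
    """Reduce step: average each metric over the ranks that reported it.

    The input is one dict of plain Python floats per rank, i.e. values that
    have already been gathered into CPU memory.
    """
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for metrics in per_rank_metrics:
        for key, value in metrics.items():
            totals[key] = totals.get(key, 0.0) + float(value)
            counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}
```

Gathering one output at a time, rather than all tensors at once, keeps the peak memory on rank 0 bounded by a single metric's worth of data per rank, which is the concern raised in the notes.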