Commit
add more detailed logging for fp16 diverging
Summary: We often get a generic "minimum loss scale reached" error when fp16 training diverges. It would be useful to have a breakdown of where exactly the gradient norm becomes too big.

Reviewed By: myleott

Differential Revision: D23297774

fbshipit-source-id: 69da1cca1be22f15af633f8efe4e7b491cf4f6f9
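As an illustration of the kind of breakdown the summary describes, the sketch below logs per-parameter gradient norms when fp16 training is about to give up. The helper names (`per_param_grad_norms`, `log_divergence_breakdown`) and the `top_k` parameter are hypothetical and are not the actual fairseq change; this is only a minimal example of reporting which tensors carry the largest gradients instead of a single aggregate norm.

```python
import logging

import torch

logger = logging.getLogger(__name__)


def per_param_grad_norms(model: torch.nn.Module) -> dict:
    """Return the L2 gradient norm of each named parameter."""
    norms = {}
    for name, p in model.named_parameters():
        if p.grad is not None:
            norms[name] = p.grad.detach().norm(p=2).item()
    return norms


def log_divergence_breakdown(model: torch.nn.Module, top_k: int = 10) -> None:
    """Log the parameters with the largest gradient norms, e.g. right before
    raising the "minimum loss scale reached" error in fp16 training."""
    norms = per_param_grad_norms(model)
    worst = sorted(norms.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    for name, norm in worst:
        logger.info("grad norm %.4g for parameter %s", norm, name)
```

A trainer could call `log_divergence_breakdown(model)` in the code path that detects the loss scale has hit its floor, so the resulting log points at the specific layers whose gradients blew up.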