full resnet50 precision(bf16+amp) (#253)
* full resnet50
* add ieee754
* add ieee754
Showing 4 changed files with 13 additions and 5 deletions.
@@ -8,6 +8,7 @@
- Accelerator model: NVIDIA_A100-SXM4-40GB
- CPU model: AMD [email protected]
- Multi-node network type and bandwidth: InfiniBand, 200Gb/s
- ##### Software environment
- OS version: Ubuntu 20.04
- OS kernel version: 5.4.0-113-generic
@@ -16,6 +17,10 @@
- Training framework version: pytorch-1.8.0a0+52ea372
- Dependency versions:
- cuda: 11.4
- Data format
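- On NVIDIA DGX A100 (40G) hardware, 16-bit floating point (fp16) can be implemented with either the IEEE 754 fp16 format or the bf16 format. In the resnet50 test case, bf16 gave higher performance and accuracy, so bf16 is used as the fp16 implementation for 16-bit training.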
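To make the difference between the two 16-bit formats concrete, here is a minimal pure-Python sketch. It is not part of the benchmark code, and the helper names `fp16_bits`, `fp16_value`, `bf16_bits` and `bf16_value` are hypothetical. It assumes only the standard `struct` module: the `'e'` format encodes IEEE 754 half precision, and bf16 is emulated by rounding the top 16 bits of a float32 bit pattern to nearest-even.

```python
import struct

# IEEE 754 half precision (fp16): 1 sign bit, 5 exponent bits, 10 fraction bits.
# bfloat16 (bf16): 1 sign bit, 8 exponent bits (same as float32), 7 fraction bits.


def fp16_bits(x: float) -> int:
    """Round x to IEEE 754 fp16 and return its 16-bit pattern.

    struct's 'e' format raises OverflowError if x is outside the fp16 range.
    """
    return struct.unpack("<H", struct.pack("<e", x))[0]


def fp16_value(bits: int) -> float:
    """Decode a 16-bit IEEE 754 fp16 pattern back to a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def bf16_bits(x: float) -> int:
    """Round x to bf16 by keeping the top 16 bits of its float32 pattern.

    The dropped low 16 bits are rounded to nearest-even.
    NaN inputs are not special-cased in this sketch.
    """
    f32 = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((f32 >> 16) & 1)
    return ((f32 + rounding_bias) >> 16) & 0xFFFF


def bf16_value(bits: int) -> float:
    """Decode a bf16 pattern by padding it with 16 zero bits to a float32."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]


if __name__ == "__main__":
    # Range: the largest finite fp16 value is 65504; bf16 keeps float32's exponent range.
    print("fp16 max:", fp16_value(0x7BFF))
    print("1e5 in bf16:", bf16_value(bf16_bits(1e5)))
    try:
        fp16_bits(1e5)
    except OverflowError:
        print("1e5 does not fit in fp16")

    # Precision: 1 + 2**-10 is exact in fp16 but rounds to 1.0 in bf16.
    x = 1 + 2**-10
    print("fp16:", fp16_value(fp16_bits(x)), "bf16:", bf16_value(bf16_bits(x)))
```

Running it prints 65504.0 as the fp16 maximum, 99840.0 for 1e5 rounded to bf16, an overflow message for 1e5 in fp16, and 1.0009765625 versus 1.0 for the precision check. bf16 gives up three fraction bits compared with fp16 in exchange for float32's exponent range.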
### Run results
@@ -45,5 +50,6 @@
| A100 single node, 8 GPUs (1x8) | fp32 | bs=256,lr=0.8 | 22653 | 5663 | 5866 | 6105 | 73.5% | 28.3/40.0 |
| A100 single node, 1 GPU (1x1) | fp32 | bs=256,lr=0.8 | | 782 | 795 | 799 | | 27.6/40.0 |
| A100 two nodes, 8 GPUs each (2x8) | fp32 | bs=256,lr=0.8 | | 10576 | 11085 | 11874 | | 27.9/40.0 |
| A100 single node, 8 GPUs (1x8) | amp | bs=512,lr=0.2 | 15312 | 7544 | 7901 | 9567 | 72.7% | 28.6/40.0 |
| A100 single node, 8 GPUs (1x8) | bf16 | bs=512,lr=0.2 | 14082 | 8203 | 8550 | 9818 | 64.0% | 28.6/40.0 |