[trainer] add tf32-mode control #14606
Conversation
As usual, let's operate on a strict no-breaking-change rule. I understand PyTorch activates this feature by default when available (at least in some versions), so I would leave the default for `--tf32` as `None` and let PyTorch decide when `tf32=None`. Then obviously `True` and `False` will force-activate/deactivate the feature.
src/transformers/training_args.py
Outdated
@@ -548,6 +552,12 @@ class TrainingArguments:
        default=False,
        metadata={"help": "Whether to use full float16 evaluation instead of 32-bit"},
    )
    tf32: bool = field(
        default=True,
The default should be whatever PyTorch has by default, so `None` here, and the user can set it to `True` or `False` to force/unforce it.
I understand it's `True` for versions >= 1.7 and < 1.10 but `False` after?
ah, good idea! let the user decide!
`True` for versions >= 1.7 and < 1.11, and probably `False` after; the nightly still defaults to `True` as of today.
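A minimal sketch, assuming the suggestion above is adopted, of what the field declaration with a `None` default could look like (the class name and help text here are illustrative, not the PR's final wording):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TrainingArgumentsSketch:
    # Illustrative stand-in for transformers.TrainingArguments: a None default
    # leaves PyTorch's own TF32 behaviour untouched, while True/False let the
    # user force the feature on or off explicitly.
    tf32: Optional[bool] = field(
        default=None,
        metadata={"help": "Whether to enable tf32 mode, available on Ampere and newer GPU architectures."},
    )
```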
src/transformers/training_args.py
Outdated
@@ -802,6 +812,9 @@ def __post_init__(self):
                "Mixed precision training with AMP or APEX (`--fp16` or `--bf16`) and half precision evaluation (`--fp16_full_eval` or `--bf16_full_eval`) can only be used on CUDA devices."
            )

        if is_torch_available() and is_torch_tf32_available():
            torch.backends.cuda.matmul.allow_tf32 = True if self.tf32 else False
So here we should only change that boolean if the value set was not `None`. If the value is `True`, there should be an error if `is_torch_tf32_available()` is `False`, so the user is not surprised if they don't get what they want.
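A hedged sketch of the tri-state handling being suggested here (the helper name and error message are illustrative; the PR's final code may differ):

```python
import torch

def _configure_tf32(tf32, tf32_supported):
    """Illustrative helper: apply the suggested tri-state --tf32 semantics.

    `tf32_supported` stands in for the PR's `is_torch_tf32_available()` check.
    """
    if tf32 is None:
        # Leave PyTorch's own default untouched.
        return
    if tf32 and not tf32_supported:
        # Fail loudly instead of silently ignoring the user's request.
        raise ValueError("--tf32 requires an Ampere or newer GPU arch, CUDA >= 11 and torch >= 1.7")
    # Force-enable or force-disable TF32 matmuls.
    torch.backends.cuda.matmul.allow_tf32 = tf32
```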
thanks for this great feedback, Sylvain. Please have another look.
@@ -492,6 +493,15 @@ def test_mixed_bf16(self):

    # will add more specific tests once there are some bugs to fix

    @require_torch_gpu
    @require_torch_tf32
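For context, a hedged sketch of what a `require_torch_tf32` test decorator could look like (the helper actually added by this PR may be implemented differently):

```python
import unittest

def require_torch_tf32(test_case):
    # Sketch: skip the test unless TF32 can actually be exercised.
    # The inner check stands in for the PR's `is_torch_tf32_available()` helper,
    # which additionally verifies the torch and CUDA versions.
    def tf32_available():
        try:
            import torch
        except ImportError:
            return False
        # Ampere and newer GPUs report compute capability >= 8.
        return torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8

    return unittest.skipUnless(
        tf32_available(), "test requires Ampere or a newer GPU arch, CUDA >= 11 and torch >= 1.7"
    )(test_case)
```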
Out of curiosity, do we have a setup that has the right CUDA version and GPU capabilities?
I have an RTX 3090, if that's what you're asking.
Running benchmarks now; will post those shortly.
I was wondering for our testing machines on the automatic CI :-)
One day we will have those newer GPUs.
The benchmarks are terrible: #14608
Thanks a lot for the update, just left some styling nits but it's great!
Co-authored-by: Sylvain Gugger <[email protected]>
This PR adds tf32-mode control support to the HF Trainer for Ampere cards. RFC: #14450

PyTorch has had this mode on by default since pt-1.7, but there is a discussion about turning it off in the coming release: pytorch/pytorch#67384

Here is the proposed logic:
- `--tf32 0` will disable it.

The PR adds:
- `is_torch_tf32_available` and `require_torch_tf32` helper utils

Fixes: #14450
@sgugger, @LysandreJik
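As a usage sketch (everything except `--tf32` itself is an illustrative assumption), the new flag can be parsed like any other `TrainingArguments` field:

```python
# Illustrative only: parse --tf32 through TrainingArguments via HfArgumentParser.
# --output_dir is a required TrainingArguments field; "0" is read as False.
from transformers import HfArgumentParser, TrainingArguments

parser = HfArgumentParser(TrainingArguments)
(training_args,) = parser.parse_args_into_dataclasses(["--output_dir", "out", "--tf32", "0"])
print(training_args.tf32)  # False: TF32 matmuls are force-disabled where supported
```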