Add TPU support to BERT example #207
Conversation
Looks good! Looking at the code more closely, I think we don't need the flag yet; it isn't adding anything.
examples/bert/bert_train.py
Outdated
@@ -386,6 +387,23 @@ def main(_):

    model_config = MODEL_CONFIGS[FLAGS.model_size]

    if FLAGS.use_tpu:
        if not tf.config.list_logical_devices("TPU"):
Looks like right now, at least, we don't need this flag. We could just do `if not tf.config.list_logical_devices("TPU")` and connect if we find any, right? I don't think there's any use case where we find a TPU but don't want a TPUStrategy.
As discussed, we may still need some sort of flags to support multi-worker training, but let's add them when we need them. For this PR we don't.
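For illustration, a minimal sketch of the flag-free logic being described (assuming TF 2.x; this is not the exact code in the PR):

```python
import tensorflow as tf

# If a TPU is attached, connect to it and build a TPUStrategy;
# otherwise fall back to the default (CPU/GPU) strategy.
if tf.config.list_logical_devices("TPU"):
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="local")
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
else:
    strategy = tf.distribute.get_strategy()
```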
Ohh, there might be? Debugging on TPU is a bit subtle, so I usually check that the code runs on CPU before turning on the TPU testing flag. But yeah, this is a minor case; for debugging I can bypass the TPU with local changes. I'll let you make the call!
Yeah, let's remove the flag for now. GPU auto-connects and needs to be disabled manually, so I think it's reasonable for TPU to behave the same way.
If the main use case to cover is "I want to force running on CPU," we could consider adding a mechanism that works on both GPU and TPU machines. (Also, maybe one already exists? Is there a `CUDA_VISIBLE_DEVICES=-1` equivalent for TPU?)
Possibly `TPU_VISIBLE_CHIPS=-1` is an equivalent, though I can't really tell if that would work.
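For reference, the GPU-side mechanism works by hiding devices through an environment variable set before TensorFlow initializes; whether `TPU_VISIBLE_CHIPS` behaves the same way is not verified here. A minimal sketch:

```python
import os

# Hide all GPUs so TensorFlow falls back to CPU. This must be set
# before TensorFlow initializes its devices.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# TPU_VISIBLE_CHIPS is floated above as a possible TPU analogue,
# but it is not confirmed to have the same effect.
# os.environ["TPU_VISIBLE_CHIPS"] = "-1"

import tensorflow as tf

print(tf.config.list_physical_devices("GPU"))  # expected: []
```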
sg! changed.
lgtm. one nit
examples/bert/bert_train.py
Outdated
@@ -386,6 +385,18 @@ def main(_):

    model_config = MODEL_CONFIGS[FLAGS.model_size]

    if tf.config.list_logical_devices("TPU"):
        # Connect to TPU and create TPU strategy.
        resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
I think we can replace the next few lines with:
`resolver = tf.distribute.cluster_resolver.TPUClusterResolver.connect(tpu='local')`
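A rough sketch of how the block would collapse with that suggestion (`connect()` wraps `experimental_connect_to_cluster` and `initialize_tpu_system` and returns the resolver); not the exact diff, just the idea:

```python
import tensorflow as tf

if tf.config.list_logical_devices("TPU"):
    # One call connects to the local TPU, initializes it, and returns a resolver.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver.connect(tpu="local")
    strategy = tf.distribute.TPUStrategy(resolver)
else:
    strategy = tf.distribute.get_strategy()
```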
done!