
Fix when TPU device check is run #469

Merged

muellerzr merged 11 commits into main from tpu-device on Jun 24, 2022

Conversation

muellerzr (Collaborator)

This PR fixes an issue where xm.xla_device() can't be called outside of xmp.spawn. As a result, the current behavior of is_tpu_available breaks the notebook launcher.

The proposed fix is to perform this check directly in AcceleratorState, outside the if chain, so that whether we are on a TPU device can still be checked properly.
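For context, a rough sketch of the approach the description outlines; the names and structure here are illustrative only, not the actual accelerate source:

# Illustrative sketch only: is_tpu_available is reduced to an import check,
# and the device probe moves into state, where it runs after processes
# have been spawned.
import importlib.util


def is_tpu_available():
    "Checks if `torch_xla` is installed"
    return importlib.util.find_spec("torch_xla") is not None


class AcceleratorState:
    def __init__(self):
        self.device = None
        if is_tpu_available():
            import torch_xla.core.xla_model as xm

            try:
                # Will raise a RuntimeError if no XLA configuration is found
                self.device = xm.xla_device()
            except RuntimeError:
                self.device = None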

muellerzr added the bug (Something isn't working) and TPU (Bug or feature on TPU platforms) labels on Jun 24, 2022
muellerzr requested a review from sgugger on June 24, 2022 at 14:58
HuggingFaceDocBuilderDev commented on Jun 24, 2022

The documentation is not available anymore as the PR was closed or merged.

sgugger (Collaborator) left a comment

We can maybe add an argument to the is_tpu_available function to not check for a TPU device sometimes, but we can't remove the device test entirely, as we added it for a reason.
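As a sketch of what that argument could look like (hypothetical code, using the check_device name from the diff further down; the default value is debated later in the thread):

import importlib.util

# Module-level import check; the name _tpu_installed is illustrative.
_tpu_installed = importlib.util.find_spec("torch_xla") is not None

if _tpu_installed:
    import torch_xla.core.xla_model as xm


def is_tpu_available(check_device=False):
    "Checks if `torch_xla` is installed and, optionally, if a TPU is in the environment"
    if not _tpu_installed:
        return False
    if check_device:
        try:
            # Will raise a RuntimeError if no XLA configuration is found
            _ = xm.xla_device()
        except RuntimeError:
            return False
    return True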

Comment on lines -41 to -46
try:
    # Will raise a RuntimeError if no XLA configuration is found
    _ = xm.xla_device()
    _tpu_available = True
except RuntimeError:
    _tpu_available = False
sgugger (Collaborator)

Mmm, removing this will break the other places we use is_tpu_available

muellerzr (Author)

The reason it was added applied specifically inside AcceleratorState, at the proposed location. But having it as an argument instead works as well. Will refactor.

@@ -56,8 +51,15 @@ def is_apex_available():
     return importlib.util.find_spec("apex") is not None


-def is_tpu_available():
-    "Checks if `torch_xla` is installed and if a TPU is in the environment"
+def is_tpu_available(check_device=False):
sgugger (Collaborator)

This needs to be True by default, and False when we don't want to check for the device (before launching multiprocessing for instance).

muellerzr (Author)

It needs to be False by default, because otherwise it also runs the device check during the import checks that are scattered around the library.

muellerzr (Author)

Or it could be made into a separate function if we don't want the False behavior. (I know we're not fans of that, but this is one case where it should be False.)
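Continuing the sketch from the earlier comment, hypothetical call sites that illustrate the trade-off being discussed (assuming the is_tpu_available sketched above):

# Import-time / pre-launch checks scattered around the library: skip the
# device probe so nothing calls xm.xla_device() outside the spawned processes.
if is_tpu_available():
    print("torch_xla is installed")

# After spawning (e.g. when AcceleratorState initializes), opt in to the
# device probe to confirm a TPU is actually reachable.
if is_tpu_available(check_device=True):
    print("TPU device is reachable")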

muellerzr requested a review from sgugger on June 24, 2022 at 16:04
sgugger (Collaborator) left a comment

Thanks for iterating!

muellerzr merged commit 9d8ed50 into main on Jun 24, 2022
muellerzr deleted the tpu-device branch on June 24, 2022 at 16:07
anw90 mentioned this pull request on Dec 4, 2023