
Trainer(accelerator="tpu") should raise exception if TPU not found #12047

Closed
akihironitta opened this issue Feb 22, 2022 · 5 comments

Labels: accelerator: cuda, accelerator: tpu, bug

akihironitta (Contributor) commented Feb 22, 2022

🐛 Bug

In an environment without a TPU, Trainer(accelerator="tpu") should raise an exception, just like Trainer(tpu_cores=8) does.
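For illustration, here is a minimal sketch of the kind of fail-fast check this issue asks for, assuming a hypothetical tpu_available() probe (the function names are illustrative, not Lightning's internal API; only the exception class is taken from the traceback below):

from pytorch_lightning.utilities.exceptions import MisconfigurationException

def tpu_available() -> bool:
    # Hypothetical probe: a real check would query the XLA runtime,
    # not merely attempt the import.
    try:
        import torch_xla  # noqa: F401
        return True
    except ImportError:
        return False

def validate_accelerator(accelerator: str) -> None:
    # Expected behavior per this issue: requesting "tpu" on a machine
    # without TPUs should raise immediately rather than proceed silently.
    if accelerator == "tpu" and not tpu_available():
        raise MisconfigurationException("No TPU devices were found.")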

To Reproduce

>>> from pytorch_lightning import Trainer
>>> Trainer(tpu_cores=8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/nitta/work/github.com/PyTorchLightning/pytorch-lightning/pytorch_lightning/utilities/argparse.py", line 336, in insert_env_defaults
    return fn(self, **kwargs)
  File "/Users/nitta/work/github.com/PyTorchLightning/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 472, in __init__
    gpu_ids, tpu_cores = self._parse_devices(gpus, auto_select_gpus, tpu_cores)
  File "/Users/nitta/work/github.com/PyTorchLightning/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 1741, in _parse_devices
    return device_parser._parse_devices(gpus, auto_select_gpus, tpu_cores)
  File "/Users/nitta/work/github.com/PyTorchLightning/pytorch-lightning/pytorch_lightning/utilities/device_parser.py", line 63, in _parse_devices
    tpu_cores = parse_tpu_cores(tpu_cores)
  File "/Users/nitta/work/github.com/PyTorchLightning/pytorch-lightning/pytorch_lightning/utilities/device_parser.py", line 135, in parse_tpu_cores
    raise MisconfigurationException("No TPU devices were found.")
pytorch_lightning.utilities.exceptions.MisconfigurationException: No TPU devices were found.
>>> Trainer(accelerator="tpu", devices=8)
/Users/nitta/work/github.com/PyTorchLightning/pytorch-lightning/pytorch_lightning/loops/utilities.py:90: PossibleUserWarning: `max_epochs` was not set. Setting it to 1000 epochs. To train without an epoch limit, set `max_epochs=-1`.
  rank_zero_warn(
GPU available: False, used: False
TPU available: False, using: 8 TPU cores
IPU available: False, using: 0 IPUs
<pytorch_lightning.trainer.trainer.Trainer object at 0x10569dc70>

Expected behavior

As the title says: Trainer(accelerator="tpu", devices=8) should raise an exception (like the MisconfigurationException above) when no TPU is available, instead of silently constructing the Trainer.

Environment

  • PyTorch Lightning Version: master

Additional context

  • Trainer(accelerator="gpu", devices=8) raises an exception, as expected, if no GPUs are found.
  • Blocking #11470

cc @kaushikb11 @rohitgr7 @justusschock @awaelchli @akihironitta

rohitgr7 (Contributor) commented

Duplicate of #12044?

akihironitta (Contributor, Author) commented

Just as a follow-up note: this issue seems to happen with DDP, too:

from pytorch_lightning import Trainer

# when run on a machine without any GPUs
Trainer(accelerator="gpu", devices=1, strategy="ddp")    # raises an exception
Trainer(accelerator="gpu", devices=[0], strategy="ddp")  # raises no exception

rohitgr7 (Contributor) commented Mar 2, 2022

Can you try on master? This PR was merged recently: #12104

kaushikb11 (Contributor) commented

@akihironitta This has been resolved on master. It now raises the following error:

pytorch_lightning.utilities.exceptions.MisconfigurationException: TPUAccelerator can not run on your system since TPUs are not available. The following accelerator(s) is available and can be passed into `accelerator` argument of `Trainer`: ['cpu'].
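For reference, a minimal sketch of how such a message could be assembled from per-accelerator availability probes (hypothetical names; the actual logic lives in Lightning's accelerator connector on master):

def available_accelerators(checks: dict) -> list:
    # `checks` maps accelerator names to zero-argument availability probes.
    return [name for name, probe in checks.items() if probe()]

def assert_accelerator_available(requested: str, checks: dict) -> None:
    # Fail fast, and tell the user which accelerators would work instead.
    if not checks[requested]():
        raise RuntimeError(
            f"{requested!r} accelerator is not available on this system. "
            f"The following accelerator(s) can be passed to `Trainer`: "
            f"{available_accelerators(checks)}."
        )

# Example: on a CPU-only machine this raises, listing ['cpu'].
assert_accelerator_available("tpu", {"cpu": lambda: True, "tpu": lambda: False})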

akihironitta (Contributor, Author) commented

Confirmed! Thanks :)
