Import error on shutdown/KeyboardInterrupt if ran from Jupyter Lab notebook cell #20317

Open
asigalov61 opened this issue Oct 3, 2024 · 5 comments
Labels
bug (Something isn't working), needs triage (Waiting to be triaged by maintainers), ver: 2.4.x

Comments

@asigalov61

Bug description

Import error on shutdown/KeyboardInterrupt when run from a Jupyter Lab notebook cell. When run from a script, everything works fine.

What version are you seeing the problem on?

v2.4

How to reproduce the bug

Run trainer.fit from a Jupyter notebook cell, then click stop in Jupyter notebook.


print("---start train---")
trainer.fit(model, train_dataloader, ckpt_path=ckpt_path)

Error messages and logs

Detected KeyboardInterrupt, attempting graceful shutdown ...
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
~/.local/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
     45         if trainer.strategy.launcher is not None:
---> 46             return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
     47         return trainer_fn(*args, **kwargs)

~/.local/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/multiprocessing.py in launch(self, function, trainer, *args, **kwargs)
    143         self.procs = process_context.processes
--> 144         while not process_context.join():
    145             pass

~/.local/lib/python3.10/site-packages/torch/multiprocessing/spawn.py in join(self, timeout)
    117         # Wait for any process to fail or all of them to succeed.
--> 118         ready = multiprocessing.connection.wait(
    119             self.sentinels.keys(),

/usr/lib/python3.10/multiprocessing/connection.py in wait(object_list, timeout)
    930             while True:
--> 931                 ready = selector.select(timeout)
    932                 if ready:

/usr/lib/python3.10/selectors.py in select(self, timeout)
    415         try:
--> 416             fd_event_list = self._selector.poll(timeout)
    417         except InterruptedError:

KeyboardInterrupt: 

During handling of the above exception, another exception occurred:

NameError                                 Traceback (most recent call last)
/tmp/ipykernel_2824/3752444865.py in <module>
    189     ckpt_path = None
    190 print("---start train---")
--> 191 trainer.fit(model, train_dataloader, ckpt_path=ckpt_path)

~/.local/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py in fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    536         self.state.status = TrainerStatus.RUNNING
    537         self.training = True
--> 538         call._call_and_handle_interrupt(
    539             self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
    540         )

~/.local/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
     62         if isinstance(launcher, _SubprocessScriptLauncher):
     63             launcher.kill(_get_sigkill_signal())
---> 64         exit(1)
     65 
     66     except BaseException as exception:

NameError: name 'exit' is not defined

Environment

Current environment
  • CUDA:
    • GPU:
      • NVIDIA A100-SXM4-40GB
    • available: True
    • version: 12.1
  • Lightning:
    • lightning: 2.4.0
    • lightning-utilities: 0.11.7
    • pytorch-lightning: 2.4.0
    • torch: 2.4.1
    • torch-summary: 1.4.5
    • torchmetrics: 1.4.2
    • torchvision: 0.15.2
  • Packages:
    • absl-py: 0.15.0
    • aiohappyeyeballs: 2.4.3
    • aiohttp: 3.10.8
    • aiosignal: 1.3.1
    • aiosqlite: 0.19.0
    • annotated-types: 0.6.0
    • anyio: 4.1.0
    • appdirs: 1.4.4
    • argon2-cffi: 21.1.0
    • arrow: 1.3.0
    • astunparse: 1.6.3
    • async-lru: 2.0.4
    • async-timeout: 4.0.3
    • attrs: 23.1.0
    • automat: 20.2.0
    • babel: 2.13.1
    • backcall: 0.2.0
    • bcrypt: 3.2.0
    • beautifulsoup4: 4.10.0
    • beniget: 0.4.1
    • bleach: 4.1.0
    • blinker: 1.4
    • bottle: 0.12.19
    • bottleneck: 1.3.2
    • brotli: 1.0.9
    • cachetools: 5.0.0
    • certifi: 2020.6.20
    • cffi: 1.15.0
    • chardet: 4.0.0
    • charset-normalizer: 3.3.2
    • click: 8.0.3
    • cloud-init: 23.3.3
    • colorama: 0.4.4
    • comm: 0.2.0
    • command-not-found: 0.3
    • configobj: 5.0.6
    • constantly: 15.1.0
    • cryptography: 3.4.8
    • ctop: 1.0.0
    • cycler: 0.11.0
    • dacite: 1.8.1
    • dbus-python: 1.2.18
    • debugpy: 1.8.0
    • decorator: 4.4.2
    • defusedxml: 0.7.1
    • distlib: 0.3.4
    • distro: 1.7.0
    • distro-info: 1.1+ubuntu0.1
    • docker: 5.0.3
    • entrypoints: 0.4
    • et-xmlfile: 1.0.1
    • exceptiongroup: 1.2.0
    • fastjsonschema: 2.19.0
    • filelock: 3.6.0
    • flake8: 4.0.1
    • flatbuffers: 1.12.1-git20200711.33e2d80-dfsg1-0.6
    • fonttools: 4.29.1
    • fqdn: 1.5.1
    • frozenlist: 1.4.1
    • fs: 2.4.12
    • fsspec: 2024.9.0
    • future: 0.18.2
    • gast: 0.5.2
    • glances: 3.2.4.2
    • google-auth: 1.5.1
    • google-auth-oauthlib: 0.4.2
    • google-pasta: 0.2.0
    • grpcio: 1.30.2
    • h5py: 3.6.0
    • h5py.-debian-h5py-serial: 3.6.0
    • html5lib: 1.1
    • htmlmin: 0.1.12
    • httplib2: 0.20.2
    • huggingface-hub: 0.25.1
    • hyperlink: 21.0.0
    • icdiff: 2.0.4
    • idna: 3.3
    • imagehash: 4.3.1
    • importlib-metadata: 4.6.4
    • incremental: 21.3.0
    • influxdb: 5.3.1
    • iniconfig: 1.1.1
    • iotop: 0.6
    • ipykernel: 6.7.0
    • ipython: 7.31.1
    • ipython-genutils: 0.2.0
    • ipywidgets: 8.1.1
    • isoduration: 20.11.0
    • jax: 0.4.14
    • jaxlib: 0.4.14
    • jdcal: 1.0
    • jedi: 0.18.0
    • jeepney: 0.7.1
    • jinja2: 3.0.3
    • joblib: 0.17.0
    • json5: 0.9.14
    • jsonpatch: 1.32
    • jsonpointer: 2.0
    • jsonschema: 4.20.0
    • jsonschema-specifications: 2023.11.2
    • jupyter-client: 8.6.0
    • jupyter-console: 6.4.0
    • jupyter-core: 5.5.0
    • jupyter-events: 0.9.0
    • jupyter-lsp: 2.2.1
    • jupyter-server: 2.12.0
    • jupyter-server-fileid: 0.9.0
    • jupyter-server-terminals: 0.4.4
    • jupyter-ydoc: 1.1.1
    • jupyterlab: 4.0.9
    • jupyterlab-pygments: 0.1.2
    • jupyterlab-server: 2.25.2
    • jupyterlab-widgets: 3.0.9
    • kaptan: 0.5.12
    • keras: 2.13.1
    • keyring: 23.5.0
    • kiwisolver: 1.3.2
    • launchpadlib: 1.10.16
    • lazr.restfulclient: 0.14.4
    • lazr.uri: 1.0.6
    • libtmux: 0.10.1
    • lightning: 2.4.0
    • lightning-utilities: 0.11.7
    • llvmlite: 0.41.1
    • lxml: 4.8.0
    • lz4: 3.1.3+dfsg
    • markdown: 3.3.6
    • markupsafe: 2.0.1
    • matplotlib: 3.5.1
    • matplotlib-inline: 0.1.3
    • mccabe: 0.6.1
    • mistune: 3.0.2
    • ml-dtypes: 0.2.0
    • more-itertools: 8.10.0
    • mpmath: 0.0.0
    • msgpack: 1.0.3
    • multidict: 6.1.0
    • multimethod: 1.10
    • nbclient: 0.5.6
    • nbconvert: 7.12.0
    • nbformat: 5.9.2
    • nest-asyncio: 1.5.4
    • netifaces: 0.11.0
    • networkx: 2.4
    • nose: 1.3.7
    • notebook: 6.4.8
    • notebook-shim: 0.2.3
    • numba: 0.58.1
    • numexpr: 2.8.1
    • numpy: 1.25.2
    • nvidia-cublas-cu12: 12.1.3.1
    • nvidia-cuda-cupti-cu12: 12.1.105
    • nvidia-cuda-nvrtc-cu12: 12.1.105
    • nvidia-cuda-runtime-cu12: 12.1.105
    • nvidia-cudnn-cu12: 9.1.0.70
    • nvidia-cufft-cu12: 11.0.2.54
    • nvidia-curand-cu12: 10.3.2.106
    • nvidia-cusolver-cu12: 11.4.5.107
    • nvidia-cusparse-cu12: 12.1.0.106
    • nvidia-ml-py3: 7.352.0
    • nvidia-nccl-cu12: 2.20.5
    • nvidia-nvjitlink-cu12: 12.6.77
    • nvidia-nvtx-cu12: 12.1.105
    • oauthlib: 3.2.0
    • odfpy: 1.4.2
    • olefile: 0.46
    • openpyxl: 3.0.9
    • opt-einsum: 3.3.0
    • overrides: 7.4.0
    • packaging: 21.3
    • pandas: 1.3.5
    • pandas-profiling: 3.6.6
    • pandocfilters: 1.5.0
    • parso: 0.8.1
    • patsy: 0.5.4
    • pexpect: 4.8.0
    • phik: 0.12.3
    • pickleshare: 0.7.5
    • pillow: 9.0.1
    • pip: 23.3.1
    • platformdirs: 2.5.1
    • pluggy: 0.13.0
    • ply: 3.11
    • prometheus-client: 0.9.0
    • prompt-toolkit: 3.0.28
    • protobuf: 4.21.12
    • psutil: 5.9.0
    • ptyprocess: 0.7.0
    • py: 1.10.0
    • pyasn1: 0.4.8
    • pyasn1-modules: 0.2.1
    • pycodestyle: 2.8.0
    • pycparser: 2.21
    • pycryptodomex: 3.11.0
    • pydantic: 2.5.2
    • pydantic-core: 2.14.5
    • pyflakes: 2.4.0
    • pygments: 2.11.2
    • pygobject: 3.42.1
    • pyhamcrest: 2.0.2
    • pyinotify: 0.9.6
    • pyjwt: 2.3.0
    • pyopenssl: 21.0.0
    • pyparsing: 2.4.7
    • pyrsistent: 0.18.1
    • pyserial: 3.5
    • pysmi: 0.3.2
    • pysnmp: 4.4.12
    • pystache: 0.6.0
    • pytest: 6.2.5
    • python-apt: 2.4.0+ubuntu2
    • python-dateutil: 2.8.2
    • python-debian: 0.1.43+ubuntu1.1
    • python-json-logger: 2.0.7
    • python-magic: 0.4.24
    • pythran: 0.10.0
    • pytorch-lightning: 2.4.0
    • pytz: 2022.1
    • pywavelets: 1.5.0
    • pyyaml: 5.4.1
    • pyzmq: 25.1.2
    • referencing: 0.31.1
    • regex: 2024.9.11
    • requests: 2.31.0
    • requests-oauthlib: 1.3.0
    • rfc3339-validator: 0.1.4
    • rfc3986-validator: 0.1.1
    • rpds-py: 0.13.2
    • rsa: 4.8
    • safetensors: 0.4.5
    • scikit-learn: 0.23.2
    • scipy: 1.8.0
    • seaborn: 0.12.2
    • secretstorage: 3.3.1
    • send2trash: 1.8.2
    • service-identity: 18.1.0
    • setuptools: 59.6.0
    • simplejson: 3.17.6
    • six: 1.16.0
    • sniffio: 1.3.0
    • sos: 4.5.6
    • soupsieve: 2.3.1
    • ssh-import-id: 5.11
    • statsmodels: 0.14.0
    • sympy: 1.9
    • systemd-python: 234
    • tables: 3.7.0
    • tangled-up-in-unicode: 0.2.0
    • tensorboard: 2.13.0
    • tensorflow: 2.13.1
    • tensorflow-estimator: 2.13.0
    • termcolor: 1.1.0
    • terminado: 0.13.1
    • testpath: 0.5.0
    • threadpoolctl: 3.1.0
    • tinycss2: 1.2.1
    • tmuxp: 1.9.2
    • tokenizers: 0.20.0
    • toml: 0.10.2
    • tomli: 2.0.1
    • torch: 2.4.1
    • torch-summary: 1.4.5
    • torchmetrics: 1.4.2
    • torchvision: 0.15.2
    • tornado: 6.4
    • tqdm: 4.66.1
    • traitlets: 5.14.0
    • transformers: 4.45.1
    • triton: 3.0.0
    • twisted: 22.1.0
    • typeguard: 4.1.5
    • types-python-dateutil: 2.8.19.14
    • typing-extensions: 4.8.0
    • ubuntu-advantage-tools: 8001
    • ufolib2: 0.13.1
    • ufw: 0.36.1
    • unattended-upgrades: 0.1
    • unicodedata2: 14.0.0
    • uri-template: 1.3.0
    • urllib3: 1.26.5
    • virtualenv: 20.13.0+ds
    • visions: 0.7.5
    • wadllib: 1.3.6
    • wcwidth: 0.2.5
    • webcolors: 1.13
    • webencodings: 0.5.1
    • websocket-client: 1.2.3
    • werkzeug: 2.0.2
    • wheel: 0.37.1
    • widgetsnbextension: 4.0.9
    • wordcloud: 1.9.2
    • wrapt: 1.13.3
    • xlwt: 1.3.0
    • y-py: 0.6.2
    • yarl: 1.13.1
    • ydata-profiling: 4.6.3
    • ypy-websocket: 0.12.4
    • zipp: 1.0.0
    • zope.interface: 5.4.0
  • System:
    • OS: Linux
    • architecture:
      • 64bit
      • ELF
    • processor: x86_64
    • python: 3.10.12
    • release: 6.2.0-37-generic
    • version: #38~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 2 18:01:13 UTC 2
#- PyTorch Lightning Version (e.g., 2.4.0): 2.4.0
#- PyTorch Version (e.g., 2.4): 2.4.1+cu121
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration: 1xA100 40GB
#- How you installed Lightning(`conda`, `pip`, source): pip install lightning

More info

No response

@asigalov61 added the bug (Something isn't working) and needs triage (Waiting to be triaged by maintainers) labels on Oct 3, 2024
@nocoding03

Avoid exit(1): in a Jupyter environment, exit() can cause problems. Calling exit() works in standard Python scripts, but it should not be called in Jupyter notebooks. You can use sys.exit() instead:
import sys
sys.exit(1)

However, the recommended approach is to avoid using exit() or sys.exit() directly, especially in Jupyter notebook environments, where these commands can interrupt the kernel process and cause unnecessary problems.
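
For reference, here is a minimal sketch of the distinction discussed above, using only the standard library. sys.exit() is reached through an explicit import and raises SystemExit, while the bare exit name is a convenience that the site module adds to builtins for interactive use and is not guaranteed to be present in every environment. The function name shutdown is hypothetical and only serves the illustration.

```python
import sys


def shutdown(code: int) -> None:
    """Hypothetical helper: request interpreter exit with the given status code."""
    # sys.exit raises SystemExit; it does not rely on the `exit` builtin
    # that the `site` module installs for interactive sessions.
    sys.exit(code)


try:
    shutdown(1)
except SystemExit as exc:
    # The exception carries the status code, so callers can inspect it.
    exit_code = exc.code
    print(f"SystemExit raised with code {exit_code}")
```

Because SystemExit is an ordinary exception, the surrounding code (or the notebook front end) gets a chance to handle it instead of failing on a name lookup.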

@asigalov61
Author

@nocoding03 My code/notebook does not use or call exit. The problem is in the PyTorch Lightning module.

If you double-check the provided traceback, you will see that the error comes from the ~/.local/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py module.

@ori-kron-wis

I also see this issue with lightning v2.4.0 and torch v2.5.1 while training in a Jupyter notebook.
When I stop the training run, instead of shutting down gracefully, I get this error:

NameError                                 Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/call.py in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
     62         if isinstance(launcher, _SubprocessScriptLauncher):
     63             launcher.kill(_get_sigkill_signal())
---> 64         exit(1)
     65 
     66     except BaseException as exception:

NameError: name 'exit' is not defined

This seems to be an issue with Lightning not importing exit from sys, so the bare exit call is not defined.
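
One way to see how a NameError like this can arise, as a sketch and not a description of what Jupyter does internally: a bare exit is looked up first in the module globals and then in builtins, so if the name is missing from builtins the call fails before anything runs. The restricted namespace below is an assumption used to mimic that situation.

```python
import builtins

# Assumption for illustration: build a copy of the builtins without the
# interactive helpers `exit` and `quit`, mimicking an environment that hides them.
restricted_builtins = {
    name: value
    for name, value in vars(builtins).items()
    if name not in ("exit", "quit")
}
namespace = {"__builtins__": restricted_builtins}

try:
    # The bare name cannot be resolved in globals or builtins.
    exec("exit(1)", namespace)
except NameError as err:
    message = str(err)
    print(message)
```

The lookup fails in the same way as in the traceback above, even though the code never reaches the point of actually exiting.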

@odusseys

Same issue in 2.5.0, but it even fails when defining the trainer and kills the kernel.

@canergen
Copy link

canergen commented Jan 13, 2025

What's the status of this? The bug was reported 5 months ago in that specific branch #19976 authored by @awaelchli and approved by @lantiga. There seems to be no activity on fixing this. My understanding is that importing exit from sys should be sufficient to fix it, but I might be missing something.
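
To make the suggestion concrete, here is a rough sketch of what using sys.exit in an interrupt handler could look like. This is not Lightning's actual code: call_and_handle_interrupt and interrupted_training are hypothetical, simplified stand-ins, and whether this change alone is sufficient for the real handler is an assumption.

```python
import sys


def call_and_handle_interrupt(fn, *args, **kwargs):
    """Hypothetical, simplified stand-in for an interrupt-handling wrapper."""
    try:
        return fn(*args, **kwargs)
    except KeyboardInterrupt:
        print("Detected KeyboardInterrupt, attempting graceful shutdown ...")
        # sys.exit is resolved through the imported module, so it does not
        # depend on `exit` being present in builtins.
        sys.exit(1)


def interrupted_training():
    """Hypothetical training function that simulates the user pressing stop."""
    raise KeyboardInterrupt


try:
    call_and_handle_interrupt(interrupted_training)
except SystemExit as exc:
    exit_code = exc.code
    print(f"Handler exited with code {exit_code}")
```

With this shape, an interrupt ends in SystemExit(1) and not in a NameError raised while the KeyboardInterrupt is being handled.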
