
On Kaggle : libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12 #134929

Closed
FurkanGozukara opened this issue Sep 2, 2024 · 23 comments
Assignees
Labels
high priority
module: binaries (Anything related to official binaries that we release to users)
module: cuda (Related to torch.cuda, and CUDA support in general)
module: regression (It used to work, and now it doesn't)
module: sparse (Related to torch.sparse)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Milestone

Comments

@FurkanGozukara

FurkanGozukara commented Sep 2, 2024

🐛 Describe the bug

I have tried everything but no luck.

Waiting for your input to try more.

I tried torch 2.4.0, 2.5-dev, cu118, cu121 and cu124 - all give the same error.

The code below produces the same error I get when using the famous ComfyUI via SwarmUI:

import os

# Set CUDA_HOME environment variable
os.environ['CUDA_HOME'] = '/opt/conda'

# Add CUDA binary directory to PATH
os.environ['PATH'] = f"/opt/conda/bin:{os.environ['PATH']}"

# Set LD_LIBRARY_PATH to include CUDA libraries
os.environ['LD_LIBRARY_PATH'] = f"/opt/conda/lib:{os.environ.get('LD_LIBRARY_PATH', '')}"

# Verify CUDA version
!nvcc --version

# Optional: Check if CUDA is available in Python
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")

It gives the error below:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[49], line 16
     13 get_ipython().system('nvcc --version')
     15 # Optional: Check if CUDA is available in Python
---> 16 import torch
     17 print(f"CUDA available: {torch.cuda.is_available()}")
     18 if torch.cuda.is_available():

File /opt/conda/lib/python3.10/site-packages/torch/__init__.py:368
    366     if USE_GLOBAL_DEPS:
    367         _load_global_deps()
--> 368     from torch._C import *  # noqa: F403
    371 class SymInt:
    372     """
    373     Like an int (including magic methods), but redirects all operations on the
    374     wrapped node. This is used in particular to symbolically record operations
    375     in the symbolic shape workflow.
    376     """

ImportError: /opt/conda/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12

Versions

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-5.15.154+-x86_64-with-glibc2.35
Is CUDA available: N/A
CUDA runtime version: 12.4.131
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: 
GPU 0: Tesla T4
GPU 1: Tesla T4

Nvidia driver version: 550.90.07
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.7
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        46 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               4
On-line CPU(s) list:                  0-3
Vendor ID:                            GenuineIntel
Model name:                           Intel(R) Xeon(R) CPU @ 2.00GHz
CPU family:                           6
Model:                                85
Thread(s) per core:                   2
Core(s) per socket:                   2
Socket(s):                            1
Stepping:                             3
BogoMIPS:                             4000.36
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat md_clear arch_capabilities
Hypervisor vendor:                    KVM
Virtualization type:                  full
L1d cache:                            64 KiB (2 instances)
L1i cache:                            64 KiB (2 instances)
L2 cache:                             2 MiB (2 instances)
L3 cache:                             38.5 MiB (1 instance)
NUMA node(s):                         1
NUMA node0 CPU(s):                    0-3
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Mitigation; PTE Inversion
Vulnerability Mds:                    Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:               Mitigation; PTI
Vulnerability Mmio stale data:        Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Mitigation; IBRS
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; IBRS; IBPB conditional; STIBP conditional; RSB filling; PBRSB-eIBRS Not affected; BHI SW loop, KVM SW loop
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Mitigation; Clear CPU buffers; SMT Host state unknown

Versions of relevant libraries:
[pip3] flake8==7.0.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] onnx==1.16.2
[pip3] optree==0.11.0
[pip3] pytorch-ignite==0.5.1
[pip3] pytorch-lightning==2.4.0
[pip3] pytorch-triton==3.0.0+dedb7bdf33
[pip3] torch==2.5.0.dev20240901+cu124
[pip3] torchaudio==2.5.0.dev20240901+cu124
[pip3] torchinfo==1.8.0
[pip3] torchmetrics==1.4.1
[pip3] torchvision==0.20.0.dev20240901+cu124
[pip3] triton==3.0.0
[conda] magma-cuda121             2.6.1                         1    pytorch
[conda] mkl                       2024.2.1           ha957f24_103    conda-forge
[conda] numpy                     1.26.4          py310hb13e2d6_0    conda-forge
[conda] optree                    0.11.0                   pypi_0    pypi
[conda] pytorch-ignite            0.5.1                    pypi_0    pypi
[conda] pytorch-lightning         2.4.0                    pypi_0    pypi
[conda] pytorch-triton            3.0.0+dedb7bdf33          pypi_0    pypi
[conda] torch                     2.5.0.dev20240901+cu124          pypi_0    pypi
[conda] torchaudio                2.5.0.dev20240901+cu124          pypi_0    pypi
[conda] torchinfo                 1.8.0                    pypi_0    pypi
[conda] torchmetrics              1.4.1                    pypi_0    pypi
[conda] torchvision               0.20.0.dev20240901+cu124          pypi_0    pypi
[conda] triton                    3.0.0                    pypi_0    pypi

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @seemethere @malfet @osalpekar @atalman @alexsamardzic @nikitaved @pearu @cpuhrsch @amjames @bhosmer @jcaip @ptrblck @eqy

@malfet malfet added module: binaries Anything related to official binaries that we release to users topic: binaries module: sparse Related to torch.sparse module: cuda Related to torch.cuda, and CUDA support in general needs reproduction Someone else needs to try reproducing the issue given the instructions. No action needed from user and removed topic: binaries labels Sep 3, 2024
@malfet
Contributor

malfet commented Sep 3, 2024

This sounds to me like a great topic to ask at https://discuss.pytorch.org, though we should also extend collect_env to print information about the nvidia-* packages one has installed.
If torch is installed via PyPI wheels one should not need to define CUDA_HOME or anything like that, but doing so can affect how torch searches for its dependencies, which seems to be what is happening here...
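To illustrate the dependency-search point: the CUDA libraries torch actually expects are the ones shipped in the PyPI `nvidia-*` wheels, which live under `site-packages/nvidia/<pkg>/lib`. A minimal sketch (a hypothetical helper, assuming that wheel layout) that collects those directories, so that if LD_LIBRARY_PATH must be set at all, it can be pointed at the wheel copies rather than a system install:

```python
import site
from pathlib import Path

def wheel_cuda_lib_dirs(site_dirs=None):
    """Collect the lib/ directories that PyPI `nvidia-*` wheels install
    under site-packages (e.g. nvidia/nvjitlink/lib). These hold the
    libraries the torch wheel was built against, so they should come
    before any system CUDA directory on LD_LIBRARY_PATH."""
    if site_dirs is None:
        site_dirs = site.getsitepackages()
    dirs = []
    for sp in site_dirs:
        root = Path(sp) / "nvidia"
        if root.is_dir():
            # one lib/ directory per wheel: cusparse, nvjitlink, ...
            dirs.extend(str(p) for p in sorted(root.glob("*/lib")))
    return dirs
```

Usage would be something like `os.environ["LD_LIBRARY_PATH"] = ":".join(wheel_cuda_lib_dirs() + [os.environ.get("LD_LIBRARY_PATH", "")])` before torch is imported in a fresh process.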

@FurkanGozukara
Author

This sounds to me like a great topic to ask at https://discuss.pytorch.org, though we should also extend collect_env to print information about the nvidia-* packages one has installed. If torch is installed via PyPI wheels one should not need to define CUDA_HOME or anything like that, but doing so can affect how torch searches for its dependencies, which seems to be what is happening here...

Well, it was working when ComfyUI was not using Torch 2.4, but this started after they moved. I think Kaggle still has Torch 2.3 by default. Do you know how I can fix this issue? I tried so many commands and none worked :/

So many people are waiting for me to fix this issue if I can. Thank you so much

@sarihl

sarihl commented Sep 4, 2024

Facing the same issue. It seems to occur only when using a notebook, with CUDA 12.4.
You can try downgrading to a previous CUDA version (worked for me),
or using a script instead of a notebook.

Both of these are workarounds though :/

@FurkanGozukara
Author

you try downgrading to a previous cuda version (worked for me)

How do we downgrade the notebook CUDA version?

@malfet malfet added triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module module: regression It used to work, and now it doesn't labels Sep 4, 2024
@malfet
Contributor

malfet commented Sep 4, 2024

@sarihl is there a document somewhere that I can read thru that documents the installation process? I can run torch-2.4 from jupyter notebook just fine

@FurkanGozukara
Author

@sarihl is there a document somewhere that I can read thru that documents the installation process? I can run torch-2.4 from jupyter notebook just fine

You run it inside kaggle?

@malfet
Contributor

malfet commented Sep 4, 2024

@sarihl is there a document somewhere that I can read thru that documents the installation process? I can run torch-2.4 from jupyter notebook just fine

You run it inside kaggle?

Can you share a link to the notebook? I've tried running it and it seems to work fine for me: https://www.kaggle.com/code/malfet/check-torch-version

@FurkanGozukara
Author

@sarihl is there a document somewhere that I can read thru that documents the installation process? I can run torch-2.4 from jupyter notebook just fine

You run it inside kaggle?

Can you share a link to the notebook? I've tried running it and it seems to work fine for me: https://www.kaggle.com/code/malfet/check-torch-version

Here is the notebook.

To be able to see it you need to connect via ngrok, install the famous SwarmUI, and have it install the ComfyUI backend.

It is so easy and straightforward actually.

notebookc7ac6afeca.txt

@willlllllio

Side note, but @malfet, just be aware that using Kaggle for running diffusion WebUIs is against their ToS, so you might want to be careful with your Kaggle account when trying that. As FurkanGozukara surely knows, since he even commented on a thread there where a Kaggle staff member explains it: https://www.kaggle.com/discussions/product-feedback/440296

@malfet
Contributor

malfet commented Sep 5, 2024

@willlllllio thank you for the warning. I wasn't aware of that.

At this point it does not seem like a PyTorch issue, but may be a bug with SwarmUI or whatever creates the custom environment that forces libtorch to link against the wrong nvjitlink. So I think needs reproduction is the right label. I would look into it again if there were a link to Google Colab, Kaggle, or something that can be run end to end and reproduces the problem. And once again: if one installs CUDA-12.2 and points LD_LIBRARY_PATH at it, torch will be forced to link with the wrong version and fail, but there isn't much one can do on the torch side to fix that.
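The shadowing described here can be checked directly: the dynamic loader takes the first matching directory on LD_LIBRARY_PATH, so a sketch like the following (a hypothetical helper, not part of torch) shows which copy of libnvJitLink.so.12 would win:

```python
from pathlib import Path

def first_dir_providing(libname, search_path):
    """Return the first directory on a colon-separated search path that
    contains `libname` -- the same first-match rule the dynamic loader
    applies to LD_LIBRARY_PATH. If an older system CUDA directory comes
    first, its libnvJitLink.so.12 shadows the one in the pip wheel,
    which is exactly the undefined-symbol scenario in this issue."""
    for d in search_path.split(":"):
        if d and (Path(d) / libname).is_file():
            return d
    return None
```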

@malfet malfet removed the needs reproduction Someone else needs to try reproducing the issue given the instructions. No action needed from user label Oct 19, 2024
@malfet
Contributor

malfet commented Oct 19, 2024

@albanD showed me a reproducer, we need to add nvjitlink to rpath for libtorch_cuda.so
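For anyone wanting to verify that fix locally: whether a shared object carries such an rpath can be inspected with `readelf` (assuming binutils is installed). The libtorch path below is illustrative; the command is run against the python3 binary only so it works anywhere:

```shell
# Print the RUNPATH/RPATH entries baked into an ELF binary.
# For the actual check, point this at
# .../site-packages/torch/lib/libtorch_cuda.so instead.
target="$(command -v python3)"
readelf -d "$target" 2>/dev/null | grep -E 'RUNPATH|RPATH' || echo "no RUNPATH/RPATH entries"
```

If the fixed wheel is installed, libtorch_cuda.so should show a RUNPATH entry pointing at the wheel's nvjitlink lib directory.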

@jt-michels

I am having a similar issue, but using Paperspace VM, not Kaggle...

@yondonfu

FWIW I ran into this issue on a machine with system CUDA 12.2 (as reported by nvcc) and the latest ComfyUI, which installs torch 2.5.0 for CUDA 12.4. This lines up with the scenario mentioned in #138460, which notes that the dependency issue most commonly comes up when the binary for CUDA 12.4 is installed on a system with a global CUDA install < 12.4.

My current workaround is to just downgrade to a version of torch built for a previous CUDA version:

pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

@khanfarhan10

This issue is really persistent and looks serious.

A thread is present here : #111469

@albanD
Collaborator

albanD commented Dec 2, 2024

bumping priority for activity and the fact that we have a good idea how to fix

@apaz-cli

apaz-cli commented Jan 10, 2025

@janeyx99 @malfet

I still run into this every day. I'm in a venv, so my solution was to add .venv/lib/python3.10/site-packages/nvidia/nvjitlink/lib to my LD_LIBRARY_PATH.

This is also the same issue as #111469, which is closed by the author because they were unblocked, but the issue didn't get resolved in the general case.

My workaround is that I switch to the venv, then paste in

LD_LIBRARY_PATH=$(python -c "import site; print(site.getsitepackages()[0] + '/nvidia/nvjitlink/lib')"):$LD_LIBRARY_PATH

I would put in the PR to fix it myself, but I'm not quite sure how the import resolves object files or where the code that searches for it is.
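For context on where that resolution happens: before loading torch._C, torch's __init__.py preloads the nvidia wheel libraries with ctypes using RTLD_GLOBAL, which is what makes their symbols visible to libcusparse. A simplified sketch of that mechanism (not torch's actual code; function name is made up):

```python
import ctypes

def preload_globally(lib_names):
    """Load each library with RTLD_GLOBAL so the symbols it exports
    (e.g. __nvJitLinkComplete_12_4 from libnvJitLink) become visible to
    libraries loaded afterwards, such as libcusparse.so.12. Libraries
    that cannot be found are skipped; the names actually loaded are
    returned. Roughly how torch's _load_global_deps path behaves."""
    loaded = []
    for name in lib_names:
        try:
            ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
        except OSError:
            continue  # not found on the loader's search path
        loaded.append(name)
    return loaded
```

This is why the LD_LIBRARY_PATH / LD_PRELOAD workarounds in this thread work: they just make sure the right libnvJitLink is the one the loader finds first.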

@malfet
Contributor

malfet commented Jan 15, 2025

Anyone want to try the latest nightly? It includes #141063 (which will be included in the 2.6 release) and, to the best of my understanding, fixes the problem, though I have not tried reproducing it on Kaggle.

@atalman
Contributor

atalman commented Jan 22, 2025

Hi @FurkanGozukara, can you please confirm this is fixed, using the following install command? This should install the torch 2.6 release candidate:

pip3 install torch numpy --index-url https://download.pytorch.org/whl/test/cu124

@bilzard
Contributor

bilzard commented Jan 29, 2025

I checked that the current nightly version fixes the issue.
I also found a workaround for older versions.

Check on Nightly Version --> worked

Anyone wants to try latest nightly, which includes #141063 (will be included in 2.6 release) that to the best of my understanding fixes the problem, though I have not tired reproducing in on kaggle

Yes. It worked on 2.7.0.dev20250128+cu126.

Install nightly version of pytorch

! pip3 install -U --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126

import pytorch --> worked

import torch

torch.__version__
/opt/conda/lib/python3.10/site-packages/torch/utils/_pytree.py:174: FutureWarning: optree is installed but the version is too old to support PyTorch Dynamo in C++ pytree. C++ pytree support is disabled. Please consider upgrading optree using `python3 -m pip install --upgrade 'optree>=0.13.0'`.
  warnings.warn(

'2.7.0.dev20250128+cu126'

Workaround for older version

Check NVCC version

!nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

Install pytorch that matches nvcc CUDA version

! pip install -U torch torchvision torchaudio --target=/kaggle/working --index-url https://download.pytorch.org/whl/cu121

Set path to libnvJitLink.so in LD_PRELOAD environment variable

import os

ld_preload_path = !find /usr/local/cuda* -name "libnvJitLink.so*" | head -n 1
if ld_preload_path:
    os.environ["LD_PRELOAD"] = ld_preload_path[0]

print(os.environ.get("LD_PRELOAD"))

import pytorch --> works fine

import torch

torch.__version__
'2.5.1+cu121'

@FurkanGozukara
Author

@bilzard thank you so much

I am glad this is getting fixed

@atalman
Contributor

atalman commented Jan 31, 2025

Could someone please confirm whether it works with the latest release, 2.6:

pip3 install torch numpy

Then we can close this issue.

@bilzard
Contributor

bilzard commented Feb 3, 2025

@atalman I checked that the latest release, torch==2.6.0+cu124, fixes the issue.

requirements.txt:

torch
torchvision
torchaudio

! pip install -U -r requirements.txt --target=/kaggle/working

import torch

torch.__version__
'2.6.0+cu124'

@atalman
Contributor

atalman commented Feb 3, 2025

Closing this issue. Resolved in nightly and in release 2.6.

@atalman atalman closed this as completed Feb 3, 2025