latest develop version doesn't work on rtx 3080 #274

Closed
kiss81 opened this issue Mar 3, 2024 · 11 comments · Fixed by spacemeshos/post-rs#208 or #279
Labels: area/post, bug, regression

Comments

@kiss81

kiss81 commented Mar 3, 2024

I built the latest develop version of postcli. I can confirm it fixes the 100% CPU issue on NVIDIA. However, it only works on my 4070 Ti; my 3080 (non-Ti) crashes.
Xubuntu 23.10
NVIDIA driver 550.54.14

2024-03-03T14:34:38.640+0100 INFO selecting 1 provider from 2 available {"module": "scrypt_ocl", "file": "scrypt-ocl/src/lib.rs", "line": 354}
2024-03-03T14:34:38.640+0100 INFO Using provider: [GPU] NVIDIA CUDA/NVIDIA GeForce RTX 3080 {"module": "scrypt_ocl", "file": "scrypt-ocl/src/lib.rs", "line": 367}
2024-03-03T14:34:38.640+0100 INFO device memory: 9997 MB, max_mem_alloc_size: 2499 MB, max_compute_units: 68, max_wg_size: 1024 {"module": "scrypt_ocl", "file": "scrypt-ocl/src/lib.rs", "line": 153}
thread '<unnamed>' panicked at /home/gh-action-runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ocl-0.19.6/src/standard/pro_que.rs:480:17:
assertion failed: ctx.devices().contains(&device)
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
SIGABRT: abort
PC=0x775827699a1b m=12 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 1 gp=0xc0000061c0 m=12 mp=0xc000101008 [syscall]:
runtime.cgocall(0x5de978, 0xc00002c860)
/usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc00002c838 sp=0xc00002c800 pc=0x40712b

@kiss81
Author

kiss81 commented Mar 9, 2024

Update from multiple users on Discord:
It doesn't matter which GPU it is. The problem is simply that more than one GPU is installed. The first one works, but the second one gives an error.

@kiss81
Author

kiss81 commented Mar 9, 2024

Tried the latest postcli 12.1 on my second GPU. No luck either:

_ocl", "file": "scrypt-ocl/src/lib.rs", "line": 153}
thread '<unnamed>' panicked at /home/gh-action-runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ocl-0.19.6/src/standard/pro_que.rs:480:17:
assertion failed: ctx.devices().contains(&device)
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
Aborted (core dumped)

@tjb-altf4

I had to roll back to 0.11.0 to get a multi-GPU setup with multiple postcli instances working on one system.
In 0.12.x builds, postcli only seems to work with the default GPU.

@auerswaldg

auerswaldg commented Mar 10, 2024

I have the same issue trying to use two 1660 ts cards.
Either one works on its own with v12.1, but running the commands to start the second card on the same plot fails.

Error when trying to start second GPU:
./postcli -provider 1 -commitmentAtxId e6afbfdeb002c4bd3f632c1af60a883047121f174556efe877ca347bfb74f6d4 -id be643c305888c5e2fd76e6e11799 -labelsPerUnit 4294967296 -maxFileSize 2147483648 -numUnits 25 -datadir /media/postdata/ -fromFile 411
2024-03-08T10:13:33.698-0500 INFO initialization started {"datadir": "/media/postdata/", "numUnits": 25, "maxFileSize": 2147483648, "labelsPerUnit": 4294967296}
2024-03-08T10:13:33.699-0500 INFO initialization file layout {"labelsPerFile": 134217728, "labelsLastFile": 134217728, "firstFileIndex": 411, "lastFileIndex": 799}
2024-03-08T10:13:33.711-0500 INFO initialization: file already initialized {"fileIndex": 411, "currentNumLabels": 134217728, "targetNumLabels": 134217728, "startPosition": 55163486208}
.
2024-03-08T10:13:33.817-0500 INFO initialization: continuing to write file {"fileIndex": 484, "currentNumLabels": 32505856, "targetNumLabels": 134217728, "startPosition": 64961380352}
2024-03-08T10:13:33.849-0500 INFO selecting 1 provider from 2 available {"module": "scrypt_ocl", "file": "scrypt-ocl/src/lib.rs", "line": 354}
2024-03-08T10:13:33.849-0500 INFO Using provider: [GPU] NVIDIA CUDA/NVIDIA GeForce GTX 1660 SUPER {"module": "scrypt_ocl", "file": "scrypt-ocl/src/lib.rs", "line": 367}
2024-03-08T10:13:33.849-0500 INFO device memory: 5936 MB, max_mem_alloc_size: 1484 MB, max_compute_units: 22, max_wg_size: 1024 {"module": "scrypt_ocl", "file": "scrypt-ocl/src/lib.rs", "line": 153}
thread '<unnamed>' panicked at /home/gh-action-runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ocl-0.19.6/src/standard/pro_que.rs:480:17:
assertion failed: ctx.devices().contains(&device)
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
zsh: IOT instruction ./postcli -provider 1 -commitmentAtxId -id -labelsPerUnit 4294967296 25

@poszu
Collaborator

poszu commented Mar 12, 2024

Hi, @kiss81, @tjb-altf4, @auerswaldg, thanks for reporting this issue. I think I know what the issue is, but I don't have a setup with two GPUs from the same vendor to test it (so they appear under the same platform in OpenCL). I created a pre-release of the post library with a potential fix here: https://github.com/spacemeshos/post-rs/releases/tag/untagged-5271e5897f54e060ee10. I'd appreciate it if you could try it out. To check it, you need to take the post library (post.dll on Windows, libpost.so on Linux, libpost.dylib on Mac) for your OS from the release, take postcli v0.12.x, and replace the library.
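
For readers following the logs, here is a minimal toy model of how a check like `ctx.devices().contains(&device)` can fail when two GPUs share one OpenCL platform. It is a sketch under assumptions: the `Device` and `Context` types and the `for_platform_default`, `for_device`, and `can_build_queue` functions are hypothetical stand-ins invented for illustration, not the `ocl` crate's API and not the post-rs code. It also assumes the context ends up bound to only the platform's first device, which would match the reports that the first GPU works and the second one panics.

```rust
// Toy model only: hypothetical types, not the `ocl` crate's API or post-rs code.

#[derive(Debug, Clone, PartialEq, Eq)]
struct Device {
    platform: &'static str,
    name: &'static str,
}

/// A context only knows about the devices it was created with.
struct Context {
    devices: Vec<Device>,
}

impl Context {
    /// Assumed failing path: bind the context to the platform's first device only.
    fn for_platform_default(all: &[Device], platform: &str) -> Context {
        let devices = all
            .iter()
            .filter(|d| d.platform == platform)
            .take(1)
            .cloned()
            .collect();
        Context { devices }
    }

    /// Assumed safer path: bind the context to the device that was actually selected.
    fn for_device(device: &Device) -> Context {
        Context { devices: vec![device.clone()] }
    }
}

/// Mirrors the shape of the failing assertion: the selected device must belong to the context.
fn can_build_queue(ctx: &Context, device: &Device) -> bool {
    ctx.devices.contains(device)
}

fn main() {
    // Two GPUs from the same vendor, so both sit under the same platform.
    let gpus = [
        Device { platform: "NVIDIA CUDA", name: "GPU 0" },
        Device { platform: "NVIDIA CUDA", name: "GPU 1" },
    ];

    let ctx = Context::for_platform_default(&gpus, "NVIDIA CUDA");
    println!("provider 0 ok: {}", can_build_queue(&ctx, &gpus[0]));
    println!("provider 1 ok: {}", can_build_queue(&ctx, &gpus[1]));

    let own_ctx = Context::for_device(&gpus[1]);
    println!("provider 1 with its own context ok: {}", can_build_queue(&own_ctx, &gpus[1]));
}
```

In this model the second provider only passes the check once the context is created for the selected device. Whether the actual fix takes that approach is not stated in this thread.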

poszu self-assigned this Mar 12, 2024
poszu added the bug label Mar 12, 2024
poszu moved this to 🏗 Doing in Dev team kanban Mar 12, 2024

@kiss81
Author

kiss81 commented Mar 12, 2024

Awesome, will try it. Can you fix the link or make it public? It's a 404 right now :P
I tried 0.7.2 from https://github.com/spacemeshos/post-rs/releases/ but no luck.

@auerswaldg

I get the same 404.
I did find a 0.7.3-rc0 at https://github.com/spacemeshos/post-rs/tags
But that is just source, so I'm not sure how (or if) I could use it.

@tjb-altf4

Happy to test, @poszu, but as @kiss81 mentioned, the link is a 404.
If you happen to have a prebuilt Docker image with the required changes, that would simplify testing for me. No big deal if not.

I also tested the new release v0.12.2 on the off chance that it fixed the issue, but no luck.

@poszu
Collaborator

poszu commented Mar 13, 2024

Sorry about that, apparently draft releases are not public... Could you try this: https://github.com/spacemeshos/post-rs/releases/tag/v0.7.3-rc0

@kiss81
Author

kiss81 commented Mar 13, 2024

I can confirm the 0.7.3-rc0 post-rs lib solves the issue on Linux! (I used it together with postcli 0.12.2.)

@poszu
Collaborator

poszu commented Mar 13, 2024

Thanks for checking. I will merge and release a new postcli version with it.
