{lib}[GCCcore/12.2.0,foss/2022b] PyTorch v1.13.1, cuDNN v8.5.0.96, magma v2.7.1, ... w/ CUDA 11.7.0 #18853
….5.0.96-CUDA-11.7.0.eb, magma-2.7.1-foss-2022b-CUDA-11.7.0.eb, NCCL-2.16.2-GCCcore-12.2.0-CUDA-11.7.0.eb, UCX-CUDA-1.13.1-GCCcore-12.2.0-CUDA-11.7.0.eb and patches: PyTorch-1.13.1_disable-test-sharding.patch, PyTorch-1.13.1_fix-duplicate-kDefaultTimeout-definition.patch, PyTorch-1.13.1_fix-flaky-jit-test.patch, PyTorch-1.13.1_fix-fsdp-fp16-test.patch, PyTorch-1.13.1_fix-fsdp-tp-integration-test.patch, PyTorch-1.13.1_fix-gcc-12-missing-includes.patch, PyTorch-1.13.1_fix-gcc-12-warning-in-fbgemm.patch, PyTorch-1.13.1_fix-kineto-crash-on-exit.patch, PyTorch-1.13.1_fix-numpy-deprecations.patch, PyTorch-1.13.1_fix-protobuf-dependency.patch, PyTorch-1.13.1_fix-pytest-args.patch, PyTorch-1.13.1_fix-python-3.11-compat.patch, PyTorch-1.13.1_fix-test-ops-conf.patch, PyTorch-1.13.1_fix-warning-in-test-cpp-api.patch, PyTorch-1.13.1_fix-wrong-check-in-fsdp-tests.patch, PyTorch-1.13.1_increase-tolerance-test_jit.patch, PyTorch-1.13.1_increase-tolerance-test_ops.patch, PyTorch-1.13.1_increase-tolerance-test_optim.patch, PyTorch-1.13.1_install-vsx-vec-headers.patch, PyTorch-1.13.1_no-cuda-stubs-rpath.patch, PyTorch-1.13.1_remove-flaky-test-in-testnn.patch, PyTorch-1.13.1_skip-failing-grad-test.patch, PyTorch-1.13.1_skip-failing-singular-grad-test.patch, PyTorch-1.13.1_skip-test-requiring-online-access.patch, PyTorch-1.13.1_skip-tests-without-fbgemm.patch
Test report by @Flamefire
Test report by @Flamefire
Test report by @Flamefire
Test report by @Flamefire
Test report by @Flamefire
Test report by @Flamefire
Test report by @Flamefire
Test report by @Flamefire
Test report by @Flamefire
@Flamefire Should this be good to go now, or are you still figuring out the test failures on your end?
Test report by @Flamefire
@boegel Good to go now. The build failed because 3 tests failed with only 2 allowed, but the issue seems to be driver-related, with a rare enough use case that I just skipped those 2 tests with a patch. There seems to have been an intermittent issue with our cluster, so the test report above can be ignored. I'm trying again, but that will take some time. However there is a
Test report by @Flamefire
Test report by @akesandgren
LGTM
…asyconfigs into 20230922150734_new_pr_PyTorch1131
Going in, thanks @Flamefire!
(created using eb --new-pr)
Requires a compiler patch or PyTorch will fail to be compiled by NVCC: GCCcore/12.2.0 and GCCcore/12.3.0 #18854
Requires a pybind11 fix for JIT compilation: