{ai}[foss/2022b] PyTorch v2.0.1 #19067

Merged

Conversation

Flamefire
Contributor

@Flamefire Flamefire commented Oct 24, 2023

…2.0.1_add-missing-vsx-vector-shift-functions.patch, PyTorch-2.0.1_avoid-test_quantization-failures.patch, PyTorch-2.0.1_disable-test-sharding.patch, PyTorch-2.0.1_fix-numpy-compat.patch, PyTorch-2.0.1_fix-shift-ops.patch, PyTorch-2.0.1_fix-skip-decorators.patch, PyTorch-2.0.1_fix-test_memory_profiler.patch, PyTorch-2.0.1_fix-test-ops-conf.patch, PyTorch-2.0.1_fix-torch.compile-on-ppc.patch, PyTorch-2.0.1_fix-ub-in-inductor-codegen.patch, PyTorch-2.0.1_fix-vsx-loadu.patch, PyTorch-2.0.1_no-cuda-stubs-rpath.patch, PyTorch-2.0.1_remove-test-requiring-online-access.patch, PyTorch-2.0.1_skip-diff-test-on-ppc.patch, PyTorch-2.0.1_skip-failing-gradtest.patch, PyTorch-2.0.1_skip-test_shuffle_reproducibility.patch, PyTorch-2.0.1_skip-tests-skipped-in-subprocess.patch
@Flamefire
Contributor Author

Test report by @Flamefire
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
taurusml7 - Linux RHEL 7.6, POWER, 8335-GTX, 6 x NVIDIA Tesla V100-SXM2-32GB, 440.64.00, Python 2.7.5
See https://gist.github.com/Flamefire/f0e7be8bf7b9299977dcfef226456953 for a full test report.

@boegel boegel added the update label Oct 25, 2023
@boegel boegel added this to the 4.x milestone Oct 25, 2023
@boegel
Member

boegel commented Oct 25, 2023

@boegelbot please test @ jsc-zen2
CORE_CNT=16

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster

PR test command 'EB_PR=19067 EB_ARGS= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --mem-per-cpu=4000M --job-name test_PR_19067 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen2.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 3641

Test results coming soon (I hope)...

- notification for comment with ID 1778706883 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen2c1.int.jsc-zen2.easybuild-test.cluster - Linux Rocky Linux 8.5, x86_64, AMD EPYC 7742 64-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/boegelbot/ff4ce95320c6d4ec60b9ce024c9232f2 for a full test report.

@branfosj
Member

Test report by @branfosj
FAILED
Build succeeded for 3 out of 4 (1 easyconfigs in total)
bear-pg0105u03b - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake), Python 3.6.8
See https://gist.github.com/branfosj/a374a39b83492ca0a94afab71dc54dfe for a full test report.

@SebastianAchilles
Member

Test report by @SebastianAchilles
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
zen2-rockylinux-88 - Linux Rocky Linux 8.8, x86_64, AMD EPYC 7452 32-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/SebastianAchilles/95ca63f9b91cc427f12629328044f7de for a full test report.

@boegel
Member

boegel commented Oct 26, 2023

Test report by @boegel
FAILED
Build succeeded for 5 out of 6 (1 easyconfigs in total)
node3100.skitty.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (skylake_avx512), Python 3.6.8
See https://gist.github.com/boegel/9130be6704b11e9034d9f739f8604aca for a full test report.

@Flamefire
Contributor Author

Flamefire commented Oct 26, 2023

@boegel @branfosj I see a pattern there (visible thanks to the updated EasyBlock):

test_consistency_SparseBSC_sgn_cpu_uint8
test_consistency_SparseBSC_sign_cpu_uint8
test_consistency_SparseBSR_sgn_cpu_uint8
test_consistency_SparseBSR_sign_cpu_uint8
test_consistency_SparseCSC_sgn_cpu_uint8
test_consistency_SparseCSC_sign_cpu_uint8
test_consistency_SparseCSR_sgn_cpu_uint8
test_consistency_SparseCSR_sign_cpu_uint8
test_contig_vs_every_other_sgn_cpu_uint8
test_contig_vs_every_other_sign_cpu_uint8
test_non_contig_sgn_cpu_uint8
test_non_contig_sign_cpu_uint8
test_reference_numerics_normal_sgn_cpu_uint8
test_reference_numerics_normal_sign_cpu_uint8
test_reference_numerics_small_sgn_cpu_uint8
test_reference_numerics_small_sign_cpu_uint8
test_sparse_consistency_sgn_cpu_uint8
test_sparse_consistency_sign_cpu_uint8

Looks like an issue with the sgn and sign functions for uint8 tensors, so very likely the same underlying bug. Not sure why this happens only for those two test reports. Any ideas what they have in common? Maybe only the ones with AVX512?

@branfosj
Member

The inductor/test_torchinductor_opinfo failures are identical to #19066 (comment)

The traceback is similar for all of them.

======================================================================
FAIL: test_sparse_consistency_sgn_cpu_uint8 (__main__.TestSparseUnaryUfuncsCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/dev/shm/branfosj/tmp-up-EL8/eb-o_7bs8k4/tmp3dgmc2zq/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 401, in instantiated_test
    result = test(self, **param_kwargs)
  File "/dev/shm/branfosj/tmp-up-EL8/eb-o_7bs8k4/tmp3dgmc2zq/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 851, in test_wrapper
    return test(*args, **kwargs)
  File "/dev/shm/branfosj/build-up-EL8/PyTorch/2.0.1/foss-2022b/pytorch-v2.0.1/test/test_sparse.py", line 3953, in test_sparse_consistency
    self.assertEqual(_sparse_to_dense(output), expected)
  File "/dev/shm/branfosj/tmp-up-EL8/eb-o_7bs8k4/tmp3dgmc2zq/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2979, in assertEqual
    raise error_metas[0].to_error(
AssertionError: Tensor-likes are not equal!

Mismatched elements: 39 / 400 (9.8%)
Greatest absolute difference: 254 at index (17, 2)
Greatest relative difference: 0.9960784316062927 at index (17, 2)

I'm rebuilding now with --skip-test-step and I'll look to see if I can find anything more useful.

@Flamefire
Contributor Author

I'd be interested in the output for test_non_contig_sgn_cpu_uint8 or test_non_contig_sign_cpu_uint8

My current analysis:

  • sgn/sign is supposed to return one of (-1,0,1) depending on the sign of the input
  • inputs of uint8_t can only be in the range [0,255], hence the only valid outputs are 0 and 1
  • a difference of 254 in the output is likely an overflow, i.e. either the reference or the result had an intermediate value of -1 where the other had 1 (uint8_t(-1) == 255, and 255 - 1 = 254)

Not sure how that can happen, though.
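
For illustration, a minimal sketch (plain PyTorch calls, not the suspect kernel itself) of the wrap-around described above: casting -1 to uint8 yields 255, and 255 - 1 = 254 is exactly the "greatest absolute difference" in the failing reports.

import torch

# uint8 cannot represent -1; the cast wraps it around to 255
wrapped = torch.tensor([-1]).to(torch.uint8)
# the correct sign of any positive uint8 input is 1
correct = torch.sign(torch.tensor([5], dtype=torch.uint8))

print(wrapped.item(), correct.item())   # 255 1
print(wrapped.item() - correct.item())  # 254, matching the reported difference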

@branfosj
Member

======================================================================
FAIL: test_non_contig_sgn_cpu_uint8 (__main__.TestUnaryUfuncsCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/PyTorch/2.0.1-foss-2022b/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 401, in instantiated_test
    result = test(self, **param_kwargs)
  File "/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/PyTorch/2.0.1-foss-2022b/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 851, in test_wrapper
    return test(*args, **kwargs)
  File "/rds/projects/2017/branfosj-rse/easybuild/src/easybuild-easyconfigs/easybuild/easyconfigs/tmp/pytorch-v2.0.1/test/test_unary_ufuncs.py", line 366, in test_non_contig
    self.assertEqual(op_contig, op_non_contig)
  File "/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/PyTorch/2.0.1-foss-2022b/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2979, in assertEqual
    raise error_metas[0].to_error(
AssertionError: Tensor-likes are not equal!

Mismatched elements: 952 / 1024 (93.0%)
Greatest absolute difference: 254 at index (0,)
Greatest relative difference: 254.0 at index (0,)

In test_non_contig, it fails when shape = (1024,). I tested further and it fails with sizes >= 64 and passes when the tensor is smaller than that.

Output of op(contig, **torch_kwargs) and op(non_contig, **torch_kwargs) for shape = (64,), where 61 / 64 (95.3%) elements are mismatched:

tensor([255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,   0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,   0, 255, 255, 255, 255, 255, 255,   0, 255, 255, 255, 255, 255, 255, 255, 255, 255], dtype=torch.uint8)
tensor([  1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   0,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   0,   1,   1,   1,   1,   1,   1,   0,   1,   1,   1,   1,   1,   1,   1,   1,   1], dtype=torch.uint8)

So the 0s are in the same places and are the matched elements; the mismatched ones are all 255 vs 1.
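
A rough, hedged reproduction sketch of what this comparison does (the real test is test_non_contig in test/test_unary_ufuncs.py; this just approximates its setup with an every-other-element view):

import torch

# contiguous uint8 input and a non-contiguous view holding the same values
contig = torch.randint(0, 256, (64,), dtype=torch.uint8)
non_contig = torch.empty(128, dtype=torch.uint8)[::2]  # stride-2 view, non-contiguous
non_contig.copy_(contig)

# on a correct build both results agree (0 where the input is 0, 1 elsewhere);
# the failing reports show 255 instead of 1 on one of the two paths
print(torch.equal(torch.sign(contig), torch.sign(non_contig)))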

@Flamefire
Contributor Author

In test_non_contig, it fails when shape = (1024,). I tested further and it fails with sizes >= 64 and passes when the tensor is smaller than that.

So your system is an AVX512 system? (64 * 8 bit = 512 bit), so it has a bug in the AVX512 vectorized code.

I'll see if I can find that. Strange, though, that it passes those tests for 2022a.

I opened draft PRs for 2.1.0 where I just ported the patches from here and removed those that are no longer required. Those are not tested yet, but maybe you can give them a try, just to see if it compiles and at least succeeds on those tests.

@branfosj
Member

So your system is an AVX512 system?

Yes

processor       : 71
vendor_id       : GenuineIntel
cpu family      : 6
model           : 106
model name      : Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz
stepping        : 6
microcode       : 0xd0003a5
cpu MHz         : 3500.000
cache size      : 55296 KB
physical id     : 1
siblings        : 36
core id         : 35
cpu cores       : 36
apicid          : 198
initial apicid  : 198
fpu             : yes
fpu_exception   : yes
cpuid level     : 27
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect wbnoinvd dtherm ida arat pln pts avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid fsrm md_clear pconfig flush_l1d arch_capabilities
bugs            : spectre_v1 spectre_v2 spec_store_bypass swapgs mmio_stale_data
bogomips        : 4818.77
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 57 bits virtual
power management:
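
For anyone wondering whether their node is in the affected class, a small hedged helper that just greps /proc/cpuinfo for the base AVX-512 flag (Linux only, the same information as the dump above):

import re

def node_has_avx512():
    # True if /proc/cpuinfo advertises the AVX-512 Foundation flag (avx512f)
    with open("/proc/cpuinfo") as cpuinfo:
        return any(line.startswith("flags") and re.search(r"\bavx512f\b", line)
                   for line in cpuinfo)

print(node_has_avx512())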

@Flamefire
Contributor Author

@branfosj or @boegel I'd like to report that to PyTorch as this is likely a bug there. Or in our 2022b toolchain (which I hope not). Does your GCCcore/12.2.0 installation include the GCCcore-12.2.0_fix-vectorizer.patch?

If so, and you can spare the time, I'd like you to try the following (on such an AVX512 node):

  • Switch to this build env: eb --dump-env PyTorch-2.0.1-foss-2022b.eb && source PyTorch-2.0.1-foss-2022b.env
  • Create a virtualenv: python -m venv --system-site-packages test-env && source test-env/bin/activate
  • Install the official PyTorch package (pip install torch==2.0.1; later the same but with pip install torch==2.1.0)
  • (Shallow) clone PyTorch: git clone --depth 1 --branch v2.0.1 https://github.com/pytorch/pytorch.git
  • Run the single failing test: cd test && python test_unary_ufuncs.py -k test_non_contig_sign_cpu_uint8

I hope this reproduces with the pip-installed 2.0.1 (sanity check for us) and expect it to also fail on 2.1.0 (a git checkout of the latter is not required, only pip).

After this we can blame it on PyTorch and just skip the tests, leaving the bug in the installed module as if it were pip-installed, if that is acceptable. At least I couldn't find anything wrong, so it might be a compiler bug, given that it works for 2022a.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
taurusi8018 - Linux CentOS Linux 7.9.2009, x86_64, AMD EPYC 7352 24-Core Processor, 8 x NVIDIA NVIDIA A100-SXM4-40GB, 470.57.02, Python 2.7.5
See https://gist.github.com/Flamefire/d8b71cd29d1b3a3fc47cfa9c481edd7a for a full test report.

@VRehnberg
Contributor

VRehnberg commented Nov 1, 2023

Test report by @VRehnberg
FAILED
Build succeeded for 7 out of 8 (1 easyconfigs in total)
alvis-c1 - Linux Rocky Linux 8.8, x86_64, Intel Xeon Processor (Skylake), Python 3.6.8
See https://gist.github.com/VRehnberg/f67612fbe10aafa8c2c639a72b926d28 for a full test report.

Edit: Full log attached
easybuild-PyTorch-2.0.1-20231101.082725.bppLR.log.gz

@VRehnberg
Contributor

VRehnberg commented Nov 1, 2023

@Flamefire

@branfosj or @boegel I'd like to report that to PyTorch as this is likely a bug there. Or in our 2022b toolchain (which I hope not). Does your GCCcore/12.2.0 installation include the GCCcore-12.2.0_fix-vectorizer.patch?

Yes.

* Run the single failing test: `cd test && python test_unary_ufuncs.py -k test_non_contig_sign_cpu_uint8`

I hope this reproduces with the pip-installed 2.0.1 (sanity check for us) and expect it to also fail on 2.1.0 (a git checkout of the latter is not required, only pip)

For 2.0.1 (and 2.1.0 on the SkyLake machine; I didn't try on IceLake) I'm not seeing any errors for this test after trying on two different machines with the following avx512 flags listed by lscpu (I don't know which ones matter here or what all the differences mean):

  • IceLake machine:
avx512f
avx512dq
avx512ifma
avx512cd
avx512bw
avx512vl
avx512vbmi
avx512_vbmi2
avx512_vnni
avx512_bitalg
avx512_vpopcntdq
  • SkyLake machine:
avx512f
avx512dq
avx512cd
avx512bw
avx512vl

@Flamefire
Contributor Author

I reviewed the relevant code and the AVX512 intrinsics it uses and haven't found anything wrong with them. Also, python -c 'import torch; print(torch.__config__.show())' for the official PyTorch 2.0.1 package shows it is built with GCC 9.3.

Given that this EC and test work with foss/2022a, I'd suspect this is a bug in GCC 12 -.-

Short of disabling AVX512 by patching it out, I don't know what to do...
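
One lighter-weight alternative that comes to mind (a hedged sketch only, not something tested in this PR): ATen honours the ATEN_CPU_CAPABILITY environment variable, so the suspect AVX512 kernels could be avoided at run time by capping the dispatch level, e.g.:

import os

# set before importing torch so the CPU dispatch picks it up
os.environ["ATEN_CPU_CAPABILITY"] = "avx2"

import torch

# the config dump includes a "CPU capability usage" line showing what dispatch picked
print(torch.__config__.show())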

@VRehnberg
Contributor

VRehnberg commented Nov 1, 2023

Test report by @VRehnberg
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
alvis-s1 - Linux Rocky Linux 8.8, x86_64, Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz, Python 3.6.8
See https://gist.github.com/VRehnberg/c0dcdfe963e1a21326261e7569f1df1a for a full test report.

Edit: Full log attached
easybuild-PyTorch-2.0.1-20231101.082255.BGlPk.log.gz

@VRehnberg
Contributor

Test report by @VRehnberg
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
alvis-c1 - Linux Rocky Linux 8.8, x86_64, Intel Xeon Processor (Skylake), Python 3.6.8
See https://gist.github.com/VRehnberg/adfca3a103ccbd73fd27d86b5c787b4a for a full test report.

@VRehnberg
Contributor

Test report by @VRehnberg
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
alvis-s1 - Linux Rocky Linux 8.8, x86_64, Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz, Python 3.6.8
See https://gist.github.com/VRehnberg/788e060921d3e1efc064878af92fdb08 for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
taurusml7 - Linux RHEL 7.6, POWER, 8335-GTX, 6 x NVIDIA Tesla V100-SXM2-32GB, 440.64.00, Python 2.7.5
See https://gist.github.com/Flamefire/33b6f5810cfa0038962e2962f1d9e38f for a full test report.

@sassy-crick
Collaborator

I am currently trying to do a test build, but I noticed these modules are also dragged in:

GCCcore-10.2.0.eb
zlib-1.2.11-GCCcore-10.2.0.eb
help2man-1.47.16-GCCcore-10.2.0.eb
M4-1.4.18-GCCcore-10.2.0.eb
Bison-3.7.1-GCCcore-10.2.0.eb
flex-2.6.4-GCCcore-10.2.0.eb
binutils-2.35-GCCcore-10.2.0.eb
ncurses-6.2-GCCcore-10.2.0.eb
bzip2-1.0.8-GCCcore-10.2.0.eb
cURL-7.72.0-GCCcore-10.2.0.eb
XZ-5.2.5-GCCcore-10.2.0.eb
libarchive-3.4.3-GCCcore-10.2.0.eb
CMake-3.18.4-GCCcore-10.2.0.eb

Most of them are build dependencies, so OK I guess, but bzip2, for example, clearly is not. Are we shooting ourselves in the foot here?

@Flamefire
Contributor Author

Most of them are build dependencies, so OK I guess, but bzip2, for example, clearly is not. Are we shooting ourselves in the foot here?

I don't understand why we would be shooting ourselves in the foot by loading bzip2. Can you explain what issue you see with that?

FWIW: this dependency is pulled in by Python, which requires it for the bz2 package in the Python stdlib, according to the comment in that EC. So it looks entirely reasonable to me.
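
As a quick illustration of that runtime dependency: the stdlib bz2 module is a thin wrapper around libbz2, so it only imports (and works) when the bzip2 library is present at run time, e.g.:

import bz2

# round-trip through the stdlib bz2 module, which links against libbz2
data = bz2.compress(b"easybuild")
assert bz2.decompress(data) == b"easybuild"
print("bz2 (and therefore bzip2) is available")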

@Flamefire Flamefire force-pushed the 20231024101128_new_pr_PyTorch201 branch from 667d245 to 87d9d70 Compare November 9, 2023 09:56
@Flamefire
Contributor Author

@branfosj @boegel The compiler bug leading to the sign/sgn failures should be fixed by reinstalling with #19180, which should make this pass.

@branfosj
Member

branfosj commented Nov 9, 2023

Test report by @branfosj
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
bear-pg0105u03b - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake), Python 3.6.8
See https://gist.github.com/branfosj/8a762b78b462a396159926f67a94b345 for a full test report.

@Flamefire
Contributor Author

@branfosj Is this the same node as before? Suddenly different tests are failing :-(

@Flamefire
Contributor Author

Test report by @Flamefire
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
n1356 - Linux RHEL 8.7 (Ootpa), x86_64, Intel(R) Xeon(R) Platinum 8470 (icelake), Python 3.8.13
See https://gist.github.com/Flamefire/c1240667cbfd4de08c73568c7b98b655 for a full test report.

@boegelbot
Collaborator

@Flamefire: Tests failed in GitHub Actions, see https://github.com/easybuilders/easybuild-easyconfigs/actions/runs/7088418368
Output from first failing test suite run:

FAIL: test__parse_easyconfig_PyTorch-2.1.0-foss-2022a.eb (test.easyconfigs.easyconfigs.EasyConfigTest)
Test for easyconfig PyTorch-2.1.0-foss-2022a.eb
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/easybuild-easyconfigs/easybuild-easyconfigs/test/easyconfigs/easyconfigs.py", line 1609, in innertest
    template_easyconfig_test(self, spec_path)
  File "/home/runner/work/easybuild-easyconfigs/easybuild-easyconfigs/test/easyconfigs/easyconfigs.py", line 1460, in template_easyconfig_test
    self.assertTrue(os.path.isfile(patch_full), msg)
AssertionError: False is not true : Patch file /home/runner/work/easybuild-easyconfigs/easybuild-easyconfigs/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.0_fix-vsx-vector-shift-functions.patch is available for PyTorch-2.1.0-foss-2022a.eb

======================================================================
FAIL: test_dep_versions_per_toolchain_generation (test.easyconfigs.easyconfigs.EasyConfigTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/easybuild-easyconfigs/easybuild-easyconfigs/test/easyconfigs/easyconfigs.py", line 898, in test_dep_versions_per_toolchain_generation
    self.assertFalse(multi_dep_vars, error_msg)
AssertionError: ['Z3'] is not false : No multi-variant deps found for '^.*-(?P<tc_gen>20(1[89]|[2-9][0-9])[ab]).*\.eb$' easyconfigs:

found 2 variants of 'Z3' dependency in easyconfigs using '2022a' toolchain generation
* version: 4.10.2; versionsuffix:  as dep for {'vaeda-0.0.30-foss-2022a.eb', 'leidenalg-0.9.1-foss-2022a.eb', 'MrBayes-3.2.7-gompi-2022a.eb', 'CellOracle-0.12.0-foss-2022a.eb', 'scib-1.1.1-foss-2022a.eb', 'TRIQS-tprf-3.1.1-foss-2022a.eb', 'infercnvpy-0.4.2-foss-2022a.eb', 'Omnipose-0.4.4-foss-2022a-CUDA-11.7.0.eb', 'scanpy-1.9.1-foss-2022a.eb', 'python-igraph-0.10.3-foss-2022a.eb', 'kb-python-0.27.3-foss-2022a.eb', 'solo-1.3-foss-2022a.eb', 'Omnipose-0.4.4-foss-2022a.eb', 'TRIQS-dft_tools-3.1.0-foss-2022a.eb', 'TRIQS-cthyb-3.1.0-foss-2022a.eb', 'TRIQS-3.1.1-foss-2022a.eb', 'Giotto-Suite-3.0.1-foss-2022a-R-4.2.1.eb', 'epiScanpy-0.4.0-foss-2022a.eb'}
* version: 4.10.2; versionsuffix: -Python-3.10.4 as dep for {'PyTorch-2.1.0-foss-2022a.eb'}


----------------------------------------------------------------------
Ran 18624 tests in 662.425s

FAILED (failures=2)
ERROR: Not all tests were successful

bleep, bloop, I'm just a bot (boegelbot v20200716.01)
Please talk to my owner @boegel if you notice me acting stupid,
or submit a pull request to https://github.com/boegel/boegelbot to fix the problem.

@Flamefire Flamefire force-pushed the 20231024101128_new_pr_PyTorch201 branch from daa5968 to f4b48c9 Compare December 8, 2023 12:06
@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
n1498 - Linux RHEL 8.7 (Ootpa), x86_64, Intel(R) Xeon(R) Platinum 8470 (icelake), Python 3.8.13
See https://gist.github.com/Flamefire/eb79ee2ee75b8e421ec830778c668b27 for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
i8026 - Linux Rocky Linux 8.7, x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 545.23.08, Python 3.6.8
See https://gist.github.com/Flamefire/5c4ccbbd67d10e2ce1687eacadb447ce for a full test report.

@branfosj
Member

@boegelbot please test @ generoso
CORE_CNT=16

@boegelbot
Collaborator

@branfosj: Request for testing this PR well received on login1

PR test command 'EB_PR=19067 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_19067 --ntasks="16" ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12421

Test results coming soon (I hope)...

- notification for comment with ID 1858766566 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@branfosj
Member

@boegelbot please test @ jsc-zen2
CORE_CNT=16

@boegelbot
Collaborator

@branfosj: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster

PR test command 'EB_PR=19067 EB_ARGS= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --mem-per-cpu=4000M --job-name test_PR_19067 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen2.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 3921

Test results coming soon (I hope)...

- notification for comment with ID 1858767086 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
n1162 - Linux RHEL 8.7 (Ootpa), x86_64, Intel(R) Xeon(R) Platinum 8470 (icelake), Python 3.8.13
See https://gist.github.com/Flamefire/3642d893a09f719ca27c036316df627c for a full test report.

@branfosj
Member

Test report by @branfosj
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
bear-pg0105u03b - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake), Python 3.6.8
See https://gist.github.com/branfosj/b5485dd316e62a25c0ccb8855f0a1e57 for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen2c1.int.jsc-zen2.easybuild-test.cluster - Linux Rocky Linux 8.5, x86_64, AMD EPYC 7742 64-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/boegelbot/4b84b7153410bd322d498404de830302 for a full test report.

@branfosj
Member

@boegelbot please test @ generoso
EB_ARGS="--sanity-check-only"

@boegelbot
Collaborator

@branfosj: Request for testing this PR well received on login1

PR test command 'EB_PR=19067 EB_ARGS="--sanity-check-only" EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_19067 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12424

Test results coming soon (I hope)...

- notification for comment with ID 1859087345 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
cns1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/5a2c3685596d56c289e426df8373e5fd for a full test report.

@branfosj branfosj modified the milestones: 4.x, next release (4.9.0?) Dec 17, 2023
@branfosj
Member

Going in, thanks @Flamefire!

@branfosj branfosj merged commit 73643ce into easybuilders:develop Dec 17, 2023
9 checks passed
@Flamefire Flamefire deleted the 20231024101128_new_pr_PyTorch201 branch December 17, 2023 10:07