{bio}[foss/2020b,fosscuda/2020b] AlphaFold v2.1.2 w/ Python 3.8.6 #14905

Merged

ThomasHoffmann77
Contributor

…2-fosscuda-2020b.eb and patches: AlphaFold-2.1.2_data-dep-paths.patch

@lexming lexming added the update label Feb 2, 2022
@lexming lexming added this to the 4.x milestone Feb 2, 2022
@ThomasHoffmann77 ThomasHoffmann77 changed the title adding easyconfigs: AlphaFold-2.1.2-fosscuda-2020b.eb, AlphaFold-2.1.… {bio}[foss/2020b,fosscuda/2020b] AlphaFold v2.1.2 w/ Python 3.8.6 Feb 3, 2022
@boegel
Member

boegel commented Feb 12, 2022

Test report by @boegel
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in total)
node3308.joltik.os - Linux CentOS Linux 7.9.2009, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (cascadelake), 1 x NVIDIA Tesla V100-SXM2-32GB, 510.47.03, Python 3.6.8
See https://gist.github.com/39a7b7e567e64e4fe70b1deab107c2cb for a full test report.

@boegel
Member

boegel commented Feb 12, 2022

Test for fosscuda easyconfig fails on our V100 system with:

FAIL: test_end_to_end_no_relax (__main__.RunAlphafoldTest)
RunAlphafoldTest.test_end_to_end_no_relax
test_end_to_end_no_relax(False)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/user/gent/400/vsc40023/eb_scratch/CO7/cascadelake-volta-ib/software/jax/0.2.19-fosscuda-2020b/lib/python3.8/site-packages/absl/testing/parameterized.py", line 316, in bound_param_test
    return test_method(self, *testcase_params)
  File "/tmp/vsc40023/easybuild_build/AlphaFold/2.1.2/fosscuda-2020b-TensorFlow-2.5.0/alphafold-2.1.2/run_alphafold_test.py", line 91, in test_end_to_end
    self.assertCountEqual(expected_files, target_output_files)
AssertionError: Element counts were not equal:
First has 0, Second has 1:  'relaxed_model1.pdb'

Same problem happened with AlphaFold 2.1.1 in #14500 on our A100 system, so I guess I should try and figure this out...

@boegel
Member

boegel commented Feb 12, 2022

Test report by @boegel
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in total)
node3308.joltik.os - Linux CentOS Linux 7.9.2009, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (cascadelake), 1 x NVIDIA Tesla V100-SXM2-32GB, 510.47.03, Python 3.6.8
See https://gist.github.com/a3a6ab784daf95241e7f00ffac075cc6 for a full test report.

@boegel
Member

boegel commented Feb 12, 2022

Test report by @boegel
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in total)
node3906.accelgor.os - Linux RHEL 8.4, x86_64, AMD EPYC 7413 24-Core Processor (zen3), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 510.39.01, Python 3.6.8
See https://gist.github.com/c2a2629e48b50a1d5b4782d65e2e44a7 for a full test report.

@branfosj
Member

Test for fosscuda easyconfig fails on our V100 system with:

FAIL: test_end_to_end_no_relax (__main__.RunAlphafoldTest)
RunAlphafoldTest.test_end_to_end_no_relax
test_end_to_end_no_relax(False)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/user/gent/400/vsc40023/eb_scratch/CO7/cascadelake-volta-ib/software/jax/0.2.19-fosscuda-2020b/lib/python3.8/site-packages/absl/testing/parameterized.py", line 316, in bound_param_test
    return test_method(self, *testcase_params)
  File "/tmp/vsc40023/easybuild_build/AlphaFold/2.1.2/fosscuda-2020b-TensorFlow-2.5.0/alphafold-2.1.2/run_alphafold_test.py", line 91, in test_end_to_end
    self.assertCountEqual(expected_files, target_output_files)
AssertionError: Element counts were not equal:
First has 0, Second has 1:  'relaxed_model1.pdb'

Same problem happened with AlphaFold 2.1.1 in #14500 on our A100 system, so I guess I should try and figure this out...

@boegel, on your build systems, does $TMPDIR/absl_testing exist before the build runs? If so, does it contain any files?
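A quick way to check this, as a rough sketch: the helper below lists whatever is already present in the absl_testing directory under the system temporary directory. The function name `list_stale_test_files` is made up for illustration, and the location assumes absl's default of `<tempdir>/absl_testing`, with no `TEST_TMPDIR` or `--test_tmpdir` override in effect.

```python
import os
import tempfile


def list_stale_test_files(tmp_root=None):
    """Return sorted relative paths of files already under <tmp_root>/absl_testing.

    Hypothetical helper for illustration; assumes absl's default test tmpdir layout.
    """
    # tempfile.gettempdir() honours $TMPDIR, which is what the build environment sets.
    root = os.path.join(tmp_root or tempfile.gettempdir(), "absl_testing")
    if not os.path.isdir(root):
        return []
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


if __name__ == "__main__":
    # An empty list means there are no leftovers from an earlier run.
    print(list_stale_test_files())
```

Running this right before the build starts should show whether an earlier build left files behind.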

Two tests are run: no_relax (/unrelaxed) and relax (/relaxed). The extra file is created by the relax test, which runs second, so it should not exist yet while the no_relax test is running. However, nothing indicates that the test directory is cleared between tests. So if your system does not clean up the temporary directory between builds, the files from an earlier build will still be there.

You can test this with the following script, which uses absl-py (saved as tmp.py):

import os
from absl.testing import absltest

class RunAlphafoldTest(absltest.TestCase):
  def test_me(self):
    # Default absl test directory; it is not recreated for each run.
    out_dir = absltest.get_default_test_tmpdir()
    # Anything listed here was left behind by an earlier run.
    print('before write', os.listdir(out_dir))
    with open(os.path.join(out_dir, 'file'), 'w') as f:
      f.write('hello world')
    print('after write', os.listdir(out_dir))
    self.assertEqual(1, 1)  # trivial assertion so the test passes

if __name__ == '__main__':
  absltest.main()

Then run this twice:

$ python tmp.py
Running tests under Python 3.9.7: /home/simon/downloads/venv/bin/python
[ RUN      ] RunAlphafoldTest.test_me
before write []
after write ['file']
[       OK ] RunAlphafoldTest.test_me
----------------------------------------------------------------------
Ran 1 test in 0.001s

OK
$ python tmp.py
Running tests under Python 3.9.7: /home/simon/downloads/venv/bin/python
[ RUN      ] RunAlphafoldTest.test_me
before write ['file']
after write ['file']
[       OK ] RunAlphafoldTest.test_me
----------------------------------------------------------------------
Ran 1 test in 0.002s

OK

Note that 'file' already exists in the second run, before the write.
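One way to avoid this kind of leftover state, sketched here with only the standard library: give every test its own freshly created directory and remove it afterwards. This only illustrates the idea and is not necessarily what the patch in #14989 does; the class name, prefix, and file name are placeholders.

```python
import os
import shutil
import tempfile
import unittest


class IsolatedOutputTest(unittest.TestCase):
    def setUp(self):
        # A brand-new directory per test, so nothing from earlier runs is visible.
        self.out_dir = tempfile.mkdtemp(prefix="isolated_test_")
        # Registered cleanups run even when the test fails.
        self.addCleanup(shutil.rmtree, self.out_dir, ignore_errors=True)

    def test_starts_empty(self):
        # The directory is empty on every run, unlike the shared default directory.
        self.assertEqual([], os.listdir(self.out_dir))
        with open(os.path.join(self.out_dir, "file"), "w") as f:
            f.write("hello world")
        self.assertCountEqual(["file"], os.listdir(self.out_dir))
```

Saved to a file, this can be run repeatedly with `python -m unittest <file>`, and the directory should be empty at the start of every run.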

@branfosj
Member

Oh, now I understand why @boegel is seeing the issue and I am not. When I test-build AlphaFold, I build the GPU and CPU versions separately, on different systems. Kenneth always builds both in one go. He sees the issue because the temp directory is created during the first build (CPU) and is still there for the second (GPU).

@branfosj
Member

I've filed google-deepmind/alphafold#365, and I've opened #14989 to add the patch to the existing 2.1.x AlphaFold easyconfig.

@boegel
Member

boegel commented Feb 16, 2022

I think the next step is to also use the patch from #14989 in this PR.
@branfosj Can you confirm?

@ThomasHoffmann77 Are you up for making that extra change?

Member

@branfosj branfosj left a comment

@boegel Yes

@ThomasHoffmann77 I've suggested changes that add the relevant patch to both easyconfigs.

@boegel
Member

boegel commented Feb 17, 2022

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node3901.accelgor.os - Linux RHEL 8.4, x86_64, AMD EPYC 7413 24-Core Processor (zen3), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 510.39.01, Python 3.6.8
See https://gist.github.com/90b71ea324f169c8cbb5d835ae2f402b for a full test report.

@boegel
Member

boegel commented Feb 17, 2022

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node3300.joltik.os - Linux CentOS Linux 7.9.2009, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (cascadelake), 1 x NVIDIA Tesla V100-SXM2-32GB, 510.47.03, Python 3.6.8
See https://gist.github.com/ec9e399cde3f84f5ac0108e74e29e927 for a full test report.

minor style tweaks + set missing environment variables for AlphaFold 2.1.2 easyconfigs using foss(cuda)/2020b toolchain
@boegel
Member

boegel commented Feb 17, 2022

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node3901.accelgor.os - Linux RHEL 8.4, x86_64, AMD EPYC 7413 24-Core Processor (zen3), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 510.39.01, Python 3.6.8
See https://gist.github.com/c313ba8461003dfb124d33d7966e6168 for a full test report.

@boegel
Member

boegel commented Feb 17, 2022

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node3300.joltik.os - Linux CentOS Linux 7.9.2009, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (cascadelake), 1 x NVIDIA Tesla V100-SXM2-32GB, 510.47.03, Python 3.6.8
See https://gist.github.com/3149fc5e760a443de71fef360ef7dbf2 for a full test report.

@boegel
Member

boegel commented Feb 17, 2022

@boegelbot please test @ generoso
CORE_CNT=16
EB_ARGS="AlphaFold-2.1.2-foss-2020b-TensorFlow-2.5.0.eb"

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on login1

PR test command 'EB_PR=14905 EB_ARGS="AlphaFold-2.1.2-foss-2020b-TensorFlow-2.5.0.eb" /opt/software/slurm/bin/sbatch --job-name test_PR_14905 --ntasks="16" ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 8146

Test results coming soon (I hope)...

- notification for comment with ID 1043295718 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
cnx1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz (haswell), Python 3.6.8
See https://gist.github.com/5db7be3e5eb402d3af3dfad730d58d5b for a full test report.

@boegel
Member

boegel commented Feb 17, 2022

Going in, thanks @ThomasHoffmann77!

@boegel boegel merged commit bc4db87 into easybuilders:develop Feb 17, 2022