Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update mpox tree builds to current snakemake workflow #1706

Merged
merged 13 commits into from
Sep 5, 2024

Conversation

vincent-czi
Copy link
Contributor

Summary:

Updates our mpox workflow kick-off process to be compatible with the new changes present in nextstrain/mpox snakemake workflow and our fork of it.

Notes:

  • We have re-forked the nextstrain/mpox workflow. Previously, we had been using the fork chanzuckerberg/monkeypox with the branch subsampling. That was forked in 2022. We are now using chanzuckerberg/mpox and the branch subsample_by_distance.
    • We had been having issues with our mpox tree builds coming out looking very strange. This appears to be fixed by using the current version of nextstrain/mpox.
    • On top of that current version -- what our new fork was based off -- Dan brought in the subsampling work that enables the targeted tree builds CZGE offers.
  • All the work present in this PR is about getting the build_czge.yaml config to conform to the required config structure for the above mpox workflow and to shuffle around files so they're where it expects them to be.
    • Originally, the snakemake happened from /mpox inside the workflow repo. Now, running the tree build snakemake happens from inside /mpox/phylogenetic. This one change accounts for most of the shuffling around that had to occur.

@vincent-czi vincent-czi requested a review from danrlu September 5, 2024 00:02
@@ -176,7 +179,11 @@ def dump_yaml_template(
session, phylo_run.pathogen, phylo_run.template_args, group
)
builder: TemplateBuilder = TemplateBuilder(
phylo_run.tree_type, phylo_run.group, resolved_template_args, **context
phylo_run.tree_type,
phylo_run.pathogen,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a straight bugfix. The TemplateBuilder init args expect: tree_type, pathogen, group, template_args, **kwargs. Somehow we've been missing passing the pathogen to the TemplateBuilder. I'd think that would have pretty major ramifications though, so I'm surprised this didn't get noticed sooner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, I got myself a bit confused. This is still a straight bugfix, but it's fixing a seldom used develpment function dump_yaml_template that we can use for only generating the build config yaml. We hadn't used it for awhile and it never got updated while the TemplateBuilder usage where it mattered (later in this same file) did get updated. I forgot that context when I wrote the above comment. But that's why there were no major ramifications and it didn't get noticed: it almost never gets used, and I just wound up using it for this work since a lot of my work was around fixing up the yaml config.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dump_yaml_template was intended for use during development. nice catch!


# run snakemake, if run fails export the logs from snakemake to s3
(cd /mpox && snakemake --printshellcmds --configfile config/build_czge.yaml --resources=mem_mb=312320) || { $aws s3 cp /mpox/.snakemake/log/ "${s3_prefix}/logs/snakemake/" --recursive ; $aws s3 cp /mpox/results/mpxv/filter.log "${s3_prefix}/logs/mpox/" --recursive ; }
(cd /mpox/phylogenetic && snakemake --printshellcmds --configfile build_czge.yaml --resources=mem_mb=312320) || { $aws s3 cp /mpox/phylogenetic/.snakemake/log/ "${s3_prefix}/logs/snakemake/" --recursive ; $aws s3 cp /mpox/phylogenetic/results/aspen/logs/ "${s3_prefix}/logs/mpox/" --recursive ; }
Copy link
Collaborator

@danrlu danrlu Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the log location changed with the subsampling update so I changed it accordingly

Copy link
Collaborator

@danrlu danrlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good! Thanks very much for the hard work!

@vincent-czi vincent-czi merged commit 9caac01 into trunk Sep 5, 2024
13 checks passed
@vincent-czi vincent-czi deleted the vince/update-mpox-tree-builds branch September 5, 2024 20:50
thanhleviet pushed a commit to theiagenghi/theiagenepi that referenced this pull request Sep 12, 2024
…#1706)

* Fix yaml dump only to have expected args

* WIP: Alter base MPX template for new mpox flow

* WIP: Address notes from talking with Dan

* WIP: Adjust `max_sequences` to match old MPX template

* WIP: Keep `builds` in MPX output template

* Remove no longer needed comments

* Convert to being compatible with new mpox build

* Modify paths to use latest mpox workflow format

* Lint roller

* Update run_nextstrain_mpx.sh

* Fix test after mpox config format changes

* Remove arguments in filter differentiation

* Lint roller. Again.

---------

Co-authored-by: Dan Lu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants