Update mpox tree builds to current snakemake workflow #1706
Merged
Changes from 10 commits
13 commits
222b6b5 Fix yaml dump only to have expected args (vincent-czi)
ffff0b8 WIP: Alter base MPX template for new mpox flow (vincent-czi)
b536f2d WIP: Address notes from talking with Dan (vincent-czi)
23c4994 WIP: Adjust `max_sequences` to match old MPX template (vincent-czi)
82cdca1 WIP: Keep `builds` in MPX output template (vincent-czi)
b2c5b9e Remove no longer needed comments (vincent-czi)
7ca2faf Convert to being compatible with new mpox build (vincent-czi)
5ddc542 Modify paths to use latest mpox workflow format (vincent-czi)
ac6bab2 Lint roller (vincent-czi)
5e9eb05 Update run_nextstrain_mpx.sh (danrlu)
3c09812 Fix test after mpox config format changes (vincent-czi)
466515a Remove arguments in filter differentiation (vincent-czi)
33e7e91 Lint roller. Again. (vincent-czi)
@@ -33,10 +33,10 @@ set -x
 
 # Download the latest mpox exclusions and clades list. This happens at RUN time, not BUILD time so that
 # we are always building trees with the latest upstream filters.
-wget https://raw.githubusercontent.com/nextstrain/mpox/master/phylogenetic/defaults/exclude_accessions.txt -O /mpox/config/exclude_accessions_mpxv.txt
-wget https://raw.githubusercontent.com/nextstrain/mpox/master/phylogenetic/defaults/clades.tsv -O /mpox/config/clades.tsv
+wget https://raw.githubusercontent.com/nextstrain/mpox/master/phylogenetic/defaults/exclude_accessions.txt -O /mpox/phylogenetic/defaults/exclude_accessions.txt
+wget https://raw.githubusercontent.com/nextstrain/mpox/master/phylogenetic/defaults/clades.tsv -O /mpox/phylogenetic/defaults/clades.tsv
 
-mkdir -p /mpox/data
+mkdir -p /mpox/phylogenetic/data
 key_prefix="phylo_run/${S3_FILESTEM}/${WORKFLOW_ID}"
 s3_prefix="s3://${aspen_s3_db_bucket}/${key_prefix}"
@@ -52,12 +52,12 @@ mpox_git_rev=$(cd /mpox && git rev-parse HEAD)
 aligned_upstream_location=$(
 python3 /usr/src/app/aspen/workflows/nextstrain_run/export.py \
 --phylo-run-id "${WORKFLOW_ID}" \
---sequences /mpox/data/sequences_czge.fasta \
---metadata /mpox/data/metadata_czge.tsv \
---selected /mpox/data/include.txt \
+--sequences /mpox/phylogenetic/data/sequences_czge.fasta \
+--metadata /mpox/phylogenetic/data/metadata_czge.tsv \
+--selected /mpox/phylogenetic/data/include.txt \
 --sequence-type aligned \
 --resolved-template-args "${RESOLVED_TEMPLATE_ARGS_SAVEFILE}" \
---builds-file /mpox/config/build_czge.yaml \
+--builds-file /mpox/phylogenetic/build_czge.yaml \
 --reset-status
 )
@@ -66,33 +66,33 @@ aligned_upstream_sequences_s3_key=$(echo "${aligned_upstream_location}" | jq -r
 aligned_upstream_metadata_s3_key=$(echo "${aligned_upstream_location}" | jq -r .metadata_key)
 
 # fetch the upstream dataset
-if [ ! -e /mpox/data/upstream_sequences.fasta ]; then
-$aws s3 cp --no-progress "s3://${aligned_upstream_s3_bucket}/${aligned_upstream_sequences_s3_key}" /mpox/data/upstream_sequences.fasta.xz
-unxz /mpox/data/*.xz
+if [ ! -e /mpox/phylogenetic/data/upstream_sequences.fasta ]; then
+$aws s3 cp --no-progress "s3://${aligned_upstream_s3_bucket}/${aligned_upstream_sequences_s3_key}" /mpox/phylogenetic/data/upstream_sequences.fasta.xz
+unxz /mpox/phylogenetic/data/*.xz
 fi
-if [ ! -e /mpox/data/upstream_metadata.tsv ]; then
-$aws s3 cp --no-progress "s3://${aligned_upstream_s3_bucket}/${aligned_upstream_metadata_s3_key}" /mpox/data/upstream_metadata.tsv.xz
-unxz /mpox/data/*.xz
+if [ ! -e /mpox/phylogenetic/data/upstream_metadata.tsv ]; then
+$aws s3 cp --no-progress "s3://${aligned_upstream_s3_bucket}/${aligned_upstream_metadata_s3_key}" /mpox/phylogenetic/data/upstream_metadata.tsv.xz
+unxz /mpox/phylogenetic/data/*.xz
 fi
 
 # If we've written out any samples, add them to the upstream metadata/fasta files
-if [ -e /mpox/data/sequences_czge.fasta ]; then
-python3 /usr/src/app/aspen/workflows/nextstrain_run/merge_mpx.py --required-metadata /mpox/data/metadata_czge.tsv --required-sequences /mpox/data/sequences_czge.fasta --upstream-metadata /mpox/data/upstream_metadata.tsv --upstream-sequences /mpox/data/upstream_sequences.fasta --destination-metadata /mpox/data/metadata.tsv --destination-sequences /mpox/data/sequences.fasta --required-match-column strain --upstream-match-column accession
+if [ -e /mpox/phylogenetic/data/sequences_czge.fasta ]; then
+python3 /usr/src/app/aspen/workflows/nextstrain_run/merge_mpx.py --required-metadata /mpox/phylogenetic/data/metadata_czge.tsv --required-sequences /mpox/phylogenetic/data/sequences_czge.fasta --upstream-metadata /mpox/phylogenetic/data/upstream_metadata.tsv --upstream-sequences /mpox/phylogenetic/data/upstream_sequences.fasta --destination-metadata /mpox/phylogenetic/data/metadata.tsv --destination-sequences /mpox/phylogenetic/data/sequences.fasta --required-match-column strain --upstream-match-column accession
 else
-cp /mpox/data/upstream_metadata.tsv /mpox/data/metadata.tsv
-cp /mpox/data/upstream_sequences.fasta /mpox/data/sequences.fasta
+cp /mpox/phylogenetic/data/upstream_metadata.tsv /mpox/phylogenetic/data/metadata.tsv
+cp /mpox/phylogenetic/data/upstream_sequences.fasta /mpox/phylogenetic/data/sequences.fasta
 fi;
 
 # Persist the build config we generated.
-$aws s3 cp /mpox/config/build_czge.yaml "${s3_prefix}/build_czge.yaml"
-$aws s3 cp /mpox/data/include.txt "${s3_prefix}/include.txt"
+$aws s3 cp /mpox/phylogenetic/build_czge.yaml "${s3_prefix}/build_czge.yaml"
+$aws s3 cp /mpox/phylogenetic/data/include.txt "${s3_prefix}/include.txt"
 
 # run snakemake, if run fails export the logs from snakemake to s3
-(cd /mpox && snakemake --printshellcmds --configfile config/build_czge.yaml --resources=mem_mb=312320) || { $aws s3 cp /mpox/.snakemake/log/ "${s3_prefix}/logs/snakemake/" --recursive ; $aws s3 cp /mpox/results/mpxv/filter.log "${s3_prefix}/logs/mpox/" --recursive ; }
+(cd /mpox/phylogenetic && snakemake --printshellcmds --configfile build_czge.yaml --resources=mem_mb=312320) || { $aws s3 cp /mpox/phylogenetic/.snakemake/log/ "${s3_prefix}/logs/snakemake/" --recursive ; $aws s3 cp /mpox/phylogenetic/results/aspen/logs/ "${s3_prefix}/logs/mpox/" --recursive ; }
The log location changed with the subsampling update, so I changed it accordingly.
 # upload the tree to S3. The variable key is created to use later
 key="${key_prefix}/mpx_czge.json"
-$aws s3 cp /mpox/auspice/monkeypox_mpxv.json "s3://${aspen_s3_db_bucket}/${key}"
+$aws s3 cp /mpox/phylogenetic/auspice/monkeypox_mpxv.json "s3://${aspen_s3_db_bucket}/${key}"
 
 # update aspen
 aspen_workflow_rev=WHATEVER
@@ -111,4 +111,4 @@ python3 /usr/src/app/aspen/workflows/nextstrain_run/save.py \
 --bucket "${aspen_s3_db_bucket}" \
 --key "${key}" \
 --resolved-template-args "${RESOLVED_TEMPLATE_ARGS_SAVEFILE}" \
---tree-path /mpox/auspice/monkeypox_mpxv.json \
+--tree-path /mpox/phylogenetic/auspice/monkeypox_mpxv.json \
This is a straight bugfix. The TemplateBuilder init args expect `tree_type, pathogen, group, template_args, **kwargs`. Somehow we've been missing passing the `pathogen` to the TemplateBuilder. I'd think that would have pretty major ramifications, though, so I'm surprised this didn't get noticed sooner.
Whoops, I got myself a bit confused. This is still a straight bugfix, but it's fixing a seldom-used development function, `dump_yaml_template`, that we can use to generate only the build config yaml. We hadn't used it for a while, and it never got updated, while the TemplateBuilder usage where it mattered (later in this same file) did get updated. I forgot that context when I wrote the above comment. But that's why there were no major ramifications and it didn't get noticed: it almost never gets used, and I just wound up using it for this work since a lot of my work was around fixing up the yaml config.
`dump_yaml_template` was intended for use during development. Nice catch!
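For anyone reading this thread later, here is a rough sketch of the kind of mismatch described in the comments above. Only the init argument order (`tree_type, pathogen, group, template_args, **kwargs`) comes from the thread. The class body, the helper functions, and every argument value below are hypothetical stand-ins, and the real `dump_yaml_template` call site may have looked different.

```python
class TemplateBuilder:
    """Hypothetical stand-in; only the init argument order comes from the review thread."""

    def __init__(self, tree_type, pathogen, group, template_args, **kwargs):
        self.tree_type = tree_type
        self.pathogen = pathogen
        self.group = group
        self.template_args = template_args
        self.extra = kwargs


def build_without_pathogen():
    # Assumed shape of a stale call: `pathogen` is left out, so every later
    # positional argument shifts one slot left and `template_args` goes unfilled.
    try:
        TemplateBuilder("overview", "example-group", {"max_sequences": 100})
    except TypeError as err:
        return str(err)
    return None


def build_with_pathogen():
    # The corrected call passes all four required arguments in order.
    return TemplateBuilder("overview", "MPX", "example-group", {"max_sequences": 100})


error_message = build_without_pathogen()
builder = build_with_pathogen()
print(error_message)
print(builder.pathogen, builder.group)
```

With a signature like this, a call that omits `pathogen` fails immediately with a `TypeError` rather than silently building the wrong template, which fits the observation that the problem only surfaced once the rarely used helper was run again.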