Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

genome annotation improvements #961

Merged
merged 2 commits into from
Jun 22, 2022
Merged

Conversation

jameshadfield
Copy link
Member

For a detailed write-up of the bug which motivated this PR, see
#881.

The main work is in the 2nd commit:

By storing the (nucleotide) genome annotation in the node-data produced
from augur-ancestral we make this information available for export.
Previously this information was only exported by augur translate which
was problematic for workflows which didn't perform translation.

No changes are needed to augur export v2 (which may now process
multiple "annotations" blocks) due to the behavior of
NodeData.deep_add_or_update which will recurse into dicts in
annotation blocks and when confronted with non-dict values which already
exist overwrite them. This poses a potential problem where two node-data
JSONs which (e.g.) define different annotations['nuc'] coordinates
will not raise any error and the resulting coodinates are dependent on
the order the node-data JSONs were provided to augur export v2.

Closes #881.

Input trees with non-labelled internal nodes always printed a WARNING
but this was going to stdout rather than stderr. This in turn allowed
the tests to pass when they shouldn't have.

Note: I think this warning should be a fatal error in almost every
conceivable case. To describe ancestral sequences without knowing which
node they correspond to in the input tree is a recipe for confusion.
@jameshadfield
Copy link
Member Author

The following test is failing (locally & on CI), but it's also failing for me on current master (where CI passes...). @victorlin any ideas?

$ cram --shell=/bin/bash tests/functional/frequencies.t

@@ -51,7 +51,7 @@
   >  --exclude-paths "root['generated_by']['version']" "root['pivots']" -- \
   >  "frequencies/zika_tip-frequencies_with_relative_dates.json" \
   >  "$TMP/tip-frequencies.json"
-  {}
+  {'iterable_item_added': {"root['BRA/2016/FC_6706']['frequencies'][2]": 0.0, "root['COL/FLR_00008/2015']['frequencies'][2]": 0.0, "root['Colombia/2016/ZC204Se']['frequencies'][2]": 0.0, "root['DOM/2016/BB_0183']['frequencies'][2]": 0.0, "root['EcEs062_16']['frequencies'][2]": 0.0, "root['HND/2016/HU_ME59']['frequencies'][2]": 0.0, "root['PAN/CDC_259359_V1_V3/2015']['frequencies'][2]": 0.0, "root['PRVABC59']['frequencies'][2]": 0.0, "root['VEN/UF_1/2016']['frequencies'][2]": 0.0, "root['ZKC2/2016']['frequencies'][2]": 0.0}}

@victorlin
Copy link
Member

CI is temp patched with #962, just need to rebase onto latest master here and other PRs. Or run again when it's not the first of the month 🙃

For a detailed write-up of the bug which motivated this commit, see
#881.

By storing the (nucleotide) genome annotation in the node-data produced
from augur-ancestral we make this information available for export.
Previously this information was only exported by `augur translate` which
was problematic for workflows which didn't perform translation.

No changes are needed to `augur export v2` (which may now process
multiple "annotations" blocks) due to the behavior of
`NodeData.deep_add_or_update` which will recurse into dicts in
annotation blocks and when confronted with non-dict values which already
exist overwrite them. This poses a potential problem where two node-data
JSONs which (e.g.) define different `annotations['nuc']` coordinates
will not raise any error and the output coodinates are dependent on
the order the node-data JSONs were provided to `augur export v2`.

Closes #881.
@jameshadfield jameshadfield force-pushed the feat/annotation-improvements branch from a404eb1 to 8ef9753 Compare June 21, 2022 00:17
@jameshadfield
Copy link
Member Author

(No changes, but force-pushed with a new SHA to trigger CI)

@jameshadfield jameshadfield requested a review from a team June 21, 2022 00:21
@codecov
Copy link

codecov bot commented Jun 21, 2022

Codecov Report

Merging #961 (8ef9753) into master (446b39d) will increase coverage by 2.17%.
The diff coverage is 25.00%.

@@            Coverage Diff             @@
##           master     #961      +/-   ##
==========================================
+ Coverage   59.36%   61.53%   +2.17%     
==========================================
  Files          43       52       +9     
  Lines        6014     6794     +780     
  Branches     1539     1768     +229     
==========================================
+ Hits         3570     4181     +611     
- Misses       2185     2348     +163     
- Partials      259      265       +6     
Impacted Files Coverage Δ
augur/ancestral.py 66.37% <25.00%> (-5.18%) ⬇️
augur/io/__init__.py 100.00% <0.00%> (ø)
augur/io/metadata.py 100.00% <0.00%> (ø)
augur/measurements.py 31.25% <0.00%> (ø)
augur/io/sequences.py 100.00% <0.00%> (ø)
augur/util_support/metadata_file.py 100.00% <0.00%> (ø)
augur/io/shell_command_runner.py 91.89% <0.00%> (ø)
augur/io/print.py 100.00% <0.00%> (ø)
augur/io/vcf.py 42.50% <0.00%> (ø)
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 446b39d...8ef9753. Read the comment docs.

@jameshadfield jameshadfield merged commit f07cc5f into master Jun 22, 2022
@jameshadfield jameshadfield deleted the feat/annotation-improvements branch June 22, 2022 22:14
@victorlin victorlin mentioned this pull request Jun 30, 2022
@victorlin victorlin added this to the Next release X.X.X milestone Jun 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Entropy panel unavailable if mutations aren't translated
3 participants