pangolin 4.0.1 ignores --max-ambig filter #405

Closed
GorgonVZ opened this issue Apr 5, 2022 · 6 comments

Comments

GorgonVZ commented Apr 5, 2022

Dear pangos,
since the last major update I have been running into problems with the --max-ambig filter. With the command below, I would expect every sequence with an N content >= 10% to fail. However, although the FASTA files I supplied have >20% N content, all of them are reported as pass in lineage.csv.
It looks like pangolin ignores the parameter and falls back to the default of 30% N content.
(The FASTA files are attached in the comment section.)

### N content of my input FASTA, calculated with
seqtk comp Test.fasta | awk '{print ($3+$4+$5+$6)/$2}'
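For readers without seqtk, the same per-sequence check can be sketched in Python. This is a minimal sketch, not part of the original report: `ambiguous_fraction` and `read_fasta` are hypothetical helper names, and "ambiguous" is assumed to mean any character other than A, C, G or T. As far as I can tell, columns 3 to 6 of `seqtk comp` are the A, C, G and T counts, so the awk expression above yields the unambiguous fraction, and the N content is one minus that value.

```python
from typing import Dict, Iterable


def ambiguous_fraction(seq: str) -> float:
    """Fraction of characters that are not A, C, G or T (case-insensitive)."""
    if not seq:
        return 0.0
    upper = seq.upper()
    unambiguous = sum(upper.count(base) for base in "ACGT")
    return 1 - unambiguous / len(seq)


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record name: sequence}.

    The record name is the first whitespace-delimited token of the header.
    """
    records: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


# Example usage (the file name is a placeholder):
# with open("Test.fasta") as handle:
#     for name, seq in read_fasta(handle).items():
#         print(name, round(ambiguous_fraction(seq), 4), sep="\t")
```

Note that pangolin reports `Ambiguous_content` after alignment, so values computed on the raw FASTA may differ slightly from those in the report.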

### Pangolin command I used
pangolin --no-temp Run32_CovSmallerThan80.fasta --min-length 26900 --max-ambig 0.1

### Versions
pangolin-data 1.2.133
pangolin 4.0.1

### Resulting CSV
taxon,lineage,conflict,ambiguity_score,scorpio_call,scorpio_support,scorpio_conflict,scorpio_notes,version,pangolin_version,scorpio_version,constellation_version,is_designated,qc_status,qc_notes,note
19474600_S931,BA.2,0.0,,Omicron (BA.2-like),0.66,0.0,scorpio call: Alt alleles 43; Ref alleles 0; Amb alleles 22; Oth alleles 0,PUSHER-v1.2.133,4.0.1,0.3.16,v0.1.4,False,pass,Ambiguous_content:0.21,Usher placements: BA.2(7/7)
19474614_S785,BA.2,0.0,,Omicron (BA.2-like),0.77,0.0,scorpio call: Alt alleles 50; Ref alleles 0; Amb alleles 15; Oth alleles 0,PUSHER-v1.2.133,4.0.1,0.3.16,v0.1.4,False,pass,Ambiguous_content:0.27,Usher placements: BA.2(20/20)
19474640_S786,BA.2,0.10256410256410256,,Probable Omicron (BA.2-like),0.57,0.0,scorpio call: Alt alleles 37; Ref alleles 0; Amb alleles 28; Oth alleles 0,PUSHER-v1.2.133,4.0.1,0.3.16,v0.1.4,False,pass,Ambiguous_content:0.25,Usher placements: BA.2(35/39) BA.2.3(4/39)
19474674_S1004,BA.2,0.0,,Probable Omicron (BA.2-like),0.63,0.0,scorpio call: Alt alleles 41; Ref alleles 0; Amb alleles 24; Oth alleles 0,PUSHER-v1.2.133,4.0.1,0.3.16,v0.1.4,False,pass,Ambiguous_content:0.26,Usher placements: BA.2(3/3)
19474939_S1430,BA.1.1,0.42857142857142855,,Probable Omicron (BA.1-like),0.58,0.0,scorpio call: Alt alleles 34; Ref alleles 0; Amb alleles 23; Oth alleles 2,PUSHER-v1.2.133,4.0.1,0.3.16,v0.1.4,False,pass,Ambiguous_content:0.3,Usher placements: BA.1(2/49) BA.1.1(28/49) BA.1.1.11(4/49) BA.1.1.15(3/49) BA.1.1.7(12/49)
19477835_S924,BA.2,0.0,,Probable Omicron (BA.2-like),0.6,0.0,scorpio call: Alt alleles 39; Ref alleles 0; Amb alleles 26; Oth alleles 0,PUSHER-v1.2.133,4.0.1,0.3.16,v0.1.4,False,pass,Ambiguous_content:0.26,Usher placements: BA.2(3/3)
19477853_S69,BA.2,0.0,,Omicron (BA.2-like),0.65,0.0,scorpio call: Alt alleles 42; Ref alleles 0; Amb alleles 23; Oth alleles 0,PUSHER-v1.2.133,4.0.1,0.3.16,v0.1.4,False,pass,Ambiguous_content:0.21,Usher placements: BA.2(3/3)
19477906_S359,BA.2,0.38095238095238093,,Probable Omicron (BA.2-like),0.58,0.0,scorpio call: Alt alleles 38; Ref alleles 0; Amb alleles 27; Oth alleles 0,PUSHER-v1.2.133,4.0.1,0.3.16,v0.1.4,False,pass,Ambiguous_content:0.29,Usher placements: BA.2(39/63) BA.2.3(24/63)
19477931_S913,BA.2,0.06060606060606061,,Probable Omicron (BA.2-like),0.57,0.0,scorpio call: Alt alleles 37; Ref alleles 0; Amb alleles 28; Oth alleles 0,PUSHER-v1.2.133,4.0.1,0.3.16,v0.1.4,False,pass,Ambiguous_content:0.29,Usher placements: BA.2(31/33) BA.2.3(2/33)
19477968_S791,BA.2,0.0,,Omicron (BA.2-like),0.63,0.0,scorpio call: Alt alleles 41; Ref alleles 0; Amb alleles 24; Oth alleles 0,PUSHER-v1.2.133,4.0.1,0.3.16,v0.1.4,False,pass,Ambiguous_content:0.26,Usher placements: BA.2(3/3)
19478074_S372,BA.1.1.14,0.0,,Omicron (BA.1-like),0.68,0.0,scorpio call: Alt alleles 40; Ref alleles 0; Amb alleles 17; Oth alleles 2,PUSHER-v1.2.133,4.0.1,0.3.16,v0.1.4,False,pass,Ambiguous_content:0.24,Usher placements: BA.1.1.14(2/2)
19479516_S1434,BA.1.1.1,0.23076923076923078,,Probable Omicron (BA.1-like),0.54,0.0,scorpio call: Alt alleles 32; Ref alleles 0; Amb alleles 25; Oth alleles 2,PUSHER-v1.2.133,4.0.1,0.3.16,v0.1.4,False,pass,Ambiguous_content:0.27,Usher placements: BA.1.1(3/13) BA.1.1.1(10/13)
19479526_S1425,BA.2,0.0425531914893617,,Probable Omicron (BA.2-like),0.58,0.0,scorpio call: Alt alleles 38; Ref alleles 0; Amb alleles 27; Oth alleles 0,PUSHER-v1.2.133,4.0.1,0.3.16,v0.1.4,False,pass,Ambiguous_content:0.29,Usher placements: BA.2(45/47) BA.2.3(2/47)
19524787_S799,BA.2,0.0,,Omicron (BA.2-like),0.77,0.0,scorpio call: Alt alleles 50; Ref alleles 0; Amb alleles 15; Oth alleles 0,PUSHER-v1.2.133,4.0.1,0.3.16,v0.1.4,False,pass,Ambiguous_content:0.27,Usher placements: BA.2(2/2)
19538102_S800,BA.2,0.0,,Omicron (BA.2-like),0.68,0.0,scorpio call: Alt alleles 44; Ref alleles 0; Amb alleles 21; Oth alleles 0,PUSHER-v1.2.133,4.0.1,0.3.16,v0.1.4,False,pass,Ambiguous_content:0.22,Usher placements: BA.2(1/1)
19538721_S1071,BA.2,0.046511627906976744,,Probable Omicron (BA.2-like),0.58,0.02,scorpio call: Alt alleles 38; Ref alleles 1; Amb alleles 26; Oth alleles 0,PUSHER-v1.2.133,4.0.1,0.3.16,v0.1.4,False,pass,Ambiguous_content:0.28,Usher placements: BA.2(41/43) BA.2.3(2/43)
36755787_S86,BA.2,0.07407407407407407,,Omicron (BA.2-like),0.66,0.0,scorpio call: Alt alleles 43; Ref alleles 0; Amb alleles 22; Oth alleles 0,PUSHER-v1.2.133,4.0.1,0.3.16,v0.1.4,False,pass,Ambiguous_content:0.21,Usher placements: BA.2(25/27) BA.2.3(2/27)
94756791_S88,BA.2,0.07142857142857142,,Omicron (BA.2-like),0.63,0.0,scorpio call: Alt alleles 41; Ref alleles 0; Amb alleles 24; Oth alleles 0,PUSHER-v1.2.133,4.0.1,0.3.16,v0.1.4,False,pass,Ambiguous_content:0.26,Usher placements: BA.2(26/28) BA.2.3(2/28)
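The mismatch in the report above can be checked mechanically. The following is a minimal sketch, assuming the report is available as CSV text with the `taxon`, `qc_status` and `qc_notes` columns shown above; `rows_over_threshold` is a hypothetical helper, not a pangolin function. It lists every sequence that passed QC even though its reported `Ambiguous_content` exceeds the requested threshold.

```python
import csv
import io
import re
from typing import List, Tuple

# Matches the "Ambiguous_content:0.21" entry in the qc_notes column.
AMBIG_RE = re.compile(r"Ambiguous_content:([0-9.]+)")


def rows_over_threshold(report_text: str, max_ambig: float) -> List[Tuple[str, float]]:
    """Return (taxon, ambiguous_content) for rows that passed QC
    although their reported ambiguity exceeds max_ambig."""
    over = []
    for row in csv.DictReader(io.StringIO(report_text)):
        match = AMBIG_RE.search(row.get("qc_notes") or "")
        if match is None or row.get("qc_status") != "pass":
            continue
        ambiguity = float(match.group(1))
        if ambiguity > max_ambig:
            over.append((row["taxon"], ambiguity))
    return over


# Example usage (the file name is a placeholder):
# with open("lineage.csv") as handle:
#     for taxon, ambiguity in rows_over_threshold(handle.read(), 0.1):
#         print(taxon, ambiguity)
```

With `max_ambig=0.1`, every row in the report above would be listed, since the lowest reported value is 0.21.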

Author

GorgonVZ commented Apr 5, 2022

@aineniamh
Member

Thanks for flagging this, I'll have a look today. The logic of how we check these things has changed, as we now attempt alignment prior to the QC check.

Author

GorgonVZ commented Apr 6, 2022

Thanks,
you are awesome!!!

KartofellSalat commented Apr 6, 2022

Same issue with the --min-length parameter: I submitted a sequence that is far too short and it went through. In fact, I could not find anywhere that this parameter is still used.

It was flagged as fail with an ambiguity_content of 0.9
tooShort.fasta.gz

Member

aineniamh commented Apr 6, 2022

I've added a fix now! Just waiting for the tests to finish. We now align the sequences before running the QC check.

Sequences that don't align successfully (potentially because of QC issues) are reported at the end as 'failed to map'.

For the --min-length argument, this means that by the time we check the sequences, they are all the same length.
What I've done is convert the minimum length parameter into a proportion of ambiguity relative to the full alignment.

This means that --min-length and --max-ambig are both interpreted as the same parameter (the --min-length argument is mostly kept for consistency's sake), and when both are in use, the more stringent filter is applied.
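The conversion described above can be illustrated with a small sketch. This is not pangolin's actual code: `effective_max_ambig` is a hypothetical function, the alignment length is assumed to equal the 29,903-nt SARS-CoV-2 reference genome (pangolin's internal alignment length may differ), and the default of 0.3 is taken from the report at the top of this issue.

```python
from typing import Optional

# Assumption: the alignment spans the full 29,903-nt reference genome.
REFERENCE_LENGTH = 29903


def effective_max_ambig(
    max_ambig: Optional[float] = None,
    min_length: Optional[int] = None,
    alignment_length: int = REFERENCE_LENGTH,
    default_max_ambig: float = 0.3,
) -> float:
    """Combine --max-ambig and --min-length into one ambiguity threshold.

    A minimum length is read as "at most this share of the alignment
    may be ambiguous"; the lower (more stringent) threshold wins.
    """
    candidates = [default_max_ambig if max_ambig is None else max_ambig]
    if min_length is not None:
        # Share of the alignment allowed to be missing or ambiguous.
        candidates.append(max(0.0, 1 - min_length / alignment_length))
    return min(candidates)
```

Under these assumptions, the command from the original report (`--min-length 26900 --max-ambig 0.1`) gives a length-derived threshold of about 0.1004, so the explicit `--max-ambig 0.1` is the more stringent of the two and would be the one applied.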

@aineniamh
Member

Resolved in pangolin v4.0.3!
