bcftools corrupts duplicate GT format fields #1733

Closed
anthakki opened this issue Jan 19, 2024 · 1 comment · Fixed by #1752
@anthakki

Running bcftools (here filter, but other commands seem to be affected as well) on a VCF with duplicate GT (genotype) FORMAT fields seems to change all but the last GT value: ./., 0/0, 0/1 and 1/1 get converted to 0,0, 2,2, 2,4 and 4,4, respectively. I'm not 100% sure whether duplicate GT fields are legal, but I would expect an error rather than invalid data. Non-GT fields don't seem to have the problem. I'm using bcftools 1.19, but this can also be reproduced with bcftools 1.12.

Minimized test case follows. I would expect the payload to match that of the input.

```
$ cat foo.vcf
##fileformat=VCFv4.1
##FORMAT=<ID=GT,Number=1,Type=String>
##FORMAT=<ID=X,Number=1,Type=Integer>
##contig=<ID=chr1>
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	A
chr1	1	.	A	C	.	.	.	GT:X:GT:X	0/1:9:0/1:9
$ bcftools filter foo.vcf | sed '/^#/d'
chr1	1	.	A	C	.	PASS	.	GT:X:GT:X	2,4:9:0/1:9
```
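The corrupted values are consistent with the BCF integer encoding of GT, where each allele is stored as (allele index + 1) << 1, the low bit marks a phased allele, and an allele code of 0 means missing. One plausible reading (not confirmed in this thread) is that the earlier GT copies are written out in that raw binary form instead of being formatted back into genotype strings. The Python sketch below uses a hypothetical encode_gt helper to reproduce the numbers from the report:

```python
import re


def encode_gt(gt: str) -> list[int]:
    """Encode a VCF GT string such as '0/1' as BCF-style integers.

    Hypothetical helper for illustration: each allele becomes
    (allele_code << 1) | phased, where allele_code is 0 for a missing
    allele ('.') and allele_index + 1 otherwise.
    """
    # Splitting with a capturing group keeps the separators: '0/1' -> ['0', '/', '1']
    tokens = re.split(r"([/|])", gt)
    alleles, seps = tokens[0::2], tokens[1::2]
    encoded = []
    for i, allele in enumerate(alleles):
        # The phase bit comes from the separator before this allele;
        # the first allele has none, so it is left unphased here.
        phased = 1 if i > 0 and seps[i - 1] == "|" else 0
        allele_code = 0 if allele == "." else int(allele) + 1
        encoded.append((allele_code << 1) | phased)
    return encoded


for gt in ["./.", "0/0", "0/1", "1/1"]:
    print(gt, "->", ",".join(map(str, encode_gt(gt))))
# ./. -> 0,0
# 0/0 -> 2,2
# 0/1 -> 2,4
# 1/1 -> 4,4
```

The four printed pairs match the conversions listed in the report, which is why the output looks like numbers rather than a random corruption.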
@pd3
Member

pd3 commented Jan 19, 2024

Duplicate tags are not allowed. I am not sure if it is explicitly stated in the VCF specification, but that was the intention.

The parsing is done in htslib; ideally it should give a warning and drop the duplicate fields. Obviously, the easiest solution is to avoid producing invalid VCFs :)
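The behaviour suggested above (warn, then drop the duplicates) can be illustrated as a pre-processing step outside htslib. This Python sketch is hypothetical: drop_duplicate_format_keys is not part of htslib or bcftools, it assumes a single tab-separated VCF data line, and it keeps the first occurrence of each key, which the thread does not specify.

```python
import sys


def drop_duplicate_format_keys(line: str) -> str:
    """Return a VCF data line with repeated FORMAT keys removed.

    Hypothetical helper: keeps the first occurrence of each key, drops the
    later ones from FORMAT and from every sample column, and warns on stderr.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 10:
        return line  # no FORMAT/sample columns to check
    keys = cols[8].split(":")
    keep, seen = [], set()  # indices of first occurrences, keys seen so far
    for i, key in enumerate(keys):
        if key in seen:
            print(f"warning: dropping duplicate FORMAT/{key} at {cols[0]}:{cols[1]}",
                  file=sys.stderr)
        else:
            seen.add(key)
            keep.append(i)
    if len(keep) == len(keys):
        return line  # nothing to drop
    cols[8] = ":".join(keys[i] for i in keep)
    for s in range(9, len(cols)):
        values = cols[s].split(":")
        # Samples may omit trailing fields, so skip indices past the end.
        cols[s] = ":".join(values[i] for i in keep if i < len(values))
    return "\t".join(cols) + ("\n" if line.endswith("\n") else "")
```

Applied to each non-header line of foo.vcf, it would turn GT:X:GT:X with 0/1:9:0/1:9 into GT:X with 0/1:9 before bcftools parses the record.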

@pd3 pd3 transferred this issue from samtools/bcftools Jan 23, 2024
pd3 added a commit to pd3/htslib that referenced this issue Feb 1, 2024
The removal of large tags introduced by b49eea4 and 9db565d could
not work correctly, the memmove pointers were wrong!

Resolves samtools#1733
jkbonfield pushed a commit to pd3/htslib that referenced this issue Mar 14, 2024
jkbonfield pushed a commit that referenced this issue Mar 14, 2024
The removal of large tags introduced by b49eea4 and 9db565d could
not work correctly, the memmove pointers were wrong!

Resolves #1733