-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hipstr flanking bp pos fix #147
Hipstr flanking bp pos fix #147
Conversation
…hout flanking bp at the start
…hat use multiple dispatch
…fore the ability to compare them is calculated, fixes HipSTR issue
It's cool to see all this work you're doing. Let me know when it's in a state you like me to review. |
… harmonized record should be skipped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, I made a few documentation changes myself and added that to the PR.
I also removed the test .vcf
files you created since you also created .vcf.gz
versions of them and we don't need duplicates.
Other than that, comments in line.
trtools/compareSTR/compareSTR.py
Outdated
num_records = 0 | ||
while not done: | ||
harmonized_records = trh.HarmonizeRecords(current_records, [vcftype1, vcftype2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this needs to be a method, I think it could just be a one line list comprehension (the method's not used in more than one place). But if you want a helper method because you think it helps organize the code, make it a private helper method in this module, not a public method in tr_harmonizer - I don't think the method's utility justifies enlarging that module's API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is also used in my changes to MergeSTR, so i will be using it more than once. These changes are currently in different branch, that is based on changes in this branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, can we move this to mergeutils then? I still don't think this helper method belongs in tr_harmonizer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really think mergeutils is the correct place for this method (tr_harmonizer should be responsible for harmonization, mergeutils theoretically doesn't have anything to do with this). I'll convert it to list comprehension, as you suggested, the functionality is simple enough. (at least in compareSTR)
##FORMAT=<ID=MALLREADS,Number=1,Type=String,Description="Maximum likelihood bp diff in each read based on haplotype alignments for reads that span the repeat region by at least 5 base pairs"> | ||
##contig=<ID=1,length=249250621> | ||
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GTEX-R55C-0003 | ||
1 101673 STR_38 TCAAAAAAAAAAAAAAAA TAAAAAAAAAAAAAAAAA . . INFRAME_PGEOM=0.95;INFRAME_UP=0.05;INFRAME_DOWN=0.05;OUTFRAME_PGEOM=0.95;OUTFRAME_UP=0.01;OUTFRAME_DOWN=0.01;BPDIFFS=0;START=101675;END=101690;PERIOD=1;AN=2;REFAC=1;AC=1;NSKIP=0;NFILT=0;DP=53;DSNP=0;DSTUTTER=2;DFLANKINDEL=1 GT:GB:Q:PQ:DP:DSNP:DSTUTTER:DFLANKINDEL:PDP:PSNP:GLDIFF:AB:DAB:FS:ALLREADS:MALLREADS 0|1:0|0:1.0:0.5:53:0:2:1:41.95|11.05:0|0:5.08:-5.03:49:-0.39:-19|1;-15|1;-2|1;0|40:-2|1;0|39;1|1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add the following three sets of records to this test:
- one where the STRs have the same ref/alt
- one where they have different ending flanks
- one where they have different beginning and ending flanks
Can we also add a test where the STRs do not align appropriately (either different starting bps, or same starting bps but different end bps) and confirm that compareSTR doesn't compare them (and confirm whatever behavior we think is appropriate for compareSTR there)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the test cases you mentioned, apart from the last one. compareSTR currently compares records that have different start/end bps, it just evaluates their seq concordance to 0. Did you mean test where records have different starting position, and harmonization step isn't able to make them comparable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Did you mean test where records have different starting position, and harmonization step isn't able to make them comparable?" - yes.
It seems like currently records from different files that overlap but have different starting bps are silently ignored. It would be worthwhile to instead output an error saying something like 'the records , only partially overlap and so were not compared' (and then write a test to see that that is happening).
And also the flip of that test, taking records from different files which have the same reference starting positions but different reference ending positions. Currently, those records are compared and marked as nonequivalent, it would be better if they were not compared so this was treated the same way as records which had different starting positions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are great suggestions and I'll start working on them, but I think I should create new branch and pull request for them, because they are not specifically related to hipstr related fix I implemented here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
trtools/testsupport/sample_vcfs/compareSTR_vcfs/test_hipstr_flanking_bp_non_flanking.vcf
Outdated
Show resolved
Hide resolved
Thank you for the changes. Just those two open comments remaining. |
…of compareSTR.py to correctly handle None Records from getNextRecords
Fixed problems with starting positions of HipSTR files, added type annotations to mergeutils functions and fixed bug from #146.
Some additional changes were done in tr_harmonizer, but they are compatible with existing code and shouldn't break anything.
Currently working on new test cases.