Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hipstr flanking bp pos fix #147

Merged

Conversation

Kulivox
Copy link
Contributor

@Kulivox Kulivox commented Feb 3, 2022

Fixed problems with starting positions of HipSTR files, added type annotations to mergeutils functions and fixed bug from #146.
Some additional changes were done in tr_harmonizer, but they are compatible with existing code and shouldn't break anything.
Currently working on new test cases.

@Kulivox Kulivox changed the base branch from master to develop February 4, 2022 11:28
@LiterallyUniqueLogin
Copy link
Contributor

It's cool to see all this work you're doing. Let me know when it's in a state you like me to review.

@Kulivox Kulivox marked this pull request as ready for review February 7, 2022 11:39
Copy link
Contributor

@LiterallyUniqueLogin LiterallyUniqueLogin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I made a few documentation changes myself and added that to the PR.

I also removed the test .vcf files you created since you also created .vcf.gz versions of them and we don't need duplicates.

Other than that, comments in line.

num_records = 0
while not done:
harmonized_records = trh.HarmonizeRecords(current_records, [vcftype1, vcftype2])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to be a method, I think it could just be a one line list comprehension (the method's not used in more than one place). But if you want a helper method because you think it helps organize the code, make it a private helper method in this module, not a public method in tr_harmonizer - I don't think the method's utility justifies enlarging that module's API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is also used in my changes to MergeSTR, so i will be using it more than once. These changes are currently in different branch, that is based on changes in this branch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, can we move this to mergeutils then? I still don't think this helper method belongs in tr_harmonizer.

Copy link
Contributor Author

@Kulivox Kulivox Feb 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really think mergeutils is the correct place for this method (tr_harmonizer should be responsible for harmonization, mergeutils theoretically doesn't have anything to do with this). I'll convert it to list comprehension, as you suggested, the functionality is simple enough. (at least in compareSTR)

##FORMAT=<ID=MALLREADS,Number=1,Type=String,Description="Maximum likelihood bp diff in each read based on haplotype alignments for reads that span the repeat region by at least 5 base pairs">
##contig=<ID=1,length=249250621>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GTEX-R55C-0003
1 101673 STR_38 TCAAAAAAAAAAAAAAAA TAAAAAAAAAAAAAAAAA . . INFRAME_PGEOM=0.95;INFRAME_UP=0.05;INFRAME_DOWN=0.05;OUTFRAME_PGEOM=0.95;OUTFRAME_UP=0.01;OUTFRAME_DOWN=0.01;BPDIFFS=0;START=101675;END=101690;PERIOD=1;AN=2;REFAC=1;AC=1;NSKIP=0;NFILT=0;DP=53;DSNP=0;DSTUTTER=2;DFLANKINDEL=1 GT:GB:Q:PQ:DP:DSNP:DSTUTTER:DFLANKINDEL:PDP:PSNP:GLDIFF:AB:DAB:FS:ALLREADS:MALLREADS 0|1:0|0:1.0:0.5:53:0:2:1:41.95|11.05:0|0:5.08:-5.03:49:-0.39:-19|1;-15|1;-2|1;0|40:-2|1;0|39;1|1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add the following three sets of records to this test:

  • one where the STRs have the same ref/alt
  • one where they have different ending flanks
  • one where they have different beginning and ending flanks

Can we also add a test where the STRs do not align appropriately (either different starting bps, or same starting bps but different end bps) and confirm that compareSTR doesn't compare them (and confirm whatever behavior we think is appropriate for compareSTR there)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the test cases you mentioned, apart from the last one. compareSTR currently compares records that have different start/end bps, it just evaluates their seq concordance to 0. Did you mean test where records have different starting position, and harmonization step isn't able to make them comparable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Did you mean test where records have different starting position, and harmonization step isn't able to make them comparable?" - yes.

It seems like currently records from different files that overlap but have different starting bps are silently ignored. It would be worthwhile to instead output an error saying something like 'the records , only partially overlap and so were not compared' (and then write a test to see that that is happening).

And also the flip of that test, taking records from different files which have the same reference starting positions but different reference ending positions. Currently, those records are compared and marked as nonequivalent, it would be better if they were not compared so this was treated the same way as records which had different starting positions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are great suggestions and I'll start working on them, but I think I should create new branch and pull request for them, because they are not specifically related to hipstr related fix I implemented here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

@LiterallyUniqueLogin
Copy link
Contributor

Thank you for the changes. Just those two open comments remaining.

…of compareSTR.py to correctly handle None Records from getNextRecords
@LiterallyUniqueLogin LiterallyUniqueLogin merged commit 5f7fc5c into gymrek-lab:develop Feb 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants