Compare str comparison improvements #151

Kulivox · 2022-02-15T14:30:22Z

The changes in this pull request are based on the suggestions in this comment:
#147 (comment)
These changes alter what determines whether two records are comparable. Previously, everything depended on a function that returned a list saying which records should be skipped in the next iteration. If that list consisted only of True values, the records were compared. I've changed that: the list is still produced, but a separate, more configurable function now determines whether the two records are comparable.
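A minimal sketch of that separation, under loud assumptions: records are reduced to bare start positions, and every name below (`comparability_and_increment`, `all_at_minimum`, `within_two_bases`) is hypothetical and does not reproduce the signature of the real function in trtools/utils/mergeutils.py. It only shows the idea that the boolean list is still computed while a pluggable handler makes the comparability decision.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stand-in: each reader's current record is reduced to its
# start position, or None once that reader is exhausted.
Position = Optional[int]
Handler = Callable[[Sequence[Position], List[bool]], bool]


def all_at_minimum(records: Sequence[Position], at_min: List[bool]) -> bool:
    """Old rule: compare only when every entry of the list is True."""
    return all(at_min)


def comparability_and_increment(
    records: Sequence[Position],
    handler: Handler = all_at_minimum,
) -> Tuple[List[bool], bool]:
    """Return (the boolean list, whether the records are comparable)."""
    present = [r for r in records if r is not None]
    if not present:
        # Nothing left to read, so nothing to compare.
        return [False] * len(records), False
    lowest = min(present)
    # The list is still produced exactly as before ...
    at_min = [r is not None and r == lowest for r in records]
    # ... but the decision to compare is delegated to the handler.
    return at_min, handler(records, at_min)


def within_two_bases(records: Sequence[Position], at_min: List[bool]) -> bool:
    """Example of a more permissive handler (an assumption, for illustration)."""
    present = [r for r in records if r is not None]
    return len(present) == len(records) and max(present) - min(present) <= 2
```

With the default handler, positions `[5, 7]` are not comparable; passing `within_two_bases` makes the same pair comparable while the boolean list stays unchanged.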

LiterallyUniqueLogin

Looking good! Just two comments

trtools/compareSTR/tests/test_compareSTR.py

trtools/compareSTR/compareSTR.py

LiterallyUniqueLogin

Noticed a few things that I should've caught in the first go round, but hopefully they're not a big deal to address.

trtools/utils/mergeutils.py

trtools/compareSTR/tests/test_compareSTR.py

LiterallyUniqueLogin · 2022-02-24T19:11:26Z

trtools/utils/tr_harmonizer.py

@@ -680,6 +682,15 @@ def __init__(self,
            self.has_fabricated_ref_allele = False
            self.ref_allele_length = len(ref_allele) / len(motif)

+        # declaration of end_pos variables
+        self.end_pos = int(self.pos + self.ref_allele_length * len(motif) - 1)
+        self.full_alleles_end_pos = self.end_pos


A few comments:

I would rather not define this here and then redefine it later if we have full_alleles; that could confuse a reader. Let's just have "if full_alleles is None then .... else ...."

If we're going to define this, can we document it (in the same way you documented end_pos)? It can go in the attributes section.

It seems full_alleles_pos is already defined but not documented. Do you mind documenting that as well?

Kulivox · 2022-02-25

I've applied your suggestions.

LiterallyUniqueLogin · 2022-02-24T19:12:23Z

trtools/utils/tr_harmonizer.py

@@ -680,6 +682,15 @@ def __init__(self,
            self.has_fabricated_ref_allele = False
            self.ref_allele_length = len(ref_allele) / len(motif)

+        # declaration of end_pos variables
+        self.end_pos = int(self.pos + self.ref_allele_length * len(motif) - 1)


This cast is unsafe: int(1.9999) evaluates to 1, and with finite floating-point precision the expression may not evaluate to an exact integer. Let's use round instead. We can add a comment here explaining why we're using round when, in theory, the value is already a whole number and a plain cast should suffice.

Kulivox · 2022-02-25

You are right. I didn't realize that self.ref_allele_length can be a fractional value because of partial repeats, so I assumed something like this couldn't happen. I changed the cast to round and added the comment.
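To illustrate the point about truncation, here is a small sketch. The helper names `end_pos_by_cast` and `end_pos_by_round` are hypothetical, and the arithmetic is copied from the diff hunk above rather than from the merged code. It shows that `int()` truncates toward zero, so a value that drifts just below a whole number loses a full unit, while `round()` recovers the intended integer.

```python
def end_pos_by_cast(pos, ref_allele_length, motif_len):
    # int() truncates toward zero: 7.999999999999999 becomes 7.
    return int(pos + ref_allele_length * motif_len - 1)


def end_pos_by_round(pos, ref_allele_length, motif_len):
    # In theory the value is already a whole number; round() is used only
    # to absorb floating-point drift, not to change the intended result.
    return round(pos + ref_allele_length * motif_len - 1)


# The reviewer's example: a value just under 2 is truncated to 1.
truncated = int(1.9999)
rounded = round(1.9999)

# A classic case of drift: mathematically 8, stored slightly below 8.
drifted = (0.1 + 0.7) * 10
```

For a reference allele of 10 bases with a 3-base motif (a partial repeat), `end_pos_by_round(100, 10 / 3, 3)` yields 109 regardless of how the intermediate product is rounded internally.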

LiterallyUniqueLogin · 2022-02-24T19:16:41Z

trtools/compareSTR/compareSTR.py

+    right_start, right_end = right.pos, right.end_pos
+
+    overlap = min(left_end, right_end) - max(left_start, right_start)
+    # This calculation contains max() - 1 to compensate


I don't think this is the right implementation. Shouldn't we be adding one to overlap above instead of subtracting one from the denominator?

Kulivox · 2022-02-25

Yes, you are right. This implementation only calculates the correct score for completely overlapping sequences; otherwise it yields a score slightly lower than intended. Thank you for pointing that out!
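The difference between the two formulas can be sketched as follows. This is a reconstruction under assumptions, not the code from compareSTR.py: both function names are hypothetical, start and end positions are taken to be inclusive, and the score is assumed to be the shared length divided by the length of the longer record.

```python
def overlap_ratio_old(left_start, left_end, right_start, right_end):
    """Assumed shape of the earlier version: subtract one from the denominator."""
    overlap = min(left_end, right_end) - max(left_start, right_start)
    longer = max(left_end - left_start + 1, right_end - right_start + 1)
    # Note: this divides by zero for single-base records.
    return overlap / (longer - 1)


def overlap_ratio(left_start, left_end, right_start, right_end):
    """Suggested fix: add one to the overlap, since both ends are inclusive."""
    overlap = min(left_end, right_end) - max(left_start, right_start) + 1
    overlap = max(overlap, 0)  # disjoint records share nothing
    longer = max(left_end - left_start + 1, right_end - right_start + 1)
    return overlap / longer


# Identical 10-base records: both versions give 1.0.
full_old = overlap_ratio_old(10, 19, 10, 19)
full_new = overlap_ratio(10, 19, 10, 19)

# Records sharing 5 of 10 bases: the old version gives 4/9, the fix gives 0.5.
partial_old = overlap_ratio_old(10, 19, 15, 24)
partial_new = overlap_ratio(10, 19, 15, 24)
```

The two versions agree only when the records overlap completely, which matches the observation that the earlier score came out slightly low for partial overlaps.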

trtools/utils/mergeutils.py

LiterallyUniqueLogin · 2022-02-24T19:40:38Z

Changes all seem to be going in the right direction!

LiterallyUniqueLogin · 2022-03-21T17:50:35Z

Sorry for letting this hang - life got busy. This looks done! Please add an 'unreleased changes' section to RELEASE_NOTES.rst in the main directory with a bullet or two about what's changed here and I'll merge this into develop.

Kulivox added 10 commits February 13, 2022 15:21

refactored GetMinHarmonizedRecords to GetRecordComparabilityAndIncrement

3a13eaa

added basic overlap handler

92be29f

changed the way overlap handling works

d1f4e5f

Fixed crash that happened if there were no comparable records found during comparison

eb4a8aa

updated comments

0278673

updated tests and fixed things that were revealed by them

f6335e5

added comment to mergeutils tests

d9eb53a

Tests for unhandled exception fix

9823133

Added tests for new changes to how is comparability determined

31931aa

added todo

96d8fc1

Kulivox marked this pull request as ready for review February 15, 2022 17:40

LiterallyUniqueLogin reviewed Feb 16, 2022

trtools/compareSTR/tests/test_compareSTR.py (outdated)

trtools/compareSTR/compareSTR.py (outdated)

Kulivox added 2 commits February 17, 2022 10:14

Added test case

536b470

Removed incomplete overlap handler wrapper

865aa1b

LiterallyUniqueLogin reviewed Feb 23, 2022

trtools/utils/mergeutils.py (outdated)

trtools/compareSTR/tests/test_compareSTR.py

trtools/compareSTR/tests/test_compareSTR.py (outdated)

Kulivox added 6 commits February 24, 2022 13:58

added end position attribute to TRRecord in tr_harmonizer.py

f0c2e3d

removed unused function

922e600

updated comparison handler in compareSTR.py

42dcbab

updated tests mock objects because of change in tr hramonizer

eaf2f95

changed type of end position from float to int in TRRecord

4fefe7a

updated compareSTR tests

f5fc33a

LiterallyUniqueLogin reviewed Feb 24, 2022

Kulivox added 3 commits February 25, 2022 16:12

changed int cast of end positions to round

3462e4c

fixed incorrect implementation of overlap calculation

e4e6dd0

added documentation of new attributes to tr_harmonizer.py

cedd8d5

updated RELEASE_NOTES.rst and comments in compareSTR.py

0310ddd

LiterallyUniqueLogin merged commit 8dd9abf into gymrek-lab:develop Mar 24, 2022
