Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hipstr flanking bp pos fix #147

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
e9e9ac4
added new field to TRRecord which holds position of full alleles
Kulivox Feb 2, 2022
0c7b6e6
HipSTR harmonization sets TRRecord position to position of record wit…
Kulivox Feb 2, 2022
927f1f3
Added correct types to mergeutils.py functions
Kulivox Feb 3, 2022
9ad411a
Added multipledispatch lib to requirements
Kulivox Feb 3, 2022
786bfee
Fixed problems revealed by static type analysis and added functions t…
Kulivox Feb 3, 2022
0a52fd9
Fixed bug in compareSTR.py
Kulivox Feb 3, 2022
a8cf756
Added utility function to tr_harmonizer.py
Kulivox Feb 3, 2022
e2db6cf
Implemented improved record comparison, records are now harmonized be…
Kulivox Feb 3, 2022
d098089
updated RELEASE_NOTES.rst
Kulivox Feb 3, 2022
9cec45a
added utility function to mergeutils.py
Kulivox Feb 4, 2022
1db076d
Removed multiple dispatch and created new function for choosing which…
Kulivox Feb 5, 2022
51401ea
added tests for additions to TRHarmonizer
Kulivox Feb 6, 2022
924092a
removed duplicated function in mergeutils.py
Kulivox Feb 6, 2022
9be775c
added test for getMinHarmonizedRecords from mergeutils
Kulivox Feb 6, 2022
c481ac7
updated compareSTR.py to use harmonized values UpdateComparisonResults
Kulivox Feb 6, 2022
482407b
added tests for comparison of harmonized HipSTR records
Kulivox Feb 6, 2022
6f33b7d
removed incomplete test
Kulivox Feb 7, 2022
e230a56
added docstring for new function in mergeutils.py
Kulivox Feb 7, 2022
a4cf7ce
Updated release notes to better describe changes
Kulivox Feb 7, 2022
43b5aff
Update docs, remove extra copies of VCFs
LiterallyUniqueLogin Feb 8, 2022
a77ed17
More doc cleanup
LiterallyUniqueLogin Feb 8, 2022
9956a7d
added new test cases to test_mergeutils.py
Kulivox Feb 9, 2022
834e689
Added new test cases to test_compareSTR.py
Kulivox Feb 9, 2022
6730e6a
removed HarmonizeRecords function, reordered conditions in main loop …
Kulivox Feb 11, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -116,7 +116,6 @@ Development Notes
-----------------

* TRTools only currently supports diploid genotypes. Haploid calls, such as those on male chrX or chrY, are not yet supported but should be coming soon.
* CompareSTR currently only compares STR loci between two callsets if they have the same start and end coordinates. In the future we will add the capacity to take in a user specificied mapping of loci between the two callsets and use that to compare loci even if they don't completely overlap one another.

Contact Us
----------
Expand Down
10 changes: 10 additions & 0 deletions RELEASE_NOTES.rst
Original file line number Diff line number Diff line change
@@ -1,3 +1,13 @@
Unreleased changes
-----

Bug fixes:

* https://github.com/gymreklab/TRTools/issues/146 fixed record positions being compared twice
* CompareSTR: Decision on which records are comparable is now based on data from harmonized TRRecords,
and not from the records directly from VCF readers. Thanks to this, HipSTR records which have different starting positions,
but position of their repeat is at the same position are compared correctly (harmonization step removes this difference).

4.0.1
-----

Expand Down
2 changes: 1 addition & 1 deletion requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -7,4 +7,4 @@ scikit-learn==0.23.1
matplotlib==3.2.2
pysam==0.16.0.1
cython==0.29.20
cyvcf2==0.30.1
cyvcf2==0.30.1
4 changes: 2 additions & 2 deletions trtools/compareSTR/README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -11,11 +11,11 @@ Example use cases include:

* Comparing calls to a "ground truth" set, e.g. from capillary electrophoresis data, to find call errors
* Comparing calls for the same tool using different parameter settings to identify differences due to bioinformatic processing
* Comparing calls for different tools. This only works if they used the same set of reference TRs. Please note that compareSTR uses matching chromosomes and positions to compare TRs. Therefore, our method is not able to compare TRs if the starting coordinates is changed by TR calling method (e.g., due to phasing with a nearby variant)
* Comparing calls for different tools. This only works if they used the same set of reference TRs. Please note that compareSTR only compares TRs with matching chromosomes and allele-positions (ignoring flanking base pairs)

CompareSTR optionally will stratify results based on a user-specified FORMAT field (e.g. depth, or quality score) and/or by repeat motif length.

Note: CompareSTR is designed to be used as a QC tool. While it may be able to pick up certain biological differences in some applications (e.g. identifying de novo mutations by comparing parent and child callsets or somatic mutations by comparing callsets from different tissues), most such applications would be better performed by specialized tools.
Note: CompareSTR is designed to be used as a QC tool. While it may be able to pick up certain biological differences in some applications (e.g. identifying de novo mutations by comparing parent and child callsets or somatic mutations by comparing callsets from different tissues), use-case specific analyses may be better performed by more specialized tools.

Note: CompareSTR has the ability to stratify comparisons based on quality scores. However, beware that quality scores output by different genotypers may not be directly comparable. You can use `qcSTR <https://trtools.readthedocs.io/en/latest/source/qcSTR.html>`_ to visualize the distribution of quality scores in each VCF file seprately.

Expand Down
37 changes: 20 additions & 17 deletions trtools/compareSTR/compareSTR.py
Original file line number Diff line number Diff line change
Expand Up @@ -521,8 +521,8 @@ def UpdateComparisonResults(record1, record2, sample_idxs,
dictionary of counts to update
"""
# Extract shared info
chrom = record1.vcfrecord.CHROM
pos = record1.vcfrecord.POS
chrom = record1.chrom
pos = record1.pos
period = len(record1.motif)
reflen = len(record1.ref_allele)/period

Expand Down Expand Up @@ -808,29 +808,32 @@ def main(args):
vcfregions = [vcfreaders[0](args.region), vcfreaders[1](args.region)]

### Walk through sorted readers, merging records as we go ###
current_records = [next(reader) for reader in vcfreaders]
is_min = mergeutils.GetMinRecords(current_records, chroms)

current_records = mergeutils.InitReaders(vcfreaders)
done = mergeutils.DoneReading(current_records)
vcf_types = [vcftype1, vcftype2]
num_records = 0
while not done:
if any([item is None for item in current_records]): break
if args.numrecords is not None and num_records >= args.numrecords: break

harmonized_records = [trh.HarmonizeRecord(vcf_types[i], current_records[i]) for i in range(len(current_records))]
# contains information about which record should be
# skipped in next iteration and whether it is currently comparable
is_min = mergeutils.GetMinHarmonizedRecords(harmonized_records, chroms)


if args.verbose: mergeutils.DebugPrintRecordLocations(current_records, is_min)
if mergeutils.CheckMin(is_min): return 1
if all([is_min]):
if (current_records[0].CHROM == current_records[1].CHROM and \
current_records[0].POS == current_records[1].POS):
UpdateComparisonResults(trh.HarmonizeRecord(vcftype1, current_records[0]), \
trh.HarmonizeRecord(vcftype2, current_records[1]), \
sample_idxs,
args.ignore_phasing, args.period,
format_fields, format_bins,
args.stratify_file,
overall_results, locus_results,
sample_results, bubble_results)
if all(is_min):
UpdateComparisonResults(*harmonized_records,
sample_idxs,
args.ignore_phasing, args.period,
format_fields, format_bins,
args.stratify_file,
overall_results, locus_results,
sample_results, bubble_results)

current_records = mergeutils.GetNextRecords(vcfregions, current_records, is_min)
is_min = mergeutils.GetMinRecords(current_records, chroms)
done = mergeutils.DoneReading(current_records)
num_records += 1

Expand Down
26 changes: 26 additions & 0 deletions trtools/compareSTR/tests/test_compareSTR.py
Original file line number Diff line number Diff line change
Expand Up @@ -173,6 +173,32 @@ def test_main(tmpdir, vcfdir):
retcode = main(args)
assert retcode == 1

def test_hipstr_position_harmonisation(tmpdir, vcfdir):
vcfcomp = os.path.join(vcfdir, "compareSTR_vcfs")

hipstr_vcf_1 = os.path.join(vcfcomp, "test_hipstr_flanking_bp_flanking.vcf.gz")
hipstr_vcf_2 = os.path.join(vcfcomp, "test_hipstr_flanking_bp_non_flanking.vcf.gz")

args = base_argparse(tmpdir)
args.vcf1 = hipstr_vcf_1
args.region = ""
args.vcftype1 = 'hipstr'
args.vcf2 = hipstr_vcf_2
args.vcftype2 = 'hipstr'
retcode = main(args)
assert retcode == 0

with open(tmpdir + "/test_compare-locuscompare.tab", "r") as out_overall:
lines = out_overall.readlines()
## vcf1 : flank bp at start of record, vcf2: no flanking bp
assert lines[1] == "1 101675 1.0 1.0 1\n"
## vcf1 : no flanking bp, vcf2: no flanking bp
assert lines[2] == "1 111675 1.0 1.0 1\n"
## vcf1 : flanking bp at the end, vcf2: no flanking bp
assert lines[3] == "1 112655 1.0 1.0 1\n"
## vcf1 : flanking bp at both sides, vcf2: no flanking bp
assert lines[4] == "1 125557 1.0 1.0 1\n"

def test_wrong_vcftype(tmpdir, vcfdir, capsys):
args = base_argparse(tmpdir)
vcfcomp = os.path.join(vcfdir, "compareSTR_vcfs")
Expand Down
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Loading