Large structural variants not being fully captured in report tracks #100

Closed
zbonnstetter opened this issue Dec 20, 2023 · 4 comments

@zbonnstetter

I noticed that the flanking range is applied only around the variant Start position, and not around the End position as well. As a result, particularly large variants get cut off and are not fully captured. The only remedies are setting the flanking range unnecessarily high for every variant in a report, or manually examining every variant and regenerating reports for those that were not fully captured in the tracks.

Here is an example of a deletion that, at the default 1 kb flanking range, is barely captured at all before becoming unintelligible because it extends beyond the flanking range:
Screenshot 2023-12-20 at 10 07 57 AM

I would love to see (or if this functionality exists and I have simply missed it, know about) a way to ensure that every variant, regardless of size, is fully captured by default and that the flanking range applies to both the Start and End of the variant instead of solely the start.
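To make the request concrete, here is a minimal sketch comparing a window built around the Start position only with one that spans Start through End. The function names and the coordinates are hypothetical, chosen for illustration, and the start-only window is just a reading of the behavior described above, not code taken from the tool.

```python
def start_only_window(start, flanking=1000):
    """Window centered on the variant start only (the behavior described above)."""
    return max(1, start - flanking), start + flanking


def full_window(start, end, flanking=1000):
    """Window that applies the flanking range to both the start and the end."""
    return max(1, start - flanking), end + flanking


def is_fully_captured(window, start, end):
    """True if the whole variant lies inside the window."""
    lo, hi = window
    return lo <= start and end <= hi


# Illustrative 5 kb deletion (made-up coordinates)
start, end = 50_000, 55_000
narrow = start_only_window(start)   # (49000, 51000)
wide = full_window(start, end)      # (49000, 56000)

print(is_fully_captured(narrow, start, end))  # False: the deletion runs past the window
print(is_fully_captured(wide, start, end))    # True
```

With a 1 kb flank, any variant longer than 1 kb falls partly outside the start-only window, which matches the truncation in the screenshot.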

zbonnstetter changed the title from "Large structural variants not being fully captured in rep" to "Large structural variants not being fully captured in report tracks" on Dec 20, 2023
@jrobinso
Contributor

Could you clarify what you mean by "End" position with respect to the VCF record? AFAIK this has not been standardized for SVs, although there are some conventions out there. An example VCF record line would help.

Technically, in a VCF record, only the start position is recorded. The end can sometimes be inferred from the alt allele, and sometimes from info tags.

@zbonnstetter
Author

zbonnstetter commented Dec 20, 2023

Thank you for your swift response; sure, I've attached a truncated example VCF record below. As you mentioned, the end position is recorded in the info tags in this instance. If you have any advice on potentially leveraging the presence of an END info tag it would be much appreciated.

chr2 206135424 pbsv.DEL <Deleted_Sequence> C . PASS SVTYPE=DEL;END=206137035;SVLEN=-1611
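As a rough sketch of how an end position could be derived from a record like this one: prefer the END info tag when present, otherwise fall back to SVLEN for non-insertion SVs, and finally to the length of the REF allele. The helper names `parse_info` and `infer_end` are hypothetical, and the fallback order is an assumption, not the tool's actual logic.

```python
def parse_info(info_field):
    """Parse a VCF INFO string such as 'SVTYPE=DEL;END=123' into a dict."""
    info = {}
    for item in info_field.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # flag entries carry no value
    return info


def infer_end(pos, ref, info):
    """Best-effort end position for a VCF record (assumed fallback order)."""
    if "END" in info:
        return int(info["END"])
    if "SVLEN" in info and info.get("SVTYPE") != "INS":
        # SVLEN is negative for deletions; use the first value if several are listed
        return pos + abs(int(info["SVLEN"].split(",")[0]))
    # No SV tags: the variant spans the REF allele
    return pos + len(ref) - 1


line = "chr2 206135424 pbsv.DEL <Deleted_Sequence> C . PASS SVTYPE=DEL;END=206137035;SVLEN=-1611"
chrom, pos, vid, ref, alt, qual, flt, info_field = line.split()
end = infer_end(int(pos), ref, parse_info(info_field))
print(chrom, pos, end)  # chr2 206135424 206137035
```

For this record the END tag and SVLEN agree: 206137035 - 206135424 = 1611.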

@jrobinso
Contributor

jrobinso commented Dec 21, 2023

I'm looking at this now. A caution about "large" structural variants: if they are too large, the resulting report will not be loadable in a web browser, because the sequence will swell the HTML file beyond the loadable limit. It's impossible to say precisely, or even approximately, what the limit is, but it's not infinite. A single variant of 1 Mb is likely to translate to hundreds of kB in file size.

I think we need to set a limit, and switch to a two-locus view if the SV length exceeds that limit.

jrobinso added a commit that referenced this issue Dec 22, 2023
* Compute "end" position for indels
* Recognize CHR2 and END fields when present
* Capture entire SV, using multilocus view for SVs exceeding a threshold length (10kb by default).
@jrobinso
Contributor

This is fixed with release 1.10.0. Also, I have added a new parameter, maxlen, which sets the maximum length of a variant to display in a single view. This defaults to 10000 (10 kb). If a variant exceeds this length, it will be displayed in a split-screen view. This is important to keep the size of the resulting HTML reasonable.
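The threshold behavior can be pictured with the sketch below. It is not the released implementation: the function `loci_for_variant` is hypothetical, and only the `maxlen` default of 10000 and the 1 kb flanking default come from this thread. A variant up to `maxlen` long gets one locus spanning start through end plus flanks; a longer one gets two loci, one around each breakpoint.

```python
def loci_for_variant(chrom, start, end, flanking=1000, maxlen=10000):
    """Return one locus string for short variants, two for variants longer than maxlen."""
    def locus(lo, hi):
        return f"{chrom}:{max(1, lo)}-{hi}"

    if end - start <= maxlen:
        # Single view covering the whole variant plus flanks on both sides
        return [locus(start - flanking, end + flanking)]
    # Split view: one window around the start, one around the end
    return [
        locus(start - flanking, start + flanking),
        locus(end - flanking, end + flanking),
    ]


# The 1,611 bp deletion from the example record fits in a single view
print(loci_for_variant("chr2", 206135424, 206137035))
# ['chr2:206134424-206138035']

# A hypothetical 1 Mb variant is split into two views
print(loci_for_variant("chr1", 1_000_000, 2_000_000))
# ['chr1:999000-1001000', 'chr1:1999000-2001000']
```

Splitting keeps the amount of embedded sequence bounded by the flanking range rather than by the variant length, which is the file-size concern raised earlier in the thread.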
