Rewrite complex SV functional annotation in SVAnnotate #8516
@@ -123,6 +123,10 @@
  * duplicated. The partial duplication occurs when a duplication has one breakpoint within the transcript and one
  * breakpoint after the end of the transcript. When the duplication is in tandem, the result is that there is one
  * intact copy of the full endogenous gene.</p></li>
+ * <li><p><i>PREDICTED_PARTIAL_DISPERSED_DUP</i><br />
+ * Gene(s) which are partially overlapped by an SV's dispersed duplication. This annotation is applied to a
+ * dispersed (non-tandem) duplication segment that is part of a complex SV if the duplicated segment overlaps part
+ * of a transcript but not the entire transcript (which would be a PREDICTED_COPY_GAIN event).</p></li>
  * <li><p><i>PREDICTED_INV_SPAN</i><br />
  * Gene(s) which are entirely spanned by an SV's inversion. A whole-gene inversion occurs when an inversion spans
  * the entire transcript, from the first base of the 5' UTR to the last base of the 3' UTR. </p></li>

@@ -354,6 +358,7 @@ private void addAnnotationInfoKeysToHeader(final VCFHeader header) {
  header.addMetaDataLine(new VCFInfoHeaderLine(GATKSVVCFConstants.NONCODING_SPAN, VCFHeaderLineCount.UNBOUNDED, VCFHeaderLineType.String, "Class(es) of noncoding elements spanned by SV."));
  header.addMetaDataLine(new VCFInfoHeaderLine(GATKSVVCFConstants.NONCODING_BREAKPOINT, VCFHeaderLineCount.UNBOUNDED, VCFHeaderLineType.String, "Class(es) of noncoding elements disrupted by SV breakpoint."));
  header.addMetaDataLine(new VCFInfoHeaderLine(GATKSVVCFConstants.NEAREST_TSS, VCFHeaderLineCount.UNBOUNDED, VCFHeaderLineType.String, "Nearest transcription start site to an intergenic variant."));
+ header.addMetaDataLine(new VCFInfoHeaderLine(GATKSVVCFConstants.PARTIAL_DISPERSED_DUP, VCFHeaderLineCount.UNBOUNDED, VCFHeaderLineType.String, "Gene(s) overlapped partially by a dispersed duplication in a complex SV."));

Comment: Similarly, I might suggest, "Gene(s) overlapped partially by the duplicated interval involved in a dispersed duplication event in a complex SV"

  }

@@ -211,14 +211,20 @@ protected static String annotateDeletion(final SimpleInterval variantInterval,
  * Get consequence of duplication variant on transcript
  * @param variantInterval - SimpleInterval representing structural variant
  * @param gtfTranscript - protein-coding GTF transcript
+ * @param isComplex - boolean: true if SV type is CPX, false otherwise
  * @return - consequence of duplication variant on transcript
  */
  @VisibleForTesting
  protected static String annotateDuplication(final SimpleInterval variantInterval,
- final GencodeGtfTranscriptFeature gtfTranscript) {
+ final GencodeGtfTranscriptFeature gtfTranscript,
+ boolean isComplex) {
  final SimpleInterval transcriptInterval = new SimpleInterval(gtfTranscript);
  if (variantSpansFeature(variantInterval, transcriptInterval)) {
- return GATKSVVCFConstants.COPY_GAIN;
+ return GATKSVVCFConstants.COPY_GAIN; // return CG immediately because same regardless of isDispersed
+ }
+ if (isComplex) {

Comment (suggested change): Just a little clearer

+ // if not CG, overlaps part of gene --> if complex, immediate PARTIAL_DISPERSED_DUP
+ return GATKSVVCFConstants.PARTIAL_DISPERSED_DUP;
  } else if (variantOverlapsTranscriptionStartSite(variantInterval, gtfTranscript)) {
  return GATKSVVCFConstants.TSS_DUP;
  } else if (!transcriptInterval.contains(variantInterval)) {
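The order of checks that `annotateDuplication` now follows can be summarized in a minimal, self-contained sketch. `Interval`, `DupOutcome`, and `DupClassifier` are hypothetical stand-ins for `SimpleInterval`, the `GATKSVVCFConstants` consequence strings, and the real method; the tandem-duplication branches (`TSS_DUP` and the rest) are collapsed into one placeholder outcome because they are not fully shown in this hunk.

```java
// Hypothetical stand-in for GATK's SimpleInterval: 1-based, closed coordinates.
final class Interval {
    final String contig;
    final int start;
    final int end;

    Interval(String contig, int start, int end) {
        if (start > end) {
            throw new IllegalArgumentException("start must be <= end");
        }
        this.contig = contig;
        this.start = start;
        this.end = end;
    }

    /** True if this interval covers every base of {@code other}. */
    boolean spans(Interval other) {
        return contig.equals(other.contig) && start <= other.start && end >= other.end;
    }
}

/** Hypothetical outcome labels; the real code returns GATKSVVCFConstants strings. */
enum DupOutcome { COPY_GAIN, PARTIAL_DISPERSED_DUP, APPLY_TANDEM_RULES }

final class DupClassifier {
    /**
     * Sketch of the check order in the diff. Assumes the caller only passes transcripts
     * that overlap the duplicated interval.
     */
    static DupOutcome classify(Interval dup, Interval transcript, boolean isComplex) {
        // Whole-transcript duplication is a copy gain whether or not the DUP is dispersed.
        if (dup.spans(transcript)) {
            return DupOutcome.COPY_GAIN;
        }
        // Any partial overlap by a DUP segment of a complex (CPX) SV is treated as dispersed.
        if (isComplex) {
            return DupOutcome.PARTIAL_DISPERSED_DUP;
        }
        // Only non-complex DUPs reach the tandem-duplication rules (placeholder here).
        return DupOutcome.APPLY_TANDEM_RULES;
    }
}
```

Because `annotateCopyNumberVariant` passes `false`, CNVs in the diff always reach the tandem rules, which corresponds to the `APPLY_TANDEM_RULES` branch above.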

@@ -276,7 +282,7 @@ protected static String annotateDuplication(final SimpleInterval variantInterval
  protected static String annotateCopyNumberVariant(final SimpleInterval variantInterval,
  final GencodeGtfTranscriptFeature gtfTranscript,
  final Set<String> MSVExonOverlapClassifications) {
- final String consequence = annotateDuplication(variantInterval, gtfTranscript);
+ final String consequence = annotateDuplication(variantInterval, gtfTranscript, false);
  if (MSVExonOverlapClassifications.contains(consequence)) {
  return GATKSVVCFConstants.MSV_EXON_OVERLAP;
  } else {

@@ -338,12 +344,14 @@ protected static String annotateBreakend(final SimpleInterval variantInterval,
  * Add consequence of structural variant on an overlapping transcript to consequence dictionary for variant
  * @param variantInterval - SimpleInterval representing structural variant
  * @param svType - SV type
+ * @param isComplex - boolean: true if SV type is CPX, false if not
  * @param transcript - protein-coding GTF transcript
  * @param variantConsequenceDict - running map of consequence -> feature name for variant to update
  */
  @VisibleForTesting
  protected void annotateTranscript(final SimpleInterval variantInterval,
  final GATKSVVCFConstants.StructuralVariantAnnotationType svType,
+ final boolean isComplex,

Comment: So I am realizing that there is a slight semantic issue with I think the issue stems from the fact that this method is intended for use on an I think at least some of the variable names should change here, It's not hugely a concern since this is intended to be a private method, but this would improve the readability I think.
Comment: I've been thinking about this and I'm not sure of the optimal names here.

  final GencodeGtfTranscriptFeature transcript,
  final Map<String, Set<String>> variantConsequenceDict) {
  final String consequence;

@@ -355,7 +363,7 @@ protected void annotateTranscript(final SimpleInterval variantInterval,
  consequence = annotateInsertion(variantInterval, transcript);
  break;
  case DUP:
- consequence = annotateDuplication(variantInterval, transcript);
+ consequence = annotateDuplication(variantInterval, transcript, isComplex);
  break;
  case CNV:
  consequence = annotateCopyNumberVariant(variantInterval,transcript, MSV_EXON_OVERLAP_CLASSIFICATIONS);

@@ -477,36 +485,88 @@ protected static GATKSVVCFConstants.StructuralVariantAnnotationType getSVType(fi
  * Add protein-coding annotations for any transcripts overlapping the variant to the variant consequence dictionary
  * @param variantInterval - SimpleInterval representing structural variant
  * @param svType - SV type
+ * @param isComplex - boolean: true if SV type is CPX, false otherwise
  * @param variantConsequenceDict - running map of consequence -> feature name for variant to update
  */
  @VisibleForTesting
  protected void annotateGeneOverlaps(final SimpleInterval variantInterval,
  final GATKSVVCFConstants.StructuralVariantAnnotationType svType,
+ final boolean isComplex,
  final Map<String, Set<String>> variantConsequenceDict) {
  final Iterator<SVIntervalTree.Entry<GencodeGtfTranscriptFeature>> gtfTranscriptsForVariant =
  gtfIntervalTrees.getTranscriptIntervalTree().overlappers(
  SVUtils.locatableToSVInterval(variantInterval, sequenceDictionary)
  );
  for (Iterator<SVIntervalTree.Entry<GencodeGtfTranscriptFeature>> it = gtfTranscriptsForVariant; it.hasNext(); ) {
  SVIntervalTree.Entry<GencodeGtfTranscriptFeature> transcriptEntry = it.next();
- annotateTranscript(variantInterval, svType, transcriptEntry.getValue(), variantConsequenceDict);
+ annotateTranscript(variantInterval, svType, isComplex, transcriptEntry.getValue(), variantConsequenceDict);
  }
  }

+ /**
+ * Get section of one interval (primaryInterval) that is not overlapped by the other (secondaryInterval)
+ * @param primaryInterval - SimpleInterval
+ * @param secondaryInterval - SimpleInterval overlapping (but not fully containing) primaryInterval
+ * @return - SimpleInterval representing the portion of primaryInterval not overlapped by secondaryInterval
+ */
+ @VisibleForTesting
+ protected static SimpleInterval getNonOverlappingInterval(final SimpleInterval primaryInterval,

Comment: Why not add this to the
Comment: I started doing this originally but then realized that it's not super generalizable because I made a lot of assumptions about the intervals. I could still add it but would need to enforce the secondary interval overlapping but not containing the primary interval.
Comment: Consider also the name

+ final SimpleInterval secondaryInterval) {
+ if (primaryInterval.getStart() < secondaryInterval.getStart()) {

Comment: Should check that the contigs are the same.

+ return new SimpleInterval(primaryInterval.getContig(), primaryInterval.getStart(), secondaryInterval.getStart());
+ }
+ else {
+ return new SimpleInterval(primaryInterval.getContig(), secondaryInterval.getEnd(), primaryInterval.getEnd());
+ }
+ }
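One way to act on the review notes for `getNonOverlappingInterval` (same-contig check, and enforcing that the secondary interval overlaps but does not contain the primary) is to validate each precondition explicitly. This is a sketch with a hypothetical `Interval` class and helper `IntervalOps.nonOverlapping`, not the PR's code. It differs from the diff in one deliberate way: it excludes the boundary base (`secondary.start - 1` / `secondary.end + 1`), whereas the diff returns `secondaryInterval.getStart()` / `getEnd()` inclusively; keep the inclusive form if the one-base overlap is intended. Note also that, as written, the diff's version returns `[primary start, DUP start]` for a DUP lying entirely to the right of the inversion, which extends rather than trims it; with strict checks, the loop over `dupIntervals` would need to skip DUPs that do not overlap the inversion.

```java
// Hypothetical stand-in for SimpleInterval: 1-based, closed coordinates.
final class Interval {
    final String contig;
    final int start;
    final int end;

    Interval(String contig, int start, int end) {
        if (start > end) {
            throw new IllegalArgumentException("start must be <= end");
        }
        this.contig = contig;
        this.start = start;
        this.end = end;
    }

    boolean overlaps(Interval other) {
        return contig.equals(other.contig) && start <= other.end && other.start <= end;
    }

    boolean contains(Interval other) {
        return contig.equals(other.contig) && start <= other.start && end >= other.end;
    }
}

final class IntervalOps {
    /**
     * Hypothetical helper: the portion of {@code primary} not covered by {@code secondary}.
     * Requires the same contig, an overlap, and {@code secondary} covering exactly one end
     * of {@code primary}, so that the remainder is a single interval.
     */
    static Interval nonOverlapping(Interval primary, Interval secondary) {
        if (!primary.contig.equals(secondary.contig)) {
            throw new IllegalArgumentException("Intervals must be on the same contig");
        }
        if (!primary.overlaps(secondary)) {
            throw new IllegalArgumentException("Intervals must overlap");
        }
        if (secondary.contains(primary)) {
            throw new IllegalArgumentException("Secondary interval must not contain primary interval");
        }
        if (primary.start < secondary.start) {
            // secondary must reach the right end of primary, otherwise two pieces would remain
            if (secondary.end < primary.end) {
                throw new IllegalArgumentException("Secondary interval lies strictly inside primary interval");
            }
            return new Interval(primary.contig, primary.start, secondary.start - 1);
        }
        // secondary covers the left end of primary and, not containing it, stops before primary.end
        return new Interval(primary.contig, secondary.end + 1, primary.end);
    }
}
```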

  /**
- * Parse one interval string from CPX_INTERVALS INFO field into an SVSegment representing the SV type and
- * interval of one of the components of the complex event
- * @param cpxInterval - one element from CPX_INTERVALS list, a string representing one component of complex SV
- * @return - SVSegment representing one component of the complex SV (type and interval)
+ * Parse CPX_INTERVALS field into a list of SV segments for annotation of protein-coding consequences.
+ * Ignore or adjust INV intervals as required by the CPX event type
+ * @param cpxIntervals - list of elements from CPX_INTERVALS field, each describing one segment of a complex SV
+ * @param complexType - Complex SV event type category, from CPX_TYPE field
+ * @return - List of SVSegments representing component of the complex SV (type and interval) to annotate for
+ * protein-coding consequences
  */
  @VisibleForTesting
- protected static SVSegment parseCPXIntervalString(final String cpxInterval) {
- final String[] parsed = cpxInterval.split("_");
- final GATKSVVCFConstants.StructuralVariantAnnotationType svTypeForInterval = GATKSVVCFConstants.StructuralVariantAnnotationType.valueOf(parsed[0]);
- final SimpleInterval interval = new SimpleInterval(parsed[1]);
- return new SVSegment(svTypeForInterval, interval);
+ protected static List<SVSegment> getComplexAnnotationIntervals(final List<String> cpxIntervals,
+ final String complexType) {

Comment: I think it would be more maintainable to have separate parsing and processing methods for this. Define inner classes Doing so is a bit more "brittle," but it is safer to be explicitly checking and representing the input this way. Also it avoids the string matching, which can be prone to bugs.

+ final List<SVSegment> segments = new ArrayList<>(cpxIntervals.size() + 1);
+ final List<SimpleInterval> dupIntervals = new ArrayList<>(cpxIntervals.size());
+ SimpleInterval inversionIntervalToAdjust = null;
+ for (final String cpxInterval : cpxIntervals) {
+ final String[] parsed = cpxInterval.split("_");
+ final GATKSVVCFConstants.StructuralVariantAnnotationType svTypeForInterval = GATKSVVCFConstants.StructuralVariantAnnotationType.valueOf(parsed[0]);
+ final SimpleInterval interval = new SimpleInterval(parsed[1]);
+ if (svTypeForInterval == GATKSVVCFConstants.StructuralVariantAnnotationType.INV) {
+ // ignore INV segment for dDUP_iDEL or INS_iDEL

Comment: Add a quick explanation of why

+ if (complexType.contains("iDEL") || complexType.contains("dDUP")) {
+ continue;

Comment: This may just be a stylistic preference but personally I find a loop that has multiple if/else clauses in it, some of which have a

+ }
+ // save INV interval to adjust later for dupINV / INVdup / dupINVdup / dupINVdel / delINVdup

Comment: Also here

+ else if (complexType.contains("INV") && complexType.contains("dup")) {
+ inversionIntervalToAdjust = new SimpleInterval(interval);

Comment: Do you need to call

+ continue;
+ }
+ }
+ if (svTypeForInterval == GATKSVVCFConstants.StructuralVariantAnnotationType.DUP) {
+ dupIntervals.add(interval);
+ }
+ segments.add(new SVSegment(svTypeForInterval, interval));
+ }
+ // adjust INV interval for dupINV / INVdup / dupINVdup / dupINVdel / delINVdup

Comment: And here

+ if (inversionIntervalToAdjust != null) {
+ SimpleInterval adjustedInversionInterval = inversionIntervalToAdjust;
+ for (final SimpleInterval dupInterval : dupIntervals) {
+ adjustedInversionInterval = getNonOverlappingInterval(adjustedInversionInterval, dupInterval);
+ }
+ segments.add(new SVSegment(GATKSVVCFConstants.StructuralVariantAnnotationType.INV, adjustedInversionInterval));
+ }

+ return segments;
  }
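Along the lines of the review's suggestion to separate parsing from processing, a CPX_INTERVALS element such as `DUP_chr1:100-200` can be parsed once into a typed value with explicit validation. This sketch uses hypothetical `SegmentType` and `CpxSegment` classes (the enum lists only the segment types that appear in this diff). It also shows one hazard in the diff's `cpxInterval.split("_")` followed by `parsed[1]`: if a contig name contains an underscore (GRCh38 has contigs such as `chr1_KI270706v1_random`), `parsed[1]` keeps only the text before the next `_`. Splitting at the first underscore only, as below (or `split("_", 2)`), avoids that.

```java
/** Hypothetical subset of the segment types that can appear in CPX_INTERVALS. */
enum SegmentType { DEL, DUP, INV, INS }

/** Hypothetical parsed form of one CPX_INTERVALS element, e.g. "DUP_chr1:100-200". */
final class CpxSegment {
    final SegmentType type;
    final String contig;
    final int start;
    final int end;

    private CpxSegment(SegmentType type, String contig, int start, int end) {
        this.type = type;
        this.contig = contig;
        this.start = start;
        this.end = end;
    }

    static CpxSegment parse(String cpxInterval) {
        // Split on the FIRST underscore only: the contig name may itself contain '_'.
        final int underscore = cpxInterval.indexOf('_');
        if (underscore < 0) {
            throw new IllegalArgumentException("Expected TYPE_contig:start-end but got: " + cpxInterval);
        }
        // Enum.valueOf throws IllegalArgumentException for an unknown segment type.
        final SegmentType type = SegmentType.valueOf(cpxInterval.substring(0, underscore));
        final String locus = cpxInterval.substring(underscore + 1);
        // Use the last ':' so the start-end part is found even if the contig name contains ':'.
        final int colon = locus.lastIndexOf(':');
        final int dash = colon < 0 ? -1 : locus.indexOf('-', colon + 1);
        if (colon <= 0 || dash < 0) {
            throw new IllegalArgumentException("Malformed locus in CPX_INTERVALS element: " + cpxInterval);
        }
        final int start = Integer.parseInt(locus.substring(colon + 1, dash));
        final int end = Integer.parseInt(locus.substring(dash + 1));
        if (start > end) {
            throw new IllegalArgumentException("Start after end in CPX_INTERVALS element: " + cpxInterval);
        }
        return new CpxSegment(type, locus.substring(0, colon), start, end);
    }
}
```

The INV-dropping and INV-trimming rules would then operate on a `List<CpxSegment>` rather than on raw strings.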

  /**
  * Get SV type to use for annotation for a breakend VCF record
  * Breakend may represent BND, CTX, or DEL / DUP if the user specifies {@code SVAnnotate.MAX_BND_LEN_NAME}

@@ -562,17 +622,15 @@ protected static List<SVSegment> getSVSegments(final VariantContext variant,
  final String chr2 = variant.getAttributeAsString(GATKSVVCFConstants.CONTIG2_ATTRIBUTE, null);
  final int end2 = variant.getAttributeAsInt(GATKSVVCFConstants.END2_ATTRIBUTE, pos);
  if (overallSVType.equals(GATKSVVCFConstants.StructuralVariantAnnotationType.CPX)) {
- final List<String> cpxIntervalsString = variant.getAttributeAsStringList(GATKSVVCFConstants.CPX_INTERVALS, null);
- if (cpxIntervalsString == null) {
+ final List<String> cpxIntervals = variant.getAttributeAsStringList(GATKSVVCFConstants.CPX_INTERVALS, null);
+ if (cpxIntervals == null) {

Comment: Recently I learned that the

  throw new UserException("Complex (CPX) variant must contain CPX_INTERVALS INFO field");
  }
  if (complexType == null) {
  throw new UserException("Complex (CPX) variant must contain CPX_TYPE INFO field");
  }
- intervals = new ArrayList<>(cpxIntervalsString.size() + 1);
- for (final String cpxInterval : cpxIntervalsString) {
- intervals.add(parseCPXIntervalString(cpxInterval));
- }
+ intervals = getComplexAnnotationIntervals(cpxIntervals, complexType);
  // no need to add sink site INS for INS_iDEL because DEL coordinates contain sink site
  if (complexType.contains("dDUP")) {
  intervals.add(new SVSegment(GATKSVVCFConstants.StructuralVariantAnnotationType.INS,
  new SimpleInterval(chrom, pos, pos + 1)));
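Several decisions in this PR hinge on substring checks against CPX_TYPE: `contains("iDEL") || contains("dDUP")` to drop the INV segment, `contains("INV") && contains("dup")` to trim it around DUPs, `contains("dDUP")` to add the sink-site INS, and `contains("del") || contains("DEL")` to merge segments for the nearest-TSS lookup. In the spirit of the review's point about avoiding string matching, the sketch below replaces them with a hypothetical enum `ComplexType`. It lists only the CPX_TYPE values named in the diff's own comments, and each boolean flag reproduces one of those substring rules; `IllegalArgumentException` stands in for GATK's `UserException`.

```java
/**
 * Hypothetical enum for CPX_TYPE values named in the diff's comments. Constant names match
 * the VCF strings so that valueOf works; each flag reproduces one substring check.
 */
enum ComplexType {
    //         ignoreInv, adjustInv, mergeForTss, sinkSiteIns
    dDUP(      true,      false,     false,       true),
    dDUP_iDEL( true,      false,     true,        true),
    INS_iDEL(  true,      false,     true,        false),
    dupINV(    false,     true,      false,       false),
    INVdup(    false,     true,      false,       false),
    dupINVdup( false,     true,      false,       false),
    dupINVdel( false,     true,      true,        false),
    delINVdup( false,     true,      true,        false),
    delINV(    false,     false,     true,        false),
    INVdel(    false,     false,     true,        false),
    delINVdel( false,     false,     true,        false);

    /** Drop the INV segment entirely (diff: contains("iDEL") || contains("dDUP")). */
    final boolean ignoreInversion;
    /** Trim the INV segment around DUP segments (diff: contains("INV") && contains("dup")). */
    final boolean adjustInversionAroundDups;
    /** Merge remaining segments for the nearest-TSS lookup (diff: contains("del") || contains("DEL")). */
    final boolean mergeForNearestTss;
    /** Add an INS segment at the sink site (diff: contains("dDUP")). */
    final boolean addSinkSiteInsertion;

    ComplexType(boolean ignoreInversion, boolean adjustInversionAroundDups,
                boolean mergeForNearestTss, boolean addSinkSiteInsertion) {
        this.ignoreInversion = ignoreInversion;
        this.adjustInversionAroundDups = adjustInversionAroundDups;
        this.mergeForNearestTss = mergeForNearestTss;
        this.addSinkSiteInsertion = addSinkSiteInsertion;
    }

    /** Parse a CPX_TYPE value, failing loudly on anything not listed above. */
    static ComplexType fromVcfValue(String value) {
        if (value == null) {
            throw new IllegalArgumentException("Complex (CPX) variant must contain CPX_TYPE INFO field");
        }
        try {
            return ComplexType.valueOf(value);
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unsupported CPX_TYPE: " + value, e);
        }
    }
}
```

The trade-off is the "brittleness" mentioned in review: any CPX_TYPE not in the list fails immediately instead of silently matching (or missing) a substring, so new event types must be added deliberately.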

@@ -620,7 +678,46 @@ protected static List<SVSegment> getSVSegments(final VariantContext variant,
  return intervals;
  }

+ /**
+ * Update list of SVSegments to use for promoter & noncoding annotations for complex SVs. Removes DUP segments

Comment (suggested change): IMO, "Updates" implies in-place modification of the input list.

+ * which are never tandem in CPX events
+ * @param svSegments - List of SVSegments used for gene overlap annotations
+ * @return - Updated list of SVSegments to use for promoter & noncoding annotations for CPX SVs
+ */
+ @VisibleForTesting
+ protected static List<SVSegment> getSegmentsForNonCodingAnnotations(final List<SVSegment> svSegments) {

Comment: I'd modify the name of this to reinforce that it's meant for CPX events only. Or alternatively, you might consider making a little helper class (
Comment: I updated these functions to run on all SV types instead of just complex, to go with the other change you suggested below to create new lists of segments for each type of annotation.

+ final List<SVSegment> updatedSegments = new ArrayList<>(svSegments.size());
+ for (final SVSegment svSegment : svSegments) {
+ if (svSegment.getIntervalSVType() != GATKSVVCFConstants.StructuralVariantAnnotationType.DUP) {
+ updatedSegments.add(svSegment);
+ }
+ }
+ return updatedSegments;

Comment: Could be done in one line with a stream:

+ }
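The reviewer's stream version is not preserved on this page; a filter along those lines might look like the sketch below, written with hypothetical `SvType` and `Segment` stand-ins for `StructuralVariantAnnotationType` and `SVSegment`. Because `Collectors.toList()` collects into a new list, it also matches the point that the method returns a new list rather than updating its input.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-ins for StructuralVariantAnnotationType and SVSegment. */
enum SvType { DEL, DUP, INV, INS }

final class Segment {
    final SvType type;
    final String interval; // e.g. "chr1:100-200"; a plain string keeps the sketch small

    Segment(SvType type, String interval) {
        this.type = type;
        this.interval = interval;
    }
}

final class SegmentFilters {
    /** Returns a NEW list without DUP segments; the input list is left unchanged. */
    static List<Segment> withoutDuplications(List<Segment> segments) {
        return segments.stream()
                .filter(segment -> segment.type != SvType.DUP)
                .collect(Collectors.toList());
    }
}
```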

+ /**
+ * Update list of SVSegments to use for nearest TSS annotations for complex SVs. DUP segments are already removed.

Comment (suggested change).

+ * Merges remaining intervals (DEL, INV) for deletion-containing CPX events.
+ * @param svSegments - List of SVSegments used for gene overlap annotations
+ * @return - Updated list of SVSegments to use for nearest TSS annotations for CPX SVs
+ */
+ @VisibleForTesting
+ protected static List<SVSegment> getSegmentForNearestTSS(final List<SVSegment> svSegments,
+ final String complexType) {
+ // for dDUP_iDEL, INS_iDEL, delINV, INVdel, dupINVdel, delINVdup, delINVdel --> merge all remaining SV segments
+ // which will be INS, DEL, INV types (DUPs already removed)
+ if (complexType.contains("del") || complexType.contains("DEL")) {
+ SimpleInterval spanningSegment = svSegments.get(0).getInterval();
+ for (int i = 1; i < svSegments.size(); i++) {
+ spanningSegment = spanningSegment.spanWith(svSegments.get(i).getInterval());
+ }
+ return Collections.singletonList(new SVSegment(GATKSVVCFConstants.StructuralVariantAnnotationType.DEL,
+ spanningSegment));
+ } else {
+ // for dDUP, dupINV, INVdup, dupINVdup --> no further modifications (already adjusted INV, removed DUPs)
+ return svSegments;
+ }
+ }
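The merge in `getSegmentForNearestTSS` amounts to taking the smallest start and largest end over the remaining segments and reporting a single DEL segment. The sketch below shows that with hypothetical `SvType`/`Segment` stand-ins; the boolean parameter stands in for the diff's `complexType.contains("del") || complexType.contains("DEL")` check. Unlike the diff, it returns an empty input unchanged instead of calling `get(0)`, and it rejects segments on different contigs explicitly.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-ins for StructuralVariantAnnotationType and SVSegment. */
enum SvType { DEL, DUP, INV, INS }

final class Segment {
    final SvType type;
    final String contig;
    final int start; // 1-based, closed
    final int end;

    Segment(SvType type, String contig, int start, int end) {
        this.type = type;
        this.contig = contig;
        this.start = start;
        this.end = end;
    }
}

final class NearestTssSegments {
    /**
     * For deletion-containing complex types, collapse all remaining segments into one DEL
     * segment spanning them; otherwise return the segments unchanged.
     */
    static List<Segment> forNearestTss(List<Segment> segments, boolean deletionContainingType) {
        if (!deletionContainingType || segments.isEmpty()) {
            return segments;
        }
        final String contig = segments.get(0).contig;
        int start = Integer.MAX_VALUE;
        int end = Integer.MIN_VALUE;
        for (final Segment segment : segments) {
            if (!segment.contig.equals(contig)) {
                throw new IllegalArgumentException("Cannot span segments on different contigs");
            }
            start = Math.min(start, segment.start);
            end = Math.max(end, segment.end);
        }
        return Collections.singletonList(new Segment(SvType.DEL, contig, start, end));
    }
}
```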

  /**
  * Create a copy of the variant consequence dictionary in which the feature names for each consequence are sorted

@@ -649,16 +746,25 @@ protected static Map<String, Object> sortVariantConsequenceDict(final Map<String
  protected Map<String, Object> annotateStructuralVariant(final VariantContext variant) {
  final Map<String, Set<String>> variantConsequenceDict = new HashMap<>();
  final GATKSVVCFConstants.StructuralVariantAnnotationType overallSVType = getSVType(variant);
- final List<SVSegment> svSegments = getSVSegments(variant, overallSVType, maxBreakendLen);
+ final boolean isComplex = overallSVType == GATKSVVCFConstants.StructuralVariantAnnotationType.CPX;
+ final String complexType = variant.getAttributeAsString(GATKSVVCFConstants.CPX_TYPE, null);
+ List<SVSegment> svSegments = getSVSegments(variant, overallSVType, maxBreakendLen);

  // annotate gene overlaps
  if (gtfIntervalTrees != null && gtfIntervalTrees.getTranscriptIntervalTree() != null) {
  for (SVSegment svSegment : svSegments) {
- annotateGeneOverlaps(svSegment.getInterval(), svSegment.getIntervalSVType(), variantConsequenceDict);
+ annotateGeneOverlaps(svSegment.getInterval(), svSegment.getIntervalSVType(), isComplex, variantConsequenceDict);
  }
  }

  // if variant consequence dictionary is empty (no protein-coding annotations), apply INTERGENIC flag
  final boolean noCodingAnnotations = variantConsequenceDict.isEmpty();

+ // for CPX events, update SV segments to annotate promoter & noncoding consequences
+ if (overallSVType == GATKSVVCFConstants.StructuralVariantAnnotationType.CPX) {
+ svSegments = getSegmentsForNonCodingAnnotations(svSegments);

Comment: Rather than changing the value of

+ }

  // then annotate promoter overlaps and non-coding feature overlaps
  if (gtfIntervalTrees != null && gtfIntervalTrees.getPromoterIntervalTree() != null) {
  for (final SVSegment svSegment : svSegments) {

@@ -672,6 +778,11 @@ protected Map<String, Object> annotateStructuralVariant(final VariantContext var
  }
  }

+ // for CPX events, update SV segments to annotate nearest TSS
+ if (overallSVType == GATKSVVCFConstants.StructuralVariantAnnotationType.CPX) {
+ svSegments = getSegmentForNearestTSS(svSegments, complexType);
+ }
+
  // annotate nearest TSS for intergenic variants with no promoter overlaps
  if (gtfIntervalTrees != null && gtfIntervalTrees.getTranscriptionStartSiteTree() != null &&
  !variantConsequenceDict.containsKey(GATKSVVCFConstants.PROMOTER) && noCodingAnnotations) {

Comment: Just to clarify this a little more ("an SV's dispersed duplication" could potentially be the insert interval in the mind of some readers I think), I might suggest the wording, "Gene(s) which are partially overlapped by the duplicated segment involved in an SV's dispersed duplication."