-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
313 Cleanup Extract Cohort params #7293
Conversation
src/main/java/org/broadinstitute/hellbender/tools/gvs/extract/ExtractCohort.java
Outdated
Show resolved
Hide resolved
ExtractCohort already has both of these params (with different input names) accounted for and ExtractFeatures doesn't use these params at all
14df450
to
9b24bf4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one required change - rename onlyAnnotate
@@ -35,7 +35,9 @@ | |||
private static final Logger logger = LogManager.getLogger(ExtractCohort.class); | |||
private ExtractCohortEngine engine; | |||
|
|||
@Argument( | |||
public enum FilteringType { NONE, GENOTYPE, SITES}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does selecting GENOTYPE also imply that we are doing SITE level filtering? So maybe the enum should be {NONE, SITES_ONLY, FULL} (or instead of FULL maybe SITES_AND_GT)?
// set headers | ||
extraHeaderLines.add(FilterSensitivityTools.getTruthSensitivityHeader(truthSensitivitySNPThreshold, vqsLodSNPThreshold, GATKVCFConstants.SNP)); | ||
extraHeaderLines.add(FilterSensitivityTools.getTruthSensitivityHeader(truthSensitivityINDELThreshold, vqsLodINDELThreshold, GATKVCFConstants.INDEL)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not for this PR, but I'm wondering if we should make FilterSensititvityTools an class where we create instances. We would pass it all the params on construction, it would do the validation. It could also add the headers and then we just ask it for the 2 thresholds so all the logic is encapsulated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true---it does seem like a lot of params to be passing around
if (!this.disableGnarlyGenotyper && genotypedVC == null) { | ||
return; | ||
} | ||
final VariantContext genotypedVC = annotatedVC; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could name the variable in the previous line genotypedVC and then get rid of this line
@@ -489,7 +474,7 @@ private VariantContext filterSiteByAlleleSpecificVQSLOD(VariantContext mergedVC, | |||
builder.attribute(GATKVCFConstants.AS_VQS_LOD_KEY, relevantVqsLodMap.values().stream().map(val -> val.equals(Double.NaN) ? VCFConstants.EMPTY_INFO_FIELD : val.toString()).collect(Collectors.toList())); | |||
builder.attribute(GATKVCFConstants.AS_YNG_STATUS_KEY, new ArrayList<>(relevantYngMap.values())); | |||
|
|||
if (!onlyAnnotate) { | |||
if (filteringType.equals(ExtractCohort.FilteringType.SITES)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this finding the variable filteringType? the parameter passed in is onlyAnnotate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooof--nice catch, thank you!
optional = true | ||
) | ||
private boolean performGenotypeVQSLODFiltering = true; | ||
private FilteringType filteringType = FilteringType.NONE; // TODO do we want to make sure this is always / required? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be FilteringType.GENOTYPE by default, and if they don't supply a filter_set we can either throw an error or change the value for this to be NONE (maybe the former)
@@ -460,7 +460,7 @@ private void finalizeCurrentVariant(final List<VariantContext> unmergedCalls, | |||
} | |||
} | |||
|
|||
private VariantContext filterSiteByAlleleSpecificVQSLOD(VariantContext mergedVC, HashMap<Allele, HashMap<Allele, Double>> vqsLodMap, HashMap<Allele, HashMap<Allele, String>> yngMap, boolean onlyAnnotate) { | |||
private VariantContext filterSiteByAlleleSpecificVQSLOD(VariantContext mergedVC, HashMap<Allele, HashMap<Allele, Double>> vqsLodMap, HashMap<Allele, HashMap<Allele, String>> yngMap, ExtractCohort.FilteringType onlyAnnotate) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
onlyAnnotate
was somewhat confusing before -- I think now this could just be flteringType
@@ -160,7 +153,7 @@ public void traverse() { | |||
} | |||
final String rowRestrictionWithFilterSetName = rowRestriction + " AND " + SchemaUtils.FILTER_SET_NAME + " = '" + filterSetName + "'"; | |||
|
|||
boolean noVqslodFilteringRequested = (filterSetInfoTableRef == null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why wasn't this stopping the NPEs before?
this.filterSetInfoTableRef = filterSetInfoTableName == null || "".equals(filterSetInfoTableName) ? null : new TableReference(filterSetInfoTableName, SchemaUtils.YNG_FIELDS);
so we shouldn't be throwing it unless we are somehow setting the filterSetInfoTableName? Does this ternary need parens?
SITES("sites"), | ||
NONE(""); | ||
|
||
String value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can simplify this.. you don't need the value
so the enum could be
public enum VQSLODFilteringType { GENOTYPE, SITES, NONE };
fullName = "vqslod-filter-genotypes", | ||
doc = "Should VQSLOD filtering be applied at the genotype level", | ||
fullName = "vqslod-filter-by-site", | ||
doc = "If VQSLOD filtering is applied, it should be at a site level. Default is false", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice docs
Set<VCFHeaderLine> extraHeaderLines = new HashSet<>(); | ||
|
||
if (filterSetInfoTableName != null) { // filter using vqslod-- default to GENOTYPE unless SITES specifically selected | ||
if (performSiteSpecificVQSLODFiltering) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be a ternary instead
vqslodfilteringType = performSiteSpecificVQSLODFiltering ? VQSLODFilteringType.SITES : VQSLODFilteringType.GENOTYPE;
@@ -122,13 +118,13 @@ public ExtractCohortEngine(final String projectID, | |||
this.traversalIntervals = traversalIntervals; | |||
this.minLocation = minLocation; | |||
this.maxLocation = maxLocation; | |||
this.filterSetInfoTableRef = filterSetInfoTableName == null || "".equals(filterSetInfoTableName) ? null : new TableReference(filterSetInfoTableName, SchemaUtils.YNG_FIELDS); | |||
this.filterSetSiteTableRef = filterSetSiteTableName == null || "".equals(filterSetSiteTableName) ? null : new TableReference(filterSetSiteTableName, SchemaUtils.FILTER_SET_SITE_FIELDS); | |||
this.filterSetInfoTableRef = (filterSetInfoTableName == null || "".equals(filterSetInfoTableName)) ? null : new TableReference(filterSetInfoTableName, SchemaUtils.YNG_FIELDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we just test the enum here?
There are params in Extract Cohort that can be tightened up
Extract Tool has two params that are not used by Extract Features and are already in Extract Cohort
The filter_set_name is used in the BQ filtering tables and looks like we can set it to be completely required for any type of filtering.
There are 3 BQ filter tables -- 2 are needed for filtering (no matter what?) and 1 (tranches) is needed for thresholding and sensitivity calculations?
Genotype level filtering is true by default, but this doesn't seem like it should effect things after this cleanup. Though technically it should only be true if a filter_set_name has been specified. I will add another comment in the body of the code, but I would like to add this safety gate explicitly
Disable gnarly doesn't need to be a passed in param---so we'll rip it out for now
SNP and INDEL truth sensitivity and SNP and INDEL Lod scores are cumbersome to have to worry about passing in, but I dont see a better alternative. Should there be additional validation on these (where if they are specified, but no filter_set_name is, then they throw an error?)