Compute filter scatter [VS-392] #7852

Merged
merged 6 commits into from
May 18, 2022
1 change: 0 additions & 1 deletion scripts/variantstore/TERRA_QUICKSTART.md
@@ -73,7 +73,6 @@ This is done by running the `GvsCreateFilterSet` workflow with the following par
| filter_set_name | a unique name to identify this filter set (e.g. `my_demo_filters`); you will want to make note of this for use in step 5 |
| INDEL_VQSR_max_gaussians_override | you don't need to set this unless a previous run of IndelsVariantRecalibrator task failed to converge, start with 3 and lower as needed |
| project_id | the name of the google project containing the dataset |
| scatter_count | how widely to scatter the task that extracts the features to filter on; 20 is plenty for 10 samples |
| SNP_VQSR_max_gaussians_override | you don't need to set this unless a previous run of SNPsVariantRecalibratorClassic task failed to converge, start with 5 and lower as needed |

## 4. Prepare Callset
6 changes: 5 additions & 1 deletion scripts/variantstore/wdl/GvsCreateFilterSet.wdl
@@ -11,7 +11,6 @@ workflow GvsCreateFilterSet {

String filter_set_name
Array[String] indel_recalibration_annotation_values = ["AS_FS", "AS_ReadPosRankSum", "AS_MQRankSum", "AS_QD", "AS_SOR"]
Int scatter_count
Array[String] snp_recalibration_annotation_values = ["AS_QD", "AS_MQRankSum", "AS_ReadPosRankSum", "AS_FS", "AS_MQ", "AS_SOR"]

File interval_list = "gs://gcp-public-data--broad-references/hg38/v0/wgs_calling_regions.hg38.noCentromeres.noTelomeres.interval_list"
@@ -70,6 +69,11 @@ workflow GvsCreateFilterSet {
project_id = project_id
}

Int scatter_count = if GetNumSamplesLoaded.num_samples < 100 then 20
Collaborator


Suggested change
Int scatter_count = if GetNumSamplesLoaded.num_samples < 100 then 20
Int scatter_count = if GetNumSamplesLoaded.num_samples < 100 then 20
else if GetNumSamplesLoaded.num_samples < 1000 then 100
else if GetNumSamplesLoaded.num_samples < 10000 then 200
else if GetNumSamplesLoaded.num_samples < 100000 then 500
else 1000

Collaborator


Might be a little cleaner?

Author


I went with the formatting that the Winstanley plugin recommended, although I like the way your suggestion looks.

else if GetNumSamplesLoaded.num_samples < 1000 then 100
else if GetNumSamplesLoaded.num_samples < 10000 then 200
else if GetNumSamplesLoaded.num_samples < 100000 then 500 else 1000

call Utils.SplitIntervals {
input:
intervals = interval_list,
1 change: 0 additions & 1 deletion scripts/variantstore/wdl/GvsQuickstartIntegration.wdl
@@ -64,7 +64,6 @@ workflow GvsQuickstartIntegration {
input_vcfs = input_vcfs,
input_vcf_indexes = input_vcf_indexes,
filter_set_name = "quickit",
create_filter_set_scatter_count = 20,
rsasch marked this conversation as resolved.
extract_table_prefix = "quickit",
extract_scatter_count = 100,
# Force filtering off as it is not deterministic and the initial version of this integration test does not
2 changes: 0 additions & 2 deletions scripts/variantstore/wdl/GvsUnified.wdl
@@ -32,7 +32,6 @@ workflow GvsUnified {
# Begin GvsCreateFilterSet
String filter_set_name
Array[String] indel_recalibration_annotation_values = ["AS_FS", "AS_ReadPosRankSum", "AS_MQRankSum", "AS_QD", "AS_SOR"]
Int create_filter_set_scatter_count
Array[String] snp_recalibration_annotation_values = ["AS_QD", "AS_MQRankSum", "AS_ReadPosRankSum", "AS_FS", "AS_MQ", "AS_SOR"]

Int? INDEL_VQSR_max_gaussians_override = 4
@@ -109,7 +108,6 @@ workflow GvsUnified {
project_id = project_id,
filter_set_name = filter_set_name,
indel_recalibration_annotation_values = indel_recalibration_annotation_values,
scatter_count = create_filter_set_scatter_count,
snp_recalibration_annotation_values = snp_recalibration_annotation_values,
interval_list = interval_list,
gatk_override = gatk_override,
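
For readers who want to try the tiered `scatter_count` expression from `GvsCreateFilterSet.wdl` in isolation, here is a minimal sketch. It assumes WDL 1.0; the workflow name `ScatterCountSketch`, the `num_samples` input, and the `chosen_scatter_count` output are hypothetical stand-ins (in the PR the value comes from `GetNumSamplesLoaded.num_samples`), and only the thresholds are taken from the diff above.

```wdl
version 1.0

# Hypothetical standalone workflow; not part of the PR.
workflow ScatterCountSketch {
  input {
    # Stand-in for GetNumSamplesLoaded.num_samples in GvsCreateFilterSet.wdl.
    Int num_samples
  }

  # Same tiers as the PR: more loaded samples means a wider scatter.
  Int scatter_count = if num_samples < 100 then 20
                      else if num_samples < 1000 then 100
                      else if num_samples < 10000 then 200
                      else if num_samples < 100000 then 500
                      else 1000

  output {
    Int chosen_scatter_count = scatter_count
  }
}
```

Under these tiers, 10 samples gives 20 (matching the value the quickstart previously asked users to supply by hand), 250 samples gives 100, and anything at or above 100000 samples gives 1000. Deriving the value inside the workflow is what lets the callers drop their `scatter_count` and `create_filter_set_scatter_count` inputs.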