WDLize GvsPrepareCallset (briefly known as CreateCohortTable) #7200

Merged Apr 14, 2021 (30 commits)
Changes from 2 commits
Commits
a0e4874  first pass at CreateCohortTable.wdl, rename python script (mmorgantaylor, Apr 12, 2021)
e3ad814  update wdl defaults, add this branch to dockstore (mmorgantaylor, Apr 12, 2021)
bd3469e  add inputs file (mmorgantaylor, Apr 12, 2021)
e01c16f  fix indentation on dockstore y(f)ml (mmorgantaylor, Apr 12, 2021)
be53b43  pull script from github branch directly (mmorgantaylor, Apr 13, 2021)
a5dd483  change string reference character (mmorgantaylor, Apr 13, 2021)
8293d0f  revert previous; add missing input (mmorgantaylor, Apr 13, 2021)
9c6d19a  fix attempt for defaults - use new variable names (mmorgantaylor, Apr 13, 2021)
5fffcab  try alternate string reference character again ~ (mmorgantaylor, Apr 13, 2021)
2dfefff  add bigquery installation (mmorgantaylor, Apr 13, 2021)
3d02841  use custom docker, add docker build script (mmorgantaylor, Apr 13, 2021)
4ba7b65  make build_docker script easier to use (mmorgantaylor, Apr 13, 2021)
9c0a154  add entrypoint to dockerfile, run script in app dir (mmorgantaylor, Apr 13, 2021)
11d5c79  try something else (mmorgantaylor, Apr 13, 2021)
433fdfa  remove ls from wdl (mmorgantaylor, Apr 13, 2021)
6d7e113  add SA key file as input, use google base image for gcloud auth support (mmorgantaylor, Apr 13, 2021)
a705592  cleanup, fix docker input (mmorgantaylor, Apr 13, 2021)
bb916f3  functional SA authentication for CreateCohortTable (mmorgantaylor, Apr 13, 2021)
f2122e0  don't copy SA file, localization happens anyway (mmorgantaylor, Apr 14, 2021)
b475a9d  define defaults better (mmorgantaylor, Apr 14, 2021)
0d4bb2f  finish defaults for wdl inputs (mmorgantaylor, Apr 14, 2021)
049909f  use default_dataset, further clean up unneeded inputs (mmorgantaylor, Apr 14, 2021)
0ecb319  update example inputs json (mmorgantaylor, Apr 14, 2021)
018fae3  update inputs with new docker image (mmorgantaylor, Apr 14, 2021)
a21626c  rename create_cohort_data_table.py (mmorgantaylor, Apr 14, 2021)
438b89e  remove redundant google sdk installation from Dockerfile (mmorgantaylor, Apr 14, 2021)
9ffe9ae  update shell script with new python script name (mmorgantaylor, Apr 14, 2021)
2024451  update to support latest docker tag; remove branch from dockstore yml (mmorgantaylor, Apr 14, 2021)
8ea51c8  refactor duplicated config setup (mmorgantaylor, Apr 14, 2021)
2c5d25e  rename wdl GvsPrepareCallset (mmorgantaylor, Apr 14, 2021)
@@ -2,6 +2,6 @@
   "CreateCohortTable.project": "spec-ops-aou",
   "CreateCohortTable.dataset": "gvs_tieout_acmg_v1",

-  "CreateCohortTable.docker": "us.gcr.io/broad-dsde-methods/variantstore:mmt_ngs_cohort_extract_wdl_2020_05_13_gcloud"
+  "CreateCohortTable.docker": "us.gcr.io/broad-dsde-methods/variantstore:mmt_ngs_cohort_extract_wdl_2020_05_13_sa"
 }

71 changes: 35 additions & 36 deletions scripts/variantstore/wdl/CreateCohortTable.wdl
@@ -2,8 +2,17 @@ version 1.0

 workflow CreateCohortTable {
   input {
-    String project
+    String data_project
     String dataset
+    String fq_cohort_sample_table = "${data_project}.${dataset}.sample_info"
+    String fq_sample_mapping_table = "${data_project}.${dataset}.sample_info"
+
+    String destination_cohort_table_name
+
+    # TODO testing remove this ?
+    String? query_project = data_project
+    String destination_project = data_project
+    String destination_dataset = dataset

     String? docker
   }
@@ -13,8 +22,16 @@ workflow CreateCohortTable {

   call CreateCohortTableTask {
     input:
-      project = project,
+      data_project = data_project,
       dataset = dataset,
+      fq_cohort_sample_table = fq_cohort_sample_table,
+      fq_sample_mapping_table = fq_sample_mapping_table,
+
+      destination_cohort_table_name = destination_cohort_table_name,
+
+      query_project = query_project,
+      destination_project = destination_project,
+      destination_dataset = destination_dataset,

       docker = docker_final
   }
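
The workflow inputs above derive their defaults from `data_project` and `dataset`. A minimal Python sketch of that resolution, for illustration only: the function name `resolve_inputs` is hypothetical and not part of the PR (the real logic lives in the WDL input declarations), and `my_cohort` and `other-project` are made-up values.

```python
def resolve_inputs(data_project, dataset, destination_cohort_table_name, **overrides):
    """Model the default expressions declared in the workflow's input block.

    Hypothetical helper for illustration; the PR expresses this in WDL.
    """
    sample_info = f"{data_project}.{dataset}.sample_info"
    resolved = {
        "data_project": data_project,
        "dataset": dataset,
        "destination_cohort_table_name": destination_cohort_table_name,
        # Both table inputs default to the same sample_info table in the diff.
        "fq_cohort_sample_table": sample_info,
        "fq_sample_mapping_table": sample_info,
        "query_project": data_project,
        "destination_project": data_project,
        "destination_dataset": dataset,
    }
    unknown = set(overrides) - set(resolved)
    if unknown:
        raise TypeError(f"unknown inputs: {sorted(unknown)}")
    # Explicitly supplied inputs win over the derived defaults.
    resolved.update(overrides)
    return resolved


if __name__ == "__main__":
    inputs = resolve_inputs(
        "spec-ops-aou",
        "gvs_tieout_acmg_v1",
        "my_cohort",  # made-up table name
        destination_project="other-project",  # made-up override
    )
    for name, value in inputs.items():
        print(f"{name} = {value}")
```

Only `data_project`, `dataset`, and `destination_cohort_table_name` have no default in the diff, which is why they are the only required arguments in this sketch.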
@@ -28,52 +45,34 @@ task CreateCohortTableTask {
   }

   input {
-    String project
+    String data_project
     String dataset
+    String fq_cohort_sample_table
+    String fq_sample_mapping_table

-    String? query_project
-    String? destination_project
-    String? destination_dataset
+    String destination_cohort_table_name

-    String? destination_cohort_table_name
-    String? fq_cohort_sample_table
-    String? fq_sample_mapping_table
+    # TODO testing remove this ?
+    String? query_project
+    String destination_project
+    String destination_dataset

     File? service_account_json

Review comment (Contributor): Is this still defined as a File even if it's just a path to a JSON in GCP?

Reply (Member Author): As Kris pointed out, since it's a File, it will localize, so we can just pass it straight into Python. Tested!

     String docker
   }

-  #### set defaults ####
-  String query_project_final = if defined(query_project) then "${query_project}" else "${project}"
-  String destination_project_final = if defined(destination_project) then "${destination_project}" else "${project}"
-  String destination_dataset_final = if defined(destination_dataset) then "${destination_dataset}" else "${dataset}"
-
-  String destination_cohort_table_name_final = if defined(destination_cohort_table_name) then "${destination_cohort_table_name}" else "exported_cohort_all_samples"
-  String fq_cohort_sample_table_final = if defined(fq_cohort_sample_table) then "${fq_cohort_sample_table}" else "${project}.${dataset}.sample_info"
-  String fq_sample_mapping_table_final = if defined(fq_sample_mapping_table) then "${fq_sample_mapping_table}" else "${project}.${dataset}.sample_info"
-
-  String has_service_account_file = if (defined(service_account_json)) then 'true' else 'false'
-
   command <<<
     set -e

-    if [ ~{has_service_account_file} = 'true' ]; then
-      SA_FILENAME="sa_key.json"
-      gsutil cp "~{service_account_json}" $SA_FILENAME
-      SA_ARGS="--sa_key_path ${SA_FILENAME}"
-    else
-      SA_ARGS=""
-    fi
-
     python3 /app/create_cohort_data_table.py \
-      --fq_petvet_dataset ~{project}.~{dataset} \
-      --fq_temp_table_dataset ~{destination_project_final}.temp_tables \
-      --fq_destination_dataset ~{destination_project_final}.~{destination_dataset_final} \
-      --destination_table ~{destination_cohort_table_name_final} \
-      --fq_cohort_sample_names ~{fq_cohort_sample_table_final} \
-      --query_project ~{query_project_final} \
-      --fq_sample_mapping_table ~{fq_sample_mapping_table_final} \
-      $SA_ARGS
+      --fq_petvet_dataset ~{data_project}.~{dataset} \
+      --fq_temp_table_dataset ~{destination_project}.temp_tables \
+      --fq_destination_dataset ~{destination_project}.~{destination_dataset} \
+      --destination_table ~{destination_cohort_table_name} \
+      --fq_cohort_sample_names ~{fq_cohort_sample_table} \
+      --query_project ~{query_project} \
+      --fq_sample_mapping_table ~{fq_sample_mapping_table} \
+      ~{"--sa_key_path " + service_account_json}
   >>>

   runtime {
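
The review thread and the final command line rely on two WDL behaviors: a `File` input is localized before the command runs, so the placeholder expands to a local path, and a placeholder of the form `~{"--sa_key_path " + service_account_json}` expands to nothing when the optional input is unset. A minimal Python sketch of the resulting command line, for illustration only: `optional_flag` and `build_command` are hypothetical names, `my_cohort` is a made-up table name, and `/local/sa_key.json` stands in for whatever path the execution engine localizes the key to.

```python
import shlex


def optional_flag(flag, value):
    """Return [flag, value] when value is set, else an empty list.

    Mirrors how ~{"--sa_key_path " + service_account_json} expands to
    nothing when the optional WDL input is undefined.
    """
    return [] if value is None else [flag, str(value)]


def build_command(data_project, dataset, destination_project, destination_dataset,
                  destination_table, fq_cohort_sample_table, fq_sample_mapping_table,
                  query_project, service_account_json=None):
    """Assemble the argument list the task's command block would produce.

    Assumes query_project is set, as it is when the workflow default applies.
    """
    argv = [
        "python3", "/app/create_cohort_data_table.py",
        "--fq_petvet_dataset", f"{data_project}.{dataset}",
        "--fq_temp_table_dataset", f"{destination_project}.temp_tables",
        "--fq_destination_dataset", f"{destination_project}.{destination_dataset}",
        "--destination_table", destination_table,
        "--fq_cohort_sample_names", fq_cohort_sample_table,
        "--query_project", query_project,
        "--fq_sample_mapping_table", fq_sample_mapping_table,
    ]
    return argv + optional_flag("--sa_key_path", service_account_json)


if __name__ == "__main__":
    common = dict(
        data_project="spec-ops-aou",
        dataset="gvs_tieout_acmg_v1",
        destination_project="spec-ops-aou",
        destination_dataset="gvs_tieout_acmg_v1",
        destination_table="my_cohort",  # made-up table name
        fq_cohort_sample_table="spec-ops-aou.gvs_tieout_acmg_v1.sample_info",
        fq_sample_mapping_table="spec-ops-aou.gvs_tieout_acmg_v1.sample_info",
        query_project="spec-ops-aou",
    )
    # Without a key file the flag is omitted entirely.
    print(shlex.join(build_command(**common)))
    # With a key file the localized path is passed straight through.
    print(shlex.join(build_command(**common, service_account_json="/local/sa_key.json")))
```

This is the simplification the diff makes: the earlier `gsutil cp` step and the `SA_ARGS` shell variable are no longer needed once the optional flag is built inside the placeholder.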