WDLize GvsPrepareCallset (briefly known as CreateCohortTable) #7200

Merged 30 commits on Apr 14, 2021
Changes from 16 commits
a0e4874
first pass at CreateCohortTable.wdl, rename python script
mmorgantaylor Apr 12, 2021
e3ad814
update wdl defaults, add this branch to dockstore
mmorgantaylor Apr 12, 2021
bd3469e
add inputs file
mmorgantaylor Apr 12, 2021
e01c16f
fix indentation on dockstore y(f)ml
mmorgantaylor Apr 12, 2021
be53b43
pull script from github branch directly
mmorgantaylor Apr 13, 2021
a5dd483
change string reference character
mmorgantaylor Apr 13, 2021
8293d0f
revert previous; add missing input
mmorgantaylor Apr 13, 2021
9c6d19a
fix attempt for defaults - use new variable names
mmorgantaylor Apr 13, 2021
5fffcab
try alternate string reference character again ~
mmorgantaylor Apr 13, 2021
2dfefff
add bigquery installation
mmorgantaylor Apr 13, 2021
3d02841
use custom docker, add docker build script
mmorgantaylor Apr 13, 2021
4ba7b65
make build_docker script easier to use
mmorgantaylor Apr 13, 2021
9c0a154
add entrypoint to dockerfile, run script in app dir
mmorgantaylor Apr 13, 2021
11d5c79
try something else
mmorgantaylor Apr 13, 2021
433fdfa
remove ls from wdl
mmorgantaylor Apr 13, 2021
6d7e113
add SA key file as input, use google base image for gcloud auth support
mmorgantaylor Apr 13, 2021
a705592
cleanup, fix docker input
mmorgantaylor Apr 13, 2021
bb916f3
functional SA authentication for CreateCohortTable
mmorgantaylor Apr 13, 2021
f2122e0
don't copy SA file, localization happens anyway
mmorgantaylor Apr 14, 2021
b475a9d
define defaults better
mmorgantaylor Apr 14, 2021
0d4bb2f
finish defaults for wdl inputs
mmorgantaylor Apr 14, 2021
049909f
use default_dataset, further clean up unneeded inputs
mmorgantaylor Apr 14, 2021
0ecb319
update example inputs json
mmorgantaylor Apr 14, 2021
018fae3
update inputs with new docker image
mmorgantaylor Apr 14, 2021
a21626c
rename create_cohort_data_table.py
mmorgantaylor Apr 14, 2021
438b89e
remove redundant google sdk installation from Dockerfile
mmorgantaylor Apr 14, 2021
9ffe9ae
update shell script with new python script name
mmorgantaylor Apr 14, 2021
2024451
update to support latest docker tag; remove branch from dockstore yml
mmorgantaylor Apr 14, 2021
8ea51c8
refactor duplicated config setup
mmorgantaylor Apr 14, 2021
2c5d25e
rename wdl GvsPrepareCallset
mmorgantaylor Apr 14, 2021
10 changes: 10 additions & 0 deletions .dockstore.yml
@@ -48,6 +48,16 @@ workflows:
    filters:
      branches:
        - master
  - name: CreateCohortTable
    subclass: WDL
    primaryDescriptorPath: /scripts/variantstore/wdl/CreateCohortTable.wdl
    testParameterFiles:
      - /scripts/variantstore/wdl/CreateCohortTable.example.inputs.json
    filters:
      branches:
        - master
        - ah_var_store
        - mmt_ngs_cohort_extract_wdl
Review comment (Member, Author): remove before merging

  - name: ImportGenomes
    subclass: WDL
    primaryDescriptorPath: /scripts/variantstore/wdl/ImportGenomes.wdl
@@ -0,0 +1,7 @@
{
  "CreateCohortTable.project": "spec-ops-aou",
  "CreateCohortTable.dataset": "gvs_tieout_acmg_v1",

  "CreateCohortTable.docker": "us.gcr.io/broad-dsde-methods/broad-gatk-snapshots:varstore_cb56620f1db171d3f1c682e150e6aeb0cef64a83_mmt_ngs_cohort_extract_wdl_2020_04_12"
}
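
Each key in the example inputs is prefixed with the workflow name, so the keys have to change whenever the workflow is renamed. As a rough sketch only (the helper `make_inputs` is hypothetical and not part of this PR), an inputs file of this shape could be generated rather than hand-edited:

```python
import json


def make_inputs(workflow_name, project, dataset, docker=None):
    """Build a WDL inputs dict whose keys are prefixed with the workflow name.

    Hypothetical helper for illustration. `docker` is optional, mirroring
    the `String? docker` workflow input, and is omitted when not given.
    """
    inputs = {
        f"{workflow_name}.project": project,
        f"{workflow_name}.dataset": dataset,
    }
    if docker is not None:
        inputs[f"{workflow_name}.docker"] = docker
    return inputs


# Values taken from the example inputs file in this PR.
example = make_inputs("CreateCohortTable", "spec-ops-aou", "gvs_tieout_acmg_v1")
print(json.dumps(example, indent=2))
```

Passing a different `workflow_name` regenerates the same inputs under the new prefix.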

86 changes: 86 additions & 0 deletions scripts/variantstore/wdl/CreateCohortTable.wdl
@@ -0,0 +1,86 @@
version 1.0

workflow CreateCohortTable {
  input {
    String project
    String dataset

    String? docker
  }

  # TODO update this docker source
  String docker_final = select_first([docker, "us.gcr.io/broad-dsde-methods/variantstore:latest"])

  call CreateCohortTableTask {
    input:
      project = project,
      dataset = dataset,

      docker = docker_final
  }

}

task CreateCohortTableTask {
  # indicates that this task should NOT be call cached
  meta {
    volatile: true
  }

  input {
    String project
    String dataset

    String? query_project
    String? destination_project
    String? destination_dataset

    String? destination_cohort_table_name
    String? fq_cohort_sample_table
    String? fq_sample_mapping_table

    File? service_account_json
Review comment (Contributor): Is this still defined as a File even if it's just a path to a JSON in GCP?

Reply (Member, Author): As Kris pointed out, since it's a File, it'll localize, so we can just pass it straight into python. Tested!

    String docker
  }

  #### set defaults ####
  String query_project_final = if defined(query_project) then "${query_project}" else "${project}"
  String destination_project_final = if defined(destination_project) then "${destination_project}" else "${project}"
  String destination_dataset_final = if defined(destination_dataset) then "${destination_dataset}" else "${dataset}"

  String destination_cohort_table_name_final = if defined(destination_cohort_table_name) then "${destination_cohort_table_name}" else "exported_cohort_all_samples"
  String fq_cohort_sample_table_final = if defined(fq_cohort_sample_table) then "${fq_cohort_sample_table}" else "${project}.${dataset}.sample_info"
  String fq_sample_mapping_table_final = if defined(fq_sample_mapping_table) then "${fq_sample_mapping_table}" else "${project}.${dataset}.sample_info"

  String has_service_account_file = if (defined(service_account_json)) then 'true' else 'false'

  command <<<
    set -e

    if [ ~{has_service_account_file} = 'true' ]; then
      gcloud auth activate-service-account --key-file='~{service_account_json}'
    fi

    python3 /app/create_cohort_data_table.py \
      --fq_petvet_dataset ~{project}.~{dataset} \
      --fq_temp_table_dataset ~{destination_project_final}.temp_tables \
      --fq_destination_dataset ~{destination_project_final}.~{destination_dataset_final} \
      --destination_table ~{destination_cohort_table_name_final} \
      --fq_cohort_sample_names ~{fq_cohort_sample_table_final} \
      --query_project ~{query_project_final} \
      --fq_sample_mapping_table ~{fq_sample_mapping_table_final}
  >>>

  runtime {
    docker: docker
    memory: "10 GB"
    disks: "local-disk 100 HDD"
    bootDiskSizeGb: 15
    preemptible: 0
    cpu: 1
  }

}
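
To make the `*_final` defaulting in the task easier to follow, here is a rough Python sketch of the same resolution logic and the resulting argument list. The function names (`pick`, `resolve_defaults`, `build_command`) are hypothetical and only for illustration. The flags and default values are copied from the task above. The service-account activation step is left out.

```python
def pick(value, default):
    """Return `value` if it was supplied (not None), else `default`.

    Mirrors `if defined(x) then x else default` in the WDL.
    """
    return value if value is not None else default


def resolve_defaults(project, dataset, query_project=None,
                     destination_project=None, destination_dataset=None,
                     destination_cohort_table_name=None,
                     fq_cohort_sample_table=None,
                     fq_sample_mapping_table=None):
    """Compute the equivalents of the task's `*_final` declarations."""
    sample_info = f"{project}.{dataset}.sample_info"
    return {
        "query_project": pick(query_project, project),
        "destination_project": pick(destination_project, project),
        "destination_dataset": pick(destination_dataset, dataset),
        "destination_cohort_table_name": pick(
            destination_cohort_table_name, "exported_cohort_all_samples"),
        "fq_cohort_sample_table": pick(fq_cohort_sample_table, sample_info),
        "fq_sample_mapping_table": pick(fq_sample_mapping_table, sample_info),
    }


def build_command(project, dataset, **optional):
    """Assemble the argv that the task's command block would run."""
    r = resolve_defaults(project, dataset, **optional)
    return [
        "python3", "/app/create_cohort_data_table.py",
        "--fq_petvet_dataset", f"{project}.{dataset}",
        "--fq_temp_table_dataset", f"{r['destination_project']}.temp_tables",
        "--fq_destination_dataset",
        f"{r['destination_project']}.{r['destination_dataset']}",
        "--destination_table", r["destination_cohort_table_name"],
        "--fq_cohort_sample_names", r["fq_cohort_sample_table"],
        "--query_project", r["query_project"],
        "--fq_sample_mapping_table", r["fq_sample_mapping_table"],
    ]


# With only project and dataset set, every optional input falls back to its default.
print(" ".join(build_command("spec-ops-aou", "gvs_tieout_acmg_v1")))
```

Note that the temp table dataset follows `destination_project`, not `project`, so overriding the destination project also moves the temp tables.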



6 changes: 4 additions & 2 deletions scripts/variantstore/wdl/extract/Dockerfile
@@ -1,14 +1,16 @@
- FROM python:3.7
+ FROM gcr.io/google.com/cloudsdktool/cloud-sdk:305.0.0
+ # FROM python:3.7

# Copy the application's requirements.txt and run pip to install
ADD requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt

# Add the application source code.
ADD raw_array_cohort_extract.py /app
ADD ngs_cohort_extract.py /app
ADD create_cohort_data_table.py /app

# install google SDK
RUN curl -sSL https://sdk.cloud.google.com | bash

WORKDIR /app
ENTRYPOINT ["/bin/bash"]
14 changes: 14 additions & 0 deletions scripts/variantstore/wdl/extract/build_docker.sh
@@ -0,0 +1,14 @@
if [ $# -lt 1 ]; then
  echo "USAGE: ./build_docker.sh [DOCKER_TAG_STRING]"
Review comment (Contributor): ❤️

  echo " e.g.: ./build_docker.sh mybranch_2021_04_03"
  exit 1
fi

INFO=$1
GCR_TAG="us.gcr.io/broad-dsde-methods/variantstore:${INFO}"

docker build . -t broad-dsde-methods/variantstore:${INFO}
docker tag broad-dsde-methods/variantstore:${INFO} ${GCR_TAG}
docker push ${GCR_TAG}

echo "docker image pushed to \"${GCR_TAG}\""
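
For reference, here is a small Python sketch of the three docker invocations the script issues for a given tag string. The function name `docker_commands` is hypothetical, and nothing is executed: the sketch only builds the argument lists, so the local and GCR tag naming can be inspected without a docker daemon.

```python
def docker_commands(info):
    """Return the build/tag/push argv lists that build_docker.sh runs for `info`."""
    local_tag = f"broad-dsde-methods/variantstore:{info}"
    gcr_tag = f"us.gcr.io/broad-dsde-methods/variantstore:{info}"
    return [
        ["docker", "build", ".", "-t", local_tag],
        ["docker", "tag", local_tag, gcr_tag],
        ["docker", "push", gcr_tag],
    ]


# Tag string taken from the script's own usage example.
for argv in docker_commands("mybranch_2021_04_03"):
    print(" ".join(argv))
```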
2 changes: 1 addition & 1 deletion scripts/variantstore/wdl/extract/run_gvs_tieout_extract.sh
@@ -1,7 +1,7 @@
PROJECT="spec-ops-aou"
DATASET="gvs_tieout_acmg_v1"

- python ngs_cohort_extract.py \
+ python create_cohort_data_table.py \
--fq_petvet_dataset ${PROJECT}.${DATASET} \
--fq_temp_table_dataset ${PROJECT}.temp_tables \
--fq_destination_dataset ${PROJECT}.${DATASET} \