Merge pull request #274 from PGScatalog/dev
* Check for _ in sampleset names

* fix samplesheet path to point to VCF

* drop vcf suffix

* update tests with removed vcf suffix

* include inputs when relabelling (geno and sample files are unchanged)

* add more tests for results structure

* Expose documentation about switching versions.

* add cloud / JSON samplesheet docs

* add multiple chromosomes example

* add links to JSON samplesheet

* explicitly set default results to $PWD/results

this change affects people running the workflow directly from
github, e.g.

$ nextflow run pgscatalog/pgsc_calc ...

if --outdir isn't set, then the results folder can be in $NXF_HOME,
which is a hidden folder in the home directory by default. not a
helpful place for results to be!

this doesn't affect people running from a cloned repo directly

* Fix typo in output.rst

* Add in documentation about popsimilarity file.

* migrate to pygscatalog utilities (#296)

* add correlation test

* add correlation action

* fix download URL

* use scoring files from correlation archive

* get test profile working with pygscatalog

* integration updates

* fix correlation scorefile wildcard

* fix tests

* update plink2

* gzip afreq in plink2_vcf

* update custom scoring files for liftover

* fix match module test

* use local files in test suite

* fix singularity container definition

* check for environment variables with set -euxo

* logs are massive, don't upload, debug locally

* Improve pca (#267)

* Output allele frequencies along with missingness (for filtering variants)

* Add afreq to output

* Add afreq to intersect_variants.nf

* add afreq to intersect_thinned

* intersect with new pgscatalog-intersect application

* rebase

* Make verbose

* Remove duplication

* Use new output of intersect_variants in filtering

* Use new output of intersect_variants in intersect_variants.nf : keeps memory footprint very low (but higher I/O into tempfiles)

* Fix column index to PCA_ELIGIBLE (13)

* Fix awk statement that doesn't work with odd carriage return?

* Fix awk statement for True/False (not 0/1 as in previous version)

* Add in variant-based filters

---------

Co-authored-by: Benjamin Wingfield <[email protected]>

* remove duplicate container definition (pygscatalog)

* fix duplicate freq flags

* bump workflow version

* don't upload output directory in ancestry tests

* add docker uid runOption to test config

* just use working directory as tmpdir

* drop deprecated docker.userEmulation

* update upload-artifact to v4

* fix join failure caused by wrong meta in afreq output (VCF)

* Superseded by pgscatalog-intersect

* Update pgscatalog_utils conda environment

* use stable container tags

* bump pgscatalog.core version

---------

Co-authored-by: Benjamin Wingfield <[email protected]>
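
The --outdir default described in the commit message can be sketched in shell. This only illustrates the fallback behaviour (use the caller's value, otherwise $PWD/results); it is not the workflow's actual Nextflow configuration, and resolve_outdir is a hypothetical helper name.

```shell
# Hypothetical helper: mirrors the documented default of $PWD/results.
resolve_outdir() {
    # Use the caller's value if one was given, otherwise fall back to $PWD/results.
    printf '%s\n' "${1:-$PWD/results}"
}

# Example invocation (not run here; other arguments are elided as in the
# commit message):
#   nextflow run pgscatalog/pgsc_calc ... --outdir "$(resolve_outdir)"
```

Passing --outdir explicitly keeps results out of $NXF_HOME whichever workflow version is in use.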
nebfield authored May 24, 2024
2 parents 8bdf287 + db50fdc commit 1321c1a
Showing 80 changed files with 2,455 additions and 485 deletions.
7 changes: 3 additions & 4 deletions .github/workflows/ancestry-conda.yml
@@ -64,11 +64,10 @@ jobs:

       - name: Upload logs on failure
         if: failure()
-        uses: actions/upload-artifact@v3
+        uses: actions/upload-artifact@v4
         with:
-          name: logs-conda-ancestry
+          name: logs-singularity-ancestry
           path: |
             /home/runner/pytest_workflow_*/*/.nextflow.log
             /home/runner/pytest_workflow_*/*/log.out
-            /home/runner/pytest_workflow_*/*/log.err
-            /home/runner/pytest_workflow_*/*/output/*
+            /home/runner/pytest_workflow_*/*/log.err
8 changes: 3 additions & 5 deletions .github/workflows/ancestry-vcf.yml
@@ -77,14 +77,13 @@ jobs:

       - name: Upload logs on failure
         if: failure()
-        uses: actions/upload-artifact@v3
+        uses: actions/upload-artifact@v4
         with:
           name: logs-singularity-ancestry
           path: |
             /home/runner/pytest_workflow_*/*/.nextflow.log
             /home/runner/pytest_workflow_*/*/log.out
             /home/runner/pytest_workflow_*/*/log.err
-            /home/runner/pytest_workflow_*/*/output/*

   singularity:
     if: ${{ inputs.singularity }}
@@ -150,11 +149,10 @@ jobs:

       - name: Upload logs on failure
         if: failure()
-        uses: actions/upload-artifact@v3
+        uses: actions/upload-artifact@v4
         with:
           name: logs-singularity-ancestry
           path: |
             /home/runner/pytest_workflow_*/*/.nextflow.log
             /home/runner/pytest_workflow_*/*/log.out
-            /home/runner/pytest_workflow_*/*/log.err
-            /home/runner/pytest_workflow_*/*/output/*
+            /home/runner/pytest_workflow_*/*/log.err
10 changes: 4 additions & 6 deletions .github/workflows/ancestry.yml
@@ -71,14 +71,13 @@ jobs:

       - name: Upload logs on failure
         if: failure()
-        uses: actions/upload-artifact@v3
+        uses: actions/upload-artifact@v4
         with:
           name: logs-singularity-ancestry
           path: |
             /home/runner/pytest_workflow_*/*/.nextflow.log
             /home/runner/pytest_workflow_*/*/log.out
             /home/runner/pytest_workflow_*/*/log.err
-            /home/runner/pytest_workflow_*/*/output/*

   singularity:
     if: ${{ inputs.singularity }}
@@ -137,14 +136,13 @@ jobs:
         run: TMPDIR=~ PROFILE=singularity pytest --kwdof --symlink --git-aware --wt 2 --tag "ancestry" --ignore tests/bin
         env:
           TMPDIR: ${{ runner.temp }}
-
+
       - name: Upload logs on failure
         if: failure()
-        uses: actions/upload-artifact@v3
+        uses: actions/upload-artifact@v4
         with:
           name: logs-singularity-ancestry
           path: |
             /home/runner/pytest_workflow_*/*/.nextflow.log
             /home/runner/pytest_workflow_*/*/log.out
-            /home/runner/pytest_workflow_*/*/log.err
-            /home/runner/pytest_workflow_*/*/output/*
+            /home/runner/pytest_workflow_*/*/log.err
1 change: 0 additions & 1 deletion .github/workflows/ci.yml
@@ -9,7 +9,6 @@ on:
     branches:
       - dev
       - main
-      - fix_vcf
   release:
     types: [published]

35 changes: 35 additions & 0 deletions .github/workflows/correlation-test.yml
@@ -0,0 +1,35 @@
+name: Correlation test
+on:
+  push:
+    branches:
+      - correlation
+      - ci
+  workflow_dispatch:
+  release:
+    types: [published]
+
+jobs:
+  preload_correlation:
+    uses: ./.github/workflows/preload-correlation.yml
+
+  preload_docker:
+    uses: ./.github/workflows/preload-docker.yml
+
+  preload_singularity:
+    uses: ./.github/workflows/preload-singularity.yml
+
+  correlation_docker:
+    needs: [preload_docker, preload_correlation]
+    uses: ./.github/workflows/correlation.yml
+    with:
+      container-cache-key: ${{ needs.preload_docker.outputs.cache-key }}
+      correlation-cache-key: ${{ needs.preload_correlation.outputs.cache-key }}
+      docker: true
+
+  correlation_singularity:
+    needs: [preload_singularity, preload_correlation]
+    uses: ./.github/workflows/correlation.yml
+    with:
+      container-cache-key: ${{ needs.preload_singularity.outputs.cache-key }}
+      correlation-cache-key: ${{ needs.preload_correlation.outputs.cache-key }}
+      singularity: true
158 changes: 158 additions & 0 deletions .github/workflows/correlation.yml
@@ -0,0 +1,158 @@
+name: Run correlation test with singularity or docker profiles
+
+on:
+  workflow_call:
+    inputs:
+      container-cache-key:
+        type: string
+        required: true
+      correlation-cache-key:
+        type: string
+        required: true
+      docker:
+        type: boolean
+      singularity:
+        type: boolean
+
+env:
+  NXF_SINGULARITY_CACHEDIR: ${{ github.workspace }}/singularity
+  SINGULARITY_VERSION: 3.8.3
+
+jobs:
+  docker:
+    if: ${{ inputs.docker }}
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Set environment variables
+        run: |
+          echo "CORRELATION_DIR=$RUNNER_TEMP" >> $GITHUB_ENV
+
+      - name: Check out pipeline code
+        uses: actions/checkout@v4
+
+      - uses: nf-core/setup-nextflow@v2
+
+      - name: Restore docker images
+        id: restore-docker
+        uses: actions/cache/restore@v4
+        with:
+          path: ${{ runner.temp }}/docker
+          key: ${{ inputs.container-cache-key }}
+          fail-on-cache-miss: true
+
+      - name: Load docker images from cache
+        run: |
+          find $HOME -name '*.tar'
+          find ${{ runner.temp }}/docker/ -name '*.tar' -exec sh -c 'docker load < {}' \;
+
+      - name: Restore reference data
+        uses: actions/cache/restore@v4
+        with:
+          path: |
+            ${{ env.CORRELATION_DIR }}/correlation37.pgen
+            ${{ env.CORRELATION_DIR }}/correlation37.psam
+            ${{ env.CORRELATION_DIR }}/correlation37.pvar.zst
+            ${{ env.CORRELATION_DIR }}/PGS000018_hmPOS_GRCh37.txt.gz
+            ${{ env.CORRELATION_DIR }}/PGS000027_hmPOS_GRCh37.txt.gz
+            ${{ env.CORRELATION_DIR }}/PGS000137_hmPOS_GRCh37.txt.gz
+            ${{ env.CORRELATION_DIR }}/PGS000727_hmPOS_GRCh37.txt.gz
+            ${{ env.CORRELATION_DIR }}/PGS000728_hmPOS_GRCh37.txt.gz
+            ${{ env.CORRELATION_DIR }}/PGS000729_hmPOS_GRCh37.txt.gz
+          key: ${{ inputs.correlation-cache-key }}
+          fail-on-cache-miss: true
+
+      - name: Set up test requirements
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.10'
+          cache: 'pip'
+
+      - run: pip install -r ${{ github.workspace }}/tests/requirements.txt
+
+      - name: Run correlation test
+        run: TMPDIR=~ PROFILE=docker pytest --kwdof --symlink --git-aware --wt 2 --tag "test score correlation"
+
+      - name: Upload logs on failure
+        if: failure()
+        uses: actions/upload-artifact@v4
+        with:
+          name: logs-singularity-ancestry
+          path: |
+            /home/runner/pytest_workflow_*/*/.nextflow.log
+            /home/runner/pytest_workflow_*/*/log.out
+            /home/runner/pytest_workflow_*/*/log.err
+            /home/runner/pytest_workflow_*/*/output/*
+
+  singularity:
+    if: ${{ inputs.singularity }}
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Set environment variables
+        run: |
+          echo "CORRELATION_DIR=$RUNNER_TEMP" >> $GITHUB_ENV
+
+      - name: Check out pipeline code
+        uses: actions/checkout@v4
+
+      - uses: nf-core/setup-nextflow@v2
+
+      - name: Restore singularity setup
+        id: restore-singularity-setup
+        uses: actions/cache@v4
+        with:
+          path: /opt/hostedtoolcache/singularity/${{ env.SINGULARITY_VERSION }}/x64
+          key: ${{ runner.os }}-singularity-${{ env.SINGULARITY_VERSION }}
+          fail-on-cache-miss: true
+
+      - name: Add singularity to path
+        run: |
+          echo "/opt/hostedtoolcache/singularity/${{ env.SINGULARITY_VERSION }}/x64/bin" >> $GITHUB_PATH
+
+      - name: Restore singularity container images
+        id: restore-singularity
+        uses: actions/cache@v4
+        with:
+          path: ${{ env.NXF_SINGULARITY_CACHEDIR }}
+          key: ${{ inputs.container-cache-key }}
+
+      - name: Restore reference data
+        uses: actions/cache/restore@v4
+        with:
+          path: |
+            ${{ env.CORRELATION_DIR }}/correlation37.pgen
+            ${{ env.CORRELATION_DIR }}/correlation37.psam
+            ${{ env.CORRELATION_DIR }}/correlation37.pvar.zst
+            ${{ env.CORRELATION_DIR }}/PGS000018_hmPOS_GRCh37.txt.gz
+            ${{ env.CORRELATION_DIR }}/PGS000027_hmPOS_GRCh37.txt.gz
+            ${{ env.CORRELATION_DIR }}/PGS000137_hmPOS_GRCh37.txt.gz
+            ${{ env.CORRELATION_DIR }}/PGS000727_hmPOS_GRCh37.txt.gz
+            ${{ env.CORRELATION_DIR }}/PGS000728_hmPOS_GRCh37.txt.gz
+            ${{ env.CORRELATION_DIR }}/PGS000729_hmPOS_GRCh37.txt.gz
+          key: ${{ inputs.correlation-cache-key }}
+          fail-on-cache-miss: true
+
+      - name: Set up test requirements
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.10'
+          cache: 'pip'
+
+      - run: pip install -r ${{ github.workspace }}/tests/requirements.txt
+
+      - name: Run correlation test
+        run: TMPDIR=~ PROFILE=singularity pytest --kwdof --symlink --git-aware --wt 2 --tag "test score correlation"
+        env:
+          TMPDIR: ${{ runner.temp }}
+
+      - name: Upload logs on failure
+        if: failure()
+        uses: actions/upload-artifact@v4
+        with:
+          name: logs-singularity-ancestry
+          path: |
+            /home/runner/pytest_workflow_*/*/.nextflow.log
+            /home/runner/pytest_workflow_*/*/log.out
+            /home/runner/pytest_workflow_*/*/log.err
+            /home/runner/pytest_workflow_*/*/output/*
4 changes: 2 additions & 2 deletions .github/workflows/module.yml
@@ -56,7 +56,7 @@ jobs:

       - name: Upload logs on failure
         if: failure()
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
         with:
           name: logs-docker-${{ inputs.tag }}
           path: |
@@ -110,7 +110,7 @@ jobs:

       - name: Upload logs on failure
         if: failure()
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
         with:
           name: logs-singularity-${{ inputs.tag }}
           path: |
38 changes: 38 additions & 0 deletions .github/workflows/preload-correlation.yml
@@ -0,0 +1,38 @@
+name: Preload correlation data
+
+on:
+  workflow_call:
+    outputs:
+      cache-key:
+        value: correlation
+
+jobs:
+  preload_correlation:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Set environment variables
+        run: |
+          echo "CORRELATION_DIR=$RUNNER_TEMP" >> $GITHUB_ENV
+
+      - name: Cache reference data
+        id: cache-ref
+        uses: actions/cache@v4
+        with:
+          path: |
+            ${{ env.CORRELATION_DIR }}/correlation37.pgen
+            ${{ env.CORRELATION_DIR }}/correlation37.psam
+            ${{ env.CORRELATION_DIR }}/correlation37.pvar.zst
+            ${{ env.CORRELATION_DIR }}/PGS000018_hmPOS_GRCh37.txt.gz
+            ${{ env.CORRELATION_DIR }}/PGS000027_hmPOS_GRCh37.txt.gz
+            ${{ env.CORRELATION_DIR }}/PGS000137_hmPOS_GRCh37.txt.gz
+            ${{ env.CORRELATION_DIR }}/PGS000727_hmPOS_GRCh37.txt.gz
+            ${{ env.CORRELATION_DIR }}/PGS000728_hmPOS_GRCh37.txt.gz
+            ${{ env.CORRELATION_DIR }}/PGS000729_hmPOS_GRCh37.txt.gz
+          key: correlation
+
+      - name: Download reference data
+        if: steps.cache-ref.outputs.cache-hit != 'true'
+        run: |
+          wget -qnc -P $CORRELATION_DIR https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/correlation.tar.zst
+          tar -xf $CORRELATION_DIR/correlation.tar.zst -C $CORRELATION_DIR
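
The cache-then-download pattern in preload-correlation.yml can be approximated locally with a small shell sketch. It assumes the same file names and archive URL as the workflow above. The function names are hypothetical, and the file check merely stands in for the cache-hit test that actions/cache performs in CI.

```shell
# Files the workflow expects after unpacking (names taken from the cache paths above).
CORRELATION_FILES="correlation37.pgen correlation37.psam correlation37.pvar.zst
PGS000018_hmPOS_GRCh37.txt.gz PGS000027_hmPOS_GRCh37.txt.gz PGS000137_hmPOS_GRCh37.txt.gz
PGS000727_hmPOS_GRCh37.txt.gz PGS000728_hmPOS_GRCh37.txt.gz PGS000729_hmPOS_GRCh37.txt.gz"

# Hypothetical helper: succeeds only if every expected file already exists in the directory.
correlation_data_present() {
    dir=$1
    for f in $CORRELATION_FILES; do
        [ -f "$dir/$f" ] || return 1
    done
}

# Hypothetical helper: download and unpack the archive only when files are missing,
# which is roughly what the "cache-hit != 'true'" condition achieves in the workflow.
fetch_correlation_data() {
    dir=$1
    if correlation_data_present "$dir"; then
        echo "cached"
    else
        wget -qnc -P "$dir" https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/correlation.tar.zst
        tar -xf "$dir/correlation.tar.zst" -C "$dir"
    fi
}
```

Calling fetch_correlation_data "$RUNNER_TEMP" (or any scratch directory) twice should only hit the network the first time.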
1 change: 1 addition & 0 deletions .gitignore
@@ -10,3 +10,4 @@ testing*
 assets/report/renv/
 assets/report/report.Rproj
 .Rprofile
+tests/.venv/
1 change: 1 addition & 0 deletions assets/examples/scorefiles/customgrch37.txt
@@ -1,4 +1,5 @@
 #pgs_name=testlift
+#pgs_id=custom
 #trait_reported=test
 #genome_build=GRCh37
 chr_name chr_position effect_allele other_allele effect_weight
1 change: 1 addition & 0 deletions assets/examples/scorefiles/customgrch38.txt
@@ -1,4 +1,5 @@
 #pgs_name=test
+#pgs_id=custom
 #trait_reported=test
 #genome_build=GRCh38
 chr_name chr_position effect_allele other_allele effect_weight
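
Both example scoring files carry their metadata as "#key=value" header lines. As a rough illustration of how such a header can be read, the sketch below pulls a single field out with awk. scorefile_header is a hypothetical helper, not part of pgsc_calc or the pgscatalog utilities.

```shell
# Hypothetical helper: print the value of one "#key=value" header field.
# Usage: scorefile_header KEY FILE
scorefile_header() {
    key=$1
    file=$2
    # Match lines that start with "#KEY=" and print everything after the "=".
    awk -v key="#$key=" 'index($0, key) == 1 { print substr($0, length(key) + 1); exit }' "$file"
}
```

For example, scorefile_header genome_build assets/examples/scorefiles/customgrch38.txt would print the build recorded in that file's header; a missing field prints nothing.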