docs(preprocessor): descriptions of --absent-to-ref and --unspecified-to-ref
BinglanLi committed Oct 17, 2024
1 parent f302af6 commit f9adf8e
Showing 2 changed files with 18 additions and 9 deletions.
9 changes: 6 additions & 3 deletions docs/using/Running-PharmCAT-Pipeline.md
@@ -38,7 +38,8 @@ Standard use case:

```
usage: pharmcat_pipeline [-s <samples> | -S <txt_file>]
[-0] [-G] [-R] [-refRegion <bed_file>]
[--absent-to-ref] [--unspecified-to-ref] [-G]
[-R] [-refRegion <bed_file>]
[-matcher] [-ma] [-matcherHtml] [-research <type>]
[-phenotyper]
[-reporter] [-rs <sources>] [-re] [-reporterJson]
@@ -64,8 +65,10 @@ Input arguments:
Only applicable if you have multiple samples and only want to work on specific ones.
Preprocessor arguments:
-0, --missing-to-ref
Assume genotypes at missing PGx sites are 0/0. DANGEROUS!
--absent-to-ref
Assume genotypes at absent PGx sites are 0/0. DANGEROUS!
--unspecified-to-ref
Assume unspecified genotypes (./.) are 0/0 when every sample is './.'. DANGEROUS!
-G, --no-gvcf-check
Bypass the gVCF check for the input VCF.
-R, --retain-specific-regions
18 changes: 12 additions & 6 deletions docs/using/VCF-Preprocessor.md
@@ -107,11 +107,17 @@ VCF files can have more than 1 sample and should be [bgzip](http://www.htslib.or
-ss <span class="altArg"><br />or --single-sample</span>
: Generate 1 VCF file per sample.

-0 <span class="altArg"><br />or --missing-to-ref</span>
: This option will add missing PGx positions to the output. Missing PGx positions are those whose genotypes are all missing "./." in every single sample.
* This option will not convert "./." to "0/0" if any other sample has non-missing genotype at this position as these missing calls are likely missing for good reasons.
* This **SHOULD ONLY BE USED** if you are sure your data is reference at the missing positions
instead of unreadable/uncallable at those positions. Running PharmCAT with positions as missing vs reference can lead to different results.

--absent-to-ref
: This option will add absent PGx positions into the output as homozygous reference.
* This **SHOULD ONLY BE USED** if you are sure your data is reference at the absent positions
instead of unreadable/uncallable.
* Running PharmCAT with positions as absent vs reference can lead to different results.
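
As an illustration of what `--absent-to-ref` describes, the following is a minimal sketch, not the preprocessor's actual code. It assumes genotypes are held in a plain dictionary keyed by position; the names `add_absent_as_ref`, `observed`, and `pgx_positions` are hypothetical.

```python
from typing import Dict, List, Tuple

Position = Tuple[str, int]              # (chromosome, 1-based position)
Genotypes = Dict[Position, List[str]]   # one genotype string per sample


def add_absent_as_ref(observed: Genotypes,
                      pgx_positions: List[Position],
                      n_samples: int) -> Genotypes:
    """Return a copy of `observed` in which every PGx position that has no
    record at all is added as homozygous reference ("0/0") for each sample.

    Hypothetical helper for illustration only.
    """
    result = {pos: list(gts) for pos, gts in observed.items()}
    for pos in pgx_positions:
        if pos not in result:
            # The position is absent from the input: assume reference.
            result[pos] = ["0/0"] * n_samples
    return result


observed = {("chr1", 100): ["0/1", "1/1"]}
pgx = [("chr1", 100), ("chr1", 200)]
filled = add_absent_as_ref(observed, pgx, n_samples=2)
```

Here `("chr1", 200)` has no record in the input, so it is filled in as `0/0` for both samples, while the existing record at `("chr1", 100)` is left untouched.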

--unspecified-to-ref
: This option will convert unspecified PGx positions to homozygous reference. Unspecified PGx positions are those whose genotypes are unspecified (`./.`) in every single sample.
* This option will not convert `./.` to `0/0` when any sample has a specified genotype at that PGx position, as these `./.` calls are likely left unspecified for good reasons.
* Running PharmCAT with positions as unspecified vs reference can lead to different results.
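
The all-samples rule for `--unspecified-to-ref` can be sketched as follows. This is an illustrative assumption about the described behavior, not the preprocessor's implementation; `unspecified_to_ref` is a hypothetical name, and only the `./.` encoding mentioned above is handled.

```python
from typing import List


def unspecified_to_ref(genotypes: List[str]) -> List[str]:
    """Convert a position's genotypes to "0/0" only when every sample is "./.".

    Hypothetical helper for illustration only. If at least one sample has a
    specified genotype, the "./." calls are kept as they are.
    """
    if genotypes and all(gt == "./." for gt in genotypes):
        return ["0/0"] * len(genotypes)
    return list(genotypes)


all_unspecified = unspecified_to_ref(["./.", "./."])
partly_specified = unspecified_to_ref(["./.", "0/1"])
```

With these inputs, `all_unspecified` becomes `["0/0", "0/0"]`, whereas `partly_specified` stays `["./.", "0/1"]` because one sample carries a specified call.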

-c <span class="altArg"><br />or --concurrent-mode</span>
: Enable concurrent mode. This defaults to using one less than the number of CPU cores available.
@@ -173,7 +179,7 @@ All preprocessor output files will use the base filename of the input file unles

If there are multiple samples, and the `-ss` flag is provided, the preprocessor will produce one PharmCAT-ready VCF file per sample. The output files are named `<base_filename>.<sample_id>.preprocessed.vcf`

If there are missing PGx positions, it will also produce a report named `<base_filename>.missing_pgx_var.vcf`. This file only reports positions that are missing in _all_ samples. If `-0`/`--missing-to-ref` is turned on, you can use this report to trace positions whose genotypes are missing in all samples (`./.`) in the original input but have now been added into the output VCF(s) as reference (`0/0`).
The preprocessor will produce a report named `<base_filename>.missing_pgx_var.vcf` when there are absent PGx positions or alleles. This file only reports positions that are absent or unspecified in _all_ samples. The report is based on the input VCF and is not affected by `--unspecified-to-ref` or `--absent-to-ref`.
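
A rough sketch of which positions such a report would list, assuming the input genotypes are available as a dictionary keyed by position. The function names are hypothetical and this is not the preprocessor's code; only the report filename pattern comes from the text above.

```python
from typing import Dict, List, Tuple

Position = Tuple[str, int]  # (chromosome, 1-based position)


def missing_report_positions(input_genotypes: Dict[Position, List[str]],
                             pgx_positions: List[Position]) -> List[Position]:
    """List PGx positions that are absent from the input, or unspecified
    ("./.") in all samples. Works on the input only, so it does not depend
    on whether any *-to-ref option is turned on.
    """
    reported = []
    for pos in pgx_positions:
        gts = input_genotypes.get(pos)
        if gts is None or all(gt == "./." for gt in gts):
            reported.append(pos)
    return reported


def missing_report_name(base_filename: str) -> str:
    """Build the report filename from the documented pattern."""
    return f"{base_filename}.missing_pgx_var.vcf"


input_genotypes = {
    ("chr1", 100): ["0/1", "./."],   # specified in one sample: not reported
    ("chr1", 200): ["./.", "./."],   # unspecified in all samples: reported
}
pgx = [("chr1", 100), ("chr1", 200), ("chr1", 300)]  # chr1:300 is absent
reported = missing_report_positions(input_genotypes, pgx)
report_name = missing_report_name("cohort")
```

In this example the report would cover `("chr1", 200)` and `("chr1", 300)`, and the file would be named `cohort.missing_pgx_var.vcf`.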


## Tutorial