diff --git a/docs/using/Running-PharmCAT-Pipeline.md b/docs/using/Running-PharmCAT-Pipeline.md
index 00b68d63..c7e48f14 100644
--- a/docs/using/Running-PharmCAT-Pipeline.md
+++ b/docs/using/Running-PharmCAT-Pipeline.md
@@ -38,7 +38,8 @@ Standard use case:
 
 ```
 usage: pharmcat_pipeline [-s | -S ]
-                         [-0] [-G] [-R] [-refRegion ]
+                         [--absent-to-ref] [-unspecified-to-ref] [-G]
+                         [-R] [-refRegion ]
                          [-matcher] [-ma] [-matcherHtml] [-research ]
                          [-phenotyper]
                          [-reporter] [-rs ] [-re] [-reporterJson]
@@ -64,8 +65,10 @@ Input arguments:
         Only applicable if you have multiple samples and only want to work on specific ones.
 
 Preprocessor arguments:
-  -0, --missing-to-ref
-        Assume genotypes at missing PGx sites are 0/0. DANGEROUS!
+  --absent-to-ref
+        Assume genotypes at absent PGx sites are 0/0. DANGEROUS!
+  -unspecified-to-ref
+        Assume unspecified genotypes ./. as 0/0 when every sample is './.'. DANGEROUS!
   -G, --no-gvcf-check
         Bypass the gVCF check for the input VCF.
   -R, --retain-specific-regions
diff --git a/docs/using/VCF-Preprocessor.md b/docs/using/VCF-Preprocessor.md
index 70ba89f7..1a5dc316 100644
--- a/docs/using/VCF-Preprocessor.md
+++ b/docs/using/VCF-Preprocessor.md
@@ -107,11 +107,17 @@ VCF files can have more than 1 sample and should be [bgzip](http://www.htslib.or
 
 -ss or --single-sample
 : Generate 1 VCF file per sample.
--0 or --missing-to-ref
-: This option will add missing PGx positions to the output. Missing PGx positions are those whose genotypes are all missing "./." in every single sample.
-    * This option will not convert "./." to "0/0" if any other sample has non-missing genotype at this position as these missing calls are likely missing for good reasons.
-    * This **SHOULD ONLY BE USED** if you are sure your data is reference at the missing positions
-      instead of unreadable/uncallable at those positions. Running PharmCAT with positions as missing vs reference can lead to different results.
+
+--absent-to-ref
+: This option will add absent PGx positions into the output as homozygous reference.
+    * This **SHOULD ONLY BE USED** if you are sure your data is reference at the absent positions
+      instead of unreadable/uncallable.
+    * Running PharmCAT with positions as absent vs reference can lead to different results.
+
+--unspecified-to-ref
+: This option will convert unspecified PGx position to homozygous reference. Unspecified PGx positions are those whose genotypes are unspecified "./." in every single sample.
+    * This option will not convert "./." to "0/0" when there is a specified genotype at a PGx position as these `./.` calls are likely left unspecified for good reasons.
+    * Running PharmCAT with positions as unspecified vs reference can lead to different results.
 
 -c or --concurrent-mode
 : Enable concurrent mode. This defaults to using one less than the number of CPU cores available.
@@ -173,7 +179,7 @@ All preprocessor output files will use the base filename of the input file unles
 
 If there are multiple samples, and the `-ss` flag is provided, the preprocessor will produce one PharmCAT-ready VCF file per sample. The output files are named `..preprocessed.vcf`
 
-If there are missing PGx positions, it will also produce a report named `.missing_pgx_var.vcf`. This file only reports positions that are missing in _all_ samples. If `-0`/`--missing-to-ref` is turned on, you can use this report to trace positions whose genotypes are missing in all samples (`./.`) in the original input but have now been added into the output VCF(s) as reference (`0/0`).
+The preprocessor will produce a report named `.missing_pgx_var.vcf` when there are absent PGx positions or alleles. This file only reports positions that are absent or unspecified in _all_ samples. The report is based on the input VCF and is not affected by `--unspecified-to-ref` or `--absent-to-ref`.
 
 ## Tutorial
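
For reviewers, the difference between the two new flags, as the updated docs word it, can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the preprocessor's actual code. The function name `apply_ref_flags`, the dict-of-lists genotype layout, and the position keys are all hypothetical. It also assumes unphased diploid calls written exactly as `./.` and `0/0`.

```python
from typing import Dict, List

REF = "0/0"          # homozygous reference call
UNSPECIFIED = "./."  # genotype present in the VCF but not called


def apply_ref_flags(
    genotypes: Dict[str, List[str]],
    pgx_positions: List[str],
    n_samples: int,
    absent_to_ref: bool = False,
    unspecified_to_ref: bool = False,
) -> Dict[str, List[str]]:
    """Hypothetical sketch: return per-position calls after applying the flags.

    genotypes maps a position key to one call per sample; a PGx position
    with no entry in the mapping is treated as "absent" from the input.
    """
    out: Dict[str, List[str]] = {}
    for pos in pgx_positions:
        calls = genotypes.get(pos)
        if calls is None:
            # Absent position: only added (as reference) with --absent-to-ref.
            if absent_to_ref:
                out[pos] = [REF] * n_samples
            continue
        if unspecified_to_ref and all(c == UNSPECIFIED for c in calls):
            # Every sample is './.': converted with --unspecified-to-ref.
            out[pos] = [REF] * len(calls)
        else:
            # At least one specified genotype: './.' calls are left alone.
            out[pos] = list(calls)
    return out


example = {
    "chr1:100": ["./.", "./."],  # unspecified in every sample
    "chr1:200": ["./.", "0/1"],  # one sample has a specified genotype
}
positions = ["chr1:100", "chr1:200", "chr1:300"]  # chr1:300 is absent
result = apply_ref_flags(example, positions, n_samples=2,
                         absent_to_ref=True, unspecified_to_ref=True)
```

With both flags on, `chr1:100` and `chr1:300` become `0/0` in both samples, while `chr1:200` keeps its original calls because one sample has a specified genotype. With both flags off, `chr1:300` stays out of the output and `chr1:100` remains `./.`.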