Hello,

It's definitely best to calculate scores on cohorts in version […]

A 50% match rate isn't good, but it's a normal problem when working with WGS data. We assume VCF inputs to this workflow have been processed by an imputation server like Michigan or TOPMed. The workflow has difficulties with WGS VCFs because variants that are homozygous REF are treated as missing data, which lowers the match rate.

Here's an explanation of match rates: #86 (comment)

Another user working with WGS gVCFs described some extra steps they had to take to process the VCF before using our workflow:

I hope I've explained well enough 😄 Please let me know if you have any more questions.
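To make the match-rate point concrete, here is a minimal Python sketch. It is not the workflow's actual matching code, only an illustration under stated assumptions: the variant tuples, the `match_rate` helper, and the `MIN_OVERLAP` value are all hypothetical. It shows how a single-sample WGS VCF, which typically records only sites where the sample carries an ALT allele, can end up matching far fewer scoring-file variants than a cohort or imputed VCF that lists every site.

```python
# Illustrative only: names, variants, and the threshold are hypothetical,
# not taken from the workflow's source code.

def match_rate(scoring_variants, target_variants):
    """Return the fraction of scoring-file variants present in the target genotypes."""
    scoring = set(scoring_variants)
    if not scoring:
        raise ValueError("scoring file contains no variants")
    return len(scoring & set(target_variants)) / len(scoring)


# Hypothetical scoring file: (chromosome, position, ref, alt)
scoring_file = [
    ("1", 1000, "A", "G"),
    ("1", 2000, "C", "T"),
    ("2", 1500, "G", "A"),
    ("2", 3000, "T", "C"),
]

# Single-sample WGS VCF: homozygous REF sites are simply absent from the file,
# so they look like missing data to a matcher.
single_sample_vcf = [
    ("1", 1000, "A", "G"),
    ("2", 3000, "T", "C"),
]

# Cohort or imputed VCF: every site is listed, including those where
# some samples are homozygous REF.
cohort_vcf = list(scoring_file)

MIN_OVERLAP = 0.75  # hypothetical threshold, for illustration only

for label, target in [("single sample", single_sample_vcf), ("cohort", cohort_vcf)]:
    rate = match_rate(scoring_file, target)
    status = "ok" if rate >= MIN_OVERLAP else "below threshold"
    print(f"{label}: match rate {rate:.0%} ({status})")
```

In this toy case the single-sample file matches half of the scoring variants even though the sample's genotype is known at every site; the unmatched sites are homozygous REF rather than truly missing.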
Hi,

Thank you for developing such a great Nextflow pipeline. I tested it and it runs well on the test dataset. Now I want to apply it to my own sample cohort. My genome files are originally separate VCF files, and each VCF file contains WGS data for only one individual. I tried running one file (converted to plink2 format) with a sample sheet, and it ran successfully, but I got a message that the sample size is too small.

My first question: should I always use pgsc_calc to calculate PRS scores on a cohort rather than per sample? If so, should I always merge the samples into multi-sample VCF files (or another supported format), so that the sample sheet has only one line (assuming I just want to use all of the chromosomes)?

My second question: when I calculated PRS scores from a single sample's genotypes, I always got errors saying that variant matching was below the threshold (around 50% matched). I worked around the error by resetting the threshold, but I wonder whether that is normal.

Thank you in advance. I look forward to your reply.