snp2pop

SNP2pop maps population origin from cancer SNP profiles to one of 5 continental groups or 26 population groups, as defined in the 1000 Genomes Project.

AFR (Africa)	EUR (Europe)	AMR (Admixed America)	EAS (East Asia)	SAS (South Asia)
ACB (African Caribbeans in Barbados)	CEU (Utah Residents (CEPH) with Northern and Western European Ancestry)	CLM (Colombians from Medellin, Colombia)	CDX (Chinese Dai in Xishuangbanna, China)	BEB (Bengali from Bangladesh)
ASW (Americans of African Ancestry in SW USA)	FIN (Finnish in Finland)	MXL (Mexican Ancestry from Los Angeles USA)	CHB (Han Chinese in Beijing, China)	GIH (Gujarati Indian from Houston, Texas)
ESN (Esan in Nigeria)	GBR (British in England and Scotland)	PEL (Peruvians from Lima, Peru)	CHS (Southern Han Chinese)	ITU (Indian Telugu from the UK)
GWD (Gambian in Western Divisions in the Gambia)	IBS (Iberian Population in Spain)	PUR (Puerto Ricans from Puerto Rico)	JPT (Japanese in Tokyo, Japan)	PJL (Punjabi from Lahore, Pakistan)
LWK (Luhya in Webuye, Kenya)	TSI (Toscani in Italia)		KHV (Kinh in Ho Chi Minh City, Vietnam)	STU (Sri Lankan Tamil from the UK)
MSL (Mende in Sierra Leone)
YRI (Yoruba in Ibadan, Nigeria)

This tool takes as input B-allele frequency data generated with any of 9 Affymetrix SNP array platforms, or whole-genome sequencing data, and assigns each sample to one of the five continental groups (97.1% accuracy, benchmarked with paired TCGA data) or one of the 26 population groups (92.7% accuracy, benchmarked with paired TCGA data).

The currently supported genome version is GRCh37 (hg19). Support for other genome versions is planned.

Docker version installation

The easiest way to run snp2pop is through Docker. First install Docker, then pull the image:

docker pull baudisgroup/snp2pop

Usage

First, create a working directory $hostdir (use an absolute path) to hold your input files and to receive the output from the pipeline. Files in this directory will be modified, so place only copies of your original files there.
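A minimal sketch of this setup, assuming a POSIX shell. The scratch directory from mktemp and the file name sample_baf.txt are placeholders, not names prescribed by snp2pop; the input subfolder follows the location mentioned in the usage steps.

```shell
# Assumed layout: a scratch working directory with an input/ subfolder.
hostdir="$(mktemp -d)"        # mktemp -d returns an absolute path
mkdir -p "$hostdir/input"

# Stand-in for your original data file (hypothetical name and content).
original="$(mktemp)"
printf 'ID\tCHRO\tBASEPOS\tVALUE\nSNP_A-2131660\t1\t1220751\t0.3487\n' > "$original"

# Copy, never move: files in the working directory will be modified.
cp "$original" "$hostdir/input/sample_baf.txt"
```

In practice you would replace the mktemp directory with a permanent absolute path of your choice and copy your real input file instead of the stand-in.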

docker run -it --rm --mount type=bind,source=$hostdir,target=/data baudisgroup/snp2pop

Once the container is running in interactive mode, place your input files in the $hostdir/input directory. Then run:

Rscript --vanilla run_pop.r <parameters>

The mapping results are written to the /results folder under $hostdir.

Demo Example with test data

Download the test_SNP folder from this repository and set $test_dir to its absolute path. The folder contains 2 processed BAF files from the GEO repository, generated with the genotyping platform "Mapping250K_Sty".

docker run -it --rm --mount type=bind,source=$test_dir,target=/data baudisgroup/snp2pop

With the test_SNP data, run the following command:

Rscript --vanilla run_pop.r -i BAF -o CONT -p Mapping250K_Sty

To test with sequencing data, you can download the 1000 Genomes phase 3 version 5 data from the 1000 Genomes Project FTP site (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502). For sequencing data, run the following command:

Rscript --vanilla run_pop.r -i GZVCF -p Sequencing -o ALL

Options

Options	Type	Description
-i --input	TEXT	Input format: B-allele frequency (BAF), genotype calling (GC), or Birdseed genotype (BS) format for SNP array data; Variant Call Format (VCF) or gzipped VCF (GZVCF) for sequencing data.
-p --platform	TEXT	SNP array platform (see below), or Sequencing.
-o --output	TEXT	Output type: the 9 theoretical fractions (FRAC); ratios of the 5 continental groups with a voting result (CONT); ratios of the 26 population groups with a voting result (POP); or both, with the 5 continental groups summarized from the 26-population voting output (ALL).

Input file -i --input

The input file can be SNP array output or sequencing data.

For sequencing data, the vcf and vcf.gz file formats are supported. Place only one VCF file in the directory.
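The one-file rule can be checked before starting the pipeline. The helper below is a sketch, not part of snp2pop; count_vcf is a hypothetical name, and it assumes your find supports -maxdepth (GNU and BSD find both do).

```shell
# Hypothetical helper: count *.vcf and *.vcf.gz files directly inside a directory.
count_vcf() {
  find "$1" -maxdepth 1 -type f \( -name '*.vcf' -o -name '*.vcf.gz' \) \
    | wc -l | tr -d ' '     # strip the padding some wc implementations add
}
```

A call such as count_vcf "$hostdir/input" should print 1 before you run the pipeline on sequencing data.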

For SNP array output, the file must be tab-separated with 4 columns: ID (SNP ID, or simply the row number), chromosome (1-23), nucleotide base position, and a value column (a number between 0 and 1 for BAF format, or AA/AB/BB for GC format).

Example for BAF input format:

ID	CHRO	BASEPOS	VALUE
SNP_A-2131660	1	1220751	0.3487
SNP_A-1967418	1	2302812	0.9451
SNP_A-1969580	1	2398125	1.0000
SNP_A-4263484	1	2622185	0.4612
...	...	...	...

Example for GC input format:

ID	CHRO	BASEPOS	VALUE
SNP_1	1	1220751	AB
SNP_2	1	2302812	BB
SNP_3	1	2398125	BB
SNP_4	1	2622185	AB
...	...	...	...

Example for BS input format:

ID	CHRO	BASEPOS	VALUE
SNP_1	1	1220751	1
SNP_2	1	2302812	2
SNP_3	1	2398125	2
SNP_4	1	2622185	1
...	...	...	...
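The four-column layout described above can be checked for a BAF file before running the pipeline. This is a sketch using awk, not a tool shipped with snp2pop; check_baf is a hypothetical name, and it assumes the first line is a header row as in the examples.

```shell
# Hypothetical helper: validate a tab-separated BAF file (ID, CHRO, BASEPOS, VALUE).
# Prints one message per problem and returns non-zero if any line fails.
check_baf() {
  awk -F'\t' '
    NR == 1 { next }                       # assumed header row, skipped
    NF != 4 {
      printf "line %d: expected 4 columns, got %d\n", NR, NF; bad = 1; next
    }
    $2 !~ /^[0-9]+$/ || $2 < 1 || $2 > 23 {
      printf "line %d: chromosome \"%s\" is not in 1-23\n", NR, $2; bad = 1
    }
    $3 !~ /^[0-9]+$/ {
      printf "line %d: base position \"%s\" is not an integer\n", NR, $3; bad = 1
    }
    $4 !~ /^[0-9]*\.?[0-9]+$/ || $4 < 0 || $4 > 1 {
      printf "line %d: BAF value \"%s\" is not within 0-1\n", NR, $4; bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

Call it as check_baf followed by the path of your file; it is silent and returns 0 when every data line passes.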

Platform specification -p --platform

For sequencing data, "Sequencing" should be used.

For SNP array, one of the following 9 array platforms should be named:

Mapping10K_Xba142
Mapping50K_Hind240
Mapping50K_Xba240
Mapping250K_Nsp
Mapping250K_Sty
GenomeWideSNP_5
GenomeWideSNP_6
CytoScan750K_Array
CytoScanHD_Array

Output file -o --output

Options	Description
FRAC	the 9 fractional values representing estimated theoretical ancestors from the admixture model.
CONT	voting percentage of 5 continental groups, the final result with highest vote and a confidence score^*.
POP	voting percentage of 26 population groups, the final result with highest vote and a confidence score^*.
ALL	voting percentage of 26 population groups, the final result with highest vote and a confidence score^*. In addition, the voting summary of the population groups belonging to each continental group, and a final result of predicted continental group with a confidence score^*.

^*confidence score: the difference between the highest and second-highest voting percentages.
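To make the confidence score concrete, the sketch below recomputes it from a list of voting percentages. It is an illustration only: the confidence function and the one-pair-per-line input layout (group, tab, percentage) are assumptions and do not describe the pipeline's actual result files.

```shell
# Hypothetical helper: read "GROUP<TAB>PERCENT" lines on stdin and print the
# top-voted group with the gap between the highest and second-highest vote.
confidence() {
  awk -F'\t' '
    BEGIN { first = 0; second = 0 }
    { v = $2 + 0 }                                   # force numeric comparison
    v > first  { second = first; first = v; top = $1; next }
    v > second { second = v }
    END { printf "%s\t%.1f\n", top, first - second }
  '
}
```

For example, feeding it the votes AFR 5.0, EUR 80.0, AMR 10.0, EAS 3.0 and SAS 2.0 prints EUR with a score of 70.0, since 80.0 - 10.0 = 70.0.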

Citation instructions

If you use SNP2pop in a published analysis, please cite the following article:

Enabling population assignment from cancer genomes with SNP2pop.
Huang Q, Baudis M.
Sci Rep. 2020 Mar 16;10(1):4846. doi: 10.1038/s41598-020-61854-x
PMID: 32179800
