The Polaris project provides:
- Population sequencing resources on high-throughput Illumina sequencing platforms
- Variant calls validated using population genetic and Mendelian methods
The variant calls currently provided in Polaris are breakpoint-resolved deletion and insertion structural variants (SVs).
Further details of the sequencing resources, input data sources, genotyping methods and validation methods can be found in the project wiki.
Our latest variant call release set is VC1.0. This call set contains 70,706 validated breakpoint-resolved SV calls, selected from a candidate set of 184,988.
Candidates were identified from 4 sources:
- Previously characterized Manta calls
- Platinum Genomes pedigree consistent events with unique population breakpoints
- Parliament insertions [2]
- Icelandic insertions identified with PopIns [3]
All candidates were jointly re-called using our breakpoint joint caller suite, paragraph.
Validation consisted of:
- Hardy-Weinberg equilibrium (HWE) assessment in the Polaris 1 Diversity Panel and the Polaris 1 PGx Panel
- A pedigree consistency check in the Platinum Genomes pedigree [1]
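The population half of the validation can be illustrated with a short sketch that computes the minor allele frequency (MAF) and a Hardy-Weinberg p-value from the genotype counts of one biallelic SV in one panel. It uses the standard 1-degree-of-freedom chi-square goodness-of-fit test as a stand-in. The exact HWE test and per-panel handling used for VC1.0 are not described here, so the test choice and the function names are assumptions.

```python
import math


def minor_allele_frequency(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """MAF of a biallelic variant from genotype counts (0/0, 0/1, 1/1)."""
    n_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)
    alt_freq = (n_het + 2 * n_alt_hom) / n_alleles
    return min(alt_freq, 1.0 - alt_freq)


def hwe_chisq_pvalue(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium.

    Hypothetical stand-in for the panel HWE assessment; an exact test is
    usually preferred when genotype counts are small.
    """
    n = n_ref_hom + n_het + n_alt_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    observed = (n_ref_hom, n_het, n_alt_hom)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip empty expected classes (e.g. a monomorphic site).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))
```

For example, genotype counts of 25/50/25 give a MAF of 0.5 and an HWE p-value of 1.0, while 50/0/50 (no heterozygotes) gives a p-value far below 0.05.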
PASS calls were either:
- Pedigree consistent, or
- Homozygous in the pedigree, with MAF > 0.05 and HWE p-value > 0.05 in the Polaris panels
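The PASS rule can be written as a small decision function. This is a simplified sketch under several assumptions: Mendelian consistency is checked trio by trio (treating the three-generation Platinum Genomes pedigree as a set of parent-child trios), genotypes use a hypothetical (allele, allele) tuple encoding, and "homozygous in pedigree" is read as "every pedigree member is homozygous", in which case inheritance is uninformative and the panel filters decide. Only the 0.05 thresholds come from the criteria above.

```python
from typing import Iterable, Sequence, Tuple

Genotype = Tuple[int, int]  # hypothetical encoding: 0 = reference allele, 1 = SV allele
Trio = Tuple[Genotype, Genotype, Genotype]  # (child, mother, father)


def mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child can have inherited one allele from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def is_pass(
    trios: Iterable[Trio],
    pedigree_genotypes: Sequence[Genotype],
    maf: float,
    hwe_p: float,
    threshold: float = 0.05,
) -> bool:
    """Hypothetical PASS decision combining pedigree and panel evidence."""
    all_homozygous = all(g[0] == g[1] for g in pedigree_genotypes)
    if not all_homozygous:
        # The call segregates in the pedigree: every trio must be consistent.
        return all(mendelian_consistent(c, m, f) for c, m, f in trios)
    # Homozygous throughout the pedigree: fall back to the panel filters.
    return maf > threshold and hwe_p > threshold
```

Separating the two branches keeps a trivially consistent but uninformative call (for example, homozygous in every family member) from passing on pedigree evidence alone.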
Complete release notes for VC1.0 can be found here.
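Once the VCF has been downloaded, one quick way to see how its records break down by filter status is to tally the FILTER column. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and whether VC1.0 also contains non-PASS candidate records (and which FILTER labels they carry) is not stated here, so the result depends entirely on the file's contents.

```python
import gzip
from collections import Counter
from typing import Iterable


def count_filters(lines: Iterable[str]) -> Counter:
    """Tally the FILTER column of VCF data lines."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, ## meta-information and the #CHROM header
        fields = line.rstrip("\n").split("\t")
        counts[fields[6]] += 1  # FILTER is the 7th VCF column
    return counts


def count_filters_in_file(path: str) -> Counter:
    """bgzip output is gzip-compatible, so the standard gzip module can read it."""
    with gzip.open(path, "rt") as handle:
        return count_filters(handle)
```

Usage: `count_filters_in_file("vc1_0.vcf.gz")` returns a `Counter` keyed by FILTER value.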
The VC1.0 VCF can be downloaded using either the AWS CLI or wget, and can also be browsed in this S3 bucket display. Of the two command-line options, wget is currently the easier.
Polaris datasets are stored in an AWS S3 bucket called illumina-polaris, and can be downloaded using the AWS CLI:
$: aws s3 cp s3://illumina-polaris/vc1_0.vcf.gz .
$: aws s3 cp s3://illumina-polaris/vc1_0.vcf.gz.tbi .
If you don't have AWS credentials, you can use wget or a similar tool to download VC1.0:
$: wget https://s3.amazonaws.com/illumina-polaris/vc1_0.vcf.gz
$: wget https://s3.amazonaws.com/illumina-polaris/vc1_0.vcf.gz.tbi
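The same two files can also be fetched from Python using only the standard library. This sketch assumes the public HTTPS URLs used by the wget commands; the helper names are hypothetical.

```python
import urllib.request
from pathlib import Path
from typing import List, Sequence

BASE_URL = "https://s3.amazonaws.com/illumina-polaris"  # same location as the wget commands
VC1_FILES = ("vc1_0.vcf.gz", "vc1_0.vcf.gz.tbi")  # the VCF and its tabix index


def polaris_urls(names: Sequence[str] = VC1_FILES, base: str = BASE_URL) -> List[str]:
    """Build download URLs for files in the illumina-polaris bucket."""
    return [f"{base}/{name}" for name in names]


def download_vc1(dest: str = ".") -> List[Path]:
    """Download the VC1.0 VCF and its index into dest; return the local paths."""
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for name, url in zip(VC1_FILES, polaris_urls()):
        target = out_dir / name
        urllib.request.urlretrieve(url, str(target))
        saved.append(target)
    return saved
```

Usage: `download_vc1("polaris_data")` saves both files into a `polaris_data` directory, creating it if needed.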
Unrestricted-access population panels sequenced as part of Polaris are available through BaseSpace or the European Nucleotide Archive (ENA).
Additional panels are available with restricted access through the European Genome-phenome Archive (EGA) or dbGaP, subject to approval by a Data Access Committee. No variant calls are ever reported in Polaris for restricted-access panels.
Further information on the sequencing resources described below can be found in the project wiki.
All HiSeqX PCR-Free data was generated by Illumina Laboratory Services (ILS) with a target whole genome coverage of 30X.
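As a rough illustration of what a 30X target means: mean coverage is the total number of sequenced bases divided by the genome size. The sketch below inverts that relation to estimate how many read pairs are needed. The 2×150 bp read length and the ~3.1 Gb genome size are assumptions for illustration, and the estimate ignores duplicate, unmapped and filtered reads.

```python
def read_pairs_for_coverage(target_coverage: int, genome_size_bp: int, read_length_bp: int) -> int:
    """Estimate paired-end read pairs needed to reach a target mean coverage.

    coverage = (read pairs * 2 * read length) / genome size, solved for
    read pairs and rounded up. Ignores duplicates and unmapped reads.
    """
    bases_needed = target_coverage * genome_size_bp
    bases_per_pair = 2 * read_length_bp
    return -(-bases_needed // bases_per_pair)  # integer ceiling division


# Assumed values: 30X target, ~3.1 Gb genome, 2x150 bp reads
print(read_pairs_for_coverage(30, 3_100_000_000, 150))  # 310000000
```

Integer ceiling division avoids floating-point rounding for genome-scale base counts.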
There are currently two unrestricted access panels available in Polaris, with a third pending.
- Diversity panel (BaseSpace (pending), ENA): 150 samples selected to represent a diversity of populations
- PGx panel (BaseSpace (pending), ENA): 70 samples with orthogonally validated genotypes for 28 genes relevant to pharmacogenomics (PGx) [4]
- Trio panel (pending): 51 children whose parents were sequenced as part of the diversity panel
There is also a restricted access repeat expansion panel available through EGA.
- Parents & grandparents
- ENA — pending
- BaseSpace — pending
- Children
- dbGaP — pending
- Platinum Genomes pedigree
- NIST Ashkenazi Jewish trio
Please open an issue to provide feedback or ask questions.
- [1] Eberle et al. (2017) A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Res. 27:157-164. doi:10.1101/gr.210500.116
- [2] English et al. (2015) Assessing structural variation in a personal genome-towards a human reference diploid genome. BMC Genomics. 16:286. doi:10.1186/s12864-015-1479-3
- [3] Kehr et al. (2017) Diversity in non-repetitive human sequences not found in the reference genome. Nat Genet. 49(4):588-593. doi:10.1038/ng.3801
- [4] Pratt et al. (2016) Characterization of 137 Genomic DNA Reference Materials for 28 Pharmacogenetic Genes: A GeT-RM Collaborative Project. J Mol Diagn. 18(1):109-123. doi:10.1016/j.jmoldx.2015.08.005