This repository has been archived by the owner on Jan 16, 2019. It is now read-only.

Site Frequency Spectrum

Skylar Wyant edited this page May 20, 2016 · 15 revisions

This method calculates a site frequency spectrum (SFS) using ANGSD. For more detail on the underlying analysis, see ANGSD's tutorial page.

Basic Usage

To run this method, use the following command:

angsd-wrapper SFS Site_Frequency_Spectrum_Config

where Site_Frequency_Spectrum_Config is the full path to the configuration file for the site frequency spectrum.
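Because the wrapper expects the full path to the configuration file, a relative path needs to be expanded first. The function below is a hypothetical helper (`abs_path` is not part of angsd-wrapper), sketched under the assumption of a POSIX-like shell with `dirname` and `basename` available.

```shell
# Hypothetical helper (not part of angsd-wrapper): print the absolute path
# of an existing file so the wrapper receives a full path.
abs_path() {
    local dir
    # The cd runs inside command substitution (a subshell),
    # so the caller's working directory is unchanged
    dir=$(cd "$(dirname "$1")" && pwd) || return 1
    printf '%s/%s\n' "$dir" "$(basename "$1")"
}

# Example call:
# angsd-wrapper SFS "$(abs_path Site_Frequency_Spectrum_Config)"
```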

Input files

All inputs should be specified in Site_Frequency_Spectrum_Config.

Common Variables

This method uses the following variables from Common_Config:

Variable Function
SAMPLE_LIST (GROUP_SAMPLES on dev) A list of samples to be used in calculations
SAMPLE_INBREEDING (GROUP_INBREEDING on dev) A list of inbreeding coefficients, where each line corresponds to the same line in SAMPLE_LIST (GROUP_SAMPLES on dev)
ANC_SEQ Path to ancestral sequence
PROJECT Name given to all outputs in ANGSD-wrapper
SCRATCH Directory where files are stored; this method writes to SCRATCH/PROJECT/SFS
REGIONS Limit the scope of ANGSD-wrapper to certain regions
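Since each line of SAMPLE_INBREEDING pairs with the corresponding line of SAMPLE_LIST, a line-count mismatch usually signals a configuration mistake. The function below is a hypothetical pre-flight check (not something angsd-wrapper provides). It assumes both variables hold paths to plain text files with one entry per line, and that they are already set in your shell, for example by sourcing your Common_Config if it is a plain shell file.

```shell
# Hypothetical pre-flight check: SAMPLE_LIST and SAMPLE_INBREEDING should
# have the same number of lines, one coefficient per sample.
check_inbreeding_matches() {
    local samples="$1" inbreeding="$2"
    local n_samples n_coeffs
    # grep -c '' also counts a final line that lacks a trailing newline
    n_samples=$(grep -c '' "$samples")
    n_coeffs=$(grep -c '' "$inbreeding")
    if [ "$n_samples" -ne "$n_coeffs" ]; then
        echo "Mismatch: $n_samples samples but $n_coeffs inbreeding coefficients" >&2
        return 1
    fi
    echo "OK: $n_samples samples"
}

# Example call:
# check_inbreeding_matches "$SAMPLE_LIST" "$SAMPLE_INBREEDING"
```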

Method-Specific Variables

This method has no method-specific variables.

Method Parameters

The parameters for this method can be adjusted as needed; their defaults are chosen to work well for general use:

Parameter Function
DO_SAF Compute per-site allele frequency likelihoods (the .saf files), from which the site frequency spectrum is estimated
UNIQUE_ONLY Use uniquely mapped reads only
MIN_BASEQUAL Minimum base quality score
BAQ Adjust base quality scores around indels (base alignment quality)
MIN_IND Minimum number of individuals needed to use this site
GT_LIKELIHOOD Model used to estimate genotype likelihoods
DO_GLF Format of genotype likelihoods file
MIN_MAPQ Minimum read mapping quality
N_CORES Number of cores to use; do not set this higher than the number of cores available on your system
DO_MAJORMINOR Estimate major/minor alleles
DO_GENO Perform genotype calling
DO_MAF Calculate per-site frequencies
DO_POST Calculate the posterior probability using per-site frequencies
OVERRIDE If true, will recalculate files that already exist
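Most of these parameters appear to correspond to standard ANGSD command-line options. The sketch below shows one plausible mapping, assuming the parameters and the Common_Config variables are already set as shell variables. The function name `build_angsd_args` is hypothetical, and angsd-wrapper's own script may pass options differently. OVERRIDE has no ANGSD counterpart and is omitted, and REGIONS is left out for brevity.

```shell
# Hypothetical sketch: print ANGSD options built from the wrapper's variables.
# The mapping below is an assumption; angsd-wrapper's own script may differ.
build_angsd_args() {
    # Output prefix following the SCRATCH/PROJECT/SFS layout
    local out="$SCRATCH/$PROJECT/SFS/${PROJECT}_SFSOut"
    printf '%s ' \
        -bam "$SAMPLE_LIST" \
        -anc "$ANC_SEQ" \
        -out "$out" \
        -doSaf "$DO_SAF" \
        -uniqueOnly "$UNIQUE_ONLY" \
        -minQ "$MIN_BASEQUAL" \
        -baq "$BAQ" \
        -minInd "$MIN_IND" \
        -GL "$GT_LIKELIHOOD" \
        -doGlf "$DO_GLF" \
        -minMapQ "$MIN_MAPQ" \
        -P "$N_CORES" \
        -doMajorMinor "$DO_MAJORMINOR" \
        -doGeno "$DO_GENO" \
        -doMaf "$DO_MAF" \
        -doPost "$DO_POST"
    printf '\n'
}
```

Printing the options this way makes it easy to compare them with the PROJECT_SFSOut.arg file, which records the arguments ANGSD actually used.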

Output files

Naming Scheme Contents
PROJECT_DerivedSFS* Final site frequency spectrum
PROJECT_SFSOut.arg Arguments used for the ANGSD run
PROJECT_SFSOut.geno.gz Genotype calls
PROJECT_SFSOut.mafs.gz Minor allele frequencies
PROJECT_SFSOut.saf.gz Per-site allele frequency likelihoods (intermediate file used to estimate the site frequency spectrum)
PROJECT_SFSOut.saf.idx Index of the .saf.gz file
PROJECT_SFSOut.saf.pos.gz Positions of the sites in the .saf.gz file
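To locate these files on disk, the hypothetical helper below prints their expected paths, assuming SCRATCH and PROJECT are set as in Common_Config and that the files follow the PROJECT_SFSOut naming in the table. In ANGSD itself, the spectrum is usually estimated from the .saf.idx file with realSFS (for example, `realSFS file.saf.idx > file.sfs`); whether angsd-wrapper produces PROJECT_DerivedSFS exactly this way is an assumption.

```shell
# Hypothetical helper: print the expected paths of this method's
# intermediate outputs under SCRATCH/PROJECT/SFS.
sfs_output_paths() {
    local dir="$SCRATCH/$PROJECT/SFS"
    local suffix
    for suffix in arg geno.gz mafs.gz saf.gz saf.idx saf.pos.gz; do
        printf '%s\n' "$dir/${PROJECT}_SFSOut.$suffix"
    done
}

# Example call (paths without spaces):
# ls -l $(sfs_output_paths)
```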

Visualization

PROJECT_DerivedSFS can be visualized with the Shiny graphing interface. A web browser with a graphical user interface is required.
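When no graphical browser is available, the raw spectrum can still be inspected from the command line. The hypothetical function below converts the counts in a PROJECT_DerivedSFS file into proportions with awk. It assumes the file holds a single spectrum written as whitespace-separated numbers (the format realSFS prints), and it is not part of angsd-wrapper or the Shiny interface.

```shell
# Hypothetical helper: convert SFS counts into proportions of the total.
# Assumes one spectrum of whitespace-separated numbers in the file.
sfs_proportions() {
    awk '
        { for (i = 1; i <= NF; i++) { v[++n] = $i; total += $i } }
        END {
            if (total == 0) { print "empty or all-zero SFS" > "/dev/stderr"; exit 1 }
            # Print each bin divided by the total, separated by single spaces
            for (i = 1; i <= n; i++) printf "%s%.6f", (i > 1 ? " " : ""), v[i] / total
            printf "\n"
        }
    ' "$1"
}

# Example call:
# sfs_proportions PROJECT_DerivedSFS
```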