Haplotype Fixation Index (HFI) for homozygous crop populations such as rice.
NOTICE: The formulas and pictures in this page may not be displayed properly in some regions due to local Internet policies.
By Zhuo CHEN, contact: [email protected] or [email protected]
Considering the homozygous nature of cultivated rice, I designed HFI, a haplotype-based estimate for genetic differentiation analysis. Its design was inspired by Weir and Cockerham's fixation index FST, but with several major changes.
Artificial hybrid breeding has shuffled the distribution of haplotypes across crop population genomes, and the number and frequency of the different haplotypes in a population may be more important than the base differences between the haplotypes.
This method was designed to assess changes in the absolute value of haplotype diversity rather than the fold-change, while ignoring the number of base differences between haplotypes.
HFI is based on two other haplotype-based estimates, namely hapDiv (haplotype diversity) and hapDist (haplotype distance).
where n is the number of haplotypes in the window; xi and xj are the frequencies of haplotypes i and j; and dij is the genetic distance between haplotypes i and j. If there is any clear base difference between the two haplotypes (sites with a missing or heterozygous genotype are excluded), dij is set to 1; otherwise, it is set to 0.
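The binary distance rule for dij can be illustrated with a minimal Python sketch. It is not taken from HFI.pl; the function name hap_distance and the representation of a haplotype as a string of genotype codes (0, 1, -) are assumptions made for illustration only.

```python
MISSING = "-"  # assumed code for a missing or heterozygous genotype


def hap_distance(hap_i, hap_j):
    """Return 1 if the haplotypes differ at any site called in both, else 0.

    hap_i and hap_j are hypothetical strings of genotype codes, one
    character per site in the window.
    """
    if len(hap_i) != len(hap_j):
        raise ValueError("haplotypes must cover the same sites")
    for a, b in zip(hap_i, hap_j):
        if a == MISSING or b == MISSING:
            continue  # site is uninformative for this pair
        if a != b:
            return 1  # one clear difference is enough
    return 0


print(hap_distance("0101", "0-01"))  # 0: the only mismatch involves a missing call
print(hap_distance("0101", "0-11"))  # 1: the third site differs clearly
```

Because the distance is 0 or 1, two haplotypes that differ at one site count the same as two that differ at many, which matches the stated aim of ignoring the number of base differences.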
where n, x and d are defined as in the equation for hapDiv.
Then:
Typical usage:
perl HFI.pl --in pop.geno --out out.hfi --list1 pop1.list --list2 pop2.list
For detailed usage:
perl HFI.pl
Input format: tab-delimited table with header. Each line contains: chr pos geno1 geno2 ... genoX
This program was designed only for populations that are homozygous in nature. The genotype coding in the input file is:
0 for the reference type; 1 for the alternative type; - for a missing or heterozygous genotype.
We recommend creating the input file from a VCF file and pruning it with the script SNP_pruning.r2.pl:
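To check a file against the input format before running the program, a small parser along the following lines may help. This is a hedged sketch, not part of the HFI distribution: the function name read_geno, the sample names s1 to s3 and the example rows are invented for illustration, and it assumes the header line carries one name per genotype column.

```python
import csv
import io

VALID_CODES = {"0", "1", "-"}  # reference, alternative, missing/heterozygous


def read_geno(handle):
    """Parse a tab-delimited genotype table and return (samples, rows).

    Each row is returned as (chr, pos, [genotype codes]).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[2:]  # assumption: columns after chr and pos are samples
    rows = []
    for line_no, fields in enumerate(reader, start=2):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} columns, got {len(fields)}"
            )
        genos = fields[2:]
        bad = set(genos) - VALID_CODES
        if bad:
            raise ValueError(f"line {line_no}: unexpected code(s) {sorted(bad)}")
        rows.append((fields[0], int(fields[1]), genos))
    return samples, rows


# Hypothetical three-sample table, used only to exercise the parser.
example = "chr\tpos\ts1\ts2\ts3\nchr01\t1200\t0\t1\t-\nchr01\t1350\t1\t1\t0\n"
samples, rows = read_geno(io.StringIO(example))
print(samples)
print(rows[0])
```

For a real file, pass an open file handle, for example open("pop.geno", newline=""), in place of the io.StringIO object.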
perl SNP_pruning.r2.pl --in pop.vcf --out pop.geno
Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population structure. Evolution. 1984;38(6):1358–1370. doi:10.1111/j.1558-5646.1984.tb05657.x
Chen Z, Li X, Lu H, Gao Q, Du H, Peng H, Qin P, Liang C. Genomic atlases of introgression and differentiation reveal breeding footprints in Chinese cultivated rice. Journal of Genetics and Genomics. 2020. doi:10.1016/j.jgg.2020.10.006