A set of functions for pedigree analysis, designed for use with data from the GÉNÉO portal. It is based on the functionality of GENLIB; see the article by Gauvin et al. (2015) <doi:10.1186/s12859-015-0581-5>.
The GENLIB reference manual and this README file are sufficient to learn how to use the GENEO toolkit.
The GENEO toolkit aims to:
- Easily port R code that uses GENLIB to Python code that uses the GENEO toolkit;
- Integrate with Python libraries such as Pandas and NumPy;
- Provide speed and convenience;
- Present a modular structure for further development.
The toolkit can be used to:
- Create a pedigree structure from a file or `DataFrame`;
- Output a pedigree as a `DataFrame`;
- Identify individuals in a pedigree, such as probands and founders;
- Extract a subpedigree from a pedigree;
- Describe a pedigree, such as the number of individuals and its completeness;
- Compute information about a pedigree, such as the pairwise kinship coefficients of probands and the genetic contributions of ancestors;
- (Eventually) Simulate information about pedigrees and individuals.
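The terms in this list can be made concrete with a small pure-Python sketch that does not use GENEO at all. It assumes GENLIB's usual definitions (probands are individuals without children, founders are individuals without known parents) and computes the kinship coefficient with the textbook recursive definition, on a hypothetical ten-person pedigree (the same family as in the DataFrame example below). It only illustrates the quantities; GENEO's C++ backend does not necessarily use this algorithm.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical toy pedigree: ind -> (father, mother), with 0 meaning "unknown".
PARENTS = {
    1: (0, 0), 2: (0, 0), 3: (0, 0), 4: (1, 2), 5: (1, 2),
    6: (0, 0), 7: (3, 4), 8: (3, 4), 9: (6, 5), 10: (6, 5),
}


def founders(parents):
    """Individuals whose father and mother are both unknown."""
    return sorted(i for i, (f, m) in parents.items() if f == 0 and m == 0)


def probands(parents):
    """Individuals who are nobody's parent."""
    has_children = {p for pair in parents.values() for p in pair if p != 0}
    return sorted(i for i in parents if i not in has_children)


@lru_cache(maxsize=None)
def depth(i):
    """Generation depth: -1 for 'unknown', 0 for founders, else 1 + deepest parent."""
    if i == 0:
        return -1
    f, m = PARENTS[i]
    return 1 + max(depth(f), depth(m))


@lru_cache(maxsize=None)
def kinship(i, j):
    """Kinship coefficient phi(i, j) via the classic recursive definition."""
    if i == 0 or j == 0:
        return 0.0  # an unknown parent contributes nothing
    if i == j:
        f, m = PARENTS[i]
        return 0.5 * (1.0 + kinship(f, m))  # (1 + inbreeding of i) / 2
    # An ancestor is always strictly shallower than its descendants, so the
    # deeper (or equally deep) individual cannot be an ancestor of the other
    # and can safely be replaced by its two parents.
    if depth(i) < depth(j):
        i, j = j, i
    f, m = PARENTS[i]
    return 0.5 * (kinship(f, j) + kinship(m, j))


pro = probands(PARENTS)
pairs = {(a, b): kinship(a, b) for a, b in combinations(pro, 2)}
# Average over distinct proband pairs (diagonal excluded); we assume this is
# comparable to the summary reported by gen.phiMean.
mean_phi = sum(pairs.values()) / len(pairs)

print(founders(PARENTS))             # [1, 2, 3, 6]
print(pro)                           # [7, 8, 9, 10]
print(pairs[(7, 8)], pairs[(7, 9)])  # 0.25 0.0625
print(mean_phi)                      # 0.125
```

Full siblings such as 7 and 8 get 0.25 and first cousins such as 7 and 9 get 0.0625, as expected. Memoization keeps this tractable for toy data, but the number of pairs grows quadratically and deep pedigrees can hit Python's recursion limit, so a compiled implementation such as `gen.phi` is the practical choice for real pedigrees.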
- Clone this repository, `cd` into it, then run `pip install .`. Alternatively, without cloning, run `pip install https://github.com/GPhMorin/geneo/archive/main.zip`. Both options install two packages, `geneo` and `cgeneo` (used by the former internally), and their dependencies. You will need a compiler that supports C++17.
- If OpenMP is found during installation, the `geneo.phi()` function will run in parallel, making it the fastest implementation of kinship computation that we know of. If you use macOS, you may need to follow these instructions to enable OpenMP.
- On Windows 11, the toolkit was tested using Microsoft Visual C++ 2022.
- If the pedigree is loaded from a file, using `geneo.genealogy("path/to/pedigree.csv")`, the file must start with a line that is skipped (such as the header `ind father mother sex`), and each following line must contain, as digits, an individual's ID, their father's ID (`0` if unknown), their mother's ID (`0` if unknown) and their sex (`0` if unknown, `1` if male, `2` if female), in that order. Fields may be separated by any non-digit characters (tabs, spaces, commas, etc.), with one line per individual.
- Three datasets come from the GENLIB source code: `geneo.geneaJi`, `geneo.genea140` and `geneo.pop140`. They are included in the project for testing and practice, and more information on them is available in the GENLIB reference manual. They can be loaded using `geneo.genealogy(geneo.geneaJi)`, etc.
- You may also load the pedigree from a Pandas DataFrame, for instance:
```python
import geneo as gen
import pandas as pd

# One entry per individual; 0 means "unknown" for parents.
inds = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
fathers = [0, 0, 0, 1, 1, 0, 3, 3, 6, 6]
mothers = [0, 0, 0, 2, 2, 0, 4, 4, 5, 5]
sexes = [1, 2, 1, 2, 2, 1, 2, 1, 1, 2]  # 1 = male, 2 = female
df = pd.DataFrame({'ind': inds, 'father': fathers,
                   'mother': mothers, 'sex': sexes})
ped = gen.genealogy(df)
```
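The input rules above can be made concrete with a short, standalone sketch. It is hypothetical and is not GENEO's own reader: `read_pedigree` follows the file format described in this list (first line skipped, four numbers per line, any non-digit separators, `0` for unknown), and `check_pedigree` runs a few sanity checks that we assume are worth doing before calling `gen.genealogy` (parents must be listed, fathers should not be recorded as female, mothers not as male). GENEO itself may enforce different rules.

```python
import io
import re


def read_pedigree(stream):
    """Parse the text format described above into {ind: (father, mother, sex)}."""
    records = {}
    lines = iter(stream)
    next(lines, None)  # skip the first line (e.g. "ind father mother sex")
    for lineno, line in enumerate(lines, start=2):
        fields = re.findall(r"\d+", line)  # any run of non-digits is a separator
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 numbers, got {len(fields)}")
        ind, father, mother, sex = map(int, fields)
        if ind in records:
            raise ValueError(f"line {lineno}: duplicate individual {ind}")
        records[ind] = (father, mother, sex)
    return records


def check_pedigree(records):
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    for ind, (father, mother, sex) in records.items():
        if sex not in (0, 1, 2):
            problems.append(f"{ind}: sex must be 0, 1 or 2, not {sex}")
        if father != 0 and father not in records:
            problems.append(f"{ind}: father {father} is not listed")
        elif father != 0 and records[father][2] == 2:
            problems.append(f"{ind}: father {father} is recorded as female")
        if mother != 0 and mother not in records:
            problems.append(f"{ind}: mother {mother} is not listed")
        elif mother != 0 and records[mother][2] == 1:
            problems.append(f"{ind}: mother {mother} is recorded as male")
    return problems


# Tabs, commas, spaces and semicolons are all accepted as separators.
text = "ind father mother sex\n1\t0\t0\t1\n2,0,0,2\n3 1 2 2\n4;1;9;1\n"
records = read_pedigree(io.StringIO(text))
print(records)  # {1: (0, 0, 1), 2: (0, 0, 2), 3: (1, 2, 2), 4: (1, 9, 1)}
print(check_pedigree(records))  # ['4: mother 9 is not listed']

# The same checks applied to the columns of the DataFrame example.
columns = zip([1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              [0, 0, 0, 1, 1, 0, 3, 3, 6, 6],
              [0, 0, 0, 2, 2, 0, 4, 4, 5, 5],
              [1, 2, 1, 2, 2, 1, 2, 1, 1, 2])
print(check_pedigree({i: (f, m, s) for i, f, m, s in columns}))  # []
```

With an actual DataFrame such as `df`, the same dictionary could be built from `df[["ind", "father", "mother", "sex"]].itertuples(index=False)`.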
The function calls are almost verbatim copies of GENLIB's. For instance:
```r
# With GENLIB
library(GENLIB)
data(genea140)
ped <- gen.genealogy(genea140)
pro <- gen.pro(ped)
phi <- gen.phi(ped, pro=pro)
mean <- gen.phiMean(phi)
mrca <- gen.findMRCA(ped, c(802424, 868572))
dist <- gen.find.Min.Distance.MRCA(mrca)
out <- gen.genout(ped, sorted=TRUE)
```
```python
# With the GENEO toolkit
import geneo as gen
genea140 = gen.genea140
ped = gen.genealogy(genea140)
pro = gen.pro(ped)
phi = gen.phi(ped, pro=pro)
mean = gen.phiMean(phi)
mrca = gen.findMRCA(ped, [802424, 868572])
dist = gen.find_Min_Distance_MRCA(mrca)
out = gen.genout(ped, sorted=True)
```
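The MRCA step can also be sketched in plain Python, again without GENEO and on a hypothetical toy pedigree. Here a most recent common ancestor of two individuals is taken to be a common ancestor that is not itself an ancestor of another common ancestor, and a breadth-first walk up the pedigree gives the minimum number of generations to each ancestor, loosely mirroring what `gen.find_Min_Distance_MRCA` reports. The return format is our own and does not match GENEO's.

```python
from collections import deque

# Hypothetical toy pedigree: ind -> (father, mother), with 0 meaning "unknown".
PARENTS = {
    1: (0, 0), 2: (0, 0), 3: (0, 0), 4: (1, 2), 5: (1, 2),
    6: (0, 0), 7: (3, 4), 8: (3, 4), 9: (6, 5), 10: (6, 5),
}


def ancestor_distances(ind, parents):
    """Map each ancestor of ind to its minimum distance in generations.

    The individual itself is not counted as its own ancestor.
    """
    dist = {}
    queue = deque([(ind, 0)])
    while queue:
        cur, d = queue.popleft()
        for p in parents.get(cur, (0, 0)):
            # Breadth-first order reaches each ancestor first by a shortest path.
            if p != 0 and p not in dist:
                dist[p] = d + 1
                queue.append((p, d + 1))
    return dist


def find_mrca(a, b, parents):
    """Most recent common ancestors of a and b -> (distance from a, distance from b)."""
    da, db = ancestor_distances(a, parents), ancestor_distances(b, parents)
    common = da.keys() & db.keys()
    # Drop every common ancestor that is an ancestor of another common ancestor.
    older = set()
    for c in common:
        older |= ancestor_distances(c, parents).keys() & common
    return {c: (da[c], db[c]) for c in sorted(common - older)}


print(find_mrca(7, 9, PARENTS))  # {1: (2, 2), 2: (2, 2)}
print(find_mrca(7, 8, PARENTS))  # {3: (1, 1), 4: (1, 1)}
```

For the siblings 7 and 8, the grandparents 1 and 2 are common ancestors too, but they are discarded because they are ancestors of the MRCA 4.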
Function | Description |
---|---|
`gen.graph` | Pedigree graphical tool |
`gen.simuHaplo` | Gene dropping simulations - haplotypes |
`gen.simuHaplo_convert` | Convert proband simulation results into sequence data given founder haplotypes |
`gen.simuHaplo_IBD_compare` | Compare proband haplotypes for IBD sharing |
`gen.simuHaplo_traceback` | Trace inheritance path for results from gene dropping simulation |
`gen.simuProb` | Gene dropping simulations - Probabilities |
`gen.simuSample` | Gene dropping simulations - Sample |
`gen.simuSampleFreq` | Gene dropping simulations - Frequencies |
`gen.simuSet` | Gene dropping simulations with specified transmission probabilities |
`gen.fCI` | Average inbreeding coefficient confidence interval |
`gen.phiCI` | Average kinship confidence interval |
`gen.completenessVar` | Variance of completeness index |
`gen.implexVar` | Variance of genealogical implex |
`gen.meangendepthVar` | Variance of genealogical depth |