-
Notifications
You must be signed in to change notification settings - Fork 15
/
Copy pathREADME.Rmd
212 lines (136 loc) · 9.96 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file --->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
out.width = "100%",
fig.path = "man/figures/"
)
```
# ADMIXTOOLS 2
A new, lightning fast implementation of [*ADMIXTOOLS*](https://github.com/DReichLab/AdmixTools).
## Overview
*ADMIXTOOLS* is a collection of programs which use genetic data to infer how populations are related to one another. It has been used in countless publications to test whether populations form clades (*qpDstat*, *qpWave*), to estimate ancestry proportions (*qpAdm*), and to fit admixture graphs (*qpGraph*).
*ADMIXTOOLS 2* provides the same functionality as *ADMIXTOOLS* in a new look, and it's orders of magnitude faster. This is achieved through separating the computation of *f*~2~-statistics from all other computations, and through a number of other optimizations.
<!-- In the example below, rendering the plot takes much longer than computing the fit of a new *qpGraph* model: -->
<!-- ![app demo](man/figures/shinyapp1.gif) -->
## Features
* Much faster than the original *ADMIXTOOLS* software
* Simple R command line interface
* Even simpler point-and-click interface
* Several new features and methodological innovations that make it easier to find robust models:
+ [Automated and semi-automated admixture graph inference](html#exploring-different-graphs-1)
+ Simultaneous exploration of many *qpAdm* models
+ Unbiased comparison of any two *qpGraph* models using out-of-sample scores
+ Jackknife and bootstrap standard errors and confidence intervals for any *qpAdm*, *qpWave*, and *qpGraph* parameters
+ Interface with *msprime* makes it easy to [simulate](https://uqrmaie1.github.io/admixtools/articles/graphs.html#simulating-under-an-admixture-graph-1) genetic data under an admixture graph
* Full support for genotype data in *PACKEDANCESTRYMAP/EIGENSTRAT* format and *PLINK* format
* Wrapper functions around the original *ADMIXTOOLS* software (see also [admixr](https://bodkan.net/admixr/index.html))
<!-- * Simple interface with [msprime](https://msprime.readthedocs.io/en/stable/index.html) for simulating under a given admixture graph -->
* [Extensive documentation](https://uqrmaie1.github.io/admixtools/articles/admixtools.html)
* New features available [on request](mailto:[email protected])!
## Installation
```{r, eval=FALSE}
install.packages("devtools") # if "devtools" is not installed already
devtools::install_github("uqrmaie1/admixtools")
library("admixtools")
```
```{r, echo=FALSE, warning=FALSE, message=FALSE, comment=FALSE}
library('admixtools')
```
The above commands will install all R package dependencies which are required to run *ADMIXTOOLS 2* on the command line. For the interactive app, additional packages are required, which can be installed like this:
```{r, eval=FALSE}
devtools::install_github("uqrmaie1/admixtools", dependencies = TRUE)
```
If you encounter any problems during the installation, this is most likely because some of the required R packages cannot be installed. If that happens, try manually re-installing some of the larger packages, and pay attention to any error message:
```{r, eval=FALSE}
install.packages("Rcpp")
install.packages("tidyverse")
install.packages("igraph")
install.packages("plotly")
```
Running `devtools::install_github("uqrmaie1/admixtools")` will compile C++ from source code (this isn't necessary when installing packages from CRAN with `install.packages()`).
If you get the following error:
`Error: Failed to install 'admixtools' from GitHub: Could not find tools necessary to compile a package.`
you might be able to solve the problem by installing [Rtools](https://cran.r-project.org/bin/windows/Rtools/) for your version of R if you use Windows, or [Xcode command line tools](https://developer.apple.com/download/more/) if you use macOS. This is also described [here](https://r-pkgs.org/setup.html#setup-tools).
On some Linux distributions, it may be necessary to manually install additional libraries or programs. For example, if you see the error `gfortran: command not found` during installation, you would have to install `gfortan` with `sudo zypper in gfortran` or `sudo apt-get install gfortran`, depending on the Linux distribution.
If you get any other errors during installation, please [contact me](mailto:[email protected]).
## Usage
Admixture graphs can be fitted like this:
```{r, eval = FALSE}
genotype_data = "/my/geno/prefix"
fit = qpgraph(genotype_data, example_graph)
fit$score
```
```{r, echo = FALSE}
fit = qpgraph(example_f2_blocks, example_graph)
fit$score
```
```{r, eval = FALSE}
plot_graph(fit$edges)
```
![example graph](man/figures/graph1.png)
Clearly not a historically accurate model, but it gets the idea across.
<br>
When testing more than one model, it makes sense to extract and re-use f2-statistics:
```{r, eval = FALSE}
f2_blocks = f2_from_geno(genotype_data)
fit = qpgraph(f2_blocks, example_graph)
```
```{r}
fit$score
```
<br>
*f*~2~-statistics can also be used to estimate admixture weights:
```{r}
left = c("Altai_Neanderthal.DG", "Vindija.DG")
right = c("Chimp.REF", "Mbuti.DG", "Russia_Ust_Ishim.DG", "Switzerland_Bichon.SG")
target = "Denisova.DG"
```
```{r, eval = FALSE}
qpadm(f2_blocks, left, right, target)$weights
```
```{r, echo = FALSE}
qpadm(example_f2_blocks, left, right, target, verbose = FALSE)$weights
```
<br>
Or to get *f*~4~-statistics:
```{r, eval = FALSE}
f4(f2_blocks)
```
```{r, echo = FALSE}
f4(example_f2_blocks)
```
## Interactive browser app
*ADMIXTOOLS 2* also has a simple point-and-click interface. This makes it easy to explore many *qpAdm* or *qpGraph* models at the same time, for example by allowing you to build and change admixture graphs interactively.
Typing the following command in the R console launches the app:
```{r, eval = FALSE}
run_shiny_admixtools()
```
![app demo](man/figures/shinyapp2.gif)
## Documentation
One of the design goals behind *ADMIXTOOLS 2* is to make the algorithms more transparent, so that the steps leading from from genotype data to conclusions about demographic history are easier to follow.
To this end, all *ADMIXTOOLS 2* functions are [documented](https://uqrmaie1.github.io/admixtools/reference/index.html). You can also take a look at the [tutorial](articles/admixtools.html), read more about how *ADMIXTOOLS 2* computes [f-statistics](https://uqrmaie1.github.io/admixtools/articles/fstats.html) and [standard errors](https://uqrmaie1.github.io/admixtools/articles/resampling.html), and what you can do with [admixture graphs](https://uqrmaie1.github.io/admixtools/articles/graphs.html).
For even greater transparency, many of the core functions are implemented twice: In C++ for performance (used by default), and in R, which makes it easier to trace the computations step by step. For example, if you want to know how weights are computed in `qpadm()`, you can type `qpadm` in R to get the function code and you will see another function which is called `qpadm_weights()`. By default, this function will be replaced by its faster C++ version `cpp_qpadm_weights()`. But you can still see what it's doing without reading the C++ code by typing `admixtools:::qpadm_weights`. And you can tell `qpadm()` to use the R versions instead of the C++ versions by calling `qpadm(cpp = FALSE)`.
## Contact
For questions, feature requests, and bug reports, please submit an issue on GitHub, or contact [Robert Maier](mailto:[email protected]) or [Pavel Flegontov](mailto:[email protected]).
## Acknowledgments
Nick Patterson has developed the original *ADMIXTOOLS* software. David Reich and Nick Patterson have not only conceived most of the ideas that have made *ADMIXTOOLS* so successful, they have also provided me (and continue to provide me) with critical ideas and feedback in the development of *ADMIXTOOLS 2*.
I also want to thank all members of the Reich lab for many discussions which have helped me gain a better understanding of the underlying methods, and for crucial feedback about *ADMIXTOOLS 2*; in particular Pavel Flegontov, Iosif Lazaridis, Mark Lipson, Harald Ringbauer, Shop Mallick, and Tian Chen Zeng.
Many useful features in *ADMIXTOOLS 2* were suggested by its users. If you are able to install and run *ADMIXTOOLS 2* without issues, it is only because others have cleared the path for you, by bringing problems to my attention. In no particular order, I want to thank Ornob Alam, Ming-Shan Wang, Angad Johar, Ezgi Altınışık, Tobias Göllner, Christian Huber, Kale, Ted Kandell, Fraser Combe, Matthew Williams, Lenny Dykstra, Kristján Moore, Dilawer Khan, Lareb Humayoun, Sánta Benedek, Steven Rosenberg, Daniel Tabin, Benjamin Peter, and Ahmad Bekhit.
If you expected to see your name here but I failed to include it, please let me know about it!
## Cite *ADMIXTOOLS 2*
For referencing *ADMIXTOOLS 2* in your work, please cite our [eLife paper](https://elifesciences.org/articles/85492).
There is an earlier [preprint](https://www.biorxiv.org/content/10.1101/2022.05.08.491072v1) of the manuscript, and some additional personal comments [here](https://uqrmaie1.github.io/admixtools/articles/paper.html).
## See also
* [ADMIXTOOLS](https://github.com/DReichLab/AdmixTools) The original *ADMIXTOOLS* software
* [admixr](https://bodkan.net/admixr/index.html) An R package with *ADMIXTOOLS* wrapper functions and many useful tutorials
* [admixturegraph](https://github.com/mailund/admixture_graph) An R package for automatic graph inference
* [miqoGraph](https://github.com/juliayyan/PhylogeneticTrees.jl) A Julia package for automatic graph inference
* [MixMapper](http://cb.csail.mit.edu/cb/mixmapper/) Another method to infer admixture graphs
* [TreeMix](https://bitbucket.org/nygcresearch/treemix/wiki/Home) Another method to infer admixture graphs
* [Legofit](http://content.csbs.utah.edu/~rogers/src/legofit/index.html) A program to estimate the history of population size, subdivision, and gene flow
* [qpBrute](https://github.com/ekirving/qpbrute) Automated graph fitting and Bayes factor calculations