-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathFastLocalBranchSupport.html
98 lines (98 loc) · 9.19 KB
/
FastLocalBranchSupport.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, minimal-ui">
<title>Fast Local Branch Support</title>
<link type="text/css" rel="stylesheet" href="assets/css/github-markdown.css">
<link type="text/css" rel="stylesheet" href="assets/css/pilcrow.css">
<link type="text/css" rel="stylesheet" href="assets/css/hljs-github.min.css"/>
</head>
<body>
<article class="markdown-body"><h1 id="fast-local-branch-support"><a class="header-link" href="#fast-local-branch-support"></a>Fast Local Branch Support</h1>
<h2 id="fast-local-branch-support-by-siavash-mirarab-and-erfan-sayyari"><a class="header-link" href="#fast-local-branch-support-by-siavash-mirarab-and-erfan-sayyari"></a>Fast Local Branch Support by Siavash Mirarab and Erfan Sayyari</h2>
<h1 id="introduction"><a class="header-link" href="#introduction"></a>Introduction</h1>
<p>Species tree reconstruction is complicated by effects of Incomplete
Lineage Sorting (ILS), commonly modeled by the multi-species coalescent
model. While there has been substantial progress in developing methods
that estimate a species tree given a collection of gene trees, less
attention has been paid to fast and accurate methods of quantifying
support. </p>
<p>In this paper, we propose a fast algorithm to compute
quartet-based support for each branch of a given species tree with
regard to a given set of gene trees. We then show how the quartet
support can be used in the context of the multi-species coalescent model
to compute i) the local posterior probability that the branch is in the
species tree and ii) the length of the branch in coalescent units. We
evaluate the precision and recall of the local posterior probability on
a wide set of simulated and biological data, and show that it has very
high precision and improved recall compared to multi-locus
bootstrapping. The estimated branch lengths are highly accurate when
gene trees have little error, but are underestimated when gene tree
estimation error increases. Computation of both branch length and local
posterior probability is implemented as a new feature in ASTRAL.</p>
<h3 id="how-to-install-astral"><a class="header-link" href="#how-to-install-astral"></a>How to install ASTRAL</h3>
<p>The installation instruction and dependencies of the ASTRAL is available
at <a href="https://github.com/smirarab/ASTRAL/tree/master">ASTRAL</a>.</p>
<h2 id="how-to-score-species-tree-using-astral"><a class="header-link" href="#how-to-score-species-tree-using-astral"></a>How to score species tree using ASTRAL</h2>
<h4 id="note:-the-following-commands-are-used-for-latest-version-of-the-paper.-the-next-set-of-commands-are-the-commands-used-with-primary-version-of-paper.-note-that-we-resolved-some-bugs-for-latest-version-of-paper.-you-would-find-the-details-of-the-older-versions-at-[arxiv](https://arxiv.org/abs/1601.07019v1)."><a class="header-link" href="#note:-the-following-commands-are-used-for-latest-version-of-the-paper.-the-next-set-of-commands-are-the-commands-used-with-primary-version-of-paper.-note-that-we-resolved-some-bugs-for-latest-version-of-paper.-you-would-find-the-details-of-the-older-versions-at-[arxiv](https://arxiv.org/abs/1601.07019v1)."></a>Note: The following commands are used for latest version of the paper. The next set of commands are the commands used with primary version of paper. Note that we resolved some bugs for latest version of paper. You would find the details of the older versions at <a href="https://arxiv.org/abs/1601.07019v1">arXiv</a>.</h4>
<h4 id="latest-version-of-paper:"><a class="header-link" href="#latest-version-of-paper:"></a>Latest version of paper:</h4>
<p>We used <a href="https://github.com/smirarab/ASTRAL/tree/posteval">ASTRAL (posteval)</a>
version 4.9.1 for scoring and <a href="https://github.com/smirarab/ASTRAL/tree/master">ASTRAL (master)</a>
version 4.9.8 for computing the branch length of the trees.
To have posterior probabilities of branches of main species tree and 2
other alternatives we used <a href="https://github.com/smirarab/ASTRAL/tree/posteval">posteval</a>:</p>
<pre class="hljs"><code> <span class="hljs-selector-tag">java</span> −<span class="hljs-selector-tag">Xmx2000M</span> −<span class="hljs-selector-tag">jar</span> <span class="hljs-selector-tag">astral</span><span class="hljs-selector-class">.4</span><span class="hljs-selector-class">.9</span><span class="hljs-selector-class">.1</span><span class="hljs-selector-class">.jar</span> −<span class="hljs-selector-tag">i</span> <span class="hljs-selector-attr">[GENE TREES]</span> −<span class="hljs-selector-tag">q</span> <span class="hljs-selector-attr">[SPECIES TREE]</span> −<span class="hljs-selector-tag">t</span> 4</code></pre><p> To compute the branch lengths of main species tree we used the MAP
solution with the command <a href="https://github.com/smirarab/ASTRAL/tree/master">master</a> :</p>
<pre class="hljs"><code> <span class="hljs-selector-tag">java</span> −<span class="hljs-selector-tag">Xmx2000M</span> −<span class="hljs-selector-tag">jar</span> <span class="hljs-selector-tag">astral</span><span class="hljs-selector-class">.4</span><span class="hljs-selector-class">.9</span><span class="hljs-selector-class">.8</span><span class="hljs-selector-class">.jar</span> −<span class="hljs-selector-tag">i</span> <span class="hljs-selector-attr">[GENE TREES]</span> −<span class="hljs-selector-tag">q</span> <span class="hljs-selector-attr">[SPECIES TREE]</span> −<span class="hljs-selector-tag">t</span> 2</code></pre><p> To compute the bootstrap support of the alternative topologies we
used <a href="https://github.com/smirarab/ASTRAL/tree/posteval">posteval</a>:</p>
<pre class="hljs"><code> <span class="hljs-selector-tag">java</span> −<span class="hljs-selector-tag">Xmx2000M</span> −<span class="hljs-selector-tag">jar</span> <span class="hljs-selector-tag">astral</span><span class="hljs-selector-class">.4</span><span class="hljs-selector-class">.9</span><span class="hljs-selector-class">.1</span><span class="hljs-selector-class">.jar</span> −<span class="hljs-selector-tag">i</span> <span class="hljs-selector-attr">[BS-replicates]</span> −<span class="hljs-selector-tag">q</span> <span class="hljs-selector-attr">[SPECIES TREE]</span> −<span class="hljs-selector-tag">t</span> 5</code></pre><h2 id="datasets"><a class="header-link" href="#datasets"></a>Datasets</h2>
<p>There are mainly two simulated datasets that we used in this paper.</p>
<ol class="list">
<li><a href="https://drive.google.com/open?id=0B16sMwDmKEuuUnkzd3EzVlZhcDA">ASTRALII
dataset</a>.
This dataset contains estimated gene trees, true gene trees, true
species trees, and infered species trees with ASTRAL, RAxML, and NJST.
The branch length estimations available at
<a href="https://drive.google.com/open?id=0B16sMwDmKEuuRElQSzBxaGFPZEU">ASTRALII-BL</a>.
Also the posterior probability estimate files available at
<a href="https://drive.google.com/open?id=0B16sMwDmKEuuYlZwSWRyaTkxMU0">ASTRALII-PP</a>\</li>
<li><a href="https://drive.google.com/open?id=0B16sMwDmKEuuWWZ3a2tnN3hkdkE">avian
dataset</a>.
This dataset contains true gene trees, estimated gene trees, true
species trees, and estimated species trees. It also contains bootstrap
replicates and the branch lenghths and posterior probability estimation
files. The results' csv files are available at
<a href="https://drive.google.com/open?id=0B16sMwDmKEuuWk00bmRaczN2clE">results</a></li>
</ol>
<h3 id="biological-datasets"><a class="header-link" href="#biological-datasets"></a>Biological datasets</h3>
<p>In this paper, we analyzed 4 different biological datasets. The results
and the datasets are available at <a href="https://drive.google.com/open?id=0B16sMwDmKEuucTV4VnpTVlR1dXM">Biological
dataset</a>:</p>
<ol class="list">
<li>Avian biological dataset of Jarvis et. al. available at paper:
Jarvis, Erich D., et. al. "Phylogenomic analyses data of the avian
phylogenomics project." GigaScience 4.1 (2015): 1-9.</li>
<li>The dataset analyzed by Xi et. al. available at paper:
Xi, Zhenxiang, Liang Liu, Joshua S Rest, and Charles C. Davis.
“Coalescent versus Concatenation Methods and the Placement of Amborella
as Sister to Water Lilies.” Systematic Biology 63, no. 6 (November 1,
2014): 919–932. doi:10.1093/sysbio/syu055.</li>
<li>The 1KP dataset analyzed by Naim Matasci et. al. published at:
Matasci, N., Hung, L.H., Yan, Z., Carpenter, E.J., Wickett, N.J.,
Mirarab, S., Nguyen, N., Warnow, T., Ayyampalayam, S., Barker, M. and
Burleigh, J.G., 2014. Data access for the 1,000 Plants (1KP) project.
GigaScience, 3(1), pp.1-10.</li>
<li>The dataset analyzed by Prum et. al. available at:
Prum, R.O., Berv, J.S., Dornburg, A., Field, D.J., Townsend, J.P.,
Lemmon, E.M. and Lemmon, A.R., 2015. A comprehensive phylogeny of birds
(Aves) using targeted next-generation DNA sequencing. Nature.</li>
</ol>
<h2 id="report-bugs"><a class="header-link" href="#report-bugs"></a>report-bugs</h2>
<p>contact <a href="https://github.com/smirarab">Siavash Mirarab</a>
( smirarab at ucsd dot edu ), or [Erfan Sayyari] (<a href="https://github.com/esayyari">https://github.com/esayyari</a>)
( esayyari at eng dot ucsd dot edu ).</p>
<p>This page was generated by <a href="https://pages.github.com">GitHub Pages</a> </p>
</article>
</body>
</html>