对于非常近缘的物种,有时候会出现基因树和物种树不一致的情况,这有时候也可以作为一种物种形成过程中伴有基因流的参考。在构建木榄属系统发育树的过程中,采用了将B. exaristata的reads比对到木榄或者海莲基因组上,手动提取cds序列并转换为protein序列,之后找到对应的参考基因组基因所在的orthogroup,手动加入Bex的基因画树的方法。遇到的问题是最终的物种树Bex总和用来当做参考基因组的物种聚在一起。所以这里检查一下物种树和基因树的一致性。
参考基因家族聚类+系统发育树构建的pipeline,考虑到和物种树构建采用相同的数据集,所以采用gblocks删除gap之后的序列。
# work directory: gblocksgbs
perl gb2phy.pl
mkdir phys
mv *phy phys
mv phys ..
cd ../phys
perl raxmlshell.pl
mkdir besttrees bootstraps tbes fbps mltrees logs starttrees rbas bestmodels reducedphys raxmlngout
mv *bestTree besttrees && mv *rba rbas && mv *log logs && mv *mlTrees mltrees && mv *TBE tbes&& mv *FBP fbps && mv *bestModel bestmodels && mv *reduced.phy reducedphys && mv *startTree starttrees && mv *bootstraps bootstraps
mv starttrees/ reducedphys/ best* rbas/ logs/ tbes/ fbps/ bootstraps/ raxmlngout/
mv raxmlngout .. && cd ../raxmlngout
my @fas=glob("*gb");
foreach my $fa (@fas) {
(my $out=$fa)=~s/\-g/\.phy/;
`sh ../../scripts/convertFasta2Phylip.sh $fa >$out`;
}
my @phys=glob("*phy");
foreach my $phy (@phys) {
(my $prefix=$phy)=~s/\.phy//;
`raxml-ng --all -msa $phy --model GTR+I+G --prefix $prefix --seed 2 --threads 1 --bs-metric fbp,tbe --bs-trees 100`;
}
由于raxml-ng输出的是无根树,所以这里要用notung软件先做reroot。
==注:notung不改变输入的树形,所以不用担心做reroot改变内部节点的改变==
cd besttrees
mkdir rawtrees
mv pal* rawtrees
cd rawtrees
perl rename.pl
mkdir renamedtrees && mv OG* renamedtrees && mv renamedtrees .. && cd ../renamedtrees
echo ../species.tree >batch.txt && ls OG* >>batch.txt
java -jar -Xmx8g -XX:ParallelGCThreads=4 ~/software/Notung-2.9.1.5.jar -b batch.txt --root --speciestag postfix --edgeweights name --treeoutput nhx --nolosses --outputdir ../reroot/
cd ../reroot && perl extract_topo.pl >../topo.txt
mkdir topology && mv topo.txt topology && cd topology
(Ptr,(Clo,(Rap,(Bpa,(Bcy,(Bse,(Bex,Bgy)))))));
my @trees=glob("pal*");
foreach my $tree (@trees) {
$tree=~m/\_(\d+)\./;
my $OG=$1;
$OG="OG$OG";
($out=$tree)=~s/pal2aln\_/OG/;
open IN,"<$tree";
open OUT,">$out";
while (<IN>) {
$_=~s/([a-zA-Z]{3})/$OG\_$1/g;
print OUT "$_";
}
}
my @trees=glob("OG*");
foreach my $tree (@trees) {
open TREE,"<","$tree";
while (<TREE>) {
s/\:.*?\[.*?\]//g;
s/\[.*?\]//g;
s/OG\d+\_//g;
s/n\d+//g;
s/\:\d+\.\d+//g;
print "$tree\t$_";
}
}