How h1s works
For a detailed description of the h1s algorithm, please refer to the h1s paper (in preparation). Here is just a quick-and-dirty explanation of how h1s works.
In general, h1s contains three main steps: (1) core group compilation, (2) ortholog search and (3) FAS score calculation.
First, h1s searches for orthologs of the seed sequence in all taxa within the blast_dir folder (--blastpath). The reference species of the seed sequence must also be present in this core taxon list. Depending on the user-specified settings, h1s will try to compile a core ortholog group for the seed with n+1 sequences (n is defined by the option --coreOrth, default value 5) while maximizing the taxonomic diversity of the core group within the range between the specified minimum and maximum ranks (options --minDist and --maxDist, by default genus and kingdom, respectively). The resulting core ortholog group is saved in the core_orthologs folder (--hmmpath).
In this step, h1s also uses FAS scores to choose the best candidate to add to the core group. This FAS score evaluation is not applied if the user sets the option --fasoff.
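The rank window and the FAS-based choice described above can be pictured with a toy sketch. This is not the actual h1s selection procedure: the rank list, the candidate tuples and the function names are all hypothetical, and the diversity maximization itself is left out. It only shows how a window between a minimum and a maximum rank limits the candidates, and how a score could then order them.

```python
# Toy illustration only; NOT the actual h1s algorithm.
# RANKS, the candidate tuple layout and both function names are assumptions.
RANKS = ["species", "genus", "family", "order",
         "class", "phylum", "kingdom", "superkingdom"]


def in_rank_window(rank, min_dist="genus", max_dist="kingdom"):
    """Return True if `rank` lies between min_dist and max_dist (inclusive)."""
    return RANKS.index(min_dist) <= RANKS.index(rank) <= RANKS.index(max_dist)


def pick_core_group(seed, candidates, core_orth=5,
                    min_dist="genus", max_dist="kingdom", use_fas=True):
    """Build a core group of at most core_orth + 1 sequence IDs.

    candidates: list of (sequence_id, rank, fas_score) tuples, where `rank`
    is the lowest rank the candidate's taxon shares with the reference
    species (an assumed representation of taxonomic distance).
    """
    eligible = [c for c in candidates if in_rank_window(c[1], min_dist, max_dist)]
    if use_fas:
        # Prefer candidates with a higher FAS score; skipped when FAS is off.
        eligible = sorted(eligible, key=lambda c: c[2], reverse=True)
    return [seed] + [c[0] for c in eligible[:core_orth]]


candidates = [
    ("a", "species", 0.99),       # closer than genus: outside the window
    ("b", "genus", 0.50),
    ("c", "family", 0.90),
    ("d", "kingdom", 0.70),
    ("e", "superkingdom", 1.00),  # beyond kingdom: outside the window
]
print(pick_core_group("seed", candidates, core_orth=2))
```

With use_fas=False the eligible candidates are simply taken in input order, which loosely mirrors skipping the FAS evaluation.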
Once the core ortholog group of the seed gene has been compiled, h1s uses its profile HMM to find orthologs in the search taxa, which are all taxa in the genome_dir folder (--searchpath). The main output of this step is a multi-FASTA file (jobID.extended.fa), which begins with the seed sequence, followed by all ortholog sequences found.
If the option --fasoff is used, the last step is skipped, and h1s creates another output called jobID.phyloprofile, which can be used as input for the PhyloProfile tool for further phylogenetic analysis.
If --fasoff is not set, h1s performs the FAS score calculation based on the jobID.extended.fa file. The hamstrFAS function of the FAS tool is applied to compare the feature architecture of the seed protein against all other sequences in jobID.extended.fa. The outputs of this step are jobID.phyloprofile, jobID_forward.domains and jobID_reverse.domains.
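As a quick sanity check after a run, the output files named above can be listed per mode. The sketch below is only a convenience helper, not part of h1s: the function names are hypothetical, and it assumes that all outputs of a job sit in one directory and that "jobID" is replaced literally by the job name.

```python
from pathlib import Path


def expected_outputs(job_id, fasoff=False):
    """File names described in the text for a run with or without --fasoff."""
    outputs = [f"{job_id}.extended.fa", f"{job_id}.phyloprofile"]
    if not fasoff:
        # The domain files are only produced by the FAS calculation step.
        outputs += [f"{job_id}_forward.domains", f"{job_id}_reverse.domains"]
    return outputs


def missing_outputs(out_dir, job_id, fasoff=False):
    """Return the expected file names that are not present in out_dir."""
    out_dir = Path(out_dir)
    return [name for name in expected_outputs(job_id, fasoff)
            if not (out_dir / name).exists()]


print(expected_outputs("myjob", fasoff=True))
```

Calling missing_outputs("some_dir", "myjob") on a finished run should return an empty list if every file described above was written there.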
Because hamstrFAS takes the first sequence in the jobID.extended.fa file as the seed protein, if you encounter any strange FAS results, check whether jobID.extended.fa is as expected.
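One way to run that check is to read the FASTA headers and confirm that the first record is the seed. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and because the exact header layout of jobID.extended.fa is not specified here, the ID is taken to be the first whitespace-delimited token of each header line.

```python
def fasta_ids(path):
    """Return the sequence IDs of a FASTA file in file order.

    The ID is assumed to be the first whitespace-delimited token
    after '>' on each header line.
    """
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                ids.append(header.split()[0] if header else "")
    return ids


def seed_is_first(path, seed_id):
    """True if the first record in the FASTA file has the given seed ID."""
    ids = fasta_ids(path)
    return bool(ids) and ids[0] == seed_id
```

For example, seed_is_first("jobID.extended.fa", "my_seed_id") returning False would suggest that the seed is missing or not at the top of the file, where "my_seed_id" stands for whatever header your seed sequence carries.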