How h1s works

For a detailed description of the h1s algorithm, please refer to the h1s paper (in preparation). What follows is only a quick-and-dirty explanation of how h1s works.

Simple h1s workflow

In general, h1s consists of three main steps: (1) core group compilation, (2) ortholog search, and (3) FAS score calculation.

Core group compilation

First, h1s searches for orthologs of the seed sequence in all taxa within the blast_dir folder (--blastpath). The reference species of the seed sequence must also be present in this core taxon list. Depending on the user-specified settings, h1s tries to compile a core ortholog group for the seed with n+1 sequences (n is set with the option --coreOrth; the default is 5) while maximizing the taxonomic diversity of the core group within the range between the specified minimum and maximum ranks (options --minDist and --maxDist; the defaults are genus and kingdom, respectively). The resulting core ortholog group is saved in the core_orthologs folder (--hmmpath).

In this step, h1s also uses FAS scores to choose the best candidate to add to the core group. This FAS score evaluation is not applied if the user sets the option --fasoff.
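To make the idea of picking taxa within a rank window more concrete, here is a toy sketch in Python. It is not the actual h1s procedure: the rank list, the function name pick_core_taxa, the candidate format (taxon name plus the rank of its lowest common ancestor with the reference species) and the two-pass greedy strategy are all assumptions made for illustration, and FAS scoring is ignored entirely.

```python
# Toy illustration only: this is NOT the h1s implementation.
# Assumed rank order, from closest to most distant relationship.
RANKS = ["species", "genus", "family", "order",
         "class", "phylum", "kingdom", "superkingdom"]


def pick_core_taxa(candidates, n=5, min_dist="genus", max_dist="kingdom"):
    """Pick up to n taxa whose distance to the reference lies in the rank window.

    candidates: list of (taxon_name, rank) pairs, where rank is the
    (assumed) rank of the lowest common ancestor shared with the
    reference species of the seed.
    """
    lo, hi = RANKS.index(min_dist), RANKS.index(max_dist)
    in_window = [(name, RANKS.index(rank)) for name, rank in candidates
                 if lo <= RANKS.index(rank) <= hi]
    # Most distant taxa first; sorted() is stable, so ties keep input order.
    ordered = sorted(in_window, key=lambda item: item[1], reverse=True)

    picked, seen_ranks = [], set()
    # First pass: one taxon per distinct rank, to spread picks across ranks.
    for name, rank_idx in ordered:
        if rank_idx not in seen_ranks and len(picked) < n:
            picked.append(name)
            seen_ranks.add(rank_idx)
    # Second pass: fill any remaining slots with the leftover taxa.
    for name, _ in ordered:
        if len(picked) >= n:
            break
        if name not in picked:
            picked.append(name)
    return picked


# Hypothetical candidate taxa "A" to "F"
example = [("A", "species"), ("B", "genus"), ("C", "genus"),
           ("D", "family"), ("E", "kingdom"), ("F", "superkingdom")]
print(pick_core_taxa(example, n=3))
```

With the default window (genus to kingdom), taxa "A" and "F" fall outside the allowed range and are never selected. In this sketch the seed itself would be the extra sequence that brings the group to n+1 members, which is an interpretation rather than something stated above.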

Ortholog search

Once the core ortholog group of the seed gene is available, h1s uses its profile HMM to find orthologs in the search taxa, which are all taxa in the genome_dir folder (--searchpath). The main output of this step is a multi-FASTA file (jobID.extended.fa), which starts with the seed sequence, followed by all ortholog sequences found.

If the option --fasoff is used, the last step is skipped, and h1s creates another output file called jobID.phyloprofile, which can be used as input to the PhyloProfile tool for further phylogenetic analysis.

FAS score calculation

If --fasoff is not set, h1s calculates FAS scores based on the jobID.extended.fa file. The hamstrFAS function of the FAS tool is applied to compare the feature architecture of the seed protein with those of all other sequences in jobID.extended.fa. The outputs of this step are jobID.phyloprofile, jobID_forward.domains and jobID_reverse.domains.
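As a quick summary of which files to expect with and without --fasoff, the following Python sketch lists the output names described above. The helper name expected_outputs is hypothetical, and it assumes that "jobID" in the file names is simply replaced by the name of your job.

```python
def expected_outputs(job_id, fasoff=False):
    """Return the output file names described above for a given job.

    Assumption: "jobID" in the documented file names stands for the
    user's job name and is substituted literally.
    """
    # Produced in both modes: ortholog sequences and the phylogenetic profile.
    outputs = [f"{job_id}.extended.fa", f"{job_id}.phyloprofile"]
    if not fasoff:
        # Only produced when the FAS score calculation is run.
        outputs += [f"{job_id}_forward.domains", f"{job_id}_reverse.domains"]
    return outputs


print(expected_outputs("myjob"))               # FAS calculation enabled
print(expected_outputs("myjob", fasoff=True))  # FAS calculation skipped
```

Comparing this list with the files actually present after a run can help spot a job that stopped before the FAS step.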

Because hamstrFAS takes the first sequence in the jobID.extended.fa file as the seed protein, check that jobID.extended.fa is as expected if you encounter any strange FAS results.
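One way to do this check is to look at the first record of the file. The sketch below uses only the Python standard library; the function name first_fasta_record is hypothetical, and it assumes a plain FASTA layout (header lines starting with ">", sequence possibly wrapped over several lines). It makes no assumption about how h1s formats the header itself.

```python
def first_fasta_record(path):
    """Return (header, sequence) of the first record in a FASTA file."""
    header, seq_lines = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if header is not None:
                    break  # second record reached, stop reading
                header = line[1:]
            elif header is not None:
                seq_lines.append(line)
    return header, "".join(seq_lines)
```

For example, calling first_fasta_record("jobID.extended.fa") and comparing the returned header and sequence with your seed confirms whether the seed protein really comes first in the file.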