Are there HOG alignments and gene trees in OrthoFinder output? #943

Open
000generic opened this issue Nov 18, 2024 · 2 comments

000generic commented Nov 18, 2024

Hi!

I am interested in exploring alignments and gene trees of the HOG orthogroups produced by OrthoFinder to evaluate / spot check OrthoFinder output. I am also interested in using HOG alignments and gene trees for species tree generation with other tools. It seems that HOG orthogroup alignments and trees are not available in the OrthoFinder output, whereas OG orthogroup alignments and trees are. Is this correct?

As further background:

My understanding of OrthoFinder OG vs HOG orthogroups is that OGs are produced first in the pipeline, and that a given OG may include two paralogous gene families due to over-clustering by OrthoFinder. HOGs are produced at a later stage, where OrthoFinder goes back, identifies over-clustered OG orthogroups, and splits them into separate, fully orthologous orthogroups. This updated set of orthogroups (HOGs) consists of 1) OG orthogroups that were clustered correctly in the first place plus 2) OG orthogroups that were over-clustered and then split.

HOG orthogroups are found in the N0.tsv file and, as I understand it, they are recommended for phylogenetic analysis, as they are the highest-quality OrthoFinder orthogroups, containing gene families of strictly orthologous genes.

Going through the output of OrthoFinder2 and now the new OrthoFinder3, I can locate OG orthogroup alignments in the WorkingDirectory and OG orthogroup trees in the Resolved_Gene_Trees directory. I am not locating any alignments or gene trees for the HOG orthogroups. Is this correct: no HOG orthogroup alignments or trees are produced by OrthoFinder because of how the pipeline works? If so, would it be possible to include an option to have them produced in the future?

Thank you very much :) Eric

@lauriebelch

Hi Eric,

This will be changing in the full release of OrthoFinder3 (out in the next few months!) - we will be reporting gene trees, alignments, sequence files etc. for the N0 hierarchical orthogroups.

In the meantime, these can be made quite easily if you need a specific one - the N0.tsv file tells you which node of the gene tree the HOG orthogroup comes from, and you can use this info to trim the tree. You can also use the identity of the genes to trim the alignment and sequence files.
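As a rough illustration of the alignment part of this suggestion, here is a minimal Python sketch that reads the gene list for one HOG from N0.tsv and keeps only those sequences from the parent OG alignment, then drops columns that have become all-gap. It is a sketch under assumptions, not OrthoFinder's own method: it assumes N0.tsv is tab-separated with the columns `HOG`, `OG` and `Gene Tree Parent Clade` followed by one column per species holding comma-separated gene names, and that the FASTA headers in the alignment match those gene names exactly (alignments written with OrthoFinder's internal sequence IDs would need mapping first). The function names and the demo data are made up for the example.

```python
import csv
import io

# Assumed non-species columns in N0.tsv; every other column is treated as a species.
META_COLUMNS = {"HOG", "OG", "Gene Tree Parent Clade"}


def hog_genes(n0_handle, hog_id):
    """Return the set of gene names listed for one HOG in an N0.tsv-style table."""
    reader = csv.DictReader(n0_handle, delimiter="\t")
    for row in reader:
        if row.get("HOG") != hog_id:
            continue
        genes = set()
        for column, cell in row.items():
            # Skip metadata columns, overflow fields (key None) and empty cells.
            if column is None or column in META_COLUMNS or not cell:
                continue
            genes.update(g.strip() for g in cell.split(",") if g.strip())
        return genes
    raise KeyError(f"{hog_id} not found")


def read_fasta(handle):
    """Parse FASTA into {name: sequence}; the name is the first word of the header."""
    chunks = {}
    name = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None and line:
            chunks[name].append(line)
    return {n: "".join(parts) for n, parts in chunks.items()}


def subset_alignment(alignment, genes):
    """Keep only the requested sequences and remove columns that are gaps in all of them."""
    kept = {n: s for n, s in alignment.items() if n in genes}
    if not kept:
        return {}
    length = len(next(iter(kept.values())))
    keep_cols = [i for i in range(length) if any(s[i] != "-" for s in kept.values())]
    return {n: "".join(s[i] for i in keep_cols) for n, s in kept.items()}


# Tiny made-up example: one OG split into two HOGs.
demo_n0 = io.StringIO(
    "HOG\tOG\tGene Tree Parent Clade\tspA\tspB\n"
    "N0.HOG0000000\tOG0000000\tn1\tspA_g1, spA_g2\tspB_g1\n"
    "N0.HOG0000001\tOG0000000\tn5\tspA_g3\t\n"
)
demo_aln = io.StringIO(">spA_g1\nMK-A\n>spA_g2\nMK-C\n>spB_g1\nMR-A\n>spA_g3\nMKTA\n")

trimmed = subset_alignment(read_fasta(demo_aln), hog_genes(demo_n0, "N0.HOG0000000"))
for name, seq in sorted(trimmed.items()):
    print(f">{name}\n{seq}")
```

The same gene set could be used to prune the OG gene tree with a tree library of your choice; that step is left out here to keep the sketch dependency-free.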


000generic commented Nov 20, 2024

That will be great to have included in the output! It is a good idea to leverage the gene tree node indicated in N0.tsv. However, alignments, and therefore the trees built from them, can be sensitive to which sequences are included - especially for orthologous but divergent sequences like the ones I am working with in deep evolution - so it could be better to do fresh alignments of any HOGs that result from an OG being split.

For now I am just redoing all HOGs from scratch to make sure the same settings are used in MAFFT and FastTree for all sequences - or is there a way to know the specific settings used to build the OG alignments and trees within OrthoFinder? Then I could potentially run fewer HOG tree pipelines and just do the subset that results from an OG split.

Thank you!
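One way to find the subset of HOGs that come from a split OG is to count how many HOG rows each OG has in N0.tsv. The sketch below does that in Python. It assumes the same N0.tsv layout as above (tab-separated, with `HOG` and `OG` columns), and it treats "more than one HOG per OG" as the definition of a split; an OG represented by a single HOG that simply lost some genes would not be flagged and would need a gene-set comparison against the OG instead. The function name and demo rows are made up for the example.

```python
import csv
import io
from collections import defaultdict


def split_ogs(n0_handle):
    """Map each OG that appears in more than one N0.tsv row to its list of HOG IDs."""
    hogs_by_og = defaultdict(list)
    for row in csv.DictReader(n0_handle, delimiter="\t"):
        hogs_by_og[row["OG"]].append(row["HOG"])
    return {og: hogs for og, hogs in hogs_by_og.items() if len(hogs) > 1}


# Made-up rows: OG0000000 is split into two HOGs, OG0000001 is not split.
demo_n0 = io.StringIO(
    "HOG\tOG\tGene Tree Parent Clade\tspA\tspB\n"
    "N0.HOG0000000\tOG0000000\tn1\tspA_g1\tspB_g1\n"
    "N0.HOG0000001\tOG0000000\tn5\tspA_g2\tspB_g2\n"
    "N0.HOG0000002\tOG0000001\tn0\tspA_g3\tspB_g3\n"
)

for og, hogs in split_ogs(demo_n0).items():
    print(og, ", ".join(hogs))
```

Only the HOGs returned here would need a fresh alignment and tree under this assumption; the rest could reuse the existing OG files.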
