
How to run these batch scripts? #2

Open
melop opened this issue Jul 24, 2018 · 9 comments


@melop

melop commented Jul 24, 2018

Hello,
I copied all .bf files to the "TemplateBatchFiles" folder under the hyphy 2.2.6 installation folder, and in the .bf scripts I changed the relative path of chooseGeneticCode.def to ./TemplateModels/chooseGeneticCode.def.
However, when I run the script, it just shows an error:
Error:
Could not find source dataset file:filepath Path stack: {/beegfs/group_dv/software/source/hyphy-2.2.6/installed/lib/hyphy/,/beegfs/group_dv/software/source/hyphy-2.2.6/installed/lib/hyphy/TemplateBatchFiles/}

Function call stack
1 : Read Data Set ds from file filepath

Segmentation fault

What is the correct way to run these files?

Best Regards,
Ray

@aartivnkt

Hi Ray,

Sorry about the delay in response, and the confusion surrounding getting HYPHY to run correctly. The issue seems to be with setting the paths correctly for your input files so HYPHY can find them.

Here is a simple example setup that you can follow to familiarize yourself with the path structure of HYPHY.

First, make sure you are in the hyphy directory. If hyphy was installed successfully (i.e., through make install), this is the directory where the HYPHYMP executable resides. Let's call this directory /path/hyphy (please replace /path with the actual path on your system).

Second, make a directory called test_run in /path/hyphy that contains the batch file you want to run and a test alignment file (e.g., the BranchSites_delta_null.bf batch file and the knownGene.uc001hmo.1.1.testHyPhyBS alignment file from GitHub).

Third, go to the test_run directory and run HYPHYMP from there.
cd /path/hyphy/test_run
../HYPHYMP BranchSites_delta_null.bf
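For reference, the three steps above can also be driven from a script. The following is a minimal Python sketch, assuming HYPHYMP sits directly in the hyphy directory and the batch file sits in test_run as described; `hyphy_command` and `run_batch` are hypothetical helper names, and `/path/hyphy` is the same placeholder used above.

```python
import subprocess
from pathlib import Path


def hyphy_command(hyphy_dir: str, batch_file: str) -> tuple[list[str], Path]:
    """Hypothetical helper: return (argv, working directory) equivalent to
    `cd <hyphy_dir>/test_run && ../HYPHYMP <batch_file>`."""
    hyphy = Path(hyphy_dir)
    # Absolute path to the executable, so it does not depend on how the OS
    # resolves a relative program name against the working directory.
    argv = [str(hyphy / "HYPHYMP"), batch_file]
    run_dir = hyphy / "test_run"
    return argv, run_dir


def run_batch(hyphy_dir: str, batch_file: str) -> subprocess.CompletedProcess:
    """Hypothetical helper: run the batch file with test_run as the working
    directory, so relative paths inside the .bf file start from there."""
    argv, run_dir = hyphy_command(hyphy_dir, batch_file)
    return subprocess.run(argv, cwd=run_dir, check=True)
```

Usage would be `run_batch("/path/hyphy", "BranchSites_delta_null.bf")`; setting `cwd` is what plays the role of the `cd` step in the shell version.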

Make sure the BranchSites_delta_null.bf file has the correct paths to the input alignment file and to chooseGeneticCode.def. If you followed this example, those paths should look like:

filepath = "knownGene.uc001hmo.1.1.testHyPhyBS";
LoadFunctionLibrary("../../hyphy/res/TemplateBatchFiles/TemplateModels/chooseGeneticCode.def", {"0" : "Universal"});

Note that the path in LoadFunctionLibrary is relative to the directory you run HYPHYMP from. So in our example set-up, from the test_run directory you go two levels up (to /path) and then back down into the hyphy directory ("../../hyphy").
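To make the relative-path arithmetic concrete, here is a small Python sketch that resolves the two paths above against the run directory with `posixpath.normpath`. It only models plain path arithmetic from the placeholder `/path/hyphy/test_run`; HyPhy's own lookup also consults the path stack shown in the original error message, so treat this as an illustration, not a description of HyPhy internals. `resolve_from` is a hypothetical helper name.

```python
import posixpath


def resolve_from(run_dir: str, relative_path: str) -> str:
    """Hypothetical helper: join a relative path onto the directory HYPHYMP
    is launched from and collapse the '..' components."""
    return posixpath.normpath(posixpath.join(run_dir, relative_path))


run_dir = "/path/hyphy/test_run"  # placeholder directory from the steps above
print(resolve_from(run_dir, "knownGene.uc001hmo.1.1.testHyPhyBS"))
# /path/hyphy/test_run/knownGene.uc001hmo.1.1.testHyPhyBS
print(resolve_from(run_dir, "../../hyphy/res/TemplateBatchFiles/TemplateModels/chooseGeneticCode.def"))
# /path/hyphy/res/TemplateBatchFiles/TemplateModels/chooseGeneticCode.def
```

As path arithmetic alone, `../res/TemplateBatchFiles/TemplateModels/chooseGeneticCode.def` resolves to the same file; the `../../hyphy` form only works if the directory is literally named `hyphy`.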

Hope this is clear, and you can get some examples to work correctly! Let me know if you have any additional issues with this.
Aarti

@aartivnkt

Also, regarding your question on specifying foreground and background branches:

The following line in BranchSites_delta.alt.bf specifies the foreground branch and needs to be edited to set the foreground branch of interest:

ExecuteCommands ("givenTree."+"hg18"+".nonSynRate:=omega_FG*givenTree."+"hg18"+".synRate;");

This line refers to the hg18 (human) branch in the example alignment file "knownGene.uc001hmo.1.1.testHyPhyBS". Please change both occurrences of "hg18" to the name of the foreground branch you are interested in to estimate omega_2 for it.

You would make the same change in BranchSites_delta.null.bf. Note that the foreground omega (omega_2) is only defined for the selection model.
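The edited line is plain string concatenation in HyPhy's batch language. The following Python sketch shows what that line expands to for a given branch name, and how the edited source line looks for a different branch. `foreground_constraint` and `foreground_line` are hypothetical helpers, and `panTro2` is only an illustrative name; use whatever label the branch carries in your tree.

```python
def foreground_constraint(branch: str) -> str:
    """Hypothetical helper: the statement ExecuteCommands evaluates, which
    constrains the branch's nonSynRate to omega_FG times its synRate."""
    return f"givenTree.{branch}.nonSynRate:=omega_FG*givenTree.{branch}.synRate;"


def foreground_line(branch: str) -> str:
    """Hypothetical helper: the ExecuteCommands source line from the .bf file
    with both "hg18" occurrences replaced by `branch`."""
    return (
        'ExecuteCommands ("givenTree."+"' + branch
        + '"+".nonSynRate:=omega_FG*givenTree."+"' + branch
        + '"+".synRate;");'
    )


print(foreground_constraint("hg18"))
# givenTree.hg18.nonSynRate:=omega_FG*givenTree.hg18.synRate;
print(foreground_line("panTro2"))  # "panTro2" is a hypothetical branch name
# ExecuteCommands ("givenTree."+"panTro2"+".nonSynRate:=omega_FG*givenTree."+"panTro2"+".synRate;");
```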

@melop
Author

melop commented Aug 1, 2018

Dear Aarti,

Congratulations on your paper and thank you for the detailed example!

I was confused because I thought the script accepted parameters from the command line. I am trying to run it on ~13,000 genes, so it's a bit difficult to edit the source file every time.

About specifying the foreground branch, what do I need to do if I want to specify a whole clade (including multiple tips and internal branches) instead of a single species as the foreground?

In CodeML you can mark the tree directly using "#1" and "$1" notations. Is it possible to implement a similar functionality in your scripts?

Alternatively, do I need to name all the branches in the tree, and refer to them by name in the "ExecuteCommands" statement?

Either way, is it possible to provide a working example?

Thanks for your help!

Ray

@aartivnkt

Hi Ray,

Sorry, right now we don't have the functionality for accepting arguments from the command line. The way around this is to write a script that programmatically changes the input file name and the foreground branch for each gene you are interested in. Also, we have only tested our batch file with a single-species foreground branch. The batch file we provide is a modification of YangNielsenBranchSite.bf, which is available with the hyphy package and is designed for this purpose.
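Along those lines, here is a minimal Python sketch for generating one batch file per gene from a template. It assumes the template contains a `filepath = "...";` line and the `hg18` ExecuteCommands line exactly as quoted earlier in this thread; `make_gene_bf`, `write_gene_bfs`, and the output file naming are hypothetical. It would be run once for the null template and once for the alternative template, and alignment paths written into each file are interpreted relative to the directory HYPHYMP is run from.

```python
import re
from pathlib import Path

# Matches a line like:  filepath = "knownGene.uc001hmo.1.1.testHyPhyBS";
FILEPATH_RE = re.compile(r'^([ \t]*filepath[ \t]*=[ \t]*)"[^"]*"[ \t]*;', re.MULTILINE)


def make_gene_bf(template: str, alignment: str, foreground: str,
                 old_foreground: str = "hg18") -> str:
    """Hypothetical helper: copy of the template with the alignment file
    and the foreground branch name replaced."""
    text, n = FILEPATH_RE.subn(lambda m: f'{m.group(1)}"{alignment}";', template, count=1)
    if n != 1:
        raise ValueError('template has no `filepath = "...";` line')
    out = []
    for line in text.splitlines(keepends=True):
        # Only rewrite the foreground constraint, not other mentions of the name.
        if "ExecuteCommands" in line and "omega_FG" in line:
            line = line.replace(f'"{old_foreground}"', f'"{foreground}"')
        out.append(line)
    return "".join(out)


def write_gene_bfs(template_path: str, genes: list[tuple[str, str]],
                   out_dir: str) -> list[Path]:
    """Hypothetical helper: write one .bf per (alignment, foreground) pair."""
    template = Path(template_path).read_text()
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for alignment, foreground in genes:
        path = target / f"{Path(alignment).name}.bf"
        path.write_text(make_gene_bf(template, alignment, foreground))
        written.append(path)
    return written
```

Regenerating files from an untouched template, rather than editing one file in place, keeps every gene's batch file independent of the previous run.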

Aarti

@melop
Author

melop commented Aug 1, 2018 via email

@TibisayEscalona

Dear Aarti

I read your paper "MNM cause false inferences of lineage-specific positive selection" and I wanted to apply the BS+MNM test of positive selection to my data using the batch file you provide in
https://github.com/JoeThorntonLab/MNM_SelectionTests.

I am having problems running the data with a newer version of HyPhy. My question is whether the batch file can only be run with HyPhy version 2.2.6, or whether the version does not matter and I am probably not following the right path. Unfortunately, I cannot install this version on our computers due to incompatibility.

Do you have any suggestions?

Best regards

Tibisay

@aartivnkt

Hi Tibisay,
I haven't tested our code on the new version of hyphy. Please follow instructions described in a previous response for 2.2.6:
#2 (comment)
Aarti

@TibisayEscalona

Hi Aarti

Thanks for your reply. I will try to follow what you suggest; hopefully it will work, and I will let you know. If it does not work, are you by chance planning to test your code on a newer version of HyPhy? Thanks

@TibisayEscalona

Hi Aarti

Sorry for the long message.

We managed to run the code on a newer version of HyPhy and also on an older version. Testing with your data, we noticed that the parameter values change slightly depending on the HyPhy version; however, it is difficult to know which version gives better values.

We are not sure whether the batch file runs only once or whether we need to run it 50 times to obtain the ML distribution and the median shown in Supplementary Figure 3. Or does a single run already represent the result of the 50 replicates, with the lnL value being the best fit among them? Please clarify.

From the results, we were not sure about the meaning of certain values, as we could not find any explanation of the output file produced after running the batch file. Could you please clarify where the branch lengths, the lnL, and the number of parameters (np) are reported?

For example, after running the BS+MNM null test once, the results showed:

global delta =value;
global kappa_inv=value;
and so on for all the branches and parameter values
.
.
.
Does each of these values represent the genome-wide median of that parameter, or do we need to run 50 replicates?

After the line that says:
Tree givenTree=
(((hg18: 0.001048216840877893, ... Are these the branch lengths?
Following that:
{{ ... This line seems to represent the values of the parameters mentioned above.
Following that:
{ ... In this line we think the first value is the lnL, then the number of parameters (np), then a number we don't understand, and then lots of zeros because it is the null model.

Just to check that we are following the correct steps:

  1. After obtaining the lnL from the BS+MNM null test, we need to compare it with the lnL of the BS null to see which has the better fit.

  2. To obtain the p-value for a particular gene in the BS+MNM test, the lnL from the BS+MNM null test is compared with the lnL from the BS+MNM alternative by performing a likelihood ratio test. Is this p-value then contrasted with the p-value for the gene from the BS test in PAML codeml?
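For step 2, a generic likelihood ratio test between two nested models can be sketched in Python. This is not the authors' code: it assumes the statistic 2 × (lnL_alt - lnL_null) is compared with a chi-square distribution with one degree of freedom (the alternative frees omega_2), which this thread does not confirm. For df = 1 the upper tail has the closed form erfc(sqrt(x / 2)), so only the standard library is needed. `lrt_df1` is a hypothetical name and the lnL values below are made up.

```python
import math


def lrt_df1(lnl_null: float, lnl_alt: float) -> tuple[float, float]:
    """Hypothetical helper: (LRT statistic, p-value), ASSUMING a
    chi-square null distribution with 1 degree of freedom."""
    # Clip small negative values caused by imperfect optimization to 0.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Chi-square(1) survival function: P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p = lrt_df1(lnl_null=-1000.0, lnl_alt=-998.0)  # made-up lnL values
print(stat, round(p, 4))  # 4.0 0.0455
```

If one instead uses the 50:50 mixture of a point mass at zero and chi-square(1) that is sometimes applied when the null fixes a parameter on a boundary, the p-value is half of the one returned here.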

Can this code be run on a concatenated data set, or only on individual genes?

Thanks


3 participants