Execution of global section of task can fail on remote host #1155

Closed
BoPeng opened this issue Jan 4, 2019 · 10 comments


BoPeng commented Jan 4, 2019

[global]
parameter: something=5 
def a():
  print(something)

[default]
a()

is ok, but

[global]
parameter: something=5 
def a():
  print(something)

[default]
task:
a()

fails because parameters are not processed inside tasks and should have been passed to the task instead. However, SoS checks only the content of step default, which calls a() without mentioning something, so it does not see that the parameter is needed.
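
To illustrate why the dependency is missed, here is a minimal Python sketch. It assumes, purely for illustration, that the check collects only the names used directly in the step body; this is not SoS's actual implementation, and names_used is a hypothetical helper.

```python
import ast


def names_used(source: str) -> set:
    """Return every bare name referenced directly in the given source."""
    tree = ast.parse(source)
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


# Plain-Python equivalents of the global section and the step body above.
global_section = "def a():\n    print(something)\n"
step_body = "a()\n"

direct = names_used(step_body)         # names visible in the step itself
indirect = names_used(global_section)  # names only reachable through a()

print("step uses directly:", direct)
print("global section uses:", indirect)
```

The step body references only a, so a check limited to the step never learns that something must be passed along with the task.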


BoPeng commented Jan 4, 2019

This only fixes part of the problem. As stated here, the global section can fail if it is executed on the remote host. For example:

[global]
genes = read('whatever')
def func():
    pass

Suppose you load the gene list from a local file and submit the task to a remote host where whatever does not exist. SoS currently ignores this error, but func is then never defined because the global section fails before reaching its definition. A workaround might be:

[global]
parameter: genes=read('whatever')
def func():
   pass

because parameters are passed instead of processed in this case.
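
The failure mode can be reproduced in plain Python. This is a rough sketch only: exec stands in for however SoS runs the global section, and read is a hypothetical stand-in that always fails, as it would on a remote host without the file.

```python
def read(path):
    # Hypothetical stand-in for read(); the file is missing on the remote host.
    raise FileNotFoundError(path)


global_section = """
genes = read('whatever')
def func():
    pass
"""

namespace = {"read": read}
try:
    exec(global_section, namespace)
except FileNotFoundError:
    pass  # the error is ignored, as described above

# Execution stopped at the first statement, so the def was never reached.
func_defined = "func" in namespace
print("func defined:", func_defined)
```

Ignoring the error does not help, because everything after the failing statement is skipped as well.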

A real solution might be separating the parts that cannot be executed remotely:

[global]
if not_in_task:
    genes = read('whatever')
def func():
   pass
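
A small Python sketch of this guard follows. It is an illustration under assumptions: not_in_task is the hypothetical flag from the snippet above, not an existing SoS variable, and run_global, local_read and remote_read are made-up helpers.

```python
guarded_section = """
if not_in_task:
    genes = read('whatever')
def func():
    pass
"""


def local_read(path):
    # Hypothetical reader on the local machine, where the file exists.
    return ["geneA", "geneB"]


def remote_read(path):
    # Hypothetical reader on the remote host, where the file is missing.
    raise FileNotFoundError(path)


def run_global(in_task, reader):
    """Execute the guarded global section and return the resulting namespace."""
    namespace = {"read": reader, "not_in_task": not in_task}
    exec(guarded_section, namespace)
    return namespace


local_ns = run_global(in_task=False, reader=local_read)
remote_ns = run_global(in_task=True, reader=remote_read)

print("local has genes:", "genes" in local_ns)
print("remote has func:", "func" in remote_ns)
```

With the guard in place the remote run skips the file read but still defines func, at the cost of genes being unavailable inside the task.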

@BoPeng BoPeng changed the title Execution of global section fail in tasks due to parameter handling Execution of global section of task can fail on remote host Jan 4, 2019

BoPeng commented Jan 4, 2019

The problem with the patch 0a0e7bd is that when all parameters are passed as dependencies to every step, changing any parameter in the global section causes all steps to be re-executed. Something more clever has to be done.
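
The re-execution problem can be sketched with a toy signature scheme. Everything here is hypothetical (the step names, the uses table, and the hashing are illustrative and do not reflect how SoS computes signatures); it only shows the difference between hashing all parameters and hashing the ones a step uses.

```python
import hashlib
import json


def signature(step_source, params):
    """Hash a step's source together with the parameters it depends on."""
    payload = json.dumps({"source": step_source, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


# Hypothetical workflow: two steps, each using one parameter.
steps = {"step_10": "print(alpha)", "step_20": "print(beta)"}
uses = {"step_10": ["alpha"], "step_20": ["beta"]}  # assumed to be known


def signatures(params, only_used):
    result = {}
    for name, source in steps.items():
        relevant = {k: params[k] for k in uses[name]} if only_used else params
        result[name] = signature(source, relevant)
    return result


def changed_steps(before, after, only_used):
    old = signatures(before, only_used)
    new = signatures(after, only_used)
    return sorted(name for name in steps if old[name] != new[name])


before = {"alpha": 1, "beta": 2}
after = {"alpha": 1, "beta": 3}  # only beta changes

rerun_all_params = changed_steps(before, after, only_used=False)
rerun_used_params = changed_steps(before, after, only_used=True)
print(rerun_all_params, rerun_used_params)
```

The sketch takes the uses table for granted; working out which parameters a step really needs, including those reached through functions in the global section, is the hard part discussed in this issue.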


BoPeng commented Jan 4, 2019

Should we explicitly require the global section to be executable on the remote host? If the section reads a file, the user should be responsible for making sure the file is available... I think this is the cleanest solution.


gaow commented Jan 4, 2019

If the section reads a file, the user should be responsible for making sure the file is available...

I'm not sure I understand this ... is there a way to execute the global section only once, have the result pickled somewhere (and kept in sync), and load this pickled object for each step that follows? Or is this less efficient than the current implementation?


gaow commented Jan 4, 2019

And to clarify, the global section is executed at every step, not every substep, right?


BoPeng commented Jan 5, 2019

No. Things like import time cannot be pickled and passed around. I meant that if you have a local file and read it from the global section, you will have to transfer the file to the remote server if you execute any task there.
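
The pickling limitation is easy to check in plain Python. The sketch below assumes the global section's namespace would be pickled as a dictionary, which is a simplification of the proposal above.

```python
import pickle
import time

# A namespace left behind by a global section that ran "import time".
namespace = {"time": time, "something": 5}

try:
    pickle.dumps(namespace)
    picklable = True
except TypeError:
    # Module objects cannot be pickled, so the whole namespace fails.
    picklable = False

print("namespace picklable:", picklable)
```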


gaow commented Jan 5, 2019

I see. Right, we discussed that imports cannot be pickled. It is fair enough then to ask for the files required by the global section to be present.


BoPeng commented Jan 5, 2019

The scenario was something like reading a large number of loci and dispatching each locus to its own task, so asking every task to read the file and load all the loci does not make much sense. This is why I simply ignored errors from failed execution of the global section on the remote host.


BoPeng commented Jan 5, 2019

For sanity reasons I think we should require the global section to execute correctly on the master, but we will need to provide a way to ignore some statements.

parameter: loci_list = read(.....)

will serve the purpose but I am not sure we should advertise it.


BoPeng commented Feb 26, 2019

#1219

The global section will no longer be executed on the remote host...
