This issue was moved to a discussion.
Decision: Parallel backend and pipeline tool #221
Comments
After some time looking at this and f2f discussion, I now favour using either Upsides
Downsides
Thoughts, @zsusswein?
Yeah, I can see the arguments for this. I would note, though, that one of the main issues is not lack of cloud support in In terms of the DAG, I see the manual DAG construction you do in I think for now we should press on, but plan to refactor as a demonstration project that we can pitch if/when there is interest, or if there is interest from others (i.e. @zsusswein).
A few quick thoughts:
|
I strongly agree, and if we could use anything other than Azure Batch we would be able to continue all in Julia, but alas. That being said, I can see an argument for a standard approach to pipelining, and that might as well be very fully featured (i.e. some of the above options).
Noting that containers in Julia look quite trivial: https://discourse.julialang.org/t/recommended-recipe-for-deploying-a-julia-app-in-docker-with-efficient-precompilation/95591/2. I would also say that most good pipeline tools abstract away exactly where a specific task is actually run, so I would lightly push back against this, or perhaps rephrase it to: it's key to think about how the tools you are using would require you to do this.
Currently the latter. The nice thing about |
Shall we move this to discussion? |
yes |
We need to decide which pipelining tool and computational backends we are going to use.
At the moment we have a partial Dagger.jl implementation, but it is unclear how we can scale this across different types of compute (i.e. local, connections via ssh, Slurm, and Azure Batch). It is also unclear what kind of pipeline tooling Dagger.jl offers (task graph, progress monitoring, etc.). See here for an issue looking into some of this: JuliaParallel/Dagger.jl#512
Another option is the JobSchedulers.jl and Pipelines.jl ecosystem, which would be closer to a traditional command-line-based workflow. However, it is currently unclear whether these packages support non-local compute. See here for an issue on this: cihga39871/JobSchedulers.jl#15
A final alternative is to take a mixed approach of the two, or to take a simpler approach that uses more of the standard base Julia tooling.
Making a decision is not time critical, as we can use local compute for our current pipeline, but as an example of Julia best practices we want a clear steer on scalable approaches for the future.
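To make the question concrete, here is a rough sketch of defining a pipeline once as a task graph while keeping the compute backend swappable. It is written in Python purely as a language-neutral illustration and does not use or represent the API of Dagger.jl, JobSchedulers.jl, or Pipelines.jl. The names `run_pipeline`, `tasks`, and `deps` are hypothetical, and the thread pool stands in for whichever backend (local, ssh, Slurm, Azure Batch) is eventually chosen.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, deps, executor: Executor):
    """Run a small task graph on any concurrent.futures-style executor.

    tasks: task name -> callable; it receives the results of its
           upstream tasks as positional arguments, in `deps` order.
    deps:  task name -> list of upstream task names (hypothetical format).
    """
    # Every task gets an entry, so tasks without dependencies are included.
    graph = {name: deps.get(name, []) for name in tasks}
    futures = {}
    # static_order() yields a task only after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        # Wait for upstream results, then hand the task to the backend.
        # This simple loop is correct but does not maximise parallelism.
        upstream = [futures[dep].result() for dep in graph[name]]
        futures[name] = executor.submit(tasks[name], *upstream)
    return {name: future.result() for name, future in futures.items()}


# Toy pipeline: load -> double -> total
tasks = {
    "load": lambda: [1, 2, 3],
    "double": lambda xs: [2 * x for x in xs],
    "total": lambda xs: sum(xs),
}
deps = {"double": ["load"], "total": ["double"]}

# The pipeline definition above never mentions where tasks run;
# only this line chooses the backend.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = run_pipeline(tasks, deps, pool)

print(results["total"])
```

The property to look for in whichever tool we pick is the same: the graph definition stays fixed and only the executor changes when moving between types of compute.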