Only build the graph on the batch_worker for non complete tasks. #185

MarcelHoh · 2023-02-07T11:07:59Z

I had an issue where the batch worker seemed to get stuck in infinite loops of task requirements (at least I saw it loop over 1+ million tasks before killing the jobs. This is quite strange as such loops are not present within the pipeline and the actual process call is able to build the graph without issue.

While I haven't fully solved the issue yet, I found setting the batch worker to not check dependents of completed tasks sped up the task processing and at least for now avoided the above issue. I'm still investigating my original issue but this change might be more broadly beneficial.

Only build the graph on the batch_worker for non complete tasks.

codecov-commenter · 2023-02-07T11:11:14Z

Codecov Report

Base: 59.20% // Head: 59.20% // No change to project coverage 👍

Coverage data is based on head (bbf9278) compared to base (1ff3782).
Patch coverage: 100.00% of modified lines in pull request are covered.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #185   +/-   ##
=======================================
  Coverage   59.20%   59.20%           
=======================================
  Files          23       23           
  Lines        1554     1554           
=======================================
  Hits          920      920           
  Misses        634      634

Impacted Files	Coverage Δ
b2luigi/__init__.py	`88.46% <100.00%> (ø)`
b2luigi/cli/runner.py	`43.75% <100.00%> (ø)`

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

MarcelHoh · 2023-02-07T12:41:29Z

Sorry, let me open a new PR with a better approach.

meliache · 2023-02-07T12:52:36Z

Hmnnn, I can't comment on your infinite loop issue without a minimal reproducible example. Regarding the code change for the run_as_batch_worker, I'm not eager to accept that without understanding why the existing option only_non_complete=True wasn't used in the first place, whether it was an oversight by @nils-braun or it's just a workaround for a bug that arises in your specific use-case but not in the general case. I also don't have the best understanding of the b2luigi core but run_as_batch_worker is as far as I understand the function that executes usually a single task on a remote batch worker in a separate remote b2luigi process that has been spawned with the --batch-runner cli argument. Therefore, usually, I would expect that run_as_batch_worker typically just receives a single task and it shouldn't know about the completion status of that task as it's a separate luigi process. So I'm confused how using only_non_complete=True would help there except something is horribly wrong, so I'm confused.

While writing this message I saw you closed the PR and announced you will do a new one. Just note I don't really have time for working on b2luigi due to a paper and my thesis deadline in a couple of weeks, so it might take until a weekend or a couple of weeks until I have time to properly review it.

Maybe the new option cache_task_completion added in spotify/luigi#3178 and released in Luigi 3.1.1 might help with performance. You could try that option by updating to the latest luigi release and then setting that option in the worker configuration.

cache_task_completion
By default, luigi task processes might check the completion status multiple times per task which is a safe way to avoid potential inconsistencies. For tasks with many dynamic dependencies, yielded in multiple stages, this might become expensive, e.g. in case the per-task completion check entails remote resources. When set to true, completion checks are cached so that tasks declared as complete once are not checked again. Defaults to false.

This is not a b2luigi setting, but a general luigi setting and thus should be set in a luigi.cfg or luigi.toml file, based on certain environment variables, so please see the Luigi configuration documentation on how to do that. For example, you could export the environment variables in your .bashrc

export LUIGI_CONFIG_PARSER=toml
export LUIGI_CONFIG_PATH=$HOME/luigi.toml

and create a file $HOME/luigi.toml with the contents

[worker]
cache_task_completion = true

MarcelHoh · 2023-02-07T13:11:50Z

Thanks for the quick feedback, and all the best for your submission ! I suspect only_non_complete was not enabled as this avoids the cost of checking whether each task is complete. I think therefore it makes sense to leave this as False as it stands currently. Let me continue discussion on the new PR as I see you are already reviewing it!

Update runner.py

bbf9278

Only build the graph on the batch_worker for non complete tasks.

github-actions bot added the needs changelog Entry to CHANGELOG.md is missing label Feb 7, 2023

MarcelHoh closed this Feb 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Only build the graph on the batch_worker for non complete tasks. #185

Only build the graph on the batch_worker for non complete tasks. #185

MarcelHoh commented Feb 7, 2023

codecov-commenter commented Feb 7, 2023

MarcelHoh commented Feb 7, 2023

meliache commented Feb 7, 2023

MarcelHoh commented Feb 7, 2023

Only build the graph on the batch_worker for non complete tasks. #185

Only build the graph on the batch_worker for non complete tasks. #185

Conversation

MarcelHoh commented Feb 7, 2023

codecov-commenter commented Feb 7, 2023

Codecov Report

MarcelHoh commented Feb 7, 2023

meliache commented Feb 7, 2023

MarcelHoh commented Feb 7, 2023