Add worker config to cache task completion results. #3178
Conversation
I understand the purpose of this change and I think it's great! However, I'm a bit confused by some of the unittest assertions. Hoping to get a little clarity in my understanding of those values.
```python
self.assertEqual(a10.complete_count, 5)
self.assertEqual(a11.complete_count, 4)
self.assertEqual(a12.complete_count, 3)
```
I'm a little unclear why these `complete_count` values for each of the tasks differ in quantity. Could you clarify that for me?
Similarly, I'm confused as to why the assertions below (with `check_complete_on_run=True`) resulted in larger `complete_count` quantities.
> I'm a little unclear why these `complete_count` values for each of the tasks differ in quantity. Could you clarify that for me?
Sure!
The complete count with `cache_task_completion=False` of (5, 4, 3) for (`a10`, `a11`, `a12`) is due to luigi's assumption of idempotence of `run()` methods that yield dynamic dependencies: the worker invokes `run()` and, in case it's a generator, gets the next result and

- if it's a bunch of already complete tasks (some of the completeness checks happen here), it gets the next generator result, or
- if it's a bunch of tasks of which at least one is not complete yet, it adds all of them to the tree and forgets about the state of the generator.

(code here) The yielding task is placed back into the tree in PENDING state, too. When it's started again later on, the entire procedure is triggered again, leading to a new generator in its initial state, but now with the previously incomplete bunch being complete. Therefore, completion checks of tasks in a certain bunch are always performed at least once more than those in the next bunch.
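The restart-and-recheck behavior can be illustrated with a toy model (hypothetical names, not luigi's actual worker code). It reproduces the "each bunch is checked once more than the next" pattern, though the absolute counts differ from the real test because luigi performs additional checks elsewhere:

```python
class FakeTask:
    """Stand-in for a task that counts its completion checks."""

    def __init__(self, name):
        self.name = name
        self.done = False
        self.complete_count = 0

    def complete(self):
        self.complete_count += 1
        return self.done


def run_to_completion(make_gen, cache_task_completion=False):
    """Restart the run() generator until every yielded bunch is complete."""
    cache = set()  # tasks whose successful complete() result is cached

    def is_complete(task):
        if cache_task_completion and task in cache:
            return True
        ok = task.complete()
        if cache_task_completion and ok:
            cache.add(task)
        return ok

    while True:
        restarted = False
        for bunch in make_gen():            # fresh generator on every restart
            if all(is_complete(t) for t in bunch):
                continue                    # bunch complete: advance generator
            for t in bunch:                 # "run" the incomplete bunch, then
                t.done = True               # forget the generator's state
            restarted = True
            break
        if not restarted:
            return


a10, a11, a12 = FakeTask("a10"), FakeTask("a11"), FakeTask("a12")
run_to_completion(lambda: iter([[a10], [a11], [a12]]))
print(a10.complete_count, a11.complete_count, a12.complete_count)  # 4 3 2
```

With `cache_task_completion=True` in the same model, each task's `complete()` is called at most twice, since successful results are served from the cache on later restarts.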
> Similarly, I'm confused as to why the assertions below (with `check_complete_on_run=True`) resulted in larger `complete_count` quantities.
With `check_complete_on_run=True`, there is a single additional call happening here which increases the counts. I wanted to check whether that's consistent with the proposed changes, so I added this block to the same test.
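The caching idea itself can be sketched as a small wrapper (a hypothetical helper, not the PR's actual implementation): only a successful `complete()` result is remembered, so later checks are skipped while unsuccessful results are always re-checked.

```python
import functools


def cached_complete(complete_fn):
    """Wrap a completion check so that a True result is cached."""
    state = {"done": False}

    @functools.wraps(complete_fn)
    def wrapper():
        if state["done"]:
            return True           # cached success: skip the expensive check
        ok = complete_fn()
        if ok:
            state["done"] = True  # cache only successful results
        return ok

    return wrapper


calls = []


def expensive_complete():
    """Pretend remote check: becomes complete on the second invocation."""
    calls.append(1)
    return len(calls) >= 2


check = cached_complete(expensive_complete)
print([check() for _ in range(5)])  # [False, True, True, True, True]
print(len(calls))                   # 2: later checks hit the cache
```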
Oh, I see now. This makes much more sense! Thank you for the thorough response!
Code changes, documentation additions, and unittest coverage LGTM!
Thanks for your contribution!
Description

This PR adds a new worker config `cache_task_completion` which, when enabled (default is `False`, of course), leads to tasks' successful `complete()` results being cached by the worker. In short, this can save resources in case tasks have a large number of dynamic dependencies that are yielded in several stages (see below), and the tasks' `complete()` calls are somewhat expensive (e.g. when file targets are remote).

Motivation and Context
We sometimes have thousands of tasks being yielded through dynamic dependencies, and we only use files in remote locations. Therefore, we want to limit the number of API calls and would really benefit from this option. Also, in our case we are sure that once a task is complete, it never toggles back to being incomplete, so this kind of caching is 100% safe (and I'm sure the opposite is rather rare). Example:
When running, `deps_1` are yielded twice, and the completion checks are triggered multiple times:

1. when `yield`ed the first time and reaching this point,
…
5. when `yield`ed the second time at the same position as 1.

And for every additional round of yielding, there will be another check (same as 4.). With the new config enabled, the checks would only run for 1. and 2.
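Assuming luigi's usual INI-style configuration, enabling the proposed option would look like this sketch:

```ini
; hypothetical luigi.cfg snippet: enable completion-result caching
; for the worker (proposed by this PR; default is false)
[worker]
cache_task_completion = true
```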
Have you tested this? If so, how?

I added a test to `worker_test.py` that checks the number of `complete()` calls of tasks being yielded as dynamic dependencies, with and without the new config. The documentation is updated as well.
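A minimal sketch of the counting pattern such a test can rely on (a hypothetical task double, not the actual test code): `complete()` tallies every invocation, so assertions can compare check counts with and without completion caching.

```python
class CountingTask:
    """Task double whose complete() counts how often it is checked."""

    def __init__(self):
        self.complete_count = 0  # how many times completion was checked
        self.done = False        # flipped once the task has "run"

    def complete(self):
        self.complete_count += 1
        return self.done


task = CountingTask()
task.complete()             # first check: still incomplete
task.done = True            # simulate the task finishing its run
task.complete()             # second check: now complete
print(task.complete_count)  # 2
```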