Implement task queues for configurable task launch #398

Merged · 9 commits · Jul 16, 2023
Conversation

jpsamaroo
Member

To allow Dagger to fulfill more use cases, it would be helpful if we
could perform some basic transformations on tasks before they get
submitted to the scheduler. For example, we would like to enable
in-order execution semantics for regions of code that execute GPU
kernels, as this matches GPU synchronization semantics, and thus makes
it easier to upgrade GPU code to use Dagger.

Separately, to enable reliable DAG optimizations, we need to be able to
guarantee that a region of user code can be seen as a whole within the
scheduler. The one-at-a-time task submission semantics that we currently
have are insufficient to achieve this, as some tasks in the DAG region
of interest may already have been launched before the optimization can
see enough of the DAG to be useful.

To support these and other use cases, this commit adds a flexible
pre-submit task queueing system, as well as making it possible to add
additional tasks as synchronization dependencies (instead of the default
set from task arguments).

The task queueing system allows a custom task queue to be set in TLS,
which will be used by `@spawn`/`spawn` when submitting tasks. The task
queue is provided one or more task specifications and `EagerThunk`
handles, and is free to delay and/or batch task submission, as well as
to modify the task specification arbitrarily to match the desired
semantics. Task queues are nestable, and tasks submitted within sets of
nested task queues inherit the semantics of the queue that most directly
contains them (with further transformations occurring as tasks move
upward through the nest of queues). The most upstream task queue submits
tasks to the worker-1 eager scheduler, but this too is expected to be
flexible, to allow unique task submission semantics.
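With no custom queue installed, submission goes straight to the default eager scheduler. A minimal sketch of that baseline behavior:

```julia
using Dagger

# With no custom task queue set in TLS, `@spawn` submits the task
# directly to the eager scheduler on worker 1 and returns an
# `EagerThunk` handle immediately.
t = Dagger.@spawn 1 + 2

# `fetch` blocks until the task completes and returns its result.
fetch(t)
```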

To support the goal of predictable optimizations, a `LazyTaskQueue` is
added (available via `spawn_bulk`) which batches up multiple task
submissions into just one, and locks the scheduler until all tasks have
been submitted, allowing the scheduler to see the entire DAG structure
all at once. Nesting of `spawn_bulk` queues allows multiple DAG regions
to be combined into a single total region which is submitted all at
once.
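A sketch of how bulk submission might look in user code, assuming `spawn_bulk` takes a do-block (the exact call form is an assumption based on this description):

```julia
using Dagger

# Both tasks below are collected by the `LazyTaskQueue` and handed to
# the scheduler in a single batched submission, so the scheduler sees
# this two-node DAG region as a whole rather than one task at a time.
x, y = Dagger.spawn_bulk() do
    a = Dagger.@spawn rand(4, 4)   # queued, not yet submitted
    b = Dagger.@spawn sum(a)       # queued; depends on `a`
    a, b
end

fetch(y)  # both tasks were submitted together when the block ended
```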

The ability to specify additional task synchronization dependencies is
also a key piece, and one that is orthogonal to task queues. This
feature enables in-order execution semantics via an `InOrderTaskQueue`
(available via `spawn_sequential`), which tracks the last-submitted task
or set of tasks, and adds those tasks as additional synchronization
dependencies of the next submitted task or set of tasks, effectively
serializing execution. Nesting of `spawn_sequential` queues allows
separate sequential chains of tasks to be specified, with deeper-nested
chains sequencing after previously-submitted tasks or chains in
shallower-nested queues.
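A sketch of the in-order semantics, again assuming a do-block form for `spawn_sequential`:

```julia
using Dagger

# Inside `spawn_sequential`, each task gains the previously submitted
# task as an extra synchronization dependency, so these three pushes
# execute strictly in submission order even though none of them share
# data dependencies through their arguments.
order = Int[]
Dagger.spawn_sequential() do
    Dagger.@spawn push!(order, 1)
    Dagger.@spawn push!(order, 2)  # waits on the first task
    Dagger.@spawn push!(order, 3)  # waits on the second task
end
```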

Interestingly, the nesting of `spawn_bulk` within `spawn_sequential`
allows entire DAG regions to explicitly synchronize against each other
(such that one region executes before another), while allowing tasks
within each region to still expose parallelism. The inverse nesting of
`spawn_sequential` within `spawn_bulk` allows a chain of sequential
tasks to be submitted all at once, adding interesting optimization
opportunities.
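The first combination might be written as follows; `load_data`, `preprocess`, and the do-block call forms are hypothetical placeholders, not part of this PR:

```julia
using Dagger

# Sketch: two bulk regions nested inside a sequential queue. All tasks
# in region 1 may run in parallel with each other, and likewise within
# region 2, but region 2 as a whole synchronizes after region 1.
Dagger.spawn_sequential() do
    Dagger.spawn_bulk() do              # region 1: internally parallel
        Dagger.@spawn load_data("a")    # hypothetical user function
        Dagger.@spawn load_data("b")
    end
    Dagger.spawn_bulk() do              # region 2: runs after region 1
        Dagger.@spawn preprocess("a")   # hypothetical user function
        Dagger.@spawn preprocess("b")
    end
end
```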

Alongside these enhancements, the eager submission pipeline is optimized
by removing the `eager_thunk` submission pathway (which funneled all
tasks through a `Channel`), allowing tasks to be submitted directly into
the scheduler without redirection. This is expected to improve task
submission performance and reduce memory usage.

@jpsamaroo jpsamaroo merged commit b461012 into master Jul 16, 2023
@jpsamaroo jpsamaroo deleted the jps/task-queues branch July 16, 2023 19:27