
Make queue size configurable by workers #52

Open
1597463007 opened this issue Jan 29, 2025 · 2 comments

Comments

@1597463007
Contributor

Currently, the queue size is configured by the scheduler and applied globally across all connected workers. This was not considered a problem until the introduction of the IBM Spectrum Symphony worker, one of several alternative worker implementations. Workers running on higher-specced hosts, or workers that can execute multiple tasks concurrently, should be allowed to accept more tasks into their queues.

This issue will track the progress of moving the queue size configuration into the worker side.

Each worker will report its queue size in its heartbeat payload. The existing --per-worker-queue-size scheduler flag will become a no-op and will be removed in the future.
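A minimal Python sketch of the proposed flow, assuming the heartbeat message gains a queue_size field. The names WorkerHeartbeat, QueueTracker, queued_tasks and assign are hypothetical and are not Scaler's actual API. The point is only that the scheduler tracks capacity per worker, as reported by each heartbeat, instead of applying one global --per-worker-queue-size value to every worker.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class WorkerHeartbeat:
    """Hypothetical heartbeat payload; Scaler's real message type may differ."""
    worker_id: str
    queued_tasks: int  # tasks currently queued on (or running in) the worker
    queue_size: int    # maximum number of tasks this worker is willing to accept


@dataclass
class QueueTracker:
    """Hypothetical scheduler-side bookkeeping keyed by worker, not one global limit."""
    capacity: Dict[str, int] = field(default_factory=dict)
    queued: Dict[str, int] = field(default_factory=dict)

    def on_heartbeat(self, hb: WorkerHeartbeat) -> None:
        # Each worker advertises its own limit, so a bigger host or a
        # backend that runs tasks concurrently can report a larger queue_size.
        self.capacity[hb.worker_id] = hb.queue_size
        self.queued[hb.worker_id] = hb.queued_tasks

    def free_slots(self, worker_id: str) -> int:
        return max(0, self.capacity.get(worker_id, 0) - self.queued.get(worker_id, 0))

    def assign(self) -> Optional[str]:
        """Pick the worker with the most free slots, or None if every queue is full."""
        candidates = [w for w in self.capacity if self.free_slots(w) > 0]
        if not candidates:
            return None
        best = max(candidates, key=self.free_slots)
        self.queued[best] += 1
        return best
```

For example, a small worker that reports queue_size=2 and a Symphony-backed worker that reports queue_size=8 would be given at most 2 and 8 queued tasks respectively, and assign() returns None once both queues are full.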

1597463007 changed the title from "Make queue size configurable by the worker" to "Make queue size configurable by workers" on Jan 29, 2025
@gxuu

gxuu commented Jan 31, 2025

I've put together a basic solution based on the requirements outlined in issue #52. I welcome any comments and advice on how to improve it.

This pull request doesn't fully comply with the guidelines. Here's a list of the missing items:

  • I didn't increment the version number. Although this change breaks compatibility, it's not a significant feature, so it might be best to bump the version with the next major update.
  • I didn't write new tests. The implementation passes all existing tests, and I've also tested it manually. I'm willing and eager to write tests once I receive feedback.
  • The code isn't polished yet, and some of the naming is poor.

I'm also confused about the following:

  • It seems "servers" are organized into "Clusters," and each cluster consists of several workers. Since all workers within a cluster run on the same machine, why not specify the queue size at the cluster level?

Thanks,
gxu

@1597463007
Contributor Author

I agree the terminology can be confusing. A Dask worker is analogous to a Scaler cluster, since each Dask worker can hold more than one process to execute tasks.

In Scaler's context, "Cluster" means a group of workers running under the same parent PID. It's mainly used to make cleaning up the workers easier.
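To illustrate why grouping workers under one parent PID simplifies cleanup, here is a hypothetical Python sketch; it is not Scaler's actual Cluster class. The Cluster name, its methods, and the placeholder worker command are assumptions made for illustration. The parent starts every worker as a direct child, so shutting them all down is a single pass over its own children.

```python
import subprocess
import sys
from typing import List


class Cluster:
    """Hypothetical sketch: one parent process that owns a group of worker processes."""

    def __init__(self, num_workers: int, worker_cmd: List[str]) -> None:
        self.num_workers = num_workers
        self.worker_cmd = worker_cmd  # placeholder for a real worker entry point
        self.workers: List[subprocess.Popen] = []

    def start(self) -> None:
        for _ in range(self.num_workers):
            # Each worker is a direct child, so all of them share this parent PID.
            self.workers.append(subprocess.Popen(self.worker_cmd))

    def shutdown(self, timeout: float = 5.0) -> None:
        # Cleanup only touches our own children: ask them to stop, then force if needed.
        for proc in self.workers:
            proc.terminate()
        for proc in self.workers:
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
        self.workers.clear()


# Placeholder "worker" that just sleeps; a real worker would connect to the scheduler.
SLEEPING_WORKER = [sys.executable, "-c", "import time; time.sleep(60)"]
```

Note that in this sketch the cluster only manages process lifetimes; it does not sit on any message path, which matches the point below that the scheduler talks to each worker directly.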

The term has diverged a bit from its original meaning as more worker implementations have been created. For example, the IBM Spectrum Symphony worker is, for all intents and purposes, a "Worker", but it behaves more like a "Cluster" because it communicates with the Symphony Grid Scheduler (and, by extension, the Symphony Grid workers) and can run tasks concurrently.

The cluster level cannot handle messages; messages are sent directly between the scheduler and the workers.
