Set open file limit for ScyllaDB processes #2160

Merged: 1 commit merged into scylladb:master from resourcelimits on Oct 23, 2024

Conversation

zimnx
Collaborator

@zimnx zimnx commented Oct 18, 2024

Description of your changes:

During regular operation, ScyllaDB may need to manage millions of open files due to the nature of its workload and architecture. The open file limit (rlimit) for containers is inherited from the CRI and systemd, both of which tend to set conservative limits to avoid misbehavior in other programs when high limits are applied. Simply setting fs.nr_open using the ScyllaCluster sysctls API is insufficient to raise these limits for the ScyllaDB process.

To automate this, the Scylla Operator NodeConfig container optimization was extended with an additional Job that discovers the maximum possible limit and sets it on the main process of ScyllaDB containers. ScyllaDB Pods wait until the limit is changed before starting the ScyllaDB process. Any forks (sidecar starter or hypervisor) should inherit the limits.

Users should increase fs.nr_open to at least the value recommended by ScyllaDB, because the defaults of popular container runtimes are ~1024 times lower. Sysctls can currently be changed via the scylladbcluster.spec.sysctls field. Note that this tuning is applied only on Nodes matching the selector of a deployed NodeConfig.

Which issue is resolved by this Pull Request:
Resolves #2131

@zimnx zimnx added kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Oct 18, 2024
@scylla-operator-bot scylla-operator-bot bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Oct 18, 2024
@scylla-operator-bot scylla-operator-bot bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 18, 2024
@zimnx zimnx force-pushed the resourcelimits branch 2 times, most recently from 9630186 to 18e77d8 Compare October 21, 2024 10:01
@scylla-operator-bot scylla-operator-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 21, 2024
@zimnx zimnx force-pushed the resourcelimits branch 2 times, most recently from cfac7a8 to db4b88a Compare October 21, 2024 10:03
@zimnx zimnx changed the title Set open file limit for ScyllaDB processes [WIP] Set open file limit for ScyllaDB processes Oct 21, 2024
@scylla-operator-bot scylla-operator-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 21, 2024
@scylla-operator-bot scylla-operator-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 21, 2024
@zimnx zimnx changed the title [WIP] Set open file limit for ScyllaDB processes Set open file limit for ScyllaDB processes Oct 21, 2024
@scylla-operator-bot scylla-operator-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 21, 2024
@zimnx zimnx requested review from tnozicka and rzetelskik October 21, 2024 14:05
@zimnx
Collaborator Author

zimnx commented Oct 21, 2024

Manager flake - #2061 (comment)
/retest

if err != nil {
	return fmt.Errorf("can't change rlimits: %w", err)
} else {
	klog.InfoS("Rlimits were changed successfully")
}
Contributor

@tnozicka tnozicka Oct 22, 2024

nit: I think this belongs to the end of the function so it's logged for every caller similar to the intro logging there

Collaborator Author

it's on the end, would you mind providing a suggestion instead?

Contributor

I meant the end of changeRlimits (inside the function)

@zimnx zimnx requested a review from tnozicka October 22, 2024 14:52
@zimnx
Collaborator Author

zimnx commented Oct 22, 2024

Flake - #2096 (comment)
/retest

@tnozicka
Contributor

/approve

/assign @rzetelskik
(I'll be on PTO till Tuesday)

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tnozicka, zimnx


Member

@rzetelskik rzetelskik left a comment

one suggestion, rest lgtm

During regular operation, ScyllaDB may need to manage millions of open files due to the nature of its workload and architecture.
The open file limit (rlimit) for containers is inherited from the CRI and systemd, both of which tend to set conservative limits to avoid misbehavior in other programs when high limits are applied.
Simply setting `fs.nr_open` using the ScyllaCluster sysctls API is insufficient to raise these limits for the ScyllaDB process.

To automate this, the Scylla Operator `NodeConfig` container optimization was extended with an additional Job that discovers the maximum possible limit and sets it on the main process of ScyllaDB containers.
ScyllaDB Pods wait until the limit is changed before starting the ScyllaDB process. Any forks (sidecar starter or hypervisor) should inherit the limits.

Users should increase `fs.nr_open` to at least the value [recommended by ScyllaDB](https://github.com/scylladb/scylladb/blob/master/dist/common/sysctl.d/99-scylla-filemax.conf#L5), because the defaults of popular container runtimes are ~1024 times lower. Sysctls can currently be changed via the `scylladbcluster.spec.sysctls` field.
Note that this tuning is applied only on Nodes matching the selector of a deployed NodeConfig.
@rzetelskik
Member

/lgtm
thanks

@scylla-operator-bot scylla-operator-bot bot added the lgtm Indicates that a PR is ready to be merged. label Oct 23, 2024
@scylla-operator-bot scylla-operator-bot bot merged commit 19072b5 into scylladb:master Oct 23, 2024
12 checks passed
@zimnx zimnx deleted the resourcelimits branch October 23, 2024 17:43
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Bump open file limit for ScyllaDB process during container tuning
3 participants