Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Topicoperator unalterable topic configs #10182

Closed

Conversation

kahartma
Copy link

@kahartma kahartma commented May 31, 2024

Type of change

Select the type of your PR

  • Bugfix
  • Enhancement / new feature

Description

This MR adds a new env var STRIMZI_UNALTERABLE_TOPIC_CONFIG to the TopicOperator, similar to the existing STRIMZI_ALTERABLE_TOPIC_CONFIG. It can be used to specify topic configs that should not be reconciled by the TopicOperator.

If CruiseControl is enabled in the cluster, it will automatically be set to follower.replication.throttled.replicas,leader.replication.throttled.replicas which are used and updated by the partition reassignment process. See follower.replication.throttled.replicas,leader.replication.throttled.replicas

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md
  • Supply screenshots for visual changes, such as Grafana dashboards

@kahartma kahartma changed the title Added STRIMZI_UNALTERABLE_TOPIC_CONFIG to topic operator Topicoperator unalterable topic configs May 31, 2024
Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

Looking back, I think the decision taken in the community call was wrong. I think the default should be set in the Topic Operator itself and not in Cluster Operator? The way it is implemented now would not allow users to override this configuration when deploying Topic Operator through Cluster Operator. WDYT @ppatierno @tombentley @fvaleri?

@scholzj scholzj added this to the 0.42.0 milestone May 31, 2024

var unalterableConfigs = config.unalterableTopicConfig();
if (unalterableConfigs != null && alterConfigOps != null && !unalterableConfigs.isEmpty()) {
if (!unalterableConfigs.equalsIgnoreCase("NONE")) {
Copy link
Author

@kahartma kahartma May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consciously excluded the ALL case here, because I didn't see why anyone would want to block all configs, but could be added of course

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A cluster admin wants to provide locked-up topic configs for some critical cluster (no delegation). That's a valid use case, so I would also add support for that.

Should we also change this method name to skipAlterConfigOps and update the Javadoc, as it now deals with both alterable and non-alterable configs?

@fvaleri fvaleri requested review from fvaleri and tombentley May 31, 2024 13:04
Copy link
Contributor

@fvaleri fvaleri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the decision taken in the community call was wrong. I think the default should be set in the Topic Operator itself and not in Cluster Operator

@scholzj I agree.

The TO already has the knowledge about CC enablement, so changes should be limited to the topic-operator module.


@kahartma thanks for working on this. Some initial comments.

Unalterable and alterable config variables should be mutually exclusive, so we should skip them if they are both set, and also raise a warning (log and status).

We need some more tests in BatchingTopicControllerTest to check the various permutations and invalid configs:

  1. Should ignore with Cruise Control throttle config

  2. Should reconcile and warn with user and Cruise Control throttle config

  3. AlterableConfig (allow list)
    3.1. Happy path
    3.2. Should reconcile with "ALL", "" (empty string)
    3.3. Should ignore and warn with "NONE", invalid config (eg: "sdasdas" or "retention.bytes; retention.ms" - note ";" instead of ",")

  4. UnalterableConfig (deny list)
    4.1. Happy path
    4.2. Should ignore and warn with "ALL"
    4.3. Should reconcile with "NONE", "" (empty string), invalid config (eg: "sdasdas" and "retention.bytes; retention.ms" - note ";" instead of ",")

  5. Should ignore and warn when both alterable and unalterable configs are set

Probably, we also need to create a documentation paragraph where we explain how to use these new configurations to delegate topic management in a controlled way, but this can be added later as a separate PR.

Comment on lines +38 to +39
* @param saslUsername, The SASL username for the Admin client
* @param saslPassword, The SASL password for the Admin client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How moving saslUsername and saslPassword is related to this change? Can you please revert?

Comment on lines 138 to +139
static final ConfigParameter<String> ALTERABLE_TOPIC_CONFIG = new ConfigParameter<>("STRIMZI_ALTERABLE_TOPIC_CONFIG", STRING, "ALL", CONFIG_VALUES);
static final ConfigParameter<String> UNALTERABLE_TOPIC_CONFIG = new ConfigParameter<>("STRIMZI_UNALTERABLE_TOPIC_CONFIG", STRING, "NONE", CONFIG_VALUES);
Copy link
Contributor

@fvaleri fvaleri May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important to provide some Javadoc. For example:

    /**
     * Use {@code alterableTopicConfig} to specify a comma separated list (allow list) of Kafka topic configurations that are reconciled, everything else is ignored.
     * The default value is "ALL", which means that all configurations are available; the opposite is "NONE", which means that all configurations are ignored.
     * <br/><br/>
     * This is useful in standalone mode when you have a Kafka service that restricts alter operations to a subset of Kafka topic configurations.
     */
    static final ConfigParameter<String> ALTERABLE_TOPIC_CONFIG = new ConfigParameter<>("STRIMZI_ALTERABLE_TOPIC_CONFIG", STRING, "ALL", CONFIG_VALUES);

    /**
     * Use {@code unalterableTopicConfig} to specify a comma separated list (deny list) of Kafka topic configurations that are ignored, everything else is reconciled.
     * The default value is "NONE", which means that all configurations are available; the opposite is "ALL", which means that all configurations are ignored.
     * <br/><br/>
     * This is useful when another application changes dynamic topic configurations directly in Kafka, and the operator should not interfere.
     */
    static final ConfigParameter<String> UNALTERABLE_TOPIC_CONFIG = new ConfigParameter<>("STRIMZI_UNALTERABLE_TOPIC_CONFIG", STRING, "NONE", CONFIG_VALUES);```


var unalterableConfigs = config.unalterableTopicConfig();
if (unalterableConfigs != null && alterConfigOps != null && !unalterableConfigs.isEmpty()) {
if (!unalterableConfigs.equalsIgnoreCase("NONE")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A cluster admin wants to provide locked-up topic configs for some critical cluster (no delegation). That's a valid use case, so I would also add support for that.

Should we also change this method name to skipAlterConfigOps and update the Javadoc, as it now deals with both alterable and non-alterable configs?

readOnlyConfigs.add(key);
}
});
if (unalterableConfigs != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would extract this logic in a separate method, for example addAlterableConfigWarning, and maybe rename addNonAlterableConfigsWarning to addUnalterableConfigWarning.

@@ -1236,7 +1236,7 @@ private static TopicOperatorConfig topicOperatorConfig(String ns, KafkaCluster k
useFinalizer,
100, 100, 10, false, new FeatureGates(""),
false, false, "", 9090, false, false, "", "", "",
"all", false);
"all", "none", false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use uppercase here like in the other tests?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will then also update all->ALL for alterableTopicConfig to keep it consistent

@@ -1990,7 +1990,7 @@ public void shouldTerminateIfQueueFull(
true,
1, 100, 5_0000, false, new FeatureGates(""),
false, false, "", 9090, false, false, "", "", "",
"all", false);
"all", "none", false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use uppercase here like in the other tests?

kahartma and others added 2 commits May 31, 2024 17:24
…eratorConfig.java

Co-authored-by: Federico Valeri <[email protected]>
Signed-off-by: kahartma <[email protected]>
…eratorConfig.java

Co-authored-by: Federico Valeri <[email protected]>
Signed-off-by: kahartma <[email protected]>
@kahartma
Copy link
Author

kahartma commented May 31, 2024

Unalterable and alterable config variables should be mutually exclusive, so we should skip them if they are both set, and also raise a warning (log and status).

@fvaleri Okay, then I would extend this PR a bit more to distinguish between the user-provided STRIMZI_UNALTERABLE_TOPIC_CONFIG and making *.replication.throttled.replicas unalterable if cruise control is activated? Otherwise we would run into the case that user-provided STRIMZI_ALTERABLE_TOPIC_CONFIG would be skipped if cruise control is activated and thereby STRIMZI_UNALTERABLE_TOPIC_CONFIG is automatically set (and also skipped)

@fvaleri
Copy link
Contributor

fvaleri commented May 31, 2024

Unalterable and alterable config variables should be mutually exclusive, so we should skip them if they are both set, and also raise a warning (log and status).

@fvaleri Okay, then I would extend this PR a bit more to distinguish between the user-provided STRIMZI_UNALTERABLE_TOPIC_CONFIG and making *.replication.throttled.replicas unalterable if cruise control is activated? Otherwise we would run into the case that STRIMZI_ALTERABLE_TOPIC_CONFIG would be skipped if cruise control is activated and thereby STRIMZI_UNALTERABLE_TOPIC_CONFIG is automatically set (and also skipped)

If they are both set at the same time, we go with defaults and raise warnings.

I would say that throttle configurations will be always ignored when CC is enabled, unless the user explicitly set them in .spec.config. This is something that we can clarify in the Javadoc. Does it makes sense?

@scholzj
Copy link
Member

scholzj commented Jun 2, 2024

Unalterable and alterable config variables should be mutually exclusive, so we should skip them if they are both set, and also raise a warning (log and status).

@fvaleri Okay, then I would extend this PR a bit more to distinguish between the user-provided STRIMZI_UNALTERABLE_TOPIC_CONFIG and making *.replication.throttled.replicas unalterable if cruise control is activated? Otherwise we would run into the case that STRIMZI_ALTERABLE_TOPIC_CONFIG would be skipped if cruise control is activated and thereby STRIMZI_UNALTERABLE_TOPIC_CONFIG is automatically set (and also skipped)

If they are both set at the same time, we go with defaults and raise warnings.

I would say that throttle configurations will be always ignored when CC is enabled, unless the user explicitly set them in .spec.config. This is something that we can clarify in the Javadoc. Does it makes sense?

I think raising the warning sounds reasonable. But why not allow it? What if the user is using Cruise Control, but at the same time wants to manage the replication throttling for some reason personally through the Topic Operator? I think it is a reasonable default, and reasonable thing to warn against when configuring or in the docs. But not sure I see really a reason to prevent it.

@fvaleri
Copy link
Contributor

fvaleri commented Jun 3, 2024

I think raising the warning sounds reasonable. But why not allow it?

Maybe I wasn't clear, but that's what I meant.

We always allow. If CC is enabled AND the user explicitly sets throttle configs, we allow and raise a warning, as we may have a race condition (there is no race when running rebalances without throttling, which is the default). If CC is NOT enabled, we allow with no warning.

We can extract this CC throttle fix and related unit tests in a separate PR, as it does not involve the new STRIMZI_UNALTERABLE_TOPIC_CONFIG. Wdyt?

@tombentley
Copy link
Member

@fvaleri I think that's reasonable. Better to avoid adding another config if we can avoid it. And the behaviour you describe would do that, so +1 from me.

@fvaleri
Copy link
Contributor

fvaleri commented Jun 6, 2024

@kahartma so this would actually reduce the scope of this PR.

No need to add another environment variable to fix the issue caused by the TO-CC race condition during throttled rebalance. We can simply add a private constant to BatchingTopicController (e.g. THROTTLING_CONFIG) which contains the two throttle config properties.

The logic should be:

  • If Cruise Control is enabled WITHOUT .spec.config throttle props, we skip any throttle alter config operation without warnings
  • If Cruise Control is enabled WITH .spec.config throttle props, we do not skip the throttle alter config operations, but we raise a warning (can't be an error because there is no race when running rebalances without throttling, which is the default)
  • If Cruise Control is NOT enabled, we reconcile as usual without warnings

The interaction between ALTERABLE_TOPIC_CONFIG and THROTTLING_CONFIG requires careful testing.

There is a general need for setting policies to restrict which topic configurations can be changed by normal users (config management delegation), but we think that this requires a formal design proposal to discuss API changes and how this would work in practice.

Sorry for the change in plans. Let me know if you still have time to work on this, or do you want me to continue.

Copy link
Contributor

@fvaleri fvaleri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Required changes.

@kahartma are you still working on this?

@scholzj
Copy link
Member

scholzj commented Jun 19, 2024

@kahartma Are you still planning to get back to this? Or should someone else have a look at it?

@scholzj
Copy link
Member

scholzj commented Jun 22, 2024

As there was nor response for a long time, we are closing it in favour of #10241

@scholzj scholzj closed this Jun 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants