NodeConfig needs to aggregate status conditions so they can be queried #1557
Ideally add a test with a broken NodeConfig.
When we have the aggregated conditions we should update the printers as well, see #1730.
@tnozicka do you know any simple reproducers of the two cases?
I suppose
@tnozicka I need some insight into the intended semantics of the current conditions.
update+available
The thing with NodeConfig is that it technically may not have a long-running workload to call it "available", so I chose "Reconciled" initially, before we had the generic sync functions or the other patterns. I suppose we can use Available here as well, in a broader sense. I guess implementation-wise we'll always have to have something long-running to make it level based, unless Kubernetes exposes those through an API.
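To illustrate the level-based pattern mentioned above (the names and the simplified `Condition` type here are hypothetical stand-ins, not the operator's actual API), a sync loop recomputes and upserts the condition on every pass rather than reacting to one-shot events, so a stale "Reconciled" status cannot survive a failed re-check:

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	ObservedGeneration int64
}

// setStatusCondition upserts a condition by Type, mirroring the behavior of
// apimachinery's meta.SetStatusCondition in a simplified form.
func setStatusCondition(conditions []Condition, c Condition) []Condition {
	for i := range conditions {
		if conditions[i].Type == c.Type {
			conditions[i] = c
			return conditions
		}
	}
	return append(conditions, c)
}

func main() {
	var conditions []Condition
	// Every sync, the condition is recomputed from the observed state
	// (level based), so only the latest result is kept.
	for generation := int64(1); generation <= 2; generation++ {
		reconciled := "True" // hypothetically computed from observed node state
		conditions = setStatusCondition(conditions, Condition{
			Type:               "Reconciled",
			Status:             reconciled,
			ObservedGeneration: generation,
		})
	}
	fmt.Println(len(conditions), conditions[0].ObservedGeneration)
}
```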
Can you elaborate here or give me an example? I am not sure I got the point.
Wouldn't Progressing make more sense? It seems better aligned with ScyllaCluster and StatefulSets, for example.
For example, should
Progressing and Available are given. I thought we were talking Reconciled vs. Available.
But there is also the aggregation per node. It's all a bit weird since some conditions come from the workload, but for aggregation purposes, I'd just do:
Thanks for all the responses so far.
With how this is implemented now, I was thinking this should be more of a
I don't find the "mid" aggregations particularly useful (and there are a lot of options), so I'd likely only generically aggregate the
I think it's useful because the NodeConfig controller has to read these conditions from the status, since they come from different controllers (NodeSetup), so if they're not there, we have to assume a worst-case default. I guess it's easier to do per matching node than per matching node AND per each controller of NodeSetup. But ok, I understand the conclusion is that the intermediate conditions are an implementation detail.
I don't think I follow what this is about.
I was thinking about just taking the existing conditions on the status and doing:

```go
degradedCondition, err := AggregateStatusConditions(
	FindStatusConditionsWithSuffix(*allConditions, "Degraded"),
	metav1.Condition{
		Type:               "Degraded",
		Status:             metav1.ConditionFalse,
		Reason:             internalapi.AsExpectedReason,
		Message:            "",
		ObservedGeneration: generation,
	},
)
```

probably fixing the logic to check observed generations, since the conditions come from different places.
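The aggregation described above can be sketched with a minimal, self-contained stand-in for `metav1.Condition` (the real operator helpers `AggregateStatusConditions` and `FindStatusConditionsWithSuffix` may have different signatures; this is just an assumption-laden illustration of the idea, including the observed-generation staleness check mentioned):

```go
package main

import (
	"fmt"
	"strings"
)

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	Reason             string
	Message            string
	ObservedGeneration int64
}

// findStatusConditionsWithSuffix returns all conditions whose Type ends with
// the given suffix, excluding the top-level condition named exactly suffix.
func findStatusConditionsWithSuffix(conditions []Condition, suffix string) []Condition {
	var out []Condition
	for _, c := range conditions {
		if c.Type != suffix && strings.HasSuffix(c.Type, suffix) {
			out = append(out, c)
		}
	}
	return out
}

// aggregateStatusConditions collapses per-node Degraded conditions into a
// single top-level one: degraded if any source condition is true or stale
// (its ObservedGeneration does not match the current generation).
func aggregateStatusConditions(conditions []Condition, def Condition, generation int64) Condition {
	agg := def
	for _, c := range conditions {
		if c.Status == "True" || c.ObservedGeneration != generation {
			agg.Status = "True"
			agg.Reason = c.Reason
			agg.Message = c.Message
			break
		}
	}
	return agg
}

func main() {
	conditions := []Condition{
		{Type: "RaidControllerNodeubuntu-2204Degraded", Status: "True", Reason: "RaidSetupFailed", ObservedGeneration: 2},
		{Type: "MountControllerNodeubuntu-2204Degraded", Status: "False", Reason: "AsExpected", ObservedGeneration: 2},
	}
	degraded := aggregateStatusConditions(
		findStatusConditionsWithSuffix(conditions, "Degraded"),
		Condition{Type: "Degraded", Status: "False", Reason: "AsExpected", ObservedGeneration: 2},
		2,
	)
	fmt.Println(degraded.Type, degraded.Status, degraded.Reason)
}
```

With this shape, callers only ever query the single top-level `Degraded` condition, and the per-node ones remain an implementation detail.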
The Scylla Operator project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
/lifecycle stale
/remove-lifecycle stale
We also need a way of figuring out if all the asynchronous statuses have been reported already (from the node setup pods).
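One way to check that every node setup pod has reported could look like the sketch below; the condition-naming scheme (`"Node" + node name + suffix`) and the helper are hypothetical assumptions, since the issue only shows names like `RaidControllerNodeubuntu-2204Degraded`. A missing condition, or one with a stale observed generation, means a pod has not reported yet for the current generation:

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type               string
	Status             string
	ObservedGeneration int64
}

// allNodesReported checks that, for every matching node, a condition with the
// expected type exists and reflects the current generation. Missing or stale
// conditions mean a node setup pod has not reported yet.
func allNodesReported(conditions []Condition, nodes []string, suffix string, generation int64) bool {
	byType := map[string]Condition{}
	for _, c := range conditions {
		byType[c.Type] = c
	}
	for _, node := range nodes {
		c, ok := byType["Node"+node+suffix]
		if !ok || c.ObservedGeneration != generation {
			return false
		}
	}
	return true
}

func main() {
	conditions := []Condition{
		{Type: "Nodeubuntu-2204Degraded", Status: "False", ObservedGeneration: 3},
	}
	fmt.Println(allNodesReported(conditions, []string{"ubuntu-2204"}, "Degraded", 3))
	fmt.Println(allNodesReported(conditions, []string{"ubuntu-2204", "other"}, "Degraded", 3))
}
```

Until this returns true, the aggregated top-level condition would have to assume the worst-case default mentioned earlier in the thread.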
What should the feature do?
There needs to be a way to assess whether a NodeConfig has succeeded, but we only have conditions like
RaidControllerNodeubuntu-2204Degraded
where the node name is variable and impractical/impossible to query.
What is the use case behind this feature?
Visibility and debuggability. When, say, a RAID setup or mounts fail, it needs to fail at the correct place, where we wait for the NodeConfig to apply, and not 5 commands later when the manager or ScyllaCluster breaks.
Anything else we need to know?
No response