Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consume Topology CR by reference #924

Merged
merged 2 commits into from
Feb 26, 2025

Conversation

fmount
Copy link
Contributor

@fmount fmount commented Feb 18, 2025

This patch provides more granular control over Pod placement and scheduling through the Topology CR integration introduced in infra-operator. In particular it provides:

  • a new API parameter (TopologyRef) defined for each Component that allows to reference an existing Topology CR in the same namespace

  • the operator logic that retrieves and processes the referenced Topology CR through the functions provided by infra-operator

  • an enhanced StatefulSet configuration that incorporates the processed Topology

  • a set of envTest to test the lifecycle (add/update/override/remove) of the resulting StatefulSets when a Topology is referenced

Note that webhooks are in place to prevent referencing a Topology from a different namespace (which is not a supported scenario).

Note: This is still WIP as the Topology support needs to be extended to NovaCells

Close: https://issues.redhat.com/browse/OSPRH-14208

Depends-On: openstack-k8s-operators/openstack-operator#1312

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/8565649c80eb4f9b8852862ee81ba98b

openstack-meta-content-provider FAILURE in 12m 59s
⚠️ nova-operator-kuttl SKIPPED Skipped due to failed job openstack-meta-content-provider
⚠️ nova-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-meta-content-provider
⚠️ nova-operator-tempest-multinode-ceph SKIPPED Skipped due to failed job openstack-meta-content-provider

@fmount fmount requested a review from SeanMooney February 18, 2025 15:31
@fmount fmount force-pushed the topologyref branch 4 times, most recently from 815bfd3 to ea4c82e Compare February 18, 2025 16:28
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/ab0b064036e04bcab625f7ad529b998f

openstack-meta-content-provider FAILURE in 13m 42s
⚠️ nova-operator-kuttl SKIPPED Skipped due to failed job openstack-meta-content-provider
⚠️ nova-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-meta-content-provider
⚠️ nova-operator-tempest-multinode-ceph SKIPPED Skipped due to failed job openstack-meta-content-provider

Copy link
Contributor

@SeanMooney SeanMooney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

som ecomment in ine.

i had to step away while in the middle of reviewing this to discuss something else so i may have missed a few things but i guess we would also want this change to have kuttl tests too?

Copy link
Contributor Author

@fmount fmount left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SeanMooney Thank you for your comments and for this first pass. I'll work on the change requests and improve the testing part.

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/a3ad45afd038493185e8baff1430e5e9

openstack-meta-content-provider FAILURE in 12m 21s
⚠️ nova-operator-kuttl SKIPPED Skipped due to failed job openstack-meta-content-provider
⚠️ nova-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-meta-content-provider
⚠️ nova-operator-tempest-multinode-ceph SKIPPED Skipped due to failed job openstack-meta-content-provider

Copy link
Contributor

@gibizer gibizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only managed to review the api part of the change but I'm publishing my comments as it might take time before I can finish reviewing the rest of the patch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be controversial but so far we kept the possibility to create not just Nova but like NovaAPI CR independently from a higher level entity (like OpenStackControlPlane, or Nova). If you take the NovaAPI case that is either created from a NovaAPITemplate by the nova controller, or created by the user via a NovaAPI CR. For the former we have the validation webhook running on the Nova CR level or on the OpenStackControlPlane CR level. For the latter the webhook runs on the NovaAPI CR level. So we have validation code here on the NovaAPI level as well. As the topologyRef field is both in the novaAPITemplate and novaAPISpec we need validation logic on both level. This is now missing here an on the other service CRs in nova

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep i don't think we should loose that capability. it adds a lot fo value from my perspective not just from a testablity point of view but in ensuring we can properly reconsile each CR independently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack thank you, this is interesting. I'm going to add the ValidateTopology in novaapi_webhook as well then.
I'd like to have a chat (after FR2) about the ability to independently create a NovaAPI CR. This is something I was thinking for Glance in the Edge scenario and periodically comes back to my mind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, I updated the novaapi_webhook and create a envTest for it. Marking the thread as resolved (feel free to reopen if I missed something).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a misunderstanding here. I wanted to express that nova-operator supports creating (and therefore validating) all the CRs individually not just NovaAPI. I don't believe it make sense to add that now to this PR due to multiple reasons:

  • deadline
  • the PR is already 2000 lines so it is pretty hard to follow.

So I suggest to handle this as a followup.

Copy link
Contributor

@gibizer gibizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another set of comments mostly about the nova_controller. I will continue from here probably tomorrow or on Friday.

@@ -1133,6 +1134,10 @@ func (r *NovaReconciler) ensureCell(
cellSpec.NodeSelector = instance.Spec.NodeSelector
}

if cellTemplate.TopologyRef == nil {
cellSpec.ConductorServiceTemplate.TopologyRef = instance.Spec.TopologyRef
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is incorrect. Here we are creating a cellSpec from the cellTemplate. So we should not really modify the conductorServiceTemplate here. All the template data is coming from the user. All the spec data is generated from template data + additional inheritance logic. So we should only modify the spec data but never the template data. Also the cell controller will take the cellSpec, and the conductorServiceTemplate within it ,and turn it into a conductorSpec. That is the place where we can decide what to set on the topologyRef of the conductorSpec based on the conductorTemplate and the inheritance logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sean said above that they does not see the value inheriting the nova.spec.topologyRef (here that is instance.spec.topologyRef) to the cellSpec if the cellTemplate has no topologyRef defined. If we agree on that then here you don't need to inherit anything from the instance.Spec.TopologyRef.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im kind of conflicted because if i look at the content of the

topologySpreadConstraints i think if you need this level of granularity then you won't use a common topology between different nova services or between nova and Keystone

so i would expect each sub cr to reference a different topology.
yet i don't think the major of customer will need that.

with out the motivating use case it hard to reason about this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For storage there might be components that need a different Topology based on NodeAffinity (because of some hw requirements) instead of topologySpreadConstraints, and this was the main reason to allow the subCRs to override the default Topology applied to the global service.
I see that here we have a more complex scenario determined by > 1 nested level, so I'm not sure where the best strategy is.
As mentioned, I don't think customers are going to request this level of granularity, and we might do not want to inherit it from the top-level. As a general approach, I always tend to inherit from the top-level if the parameter is not present as an override, but I think here can be modified as follows:

- cellSpec.TopologyRef is not provided
- cellSpec.ConductorServiceTemplate.TopologyRef is not provided 
- > inherit from the top-level

is the above correct? We can also make the choice to not inherit from the top-level, but this might be translated into a documentation Note that we must have, as the experience might be different from other operators.
I also see your point to never modified cellSpec.ConductorServiceTemplate.TopologyRef as this represents user data, but I'm wondering if setting it to the novaCell topology is ok when is not provided by the user.

Copy link
Contributor Author

@fmount fmount Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I parse this correctly, in any case an inheritance that updates cellSpec.ConductorServiceTemplate.TopologyRef is wrong by definition, and this is because ConductorServiceTemplate is part of the user input, which is authoritative when the spec is built.
While I think there's value in overriding the CellSpec topology to not follow a potential common topology applied to multiple services, maybe inheriting from the top-level nova (that is useful for top-level services) is not the right choice.
For this reason, for now, I'm removing this ability (the wrong if that updates ConductorServiceTemplate). As per the envTests, because we have the Topology parameter, if it is passed down to a Conductor instance, the controller is able to process that parameter, and it is consumed by the Conductor StatefulSet. We might need another pass to this patch to figure out which is the best inheritance model we want to take (or we can decide to not inherit that parameter and document it as a choice).
FYI right now in novacell_controller, when a NewConductorSpec is built I have [1]:

if conductorSpec.TopologyRef == nil {
    conductorSpec.TopologyRef = novaCell.TopologyRef
}

which seems sufficient to cover this feature.
[1] https://github.com/openstack-k8s-operators/nova-operator/pull/924/files#diff-46d41b0e33ca2a11ff58a5d5af330620e3ce346a99cbf605b82efb1f6e2cd020R235

@@ -1133,6 +1134,10 @@ func (r *NovaReconciler) ensureCell(
cellSpec.NodeSelector = instance.Spec.NodeSelector
}

if cellTemplate.TopologyRef == nil {
cellSpec.ConductorServiceTemplate.TopologyRef = instance.Spec.TopologyRef
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im kind of conflicted because if i look at the content of the

topologySpreadConstraints i think if you need this level of granularity then you won't use a common topology between different nova services or between nova and Keystone

so i would expect each sub cr to reference a different topology.
yet i don't think the major of customer will need that.

with out the motivating use case it hard to reason about this.

@SeanMooney
Copy link
Contributor

by the way the jia epic for the nova-operator component should be reference din the PR header
with Close: OSPRH####
so that this pr is auto linked to the jira and any backprot of the same internally will be associated properly.

@fmount fmount force-pushed the topologyref branch 2 times, most recently from 9cb754b to 5447e73 Compare February 20, 2025 10:16
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/2748f6a98ad1482b964fbd4741ac0d72

openstack-meta-content-provider FAILURE in 14m 24s
⚠️ nova-operator-kuttl SKIPPED Skipped due to failed job openstack-meta-content-provider
⚠️ nova-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-meta-content-provider
⚠️ nova-operator-tempest-multinode-ceph SKIPPED Skipped due to failed job openstack-meta-content-provider

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/e4d107d63f044faa8fa765f6123f72fd

openstack-meta-content-provider FAILURE in 12m 49s
⚠️ nova-operator-kuttl SKIPPED Skipped due to failed job openstack-meta-content-provider
⚠️ nova-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-meta-content-provider
⚠️ nova-operator-tempest-multinode-ceph SKIPPED Skipped due to failed job openstack-meta-content-provider

Copy link

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/openstack-operator for 1312,b17515b0ec6a7a3fe3fd71d07ff0951b905f82cf

Copy link

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/openstack-operator for 1312,b17515b0ec6a7a3fe3fd71d07ff0951b905f82cf

Copy link

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/openstack-operator for 1312,b17515b0ec6a7a3fe3fd71d07ff0951b905f82cf

Copy link

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/openstack-operator for 1312,b17515b0ec6a7a3fe3fd71d07ff0951b905f82cf

Copy link

This change depends on a change that failed to merge.

Change openstack-k8s-operators/openstack-operator#1312 is needed.

@fmount
Copy link
Contributor Author

fmount commented Feb 26, 2025

recheck

Copy link
Contributor

@gibizer gibizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API and inheritance looks good to me. We agreed to handle the implementation comments in follow ups. I just want to make it clear that there is fairly big amount of follow up expected on this. Hopefully there will be enough time allocated in the future to push those.

I still had no chance to look at the tests and I'm not sure I will be able to today. I'm not blocking this PR any more. So @SeanMooney if you have a chance to look at the test again feel free to approve the whole PR. If I have a chance to look at the tests I will approve.

@gibizer gibizer dismissed their stale review February 26, 2025 10:55

my comments are either fixed or agreed for a follow up

@fmount
Copy link
Contributor Author

fmount commented Feb 26, 2025

Thanks @gibizer for reviewing the changes. I understand this is challenging due to the size of the patch and the amount of comments, so I'll start working on a more focused follow up after the code freeze.
We had a green last run, and I pushed one last update to restore the comment I removed by mistake and improve the top-level envTest to verify the expected conditions when you point to a non-existing Topology (this was already covered for all the sub resources controllers).
IMO, unless @SeanMooney points me to additional changes we must have, once CI is green we can try to land this patch, so I have a bit of time to focus on bumping Nova in the openstack-operator.

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/be9d145793e447f1bcbbe662734e225a

✔️ openstack-meta-content-provider SUCCESS in 2h 30m 10s
✔️ nova-operator-kuttl SUCCESS in 45m 42s
✔️ nova-operator-tempest-multinode SUCCESS in 2h 11m 30s
nova-operator-tempest-multinode-ceph FAILURE in 39m 02s

Copy link
Contributor

@gibizer gibizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the test coverage has some gaps I think we have enough coverage for now to get his in before the deadline and extend the coverage in the follow up.

I'm approving the PR now and expecting a bunch of follow ups later.

Thanks @fmount for providing this feature.

@@ -485,6 +485,7 @@ func GetCellNames(novaName types.NamespacedName, cell string) CellNames {
type NovaNames struct {
Namespace string
NovaName types.NamespacedName
StaticName types.NamespacedName
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be unused other then its initialization

Namespace: novaName.Namespace,
Name: "nova-cell1-topology",
},
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering what topology we use for the nova-conductor and nova-metadata test cases.

Comment on lines +1106 to +1120
topologySpec := GetSampleTopologySpec()
// Create Test Topologies
for _, t := range novaNames.NovaTopologies {
CreateTopology(t, topologySpec)
}

DeferCleanup(keystone.DeleteKeystoneAPI, keystone.CreateKeystoneAPI(novaNames.NovaName.Namespace))

spec := GetDefaultNovaSpec()
cell0template := GetDefaultNovaCellTemplate()
cell0template["cellDatabaseInstance"] = cell0.MariaDBDatabaseName.Name
spec["cellTemplates"] = map[string]interface{}{"cell0": cell0template}
spec["apiDatabaseInstance"] = novaNames.APIMariaDBDatabaseName.Name
spec["apiDatabaseAccount"] = novaNames.APIMariaDBDatabaseAccount.Name
spec["topologyRef"] = map[string]interface{}{"name": novaNames.NovaTopologies[0].Name}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we should connect the dots between CreateTopology creating the CR and spec["topologyRef"] = map[string]interface{}{"name": novaNames.NovaTopologies[0].Name} using the name of that CR.

Would it make sense if CreateToplogy (or a helper wrapping it) return Toplogy ref struct we need to pass to the topologyRef field here?

Comment on lines +1136 to +1137
Expect(api.Status.LastAppliedTopology).ToNot(BeNil())
Expect(api.Status.LastAppliedTopology.Name).To(Equal(novaNames.NovaTopologies[0].Name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could we simplify this to say we expect LastAppliedTopology is the same as the toplogyRef struct passed to Nova in the BeforeEach? i.e. passing the ref struct around in the test instead of recreating it / asserting the internals of it, and relay on golangs matching the fields between the expected and actual ref struct.

Eventually(func(g Gomega) {
g.Expect(th.GetStatefulSet(novaNames.MetadataName).Spec.Template.Spec.TopologySpreadConstraints).ToNot(BeNil())
// No default Pod Antiaffinity is applied
g.Expect(th.GetStatefulSet(novaNames.MetadataName).Spec.Template.Spec.Affinity).To(BeNil())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just go make the code a bit more readable (shorter lines):

ss := th.GetStatefulSet(novaNames.MetadataName)
podTemplate := ss.Spec.Template.Spec
g.Expect(podTemplate.TopologySpreadConstraints)....
g.Expect(podTemplate.Affinity)...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also a bit more specific assert for the content of the TopologySpreadConstraints would be nice. Maybe we can match what is there to what is in the ToplogyCR? If they are using the same struct then we can rely on go doing the struct comparision

)

Eventually(func(g Gomega) {
g.Expect(th.GetStatefulSet(novaNames.MetadataName).Spec.Template.Spec.TopologySpreadConstraints).ToNot(BeNil())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here it would be especially important to check the content as this field wasn't nil before the test case either it just had a different topology

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we test the inheritance of the field. As the nova-metadata inheritance has two cases (top level, cell level metadata service) we probably need an extra test case with Nova having metadata enabled on the cell level and top level nova having a different topology than cell1 and asserting that metadata gets the cell1 toplogy not the top level nova one.
In the multcell test suite we have a case called "Nova CR instance is created with metadata per cell" that shows how to configure metadata on the cell level.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here and in the other service CR level test suites I think you should also check that the finalizer is added to the TopologyCR and that it is removed when a) the service CR is deleted b) the topologyRef is changed to another Topolog CR.

Expect(err).To(HaveOccurred())
Expect(err.Error()).To(
ContainSubstring(
"Invalid value: \"namespace\": Customizing namespace field is not supported"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, I would expect that the path to the topologyRef is included in the error message. Maybe it is there you just not assert it with the substring here. Look for other tests above for something like invalid: spec.apiServiceTemplate.override.service[ in the assert.

@gibizer
Copy link
Contributor

gibizer commented Feb 26, 2025

recheck nova-operator-tempest-multinode-ceph failed with:

Error from server (InternalError): error when creating "/home/zuul/ci-framework-data/artifacts/manifests/openstack/openstack/cr/cifmw-kustomization-result.yaml": Internal error occurred: failed calling webhook "mopenstackcontrolplane.kb.io": failed to call webhook: Post "https://openstack-operator-webhook-service.openstack-operators.svc:443/mutate-core-openstack-org-v1beta1-openstackcontrolplane?timeout=10s": dial tcp 10.217.0.90:9443: connect: connection refused

Copy link
Contributor

@SeanMooney SeanMooney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we more or less agree that we would close the remaining testing gaps in a followup.

so what is missing is we do not have env test coverage for mulipel cells to core the default behavior and override via templates

we also do not have kuttle testing which is probably something we should add for this feature.

looking at the current code i was not expect to see applyto used in this pr. i expect that in a follow up pr after we discussed what the interface to that should be.

if our code freeze deadline was not this week i likely would ask you to address both of those point before we proceeded but if gibi is ok with merging this we can proceed.

// SetLastAppliedTopology - Sets the LastAppliedTopology value in the Status
func (instance *NovaScheduler) SetLastAppliedTopology(topologyRef *topologyv1.TopoRef) {
instance.Status.LastAppliedTopology = topologyRef
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: there is a lot of duplciation per type that i think we can address later but this is partly a limitation of go and how we are using go so perhaps we cant dedue this.

@@ -200,6 +192,20 @@ func StatefulSet(
if instance.Spec.NodeSelector != nil {
statefulset.Spec.Template.Spec.NodeSelector = *instance.Spec.NodeSelector
}
if topology != nil {
topology.ApplyTo(&statefulset.Spec.Template)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i really wish you didnt do this in this pr.

spec := GetDefaultNovaSpec()
cell0template := GetDefaultNovaCellTemplate()
cell0template["cellDatabaseInstance"] = cell0.MariaDBDatabaseName.Name
spec["cellTemplates"] = map[string]interface{}{"cell0": cell0template}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so the cell tempelat is t he default one which will not contain a topology ref

so we expect it to inerti at the cell spec level form the nova spec


api := GetNovaAPI(novaNames.APIName)
Expect(api.Status.LastAppliedTopology).ToNot(BeNil())
Expect(api.Status.LastAppliedTopology.Name).To(Equal(novaNames.NovaTopologies[0].Name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
since this si the same for all cases i would have pull this out into a var in the when block so we dint have to repeat the indexing
i.e something like
expectTopoName := novaNames.NovaTopologies[0].Name

corev1.ConditionTrue,
)
})
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the follow patch we are missing coverage for settin diffent topology refers for each of the sub services via the nova cr.

g.Expect(th.GetStatefulSet(novaNames.MetadataName).Spec.Template.Spec.Affinity).ToNot(BeNil())
}, timeout, interval).Should(Succeed())
})
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this covers the main point but we do not have testing of cell level metadata in this PR

so that should be done in the follow up pr

that really something that should be tested at the nova controller level not here since this is testing the metadata cr reconsitaiton and the only delta is in how that is created in the nova/cell contoler.

Comment on lines +449 to +491
Eventually(func(g Gomega) {
cond := GetNovaConductor(cell0.ConductorName)
g.Expect(cond.Status.LastAppliedTopology).ToNot(BeNil())
g.Expect(cond.Status.LastAppliedTopology.Name).To(Equal(novaNames.NovaTopologies[4].Name))
}, timeout, interval).Should(Succeed())

Eventually(func(g Gomega) {
api := GetNovaAPI(novaNames.APIName)
g.Expect(api.Status.LastAppliedTopology).ToNot(BeNil())
g.Expect(api.Status.LastAppliedTopology.Name).To(Equal(novaNames.NovaTopologies[1].Name))
}, timeout, interval).Should(Succeed())

th.ExpectCondition(
novaNames.SchedulerName,
ConditionGetterFunc(NovaSchedulerConditionGetter),
condition.TopologyReadyCondition,
corev1.ConditionTrue,
)
Eventually(func(g Gomega) {
sch := GetNovaScheduler(novaNames.SchedulerName)
g.Expect(sch.Status.LastAppliedTopology).ToNot(BeNil())
g.Expect(sch.Status.LastAppliedTopology.Name).To(Equal(novaNames.NovaTopologies[2].Name))
}, timeout, interval).Should(Succeed())

th.ExpectCondition(
novaNames.SchedulerName,
ConditionGetterFunc(NovaSchedulerConditionGetter),
condition.TopologyReadyCondition,
corev1.ConditionTrue,
)

Eventually(func(g Gomega) {
metadata := GetNovaMetadata(novaNames.MetadataName)
g.Expect(metadata.Status.LastAppliedTopology).ToNot(BeNil())
g.Expect(metadata.Status.LastAppliedTopology.Name).To(Equal(novaNames.NovaTopologies[3].Name))
}, timeout, interval).Should(Succeed())

th.ExpectCondition(
novaNames.MetadataName,
ConditionGetterFunc(NovaMetadataConditionGetter),
condition.TopologyReadyCondition,
corev1.ConditionTrue,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the order does not really matter but it slightly annoys me that these are not index in order :)

it doesn't matte rbecuase there is not fixed reconciliation order between the controller in this case and we are just asserting the end conditions

but you updated the spec in index order so i was expecting the same in the assert instead of 4 1 2 3

we could also assert ath this point that the top level toploty refe is still unchanged.

Eventually(func(g Gomega) {
nova = GetNova(novaNames.NovaName)
g.Expect(nova.Status.LastAppliedTopology.Name).To(Equal(novaNames.NovaTopologies[0].Name))
}, timeout, interval).Should(Succeed())

This patch provides more granular control over Pod placement and
scheduling through the Topology CR integration introduced in infra-operator.
In particular it provides:

- a new API parameter (TopologyRef) defined for each Component that
  allows to reference an existing Topology CRs in the same namespace

- the operator logic that retrieves and processes the referenced
  Topology CR through the functions provided by infra-operator

- an enhanced StatefulSet configuration that incorporates the processed
  Topology

- a set of envTest to test the lifecycle (add/update/override/remove) of
  the resulting StatefulSets when a Topology is referenced

Note that webhooks are in place to prevent referencing a Topology from a
different namespace (which is not a supported scenario).

Signed-off-by: Francesco Pantano <[email protected]>
While TopologyRef is able to reference Name and namespace, we currently
only save TopologyRef.Name to .Status.LastAppliedTopology. This patch
aligns both interfaces to use the same type in the same form. In
addition, some logic is moved to infra-operator because is common to all
operators, and this patch also adopts this new form.

Signed-off-by: Francesco Pantano <[email protected]>
@gibizer
Copy link
Contributor

gibizer commented Feb 26, 2025

looking at the current code i was not expect to see applyto used in this pr. i expect that in a follow up pr after we discussed what the interface to that should be.

if our code freeze deadline was not this week i likely would ask you to address both of those point before we proceeded but if gibi is ok with merging this we can proceed.

I think there was a misunderstanding of expectation about the changes around applyTo. I consider the current applyTo as OK, and I'm OK to further improve on it by shrinking the scope of the variable passed. But I also consider this as being in the same bucket as the other implementation changes we now request in a follow up.

I agree in a different timeline without the imminent deadline we would keep iterating on this PR, not just because of applyTo but also due to a list of other implementation changes. But the deadline is out of our control. So I think we deliberately made the decision to only fix the external API structural and behavioral changes (namely the topologyRef fields and their inheritance) before the deadline and handle the rest after the deadline.

Copy link
Contributor

@gibizer gibizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmount the rebase looks good to me.

@openshift-ci openshift-ci bot added the lgtm label Feb 26, 2025
Copy link
Contributor

openshift-ci bot commented Feb 26, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fmount, gibizer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 9c6c30c into openstack-k8s-operators:main Feb 26, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants