-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Update install to support ClusterServingRuntime #252
Conversation
f4aca3f
to
1949cff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @chinhuang007 this looks pretty good, just a few comments.
@@ -138,7 +138,7 @@ func (r *ServingRuntimeReconciler) Reconcile(ctx context.Context, req ctrl.Reque | |||
|
|||
// Delete etcd secret when there is no ServingRuntimes in a namespace | |||
etcdSecretName := cfg.GetEtcdSecretName() | |||
if len(runtimes.Items) == 0 { | |||
if len(srSpecs) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chinhuang007 this is really a fix to the prior PR, we could just open a separate PR with just this and get it merged first.
// confirm 3 serving runtimes exist in namespace scope mode | ||
list, err := FVTClientInstance.ListServingRuntimes(metav1.ListOptions{}) | ||
Expect(err).ToNot(HaveOccurred()) | ||
Expect(list.Items).To(HaveLen(3)) | ||
if NameSpaceScopeMode { | ||
Expect(list.Items).To(HaveLen(3)) | ||
} else { | ||
Expect(list.Items).To(HaveLen(0)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For these, wouldn't it be better to list either SRs or CSRs depending on namespace/cluster mode? I.e. the list should still contain 3 in either case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in commit comments, I try to minimize the FVT changes to test CSRs, for which I suggest to create a new PR.
patch: |- | ||
- op: replace | ||
path: /kind | ||
value: ClusterServingRuntime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If cluster scope is the default install mode, I'm wondering whether we should invert this - change the SRs in runtimes
to be CSRs, and have this overlay change them to SRs for the namespace scope install case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, due to existing FVTs and unit test depend on these to be SR, I try to minimize the changes. A separate PR should address your concern.
scripts/deploy/iks/test-fvt.sh
Outdated
fi | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't completely understand the changes in this file, are they temporary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is temporarily to make the test only for the controller namespace in the namespace scope mode, until we have the FVTs all figured out for CSRs in cluster scope mode.
@@ -18,8 +18,7 @@ resources: | |||
# Including creation of Service here for now, will later be done automatically | |||
- bases/serving.kserve.io_predictors.yaml | |||
- bases/serving.kserve.io_inferenceservices.yaml | |||
# ClusterServingRuntime not yet supported by modelmesh-serving | |||
# - bases/serving.kserve.io_clusterservingruntimes.yaml | |||
- bases/serving.kserve.io_clusterservingruntimes.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering whether we should include this in namespace scope case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. Since we won't do anything with CSRs in namespace scope mode, probably can exclude it during install.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chinhuang007 WDYT about excluding this for namespace mode? We could just have a separate cluster_crd kustomization which includes this one + the CSR CRD
83c74df
to
279d02a
Compare
@njhill I made some changes to address comments. Please take one more look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @chinhuang007 it's looking pretty good, just a few more comments.
@@ -18,8 +18,7 @@ resources: | |||
# Including creation of Service here for now, will later be done automatically | |||
- bases/serving.kserve.io_predictors.yaml | |||
- bases/serving.kserve.io_inferenceservices.yaml | |||
# ClusterServingRuntime not yet supported by modelmesh-serving | |||
# - bases/serving.kserve.io_clusterservingruntimes.yaml | |||
- bases/serving.kserve.io_clusterservingruntimes.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chinhuang007 WDYT about excluding this for namespace mode? We could just have a separate cluster_crd kustomization which includes this one + the CSR CRD
"config/runtimes/ovms-1.x.yaml", | ||
"controllers/testdata/mlserver-0.x.yaml", | ||
"controllers/testdata/triton-2.x.yaml", | ||
"controllers/testdata/ovms-1.x.yaml", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be a lot cleaner for these to convert it from CSR to SR programmatically. Then we won't need to copy files around.
And it should be easy with manifestival, you can just have a top level function like
func convertToServingRuntime(resource *unstructured.Unstructured) error {
if resource.GetKind() == "ClusterServingRuntime" {
resource.SetKind("ServingRuntime")
}
return nil
}
and then just add it to the existing Transform() call where each of the CRs are loaded, i.e.:
By("create the runtime")
m, err = mf.ManifestFrom(mf.Path(filepath.Join("..", sampleFilename)))
m.Client = mfc.NewClient(k8sClient)
Expect(err).ToNot(HaveOccurred())
m, err = m.Transform(convertToServingRuntime, mf.InjectNamespace(namespace))
Expect(err).ToNot(HaveOccurred())
// ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chinhuang007 WDYT about excluding this for namespace mode? We could just have a separate cluster_crd kustomization which includes this one + the CSR CRD
I think having a separate cluster_crd kustomization will complicate the installation. Currently everything under crd folder is taken as a whole, as documented below. We would need to duplicate kustomization.yaml except one line and make a few other changes to make it to work. So I would suggest to stay with the current simple solution.
# This kustomization.yaml is not intended to be run by itself,
# since it depends on service name and namespace that are out of this kustomize package.
# It should be run by config/default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, it just feels like this is a bit more hacky/brittle.
if NameSpaceScopeMode { | ||
f = FVTClientInstance.ListServingRuntimes | ||
} | ||
list, err := f(metav1.ListOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above, except err
is already declared here
scripts/deploy/iks/test-fvt.sh
Outdated
@@ -62,6 +61,9 @@ run_fvt() { | |||
fi | |||
fi | |||
|
|||
if [[ $(grep "Test Suite Passed" fvt.out) ]]; then | |||
REV=0 | |||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this is added here?
scripts/install.sh
Outdated
@@ -239,6 +239,7 @@ if [[ $delete == "true" ]]; then | |||
info "Deleting any previous ModelMesh Serving instances and older CRD with serving.kserve.io api group name" | |||
kubectl delete crd/predictors.serving.kserve.io --ignore-not-found=true | |||
kubectl delete crd/servingruntimes.serving.kserve.io --ignore-not-found=true | |||
kubectl delete crd/clusterservingruntimes.serving.kserve.io --ignore-not-found=true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should move this into the namespace_scope != true
block below I think?
@@ -296,6 +298,7 @@ fi | |||
if [[ $namespace_scope_mode == "true" ]]; then | |||
info "Enabling namespace scope mode" | |||
kubectl set env deploy/modelmesh-controller NAMESPACE_SCOPE=true | |||
sed -i 's/#- bases\/serving.kserve.io_clusterservingruntimes.yaml/- bases\/serving.kserve.io_clusterservingruntimes.yaml/g' crd/kustomization.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is needed here, it's already done above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to reset crd/kustomization.yaml back to original in case the user wants to delete the namespaced installation and deploy a cluster scoped installation using the same set of files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok sorry I didn't notice that this was reverting.
@@ -281,6 +282,7 @@ kubectl get secret storage-config | |||
info "Installing ModelMesh Serving RBACs (namespace_scope_mode=$namespace_scope_mode)" | |||
if [[ $namespace_scope_mode == "true" ]]; then | |||
kustomize build rbac/namespace-scope | kubectl apply -f - | |||
sed -i 's/- bases\/serving.kserve.io_clusterservingruntimes.yaml/#- bases\/serving.kserve.io_clusterservingruntimes.yaml/g' crd/kustomization.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, you didn't make a separate kustomization for this. I guess this is ok. Maybe add a comment above this line saying that we won't install the CSR CRD in namespace mode?
Update install and config to support ClusterServingRuntime - update install script to setup SRs or CSRs - add rbac to access CSRs - add and update install yamls - update kserve ServingRuntimes and ClusterServingRuntimes yaml - update fvt, to use the namespace scope mode for now - fix an issue in SR controller We will need a separate PR to cover FVTs for the cluster scope mode with ClusterServingRuntimes Signed-off-by: Chin Huang <[email protected]>
Signed-off-by: Chin Huang <[email protected]>
Signed-off-by: Chin Huang <[email protected]>
Signed-off-by: Chin Huang <[email protected]>
Signed-off-by: Chin Huang <[email protected]>
Signed-off-by: Chin Huang <[email protected]>
279d02a
to
af0baee
Compare
Signed-off-by: Chin Huang <[email protected]>
@njhill Thanks for the review. One more commit is pushed. Please take a look. |
Signed-off-by: Nick Hill <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @chinhuang007, LGTM. I pushed a couple more small changes to avoid another review roundtrip :)
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: chinhuang007, njhill The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Nick Hill <[email protected]>
/lgtm |
Update install and config to support ClusterServingRuntime
We will need a separate PR to cover more FVT scenarios for the cluster scope mode with ClusterServingRuntimes
Signed-off-by: Chin Huang [email protected]