Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Update install to support ClusterServingRuntime #252

Merged
merged 9 commits into from
Oct 31, 2022

Conversation

chinhuang007
Copy link
Contributor

@chinhuang007 chinhuang007 commented Sep 29, 2022

Update install and config to support ClusterServingRuntime

  • update install script to setup SRs or CSRs
  • add rbac to access CSRs
  • add and update install yamls
  • update kserve ServingRuntimes and ClusterServingRuntimes yaml
  • update fvt to work with either cluster or namespace scope mode
  • update unit tests to work with updated yamls

We will need a separate PR to cover more FVT scenarios for the cluster scope mode with ClusterServingRuntimes

Signed-off-by: Chin Huang [email protected]

Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chinhuang007 this looks pretty good, just a few comments.

@@ -138,7 +138,7 @@ func (r *ServingRuntimeReconciler) Reconcile(ctx context.Context, req ctrl.Reque

// Delete etcd secret when there is no ServingRuntimes in a namespace
etcdSecretName := cfg.GetEtcdSecretName()
if len(runtimes.Items) == 0 {
if len(srSpecs) == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chinhuang007 this is really a fix to the prior PR, we could just open a separate PR with just this and get it merged first.

Comment on lines 65 to 69
// confirm 3 serving runtimes exist in namespace scope mode
list, err := FVTClientInstance.ListServingRuntimes(metav1.ListOptions{})
Expect(err).ToNot(HaveOccurred())
Expect(list.Items).To(HaveLen(3))
if NameSpaceScopeMode {
Expect(list.Items).To(HaveLen(3))
} else {
Expect(list.Items).To(HaveLen(0))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these, wouldn't it be better to list either SRs or CSRs depending on namespace/cluster mode? I.e. the list should still contain 3 in either case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in commit comments, I try to minimize the FVT changes to test CSRs, for which I suggest to create a new PR.

patch: |-
- op: replace
path: /kind
value: ClusterServingRuntime
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If cluster scope is the default install mode, I'm wondering whether we should invert this - change the SRs in runtimes to be CSRs, and have this overlay change them to SRs for the namespace scope install case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, due to existing FVTs and unit test depend on these to be SR, I try to minimize the changes. A separate PR should address your concern.

fi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't completely understand the changes in this file, are they temporary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is temporarily to make the test only for the controller namespace in the namespace scope mode, until we have the FVTs all figured out for CSRs in cluster scope mode.

@@ -18,8 +18,7 @@ resources:
# Including creation of Service here for now, will later be done automatically
- bases/serving.kserve.io_predictors.yaml
- bases/serving.kserve.io_inferenceservices.yaml
# ClusterServingRuntime not yet supported by modelmesh-serving
# - bases/serving.kserve.io_clusterservingruntimes.yaml
- bases/serving.kserve.io_clusterservingruntimes.yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering whether we should include this in namespace scope case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Since we won't do anything with CSRs in namespace scope mode, probably can exclude it during install.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chinhuang007 WDYT about excluding this for namespace mode? We could just have a separate cluster_crd kustomization which includes this one + the CSR CRD

@njhill njhill marked this pull request as draft October 5, 2022 17:34
@chinhuang007 chinhuang007 force-pushed the install-csr branch 2 times, most recently from 83c74df to 279d02a Compare October 10, 2022 22:30
@chinhuang007 chinhuang007 marked this pull request as ready for review October 10, 2022 23:15
@chinhuang007 chinhuang007 changed the title Update install to support ClusterServingRuntime feat: Update install to support ClusterServingRuntime Oct 10, 2022
@chinhuang007
Copy link
Contributor Author

@njhill I made some changes to address comments. Please take one more look.

Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chinhuang007 it's looking pretty good, just a few more comments.

@@ -18,8 +18,7 @@ resources:
# Including creation of Service here for now, will later be done automatically
- bases/serving.kserve.io_predictors.yaml
- bases/serving.kserve.io_inferenceservices.yaml
# ClusterServingRuntime not yet supported by modelmesh-serving
# - bases/serving.kserve.io_clusterservingruntimes.yaml
- bases/serving.kserve.io_clusterservingruntimes.yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chinhuang007 WDYT about excluding this for namespace mode? We could just have a separate cluster_crd kustomization which includes this one + the CSR CRD

"config/runtimes/ovms-1.x.yaml",
"controllers/testdata/mlserver-0.x.yaml",
"controllers/testdata/triton-2.x.yaml",
"controllers/testdata/ovms-1.x.yaml",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be a lot cleaner for these to convert it from CSR to SR programmatically. Then we won't need to copy files around.

And it should be easy with manifestival, you can just have a top level function like

func convertToServingRuntime(resource *unstructured.Unstructured) error {
	if resource.GetKind() == "ClusterServingRuntime" {
		resource.SetKind("ServingRuntime")
	}
	return nil
}

and then just add it to the existing Transform() call where each of the CRs are loaded, i.e.:

		By("create the runtime")
		m, err = mf.ManifestFrom(mf.Path(filepath.Join("..", sampleFilename)))
		m.Client = mfc.NewClient(k8sClient)
		Expect(err).ToNot(HaveOccurred())
		m, err = m.Transform(convertToServingRuntime, mf.InjectNamespace(namespace))
		Expect(err).ToNot(HaveOccurred())
		// ...

Copy link
Contributor Author

@chinhuang007 chinhuang007 Oct 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chinhuang007 WDYT about excluding this for namespace mode? We could just have a separate cluster_crd kustomization which includes this one + the CSR CRD

I think having a separate cluster_crd kustomization will complicate the installation. Currently everything under crd folder is taken as a whole, as documented below. We would need to duplicate kustomization.yaml except one line and make a few other changes to make it to work. So I would suggest to stay with the current simple solution.

# This kustomization.yaml is not intended to be run by itself,
# since it depends on service name and namespace that are out of this kustomize package.
# It should be run by config/default

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it just feels like this is a bit more hacky/brittle.

fvt/predictor/predictor_suite_test.go Outdated Show resolved Hide resolved
if NameSpaceScopeMode {
f = FVTClientInstance.ListServingRuntimes
}
list, err := f(metav1.ListOptions{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above, except err is already declared here

@@ -62,6 +61,9 @@ run_fvt() {
fi
fi

if [[ $(grep "Test Suite Passed" fvt.out) ]]; then
REV=0
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this is added here?

@@ -239,6 +239,7 @@ if [[ $delete == "true" ]]; then
info "Deleting any previous ModelMesh Serving instances and older CRD with serving.kserve.io api group name"
kubectl delete crd/predictors.serving.kserve.io --ignore-not-found=true
kubectl delete crd/servingruntimes.serving.kserve.io --ignore-not-found=true
kubectl delete crd/clusterservingruntimes.serving.kserve.io --ignore-not-found=true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should move this into the namespace_scope != true block below I think?

@@ -296,6 +298,7 @@ fi
if [[ $namespace_scope_mode == "true" ]]; then
info "Enabling namespace scope mode"
kubectl set env deploy/modelmesh-controller NAMESPACE_SCOPE=true
sed -i 's/#- bases\/serving.kserve.io_clusterservingruntimes.yaml/- bases\/serving.kserve.io_clusterservingruntimes.yaml/g' crd/kustomization.yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed here, it's already done above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to reset crd/kustomization.yaml back to original in case the user wants to delete the namespaced installation and deploy a cluster scoped installation using the same set of files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok sorry I didn't notice that this was reverting.

@@ -281,6 +282,7 @@ kubectl get secret storage-config
info "Installing ModelMesh Serving RBACs (namespace_scope_mode=$namespace_scope_mode)"
if [[ $namespace_scope_mode == "true" ]]; then
kustomize build rbac/namespace-scope | kubectl apply -f -
sed -i 's/- bases\/serving.kserve.io_clusterservingruntimes.yaml/#- bases\/serving.kserve.io_clusterservingruntimes.yaml/g' crd/kustomization.yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, you didn't make a separate kustomization for this. I guess this is ok. Maybe add a comment above this line saying that we won't install the CSR CRD in namespace mode?

scripts/install.sh Outdated Show resolved Hide resolved
Update install and config to support ClusterServingRuntime
- update install script to setup SRs or CSRs
- add rbac to access CSRs
- add and update install yamls
- update kserve ServingRuntimes and ClusterServingRuntimes yaml
- update fvt, to use the namespace scope mode for now
- fix an issue in SR controller

We will need a separate PR to cover FVTs for the cluster
scope mode with ClusterServingRuntimes

Signed-off-by: Chin Huang <[email protected]>
@chinhuang007
Copy link
Contributor Author

@njhill Thanks for the review. One more commit is pushed. Please take a look.

Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chinhuang007, LGTM. I pushed a couple more small changes to avoid another review roundtrip :)

@kserve-oss-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chinhuang007, njhill

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Nick Hill <[email protected]>
@njhill
Copy link
Member

njhill commented Oct 29, 2022

/lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants