-
Notifications
You must be signed in to change notification settings - Fork 206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
created nvidia gpu operator addon #824
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Howlla Thankyou for taking time to get this addon submitted. Couple of things are missing, please reachout for any questions:
- Addon Documentation is missing. Check
docs/addons/grafana-operator.md
as an example. - Addon is missing in indexing list of documentation.
- Addon should be added to e2e in
examples/blueprint-construct/index.ts
mkdocs.yaml
should be updated with new addon.- Also see if you have opportunity to add unit tests under
tests
overall great work. Are you planning to do gpu-builder
in separate PR?
Created builder and addon with documentation. Also added it to the e2e test @elamaran11 i was not sure whether to add the gpu operator addon in the builder itself or will we do it through the addon. It is a small change in line 124 of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Howlla Looks good to me, great work on the Builder class on Addon. Following needs to be done :
- Builder class doc is missing
- Also builder class should be added to mkdocs.
- Plus the builder class is not aligned to functionality we discussed yesterday. So please make sure you add GPU Addon, have a method like
enableGPU
and parametrize GPU like AmpAddon.
lib/builders/gpu-builder.ts
Outdated
] | ||
}) | ||
) | ||
.addOns( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should have GPU Addon as part of the builder. You should also be adding other required addons for the cluster which you used in your sample runs like coredns, kube-proxy etc etc to minimally run the solution. You can follow observability builder as an example to have a method like enableGPU
and have all addons. Also as i mentioned before, you should think about grabbing all values as parameters from pattern you are building to builder class to pass to GPU Addon like AmpAddon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Howlla We are on track. Just couple of comments.
docs/builders/gpu-builder.md
Outdated
const region = process.env.CDK_DEFAULT_REGION!; | ||
const stackID = `${id}-eks-blueprint`; | ||
|
||
const nodeRole = new blueprints.CreateRoleProvider("blueprint-node-role", new iam.ServicePrincipal("ec2.amazonaws.com"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not required. We needed that for windows for different reason
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, should i remove the nodeRole arguments from gpu-builder.ts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nodeRole
itself is not required here because its taken care by blueprints. This was specific to make windows work. I think you should follow the Observability Builder convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shapirov103 This PR looks near to complete with couple of comments open. Can you take a look in the PR, provide feedback so we can merge this with Release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Howlla great work, I added a few comments, there is a bit of work left, happy to discuss.
Please also make sure you run standard commands for PR submission such as make lint
. This process is described here: https://github.com/aws-quickstart/cdk-eks-blueprints/blob/main/docs/internal/readme-internal.md#submitting-pull-requests
lib/builders/gpu-builder.ts
Outdated
* This method helps you prepare a blueprint for setting up observability | ||
* returning an array of blueprint addons for AWS managed open source services | ||
*/ | ||
public enableGpu(values?: blueprints.Values): GpuBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are these values? it is going confusing to the customers which values are allowed to be passed. the way it stands, it is just key value pairs and no indication of the schema. Did you mean to allow passing some particular type here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added valueSchema similar to container insights values
lib/builders/gpu-builder.ts
Outdated
* This method helps you prepare a blueprint for setting up windows nodes with | ||
* usage tracking addon | ||
*/ | ||
public static builder(): GpuBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any constructor args to allow customers to pass here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing gpuOptions now
Fixed the issues @shapirov103 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Howlla The gpu-builder.ts
should be reworked based on my current PR for observability, Also you should add the builder to index.ts
in builder
folder in lib. Please let us know once done, we can review.
made changes @elamaran11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Howlla Did you merge from the branch in my PR. I think i have most of these changes for builders plus few other changes in my PR needs to be merged, before we can move in. @shapirov103 Please check
I havent merged from your yours but i made sure no conflicts occur. Let me know if you want me to create a branch with both ghaction and gpu builder code |
You can do couple of things. One merge from my branch or merge from main after my PR is merged. |
Merged with main branch and resolved conflicts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @shapirov103 Please check and merge this.,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more minor feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @shapirov103 Please review this, we can move to merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Howlla Great work!
I added a question that should not affect the PR status, i will merge after e2e.
WITH_REBOOT: true | ||
}, | ||
toolkit: { | ||
version: 'v1.13.1-centos7' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a mandatory parameter? I am curious how customers (and us) will update this moving forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This toolkit version was verified as working with AL2 by Re Alvarez
/do-e2e-tests |
/do-e2e-tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
end to end tests passed
Issue #, if available:
Description of changes:
Creates new addon for gpu-operator which extends helm addon. Still working on md file for documentation.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.