Add Configuration device plugin #627
Conversation
For the history of discussions around this work, see previous PR #565.
this is so awesome to see!
Regarding [...]: I assume the purpose of that is to ensure high availability, and it certainly provides that from a node perspective. I'm not sure I'd make it a hard requirement, though, or at least I'd allow the user to pick whether they want it to be a requirement or a preference. For example, if a configuration specified NodeHA=preference (rather than requirement) and there was a single node with 2 healthy slots, the user could still schedule their workload. This might provide HA from a device API perspective (maybe the camera has glitchy software) or from a network perspective (maybe the network connection to the camera isn't great).
sorry, i made a lot of nit comments about variable/function naming ... i think the comments all boil down to making the code easier (for me) to understand.
overall, this looks super awesome and i'm very excited about you bringing this feature in!!!
The CL resource allocation algorithm ensures that slots allocated to a container on a given node come from different devices. For example, say 2 devices are discovered and the capacity is 3. On nodeA, a container can allocate 1 or 2 slots, and multiple containers can be scheduled to nodeA (e.g. 6 containers each requesting 1 slot, 3 containers each requesting 2 slots, or a combination of 1-slot and 2-slot containers). On nodeB, the same algorithm is used, and both nodes share the total of 6 slots.

The purpose of [...]: in the case of reporting 2 * 3 slots, if a container requests 4 CL resources, the kubelet issues an allocate request for the container and the Agent fails the request because only 2 devices are available. This ends up in an infinite loop where the kubelet keeps retrying to allocate 4 CL resources until 2 more devices are discovered. If we instead report 2 slots and increase the available slots later, a container requesting 4 CL resources won't be scheduled until the reported available slots reach 4. This reduces the chance of the kubelet entering the infinite allocation loop.
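To make the accounting concrete, here is a minimal sketch of the per-node constraint described above (illustrative names and a simplified model, not the actual Agent code): a single container is capped at one slot per unique device, while the node-wide slot pool is shared by all containers.

```rust
// Minimal sketch, assuming 2 devices with capacity 3 as in the example above.
struct NodeSlots {
    devices: usize,  // unique devices discovered
    capacity: usize, // slots each device exposes
    used: usize,     // slots already taken on this node
}

impl NodeSlots {
    fn total(&self) -> usize {
        self.devices * self.capacity
    }

    // One container may take at most one slot per unique device, and only if
    // that many slots are still free node-wide.
    fn can_allocate(&self, requested: usize) -> bool {
        requested <= self.devices && self.used + requested <= self.total()
    }
}

fn main() {
    let node = NodeSlots { devices: 2, capacity: 3, used: 0 };
    assert!(node.can_allocate(2));  // 2 slots from 2 different devices: ok
    assert!(!node.can_allocate(4)); // only 2 unique devices exist: denied
}
```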
&DeviceUsageKind::Configuration(vdev_id.clone()),
&dps.node_name,
)
.unwrap();
i don't remember if there was some policy on when to return an error and when to unwrap; do we want to return an error here instead of calling unwrap?
mostly, i just want to make sure that unwrap is called in appropriate and intentional places vs returning an error.
dps.list_and_watch_message_sender is for notifying the list_and_watch thread within the same device plugin service to re-scan device availability. Since it is always valid as long as the device plugin service is running, the unwrap() should always succeed; if it doesn't, something unexpected has happened and we fail fast.
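As a rough sketch of that fail-fast reasoning (types simplified and names assumed, not the exact Akri code): the sender and its receiver live in the same service, so a failed send() can only mean a broken invariant.

```rust
use tokio::sync::broadcast;

#[derive(Clone, Debug)]
enum ListAndWatchMessageKind {
    Continue,
}

// Simplified stand-in for the device plugin service: it owns the sender side
// of a channel whose receiver is the service's own list_and_watch task.
struct DevicePluginService {
    list_and_watch_message_sender: broadcast::Sender<ListAndWatchMessageKind>,
}

impl DevicePluginService {
    fn notify_list_and_watch(&self) {
        // The receiving task runs for the whole lifetime of the service, so
        // send() failing (no live receivers) means an invariant was violated;
        // unwrap() surfaces that immediately instead of hiding it.
        self.list_and_watch_message_sender
            .send(ListAndWatchMessageKind::Continue)
            .unwrap();
    }
}
```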
i can't seem to wrap my brain around this. i'm not sure i get why we would fail if a container requests 4 slots and there are 6. i can understand why we'd prefer to succeed using slots from 4 different devices, but i don't understand why we'd fail to schedule if we had enough slots across the 2 devices.
There are actually only 2 devices even though we report 6 slots. When allocating virtual devices to a container, does the Agent treat each slot as a different virtual device, or does it map each slot back to its actual device and allocate distinct devices? If the Agent treats each slot as a different virtual device, then a container requesting 4 virtual devices should be allowed. If the Agent maps slots back to actual devices, then it should reject the allocation, since only 2 devices are available.

We had a long discussion about what the behavior should be in this situation. I pasted the relevant paragraph from the CL-resource proposal (https://github.com/project-akri/akri-docs/blob/main/proposals/configuration-level-resources.md) below. The implementation is based on the concept of allocation by "unique device". If we think allocation by "unique slot" is a valid scenario, we can add that support. (I do have that implemented; we could add a field "uniqueDevice" to the Configuration CRD to decide which allocation policy to use.)

> There are two implementation options in the case where there are not enough unique Instances to meet the requested number of Configuration-level resources. One scenario where this could happen is if a Pod requests 3 cameras when only 2 exist with a capacity of 2 each. In this case, the Configuration-level camera resource shows as having a quantity of 4, despite there being two cameras. In this case, the kubelet will try to schedule the Pod, thinking there are enough resources. The Agent could either allocate 2 spots on one camera and one on the other, or deny the allocation request. The latter is the preferred approach as it is more consistent and ensures the workload is getting the number of unique devices it expects. After failing an allocate request from the kubelet, the Pod will be in an UnexpectedAdmissionError state until another camera comes online and it can be successfully scheduled.
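A minimal sketch of the "unique device" policy described above (the helper and its types are hypothetical, not the Agent's real API): take at most one free slot per device, and deny the whole request when fewer unique devices have free slots than the number of resources requested.

```rust
use std::collections::HashMap;

// Hypothetical helper implementing allocation by "unique device": pick at
// most one free slot per device, and deny the request outright if fewer
// unique devices are available than the requested resource count.
fn allocate_unique_devices(
    requested: usize,
    free_slots: &HashMap<String, Vec<String>>, // device id -> free slot ids
) -> Result<Vec<String>, String> {
    // Take the first free slot of each device that has one.
    let mut chosen: Vec<String> = free_slots
        .values()
        .filter_map(|slots| slots.first().cloned())
        .collect();

    if chosen.len() < requested {
        // Denying here leaves the Pod in UnexpectedAdmissionError until
        // another device comes online, per the proposal's preferred option.
        return Err(format!(
            "requested {requested} unique devices, only {} available",
            chosen.len()
        ));
    }
    chosen.truncate(requested);
    Ok(chosen)
}
```

With the proposal's example (2 cameras, capacity 2, a request for 3), this denies the request even though 4 slots exist in total.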
i guess i read [...]
@kate-goldenring, I am way late to this party and I imagine you've done far more in-depth thinking on this than I have. My gut is that preferring unique instances is fine for CL, but that there should be a way to schedule to available slots (regardless of instance uniqueness). What is your take?
@bjfield for me the main use case is "I want to deploy a workload with 2 (distinct) cameras". If I have a single camera available on a node, the workload is not scheduled on that node; but if you expose all slots and a single camera has two slots, the workload will end up scheduled on that node even though that's not what you wanted.

The current implementation has a lot of unsolved corner cases here. As @johnsonshih mentioned during the community call, there is a "race window" when a workload gets deleted where two device plugin slots will still be "available" even though we only want one, and there is the case where two containers in the same pod each ask for 2 cameras and the pod won't get scheduled. But I don't really see any way to solve these with the Device Plugin API (maybe we could solve them with the new Dynamic Resource Allocation API, but I don't think we want to push that right now) without sacrificing the "if I want 2 distinct cameras, I don't want to end up with two slots of the same camera" guarantee.
i think i've always understood this scenario to be: i want to schedule against slots without having to understand instance hashes. nothing about distinct or non-distinct entered into my mind.
but i'll defer to the other folks. the code seems good to me, approved.
thanks for reviewing this PR.
What this PR does / why we need it:
This PR implements the proposal for exposing resources at the Configuration level: https://github.com/project-akri/akri-docs/blob/main/proposals/configuration-level-resources.md#configuration-level-resources
With Configuration-level resources, users can request a resource at the Configuration level without knowing instance ids beforehand.
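For context on how this surfaces in the code under review, here is a simplified sketch (not the exact Akri definition, which lives in the shared crate) of the distinction between Instance-level and Configuration-level slot usage, with the Configuration variant carrying a virtual device id as seen in the `DeviceUsageKind::Configuration(vdev_id.clone())` call quoted above:

```rust
// Simplified sketch: how a device slot's usage can record whether it was
// claimed through an Instance-level or a Configuration-level request.
#[derive(Clone, Debug, PartialEq)]
enum DeviceUsageKind {
    // Slot claimed via the classic per-instance resource.
    Instance,
    // Slot claimed via a Configuration-level resource; the String is the
    // virtual device id handed to the requesting container.
    Configuration(String),
}
```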
Special notes for your reviewer:
The implementation is described in the documentation PR: project-akri/akri-docs#76
Summary of changes in this PR
If applicable:
- `cargo fmt`
- `cargo build`
- `cargo clippy`
- `cargo test`
- `cargo doc`