fix: [WIN-NPM] process updatePods in fifo order #1856
Closing because this issue would be extremely rare. All of these rare events would have to coincide.
The chance of this occurring is higher when applying IPSets in the background (bullet point 3 is no longer required). Have now seen sequence 1 fail too.
npm/pkg/dataplane/dataplane.go
Outdated
@@ -149,6 +174,9 @@ func (dp *DataPlane) AddToSets(setNames []*ipsets.IPSetMetadata, podMetadata *Po
klog.Infof("[DataPlane] {AddToSet} pod key %s not found in updatePodCache. creating a new obj", podMetadata.PodKey)
updatePod = newUpdateNPMPod(podMetadata)
dp.updatePodCache.cache[podMetadata.PodKey] = updatePod

// add to queue only if not in the cache/queue already
dp.updatePodCache.queue = append(dp.updatePodCache.queue, podMetadata.PodKey)
Where are we checking whether the key is already in the queue? We will need a second check here.
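To make the question concrete, here is a minimal sketch of one way the check could work: the cache map doubles as the membership test, so a key is appended to the queue only the first time it is seen. The `podQueue` type and its `add` method are hypothetical stand-ins, not the PR's actual `updatePodCache`, and the sketch assumes the cache and queue are always modified together.

```go
package main

import "fmt"

// podQueue is a hypothetical stand-in for the PR's updatePodCache:
// a map used for membership checks plus a slice recording arrival order.
type podQueue struct {
	cache map[string]struct{}
	queue []string
}

func newPodQueue() *podQueue {
	return &podQueue{cache: make(map[string]struct{})}
}

// add appends key to the queue only if it is not already cached.
// It reports whether the key was appended.
func (q *podQueue) add(key string) bool {
	if _, ok := q.cache[key]; ok {
		return false // already cached, therefore already queued
	}
	q.cache[key] = struct{}{}
	q.queue = append(q.queue, key)
	return true
}

func main() {
	q := newPodQueue()
	for _, k := range []string{"ns/pod-a", "ns/pod-b", "ns/pod-a"} {
		q.add(k)
	}
	fmt.Println(q.queue) // [ns/pod-a ns/pod-b]
}
```

This only holds if every code path that removes a key from the cache also removes it from the queue (and vice versa); otherwise a separate queue membership check would be needed.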
npm/pkg/dataplane/dataplane.go
Outdated
@@ -177,6 +205,9 @@ func (dp *DataPlane) RemoveFromSets(setNames []*ipsets.IPSetMetadata, podMetadat
klog.Infof("[DataPlane] {RemoveFromSet} pod key %s not found in updatePodCache. creating a new obj", podMetadata.PodKey)
updatePod = newUpdateNPMPod(podMetadata)
dp.updatePodCache.cache[podMetadata.PodKey] = updatePod

// add to queue only if not in the cache/queue already
We need to add a secondary check here for whether the key is already in the queue.
npm/pkg/dataplane/dataplane.go
Outdated
defer dp.updatePodCache.Unlock()
defer func() {
dp.updatePodCache.removeDeletedItemsFromQueue()
dp.updatePodCache.Unlock()
Maybe create a data structure called an ordered map and use that as the update pod cache?
Example: https://github.com/iancoleman/orderedmap/blob/master/orderedmap.go
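For illustration, an ordered map can be sketched with only the standard library by pairing a map with a `container/list`, which gives insertion-ordered iteration and O(1) removal. This is a hypothetical sketch, not the linked library's API and not what the PR implements; the `orderedMap` and `entry` types and their methods are names made up here.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what each list element stores.
type entry struct {
	key   string
	value string
}

// orderedMap (hypothetical) keeps keys in insertion order.
type orderedMap struct {
	items map[string]*list.Element
	order *list.List
}

func newOrderedMap() *orderedMap {
	return &orderedMap{items: make(map[string]*list.Element), order: list.New()}
}

// set inserts a new key at the back, or updates an existing key in place
// without changing its position.
func (m *orderedMap) set(key, value string) {
	if el, ok := m.items[key]; ok {
		el.Value.(*entry).value = value
		return
	}
	m.items[key] = m.order.PushBack(&entry{key: key, value: value})
}

// remove deletes a key from both the map and the ordering list.
func (m *orderedMap) remove(key string) {
	if el, ok := m.items[key]; ok {
		m.order.Remove(el)
		delete(m.items, key)
	}
}

// keys returns the keys in insertion order.
func (m *orderedMap) keys() []string {
	out := make([]string, 0, len(m.items))
	for el := m.order.Front(); el != nil; el = el.Next() {
		out = append(out, el.Value.(*entry).key)
	}
	return out
}

func main() {
	m := newOrderedMap()
	m.set("a", "1")
	m.set("b", "2")
	m.set("c", "3")
	m.remove("b")
	m.set("a", "updated") // keeps its original position
	fmt.Println(m.keys()) // [a c]
}
```

Compared with a map plus a slice of keys, this avoids a linear scan when an entry is deleted from the middle of the order.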
// enqueue adds a pod to the queue if necessary and returns the pod object used
func (c *updatePodCache) enqueue(m *PodMetadata) *updateNPMPod {
pod, ok := c.cache[m.PodKey]
if !ok {
What if the node name got updated? Will we need to check whether the pod objects are the same and, if not, update the element in the map and move the pod object to the end?
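One possible reading of this suggestion, as a sketch only: on enqueue, compare the cached entry's node name with the incoming one, and if they differ, replace the entry and move its key to the back of the queue. The `podMeta` and `cache` types and the `moveToBack` helper are hypothetical simplifications; the source does not say whether the PR adopted this behavior.

```go
package main

import "fmt"

// podMeta is a hypothetical, simplified stand-in for PodMetadata.
type podMeta struct {
	PodKey   string
	NodeName string
}

// cache is a hypothetical stand-in for updatePodCache.
type cache struct {
	pods  map[string]*podMeta
	queue []string
}

// enqueue queues new pods at the back. A known pod whose node name changed
// is treated as a fresh arrival and moved to the back as well.
func (c *cache) enqueue(m *podMeta) {
	old, ok := c.pods[m.PodKey]
	switch {
	case !ok:
		c.queue = append(c.queue, m.PodKey)
	case old.NodeName != m.NodeName:
		c.moveToBack(m.PodKey)
	}
	c.pods[m.PodKey] = m
}

// moveToBack removes key from its current position (linear scan)
// and appends it to the end of the queue.
func (c *cache) moveToBack(key string) {
	for i, k := range c.queue {
		if k == key {
			c.queue = append(c.queue[:i], c.queue[i+1:]...)
			break
		}
	}
	c.queue = append(c.queue, key)
}

func main() {
	c := &cache{pods: make(map[string]*podMeta)}
	c.enqueue(&podMeta{PodKey: "ns/pod-a", NodeName: "node1"})
	c.enqueue(&podMeta{PodKey: "ns/pod-b", NodeName: "node1"})
	c.enqueue(&podMeta{PodKey: "ns/pod-a", NodeName: "node2"}) // node changed
	fmt.Println(c.queue) // [ns/pod-b ns/pod-a]
}
```

The linear scan in `moveToBack` is acceptable for short queues; a list-backed ordered map would make the move constant time.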
// Caller should ensure the queue is not empty.
// Otherwise, the following will occur: "panic: runtime error: index out of range [0] with length 0"
func (c *updatePodCache) dequeue() *updateNPMPod {
pod := c.cache[c.queue[0]]
Check that the 0th element is present before accessing it.
Discussed offline. The caller must lock updatePodCache and check isEmpty() before calling dequeue().
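The agreed contract could look roughly like the sketch below: the caller takes the lock once, then loops on `isEmpty()` so that `dequeue()` never indexes an empty slice. The `updateQueue` type, the `drain` helper, and the choice to delete the cache entry on dequeue are assumptions made for this illustration, not the PR's code.

```go
package main

import (
	"fmt"
	"sync"
)

// updateQueue is a hypothetical, simplified stand-in for updatePodCache.
type updateQueue struct {
	sync.Mutex
	cache map[string]string // pod key -> pending update (simplified to a string)
	queue []string          // pod keys in FIFO order
}

func (q *updateQueue) isEmpty() bool { return len(q.queue) == 0 }

// dequeue pops the oldest entry. The caller must hold the lock and must have
// checked isEmpty(); otherwise q.queue[0] panics with an index-out-of-range error.
func (q *updateQueue) dequeue() string {
	key := q.queue[0]
	q.queue = q.queue[1:]
	val := q.cache[key]
	delete(q.cache, key) // assumption: a dequeued pod leaves the cache too
	return val
}

// drain shows the caller side of the contract: lock, then guard each dequeue.
func drain(q *updateQueue) []string {
	q.Lock()
	defer q.Unlock()
	var out []string
	for !q.isEmpty() {
		out = append(out, q.dequeue())
	}
	return out
}

func main() {
	q := &updateQueue{
		cache: map[string]string{"ns/pod-a": "first", "ns/pod-b": "second"},
		queue: []string{"ns/pod-a", "ns/pod-b"},
	}
	fmt.Println(drain(q)) // [first second]
	fmt.Println(drain(q)) // [] (no panic on an empty queue)
}
```

Keeping the emptiness check in the caller avoids a second lock acquisition inside `dequeue()`, at the cost of relying on every caller to honor the contract.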
692aa51 to 3e8d187
Reason for Change:
Fully solve #1729. When the issue’s scenario occurs, the ordering of Pods matters for UpdatePod() within ApplyDataPlane().
Unit tests (UTs) have been failing for sequences 1 and 2 of this issue, and the chance of failure is higher when applying IPSets in the background.
Issue Fixed:
Requirements:
Notes:
Solution
Handle updatePod objects in FIFO order, the same order as the control plane.
Results
Update Pod ACLs in the same order as the control plane, e.g., the first Pod created is the first Pod to have proper connectivity.