[exporterhelper] Fix invalid write index updates in the persistent queue #8963
Conversation
Force-pushed from 655830a to f6ae621 (Compare)
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #8963      +/-   ##
==========================================
+ Coverage   91.53%   91.55%   +0.01%
==========================================
  Files         316      316
  Lines       17111    17115       +4
==========================================
+ Hits        15663    15670       +7
+ Misses       1152     1150       -2
+ Partials      296      295       -1

View full report in Codecov by Sentry.
Thanks for the bug fix. About the second change, I'm not 100% sure that the proposed mode of handling erroneous situations is always better than the existing one. Can you please submit separate PRs?
Force-pushed from f6ae621 to cd6904a (Compare)
Can do, but I need to modify the test for full storage in a somewhat unintuitive way. I've done this and left a comment explaining the reasoning, let me know if it's clear.
Force-pushed from cd6904a to 725821b (Compare)
Force-pushed from 725821b to 8f5cb99 (Compare)
// Carry out a transaction where we both add the item and update the write index
setWriteIndexOp := storage.SetOperation(writeIndexKey, itemIndexToBytes(newIndex))
setItemOp := storage.SetOperation(itemKey, reqBuf)
if err := pq.client.Batch(ctx, setWriteIndexOp, setItemOp); err != nil {
	return err
}
I understand the return error, but why change the way the operations were constructed?
I wanted to write this by assigning to err in the condition, which is more readable with a shorter statement imo. So I defined the operations separately. Do you think it's worse now?
I think it just complicates the review, especially understanding what changed. Writing the code as you suggested makes sense in general, but here I feel that it complicates things a bit (especially since you still shadow err). I would maybe write it like:

ops := []storage.Operation{
	storage.SetOperation(writeIndexKey, itemIndexToBytes(newIndex)),
	storage.SetOperation(itemKey, reqBuf),
}
if err := pq.client.Batch(ctx, ops...); err != nil {
	return err
}
Yeah, that looks better. Applied this change and also renamed the error variable to avoid shadowing.
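For reference, a minimal sketch of how the applied change might read, assuming the renamed error variable is called storageErr (the actual name used in the PR may differ):

// Carry out a transaction where we both add the item and update the write index.
ops := []storage.Operation{
	storage.SetOperation(writeIndexKey, itemIndexToBytes(newIndex)),
	storage.SetOperation(itemKey, reqBuf),
}
// A distinct variable name avoids shadowing an err from the enclosing scope.
if storageErr := pq.client.Batch(ctx, ops...); storageErr != nil {
	return storageErr
}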
Description:
Fixing a bug where the in-memory value of the persistent queue's write index would be updated even if writing to the storage failed. This normally wouldn't have any negative effect other than inflating the queue size temporarily, as the read loop would simply skip over the nonexistent record. However, in the case where the storage doesn't have any available space, the in-memory and in-storage write index could become significantly different, at which point a collector restart would leave the queue in an inconsistent state.
Worth noting that the same issue affects reading from the queue, but in that case the writes are very small, and in practice the storage will almost always have enough space to carry them out.
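As an illustrative sketch of the fix (pq.writeIndex and the surrounding structure are assumptions based on the snippets above, not the literal diff), the write path only advances the in-memory index once the storage batch has succeeded:

// Persist the item and the new write index in a single batch.
newIndex := pq.writeIndex + 1
ops := []storage.Operation{
	storage.SetOperation(writeIndexKey, itemIndexToBytes(newIndex)),
	storage.SetOperation(itemKey, reqBuf),
}
if err := pq.client.Batch(ctx, ops...); err != nil {
	// Before the fix, the in-memory write index could still be advanced here,
	// letting it drift from the index actually persisted in storage.
	return err
}
// Advance the in-memory write index only after the storage write succeeded.
pq.writeIndex = newIndex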
Link to tracking Issue: #8115
Testing:
The TestPersistentQueue_StorageFull test actually only passed by accident. Writing would leave one additional item in the put channel, then the first read would fail (as there is not enough space to do the read index and dispatched items writes), but subsequent reads would succeed, so the bugs would cancel out. I modified this test to check the number of items in the queue after inserting them, and also to expect one fewer item to be returned.