-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concurrency issues when multiple patches happen for the same resources #1277
Concurrency issues when multiple patches happen for the same resources #1277
Conversation
This makes sense, it's probably very hard to test the race condition there, but do we have a test for the normal case of a new revision to verify this doesn't break normal behavior? (either unit or a gabbit) Also please squash (cleanup) the commits into one (or more, whatever is preferred) but provide good commit messages. |
"concurrency issue that might happen. Therefore, " | ||
"no revision is going to be generated at this " | ||
"time.", data_to_update) | ||
create_revision = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if it makes sense to just move the check revision logic to https://github.com/gnocchixyz/gnocchi/blob/master/gnocchi/indexer/sqlalchemy.py#L922-L923. that way it would be behind the lock created by for_update.
that said, now it's always locking and i've no idea the impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense, it's probably very hard to test the race condition there, but do we have a test for the normal case of a new revision to verify this doesn't break normal behavior? (either unit or a gabbit)
Also please squash (cleanup) the commits into one (or more, whatever is preferred) but provide good commit messages.
Thanks for the review. I squashed the commits. Also, regarding tests, there are already gabbi tests that cover resource revisions, and all of them are passing just fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chungg I agree with you. Ideally, I Would have done that. However, I implemented this way to be less intrusive as possible in the code base.
I can move the decision to generate revision to the update_resource method only, if you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this, I'll let @chungg respond and approve if he's OK with it.
An issue was found when Ceilometer notification is stopped, and samples accumulate in the RabbitMQ. These samples might contain the same set of attributes that would trigger one revision. However, due to a concurrency issue, when Ceilometer notification is started back again, all of them are processed in almost the same moment, and then pushed back to Gnocchi API. Then, in Gnocchi API the detection to see if the resource needs a revision is executed outside of a semaphore control (e.g. read/write transaction with the DB with a table being locked). Therefore, the code detects that a revision is needed, and generated multiple revisions, which have the same content but different revision start and end. This patches aims to resolve that situation. The solution adopted is to check again in the transaction (when the lock was acquired), before the revision is created, if the revision is really needed. It it is not needed, a log entry is generated and we stop the processing by returnign the resource data.
4c360af
to
52ff6f2
Compare
Hello guys, |
Thanks!! |
An issue was found when Ceilometer notification is stopped, and samples accumulate in the RabbitMQ. These samples might contain the same set of attributes that would trigger one revision. However, due to a concurrency issue, when Ceilometer notification is started back again, all of them are processed in almost the same moment, and then pushed back to Gnocchi API. Then, in Gnocchi API the detection to see if the resource needs a revision is executed outside of a semaphore control (e.g. read/write transaction with the DB with a table being locked). Therefore, the code detects that a revision is needed, and generated multiple revisions, which have the same content but different revision start and end.
This patches aims to resolve that situation. The solution adopted is to check again in the transaction (when the lock was acquired), before the revision is created, if the revision is really needed. It it is not needed, a log entry is generated and we stop the processing by returnign the resource data.