-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(plugin): fix batch queue closure function update bug #10090
Conversation
6009e7e
to
3b28b81
Compare
3b28b81
to
39ac379
Compare
39ac379
to
f52bfc8
Compare
@@ -120,6 +122,13 @@ function DatadogHandler:log(conf) | |||
|
|||
local queue_id = get_queue_id(conf) | |||
local q = queues[queue_id] | |||
if q and not tx_deepcompare(conf, q.conf) then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have we consider using a crud
event handler to flush queue
upon plugin config change instead of doing deepcompare on
every request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also seems to be imprudent to copy and paste this logic into every plugin that uses queues.
@@ -153,6 +155,14 @@ function HttpLogHandler:log(conf) | |||
|
|||
local queue_id = get_queue_id(conf) | |||
local q = queues[queue_id] | |||
if q and not tx_deepcompare(conf, q.conf) then | |||
local flush_res, err = q.queue:flush() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what circumstance would the configuration change in ways that retains the queue key but still requires flushing? In other words, would a configuration change not normally recreate the queue anyway? Is there test coverage for this part of this code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTI-4643 is showing is that the problem was detected in the opentelemetry plugin, not on http-log.
I believe the problem is in how queue_id is calculated.
On http-log, get_queue_id is calculated using "all the values of the config":
local function get_queue_id(conf)
return fmt("%s:%s:%s:%s:%s:%s:%s:%s",
conf.http_endpoint,
conf.method,
conf.content_type,
conf.timeout,
conf.keepalive,
conf.retry_count,
conf.queue_size,
conf.flush_timeout)
end
In other plugins we are returning __key__
. In Datadog:
local function get_queue_id(conf)
return conf.__key__
end
In opentracing:
local queue_id = conf.__key__
On http-log we essentially "serialize all the config" already in get_queue_id
. If the config changes, queue_id
will always change.
On the others we use conf.__key__
. The plugins iterator will set that when iterating over a plugin for the first time (probably at creation time). It seems it is just an integer.
That means that an "old" version of opentelemetry may have the same "number" than a new version of the plugin with different config. If the new config is loaded via a reload, the Lua state will not be created, so queue[4]
will be "kept the same".
My recommendation is: let's not use key and let's use something similar to what http_log is doing instead. I am not 100% sure that this is the solution, so I will leave this as a comment, not as a change request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If __key__
remains to be in use, it will be easier to see what's going on if the get_queue_id
function would be replaced by queue_id = conf.__key__
like in the opentracing plugin. I was mislead by my reading of the http-log plugin, which uses some of the config attributes to construct the queue id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, @kikito 's meaning is the same as mine.
@hanshuebner for HTTP log, I think this implementation has some problems because it uses all config fields as the key to store the queue, if we add another config in here, it needs to add this field in the batch queue map function, this is easy to forget, also concatenates the string is very expensive in Lua, I think we need a new way to solve this problem, I will use the crud event handler to flush queue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that the string concatenation required to yield the queue id is contributing a lot to the overall performance, but I'll gladly see my scepticism be removed with some real numbers. My concern is mostly that get_queue_id
, as it is defined in the http-log plugin, does something that is very different from what get_queue_id
does in the opentracing plugin. The reuse of the (bad) function name in the http-log plugin led me to believe that a similar thing is done in the opentracing plugin. By removing the function and instead just using the __key__
field of the configuration, such confusion could be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hanshuebner I think using the plugin update event is a better way to flush the batch queue, Wat do you think about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm not misinformed, the only way to get a "plugin update event" is to compare the current configuration with the previous configuration in the access
function. I think flushing the queue should only be done if queue parameters relevant to the upstream server fail. I think it would be suprising and unwanted to automatically flush the queues on all configuration changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I am wrong with the string, string.fmt
function is not so expensive, "a" .. "b"
this way string concatenation function is expensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update worker event flush will be triggered when this plugin configure updated. So I think it's needed to flush the queue for the plugin queue.9ba3603#diff-1ff9a05ec152f46da66c4fc78f1d1f0a9e1604ed0bf530a71a6d4e9323fc3436R157
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flushing the queues on arbitrary reconfigure events is not acceptable.
I do not agree with these changes at this point, but I will not continue reviewing until Monday.
154e3fc
to
9ba3603
Compare
9ba3603
to
39ac379
Compare
@dndx and I have met to discuss the need to redesign the queuing system, which will be happening in the next couple of weeks. With the redesign, the fix for this FTI would look different enough to make it questionable whether we want to roll it out like this. In particular, I have these objections:
As indicated in FTI-4643, the customer currently knows a workaround to the actual problem they're seeing, so it seems not to be an urgent issue. |
@hanshuebner I am very glad to see the new design, If you have the results please assign me. Now, this PR will hold on, and delete in 3.2 millstones. |
It seems #10172 can address this problem, So this PR needs to be closed. |
…o `0.1.2137` in `/spec-ee/kong-api-tests` (#10090) Bumps [@kong/runtime-groups-api-client](https://github.com/Kong/khcp/tree/HEAD/packages/khcp-api-client) from 0.1.2111 to 0.1.2137. - [Commits](https://github.com/Kong/khcp/commits/HEAD/packages/khcp-api-client) --- updated-dependencies: - dependency-name: "@kong/runtime-groups-api-client" dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Hayk <[email protected]>
Summary
kong opentelemetry plugin use Closure in here https://github.com/Kong/kong/blob/master/kong/plugins/opentelemetry/handler.lua#L151, this will cause config update not to be successful, because this closure function only captures once configure when create
Full changelog
Issue reference
FTI-4643