Fix the bug of hybrid table request using the same request id #9443
LOGGER.debug("Keep track of running query: {}", requestId);
queryServers.addServers(offlineRoutingTable, realtimeRoutingTable);
One reason to use addServers() was to accumulate all servers involved in running a query, which might issue subqueries and thus recursively call this handleRequest() method (where I assumed different routing tables might be created for some subqueries).
Yes, and that is why I also moved _queriesById.remove(requestId) into handleRequest(), to avoid this scenario.
Makes sense!
if (queryServers._realtimeRoutingTable != null) {
  // NOTE: When the query is sent to both OFFLINE and REALTIME table, the REALTIME one has negative request id to
  //       differentiate from the OFFLINE one
  long realtimeRequestId = queryServers._offlineRoutingTable == null ? requestId : -requestId;
🆒
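To make the rule in the snippet above concrete, here is a minimal sketch of how the per-table-type request ids could be derived from the broker-level id. The class and method names are hypothetical and not part of Pinot, and the sketch assumes broker request ids are always positive.

```java
import java.util.OptionalLong;

// Hypothetical helper (not Pinot's API): derives the request id sent for each
// table type from the broker-level request id, following the rule in the diff.
// Assumption: requestId is always positive.
final class HybridRequestIds {
  private HybridRequestIds() {
  }

  // The OFFLINE request, when one is sent, keeps the original request id.
  static OptionalLong offlineRequestId(long requestId, boolean hasOffline) {
    return hasOffline ? OptionalLong.of(requestId) : OptionalLong.empty();
  }

  // The REALTIME request is negated only when an OFFLINE request is sent as well,
  // so a REALTIME-only query still uses the original id.
  static OptionalLong realtimeRequestId(long requestId, boolean hasOffline, boolean hasRealtime) {
    if (!hasRealtime) {
      return OptionalLong.empty();
    }
    return OptionalLong.of(hasOffline ? -requestId : requestId);
  }
}
```

With this rule only the hybrid case changes: the OFFLINE request keeps the id and the REALTIME request carries its negation.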
b684292 to ac72549
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@             Coverage Diff              @@
##             master    #9443      +/-   ##
============================================
+ Coverage     69.76%   69.86%   +0.10%
  Complexity     5098     5098
============================================
  Files          1890     1890
  Lines        100934   100945      +11
  Branches      15347    15352       +5
============================================
+ Hits          70420    70529     +109
+ Misses        25541    25447      -94
+ Partials       4973     4969       -4
Good catch! Do we also need a code change for canceling the request, since after this PR realtime servers may start serving some queries with a negative requestId?
@jackjlli Yes, the query cancellation logic is also revised to cancel the correct request id on the server.
Is there a reason we needed to do it this way (negative ids)? Most log search technologies folks use will expect KV filter like
@jadami10 Good point. We chose to fix the issue with a negative id because that does not change the transport object between brokers and servers, which would be a backward-incompatible change. On the server side, we may check whether the request id is positive and log it accordingly.
Does that matter, since it was already working before? The description states the components it's trying to fix but not really what was broken, so it's a little tough for me to understand what the problem is in the first place.
@jadami10 The problem is that for tracing and query cancellation, we use the request id as the key to track the queries running on the server. For a hybrid table, 2 queries are sent out, one for the OFFLINE table and one for the REALTIME table. If both tables are served on the same server, the 2 queries will overwrite each other's entry, and only one query can be tracked.
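The collision described above can be illustrated with a small sketch. The registry class below is a hypothetical stand-in for whatever structure a server uses to track running queries by request id, not Pinot's actual implementation, and the table names are made-up examples.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a server-side registry of running queries,
// keyed by request id. Not Pinot's actual class.
final class RunningQueryRegistry {
  private final Map<Long, String> _runningQueries = new HashMap<>();

  // Registers a running query. Returns the entry that was replaced,
  // or null if the request id was not tracked yet.
  String register(long requestId, String tableNameWithType) {
    return _runningQueries.put(requestId, tableNameWithType);
  }

  // Number of queries currently tracked.
  int trackedQueries() {
    return _runningQueries.size();
  }

  // Shows that two requests with the same id collapse into one entry,
  // while negating the REALTIME id keeps both entries.
  static int[] demo() {
    RunningQueryRegistry sameId = new RunningQueryRegistry();
    sameId.register(7L, "myTable_OFFLINE");
    sameId.register(7L, "myTable_REALTIME"); // replaces the OFFLINE entry

    RunningQueryRegistry negatedId = new RunningQueryRegistry();
    negatedId.register(7L, "myTable_OFFLINE");
    negatedId.register(-7L, "myTable_REALTIME"); // distinct key, both are tracked

    return new int[] {sameId.trackedQueries(), negatedId.trackedQueries()};
  }
}
```

Because the map key is the only thing distinguishing the two requests, any scheme that yields two distinct keys would avoid the overwrite; negation is simply the one this PR uses.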
That makes sense. In the case where we have 1 server serving both parts, does it/can it know which query is which part without passing more info from the broker? That way we could append Is there actually a use case where someone wants to cancel a request id for just 1 part of the query? That feels strange since the broker will then return an exception anyway.
That is possible by checking the table name. Let me think about whether we can solve it on the server itself. Currently we don't allow cancelling only one request. From the endpoint's perspective nothing changes: the user still always gives the positive request id, and the broker automatically rewrites it to the negative one if necessary. The issue is that the logger might log the negative request id on the server side.
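The cancellation and logging behaviour discussed above could look roughly like the following sketch. Both helpers are hypothetical illustrations of the idea (the broker fans a user-supplied positive id out to the ids actually in use on servers, and a server normalizes the id before logging); they are not the code in this PR, and they assume request ids are positive.

```java
// Hypothetical helpers (not Pinot's API) illustrating the discussion above.
// Assumption: broker request ids are always positive.
final class CancelRequestIds {
  private CancelRequestIds() {
  }

  // Ids the broker would cancel on servers for a user-supplied request id.
  // For a hybrid table both the original id (OFFLINE) and its negation
  // (REALTIME) are in use; otherwise only the original id is.
  static long[] serverRequestIds(long requestId, boolean hasOffline, boolean hasRealtime) {
    if (hasOffline && hasRealtime) {
      return new long[] {requestId, -requestId};
    }
    if (hasOffline || hasRealtime) {
      return new long[] {requestId};
    }
    return new long[0];
  }

  // Id a server could write to its logs so that a log search by the
  // broker-level request id still matches the REALTIME part.
  static long loggableRequestId(long serverRequestId) {
    return Math.abs(serverRequestId);
  }
}
```

Under this sketch the user-facing endpoint keeps accepting only the positive id, and the negative id stays an internal detail between broker and server.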
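Currently, when querying a hybrid table, 2 requests (one for OFFLINE and one for REALTIME) are sent to the servers with the same request id. This can cause problems for 2 modules: tracing and query cancellation, both of which use the request id as the key to track queries running on a server.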
This PR fixes the bug by using a negative request id for the REALTIME request of a hybrid table.