bug: new queries cannot be submitted when one of the coordinators is disconnected in a specific case #30
Comments
I think we have a query state expiration config, and by default it's 10 seconds. Basically, if the state of a running query hasn't been updated for more than 10 seconds, we won't count that query. But for this to work, the clocks of all the coordinators must be in sync. Can you check this?
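To make the expiration rule concrete, here is a minimal sketch of the check described above. It is not the project's code: the `SharedQueryState` class, the `isExpired` and `liveStates` helpers, and the hard-coded 10-second constant are assumptions for illustration. Because "now" comes from the local coordinator's clock while the timestamp was written by another coordinator, the comparison is only meaningful when the clocks are in sync.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class QueryStateExpiry {
    // Hypothetical stand-in for a shared query state entry; not the project's real class.
    static final class SharedQueryState {
        final String queryId;
        final Instant lastUpdated;

        SharedQueryState(String queryId, Instant lastUpdated) {
            this.queryId = queryId;
            this.lastUpdated = lastUpdated;
        }
    }

    // Assumed value, taken from the 10-second default mentioned in the comment above.
    static final Duration EXPIRATION = Duration.ofSeconds(10);

    // A state is expired when its last update is more than EXPIRATION before 'now'.
    static boolean isExpired(SharedQueryState state, Instant now) {
        return Duration.between(state.lastUpdated, now).compareTo(EXPIRATION) > 0;
    }

    // Keep only the states that should still be counted.
    static List<SharedQueryState> liveStates(List<SharedQueryState> states, Instant now) {
        return states.stream()
                .filter(s -> !isExpired(s, now))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2021-04-28T03:52:00Z");
        SharedQueryState fresh = new SharedQueryState("q-fresh", now.minusSeconds(3));
        SharedQueryState stale = new SharedQueryState("q-stale", now.minusSeconds(60));

        // Only the entry updated 3 seconds ago is still counted.
        System.out.println(liveStates(Arrays.asList(fresh, stale), now).size()); // prints 1
    }
}
```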
I have checked, and all the coordinators' clocks are already in sync. @haochending
I found that StateFetcher calls handleExpiredQueryState only for OOM_QUERY_STATE_COLLECTION_NAM and FINISHED_QUERY_STATE_COLLECTION_NAM. Is QUERY_STATE_COLLECTION_NAME missing? If so, running queries in QUERY_STATE_COLLECTION_NAME are never cleaned up. I think expired queries in QUERY_STATE_COLLECTION_NAME also need to be updated to failed. @haochending
@hfsugar I think the code has already been refactored in the latest master branch, and handleExpiredQueryState now gets called whenever we deserialize the query states. Can you give that a try?
OK. I tried cleaning the expired data in QUERY_STATE_COLLECTION_NAME, and it works. Thanks!
/sync
This has been fixed.
Software Environment:
latest version
linux 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3 (2019-02-02) x86_64 GNU/Linux
java version "1.8.0_162"
Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)
Describe the current behavior
After submitting a big query that causes one coordinator to fail, new small queries cannot be submitted successfully.
Describe the expected behavior
The failure of one coordinator should not affect other new queries.
Steps to reproduce the issue
Related log/screenshots
lk> select 1;
Query 20210428_035158_00002_4tzge, QUEUED, 0 nodes, 0 splits
Special notes for this issue
Why can't new queries be submitted?
The underlying bug is that when a new query is submitted, group.canRunMore() always returns false, so the new query is always queued.
Why does group.canRunMore() always return false?
Because the query state in Hazelcast is not updated when the big query causes one coordinator to fail.
The client sees the server as gone after the big query is submitted, but the query's state in Hazelcast remains running indefinitely, which affects canRunMore for other new queries.
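The effect can be illustrated with a simplified admission check. This is only a sketch of the reasoning above, not the real group.canRunMore(), which likely takes more into account; the `State` enum, the `hardConcurrencyLimit` parameter, and the idea of counting RUNNING entries in the shared state are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

class GroupAdmission {
    // Hypothetical, simplified query states.
    enum State { QUEUED, RUNNING, FINISHED, FAILED }

    // Assumed rule: a group admits a new query only while the number of
    // RUNNING entries in the shared state is below its concurrency limit.
    static boolean canRunMore(List<State> sharedStates, int hardConcurrencyLimit) {
        long running = sharedStates.stream()
                .filter(s -> s == State.RUNNING)
                .count();
        return running < hardConcurrencyLimit;
    }

    public static void main(String[] args) {
        // The big query's coordinator is gone, but its entry still says RUNNING.
        List<State> staleState = Arrays.asList(State.RUNNING);
        System.out.println(canRunMore(staleState, 1)); // prints false

        // Once the stale entry is updated to FAILED, the slot is free again.
        List<State> cleanedState = Arrays.asList(State.FAILED);
        System.out.println(canRunMore(cleanedState, 1)); // prints true
    }
}
```

Under these assumptions, a single entry that is never moved out of RUNNING is enough to keep every later query in the QUEUED state, which matches the `select 1` output shown above.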