Blocking query of the health API "/health/service/<service>" may return empty nodes for non-empty services. #20790
Comments
ishustava added a commit that referenced this issue on Mar 18, 2024
Currently, when a client starts a blocking query and its ACL token expires while the query is in progress, Consul returns an "ACL not found" error with a 403 status code. However, if the ACL token is invalidated at the same time as the query's deadline is reached, Consul will sometimes return an empty response with a 200 status code instead. This is because of the order in which the following events are executed:

1. Client issues a blocking query request with timeout `t`.
2. ACL is deleted.
3. Server detects a change in ACLs and force closes the gRPC stream.
4. Client resubscribes with the same token and resets its state (view).
5. Client sees "ACL not found" error.

If the ACL is deleted before step 4, the client is unaware that the stream was closed due to an ACL error and will return an empty view (from the reset state) with the 200 status code.

To fix this problem, we introduce another state to the subscription to indicate when a change to ACLs has occurred. If the server sees that there was an error due to an ACL change, it will re-authenticate the request and return an error if the token is no longer valid.

Fixes #20790
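To make the sequence above easier to follow, here is a minimal toy model of it in Python. It is only a sketch: `ToyView`, `StreamState`, `ACLNotFound`, and `run` are hypothetical names invented for illustration and are not Consul's actual types or code. The `track_acl_change` flag stands in for the extra subscription state the commit message describes.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class StreamState(Enum):
    OPEN = auto()
    CLOSED_BY_ACL_CHANGE = auto()  # hypothetical stand-in for the added state


class ACLNotFound(Exception):
    """Stands in for the 403 "ACL not found" error."""


@dataclass
class ToyView:
    valid_tokens: set
    nodes: list = field(default_factory=list)
    state: StreamState = StreamState.OPEN

    def on_acl_change(self, track_acl_change: bool) -> None:
        # Steps 3-4: the stream is force closed and the view is reset.
        if track_acl_change:
            self.state = StreamState.CLOSED_BY_ACL_CHANGE
        self.nodes = []

    def result_at_deadline(self, token: str):
        # With the extra state, the request is re-authenticated first.
        if (self.state is StreamState.CLOSED_BY_ACL_CHANGE
                and token not in self.valid_tokens):
            raise ACLNotFound(token)
        # Without it, the reset (empty) view is returned as a success.
        return 200, list(self.nodes)


def run(track_acl_change: bool):
    view = ToyView(valid_tokens={"t1"}, nodes=["node-a"])
    view.valid_tokens.discard("t1")       # step 2: ACL is deleted
    view.on_acl_change(track_acl_change)  # steps 3-4
    try:
        return view.result_at_deadline("t1")  # query deadline is reached
    except ACLNotFound:
        return 403, None
```

In this model, `run(False)` returns `(200, [])`, which mirrors the reported symptom, while `run(True)` returns `(403, None)`. The real fix lives in Consul's subscription handling and is more involved than this.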
ishustava added a commit that referenced this issue on Mar 22, 2024
#20876)
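On versions without the fix, a client could defensively flag the symptom described in this issue: a 200 response whose index went backwards and whose node list is empty. The following is a minimal sketch, assuming the caller already has the status code, the index it requested, the index the server returned, and the parsed node list. Both function names are hypothetical and not part of any Consul client library. Resetting the wait index to 0 when it goes backwards follows the general guidance for blocking-query clients.

```python
def is_suspect_empty_result(status_code: int, requested_index: int,
                            returned_index: int, nodes: list) -> bool:
    """Heuristic only: a 200 with no nodes and an index lower than the one
    requested matches the symptom reported in this issue."""
    return status_code == 200 and not nodes and returned_index < requested_index


def next_wait_index(requested_index: int, returned_index: int) -> int:
    """Pick the index for the next blocking query, resetting to 0 when the
    returned index goes backwards or is not positive."""
    if returned_index <= 0 or returned_index < requested_index:
        return 0
    return returned_index
```

With the values from the example in this issue, `is_suspect_empty_result(200, 9999999, 1, [])` is `True` and `next_wait_index(9999999, 1)` is `0`. A legitimately empty service can also produce an empty list, so a caller would likely want to confirm with a fresh non-blocking query rather than treat the flag as proof.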
Overview of the Issue
For example, a GET call of the API `/health/service/consul?index=9999999&wait=10s` may return `status_code=200 index=1 nodes=[]` when the following event 2 happens immediately after event 1:

This is due to a race condition in handling the errors caused by the above two events:

`m.mat.reset()` resets the index. `m.index=0` due to the above error handling. So the `result.Value` is set to an empty list.

Reproduction Steps
consul.json:
reproduce.py:
Consul info
Reproduced in consul 1.11.7. Also reproduced in the main branch commit: 6f48ce1
Operating system and Environment details
Arch Linux Kernel v6.6.6
Log Fragments