fix: broken reconnect on some HTTP2 error frames #1261
I know this is still a work in progress, but I added some thoughts. If I find the time, I am more than willing to support here :) This is a great improvement, and I think normalizing our code base and reducing complexity has a big benefit. In the long run, maybe our RPC mode will also use a construct of flagstore and connector (not part of this), so we have one unified approach to this :)
I've noticed some Javadoc comments that are affected by the class renaming of `Grpc*` to `Rpc*`. I did not mark all of them.
Thank you. I feel like we are getting closer and closer to an easy-to-use codebase in which we share the same approaches across all the providers. Switching RPC to a queue as well normalizes things and improves the maintainability of our codebase in the long run.
    metadataResponse = channelConnector.getBlockingStub().getMetadata(metadataRequest.build());
} catch (Exception metaEx) {
    log.error("Metadata exception: {}", metaEx.getMessage(), metaEx);
    context.cancel(metaEx);
Is it ok to proceed with a canceled context into the inner while loop, or should we try to restart (which might lead to problems because we might never get into the inner while loop just because we cannot query metadata)?
This shouldn't be different from the previous behavior, as `context.cancel()` will notify the stream listener and put an error event on it, so we will break out of the inner loop after 1 iteration.
However, this would mean that we can get into a fairly tight loop, starting the stream over and over if the metadata call errors over and over (this is not new). I'm not sure what can be done to improve this, since I really do not want to proceed without metadata in case an admin has injected context attributes and there are rules involving them, and I don't want to give up forever on one error either. We could perhaps add a sleep after the metadata error just to make sure the loop isn't super tight... but I don't want to build our own backoff here for this edge case.
I really dislike the fact the metadata is a separate RPC. We need to improve that.
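To make the "sleep after the metadata error" idea concrete, here is a minimal sketch of a fixed pause between failed attempts. It is not the provider's code: `MetadataRetrySketch`, `fetchWithPause`, `pauseMillis`, and `maxAttempts` are hypothetical names, the fetch is modeled as a plain `Callable`, and the attempt cap exists only so the sketch terminates (the real loop would keep retrying).

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries a call with a fixed pause so a failing call cannot spin. */
final class MetadataRetrySketch {

    /**
     * Invokes fetch until it succeeds or maxAttempts is reached.
     * Sleeps pauseMillis after each failure; returns null if every attempt fails
     * or the thread is interrupted while pausing.
     */
    static <T> T fetchWithPause(Callable<T> fetch, long pauseMillis, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetch.call();
            } catch (Exception e) {
                if (attempt == maxAttempts) {
                    break; // no point pausing after the final failure
                }
                try {
                    // Fixed pause, deliberately not a backoff policy.
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return null;
                }
            }
        }
        return null;
    }
}
```

A fixed pause keeps the edge case cheap to reason about. Whether it is preferable to relying on the channel's own reconnect behavior is a judgment call this sketch does not settle.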
cc @cupofcat
Signed-off-by: Todd Baert <[email protected]>
…s/flagd/resolver/rpc/RpcResolver.java Co-authored-by: Simon Schrottner <[email protected]> Signed-off-by: Todd Baert <[email protected]>
This PR fixes an issue where some HTTP2 error frames (for example, `RST_STREAM` frames) cause our streams to die and not recover. The `RST_STREAM` in particular is sent by envoy/istio when upstream connections are lost, though the connection stays open on our end (envoyproxy/envoy#30149, grpc/grpc-go#8041). We are required to restart the stream in this case by reacting to the `onCompleted` stream handler, which we never did in the `eventStream` and which we recently stopped doing in the `syncStream` (due to a regression).

Specifically, the PR:

- handles `onError` and `onComplete` stream messages by reconnecting the sync (in-process) and event (RPC) streams (this was the main issue)
- makes the `eventStream` use a blocking queue to consume messages, just like the `syncStream`

We don't have e2e tests for this issue because we'd need some more features in the test harness to do that, but I will create an issue for it.
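The queue-based handling described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `Observer` is a stand-in with the same three callbacks as gRPC's `StreamObserver`, and `StreamEvent`, `QueueingObserver`, `StreamStarter`, and `QueueReconnectSketch` are hypothetical names. The point it shows is that both `onError` and `onCompleted` end the current stream and cause a new one to be started.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Stand-in for the three callbacks of gRPC's StreamObserver (assumption for this sketch). */
interface Observer<T> {
    void onNext(T value);
    void onError(Throwable t);
    void onCompleted();
}

/** One queue entry: a payload, an error, or a completion marker. */
final class StreamEvent<T> {
    enum Kind { DATA, ERROR, COMPLETE }

    final Kind kind;
    final T payload;
    final Throwable error;

    private StreamEvent(Kind kind, T payload, Throwable error) {
        this.kind = kind;
        this.payload = payload;
        this.error = error;
    }

    static <T> StreamEvent<T> data(T payload) { return new StreamEvent<>(Kind.DATA, payload, null); }
    static <T> StreamEvent<T> error(Throwable t) { return new StreamEvent<>(Kind.ERROR, null, t); }
    static <T> StreamEvent<T> complete() { return new StreamEvent<>(Kind.COMPLETE, null, null); }
}

/** Observer that only enqueues; every decision is made by the consuming thread. */
final class QueueingObserver<T> implements Observer<T> {
    private final BlockingQueue<StreamEvent<T>> queue;

    QueueingObserver(BlockingQueue<StreamEvent<T>> queue) { this.queue = queue; }

    @Override public void onNext(T value) { queue.offer(StreamEvent.data(value)); }
    @Override public void onError(Throwable t) { queue.offer(StreamEvent.error(t)); }
    @Override public void onCompleted() { queue.offer(StreamEvent.complete()); }
}

/** Hypothetical hook that opens one stream attempt and wires it to the observer. */
interface StreamStarter<T> {
    void start(Observer<T> observer);
}

final class QueueReconnectSketch {
    /** Consumes up to maxStreams stream attempts and returns every payload received. */
    static <T> List<T> consume(StreamStarter<T> starter, int maxStreams) {
        List<T> received = new ArrayList<>();
        for (int attempt = 0; attempt < maxStreams; attempt++) {
            // Fresh queue per attempt so stale events from a dead stream are never read.
            BlockingQueue<StreamEvent<T>> queue = new LinkedBlockingQueue<>();
            starter.start(new QueueingObserver<>(queue));
            boolean streamAlive = true;
            while (streamAlive) {
                StreamEvent<T> event;
                try {
                    event = queue.take();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return received;
                }
                switch (event.kind) {
                    case DATA:
                        received.add(event.payload);
                        break;
                    case ERROR:
                    case COMPLETE:
                        // Both terminal signals leave the inner loop so the outer loop reconnects.
                        streamAlive = false;
                        break;
                }
            }
        }
        return received;
    }
}
```

The `maxStreams` cap exists only so the sketch terminates. A long-running provider would presumably loop until shutdown instead.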