feat(meta): Client failover #7248
Conversation
let current_leader = nl.unwrap();

let addr = format!(
    "http://{}:{}",
Are there any plans to switch to HTTPS? Do I have to consider that here?
return;
}

// Only print failure messages if the entire failover failed
I am doing this to avoid spamming the logs. Printing immediately is very verbose, since the meta nodes send stale leader information for quite some time.
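The log-suppression idea above can be sketched as follows. This is a hypothetical illustration (the function name and error type are stand-ins, not from the PR): per-attempt errors are swallowed and only the last one surfaces once every retry is exhausted, i.e. only when the entire failover failed.

```rust
// Hypothetical sketch: stay quiet during individual failover attempts and
// surface only one error if every retry is exhausted, since meta nodes may
// serve stale leader info for a while and per-attempt logging is very noisy.
fn failover_with_quiet_retries(
    attempts: usize,
    mut try_connect: impl FnMut() -> Result<(), String>,
) -> Result<(), String> {
    let mut last_err = None;
    for _ in 0..attempts {
        match try_connect() {
            Ok(()) => return Ok(()),
            // Remember the error, but do not log per attempt.
            Err(e) => last_err = Some(e),
        }
    }
    // Only at this point would we log: the entire failover failed.
    Err(last_err.unwrap_or_else(|| "no attempts made".into()))
}
```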
Codecov is complaining that coverage decreases. I assume that this is ok, since we also relied on sim testing in feat: introduce ElectionClient trait for meta. Please correct me if I am wrong @shanicky @yezizp2012
Good job! No worries about the code coverage. @shanicky also submitted PR #7389 to support service discovery (including client failover). After a quick look at both of your PRs, I believe you have implemented some common functionality, except for inconsistencies in where you update the meta leader address information.
I think both implementations are necessary as we discussed earlier, and we need a configuration parameter to determine which path to take. Besides, I think starting risingwave via
.map_err(RpcError::into)
.map_err(RpcError::into);

for retry in self.meta_client.get_retry_strategy() {
Failover is already covered in meta_rpc_client_method_impl. So I guess this is not necessary?
// Hold locks on all sub-clients, to update atomically
{
    let mut leader_c = self.leader_client.as_ref().lock().await;
We'd better wrap all the clients in a core struct and lock/unlock on that.
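A minimal sketch of that suggestion, with hypothetical names (`ClientCore`, `MetaClientHandle`, and the string fields are stand-ins for the real gRPC sub-clients): one lock guards a single core struct, so a failover swaps every sub-client in one critical section instead of juggling per-client locks.

```rust
use std::sync::Mutex;

// Hypothetical sketch: wrap all sub-clients in one core struct behind a
// single lock, so a failover updates them atomically and there is no
// lock-ordering problem between per-client mutexes.
struct ClientCore {
    leader_addr: String,    // stand-in for the leader sub-client
    heartbeat_addr: String, // stand-in for another sub-client
}

struct MetaClientHandle {
    core: Mutex<ClientCore>,
}

impl MetaClientHandle {
    // One critical section updates every sub-client at once.
    fn failover_to(&self, new_leader: &str) {
        let mut core = self.core.lock().unwrap();
        core.leader_addr = new_leader.to_string();
        core.heartbeat_addr = new_leader.to_string();
    }

    fn leader_addr(&self) -> String {
        self.core.lock().unwrap().leader_addr.clone()
    }
}
```

The PR itself uses async locks; `std::sync::Mutex` is used here only to keep the sketch self-contained.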
}

// Repeat the request if we were connected to the wrong node
if self.do_failover_if_needed().await {
If an RPC call fails, the response code can already tell us whether we should do a failover. And not all RPC calls are retryable.
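The distinction the reviewer draws can be sketched like this. The enum and its variants are hypothetical stand-ins for gRPC status codes, not the PR's actual types: only codes that plausibly indicate a leader change should trigger a failover, while caller errors should not be retried at all.

```rust
// Hypothetical sketch: classify an RPC failure by its status code to decide
// whether a failover (re-resolving the leader) is worthwhile. Not every
// failure is retryable; retrying a caller bug would never help.
#[derive(Debug, PartialEq)]
enum RpcCode {
    Unavailable,        // node unreachable: possibly a leader change
    FailedPrecondition, // e.g. "not the leader": definitely failover
    InvalidArgument,    // caller bug: do not retry
    Internal,           // server-side error: do not failover blindly
}

fn should_failover(code: &RpcCode) -> bool {
    matches!(code, RpcCode::Unavailable | RpcCode::FailedPrecondition)
}
```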
/// Execute the failover if it is needed. If no failover is needed, do nothing.
/// Returns true if a failover was needed, else false.
pub async fn do_failover_if_needed(&self) -> bool {
    let current_leader = self.try_get_leader_from_connected_node().await;
We don't have to issue a make_leader_request call for every failed request; instead, the failover should be a singleton, so that this refresh process only happens once.
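One way to make the refresh a singleton, sketched with hypothetical names (`Failover`, `refresh_once` are illustrations, not the PR's API): the first caller to take a non-blocking lock performs the leader refresh, and any concurrent caller simply skips instead of issuing its own lookup.

```rust
use std::sync::Mutex;

// Hypothetical sketch: a singleton failover refresh. Whoever wins the
// try_lock does the leader refresh; everyone else observes the lock is
// taken and skips, so the refresh happens at most once at a time.
struct Failover {
    refreshing: Mutex<()>,
}

impl Failover {
    /// Returns true if this caller performed the refresh, false if another
    /// caller was already doing it.
    fn refresh_once(&self, do_refresh: impl FnOnce()) -> bool {
        match self.refreshing.try_lock() {
            Ok(_guard) => {
                do_refresh(); // runs while holding the guard
                true
            }
            Err(_) => false, // refresh already in progress elsewhere
        }
    }
}
```

In the actual async client this would use an async lock (or a shared future), but the shape of the idea is the same.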
let channel = get_channel_with_defaults(addr).await?;

// Dummy address forces a client failover
let dummy_address = HostAddress {
Why not get the leader address directly here? Storing dummy info here is quite weird.
Thank you very much for the feedback. I will have a look at this on Wednesday after my vacation :)
Closing this PR as duplicate work 😞
I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.
What's changed and what's your intention?
#6787
PR based on #7049. Only merge after that one is merged.
Checklist
./risedev check (or alias, ./risedev c)
Documentation
If your pull request contains user-facing changes, please specify the types of the changes, and create a release note. Otherwise, please feel free to remove this section.
Types of user-facing changes
Please keep the types that apply to your changes, and remove those that do not apply.
Release note
Please create a release note for your changes. In the release note, focus on the impact on users, and mention the environment or conditions where the impact may occur.
Refer to a related PR or issue link (optional)