Implement TSO Client and provide general client-side service discovery framework and general gRPC stream handling framework. #6037
Conversation
client/base_client.go
// GetTSOAllocators returns {dc-location -> TSO allocator leader URL} connection map
GetTSOAllocators() *sync.Map
// GetTSOAllocatorLeaderAddrByDCLocation returns the tso allocator of the given dcLocation
GetTSOAllocatorLeaderAddrByDCLocation(dcLocation string) (string, bool)
// GetTSOAllocatorClientConnByDCLocation returns the tso allocator grpc client connection
// of the given dcLocation
GetTSOAllocatorClientConnByDCLocation(dcLocation string) (*grpc.ClientConn, string)
As a base client interface, could we provide only some general methods to build the specialized clients upon it? Introducing TSO-related methods here does not sound good to me.
We should move the TSO-related methods out, but we can do it incrementally. The reason we have TSO-related methods here is that the previous baseClient (pdBaseClient) mixed up general service discovery and TSO-related service discovery logic. There are historical reasons for this; now we have to refactor it step by step. @JmPotato
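(For illustration, a TSO-free base client could be reduced to something like the sketch below. The method names here are hypothetical, not the final API.)

```go
package pd

import "google.golang.org/grpc"

// ServiceDiscovery sketches a TSO-free base client: only generic
// service-discovery methods, with TSO lookups layered on top elsewhere.
// All names are illustrative.
type ServiceDiscovery interface {
	// Init bootstraps discovery against the configured endpoints.
	Init() error
	// Close stops the discovery loop and releases cached connections.
	Close()
	// GetClusterID returns the ID of the cluster being served.
	GetClusterID() uint64
	// GetServingAddr returns the current serving endpoint (the leader in a
	// quorum-based cluster, the primary in a primary/secondary one).
	GetServingAddr() string
	// GetClientConn returns a cached gRPC connection to the given address.
	GetClientConn(addr string) *grpc.ClientConn
	// AddServingAddrSwitchedCallback registers a callback fired whenever
	// the serving endpoint changes.
	AddServingAddrSwitchedCallback(cb func())
}
```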
Ok, if the final goal is like that, then this makes sense to me.
BTW, I'll move CreateTsoStream, ProcessTSORequests and the related private functions to another interface. @JmPotato
Moved to tso_stream. The Template, Abstract Factory, and Builder design patterns are used to make this possible.
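(Roughly, the factory/builder split keeps the dispatcher protobuf-agnostic. A simplified sketch of the shape, with illustrative names rather than the exact code:)

```go
package pd

import (
	"context"

	"google.golang.org/grpc"
)

// tsoRequest is a stand-in for the client's internal request type.
type tsoRequest struct {
	physical, logical int64
	done              chan error
}

// tsoStream abstracts one established TSO stream so the dispatcher never
// touches pdpb/tsopb types directly.
type tsoStream interface {
	processRequests(clusterID uint64, dcLocation string, requests []*tsoRequest) error
}

// tsoStreamBuilder knows how to turn a gRPC connection into a concrete
// tsoStream (pdpb-backed or tsopb-backed).
type tsoStreamBuilder interface {
	build(ctx context.Context, cc *grpc.ClientConn) (tsoStream, error)
}

// tsoStreamBuilderFactory is chosen once, based on whether the client is
// talking to PD itself or to the standalone TSO microservice.
type tsoStreamBuilderFactory interface {
	makeBuilder(cc *grpc.ClientConn) tsoStreamBuilder
}
```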
Codecov Report — Base: 74.20% // Head: 74.00% // Decreases project coverage by 0.21%.
Additional details and impacted files:
@@ Coverage Diff @@
## master #6037 +/- ##
==========================================
- Coverage 74.20% 74.00% -0.21%
==========================================
Files 373 376 +3
Lines 37354 37501 +147
==========================================
+ Hits 27718 27751 +33
- Misses 7197 7314 +117
+ Partials 2439 2436 -3
Mostly, LGTM.
The diff is quite large; we need to review it carefully.
client/grpcutil/grpcutil.go
if ok {
	return conn.(*grpc.ClientConn), nil
}
tc, err := tlsCfg.ToTLSConfig()
tc is usually used for a test cluster in this repo; I think tls is a better name for it.
tls is the package name, so it might not be appropriate for the config name. tlsCfg is already taken by the tlsCfg *tlsutil.TLSConfig passed to this function as a parameter. I changed the name to tlsConfig.
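(For reference, the caching pattern under discussion looks roughly like the sketch below. This is a simplified, hypothetical version, with the tlsutil plumbing reduced to a plain *tls.Config; it is not the actual grpcutil code.)

```go
package grpcutil

import (
	"context"
	"crypto/tls"
	"sync"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
	"google.golang.org/grpc/credentials/insecure"
)

// GetOrCreateGRPCConn returns a cached connection to addr, dialing and
// caching a new one if none exists yet (simplified sketch).
func GetOrCreateGRPCConn(ctx context.Context, conns *sync.Map, addr string, tlsConfig *tls.Config) (*grpc.ClientConn, error) {
	if conn, ok := conns.Load(addr); ok {
		return conn.(*grpc.ClientConn), nil
	}
	creds := insecure.NewCredentials()
	if tlsConfig != nil {
		creds = credentials.NewTLS(tlsConfig)
	}
	cc, err := grpc.DialContext(ctx, addr, grpc.WithTransportCredentials(creds))
	if err != nil {
		return nil, err
	}
	// Another goroutine may have dialed concurrently; keep the winner
	// and close the loser to avoid leaking connections.
	if existing, loaded := conns.LoadOrStore(addr, cc); loaded {
		cc.Close()
		return existing.(*grpc.ClientConn), nil
	}
	return cc, nil
}
```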
client/tso_request_dispatcher.go
for {
	// the pd/allocator leader change, we need to re-establish the stream
	if u != url {
		log.Info("[pd] the leader of the allocator leader is changed", zap.String("dc", dc), zap.String("origin", url), zap.String("new", u))
also [pd]?
These functions are used both by the tso APIs in the pd client and by the tso APIs in the independent tso client. I changed the name to [pd/tso].
Also changed every [pd] to [pd/tso] in this file.
	stream tsopb.TSO_TsoClient
}

func (s *tsoTSOStream) processRequests(clusterID uint64, dcLocation string, requests []*tsoRequest,
These two functions are very similar.
It's not easy to remove the duplicate code now, because the request passed to stream.Send(), the stream itself, and the response from stream.Recv() have different types in pdpb and tsopb. We have used several design patterns, including Template, Abstract Factory, and Builder, as you can see in base_client.go and tso_stream.go, so that we can share the service discovery code and the TSO batching/forwarding/async framework. These two functions are the closest to the protobuf code and differ only in types, so it is not possible to deduplicate them further.
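(To make the duplication concrete, here is a condensed sketch of the pdpb-backed wrapper; the tsopb variant is line-for-line the same with the types swapped, which is exactly why the two functions cannot be merged without the abstraction layer above them. Batching details and some error handling are elided, so this is illustrative rather than the actual code.)

```go
package pd

import (
	"github.com/pingcap/kvproto/pkg/pdpb"
)

// pdTSOStream wraps the pdpb-generated stream; tsoTSOStream does the same
// for tsopb.TSO_TsoClient with an identical body.
type pdTSOStream struct {
	stream pdpb.PD_TsoClient
}

// processRequests sends one batched TSO request and returns the allocated
// physical/logical timestamp pair from the response.
func (s *pdTSOStream) processRequests(clusterID uint64, dcLocation string, count uint32) (physical, logical int64, err error) {
	req := &pdpb.TsoRequest{
		Header:     &pdpb.RequestHeader{ClusterId: clusterID},
		Count:      count,
		DcLocation: dcLocation,
	}
	if err = s.stream.Send(req); err != nil {
		return
	}
	resp, err := s.stream.Recv()
	if err != nil {
		return
	}
	ts := resp.GetTimestamp()
	return ts.GetPhysical(), ts.GetLogical(), nil
}
```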
client/client.go
	go c.tsLoop()
	go c.tsCancelLoop()
}
if enableAdmissionCtl {
what does it mean?
tso mcs mode doesn't need to enable the admission control module, so this flag is needed.
Do you mean we can have a better name, such as enableAdmissionControl?
I removed the flags for now. The original purpose of adding them was to disable certain clients when the server side doesn't provide the corresponding services, for example, to disable the tso client when the server side is the resource management microservice, or to disable the resource management client when the server side is the tso microservice.
Changes:
1. Define the interface BaseClient, which is generally for service discovery on a quorum-based cluster or a primary/secondary configured cluster, so that the gRPC client logic layer can be decoupled from the server discovery layer.
2. Rename baseClient to pdBaseClient, then refactor it to implement the BaseClient interface. It provides a basic implementation of service discovery on a quorum-based cluster.
3. Refactor the pd client with the template design pattern to provide a general client-side service discovery framework and a general TSO batching, forwarding, async, and pooling framework.
4. Add the skeleton of tsoBaseClient, which is a basic implementation of service discovery on a primary/secondary configured cluster.
Signed-off-by: Bin Shi <[email protected]>
Signed-off-by: Bin Shi <[email protected]>
Signed-off-by: Bin Shi <[email protected]>
Signed-off-by: Bin Shi <[email protected]>
Signed-off-by: Bin Shi <[email protected]>
If -m is specified, talk to the tso microservice; otherwise, talk to pd. Set -tso to specify the tso serving addresses, e.g., ./pd-tso-bench -v -m -duration 5s -tso "127.0.0.1:3379" Signed-off-by: Bin Shi <[email protected]>
AddTSOAllocatorServiceEndpointSwitchedCallback adds callbacks which will be called when any global/local tso allocator service endpoint is switched. Signed-off-by: Bin Shi <[email protected]>
…adability of client.go Signed-off-by: Bin Shi <[email protected]>
…t_dispatcher.go to improve code readability. Signed-off-by: Bin Shi <[email protected]>
…nd move GetOrCreateGRPCConn to grpcutil. Signed-off-by: Bin Shi <[email protected]>
…r refactor the code. Signed-off-by: Bin Shi <[email protected]>
Signed-off-by: Bin Shi <[email protected]>
Signed-off-by: Bin Shi <[email protected]>
… client. Signed-off-by: Bin Shi <[email protected]>
mostly LGTM
/merge
@lhy1024: It seems you want to merge this PR. I will help you trigger all the tests: /run-all-tests
This pull request has been accepted and is ready to merge. Commit hash: 3ca0398
looks awesome!
/ok-to-test
What problem does this PR solve?
Refactor the pd client with the template design pattern to provide a general client-side service discovery framework and a general gRPC stream batching, forwarding, async, and pooling framework.
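(As background on the batching half of that framework: the dispatcher collects concurrent TSO calls from a channel, sends them over the stream as one request, and fans the response back out. Below is a minimal, self-contained sketch of that pattern, with a fake allocator standing in for the real stream; it is illustrative, not the actual dispatcher code.)

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// tsoRequest carries one pending TSO call and its completion channel.
type tsoRequest struct {
	done     chan error
	physical int64
	logical  int64
}

var logicalCounter int64

// fakeTSO stands in for one stream.Send/stream.Recv round trip: it returns
// a physical timestamp plus the first of `count` consecutive logical ticks.
func fakeTSO(count uint32) (physical, firstLogical int64) {
	physical = time.Now().UnixMilli()
	firstLogical = atomic.AddInt64(&logicalCounter, int64(count)) - int64(count)
	return
}

// batchDispatcher drains up to maxBatch pending requests (waiting briefly
// for stragglers), issues one RPC for the whole batch, and fans consecutive
// logical timestamps back out to the callers.
func batchDispatcher(reqCh chan *tsoRequest, maxBatch int) {
	for first := range reqCh {
		batch := []*tsoRequest{first}
		deadline := time.After(time.Millisecond)
	collect:
		for len(batch) < maxBatch {
			select {
			case req := <-reqCh:
				batch = append(batch, req)
			case <-deadline:
				break collect
			}
		}
		physical, firstLogical := fakeTSO(uint32(len(batch)))
		for i, req := range batch {
			req.physical = physical
			req.logical = firstLogical + int64(i)
			req.done <- nil
		}
	}
}

func main() {
	reqCh := make(chan *tsoRequest, 16)
	go batchDispatcher(reqCh, 8)
	req := &tsoRequest{done: make(chan error, 1)}
	reqCh <- req
	if err := <-req.done; err == nil {
		fmt.Println("ts:", req.physical, req.logical)
	}
}
```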
Issue Number: Ref #5836
What is changed and how does it work?
Check List
Tests
pd-tso-bench testing result:
./pd-tso-bench -v -duration 5s -pd "127.0.0.1:3379"
Start benchmark #0, duration: 5s
Create 1 client(s) for benchmark
[2023/02/23 17:46:10.110 -08:00] [INFO] [client.go:431] ["[pd] create pd(tso) client with endpoints"] [pd-address="[127.0.0.1:3379]"]
[2023/02/23 17:46:10.110 -08:00] [INFO] [tso_client.go:336] ["[tso] switch primary"] [new-primary=http://127.0.0.1:3379/] [old-primary=]
[2023/02/23 17:46:10.110 -08:00] [INFO] [client.go:762] ["[pd] tso dispatcher created"] [dc-location=global]
count: 495783, max: 4.3819ms, min: 0.1444ms, avg: 2.0098ms
<1ms: 10410, >1ms: 228832, >2ms: 256541, >5ms: 0, >10ms: 0, >30ms: 0, >50ms: 0, >100ms: 0, >200ms: 0, >400ms: 0, >800ms: 0, >1s: 0
count: 481110, max: 4.6088ms, min: 0.0558ms, avg: 2.0063ms
<1ms: 24825, >1ms: 165334, >2ms: 290951, >5ms: 0, >10ms: 0, >30ms: 0, >50ms: 0, >100ms: 0, >200ms: 0, >400ms: 0, >800ms: 0, >1s: 0
count: 479289, max: 4.3606ms, min: 0.3493ms, avg: 2.0835ms
<1ms: 515, >1ms: 190880, >2ms: 287894, >5ms: 0, >10ms: 0, >30ms: 0, >50ms: 0, >100ms: 0, >200ms: 0, >400ms: 0, >800ms: 0, >1s: 0
count: 478138, max: 4.4927ms, min: 0.3457ms, avg: 2.0902ms
<1ms: 342, >1ms: 185102, >2ms: 292694, >5ms: 0, >10ms: 0, >30ms: 0, >50ms: 0, >100ms: 0, >200ms: 0, >400ms: 0, >800ms: 0, >1s: 0
Total:
count: 1934320, max: 4.6088ms, min: 0.0558ms, avg: 2.0471ms
<1ms: 36092, >1ms: 770148, >2ms: 1128080, >5ms: 0, >10ms: 0, >30ms: 0, >50ms: 0, >100ms: 0, >200ms: 0, >400ms: 0, >800ms: 0, >1s: 0
count: 1934320, <1ms: 1.87%, >1ms: 39.81%, >2ms: 58.32%, >5ms: 0.00%, >10ms: 0.00%, >30ms: 0.00%, >50ms: 0.00%, >100ms: 0.00%, >200ms: 0.00%, >400ms: 0.00%, >800ms: 0.00%, >1s: 0.00%
P0.5: 2.0442ms, P0.8: 2.2536ms, P0.9: 2.4216ms, P0.99: 3.1541ms
Remaining issues:
Killing the etcd (remote) backend and then the primary in sequence, the secondary doesn't switch to primary. This is because leadership.Watch() in the secondary gets stuck. This seems to be a pre-existing bug in the monolithic architecture that is only surfaced by the microservice (detached etcd) scenario.
The pd(tso) client's exit isn't clean.
Release note