Alternative way of traffic switching during shard split #8980

Closed
wangmeng99 opened this issue Oct 11, 2021 · 5 comments

Comments

@wangmeng99
Contributor

wangmeng99 commented Oct 11, 2021

Feature Description

The current traffic switching during shard split effectively holds traffic and then switches the primary. Depending on the QPS, this may create 'micro' unavailability. I wonder if it's worthwhile to support a 'dual write' fashion of traffic switching, i.e. writing to both the original shard and the new shards during the switch, followed by stopping writes to the original shards. Granted, we need some more in-depth discussion on the protocol for the dual-write phase to ensure consistency, and on how to recover if there is a failure in the middle.

Related to #7330.
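
Roughly, the dual-write phase I have in mind would fan each statement out to the source shard and the new target shards, something like the sketch below (purely illustrative Go; none of these types exist in Vitess):

```go
// Purely illustrative sketch of the dual-write phase. The shardWriter type
// is made up; it only stands in for "a handle to a shard primary".
package sketch

import (
	"context"
	"errors"
	"fmt"
)

// shardWriter is a hypothetical handle to a shard primary.
type shardWriter interface {
	Name() string
	Exec(ctx context.Context, query string) error
}

// dualWrite applies the same statement to the source shard and to every
// target shard. The open question is what to do when one side fails; here
// we just report a combined error, which is why a real coordination
// protocol would be needed in practice.
func dualWrite(ctx context.Context, source shardWriter, targets []shardWriter, query string) error {
	var errs []error
	if err := source.Exec(ctx, query); err != nil {
		errs = append(errs, fmt.Errorf("source %s: %w", source.Name(), err))
	}
	for _, t := range targets {
		if err := t.Exec(ctx, query); err != nil {
			errs = append(errs, fmt.Errorf("target %s: %w", t.Name(), err))
		}
	}
	return errors.Join(errs...)
}
```

The unresolved part is the error handling at the end: returning a combined error is not enough on its own, which is why the protocol needs the in-depth discussion mentioned above.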

Use Case(s)

All the operations that need to switch the primary, e.g. shard split.

@sougou
Contributor

sougou commented Oct 18, 2021

We can consider the dual write approach. Have you thought through the corner cases, and do you have solutions in mind?

I think what we need to prove is that the source and targets can always converge correctly if writes fail in one and succeed in the other. Also, if one side fails, we have to think about whether Vitess should return success or failure to the app. At the same time, the current vreplication is very strict about consistency. We'll need to look at adding tolerances for situations where only one side succeeded.

Conceptually, I think the replication-based cutover is the same as dual-write, except that the source takes responsibility for writing to the target. It addresses the above concerns about consistency, but the cost is that things get lagged if there are errors or delays in the system. I almost feel like the delay is a theoretical necessity to keep things consistent.

An alternate approach we can look at is to time the cutover when the replication delay is near-zero. Since we control this timing, this may be viable.
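
For illustration, the gate could be as simple as the following (the lag source is a hypothetical stand-in, not an existing Vitess API; in practice it would be the workflow's vreplication lag):

```go
// Sketch only: "lag" stands in for whatever reports the workflow's
// replication lag; it is not an existing Vitess API.
package sketch

import (
	"context"
	"time"
)

// waitForLowLag polls a lag source and returns once the reported lag is at
// or below the threshold, or the context expires. The cutover would only
// proceed after this returns nil.
func waitForLowLag(ctx context.Context, lag func() time.Duration, threshold time.Duration) error {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		if lag() <= threshold {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}
```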

@aquarapid
Contributor

One thing that @deepthi mentioned while we were talking about this: what would dual writes do to GTID replication, which I believe is still used under the covers, even with group replication?

@wangmeng99
Contributor Author

Sorry for the delay. I don't have a concrete solution. The high-level thought was to use 2PC to coordinate the writes to both the source and target shards, then update the topology with the new primary and stop the 2PC. GTID is not a concern because vreplication already leaves different GTIDs for the 'same' replicated transaction on the source and the target. This certainly means a very complicated switchover.
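
Very roughly, the coordination I have in mind would look like the sketch below (the participant interface is made up purely to illustrate; failure handling such as a coordinator crash or a partial commit is exactly the hard part and is not addressed here):

```go
// Very rough sketch of the 2PC idea: prepare on both the source and the
// target shards, and only commit if every participant prepared.
package sketch

import "context"

// participant is a hypothetical 2PC participant (one shard primary).
type participant interface {
	Prepare(ctx context.Context, txID string) error
	Commit(ctx context.Context, txID string) error
	Rollback(ctx context.Context, txID string) error
}

// coordinate runs a single two-phase commit across all participants
// (the source shard plus the new target shards).
func coordinate(ctx context.Context, txID string, parts []participant) error {
	for _, p := range parts {
		if err := p.Prepare(ctx, txID); err != nil {
			// Abort everywhere if any prepare fails.
			for _, q := range parts {
				_ = q.Rollback(ctx, txID)
			}
			return err
		}
	}
	for _, p := range parts {
		if err := p.Commit(ctx, txID); err != nil {
			// Partial commit: recovery is needed here; out of scope for this sketch.
			return err
		}
	}
	return nil
}
```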

The primary switch sacrifices latency to guarantee consistency. I guess the difference is that 'dual write' mode has a bounded worst-case latency, i.e. the maximum time is the time to write to both shards, whereas switching at a point in an async stream has an unbounded worst-case latency, i.e. we don't know how many transactions remain before the switch point.

Timing the cutover for when the replication delay is zero seems like a really good tactical solution to me. If there is an issue filed for that, I'm happy to close this one in favor of it. What do you think?

@aquarapid
Contributor

Agreed on having a mode in SwitchReads/SwitchWrites to only proceed to write/locking operations if the workflow (resharding in this case) delay is "low". I don't believe I have seen an issue for that before.

@mattlord
Contributor

I'm going to close this for now as done, since: 1) shard splits are gone, replaced by Reshard; 2) we support buffering for Reshard; 3) we now have various traffic-switching pre-checks that avoid attempting the switch when we can see it's likely to fail (e.g. the vreplication lag is high).

If I'm missing anything please let me know and we can re-open this.
