Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: route present-time reads to global_read follower replicas #59571

Merged

Conversation

nvanbenschoten
Copy link
Member

@nvanbenschoten nvanbenschoten commented Jan 29, 2021

This commit updates the kv client routing logic to account for the new LEAD_FOR_GLOBAL_READS RangeClosedTimestampPolicy added in #59505. In enterprise builds, non-stale read-only requests to ranges with this closed timestamp policy can now be routed to follower replicas, just like stale read-only requests to normal ranges currently are.

In addition to this main change, there are a few smaller changes in this PR that were hard to split out, so are included in this commit.

First, non-transactional requests are no longer served by followers even if the follower replica detects that the request can be. Previously, non-transactional requests would never be routed intentionally to a follower replica, but Replica.canServeFollowerRead would allow them through if their timestamp was low enough. This change was made in order to keep the client and server logic in-sync and because this does not seem safe for non-transactional requests that get their timestamp assigned on the server. We're planning to remove non-transactional requests soon anyway (#58459), so this isn't a big deal. It mostly just caused some testing fallout.

Second, transactional requests that had previously written intents are now allowed to have their read-only requests served by followers, as long as those followers have a closed timestamp above the transaction's read and write timestamp. Previously, we had avoided this because it seemed to cause issues with read-your-writes. However, it appears safe as long as the write timestamp is below the closed timestamp, because we know all of the transaction's intents are at or below its write timestamp. This is very important for multi-region work, where we want a transaction to be able to write to a REGIONAL table and then later perform local reads (maybe foreign key checks) on GLOBAL tables.

Third, a max clock offset shift in canUseFollowerRead was removed. It wasn't exactly clear what this was doing. It was introduced in the original 70be833 and seemed to allow a follower read in cases where they otherwise shouldn't be expected to succeed. I thought at first that it was accounting for the fact that the kv client's clock might be leading the kv server's clock and so it was being pessimistic about the expected closed timestamp, but it actually seems to be shifting the other way, so I don't know. I might be missing something.

Finally, the concept of a replicaoracle.OracleFactoryFunc was removed and the existing replicaoracle.OracleFactory takes its place. We no longer need the double-factory approach because the transaction object is now passed directly to Oracle.ChoosePreferredReplica. This was a necessary change, because the process of determining whether a follower read can be served now requires transaction information and range information (i.e. the closed timestamp policy), so we need to make it in the Oracle itself instead of in the OracleFactory. This all seems like a simplification anyway.

This is still waiting on changes to the closed timestamp system to be able to write end-to-end tests using this new functionality.

@nvanbenschoten nvanbenschoten requested review from ajwerner and a team January 29, 2021 05:15
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/closedTSPolicyRouting branch 2 times, most recently from 441a8fc to 11a7493 Compare January 29, 2021 07:55
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong: 🚀 🚀

Reviewed 18 of 18 files at r1, 17 of 17 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go, line 97 at r2 (raw file):

}

// closedTimestampLikelySufficient determines if a request at a given timestamp

much clearer


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads_test.go, line 265 at r2 (raw file):

// TestOracle tests the OracleFactory exposed by this package.
func TestOracleFactory(t *testing.T) {

nit: TestOracle and comment

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/closedTSPolicyRouting branch 4 times, most recently from d3de04b to 53860a9 Compare February 1, 2021 04:31
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/closedTSPolicyRouting branch 2 times, most recently from 4117d4c to a501736 Compare February 6, 2021 02:04
Copy link
Contributor

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 19 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go, line 89 at r3 (raw file):

		return 0, err
	}
	return getFollowerReadLag(st), nil

Should we add a comment here (or near the follower_read_timestamp() builtin definition) mentioning that AS OF SYSTEM TIME follower_read_timestamp() will always return a negative offset regardless of whether the range(s) serving the query close timestamps in the past?


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go, line 99 at r3 (raw file):

// closedTimestampLikelySufficient determines if a request at a given timestamp
// is likely to be below a follower's closed timestamp and servicable as a

s/servicable/serviceable


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go, line 117 at r3 (raw file):

	}
	expectedClosedTS := clock.Now().Add(offset.Nanoseconds(), 0)
	return maxObservableTS.LessEq(expectedClosedTS)

Until this point, I was under the impression that historical queries (via AOST) don't have a notion of uncertainty and that they can observe causal inversion. However, I see that both this and the previous implementation were accounting for the max-offset here. Is this so we preserve linearizability for long running txns?


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads_test.go, line 313 at r3 (raw file):

	setLatency("3", 80*time.Millisecond)

	testCases := []struct {

The test is very cool!


pkg/roachpb/data.go, line 932 at r3 (raw file):

	//
	// There is a case where an intent written by a transaction is above the
	// transactions write timestamp — after a successful intent push. Such cases

minor nit: transaction's


pkg/roachpb/data.go, line 937 at r3 (raw file):

	//
	// WIP: something about how the intent will be present on followers with a
	// sufficiently high closed timestamp, either in its pre or post push state.

I was really confused about how reading an intent in its post-push state on a follower would be correct until I found:

// 11. We're reading our own txn's intent at an equal or higher sequence.
// Note that we read at the intent timestamp, not at our read timestamp
// as the intent timestamp may have been pushed forward by another
// transaction. Txn's always need to read their own writes.

It might just be because I'm unfamiliar with this stuff but do you think we should point to something like this from here?

However, if own-intents are the only reason we're Forwarding by WriteTimestamp, then I wonder if we can just do without it. It seems like it only "hurts" us by making follower-reads less likely. Am I misunderstanding something?

@aayushshah15
Copy link
Contributor


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go, line 117 at r3 (raw file):

I thought at first that it was accounting for the fact that the kv client's clock might be leading the kv server's clock and so it was being pessimistic about the expected closed timestamp, but it actually seems to be shifting the other way, so I don't know.

Oh, I missed this before. The previous implementation seems to be shifting the wrong(?) way.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, non-transactional requests are no longer served by followers even if the follower replica detects that the request can be. Previously, non-transactional requests would never be routed intentionally to a follower replica, but Replica.canServeFollowerRead would allow them through if their timestamp was low enough. This change was made in order to keep the client and server logic in-sync and because this does not seem safe for non-transactional requests that get their timestamp assigned on the server.

Why do you say this isn't safe for requests that get their timestamp on the server? I understand that these request don't have an uncertainty interval (so I don't understand how they're linearizable at all) but why does it matter who assigns the timestamp?

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, and @nvanbenschoten)


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go, line 98 at r3 (raw file):

}

// closedTimestampLikelySufficient determines if a request at a given timestamp

I'd make it a point of explaining that the decision is range-specific, and introduce ctPolicy. Maybe name ctPolicy -> range[Ct]Policy.


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go, line 98 at r3 (raw file):

}

// closedTimestampLikelySufficient determines if a request at a given timestamp

Also, the comment talks about :a request at a given ts", but the function takes a "maxObservableTS" which we generally don't call "the request's timestamp". Consider adding some words here, explaining that it's not the "request's timestamp" that matters, but this other thing.


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go, line 106 at r3 (raw file):

// canSendToFollower implements the logic for checking whether a batch request
// may be sent to a follower.
// TODO(aayush): We should try to bind clusterID to the function here, rather

this todo is in the wrong place, but deleting it is just a slap in the face :)


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go, line 117 at r3 (raw file):

Until this point, I was under the impression that historical queries (via AOST) don't have a notion of uncertainty and that they can observe causal inversion

You're still under this impression, right?


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go, line 123 at r3 (raw file):

// may be sent to a follower.
func canSendToFollower(
	clusterID uuid.UUID,

if you're interested, consider taking the opportunity (perhaps different patch) to bind this damn clusterID


pkg/roachpb/data.go, line 912 at r3 (raw file):

func (t Transaction) LastActive() hlc.Timestamp {
	ts := t.LastHeartbeat
	// Only forward by the ReadTimestamp if it is a clock timestamp.

nit: this comment is useless now that the todo is gone; delete it


pkg/roachpb/data.go, line 920 at r3 (raw file):

// MaxObservableTimestamp returns the largest timestamp at which the transaction
// may observe when performing a read-only operation. This is the maximum of the

nit: "may observe" is weird because it sounds like "transactions observing" is a term of art. Maybe "read values".


pkg/roachpb/data.go, line 923 at r3 (raw file):

// transaction's read timestamp, its write timestamp, and its global uncertainty
// limit.
func (t Transaction) MaxObservableTimestamp() hlc.Timestamp {

you sure you don't want a pointer receiver on this? And maybe on other methods around, but that can wait.


pkg/roachpb/data.go, line 934 at r3 (raw file):

	// transactions write timestamp — after a successful intent push. Such cases
	// do allow a transaction to operate above its max observable timestamp.
	// However, this is fine...

don't forget to expand on the "this is fine".


pkg/roachpb/data.go, line 937 at r3 (raw file):

However, if own-intents are the only reason we're Forwarding by WriteTimestamp, then I wonder if we can just do without it. It seems like it only "hurts" us by making follower-reads less likely. Am I misunderstanding something?

Well but how are you suggesting we make sure we read our own writes when going to a follower?


pkg/roachpb/data.go, line 938 at r3 (raw file):

	// WIP: something about how the intent will be present on followers with a
	// sufficiently high closed timestamp, either in its pre or post push state.
	// Either works. Do we need to change the contract of MaxObservableTimestamp

yeah, MaxObservable seems to not describe this well (and I personally didn't really like the name to begin with). How about... RequiredFrontier?

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random question - do we route batches with QueryIntent in them to followers?

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, and @nvanbenschoten)

Copy link
Contributor

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @ajwerner, @andreimatei, and @nvanbenschoten)


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go, line 117 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Until this point, I was under the impression that historical queries (via AOST) don't have a notion of uncertainty and that they can observe causal inversion

You're still under this impression, right?

Yes. So would it be worth it to plumb a flag down here and disregard the max offset if the request was sent from an AOST query?

If not, we could just spell out the reasoning for accounting for the max-offset in a comment here.


pkg/roachpb/data.go, line 937 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

However, if own-intents are the only reason we're Forwarding by WriteTimestamp, then I wonder if we can just do without it. It seems like it only "hurts" us by making follower-reads less likely. Am I misunderstanding something?

Well but how are you suggesting we make sure we read our own writes when going to a follower?

OK, I think I understand. Without Forwarding to the WriteTimestamp, we can't guarantee that an intent is even present on the follower replica serving the request.

First commit from cockroachdb#59505.

This commit updates the kv client routing logic to account for the new
`LEAD_FOR_GLOBAL_READS` `RangeClosedTimestampPolicy` added in cockroachdb#59505. In
enterprise builds, non-stale read-only requests to ranges with this
closed timestamp policy can now be routed to follower replicas, just
like stale read-only requests to normal ranges currently are.

In addition to this main change, there are a few smaller changes in this
PR that were hard to split out, so are included in this commit.

First, non-transactional requests are no longer served by followers even
if the follower replica detects that the request can be. Previously,
non-transactional requests would never be routed intentionally to a
follower replica, but `Replica.canServeFollowerRead` would allow them
through if their timestamp was low enough. This change was made in order
to keep the client and server logic in-sync and because this does not
seem safe for non-transactional requests that get their timestamp
assigned on the server. We're planning to remove non-transactional
requests soon anyway (cockroachdb#58459), so this isn't a big deal. It mostly
just caused some testing fallout.

Second, transactional requests that had previously written intents are
now allowed to have their read-only requests served by followers, as
long as those followers have a closed timestamp above the transaction's
read *and* write timestamp. Previously, we had avoided this because it
seemed to cause issues with read-your-writes. However, it appears safe
as long as the write timestamp is below the closed timestamp, because we
know all of the transaction's intents are at or below its write
timestamp. This is very important for multi-region work, where we want a
transaction to be able to write to a REGIONAL table and then later
perform local reads (maybe foreign key checks) on GLOBAL tables.

Third, a max clock offset shift in `canUseFollowerRead` was removed. It
wasn't exactly clear what this was doing. It was introduced in the
original 70be833 and seemed to allow a follower read in cases where they
otherwise shouldn't be expected to succeed. I thought at first that it
was accounting for the fact that the kv client's clock might be leading
the kv server's clock and so it was being pessimistic about the expected
closed timestamp, but it actually seems to be shifting the other way, so
I don't know. I might be missing something.

Finally, the concept of a `replicaoracle.OracleFactoryFunc` was removed
and the existing `replicaoracle.OracleFactory` takes its place. We no
longer need the double-factory approach because the transaction object
is now passed directly to `Oracle.ChoosePreferredReplica`. This was a
necessary change, because the process of determining whether a follower
read can be served now requires transaction information and range
information (i.e. the closed timestamp policy), so we need to make it in
the Oracle itself instead of in the OracleFactory. This all seems like a
simplification anyway.

This is still waiting on changes to the closed timestamp system to be
able to write end-to-end tests using this new functionality.
Stop using timeutil.Now to create HLC timestamps.
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/closedTSPolicyRouting branch from a501736 to 2c972f8 Compare February 17, 2021 23:53
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

Why do you say this isn't safe for requests that get their timestamp on the server? I understand that these request don't have an uncertainty interval (so I don't understand how they're linearizable at all) but why does it matter who assigns the timestamp?

Because if it isn't the leaseholder who is assigning the timestamp then the replica's clock may trail the commit timestamp of a write with a happens-before relation to the read. This is all kind of broken though. See #58459.

do we route batches with QueryIntent in them to followers?

Discussed offline. We currently don't because we take a fairly pessimistic view of what can happen during async consensus. We assume that an in-flight write can be pushed to a higher timestamp between the time that pipelining begins and when the intent first "lands". And so we don't use a similar argument to the one we're using in RequiredFrontier to guarantee read-your-writes.

I think this is probably too pessimistic of a view and it may be ok to route QueryIntent requests to followers if we loosen its semantics a little bit. But that isn't needed here, so I'll defer that investigation.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @ajwerner, and @andreimatei)


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go, line 89 at r3 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

Should we add a comment here (or near the follower_read_timestamp() builtin definition) mentioning that AS OF SYSTEM TIME follower_read_timestamp() will always return a negative offset regardless of whether the range(s) serving the query close timestamps in the past?

Done.


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go, line 98 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'd make it a point of explaining that the decision is range-specific, and introduce ctPolicy. Maybe name ctPolicy -> range[Ct]Policy.

You don't think the name of the type makes that clear enough?


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go, line 98 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Also, the comment talks about :a request at a given ts", but the function takes a "maxObservableTS" which we generally don't call "the request's timestamp". Consider adding some words here, explaining that it's not the "request's timestamp" that matters, but this other thing.

Done.


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go, line 99 at r3 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

s/servicable/serviceable

Done.


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go, line 106 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

this todo is in the wrong place, but deleting it is just a slap in the face :)

👋

I don't see anything better we can do. utilccl.CheckEnterpriseEnabled requires a cluster ID. Hiding it in some object and then turning these static functions into methods just makes things even less clear.


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go, line 117 at r3 (raw file):
A request sent with AOST will have no uncertainty interval, that's correct. That will be reflected in the transaction's GlobalUncertaintyLimit, which will be equal to its ReadTimestamp. So we don't need to plumb anything else down here.

The previous implementation seems to be shifting the wrong(?) way.

That's what it looks like. It confused me too.

If not, we could just spell out the reasoning for accounting for the max-offset in a comment here.

We aren't accounting for max offset here anymore. This is now handled in RequiredFrontier.


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go, line 123 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

if you're interested, consider taking the opportunity (perhaps different patch) to bind this damn clusterID

Bind it to what? I don't love this, but I don't like introducing some pointless object even more. It just adds even more noise. Functions are at least easy to test and this doesn't get plumbed too far.


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads_test.go, line 265 at r2 (raw file):

Previously, ajwerner wrote…

nit: TestOracle and comment

Done.


pkg/roachpb/data.go, line 912 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: this comment is useless now that the todo is gone; delete it

Done.


pkg/roachpb/data.go, line 920 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: "may observe" is weird because it sounds like "transactions observing" is a term of art. Maybe "read values".

Done.


pkg/roachpb/data.go, line 923 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

you sure you don't want a pointer receiver on this? And maybe on other methods around, but that can wait.

Done.


pkg/roachpb/data.go, line 932 at r3 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

minor nit: transaction's

Done.


pkg/roachpb/data.go, line 934 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

don't forget to expand on the "this is fine".

Done.


pkg/roachpb/data.go, line 937 at r3 (raw file):

we can't guarantee that an intent is even present on the follower replica serving the request.

Yes, exactly. Sorry for the confusion, I was lazy by leaving this WIP comment and skipping the most subtle point.

but do you think we should point to something like this from here?

Done.


pkg/roachpb/data.go, line 938 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

yeah, MaxObservable seems to not describe this well (and I personally didn't really like the name to begin with). How about... RequiredFrontier?

Done.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: 🚋

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @andreimatei, and @nvanbenschoten)


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go, line 98 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

You don't think the name of the type makes that clear enough?

I guess it does, nvm.

@nvanbenschoten
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Feb 18, 2021

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Feb 18, 2021

Build succeeded:

@craig craig bot merged commit ca1a1a7 into cockroachdb:master Feb 18, 2021
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/closedTSPolicyRouting branch February 18, 2021 19:29
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Mar 23, 2021
Fixes cockroachdb#62447.

In cockroachdb#62447, Erik found that cockroachdb#59571 had re-ordered the call to
`utilccl.CheckEnterpriseEnabled` to occur before checking the batch in
`canSendToFollower`, instead of after. This added an error allocation
into the hot path of all reads, which showed up in CPU profiles and
caused an 8% performance regression on `kv95`. This commit fixes this by
moving the enterprise check back out of the hot-path for all non-stale
read-only batches.

A follow up to this PR would be to make `utilccl.CheckEnterpriseEnabled`
cheaper by avoiding the error allocation for callers that don't need an
error. This work is not done in this commit.
craig bot pushed a commit that referenced this pull request Mar 23, 2021
62377: changefeedccl: allow topic_name parameter for changefeed kafka sinks r=[stevendanna,miretskiy] a=HonoreDB

Previously, changes for a table went to a Kafka topic
named for that table, with users only able to specify a prefix.
Some users, however, need changes to go to a specific topic,
including sometimes the same one for more than one table,
distinguishing messages using metadata.
This patch allows the `?topic_name=foo` parameter to be added
to Kafka sink URIs. This will override the per-table topic
generation, so that changes for every table will all go to
the specified topic. It may be used in conjunction with
`topic_prefix`, although the distinction is not meaningful.

Release note (enterprise change): Kafka sink URIs now accept
the "topic_name" parameter to override per-table topic names.

Closes #59300
Closes #58302


62414: workload/schemachange: add SURVIVE syntax r=ajwerner a=otan

see individual commits for details

62420: cliccl/load: fix TestLoadShowIncremental typo r=pbardea a=Elliebababa

This patch fixs typo of TestLoadShowIncremental.

Resolves: #62416

Release note: none

62465: kvccl: re-order enterprise check in canSendToFollower r=nvanbenschoten a=nvanbenschoten

Fixes #62447.

In #62447, Erik found that #59571 had re-ordered the call to
`utilccl.CheckEnterpriseEnabled` to occur before checking the batch in
`canSendToFollower`, instead of after. This added an error allocation
into the hot path of all reads, which showed up in CPU profiles and
caused an 8% performance regression on `kv95`. This commit fixes this by
moving the enterprise check back out of the hot-path for all non-stale
read-only batches.

A follow up to this PR would be to make `utilccl.CheckEnterpriseEnabled`
cheaper by avoiding the error allocation for callers that don't need an
error. This work is not done in this commit.

62473: kvserver: unskip TestEagerReplication r=lunevalex a=lunevalex

PR #61847 fixed the flake in TestEagerReplication but was not
rebased with master, so the skipped tag was not properly removed.
This PR actually unskips TestEagerReplication.

Release note: None

Co-authored-by: Aaron Zinger <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
Co-authored-by: elliebababa <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Alex Lunev <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Mar 25, 2021
Fixes cockroachdb#62447.

In cockroachdb#62447, Erik found that cockroachdb#59571 had re-ordered the call to
`utilccl.CheckEnterpriseEnabled` to occur before checking the batch in
`canSendToFollower`, instead of after. This added an error allocation
into the hot path of all reads, which showed up in CPU profiles and
caused an 8% performance regression on `kv95`. This commit fixes this by
moving the enterprise check back out of the hot-path for all non-stale
read-only batches.

A follow up to this PR would be to make `utilccl.CheckEnterpriseEnabled`
cheaper by avoiding the error allocation for callers that don't need an
error. This work is not done in this commit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants