Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvcoord: key range cache entries by regular keys, not meta keys #52817

Merged
merged 4 commits into from
Aug 20, 2020

Conversation

andreimatei
Copy link
Contributor

@andreimatei andreimatei commented Aug 14, 2020

Before this patch, entries in the range cache were keyed by
RangeMetaKey(desc.EndKey). There were no good reasons why either
the RangeMetaKey() transformation was used, or why the EndKey was used
instead of the StartKey. Presumably this has all been done in order to
match how descriptors are stored in the meta ranges, but this was
misguided since it's confusing.

This patch makes cache entries simply be keyed on desc.Start. This fixes
a problem with querying the cache with RKeyMin.Next(). Before this
patch, this query was broken since it was failing to find the cached
Meta1 range: the range was [RKeyMin, Meta2Start), and so it was keyed at
RangeMetaKey(Meta2Start) = Meta1/Meta2Start. The cache lookup was using
RangeMetaKey(RKeyMin.Next), which results in /Meta2/ (which
is too high) because of arguably a bug in the implementation of
RangeMetaKey. Fixing that bug proves to not be quite straigh-forward
(#52786). This patch fixes the problem by not needing to apply neither
.Next() nor RangeMetaKey() when performing cache lookups.

This patch fixes the issues referenced below; what the restores in those
issues were experiencing was a spectacular consequence of the failure to
lookup the cache described above: evicting the Meta1 descriptor from the
cache was always failing (because the lookup was failing) which, in
turn, was causing the DistSender to believe that it was perpetually
receiving old lease information and so it would loop around contacting
the same replica over and over. There's something to improve in the
DistSender, coming separately.
While the underlying RangeMetaKey() bug is old, I believe this bug was
exposed more by the recent range cache changes. It used to be that we
would generally evict the Meta1 descriptor from the cache based on the
key that caused that descriptor to be inserted in the cache. Now we
generally evict the descriptor based on the range's start key (which is
RKeyMin, and suffers from the RKeyMin.Next() issue).

Fixes #52468
Fixes #52391
Fixes #50987

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: although now that I see how small this is, I'm less convinced that it's worth it in its entirety. Avoiding the RangeMetaKey translation when inserting and retrieving is a nice win, let's definitely keep that. Storing descriptors by their StartKey instead of their EndKey is a little less of a clear win. Conceptually it's nicer, but it actually makes the code more complex, not less complex like I was expecting, because we can't just .Next() to get around inclusive vs. exclusive discrepancies.

So all things considered, I think my vote would be to store descriptors by EndKey, instead of RangeMetaKey(EndKey) (current behavior) or StartKey (this behavior). I'm curious about what you think.

Reviewed 2 of 2 files at r1, 6 of 6 files at r2, 3 of 3 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)


pkg/kv/kvclient/kvcoord/range_cache.go, line 558 at r3 (raw file):

	end := rangeCacheKey(span.EndKey)
	// Align the span's start on a range boundary.
	floor, _ /* rawEntry */ := rdc.getCachedRLocked(ctx, span.Key, false /* inverted */)

Instead of scanning backwards and then forwards, can we scan backwards from the EndKey and retain the single pass?


pkg/kv/kvclient/kvcoord/range_cache.go, line 908 at r3 (raw file):

rangeCacheKey(roachpb.RKeyMin)

Should we do the same thing we did with maxCacheKey?


pkg/util/cache/cache.go, line 462 at r3 (raw file):

}

// DoRangeEntryReverse invokes f on all cache entries in the range (to, from]. from

Shouldn't this be DoRangeReverseEntry? They both sound bizarre, but at least that one follows the existing pattern from llrb.Tree.

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So all things considered, I think my vote would be to store descriptors by EndKey, instead of RangeMetaKey(EndKey) (current behavior) or StartKey (this behavior). I'm curious about what you think.

I think I'll keep the StartKey. It's one less thing to remember and I think it's worth it.
I've considered adding support for all the inclusive/exclusive options we need to OrderedCache, so that they're hidden at a lower level. I've been lazy so far, but I could...

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/kv/kvclient/kvcoord/range_cache.go, line 558 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Instead of scanning backwards and then forwards, can we scan backwards from the EndKey and retain the single pass?

done; see now


pkg/kv/kvclient/kvcoord/range_cache.go, line 908 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
rangeCacheKey(roachpb.RKeyMin)

Should we do the same thing we did with maxCacheKey?

done


pkg/util/cache/cache.go, line 462 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Shouldn't this be DoRangeReverseEntry? They both sound bizarre, but at least that one follows the existing pattern from llrb.Tree.

done

@andreimatei andreimatei force-pushed the range-cache.fix-keymin-2 branch from e3927c8 to 27790ca Compare August 18, 2020 02:38
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still :lgtm:

Reviewed 4 of 6 files at r6, 4 of 4 files at r7.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

A test wanted to assert the RangeCache state, but was using the Lookup()
method which, in case of cache miss, goes to the database.

Release note: None
Release note: None
Before this patch, entries in the range cache were keyed by
RangeMetaKey(desc.EndKey). There were no good reasons why either
the RangeMetaKey() transformation was used, or why the EndKey was used
instead of the StartKey. Presumably this has all been done in order to
match how descriptors are stored in the meta ranges, but this was
misguided since it's confusing.

This patch makes cache entries simply be keyed on desc.Start. This fixes
a problem with querying the cache with RKeyMin.Next(). Before this
patch, this query was broken since it was failing to find the cached
Meta1 range: the range was [RKeyMin, Meta2Start), and so it was keyed at
RangeMetaKey(Meta2Start) = Meta1/Meta2Start. The cache lookup was using
RangeMetaKey(RKeyMin.Next), which results in /Meta2/<something> (which
is too high) because of arguably a bug in the implementation of
RangeMetaKey. Fixing that bug proves to not be quite straigh-forward
(cockroachdb#52786). This patch fixes the problem by not needing to apply neither
.Next() nor RangeMetaKey() when performing cache lookups.

This patch fixes the issues referenced below; what the restores in those
issues were experiencing was a spectacular consequence of the failure to
lookup the cache described above: evicting the Meta1 descriptor from the
cache was always failing (because the lookup was failing) which, in
turn, was causing the DistSender to believe that it was perpetually
receiving old lease information and so it would loop around contacting
the same replica over and over. There's something to improve in the
DistSender, coming separately.
While the underlying RangeMetaKey() bug is old, I believe this bug was
exposed more by the recent range cache changes. It used to be that we
would generally evict the Meta1 descriptor from the cache based on the
key that caused that descriptor to be inserted in the cache. Now we
generally evict the descriptor based on the range's start key (which is
RKeyMin, and suffers from the RKeyMin.Next() issue).

Fixes cockroachdb#52468
Fixes cockroachdb#52391
Fixes cockroachdb#50987

Release note: None
@andreimatei andreimatei force-pushed the range-cache.fix-keymin-2 branch from 27790ca to 5ea819f Compare August 20, 2020 01:51
Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)

@craig
Copy link
Contributor

craig bot commented Aug 20, 2020

Build succeeded:

@craig craig bot merged commit 8bc4417 into cockroachdb:master Aug 20, 2020
@andreimatei andreimatei deleted the range-cache.fix-keymin-2 branch August 21, 2020 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants