Rewrite cache synchronization to lock instead of spin #21124
Conversation
I hope that the benchmark code provided is only for illustration and …
The query itself should also be larger for the difference in performance to manifest. See #18022 for some real-world candidates.
I ran both. Running a setup/cleanup per method invocation outside the measured method can be done in BenchmarkDotNet with IterationSetup, but that setting has a significant impact on the number of iterations etc. Since the cache only ever has one entry, the impact should be pretty negligible (and I didn't dive too deep into tweaking BDN). If you'd like to see more data I can also loop inside the function to reduce the impact of the clearing. At the end of the day I also don't think it matters too much - I think we all agreed to remove the current spinning loop, and between the proposed locking and not doing any synchronization at all, I don't think the difference matters that much.
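As a side note, here is a minimal sketch of what per-invocation setup via BenchmarkDotNet's IterationSetup could look like; the cache field and sizes are illustrative, not the PR's actual benchmark:

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.Extensions.Caching.Memory;

public class IterationSetupExample
{
    private MemoryCache _cache;

    // Recreate the cache before each measured iteration so that the reset
    // itself is excluded from the reported time.
    [IterationSetup]
    public void ResetCache()
        => _cache = new MemoryCache(new MemoryCacheOptions { SizeLimit = 10240 });

    [Benchmark]
    public void AddEntry()
        => _cache.Set("someKey", "somevalue", new MemoryCacheEntryOptions { Size = 10 });

    static void Main(string[] args)
        => BenchmarkRunner.Run<IterationSetupExample>();
}
```

The trade-off mentioned above is that with IterationSetup, BenchmarkDotNet effectively measures single invocations per iteration, which reduces the statistical quality of the results.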
Not sure we'd call that a real-world candidate - more like a Cartesian explosion nightmare we recommend avoiding :) But if you think that's important for deciding what to do with this PR, let me know and I'll run that.
Yes, I'm talking only about the query compilation. The database shouldn't have any data for the benchmark.
I think a more suitable example would be dynamically generated code with consecutive AND conditions forming a skewed binary tree, where perhaps only one constant value at the very right changes. That gives a really large expression tree (which #18022 actually does not) and, at the same time, a cache miss.
I don't think this scenario is common enough. We could pregenerate the large skewed expression tree and use the same one in each iteration. |
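For illustration, here is a sketch of how such a skewed AND tree could be pregenerated with System.Linq.Expressions; the depth, element type, and method name are made up for the example, not taken from the PR:

```csharp
using System;
using System.Linq.Expressions;

public static class SkewedTreeBuilder
{
    // Builds x => x == 0 && (x == 1 && (... && x == rightmostConstant)),
    // i.e. a deep, right-skewed chain of AndAlso nodes. Varying only the
    // right-most constant keeps the tree large while still producing a
    // cache miss on every compilation.
    public static Expression<Func<int, bool>> Build(int depth, int rightmostConstant)
    {
        var parameter = Expression.Parameter(typeof(int), "x");

        // Start with the deepest (right-most) comparison and wrap it in AndAlso nodes.
        Expression body = Expression.Equal(parameter, Expression.Constant(rightmostConstant));
        for (var i = depth - 1; i >= 0; i--)
        {
            body = Expression.AndAlso(
                Expression.Equal(parameter, Expression.Constant(i)),
                body);
        }

        return Expression.Lambda<Func<int, bool>>(body, parameter);
    }
}
```

In a benchmark, such a tree could either be pregenerated once and reused in each iteration (as suggested above), or rebuilt with a different right-most constant per iteration to force a cache miss.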
We can certainly spend a lot of time tweaking and measuring this scenario, and I'll do the work if you guys think it's justified. Note also that the current PR does retain the locking (like @smitpatel originally wanted) - it just does it in a much better way (blocking on the lock instead of spinning). Unless someone strongly feels that the proposed locking mechanism is somehow bad, I don't think there's much value in continuing to investigate and benchmark this. Let me know.
One more note - adding an entry to the cache and then compacting takes less than two microseconds. That means it's really quite negligible, and including Compact inside the benchmark should be fine (no need to work extra hard to generate different expression trees etc.).

```
BenchmarkDotNet=v0.12.0, OS=ubuntu 20.04
Intel Xeon W-2133 CPU 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.6.20266.3
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
```
Benchmark

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.Extensions.Caching.Memory;

public class Program
{
    private MemoryCache _cache;

    [GlobalSetup]
    public void Setup()
    {
        _cache = new MemoryCache(new MemoryCacheOptions { SizeLimit = 10240 });
    }

    [Benchmark]
    public void MemoryAddAndCompact()
    {
        // Add a single entry and immediately compact, so both operations
        // are included in the measured time.
        _cache.Set("someKey", "somevalue", new MemoryCacheEntryOptions { Size = 10 });
        _cache.Compact(100);
    }

    static void Main(string[] args)
        => BenchmarkRunner.Run<Program>();
}
```
I think this is an improvement, but it can still be improved further.
@AndriySvyryd @smitpatel can you provide more details on what you'd like to see? Our original discussion was between two options: preventing concurrent compilation of the same query via locks (which is what this PR does), or removing concurrent compilation prevention altogether. What could we be doing better here?
What I would like to see is, |
You could decompose |
@AndriySvyryd I agree - the scenario where a query isn't already cached and is being compiled by another thread really, really doesn't seem worth optimizing to this level.
@smitpatel the basic point here is to avoid uncontrolled spin looping, which is a bad idea in almost any scenario. This is less about actual, visible perf (although I believe that's also relevant) and more about safety: if our compilation ever blocks or takes a very long time (we've had several bugs like this in the history of EF), other threads end up occupying CPU cores with 100% spin loops. Basically, we should never, ever spin without at least using something like SpinWait, which spins for a while and then switches to waiting - but in any case that isn't really relevant here.
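To make "blocking on the lock instead of spinning" concrete, here is a minimal, hypothetical sketch of a lock-based get-or-compile path for a query cache. It is not EF Core's actual implementation; the type and member names are made up for illustration:

```csharp
using System;
using System.Collections.Concurrent;

public class CompiledQueryCache<TKey, TQuery>
{
    private readonly ConcurrentDictionary<TKey, TQuery> _cache =
        new ConcurrentDictionary<TKey, TQuery>();
    private readonly ConcurrentDictionary<TKey, object> _compilationLocks =
        new ConcurrentDictionary<TKey, object>();

    public TQuery GetOrCompile(TKey key, Func<TQuery> compile)
    {
        // Fast path: the query has already been compiled and cached.
        if (_cache.TryGetValue(key, out var query))
        {
            return query;
        }

        // Slow path: take a per-key lock. A second thread asking for the same
        // query blocks here (yielding its CPU core) instead of spinning.
        var compilationLock = _compilationLocks.GetOrAdd(key, _ => new object());
        lock (compilationLock)
        {
            // Double-check: another thread may have finished compiling the
            // query while this one was waiting on the lock.
            if (!_cache.TryGetValue(key, out query))
            {
                query = compile();
                _cache[key] = query;
            }
        }

        _compilationLocks.TryRemove(key, out _);
        return query;
    }
}
```

The important difference from a spin loop is that a waiting thread is parked by the runtime until the lock is released, rather than burning a core repeatedly rechecking the cache.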
Closes #18516
tl;dr: While the results aren't very conclusive, this PR replaces spinning with an equivalent lock-based approach.
Benchmark code
Benchmark results