Rewrite cache synchronization to lock instead of spin #21124

roji · 2020-06-03T12:06:26Z

tl;dr While results aren't very conclusive, this PR replaces spinning with an equivalent lock-based approach.

The objective here is to remove the spinning loop that occurs when another thread is already compiling our query, and to generally simplify synchronization.
I checked replacing the spinning loop with two things:
- LOCKING: A proper lock; this keeps the behavior where multiple threads don't compile the same query, but rather the first one compiles and the others wait for it (just not via spinning).
- NOSYNC: No synchronization, so multiple threads that happen to execute an un-compiled query compile in parallel.
I used two benchmarking scenarios (the MemoryCache is compacted/reset at the beginning of each invocation):
- Simply spin up 16 threads which execute the same heavy-ish query
- Spin up 1 thread, wait a bit, then spin up 15 more.
The results are a bit inconclusive - benchmarking scenarios like this is very messy, as the threads interfere with each other etc. But I was able to generate a scenario where locking improved perf a bit. I'm not convinced this is important, but as @smitpatel argued for this and the implementation is simple, I went for that.
Unrelated: this PR also removes ICompiledQueryCache.GetOrAddAsync which isn't used anywhere (and is a bit more complicated to implement with locking). I don't think there is a justification for a compiled query cache which performs I/O as part of its job...
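
For readers following along, here is a minimal sketch of what the LOCKING option amounts to: the first thread compiles while the others block on a per-key lock and then pick up the cached result. This is an illustration only, not the code in this PR; `LockingCache`, `GetOrAdd`, and the `ConcurrentDictionary`-backed storage are hypothetical stand-ins for the real cache types.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for a compiled query cache; not the EF Core type.
public sealed class LockingCache<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, TValue> _entries = new();
    private readonly ConcurrentDictionary<TKey, object> _locks = new();

    public TValue GetOrAdd(TKey key, Func<TValue> compile)
    {
        // Fast path: the entry is already cached, so no lock is taken.
        if (_entries.TryGetValue(key, out var cached))
            return cached;

        // One lock object per key, so unrelated keys do not block each other.
        var keyLock = _locks.GetOrAdd(key, _ => new object());
        try
        {
            lock (keyLock)
            {
                // Re-check under the lock: another thread may have finished first.
                if (!_entries.TryGetValue(key, out cached))
                {
                    cached = compile();
                    _entries[key] = cached;
                }

                return cached;
            }
        }
        finally
        {
            // The lock object is only needed while the entry is being produced.
            _locks.TryRemove(key, out _);
        }
    }
}
```

The NOSYNC option would correspond to dropping the lock and the re-check, so that every thread that misses the fast path calls `compile()` itself.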

Benchmark code

[Benchmark]
public virtual async Task MultipleThreadsNoDelay()
{
    _memoryCache.Compact(100); // Clear the cache between invocations

    for (var i = 0; i < 16; i++)
        _tasks[i] = Task.Run(ExecuteQuery);

    await Task.WhenAll(_tasks);
}

[Benchmark]
public virtual async Task MultipleThreadsDelay()
{
    _memoryCache.Compact(100); // Clear the cache between invocations

    _tasks[0] = Task.Run(ExecuteQuery);
    await Task.Delay(60);

    for (var i = 1; i < 16; i++)
        _tasks[i] = Task.Run(ExecuteQuery);

    await Task.WhenAll(_tasks);
}

async Task<List<Customer>> ExecuteQuery()
{
    using var context = _fixture.CreateContext(_serviceProvider);
    return await context.Customers
        .AsNoTracking()
        .Include(c => c.Orders)
        .ThenInclude(o => o.OrderLines)
        .ThenInclude(ol => ol.Product)
        .ToListAsync();
}

Benchmark results

### LOCKING, NODELAY

-------------------- Histogram --------------------                                                                                          
[ 41.421 ms ;  59.439 ms) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                                                                                  
[ 59.439 ms ;  77.456 ms) |                              
[ 77.456 ms ;  92.401 ms) | @                                         
[ 92.401 ms ; 110.419 ms) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[110.419 ms ; 121.052 ms) | @@                            
[121.052 ms ; 140.837 ms) |                              
[140.837 ms ; 158.855 ms) | @@@@@@@@@@@                                                                                                      
[158.855 ms ; 179.837 ms) | @                                                                                                                
[179.837 ms ; 191.093 ms) |                                                                                                                  
[191.093 ms ; 209.110 ms) | @@@@                          
[209.110 ms ; 223.323 ms) | @                            
---------------------------------------------------         
                                                                                                                                             
// * Summary *                                                                                                                               
                                                                                                                                             
BenchmarkDotNet=v0.12.0, OS=ubuntu 20.04                             
Intel Xeon W-2133 CPU 3.60GHz, 1 CPU, 12 logical and 6 physical cores                                                                        
.NET Core SDK=5.0.100-preview.6.20266.3                   
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
                                                                      
                                                                      
|                 Method |     Mean |    Error |   StdDev |
|----------------------- |---------:|---------:|---------:|
| MultipleThreadsNoDelay | 94.62 ms | 16.87 ms | 44.73 ms |

### LOCKING, DELAY

-------------------- Histogram --------------------
[ 46.843 ms ;  86.274 ms) | @@@@@@@@@@@@@@@@@@@@@@@@@@
[ 86.274 ms ; 117.981 ms) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[117.981 ms ; 163.849 ms) | @@@@@@@@@@@@
[163.849 ms ; 200.926 ms) | @@@@@
[200.926 ms ; 222.260 ms) | @
[222.260 ms ; 253.967 ms) | @@@@@
[253.967 ms ; 277.726 ms) | 
[277.726 ms ; 309.433 ms) | @@@@@@@@@@@@@
[309.433 ms ; 343.544 ms) | @@@
[343.544 ms ; 375.251 ms) | @
---------------------------------------------------

// * Summary *

BenchmarkDotNet=v0.12.0, OS=ubuntu 20.04
Intel Xeon W-2133 CPU 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.6.20266.3
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT


|               Method |     Mean |    Error |   StdDev |   Median |
|--------------------- |---------:|---------:|---------:|---------:|
| MultipleThreadsDelay | 146.5 ms | 28.69 ms | 83.25 ms | 114.1 ms |



### LOCKING_OLD, NODELAY

-------------------- Histogram --------------------
[ 43.083 ms ;  58.162 ms) | @@@@@@@@@@@@@@@@@@@@@@@@@
[ 58.162 ms ;  73.240 ms) | 
[ 73.240 ms ;  81.958 ms) | 
[ 81.958 ms ;  91.323 ms) | @@@@@@
[ 91.323 ms ; 106.402 ms) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[106.402 ms ; 121.480 ms) | 
[121.480 ms ; 136.559 ms) | 
[136.559 ms ; 158.610 ms) | @@@@@@@@@@@@
[158.610 ms ; 173.689 ms) | 
[173.689 ms ; 188.767 ms) | 
[188.767 ms ; 200.078 ms) | 
[200.078 ms ; 216.918 ms) | @@
---------------------------------------------------

// * Summary *

BenchmarkDotNet=v0.12.0, OS=ubuntu 20.04
Intel Xeon W-2133 CPU 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.6.20266.3
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT


|                 Method |     Mean |    Error |   StdDev |
|----------------------- |---------:|---------:|---------:|
| MultipleThreadsNoDelay | 93.30 ms | 14.08 ms | 37.59 ms |


### LOCKING_OLD, DELAY

-------------------- Histogram --------------------
[ 29.367 ms ;  47.891 ms) | @@@@@@@
[ 47.891 ms ;  72.333 ms) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[ 72.333 ms ;  96.776 ms) | 
[ 96.776 ms ; 121.218 ms) | 
[121.218 ms ; 145.661 ms) | 
[145.661 ms ; 170.103 ms) | 
[170.103 ms ; 205.960 ms) | @@@@@@@@@@@@@@@@@@
[205.960 ms ; 229.476 ms) | @@@@@@@
---------------------------------------------------

// * Summary *

BenchmarkDotNet=v0.12.0, OS=ubuntu 20.04
Intel Xeon W-2133 CPU 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.6.20266.3
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT


|               Method |     Mean |    Error |   StdDev |   Median |
|--------------------- |---------:|---------:|---------:|---------:|
| MultipleThreadsDelay | 93.21 ms | 22.17 ms | 63.95 ms | 56.50 ms |


### NOSYNC, NODELAY

-------------------- Histogram --------------------
[ 57.361 ms ;  78.356 ms) | @@@@@@@@@@@
[ 78.356 ms ;  99.352 ms) | 
[ 99.352 ms ; 113.401 ms) | 
[113.401 ms ; 127.650 ms) | @@
[127.650 ms ; 148.646 ms) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[148.646 ms ; 168.370 ms) | @@@@@@@@@@@
[168.370 ms ; 189.366 ms) | 
[189.366 ms ; 214.092 ms) | @@@@@@@
[214.092 ms ; 235.088 ms) | @@@@@@@@@@@@@@@@@@@@@@
[235.088 ms ; 250.854 ms) | @@@@@@
[250.854 ms ; 271.849 ms) | 
[271.849 ms ; 299.214 ms) | 
[299.214 ms ; 320.210 ms) | @
---------------------------------------------------

// * Summary *

BenchmarkDotNet=v0.12.0, OS=ubuntu 20.04
Intel Xeon W-2133 CPU 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.6.20266.3
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT


|                 Method |     Mean |    Error |   StdDev |
|----------------------- |---------:|---------:|---------:|
| MultipleThreadsNoDelay | 165.7 ms | 19.16 ms | 54.36 ms |


### NOSYNC, DELAY

-------------------- Histogram --------------------
[ 40.168 ms ;  67.306 ms) | @@@@@@@@@
[ 67.306 ms ;  94.106 ms) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[ 94.106 ms ; 112.470 ms) | @@@@@
[112.470 ms ; 139.269 ms) | 
[139.269 ms ; 166.069 ms) | 
[166.069 ms ; 185.753 ms) | 
[185.753 ms ; 206.214 ms) | @@@@@@
[206.214 ms ; 233.014 ms) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[233.014 ms ; 257.876 ms) | @@@@@
---------------------------------------------------

// * Summary *

BenchmarkDotNet=v0.12.0, OS=ubuntu 20.04
Intel Xeon W-2133 CPU 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.6.20266.3
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT


|               Method |     Mean |    Error |   StdDev |   Median |
|--------------------- |---------:|---------:|---------:|---------:|
| MultipleThreadsDelay | 142.3 ms | 24.25 ms | 70.36 ms | 96.36 ms |

Closes #18516

AndriySvyryd · 2020-06-05T00:23:15Z

I hope that the benchmark code provided is only for illustration and memoryCache.Compact(100); wasn't actually run inside the benchmark

AndriySvyryd · 2020-06-05T00:27:01Z

The query itself should also be larger for the difference in performance to manifest. See #18022 for some real-world candidates

roji · 2020-06-05T01:14:24Z

> I hope that the benchmark code provided is only for illustration and memoryCache.Compact(100); wasn't actually run inside the benchmark

I ran both. Running a setup/cleanup per method invocation outside the method in BenchmarkDotNet can be done with IterationSetup, but that setting has a significant impact on the number of iterations etc. Since the cache only ever has one entry, the impact should be pretty negligible (and I didn't dive too deep into tweaking BDN). If you'd like to see more data I can also loop inside the function to reduce the impact of the clearing. At the end of the day I also don't think it matters too much - I think we all agreed to remove the current spinning loop, and between the proposed locking and not doing any synchronization I don't think it matters that much.

> The query itself should also be larger for the difference in performance to manifest. See #18022 for some real-world candidates

Not sure we call that a real-world candidate, more like a cartesian explosion nightmare we recommend avoiding :)

But if you think that's important for deciding what to do with this PR, let me know and I'll run that.

AndriySvyryd · 2020-06-05T01:39:30Z

> Not sure we call that a real-world candidate, more like a cartesian explosion nightmare we recommend avoiding :)

Yes, I'm talking only about the query compilation. The database shouldn't have any data for the benchmark

smitpatel · 2020-06-05T01:52:58Z

I think a more suitable example would be dynamic code generation with continuous AND conditions forming a skewed binary tree, where perhaps only 1 constant value at the very right changes. That gives a really large expression tree (which #18022 actually does not) and at the same time cache miss.

AndriySvyryd · 2020-06-05T02:04:35Z

> and at the same time cache miss.

I don't think this scenario is common enough. We could pregenerate the large skewed expression tree and use the same one in each iteration.
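
To make the shape being discussed concrete, here is a rough sketch of generating such a skewed (left-deep) chain of AND conditions with `System.Linq.Expressions`. The `SkewedTree` class, the `Build` method, and the choice of `int` inequality comparisons are assumptions made for illustration; nothing like this is part of the PR.

```csharp
using System;
using System.Linq.Expressions;

public static class SkewedTree
{
    // Builds x => (((x != 0) && (x != 1)) && ...) && (x != lastValue).
    // Each AndAlso takes the whole previous chain as its left operand,
    // so the tree is skewed and only the rightmost constant varies.
    // Assumes depth >= 2; with depth 1 the lastValue argument is unused.
    public static Expression<Func<int, bool>> Build(int depth, int lastValue)
    {
        var x = Expression.Parameter(typeof(int), "x");
        Expression body = Expression.NotEqual(x, Expression.Constant(0));

        for (var i = 1; i < depth; i++)
        {
            var value = i == depth - 1 ? lastValue : i;
            body = Expression.AndAlso(body, Expression.NotEqual(x, Expression.Constant(value)));
        }

        return Expression.Lambda<Func<int, bool>>(body, x);
    }
}
```

Calling `Build` once up front and reusing the result would match the "pregenerate and use the same one in each iteration" suggestion, while passing a different `lastValue` per call would produce trees that differ only in their rightmost constant.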

roji · 2020-06-05T15:24:24Z

We can certainly spend a lot of time tweaking and measuring this scenario, and I'll do the work if you guys think it's justified. Note also that the current PR does retain the locking (like @smitpatel originally wanted) - it just does it in a much better way (blocking on the lock instead of spinning). Unless someone strongly feels that the proposed locking mechanism is somehow bad, I don't think there's much value in continuing to investigate and benchmark this.

Let me know.

roji · 2020-06-05T15:32:48Z

One more note - adding an entry to the cache and then compacting takes less than two microseconds. That means it's really quite negligible, and including Compact inside the benchmark should be fine (no need to work extra hard to generate different expression trees etc.).

BenchmarkDotNet=v0.12.0, OS=ubuntu 20.04
Intel Xeon W-2133 CPU 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.6.20266.3
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

|              Method |     Mean |     Error |    StdDev |
|-------------------- |---------:|----------:|----------:|
| MemoryAddAndCompact | 1.756 us | 0.0315 us | 0.0263 us |

Benchmark

public class Program
{
    MemoryCache _cache;

    [GlobalSetup]
    public void Setup()
    {
        _cache = new MemoryCache(new MemoryCacheOptions { SizeLimit = 10240 });
    }

    [Benchmark]
    public void MemoryAddAndCompact()
    {
        _cache.Set("someKey", "somevalue", new MemoryCacheEntryOptions { Size = 10 });
        _cache.Compact(100);
    }

    static void Main(string[] args)
        => BenchmarkRunner.Run<Program>();
}

AndriySvyryd

I think this is an improvement, but it can still be improved further

roji · 2020-06-05T16:45:20Z

@AndriySvyryd @smitpatel can you provide more details on what you'd like to see? Our original discussion was between two options: switching to preventing concurrent compilation of the same query via locks (which is what this PR does), or removing concurrent compilation prevention altogether. What could we be doing better here?

smitpatel · 2020-06-05T16:57:39Z

What I would like to see is,
some perf numbers which proves that we are improving something here, or an article which details both the types of sync mechanism with pros/cons of both to determine if this is really needed change.
I am not against changing this code, but it is a crucial code path in query which has worked for years without any customer issue of any kind, so we should do some decent research before we make the change.
Since @AndriySvyryd approved the PR, I am fine taking his word that this is an improvement and am ok with merging the current change set. I do not have any other pattern to suggest.

AndriySvyryd · 2020-06-05T17:30:44Z

You could decompose lock into Monitor calls to save on some calls and perhaps allocations, but thinking about it I realized that it would introduce significant complexity for minor perf gain in corner cases, so you can stop thinking about this after this PR is in. Fixing #12905 would probably be better

roji · 2020-06-05T17:33:55Z

@AndriySvyryd I agree, the scenario where a query isn't already cached, and is being compiled by another thread, really, really doesn't seem like it's worth optimizing to this level.

roji · 2020-06-05T17:38:34Z

> some perf numbers which proves that we are improving something here, or an article which details both the types of sync mechanism with pros/cons of both to determine if this is really needed change.

@smitpatel the basic thing here is to avoid uncontrolled spin looping, which is a bad idea in almost any scenario. This is less about actual, visible perf (although I believe that's also relevant), and more about safety: if in any way our compilation blocks or takes a very long time (we've had several bugs like this in the history of EF), that means other threads are occupying CPU cores with 100% spin loops. Basically, we should never, ever spin without at least using something like SpinWait, which spins for a while and then switches to waiting. In any case that isn't really relevant here.
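
As a small illustration of the difference described above, the sketch below contrasts an uncontrolled spin loop with one driven by `System.Threading.SpinWait`. The `SpinExamples` class and its method names are made up for this example, and neither method is taken from the EF Core code base.

```csharp
using System;
using System.Threading;

public static class SpinExamples
{
    // Uncontrolled spin: keeps a core at 100% until the condition flips.
    // Shown only as the pattern to avoid.
    public static void BusySpin(Func<bool> isReady)
    {
        while (!isReady())
        {
        }
    }

    // SpinWait-based loop: spins briefly at first, then starts yielding
    // the thread so that other work can run while we wait.
    // Returns how many times SpinOnce was called.
    public static int BoundedSpin(Func<bool> isReady)
    {
        var spinner = new SpinWait();
        while (!isReady())
        {
            spinner.SpinOnce();
        }

        return spinner.Count;
    }
}
```

If whatever the loop is waiting on blocks or takes a long time, `BusySpin` keeps a core fully busy for the whole duration, whereas `BoundedSpin` backs off after its initial spins.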

Rewrite cache synchronization to lock instead of spin

853c286

Closes #18516

roji requested review from smitpatel and AndriySvyryd June 3, 2020 12:06

AndriySvyryd approved these changes Jun 5, 2020

roji merged commit fb28b56 into master Jun 5, 2020

roji deleted the CacheSync branch June 5, 2020 17:34

ajcvickers mentioned this pull request Jun 11, 2020

Entity Framework Weekly Status Updates (2020) #19549

Closed

smitpatel mentioned this pull request Jul 21, 2020

Bad Sql request generated for Select #21666

Closed
