await ValueTask performance #11803

hpbieker · 2019-01-13T08:39:17Z

I did some performance testing for async-await on a completed ValueTask and compared it to a Task.CompletedTask. I see that for Task.CompletedTask it does not matter if I bypass await if I check the Task for IsCompletedSuccessfully, but for ValueTask it has significant performance impact (3x). Is there a missing optimization in the async code generated for async-await when using ValueTask?

BenchmarkDotNet=v0.11.3, OS=Windows 7 SP1 (6.1.7601.0)
Intel Core i5-6300U CPU 2.40GHz (Skylake), 1 CPU, 4 logical and 2 physical cores
Frequency=2437558 Hz, Resolution=410.2466 ns, Timer=TSC
.NET Core SDK=2.2.100
  [Host]     : .NET Core 2.2.0 (CoreCLR 4.6.27110.04, CoreFX 4.6.27110.04), 64bit RyuJIT
  Job-OUSAWM : .NET Core 2.2.0 (CoreCLR 4.6.27110.04, CoreFX 4.6.27110.04), 64bit RyuJIT

WarmupCount=1

                                 Method |       Mean |     Error |    StdDev | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
--------------------------------------- |-----------:|----------:|----------:|------------:|------------:|------------:|--------------------:|
                              ValueTask | 1,031.7 ns | 16.990 ns | 15.892 ns |           - |           - |           - |                   - |
     ValueTaskIfIsCompletedSuccessfully |   291.5 ns |  2.743 ns |  2.291 ns |           - |           - |           - |                   - |
 CompletedTaskIfIsCompletedSuccessfully |   277.7 ns |  5.460 ns |  6.288 ns |           - |           - |           - |                   - |
                          CompletedTask |   319.5 ns |  4.782 ns |  3.993 ns |           - |           - |           - |                   - |

        [Benchmark]
        public async Task ValueTask()
        {
            for (int i = 0; i < 100; i++)
                await new ValueTask();
        }

        [Benchmark]
        public async Task ValueTaskIfIsCompletedSuccessfully()
        {
            for (int i = 0; i < 100; i++)
            {
                var valueTask = new ValueTask();
                if (!valueTask.IsCompletedSuccessfully)
                    await valueTask;
            }
        }

        [Benchmark]
        public async Task CompletedTaskIfIsCompletedSuccessfully()
        {
            for (int i = 0; i < 100; i++)
            {
                var task = Task.CompletedTask;
                if (!task.IsCompletedSuccessfully)
                    await task;
            }
        }

        [Benchmark]
        public async Task CompletedTask()
        {
            for (int i = 0; i < 100; i++)
                await Task.CompletedTask;
        }

The text was updated successfully, but these errors were encountered:

MarcoRossignoli · 2019-01-13T11:12:21Z

/cc @stephentoub @benaadams

stephentoub · 2019-01-13T16:50:42Z

The root cause here is likely https://github.com/dotnet/coreclr/issues/18542. But even without that, there are still additional costs to ValueTask as discussed in https://blogs.msdn.microsoft.com/dotnet/2018/11/07/understanding-the-whys-whats-and-whens-of-valuetask/, and the only reason for using the non-generic ValueTask is if you believe you'll be able to avoid asynchronous allocations via IValueTaskSource… otherwise, you're better off just returning Task and using CompletedTask (which is why the non-generic ValueTask was only introduced when the IValueTaskSource mechanism was introduced).

benaadams · 2019-01-14T16:46:56Z

#10540 more heavily affects ConfigureAwait

Tweaking to use the OperationsPerInvoke param i.e.

[Benchmark(OperationsPerInvoke = LoopIterations)]
public async ValueTask ValueTask()
{
    for (int i = 0; i < LoopIterations; i++)

And adding .ConfigureAwait variants; results on 3.0

                                 Method |      Mean |    Median | Ratio |        Op/sec |
--------------------------------------- |----------:|----------:|------:|--------------:|
                              ValueTask |  9.093 ns |  9.094 ns |  3.24 | 109,974,705.8 |
                ValueTaskConfigureAwait | 16.064 ns | 16.060 ns |  5.73 |  62,250,996.0 |
     ValueTaskIfIsCompletedSuccessfully |  2.544 ns |  2.543 ns |  0.91 | 393,081,761.0 |
 CompletedTaskIfIsCompletedSuccessfully |  2.496 ns |  2.493 ns |  0.89 | 400,641,025.6 |
                          CompletedTask |  2.805 ns |  2.802 ns |  1.00 | 356,506,238.9 |
            CompletedTaskConfigureAwait | 16.760 ns | 12.796 ns |  6.48 |  59,665,871.1 |

The issue with the test is ValueTask and ValueTaskIfIsCompletedSuccessfully aren't equivalent; for ValueTask you need to call .GetResult() to reset the IValueTaskSource if there is one.

Changing to

[Benchmark(OperationsPerInvoke = LoopIterations)]
public async ValueTask ValueTaskIfIsCompletedSuccessfully()
{
    for (int i = 0; i < LoopIterations; i++)
    {
        var valueTask = new ValueTask();
        if (!valueTask.IsCompletedSuccessfully)
            await valueTask;
        else
            valueTask.GetAwaiter().GetResult();
    }
}

Then you get

                             Method |      Mean |        Op/sec |
----------------------------------- |----------:|--------------:|
                          ValueTask |  9.175 ns | 108,991,825.6 |
 ValueTaskIfIsCompletedSuccessfully |  9.074 ns | 110,204,981.3 |

stephentoub · 2019-01-14T16:50:07Z

The issue with the test is ValueTask and ValueTaskIfIsCompletedSuccessfully aren't equivalent; for ValueTask you need to call .GetResult() to reset the IValueTaskSource if there is one.

Good catch, yes. While that doesn't matter specifically for default(ValueTask), for general consumption of arbitrary ValueTask instances handed to you, it's required; otherwise various cleanup won't happen, e.g. not returning cached instances to pools.

benaadams · 2019-01-14T17:43:23Z

Aside, optimizing with IsCompletedSuccessfully is more about not creating the outer async statemachine; rather than costs associated with await which aren't that great (as your original benchmark shows)

Though you'd like need to be quite sensitive to the performance impact to make the extra code completely worthwhile:
e.g.

                    Method |      Mean | Ratio |        Op/sec |
-------------------------- |----------:|------:|--------------:|
                 ValueTask | 10.890 ns |  3.56 |  91,827,364.5 |
     ValueTaskSyncFastPath |  9.158 ns |  2.99 | 109,194,147.2 |
             CompletedTask |  3.058 ns |  1.00 | 327,011,118.4 |
 CompletedTaskSyncFastPath |  2.425 ns |  0.79 | 412,371,134.0 |

For

const int LoopIterations = 100;

[MethodImpl(MethodImplOptions.NoInlining)]
static Task OperationAsync() => Task.CompletedTask;

[MethodImpl(MethodImplOptions.NoInlining)]
static ValueTask ValueTaskOperationAsync() => default;

[Benchmark(OperationsPerInvoke = LoopIterations)]
public async ValueTask ValueTask()
{
    for (int i = 0; i < LoopIterations; i++)
    {
        ValueTask task = ValueTaskOperationAsync();
        await task;
    }
}

[Benchmark(OperationsPerInvoke = LoopIterations)]
public ValueTask ValueTaskSyncFastPath()
{
    for (int i = 0; i < LoopIterations; i++)
    {
        var valueTask = ValueTaskOperationAsync();
        if (!valueTask.IsCompletedSuccessfully)
        {
            return ValueTaskAsync(i, valueTask);
        }
        else
        {
            valueTask.GetAwaiter().GetResult();
        }
    }

    return default;

    async ValueTask ValueTaskAsync(int iter, ValueTask valueTask)
    {
        await valueTask;
        iter++;
        for (int i = iter; i < LoopIterations; i++)
        {
            valueTask = ValueTaskOperationAsync();
            await valueTask;
        }
    }
}

[Benchmark(OperationsPerInvoke = LoopIterations, Baseline = true)]
public async Task CompletedTask()
{
    for (int i = 0; i < LoopIterations; i++)
    {
        Task task = OperationAsync();
        await task;
    }
}

[Benchmark(OperationsPerInvoke = LoopIterations)]
public Task CompletedTaskSyncFastPath()
{
    for (int i = 0; i < LoopIterations; i++)
    {
        Task task = OperationAsync();
        if (!task.IsCompletedSuccessfully)
        {
            return CompletedTaskAsync(i, task);
        }
    }

    return Task.CompletedTask;

    async Task CompletedTaskAsync(int iter, Task task)
    {
        await task;
        iter++;
        for (int i = iter; i < LoopIterations; i++)
        {
            task = OperationAsync();
            await task;
        }
    }
}

benaadams · 2019-01-14T18:20:07Z

Saying that the loop amortizes the async effect (as one statemachine is created for 100 awaits); if its just acting on a single Task-like; then it becomes more worthwhile to avoid:

                    Method |      Mean | Ratio |        Op/sec |
-------------------------- |----------:|------:|--------------:|
                 ValueTask | 33.450 ns |  1.41 |  29,895,366.2 |
     ValueTaskSyncFastPath | 14.514 ns |  0.61 |  68,898,994.1 |
             CompletedTask | 23.665 ns |  1.00 |  42,256,496.9 |
 CompletedTaskSyncFastPath |  4.021 ns |  0.17 | 248,694,354.6 |

[MethodImpl(MethodImplOptions.NoInlining)]
static Task OperationAsync() => Task.CompletedTask;

[MethodImpl(MethodImplOptions.NoInlining)]
static ValueTask ValueTaskOperationAsync() => default;

[Benchmark]
public async ValueTask ValueTask()
{
    await ValueTaskOperationAsync();
}

[Benchmark]
public ValueTask ValueTaskSyncFastPath()
{
    var valueTask = ValueTaskOperationAsync();
    if (!valueTask.IsCompletedSuccessfully)
    {
        return ValueTaskAsync(valueTask);
    }
    else
    {
        valueTask.GetAwaiter().GetResult();
    }

    return default;

    async ValueTask ValueTaskAsync(ValueTask vt)
    {
        await vt;
    }
}

[Benchmark(Baseline = true)]
public async Task CompletedTask()
{
    await OperationAsync();
}

[Benchmark]
public Task CompletedTaskSyncFastPath()
{
    Task task = OperationAsync();
    if (!task.IsCompletedSuccessfully)
    {
        return CompletedTaskAsync(task);
    }

    return Task.CompletedTask;

    async Task CompletedTaskAsync(Task t)
    {
        await t;
    }
}

The normal optimization that applies here is pass-though rather than await e.g.

Task DoAsync()
{
    return OperationAsync();
}

rather than

async Task DoAsync()
{
    await OperationAsync();
}

Though you have to await if there is exception handling, or using etc as you want that to run after the Task is complete, not at the point it goes async but before it has completed.

benaadams · 2019-01-15T00:44:49Z

Note: related https://github.com/dotnet/coreclr/issues/21973

stephentoub · 2020-02-28T22:00:14Z

We will continue to look for ways to reduce the overheads, but I don't think there's anything actionable here anymore, so closing. Thanks.

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 26, 2020

stephentoub closed this as completed Feb 28, 2020

ghost locked as resolved and limited conversation to collaborators Dec 14, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

await ValueTask performance #11803

await ValueTask performance #11803

hpbieker commented Jan 13, 2019

MarcoRossignoli commented Jan 13, 2019 •

edited

Loading

stephentoub commented Jan 13, 2019 •

edited

Loading

benaadams commented Jan 14, 2019 •

edited

Loading

stephentoub commented Jan 14, 2019

benaadams commented Jan 14, 2019

benaadams commented Jan 14, 2019

benaadams commented Jan 15, 2019

stephentoub commented Feb 28, 2020

await ValueTask performance #11803

await ValueTask performance #11803

Comments

hpbieker commented Jan 13, 2019

MarcoRossignoli commented Jan 13, 2019 • edited Loading

stephentoub commented Jan 13, 2019 • edited Loading

benaadams commented Jan 14, 2019 • edited Loading

stephentoub commented Jan 14, 2019

benaadams commented Jan 14, 2019

benaadams commented Jan 14, 2019

benaadams commented Jan 15, 2019

stephentoub commented Feb 28, 2020

MarcoRossignoli commented Jan 13, 2019 •

edited

Loading

stephentoub commented Jan 13, 2019 •

edited

Loading

benaadams commented Jan 14, 2019 •

edited

Loading