Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

await ValueTask performance #11803

Closed
hpbieker opened this issue Jan 13, 2019 · 8 comments
Closed

await ValueTask performance #11803

hpbieker opened this issue Jan 13, 2019 · 8 comments
Labels
area-System.Threading untriaged New issue has not been triaged by the area owner
Milestone

Comments

@hpbieker
Copy link

I did some performance testing for async-await on a completed ValueTask and compared it to a Task.CompletedTask. I see that for Task.CompletedTask it does not matter if I bypass await if I check the Task for IsCompletedSuccessfully, but for ValueTask it has significant performance impact (3x). Is there a missing optimization in the async code generated for async-await when using ValueTask?

BenchmarkDotNet=v0.11.3, OS=Windows 7 SP1 (6.1.7601.0)
Intel Core i5-6300U CPU 2.40GHz (Skylake), 1 CPU, 4 logical and 2 physical cores
Frequency=2437558 Hz, Resolution=410.2466 ns, Timer=TSC
.NET Core SDK=2.2.100
  [Host]     : .NET Core 2.2.0 (CoreCLR 4.6.27110.04, CoreFX 4.6.27110.04), 64bit RyuJIT
  Job-OUSAWM : .NET Core 2.2.0 (CoreCLR 4.6.27110.04, CoreFX 4.6.27110.04), 64bit RyuJIT

WarmupCount=1

                                 Method |       Mean |     Error |    StdDev | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
--------------------------------------- |-----------:|----------:|----------:|------------:|------------:|------------:|--------------------:|
                              ValueTask | 1,031.7 ns | 16.990 ns | 15.892 ns |           - |           - |           - |                   - |
     ValueTaskIfIsCompletedSuccessfully |   291.5 ns |  2.743 ns |  2.291 ns |           - |           - |           - |                   - |
 CompletedTaskIfIsCompletedSuccessfully |   277.7 ns |  5.460 ns |  6.288 ns |           - |           - |           - |                   - |
                          CompletedTask |   319.5 ns |  4.782 ns |  3.993 ns |           - |           - |           - |                   - |
        [Benchmark]
        public async Task ValueTask()
        {
            for (int i = 0; i < 100; i++)
                await new ValueTask();
        }

        [Benchmark]
        public async Task ValueTaskIfIsCompletedSuccessfully()
        {
            for (int i = 0; i < 100; i++)
            {
                var valueTask = new ValueTask();
                if (!valueTask.IsCompletedSuccessfully)
                    await valueTask;
            }
        }

        [Benchmark]
        public async Task CompletedTaskIfIsCompletedSuccessfully()
        {
            for (int i = 0; i < 100; i++)
            {
                var task = Task.CompletedTask;
                if (!task.IsCompletedSuccessfully)
                    await task;
            }
        }

        [Benchmark]
        public async Task CompletedTask()
        {
            for (int i = 0; i < 100; i++)
                await Task.CompletedTask;
        }
@MarcoRossignoli
Copy link
Member

MarcoRossignoli commented Jan 13, 2019

/cc @stephentoub @benaadams

@stephentoub
Copy link
Member

stephentoub commented Jan 13, 2019

The root cause here is likely https://github.com/dotnet/coreclr/issues/18542. But even without that, there are still additional costs to ValueTask as discussed in https://blogs.msdn.microsoft.com/dotnet/2018/11/07/understanding-the-whys-whats-and-whens-of-valuetask/, and the only reason for using the non-generic ValueTask is if you believe you'll be able to avoid asynchronous allocations via IValueTaskSource… otherwise, you're better off just returning Task and using CompletedTask (which is why the non-generic ValueTask was only introduced when the IValueTaskSource mechanism was introduced).

@benaadams
Copy link
Member

benaadams commented Jan 14, 2019

#10540 more heavily affects ConfigureAwait

Tweaking to use the OperationsPerInvoke param i.e.

[Benchmark(OperationsPerInvoke = LoopIterations)]
public async ValueTask ValueTask()
{
    for (int i = 0; i < LoopIterations; i++)

And adding .ConfigureAwait variants; results on 3.0

                                 Method |      Mean |    Median | Ratio |        Op/sec |
--------------------------------------- |----------:|----------:|------:|--------------:|
                              ValueTask |  9.093 ns |  9.094 ns |  3.24 | 109,974,705.8 |
                ValueTaskConfigureAwait | 16.064 ns | 16.060 ns |  5.73 |  62,250,996.0 |
     ValueTaskIfIsCompletedSuccessfully |  2.544 ns |  2.543 ns |  0.91 | 393,081,761.0 |
 CompletedTaskIfIsCompletedSuccessfully |  2.496 ns |  2.493 ns |  0.89 | 400,641,025.6 |
                          CompletedTask |  2.805 ns |  2.802 ns |  1.00 | 356,506,238.9 |
            CompletedTaskConfigureAwait | 16.760 ns | 12.796 ns |  6.48 |  59,665,871.1 |

The issue with the test is ValueTask and ValueTaskIfIsCompletedSuccessfully aren't equivalent; for ValueTask you need to call .GetResult() to reset the IValueTaskSource if there is one.

Changing to

[Benchmark(OperationsPerInvoke = LoopIterations)]
public async ValueTask ValueTaskIfIsCompletedSuccessfully()
{
    for (int i = 0; i < LoopIterations; i++)
    {
        var valueTask = new ValueTask();
        if (!valueTask.IsCompletedSuccessfully)
            await valueTask;
        else
            valueTask.GetAwaiter().GetResult();
    }
}

Then you get

                             Method |      Mean |        Op/sec |
----------------------------------- |----------:|--------------:|
                          ValueTask |  9.175 ns | 108,991,825.6 |
 ValueTaskIfIsCompletedSuccessfully |  9.074 ns | 110,204,981.3 |

@stephentoub
Copy link
Member

The issue with the test is ValueTask and ValueTaskIfIsCompletedSuccessfully aren't equivalent; for ValueTask you need to call .GetResult() to reset the IValueTaskSource if there is one.

Good catch, yes. While that doesn't matter specifically for default(ValueTask), for general consumption of arbitrary ValueTask instances handed to you, it's required; otherwise various cleanup won't happen, e.g. not returning cached instances to pools.

@benaadams
Copy link
Member

Aside, optimizing with IsCompletedSuccessfully is more about not creating the outer async statemachine; rather than costs associated with await which aren't that great (as your original benchmark shows)

Though you'd like need to be quite sensitive to the performance impact to make the extra code completely worthwhile:
e.g.

                    Method |      Mean | Ratio |        Op/sec |
-------------------------- |----------:|------:|--------------:|
                 ValueTask | 10.890 ns |  3.56 |  91,827,364.5 |
     ValueTaskSyncFastPath |  9.158 ns |  2.99 | 109,194,147.2 |
             CompletedTask |  3.058 ns |  1.00 | 327,011,118.4 |
 CompletedTaskSyncFastPath |  2.425 ns |  0.79 | 412,371,134.0 |

For

const int LoopIterations = 100;

[MethodImpl(MethodImplOptions.NoInlining)]
static Task OperationAsync() => Task.CompletedTask;

[MethodImpl(MethodImplOptions.NoInlining)]
static ValueTask ValueTaskOperationAsync() => default;

[Benchmark(OperationsPerInvoke = LoopIterations)]
public async ValueTask ValueTask()
{
    for (int i = 0; i < LoopIterations; i++)
    {
        ValueTask task = ValueTaskOperationAsync();
        await task;
    }
}

[Benchmark(OperationsPerInvoke = LoopIterations)]
public ValueTask ValueTaskSyncFastPath()
{
    for (int i = 0; i < LoopIterations; i++)
    {
        var valueTask = ValueTaskOperationAsync();
        if (!valueTask.IsCompletedSuccessfully)
        {
            return ValueTaskAsync(i, valueTask);
        }
        else
        {
            valueTask.GetAwaiter().GetResult();
        }
    }

    return default;

    async ValueTask ValueTaskAsync(int iter, ValueTask valueTask)
    {
        await valueTask;
        iter++;
        for (int i = iter; i < LoopIterations; i++)
        {
            valueTask = ValueTaskOperationAsync();
            await valueTask;
        }
    }
}

[Benchmark(OperationsPerInvoke = LoopIterations, Baseline = true)]
public async Task CompletedTask()
{
    for (int i = 0; i < LoopIterations; i++)
    {
        Task task = OperationAsync();
        await task;
    }
}

[Benchmark(OperationsPerInvoke = LoopIterations)]
public Task CompletedTaskSyncFastPath()
{
    for (int i = 0; i < LoopIterations; i++)
    {
        Task task = OperationAsync();
        if (!task.IsCompletedSuccessfully)
        {
            return CompletedTaskAsync(i, task);
        }
    }

    return Task.CompletedTask;

    async Task CompletedTaskAsync(int iter, Task task)
    {
        await task;
        iter++;
        for (int i = iter; i < LoopIterations; i++)
        {
            task = OperationAsync();
            await task;
        }
    }
}

@benaadams
Copy link
Member

Saying that the loop amortizes the async effect (as one statemachine is created for 100 awaits); if its just acting on a single Task-like; then it becomes more worthwhile to avoid:

                    Method |      Mean | Ratio |        Op/sec |
-------------------------- |----------:|------:|--------------:|
                 ValueTask | 33.450 ns |  1.41 |  29,895,366.2 |
     ValueTaskSyncFastPath | 14.514 ns |  0.61 |  68,898,994.1 |
             CompletedTask | 23.665 ns |  1.00 |  42,256,496.9 |
 CompletedTaskSyncFastPath |  4.021 ns |  0.17 | 248,694,354.6 |
[MethodImpl(MethodImplOptions.NoInlining)]
static Task OperationAsync() => Task.CompletedTask;

[MethodImpl(MethodImplOptions.NoInlining)]
static ValueTask ValueTaskOperationAsync() => default;

[Benchmark]
public async ValueTask ValueTask()
{
    await ValueTaskOperationAsync();
}

[Benchmark]
public ValueTask ValueTaskSyncFastPath()
{
    var valueTask = ValueTaskOperationAsync();
    if (!valueTask.IsCompletedSuccessfully)
    {
        return ValueTaskAsync(valueTask);
    }
    else
    {
        valueTask.GetAwaiter().GetResult();
    }

    return default;

    async ValueTask ValueTaskAsync(ValueTask vt)
    {
        await vt;
    }
}

[Benchmark(Baseline = true)]
public async Task CompletedTask()
{
    await OperationAsync();
}

[Benchmark]
public Task CompletedTaskSyncFastPath()
{
    Task task = OperationAsync();
    if (!task.IsCompletedSuccessfully)
    {
        return CompletedTaskAsync(task);
    }

    return Task.CompletedTask;

    async Task CompletedTaskAsync(Task t)
    {
        await t;
    }
}

The normal optimization that applies here is pass-though rather than await e.g.

Task DoAsync()
{
    return OperationAsync();
}

rather than

async Task DoAsync()
{
    await OperationAsync();
}

Though you have to await if there is exception handling, or using etc as you want that to run after the Task is complete, not at the point it goes async but before it has completed.

@benaadams
Copy link
Member

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 26, 2020
@stephentoub
Copy link
Member

We will continue to look for ways to reduce the overheads, but I don't think there's anything actionable here anymore, so closing. Thanks.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 14, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Threading untriaged New issue has not been triaged by the area owner
Projects
None yet
Development

No branches or pull requests

6 participants