Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

5.1.2 + System.Transactions with volatile resource manager gives "The operation is not valid for the current state of the enlistment" exceptions under concurrent load #2262

Closed
coleman-c opened this issue Dec 7, 2023 · 6 comments

Comments

@coleman-c
Copy link

coleman-c commented Dec 7, 2023

Describe the bug

After upgrading to EFCore 8 from EFCore 6 we started seeing "System.InvalidOperationException: The operation is not valid for the current state of the enlistment." errors in our logfiles.

A bit of research has isolated the issue to the increased version of the SqlClient transitive dependency, specifically the update from 5.0.1 to 5.0.2. The issue is present on the latest 5.1.2 version also. Included below is a sample application that reproduces the issue.

In low volume testing the issue didn't appear, but under moderate load appeared reliably.

The full exception we are seeing is below:

System.InvalidOperationException: The operation is not valid for the current state of the enlistment.
   at System.Transactions.EnlistmentState.Committed(InternalEnlistment enlistment)
   at System.Transactions.SinglePhaseEnlistment.Committed()
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.SinglePhaseCommit(SinglePhaseEnlistment enlistment)
   at System.Transactions.DurableEnlistmentCommitting.EnterState(InternalEnlistment enlistment)
   at System.Transactions.PreparingEnlistment.Prepared()
   at MyEnlistmentNotification.Prepare(PreparingEnlistment preparingEnlistment) in E:\Src\_scratch\ConsoleApp3\Program.cs:line 36
   at System.Transactions.VolatileEnlistmentPreparing.EnterState(InternalEnlistment enlistment)
   at System.Transactions.TransactionStateVolatilePhase1.EnterState(InternalTransaction tx)
   at System.Transactions.CommittableTransaction.Commit()
   at System.Transactions.TransactionScope.InternalDispose()
   at System.Transactions.TransactionScope.Dispose()
   at Program.<>c.<<<Main>$>b__0_0>d.MoveNext() in E:\Src\_scratch\ConsoleApp3\Program.cs:line 18
--- End of stack trace from previous location ---
   at Program.<Main>$(String[] args) in E:\Src\_scratch\ConsoleApp3\Program.cs:line 24
   at Program.<Main>(String[] args)

To reproduce

A complete console app is shown below that reproduces the issue reliably. Swapping between the two referenced versions of Microsoft.Data.SqlClient should demonstrate it working or failing in 5.0.1 and 5.0.2 repectively.

using System.Transactions;
using Microsoft.Data.SqlClient;

List<Task> tasks = [];
for (int i = 0; i < 1000; ++i)
{
    var task = Task.Run(async () =>
    {
        using (var tx = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
        {
            SqlConnection conn = new("...");
            await conn.OpenAsync();
            conn.Close();

            Transaction.Current.EnlistVolatile(new MyEnlistmentNotification(), EnlistmentOptions.None);

            tx.Complete();
        }
    });

    tasks.Add(task);
}
await Task.WhenAll(tasks);

internal class MyEnlistmentNotification : IEnlistmentNotification
{
    public void Commit(Enlistment enlistment)
    {
        Thread.Sleep(500);
        enlistment.Done();
    }

    public void InDoubt(Enlistment enlistment) => enlistment.Done();
    public void Prepare(PreparingEnlistment preparingEnlistment) => preparingEnlistment.Prepared();
    public void Rollback(Enlistment enlistment) => enlistment.Done();
}
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
  </PropertyGroup>
  <ItemGroup>
    <!--Broke-->
    <PackageReference Include="Microsoft.Data.SqlClient" Version="5.0.2" />
    <!--Works-->
    <!--<PackageReference Include="Microsoft.Data.SqlClient" Version="5.0.1" />-->
  </ItemGroup>
</Project>

Expected behavior

The same behaviour in 5.0.2 and greater as in 5.0.1.

Further technical details

Microsoft.Data.SqlClient version: 5.0.2 / 5.1.2
.NET target: net8
SQL Server version: SQL Server 2016/2022
Operating system: net8 debian docker image & Windows 11

Additional context
Currently this prevents us migrating to any EFCore versions > 7.0.5 since they all require a SqlClient dependency > 5.0.1

@ErikEJ
Copy link
Contributor

ErikEJ commented Dec 7, 2023

Have you tested 5.1.2 ?

@coleman-c
Copy link
Author

Yes, sorry - I should have included that information. I've tested the latest version also, 5.1.2. I'll edit and add that to the initial post for clarity.

@coleman-c coleman-c changed the title 5.0.2 + System.Transactions with volatile resource manager gives "The operation is not valid for the current state of the enlistment" exceptions under concurrent load 5.1.2 + System.Transactions with volatile resource manager gives "The operation is not valid for the current state of the enlistment" exceptions under concurrent load Dec 7, 2023
@JRahnama JRahnama added the 🆕 Triage Needed For new issues, not triaged yet. label Dec 9, 2023
@kf-gonzalez kf-gonzalez removed the 🆕 Triage Needed For new issues, not triaged yet. label Dec 12, 2023
@DavoudEshtehari
Copy link
Contributor

Sharing the result from my runs:

  • 5.2.0-preview4.23342.2
    ---> System.Transactions.TransactionAbortedException: The transaction has aborted. ---> System.InvalidOperationException: The requested operation cannot be completed because the connection has been broken. --- End of inner exception stack trace --- at System.Transactions.TransactionStateAborted.EndCommit(InternalTransaction tx) at System.Transactions.CommittableTransaction.Commit() at System.Transactions.TransactionScope.InternalDispose() at System.Transactions.TransactionScope.Dispose() at Test.TestTAE.<>c.<<GH2262>b__4_0>d.MoveNext() in D:\Repos\sqlclients\SqlClient-20232\TransactionAbortedEx\Program.cs:line 313 --- End of stack trace from previous location --- at Test.TestTAE.GH2262() in D:\Repos\sqlclients\SqlClient-20232\TransactionAbortedEx\Program.cs:line 318 --- End of inner exception stack trace --- at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions) at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken) at System.Threading.Tasks.Task.Wait() at Test.TestTAE.Main(String[] args) in D:\Repos\sqlclients\SqlClient-20232\TransactionAbortedEx\Program.cs:line 338 at Test.TestTAE.<Main>(String[] args)

  • 5.1.2 & 5.1.1 & 5.0.2
    ---> System.InvalidOperationException: The operation is not valid for the current state of the enlistment. at System.Transactions.EnlistmentState.Committed(InternalEnlistment enlistment) at System.Transactions.SinglePhaseEnlistment.Committed() at Microsoft.Data.SqlClient.SqlDelegatedTransaction.SinglePhaseCommit(SinglePhaseEnlistment enlistment) at System.Transactions.DurableEnlistmentCommitting.EnterState(InternalEnlistment enlistment) at System.Transactions.PreparingEnlistment.Prepared() at Test.TestTAE.MyEnlistmentNotification.Prepare(PreparingEnlistment preparingEnlistment)

  • 5.0.1
    ---> System.Transactions.TransactionException: The operation is not valid for the state of the transaction. ---> System.TimeoutException: Transaction Timeout --- End of inner exception stack trace --- at System.Transactions.TransactionState.EnlistPromotableSinglePhase(InternalTransaction tx, IPromotableSinglePhaseNotification promotableSinglePhaseNotification, Transaction atomicTransaction, Guid promoterType) at System.Transactions.Transaction.EnlistPromotableSinglePhase(IPromotableSinglePhaseNotification promotableSinglePhaseNotification, Guid promoterType) at System.Transactions.Transaction.EnlistPromotableSinglePhase(IPromotableSinglePhaseNotification promotableSinglePhaseNotification) at Microsoft.Data.SqlClient.SqlInternalConnection.EnlistNonNull(Transaction tx) at Microsoft.Data.ProviderBase.DbConnectionPool.PrepareConnection(DbConnection owningObject, DbConnectionInternal obj, Transaction transaction) at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection) at Microsoft.Data.ProviderBase.DbConnectionPool.WaitForPendingOpen()

@ajcvickers
Copy link

/cc @roji

@vonzshik
Copy link

vonzshik commented Dec 15, 2023

Just a standard race condition in SqlClient, nothing new here. Short version, SqlClient manages to mix two different transactions together, which results in it dooming the connection.

First, SqlClient tries to cleanup connection on transaction commit, which happens with this stacktrace:

image

Now, there is also another cleanup, though this time it's transaction cleanup, which is called when transaction is committed through a callback:

image

image

Now, here's the problem. The first cleanup returns the connection from the transaction pool to the default pool, allowing other threads to start using that connection. But, due to callback never being removed, the second cleanup detects another transaction and ultimately dooms that connection, which leads to various errors.
Ultimately, that callback should have been removed the moment the transaction completes, maybe in DetatchTransaction:

internal void DetachTransaction(Transaction transaction, bool isExplicitlyReleasing)

@coleman-c
Copy link
Author

Looks like this was fixed with the fix in the following PR: #2301

Thanks @DavoudEshtehari!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

No branches or pull requests

7 participants