SqlCommand.Cancel() does not work on Linux #109

roji · 2019-05-17T00:56:36Z

SqlCommand.Cancel() doesn't appear to work using System.Data.SqlClient on Linux (Ubuntu 19.04, .NET Core SDK 3.0.100-preview5-011568) - the command runs to completion, and only then throws SqlException. The issue does not seem to repro on Windows (with .NET Core).

Repro:

using (var cmd = new SqlCommand("WAITFOR DELAY '00:00:05'", conn))
{
    // sync cancellation
    Task.Delay(1000).ContinueWith(t => cmd.Cancel());
    var sw = Stopwatch.StartNew();
    try
    {
        cmd.ExecuteNonQuery();
    }
    catch (Exception e)
    {
        Console.WriteLine($"Sync cancellation: caught {e.GetType().Name} after {sw.ElapsedMilliseconds}ms");
        // Does not interrupt the call (completes after 5 seconds) but still generates SqlException at the end
    }

    // async cancellation
    sw.Restart();
    try
    {
        await cmd.ExecuteNonQueryAsync(new CancellationTokenSource(1000).Token);
    }
    catch (Exception e)
    {
        Console.WriteLine($"Async cancellation: caught {e.GetType().Name} after {sw.ElapsedMilliseconds}ms");
        // Interrupts and generates SqlException immediately
    }
}

/cc @divega @karinazhou

The text was updated successfully, but these errors were encountered:

karinazhou · 2019-05-17T17:17:50Z

Thank you for repro steps. We will investigate more on this one.

David-Engel · 2019-05-17T17:40:36Z

Triage: This would be nice to fix for 1.0, but may not rise above other GA tasks at this point.

roji · 2019-05-18T07:08:34Z

FWIW it may be worth confirming whether this somehow stems from a more low-level issue (i.e. some sort of corefx networking issue) as opposed to an SqlClient-issue.

Wraith2 · 2019-05-20T23:45:16Z

It's a managed implementation issue. I've reproduced it on windows using your test script above and have added a new test.

There are lock(this) statements in the managed send and receive methods on SniTCPHandle. So what happens is you start a query which ultimately tracks down into Receive which takes the lock and then from another thread you call Cancel which tracks down into SendAttention and that tries to Send and blocks on the lock. When the receive times out the lock is released the send works, the attention is processed and everything unravels in the correct order just a lot later that you want it to.

I think send and receive aren't mutually exclusive but that each one is not concurrent with itself. Changing the lock to be operation specific consumes a bit of object space but there shouldn't be too many tcp handles around since they're reused. It will need some careful testing because locking in this library is definitely "here be dragons" territory.

roji · 2019-05-21T05:16:54Z

@Wraith2 your explanation makes perfect sense, and I can see how allowing concurrent send/receive is not to be done lightly... :)

saurabh500 · 2019-05-21T05:56:49Z

My 2 c. Send and receive are mutually exclusive. All sends are also mutually exclusive except for the Attention packet.
Good catch @Wraith2
Cancel ATTN packet is the only packet in the protocol which can be sent out of band.

saurabh500 · 2019-05-21T05:57:30Z

This is a bug, and I can understand based on @Wraith2 explanation why this is happening. What is the fix?

Wraith2 · 2019-05-21T07:44:26Z

Based on your assertions i think the fix is to change the Send implementation to require the lock unless the packet is detectably OOB (ATTN is the only one of these i know of)) from the packet header.

saurabh500 · 2019-05-21T07:48:02Z

Uploading a snippet from the TDS protocol to make this clearer.

saurabh500 · 2019-05-21T07:50:08Z

I would say, the client should try to send the attention packet ASAP, by inserting it into the TDS stream, as soon as it finishes writing the current packet.

All sends are also mutually exclusive except for the Attention packet.

To refine this further, all requests are send independently of each other except for ATTN packet which can be interleaved in between other packets for an ongoing request.

Wraith2 · 2019-05-21T07:58:46Z

So i think my proposal is to have two locks and these rules.

all in-band send and receive operations MUST aquire the in-band lock (currently Monitor.Enter(this))
out-of-band communications MAY aquire (would be Monitor.TryEnter(this, out aquired) and detect attn packet header flag)
all send operations MUST aquire the send lock (new _sendLock, lock(_sendLock))

roji · 2019-05-21T08:09:07Z

Just be careful you don't end up in a state where you're writing an out-of-band cancellation, without having acquired the lock, just as a user send starts - with the two interfering with one another. I'm not sure of the cross-OS guarantees with regards to atomicity of a single write (i.e. is a ~~read~~ write guaranteed to complete without interference even if another write is in progress).

FYI in PostgreSQL the protocol obviates this kind of problem by making clients send cancellations on a totally separate TCP connection (although that sometimes creates its own difficulties).

Wraith2 · 2019-05-21T08:15:18Z

Just be careful you don't end up in a state where you're writing an out-of-band cancellation, without having acquired the lock, just as a user send starts - with the two interfering with one another

I don't think it would be possible with my suggested scheme. the attn would not have the in-band lock but must have the send lock, the first in-band would finish and release the in-band lock, another send could then get the in-band lock but would always block on the send lock which can only complete when the attn has been sent.

Wraith2 · 2019-05-21T19:51:15Z

I've tried the idea above. It works as far as I can tell. The full manual test suite runs through without unusual problems (TVPMain really need fixing at some point).

It's a bit annoying fix because of the current packet cloning which I've fixed in the managed networking rework PR. It will also be blocked by the same issue as that PR dotnet/corefx#35363 where an unrelated test that fails anyway will continue to fail so it won't get merged. So once things get unblocked I'll be able to fix this.

roji · 2019-05-22T04:18:23Z

@Wraith2 great to hear. Note @divega's comment here - we do have async cancellation working, it's probably worth understanding how/why. If we have totally different mechanisms for sync/async cancellation (which we probably do as one is buggy and not the other), it may be worth exploring merging them at least to some extent (unless there's a good reason).

It may also be worth working on this issue and on #44 together, as both are cancellation-related.

roji mentioned this issue May 17, 2019

SQL.OperationCancelled Returns SqlException Instead of OperationCancelledException #51

Closed

David-Engel added this to the 1.0.0 milestone May 17, 2019

David-Engel added the 🙌 Up-for-Grabs Issues that are ready to be picked up for anyone interested. Please self-assign and remove the label label May 17, 2019

divega mentioned this issue May 22, 2019

Canceling SQL Server query with while loop hangs forever #44

Open

Wraith2 mentioned this issue Jun 5, 2019

SqlClient Fix managed command cancellation dotnet/corefx#38271

Closed

cheenamalhotra modified the milestones: 1.0.0, 1.1.0 Jul 25, 2019

cheenamalhotra mentioned this issue Oct 8, 2019

Port CoreFx PR 38271: Fix Statement Command Cancellation (Managed SNI) #248

Merged

cheenamalhotra modified the milestones: 1.1.0, 1.2.0 Nov 20, 2019

cheenamalhotra closed this as completed in #248 Jan 8, 2020

cheenamalhotra removed the 🙌 Up-for-Grabs Issues that are ready to be picked up for anyone interested. Please self-assign and remove the label label Jan 8, 2020

fredericDelaporte mentioned this issue Jun 26, 2022

Added support for the CancelQuery() method in IStatelessSession nhibernate/nhibernate-core#3074

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SqlCommand.Cancel() does not work on Linux #109

SqlCommand.Cancel() does not work on Linux #109

roji commented May 17, 2019

karinazhou commented May 17, 2019

David-Engel commented May 17, 2019

roji commented May 18, 2019

Wraith2 commented May 20, 2019

roji commented May 21, 2019

saurabh500 commented May 21, 2019

saurabh500 commented May 21, 2019

Wraith2 commented May 21, 2019

saurabh500 commented May 21, 2019

saurabh500 commented May 21, 2019

Wraith2 commented May 21, 2019 •

edited

Loading

roji commented May 21, 2019 •

edited

Loading

Wraith2 commented May 21, 2019

Wraith2 commented May 21, 2019

roji commented May 22, 2019

SqlCommand.Cancel() does not work on Linux #109

SqlCommand.Cancel() does not work on Linux #109

Comments

roji commented May 17, 2019

karinazhou commented May 17, 2019

David-Engel commented May 17, 2019

roji commented May 18, 2019

Wraith2 commented May 20, 2019

roji commented May 21, 2019

saurabh500 commented May 21, 2019

saurabh500 commented May 21, 2019

Wraith2 commented May 21, 2019

saurabh500 commented May 21, 2019

saurabh500 commented May 21, 2019

Wraith2 commented May 21, 2019 • edited Loading

roji commented May 21, 2019 • edited Loading

Wraith2 commented May 21, 2019

Wraith2 commented May 21, 2019

roji commented May 22, 2019

Wraith2 commented May 21, 2019 •

edited

Loading

roji commented May 21, 2019 •

edited

Loading