Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SqlCommand.Cancel() does not work on Linux #109

Closed
roji opened this issue May 17, 2019 · 15 comments · Fixed by #248
Closed

SqlCommand.Cancel() does not work on Linux #109

roji opened this issue May 17, 2019 · 15 comments · Fixed by #248
Milestone

Comments

@roji
Copy link
Member

roji commented May 17, 2019

SqlCommand.Cancel() doesn't appear to work using System.Data.SqlClient on Linux (Ubuntu 19.04, .NET Core SDK 3.0.100-preview5-011568) - the command runs to completion, and only then throws SqlException. The issue does not seem to repro on Windows (with .NET Core).

Repro:

using (var cmd = new SqlCommand("WAITFOR DELAY '00:00:05'", conn))
{
    // sync cancellation
    Task.Delay(1000).ContinueWith(t => cmd.Cancel());
    var sw = Stopwatch.StartNew();
    try
    {
        cmd.ExecuteNonQuery();
    }
    catch (Exception e)
    {
        Console.WriteLine($"Sync cancellation: caught {e.GetType().Name} after {sw.ElapsedMilliseconds}ms");
        // Does not interrupt the call (completes after 5 seconds) but still generates SqlException at the end
    }

    // async cancellation
    sw.Restart();
    try
    {
        await cmd.ExecuteNonQueryAsync(new CancellationTokenSource(1000).Token);
    }
    catch (Exception e)
    {
        Console.WriteLine($"Async cancellation: caught {e.GetType().Name} after {sw.ElapsedMilliseconds}ms");
        // Interrupts and generates SqlException immediately
    }
}

/cc @divega @karinazhou

@karinazhou
Copy link
Member

Thank you for repro steps. We will investigate more on this one.

@David-Engel David-Engel added this to the 1.0.0 milestone May 17, 2019
@David-Engel
Copy link
Contributor

Triage: This would be nice to fix for 1.0, but may not rise above other GA tasks at this point.

@David-Engel David-Engel added the 🙌 Up-for-Grabs Issues that are ready to be picked up for anyone interested. Please self-assign and remove the label label May 17, 2019
@roji
Copy link
Member Author

roji commented May 18, 2019

FWIW it may be worth confirming whether this somehow stems from a more low-level issue (i.e. some sort of corefx networking issue) as opposed to an SqlClient-issue.

@Wraith2
Copy link
Contributor

Wraith2 commented May 20, 2019

It's a managed implementation issue. I've reproduced it on windows using your test script above and have added a new test.

There are lock(this) statements in the managed send and receive methods on SniTCPHandle. So what happens is you start a query which ultimately tracks down into Receive which takes the lock and then from another thread you call Cancel which tracks down into SendAttention and that tries to Send and blocks on the lock. When the receive times out the lock is released the send works, the attention is processed and everything unravels in the correct order just a lot later that you want it to.

I think send and receive aren't mutually exclusive but that each one is not concurrent with itself. Changing the lock to be operation specific consumes a bit of object space but there shouldn't be too many tcp handles around since they're reused. It will need some careful testing because locking in this library is definitely "here be dragons" territory.

@roji
Copy link
Member Author

roji commented May 21, 2019

@Wraith2 your explanation makes perfect sense, and I can see how allowing concurrent send/receive is not to be done lightly... :)

@saurabh500
Copy link
Contributor

My 2 c. Send and receive are mutually exclusive. All sends are also mutually exclusive except for the Attention packet.
Good catch @Wraith2
Cancel ATTN packet is the only packet in the protocol which can be sent out of band.

@saurabh500
Copy link
Contributor

This is a bug, and I can understand based on @Wraith2 explanation why this is happening. What is the fix?

@Wraith2
Copy link
Contributor

Wraith2 commented May 21, 2019

Based on your assertions i think the fix is to change the Send implementation to require the lock unless the packet is detectably OOB (ATTN is the only one of these i know of)) from the packet header.

@saurabh500
Copy link
Contributor

image

Uploading a snippet from the TDS protocol to make this clearer.

@saurabh500
Copy link
Contributor

I would say, the client should try to send the attention packet ASAP, by inserting it into the TDS stream, as soon as it finishes writing the current packet.

All sends are also mutually exclusive except for the Attention packet.

To refine this further, all requests are send independently of each other except for ATTN packet which can be interleaved in between other packets for an ongoing request.

@Wraith2
Copy link
Contributor

Wraith2 commented May 21, 2019

So i think my proposal is to have two locks and these rules.

  1. all in-band send and receive operations MUST aquire the in-band lock (currently Monitor.Enter(this))
  2. out-of-band communications MAY aquire (would be Monitor.TryEnter(this, out aquired) and detect attn packet header flag)
  3. all send operations MUST aquire the send lock (new _sendLock, lock(_sendLock))

@roji
Copy link
Member Author

roji commented May 21, 2019

Just be careful you don't end up in a state where you're writing an out-of-band cancellation, without having acquired the lock, just as a user send starts - with the two interfering with one another. I'm not sure of the cross-OS guarantees with regards to atomicity of a single write (i.e. is a read write guaranteed to complete without interference even if another write is in progress).

FYI in PostgreSQL the protocol obviates this kind of problem by making clients send cancellations on a totally separate TCP connection (although that sometimes creates its own difficulties).

@Wraith2
Copy link
Contributor

Wraith2 commented May 21, 2019

Just be careful you don't end up in a state where you're writing an out-of-band cancellation, without having acquired the lock, just as a user send starts - with the two interfering with one another

I don't think it would be possible with my suggested scheme. the attn would not have the in-band lock but must have the send lock, the first in-band would finish and release the in-band lock, another send could then get the in-band lock but would always block on the send lock which can only complete when the attn has been sent.

@Wraith2
Copy link
Contributor

Wraith2 commented May 21, 2019

I've tried the idea above. It works as far as I can tell. The full manual test suite runs through without unusual problems (TVPMain really need fixing at some point).

It's a bit annoying fix because of the current packet cloning which I've fixed in the managed networking rework PR. It will also be blocked by the same issue as that PR dotnet/corefx#35363 where an unrelated test that fails anyway will continue to fail so it won't get merged. So once things get unblocked I'll be able to fix this.

@roji
Copy link
Member Author

roji commented May 22, 2019

@Wraith2 great to hear. Note @divega's comment here - we do have async cancellation working, it's probably worth understanding how/why. If we have totally different mechanisms for sync/async cancellation (which we probably do as one is buggy and not the other), it may be worth exploring merging them at least to some extent (unless there's a good reason).

It may also be worth working on this issue and on #44 together, as both are cancellation-related.

@cheenamalhotra cheenamalhotra removed the 🙌 Up-for-Grabs Issues that are ready to be picked up for anyone interested. Please self-assign and remove the label label Jan 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants