Fix Cancel #234

Wraith2 · 2019-09-29T17:39:23Z

Fixes #44

Makes the parser lock in the internal Cancel method (called once from SqlCommand or SqlDataReader whichever is active) optional in the case of Cancel. This means that the send attention needed to cancel the running query is not blocked until the query completes. There's a new test to verify the behaviour derived from the bug report.

cheenamalhotra · 2019-09-30T17:19:11Z

Thanks @Wraith2 for coming up with the PR. I do see that in NetFx code (which works fine), the implementation for Cancel() is based on "objectId" instead of "caller" as in NetCore, which looks like a major difference to me. Hence the condition changes as under:

NetFx implements condition in Cancel as under:

SqlClient/src/Microsoft.Data.SqlClient/netfx/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs

Line 642 in 29f639a

if ((!_cancelled) && (objectID == _allowObjectID) && (objectID != -1))

while in NetCore it is done as:

SqlClient/src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs

Line 607 in 29f639a

if ((!_cancelled) && (_cancellationOwner.Target == caller))

Would you please verify the behavior and attempt to match NetFx if it's possible, we do want to avoid any unknown implications of making patches.

Wraith2 · 2019-09-30T19:08:54Z

The ObjectID is just a number assigned to the command object when it's created, it's monatonically incremented in the ctor. It's used for tracing with the Bid object that was removed from the corefx version. If you trace the calls to TdsStartParser.Dispose they both go through command. It also seems to be used for identifying self in some cases to make sure the thing that's being operated on Isn't another connection.

So as far as I can see it's just a different way of expressing the question of whether the target is the one that is expected.

Now, which one is better I couldn't tell you. The provenance of both versions of this code are opaque to me. I can see the code but the reasons behind any of the decisions that formed it aren't available to me. If you know a particular approach is better and should be followed then you'll need to instruct me. If you need reasons behind anything other than changes I've made you'll need to track down the original authors or version history notes, I can't access any of that information even though it would be incredibly helpful to me.

David-Engel · 2019-09-30T23:26:03Z

Interesting. The new test is failing against netfx...

cheenamalhotra · 2019-09-30T23:37:29Z

Okkk.. It's actually causing the build to hang. 🤔
I didn't pay attention to builds today.

Looking at #44, the user reported it works fine in SQL Server Management Studio which is System.Data.SqlClient (NetFx). That gave me an impression, this was NetCore specific.

But now it fails with Microsoft.Data.SqlClient (NetFx), so something has happened in between 🤔

Let me verify this test scenario with S.D.S, M.D.S both NetFx and NetCore, and I'll get back on this. May have to drill down history to find change that caused this.

Wraith2 · 2019-09-30T23:57:47Z

We could try using the tests added in dotnet/corefx#38271 which use delay and timers instead of an infinite loop, that'd prevent the build hang.

I'm not sure what's different between corefx and this repo that means we see a different cancel behaviour, if this issue was in corefx my other PR shouldn't fix the issue since this one is in fully shared code and the other is in a specific SNI implementation.

The report of it working in management studio is also confusing unless management studio uses another connection (or mars) to cancel it.

Wraith2 · 2019-10-17T10:51:32Z

I've updated this PR to use the test I mentioned. Those tests use DELAY so they won't hang the test runs.

cheenamalhotra · 2019-10-17T20:46:36Z

The changes fail on Unix, please check logs below:

Ubuntu-SQL.zip
Ubuntu-Azure.zip

Wraith2 · 2019-10-17T21:32:08Z

Yes, they will do because of #248 which we've already discussed.

There are also tests failing because they're just wrong. For example PlainCancelAsync is failing because it's expecting a specific exception and message which I haven't changed. So was it failing before? I can't tell because I don't have a way to run the unix tests without hardcoding managed mode because I can't run tests in debug mode.

It really feels like you're trying to make even simple things much more difficult than they need to be and I don't understand why.

cheenamalhotra · 2019-10-17T22:14:29Z

So can we instead merge both PRs at one place and then attempt to fix issue? It sounds like both PRs are dependent on each other.

Some tests do fail randomly so little bit of noise if they're not related you can avoid, but I'm only looking into new test failures, that fail consistently even when testing locally.

cheenamalhotra · 2019-10-17T22:18:38Z

The 4 tests failing here as in logs are New failures coming from this PR:

Samirat · 2019-10-22T19:29:05Z

src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs

@@ -598,41 +598,41 @@ internal void Cancel(object caller)
                // Keep looping until we either grabbed the lock (and therefore sent attention) or the connection closes\breaks
                while ((!hasLock) && (_parser.State != TdsParserState.Closed) && (_parser.State != TdsParserState.Broken))


If we don't get the lock, won't this loop / send attention multiple times? Is that intended?

Samirat · 2019-10-22T19:36:49Z

src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs

+                    // try to take the lock so that if another command is attempted it will queue on the lock
+                    // but don't require that the lock be taken because otherwise attention cannot be sent
+                    // during command execution causing cancellation to wait
+                    // This lock is also protecting against concurrent close and async continuations
                    Monitor.TryEnter(this, _waitForCancellationLockPollTimeout, ref hasLock);


What's the point of taking a lock at all here? If we actually need to prevent new commands being started while sending attention, then we can't allow this lock to be optional, and if we don't, then why have it at all.

Samirat · 2019-10-22T19:38:11Z

src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs

                                {
-                                    try
+                                    _parser.Connection._parserLock.Wait(canReleaseFromAnyThread: false, timeout: _waitForCancellationLockPollTimeout, lockTaken: ref hasParserLock);


Feels like there should be an async version of this function. If the locks were async, wouldn't that also fix the cancellation hang?

add cancel running query test and remove state lock in cancel method

e529adc

Wraith2 changed the title ~~add cancel running query test and remove state lock in cancel method~~ Fix Cancel Sep 29, 2019

Wraith2 mentioned this pull request Oct 8, 2019

Port CoreFx PR 38271: Fix Statement Command Cancellation (Managed SNI) #248

Merged

Wraith2 added 2 commits October 17, 2019 11:38

use delay wait in cancel tests to avoid hanging test runner

27cf2e5

remove commented out test

8016662

Samirat reviewed Oct 22, 2019

View reviewed changes

Wraith2 closed this Oct 24, 2019

Wraith2 deleted the bug-44 branch June 29, 2021 22:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Cancel #234

Fix Cancel #234

Wraith2 commented Sep 29, 2019

cheenamalhotra commented Sep 30, 2019

Wraith2 commented Sep 30, 2019

David-Engel commented Sep 30, 2019

cheenamalhotra commented Sep 30, 2019 •

edited

Loading

Wraith2 commented Sep 30, 2019

Wraith2 commented Oct 17, 2019

cheenamalhotra commented Oct 17, 2019

Wraith2 commented Oct 17, 2019

cheenamalhotra commented Oct 17, 2019

cheenamalhotra commented Oct 17, 2019

Samirat Oct 22, 2019 •

edited

Loading

Samirat Oct 22, 2019

Samirat Oct 22, 2019 •

edited

Loading

		@@ -598,41 +598,41 @@ internal void Cancel(object caller)
		// Keep looping until we either grabbed the lock (and therefore sent attention) or the connection closes\breaks
		while ((!hasLock) && (_parser.State != TdsParserState.Closed) && (_parser.State != TdsParserState.Broken))

Fix Cancel #234

Fix Cancel #234

Conversation

Wraith2 commented Sep 29, 2019

cheenamalhotra commented Sep 30, 2019

Wraith2 commented Sep 30, 2019

David-Engel commented Sep 30, 2019

cheenamalhotra commented Sep 30, 2019 • edited Loading

Wraith2 commented Sep 30, 2019

Wraith2 commented Oct 17, 2019

cheenamalhotra commented Oct 17, 2019

Wraith2 commented Oct 17, 2019

cheenamalhotra commented Oct 17, 2019

cheenamalhotra commented Oct 17, 2019

Samirat Oct 22, 2019 • edited Loading

Choose a reason for hiding this comment

Samirat Oct 22, 2019

Choose a reason for hiding this comment

Samirat Oct 22, 2019 • edited Loading

Choose a reason for hiding this comment

cheenamalhotra commented Sep 30, 2019 •

edited

Loading

Samirat Oct 22, 2019 •

edited

Loading

Samirat Oct 22, 2019 •

edited

Loading