-
Notifications
You must be signed in to change notification settings - Fork 299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Cancel #234
Fix Cancel #234
Conversation
Thanks @Wraith2 for coming up with the PR. I do see that in NetFx code (which works fine), the implementation for NetFx implements condition in SqlClient/src/Microsoft.Data.SqlClient/netfx/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs Line 642 in 29f639a
while in NetCore it is done as: SqlClient/src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs Line 607 in 29f639a
Would you please verify the behavior and attempt to match NetFx if it's possible, we do want to avoid any unknown implications of making patches. |
The ObjectID is just a number assigned to the command object when it's created, it's monatonically incremented in the ctor. It's used for tracing with the Bid object that was removed from the corefx version. If you trace the calls to TdsStartParser.Dispose they both go through command. It also seems to be used for identifying self in some cases to make sure the thing that's being operated on Isn't another connection. So as far as I can see it's just a different way of expressing the question of whether the target is the one that is expected. Now, which one is better I couldn't tell you. The provenance of both versions of this code are opaque to me. I can see the code but the reasons behind any of the decisions that formed it aren't available to me. If you know a particular approach is better and should be followed then you'll need to instruct me. If you need reasons behind anything other than changes I've made you'll need to track down the original authors or version history notes, I can't access any of that information even though it would be incredibly helpful to me. |
Interesting. The new test is failing against netfx... |
Okkk.. It's actually causing the build to hang. 🤔 Looking at #44, the user reported it works fine in SQL Server Management Studio which is System.Data.SqlClient (NetFx). That gave me an impression, this was NetCore specific. But now it fails with Microsoft.Data.SqlClient (NetFx), so something has happened in between 🤔 Let me verify this test scenario with S.D.S, M.D.S both NetFx and NetCore, and I'll get back on this. May have to drill down history to find change that caused this. |
We could try using the tests added in dotnet/corefx#38271 which use delay and timers instead of an infinite loop, that'd prevent the build hang. I'm not sure what's different between corefx and this repo that means we see a different cancel behaviour, if this issue was in corefx my other PR shouldn't fix the issue since this one is in fully shared code and the other is in a specific SNI implementation. The report of it working in management studio is also confusing unless management studio uses another connection (or mars) to cancel it. |
I've updated this PR to use the test I mentioned. Those tests use DELAY so they won't hang the test runs. |
The changes fail on Unix, please check logs below: |
Yes, they will do because of #248 which we've already discussed. There are also tests failing because they're just wrong. For example PlainCancelAsync is failing because it's expecting a specific exception and message which I haven't changed. So was it failing before? I can't tell because I don't have a way to run the unix tests without hardcoding managed mode because I can't run tests in debug mode. It really feels like you're trying to make even simple things much more difficult than they need to be and I don't understand why. |
So can we instead merge both PRs at one place and then attempt to fix issue? It sounds like both PRs are dependent on each other. Some tests do fail randomly so little bit of noise if they're not related you can avoid, but I'm only looking into new test failures, that fail consistently even when testing locally. |
@@ -598,41 +598,41 @@ internal void Cancel(object caller) | |||
// Keep looping until we either grabbed the lock (and therefore sent attention) or the connection closes\breaks | |||
while ((!hasLock) && (_parser.State != TdsParserState.Closed) && (_parser.State != TdsParserState.Broken)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't get the lock, won't this loop / send attention multiple times? Is that intended?
// try to take the lock so that if another command is attempted it will queue on the lock | ||
// but don't require that the lock be taken because otherwise attention cannot be sent | ||
// during command execution causing cancellation to wait | ||
// This lock is also protecting against concurrent close and async continuations | ||
Monitor.TryEnter(this, _waitForCancellationLockPollTimeout, ref hasLock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of taking a lock at all here? If we actually need to prevent new commands being started while sending attention, then we can't allow this lock to be optional, and if we don't, then why have it at all.
{ | ||
try | ||
_parser.Connection._parserLock.Wait(canReleaseFromAnyThread: false, timeout: _waitForCancellationLockPollTimeout, lockTaken: ref hasParserLock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like there should be an async version of this function. If the locks were async, wouldn't that also fix the cancellation hang?
Fixes #44
Makes the parser lock in the internal Cancel method (called once from SqlCommand or SqlDataReader whichever is active) optional in the case of Cancel. This means that the send attention needed to cancel the running query is not blocked until the query completes. There's a new test to verify the behaviour derived from the bug report.