Missed synchronization #826
In the scenario I'm offering, there are already two threads (cancellation has already started while Connect is still in progress). So first the cancellation thread passes all the "fail" conditions, then the main thread passes all the "success" conditions and returns the socket. CTS.Dispose/finally does not affect this scenario at all. What's left: the socket is returned and being used, while at the same time the cancellation thread is scheduled to dispose that socket, and I believe the timing of that cannot be estimated in any way. For example, if the cancellation thread came from the pool, it may have just finished a heavy job and is now scheduled to run after all other N threads finish the same heavy N jobs. As you can imagine, this case is nearly impossible to reproduce on purpose. I've just searched the issues by the "ObjectDisposed" keyword and at least 2 of the top 20 describe similar behavior:
I don't see any other option but to do some synchronization. Probably the fastest would be to introduce an "interlocked exchange". But this code also relies on try/catch{}, and the number of non-reproducible issues would grow too fast to maintain. The best, in my opinion, would be to introduce Socket.Connect(timeout) (a timeout on an IO operation seems logical to me).
If the need were acute the method could be added to the runtime, but that's an issue for /runtime, and even if it were added, the first shipping product that contained it would be .NET 6, and we couldn't use it in this library on downlevel runtimes, so it wouldn't be very useful. I understand what you're saying and agree that there is the possibility of returning a disposed socket. However, there is no proof of causation, and I've debugged my way through too many incorrect assumptions to think that we can assert that this causes an observable problem. Is there a way that we can either 1) fix the issue you've highlighted without externally observable effect, or 2) enhance the traceability of the
I have found this issue already reported and reproduced; the repro is here: #449 (comment) It describes exactly the same behavior I do. The randomness of these issues is predicted by the theory. The correctness of the assumptions in these tickets can be proven by static code-analysis tools. That is my next point: code analysis is proposed specifically to avoid hard-to-observe issues, and I believe it should be used in SqlClient.
I think the last option I offered should do this: ConnectAsync should be used for easy synchronization of the 2 tasks. Or, again, any synchronization will eventually do the trick.
As I said, I think it's much easier to fix this by relying on code analysis than to trace it.
Am not following this issue in detail, but if you're looking for a socket timeout without doing async, there's already a way to do this. You can set the socket to non-blocking, call Connect (at which point you get a WouldBlock exception), and then use Socket.Select to wait for the connection to complete, but with a timeout. It's not incredibly pretty but it works.
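The non-blocking Connect + Select approach described above might be sketched like this. This is a minimal illustration under stated assumptions, not SqlClient code: the method name, null-on-failure convention, and error handling are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;

internal static class ConnectSketch
{
    // Hypothetical helper: synchronous connect with a timeout, no async machinery.
    internal static Socket ConnectWithTimeout(IPEndPoint endpoint, TimeSpan timeout)
    {
        var socket = new Socket(endpoint.AddressFamily, SocketType.Stream, ProtocolType.Tcp);
        socket.Blocking = false;
        try
        {
            socket.Connect(endpoint);
        }
        catch (SocketException ex) when (
            ex.SocketErrorCode == SocketError.WouldBlock ||
            ex.SocketErrorCode == SocketError.InProgress)
        {
            // Expected for a non-blocking socket: the TCP handshake is now in progress.
        }

        // Select takes the timeout in microseconds; on return the lists contain
        // only the sockets that are writable (connected) or faulted.
        var writable = new List<Socket> { socket };
        var faulted = new List<Socket> { socket };
        Socket.Select(null, writable, faulted, (int)(timeout.TotalMilliseconds * 1000));

        if (writable.Count == 0 || faulted.Count > 0)
        {
            socket.Dispose();
            return null; // the caller treats null as a standard connect failure
        }

        socket.Blocking = true;
        return socket;
    }
}
```

Note that the timeout-to-microseconds conversion overflows `int` for timeouts above roughly 35 minutes, so a real implementation would clamp it or pass -1 for infinite.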
Thank you, this would work for us. But the code we are discussing was introduced in the name of performance. Probably we should avoid exceptions, at least on the successful execution path.
@jinek as this is code which establishes a physical network connection (i.e. TCP handshake), I really doubt this extra exception would have any perf impact at all. I'd at least set up a quick BenchmarkDotNet benchmark to verify this rather than assuming it's true.
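A quick harness of the kind @roji suggests might look like the following. Everything here is an assumption for illustration: `SocketConnector.ExceptionBased` and `SocketConnector.AsyncBased` stand in for the two candidate implementations, and a reachable local endpoint is presumed.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class ConnectBenchmarks
{
    // Assumed local listener; in a real run point this at a test SQL Server.
    private static readonly IPEndPoint s_endpoint = new IPEndPoint(IPAddress.Loopback, 1433);
    private static readonly TimeSpan s_timeout = TimeSpan.FromSeconds(15);

    [Benchmark(Baseline = true)]
    public Socket ExceptionBased() => SocketConnector.ExceptionBased(s_endpoint, s_timeout);

    [Benchmark]
    public Socket AsyncBased() => SocketConnector.AsyncBased(s_endpoint, s_timeout);

    // Sockets are deliberately not disposed per-iteration here to keep the
    // sketch short; a real benchmark would clean them up in [IterationCleanup].
    public static void Main() => BenchmarkRunner.Run<ConnectBenchmarks>();
}
```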
Perhaps we're overthinking this. The caller handles null and not-connected cases by returning a SQL error, so why not just check the connected state of the return candidate after the cts has been disposed but before returning the value? At that point it can no longer be touched by the timer, so if it's alive it'll stay alive, and if it's dead we clean it up and return null, allowing the caller to interpret that as a standard failure to connect, which is what it is.
The socket passes the checks (as it got connected) while at the same time the Cancel method is already running.
At the point when the cts is disposed, either the cancel function has been invoked or it will not be invoked; as such, cancellation is either done/pending or won't happen. But yes, you're right: due to the vagaries of threading we can't assume anything about ordering. So the socket array needs to change to a (Socket, int) class array, with interlocks on the int used to guard socket access: 0→1 on cancel, 0→2 on take. Have I ever mentioned that threading and async give me headaches? I also dislike allocations. I don't like there being an array allocated for just two things, especially when most of the time it's never going to hit the second one, and that second socket never gets disposed. Fancy working on this?
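The (Socket, int) guard proposed above could be sketched roughly like this; the class and member names are illustrative, not the actual SqlClient implementation. The int starts at 0, the cancellation path swaps 0→1 before disposing, and the take path swaps 0→2 before returning, so exactly one side wins the race.

```csharp
using System.Net.Sockets;
using System.Threading;

internal sealed class GuardedSocket
{
    private const int Free = 0;
    private const int Cancelled = 1;
    private const int Taken = 2;

    private int _state; // initialized to Free
    public Socket Socket { get; }

    public GuardedSocket(Socket socket) => Socket = socket;

    // Called from the cancellation timer: dispose only if nobody has taken the socket.
    public void TryCancel()
    {
        if (Interlocked.CompareExchange(ref _state, Cancelled, Free) == Free)
        {
            Socket.Dispose();
        }
    }

    // Called from the connect path: returns the socket only if cancellation lost the race.
    public Socket TryTake()
        => Interlocked.CompareExchange(ref _state, Taken, Free) == Free ? Socket : null;
}
```

Because both transitions are compare-and-swap from `Free`, a `TryTake` that succeeds guarantees `TryCancel` will never dispose that socket, and vice versa, which closes the window described earlier in the thread.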
By the way, anything in the world either has happened, is going to happen, or won't happen 🤣
Yes, I would love SqlClient to work with multithreading. I'm still not sure about the implementation details, but something like what you mention should happen.
@jinek thanks for bringing this up. While I was working on issue #422 I came across an error message in debug mode. (@Wraith2 you can reproduce the issue by attaching your project to the driver and running it under netcoreapp3.1-Debug. Do not forget to add the switch to use ManagedSNI.) The error message stated that we were trying to write to a disposed socket.
There has been another issue in System.Data.SqlClient regarding the socket in
I've tried to run benchmarks for the implementations we have discussed here. The solution offered by @roji is almost equal in time to the async solution:
Code is here: jinek@e9e7309 Here are the implementations with async and with an exception (I've created a delegate to substitute implementations at runtime): Async
Exception
Original was
As for me, both solutions look good, but the solution by @roji does not use async code, which surely looks more attractive for a sync code path. Additionally, it allows excluding the exception path in the infinite-timeout case. @JRahnama Has that "async" issue been logged in this repository?
@jinek the issue and its related materials can be followed at issue #583, and it can also be traced back to a 2018 change in corefx at this issue.
I see, thank you. Can we try |
SqlClient-826 Missed synchronization. Additionally:
* Exception swallowing removed to satisfy CA1031 https://docs.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca1031
* InternalException refactored to satisfy CA1032 https://docs.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca1032 (Best practices: https://docs.microsoft.com/en-us/dotnet/standard/exceptions/best-practices-for-exceptions#include-three-constructors-in-custom-exception-classes)
* The InternalException constructor has been changed to public, as the class is not marked as sealed.
While investigating other issues I’ve found a few probably hard-to-reproduce issues.
I haven’t seen them happen, but I’ve seen several issues that are claimed to be hard to reproduce or to happen under heavy load, so maybe this could help there. It was also mentioned in the original PR (where this code was introduced).
This block accessing the sockets array
SqlClient/src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SNI/SNITcpHandle.cs
Line 341 in 8ad8da6
is not synchronized with this one
SqlClient/src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SNI/SNITcpHandle.cs
Line 368 in 8ad8da6
The following order seems possible:
In this case, the socket is first returned and then disposed at a random time while being used (the user does not receive any exceptions; the connection just disposes).
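Based on the scenario described in this thread, the problematic interleaving between the two unsynchronized blocks can be pictured as follows (a hypothetical trace, not output from the driver):

```csharp
// Hypothetical interleaving between the connect path and the cancellation
// timer in SNITcpHandle.cs (commit 8ad8da6), reconstructed from the
// discussion above:
//
//   connect thread                        cancellation (timer) thread
//   --------------                        ---------------------------
//   Connect completes, socket is live
//                                         cts fires; Cancel() begins, passes
//                                         its "fail" condition checks
//   passes the "success" checks
//   returns the socket to the caller
//                                         disposes that same socket
//
//   => the caller now holds a disposed socket and later observes a
//      seemingly random ObjectDisposedException under load.
```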
But there are also many hard-to-investigate synchronization/exception issues there. I think maybe that entire file and the related ones should be reviewed again.