-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Process cancelled tasks from SerialStream.Unix read/write request queues when a timeout occurs #55188
Conversation
Tagging subscribers to this area: @dotnet/area-system-io Issue DetailsSerialStream.Unix creates SerialStreamIORequest tasks and adds them to read/write queues. These queues are normally processed by the private method DoIORequest when IOLoop encounters Interop.PollEvents.POLLIN or Interop.PollEvents.POLLOUT events. If the serial line is quiet, POLLIN/POLLOUT events won't occur, so DoIORequest is never called. This results in SerialStreamIORequests building up in the read/write queues. This pull request fixes this by calling a new private method ProcessCancelledIORequests that clears cancelled tasks from the front of the queue. It is called before TimeoutExceptions are thrown. Resolves #55146
|
I created a the new method to specifically check for cancelled tasks rather than relying on just checking if the task was completed like DoIORequest does. My concern was if Task1 was cancelled but Task2 completed successfully and was waiting to be processed, calling DoIORequest to clear the the cancelled Task1 could pull the successfully completed Task2 from the queue prematurely. Because this is working entirely with private members, I'm not sure how I would create automated tests for this. |
@adamsitnik unsure who I should tag on this (so apologies in advance for the spam), but what can I do about these failing checks? They look like infrastructure failures, or failures stemming from infrastructure failures. |
I've hit the re-run button (which is available only for maintainers). I am afraid that the only other way of re-running the failed runs is by closing and re-opening the PR or by pushing anything to the repo (which kind of sucks) |
Ha, totally understood. That's not entirely foreign to me either -- I run into similar problems at work. If those are acceptable alternative methods, I can keep trying until it passes or something comes up that can't be traced back to a timeout. |
@krwq This PR is assigned to you for follow-up/decision before the RC1 snap. The checks did all pass after rerunning. |
@krwq ping |
@dseeg were you able to test this with a real serial device? |
@jozkee I originally encountered the issue when I was using a Zigbee radio module connected to a Raspberry Pi via serial pins. I recreated the issue and tested my fix using an Arduino connected to a Raspberry Pi via USB. I made a sample project to reproduce the issue if you have an Arduino, a Raspberry Pi, and some spare time: https://github.com/dseeg/SerialPortMemoryLeak I'll admit that the exact conditions for this to occur for others are probably unlikely. In my case, it took some strangely written code and a 4 day weekend for me to become aware of it. Let me know if there's anything you would like me to explain, address, change, etc. |
@@ -807,6 +811,22 @@ private static int DoIORequest(ConcurrentQueue<SerialStreamIORequest> q, Request | |||
return 0; | |||
} | |||
|
|||
private static void ProcessCancelledIORequests(ConcurrentQueue<SerialStreamIORequest> q) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will be thread safe as is. Currently there is multiple threads which can enqueue and only one dequeuing. With this change you will have multiple threads potentially dequeueing. Since we do Peek & Dequeue here and IOLoop you might end up with one thread peeking and different one dequeuing. See i.e.
// assumes dequeue-ing happens on a single thread |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead consider calling this inside of the IOLoop in the idle state (make sure to write a comment about why we do that)
private static void ProcessCancelledIORequests(ConcurrentQueue<SerialStreamIORequest> q) | ||
{ | ||
// assumes dequeue-ing happens on a single thread | ||
while (q.TryPeek(out SerialStreamIORequest r)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider cleaning all completed instead, similarly as done here:
runtime/src/libraries/System.IO.Ports/src/System/IO/Ports/SerialStream.Unix.cs
Lines 793 to 798 in 98efcdf
if (r.IsCompleted) | |
{ | |
q.TryDequeue(out _); | |
// take another item since we haven't processed anything | |
continue; | |
} |
sorry for late review, I missed this notification - consider pinging me on teams/e-mail or assign this to my backlog for more reliable response |
@krwq thanks for helping review this. @adamsitnik @jozkee I think we should move this to 7.0 for the following reasons:
I'll move it to 7.0, but if someone has a good reason to keep it in 6.0, please let us know in the comments. |
@dseeg what's the status? there's some unaddressed feedback left here |
@krwq I made changes a while back based on your feedback but I was unable test it in a way I that I felt comfortable with it. A few other things came up in life so this was brought down a few pegs on my priority list. I should have notified someone. Apologies. I did get a ping about a week ago that suggests this may already have been resolved with the release of 6.0. I'll check and confirm. If I find that it's still not fixed, I'll need to see what I can do to rebuild my physical testing environment. |
It was my mistake not ensuring we were following up in a timely manner. I appreciate your understanding on our delays.
Thanks for your help here! I'm going to mark the PR as a Draft for the time being. Let us know if you need anything in the meantime on this. |
By the way, marking this as a draft gives us 30 days to conclude whether or not we should proceed with the PR before it's auto-closed as inactive. |
@jeffhandley I'm going to just go ahead and close it. For some reason I'm unable to build the project as I was able to before and it's preventing me from testing this in a way I'd feel comfortable about it, and I'm having immense trouble finding the time to identify why the build process no longer works. I'm also not satisfied with the changes I'm trying to test; I'm worried it could introduce performance issues. Which leads me to this snippet I found when looking at the contributing readme: |
Thank you, @dseeg. Do you think we should also close the backing #55146 issue too? I hope that this was still a positive experience for you overall and that you will consider submitting more pull requests into the .NET libraries. There are a quite a few other IO issues |
SerialStream.Unix creates SerialStreamIORequest tasks and adds them to read/write queues. These queues are normally processed by the private method DoIORequest when IOLoop encounters Interop.PollEvents.POLLIN or Interop.PollEvents.POLLOUT events.
If the serial line is quiet, POLLIN/POLLOUT events won't occur, so DoIORequest is never called. This results in SerialStreamIORequests building up in the read/write queues.
This pull request fixes this by calling a new private method ProcessCancelledIORequests that clears cancelled tasks from the front of the queue. It is called before TimeoutExceptions are thrown.
Resolves #55146