dcd_nrf5x: fix race condition #2626
…er() needs DI/EI at one point
@rgrr could you please tell me more about how you tested this?
No, unfortunately I have no logs from a USB analyzer. My test case was a program under Windows, with a USB nRF dongle running CDC-ACM connected to the PC. The dongle is more or less a gateway between a UART-based packet protocol and BLE. The test program basically did
We have plenty of implementations of the dongle firmware, e.g. with nRF52832 via UART, nRF52840 via UART, nRF52840 via CDC-ACM (Nordic stack), or nRF52840 via CDC-ACM with TinyUSB. The dongle with TinyUSB sometimes (randomly) crashed during the COM port connect phase. Code inspection showed that a parameter of one call was wrong. Hope this helps a little bit.
@rgrr so TinyUSB was built without any RTOS (FreeRTOS/mynewt/other)?
Yes.
@rgrr when you use TinyUSB functions, does your code call them from other interrupts (not just the USB one)?
The application itself is mainloop based. The mainloop has the call to "tud_task()"; everything else in the CDC-ACM handling is done via the callbacks. "proc_soc()" (some power handling of the nRF driver, part of the TinyUSB implementation) is also called from the mainloop. The CDC handler itself uses the regular callbacks. So... no, I make no calls from interrupt context; everything is done from the mainloop.
Forgot: the compiler is clang 13.0.0, which is used in many projects in my company.
@rgrr First change where The second part, which blocks all interrupts for the duration of two calls, is not clear. It's not obvious why it should fix anything.
Yes, I agree: the first case is clear, the second not so much. That was a lot of trial and error: first I put DI/EI around the whole function -> no more crashes. Then I narrowed it down to the two calls. I never actually determined who the (double) callers were. If you want, I will try to find out, but that will take some time because I'm currently very busy.
Without a convincing explanation I can't really give my blessing. I'm pretty sure that DI/EI will not break anything, but I'm not sure it really fixes a problem that may pop up in a different place later.
I understand your position. I will revert the second case and commit the mini change.
this looks good, thank you and @kasjer for reviewing
Describe the PR
The PR fixes two race conditions in the dcd_nrf5x driver. The first one fixes a function invocation from edpt_dma_start(), which is also called from interrupt context; the second one (DI/EI) is required to protect a code section from interrupts.
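For readers unfamiliar with the DI/EI pattern mentioned above, here is a minimal host-side sketch of the general idea: a check-then-set on shared state is wrapped in a critical section so that an interrupt cannot slip in between the check and the update. This is not the actual patch. All names (`critical_enter`, `critical_exit`, `try_start_dma`, `dma_finished`, `dma_running`) are hypothetical, and the interrupt masking is simulated with a counter. On a Cortex-M target, the two critical-section helpers would typically map to the CMSIS intrinsics `__disable_irq()` / `__enable_irq()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated interrupt mask (assumption: stands in for real DI/EI).
 * A depth counter lets critical sections nest safely. */
static volatile uint32_t irq_disable_depth = 0;

static void critical_enter(void) { irq_disable_depth++; }          /* "DI" */
static void critical_exit(void)                                    /* "EI" */
{
    if (irq_disable_depth > 0) {
        irq_disable_depth--;
    }
}
static bool irqs_masked(void) { return irq_disable_depth > 0; }

/* Hypothetical shared state touched by both the mainloop and an ISR. */
static volatile bool     dma_running     = false;
static volatile uint32_t dma_start_count = 0;

/* Claim the (hypothetical) DMA engine only if it is idle.
 * Without the critical section, an ISR could run between the test of
 * dma_running and the assignment, and both contexts would start DMA. */
static bool try_start_dma(void)
{
    bool started = false;

    critical_enter();
    if (!dma_running) {
        dma_running = true;
        dma_start_count++;
        started = true;
    }
    critical_exit();

    return started;
}

/* Release the engine, e.g. from a transfer-complete handler. */
static void dma_finished(void)
{
    critical_enter();
    dma_running = false;
    critical_exit();
}
```

The sketch keeps the masked region as short as possible (only the test and the update), which matches the narrowing-down approach described in the discussion. It does not answer the reviewer's open question of which callers actually collided.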
Additional context
The actual problems (crashes/hangups) appeared during repeated connect / disconnect to a CDC-ACM on the nRF5x. The above changes fixed the situation.