-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
k_queue_poll returns NULL with K_FOREVER #4358
Comments
@carlescufi : Can you provide more details about where it happens? Looking at k_queue_poll() code, it can be a situation that e.g. k_poll() returns that queue has elements, at which point context switches and some another thread gets this element out of it, and when context switches back, nothing is left. Any chance that you have multiple consumers? |
@pfalcon @carlescufi I think multiple consumers may be a likely scenario, since the use case here involves a global net_buf pool which many places of the stack may be trying to allocate buffers from. |
@jhedberg @pfalcon @cvinayak indeed there are multiple consumer threads for the net buf pool in this case, at least 3: prio_recv_thread: allocates cmd complete packets All are allocated in hci_raw from the same hci_rx_pool. It would seem then that our assumption that net_buf_alloc(... K_FOREVER) will always return a valid buf is not valid when you call it from multiple threads with the same pool? |
So the question for the kernel people, @andyross and @andrewboie would be: "can one call k_lifo_get(), i.e. k_queue_get() with K_FOREVER from multiple threads using a single queue and still expect to always get a pointer back that is not NULL, assuming no thread has explicitly cancelled the wait?" |
I think the non-NULL expectation makes the most sense to me, I can't think of advantages to having the API return NULL sometimes depending on how events line up |
I agree with @andrewboie wrt non-NULL expectations. Lots of code depends on this expectation. Looking at the non k_poll logic, _Swap returned a context safe non-NULL data ptr. I will be in favour of a similar k_poll irq locked return or k_poll returning with a guaranteed _current->base.swap_data something. @Vudentz your code snippet with while loop that @carlescufi is testing changes API behavior, which should follow API deprecation rules (to maintain old behavior for next 2 releases). That said, introduction of k_queue_poll has already broken that rule on master, by returning NULLs. |
@cvinayak The NULL return is a bug, Ive never said it wasn't. We could in theory do what mutex is doing: |
k_queue_get shall never return NULL when timeout is K_FOREVER which can happen when a higher priority thread cancel/take an item before the waiting thread. Fixes issue zephyrproject-rtos#4358 Signed-off-by: Luiz Augusto von Dentz <[email protected]>
k_queue_get shall never return NULL when timeout is K_FOREVER which can happen when a higher priority thread cancel/take an item before the waiting thread. Fixes issue #4358 Signed-off-by: Luiz Augusto von Dentz <[email protected]>
k_queue_get shall never return NULL when timeout is K_FOREVER which can happen when a higher priority thread cancel/take an item before the waiting thread. Fixes issue zephyrproject-rtos#4358 Signed-off-by: Luiz Augusto von Dentz <[email protected]>
k_queue_get shall never return NULL when timeout is K_FOREVER which can happen when a higher priority thread cancel/take an item before the waiting thread. Fixes issue #4358 Signed-off-by: Luiz Augusto von Dentz <[email protected]>
k_queue_get shall never return NULL when timeout is K_FOREVER which can happen when a higher priority thread cancel/take an item before the waiting thread. Fixes issue zephyrproject-rtos#4358 Signed-off-by: Luiz Augusto von Dentz <[email protected]>
When running with #4350, we still see instances on Cortex-M4 (nRF52832) where k_queue_poll returns NULL, traced down to sys_slist_get() returning NULL, something that should never happen. We can reproduce relatively easily, although it seems to be a race condition.
The text was updated successfully, but these errors were encountered: