Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

k_queue_poll returns NULL with K_FOREVER #4358

Closed
carlescufi opened this issue Oct 16, 2017 · 9 comments
Closed

k_queue_poll returns NULL with K_FOREVER #4358

carlescufi opened this issue Oct 16, 2017 · 9 comments
Assignees
Labels
area: Kernel bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug
Milestone

Comments

@carlescufi
Copy link
Member

When running with #4350, we still see instances on Cortex-M4 (nRF52832) where k_queue_poll returns NULL, traced down to sys_slist_get() returning NULL, something that should never happen. We can reproduce relatively easily, although it seems to be a race condition.

@carlescufi carlescufi added this to the 1.9.2 milestone Oct 16, 2017
@carlescufi carlescufi added area: Kernel bug The issue is a bug, or the PR is fixing a bug labels Oct 16, 2017
@carlescufi
Copy link
Member Author

carlescufi commented Oct 16, 2017

@pfalcon @cvinayak @jhedberg FYI

@pfalcon
Copy link
Contributor

pfalcon commented Oct 16, 2017

@carlescufi : Can you provide more details about where it happens? Looking at k_queue_poll() code, it can be a situation that e.g. k_poll() returns that queue has elements, at which point context switches and some another thread gets this element out of it, and when context switches back, nothing is left. Any chance that you have multiple consumers?

@jhedberg
Copy link
Member

@pfalcon @carlescufi I think multiple consumers may be a likely scenario, since the use case here involves a global net_buf pool which many places of the stack may be trying to allocate buffers from.

@carlescufi
Copy link
Member Author

@jhedberg @pfalcon @cvinayak indeed there are multiple consumer threads for the net buf pool in this case, at least 3:

prio_recv_thread: allocates cmd complete packets
rx_thread: allocates events coming from the radio
tx_thread: allocates cmd complete

All are allocated in hci_raw from the same hci_rx_pool.

It would seem then that our assumption that net_buf_alloc(... K_FOREVER) will always return a valid buf is not valid when you call it from multiple threads with the same pool?

@carlescufi
Copy link
Member Author

carlescufi commented Oct 16, 2017

So the question for the kernel people, @andyross and @andrewboie would be: "can one call k_lifo_get(), i.e. k_queue_get() with K_FOREVER from multiple threads using a single queue and still expect to always get a pointer back that is not NULL, assuming no thread has explicitly cancelled the wait?"

@pfalcon
Copy link
Contributor

pfalcon commented Oct 16, 2017

Related patches: #777 , #1206

@andrewboie
Copy link
Contributor

I think the non-NULL expectation makes the most sense to me, I can't think of advantages to having the API return NULL sometimes depending on how events line up

@cvinayak
Copy link
Contributor

I agree with @andrewboie wrt non-NULL expectations. Lots of code depends on this expectation.

Looking at the non k_poll logic, _Swap returned a context safe non-NULL data ptr. I will be in favour of a similar k_poll irq locked return or k_poll returning with a guaranteed _current->base.swap_data something.

@Vudentz your code snippet with while loop that @carlescufi is testing changes API behavior, which should follow API deprecation rules (to maintain old behavior for next 2 releases). That said, introduction of k_queue_poll has already broken that rule on master, by returning NULLs.

@Vudentz
Copy link
Contributor

Vudentz commented Oct 17, 2017

@cvinayak The NULL return is a bug, Ive never said it wasn't. We could in theory do what mutex is doing:
https://github.com/zephyrproject-rtos/zephyr/blob/master/kernel/mutex.c#L155
Though the mutex code may also suffer from priority problems since the owner thread priority may be changed using k_thread_priority_set.

Vudentz added a commit to Vudentz/zephyr that referenced this issue Oct 18, 2017
k_queue_get shall never return NULL when timeout is K_FOREVER which can
happen when a higher priority thread cancel/take an item before the
waiting thread.

Fixes issue zephyrproject-rtos#4358

Signed-off-by: Luiz Augusto von Dentz <[email protected]>
nashif pushed a commit that referenced this issue Oct 18, 2017
k_queue_get shall never return NULL when timeout is K_FOREVER which can
happen when a higher priority thread cancel/take an item before the
waiting thread.

Fixes issue #4358

Signed-off-by: Luiz Augusto von Dentz <[email protected]>
carlescufi pushed a commit to carlescufi/zephyr that referenced this issue Oct 18, 2017
k_queue_get shall never return NULL when timeout is K_FOREVER which can
happen when a higher priority thread cancel/take an item before the
waiting thread.

Fixes issue zephyrproject-rtos#4358

Signed-off-by: Luiz Augusto von Dentz <[email protected]>
nashif pushed a commit that referenced this issue Oct 18, 2017
k_queue_get shall never return NULL when timeout is K_FOREVER which can
happen when a higher priority thread cancel/take an item before the
waiting thread.

Fixes issue #4358

Signed-off-by: Luiz Augusto von Dentz <[email protected]>
SebastianBoe pushed a commit to SebastianBoe/zephyr that referenced this issue Oct 19, 2017
k_queue_get shall never return NULL when timeout is K_FOREVER which can
happen when a higher priority thread cancel/take an item before the
waiting thread.

Fixes issue zephyrproject-rtos#4358

Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Kernel bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug
Projects
None yet
Development

No branches or pull requests

8 participants