Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for array access panics #12

Merged
merged 1 commit into from
Aug 14, 2024
Merged

Fixes for array access panics #12

merged 1 commit into from
Aug 14, 2024

Conversation

nand-nor
Copy link
Contributor

Description

Some minor band-aid fixes to these PRs in esp-hal, both panics have shown up on esp32c6 and esp32h2 devices.

No root cause yet (I have not had much time to spend on this lately) but may be worth noting this arose with the fixes applied in the linked PRs. So this may be indication that its a bug in the phy / mac layer logic, potentially in the ieee802154 crate or somewhere in esp-ieee802154 or one of the other dependencies. I have very limited evidence to base this on other than the output at the time of the panic, log shows where esp-ieee802154 caught and handled it without modifying the buffer, so the (invalid?) frame was passed as-is to esp-openthread for further processing & panics via similar logic. May be a red herring but sharing in case it is helpful.

More details

Theres two fixes included in this commit, one for avoiding a panic while processing a received frame and one for avoiding a panic during rssi parsing. Similar to this this fixes a panic that arises when esp-openthread is passed a frame with a 0th index containing a length larger than the receiving buffer. The other fix handles a panic I saw a handful of times, only on esp32h2 devices, at first start up, where checking for rssi panics because the value at raw.data[0] is 0, so have included a check for that. I can share log snippets too if helpful.

Also worth noting for these band-aids I opted to prioritize logging in the checks even though it makes things terribly long / messy to look at. Im on the fence as to whether it is actually worth it, so if there is a preference let me know & I'll fixup. For example the rcv frame parse check could be distilled down to something much cleaner like:

let len = (raw.data[0] as usize).min(RCV_FRAME_PSDU.len()).min(raw.data[1..].len());

Testing

The test for the rcv buffer fix is not exhaustive as it is not yet possible for me to consistently reproduce, but I have run the fix with the esp-openthread example bin and have not observed any regressions. The test checking the other rssi-related fix was easier as that did pop up somewhat frequently, once every 10 runs or so only on esp32h2 platforms. Im running longevity tests now and will follow up as needed.

@bjoernQ
Copy link
Collaborator

bjoernQ commented Aug 14, 2024

Thanks! I guess having these checks won't hurt even once we identify the root cause (but definitely finding that would be great).
Do you see the warn from https://github.com/esp-rs/esp-hal/blob/6b6e628940eec34e9a4cf0f68bfc3a7b9aac76e6/esp-ieee802154/src/raw.rs#L399 when this happens? Because in that code we previously already enqueued the faulty frame for processing 🤔

Copy link
Collaborator

@bjoernQ bjoernQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@bjoernQ bjoernQ merged commit fe5a519 into esp-rs:main Aug 14, 2024
@nand-nor
Copy link
Contributor Author

Thanks! I guess having these checks won't hurt even once we identify the root cause (but definitely finding that would be great). Do you see the warn from https://github.com/esp-rs/esp-hal/blob/6b6e628940eec34e9a4cf0f68bfc3a7b9aac76e6/esp-ieee802154/src/raw.rs#L399 when this happens? Because in that code we previously already enqueued the faulty frame for processing 🤔

Yea I do see that log since I applied that esp-hal fix locally, so that is the main driver behind my suspicion that something is happening at lower layer processing that passes up a bad frame (or, at least a frame with a bad length). So I think that somewhat helps divide the problem space a bit as long as it is not a red herring.

Hopefully will have free time soon to do some static analysis of the crates ( esp-ieee802154 or and it's dependencies like the ieee802154 crate) involved in the rcv path to see if that knocks loose ideas for further root causing. Would probably also be good to dust off my (very minimal) OpenOCD skills and see if I can set up gdb...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants