-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New dhcp state machine #95
Conversation
…n new state machine.
…Added some tests for helper functions.
…fixed length, other small changes.
…t.socket methods.
…m offered IPv4 is not in use.
I think I've successfully merged the changes. Squished a bug and increased the buffer size from 318 to 512 after receiving a record breaking DHCP response.
|
@BiffoBear I've gotten wireshark set up and have figured out how to capture the traffic in there. I have seen a few different ways that it can fail with the version from this PR, and unfortunately have not had a complete success yet with this version. For one thing, I think there may be some other "timeout 1000 seconds" issue that remains in this version (I know #97 solved something like that). The behavior that I see most typically is the device seemingly hanging or waiting forever after:
from the debug output. I added an additional debug output between the loop and the if statement here:
And from that I can see that it does have ~ 1000 seconds on the timeout counter.
This log file contains the output from debug logs as well as the packet copied from wireshark for one of these "hang forever" attempts. This log file contains wireshark output from currently released library (not this PR) doing a successfull DHCP negotiation (successful HTTP requests occured after this). This log file contains output from this PR version getting and printing an IP address successfully, but then seems to just hang afterward. Never prints the hostname, or any other request responses. Wireshark dumps from the two packets that were send during this time are included. So far, on all attempts with the version from this PR I either:
I probably have not waited the full 1000 seconds if there are still timeouts like that though, so perhaps my "seemingly hang forever" would have actually done something else with more time. I only added additional log messages to try to verify that during the last bit of my testing, much of it occurred before I had those extra prints. All testing was with simpletest from this repo on Feather ESP32-S2 TFT. |
Great catch on the timeout. That's a bug where I was calculating a timeout based on the value of I'm perplexed by your Wireshark logs. Your I've pushed a fix for the timeout bug. I'll spend some time decoding the DHCP packets and get back to you later about that. Thanks for your review! |
@FoamyGuy The output from your file
These print statements execute after I'll dig out a w5500 that I have knocking around and see whether I get the same issue with that. I've been testing on a w5100s. |
@FoamyGuy I think I've found where it is hanging. I've added a timeout that raises a RuntimeError to break the hang some debug statements that should reveal where the error is occurring. This may not address the underlying cause of the hang though, so I've made the PR a draft for now. When you have a moment I would appreciate a copy of the serial out with debugging on for a failed connection, in case your wonderfully weird network is doing something different to mine. Thx! |
@BiffoBear here are logs from the two different states I'm seeing with the latest version. This log is after a cold boot up (i.e. unplug / replug) it is partially successful it seems like it got the DNS lookup of adafruit.com done successfully (though outputs as hex instead of dotted string). But then it failed after that with an exception that is raised. After that cold boot and partial success if I then do ctrl-C / ctrl-D to restart the script and have it run again I seem to always end up with a "retried 3 times and failed" type error. Here is the log from this cast: Let me know if there is anything else that would be helpful to provide or test out on my end. Thank you so much for sticking with this and all of the improvements you've submitted for this library in this PR and beyond! |
Thanks, that partial success showed me where my logic was flawed. I've pushed a band-aid fix. Once this PR is done, I've got some ideas for the I don't see why there is apparently no reply from the DHCP server after a warm reboot. That really puzzles me. Please would you record the exchange in Wireshark? If you use the capture filter Thanks for all your help and encouragement. No rush. Martin |
@FoamyGuy I may have been barking up the wrong tree. I've added an explicit reset to the w5500 detection code. When you have a moment, please see if this solves your no DHCP after soft reset issue. Thanks! |
@BiffoBear I have tested out the latest versions Unfortunately the latest commit seems to cause more issues in my environment. If I test the absolute latest version including the newest commit I consistently end up with In my local copy I commented out the lines added in the latest commit so that I could test the version from before that and capture some additional logs. All of this output was obtained on a Feather S2 TFT with the latest version minus the last commit with the reset command, that is commented out in the version running while these logs were captured: debug=True logs from serial console: wireshark logs from failed DHCP attempt: Wireshark logs from a successful DHCP and simpletest exectution: My results with this version seem to be consistently that after a cold boot up it will successfully connect, get an IP, and successfully fetch data from the internet. Then if I ctrl-D / ctrl-C to reset it will consistently fail with After a full power cycle it boots up and succeeds fully 1 time again, then upon first reset is back to the 'no response' state. |
OK. I'll save it as a draft and wait until I can get my hands on a W5500 to test against Thanks for checking! |
@FoamyGuy This time I promise 🤪! I fixed my w5500 so I could test on it. On the current main branch I get the error after 5 x ctrl-D with a w5500, never with a w5100s. On the latest commit for this PR I don't get any errors after 10 x ctrl-D on the W5500 or the w5100s. |
@BiffoBear nice! I gave the latest version a try and I am getting continued success with it after several resets, running the simpletest. I'm thinking this is likely good now, but I am going to plan to come back once more and do some more thorough testing with more of the examples. Thank you so much for seeing it through and making all of these improvements! |
@FoamyGuy And thank you for being such a patient, insightful and thorough reviewer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are looking good to me at this point.
I tested successfully several of the examples on a Feather RP2040 v8.1.0-beta.2. Everything is functioning as intended and seems to run reliably even through several soft resets.
Thanks again for these improvements @BiffoBear!
Updating https://github.com/adafruit/Adafruit_CircuitPython_ATECC to 1.2.17 from 1.2.15: > Merge pull request adafruit/Adafruit_CircuitPython_ATECC#33 from FoamyGuy/fix_docs > Merge pull request adafruit/Adafruit_CircuitPython_ATECC#31 from brass75/issue_25/missing-typehints > Add upload url to release action > Add .venv to .gitignore > Update .pylintrc for v2.15.5 > Fix release CI files > Update pylint to 2.15.5 > Updated pylint version to 2.13.0 > Switching to composite actions Updating https://github.com/adafruit/Adafruit_CircuitPython_LC709203F to 2.3.0 from 2.2.11: > Merge pull request adafruit/Adafruit_CircuitPython_LC709203F#25 from jrrickerson/config_voltage_alarm Updating https://github.com/adafruit/Adafruit_CircuitPython_RockBlock to 1.3.12 from 1.3.11: > Merge pull request adafruit/Adafruit_CircuitPython_RockBlock#30 from zachariahpifer/add-type-annotations > Add upload url to release action > Add .venv to .gitignore Updating https://github.com/adafruit/Adafruit_CircuitPython_TCS34725 to 3.3.17 from 3.3.16: > Merge pull request adafruit/Adafruit_CircuitPython_TCS34725#42 from zemyblue/feat/support_tcs34727 > Add upload url to release action > Add .venv to .gitignore Updating https://github.com/adafruit/Adafruit_CircuitPython_Touchscreen to 1.2.1 from 1.2.0: > Merge pull request adafruit/Adafruit_CircuitPython_Touchscreen#25 from CedarGroveStudios/patch-1 Updating https://github.com/adafruit/Adafruit_CircuitPython_Wiznet5k to 2.5.0 from 2.4.2: > Merge pull request adafruit/Adafruit_CircuitPython_Wiznet5k#95 from BiffoBear/new_dhcp_state_machine Updating https://github.com/adafruit/Adafruit_CircuitPython_MacroPad to 2.2.0 from 2.1.8: > Merge pull request adafruit/Adafruit_CircuitPython_MacroPad#44 from kalehmann/feature/display_sleep > Add upload url to release action > Update .pylintrc for v2.15.5 > Fix release CI files > Update pylint to 2.15.5 > Updated pylint version to 2.13.0 > Switching to composite actions Updating https://github.com/adafruit/Adafruit_CircuitPython_SimpleIO to 3.0.10 from 3.0.9: > Merge pull request adafruit/Adafruit_CircuitPython_SimpleIO#67 from RossK1/adding_type_hints > Add upload url to release action > Add .venv to .gitignore Updating https://github.com/adafruit/Adafruit_CircuitPython_Bundle/circuitpython_library_list.md to NA from NA: > Updated download stats for the libraries
closes #94 New DHCP FSM implements exponential fallback and correctly renews or rebinds. The messages are sent in unicast mode. Also added a debug message helper function with a hexdump to improve code readability. DHCP messages are now sent and received without using the socket module. This should save some memory.