Kernel Issues on Idle RPi4s #3919
Pinging @P33M, as this looks like it might be a USB issue. What does the
It's this: I cobbled together a quick pair of scripts that control the USB RGB LED. The first writes a hex string to /dev/ttyACM0 once a second. I recognize that I could probably throw a second sleep into the main loop (among many other optimizations), though the system load caused by this script is quite minimal. All of the Pis, both affected and not, run it. The second script merely alters the contents of /tmp/led_color according to the system load.
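For anyone trying to replicate the workload, here is a minimal Python sketch of the two scripts as described. The real scripts were not posted, so the colour encoding (six hex digits plus a newline), the green-to-red load mapping and the function names `load_to_color`, `update_color_file` and `led_writer` are all hypothetical; only the paths /tmp/led_color and /dev/ttyACM0 come from the report.

```python
import os
import time

COLOR_FILE = "/tmp/led_color"  # path from the report
LED_DEVICE = "/dev/ttyACM0"    # path from the report


def load_to_color(load: float, ncpu: int) -> str:
    """Map load per CPU onto a green-to-red hex string (hypothetical scheme)."""
    # Clamp utilisation to [0, 1]; 1.0 means one runnable task per core.
    frac = max(0.0, min(1.0, load / max(1, ncpu)))
    red = round(255 * frac)
    green = 255 - red
    return f"{red:02X}{green:02X}00"


def update_color_file(path: str = COLOR_FILE) -> None:
    """One pass of the second script: rewrite the colour from the 1-minute load."""
    one_minute, _, _ = os.getloadavg()
    with open(path, "w") as f:
        f.write(load_to_color(one_minute, os.cpu_count() or 1))


def led_writer(device: str = LED_DEVICE, path: str = COLOR_FILE) -> None:
    """First script: push the current colour to the CDC ACM device once a second."""
    # The tty stays open for the whole run; while it is open the cdc-acm
    # driver also polls the device's notification (interrupt IN) endpoint.
    with open(device, "w") as tty:
        while True:
            with open(path) as f:
                tty.write(f.read().strip() + "\n")
            tty.flush()
            time.sleep(1)
```

The point of the sketch is the traffic pattern, not the colours: a port that is held open indefinitely with a tiny write every second, on an otherwise idle machine.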
Hm. Why is xhci trying to expand the ring for an endpoint that gets poked with minimal data? Can you dump out the ring sizes for the device in question, preferably after the ring expansion has failed at least once? Navigate (as root) to For each of the
Here you go: my sandbox:
my dns server:
kubernetes master:
one of the workers:
I verified that their lsusb output was identical to my sandbox's (since I don't generally plug anything else into them, and they're all running the same basic image, they all had the same lsusb output).
OK, that's a big smoking gun right there. There should be no need for 32k of TRBs in the endpoint ring of any device. Can you post the full output of
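To put "32k of TRBs" in perspective, a small calculation. It assumes the Linux xhci driver's 256 TRBs per ring segment, a two-segment initial endpoint ring, and that each ring expansion doubles the ring; it also assumes the 16384 and 32768 figures reported in this thread are TRB counts. Each TRB is 16 bytes per the xHCI specification. The helper names are hypothetical.

```python
from typing import Tuple

TRB_BYTES = 16          # every xHCI TRB is 16 bytes
TRBS_PER_SEGMENT = 256  # assumed Linux xhci segment size (one 4 KiB page)
INITIAL_SEGMENTS = 2    # assumed initial size of a non-isochronous endpoint ring


def ring_footprint(num_trbs: int) -> Tuple[int, int]:
    """Return (segments, bytes) for a ring holding num_trbs TRBs."""
    segments = -(-num_trbs // TRBS_PER_SEGMENT)  # ceiling division
    return segments, segments * TRBS_PER_SEGMENT * TRB_BYTES


def expansions_to_reach(num_trbs: int) -> int:
    """Count doublings needed to grow the initial ring to hold num_trbs TRBs."""
    segments, _ = ring_footprint(num_trbs)
    count, current = 0, INITIAL_SEGMENTS
    while current < segments:
        current *= 2
        count += 1
    return count
```

Under those assumptions a 32768-TRB ring is 128 segments (512 KiB of ring memory) and six doublings away from where it started, whereas an interrupt endpoint with only one or two TRBs in flight should never outgrow its first two segments.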
I'll need a little time to come up with a stub script that does the same thing, and I'll post it here once I get it to produce the same behavior.
I've watched the example script long enough to see my sandbox go from 16384 to 32768, so I've decided to pass this script on for replication.
Results are best obtained on a nearly idle Pi (my most common victim just runs BIND9 and DHCPD, the second most common just sshd and outgoing Ansible). It may take up to 3 days for the errors to trigger, but I've seen them crater as soon as 8 hours after the previous reboot. Also, thank you for taking a look and being so quick!
Ah. I can reproduce this on regular Raspbian with a Pi Zero pretending to be a CDC ACM device in about 3 minutes if I speed up the loop. Crucially, after a certain amount of time it looks like the dequeue pointer isn't being updated, so each time the enqueue pointer wraps around to the dequeue pointer position, the ring undergoes an expansion. Now to see if it's a Linux bug or something to do with our xhci controller...
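A toy model of that failure: the driver queues one TRB per transfer, and each completion normally moves the dequeue pointer forward. Once the dequeue pointer stops moving, the enqueue pointer eventually catches up with it and the ring has to grow. The usable-slot count (2 segments × 255, one Link TRB per segment) and the doubling policy are assumptions, and `simulate` is a hypothetical helper, not driver code.

```python
from typing import Optional, Tuple


def simulate(transfers: int, stuck_after: Optional[int],
             initial_slots: int = 510) -> Tuple[int, int]:
    """Return (final capacity, expansions) for a ring driven by one TRB per transfer.

    stuck_after: transfer index after which completions stop advancing the
    dequeue pointer; None means the dequeue pointer always keeps up.
    """
    capacity = initial_slots
    in_flight = 0   # TRBs between the dequeue and enqueue pointers
    expansions = 0
    for n in range(transfers):
        if in_flight == capacity:  # enqueue has wrapped round to dequeue
            capacity *= 2          # assumed doubling expansion
            expansions += 1
        in_flight += 1             # driver queues one TRB
        if stuck_after is None or n < stuck_after:
            in_flight -= 1         # completion moves the dequeue pointer on
    return capacity, expansions
```

With a healthy dequeue pointer the ring never grows. Once it sticks, each doubling takes as many transfers as the ring already holds, so in this model the ring size stays within a factor of two of the number of transfers queued since the pointer stuck: a slow creep on a lightly used endpoint rather than an instant blow-up.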
The issue is specific to the VL805 and independent of platform (it fails in the same manner on a PC). When the first Link TRB in the transfer ring for the CDC interrupt IN endpoint is encountered, the dequeue pointer in the endpoint context ends up pointing to it forever.
@pelwell this issue is not specific to 64-bit; can it be moved to the linux repo, please? For some reason I don't have the option to move it.
I would if I could; I haven't got past pleb status on this repo.
The issue is quickly reproducible with the test case because of a coincidence of factors.
Naively avoiding the Link TRB by double-incrementing the TR dequeue pointer if the driver would otherwise stop on it does fix the interrupt endpoint, but breaks the bulk IN endpoint in some other manner.
The VL805 controller can't cope with the TR Dequeue Pointer for an endpoint being set to a Link TRB. The hardware-maintained endpoint context ends up stuck at the address of the Link TRB, leading to erroneous ring expansion events whenever the enqueue pointer wraps to the dequeue position. If the search for the end of the current TD and ring cycle state lands on a Link TRB, move to the next segment. See: raspberrypi#3919 Signed-off-by: Jonathan Bell <[email protected]>
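The rule in the commit message can be sketched on a toy ring. Everything here is hypothetical (`Trb`, `make_ring`, `avoid_link_dequeue`), and the actual patch is not reproduced in this thread; the sketch only shows the idea. If the proposed new TR Dequeue Pointer lands on a Link TRB, use the first TRB of the next segment instead, and flip the cycle state if that Link TRB has its Toggle Cycle flag set (in a ring built like this one, only the last segment's link does).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trb:
    is_link: bool = False
    toggle_cycle: bool = False  # Link TRB "Toggle Cycle" flag


def make_ring(num_segs: int, trbs_per_seg: int) -> List[List[Trb]]:
    """Build a toy ring: every segment ends in a Link TRB; the last one toggles the cycle."""
    ring = []
    for s in range(num_segs):
        seg = [Trb() for _ in range(trbs_per_seg - 1)]
        seg.append(Trb(is_link=True, toggle_cycle=(s == num_segs - 1)))
        ring.append(seg)
    return ring


def avoid_link_dequeue(ring: List[List[Trb]], seg: int, idx: int,
                       cycle: int) -> Tuple[int, int, int]:
    """Never return a dequeue position that points at a Link TRB.

    If (seg, idx) is a Link TRB, follow it to the first TRB of the next
    segment, toggling the cycle state when the link says to.
    """
    trb = ring[seg][idx]
    if trb.is_link:
        if trb.toggle_cycle:
            cycle ^= 1
        seg = (seg + 1) % len(ring)  # toy model: links point to the next segment
        idx = 0
    return seg, idx, cycle
```

Following the link's target and honouring its Toggle Cycle flag keeps the new dequeue position and cycle state consistent with what the controller will see when it reaches that TRB.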
rpi-update firmware has a potential fix for this issue. Please test and report back.
After eleven hours of testing with the new rpi-update firmware, I am happy to report that, as of Tue 27 Oct 2020 06:37:00 PM CDT, everything seems to be working correctly. Thank you very much!
The VL805 controller can't cope with the TR Dequeue Pointer for an endpoint being set to a Link TRB. The hardware-maintained endpoint context ends up stuck at the address of the Link TRB, leading to erroneous ring expansion events whenever the enqueue pointer wraps to the dequeue position. If the search for the end of the current TD and ring cycle state lands on a Link TRB, move to the next segment. Link: #3919 [6.5.y Fixup - move downstream quirk bits further along] Signed-off-by: Jonathan Bell <[email protected]>
Hello!
I've got a small network of Raspberry Pi 4 4GB models, all running the 2020-08-20 64-bit beta "Lite" image and doing various jobs, from being a Kubernetes cluster to supporting the cluster and a small private network attached to it. Since I started running the 64-bit image, I've noticed that my Pi4s that are -not- doing a lot (specifically my sandbox, the Pi running DNS/DHCP, and the repository mirror and Docker registry Pi) will occasionally go off the rails and produce this:
Those messages repeat many times per second until I reboot the affected Pi.
A reboot clears up the issue for anywhere from 8 to 36 hours (apart from the hundreds of megabytes of logs left behind if I haven't paid attention in a while). Is this a known issue?
Of all of the Pis in the network and cluster, the three Pis that do this with regularity are the ones that are the most idle. The database server, the NFS server, and Kubernetes masters and workers are constantly busy, and very rarely do this (and haven't in a month or more since I stood Kubernetes up). It's the only difference between the Pis that I can find, as they are all nearly identical (differing only in SSD size).
All Pis have up-to-date software and firmware at this time, as I boot all of them directly from SSD.
Swapping the Pis around (a Kubernetes worker Pi for my sandbox Pi, for instance) eventually results in the new sandbox Pi exhibiting the same behavior.
The Pis ran the 32-bit Raspbian image doing the same things without issue.
Would any of you have an idea of what's going on here?
Hardware and kernel information:
Raspberry Pi 4B, 4GB
I can provide as much information as you require, just let me know what you need.
Thank you!