[Bug] Hub can get stuck while broadcasting #1419
Comments
Thanks for raising this. This is the expected behavior (as in, not a bug), but I can see this being not ideal. We could perhaps improve this with documentation and a clear example.
A fair analogy is screaming two different things from the same rooftop at the same time 😄 So even if we made the error go away, this probably isn't going to work as you'd like: the receiver wouldn't be guaranteed to receive both. To send two values, it's better to send them together in a list. So if your code wants to send variables A and B from two different tasks, you could make another task that just broadcasts a new list of A and B whenever either of them changes. Generally in Pybricks, we try to raise an error when there is one, instead of staying silent and leaving you confused about why something isn't working.
We couldn't know in advance how the program might run. It's perfectly fine if two tasks both use broadcasting, just not at the same time. |
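The "one task broadcasts a combined list whenever A or B changes" suggestion can be sketched as follows. This is a minimal illustration that runs on a desktop Python, not tested hub code: `changed_payload` and the `sent` list are hypothetical names, and the `sent.append(...)` line stands in for the real `await hub.ble.broadcast(payload)` call.

```python
def changed_payload(last_sent, a, b):
    """Return [a, b] if it differs from what was last sent, else None."""
    payload = [a, b]
    if payload == last_sent:
        return None
    return payload


# Desktop demo: `sent` is a stand-in for the radio (hypothetical).
sent = []
last = None
samples = [(False, False), (False, False), (True, False), (True, True), (True, True)]
for a, b in samples:
    payload = changed_payload(last, a, b)
    if payload is not None:
        sent.append(payload)  # on a hub: await hub.ble.broadcast(payload)
        last = payload

print(sent)  # [[False, False], [True, False], [True, True]]
```

On a hub, the loop body would live in a single async task that is the only place calling `broadcast`, with a short `wait` per iteration, while the other tasks only update the variables A and B.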
I see, thanks for the clarification. Now, I'm not sure how this could work, but wouldn't it be possible to make broadcasting always happen asynchronously when using block coding? This way, it would never occur at the same time no matter where in the program the call is used. Feel free to close this issue if you decide that no changes are needed for this :) |
That way, you'd still 'drown' one message in the other, so overall responsiveness is likely not as good compared to intentionally combining the messages as needed depending on your application. |
Let's keep the issue open since it's definitely a good question. We'll want to document this clearly and explain why =) |
In addition to this, is it true that broadcasting and unpacking cannot be done at the same time as well? |
That should be allowed. If you find a reproducible small program we can test, that would be very useful! |
I did find #1454. Maybe you were seeing this too? |
That one is interesting. Did your hub continue to fade blue during and after the "crash"? I've yet to have this happen while connected to a computer, and it's really inconsistent in when it happens: sometimes after a minute, sometimes after 10. Other times it doesn't happen at all while I'm testing with my program. I'm still using three hubs communicating with each other and so far, two of them have randomly "crashed", one more often than the other. The third hub has been fine all the time. The difference in how they handle communication is that the good hub has been doing all of it in a loop in a separate task. The other two have only been unpacking in a loop in a separate task, while broadcasting from the main program whenever needed. I've since moved it all to that separate loop and it seems to have been fine since then on all three hubs. I'll see if I can reproduce it with a small program. |
No, so maybe you're seeing something different.
Thank you! |
I wasn't able to reproduce it with a small program sadly. However, I now know that it's actually not due to broadcasting and unpacking at the same time since one of my hubs did this again today, even though the program has all broadcasting and unpacking in a single task. I now have no clue what can be causing this. |
Just to add to this, same happened today, but long pressing the power button made the light on the hub flash rapidly without stopping. The hub stopped responding to long button presses altogether and the only way to revive the hub was to pull the batteries. |
Which firmware are you using?
from pybricks import version
print(version)
The beta firmware from https://beta.pybricks.com/ should already fix some of this, so it would be good to know which version you used. |
I'm running: ('technichub', '3.4.0b2', 'v1.20.0-23-g6c633a8dd on 2024-02-14') One of my hubs is probably on an older firmware. I read something about bad data in the thread you linked. I've been holding off updating the firmware since it fails 95% of the time ("The hub took too long to respond. Restart the hub and try again."), taking up to 30 minutes until I can get it running. I'll update the third hub now and we will see if the issue comes back. Thanks for letting me know there are fixes for this in the update :) |
All hubs are now updated but the issue still occurred. This time I also had to pull the batteries to get the hub turned off. Here's the program I'm running on the most problematic hub (code is generated through Block Coding).
from pybricks.hubs import TechnicHub
from pybricks.parameters import Axis, Color, Direction, Port, Stop
from pybricks.pupdevices import ColorDistanceSensor, Motor
from pybricks.tools import multitask, run_task, wait
Color.WHITE = Color(0, 0, 100)
Color.BLACK = Color(0, 0, 0)
SensorHub = TechnicHub(top_side=Axis.Z, front_side=Axis.X, broadcast_channel=3, observe_channels=[1, 2])
DirectTrigger = ColorDistanceSensor(Port.A)
DirectTrigger.detectable_colors((Color.RED, Color.NONE))
LateTrigger = ColorDistanceSensor(Port.C)
LateTrigger.detectable_colors((Color.WHITE, Color.BLACK))
Tipping = Motor(Port.B, Direction.COUNTERCLOCKWISE)
DistributorTipp = False
TableReadyForTipp = False
Triggered = False
Tipped = False
TriggeredDistributor = False
async def main1():
    global Triggered
    while True:
        await wait(0)
        while not (await DirectTrigger.color() == Color.RED or await LateTrigger.color() == Color.WHITE):
            await wait(1)
        Triggered = True
        await wait(1000)
        Triggered = False
async def main2():
    global DistributorTipp, TriggeredDistributor, TableReadyForTipp
    while True:
        await wait(0)
        DistributorTipp, = SensorHub.ble.observe(2) or [0] * 1
        TriggeredDistributor, TableReadyForTipp = SensorHub.ble.observe(1) or [0] * 2
        await SensorHub.ble.broadcast([Tipped, Triggered])
        await wait(200)
async def main3():
    global Tipped
    while True:
        await wait(0)
        if DistributorTipp == True and TableReadyForTipp == True:
            await Tipping.run_angle(100, 100, Stop.BRAKE)
            await wait(1000)
            await Tipping.run_angle(100, -100, Stop.BRAKE)
            Tipped = True
            while not (DistributorTipp == False and TableReadyForTipp == False):
                await wait(1)
            Tipped = False
        else:
            pass
        await wait(500)
async def main():
    await multitask(main1(), main2(), main3())
run_task(main()) |
The documentation issue here has been addressed via pybricks/pybricks-api@b2b183e. What remains here then is the issue of the hub getting stuck. I haven't been able to reproduce this yet. |
Great! I attended a lego-event this weekend and had my machine running for a day straight. The stuck hub issue happened about once every 30 minutes on one of the three hubs at random. Sometimes having to pull the batteries and sometimes not. I still don't know what could be causing this. |
Just to update here. Using v3.5.0b1 (Pybricks Beta v2.5.0-beta.2) with the latest firmware still causes these crashes. There is a difference though, when it happens, I always have to pull the batteries in comparison to before where that was the odd case. |
Is it still this program that causes it for you? I'd like to make some time to properly investigate this one. As a first step, I'd like to try to reproduce it. Do you think we can make something with a hub and just a few motors, without replicating your whole build? Are you only transmitting boolean values? What does your other program look like? Or can the crash be reproduced by just running this one? Thanks! |
Tried the program you referred to on a Technic hub with one ColorDistanceSensor (I have only one ColorDistanceSensor) and a ColorSensor. The program ran for over two hours without a problem. Two hubs run a transmitter. One on a Prime hub:
from pybricks.hubs import PrimeHub
from pybricks.tools import wait
from urandom import choice
transmitter = PrimeHub(broadcast_channel=2, observe_channels=[1, 3])
while True:
    transmitter.ble.broadcast([choice([True, False])])
    TriggeredDistributor, TableReadyForTipp = transmitter.ble.observe(3) or [0] * 2
    wait(100)
And one on a Technic hub:
from pybricks.hubs import TechnicHub
from pybricks.tools import wait
from urandom import choice
transmitter = TechnicHub(broadcast_channel=1, observe_channels=[2, 3])
while True:
    transmitter.ble.broadcast([choice([True, False]), choice([True, False])])
    TriggeredDistributor, TableReadyForTipp = transmitter.ble.observe(3) or [0] * 2
    wait(100)
But as stated, no problem seen yet. Bert |
Updated status above. |
I don't think we could realistically do this while keeping the program running without maintaining all sorts of extra state information. So maybe the best we can do is to raise an exception to trigger a program stop and skip Bluetooth de-init when this happens. One problem is that it can occur in several tasks. For one task, it could look like this:
diff --git a/lib/pbio/drv/bluetooth/bluetooth_stm32_cc2640.c b/lib/pbio/drv/bluetooth/bluetooth_stm32_cc2640.c
index b5771b0a7..d1c8301fd 100644
--- a/lib/pbio/drv/bluetooth/bluetooth_stm32_cc2640.c
+++ b/lib/pbio/drv/bluetooth/bluetooth_stm32_cc2640.c
@@ -1067,6 +1067,8 @@ void pbdrv_bluetooth_peripheral_disconnect(void) {
static PT_THREAD(broadcast_task(struct pt *pt, pbio_task_t *task)) {
pbdrv_bluetooth_value_t *value = task->context;
+ static struct etimer broadcast_timeout;
+
PT_BEGIN(pt);
if (value->size > B_MAX_ADV_LEN) {
@@ -1081,7 +1083,21 @@ static PT_THREAD(broadcast_task(struct pt *pt, pbio_task_t *task)) {
// not the command status).
PT_WAIT_WHILE(pt, write_xfer_size);
HCI_LE_setAdvertisingData(value->size, value->data);
- PT_WAIT_UNTIL(pt, hci_command_complete);
+
+ // In rare cases while observing, setting the advertising data never completes
+ // and the task hangs. We cannot currently recover from this state, so we
+ // turn off Bluetooth and raise an exception on the current task in order
+ // to end the user program.
+ PROCESS_CONTEXT_BEGIN(&pbdrv_bluetooth_spi_process);
+ etimer_set(&broadcast_timeout, 2000);
+ PROCESS_CONTEXT_END(&pbdrv_bluetooth_spi_process);
+ PT_WAIT_UNTIL(pt, hci_command_complete || etimer_expired(&broadcast_timeout));
+ if (etimer_expired(&broadcast_timeout)) {
+ bluetooth_ready = false;
+ pbdrv_bluetooth_power_on(false);
+ task->status = PBIO_ERROR_IO;
+ PT_EXIT(pt);
+ }
if (!is_broadcasting) {
PT_WAIT_WHILE(pt, write_xfer_size);
diff --git a/pybricks/common/pb_type_ble.c b/pybricks/common/pb_type_ble.c
index 8114a1e8d..36ab73e40 100644
--- a/pybricks/common/pb_type_ble.c
+++ b/pybricks/common/pb_type_ble.c
@@ -559,6 +559,11 @@ mp_obj_t pb_type_BLE_new(mp_obj_t broadcast_channel_in, mp_obj_t observe_channel
}
void pb_type_BLE_cleanup(void) {
+
+ if (!pbdrv_bluetooth_is_ready()) {
+ return;
+ }
+
static pbio_task_t stop_broadcasting_task;
static pbio_task_t stop_observing_task;
pbdrv_bluetooth_stop_broadcasting(&stop_broadcasting_task);
diff --git a/pybricks/iodevices/pb_type_iodevices_lwp3device.c b/pybricks/iodevices/pb_type_iodevices_lwp3device.c
index 871d243a5..cfa852fc3 100644
--- a/pybricks/iodevices/pb_type_iodevices_lwp3device.c
+++ b/pybricks/iodevices/pb_type_iodevices_lwp3device.c
@@ -321,6 +321,11 @@ STATIC void pb_lwp3device_configure_remote(void) {
}
void pb_type_Remote_cleanup(void) {
+
+ if (!pbdrv_bluetooth_is_ready()) {
+ return;
+ }
+
pbdrv_bluetooth_peripheral_disconnect();
while (pbdrv_bluetooth_is_connected(PBDRV_BLUETOOTH_CONNECTION_PERIPHERAL)) {
|
One problem is that we'd need this timeout potentially for every wait in the start/stop broadcast/observe tasks, and potentially others like write tasks if the user also uses the remote light etc. So maybe it is more practical to just fix forced hub shutdown instead. |
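The core idea of the patch above is to replace an unbounded wait with "wait until the command completes or a timer expires", and to report an I/O error on timeout. A language-neutral sketch of that pattern in Python is shown here. It is not the firmware code: `wait_until`, `FakeClock`, and the 100 ms tick are hypothetical names and values chosen only so the example can run on a desktop. Only the 2000 ms timeout comes from the diff.

```python
def wait_until(condition, timeout_ms, now_ms, poll):
    """Poll until condition() is true or timeout_ms has elapsed.

    Returns True if the condition was met, False on timeout.
    """
    deadline = now_ms() + timeout_ms
    while not condition():
        if now_ms() >= deadline:
            return False
        poll()  # let other work run; here it just advances the fake clock
    return True


class FakeClock:
    """Simulated millisecond clock so the demo needs no real delays."""

    def __init__(self):
        self.t = 0

    def now(self):
        return self.t

    def tick(self):
        self.t += 100


# Case 1: the command never completes, so the wait gives up after 2000 ms.
clock = FakeClock()
completed = wait_until(lambda: False, 2000, clock.now, clock.tick)
status = "ok" if completed else "io_error"

# Case 2: the command completes after 300 ms, well within the timeout.
clock2 = FakeClock()
completed2 = wait_until(lambda: clock2.t >= 300, 2000, clock2.now, clock2.tick)
```

As noted in the comment above, the cost of this approach is that every wait in the start/stop broadcast/observe tasks would need the same treatment, which is why fixing forced shutdown may be the more practical route.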
@dlech - why does the Would I suppose the same question applies to EDIT: I replaced these three wait loops by a single loop with only |
Would you be interested in a program that stalls in seconds? Laurens' program plus BT stop broadcasting after each broadcast:
"""Do a reset after each broadcast
and see if this runs longer or not
"""
from pybricks.hubs import TechnicHub
# from pybricks.parameters import Axis
from pybricks.parameters import Color
from urandom import choice
# from pybricks.tools import wait, StopWatch
from pybricks import version
print(version)
# Set up all devices.
HUB = TechnicHub(broadcast_channel=1, observe_channels=[2, 3])
print('\nThis is HUB "' + str(HUB.system.name()) + '" as a ' + version[0])
print('\tBluetooth chip version "' + str(HUB.ble.version()) + '"')
print("\tbattery voltage:\t", HUB.battery.voltage(), "mV")
# Initialize variables.
# watch = StopWatch()
bool2 = 0
# The main program starts here.
while True:
    bool2 = choice([True, False])
    HUB.light.on(Color(120, 100, 50))  # green
    HUB.ble.broadcast([bool2, choice(['abcdefghijkl', 'ghijklabcdef'])])  # same data size and type all the time
    # stop broadcasting after every broadcast
    HUB.light.on(Color.RED)
    HUB.ble.broadcast(None)
    HUB.light.on(Color.MAGENTA)
[EDIT] updated BT reset to BT stop broadcasting |
Thank you. By the way, |
@BertLindeman @JJHackimoto - We've added some updates to make sure that the hub can at least shut down when this happens, by pressing the button for three seconds. No need to pull any batteries. pybricks/pybricks-micropython#245 The CI is getting stuck for some reason, so here is the prebuilt firmware with the shutdown fix: |
Test above in this post with "stop broadcast" after each broadcast, stops (in RED this first test) and "hold button" nicely stops the hub. Will run other tests, that will take more time.. |
Build links in pybricks/pybricks-micropython#245 are fixed, so firmware for all hubs is ready for testing. Also a bonus feature we got as part of this cleanup: You can now call |
Sounds reasonable for the way we currently use these functions. |
Tests running on build 3360. |
Although it would make the stdio flush ioctl non-interruptible, which could be problematic if someone tries to flush a really big buffer. |
Thanks for the quick review of the PR! I'll update it based on your notes.
I've only changed it for the deinit - the mphal flush used in programs still uses the event poll hook. Is that what you mean? |
I thought you meant you changed the implementation of |
Can confirm this still happens using the latest beta. No need to pull the batteries anymore though :) |
Yeah, we haven't been able to find a fix, so just patched the shutdown issue. |
Spikehub hangs with fast blinking button doing no broadcasts. The hub definition only sets the channel: If the broadcast_channel is not set, the problem does not occur. The program is a simple combination of display animation and button press. The complete program:
# Hub hangs after printing 1 2 3 1 2 3
# central button press makes the hub do a rapid blue blink
from pybricks.hubs import InventorHub
from pybricks.parameters import Button, Icon
from pybricks.tools import wait
hub = InventorHub(broadcast_channel=1)
# hub = InventorHub()
animation = [Icon.EYE_LEFT_BLINK * i / 100 for i in [0, 100, 100, 0]]
while True:
    print(1)
    hub.display.animate(animation, 10)
    print(2)
    pressed = hub.buttons.pressed()
    print(3)
    if any(pressed):
        pass
    wait(100)
Not sure if this issue is the correct issue. |
Could you be seeing #1295? When pressing a button in your program, there is no wait in the loop so it's restarting the animation way faster than it can run, which can lead to a lockup. |
Could be, Laurens, but . . . What makes a difference is changing the hub definition from hub = InventorHub(broadcast_channel=1) to hub = InventorHub() In the last case no problem. |
@BertLindeman Since #1295 was fixed you may be able to confirm your case above. @JJHackimoto please see #1806 for ideas to add a method to stop and start observing. |
Will test |
Thanks! Looks good already. Now, I know I'm making this complicated for you but would there be a way for me to test this with my blocks code? I guess I could convert it to python and add your fix that way since you probably won't have a block for this available until release right? |
[EDIT] add program names Issue of May 1 solved. Ran this large test program again at 3.5.0 and it did get stuck at RED. The current large program (my name: `issue_1419_large_program.py`):
from pybricks import version
from pybricks.hubs import TechnicHub
from pybricks.parameters import Axis, Color, Direction, Port, Stop
from pybricks.pupdevices import ColorDistanceSensor, Motor, ColorSensor
from pybricks.tools import multitask, run_task, wait
from urandom import choice
Color.WHITE = Color(0, 0, 100)
Color.BLACK = Color(0, 0, 0)
SensorHub = TechnicHub(top_side=Axis.Z, front_side=Axis.X, broadcast_channel=3, observe_channels=[1, 2])
DirectTrigger = ColorDistanceSensor(Port.A)
DirectTrigger.detectable_colors((Color.RED, Color.NONE))
# LateTrigger = ColorDistanceSensor(Port.C)
# LateTrigger.detectable_colors((Color.WHITE, Color.BLACK))
LateTrigger = ColorSensor(Port.C)  # I have only ONE colorDistanceSensor so a ColorSensor
LateTrigger.detectable_colors((Color.WHITE, Color.BLACK))
Tipping = Motor(Port.B, Direction.COUNTERCLOCKWISE)
DistributorTipp = False
TableReadyForTipp = False
Triggered = False
Tipped = False
TriggeredDistributor = False
async def main1():
    global Triggered
    while True:
        await wait(0)
        while not (await DirectTrigger.color() == Color.RED or await LateTrigger.color() == Color.WHITE):
            await wait(1)
            # print(end="1")
        Triggered = True
        await wait(1000)
        Triggered = False
async def main2():
    global DistributorTipp, TriggeredDistributor, TableReadyForTipp
    while True:
        await wait(0)
        DistributorTipp, = SensorHub.ble.observe(2) or [0] * 1
        SensorHub.light.on(Color(180, 100, 50))  # Cyan
        TriggeredDistributor, TableReadyForTipp = SensorHub.ble.observe(1) or [0] * 2
        SensorHub.light.on(Color(0, 100, 50))  # red
        await SensorHub.ble.broadcast([Tipped, Triggered])
        SensorHub.light.on(Color(60, 100, 50))  # Yellow
        # SensorHub.light.on(Color.NONE)
        await wait(200)
        SensorHub.light.on(Color(120, 100, 50))  # green
async def main3():
    global Tipped
    while True:
        await wait(0)
        # print(DistributorTipp, "d", TableReadyForTipp, "t", end="")
        if DistributorTipp == True and TableReadyForTipp == True:
            await Tipping.run_angle(100, 100, Stop.BRAKE)
            await wait(1000)
            await Tipping.run_angle(100, -100, Stop.BRAKE)
            Tipped = True
            while not (DistributorTipp == False and TableReadyForTipp == False):
                await wait(1)
            Tipped = False
        else:
            pass
        await wait(500)
async def main():
    await multitask(main1(), main2(), main3())
print(version)
run_task(main())
I did not expect this test to get stuck, so I did not note the start time. The Technic hub was running without a PC connection, so a Ctrl-C could not be used to see where it is stuck. Bad testing amateur here. A short button press does nothing. Will try to do the test connected to the PC, with well-charged batteries in first. Next test started at 23:10 on fresh batteries. [EDIT] another technic hub in this program
|
Status at 02:30 all test running. Status 08:10 Both Technic hubs stuck. Hub-A running the large test (see previous post, my name In both cases a CTRL-C on the log window did nothing, So had to use the long button press to power-off. [EDIT] repaired typo in program name
from pybricks.hubs import PrimeHub
from pybricks.tools import wait
from urandom import choice
transmitter = PrimeHub(broadcast_channel=2, observe_channels=[1, 3])
while True:
    transmitter.ble.broadcast([choice([True, False])])
    TriggeredDistributor, TableReadyForTipp = transmitter.ble.observe(3) or [0] * 2
    print(TriggeredDistributor, TableReadyForTipp, end="\t")
    wait(0) |
Thanks for all your testing! But I was referring to your (then most-recent) program above:
# Hub hangs after printing 1 2 3 1 2 3
# central button press makes the hub do a rapid blue blink
from pybricks.hubs import InventorHub
from pybricks.parameters import Button, Icon
from pybricks.tools import wait
hub = InventorHub(broadcast_channel=1)
# hub = InventorHub()
animation = [Icon.EYE_LEFT_BLINK * i / 100 for i in [0, 100, 100, 0]]
while True:
    print(1)
    hub.display.animate(animation, 10)
    print(2)
    pressed = hub.buttons.pressed()
    print(3)
    if any(pressed):
        pass
    wait(100)
I believe this is fixed by #1295 and, as indicated, is probably unrelated to broadcasting. The hanging issue with combined observing/broadcasting has not been changed, and likely can't be fixed due to the inherent limitations of the Bluetooth chip. Instead, an alternative solution (workaround) was proposed in pybricks/pybricks-micropython#269 |
You can test it with block code. I will make an example and post it in |
I propose that we close this issue as unfixable. With a practical workaround introduced in #1806 Any objections? |
(Now quoting you loosely) Thank you for getting me on my toes 😃 Can confirm this test on
So you are right, Laurens |
Describe the bug
Block Coding: When using Broadcast in two or more tasks running simultaneously, an error will be thrown and the program will terminate. Error below:
"OSError: This resource cannot be used in two tasks at once."
Expected behavior
I expected the block coding to give me an error or prevent me from building the program this way before running the program. This is especially true since block coders may not be that experienced in coding, and may not understand the error thrown in the console. The error won't be seen if running the program off the hub without a bluetooth connection either.
I could also expect this to just work since I personally can't see the reason for this not working. It currently feels unintuitive that the only way to use Broadcasting is to have a loop that constantly sends out the chosen values instead of just sending an update when the values have actually changed (for example when just sending out Booleans once in a while).
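To illustrate why two simultaneous broadcasts produce the error quoted above, here is a small desktop simulation using standard `asyncio`. It assumes a simple "busy flag" guard, which is only a guess at the general mechanism and not the actual Pybricks implementation. `SingleUserResource` and `use` are hypothetical names, and the error text is copied from the report.

```python
import asyncio


class SingleUserResource:
    """A resource that only one task may use at a time (hypothetical model)."""

    def __init__(self):
        self._busy = False

    async def use(self, duration):
        if self._busy:
            # Fail loudly instead of silently dropping one of the requests.
            raise OSError("This resource cannot be used in two tasks at once.")
        self._busy = True
        try:
            await asyncio.sleep(duration)  # stands in for the radio operation
        finally:
            self._busy = False


async def concurrent_demo():
    radio = SingleUserResource()
    # Two tasks try to use the radio at the same time: the second one fails.
    return await asyncio.gather(radio.use(0.05), radio.use(0.05), return_exceptions=True)


async def sequential_demo():
    radio = SingleUserResource()
    # The same two uses, one after the other, both succeed.
    await radio.use(0.01)
    await radio.use(0.01)
    return True


results = asyncio.run(concurrent_demo())
sequential_ok = asyncio.run(sequential_demo())
```

In this model the first call succeeds and the second raises, which matches the advice in the thread: two tasks may both broadcast, just not at the same moment.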