Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hard fault on boot w/ GCC 14 on Cortex-M #80542

Closed
fabiobaltieri opened this issue Oct 28, 2024 Discussed in #74379 · 10 comments · Fixed by #80579
Closed

Hard fault on boot w/ GCC 14 on Cortex-M #80542

fabiobaltieri opened this issue Oct 28, 2024 Discussed in #74379 · 10 comments · Fixed by #80579
Assignees
Labels
area: ARM ARM (32-bit) Architecture bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug

Comments

@fabiobaltieri
Copy link
Member

Discussed in #74379

Originally posted by agesome June 17, 2024
Hi. When building Zephyr with GCC 14.1.0 (arch linux package), it seems that on startup, z_thread_entry is called with entry = NULL, when normally it would = _main, which leads to a fault.

entry(p1, p2, p3);

If we break in arch_switch_to_main_thread, it looks like this, _main (which is passed to z_thread_entry as entry) is OK:

#0  arch_switch_to_main_thread (main_thread=main_thread@entry=0x20020a50 <z_main_thread>, 
    stack_ptr=stack_ptr@entry=0x20033440 "\356\242^i\205\200\346", <incomplete sequence \333>..., 
    _main=_main@entry=0x1001e47d <bg_thread_main>) at /workspace/zephyr-workspace/zephyr/arch/arm/core/cortex_m/thread.c:575
#1  0x1001e63c in switch_to_main_thread (stack_ptr=0x20033440 "\356\242^i\205\200\346", <incomplete sequence \333>...)
    at /workspace/zephyr-workspace/zephyr/kernel/init.c:585
#2  z_cstart () at /workspace/zephyr-workspace/zephyr/kernel/init.c:684
#3  0x10002390 in z_prep_c () at /workspace/zephyr-workspace/zephyr/arch/arm/core/cortex_m/prep_c.c:196
#4  0x100021f4 in z_arm_reset () at /workspace/zephyr-workspace/zephyr/arch/arm/core/cortex_m/reset.S:169

but if we then break in z_thread_entry, it looks as if arch_switch_to_main_thread had _main = 0

#0  z_thread_entry (entry=0x0, p1=0x0, p2=0x0, p3=0x0) at /workspace/zephyr-workspace/zephyr/lib/os/thread_entry.c:48
#1  0x100023d6 in arch_switch_to_main_thread (main_thread=0x0, stack_ptr=0x0, _main=0x0)
    at /workspace/zephyr-workspace/zephyr/arch/arm/core/cortex_m/thread.c:575
#2  0xc5d30d16 in ?? ()

After some experimenting, I fixed this by commenting

CODE_UNREACHABLE;

With GCC 12 included in Zephyr SDK, this problem does not appear.

So at this point I'm not sure if it's some incompatible change 12->14, or a bug in 14. Any advice?

I also made a comparison of assembly on GCC 14 with/without CODE_UNREACHABLE and GCC 12: https://gist.github.com/agesome/bfc1e15b561df353fd5f340c0413e348

@fabiobaltieri fabiobaltieri added the area: ARM ARM (32-bit) Architecture label Oct 28, 2024
@fabiobaltieri
Copy link
Member Author

cc @stephanosio in case you know something about this already (ARM + GCC 14)

@fabiobaltieri
Copy link
Member Author

@ithinuel could you look into it? We bumped into the issue as well and also verified that removing that line is an effective workaround

@stephanosio stephanosio self-assigned this Oct 29, 2024
@ithinuel
Copy link
Collaborator

ithinuel commented Oct 29, 2024

As far as I can tell from reading the gist, It looks like CODE_UNREACHABLE correctly makes the function do nothing (not even return) after the asm volatile block in gcc14, while it still returned in gcc 12 (with pop {r4, r5, r6, pc}) even if there’s noway to comeback to this point.

The difference that concerns me though is that in GCC 14, it does not set _current to main_thread as expected from line 525.
What shows as

   0x10001eb4 <+36>:    asrs    r4, r6, #9
   0x10001eb6 <+38>:    movs    r0, #0

in the first listing is actually the address of _current: 0x20001274.
Could the compiler fail to see that this is still needed and erroneously optimises this out?

@stephanosio stephanosio changed the title Hard fault w/ GCC 14 on RP2040 Hard fault w/ GCC 14 on Cortex-M Oct 29, 2024
@stephanosio stephanosio changed the title Hard fault w/ GCC 14 on Cortex-M Hard fault on boot w/ GCC 14 on Cortex-M Oct 29, 2024
@stephanosio
Copy link
Member

Note that this is also reproducible with GCC 14 on mps2/an385 and other Cortex-M platforms.

@ithinuel
Copy link
Collaborator

It was pointed to me that _main should be passed to the assembly block as r rather than +r and it may be confusing the compiler. Maybe forcing line 525 to be volatile to prevent compiler optimisation.

I'm afraid this might be a bit too cutting edge for the system I'm working from (NixOS). developer.arm.com only has 13.3.

If you know how to get gcc14 for arm-none-eabi available in NixOS please let me know.

@fabiobaltieri
Copy link
Member Author

fabiobaltieri commented Oct 29, 2024

@ithinuel I have a setup to test, did this change

index fa500032d3c..00d31c05ee3 100644
--- a/arch/arm/core/cortex_m/thread.c
+++ b/arch/arm/core/cortex_m/thread.c
@@ -559,6 +559,8 @@ void arch_switch_to_main_thread(struct k_thread *main_thread, char *stack_ptr,
 #endif
 #endif /* CONFIG_BUILTIN_STACK_GUARD */
 
+       volatile k_thread_entry_t __main = _main;
+
        /*
         * Set PSP to the highest address of the main stack
         * before enabling interrupts and jumping to main.
@@ -586,7 +588,7 @@ void arch_switch_to_main_thread(struct k_thread *main_thread, char *stack_ptr,
        "mov   r3, #0\n"
        "ldr   r4, =z_thread_entry\n"
        "bx    r4\n"            /* We don’t intend to return, so there is no need to link. */
-       : "+r" (_main)
+       : "+r" (__main)
        : "r" (stack_ptr)
        : "r0", "r1", "r2", "r3", "r4", "ip", "lr");

and it worked, though I won't try to pretend I understand why :-) does it make any sense?

@fabiobaltieri
Copy link
Member Author

index fa500032d3c..695e216a43c 100644
--- a/arch/arm/core/cortex_m/thread.c
+++ b/arch/arm/core/cortex_m/thread.c
@@ -588,7 +588,7 @@ void arch_switch_to_main_thread(struct k_thread *main_thread, char *stack_ptr,
        "bx    r4\n"            /* We don’t intend to return, so there is no need to link. */
        : "+r" (_main)
        : "r" (stack_ptr)
-       : "r0", "r1", "r2", "r3", "r4", "ip", "lr");
+       : "r0", "r1", "r2", "r3", "r4", "ip", "lr", "memory");
 
        CODE_UNREACHABLE;
 }

this alwo works (suggested by @ithinuel over discord)

@fabiobaltieri
Copy link
Member Author

index fa500032d3c..6cd7144e79d 100644
--- a/arch/arm/core/cortex_m/thread.c
+++ b/arch/arm/core/cortex_m/thread.c
@@ -586,9 +586,9 @@ void arch_switch_to_main_thread(struct k_thread *main_thread, char *stack_ptr,
        "mov   r3, #0\n"
        "ldr   r4, =z_thread_entry\n"
        "bx    r4\n"            /* We don’t intend to return, so there is no need to link. */
-       : "+r" (_main)
-       : "r" (stack_ptr)
-       : "r0", "r1", "r2", "r3", "r4", "ip", "lr");
+       :
+       : "r" (_main), "r" (stack_ptr)
+       : "r0", "r1", "r2", "r3", "r4", "ip", "lr", "memory");
 
        CODE_UNREACHABLE;
 }

this also works

@ithinuel
Copy link
Collaborator

Seems to be the best solution.

@fabiobaltieri
Copy link
Member Author

Alrighty, thanks @ithinuel, #80579 out for review.

@fabiobaltieri fabiobaltieri added bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug labels Oct 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: ARM ARM (32-bit) Architecture bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants