Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

target/riscv: simplify reset for rtos harts #821

Merged
merged 1 commit into from
Apr 6, 2023

Conversation

en-sc
Copy link
Collaborator

@en-sc en-sc commented Mar 22, 2023

Since the deletion of -rtos hwthread, there is no need to treat harts with -rtos specified differently on reset.

Change-Id: I88a9129936b5172bb7479dfa1255e29ff460c054

@en-sc en-sc force-pushed the en-sc/fix-reset-mharts branch from 018ab71 to 9cecda8 Compare March 22, 2023 16:51
@en-sc en-sc changed the title target/riscv: fix reset for rtos harts target/riscv: simplify reset for rtos harts Mar 22, 2023
@en-sc
Copy link
Collaborator Author

en-sc commented Mar 22, 2023

Prior to the change, DM_DMCONTROL_HALTREQ was written twice to dmstatus on reset halt (lines 2487 and 2499), which was causing instability on some HW implementations

Copy link
Collaborator

@timsifive timsifive left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding more error checking, too.

What hardware have you tested this with? I'm especially thinking of multi-hart systems. The tests against spike don't really cover this code, and it's a bit of a house of cards.

src/target/riscv/riscv-013.c Outdated Show resolved Hide resolved
src/target/riscv/riscv-013.c Outdated Show resolved Hide resolved
src/target/riscv/riscv-013.c Outdated Show resolved Hide resolved
src/target/riscv/riscv-013.c Outdated Show resolved Hide resolved
@timsifive
Copy link
Collaborator

This seems to work fine on HiFive Unleashed.

@en-sc en-sc force-pushed the en-sc/fix-reset-mharts branch 3 times, most recently from 4b26eac to 3e9a08d Compare March 23, 2023 18:02
@en-sc
Copy link
Collaborator Author

en-sc commented Mar 28, 2023

Thanks for adding more error checking, too.

What hardware have you tested this with? I'm especially thinking of multi-hart systems. The tests against spike don't really cover this code, and it's a bit of a house of cards.

We also tested on HiFive Unleashed as well as on Syntacore SCR7 SMP (on which the bug was discovered in the first place)

@en-sc en-sc requested a review from timsifive March 28, 2023 17:30
timsifive
timsifive previously approved these changes Mar 29, 2023
Copy link
Collaborator

@JanMatCodasip JanMatCodasip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@en-sc Thank you for the changes. I have checked the code visually and it looks all right. I have only few minor review findings.


uint32_t dmstatus;
int dmi_busy_delay = info->dmi_busy_delay;
const int dmi_busy_delay = info->dmi_busy_delay;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For readability, could we please call this this orig_dmi_busy_delay?

Suggested change
const int dmi_busy_delay = info->dmi_busy_delay;
const int orig_dmi_busy_delay = info->dmi_busy_delay;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

"Increase the timeout with riscv set_reset_timeout_sec.",
riscv_reset_timeout_sec, dmstatus,
get_field(dmstatus, DM_DMSTATUS_ALLUNAVAIL) ? "true" : "false",
get_field(dmstatus, DM_DMSTATUS_ALLUNAVAIL) ? "true" : "false");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a copy-paste typo:

Suggested change
get_field(dmstatus, DM_DMSTATUS_ALLUNAVAIL) ? "true" : "false");
get_field(dmstatus, DM_DMSTATUS_ALLHAVERESET) ? "true" : "false");

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

Comment on lines 2462 to 2466
uint32_t control = 0, control_haltreq;
control = set_field(control, DM_DMCONTROL_DMACTIVE, 1);
control_haltreq = set_field(control, DM_DMCONTROL_HALTREQ, target->reset_halt ? 1 : 0);
dmi_write(target, DM_DMCONTROL,
result = dmi_write(target, DM_DMCONTROL,
set_dmcontrol_hartsel(control_haltreq, info->index));
Copy link
Collaborator

@JanMatCodasip JanMatCodasip Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For readability, I'd recommend to simplify this part and set always all the fields of dmcontrol:

uint32_t control = 0;
control = set_field(control, DM_DMCONTROL_DMACTIVE, 1);
control = set_field(control, DM_DMCONTROL_HALTREQ, target->reset_halt ? 1 : 0);
control = set_dmcontrol_hartsel(control, info->index);
result = dmi_write(target, DMI_DMCONTROL, control);

Also I'd recommend to use just one variable in this function (control).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

Comment on lines 2514 to 2517
/* Ack reset and clear DM_DMCONTROL_HALTREQ if previously set */
return dmi_write(target, DM_DMCONTROL,
set_dmcontrol_hartsel(control, info->index) |
DM_DMCONTROL_ACKHAVERESET);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as at the top of the function, I'd recommend to set all the fields here to make it very easy to read:

control = 0;
control = set_field(control, DM_DMCONTROL_DMACTIVE, 1);
control = set_field(control, DM_DMCONTROL_ACKHAVERESET, 1);
result = dmi_write(target, DMI_DMCONTROL, control);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

@@ -4459,8 +4424,10 @@ riscv_insn_t riscv013_read_debug_buffer(struct target *target, unsigned index)
int riscv013_invalidate_cached_debug_buffer(struct target *target)
{
dm013_info_t *dm = get_dm(target);
if (!dm)
if (!dm) {
LOG_TARGET_ERROR(target, "No DM is specified for the target");
Copy link
Collaborator

@JanMatCodasip JanMatCodasip Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you please come across a scenario when this error message would pop up?

Since there is no progbuf_cache to invalidate if the target does not have a debug module specified, I'd recommend to demote this to a debug message only.

In another words, if there is no DM assigned, the operation in such case should not be considered as failed - it is simply not applicable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion! I've added the message just in case. Have not observed it yet. I've demoted the message as you recommended.

@en-sc
Copy link
Collaborator Author

en-sc commented Apr 4, 2023

@timsifive, @JanMatCodasip, can you please clarify something for me:
When the reset is asserted, DM_DMCONTROL_NDMRESET is set. This is the case now, as well as it was prior to suggested changes. If I understand correctly, this makes it so system-wide reset (except for DM) is performed. Isn't it necessary to write DM_DMCONTROL_ACKHAVERESET for each hart on the platform after (as well as check that all the harts, not just the one selected, came out of reset)?

Since the deletion of `-rtos hwthread`, there is no need to treat harts
with `-rtos` specified differently on reset.

Signed-off-by: Evgeniy Naydanov <[email protected]>
Change-Id: I88a9129936b5172bb7479dfa1255e29ff460c054
@en-sc en-sc force-pushed the en-sc/fix-reset-mharts branch from e76c3d3 to 1c16824 Compare April 5, 2023 16:15
@en-sc
Copy link
Collaborator Author

en-sc commented Apr 5, 2023

Check Code Style fails, but it is due to rebase (no style issues in src/target/riscv/riscv-013.c are reported)

@timsifive
Copy link
Collaborator

When the reset is asserted, DM_DMCONTROL_NDMRESET is set. This is the case now, as well as it was prior to suggested changes. If I understand correctly, this makes it so system-wide reset (except for DM) is performed. Isn't it necessary to write DM_DMCONTROL_ACKHAVERESET for each hart on the platform after (as well as check that all the harts, not just the one selected, came out of reset)?

It is not required for the target. The reset took effect just as well without writing ackhavereset.

It would be better for OpenOCD to confirm that havereset is set for every hart, and then to ack it so that the bit is clear and OpenOCD will notice if a spontaneous reset occurs.

@timsifive timsifive merged commit 15bb3e2 into riscv-collab:riscv Apr 6, 2023
@Alan-19950616
Copy link

@timsifive @en-sc @JanMatCodasip This commit causes certain cores to fail to boot on riscv smp multicore systems.

Here's how I configured it.

set _CHIPNAME0 riscv0
jtag newtap $_CHIPNAME0 cpu -irlen 5

set _TARGETNAME0 $_CHIPNAME0.cpu
target create $_TARGETNAME0.0 riscv -chain-position $_TARGETNAME0 -coreid 0 -rtos hwthread
target create $_TARGETNAME0.1 riscv -chain-position $_TARGETNAME0 -coreid 1
target create $_TARGETNAME0.2 riscv -chain-position $_TARGETNAME0 -coreid 2
target create $_TARGETNAME0.3 riscv -chain-position $_TARGETNAME0 -coreid 3
target create $_TARGETNAME0.4 riscv -chain-position $_TARGETNAME0 -coreid 4
target create $_TARGETNAME0.5 riscv -chain-position $_TARGETNAME0 -coreid 5
target create $_TARGETNAME0.6 riscv -chain-position $_TARGETNAME0 -coreid 6
target create $_TARGETNAME0.7 riscv -chain-position $_TARGETNAME0 -coreid 7

target smp $_TARGETNAME0.0 $_TARGETNAME0.1 $_TARGETNAME0.2 $_TARGETNAME0.3 $_TARGETNAME0.4 $_TARGETNAME0.5 $_TARGETNAME0.6 $_TARGETNAME0.7

@aap-sc
Copy link
Collaborator

aap-sc commented Oct 20, 2023

@Alan-19950616

This commit causes certain cores to fail to boot on riscv smp multicore systems.

Few questions:

  1. Could you please clarify what do you mean by 'fail to boot" ? Does OpenOCD returns some error? Does OpenOCD targets indicate that they are running? Or is it you have some loader program that fails to be initialized properly?

  2. Which commands do you use to "boot" the system?

  3. Can you attach -d3 openocd log of a session that has this issue?

@Alan-19950616
Copy link

Alan-19950616 commented Oct 23, 2023

The "fail to boot" is mean this ctest failed, and OpenOCD has no error messages.

smphello_8c.zip

#include <stdio.h>
#include "nuclei_sdk_soc.h"

#if !defined(__riscv_atomic)
#error "RVA(atomic) extension is required for SMP"
#endif

#if !defined(SMP_CPU_CNT)
#error "SMP_CPU_CNT macro is not defined, please set SMP_CPU_CNT to integer value > 1"
#endif

typedef struct {
    uint32_t state;
} spinlock;

spinlock lock;
volatile uint32_t lock_ready = 0;
volatile uint32_t cpu_count = 0;
volatile uint32_t finished = 0;

// comment SPINLOCK_CAS to use AMOSWAP as spinlock
#define SPINLOCK_CAS

__STATIC_FORCEINLINE void spinlock_init(spinlock *lock)
{
    lock->state = 0;
}

__STATIC_FORCEINLINE void spinlock_lock(spinlock *lock)
{
    uint32_t old;
    uint32_t backoff = 10;
    do {
#ifndef SPINLOCK_CAS
        // Use amoswap as spinlock
        old = __AMOSWAP_W((&(lock->state)), 1);
#else
        // use lr.w & sc.w to do CAS as spinlock
        old = __CAS_W((&(lock->state)), 0, 1);
#endif
        if (old == 0) {
            break;
        }
        for (volatile int i = 0; i < backoff; i ++) {
            __NOP();
        }
        backoff += 10;
    } while (1);
}

__STATIC_FORCEINLINE void spinlock_unlock(spinlock *lock)
{
    lock->state = 0;
}

int boot_hart_main(unsigned long hartid);
int other_harts_main(unsigned long hartid);
int main(void);

/* Reimplementation of smp_main for multi-harts */
int smp_main(void)
{
    return main();
}

int main(void)
{
    int ret;
    // get hart id in current cluster
    unsigned long hartid = __get_hart_id();
    if (hartid == BOOT_HARTID) { // boot hart
        spinlock_init(&lock);
        lock_ready = 1;
        finished = 0;
        __SMP_RWMB();
        ret = boot_hart_main(hartid);
    } else { // other harts
        // wait for lock initialized
        while (lock_ready == 0);
        ret = other_harts_main(hartid);
    }
    return ret;
}

int boot_hart_main(unsigned long hartid)
{
    volatile unsigned long waitcnt = 0;
    spinlock_lock(&lock);
    printf("Hello world from hart %d\n", hartid);
    cpu_count += 1;
    spinlock_unlock(&lock);
    // wait for all harts boot and print hello
    while (cpu_count < SMP_CPU_CNT) {
        waitcnt ++;
        __NOP();
        // The waitcnt compare value need to be adjust according
        // to cpu frequency
        if (waitcnt >= SystemCoreClock) {
            break;
        }
    }
    if (cpu_count == SMP_CPU_CNT) {
        printf("All harts boot successfully!\n");
        finished = 1;
        return 0;
    } else {
        printf("Some harts boot failed, only %d/%d booted!\n", cpu_count, SMP_CPU_CNT);
        return -1;
    }
}

int other_harts_main(unsigned long hartid)
{
    spinlock_lock(&lock);
    printf("Hello world from hart %d\n", hartid);
    cpu_count += 1;
    spinlock_unlock(&lock);
    // wait for all harts boot and print hello
    while (cpu_count < SMP_CPU_CNT);
    // wait for boot hart to set finished flag
    while (finished == 0);
    return 0;
}

@aap-sc
Copy link
Collaborator

aap-sc commented Oct 23, 2023

@Alan-19950616

The "fail to boot" is mean this ctest failed, and OpenOCD has no error messages.

If openocd has no error messages than likely this may be an application bug (timings of the operations were changed).

  1. Is this real HW or spike?
  2. Could you please double-check the state of targets after "resume" operation completes? This can be done by issuing something like echo "[targets]" or just targets. If all targets are in "running" state - then you'll need to debug the application.

@en-sc
Copy link
Collaborator Author

en-sc commented Oct 24, 2023

The targets are running, you can see this in the last poll in the log.

Here is the snippet regarding riscv.cpu.0:

  • Hart zero is selected:
Debug: 253798 32054 riscv.c:2764 riscv_poll_hart(): [riscv.cpu.0] polling, target->state=1
Debug: 253799 32054 riscv-013.c:388 scan(): 41b w 00000001 @10 -> + 00000000 @00; 4i
Debug: 253800 32054 riscv-013.c:398 scan(): dmcontrol=1 { dmactive=active, ndmreset=0, hartsello=0, clrresethaltreq=0, hasel=single, ackunavail=nop, ackhavereset=nop, hartreset=0, setresethaltreq=0, resumereq=0, haltreq=0, clrkeepalive=0, setkeepalive=0, } ->
Debug: 253801 32054 riscv-013.c:388 scan(): 41b - 00000000 @10 -> + 00000001 @10; 4i
Debug: 253802 32054 riscv-013.c:398 scan(): dmcontrol=0 { dmactive=inactive, ndmreset=0, hartsello=0, clrresethaltreq=0, hasel=single, ackunavail=nop, ackhavereset=nop, hartreset=0, setresethaltreq=0, resumereq=0, haltreq=0, clrkeepalive=0, setkeepalive=0, } -> dmcontrol=1 { dmactive=active, ndmreset=0, hartsello=0, clrresethaltreq=0, hasel=single, ackunavail=nop, ackhavereset=nop, hartreset=0, setresethaltreq=0, resumereq=0, haltreq=0, clrkeepalive=0, setkeepalive=0, }
  • dmstatus is read to get the current state:
Debug: 253803 32054 riscv-013.c:388 scan(): 41b r 00000000 @11 -> + 00000000 @00; 4i
Debug: 253804 32054 riscv-013.c:398 scan(): dmstatus=0 { version=none, anyrunning=0, allrunning=0, anyunavail=0, allunavail=0, anynonexistent=0, allnonexistent=0, anyresumeack=0, allresumeack=0, anyhavereset=0, allhavereset=0, impebreak=0, stickyunavail=current, ndmresetpending=false, confstrptrvalid=invalid, hasresethaltreq=0, authbusy=ready, authenticated=false, anyhalted=0, } ->
Debug: 253805 32055 riscv-013.c:388 scan(): 41b - 00000000 @11 -> + 00430ca2 @11; 4i
Debug: 253806 32055 riscv-013.c:398 scan(): dmstatus=0 { version=none, anyrunning=0, allrunning=0, anyunavail=0, allunavail=0, anynonexistent=0, allnonexistent=0, anyresumeack=0, allresumeack=0, anyhavereset=0, allhavereset=0, impebreak=0, stickyunavail=current, ndmresetpending=false, confstrptrvalid=invalid, hasresethaltreq=0, authbusy=ready, authenticated=false, anyhalted=0, } -> dmstatus=0x430ca2 { version=0.13, anyrunning=1, allrunning=1, anyunavail=0, allunavail=0, anynonexistent=0, allnonexistent=0, anyresumeack=1, allresumeack=1, anyhavereset=0, allhavereset=0, impebreak=1, stickyunavail=current, ndmresetpending=false, confstrptrvalid=invalid, hasresethaltreq=1, authbusy=ready, authenticated=true, anyhalted=0, }
  • We can see that anyrunning and allrunning are set.
Debug: 253806 32055 riscv-013.c:398 scan(): dmstatus=0 { ... } -> dmstatus=0x430ca2 { version=0.13, anyrunning=1, allrunning=1, ... }
  • This is the same for all the other harts (with dmcontrol.hartsello set accordingly):
Debug: 253798 32054 riscv.c:2764 riscv_poll_hart(): [riscv.cpu.0] polling, target->state=1
...
Debug: 253800 32054 riscv-013.c:398 scan(): dmcontrol=1 { ..., hartsello=0, ... } ->
...
Debug: 253806 32055 riscv-013.c:398 scan(): dmstatus=0 { ... } -> dmstatus=0x430ca2 { ... , anyrunning=1, allrunning=1, ... }
Debug: 253807 32055 riscv.c:2764 riscv_poll_hart(): [riscv.cpu.1] polling, target->state=1
...
Debug: 253809 32055 riscv-013.c:398 scan(): dmcontrol=0x10001 { ..., hartsello=1, ... } ->
...
Debug: 253815 32055 riscv-013.c:398 scan(): dmstatus=0 { ... } -> dmstatus=0x430ca2 { ..., anyrunning=1, allrunning=1, ... }
Debug: 253816 32055 riscv.c:2764 riscv_poll_hart(): [riscv.cpu.2] polling, target->state=1
...
Debug: 253818 32055 riscv-013.c:398 scan(): dmcontrol=0x20001 { ..., hartsello=2, ... } ->
...
Debug: 253824 32056 riscv-013.c:398 scan(): dmstatus=0 { ... } -> dmstatus=0x430ca2 { ..., anyrunning=1, allrunning=1, ... }
Debug: 253825 32056 riscv.c:2764 riscv_poll_hart(): [riscv.cpu.3] polling, target->state=1
...
Debug: 253827 32056 riscv-013.c:398 scan(): dmcontrol=0x30001 { ..., hartsello=3, ... } ->
...
Debug: 253833 32056 riscv-013.c:398 scan(): dmstatus=0 { ... } -> dmstatus=0x430ca2 { ..., anyrunning=1, allrunning=1, ... }
Debug: 253834 32056 riscv.c:2764 riscv_poll_hart(): [riscv.cpu.4] polling, target->state=1
...
Debug: 253836 32057 riscv-013.c:398 scan(): dmcontrol=0x40001 { ..., hartsello=4, ... } ->
...
Debug: 253842 32057 riscv-013.c:398 scan(): dmstatus=0 { ... } -> dmstatus=0x430ca2 { ..., anyrunning=1, allrunning=1, ... }
Debug: 253843 32057 riscv.c:2764 riscv_poll_hart(): [riscv.cpu.5] polling, target->state=1
...
Debug: 253845 32057 riscv-013.c:398 scan(): dmcontrol=0x50001 { ..., hartsello=5, ... } ->
...
Debug: 253851 32058 riscv-013.c:398 scan(): dmstatus=0 { ... } -> dmstatus=0x430ca2 { ..., anyrunning=1, allrunning=1, ... }
Debug: 253852 32058 riscv.c:2764 riscv_poll_hart(): [riscv.cpu.6] polling, target->state=1
...
Debug: 253854 32058 riscv-013.c:398 scan(): dmcontrol=0x60001 { ..., hartsello=6, ... } ->
...
Debug: 253860 32058 riscv-013.c:398 scan(): dmstatus=0 { ... } -> dmstatus=0x430ca2 { ..., anyrunning=1, allrunning=1, ... }
Debug: 253861 32058 riscv.c:2764 riscv_poll_hart(): [riscv.cpu.7] polling, target->state=1
...
Debug: 253863 32058 riscv-013.c:398 scan(): dmcontrol=0x70001 { ..., hartsello=7, ... } ->
...
Debug: 253869 32059 riscv-013.c:398 scan(): dmstatus=0 { ... } -> dmstatus=0x430ca2 { ..., anyrunning=1, allrunning=1, ... }
Debug: 253870 32059 riscv.c:2998 riscv_openocd_poll(): [riscv.cpu.7] should_remain_halted=0, should_resume=0
Debug: 253871 32059 target.c:3075 handle_target(): [riscv.cpu.7] target_poll() -> 0, next attempt in 100ms

So I agree with @aap-sc and would like suggest to double-check the software you are running.

Please, note that the decoding of debug register's fields does note contain the last field (riscv/riscv-debug-spec#908 is a fix for that).

@en-sc en-sc deleted the en-sc/fix-reset-mharts branch August 14, 2024 18:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants