target/riscv: simplify reset for rtos harts #821

en-sc · 2023-03-22T15:42:20Z

Since the deletion of -rtos hwthread, there is no need to treat harts with -rtos specified differently on reset.

Change-Id: I88a9129936b5172bb7479dfa1255e29ff460c054

en-sc · 2023-03-22T16:58:18Z

Prior to the change, DM_DMCONTROL_HALTREQ was written twice to dmstatus on reset halt (lines 2487 and 2499), which was causing instability on some HW implementations

timsifive

Thanks for adding more error checking, too.

What hardware have you tested this with? I'm especially thinking of multi-hart systems. The tests against spike don't really cover this code, and it's a bit of a house of cards.

src/target/riscv/riscv-013.c

timsifive · 2023-03-23T17:20:11Z

This seems to work fine on HiFive Unleashed.

en-sc · 2023-03-28T17:25:07Z

Thanks for adding more error checking, too.

What hardware have you tested this with? I'm especially thinking of multi-hart systems. The tests against spike don't really cover this code, and it's a bit of a house of cards.

We also tested on HiFive Unleashed as well as on Syntacore SCR7 SMP (on which the bug was discovered in the first place)

JanMatCodasip

@en-sc Thank you for the changes. I have checked the code visually and it looks all right. I have only few minor review findings.

JanMatCodasip · 2023-04-04T06:13:38Z

src/target/riscv/riscv-013.c


 	uint32_t dmstatus;
-	int dmi_busy_delay = info->dmi_busy_delay;
+	const int dmi_busy_delay = info->dmi_busy_delay;


For readability, could we please call this this orig_dmi_busy_delay?

Suggested change

const int dmi_busy_delay = info->dmi_busy_delay;

const int orig_dmi_busy_delay = info->dmi_busy_delay;

JanMatCodasip · 2023-04-04T06:17:10Z

src/target/riscv/riscv-013.c

+					"Increase the timeout with riscv set_reset_timeout_sec.",
+					riscv_reset_timeout_sec, dmstatus,
+					get_field(dmstatus, DM_DMSTATUS_ALLUNAVAIL) ? "true" : "false",
+					get_field(dmstatus, DM_DMSTATUS_ALLUNAVAIL) ? "true" : "false");


There seems to be a copy-paste typo:

Suggested change

get_field(dmstatus, DM_DMSTATUS_ALLUNAVAIL) ? "true" : "false");

get_field(dmstatus, DM_DMSTATUS_ALLHAVERESET) ? "true" : "false");

JanMatCodasip · 2023-04-04T06:25:24Z

src/target/riscv/riscv-013.c

 	uint32_t control = 0, control_haltreq;
 	control = set_field(control, DM_DMCONTROL_DMACTIVE, 1);
 	control_haltreq = set_field(control, DM_DMCONTROL_HALTREQ, target->reset_halt ? 1 : 0);
-	dmi_write(target, DM_DMCONTROL,
+	result = dmi_write(target, DM_DMCONTROL,
 			set_dmcontrol_hartsel(control_haltreq, info->index));


For readability, I'd recommend to simplify this part and set always all the fields of dmcontrol:

uint32_t control = 0; control = set_field(control, DM_DMCONTROL_DMACTIVE, 1); control = set_field(control, DM_DMCONTROL_HALTREQ, target->reset_halt ? 1 : 0); control = set_dmcontrol_hartsel(control, info->index); result = dmi_write(target, DMI_DMCONTROL, control);

Also I'd recommend to use just one variable in this function (control).

JanMatCodasip · 2023-04-04T06:28:46Z

src/target/riscv/riscv-013.c

+	/* Ack reset and clear DM_DMCONTROL_HALTREQ if previously set */
+	return dmi_write(target, DM_DMCONTROL,
+			set_dmcontrol_hartsel(control, info->index) |
+			DM_DMCONTROL_ACKHAVERESET);


Same as at the top of the function, I'd recommend to set all the fields here to make it very easy to read:

control = 0; control = set_field(control, DM_DMCONTROL_DMACTIVE, 1); control = set_field(control, DM_DMCONTROL_ACKHAVERESET, 1); result = dmi_write(target, DMI_DMCONTROL, control);

JanMatCodasip · 2023-04-04T06:32:29Z

src/target/riscv/riscv-013.c

@@ -4459,8 +4424,10 @@ riscv_insn_t riscv013_read_debug_buffer(struct target *target, unsigned index)
 int riscv013_invalidate_cached_debug_buffer(struct target *target)
 {
 	dm013_info_t *dm = get_dm(target);
-	if (!dm)
+	if (!dm) {
+		LOG_TARGET_ERROR(target, "No DM is specified for the target");


Have you please come across a scenario when this error message would pop up?

Since there is no progbuf_cache to invalidate if the target does not have a debug module specified, I'd recommend to demote this to a debug message only.

In another words, if there is no DM assigned, the operation in such case should not be considered as failed - it is simply not applicable.

Thank you for the suggestion! I've added the message just in case. Have not observed it yet. I've demoted the message as you recommended.

en-sc · 2023-04-04T09:58:13Z

@timsifive, @JanMatCodasip, can you please clarify something for me:
When the reset is asserted, DM_DMCONTROL_NDMRESET is set. This is the case now, as well as it was prior to suggested changes. If I understand correctly, this makes it so system-wide reset (except for DM) is performed. Isn't it necessary to write DM_DMCONTROL_ACKHAVERESET for each hart on the platform after (as well as check that all the harts, not just the one selected, came out of reset)?

Since the deletion of `-rtos hwthread`, there is no need to treat harts with `-rtos` specified differently on reset. Signed-off-by: Evgeniy Naydanov <[email protected]> Change-Id: I88a9129936b5172bb7479dfa1255e29ff460c054

en-sc · 2023-04-05T16:37:48Z

Check Code Style fails, but it is due to rebase (no style issues in src/target/riscv/riscv-013.c are reported)

timsifive · 2023-04-06T16:53:56Z

When the reset is asserted, DM_DMCONTROL_NDMRESET is set. This is the case now, as well as it was prior to suggested changes. If I understand correctly, this makes it so system-wide reset (except for DM) is performed. Isn't it necessary to write DM_DMCONTROL_ACKHAVERESET for each hart on the platform after (as well as check that all the harts, not just the one selected, came out of reset)?

It is not required for the target. The reset took effect just as well without writing ackhavereset.

It would be better for OpenOCD to confirm that havereset is set for every hart, and then to ack it so that the bit is clear and OpenOCD will notice if a spontaneous reset occurs.

Alan-19950616 · 2023-10-20T06:18:54Z

@timsifive @en-sc @JanMatCodasip This commit causes certain cores to fail to boot on riscv smp multicore systems.

Here's how I configured it.

set _CHIPNAME0 riscv0
jtag newtap $_CHIPNAME0 cpu -irlen 5

set _TARGETNAME0 $_CHIPNAME0.cpu
target create $_TARGETNAME0.0 riscv -chain-position $_TARGETNAME0 -coreid 0 -rtos hwthread
target create $_TARGETNAME0.1 riscv -chain-position $_TARGETNAME0 -coreid 1
target create $_TARGETNAME0.2 riscv -chain-position $_TARGETNAME0 -coreid 2
target create $_TARGETNAME0.3 riscv -chain-position $_TARGETNAME0 -coreid 3
target create $_TARGETNAME0.4 riscv -chain-position $_TARGETNAME0 -coreid 4
target create $_TARGETNAME0.5 riscv -chain-position $_TARGETNAME0 -coreid 5
target create $_TARGETNAME0.6 riscv -chain-position $_TARGETNAME0 -coreid 6
target create $_TARGETNAME0.7 riscv -chain-position $_TARGETNAME0 -coreid 7

target smp $_TARGETNAME0.0 $_TARGETNAME0.1 $_TARGETNAME0.2 $_TARGETNAME0.3 $_TARGETNAME0.4 $_TARGETNAME0.5 $_TARGETNAME0.6 $_TARGETNAME0.7

aap-sc · 2023-10-20T07:53:01Z

@Alan-19950616

This commit causes certain cores to fail to boot on riscv smp multicore systems.

Few questions:

Could you please clarify what do you mean by 'fail to boot" ? Does OpenOCD returns some error? Does OpenOCD targets indicate that they are running? Or is it you have some loader program that fails to be initialized properly?
Which commands do you use to "boot" the system?
Can you attach -d3 openocd log of a session that has this issue?

Alan-19950616 · 2023-10-23T04:03:31Z

The "fail to boot" is mean this ctest failed, and OpenOCD has no error messages.

smphello_8c.zip

#include <stdio.h>
#include "nuclei_sdk_soc.h"

#if !defined(__riscv_atomic)
#error "RVA(atomic) extension is required for SMP"
#endif

#if !defined(SMP_CPU_CNT)
#error "SMP_CPU_CNT macro is not defined, please set SMP_CPU_CNT to integer value > 1"
#endif

typedef struct {
    uint32_t state;
} spinlock;

spinlock lock;
volatile uint32_t lock_ready = 0;
volatile uint32_t cpu_count = 0;
volatile uint32_t finished = 0;

// comment SPINLOCK_CAS to use AMOSWAP as spinlock
#define SPINLOCK_CAS

__STATIC_FORCEINLINE void spinlock_init(spinlock *lock)
{
    lock->state = 0;
}

__STATIC_FORCEINLINE void spinlock_lock(spinlock *lock)
{
    uint32_t old;
    uint32_t backoff = 10;
    do {
#ifndef SPINLOCK_CAS
        // Use amoswap as spinlock
        old = __AMOSWAP_W((&(lock->state)), 1);
#else
        // use lr.w & sc.w to do CAS as spinlock
        old = __CAS_W((&(lock->state)), 0, 1);
#endif
        if (old == 0) {
            break;
        }
        for (volatile int i = 0; i < backoff; i ++) {
            __NOP();
        }
        backoff += 10;
    } while (1);
}

__STATIC_FORCEINLINE void spinlock_unlock(spinlock *lock)
{
    lock->state = 0;
}

int boot_hart_main(unsigned long hartid);
int other_harts_main(unsigned long hartid);
int main(void);

/* Reimplementation of smp_main for multi-harts */
int smp_main(void)
{
    return main();
}

int main(void)
{
    int ret;
    // get hart id in current cluster
    unsigned long hartid = __get_hart_id();
    if (hartid == BOOT_HARTID) { // boot hart
        spinlock_init(&lock);
        lock_ready = 1;
        finished = 0;
        __SMP_RWMB();
        ret = boot_hart_main(hartid);
    } else { // other harts
        // wait for lock initialized
        while (lock_ready == 0);
        ret = other_harts_main(hartid);
    }
    return ret;
}

int boot_hart_main(unsigned long hartid)
{
    volatile unsigned long waitcnt = 0;
    spinlock_lock(&lock);
    printf("Hello world from hart %d\n", hartid);
    cpu_count += 1;
    spinlock_unlock(&lock);
    // wait for all harts boot and print hello
    while (cpu_count < SMP_CPU_CNT) {
        waitcnt ++;
        __NOP();
        // The waitcnt compare value need to be adjust according
        // to cpu frequency
        if (waitcnt >= SystemCoreClock) {
            break;
        }
    }
    if (cpu_count == SMP_CPU_CNT) {
        printf("All harts boot successfully!\n");
        finished = 1;
        return 0;
    } else {
        printf("Some harts boot failed, only %d/%d booted!\n", cpu_count, SMP_CPU_CNT);
        return -1;
    }
}

int other_harts_main(unsigned long hartid)
{
    spinlock_lock(&lock);
    printf("Hello world from hart %d\n", hartid);
    cpu_count += 1;
    spinlock_unlock(&lock);
    // wait for all harts boot and print hello
    while (cpu_count < SMP_CPU_CNT);
    // wait for boot hart to set finished flag
    while (finished == 0);
    return 0;
}

aap-sc · 2023-10-23T07:49:36Z

@Alan-19950616

The "fail to boot" is mean this ctest failed, and OpenOCD has no error messages.

If openocd has no error messages than likely this may be an application bug (timings of the operations were changed).

Is this real HW or spike?
Could you please double-check the state of targets after "resume" operation completes? This can be done by issuing something like echo "[targets]" or just targets. If all targets are in "running" state - then you'll need to debug the application.

en-sc · 2023-10-24T08:34:56Z

The targets are running, you can see this in the last poll in the log.

Here is the snippet regarding riscv.cpu.0:

Hart zero is selected:

Debug: 253798 32054 riscv.c:2764 riscv_poll_hart(): [riscv.cpu.0] polling, target->state=1
Debug: 253799 32054 riscv-013.c:388 scan(): 41b w 00000001 @10 -> + 00000000 @00; 4i
Debug: 253800 32054 riscv-013.c:398 scan(): dmcontrol=1 { dmactive=active, ndmreset=0, hartsello=0, clrresethaltreq=0, hasel=single, ackunavail=nop, ackhavereset=nop, hartreset=0, setresethaltreq=0, resumereq=0, haltreq=0, clrkeepalive=0, setkeepalive=0, } ->
Debug: 253801 32054 riscv-013.c:388 scan(): 41b - 00000000 @10 -> + 00000001 @10; 4i
Debug: 253802 32054 riscv-013.c:398 scan(): dmcontrol=0 { dmactive=inactive, ndmreset=0, hartsello=0, clrresethaltreq=0, hasel=single, ackunavail=nop, ackhavereset=nop, hartreset=0, setresethaltreq=0, resumereq=0, haltreq=0, clrkeepalive=0, setkeepalive=0, } -> dmcontrol=1 { dmactive=active, ndmreset=0, hartsello=0, clrresethaltreq=0, hasel=single, ackunavail=nop, ackhavereset=nop, hartreset=0, setresethaltreq=0, resumereq=0, haltreq=0, clrkeepalive=0, setkeepalive=0, }

dmstatus is read to get the current state:

Debug: 253803 32054 riscv-013.c:388 scan(): 41b r 00000000 @11 -> + 00000000 @00; 4i
Debug: 253804 32054 riscv-013.c:398 scan(): dmstatus=0 { version=none, anyrunning=0, allrunning=0, anyunavail=0, allunavail=0, anynonexistent=0, allnonexistent=0, anyresumeack=0, allresumeack=0, anyhavereset=0, allhavereset=0, impebreak=0, stickyunavail=current, ndmresetpending=false, confstrptrvalid=invalid, hasresethaltreq=0, authbusy=ready, authenticated=false, anyhalted=0, } ->
Debug: 253805 32055 riscv-013.c:388 scan(): 41b - 00000000 @11 -> + 00430ca2 @11; 4i
Debug: 253806 32055 riscv-013.c:398 scan(): dmstatus=0 { version=none, anyrunning=0, allrunning=0, anyunavail=0, allunavail=0, anynonexistent=0, allnonexistent=0, anyresumeack=0, allresumeack=0, anyhavereset=0, allhavereset=0, impebreak=0, stickyunavail=current, ndmresetpending=false, confstrptrvalid=invalid, hasresethaltreq=0, authbusy=ready, authenticated=false, anyhalted=0, } -> dmstatus=0x430ca2 { version=0.13, anyrunning=1, allrunning=1, anyunavail=0, allunavail=0, anynonexistent=0, allnonexistent=0, anyresumeack=1, allresumeack=1, anyhavereset=0, allhavereset=0, impebreak=1, stickyunavail=current, ndmresetpending=false, confstrptrvalid=invalid, hasresethaltreq=1, authbusy=ready, authenticated=true, anyhalted=0, }

We can see that anyrunning and allrunning are set.

Debug: 253806 32055 riscv-013.c:398 scan(): dmstatus=0 { ... } -> dmstatus=0x430ca2 { version=0.13, anyrunning=1, allrunning=1, ... }

This is the same for all the other harts (with dmcontrol.hartsello set accordingly):

Debug: 253798 32054 riscv.c:2764 riscv_poll_hart(): [riscv.cpu.0] polling, target->state=1
...
Debug: 253800 32054 riscv-013.c:398 scan(): dmcontrol=1 { ..., hartsello=0, ... } ->
...
Debug: 253806 32055 riscv-013.c:398 scan(): dmstatus=0 { ... } -> dmstatus=0x430ca2 { ... , anyrunning=1, allrunning=1, ... }
Debug: 253807 32055 riscv.c:2764 riscv_poll_hart(): [riscv.cpu.1] polling, target->state=1
...
Debug: 253809 32055 riscv-013.c:398 scan(): dmcontrol=0x10001 { ..., hartsello=1, ... } ->
...
Debug: 253815 32055 riscv-013.c:398 scan(): dmstatus=0 { ... } -> dmstatus=0x430ca2 { ..., anyrunning=1, allrunning=1, ... }
Debug: 253816 32055 riscv.c:2764 riscv_poll_hart(): [riscv.cpu.2] polling, target->state=1
...
Debug: 253818 32055 riscv-013.c:398 scan(): dmcontrol=0x20001 { ..., hartsello=2, ... } ->
...
Debug: 253824 32056 riscv-013.c:398 scan(): dmstatus=0 { ... } -> dmstatus=0x430ca2 { ..., anyrunning=1, allrunning=1, ... }
Debug: 253825 32056 riscv.c:2764 riscv_poll_hart(): [riscv.cpu.3] polling, target->state=1
...
Debug: 253827 32056 riscv-013.c:398 scan(): dmcontrol=0x30001 { ..., hartsello=3, ... } ->
...
Debug: 253833 32056 riscv-013.c:398 scan(): dmstatus=0 { ... } -> dmstatus=0x430ca2 { ..., anyrunning=1, allrunning=1, ... }
Debug: 253834 32056 riscv.c:2764 riscv_poll_hart(): [riscv.cpu.4] polling, target->state=1
...
Debug: 253836 32057 riscv-013.c:398 scan(): dmcontrol=0x40001 { ..., hartsello=4, ... } ->
...
Debug: 253842 32057 riscv-013.c:398 scan(): dmstatus=0 { ... } -> dmstatus=0x430ca2 { ..., anyrunning=1, allrunning=1, ... }
Debug: 253843 32057 riscv.c:2764 riscv_poll_hart(): [riscv.cpu.5] polling, target->state=1
...
Debug: 253845 32057 riscv-013.c:398 scan(): dmcontrol=0x50001 { ..., hartsello=5, ... } ->
...
Debug: 253851 32058 riscv-013.c:398 scan(): dmstatus=0 { ... } -> dmstatus=0x430ca2 { ..., anyrunning=1, allrunning=1, ... }
Debug: 253852 32058 riscv.c:2764 riscv_poll_hart(): [riscv.cpu.6] polling, target->state=1
...
Debug: 253854 32058 riscv-013.c:398 scan(): dmcontrol=0x60001 { ..., hartsello=6, ... } ->
...
Debug: 253860 32058 riscv-013.c:398 scan(): dmstatus=0 { ... } -> dmstatus=0x430ca2 { ..., anyrunning=1, allrunning=1, ... }
Debug: 253861 32058 riscv.c:2764 riscv_poll_hart(): [riscv.cpu.7] polling, target->state=1
...
Debug: 253863 32058 riscv-013.c:398 scan(): dmcontrol=0x70001 { ..., hartsello=7, ... } ->
...
Debug: 253869 32059 riscv-013.c:398 scan(): dmstatus=0 { ... } -> dmstatus=0x430ca2 { ..., anyrunning=1, allrunning=1, ... }
Debug: 253870 32059 riscv.c:2998 riscv_openocd_poll(): [riscv.cpu.7] should_remain_halted=0, should_resume=0
Debug: 253871 32059 target.c:3075 handle_target(): [riscv.cpu.7] target_poll() -> 0, next attempt in 100ms

So I agree with @aap-sc and would like suggest to double-check the software you are running.

Please, note that the decoding of debug register's fields does note contain the last field (riscv/riscv-debug-spec#908 is a fix for that).

en-sc force-pushed the en-sc/fix-reset-mharts branch from 018ab71 to 9cecda8 Compare March 22, 2023 16:51

en-sc changed the title ~~target/riscv: fix reset for rtos harts~~ target/riscv: simplify reset for rtos harts Mar 22, 2023

timsifive requested changes Mar 23, 2023

View reviewed changes

src/target/riscv/riscv-013.c Outdated Show resolved Hide resolved

src/target/riscv/riscv-013.c Outdated Show resolved Hide resolved

src/target/riscv/riscv-013.c Outdated Show resolved Hide resolved

src/target/riscv/riscv-013.c Outdated Show resolved Hide resolved

en-sc force-pushed the en-sc/fix-reset-mharts branch 3 times, most recently from 4b26eac to 3e9a08d Compare March 23, 2023 18:02

en-sc requested a review from timsifive March 28, 2023 17:30

timsifive previously approved these changes Mar 29, 2023

View reviewed changes

JanMatCodasip requested changes Apr 4, 2023

View reviewed changes

en-sc dismissed timsifive’s stale review via e76c3d3 April 4, 2023 09:23

en-sc force-pushed the en-sc/fix-reset-mharts branch from 3e9a08d to e76c3d3 Compare April 4, 2023 09:23

target/riscv: simplify reset

1c16824

Since the deletion of `-rtos hwthread`, there is no need to treat harts with `-rtos` specified differently on reset. Signed-off-by: Evgeniy Naydanov <[email protected]> Change-Id: I88a9129936b5172bb7479dfa1255e29ff460c054

en-sc force-pushed the en-sc/fix-reset-mharts branch from e76c3d3 to 1c16824 Compare April 5, 2023 16:15

timsifive approved these changes Apr 6, 2023

View reviewed changes

timsifive merged commit 15bb3e2 into riscv-collab:riscv Apr 6, 2023

en-sc deleted the en-sc/fix-reset-mharts branch August 14, 2024 18:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

target/riscv: simplify reset for rtos harts #821

target/riscv: simplify reset for rtos harts #821

en-sc commented Mar 22, 2023 •

edited

Loading

en-sc commented Mar 22, 2023

timsifive left a comment

timsifive commented Mar 23, 2023

en-sc commented Mar 28, 2023

JanMatCodasip left a comment

JanMatCodasip Apr 4, 2023

en-sc Apr 4, 2023

JanMatCodasip Apr 4, 2023

en-sc Apr 4, 2023

JanMatCodasip Apr 4, 2023 •

edited

Loading

en-sc Apr 4, 2023

JanMatCodasip Apr 4, 2023

en-sc Apr 4, 2023

JanMatCodasip Apr 4, 2023 •

edited

Loading

en-sc Apr 4, 2023

en-sc commented Apr 4, 2023 •

edited

Loading

en-sc commented Apr 5, 2023

timsifive commented Apr 6, 2023

Alan-19950616 commented Oct 20, 2023

aap-sc commented Oct 20, 2023

Alan-19950616 commented Oct 23, 2023 •

edited

Loading

aap-sc commented Oct 23, 2023

en-sc commented Oct 24, 2023

	const int dmi_busy_delay = info->dmi_busy_delay;
	const int orig_dmi_busy_delay = info->dmi_busy_delay;

	get_field(dmstatus, DM_DMSTATUS_ALLUNAVAIL) ? "true" : "false");
	get_field(dmstatus, DM_DMSTATUS_ALLHAVERESET) ? "true" : "false");

target/riscv: simplify reset for rtos harts #821

target/riscv: simplify reset for rtos harts #821

Conversation

en-sc commented Mar 22, 2023 • edited Loading

en-sc commented Mar 22, 2023

timsifive left a comment

Choose a reason for hiding this comment

timsifive commented Mar 23, 2023

en-sc commented Mar 28, 2023

JanMatCodasip left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JanMatCodasip Apr 4, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JanMatCodasip Apr 4, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

en-sc commented Apr 4, 2023 • edited Loading

en-sc commented Apr 5, 2023

timsifive commented Apr 6, 2023

Alan-19950616 commented Oct 20, 2023

aap-sc commented Oct 20, 2023

Alan-19950616 commented Oct 23, 2023 • edited Loading

aap-sc commented Oct 23, 2023

en-sc commented Oct 24, 2023

en-sc commented Mar 22, 2023 •

edited

Loading

JanMatCodasip Apr 4, 2023 •

edited

Loading

JanMatCodasip Apr 4, 2023 •

edited

Loading

en-sc commented Apr 4, 2023 •

edited

Loading

Alan-19950616 commented Oct 23, 2023 •

edited

Loading