
Commit

[docs] update: Zalrsc -> Zaamo
stnolting committed Jan 3, 2025
1 parent 69e8268 commit d65663e
Showing 6 changed files with 65 additions and 109 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -106,7 +106,7 @@ setup according to your needs. Note that all of the following SoC modules are en
[[`B`](https://stnolting.github.io/neorv32/#_b_isa_extension)]
[[`U`](https://stnolting.github.io/neorv32/#_u_isa_extension)]
[[`X`](https://stnolting.github.io/neorv32/#_x_isa_extension)]
[[`Zalrsc`](https://stnolting.github.io/neorv32/#_zalrsc_isa_extension)]
[[`Zaamo`](https://stnolting.github.io/neorv32/#_zaamo_isa_extension)]
[[`Zba`](https://stnolting.github.io/neorv32/#_zba_isa_extension)]
[[`Zbb`](https://stnolting.github.io/neorv32/#_zbb_isa_extension)]
[[`Zbkb`](https://stnolting.github.io/neorv32/#_zbkb_isa_extension)]
78 changes: 30 additions & 48 deletions docs/datasheet/cpu.adoc
@@ -415,7 +415,8 @@ always valid when set.
| `rw` | 1 | Access direction (`0` = read, `1` = write)
| `src` | 1 | Access source (`0` = instruction fetch, `1` = load/store)
| `priv` | 1 | Set if privileged (M-mode) access
| `rvso` | 1 | Set if current access is a reservation-set operation (`lr` or `sc` instruction, <<_zalrsc_isa_extension>>)
| `amo` | 1 | Set if current access is an atomic memory operation (<<_atomic_memory_access>>)
| `amoop` | 4 | Type of atomic memory operation (<<_atomic_memory_access>>)
3+^| **Out-Of-Band Signals**
| `fence` | 1 | Data/instruction fence request; single-shot
| `sleep` | 1 | Set if ALL upstream devices are in <<_sleep_mode>>
@@ -463,36 +464,31 @@ additional latency). However, _all_ bus signals (request and response) need to b


:sectnums:
==== Atomic Accesses
==== Atomic Memory Access

The load-reservate (`lr.w`) and store-conditional (`sc.w`) instructions from the <<_zalrsc_isa_extension>> execute as standard
load/store bus transactions but with the `rvso` ("reservation set operation") signal being set. It is the task of the
<<_reservation_set_controller>> to handle these LR/SC bus transactions accordingly. Note that these reservation set operations
are intended for processor-internal usage only (i.e. the reservation state is not available for processor-external modules yet).
The <<_zaamo_isa_extension>> adds atomic read-modify-write memory operations. Since the <<_bus_interface_protocol>>
only supports plain read or write operations, atomic memory requests are handled by a dedicated module of the bus
infrastructure: the <<_atomic_memory_operations_controller>>.

.Reservation Set Controller
[NOTE]
See section <<_address_space>> / <<_reservation_set_controller>> for more information.

The figure below shows three exemplary bus accesses (1 to 3 from left to right). The `req` signal record represents
the CPU-side of the bus interface. For easier understanding the current state of the reservation set is added as `rvs_valid` signal.
From the CPU's point of view, an atomic memory access is handled as a plain "load" operation, but with the `amo` signal set
and with write data provided as well (see <<_bus_interface>>). The `amoop` signal defines the actual atomic
operation:

[start=1]
. A load-reservate (LR) instruction using `addr` as address. This instruction returns the loaded data `rdata` via `rsp.data`
and also registers a reservation for the address `addr` (`rvs_valid` becomes set).
. A store-conditional (SC) instruction attempts to write `wdata1` to address `addr`. This SC operation **succeeds**, so
`wdata1` is actually written to address `addr`. The successful operation is indicated by a **0** being returned via
`rsp.data` together with `ack`. As the LR/SC is completed the registered reservation is invalidated (`rvs_valid` becomes cleared).
. Another store-conditional (SC) instruction attempts to write `wdata2` to address `addr`. As the reservation set is already
invalidated (`rvs_valid` is `0`) the store access fails, so `wdata2` is **not** written to address `addr` at all. The failed
operation is indicated by a **1** being returned via `rsp.data` together with `ack`.

.Three Exemplary LR/SC Bus Transactions (showing only in-band signals)
image::bus_interface_atomic.png[700]

.Store-Conditional Status
[NOTE]
The "normal" load data mechanism is used to return success/failure of the `sc.w` instruction to the CPU (via the LSB of `rsp.data`).
.AMO Operation Type Encoding
[cols="<1,<4"]
[options="header",grid="rows"]
|=======================
| `bus_req_t.amoop` | Description
| `-000` | swap
| `-001` | unsigned add
| `-010` | logical xor
| `-011` | logical and
| `-100` | logical or
| `0110` | unsigned minimum
| `0111` | unsigned maximum
| `1110` | signed minimum
| `1111` | signed maximum
|=======================
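
To make the encoding concrete, the following C sketch models which value an AMO writes back to memory for each `amoop` code in the table above (the CPU always receives the original memory content). It is a software model only, not the hardware implementation. The function name `amo_compute` is hypothetical, codes that are not listed in the table (for example `x101`) are treated as undefined, and the add wraps modulo 2^32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the amoop encoding table.
 * mem    = original content of the addressed memory cell
 * wdata  = write data provided by the CPU
 * result = value that is written back to memory
 * Returns false for encodings that the table does not define. */
static bool amo_compute(uint8_t amoop, uint32_t mem, uint32_t wdata, uint32_t *result)
{
    amoop &= 0xFu;                 /* amoop is 4 bits wide */

    switch (amoop & 0x7u) {        /* bit 3 is "don't care" for these codes */
        case 0x0: *result = wdata;       return true; /* -000 swap */
        case 0x1: *result = mem + wdata; return true; /* -001 add (wraps modulo 2^32) */
        case 0x2: *result = mem ^ wdata; return true; /* -010 xor */
        case 0x3: *result = mem & wdata; return true; /* -011 and */
        case 0x4: *result = mem | wdata; return true; /* -100 or */
        default:  break;                              /* min/max or undefined */
    }

    switch (amoop) {               /* bit 3 selects signed min/max */
        case 0x6: *result = (mem < wdata) ? mem : wdata; return true; /* 0110 unsigned min */
        case 0x7: *result = (mem > wdata) ? mem : wdata; return true; /* 0111 unsigned max */
        /* signed compare via two's complement conversion (GCC/Clang behavior) */
        case 0xE: *result = ((int32_t)mem < (int32_t)wdata) ? mem : wdata; return true; /* 1110 signed min */
        case 0xF: *result = ((int32_t)mem > (int32_t)wdata) ? mem : wdata; return true; /* 1111 signed max */
        default:  return false;    /* not listed in the table */
    }
}
```

For example, `0x6` (unsigned minimum) of `0xFFFFFFFF` and `1` yields `1`, while `0xE` (signed minimum) of the same operands yields `0xFFFFFFFF` (-1).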

.Cache Coherency
[IMPORTANT]
@@ -521,7 +517,7 @@ This chapter gives a brief overview of all available ISA extensions.
| <<_m_isa_extension,`M`>> | Integer multiplication and division instructions | <<_processor_top_entity_generics, `RISCV_ISA_M`>>
| <<_u_isa_extension,`U`>> | Less-privileged _user_ mode extension | <<_processor_top_entity_generics, `RISCV_ISA_U`>>
| <<_x_isa_extension,`X`>> | Platform-specific / NEORV32-specific extension | Always enabled
| <<_zalrsc_isa_extension,`Zalrsc`>> | Atomic reservation-set instructions | <<_processor_top_entity_generics, `RISCV_ISA_Zalrsc`>>
| <<_zaamo_isa_extension,`Zaamo`>> | Atomic memory operations | <<_processor_top_entity_generics, `RISCV_ISA_Zaamo`>>
| <<_zba_isa_extension,`Zba`>> | Shifted-add bit manipulation instructions | <<_processor_top_entity_generics, `RISCV_ISA_Zba`>>
| <<_zbb_isa_extension,`Zbb`>> | Basic bit manipulation instructions | <<_processor_top_entity_generics, `RISCV_ISA_Zbb`>>
| <<_zbkb_isa_extension,`Zbkb`>> | Scalar cryptographic bit manipulation instructions | <<_processor_top_entity_generics, `RISCV_ISA_Zbkb`>>
@@ -689,37 +685,23 @@ RISC-V specs. Also, custom trap codes for <<_mcause>> are implemented.
* There are <<_neorv32_specific_csrs>>.


==== `Zalrsc` ISA Extension

The `Zalrsc` ISA extension is a sub-extension of the RISC-V _atomic memory access_ (`A`) ISA extension and includes
instructions for reservation-set operations (load-reservate `lr` and store-conditional `sc`) only.
It is enabled by the top's <<_processor_top_entity_generics, `RISCV_ISA_Zalrsc`>> generic.
==== `Zaamo` ISA Extension

.AMO / `A` Emulation
[NOTE]
The atomic memory access / read-modify-write operations of the `A` ISA extension can be emulated using the
LR and SC operations (quote from the RISC-V spec.: "_Any AMO can be emulated by an LR/SC pair._").
The NEORV32 <<_core_libraries>> provide an emulation wrapper for emulating AMO/read-modify-write instructions that is
based on LR/SC pairs. A demo/program can be found in `sw/example/atomic_test`.
The `Zaamo` ISA extension is a sub-extension of the RISC-V `A` ISA extension and comprises instructions for read-modify-write
<<_atomic_memory_access>> operations. It is enabled by the top's <<_processor_top_entity_generics, `RISCV_ISA_Zaamo`>> generic.

.Instructions and Timing
[cols="<2,<4,<3"]
[cols="<2,<4,<1"]
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| Load-reservate word | `lr.w` | 5
| Store-conditional word | `sc.w` | 5
| Atomic memory operations | `amoswap.w` `amoadd.w` `amoand.w` `amoor.w` `amoxor.w` `amomax[u].w` `amomin[u].w` | 5 + 2 * _memory_latency_
|=======================
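
As a worked example of the timing entry for atomic memory operations: assuming _memory_latency_ denotes the latency of a single bus transaction in cycles, the factor 2 presumably accounts for the read and the write transaction issued by the AMO controller. A memory that responds after 1 cycle then gives 5 + 2 * 1 = 7 cycles, and a latency of 4 cycles gives 13 cycles. The helper name below is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: execution cycles of one AMO instruction according
 * to the timing table, 5 + 2 * memory_latency. memory_latency is assumed
 * to be the latency (in cycles) of one bus transaction. */
static uint32_t amo_exec_cycles(uint32_t memory_latency)
{
    return 5u + 2u * memory_latency;
}
```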

.`aq` and `rl` Bits
[NOTE]
The `aq` and `rl` memory ordering bits of the instruction word are not evaluated by the hardware at all.
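
A common use of `amoswap.w` is a test-and-set spinlock. The C11 sketch below uses `atomic_exchange_explicit`, which a RISC-V toolchain with the Zaamo extension enabled may lower to `amoswap.w`; this mapping is an assumption and should be checked in the disassembly. The type and function names are hypothetical. The acquire/release orderings are kept for portability; as noted above, the hardware does not evaluate the `aq`/`rl` bits.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: state 0 = free, 1 = taken. */
typedef struct {
    atomic_uint state;
} amo_spinlock_t;

static void amo_spin_init(amo_spinlock_t *lock)
{
    atomic_init(&lock->state, 0u);
}

/* Try once: atomically swap in 1 and check the previous value.
 * The lock was acquired if the previous value was 0 (free). */
static bool amo_spin_trylock(amo_spinlock_t *lock)
{
    return atomic_exchange_explicit(&lock->state, 1u, memory_order_acquire) == 0u;
}

/* Busy-wait until the swap observes a free lock. */
static void amo_spin_lock(amo_spinlock_t *lock)
{
    while (!amo_spin_trylock(lock)) {
        /* spin */
    }
}

/* Release the lock with a plain atomic store. */
static void amo_spin_unlock(amo_spinlock_t *lock)
{
    atomic_store_explicit(&lock->state, 0u, memory_order_release);
}
```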

.Atomic Memory Access on Hardware Level
[NOTE]
More information regarding the atomic memory accesses and the according reservation
sets can be found in section <<_reservation_set_controller>>.


==== `Zifencei` ISA Extension

78 changes: 26 additions & 52 deletions docs/datasheet/soc.adoc
@@ -226,7 +226,7 @@ The generic type "`suv(x:y)`" is an abbreviation for "`std_ulogic_vector(x downt
| `RISCV_ISA_E` | boolean | false | Enable <<_e_isa_extension>> (reduced register file size).
| `RISCV_ISA_M` | boolean | false | Enable <<_m_isa_extension>> (hardware-based integer multiplication and division).
| `RISCV_ISA_U` | boolean | false | Enable <<_u_isa_extension>> (less-privileged user mode).
| `RISCV_ISA_Zalrsc` | boolean | false | Enable <<_zalrsc_isa_extension>> (atomic reservation-set operations).
| `RISCV_ISA_Zaamo` | boolean | false | Enable <<_zaamo_isa_extension>> (atomic memory operations).
| `RISCV_ISA_Zba` | boolean | false | Enable <<_zba_isa_extension>> (shifted-add bit-manipulation instructions).
| `RISCV_ISA_Zbb` | boolean | false | Enable <<_zbb_isa_extension>> (basic bit-manipulation instructions).
| `RISCV_ISA_Zbkb` | boolean | false | Enable <<_zbkb_isa_extension>> (scalar cryptography bit manipulation instructions).
@@ -576,67 +576,41 @@ explicit specific processor generic. See section <<_processor_external_bus_inter


:sectnums:
==== Reservation Set Controller
==== Atomic Memory Operations Controller

The reservation set controller is responsible for handling the load-reservate and store-conditional bus transactions that
are triggered by the `lr.w` (LR) and `sc.w` (SC) instructions from the CPU's <<_zalrsc_isa_extension>>.
The atomic memory operations (AMO) controller is responsible for handling the read-modify-write operations issued by the
CPU's <<_zaamo_isa_extension>>. For each AMO request, the controller executes an atomic set of three operations:

A "reservation" defines an address or address range that provides a guarding mechanism to support atomic accesses. A new
reservation is registered by the LR instruction. The address provided by this instruction defines the memory location
that is now monitored for atomic accesses. The according SC instruction evaluates the state of this reservation. If
the reservation is still valid the write access triggered by the SC instruction is finally executed and the instruction
returns a "success" state (`rd` = 0). If the reservation has been invalidated, the SC instruction will not write to memory
and will return a "failed" state (`rd` = 1).

.Reservation Set(s) and Granule
[NOTE]
The reservation set controller supports only **a single** global reservation set with a **word-aligned 4-byte granule**.

The reservation is invalidated if...

* an SC instruction is executed that accesses an address **outside** of the reservation set of the previous LR instruction.
This SC instruction will **fail** (not writing to memory).
* an SC instruction is executed that accesses an address **inside** of the reservation set of the previous LR instruction.
This SC instruction will **succeed** (finally writing to memory).
* a normal store operation accesses an address **inside** of the current reservation set (by the CPU or by the DMA).
* a hardware reset is triggered.

.Consecutive LR Instructions
[NOTE]
If an LR instruction is followed by another LR instruction the reservation set of the former one is overridden
by the reservation set of the latter one.
.Simplified AMO Controller Operation
[cols="^1,<3,<6"]
[options="header",grid="rows"]
|=======================
| Step | Pseudo Code | Description
| 1 | `tmp1 <= MEM[address];` | Perform a read operation accessing the addressed memory
cell and store the loaded data into an internal buffer (`tmp1`).
| 2 | `tmp2 <= tmp1 OP cpu_wdata;` | The buffered data from the first step is processed
using the write data provided by the CPU. The result is stored to another internal buffer (`tmp2`).
| 3 | `MEM[address] <= tmp2;` `cpu_rdata <= tmp1;` | The data from the second buffer (`tmp2`) is
written to the addressed memory cell. In parallel, the data from the first buffer (`tmp1` = original
content of the addressed memory cell) is sent back to the requesting CPU.
|=======================
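
The three steps can be sketched as a small C model. All types and functions here (`model_mem_t`, `model_read`, `model_write`, `amo_controller`, `op_add`) are hypothetical and only mirror the table; they are not part of the NEORV32 sources. The model also reflects that only the handshake of the last (write) transaction is returned to the CPU. Atomicity itself is not modeled: in hardware it follows from the controller being the memory-nearest instance.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical response of one modeled bus transaction. */
typedef struct {
    uint32_t data;
    bool     err;   /* set if the transaction ended with an error */
} model_rsp_t;

/* Hypothetical word-addressed memory; indices >= num_words cause a bus error. */
typedef struct {
    uint32_t *words;
    size_t    num_words;
} model_mem_t;

static model_rsp_t model_read(const model_mem_t *m, size_t idx)
{
    model_rsp_t r = { 0u, idx >= m->num_words };
    if (!r.err) {
        r.data = m->words[idx];
    }
    return r;
}

/* Returns true on a bus error. */
static bool model_write(model_mem_t *m, size_t idx, uint32_t data)
{
    if (idx >= m->num_words) {
        return true;
    }
    m->words[idx] = data;
    return false;
}

typedef uint32_t (*amo_op_fn)(uint32_t mem_data, uint32_t cpu_wdata);

/* Steps 1 to 3 of the table. The error flag of the read is dropped;
 * only the write transaction's handshake is reported to the CPU. */
static model_rsp_t amo_controller(model_mem_t *m, size_t idx, uint32_t cpu_wdata, amo_op_fn op)
{
    uint32_t tmp1 = model_read(m, idx).data;  /* step 1: read */
    uint32_t tmp2 = op(tmp1, cpu_wdata);      /* step 2: modify */
    model_rsp_t rsp;
    rsp.err  = model_write(m, idx, tmp2);     /* step 3: write back ... */
    rsp.data = tmp1;                          /* ... and return the original value */
    return rsp;
}

/* Example operation corresponding to amoadd.w. */
static uint32_t op_add(uint32_t mem_data, uint32_t cpu_wdata)
{
    return mem_data + cpu_wdata;
}
```

With `ram[0] = 40`, an add of `2` returns `40` to the CPU and leaves `42` in memory.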

.Bus Access Errors
[IMPORTANT]
If the LR operation causes a bus access error (raising a load access exception) the reservation **is registered anyway**.
If the SC operation causes a bus access error (raising a store access exception) an already registered reservation set
**is invalidated anyway**.
The controller performs two bus transactions: a read operation and a write operation. Only the acknowledge/error
handshake of the last (write) transaction is sent back to the CPU.

.Strong Semantic
[IMPORTANT]
The LR/SC mechanism follows the _strong semantic_ approach: the LR/SC instruction pair fails only if there is a write
access to the referenced memory location between the LR and SC instructions (by the CPU itself or by the DMA).
Context changes, interrupts, traps, etc. neither affect nor invalidate the reservation state.
As the AMO controller is the memory-nearest instance (see <<_bus_system>>), the sequence of operations described above
cannot be interrupted. Hence, it executes atomically.

.Physical Memory Attributes
[NOTE]
The reservation set can be set for _any_ address (only constrained by the configured granularity). This also
includes cached memory, memory-mapped IO devices and processor-external address spaces.

Bus transactions triggered by the LR instruction register a new reservation set and are delegated to the addressed
memory/device. Bus transactions triggered by the SC remove a reservation set and are forwarded to the addressed
memory/device only if the SC operations succeeds. Otherwise, the access request is not forwarded and a local ACK is
generated to terminate the bus transaction.

.LR/SC Bus Protocol
[NOTE]
More information regarding the LR/SC bus transactions and the corresponding protocol can be found in section
<<_bus_interface>> / <<_atomic_accesses>>.
Atomic memory operations can be executed for _any_ address. This also includes
cached memory, memory-mapped IO devices and processor-external address spaces.
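
As an illustration of an AMO on a peripheral register, the following C11 sketch atomically sets or clears bits and returns the previous register value. The function names are hypothetical, and whether `atomic_fetch_or_explicit`/`atomic_fetch_and_explicit` are lowered to `amoor.w`/`amoand.w` depends on the toolchain and its `-march` setting (assumption; check the disassembly). On real hardware the pointer would be obtained by casting the register's address, which is implementation-specific.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: atomically OR `mask` into the register and
 * return the register's previous value (read-modify-write). */
static uint32_t reg_set_bits(volatile atomic_uint_least32_t *reg, uint32_t mask)
{
    return (uint32_t)atomic_fetch_or_explicit(reg, mask, memory_order_relaxed);
}

/* Hypothetical helper: atomically AND `~mask` into the register and
 * return the register's previous value. */
static uint32_t reg_clear_bits(volatile atomic_uint_least32_t *reg, uint32_t mask)
{
    return (uint32_t)atomic_fetch_and_explicit(reg, ~mask, memory_order_relaxed);
}
```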

.Cache Coherency
[IMPORTANT]
Atomic operations **always bypass** the cache using direct/uncached accesses. Care must be taken
to maintain data cache coherency (e.g. by using the `fence` instruction).
Atomic operations **always bypass** the CPU's <<_processor_internal_data_cache_dcache, data cache>>
using direct/uncached accesses. Care must be taken to maintain data cache coherency when accessing
cached memory (e.g. by using the `fence` instruction).


:sectnums:
6 changes: 3 additions & 3 deletions docs/datasheet/soc_dcache.adoc
@@ -19,7 +19,7 @@
**Overview**

The processor features an optional data cache to improve performance when using memories with high
access latencies. The cache is connected directly to the CPU's data access interface and provides
access latency. The cache is connected directly to the CPU's data access interface and provides
full-transparent accesses. The cache is direct-mapped and uses "write-allocate" and "write-back" strategies.

.Cached/Uncached Accesses
@@ -28,8 +28,8 @@ The data cache provides direct accesses (= uncached) to memory in order to acces
processor-internal IO/peripheral modules). All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF`
will not be cached at all (see section <<_address_space>>). Direct/uncached accesses have **lower** priority than
cache block operations to allow continuous burst transfer and also to maintain logical instruction forward
progress / data coherency. Furthermore, atomic load-reservate and store-conditional instructions (<<_zalrsc_isa_extension>>)
will always **bypass** the cache.
progress / data coherency. Furthermore, the atomic memory operations of the <<_zaamo_isa_extension>> will
always **bypass** the cache.

.Caching Internal Memories
[NOTE]
6 changes: 3 additions & 3 deletions docs/datasheet/soc_icache.adoc
@@ -19,7 +19,7 @@
**Overview**

The processor features an optional instruction cache to improve performance when using memories with high
access latencies. The cache is connected directly to the CPU's instruction fetch interface and provides
access latency. The cache is connected directly to the CPU's instruction fetch interface and provides
full-transparent accesses. The cache is direct-mapped and read-only.

.Cached/Uncached Accesses
@@ -28,8 +28,8 @@ The data cache provides direct accesses (= uncached) to memory in order to acces
processor-internal IO/peripheral modules). All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF`
will not be cached at all (see section <<_address_space>>). Direct/uncached accesses have **lower** priority than
cache block operations to allow continuous burst transfer and also to maintain logical instruction forward
progress / data coherency. Furthermore, atomic load-reservate and store-conditional instructions (<<_zalrsc_isa_extension>>)
will always **bypass** the cache.
progress / data coherency. Furthermore, the atomic memory operations of the <<_zaamo_isa_extension>> will
always **bypass** the cache.

.Caching Internal Memories
[NOTE]
4 changes: 2 additions & 2 deletions docs/datasheet/soc_xbus.adoc
@@ -140,5 +140,5 @@ The data cache provides direct accesses (= uncached) to memory in order to acces
All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF`
will not be cached at all (see section <<_address_space>>). Direct/uncached accesses have **lower** priority than
cache block operations to allow continuous burst transfer and also to maintain logical instruction forward
progress / data coherency. Furthermore, atomic load-reservate and store-conditional instructions (<<_zalrsc_isa_extension>>)
will always **bypass** the cache.
progress / data coherency. Furthermore, the atomic memory operations of the <<_zaamo_isa_extension>> will
always **bypass** the cache.
