
Semidynamics

Avispado 222 Manual

Version 1.0
04/28/2022
Contents
1 Avispado 222 Features
1.1 Supported Modes
1.2 Block Diagram
1.3 Front End
1.4 Issue Queues
1.5 Load Store Unit
1.6 Floating point Unit
1.7 MMU
1.8 PMP
1.9 PMU
2 Avispado 222 Ports
3 Memory Map & PMA
4 Implemented CSR Registers
4.1 Machine Information Registers
5 Interrupts and Exceptions
5.1 Interrupt Pins
5.2 Interrupt Priorities
1 Avispado 222 Features
Avispado 222 is a 12-stage, 3-way-issue, 2-way-commit, in-order, 64-bit, RISC-V core that
supports the following RISC-V extensions:
● A: Atomic instructions
● C: Compressed instructions
● D: Double-Precision Floating Point instructions
● F: Single-Precision Floating Point instructions
● I: Base integer instruction set
● M: Integer Multiplication and Division instructions
● Zicsr: Control and Status Register instructions
● Zifencei: fence.i instruction

Avispado 222 adheres to the following specifications:
● RISC-V Instruction Set Manual, Volume I: Unprivileged ISA, version 20200125
● RISC-V Instruction Set Manual, Volume II: Privileged Architecture, v1.12
● RISC-V Debug Support v1.0.0

Avispado 222 supports SV39 and SV48 virtual memory and implements Semidynamics'
Gazillion Misses™ technology, allowing it to support up to 128 outstanding misses.

1.1 Supported Modes


Avispado 222 supports RISC‑V supervisor and user modes, providing three levels of
privilege: machine (M), user (U), and supervisor (S). See The RISC‑V Instruction Set
Manual, Volume II: Privileged Architecture, Version 1.10 for more information on the
privilege modes.
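
As a cross-check of the extension list and privilege modes above, the following sketch computes the misa value that the RISC-V Privileged Architecture encoding rules would give for this configuration. The manual does not state the actual misa reset value, so the result is an expectation derived from the specification rather than a documented value. The function name expected_misa is made up for this illustration.

```python
# Sketch: misa encoding implied by the documented extensions and modes.
# Assumption: the core reports exactly these letters; the manual does not
# give the real misa reset value.

def expected_misa(letters: str, xlen: int = 64) -> int:
    """Build a misa value: MXL in the top two bits, one bit per letter."""
    mxl = {32: 1, 64: 2, 128: 3}[xlen]
    value = mxl << (xlen - 2)
    for letter in letters.upper():
        value |= 1 << (ord(letter) - ord("A"))
    return value

# A, C, D, F, I, M extensions plus the S and U privilege modes.
# Zicsr and Zifencei have no misa bit.
misa = expected_misa("ACDFIMSU")
print(hex(misa))
```

Reading misa on silicon and comparing it against this value is a quick way to confirm that software is running on the expected configuration.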

1.2 Block Diagram


The Avispado 222 block diagram is shown below:

1.3 Front End
The Front-End is the part of Avispado 222 responsible for providing instructions to the Execution
Units. The main blocks of the Front-End are the Branch Predictor, the Instruction Memory
System and the Decoder. The Front-End in Avispado provides up to 2 instructions per cycle (8B)
to the back-end of the machine.

Branch Prediction:

Avispado 222 implements a TAGE branch predictor. The branch predictor contains:
● 64-entry 2-way tagged Branch Target Buffer (BTB), that predicts the destination address
of taken branches and indirect jumps
● 4-entry Return Address Stack (RAS) for return instruction prediction
● 3 different tables for TAGE prediction with a total size over 4 Kbits

Correctly predicted taken branches incur no penalty, regardless of whether they are direct or
indirect, or whether the target address is word or half-word aligned. Mispredicted direct jumps are
detected in the decode stage, and the front-end is redirected to the correct address with a
penalty of 5 cycles. The branch misprediction penalty is 9 cycles.

Instruction Memory System:

The Instruction Memory system is composed of the Instruction TLB and the Instruction Cache.
The Instruction Cache is a 16 KB, 4-way set-associative cache that is virtually indexed and physically tagged.
The cache line size is 64B and replacement follows a pseudo-LRU policy. In case of a miss, a
request for the line is sent down to the next level in the memory system via AXI4. The
Instruction Memory system supports up to 8 outstanding misses from the Instruction Cache.
This feature helps to prefetch instruction lines and to have a smoother instruction delivery to the
next stages of the processor.

The Instruction Cache is not kept coherent with the rest of the memory. Hence, applications that
require modifying code must use fence.i instructions to fully invalidate the Instruction Cache
contents and synchronize with the newly produced values.

The Instruction TLB has 16 entries and it is fully-associative. It is responsible for translating
virtual to physical addresses. In case of miss, the PTW is invoked to provide the corresponding
Page Table Entry. The accessed physical addresses are checked in the PMA (Physical Memory
Attributes). In case of invalid addresses, the Instruction Memory System will raise an Instruction
Access Exception.

Decoder:

The decoder receives 512-bit cache lines from the Instruction Memory System and extracts up
to 2 instructions per cycle for the next stages. As mentioned earlier, this block also checks that
a direct jump has been predicted as taken and that the predicted address matches the one encoded in the
instruction (that is, when the destination is encoded in the instruction and does not
depend on any register). Otherwise, it redirects the front-end and flushes the subsequent
instructions.

Similarly, for predicted-taken branches whose destination address can be computed from
the instruction encoding (i.e., conditional branches), if the predicted address does not match the
computed one, the front-end is redirected and the instructions are flushed.

1.4 Issue Queues

Avispado 222 has 3 issue queues, as shown in the block diagram. Instructions are issued in
program order from each of the queues to the corresponding execution units whenever their
operands are available either in the register file or the bypass network. The issue queues are
split based on operation function as follows:
● General Issue Queue (GIQ): Holds all integer, branch, and CSR-related instructions until
operands are ready and dispatches them to the integer execution pipeline. Latencies for
most of the operations are single-cycle except for integer multiplication, division and
remainder. Some CSR writes may cause a complete pipeline flush.
● Memory Issue Queue (MIQ): Holds all memory related instructions, i.e., load, store and
atomic operations, both integer and floating point. Load-to-use latency for loads that hit
in the D-cache is 3 clock cycles.
● Floating Point Issue Queue (VFIQ): Holds all vector and floating point arithmetic
instructions. Latencies for most of the operations are fixed, except for the floating point
divide and square root, and some special vector instructions (e.g., vrgather).

1.5 Load Store Unit


The Load Store Unit is responsible for executing all memory-related instructions, which include
scalar loads, stores and atomic instructions. The Load Store Unit is fully pipelined and it is
able to accept one instruction per cycle.

Data Cache:

The Avispado 222’s Data Cache is a 32KB 8-way associative cache with a cache line of 64B. It
is virtually indexed and physically tagged and it implements pseudo-LRU for replacement. It is
write-allocate (for scalar stores) and copy-back (dirty lines only update upper levels of memory
when the line is evicted).
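
The cache parameters given for the Instruction Cache (section 1.3) and the Data Cache can be turned into set counts and address bit fields with simple arithmetic. The sketch below does that calculation; the function name cache_geometry is illustrative and not part of any Semidynamics deliverable.

```python
# Sketch: derive set count and address bit fields from cache parameters.

def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> dict:
    """Return sets, line-offset bits and index bits for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2 of the line size
    index_bits = sets.bit_length() - 1          # log2 of the number of sets
    return {
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "untranslated_bits": offset_bits + index_bits,
    }

icache = cache_geometry(16 * 1024, ways=4, line_bytes=64)
dcache = cache_geometry(32 * 1024, ways=8, line_bytes=64)
print(icache)
print(dcache)
```

Both caches come out at 64 sets, so offset plus index occupy 12 address bits, which equals the page offset of a 4 KB page. With that geometry the virtual index never uses translated address bits. Whether the sizes were chosen for this reason is not stated in the manual.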

Gazillion Misses™:

Avispado 222 has the ability to manage up to 128 outstanding data cache misses.
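
To give a feel for what 128 outstanding misses means, the sketch below applies Little's law (throughput = data in flight / latency) to 64B cache lines. The 100 ns memory latency is a hypothetical figure chosen only for the example, and the result is an upper bound that assumes every miss targets a distinct line and that the rest of the memory system can keep up.

```python
# Sketch: upper bound on miss bandwidth from Little's law.
LINE_BYTES = 64          # cache line size from this manual
MAX_OUTSTANDING = 128    # outstanding data cache misses from this manual

def bytes_in_flight(outstanding: int = MAX_OUTSTANDING,
                    line_bytes: int = LINE_BYTES) -> int:
    """Total data that can be in flight when all miss slots are used."""
    return outstanding * line_bytes

def sustainable_bandwidth(latency_ns: float) -> float:
    """Bytes per second sustainable at the given (assumed) memory latency."""
    return bytes_in_flight() / (latency_ns * 1e-9)

print(bytes_in_flight())
print(sustainable_bandwidth(100.0) / 1e9, "GB/s at a hypothetical 100 ns")
```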

Unaligned accesses:

Avispado 222’s Load Store Unit is able to deal with unaligned memory accesses. The solution is
purely hardware-based and it is able to request the two lines and merge the result in case of an
unaligned scalar load, or to write to the two lines in case of an unaligned store.
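
The two-line case described above can be modelled as splitting an access at the 64B cache line boundary. The sketch below is a behavioural illustration only. It assumes that an unaligned access contained in a single line needs just one request, which the manual does not state explicitly, and the name split_access is made up.

```python
# Sketch: split a memory access into per-cache-line pieces.
LINE_BYTES = 64

def split_access(addr: int, size: int, line_bytes: int = LINE_BYTES):
    """Return a list of (line_base, offset_in_line, nbytes) pieces."""
    if size <= 0:
        raise ValueError("size must be positive")
    pieces = []
    end = addr + size
    while addr < end:
        line_base = addr - (addr % line_bytes)
        # Stop at the end of the access or the end of the current line.
        chunk = min(end, line_base + line_bytes) - addr
        pieces.append((line_base, addr - line_base, chunk))
        addr += chunk
    return pieces

# An 8-byte load starting 4 bytes before a line boundary touches two lines.
print(split_access(0x103C, 8))
# The same load at an offset inside the line touches only one.
print(split_access(0x1008, 8))
```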

DTLB:

The Data TLB is a 32-entry, 8-way set-associative cache. It is responsible for translating virtual to
physical addresses. In case of DTLB miss, the hardware page table walker (PTW) is invoked to
provide the corresponding Page Table Entry.

PMA:

All physical addresses generated by the Load Store Unit are checked by the PMA (described in
section 3 Memory Map & PMA). If the PMA does not validate the access, an exception is
generated.

Store Buffer:

Stores are executed in the Load Store Unit but the corresponding write to memory only happens
once the store instruction has reached the commit stage. Meanwhile, pending stores are kept in
the Store Buffer. This structure has a capacity of 16 entries. Avispado 222 implements support
for store-to-load data forwarding. In case the value to be accessed is spread across several
entries of the Store Buffer or it is not fully contained in a single entry, forwarding is disabled and
the load needs to wait until the store is completed.
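
The forwarding rule above can be expressed as a small behavioural model. The sketch below is not the hardware implementation. It assumes that the youngest overlapping Store Buffer entry decides the outcome, and the names StoreEntry and lookup are hypothetical.

```python
# Sketch: behavioural model of store-to-load forwarding decisions.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class StoreEntry:
    addr: int
    data: bytes  # len(data) is the store size in bytes

def lookup(store_buffer: List[StoreEntry], addr: int,
           size: int) -> Tuple[str, Optional[bytes]]:
    """Return ("forward", data), ("wait", None) or ("cache", None)."""
    for entry in reversed(store_buffer):          # youngest entry first
        start, end = entry.addr, entry.addr + len(entry.data)
        if not (addr < end and start < addr + size):
            continue                              # no overlap with this store
        if start <= addr and addr + size <= end:  # load fully contained
            off = addr - start
            return "forward", entry.data[off:off + size]
        return "wait", None                       # partial overlap: no forwarding
    return "cache", None                          # no pending store matches

sb = [StoreEntry(0x100, bytes(range(8)))]
print(lookup(sb, 0x102, 4))   # contained in one entry
print(lookup(sb, 0x106, 4))   # only partly covered by the entry
print(lookup(sb, 0x200, 4))   # unrelated address
```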

Atomic Operations:

Avispado 222 provides support for all atomic instructions as described in the A chapter of the
RISC-V Instruction Set Architecture Volume I: Unprivileged ISA. Avispado 222 employs the
exclusive access mechanism provided in AMBA AXI4 protocol specification to read/write
memory when processing AMOs, load-reserved and store-conditional.

1.6 Floating point Unit


Avispado 222 provides full support for both single and double precision floating point arithmetic
as described in the IEEE 754-2008 floating point standard, including hardware support for
denormal values. The FPU contains units to execute all instructions required for the F
(single-precision) and D (double-precision) floating point extensions as described in the RISC-V
Instruction Set Architecture Volume I: Unprivileged ISA. It contains a fully pipelined
fused-multiply-add unit, a floating point-to-integer (and vice versa) convert unit, a floating point
comparator, and an iterative non-pipelined floating point divider and square-root unit that is able
to compute 1 bit per cycle.

1.7 MMU

Avispado 222 supports bare-metal execution. Furthermore, it supports SV39 and SV48 virtual
memory as described in the RISC-V Instruction Set Architecture Volume II: Privileged
Architecture. Avispado 222 provides a 48-bit virtual address space and up to 40 bits of physical
address space (configurable on customer request). Supported page sizes are 4 KiB, 2 MiB, 1 GiB
and 512 GiB.
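
The supported page sizes follow directly from the SV39/SV48 address layout in the Privileged Architecture: a 12-bit page offset plus 9 bits per page-table level. The sketch below reproduces that arithmetic and splits a virtual address into its VPN fields. The helper names are illustrative.

```python
# Sketch: SV39/SV48 virtual address layout and resulting page sizes.
PAGE_OFFSET_BITS = 12
VPN_BITS = 9
LEVELS = {"sv39": 3, "sv48": 4}

def page_size(level: int) -> int:
    """Size in bytes of a leaf page mapped at the given page-table level."""
    return 1 << (PAGE_OFFSET_BITS + VPN_BITS * level)

def split_va(va: int, mode: str = "sv48"):
    """Return ([VPN[0], VPN[1], ...], page_offset) for a virtual address."""
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    vpn = [(va >> (PAGE_OFFSET_BITS + VPN_BITS * i)) & ((1 << VPN_BITS) - 1)
           for i in range(LEVELS[mode])]
    return vpn, offset

for level in range(LEVELS["sv48"]):
    print(level, page_size(level))

va = (3 << 39) | (2 << 30) | (1 << 21) | (5 << 12) | 0x123
print(split_va(va, "sv48"))
```

Levels 0 to 2 give the 4 KiB, 2 MiB and 1 GiB pages available in SV39. The level 3 size of 512 GiB exists only in SV48.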

The Page Table Walker (PTW) is responsible for serving the misses from both TLBs (instruction
and data) as described in the Privileged Architecture Manual.

1.8 PMP

Avispado does not support PMP.

1.9 PMU
Avispado 222 provides support for hardware performance counters. There are 3 fixed counters
as defined in the Privileged Architecture Manual: i) cycle counter, ii) time counter and iii)
instructions retired.

Additionally, Avispado 222 provides 2 programmable Performance Monitor Counters,
hpmcounter3 and hpmcounter4. These counters are programmed by writing into the
mhpmevent3 and mhpmevent4 registers as described in the Privileged Architecture Manual. A ‘0’
value in the register indicates that the associated performance counter is not active. The remaining
values are described in the following table:

mhpmeventX[63:0] Description

1 Instruction Cache Miss

2 Data Cache Miss

3 Instruction TLB miss

4 Data TLB miss

5 Exceptions

6 ERET instructions

8 Branch decode misses

9 Branch mispredicts

10 Committed load instructions

11 Committed store instructions

12 Committed Control Flow instructions
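
For tooling that configures the counters, the event table above can be captured as an enumeration. The sketch below also emits the machine-mode assembly that would select an event and clear the matching counter. The enumeration member names and the helper program_counter_asm are made up for this example, and the generated sequence is an illustration, not vendor-provided code.

```python
# Sketch: event selectors from the table above, plus a helper that
# generates RISC-V assembly to program one of the two counters.
from enum import IntEnum

class PmuEvent(IntEnum):
    DISABLED = 0
    ICACHE_MISS = 1
    DCACHE_MISS = 2
    ITLB_MISS = 3
    DTLB_MISS = 4
    EXCEPTIONS = 5
    ERET = 6
    # Value 7 is not listed in the table.
    BRANCH_DECODE_MISS = 8
    BRANCH_MISPREDICT = 9
    COMMITTED_LOADS = 10
    COMMITTED_STORES = 11
    COMMITTED_CONTROL_FLOW = 12

def program_counter_asm(counter: int, event: PmuEvent) -> str:
    """Return M-mode assembly that selects `event` on mhpmcounter<counter>."""
    if counter not in (3, 4):
        raise ValueError("only counters 3 and 4 are implemented")
    return "\n".join([
        f"li t0, {int(event)}",
        f"csrw mhpmevent{counter}, t0",
        f"csrw mhpmcounter{counter}, zero",
    ])

print(program_counter_asm(3, PmuEvent.DCACHE_MISS))
```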

2 Avispado 222 Ports


This section presents the ports to the Avispado core. Avispado includes an AXI4 port with a
default data width of 512 bits to access upper levels of the memory hierarchy (data width can be
parameterized, valid values are 64, 128, 256 and 512 bits). Furthermore, Avispado includes an
AXI4-Lite port with a data width of 64 bits to access IO devices.

Port Name I/O Description

clk_i I Input clock

rst_ni I Active low reset

boot_addr_i I [63:0] Reset boot address

hart_id_i I [7:0] Hart ID in a multicore environment (reflected in a CSR)

irq_i I [1:0] Level-sensitive interrupt request lines, reflected in mip.meip and mip.seip (async)

ipi_i I Inter-processor interrupts (async)

time_irq_i I Timer interrupt in (async)

debug_req_i I Debug request (async)

kick_core_i I When this signal is set HIGH the core starts fetching
instructions from boot address

cb_cfg_bus_i I [63:0] Chicken bits from config registers

ddr_region_limit_i I [31:0] Determine upper limit of available memory

AXI4 signals
Port Name I/O Description

axi4_req_o.aw_ctrl.id O [14:0] Write request ID [1]

axi4_req_o.aw_ctrl.addr O [63:0] Write request address [2]

axi4_req_o.aw_ctrl.blen O [7:0] Write request burst length: number of transfers in a
burst

axi4_req_o.aw_ctrl.bsize O [2:0] Write request burst size: indicates the size of each
transfer in the burst

axi4_req_o.aw_ctrl.btype O [1:0] Write request burst type. Indicates how to calculate the
address of each transfer in the burst

axi4_req_o.aw_ctrl.lock O Write request lock type (normal/exclusive)

axi4_req_o.aw_ctrl.cache O [3:0] Write request memory type. Determines how data can
be buffered in intermediate AXI4 components.

axi4_req_o.aw_ctrl.prot O [2:0] Write request protection type. Defines the access
permissions for write accesses according to the following
table:

Bit Value Function

[0] 0 Unprivileged access (U mode)

1 Privileged access (M or S mode)

[1] 1 Non-secure access

[2] 0 Data access

axi4_req_o.aw_ctrl.qos O [3:0] Quality of service identifier for write request [3]

axi4_req_o.aw_ctrl.region O [3:0] Region identifier. Permits a single physical interface on
a slave to be used for multiple logical interfaces.

axi4_req_o.aw_valid O Write request valid signal

axi4_resp_i.aw_ready I Write request ready signal

[1] Bit width for transaction ID is parameterized in Avispado. By default, 15 bits are used.
[2] Bit width for AXI4 address can be selected by the user.
[3] QoS is used as a priority indicator for the write transaction. The higher the QoS value, the higher the
priority of the transaction.

axi4_req_o.w_data.data O [511:0] Write data [4]

axi4_req_o.w_data.strb O [63:0] Write data strobes. Byte enables indicating which
bytes must be written [5]

axi4_req_o.w_data.last O Indicates the last transfer in a write burst

axi4_req_o.w_valid O Write data valid signal

axi4_resp_i.w_ready I Write data ready signal

axi4_resp_i.b_ctrl.id I [14:0] Write response ID tag [6]

axi4_resp_i.b_ctrl.resp I [1:0] Write response code indicating the status of the write
request

axi4_resp_i.b_valid I Write response valid signal

axi4_req_o.b_ready O Write response ready signal

axi4_req_o.ar_ctrl.id O [14:0] Read request ID [7]

axi4_req_o.ar_ctrl.addr O [63:0] Read request address [8]

axi4_req_o.ar_ctrl.blen O [7:0] Read request burst length: number of transfers in a
burst

axi4_req_o.ar_ctrl.bsize O [2:0] Read request burst size: indicates the size of each
transfer in the burst

axi4_req_o.ar_ctrl.btype O [1:0] Read request burst type. Indicates how to calculate the
address of each transfer in the burst

axi4_req_o.ar_ctrl.lock O Read request lock type (normal/exclusive)

axi4_req_o.ar_ctrl.cache O [3:0] Read request memory type. Determines how data can
be buffered in intermediate AXI4 components.

axi4_req_o.ar_ctrl.prot O [2:0] Read request protection type. Defines the access
permissions for read accesses according to the following
table:

Bit Value Function

[0] 0 Unprivileged access (U mode)

1 Privileged access (S or M mode)

[1] 1 Non-secure access

[2] 0 Data access

1 Instruction access

[4] Avispado supports data bus widths of 64, 128, 256 and 512 bits. By default, 512 bits are used.
[5] Number of strobe bits is automatically adjusted to match the size of the data port in AXI4.
[6] Bit width for write transaction ID can be configured in Avispado (15 bits by default).
[7] Bit width for read transaction ID can be configured in Avispado (15 bits by default).
[8] Bit width for AXI4 address can be selected by the user.

axi4_req_o.ar_ctrl.qos O [3:0] Quality of service identifier for read request [9]

axi4_req_o.ar_valid O Read request valid signal

axi4_resp_i.ar_ready I Read request ready signal

axi4_resp_i.r_data.id I [14:0] Read ID tag [10]

axi4_resp_i.r_data.data I [511:0] Read data [11]

axi4_resp_i.r_data.resp I [1:0] Read response code indicating the status of the read
request

axi4_resp_i.r_data.last I Indicates the last transfer in a read burst

axi4_resp_i.r_valid I Read data valid signal

axi4_req_o.r_ready O Read data ready signal
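
The protection-type tables above map onto a 3-bit value as sketched below. The function name axprot and its arguments are hypothetical, and the encoding simply follows the tables: bit [0] set for M or S mode, bit [1] always set (non-secure), and bit [2] set for instruction fetches, which only occur on the read channel.

```python
# Sketch: build the 3-bit protection value from the tables above.

def axprot(mode: str, instruction: bool = False) -> int:
    """Return the prot[2:0] value for an access from privilege mode `mode`."""
    if mode not in ("M", "S", "U"):
        raise ValueError("mode must be 'M', 'S' or 'U'")
    privileged = 1 if mode in ("M", "S") else 0   # bit [0]
    non_secure = 1                                # bit [1], always 1 per the table
    return (int(instruction) << 2) | (non_secure << 1) | privileged

print(format(axprot("U"), "03b"))          # user-mode data access
print(format(axprot("M"), "03b"))          # machine-mode data access
print(format(axprot("S", True), "03b"))    # supervisor instruction fetch
```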

AXI4-Lite signals
Port Name I/O Description

axi_lite_req_o.aw.addr O [63:0] Write request address

axi_lite_req_o.aw.prot O [2:0] Write request protection type. Defines the access
permissions for write accesses according to the following
table:

Bit Value Function

[0] 0 Unprivileged access (U mode)

1 Privileged access (M or S mode)

[1] 1 Non-secure access

[2] 0 Data access

[9] QoS is used as a priority indicator for the read transaction. The higher the QoS value, the higher the
priority of the transaction.
[10] Bit width for read transaction ID can be configured in Avispado (15 bits by default).
[11] Avispado supports data bus widths of 64, 128, 256 and 512 bits. By default, 512 bits are used.

axi_lite_req_o.aw_valid O Write request valid signal

axi_lite_resp_i.aw_ready I Write request ready signal

axi_lite_req_o.w.data O [63:0] Write data

axi_lite_req_o.w.strb O [7:0] Write data strobes. Byte enables indicating which bytes
must be written

axi_lite_req_o.w_valid O Write data valid signal

axi_lite_resp_i.w_ready I Write data ready signal

axi_lite_resp_i.b.resp I [1:0] Write response code indicating the status of the write
request

axi_lite_resp_i.b_valid I Write response valid signal

axi_lite_req_o.b_ready O Write response ready signal

axi_lite_req_o.ar.addr O [63:0] Read request address

axi_lite_req_o.ar.prot O [2:0] Read request protection type. Defines the access
permissions for read accesses according to the following
table:

Bit Value Function

[0] 0 Unprivileged access (U mode)

1 Privileged access (S or M mode)

[1] 1 Non-secure access

[2] 0 Data access

1 Instruction access

axi_lite_req_o.ar_valid O Read request valid signal

axi_lite_resp_i.ar_ready I Read request ready signal

axi_lite_resp_i.r.data I [63:0] Read data

axi_lite_resp_i.r.resp I [1:0] Read response code indicating the status of the read
request

axi_lite_resp_i.r_valid I Read data valid signal

axi_lite_req_o.r_ready O Read data ready signal
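
The strobe signals in the tables above carry one byte enable per data byte, so a 64-bit port has 8 strobe bits and a 512-bit port has 64. The sketch below computes the strobe mask for a write of a given size at a given byte offset within the data bus, following the AXI rule that strobe bit n covers data bits [8n+7:8n]. How the core positions narrow writes on the bus is not described in this manual, so the offset argument is an assumption of the example.

```python
# Sketch: byte-enable (strobe) mask for a write on an AXI data bus.

def write_strobe(offset: int, nbytes: int, bus_bits: int = 64) -> int:
    """Return the strobe mask for `nbytes` bytes starting at byte lane `offset`."""
    if bus_bits not in (64, 128, 256, 512):
        raise ValueError("unsupported data bus width")
    lanes = bus_bits // 8                      # one strobe bit per byte lane
    if nbytes <= 0 or offset < 0 or offset + nbytes > lanes:
        raise ValueError("write does not fit in one bus beat")
    return ((1 << nbytes) - 1) << offset

print(hex(write_strobe(0, 8)))          # full 64-bit AXI4-Lite write
print(hex(write_strobe(4, 4)))          # upper word of a 64-bit beat
print(hex(write_strobe(60, 4, 512)))    # last word of a 512-bit AXI4 beat
```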

3 Memory Map & PMA


The memory map is fully customized to the customer specification. The provided memory map
is encoded into the PMA block and cannot be changed at run time. For each valid
physical memory region in the memory map, customers must declare the following attributes:
● R: Region is readable (1) or not (0)
● W: Region is writable (1) or not (0)
● X: Region is fetchable (1) or not (0)
● C: Region is cacheable (1) or not (0)
● A: Region supports atomic operations (1) or not (0)
● U: Region supports unaligned accesses (1) or not (0)
● QoS: 4 bits indicating the QoS associated with requests to the region
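
The attribute list above can be pictured as a static table of regions that every physical access is checked against. The sketch below models such a check. The memory map in it is entirely hypothetical and invented for the example, since the real map is customer-specific, and the names PmaRegion and pma_allows are illustrative.

```python
# Sketch: PMA-style attribute check against a static region table.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PmaRegion:
    base: int
    size: int
    r: bool     # readable
    w: bool     # writable
    x: bool     # fetchable
    c: bool     # cacheable
    a: bool     # supports atomic operations
    u: bool     # supports unaligned accesses
    qos: int    # 4-bit QoS value for requests to the region

# Hypothetical memory map, invented for this example only.
MEMORY_MAP = (
    PmaRegion(0x0001_0000, 0x1_0000, r=True, w=False, x=True,
              c=True, a=False, u=False, qos=0),
    PmaRegion(0x1000_0000, 0x1000, r=True, w=True, x=False,
              c=False, a=False, u=False, qos=15),
    PmaRegion(0x8000_0000, 0x4000_0000, r=True, w=True, x=True,
              c=True, a=True, u=True, qos=4),
)

def find_region(addr: int) -> Optional[PmaRegion]:
    for region in MEMORY_MAP:
        if region.base <= addr < region.base + region.size:
            return region
    return None

def pma_allows(addr: int, kind: str, aligned: bool = True) -> bool:
    """kind is 'r', 'w', 'x' or 'a'; False models an access exception."""
    if kind not in ("r", "w", "x", "a"):
        raise ValueError("unknown access kind")
    region = find_region(addr)
    if region is None:
        return False                 # address not in any declared region
    if not aligned and not region.u:
        return False                 # unaligned access to a region without U
    return getattr(region, kind)

print(pma_allows(0x8000_0000, "a"))
print(pma_allows(0x1000_0000, "x"))
```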

4 Implemented CSR Registers


The following table documents the CSR registers implemented in Avispado:

Number Privilege Name Description

User Floating-Point CSRs

0x001 URW fflags Floating Point Accrued Exceptions

0x002 URW frm Floating-Point Dynamic Rounding Mode.

0x003 URW fcsr Floating-Point Control and Status Register (frm +
fflags).

User Counter / Timers

0xC00 URO cycle Cycle counter for RDCYCLE instruction.

0xC01 URO time Timer for RDTIME instruction.

0xC02 URO instret Instructions-retired counter for RDINSTRET
instruction.

0xC03 URO hpmcounter3 Performance-monitoring counter.

0xC04 URO hpmcounter4 Performance-monitoring counter.

Supervisor Trap Setup

0x100 SRW sstatus Supervisor status register.

0x104 SRW sie Supervisor interrupt-enable register.

0x105 SRW stvec Supervisor trap handler base address.

0x106 SRW scounteren Supervisor counter enable.

Supervisor Trap Handling

0x140 SRW sscratch Scratch register for supervisor trap handlers.

0x141 SRW sepc Supervisor exception program counter.

0x142 SRW scause Supervisor trap cause.

0x143 SRW stval Supervisor bad address or instruction.

0x144 SRW sip Supervisor interrupt pending.

Supervisor Protection and Translation

0x180 SRW satp Supervisor address translation and protection.

Machine Information Registers

0xF11 MRO mvendorid Vendor ID.

0xF12 MRO marchid Architecture ID.

0xF13 MRO mimpid Implementation ID.

0xF14 MRO mhartid Hardware thread ID.

Machine Trap Setup

0x300 MRW mstatus Machine status register.

0x301 MRW misa ISA and extensions

0x302 MRW medeleg Machine exception delegation register.

0x303 MRW mideleg Machine interrupt delegation register.

0x304 MRW mie Machine interrupt-enable register.

0x305 MRW mtvec Machine trap-handler base address.

0x306 MRW mcounteren Machine counter enable.

Machine Trap Handling

0x340 MRW mscratch Scratch register for machine trap handlers.

0x341 MRW mepc Machine exception program counter.

0x342 MRW mcause Machine trap cause.

0x343 MRW mtval Machine bad address or instruction.

0x344 MRW mip Machine interrupt pending.

Machine Counter / Timers

0xB00 MRW mcycle Machine cycle counter.

0xB02 MRW minstret Machine instructions-retired counter.

0xB03 MRW mhpmcounter3 Machine performance-monitoring counter.

0xB04 MRW mhpmcounter4 Machine performance-monitoring counter.

Machine Counter Setup

0x320 MRW mcountinhibit Machine counter-inhibit register.

0x323 MRW mhpmevent3 Machine performance-monitoring event selector.

0x324 MRW mhpmevent4 Machine performance-monitoring event selector.

Debug/Trace Registers (shared with Debug Mode)

0x7A0 MRW tselect Debug/Trace trigger register select.

0x7A1 MRW tdata1 First Debug/Trace trigger data register.

0x7A2 MRW tdata2 Second Debug/Trace trigger data register.

0x7A3 MRW tdata3 Third Debug/Trace trigger data register.

Debug Mode Registers

0x7B0 MRW dcsr Debug control and status register.

0x7B1 MRW dpc Debug PC.

0x7B2 MRW dscratch0 Debug scratch register 0.

0x7B3 MRW dscratch1 Debug scratch register 1.

4.1 Machine Information Registers

Register Value Comment

mvendorid 0x698 Vendor ID = Semidynamics

marchid 0x80000000000000xx Avispado 222 has bits [7:0] set to 0x14

mimpid <defined at RTL freeze time> Bits [7:0] indicate the revision ID.
Bits [15:8] indicate the process node.
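
Software that needs to identify the core can decode these registers as sketched below. The field positions come from the table above. The helper names and the sample mimpid value are made up for illustration, since the real mimpid is only defined at RTL freeze.

```python
# Sketch: decode the machine information registers described above.
SEMIDYNAMICS_VENDOR_ID = 0x698
AVISPADO_222_ARCH_LOW_BYTE = 0x14

def decode_marchid(marchid: int) -> dict:
    """Extract the core identifier held in marchid[7:0]."""
    core_id = marchid & 0xFF
    return {"core_id": core_id,
            "is_avispado_222": core_id == AVISPADO_222_ARCH_LOW_BYTE}

def decode_mimpid(mimpid: int) -> dict:
    """Extract revision ID (bits [7:0]) and process node (bits [15:8])."""
    return {"revision": mimpid & 0xFF, "process_node": (mimpid >> 8) & 0xFF}

print(decode_marchid(0x8000000000000014))
print(decode_mimpid(0x0702))   # hypothetical value, for illustration only
```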

5 Interrupts and Exceptions

5.1 Interrupt Pins


Avispado 222 supports the following interrupt pins:

Pin Reflected into Type Sensitivity Name

irq[0] mip.meip Asynch Level Machine External Interrupt

irq[1] mip.seip Asynch Level Supervisor External Interrupt

ipi mip.msip Asynch Level Machine Software Interrupt

time_irq mip.mtip Asynch Level Machine Timer interrupt

debug_req N/A Asynch Level Debug request

5.2 Interrupt Priorities


Avispado prioritizes interrupts as follows, in decreasing order of priority:
● Machine external interrupts
● Machine software interrupts
● Machine timer interrupts
● Supervisor external interrupts
● Supervisor software interrupts
● Supervisor timer interrupts
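
The priority order above can be illustrated with a small selection function over the pending-and-enabled interrupt bits. The bit positions are the standard mip/mie positions from the Privileged Architecture. The function name is hypothetical, and the sketch ignores global interrupt enables and delegation.

```python
# Sketch: pick the highest-priority interrupt from (mip & mie).
MIP_BITS = {"SSIP": 1, "MSIP": 3, "STIP": 5, "MTIP": 7, "SEIP": 9, "MEIP": 11}

# Decreasing priority, as listed in this section.
PRIORITY = ["MEIP", "MSIP", "MTIP", "SEIP", "SSIP", "STIP"]

def highest_priority(pending_and_enabled: int):
    """Return the name of the interrupt to take first, or None if none is set."""
    for name in PRIORITY:
        if pending_and_enabled & (1 << MIP_BITS[name]):
            return name
    return None

# Machine timer and supervisor external both pending: the timer wins.
print(highest_priority((1 << 7) | (1 << 9)))
# Supervisor software and supervisor timer pending: software wins.
print(highest_priority((1 << 1) | (1 << 5)))
```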
