Version 1.0
04/28/2022
Contents
1 Avispado 222 Features
1.1 Supported Modes
1.2 Block Diagram
1.3 Front End
1.4 Issue Queues
1.5 Load Store Unit
1.6 Floating point Unit
1.7 MMU
1.8 PMP
1.9 PMU
1 Avispado 222 Features
Avispado 222 is a 12-stage, 3-way-issue, 2-way-commit, in-order, 64-bit, RISC-V core that
supports the following RISC-V extensions:
● A: Atomic instructions
● C: Compressed instructions
● D: Double-Precision Floating Point instructions
● F: Single-Precision Floating Point instructions
● I: Base integer instruction set
● M: Integer Multiplication and Division instructions
● Zicsr: Control and Status Register instructions
● Zifencei: fence.i instruction
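For toolchain setup, the extension list above can be written as a RISC-V ISA string. The following is a minimal sketch that assembles one using the canonical single-letter ordering of the RISC-V naming convention. The helper name `isa_string` is illustrative, and whether a given compiler accepts exactly this string for Avispado 222 is an assumption to check against the toolchain documentation.

```python
# Canonical order of the single-letter extensions listed above
# (RISC-V ISA naming convention: I, M, A, F, D, C).
CANONICAL_ORDER = "imafdc"

def isa_string(xlen, single_letter, multi_letter):
    """Build an ISA string from single-letter and multi-letter extension names."""
    letters = "".join(ch for ch in CANONICAL_ORDER if ch in single_letter)
    # Multi-letter extensions are appended with underscores, sorted by name.
    suffix = "".join("_" + ext for ext in sorted(multi_letter))
    return f"rv{xlen}{letters}{suffix}"

# Hypothetical constant name; the extension set is the one listed in this section.
AVISPADO_222_ISA = isa_string(64, set("acdfim"), ["zicsr", "zifencei"])
print(AVISPADO_222_ISA)  # rv64imafdc_zicsr_zifencei
```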
Avispado 222 supports SV39 and SV48 virtual memory and it implements Semidynamics
Gazillion Misses™ technology, allowing it to support up to 128 outstanding misses.
1.3 Front End
The Front-End is the part of Avispado 222 responsible for providing instructions to the Execution
Units. The main blocks of the Front-End are the Branch Predictor, the Instruction Memory
System and the Decoder. The Front-End in Avispado provides up to 2 instructions per cycle (8B)
to the back-end of the machine.
Branch Prediction:
Avispado 222 implements a TAGE branch predictor. The branch predictor contains:
● 64-entry 2-way tagged Branch Target Buffer (BTB), that predicts the destination address
of taken branches and indirect jumps
● 4-entry Return Address Stack (RAS) for return instruction prediction
● 3 different tables for TAGE prediction with a total size over 4 Kbits
Correctly predicted taken branches have no penalty, regardless of whether they are direct or
indirect, or whether the target address is word or half-word aligned. Mispredicted direct jumps are
detected in the decode stage, and the front-end is redirected to the correct address with a
penalty of 5 cycles. The branch misprediction penalty is 9 cycles.
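To give a feel for what these penalties mean for a workload, the sketch below estimates the cycles lost to front-end redirects from two event counts. The function name and the example counts are hypothetical. The model simply multiplies each count by the penalty stated above and ignores any overlap with other stalls.

```python
DECODE_REDIRECT_PENALTY = 5    # cycles: mispredicted direct jump caught in decode
BRANCH_MISPREDICT_PENALTY = 9  # cycles: branch misprediction

def redirect_cycles(decode_redirects, branch_mispredicts):
    """First-order estimate of cycles lost to front-end redirects."""
    return (decode_redirects * DECODE_REDIRECT_PENALTY
            + branch_mispredicts * BRANCH_MISPREDICT_PENALTY)

# Example with made-up counts: 10 decode redirects and 4 branch mispredictions.
print(redirect_cycles(10, 4))  # 86
```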
The Instruction Memory System is composed of the Instruction TLB and the Instruction Cache.
The Instruction Cache is a 16 KB, 4-way associative cache that is virtually indexed and physically tagged.
The cache line size is 64B and the cache implements a pseudo-LRU replacement policy. In case of a miss, a
request for the line is sent down to the next level in the memory system via AXI4. The
Instruction Memory System supports up to 8 outstanding misses from the Instruction Cache.
This feature helps to prefetch instruction lines and to deliver instructions more smoothly to the
next stages of the processor.
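The cache parameters above fix the number of sets, and they also show why virtual indexing is straightforward here. The sketch below works through the arithmetic. The function names are illustrative, and the observation about the 4KB page offset is derived from the stated numbers rather than taken from the datasheet.

```python
def cache_sets(size_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * line_bytes)

def index_fits_in_page_offset(size_bytes, ways, page_bytes=4096):
    """True when one way spans no more than one page, so the set index
    uses only address bits that are identical in virtual and physical addresses."""
    return size_bytes // ways <= page_bytes

ICACHE_SETS = cache_sets(16 * 1024, 4, 64)
print(ICACHE_SETS)                                 # 64
print(index_fits_in_page_offset(16 * 1024, 4))     # True
```

With 64 sets of 64B lines, the index and line offset together use 12 address bits, which matches the offset of the smallest (4KB) page.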
The Instruction Cache is not kept coherent with the rest of the memory. Hence, applications that
require modifying code must use fence.i instructions to fully invalidate the Instruction Cache
contents and synchronize with the newly produced values.
The Instruction TLB has 16 entries and is fully associative. It is responsible for translating
virtual to physical addresses. In case of a miss, the PTW is invoked to provide the corresponding
Page Table Entry. The accessed physical addresses are checked against the PMA (Physical Memory
Attributes). In case of an invalid address, the Instruction Memory System raises an Instruction
Access Exception.
Decoder:
The decoder receives 512-bit cache lines from the Instruction Memory System and extracts up
to 2 instructions per cycle for the next stages. As mentioned earlier, this block also checks that
a direct jump has been predicted as taken and that the predicted address matches the one in the
instruction (that is, when the destination is encoded in the instruction and does not
depend on any register). Otherwise, it redirects the front-end and flushes the forthcoming
instructions.
Similarly, for predicted-taken branches whose destination address can be computed from
the instruction binary (i.e., conditional branches), if the predicted address does not match the
computed one, the front-end is redirected and the instructions are flushed.
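The decode-stage check described above can be illustrated for the 32-bit `JAL` encoding, whose target depends only on the PC and the immediate in the instruction. This is a minimal sketch of the comparison, not a description of the actual hardware. The function names are illustrative, and the immediate layout follows the J-type format of the RISC-V unprivileged specification.

```python
MASK64 = (1 << 64) - 1

def jal_offset(insn):
    """Sign-extended J-type immediate of a 32-bit JAL instruction word."""
    imm = (((insn >> 31) & 0x1) << 20       # imm[20]
           | ((insn >> 12) & 0xFF) << 12    # imm[19:12]
           | ((insn >> 20) & 0x1) << 11     # imm[11]
           | ((insn >> 21) & 0x3FF) << 1)   # imm[10:1]
    return imm - (1 << 21) if imm & (1 << 20) else imm

def needs_redirect(pc, insn, predicted_target):
    """True when the predicted target differs from the target encoded in the instruction."""
    actual = (pc + jal_offset(insn)) & MASK64
    return actual != predicted_target

# 0x008000EF encodes "jal ra, 8"; 0xFFDFF06F encodes "jal x0, -4".
print(jal_offset(0x008000EF))                      # 8
print(jal_offset(0xFFDFF06F))                      # -4
print(needs_redirect(0x1000, 0x008000EF, 0x1008))  # False
```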
1.4 Issue Queues
Avispado 222 has 3 issue queues, as shown in the block diagram. Instructions are issued in
program order from each of the queues to the corresponding execution units whenever their
operands are available either in the register file or the bypass network. The issue queues are
split based on operation function as follows:
● General Issue Queue (GIQ): Holds all integer, branch, and CSR-related instructions until
operands are ready and dispatches them to the integer execution pipeline. Latencies for
most of the operations are single-cycle, except for integer multiplication, division and
remainder. Some CSR writes may cause a complete pipeline flush.
● Memory Issue Queue (MIQ): Holds all memory-related instructions, i.e., load, store and
atomic operations, both integer and floating point. Load-to-use latency for loads that hit
in the D-cache is 3 clock cycles.
● Floating Point Issue Queue (VFIQ): Holds all vector and floating point arithmetic
instructions. Latencies for most of the operations are fixed, except for the floating point
divide and square root, and some special vector instructions (e.g., vrgather).
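The split into three in-order queues can be pictured with a small behavioral model. The sketch below routes instructions to a queue by class and issues only the head of each queue when its operands are ready. The class names, the one-issue-per-queue-per-cycle limit and the function names are assumptions made for illustration, not details taken from the datasheet.

```python
from collections import deque

# Assumed instruction classes, mapped to the queues described above.
QUEUE_FOR_CLASS = {
    "int": "GIQ", "branch": "GIQ", "csr": "GIQ",
    "load": "MIQ", "store": "MIQ", "atomic": "MIQ",
    "fp": "VFIQ", "vector": "VFIQ",
}

def dispatch(instructions):
    """Place (name, class) pairs into their issue queues in program order."""
    queues = {"GIQ": deque(), "MIQ": deque(), "VFIQ": deque()}
    for name, cls in instructions:
        queues[QUEUE_FOR_CLASS[cls]].append(name)
    return queues

def issue_one_cycle(queues, ready):
    """Issue the head of each queue if its operands are ready (in-order per queue)."""
    issued = []
    for queue in queues.values():
        if queue and queue[0] in ready:
            issued.append(queue.popleft())
    return issued

queues = dispatch([("add", "int"), ("lw", "load"), ("fadd", "fp"), ("sub", "int")])
print(issue_one_cycle(queues, {"add", "fadd", "sub"}))  # ['add', 'fadd']
```

In this model `sub` is ready but waits behind `add` in the GIQ, which is the in-order property within each queue.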
1.5 Load Store Unit
Data Cache:
Avispado 222’s Data Cache is a 32KB, 8-way associative cache with a 64B cache line. It
is virtually indexed and physically tagged, and it implements pseudo-LRU replacement. It is
write-allocate (for scalar stores) and copy-back (dirty lines only update upper levels of memory
when the line is evicted).
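As an illustration of what these parameters imply, the sketch below splits an address into tag, set index and line offset for a 32KB, 8-way cache with 64B lines. It assumes the conventional layout in which the low-order bits above the line offset select the set; the datasheet does not spell out the exact bit assignment.

```python
LINE_BYTES = 64
WAYS = 8
CACHE_BYTES = 32 * 1024
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # 64 sets

def split_address(addr):
    """Return (tag, set index, byte offset within the line) for an address."""
    offset = addr % LINE_BYTES
    index = (addr // LINE_BYTES) % NUM_SETS
    tag = addr // (LINE_BYTES * NUM_SETS)
    return tag, index, offset

print(NUM_SETS)                # 64
print(split_address(0x12345))  # (18, 13, 5)
```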
Gazillion Misses™:
Avispado 222 has the ability to manage up to 128 outstanding data cache misses.
Unaligned accesses:
Avispado 222’s Load Store Unit is able to handle unaligned memory accesses. The solution is
purely hardware-based: it requests the two lines and merges the result in case of an
unaligned scalar load, or writes to the two lines in case of an unaligned store.
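The two-line case can be made concrete with a short calculation. The sketch below returns the 64B-aligned line addresses that an access touches. It assumes that only accesses crossing a line boundary need two requests, which the text above implies but does not state explicitly, and the function name is illustrative.

```python
LINE_BYTES = 64

def lines_touched(addr, size):
    """Line-aligned addresses covered by an access of `size` bytes at `addr`."""
    first = addr // LINE_BYTES * LINE_BYTES
    last = (addr + size - 1) // LINE_BYTES * LINE_BYTES
    return [first] if first == last else [first, last]

print([hex(a) for a in lines_touched(0x1038, 8)])  # ['0x1000']
print([hex(a) for a in lines_touched(0x103C, 8)])  # ['0x1000', '0x1040']
```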
DTLB:
The Data TLB is a 32-entry, 8-way set-associative cache. It is responsible for translating virtual to
physical addresses. In case of a DTLB miss, the hardware page table walker (PTW) is invoked to
provide the corresponding Page Table Entry.
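A quick calculation shows what the TLB sizes mean in practice. The sketch below derives the number of DTLB sets and the memory covered by each TLB when every entry maps a 4KB page. Whether entries can also hold larger pages is not stated here, so the reach figures are a lower-bound assumption, and the function names are illustrative.

```python
def tlb_sets(entries, ways):
    """Number of sets in a set-associative TLB."""
    return entries // ways

def tlb_reach(entries, page_bytes=4096):
    """Bytes of address space covered when every entry maps one page."""
    return entries * page_bytes

DTLB_SETS = tlb_sets(32, 8)
print(DTLB_SETS)               # 4
print(tlb_reach(32) // 1024)   # 128 (KB covered by the 32-entry DTLB)
print(tlb_reach(16) // 1024)   # 64  (KB covered by the 16-entry ITLB)
```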
PMA:
All physical addresses generated by the Load Store Unit are checked by the PMA (described in
section 4 Memory Map & PMA). If the PMA does not validate the access, an exception is
generated.
Store Buffer:
Stores are executed in the Load Store Unit but the corresponding write to memory only happens
once the store instruction has reached the commit stage. Meanwhile, pending stores are kept in
the Store Buffer. This structure has a capacity of 16 entries. Avispado 222 implements support
for store-to-load data forwarding. In case the value to be accessed is spread across several
entries of the Store Buffer or it is not fully contained in a single entry, forwarding is disabled and
the load needs to wait until the store is completed.
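The forwarding rule above can be sketched as a small decision function. The model below treats the Store Buffer as a list of `(address, data)` entries and returns whether a load reads the cache, forwards from a store, or waits. The data representation and names are hypothetical, and the model conservatively waits whenever more than one entry overlaps the load, which is one reading of the rule rather than a confirmed hardware detail.

```python
def resolve_load(store_buffer, load_addr, load_size):
    """Decide how a load is served given pending stores.

    store_buffer: list of (addr, data_bytes) entries for uncommitted stores.
    Returns ("cache", None), ("forward", bytes) or ("wait", None).
    """
    load_end = load_addr + load_size
    hits = [(addr, data) for addr, data in store_buffer
            if addr < load_end and load_addr < addr + len(data)]
    if not hits:
        return ("cache", None)          # no pending store overlaps the load
    if len(hits) == 1:
        addr, data = hits[0]
        if addr <= load_addr and load_end <= addr + len(data):
            start = load_addr - addr    # load fully contained in one entry
            return ("forward", data[start:start + load_size])
    return ("wait", None)               # partial or multi-entry overlap

pending = [(0x100, bytes([1, 2, 3, 4, 5, 6, 7, 8]))]
print(resolve_load(pending, 0x104, 4))  # ('forward', b'\x05\x06\x07\x08')
print(resolve_load(pending, 0x106, 4))  # ('wait', None)
```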
Atomic Operations:
Avispado 222 provides support for all atomic instructions as described in the A chapter of the
RISC-V Instruction Set Architecture Volume I: Unprivileged ISA. Avispado 222 employs the
exclusive access mechanism provided in AMBA AXI4 protocol specification to read/write
memory when processing AMOs, load-reserved and store-conditional.
comparator, and an iterative non-pipelined floating point divider and square-root unit that is able
to compute 1 bit per cycle.
1.7 MMU
Avispado 222 supports bare-metal execution. Furthermore, it supports SV39 and SV48 virtual
memory as described in the RISC-V Instruction Set Architecture Volume II: Privileged
Architecture. Avispado 222 provides a 48-bit virtual address space and up to 40 bits of physical
address space (configurable on customer request). Supported page sizes are 4KB, 2MB, 1GB
and 512GB.
The Page Table Walker (PTW) is responsible for serving the misses from both TLBs (instruction
and data) as described in the Privileged Architecture Manual.
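To relate the supported page sizes to the page table structure, the sketch below splits a virtual address into its VPN fields and page offset, following the Sv39 and Sv48 layouts in the Privileged Architecture (12-bit page offset, 9-bit VPN fields). The function names are illustrative; the sketch shows the address format only and does not model the PTW itself.

```python
PAGE_SHIFT = 12                  # 4KB base pages
VPN_BITS = 9                     # bits per page table level
LEVELS = {"sv39": 3, "sv48": 4}

def split_va(va, mode):
    """Return ([VPN[0], VPN[1], ...], page offset) for a virtual address."""
    offset = va & ((1 << PAGE_SHIFT) - 1)
    vpn = [(va >> (PAGE_SHIFT + VPN_BITS * i)) & ((1 << VPN_BITS) - 1)
           for i in range(LEVELS[mode])]
    return vpn, offset

def leaf_page_size(level):
    """Bytes mapped by a leaf entry at the given level (0 is the last level)."""
    return 1 << (PAGE_SHIFT + VPN_BITS * level)

print([leaf_page_size(level) for level in range(4)])
# [4096, 2097152, 1073741824, 549755813888]
```

The four values correspond to the 4KB, 2MB, 1GB and 512GB page sizes listed above; the largest is reachable only in SV48, which has four levels.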
1.8 PMP
1.9 PMU
Avispado 222 provides support for hardware performance counters. There are 3 fixed counters
as defined in the Privileged Architecture Manual: i) cycle counter, ii) time counter and iii)
instructions retired.
mhpeventX[63:0] Description
5 Exceptions
6 ERET instructions
8 Branch decode misses
9 Branch mispredicts
kick_core_i I When this signal is set HIGH, the core starts fetching
instructions from the boot address
AXI4 signals
Port Name I/O Description
axi4_req_o.aw_ctrl.bsize O [2:0] Write request burst size: indicates the size of each
transfer in the burst
axi4_req_o.aw_ctrl.btype O [1:0] Write request burst type. Indicates how to calculate the
address of each transfer in the burst
axi4_req_o.aw_ctrl.cache O [3:0] Write request memory type. Determines how data can
be buffered in intermediate AXI4 components.
1. Bit width for transaction ID is parameterized in Avispado. By default, 15 bits are used.
2. Bit width for AXI4 address can be selected by the user.
3. QoS is used as a priority indicator for the write transaction. The higher the QoS value, the higher the
priority of the transaction.
axi4_req_o.w_data.data O [511:0] Write data (footnote 4)
axi4_resp_i.b_ctrl.resp I [1:0] Write response code indicating the status of the write
request
axi4_req_o.ar_ctrl.bsize O [2:0] Read request burst size: indicates the size of each
transfer in the burst
axi4_req_o.ar_ctrl.btype O [1:0] Read request burst type. Indicates how to calculate the
address of each transfer in the burst
axi4_req_o.ar_ctrl.cache O [3:0] Read request memory type. Determines how data can
be buffered in intermediate AXI4 components.
4. Avispado supports data bus widths of 64, 128, 256 and 512 bits. By default, 512 bits are used.
5. The number of strobe bits is automatically adjusted to match the size of the AXI4 data port.
6. Bit width for write transaction ID can be configured in Avispado (15 bits by default).
7. Bit width for read transaction ID can be configured in Avispado (15 bits by default).
8. Bit width for AXI4 address can be selected by the user.
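The burst size field and the configurable data bus width interact when a 64B cache line is transferred. The sketch below computes the AXI4 size and length encodings for each supported bus width, using the standard AXI4 definitions (size is log2 of bytes per beat, length is beats minus one). It assumes a full line moves in a single burst of full-width beats, which is an illustrative assumption and not a statement about how Avispado issues its requests.

```python
LINE_BYTES = 64

def axi_burst_for_line(bus_bits):
    """Return (size encoding, length encoding) for one 64B line on the given bus width."""
    bytes_per_beat = bus_bits // 8
    size_code = bytes_per_beat.bit_length() - 1  # log2(bytes per beat)
    beats = LINE_BYTES // bytes_per_beat
    length_code = beats - 1                      # AXI4 encodes beats - 1
    return size_code, length_code

for width in (64, 128, 256, 512):
    print(width, axi_burst_for_line(width))
# 64 (3, 7)
# 128 (4, 3)
# 256 (5, 1)
# 512 (6, 0)
```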
1 Privileged access (S or M mode)
1 Instruction access
axi4_resp_i.r_data.resp I [1:0] Read response code indicating the status of the read
request
AXI4-Lite signals
Port Name I/O Description
9. QoS is used as a priority indicator for the read transaction. The higher the QoS value, the higher the
priority of the transaction.
10. Bit width for read transaction ID can be configured in Avispado (15 bits by default).
11. Avispado supports data bus widths of 64, 128, 256 and 512 bits. By default, 512 bits are used.
[2] 0 Data access
axi_lite_req_o.w.strb O [7:0] Write data strobes. Byte enables indicating which bytes
must be written
axi_lite_resp_i.b.resp I [1:0] Write response code indicating the status of the write
request
1 Instruction access
axi_lite_resp_i.r.resp I [1:0] Read response code indicating the status of the read
request
axi_lite_resp_i.r_valid I Read data valid signal
0xC03 URO hpmcounter3 Performance-monitoring counter.
0x306 MRW mcounteren Machine counter enable.
4.1 Machine Information Registers