
Architecture and Components

Use Cases
• RTL simulation running test binaries/micro-benchmarks
• Custom SoC architecture: new blocks + reusing existing blocks
• FPGA-accelerated simulation running full workloads
• FPGA prototyping for fast cool demos
• Tape out the SoC to get actual silicon results

Organization
What is Chipyard?
• An organized framework for
various SoC design tools
• A curated IP library of open-source RISC-V SoC
components
• A methodology for agile SoC
architecture design,
exploration, and evaluation

Organization
SoC architecture and
generators

SoC Architecture
[Figure: Digital SoC Architecture. Blocks shown: RocketTile and BoomTile (each with a core, PTW, L1I$, L1D$, and RoCC accelerator), Tile Buses, an MMIO accelerator, System Bus, Periphery Bus, Front Bus, Control Bus, Memory Bus, L2 banks, DRAM channels, BootROM, SerDes, UART, GPIOs, PLIC, CLINT, and Debug.]

Tiles and Cores
Tiles:
• Each Tile contains a RISC-V core and private caches
• Several varieties of cores supported
• Interface supports integrating your own RISC-V core implementation

Rocket and BOOM

Rocket:
• First open-source RISC-V CPU
• In-order, single-issue RV64GC core
• Efficient design point for low-power devices
SonicBOOM:
• Superscalar out-of-order RISC-V CPU
• Advanced microarchitectural features to maximize IPC
• TAGE branch prediction, OOO load-store unit, register renaming
• High-performance design point for general-purpose

Rocket and SonicBOOM:
• Support RV64GC ISA profile
• Boots off-the-shelf RISC-V Linux distros (buildroot,
Fedora, etc.)
• Fully synthesizable, tapeout-proven
• Described in Chisel
• Fully open-sourced


PULP Cores in Chipyard

CVA6 (formerly Ariane):
• RV64IMAC 6-stage single-issue in-order core
• Open-source
• Implemented in SystemVerilog
• Developed at ETH Zurich as part of PULP
• Now maintained by OpenHW Group
Ibex (formerly Zero-riscy):
• RV32IMC 2-stage single-issue in-order core
• Open-source
• Implemented in SystemVerilog
• Developed at ETH Zurich as part of PULP
• Now maintained by lowRISC

Sodor Education Cores
Sodor Core Collection
• Collection of RV32IM cores for teaching and education
• 1-stage, 2-stage, 3-stage, 5-stage implementations
• Micro-coded “bus-based” RISC-V implementation
• Used in introductory computer architecture courses at Berkeley

Spike-as-a-Tile
New in Chipyard 1.9.0
Spike:
• Open-source RISC-V ISA simulator
• Fast, extensible, C++ functional model
Spike-as-a-Tile:
• DPI interface between RTL SoC simulation and SoC functional model
• Spike “virtual platform”
• Enables testing of complex device software in RTL simulation
[Figure: Digital SoC Architecture with a Spike Tile wrapping libspike::processor_t, attached to the SoC through DPI calls.]

RoCC Accelerators
RoCC Accelerators:
• Tightly-coupled accelerator interface
• Attach custom accelerators to Rocket or BOOM cores

1. Core automatically decodes + sends custom instructions to accelerator
2. Accelerator can write back into core registers
3. Accelerator can support virtual addressing by sharing core PTW/TLB
4. Accelerator can fetch-from/write-to coherent L1 data cache or outer memory

Flexible interface supports a variety of accelerator designs
Included in Chipyard:
• Gemmini ML accelerator
• Hwacha vector accelerator
• SHA3 accelerator
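
Step 1 above (the core decoding custom instructions and handing them to the accelerator) can be sketched in plain Scala. This is a toy software model, not Rocket Chip or BOOM code: the names RoccDecodeSketch, RoccInst, encode, and decode are hypothetical. The bit layout (funct7, rs2, rs1, xd, xs1, xs2, rd, opcode) and the four custom-0 to custom-3 major opcodes follow the commonly documented RoCC instruction format, which is an assumption brought in from outside these slides.

```scala
// Toy decoder for RoCC-style custom instructions (illustrative only; not Rocket Chip code).
object RoccDecodeSketch {
  // Fields of a RoCC instruction word, from most- to least-significant bits.
  final case class RoccInst(
    funct7: Int, rs2: Int, rs1: Int,
    xd: Boolean, xs1: Boolean, xs2: Boolean,
    rd: Int, opcode: Int)

  // The four major opcodes RISC-V sets aside as custom-0 .. custom-3.
  val customOpcodes: Set[Int] = Set(0x0B, 0x2B, 0x5B, 0x7B)

  // Pack fields into a 32-bit word (helper for building example inputs).
  def encode(i: RoccInst): Int = {
    def bit(b: Boolean): Int = if (b) 1 else 0
    (i.funct7 << 25) | (i.rs2 << 20) | (i.rs1 << 15) |
      (bit(i.xd) << 14) | (bit(i.xs1) << 13) | (bit(i.xs2) << 12) |
      (i.rd << 7) | i.opcode
  }

  // Some(fields) when the word uses a custom opcode and would go to an accelerator, None otherwise.
  def decode(word: Int): Option[RoccInst] = {
    val opcode = word & 0x7F
    if (!customOpcodes.contains(opcode)) None
    else Some(RoccInst(
      funct7 = (word >>> 25) & 0x7F,
      rs2    = (word >>> 20) & 0x1F,
      rs1    = (word >>> 15) & 0x1F,
      xd     = ((word >>> 14) & 1) == 1,
      xs1    = ((word >>> 13) & 1) == 1,
      xs2    = ((word >>> 12) & 1) == 1,
      rd     = (word >>> 7) & 0x1F,
      opcode = opcode))
  }
}
```

In this sketch the xd flag marks an instruction whose result the accelerator writes back to a core register (step 2), while xs1 and xs2 mark which source registers the core forwards along with the instruction.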

MMIO Accelerators
MMIO Accelerators:
• Controlled by MMIO-mapped registers
• Supports DMA to memory system
• Examples:
• Nvidia NVDLA accelerator
• FFT accelerator generator

Coherent Interconnect
TileLink Standard:
• TileLink is an open-source chip-scale interconnect standard
• Comparable to AXI/ACE
• Supports multi-core, accelerators, peripherals, DMA, etc.
Interconnect IP:
• Library of TileLink RTL generators provided in RocketChip
• RTL generators for crossbar-based buses
• Width-adapters, clock-crossings, etc.
• Adapters to AXI4, APB

Protocol Shims
AMBA-to-TileLink shims enable easy integration with existing IP
• Works for cores/peripherals/accelerators
• Drop-in Verilog integration of CVA6, NVDLA
[Figure: CVA6WrapperTile (CVA6 Verilog core with PTW, L1I$, L1D$) and NVDLA Wrapper (NVDLA Verilog), each attached over AXI4 through an AXI4ToTL shim.]

NoC Interconnect
Constellation
• Drop-in replacement for TileLink crossbar buses
[Figure: Digital SoC Architecture with a Constellation Network-on-Chip Interconnect between the Tile Buses and the Control and Memory Buses.]

Constellation
A parameterized Chisel generator for SoC interconnects
• Protocol-independent transport layer
• Supports TileLink, AXI-4
• Highly parameterized
• Deadlock-freedom
• Virtual-channel wormhole-routing

L2/DRAM
Shared memory:
• Open-source TileLink L2 developed by SiFive
• Directory-based coherence with MOESI-like protocol
• Configurable capacity/banking
• Support broadcast-based coherence in no-L2 systems
• Support incoherent memory systems
DRAM:
• AXI-4 DRAM interface to external memory controller
• Interfaces with DRAMSim

Peripherals and IO
Peripherals and IO:
• Open-source RocketChip blocks
• Interrupt controllers
• JTAG, Debug module, BootROM
• UART, GPIOs, SPI, I2C, PWM, etc.
• TestChipIP: useful IP for test chips
• Clock-management devices
• SerDes
• Scratchpads

Organization
SoC Configuration

Composable Configurations
class CustomConfig extends Config(
  new WithL1CacheWays(4) ++
  new WithAsyncTiles ++
  new WithSystemBusWidth(128) ++
  new WithFPGemmini ++
  new With3WideBooms ++
  new WithL2TLBs(512) ++
  new WithL2Sets(1024) ++
  new WithDefaultGemmini ++
  new WithNRocketCores(1) ++
  new WithNBoomCores(1) ++
  new WithBootROM ++
  new WithUART ++
  new WithJtagDTM ++
  new WithGPIOs ++
  new WithInclusiveCache(512)
)
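
The way fragments compose with `++` can be sketched with a small stand-alone model. This is not the rocket-chip Config class: Fragment, withSystemBusWidth, withNRocketCores, baseConfig, and the string parameter names are all hypothetical. The one behavior it tries to mirror is that a lookup consults fragments from left to right, so a fragment listed earlier overrides one listed later (my understanding of the rocket-chip convention, stated here as an assumption).

```scala
// Toy model of composable config fragments (illustrative only; not the rocket-chip Config class).
object ConfigSketch {
  // A fragment answers lookups only for the parameter names it defines.
  final case class Fragment(lookup: PartialFunction[String, Int]) {
    // `a ++ b` asks `a` first and falls back to `b`, so earlier fragments override later ones.
    def ++(that: Fragment): Fragment = Fragment(lookup.orElse(that.lookup))
    def get(key: String): Option[Int] = lookup.lift(key)
  }

  // Hypothetical fragments and parameter names, loosely mirroring the slide.
  def withSystemBusWidth(bits: Int): Fragment =
    Fragment { case "systemBusWidth" => bits }
  def withNRocketCores(n: Int): Fragment =
    Fragment { case "nRocketCores" => n }

  // Defaults sit at the end of the chain and are used only when nothing earlier answers.
  val baseConfig: Fragment = Fragment {
    case "systemBusWidth" => 64
    case "nRocketCores"   => 0
    case "nBoomCores"     => 0
  }

  val customConfig: Fragment =
    withSystemBusWidth(128) ++ withNRocketCores(1) ++ baseConfig
}
```

Here `ConfigSketch.customConfig.get("systemBusWidth")` resolves to the overriding fragment's 128 instead of the default 64, while `get("nBoomCores")` falls through to the default, which is why reordering fragments in a real config can change the generated SoC.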
Organization
SW RTL Simulation:
• RTL-level simulation with Verilator or VCS
• Hands-on tutorial next
FPGA prototyping:
• Fast, non-deterministic prototypes
• Bringup platform for taped-out chips
Hammer VLSI flow:
• Tape out a custom config in some process technology
• Overview of flow later
FireSim:
• Fast, accurate FPGA-accelerated simulations
• Hands-on tutorial later
Organization

IO and Harness configuration

Multipurpose
[Figure: TestHarness, FireSimHarness, and ChipHarness each wrap the same ChipTop, which wraps DigitalTop. Three configuration layers are labeled: digital system configuration (DigitalTop); chip IO configuration (IOCells, PLL, SerDes, analog); and harness configuration (simulation models such as SimUART.cc, SimJTAG.cc, SimSerial.cc, SimGPIOs.cc, DRAMSim.cc, and TestDriver.v; FireSim bridges such as UARTBridge, SerialBridge, ClockBridge, AXI4Bridge, and FASED; and host UART, host serial, clock driver, and FMC connections to a tethered FPGA).]

Configuring IO + Harness
class CustomConfig extends Config(
  // Digital System Config (DigitalTop)
  new WithDefaultGemmini ++
  new WithNRocketCores(1) ++
  new WithNBoomCores(1) ++
  new WithBootROM ++
  new WithUART ++
  new WithJtagDTM ++
  new WithGPIOs ++
  new WithInclusiveCache(512) ++
  // IO Binders: Chip IO (ChipTop)
  new WithIOCellModels ++
  // Harness Binders (TestHarness: SimUART.cc, SimJTAG.cc, SimSerial.cc, SimGPIOs.cc, DRAMSim.cc, TestDriver.v)
  new WithDRAMSim ++
  new WithSimUART ++
  new WithSimJTAG ++
  new WithSimSerial
)
Organization

FIRRTL/CIRCT Transforms

Elaboration Flow
• Chisel programs generate FIRRTL representations of hardware
• FIRRTL passes transform the target netlist
• FireSim’s AutoCounter/AutoILA/Printf use FIRRTL passes
• VLSI flows can use FIRRTL passes to adjust the module hierarchy
• NEW in Chipyard 1.9.1: CIRCT Backend
• CIRCT generates tool-friendly synthesizable Verilog from FIRRTL
• Very fast/powerful FIRRTL compiler
[Figure: Chisel Generators → design.fir → FIRRTL passes (SRAM macro replacement, hierarchy manipulation) → design.fir → CIRCT passes (optimization, deduplication) → ChipTop.sv, Harness.sv]

Organization
Configs: Describe parameterization of a multi-generator SoC
Generators: Flexible, reusable library of open-source Chisel generators (and Verilog too)
IOBinders/HarnessBinders: Enable configuring IO strategy and Harness features
FIRRTL/CIRCT Passes: Structured mechanism for supporting multiple flows
Target flows: Different use-cases for different types of users

Learning Curve
Advanced-level
• Configure custom IO/clocking setups
• Develop custom FireSim extensions
• Integrate and tape-out a complete SoC

Evaluation-level
• Integrate or develop custom hardware IP into
Chipyard
• Run FireSim FPGA-accelerated simulations
• Push a design through the Hammer VLSI flow
• Build your own system

Exploratory-level
• Configure a custom SoC from pre-existing
components
• Generate RTL and run it in RTL-level simulation
• Evaluate existing RISC-V designs

For Education
Proven in many Berkeley Architecture
courses
• Hardware for Machine Learning
• Undergraduate Computer Architecture
• Graduate Computer Architecture
• Advanced Digital ICs
• Tapeout HW design course

Advantages of a common shared HW framework
• Reduced ramp-up time for students
• Students learn framework once, reuse it in
later courses
• Enables more advanced course projects
(tapeout a chip in 1 semester)

For Tapeouts
Standard Chipyard “Flow” for Tapeout
1. RTL Development – Develop new accelerators/devices and test rapidly
2. FireSim – Evaluate your design on real workloads
3. Hammer – Reusable/extensible VLSI flow
4. Bringup – Generate FPGA bringup platforms using Chipyard

Chipyard is a single source of truth for a chip
• Enables parallel workflows across different parts of the flow
• Reproducible environments simplify debugging
• Continuous integration for tapeouts
[Figure: a chipyard-coolchip.git repository shared across RTL, simulations, physical design, and bringup.]

Community
Documentation:
• https://chipyard.readthedocs.io/en/dev/
• 133 pages
• Most of today’s tutorial content is
covered there

Mailing List:
• google.com/forum/#!forum/chipyard

Open-sourced:
• All code is hosted on GitHub
• Issues, feature-requests, PRs are
welcomed

An open, extensible research and
design platform for RISC-V SoCs
• Unified framework of
parameterized generators
• One-stop-shop for RISC-V SoC
design exploration
• Supports variety of flows for
multiple use cases
• Open-sourced, community and
research-friendly
