You are on page 1of 23

Computer Architectures

Overview of SOC
Architecture design

Tien-Fu Chen

National Chung Cheng Univ.

© by Tien-Fu Chen@CCU SOC - 0

SOC design Issues

 SOC architecture

 Reconfigurable
 System-level
 Programmable processors
 Low-level reconfiguration

 On-chip bus

 Embedded Software Issues

© by Tien-Fu Chen@CCU SOC - 1


Embedded Systems vs. General Purpose
Computing - 1
Embedded System General purpose computing

 Runs a few applications often Intended to run a fully general set


known at design time of applications

 Not end-user programmable  End-user programmable

 Operates in fixed run-time  Faster is always better


constraints, additional performance
may not be useful/valuable

© by Tien-Fu Chen@CCU SOC - 2

Embedded Systems vs. General Purpose


Computing - 2
Embedded System General purpose computing

Differentiating features:
 power Differentiating features
 cost
 speed (need not be fully
 speed (must be predictable) predictable)
 speed
output  did we mention speed?
analog
CPU  cost (largest component power)
input analog
Logic
mem
embedded
computer

© by Tien-Fu Chen@CCU SOC - 3


Embedded System: Examples

Embedded
System

© by Tien-Fu Chen@CCU SOC - 4

Design Complexity Increase

SoC Characteristics Current Next


Design Design
Average Gate Count 616K 1,000K
Designs with Gate Count > 18% 31%
1M
Units Shipped >5M 20% 37%

# of lines of DSP SW 54K 295K


# of lines of uP SW 75K 266K
Clock Speed >133mhz 44% 61%
>400mhz 15% 22%
>5 Clock Domains 25% 35%
>9 Clock Domains 10% 12%

Source: Collett International Research- December 2000 Research on


360 IC/ASIC Design Teams in North America
© by Tien-Fu Chen@CCU SOC - 5
Embedded System on Chip (SoC) Design
System
Zone 4: Global
Environment
Satellite
Zone 3: Suburban Zone 2: Urban
Zone 1: In-Building

Micro-Cell Pico-Cell
Macro-Cell Requirements
Specification
Specification
Untimed,
Unclocked,
C/C++ Level Embedded
Testbench Systems Design

Software
Embedded
Refinement

Implementation
Characterization
Software
Design Export P/C
µ Analog
Memory SOC

Implementation
Timed,
Clocked, Firmware
RTL Level
CORE

© by Tien-Fu Chen@CCU SOC - 6

Architectures

 Supplements Models by specifying how the system will


actually be implemented

 Goal of each architecture is to describe


 Number of components
 Type of each component
 Type of each connection among above components
 General classification
 Application-specific architectures: DSP
 General-purpose architectures: CISC, RISC
 Parallel processors: VLIW, SIMD, MIMD

© by Tien-Fu Chen@CCU SOC - 7


System Architecture Design
 System Architecture & Exploration
 What
 Hardware/Software partitioning; processor, and memory architecture
choices; system timing budget, power management strategy, system
verification strategy…
 Partitioning into HW block hierarchy, cycle time budgeting, block
interfaces, block verification, clock architecture and test strategy
 Fixed point architecture exploration and design
 How - Quickly assemble architecture(s) for exploration to
measure system timing/performance. Need to accurately
(enough) model the bottlenecks

© by Tien-Fu Chen@CCU SOC - 8

System Integration & Verification

 System Design Environment for HW/SW Refinement,


Verification and Integration
 What
– Enables hierarchical (manual or automatic) refinement of individual blocks of
design in context of system. Maintain system and hierarchical test benches
– Verification of refined hardware/software with entire system design
– Define next level of clock architecture (derived) and test strategy

 How - Build a system verification hierarchy that allows


integration of HW blocks, system software (HAL), embedded
application SW and eventually verifying the entire design at
cycle accurate (or RTL) level

© by Tien-Fu Chen@CCU SOC - 9


CoDesign and Co-Synthesis
Specification

Partition

HW: HDL SW: Algorithm,


(Behavioral, DataFlow, Textual/Graphical
Structural), Schematic representation
Co-Synthesis
Synthesis

Executable or
RTL, Gate level, Compilable code:
Transistors, Layout Detailed Representation The program(s),
of Implementation OS routines

© by Tien-Fu Chen@CCU SOC - 10

Traditional design

Traditional System Design Process


Tasks

SW design
SW test
System
PCB test
design

ASIC design Fabrication Test

Time

© by Tien-Fu Chen@CCU SOC - 11


System-level Co-design

Co-Design Process
Tasks

SW design
SW test
System
design
Shared Design PCB test

ASIC design Fabrication Test

Time
System-Level Partitioning

© by Tien-Fu Chen@CCU SOC - 12

Configurabilty and Embedded Systems

Advantages of configuration:

Pay (in power, design time, area) only for what you
use

Gain additional performance by adding features


tailored to your application:

Particularly for embedded systems:


 Principally in embedded controller microprocessor
applications
 Some us in DSP

© by Tien-Fu Chen@CCU SOC - 13


What to Configure?
What parts of the microcontroller/microprocessor system to
configure?

Easy answers:
 Memory and Cache Sizes - get precisely the sizes your applications
needs
 Register file sizes

 Interrupt handling and addresses

Harder answers:
 Peripherals
 Instructions

But first we need more context


© by Tien-Fu Chen@CCU SOC - 14

Trickle Down Theory of Embedded Architectures

 Mainframe/supercomputers Features tend to trickle down:


• #bits: 4->8->16->32->64
 High-end servers/workstations • ISA’s
• Floating point support
 High-end personal computers • Dynamic scheduling
• Caches
 Personal computers • I/O controllers/processors
• LIW/VLIW
 Lap tops/palm tops • Superscalar

 Gadgets

 Watches

 ...

© by Tien-Fu Chen@CCU SOC - 15


Configurability in ARM Processor
 ARM allows for configurability via AMBA bus

 Offers ``prime cell’’ peripherals which hook into AMBA Peripheral


Bus (APB)

 UART

 Real Time Clock

 Audio Codec Interface

 Keyboard and mouse interface

 General purpose I/O

 Smart card interface

 Generic IR interface

© by Tien-Fu Chen@CCU SOC - 16

ARM7 core

© by Tien-Fu Chen@CCU SOC - 17


ARM’s Amba open standard

 Advanced System Bus, (ASB) - high performance, CPU, DMA, external

 Advanced Peripheral Bus, (APB) - low speed, low power, parallel I/O,
UART’s

 External interface

http://www.arm.com/Documentation/Overviews/AMBA_Intro/#intro

© by Tien-Fu Chen@CCU SOC - 18

Ex: ARM Infrared (IR)


Interface

© by Tien-Fu Chen@CCU SOC - 19


Ex : Audio Codec

© by Tien-Fu Chen@CCU SOC - 20

Another Kind of Configurability

HDL Synthesis of a processor core from


an RTL description allows for:
RTL
Synthesis • full range of other types of
configurability
netlist
Library • additional degrees of freedom in
logic quality of implementation
optimization
Examples:

netlist • ARM7
• Motorola Coldfire
physical
design • Tensilica Xtensa

layout

© by Tien-Fu Chen@CCU SOC - 21


Issues in low-level Configurable Design

Choice and Granularity of Computational


Elements
Choice and Granularity of Interconnect
Network
(Re)configuration Time and Rate
 Fabrication time --> Fixed function devices
 Beginning of product use --> Actel/Quicklogic
FPGAs
 Beginning of usage epoch --> (Re)configurable
FPGAs
 Every cycle --> traditional Instruction Set
Processors
© by Tien-Fu Chen@CCU SOC - 22

The Choice of the Computational Elements

Reconfigurable Reconfigurable Reconfigurable Reconfigurable


Logic Datapaths Arithmetic Control
In

Data P rogram
Memory Memory
mux
AddrGen AddrGen
CLB CLB
re g0
Inst ruction
Memory Memory Decoder
re g1 Datapath &
C ontroller
a dder
C LB C LB
MAC
buffer Data
Memory

Bit-Level Operations Dedicated data paths Arithmetic kernels RTOS


e.g. encoding e.g. Filters, AGU e.g. Convolution Process management

© by Tien-Fu Chen@CCU SOC - 23


Multi-granularity Reconfigurable Architecture:
The Berkeley Pleiades Architecture
Configuration Bus
Satellite Processor

Configuration
Arithmetic Arithmetic Arithmetic
Processor Processor Processor Dedicated
Arithmetic
Communication Network

Network Interface
Control Configurable Configurable
Processor Datapath Logic

• Computational kernels are “spawned” to satellite processors


• Control processor supports RTOS and reconfiguration
• Order(s) of magnitude energy-reduction over traditional programmable architectures

© by Tien-Fu Chen@CCU SOC - 24

Matching Computation and Architecture

AddressGen AddressGen

Memory Memory
Convolution

MAC MAC

L G C
Control
Processor

Two models of computation: Two architectural models:


communicating processes + data-flow sequential control+ data-driven

© by Tien-Fu Chen@CCU SOC - 25


Execution Model of a Data-Flow Kernel
Embedded processor
Code seg start end

AddrGen
for(i=1;i<=L;i++)
for(k=i;k<=L;k++) MEM: in

phi[i][k]= phi[i-1][k-1] AddrGen


MPY MPY
+in[NP-i]*in[NP-k] MEM: phi
-in[NA-1-i]*in[NA-1-k]; ALU

Code seg ALU

• Distributed control and memory


© by Tien-Fu Chen@CCU SOC - 26

Software Methodology Flow

Algorithm s C++

µproc &
Kernel Detection Accelerator
P DA Models

Behavioral
Estimation/E xploration
SUIF+ C-IF
Po wer & Tim in g Estimation
of Vario us Kernel Implemen tations
P remapped
K ernels
P artitioning
C++ Module
Libraries
S oftware Compilation
Reconfig. Hardware Mapping
Interface Code Generation

© by Tien-Fu Chen@CCU SOC - 27


The System-on-a-Chip Nightmare

System Bus
DMA CPU DSP

Mem
Ctrl.
Bridge

MPEG
The “Board-on-a-Chip”
Approach
I O O
C

Custom Interfaces

Control Wires Peripheral


Bus
© by Tien-Fu Chen@CCU SOC - 28

Sonics SOC Integration Architecture

Open Core
DMA CPU DSP MPEG Protocol™

SiliconBackplane™
(patented) { MultiChip
Backplane™

C MEM I O
SiliconBackplane
Agent™

© by Tien-Fu Chen@CCU SOC - 29


Master vs. Slave

IP Core IP Core IP Core

Master Master Slave Slave


Open Core Response
Protocol Request
Initiator Target
Slave Slave Master Master

On-Chip Bus

© by Tien-Fu Chen@CCU SOC - 30

The Backplane: Why Not Use a Computer Bus?

Transmit FIFO Receive FIFO


IP IP
Core Arbiter Address Core
Computer
  

  

Bus
Data

IP IP
Core Core
Time

• Expensive to decouple
• Not designed for real-time
© by Tien-Fu Chen@CCU SOC - 31
Communication Buses Decouple
and Guarantee Real Time
Transmit FIFO Receive FIFO
IP IP
Core TDMA TDMA Core
Communications
  

  
Bus
Data

IP IP
Core Core
Time

• Connections are expensive


• Poor read latency
© by Tien-Fu Chen@CCU SOC - 32

On-Chip Bus for SOC

Example on- chip bus interconnects


 ARM’s AMBA bus
 IBM’s Core Connect
 Virtual Socket Interface Alliance group
 Open Connect Protocol group

Example processor cores


 ARM
 MIPS
 PowerPC

© by Tien-Fu Chen@CCU SOC - 33


Design Example:
The Radio-on-a-Chip

DSP and control


Reconfigurable Embedded uP intensive
State Machines + DSPs Mixed-mode
Combines
programmable, flexible,
FPGA
and application-specific
Dedicated Reconfigurable modules
DSP DataPath
Cost and energy are
the key metrics

© by Tien-Fu Chen@CCU SOC - 34

System Level Design Science

Design Methodology:
 Top Down Aspect:
 Orthogonalization of Concerns:
– Separate Implementation from Conceptual Aspects
– Separate computation from communication
 Formalization: precise unambiguous semantics
 Abstraction: capture the desired system details (do not overspecify)
 Decomposition: partitioning the system behavior into simpler behaviors
 Successive Refinements: refine the abstraction level down to the
implementation by filling in details and passing constraints
 Bottom Up Aspect:
 IP Re-use (even at the algorithmic and functional level)
 Components of architecture from pre-existing library

© by Tien-Fu Chen@CCU SOC - 35


Separate Behavior from Micro-architecture

System Behavior Implementation Architecture


 Functional Specification  Hardware and Software
of System.  Optimized Computer

 No notion of hardware or

software! Mem
13
External DSP
Rate
User/Sys
Control I/O Processor
Buffer 3 Sensor

Processor Bus
12
Synch
Control MPEG DSP RAM
4

Rate Frame
Video Video
Front Transport
Decode 2
Buffer
Decode 6
Buffer
Output 8 Peripheral Control
End 1 5 7
Processor
Rate
Buffer Audio
Audio System
9 Decode/ Decode
Output 10 RAM

Mem
11

© by Tien-Fu Chen@CCU SOC - 36

Map Between Behavior from Architecture

Transport Decode Implemented


as Software Task Running
on Microcontroller
Mem
13

User/Sys
Communication DSP
Rate External
Buffer
Control
3 Sensor Over Bus I/O Processor
12
Processor Bus

Synch
Control
4
MPEG DSP RAM
Rate Frame
Transport Video Video
Front Buffer Buffer
Decode 2 Decode 6 Output 8
End 1 5 7
Peripheral Control
Rate Processor
Buffer Audio
9 Decode/ Audio System
Output 10
Decode
RAM
Mem
11

Audio Decode Behavior


Implemented on
Dedicated Hardware

© by Tien-Fu Chen@CCU SOC - 37


Embedded Software Crisis
Cheaper, more powerful
J. Fiddler - WRS
Microprocessors

Increasing Embedded
Software More
Time-to-market
Crisis Applications
pressure

J. Fiddler - WRS Bigger, More Complex


Applications
© by Tien-Fu Chen@CCU SOC - 38

SW: Embedded Software Tools

application compiler
U source Application
S code software
E
R a.out RTOS

A
S
C I ROM
simulator P
C
debugger A
U S
RAM
I
C

© by Tien-Fu Chen@CCU SOC - 39


Hardware Platforms Not Enough!

Hardware platform has to be abstracted

Interface to the application software is the “API”

Software layer performs abstraction:


 Programmable cores and memory subsystem “hidden” by
RTOS and compilers
 I/O subsystem with Device Drivers

 Network with Network Communication Software

© by Tien-Fu Chen@CCU SOC - 40

Software Platforms

Platform API
Application Software

Software
Software Platform
Hardware Platform Hardware
Input devices Output Devices I O
network
Communication

API
Network
RTOS
Compiler

Device Drivers
BIOS

© by Tien-Fu Chen@CCU SOC - 41


Platform-based methodology
 Platform based design:
 Application mapped on architecture
 Performance evaluation and iterative
refinement
 Challenges:
 complete system simulation
 complexity management
 composability and reuse

 Key elements for composability


 Identification and use of useful models
of computation
 FSMD, DE, DF, CSP, ...
 A flexible, extensible language platform
to capture the functionality.
 Composability can be achieved using
Object-oriented mechanisms:

© by Tien-Fu Chen@CCU SOC - 42

Final goal for SOC: Platform-Based Design


Taking Design Block Reuse to the Next Level

Pre-Qualified/Verified Foundation Block + Reference Design


Foundation-IP*
Scaleable
bus, test, power, IO,
MEM clock, timing architectures

Application
Hardware IP Processor(s), RTOS(es) and
Space CPU
FPGA SW architecture
SW IP

Programmable Methodology / Flows:


System-level performance
evaluation environment
Rapid Prototype for
*IP can be hardware (digital End-Customer Evaluation
or analogue) or software. Foundry-Specific SoC Derivative Design
IP can be hard, soft or Pre-Qualification Methodologies
‘firm’ (HW), source or Foundry Targetting Flow
object (SW)

© by Tien-Fu Chen@CCU SOC - 43


Summary of SOC Design
System Verification
Specification

Co-Synthesis
Partitioning

Verification
HW Parameter SW Parameter
Estimation Estimation

HW SW
Synthesis Synthesis Verification
ASIC OS
EXE Code Verification

System
Integration Final Verification

© by Tien-Fu Chen@CCU SOC - 44

You might also like