
(EN 412) VLSI Design

System-on-Chip (SoC)
&
Network-on-Chip (NoC)

By Prof. Dr. Yaseer A. Durrani


Dept. of Electronics Engineering
University of Engineering & Technology, Taxila

Outline
❑ What is SoC?
❑ SoC Challenges & Applications
❑ SoC Design Flow
❑ What is NoC?
❑ NoC Topologies
❑ NoC Design Flow

Modern Car

[Figure: electronic control units in a modern car, grouped into four domains: Powertrain (engine control, transmission control, battery management, hybrid control, engine cooling), Body & Convenience (xenon lights, seat position, climate control, dashboard, central locking, mirrors, doors), Safety (airbag, ABS/ESP brakes, night vision, blind-spot detection, park distance, TPMS, adaptive cruise control), and Chassis (power steering, active suspension, brakes).]

Introduction
❑ A System on Chip (SoC) is essentially a complete electronic system embedded on a small, coin-sized chip & integrated with a microcontroller or microprocessor. An SoC design usually includes a CPU, memory, I/O ports, secondary storage interfaces, & peripheral interfaces such as I2C, SPI, UART, CAN, timers, etc., depending on the requirements, such as a digital or analog signal-processing system or a floating-point unit
❑ An SoC connects the CPU, hard disk, RAM, ROM, USB connectivity, & other secondary storage devices through circuitry embedded on the chip, whereas a motherboard does this using expansion cards

Types of SoC
❑ A few types of SoC design:
– SoCs built around microprocessors
– SoCs built around microcontrollers
– SoCs designed for a special dedicated application, which cannot be used for any other task
– Programmable SoCs, in which some (but not all) components are programmable
❑ Advantages of SoC:
– Reduction in overall system size compared to motherboard-based designs
– Compact & small chips, even down to fingertip size
– Better efficiency & performance
– Less power consumption
– Less time to market

Introduction
❑ An SoC is an IC that implements most or all functions of a complete electronic system
❑ An SoC is an IC that integrates software & hardware Intellectual Property (IP) using more than one design methodology for the purpose of defining the functionality & behavior of the proposed system
❑ SoC technology puts the maximum amount of technology into the smallest possible space
❑ SoC bridges the gap b/w software & hardware
❑ Four vital areas of SoC design:
– Higher levels of abstraction
– IP and platform re-use
– IP creation: ASIPs, interconnect and algorithms
– Earlier software development and integration

Three forms of SoC Design
❑ ASIC Vendor Design:
– Design in which all components of the chip are designed as well as fabricated by the ASIC vendor
❑ Integrated Design:
– Design by an ASIC vendor in which not all components are designed by that vendor. It implies the use of cores obtained from some other source, such as a core/IP vendor or a foundry
❑ Desktop Design:
– Design by a fabless company that uses cores which are, for the most part, obtained from other sources such as IP companies, EDA companies, design-services companies, or a foundry

SoC Design Challenges


❑ SoC design takes longer compared to an ASIC
❑ For an ASIC, the following factors influence Turn-Around Time (TAT):
– Frequency of the design
– Number of clock domains
– Number of gates
– Density
– Number of blocks & sub-blocks
❑ The main factor that influences TAT for an SoC is system integration (integrating different silicon IPs on the same IC)

[Figure: leveraging internal vs external bandwidth: two chips (ASIC & DRAM) joined by a 32-bit off-chip bus through I/O drivers, versus an SoC integrating the ASIC & DRAM on a 128/512-bit on-chip bus.]
SoC Design Challenges
[Figure: SoC integration challenges: HW/SW co-verification, verification definition, library issues & IP integration from different sources, illustrated on a hierarchical platform combining an ARM9 microcontroller, 32-bit memory, an arbiter/bridge/DMA/address decoder, SRAM, application logic with DSP pre-processing, an analog sensor, communication interfaces (USB, Ethernet, smartcard) & standard peripherals (watchdog, timers, UART). The platform is reused to design future chips; "your system just grew pins!"]

Key Parameters of Scaling


❑ Process: 0.18 µm to 0.13 µm to 90 nm…
❑ Performance: 300 MHz to multi-GHz…
❑ Density: metal-layer tradeoffs, library selection, optimization
❑ Complexity: 2M to 5M to 10M gates…
❑ Power: active, leakage
❑ Productivity: gates/designer-day, design reuse
❑ Quality / Reliability: DPM, FIT, SER, hot-carrier effect

Migration from ASICs to SoCs
❑ Application-Specific Integrated Circuits (ASICs) are logic chips designed by end customers to perform a specific function for a desired application
❑ ASIC vendors supply libraries for each technology they provide. These libraries contain pre-designed & pre-verified logic circuits
❑ ASIC technologies are:
– Full custom, semi-custom, platform-based, gate array, standard cell
❑ In the mid-1990s, ASIC technology evolved from a chip-set philosophy to an embedded-cores-based System-on-Chip (SoC) concept
❑ An SoC is an IC designed by stitching together multiple stand-alone VLSI designs to provide full functionality for an application
❑ SoCs are composed of pre-designed models of complex functions known as cores (also called intellectual property blocks, virtual components, or macros) that serve a variety of applications


SoC vs. ASIC


❑ SoC is not just a large ASIC
❑ Architectural approach involving significant design reuse
❑ Addresses the cost & time-to-market problems
❑ SoC methodology is an incremental step over ASIC methodology
❑ SoC design is significantly more complex
❑ Need cross-domain optimizations
❑ IP reuse & Platform-based design increase productivity, but not enough
❑ Even with extensive IP reuse, many ASIC design problems remain
❑ Productivity increase far from closing design gap

[Figure: much higher levels of integration (<0.25 µm) + common architectural blocks + high-performance circuits & tools + shorter design cycles, yet productivity still lags.]

SoC Benefits
❑ Typical approach:
– Define requirements
– Design with off-the-shelf chips
– At the 0.5-year mark: first prototypes
– 1 year: ship with low margins/loss
– Start ASIC integration
– 2 years: ASIC-based prototypes
– 2.5 years: ship, make profits
❑ With SoC:
– Define requirements
– Design with off-the-shelf cores
– At the 0.5-year mark: first prototypes
– 1 year: ship with high margin and market share
❑ An SoC consists of: Digital (control, µP, DSP, interfaces), Analog, RF, Power, MEMS, IP, & Memory (SRAM, DRAM, FLASH)
[Figure: up to now, a collection of chips (CPU, DSP, memories, hubs; typical cost $70); now, a collection of cores on one chip (typical cost $10).]

Typical SoC Applications


❑ Typical applications of SoC: consumer devices, networking, communications, & other segments of the electronics industry
– Microprocessors, media processors, GPS controllers, cellular phones, GSM phones, smart pager ASICs, digital television, video games, PC-on-a-chip
❑ Common problems in designing a complex chip:
– Time-to-market pressures demand rapid development
– Quality of results (performance, area, power)
– Increasing chip complexity makes verification more difficult
– Deep-submicron issues make timing closure more difficult
– The development team has different levels and areas of expertise, and is often scattered throughout the world
– Designers may have worked on similar designs in the past, but cannot reuse these designs because the design flow, tools & guidelines have changed
– SoC designs include embedded processor cores, hence a significant software component, which leads to additional methodology, process & organizational challenges


SoC Architecture
❑ The CPU, Direct Memory Access (DMA) controller & DSP engine all share the same bus
❑ Lots of control wires b/w blocks & peripheral buses b/w subsystems mean there is interdependency b/w blocks & a lot of wires on the chip
❑ Intelligent system integration uses an on-chip interconnect that unifies all traffic into a single entity
❑ Example: Sonics' SMART Interconnect SiliconBackplane MicroNetwork
❑ Compared to a traditional CPU bus, an on-chip interconnect such as the Sonics SiliconBackplane has the following advantages:
– Higher efficiency, flexible configuration, guaranteed bandwidth & latency, integrated arbitration
❑ A MicroNetwork is a heterogeneous, integrated network that unifies, decouples & manages all of the communication b/w processors, memories & I/O devices


SoC Architecture
❑ Example: the basic WiseNET sensor SoC
❑ The architecture includes: an ultralow-power dual-band radio transceiver (Tx & Rx)
❑ A sensor interface with a signal conditioner & two A/D converters (ANA_FE)
❑ A digital control unit based on a CoolRISC microcontroller (µC) with on-chip low-leakage memory, several time bases & digital interfaces
❑ A power management block (POW)


SoC Architecture
❑ Example: a typical gateway VoIP (Voice over Internet Protocol) system-on-chip
❑ A gateway VoIP SoC is a device used for functions such as vocoders, echo cancellation, data/fax modems, and VoIP protocols


SoC Example
❑ BeagleBone Black development board
– This board contains an AM335x microprocessor-based SoC. The block diagram shows all the internal components of the AM335x SoC. This SoC contains all peripherals inside a single chip along with a Cortex-A8 processor


SoC Architecture
❑ Architecture means the operational structure of the user's view of the system; over time it evolves from a functional specification into a hardware implementation
❑ System architecture defines the system-level building blocks, such as processors and memories, & the interconnection between them
❑ Processor architecture determines the processor's instruction set (IS) & associated programming model; its detailed implementation includes hidden registers, branch-prediction circuits and specification details concerning the ALU
❑ The implementation of a processor is called its microarchitecture


SoC Hardware & Software Implementations


❑ If all elements cannot be contained on a single chip, the implementation can be a system on a board, but it is often still called an SoC
❑ A fundamental decision in SoC design is choosing which components of the system are implemented in hardware & which in software
❑ Software is usually executed on a general-purpose processor (GPP), which interprets instructions at run time
❑ This offers flexibility, adaptability and resource sharing among applications, but is generally slower & more power hungry than dedicated hardware
❑ SoC designers aim to combine the benefits of h/w & s/w


SoC Processor Execution Sequence
❑ SoC processors directly implement the sequential execution model: the next instruction is not processed until all execution for the current instruction is complete and its results have been committed:
❑ 1. Fetch the instruction into the instruction register (IF)
❑ 2. Decode the opcode of the instruction (ID)
❑ 3. Generate the address in memory of any data item residing there (AG)
❑ 4. Fetch the data operand into an executable register (DF)
❑ 5. Execute the specified operation (EX)
❑ 6. Write back the result to the register file (WB)

[Figure: instruction timing in a pipelined processor.]
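❑ A minimal, hypothetical Verilog sketch (assumed names, not from the slides) of the sequencing above: a valid bit marks which of the six phases holds an operation, and each operation advances one phase per cycle:

  // Valid-bit tracking for the six phases above (IF, ID, AG, DF, EX, WB).
  // At most one new instruction enters IF per cycle.
  module pipe_valid (
    input        clk, rst,
    input        new_instr,     // an instruction enters IF this cycle
    output [5:0] stage_valid    // bit 0 = IF ... bit 5 = WB
  );
    reg [5:0] v;
    always @(posedge clk) begin
      if (rst) v <= 6'b0;
      else     v <= {v[4:0], new_instr};  // each op advances one phase/cycle
    end
    assign stage_valid = v;
  endmodule

❑ In a strictly sequential (non-pipelined) machine, only one of these bits would ever be set at a time; pipelining lets all six phases be busy at once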

SoC Pipelined Soft Processor Execution


❑ A simple pipeline has only one operation in each phase at any given time
❑ A rigid or static pipeline requires the processor to go through all stages or phases of the pipeline, whether required by a particular instruction or not
❑ A dynamic pipeline allows bypassing of one or more pipeline stages, depending on the instruction
❑ More complex dynamic pipelines allow instructions to complete out of order, or even to initiate out of order


SoC Superscalar processor execution
❑ Independent instructions can be executed in parallel
❑ Dynamically pipelined processors remain limited to executing a single operation per cycle by virtue of their scalar nature. This limitation can be avoided with the addition of multiple functional units and a dynamic scheduler to process more than one instruction per cycle

[Figures: superscalar instruction timing in a pipelined ILP processor; superscalar processor model.]

SoC VLIW Processors


❑ Very long instruction word (VLIW) processors rely on static analysis in the compiler, instead of hardware, to schedule operations
❑ A VLIW processor executes multiple independent operations, yet its control complexity is not significantly greater than that of a scalar processor
❑ VLIW processors are used for high-performance applications

[Figure: VLIW processor model.]

SoC SIMD Processors
❑ The Single Instruction Stream, Multiple Data Stream (SIMD) class of processor includes both array and vector processors
❑ The SIMD processor is a natural response to the use of certain regular data structures such as vectors & matrices
❑ Assembly-level programming of a SIMD architecture is similar to programming a scalar processor, except that some operations perform computations on aggregate data
❑ An array processor consists of many interconnected processor elements, each having its own local memory space
❑ A vector processor consists of a single processor that references a global memory space and has special function units that operate on vectors

[Figures: array processor model; vector processor model.]

SoC Multiprocessors
❑ Multiple processors can cooperatively execute to solve a single problem by using some form of interconnection for sharing results
❑ Each processor executes completely independently; however, most applications require some form of synchronization during execution to pass information and data b/w processors
❑ Multiprocessors share memory and execute separate program tasks (MIMD: multiple instruction stream, multiple data stream); their proper implementation is significantly more complex than that of an array processor


SoC Memory & Addressing
❑ SoC applications vary in their memory requirements. Memory can be simple, with only ROM & RAM, or it can support an elaborate operating system requiring large off-chip memory with a memory management unit and a cache hierarchy

[Figures: processors with memory off-die; an SoC with processors & memory on-die.]

SoC Memory Examples


❑ SoC designers must consider whether to put RAM & ROM on-die or off-die


Memory for SoC Operating System
❑ A critical decision in SoC design is the selection of the operating system & of memory-management functionality such as virtual memory
❑ If the system is restricted to a few MB of real memory, it can be implemented as a true SoC (all memory on-die)
❑ Virtual memory is often slower and more expensive, as it requires a complex memory management unit
❑ With real memory only, users have limited ways of creating new processes & expanding the application base of the system

[Figures: operating systems for SoC designs; virtual-to-real address mapping.]

Virtual Memory
❑ Virtual memory, or virtual storage, provides an "idealized abstraction of the storage resources that are actually available on a given machine", which "creates the illusion to users of a very large (main) memory"
❑ The operating system, using a combination of hardware & software, maps the memory addresses used by a program, called virtual addresses, into physical addresses in computer memory
❑ The operating system manages the virtual address spaces & the assignment of real memory to virtual memory. Address-translation hardware in the CPU, often referred to as the memory management unit (MMU), automatically translates virtual addresses to physical addresses

[Figure: virtual memory combines active RAM & inactive memory (on disk) to form a large range of addresses.]

SoC Memory Management Unit
❑ A memory management unit (MMU), or paged memory management unit (PMMU), has all memory references passed through itself, primarily performing the translation of virtual memory addresses to physical addresses
❑ The MMU effectively performs virtual memory management, handling at the same time memory protection, cache control, bus arbitration and, in simpler computer architectures (especially 8-bit systems), bank switching
❑ Modern MMUs typically divide the virtual address space (the range of addresses used by the processor) into pages, each having a size which is a power of 2, usually a few KB, but they may be much larger. The bottom bits of an address (the offset within a page) are left unchanged; the upper address bits are the virtual page number (see the sketch below)
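❑ A small, hypothetical Verilog sketch of the page/offset split described above (assuming a 32-bit address & 4 KB pages; a real MMU walks page tables in memory & caches entries in a TLB):

  // 4 KB pages => 20-bit virtual page number (VPN) + 12-bit page offset.
  module mmu_translate (
    input         clk, wr_en,
    input  [9:0]  wr_vpn,        // toy load port for page-table entries
    input  [19:0] wr_pfn,        // physical frame number to install
    input  [31:0] vaddr,
    output [31:0] paddr
  );
    reg [19:0] page_table [0:1023];     // toy table: only 1024 entries
    wire [9:0]  vpn    = vaddr[21:12];  // low VPN bits index the toy table
    wire [11:0] offset = vaddr[11:0];   // bottom bits pass through unchanged
    always @(posedge clk)
      if (wr_en) page_table[wr_vpn] <= wr_pfn;
    assign paddr = {page_table[vpn], offset};  // physical frame ++ offset
  endmodule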


Intellectual Property (IP)


❑ Utilizing pre-designed modules enables designers:
– To avoid reinventing the wheel for every new product
– To accelerate the development of new products
– To assemble the various blocks of a large ASIC/SoC quite rapidly
– To reduce the possibility of failure from designing & verifying a block for the first time
❑ These pre-designed modules are commonly called Intellectual Property (IP) cores or Virtual Components (VCs)


Core-based Design
❑ Cores are pre-designed, verified but untested functional blocks
❑ Core is the intellectual property of vendor
– Internal details not available to user
❑ Core-vendor supplied tests must be applied to embedded cores


Types of Intellectual Property


❑ Soft IP cores: (synthesizable RTL)
– Technology-independent, synthesizable RTL (VHDL/Verilog) code that provides a functional description of the IP
– Layout is completely flexible
– Performance & area depend on the cell library used
– Synthesis, test, timing analysis, & verification are required
– Offers max. flexibility & re-configurability to match the requirements (see the sketch below)
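❑ An illustrative (assumed, not from the slides) example of what makes a core "soft": parameterizable, technology-independent RTL that any integrator can configure & re-synthesize to a new cell library:

  // Generic up-counter: the integrator sets WIDTH and re-synthesizes;
  // performance & area then depend on the chosen library, as noted above.
  module up_counter #(
    parameter WIDTH = 8
  ) (
    input                  clk, rst_n, en,
    output reg [WIDTH-1:0] count
  );
    always @(posedge clk or negedge rst_n) begin
      if (!rst_n)  count <= {WIDTH{1'b0}};  // async, active-low reset
      else if (en) count <= count + 1'b1;   // count only when enabled
    end
  endmodule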

❑ Hard IP cores: (Non-Modifiable Layout)


– Consist of hard layout & timing information using particular physical design
libraries that cannot be modified
– Highly optimized for area & performance, synthesized to specific technology
– Includes behavioral model for simulation
– May be encrypted
– Test sets (test stimuli and test responses) are given
– Gives min. flexibility to the core integrator, but saves design time and effort
– Cannot be re-synthesized


Types of Intellectual Property
❑ Firm IP cores: (gate-level netlist)
– Technology-dependent gate-level netlist that meets timing constraints
– Layout is flexible
– Best of both: balances the high performance & optimization of hard IPs with the flexibility of soft IPs
– Performance & area are more predictable
– May be encrypted to protect the IP
– Delivered as netlists targeted to specific physical libraries after going through synthesis, without performing the physical layout
– The internal implementation of the core cannot be modified
– The user can parameterize I/O to remove unwanted functionality


Trade-offs among Soft, Firm & Hard Cores


[Figure: trade-offs among soft, firm & hard cores: soft cores maximize reusability, portability & flexibility; hard cores maximize predictability, performance & time-to-market; firm cores sit in between.]

IP Format | Representation | Optimization | Technology             | Reusability
Hard      | GDSII          | Very high    | Technology dependent   | Low
Soft      | RTL            | Low          | Technology independent | Very high
Firm      | Target netlist | High         | Technology generic     | High


Design for Reuse
❑ Reusing macros/cores/IP that have already been designed & verified helps to address all of the problems above
❑ To overcome the design gap, design reuse (the use of pre-designed & pre-verified cores, or the reuse of existing designs) becomes a vital concept in design methodology
❑ An effective block-based design methodology requires an extensive library of reusable blocks, or macros, based on the following principles:
– The macro must be extremely easy to integrate into the overall chip design
– The macro must be so robust that the integrator has to perform essentially no functional verification of the internals of the macro

The challenge for designers is not whether to adopt reuse, but how to employ it effectively


Design for Reuse


❑ To be fully reusable, a hardware macro must be:
❑ Designed to solve a general problem:
– Easily configurable to fit different applications
❑ Designed for use in multiple technologies:
– For soft macros, the synthesis scripts must produce satisfactory quality of results with a variety of libraries
– For hard macros, an effective porting strategy for mapping the macro onto new technologies is needed
❑ Designed for simulation with a variety of simulators:
– Good design-reuse practice requires that both Verilog & VHDL versions & a verification test-bench be available & work with all major commercial simulators
❑ Designed with standards-based interfaces:
– Unique or custom interfaces should be used only if no standards-based interface exists


Design for Reuse
❑ Verified independently of the chip in which it will be used:
– Often, macros are designed & only partially tested before being integrated into a chip for verification. Reusable designs must have full, stand-alone test benches & verification suites that afford high levels of test coverage
❑ Verified to a high level of confidence:
– Requires very rigorous verification as well as building a physical prototype that is tested in an actual system running real software
❑ Fully documented in terms of appropriate applications & restrictions:
– In particular, valid configurations & parameter values must be documented. Any restrictions on configurations or parameter values must be clearly stated. Interfacing requirements & restrictions on how the macro can be used must be documented


IP Reuse based SoC Design


❑ Controller, custom logic, AMS blocks
❑ Short time-to-market (pre-designed)
❑ Less expensive (reuse)
❑ Faster performance (optimized algorithms & implementations)
❑ Smaller area (optimized algorithms & implementations)


Virtual Components (VC) for Reuse


What is MPSoC?
❑ A Multiprocessor System-on-Chip (MPSoC) is an SoC that uses multiple instruction-set processors (CPUs), usually targeted at embedded applications
❑ The term is used for platforms that contain multiple, usually heterogeneous, processing elements with specific functionalities reflecting the needs of the expected application domain, a memory hierarchy & I/O components
❑ MPSoCs often require large amounts of memory


Homo/Heterogeneous SoC
[Figure: a homogeneous MPSoC (identical CPU + memory nodes on an interconnection network, e.g. a bus) versus a heterogeneous MPSoC (CPUs, DSP, embedded FPGA, dedicated IP, I/O & memories on an interconnection network, e.g. bus or crossbar).]

Typical Low Power SoC


❑ Processors: uC, uP or DSP cores
❑ Standard interface core: Industry standards UART, PCI, USB, FireWire,
Ethernet, I2C, SPI
❑ Memory cores: D-cache, ROM, RAM, Flash, EEPROM
❑ Peripherals: cache controllers, counter/timers, DMA controllers, real-time timers,
power-on-reset generators
❑ Timing sources: oscillators, PLL
❑ Digital cores: JPEG compressor, MPEG decoder, VGA
❑ Analog cores: clock recovery, RF components, ADC, DAC
❑ Voltage regulators & power management circuits


Hardware-Software Codesign
❑ The SoC design process is hardware-software codesign, in which design productivity is achieved through design reuse:
– Design exploration at the behavioral level (C, C++, etc.) by system architects
– Creation of an architecture specification
– RTL implementation (Verilog/VHDL) by hardware designers
❑ Drawbacks:
– Specification errors are susceptible to late detection
– Correlating validations at the behavioral & RTL levels is difficult
– The common interface b/w system & hardware designers is based on natural language

[Figure: HW/SW codesign flows. SW: generic C/C++/Java applications go through a µC-specific compiler to assembly, then an assembler to opcodes/HEX code, with a board-support package, DSP drivers & register/I/O setup on the µC. HW: VHDL mapped onto an ASIC, FPGA, CPLD or PAL/PLA.]

Design Methodologies


Time Driven Design (TDD) Methodology
❑ Best for moderately sized & complex ASICs that:
– Consist primarily of new logic on DSM processes
– Don't make significant use of hierarchical design
❑ Problems:
– Looping between synthesis & placement
– Long turnaround times for each loop
– Unanticipated chip-size growth late in the design process
– Repeated area, power & timing optimizations
– Late creation of adequate manufacturing test vectors


Block-based Design (BBD) Methodology


❑ Ideally, BBD is behaviorally modeled at the system level, where HW/SW tradeoffs & co-verification are performed
❑ New design components are partitioned & mapped onto specified RTL blocks
❑ RTL blocks are then designed to budgeted timing, power & area constraints
❑ Opportunistically reused functions are poorly characterized, subject to modification & require re-verification
❑ Hard, firm, or soft programmable processor cores
❑ Test-bench data is extracted from system-level simulation to support functional verification
– Shift from "ASIC-out" to "system-in" verification
❑ Processor-dominated or custom bus architecture
❑ A predominately flat manufacturing test architecture is used
❑ Timing analysis in both a hierarchical and a flat context
– Top-down planning to create block budgets for hierarchical timing analysis
– Flat, detailed timing analysis for higher accuracy
– Management of guard bands becomes critical in DSM design to ensure design convergence
❑ Handoff b/w the design team & the ASIC vendor often occurs at a lower level than in TDD
• A fully placed netlist is normal; it is not uncommon to go all the way to GDSII
• RTL handoff is attractive, but impractical for all but the least aggressive designs

Platform-based Design (PBD) Methodology
❑ TDD + BBD + extensive & planned design reuse
❑ PBD separates design into two areas of focus:
1. Block authoring: blocks are generated so they interface easily with multiple target designs. Two new design concepts must be established:
– Interface standardization:
• Block authoring may be distributed among different design teams if done under the same interface specifications & design-methodology guidelines
– Virtual system design:
• Establishes the system constraints necessary for block design, for example: power profile, test options, hard/firm/soft, …
2. System-chip integration: focuses on designing & verifying the system architecture & the interfaces b/w blocks. Starts with partitioning the system around preexisting block-level functions & identifying the new or differentiating functions needed
• System-level partitioning along with performance analysis, HW/SW design tradeoffs & functional verification


Platform-based Design (PBD) Methodology


❑ A PBD design is either a derivative design with added functionality, or a convergence design where previously separate functions are integrated
❑ Hierarchical, heterogeneous test architecture
– Manufacturing test design is incorporated into standard interfaces to support each block's specific test methodology
– BIST, scan BIST, full/partial scan, JTAG, …
❑ Hierarchical timing analysis
– Based on pre-characterized block timing
❑ Hierarchical routing
– Interblock routing is to and from area-based block connectors
– Constraint-driven to meet SI and interface timing requirements
❑ Physical design is a key stage in the flow
– Usually 0.25 µm and smaller process technology
– DSM interconnect-dominated delays
❑ Uses primarily firm or hard VCs
– Predictable, and pre-verified


Platform-based Design (PBD) Methodology


Waterfall Design
❑ The project transitions from phase to phase in a step function, never returning to the activities of a previous phase
❑ Worked well up to 100k gates & down to 0.5 µm
❑ H/W & S/W developments are serialized
❑ High investment at each level
❑ Requirements must be well defined first
❑ The domain & the solution to the problem are extremely stable
❑ The client must wait until the end to see any product
❑ Suited to data administration, because data persists
❑ Product failure signals process failure, promoting fear
❑ Upstream investments save on downstream costs
❑ Little or no prototyping


Spiral Design
❑ A process for developing a system in steps; throughout the various steps, feedback is obtained & incorporated back into the process
❑ As complexity increases, geometries shrink & time-to-market pressures continue to escalate, chip designers are moving from the waterfall to the spiral model
❑ The design team works on multiple aspects of the design simultaneously, incrementally improving each area as the design converges on completion
❑ Parallel, concurrent H/W & S/W development
❑ Parallel verification & synthesis
❑ Develop modules only if not available
❑ Planned iteration throughout
❑ Smaller investment at each level & iteration
❑ Requirements & problem can shift
❑ Success criteria determined for each iteration
❑ The client may see the product sooner, though the product is likely lower quality initially
❑ Inherent to the real world, because the world is influenced by the design
❑ Product failure is just part of the design process
❑ Costs similar across the board
❑ Prototype dependent


Spiral / Waterfall Design Processes

[Figure: spiral vs waterfall design processes.]

Top-Down Vs Bottom-Up Design
❑ Top-Down Design:
– Choice of algorithm (optimization)
– Choice of architecture (optimization)
– Definition of functional modules
– Definition of design hierarchy
– Split up into small boxes or units (modules)
– Define required units (adders, state machines, etc.)
– Map into the chosen technology (synthesis, schematic, layout); change algorithms or architecture if speed or chip-size problems arise
– Behavioral simulation tools
❑ Bottom-Up Design:
– Build gates in a given technology
– Build basic units using gates
– Build generic modules of use
– Put modules together
– Hope that you arrive at some reasonable architecture
– Gate-level simulation tools
Old-fashioned design methodology à la discrete logic


SoC Requirements & Specifications


❑ It is important to carefully analyze the requirements, the available time & the market situation to determine all factors prior to design:
– Compatibility with previous designs
– Reuse of previous designs
– Customer requests/complaints
– Sales reports
– Cost analysis
– Competitive equipment analysis
– Trouble (reliability) reports on previous products


SoC Design Flow
[Figure: system design process: concept definition → system specification → system simulation → digital spec (complexity, performance & power) → modeling / system simulation (II) → HDL design → HDL simulation → FPGA synthesis → FPGA prototype; feedback loops feed design improvements back from simulation & design measurements back from the prototype.]

SoC Flow
[Figure: SoC flow, part 1: electronic-system-level design with generic HW & SW IP blocks → specify → architecture platform → HW/SW partition → integrate application-specific HW IP blocks into the architecture platform, & integrate application-specific SW IP modules with the operating system → HW/SW co-simulation & functional simulation → HW design flow & low-level SW design flow → HW & SW (low level) on an FPGA platform.]

SoC Flow
[Figure: SoC flow, part 2: HW & SW (low level) on the FPGA platform → physical design & application SW development → HW/SW verification on an application prototype or development board, with SW test → prototype IC fabrication → IC fabrication → ship ICs & SW to clients.]

Design Methodologies

[Figures: front-end ASIC design flow; back-end design flow (generic physical flow).]

SoC Design Flow


Four major areas of SoC design


❑ Block (IP) authoring
❑ VC delivery
❑ Chip integration
❑ Software development (design environment)


EDA/CAD Tools


SoC Processor Area


❑ SoCs have die sizes of 10–15 mm on a side. Dies are produced in bulk from a large wafer, around 30 cm in diameter (about 12 inches)
❑ Defects occur randomly over the wafer surface
❑ With N_G good dies, N_D defective dies, die area A & defect density ρ_D, a Poisson distribution of defects gives the yield:

  \[ Y = \frac{N_G}{N_G + N_D} = e^{-\rho_D A} \]

SoC Processor Area

❑ rbe: register bit equivalent, a technology-independent unit of layout area equal to the area of a one-bit register cell; areas of other components are expressed as multiples of one rbe

SoC Baseline Area Model


❑ Example: a manufacturing process with defect density ρ_D = 0.2 defects/cm² & die area A = 25 mm² gives a yield of Y ≈ 0.95
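❑ Checking that with the Poisson yield formula above (A = 25 mm² = 0.25 cm²):

  \[ Y = e^{-\rho_D A} = e^{-0.2 \times 0.25} = e^{-0.05} \approx 0.951 \]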

❑ The smaller the feature size, the more logic can be accommodated
❑ Latches take about 10%, & buses, routing, clocking & overall control about 40% of the space
❑ Total system area: processor + storage = 2462 A, leaving 5200 − 2462 = 2738 A for the cache


Baseline die floor plan
❑ Summary of the area design rules:
1. Compute the target chip size from the target yield & defect density
2. Compute the die cost
3. Compute the net available area; allow 10–20% for pins, guard ring, power, etc.
4. Determine the rbe size from the minimum feature size
5. Allocate area based on a trial system architecture until the base size is calculated
6. Allocate area for cache & storage


SoC Power Management

[Figure: battery capacity & duty cycle.]

SoC Future
❑ Have 100s of hardware blocks
❑ Have billions of transistors
❑ Have multiple processors
❑ Have large wire-to-gate delay ratios
❑ Handle large amounts of high-speed data
❑ Need to support “plug-and-play” IP blocks
❑ Future NoC needs to be ready for these SoCs...


Networks-on-Chip


SoC Nightmare
[Figure: the "board-on-a-chip" approach: DMA, CPU & DSP sharing a system bus; a memory controller & bridge; MPEG & I²C blocks on a peripheral bus; plus ad-hoc control wires. The architecture is tightly coupled.]

Evolution or Paradigm Shift?


❑ Architectural paradigm shift
– Replace wire spaghetti by an intelligent network infrastructure
❑ Design paradigm shift
– Busses & signals replaced by packets
❑ Organizational paradigm shift
– Create a new discipline, a new infrastructure responsibility
❑ Communication bandwidth: the rate of information transfer b/w a module & the surrounding environment in which it operates, usually measured in bytes/sec
[Figure: a computing module attached to a bus, versus modules attached to network routers over network links.]

Bus based On-Chip Communication
❑ Buses are collections of wires to which one or more components are connected
❑ Master (or initiator): the IP component that initiates a read or write data transfer
❑ Slave (or target): an IP component that does not initiate transfers & only responds to incoming transfer requests
❑ Arbiter: controls access to the shared bus, using an arbitration scheme to select the master to grant access to (a minimal sketch follows below)
❑ Decoder: determines which component a transfer is intended for
❑ Bridge: connects two busses & acts as a slave on one side & a master on the other
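❑ A minimal, hypothetical Verilog sketch (assumed, not from the slides) of the arbiter's job, using a fixed-priority scheme for four masters; real SoC buses typically layer round-robin or TDMA policies on the same request/grant idea:

  // Fixed-priority arbiter: req[0] wins over req[1], and so on.
  module bus_arbiter (
    input  [3:0] req,    // one request line per master
    output [3:0] grant   // one-hot grant back to the winning master
  );
    assign grant[0] = req[0];
    assign grant[1] = req[1] & ~req[0];
    assign grant[2] = req[2] & ~(|req[1:0]);
    assign grant[3] = req[3] & ~(|req[2:0]);
  endmodule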


Types of Buses
❑ A bus typically consists of three types of signal lines:
❑ Address
– Carry the address of the destination for which the transfer is initiated
– Can be shared or separate for read & write
❑ Data
– Carry information b/w source & destination components
– Can be shared or separate for read & write data
– The choice of data width is critical for application performance
❑ Control
– Requests and acknowledgements
– Specify more information about the type of data transfer
• Byte enable, burst size, cacheable/bufferable, write-back/through, …
[Figure: a bus drawn as separate address, data & control lines.]


Bus Standards
❑ Bus standards: microprocessors have adopted a number of bus standards such as the VME bus, Intel Multibus-II, ISA bus, PCI bus, etc. to connect ICs on PCBs in system-on-board implementations
❑ Arbitration & protocols: the bus master is the unit that initiates communication on a computer bus or I/O path. In an SoC, the bus master is a processor, and the slaves are I/O devices & memory components
❑ The bus master controls the bus paths using the slave address, control signals & the flow of data signals b/w master & slaves; this process is called arbitration
❑ A bus protocol is an agreed set of rules for transmitting information b/w two or more devices over a bus. Protocols determine:
❑ 1. The type & order of data being sent
❑ 2. How the sending device indicates that it has finished sending information
❑ 3. The data compression method used
❑ 4. How the receiving device acknowledges reception of information
❑ 5. How arbitration is performed to resolve contention on the bus, in what priority, & what type of error checking is to be used


Bus Standards
❑ Bus bridge: a module that connects together two buses, which are not necessarily of the same type
❑ Three functions:
– 1. If the two buses use different protocols, the bus bridge provides the necessary format & standard conversion
– 2. A bridge is inserted b/w two buses to segment them & keep traffic contained within the segments. This improves concurrency: both buses can operate at the same time
– 3. A bridge often contains memory buffers & associated control circuits that allow write posting
❑ Physical bus structure: depends on the number of wires, paths, cycle time, etc. & the protocol
❑ Bus varieties: buses may be unified or split. On a unified bus, an address is initially transmitted in a bus cycle, followed by one or more data cycles; a split bus has separate buses for each of these functions


SoC AMBA Buses
❑ Two commonly used SoC bus standards are the Advanced Microcontroller Bus Architecture (AMBA) bus developed by ARM & the CoreConnect bus by IBM. AMBA: two main buses are defined in the AMBA specification:
❑ The Advanced High-performance Bus (AHB) is designed to connect embedded processors, peripherals, Direct Memory Access (DMA) controllers, on-chip memory & interfaces. It is a high-speed, high-bandwidth bus architecture, with 32-bit data operation extendable to 1024 bits & concurrent multiple master/slave operation
❑ The Advanced Peripheral Bus (APB) has lower performance than the AHB & targets slower peripheral modules. It is designed for low power & reduced interface complexity (a sketch of an APB-style slave follows below)
❑ A third bus, the Advanced System Bus (ASB), is designed for lower-performance 16/32-bit µCs
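❑ As a hedged illustration of the APB's reduced interface complexity, a minimal sketch of an APB-style slave holding one read/write register; the signal names follow the AMBA APB convention, but the one-register map & omitted address decode are assumptions for illustration:

  module apb_reg_slave (
    input         pclk, presetn,
    input         psel, penable, pwrite,
    input  [31:0] paddr,          // address decode omitted in this sketch
    input  [31:0] pwdata,
    output [31:0] prdata
  );
    reg [31:0] ctrl;
    // Write completes in the second (ENABLE) cycle of a selected transfer
    always @(posedge pclk or negedge presetn) begin
      if (!presetn)                       ctrl <= 32'h0;
      else if (psel && penable && pwrite) ctrl <= pwdata;
    end
    assign prdata = ctrl;  // single-register read-back
  endmodule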


SoC CoreConnect Bus


❑ IBM CoreConnect bus: designed for a specific processor core, the PowerPC, but also adaptable to other processors. The CoreConnect bus & the AMBA bus share many common features
❑ The CoreConnect architecture provides three buses for interconnecting cores, library macros & custom logic:
– Processor Local Bus (PLB)
– On-chip Peripheral Bus (OPB)
– Device Control Register (DCR) bus
[Figure: a CoreConnect-based SoC.]


SoC PLB/Split Transaction Buses
❑ PLB bus: used for high-bandwidth, high-performance & low-latency interconnection b/w processors, memory & DMA controllers
❑ It is a fully synchronous, split-transaction bus with separate address, read & write data buses, allowing two simultaneous transfers per clock cycle
❑ All masters/slaves have their own address, read/write data & control signals, also called transfer-qualifier signals
❑ Split-transaction bus: the PLB address & read/write data buses are decoupled, allowing address cycles to be overlapped with read or write data cycles, and read data cycles to be overlapped with write data cycles
[Figure: PLB transfer protocol.]


SoC OPB Bus


❑ The On-chip Peripheral Bus (OPB) is designed for easy connection of on-chip peripheral devices
❑ It provides a common design point for various on-chip peripherals
❑ It is a fully synchronous bus that functions independently at a separate level of the bus hierarchy. It is not intended to connect directly to the processor core; the processor core can access slave peripherals on this bus through the PLB-to-OPB bridge unit, which is a separate core


Bus Topologies
❑ A hierarchical shared bus reduces the impact of capacitance across the two segments
❑ Reduces contention & energy
❑ Improves system throughput
– Multiple ongoing transfers on different buses
[Figure: shared bus vs hierarchical shared bus.]

Types of Bus Topologies

❑ Full crossbar/matrix bus (point-to-point)
❑ Partial crossbar/matrix bus
❑ Tri-state buffer based bidirectional signals
❑ Ring bus
[Figure: the four bus topologies above.]

Bus Physical Structure

[Figure: two physical bus structures: MUX-based signal selection & AND-OR based signal merging.]
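❑ A minimal sketch (assumed widths & names) of the multiplexer-based structure, which avoids tri-state drivers by selecting one source onto the shared line; the comment shows the equivalent AND-OR form with a one-hot grant:

  module mux_bus (
    input      [1:0]  sel,             // select code from arbiter/decoder
    input      [31:0] d0, d1, d2, d3,  // data driven by each master
    output reg [31:0] bus_out
  );
    always @* begin
      case (sel)
        2'd0: bus_out = d0;
        2'd1: bus_out = d1;
        2'd2: bus_out = d2;
        default: bus_out = d3;
      endcase
    end
    // AND-OR equivalent with a one-hot grant g[3:0]:
    //   assign bus_out = ({32{g[0]}} & d0) | ({32{g[1]}} & d1)
    //                  | ({32{g[2]}} & d2) | ({32{g[3]}} & d3);
  endmodule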

Electrical Properties & System Integration


1. Efficient interconnect: delay, power, noise, scalability, reliability
2. Increased system integration productivity
3. Enables multiprocessors for SoCs
[Figure: scalability of area & power in NoCs: wire area & power for a NoC grow roughly linearly, O(n), in the number of modules n, whereas simple buses, segmented buses & point-to-point wiring grow super-linearly.]

On-Chip Communication
❑ Bus structure: highly connected multi-core systems consume a lot of power & increase the capacitive load
❑ Point-to-point structure: optimal in bandwidth, availability, latency & power usage, and simple to design, but occupies a large hardware area & causes routing problems
❑ NoC structure: lower complexity & cost, reusable components, efficient & high-performance interconnect, scalable; but it may add latency, & software needs clean synchronization on the µP
[Figure: bus, point-to-point & NoC network structures.]

Bus Vs NoCs

❑ Bus-based interconnect (bus-based & irregular architectures):
– Low cost
– Easier to implement
– Flexible
– But: bandwidth limited, speed goes down as units are added, no concurrency, central arbitration, no layers of abstraction
❑ Networks on Chip (regular architectures), a layered approach:
– Better electrical properties
– Higher bandwidth
– Energy efficiency
– Scalable
– Concurrency & reuse
– Built-in pipelining
– Separate abstraction layers
– But: no performance guarantee, extra delay in routers, area & power still increase

Typical NoC Router
[Figure: a typical NoC router: per-port input buffers feeding a crossbar switch, controlled by routing & arbitration logic.]
▪ This example uses a centralized arbiter for all I/O ports
▪ Distributed arbitration can also be used

Layered Approach
[Figure: separation of concerns: protocol layers (software, transport, network, wiring) matched with their modeling disciplines (traffic modeling, queuing theory, architectures, networking), applied to a regular NoC of processing elements (PEs) connected through routers.]

Networks-on-Chip (NoC)
❑ A network-on-chip (NoC) is a communication subsystem on an IC b/w the IP cores in an SoC
❑ A NoC can span synchronous & asynchronous clock domains or use unclocked asynchronous logic
❑ NoC technology applies networking theory & methods to on-chip communication & brings notable improvements over conventional bus & crossbar interconnections
❑ A NoC improves the scalability of an SoC & the power efficiency of complex SoCs compared to other designs
❑ Even significant design effort on bus-style interconnect is not going to be sufficient for large SoCs:
– The physical implementation does not scale: bus fanout, loading & arbitration depth all reduce the operating frequency
– The available bandwidth does not scale: the single bus must be shared by all masters & slaves


NoC Characteristics
❑ Communication latency: the time delay b/w a module requesting data & receiving a response to the request
❑ Master & slave: a unit can initiate or react to communication requests. A master, such as a processor, controls transactions b/w itself & other modules; a slave, such as a memory, responds to requests from the master
❑ Concurrency requirement: the number of independent simultaneous communication channels operating in parallel
❑ Packet or bus transaction: the size and definition of the information transmitted in a single transaction. For a bus: an address with control bits (R/W) and data. A packet consists of a header (address & control) plus data (see the sketch below)
❑ Interconnect Control Unit (ICU): manages the interconnect protocol & physical transaction. If an IP core requires protocol translation to access the bus, the unit is called a bus wrapper
❑ Multiple clock domains: different IP modules may operate at different clock and data rates
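❑ A tiny, hypothetical Verilog sketch of the packet framing just described: a header (destination address & control bits) prepended to the data payload; all field widths are assumptions:

  module packet_pack (
    input  [7:0]  dst,    // header: destination node address
    input  [3:0]  ctrl,   // header: control bits (e.g., R/W, burst length)
    input  [31:0] data,   // payload
    output [43:0] pkt     // header-first packet, 44 bits total
  );
    assign pkt = {dst, ctrl, data};
  endmodule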


NoC Properties
❑ A NoC is more than a single shared bus
❑ A NoC provides point-to-point connections b/w any two hosts attached to the network, either by crossbar switches or through node-based switches
❑ A NoC provides high aggregate bandwidth through parallel links
❑ In a NoC, communication is separate from computation
❑ A NoC uses a layered approach to communications, although with few network layers due to complexity and expense
❑ NoCs support pipelining and provide intermediate data buffering b/w sender & receiver


NoC Properties
❑ NoC advantages:
– Regular geometry that is scalable
– Flexible QoS guarantees
– Higher bandwidth
– Reusable components: buffers, arbiters, routers, protocol stack
– No long global wires (or global clock tree), hence no problematic global synchronization
– GALS: globally asynchronous, locally synchronous design
– Reliable & predictable electrical & physical properties
❑ Bus advantages:
– The silicon cost of a bus is small
– Almost directly compatible with most available IPs, including software running on CPUs
– Concepts are simple & well understood
❑ Problems:
– NoC: internal network contention causes (often unpredictable) latency; the network takes significant silicon area; bus-oriented IPs need smart wrappers; software needs clean synchronization in multiprocessor systems; system designers need re-education for the new concepts
– Bus: every unit attached adds parasitic capacitance, so electrical performance degrades with growth; bus timing is difficult in a deep-submicron process; bus arbiter delay grows with the number of masters; bandwidth is limited & shared by all units attached
[Figure: the on-chip distance reachable within one clock period shrinks from year 2005 (1 ns, 1 GHz) to year 2010 (0.1 ns, 10 GHz).]

Routing Algorithms
❑ NoC routing algorithms should be simple
– Complex routing schemes consume more device area (complex routing/arbitration logic)
– Additional latency for channel setup/release
– Deadlocks must be avoided
❑ Deadlock occurs when it is impossible for any message to move (without discarding one)
– Buffer deadlock occurs when all buffers are full in a store-and-forward network. This leads to a circular wait condition, each node waiting for space to receive the next message
– Channel deadlock is similar, but results when all channels around a circular path in a wormhole-based network are busy (recall that each "node" has a single buffer used for both input and output)
❑ Some additional features are highly desirable
– QoS, fault-tolerance


2D-Mesh NoC–XY Routing


❑ X-Y routes are determined completely by the source & destination addresses
❑ In X-Y routing, a message travels "horizontally" (in the X dimension) from the source node to the "column" containing the destination, where the message then travels vertically
– The X direction is resolved first, then the Y direction
❑ Four possible direction pairs: east-north, east-south, west-north, & west-south
❑ Advantages of X-Y routing (see the sketch below):
– Very simple to implement
– Deterministic
– Deadlock-free
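❑ A minimal, hypothetical Verilog sketch of the X-then-Y decision made at each router (coordinate widths & port encoding are assumptions, not from the slides):

  module xy_route (
    input  [3:0] cur_x, cur_y,   // this router's mesh coordinates
    input  [3:0] dst_x, dst_y,   // destination from the packet header
    output [2:0] port            // 0:local 1:east 2:west 3:north 4:south
  );
    assign port = (dst_x > cur_x) ? 3'd1 :  // correct X first...
                  (dst_x < cur_x) ? 3'd2 :
                  (dst_y > cur_y) ? 3'd3 :  // ...then correct Y
                  (dst_y < cur_y) ? 3'd4 :
                                    3'd0;   // arrived: eject to local PE
  endmodule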


Optimizing a NoC for a Particular Application
❑ The NoC architecture has to be flexible & parametric
• Parameters allow customization
• Parameters: buffer depth, number of virtual channels, NoC size, etc.
❑ Application-specific optimization:
– Buffers
– Routing
– Topology
– Mapping to the topology
– Implementation and reuse
❑ Architecture optimization:
– Quality of Service (QoS) support
– Topology
❑ Fault tolerance:
– Gossiping architectures
[Figure: communication-centric design loop: an architecture/application model is built from the application & an architecture library, then NoC optimisation iterates configure → evaluate → analyse/profile → refine until good, followed by synthesis of the optimized NoC.]

NoC Design Flow


❑ Optimize capacity for a performance/power tradeoff
❑ Capacity allocation is a traditional WAN optimization problem
[Figure: NoC design flow: extract inter-module traffic → place modules → allocate link capacities → verify QoS and cost, illustrated on a mesh of modules & routers (R).]

NoC Structure
❑ A NoC is a packet-switched on-chip communication network designed using a layered methodology
– "Routes packets, not wires"
❑ NoCs use packets to route data from the source to the destination PE via a network fabric that consists of:
• Switches (routers)
• Interconnection links (wires)
❑ NoCs are an attempt to scale down the concepts of large-scale networks & apply them to the embedded SoC domain
❑ They follow the ISO/OSI network protocol stack model
[Figure: processing elements (PEs) attached to a grid of routers.]

NoC Topologies
❑ Well-known NoC topologies include:
a) SPIN
b) CLICHÉ
c) Torus
d) Folded torus
e) Octagon
f) BFT (butterfly fat tree)

NoC Topologies
❑ Direct topologies:
– Each node has direct point-to-point links to a subset of other nodes in the system, called neighboring nodes
– Nodes consist of computational blocks and/or memories, as well as a network interface (NI) block that acts as a router
– As the number of nodes in the system increases, the total available communication bandwidth also increases
– The fundamental trade-off is b/w connectivity & cost
– Most direct network topologies have an orthogonal implementation, where nodes can be arranged in an n-dimensional orthogonal space
– Routing for such networks is fairly simple
• e.g. n-dimensional mesh, torus, folded torus, hypercube, & octagon
❑ 2D mesh topology:
– All links have the same length
• Eases physical design
– Area grows linearly with the number of nodes
– Must be designed to avoid traffic accumulating in the center of the mesh
[Figures: a direct topology; a 2D mesh topology.]

NoC Topologies
❑ Torus (k-ary n-cube):
– An n-dimensional grid with k nodes in each dimension
– A k-ary 1-cube (1-D torus) is essentially a ring network with k nodes
• Limited scalability, as performance decreases when more nodes are added
– The k-ary 2-cube (i.e., 2-D torus) topology is similar to a regular mesh
• Except that nodes at the edges are connected to switches at the opposite edge via wrap-around channels
• Long end-around connections can, however, lead to excessive delays

NoC Topology
❑ Folded torus topology:
– Overcomes the long-link limitation of a 2-D torus
– Links have the same length
– Meshes and tori can be extended by adding bypass links to increase performance at the cost of higher area
❑ Octagon topology:
– Another example of a direct network
– Messages being sent between any 2 nodes require at most two hops
– More octagons can be tiled together to accommodate larger designs
• by using one of the nodes as a bridge node
[Figures: folded torus topology; octagon topology.]

NoC Topology
❑ Indirect topologies:
– Each node is connected to an external switch, and switches have point-to-point links to other switches
– Switches do not perform any information processing, and correspondingly nodes do not perform any packet switching
– e.g. SPIN, crossbar topologies
❑ Fat tree topology:
– Nodes are connected only to the leaves of the tree
– More links near the root, where bandwidth requirements are higher
❑ k-ary n-fly butterfly network:
– Blocking multi-stage network: packets may be temporarily blocked or dropped in the network if contention occurs
– k^n nodes, and n stages of k^(n-1) k×k crossbars (see the check below)
– e.g. a 2-ary 3-fly butterfly network
[Figures: fat tree topology; k-ary n-fly butterfly network.]
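❑ Sanity-checking those parameters for the 2-ary 3-fly example:

  \[ N = k^{n} = 2^{3} = 8 \ \text{nodes}, \qquad n = 3 \ \text{stages}, \qquad k^{\,n-1} = 2^{2} = 4 \ \text{crossbars of size } 2 \times 2 \ \text{per stage} \]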

NoC Topology
❑ (m, n, r) symmetric Clos network:
– A three-stage network in which each stage is made up of a number of crossbar switches
– m is the number of middle-stage switches
– n is the number of input/output nodes on each input/output switch
– r is the number of input and output switches
– e.g. a (3, 3, 4) Clos network
– Non-blocking network (see the conditions below)
– Expensive (several full crossbars)
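❑ For context, the classic Clos conditions (a standard result, not on the original slide) show why the (3, 3, 4) example above, with m = n, is non-blocking in the rearrangeable sense:

  \[ m \ge 2n - 1 \ \text{(strict-sense non-blocking)}, \qquad m \ge n \ \text{(rearrangeably non-blocking)} \]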
❑ Benes network:
– A rearrangeable network in which paths may have to be rearranged to provide a connection, requiring an appropriate controller
– A Clos topology composed of 2 x 2 switches
– e.g. a (2, 2, 4) re-arrangeable Clos network constructed using two (2, 2, 2) Clos networks with 4 x 4 middle switches
[Figures: an (m, n, r) symmetric Clos network; a Benes network.]

NoC Topology
❑ Irregular or ad hoc network topologies:
– Customized for an application
– Usually a mix of shared-bus, direct & indirect network topologies
– e.g. reduced mesh, cluster-based hybrid topology


NoC Example
[Figure: example NoC: a 3×3 grid of routing nodes, each attached to a processor master, with a global memory slave & two global I/O slaves attached at the edges of the grid.]

NoC Based FPGA Architecture


Functional FR
FR
CR CR ETH
unit CPU
I/F
CNI CNI CNI CNI CNI CNI NoC for inter-
R R R R R R routing

FR
CR CR
SERDES
CNI CNI CNI CNI CNI CNI
Routers R R R R R R

FR FR
FR FR
PCI CR D/A
DSP CPU
A/D
CNI CNI CNI CNI CNI CNI Configurable
R R R R R R
region – User
logic

CR CR CR CR
Configurable
network CNI CNI CNI CNI CNI CNI
interface R R R R R R

FR
FR
CR CR ETH
DRAM
I/F
CNI CNI CNI CNI CNI CNI
R R R R R R

106

106

Networks Processor


Ideal NoC Network


❑ What would the ideal network look like?
– Low area overhead
– Simple implementation
– High-speed operation
– Low-latency
– High-bandwidth
– Operate at a constant frequency even with additional blocks
– Increase available bandwidth as blocks are added
– Provide performance guarantees
– Have a “universal” interface

These are competing requirements: Design a network that is the “best” fit


SoC Interconnect Switches
❑ For the units on a chip, link bandwidth is 16–128 wires
❑ Node fanout is the number of channels connecting a node to its neighbors
❑ Nodes can be altered both to establish connectivity & to improve n/w bandwidth
[Figure: a static network (the links b/w units are fixed).]

Static & Dynamic Network


❑ A static n/w consists of 2D switches that connect the SoC modules
❑ A dynamic n/w consists of a centralized crossbar switch; these switches avoid traffic congestion at different clock frequencies
❑ A crossbar-based interconnect connects local blocks on the same chip
[Figures: a dynamic network; a switch-based interconnect scheme.]
