Introduction
SoC
An integrated circuit that implements most or all of the
functions of a complete electronic system
A memory chip is not a system but a component
Challenges
silicon density, design and verification tools & complexity,
bug cost, software, complex standards
Introduction (con't)
New design methodology
using pre-designed, pre-verified processor cores
but using only general-purpose processors is impossible
Solution
configurable, extensible processor
firmware rather than RTL-defined hardware
The Limitations of Traditional ASIC Design
Conventional SoC design
combines a standard microprocessor, memory, and RTL-built
logic into an ASIC
philosophical descendant of earlier board-level designs
one or two 32-bit buses (to save pins)
rigid partitioning between the microprocessor and logic blocks,
because communication is assumed to be the bottleneck
Embedded systems
critical functions need specific data types
general-purpose processors cannot take full advantage of them,
so data-intensive functions are left to hard-wired circuits
Extensible Processors as an Alternative to RTL
Hard-wired logic
small silicon area (low switched capacitance)
low cycle count (more useful work per cycle)
Configurable processor
small base architecture
features can be omitted or configured on demand
application-specific instructions and interfaces can be added
achieves the same effect as hard-wired logic
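The cycle-count advantage of an application-specific instruction can be sketched with a toy model (the instruction latencies and the fused-MAC extension below are illustrative assumptions, not from the text):

```python
# Toy cycle-count model: a base ISA needs separate multiply and add
# instructions per element of a dot product, while a configurable
# processor with a fused multiply-accumulate (MAC) extension retires
# one element per cycle. All latencies are made-up, single-issue numbers.

def cycles_base_isa(n_elements, mul_cycles=1, add_cycles=1):
    """Dot product on a generic RISC core: one mul + one add per element."""
    return n_elements * (mul_cycles + add_cycles)

def cycles_with_mac(n_elements, mac_cycles=1):
    """Same dot product using a single application-specific MAC instruction."""
    return n_elements * mac_cycles

n = 1024
base = cycles_base_isa(n)
ext = cycles_with_mac(n)
print(f"base={base} cycles, extended={ext} cycles, speedup={base / ext:.1f}x")
```

Under these assumptions the extension halves the cycle count, which is the "more useful work per cycle" effect the hard-wired comparison describes.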
Toward Multiple-Processor SoCs
Complexity of SoC designs
faster initial design
greater post-fabrication flexibility
Two trends
combining of functions traditionally implemented
migration of functions from RTL into application-specific
processors
Considerations
interconnection of multiple processors
simulation of a system composed of application-specific
processors
What Are MPSoCs?
What is an MPSoC?
A system-on-chip that contains multiple instruction-set
processors
In practice, most SoCs are MPSoCs
For SW designers
SW shipped as part of a chip must be extremely reliable
must meet many design constraints traditionally reserved for
hardware (hard timing constraints, energy consumption, ...)
What Are MPSoCs? (con't)
Heterogeneous vs. symmetric multiprocessors
Harder to program
Cheaper
More energy-efficient
Specialized I/O
Task-level behavior
Task-level parallelism is both easy to identify in SoC
applications and important to exploit
RTOSs provide scheduling mechanisms, but abstract away the
underlying processes
Networks-on-chips
Use packet networks to interconnect the processors in the
SoC
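The packet-network idea can be illustrated with a minimal deterministic XY-routing sketch for a 2D mesh NoC (the mesh topology and routing policy are illustrative assumptions; real NoCs add flow control, virtual channels, and arbitration):

```python
# Minimal sketch of deterministic XY routing on a 2D mesh NoC:
# a packet first travels along the X dimension to the destination
# column, then along Y to the destination row.

def xy_route(src, dst):
    """Return the list of (x, y) hops from src to dst: X first, then Y."""
    x, y = src
    dx, dy = dst
    path = [(x, y)]
    while x != dx:                 # travel along X dimension first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                 # then along Y
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 1)))  # -> [(0, 0), (1, 0), (2, 0), (2, 1)]
```

XY routing is deadlock-free on a mesh because packets never turn from Y back to X, which is one reason simple NoCs favor it.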
Challenges
FPGAs
FPGA logic can implement custom functions that were not
designed before manufacturing
A good complement to software-based customization
Security
As chips connect to the Internet,
security becomes increasingly important
Networks of chips
Sensor networks
Designers do not have total control over the system
Design Methodologies
Fast design time is very important
Tight time-to-market and time window constraints
Architecture design
Design of components, hardware/software interface design
Key challenges
Determining which abstraction of MPSoC architecture is most
suitable at each of the design steps
Determining how to obtain application-specific optimization of
software architecture
Software
Optimization Viewpoint
Cost and performance requirements
Two factors
Processor architecture
Parallelism
Application-specific
Memory hierarchy
Shared memory
Distributed memory
Reducing Active Energy
Multiple Supply Voltage
Decreasing the supply voltage decreases performance, since
it increases gate delay
Effective in MPSoCs, since different processors require
different performance
DVS combined with DFS
(the most popular of these techniques)
Most embedded and mobile processors contain this
feature
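The voltage/frequency trade-off follows from the dynamic-power relation P = a*C*V^2*f; a small sketch with made-up constants shows why scaling voltage and frequency together saves disproportionate power:

```python
# Dynamic (switching) power scales as P = a * C * V^2 * f, so lowering
# the supply voltage saves power quadratically -- but gate delay grows,
# so the clock frequency must drop with it (DVS combined with DFS).
# Capacitance and activity factor below are illustrative, not from the text.

def dynamic_power(v, f, c=1e-9, activity=0.5):
    """Switching power in watts for voltage v (V), frequency f (Hz)."""
    return activity * c * v * v * f

p_full = dynamic_power(v=1.2, f=1e9)     # full speed
p_dvfs = dynamic_power(v=0.8, f=0.5e9)   # scaled voltage + frequency
print(f"{p_full:.2f} W at full speed vs {p_dvfs:.2f} W after DVS+DFS")
```

With these numbers, halving the frequency while dropping the voltage from 1.2 V to 0.8 V cuts power by a factor of 4.5, far more than the 2x performance loss.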
DVS+DFS
As long as the supply voltage is raised before the clock
rate is increased, the system stalls only while the PLL is
relocking onto the new clock rate
Each core (or voltage domain) of a future MPSoC would
require its own converter and PLL
Requirement: cores must be tolerant of periodic dropouts
Complication: the PLL is an analog device, and digital
switching induces noise in it
Reducing Standby Energy
Increasing VT decreases the subthreshold leakage
current (pro) but increases gate delay (con)
Combining DVS, DFS, and variable VT is effective
Sleep transistors
gate the supply rail
switch off the supply to idle components
System SW can determine the optimal schedule
and direct idle cores to switch off
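The benefit of letting system software supply-gate idle cores can be sketched with made-up power numbers:

```python
# Sketch of system software directing idle cores to power-gate
# (switching off the supply rail via a sleep transistor).
# Core counts and power values are illustrative assumptions.

IDLE_POWER_W = 0.10    # leakage of an idle but still-powered core
GATED_POWER_W = 0.001  # residual power of a supply-gated core

def standby_power(cores_busy, total_cores, gating_enabled):
    """Total standby power of the idle cores under a gating policy."""
    idle = total_cores - cores_busy
    per_idle = GATED_POWER_W if gating_enabled else IDLE_POWER_W
    return idle * per_idle

without = standby_power(cores_busy=2, total_cores=8, gating_enabled=False)
with_gating = standby_power(cores_busy=2, total_cores=8, gating_enabled=True)
print(f"idle leakage: {without:.3f} W ungated vs {with_gating:.3f} W gated")
```

The scheduler's job is deciding when gating pays off, since waking a gated core costs time and energy.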
Energy-Aware Memory System
Design
Memory constitutes a significant portion of the overall
chip resources
Energy is expended in data accesses, coherence activity,
and leakage while storing data
Reducing Active Energy
1. Partition large caches into smaller structures
2. Use a memory hierarchy that attempts to capture most
accesses in the smallest memories
Cache way-prediction
Selective-way caches
Filter cache
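A filter cache's energy effect can be modeled with two made-up per-access energies: the tiny filter is always probed, and only misses go on to the larger L1:

```python
# Toy energy model of a filter cache: a very small, low-energy cache
# placed in front of L1. If most accesses hit the filter, the average
# energy per access drops. Both energy values are illustrative.

E_FILTER = 0.1   # nJ per filter-cache access (tiny structure)
E_L1 = 1.0       # nJ per L1 access (large structure)

def avg_energy(filter_hit_rate):
    """The filter is always probed; only misses additionally access L1."""
    return E_FILTER + (1.0 - filter_hit_rate) * E_L1

print(avg_energy(0.0))   # worse than L1 alone if the filter never hits
print(avg_energy(0.85))  # large saving at a high filter hit rate
```

The same accounting explains way-prediction and selective-way caches: each tries to satisfy an access from the smallest structure that can hold it.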
Reducing Standby Energy
Most of the above techniques do little to alleviate
leakage energy
Requirement
Ability to identify unused resources
Cache size is reduced dynamically to optimize energy
Cache blocks are supply-gated,
keeping the tag line active while the cache line is deactivated
Dynamic voltage scaling
Drowsy caches
Leakage-biased bitlines
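A drowsy cache keeps most lines in a low-voltage, data-retaining state; a toy leakage model (relative numbers assumed for illustration) shows the saving when only the working set stays fully powered:

```python
# Sketch of drowsy-cache leakage accounting: lines in the low-voltage
# "drowsy" state retain their data but leak far less; a drowsy hit pays
# a short wake-up delay. Both leakage values are illustrative, relative units.

LEAK_ACTIVE = 1.0   # relative leakage of a fully powered line
LEAK_DROWSY = 0.1   # relative leakage of a line in the drowsy state

def leakage(lines_total, lines_active):
    """Total relative leakage with the rest of the lines drowsy."""
    drowsy = lines_total - lines_active
    return lines_active * LEAK_ACTIVE + drowsy * LEAK_DROWSY

all_awake = leakage(1024, 1024)
mostly_drowsy = leakage(1024, 64)  # only the working set stays awake
print(f"leakage: {all_awake:.0f} all awake vs {mostly_drowsy:.0f} mostly drowsy")
```

Unlike supply gating, the drowsy state preserves the line's contents, so no refill from the next level is needed on wake-up.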
Influence of Cache Architecture
on Energy Consumption
Two popular alternatives for building a cache
A single multi-ported cache (shared by all processors)
Pros
Constructive interference can reduce overall miss rates
Inter-processor communication is easy to implement
Cons
Consumes significant energy
Not scalable
Each processor has its own private cache
Pros
Low power per access, low latency, and good scalability
Cons
Duplication of data and instructions
Complex cache-coherence protocol
Combine the advantages of the two options!
CCC (crossbar-connected cache)
The shared cache is divided into multiple banks connected by
an N x M crossbar
Pros
The duplication problem is eliminated (logically a single cache)
No consistency mechanism is needed
Scalable
Useful in reducing energy consumption
CCC (con’t)
Cons
Concurrent accesses to the same bank cause processor stalls
Mitigations
Provide more cache banks than processors
Handle references to the same block
The energy benefits of CCC
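Bank selection and bank conflicts in a CCC can be sketched as follows (the bank count, line size, and addresses are illustrative assumptions):

```python
# Sketch of a crossbar-connected cache (CCC): one logically shared cache
# split into banks, with the bank chosen by address bits. Processors
# referencing distinct banks proceed in parallel; accesses to the same
# bank in the same cycle stall all but one processor.
from collections import Counter

NUM_BANKS = 8
LINE_BYTES = 32

def bank_of(addr):
    """Select a bank from the line-address bits (modulo interleaving)."""
    return (addr // LINE_BYTES) % NUM_BANKS

def conflicts(addresses):
    """Count processors stalled by same-cycle accesses to one bank."""
    per_bank = Counter(bank_of(a) for a in addresses)
    return sum(n - 1 for n in per_bank.values())

# Four processors issuing accesses in the same cycle:
print(conflicts([0x0000, 0x0020, 0x0040, 0x0060]))  # four distinct banks
print(conflicts([0x0000, 0x1000, 0x2000, 0x3000]))  # all map to bank 0
```

Adding banks spreads addresses over more interleaving targets, which is why provisioning more banks than processors reduces stalls.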
Reducing Snoop Energy
In bus-based symmetric multi-processors, all bus
size cache controllers snoop the bus
Snoop occur when writes are issued to already cached
block, and cache miss
Unlike normal cache, tag and data array access are
separated
Energy optimizations include
Use of dedicated tag arrays for snoops
Serialization of tag and data array accesses
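The effect of serializing tag and data array accesses on snoop energy can be sketched with made-up per-array energies:

```python
# Toy model of snoop-energy optimization: probe the tag array first and
# access the (expensive) data array only on a tag match, instead of
# firing both arrays in parallel on every snoop. Energies are illustrative.

E_TAG = 0.2    # nJ per tag-array probe
E_DATA = 1.0   # nJ per data-array access

def snoop_energy(n_snoops, hit_rate, serialized):
    """Total snoop energy; hit_rate is the fraction of snoops that match."""
    if serialized:
        # Tag probed on every snoop; data array touched only on hits.
        return n_snoops * (E_TAG + hit_rate * E_DATA)
    # Parallel access: both arrays fire on every snoop.
    return n_snoops * (E_TAG + E_DATA)

parallel = snoop_energy(1000, hit_rate=0.05, serialized=False)
serial = snoop_energy(1000, hit_rate=0.05, serialized=True)
print(f"{parallel:.0f} nJ parallel vs {serial:.0f} nJ serialized")
```

Because most snoops miss in a remote cache, serialization trades a little latency on the rare hit for a large energy saving on the common miss; a dedicated snoop tag array removes the remaining contention with the processor's own accesses.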