
Multiprocessor System-On-Chips

Introduction
 SoC
 An integrated circuit that implements most or all of the functions of a complete electronic system
 A memory chip is not a system but a component

 Contains memory, an instruction-set processor (CPU), specialized logic, buses, and other digital functions…

 Generally tailored to the application rather than being a general-purpose chip
 Cost-effective
 Provide the necessary performance
Introduction
 A new crisis in SoC design
 increases in functionality, reliability, and bandwidth
 decreases in cost and power consumption
 high-density silicon
 built with register-transfer-level (RTL) hardware design techniques

 Pressure on chip designers
 productivity gap, growing cost, time-to-market

 Challenges
 silicon density, design and verification tools and complexity, bug cost, software, complex standards
Introduction
 New design methodology
 using pre-designed, pre-verified processor cores
 but general-purpose processors alone are inadequate

 designing custom RTL logic
 but it takes too long and is too rigid to change easily

 Solution
 configurable, extensible processors
 functions defined in firmware rather than in RTL-defined hardware
The Limitations of Traditional ASIC Design

 Conventional SoC design
 combines a standard microprocessor, memory, and RTL-built logic into an ASIC
 a philosophical descendant of earlier board-level designs
 one or two 32-bit buses (to save pins)
 rigid partitioning between the microprocessor and logic blocks
 because communication between them is assumed to be a bottleneck

 The impact of SoC integration
 wide buses (128- or 256-bit) become practical once pins are no longer the constraint
 1 GB per second of bandwidth on an SoC using wider buses
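The bandwidth figure follows from simple arithmetic: peak bandwidth is the bus width in bytes times the bus clock. The sketch below assumes one transfer per cycle and a hypothetical 100 MHz on-chip clock (neither value comes from the slides), and it ignores arbitration and protocol overhead.

```python
def peak_bus_bandwidth(width_bits: int, clock_hz: float, transfers_per_cycle: int = 1) -> float:
    """Peak bandwidth in bytes/s: (width / 8) * clock * transfers per cycle.

    Ignores arbitration and protocol overhead, so sustained bandwidth is lower.
    """
    return width_bits / 8 * clock_hz * transfers_per_cycle


if __name__ == "__main__":
    clock_hz = 100e6  # hypothetical 100 MHz on-chip bus clock
    for width in (32, 128, 256):
        gb_per_s = peak_bus_bandwidth(width, clock_hz) / 1e9
        print(f"{width:3d}-bit bus at 100 MHz: {gb_per_s:.1f} GB/s")
```

Under these assumptions the loop prints 0.4, 1.6, and 3.2 GB/s: a 128-bit bus already passes the 1 GB/s mark at a modest clock, while a pin-limited 32-bit bus does not.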
The Limitations of Traditional ASIC Design

 The limitations of general-purpose processors
 support only the most generic data types
 designed for complete generality
 silicon-intensive: deeply pipelined, superscalar…
 limited instructions per cycle (IPC)

 Embedded systems
 critical functions need specific data types
 cannot take full advantage of all of a general-purpose processor's capabilities
 use hard-wired circuits for data-intensive functions
Extensible Processors as an Alternative to RTL

 Two important criteria
 must accelerate and simplify the creation of configurations, including their
 hardware descriptions, software development tools, and verification aids

 Configurable and extensible processors: approaches
 non-architectural processor configuration
 fixed menu of processor architecture configurations
 user-modifiable processor RTL
 processor extension using an instruction-set description language
 fully automated processor synthesis
Extensible Processors as an Alternative to RTL

 Design migration from hardwired state machines to firmware program control offers
 flexibility
 software-based development
 faster, more complete system modeling
 unification of control and data
 faster time-to-market

 Not the right choice for
 small, fixed state machines
 simple data buffering
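To illustrate moving control from a hard-wired state machine into firmware, here is a minimal Python sketch of a frame receiver written as a software state machine. The frame format (a hypothetical `START_BYTE` of 0x7E, a length byte, then the payload) is invented for the example. A machine this small is exactly the case listed above as not the right choice for a processor; it is kept tiny only to show the structure.

```python
from enum import Enum, auto

START_BYTE = 0x7E  # hypothetical frame delimiter


class State(Enum):
    IDLE = auto()
    LENGTH = auto()
    PAYLOAD = auto()


def parse_frames(stream: bytes) -> list:
    """Software replacement for a hard-wired frame-receiver state machine.

    Hypothetical frame format: START_BYTE, one length byte, then `length`
    payload bytes. Bytes seen while idle are ignored.
    """
    frames = []
    state = State.IDLE
    remaining = 0
    payload = bytearray()
    for byte in stream:  # iterating over bytes yields ints
        if state is State.IDLE:
            if byte == START_BYTE:
                state = State.LENGTH
        elif state is State.LENGTH:
            remaining = byte
            payload = bytearray()
            if remaining == 0:
                frames.append(b"")  # empty frame: nothing to collect
                state = State.IDLE
            else:
                state = State.PAYLOAD
        else:  # State.PAYLOAD
            payload.append(byte)
            remaining -= 1
            if remaining == 0:
                frames.append(bytes(payload))
                state = State.IDLE
    return frames


print(parse_frames(bytes([0x00, 0x7E, 0x02, 0xAA, 0xBB, 0x7E, 0x01, 0x10])))
# [b'\xaa\xbb', b'\x10']
```

Changing the frame format here means editing and reloading firmware instead of revising RTL, and the same loop handles both control (the state) and data (the payload buffer).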
Extensibility and Energy Efficiency

 Hard-wired logic
 small silicon area (low switched capacitance)
 low cycle count (more useful work per cycle)

 Configurable processor
 small base architecture
 unneeded features can be omitted; others configured on demand
 application-specific instructions and interfaces can be added

 achieves the same effects: low switched capacitance and low cycle count
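Both effects can be made concrete with the first-order dynamic-energy model E = N_cycles × C_sw × Vdd², where C_sw is the average capacitance switched per cycle. The cycle counts and capacitances below are hypothetical, chosen only to show how the two savings multiply; leakage is ignored.

```python
def task_energy_j(cycles: int, switched_cap_f: float, vdd_v: float) -> float:
    """First-order dynamic energy of a task: E = cycles * C_sw * Vdd**2.

    C_sw is the average capacitance switched per cycle. Leakage is ignored.
    """
    return cycles * switched_cap_f * vdd_v ** 2


# Hypothetical numbers for one task, both at a 1.0 V supply.
generic = task_energy_j(cycles=12_000, switched_cap_f=0.5e-9, vdd_v=1.0)
# Application-specific instructions cut the cycle count;
# omitting unneeded features cuts the switched capacitance.
extended = task_energy_j(cycles=3_000, switched_cap_f=0.3e-9, vdd_v=1.0)
print(f"energy ratio generic/extended: {generic / extended:.2f}")  # 6.67
```

A 4x cycle reduction and a smaller switched capacitance compound into a larger overall saving than either alone.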
Toward Multiple-Processor SoCs
 Growing complexity of SoC designs demands
 faster initial design
 greater post-fabrication flexibility
 Two trends
 combining of functions traditionally implemented
 migration of functions from RTL into application-specific processors
 Key concerns
 interconnection of multiple processors
 simulation of a system composed of application-specific processors
What Are MPSoCs?
 What is an MPSoC?
 A system-on-chip that contains multiple instruction-set processors
 In practice, most SoCs are MPSoCs

 Why do we care about performance?
 Precise performance requirements
 At a minimum, some real-time deadlines must be met

 Why do we care about energy?
 Battery-operated devices
 In devices that are not battery-operated, energy consumption still affects cost
What Are MPSoCs?
 In an MPSoC, SW design is an inherent part of the overall chip design

 For chip designers
 Either HW or SW can be used to solve a problem
 The choice depends on performance, power, and design time

 For SW designers
 SW shipped as part of a chip must be extremely reliable
 It must meet many design constraints traditionally reserved for hardware (hard timing constraints, energy consumption...)
What Are MPSoCs?
 Heterogeneous vs. symmetric multiprocessors: heterogeneous ones are
 Harder to program
 Cheaper
 More energy-efficient

 Challenges in MPSoC software design
 The combination of high reliability, real-time performance, small memory footprint, and low-energy software
Why MPSoCs?
 Typically, an MPSoC is a heterogeneous multiprocessor
 Several different types of PEs (processing elements)
 A heterogeneously distributed memory system
 A heterogeneous interconnection network between the PEs and the memory systems

 A shared-memory multiprocessor model is preferred because it makes life simpler for the programmer
Why MPSoCs?
 Multiprocessor vs. uniprocessor
 Provides enough performance for some applications
 Provides the computational concurrency required to handle concurrent real-world events in real time (task-level parallelism)

 Heterogeneous vs. symmetric: heterogeneity helps an MPSoC
 Perform real-time computations
 Be area-efficient
 Be energy-efficient
 Provide the proper I/O connections
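One way to see why concurrent real-time events can outgrow a uniprocessor is a utilization check. The sketch below assumes independent, preemptible periodic tasks with deadlines equal to their periods, scheduled by EDF on each processor (schedulable if and only if utilization is at most 1), and identical PEs; the task set is hypothetical, and a real heterogeneous MPSoC would need per-PE execution times.

```python
from fractions import Fraction

# Hypothetical periodic tasks: (worst-case execution time, period), same time unit.
TASKS = [(2, 5), (6, 10), (3, 6), (1, 4)]


def utilization(tasks):
    """Total processor utilization U = sum(C_i / T_i)."""
    return sum((Fraction(c, t) for c, t in tasks), Fraction(0))


def partition_first_fit(tasks, num_pes):
    """First-fit-decreasing assignment of tasks to identical PEs.

    A PE accepts a task only while its utilization stays <= 1, the EDF
    bound for independent, preemptible, implicit-deadline periodic tasks
    on one processor. Returns per-PE task lists, or None if a task fits nowhere.
    """
    bins = [[] for _ in range(num_pes)]
    loads = [Fraction(0)] * num_pes
    for c, t in sorted(tasks, key=lambda task: Fraction(*task), reverse=True):
        u = Fraction(c, t)
        for pe in range(num_pes):
            if loads[pe] + u <= 1:
                bins[pe].append((c, t))
                loads[pe] += u
                break
        else:
            return None  # no PE has room for this task
    return bins


print(utilization(TASKS))             # 7/4: too much for one PE
print(partition_first_fit(TASKS, 1))  # None
print(partition_first_fit(TASKS, 2))  # [[(6, 10), (2, 5)], [(3, 6), (1, 4)]]
```

Exact `Fraction` arithmetic avoids floating-point rounding at the utilization-equals-1 boundary, which the first PE hits exactly in this example.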
Why MPSoCs?
 Perform real-time computations
 Real-time computing is much more than high-performance computing
 Requires predictable behavior of the hardware

 For predictable and high performance
 Use mechanisms specialized to the needs of the application
 Specialized memory systems, application-specific instructions
Why MPSoCs?
 Be area-efficient
 A special-purpose PE may be much faster and smaller than a programmable processor

 If the system architect can predict some aspects of the memory behavior of the application, it is often possible to reflect those characteristics in the architecture

 Memory specialization / Cache configuration
Why MPSoCs?
 Be energy-efficient
 Many SoCs are power-sensitive, whether due
 To environmental considerations (heat dissipation)
 Or to system requirements (battery power)

 Specialization saves power
 Stripping away features that are unnecessary for the application
Why MPSoCs?
 Provide the proper I/O connections
 The point of an SoC is to provide a complete system

 Specialized I/O
 Because of the variety of physical interfaces, it is difficult to create customizable I/O devices effectively
Challenges
 Software development
 High performance, real time, and low power
 Each MPSoC requires its own software development environment

 Task-level behavior
 Task-level parallelism is both easy to identify in SoC applications and important to exploit
 RTOSs provide scheduling mechanisms, but abstract the process away from the designer

 Networks-on-chip
 Use packet networks to interconnect the processors in the SoC
Challenges
 FPGAs
 FPGA logic can be used for custom logic that could not be designed before manufacturing
 A good complement to software-based customization

 Security
 Many SoCs connect to the Internet
 Security becomes increasingly important

 Networks of chips
 Sensor networks
 Designers do not have total control over the system
Design Methodologies
 Fast design time is very important
 Tight time-to-market and time window constraints

 Higher-level abstractions are needed on both the HW and SW sides

 A key issue is the definition of a good system-level model that is capable of representing all those heterogeneous components along with local and global design constraints and metrics
Design Methodologies
 Design steps
 Design space exploration
 Hardware/software partitioning, selection of architectural platform and components

 Architecture design
 Design of components, hardware/software interface design

 Strict requirements must be considered regarding time-to-market, system performance, power consumption, and production cost…
Hardware Architecture
 Which CPU do you use? What instruction set and cache should be used based on the application characteristics?

 What set of processors do you use? How many processors do you need?

 What interconnect and topology should be used? How much bandwidth is required? What quality-of-service characteristics are required of the network?

 How should the memory system be organized? Where should memory be placed and how much memory should be provided for different tasks?
Hardware Architecture
 Research and industrial MPSoC architectures for high-performance applications
 Philips Nexperia™ DVP
 Texas Instruments OMAP platform
 Xilinx Virtex-II Pro™…

 We can see that these platforms
 Limit the number and types of integrated processor cores
 Provide a fixed or not well-defined memory architecture
 Limit the choice of interconnect networks and available IPs
 Do not support design from a high abstraction level
Software
 Programmer’s viewpoint
 The architecture is parallel, so parallel programming is required

 Two types of parallel programming model
 Shared-memory programming: OpenMP
 Message-passing programming: Message Passing Interface (MPI)

 MPSoC vs. conventional parallel programming
 Application
 Application-specific
 Does not need full-featured parallel programming models
 Architecture
 Heterogeneity
 Massive parallelism
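The two programming models differ mainly in how workers communicate. The Python sketch below uses the standard `threading` and `queue` modules only as stand-ins: it does not use OpenMP or MPI, but it shows the same contrast, a shared variable guarded by a lock versus partial results sent as messages. The helper names are hypothetical.

```python
import queue
import threading


def _chunks(data, num_workers):
    # Round-robin split so every worker gets a share of the data.
    return [data[i::num_workers] for i in range(num_workers)]


def _run_all(target, chunks):
    threads = [threading.Thread(target=target, args=(chunk,)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def shared_memory_sum(data, num_workers=4):
    """Shared-memory style: workers update one shared total under a lock."""
    total = [0]  # shared state visible to every worker
    lock = threading.Lock()

    def worker(chunk):
        partial = sum(chunk)
        with lock:  # explicit synchronization on shared data
            total[0] += partial

    _run_all(worker, _chunks(data, num_workers))
    return total[0]


def message_passing_sum(data, num_workers=4):
    """Message-passing style: workers share no state and send results as messages."""
    mailbox = queue.Queue()  # the only communication channel

    def worker(chunk):
        mailbox.put(sum(chunk))

    _run_all(worker, _chunks(data, num_workers))
    return sum(mailbox.get() for _ in range(num_workers))


data = list(range(1000))
print(shared_memory_sum(data), message_passing_sum(data))  # 499500 499500
```

In the shared-memory version correctness depends on the lock; in the message-passing version it depends on receiving exactly one message per worker.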
Software
 Software architecture and design reuse viewpoint
 Layers: middleware, operating system, hardware abstraction layer (HAL)

 APIs provide an abstraction of the underlying hardware architecture to upper layers of software

 The software architecture may enable several levels of software design reuse

 Key challenges
 Determining which abstraction of the MPSoC architecture is most suitable at each design step
 Determining how to obtain application-specific optimization of the software architecture
Software
 Optimization viewpoint
 Cost and performance requirements

 Two factors
 Processor architecture
 Parallelism
 Application-specific

 Memory hierarchy
 Shared memory
 Distributed memory

 Optimization problems must be reconsidered in a different context, with more design freedom in the hardware architecture and a new focus on energy consumption
Techniques for Designing Energy-Aware MPSoCs
Introduction
 Power and energy consumption have become significant constraints
 Reducing active power: voltage scaling
 Reducing standby power

Reducing Active Energy
 Multiple supply voltages
 Decreasing the supply voltage decreases performance because it increases gate delay
 Effective in MPSoCs because different types of processors require different performance levels
 Dynamic voltage scaling (DVS) combined with dynamic frequency scaling (DFS)
 The most popular of these techniques
 Most embedded and mobile processors contain this feature
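Because dynamic energy per cycle scales roughly as C·Vdd² while the clock frequency only sets how long the task takes, running each processor just fast enough for its deadline, at the lowest voltage that supports that speed, saves energy. The operating-point table and the effective capacitance `c_eff_f` below are hypothetical; a real part publishes its own voltage/frequency pairs.

```python
# Hypothetical (clock frequency in Hz, supply voltage in V) operating points, slowest first.
OPERATING_POINTS = [(200e6, 0.9), (400e6, 1.0), (600e6, 1.2)]


def pick_operating_point(cycles, deadline_s, points=OPERATING_POINTS):
    """Return the slowest (f, Vdd) pair that still finishes `cycles` by the deadline."""
    for freq_hz, vdd_v in points:
        if cycles / freq_hz <= deadline_s:
            return freq_hz, vdd_v
    raise ValueError("deadline cannot be met at any operating point")


def dvs_energy_j(cycles, vdd_v, c_eff_f=1e-9):
    """E = cycles * C_eff * Vdd**2; frequency drops out, only the voltage matters."""
    return cycles * c_eff_f * vdd_v ** 2


cycles = 80_000_000
freq_hz, vdd_v = pick_operating_point(cycles, deadline_s=0.25)
saving = 1 - dvs_energy_j(cycles, vdd_v) / dvs_energy_j(cycles, 1.2)
print(f"{freq_hz / 1e6:.0f} MHz at {vdd_v} V, {saving:.0%} less energy than 600 MHz at 1.2 V")
```

With these numbers the task needs at least 320 MHz, so the sketch picks 400 MHz at 1.0 V; each processor in an MPSoC could make this choice independently for its own workload.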

DVS+DFS
 As long as the supply voltage is increased before the clock rate is increased, the system stalls only while the PLL relocks on the new clock rate
 Each processor in a future MPSoC would require its own voltage converter and PLL
 Requirement: cores must tolerate periodic dropouts
 Complication: the PLL is an analog device, and digital switching induces noise in it
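The ordering rule can be sketched as a small per-core controller: when speeding up, raise the voltage first; when slowing down, lower the clock first. In both cases the core never runs faster than its current voltage supports. The `_set_voltage` and `_set_frequency` hooks are hypothetical stand-ins for a core's own converter and PLL. Following the slide, only the PLL relock is modeled as a stall.

```python
class CoreDvfsController:
    """Orders voltage and clock changes for one core (hardware hooks are stand-ins)."""

    def __init__(self, freq_hz, vdd_v):
        self.freq_hz = freq_hz
        self.vdd_v = vdd_v
        self.log = []  # order of hardware actions, for inspection

    def _set_voltage(self, vdd_v):
        # Stand-in for programming this core's own voltage converter;
        # the core keeps running while the supply ramps.
        self.vdd_v = vdd_v
        self.log.append(("voltage", vdd_v))

    def _set_frequency(self, freq_hz):
        # Stand-in for reprogramming this core's own PLL; the core stalls
        # (a periodic dropout) until the PLL relocks.
        self.log.append(("stall_for_pll_relock", freq_hz))
        self.freq_hz = freq_hz
        self.log.append(("frequency", freq_hz))

    def transition(self, freq_hz, vdd_v):
        if freq_hz > self.freq_hz:
            # Speeding up: raise the voltage first so the faster clock is safe.
            self._set_voltage(vdd_v)
            self._set_frequency(freq_hz)
        elif freq_hz < self.freq_hz:
            # Slowing down: drop the clock first, then the voltage.
            self._set_frequency(freq_hz)
            self._set_voltage(vdd_v)
        else:
            self._set_voltage(vdd_v)  # clock unchanged: no PLL relock needed


core = CoreDvfsController(freq_hz=200e6, vdd_v=0.9)
core.transition(600e6, 1.2)
print(core.log)
# [('voltage', 1.2), ('stall_for_pll_relock', 600000000.0), ('frequency', 600000000.0)]
```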

Reducing Standby Energy
 Increasing the threshold voltage VT decreases subthreshold leakage current (pro) but increases gate delay (con)
 Combining DVS, DFS, and variable VT is an effective approach
 Sleep transistors
 Gate the supply rail
 Switch off the supply to idle components
 System SW can determine the optimal schedule
 It can direct idle cores to switch off
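When system software directs idle cores to switch off, switching the supply off and back on costs energy, so gating only pays off for idle periods longer than a break-even time (transition energy divided by the leakage power saved). This break-even rule is a common way to frame the decision rather than something the slide specifies; the power figures and idle-time forecasts below are hypothetical, and how software predicts idle time is out of scope here.

```python
def break_even_time_s(transition_energy_j, leakage_power_w, gated_leakage_w=0.0):
    """Idle time beyond which gating the supply saves energy."""
    saved_power_w = leakage_power_w - gated_leakage_w
    if saved_power_w <= 0:
        return float("inf")  # gating never pays off
    return transition_energy_j / saved_power_w


def should_gate(predicted_idle_s, transition_energy_j, leakage_power_w, gated_leakage_w=0.0):
    return predicted_idle_s > break_even_time_s(
        transition_energy_j, leakage_power_w, gated_leakage_w
    )


def cores_to_switch_off(predicted_idle_s, **power_params):
    """Return the ids of idle cores whose predicted idle time exceeds break-even."""
    return sorted(
        core for core, idle_s in predicted_idle_s.items()
        if should_gate(idle_s, **power_params)
    )


idle_forecast = {"dsp": 1e-3, "cpu1": 5e-5, "video": 5e-4}  # hypothetical, in seconds
print(cores_to_switch_off(idle_forecast, transition_energy_j=2e-6, leakage_power_w=0.01))
# ['dsp', 'video']
```

Here the break-even time is 200 µs, so the core expected to idle for only 50 µs stays powered.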

Energy-Aware Memory System Design
 Memory constitutes a significant portion of the overall chip resources
 Energy is expended on data accesses, on coherence activity, and as leakage while storing the data

Reducing Active Energy
1. Partitioning large caches into smaller structures
2. Using a memory hierarchy that attempts to capture most accesses in the smallest memory
 Cache way-prediction
 Selective way caches
 Filter cache
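Way prediction saves energy by reading only one predicted way first and probing the other ways only on a misprediction, instead of reading all ways in parallel. The sketch below assumes a most-recently-used (MRU) predictor and LRU replacement, one common scheme. The slide does not specify the predictor, and selective way caches and filter caches are not modeled. The number of ways probed serves as a rough proxy for access energy.

```python
class WayPredictedCache:
    """Set-associative cache model with MRU way prediction and LRU replacement."""

    def __init__(self, num_sets, num_ways, block_bytes):
        self.num_sets, self.num_ways, self.block_bytes = num_sets, num_ways, block_bytes
        self.sets = [[] for _ in range(num_sets)]  # per set: tags, most recent first
        self.ways_probed = 0  # proxy for tag/data array energy
        self.first_probe_hits = 0
        self.accesses = 0

    def access(self, addr):
        block = addr // self.block_bytes
        index, tag = block % self.num_sets, block // self.num_sets
        lines = self.sets[index]
        self.accesses += 1
        if lines and lines[0] == tag:
            # Predicted (MRU) way hits: only one way is read.
            self.ways_probed += 1
            self.first_probe_hits += 1
            return True
        # Misprediction: the predicted way plus the remaining ways are probed.
        self.ways_probed += self.num_ways
        hit = tag in lines
        if hit:
            lines.remove(tag)
        elif len(lines) == self.num_ways:
            lines.pop()  # evict the least recently used tag
        lines.insert(0, tag)  # this tag becomes the MRU prediction
        return hit


cache = WayPredictedCache(num_sets=4, num_ways=4, block_bytes=32)
for _ in range(100):
    cache.access(0)
print(cache.ways_probed, "ways probed vs", cache.accesses * cache.num_ways, "for parallel lookup")
# 103 ways probed vs 400 for parallel lookup
```

Access patterns that alternate between blocks in the same set defeat an MRU predictor, so the savings depend on locality in the application.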

Reducing Standby Energy
 Most of the above techniques do little to reduce leakage energy

 Reducing leakage during idle cycles by turning off the supply voltage
 Gated-Vdd: shut down portions of the cache dynamically
 State-preserving leakage optimization

 Requirement
 Ability to identify unused resources
 Cache size is reduced dynamically to optimize leakage
 Cache blocks are supply-gated
 The tag line is kept active when a cache line is deactivated
 Dynamic voltage scaling
 Drowsy cache
 Leakage-biased bitline
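One simple way to identify unused resources is a per-line idle counter: a line left untouched for a threshold number of cycles is put into a low-leakage mode. The sketch below contrasts the two families above. Gated-Vdd loses the data, but the powered tag lets a later access be recognized as a miss caused by gating. A drowsy line keeps its data and pays only a wake-up. The threshold and the counter scheme are hypothetical illustrations, not the specific mechanisms the slide refers to.

```python
class LeakageManagedLines:
    """Idle-counter policy for putting cache lines into a low-leakage mode.

    mode="gated": supply-gate idle lines (data lost; tags stay powered, so a
    later access to the line is counted as a miss induced by gating).
    mode="drowsy": lower Vdd on idle lines (data kept; waking costs latency).
    """

    def __init__(self, num_lines, idle_threshold, mode="gated"):
        if mode not in ("gated", "drowsy"):
            raise ValueError("mode must be 'gated' or 'drowsy'")
        self.idle_threshold = idle_threshold  # hypothetical tuning parameter
        self.mode = mode
        self.idle = [0] * num_lines
        self.low_power = [False] * num_lines
        self.induced_misses = 0  # extra misses caused by gating
        self.wakeups = 0  # drowsy lines woken on access

    def tick(self):
        """Advance one cycle; lines idle for too long enter low-leakage mode."""
        for i in range(len(self.idle)):
            self.idle[i] += 1
            if self.idle[i] >= self.idle_threshold:
                self.low_power[i] = True

    def access(self, line):
        if self.low_power[line]:
            if self.mode == "gated":
                self.induced_misses += 1  # data was discarded; refetch needed
            else:
                self.wakeups += 1  # data retained; only wake-up latency paid
            self.low_power[line] = False
        self.idle[line] = 0

    def fraction_low_power(self):
        return sum(self.low_power) / len(self.low_power)


lines = LeakageManagedLines(num_lines=4, idle_threshold=3)
for _ in range(10):
    lines.access(0)  # only line 0 is in use
    lines.tick()
print(lines.fraction_low_power())  # 0.75
```

A shorter threshold saves more leakage but causes more induced misses (or wake-ups), which is the trade-off any such policy has to tune.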

Influence of Cache Architecture on Energy Consumption
 Two popular alternatives for building a cache in a multiprocessor
 A single multi-ported cache shared by all processors
 Pros
 Constructive interference can reduce overall miss rates
 Inter-process communication is easy to implement
 Cons
 Consumes significant energy
 Not scalable

 Each processor has its own private cache
 Pros
 Low power per access, low latency, and good scalability
 Cons
 Duplication of data and instructions
 Requires a complex cache coherence protocol

 Combine the advantages of both options
 CCC (crossbar-connected cache)
 The shared cache is divided into multiple banks connected to the processors through an N x M crossbar
 Pros
 Duplication is eliminated (logically a single cache)
 No consistency mechanism is needed
 Scalable
 Useful in reducing energy consumption

 CCC (cont’d)
 Cons
 Concurrent accesses to the same bank cause processor stalls
 Mitigations
 Provide more cache banks than processors
 Deal with concurrent references to the same block
 The energy benefits of CCC
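The bank-conflict problem and both mitigations can be illustrated with a small random-traffic simulation. The sketch assumes low-order block-address interleaving across banks, one access per processor per cycle to a uniformly random block, one block served per bank per cycle, and stalled requests that are counted but not retried. Serving same-block requests together (`combine_same_block`) is a hypothetical reading of "deal with concurrent references to the same block".

```python
import random
from collections import defaultdict


def bank_of(addr, num_banks, block_bytes=32):
    """Low-order block-address interleaving across banks (an assumed mapping)."""
    return (addr // block_bytes) % num_banks


def simulate_ccc(num_procs, num_banks, cycles, working_set_blocks=1024,
                 combine_same_block=True, block_bytes=32, seed=0):
    """Count processor stalls caused by bank conflicts in a crossbar-connected cache."""
    rng = random.Random(seed)
    stalls = 0
    for _ in range(cycles):
        requests = defaultdict(list)  # bank -> blocks requested this cycle
        for _ in range(num_procs):
            block = rng.randrange(working_set_blocks)
            requests[bank_of(block * block_bytes, num_banks, block_bytes)].append(block)
        for blocks in requests.values():
            served = blocks[0]  # each bank serves one block per cycle
            if combine_same_block:
                # Requests for the served block are satisfied together.
                stalls += sum(1 for b in blocks if b != served)
            else:
                stalls += len(blocks) - 1
    return stalls


for banks in (4, 8, 16):
    print(banks, "banks:", simulate_ccc(num_procs=4, num_banks=banks, cycles=10_000), "stalls")
```

With four processors, stall counts fall sharply as the number of banks grows past the number of processors, which is the first mitigation on the slide.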

Reducing Snoop Energy
 In bus-based symmetric multiprocessors, all bus-side cache controllers snoop the bus
 Snoops occur when writes are issued to already-cached blocks and on cache misses
 Unlike normal cache accesses, the tag and data array accesses of a snoop can be separated
 Energy optimizations include
 Use of dedicated tag arrays for snoops
 Serialization of tag and data array accesses
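The effect of both optimizations can be shown by counting array reads at one cache controller. Serializing means the data array is read only after the tag check hits, and snoops often miss in any given cache. A dedicated snoop tag array keeps snoop lookups off the processor-side tag array. The sketch assumes one tag read and at most one data read per snoop; real protocols differ in when data is actually needed, and the snoop trace is hypothetical.

```python
def snoop_array_reads(snooped_blocks, cached_blocks, serialize=True, dedicated_snoop_tags=True):
    """Count array reads caused by bus snoops at one cache controller."""
    counts = {"main_tag": 0, "snoop_tag": 0, "data": 0}
    for block in snooped_blocks:
        # The tag check uses a duplicate snoop tag array if one exists,
        # leaving the processor-side tag array free.
        counts["snoop_tag" if dedicated_snoop_tags else "main_tag"] += 1
        hit = block in cached_blocks
        # Parallel lookup reads the data array on every snoop;
        # serialized lookup reads it only after a tag hit.
        if hit or not serialize:
            counts["data"] += 1
    return counts


print(snoop_array_reads(range(100), set(range(0, 100, 10))))
# {'main_tag': 0, 'snoop_tag': 100, 'data': 10}
```

Serialization trades some snoop latency for fewer data-array reads, which is usually acceptable because snoops are off the processor's critical path.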
