[Figure: multiple DSP cores alongside coprocessors, accelerators, and peripherals]

Embedded vs. General Purpose:
All Multicores Are Not Alike

Alan Gatherer, Chief Technical Officer, Communications Infrastructure Group, Texas Instruments

Why Multicore? Why Now?

Problem:
• Silicon can no longer significantly increase processor performance
  - Power/heat dissipation issues associated with frequency scaling
  - “Instruction Level Parallelism (ILP) Wall”
  - “Memory Wall”

Solution:
• Performance scaling through parallel processing
  - Instead of maximizing processor clock speed, use the additional transistors afforded by Moore’s Law to place multiple cores on a single piece of silicon, with each core processing multiple threads simultaneously.

A problem we have had for a decade
• Many embedded systems are multicore
• System architectures are based on:
  - Minimizing total power
  - Achieving a required level of performance
  - Hot spot power issues: system reliability
• Inter-device comms are peer to peer and packet based
  - Ethernet and RapidIO
• Specialized IO tends to be “circuit switched”
  - TDM, CPRI/OBSAI
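In a circuit-switched TDM link, each channel owns a fixed time slot in every frame, so other traffic cannot delay it and its waiting time is bounded by the frame length. The Python sketch below illustrates that bound under stated assumptions: the one-slot-per-channel assignment, the function name `tdm_wait_slots`, and the 32-slot frame are hypothetical choices for illustration, not details from the slide.

```python
def tdm_wait_slots(channel: int, current_slot: int, slots_per_frame: int) -> int:
    """Slot times a channel waits until its reserved TDM slot comes around.

    Hypothetical model: channel k owns slot k in every frame (a fixed,
    circuit-switched assignment), so no other channel can delay it.
    """
    if not 0 <= channel < slots_per_frame:
        raise ValueError("channel has no slot in this frame")
    # Python's % always returns a non-negative result here, so a slot that has
    # already passed in this frame wraps around to the next frame.
    return (channel - current_slot) % slots_per_frame


SLOTS = 32  # hypothetical frame size
# Worst case over every possible arrival slot: one frame minus one slot.
worst = max(tdm_wait_slots(5, s, SLOTS) for s in range(SLOTS))
print(worst)  # 31
```

A shared packet link, by contrast, gives no comparable per-channel guarantee by construction.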

TI’s 5561/3010: in production since 2003

[Figure: block diagram of C55x DSP subsystems, each with local DARAM/SARAM, cache, and peripherals, plus shared SARAM, a global DMA, and shared peripheral interfaces]

• 6 DSP subsystems
  - C55x CPU @ 300 MHz
• Local and shared memory
  - 24 Mb total memory
• Communications subsystem
  - Global DMA
  - Shared peripheral interfaces: UTOPIA, PCI/HPI, McBSP, 512 HDLC
• Power dissipation
  - 1.0 W typical
  - 2 W max at 85°C
• Developed specifically for voice applications

Solution depends on the problem

Throughput limited:
• Packet processing, server farms, batch compute processing
• This is the space where the multiprocessing community has classically focused
• IP to IP in MGW and SBC (with a more generous latency constraint)

Latency limited:
• L1 modem technology
• MAC, L2
• Vocoding
• Requires precise prediction of the time to completion of a task
• Task completion time uncertainty leads to low CPU utilization
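One way to see why task completion time uncertainty leads to low CPU utilization: a latency-limited design must reserve enough time for the worst-case completion of every task, so the core sits idle whenever a task finishes early. The Python sketch below assumes a simple hypothetical model (each task gets a fixed slot sized to the worst observed time) and uses made-up task times; the function name is illustrative only.

```python
def worst_case_slot_utilization(task_times):
    """Fraction of reserved time actually spent computing when every task is
    given a slot sized for the worst-case completion time (hypothetical model)."""
    if not task_times:
        raise ValueError("need at least one task time")
    worst = max(task_times)
    # Total useful work divided by total time reserved at the worst case.
    return sum(task_times) / (len(task_times) * worst)


# Hypothetical task completion times in microseconds; both sets share the
# same worst case, but the second varies more from task to task.
predictable = [100, 100, 100, 100]
uncertain = [40, 60, 100, 50]

print(worst_case_slot_utilization(predictable))  # 1.0
print(worst_case_slot_utilization(uncertain))    # 0.625
```

A throughput-limited system can instead start the next job as soon as the previous one finishes, so the same variation costs it little utilization.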

if Metcalfe’s law applies  (n-1)! is a possibility .The Old Engineer has a point… Why have more cores than you absolutely have to?  More threads create more…  Synchronization issues  Deadlock issues  Race Conditions  Really hard debug problems  Complexity will increase as some function of the number of cores (n) that need to coordinate actions or share resources  n2.

But maybe smaller is better  Why use 16 cores when you can use 32!!  Is there a reason to go to more cores than are dictated by ILP and system power requirements?  Some problems have parallelism that is most easily expressed  by multiple cores Higher clock speeds at some point lead to higher total system power  Maybe (reasons stolen from Berkeley white paper)    Finer gained ability to do voltage and clock scaling Redundancy (but maybe you only need this if you have a lot of cores in the first place) Simpler basic unit for HW verification (but maybe you moved the ver problem to SW) .

So what is small and what is simple?
• Again borrowing from the Berkeley paper…
  - 5-9 pipeline stages
  - Stay away from out-of-order execution
  - Stay away from branch prediction
• TI has taken this philosophy with the C6x
  - Exposed pipeline
  - Minimize control logic overhead in favor of useful processing units
  - As simple a memory subsystem as possible

Virtualization, abstraction vs. debug
• The really difficult debug problems are related to temporal dependencies between cores
  - Visibility of transactions and their order is critical
  - We use logic analyzers for this…
  - The amount of data to trace can be enormous
• Abstraction makes programming easier
  - Better code reuse, common APIs
  - Only ask for what you want = more portable code
• Virtualization provides a common platform across devices
• But what do I debug?
  - The programmer might not recognize the machine-level program
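Seeing the order of transactions usually means merging per-core trace buffers into one global timeline. The Python sketch below assumes a hypothetical trace record (timestamp, core, event) and, importantly, that all cores stamp events from one shared, synchronized clock; without that, the merged order cannot be trusted. The event names and times are made up for illustration.

```python
import heapq
from typing import NamedTuple


class TraceEvent(NamedTuple):
    timestamp: int  # cycles on a clock shared by all cores (assumption)
    core: int
    event: str


def merge_traces(per_core_traces):
    """Merge per-core trace buffers, each already in time order, into one
    global timeline; ties on timestamp are broken by core number."""
    return list(heapq.merge(*per_core_traces, key=lambda e: (e.timestamp, e.core)))


# Hypothetical traces from two cores sharing a buffer.
core0 = [TraceEvent(10, 0, "write buf[0]"), TraceEvent(40, 0, "signal ready")]
core1 = [TraceEvent(25, 1, "read buf[0]"), TraceEvent(45, 1, "wait ready")]

for e in merge_traces([core0, core1]):
    print(e.timestamp, e.core, e.event)
# 10 0 write buf[0]
# 25 1 read buf[0]
# 40 0 signal ready
# 45 1 wait ready
```

In this made-up trace, the merged timeline shows core 1 reading buf[0] at t=25, before core 0 signals ready at t=40. Viewed alone, neither core’s trace reveals that ordering problem.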

Summary
• Multicore is great
  - And it’s been here for a while!
• Don’t get carried away
  - We must still develop reliable real-time systems
• Multicore is harder than single core
  - So we need to find better ways to do it
• Much of the problem is in core interaction
  - Worry about both what AND when…
• Virtualization/abstraction and debug seem at odds
  - A great topic for research