Riding The Next Wave Of Embedded Multicore Processors

Maximizing CPU Performance in a Power-Constrained World
G. Balaji

The Issues
• The variety of methods for increasing processor performance
• What commercial end customers are demanding
• What the scaling factors for processors are
• How industry-standard PCs and servers can stay on the performance curve in the coming days

What is a Processor?
• A single chip package that fits in a socket (a bundle of many transistors)
• Contains one or more cores
• Cores can have functional units, cache, etc. associated with them (processing elements)
• Cores can be fast or slow

Number of signal pins doesn’t scale with number of cores.

Why We Need an Alternate Solution
• With each microprocessor generation, the goal has been to make it run as fast as possible.
• Current processors require thousands of meters of microscopic “wire”, which causes path delays and synchronization difficulties.

• Transistor leakage current adds to the heat produced.
• Each generation dissipates more heat as clock speed increases.
• Transistors consume more power and produce more heat.
• Internal path delays are becoming unworkable.

Transistor counts double every 18 months

Transistors Are Not Free
• The number of transistors in a core determines basic power consumption
• Efficiency matters when designing new cores
• More functional units mean more transistors
• Deeper pipelines mean more transistors
• Larger caches mean more transistors

Solution
• Dedicated hardware accelerators.
• Reducing a processor’s frequency and voltage reduces its overall power requirements.
• Semiconductor manufacturers have therefore moved to building cores that run at somewhat lower frequencies and voltages, and to integrating two or more of these processing cores on a single chip.
• This is called a MULTI-CORE PROCESSOR.
• Multiprocessor and multicore systems are the future.
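The power argument above can be made concrete with the standard first-order model of CMOS dynamic power, P ≈ C × V² × f. The sketch below is illustrative only: the 20% reduction in voltage and frequency, the two-core configuration and the function names are assumptions chosen for the example, not figures from this presentation, and leakage power is ignored.

```python
def relative_dynamic_power(voltage_scale, frequency_scale, cores=1):
    """Dynamic power relative to one baseline core, using P ~ C * V^2 * f.

    Switched capacitance per core is assumed unchanged (an assumption).
    """
    return cores * voltage_scale ** 2 * frequency_scale


def relative_peak_throughput(frequency_scale, cores=1):
    """Ideal throughput relative to one baseline core.

    Assumes the workload parallelizes perfectly across cores, which
    real software rarely achieves.
    """
    return cores * frequency_scale


if __name__ == "__main__":
    # Hypothetical design point: two cores, each at 80% voltage and 80% clock.
    power = relative_dynamic_power(0.8, 0.8, cores=2)
    throughput = relative_peak_throughput(0.8, cores=2)
    print(f"relative power:      {power:.2f}")
    print(f"relative throughput: {throughput:.2f}")
```

Under these assumptions the dual-core part draws about the same dynamic power as the single fast core while offering up to 1.6 times the throughput, provided the software can actually use both cores.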



Industry Needs
• Aerospace and defense (A&D) embedded computer systems strive to meet the constantly increasing demand for more processing power in compute-intensive applications such as image and radar processing.
• They must simultaneously address constraints on size, weight and power (SWAP).
• Heat is a problem when dealing with today’s most advanced processors.
• Multicore processors enable designers to add more processing power per slot without the burden of additional heat dissipation or power consumption.

Multi-Core Processors
• A processor that combines two or more independent processors into a single package, or
• Multiple cores linked together that work in parallel on the same chip
• Performance increases
• Scalability


Multi-Core Processor Architecture

Types of Multicore


• Heterogeneous

Specialization among processors. Often different instruction sets.

• Homogeneous

Processors have the same instruction set and can run any task.

Three Architectures
• Symmetric multiprocessing (SMP)
• Distributed processing (DP)
• Asymmetric multiprocessing (AMP)

Symmetric Multi-processing (SMP)
What is symmetric multiprocessing?
• Each node may have two or more processors, and memory is global to all processors.
• The processors may have both local cache and shared cache, and the cache is coherent between all processors and memory.
• A single O/S is used to control all the nodes.
• SMP is especially well suited to the real-time processing used in radar, image processing and other military applications.
• Such applications often prefer large global memories that can be accessed at higher data rates.

The advantages of SMP include a large global memory, accessible to all of the processor cores, and better performance per watt, which is important for SWAP (size, weight and power). The disadvantages of SMP are that the memory latency and bandwidth of a given node can be affected by other nodes, and that cache “thrashing” may occur in some applications.

Asymmetric multiprocessing (AMP)
What is AMP?
• Application tasks are sent to the system’s separate processors.
• Each processor is essentially a separate computing system, with its own OS and its own memory partition within the common global memory.
• One advantage of an AMP design is that asymmetric memory partitions can be assigned from one large global memory, making more efficient use of memory resources.

• Independent copies of the O/S can run on each node.
• AMP offers superior node-to-node communication compared to the other architectures.
• Memory latency and bandwidth can be affected by other nodes, and cache “thrashing” may occur in some applications.

Distributed processing (DP)
• Distributed processing is based on independent nodes.
• Each node has its own processor and memory, and the nodes communicate over buses.
• Separate copies of the operating system run on each of the nodes.
• Advantages of a DP approach include predictable performance and higher memory bandwidth, since memory is not shared.


Few Facts

The Software Becomes the Problem
• Parallelism is required to gain performance.
– Parallel hardware is “easy” to design.
– Parallel software is (very) hard to write.
• True concurrency is fundamentally hard to grasp
– Especially in complex software environments.
• Existing software assumes a single processor
– Might break in new and interesting ways.
– Multitasking software is not guaranteed to run correctly on a multiprocessor.

Coding Approach
• Many different programming languages, tools, methodologies and styles are available.
• The choice of programming model can have a huge impact on performance and ease of programming.

Fine-grained parallelism
• Can be done incrementally (one loop at a time)
• Does not require deep knowledge of the code
• Compiler-assisted is best; tedious if done by hand
• Large loops have to be parallelized for speedup to occur
• Potentially many synchronization points

Coarse-grained parallelism (task level)
• Make loops parallel at a higher level of the call tree
• More code is parallel at once
• Fewer synchronization points
• Requires deeper knowledge of the code
• May lead to load imbalance
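The difference between the loop-level and task-level approaches can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a recommended implementation: the "image" data, the function names and the `sync_points` counter are hypothetical, and Python threads are used only to show the structure (in standard CPython they do not speed up CPU-bound loops).

```python
from concurrent.futures import ThreadPoolExecutor


def brighten_row(row):
    """Innermost unit of work: one row of pixel values."""
    return [min(255, pixel + 10) for pixel in row]


def brighten_image(image):
    """Task-level unit of work: one whole image, processed serially."""
    return [brighten_row(row) for row in image]


def fine_grained(images, workers=4):
    """Parallelize only the inner loop: one join (sync point) per image."""
    results, sync_points = [], 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for image in images:
            # list() waits until every row of this image is finished.
            results.append(list(pool.map(brighten_row, image)))
            sync_points += 1
    return results, sync_points


def coarse_grained(images, workers=4):
    """Parallelize the outer loop: whole images per worker, one join."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(brighten_image, images))
    return results, 1


if __name__ == "__main__":
    # Five small hypothetical images of 3 rows x 4 pixels each.
    images = [[[i + j + k for k in range(4)] for j in range(3)]
              for i in range(5)]
    fine, fine_syncs = fine_grained(images)
    coarse, coarse_syncs = coarse_grained(images)
    print("same result:", fine == coarse)
    print("sync points (fine, coarse):", fine_syncs, coarse_syncs)
```

The fine-grained version touches only the inner loop, so it can be introduced incrementally, but it synchronizes once per image. The coarse-grained version synchronizes once in total, at the cost of needing to know that whole images are independent, and it can leave workers idle if the images differ greatly in size.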

Software Back locks
Disabling Interrupts is not Locking
• Single processor: DI (disabling interrupts) = cannot be interrupted
– Guaranteed exclusive access to the whole machine
– Cheap mechanism, used in many drivers & kernels
• Multiprocessor: DI = stop interrupts on one core
– Other cores keep running
– Shared data can be modified from the outside

Race Condition
• Tasks “race” to a common point
– The result depends on who gets there first
– Occurs due to insufficient synchronization
• Present with regular multitasking, but much more severe in multiprocessing
• Solution: protect all shared data with locks, and synchronize to ensure the expected order of events
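A small sketch of the lock-based remedy described above. Python threads stand in for cores here, which is an assumption made for illustration; the class and function names are hypothetical. The unprotected version performs a separate read and write, so another thread can slip in between them and an update can be lost. The locked version makes the read-modify-write a single protected step.

```python
import threading


class SharedCounter:
    """Shared data touched by several threads at once."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment_unsafe(self):
        # Read-modify-write with no protection: another thread may run
        # between the read and the write, and its update is then lost.
        current = self.value
        self.value = current + 1

    def increment_locked(self):
        # The lock guarantees that only one thread at a time executes
        # the read-modify-write sequence.
        with self._lock:
            self.value += 1


def run_workers(method_name, threads=4, iterations=10000):
    """Run `threads` workers that each call the named method `iterations` times."""
    counter = SharedCounter()

    def work():
        step = getattr(counter, method_name)
        for _ in range(iterations):
            step()

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return counter.value


if __name__ == "__main__":
    print("locked:", run_workers("increment_locked"))
    # The unsafe total may or may not fall short on a given run;
    # that unpredictability is exactly what makes races hard to debug.
    print("unsafe:", run_workers("increment_unsafe"))
```

The locked total always equals threads × iterations. The unsafe total can never exceed that figure, but nothing guarantees it will reach it.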

Debugging Process
• Debuggers are the tools that give visibility into the inner workings of an application.
• Debugging a multicore target throws a whole new wrench into the works: how does one control each core, with one debugger connected to all the cores or with a separate debugger for each core?
• JTAG-based connection devices allow “on-chip debugging.” They let the IDE interact with the target and provide services such as remotely starting, stopping or suspending program execution (setting a breakpoint), and viewing memory and register contents as well as I/O and peripheral devices.

• Multiple debuggers can each be assigned to an individual core and can send debug service control packets to that core without impacting the other cores.
• JTAG is a communication mechanism used to control an embedded processor; by itself it has nothing directly to do with debugging. The cores themselves must contain debug logic that controls the core.
• The “Nexus 5001 Forum” is an industry group that has advanced a new IEEE standard (IEEE-ISTO 5001) that defines just such a debug logic block to support embedded development.

Features for Industries
• From medical imaging to military and aerospace, there is a multi-core CPU-based SBC that can provide the system developer with increased processing power.
• In automotive applications, an important benefit of the multi-core approach is that it allows redundancy in critical applications. For example, safety monitors can readily be established, in which one core monitors the other.

Hardware is leading the move
– Parallelism is a major paradigm shift for software
– Software and software tools are racing to catch up
– Education and training need to be updated
– Programmers need to relearn programming
To manage the software, we need:
– New programming paradigms
For modified programming languages, we need:
– New debug and analysis techniques

Finally, we are going to see many interesting hardware-software combinations.


