Paper – 3 Domain Specific Processors in Future SoC

ABSTRACT
Systems on Chip have inherent benefits for product development: they reduce time to market and provide cost effective, improved designs. With the ever changing landscape of computing today, new applications are evolving by the day. For most such applications, heterogeneous multicore architectures provide low power, high performance solutions. To design specialized processors that deliver high performance at low cost, it is necessary to categorize these applications into specific types, envisioned here as domains. This paper predicts and validates the domain specific processors to be used in future computers.

Keywords
Domain-Specific Processors, Multimedia Processors, Probability Processors, Mission Critical Processors, Automotive Processors.

1. INTRODUCTION

Integrating a complete electronic system on a chip has inherent benefits for system manufacturers. It helps reduce the design cycle cost by reducing the area and power consumption of the whole system. Various lower level issues of systems assembled on a PCB, such as signal integrity, electromagnetic compatibility, and component and module interfacing, are obliterated by integrating the whole system inside a chip. System reliability increases manyfold as a result.

The present trend in computer applications is towards enhanced interaction with the physical environment to provide a natural user interface. The processing overhead of such applications and interfaces is fundamentally different from the traditional data processing needs of yesteryear, which were optimally handled by general purpose processors (GPP). Huge sets of data collected by a gamut of sensors need to be conditioned and processed simultaneously with high energy efficiency. A highly intelligent processor that handles such huge data sets with high efficiency is very common in nature: the human brain. A takeaway from analyzing the working of the human brain is that the processing of everything we humans hear, see, taste, smell and think, as well as our number-crunching, is done by various specialized locations in the brain, which vary in their size, properties and functioning. Taking a cue from this, we can argue that the processors of future devices, which have to support natural user interfaces, need non-uniform types of specialized data processors for the individual kinds of sensing that the system has to support.

The most pressing problem is to decide the granularity with which the SoC needs to be designed: whether as a processor or as an Application Specific IC (ASIC). The processor approach renders the system highly programmable and low cost, but it is one of the slowest on the performance scale. On the other hand, an ASIC is non-programmable hardware and costlier than a processor, but it has the best performance among all hardware configurations, as shown in Figure 1. In my view, future processors need to be programmable as well as energy efficient and have to rank very high on the performance scale. Thus, we have to take an approach that is a mix of both.

Figure 1 - The Hardware-Software co-design space.

To achieve this, the applications that the future computing infrastructure may run need to be identified and classified according to their computational needs, and a specialized processor needs to be designed for each need. There is a gamut of processes, such as signal processing, that need a huge amount of parallel processing and frequent access to memory; they justify the ASIC plus HDL side of the spectrum in Figure 1 due to the sheer necessity of high performance. On the other hand, a system has to be reconfigurable to keep it low cost, which demands programmability and justifies the RISC plus C side of Figure 1. In this paper we look into this in more detail. Section 2 describes the need to classify the domain specific processors and Section 3 provides a detailed discussion of each domain. Finally, a discussion is presented on the need for sub-domain and super-domain division.

2. PROBLEM STATEMENT

In the systems on chip used in personal computing devices, there is already a partition in the workload based on the fundamental difference in the nature of the data processing needed, namely CPU and GPU. This combination of CPU and GPU has taken the computing market by storm because of its enhanced graphics performance and the resulting increase in overall system performance. As depicted by Nvidia, the trend shows considerable growth prospects for this CPU plus GPU system on chip architecture. The curve shows that somewhere near 2013-14 we will have a console level gaming experience on our smartphones. Indeed, GPU performance is scaling up. We also have to address the fact that most of the signal conditioning and number crunching tasks are done by the multicore CPU, and a performance hit is emerging there. Present day and future applications such as natural user interfaces will be highly data intensive: a lot of sensory data will be incident on the CPU and will need to be conditioned and processed simultaneously.

We need to address this by splitting the CPU again with respect to its data processing needs, as was done earlier in the case of the GPU. Identifying the classes of data incident on the processors is thus necessary to be able to classify the types of future SoCs.

3. PROPOSED SOLUTIONS

To start identifying the types of data processing requirements of future computers, we need to decide on the types and scenarios in which the computing hardware may be used. In other words, we need to identify the types of possible applications that can run on these systems on chips. This paper proposes the following classification of the processors for future systems: General Purpose Processors, Graphics and Physics Processors, Digital Signal Processors, Multimedia Processors, Probability Processors, Networking Processors, Mission Critical Processors, and Automotive Control Processors. We will discuss each of these domains along with a few possible commercial architectures in these categories and their salient features.

3.1 General Purpose Processors

These are the most essential parts of today's systems on chips. A computer without a processor makes no sense. A CPU in the archaic sense means a GPP. From ARM mobile cores to the Intel Pentium series, the Intel Core iX series, AMD and MIPS, all fall under this category. Re-programmability and ease of reconfiguration are the strong points of these processors. Until recently they used to be single core processors, but due to power limitations the frequency scaling of single core processors reached its practical limits, and now the trend is moving towards homogeneous multicore processors.

3.2 Graphics and Physics Processors

With a booming gaming industry and high resolution natural user interfaces predicted for the near future, graphics processors are an essential component of today's computing systems. In fact, almost all computers today come with a GPP plus GPU unit in one chip. Figure 2 shows the projected growth of graphics processor performance. There is enough evidence that graphics processors have firmly set foot in the processor market. The mammoth parallel processing capability they possess encourages a few other types of data and signal processing operations to be handled by them too.

Figure 2 - Graphics performance trend plot by Nvidia.

Physics processing units (PPU), or physics engines in gaming systems, are new additions to these GPUs. These processors perform calculations about the physical world, obeying advanced laws of physics, simply and fast. Calculations such as rigid and soft body dynamics, collision detection, fluid dynamics, hair and cloth simulation, and finite element analysis, to name a few, are highly computationally intensive tasks and become highly power consuming on a GPP. Quite a few PPUs have been marketed, such as the FPGA based SPARTA and HELLAS from Penn State University and Georgia Tech and the ASIC based PhysX from Ageia, which was acquired by Nvidia. Though to date these have mostly been used by the gaming industry, they may find new and huge applications in natural user interfaces and various educational animations.

3.3 Digital Signal Processors

Present day applications, ranging from voice processing on any phone, voice-band modems and facsimile machines to medical imaging devices, all require these processors at their core. These are modified architectures specialized to run various signal processing algorithms such as FFT, FIR, fast data sampling and various other similar tasks for the above mentioned applications. The architectural requirements for these processors are fast floating-point multipliers and shifters, fast and large memory access capabilities, zero overhead looping, special addressing such as modulo addressing, and large instruction word issue capabilities. These specific requirements cannot be fulfilled in a GPP. For example, a floating point multiplication on a GPP takes around 8 times longer than an addition. The power budget for running DSP algorithms on a GPP is also very high, and for most of the computing infrastructure the power metric is a dominant determinant of success.

As shown in Figure 3c, a simplified diagram of the Analog Devices SHARC DSP, there are dedicated multipliers, instruction and data caches, and also some features omitted from the diagram, such as an 80-bit accumulator built into the multiplier to reduce the round off errors associated with multiple fixed point multiplications. Another highly useful feature is the use of shadow registers for all key registers, which help in fast context switching, that is, the ability to handle interrupts quickly. This processor can run an efficient FIR filter with 100 coefficients within 105-110 cycles, as opposed to the many thousands of cycles required by a GPP. Such latency defeats the purpose when large amounts of data must be processed to complete the tasks in near real time.

Figure 3c - Simplified diagram of the Analog Devices SHARC DSP.

3.4 Multimedia Processors

The introduction of digital audio and video into the mainstream was the starting point for multimedia processors.
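To make the DSP features named in Section 3.3 more concrete, the following is a minimal software sketch of an FIR filter that keeps its delay line in a circular buffer, which is the access pattern that hardware modulo addressing and a multiply-accumulate unit are meant to accelerate. The class name `FirFilter` and its method `step` are hypothetical names chosen for illustration; this is not code for any particular DSP.

```python
class FirFilter:
    """Direct-form FIR filter with a circular (modulo-addressed) delay line."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)          # filter taps c[0..N-1]
        self.n = len(self.coeffs)
        self.delay = [0.0] * self.n         # past input samples
        self.head = 0                       # index of the newest sample

    def step(self, sample):
        # Modulo addressing: step the write pointer backwards and wrap,
        # so the newest sample overwrites the oldest one in place.
        self.head = (self.head - 1) % self.n
        self.delay[self.head] = sample

        # One multiply-accumulate (MAC) per tap: y[n] = sum_k c[k] * x[n-k].
        acc = 0.0
        for k in range(self.n):
            acc += self.coeffs[k] * self.delay[(self.head + k) % self.n]
        return acc


if __name__ == "__main__":
    # 4-tap moving average as a small, easy-to-check example.
    fir = FirFilter([0.25, 0.25, 0.25, 0.25])
    print([fir.step(x) for x in [4.0, 8.0, 12.0, 16.0, 20.0]])
```

With 100 coefficients the inner loop performs 100 multiply-accumulates per output sample. That is consistent with the 105-110 cycle figure quoted above if the DSP retires about one multiply-accumulate per cycle with no loop or addressing overhead, although that reading is an inference rather than something the text states.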

Digital audio and video require a tremendous amount of information bandwidth. To put things into perspective, high-definition television (HDTV, 1920x1080 pixels) is expected to be compressed into 20 Mbit/s, while an H.263 video-phone terminal using sub-quarter common intermediate format (128x96 pixels) at 7.5 frames/s is expected to need 10-20 Kbit/s. Real time audio and video compression requires a lot of processing capability. With applications such as video conferencing, MPEG video compression, mobile video streaming and voice and data over the internet being scaled up to high-definition TV, the need for a number-crunching mammoth like a multimedia processor is only reinforced. The increasing market trend of mobile video users, shown in Figure 3d, supports this statement.

Figure 3d - Mobile video users trend.

There are various CPU and DSP architectures for such multimedia processing, but most of them use a lot of power because they operate at higher frequencies. Some microprocessors incorporate enhancements in their instruction set to accommodate media functionality: a long-word ALU is broken into several short-word ALUs, which can process several short-word data items in a single instruction. This approach was used first in graphics processors, then in RISC processors in servers to process MPEG video, and later in PC media accelerators too. Architectural changes have also occurred, such as the accommodation of very long instruction word (VLIW) controls. But this enhancement makes programming the processor highly complicated. However, efficiency can be achieved by tuning the program in assembly language, similar to DSP programming. Present day DSP chips do have such processing capabilities, but they have MPEG conversion as a built-in hardware function in addition to their basic architecture. An alternative is to use the graphics accelerators or GPUs that are an inevitable part of today's computers, but for this option to work these GPUs have to lose their specificity towards graphics related tasks and assume a more general purpose (GPGPU) role, which is not desirable.

Multimedia processing done on a GPP consumes more power due to operation at a higher frequency and lower parallelism. A DSP based multimedia processor is a highly parallel architecture running at a very low frequency and thus consuming roughly 10x less power. Philips Nexperia and ST Nomadik are two multimedia processors used for digital video entertainment platforms and mobile multimedia applications. The block diagram of the Philips Nexperia is shown in Figure 3e. The MIPS core runs the operating system, and the hardware accelerator block contains modules such as a 2-D rendering engine, MPEG-2 video decoder, image composition processor, video input processor, scaler, and system processor.

Figure 3e - Architecture diagram of the Philips Nexperia.

Newer natural user interfaces are being developed and will be introduced in the near future. These interfaces incorporate a bunch of sensors that provide all kinds of raw data: touch sensors, accelerometers, gyroscopes, pressure sensors, bend sensors, IR, Bluetooth and, in the future, mobile medical imaging and diagnosis sensors. The nature of the processing needed for data from these sensors is similar to that of media compression algorithms.

3.5 Probability Processors

A probability processor is fundamentally different from traditional electronics right from its inception: it performs calculations using probability instead of binary logic, and it holds promise for banking calculations and software, flash memory in smart phones and error correction algorithms. Lyric Semiconductor (www.lyricsemiconductor.com) is a startup that is making these probability processors, which compute on chances. The building blocks of probability processors are called "Bayesian NAND" gates, to put them in direct relation to Bayesian probability. The output of a Bayesian NAND gate represents the chances that the two input probabilities match. This makes it possible to perform calculations that use probabilities as their input and output.

Applications that depend solely on probability calculations include Amazon's product recommendations, fraud checks on credit cards and e-mail spam filters, to name a few. Implementing the math for such applications is simpler on probability processors than on conventional logic processors, and hence these smaller and more energy efficient processors produce outputs faster.

Probability processors are finding the most use in companies producing flash memories. Conventional flash memories store data by storing charge on semiconductor surfaces. The difference between a '1' and a '0' is roughly 100 electrons, which differs a lot from chip to chip, and about 1 in 1000 bits gets corrupted. Due to the miniaturization of semiconductors, this error rate will only increase with time. Error checking chips can correct these errors by generating unique codes each time something is written onto the flash; the checksum can be used to detect any unintentional flip in the data and also to correct it. This requires the kind of statistical calculation that is difficult to implement in digital logic but ideal for probability processors, which implement statistical calculations in a simpler and more energy efficient way.

The data structures and programming languages for such processors are represented as chains, trees or grids. Such systems do not find a solution directly but set constraints and let the system solve the problem using these constraints. Microsoft is trying to develop a probability coding language called Infer.net, and Google is using a language called R. DARPA is looking for potential applications that suit such processors, where the information states are not clear, such as distorted radio signals or machine vision systems. The technology is still in its cradle and Lyric is finding it difficult to prove the reliability of its technology. Even so, this processor has the potential to usher in a new domain and class of processors.
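As a point of comparison for the flash error checking described above, the sketch below shows a conventional digital error-correcting code that detects and repairs a single flipped bit. It uses a Hamming(7,4) code purely because it is short; the text does not say which codes flash controllers or Lyric's chips use, and this is not a probability-processor algorithm. The function names `hamming74_encode` and `hamming74_decode` are hypothetical.

```python
def hamming74_encode(nibble):
    """Encode a 4-bit value (0..15) as a 7-bit Hamming codeword.

    The codeword is a list of bits for positions 1..7, laid out as
    [p1, p2, d0, p4, d1, d2, d3], where p* are parity bits.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]   # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]   # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]   # covers positions 4, 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]


def hamming74_decode(codeword):
    """Return (nibble, syndrome) after correcting at most one flipped bit.

    A syndrome of 0 means no error was seen; otherwise it is the 1-based
    position of the bit that was flipped back.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 | (s2 << 1) | (s4 << 2)
    if syndrome:
        c[syndrome - 1] ^= 1          # repair the corrupted bit
    nibble = c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)
    return nibble, syndrome


if __name__ == "__main__":
    stored = hamming74_encode(0b1011)
    stored[4] ^= 1                    # simulate one corrupted flash cell
    print(hamming74_decode(stored))
```

A code like this makes a hard decision on every bit before correcting it. The argument in the text is that working directly with the probability that a cell holds a '1' or a '0' is a better fit for probability processors than for digital logic.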

3.6 Network Processors

Today, network is synonymous with computers. Devices that are not networked do not serve the purpose of having them. With core backhaul networks running at 40 Gbps and being scaled to 80 Gbps or higher, and edge networks emerging at up to 1 Gbps, networked devices have not only increased in number but also demand higher data rates. This certainly puts a huge packet processing load on the core backhaul processor.

In the earlier days, when data rates were not as demanding, packet processing was implemented in software running on a GPP. With the huge demand on data rates and traffic volume, the clock rates of GPPs have fallen far behind what is needed for faster and bulkier packet processing. Also, more complex networking protocols, QoS and security aspects are being added. The programmability of GPPs is not suited to packet processing. Until recently, ASICs served this space, which meant loss of programmability and longer time-to-market, rendering the networks rigid and stagnant. But with fast evolving protocols and features, it is essential to have a flexible network that can adapt to these changes as quickly as possible. FPGAs are a very good alternative, eliminating almost all the shortcomings of ASICs, but they fall behind on power consumption and cost. Due to these shortcomings of ASICs and FPGAs, a new kind of processor with complete re-programmability is needed.

Network processors (NP) employ multiple programmable processing engines (PPE) within a single processing device. Among the various flavors of NPs with different architectures, the commonality lies in the fact that they all employ PPEs. Some NP manufacturers use a RISC based instruction set with multithreading, while others use a VLIW based architecture. Some of these architectures incorporate hardware co-processors to perform common networking tasks that do not require programming, such as CRC calculations, making the device more efficient. Typically a network processor supports pattern matching to determine the type of packet, lookup of the destination IP, data manipulation such as recalculating CRC checks, packet segmentation or encryption, and queue management. Figure 3f shows a generic network processor architecture with several PPEs, separate lookup engines and a switch fabric interface.

Figure 3f - Generic NP architecture.

Intel's IXP2850 is a heterogeneous multiprocessor NP chip. It has an array of 16 multi-threaded micro-engines to handle packets, an XScale processor to handle control, 2 crypto engines, multiple ports for packet ingress/egress and PCIe ports. Intel provides an SDK for software simulation and real time debugging.

3.7 Mission Critical Processors

Military and space applications need a new domain of processors that take care of their special time critical and safety critical applications. With defense and aviation budgets swelling each year worldwide, this area is definitely a thrust for future processor architecture development. The Multi-Mission Mobile Processor (M3P) is a ground based SBIRS (Space Based Infrared System) system that is designed to provide direct downlink of the SBIRS constellation (with HEO/GEO launches) once it has been fully implemented. These are the next generation of missile detection systems and are still in the design phase. The same US Army report suggests the use of advanced architectures for intelligence, surveillance and reconnaissance activities too. These require processing power that surpasses the capabilities of the most advanced GPPs.

DDC-I (www.ddci.com), a company working in the safety critical application domain, holds the view that heterogeneous multicore processors are integral to the future of avionics and aerospace applications. The company is focused on enhancing a product to safely schedule multiple cores and ensure that one core hitting the cache or other resources does not unnecessarily degrade performance and throw off the timing of another application running on another core. Cavium (www.cavium.com) is another company producing heterogeneous multicore processors; these enable next-generation routing and switching capability in tactical environments where data has to be securely maintained and transmitted among airborne, seaborne and terrestrial assets, helping real-time critical interoperability of military communication devices. These are novel domains and a concrete working architecture is yet to evolve, but the need to address this new class of data processing, which GPPs are unable to address, is quite evident from the above examples.

3.8 Automotive Control Processors

Today's automobiles are complex digital networks of embedded processors controlling and monitoring virtually every aspect of the automobile. Safety critical systems such as engine control, automatic braking and air-bag control require processors with extreme reliability. Other non-critical tasks such as in-cabin navigation and infotainment also rely heavily on signal processing automotive processors.

Automotive on-chip integration is quite different from other computing requirements. An engine control system receives dozens of analog signals from sensors placed strategically throughout the chassis that sense throttle position, engine speed and temperature, intake air density, and exhaust gas oxygen content. The controller generates updated fuel injection and ignition outputs in response. Multichannel ADCs are a must for an automotive control processor. Integrated flash memory is a highly desirable feature because of the large look-up tables used for the calibration inputs of various control systems. Controller Area Network (CAN) and Media Oriented Systems Transport (MOST) are network protocols used specifically in automotive systems, and they are of little use to GPPs and other computing hardware. These features are not present in GPPs. However, the electronic components for automotive applications need to withstand higher temperatures and should have higher reliability, which makes them costlier than commercial grade parts. Figure 3g shows Freescale's architecture block diagram of an automotive control processor.

In future automobiles, braking control systems will adopt more computationally demanding model-based approaches, in which complex run-time calculations replace the look-up table references that are prevalent today. New safety systems will also incorporate video and radar processing.
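The look-up table references mentioned in Section 3.8 can be sketched as a one-dimensional calibration table with linear interpolation between breakpoints. The function name `lookup_interpolated`, the RPM axis and the injector pulse widths below are all made-up illustration values, not calibration data from any real engine controller.

```python
from bisect import bisect_right


def lookup_interpolated(axis, values, x):
    """Read a 1-D calibration table, interpolating between breakpoints.

    `axis` must be sorted in increasing order; inputs outside the table
    are clamped to the first or last entry.
    """
    if x <= axis[0]:
        return values[0]
    if x >= axis[-1]:
        return values[-1]
    i = bisect_right(axis, x) - 1          # breakpoint at or below x
    x0, x1 = axis[i], axis[i + 1]
    y0, y1 = values[i], values[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


if __name__ == "__main__":
    # Hypothetical table: engine speed (RPM) -> injector pulse width (ms).
    rpm_axis = [1000, 2000, 3000, 4000]
    pulse_ms = [2.0, 3.0, 5.0, 6.0]
    print(lookup_interpolated(rpm_axis, pulse_ms, 2500))
```

A table read like this costs one search and a handful of arithmetic operations, which is why it suits small controllers with integrated flash. The model-based approaches anticipated above trade that table storage for heavier run-time computation.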

Figure 3g - Freescale MPC566 block diagram.

4. DISCUSSION ON SOC DOMAINS

In this paper we have proposed different types of domain-specific processors. Some of these are integral to a lot of computing systems and some are very specific to a few application zones. Here we investigate grouping them further into sub-groups and super-groups. As seen in the discussion so far, there are some processors that inherently assume the presence of another basic processor for their working. All of these processors can be taken together to form a super-group, and the smaller domains can form the respective sub-groups.

Applications handled by automotive processors and by aerospace and military specific mission critical processors fall under the broader ambit of time and safety critical data processing. These need to be highly reliable processors even at elevated temperatures and under various other rugged conditions. The type of components used for these systems is very different from commercial grade components. These two surely form a super-domain; we name it the Safety Critical Specialized Computing Domain.

Along similar lines are the domains of digital signal processors and multimedia processors. DSPs are integral to the functioning of a lot of processing infrastructure, such as automotive and mission critical applications, audio and video processing etc. In fact, multimedia processors encompass hardware accelerators and DSPs as their integral parts. So we can have a super-domain here; we name it the Signal Processing and Media Domain.

Another super-domain, which is overlooked completely in this paper, is the Server and Supercomputing Class, which will encompass hugely parallel architectures such as GPGPU, GPP and some other specialized architectures.

5. CONCLUSION

In this paper we have seen how our computing infrastructure can be classified into different domain specific zones, which will help deliver targeted high performance in a much more energy efficient way while keeping the cost as low as possible.

6. REFERENCES