
FPGA Architectures from 'A' to 'Z' : Part 1

Clive Maxfield 8/15/2006 5:17 PM EDT

Editor's Note: This article is abstracted from Chapter 4 of my book The Design Warrior's Guide to FPGAs, ISBN: 0750676043, with the kind permission of the publisher. I should point out that this book doesn't actually teach you how to design with FPGAs; instead, it provides an introduction to the various device architectures, tools, and design flows (check out the Contents List for more details).

See also Part 2 of this article and also the following related articles:
1) How to implement an open IP encryption flow
2) Alternative computing solutions, from single cores to arrays of 'things'
3) True DSP Synthesis for FPGAs
4) Graph-based physical synthesis for FPGA designs
5) Rounding Algorithms 101

In this article we introduce a plethora of architectural features. Certain options - such as using antifuse versus SRAM configuration cells - are mutually exclusive. Some FPGA vendors specialize in one or the other, while others may offer multiple device families based on these different technologies. (Unless otherwise noted, the majority of these discussions assume SRAM-based devices.)

In the case of embedded blocks such as multipliers, adders, memory, and microprocessor cores, different vendors offer alternative "flavors" of these blocks with different "recipes" of ingredients. (Much like different brands of chocolate chip cookies featuring larger/smaller chocolate chips, for example, some FPGA families will have bigger/better/badder embedded RAM blocks, while others might feature more multipliers, or support more I/O standards, or ... the list goes on).

The problem is that the features supported by each vendor and each family change on an almost daily basis. This means that once you've decided what features you need, you then need to do a little research to see which vendor's offerings currently come closest to satisfying your requirements.

A little background information

Before hurling ourselves into the body of this chapter, there are a couple of things we need to define so as to ensure that we're all marching to the same drumbeat. For example, a term you're going to see used throughout this book is "fabric." In the context of a silicon
chip, this is used to refer to the underlying structure of the device (sort of like the phrase "The fabric of civilized society").

The word "fabric" comes from the Middle English fabryke, meaning "something constructed."

When you first hear someone using fabric in this way, it might sound a little "snooty" or pretentious. Truth to tell, however, once you get used to it, this is quite a useful word.

When we talk about the "geometry" of an integrated circuit, we are referring to the size of the individual structures constructed on the chip - such as the portion of a field-effect transistor (FET) known as its "channel". These structures are incredibly small. In the early 1980s, devices were based on 3 µm geometries, which means that their smallest structures were 3 millionths of a meter in size. (In conversation, we would say: "This IC is based on a three micron technology.")

The 'µ' symbol stands for "micro" from the Greek micros, meaning "small" (hence the use of "µP" as an abbreviation for "microprocessor," meaning "small processor"). In the metric system, 'µ' stands for "one millionth part of," so 1 µm represents "one millionth of a meter."

Each new geometry is referred to as a "technology node". By the 1990s, devices based on 1 µm geometries were starting to appear, and feature sizes continued to plummet throughout the course of the decade. As we moved into the 21st Century, high-performance ICs had geometries as small as 0.18 µm. By 2002 this had shrunk to 0.13 µm, by 2003 devices at 0.09 µm were starting to appear, and - at the time of this writing - the latest FPGAs are being created at the 0.065 µm node.

Any geometry smaller than around 0.5 µm is referred to as deep submicron (DSM). At some point that is not well defined (or that has multiple definitions depending on whom one is talking to), we move into the ultra-deep submicron (UDSM) realm.

Things started to become a little awkward once geometries dropped below 1 µm, not the least that it's a pain to keep on having to say things like "Zero point one three microns." For this reason, in conversation, it's now common to talk in terms of "nano", where one nano (short for nanometer) equates to a thousandth of a micron - that is, one thousandth of a millionth of a meter. Thus, instead of mumbling "0.09 µm" ("Point zero nine microns"), one can simply proclaim "90 nano" ("Ninety nano") and have done with it. Of course, these both mean exactly the same thing, but if you feel moved to regale your friends on these topics, it's best to use the vernacular of the day and present yourself as hip and trendy as opposed to an old fuddy-duddy from the last millennium.

SRAM-based devices

The majority of FPGAs are based on the use of SRAM configuration cells, which means
that they can be configured (programmed) over and over again. The main advantages of this technique are that new design ideas can be quickly implemented and tested, while evolving standards and protocols can be accommodated relatively easily. Furthermore, when the system is first powered-up, the FPGA can initially be programmed to perform one function such as a self-test and/or board/system test, and it can then be reprogrammed to perform its main task.

Another big advantage of the SRAM-based approach is that these devices are at the forefront of technology. FPGA vendors can leverage the fact that many other companies specializing in memory devices expend tremendous resources on research and development (R&D) in this area. Furthermore, the SRAM cells are created using exactly the same CMOS technologies as the rest of the device, so no special processing steps are required in order to create these components.

In the past, memory devices were often used to qualify the manufacturing processes associated with a new technology node. More recently, the mixture of size, complexity, and regularity associated with the latest FPGA generations has resulted in these devices being used for this task. One advantage of using FPGAs over memory devices to qualify the manufacturing process is that, if there's a defect, the structure of FPGAs is such that it's easier to identify and locate the problem (that is, figure out "what" and "where" it is). As an example, when IBM and UMC were rolling out their 0.09 µm (90 nano) processes, FPGAs were the first devices to race out of the starting gate.

Unfortunately, there's no such thing as a free lunch. One downside of SRAM-based devices is that they have to be reconfigured every time the system is powered-up. This either requires the use of a special external memory device (which has an associated cost and consumes real estate on the board) or the use of an on-board microprocessor (or some variation on these techniques). (Editor's Note: These techniques are presented in Chapter 5 of the book.)

Security issues and solutions with SRAM-based devices

Another consideration with regard to SRAM-based devices is that it can be difficult to protect your intellectual property (IP) in the form of your design. This is because the configuration file used to program the device is stored in some form of external memory. (Editor's Note: There's a very interesting paper entitled How to implement an open IP encryption flow that provides a great introduction to encryption in the context of EDA design flows in general and FPGA design flows in particular.)

Currently there are no commercially available tools that will read the contents of a configuration file and generate a corresponding schematic or netlist representation. Having said this, understanding and extracting the "logic" from the configuration file - while not a trivial task - would not be beyond the bounds of possibility given the combination of clever folks and computing horsepower available today.

Let's not forget that there are reverse engineering companies all over the world specializing in the recovery of "design IP". And there are also a number of countries
whose governments turn a blind eye to IP theft so long as the money keeps on rolling in (you know who you are). So if a design is a high-profit item, you can bet that there are folks out there that are ready and eager to replicate it while you're not looking.

In reality, the real issue here is not related to someone stealing your IP by reverse-engineering the contents of the configuration file, but rather their ability to "clone" your design irrespective of whether or not they understand how it performs its magic. Using readily available technology, it is relatively easy for someone to take a circuit board, put it on a "bed of nails" tester, and quickly extract a complete netlist for the board. This netlist can subsequently be used to reproduce the board. Now the only task remaining for the nefarious scoundrels is to copy your FPGA configuration file from its boot PROM (or EPROM, E2PROM, or whatever) and they have a duplicate of the entire design.

On the bright side, some of today's SRAM-based FPGAs support the concept of bitstream encryption. In this case, the final configuration data is encrypted before being stored in the external memory device. The encryption key itself is loaded into a special SRAM-based register in the FPGA via its JTAG port. In conjunction with some associated logic, this key allows the incoming encrypted configuration bitstream to be decrypted as it is being loaded into the device.

The process of loading an encrypted bitstream automatically disables the FPGA's read-back capability. This means that you will typically use unencrypted configuration data during development (where you need to use read-back) and then start to use encrypted data when you move into production. (You can load an unencrypted bitstream at any time, so you can easily load a test configuration and then reload the encrypted version.)

The main downside to this scheme is that you require a battery backup on the circuit board to maintain the contents of the encryption key register when power is removed from the system. This battery will have a lifetime of years or decades, because it need only maintain a single register in the device, but it does add to the size, weight, complexity, and cost of the board.

(Editor's Note: Since penning the above for the book, I've discovered that Altera provide some devices with a non-volatile encryption key capability, which removes the requirement for an external battery. Furthermore, there is another aspect to this - anti-tampering. If you have a volatile encryption key in an FPGA as discussed above, then removing the battery backup causes this key to be lost, thereby preventing you from reloading the original (encrypted) configuration file. However, the fact that there is no longer a key means that you can still load an unencrypted configuration file, which means you can load a "Trojan" design. By comparison, using a non-volatile encryption key in an FPGA allows you to prevent a user from loading a Trojan design. For example, consider an FPGA performing a copy-protection function in a consumer digital media device. Using a non-volatile key prevents someone from loading that FPGA
with a design that simply passes digital content through without copy-protecting the output.)

Antifuse-based devices

Unlike SRAM-based devices - which are programmed while resident in the system - antifuse-based devices are programmed offline using a special device programmer. (Editor's Note: Fundamental concepts - such as what an antifuse actually is - are presented in Chapter 2 of the book.)

The proponents of antifuse-based FPGAs are proud to point to an assortment of (not-insignificant) advantages. First of all, these devices are non-volatile (their configuration data remains when the system is powered-down), which means that they are immediately available as soon as power is applied to the system. Following from their non-volatility, these devices don't require any external memory device to store their configuration data, which saves the cost of an additional component and also saves real estate on the board.

One noteworthy advantage of antifuse-based FPGAs is the fact that their interconnect structure is naturally "Rad Hard", which means they are relatively immune to the effects of radiation. This is of particular interest in the case of military and aerospace applications, because the state of a configuration cell in an SRAM-based component can be "flipped" if that cell is hit by radiation (of which there is a lot in space). By comparison, once an antifuse has been programmed, it cannot be altered in this way. Having said this, it should also be noted that any flip-flops in these devices remain sensitive to radiation, so devices intended for radiation-intensive environments must have their flip-flops protected by triple redundancy design. This refers to having three copies of each register and taking a majority vote. Ideally, all three registers will contain identical values, but if one has been "flipped" such that two registers say "0" and the third says "1", then the "0s" have it (or vice versa if two registers say "1" and the third says "0").

(Editor's Note: In addition to rad-hard solutions from the major FPGA vendors, there are also other specialist companies that design and manufacture radiation-hardened FPGAs that - they claim - overcome the real-estate issue of triple-mode redundancy; one such company is Aeroflex Microelectronic Solutions, for example.)

Radiation can come in the form of gamma rays (very high-energy photons), beta particles (high-energy electrons), and alpha particles. It should be noted that rad-hard devices are not limited to antifuse technologies. Other components - even those based on SRAM architectures - may be available with special rad-hard packaging and triple redundancy design.

But perhaps the most significant advantage of antifuse-based FPGAs is that their configuration data is buried deep inside them. By default, it is possible for the device programmer to read this data out, because this is actually how the programmer works. As each antifuse is being processed, the device programmer keeps on testing it to determine
when that element has been fully programmed, then it moves on to the next antifuse. Furthermore, the device programmer can be used to automatically verify that the configuration was performed successfully (this is well worth doing when you're talking about devices containing 50 million-plus programmable elements). In order to do this, the device programmer requires the ability to read the actual states of the antifuses and compare them to the required states defined in the configuration file.

Once the device has been programmed, however, it is possible to set (grow) a special "security" antifuse that subsequently prevents any programming data (in the form of the presence or absence of antifuses) from being read out of the device. Even if the device is de-capped (its top is removed), programmed and unprogrammed antifuses appear to be identical, and the fact that all of the antifuses are buried in the internal metallization layers makes it almost impossible to reverse engineer the design.

Vendors of antifuse-based FPGAs may also tout a couple of other advantages relating to power consumption and speed, but if you aren't careful this can be a case of "the quickness of the hand deceives the eye." For example, they might tease you with the fact that an antifuse-based device consumes only 20% (approximately) of the standby power of an equivalent SRAM-based component, that their operational power consumption is also significantly lower, and that their interconnect-related delays are smaller. Also, they might casually mention that an antifuse is much smaller and thus occupies much less real estate on the chip than an equivalent SRAM cell (although they may neglect to mention that antifuse devices also require extra programming circuitry including a large, hairy programming transistor for each antifuse). They will follow this by noting that when you have a device containing tens of millions of configuration elements, using antifuses means that the rest of the logic can be much closer together. This serves to reduce the interconnect delays, thereby making these devices faster than their SRAM cousins.

And both of the above points would be true ... if one were to be comparing two devices implemented at the same technology node. But therein lies the rub, because antifuse technology requires the use of around three additional process steps after the main manufacturing process has been qualified. For this (and related) reasons, antifuse devices are always at least one - and usually several - generations (technology nodes) behind SRAM-based components, which effectively wipes out any speed or power consumption advantages that might otherwise be of interest.

Of course, the main disadvantage associated with antifuse-based devices is that they are one-time-programmable (OTP), so once you've programmed one of these little scallywags its function is "set in stone." This makes these components a poor choice for use in a development or prototyping environment.

EPROM-based devices

This section is short and sweet because no one currently makes - or has plans to make - EPROM-based FPGAs.
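
Looking back at the triple redundancy design mentioned under antifuse-based devices, the majority vote across three register copies can be sketched in a few lines of Python. This is only an illustration of the voting rule described in the text; the function name `majority_vote` and the example values are assumptions, not anything taken from a vendor's implementation.

```python
def majority_vote(r0: int, r1: int, r2: int) -> int:
    """Return the bit value held by at least two of the three register copies."""
    return (r0 & r1) | (r0 & r2) | (r1 & r2)


# Hypothetical scenario: the intended value is 0 and radiation flips one copy.
copies = [0, 0, 0]
copies[2] ^= 1                    # the third copy is "flipped" to 1
voted = majority_vote(*copies)    # the two unflipped copies outvote it
print(voted)
```

The same expression covers the opposite case: if two copies hold 1 and the third has been flipped to 0, the vote returns 1.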

EEPROM/FLASH-based devices

EEPROM or FLASH-based FPGAs are similar to their SRAM counterparts in the fact that their configuration cells are connected together in a long shift-register-style chain. These devices can be configured offline using a device programmer. Alternatively, some versions are in-system programmable (ISP), but their programming time is about 3X that of an SRAM-based component.

Once programmed, the data they contain is non-volatile, so these devices would be "instant-on" when power is first applied to the system. With regards to protection, some of these devices use the concept of a multi-bit key, which can range from around 50 bits to several hundred bits in size. Once you've programmed the device, you can load your user-defined key (bit-pattern) to secure its configuration data. After the key has been loaded, the only way to read data out of the device - or write new data into it - is to load a copy of your key via the JTAG port (this port is discussed later in this chapter and also in Chapter 5). The fact that the JTAG port in today's devices runs at around 20 megahertz (MHz) means that it would take billions of years to crack the key by exhaustively trying every possible value.

Two-transistor EEPROM and FLASH cells are approximately 2.5X the size of their one-transistor EPROM cousins, but they are still way smaller than their SRAM counterparts. This means that the rest of the logic can be much closer together, thereby reducing interconnect delays.

On the downside, these devices require around five additional process steps on top of standard CMOS technology, which results in their lagging behind SRAM-based devices by one or more generations (technology nodes). Finally, these devices tend to have relatively high static power consumption due to their containing vast numbers of internal pull-up resistors.

Hybrid FLASH-SRAM devices

Last but not least, there's always someone who wants to add yet "one more" ingredient into the cooking pot. In the case of FPGAs, some vendors offer combinations of programming technologies. For example, consider a device where each configuration element is formed from the combination of a FLASH (or EEPROM) cell and an SRAM cell.

In this case, the FLASH elements can be preprogrammed. Then, when the system is powered up, the contents of the FLASH cells are copied in a massively parallel fashion into their corresponding SRAM cells. This technique gives you the non-volatility associated with antifuse devices, which means the device is immediately available when power is first applied to the system. But unlike an antifuse-based component, you can subsequently use the SRAM cells to reconfigure the device while it remains resident in the system. Alternatively, you can reconfigure the device using its FLASH cells either while it remains in the system or offline by means of a device programmer.
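
As a rough check on the key-cracking estimate given for EEPROM/FLASH-based devices, the following Python sketch computes a worst-case exhaustive search time. It rests on loudly stated assumptions that are not in the original text: each attempt costs one JTAG clock per key bit (serial loading) and nothing else, and the function name `exhaustive_search_years` is purely illustrative. Under these assumptions the result depends very strongly on key length, so the figures should be read as orders of magnitude only.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_years(key_bits: int, jtag_hz: float = 20e6) -> float:
    """Worst-case years needed to try every possible key over JTAG."""
    attempts = 2 ** key_bits            # every possible bit pattern
    clocks_per_attempt = key_bits       # assumption: one clock per key bit, no overhead
    seconds = attempts * clocks_per_attempt / jtag_hz
    return seconds / SECONDS_PER_YEAR


# Key sizes chosen for illustration; the text quotes "around 50 bits to several hundred".
for bits in (50, 64, 128, 256):
    print(f"{bits:3d}-bit key: roughly {exhaustive_search_years(bits):.3g} years")
```

Each additional key bit slightly more than doubles the search time, which is why keys toward the upper end of the quoted range are far beyond the reach of this kind of attack.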

Summary of programming technologies

The key points associated with the various programming technologies described above are briefly summarized in Table 1.

Table 1. Summary of programming technologies.

Fine-, medium-, and coarse-grained architectures

It is common to categorize FPGA offerings as being either fine-grained or coarse-grained. In order to understand what this means, we first need to remind ourselves that the main feature that distinguishes FPGAs from other devices is that their underlying fabric predominantly consists of large numbers of relatively simple programmable logic block "islands" embedded in a "sea" of programmable interconnect (Fig 1).

Fig 1. Underlying FPGA fabric.

In the case of a fine-grained architecture, each logic block can be used to implement only a very simple function. For example, it might be possible to configure the block to act as any 3-input function such as a primitive logic gate (AND, OR, NAND, etc.) or a storage element (D-type flip-flop, D-type latch, etc.).

In addition to implementing glue logic and irregular structures like state machines, fine-grained architectures are said to be particularly efficient when executing systolic algorithms (functions that benefit from massively parallel implementations). These architectures are also said to offer some advantages with regards to traditional logic synthesis technology, which is geared toward fine-grained ASIC architectures.

The mid-1990s saw a lot of interest in fine-grained FPGA architectures, but over time the vast majority faded away into the sunset leaving only their coarse-grained cousins. In the case of a coarse-grained architecture, each logic block contains a relatively large amount of logic compared to its fine-grained counterparts. For example, a logic block might contain four 4-input LUTs, four multiplexers, four D-type flip-flops, and some fast carry logic (see the following topics in this chapter for more details).

An important consideration with regard to architectural granularity is that fine-grained implementations require a relatively large number of connections into and out of each block compared to the amount of functionality that can be supported by those blocks. By comparison, as the granularity of the blocks increases to medium-grained and higher, the number of connections into the blocks decreases compared to the amount of functionality they can support. This is important, because the programmable inter-block interconnect accounts for the vast majority of the delays associated with signals as they propagate through an FPGA.

One slight "fly-in-the-soup" is that a number of companies have recently started developing really coarse-grained device architectures comprising arrays of nodes, where each node is a highly complex processing element ranging from an algorithmic function such as a Fast Fourier Transform (FFT) all the way up to a complete general-purpose microprocessor core (see also Chapter 6: Who Are All the Players? and Chapter 23: Field-Programmable Node Arrays). Although these devices aren't classed as FPGAs, they do serve to muddy the waters. For this reason, LUT-based FPGA architectures are now often classed as medium-grained, thereby leaving the coarse-grained appellation free to be applied to these new node-based devices.

(Editor's Note: For those who are interested, there's a rather useful paper titled Alternative computing solutions, from single cores to arrays of 'things' that places these node-based architectures in the context of other computing solutions.)

Mux versus LUT-based logic blocks

There are two fundamental incarnations of the programmable logic blocks used to form the medium-grained architectures referenced in the previous topic: MUX (multiplexer) based and LUT (lookup table) based.

MUX is pronounced to rhyme with "flux", while LUT is pronounced to rhyme with "nut".

MUX-based: As an example of a MUX-based approach, consider one way in which the 3-input function y = (a & b) | c could be implemented using a block containing only multiplexers (Fig 2).
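
To make the MUX-based idea concrete, here is a minimal Python sketch that builds y = (a & b) | c from nothing but 2:1 multiplexers whose data inputs are tied to constants or signals. The wiring shown is one possible arrangement chosen for illustration; it is an assumption and is not claimed to match the arrangement in the original figure. The names `mux2` and `y_from_muxes` are likewise hypothetical.

```python
from itertools import product


def mux2(sel: int, d0: int, d1: int) -> int:
    """2:1 multiplexer: pass d0 when sel is 0, d1 when sel is 1."""
    return d1 if sel else d0


def y_from_muxes(a: int, b: int, c: int) -> int:
    """Compute y = (a & b) | c using only multiplexers."""
    a_and_b = mux2(a, 0, b)       # a = 0 selects logic 0, a = 1 selects b
    return mux2(c, a_and_b, 1)    # c = 0 selects (a & b), c = 1 selects logic 1


# Exhaustively compare against the required function.
for a, b, c in product((0, 1), repeat=3):
    assert y_from_muxes(a, b, c) == ((a & b) | c)
print("multiplexer wiring matches y = (a & b) | c for all 8 input combinations")
```

Tying data inputs to logic 0, logic 1, or a signal is what lets the same multiplexer structure be configured to realize different functions.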

Fig 2. MUX-based logic block.

The device can be programmed such that each input to the block is presented with a logic 0, a logic 1, or the true or inverse version of a signal ('a', 'b', or 'c' in this case) coming from another block or from a primary input to the device. This allows each block to be configured in myriad ways to implement a plethora of possible functions. (The 'x' shown on the input to the central multiplexer in Fig 2 indicates that we don't care whether this input is connected to a 0 or a 1.)

LUT-based: The underlying concept behind a LUT is relatively simple. A group of input signals is used as an index (pointer) into a lookup table. The contents of this table are arranged such that the cell pointed to by each input combination contains the desired value. For example, let's assume that we wish to implement the function y = (a & b) | c (Fig 3).

Fig 3. Required function and associated truth table.

This can be achieved by loading a 3-input LUT with the appropriate values. For the purposes of the following examples, we shall assume that the LUT is formed from SRAM cells (but it could be formed using antifuses, E2PROM, or FLASH cells as discussed earlier in this chapter). A commonly used technique is to use the inputs to select the desired SRAM cell using a cascade of transmission gates as shown in Fig 4. (Note that the SRAM cells will also be connected together in a chain for configuration purposes - that is, to load them with the required values - but these connections have been omitted from this illustration to keep things simple.)
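
The lookup idea can be sketched in Python as follows. The list `cells` stands in for the SRAM cells, `build_lut` fills it from the required function, and two read functions model the selection: a direct index, and a stage-by-stage narrowing that loosely mirrors a cascade of selectors. All names are hypothetical, and treating 'a' as the most significant index bit is an assumption made for this sketch rather than a statement about any real device.

```python
from itertools import product


def build_lut(func, n_inputs):
    """Fill 2**n_inputs cells; the first input is the most significant index bit."""
    return [func(*bits) for bits in product((0, 1), repeat=n_inputs)]


def lut_read(cells, *inputs):
    """Use the inputs as an index (pointer) into the table."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return cells[index]


def lut_read_cascade(cells, *inputs):
    """Select one cell stage by stage; each input discards half the candidates."""
    selected = list(cells)
    for bit in inputs:
        half = len(selected) // 2
        selected = selected[half:] if bit else selected[:half]
    return selected[0]


cells = build_lut(lambda a, b, c: (a & b) | c, 3)
print(cells)  # [0, 1, 0, 1, 0, 1, 1, 1]

for a, b, c in product((0, 1), repeat=3):
    assert lut_read(cells, a, b, c) == lut_read_cascade(cells, a, b, c) == ((a & b) | c)
```

Loading a different list of eight values turns the same structure into any other 3-input function, which is the whole appeal of the approach.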

Fig 4. A transmission gate-based LUT (the programming chain is omitted for purposes of clarity).

If a transmission gate is enabled (active), it passes the signal seen on its input through to its output. But if the gate is disabled, its output is electrically disconnected from the wire it is driving.

The transmission gate symbols shown with a small circle (called a "bobble" or a "bubble") indicate that these gates will be activated by a logic 0 on their control input. By comparison, symbols without bobbles indicate that these gates will be activated by a logic 1. Based on this understanding, it's easy to see how different input combinations can be used to select the contents of the various SRAM cells.

Mux-based versus LUT-based?

Once upon a time - when engineers handcrafted their circuits prior to the advent of today's sophisticated computer-aided design tools - some folks say that it was possible to achieve the best results using MUX-based architectures. (Sad to relate, they usually don't qualify this with regards to exactly how these results were "better," so this is largely left to our imaginations.) It is also said that MUX-based architectures have an advantage when it comes to implementing control logic along the lines of "If this input is true and this input is false, then make that output true..." sort of thing. However, some of these architectures don't provide high-speed carry logic chains, in which case their LUT-based counterparts are left the leaders in anything to do with arithmetic processing.

Some mux-based architectures - such as those fielded by QuickLogic - feature logic blocks containing multiple layers of muxes preceded by primitive logic gates like ANDs. This provides them with a large fan-in capability, which is claimed to give them an advantage for address decoding and state machine decoding applications.

Throughout much of the 1990s, FPGAs were widely used in the telecommunications and networking markets. Both of these application areas involve pushing lots of data around, in which case LUT-based architectures hold the high ground. Furthermore, as designs (and device capacities) grew larger and synthesis technology increased in sophistication, handcrafting circuits largely became a thing of the past. The end result is that the majority of today's FPGA architectures are LUT-based as discussed below.

3-, 4-, 5-, or 6-input LUTs?

The great thing about an n-input LUT is that it can implement any possible n-input combinational (or combinatorial) logic function. Adding more inputs allows you to represent more complex functions, but every time you add an input you double the number of SRAM cells.

Some folks prefer to say "combinational logic," while others favor the term "combinatorial logic."

The first FPGAs were based on 3-input LUTs. FPGA vendors and university students subsequently researched the relative merits of 3-, 4-, 5-, and even 6-input LUTs into the ground (whatever you do, don't get trapped in conversation with a bunch of FPGA architects at a party). The current consensus is that 4-input LUTs offer the optimal balance of tradeoffs.

In the past, some devices were created using a mixture of different LUT sizes, such as 3-input and 4-input LUTs, because this offered the promise of optimal device utilization. However, one of the main tools in the design engineer's treasure chest is logic synthesis, and uniformity and regularity are what a synthesis tool likes best. Thus, all of the really successful architectures are currently based only on the use of 4-input LUTs. (This is not to say that mixed-sized LUT architectures won't re-emerge in the future as design software continues to increase in sophistication.)

(Editor's Note: Things have moved on significantly since I originally penned the words above. For example, the recently introduced Virtex-5 family from Xilinx features 6-input LUTs. Meanwhile, Altera has a fabric that combines two 4-LUTs and four 3-LUTs. In addition to allowing designers to form a 6-LUT, this also allows you to make a 5-LUT and a 3-LUT, and many other combinations. It is said that - depending on the design - this type of fabric can be much more efficient in terms of overall logic utilization than a fabric based on a single n-LUT.)

LUT versus distributed RAM versus shift register

The fact that the core of a LUT in an SRAM-based device comprises a number of SRAM cells offers a number of interesting possibilities. In addition to its primary role as a lookup table, some vendors allow the cells forming the LUT to be used as a small block of RAM (the sixteen cells forming a 4-input LUT, for example, could be cast in the role of a 16 x 1 RAM). This is referred to as distributed RAM because (a) the LUTs are strewn (distributed) across the surface of the chip and (b) this differentiates it from the larger chunks of block RAM introduced later in this chapter.

Yet another possibility devolves from the fact that all of the FPGA's configuration cells - including those forming the LUT - are effectively strung together in a long chain (Fig 5).

Fig 5. Configuration cells linked in a chain.

This aspect of the architecture is discussed in more detail in Chapter 5. The point here is that - once the device has been programmed - some vendors allow the SRAM cells forming a LUT to be treated independently from the main body of the chain and to be used in the form of a shift register. Thus, each LUT may be considered to be multi-faceted (Fig 6).
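
The three roles described above (lookup table, 16 x 1 distributed RAM, and 16-bit shift register) can be modeled as three views onto the same sixteen cells. The Python class below is a behavioral sketch only; the class name `MultiFacetedLut4`, its method names, the bit ordering, and the shift direction are all assumptions made for illustration and do not describe any particular vendor's hardware.

```python
class MultiFacetedLut4:
    """Sixteen cells usable as a 4-input LUT, a 16 x 1 RAM, or a 16-bit shift register."""

    def __init__(self, init: int = 0):
        # Cell i holds bit i of the 16-bit initialization value.
        self.cells = [(init >> i) & 1 for i in range(16)]

    def lookup(self, a: int, b: int, c: int, d: int) -> int:
        # LUT mode: the inputs form the cell index ('a' is the most significant bit).
        return self.cells[(a << 3) | (b << 2) | (c << 1) | d]

    def ram_write(self, address: int, bit: int) -> None:
        # RAM mode: overwrite a single cell.
        self.cells[address] = bit & 1

    def ram_read(self, address: int) -> int:
        return self.cells[address]

    def shift_in(self, bit: int) -> int:
        # Shift-register mode: the new bit enters cell 0 and the bit that
        # was in cell 15 is pushed out and returned.
        out = self.cells[-1]
        self.cells = [bit & 1] + self.cells[:-1]
        return out


lut = MultiFacetedLut4(init=0x8000)    # only cell 15 is set: a 4-input AND
print(lut.lookup(1, 1, 1, 1))          # LUT view of the cells
lut.ram_write(0, 1)                    # RAM view: set cell 0
print(lut.ram_read(0))
print(lut.shift_in(0))                 # shift view: bit leaving cell 15
```

Because all three views share one list, a write through the RAM interface changes what the lookup interface returns, which is the sense in which the LUT is multi-faceted.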

Fig 6. A multi-faceted LUT.

CLBs versus LABs versus slices ...

"Man can not live by LUTs alone," as the Bard would surely say if he were to be accidentally reincarnated as an FPGA designer. For this reason, in addition to one or more LUTs, a programmable logic block will contain other elements such as multiplexers and registers. But before we delve into this topic, we first need to wrap our brains around some terminology.

A Xilinx logic cell (LC)

One niggle when it comes to FPGAs is that each vendor has its own names for things. But we have to start somewhere, so let's kick off by saying that the core "building block" in a modern FPGA from Xilinx is called a logic cell (LC). Amongst other things, an LC comprises a 4-input LUT (which can also act as a 16 x 1 RAM or a 16-bit shift register), a multiplexer, and a register (Fig 7).

Fig 7. Highly-simplified view of a Xilinx logic cell (LC).

It must be noted that this illustration is a gross simplification, but it will serve our purposes here. The register can be configured (programmed) to act as a flip-flop as shown in Fig 7 or as a latch. The polarity of the clock (rising-edge-triggered or falling-edge-triggered) can be configured, as can the polarity of the clock enable and set/reset signals (active-high or active-low).

In addition to the LUT, MUX, and register, the LC also contains a smattering of other elements, including some special fast carry logic for use in arithmetic operations (this is discussed in more detail a little later).

An Altera logic element (LE)

Just for reference, the equivalent core "building block" in an FPGA from Altera is called a logic element (LE). There are a number of differences between a Xilinx LC and an Altera LE, but the overall concepts are very similar.

Slicing and dicing

The next step up the hierarchy is what Xilinx call a "slice" (the other vendors have their own equivalent names). Why "slice"? Well, they had to call it something, and whichever way you look at it, the term slice is "something." At the time of writing, a slice contains two logic cells (Fig 8).

Fig 8. A "slice" containing two logic cells.

The reason for the "at the time of writing" qualifier is that these definitions can - and do - change with the seasons. (Editor's Note: For example, the recently announced Virtex-5 family from Xilinx has four 6-input LUTs per slice.) The internal wires have been omitted from this illustration to keep things simple, but it should be noted that, although each logic cell's LUT, MUX, and register have their own data inputs and outputs, the slice has one set of clock, clock enable, and set/reset signals that are common to both logic cells.

CLBs and LABs

And moving one more level up the hierarchy we come to what Xilinx call a configurable logic block (CLB) and what Altera refer to as a logic array block (LAB). (Other FPGA vendors may have their own names for each of these entities.)

Using CLBs as an example, some Xilinx FPGAs have two slices in each CLB while others have four. At the time of writing, a CLB equates to a single logic block in our original visualization of "islands" of programmable logic in a "sea" of programmable interconnect (Fig 9).
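The point about a slice sharing one set of control signals between its two logic cells can be illustrated with a small Python sketch. This is only a toy: the class names are hypothetical, and I have assumed active-high signals, a synchronous reset, and reset taking priority over clock enable, none of which is stated in the text (real devices let you configure these polarities).

```python
from dataclasses import dataclass, field


@dataclass
class LogicCell:
    q: int = 0  # current register output


@dataclass
class Slice:
    # Two logic cells per slice, as in Fig 8.
    cells: list = field(default_factory=lambda: [LogicCell(), LogicCell()])

    def clock(self, data, clock_enable=1, reset=0):
        """Apply one active clock edge.

        `data` carries one bit per logic cell (separate data inputs), while
        `clock_enable` and `reset` are shared by both cells.
        Assumptions: active-high, synchronous reset with priority over enable.
        """
        for cell, d in zip(self.cells, data):
            if reset:
                cell.q = 0
            elif clock_enable:
                cell.q = d & 1
        return [cell.q for cell in self.cells]


s = Slice()
print(s.clock([1, 0]))                  # prints [1, 0]
print(s.clock([0, 1], clock_enable=0))  # prints [1, 0]: both cells hold
print(s.clock([1, 1], reset=1))         # prints [0, 0]: both cells reset
```

The thing to notice is that you cannot hold one register while loading the other: the enable and reset apply to the slice as a whole, which is the price paid for needing only one set of control wires.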

Fig 9. A CLB containing four slices (the number of slices depends on the FPGA family).

There is also some fast programmable interconnect within the CLB. This interconnect (not shown in Fig 9 for reasons of clarity) is used to connect neighboring slices.

The reason for having this type of logic-block hierarchy - LC, then slice (with two LCs), then CLB (with four slices) - is that it is complemented by an equivalent hierarchy in the interconnect. Thus, there is fast interconnect between the LCs in a slice, then slightly slower interconnect between slices in a CLB, followed by the interconnect between CLBs. The idea is to achieve the optimum tradeoff between making it easy to connect things together and avoiding excessive interconnect-related delays.

Distributed RAMs and shift registers

We previously noted that each 4-input LUT can be used as a 16 x 1 RAM. And things just keep on getting better and better, because - assuming the "four-slices-per-CLB" configuration illustrated in Fig 9 - all of the LUTs within a CLB can be configured together to implement the following:

Single-port 16 x 8 bit RAM
Single-port 32 x 4 bit RAM
Single-port 64 x 2 bit RAM
Single-port 128 x 1 bit RAM
Dual-port 16 x 4 bit RAM
Dual-port 32 x 2 bit RAM
Dual-port 64 x 1 bit RAM
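It is worth checking the arithmetic behind this list. With two logic cells per slice and four slices per CLB there are eight LUTs, and therefore 8 x 16 = 128 cells, in each CLB. The short Python sketch below tallies how many LUTs each configuration would consume. The factor of two for dual-port RAMs is my own inference from the fact that the dual-port options hold half as many bits; treat it as an assumption rather than a statement about any specific device.

```python
LUT_BITS = 16        # cells in one 4-input LUT
LCS_PER_SLICE = 2    # logic cells (one LUT each) per slice
SLICES_PER_CLB = 4   # the "four-slices-per-CLB" configuration

LUTS_PER_CLB = LCS_PER_SLICE * SLICES_PER_CLB  # 8 LUTs
CLB_BITS = LUTS_PER_CLB * LUT_BITS             # 128 bits

# (port type, depth in words, width in bits), taken from the list above.
CONFIGS = [
    ("single", 16, 8),
    ("single", 32, 4),
    ("single", 64, 2),
    ("single", 128, 1),
    ("dual", 16, 4),
    ("dual", 32, 2),
    ("dual", 64, 1),
]


def luts_needed(ports, depth, width):
    """LUTs consumed by one RAM configuration.

    Assumption: a dual-port RAM keeps two copies of every bit.
    """
    copies = 2 if ports == "dual" else 1
    return copies * depth * width // LUT_BITS


for ports, depth, width in CONFIGS:
    used = luts_needed(ports, depth, width)
    print(f"{ports}-port {depth} x {width}: {used} of {LUTS_PER_CLB} LUTs")
```

Under that assumption every entry in the list uses exactly eight LUTs, which is to say the whole CLB.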

Alternatively, each 4-input LUT can be used as a 16-bit shift register. In this case, there are special dedicated connections between the logic cells within a slice and between the slices themselves that allow the last bit of one shift register to be connected to the first bit of another without using the ordinary LUT output (which can be used to view the contents of a selected bit within that 16-bit register). This allows the LUTs within a single CLB to be configured together to implement a shift register containing up to 128 bits as required.

The point where a set of data or control signals enters or exits a logic function is commonly referred to as a "port." In the case of a single-port RAM, data is written in and read out of the function using a common data bus. By comparison, in the case of a dual-port RAM, data is written into the function using one data bus (port) and read out using a second data bus (port). In fact, the read and write operations each have an associated address bus (used to "point to" a word of interest inside the RAM). This means that the read and write operations can be performed simultaneously.

Fast carry chains

A key feature of modern FPGAs is that they include the special logic and interconnect required to implement fast carry chains. In the context of the CLBs introduced in the previous section, each logic cell (LC) contains special carry logic. This is complemented by dedicated interconnect between the two LCs in each slice, between the slices in each CLB, and between the CLBs themselves.

This special carry logic and dedicated routing boost the performance of logical functions such as counters and arithmetic functions such as adders. The availability of these fast carry chains - in conjunction with features like the shift register incarnations of LUTs (discussed above) and the embedded multipliers etc. (introduced below) - provided the wherewithal for FPGAs to be used for applications like digital signal processing (DSP).

Embedded RAMs
A lot of applications require the use of memory, so FPGAs now include relatively large "chunks" of embedded RAM called e-RAM or block RAM. Depending on the architecture of the component, these blocks might be positioned around the periphery of the device, scattered across the face of the chip in relative isolation, or organized in columns as shown in Fig 10.

Fig 10. Bird's-eye view of chip with columns of embedded RAM blocks.

Depending on the device, such a RAM might be able to hold anywhere from a few thousand to tens of thousands of bits. Furthermore, a device might contain anywhere from tens to hundreds of these RAM blocks, thereby providing a total storage capacity of a few hundred thousand bits all the way up to several millions of bits. (Editor's Note: Again, things have moved on since I penned the above. For example, in the case of the Stratix II FPGA family that was introduced by Altera in 2004, some of these little scamps have RAM blocks that hold HUNDREDS of thousands of bits (~590K bits) and a device may contain THOUSANDS of RAM blocks (the largest Stratix II device, as these words drip off my fingertips, has 1707 blocks).)

Each block of RAM can be used independently, or multiple blocks can be combined together to implement larger blocks. These blocks can be used for a variety of purposes, such as implementing standard single- or dual-port RAMs, first-in first-out (FIFO) functions, state machines, and so forth.

See also Part 2 of this article...

Embedded multipliers, adders, MACs, etc.
Embedded processor cores (hard and soft)
Clock trees and clock managers
General-purpose I/O
Gigabit transceivers
Hard IP, soft IP, and firm IP
System gates versus "real" gates
FPGA "years"

Clive "Max" Maxfield is president of TechBites Interactive, a marketing consultancy firm specializing in high technology. Max is the author and co-author of a number of books, including Bebop to the Boolean Boogie (An Unconventional Guide to Electronics), The Design Warrior's Guide to FPGAs (Devices, Tools, and Flows), and How Computers Do Math featuring the pedagogical and phantasmagorical virtual DIY Calculator. Widely regarded as being an expert in all aspects of computing and electronics (at least by his mother), Max was once referred to as "an industry notable" and a "semiconductor design expert" by someone famous who wasn't prompted, coerced, or remunerated in any way. Max can be reached at max@techbites.com.

http://www.eetimes.com/design/programmable-logic/4014844/FPGA-Architectures-from-A-to-Z--Part-1?pageNumber=3