Clustered Voltage Scaling Technique for Low-Power Design

Kimiyoshi Usami* and Mark Horowitz
Stanford University, Stanford, CA 94305

* Current address is Toshiba Corp., 580-1, Horikawa-cho, Saiwai-ku, Kawasaki, Japan. This work was done while he was a visiting scholar at Stanford.

ISLPD 95, Dana Point, CA, USA. © 1995 ACM 0-89791-744-8/95/0004 $3.50

Abstract

This paper describes a technique to reduce power without changing circuit performance by making use of two supply voltages. Gates off the critical path are run at the lower supply to reduce power. To minimize the number of interfacing level converters needed, our algorithm clusters the circuits which operate at reduced voltage, leading to clustered voltage scaling (CVS). We applied the CVS technique to design examples of control logic in a real microprocessor, which had been implemented using a low-power library. Minimizing the power is achieved by combining gate re-sizing and the CVS technique. The CVS technique was able to further reduce the power by 10-20%.

1. Introduction

Voltage scaling is one of the most effective techniques for reducing the power consumption of CMOS circuits [1]. The majority of the power is dynamic power, which is reduced quadratically with the supply voltage V_DD [2]. The current in the MOS transistor, however, decreases with V_DD, which increases the delay of the circuit and degrades performance.

In the design of most microprocessors or ASIC chips, the operating frequency is set by the target market. The timing constraints in the chip are in turn set by the operating frequency. Designers need to optimize the design to reduce the power consumption within the specified timing constraints. If the supply voltage is reduced under a constant V_th, the critical-path delay will not meet the timing constraints. Although some studies try to compensate for the lost performance by up-sizing the transistors, this reaches a point of limited returns. Moreover, in a real design, it is likely that meeting the timing constraints will be very difficult even without a scaled supply.

In real control logic, not every path is critical. Non-critical paths have positive "slack", the difference between the required time and the arrival time of signals. A power reduction technique has been reported in [3], in which the transistors are downsized where excessive slack exists. Since transistors get less efficient as they get very small and there is a fixed minimum size, we propose "Clustered Voltage Scaling" (CVS), which partially reduces the supply voltage to circuits that have excessive slack. The next section presents our method, and Section 3 presents results from using this algorithm.

2. Clustered Voltage Scaling

2.1 Clustered Voltage Scaling Structure

The basic idea is to provide two different supply voltages: V_DDL (the reduced voltage) and V_DDH (the original, unreduced voltage). Circuits with excessive slack are made to operate at V_DDL, while those along the critical paths are made to operate at V_DDH. However, there are a couple of problems when two different supply voltages coexist. One is the static current flowing at the interface of the V_DDL part and the V_DDH part. As shown in Fig. 2.1, if the output of a circuit operating at V_DDL (called the V_DDL circuit) is connected directly to the input of a circuit operating at V_DDH (called the V_DDH circuit), static current flows in the V_DDH circuit at the input level "High". Since the voltage of the node N1 is not raised higher than V_DDL even at "High" level, the PMOS (MP1) cannot be cut off when V_DDL < V_DDH - |V_th|. This causes static current to flow from V_DDH to V_SS through MP1 and MN1, which is not desirable for low-power oriented design. A level converter, like the one shown in Fig. 2.2, blocks the static current if it is inserted at the node N1. It should be noted that no level converter is needed in the reversed case, i.e. when the output of a V_DDH circuit is connected to the input of a V_DDL circuit. The static current does not flow because the input of the V_DDL circuit is driven up to V_DDH.

Fig. 2.1 Direct Connection of V_DDL Circuit and V_DDH Circuit
Fig. 2.2 Conventional Level Converter

Although the level converter shown in Fig. 2.2 prevents the static power, its dynamic power is fairly large when toggling. In our measurement, it dissipates as much as 0.5 mW per circuit at 100 MHz, with a delay of 0.5 ns using 0.8 µm CMOS. If we produce a structure in which V_DDL circuits and V_DDH circuits are connected one after the other, such as "V_DDH circuit - V_DDL circuit - V_DDH circuit - V_DDL circuit - ...", a lot of level converters are needed, one at each interface between a V_DDL circuit and a V_DDH circuit. The sum of their power could cancel out the power reduction given by partially reducing the supply voltage.

To avoid this problem, we propose the Clustered Voltage Scaling (CVS) structure shown in Fig. 2.3. The structure minimizes the number of level converters needed. Assuming the combinational logic is implemented using standard cells, the structure

primary_inputs -> V_DDH cells -> V_DDL cells -> level converters -> primary_outputs

is formed in all paths from the primary inputs to the primary outputs. This leads to the formation of a cluster of V_DDH cells and a cluster of V_DDL cells. Supply voltage V_DDL is given to the shaded cells in Fig. 2.3.
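The ordering constraint behind the CVS structure can be illustrated with a small sketch. It is not taken from the paper: the labels "H" (V_DDH cell) and "L" (V_DDL cell) and the function names are hypothetical, and the sketch assumes that every primary output needs a full V_DDH swing, so a path ending in a V_DDL cell needs one converter at its end.

```python
# Illustrative sketch only: "H" marks a V_DDH cell, "L" a V_DDL cell.
# The labels and function names are assumptions made for this example.

def needs_level_converter(driver, receiver):
    """A V_DDL output driving a V_DDH input cannot fully turn off the PMOS."""
    return driver == "L" and receiver == "H"


def is_cvs_path(cells):
    """True if the path is a run of V_DDH cells followed by a run of V_DDL cells."""
    seen_low = False
    for cell in cells:
        if cell == "L":
            seen_low = True
        elif seen_low:
            # A V_DDH cell after a V_DDL cell breaks the clustering.
            return False
    return True


def converters_needed(cells):
    """Count converters on one path from primary input to primary output."""
    inner = sum(needs_level_converter(a, b) for a, b in zip(cells, cells[1:]))
    # Assumption: the primary output expects a V_DDH swing.
    at_output = 1 if cells and cells[-1] == "L" else 0
    return inner + at_output


clustered = ["H", "H", "L", "L"]
alternating = ["H", "L", "H", "L"]
clustered_count = converters_needed(clustered)
alternating_count = converters_needed(alternating)
```

With the same number of V_DDL cells on the path, the clustered ordering needs a single converter at the output, whereas the alternating ordering needs one at every V_DDL-to-V_DDH interface.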
Level converters are inserted only after the last cells in the V_DDL cluster, at the V_DDL-to-V_DDH connections. The CVS structure has another advantage: the layout overhead is small because of the clustered structure. When a designer performs placement and routing using standard cells, he/she groups the cells in the V_DDH cluster and those in the V_DDL cluster, and then places them in separate rows.

Fig. 2.3 Clustered Voltage Scaling (CVS) Structure (LC: Level Converter)

2.2 Latch with Level-Conversion Function

If the primary output is connected to a latch, the level converter can be merged with the latch as shown in Fig. 2.4. This is a latch with level-conversion function (LCF). The reduced voltage swing at the node "IN" is converted to a 5 V swing at the output Q. Power and delay of the latch versus the input voltage (V_IN) are plotted in Fig. 2.5. The power and delay data were obtained using SPICE. We assumed an operating frequency of 100 MHz and 0.8 µm CMOS technology. The delay has been measured from IN to Q.

Fig. 2.4 Latch with Level-Conversion Function
Fig. 2.5 Power and Delay of Latch with LCF (horizontal axis: V_IN in volts)

The power and the delay both increase when V_IN gets lower, because the slope of the voltage waveform at the node D-bar decreases. We compared the power of a simple combination of the conventional level converter and the conventional latch with that of the latch with LCF. At V_IN = 3 V, the power of the simple combination is 1.05 mW, because the level converter consumes 0.5 mW and the latch about 0.5 mW, while the latch with LCF dissipates much less power per circuit than this simple combination.

2.3 Algorithms and Heuristics for Creating CVS Structure

A. Basic Algorithm

For the gate-level netlist, we perform backward graph traversal using the Depth-First-Search (DFS) algorithm from the primary outputs toward the primary inputs.
Each time we visit a cell, we try to replace it with a V_DDL cell. If the timing constraints are still met even after the replacement, the cell is replaced. This process is repeated until we finish visiting all the cells in the module. We will use Fig. 2.7 as an example.

Fig. 2.7 Initial Circuit

Graph traversal is initiated at the primary outputs (o1, o2, o3, o4). Which of them is taken first is determined by a heuristic, which will be discussed later. When o1 is taken first, replacement is tried at the cell "G8". The degraded delay of G8 when its supply voltage is reduced from V_DDH to V_DDL is computed by the following equation:

$$\mathrm{Delay}_{V_{DDL}} = \frac{V_{DDL}}{V_{DDH}} \left( \frac{V_{DDH} - V_{th}}{V_{DDL} - V_{th}} \right)^{2} \mathrm{Delay}_{V_{DDH}}$$

Static timing analysis by PERT [4] is performed using the degraded delay to check the arrival time, required time and slack at each node. If the slack is positive, the V_DDH cell at G8 is replaced with a V_DDL cell and the backward traversal is continued. The cell G5 is visited next and replacement is tried by checking the slack using the static timing analysis. If G5 is successfully replaced, the fanins of G5 are examined.

On the contrary, if negative slack is detected, the cell is not replaced. For example, if negative slack is detected when trying to replace G5, G5 is not replaced and is marked as "unreplaceable". We stop the traversal from the fanins of G5 toward the primary inputs and newly initiate the backward traversal at the next primary output.

B. Algorithm for Handling Fanouts

In performing the traversal, we need to be careful about cells with multiple fanouts. For example, in the above-mentioned traversal of o1 -> G8 -> G5, the cell G5 has two fanouts, G8 and G9. In order to replace the V_DDH cell at G5 with a V_DDL cell, not only G8 but also G9 should be replaced with a V_DDL cell. Otherwise, the CVS structure with

primary_inputs -> V_DDH cluster -> V_DDL cluster -> output latches with LCF

cannot be formed.
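The delay-degradation and slack check of the basic algorithm can be tried out numerically. The sketch below is only an illustration under stated assumptions: it uses the common long-channel model in which gate delay scales as V_DD / (V_DD - V_th)^2, treats non-negative slack as "timing met", and uses a hypothetical two-cell chain G5 -> G8 with made-up delays. The values V_DDH = 5 V and V_DDL = 3 V echo the voltages mentioned in the latch discussion, and V_th = 0.8 V is an assumed threshold. All function names are invented for this example.

```python
# Illustrative sketch: names, netlist and numbers are assumptions, not the paper's tool.

def power_ratio(vddh, vddl):
    """Dynamic power scales quadratically with the supply voltage."""
    return (vddl / vddh) ** 2


def scaled_delay(delay_h, vddh, vddl, vth):
    """Delay at V_DDL, assuming delay is proportional to V_DD / (V_DD - V_th)^2."""
    return delay_h * (vddl / vddh) * ((vddh - vth) / (vddl - vth)) ** 2


def worst_slack(fanins, delay, outputs, required):
    """Longest-path arrival times (primary inputs arrive at t = 0), then min slack."""
    arrival = {}

    def at(cell):
        if cell not in arrival:
            latest_input = max((at(f) for f in fanins[cell]), default=0.0)
            arrival[cell] = latest_input + delay[cell]
        return arrival[cell]

    return min(required - at(out) for out in outputs)


def can_replace(cell, fanins, delay, outputs, required, vddh, vddl, vth):
    """Trial replacement: rescale one cell's delay and re-run the timing check."""
    trial = dict(delay)
    trial[cell] = scaled_delay(delay[cell], vddh, vddl, vth)
    return worst_slack(fanins, trial, outputs, required) >= 0.0


# Hypothetical chain G5 -> G8 -> primary output, delays in ns.
fanins = {"G5": [], "G8": ["G5"]}
delay = {"G5": 1.0, "G8": 1.0}
VDDH, VDDL, VTH, REQUIRED = 5.0, 3.0, 0.8, 4.0

g8_ok = can_replace("G8", fanins, delay, ["G8"], REQUIRED, VDDH, VDDL, VTH)
if g8_ok:
    delay["G8"] = scaled_delay(delay["G8"], VDDH, VDDL, VTH)
g5_ok = can_replace("G5", fanins, delay, ["G8"], REQUIRED, VDDH, VDDL, VTH)
```

Under these assumed numbers G8 can move to V_DDL but G5 cannot, because scaling both cells would push the path past the required time. This is the situation in which a cell is marked unreplaceable.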
Although G8 is replaced in the traversal, G9 is not visited yet, so we need to try the replacement at G9. Moreover, if G9 has multiple fanouts, we need to try to replace the V_DDH cells at all of the fanouts with V_DDL cells.

When trying to replace cells with multiple fanouts, we perform a forward traversal using DFS toward the primary outputs to check if the child cells can be replaced with V_DDL cells. It is not until every child checked in the procedure is replaced that G5 is replaced with a V_DDL cell.

Eventually, the CVS structure shown in Fig. 2.3 is completed. The V_DDH cells and the V_DDL cells are each clustered.

C. Heuristics

A couple of heuristics are provided: one that favors larger load capacitance and "largerSlack". They determine the order of the candidates in DFS. For example, we need to determine the order of the primary outputs o1, o2, o3 and o4 when initiating DFS. If we choose the capacitance-based heuristic, the nodes o1-o4 are arranged in descending order of their load capacitances, such as {o1, o3, o2, o4} (o1 has the largest capacitance). The backward DFS is initiated at o1, followed by o3, o2 and o4. If we choose the heuristic "largerSlack", the nodes are arranged in descending order of their slacks. We also use these heuristics to order choices inside the DFS.

3. Experimental Results and Discussions

3.1 Design Examples

We implemented the algorithms and the heuristics described in Section 2 in our tool "Power Slimmer". It has been run on some design examples of control logic in a superscalar microprocessor, Torch [5]. The outline of the examples is shown in Table 3.1. The modules "AdeeSBI" and "BdecSE1" are decode & control logic. Only combinational circuits are extracted from the original circuits.
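The traversal described in Section 2.3, including the fanout rule and the candidate ordering, can be sketched roughly as follows. This is one possible reading and not the Power Slimmer implementation. The function name `cvs_cluster`, the single delay `factor` applied to every cell, the `priority` table (standing in for load capacitance or slack) and the tiny netlist are all assumptions. The sketch prunes only the subtree below a failed cell and treats an arrival time no later than the required time as meeting timing.

```python
# Illustrative sketch of a fanout-aware backward DFS; all names are hypothetical.

def cvs_cluster(fanins, delay, outputs, required, factor, priority):
    """Return the set of cells moved to V_DDL.

    fanins   : cell -> list of fanin cells (primary inputs are omitted)
    outputs  : cells that drive primary outputs
    factor   : assumed delay multiplier for a cell moved from V_DDH to V_DDL
    priority : cell -> number used to order DFS candidates (larger first)
    """
    fanouts = {cell: [] for cell in fanins}
    for cell, ins in fanins.items():
        for f in ins:
            fanouts[f].append(cell)

    low, blocked, visited = set(), set(), set()

    def timing_ok(trial_low):
        arrival = {}

        def at(cell):
            if cell not in arrival:
                d = delay[cell] * (factor if cell in trial_low else 1.0)
                arrival[cell] = d + max((at(f) for f in fanins[cell]), default=0.0)
            return arrival[cell]

        return all(at(out) <= required for out in outputs)

    def try_lower(cell):
        if cell in low:
            return True
        if cell in blocked:
            return False
        # Forward check: every fanout must already be, or become, a V_DDL cell.
        children_ok = all(try_lower(child) for child in fanouts[cell])
        if children_ok and timing_ok(low | {cell}):
            low.add(cell)
            return True
        blocked.add(cell)  # marked unreplaceable
        return False

    def visit(cell):
        if cell in visited:
            return
        visited.add(cell)
        if try_lower(cell):
            # Continue backward, visiting higher-priority fanins first.
            for f in sorted(fanins[cell], key=priority.get, reverse=True):
                visit(f)

    for out in sorted(outputs, key=priority.get, reverse=True):
        visit(out)
    return low


# Hypothetical netlist: G5 fans out to G8 and G9, which drive primary outputs.
fanins = {"G5": [], "G8": ["G5"], "G9": ["G5"]}
delay = {"G5": 1.0, "G8": 1.0, "G9": 1.0}
priority = {"G5": 2.0, "G8": 3.0, "G9": 1.0}
low_cells = cvs_cluster(fanins, delay, ["G8", "G9"], required=3.5,
                        factor=2.0, priority=priority)
```

Because a cell joins the V_DDL set only after all of its fanouts have joined, no V_DDL cell ever drives a V_DDH cell, which is the property the clustered structure relies on. Swapping the `priority` table between capacitance-like and slack-like values changes only the visiting order.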
Table 3.1 Outline of Design Examples (rows: # of inputs, # of outputs, # of cells; one column per module)

3.2 Libraries

We tried using two standard cell libraries, a "Regular Lib" and a "Fine Lib", as shown in Table 3.2. The Regular Lib was actually used in the design of the Torch microprocessor; its minimum cell has transistors of Wp / Wn = 20 µm / 10 µm in 0.8 µm CMOS. It includes NANDs, NORs, ANDs and ORs with up to 4 inputs and inverters, together with XORs, AOIs and OAIs. The Fine Lib has been prepared for the experiment; its minimum cell has transistors of Wp / Wn = 5 µm / 2.5 µm, which is close to the minimum transistor width. Furthermore, the Fine Lib has more variations of transistor sizes than the Regular Lib.

Table 3.2 Regular Lib and Fine Lib (rows: variations of transistor size, # of cells)

3.4 Design Flow

The design flow is shown in Fig. 3.1.

Fig. 3.1 Design Flow
