Published by: Cskarthi Keyan on Jul 03, 2012
Copyright:Attribution Non-commercial

356 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 18, NO. 3, MARCH 2010
Reducing SRAM Power Using Fine-Grained Wordline Pulsewidth Control
Mohamed H. Abu-Rahma, Member, IEEE, Mohab Anis, Member, IEEE, and Sei Seung Yoon
Abstract—Embedded SRAM dominates modern SoCs, and there is a strong demand for SRAM with lower power consumption while achieving high performance and high density. However, the large increase of process variations in advanced CMOS technologies is considered one of the biggest challenges for SRAM designers. In the presence of large process variations, SRAMs are expected to consume larger power to ensure correct read operations and meet yield targets. In this paper, we propose a new architecture that significantly reduces the array switching power for SRAM. The proposed architecture combines built-in self-test and digitally controlled delay elements to reduce the wordline pulsewidth for memories while ensuring correct read operations, hence reducing the switching power. Monte Carlo simulations using a 1-Mb SRAM macro in an industrial 45-nm technology are used to verify the power saving for the proposed architecture. For a 48-Mb memory density, a 27% reduction in array switching power can be achieved for a read access yield target of 95%. In addition, the proposed system can provide larger power saving as process variations increase, which makes it an attractive solution for 45-nm-and-below technologies.
Index Terms—Built-in self test (BIST), low power, random variations, SRAM, statistical, statistical yield estimation.
I. INTRODUCTION

WITH TECHNOLOGY scaling, the requirements of higher density and lower power embedded SRAMs are increasing exponentially. It is expected that more than 90% of the die area in future systems-on-chip (SoCs) will be occupied by SRAM [1]. This is driven by the high demand for low-power mobile systems, which integrate a wide range of functionality such as digital cameras, 3-D graphics, MP3 players, and other applications. In the meantime, random variations are increasing significantly with technology scaling. Random dopant fluctuation (RDF) is the dominant source of random variation in the bit cell's transistors. The variations in threshold voltage ($V_{th}$) due to RDF are inversely proportional to the square root of device area [2]. Therefore, SRAM bit cells experience the largest random variations on a chip, as bit-cell transistors are typically the smallest devices for the given design rules [3]–[6].

Embedded SRAMs usually dominate the SoC silicon area, and their power consumption (both dynamic and static) is a considerable portion of the total power consumption of an SoC.
Manuscript received May 27, 2008; revised November 19, 2008. First published June 10, 2009; current version published February 24, 2010.
M. H. Abu-Rahma and S. S. Yoon are with Qualcomm Incorporated, San Diego, CA 92121 USA (e-mail: marahma@qualcomm.com).
M. Anis is with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada (e-mail: manis@vlsi.uwaterloo.ca).
Digital Object Identifier 10.1109/TVLSI.2009.2012511

Fig. 1. Memory read power and bitline differential versus $T_{WL}$ for a 512-kb memory in 65-nm technology.
Moreover, the SRAM yield can dominate the overall chip yield. Hence, statistical design margining techniques are used to guarantee high memory yield. However, achieving high yield negatively impacts memory power consumption (and speed). The stringent requirements of high yield and low power consumption require combining circuit and architectural techniques to reduce SRAM power consumption.

SRAM array switching power consumption is considered one of the largest components of power in high-density memories [7], [8]. This is mainly because of the large memory arrays and the requirements for high area efficiency, which force SRAM designers to use the maximum numbers of rows and columns enabled by the technology. Fig. 1 shows the dynamic power consumption for read operations versus wordline (WL) pulsewidth $T_{WL}$ for a 512-kb memory macro in an industrial 65-nm technology. Power consumption results are extrapolated to $T_{WL} = 0$ to estimate the component of switching power due to the peripheral circuits. For normal operating conditions, array power consumption is more than 60% of read power. Therefore, it is important to reduce the array switching power due to its strong impact on the memory's total power as well as the SoC's power.

Several circuit techniques have been proposed to reduce the SRAM array switching power consumption by reducing the WL pulsewidth. One of the most common techniques to control $T_{WL}$ is using a bit-cell replica path, which reduces the bitline differential, hence lowering power consumption [4], [6], [9]–[11]. Replica-path (e.g., self-timed) techniques provide a simple approach of process tracking for global variations (interdie or systematic within die) as well as environmental variations (voltage and temperature). However, these circuit techniques are not efficient when memory bit cells experience large random variation, since circuit techniques cannot adapt to random variations. Therefore, their effectiveness decreases with process scaling, and larger design margins are used, which increases power consumption due to larger $T_{WL}$. To reduce the loss due to excessive margining, circuits and architectures must be designed together to reduce power and manage variability. Higher levels of design abstraction can have better variation-tolerance capabilities because the impact of random variation can be measured at that level. Therefore, combining architecture techniques with circuit-level designs can reduce the pessimism in using worst case approaches and can help adapt the circuit to random variations, which can reduce power consumption [3], [12], [13].

In this paper, we propose a new architecture that reduces SRAM switching power consumption by using fine-grained WL pulsewidth control. The proposed solution combines a memory built-in self-test (BIST) with additional logic to reduce the switching power consumption for the memory. The proposed architecture helps recover the switching power consumed due to the worst case assumption used in SRAM design to achieve a high yield. The proposed architecture utilizes infrastructure which is already available in SoC with very low area overhead.

The rest of this paper is organized as follows. In Section II, we derive statistical models for memory read access yield and read power consumption, which show the tradeoff between these metrics. In Section III, we describe the proposed system and its operation. In Section IV, we present the statistical simulation flow and the power savings using the proposed system when applied for memories in an industrial 45-nm technology. In addition, we discuss some design considerations related to the proposed system. In Section V, we present our conclusions.

II. YIELD AND POWER TRADEOFF
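Due to the random variation in SRAM bit cells, there is a tight coupling between memory yield and power consumption. To achieve high yield, read access failures should be minimized. Read access yield¹ is defined as the probability of a correct read operation. In a read operation, the selected WL is activated for a period of time to allow the bitlines to discharge. The WL activation time ($T_{WL}$) is a critical parameter for memory design since it affects the memory speed (access time) as well as memory power. To reduce read access failures, the WL pulse should be large enough to guarantee adequate bitline differential, which can be sensed correctly using the sense amplifier (SA).

¹In this paper, we use the term yield to refer to read access yield.

Fig. 2. Typical SRAM architecture.

The total power consumption for a memory in a read or write cycle can be expressed as

$$P_{total} = P_{leakage} + P_{array} + P_{periph} \qquad (1)$$

where $P_{leakage}$ is the total leakage power from the array and the peripheral circuitry, and $P_{array}$ and $P_{periph}$ are the switching powers from the array and the peripheral circuitry, respectively (as shown in Fig. 2).

In a read access, the array switching power can be calculated as

$$P_{array} = N_{col} \, N_{row} \, C_{BL} \, \Delta V_{BL} \, V_{DD} \, f \qquad (2)$$

where $N_{col}$ and $N_{row}$ are the number of bitlines and WLs in a memory bank, respectively. $C_{BL}$ is the bitline capacitance per bit cell, $\Delta V_{BL}$ is the bitline differential in read access (used to sense the bit cell's stored value), $V_{DD}$ is the supply voltage, and $f$ is the operating frequency. $\Delta V_{BL}$ can be calculated as

$$\Delta V_{BL} = \begin{cases} \dfrac{I_{read} \, T_{WL}}{N_{row} \, C_{BL}}, & \text{for } T_{WL} < \dfrac{N_{row} \, C_{BL} \, V_{DD}}{I_{read}} \\[2ex] V_{DD}, & \text{for } T_{WL} \geq \dfrac{N_{row} \, C_{BL} \, V_{DD}}{I_{read}} \end{cases} \qquad (3)$$

where $I_{read}$ is the bit-cell read current. To a first order, $\Delta V_{BL}$ can be approximated by assuming linear dependence on $T_{WL}$, for the range of $T_{WL}$ where $\Delta V_{BL} < V_{DD}$, as shown in Fig. 1. Therefore, from (2) and (3), the array switching power can be computed as

$$P_{array} = \begin{cases} N_{col} \, I_{read} \, T_{WL} \, V_{DD} \, f, & \text{for } T_{WL} < \dfrac{N_{row} \, C_{BL} \, V_{DD}}{I_{read}} \\[2ex] N_{col} \, N_{row} \, C_{BL} \, V_{DD}^{2} \, f, & \text{for } T_{WL} \geq \dfrac{N_{row} \, C_{BL} \, V_{DD}}{I_{read}} \end{cases} \qquad (4)$$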
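The power model of this section can be sketched numerically. The Python snippet below is a minimal illustration and not the authors' code: it assumes a bitline differential that grows linearly with the WL pulsewidth until the bitline is fully discharged, and an array switching power equal to the product of the number of bitlines, the total bitline capacitance, the bitline differential, the supply voltage, and the frequency. The function names and every numeric value (256 x 256 bank, 1 fF per cell, 20 uA read current, 500 MHz) are hypothetical and chosen only to show the trend.

```python
def bitline_differential(t_wl, i_read, n_row, c_bl, v_dd):
    """Bitline differential in volts for a WL pulsewidth t_wl (seconds).

    Assumed first-order model: the bit cell discharges the total bitline
    capacitance (n_row * c_bl) with a constant current i_read, and the
    differential is clamped at v_dd once the bitline is fully discharged.
    """
    if t_wl < 0:
        raise ValueError("t_wl must be non-negative")
    return min(i_read * t_wl / (n_row * c_bl), v_dd)


def array_switching_power(t_wl, i_read, n_col, n_row, c_bl, v_dd, freq):
    """Array switching power in watts for one read access per cycle."""
    dv_bl = bitline_differential(t_wl, i_read, n_row, c_bl, v_dd)
    return n_col * n_row * c_bl * dv_bl * v_dd * freq


if __name__ == "__main__":
    # Hypothetical bank parameters, for illustration only.
    params = dict(i_read=20e-6, n_col=256, n_row=256,
                  c_bl=1e-15, v_dd=1.0, freq=500e6)
    for t_wl in (0.5e-9, 1e-9, 2e-9, 20e-9):
        p = array_switching_power(t_wl, **params)
        print(f"T_WL = {t_wl * 1e9:5.1f} ns -> P_array = {p * 1e3:.3f} mW")
```

Under these assumptions, doubling the pulsewidth doubles the array power as long as the bitline has not fully discharged, which is why trimming the pulsewidth translates directly into switching power savings.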
 
From (4), it is clear that $P_{array}$ is directly proportional to $T_{WL}$ (when $\Delta V_{BL} < V_{DD}$), which is confirmed by the read power results shown in Fig. 1.

A correct read operation requires $\Delta V_{BL}$ to be large enough to guarantee a correct sensing using the SA. Hence, a large $\Delta V_{BL}$ implies having a sufficiently large $T_{WL}$ that enables weak bit cells (with low $I_{read}$) to be correctly sensed for a given yield requirement. Increasing $T_{WL}$ increases the read access yield; however, at the same time, increasing $T_{WL}$ increases power consumption, as shown in (4). Moreover, $T_{WL}$ has a direct impact on a memory's access time [4], [14]. Therefore, $T_{WL}$ is usually set to ensure a correct read operation for a given read access yield requirement and memory density.

The bit-cell read current is strongly affected by the random variations in the bit-cell access device (pass gate) as well as the pull-down device. Due to these variations, $I_{read}$ has been shown to follow a Normal distribution with a mean of $\mu_{I_{read}}$ and a standard deviation of $\sigma_{I_{read}}$ [4]–[6], [14], [15]. Moreover, recent measurement results for a 1-Mb memory have confirmed that $I_{read}$ indeed follows a Normal distribution up to … for the mean value [16].

It is worth noting that, with technology/supply scaling and with the increase of variation, the device operation region may change from strong to moderate inversion. In the extreme case of the device working in the subthreshold region, the $I_{read}$ distribution will be lognormal due to the exponential dependence on $V_{th}$ [17].

Random within-die (WID) variations also have a strong impact on the SA offset [4], [14], [18]–[20], causing SAs to show input offset voltages that affect the accuracy of the read operation. In addition, systematic variations due to asymmetric layout can increase the SA input offset; that is why highly symmetric layouts are typically used for SA [4], [14]. Moreover, due to the small differential signal developed on the SA inputs, an aggressor located near the SA may couple a large noise at the SA input, which can affect the accuracy of the read operation. Nevertheless, by using layout noise shielding techniques and highly symmetric SA layout styles, the impact of this component can be minimized.

Typically, the SA input offset due to random variations can be modeled using a Normal distribution [19]. However, due to the complexity of deriving a closed-form yield relation that accounts for both $I_{read}$ and the SA offset statistically, we treat the SA input offset using a worst case approach rather than statistically, as was done in [4], [5], and [15].

To guarantee a correct read operation, and by using a statistical treatment for $I_{read}$ and a worst case approach for the SA offset, the following condition should be satisfied [4], [5], [15]:

$$\frac{\mu_{I_{read}} \left(1 - N_{\sigma} \dfrac{\sigma_{I_{read}}}{\mu_{I_{read}}}\right) T_{WL}}{N_{row} \, C_{BL}} \geq \Delta V_{BLmin} \qquad (5)$$

where $\Delta V_{BLmin}$ is the minimum required bitline differential voltage, which is a function of the SA input offset and its immunity against coupling noise (typically …). $\mu_{I_{read}}$ is the mean bit-cell read current, and $\sigma_{I_{read}}/\mu_{I_{read}}$ is the relative variation in $I_{read}$. $N_{\sigma}$ is the required design coverage, which is related to the target yield and the memory density [4], [15], and can be computed as

$$N_{\sigma} = \Phi^{-1}\left(Y^{1/N_{bits}}\right) \qquad (6)$$

where $\Phi^{-1}$ is the inverse standard Normal cumulative distribution function, $Y$ is the memory read access yield target, and $N_{bits}$ is the total number of bit cells in the memory. For example, for a 1-Mb memory, if the target read access yield is 95%, then the required design coverage is … . Therefore, to achieve the same yield for a large memory density, $N_{\sigma}$ should be increased. From (5), this means that a larger $T_{WL}$ is required.

It is important to note that the relation between $T_{WL}$ and $I_{read}$ is nonlinear as shown in (3) and (5). In fact, assuming that $I_{read}$ is a Normal distribution, the probability density function (PDF) of $T_{WL}$ can be calculated using one-to-one mapping from (5) as follows [21] (details of the derivation are provided in Appendix A):

$$f_{T_{WL}}(t) = \frac{N_{row} \, C_{BL} \, \Delta V_{BLmin}}{t^{2}} \, f_{I_{read}}\left(\frac{N_{row} \, C_{BL} \, \Delta V_{BLmin}}{t}\right), \quad \text{for } t > 0 \qquad (7)$$

where $f_{T_{WL}}$ is the PDF for $T_{WL}$ and $f_{I_{read}}$ is the PDF for $I_{read}$, which is a Normal distribution.

Figs. 3 and 4 show the distributions of both bit cell $I_{read}$ and $T_{WL}$. Note that the $T_{WL}$ PDF is not symmetric, but instead, it is skewed to larger values. Moreover, the 3, 4, and 5$\sigma$ values are shown for $I_{read}$ and the corresponding $T_{WL}$ values for these $\sigma$'s. It is clear that $T_{WL}$ is very sensitive to $I_{read}$ variations. For 5$\sigma$, $T_{WL}$ increases four times compared with its nominal value (calculated using $\mu_{I_{read}}$).

Fig. 3. $I_{read}$ PDF showing the 3, 4, and 5$\sigma$ points which correspond to different memory yield targets (assumed $\sigma_{I_{read}}/\mu_{I_{read}} = 15\%$).

Because of the skewed $T_{WL}$ distribution, large values of $T_{WL}$ are required to ensure an acceptable read access yield. Moreover, to achieve the same yield as the memory size increases, a higher coverage $N_{\sigma}$ is required as shown in (6), which significantly increases $T_{WL}$ (due to the nonlinear relation between $T_{WL}$ and $I_{read}$). Therefore, due to statistical variations in the bit cell, …
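The link between yield target, memory density, and pulsewidth margin can be checked with a short calculation. The sketch below is illustrative only and rests on two assumptions that are readings of this section rather than quoted formulas: every bit cell must pass independently, so the per-cell pass probability is the yield target raised to the power 1/(number of bit cells) and the design coverage is its inverse standard Normal CDF; and the required pulsewidth scales as 1/(1 - coverage x relative read-current variation) relative to its nominal value. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def design_coverage(yield_target, n_bits):
    """Number of sigmas each bit cell must tolerate to meet yield_target.

    Assumes independent bit cells, so the per-cell failure probability is
    q = 1 - yield_target ** (1 / n_bits); expm1/log keeps q accurate when
    it is tiny. By symmetry of the Normal CDF, inv_cdf(1 - q) = -inv_cdf(q).
    """
    if not 0.0 < yield_target < 1.0 or n_bits < 1:
        raise ValueError("need 0 < yield_target < 1 and n_bits >= 1")
    q = -math.expm1(math.log(yield_target) / n_bits)
    return -NormalDist().inv_cdf(q)


def t_wl_multiplier(n_sigma, rel_sigma):
    """Required pulsewidth relative to nominal (mean read current).

    Assumes the weakest covered cell has a read current of
    mean * (1 - n_sigma * rel_sigma) and that pulsewidth is inversely
    proportional to read current.
    """
    weakest = 1.0 - n_sigma * rel_sigma
    if weakest <= 0.0:
        raise ValueError("coverage too large for this variation level")
    return 1.0 / weakest


if __name__ == "__main__":
    for megabits in (1, 48):
        n_sigma = design_coverage(0.95, megabits * 2**20)
        print(f"{megabits:2d} Mb at 95% yield -> coverage = {n_sigma:.2f} sigma")
    for n_sigma in (3, 4, 5):
        print(f"{n_sigma} sigma, 15% variation -> "
              f"T_WL x {t_wl_multiplier(n_sigma, 0.15):.2f}")
```

With a 15% relative variation, the 5-sigma case gives a multiplier of 4, matching the fourfold increase quoted above. The coverage for a 1-Mb memory at 95% yield comes out between 5 and 5.5 sigma under these assumptions, and it grows with memory density.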
