Simulation and Calculations

Graph 1: W1, R1, W0 at TT, 2.
5V
(nominal), 27C
Graph 2: W1, R1, W0 at TT, 5.5V, 50C
Graph 3: W1, R1, W0 at TT, 5.5V, 27C
Graph 4: W1, R1, W0 at TT, 5.5V, 0C
Graph 5: W1, R1, W0 at FF, 5.5V, 50C
Graph 6: W1, R1, W0 at FF, 5.5V, 27C
Graph 7: W1, R1, W0 at FF, 5.5V, 0C
Graph 8: W1, R1, W0 at SS, 5.5V, 50C
Graph 9: W1, R1, W0 at SS, 5.5V, 27C
Graph 10: W1, R1, W0 at SS, 5.5V, 0C
Graph 11: W1, R1, W0 at SF, 5.5V,

50C
Graph 12: W1, R1, W0 at SF, 5.5V,

27C
Graph 13: W1, R1, W0 at SF, 5.5V, 0C
Graph 14: W1, R1, W0 at FS, 5.5V,

50C
Graph 15: W1, R1, W0 at FS, 5.5V,

27C
Graph 16: W1, R1, W0 at FS, 5.5V, 0C
Graph 17: W1, R1, W0 at TT, 5V, 50C
Graph 20: W1, R1, W0 at FF, 5V, 50C
Graph 23: W1, R1, W0 at SS, 5V, 50C
Graph 26: W1, R1, W0 at SF, 5V, 50C
Graph 29: W1, R1, W0 at FS, 5V, 50C
Graph 32: W1, R1, W0 at TT, 4.5V,

50C
Graph 33: W1, R1, W0 at TT, 4.5V,

27C
Graph 34: W1, R1, W0 at TT, 4.5V, 0C
Graph 35: W1, R1, W0 at FF, 4.5V,

50C
Graph 36: W1, R1, W0 at FF, 4.5V,

27C
Graph 37: W1, R1, W0 at FF, 4.5V, 0C
Graph 38: W1, R1, W0 at SS, 4.5V,

50C
Graph 39: W1, R1, W0 at SS, 4.5V,

27C
Graph 40: W1, R1, W0 at SS, 4.5V, 0C
Graph 41: W1, R1, W0 at SF, 4.5V,

50C
Graph 42: W1, R1, W0 at SF, 4.5V,

27C
Graph 43: W1, R1, W0 at SF, 4.5V, 0C
Graph 44: W1, R1, W0 at FS, 4.5V,

50C
Graph 45: W1, R1, W0 at FS, 4.5V,

27C
Graph 46: W1, R1, W0 at FS, 4.5V, 0C
Figures:
Figure A: Average Write Power
Figure B: Write 0 Simulation
Figure C: Average Read Power
Figure D: Read 1 Simulation
Figure E: Average Power 1:2 Decoder in Model

Array
Figure F: Average Power 8:256 Decoder in

SRAM
Figure G: Metric Breakdown
Delivery Item
Value
Metric (mWatts2*ns*mm2)
179633.89
Bitcell area (um2)
.000498
Total area (mm2)
.524
Read power (mW) (for a 32 bit read)
(2.288 * 32) = 73.22
Write power (mW) (for a 32 bit write)
(1.819 * 32) = 58.21
Total power (mW) average of 5 reads

for each write operation
68.916
Read delay (ns)
72.18
Write delay (ns)
38.39
Total delay (ns)
72.18
Figure H: Bit Cell Layout
Figure I: Bit Cell Layout Array
Calculations:
Calculation 1: Modeled Resistance and Capacitance on
Array Model
(Based capacitance/length and resistance/length on table of pg 144 to model BL and WL capacitances/resistances)

>Capacitance/length = 95aF/um
From layout, WL length is 27.9um for one bit cell.
For 1024 bit cells (number of bit cells across in SRAM array): 27.9um * 1024 = 28569.6um
From layout, BL length is 5.4um for one bit cell
For 1024 bit cells (number of bit in a column in SRAM array): 5.4um * 1024 = 5529.6um
Modeled Capacitance:
WL: 28569.6um * 95aF/um = 2.714 pF
BL: 5529.6um * 95aF/um = .525pF
>Resistance equation is R = R (L / W) where R = 1 and W (for BL and WL) are 1.2um

Modeled Resistance:
WL: .1*(28569.7um) / 1.2um = 2380.8
BL: .1 *(5529.6um) / 1.2um = 460.8
Calculation 2: Metric Breakdown Calculations
Bit Cell Area:
27.9um x 17.85um = .000498um2
Bit Cell Area for Entire SRAM (1024 x 1024 bit cells):
28569.6um x 18278.4um = .522mm2
(this does not include periphery)
For periphery:
Take average area of transistor (83pm2) and multiply by approximate amount of transistors added for periphery (see Calculations
4)
83pm2 * (10752 + 2058 + 1152 + 2048 + 10752) = 2.22um 2
-> assuming periphery adds about 2.22um2 , total area is .524mm2
Power:
Write power: 1.819mW
array)
Read Power: 2.288mW (both these average power values for read/write to one bit cell in a whole
Total power is average of 5 reads for every write:

(1.819mW + 2.228mW (5)) / 6 = 2.159mW
Subtract the decoder power for the 1:2 decoder used in the modeled array because the actual SRAM uses a 8:256 decoder
(55.02uW)
2.159mW 55.02uW = 2.104mW
Then, for a full 32 bit word read/write, this power is multiplied times 32:
2.104mW * 32 = 67.33mW
Add in the 8:256 decoder power used in the actual SRAM (1.586mW)
67.33mW + 1.586mW = 68.916mW
Delay:
Worst case read delay: 499.06ns 428.79ns - .694ns (delay of 1:2 decode) + 2.6ns (delay of 8:256 decode) = 72.18ns slowest delay
Worst case write delay: 403.5ns 367.01ns -.694ns (delay of 1:2 decode) + 2.6ns (delay of 8:256 decode) = 38.39ns
Metric calculation:
(Area)*(Power)*(Delay) = (mm2) * (mW2) * (nanoseconds) = (.524)*(68.916^2)*(72.18) = 179633.89
Calculation 3: Bit Cell Ratio
For Read
For Write
Calculation 4: Area Savings

By using connected bit lines and bit line bars we drastically reduced the power of the SRAM by reducing the number of
transistors needed, saving area.
By having a MUX that feeds to the output from each of the four block columns as opposed to each of the 16 blocks, we were
able to use:
6 transistors per MUX
7 MUXs across x 32 MUXs for a 8 32 input MUX
2 8 32 input MUXs per block column
4 Block columns per SRAM
6 x 7 x 32 x 2 x 4 = 10752 transistors/SRAM
As opposed to:
6 x 7 x 32 x 2 x 16(each block) = 43008 transistors/SRAM
Saving 32256 transistors
Pre-charging only the block columns as opposed to each block allowed us to use only:
256 transistors per bit line and bit line bar
4 per block column
256 x 2 x 4 = 2048 transistors/SRAM
As opposed to:
256 x 2 x 16(each block) = 8192 transistors/SRAM
We also saved area by using Sense Amps for each block column as opposed to each word of each block
9 transistors per Sense Amp
32 Sense Amps per block column
4 block columns per SRAM
As opposed to using a sense amp for each word in a block in each of the 16 blocks:
9 x 32 x 8(words/block) x 16(each block) = 36864 transistors/SRAM
The data at the bottom of the SRAM cell that enters into each block column as opposed to each block uses:
256 transistors per bit line and bit line bar
4 block columns per SRAM
As opposed to:
256 x 2 x 16(each block) = 8192 transistors/SRAM
Calculation 5: Clock Buffer
The way the clock buffer was sized was an approximation of the number of stages it was driving. Here is a list of the stages that
the CLK signal drives:
Hierarchical Pre-charging : 512
Decoder Enable Logic: 2 ANDs per x 4 block columns
Block Select Enable: 4 ANDs
Input Register: 2ANDs per input x 32 inputs
Output Register: 2 ANDs per output x 32 outputs
512 + 2x4 + 4 + 2x32 + 2x32 = 652 stages
Using this as our primary metric for buffer sizing, assuming an FO4 to obtain the minimum delay, the optimal number of stages
was 4. We used 4 inverters, sized 4x larger than the previous, which was ultimately used to drive all 652 of the previously
mentioned stages. By using hierarchical pre-charging, we were able to allow charging for only 512 transistors as opposed to
2048 for every bit line and bit line bar.
*This calculation was under the assumption that the inputs are ideal if driving less than 512 stages. The CLK signal was the only
one of our signals that drove more than this number.
Calculation 6: VDD, Clock Sensitivity, and Shorted Bit

Line Tradeoffs
These calculations analyze some of the power and delay tradeoffs when lowering VDD and the clock period. The
end of this section also details the delay for a read using shorted bit lines versus non shorted bit lines. These
simulations were for a read and write to ONE bit cell in a modeled array because of the difficulty in simulating an
entire 32 bit read/write.
Originally, the simulation for read and write was at a minimum clock period for functionality was the following
(1) VDD=5V and Clock period of 280ns, pulse width 80ns, and transient simulation time of 560ns
The average power of the read was 14.1mW and write was 15.56mW
Considering the total power to be 5 reads for every write, the total power for one bit cell is roughly
(15.56mW + 14.1mW (5)) / 6 =
14.3mW
(2) To reduce this power, we tried to reduce VDD to 2.5V and kept the same clocking as (1)
The average power was REDUCED for a read to 2.288mW and write 1.819mW
This is about 6 times less power for read, and 8.5 times less power for a write
At this VDD and clock, the read delay is 72.18ns and write delay is 38.39ns
Considering the total power to be 5 reads for every write, the total power for one bit cell is roughly
(1.819mW + 2.228mW (5)) / 6 =
2.159mW
---Can the power be reduced further by altering the clock?

(3) Try to reduce the power further by keeping VDD at 2.5V and changing Clock period to 280us, pulse width 80us,
and transient simulation time of 560us (now time is on MICRO-second scale)
The average power is REDUCED for a read to 108.7uW and INCREASED significantly for a write
to 21.48mW
Compared to (2), this is about 21 times less power for a read, but almost 12 times more power
for a write
The delay for read and write is almost approximately the same as (2) so there is significant
tradeoff for delay
How does this compare for total power to (2)? Considering the total power to be 5 reads for every
write, the total power is roughly
(21.48mW+ 108.7uW (5)) / 6 =
3.671mW which is GREATER than (2), therefore (2) is a better choice
for low power
Going a step further, maybe increasing VDD back to 5V, but keeping the Clocking the same as (3) will lower power:
(4) VDD at 5V and Clock period to 280us, pulse width 80us, and transient simulation time of 560us (now time is on
MICRO-second scale)
The average power is significantly INCREASED for a read to 1.924mW and INCREASED
significantly for a write to 440.2mW
Compared to (3), this is about 17 times more power for a read and 20.5 times more power for
a write
Delay is not much different than (3), so once again, no significant tradeoff delay
Obviously, this is way more power and not a good option. To quantitatively verify, the total power for
(4) is:
(440.2mW+ 1.924mW (5)) / 6 =
74.97mW (MUCH greater than the total power for either (1), (2), or
(3))
CONCLUSION: Operating at (3) will give the lowest power option ( VDD at 2.5V and Clock period
280us, pulse width 80us, and transient simulation time of 560us). (3) was used for calculations in the
Metric Breakdown.
The following calculation demonstrates the read delay for our SRAM design, using the shorted BL and BLBs. This is
compared to the read delay for a non-shorted BL and BLB.
For the shorted BL/BLB, across 4 blocks the read delay is:
499.06ns - 428.78ns = 70.27ns
For the read across 1 block (i.e. not shorted bit lines), the delay is:
463.49ns - 428.78ns = 34.71ns
The comparison demonstrates that by shorting the BL/BLB for a block column (that is, reading the BL/BLB across 4
blocks), there is an increase in delay, but it is not 4 times the delay for a read to one block (which one might
expect). We sacrifice this delay to reduce the area of the SRAM significantly (detailed in Calculation 4) and reduce
the overall power metric.

Simulation and Calculations

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Simulation and Calculations

Uploaded by

Copyright:

Available Formats

Graph 1: W1, R1, W0 at TT, 2.

Graph 2: W1, R1, W0 at TT, 5.5V, 50C

Graph 3: W1, R1, W0 at TT, 5.5V, 27C

Graph 4: W1, R1, W0 at TT, 5.5V, 0C

Graph 5: W1, R1, W0 at FF, 5.5V, 50C

Graph 6: W1, R1, W0 at FF, 5.5V, 27C

Graph 7: W1, R1, W0 at FF, 5.5V, 0C

Graph 8: W1, R1, W0 at SS, 5.5V, 50C

Graph 9: W1, R1, W0 at SS, 5.5V, 27C

Graph 10: W1, R1, W0 at SS, 5.5V, 0C

Graph 11: W1, R1, W0 at SF, 5.5V,

Graph 12: W1, R1, W0 at SF, 5.5V,

Graph 13: W1, R1, W0 at SF, 5.5V, 0C

Graph 14: W1, R1, W0 at FS, 5.5V,

Graph 15: W1, R1, W0 at FS, 5.5V,

Graph 16: W1, R1, W0 at FS, 5.5V, 0C

Graph 17: W1, R1, W0 at TT, 5V, 50C

Graph 18: W1, R1, W0 at TT, 5V, 27C

Graph 19: W1, R1, W0 at TT, 5V, 0C

Graph 20: W1, R1, W0 at FF, 5V, 50C

Graph 21: W1, R1, W0 at FF, 5V, 27C

Graph 22: W1, R1, W0 at FF, 5V, 0C

Graph 23: W1, R1, W0 at SS, 5V, 50C

Graph 24: W1, R1, W0 at SS, 5V, 27C

Graph 25: W1, R1, W0 at SS, 5V, 0C

Graph 26: W1, R1, W0 at SF, 5V, 50C

Graph 27: W1, R1, W0 at SF, 5V, 27C

Graph 28: W1, R1, W0 at SF, 5V, 0C

Graph 29: W1, R1, W0 at FS, 5V, 50C

Graph 30: W1, R1, W0 at FS, 5V, 27C

Graph 31: W1, R1, W0 at FS, 5V, 0C

Graph 32: W1, R1, W0 at TT, 4.5V,

Graph 33: W1, R1, W0 at TT, 4.5V,

Graph 34: W1, R1, W0 at TT, 4.5V, 0C

Graph 35: W1, R1, W0 at FF, 4.5V,

Graph 36: W1, R1, W0 at FF, 4.5V,

Graph 37: W1, R1, W0 at FF, 4.5V, 0C

Graph 38: W1, R1, W0 at SS, 4.5V,

Graph 39: W1, R1, W0 at SS, 4.5V,

Graph 40: W1, R1, W0 at SS, 4.5V, 0C

Graph 41: W1, R1, W0 at SF, 4.5V,

Graph 42: W1, R1, W0 at SF, 4.5V,

Graph 43: W1, R1, W0 at SF, 4.5V, 0C

Graph 44: W1, R1, W0 at FS, 4.5V,

Graph 45: W1, R1, W0 at FS, 4.5V,

Graph 46: W1, R1, W0 at FS, 4.5V, 0C

Figure A: Average Write Power

Figure B: Write 0 Simulation

Figure C: Average Read Power

Figure D: Read 1 Simulation

Figure E: Average Power 1:2 Decoder in Model

Figure F: Average Power 8:256 Decoder in

Figure G: Metric Breakdown

Bitcell area (um2)

Total area (mm2)

Read power (mW) (for a 32 bit read)

(2.288 * 32) = 73.22

Write power (mW) (for a 32 bit write)

(1.819 * 32) = 58.21

Total power (mW) average of 5 reads

Read delay (ns)

Write delay (ns)

Total delay (ns)

Figure H: Bit Cell Layout

(Area)(Power)(Delay) = (mm2) * (mW2) * (nanoseconds) = (.524)(68.916^2)(72.18) = 179633.89