
Burst Mode Memories Improve Cache Design

Zwie Amitai, Product Planning and Applications Manager
David C. Wyland, Vice President of Engineering
Quality Semiconductor, Inc.
851 Martin Avenue, Santa Clara, CA 95050-2903
Tel: (408) 986-8326 Fax: (408) 496-0591

ABSTRACT

Burst mode memories improve cache design by improving refill time on cache misses. Burst mode RAMs allow refill of a four word cache line in five clock cycles at 50 MHz rather than the eight clock cycles that would be required for a conventional SRAM. Burst mode RAMs also have clock synchronous interfaces which make them easier to design into systems, particularly at clock rates of 25 MHz and above.

Figure 2: Burst RAM Read Timing

A burst mode RAM provides high speed transfer of a block of sequential words, called a burst. A block diagram of a burst mode SRAM is shown in Figure 1. A burst mode RAM consists of a conventional SRAM plus an address counter, a read/write flip flop and a write register. Read and write timing is controlled by a clock in combination with the address counter load and read/write signals. In this configuration, random access to a word in the SRAM requires two clock cycles, with successive words being read or written at one clock cycle per word. This is shown in the timing diagrams of Figures 2 and 3.

Figure 1: Burst RAM Block Diagram
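The cycle counts quoted in the abstract follow directly from this timing. The sketch below is a minimal illustration, not part of the original paper: the function names are hypothetical, and the two cycles per access for the conventional SRAM is an assumption inferred from the abstract's figure of eight cycles for four words.

```python
def burst_cycles(words: int, first_access: int = 2, per_word: int = 1) -> int:
    """Clock cycles to move `words` sequential words through a burst mode RAM.

    The first word takes `first_access` cycles (random access); each
    following word takes `per_word` cycles.
    """
    if words < 1:
        raise ValueError("words must be at least 1")
    return first_access + (words - 1) * per_word


def conventional_cycles(words: int, cycles_per_access: int = 2) -> int:
    """Clock cycles when every word needs a full random access (assumed 2 cycles)."""
    if words < 1:
        raise ValueError("words must be at least 1")
    return words * cycles_per_access


LINE_WORDS = 4  # four word cache line, as in the abstract
print(burst_cycles(LINE_WORDS))         # 5
print(conventional_cycles(LINE_WORDS))  # 8
```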

In the read timing diagram of Figure 2, the first clock cycle is used to load the address counter and the read/write flip flop for random access to the first word. Read data comes out of the SRAM before the end of the second clock cycle. The address counter is incremented at the end of the second clock cycle, and the next word is read from the SRAM. This allows one clock cycle per successive word read following the initial random access.

For write operations, the first word of data to be written is clocked into the write register at the same time the address counter and the read/write flip flop are loaded, as shown in Figure 3. Data from the write register is written into the SRAM during the second clock cycle. At the end of the second clock cycle, new data is clocked into the write register and the address counter is incremented to the next location to write the next sequential word.

Figure 3: Burst RAM Write Timing
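The read and write sequences can be mimicked with a small behavioral model driven one clock edge at a time. This is only a sketch under stated assumptions: the class, method and parameter names are hypothetical, the `advance` flag used to end a burst is an invention of this example (the text does not describe burst termination), and the model ignores all analog timing.

```python
class BurstRAM:
    """Behavioral sketch: SRAM + address counter + R/W flip flop + write register."""

    def __init__(self, depth: int):
        self.mem = [0] * depth
        self.counter = 0         # address counter
        self.write_mode = False  # read/write flip flop
        self.write_reg = 0       # write register
        self.active = False      # True while a burst is in progress
        self.edges = 0           # clock edges seen so far

    def clock(self, load=False, address=0, write=False, data_in=None, advance=True):
        """Apply one clock edge; return the word read in the cycle that just ended."""
        self.edges += 1
        read_out = None
        if self.active:
            if self.write_mode:
                # The write register was written into the SRAM during this cycle.
                self.mem[self.counter] = self.write_reg
            else:
                # Read data became valid before the end of this cycle.
                read_out = self.mem[self.counter]
        if load:
            self.counter = address % len(self.mem)
            self.write_mode = write
            self.active = True
        elif self.active and advance:
            self.counter = (self.counter + 1) % len(self.mem)
        else:
            self.active = False  # assumed way of ending the burst
        if data_in is not None:
            self.write_reg = data_in
        return read_out


def burst_write(ram, address, data):
    # First word is captured on the same edge that loads the counter.
    ram.clock(load=True, address=address, write=True, data_in=data[0])
    for word in data[1:]:
        ram.clock(data_in=word)
    ram.clock(advance=False)  # final edge completes the last write


def burst_read(ram, address, words):
    ram.clock(load=True, address=address, write=False)
    return [ram.clock(advance=(i < words - 1)) for i in range(words)]


ram = BurstRAM(16)
burst_write(ram, 8, [10, 20, 30, 40])
print(burst_read(ram, 8, 4))  # [10, 20, 30, 40]
print(ram.edges)              # 10: five edges per four word burst
```

In this model each four word burst takes five clock edges, matching the two cycle first access plus one cycle per following word described above.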


Authorized licensed use limited to: IEEE Xplore. Downloaded on April 8, 2009 at 11:36 from IEEE Xplore. Restrictions apply.

The burst mode memory is capable of high speed operation after the initial access because the sequential addresses are generated internally by the address counter. This greatly reduces the read and write cycle times for sequential data following the first access. In a burst mode SRAM, the minimum cycle time of the burst operation is approximately the same as the address access time of an equivalent SRAM. This can be as low as 20 ns. Clock speeds of up to 50 MHz are possible in a TTL system, making the burst mode memory particularly well suited to the newer generations of high speed RISC and CISC chips.

Burst mode RAMs are faster than SRAM based memory systems because the address counter is integrated into their design. In a conventional burst mode memory system design using an SRAM and an address counter, the minimum cycle time is determined by the sum of the clock to output delay of the counter plus the address access time of the SRAM. The cycle time is therefore increased by the delay of the address counter. This adds 6.2 ns to the memory cycle time using the QSFCT161A, one of the fastest counters commercially available. If a 20 ns SRAM is used, the minimum cycle is 26.2 ns. Alternately, a 13.8 ns SRAM would be required to achieve the 20 ns cycle time of a burst mode RAM.

CACHE MEMORY IN RISC AND CISC PROCESSORS

The use of cache memories has become a standard feature of high performance processor design. Indeed, RISC design is based on cache memory. The function of a cache memory is to improve the effective access time of the main memory, usually medium speed DRAM, by eliminating processor wait states. The cache does this by keeping copies of the most frequently read words from main memory in a small, high speed buffer memory. When the processor attempts to read a word from main memory, the cache checks to see if it has a copy. If it does, it responds immediately. If not, the main memory is started on a normal read cycle, and the processor waits for it to respond.

Caches are effective because, in typical programs, most of the memory accesses are read cycles from a relatively small cluster of memory locations. If the cache is larger than this cluster size, most of the read data will be provided by the cache. The cache therefore speeds up the system by reducing the average amount of time the processor has to wait to read a word from memory.

A cache hit is said to occur if a requested word is found in the cache. A miss occurs when the word is not found in the cache. Cache performance can be defined in terms of effective wait states with a cache relative to the number of wait states without it. A 33 MHz processor with medium speed DRAM memory may require three wait states without a cache and 0.5 wait states with a cache. The three wait states without a cache are determined by the timing requirements of the main memory. The 0.5 wait states is a statistical average. It can be estimated by the product of cache miss rate and the number of wait states required for cache refill on a miss.

A direct mapped cache for a 32-bit processor is shown in Figure 4. A direct mapped cache consists of a cache tag RAM, a cache data RAM and a small amount of logic to control events when a cache hit or a cache miss occurs. In the example shown, both the tag and data RAMs are 8K words deep. The cache stores copies of words read from main memory in the cache data RAM and stores the location these words are read from in the cache tag RAM. In the direct mapped cache, the least significant bits of the address bus are sent to both the tag and data RAMs, while the most significant bits are stored in the tag RAM when data is stored in the cache data RAM.

Figure 4: Cache Block Diagram

Direct mapped caches work because most accesses to main memory are typically to a small cluster of a few thousand words located somewhere in the memory space. The least significant bits of the address bus are used to index within this cluster of words, and the most significant bits identify the region of memory that they came from. (Cache theory is a little more subtle than this. It treats the least significant bits of the address as a hashing function for a hash indexed buffer.)

When a read request is made to main memory, the least significant bits of the address are used to select one of the 8K words in both memories. The most significant bits of the address are compared against the bits stored in the tag RAM. If there is a match between the two, then the data stored in the data RAM is a copy of the data at the requested location and can be immediately supplied to the processor. This is a cache hit. If the upper address bits do not match, the data stored came from a different location. This is a cache miss.
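The direct mapped lookup and the wait state estimate can be illustrated with a short model. This is a sketch under stated assumptions rather than the authors' design: the class and function names are hypothetical, main memory is modeled as a Python mapping from word address to data, an empty tag entry (`None`) stands in for whatever valid-bit mechanism a real tag RAM would use, and the 12.5% miss rate with a 4 wait state refill is an assumed pair of values chosen only because their product gives the 0.5 figure.

```python
class DirectMappedCache:
    """Direct mapped cache: low address bits index, high address bits are the tag."""

    def __init__(self, index_bits: int = 13):  # 2**13 = 8K words deep
        self.index_bits = index_bits
        depth = 1 << index_bits
        self.tags = [None] * depth  # cache tag RAM (None = nothing stored yet)
        self.data = [0] * depth     # cache data RAM
        self.hits = 0
        self.misses = 0

    def read(self, word_address: int, main_memory):
        index = word_address & ((1 << self.index_bits) - 1)  # least significant bits
        tag = word_address >> self.index_bits                # most significant bits
        if self.tags[index] == tag:
            self.hits += 1          # cache hit: supply data immediately
            return self.data[index]
        self.misses += 1            # cache miss: read main memory, refill the cache
        word = main_memory[word_address]
        self.tags[index] = tag
        self.data[index] = word
        return word


def effective_wait_states(miss_rate: float, refill_wait_states: float) -> float:
    """Average wait states per access: miss rate times refill wait states."""
    return miss_rate * refill_wait_states


main_memory = {5: 111, 5 + 8192: 222}  # two addresses sharing one cache index
cache = DirectMappedCache()
cache.read(5, main_memory)             # miss, word is loaded
cache.read(5, main_memory)             # hit
cache.read(5 + 8192, main_memory)      # miss, same index but different tag
print(cache.hits, cache.misses)        # 1 2
print(effective_wait_states(0.125, 4)) # 0.5
```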

Cache Performance vs Reload Time

Cache performance is defined by miss rate and reload time. Miss rate is the percentage of accesses that miss, and reload time is the number of wait states required to get the data for the processor and reload the cache on a miss.

The miss rate of a cache is a function of cache size, cache organization and the statistics of the program running on the processor. Generally, caches range from 16 KBytes to 256 KBytes in size, with larger caches having lower miss rates. Target miss rates are in the 2-20% range. Miss rates are like EPA gas mileage estimates: with different programs, your miss rate will vary from benchmark estimates.

Cache reload time for the cache in Figure 4 is the time to access one word out of main memory. This may require three wait states in a conventional access and four wait states with a cache. The cache system has an extra wait state because one clock cycle is required to determine if the data is in the cache before main memory access can be started on a miss.

A FOUR WORD PER LINE CACHE

Cache refill performance can be improved by loading more than one word on a miss. A cache using this approach is shown in Figure 5. In this design, the data cache is four times as deep as the cache tag memory. The two least significant bits of the address bus go to the cache data memory but do not go to the tag memory. On a cache miss, four words are loaded into the cache data memory, and a single tag, the common tag for the four locations, is written at the same time. This is called a four word per line cache memory, where a line refers to the amount of data fetched on a cache miss.

Figure 5: Four Word/Line Cache Block Diagram

Four Word/Line Cache Performance

Changing the cache from one word per line to four words per line does not change performance significantly if the reload timing, i.e., number of wait states per word, is not changed. If all four words are eventually used by the processor and if four wait states are required per word, a total of 16 wait states will be used by either cache to load the four words from main memory. In some cases, not all four words will be used, so the one word per line cache has a small advantage for the same reload timing.

Performance of the four word per line cache of Figure 5 can be improved, however, by reducing the number of wait states required to load the four words. The main memory can be designed using interleaving techniques to provide the first word in four wait states and the next three words at one wait state each, for a total of 7 wait states rather than 16. This approximately doubles the performance of the cache.

The four word per line cache has an implied requirement that the cache data memory must be capable of absorbing data at one clock cycle per word. This is not easy at 33-50 MHz clock rates. The burst mode memory provides a natural advantage at these speeds. A burst mode memory with two cycle first access and one cycle per word thereafter can accept data at the rates capable of being generated by the interleaved DRAM main memory.

The burst access memory is particularly useful for cache memory reload because the interleaving techniques that can be applied in main memory using static column or nibble mode access generally result in unacceptable chip count and propagation delay when attempted in the cache. This is because the cache memory must be capable of two cycle first access in normal operation as well as burst mode operation for refill on a miss.

BURST MODE IN SECONDARY CACHES

Burst mode operation is becoming a widely used standard in both RISC and CISC processors. For example, in the Intel 80486, the small, on-board cache uses a four word per line refill which is typically supplied from a larger, off-chip secondary cache. In this case, burst mode operation is used by the secondary cache both in its normal operating mode of supplying data to the 80486 as well as the reload on a miss mode when it receives data from main memory.

Figure 6 shows a four word per line 32 KByte secondary cache for an 80486 using 8Kx18 burst mode SRAMs for the data portion of the cache and an 8Kx18 tag RAM with on-board comparator for the tag memory. This architecture provides a 32 KByte cache in three chips expandable to 128 KBytes in nine chips using the same tag RAM. A 128 KByte cache using this architecture is shown in Figure 7. A timing diagram for both designs is shown in Figure 8.

Figure 6: 80486 32K Byte Cache Block Diagram

Figure 7: 80486 128K Byte Cache Block Diagram

Figure 8: 80486 Cache Timing Diagram

The design of Figure 6 uses one QS8813 8Kx18 Tag RAM and two QS8811 8Kx18 Burst Mode RAMs for the tag and data memories respectively. Only 2K of the 8K tag words are used; however, the 8813 provides a single chip design solution for the tag RAM. The complete design requires only three RAM chips.

The design of Figure 7 uses one QS8813 8Kx18 Tag RAM and four QS8839 32Kx9 Burst Mode RAMs for the tag and data memories respectively. The full 8K words of the 8813 are used to support the 32K words of the 8839.

Both the 8811 and 8839 Burst Mode RAM chips provide an on-chip address counter and logic for burst mode operation. The address counter provides for bursts of up to four words using the 80486 address counting algorithm. Also, the burst counter on the 8811 counts in either binary or 80486 counting modes, pin selectable.

The QS8813 is an 8Kx18 Tag SRAM with built-in match enable logic that allows it to directly drive the BRDY input of the 80486. This eliminates the need for additional logic in the propagation delay path between the Tag SRAM and the microprocessor. This can save five or more nanoseconds in match time.

CONCLUSION

Burst mode memories provide performance improvement for the cache systems used in high speed CISC and RISC systems which use multiple words per cache line. They are particularly useful at CPU clock speeds above 25 MHz due to their higher performance and simpler interface. Because of these advantages, burst mode memories are becoming a standard component for cache design of high speed systems.
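As a numerical check on the refill figures quoted for the four word per line cache, the sketch below recomputes the 16 and 7 wait state totals. The function name is hypothetical and the calculation is a simple illustration, not a model of any particular memory controller.

```python
def line_refill_wait_states(words: int, first_word_ws: int, next_word_ws: int) -> int:
    """Wait states to load a cache line: first word cost plus cost of each later word."""
    if words < 1:
        raise ValueError("words must be at least 1")
    return first_word_ws + (words - 1) * next_word_ws


# Four wait states for every word (no interleaving).
plain = line_refill_wait_states(4, first_word_ws=4, next_word_ws=4)
# Interleaved main memory: four wait states, then one per following word.
interleaved = line_refill_wait_states(4, first_word_ws=4, next_word_ws=1)

print(plain, interleaved)             # 16 7
print(round(plain / interleaved, 2))  # 2.29
```

The ratio of roughly 2.3 is consistent with the statement that interleaving approximately doubles the performance of the cache refill.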