ABSTRACT

Die-stacking opens up avenues to integrate multiple layers of DRAM on top of the processor die.
Initial implementations have considered die-stacked DRAM caches. Because DRAM caches are
large, implementing them with conventional block sizes would require a large tag store.
A large SRAM tag store increases block access latency and leakage energy
consumption. On the other hand, placing the tag store in DRAM leads to heavy state write traffic,
since replacement policy information must be updated after each cache access, and requires a compound
DRAM cache access that reads the tag before the data. This wastes precious DRAM
cache bandwidth and incurs large dynamic energy consumption. This work presents a middle ground by
implementing a tag cache to augment the DRAM cache. The tag cache replicates a subset of the
tags in SRAM, while the full tag store is maintained in DRAM. The tag cache-augmented set-associative DRAM cache also utilises a MAP-G miss predictor and a filter cache to identify the
popular tags that should be kept in the tag cache. These two structures together help decrease the
average cache access latency. The proposed DRAM cache organisation is
compared to the Alloy cache, currently the state of the art, and to a baseline implementation
without a DRAM cache. While the proposed organisation at 256 MB capacity running
with a four-core processor achieves a 16% improvement over the baseline implementation,
it falls short of the Alloy cache by 4% for a selected set of workloads in our simulation-driven
empirical evaluation. We develop an analytical model to explain the performance gap between
the tag cache-based organisation and the Alloy cache.
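For context, MAP-G (the global variant of the memory access predictor introduced with the Alloy cache) can be approximated by a single saturating counter that is incremented on each DRAM cache miss, decremented on each hit, and predicts a miss while its most significant bit is set. The C sketch below is a minimal illustration under those assumptions; the 3-bit counter width, the MSB threshold, and the synthetic access pattern are our choices for illustration, not the paper's implementation.

/* mapg_sketch.c -- a minimal, self-contained sketch of a MAP-G style
 * global miss predictor: one saturating counter shared by all requests,
 * trained on DRAM cache hit/miss outcomes. Counter width and threshold
 * are assumptions for illustration. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAC_BITS 3
#define MAC_MAX  ((1u << MAC_BITS) - 1u)

static uint8_t mac;  /* global Memory Access Counter; starts at 0 (predict hit) */

/* Predict a DRAM cache miss when the counter's most significant bit is set. */
static bool predict_miss(void) {
    return (mac >> (MAC_BITS - 1)) & 1u;
}

/* Saturating update: increment on an actual miss, decrement on a hit. */
static void train(bool was_miss) {
    if (was_miss && mac < MAC_MAX) mac++;
    else if (!was_miss && mac > 0) mac--;
}

int main(void) {
    /* Feed a synthetic miss-heavy phase followed by a hit-heavy phase and
     * watch the prediction flip once the counter crosses its midpoint. */
    bool outcomes[] = { true, true, true, true, false, false, false, false };
    for (size_t i = 0; i < sizeof outcomes / sizeof outcomes[0]; i++) {
        printf("predict %s, actual %s\n",
               predict_miss() ? "miss" : "hit",
               outcomes[i]    ? "miss" : "hit");
        train(outcomes[i]);
    }
    return 0;
}

In the organisation summarised above, such a prediction would let a likely-missing request start its off-chip memory fetch in parallel with the in-DRAM tag probe, hiding part of the compound access latency on a tag cache miss.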
