ARCHITECTURE
Assignment
BY
SHASHANK B.V (17030141CSE035)
Levels of memory:
Level 1 or Register –
Registers are the smallest and fastest type of memory, located inside the
CPU, and they hold the data the CPU is operating on at that moment. The
most commonly used registers are the accumulator, the program counter,
and the address register.
Level 2 or Cache memory –
Cache is the fastest memory after registers. It has a very short access
time, and data is temporarily stored in it for faster access.
Level 3 or Main Memory –
Main memory is the memory on which the computer currently works. It is
small in size compared to secondary memory, and it is volatile: once
power is off, the data no longer stays in this memory.
Level 4 or Secondary Memory –
Secondary memory is external memory which is not as fast as main memory,
but data stays permanently in this memory.
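The speed/capacity trade-off across these levels can be illustrated with a small average-access-time calculation. The latencies and per-level probabilities below are illustrative assumptions, not measured values:

```java
public class MemoryHierarchyDemo {
    public static void main(String[] args) {
        // Assumed (illustrative) access times in nanoseconds per level
        double cacheNs = 2.0;             // Level 2: cache
        double mainMemoryNs = 100.0;      // Level 3: main memory
        double diskNs = 10_000_000.0;     // Level 4: secondary memory

        // Assumed probability that a reference is satisfied at each level
        double pCache = 0.95, pMain = 0.049, pDisk = 0.001;

        // Effective access time = weighted sum over the level that serves each reference
        double effective = pCache * cacheNs + pMain * mainMemoryNs + pDisk * diskNs;
        System.out.printf("Effective access time: %.2f ns%n", effective);
    }
}
```

Even with these made-up numbers, the rare secondary-memory accesses dominate the effective access time, which is why the upper levels of the hierarchy must satisfy the vast majority of references.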
Cache Performance:
When the processor needs to read or write a location in main memory, it first checks
for a corresponding entry in the cache.
If the processor finds that the memory location is in the cache, a cache
hit has occurred and the data is read from the cache.
If the processor does not find the memory location in the cache,
a cache miss has occurred. For a cache miss, the cache allocates a new
entry and copies in data from main memory, then the request is fulfilled
from the contents of the cache.
The performance of cache memory is frequently measured in terms of a quantity
called the hit ratio.
Hit ratio = hits / (hits + misses) = number of hits / total accesses
Cache performance can be improved by using a larger cache block size and higher
associativity, and by reducing the miss rate, the miss penalty, and the time to
hit in the cache.
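The hit-ratio formula can be checked with a small counter-based sketch. The access sequence below is a made-up example, and an unbounded set stands in for the cache:

```java
import java.util.HashSet;
import java.util.Set;

public class HitRatioDemo {
    public static void main(String[] args) {
        // A made-up sequence of memory block accesses
        int[] accesses = {1, 2, 3, 1, 2, 4, 1, 5, 1, 2};

        Set<Integer> cache = new HashSet<>(); // unbounded "cache" for illustration
        int hits = 0, misses = 0;

        for (int block : accesses) {
            if (cache.contains(block)) {
                hits++;              // cache hit: block already resident
            } else {
                misses++;            // cache miss: bring the block in
                cache.add(block);
            }
        }

        double hitRatio = (double) hits / (hits + misses);
        System.out.println("hits=" + hits + " misses=" + misses
                + " hit ratio=" + hitRatio);
        // For this sequence: 5 hits and 5 misses, so the hit ratio is 0.5
    }
}
```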
Cache Mapping:
There are three different types of mapping used for cache memory: direct
mapping, associative mapping, and set-associative mapping. These are explained
below.
1. Direct Mapping –
The simplest technique, known as direct mapping, maps each block of
main memory into only one possible cache line. In direct mapping, each
memory block is assigned to a specific line in the cache. If a line is
already occupied by a memory block when a new block needs to be loaded,
the old block is evicted. An address is split into two parts: an index
field and a tag field. The cache is used to store the tag field, whereas
the rest of the block is stored in main memory. Direct mapping's
performance is directly proportional to the hit ratio.
i = j mod m
where
i = cache line number
j = main memory block number
m = number of lines in the cache
For purposes of cache access, each main memory address can be viewed
as consisting of three fields. The least significant w bits identify a unique
word or byte within a block of main memory. In most contemporary
machines, the address is at the byte level. The remaining s bits specify
one of the 2^s blocks of main memory. The cache logic interprets these s
bits as a tag of s − r bits (the most significant portion) and a line field of
r bits. This latter field identifies one of the m = 2^r lines of the cache.
2. Associative Mapping –
In this type of mapping, the associative memory is used to store content
and addresses of the memory word. Any block can go into any line of
the cache. This means that the word id bits are used to identify which
word in the block is needed, but the tag becomes all of the remaining
bits. This enables the placement of any word at any place in the cache
memory. It is considered to be the fastest and the most flexible mapping
form.
3. Set-associative Mapping –
This form of mapping is an enhanced form of direct mapping in which the
drawbacks of direct mapping are removed. Set-associative mapping
addresses the problem of possible thrashing in the direct-mapping
method. Instead of having exactly one line that a block can map to in
the cache, a few lines are grouped together to create a set, and a
block in memory can map to any one of the lines of a specific set.
Set-associative mapping allows each index address in the cache to hold
two or more words from main memory. Set-associative cache mapping
combines the best of the direct and associative cache-mapping
techniques.
In this case, the cache consists of a number of sets, each of which
consists of a number of lines. The relationships are
m = v * k
i = j mod v
where
i = cache set number
j = main memory block number
v = number of sets
m = number of lines in the cache
k = number of lines in each set
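The three mapping schemes can be compared with a small sketch that decodes one memory address under each scheme. The cache geometry and field widths below are assumed for illustration:

```java
public class CacheMappingDemo {
    public static void main(String[] args) {
        // Assumed geometry: 16-line cache (r = 4), 4-byte blocks (w = 2),
        // and a 2-way set-associative variant with v = 16 / 2 = 8 sets
        int w = 2, r = 4, m = 1 << r, k = 2, v = m / k;

        int address = 859;           // arbitrary example byte address
        int block = address >> w;    // main memory block number j

        // Direct mapping: line i = j mod m, tag = remaining high bits
        int directLine = block % m;
        int directTag = block / m;

        // Associative mapping: the block may go into any line, so the
        // whole block number serves as the tag
        int assocTag = block;

        // Set-associative mapping: set i = j mod v selects a set of k lines
        int set = block % v;
        int setTag = block / v;

        System.out.println("block=" + block
                + " direct(line=" + directLine + ", tag=" + directTag + ")"
                + " assoc(tag=" + assocTag + ")"
                + " setAssoc(set=" + set + ", tag=" + setTag + ")");
    }
}
```

Note how the tag shrinks as the placement freedom shrinks: associative mapping needs the whole block number as a tag, set-associative needs log2(v) fewer bits, and direct mapping needs log2(m) fewer bits.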
CrunchifyInMemoryCache.java

package com.crunchify.tutorials;

import java.util.ArrayList;

import org.apache.commons.collections.MapIterator;
import org.apache.commons.collections.map.LRUMap;

/**
 * @author Crunchify.com
 */
public class CrunchifyInMemoryCache<K, T> {

	private long timeToLive;
	private LRUMap crunchifyCacheMap;

	protected class CrunchifyCacheObject {
		public long lastAccessed = System.currentTimeMillis();
		public T value;

		protected CrunchifyCacheObject(T value) {
			this.value = value;
		}
	}

	public CrunchifyInMemoryCache(long crunchifyTimeToLive, final long crunchifyTimerInterval, int maxItems) {
		this.timeToLive = crunchifyTimeToLive * 1000;
		crunchifyCacheMap = new LRUMap(maxItems);

		if (timeToLive > 0 && crunchifyTimerInterval > 0) {
			// Background daemon thread that periodically removes expired entries
			Thread t = new Thread(new Runnable() {
				public void run() {
					while (true) {
						try {
							Thread.sleep(crunchifyTimerInterval * 1000);
						} catch (InterruptedException ex) {
						}
						cleanup();
					}
				}
			});
			t.setDaemon(true);
			t.start();
		}
	}

	public void put(K key, T value) {
		synchronized (crunchifyCacheMap) {
			crunchifyCacheMap.put(key, new CrunchifyCacheObject(value));
		}
	}

	@SuppressWarnings("unchecked")
	public T get(K key) {
		synchronized (crunchifyCacheMap) {
			CrunchifyCacheObject c = (CrunchifyCacheObject) crunchifyCacheMap.get(key);

			if (c == null)
				return null;
			else {
				c.lastAccessed = System.currentTimeMillis();
				return c.value;
			}
		}
	}

	public void remove(K key) {
		synchronized (crunchifyCacheMap) {
			crunchifyCacheMap.remove(key);
		}
	}

	public int size() {
		synchronized (crunchifyCacheMap) {
			return crunchifyCacheMap.size();
		}
	}

	@SuppressWarnings("unchecked")
	public void cleanup() {
		long now = System.currentTimeMillis();
		ArrayList<K> deleteKey = null;

		synchronized (crunchifyCacheMap) {
			MapIterator itr = crunchifyCacheMap.mapIterator();
			deleteKey = new ArrayList<K>((crunchifyCacheMap.size() / 2) + 1);
			K key = null;
			CrunchifyCacheObject c = null;

			while (itr.hasNext()) {
				key = (K) itr.next();
				c = (CrunchifyCacheObject) itr.getValue();

				// Collect keys whose last access is older than timeToLive
				if (c != null && (now > (timeToLive + c.lastAccessed))) {
					deleteKey.add(key);
				}
			}
		}

		for (K key : deleteKey) {
			synchronized (crunchifyCacheMap) {
				crunchifyCacheMap.remove(key);
			}
			Thread.yield();
		}
	}
}

CrunchifyInMemoryCacheTest.java

package com.crunchify.tutorials;

import com.crunchify.tutorials.CrunchifyInMemoryCache;

/**
 * @author Crunchify.com
 */
public class CrunchifyInMemoryCacheTest {

	public static void main(String[] args) throws InterruptedException {
		CrunchifyInMemoryCacheTest crunchifyCache = new CrunchifyInMemoryCacheTest();

		System.out.println("\n\n==========Test1: crunchifyTestAddRemoveObjects ==========");
		crunchifyCache.crunchifyTestAddRemoveObjects();

		System.out.println("\n\n==========Test2: crunchifyTestExpiredCacheObjects ==========");
		crunchifyCache.crunchifyTestExpiredCacheObjects();

		System.out.println("\n\n==========Test3: crunchifyTestObjectsCleanupTime ==========");
		crunchifyCache.crunchifyTestObjectsCleanupTime();
	}

	private void crunchifyTestAddRemoveObjects() {
		// timeToLive = 200 seconds, timerInterval = 500 seconds, maxItems = 6
		CrunchifyInMemoryCache<String, String> cache = new CrunchifyInMemoryCache<String, String>(200, 500, 6);

		cache.put("eBay", "eBay");
		cache.put("Paypal", "Paypal");
		cache.put("Google", "Google");
		cache.put("Microsoft", "Microsoft");
		cache.put("IBM", "IBM");
		cache.put("Facebook", "Facebook");

		System.out.println("6 cache objects added.. cache.size(): " + cache.size());
		cache.remove("IBM");
		System.out.println("One object removed.. cache.size(): " + cache.size());

		// maxItems = 6, so the least recently used entry is evicted here
		cache.put("Twitter", "Twitter");
		cache.put("SAP", "SAP");
		System.out.println("Two objects added but maxItems reached.. cache.size(): " + cache.size());
	}

	private void crunchifyTestExpiredCacheObjects() throws InterruptedException {
		// timeToLive = 1 second, timerInterval = 1 second, maxItems = 10
		CrunchifyInMemoryCache<String, String> cache = new CrunchifyInMemoryCache<String, String>(1, 1, 10);

		cache.put("eBay", "eBay");
		cache.put("Paypal", "Paypal");

		// Adding 3 seconds sleep.. Both above objects will be removed from
		// the cache by the cleanup thread because timeToLive = 1 second
		Thread.sleep(3000);

		System.out.println("Two objects added but expired.. cache.size(): " + cache.size());
	}

	private void crunchifyTestObjectsCleanupTime() throws InterruptedException {
		int size = 500000;
		// Large timeToLive and timerInterval so only the explicit cleanup() runs
		CrunchifyInMemoryCache<String, String> cache = new CrunchifyInMemoryCache<String, String>(100, 100, size);

		for (int i = 0; i < size; i++) {
			String value = Integer.toString(i);
			cache.put(value, value);
		}

		Thread.sleep(200);

		long start = System.currentTimeMillis();
		cache.cleanup();
		double finish = (System.currentTimeMillis() - start) / 1000.0;

		System.out.println("Cleanup times for " + size + " objects are " + finish + " s");
	}
}
CONCLUSION:
The main achievement and contribution of this paper have been to propose a new
dual unit tile/line access cache memory for improving the performance of column-
directional cache memory access. We proposed a hierarchical hybrid Z-ordering
data layout that can improve 2D data locality and exploit hardware prefetching
well. In addition, we proposed an ATSRA tag-memory reduction method that can
minimize the hardware scale of the proposed cache. After analysis and
consideration of the experimental results, we showed that the proposed cache
achieves parallel unit tile/line accessibility with only a minimal hardware
overhead, while its access efficiency compares favorably with that of previous
studies.
Finally, we modified the SimpleScalar simulator to evaluate the performance of
parallel unit tile/line access and drew the following important conclusions: (1)
While providing column-directional parallel access capability, the proposed cache
succeeds in suppressing the increase in conflict misses for power-of-two-sized
matrix computations as compared with the raster layout and the Z-Morton order
layout. As a result, the proposed cache enables almost one-cycle loads of 8
double-precision data in both the row and column directions, so that it not only
makes matrix transposition unnecessary but also allows effective utilization of
2D reference locality by SIMD operations. (2) The proposed ATSRA cache provides
high-performance parallel unit tile/line access with a minimal hardware increase
over the normal DL1 cache structure. (3) The number of parallel load instructions
required for parallel unit tile/line access is reduced to about one-fourth of
that required for conventional raster line access. The proposed cache with
tile/line accessibility further improves the performance of 2D applications by
using SIMD instructions. In the future, we will combine the proposed cache with a
SIMD processor and evaluate the unit tile/line access performance for parallel
data access.