ARCHITECTURE
Assignment
BY
SHASHANK B.V (17030141CSE035)
Levels of memory:
Level 1 or Register –
Registers are the smallest and fastest type of memory, located inside the
CPU, and they hold the data the CPU is operating on at that moment. The
most commonly used registers are the accumulator, the program counter,
and the address register.
Level 2 or Cache memory –
Cache is the fastest memory after registers. It has a very short access
time, and data is temporarily stored in it for faster access.
Level 3 or Main Memory –
Main memory is the memory on which the computer currently works. It is
small in size compared to secondary memory, and it is volatile: once
power is off, the data no longer stays in this memory.
Level 4 or Secondary Memory –
Secondary memory is external memory which is not as fast as main memory,
but data stays permanently in this memory.
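The speed/capacity trade-off across these levels can be illustrated with a small average-access-time calculation. The latencies and per-level probabilities below are illustrative assumptions, not measured values:

```java
public class MemoryHierarchyDemo {
    public static void main(String[] args) {
        // Assumed (illustrative) access times in nanoseconds per level
        double cacheNs = 2.0;             // Level 2: cache
        double mainMemoryNs = 100.0;      // Level 3: main memory
        double diskNs = 10_000_000.0;     // Level 4: secondary memory

        // Assumed probability that a reference is satisfied at each level
        double pCache = 0.95, pMain = 0.049, pDisk = 0.001;

        // Effective access time = weighted sum over the level that serves each reference
        double effective = pCache * cacheNs + pMain * mainMemoryNs + pDisk * diskNs;
        System.out.printf("Effective access time: %.2f ns%n", effective);
    }
}
```

Even with these made-up numbers, the rare secondary-memory accesses dominate the effective access time, which is why the upper levels of the hierarchy must satisfy the vast majority of references.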
Cache Performance:
When the processor needs to read or write a location in main memory, it first checks
for a corresponding entry in the cache.
If the processor finds that the memory location is in the cache, a cache
hit has occurred and the data is read from the cache.
If the processor does not find the memory location in the cache,
a cache miss has occurred. For a cache miss, the cache allocates a new
entry and copies in data from main memory, then the request is fulfilled
from the contents of the cache.
The performance of cache memory is frequently measured in terms of a quantity
called the hit ratio.
Hit ratio = hits / (hits + misses) = number of hits / total accesses
Cache performance can be improved by using a larger cache block size and higher
associativity, and by reducing the miss rate, the miss penalty, and the time to
hit in the cache.
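The hit-ratio formula can be checked with a small counter-based sketch. The access sequence below is a made-up example, and an unbounded set stands in for the cache:

```java
import java.util.HashSet;
import java.util.Set;

public class HitRatioDemo {
    public static void main(String[] args) {
        // A made-up sequence of memory block accesses
        int[] accesses = {1, 2, 3, 1, 2, 4, 1, 5, 1, 2};

        Set<Integer> cache = new HashSet<>(); // unbounded "cache" for illustration
        int hits = 0, misses = 0;

        for (int block : accesses) {
            if (cache.contains(block)) {
                hits++;              // cache hit: block already resident
            } else {
                misses++;            // cache miss: bring the block in
                cache.add(block);
            }
        }

        double hitRatio = (double) hits / (hits + misses);
        System.out.println("hits=" + hits + " misses=" + misses
                + " hit ratio=" + hitRatio);
        // For this sequence: 5 hits and 5 misses, so the hit ratio is 0.5
    }
}
```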
Cache Mapping:
There are three different types of mapping used for cache memory: direct
mapping, associative mapping, and set-associative mapping. These are explained
below.
1. Direct Mapping –
The simplest technique, known as direct mapping, maps each block of
main memory into only one possible cache line. In direct mapping, each
memory block is assigned to a specific line in the cache. If a line is
already occupied by a memory block when a new block needs to be loaded,
the old block is evicted. An address is split into two parts: an index
field and a tag field. The cache is used to store the tag field, whereas
the rest of the block is stored in main memory. Direct mapping's
performance is directly proportional to the hit ratio.
i = j mod m
where
i = cache line number
j = main memory block number
m = number of lines in the cache
For purposes of cache access, each main memory address can be viewed
as consisting of three fields. The least significant w bits identify a unique
word or byte within a block of main memory. In most contemporary
machines, the address is at the byte level. The remaining s bits specify
one of the 2^s blocks of main memory. The cache logic interprets these s
bits as a tag of s − r bits (the most significant portion) and a line field of
r bits. This latter field identifies one of the m = 2^r lines of the cache.
2. Associative Mapping –
In this type of mapping, the associative memory is used to store content
and addresses of the memory word. Any block can go into any line of
the cache. This means that the word id bits are used to identify which
word in the block is needed, but the tag becomes all of the remaining
bits. This enables the placement of any word at any place in the cache
memory. It is considered to be the fastest and the most flexible mapping
form.
3. Set-associative Mapping –
This form of mapping is an enhanced form of direct mapping in which the
drawbacks of direct mapping are removed. Set-associative mapping
addresses the problem of possible thrashing in the direct-mapping
method. Instead of having exactly one line that a block can map to in
the cache, a few lines are grouped together to create a set, and a
block in memory can map to any one of the lines of a specific set.
Set-associative mapping allows each index address in the cache to hold
two or more words from main memory. Set-associative cache mapping
combines the best of the direct and associative cache-mapping
techniques.
In this case, the cache consists of a number of sets, each of which
consists of a number of lines. The relationships are
m = v * k
i = j mod v
where
i = cache set number
j = main memory block number
v = number of sets
m = number of lines in the cache
k = number of lines in each set
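The three mapping schemes can be compared with a small sketch that decodes one memory address under each scheme. The cache geometry and field widths below are assumed for illustration:

```java
public class CacheMappingDemo {
    public static void main(String[] args) {
        // Assumed geometry: 16-line cache (r = 4), 4-byte blocks (w = 2),
        // and a 2-way set-associative variant with v = 16 / 2 = 8 sets
        int w = 2, r = 4, m = 1 << r, k = 2, v = m / k;

        int address = 859;           // arbitrary example byte address
        int block = address >> w;    // main memory block number j

        // Direct mapping: line i = j mod m, tag = remaining high bits
        int directLine = block % m;
        int directTag = block / m;

        // Associative mapping: the block may go into any line, so the
        // whole block number serves as the tag
        int assocTag = block;

        // Set-associative mapping: set i = j mod v selects a set of k lines
        int set = block % v;
        int setTag = block / v;

        System.out.println("block=" + block
                + " direct(line=" + directLine + ", tag=" + directTag + ")"
                + " assoc(tag=" + assocTag + ")"
                + " setAssoc(set=" + set + ", tag=" + setTag + ")");
    }
}
```

Note how the tag shrinks as the placement freedom shrinks: associative mapping needs the whole block number as a tag, set-associative needs log2(v) fewer bits, and direct mapping needs log2(m) fewer bits.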
CrunchifyInMemoryCache.java

package com.crunchify.tutorials;

import java.util.ArrayList;

import org.apache.commons.collections.MapIterator;
import org.apache.commons.collections.map.LRUMap;

/**
 * @author Crunchify.com
 */
public class CrunchifyInMemoryCache<K, T> {

	private long timeToLive;
	private LRUMap crunchifyCacheMap;

	protected class CrunchifyCacheObject {
		public long lastAccessed = System.currentTimeMillis();
		public T value;

		protected CrunchifyCacheObject(T value) {
			this.value = value;
		}
	}

	public CrunchifyInMemoryCache(long crunchifyTimeToLive, final long crunchifyTimerInterval, int maxItems) {
		this.timeToLive = crunchifyTimeToLive * 1000;
		crunchifyCacheMap = new LRUMap(maxItems);

		if (timeToLive > 0 && crunchifyTimerInterval > 0) {
			// Background daemon thread that periodically removes expired entries
			Thread t = new Thread(new Runnable() {
				public void run() {
					while (true) {
						try {
							Thread.sleep(crunchifyTimerInterval * 1000);
						} catch (InterruptedException ex) {
						}
						cleanup();
					}
				}
			});
			t.setDaemon(true);
			t.start();
		}
	}

	public void put(K key, T value) {
		synchronized (crunchifyCacheMap) {
			crunchifyCacheMap.put(key, new CrunchifyCacheObject(value));
		}
	}

	@SuppressWarnings("unchecked")
	public T get(K key) {
		synchronized (crunchifyCacheMap) {
			CrunchifyCacheObject c = (CrunchifyCacheObject) crunchifyCacheMap.get(key);

			if (c == null)
				return null;
			else {
				c.lastAccessed = System.currentTimeMillis();
				return c.value;
			}
		}
	}

	public void remove(K key) {
		synchronized (crunchifyCacheMap) {
			crunchifyCacheMap.remove(key);
		}
	}

	public int size() {
		synchronized (crunchifyCacheMap) {
			return crunchifyCacheMap.size();
		}
	}

	@SuppressWarnings("unchecked")
	public void cleanup() {
		long now = System.currentTimeMillis();
		ArrayList<K> deleteKey = null;

		synchronized (crunchifyCacheMap) {
			MapIterator itr = crunchifyCacheMap.mapIterator();
			deleteKey = new ArrayList<K>((crunchifyCacheMap.size() / 2) + 1);
			K key = null;
			CrunchifyCacheObject c = null;

			while (itr.hasNext()) {
				key = (K) itr.next();
				c = (CrunchifyCacheObject) itr.getValue();

				// Collect keys whose last access is older than timeToLive
				if (c != null && (now > (timeToLive + c.lastAccessed))) {
					deleteKey.add(key);
				}
			}
		}

		for (K key : deleteKey) {
			synchronized (crunchifyCacheMap) {
				crunchifyCacheMap.remove(key);
			}
			Thread.yield();
		}
	}
}

CrunchifyInMemoryCacheTest.java

package com.crunchify.tutorials;

import com.crunchify.tutorials.CrunchifyInMemoryCache;

/**
 * @author Crunchify.com
 */
public class CrunchifyInMemoryCacheTest {

	public static void main(String[] args) throws InterruptedException {
		CrunchifyInMemoryCacheTest crunchifyCache = new CrunchifyInMemoryCacheTest();

		System.out.println("\n\n==========Test1: crunchifyTestAddRemoveObjects ==========");
		crunchifyCache.crunchifyTestAddRemoveObjects();

		System.out.println("\n\n==========Test2: crunchifyTestExpiredCacheObjects ==========");
		crunchifyCache.crunchifyTestExpiredCacheObjects();

		System.out.println("\n\n==========Test3: crunchifyTestObjectsCleanupTime ==========");
		crunchifyCache.crunchifyTestObjectsCleanupTime();
	}

	private void crunchifyTestAddRemoveObjects() {
		// timeToLive = 200 seconds, timerInterval = 500 seconds, maxItems = 6
		CrunchifyInMemoryCache<String, String> cache = new CrunchifyInMemoryCache<String, String>(200, 500, 6);

		cache.put("eBay", "eBay");
		cache.put("Paypal", "Paypal");
		cache.put("Google", "Google");
		cache.put("Microsoft", "Microsoft");
		cache.put("IBM", "IBM");
		cache.put("Facebook", "Facebook");

		System.out.println("6 cache objects added.. cache.size(): " + cache.size());
		cache.remove("IBM");
		System.out.println("One object removed.. cache.size(): " + cache.size());

		// maxItems = 6, so the least recently used entry is evicted here
		cache.put("Twitter", "Twitter");
		cache.put("SAP", "SAP");
		System.out.println("Two objects added but maxItems reached.. cache.size(): " + cache.size());
	}

	private void crunchifyTestExpiredCacheObjects() throws InterruptedException {
		// timeToLive = 1 second, timerInterval = 1 second, maxItems = 10
		CrunchifyInMemoryCache<String, String> cache = new CrunchifyInMemoryCache<String, String>(1, 1, 10);

		cache.put("eBay", "eBay");
		cache.put("Paypal", "Paypal");

		// Adding 3 seconds sleep.. Both above objects will be removed from
		// the cache by the cleanup thread because timeToLive = 1 second
		Thread.sleep(3000);

		System.out.println("Two objects added but expired.. cache.size(): " + cache.size());
	}

	private void crunchifyTestObjectsCleanupTime() throws InterruptedException {
		int size = 500000;
		// Large timeToLive and timerInterval so only the explicit cleanup() runs
		CrunchifyInMemoryCache<String, String> cache = new CrunchifyInMemoryCache<String, String>(100, 100, size);

		for (int i = 0; i < size; i++) {
			String value = Integer.toString(i);
			cache.put(value, value);
		}

		Thread.sleep(200);

		long start = System.currentTimeMillis();
		cache.cleanup();
		double finish = (System.currentTimeMillis() - start) / 1000.0;

		System.out.println("Cleanup times for " + size + " objects are " + finish + " s");
	}
}
CONCLUSION:
The main achievement and contribution of this paper have been to propose a new
dual unit tile/line access cache memory for improving the performance of column-
directional cache memory access. We proposed a hierarchical hybrid Z-ordering
data layout that can improve 2D data locality and exploit hardware prefetching
well. In addition, we proposed an ATSRA tag-memory reduction method that can
minimize the hardware scale of the proposed cache. After analysis and
consideration of the experimental results, we showed that the proposed cache
achieves parallel unit tile/line accessibility with only a minimal hardware
overhead, while its access efficiency compares favorably with that of previous
studies.
Finally, we modified the SimpleScalar simulator to evaluate the performance of
parallel unit tile/line access and drew the following important conclusions: (1)
While providing column-directional parallel access capability, the proposed cache
succeeds in suppressing the increase in conflict misses for power-of-two-sized
matrix computations as compared with the raster layout and the Z-Morton order
layout. As a result, the proposed cache enables almost one-cycle loads of 8
double-precision data in both the row and column directions, so that it not only
makes matrix transposition unnecessary but also allows effective utilization of
2D reference locality by SIMD operations. (2) The proposed ATSRA cache provides
high-performance parallel unit tile/line access with a minimal hardware increase
over the normal DL1 cache structure. (3) The number of parallel load instructions
required for parallel unit tile/line access is reduced to about one-fourth of
that required for conventional raster line access. The proposed cache with
tile/line accessibility further improves the performance of 2D applications by
using SIMD instructions. In the future, we will combine the proposed cache with a
SIMD processor and evaluate the unit tile/line access performance for parallel
data access.