
ELSEVIER
Microprocessing and Microprogramming 41 (1996) 691-702

A neural network-based replacement strategy


for high performance computer architectures
Humayun Khalid *
Department of Electrical Engineering, Convent Avenue at 140th Street, City University of New York, The City College, New York, NY 10031, USA
* Email: khali@ee-mail.engr.ccny.cuny.edu

Received 5 May 1995; revised 8 August 1995; accepted 23 October 1995

Abstract

We propose a new scheme for the replacement of cache lines in high performance computer systems. Preliminary
research, to date, indicates that neural networks (NNs) have great potential in the area of statistical predictions [1]. This
attribute of neural networks is used in our work to develop a neural network-based replacement policy which can effectively
eliminate dead lines from the cache memory by predicting the sequence of memory addresses referenced by the central
processing unit (CPU) of a computer system. The proposed strategy may, therefore, provide better cache performance as
compared to the conventional schemes, such as: LRU (Least Recently Used), FIFO (First In First Out), and MRU (Most
Recently Used) algorithms. In fact, we observed from the simulation experiments that a carefully designed neural
network-based replacement scheme does provide excellent performance as compared to the LRU scheme. The new approach
can be applied to the page replacement and prefetching algorithms in virtual memory systems.

Keywords: Performance evaluation; Trace-driven simulation; Cache memory; Neural networks; Predictor

1. Introduction

In high performance computer systems, the bandwidth of the memory is often a bottleneck because it plays a critical role in determining the peak throughput. Cache is the simplest cost-effective way to achieve a high speed memory hierarchy, and its performance is extremely vital for high speed computers [2-7]. It is for this reason that caches have been extensively studied since their introduction by IBM in the System 360 Model 85 [8]. Caches provide, with high probability, instructions and data needed by the CPU at a rate that is closer to the CPU's demand rate. They work because programs exhibit a property called locality of reference. Three basic cache organizations were identified by Conti [9]: direct mapped, fully associative, and set associative.




The choice of cache organization can have a significant impact on cache performance and cost [10-12]. In general, the set associative cache organization offers a good balance between hit ratios and implementation cost. Also, the selection of a line/block replacement algorithm in set associative caches can have a significant impact on the overall system performance. The common replacement algorithms used with such caches are FIFO, MRU, and LRU. These algorithms try to anticipate future references by looking at the past behavior of the programs (exploiting the locality of reference property of the programs). The relative performance of these algorithms depends mainly on the length of the history consulted. An LRU algorithm consults a longer history of past address patterns than FIFO, and therefore it has a better relative performance.

Many different approaches have been suggested by researchers to improve the performance of replacement algorithms [13-15]. One such proposal was given by Pomerene et al. [13]. Pomerene suggested the use of a shadow directory in order to look at a relatively longer history when making decisions with LRU. The problem with this approach is that the size of the shadow directory limits the length of the history consulted. Chen et al. [14] studied the improvement in cache performance due to instruction reordering for a variety of benchmark programs. The instruction reordering increases the spatial and sequential locality of instruction caches, and thereby results in a better performance of the replacement and mapping algorithms. Our research work attempts to improve the performance of cache memory by developing a replacement strategy that looks at a very long history of addresses, without a proportional increase in the space/time requirements. The task is accomplished through neural networks (NNs). Some rudimentary work in this area was done by Sen [16]. His results were not very encouraging. Moreover, no justification or explanation was given for the design.

In the next section of this paper we describe the various neural network paradigms that have been considered in our work. Section 3 provides a theoretical framework for the replacement policy. Next, algorithms are developed to realize the replacement scheme for different categories of neural networks. Section 5 is on the hardware implementation of the algorithms. Simulation results and discussion are given in Section 6. Finally, the last section contains recommendations for future work and conclusions.

2. Neural network paradigms for the cache controller

Neural networks are human attempts to simulate and understand the nervous system with the hope of capturing some of the power of these biological systems, such as the ability for generalization, graceful degradation, adaptivity and learning, and parallelism. There are many different types/models/paradigms of neural networks (NNs), reflecting the emphasis and goals of different research groups. Each network type, generally, has several variations which address some weakness or enhance some feature of the basic networks. In this large array of network types, there are two characteristics that divide NNs into a few basic categories:
(1) Direction of signal flow: feed-forward/feed-back/lateral,
(2) Mode of learning: supervised/unsupervised/reinforcement.
NNs are experts at many pattern recognition tasks. They generate their own rules by learning from examples shown to them. Due to their learning ability, NNs have been applied in several areas such as financial prediction, signal analysis and processing, process control, robotics, classification, pattern recognition, filtering, speech synthesis, and medical diagnosis. Several good references are available on the theory and applications of NNs [17-19].

We have chosen six of the most popular NNs that encompass almost all of the basic categories. In addition, we have looked at variants from among these basic categories. A very brief description of the paradigms used in our work is given hereunder.

(1) Backpropagation Neural Network (BPNN). A typical BPNN has an input layer, an output layer, and at least one hidden layer. There is no theoretical limit on the number of hidden layers. However, in our work we have tried to limit the number of cases under study by considering at most two hidden layers. There are several variants of BPNNs, a few of which are considered in the present work.

(2) Radial Basis Function Network (RBFN). In general, an RBFN is any neural network which makes use of a radially symmetric and radially bounded transfer function in its hidden layer. Probabilistic neural networks (PNNs) and general regression neural networks (GRNNs) could both be considered RBFNs. The Moody/Darken type of RBFN (MD-RBFN), which we have used in this research, consists of 3 layers: an input layer, a prototype layer, and an output layer.

(3) Probabilistic Neural Network (PNN). The PNN actually implements the Bayes decision rule and uses Parzen windows to estimate the probability density functions (pdf) related to each class. The main advantage of the PNN is that it can be effectively used with sparse data.

(4) Modular Neural Network (MNN). MNNs consist of a group of BPNNs competing to learn different aspects of a problem, and therefore they can be regarded as a generalization of the BPNN. There is a gating network associated with the MNN that controls the competition and learns to assign different parts of the data space to different networks.

(5) Learning Vector Quantization Network (LVQ). LVQ is a classification network. It contains an input layer, a Kohonen layer which learns and performs the classification, and an output layer. The basic LVQ has various drawbacks, and therefore variants have been developed to overcome these shortcomings.

(6) General Regression Neural Network (GRNN). GRNNs are general purpose NNs. A simple GRNN consists of an input layer, a pattern layer, a summation and division layer, and an output layer. GRNNs provide good results in situations where the statistics of the data change over time. They can be regarded as the generalization of probabilistic neural networks. GRNNs use the same Parzen estimator as the PNNs.
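As an illustration of the Parzen estimation that PNNs and GRNNs rely on internally, the following sketch shows a standard Gaussian-kernel (Parzen-window) density estimate; the function, kernel width, and example data are ours, not taken from the paper:

```python
import numpy as np

def parzen_pdf(x, exemplars, sigma=0.5):
    """Parzen-window estimate of a pdf at point x from stored exemplars,
    using Gaussian kernels of width sigma (the per-class estimator that
    PNN/GRNN-style networks compute internally)."""
    x = np.asarray(x, dtype=float)
    diffs = np.asarray(exemplars, dtype=float) - x      # one row per exemplar
    sq_dist = np.sum(diffs * diffs, axis=1)
    kernels = np.exp(-sq_dist / (2.0 * sigma ** 2))
    d = x.shape[0]
    norm = (2.0 * np.pi * sigma ** 2) ** (d / 2.0)      # Gaussian normalization
    return kernels.mean() / norm

# A PNN would pick the class whose estimated density at x is largest.
class_a = [[0.0, 0.0], [0.1, -0.1]]   # hypothetical exemplars of class A
class_b = [[1.0, 1.0], [0.9, 1.1]]    # hypothetical exemplars of class B
x = [0.05, 0.0]
print(parzen_pdf(x, class_a) > parzen_pdf(x, class_b))   # -> True
```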
3. Theoretical basis for the cache replacement policy

The proposed scheme uses neural networks to identify which line/block should be replaced on a cache miss, in real time. The job of our neural networks is to learn a function that may approximate the unknown complex relationship between the past and future values of addresses:

f: past → future;  ∀ (past, future) ∈ Domain(addresses)

The function f can also be learned using the tag field of the addresses. Recall that a set-associative cache partitions the addresses into tag, set, and word fields. The word field is used to select a word within a cache block/line and the set field is used for indexing. The tag field preserves the mapping between the cache and the main memory. This means that the development of f using the tag field is only possible if we use the set field for indexing a group of neurons in the NNs. Fig. 1 shows how a NN is partitioned into groups/sets and indexed through a set selector. Here, the function f actually represents a predictor.

The parameters of such predictors are developed through the use of some statistical properties of variables/functions, such as the mean, variance, probability density functions (pdf), cumulative distribution functions (cdf), power spectral density, etc. This means that our NNs would behave as predictors if they are informed about the statistical properties of the tags. The problem with this approach is that tags, like other similar data sets, do not have well-defined statistical properties.

To accomplish this task, we have attempted to teach the NNs the histogram of references (an estimated pdf is called a histogram). Initially, the performance of the predictor would be low; however, the performance improves as the histogram develops. A well-developed histogram may contain sufficient locality information about the programs and can be used for guiding the replacement decisions. In order to accomplish this with NNs, we need at least two tags, T(i) and T(i+1), for the training/learning. This implies that the network learning should be delayed until a new tag arrives. A tag T(i) is a vector that is made up of 0's and -1's (bipolar values). The size of each vector T(i) depends on the computer architecture. For example, for a Q-bit address machine that incorporates a cache with N sets and P words per block, the size of the vector T(i) would be equal to Y = Q - (log2 N + log2 P) bits. So, a Y-dimensional vector T(i) is applied to the input layer of the NNs (predictor) and the output T̂(i+1) (estimate of T(i+1)) is compared with T(i+1) in order to get error values for the feedback purpose. The repetition of this procedure will eventually lead to the development of predictor parameters containing histogram information of the addresses. There are some NNs, like PNNs and GRNNs, which require only the present value of the tag T(i) to estimate the categorical pdf/cdf internally, using Parzen windows. Such NNs are good for classification type problems, since they are adept at developing categorical histograms (related to a category/class).
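To make the address decomposition and the tag encoding just described concrete, here is a minimal sketch; it is not from the paper, the helper name and example parameters are hypothetical, and the 0/-1 bipolar encoding follows the description in the text:

```python
import numpy as np

def split_address(address, Q, N, P):
    """Decompose a Q-bit address into (tag, set, word) fields for a cache
    with N sets and P words per block, and return the set index together
    with the Y-dimensional tag vector, Y = Q - (log2 N + log2 P).
    Tag bits are encoded bipolar-style as 0's and -1's, as in the text."""
    word_bits = int(np.log2(P))
    set_bits = int(np.log2(N))
    tag_bits = Q - set_bits - word_bits            # Y

    set_index = (address >> word_bits) & (N - 1)
    tag = address >> (word_bits + set_bits)
    t_vec = np.array([-1.0 if (tag >> b) & 1 else 0.0
                      for b in reversed(range(tag_bits))])
    return set_index, t_vec

# Example: a 32-bit address, N = 256 sets, P = 8 words/block -> Y = 21.
set_index, T_i = split_address(0x0040_12A8, Q=32, N=256, P=8)
```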
Fig. 1. Partial block diagram of the cache controller with any neural network paradigm except PNN/GRNN.

The above discussion gives some insight into the algorithm that would make NNs behave as predictors for our problem. The question that remains is: how can the predictor information be used for block/line replacements in a cache in real time? This question is answered in two different ways for two different categories of NNs, based on whether or not the histogram is estimated internally by the NNs. Our first category consists of BPNN, MNN, LVQ, and RBFN. The second category contains PNN and GRNN. The basic approach, however, is the same for both cases. What we want to do is to make both types of NNs learn the histogram of addresses/tags. At the same time, we would also like to extract the history information for a particular tag instead of the estimate of the future values of the tag. So, the important parameter that is used for guiding the replacement decision is not T̂(i+1); rather, it is the response of a particular layer of neurons containing categorical history. The number of categories referred to in such layers, that is, the number of neurons in the layers, should obviously be equal to the associativity A of the cache in order to be able to identify the cache line to be replaced within the set. The task, therefore, is to select a NN architecture and train it to learn the histogram in such a way that at least one layer of neurons in the NNs would provide the desired information. The responses of these neurons to an incoming tag T(i) will therefore determine the extent to which the tag vector T(i) has been seen in the past.

This can be accomplished by correlating the history of references, corresponding to each cache line in a set, with the present tag vector T(i).

Consider the finite correlation of two sequences x_n and y_n,

r_n = Σ_{k=0}^{m} x_k y_{k-n} = x_0 y_{-n} + x_1 y_{1-n} + ... + x_m y_{m-n}   (1)

The neurons in NNs can be made to compute this sum as follows, for an input vector X and a weight vector W:

X^T W = x_0 w_0 + x_1 w_1 + ... + x_m w_m   (2)

Equating the right hand sides of Eqs. (1) and (2), we get the following values:

w_0 = y_{-n}, w_1 = y_{1-n}, ..., w_m = y_{m-n}

This means that for certain weight values, the inner product results from a layer of neurons can provide, to a certain extent, the correlation information. With this information, we can determine which line is the most suitable candidate to be replaced in a cache set.

Another way of looking at the same thing is from the point of view of the difference between the two vectors. For an associativity A, we are looking at an A-class problem for which we require

max_{i=1,...,A} ||X - W_i||

Expanding ||X - W_i|| we get

||X - W_i|| = [(X - W_i)^T (X - W_i)]^{1/2} = (X^T X - 2 X^T W_i + W_i^T W_i)^{1/2}   (3)

From (3) it is obvious that for max ||X - W_i||, i = 1, 2, ..., A, we require min(X^T W_i). Therefore, the most suitable line to be replaced is the one that corresponds to the neuron, in a layer, with the smallest output response. Thus, the identification of such a layer for the two categories of NNs under study becomes imperative for our job.
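A minimal sketch of this selection rule, assuming the weight rows W_i already encode the per-line history; the function name and the numbers are illustrative only:

```python
import numpy as np

def select_victim(t_vec, W):
    """Pick the line to replace within a set. W has one weight row per
    cache line (A rows for associativity A); the neuron whose inner
    product X^T W_i with the incoming tag is smallest corresponds to the
    line whose history correlates least with T(i), per Eq. (3)."""
    responses = W @ t_vec          # the A inner products X^T W_i
    return int(np.argmin(responses))

# Hypothetical 4-way set: each row accumulates correlation history.
W = np.array([[0.2,  0.4, 0.1],
              [0.9,  0.7, 0.8],
              [0.5,  0.1, 0.3],
              [0.0, -0.2, 0.1]])
x = np.array([1.0, 1.0, 1.0])      # illustrative input vector
print(select_victim(x, W))         # -> 3: smallest response, replace line 3
```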
For the PNNs/GRNNs, Parzen estimators are responsible for developing the histogram. In such a case, we do not need T(i) and T(i+1) for the determination of the feedback error. Therefore, the output layer of such NNs can be used for the identification of replaceable cache lines (see Fig. 2). Things become a little more complicated for the other category of NNs (BPNN, MNN, LVQ, RBFN). This is because we need to train such NNs to develop the histogram by externally providing T(i+1). In this paper, we suggest a heuristic solution to the problem of identifying the layer of neurons that would provide the replacement information. Our trace-driven simulation experiments indicate that inserting a low-dimensional layer of neurons between two identical NNs, learning the histogram together, could help us identify the appropriate cache line to be replaced (see Fig. 1).

Fig. 2. Partial block diagram of the cache controller with the PNN/GRNN paradigm.

4. The algorithms

The discussion above leads us to the following algorithms for the replacement of cache lines from a set.

Algorithm-1 (For replacements with BPNN, MNN, RBFN, and LVQ)

Step 1: (Partitioning of NNs)
Partition the NNs into groups. Each group/NN module deals with a specific set. This means that for N sets of a cache the algorithmic space/time complexity would be O(αN), where α is a factor that relates to the complexity of the NNs within a module.

Step 2: (Partitioning of Modules)
Each NN module is partitioned into two sub-modules. The first sub-module should have an input layer and one or more hidden layers. The second sub-module should have one or more similar hidden layers and an output layer. The two sub-modules are to be connected by a lower dimensional layer of neurons, the interface layer, containing A neurons (A = associativity of the cache) that provide the desired response for replacements.

Step 3: (Initialization)
Initialize all the weights in the NNs to small random values very close to zero.

Step 4: (Module Selection)
Partition each incoming address into tag, word, and set fields. The set field is to be used for selecting one of the NN modules.

Step 5: (Replacement Line Identification)
Apply the input Y-dimensional tag vector T(i) to the selected module n and look at the response from the interface layer. The lowest response from neuron i, i = 1, 2, ..., A, indicates that the ith cache line within the set n should be replaced on a miss. On a hit, or when a cache line is empty (illegal tag), the response from the NNs should be disregarded.

Step 6: (Learning)
The output response from the second sub-module is compared with the next incoming tag T(i+1) and the resulting error (T(i+1) - T̂(i+1)) is used for the weight adjustment.
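The following is a compact software sketch of Algorithm-1 under simplifying assumptions: plain linear layers stand in for the BPNN/MNN/RBFN/LVQ sub-modules, and the class, parameter names, and dimensions are hypothetical, not the paper's implementation:

```python
import numpy as np

class NNModule:
    """One NN module per cache set (Step 1): two sub-modules joined by an
    A-neuron interface layer (A = associativity). Linear layers are used
    here only as a stand-in for the BPNN/MNN/RBFN/LVQ module."""
    def __init__(self, y_dim, assoc, lr=0.05, rng=np.random.default_rng(0)):
        self.W_in = rng.normal(0.0, 0.01, (assoc, y_dim))   # sub-module 1 (Step 3)
        self.W_out = rng.normal(0.0, 0.01, (y_dim, assoc))  # sub-module 2
        self.lr = lr

    def victim(self, t_vec):
        # Step 5: the interface neuron with the smallest response marks
        # the line to replace within the selected set.
        h = self.W_in @ t_vec
        return int(np.argmin(h)), h

    def learn(self, t_vec, t_next):
        # Step 6: compare the second sub-module's output with the next
        # incoming tag T(i+1) and adjust the weights with the error.
        h = self.W_in @ t_vec
        err = t_next - self.W_out @ h
        self.W_out += self.lr * np.outer(err, h)
        self.W_in += self.lr * np.outer(self.W_out.T @ err, t_vec)

# One module per set (Step 1); e.g. N = 256 sets, A = 4, Y = 21-bit tags.
modules = [NNModule(y_dim=21, assoc=4) for _ in range(256)]
```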
Algorithm-2 (For replacements with PNNs and GRNNs)

Step 1: Same as above.
Step 2: Same as Step 3 above.
Step 3: Same as Step 4 above.

Step 4: (Replacement Line Identification)
Apply the input Y-dimensional tag vector T(i) to the selected module n and send the output response of the PNN/GRNN to a transformation module. This module transforms values as follows:

g: Lowest value → Highest possible output value for neurons; All other values → Lowest possible output value for neurons.

The highest output index of the transformation module indicates the corresponding cache line to be replaced from the set n.

Step 5: (Learning)
The output values from the transformation unit are compared with its input values to obtain the error. This means that

ERROR = g(input) - input   (4)

However, for a hit, or for the case when a cache line is empty, the error is evaluated by comparing the input values of the transformation module with a target vector. This vector should have a value of +1.0 (highest possible unscaled output value for a neuron) at the coordinate corresponding to the hit position/tag-insertion position (empty line case). All other values should be -1.0 (lowest possible unscaled output value for a neuron).
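Steps 4 and 5 of Algorithm-2 can be sketched as follows; the function names are ours, and the 4-dimensional vector is the worked example used later in Section 5:

```python
import numpy as np

def transform(response):
    """The transformation module g: the position of the lowest PNN/GRNN
    response gets the highest neuron output (+1.0); every other position
    gets the lowest (-1.0)."""
    out = np.full_like(response, -1.0, dtype=float)
    out[np.argmin(response)] = 1.0
    return out

def error_vector(response, hit_or_fill_pos=None):
    """Step 5: on a miss the error is g(input) - input (Eq. (4)); on a hit,
    or when an empty line is filled, the target is +1.0 at the hit/insertion
    position and -1.0 elsewhere."""
    if hit_or_fill_pos is None:
        return transform(response) - response
    target = np.full_like(response, -1.0, dtype=float)
    target[hit_or_fill_pos] = 1.0
    return target - response

# The 4-dimensional example from Section 5:
r = np.array([0.1, 0.9, 0.8, 0.3])
print(transform(r))                         # -> [ 1. -1. -1. -1.]
victim_line = int(np.argmax(transform(r)))  # line 0 is replaced
```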
5. Proposed implementation of the replacement scheme

Based on the given theory and the algorithms for the different categories of NNs, we have developed two different architectures, presented in Figs. 1 and 2. Fig. 1 shows how the set field of an incoming address is extracted by the set selector to identify a neural network module that contains the information related to a particular cache set. The input and output lines for only one neural network module are shown for brevity. All the other neural network modules are identical copies of the one shown.

The tag field of an address is fed into the NN module as an input through a register. The sub-modules are represented by rectangular boxes. These boxes contain the input, output, and hidden layer(s) for BPNN, MNN, RBFN, or LVQ. The number of neurons in the input and output layers is identical and equal to the dimensionality Y of the tag vector T(i). The sub-modules are connected by a neural network layer called the interface layer. Each neuron in the interface layer corresponds to a cache block/line within a cache set. The error values are generated by comparing the estimated tag T̂(i+1) with a new incoming tag T(i+1). Therefore, the error calculations are delayed by approximately the amount of time it takes for a new address/tag to arrive. The error is computed through an adder/subtracter accompanied with a buffer, shown as a circle in Fig. 1. This error is fed back to the neural networks, for the set in consideration, in order to develop the histogram of addresses. The minimum selector (MIN SEL) identifies the neuron in the interface layer that provides the lowest response value and accordingly sends the line replacement information to the control unit (CU). The initial performance of this architecture would obviously be low, because of the initial random weight selection. However, the performance improves as the NNs learn the histogram.

The hardware implementation for the PNN/GRNN based replacement algorithm is shown in Fig. 2. Here, only the present value of the tag T(i) is presented directly to the NN module as an input. The output layer for the PNNs/GRNNs is shown external to the rectangular box in Fig. 2. The transformation module transforms the output values from the PNNs/GRNNs for two purposes:
(1) To identify the corresponding cache line to be replaced.
(2) To determine the error.
Again, the module transforms values as follows:

g: Lowest value → Highest possible output value for neurons; All other values → Lowest possible output value for neurons.

For example, a 4-dimensional output vector [0.1 0.9 0.8 0.3]^T is transformed as given below:

[0.1 0.9 0.8 0.3]^T → [1.0 -1.0 -1.0 -1.0]^T

Fig. 3. Miss ratio vs. cache size for different variants of BPNN with BI.DATA as the trace file: (a) Associativity = 2; (b) Associativity = 4; (c) Associativity = 8.

Fig. 4. Miss ratio vs. cache size for different variants of BPNN with TPC.DATA as the trace file: (a) Associativity = 2; (b) Associativity = 4; (c) Associativity = 8.

A multiplexer (MUX) is used to select the vector values from either the CU or the output of the transformation unit. The information as to which of the inputs should be selected by the MUX comes from the CU. The lower input lines to the MUX are selected by the CU on a hit, or when a cache line is empty. However, on a cache miss, the upper lines (representing the desired output vector) are selected for the purpose of error computation.

6. Simulation results and discussion

In this research work, we have used trace-driven simulations to accurately obtain information on the relative performance of various neural network paradigms and the conventional LRU algorithm. The two trace files used were named TPC.data and BI.data. TPC.data is the trace file that was generated on an IBM PC running Borland's Turbo Pascal Compiler. The other trace file, BI.data, was generated using Microsoft's Basica Interpreter. Each of these original trace files contains approximately 1/2 million addresses. For more information on these trace files see [20].

The caches considered in our simulation have set sizes that vary from N = 32 to N = 2048. The line size is fixed at 8 words per block. We have also varied the associativity A and looked at three different cases, that is, A = 2, A = 4, and A = 8. Each figure is partitioned into three parts, namely (a), (b), and (c), corresponding to the three values of associativity under consideration.
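As a point of reference, the LRU baseline against which the NN schemes are compared can be reproduced with a simple trace-driven simulation of the same cache organizations. The sketch below is ours; the trace-loading step and variable names are assumptions:

```python
from collections import OrderedDict

def lru_miss_ratio(addresses, num_sets, assoc, words_per_block=8):
    """Trace-driven simulation of a set-associative cache with LRU
    replacement; returns the miss ratio for the given address trace."""
    word_bits = words_per_block.bit_length() - 1   # log2(P), P a power of two
    set_bits = num_sets.bit_length() - 1           # log2(N)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        set_index = (addr >> word_bits) & (num_sets - 1)
        tag = addr >> (word_bits + set_bits)
        lines = sets[set_index]
        if tag in lines:
            lines.move_to_end(tag)        # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= assoc:       # set full: evict least recently used
                lines.popitem(last=False)
            lines[tag] = True
    return misses / len(addresses)

# Sweep the organizations studied here: N = 32..2048, A in {2, 4, 8}.
# 'trace' would hold the addresses read from BI.data or TPC.data.
# for N in [32, 64, 128, 256, 512, 1024, 2048]:
#     for A in [2, 4, 8]:
#         print(N, A, lru_miss_ratio(trace, N, A))
```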
Figs. 3 and 4 provide the miss ratio information for different variants of the BPNN paradigm with BI.data and TPC.data as the trace files, respectively. The T's assigned to the legends denote the type of the BPNN. A complete description of the T's is given in Table 1.

Table 1
Legend information for Figs. 3 and 4
T1: Hidden layer 1 = 11, Learning rule = Quick-prop, Learning rate = 0.5, Transfer function = TanH, Momentum = 0.5
T2: Hidden layer 1 = 11, Learning rule = Max-prop, Learning rate = 0.5, Transfer function = TanH, Momentum = 0.5
T3: Hidden layer 1 = 19, Learning rule = Norm-cum, Learning rate = 0.5, Transfer function = TanH, Momentum = 0.5
T4: Hidden layer 1 = 11, Learning rule = Ext-delta-bar-delta, Learning rate = 0.5, Transfer function = Sigmoid, Momentum = 0.5
T5: Hidden layer 1 = 11, Hidden layer 2 = 19, Learning rule = Max-prop, Learning rate = 0.9, Transfer function = TanH, Momentum = 0.9
T6: Hidden layer 1 = 19, Learning rule = Norm-cum, Learning rate = 0.9, Transfer function = TanH, Momentum = 0.9
T7: Hidden layer 1 = 19, Learning rule = Norm-cum, Learning rate = 0.05, Transfer function = TanH, Momentum = 0.05

Notice from Figs. 3 and 4 that the BPNN of type T6 has a relatively better performance than the other types.

This means that the Norm-Cum learning rule along with the TanH transfer function is suitable for our neural network-based replacement algorithm. The number of neurons in the hidden layer is reasonably varied, in our experiments, in the range 10 to 20. For T5, even with two hidden layers, the Max-Prop learning rule could not outperform the Norm-Cum based results. However, T3's performance is in the same league as T6 (the best case), which demonstrates that, for our data, there is very little change in learning with the increased momentum and learning rate values.

The performance of T7 with respect to T6 indicates that very high values of momentum and learning rate do not produce very drastic results as compared to moderate values, but very low values are indeed detrimental to the performance. The decrease in the miss ratio with increasing cache size is uniform across the different types of BPNNs. Notice that the relative performance of T7 in Fig. 4 is worse as compared to Fig. 3; this is due to the fact that the learning of neural networks is dependent on the sequence of the data file. However, based on the general profile of the results, for a variety of address traces, we can draw some conclusions on the relative performance of different types of neural networks. Our experiments with using dynamic learning rates (based on miss ratios) failed because of the low values of the miss ratios; the related curves are, therefore, not included in the given figures. In Figs. 5 and 6, a comparative performance of BPNN, PNN, GRNN, LVQ, RBFN, and MNN is given.

Fig. 6. Miss ratio vs. cache size for different NN paradigms with TPC.DATA as the trace file: (a) Associativity = 2; (b) Associativity = 4; (c) Associativity = 8.

Here, we have considered the best results from among the paradigms for a reasonable number of neurons (10 to 20 neurons). Our comparative study reveals that BPNN outperforms the other paradigms. The closest rival appears to be the LVQ. PNNs and GRNNs appear to be next in line, respectively. However, PNN in some cases performs as well as LVQ. The difference in the relative performance of the various neural network paradigms is quite appreciable for large caches.

Fig. 7. Miss ratio vs. cache size for LRU, LVQ, and BPNN with BI.DATA as the trace file: (a) Associativity = 2; (b) Associativity = 4; (c) Associativity = 8.

Fig. 8. Miss ratio vs. cache size for LRU, LVQ, and BPNN with TPC.DATA as the trace file: (a) Associativity = 2; (b) Associativity = 4; (c) Associativity = 8.

The last pair of figures, Figs. 7 and 8, provide the relative performance of LRU, LVQ, and BPNN. For the peak category, BPNN gives a significant performance improvement of 16.4711% in the miss ratio values over the LRU algorithm. The average is not significant because of the large variance in BPNN's performance, as shown in Figs. 3 and 4. LVQ also performs well and, in most of the cache organizations considered in the simulation studies, its performance is better than that of LRU. This proves that we can get excellent results from NNs provided we choose our parameters carefully. We expect that longer trace files for a variety of applications and systems software can provide us with even better results.

The results are trace dependent; however, the theory presented here dictates that our cache replacement strategy can be expected to yield reasonably good results for any workload.

7. Conclusion and recommendations for future work

In this paper, we have presented neural network-based cache replacement algorithms and have proposed corresponding hardware implementations. Several neural network paradigms, such as BPNN, MNN, RBFN, LVQ, PNN, and GRNN, were investigated in our work and their relative performance was studied and analyzed. Our trace-driven simulation results indicate that with a suitable BPNN paradigm, we can get an excellent improvement of 16.4711% in the miss ratio over the conventional LRU algorithm. The excellent performance of the neural network-based replacement strategy means that this new approach has potential for providing promising results when applied to the page replacement and prefetching algorithms in virtual memory systems.

Acknowledgements

This research is supported in part by PSC-CUNY grants #6-64455 and #6-666353.

References

[1] S. Wu, R. Lu and N. Buesing, A stock market forecasting system with an experimental approach using an artificial neural network, Proc. 25th Small College Computing Symp., North Dakota (April 24-25, 1992) 183-192.
[2] M.S. Obaidat, H. Khalid and K. Sadiq, A methodology for evaluating the performance of CISC computer systems under single and two-level cache environments, Microprocessing and Microprogramming J. 40(6) (July 1994) 411-421.
[3] S. Laha, J.H. Patel and R.K. Iyer, Accurate low-cost methods for performance evaluation of cache memory systems, IEEE Trans. Computers 37(11) (Nov. 1988) 1325-1336.
[4] D. Thiébaut, H.S. Stone and J.L. Wolf, Improving disk cache hit-ratios through cache partitioning, IEEE Trans. Computers 41(6) (June 1992) 665-676.
[5] A. Agarwal, M. Horowitz and J. Hennessy, An analytical cache model, ACM Trans. Computer Systems 7(2) (May 1989) 184-215.
[6] J.E. Smith and J.R. Goodman, Instruction cache replacement policies and organizations, IEEE Trans. Computers 34(3) (March 1985) 234-241.
[7] M.D. Hill and A.J. Smith, Evaluating associativity in CPU caches, IEEE Trans. Computers 38(12) (Dec. 1989) 1612-1630.
[8] S.G. Tucker, The IBM 3090 Systems: An overview, IBM Systems J. 25(6) (Jan. 1986).
[9] C.J. Conti, Concepts for buffer storage, IEEE Comp. Group News 2(8) (March 1969) 9-13.
[10] A.J. Smith, A comparative study of set associative memory mapping algorithms and their use for cache and main memory, IEEE Trans. Software Eng. 4(2) (March 1978).
[11] H. Khalid and M.S. Obaidat, A novel cache memory controller: algorithm and simulation, Summer Computer Simulation Conf. (SCSC '95), Ottawa, Canada (July 1995) 767-772.
[12] M.S. Obaidat and H. Khalid, A performance evaluation methodology for computer systems, Proc. IEEE 14th Annual Int. Phoenix Conf. on Computers and Communications, Scottsdale, AZ (March 1995) 713-719.
[13] J. Pomerene, T.R. Puzak, R. Rechtschaffen and F. Sporacio, Prefetching mechanism for a high-speed buffer store, US Patent, 1984.
[14] W.Y. Chen, P.P. Chang, T.M. Conte and W.-M.W. Hwu, The effect of code expanding optimizations on instruction cache design, IEEE Trans. Computers 42(9) (Sep. 1993) 1045-1057.
[15] W.W. Hwu and P.P. Chang, Achieving high instruction cache performance with an optimizing compiler, Proc. 16th Annual Int. Symp. on Computer Architecture (June 1989) 242-251.
[16] C.-F. Sen, A self-adaptive cache replacement algorithm by using backpropagation neural networks, MS Thesis, University of Missouri-Rolla, 1991.
[17] M.S. Obaidat and H. Khalid, An intelligent system for ultrasonic transducer characterization, submitted to the IEEE Trans. on Instrumentation and Measurements, 1995.
[18] J. Hertz, A. Krogh and R.G. Palmer, Introduction to the Theory of Neural Computation (Addison-Wesley, 1991).
[19] S. Kong and B. Kosko, Differential competitive learning for centroid estimation and phoneme recognition, IEEE Trans. Neural Networks 2(1) (Jan. 1991) 118-124.
[20] H.S. Stone, High Performance Computer Architecture (Addison-Wesley, 1993).

Humayun Khalid was born at Karachi, Pakistan. He received his B.S.E.E. (Magna Cum Laude) and M.S.E.E. (Graduate Citation) degrees from the City College of New York. Currently, he is a Ph.D. candidate in the Department of Electrical Engineering at the City University of New York. His research interests include parallel computer architecture, high performance computing/computers, computer networks, performance evaluation, and applied artificial neural networks. Mr. Khalid is a reviewer for the IEEE International Phoenix Conference on Computers and Communications (IPCCC), the IEEE International Conference on Electronics, Circuits, and Systems (ICECS'95), the Journal of Computer and Electrical Engineering, the Information Sciences Journal, and the Computer Simulation Journal. He is an Assistant Chair for the IEEE International Conference on Electronics, Circuits, and Systems (ICECS'95). He is a lecturer at the City University of New York (CUNY), and a lead researcher at the Computer Engineering Research Lab (CERL). He has published several refereed technical articles in national and international journals and conference proceedings.
