Correlated Branches
Srinivas Rao Narne
Department of Electrical and Computer Engineering
University of Florida
Gainesville, USA
snarne1@ufl.edu
Abstract
Advances in computer design, driven by the never-ending
demand for higher speed, have led to longer processor
pipelines and wider issue widths in order to increase
throughput. If this trend continues, the branch
misprediction penalty will become very high. Branch
misprediction is the single most significant limiter on
the performance gained from deeper pipelining. To cope
with deeper pipelines and wider issue widths, we cannot
rely on default branch prediction schemes.
Our branch prediction technique therefore relies on a
long global history and identifies correlated branches in
that history using runtime dataflow information. Such a
predictor uses a collection of predictors, each of which
provides its prediction at a different stage of the
pipeline front end. A simple 1-cycle-latency line
predictor provides a prediction in the first stage,
followed a couple of stages later by a prediction from a
more accurate global predictor. Finally, one or two
stages later, a highly accurate corrector predictor
selectively corrects the global predictor's prediction.
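
The staged arrangement described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the `StagePrediction` type, and the stage numbers (1, 3, and 5) are assumptions chosen only to mirror "first stage", "a couple of stages later", and "one or two stages later". The corrector is modeled as selective by letting it stay silent (`None`).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StagePrediction:
    stage: int    # front-end stage at which this prediction becomes available
    source: str   # which predictor produced it
    taken: bool   # predicted branch direction


def resolve_predictions(line_taken: bool,
                        global_taken: bool,
                        corrector_taken: Optional[bool],
                        global_stage: int = 3,      # assumed stage number
                        corrector_stage: int = 5    # assumed stage number
                        ) -> List[StagePrediction]:
    """Build the timeline of predictions for one branch.

    The line predictor always answers in stage 1 and the global predictor
    always answers later. The corrector only appears in the timeline when
    it speaks up (not None) and disagrees with the global predictor.
    """
    timeline = [StagePrediction(1, "line", line_taken),
                StagePrediction(global_stage, "global", global_taken)]
    if corrector_taken is not None and corrector_taken != global_taken:
        timeline.append(
            StagePrediction(corrector_stage, "corrector", corrector_taken))
    return timeline


def front_end_redirects(timeline: List[StagePrediction]) -> int:
    """Count how often a later predictor overrides the direction in effect."""
    redirects = 0
    current = timeline[0].taken
    for prediction in timeline[1:]:
        if prediction.taken != current:
            redirects += 1
            current = prediction.taken
    return redirects


if __name__ == "__main__":
    # Line predictor says taken, global says not taken, corrector is silent.
    timeline = resolve_predictions(True, False, None)
    print(timeline[-1].source, timeline[-1].taken,
          front_end_redirects(timeline))
```

In this sketch the last entry of the timeline is the direction the front end finally follows, and each redirect stands for fetch work that has to be discarded when a slower, more accurate predictor disagrees with a faster one.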
Introduction
Today's vibrant information technology industry relies
on computers as its core products for handling all sorts
of information. Computers, in turn, must live up to the
expectations of the fast-evolving IT industry. For this
reason, researchers and computer architects have been
striving for the past 30-40 years to improve computer
performance in any way possible, and they have regularly
come up with new ideas, ranging from increasing clock
speed, parallelism, and pipelining to developing
multicore processors, dealing
Implementation
2.
Register-writing Instructions
3.
[Figure: bar chart of Default vs. New Implementation for test-math and test-printf; y-axis 0 to 25000]
Benchmarks     Default   Modified
Test-math         2038       1488
Test-fmath         735        660
Test-llong         384        352
Test-printf      21212      17799
[Figure: bar chart of Default vs. New Implementation for test-fmath and test-llong; y-axis 0 to 30]
Benchmarks     Default   Modified
Gzip             35718      32263
Crafty          102269      86513
Vortex           47904      47609
Figure 12: Number of misses for the first 5,000,000
instructions
[Figure: bar chart of Default vs. New Implementation for Gzip, Crafty, and Vortex; y-axis 0 to 120000]
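
To make the miss counts of Figure 12 easier to compare across benchmarks, the following short Python sketch computes the relative reduction in misses from Default to Modified. The counts are copied from Figure 12; the helper name `relative_reduction` is an illustrative assumption, not part of the original work.

```python
# Miss counts for the first 5,000,000 instructions, copied from Figure 12
# as (Default, Modified).
MISSES = {
    "Gzip": (35718, 32263),
    "Crafty": (102269, 86513),
    "Vortex": (47904, 47609),
}


def relative_reduction(default: int, modified: int) -> float:
    """Fraction of the default predictor's misses removed by the modification."""
    return (default - modified) / default


if __name__ == "__main__":
    for name, (default, modified) in MISSES.items():
        saved = default - modified
        print(f"{name}: {saved} fewer misses "
              f"({relative_reduction(default, modified):.2%} reduction)")
```

On these numbers the reduction is roughly 9.7% for Gzip, 15.4% for Crafty, and 0.6% for Vortex, so the benefit varies considerably between benchmarks.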
References
[1] Renju Thomas, Manoj Franklin, Chris Wilkerson, and Jared Stark (Intel Corporation), "Improving Branch Prediction by Dynamic Dataflow-based Identification of Correlated Branches from a Large Global History," in the 30th Annual International Symposium on Computer Architecture (ISCA-2003), June 2003.
[2] David A. Patterson and John L. Hennessy, Computer Organization and Design, 3rd Edition, Morgan Kaufmann Publishers, Inc., 2004.
Benchmarks     Default   Modified
Gzip             7.15%      6.46%
Crafty          12.21%     10.33%
Vortex           7.91%      7.86%
Figure 14: Comparison of Miss Rate
(misses/total branches)
[Figure: bar chart of miss rate (%) for Gzip, Crafty, and Vortex, Default vs. Modified; y-axis 0 to 14]
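
Since Figure 14 defines the miss rate as misses/total branches, the rates can be cross-checked against the miss counts of Figure 12. The Python sketch below does this under the assumption that both figures come from the same runs (the first 5,000,000 instructions); the source does not state this explicitly, and the helper name `implied_branches` is hypothetical.

```python
# (misses, miss rate in percent) for Default and Modified, copied from
# Figure 12 and Figure 14 respectively.
RESULTS = {
    "Gzip": ((35718, 7.15), (32263, 6.46)),
    "Crafty": ((102269, 12.21), (86513, 10.33)),
    "Vortex": ((47904, 7.91), (47609, 7.86)),
}


def implied_branches(misses: int, miss_rate_percent: float) -> float:
    """Invert miss rate = misses / total branches to recover total branches."""
    return misses / (miss_rate_percent / 100.0)


if __name__ == "__main__":
    for name, (default, modified) in RESULTS.items():
        branches_default = implied_branches(*default)
        branches_modified = implied_branches(*modified)
        drop_in_points = default[1] - modified[1]
        print(f"{name}: about {branches_default:,.0f} branches (Default), "
              f"about {branches_modified:,.0f} (Modified), "
              f"miss rate down {drop_in_points:.2f} percentage points")
```

If the assumption holds, the Default and Modified columns should imply nearly the same branch count for each benchmark, because the predictor changes which branches are mispredicted, not how many branches execute. Small differences are expected from the rates being rounded to two decimals.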