
Omega 51 (2015) 128–135


Technological forecasting of supercomputer development: The March to Exascale computing

Dong-Joon Lim, Timothy R. Anderson, Tom Shott
Department of Engineering and Technology Management, Portland State University, 1900 SW 4th Avenue, Suite LL-50-02, Portland, OR 97201, United States

Article history: Received 22 October 2013; Accepted 21 September 2014; Available online 2 October 2014. This manuscript was processed by Associate Editor Huang.

Keywords: Data envelopment analysis; Technological forecasting; State of the art; Rate of change; Supercomputer

Abstract: Advances in supercomputers have come at a steady pace over the past 20 years. The next milestone is to build an Exascale computer; however, this requires not only speed improvement but also significant enhancements for energy efficiency and massive parallelism. This paper examines technological progress of supercomputer development to identify the innovative potential of three leading technology paths toward Exascale development: hybrid systems, multicore systems, and manycore systems. Performance measurement and rate of change calculation were made by technology forecasting using data envelopment analysis (TFDEA). The results indicate that the current level of technology and rate of progress can achieve Exascale performance between early 2021 and late 2022 as either hybrid systems or manycore systems.

© 2014 Elsevier Ltd. All rights reserved.

Tel.: +1 503 688 0385; fax: +1 503 725 4667. E-mail address: tgno3.com@gmail.com (D.-J. Lim).
http://dx.doi.org/10.1016/j.omega.2014.09.009
0305-0483/© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Supercomputers have played a critical role in various fields which require computationally intensive tasks such as pharmaceutical testing, genomics research, climate simulation, energy exploration, molecular modeling, astrophysical recreation, etc. The unquenchable need for faster and higher-precision analysis in those fields creates the demand for even more powerful supercomputers. Furthermore, developing an indigenous supercomputer has become a fierce international competition due to its role as a strategic asset for nationwide scientific research and the prestige of being the maker of the fastest computers [1,2]. While the vast majority of supercomputers are still built using processors from Intel, Advanced Micro Devices (AMD), and NVidia, manufacturers are committed to developing their own customized systems, e.g. interconnect, operating system, and resource management, as system optimization becomes a crucial factor in today's massively parallel computing paradigm [3].

Advances in supercomputers have come at a steady pace over the past 20 years in terms of speed, which has been enabled by the continual improvement in computer chip manufacturing [4]. The world's fastest supercomputer today (March 2014) is the Tianhe-2, built by China's National University of Defense Technology (NUDT), performing at over 33.86 Petaflops, i.e. 33.86 × 10^15 floating-point operations per second. This is about 273,000 times faster than the fastest machine 20 years ago, the Fujitsu Numerical Wind Tunnel. On average, progress went from being measured in Gigaflops in 1990 to Teraflops in about 10 years, and then to Petaflops in another 10 years [5]. In line with this, the next milestone, to build an Exascale computer, a machine capable of doing a quintillion (10^18) operations per second, had been projected to see the light of day by 2018 [6]. However, there are significant industry concerns that this incremental improvement might not continue, mainly due to several practical problems.

The biggest challenge in building the Exascale computer is power consumption [7]. Tianhe-2, which is currently not only the fastest but also the largest supercomputer, uses about 18 megawatts (MW) of power. If the current trend of power use continues, projections for Exascale computing systems range from 60 to 130 MW, which would cost up to $150 million annually [8]. Therefore, unlike past advancement mainly driven by performance improvement [9], energy efficiency has now gone from being a negligible factor to a fundamental design consideration. Practically, fewer sites in the U.S. will be able to host Exascale computing systems due to the limited availability of facilities with sufficient power and cooling capabilities [10]. To cope with these issues, current efforts are targeting an Exascale machine that draws electrical power of 20 MW using 100 million processors in the 2020 timeframe [7,11].

Given that Exascale computing may require a vast amount of power and massive parallelism, it is crucial to incorporate power efficiency and multicore characteristics into the measure of technology assessment to have a correct view of the current trend [12]. This implies that extrapolation relying on a single performance measure, i.e. computing speed, may overlook required features of future technology systems and could
eventually result in an erroneous forecast. Specifically, the average power efficiency of today's top 10 systems is about 2.2 Petaflops per megawatt. This indicates that power efficiency must improve by a factor of 23 to achieve the Exascale goal. The projection of performance development therefore may have to be adjusted to consider the structural and functional challenges involved. This manifestly requires a multifaceted approach to investigate the tradeoffs between system attributes, which can tackle questions such as: how much performance improvement would be restricted by power and/or core reduction? What would be the maximum attainable computing performance with certain levels of power consumption and/or number of cores?

There are three leading technology paths representing today's supercomputer development: hybrid systems, multicore systems, and manycore systems [13]. Hybrid systems use both central processing units (CPUs) and graphics processing units (GPUs) to efficiently leverage their performances [14]. Multicore systems maintain a number of complex cores, whereas manycore systems use a large number of less powerful but power-efficient cores within a highly parallel architecture [15]. Manufacturers and researchers are exploring these alternate paths to identify the most promising, namely energy-efficient and performance-effective, avenue to face the challenges of deploying and managing Exascale systems [16–18]. Comparative analysis of these technology paths can, therefore, give insights into the estimation of future performance levels as well as possible disruptive technology changes.

This study employs technology forecasting using data envelopment analysis (TFDEA) to measure technological progress considering tradeoffs among power consumption, multicore processors, and maximum performance, so that supercomputers are evaluated in terms of both energy efficiency and performance effectiveness. The resulting analysis then provides a forecast of Exascale computer deployment under three different development alternatives in consideration of the current business environment as well as emerging technologies.

2. Methodology

Frontier analysis (or best practice) methods that model the frontier of data points rather than the average possibilities have become popular in modern benchmarking studies [19–22]. As an example, TFDEA has shown its usefulness in a wide range of applications since its first introduction at PICMET '01 [23–27]. This approach has a strong advantage in capturing technological advancement from the State of the Art (SOA) technologies rather than being influenced by the inclusion of mediocre technologies, as is frequently observed in central tendency models [28].

One of the favorable characteristics of TFDEA is its flexibility to incorporate practical views in the assessment. For example, DEA, which underlies TFDEA, allows a dynamic weighting scheme in which the model gives freedom to each data point to select its own weights, and as such, the efficiency measure will show it in the best possible light [29,30]. This approach has the advantage of preventing the model from underestimating various types of technologies based on an a priori fixed weighting scheme. Furthermore, DEA generates a reference set that can be used as reasonable benchmarks for each data point to improve its performance [31]. This allows TFDEA to capture technological advancement with consideration of various tradeoffs from different model parameters: orientation, returns to scale, and weight restrictions, along with variable selections [32].

In addition, TFDEA can deal with multiple variables, i.e. system attributes, thereby tracking efficiency changes over time. Specifically, TFDEA first measures the efficiency of each technology and then analyzes time-series efficiency changes to explain the expansion of the SOA frontier. This approach has a strong advantage in taking various tradeoffs into account, whereas most extrapolation methods can only explain the variance of a single dependent variable at a time by directly relating it to independent variables [33,34].

Although the time-series application of benchmarking practice can shed light on the new product development planning phase, there remains a need to integrate product positioning with the assessment of technological progress so that analysts can investigate different product segments for the purpose of market research. In particular, the High Performance Computing (HPC) industry has important niches with segmented levels of competition, from small-scale multiprocessing computers to mainframe computers. This necessarily requires an identification of technological advancement suitable for the design target of the Exascale computer from corresponding product segments. For this reason, this study presents a new approach which is capable of considering the variable rates of technological advancement from different product segments.

The whole process of the proposed model can be divided into three computational stages. In the first stage, (1)–(7), the notation x_{ij} represents the ith input and y_{rj} the rth output of technology j, and j = k identifies the technology being evaluated. The variables for the linear program underlying DEA are \lambda_{jk}^{h}, h \in \{R,C\}, and \varphi_{k}^{h}, h \in \{R,C\}. The variable \varphi_{k}^{h} represents the proportion of output that should be generated to become SOA at time period R (release time) or at time period C (current frontier) relative to the actual output that technology k produced. Since each reference set, \lambda_{jk}^{R}, only includes technologies that had been released up to t_k by constraint (4), \varphi_{k}^{R} indicates how superior technology k is at the time of release. Similarly, constraint (5) restricts each reference set, \lambda_{jk}^{C}, to technologies that had been released up to the current frontier time T; therefore, \varphi_{k}^{C} measures the amount by which technology k is surpassed by the current SOA frontier. The objective function (1) also incorporates effective date minimization to ensure reproducible results from possible alternate optimal solutions [35]. Variable returns to scale (VRS) are enforced by constraint (6).

\max \sum_{k=1}^{n} \left[ \varphi_{k}^{h} - \varepsilon \left( \frac{\sum_{j=1}^{n} \lambda_{jk}^{h} t_{j}}{\sum_{j=1}^{n} \lambda_{jk}^{h}} \right) \right]   (1)

\text{s.t.} \quad \sum_{j=1}^{n} \lambda_{jk}^{h} y_{rj} \ge \varphi_{k}^{h} y_{rk}, \quad r = 1, \dots, s   (2)

\sum_{j=1}^{n} \lambda_{jk}^{h} x_{ij} \le x_{ik}, \quad i = 1, \dots, m   (3)

\lambda_{jk}^{R} = 0, \quad \forall j,k \mid t_{j} > t_{k}   (4)

\lambda_{jk}^{C} = 0, \quad \forall j,k \mid t_{j} > T   (5)

\sum_{j=1}^{n} \lambda_{jk}^{h} = 1, \quad \forall k   (6)

\lambda_{jk}^{h} \ge 0, \quad \forall j,k, \; h \in \{R,C\}   (7)

Once efficiency measurements both at the time of release (R) and at the time of the current frontier (C) are completed, the rate of change (RoC), \gamma_{k}, may then be calculated in (8) by taking all technologies that were efficient at time of release, \varphi_{k}^{R*} = 1, but were superseded by the current frontier, \varphi_{k}^{C*} > 1. The local RoCs, \bar{\gamma}_{j}, can be obtained in (9) by taking the weighted average of the RoCs for each technology on the current frontier. Each local RoC therefore represents the growth potential of adjacent frontier facets based on the technological advancement observed from related past products. Consequently, the local RoC enables the model to identify an individualized RoC under which each forecasting target is expected to arrive [36,37]. Note that the traditional TFDEA model makes a forecast based on a single aggregated RoC, i.e. the average of \gamma_{k}, without consideration of the unique growth patterns of different product segments.
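At its core, each stage solves an output-oriented, variable-returns-to-scale DEA program per technology, cf. constraints (2)–(6). Below is a minimal sketch of that core program in Python using scipy; the paper's own implementation is the TFDEA R package, and the toy data values here are invented for illustration:

```python
import numpy as np
from scipy.optimize import linprog

def dea_output_vrs(X, Y, k):
    """Output-oriented VRS DEA efficiency phi for unit k.

    X: (n, m) inputs, Y: (n, s) outputs, one row per technology.
    Solves: max phi  s.t.  sum_j lam_j*y_rj >= phi*y_rk (outputs),
            sum_j lam_j*x_ij <= x_ik (inputs), sum_j lam_j = 1, lam >= 0.
    """
    n, m = X.shape
    s = Y.shape[1]
    # Decision vector: [phi, lam_1, ..., lam_n]; linprog minimizes, so c = [-1, 0, ...].
    c = np.zeros(1 + n)
    c[0] = -1.0
    # Output rows: phi*y_rk - sum_j lam_j*y_rj <= 0
    A_out = np.hstack([Y[[k]].T, -Y.T])        # shape (s, 1+n)
    # Input rows: sum_j lam_j*x_ij <= x_ik
    A_in = np.hstack([np.zeros((m, 1)), X.T])  # shape (m, 1+n)
    # VRS convexity: sum_j lam_j = 1
    A_eq = np.hstack([[0.0], np.ones(n)]).reshape(1, -1)
    res = linprog(c,
                  A_ub=np.vstack([A_out, A_in]),
                  b_ub=np.concatenate([np.zeros(s), X[k]]),
                  A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (1 + n))
    return res.x[0]  # phi = 1 on the frontier, > 1 when dominated

# Toy data: inputs = (power, cores), output = Rmax (all values invented).
X = np.array([[1.0, 2.0], [2.0, 4.0], [2.0, 4.0]])
Y = np.array([[2.0], [4.0], [2.0]])
print(dea_output_vrs(X, Y, 2))  # unit 2 is dominated by unit 1: phi = 2.0
```

For the TFDEA stages, the same program is simply re-solved with the reference set filtered by release date, as constraints (4) and (5) prescribe.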
\gamma_{k} = \left( \varphi_{k}^{C*} \right)^{1 / \left( \frac{\sum_{j=1}^{n} \lambda_{jk}^{C*} t_{j}}{\sum_{j=1}^{n} \lambda_{jk}^{C*}} - t_{k} \right)}, \quad \forall k \mid \varphi_{k}^{R*} = 1, \; \varphi_{k}^{C*} > 1   (8)

\bar{\gamma}_{j} = \frac{\sum_{k=1}^{n} \lambda_{jk}^{C*} \gamma_{k}}{\sum_{k=1, \gamma_{k} > 0}^{n} \lambda_{jk}^{C*}}, \quad \forall j \mid \varphi_{j}^{C*} = 1   (9)

The forecasting process is denoted in (10), where \varphi_{k}^{C*} indicates super-efficiency, measuring how much the future technology k outperforms the SOA technologies on the current frontier. The individualized RoC for each forecasting target k can be computed by combining the local RoCs of the SOA technologies j that constitute the frontier facet onto which technology k is being projected. The forecasted time t_{k}^{forecast} is, therefore, obtained as the sum of the estimated elapsed time and the effective date of the projection.

t_{k}^{forecast} = \frac{\ln \left( 1 / \varphi_{k}^{C*} \right)}{\ln \left( \frac{\sum_{j=1}^{n} \lambda_{jk}^{C*} \bar{\gamma}_{j}}{\sum_{j=1}^{n} \lambda_{jk}^{C*}} \right)} + \frac{\sum_{j=1}^{n} \lambda_{jk}^{C*} t_{j}}{\sum_{j=1}^{n} \lambda_{jk}^{C*}}, \quad \forall k \mid t_{k} > T   (10)

3. Analysis

3.1. Dataset

The TOP500 list was first created in 1993 to assemble and maintain a list of the 500 most powerful computer systems [38]. Since the list has been compiled twice a year, datasets from 1993 to 2013 were combined and cleaned up so that each machine appears once in the final dataset. The purpose of this study is to consider both energy efficiency and performance effectiveness; therefore, lists up to 2007 were excluded due to the lack of information on power consumption (see Table 1).

Table 1
TOP500 dataset from 1993 to 2013.

Data column                      1993–2007  2008–2009  2010-1  2010-2  2011-1  2011-2–2013
Rank                             O          O          O       O       O       O
Site                             O          O          O       O       O       O
Manufacturer                     O          O          O       O       O       O
Name                             X          X          X       X       X       O
Computer                         O          O          O       O       O       O
Country                          O          O          O       O       O       O
Year                             O          O          O       O       O       O
Total cores                      O          O          O       O       O       O
Accelerator/co-processor cores   X          X          X       X       X       O
Rmax                             O          O          O       O       O       O
Rpeak                            O          O          O       O       O       O
Efficiency (%)                   X          X          X       X       X       O
Nmax                             O          O          O       O       O       O
Nhalf                            O          O          O       O       O       O
Power                            X          O          O       O       O       O
Mflops/Watt                      X          X          X       X       X       O
Measured size                    X          X          O       X       X       X
Processor technology             O          O          O       O       O       O
Processor generation             X          X          X       X       X       O
Processor                        O          O          O       O       O       O
Proc. frequency                  O          O          O       O       O       X
Processor cores                  X          X          O       O       O       X
Processor speed (MHz)            X          X          X       X       X       O
System family                    O          O          O       O       O       O
System model                     X          O          O       O       O       O
Operating system                 O          O          O       O       O       O
OS family                        X          X          X       X       X       O
Cores per socket                 X          X          X       X       X       O
Architecture                     O          O          O       O       O       O
Accelerator/co-processor         X          X          X       X       O       O
Segment                          O          O          O       O       O       O
Application area                 O          O          O       O       O       X
Interconnect family              O          O          O       O       O       O
Interconnect                     O          O          O       O       O       O
Region                           O          O          O       O       O       O
Continent                        O          O          O       O       O       O

O: available, X: unavailable.

The variables selected for this study are as follows:

- Name (text): name of machine
- Year (year): year of installation/last major update
- Total cores (number): number of processors
- Rmax (Gigaflops): maximal LINPACK performance achieved
- Power (Kilowatts): power consumption
- Interconnect family (text): interconnect being used
- Processor technology/family (text): processor architecture being used

In the final dataset, there were in total 1,199 machines, with the number of cores ranging from 960 to 3.12 million, power ranging from 19 kW to 17.81 MW, and Rmax ranging from 9 Teraflops to 33.86 Petaflops, from 2002 to 2013. Note that a logarithmic transformation was applied to three variables prior to the analysis due to their exponentially increasing trends.

3.2. Model building

The TFDEA model was implemented using the software developed by Lim and Anderson [39] (the R package for TFDEA is available at http://cran.r-project.org/web/packages/TFDEA/index.html; a web-based version is also available at http://tfdea.com). As discussed earlier, power consumption and the number of cores were used as input variables and the maximum LINPACK performance (Rmax) was used as the output variable. This allows the model to identify the better performing supercomputer as one with lower power, fewer cores, and higher performance. Orientation can be either input-oriented or output-oriented and can best be thought of as whether technological progress is better characterized as input reduction or output augmentation [40]. While power consumption will be a key concern in Exascale computing, the advancement of this industry has been driven primarily by computing performance, i.e. flops, improvement. Besides, Exascale computing is a clearly defined development goal; therefore an output orientation was selected for this application. It should be noted here that either orientation can deal with tradeoffs among input and output variables. As with many DEA applications, variable returns to scale (VRS) was selected as the appropriate returns-to-scale assumption, since doubling the input(s) does not correspond to doubling the output(s) here. The main purpose of this study is to forecast Exascale computer deployment by examining the past rate of progress; thus the frontier year of 2013 was used so as to cover the whole dataset. Lastly, minimizing the sum of effective dates was added as a secondary goal in the model to handle the potential issue of multiple optima from the dynamic frontier year [41]. Table 2 summarizes the model parameters used in this study.

Fig. 1 shows thirteen supercomputers identified as SOAs from the analysis. Intel provided the processors for the largest share (62%) and, inter alia, GPU/Accelerator based systems dominated both energy and core efficient systems, while IBM's Blue Gene,
NNSA/SC and Blue Gene/Q, showed comparable energy efficiency to manycore based systems. As also seen from the specifications of these supercomputers in Table 3, supercomputers are characterized as being competitive in consideration of tradeoffs within different scale sizes. This enables the model to construct technology frontiers from which various production possibilities can be identified. This characteristic, in fact, differentiates the TFDEA process from a single-dimensional measure such as the TOP500 list, in which technological efforts to become energy efficient and/or core efficient are not taken into account.

Table 2
TFDEA model parameters.

Inputs        Output  Orientation  RTS  Frontier year  Frontier type  Second goal
Power, cores  Rmax    Output       VRS  2013           Dynamic        Min

Fig. 1. 13 state of the art supercomputers considering system tradeoffs.

Table 3
Specifications of 13 state of the art supercomputers considering system tradeoffs.

Name                                Date  Cores      Power (kW)  Rmax (Gflops)  Interconnect  Processor family
Eurora Eurotech Aurora HPC 1020     2013  2,688      46.00       100,900        InfiniBand    Intel
Tianhe-2 TH-IVB-FEP                 2013  3,120,000  17,808.00   33,862,700     Custom        Intel
HPCC                                2013  10,920     237.00      531,600        InfiniBand    Intel
Titan Cray XK7                      2012  560,640    8,209.00    17,590,000     Cray          AMD
Beacon Appro GreenBlade GB824M      2012  9,216      45.11       110,500        InfiniBand    Intel
BlueGene/Q, Power BQC 16C 1.60GHz   2012  8,192      41.09       86,346         Custom        IBM Power
iDataPlex DX360M3                   2011  3,072      160.00      142,700        InfiniBand    Intel
NNSA/SC Blue Gene/Q Prototype 2     2011  8,192      40.95       85,880         Custom        IBM Power
DEGIMA Cluster                      2011  7,920      31.13       42,830         InfiniBand    Intel
BladeCenter QS22 Cluster            2008  1,260      18.97       9,259          InfiniBand    IBM Power
Cluster Platform 3000 BL2x220       2008  1,024      42.60       9,669          InfiniBand    Intel
Power 575, p6 4.7 GHz, InfiniBand   2008  960        153.43      14,669         InfiniBand    IBM Power
BladeCenter HS21 Cluster            2007  960        91.55       9,058          InfiniBand    Intel

Fig. 2 illustrates performance trajectories based on the 1,199 supercomputers from three dominant processor families: AMD X86, IBM Power, and Intel (IA-32/64, Core, Nehalem, Westmere, and Sandy Bridge). Since the Japanese supercomputer Earth-Simulator in 2002 was built using a Nippon Electric Company (NEC) chip which was not adopted by other supercomputer manufacturers thereafter, Fig. 2 is drawn from 2005 onward to focus on the main processor vendors for today's systems. The ordinate is the overall performance score from the DEA model. As such, each line indicates the performance trajectory of the top performing supercomputers from each year against the frontier year of 2013. That is, a performance score of 100% indicates that the supercomputer has performance superior enough to be on the SOA frontier in 2013. A performance score higher than 100% denotes super-efficiency from the DEA model, which shows how much the supercomputer outperforms the other SOA supercomputers.

Fig. 2. Performance trajectories of different processor families.
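Once the DEA efficiencies and local RoCs are available, the rate-of-change and forecast computations in (8)–(10) reduce to a few lines of arithmetic. The sketch below uses invented numbers, none taken from the paper's results; in output orientation, a forecast target's super-efficiency \varphi^{C*} is below 1, so the frontier must grow by a factor of 1/\varphi^{C*} to reach it:

```python
import numpy as np

# Invented example: a frontier facet formed by three current-SOA technologies
# with release dates t and local RoCs (eq. (9)); lambda_jk* projects target k
# onto this facet.
t = np.array([2011.0, 2012.0, 2013.0])       # release dates of the SOA peers
local_roc = np.array([1.022, 1.023, 1.024])  # yearly growth rate of each peer
lam = np.array([0.2, 0.3, 0.5])              # lambda_jk* weights for target k

eff_date = np.dot(lam, t) / lam.sum()        # effective date of the facet: 2012.3

# Eq. (8): RoC of a past technology that was SOA at release (phi_R* = 1) but is
# now surpassed by the frontier (phi_C* > 1), e.g. phi_C* = 1.2, released 2008.
gamma = 1.2 ** (1.0 / (eff_date - 2008.0))

# Eq. (10): forecast arrival of a target with super-efficiency phi_C* = 0.85;
# elapsed time = ln(1/phi) / ln(individualized RoC), added to the facet date.
phi_c = 0.85
agg_roc = np.dot(lam, local_roc) / lam.sum()
t_forecast = np.log(1.0 / phi_c) / np.log(agg_roc) + eff_date

print(round(gamma, 4), round(t_forecast, 2))  # 1.0433 2019.36
```

Note how sensitive the arrival date is to the RoC: at growth rates on the order of those in Table 6 (about 1.01 to 1.02 per year), a one-percentage-point difference in the individualized RoC shifts the forecast by several years, consistent with the spread between the three technology paths.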

The trajectory of Many/Multicore systems shows that IBM Power (PC) processor based machines are outperforming AMD X86 processor based machines. AMD X86 processor based machines, however, showed surpassing performances over IBM Power (PC) based machines when they were adopted by Cray to build hybrid systems in 2011 and 2012. In fact, the successful development of Titan Cray XK7, using AMD Opteron CPUs coupled with NVidia coprocessors, has made Cray Inc. one of the leading supercomputer vendors to date. Interestingly, this is also consistent with the fact that Cray Inc. was awarded the $188M U.S. Blue Waters contract, a project funded by the National Science Foundation (NSF), replacing IBM, which had pulled out of the project prior to completion in 2011 [42].

It is also interesting to point out that the performance gap between Many/Multicore based machines and GPU/Accelerator based machines is larger in supercomputers using AMD X86 processors than in those using Intel processors. This can be attributed to the partnership between Cray and AMD. In fact, Cray has been a staunch supporter of AMD processors since 2007, and their collaboration has delivered continued advancement in HPC [43]. In particular, Cray's recent interconnect technology, Gemini, was customized for the AMD Opteron CPU's Hyper-Transport links to optimize internal bandwidth [44]. Since modern supercomputers are deployed as massively centralized parallel systems, the speed and flexibility of the interconnect become important for the overall performance of a supercomputer. Given that hybrid machines using AMD X86 processors all use Cray's interconnect system, one may notice that AMD X86 based supercomputers had a significant performance contribution from the Cray interconnect as well as from NVidia coprocessors.

One may notice that top supercomputers based on Intel processors have switched to hybrid systems since 2010. This is because combining CPUs and GPUs is advantageous for data parallelism, which makes it possible to balance the workload distribution as efficient use of computing resources becomes more important in today's HPC structure [45]. Hybrid machines using Intel processors have all adopted the InfiniBand interconnect for their cluster architectures regardless of GPUs/Accelerators (NVidia, ATI Radeon, Xeon Phi, PowerXCell, etc.). InfiniBand, manufactured by Mellanox and Intel, enables low processing overhead and is ideal for carrying multiple traffic types such as clustering, communications, and storage over a single connection [46]. Especially, its GPU-Direct technology facilitates the faster communication and lower latency of GPU/Accelerator based systems, which can maximize computing and accelerator resources as well as improve productivity and scalable performance [47]. Intel acquired the InfiniBand business from Qlogic in 2012 to support innovating on fabric architectures not only for HPC but also for data centers, cloud, and the Web 2.0 market [48].

Recent attention is focusing on Intel's next generation supercomputer, which will adopt Cray's Aries interconnect with the Intel Xeon Phi accelerator as the first non-InfiniBand based hybrid system after Intel's acquisition of Cray's interconnect business [49]. This transition reflects the strategic decision of Cray to use an independent interconnect architecture rather than a processor-specific one as AMD's performance and supply stability fell behind competitors [44,50].

Unlike AMD X86 or Intel processor based systems, the top performing supercomputers using the IBM Power (PC) processor were Many/Multicore systems. IBM initially developed the multicore architecture, which later evolved into manycore systems, known as Blue Gene technology. The Blue Gene approach is to use a large number of simple processing cores and to connect them via a low latency, highly scalable custom interconnect [51]. This has the advantage of achieving a high aggregate memory bandwidth, whereas GPU clusters require messages to be copied from the GPU to the main memory and then from main memory to the remote node, whilst maintaining low power consumption as well as cost and floor space efficiency [52]. Currently, GPU/Accelerator based systems suggest smaller cluster solutions for next generation HPC with promising performance potential; however, the Blue Gene architecture demonstrates an alternate direction of massively parallel quantities of independently operating cores with fewer programming challenges [53].

3.3. Model validation

To validate the predictive performance of the constructed model, we conducted hold-out sample tests. Specifically, a rolling origin was used to determine the forecast accuracy by collecting deviations from multiple forecasting origins, so that the performance of the model can be tested both near-term and far-term. This provides an objective measure of accuracy without being affected by occurrences unique to a certain fixed origin [54]. The comparative results with the planar model and random walk are summarized in Table 4.
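The rolling-origin procedure can be sketched as follows; `fit_and_forecast` is a hypothetical stand-in for refitting TFDEA on data up to each origin and is not the paper's implementation, and all numbers are invented:

```python
from statistics import mean

# Hypothetical observed arrival years of post-origin systems, keyed by the
# forecast origin used to fit the model (all values invented).
observed = {2009: [2010.0, 2011.5], 2010: [2011.0, 2012.5]}

def fit_and_forecast(origin, actual):
    """Stand-in for: fit the model on data up to `origin`, then forecast each
    hold-out system's arrival. Here we fake a forecaster with a one-year bias."""
    return [a + 1.0 for a in actual]

def rolling_origin_mad(observed):
    # Mean absolute deviation per origin, then averaged across origins, so that
    # near-term and far-term errors are both represented (cf. Table 4).
    mads = []
    for origin, actual in sorted(observed.items()):
        preds = fit_and_forecast(origin, actual)
        mads.append(mean(abs(p - a) for p, a in zip(preds, actual)))
    return mean(mads)

print(rolling_origin_mad(observed))  # 1.0 with the biased stand-in forecaster
```

Averaging the per-origin deviations in this way mirrors how the "Average" row of Table 4 summarizes accuracy across forecasting windows.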
Table 4
Model validation using rolling origin hold-out sample tests. Mean absolute deviation (unit: years).

Forecast origin  Hybrid systems                       Multicore systems                    Manycore systems
                 TFDEA   Planar model  Random walk    TFDEA   Planar model  Random walk    TFDEA   Planar model  Random walk
2007             N/A     N/A           N/A            N/A     N/A           N/A            1.8075  2.8166        2.9127
2008             N/A     N/A           N/A            N/A     N/A           N/A            1.4470  2.5171        2.4949
2009             1.5814  2.7531        2.1852         N/A     N/A           N/A            2.0060  2.3593        2.0509
2010             1.1185  1.9956        1.5610         N/A     N/A           N/A            1.4996  2.0863        1.6016
2011             1.8304  1.5411        1.2778         0.9899  0.7498        1.0000         1.2739  1.8687        1.3720
2012             0.7564  1.2012        1.0000         N/A     N/A           N/A            0.8866  2.2269        1.0000
Average          1.3217  1.8728        1.5060         0.9899  0.7498        1.0000         1.4867  2.3125        1.9053

Since the first hybrid system, BladeCenter QS22, appeared in 2008 in our dataset, the hold-out sample test was conducted from the origin of 2009 for hybrid systems. That is, a mean absolute deviation of 1.58 years was obtained from TFDEA when the model made a forecast of the arrivals of post-2009 hybrid systems based on the rate of technological progress observed from 2008 to 2009. The overall forecasting error across the forecasting origins was found to be 1.32 years, which is more accurate than the planar model and random walk.

Although multicore systems showed successive introductions from 2007 to 2012, technological progress, i.e. expansion of the SOA frontier surface, was not observed until 2010. This rendered the model able to make a forecast only in 2011. The resulting forecast error of TFDEA was found to be about a year, which is slightly bigger than that of the planar model; however, care must be taken in interpreting this error since it was obtained only from a one-year-ahead forecast in 2011.

Consecutive introductions of manycore systems with steady technological progress made it possible to conduct hold-out sample tests from the origins of 2007 to 2012. Notwithstanding a bigger average forecasting error of 1.49 years, due to the inclusion of errors from longer forecasting windows than the other two systems, TFDEA showed outperforming forecast results compared to the planar model and random walk.

Overall, it is shown that the TFDEA model provides a reasonable forecast for the three types of supercomputer systems with a maximum possible deviation of 18 months. In addition, it is interesting to note that forecasts from TFDEA tended to be less sensitive to the forecasting window than the planar model or random walk. This implies that the current technological progress of supercomputer technologies exhibits multifaceted characteristics that can be better explained by the various tradeoffs derived from the frontier analysis. In contrast, the single design tradeoff identified from the central tendency model was shown to be vulnerable especially in the long-term forecast.

3.4. Forecasting

We now turn to the forecasting of Exascale systems. As previously noted, the design goal of the Exascale supercomputer is expected to be Exaflops performance (10^18 flop/s) with 20 MW power consumption and 100 million total cores (see Table 5) [7,11]. These specifications were set as a forecasting target to estimate when this level of system could be reached considering the RoCs identified from past advancements in the relevant segment.

Table 5
Exascale computer as a forecasting target.

Cores        Power  Rmax
100 million  20 MW  1 Exaflops

Table 6 summarizes the forecasting results for the three development possibilities. Exascale performance was forecasted to be achieved earliest by hybrid systems, in 2021.13. Hybrid systems are expected to accomplish this with a relatively high individualized RoC of 2.22% and the best current level of performance, represented by Tianhe-2. Considering the possible deviations identified in the previous section, one could expect the arrival of a hybrid Exascale system within the 2020 timeframe. In fact, many industry experts claim that GPU/Accelerator based systems will become more popular in the TOP500 list for their outstanding energy efficiency, which may spur Exascale development [13,16].

The forecasted arrival time of the first multicore based Exascale system is far beyond 2020 due to the slow rate of technological advancement, 1.19%, as well as relatively underperforming performance. Note that the projection from the planar model also estimated the arrival of a multicore based Exascale system far beyond the 2020 timeframe. This implies that innovative engineering efforts are required for the multicore based architecture to be scaled up to Exaflop performance. Even though RIKEN embarked on a project to develop an Exascale system continuing the preceding success of the K-computer, IBM's cancellation of the Blue Waters contract and its recent movement toward becoming a design house raise questions about the prospects of multicore based HPCs [55,56].

The first manycore based system is expected to reach the Exascale target by 2022.28. This technology path has been mostly led by the progress of the Blue Gene architecture and has shown an individualized RoC of 2.34%. Nonetheless, this fast advancement could not overcome the current performance gap with hybrid systems in the Exascale race.

4. Discussion

The analysis of technologies' RoCs makes it possible to forecast a date for achieving Exascale performance by three different approaches; however, it is worthwhile to examine these forecasts with consideration of the business environment and emerging technologies to anticipate the actual deployment possibilities of Exascale systems.

The optimistic forecast is that, as seen from the high performing Tianhe-2 and Titan Cray XK7 systems, there would be an Intel or AMD based system with Xeon Phi or NVidia coprocessors and a custom Cray interconnect system. However, given business realities, it is unlikely that the first Exascale system will use AMD processors. Intel purchased the Cray interconnect division and is expected to design the next generation Cray interconnect optimized for Intel processors and Xeon Phi coprocessors [57]. Existing technology trends and the changing business environment would
Table 6
Forecast results of Exascale supercomputer.

Hybrid system Multicore system Manycore system

Individualized rate of change (RoC) 1.022183 1.011872 1.023437


Forecasted arrival of Exascale supercomputer 2021.13 2031.74 2022.28

make a forecast a hybrid Exascale system with a Cray interconnect, industry concern that the nave forecast based on the past
Intel Processors and Xeon Phi coprocessors. performance curve may have to be adjusted. TFDEA is highly
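The arrival dates in Table 6 follow from compounding each segment's annual rate of change until the performance gap to the Exascale target is closed. The sketch below shows only that extrapolation step, not the underlying DEA super-efficiency computation; the function name and the 20% illustrative gap are our assumptions, while the three RoC values are taken from Table 6 (a RoC of 1.022183 corresponds to the 2.22% yearly improvement cited in the text).

```python
import math

def years_to_target(efficiency_gap, roc):
    """Years needed to close a given (super-)efficiency gap when the
    frontier improves by a constant annual rate of change (RoC)."""
    return math.log(efficiency_gap) / math.log(roc)

# Individualized RoCs from Table 6
roc_hybrid, roc_multicore, roc_manycore = 1.022183, 1.011872, 1.023437

# Illustrative gap: a target 20% beyond today's frontier (assumed value)
gap = 1.20
for name, roc in [("hybrid", roc_hybrid),
                  ("multicore", roc_multicore),
                  ("manycore", roc_manycore)]:
    print(f"{name}: {years_to_target(gap, roc):.1f} years")
```

For the same gap, the multicore segment's 1.19% yearly RoC takes roughly twice as long to close it as the ~2.2–2.3% RoCs of the hybrid and manycore segments, which is why its forecasted arrival (2031.74) trails the other two by about a decade.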
The 2.22% yearly improvement for hybrid systems has come mostly from a combination of advances in Cray systems, such as their transverse cooling system, Cray interconnects, AMD processors and NVidia coprocessors. It is difficult to determine the contribution of each component; however, it is worth noting that only Cray systems using AMD processors were SOAs. This implies that Cray's improvements are the highest contributor to the RoC for AMD based systems. Intel moving production of Cray interconnect chips from TSMC to Intel's more advanced processes will likely result in additional performance improvement. Thus, we can expect that an Intel/Cray collaboration will result in a RoC greater than the 2.22% and might reach the Exascale goal earlier.

As another possibility of achieving Exascale systems, IBM's Blue Gene architecture using the IBM Power(PC) processor with custom interconnects has shown a 2.34% yearly improvement building on the 3rd highest rated Sequoia system. The Blue Gene architecture, with high bandwidth, low latency interconnects and no coprocessors to consume bandwidth or complicate programming, is an alternative to the coprocessor architectures being driven by Intel and AMD. Given their stable business environment they may be more effective moving forward while Intel/Cray work out their new relationship.

Who has the system experience to build an Exascale system? Cray, IBM and Appro have built the largest SOA OEM systems. In 2012, Cray purchased Appro, leaving two major supercomputer manufacturers [58]. Based upon the captured RoCs and the business changes we can expect that the first Exascale system will be built by either Cray or IBM.

Data driven forecasting techniques, such as TFDEA, make a forecast on technical capabilities based upon released products, so emerging technologies that are not yet being integrated into products are not considered. In the supercomputer academic literature, there is an ongoing debate about when the currently dominating large core processors (Intel, AMD) will be displaced by larger numbers of power-efficient, lower performance small cores such as ARM; much as what happened when microprocessors displaced vector machines in the 1990s and ARM based mobile computing platforms are affecting both Intel and AMD X86 desktop and laptop sales [13,18]. Although there is no ARM based supercomputer in the TOP500 yet, the European Mont-Blanc project is targeting getting one on the list by 2017 and NVidia is developing an ARM based supercomputer processor for use with its coprocessor chips [59]. Small cores are a potentially disruptive technology as far as power efficiency is concerned; therefore, further analysis is needed to investigate when they will overcome the challenges of building interconnects to handle a larger number of smaller cores, or when software developers will overcome the synchronization challenges of effectively using more cores.

5. Conclusion

The HPC industry is experiencing a radical transition which requires improvement of power efficiency by a factor of 23 to deploy and/or manage the Exascale systems. This has created an industry concern that the naïve forecast based on the past performance curve may have to be adjusted. TFDEA is highly relevant in this context, especially to deal with multiple tradeoffs between systems' attributes. This study examined comparative prospects of three competing technology alternatives with various design possibilities, considering the business environment, to achieve Exascale computing so that researchers and manufacturers can have an accurate view on their development targets. In sum, the results showed that the current development target of 2020 might entail technical risks considering the rate of change toward energy efficiency observed in the past. It is anticipated that either Cray built hybrid systems using Intel processors or an IBM built Blue Gene architecture system using PowerPC processors will likely achieve the goal between early 2021 and late 2022.

In addition, the results provided a systematic measure of technological change which can guide a decision on new product target setting practice. Specifically, the rate of change contains information not only about how much performance improvement is expected to be competitive but also about how much technical capability should be relinquished to achieve a specific level of technical capabilities in other attributes. One can also utilize this information to anticipate possible disruptions. As shown in the HPC industry, the rate of change of the manycore system was found to be faster than that of the hybrid system. Although the arrival of a hybrid Exascale system is forecasted earlier because of its currently surpassing level of performance, the faster rate of change of the manycore system implies that the performance gap could be overcome, and the Blue Gene architecture might accomplish the Exascale goal earlier if hybrid system development could not keep up with the expected progress. Furthermore, the benchmark results can provide information about market segments relevant to the new technology alternatives. This makes it possible to identify competitors as well as dominant designs in a certain segment under which new product developers may be searching for market opportunities.

A new approach presented in this study can take segmented rates of change into account. Unlike traditional TFDEA relying on a constant rate of change, the presented model enables one to obtain variable rates of change for each product segment, thereby either estimating technical capabilities of products at a certain point in time or forecasting the time by which desired levels of products will be operational.

Lastly, as an extension of this study, future research can consider:

• imposing weight restrictions to assess technologies in line with practical views,
• elaborating disruption possibilities from small core systems,
• incorporating external factors that can either stimulate or constrain the technological progress.

Acknowledgement

The authors would like to thank Dr. Wilfred Pinfold for his insightful comments on an earlier draft, and the Associate Editor and three anonymous reviewers for their thoughtful and constructive suggestions during the review process.
References

[1] Lim DJ, Kocaoglu DF. China – Can it move from imitation to innovation? In: Technol. manag. energy smart world (PICMET); 2011 proc. PICMET'11, n.d. p. 1–13.
[2] Feitelson DG. The supercomputer industry in light of the Top500 data. Comput Sci Eng 2005;7:42–7.
[3] Dongarra J. Visit to the National University for Defense Technology Changsha, China. Tennessee; 2013.
[4] National Research Council. Getting up to speed: the future of supercomputing. 1st ed. Washington, D.C.: National Academies Press; 2005.
[5] Dongarra J, Meuer H, Strohmaier E. TOP500 supercomputer sites. Supercomputer 1997;13:89–111.
[6] Peckham M. RIKEN plans exascale supercomputer 30 times faster than today's fastest in six years. Time 2013.
[7] Ashby S, Beckman P, Chen J, Colella P, Collins B, Crawford D, et al. The opportunities and challenges of Exascale computing. U.S. Dep. Energy Office Sci.; 2010. p. 1–71.
[8] Simon H, Zacharia T, Stevens R. Modeling and simulation at the exascale for energy and the environment. Dep. Energy Tech. Rep.; 2007.
[9] Hsu C-H, Kuehn JA, Poole SW. Towards efficient supercomputing. In: Proc. third jt. WOSP/SIPEW int. conf. perform. eng. – ICPE'12, ACM Press, New York, New York, USA; 2012. p. 157.
[10] Kamil S, Shalf J, Strohmaier E. Power efficiency in high performance computing. In: 2008 IEEE int. symp. parallel distrib. process., IEEE; 2008. p. 1–8.
[11] Robert F. Who will step up to exascale? Science 2013:264–6.
[12] Geller T. Supercomputing's exaflop target. Commun ACM 2011;54:16.
[13] Simon H. Why we need exascale and why we won't get there by 2020. In: Opt. interconnects conf., Lawrence Berkeley National Laboratory, Santa Fe; 2013.
[14] Owens JD, Houston M, Luebke D, Green S, Stone JE, Phillips JC. GPU computing. Proc IEEE 2008;96:879–99.
[15] Li S, Ahn J-H, Strong RD, Brockman JB, Tullsen DM, Jouppi NP. McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures. In: 42nd annu. IEEE/ACM int. symp. microarchitecture (MICRO-42), IEEE, New York; 2009. p. 469–480.
[16] Panziera JP. Five challenges for future supercomputers. Supercomputers 2012:38–40.
[17] Vuduc R, Czechowski K. What GPU computing means for high-end systems. IEEE Micro 2011:74–8.
[18] Rajovic N, Carpenter P, Gelado I, Puzovic N, Ramirez A. Are mobile processors ready for HPC? In: Supercomput. conf., Barcelona Supercomputing Center, Denver; 2013.
[19] Bogetoft P, Otto L. Benchmarking with DEA, SFA, and R. 1st ed. New York: Springer; 2010.
[20] Giokas D. Bank branch operating efficiency: a comparative application of DEA and the loglinear model. Omega 1991;19:549–57.
[21] Liu JS, Lu LYY, Lu W-M, Lin BJY. A survey of DEA applications. Omega 2013;41:893–902.
[22] Thanassoulis E, Boussofiane A, Dyson RG. A comparison of data envelopment analysis and ratio analysis as tools for performance assessment. Omega 1996;24:229–44.
[23] Anderson TR, Hollingsworth K, Inman L. Assessing the rate of change in the enterprise database system market over time using DEA. In: PICMET'01. Portl. int. conf. manag. eng. technol.; 2001. p. 203.
[24] Cole BF. An evolutionary method for synthesizing technological planning and architectural advance. Georgia Institute of Technology; 2009.
[25] Tudorie AA. Technology forecasting of electric vehicles using data envelopment analysis. Delft Univ Technol; 2012.
[26] Durmusoglu A, Dereli T. On the technology forecasting using data envelopment analysis (TFDEA). In: Technol. manag. energy smart world (PICMET), 2011 proc. PICMET'11, PICMET, Portland, U.S.; 2011.
[27] Kim J-O, Kim J-H, Kim S-K. A hybrid technological forecasting model by identifying the efficient DMUs: an application to the main battle tank. Asian J Technol Innov 2007;15:83–102.
[28] Lim D-J, Anderson TR, Kim J. Forecast of wireless communication technology: a comparative study of regression and TFDEA model. In: Technol. manag. emerg. technol., PICMET, Vancouver, Canada; 2012. p. 1247–1253.
[29] Fried HO, Lovell CAK, Schmidt SS. The measurement of productive efficiency and productivity growth. 1st ed. New York: Oxford University Press; 2008.
[30] Nakabayashi K, Tone K. Egoist's dilemma: a DEA game. Omega 2006;34:135–48.
[31] Doyle JR, Green RH. Comparing products using data envelopment analysis. Omega 1991;19:631–8.
[32] Cook WD, Tone K, Zhu J. Data envelopment analysis: prior to choosing a model. Omega 2014;44:1–4.
[33] Alexander AJ, Nelson JR. Measuring technological change: aircraft turbine engines. Technol Forecast Soc Change 1973;5:189–203.
[34] Martino JP. Measurement of technology using tradeoff surfaces. Technol Forecast Soc Change 1985;27:147–60.
[35] Lim D-J, Anderson TR, Inman OL. Choosing effective dates from multiple optima in technology forecasting using data envelopment analysis (TFDEA). Technol Forecast Soc Change 2014;88:91–7.
[36] Lim D-J, Anderson TR. Improving forecast accuracy by a segmented rate of change in technology forecasting using data envelopment analysis. In: Infrastruct. serv. integr., PICMET, Kanazawa; 2014. p. 2903–2907.
[37] Lim D-J, Jahromi SR, Anderson TR, Tudorie AA. Comparing technological advancement of hybrid electric vehicles (HEV) in different market segments. Technol Forecast Soc Change 2014. http://dx.doi.org/10.1016/j.techfore.2014.05.008.
[38] Dongarra JJ, Luszczek P, Petitet A. The LINPACK benchmark: past, present and future. Concurr Comput Pract Exp 2003;15:803–20.
[39] Lim D-J, Anderson TR. An introduction to technology forecasting with a TFDEA Excel add-in. In: Technol. manag. emerg. technol. (PICMET), 2012 proc. PICMET'12; 2012.
[40] Lim D-J, Runde N, Anderson TR. Applying technology forecasting to new product development target setting of LCD panels. In: Klimberg R, Lawrence KD, editors. Adv bus manag forecast. 9th ed. Emerald Group Publishing Limited; 2013. p. 340.
[41] Anderson TR, Inman L. Resolving the issue of multiple optima in technology forecasting using data envelopment analysis. In: Technol. manag. energy smart world, PICMET; 2011. p. 1–5.
[42] Wood P. Cray Inc. replacing IBM to build UI supercomputer. News Gaz 2011:1.
[43] Lawrence L. Cray jumped from AMD for the flexibility and performance of Intel chips. Inquiry 2012.
[44] Murariu C. Intel signs deal with Cray, marks AMD's departure from Cray's supercomputers. Softpedia 2012.
[45] Wang F, Yang C-Q, Du Y-F, Chen J, Yi H-Z, Xu W-X. Optimizing Linpack benchmark on GPU-accelerated petascale supercomputer. J Comput Sci Technol 2011;26:854–65.
[46] Cole A. Intel and InfiniBand: pure HPC play, or is there a fabric in the works? IT Bus Edge 2012.
[47] InfiniBand strengthens leadership as the high-speed interconnect of choice. Mellanox Technol.; 2012.
[48] Whittaker Z. Intel buys QLogic's InfiniBand assets for $125 million. ZDNet 2012.
[49] IntelPR. Intel acquires industry-leading, high-performance computing interconnect technology and expertise. Intel Newsroom; 2012.
[50] Morgan T. Intel came a-knockin' for Cray super interconnects. Register 2012.
[51] Haring R, Ohmacht M, Fox T, Gschwind M, Boyle P, Christ N, et al. The IBM Blue Gene/Q compute chip. IEEE Micro 2011;32:48–60.
[52] Lazou C. Should I buy GPGPUs or Blue Gene? HPCwire; 2010.
[53] Feldman M. Blue Genes and GPU clusters top the latest Green500. HPCwire; 2011.
[54] Tashman LJ. Out-of-sample tests of forecasting accuracy: an analysis and review. Int J Forecast 2000;16:437–50.
[55] Kimihiko H. Exascale supercomputer project. Kobe; 2013.
[56] Tung L. After X86 servers, is IBM gearing up to sell its chips business too? ZDNet; 2014.
[57] Davis N. Cray CS300 cluster supercomputers to feature new Intel(R) Xeon Phi(TM) coprocessors x100 family. Cray Media 2013.
[58] Davis N. Cray to acquire Appro International. Cray Media 2012.
[59] Aroca RV, Gonçalves LMG. Towards green data centers: a comparison of x86 and ARM architectures power efficiency. J Parallel Distrib Comput 2012;72:1770–80.