Professional Documents
Culture Documents
Product Data
Yifan Sun, Nicolas Bohm Agostini, Shi Dong, and David Kaeli
Northeastern University
Email: {yifansun, agostini, shidong, kaeli}@ece.neu.edu
Abstract—Moore’s Law and Dennard Scaling have guided the products. Equipped with this data, we answer the following
semiconductor industry for the past few decades. Recently, both questions:
laws have faced validity challenges as transistor sizes approach
arXiv:1911.11313v1 [cs.DC] 26 Nov 2019
10
10 CPU 10
10 GPU
Transistors
Transistors
9 9
10 10
8 8
10 10
7 7
10 10
2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020
(a) Transistor count of CPUs. (b) Transistor count of GPUs.
Fig. 1. Moore’s Law is still valid for both CPUs and GPUs.
AMD NVIDIA
Density (Transistor per mm^2)
800
7
10
5
200
10
m
80 nm
65 nm
55 nm
45 nm
40 nm
32 nm
28 nm
22 nm
20 nm
16 nm
14 nm
12 nm
10 nm
7 nm
22 nm
18 nm
15 nm
13 nm
11 nm
90 nm
III. F INDINGS
A. Moore’s Law
device’s power consumption. Chips can easily consume more Researchers have been predicting the end of Moore’s Law
power than their TDP specification for a short periods of time. for a long time [5], [6], [10], [11]. However, Figure. 1
Chips may also maintain energy consumption under TDP by suggests the number of transistors on a chip is still increasing
shutting down parts of the circuit. However, TDP can be a exponentially over time. For CPUs, we lack critical data
reasonable estimation of the average power consumption when from the years 2014 to 2017 because Intel stopped releasing
a device runs regular workloads at its base clock. transistor count and die size data since their 7th generation
Core CPU. However, AMD restarted to produce high-end
Although we have most of the product data, we miss the CPUs with large die-size recently. We can observe that the
critical data of CPUs produced by Intel since 2014. Intel CPU transistor scaling trend is continuing to follow the pre-
stopped revealing its CPU transistor counts and the die sizes 2014 trend. Also, Figure. 1 suggests that vendors tend to
since the 7th generation Core CPU. Therefore, when present- use new CMOS technologies in high-end products first. Low-
ing detailed studies that are related to transistor counts and end products may continue to use an older version of the
die sizes (Figure. 2, 3, 4), we focus on the GPU data. Also, in CMOS process when the high-end products switch to a new
these analyses, we only show devices that are manufactured by technology node.
the Taiwan Semiconductor Manufacturing Company (TSMC). The next question we need to ask is, ”What drives increases
We select TSMC because TSMC has produced the highest in transistor count?” In Figure. 2, we see that transistor scaling
number of chips in our dataset. We only consider chips from is playing an essential role in increasing the die density (num-
a single foundry since different foundries tend to use different ber of transistors per area). For the devices produced within a
measurements when computing their process sizes. specific process size, the variance in terms of transistor density
3
Watt Per Billion Transistors 10 1.0
0.8 1500
m
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
nm
7n
7n
7n
90
80
65
55
40
32
28
16
14
12
90
80
65
55
40
32
28
16
14
12
90
80
65
55
40
32
28
16
14
12
(a) Watt per billion transistor vs. process size. (b) Watt per mm^2 vs. process size. (c) Base clock vs. process size.
Fig. 4. Process Size vs. (a) Energy consumption per billion transistor. (b) Energy consumption per area. (c) Base clock speed.
Vendor AMD NVIDIA other nodes. The most recent 7nm technology has an unusually
Die Size 300.0 600.0 900.0
high energy density, exceeding the 98 percentile of the energy
density of all other nodes. It is still too early to conclude
10
8 the end of Dennard Scaling at the 7nm node since there have
been only eight 7nm GPUs produced at the time of this report.
We expect chip manufacturers to develop solutions to improve
energy efficiency.
FLOPS per Watt
8
16
7
14
6
12
10 5
TFLOPS
8 4
6 3
4 2
2 1
0 0
2008 2010 2012 2014 2016 2018 2020 2008 2010 2012 2014 2016 2018 2020
(a) Single-precision performance. (b) Double-precision performance.
TDP 0.0 150.0 300.0 450.0 Base Clock 0.0 600.0 1200.0 1800.0 Die Size 0.0 300.0 600.0 900.0
10 10 10
10 10 10
Single Precision FLOPS
9 9 9
10 10 10
8 8 8
10 10 10
7 7 7
10 10 10
2008 2010 2012 2014 2016 2018 2020 2008 2010 2012 2014 2016 2018 2020 2008 2010 2012 2014 2016 2018 2020
(a) GPU TDP vs. FLOPS. (b) GPU base clock vs. FLOPS. (c) GPU die size vs. FLOPS.
Fig. 7. The trend of GPU Performance, TDP, base clocks, and die sizes.
GPU (NVIDIA GeForce RTX 2080). It can also deliver more GPUs, the ratio is usually 2:1. Interestingly, for GPUs of the
than half of the computing power of the most powerful GPU same product series, the ratio is not fixed. In recent years, with
(an NVIDIA Tesla V100). The increase in CPU computing the development of new CPU technologies, CPUs can have a
power is a combined result of recently developed CPU SIMD much higher double-precision performance than many GPUs.
instructions extensions (e.g., SSE and AVX), an increased We suggest that users check the specifications of their CPUs
number of cores, and an increased CPU frequency. The and GPUs before using GPUs for double-precision computing.
reduction in the CPU-GPU performance gap suggests that we
should not ignore the CPU computing power when considering D. GPU Performance
CPU-GPU heterogeneous computing applications [3], [9]. GPU performance is rapidly increasing. In the year 2019,
GPUs tend to have very different performance on double- the GPU with the smallest die size and consumed the least
precision computing. A few GPUs (e.g., the NVIDIA Tesla amount of energy has higher performance than the flagship
V100) have much higher performance than CPUs, while other GPU of 2007. Here, we analyze the factors that drive perfor-
GPUs struggle to be faster than CPUs in double-precision mance improvements of GPUs.
computing. GPUs have very different double-precision com- According to Figure. 7 (a), the high-end GPUs always
puting capabilities because GPU vendors design GPUs for dif- consume more energy compared to lower-end GPUs released
ferent markets. For gaming-oriented GPUs, the ratio between around the same time. Over time, the TDP of flagship GPUs
the number of single-precision units and double-precision units are increasing from around 150W to around 300W. High-TDP
is usually 32:1. For high-performance computing-oriented GPUs (≈300W) start to appear around 2010-2012. Since then,
the increases in TDP have slowed as the TDP approaches We also compared the performance of CPUs and GPUs.
thermal dissipation limits imposed by current air or water GPUs can still deliver a much higher single-precision com-
cooling solutions. The high-end GPUs commonly have a puting power than consumer-level CPUs. However, recent
300W TDP. Only 3 AMD GPUs exceed the 300W TDP. development in parallel CPU computing has challenged the
We can also observe in Figure. 7 (a) that the power required GPU’s dominance. The gap between a CPU’s and a GPU’s
to drive the same TFLOPS performance halves every three computing capabilities was becoming smaller.
to four years. Newer devices have been consistently able to Finally, we analyzed the factors that drove up the perfor-
deliver more performance within the same power budget every mance improvement of GPUs. Frequency increases are the
year. If this trend continues, in 2020, we will see devices that most critical factor that improved GPU performance. Die size
consume less then 200W delivering more than 10 TFLOPS in and TDP increases also contributed to GPU performance im-
performance. provements. We concluded that the energy efficiency (FLOPS
The frequency increase is the most important factor that per Watt) of GPUs doubles approximately every three to four
drives the performance improvement of GPUs. The frequency years.
of GPUs tripled from around 600MHz to 1.8GHz (Figure. 7
R EFERENCES
(b)). The biggest jump happened around the year 2016 when
NVIDIA released its Pascal GPUs. To be specific, the GTX- [1] Akhil Arunkumar, Evgeny Bolotin, Benjamin Cho, Ugljesa Milic, Eiman
Ebrahimi, Oreste Villa, Aamer Jaleel, Carole-Jean Wu, and David
980 only runs at 1,216MHz frequency, while the GTX-1080 Nellans. Mcm-gpu: Multi-chip-module gpus for continued performance
runs at a frequency of 1,733MHz. scalability. In ACM SIGARCH Computer Architecture News, volume 45,
Another trend in GPU frequency is that high-end and low- pages 320–332. ACM, 2017.
[2] Robert H Dennard, Fritz H Gaensslen, V Leo Rideout, Ernest Bassous,
end GPUs released each year tend to use similar frequencies. and Andre R LeBlanc. Design of ion-implanted mosfet’s with very small
According to Figure. 7 (b), the frequency across devices physical dimensions. IEEE Journal of Solid-State Circuits, 9(5):256–
targeting different market segments varies based on the time 268, 1974.
[3] Juan Gómez-Luna, Izzat El Hajj, Li-Wen Chang, Vı́ctor Garcı́a-Floreszx,
when the GPU is released. Simon Garcia De Gonzalo, Thomas B Jablin, Antonio J Pena, and Wen-
Finally, as observed in Figure. 7 (c), the GPU die size is mei Hwu. Chai: Collaborative heterogeneous applications for integrated-
also increasing. The increased die size allows more transistors architectures. In 2017 IEEE International Symposium on Performance
Analysis of Systems and Software (ISPASS), pages 43–54. IEEE, 2017.
to reside in a single GPU. Note that the increases are coupled [4] Mohammad Khavari Tavana, Yifan Sun, Nicolas Bohm Agostini, and
together with the reduction in the process size. Increasing die David Kaeli. Exploiting adaptive data compression to improve per-
sizes reduce yield (the percentage of total manufactured chips formance and energy-efficiency of compute workloads in multi-gpu
systems. In 2019 IEEE International Parallel and Distributed Processing
that can pass validation tests). As the yield decreases, large die Symposium (IPDPS), pages 664–674. IEEE, 2019.
sizes push up the cost and price of GPU chips. Since CPUs [5] Laszlo B Kish. End of moore’s law: thermal (noise) death of integration
have already started to use MCM packaging technologies to in micro and nano electronics. Physics Letters A, 305(3-4):144–149,
2002.
mitigate this problem, we believe that a MCM-GPU package [6] Charles C Mann. The end of moore’s law? Technology Review,
is also a natural solution for GPUs. 103(3):42–42, 2000.
[7] John Nickolls and William J Dally. The gpu computing era. IEEE micro,
IV. C ONCLUSION 30(2):56–69, 2010.
[8] Robert R Schaller. Moore’s law: past, present and future. IEEE spectrum,
In this paper, we collected data from more than 4000 CPU 34(6):52–59, 1997.
and GPU products. With this data, we draw observations about [9] Yifan Sun, Xiang Gong, Amir Kavyan Ziabari, Leiming Yu, Xiangyu
Moore’s Law and Dennard Scaling, in terms of their general Li, Saoni Mukherjee, Carter McCardwell, Alejandro Villegas, and David
Kaeli. Hetero-mark, a benchmark suite for cpu-gpu collaborative
validity. Transistor scaling clearly plays an essential role in computing. In 2016 IEEE International Symposium on Workload
increasing the number of transistors on a single chip and Characterization (IISWC), pages 1–10. IEEE, 2016.
reducing the energy consumption per transistor. Architectural [10] Thomas N Theis and H-S Philip Wong. The end of moore’s law: A
new beginning for information technology. Computing in Science &
solutions, such as MCM packaging, DVFS, and clock gating, Engineering, 19(2):41, 2017.
have become increasingly important in maintaining the historic [11] M Mitchell Waldrop. The chips are down for moores law. Nature News,
scaling trends. 530(7589):144, 2016.