Yuksel 2020
Chris Marroquin
IBM Corporation
2800 37th St. NW
Rochester, MN 55901, USA
Authorized licensed use limited to: University of Glasgow. Downloaded on November 01,2020 at 16:25:39 UTC from IEEE Xplore. Restrictions apply.
discussed and the effect of the size of the microchannel heat sink design on overall thermal performance has been investigated [12, 13]. Also, pumping and flow rate requirements for microchannel heat sink design were characterized [14]. However, there are very few studies on flexible coldplate designs that can cool multiple devices while tolerating height differences between closely spaced high power electronic components with high reliability. Thus, in this study we investigate the thermal performance of different flexible coldplate assemblies for optimal system level thermal and packaging design.

WATER COOLING DESIGN OPTIONS
Air cooling has traditionally been applied to cool servers. Nevertheless, the increasing complexity and power of high power electronics such as CPUs and GPUs make air cooling less favorable due to its lower cooling capacity. Water cooling, on the other hand, allows for higher power density. Thus, water cooling through a coldplate assembly can be designed to cool high power components effectively as the power of electronic components increases dramatically. As the percentage of heat dumped to water is also very important for overall energy efficiency in the data center, air cooling via fans or blowers at speeds much lower than the acoustic limit can be implemented to cool the low power electronic components on the board, in addition to the water cooling design aimed at the high power electronic components. Preheat from the high power electronic components generally creates a challenge at the back section of the server, which affects the location of the components on the board, the overall design, etc. for purely air
configuration consisting of 2 IBM POWER CPUs (300 W each) and 6 GPUs (400 W each). Five different cases shown in Fig. 2 are analyzed using a 1D resistive network analysis to understand the CPU and GPU temperatures. In all cases, a fixed flowrate (~1 gpm), a warm water inlet temperature (40 °C) and measured coldplate thermal resistance data are used. Case 1 shows the serial combination of the CPUs and GPUs. This case has the advantage of the highest flowrate to each coldplate, but the disadvantage of the greatest preheat to the final GPU coldplate. There is 8.3 °C of bulk heating from the 1800 W of preheat upstream of the last GPU, and the last GPU reaches 78 °C. Case 2 illustrates the two parallel path approach, with each path serving a CPU and 2 GPUs. Around 8.3 °C of bulk heating is observed for the last GPU, which reaches 77.5 °C. Case 5 shows the parallel combination of the CPUs and GPUs. This case has the disadvantage of the lowest flowrate to each coldplate, but the advantage of no preheat to any coldplate; it is thus the opposite of Case 1. This results in the lowest last-GPU temperature, 77.3 °C, which is the thermally desired option.

[Fig. 2 panels: CASE I: CPU CPU GPU GPU GPU GPU; CASE II: CPU GPU GPU / CPU GPU GPU; further panels of CPU and GPU blocks]
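The trade-off between preheat and per-coldplate flowrate described for Cases 1 and 5 can be made concrete with a small sketch. It assumes, as a simplification not spelled out in the text, that the last GPU temperature equals the inlet temperature plus the upstream bulk heating plus an external resistance multiplied by the GPU power. The function names and this lumped form are illustrative assumptions; the numerical inputs are the values quoted above.

```python
# Hypothetical lumped form of the 1D resistive network (an assumption,
# not necessarily the exact network used in the paper):
#   T_last_gpu = T_inlet + dT_bulk_upstream + R_ext * Q_gpu

def last_gpu_temperature(t_inlet_c, dt_bulk_c, r_ext_c_per_w, q_gpu_w):
    """Last-GPU temperature in deg C under the assumed lumped model."""
    return t_inlet_c + dt_bulk_c + r_ext_c_per_w * q_gpu_w


def implied_r_ext(t_last_gpu_c, t_inlet_c, dt_bulk_c, q_gpu_w):
    """External resistance (deg C/W) implied by a reported GPU temperature."""
    return (t_last_gpu_c - t_inlet_c - dt_bulk_c) / q_gpu_w


T_INLET_C = 40.0   # warm water inlet temperature quoted in the text
Q_GPU_W = 400.0    # power of one GPU quoted in the text

# Case 1 (serial): 8.3 deg C of upstream bulk heating, last GPU at 78 deg C.
r_case1 = implied_r_ext(78.0, T_INLET_C, 8.3, Q_GPU_W)
# Case 5 (fully parallel): no preheat, last GPU at 77.3 deg C.
r_case5 = implied_r_ext(77.3, T_INLET_C, 0.0, Q_GPU_W)

print(f"Case 1 implied R_ext: {r_case1:.4f} degC/W")
print(f"Case 5 implied R_ext: {r_case5:.4f} degC/W")
```

Under this assumed form, the fully parallel case implies a higher external resistance than the serial case, which is what offsets its lack of preheat and leaves the two last-GPU temperatures within about 1 °C of each other.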
However, due to the low flowrate (0.167 gpm) per path, the total external resistance on the CPU and GPU coldplate assembly reaches its maximum value of 0.093 °C/W. This could bring additional thermal and flow related challenges and a high thermal penalty for part-to-part coldplate variations. As such, a very low flowrate is typically not desired for long-term reliability and optimum thermal performance.

Table 1. Thermal performance of analyzed water cooling cases
[Column headings: Case; Q upstream of last GPU (W); Flowrate to last GPU (gpm); Flowrate to last GPU (lpm); ΔTbulk to last GPU (°C); ΔTbulk for last GPU (°C); Total ΔTbulk (°C); Rext (°C/W); Tinlet (°C); Tlast GPU (°C)]

Case II: Parallel Flow Path Option
Fig. 4 depicts the parallel flow path option. In the parallel flow option, each of the two parallel paths consists of a CPU and a 2x GPU coldplate, which has an advantage in pressure drop compared to the serial flow path option. Moreover, the thermal penalty due to bulk heating is expected to be smaller on the last GPU when using the 2x flexible coldplate. However, cost is expected to be less favorable due to the additional copper tubes and mechanical assembly components. Hose connections and cable management might create more spacing issues in the parallel flow option than in the serial flow option. However, contacting 4 GPUs with a single 4x flexible coldplate in the serial flow option is more challenging than contacting 2 GPUs in the parallel flow option. In general, leaving as many mechanical components as possible untouched in the GPU assemblies is desired for better serviceability and reliability.
The pressure drop within the coldplate for different flowrates is illustrated in Fig. 6. The total pressure drop within the coldplate at different flowrates comes mainly from the overall
mechanical design of the coldplate, the inlet and outlet diameter of the hose, the mechanical structure of the barb and riser, the necking regions in the coldplate and the microchannel design. Thus, the motivation of this work is to further improve the coldplate structure for optimal thermal performance of high-end server design.

Fig. 7. Experimental Demonstration of Serial Flow Path with 3x Coldplate

[Pie chart labels: 3x Flexible GPU Cold Plate 32%; Elsewhere (7 mm Copper Pipe, Hoses, etc.) 35%]

The 3x flexible coldplate accounts for approximately 1/3rd of the overall total pressure drop. Thus, it is important to understand the effect of the Barb and Riser and the necking regions on the overall pressure drop within the coldplate structure.

[CFD model labels: Inlet, Outlet, Barb, Riser, Cold plate, Micro-Fin, TIMs, Volta GPUs with FETs, Inductors and HBM; axes x, y, z]
Fig. 9. CFD Model of 3x Coldplate with 1/4" ID Barb & Riser Assemblies on GPU Assembly
[Figure/table residue: (A) Flexible 2x Coldplate Assembly with Increased Neck Region; Each Neck Region Length: 16.5 mm, 16.5 mm, 43 mm]
[Figure labels for Figs. 11 and 13: Barb, Riser, Inlet, Micro-Fin, Cold Plate & GPU Assembly; Speed and Pressure color scales (Min, Max); axes x, y]
Fig. 11. CFD Results of 3x Coldplate: (A) Pressure and (B) Speed Distribution

[Fig. 12 plot: Pressure (Psi) versus Distance (mm), with the 1st, 2nd and 3rd Microchannel regions marked]
Fig. 12. Pressure Drop within 3x Coldplate with 1/4" ID Barb and Riser Assembly

As there is an area change at the flow turn, circulation affects the flow characteristics at such a high ReD number. The gradual enlargement at the inlet Barb section, where the ratio of the Barb to Riser diameter is approximately 0.37, results in a sudden area change, and this sudden area change leads to pressure loss. The flow at the outlet section does not encounter similar effects, which results in a lower pressure drop at the outlet section. For instance, a gradual contraction at the outlet section typically has a lower loss coefficient than a gradual enlargement at the inlet section. This pressure drop difference between the inlet and outlet sections is found to be ~0.15 psi from Macroflow analysis. Also, gravity acting on the water flow within the Barb assembly has a component along the flow direction which, from conservation of momentum along the flow direction, increases the pressure drop, although this effect is not significant because the Barb length is short. At the outlet section the effect is the opposite, which results in a lower pressure drop. So, about 0.47 psi, almost twice the pressure drop at the outlet Barb & Riser itself, is observed due to those effects. Thus, the inlet Barb and Riser design in particular, as well as the critical section on the coldplate, which is the neck region, need to be further understood to improve the hydrodynamic characteristics.

Fig. 13. Inlet Flow Section of Barb & Riser Flow Characteristics: (A) Speed Distribution (B) Pressure Drop

It is also observed that the highest pressure drop occurs within the 2nd (middle) microchannel. Very high speed flow from the smaller neck region enters the 2nd microchannel. Thus, there is a sudden velocity change at the inlet and exit sections of the 2nd microchannel, because the 2nd microchannel lies between the two neck regions. However, the flow distribution at the inlet and outlet of the 1st microchannel region does not create such a sudden velocity change, which results in the lowest pressure drop at the 1st microchannel. High speed flow also enters the 3rd microchannel right after passing the 2nd neck region, but it exits through the riser section, where the flow characteristics are observed to be better than in the neck region due to the geometrical differences between the riser and the neck region. Thus, the pressure drop is observed to be lower in the 3rd microchannel than in the 2nd microchannel but higher than in the 1st microchannel.
Fig. 14 illustrates the percentage of the pressure drop per assembly component across the coldplate. The neck regions and the inlet Barb & Riser assembly are observed to cause the largest pressure drops, with more than 20 % of the total pressure drop across the inlet and outlet sections of the coldplate. The pressure drops in the microchannels do not contribute significantly to the overall pressure drop within the coldplate assembly. Due to the sudden area change at the neck regions (neck cross-sectional area << fin open area), the flow does not fully recover after leaving the neck region, which results in high frictional loss. Thus, the design of the neck region needs to be improved in order to reduce the pressure drop there.
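To give a sense of the "high ReD" regime mentioned above, the following sketch estimates the Reynolds number in a circular inlet barb at 3 lpm. The water properties (roughly those at 40 °C), the treatment of the barb as a straight circular tube of 1/4 in (6.35 mm) ID, and the function names are assumptions for illustration, not values taken from the CFD model.

```python
import math

LITERS_PER_US_GALLON = 3.785411784


def lpm_to_gpm(flow_lpm):
    """Convert litres per minute to US gallons per minute."""
    return flow_lpm / LITERS_PER_US_GALLON


def lpm_to_m3_per_s(flow_lpm):
    """Convert litres per minute to cubic metres per second."""
    return flow_lpm / 1000.0 / 60.0


def reynolds_number(flow_lpm, diameter_m,
                    density_kg_m3=992.2,      # assumed: water near 40 degC
                    viscosity_pa_s=6.53e-4):  # assumed: water near 40 degC
    """Re_D = rho * V * D / mu for flow in a circular tube."""
    area_m2 = math.pi * diameter_m ** 2 / 4.0
    velocity_m_s = lpm_to_m3_per_s(flow_lpm) / area_m2
    return density_kg_m3 * velocity_m_s * diameter_m / viscosity_pa_s


flow_lpm = 3.0
re_quarter_inch = reynolds_number(flow_lpm, 0.00635)  # 1/4 in ID, assumed

print(f"{flow_lpm} lpm = {lpm_to_gpm(flow_lpm):.2f} gpm")
print(f"Re_D at 1/4 in ID: {re_quarter_inch:.0f}")
```

With these assumed properties the estimate is on the order of 1.5 × 10^4, well above the range usually associated with transition in pipe flow. For a fixed volumetric flowrate, Re_D in a circular tube scales inversely with diameter, so enlarging the barb lowers it.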
[Fig. 14 pie chart: Inlet Barb & Riser 23%; 1st Microchannel 5%; 1st Neck Region 19%; 2nd Microchannel 13%; 2nd Neck Region 23%; 3rd Microchannel 10%; Outlet Barb & Riser 7%]
Fig. 14. Percentage of Pressure Drop per Assembly Component Across the 3x Coldplate with 1/4" ID Barb and Riser Assembly

b. Design II: Barb & Riser Inlet Diameter = 6.35 mm (3/8" ID), Flowrate = 3 lpm
The impact of increasing the hose ID and inlet barb diameter to reduce the overall pressure drop across the coldplate assembly was simulated. A 6.35 mm (3/8" ID) inlet diameter hose is used, based on industry standards and packaging limitations, to understand the hydrodynamic characteristics. The ReD number is ~1.7 times lower than in design I owing to the area increase at the Barb section.
The pressure drop at 3 lpm (~0.79 gpm) is found to be 2.09 psi within the coldplate assembly from CFD modeling. The pressure drop is ~0.17 psi through the 1st microchannel, ~0.43 psi through the 2nd microchannel and ~0.30 psi through the 3rd microchannel. Moreover, pressure drops of ~0.25 psi and ~0.30 psi are observed at the 1st and 2nd neck regions, respectively. Also, pressure drops of about 0.43 psi and 0.21 psi are found at the 3/8" ID inlet and outlet Barb & Riser assemblies, respectively. The percentage of pressure drop per assembly component across the coldplate is illustrated in Fig. 15.

[Fig. 15 pie chart: Inlet Barb & Riser 21%; 1st Microchannel 8%; 2nd Microchannel 21%; 2nd Neck Region 14%; 3rd Microchannel 14%; Outlet Barb & Riser 10%]

respectively which is also around 60 % of reduction at the neck region. Thus, the reduction in the total pressure drop results mainly from the pressure drop in the inlet Barb & Riser assembly and neck regions, and is about 1 psi from only increasing the Barb diameter by about 1.27 times. The flow constriction at the gap between the Barb and Riser structure is reduced compared to design I, which has a 1/4" Barb ID. With the gradual area enlargement also reduced by reducing the ratio of the inlet Barb ID to the riser width compared to design I, the flow characteristics are observed to be better in design II, which reduces the pressure drop from 0.7 psi to 0.43 psi at the inlet Barb and Riser assembly. Also, the pressure drop within the microchannels does not change significantly compared to design I, as expected given the same microfin structure and spacing and the same flowrate.
Thus, increasing the neck region area within the coldplate could result in a lower pressure drop at the same flow conditions, with similar or better thermal performance at the same flowrate, as observed earlier in Table 1.

2x Coldplate Design
a. Design III: Barb & Riser Inlet Diameter = 6.35 mm (3/8" ID), Flowrate = 3 lpm
To reduce the number of neck regions in the 3x coldplate assembly, a 2x coldplate could be designed. However, the dimensions of the coldplate assembly create a challenge within the high-end server due to closely spaced electronic components on the board, wiring and packaging issues. So, after investigating the mechanical assemblies near the coldplate and the GPU assembly, it was found that the height of the coldplate's neck region could be increased further, which results in approximately 2.35 times higher neck region area compared to the 3x coldplate assembly.

[Figure labels: 1st Microchannel (Flow Inlet Section); 2nd Microchannel (Flow Outlet Section); Pressure color scale (Max)]
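The Design II component values quoted in the text can be cross-checked against the reported 2.09 psi total and the percentages in Fig. 15. The sketch below simply sums the seven component pressure drops and converts each to a share of the total; the dictionary keys are illustrative labels, and the numbers are the rounded values given in the text.

```python
# Component pressure drops for Design II at 3 lpm, in psi, as quoted in the text.
design_ii_psi = {
    "inlet barb & riser": 0.43,
    "1st microchannel": 0.17,
    "1st neck region": 0.25,
    "2nd microchannel": 0.43,
    "2nd neck region": 0.30,
    "3rd microchannel": 0.30,
    "outlet barb & riser": 0.21,
}

total_psi = sum(design_ii_psi.values())

# Share of the total pressure drop per component, rounded to whole percent.
shares_pct = {
    name: round(100.0 * dp / total_psi)
    for name, dp in design_ii_psi.items()
}

print(f"total: {total_psi:.2f} psi")
for name, pct in shares_pct.items():
    print(f"{name}: {pct} %")
```

The sum reproduces the 2.09 psi total, and the rounded shares agree with the percentages that survive from Fig. 15, for example 21 % for the inlet Barb & Riser, 8 % for the 1st microchannel and 10 % for the outlet Barb & Riser.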
Fig. 16 illustrates the CFD modeling of 2x flexible coldplate
by about 2.6 times compared to each neck region in the 3x coldplate assembly. So, it is observed that the major loss term

[Figure labels: Barb, Riser, Inlet, Neck Region, Micro-Fin; Pressure color scale (Min, Max)]
Fig. 17. Inlet Barb & Riser Flow Pressure Drop Characteristics

Also, pressure drops of ~0.17 psi and ~0.39 psi are observed through the first and second microchannel sinks, respectively. It is observed that ~0.35 psi of pressure drop can be recovered in total at the neck regions by reducing the number of neck regions and increasing the neck region area by 2.35 times compared to the 3x coldplate assembly. This shows that about a 60 % reduction of the total pressure drop within the 3x coldplate neck regions can be achieved by designing a 2x coldplate assembly with a 3/8" ID Barb and Riser assembly.
The percentage of pressure drop per assembly component across the 2x coldplate with 3/8" ID Barb and Riser assembly is also shown in Fig. 18. As in the 3x coldplate assembly, the inlet Barb & Riser and the outlet section microchannel (the 2nd microchannel in the 2x coldplate assembly) cause the highest pressure drops across the 2x coldplate assembly.

[Fig. 18 pie chart labels: Inlet Barb and Riser 31%; 1st Microchannel 12%; Outlet Barb and Riser 15%; one further slice of 31%]

Fig. 19. Temperature Distribution in 2x Coldplate Assembly
As mentioned in Table 1 for Case III, a 3 parallel path option, in which the flow per path becomes 0.33 gpm at 1 gpm flowrate per server with the same impedance on each parallel path, can be implemented with the 2x coldplate design. Thus, a CFD simulation is also performed for 0.33 gpm per parallel path, and the maximum temperature on the downstream GPU is observed to be about 4 °C lower than the thermal design point of the 400 W GPU assembly (throttling point of the GPU main die: 83 °C), which implies that the theoretical and CFD results are in very good agreement. It is also important to mention that the GPU assembly could also throttle based on the HBM die if the chip stacking in the HBM die has a different internal thermal resistance due to higher memory bandwidth, which is not considered in this study.

CONCLUSION
In this paper, we investigate the optimal thermal performance of coldplate design by analyzing different design options, focusing mainly on serial and different parallel flow path options for high-end server design. It is observed that different
design options do not significantly affect the maximum temperature on the last GPU, which is the overall thermal design point. We show that modeling the Barb and Riser assembly used in the coldplate design, and the associated pressure drop and flow characteristics in the coldplate, is very important for design. Using a 3/8" ID Barb & Riser in the 3x coldplate assembly instead of a 1/4" ID Barb & Riser can lower the pressure drop from ~3.05 psi to ~2.09 psi at 3 lpm (~0.79 gpm). The pressure drop in the 2x coldplate design with a 3/8" ID Barb and 2.35 times increased neck region area is observed to be 1.4 psi from the CFD modeling at 3 lpm (~0.79 gpm). The inlet flow within the Barb assemblies is observed to be turbulent. At such a high ReD number, frictional losses are strongly affected by the relative roughness and the geometrical structure of the mechanical components and assembly. The inlet Barb & Riser assembly and the neck region design are found to be the most critical contributors to the overall pressure drop in the coldplate assemblies. Local circulation and flow constriction at the sudden area changes result in a very high pressure drop, which affects the overall hydrodynamic characteristics of the coldplate.

ACKNOWLEDGEMENT
The authors would like to acknowledge Chris Tuma for assisting the mechanical development of the coldplate assembly.

REFERENCES
[1] Salahuddin, S., Ni, K., & Datta, S. (2018). The era of hyper-scaling in electronics. Nature Electronics, 1(8), 442-450.
[2] Liao, X. K., Lu, K., Yang, C. Q., Li, J. W., Yuan, Y., Lai, M. C., ... & Shen, J. (2018). Moving from exascale to zettascale computing: challenges and techniques. Frontiers of Information Technology & Electronic Engineering, 19(10), 1236-1244.
[3] Fan, Z., Qiu, F., Kaufman, A., & Yoakum-Stover, S. (2004, November). GPU cluster for high performance computing. In Proceedings of the 2004 ACM/IEEE conference on Supercomputing (p. 47). IEEE Computer Society.
[4] Moldoveanu, A. (2008). Highly-scalable server for massive multi-player 3D virtual spaces based on multi-processor graphics cards. Annals of DAAAM & Proceedings, 899-901.
[5] Garimella, S. V., Singhal, V., & Liu, D. (2006). On-chip thermal management with microchannel heat sinks and integrated micropumps. Proceedings of the IEEE, 94(8), 1534-1548.
[6] Schmidt, R. (2004). Challenges in electronic cooling - opportunities for enhanced thermal management techniques - microprocessor liquid cooled minichannel heat sink. Heat Transfer Engineering, 25(3), 3-12.
[7] Tian, S., Takken, T., Schultz, M., Yao, Y., Coteus, P., Marroquin, C., O’Connell, K., Mahaney, H. V., Yuksel, A., & Ellsworth, M. (2019). A Single Flexible Coldplate Cools Multiple Devices. In 2019 18th IEEE Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems (ITherm) (pp. 1313-1320). IEEE.
[8] Yuksel, A., Mahaney, V., Marroquin, C., Tian, S., Hoffmeyer, M., Schultz, M., & Takken, T. Thermal and Mechanical Design of the Fastest Supercomputer of the World in Cognitive Systems: IBM POWER AC 922. In ASME 2019 International Technical Conference and Exhibition on Packaging and Integration of Electronic and Photonic Microsystems. American Society of Mechanical Engineers Digital Collection.
[9] Garimella, S. V., & Sobhan, C. B. (2003). Transport in Microchannels - A Critical Review. Annual Review of Heat Transfer, 13.
[10] Xuan, Y., & Li, Q. (2003). Investigation on convective heat transfer and flow features of nanofluids. J. Heat Transfer, 125(1), 151-155.
[11] Owhaib, W., & Palm, B. (2004). Experimental investigation of single-phase convective heat transfer in circular microchannels. Experimental Thermal and Fluid Science, 28(2-3), 105-110.
[12] Chiu, H. C., Jang, J. H., Yeh, H. W., & Wu, M. S. (2011). The heat transfer characteristics of liquid cooling heatsink containing microchannels. International Journal of Heat and Mass Transfer, 54(1-3), 34-42.
[13] Zhang, H. Y., Pinjala, D., Wong, T. N., Toh, K. C., & Joshi, Y. K. (2005). Single-phase liquid cooled microchannel heat sink for electronic packages. Applied Thermal Engineering, 25(10), 1472-1487.
[14] Garimella, S. V., & Singhal, V. (2004). Single-phase flow and heat transport and pumping considerations in microchannel heat sinks. Heat Transfer Engineering, 25(1), 15-25.