2022 International Symposium on VLSI Design, Automation and Test (VLSI-DAT) | 978-1-6654-0921-6/22/$31.00 ©2022 IEEE | DOI: 10.1109/VLSI-DAT54769.2022.9768082
Abstract— Large 3DIC designs with multiple chips require
several iterations of transient thermal analysis, particularly for
fine-grain on-chip dynamic thermal management. This requires
a fast thermal analysis technology, as opposed to traditional
CFD/FEA based methods, which have severe runtime/capacity
limitations for large chips (e.g., 2cmx2cm) in 3DIC while
generating fine-grained (e.g., 10umx10um) transient thermal
response. The fast transient thermal analysis is based on the idea
of combining the global, intermediate, and local transient
response curves generated from an ML-predictor. The local,
intermediate, and global transient response curves are scaled
based on the far-field and near-field transient decay surface
components, computed using the trained ML decay surface
predictor, followed by linear superposition of the curves
for each power value in the transient power profile to generate
the effective transient response curve. The runtime for
generating thermal results for a large chip is on the order of
minutes, with good accuracy correlation, compared to several
hours/days using CFD/FEA based tools. The fast transient
thermal solver is implemented on a distributed ML computing
platform for parallel computation of the transient thermal
inference model.

usually not on optimal locations as described in [2][3][4] for
SoC designs due to the use of a coarse mesh grid on-chip in
chip-package-system transient thermal simulation. However,
for the advanced 7nm/5nm designs such as the Vega 20 3DIC
from AMD, there are 64 on-chip thermal sensors in place as
shown in Fig. 2. This implies that an accurate transient thermal
simulation with a fine mesh grid is needed to optimize the
placement of on-chip thermal sensors.
Authorized licensed use limited to: University Town Library of Shenzhen. Downloaded on March 19,2024 at 10:45:55 UTC from IEEE Xplore. Restrictions apply.
voltage drops and timing [12][13]. Performing fine-grained
transient thermal analysis on large 3D IC designs is virtually
impossible using traditional CFD/FEA based solvers. To
address the above challenges, this paper proposes a novel fast
transient thermal solver which enables transient fine-grained
thermal analysis with extremely fast runtimes compared to
traditional FEA based thermal analysis.
In the following sections, we will describe the techniques
for fast transient thermal simulation as needed in
architecture-level thermal simulation as well as in layout-based
transient thermal simulation for 3D IC.
The paper is organized as follows. Section II describes the
problem statement. Sections III to VII explain the ML-based
Fast Transient Thermal Solver. Section VIII describes the
extension of the methodology to the 3D IC case. Section IX
discusses the results of thermal analysis based on the
methodology in the paper on a couple of example designs to
demonstrate the runtime and capacity improvements.

II. MOTIVATION AND PROBLEM DEFINITION
Traditional Finite Element Method (FEM) based numerical
simulations require prohibitively long run times even for
short simulation durations, making them unviable for
generating fine-grained transient thermal responses. Even
with distributed computing resources the run times will not
be good enough for such a transient thermal response
prediction.

Figure 4 Tile-based power map with 10μmx10μm resolution (left)
and thermal profile of a 4mmx3mm test chip with 800k instances
(right)

Figure 4 shows an example of fine-grid power and thermal
maps for a 4mm x 3mm test chip at a resolution close to 10μm.
The chip has about 800k instances. Using a fine mesh size
of 10μm on the silicon block, it took more than a couple of hours
to solve the chip+package+PCB model using FEA to generate
the steady state thermal response. Hence an FEA based method
is clearly not suitable for transient thermal response generation,
which usually takes much longer than a static thermal solve. A
recent paper proposed a Deep Neural Network (DNN)
based fast static thermal solver [8]. However, that work was
limited to performing only steady state thermal analysis. The
fast transient thermal solver methodology in this paper
generates fine-grained (10umx10um) on-chip transient
thermal responses with the time-varying on-chip power
profile as input. The methodology is flexible and can
be applied to smaller or larger mesh grids and any size of
chip.
Figure 5 shows an example transient power profile where the
average power changes with time. This average power can be
full chip power, per block power, or power at even finer
granularity such as cell-level instance. This transient power
will create a power map for the chip which will change with
time and hence the temperature will vary with time instead of
approaching a steady state value.

Figure 5 Transient power profile on each block of a chip

III. SOLUTION OVERVIEW
Figure 6 shows the flow overview of the fast transient thermal
solver. The inputs to this flow are the chip-pkg-system design
inputs, boundary conditions and the transient power profile.

Figure 6 Fast Transient Thermal Solver Flow Overview

A trained ML model is used to predict the decay surface
models that characterize the local, intermediate, and global
levels. The ML predictor and the decay surface models will
be described in later sections. The chip is partitioned into
NxM tiles of size k(um)xk(um) on which the transient power
is distributed to generate the fine-grained power map which
serves as the heat sources. With this transient power map the
transient temperature curves are generated for each tile
location using a convolution-based approach described later.
Finally, the transient temperature curves for each of
the tiles are linearly superimposed to generate the effective
transient response.

Figure 7 A 2-layer model (left) and a 3-layer die model (right)
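The tile partitioning step in the solution overview above can be illustrated with a minimal sketch. It assumes rectangular blocks with a uniform power density inside each block; the function names `build_power_map` and `transient_power_maps`, the block tuple layout, and the example numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# (x0_um, y0_um, width_um, height_um, power_w): hypothetical block description
Block = Tuple[float, float, float, float, float]


def build_power_map(blocks: List[Block], n_rows: int, n_cols: int,
                    tile_um: float) -> List[List[float]]:
    """Spread each block's power uniformly over the tiles it overlaps.

    Returns an n_rows x n_cols grid of per-tile power in watts.
    """
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    for x0, y0, w, h, power in blocks:
        density = power / (w * h)  # W per um^2, assumed uniform in the block
        for r in range(n_rows):
            # Overlap of the block with this tile row along y
            ty0, ty1 = r * tile_um, (r + 1) * tile_um
            oy = min(ty1, y0 + h) - max(ty0, y0)
            if oy <= 0:
                continue
            for c in range(n_cols):
                # Overlap of the block with this tile column along x
                tx0, tx1 = c * tile_um, (c + 1) * tile_um
                ox = min(tx1, x0 + w) - max(tx0, x0)
                if ox > 0:
                    grid[r][c] += density * ox * oy
    return grid


def transient_power_maps(blocks_over_time: List[List[Block]], n_rows: int,
                         n_cols: int, tile_um: float) -> List[List[List[float]]]:
    """One power map per time step of the transient power profile."""
    return [build_power_map(b, n_rows, n_cols, tile_um)
            for b in blocks_over_time]


# Example: 4x4 tiles of 10 um, one 20x20 um block dissipating 4 mW
pm = build_power_map([(10.0, 10.0, 20.0, 20.0, 0.004)], 4, 4, 10.0)
total = sum(sum(row) for row in pm)  # total power is conserved
```

Because power is assigned by overlap area, blocks do not need to be aligned to the tile grid, and the total power on the grid equals the total block power as long as the blocks lie inside the chip outline.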
It is worthwhile here to explain the die model used in this
work. For this work two die models were considered, a
2-layer silicon block model and a 3-layer die model as shown
in Figure 7. The 3-layer die model considers the thermal
models for the interconnect and insulation layers, making it
more accurate for thermal analysis, which is even more
applicable in the case of fine-grained thermal analysis.
Hence, for this work we selected the 3-layer die model for
thermal analysis of the chip.

IV. TRANSIENT DECAY SURFACE CHARACTERIZATION
Figure 8 shows an example transient decay surface
characterized for a 10umx10um heat source. This figure
shows the temperature change with time at various distances
from the heat source. This essentially models the temperature
at any location due to a given heat source as a function of time
and distance as in equation (1).

$\Delta T_{ij} = f(d_{ij}, t)$    (1)
$\Delta T_i = \sum_{j \in H} \Delta T_{ij}$    (2)
where H is the set of on-chip heat sources

The term $\Delta T_{ij}$ denotes the temperature change at the ith
location due to the jth on-chip heat source. We assume a Linear
Time Invariant (LTI) system and therefore the contribution of
each of the on-chip heat sources can be linearly combined to
generate the total effective transient response at the ith
location as in equation (2). This operation is repeated for all
the tiles in the chip to generate the transient response curves
for each tile. For a chip size of 1cmx1cm, the number of tiles
will be 1M, and therefore for each tile the impact of 1M tiles
needs to be computed. Repeating this operation
for each tile will lead to 1Mx1M operations, which can be
prohibitively expensive computationally. Therefore, an
innovative solution has been developed where the number of
operations for each tile can be reduced using a multi-scale decay
surface model. The thermal response at any given tile location
can be computed by considering the impact of heat sources
from nearby tiles, from intermediate distance tiles and from
far away tiles. These impacts are categorized as local impact,
intermediate impact and global impact corresponding to local
tiles, intermediate tiles, and global tiles. These tiles then have
corresponding local level, intermediate level, and global level
decay surface models.
The decay surface at any level can be characterized by
considering an MxM array where M << no. of tiles on the
chip. For example, a local level decay surface can be
characterized with an array of 21x21 tiles with tile size as
10umx10um, whereas an intermediate level decay surface
can be characterized with an array of 10x10 tiles with tile size
as 210umx210um, and the global level decay surface can
simply be the full chip-pkg-system level response at the
center of the chip with the power uniformly distributed on the
chip.

Figure 9 shows how a decay surface can be characterized
for an MxM array. The tile array is placed on the chip and a
heat source of power Pchar is applied at the center of the array.
With this heat source the transient thermal response is
captured at all the MxM tiles. The set Ds={Pchar, ∆Tchar},
where ∆Tchar is the set of all the transient curves for MxM
tiles, represents the decay surface model for this level.

Figure 9 Decay surface characterization for MxM array

The guiding principle is that for the local level decay surface
the smallest tile size is used for high accuracy to capture the
impact of local heat sources, and coarser tile sizes are used for
intermediate level decay surfaces since the impact is an
aggregated effect. After the decay surfaces have been
characterized the output of this stage is the Ds_local,
Ds_intermediate and Ds_global. It is observed that the
thermal responses in the decay surfaces scale almost linearly
with power as:

$\frac{\Delta T'}{\Delta T_{char}} = \frac{P'}{P_{char}}$    (3)

where $\Delta T'$ is the thermal response for a heat source of power $P'$.
The other observation is that the thermal response of a heat
source at tile i on tile j will be the same as the impact of the
same heat source at tile j on tile i. It should be noted that the
decay surface characterizations are performed with the full
chip-pkg-system model.

V. MACHINE LEARNING FOR DECAY SURFACE MODEL
Having described the fundamental idea of transient decay
surface in the previous section, it should be noted that for any
given chip-pkg-system, this initial characterization has to be
performed before the fast transient thermal analysis solver
can be used. To address this challenge a machine learning
(ML) based predictor was developed which can predict the
decay surfaces for a given chip-pkg-system.
This section illustrates our proposed ML-based transient
decay surface characterization. To accurately capture the
far-field and near-field transient decay surface components,
ML-based models for various levels are constructed and trained.
The precision of the models' prediction is evaluated on
unseen cases/data.
Transient simulations have been carried out utilizing a
high-fidelity numerical solver (Ansys Mechanical APDL) for
simulating the transient response of the chip with its center
tile heated. Examples of the temperature response on the chip
are shown in Figure 10.
By adjusting the input system parameters of the simulation
model, different sets of data can be generated. The
corresponding system parameter space for the four decay
surface levels is provided in Table 1. It is worth noting that the
system parameters for the four different levels are almost the
same except for the tile size and the power applied on the tile.
Besides, one additional parameter, C_scale, for modifying the
thermal capacitance of the system has been introduced as well.
Uniform random sampling has been employed to select values
from the predefined space for each system parameter. In total,
1000 distinct cases have been created for each decay surface
level.

Figure 10 Examples of transient response on chip. (A) Time
step=10, (B) Time step=45, (C) Time step=55, (D) Time step=65

C_scale                                               0.5-200
Dielectric / interconnection layer thickness (μm)     0.00138-0.138

Since the input to the ML model includes different
combinations of system parameters along with the time step
and the output of the ML model is the corresponding
temperature distribution of the chip at the given time step, a
DeepONet [7] based neural network has been built. The
network structure is depicted in Fig. 11, where the branch net
takes the system parameters as input and the trunk net
takes the time step as input. The information from the branch net
and the trunk net is combined through a merging layer, which
outputs the temperature distribution on the chip.

Figure 11 ML model architecture (branch net, trunk net, merging layer)

To validate the effectiveness of the trained model, unseen
system parameters are sampled and input to Ansys
Mechanical APDL for calculation. Then the same system
parameters are fed into the trained neural network for
prediction. Finally, the simulation results from the numerical
solver and the network prediction are compared and the
prediction error is calculated for evaluating the model
performance. An example of the ML-based model prediction
is shown in Figure 12, where the first subplot is the ground
truth solution from the numerical solver, the second subplot
is the neural network prediction and the last one is the
absolute error distribution for the prediction. It can be seen
from the color bar of the third subplot that the model is
making predictions with reasonable accuracy. The trained
model is further tested on a test dataset of 100 unseen cases
and the overall L2 relative error is less than 1%, which
verifies the accuracy of the trained model.

where $\Delta T_{ds}$ is the convolution kernel from the decay surface
model, and p is the power map scaled with respect to the
power used for characterizing the decay surface model. This
equation calculates the contribution to $\Delta T(x, y, t)$ due
to the corresponding decay surface model $\Delta T_{ds}$.
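The convolution of a scaled power map with a decay surface kernel can be sketched as follows. This is only an illustration under stated assumptions: the power map is switched on at t=0 and held constant, the kernel at each time step is an odd-sized square array holding the temperature rise for a Pchar source at its center, and tiles outside the chip carry no power. The names `scale_power_map`, `convolve_decay`, and `transient_response` and all numbers are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def scale_power_map(power: Grid, p_char: float) -> Grid:
    """Express tile power in units of the characterization power P_char."""
    return [[p / p_char for p in row] for row in power]


def convolve_decay(p_scaled: Grid, kernel: Grid) -> Grid:
    """Temperature rise per tile at one time step.

    kernel is an odd-sized M x M slice of the decay surface (delta-T per
    tile for a P_char source at the array center) at that time step.
    Tiles outside the chip are treated as carrying zero power.
    """
    rows, cols = len(p_scaled), len(p_scaled[0])
    half = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            acc = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    sy, sx = y + dy, x + dx
                    if 0 <= sy < rows and 0 <= sx < cols:
                        # Assumes a symmetric kernel, so the effect of a
                        # source on the probe equals the reverse direction.
                        acc += kernel[dy + half][dx + half] * p_scaled[sy][sx]
            out[y][x] = acc
    return out


def transient_response(power: Grid, p_char: float,
                       kernels_over_time: List[Grid]) -> List[Grid]:
    """Apply the spatial convolution at every characterized time step."""
    p_scaled = scale_power_map(power, p_char)
    return [convolve_decay(p_scaled, k) for k in kernels_over_time]


# Toy example: 3x3 kernel, 1 mW characterization power, one 2 mW source
kernel_t = [[0.1, 0.2, 0.1],
            [0.2, 1.0, 0.2],
            [0.1, 0.2, 0.1]]
power_map = [[0.0, 0.0, 0.0],
             [0.0, 0.002, 0.0],
             [0.0, 0.0, 0.0]]
dT = transient_response(power_map, 0.001, [kernel_t])[0]
```

The nested loops are written for readability; a production implementation would more likely use an FFT-based or library convolution, which is a design choice outside what this excerpt specifies.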
5. The above steps can be repeated for all the tiles/sensor
locations to generate the transient thermal response for
all the tiles/sensor locations.
6. Heat sources which have been accounted for at a certain
level are excluded from the other levels.
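The exclusion rule in step 6 can be illustrated with a simplified sketch for a single probe tile at a single time step. To keep it short, the intermediate and global levels are collapsed to one assumed temperature-rise-per-watt gain each (`inter_gain`, `global_gain`), which is a simplification of the decay surface models described earlier; all names and numbers are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def window_power(power: Grid, y: int, x: int, half: int) -> float:
    """Total power inside the (2*half+1)^2 window centered on tile (y, x)."""
    rows, cols = len(power), len(power[0])
    return sum(power[r][c]
               for r in range(max(0, y - half), min(rows, y + half + 1))
               for c in range(max(0, x - half), min(cols, x + half + 1)))


def probe_rise(power: Grid, y: int, x: int, local_kernel: Grid,
               p_char: float, inter_half: int, inter_gain: float,
               global_gain: float) -> Tuple[float, float, float]:
    """Return (local, intermediate, global) temperature-rise components."""
    rows, cols = len(power), len(power[0])
    local_half = len(local_kernel) // 2
    if inter_half < local_half:
        raise ValueError("intermediate window must contain the local window")

    # Local level: per-tile kernel weights, power scaled by P_char
    local = 0.0
    for dy in range(-local_half, local_half + 1):
        for dx in range(-local_half, local_half + 1):
            sy, sx = y + dy, x + dx
            if 0 <= sy < rows and 0 <= sx < cols:
                weight = local_kernel[dy + local_half][dx + local_half]
                local += weight * power[sy][sx] / p_char

    # Each heat source is counted at exactly one level
    p_local = window_power(power, y, x, local_half)
    p_inter = window_power(power, y, x, inter_half) - p_local
    p_total = sum(sum(row) for row in power)
    p_global = p_total - p_local - p_inter

    return local, inter_gain * p_inter, global_gain * p_global


# 5x5 chip with 1 W per tile, probe at the center tile
power = [[1.0] * 5 for _ in range(5)]
local, inter, glob = probe_rise(power, 2, 2, [[2.0]], 1.0, 1, 0.5, 0.1)
total_rise = local + inter + glob
```

In this toy case the local window holds 1 W, the intermediate ring holds the remaining 8 W of the 3x3 window, and the other 16 W are handled at the global level, so no source is counted twice.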
heat sources on a die can affect the neighboring dies through
thermal coupling and hence this effect needs to be modeled
to accurately predict the transient thermal response on any
die.
based solver was used to run transient power-on thermal
analysis. This power-on transient thermal analysis simulation
was of 5 seconds duration. A randomly generated power map
as shown in Figure 21 was applied such that the total power is
790mW.
With the above power map, the FEA solver and the fast
transient thermal solver were used for generating the transient
response curves at the center of the chip for correlation. The local
and global decay surface models were used in the fast
transient thermal solver since this is a small chip.

Figure 23 Temperature components from different decay surface
models

Figure 23 shows another example with a different power map
with the temperatures probed at the center of the chip and an
off-center location. The fast transient thermal solver matches
well with the FEA based thermal solver for both the probe
locations. The figure also shows the contributions of the
different components of the decay surface model to the total
temperature curve. At the steady state, the local decay surface
contributes about 9.5 degrees Celsius, and the global decay
surface contributes about 112 degrees Celsius. These two
components added to the ambient temperature of 20
degrees Celsius lead to a total temperature of 141.5 degrees
Celsius. The contribution of the global decay surface is small
since this is a small chip and the intermediate level decay
surface covers most of the chip for the given
temperature probe point. The figure also plots the difference
between the results from the FEA based solver (golden) and
the fast transient thermal solver.

Figure 24 Results with transient power profile

REFERENCES
[1] Y. Chen, C. Yang, C. Kuo, M. Chen, C. Tung, W. Chiou, D. Yu, “Ultra high density SoIC with sub-micron bond pitch,” ECTC, 2020.
[2] S. Krishnaswamy, P. Jain, M. Saeidi, A. Kulkarni, A. Adhiya, J. Harvest, “Fast and accurate thermal analysis of smartphone with dynamic power management using reduced order modeling,” ITherm, 2017.
[3] M. Dogruoz, M. Abarham, G. Shankaran, “Transient thermal behaviour of SOIC packages – an optimization study,” ITherm, 2016.
[4] Y. Im, W. Kim, T. An, H. Lee, Y. Cho, J. Yoo, H. Lee, Y. Shin, M. Lee, V. Yaddanapudi, “Thermal sensor placement based on meta-model enhancing observability and controllability,” ITherm, 2020.
[5] Vega 20: Under The Hood - The AMD Radeon VII Review: An Unexpected Shot At The High-End (anandtech.com)
[6] Too Hot to Test workshop, Intel, 2021, https://youtu.be/0gPSbZqbXUg
[7] Y. Sun, C. Zhan, J. Guo, Y. Fu, G. Li, and J. Xia, “Localized thermal effect of sub-16nm FinFET technologies and its impact on circuit reliability designs and methodologies”, in 2015 IEEE International Reliability Physics Symposium, 2015
[8] J. Wen, S. Pan, N. Chang, W.-T. Chuang, W. Xia, D. Zhu, A. Kumar, E.-C. Yang, K. Srinivasan, and Y.-S. Li, “DNN-Based Fast Static On-Chip Thermal Solver”, in 2020 36th Semiconductor Thermal Measurement, Modeling & Management Symposium (SEMI-THERM), IEEE, 2020
[9] R. Chandra, “It's Time To Consider Temperature Gradients In IC Design”, in Electronic Design, Feb., 2006
[10] A. Mutschler, “New Thermal Issues Emerge”, in Semiconductor Engineering, Feb., 2018
[11] M. Emilio, “Hybrid Chips may Solve Thermal, Efficiency, and Integration Challenges in 5G Mobile Devices”, in Power Electronics News, Oct. 2019
[12] Y. Zhong, M. D.F. Wong, “Thermal-Aware IR Drop Analysis in Large Power Grid”, in Proc. of International Symposium on Quality Electronic Design, 2008
[13] C. Peach and Y. Zhang, “Protecting AI Chips from Thermal Challenges during ATE Test”, in Evaluation Engineering Magazine, Jun., 2019
S. Makovejev, S. Olsen, V. Kilchytska, and J. Raskin, “Time and Frequency Domain Characterization of Transistor Self-Heating”, in IEEE Transactions On Electron Devices, 2013