
Article
Research on Device Modeling Technique Based on MLP Neural
Network for Model Parameter Extraction
Haixia Kang 1,2, Yuping Wu 1,2,3,4,*, Lan Chen 1,3 and Xuelian Zhang 1,3

1 Institute of Microelectronics of Chinese Academy of Sciences, Beijing 100029, China;


kanghaixia@ime.ac.cn (H.K.); chenlan@ime.ac.cn (L.C.); zhangxuelian@ime.ac.cn (X.Z.)
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Beijing Key Laboratory of Three-Dimensional and Nanometer Integrated Circuit Design Automation
Technology, Beijing 100029, China
4 Key Laboratory of Microelectronic Devices & Integrated Technology, CAS, Beijing 100029, China
* Correspondence: wuyuping@ime.ac.cn; Tel.: +86-010-8299-5693

Abstract: The parameter extraction of device models is critically important for circuit simulation. The device models in the existing parameter extraction software are physics-based analytical models or embedded Simulation Program with Integrated Circuit Emphasis (SPICE) functions. The programming implementation of physics-based analytical models is tedious and error prone, while it is time consuming to run the device model evaluation for the device model parameter extraction software by calling the SPICE. In this paper, we propose a novel modeling technique based on a neural network (NN) for the optimal extraction of device model parameters, and further integrate the NN model into device model parameter extraction software. The technique does not require developers to understand the device model, which enables faster and less error-prone development of parameter extraction software. Furthermore, the NN model improves the extraction speed compared with the embedded SPICE, which expedites the process of parameter extraction. The technique has been verified on the BSIM-SOI model with a multilayer perceptron (MLP) neural network. The training error of the NN model is 4.14%, and the testing error is 5.38%. Experimental results show that the trained NN model obtains an extraction error of less than 6%, and its extraction speed is thousands of times faster than SPICE in device model parameter extraction.

Keywords: device model parameter extraction; device modeling; neural network; multilayer perceptron

Citation: Kang, H.; Wu, Y.; Chen, L.; Zhang, X. Research on Device Modeling Technique Based on MLP Neural Network for Model Parameter Extraction. Appl. Sci. 2022, 12, 1357. https://doi.org/10.3390/app12031357

Academic Editor: Stavros Souravlas
Received: 31 October 2021; Accepted: 24 January 2022; Published: 27 January 2022

1. Introduction
In the process of device model parameter extraction, the model parameter values are substituted into the device model for calculation. The error between the measured data of the device and the calculated data of the model is obtained and compared with the target value. The optimization engine adjusts the model parameter values and then substitutes them into the model calculation. This process is carried out iteratively until the fitting error is lower than the target value, as shown in Figure 1. The device models in the existing parameter extraction software are manually implemented, physics-based analytical models, or embedded SPICE. The implementation of physics-based analytical models requires the developer to be skilled in device physics and have a deep understanding of the models. Moreover, the programming implementation of the physics-based analytical models is tedious and error prone and, therefore, the software development is time intensive. The development of physics-based analytical models is not generic, and needs programming for each device model individually. When the SPICE is called for simulation, it is necessary to parse the netlist, establish the matrix equations, and iteratively solve the matrix equations, etc. It is time consuming to run the device model evaluation for the device model parameter



extraction software by calling the SPICE, making the model parameter extraction less efficient.

Figure 1. The process of model parameter extraction.

In recent years, NNs have been widely used due to their superior data processing capabilities. NNs provide smooth functions and infinite order derivatives that can approximate complex nonlinear functions [1,2]. Since the IV curves and other characteristics of devices are nonlinear, NNs have the potential to model them [3,4]. Currently, NNs have been applied in several modeling situations, such as small-signal modeling and microwave device modeling [5–8]. As the complexity of the device model increases, the number of model parameters goes up. Manually specifying new model parameters for complex device physics equations is much harder than adding new variables to the NN model [9].
The artificial neural network (ANN) is applied to model the device, with a preprocessing of the current and voltage to train the ANN model accurately [10]. A device model based on the NN acts as an intermediate between early device measurements and a later occurring compact model [11]. A fitness function based on key electrical characteristics (the off current and the subthreshold slope) is proposed to compensate for the R2 functions acting solely as the fitness function. A novel physics-inspired neural network (Pi-NN) approach for compact modeling is proposed in [12], where fundamental device physics are incorporated in the NN model to improve the modeling accuracy. In this paper, we propose using an MLP neural network instead of a manually implemented physics-based analytical model or embedded SPICE to build a device model and integrate it into the parameter extraction software. The NN uses the IV and CV curves, and other characteristics for modeling, and learns the complex relationship between the input and output by training. The NN modeling technique does not require understanding the device model, which enables faster and less error-prone development than the implementation of the physics-based analytical model. Moreover, the NN model improves the extraction speed over SPICE, which can expedite the process of parameter extraction.
The NNs in these works [10–12] are only used for circuit simulation, and the inputs of the NNs only contain the voltages and the device geometries. We use the MLP neural network to build a device model for parameter extraction. The MLP neural network inputs include the voltages, device geometries, temperature, and model parameters. We perform a sensitivity analysis of the model parameters, where sensitivity means the influence degree of the model parameters on the characteristic curves. The model parameters with non-zero sensitivity are used as the inputs for the NN. The device modeling flow based on NN and the flow of device model parameter extraction using the NN model is shown in Figure 2.

Figure 2. The device modeling flow based on NN and the flow of device model parameter extraction using the NN model.

The rest of this paper is organized as follows. Section 2 introduces the data preparation
and preprocessing methods. Section 3 presents the training and testing work of the NN.
In Section 4, we analyze the device modeling and model parameter extraction results.
Section 5 summarizes the work.

2. Data Preparation and Pre-Processing


2.1. Data Preparation
This work uses current data for device modeling, defining the influence of the model
parameters on the current as the sensitivity. Model parameters with non-zero sensitivity are
taken as the inputs of the NN. The sensitivity calculation formulas are defined as follows:

$$A'[i] = \left|\frac{\Delta I_i / I_i}{\Delta P / P}\right| \tag{1}$$

$$S(A) = \max(A') \tag{2}$$


where $I_i$ is the current when the model parameter A is set to P, and $\Delta I_i$ denotes the current difference when the model parameter A is set to P and P + ΔP under the same working conditions. Array A′ represents the sensitivity array containing all current data. S(A) represents the sensitivity of the model parameter A, which is the maximum value of array A′.
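As an illustration, Equations (1) and (2) translate into a few lines of code (a minimal sketch; the current arrays and the 1% parameter perturbation below are hypothetical, not values from this work):

```python
import numpy as np

def sensitivity(I_base, I_pert, P, dP):
    # A'[i] per Equation (1): the relative current change divided by
    # the relative parameter change; S(A) per Equation (2) is its maximum.
    A_prime = np.abs(((I_pert - I_base) / I_base) / (dP / P))
    return A_prime.max()

# Hypothetical currents simulated at P and at P + 1% (dP / P = 0.01)
I_base = np.array([1e-9, 5e-6, 2e-4])
I_pert = np.array([1.1e-9, 5.2e-6, 2.1e-4])
S = sensitivity(I_base, I_pert, P=1.0, dP=0.01)
# Parameters with S > 0 would be kept as NN inputs
```

A parameter whose perturbation leaves every current point unchanged yields S(A) = 0 and is dropped from the input set.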
In this work, we take the BSIM-SOI model as the source for the training set and testing
set [13]. The sensitivity analysis results of the BSIM-SOI model show 128 model parameters
with non-zero sensitivity, which are used as the NN inputs. Some of the model parameters
with high sensitivity are listed in Table 1. It took 23.9 h to generate the data needed for the
sensitivity analysis and 36 s for the sensitivity analysis itself.

Table 1. Some model parameters of BSIM-SOI.

Model Parameters Description


Bigc Parameter for Igcs and Igcd
Nrecr0 Recombination non-ideality factor at reversed bias for source
Beta2 Third Vds dependent parameter of impact ionization current
Vtun0 Voltage dependent parameter for tunneling current for source
Ntun Reverse tunneling non-ideality factor for source

The extraction of device model parameters needs to take into account various working
conditions (voltages, temperature) [14,15] and use a MOSFET of different sizes [16,17].
Therefore, the inputs of the NN need to contain six variables, namely, drain-source voltage
(Vds ), gate-source voltage (Vgs ), bulk-source voltage (Vbs ), temperature (T), gate width (W),
and gate length (L) in addition to the model parameters. Drain-source current (Ids ) is the
output. The MLP neural network consists of an input layer, a few hidden layers, and an
output layer, which are fully connected [18,19].
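The structure described above, with its 134 inputs (128 sensitive model parameters plus Vds, Vgs, Vbs, T, W, and L) and single Ids output, can be sketched as a plain NumPy forward pass; the layer widths follow the NN1 topology reported in Table 2, and the random weights are placeholders for trained values:

```python
import numpy as np

def log_sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, weights, biases):
    # Hidden layers: linear step followed by log-sigmoid activation;
    # the output layer stays linear.
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = log_sigmoid(a @ W + b)
    return a @ weights[-1] + biases[-1]

rng = np.random.default_rng(0)
sizes = [134, 160, 160, 160, 1]  # NN1 topology from Table 2
weights = [rng.normal(0.0, 0.1, (m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

x = rng.random(134)              # normalized inputs in [0, 1]
y = mlp_forward(x, weights, biases)  # one predicted (log-)current value
```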
The 45 nm SOI process [20] was used to build the circuit. The minimum gate length
value of the nmosbc-type MOSFET used in this work is 56 nm, while the minimum value
of the gate width is 654 nm. For the training set, the gate length ranges from 56 nm to
1000 nm, the gate width from 654 nm to 1200 nm, and the temperature from −40 °C to 125 °C. The output characteristic (Ids–Vds) curves sweep from 0 to 1.0 V for the Vds with a step of 0.05 V. The transfer characteristic (Ids–Vgs) curves sweep from 0 to 1.0 V for the Vgs
with a step of 0.05 V. The temperature, gate length, and gate width are scattered within the
ranges. At the same time, the Latin hypercube sampling method was applied to the model
parameters to ensure that their ranges were fully covered [21]. Tens of thousands of sets of
IV curves were prepared as the training set. The time taken to generate the training set for
the NN was 11.2 h.
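The Latin hypercube step can be sketched with SciPy's qmc module (the three parameter bounds shown are illustrative placeholders, not actual BSIM-SOI ranges):

```python
import numpy as np
from scipy.stats import qmc

# Hypothetical lower/upper bounds for three of the 128 sensitive parameters
lower = np.array([0.1, 1e-9, 0.5])
upper = np.array([0.9, 5e-9, 2.0])

sampler = qmc.LatinHypercube(d=3, seed=42)
# Stratified samples in the unit cube, then scaled to the bounds,
# so that each parameter's full range is covered.
samples = qmc.scale(sampler.random(n=1000), lower, upper)
```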
A good NN model must have good generalization ability, i.e., it is required that the
NN model performs well on the test data, which is different from the training data [22,23].
Appl. Sci. 2022, 12, 1357 5 of 16

The inputs were extended to varying degrees when preparing the test set. W was taken
from 654 nm to 1500 nm, L from 56 nm to 1200 nm, and T from −40 °C to 125 °C. The range of model parameters was extended to 1 to 1.1 times the original range. The test set uses a smaller voltage step: the output characteristic curves for Vds sweep from 0 V to 1.0 V
with a step of 0.025 V. The transfer characteristic curves for Vgs sweep from 0 to 1.0 V with
a step of 0.025 V. More than 4000 sets of IV curves were prepared as the test set.

2.2. Data Pre-Processing


Input feature vectors have different magnitudes, among which large feature vectors
can dominate the training, causing the network to fail to learn from other feature vectors.
Thus, the inputs need to be normalized. The normalization of the inputs can facilitate
gradient descent and accelerate convergence [24,25]. In this work, min–max normalization
is applied to the input feature vectors to map them between 0 and 1.
The values of currents mainly range from 1 × 10^−12 A to 1 × 10^−3 A with a large span
of magnitude. If the NN is trained using unprocessed current, the error is dominated by
the large current range, and the errors in the small current range are not weighted enough,
which will lead to a poor fit for small values. We applied a logarithmic operation to the
training data to solve this problem, and errors in all regions are weighted equally. However,
when Vds is 0, Ids is zero current, which cannot take logarithms, so the points with Vds of 0
were removed from the training set and test set. At the same time, the voltage settings in
the data preparation were adjusted, and the output characteristic curves were swept from
0.05 V to 1.0 V for Vds . In the following work, the Ids at Vds of 0 was specified as 0 for the
convenience of discussion.
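The two preprocessing steps described above, min–max normalization of the inputs and logarithmic compression of Ids, can be sketched as follows (the sample values are illustrative):

```python
import numpy as np

def minmax_normalize(X):
    # Map each input feature column into [0, 1].
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def log_current(I_ds):
    # Compress currents spanning 1e-12..1e-3 A so that errors in all
    # decades carry comparable weight; assumes the Vds = 0 points
    # (zero current) have already been removed.
    return np.log10(I_ds)

X = np.array([[0.05, 0.0, 25.0],
              [1.00, 1.0, 125.0],
              [0.50, 0.3, -40.0]])        # e.g. Vds, Vgs, T columns
Xn = minmax_normalize(X)
y = log_current(np.array([1e-12, 3e-7, 1e-3]))  # spans -12 .. -3
```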

3. Training and Testing the MLP Neural Network


This MLP neural network leverages the backpropagation (BP) algorithm for training. The training process is divided into forward propagation and backpropagation. Forward propagation is a series of linear and activation operations applied to the inputs using the network weights, and the output is calculated layer by layer from the input layer to the output layer. Backpropagation uses the output layer error to optimize the weights [26,27].
The NN employs the adaptive momentum estimation (Adam) algorithm to accelerate
the convergence [28], and the initial learning rate of the Adam algorithm is set to 0.01. The
mean square root of the relative error (RR) is defined as the performance metric:

$$RR = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left|\frac{I_{nn}^{i}-I_{sim}^{i}}{I_{sim}^{i}}\right|^{2}} \tag{3}$$

where $I_{sim}^{i}$ denotes the current value of the SPICE simulation and $I_{nn}^{i}$ represents the predicted value of the NN.
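Equation (3) translates directly into code (a minimal sketch; the current arrays are illustrative):

```python
import numpy as np

def rr(I_nn, I_sim):
    # Root-mean-square of the relative error, Equation (3).
    return np.sqrt(np.mean(((I_nn - I_sim) / I_sim) ** 2))

I_sim = np.array([1e-9, 2e-6, 5e-4])    # SPICE reference currents
I_nn = np.array([1.05e-9, 1.9e-6, 5.1e-4])  # NN predictions
print(f"RR = {rr(I_nn, I_sim):.2%}")    # prints RR = 4.24%
```

Because every term is a relative error, small subthreshold currents contribute on the same scale as large on-state currents.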


The IV curves can be divided into subthreshold and non-subthreshold regions ac-
cording to the relationship between Vgs and the threshold voltage (Vth ). The relationship
between the current and voltage in the subthreshold region is exponential, and the equation
is provided as follows:
$$I_{ds} = \mu_0 \times \frac{W_{eff}}{L_{eff}} \times \sqrt{\frac{q\,\varepsilon_{si}\,N_{ch}}{2\phi_s}} \times v_t^2 \times \left[1-\exp\left(-\frac{V_{ds}}{v_t}\right)\right] \times \exp\left(\frac{V_{gs}-V_{th}-V_{off}}{n\,v_t}\right) \tag{4}$$

where µ0 represents the mobility, Weff is the effective channel width, Leff is the effective
channel length, q is the electronic charge, εsi is the dielectric constant of silicon, Nch is the
substrate doping concentration, φs is the surface potential, vt is the thermal voltage, Voff is
the offset voltage, and n is the subthreshold swing parameter. The relationship between the current and voltage in the non-subthreshold region is polynomial, and the equation is provided as follows:
$$I_{ds} = W_{eff}\,v_{sat}\,C_{ox} \times (V_{gs}-V_{th}-A_{bulk}\times V_{dsat}) \times \left(1+\frac{V_{ds}-V_{dsat}}{V_A}\right) \times \left(1+\frac{V_{ds}-V_{dsat}}{V_{ASCBE}}\right) \tag{5}$$
where $v_{sat}$ is the saturation velocity, $C_{ox}$ is the gate oxide capacitance, $A_{bulk}$ the body charge effect parameter, $V_{dsat}$ the drain-source saturation voltage, $V_A$ the Early voltage, and $V_{ASCBE}$ the Early voltage corresponding to the substrate current-induced body effect (SCBE) [29]. Among the common NN activation functions, sigmoid types are used in preference, as they have smooth derivatives to meet the requirements for high-order derivatives in circuit simulations.
Among the sigmoid-type functions, the commonly used ones are the log-sigmoid and tanh-sigmoid functions, which map (−∞, +∞) inputs to the ranges (0, 1) and (−1, +1), respectively. The equations of the log-sigmoid and tanh-sigmoid functions are shown in Equations (6) and (7), respectively.

$$\text{log-sigmoid} = \frac{1}{1+e^{-x}} \tag{6}$$

$$\text{tanh-sigmoid} = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}} \tag{7}$$
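Equations (6) and (7) in code, confirming the (0, 1) and (−1, +1) output ranges (a minimal sketch):

```python
import numpy as np

def log_sigmoid(x):
    # Equation (6): maps any real input into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh_sigmoid(x):
    # Equation (7): maps any real input into (-1, 1).
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

x = np.linspace(-6.0, 6.0, 13)
assert np.all((log_sigmoid(x) > 0) & (log_sigmoid(x) < 1))
assert np.all((tanh_sigmoid(x) > -1) & (tanh_sigmoid(x) < 1))
```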
Two activation functions are used to model the device separately. From Figure 3, we can observe that log-sigmoid fits better than tanh-sigmoid. Therefore, the activation function of this NN eventually uses the log-sigmoid. During the forward propagation of the NN, log-sigmoid controls the data within 0 and 1, and the data change smoothly without data diffusion, so log-sigmoid obtains a higher model accuracy.

Figure 3. Transfer curves with different activation functions: (a) fitting curves of log-sigmoid; (b) fitting curves of tanh-sigmoid. In the small voltage range, both curves fit quite well. However, when using the tanh-sigmoid function, there are obvious errors in the higher-value range.
Some criteria were used in this study to better estimate the trained NN models and avoid overfitting. Those criteria include the Akaike Information Criterion (AIC), the corrected Akaike Information Criterion (AICc), and the Bayesian Information Criterion (BIC). The AIC method attempts to find the model that explains the data best with the smallest number of parameters. The preference model is the model with the lowest AIC

value [30]. Under the assumption that the model errors are normally distributed, the value
of AIC can be obtained by the following equation:

$$AIC = 2k + n\ln(RSS/n) \tag{8}$$

where k is the number of parameters for the model to be estimated, n is the sample size,
and RSS is the residual sum of squares from the estimated model. The equation of RSS is
as follows:
$$RSS = \sum_{i=1}^{N}\left(I_{nn}^{i}-I_{sim}^{i}\right)^{2} \tag{9}$$

AICc is a correction of the AIC, which performs better when the sample size is
smaller [31]. The equation of AICc is as follows:

$$AICc = AIC + \frac{2k(k+1)}{n-k-1} \tag{10}$$
The BIC penalizes parameters more heavily than the AIC. For any two estimated
models, the model with the lower value of BIC is preferred [32]. The equation of BIC is as
follows:
$$BIC = k \times \ln(n) + n\ln(RSS/n) \tag{11}$$
The value of RR is obtained based on the relative error. Since there is a large magnitude
difference in the currents of different regions, using the relative error can show the fit of the
NN model more intuitively. The value of RR can reflect the fitting quality of the IV curves,
so when the values of RR of the different NN models differ widely, we prefer the NN model
that generates the smallest RR value. When the RR values of different NN models are close
to each other, we select the optimal model by referring to the AIC, AICc, and BIC values.
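Equations (8), (10), and (11) can be combined into a small helper for comparing candidate NN models (a sketch; the k, n, and RSS values below are illustrative, not taken from Table 2):

```python
import math

def information_criteria(k, n, rss):
    # AIC, AICc, and BIC under the normally-distributed-error
    # assumption, Equations (8), (10), and (11).
    aic = 2 * k + n * math.log(rss / n)
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = k * math.log(n) + n * math.log(rss / n)
    return aic, aicc, bic

# Hypothetical: 1000 samples, two candidate models of different sizes
small = information_criteria(k=500, n=1000, rss=4.0e-5)
large = information_criteria(k=800, n=1000, rss=3.8e-5)
# Lower values are preferred; the penalty terms (2k or k*ln(n)) favor
# the smaller model unless its fit (RSS) is noticeably worse.
```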
The AIC, AICc, and BIC values were calculated for each NN model, and the results
are shown in Table 2. RE-10% in Table 2 represents the proportion of data with a relative
error (RE) greater than 10% in the fitting data of the NN model. NN1, NN2, and NN3 are
the optimal models corresponding to the AIC, AICc, and BIC, respectively. NN1, NN4, and
NN5 are NN models with the smallest RR values.

Table 2. Estimation results of different NN models.

No.  NN                  AIC             AICc            BIC             RR     RE-10%

1    134-160-160-160-1   −3.6964 × 10^6  −3.5891 × 10^6  −2.9608 × 10^6  5.38%  2.02%
2    134-120-120-120-1   −3.6945 × 10^6  −3.6624 × 10^6  −3.2394 × 10^6  6.18%  3.81%
3    134-100-100-100-1   −3.6564 × 10^6  −3.6401 × 10^6  −3.3174 × 10^6  6.53%  5.83%
4    134-150-150-150-1   −3.6832 × 10^6  −3.6031 × 10^6  −3.0238 × 10^6  5.48%  2.67%
5    134-190-190-190-1   −3.5421 × 10^6  −3.2832 × 10^6  −2.5539 × 10^6  5.66%  2.95%

Although NN2 and NN3 achieved optimality under the AICc and BIC criteria, respec-
tively, their RR values were significantly higher than the NN1 model. The equations for AIC
and BIC can be divided into two items: the first one is the complexity penalty of the model
(2k for AIC, k × ln(n) for BIC), and the second one is the accuracy penalty of the model
(n × ln(RSS/n)). The first item is mainly used to avoid the overfitting of the model, and the
second item considers the fitting of the model itself. The first item is a penalty to the second
item, and the model with the fewest parameters is selected under the condition of satisfying
the model validity and reliability, so the complexity must be considered after satisfying
the high accuracy first. The difference in the accuracy penalty between the different NN
models refers to the difference in RSS. The RSS values are 3.7910 × 10^−5, 5.2934 × 10^−5, and 7.5426 × 10^−5 for the NN1, NN2, and NN3 models, respectively. There is a significant difference in the accuracy of the three NN models, and the accuracy of the NN1 model is higher than that of the other two models. Therefore, the NN2 and NN3 models were not selected as the optimal model.
The RR values for the NN1, NN4, and NN5 models were close, so we further evaluated them using the AIC, AICc, and BIC criteria. The AIC, AICc, and BIC values for NN5 are larger than those of NN1 and NN4, while the AIC, AICc, and BIC values for NN1 and NN4 are closer to each other. The RSS values are 3.7910 × 10^−5 and 4.4666 × 10^−5 for the NN1 and NN4 models, respectively. Therefore, we eventually chose the NN1 model for further work. According to the RE-10% in Table 2, when the NN1 model fits the data, the RE of nearly 98% of the data is within 10%. It took 1.7 h to train the NN1 model, with a training error of 4.14% and a testing error of 5.38%.
4. Results and Discussion

4.1. Error and Performance Analysis
The mean value µ of the RE for the training data is 2.81%, and the standard deviation σ of the RE is 3.04%. The training data with an RE greater than µ + 3σ accounted for 0.89% of all training data. We performed an analysis on the training data with an RE greater than µ + 3σ and counted the distributions of temperature, gate width, and gate length. The statistical results are shown in Figures 4–6, respectively.
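The µ + 3σ screening used here can be sketched as follows (the relative-error array is synthetic, drawn from an arbitrary distribution for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical per-sample relative errors of a trained model
re = np.abs(rng.normal(0.028, 0.030, size=100_000))

mu, sigma = re.mean(), re.std()
outliers = re > mu + 3.0 * sigma   # samples the model fits worst
share = outliers.mean()            # fraction above the cutoff
# The paper reports a corresponding share of 0.89% for its training data;
# the flagged samples are then binned by T, W, and L as in Figures 4-6.
```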
According to Figure 4, it can be seen that for the training data with an RE greater than µ + 3σ, the proportions of different bars are not very different. The proportion of the first bar in Figure 5 is lower than that of the other bars, which is because the gate width range of this bar is smaller. The overall distribution of the gate width is uniform when the gate width range is considered. From Figure 6, it can be observed that the proportion of data with a small gate length is very large, and the proportion almost decreases with the increase in the gate length. It indicates that the NN model fits less well to the data with a small gate length. Due to the short-channel effect, more complex device physics mathematics are involved around the minimum gate length [10], so it is reasonable that the error is slightly higher at the smaller gate length.

Figure 4. Temperature distribution of the training data with an RE greater than µ + 3σ. The training data with an RE greater than µ + 3σ accounted for 0.89% of all training data.
Figure 5. Gate width distribution of the training data with an RE greater than µ + 3σ. The training data with an RE greater than µ + 3σ accounted for 0.89% of all training data.

Figure 6. Gate length distribution of the training data with an RE greater than µ + 3σ. The training data with an RE greater than µ + 3σ accounted for 0.89% of all training data.

The mean value μ of RE for the test data is 3.89%, and the standard deviation σ of RE is 3.71%. The test data with an RE greater than μ + 3σ accounted for 1.50% of all test data. The distributions of temperature, gate width, and gate length for the test data with an RE greater than μ + 3σ are shown in Figures 7–9, respectively. According to Figures 7 and 8, the proportions of the different bars for temperature and gate width do not differ greatly. According to Figure 9, the proportion of data with a small gate length is higher. The NN model fits the training data for small gate lengths less well and, therefore, also fits the test data for small gate lengths less well.
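The μ + 3σ screening described above is straightforward to reproduce. A minimal sketch follows; the relative-error list is a hypothetical stand-in for the per-sample RE values of the training or test set, not the paper's data:

```python
from statistics import fmean, pstdev

def flag_outliers(re):
    """Return (indices with RE above mu + 3*sigma, mu, sigma) for a list of relative errors."""
    mu = fmean(re)
    sigma = pstdev(re)  # population standard deviation
    cut = mu + 3 * sigma
    return [i for i, e in enumerate(re) if e > cut], mu, sigma

# Hypothetical relative errors (as fractions): 99 typical fits and one poor one.
re = [0.03] * 99 + [0.50]
outliers, mu, sigma = flag_outliers(re)
print(outliers)                                               # [99]
print(f"{len(outliers) / len(re):.2%} of samples lie above mu + 3*sigma")  # 1.00%
```

The flagged subset can then be binned by temperature, gate width, and gate length to produce distributions like those in Figures 5–9.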
A statistical analysis of the IV curves for the training data was performed. It was found that for the Ids–Vgs curves, the error for Vgs from 0 to 0.2 V was slightly higher than in the other regions of the curves. The device is basically in the subthreshold region when Vgs is between 0 and 0.2 V. According to Equation (4), the relationship between current and voltage in the subthreshold region is exponential. The exponential relation has derivatives of infinite order, and its Taylor series expansion consists of infinitely many polynomial terms. According to Equation (5), the relationship between current and voltage in the non-subthreshold region is polynomial. When the same NN model is used for both the subthreshold and non-subthreshold regions, the finite polynomial of the non-subthreshold region is easier to fit and, therefore, its modeling accuracy is higher. The errors were counted separately for the subthreshold and non-subthreshold data in the training data using Equation (3): 4.77% for the subthreshold region and 3.83% for the non-subthreshold region. Likewise, the errors were counted separately for the subthreshold and non-subthreshold data in the test data using Equation (3): 6.89% for the subthreshold region and 4.79% for the non-subthreshold region.
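The region-wise error accounting can be sketched as follows. The RMS form of the relative error is an assumption (Equation (3) is not reproduced in this excerpt), and the sweep values are toy placeholders rather than BSIM-SOI currents:

```python
import math
from statistics import fmean

VGS_SUB = 0.2  # Vgs in [0, 0.2] V is treated as the subthreshold region

def rms_relative_error(pred, meas):
    """Assumed form of the paper's Equation (3): RMS of the pointwise relative error."""
    return math.sqrt(fmean(((p - m) / m) ** 2 for p, m in zip(pred, meas)))

def region_errors(vgs, pred, meas):
    """Score the subthreshold and non-subthreshold parts of an Ids-Vgs sweep separately."""
    sub = [(p, m) for v, p, m in zip(vgs, pred, meas) if v <= VGS_SUB]
    non = [(p, m) for v, p, m in zip(vgs, pred, meas) if v > VGS_SUB]
    return rms_relative_error(*zip(*sub)), rms_relative_error(*zip(*non))

# Toy sweep: a uniform +5% error below 0.2 V and +2% above it.
vgs = [0.0, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0]
meas = [math.exp(10 * v) * 1e-9 for v in vgs]                     # placeholder currents
pred = [m * (1.05 if v <= VGS_SUB else 1.02) for v, m in zip(vgs, meas)]
e_sub, e_non = region_errors(vgs, pred, meas)
print(f"subthreshold: {e_sub:.2%}, non-subthreshold: {e_non:.2%}")  # 5.00%, 2.00%
```

Splitting at Vgs = 0.2 V mirrors the paper's observation that the subthreshold region, with its exponential I–V relation, is the harder of the two for the NN to fit.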

[Figure 7 bar chart: Percentage vs. T [°C]; bins from [−40, −25] to [100, 125] °C, with bar heights between 10.75% and 19.40%.]

Figure 7. Temperature distribution of the test data with an RE greater than μ + 3σ. The test data with an RE greater than μ + 3σ accounted for 1.50% of all test data.

[Figure 8 bar chart: Percentage vs. W [nm]; bins from [654, 750] to [1350, 1500] nm, with bar heights between 14.37% and 20.69%.]

Figure 8. Gate width distribution of the test data with an RE greater than μ + 3σ. The test data with an RE greater than μ + 3σ accounted for 1.50% of all test data.

[Figure 9 bar chart: Percentage vs. L [nm]; bins from [56, 200] to [1000, 1200] nm, with bar heights between 8.03% and 33.23%.]

Figure 9. Gate length distribution of the test data with an RE greater than μ + 3σ. The test data with an RE greater than μ + 3σ accounted for 1.50% of all test data.

Multiple sets of IV curve data from a device need to be collected to extract the different model parameters [27]. Thus, the NN model needs to fit multiple sets of IV curves from the device well. Ten randomly selected sets of inputs are shown in Table 3, and the IV curve fits for the first three sets of inputs are shown in Figures 10–12, where the curves indicate the IV curves from the SPICE simulation and the dots indicate the values predicted by the NN. The fitting errors of the ten sets of inputs are shown in Table 4, where RR-all represents the error over the whole region, and RR-sub and RR-nonsub represent the errors in the subthreshold and non-subthreshold regions, respectively. The results show that the overall fit is good, demonstrating that the NN model achieves accurate modeling of the IV curves for the BSIM-SOI model.

Table 3. Ten sets of inputs.

No.  T (°C)  W (nm)  L (nm)  Bigc ((Fs2/g)0.5 m−1 V−1)  Nrecr0 (-)  Beta2 (V)  Vtun0 (V)  Ntun (-)
1    43      804     705     0.004556                    3.035       0.06692    0.5217     1.179
2    40      742     364     0.003386                    3.300       0.08025    0.4921     1.046
3    65      936     1055    0.002766                    2.795       0.07883    0.5806     1.346
4    −26     1178    254     0.006996                    2.529       0.04238    0.3751     1.016
5    31      758     463     0.007036                    2.991       0.03853    1.1110     1.410
6    28      1237    182     0.009446                    2.240       0.03931    0.5623     0.997
7    −5      1211    405     0.006202                    2.665       0.07713    0.8417     1.049
8    101     762     363     0.003523                    2.175       0.06655    1.1560     1.284
9    61      1412    834     0.001907                    3.303       0.06904    1.0025     1.497
10   6       902     111     0.008876                    3.252       0.02890    1.1886     1.484

Figure 10. IV fitting curves for the first set of inputs: (a) Ids–Vds curves at different Vgs when Vbs = 0; (b) Ids–Vgs curves at different Vbs when Vds = 1.0 V.

Figure 11. IV fitting curves for the second set of inputs: (a) Ids–Vds curves at different Vgs when Vbs = 0; (b) Ids–Vgs curves at different Vbs when Vds = 1.0 V.
Figure 12. IV fitting curves for the third set of inputs: (a) Ids–Vds curves at different Vgs when Vbs = 0; (b) Ids–Vgs curves at different Vbs when Vds = 1.0 V.
Table 4. Errors of ten sets of inputs.

No.  RR-All  RR-Sub  RR-Nonsub
1    4.59%   6.83%   4.01%
2    3.05%   4.02%   2.54%
3    3.13%   3.67%   2.87%
4    4.54%   6.58%   3.39%
5    5.20%   6.06%   4.81%
6    6.68%   7.30%   6.41%
7    2.95%   3.15%   2.91%
8    3.01%   5.08%   2.43%
9    2.98%   3.63%   2.83%
10   5.91%   6.11%   5.87%
The time comparison between the NN model prediction and SPICE simulation is shown in Table 5, comparing the time to generate 5000, 10,000, 15,000, and 20,000 sets of IV curves. The results show that the trained NN model is thousands of times faster than SPICE.
Table 5. Time comparison between NN model prediction and SPICE simulation.

Number of IV Curves  SPICE (s)   NN Model (s)  Multiplier of Speed Increase
5000                 19,934.43   20.67         964.41
10,000               39,822.99   30.95         1286.69
15,000               59,729.98   46.12         1295.10
20,000               79,630.48   61.04         1304.56
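The "Multiplier of Speed Increase" column is simply the SPICE time divided by the NN model time, so Table 5's last column can be re-derived directly from the reported timings:

```python
# Re-derive the speed-up multipliers in Table 5 from the reported wall-clock times.
table5 = {  # number of IV curves: (SPICE seconds, NN model seconds)
    5_000: (19_934.43, 20.67),
    10_000: (39_822.99, 30.95),
    15_000: (59_729.98, 46.12),
    20_000: (79_630.48, 61.04),
}

for n, (t_spice, t_nn) in table5.items():
    print(f"{n:>6,} curves: {t_spice / t_nn:,.2f}x")
    # prints 964.41x, 1286.69x, 1295.10x, 1304.56x in turn
```

Note that the multiplier grows with batch size: the NN's fixed startup cost is amortized, while SPICE's runtime scales almost linearly with the number of curves.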
4.2. Device Model Parameter Extraction
The trained NN model is used for parameter extraction. The measured results include data for various device sizes. The gate lengths included 56 nm, 100 nm, 200 nm, 300 nm, 400 nm, and 500 nm, and the gate widths included 654 nm, 700 nm, 800 nm, 900 nm, and 1000 nm. The devices were subjected to different operating temperatures. We used Equation (3) to calculate the error between the measured results and the simulation results obtained with the extracted model parameters. The errors of the five sets of measured data are 4.88%, 5.71%, 5.62%, 5.78%, and 5.18%. Figures 13 and 14 show part of the fitting curves of the first two sets of measured data, where the curves indicate the measured results and the dots indicate the simulation results with the extracted model parameters. The results show a good fit.

Figure 13. IV fitting curves of the first set of measured data: (a) Ids–Vds curves with T = 0 °C, W = 654 nm, L = 200 nm; (b) Ids–Vgs curves with T = 0 °C, W = 654 nm, L = 200 nm.

Figure 14. IV fitting curves of the second set of measured data: (a) Ids–Vds curves with T = 25 °C, W = 800 nm, L = 500 nm; (b) Ids–Vgs curves with T = 25 °C, W = 800 nm, L = 500 nm.
We used SPICE and the NN model to extract parameters from the measured data of a single device. The extraction error of the SPICE-based flow was 3.27%, and it took 4.4 h. The extraction error of the NN model was 4.28%, and it took 8.6 s. Both the SPICE and the NN model extractions ran on the same CPU.
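As a quick arithmetic check on the reported single-device extraction times (4.4 h for SPICE versus 8.6 s for the NN model on the same CPU):

```python
# Ratio of the reported single-device extraction times.
spice_seconds = 4.4 * 3600   # 4.4 h
nn_seconds = 8.6
print(f"NN-based extraction was about {spice_seconds / nn_seconds:.0f}x faster")  # ~1842x
```

This is consistent with the "thousands of times faster" claim made for the batch simulations in Table 5.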
5. Conclusions
In this paper, a novel device modeling technique based on NNs for the optimal extraction of device model parameters is proposed and verified with the BSIM-SOI model. This technique does not require developers to manually implement physics-based analytical models, which vastly accelerates the development of parameter extraction software. The development of the program related to the physics-based analytical models in the model parameter extraction software takes about six months, while the development of the NN model takes less than one week.
The MLP neural network used in this paper obtained 4.14% in overall training error,
with 3.83% in the non-subthreshold and 4.77% in the subthreshold regions. The NN model
has good generalization ability. The overall test error is 5.38% on more than four thousand
IV curves, with 6.89% in the subthreshold and 4.79% in the non-subthreshold regions. The
trained NN model is used for parameter extraction, and the extraction error is less than 6%.
Compared with the embedded SPICE, the extraction speed of the NN model is thousands
of times faster.

Author Contributions: Conceptualization, H.K., Y.W. and L.C.; methodology, H.K.; software, H.K.;
validation, H.K.; investigation, H.K. and Y.W.; data curation, H.K.; writing—original draft preparation,
H.K.; writing—review and editing, H.K., Y.W. and X.Z.; supervision, Y.W.; funding acquisition, Y.W.
All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by Research on Modeling, Analysis and Optimization Technolo-
gies for ULP Circuit (E1GW047002).
Data Availability Statement: Not applicable.
Acknowledgments: The first author, H.K., hereby acknowledges the Institute of Microelectronics of
Chinese Academy of Sciences (IMECAS) and the EDA Center.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Jain, A.K.; Mao, J.; Mohiuddin, K.M. Artificial neural networks: A tutorial. Computer 1996, 29, 31–44. [CrossRef]
2. Canziani, A.; Paszke, A.; Culurciello, E. An Analysis of Deep Neural Network Models for Practical Applications. arXiv 2016,
arXiv:1605.07678.
3. Root, D.E. Future Device Modeling Trends. IEEE Microw. Mag. 2012, 13, 45–59. [CrossRef]
4. Zhang, Z.; Wang, R.; Chen, C.; Huang, Q.; Wang, Y.; Hu, C.; Wu, D.; Wang, J.; Huang, R. New-Generation Design-Technology
Co-Optimization (DTCO): Machine-Learning Assisted Modeling Framework. In Proceedings of the 2019 Silicon Nanoelectronics
Workshop (SNW), Kyoto, Japan, 9–10 June 2019; pp. 1–2.
5. Zhang, A.; Gao, J. InP HBT Small Signal Modeling based on Artificial Neural Network for Millimeter-wave Application. In
Proceedings of the 2020 IEEE MTT-S International Conference on Numerical Electromagnetic and Multiphysics Modeling and
Optimization (NEMO), Hangzhou, China, 7–9 December 2020; pp. 1–3.
6. Liu, W.; Na, W.; Feng, F.; Zhu, L.; Lin, Q. A Wiener-Type Dynamic Neural Network Approach to the Modeling of Nonlinear
Microwave Devices and Its Applications. In Proceedings of the 2020 IEEE MTT-S International Conference on Numerical
Electromagnetic and Multiphysics Modeling and Optimization (NEMO), Hangzhou, China, 7–9 December 2020; pp. 1–3.
7. Wang, S.; Roger, M.; Sarrazin, J.; Lelandais-Perrault, C. Hyperparameter Optimization of Two-Hidden-Layer Neural Networks
for Power Amplifiers Behavioral Modeling Using Genetic Algorithms. IEEE Microw. Wirel. Compon. Lett. 2019, 29, 802–805.
[CrossRef]
8. Liu, W.; Zhu, L.; Feng, F.; Zhang, W.; Liu, G. A Time Delay Neural Network Based Technique for Nonlinear Microwave
Device Modeling. Micromachines 2020, 11, 831. [CrossRef] [PubMed]
9. Xu, J.; Root, D.E. Advances in artificial neural network models of active devices. In Proceedings of the IEEE Mtt-s International
Conference on Numerical Electromagnetic & Multiphysics Modeling & Optimization, Limoges, France, 6–8 July 2022; pp. 1–3.
10. Zhang, L.; Chan, M. Artificial neural network design for compact modeling of generic transistors. J. Comput. Electron. 2017, 16,
825–832. [CrossRef]
11. Klemme, F.; Prinz, J.; Santen, V.M.v.; Henkel, J.; Amrouch, H. Modeling Emerging Technologies using Machine Learning:
Challenges and Opportunities. In Proceedings of the 2020 IEEE/ACM International Conference On Computer Aided Design
(ICCAD), San Diego, CA, USA, 2–5 November 2020; p. 1367.
12. Li, M.; İrsoy, O.; Cardie, C.; Xing, H.G. Physics-Inspired Neural Networks for Efficient Device Compact Modeling. IEEE J. Explor.
Solid-State Comput. Devices Circuits 2016, 2, 44–49. [CrossRef]
13. Sheu, B.J.; Scharfetter, D.L.; Ko, P.K.; Jeng, M.C. BSIM: Berkeley short-channel IGFET model for MOS transistors. IEEE J.
Solid-State Circuits 1987, 22, 558–566. [CrossRef]
14. Petrosyants, K.O.; Ismail-zade, M.R.; Sambursky, L.M.; Dvornikov, O.V.; Lvov, B.G.; Kharitonov, I.A. Automation of parameter
extraction procedure for Si JFET SPICE model in the −200…+110 °C temperature range. In Proceedings of the 2018 Moscow
Workshop on Electronic and Networking Technologies (MWENT), Moscow, Russia, 14–16 March 2018; pp. 1–5.
15. Cortes-Ordonez, H.; Haddad, C.; Mescot, X.; Romanjek, K.; Ghibaudo, G.; Estrada, M.; Cerdeira, A.; Iñiguez, B. Parameter
Extraction and Compact Modeling of OTFTs From 150 K to 350 K. IEEE Trans. Electron Devices 2020, 67, 5685–5692. [CrossRef]
16. Leonhardt, A.; Ferreira, L.F.; Bampi, S. Nanoscale FinFET global parameter extraction for the BSIM-CMG model. In Proceedings
of the 2015 IEEE 6th Latin American Symposium on Circuits & Systems (LASCAS), Montevideo, Uruguay, 24–27 February 2015;
pp. 1–4.
17. Tanaka, C.; Matsuzawa, K.; Miyata, T.; Adachi, K.; Hokazono, A. Investigation of BSIM4 parameter extraction and characterization
for Multi Gate Oxide-Dual Work Function (MGO-DWF)-MOSFET. In Proceedings of the 2016 Joint International EUROSOI
Workshop and International Conference on Ultimate Integration on Silicon (EUROSOI-ULIS), Vienna, Austria, 25–27 January
2016; pp. 96–99.
18. Pinkus, A. Approximation theory of the MLP model in neural networks. Acta Numer. 1999, 8, 143–195.
19. Karlik, B.; Olgac, A.V. Performance Analysis of Various Activation Functions in Generalized MLP Architectures of Neural
Networks. Int. J. Artif. Intell. Expert Syst. 2011, 1, 111–122.
20. Koh, R.; Yamagami, S.; Lee, J.W.; Wakabayashi, H.; Saito, Y.; Ogura, A.; Narihiro, M.; Arai, K.; Takemura, H.; Mogami, T. SOI
MOSFET. U.S. Patent 6933569 B2, 23 August 2005.
21. Iman, R.L.; Davenport, J.M.; Zeigler, D.K. Latin Hypercube Sampling (Program User’s Guide). [LHC, in FORTRAN]; Sandia Labs.:
Albuquerque, NM, USA, 1980.
22. Yang, Y.; Li, C. Quantitative analysis of the generalization ability of deep feedforward neural networks. J. Intell. Fuzzy Syst. 2021,
40, 4867–4876. [CrossRef]
23. Choudhury, T.A.; Hosseinzadeh, N.; Berndt, C.C. Improving the Generalization Ability of an Artificial Neural Network in
Predicting In-Flight Particle Characteristics of an Atmospheric Plasma Spray Process. J. Therm. Spray Techn. 2012, 21, 935–949.
[CrossRef]
24. Shalabi, L.A.; Shaaban, Z. Normalization as a Preprocessing Engine for Data Mining and the Approach of Preference Matrix. In
Proceedings of the 2006 International Conference on Dependability of Computer Systems, Szklarska Poreba, Poland, 25–27 May
2006; pp. 207–214.
25. Tai, Q.Y.; Shin, K.S. GA-based Normalization Approach in Back-propagation Neural Network for Bankruptcy Prediction Modeling.
J. Intell. Inf. Syst. 2010, 16, 1–14.
26. Hecht-Nielsen, R. Theory of the Backpropagation Neural Network. In Proceedings of the International Joint Conference on Neural Networks, 1989; Volume 1, pp. 593–605.
27. Villa, A.; Duch, W.; Erdi, P.; Masulli, F.; Palm, G. Artificial Neural Networks and Machine Learning—ICANN 2012; Springer: Berlin,
Germany, 2011.
28. Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
29. Cheng, Y. BSIM3v3 Manual; Department of Electrical Engineering & Computer Science, University of California: Berkeley, CA,
USA, 1996.
30. Akaike, H. Information theory as an extension of the maximum likelihood principle. In Proceedings of the Second International
Symposium on Information Theory, Budapest, Hungary, 2–8 September 1971; pp. 267–281.
31. Sugiura, N. Further analysts of the data by Akaike's information criterion and the finite corrections. Commun. Stat. Theory Methods
1978, 7, 13–26. [CrossRef]
32. Xue, Y.; Wang, W.; Zhu, H.; Zheng, G.; Cui, Z. An Inversion Method for Shallow Water Geoacoustic Parameters Based on BIC
Criterion. In Proceedings of the 2021 OES China Ocean Acoustics (COA), Harbin, China, 14–17 July 2021; pp. 192–196.
