Elsevier Editorial System(tm) for Computers & Electrical Engineering

Manuscript Draft

Manuscript Number: COMPELECENG-D-15-00854R2

Title: Digital Hardware Implementation of a Radial Basis Function Neural Network

Article Type: SI: asi

Keywords: Radial basis function neural network (RBF NN), Stochastic gradient descent (SGD), Very high-speed IC hardware description language (VHDL), Simulink and ModelSim co-simulation, Electronic design automation (EDA).

Corresponding Author: Prof. Ying-Shieh Kung

Corresponding Author's Institution:

First Author: Phan Thanh Nguyen

Order of Authors: Phan Thanh Nguyen; Ying-Shieh Kung; Seng-Chi Chen; Hsin-Hung Chou

Abstract: This work studies a digital hardware implementation of a radial basis function neural network (RBF NN). Firstly, the architecture of the RBF NN, which consists of an input layer, a hidden layer of nonlinear processing neurons with a Gaussian function, an output layer, and a learning mechanism, is presented. A supervised learning mechanism based on the stochastic gradient descent (SGD) method is applied to update the parameters of the RBF NN. Secondly, the very high-speed IC hardware description language (VHDL) is adopted to describe the behavior of the RBF NN, and a finite state machine (FSM) is applied to reduce the hardware resource usage. Thirdly, based on the electronic design automation (EDA) simulator link, a co-simulation using Simulink and ModelSim is applied to verify the VHDL code of the RBF NN. Finally, some simulation cases are tested to validate the effectiveness of the proposed digital hardware implementation of the RBF NN.
Cover Letter

Southern Taiwan University of Science and Technology
Department of Electrical Engineering
1 Nan-Tai St., Yung-Kang Dist., Tainan City, Taiwan
Fax: 886-6-2537461, E-Mail: kung@mail.stust.edu.tw

Nov. 16, 2015

To: Editor in Chief

We have revised our manuscript "Digital Hardware Implementation of a Radial Basis Function Neural Network" in accordance with the suggestions for the final version and resubmitted it to Computers and Electrical Engineering.

We hope that it meets all the publication requirements. Thank you for your kind advice and suggestions.

Sincerely yours,

Ying-Shieh Kung,
Professor
*Detailed Response to Reviewers

Response to Review:

Paper Number: COMPELECENG-D-15-00854R1
Author(s): Nguyen Phan Thanh, Ying-Shieh Kung, Seng-Chi Chen, Hsin-Hung Chou
Paper Title: Digital Hardware Implementation of a Radial Basis Function Neural Network

The authors thank the referees for their kind comments and suggestions. The revisions made in accordance with the reviewers' comments are summarized as follows:

Reviewer #1:
My concerns had been approved!!
Ans: Thank you.

Reviewer #2:
It is a well-writing paper. It can be accepted as the present form.
Ans: Thank you.

Reviewer #3:
In my opinion, the paper can be accepted for publication.
Ans: Thank you.

Reviewer #5:
Thank you for the new additions. All my issues have been completed, thus I support
publication at CAEE Journal.
Ans: Thank you.
*Manuscript

Digital Hardware Implementation of a Radial Basis Function Neural Network

Nguyen Phan Thanh (a,b), Ying-Shieh Kung (a,*), Seng-Chi Chen (a), Hsin-Hung Chou (c)

a Department of Electrical Engineering, Southern Taiwan University of Science and Technology, Tainan, Taiwan
b Faculty of Electronics and Electrical Engineering, Ho Chi Minh City University of Technology and Education, Ho Chi Minh City, Vietnam
c Industrial Technology Research Institute, Hsinchu, Taiwan

*Corresponding author: kung@mail.stust.edu.tw

Abstract:
This work studies a digital hardware implementation of a radial basis function neural network (RBF NN). Firstly, the architecture of the RBF NN, which consists of an input layer, a hidden layer of nonlinear processing neurons with a Gaussian function, an output layer, and a learning mechanism, is presented. A supervised learning mechanism based on the stochastic gradient descent (SGD) method is applied to update the parameters of the RBF NN. Secondly, the very high-speed IC hardware description language (VHDL) is adopted to describe the behavior of the RBF NN, and a finite state machine (FSM) is applied to reduce the hardware resource usage. Thirdly, based on the electronic design automation (EDA) simulator link, a co-simulation using Simulink and ModelSim is applied to verify the VHDL code of the RBF NN. Finally, some simulation cases are tested to validate the effectiveness of the proposed digital hardware implementation of the RBF NN.

Keywords: Radial basis function neural network (RBF NN), Stochastic gradient descent (SGD), Very high-speed IC hardware description language (VHDL), Simulink and ModelSim co-simulation, Electronic design automation (EDA), permanent magnet synchronous motor (PMSM) drive.

1 INTRODUCTION

Differing from conventional computing, neural networks can capture nonlinear, not necessarily deterministic, behavior and offer fast, highly accurate computation for non-parametric models. The radial basis function (RBF) has become a very powerful tool in neural network architecture because, compared with the back-propagation topology, it learns faster and uses a more strongly nonlinear activation function in the hidden layer [1]. In particular, an RBF neural network processes the information and responds to inputs according to a defined learning rule, and the trained network can then be used to perform certain tasks depending on the application. Many practical problems, such as human face recognition [2], functional approximation, nonlinear system identification, and motor control [3], have been addressed with neural networks. Due to their strong approximation properties, RBF neural networks can provide a good mapping of complex models [4].
The RBF NN has a simple three-layer topology in which each hidden unit implements a radial basis activation function. These networks are well suited to identifying or approximating the behavior of unconventional systems from a given set of inputs and their corresponding outputs. To fit the network outputs to the given inputs, the network must be trained to optimize its parameters. If the parameters of the RBF are fixed, in other words, if the positions of the neurons and the "bell curve" of the exponential function cannot be changed, the effectiveness of the RBF gradually reduces to that of a linear problem solver. To cover highly nonlinear problems, a learning algorithm that modifies the parameters of the RBF neural network [5], the stochastic gradient descent (SGD) method, is introduced. The SGD method [6] is a popular algorithm for training a wide range of models: it follows the negative gradient of the objective evaluated on only a single or a few training examples and computes the next parameter update at each iteration. It tends to converge very well to a global optimum when the objective function is convex or pseudo-convex, and otherwise converges to a local optimum [7]. Its adaptability, owing to the adjustable network parameters, may therefore improve the accuracy and stability of the approximations.
A preliminary survey of several papers shows various methods for implementing artificial neural networks (ANNs) in digital systems. One of the basic problems in the research [8-10] is how to implement the ANN in fixed-point as well as floating-point arithmetic, given the nonlinear characteristic of the exponential activation function, and still obtain good responses from the associated training algorithm for a particular task. Although floating-point number formats are more precise in computation [11-13], they consume far more hardware resources in a field-programmable gate array (FPGA) than fixed-point number formats. To solve this problem, a 32-bit Q24 fixed-point format is first adopted to reduce the hardware resources while still assuring high numerical accuracy. Next, this work proposes an efficient algorithm that combines a Taylor series expansion with a look-up table (LUT) to calculate the exponential activation function in the RBF NN. For the 32-bit Q24 numerical type, the 24-bit fraction part (Q24) gives the RBF NN and its nonlinear Gaussian function good numerical precision. Although the range of the 8-bit integer part is only -128 to +127, it can still cover the data operations and avoid numerical overflow, provided the input/output data are normalized and de-normalized.
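To make the number format concrete, a minimal Python sketch of the 32-bit Q24 representation follows (the helper names are ours, for illustration only); it shows that the 24-bit fraction bounds the quantization error by 2^-25, consistent with the accuracy reported later:

Q = 24  # fraction bits of the 32-bit Q24 format (8 integer bits remain)

def to_q24(x):
    # Quantize a real value to a 32-bit Q24 two's-complement word.
    v = int(round(x * (1 << Q)))
    assert -(1 << 31) <= v < (1 << 31), "outside the -128..+127 integer range"
    return v

def from_q24(v):
    # Recover the real value represented by a Q24 word.
    return v / (1 << Q)

x = 0.26112168567535
print(abs(from_q24(to_q24(x)) - x))  # below 2**-25, i.e., about 3e-8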
In terms of realization, the FPGA [14-16] offers the best solution for speeding up the computation of the RBF NN, owing to its programmable hard-wired feature, fast computation power, high density, etc. To describe the computational behavior of the RBF NN in an FPGA implementation, VHDL and the Open Computing Language (OpenCL) [17-19] are two possible programming tools. VHDL is used in electronic design automation (EDA) to describe digital and mixed-signal systems such as FPGAs and integrated circuits [20]; it can also be used as a general-purpose parallel programming language. OpenCL is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), FPGAs, and other processors [20]; it provides parallel computing using task-based and data-based parallelism. In this study, VHDL is adopted to describe the computational behavior of the RBF NN, and a finite state machine (FSM) is applied. Because the FSM is a sequential processing method, the FPGA resource usage can be greatly reduced.
Recently, co-simulation through the electronic design automation (EDA) Simulator Link has been applied to verify the effectiveness of VHDL code in digital systems [21]. The EDA Simulator Link provides a co-simulation interface between Simulink and ModelSim [22]. This work verifies the performance of the RBF NN in a co-simulation environment constructed from Simulink and ModelSim via the EDA simulator link; this tool chain readily provides results that can be compared against the floating-point results from Matlab. Therefore, in this paper, co-simulation through the EDA Simulator Link is applied to design and verify the proposed computation algorithm for the RBF NN: the input stimuli and output results are handled in Simulink, and the algorithm that computes the RBF NN is executed in ModelSim.
The remainder of this paper is organized as follows: Section 2 introduces the architecture of the RBF NN with the SGD-based learning mechanism. Section 3 describes the digital hardware implementation of the RBF NN. Section 4 illustrates the simulation results used to evaluate the performance of the proposed method. Section 5 summarizes the contribution of this work.

2 RADIAL BASIS FUNCTION NEURAL NETWORK (RBF NN)

This section gives a brief introduction to the structure of the RBF NN and, specifically, its mathematical analysis [23]. The RBF NN adopted here is the three-layer architecture shown in Fig. 1. It comprises one input layer, one hidden layer, one output layer, and one supervised learning mechanism using the stochastic gradient descent method.
In this architecture, the input layer is mapped onto the hidden layer via nonlinear activation functions, the radial basis functions (known as neurons), whereas the connection from the hidden layer to the output layer performs a linear transformation. The input layer has $n_1$ inputs $x_1, x_2, \dots, x_{n_1}$, represented in vector form by

$$X = [x_1\; x_2\; \cdots\; x_{n_1}]^T \qquad (1)$$

In the hidden layer, the multivariate Gaussian function is chosen as the activation function:

$$\phi_i = \exp\!\left(-\frac{\|X - C_i\|^2}{2\sigma_i^2}\right), \quad i = 1, 2, \dots, n_2 \qquad (2)$$

where $n_2$ is the number of neurons in the hidden layer, $C_i = [c_{i1}\; c_{i2}\; \cdots\; c_{in_1}]^T$ and $\sigma_i$ denote the node center and the node variance (or width) of the $i$th neuron, and $\|X - C_i\|$ is the norm (Euclidean distance) between the inputs and the node center of each neuron. Finally, the network output in Fig. 1 can be written as

$$y_j = \sum_{i=1}^{n_2} w_{ji}\,\phi_i, \quad j = 1, 2, \dots, n_3 \qquad (3)$$

where $y_j$ is the $j$th output value, $w_{ji}$ is the weight from the $j$th output to the $i$th neuron, $\phi_i$ is the output of the $i$th neuron, and $n_3$ is the number of outputs.

[Figure: input layer $x_1, \dots, x_{n_1}$; hidden layer of Gaussian neurons with centers $c_{i1}, \dots, c_{in_1}$ and widths $\sigma_i$; weighted output layer producing $y_1, \dots, y_{n_3}$; comparison against the desired outputs $y_{jd}$; and the stochastic gradient descent learning algorithm]

Fig. 1 Architecture of a RBF NN
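As a floating-point reference for (1)-(3), analogous to the Matlab model used for verification in Section 4, the forward computation can be sketched in a few lines of Python (an illustration of ours, not the fixed-point VHDL implementation):

import numpy as np

def rbf_forward(x, C, sigma, W):
    # Forward computation of (1)-(3): x has shape (n1,), C has (n2, n1),
    # sigma has (n2,), and W has (n3, n2); returns y (n3,) and phi (n2,).
    phi = np.exp(-np.sum((x - C) ** 2, axis=1) / (2.0 * sigma ** 2))  # (2)
    y = W @ phi                                                       # (3)
    return y, phi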


In the RBF NN, the learning algorithm plays an important role: it minimizes a cost function for the error between the output of the neural network and the desired output. The learning algorithm herein is based on the stochastic gradient descent (SGD) method, which has the advantage of faster computation than the alternatives. This approach trains the RBF NN by tuning the network parameters, such as the weights, node centers, and node variances of the radial basis functions, to obtain good convergence. For the SGD-based supervised learning process, the instantaneous error cost function is first defined as

$$J = \frac{1}{2}\sum_{j=1}^{n_3} e_j^2 = \frac{1}{2}\sum_{j=1}^{n_3}\left(y_{jd} - y_j\right)^2 \qquad (4)$$

where the error $e_j$ is the difference between the desired output $y_{jd}$ and the $j$th output $y_j$ of the RBF NN.
Secondly, the update equations for the parameters of the RBF NN based on the SGD method are given by

$$w_{ji}(k+1) = w_{ji}(k) - \eta_w \frac{\partial J}{\partial w_{ji}(k)}, \quad i = 1, 2, \dots, n_2 \text{ and } j = 1, 2, \dots, n_3 \qquad (5)$$

$$c_{ir}(k+1) = c_{ir}(k) - \eta_c \frac{\partial J}{\partial c_{ir}(k)}, \quad i = 1, 2, \dots, n_2 \text{ and } r = 1, 2, \dots, n_1 \qquad (6)$$

$$\sigma_i(k+1) = \sigma_i(k) - \eta_\sigma \frac{\partial J}{\partial \sigma_i(k)}, \quad i = 1, 2, \dots, n_2 \qquad (7)$$

where $\eta_w$, $\eta_c$, $\eta_\sigma$ are the learning rates. Thirdly, applying the chain rule,

J J e j y j
 (8)
w ji e j y j w ji

J n3
J e j y j i
 ( ) (9)
cir j 1 e j y j i cir

J n3
J e j y j i
 ( ) (10)
 i j 1 e j y j i  i

Furthermore, from (3) and (4), we can get

J e j y j
 e ji (11)
e j y j w ji
and
J e j y j
 e j w ji (12)
e j y j i

Also, from (2), it can be obtained

i x c
 i r 2 ir (13)
cir i
and
X  Ci
2
i
 i (14)
 i  i3

Finally, substituting (11)-(14) into (8)-(10) and then into (5)-(7), the update equations for the weights, node centers, and node variances of the RBF NN become

$$w_{ji}(k+1) = w_{ji}(k) + \eta\, e_j \phi_i, \quad i = 1, 2, \dots, n_2 \text{ and } j = 1, 2, \dots, n_3 \qquad (15)$$

$$c_{ir}(k+1) = c_{ir}(k) + \eta\left(\sum_{j=1}^{n_3} e_j w_{ji}\right)\phi_i\,\frac{x_r - c_{ir}(k)}{\sigma_i^2}, \quad i = 1, 2, \dots, n_2 \text{ and } r = 1, 2, \dots, n_1 \qquad (16)$$

$$\sigma_i(k+1) = \sigma_i(k) + \eta\left(\sum_{j=1}^{n_3} e_j w_{ji}\right)\phi_i\,\frac{\|X - C_i\|^2}{\sigma_i^3(k)}, \quad i = 1, 2, \dots, n_2 \qquad (17)$$

where $\eta = \eta_w = \eta_c = \eta_\sigma$ is a common learning rate.
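A compact floating-point sketch of one SGD iteration, (15)-(17), reusing rbf_forward from the sketch above (again an illustration of ours, not the hardware realization):

import numpy as np

def rbf_sgd_step(x, yd, C, sigma, W, lr=0.25):
    # One update of (15)-(17); C, sigma, and W are modified in place.
    y, phi = rbf_forward(x, C, sigma, W)
    e = yd - y                                 # e_j = y_jd - y_j
    g = (e @ W) * phi                          # (sum_j e_j w_ji) * phi_i
    d = x - C                                  # x_r - c_ir(k), shape (n2, n1)
    sq = np.sum(d ** 2, axis=1)                # ||X - C_i||^2
    W += lr * np.outer(e, phi)                 # (15)
    C += lr * (g / sigma ** 2)[:, None] * d    # (16)
    sigma += lr * g * sq / sigma ** 3          # (17)
    return y, e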

3 DIGITAL HARDWARE IMPLEMENTATION OF A RBF NN

This section shows a detailed digital hardware implementation of the RBF NN. We first introduce the concept of the finite state machine (FSM) and then use the FSM to design the RBF NN.
3.1. Finite state machine (FSM)
To reduce FPGA resource usage, an FSM is adopted to describe the complicated control algorithm. Herein, the computation of the sum of products (SOP) shown below is considered as a case study to present the advantage of the FSM.

$$Y = a_1 x_1 + a_2 x_2 + a_3 x_3 \qquad (18)$$

Two design methods, a parallel processing method and an FSM method, are introduced to realize the computation of the SOP. With the former, the designed SOP circuit, shown in Fig. 2(a), operates continuously and simultaneously. The circuit needs two adders and three multipliers, but only one clock cycle completes the overall computation. Although the parallel processing method computes quickly, it consumes many more FPGA resources. To reduce the resource usage in the FPGA, the SOP circuit designed with the FSM method, shown in Fig. 2(b), uses one adder and one multiplier and manipulates a 5-step (5-clock) state machine to carry out the overall computation. Although the FSM method needs more operation time than the parallel processing method (if one clock cycle is 20 ns, the 5 clocks need 0.1 microseconds), it does not lose any computation power. Therefore, the more complicated the computation in the algorithm, the more FPGA resources can be economized if the FSM is applied. Further, VHDL code with the 32-bit Q24 data type implementing the SOP computation is shown in Fig. 3.
[Figure: (a) parallel SOP circuit with three multipliers and two adders; (b) FSM-based SOP circuit with one multiplier and one adder sequenced over states s0-s4]

Fig. 2 SOP computation by using (a) parallel processing method (b) FSM method
LIBRARY IEEE;
USE IEEE.std_logic_1164.all;
USE IEEE.std_logic_arith.all;
USE IEEE.std_logic_signed.all;
LIBRARY lpm;
USE lpm.LPM_COMPONENTS.ALL;

ENTITY SoP_32Q24 IS
  port (CLK, CLK_40n : IN  STD_LOGIC := '0';
        X1, X2, X3   : IN  STD_LOGIC_VECTOR(31 downto 0) := (others => '0');
        A1, A2, A3   : IN  STD_LOGIC_VECTOR(31 downto 0) := (others => '0');
        Y            : OUT STD_LOGIC_VECTOR(31 downto 0) := (others => '0'));
END SoP_32Q24;

ARCHITECTURE SoP_arch OF SoP_32Q24 IS
  SIGNAL mula, mulb       : STD_LOGIC_VECTOR(31 downto 0) := (others => '0');
  SIGNAL mulr             : STD_LOGIC_VECTOR(63 downto 0) := (others => '0');
  SIGNAL adda, addb, addr : STD_LOGIC_VECTOR(31 downto 0) := (others => '0');
  SIGNAL CNT              : STD_LOGIC_VECTOR(7 downto 0)  := (others => '0');
BEGIN
  -- Shared 32x32 signed multiplier; the Q24 result is mulr(55 downto 24)
  mull: lpm_mult
    generic map(LPM_WIDTHA=>32, LPM_WIDTHB=>32, LPM_WIDTHS=>32, LPM_WIDTHP=>64,
                LPM_REPRESENTATION=>"SIGNED", LPM_PIPELINE=>1)
    port map(dataa=>mula, datab=>mulb, clock=>clk, result=>mulr);
  -- Shared 32-bit signed adder
  adder1: lpm_add_sub
    generic map(lpm_width=>32, LPM_REPRESENTATION=>"SIGNED", lpm_pipeline=>1)
    port map(dataa=>adda, datab=>addb, clock=>clk, result=>addr);

  GEN: block
  BEGIN
    PROCESS(CLK_40n)
    BEGIN
      IF CLK_40n'EVENT and CLK_40n='1' THEN
        CNT <= CNT + 1;
        IF CNT = X"00" THEN            -- s0: start a1*x1
          mula <= A1; mulb <= X1;
        ELSIF CNT = X"01" THEN         -- s1: latch a1*x1, start a2*x2
          adda <= mulr(55 downto 24);
          mula <= A2; mulb <= X2;
        ELSIF CNT = X"02" THEN         -- s2: latch a2*x2, start a3*x3
          addb <= mulr(55 downto 24);
          mula <= A3; mulb <= X3;
        ELSIF CNT = X"03" THEN         -- s3: add a3*x3 to the partial sum
          adda <= addr;
          addb <= mulr(55 downto 24);
        ELSIF CNT = X"04" THEN         -- s4: output Y and restart the FSM
          Y   <= addr;
          CNT <= X"00";
        END IF;
      END IF;
    END PROCESS;
  END BLOCK GEN;
END SoP_arch;

Fig. 3 SOP computation using VHDL
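Note that the mulr(55 downto 24) slice in Fig. 3 realigns the product format: multiplying two Q24 words gives a 64-bit Q48 product, so dropping the 24 lowest fraction bits, an arithmetic right shift by 24, returns the result to Q24. A bit-accurate Python sketch of this multiply-accumulate, usable as a golden reference when testing the VHDL (the function names are ours):

def q24_mult(a, b):
    # Model of lpm_mult plus the mulr(55 downto 24) slice: the Q48 product
    # is arithmetically shifted right by 24 bits to realign it to Q24.
    return (a * b) >> 24  # Python ints are signed, so >> is arithmetic

def sop_q24(a, x):
    # Q24 value of Y = a1*x1 + a2*x2 + a3*x3, as produced by the 5-step FSM.
    return sum(q24_mult(ai, xi) for ai, xi in zip(a, x))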


3.2. Design of the digital hardware implementation of an RBF NN
A random selection of the number of neurons in the hidden layer might cause either overfitting or underfitting. Using many neurons in the hidden layer gives a large information processing capacity but increases the training time of the artificial neural network and consumes many hardware resources in the FPGA. Some rules for determining a suitable number of hidden neurons [24] are as follows:
- The number of hidden neurons should be between the size of the input layer and the size of the output layer.
- The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
- The number of hidden neurons should be less than twice the size of the input layer.
According to the suggestions mentioned above, a basic 3-5-1 RBF NN scheme is considered as the developed example in this paper (a small helper sketch follows).
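For reference, these rules of thumb can be captured in a few lines (an illustrative helper of ours, not part of the design files):

def suggested_hidden_sizes(n_in, n_out):
    # The three rules of thumb from [24] for the hidden-layer size.
    between = range(min(n_in, n_out), max(n_in, n_out) + 1)  # rule 1
    two_thirds = round(2 * n_in / 3) + n_out                 # rule 2
    upper = 2 * n_in - 1                                     # rule 3: < 2*n_in
    return between, two_thirds, upper

# For n_in = 3 and n_out = 1: rule 2 suggests 3 and rule 3 allows at most 5,
# consistent with the 3-5-1 network developed here.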
In the realization, the 3-5-1 RBF NN with three inputs, five neurons, and one output shown in Fig. 4(a) is used as the design example to evaluate the proposed method, and its corresponding digital hardware component is shown in Fig. 4(b). In this figure, the whole network is specified by two main components: one performs the forward computation, and the other performs the SGD-based learning algorithm. To reduce the digital hardware resource usage, a finite state machine (FSM) is employed, and VHDL is applied to model the behavior of these two components. Furthermore, the data type adopts a 32-bit length with Q24 format and 2's-complement operation in the numerical system.
In the forward computation component, the main inputs are the network inputs $X$, weights $w_{ji}$, node centers $C_i$, and node variances $\sigma_i$, and the outputs are the network output $y_j$ and the neuron outputs $\phi_i$. The internal circuit design of the forward computation component in Fig. 4(b) is presented in Fig. 5; it uses five neuron components, two adders, and two multipliers and manipulates a 25-step state machine to carry out the overall forward computation. In Fig. 5, steps s0~s19 execute the parallel calculation of the five neuron components, which perform the Gaussian function in (2), and steps s20~s24 perform the computation of the network output in (3). Each step in Fig. 5 takes 20 ns (50 MHz); therefore the total of 25 steps needs only 500 ns of operation time.

RBF NN with 3-5-1


y1d
X1 X1 Forward
Computation y1
X2 X2 -+ Error


C11,C12,C13, 1 X3 X3
1
1
N1
Cir(k)
2
Inputs  i (k ) 3
X1 C21,C22,C23, 2 X1
W1i (k) 4 Output of
RBF-NN
2 w11
X2
5
N2 clk
w12 y1d X3
X2 C31 ,C32 ,C33, 
3 3 w13 y1
N3  -+
w14 e1 W1i (k)

X3
C41 ,C42 ,C43, 4 Desired  i (k )
Cir(k)
4 w15
output Error
N4 y1d 1 W1i (k+1)
e1
2
5 3  i (k  1)
C51 ,C52 ,C53,
4
N5
5
5
Cir (k+1)
lr
Learning error Learning
Algorithm clk Algorithm

(a) (b)

Fig. 4 (a) a 3-5-1 RBF NN and (b) its digital hardware component

[Figure: five neuron components compute φ1-φ5 in parallel (steps s0~s19); the products w1i·φi are then accumulated through the shared multipliers and adders in steps s20~s24 to form y]

Fig. 5 The internal circuit design of the forward computation component in Fig. 4(b)

The neuron component in Fig. 5 performs the Gaussian function in (2). The digital hardware implementation of the Gaussian function is complicated because it must evaluate the exponential function. To solve this problem, a combination of the Taylor series expansion technique and a look-up table (LUT) technique is adopted. Firstly, the exponential $e^{-x}$ is computed using the Taylor series expansion

$$f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \frac{f^{(3)}(x_0)}{3!}(x - x_0)^3 + \frac{f^{(4)}(x_0)}{4!}(x - x_0)^4 + \frac{f^{(5)}(x_0)}{5!}(x - x_0)^5 + \cdots \qquad (19)$$

The fifth-order polynomial expansion in the vicinity of $x_0$ is

$$f(x) = e^{-x} \approx a_0 + a_1(x - x_0) + a_2(x - x_0)^2 + a_3(x - x_0)^3 + a_4(x - x_0)^4 + a_5(x - x_0)^5 \qquad (20)$$

with

$$a_0 = f(x_0) = e^{-x_0} \qquad (21)$$

$$a_1 = f'(x_0) = -e^{-x_0} \qquad (22)$$

$$a_2 = f''(x_0)/2 = e^{-x_0}/2 \qquad (23)$$

$$a_3 = f^{(3)}(x_0)/6 = -e^{-x_0}/6 \qquad (24)$$

$$a_4 = f^{(4)}(x_0)/24 = e^{-x_0}/24 \qquad (25)$$

$$a_5 = f^{(5)}(x_0)/120 = -e^{-x_0}/120 \qquad (26)$$

In computation, the fifth-order expansion (20) alone is not enough to obtain an accurate approximation, because a large error occurs when the input $x$ is far from $x_0$. Therefore, in this paper, the combination of the look-up table (LUT) technique and the Taylor series expansion technique is proposed. To set up the LUT, several specific values of $x_0$ within the range $0 \le x \le 1$ are first chosen; then the parameters $a_0$ to $a_5$ in (21)-(26) are computed. These data, $x_0$ and $a_0$ to $a_5$, are stored in the LUT. Later, whenever $e^{-x}$ in (20) must be computed, the $x_0$ closest to the input $x$ and the related $a_0$ to $a_5$ are selected from the LUT, and the polynomial is evaluated. Secondly, under the aforementioned design method, the internal circuit design of the neuron component in Fig. 5 is shown in Fig. 6; it uses two adders, two multipliers, one divider, and seven look-up tables and manipulates a 20-step state machine to carry out the overall computation. The computation equations for the circuit in Fig. 6 are listed in (2) and (20)-(26). In total, the 20 steps need 400 ns of operation time to compute the Gaussian function in (2).
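To make the scheme concrete, the following floating-point Python sketch of ours builds the LUT and evaluates (20); the x0 grid spacing of 2^-4 is an assumption for illustration, since the hardware table size is a design choice not stated above:

import math

STEP = 2.0 ** -4        # assumed spacing of the x0 grid over 0 <= x <= 1
LUT = []
for k in range(17):     # x0 = 0, 1/16, ..., 1
    x0 = k * STEP
    e = math.exp(-x0)
    # a0..a5 from (21)-(26); the derivatives of exp(-x) alternate in sign
    LUT.append((x0, (e, -e, e / 2, -e / 6, e / 24, -e / 120)))

def exp_neg(x):
    # Approximate exp(-x) via (20): select the x0 nearest to x, then
    # evaluate the fifth-order polynomial in dx = x - x0 (Horner form).
    k = min(max(int(round(x / STEP)), 0), len(LUT) - 1)
    x0, a = LUT[k]
    dx, r = x - x0, 0.0
    for c in reversed(a):
        r = r * dx + c
    return r

print(abs(exp_neg(0.37) - math.exp(-0.37)))  # truncation error is negligible here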

[Figure: the neuron pipeline computes the squared distance ||X − Ci||², divides by 2σi², addresses the LUTs for x0 and a0-a5, forms dx = x − x0 and its powers, and evaluates the fifth-order polynomial over states s0~s19]

Fig. 6 The internal circuit design of the neuron component in Fig. 5

In the learning algorithm component in Fig. 4(b), the main inputs are the previous weights $w_{ji}(k)$, previous node centers $C_i(k)$, previous node variances $\sigma_i(k)$, learning rate $\eta$, error $e_j$, and the neuron outputs $\phi_i$, and the outputs are the current weights $w_{ji}(k+1)$, current node centers $C_i(k+1)$, and current node variances $\sigma_i(k+1)$. The internal circuit design of the learning algorithm component in Fig. 4(b) is presented in Fig. 7; it uses five multipliers, three adders, and three dividers and manipulates a 46-step state machine to carry out the overall SGD-based learning algorithm. In Fig. 7, steps s0~s4 execute the weight update in (15); steps s5~s28 perform the tuning of the node centers in (16); and steps s27~s45 perform the node variance adjustment in (17). In total, the 46 steps need 920 ns of operation time to compute the SGD-based learning algorithm in (15)-(17).
Furthermore, based on the Altera FPGA kit (DE2, Cyclone IV EP4CE115F29C7), which has 114,480 logic elements and 532 embedded 9-bit multiplier elements, the hardware resource usages in implementing the overall 3-5-1 RBF NN are as follows:
(1) For the forward computation component:
- Total logic elements: 14,067/114,480 (12%).
- Embedded 9-bit multiplier elements: 64/532 (12%).
(2) For the SGD learning algorithm component:
- Total logic elements: 17,309/114,480 (15%).
- Embedded 9-bit multiplier elements: 40/532 (8%).
[Figure: steps s0~s4 update the five weights w1i(k+1) from η·e·φi; steps s5~s28 update the fifteen node centers cir(k+1); steps s27~s45 update the five node variances σi(k+1), reusing the shared multipliers, adders, and dividers]

Fig. 7 The internal circuit design of the learning algorithm component in Fig. 4(b)
4 SIMULATION RESULTS
To verify the effectiveness of the proposed VHDL code for the 3-5-1 RBF NN (three inputs, five neurons, and one output) in Fig. 4(b), the co-simulation architecture using Simulink and ModelSim is applied. In this architecture, the input stimuli and output responses run in Simulink, and the RBF NN function executes in ModelSim. The VHDL code of the 3-5-1 RBF NN is first tested, and the application of the 3-5-1 RBF NN to system dynamic identification (ID) of a general linear system and of a PMSM drive system is then evaluated.
4.1 Verification of the designed VHDL code for the RBF NN with 3-5-1 structure
The accuracy and performance of the forward computation function and of the overall function with the learning algorithm for the 3-5-1 RBF NN are evaluated separately. In the former case, the co-simulation architecture using Simulink and ModelSim is built up as shown in Fig. 8. The input stimuli are generated and the output responses are displayed in Simulink, and the computation of the RBF NN is performed in ModelSim. The input signals are also sent to an embedded Matlab function, which executes the computation of the 3-5-1 RBF NN in the floating-point numerical system; its results are sent back and compared with the output from ModelSim. To evaluate the computational accuracy, two test cases with different weighting values are considered.
Case-1:
w11 = -0.23, w12 = 0.35, w13 = 0.65, w14 = -0.48, w15 = 0.36 (27)
Case-2:
w11 = 0.45, w12 = -0.23, w13 = 0.45, w14 = 0.62, w15 = -0.12 (28)
Also, the centers and variances of the five neurons are set by
Neuron-1: c11 = 0.55, c12 = -0.7, c13 = 0.32, σ1 = 0.6 (29)
Neuron-2: c21 = 0.5, c22 = 0.3, c23 = -0.52, σ2 = 0.7 (30)
Neuron-3: c31 = 0.5, c32 = 0.3, c33 = 0.4, σ3 = 0.65 (31)
Neuron-4: c41 = 0.4, c42 = 0.35, c43 = -0.43, σ4 = 0.8 (32)
Neuron-5: c51 = 0.55, c52 = -0.35, c53 = 0.44, σ5 = 0.85 (33)
With the two cases in (27)-(28), the simulation results for the specific inputs are listed in Table 1, which displays the output from ModelSim, the output from Matlab, and the error between them. The results show that the output y from ModelSim and y' from Matlab are very close in both cases; the maximum error between them is 3.422351e-7. This indicates that the forward computation function of the proposed VHDL code for the 3-5-1 RBF NN is correct and effective.
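The size of these errors is consistent with Q24 quantization. A rough check of ours (reusing rbf_forward from Section 2 and rounding intermediate values to the Q24 grid; the real hardware quantizes every internal step, so this is only indicative) reproduces errors of the same order:

import numpy as np

def q24_round(v):
    # Round every element to the nearest multiple of 2**-24.
    return np.round(np.asarray(v) * 2.0 ** 24) / 2.0 ** 24

rng = np.random.default_rng(0)
C = rng.uniform(-0.7, 0.7, (5, 3)); sigma = rng.uniform(0.6, 0.85, 5)
W = rng.uniform(-0.65, 0.65, (1, 5)); x = np.array([0.36, -0.51, 0.92])

y_float, _ = rbf_forward(x, C, sigma, W)
d2 = q24_round(np.sum((q24_round(x) - q24_round(C)) ** 2, axis=1))
phi_q = q24_round(np.exp(-d2 / (2.0 * q24_round(sigma) ** 2)))
y_q24 = q24_round(W) @ phi_q
print(abs(y_float - y_q24))  # typically 1e-8 to 1e-7, the order seen in Table 1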

Fig. 8 Simulink and ModelSim co-simulation architecture for the forward computation of a 3-5-1 RBF NN
Table 1. Simulation results in the two selected cases

Case-1 (w11=-0.23, w12=0.35, w13=0.65, w14=-0.48, w15=0.36)

  x1      x2      x3      y (ModelSim)        y' (Matlab)         error (y'-y)
  0.36    -0.51   0.92    0.26112145185471    0.26112168567535    2.338206e-7
  0.36    0.66    0.92    0.46340209245682    0.46340215565847    6.320165e-8
  0.36    -0.51   -0.31   0.03410816192627    0.03410848598392    3.240576e-7

Case-2 (w11=0.45, w12=-0.23, w13=0.45, w14=0.62, w15=-0.12)

  x1      x2      x3      y (ModelSim)        y' (Matlab)         error (y'-y)
  0.36    -0.51   0.92    0.21531194448471    0.21531222288057    2.783958e-7
  0.36    0.66    0.92    0.28644597530365    0.28644631753885    3.422351e-7
  0.36    -0.51   -0.31   0.36509042978287    0.36509042477819    -5.004676e-9

Note: the neuron parameters in both cases are those of (29)-(33):
neuron-1: c11=0.55, c12=-0.7, c13=0.32, σ1=0.6
neuron-2: c21=0.5, c22=0.3, c23=-0.52, σ2=0.7
neuron-3: c31=0.5, c32=0.3, c33=0.4, σ3=0.65
neuron-4: c41=0.4, c42=0.35, c43=-0.43, σ4=0.8
neuron-5: c51=0.55, c52=-0.35, c53=0.44, σ5=0.85

In the latter case, the overall function with forward computation and the learning algorithm for the 3-5-1 RBF NN in Fig. 4(b) is evaluated. The test condition is designed as follows: the inputs are set to constant values, and the desired output y1d varies with time. In addition, the initial parameters of the RBF NN are set to random numbers, and the SGD learning algorithm continuously tunes those parameters according to the error between the desired output and the output of the RBF NN. Under this test condition, the output of the RBF NN is expected to track the desired output quickly. Therefore, in the simulation, the input values are set to x1 = 0.7, x2 = 0.8, and x3 = 0.9, the learning rate is chosen as 0.25, and the desired output is a square wave with a 10 ms period whose magnitude is varied through 0.65→1→-1→2.5→-2.5→1.5→-0.85→0.82→0.5. Finally, the simulation result is shown in Fig. 9. It demonstrates that the output of the RBF NN cannot track the desired output at the initial condition, but after the parameters of the RBF NN are tuned to suitable values, it tracks the desired output very well. Even when the value of the desired output changes, the output of the RBF NN tracks it again within 5 ms, and the steady-state tracking error stays below 10^-6. The simulation result shows that the forward computation and the learning algorithm of the proposed VHDL code for the 3-5-1 RBF NN are correct and effective.

[Figure: (a) output response of the RBF NN overlaid on the desired output over 0-100 ms; (b) the corresponding tracking error]

Fig. 9 (a) Tracking response and (b) tracking error


4.2 Applications of the 3-5-1 RBF NN to system dynamic identification (ID)

After confirming the correctness and effectiveness of the designed VHDL code for the 3-5-1 RBF NN, we apply it to the system dynamic ID problem. Two cases, a general linear system and a PMSM drive system, are tested as follows. Note that choosing a suitable learning rate is both important and difficult because it depends on the system characteristics; in this paper, a trial-and-error method is adopted to choose the learning rate in all simulation cases.
- In the first case, the 3-5-1 RBF NN is applied to identify the dynamics of a linear system, and the identification block diagram is shown in Fig. 10. The dynamic model of the linear plant [25] is

$$y_p(k+1) = 0.5\,y_p(k) + 0.3\,y_p(k-1) + u(k) \qquad (34)$$

Additionally, the inputs of the RBF NN are the previous plant outputs ($y_p(k)$, $y_p(k-1)$) and the plant input $u(k)$, and the output is the neural network output $y_{rbf}(k+1)$. In this design, the neural network output $y_{rbf}(k+1)$ is expected to track the current plant output $y_p(k+1)$ quickly. In the simulation, the input signal is first designed as

$$u(k) = \frac{1}{25}\left[\sin(10\pi k/100) + \sin(25\pi k/100 + 0.5)\right] \qquad (35)$$
Considering the different learning rates 0.1, 0.25, and 0.5, the simulation results for the plant output and the neural network output are shown in Fig. 11. They clearly reveal that a small learning rate gives a slow tracking response, whereas a large learning rate causes an unstable tracking response with oscillation. Additionally, a mean square error (MSE) is defined in (36) and is taken as the evaluation index for the tracking performance at the different learning rates:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_d - y_{rbf}\right)^2 \qquad (36)$$

where $N$ is the number of sampled data. Based on (36), the MSE values in Fig. 11 are 0.064348, 0.034982, and 0.139052 for the learning rates 0.1, 0.25, and 0.5, respectively, which shows that a learning rate of 0.25 gives the best tracking response in this case. Finally, the simulation result demonstrates that the proposed VHDL code for the 3-5-1 RBF NN can be applied well to identifying the dynamics of a linear system.
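An end-to-end floating-point sketch of this experiment (our illustration, combining the plant (34), the excitation (35), the reference model and SGD step sketched in Section 2, and the MSE index (36)):

import math
import numpy as np

rng = np.random.default_rng(1)
C = rng.uniform(-0.5, 0.5, (5, 3))   # random initial parameters
sigma = np.full(5, 0.7)
W = rng.uniform(-0.5, 0.5, (1, 5))

yp = np.zeros(202)                   # plant output history
errs = []
for k in range(1, 200):
    u = (math.sin(10 * math.pi * k / 100)
         + math.sin(25 * math.pi * k / 100 + 0.5)) / 25      # (35)
    yp[k + 1] = 0.5 * yp[k] + 0.3 * yp[k - 1] + u            # (34)
    xk = np.array([yp[k], yp[k - 1], u])                     # network inputs
    _, e = rbf_sgd_step(xk, yp[k + 1:k + 2], C, sigma, W, lr=0.25)
    errs.append(e[0])
print(np.mean(np.square(errs)))      # MSE over the run, cf. (36)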
[Figure: the plant driven by u(k) produces y_p(k+1); the RBF NN, fed with the delayed plant outputs y_p(k), y_p(k-1) and the input u(k), produces y_rbf(k+1); their difference e drives the learning algorithm]

Fig. 10 Online identification of the dynamics of a linear plant using a 3-5-1 RBF NN

[Figure: plant output y_p and neural network output y_rbf over 0-200 ms for learning rates 0.1, 0.25, and 0.5]

Fig. 11 The tracking results under different learning rates: (a) 0.1, (b) 0.25, (c) 0.5
- In the second case, the focus is on identifying the dynamics of the PMSM drive system. The block diagram for the dynamic identification of the PMSM drive system [22] is shown in Fig. 12, and the Simulink/ModelSim co-simulation architecture is presented in Fig. 13. The PMSM, the IGBT-based inverter, and the speed command are modeled in Simulink, and the speed/current controllers and the identification (ID) system based on the 3-5-1 RBF NN are executed in ModelSim in three works. Work-1, work-2, and work-3 of ModelSim in Figs. 12 and 13 perform, respectively, the PI speed controller; the current controller, coordinate transformation, and space vector pulse width modulation (SVPWM); and the identification of the PMSM drive dynamics. The numerical data type in work-1 and work-2 is a 16-bit length with Q15 format and 2's-complement operation, whereas work-3 adopts a 32-bit length with Q24 format. The inputs of the RBF NN are the previous rotor speeds ($\omega_r(k-1)$, $\omega_r(k-2)$) and the current command $i_q^*(k)$, and the output is the estimated rotor speed $\hat{\omega}_r(k)$. Additionally, the inputs to and the output from the RBF NN are normalized and de-normalized, respectively, which keeps the input and output values of the RBF NN within -1 to +1 and avoids numerical overflow during the computation. Therefore, a normalization that maps the input data (current command) from -1.5 A~+1.5 A to -1~+1 and a de-normalization that maps the output data (rotor speed) from -1~+1 to -2000 rpm~+2000 rpm are applied. Under this test condition, the estimated rotor speed $\hat{\omega}_r(k)$ is expected to track the actual rotor speed $\omega_r(k)$ quickly. In the simulation, the three works in Fig. 13 are implemented in digital hardware using VHDL. The learning rate is chosen as 0.25. The PI controller gains in work-1 are chosen as 0.367 (Q15 format) and 0.0036 (Q15 format), respectively. The sampling frequencies for the speed control, the current control, and the identification (ID) system are 2 kHz, 16 kHz, and 16 kHz, respectively. Clocks of 50 MHz and 12.5 MHz supply all the works in ModelSim. Furthermore, the PMSM parameters used in Fig. 13 are: 4 pole pairs, stator phase resistance 1.3 Ω, stator inductance 6.3 mH, inertia J = 0.000054 kg·m², and friction factor F = 0.00065 N·m·s. When the speed command steps through 200→400→600→400→200 rpm with a 0.05 s period, the simulation results for the rotor speed response, the output of the RBF NN, and the tracking error between them are shown in Fig. 14. When the speed command is the sinusoidal wave

$$\omega_r^*(k) = 200\sin(10\pi(k/100)) + 350\sin(10\pi(k/100) + 0.5) \qquad (37)$$

the simulation results for the current command, the rotor speed response, the output of the RBF NN, and the tracking error between them are presented in Fig. 15. At the start, the output of the RBF NN $\hat{\omega}_r$ cannot track the rotor speed $\omega_r$ in Figs. 14-15, but after the parameters of the RBF NN are tuned to suitable values within 0.003 s, the output of the RBF NN follows the rotor speed very well, and the maximum tracking errors are within 25 rpm in Fig. 14(b) and 20 rpm in Fig. 15(c). The simulation results show that the proposed VHDL code for the 3-5-1 RBF NN is well suited to dynamic ID of the PMSM drive system.
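The normalization and de-normalization are plain affine maps; with the ranges stated above (±1.5 A for the current command, ±2000 rpm for the rotor speed), a sketch of ours:

def normalize(v, vmax):
    # Map a physical value in [-vmax, +vmax] onto [-1, +1] for the RBF NN.
    return v / vmax

def denormalize(u, vmax):
    # Map a network output in [-1, +1] back to physical units.
    return u * vmax

iq_norm = normalize(0.75, 1.5)       # 0.75 A current command -> 0.5
speed = denormalize(0.25, 2000.0)    # network output 0.25    -> 500 rpm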

[Figure: work-1 (PI speed controller) and work-2 (current controllers, Park/Clarke coordinate transformations, and SVPWM) in ModelSim drive the IGBT-based inverter and PMSM model in Simulink; work-3 feeds ω_r(k-1), ω_r(k-2), and i_q*(k) into the RBF NN, whose estimate is compared with ω_r to drive the learning algorithm]

Fig. 12 The block diagram for the dynamic ID of the PMSM drive system
[Figure: Simulink model with HDL co-simulation blocks for work-1 (PI speed controller), work-2 (current control, coordinate transformation, and SVPWM), and work-3 (RBF NN identification), together with the IGBT inverter, the permanent magnet synchronous machine model, and the normalization/de-normalization and comparison blocks]

Fig. 13 Simulink/ModelSim co-simulation architecture for the dynamic identification of the PMSM drive system

[Figure: (a) speed command, rotor speed, and output of the RBF NN over 0-0.25 s; (b) the tracking error in rpm]

Fig. 14 (a) Dynamic identification results and (b) the tracking error between the rotor speed and the output under the step speed command in the PMSM drive

[Figure: (a) current command i_q* over 0-0.2 s; (b) rotor speed and output of the RBF NN; (c) the tracking error in rpm]

Fig. 15 (a) Current response, (b) dynamic identification results, and (c) the tracking error between the rotor speed and the output under the sinusoidal speed command in the PMSM drive

5 CONCLUSIONS

This study successfully presents a digital hardware implementation of an RBF NN with an SGD-based learning mechanism and demonstrates excellent accuracy and fast computation in an EDA simulation environment. The contributions of this work can be summarized as follows.
(1) A high-accuracy method that combines the Taylor series expansion technique and the look-up table (LUT) technique to compute the Gaussian function in the RBF NN is proposed. Through the Simulink/ModelSim co-simulation, the computational accuracy of the proposed method for the Gaussian function and the overall RBF NN reaches 10^-6~10^-7.
(2) Fast computation for a complicated RBF NN is achieved by the digital hardware implementation. For example, for an RBF NN with three inputs, five neurons, one output, and one learning mechanism, the computation needs only 500 ns for the forward computation and 920 ns for the learning algorithm.
(3) A co-simulation environment between Simulink and ModelSim is successfully built to verify the correctness and effectiveness of the VHDL code for the 3-5-1 RBF NN. Additionally, the application of the VHDL code for the 3-5-1 RBF NN to dynamic ID of a linear system and of a PMSM drive system is also successfully demonstrated.

ACKNOWLEDGEMENT
This work was supported by the Ministry of Science and Technology in R.O.C. under Grant no. MOST
104-3113-E-218-001.

REFERENCES
[1] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice Hall, 1999.
[2] J. Haddadnia, K. Faez, M. Ahmadi, A fuzzy hybrid learning algorithm for radial basis function neural network with application in human face recognition, Pattern Recognition, 2003, pp. 1187-1202.
[3] Y. S. Kung, N. P. Thanh, H. H. Chou, Design and implementation of a microprocessor-based PI controller for PMSM drives, Applied Mechanics and Materials, vols. 764-765, 2015, pp. 496-500.
[4] R. Murugadoss, M. Ramakrishnan, Universal approximation using probabilistic neural networks with sigmoid activation functions, (ICAETR) IEEE, 2014, pp. 1-4.
[5] A. Esmaeili, N. Mozayani, Adjusting the parameters of radial basis function networks using particle swarm optimization, Proceedings of the IEEE International Conference on Computational Intelligence for Measurement Systems and Applications, Hong Kong, China, 2009, pp. 179-181.
[6] S. W. Ellacott, J. C. Mason, I. J. Anderson, Mathematics of Neural Networks: Models, Algorithms, and Applications, New York, Springer, 1997.
[7] H. Robbins, D. Siegmund, A convergence theorem for nonnegative almost supermartingales and some applications, New York, Springer, 1985.
[8] A. C. D. de Souza, M. A. C. Fernandes, Proposal for parallel fixed point implementation of a radial basis function network in an FPGA, IEEE Conference, 2014, pp. 1-6.
[9] B. Deng, M. Zhang, F. Su, J. Wang, X. Wei, B. Shan, The implementation of feedforward network on field programmable gate array, (BMEI) IEEE, 2014, pp. 483-487.
[10] I. C. Cevikbas, A. S. Ogrenci, G. Dundar, S. Balkir, VLSI implementation of GRBF (Gaussian radial basis function) networks, Proceedings of the IEEE International Symposium on Circuits and Systems, Geneva, Switzerland, 2000, pp. 28-31.
[11] G. Botella, U. Meyer-Baese, A. García, M. Rodríguez, Quantization analysis and enhancement of a VLSI gradient-based motion estimation architecture, Digital Signal Processing, 2012, pp. 1174-1187.
[12] C. Shi, R. W. Brodersen, Floating-point to fixed-point conversion with decision errors due to quantization, Acoustics, Speech, and Signal Processing (ICASSP), vol. 5, 2004, pp. V-41-4.
[13] M. Hoffman, P. Bauer, B. Hemmelman, A. Hasan, Hardware synthesis of artificial neural networks using field programmable gate arrays and fixed-point numbers, IEEE, 2006, pp. 324-328.
[14] J.-P. Deschamps, G. D. Sutter, E. Canto, Guide to FPGA Implementation of Arithmetic Functions (Lecture Notes in Electrical Engineering), Netherlands, Springer, 2012.
[15] Y. Li, J. Huo, X. Li, J. Wen, Y. Wang, B. Shan, An open-loop sin microstepping driver based on FPGA and the co-simulation of ModelSim and Simulink, International Conference on Computer, Mechatronics, Control and Electronic Engineering, 2010, pp. 223-227.
[16] A. R. Omondi, J. C. Rajapakse, FPGA Implementations of Neural Networks, USA, Springer, 2006.
[17] C. Rodriguez-Donate, G. Botella, C. Garcia, E. Cabal-Yepez, A. Garcia-Perez, Early experiences with OpenCL on FPGAs: convolution case study, Field-Programmable Custom Computing Machines (FCCM), IEEE, 2015, pp. 235-235.
[18] K. Shagrithaya, K. Kepa, P. Athanas, Enabling development of OpenCL applications on FPGA platforms, Application-Specific Systems, Architectures and Processors (ASAP), IEEE, 2013, pp. 26-30.
[19] D. Chen, D. Singh, Invited paper: using OpenCL to evaluate the efficiency of CPUs, GPUs and FPGAs for information filtering, Field Programmable Logic and Applications (FPL), 2012, pp. 5-12.
[20] Wikipedia, the free encyclopedia.
[21] Z. C. Fan, W. J. Hwang, Efficient VLSI architecture for training radial basis function networks, Sensors Journal, 2013, pp. 3848-3877.
[22] Y. S. Kung, N. P. Thanh, M. S. Wang, Design and simulation of a sensorless permanent magnet synchronous motor drive with microprocessor-based PI controller and dedicated hardware EKF estimator, Applied Mathematical Modelling, 2015.
[23] M. D. Buhmann, Radial Basis Functions: Theory and Implementations, Cambridge Monographs on Applied and Computational Mathematics, 2009.
[24] J. Heaton, Introduction to Neural Networks for Java, 2nd ed., Heaton Research, Inc., 2008.
[25] F. M. Ham, I. Kostanic, Principles of Neurocomputing for Science and Engineering, United Kingdom, McGraw-Hill Science/Engineering/Math, 2000.

Author biography:
Nguyen Phan Thanh is a lecturer in the Faculty of Electronics and Electrical Engineering, Ho Chi Minh City University of Technology and Education, Ho Chi Minh City, Vietnam. He is currently pursuing his Ph.D. degree in Electrical Engineering at Southern Taiwan University of Science and Technology (STUST), Taiwan. His major research interests are intelligent system design, automatic motion control, and electric drives.

Co-author-1 biography:
Ying-Shieh Kung was born in Taiwan. He received the Ph.D. degree in Power Mechanical Engineering from National Tsing Hua University, Taiwan. He is currently a Professor at Southern Taiwan University of Science and Technology (STUST). His areas of research interest are FPGA-based controller design for AC motor drives, robot manipulators, and intelligent systems.

Co-author-2 biography:
Seng-Chi Chen received his Ph.D. degree in Mechanical Engineering from National Central University, Taiwan, in 2000. His main research interests include precision machine design and manufacture, linear motors, electric vehicle motors, wind power generators, magnetic levitation, motor drives, and servo control systems.

Co-author-3 biography:
Hsin-Hung Chou was born in Taiwan, Republic of China. He is currently a manager and researcher in the field of automatic control of machine systems at the Industrial Technology Research Institute (ITRI). He is currently pursuing his Ph.D. degree in Mechanical Engineering at National Chiao Tung University, Hsinchu, Taiwan. His research interests include robust control, AC servo motors, and robot systems.

Graphical Abstract (for review)

Graphical abstract

[Figure: (a) the 3-5-1 RBF NN structure with inputs X1-X3, five Gaussian neurons, output y1, and the SGD learning algorithm; (b) its digital hardware component with the forward computation and learning algorithm blocks]

(a) 3-5-1 RBF NN structure (b) Its digital hardware component


Highlights (for review)

Research highlights
- A high-accuracy method to compute the Gaussian function in the radial basis function neural network (RBF NN) is proposed.
- The computational accuracy of the proposed method for the Gaussian function and the overall RBF NN reaches 10^-6~10^-7.
- Fast computation for a complicated RBF NN is achieved by the digital hardware implementation.
- A co-simulation environment between Simulink and ModelSim is built to verify the correctness and effectiveness of the VHDL (very high-speed IC hardware description language) code for a 3-5-1 RBF NN.
- The application of the VHDL code for a 3-5-1 RBF NN to dynamic identification of a linear system and of a PMSM drive system is successfully demonstrated.
Supplementary Interactive Plot Data (CSV): Plant_identification_lr-0.1.csv
Supplementary Interactive Plot Data (CSV): Plant_identification_lr-0.25.csv
Supplementary Interactive Plot Data (CSV): Plant_identification_lr-0.45.csv
