Article history:
Received 29 January 2018
Received in revised form 4 April 2018
Accepted 8 April 2018
Available online 16 April 2018

Keywords:
Quaternionic multi-layer perceptron
Quaternionic back-propagation
3D transformation
Time series prediction

Abstract

The nonlinear spatial grouping process of synapses is one of the fascinating methodologies for neuro-computing researchers seeking to achieve the computational power of a neuron. Researchers generally use neuron models based on summation (linear), product (linear) or radial basis (nonlinear) aggregation for the processing of synapses and construct multi-layered feed-forward neural networks from them, but each of these neuron models and its corresponding neural network has its own advantages and disadvantages. The multi-layered network is generally used to accomplish a global approximation of the input–output mapping but sometimes gets stuck in local minima, while the nonlinear radial basis function (RBF) network, based on an exponentially decaying function, is used for local approximation of the input–output mapping. These complementary strengths and weaknesses motivated the design of two new artificial neuron models based on compensatory aggregation functions in the quaternionic domain. The net internal potentials of these neuron models are composed of basic summation (linear) and radial basis (nonlinear) operations on quaternionic-valued input signals. Neuron models based on these aggregation functions ensure faster convergence, better training, and better prediction accuracy. The learning and generalization capabilities of these neurons are verified on various three-dimensional transformations and time series predictions as benchmark problems.

© 2018 Society for Computational Design and Engineering. Publishing Services by Elsevier. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
https://doi.org/10.1016/j.jcde.2018.04.002
34 S. Kumar, B.K. Tripathi / Journal of Computational Design and Engineering 6 (2019) 33–48
based on the combination of linear and nonlinear aggregations among quaternionic-valued input signals.

The hyper-complex quaternion number (q = R(q) + I1(q)i + I2(q)j + I3(q)k)¹ discovered by Hamilton (Hamilton, 1853) possesses four components, with phase information along the different components embedded within it. This number system has been applied in theoretical and applied mathematics, especially in calculations involving three-dimensional rotations, such as in computer graphics, computer vision, crystallographic texture analysis, and mechanical machine design. Although the multiplication operation on quaternions does not follow the commutative property, intelligent mathematical formulation leads to its best use in high-dimensional neuro-computing. Neural networks with the quaternion as the unit of information flow learn and generalize efficiently in high dimensions with fewer neurons (Nitta, 1995; Arena et al., 1996, 1997; Minemoto et al., 2017). Further, back-propagation learning in the quaternionic domain (H-BP) has been developed in the recent past (Nitta, 1995; Arena et al., 1996, 1997; Matsui et al., 2004; Minemoto et al., 2017) using error-gradient-descent optimization, but it suffers from the basic issues of that approach, namely local minima and slow convergence. Thus, an investigation of fast and efficient learning algorithms that overcome these issues of back-propagation is highly desirable.

In order to capture the nonlinear correlations among quaternionic input patterns, two new neurons based on compensatory-type aggregation functions in the quaternionic domain are presented in this paper. The net potential of each neuron is formulated from the weighted contributions of summation and radial basis functions. The multi-layered perceptron (MLP) with summing neurons is used for global approximation of the input–output mapping, but slow convergence and getting trapped in bad local minima are its two main drawbacks. On the other hand, the radial basis function (RBF) network is used for local approximation of the input–output mapping and offers faster, more efficient learning, but it is inefficient at approximating constant-valued functions (Lee, Chung, Tsai, & Chang, 1999). These complementary advantages and disadvantages of the MLP and RBF networks motivated the design of the compensatory summation unit (CSU) and compensatory product unit (CPU) aggregation functions for quaternionic-valued signals. The CSU aggregation is based on the compensatory summation of the conventional and RBF functions, while the CPU adds to the CSU aggregation the product of the compensated conventional and RBF functions. Various benchmark problems confirm the high functionality of the proposed neuron models based on compensatory aggregation functions in the quaternionic domain. These neurons are more complicated in nature due to the additional parameters used, but they outperform conventional neurons when faster convergence and better accuracy are required.

The rest of this paper is organized as follows: Section 2 focuses on the two proposed compensatory neurons (CSU and CPU) in the quaternionic domain and their architectures. Section 3 presents quaternionic back-propagation (H-BP) learning algorithms for the networks with conventional (summing), CSU and CPU neurons. In Section 4, the learning of various transformations is governed through a line, and the subsequent generalization ability is confirmed on complicated geometric structures. Various chaotic time series prediction problems are also taken into account in this section to demonstrate the applicability of the approach in high-dimensional applications. The final conclusion and future scope of the proposed work are described in Section 5.

2. Artificial neuron model

For quaternionic-valued input signals, the aggregation operations (V) of the two new neuron models in the quaternionic domain are defined by combinations of a linear (summation) and a nonlinear (radial basis) aggregation function in proportions (S : R), where S and R are quaternionic-valued compensatory parameters that apportion the contributions of the summation and radial basis functions so as to take into account the vagueness involved. With a view to achieving robust aggregation, both parameters adapt themselves during training.

Let q_1, q_2, ..., q_L be the quaternionic-valued input signals, where L denotes the number of inputs, Y the output, and V the net potential of a neuron in quaternions. Let f_H be the quaternionic-valued activation function, where H denotes the set of quaternion numbers. Let w^s_1m, w^s_2m, ..., w^s_Lm be the quaternionic weights from the inputs to the summation (s) term of the m-th conventional, CSU (Compensatory Summation Unit) (Fig. 1) or CPU (Compensatory Product Unit) (Fig. 2) aggregation-based neuron, and let w^RB_1m, w^RB_2m, ..., w^RB_Lm be the other quaternionic weights from the inputs to the radial basis (RB) term of the m-th CSU or CPU neuron; m = 1, 2, ..., M, where M is the number of conventional, CSU or CPU neurons in the hidden layer of a network. Let w_0 and q_0 = 1 + i + j + k be the bias and its input in quaternions respectively, where i, j and k are the fundamental basis units of a quaternion, denoted by italic bold type letters throughout the paper. These units satisfy the Hamilton rules (i ⊗ i = j ⊗ j = k ⊗ k = i ⊗ j ⊗ k = −1; i ⊗ j = −j ⊗ i = k; j ⊗ k = −k ⊗ j = i; and k ⊗ i = −i ⊗ k = j) (Hamilton, 1853), where the symbol ⊗ denotes quaternionic multiplication, which does not satisfy the commutative property (e.g. a ⊗ b ≠ b ⊗ a, where a and b are two arbitrary quaternionic variables).

2.1. Conventional neuron model in quaternionic domain

The conventional neuron model is very well known for real-valued signals (McCulloch & Pitts, 1943), and for the complex (Benvenuto & Piazza, 1992; Hirose, 2006; Leung & Haykin, 1991; Nitta, 1997; Nitta, 2000; Tripathi, 2015) and quaternionic domains (Arena et al., 1996; Matsui et al., 2004; Nitta, 1995) in three-layered neural networks. In all three domains, the internal potential of a conventional neuron is based on the sum of the products of the corresponding input–weight pairs. The net potential (V_m) of the m-th conventional neuron with a bias unit in the quaternionic domain is expressed as

V_m = V^C_m + w_0m ⊗ q_0,    (1)

where V^C_m = Σ_{l=1}^{L} w^s_lm ⊗ q_l.

The output of the conventional neuron (Y_m) is expressed as

Y_m = f_H(V_m).    (2)

2.2. CSU neuron model in quaternionic domain

The aggregation function of the compensatory summation unit (CSU) neuron (V^r_m) is defined by the summation of the conventional (V^C_m = Σ_{l=1}^{L} w^s_lm ⊗ q_l) and radial basis (V^RB_m = e^{−Σ_{l=1}^{L} ‖q_l − w^RB_lm‖²}) aggregation functions with their contributions (S_m and R_m) for quaternionic-valued signals (q_l), where S_m and R_m are compensatory parameters of the m-th neuron in the quaternionic domain that apportion the contributions of the conventional summation and radial basis functions. The net potential (V^r_m) of the m-th CSU neuron with a bias unit (Fig. 1) in the quaternionic domain is defined as

V^r_m = V^{r1}_m + V^{r2}_m + w_0m ⊗ q_0,    (3)

¹ R, I1, I2 and I3 stand for the real and three imaginary components, and i, j and k are the fundamental basis units, of a quaternionic variable respectively.
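The Hamilton rules quoted above fix the full quaternion multiplication table. As a quick illustration (ours, not the authors' code), a quaternion can be stored as a 4-vector of its (R, I1, I2, I3) parts and the product ⊗ implemented directly; the assertions confirm i ⊗ i = −1, i ⊗ j = k and the non-commutativity j ⊗ i = −k noted in the text:

```python
import numpy as np

def hamilton_product(a, b):
    """Quaternion product a ⊗ b; a and b are arrays of (R, I1, I2, I3) parts."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return np.array([
        a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,  # real part
        a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,  # i part
        a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,  # j part
        a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0,  # k part
    ])

i = np.array([0.0, 1.0, 0.0, 0.0])
j = np.array([0.0, 0.0, 1.0, 0.0])
k = np.array([0.0, 0.0, 0.0, 1.0])

# Hamilton rules: i ⊗ i = -1 and i ⊗ j = k, but j ⊗ i = -k (non-commutative).
assert np.allclose(hamilton_product(i, i), [-1.0, 0.0, 0.0, 0.0])
assert np.allclose(hamilton_product(i, j), k)
assert np.allclose(hamilton_product(j, i), -k)
```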
where V^{r1}_m = S_m ⊗ V^C_m and V^{r2}_m = R_m ⊗ V^RB_m.

The output of the CSU neuron (Y^r_m) is defined as

Y^r_m = f_H(V^r_m).    (4)

2.3. CPU neuron model in quaternionic domain

The net potential (V^p_m) of the m-th CPU neuron with a bias unit (Fig. 2) in the quaternionic domain is defined as

V^p_m = V^{p1}_m + V^{p2}_m + V^{p3}_m + w_0m ⊗ q_0,    (5)

where V^{p1}_m = S_m ⊗ V^C_m, V^{p2}_m = R_m ⊗ V^RB_m and V^{p3}_m = V^{p1}_m ⊗ V^{p2}_m.

The output of the CPU neuron (Y^p_m) is defined as

Y^p_m = f_H(V^p_m).    (6)

Various activation functions for quaternionic-valued neurons have been proposed in the literature. The split-type quaternionic-valued activation function for a neuron is addressed in Nitta (1995), Arena et al. (1996), Minemoto et al. (2017), Ujang et al. (2011), Kumar and Tripathi (2017), Kumar and Tripathi (2018) and Kumar and Tripathi (in press), but these activation functions are nonregular.
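To make the three net potentials concrete, the following sketch (our illustration, not the authors' code; helper names and toy inputs are ours) evaluates Eqs. (1), (3) and (5), with V^RB taken as the real scalar exp(−Σ_l ‖q_l − w^RB_lm‖²) defined in Section 2.2 and V^{r1}/V^{p1} = S ⊗ V^C, V^{r2}/V^{p2} = R ⊗ V^RB as above:

```python
import numpy as np

def hprod(a, b):
    # Hamilton product of quaternions stored as (R, I1, I2, I3) arrays.
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return np.array([a0*b0 - a1*b1 - a2*b2 - a3*b3,
                     a0*b1 + a1*b0 + a2*b3 - a3*b2,
                     a0*b2 - a1*b3 + a2*b0 + a3*b1,
                     a0*b3 + a1*b2 - a2*b1 + a3*b0])

def net_potentials(q, w_s, w_rb, S, R, w0):
    """Net potentials of the conventional (Eq. 1), CSU (Eq. 3) and CPU (Eq. 5)
    neurons for inputs q (L x 4), weights w_s and w_rb (L x 4), and the
    quaternionic compensatory parameters S, R and bias weight w0 (length 4)."""
    q0 = np.array([1.0, 1.0, 1.0, 1.0])                     # bias input 1 + i + j + k
    V_C = sum(hprod(w_s[l], q[l]) for l in range(len(q)))   # summation term
    V_RB = np.array([np.exp(-np.sum((q - w_rb) ** 2)),
                     0.0, 0.0, 0.0])                        # radial basis term (real scalar)
    V1, V2 = hprod(S, V_C), hprod(R, V_RB)                  # compensated contributions
    bias = hprod(w0, q0)
    V_conv = V_C + bias                                     # conventional neuron
    V_csu = V1 + V2 + bias                                  # CSU neuron
    V_cpu = V1 + V2 + hprod(V1, V2) + bias                  # CPU neuron
    return V_conv, V_csu, V_cpu
```

With S_m the identity quaternion and R_m zero, both compensated potentials collapse to the conventional one, which is a simple sanity check of the compensatory structure.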
∂E/∂I_t(w_mn) = −(1/N) Σ_{n=1}^{N} { R(e_n) f'(R(V_n)) ∂R(V_n)/∂I_t(w_mn) + Σ_{k=1}^{3} I_k(e_n) f'(I_k(V_n)) ∂I_k(V_n)/∂I_t(w_mn) },  for t = 1, 2, 3.    (17)

Now substituting Eqs. (16) and (17) in Eq. (15), we get after simplification

Δw_mn = (η/N) { R(e_n) f'(R(V_n)) ∇_{w_mn}(R(V_n)) + Σ_{k=1}^{3} I_k(e_n) f'(I_k(V_n)) ∇_{w_mn}(I_k(V_n)) }.    (18)

For the weight (w = w_lm) that connects the l-th input to the m-th hidden neuron, the weight update is derived conceptually from Eq. (18) as

Δw_lm = (η/N) Σ_{n=1}^{N} { R(e_n) f'(R(V_n)) ∇_{w_lm}(R(V_n)) + Σ_{k=1}^{3} I_k(e_n) f'(I_k(V_n)) ∇_{w_lm}(I_k(V_n)) }.    (19)

For all four components of the net potential (R(V_n), I1(V_n), I2(V_n) and I3(V_n)), the quaternionic gradients with respect to w_lm are derived using the chain rule of differentiation and presented as

∇_{w_lm} R(V_n) = R(w_mn) f'(R(V_m)) ∇_{w_lm} R(V_m) − I1(w_mn) f'(I1(V_m)) ∇_{w_lm} I1(V_m) − I2(w_mn) f'(I2(V_m)) ∇_{w_lm} I2(V_m) − I3(w_mn) f'(I3(V_m)) ∇_{w_lm} I3(V_m).    (20)

∇_{w_lm} I1(V_n) = R(w_mn) f'(I1(V_m)) ∇_{w_lm} I1(V_m) + I1(w_mn) f'(R(V_m)) ∇_{w_lm} R(V_m) + I2(w_mn) f'(I3(V_m)) ∇_{w_lm} I3(V_m) − I3(w_mn) f'(I2(V_m)) ∇_{w_lm} I2(V_m).    (21)

∇_{w_lm} I2(V_n) = R(w_mn) f'(I2(V_m)) ∇_{w_lm} I2(V_m) − I1(w_mn) f'(I3(V_m)) ∇_{w_lm} I3(V_m) + I2(w_mn) f'(R(V_m)) ∇_{w_lm} R(V_m) + I3(w_mn) f'(I1(V_m)) ∇_{w_lm} I1(V_m).    (22)

∇_{w_lm} I3(V_n) = R(w_mn) f'(I3(V_m)) ∇_{w_lm} I3(V_m) + I1(w_mn) f'(I2(V_m)) ∇_{w_lm} I2(V_m) − I2(w_mn) f'(I1(V_m)) ∇_{w_lm} I1(V_m) + I3(w_mn) f'(R(V_m)) ∇_{w_lm} R(V_m).    (23)

Now substituting Eqs. (20)-(23) in Eq. (19), we get after quaternionic-gradient-wise simplification

Δw_lm = (η/N) { R(φ_m) ∇_{w_lm} R(V_m) + Σ_{k=1}^{3} I_k(φ_m) ∇_{w_lm} I_k(V_m) },    (24)

where

R(φ_m) = f'(R(V_m)) Σ_{n=1}^{N} R(δ_mn),    (25)

I_t(φ_m) = f'(I_t(V_m)) Σ_{n=1}^{N} I_t(δ_mn),  for t = 1, 2, 3,    (26)

δ_mn = w̄_mn ⊗ (e_n f'_H(V_n)).    (27)

3.1. Weight update rules for the network with quaternionic-valued conventional neurons

The quaternionic-valued conventional neurons are employed in the hidden and output layers of the three-layered architecture. The weight update rule for a hidden-output weight pair in this architecture can be determined from Eq. (18) after substituting the quaternionic gradients of all parts of the net internal potential (V_n = Σ_{m=1}^{M} w_mn ⊗ Y_m + w_0n ⊗ q_0). These gradients are computed as:

∇_{w_mn} R(V_n) = Ȳ_m.    (28)

∇_{w_mn} I1(V_n) = i ⊗ Ȳ_m.    (29)

∇_{w_mn} I2(V_n) = j ⊗ Ȳ_m.    (30)

∇_{w_mn} I3(V_n) = k ⊗ Ȳ_m.    (31)

Now substituting Eqs. (28)-(31) in Eq. (18) for Δw_mn and Δw_0n, we get after simplification

Δw_mn = (η/N) (e_n f'_H(V_n)) ⊗ Ȳ_m.    (32)

Δw_0n = (η/N) (e_n f'_H(V_n)) ⊗ q̄_0.    (33)

The weight update rule for an input-hidden pair is derived from Eq. (24), with supporting Eqs. (25)-(27), on substituting the quaternionic gradients of all parts of the net internal potential (V_m = Σ_{l=1}^{L} w_lm ⊗ q_l + w_0m ⊗ q_0). These gradients are computed as:

∇_{w_lm} R(V_m) = q̄_l.    (34)

∇_{w_lm} I1(V_m) = i ⊗ q̄_l.    (35)

∇_{w_lm} I2(V_m) = j ⊗ q̄_l.    (36)

∇_{w_lm} I3(V_m) = k ⊗ q̄_l.    (37)

Now substituting Eqs. (34)-(37) in Eq. (24), we get after simplification

Δw_lm = (η/N) φ_m ⊗ q̄_l.    (38)

Δw_0m = (η/N) φ_m ⊗ q̄_0.    (39)

The weight update rule is governed by Eqs. (32) and (33) for the hidden-output pairs and Eqs. (38) and (39) for the input-hidden pairs of the network with conventional neurons in the quaternionic domain.

3.2. Weight update rules for the network with proposed quaternionic-valued CSU neurons in hidden layer

In a three-layered network, the proposed CSU and conventional-type neurons in the quaternionic domain are employed in the hidden and output layers respectively (Fig. 3). The weight update rules for the hidden-output neuron pairs remain the same as given in Eqs. (32) and (33), because conventional neurons are used in the output layer of the network. The weight update rule for an input-hidden pair is derived from Eq. (24), with supporting Eqs. (25), (26) and (27), on substituting the quaternionic gradients of all parts of the net internal potential (V_m = V^r_m as given in Eq. (3)) with respect to each corresponding weight. The quaternionic gradients of R(V^r_m), I1(V^r_m), I2(V^r_m) and I3(V^r_m) with respect to w_0m are calculated as:
∇_{w_0m} R(V^r_m) = ∇_{w_0m} R(w_0m ⊗ q_0) = q̄_0.    (40)

∇_{w_0m} I1(V^r_m) = ∇_{w_0m} I1(w_0m ⊗ q_0) = i ⊗ q̄_0.    (41)

∇_{w_0m} I2(V^r_m) = ∇_{w_0m} I2(w_0m ⊗ q_0) = j ⊗ q̄_0.    (42)

∇_{w_0m} I3(V^r_m) = ∇_{w_0m} I3(w_0m ⊗ q_0) = k ⊗ q̄_0.    (43)

Now substituting Eqs. (40)-(43) in Eq. (24), we get after simplification

Δw_0m = (η/N) φ^r_m ⊗ q̄_0.    (44)

The quaternionic gradients of R(V^r_m), I1(V^r_m), I2(V^r_m) and I3(V^r_m) with respect to w^s_lm are calculated as:

∇_{w^s_lm} R(V^r_m) = ∇_{w^s_lm} R(V^{r1}_m) = S_m ⊗ q̄_l.    (45)

∇_{w^s_lm} I1(V^r_m) = ∇_{w^s_lm} I1(V^{r1}_m) = S_m ⊗ i ⊗ q̄_l.    (46)

∇_{w^s_lm} I2(V^r_m) = ∇_{w^s_lm} I2(V^{r1}_m) = S_m ⊗ j ⊗ q̄_l.    (47)

∇_{w^s_lm} I3(V^r_m) = ∇_{w^s_lm} I3(V^{r1}_m) = S_m ⊗ k ⊗ q̄_l.    (48)

Now substituting Eqs. (45)-(48) in Eq. (24), we get after simplification

Δw^s_lm = (η/N) S_m ⊗ φ^r_m ⊗ q̄_l.    (49)

The quaternionic gradients of R(V^r_m), I1(V^r_m), I2(V^r_m) and I3(V^r_m) with respect to S_m are calculated as:

∇_{S_m} R(V^r_m) = ∇_{S_m} R(V^{r1}_m) = V^C_m.    (50)

∇_{S_m} I1(V^r_m) = ∇_{S_m} I1(V^{r1}_m) = i ⊗ V^C_m.    (51)

∇_{S_m} I2(V^r_m) = ∇_{S_m} I2(V^{r1}_m) = j ⊗ V^C_m.    (52)

∇_{S_m} I3(V^r_m) = ∇_{S_m} I3(V^{r1}_m) = k ⊗ V^C_m.    (53)

Now substituting Eqs. (50)-(53) in Eq. (24) for S_m, we get after simplification

ΔS_m = (η/N) φ^r_m ⊗ V^C_m.    (54)

The quaternionic gradients of R(V^r_m), I1(V^r_m), I2(V^r_m) and I3(V^r_m) with respect to w^RB_lm are calculated as:

∇_{w^RB_lm} R(V^r_m) = ∇_{w^RB_lm} R(V^{r2}_m) = 2 R(R_m) V^RB_m (q_l − w^RB_lm).    (55)

∇_{w^RB_lm} I_t(V^r_m) = ∇_{w^RB_lm} I_t(V^{r2}_m) = 2 I_t(R_m) V^RB_m (q_l − w^RB_lm),  for t = 1, 2, 3.    (56)

Now again substituting Eqs. (55) and (56) in Eq. (24), we get after simplification

Δw^RB_lm = (2η/N) V^RB_m { R(φ^r_m) R(R_m) + Σ_{k=1}^{3} I_k(φ^r_m) I_k(R_m) } (q_l − w^RB_lm).    (57)

The quaternionic gradients of R(V^r_m), I1(V^r_m), I2(V^r_m) and I3(V^r_m) with respect to R_m are calculated as:

∇_{R_m} R(V^r_m) = ∇_{R_m} R(V^{r2}_m) = V^RB_m.    (58)

∇_{R_m} I1(V^r_m) = ∇_{R_m} I1(V^{r2}_m) = i ⊗ V^RB_m.    (59)

∇_{R_m} I2(V^r_m) = ∇_{R_m} I2(V^{r2}_m) = j ⊗ V^RB_m.    (60)

∇_{R_m} I3(V^r_m) = ∇_{R_m} I3(V^{r2}_m) = k ⊗ V^RB_m.    (61)

Now substituting Eqs. (58)-(61) in Eq. (24) for R_m, we get after simplification

ΔR_m = (η/N) V^RB_m φ^r_m.    (62)

The weight update rule is governed by Eqs. (32) and (33) for the hidden-output pairs and Eqs. (44), (49), (54), (57) and (62) for the input-hidden pairs of the network with conventional neurons at the output layer and CSU neurons at the hidden layer in the quaternionic domain.

3.3. Weight update rules for the network with proposed quaternionic-valued CPU neurons in hidden layer

In a three-layered network, the proposed CPU and conventional-type neurons for quaternionic-valued signals are employed in the hidden and output layers respectively (Fig. 3). Because conventional neurons are used in the output layer of the network, the weight update rules for the hidden-output neuron pairs remain the same as given in Eqs. (32) and (33). The weight update rule for an input-hidden pair is derived from Eq. (24), with supporting Eqs. (25), (26) and (27), on substituting the quaternionic gradients of all parts of the net internal potential (V_m = V^p_m) with respect to the corresponding weights. The quaternionic gradients of R(V^p_m), I1(V^p_m), I2(V^p_m) and I3(V^p_m) with respect to w_0m are calculated as:

∇_{w_0m} R(V^p_m) = ∇_{w_0m} R(w_0m ⊗ q_0) = q̄_0.    (63)

∇_{w_0m} I1(V^p_m) = ∇_{w_0m} I1(w_0m ⊗ q_0) = i ⊗ q̄_0.    (64)

∇_{w_0m} I2(V^p_m) = ∇_{w_0m} I2(w_0m ⊗ q_0) = j ⊗ q̄_0.    (65)

∇_{w_0m} I3(V^p_m) = ∇_{w_0m} I3(w_0m ⊗ q_0) = k ⊗ q̄_0.    (66)

Now substituting Eqs. (63)-(66) in Eq. (24), we get after simplification

Δw_0m = (η/N) φ^p_m ⊗ q̄_0.    (67)

The quaternionic gradients of R(V^p_m), I1(V^p_m), I2(V^p_m) and I3(V^p_m) with respect to w^s_lm are calculated as:

∇_{w^s_lm} R(V^p_m) = ∇_{w^s_lm} R(V^{p1}_m) + ∇_{w^s_lm} R(V^{p1}_m ⊗ V^{p2}_m) = S_m ⊗ q̄_l + S_m ⊗ V^{p2}_m ⊗ q̄_l = S_m ⊗ (1 + V^{p2}_m) ⊗ q̄_l.    (68)

∇_{w^s_lm} I1(V^p_m) = ∇_{w^s_lm} I1(V^{p1}_m) + ∇_{w^s_lm} I1(V^{p1}_m ⊗ V^{p2}_m) = S_m ⊗ i ⊗ q̄_l + S_m ⊗ i ⊗ V^{p2}_m ⊗ q̄_l = S_m ⊗ i ⊗ (1 + V^{p2}_m) ⊗ q̄_l.    (69)

∇_{w^s_lm} I2(V^p_m) = ∇_{w^s_lm} I2(V^{p1}_m) + ∇_{w^s_lm} I2(V^{p1}_m ⊗ V^{p2}_m) = S_m ⊗ j ⊗ q̄_l + S_m ⊗ j ⊗ V^{p2}_m ⊗ q̄_l = S_m ⊗ j ⊗ (1 + V^{p2}_m) ⊗ q̄_l.    (70)

∇_{w^s_lm} I3(V^p_m) = ∇_{w^s_lm} I3(V^{p1}_m) + ∇_{w^s_lm} I3(V^{p1}_m ⊗ V^{p2}_m) = S_m ⊗ k ⊗ q̄_l + S_m ⊗ k ⊗ V^{p2}_m ⊗ q̄_l = S_m ⊗ k ⊗ (1 + V^{p2}_m) ⊗ q̄_l.    (71)

Now substituting Eqs. (68)-(71) in Eq. (24), we get after simplification

Δw^s_lm = (η/N) S_m ⊗ φ^p_m ⊗ (1 + V^{p2}_m) ⊗ q̄_l.    (72)
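The factor 2 V^RB_m (q_l − w^RB_lm) appearing in Eqs. (55)-(57) is just the component-wise derivative of the Gaussian term V^RB_m = exp(−Σ_l ‖q_l − w^RB_lm‖²). A quick finite-difference check (our illustration, with arbitrary toy values) confirms that pattern:

```python
import numpy as np

def v_rb(q, w):
    # Radial basis term of Section 2.2: a real scalar exp(-sum_l ||q_l - w_l||^2).
    return np.exp(-np.sum((q - w) ** 2))

rng = np.random.default_rng(0)
q = 0.2 * rng.normal(size=(3, 4))   # three quaternionic inputs, (R, I1, I2, I3) parts
w = 0.2 * rng.normal(size=(3, 4))   # radial-basis centres w_RB

analytic = 2.0 * v_rb(q, w) * (q - w)   # derivative pattern used in Eqs. (55)-(57)

# Central finite differences, one component of one weight at a time.
eps = 1e-6
numeric = np.zeros_like(w)
for idx in np.ndindex(*w.shape):
    w_p, w_m = w.copy(), w.copy()
    w_p[idx] += eps
    w_m[idx] -= eps
    numeric[idx] = (v_rb(q, w_p) - v_rb(q, w_m)) / (2.0 * eps)

assert np.allclose(analytic, numeric, atol=1e-6)
```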
The quaternionic gradients of R(V^p_m), I1(V^p_m), I2(V^p_m) and I3(V^p_m) with respect to S_m are derived using the chain rule of differentiation as:

∇_{S_m} R(V^p_m) = ∇_{S_m} R(V^{p1}_m) + ∇_{S_m} R(V^{p1}_m ⊗ V^{p2}_m) = V^C_m + V^{p2}_m ⊗ V^C_m = (1 + V^{p2}_m) ⊗ V^C_m.    (73)

∇_{S_m} I1(V^p_m) = ∇_{S_m} I1(V^{p1}_m) + ∇_{S_m} I1(V^{p1}_m ⊗ V^{p2}_m) = i ⊗ V^C_m + i ⊗ V^{p2}_m ⊗ V^C_m = i ⊗ (1 + V^{p2}_m) ⊗ V^C_m.    (74)

∇_{S_m} I2(V^p_m) = ∇_{S_m} I2(V^{p1}_m) + ∇_{S_m} I2(V^{p1}_m ⊗ V^{p2}_m) = j ⊗ V^C_m + j ⊗ V^{p2}_m ⊗ V^C_m = j ⊗ (1 + V^{p2}_m) ⊗ V^C_m.    (75)

∇_{S_m} I3(V^p_m) = ∇_{S_m} I3(V^{p1}_m) + ∇_{S_m} I3(V^{p1}_m ⊗ V^{p2}_m) = k ⊗ V^C_m + k ⊗ V^{p2}_m ⊗ V^C_m = k ⊗ (1 + V^{p2}_m) ⊗ V^C_m.    (76)

Now substituting Eqs. (73)-(76) in Eq. (24) for S_m yields, after simplification, ΔS_m = (η/N) φ^p_m ⊗ (1 + V^{p2}_m) ⊗ V^C_m.

4. Performance analysis of CSU and CPU neurons based networks

A wide spectrum of benchmark problems, from 3D geometrical transformations to 3D and 4D time series prediction, with the quaternionic back-propagation (H-BP) algorithm for three-layered networks based on compensatory summation unit (CSU) or compensatory product unit (CPU) neurons at the hidden layer and conventional neurons at the output layer, is considered in this section. The performance is evaluated through different statistical parameters: mean square error (MSE), error variance, correlation, Akaike's information criterion (AIC) (Fogel, 1991) and prediction gain (PG) (Ujang et al., 2011; Popa, 2016). Smaller values of MSE and error variance and larger values of AIC and PG in all experiments reveal better accuracy than the conventional neuron in the quaternionic domain.

4.1. Geometrical transformations in 3D

… combinations) through the H-BP algorithm for the networks with conventional, CSU, or CPU neurons in the quaternionic domain at the hidden layer and conventional neurons at the output layer. Their generalization abilities are compared on complicated 3D geometric structures (Sphere, Cylinder, and Torus) through different statistical parameters.

A three-layered (2–M–2) network undergoes a learning process for the input-output mapping over a straight line containing a small number of points (21 data points) along with a reference point ((0,0,0), the midpoint of the line) in 3D space, where the network consists of two inputs, M hidden neurons (conventional, CSU or CPU) and two conventional neurons at the output layer for quaternionic-valued signals. The input-output mapping as presented in Eq. (86) is used for three transformations: scaling with factor ½; 0.3 unit translation in the positive z-direction followed by scaling with factor ½; and π/2 rad rotation around the unit vector (i) followed by scaling with factor ½ and 0.3 unit translation in the positive z-direction, as shown in Fig. 4(a), (b) and (c) respectively. All 3D points of the straight line, along with its reference point, are treated as imaginary quaternions (the real part is kept near zero) in all three transformations. In the network, the first input receives the set of points that lie on the straight line and the second input passes the reference point. The incorporation of the reference point of the object provides more information to the learning system and yields better accuracy, which is observed during the experiments for each transformation.

For all transformations, three 2–M–2 networks are constructed: the first with conventional neurons, the second with CSU neurons and the third with CPU neurons at the hidden layer; conventional neurons are employed at the output layer in all three networks. The training process of all networks for all transformations has been performed separately through the H-BP learning algorithm.

Fig. 4. The input-output mapping of a straight line with (a) scaling factor ½, (b) scaling factor ½ and 0.3 unit translation in the positive z-direction and (c) scaling factor ½, 0.3 unit translation in the positive z-direction, and π/2 rad rotation around the unit vector (i).
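The target mappings of Fig. 4 are ordinary quaternion operations. As a hedged illustration (ours; the function name and the composition order follow our reading of Fig. 4(c)), a point stored as a pure quaternion (0, x, y, z) can be scaled, translated, and then rotated with the rotor r = cos(π/4) + sin(π/4) i via p' = r ⊗ p ⊗ r̄:

```python
import numpy as np

def hprod(a, b):
    # Hamilton product of quaternions stored as (R, I1, I2, I3) arrays.
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return np.array([a0*b0 - a1*b1 - a2*b2 - a3*b3,
                     a0*b1 + a1*b0 + a2*b3 - a3*b2,
                     a0*b2 - a1*b3 + a2*b0 + a3*b1,
                     a0*b3 + a1*b2 - a2*b1 + a3*b0])

def conj(a):
    # Quaternion conjugate: negate the imaginary (i, j, k) parts.
    return a * np.array([1.0, -1.0, -1.0, -1.0])

def transform_c(p):
    """Mapping of Fig. 4(c): scale by 1/2, translate 0.3 along +z,
    then rotate pi/2 about the i (x) axis via p' = r ⊗ p ⊗ conj(r)."""
    p = 0.5 * p                                           # scaling factor 1/2
    p = p + np.array([0.0, 0.0, 0.0, 0.3])                # 0.3 unit translation in +z
    half = np.pi / 4.0                                    # half the rotation angle
    r = np.array([np.cos(half), np.sin(half), 0.0, 0.0])  # rotor about i
    return hprod(hprod(r, p), conj(r))

# The point (x, y, z) = (0, 1, 0) as a pure quaternion (0, x, y, z):
p = np.array([0.0, 0.0, 1.0, 0.0])
# scale -> (0, 0.5, 0); translate -> (0, 0.5, 0.3); rotate about x -> (0, -0.3, 0.5)
assert np.allclose(transform_c(p), [0.0, 0.0, -0.3, 0.5])
```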
The training of the networks with the proposed CSU or CPU neurons requires fewer hidden neurons, fewer parameters, and fewer training cycles (average epochs) to reach a similar training mean square error (MSE) than the network with conventional neurons, and the trained networks have been tested on complicated geometrical objects in 3D, namely the Sphere (4141 data points), Cylinder (2929 data points), and Torus (10,201 data points), with a comparative analysis based on different statistical parameters: testing MSE, variance, correlation, AIC and PG, as reported in Tables 1–3. The huge number of data points of each structure is intentionally considered in the testing process for evaluating the intelligent behavior of the trained network. In order to depict the generalization ability, the testing results of the trained networks with CSU or CPU neurons over the different 3D objects through the H-BP algorithm are demonstrated in Figs. 5 and 6 for the scaling transformation, Figs. 7 and 8 for the scaling & translation transformation, and Figs. 9 and 10 for the scaling, translation & rotation transformation, respectively. The tables also show that the network with CPU neurons gives better training and testing performance than the one with CSU neurons in all perspectives, as reported in Tables 1–3.

Table 3
Comparison of training and testing performance for scaling, translation, and rotation.

Neuron Type in H               Conventional   CSU      CPU
Network Topology               2-6-2          2-3-2    2-2-2
Parameters                     32             29       20
Training MSE                   0.001          0.001    0.001
Average Epochs                 26,000         19,000   15,000
Testing MSE       Sphere       0.0941         0.0783   0.0491
                  Cylinder     0.0802         0.0588   0.0324
                  Torus        0.0997         0.0870   0.0655
Error Variance    Sphere       0.0072         0.0051   0.0043
                  Cylinder     0.0063         0.0045   0.0036
                  Torus        0.0152         0.0097   0.0085
Correlation       Sphere       0.9582         0.9743   0.9801
                  Cylinder     0.9677         0.9779   0.9891
                  Torus        0.9211         0.9632   0.9702
AIC               Sphere       5.0832         5.5623   5.8322
                  Cylinder     5.4205         5.892    6.2834
                  Torus        3.9339         4.4211   4.8993
PG                Sphere       6.71           7.27     7.55
                  Cylinder     7.53           7.96     8.83
                  Torus        5.67           6.38     7.02
4.2. Chaotic time series prediction in 3D using imaginary quaternion
Fig. 5. The generalization of the CSU-based network trained through the H-BP algorithm: transformation with scaling factor ½ over (a) Sphere (b) Cylinder (c) Torus.
Table 4
Comparison of training and testing performance of Chua's circuit.

Neuron Type in H     Conventional   CSU      CPU
Network Topology     1-4-1          1-2-1    1-1-1
Parameters           13             13       7
Training MSE         0.0074         0.0065   0.0062
Average Epochs       12,000         8000     5000
Testing MSE          0.0075         0.0048   0.0031
Error Variance       0.0038         0.0021   0.0010
Correlation          0.9839         0.9914   0.9952
AIC                  4.8843         5.5821   6.1022
PG                   5.72           6.87     7.03

… conventional neurons through the different statistical parameters (error variance, correlation, AIC and PG) in Table 4.

4.2.2. Lorenz's system as chaotic time series prediction in 3D

The chaotic behavior of the Lorenz system (Lorenz, 1963) is governed by the dynamics of three differential equations as follows:

dx/dt = r(y − x)
dy/dt = x(q − z) − y    (89)
dz/dt = xy − bz

where r, q and b are the parameters of the Lorenz system. This system has been used as a benchmark application for quaternionic-valued neural networks (Arena et al., 1996; Popa, 2016; Ujang et al., 2011). With r = 15, q = 28, and b = 8/3, the system of equations presented in Eq. (89) generates 6537 terms of the time series using the fourth-order Runge-Kutta method. Once the dynamics of the Lorenz system have become stationary, 500 terms of the time series are supplied for the training of the networks through the H-BP learning algorithm, and the rest of the terms are used for testing. The testing results for the prediction of this time series are analyzed through testing MSE, variance, correlation, PG, and AIC, as reported in Table 5. The training and testing results again reveal the superiority of the CSU and CPU neurons over the conventional neuron.
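The Lorenz series described above can be regenerated with a fourth-order Runge-Kutta integrator. This is our sketch; the step size dt = 0.01 and the initial state are assumptions, since the text does not give them:

```python
import numpy as np

def lorenz(state, r=15.0, q=28.0, b=8.0 / 3.0):
    # Eq. (89): dx/dt = r(y - x), dy/dt = x(q - z) - y, dz/dt = xy - bz
    x, y, z = state
    return np.array([r * (y - x), x * (q - z) - y, x * y - b * z])

def rk4_series(f, state, dt=0.01, n=6537):
    # Fourth-order Runge-Kutta, the integrator named in the text.
    out = np.empty((n, 3))
    for m in range(n):
        k1 = f(state)
        k2 = f(state + 0.5 * dt * k1)
        k3 = f(state + 0.5 * dt * k2)
        k4 = f(state + dt * k3)
        state = state + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        out[m] = state
    return out

series = rk4_series(lorenz, np.array([1.0, 1.0, 1.0]))
train, test = series[:500], series[500:]   # 500 terms for training, the rest for testing
```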
Fig. 6. The generalization of the CPU-neuron-based network trained through the H-BP algorithm: transformation with scaling factor ½ over (a) Sphere (b) Cylinder (c) Torus.
4.3. Time series prediction in 4D using quaternion

4.3.1. Linear autoregressive process with circular noise

The linear autoregressive process with circular noise has been considered an important benchmark problem, as addressed in Ujang et al. (2011) and Popa (2016), for predicting quaternionic-valued signals. The stable autoregressive filtering process is given as

O(n) = 1.79 O(n−1) − 1.85 O(n−2) + 1.27 O(n−3) − 0.41 O(n−4) + ϑ(n).    (90)

Table 5
Comparison of training and testing performance of Lorenz system.

Neuron Type in H     Conventional   CSU      CPU
Network Topology     1-4-1          1-2-1    1-1-1
Parameters           13             13       7
Training MSE         0.0015         0.0010   0.0008
Average Epochs       11,000         7000     4500
Testing MSE          0.0024         0.0018   0.0031
Error Variance       0.0016         0.0014   0.0009
Correlation          0.9871         0.9924   0.9977
AIC                  6.4503         6.8925   7.3117
PG                   8.13           8.75     9.17

Fig. 7. The generalization of the CSU-based network trained through the H-BP algorithm: transformations with scaling factor ½ and 0.3 unit translation in the positive z-direction, over (a) Sphere (b) Cylinder (c) Torus.

Fig. 8. The generalization of the CPU-neuron-based network trained through the H-BP algorithm: transformations with scaling factor ½ and 0.3 unit translation in the positive z-direction, over (a) Sphere (b) Cylinder (c) Torus.
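The autoregressive process of Eq. (90) can be simulated directly. In this sketch (ours), the circular noise ϑ(n) is modelled as quaternion-valued white noise with equal-power, uncorrelated components — an assumption about what "circular" means here:

```python
import numpy as np

def ar4_series(n, seed=0):
    """Eq. (90): O(n) = 1.79 O(n-1) - 1.85 O(n-2) + 1.27 O(n-3) - 0.41 O(n-4) + v(n),
    with v(n) as circular quaternion-valued white noise: four uncorrelated,
    equal-power Gaussian components (one column per quaternion component)."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=(n, 4))
    O = np.zeros((n, 4))
    for t in range(4, n):
        O[t] = (1.79 * O[t - 1] - 1.85 * O[t - 2]
                + 1.27 * O[t - 3] - 0.41 * O[t - 4] + noise[t])
    return O

series = ar4_series(2000)
```

Because the filter is stable, the simulated series settles into a stationary regime whose samples can then be split into training and testing segments as in the experiments above.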
4.3.2. Chaotic time series prediction for 4D Saito's circuit

Saito's circuit is an electronic circuit defined by the following system of four differential equations, which contains four variables (x1, y1, x2, y2) and five parameters (a1, b1, a2, b2, f):

[dx1/dt]   [−1      1     ] [x1 − f p1 h(z)        ]
[dy1/dt] = [−a1   a1 b1   ] [y1 − f (p1/b1) h(z)   ]
                                                        (91)
[dx2/dt]   [−1      1     ] [x2 − f p2 h(z)        ]
[dy2/dt] = [−a2   a2 b2   ] [y2 − f (p2/b2) h(z)   ]

where h(z) = −1 for z < 1 and h(z) = 1 for z ≥ 1 is the normalized hysteresis value, z = x1 + x2, p1 = (1 − b1)/b1 and p2 = (1 − b2)/b2. This system has also been considered a benchmark application in the field of quaternionic-valued neural networks (Arena et al., 1996; Popa, 2016). The chaotic behavior of the system depends on the values of its parameters (a1 = 7.5, b1 = 0.16, a2 = 15, b2 = 0.097, f = 1.3). The first 500 terms of the series generated by Eq. (91), once its dynamics is in the stationary state, are used for the training of the three different networks with the H-BP learning algorithm, and testing is performed on the next 1000 terms of the series. The networks with the proposed neurons (CSU and CPU) require a smaller network topology (fewer parameters) and drastically fewer training epochs than the network with conventional neurons at the hidden layer, as presented in Table 7. The testing results, through testing MSE, variance, correlation, PG, and AIC, for the prediction of the Saito time series are also reported in Table 7. The training and testing results again reveal the superiority of the CSU and CPU neurons over the conventional neuron.
Fig. 9. The generalization of the CSU-based network trained through the H-BP algorithm: transformations with scaling factor ½, 0.3 unit translation in the positive z-direction, and π/2 rad rotation around the unit vector (i); over (a) Sphere (b) Cylinder (c) Torus.
Table 7
Comparison of training and testing performance for 4D Saito's circuit.

Neuron Type in H     Conventional   CSU      CPU
Network Topology     1-4-1          1-2-1    1-1-1
Parameters           13             13       7
Training MSE         0.0026         0.0020   0.0017
Average Epochs       16,000         9100     6500
Testing MSE          0.0048         0.0039   0.0031
Error Variance       0.0025         0.0015   0.0011
Correlation          0.9622         0.9771   0.9829
AIC                  5.5101         6.0021   6.8420
PG                   6.31           7.23     7.94

5. Conclusion

In this paper, we have proposed two compensatory neurons in the quaternionic domain: the CSU (compensatory summation unit) and the CPU (compensatory product unit) for quaternionic-valued signals. The net potentials of the proposed neurons are defined by combinations of a linear (summation) and a nonlinear (radial basis) aggregation function in adjustable proportions. These proportions are the quaternionic compensatory parameters, which set the contributions of the summation and radial basis functions so as to take into account the vagueness involved. Using the proposed and conventional neurons, three quaternionic-valued feed-forward neural networks are constructed. The first network consists of conventional neurons at the hidden and output layers; the second consists of CSU neurons at the hidden layer and conventional neurons at the output layer; and the third consists of CPU neurons at the hidden layer and conventional neurons at the output layer. The H-BP (quaternionic back-propagation) learning algorithm is used for training all networks on a wide spectrum of benchmark problems. It is observed that the implementation of the proposed
Fig. 10. The generalization of the CPU-neuron-based network trained through the H-BP algorithm: transformations with scaling factor ½, 0.3 unit translation in the positive z-direction, and π/2 rad rotation around the unit vector (i); over (a) Sphere (b) Cylinder (c) Torus.
neurons instead of conventional neurons in the quaternionic domain improves the regression ability and offers faster convergence when solving a wide spectrum of benchmark problems. The proposed neuron models have higher functionality than conventional neurons in the quaternionic domain: they make it possible to solve problems using a smaller network with fewer learning parameters. The neural networks based on CPU neurons have shown better performance than those based on CSU neurons in terms of convergence speed, testing MSE (mean square error), variance, correlation, AIC (Akaike's information criterion), and PG (prediction gain). The H-BP algorithm trains the network with CSU or CPU neurons for different compositions of three transformations through a simple input–output mapping over a straight line in space. The trained network is capable of generalizing the considered transformation over complicated geometrical structures, such as the sphere, cylinder, and torus, that possess huge point-cloud data. Such simultaneous learning of the magnitude and phase of the intended components in space is only possible with a quaternionic-valued neural network. This is achieved because a quaternion flows through the network as a single unit of information: one number carries all four components of the high-dimensional data, and the phase information among them is embedded within the number. The neural networks with the proposed neuron models and their learning algorithms may be extended to the octonionic/sedenionic domains for solving eight- and sixteen-dimensional problems.

To convince prospective readers that the contribution of this paper is significant, the 3D (Chua's circuit and Lorenz system) and 4D (linear autoregressive process with circular noise and Saito's circuit) chaotic time series prediction problems are also taken up as benchmark applications. The learning system becomes much simpler with the employment of the proposed neurons, as each neuron accepts three- or four-dimensional data as a single number (a quaternion). The wide spectrum of experiments shows the significantly faster convergence of the network with CSU or CPU neurons trained through H-BP, which also improves the testing accuracy in terms of statistical parameters, viz. error variance, correlation, AIC, and PG. The drastic reduction in training epochs, followed by better generalization over typical nonlinear geometrical objects and time series, gives prospective researchers a new direction for exploiting the proposed approach in various complex engineering and scientific problems.
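As a rough illustration of how a single quaternionic neuron can compose a summation (linear) potential with a radial-basis (nonlinear) potential, the sketch below blends the two with a compensatory parameter `gamma`. The blending rule, parameter names, and split tanh activation are illustrative assumptions of ours, not the paper's exact CSU/CPU definitions:

```python
import numpy as np

def q_mult(a, b):
    """Hamilton product of quaternions represented as (w, x, y, z) arrays."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def neuron_output(inputs, weights, bias, centers, sigma, gamma):
    """Compensatory blend of summation and radial-basis potentials (illustrative).

    `gamma` is a hypothetical compensatory parameter: gamma = 1 recovers the
    pure quaternionic summation neuron.
    """
    # summation (linear) potential: quaternionic weighted sum plus bias
    v_sum = np.asarray(bias, dtype=float)
    for w, x in zip(weights, inputs):
        v_sum = v_sum + q_mult(np.asarray(w, float), np.asarray(x, float))
    # radial-basis (nonlinear) potential: exponentially decaying in the
    # squared distance of the quaternionic inputs from their centres
    d2 = sum(np.sum((np.asarray(x, float) - np.asarray(c, float)) ** 2)
             for x, c in zip(inputs, centers))
    v_rbf = np.exp(-d2 / (2.0 * sigma ** 2))
    # compensatory composition (hypothetical blending rule)
    v = gamma * v_sum + (1.0 - gamma) * v_rbf * v_sum
    return np.tanh(v)  # split-type activation applied component-wise
```

The key point the sketch conveys is that each input `x` is one quaternion, so a 4D sample flows through the neuron as a single number rather than four independent reals.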
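The testing statistics used above can be computed as follows. PG here is the standard prediction gain in decibels (signal variance over prediction-error variance), and the AIC variant shown, N·ln(MSE) + 2·(number of parameters), is one common regression form; both may differ in detail from the paper's exact definitions:

```python
import numpy as np

def prediction_gain(target, prediction):
    """PG in dB: ratio of target variance to prediction-error variance."""
    error = np.asarray(target, float) - np.asarray(prediction, float)
    return 10.0 * np.log10(np.var(target) / np.var(error))

def aic(target, prediction, n_params):
    """A common AIC form for regression models: N*ln(MSE) + 2*n_params."""
    target = np.asarray(target, float)
    mse = np.mean((target - np.asarray(prediction, float)) ** 2)
    return len(target) * np.log(mse) + 2 * n_params
```

For example, a predictor whose error variance is 1% of the signal variance scores a prediction gain of 20 dB.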
48 S. Kumar, B.K. Tripathi / Journal of Computational Design and Engineering 6 (2019) 33–48
Conflicts of interest

There is no conflict of interest related to this work.
Dr. Sushil Kumar received his PhD in Computational Intelligence from Harcourt Butler Technical University (HBTU) Kanpur, India, and completed his M.Tech at the Defence Institute of Advanced Technology, fully funded by the Defence Research and Development Organization (DIAT-DRDO), Pune, India. He is associated with the Nature-inspired Computational Intelligence Research Group (NCIRG) at HBTU. His areas of research include high-dimensional neurocomputing, computational intelligence, advancement in neural networks, machine learning, and computer vision focused on biometrics and 3D imaging. He has published several research papers in these areas.

Prof. Bipin Kumar Tripathi completed his PhD in Computational Intelligence at the Indian Institute of Technology (IIT) Kanpur, India, and his M.Tech in Computer Science and Engineering at IIT Delhi, India. Dr. Tripathi is currently serving as a Professor in the Department of Computer Science and Engineering of HBTU Kanpur, India. He leads the Nature-inspired Computational Intelligence Research Group (NCIRG) at HBTU. His areas of research include high-dimensional neurocomputing, computational neuroscience, intelligent system design, machine learning, and computer vision focused on biometrics and 3D imaging. He has published several research papers in these areas in many peer-reviewed journals, including IEEE Transactions, Elsevier, and Springer journals, and in other international conferences. He has also contributed book chapters to different international publications and holds a patent in his area. He continuously serves on the program committees of many international conferences and as a reviewer for several international journals.