
Journal of Computational Design and Engineering 6 (2019) 33–48


On the learning machine with compensatory aggregation based neurons in quaternionic domain

Sushil Kumar *, Bipin Kumar Tripathi
Department of Computer Science & Engineering, Harcourt Butler Technical University, Kanpur 208002, Uttar Pradesh, India
E-mail addresses: sushil0402k5@gmail.com (S. Kumar), abkt.iitk@gmail.com (B.K. Tripathi)
* Corresponding author.
https://doi.org/10.1016/j.jcde.2018.04.002

Article history:
Received 29 January 2018
Received in revised form 4 April 2018
Accepted 8 April 2018
Available online 16 April 2018

Keywords:
Quaternionic multi-layer perceptron
Quaternionic back-propagation
3D transformation
Time series prediction

Abstract

The nonlinear spatial grouping process of synapses is one of the fascinating methodologies by which neuro-computing researchers seek to capture the computational power of a neuron. Researchers generally use neuron models based on summation (linear), product (linear) or radial basis (nonlinear) aggregation of synapses to construct multi-layered feed-forward neural networks, but each of these neuron models and the corresponding networks has its own advantages and disadvantages. The multi-layered network is generally used for global approximation of an input-output mapping but sometimes gets stuck in local minima, while the nonlinear radial basis function (RBF) network, based on an exponentially decaying kernel, is used for local approximation of the input-output mapping. These advantages and disadvantages motivated the design of two new artificial neuron models based on compensatory aggregation functions in the quaternionic domain. The net internal potentials of these neuron models are built from compositions of the basic summation (linear) and radial basis (nonlinear) operations on quaternionic-valued input signals. Neuron models based on these aggregation functions ensure faster convergence, better training, and better prediction accuracy. The learning and generalization capabilities of these neurons are verified through various three-dimensional transformations and time series predictions as benchmark problems.

© 2018 Society for Computational Design and Engineering. Publishing Services by Elsevier. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

The advancement of neural networks is a fascinating and important field for neuro-computing researchers working in high-dimensional information processing to improve computational power. In this direction, the conventional real-valued neural network (RVNN) (McCulloch & Pitts, 1943) has been extended into the complex-valued neural network (CVNN) (Nitta, 1997, 2000; Leung & Haykin, 1991; Benvenuto & Piazza, 1992; Hirose, 2006; Tripathi, 2015), the 3D vector-valued neural network (3D-VVNN) (Nitta, 2006; Tripathi & Kalra, 2011a) and the quaternionic-valued neural network (QVNN) (Nitta, 1995; Arena, Caponetto, Fortuna, Muscato, & Xibilia, 1996; Arena, Fortuna, Muscato, & Xibilia, 1997; Matsui, Isokawa, Kusamichi, Peper, & Nishimura, 2004; Minemoto, Isokawa, Nishimura, & Matsui, 2017; Ujang, Took, & Mandic, 2011) for the adaptability and computational capability of complex-valued (C), 3D-vector and quaternionic-valued (H) signals, respectively. These extensions establish a simple and natural three-layered neural network structure that achieves computational power, learning power, faster convergence and excellent accuracy. In the complex domain, the extension has received much research interest due to its better learning, faster convergence, better generalization capability and inherent ability to learn two-dimensional motion, as compared with the RVNN (Nitta, 1997, 2000; Hirose, 2006; Tripathi, 2015; Chen & Li, 2005). Another direction of interest is the structural improvement of neuron models to achieve better computational intelligence. In Koch (1999) and Mel (1995) it is mentioned that the computational power of a neuron cell lies in the process of aggregation of synaptic weights with input signals in the cell body. In recent publications (Chaturvedi, Satsangi, & Kalra, 1999; Gupta & Homma, 2003; Tripathi & Kalra, 2011b), it has also been shown that nonlinear aggregation based neuron models have better computational and learning capability than the conventional linear aggregation based neurons. These studies motivated us to propose two new neuron models in the quaternionic domain that are based on the combination of linear and nonlinear aggregations among quaternionic-valued input signals.
The hyper-complex quaternion number $q = R(q) + I_1(q)i + I_2(q)j + I_3(q)k$, discovered by Hamilton (Hamilton, 1853), possesses four components, and the phase information along the different components is embedded within it (here $R$, $I_1$, $I_2$ and $I_3$ stand for the real and the three imaginary components, and $i$, $j$ and $k$ are the fundamental basis units of a quaternionic variable). This number system has been applied in theoretical and applied mathematics, especially for calculations involving three-dimensional rotations, such as in computer graphics, computer vision, crystallographic texture analysis, and mechanical machine design. Although the multiplication operation on quaternions does not follow the commutative property, intelligent mathematical formulation leads to its best use in high-dimensional neuro-computing. Neural networks with the quaternion as the unit of information flow efficiently learn and generalize in high dimensions with fewer neurons (Nitta, 1995; Arena et al., 1996, 1997; Minemoto et al., 2017). Further, the back-propagation learning algorithm in the quaternionic domain (H-BP) has been developed in the recent past (Nitta, 1995; Arena et al., 1996, 1997; Matsui et al., 2004; Minemoto et al., 2017) using the concept of error-gradient-descent optimization, but it suffers from the basic issues of that method, such as local minima and slow convergence. Thus, an investigation of fast and efficient learning algorithms is highly desirable to overcome the issues of the back-propagation algorithm.

In order to capture the nonlinear correlations among quaternionic input patterns, two new compensatory-type aggregation functions based neurons in the quaternionic domain are presented in this paper. The net potential of each neuron is formulated by the weighted contributions of summation and radial basis functions. The multi-layered perceptron (MLP) with summing neurons is used for global approximation of the input-output mapping, but slow convergence and getting trapped in bad local minima are its two main drawbacks. On the other hand, the radial basis function (RBF) network is used for local approximation of the input-output mapping and offers faster and more efficient learning, but it is inefficient in the case of constant-valued function approximation (Lee, Chung, Tsai, & Chang, 1999). The advantages and disadvantages of MLP and RBF networks motivated the design of compensatory summation unit (CSU) and compensatory product unit (CPU) aggregation functions for quaternionic-valued signals. The CSU aggregation is based on the compensatory summation of the conventional and RBF functions, while the CPU adds the product of the compensatory conventional and RBF terms to the CSU aggregation function. Various benchmark problems confirm the high functionality of the proposed neuron models based on compensatory aggregation functions in the quaternionic domain. These neurons are more complicated in nature due to the additional parameters used, but they outperform when faster convergence and better accuracy are required.

The rest of this paper is organized as follows: Section 2 mainly focuses on the two proposed compensatory neurons (CSU and CPU) in the quaternionic domain and their architectures. Section 3 presents the quaternionic back-propagation (H-BP) learning algorithms for the networks with conventional (summating), CSU and CPU neurons. In Section 4, the learning of various transformations is governed through a line and the subsequent generalization ability is confirmed on complicated geometric structures. Various chaotic time series prediction problems are also taken into account in this section to demonstrate its applicability in high-dimensional applications. The final conclusion and future scope of the proposed work are described in Section 5.

2. Artificial neuron model

For quaternionic-valued input signals, the aggregation operations (V) of the two new neuron models in the quaternionic domain are defined by combinations of a linear (summation) and a nonlinear (radial basis) aggregation function with their proportions (S : R), where S and R are quaternionic-valued compensatory parameters that categorize the contributions of the summation and radial basis functions so as to take into account the vagueness involved. With a view to achieving robust aggregation, both parameters are themselves adapted during training.

Let $q_1, q_2, \ldots, q_L$ be the quaternionic-valued input signals, where L denotes the number of inputs, Y be the output and V be the net potential of a neuron in quaternions. Let $f_H$ be the quaternionic-valued activation function, where H denotes the set of quaternion numbers. Let $w^s_{1m}, w^s_{2m}, \ldots, w^s_{Lm}$ be the quaternionic weights from the inputs to the summation (s) term of the m-th conventional, CSU (Compensatory Summation Unit) (Fig. 1) or CPU (Compensatory Product Unit) (Fig. 2) aggregation based neuron, and $w^{RB}_{1m}, w^{RB}_{2m}, \ldots, w^{RB}_{Lm}$ be the other quaternionic weights from the inputs to the radial basis (RB) term of the m-th CSU or CPU neuron; m = 1, 2, ..., M, where M is the number of conventional, CSU or CPU neurons in the hidden layer of a network. Let $w_0$ and $q_0 = 1 + i + j + k$ be the bias weight and its input in quaternion form respectively, where $i$, $j$ and $k$ are the fundamental basis units of a quaternion, denoted by italic bold type letters throughout the paper. These units satisfy the Hamilton rules $i \otimes i = j \otimes j = k \otimes k = i \otimes j \otimes k = -1$, $i \otimes j = -j \otimes i = k$, $j \otimes k = -k \otimes j = i$ and $k \otimes i = -i \otimes k = j$ (Hamilton, 1853), where the $\otimes$ symbol denotes quaternionic multiplication, which does not satisfy the commutative property (e.g. $a \otimes b \neq b \otimes a$, where a and b are two arbitrary quaternionic variables).

2.1. Conventional neuron model in quaternionic domain

The conventional neuron model is very well known for real-valued signals (McCulloch & Pitts, 1943) and for the complex-valued (Benvenuto & Piazza, 1992; Hirose, 2006; Leung & Haykin, 1991; Nitta, 1997; Nitta, 2000; Tripathi, 2015) and quaternionic domains (Arena et al., 1996; Matsui et al., 2004; Nitta, 1995) in three-layered neural networks. In all three domains, the internal potential of a conventional neuron is based on the sum of the products of the corresponding input-weight pairs. The net potential ($V_m$) of the m-th conventional neuron with a bias unit in the quaternionic domain is expressed as

$V_m = V^C_m + w_{0m} \otimes q_0$,   (1)

where $V^C_m = \sum_{l=1}^{L} w^s_{lm} \otimes q_l$.

The output of the conventional neuron ($Y_m$) is expressed as

$Y_m = f_H(V_m)$.   (2)
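To make the quaternionic operations used above concrete, the following minimal Python/NumPy sketch (not part of the original paper; names such as qmul, qconj and conventional_potential are illustrative assumptions) implements the Hamilton product, the conjugate and the conventional net potential of Eq. (1), with a quaternion stored as the array [R, I1, I2, I3].

```python
import numpy as np

def qmul(a, b):
    """Hamilton product a (x) b of two quaternions [R, I1, I2, I3]; non-commutative."""
    r1, x1, y1, z1 = a
    r2, x2, y2, z2 = b
    return np.array([
        r1*r2 - x1*x2 - y1*y2 - z1*z2,
        r1*x2 + x1*r2 + y1*z2 - z1*y2,
        r1*y2 + y1*r2 + z1*x2 - x1*z2,
        r1*z2 + z1*r2 + x1*y2 - y1*x2,
    ])

def qconj(a):
    """Quaternion conjugate: negate the three imaginary components."""
    return a * np.array([1.0, -1.0, -1.0, -1.0])

def summation_term(ws, q):
    """V_m^C = sum_l w_lm^s (x) q_l over the L input/weight pairs."""
    V = np.zeros(4)
    for w_l, q_l in zip(ws, q):
        V = V + qmul(w_l, q_l)
    return V

def conventional_potential(ws, q, w0, q0):
    """Eq. (1): V_m = V_m^C + w_0m (x) q_0."""
    return summation_term(ws, q) + qmul(w0, q0)
```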
2.2. CSU neuron model in quaternionic domain

The aggregation function of the compensatory summation unit (CSU) neuron ($V^r_m$) is defined by the summation of the conventional ($V^C_m = \sum_{l=1}^{L} w^s_{lm} \otimes q_l$) and radial basis ($V^{RB}_m = e^{-\sum_{l=1}^{L}\|q_l - w^{RB}_{lm}\|^2}$) aggregation functions with their contributions ($S_m$ and $R_m$) for quaternionic-valued signals ($q_l$), where $S_m$ and $R_m$ are the compensatory parameters of the m-th neuron in the quaternionic domain that categorize the contributions of the conventional summation and radial basis functions. The net potential ($V^r_m$) of the m-th CSU neuron with a bias unit (Fig. 1) in the quaternionic domain is defined as

$V^r_m = V^{r1}_m + V^{r2}_m + w_{0m} \otimes q_0$   (3)

Fig. 1. CSU neuron model in quaternionic domain.

Fig. 2. CPU neuron model in quaternionic domain.

where $V^{r1}_m = S_m \otimes V^C_m$ and $V^{r2}_m = R_m V^{RB}_m$.

The output of the CSU neuron ($Y^r_m$) is defined as

$Y^r_m = f_H(V^r_m)$.   (4)

2.3. CPU neuron model in quaternionic domain

The aggregation function of the compensatory product unit (CPU) neuron ($V^p_m$) is defined by including the product of the conventional ($V^C_m = \sum_{l=1}^{L} w^s_{lm} \otimes q_l$) and radial basis ($V^{RB}_m = e^{-\sum_{l=1}^{L}\|q_l - w^{RB}_{lm}\|^2}$) aggregation functions, with their contributions ($S_m$ and $R_m$), in the aggregation of the CSU neuron for quaternionic-valued signals ($q_l$), where $S_m$ and $R_m$ are the compensatory parameters of the m-th neuron in the quaternionic domain that categorize the contributions of the conventional summation and radial basis functions. The net potential ($V^p_m$) of the m-th CPU neuron with a bias unit (Fig. 2) in the quaternionic domain is defined as

$V^p_m = V^{p1}_m + V^{p2}_m + V^{p3}_m + w_{0m} \otimes q_0$   (5)

where $V^{p1}_m = S_m \otimes V^C_m$, $V^{p2}_m = R_m V^{RB}_m$ and $V^{p3}_m = V^{p1}_m \otimes V^{p2}_m$.

The output of the CPU neuron ($Y^p_m$) is defined as

$Y^p_m = f_H(V^p_m)$.   (6)

2.4. Structure of multi-layered neural networks with CSU or CPU neurons

In this paper, the structure of a three-layered (L × M × N) network in the quaternionic domain is constructed with the proposed CSU or CPU neurons in the hidden layer and conventional neurons in the output layer, as shown in Fig. 3, where L denotes the number of quaternionic input signals and M and N denote the numbers of hidden and output neurons respectively. The number of parameters of the three-layered (L × M × N) network is computed as M(2L + 2 + 1) + N(M + 1), where 2L (weights associated with the conventional and radial basis aggregations for L inputs) + 2 (for the compensatory parameters) + 1 (for the bias) and M + 1 (for the bias) are the numbers of parameters for a single CSU/CPU (Fig. 1 or Fig. 2) and a conventional neuron respectively.
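As an illustration of how the two aggregations combine the summation and radial basis terms, the sketch below (an assumption-based illustration, not the authors' code; it reuses qmul and summation_term from the earlier sketch) computes the CSU net potential of Eq. (3) and the CPU net potential of Eq. (5).

```python
import numpy as np

def rbf_term(w_rb, q):
    """V_m^RB = exp(-sum_l ||q_l - w_lm^RB||^2), a real-valued scalar."""
    return float(np.exp(-sum(np.sum((q_l - w_l) ** 2) for w_l, q_l in zip(w_rb, q))))

def csu_potential(S, R, ws, w_rb, q, w0, q0):
    """Eq. (3): V_m^r = S_m (x) V_m^C + R_m * V_m^RB + w_0m (x) q_0."""
    V_C = summation_term(ws, q)
    V_r1 = qmul(S, V_C)
    V_r2 = R * rbf_term(w_rb, q)
    return V_r1 + V_r2 + qmul(w0, q0)

def cpu_potential(S, R, ws, w_rb, q, w0, q0):
    """Eq. (5): V_m^p adds the product term V_m^p1 (x) V_m^p2 to the CSU aggregation."""
    V_C = summation_term(ws, q)
    V_p1 = qmul(S, V_C)
    V_p2 = R * rbf_term(w_rb, q)
    V_p3 = qmul(V_p1, V_p2)          # compensatory product term
    return V_p1 + V_p2 + V_p3 + qmul(w0, q0)
```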

3. Learning rule

Various activation functions for quaternionic-valued neurons have been proposed in the literature. The split-type quaternionic-valued activation function for a neuron is addressed in Nitta (1995), Arena et al. (1996), Minemoto et al. (2017), Ujang et al. (2011), Kumar and Tripathi (2017), Kumar and Tripathi (2018) and Kumar and Tripathi (in press), but this activation function is nonregular,

though bounded in nature, because the Cauchy-Riemann condition does not hold for it (Tripathi, 2015). For the proposed neurons, an activation function $f_H(\cdot)$ of a quaternionic variable is given as a 4-D extension of a real activation function $f(\cdot)$. The function $f(\cdot)$ is applied to each real component of the net potential. This extension has been motivated by the split-type complex-valued activation function $f_C(\cdot)$, which is a 2-D extension of a linear or nonlinear real activation function (Nitta, 1997, 2000; Hirose, 2006; Tripathi & Kalra, 2011b; Tripathi, 2017). In this direction, a fully complex-valued activation function has also been investigated and addressed in Kim and Adali (2003). There are various linear or nonlinear real activation functions given in the literature, but the nonlinear real-valued hyperbolic-tangent and identity functions are considered here for the hidden and output neurons respectively. However, the real activation function is specified in general form and its choice depends on the intended application.

Fig. 3. MLP with CSU/CPU neurons in the hidden layer.

Let $V = R(V) + I_1(V)i + I_2(V)j + I_3(V)k$ be the net internal potential of a neuron in the quaternionic domain; then the output of a neuron is defined as

$f_H(V) = f(R(V)) + f(I_1(V))i + f(I_2(V))j + f(I_3(V))k$.   (7)
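A small sketch of the split-type activation of Eq. (7) and its component-wise derivative (used later in Eq. (10)) is given below; it assumes tanh for hidden neurons and the identity for output neurons, as stated in the text, and is not taken from the paper.

```python
import numpy as np

def f_H(V, f=np.tanh):
    """Eq. (7): apply the real activation f to each of the four quaternion components."""
    return f(V)

def f_H_prime(V, f=np.tanh):
    """Component-wise derivative of the split activation; for tanh, f'(x) = 1 - tanh(x)^2."""
    if f is np.tanh:
        return 1.0 - np.tanh(V) ** 2
    return np.ones_like(V)          # identity activation (output layer)
```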
A three-layered (L × M × N) network structure may be constructed with conventional, CSU or CPU neurons in the hidden layer and conventional neurons in the output layer of the network. Let $q_1, q_2, \ldots, q_L$ be the input signals, $w^s_{1m}, w^s_{2m}, \ldots, w^s_{Lm}$ be the weights from the inputs to the summation term of the m-th conventional, compensatory summation unit (CSU) or compensatory product unit (CPU) neuron, and $w^{RB}_{1m}, w^{RB}_{2m}, \ldots, w^{RB}_{Lm}$ be the other weights from the inputs to the radial basis term of the m-th CSU or CPU neuron; m = 1, 2, ..., M. Let Y be the output and V the net potential of a neuron in the quaternionic domain, and let S and R be the quaternionic-valued compensatory parameters that categorize the contributions of the summation and radial basis aggregation functions. Let $w_{1n}, w_{2n}, \ldots, w_{Mn}$ be the weights from the conventional, CSU or CPU neurons to the n-th conventional output neuron. Let $w_0$ and $q_0$ be the bias weight and its input in quaternion form respectively. The $\odot$ symbol represents the Hadamard product (Million, 2007). The output of the m-th conventional, CSU or CPU quaternionic neuron in the hidden layer of the network can be expressed as

$Y_m = f_H(V_m)$.   (8)

The output of the n-th conventional neuron in the output layer can be expressed as

$Y_n = f_H(V_n)$.   (9)

The derivative of the quaternionic-valued activation function ($f_H$) is expressed as

$f'_H(V_n) = f'(R(V_n)) + f'(I_1(V_n))i + f'(I_2(V_n))j + f'(I_3(V_n))k$,   (10)

where $f'(\cdot)$ denotes the derivative of $f(\cdot)$.

The gradient-descent-based error back-propagation learning scheme for feed-forward neural networks has been extended into the quaternionic domain (Arena et al., 1996; Minemoto et al., 2017; Nitta, 1995). Let $D_n$ be the desired output; then the output error $e_n$ is expressed as the difference between the desired ($D_n$) and actual ($Y_n$) outputs of the n-th output neuron:

$e_n = D_n - Y_n = (R(D_n) - R(Y_n)) + (I_1(D_n) - I_1(Y_n))i + (I_2(D_n) - I_2(Y_n))j + (I_3(D_n) - I_3(Y_n))k = R(e_n) + I_1(e_n)i + I_2(e_n)j + I_3(e_n)k$.   (11)

The weight update formula can be derived by minimizing the real-valued mean square error (MSE) function, which is presented as

$E = \frac{1}{2N}\sum_{n=1}^{N} e_n \odot e_n = \frac{1}{2N}\sum_{n=1}^{N}\left\{(R(e_n))^2 + \sum_{k=1}^{3}(I_k(e_n))^2\right\}$.   (12)

The error function is minimized by recursively altering the weight coefficients as

$w^{new} = w^{old} + \Delta w = w^{old} - \eta\nabla_w E$,   (13)

where $\nabla_w E$ denotes the quaternionic gradient of the error function (E), computed from the partial derivatives with respect to the real and the three imaginary components of the quaternionic weight (w):

$\nabla_w E = \frac{\partial E}{\partial R(w)} + \frac{\partial E}{\partial I_1(w)}i + \frac{\partial E}{\partial I_2(w)}j + \frac{\partial E}{\partial I_3(w)}k$.   (14)

However, an efficient and accurate gradient of this error function may also be computed with the automatic differentiation (AD) technique instead of chain-rule derivation (Neidinger, 2010). This technique evaluates the derivative of a function automatically, no matter how complicated the function is. Newton's method for solving nonlinear equations, optimization utilizing gradients/Hessians, inverse problems, neural networks and solving stiff ordinary differential equations are some applications of AD given in the literature.

The weight update ($\Delta w$) is proportional to the negative quaternionic gradient of the cost function with respect to the quaternionic weight:

$\Delta w = -\eta\nabla_w E = -\eta\left(\frac{\partial E}{\partial R(w)} + \frac{\partial E}{\partial I_1(w)}i + \frac{\partial E}{\partial I_2(w)}j + \frac{\partial E}{\partial I_3(w)}k\right)$.   (15)

For the weight ($w = w_{mn}$) that connects the m-th hidden neuron to the n-th output neuron,

$\frac{\partial E}{\partial R(w_{mn})} = -\frac{1}{N}\left\{R(e_n)f'(R(V_n))\frac{\partial R(V_n)}{\partial R(w_{mn})} + \sum_{k=1}^{3} I_k(e_n)f'(I_k(V_n))\frac{\partial I_k(V_n)}{\partial R(w_{mn})}\right\}$.   (16)
$\frac{\partial E}{\partial I_t(w_{mn})} = -\frac{1}{N}\left\{R(e_n)f'(R(V_n))\frac{\partial R(V_n)}{\partial I_t(w_{mn})} + \sum_{k=1}^{3} I_k(e_n)f'(I_k(V_n))\frac{\partial I_k(V_n)}{\partial I_t(w_{mn})}\right\}$, for $t = 1, 2, 3$.   (17)

Now substituting Eqs. (16) and (17) in Eq. (15), we get after simplification

$\Delta w_{mn} = \frac{\eta}{N}\left\{R(e_n)f'(R(V_n))\nabla_{w_{mn}}(R(V_n)) + \sum_{k=1}^{3} I_k(e_n)f'(I_k(V_n))\nabla_{w_{mn}}(I_k(V_n))\right\}$.   (18)

For the weight ($w = w_{lm}$) that connects the l-th input to the m-th hidden neuron, the weight update is derived conceptually from Eq. (18) as

$\Delta w_{lm} = \frac{\eta}{N}\sum_{n=1}^{N}\left\{R(e_n)f'(R(V_n))\nabla_{w_{lm}}(R(V_n)) + \sum_{k=1}^{3} I_k(e_n)f'(I_k(V_n))\nabla_{w_{lm}}(I_k(V_n))\right\}$.   (19)

For all four components of the net potential ($R(V_n)$, $I_1(V_n)$, $I_2(V_n)$ and $I_3(V_n)$), the quaternionic gradients with respect to $w_{lm}$ are derived using the chain rule of derivation and presented as

$\nabla_{w_{lm}} R(V_n) = R(w_{mn})f'(R(V_m))\nabla_{w_{lm}}R(V_m) - I_1(w_{mn})f'(I_1(V_m))\nabla_{w_{lm}}I_1(V_m) - I_2(w_{mn})f'(I_2(V_m))\nabla_{w_{lm}}I_2(V_m) - I_3(w_{mn})f'(I_3(V_m))\nabla_{w_{lm}}I_3(V_m)$.   (20)

$\nabla_{w_{lm}} I_1(V_n) = R(w_{mn})f'(I_1(V_m))\nabla_{w_{lm}}I_1(V_m) + I_1(w_{mn})f'(R(V_m))\nabla_{w_{lm}}R(V_m) + I_2(w_{mn})f'(I_3(V_m))\nabla_{w_{lm}}I_3(V_m) - I_3(w_{mn})f'(I_2(V_m))\nabla_{w_{lm}}I_2(V_m)$.   (21)

$\nabla_{w_{lm}} I_2(V_n) = R(w_{mn})f'(I_2(V_m))\nabla_{w_{lm}}I_2(V_m) - I_1(w_{mn})f'(I_3(V_m))\nabla_{w_{lm}}I_3(V_m) + I_2(w_{mn})f'(R(V_m))\nabla_{w_{lm}}R(V_m) + I_3(w_{mn})f'(I_1(V_m))\nabla_{w_{lm}}I_1(V_m)$.   (22)

$\nabla_{w_{lm}} I_3(V_n) = R(w_{mn})f'(I_3(V_m))\nabla_{w_{lm}}I_3(V_m) + I_1(w_{mn})f'(I_2(V_m))\nabla_{w_{lm}}I_2(V_m) - I_2(w_{mn})f'(I_1(V_m))\nabla_{w_{lm}}I_1(V_m) + I_3(w_{mn})f'(R(V_m))\nabla_{w_{lm}}R(V_m)$.   (23)

Now substituting Eqs. (20)-(23) in Eq. (19), we get after quaternionic-gradient-wise simplification

$\Delta w_{lm} = \frac{\eta}{N}\left\{R(\zeta_m)\nabla_{w_{lm}}R(V_m) + \sum_{k=1}^{3} I_k(\zeta_m)\nabla_{w_{lm}}I_k(V_m)\right\}$,   (24)

where

$R(\zeta_m) = f'(R(V_m))\sum_{n=1}^{N}R(\delta_{mn})$,   (25)

$I_t(\zeta_m) = f'(I_t(V_m))\sum_{n=1}^{N}I_t(\delta_{mn})$, for $t = 1, 2, 3$,   (26)

$\delta_{mn} = \bar{w}_{mn} \otimes (e_n \odot f'_H(V_n))$,   (27)

and the overbar denotes the quaternionic conjugate.

3.1. Weight update rules for the network with quaternionic-valued conventional neurons

The quaternionic-valued conventional neurons are employed in the hidden and output layers of the three-layered architecture. The weight update rule of the hidden-output weight pair for this architecture can be determined from Eq. (18) after substituting the quaternionic gradients of all parts of the net internal potential ($V_n = \sum_{m=1}^{M} w_{mn} \otimes Y_m + w_{0n} \otimes q_0$). These gradients are computed as:

$\nabla_{w_{mn}} R(V_n) = \bar{Y}_m$.   (28)

$\nabla_{w_{mn}} I_1(V_n) = i \otimes \bar{Y}_m$.   (29)

$\nabla_{w_{mn}} I_2(V_n) = j \otimes \bar{Y}_m$.   (30)

$\nabla_{w_{mn}} I_3(V_n) = k \otimes \bar{Y}_m$.   (31)

Now substituting Eqs. (28)-(31) in Eq. (18) for $\Delta w_{mn}$ and $\Delta w_{0n}$, we get after simplification

$\Delta w_{mn} = \frac{\eta}{N}(e_n \odot f'_H(V_n)) \otimes \bar{Y}_m$.   (32)

$\Delta w_{0n} = \frac{\eta}{N}(e_n \odot f'_H(V_n)) \otimes \bar{q}_0$.   (33)

The weight update rule of the input-hidden pair is derived from Eq. (24) with the supporting Eqs. (25)-(27) on substituting the quaternionic gradients of all parts of the net internal potential ($V_m = \sum_{l=1}^{L} w_{lm} \otimes q_l + w_{0m} \otimes q_0$). These gradients are computed as:

$\nabla_{w_{lm}} R(V_m) = \bar{q}_l$.   (34)

$\nabla_{w_{lm}} I_1(V_m) = i \otimes \bar{q}_l$.   (35)

$\nabla_{w_{lm}} I_2(V_m) = j \otimes \bar{q}_l$.   (36)

$\nabla_{w_{lm}} I_3(V_m) = k \otimes \bar{q}_l$.   (37)

Now substituting Eqs. (34)-(37) in Eq. (24), we get after simplification

$\Delta w_{lm} = \frac{\eta}{N}\zeta_m \otimes \bar{q}_l$.   (38)

$\Delta w_{0m} = \frac{\eta}{N}\zeta_m \otimes \bar{q}_0$.   (39)

The weight update rule is thus governed by Eqs. (32) and (33) for the hidden-output pair and by Eqs. (38) and (39) for the input-hidden pair of the network with conventional neurons in the quaternionic domain.
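For readability, the sketch below (not the authors' code; it reuses qmul, qconj and f_H_prime from the earlier sketches, and stores the hidden-output weights in a dictionary keyed by (m, n)) applies the update rules above for the conventional quaternionic network, with the overbars implemented as quaternion conjugation. Here eta is the learning rate and N is the number of output neurons.

```python
import numpy as np

def output_layer_updates(eta, N, e, V_out, Y_hidden, q0):
    """Eqs. (32)-(33): updates for every hidden (m) / output (n) weight and output bias."""
    dW, dW0 = {}, {}
    for n, (e_n, V_n) in enumerate(zip(e, V_out)):
        grad_n = e_n * f_H_prime(V_n)                            # e_n (Hadamard) f'_H(V_n)
        for m, Y_m in enumerate(Y_hidden):
            dW[(m, n)] = (eta / N) * qmul(grad_n, qconj(Y_m))    # Eq. (32)
        dW0[n] = (eta / N) * qmul(grad_n, qconj(q0))             # Eq. (33)
    return dW, dW0

def hidden_layer_updates(eta, N, e, V_out, W_out, V_hidden, q_in, q0):
    """Eqs. (38)-(39) via the back-propagated term zeta_m of Eqs. (25)-(27)."""
    dW, dW0 = {}, {}
    for m, V_m in enumerate(V_hidden):
        delta_sum = np.zeros(4)
        for n, (e_n, V_n) in enumerate(zip(e, V_out)):
            delta_sum = delta_sum + qmul(qconj(W_out[(m, n)]), e_n * f_H_prime(V_n))  # Eq. (27)
        zeta_m = f_H_prime(V_m) * delta_sum                      # Eqs. (25)-(26), component-wise
        for l, q_l in enumerate(q_in):
            dW[(l, m)] = (eta / N) * qmul(zeta_m, qconj(q_l))    # Eq. (38)
        dW0[m] = (eta / N) * qmul(zeta_m, qconj(q0))             # Eq. (39)
    return dW, dW0
```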
3.2. Weight update rules for the network with proposed quaternionic-valued CSU neurons in hidden layer

In a three-layered network, the proposed CSU and conventional-type neurons in the quaternionic domain are employed in the hidden and output layer respectively (Fig. 3). The weight update rules for the hidden-output neuron pairs are the same as given in Eqs. (32) and (33), because conventional neurons are used in the output layer of the network. The weight update rule of the input-hidden pair is derived from Eq. (24) with the supporting Eqs. (25), (26) and (27) on substituting the quaternionic gradients of all parts of the net internal potential ($V_m = V^r_m$ as given in Eq. (3)) with respect to each corresponding weight. The quaternionic gradients of $R(V^r_m)$, $I_1(V^r_m)$, $I_2(V^r_m)$ and $I_3(V^r_m)$ with respect to $w_{0m}$ are calculated as:
$\nabla_{w_{0m}} R(V^r_m) = \nabla_{w_{0m}} R(w_{0m} \otimes q_0) = \bar{q}_0$.   (40)

$\nabla_{w_{0m}} I_1(V^r_m) = \nabla_{w_{0m}} I_1(w_{0m} \otimes q_0) = i \otimes \bar{q}_0$.   (41)

$\nabla_{w_{0m}} I_2(V^r_m) = \nabla_{w_{0m}} I_2(w_{0m} \otimes q_0) = j \otimes \bar{q}_0$.   (42)

$\nabla_{w_{0m}} I_3(V^r_m) = \nabla_{w_{0m}} I_3(w_{0m} \otimes q_0) = k \otimes \bar{q}_0$.   (43)

Now substituting Eqs. (40)-(43) in Eq. (24), we get after simplification

$\Delta w_{0m} = \frac{\eta}{N}\zeta^r_m \otimes \bar{q}_0$,   (44)

where $\zeta^r_m$ denotes the back-propagated term of Eqs. (25)-(27) evaluated with $V_m = V^r_m$.

The quaternionic gradients of $R(V^r_m)$, $I_1(V^r_m)$, $I_2(V^r_m)$ and $I_3(V^r_m)$ with respect to $w^s_{lm}$ are calculated as:

$\nabla_{w^s_{lm}} R(V^r_m) = \nabla_{w^s_{lm}} R(V^{r1}_m) = \bar{S}_m \otimes \bar{q}_l$.   (45)

$\nabla_{w^s_{lm}} I_1(V^r_m) = \nabla_{w^s_{lm}} I_1(V^{r1}_m) = \bar{S}_m \otimes i \otimes \bar{q}_l$.   (46)

$\nabla_{w^s_{lm}} I_2(V^r_m) = \nabla_{w^s_{lm}} I_2(V^{r1}_m) = \bar{S}_m \otimes j \otimes \bar{q}_l$.   (47)

$\nabla_{w^s_{lm}} I_3(V^r_m) = \nabla_{w^s_{lm}} I_3(V^{r1}_m) = \bar{S}_m \otimes k \otimes \bar{q}_l$.   (48)

Now substituting Eqs. (45)-(48) in Eq. (24), we get after simplification

$\Delta w^s_{lm} = \frac{\eta}{N}\bar{S}_m \otimes \zeta^r_m \otimes \bar{q}_l$.   (49)

The quaternionic gradients of $R(V^r_m)$, $I_1(V^r_m)$, $I_2(V^r_m)$ and $I_3(V^r_m)$ with respect to $S_m$ are calculated as:

$\nabla_{S_m} R(V^r_m) = \nabla_{S_m} R(V^{r1}_m) = \bar{V}^C_m$.   (50)

$\nabla_{S_m} I_1(V^r_m) = \nabla_{S_m} I_1(V^{r1}_m) = i \otimes \bar{V}^C_m$.   (51)

$\nabla_{S_m} I_2(V^r_m) = \nabla_{S_m} I_2(V^{r1}_m) = j \otimes \bar{V}^C_m$.   (52)

$\nabla_{S_m} I_3(V^r_m) = \nabla_{S_m} I_3(V^{r1}_m) = k \otimes \bar{V}^C_m$.   (53)

Now substituting Eqs. (50)-(53) in Eq. (24) for $S_m$, we get after simplification

$\Delta S_m = \frac{\eta}{N}\zeta^r_m \otimes \bar{V}^C_m$.   (54)

The quaternionic gradients of $R(V^r_m)$, $I_1(V^r_m)$, $I_2(V^r_m)$ and $I_3(V^r_m)$ with respect to $w^{RB}_{lm}$ are calculated as:

$\nabla_{w^{RB}_{lm}} R(V^r_m) = \nabla_{w^{RB}_{lm}} R(V^{r2}_m) = 2R(R_m)V^{RB}_m(q_l - w^{RB}_{lm})$.   (55)

$\nabla_{w^{RB}_{lm}} I_t(V^r_m) = \nabla_{w^{RB}_{lm}} I_t(V^{r2}_m) = 2I_t(R_m)V^{RB}_m(q_l - w^{RB}_{lm})$, for $t = 1, 2, 3$.   (56)

Now again substituting Eqs. (55) and (56) in Eq. (24), we get after simplification

$\Delta w^{RB}_{lm} = \frac{2\eta}{N}V^{RB}_m\left\{R(\zeta^r_m)R(R_m) + \sum_{k=1}^{3} I_k(\zeta^r_m)I_k(R_m)\right\}(q_l - w^{RB}_{lm})$.   (57)

The quaternionic gradients of $R(V^r_m)$, $I_1(V^r_m)$, $I_2(V^r_m)$ and $I_3(V^r_m)$ with respect to $R_m$ are calculated as:

$\nabla_{R_m} R(V^r_m) = \nabla_{R_m} R(V^{r2}_m) = V^{RB}_m$.   (58)

$\nabla_{R_m} I_1(V^r_m) = \nabla_{R_m} I_1(V^{r2}_m) = V^{RB}_m\, i$.   (59)

$\nabla_{R_m} I_2(V^r_m) = \nabla_{R_m} I_2(V^{r2}_m) = V^{RB}_m\, j$.   (60)

$\nabla_{R_m} I_3(V^r_m) = \nabla_{R_m} I_3(V^{r2}_m) = V^{RB}_m\, k$.   (61)

Now substituting Eqs. (58)-(61) in Eq. (24) for $R_m$, we get after simplification

$\Delta R_m = \frac{\eta}{N}V^{RB}_m\zeta^r_m$.   (62)

The weight update rule is thus governed by Eqs. (32) and (33) for the hidden-output pair and by Eqs. (44), (49), (54), (57) and (62) for the input-hidden pair of the network with conventional neurons at the output layer and CSU neurons at the hidden layer in the quaternionic domain.

3.3. Weight update rules for the network with proposed quaternionic-valued CPU neurons in hidden layer

In a three-layered network, the proposed CPU and conventional-type neurons for quaternionic-valued signals are employed in the hidden and output layers respectively (Fig. 3). Due to the conventional neurons used in the output layer of the network, the weight update rules for the hidden-output neuron pairs are the same as given in Eqs. (32) and (33). The weight update rule of the input-hidden pair is derived from Eq. (24) with the supporting Eqs. (25), (26) and (27) on substituting the quaternionic gradients of all parts of the net internal potential ($V_m = V^p_m$) with respect to the corresponding weights. The quaternionic gradients of $R(V^p_m)$, $I_1(V^p_m)$, $I_2(V^p_m)$ and $I_3(V^p_m)$ with respect to $w_{0m}$ are calculated as:

$\nabla_{w_{0m}} R(V^p_m) = \nabla_{w_{0m}} R(w_{0m} \otimes q_0) = \bar{q}_0$.   (63)

$\nabla_{w_{0m}} I_1(V^p_m) = \nabla_{w_{0m}} I_1(w_{0m} \otimes q_0) = i \otimes \bar{q}_0$.   (64)

$\nabla_{w_{0m}} I_2(V^p_m) = \nabla_{w_{0m}} I_2(w_{0m} \otimes q_0) = j \otimes \bar{q}_0$.   (65)

$\nabla_{w_{0m}} I_3(V^p_m) = \nabla_{w_{0m}} I_3(w_{0m} \otimes q_0) = k \otimes \bar{q}_0$.   (66)

Now substituting Eqs. (63)-(66) in Eq. (24) for $w_{0m}$,

$\Delta w_{0m} = \frac{\eta}{N}\zeta^p_m \otimes \bar{q}_0$,   (67)

where $\zeta^p_m$ denotes the back-propagated term of Eqs. (25)-(27) evaluated with $V_m = V^p_m$.

The quaternionic gradients of $R(V^p_m)$, $I_1(V^p_m)$, $I_2(V^p_m)$ and $I_3(V^p_m)$ with respect to $w^s_{lm}$ are derived using the chain rule of derivation as:

$\nabla_{w^s_{lm}} R(V^p_m) = \nabla_{w^s_{lm}} R(V^{p1}_m) + \nabla_{w^s_{lm}} R(V^{p1}_m \otimes V^{p2}_m) = \bar{S}_m \otimes \bar{q}_l + \bar{S}_m \otimes \bar{V}^{p2}_m \otimes \bar{q}_l = \bar{S}_m \otimes (1 + \bar{V}^{p2}_m) \otimes \bar{q}_l$.   (68)

$\nabla_{w^s_{lm}} I_1(V^p_m) = \nabla_{w^s_{lm}} I_1(V^{p1}_m) + \nabla_{w^s_{lm}} I_1(V^{p1}_m \otimes V^{p2}_m) = \bar{S}_m \otimes i \otimes \bar{q}_l + \bar{S}_m \otimes i \otimes \bar{V}^{p2}_m \otimes \bar{q}_l = \bar{S}_m \otimes i \otimes (1 + \bar{V}^{p2}_m) \otimes \bar{q}_l$.   (69)

$\nabla_{w^s_{lm}} I_2(V^p_m) = \nabla_{w^s_{lm}} I_2(V^{p1}_m) + \nabla_{w^s_{lm}} I_2(V^{p1}_m \otimes V^{p2}_m) = \bar{S}_m \otimes j \otimes \bar{q}_l + \bar{S}_m \otimes j \otimes \bar{V}^{p2}_m \otimes \bar{q}_l = \bar{S}_m \otimes j \otimes (1 + \bar{V}^{p2}_m) \otimes \bar{q}_l$.   (70)

$\nabla_{w^s_{lm}} I_3(V^p_m) = \nabla_{w^s_{lm}} I_3(V^{p1}_m) + \nabla_{w^s_{lm}} I_3(V^{p1}_m \otimes V^{p2}_m) = \bar{S}_m \otimes k \otimes \bar{q}_l + \bar{S}_m \otimes k \otimes \bar{V}^{p2}_m \otimes \bar{q}_l = \bar{S}_m \otimes k \otimes (1 + \bar{V}^{p2}_m) \otimes \bar{q}_l$.   (71)

Now substituting Eqs. (68)-(71) in Eq. (24),

$\Delta w^s_{lm} = \frac{\eta}{N}\bar{S}_m \otimes \zeta^p_m \otimes (1 + \bar{V}^{p2}_m) \otimes \bar{q}_l$.   (72)
The quaternionic gradients of $R(V^p_m)$, $I_1(V^p_m)$, $I_2(V^p_m)$ and $I_3(V^p_m)$ with respect to $S_m$ are derived using the chain rule of derivation as:

$\nabla_{S_m} R(V^p_m) = \nabla_{S_m} R(V^{p1}_m) + \nabla_{S_m} R(V^{p1}_m \otimes V^{p2}_m) = \bar{V}^C_m + \bar{V}^{p2}_m \otimes \bar{V}^C_m = (1 + \bar{V}^{p2}_m) \otimes \bar{V}^C_m$.   (73)

$\nabla_{S_m} I_1(V^p_m) = \nabla_{S_m} I_1(V^{p1}_m) + \nabla_{S_m} I_1(V^{p1}_m \otimes V^{p2}_m) = i \otimes \bar{V}^C_m + i \otimes \bar{V}^{p2}_m \otimes \bar{V}^C_m = i \otimes (1 + \bar{V}^{p2}_m) \otimes \bar{V}^C_m$.   (74)

$\nabla_{S_m} I_2(V^p_m) = \nabla_{S_m} I_2(V^{p1}_m) + \nabla_{S_m} I_2(V^{p1}_m \otimes V^{p2}_m) = j \otimes \bar{V}^C_m + j \otimes \bar{V}^{p2}_m \otimes \bar{V}^C_m = j \otimes (1 + \bar{V}^{p2}_m) \otimes \bar{V}^C_m$.   (75)

$\nabla_{S_m} I_3(V^p_m) = \nabla_{S_m} I_3(V^{p1}_m) + \nabla_{S_m} I_3(V^{p1}_m \otimes V^{p2}_m) = k \otimes \bar{V}^C_m + k \otimes \bar{V}^{p2}_m \otimes \bar{V}^C_m = k \otimes (1 + \bar{V}^{p2}_m) \otimes \bar{V}^C_m$.   (76)

Now substituting Eqs. (73)-(76) in Eq. (24) for $S_m$,

$\Delta S_m = \frac{\eta}{N}\zeta^p_m \otimes (1 + \bar{V}^{p2}_m) \otimes \bar{V}^C_m$.   (77)

The quaternionic gradients of $R(V^p_m)$, $I_1(V^p_m)$, $I_2(V^p_m)$ and $I_3(V^p_m)$ with respect to $w^{RB}_{lm}$ are derived using the chain rule of derivation as:

$\nabla_{w^{RB}_{lm}} R(V^p_m) = \nabla_{w^{RB}_{lm}} R(V^{p2}_m) + \nabla_{w^{RB}_{lm}} R(V^{p1}_m \otimes V^{p2}_m) = 2V^{RB}_m\left(R(R_m) + R(V^{p1}_m \otimes R_m)\right)(q_l - w^{RB}_{lm})$.   (78)

$\nabla_{w^{RB}_{lm}} I_t(V^p_m) = \nabla_{w^{RB}_{lm}} I_t(V^{p2}_m) + \nabla_{w^{RB}_{lm}} I_t(V^{p1}_m \otimes V^{p2}_m) = 2V^{RB}_m\left(I_t(R_m) + I_t(V^{p1}_m \otimes R_m)\right)(q_l - w^{RB}_{lm})$, for $t = 1, 2, 3$.   (79)

Now substituting Eqs. (78) and (79) in Eq. (24) for $w^{RB}_{lm}$,

$\Delta w^{RB}_{lm} = \frac{2\eta}{N}V^{RB}_m\left\{R(\zeta^p_m)\left(R(R_m) + R(V^{p1}_m \otimes R_m)\right) + \sum_{k=1}^{3} I_k(\zeta^p_m)\left(I_k(R_m) + I_k(V^{p1}_m \otimes R_m)\right)\right\}(q_l - w^{RB}_{lm})$.   (80)

The quaternionic gradients of $R(V^p_m)$, $I_1(V^p_m)$, $I_2(V^p_m)$ and $I_3(V^p_m)$ with respect to $R_m$ are derived using the chain rule of derivation as:

$\nabla_{R_m} R(V^p_m) = \nabla_{R_m} R(V^{p2}_m) + \nabla_{R_m} R(V^{p1}_m \otimes V^{p2}_m) = V^{RB}_m(1 + \bar{V}^{p1}_m)$.   (81)

$\nabla_{R_m} I_1(V^p_m) = \nabla_{R_m} I_1(V^{p2}_m) + \nabla_{R_m} I_1(V^{p1}_m \otimes V^{p2}_m) = V^{RB}_m(1 + \bar{V}^{p1}_m) \otimes i$.   (82)

$\nabla_{R_m} I_2(V^p_m) = \nabla_{R_m} I_2(V^{p2}_m) + \nabla_{R_m} I_2(V^{p1}_m \otimes V^{p2}_m) = V^{RB}_m(1 + \bar{V}^{p1}_m) \otimes j$.   (83)

$\nabla_{R_m} I_3(V^p_m) = \nabla_{R_m} I_3(V^{p2}_m) + \nabla_{R_m} I_3(V^{p1}_m \otimes V^{p2}_m) = V^{RB}_m(1 + \bar{V}^{p1}_m) \otimes k$.   (84)

Now substituting Eqs. (81)-(84) in Eq. (24) for $R_m$,

$\Delta R_m = \frac{\eta}{N}V^{RB}_m(1 + \bar{V}^{p1}_m) \otimes \zeta^p_m$.   (85)

The weight update rule is thus governed by Eqs. (32) and (33) for the hidden-output pair and by Eqs. (67), (72), (77), (80) and (85) for the input-hidden pair of the network with conventional neurons at the output layer and CPU neurons at the hidden layer in the quaternionic domain.

4. Performance analysis of CSU and CPU neurons based networks

A wide spectrum of benchmark problems, from 3D geometrical transformations to 3D and 4D time series prediction, is considered in this section with the quaternionic back-propagation (H-BP) algorithm for three-layered networks based on compensatory summation unit (CSU) or compensatory product unit (CPU) neurons at the hidden layer and conventional neurons at the output layer. The performance is evaluated through different statistical parameters such as mean square error (MSE), error variance, correlation, Akaike's information criterion (AIC) (Fogel, 1991) and prediction gain (PG) (Ujang et al., 2011; Popa, 2016). Smaller values of MSE and error variance and larger values of AIC and PG in all experiments reveal better accuracy than the conventional neuron in the quaternionic domain.

4.1. Geometrical transformations in 3D

Geometric transformations such as translation, rotation, scaling and their combinations convey dominant geometric characteristics in high-dimensional image processing applications. The input-output mappings in space preserve the angle between oriented surfaces, and the phase of each data point is also maintained during the learning and generalization of motion or transformation. Each quaternionic variable $q_i = 0 + x_i i + y_i j + z_i k$ undergoes a transformation (T) through a trained network and correspondingly yields a transformed quaternionic variable ($q'_i$) as follows:

$q'_i = T(q_i) = S_f\left(\frac{a \otimes q_i \otimes \bar{a}}{\|a\|}\right) + b$, for $i = 1, 2, 3, \ldots, n_p$,   (86)

where the function T yields different transformations and $n_p$ denotes the number of points that lie on the surface of a 3D object. The quaternion a is represented in the polar form $a = \cos(\theta/2) + \sin(\theta/2)\hat{n}$, where $\hat{n}$ is the unit vector and $|\theta| < 2\pi$. The transformed quaternion $q'_i$ is obtained from the input quaternion (q) by the scaling factor ($S_f$), followed by a translation with distance $\|b\| = \sqrt{(R(b))^2 + \sum_{k=1}^{3}(I_k(b))^2}$ (the norm of the quaternion) and a rotation of $\theta$ radians around the unit vector ($\hat{n}$).

Various simulations are presented in this section for the learning and generalization of 3D transformations or motion interpretation. These simulations exhibit the significant contribution of the quaternionic neural network, because the generalization over 3D objects was not possible with real- or complex-valued neural networks, as addressed in Minemoto et al. (2017). Since the quaternion is the unit of learning of this neural network, the phase information of each point in space is preserved during learning and generalization. The different combinations of transformations facilitate the viewing of 3D objects from different orientations, which is in high demand in computer graphics and intelligent control system design.

The network trained with the quaternionic back-propagation (H-BP) algorithm has the capability to learn a 3D transformation or 3D motion through a small set of points lying on a line and eventually generalize it over complex objects in space. Furthermore, the learning and generalization of a linear transformation are presented in Minemoto et al. (2017), where training is performed through a set of points lying on a plane; the plane contains more features than a straight line due to the geometrical distribution of points on its surface. Using the straight line for training, the proposed system thus shows more intelligent behavior than training on the plane. As a benchmark problem, this section presents the learning of transformations (rotation, scaling, translation and their combinations) through the H-BP algorithm for networks with conventional, CSU or CPU neurons at the hidden layer and conventional neurons at the output layer in the quaternionic domain. Their generalization abilities are compared on complicated 3D geometric structures (Sphere, Cylinder and Torus) through different statistical parameters.
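A brief sketch of the target mapping of Eq. (86) is given below (an illustration under stated assumptions, not the authors' implementation; it reuses qmul and qconj from the earlier sketch). Points are imaginary quaternions [0, x, y, z]; the defaults correspond to a scaling by 1/2, a 0.3 unit translation along +z and a π/2 rotation around the unit vector i.

```python
import numpy as np

def transform(points, S_f=0.5, theta=np.pi / 2, n_hat=(1.0, 0.0, 0.0),
              b=(0.0, 0.0, 0.0, 0.3)):
    """Eq. (86): scale by S_f, rotate by theta about n_hat via a (x) q (x) conj(a), translate by b."""
    n_hat = np.asarray(n_hat, dtype=float)
    n_hat = n_hat / np.linalg.norm(n_hat)
    a = np.concatenate(([np.cos(theta / 2.0)], np.sin(theta / 2.0) * n_hat))   # unit rotor
    b = np.asarray(b, dtype=float)
    out = []
    for q in points:                                      # q = [0, x, y, z]
        rotated = qmul(qmul(a, q), qconj(a)) / np.linalg.norm(a)
        out.append(S_f * rotated + b)                     # scale, then translate
    return np.array(out)
```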

combinations) through the H-BP algorithm for the networks with tor (i) followed by scaling factor ½ and 0.3 unit translation in the
conventional, CSU, or CPU neurons in the quaternionic domain at positive z-direction as shown in Fig. 4(a), (b) and (c) respectively.
hidden layer and conventional neurons at output layer. Their gen- All 3D points of a straight line with its reference point are treated
eralization abilities are compared with complicated 3D geometric as imaginary quaternion (real part is kept near to zero) which is
structures (Sphere, Cylinder, and Torus) through different statisti- used in all three transformations. In the network, first input
cal parameters. receives set of point that lies on a straight line and second input
A three-layered (2  M  2) network undergoes a learning pro- passes the reference point. The incorporation of the reference
cess for input-output mapping over a straight line containing a point of the object provides more information to learn a system
small number of points (21 data points) along with a reference and yields better accuracy which is observed during experiments
point ((0,0,0) mid of the line) in 3D space, where the network for each transformation.
consists of two inputs, M hidden neurons (conventional, CSU or For all transformations, three 2  M  2 networks are con-
CPU) and two conventional neurons at output layer for structed: first with conventional neurons, second with CSU neu-
quaternionic-valued signals. The input-output mapping as pre- rons and third with CPU neurons at hidden layer; and
sented in Eq. (86) is used for three transformations; scaling with conventional neurons are employed at output layer in all three net-
factor ½, 0.3 unit translation in the positive z-direction followed works. The training process of all networks for all transformations
by scaling with factor ½ and p/2 rad rotation around the unit vec- has been separately performed through H-BP learning algorithm.

Fig. 4. The input-output mapping of a straight line with (a) scaling factor ½, (b) scaling factor ½ and 0.3 unit translation in the positive z-direction and (c) scaling factor ½, 0.3
unit translation in the positive z-direction, and p/2 rad rotation around the unit vector (i).
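A hedged sketch of how the training pairs described above could be formed is given below: 21 points on a straight line (here an assumed segment along the x-axis), each encoded as an imaginary quaternion with the real part near zero, paired with the reference point (0,0,0) as the second network input; the targets are produced by the transform() sketch above. The parametrisation of the line is an assumption.

```python
import numpy as np

def line_training_set(n_points=21, eps=1e-6, **transform_kwargs):
    ts = np.linspace(-0.5, 0.5, n_points)                    # assumed line parametrisation
    line = np.array([[eps, t, 0.0, 0.0] for t in ts])        # imaginary quaternions [~0, x, y, z]
    reference = np.array([eps, 0.0, 0.0, 0.0])               # reference point, middle of the line
    inputs = [(p, reference) for p in line]                  # two quaternionic inputs per pattern
    targets = transform(line, **transform_kwargs)            # desired outputs, Eq. (86)
    return inputs, targets
```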
The training of the networks with the proposed CSU or CPU neurons requires a smaller number of hidden neurons, fewer parameters and fewer training cycles (average epochs) to reach a similar training mean square error (MSE) than the network with conventional neurons, and the testing of the trained networks has been verified on complicated geometrical objects in 3D, namely a Sphere (4141 data points), a Cylinder (2929 data points) and a Torus (10,201 data points), with a comparative analysis based on different statistical parameters: testing MSE, variance, correlation, AIC and PG, as reported in Tables 1-3. The huge number of data points of each structure is intentionally considered in the testing process for evaluating the intelligent behavior of the trained network. In order to depict the generalization ability, the testing results of the trained networks with CSU or CPU neurons over the different 3D objects through the H-BP algorithm are demonstrated in Figs. 5 and 6 for the scaling transformation, Figs. 7 and 8 for the scaling & translation transformation and Figs. 9 and 10 for the scaling, translation & rotation transformation respectively. The tables also show that the network with CPU neurons gives better training and testing performance than the one with CSU neurons in all respects, as reported in Tables 1-3.

Table 1
Comparison of training and testing performance for scaling transformation (scaling factor ½).

Neuron type in H               Conventional   CSU      CPU
Network topology               2-6-2          2-2-2    2-1-2
Parameters                     32             20       11
Training MSE                   0.001          0.001    0.001
Average epochs                 20,000         16,500   12,100
Testing MSE        Sphere      0.0922         0.0561   0.0373
                   Cylinder    0.0714         0.0442   0.0248
                   Torus       0.0982         0.0704   0.0571
Error variance     Sphere      0.0053         0.0042   0.0036
                   Cylinder    0.0044         0.0037   0.0029
                   Torus       0.0099         0.0083   0.0078
Correlation        Sphere      0.9721         0.9804   0.9882
                   Cylinder    0.9873         0.9879   0.9905
                   Torus       0.9462         0.9588   0.9733
AIC                Sphere      5.7120         5.9218   6.1052
                   Cylinder    5.7819         6.0419   6.327
                   Torus       4.3310         4.8223   5.2472
PG                 Sphere      7.23           7.58     7.76
                   Cylinder    8.91           9.22     9.56
                   Torus       6.02           6.51     6.82

Table 2
Comparison of training and testing performance for scaling and translation.

Neuron type in H               Conventional   CSU      CPU
Network topology               2-6-2          2-2-2    2-1-2
Parameters                     32             20       11
Training MSE                   0.001          0.001    0.001
Average epochs                 25,000         16,000   13,000
Testing MSE        Sphere      0.0927         0.0652   0.0452
                   Cylinder    0.0791         0.0621   0.0343
                   Torus       0.0994         0.0845   0.0644
Error variance     Sphere      0.0064         0.0047   0.0040
                   Cylinder    0.0051         0.0041   0.0034
                   Torus       0.0109         0.0093   0.0082
Correlation        Sphere      0.9600         0.9783   0.9856
                   Cylinder    0.9722         0.9799   0.9899
                   Torus       0.9302         0.9711   0.9864
AIC                Sphere      5.3813         6.0342   6.6387
                   Cylinder    5.7621         6.5293   6.9448
                   Torus       4.0210         4.7218   5.2773
PG                 Sphere      6.82           7.69     7.91
                   Cylinder    7.84           8.58     9.39
                   Torus       5.86           6.54     7.41

Table 3
Comparison of training and testing performance for scaling, translation, and rotation.

Neuron type in H               Conventional   CSU      CPU
Network topology               2-6-2          2-3-2    2-2-2
Parameters                     32             29       20
Training MSE                   0.001          0.001    0.001
Average epochs                 26,000         19,000   15,000
Testing MSE        Sphere      0.0941         0.0783   0.0491
                   Cylinder    0.0802         0.0588   0.0324
                   Torus       0.0997         0.0870   0.0655
Error variance     Sphere      0.0072         0.0051   0.0043
                   Cylinder    0.0063         0.0045   0.0036
                   Torus       0.0152         0.0097   0.0085
Correlation        Sphere      0.9582         0.9743   0.9801
                   Cylinder    0.9677         0.9779   0.9891
                   Torus       0.9211         0.9632   0.9702
AIC                Sphere      5.0832         5.5623   5.8322
                   Cylinder    5.4205         5.892    6.2834
                   Torus       3.9339         4.4211   4.8993
PG                 Sphere      6.71           7.27     7.55
                   Cylinder    7.53           7.96     8.83
                   Torus       5.67           6.38     7.02

4.2. Chaotic time series prediction in 3D using imaginary quaternion

4.2.1. Chua's circuit as chaotic time series prediction in 3D

Chua's circuit is an autonomous electronic circuit satisfying the chaotic criterion that it contains one or more nonlinear elements, one or more locally active resistors and three or more energy storage devices (Matsumoto & Chua, 1985). In Chua's circuit, one Chua's diode as the nonlinear element, one locally active resistor, and two capacitors and one inductor as the energy storage devices are used, and its dynamics is governed by the state equations

$\frac{dx}{dt} = \alpha[y - x - h(x)]$,
$\frac{dy}{dt} = x - y + z$,   (87)
$\frac{dz}{dt} = -\beta y - \gamma z$,

where

$h(x) = m_1 x + \frac{1}{2}(m_0 - m_1)(|x + 1| - |x - 1|)$   (88)

represents the electrical response of the nonlinear resistor, and the symbols $\alpha$, $\beta$, $\gamma$, $m_0$ and $m_1$ denote the constant parameters. The x, y and z represent the voltages across the two capacitors and the current through the inductor respectively, and their combination shows the chaotic attractor in 3D. The double-scroll chaotic attractor is obtained with the initial parameters [$\alpha = 15.6$, $\beta = 28$, $\gamma = 0$, $m_0 = -1.143$ and $m_1 = -0.714$]. This system is used as a benchmark application in the quaternionic domain to predict its chaotic behavior (Arena et al., 1996; Popa, 2016; Ujang et al., 2011). The chaotic time series has been obtained by simulating this system as given in Eqs. (87) and (88) using the fourth-order Runge-Kutta method. The time series is encoded in the imaginary quaternion (the real part is set near zero). The first 500 terms of this series are used for training the three networks (one with conventional, the second with CSU and the third with CPU neurons at the hidden layer) using the H-BP learning algorithm. These training time series are obtained after the dynamics of the system becomes stationary. The network with the proposed neurons (CSU and CPU) requires a smaller network topology (fewer parameters) and drastically reduces the training epochs compared with the network with conventional neurons at the hidden layer, as reported in Table 4.
Fig. 5. The generalization of the CSU based network trained through the H-BP algorithm: transformation with scaling factor ½ over (a) Sphere (b) Cylinder (c) Torus.

The testing performed with the next 500 terms of the series demonstrates the superiority of the proposed neurons over the conventional one through the different statistical parameters (error variance, correlation, AIC and PG) in Table 4.

Table 4
Comparison of training and testing performance of Chua's circuit.

Neuron type in H     Conventional   CSU      CPU
Network topology     1-4-1          1-2-1    1-1-1
Parameters           13             13       7
Training MSE         0.0074         0.0065   0.0062
Average epochs       12,000         8000     5000
Testing MSE          0.0075         0.0048   0.0031
Error variance       0.0038         0.0021   0.0010
Correlation          0.9839         0.9914   0.9952
AIC                  4.8843         5.5821   6.1022
PG                   5.72           6.87     7.03
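As a minimal sketch (not the authors' code), the snippet below generates the Chua series of Eqs. (87)-(88) with a classical fourth-order Runge-Kutta step; the step size and initial state are assumptions. The same integrator can be reused for the Lorenz system (Eq. (89)) and the Saito circuit (Eq. (91)) by swapping the derivative function.

```python
import numpy as np

ALPHA, BETA, GAMMA, M0, M1 = 15.6, 28.0, 0.0, -1.143, -0.714

def chua(state):
    """Right-hand side of Eqs. (87)-(88)."""
    x, y, z = state
    h = M1 * x + 0.5 * (M0 - M1) * (abs(x + 1.0) - abs(x - 1.0))       # Eq. (88)
    return np.array([ALPHA * (y - x - h), x - y + z, -BETA * y - GAMMA * z])  # Eq. (87)

def rk4_series(deriv, state, n_steps, dt=0.01):
    """Classical RK4 integration returning the whole trajectory."""
    out = []
    for _ in range(n_steps):
        k1 = deriv(state)
        k2 = deriv(state + 0.5 * dt * k1)
        k3 = deriv(state + 0.5 * dt * k2)
        k4 = deriv(state + dt * k3)
        state = state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        out.append(state)
    return np.array(out)

series = rk4_series(chua, np.array([0.7, 0.0, 0.0]), n_steps=1000)
# encode as imaginary quaternions: the real part is kept near zero
quat_series = np.hstack([np.full((len(series), 1), 1e-6), series])
```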
4.2.2. Lorenz's system as chaotic time series prediction in 3D

The chaotic behavior of the Lorenz system (Lorenz, 1963) is governed by the dynamics of three differential equations as follows:

$\frac{dx}{dt} = \sigma(y - x)$,
$\frac{dy}{dt} = x(\rho - z) - y$,   (89)
$\frac{dz}{dt} = xy - \beta z$,

where the symbols $\sigma$, $\rho$ and $\beta$ are the parameters of the Lorenz system. This system has been used as a benchmark application for quaternionic-valued neural networks (Arena et al., 1996; Popa, 2016; Ujang et al., 2011). With $\sigma = 15$, $\rho = 28$ and $\beta = 8/3$, the system of equations presented in Eq. (89) generates 6537 terms of the time series using the fourth-order Runge-Kutta method. When the dynamics of the Lorenz system becomes stationary, a time series with 500 terms is supplied for the training of the networks through the H-BP learning algorithm, and the rest of the terms of the time series are used for testing. The testing results for the prediction of this time series are analyzed through testing MSE, variance, correlation, PG and AIC, as reported in Table 5.
Fig. 6. The generalization of the CPU neurons based network trained through the H-BP algorithm: transformation with scaling factor ½ over (a) Sphere (b) Cylinder (c) Torus.

The training and testing results as presented in Table 5 reveal the superiority of the proposed CSU and CPU neurons over the conventional neuron, with a drastic reduction in training epochs.

Table 5
Comparison of training and testing performance of the Lorenz system.

Neuron type in H     Conventional   CSU      CPU
Network topology     1-4-1          1-2-1    1-1-1
Parameters           13             13       7
Training MSE         0.0015         0.0010   0.0008
Average epochs       11,000         7000     4500
Testing MSE          0.0024         0.0018   0.0031
Error variance       0.0016         0.0014   0.0009
Correlation          0.9871         0.9924   0.9977
AIC                  6.4503         6.8925   7.3117
PG                   8.13           8.75     9.17

4.3. Time series prediction in 4D using quaternion

4.3.1. Linear autoregressive process with circular noise

The linear autoregressive process with circular noise has been considered an important benchmark problem, as addressed in Ujang et al. (2011) and Popa (2016), for predicting quaternionic-valued signals. The stable autoregressive filtering process is given as

$O(n) = 1.79\,O(n-1) - 1.85\,O(n-2) + 1.27\,O(n-3) - 0.41\,O(n-4) + \vartheta(n)$,   (90)

where the quaternion-valued circular white noise is $\vartheta(n) = R(\vartheta(n)) + I_1(\vartheta(n))i + I_2(\vartheta(n))j + I_3(\vartheta(n))k$ and its components follow the normal distribution with zero mean and unit variance ($\mathcal{N}(0, 1)$).
Fig. 7. The generalization of the CSU based network trained through the H-BP algorithm: transformations with scaling factor ½ and 0.3 unit translation in the positive z-direction; over (a) Sphere (b) Cylinder (c) Torus.

Three quaternionic-valued feed-forward neural networks constructed with conventional, CSU and CPU neurons at the hidden layer are trained through the H-BP algorithm with the first 500 terms obtained from the system given in Eq. (90). Their training analysis, presented in Table 6, shows that the networks with the proposed neurons (CSU and CPU) require a smaller network topology (fewer parameters) and drastically reduce the training epochs compared with the network with conventional neurons in the hidden layer. The next 1000 terms of the series have been used for testing. The training and testing results reported in Table 6 show the superiority of the proposed neurons over the conventional neuron in terms of the different statistical parameters, viz. error variance, correlation, AIC and PG, along with a significantly smaller number of training epochs.

Table 6
Comparison of training and testing performance for the linear autoregressive filtering process with circular noise.

Neuron type in H     Conventional   CSU      CPU
Network topology     4-4-1          4-1-1    4-1-1
Parameters           25             13       13
Training MSE         0.0034         0.0030   0.0025
Average epochs       10,000         6000     4000
Testing MSE          0.0062         0.0044   0.0036
Error variance       0.0029         0.0017   0.0010
Correlation          0.9875         0.9952   0.9985
AIC                  4.9623         5.9401   6.4920
PG                   6.82           7.52     8.13
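A short sketch of generating the autoregressive series of Eq. (90) is given below (an illustration, not the authors' code); the quaternion-valued circular white noise has independent N(0, 1) components, and initialising the first four samples from the noise itself is an assumption.

```python
import numpy as np

def ar_series(n_terms, rng=np.random.default_rng(0)):
    """Eq. (90): quaternion AR(4) process driven by circular white noise."""
    O = [rng.standard_normal(4) for _ in range(4)]           # assumed initial conditions
    for n in range(4, n_terms):
        noise = rng.standard_normal(4)                       # vartheta(n), N(0, 1) per component
        O.append(1.79 * O[n - 1] - 1.85 * O[n - 2]
                 + 1.27 * O[n - 3] - 0.41 * O[n - 4] + noise)
    return np.array(O)
```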

Fig. 8. The generalization of the CPU neurons based network trained through the H-BP algorithm: transformations with scaling factor ½ and 0.3 unit translation in the positive z-direction; over (a) Sphere (b) Cylinder (c) Torus.

4.3.2. Chaotic time series prediction for 4D Saito's circuit

Saito's circuit is an electronic circuit defined by the following system of four differential equations, which contains four variables ($x_1$, $y_1$, $x_2$, $y_2$) and five parameters ($\alpha_1$, $\beta_1$, $\alpha_2$, $\beta_2$, $\eta$):

$\begin{bmatrix} dx_1/dt \\ dy_1/dt \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ -\alpha_1 & -\alpha_1\beta_1 \end{bmatrix}\begin{bmatrix} x_1 - \eta\rho_1 h(z) \\ y_1 - \frac{\eta\rho_1}{\beta_1} h(z) \end{bmatrix}$,
$\begin{bmatrix} dx_2/dt \\ dy_2/dt \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ -\alpha_2 & -\alpha_2\beta_2 \end{bmatrix}\begin{bmatrix} x_2 - \eta\rho_2 h(z) \\ y_2 - \frac{\eta\rho_2}{\beta_2} h(z) \end{bmatrix}$,   (91)

where

$h(z) = \begin{cases} 1 & \text{for } z \geq -1 \\ -1 & \text{for } z < 1 \end{cases}$

is the normalized hysteresis value, $z = x_1 + x_2$, $\rho_1 = \frac{1-\beta_1}{\beta_1}$ and $\rho_2 = \frac{1-\beta_2}{\beta_2}$. This system is also considered a benchmark application in the field of quaternionic-valued neural networks (Arena et al., 1996; Popa, 2016). The chaotic behavior of the system depends on the values of its parameters ($\alpha_1 = 7.5$, $\beta_1 = 0.16$, $\alpha_2 = 15$, $\beta_2 = 0.097$, $\eta = 1.3$). The first 500 terms of the series generated by Eq. (91), taken once its dynamics is in the stationary state, are used for training the three different networks with the H-BP learning algorithm, and testing is performed on the next 1000 terms of this series. The network with the proposed neurons (CSU and CPU) requires a smaller network topology (fewer parameters) and drastically reduces the training epochs compared with the network with conventional neurons at the hidden layer, as presented in Table 7. The testing results, in terms of testing MSE, variance, correlation, PG and AIC, for the prediction of the Saito time series are also reported in Table 7. The training and testing results again reveal the superiority of the CSU and CPU neurons over the conventional neuron.
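The normalized hysteresis h(z) used in Eq. (91) keeps its current value until z crosses a threshold, so it is naturally implemented as a stateful function. The sketch below is an assumption consistent with the two branches quoted above (the output stays on the +1 branch until z drops below -1 and on the -1 branch until z reaches +1); it is not taken from the paper.

```python
def make_hysteresis(initial=1.0):
    """Return a stateful h(z) implementing the normalized hysteresis of Eq. (91)."""
    state = {"h": initial}
    def h(z):
        if state["h"] > 0 and z < -1.0:
            state["h"] = -1.0          # switch to the lower branch
        elif state["h"] < 0 and z >= 1.0:
            state["h"] = 1.0           # switch back to the upper branch
        return state["h"]
    return h
```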

Fig. 9. The generalization of the CSU based network trained through the H-BP algorithm: transformations with scaling factor ½, 0.3 unit translation in the positive z-direction, and π/2 rad rotation around the unit vector (i); over (a) Sphere (b) Cylinder (c) Torus.

Table 7
Comparison of training and testing performance for the 4D Saito's circuit.

Neuron type in H     Conventional   CSU      CPU
Network topology     1-4-1          1-2-1    1-1-1
Parameters           13             13       7
Training MSE         0.0026         0.0020   0.0017
Average epochs       16,000         9100     6500
Testing MSE          0.0048         0.0039   0.0031
Error variance       0.0025         0.0015   0.0011
Correlation          0.9622         0.9771   0.9829
AIC                  5.5101         6.0021   6.8420
PG                   6.31           7.23     7.94

5. Conclusion

In this paper, we have proposed two compensatory neurons in the quaternionic domain: the CSU (compensatory summation unit) and the CPU (compensatory product unit) for quaternionic-valued signals. The net potentials of the proposed neurons have been defined by combinations of a linear (summation) and a nonlinear (radial basis) aggregation function with their proportions. These proportions are the compensatory parameters in quaternion form, which give the contributions of the summation and radial basis functions so as to take into account the vagueness involved. Using the proposed and conventional neurons, three quaternionic-valued feed-forward neural networks are constructed. The first network consists of conventional neurons at the hidden and output layers; the second consists of CSU and conventional neurons at the hidden and output layers respectively; and the third consists of CPU and conventional neurons at the hidden and output layers respectively. The H-BP (quaternionic back-propagation) learning algorithm is used for training all networks on a wide spectrum of benchmark problems. It is observed that the implementation of the proposed
neurons instead of conventional neurons in the quaternionic domain improves the regression ability and offers faster convergence when solving a wide spectrum of benchmark problems. The proposed neuron models have higher functionality than conventional neurons in the quaternionic domain. They make it possible to solve the problems using a smaller network and fewer learning parameters. The neural networks based on CPU neurons have shown better performance than those with CSU neurons in terms of convergence speed, testing MSE (mean square error), variance, correlation, AIC (Akaike's information criterion) and PG (prediction gain). The H-BP algorithm smartly trains the network with CSU or CPU neurons for different compositions of the three transformations through a simple input-output mapping over a straight line in space. The trained network is capable of generalizing the considered transformation over complicated geometrical structures like the Sphere, Cylinder and Torus, which possess huge point-cloud data. Such simultaneous learning of the magnitude and phase of the intended components in space is only possible with a quaternionic-valued neural network. This is achieved due to the flow of the quaternion as the unit of information in the network, because this single number contains the four components of the high-dimensional data, and the phase information among them is embedded within the number. The neural networks with the proposed neuron models and their learning algorithms may be extended into the octonionic/sedenionic domains for solving eight/sixteen-dimensional problems.

In order to convince prospective readers that the contribution of this paper is significant, the 3D (Chua's circuit and the Lorenz system) and 4D (linear autoregressive process with circular noise and Saito's circuit) chaotic time series prediction problems are also taken up as benchmark applications. The complexity of the learning system becomes much simpler with the employment of the proposed neurons in the network, as each neuron accepts three- or four-dimensional data considered as a single number system (quaternion). The wide spectrum of experiments shows the significantly faster convergence of the networks with CSU or CPU neurons through H-BP, which eventually also improves the testing accuracy in terms of the statistical parameters, viz. error variance, correlation, AIC and PG. The drastic reduction in training epochs, followed by better generalization on typical nonlinear geometrical objects and time series, provides a new direction for prospective researchers to exploit the proposed approach in various complex engineering and scientific problems.

Fig. 10. The generalization of the CPU neurons based network trained through the H-BP algorithm: transformations with scaling factor ½, 0.3 unit translation in the positive z-direction, and π/2 rad rotation around the unit vector (i); over (a) Sphere (b) Cylinder (c) Torus.
Conflicts of interest

There is no conflict of interest related to this work.

References

Arena, P., Caponetto, R., Fortuna, L., Muscato, G., & Xibilia, M. G. (1996). Quaternionic multilayer perceptrons for chaotic time series prediction. IEICE Transactions on Fundamentals, E79-A(10), 1682–1688.
Arena, P., Fortuna, L., Muscato, G., & Xibilia, M. G. (1997). Multilayer perceptrons to approximate quaternion valued functions. Neural Networks, 10, 335–342.
Benvenuto, N., & Piazza, F. (1992). On the complex back-propagation algorithm. IEEE Transactions on Signal Processing, 40(4), 967–969.
Chaturvedi, D. K., Satsangi, P. S., & Kalra, P. K. (1999). New neuron models for simulating rotating electrical machines and load forecasting problems. Electric Power Systems Research, 52, 123–131.
Chen, X., & Li, S. (2005). An modified error function for the complex-value backpropagation neural networks. Neural Information Processing, 8(1), 1–8.
Fogel, D. B. (1991). An information criterion for optimal neural network selection. IEEE Transactions on Neural Networks, 2, 490–497.
Gupta, M. M., & Homma, N. (2003). Static and dynamic neural networks, from fundamentals to advanced theory. New York: Wiley.
Hamilton, W. R. (1853). Lectures on quaternions. Dublin, Ireland: Hodges and Smith.
Hirose, A. (2006). Complex-valued neural networks. New York: Springer.
Kim, T., & Adali, T. (2003). Approximation by fully complex multilayer perceptrons. Neural Computation, 15(7), 1641–1666.
Koch, C. (1999). Biophysics of computation: Information processing in single neurons. Oxford: Oxford University Press.
Kumar, S., & Tripathi, B. K. (2017). Machine learning with resilient propagation in quaternionic domain. International Journal of Intelligent Engineering & Systems, 10(4), 205–216.
Kumar, S., & Tripathi, B. K. (2018). High-dimensional information processing through resilient propagation in quaternionic domain. Journal of Industrial Information Integration. https://doi.org/10.1016/j.jii.2018.01.004.
Kumar, S., & Tripathi, B. K. (in press). Root-power mean aggregation-based neuron in quaternionic domain. IETE Journal of Research, 1–19. https://doi.org/10.1080/03772063.2018.1436473.
Lee, C. C., Chung, P. C., Tsai, J. R., & Chang, C. I. (1999). Robust radial basis function neural networks. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 29(6), 674–685.
Leung, H., & Haykin, S. (1991). The complex backpropagation algorithm. IEEE Transactions on Signal Processing, 39(9), 2101–2104.
Lorenz, E. N. (1963). Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20(2), 130–141.
Matsui, N., Isokawa, T., Kusamichi, H., Peper, F., & Nishimura, H. (2004). Quaternion neural network with geometrical operators. Journal of Intelligent & Fuzzy Systems, 15, 149–164.
Matsumoto, T., & Chua, L. O. (1985). The double scroll. IEEE Transactions on Circuits and Systems, CAS-32(8), 798–818.
McCulloch, W. S., & Pitts, W. H. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5(4), 115–133.
Mel, B. W. (1995). Information processing in dendritic trees. Neural Computation, 6, 1013–1085.
Million, E. (2007). The Hadamard product. Creative Commons Attribution 3.0 United States License.
Minemoto, T., Isokawa, T., Nishimura, H., & Matsui, N. (2017). Feed forward neural network with random quaternionic neurons. Signal Processing, 136, 59–68.
Neidinger, R. D. (2010). Introduction to automatic differentiation and MATLAB object-oriented programming. SIAM Review, 52(3), 545–563.
Nitta, T. (1995). A quaternary version of the back-propagation algorithm. IEEE International Conference on Neural Networks, 5, 2753–2756.
Nitta, T. (1997). An extension of the back-propagation algorithm to complex numbers. Neural Networks, 10(8), 1391–1415.
Nitta, T. (2000). An analysis of the fundamental structure of complex valued neurons. Neural Processing Letters, 12, 239–246.
Nitta, T. (2006). Three-dimensional vector valued neural network and its generalization ability. Neural Information Processing, 10(10), 237–242.
Popa, C. A. (2016). Scaled conjugate gradient learning for quaternion-valued neural networks. In Proc. of int. conf. neural information processing (pp. 243–252).
Tripathi, B. K. (2015). High dimensional neurocomputing: Growth, appraisal and applications. London: Springer.
Tripathi, B. K. (2017). On the complex domain deep machine learning for face recognition. Applied Intelligence, 47(2), 382–396.
Tripathi, B. K., & Kalra, P. K. (2011a). On efficient learning machine with root-power mean neuron in complex domain. IEEE Transactions on Neural Networks, 22(5), 727–738.
Tripathi, B. K., & Kalra, P. K. (2011b). On the learning machine for three dimensional mapping. Neural Computing and Applications, 20, 105–111.
Ujang, B. C., Took, C. C., & Mandic, D. P. (2011). Quaternion-valued nonlinear adaptive filtering. IEEE Transactions on Neural Networks, 22, 1193–1206.

Dr. Sushil Kumar received his PhD in Computational Intelligence from Harcourt Butler Technical University (HBTU) Kanpur, India, and completed his M.Tech at the Defence Institute of Advanced Technology, fully funded by the Defence Research and Development Organization (DIAT-DRDO), Pune, India. He is associated with the Nature-inspired Computational Intelligence Research Group (NCIRG) at HBTU. His areas of research include high-dimensional neurocomputing, computational intelligence, advancement in neural networks, machine learning and computer vision focused on biometrics and 3D imaging. He has published several research papers in these areas.

Prof. Bipin Kumar Tripathi completed his PhD in Computational Intelligence at the Indian Institute of Technology (IIT) Kanpur, India, and his M.Tech in Computer Science and Engineering at IIT Delhi, India. Dr. Tripathi is currently serving as a Professor in the Department of Computer Science and Engineering of HBTU Kanpur, India, where he leads the Nature-inspired Computational Intelligence Research Group (NCIRG). His areas of research include high-dimensional neurocomputing, computational neuroscience, intelligent system design, machine learning and computer vision focused on biometrics and 3D imaging. He has published several research papers in these areas in many peer-reviewed journals, including IEEE Transactions, Elsevier and Springer titles, and other international conferences. He has also contributed book chapters to different international publications and holds a patent in his area. He continuously serves on the program committees of many international conferences and as a reviewer for several international journals.
