Abstract
Artificial neural networks, such as feed-forward networks (FFN), convolutional neural networks (CNN), and recursive neural networks (RNN), are becoming powerful tools that are starting to replace many classical algorithms. It is known that, for image recognition, CNNs are often the best choice in terms of accuracy. In this paper we show that feed-forward networks are capable of achieving comparable performance with a less complicated architecture than CNNs. After presenting the underlying theory of feed-forward networks, we describe the methods that allowed us to get past local minima of the network, and then present our experiments and the conclusions that followed.
Keywords
Neural networks, Activation function, Images, Classification
the value depends mainly on the distance between the predicted and actual values, that is, on the error. The other layers' δs use the numbers calculated in the previous steps:

\[
\delta_i^k = \frac{1}{2}\cdot\frac{\partial E}{\partial s_i^k}
           = \frac{1}{2}\sum_{j=1}^{N_{k+1}}\frac{\partial E}{\partial s_j^{k+1}}\cdot\frac{\partial s_j^{k+1}}{\partial s_i^k}
           = f'(s_i^k)\sum_{j=1}^{N_{k+1}}\delta_j^{k+1} w_{ij}^{k+1},
\qquad (5)
\]

where \(N_{k+1}\) is the count of neurons on the \((k+1)\)-th layer.

In Algorithm 1 you can see the full process of training a feed-forward neural network using the back-propagation algorithm. It uses an array called \(D\), which holds the δ values. This jagged array has \(L\) rows, and each row has as many columns as the layer it corresponds to has neurons. Elements of arrays are written as \(A_{i,j}\), which is just an alternative form of \(A[i][j]\). Moreover, the numerical intervals used in the for-loops are half-open (the bound after "to" does not belong to the interval). There are also the symbols \(LR\), \(s_i^k\), \(y_i^k\) and \(w_{ij}^k\). They denote, respectively: the learning rate (\(\eta\)), the input and output values of the \(k\)-th layer's \(i\)-th neuron, and the weight of the synapse between the \(i\)-th neuron of the \(k\)-th layer and the \(j\)-th neuron of the previous layer. Finally, \(f'(\cdot)\) denotes the derivative of the activation function; for the bipolar sigmoid function

\[
f(x) = \frac{1 - e^{-\alpha x}}{1 + e^{-\alpha x}}
\qquad (6)
\]

the derivative is

\[
f'(x) = \frac{2\alpha e^{-\alpha x}}{(1 + e^{-\alpha x})^2}.
\qquad (7)
\]

Data: EpochsCount, TrainingInputs, ExpectedOutputs (EOUT)
Result: higher precision of the neural network

L := count of the network's layers;
D := empty jagged array for the δ values;
for i = 0 to EpochsCount do
    for j = 0 to Training Set's Length do
        insert the j-th vector from Inputs into the first layer's input synapses;
        for k = 0 to L do
            calculate s^k on all neurons of the k-th layer by summing the products of the synaptic weights and the output values of the previous layer's neurons (or of the input synapses if it is the input layer);
            calculate y^k on all neurons of the k-th layer by applying the activation function;
        end
        Output (OUT) := vector made of the output values of the output neurons;
        for n = 0 to output neurons count do
            D_{(L-1),n} = (EOUT_{j,n} - OUT_n) · f'(s_n^{L-1});
        end
        for k = L-2 to 0 step -1 do
            for n = 0 to k-th layer's neurons count do
                D_{k,n} = 0;
                for m = 0 to (k+1)-th layer's neurons count do
                    D_{k,n} = D_{k,n} + D_{(k+1),m} · w_{mn}^{k+1};
                end
                D_{k,n} = D_{k,n} · f'(s_n^k);
            end
        end
        for k = L-1 to 0 step -1 do
            for n = 0 to k-th layer's neurons count do
                for m = 0 to (k-1)-th layer's neurons count do
                    w_{nm}^k = w_{nm}^k + 2 · LR · D_{k,n} · y_m^{k-1};
                end
            end
        end
    end
end

Algorithm 1: FFN training algorithm.
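As a cross-check of the derivation, Algorithm 1 can be condensed into a short vectorized sketch, with one weight matrix per layer instead of per-neuron loops. NumPy, the function names, and the values of α and LR are our illustrative assumptions, not the original implementation:

```python
import numpy as np

ALPHA, LR = 1.0, 0.1  # activation steepness and learning rate (assumed values)

def f(x):
    """Bipolar sigmoid activation, Eq. (6)."""
    return (1 - np.exp(-ALPHA * x)) / (1 + np.exp(-ALPHA * x))

def f_prime(x):
    """Its derivative, Eq. (7)."""
    return 2 * ALPHA * np.exp(-ALPHA * x) / (1 + np.exp(-ALPHA * x)) ** 2

def train(weights, inputs, expected, epochs):
    """Algorithm 1 in vectorized form; weights[k] maps layer k's inputs to its sums."""
    for _ in range(epochs):
        for x, eout in zip(inputs, expected):
            # Forward pass: s[k] holds neuron inputs, y[k+1] neuron outputs.
            s, y = [], [np.asarray(x, dtype=float)]
            for W in weights:
                s.append(W @ y[-1])
                y.append(f(s[-1]))
            # Output-layer deltas: D = (EOUT - OUT) * f'(s).
            D = [None] * len(weights)
            D[-1] = (eout - y[-1]) * f_prime(s[-1])
            # Hidden-layer deltas, Eq. (5): propagate back through W^T.
            for k in range(len(weights) - 2, -1, -1):
                D[k] = (weights[k + 1].T @ D[k + 1]) * f_prime(s[k])
            # Weight update: the factor 2 * LR cancels the 1/2 in the delta.
            for k, W in enumerate(weights):
                W += 2 * LR * np.outer(D[k], y[k])
    return weights
```

Here `f` and `f_prime` implement Eqs. (6) and (7), and the outer product accumulates the per-synapse update 2 · LR · D_{k,n} · y_m^{k-1} for a whole layer at once.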
3. Example system

Our experimental FFN takes in a 50x25x3 image and outputs a six-dimensional vector of decision values. Each decision value represents one movie that is associated with the given frame.

3.1. Data preparation

3.1.1. Image preparation

After loading the data set into memory, we cut each image to a 2:1 aspect ratio; this value was chosen because our input vector is an image with a 2:1 aspect ratio. If we took an image whose aspect ratio does not match the input aspect ratio, we would need to stretch or shrink it, which would in turn add some pixels and thus make the results more inaccurate.
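The preparation steps of this subsection, the 2:1 centre crop just described together with the border trim and 50x25 downscale detailed next, can be sketched in pure Python; `prepare_image` and the nearest-neighbour resize are illustrative assumptions, not the authors' code:

```python
def prepare_image(pixels):
    """pixels: 2D list (rows x cols) of pixel values.
    Centre-crop to 2:1, trim 1/12 per side, nearest-neighbour resize to 50x25."""
    h, w = len(pixels), len(pixels[0])
    # Centre-crop to a 2:1 aspect ratio (width = 2 * height).
    if w > 2 * h:
        nw, nh = 2 * h, h
    else:
        nw, nh = w, w // 2
    top, left = (h - nh) // 2, (w - nw) // 2
    img = [row[left:left + nw] for row in pixels[top:top + nh]]
    # Trim 1/12 of the width and height from each side (letterbox removal).
    h, w = len(img), len(img[0])
    dy, dx = h // 12, w // 12
    img = [row[dx:w - dx] for row in img[dy:h - dy]]
    # Shrink to the network's 50x25 input dimensions (nearest neighbour).
    h, w = len(img), len(img[0])
    return [[img[r * h // 25][c * w // 50] for c in range(50)] for r in range(25)]
```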
After cropping to the target aspect ratio, we crop another 1/12 of the width and height from each side, to eliminate possible letterboxes. Finally, we shrink the image to 50x25 dimensions. These dimensions were chosen to cut down on training times. Obviously, to eliminate possible artifacts introduced by the preprocessing steps described above, the image will be filtered [12].

3.1.2. Data labeling

Each image is given a label signifying the movie it belongs to. Then 1/10 of the image samples are redirected into the testing set.

3.2. Training network

After preparation of the training and testing datasets, all 225 images per movie, converted to vectors, are inserted into the neural network. As was said in the "Feed-Forward Networks Theory" section, the information goes through all layers to the output layer, with the activation function applied in every encountered neuron, and then goes back in the back-propagation process, modifying the synaptic weights. This sequence is repeated multiple times and results in an improvement of the network's ability to recognize objects.

1 hidden layer with 50 neurons
    Movies count    Highest precision
    4               76.0 %
    5               71.2 %
    6               52.7 %

3 hidden layers: 800, 200 and 50 neurons
    Movies count    Highest precision
    4               82.0 %
    5               88.8 %
    6               84.6 %

4. Experiments

The main problem dealt with, in order to provide optimal learning accuracy and time, was selecting an appropriate architecture that grants relatively good accuracy but, on the other hand, contains as few neurons as required. A minimal neuron count results in minimal computational time.

Initially, when tests were carried out on 4 classes of data, a network that contained only one hidden layer consisting of as few as 50 neurons seemed to be the most efficient and accurate (76%). Nevertheless, it was not capacious enough to accommodate the amount of knowledge needed to recognize four or more movies. Eventually, a model that contained 3 hidden layers with the following neuron counts: 800, 200, 50 came out to be the optimal solution.

The back-propagation algorithm has one significant defect: it does not use any heuristics that could help it deal with local minima. The solution to this issue is quite simple and easy to implement. In our case, adding a random number from the range [−0.002; 0.002] to the weights of all connections turned out to be extraordinarily efficient. However, it is hard to give exact figures, because the method was combined with a gradual increase of the class count. Surprisingly, increasing the number of classes resulted in a leap in the quality of decisions made by the network.

This phenomenon was observed after a few epochs of learning following a class addition. Sometimes the network needed the randomization mentioned above for the phenomenon to occur.

At the time of writing this article, the absolute accuracy of predictions made by the network for six classes exceeded the maximal accuracy obtained on four classes. We suspect that this could be explained by the growing generality of the classifier contained in the model with an increasing number of known classes.

The methodology of learning was the following. The model was trained until its accuracy had not increased for a few epochs. In the next stage, the network's weights were randomized to obtain three different child networks. All four instances were trained in parallel. At the end, the best one of them was picked and reproduced by randomizing its weights, and then the whole process was repeated.

When the network had stalled in its advancement, one class was added to its possible outputs and learning was continued in the same way. One of the most interesting issues that we experienced in the initial part of the research was a significant drop in accuracy when the class set contained the "Shrek 2" movie. After preliminary learning on a dataset that did not contain this movie, the problem disappeared.

We were also experimenting with different weight initialization techniques, described in [4, 13]. We tried the following methods:

• He weight initialization: multiplies a random value from [−1.0, 1.0] by √(2/size), where size is the size of the previous layer

• Xavier's weight initialization: similar to He initialization, but with √(1/size)

• We also used the following technique:
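These initialization rules, together with the random nudge from [−0.002; 0.002] described earlier, might be sketched as follows; NumPy and all function names here are our assumptions:

```python
import numpy as np

rng = np.random.default_rng()

def he_init(fan_in, fan_out):
    """He: uniform [-1, 1] scaled by sqrt(2 / size of previous layer)."""
    return rng.uniform(-1.0, 1.0, (fan_out, fan_in)) * np.sqrt(2.0 / fan_in)

def xavier_init(fan_in, fan_out):
    """Xavier: like He, but scaled by sqrt(1 / size of previous layer)."""
    return rng.uniform(-1.0, 1.0, (fan_out, fan_in)) * np.sqrt(1.0 / fan_in)

def combined_init(fan_in, fan_out):
    """Third variant: scaled by sqrt(2 / (size_{l-1} + size_l))."""
    return rng.uniform(-1.0, 1.0, (fan_out, fan_in)) * np.sqrt(2.0 / (fan_in + fan_out))

def nudge(weights, eps=0.002):
    """Add uniform noise from [-eps, eps] to every synaptic weight."""
    return [W + rng.uniform(-eps, eps, W.shape) for W in weights]
```

Note that with a 3750-neuron input layer the He scale factor is about √(2/3750) ≈ 0.023, so all three rules produce very small initial weights.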
√(2/(size_{l-1} + size_l)), where size_n is the size of the n-th layer

Unfortunately, none of those methods worked correctly. In testing, we used networks with a significant number of neurons in nearly every layer (3750 neurons in the input layer), which in turn set the initial weight values to minute values.

The results of our research are shown in the tables above. The first table presents the most satisfactory accuracies of the neural network (with 1 hidden layer consisting of 50 neurons) before the experiment. The second one shows the effects of randomization. The percentages in the right columns are calculated by dividing the number of correctly recognized movies by the size of the testing dataset (which is 25 frames per movie). As they show, the results are surprisingly high.

5. Example Predictions

The tables below contain example predictions made by the network. The provided images are screenshots from the Netflix platform that were taken independently from the learning and testing datasets.

As can be seen, the accuracy is satisfactorily high. The network makes a few mistakes when it comes to dark and distant frames. Fixing this issue will be our next goal.

6. Conclusions

After all the experiments done with this network, we can state the following conclusions:

1. Combining back-propagation and a heuristic approach gave an unprecedented leap in network accuracy. After a number of tests with weight randomization, we suspect that by giving a "nudge" to the weights, we push the network out of a local minimum, hence allowing it to learn further.

2. Changing the model in-flight allows for more generality. By changing the model topology (e.g. adding an additional output dimension), we have seen an increase in the generality of the model.

3. Training on diversified samples first results in increased generality.

References

[1] M. Woźniak, D. Połap, Soft trees with neural components as image-processing technique for archeological excavations, Personal and Ubiquitous Computing 24 (2020) 363–375.
[2] D. Połap, M. Woźniak, Bacteria shape classification by the use of region covariance and convolutional neural network, in: 2019 International Joint Conference on Neural Networks (IJCNN), IEEE, 2019, pp. 1–7.
[3] M. Woźniak, D. Połap, R. K. Nowicki, C. Napoli, G. Pappalardo, E. Tramontana, Novel approach toward medical signals classifier, in: 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, 2015, pp. 1–7.
[4] G. Capizzi, G. L. Sciuto, C. Napoli, D. Połap, M. Woźniak, Small lung nodules detection based on fuzzy-logic and probabilistic neural network with bio-inspired reinforcement learning, IEEE Transactions on Fuzzy Systems 28 (2019) 1178–1189.
[5] F. Beritelli, G. Capizzi, G. Lo Sciuto, C. Napoli, M. Woźniak, A novel training method to preserve generalization of rbpnn classifiers applied to ecg signals diagnosis, Neural Networks 108 (2018) 331–338.
[6] L.-H. Chen, Y.-C. Lai, H.-Y. M. Liao, Movie scene segmentation using background information, Pattern Recognition 41 (2008) 1056–1065.
[7] Y. Zhou, L. Zhang, Z. Yi, Predicting movie box-office revenues using deep neural networks, Neural Computing and Applications 31 (2019) 1855–1865.
[8] D. Połap, K. Kęsik, A. Winnicka, M. Woźniak, Strengthening the perception of the virtual worlds in a virtual reality environment, ISA Transactions 102 (2020) 397–406.
[9] J. Chen, J. Shao, C. He, Movie fill in the blank by joint learning from video and text with adaptive temporal attention, Pattern Recognition Letters 132 (2020) 62–68.
[10] S. Alghowinem, R. Goecke, M. Wagner, A. Alwabil, Evaluating and validating emotion elicitation using english and arabic movie clips on a saudi sample, Sensors 19 (2019) 2218.
[11] T.-L. Nguyen, S. Kavuri, M. Lee, A multimodal convolutional neuro-fuzzy network for emotion understanding of movie clips, Neural Networks 118 (2019) 208–219.
[12] G. Capizzi, S. Coco, G. Lo Sciuto, C. Napoli, A new iterative fir filter design approach using a gaussian approximation, IEEE Signal Processing Letters 25 (2018) 1615–1619.
[13] M. Matta, G. Cardarilli, L. Di Nunzio, R. Fazzolari, D. Giardino, A. Nannarelli, M. Re, S. Spanò, A reinforcement learning-based qam/psk symbol synchronizer, IEEE Access 7 (2019) 124147–124157.
[Example-prediction tables: columns Picture, Origin, Prediction, Matches, 2nd prediction; the screenshots and per-frame predictions are not reproduced here.]