
2017 International Electronics Symposium on Engineering Technology and Applications (IES-ETA)

Deep Multilayer Network for Automatic Targeting System of Gun Turret

Muhamad Khoirul Anwar∗, Anhar Risnumawan†, Adytia Darmawan‡, Mohamad Nasyir Tamara§, and Didik Setyo Purnomo¶
Mechatronics Engineering Division
Politeknik Elektronika Negeri Surabaya (PENS)
Kampus PENS, Jalan Raya ITS Sukolilo, Surabaya, 60111 Indonesia
Email: ∗muhkhoi@me.student.pens.ac.id, {†anhar, ‡adyt, §nasir_meka, ¶didiksp}@pens.ac.id

Abstract—Military vehicles are an important part of maintaining a country's territory. A military vehicle is often equipped with a gun turret mounted on top of the vehicle. Traditionally, the gun turret is operated manually by an operator sitting on the vehicle. With the advances in current robotic technology, automatic operation of a gun turret is highly feasible. Notable works on automatic gun turrets tend to use features that are manually designed as input to a classifier for target tracking. These features can cause less optimal parameters and require highly complex kinematic and dynamic analysis specific to a particular turret. In this paper, toward the goal of realizing an automatic targeting system for a gun turret, a gun turret simulation system is developed by leveraging a deep fully connected network using only visual information from a camera. It includes designing convolutional layers to accurately detect and track a target with input from a camera. All network parameters are learned automatically and jointly without any human intervention; they are driven purely by data. This method also requires less kinematic and dynamic modeling. Experiments show encouraging results, suggesting that an automatic targeting system for a gun turret using only a camera can benefit research in related fields.

Keywords—Automatic targeting system, gun turret, deep learning, convolutional neural network, joint learning

Fig. 1. The Indonesian domestically-made Rhino panzer successfully passes a firing test (top row). The gun turret is, however, still operated manually, leaving the operators vulnerable to being shot directly or indirectly by enemies. It is therefore important to develop remotely or automatically operated gun turrets (bottom row).
I. INTRODUCTION

Military vehicles are an important part of warfare for maintaining a defense system. One of the military weapons developed in the field of defense technology is the Ground Combat Vehicle (GCV). Much research has been done to improve the performance of terrestrial vehicles such as artillery tanks, armored vehicles, and infantry combat vehicles, covering, for example, exploration ability in all terrain, firepower, vehicle stability, structure, and materials. Interestingly, with the advances in computer technology, robotics, and control, it is now feasible to design and develop innovative ground vehicle technologies that are more effective, accurate, fast, and easy to operate.

One of the important parts of a GCV is the gun turret. A gun turret is defined as a rotating weapon system mounted on a particular platform. It has a mechanism capable of directing the weapon and firing at the target during combat, and its performance relates to the accuracy and speed of reaching the target position of the shooting.

In 2015 Indonesia introduced a new domestically-made panzer, named the Rhino Panzer, which successfully passed a firing test, as shown in the top row of Fig. 1. This panzer is equipped with a gun turret on top and is capable of transporting personnel inside. Despite having passed the firing test, this new armor still needs many major improvements.

One of the main problems that should be improved is the manual operation of the gun turret: an operator must sit on the vehicle controlling the gun turret manually, as shown in Fig. 1. Gun turret operators are traditionally exposed openly to enemies and are vulnerable to being shot by, e.g., snipers or an enemy ambush. Therefore, a gun turret system that can be controlled remotely or automatically is extremely important. Another important use of a gun turret is to guard a base. A gun turret placed on a static platform such as a tower has better shooting stability compared to a mobile platform. These systems are usually controlled automatically by utilizing sensors such as cameras for tracking a target, so the operator can safely defend other territories.

Automatic targeting systems for gun turrets have been developed by [1]–[9]. Those works utilize many sensors for the methods to work. The notable ones that focus only on visual information, such as a camera, are [1]–[3], [7], [8].
Note that using many sensors can increase the weight of the payload and is not a cheap solution, not to mention possible damage and repair costs. The existing visual targeting systems [1]–[3], [7], [8] use features that are manually designed as input to a classification system to perform target tracking, together with a highly complex kinematic and dynamic analysis that is specific to a particular turret. Hand-crafted features can degrade accuracy due to less optimal parameters and are difficult to implement in practice; the parameters are not purely learned from training data but are based solely on the engineer's experience and knowledge.

Deep learning [10] has recently shown excellent results in various visual perception tasks, such as image classification [11], [12], image segmentation [13]–[15], and object detection [16]–[20]. The features are learned automatically, solely from the training data.

In this paper, toward the goal of realizing an automatic targeting system for a gun turret, a gun turret simulation system is developed by leveraging a deep fully connected network. Existing hand-crafted-feature works differ from our method mainly in that, in ours, many features are implicitly integrated and there is a dual connection between features and classifier: the features and the classifier interact mutually during learning via the well-known back-propagation technique. This method also requires less kinematic and dynamic modeling. More specifically, the method comprises designing a CNN to accurately detect and track a target with input from a Kinect camera. All CNN parameters are learned automatically without any human intervention, which means that all parameters are driven purely by data.

This paper is structured as follows. Section II describes the state-of-the-art. Section III describes the proposed approach: III-A gives convolutional neural network insights, III-B describes the CNN structure, III-C describes the learning process of the CNN, and III-D explains the bounding box formation. Experiments and the conclusion are presented in Sections IV and V, respectively.

II. RELATED WORK

One important aspect developed in the GCV is the gun turret mechanism, which deals with the accuracy and speed of achieving the target position of shooting. The control problem in the gun turret concerns not only the kinematic and dynamic aspects but also the controller itself. Without good control, the gun turret's performance becomes useless and ineffective. Gun turret control cannot be separated from the rapid technological advances in the field of robotics, and the control system, as the brain of robotic technology, greatly affects the design of the gun turret controller. Both theoretically and practically, the application of control systems to gun turrets has been studied, with proposals such as variable structure control [21], active disturbance rejection control [22], model predictive control [23], fuzzy logic control [24], and intelligent sliding mode control [25].

To improve gun turret control performance, the system is equipped with remote control and a camera [26], [27]. Using remote control, the gun turret can be driven automatically from a desired distance while it is within range of the wireless communication system, so operators do not need to be directly on the turret. For environmental monitoring outside the body of the GCV, the operator's vision is assisted by a camera mounted on the body; hence the term Remote Controlled Weapon Station (RCWS). This is an integrated system that combines advances in computer technology and sensors so that the performance of the turret can be more optimal. However, using sensors other than the camera increases the robot's load, and such sensors are quite expensive.

Conventional detection systems, such as [28], [29], mostly comprise a manual feature design and a classifier. AI-based detection for automatic gun turrets has recently been employed by [1]–[3], [7], [8]. Various features, such as SURF, have been used to detect and track targets with high accuracy. These features are then used as inputs to classification methods for target tracking. To move the actuators of the gun turret, kinematic and dynamic analysis of the gun turret is then used.

All of the above auto gun turret targeting methods use features manually designed as input to a classification system for target tracking, and use a less robust prototype gun turret, still resembling a toy. In addition, for each gun turret, a highly complex kinematic and dynamic analysis specific to that particular turret is required. Manually designed features can cause a loss of accuracy due to less optimal parameters and are less practical to implement; the parameters are not purely learned from the data but depend on the experience and knowledge of the engineer.

Deep learning [10] has recently shown excellent results in various visual perception tasks, such as image classification [11], [12], image segmentation [13]–[15], and object detection [16]–[19]. The features are learned automatically, solely from the training data. Therefore, we leverage a deep learning method to resolve the problems that exist in the visual system of automatic gun turret targeting.

III. AUTO TARGETING SYSTEM

An overview of the proposed system is shown in Fig. 2. Given a video, frames are extracted and each frame is input to the network. The network then outputs the location of the desired object to be tracked. At first, the location of the object is specified manually; in the next frames, the surrounding patches are extracted, indicating the possible new positions of the object. We assume the object does not move randomly between frames. In fact, this holds in real scenarios, where an object seldom moves randomly.

A. Convolutional Neural Network

In this work, a CNN is employed to learn the feature representation, optimizing the features and the classifier simultaneously. This is in contrast to existing manual feature designs, which are optimized through a tedious trial-and-error process that optimizes the features and the classifiers individually.

Multiple classifiers connected in series are a common way to increase the performance of a detection system; examples are [30], [31], which utilize cascaded classifiers to boost accuracy.

Fig. 2. Overview of the overall system of our method. In the first frame the desired object is located manually. In the following frames, the surrounding patches of the object are extracted and become, one by one, an input to the CNN, which regresses the object bounding box location [x, y, width, height] relative to the patch. The center position of the tracked object is then sent to the turret's actuators (pan-tilt) using PID. Best viewed in color.
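The pipeline of Fig. 2 can be summarized as a per-frame loop. The sketch below is our own minimal paraphrase in Python, not the authors' released code; the three callables passed in (crop_search_region, cnn_regress_box, pid_step) are hypothetical stand-ins for the components the paper realizes with Caffe, ROS, and V-REP.

```python
# Minimal sketch of the Fig. 2 loop (our paraphrase, not the authors' code).
# The three callables stand in for the Caffe/ROS/V-REP components.

def track(frames, init_box, crop_search_region, cnn_regress_box, pid_step):
    """frames: iterable of RGB images; init_box: manual [x, y, w, h] in frame 0."""
    box = init_box
    for frame in frames:
        # 1. Crop a padded search region around the previous box (Sec. III-C).
        region, (ox, oy) = crop_search_region(frame, box)
        # 2. The CNN regresses [x, y, width, height] relative to the cropped patch.
        rx, ry, rw, rh = cnn_regress_box(region)
        box = [rx + ox, ry + oy, rw, rh]          # back to full-frame coordinates
        # 3. PID on the box center drives the pan-tilt actuators (Sec. III-D).
        center = (box[0] + box[2] / 2.0, box[1] + box[3] / 2.0)
        pid_step(center, frame)
    return box
```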

Conventional detection algorithms classify a rectangular region u as object or non-object. Those methods can be summarized as follows: a set of hand-crafted features Q(u) = (Q_1(u), Q_2(u), ..., Q_N(u)) is extracted, and the method then learns a binary classifier k_l for each label; the classifier is a stage separate from the features. The primary objective is then to detect the label l contained in region u such that l* = argmax_{l ∈ L} P(l|u), where the labels are L = {object, non-object} and P(l|u) = k_l(Q(u)) is a posterior probability distribution over labels given the inputs.

Meanwhile, a CNN consists of multiple layers of features which are intertwined together. A convolutional layer consists of N linear filters followed by a non-linear activation function h_m. The convolution produces a feature map f_m(x, y), which is the input to the next convolutional layer, where (x, y) ∈ S_m are spatial coordinates on layer m. The feature map f_m(x, y) ∈ R^C contains C channels, with f_m^c(x, y) denoting the c-th channel. A new feature map f_{m+1}^n is produced after each convolutional layer such that

    f_{m+1}^n = h_m(g_m^n),   where   g_m^n = W_m^n ∗ f_m + b_m^n,        (1)

and g_m^n, W_m^n, and b_m^n denote the n-th net input, filter kernel, and bias on layer m, respectively. Normalization, subsampling, and pooling, which are usually intertwined with convolutional layers, are used to build translation invariance in local neighborhoods. This work uses an activation h_m given by the Rectified Linear Unit (ReLU), h_m(f) = max{0, f}. A pooling layer is obtained by taking the maximum or the average over a local neighborhood within the c-th channel of the feature maps.
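To make Eq. (1) concrete, the following NumPy sketch evaluates one convolutional layer followed by ReLU and max pooling. This is our own illustrative toy implementation of the formula (valid cross-correlation, non-overlapping pooling), not the Caffe code the paper actually runs.

```python
import numpy as np

def conv_layer(f_m, W, b):
    """Eq. (1) for one layer: f_{m+1}^n = h_m(W_m^n * f_m + b_m^n), ReLU as h_m.
    f_m: feature map (C, H, W_in); W: N filters (N, C, k, k); b: biases (N,).
    The sum runs over all C channels, i.e. 'valid' cross-correlation as in CNNs."""
    N, C, k, _ = W.shape
    H_out, W_out = f_m.shape[1] - k + 1, f_m.shape[2] - k + 1
    g = np.empty((N, H_out, W_out))
    for n in range(N):
        for y in range(H_out):
            for x in range(W_out):
                g[n, y, x] = np.sum(W[n] * f_m[:, y:y + k, x:x + k]) + b[n]
    return np.maximum(0.0, g)        # ReLU: h_m(f) = max{0, f}

def max_pool(f, size=2):
    """Max pooling over non-overlapping size x size neighborhoods, per channel."""
    C, H, W_in = f.shape
    h2, w2 = H // size, W_in // size
    return f[:, :h2 * size, :w2 * size].reshape(C, h2, size, w2, size).max(axis=(2, 4))

# Toy usage: one random layer.
f1 = np.random.rand(3, 8, 8)                         # 3-channel input feature map
f2 = max_pool(conv_layer(f1, np.random.randn(4, 3, 3, 3), np.zeros(4)))  # (4, 3, 3)
```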
B. Structure of Network

The CNN structure shown in Table I is used. An RGB image patch or window u ∈ R^{224×224×3} is used as input. More specifically, we employ the first five layers of the CaffeNet architecture [32] while removing all the remaining fully connected layers, producing a 13×13×256 output. We then add three new fully connected layers, each of size 4096. Finally, the last layer outputs 4 values representing the bounding box [x, y, width, height]. Between the fully-connected layers we add dropout and ReLU. We found that the convolutional layers are an important part of achieving robust generic object tracking, as these layers were pretrained on the ImageNet dataset. Parameters of the network are set similarly to the CaffeNet defaults. Our neural network is implemented using Caffe [32].

TABLE I. NETWORK STRUCTURE

Layer | Type            | Input Size | Kernel Size | Feature Maps
1     | Conv            | 224x224x3  | 11x11x3x96  | 96
      | ReLU            | 55x55x96   | -           | 96
      | Pooling         | 55x55x96   | 3x3         | 96
      | Normalization   | 55x55x96   | 5x5         | 96
2     | Conv            | 55x55x96   | 5x5x96x256  | 256
      | ReLU            | 27x27x256  | -           | 256
      | Pooling         | 27x27x256  | 3x3         | 256
      | Normalization   | 27x27x256  | 5x5         | 256
3     | Conv            | 27x27x256  | 3x3x256x384 | 384
      | ReLU            | 13x13x384  | -           | 384
4     | Conv            | 13x13x384  | 3x3x384x384 | 384
      | ReLU            | 13x13x384  | -           | 384
5     | Conv            | 13x13x384  | 3x3x384x256 | 256
      | ReLU            | 13x13x256  | -           | 256
      | Pooling         | 13x13x256  | 3x3         | 256
6     | Fully Connected | 13x13x256  | -           | 4096
7     | Fully Connected | 4096       | -           | 4096
8     | Fully Connected | 4096       | -           | 4096
9     | Fully Connected | 4096       | -           | 4

It is interesting that the previous existing works can be viewed as a kind of CNN [33], where the stages become layers: the many features are represented by feature maps with their channels, and a classifier's non-linear transformation corresponds to a convolutional operation as in Eq. (1), i.e., a non-linear activation of the convolution between the filter kernel and the feature maps plus a bias. In a CNN, however, feature integration, the non-linear transformation, and maximum or average pooling are all contained within a layer. It is fully feed-forward and can be computed efficiently, so the method is able to optimize an end-to-end mapping consisting of features and classifier. In addition, the last fully-connected (FC) layers can be thought of as pixel-wise non-linear transformations. A new layer can easily be intertwined to make the system more discriminative.
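The paper implements this network in Caffe [32]. Purely as an illustration of Table I, the sketch below writes the same layer stack in PyTorch-style Python; the stride and padding values are the standard CaffeNet settings, which are our assumption (Table I lists only kernel sizes), and we flatten the 13×13×256 convolutional output into the new fully connected layers as described in the text.

```python
import torch.nn as nn

# Illustrative PyTorch equivalent of Table I; the paper itself uses Caffe [32].
# Strides/paddings are the usual CaffeNet values (assumed; Table I omits them).
tracker_net = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),     # layer 1
    nn.MaxPool2d(3, stride=2), nn.LocalResponseNorm(5),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),   # layer 2
    nn.MaxPool2d(3, stride=2), nn.LocalResponseNorm(5),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),  # layer 3
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),  # layer 4
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),  # layer 5
    # (Table I also lists a 3x3 pool after layer 5; we omit it here so the FC
    # input matches the 13x13x256 output stated in the text of Sec. III-B.)
    nn.Flatten(),
    nn.Linear(13 * 13 * 256, 4096), nn.ReLU(), nn.Dropout(),   # new FC layers
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(),
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(),
    nn.Linear(4096, 4),                     # bounding box [x, y, width, height]
)
```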

C. Training

Objects tend to move smoothly from frame to frame, so we only search in the area near the desired object. At first, the desired object in the initial frame is located manually as [x, y, w, h], where (x, y) is the top-left corner of the bounding box and (w, h) are its width and height, respectively. Then c image patches are generated by random cropping. We set c so that there are not too many patches for the network to process. We add padding, increasing the surrounding region by a certain number of pixels, and a common search-region technique is employed to crop the current frame.

We assume that the desired object is not occluded and does not move too fast. For faster objects, the search region size may be increased, at the cost of network complexity and more computation. The network is first pre-trained on the ImageNet dataset; fine-tuning from this initialization is common practice, as ImageNet is considered a generic object dataset. Layers 1 to 5 are kept fixed to prevent overfitting. A learning rate of 0.00001 is used, and the CaffeNet default values are taken for the other hyperparameters [32].
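The patch generation of this section can be illustrated as follows: around the previous box, c random crops are sampled under the smooth-motion assumption. This is a hedged sketch; the count c, the padding ratio, and the jitter scale are our placeholders, not values reported in the paper.

```python
import random

def sample_patches(frame, box, c=10, pad=0.5, jitter=0.2):
    """Sample c candidate crops around the previous box [x, y, w, h] under the
    smooth-motion assumption. frame is an HxWx3 array; c, pad, and jitter are
    illustrative placeholders, not values reported in the paper."""
    x, y, w, h = box
    H, W = frame.shape[:2]
    patches = []
    for _ in range(c):
        # Small random shift: the object is assumed not to move far per frame.
        cx = x + random.uniform(-jitter, jitter) * w
        cy = y + random.uniform(-jitter, jitter) * h
        # Padded crop window around the shifted box, clipped to the frame.
        x0, y0 = max(0, int(cx - pad * w)), max(0, int(cy - pad * h))
        x1, y1 = min(W, int(cx + (1 + pad) * w)), min(H, int(cy + (1 + pad) * h))
        patches.append(frame[y0:y1, x0:x1])   # each crop is resized to 224x224
    return patches
```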
D. Generating bounding boxes

At test time, the desired object is located manually; for the subsequent frames, the system then runs automatically, detecting the object. The network produces a bounding box [x, y, w, h] indicating the located object, and we simply draw the bounding box using this information. A PID algorithm is then used to rotate the pan-tilt turret actuators using the center position of the tracked object's bounding box.
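The paper does not report its PID gains or actuator interface, so the sketch below only illustrates the idea of this subsection: the error between the tracked box center and the image center drives the pan and tilt axes. The gains are hypothetical placeholders.

```python
class PID:
    """Textbook discrete PID; the gains used below are illustrative, not the paper's."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, err, dt):
        self.integral += err * dt
        deriv = (err - self.prev_err) / dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

pan_pid = PID(kp=0.8, ki=0.01, kd=0.05)     # placeholder gains
tilt_pid = PID(kp=0.8, ki=0.01, kd=0.05)

def turret_command(box, frame_w, frame_h, dt=0.05):
    """Return (pan, tilt) velocity commands that steer the tracked box center
    toward the image center; the commands would go to the pan-tilt joints."""
    cx = (box[0] + box[2] / 2.0) / frame_w   # normalized center, 0..1
    cy = (box[1] + box[3] / 2.0) / frame_h
    pan = pan_pid.step(0.5 - cx, dt)         # horizontal error drives pan
    tilt = tilt_pid.step(0.5 - cy, dt)       # vertical error drives tilt
    return pan, tilt
```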
IV. EXPERIMENTAL RESULTS

On a PC with a Core i7, 16 GB of RAM, and a GPU with 8 GB of memory, training takes a few hours. The free version of the V-REP robotic simulation system (http://www.coppeliarobotics.com/) is used to test our targeting system. V-REP is connected with the Robot Operating System (ROS, http://www.ros.org/) and the Caffe deep learning library [32]. A KUKA youBot robot serves as the mobile platform, and our turret is mounted on the robot, as shown in Fig. 3. The robot uses non-holonomic wheels, and the turret has pan and tilt movements with a pointing gun. A camera is mounted in the same direction as the gun. In this setup, the robot autonomously follows a given line, which is set in a circle, and the turret should then track the given object.

Fig. 3. Simulation setup in our experiment.

The performance of our method is studied for tracking people walking randomly. For some frames, our method seems late in moving the turret even though the object has been successfully tracked; this could be attributed to the slow rotation of the motor actuating the pan-tilt movement. It is nevertheless interesting that our method tracks the object fairly well and is relatively robust to the size of the object: whether the object is quite far or quite near, the turret is still able to track it correctly. At this moment, we have difficulty comparing our system with existing ones due to the different platforms used; for instance, different motor specifications (a slow motor versus a fast one) result in different tracking.

The accuracy of the tracking position when the robot base moves in a circle is shown in Fig. 5. The position of the object in the camera view is used, normalized to [0, 1] on both the x and y axes. The point (0, 0) is the top-left position of the camera view, while (1, 1) is the bottom-right; thus, good tracking should keep the position at (0.5, 0.5). Our method tracks the object relatively well, since the object position stays quite close to the center position (0.5, 0.5). For some positions, our method slightly misses the object, which could be attributed to the slow motor and to PID parameters that are likely not optimally set, but overall our method still manages to track without losing the target.

Tracking a person who is identical to another person is shown in Fig. 6. The system is still able to track the correct person even though they are identical. This is because the learned deep network intrinsically learns the motion of the object: an object in video tends to have smooth motion rather than random movement, so although the persons are identical (have similar visual features), they have different motions. This differs from previous methods, where object motion is hard-coded into the system instead of learned from data. However, our method will most likely fail when the object is widely occluded, as shown in Fig. 7.
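For completeness, the metric plotted in Fig. 5 is simply the tracked box center mapped to camera-view coordinates; a one-function sketch of the computation as described above:

```python
def normalized_center(box, frame_w, frame_h):
    """Box center in [0, 1] x [0, 1] camera-view coordinates: (0, 0) is the
    top-left of the view, (1, 1) the bottom-right; ideal tracking is (0.5, 0.5)."""
    return ((box[0] + box[2] / 2.0) / frame_w,
            (box[1] + box[3] / 2.0) / frame_h)
```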
V. CONCLUSION

We have presented an automatic targeting system for a gun turret that uses only visual information from a camera to solve the problems of existing works. Note that using many sensors increases the weight of the payload and is not a cheap solution, not to mention possible damage and repair costs. Existing works tend to use features that are manually designed as input to a classification system to perform target tracking, together with a highly complex kinematic and dynamic analysis that is specific to a particular turret. Hand-crafted features can degrade accuracy due to less optimal parameters and are difficult to implement in practice; the parameters are not purely learned from training data but are based solely on the engineer's experience and knowledge.

REFERENCES

[1] N. Djelal, N. Mechat, and S. Nadia, “Target tracking by visual servoing,” in Systems, Signals and Devices (SSD), 2011 8th International Multi-Conference on. IEEE, 2011, pp. 1–6.
[2] N. Djelal, N. Saadia, and A. Ramdane-Cherif, “Target tracking based on SURF and image based visual servoing,” in Communications, Computing and Control Applications (CCCA), 2012 2nd International Conference on. IEEE, 2012, pp. 1–5.
[3] E. Iflachah, D. Purnomo, and I. A. Sulistijono, “Coil gun turret control using a camera,” EEPIS Final Project, 2011.
[4] A. M. Idris, K. Hudha, Z. A. Kadir, and N. H. Amer, “Development of target tracking control of gun-turret system,” in Control Conference (ASCC), 2015 10th Asian. IEEE, 2015, pp. 1–5.

Fig. 4. Results of our system for tracking people while the robot base moves in a circle. The window at the bottom-right of each image shows the detected object (green box) from the camera view.

Fig. 5. Accuracy of our system per frame, measured using the camera-view position of the object (normalized position of Object x and Object y plotted over the frame index). Good tracking should have position (0.5, 0.5). Note that our method tracks the object relatively well, since the object position stays quite close to the center position.

[5] J. D. S. Munadi and M. F. Luthfa, “Fuzzy logic control application for the prototype of gun-turret system (ARSU 57-mm) using MATLAB,” 2014.
[6] T. M. Nasyir, B. Pramujati, H. Nurhadi, and E. Pitowarno, “Control simulation of an automatic turret gun based on force control method,” in Intelligent Autonomous Agents, Networks and Systems (INAGENTSYS), 2014 IEEE International Conference on. IEEE, 2014, pp. 13–18.
[7] G. Ferreira, “Stereo vision based target tracking for a gun turret utilizing low performance components,” Ph.D. dissertation, University of Johannesburg, 2006.
[8] H. D. B. Brauer, “Real-time target tracking for a gun-turret using low cost visual servoing,” Master’s thesis, University of Johannesburg, 2006. [Online]. Available: http://hdl.handle.net/10210/445
[9] Ö. Gümüşay, “Intelligent stabilization control of turret subsystems under disturbances from unstructured terrain,” Ph.D. dissertation, Middle East Technical University, 2006.
[10] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[12] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[13] C. Farabet, C. Couprie, L. Najman, and Y. LeCun, “Learning hierarchical features for scene labeling,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1915–1929, 2013.
[14] A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, and J. Schmidhuber, “A novel connectionist system for unconstrained handwriting recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, pp. 855–868, 2009.
[15] A.-r. Mohamed, G. E. Dahl, and G. Hinton, “Acoustic modeling using deep belief networks,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 14–22, 2012.
[16] R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
[17] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.
[18] ——, “Region-based convolutional networks for accurate object detection and segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 142–158, 2016.
[19] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1627–1645, 2010.
[20] I. A. Sulistijono and A. Risnumawan, “From concrete to abstract: Multilayer neural networks for disaster victims detection,” in Electronics Symposium (IES), 2016 International. IEEE, 2016, pp. 93–98.
[21] R. Dana and E. Kreindler, “Variable structure control of a tank gun,” in Control Applications, 1992, First IEEE Conference on. IEEE, 1992, pp. 928–933.
[22] Y. Xia, L. Dai, M. Fu, C. Li, and C. Wang, “Application of active disturbance rejection control in tank gun control system,” Journal of the Franklin Institute, vol. 351, no. 4, pp. 2299–2314, 2014.
[23] G. Kumar, P. Y. Tiwari, V. Marcopoli, and M. V. Kothare, “A study of a gun-turret assembly in an armored tank using model predictive control,” in American Control Conference, 2009. ACC’09. IEEE, 2009, pp. 4848–4853.
[24] M. Galal, N. Mikhail, and G. Elnashar, “Fuzzy logic controller design for gun-turret system,” in Proc. the 13th International Conference on Aerospace Sciences and Aviation Technology, 2009.
[25] J.-H. Tian, L.-F. Qian, and X.-H. Yang, “Intelligent sliding mode control and application to weapon pointing systems,” in Computer, Mechatronics, Control and Electronic Engineering (CMCE), 2010 International Conference on, vol. 2. IEEE, 2010, pp. 372–375.

Fig. 6. Results of our system for tracking identical persons while the robot base moves in a circle. Note that our method is still able to track the correct person even though the persons are identical, which could be attributed to learned object motion; our deep network intrinsically learns object motion.

Fig. 7. Failure case when the robot camera view is widely occluded.

[26] S. K. Sivanath, S. A. Muralikrishnan, P. Thothadri, and V. Raja, “Eyeball and blink controlled firing system for military tank using LabVIEW,” in Intelligent Human Computer Interaction (IHCI), 2012 4th International Conference on. IEEE, 2012, pp. 1–4.
[27] R. Bisewski and P. K. Atrey, “Toward a remote-controlled weapon-equipped camera surveillance system,” in Tools with Artificial Intelligence (ICTAI), 2011 23rd IEEE International Conference on. IEEE, 2011, pp. 1087–1092.
[28] A. Risnumawan, P. Shivakumara, C. S. Chan, and C. L. Tan, “A robust arbitrary text detection system for natural scene images,” Expert Systems with Applications, vol. 41, no. 18, pp. 8027–8048, 2014.
[29] A. Risnumawan and C. S. Chan, “Text detection via edgeless stroke width transform,” in Intelligent Signal Processing and Communication Systems (ISPACS), 2014 International Symposium on. IEEE, 2014, pp. 336–340.
[30] L. Neumann and J. Matas, “Real-time scene text localization and recognition,” in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, 2012, pp. 3538–3545.
[31] ——, “On combining multiple segmentations in scene text recognition,” in Document Analysis and Recognition (ICDAR), 2013 12th International Conference on, 2013, pp. 523–527.
[32] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” in Proceedings of the 22nd ACM International Conference on Multimedia. ACM, 2014, pp. 675–678.
[33] A. Risnumawan, I. A. Sulistijono, and J. Abawajy, “Text detection in low resolution scene images using convolutional neural network,” in International Conference on Soft Computing and Data Mining. Springer, 2016, pp. 366–375.

