You are on page 1of 4

Foot Recognition Using Deep Learning

for Knee Rehabilitation

Abstract—The foot recognition can be applied in many medical using a classifier which is trained from the labelled data set.
fields such as the gait pattern analysis and the knee exercises of Nonetheless, if the data set are variety in same class, it seems
patients in rehabilitation. Generally, the foot recognition system with to be that theirs feature vectors tends to be not satisfactory to
a camera is aimed to captures an image of the patients in a controlled
room and background to recognize the foot in the limited views. fit with the classifier.
However, this system seems to be not convenience for monitoring Deep learning [7] is a new area of machine learning
the knee exercises at home. This research proposes a method for the application. It can additionally applied to the low level and
foot recognition using a deep learning method and compares with high level information in the large data set. The deep learning
traditional classification method. According to the results, deep learn- architecture is normally similar to artificial neural networks
ing method provides better accuracy and complexity to recognize the
foot images from online databases than the traditional classification but its hidden layers and number of nodes are complex. As
method. a result, it can identify an object in high accuracy such as
handwritten digits, multiple objects and variations of images.
Keywords—Foot Recognition, Deep Learning, Knee Rehabilita-
tion, Convolutional Neural Network The purpose of this study is to develop the foot recognition
using a deep learning method for knee rehabilitation, and it
is evaluated the deep learning method with the traditional
I. I NTRODUCTION methods by using an accuracy and complexity.

I N general, a knee physical rehabilitation [1] aims to in-


crease the strength, mobility and fitness of the knee for a
patient. This exercise is observed by a physical therapist (PT)
II. M ETHODS
This paper proposed the foot recognition using Convolu-
to provide a feedback to the patient such as the limited range tional Neural Network (CNNs) and compared with traditional
of motion of knee (angle). Normally, the PT uses a goniometer cassification methods.
to measure angle of knee joint during knee rehabilitation as
shown in below figure. Nonetheless, it may be not suitable for
home rehabilitation or many patients in the hospital. A. Traditional classification methods for foot recognition
Normally, there are many examples of knee rehabilitation by As can be seen from Figure 1, there are four main steps of
using body weigh, machines and free weights [2] . Considering foot recognition.
the main postures of these, the foot of the patient is the main 1) Preprocessing Images Data: The raw images are nor-
part of the body for moving the direction under the instructions mally labelled, cropped and converted into the suitable color
of PT while the knee joint of the patient is fixed for protection space. These data set consist of two types of data such as
painful on its . It seems to be measurable the angle of knee positive and negative images.
by tracking the foot movement during knee rehabilitation. 2) Feature Generation: It is mathematical calculation to
Using a marker less system (MLS) [3] for tracking a produce a feature vector from a raw image. The feature vector
foot with camera is able to monitor a patient during do a seems to be represented the information of its image. In this
rehabilitation program. In addition, it is low cost and easy research considers LBP and HOG features.
to setup. However, there are problems such as an occlusion, a • LBP is a basic features by using the LBP mask to
specific room and background and limited the view of captures calculate the gray images as demonstrated in Figure 2.
to recognize the foot images. Some patients additionally prefer This mask is generally divided the images into cells
to wear shoes/socks than to be a barefoot to do a rehabilitation determined by a radius of its mask. For example, if the
program so it is hard to recognize the foot location in the radius is one, the neighbors pixel around the cells is 8
image. points as showed in Figure 3. The final of LBP feature
In machine learning for computer vision application, it are normalized by using histogram of its LBP values and
becomes useful for to identify objects in raw images, search reduced the dimension by the rotation invariant uniform
items and translation information. The common model of LBP [8].
machine learning is also supervised learning. To build this • HOG is a feature that uses the histograms of oriented gra-
system, it consists of a proper data set of images that are dients of a gray image. Generally, its process initializes
labelled with its class. For conventional machine learning a parameters such as the sizes of cell, blocks and bins.
techniques, theirs system required a feature generation module The image is calculated the gradients of both horizontal
to convert the low level information such as the pixel values axis, vertical axis and a magnitude an angle of gradients.
to suitable feature vectors such as the local binary patterns This edge histogram is created as gradient vote depending
(LBPs) [4] and the histogram of oriented gradients (HOG) on bins parameter with normalized magnitude of gradient
[5]-[6] . These features are able to represent its class by vote. Finally, the 2-dimensional features are changed into
single vector of features to represent the information of
the image as illustrated in Figure 4.
3) Classifier: The third step is a classifier. In general, the
feature vectors of image are trained by the labelled theirs
class. The classifier model will learn from both negative and
positive images. In this research, SVM and kNN are used in
the experiment.
• Support Vector Machine (SVM) is a popular classifier Fig. 1. Diagram of foot recognition
which based on a separation planes of the class by super-
vised learning [5]. The plane is normally divided a group
of items in different class. To identify the suitable plane,
it requires the maximization of the distances between III. E XPERIMENTAL SETUP
nearest item point of class and the decision plane. Its A. Dataset
distance is also called as margin. As a result, the selection In this study, the data consists of two classes (N=8000) as
of plane with high margin will provide a robustness demonstrated in Figure 6 and Figure 7: foot images (N=4000)
classifier. and non-foot images (N=4000). In case of foot images, they
• k Nearest Neighbor (kNN) is a simple classifier [9].
provide the foot images in different views, varieties of bright-
It is also a lazy algorithm because it does not need ness, bare foot and wearing shoe/sock. In term of non-foot
training phase. For kNN, all feature data is calculated a images, they are randomized cropping from the living room
distance such as Euclidean distance between the unknown images. Both two classes of image are from online dataset
points and every data points. The result of classification is such as Pascal VOC2012 [10] and image-net.org [11]. In case
depending on the class of the nearest k-distances with the of foot recognition with the traditional methods, it requires
highest frequency of corresponding class to the nearest k- for feature generation (LBPs and HOG) of images before
distances. processing in training step to train and test the classifier
4) Recognition Results: The last step aims to evaluate the model. Considering the training step, the deep learning use
performance of foot recognition from the test images without training data (100%): training set (80%) and test set (20%) to
relating in training images. In this paper, it focuses on two adjust the deep learning model. The validation data is about
measurements: sensitivity and specificity. 20% of training data to report the accuracy. The traditional
methods are using 5-fold validation with training set (80%)
and validation data (20%) to report the average of accuracy
B. Convolutional Neural Networks (CNNs) methods for foot
for all 5-fold validation.
recognition
Among the deep learning, the CNNs has continuously been
the effective method in image recognition, object tracking, face B. Configuration
analysis. In general, the CNNs consists of varying component In classifier model step, the deep learning uses Tensorflow
elements of three main layers as shown in Figure 5. 14.0 module in Python 3.6 with simple model running on
• Convolutional layers are the main core of filters or 400 iterations as configuration in Figure 8. In case of the
kernels to calculate the features such as lines, edges and two traditional methods for foot recognition, the former is k-
corners of an image. Normally, these filters are the mask nearest neighbors (kNN) with k = 3, 5 and 7 by calculated
matrix of numbers and moving over the input image to their Euclidean distance. The latter is support vector machine
calculate the specific features. The convolution operation (SVM) [12] with the linear SVM. Both classifiers process by
of filters are also dot product and summation between the scikit-learn 0.19.1 module in the Python 3.6.
filters and input image. The output of these operation are
usually passed through an activation function depending C. Evaluation
on its purpose such as the Rectified Linear Unit (ReLU)
In this paper, it focuses on three measurements to evaluate
activation function for non-linear input.
the foot recognition methods: sensitivity, specificity and com-
• Pooling layers are generally reduced the dimensional
plexity. The sensitivity is able to assess the accuracy of foot
layers. These layers also represent the local and local
recognition as positive images. In case of the specificity, it
or global pooling layers. The pooling operation may be
can evaluate the accuracy of non-foot recognition as negative
maximum, average and summation of each of a cluster
images. For complexity evaluation, the number of feature, the
of data at the prior layer.
number of connection node and Big-O-notation of classifier
• Fully connected layers is the final layers which connect
are compared of each methods.
every neuron in previous layer to every neuron on the next
layer. The fully connected layer normally uses a softmax
activation function in the output layer to classify the input IV. R ESULTS
image into several classes based on the training image. n this section, there are two main indicator of evaluation for
fig:ExamFoot recognition: performance and complexity of each methods.
Fig. 2. Diagram of LBP calculation

Fig. 3. Example of the LBP pixels calculation along a clockwise

Fig. 8. Configuration on CNNs

A. Performance of Recognition
Fig. 4. Diagram of HOG calculation
The Table I shows the performance for foot recognition by
using LBP, HOG with SVM or kNN classifier and CNNs
methods. First of all, it is clear that HOG+kNN and CNNs
provide the higher sensitivity with over 0.96 than LBP+kNN.
As can be seen from the Table II, it is the performance
for non-foot recognition. Overall, both the kNN and the SVM
achieve high specificity over 0.92 with LBP features while
the HOG feature performs the low specificity. Moreover, the
CNNs is the lowest specificity in this experiment.
Fig. 5. Diagram of CNNs for foot recognition

B. Complexity
The Table III illustrates the number of each features where
Nf is the number of features and Nc is number of convolutions
in the neural network. Beginning with the Nf , the HOG
provides the maximum number of features (Nf = 1764) with
a default parameter [13]. Meanwhile, the LBP with r = 1 [8]
shows that it has a lowest number of features (Nf = 10) in
these study. Turning to the CNNS, it can be seen that the Nc
is about 88 million nodes.
The Table IV shows that the Big-O-notation complexity of
each classifier where Ns is the number of samples for training,
Fig. 6. Example of foot images Nf is the number of features, k is the number of majority
votes of its nearest neighbor and Nc is number of convolutions
in the neural network. To begin with the SVM, it seems to
be the lowest complexity of classifier as O(Nf ) in among
these experiment. By contrast, in case of CNNs, it provides
the highest complexity of classifier depending on the number
of Nc as shown in Table III.

V. D ISCUSSION AND C ONCLUSION


In case of sensitivity, the CNNs and HOG+kNN gain high
recognition for foot images. Meanwhile, the specificity of them
shows is lower than LBP+SVM. It seems to be that these
CNNs model in the experiment may not suitable to detect the
Fig. 7. Example of non-foot images background image with variety of shaped, color, and texture
of image.
TABLE I
S ENSITIVITY OF FOOT RECOGNITION R EFERENCES
[1] Anwer S, Alghadir A. Effect of isometric quadriceps exercise on muscle
Sensitivity strength, pain, and function in patients with knee osteoarthritis: a
Method kNN randomized controlled study. J Phys Ther Sci. 2014;26(5):745-8.
SVM
k=3 k=5 k=7 [2] Vincent KR, Vincent HK. Resistance exercise for knee osteoarthritis.
LBP PM R. 2012;4(5 Suppl):S45-52.
r=1 0.74 ± 0.02 0.89 ± 0.01 0.88 ± 0.01 0.88 ± 0.01 [3] Ceseracciu E, Sawacha Z, Cobelli C. Comparison of markerless and
r=2 0.83 ± 0.03 0.90 ± 0.01 0.89 ± 0.02 0.89 ± 0.02 marker-based motion capture technologies through simultaneous data
r=3 0.82 ± 0.02 0.89 ± 0.01 0.88 ± 0.01 0.87 ± 0.01 collection during gait: proof of concept. PLoS One. 2014;9(3):e87640.
r=4 0.81 ± 0.02 0.88 ± 0.02 0.86 ± 0.02 0.86 ± 0.02 Published 2014 Mar 4. doi:10.1371/journal.pone.0087640
r=5 0.86 ± 0.01 0.88 ± 0.02 0.87 ± 0.01 0.87 ± 0.01 [4] J. Kwak, S. Xu and B. Wood, ”Efficient data mining for local binary
HOG 0.80 ± 0.12 0.96 ± 0.01 0.97 ± 0.01 0.97 ± 0.01 pattern in texture image analysis”, Expert Systems with Applications,
CNNs 0.98 ± 0.01 vol. 42, no. 9, pp. 4529-4539, 2015.
[5] Ma, Y., Chen, X., Chen, G., Pedestrian detection and tracking using
TABLE II HOG and oriented-LBP features, Network and Parallel Computing,
S PECIFICITY OF FOOT RECOGNITION 2011, pp. 176-184.
[6] L. Hou, W. Wan, K. Han, R. Muhammad and M. Yang, ”Human
Specificity detection and tracking over camera networks: A review,” 2016 Interna-
Method kNN tional Conference on Audio, Language and Image Processing (ICALIP),
SVM Shanghai, 2016, pp. 574-580. doi: 10.1109/ICALIP.2016.7846643
k=3 k=5 k=7
[7] LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature,
LBP
521(7553), 436-444. https://doi.org/10.1038/nature14539
r=1 0.96 ± 0.01 0.92 ± 0.01 0.94 ± 0.01 0.95 ± 0.01
[8] T. Ojala, M. Pietikainen and T. Maenpaa, “Multiresolution gray-scale
r=2 0.93 ± 0.01 0.93 ± 0.01 0.94 ± 0.01 0.94 ± 0.01
and rotation invariant texture classification with local binary patterns”,
r=3 0.92 ± 0.01 0.92 ± 0.01 0.94 ± 0.01 0.95 ± 0.01 IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.
r=4 0.92 ± 0.01 0.92 ± 0.01 0.94 ± 0.01 0.95 ± 0.01 24, no. 7, pp. 971-987, 2002.
r=5 0.92 ± 0.02 0.93 ± 0.01 0.95 ± 0.01 0.95 ± 0.01 [9] L. Hu, M. Huang, S. Ke and C. Tsai, “The distance function effect on k-
HOG 0.59 ± 0.16 0.42 ± 0.02 0.41 ± 0.02 0.41 ± 0.02 nearest neighbor classification for medical datasets”, SpringerPlus, vol.
CNNs 0.03 ± 0.01 5, no. 1, 2016.
[10] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn and A.
TABLE III Zisserman, “The PASCAL Visual Object Classes Challenge 2012
N UMBER OF FEATURES AND NUMBER OF CONVOLUTIONS IN THE NEURAL (VOC2012) Results”, Host.robots.ox.ac.uk, 2012. [Online]. Available:
NETWORK http://host.robots.ox.ac.uk/pascal/VOC/voc2012/. [Accessed: 10- OCT-
2018].
[11] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z.
Feature Nf Nc Huang, A. Karpathy, A. Khosla, M. Bernstein, A. Berg and L. Fei-Fei,
HOG 1,764 - “ImageNet Large Scale Visual Recognition Challenge”, International
LBP Journal of Computer Vision, vol. 115, no. 3, pp. 211-252, 2015.
r=1 10 - [12] Li, D., et al., Integrating a statistical background-foreground extraction
r=2 18 - algorithm and SVM classifier for pedestrian detection and tracking,
r=3 26 - Integrated Computer-Aided Engineering, 2013, 20, (3), pp. 201-216.
r=4 34 - [13] N. Dalal and B. Triggs, “Histograms of oriented gradients for human
r=5 42 - recognition,” Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 1-8,
Jun. 2005.
CNNs - 88,473,600

TABLE IV
C OMPLEXITY OF SVM, K NN AND CNN S C LASSIFIER

Classifier Complexity
SVM O(Nf )
kNN O(Ns ∗ Nf + Ns ∗ k)
CNNs O(Nc )

For the complexity, the CNNs is the highest complexity of


classifiers depending the number of convolutions in this test.
In addition the kNN is the high complexity of classifiers due
to consuming time to calculate distance of all training dataset.

VI. F UTURE WORK


Studying the foot recognition in lateral view relates the knee
rehabilitation program, and a modification of deep learning
model is able to gain high accuracy while it has lower
complexity.

ACKNOWLEDGMENT
This project is supported by the British Council Newton
Fund: Researcher Links Travel Grant 2015/2016.

You might also like