Project: Human Action Detection Proof of Concept (POC) Setup Document for Avistrack*
Document Details
Validated By:
Date:
Human Action Detection – Implementation Guide
Contents
1. Introduction
2. System Requirements
3. Implementation
4. Testing
5. Why TensorFlow*?
List of Abbreviations
CNN  Convolutional Neural Network
IR   Intermediate Representation
ISV  Independent Software Vendor
POC  Proof of Concept
ROI  Region of Interest
1. Introduction
There is a need to design, develop, and test a human action detection prototype to demonstrate the feasibility of using the Intel® Distribution of OpenVINO™ toolkit for detecting human actions such as push-ups and sit-ups with good accuracy and performance. This information also needs to be transferred to an ISV (Avistrack*).
The Intel® Distribution of OpenVINO™ toolkit is a comprehensive toolkit for quickly developing applications and solutions that emulate human vision. Based on convolutional neural networks (CNNs), the toolkit extends computer vision workloads across Intel® hardware, maximizing performance. The toolkit:
- Supports heterogeneous execution across computer vision accelerators from Intel, using a common API for the CPU, Intel® Movidius™ Neural Compute Stick, and FPGA. However, Intel® Movidius™ NCS and FPGA are not covered in this lab.
- Speeds time-to-market through an easy-to-use library of computer vision functions and pre-optimized kernels.
- Includes optimized calls for computer vision standards, including OpenCV, OpenCL™, and OpenVX*.
2. System Requirements
Hardware Requirements:
Software Requirements:
3. Implementation
This section describes how to develop a human action detection model using TensorFlow*. The following
steps are involved:
- Installing the TensorFlow* deep learning framework
- Setting up the TensorFlow Object Detection API
- Training a custom object detector using the Object Detection API
- Converting the trained model into IR format using the Intel® Distribution of OpenVINO™ toolkit Model Optimizer
After completing these steps, you will be able to detect human actions such as sit-ups and push-ups using the Intel® Distribution of OpenVINO™ toolkit.
3.1.1. Prerequisites
$ sudo apt-get update
$ sudo apt install python3-dev python3-pip  # install Python 3 development headers and pip
$ sudo pip3 install -U virtualenv  # system-wide install
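The commands above install only the tooling. A minimal continuation, creating and activating the environment and installing TensorFlow, might look like the following (the environment name venv and the TensorFlow 1.14 version are assumptions for illustration; the Object Detection API workflow used below targets TensorFlow 1.x):
$ virtualenv --system-site-packages -p python3 ./venv  # create the virtual environment
$ source ./venv/bin/activate  # activate it; the prompt changes to (venv)
(venv)$ pip install --upgrade pip
(venv)$ pip install tensorflow==1.14  # TF 1.x, as assumed for this workflow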
After we have the required dataset, the next step is to annotate the objects in the images. For object detection, the objects must be annotated manually, which can be done with an open source tool called LabelImg. The tool can be downloaded from here; installation instructions are in the link provided. For detailed usage of LabelImg for this POC, you can follow this link.
After all the images are manually annotated, a corresponding XML file is generated for each image in the directory specified during annotation.
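For reference, LabelImg writes one Pascal VOC-style XML file per annotated image. A trimmed example is shown below (the file name, image size, and box coordinates are made up for illustration):
<annotation>
  <folder>images</folder>
  <filename>pushup_001.jpg</filename>
  <size>
    <width>640</width>
    <height>480</height>
    <depth>3</depth>
  </size>
  <object>
    <name>person doing pushups</name>
    <bndbox>
      <xmin>120</xmin>
      <ymin>80</ymin>
      <xmax>520</xmax>
      <ymax>400</ymax>
    </bndbox>
  </object>
</annotation>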
Now create a directory named Object-Detection and create respective directories in it as shown below:
Object-Detection
-data/
--test_labels.csv
--train_labels.csv
-images/
--test/
---testingimages.jpg
--train/
---trainingimages.jpg
--...yourimages.jpg
-training
-xml_to_csv.py
Keep all the preprocessed images and their corresponding .xml files in the images directory, separating
them into test and train.
Convert all the XML files generated during annotation to CSV format using the xml_to_csv.py script. Change its main() function from:
def main():
    image_path = os.path.join(os.getcwd(), 'annotations')
    xml_df = xml_to_csv(image_path)
    xml_df.to_csv('raccoon_labels.csv', index=None)
    print('Successfully converted xml to csv.')
To:
def main():
    for directory in ['train', 'test']:
        image_path = os.path.join(os.getcwd(), 'images/{}'.format(directory))
        xml_df = xml_to_csv(image_path)
        xml_df.to_csv('data/{}_labels.csv'.format(directory), index=None)
        print('Successfully converted xml to csv.')
Now run the xml_to_csv.py script to generate train_labels.csv and test_labels.csv in the data folder.
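With the main() function above, the script takes no arguments and is run from the Object-Detection directory inside the activated environment:
(venv) <Object-Detection_dir>$ python3 xml_to_csv.py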
Run the generate_tfrecord.py script. We will run it twice, once for the train TFRecord and once for the test
TFRecord.
(venv)$ cd <Object-Detection_dir>
(venv) <Object-Detection_dir>$ python3 generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record --image_dir=images/
(venv) <Object-Detection_dir>$ python3 generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record --image_dir=images/
Now, in your data directory, you should have train.record and test.record.
(venv)$ cd models/research/
(venv) models/research$ protoc object_detection/protos/*.proto --python_out=.
(venv) models/research$ export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
If you get any errors while executing the protoc command, protoc is not installed properly; follow section 3.4 for proper installation.
Navigate to the object_detection directory in the repository that you have downloaded.
Copy the data, training, and images folders, and the SSD_Inception_v2_COCO pretrained model, to the object_detection directory in the downloaded repository.
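If the pretrained model has not been downloaded yet, it can be fetched and unpacked as follows (the URL follows the TensorFlow detection model zoo naming for this checkpoint; verify it against the current model zoo page):
(venv)$ wget http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2018_01_28.tar.gz
(venv)$ tar -xzf ssd_inception_v2_coco_2018_01_28.tar.gz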
In the pipeline configuration file of the SSD_Inception_v2_COCO pretrained model, make the following changes:
1. num_classes: 3
2. fine_tune_checkpoint: "ssd_inception_v2_coco_2018_01_28/model.ckpt"
3. Set the training and evaluation sections as follows:
train_input_reader: {
  tf_record_input_reader {
    input_path: "data/train.record"
  }
  label_map_path: "data/object-detection.pbtxt"
}
eval_config: {
  num_examples: 40  # change with respect to the number of test images
}
eval_input_reader: {
  tf_record_input_reader {
    input_path: "data/test.record"
  }
  label_map_path: "data/object-detection.pbtxt"
  shuffle: false
  num_readers: 1
}
To generate the label file, make an entry for each class. For this POC, the label file contains the following entries:
item {
  id: 1
  name: 'person doing pushups'
}
item {
  id: 2
  name: 'person doing situps'
}
item {
  id: 3
  name: 'person doing proper situps'
}
(venv)$ cd /<obj_detect_api_dir>/models/research/object_detection
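The training command itself is not preserved in this guide; a typical invocation for the TF1 Object Detection API is sketched below (the train.py script location and the config file name under training/ are assumptions):
(venv) object_detection$ python3 train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_inception_v2_coco.config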
In case of any errors, make sure that the environment variables are properly set; for example, protoc and slim must be added to the PYTHONPATH.
On successful training, checkpoint files are available in the training folder. Train until the loss drops below 1; this takes approximately 5,000 steps (5-6 hours). Once the loss is below 1, stop training the model.
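Before conversion, the trained checkpoint has to be frozen into an inference graph. This step is not shown in the guide, but a typical export with the Object Detection API looks like the following (the checkpoint number 5000 and the output directory name are assumptions):
(venv) object_detection$ python3 export_inference_graph.py --input_type image_tensor --pipeline_config_path training/ssd_inception_v2_coco.config --trained_checkpoint_prefix training/model.ckpt-5000 --output_directory ssd_inception_v2_inference_graph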
The TensorFlow model is converted into IR format with the following command:
$ sudo python3 <INSTALL_DIR>/deployment_tools/model_optimizer/mo_tf.py \
    --input_model=/<frozen_graph_location_directory>/ssd_inception_v2_inference_graph/frozen_inference_graph.pb \
    --tensorflow_use_custom_operations_config <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json \
    --tensorflow_object_detection_api_pipeline_config=/<frozen_graph_location_directory>/ssd_inception_v2_inference_graph/pipeline.config \
    --reverse_input_channels
After executing the above command successfully, the frozen_inference_graph.xml, frozen_inference_graph.bin, and frozen_inference_graph.mapping files are generated in the optimizer directory. These files will be used for testing the model.
4. Testing
To test the developed human activity detection model, the Intel® Distribution of OpenVINO™ toolkit
inference engine is used. The Intel® Distribution of OpenVINO™ toolkit prebuilt sample application
“object_detection_demo_async” is used to test the optimized model.
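The exact command is not preserved here; an invocation along the lines of the standard demo options would be (the model path is an assumption; -i cam selects the webcam):
(venv)$ python3 object_detection_demo_async.py -i cam -m <optimizer_output_dir>/frozen_inference_graph.xml -d CPU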
After executing the command successfully, a live camera feed is shown in an output window. Point the camera at the person doing sit-ups or push-ups to be tested. A rectangular box is drawn to confirm the detection, and the detection confidence value is displayed on top of the rectangular box.
If you want to display the label name, create a file named frozen_inference_graph.labels in the directory where object_detection_demo_async is located, and type the list of class names used for training in the correct order.
Note: The listing of class names should start from second line.
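Consistent with the label map above, the file for this POC would therefore contain an empty first line followed by the class names (the blank first line is an interpretation of the note above):

person doing pushups
person doing situps
person doing proper situps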
5. Why TensorFlow*?
There are many different deep learning frameworks available, such as Caffe*, Apache MXNet*, Darknet, and so on. Due to the wide availability of online support and of many pre-trained models, the TensorFlow framework is preferred here. In addition, the Intel® Distribution of OpenVINO™ toolkit currently supports TensorFlow, Caffe, and MXNet.
Because the TensorFlow Object Detection API is readily available and well suited to object detection applications, it makes it possible to develop such applications quickly.
There are many pre-trained TensorFlow models available for object detection. The SSD_Inception_v2 model has been used here for two reasons: training speed and training accuracy.
The following table lists the available object detection pre-trained models with their two corresponding attribute values: reported inference speed (in ms) and COCO mAP.
Model name                                    Speed (ms)   COCO mAP
ssd_mobilenet_v1_coco                         30           21
ssd_mobilenet_v1_0.75_depth_coco              26           18
ssd_mobilenet_v1_quantized_coco               29           18
ssd_mobilenet_v1_0.75_depth_quantized_coco    29           16
ssd_mobilenet_v1_ppn_coco                     26           20
ssd_mobilenet_v1_fpn_coco                     56           32
ssd_resnet_50_fpn_coco                        76           35
ssd_mobilenet_v2_coco                         31           22
ssdlite_mobilenet_v2_coco                     27           22
ssd_inception_v2_coco                         42           24
faster_rcnn_inception_v2_coco                 58           28
faster_rcnn_resnet50_coco                     89           30
rfcn_resnet101_coco                           92           30
faster_rcnn_resnet101_coco                    106          32
mask_rcnn_inception_v2_coco                   79           79
If you need faster inference at the cost of lower accuracy, you can select the MobileNet models. If you want higher accuracy at the cost of longer inference time, you can select the Inception models. For our object detection, we have selected the SSD_Inception_v2 model, which is balanced in both speed and accuracy. In addition, it integrates easily with the Intel® Distribution of OpenVINO™ toolkit.
To count the number of actions performed by the human, say the number of sit-ups or push-ups, a concept called Region of Interest (ROI) is used. During inference, the Python sample draws a rectangular box around the detected action in the output window. You then find the center of that rectangle and draw a horizontal or vertical line across the output window; this is called the ROI line. Once the center of the detection rectangle touches the ROI line two times, you can count it as one repetition (rep) in fitness terminology. All of this needs to be coded into the Python sample object_detection_demo_async.py in the Intel® Distribution of OpenVINO™ toolkit, as sketched below.
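A minimal sketch of this counting logic is shown below (the RepCounter class and the fixed horizontal ROI line are illustrative assumptions, not part of the OpenVINO sample; the real code would live inside the demo's frame loop):

class RepCounter:
    """Counts repetitions: two crossings of a horizontal ROI line = one rep."""

    def __init__(self, roi_y):
        self.roi_y = roi_y      # y coordinate of the horizontal ROI line
        self.prev_side = None   # which side of the line the box center was on
        self.crossings = 0
        self.reps = 0

    def update(self, box):
        # box is the detection rectangle as (xmin, ymin, xmax, ymax)
        xmin, ymin, xmax, ymax = box
        center_y = (ymin + ymax) / 2.0
        side = center_y < self.roi_y          # True if the center is above the line
        if self.prev_side is not None and side != self.prev_side:
            self.crossings += 1               # the center just crossed the ROI line
            if self.crossings % 2 == 0:
                self.reps += 1                # two crossings count as one repetition
        self.prev_side = side
        return self.reps

Inside the demo's frame loop, update() is called with the box of each detected action, and the returned count is overlaid on the output window (for example, with cv2.putText).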
Figure 3: Output screen showing the detected human actions and action count
In the output screen, you can see the detected action with its confidence value and the corresponding action count.