
HUMAN ACTION DETECTION PROTOTYPE USING

INTEL® DISTRIBUTION OF OPENVINO™ TOOLKIT

IMPLEMENTATION GUIDE ‐ VERSION 0.1


APRIL 2019

Document Release Note

Project: Human Action Detection Proof of Concept (POC) Setup document for Avistrack

Document Details

Author                  Version Number   Description
Ramakrishna Mudavathi   0.1              Human Action Detection Proof of Concept (POC) Setup document for Avistrack

Validated By            Date
Sai Srujana, KotraX     17-Apr-19


Contents

1. Introduction ............................................................................................................................................ 1

2. System Requirements............................................................................................................................ 2

3. Implementation ...................................................................................................................................... 3

3.1. Installing TensorFlow* ................................................................................................................... 3

3.1.1. Pre-requisites ........................................................................................................................ 3

3.1.2. Creating Virtual Environment ................................................................................................ 3

3.1.3. Installing Packages ............................................................................................................... 3

3.1.4. Verifying Installation .............................................................................................................. 3

3.2. Setting up TensorFlow* Object Detection API .............................................................................. 4

3.3. Downloading TensorFlow* Models................................................................................................ 4

3.4. Setting up Environment Variables ................................................................................................. 4

3.5. Preprocessing Dataset .................................................................................................................. 4

3.5.1. Annotating Objects ................................................................................................................ 5

3.5.2. Converting XML to CSV Format............................................................................................ 5

3.5.3. Converting CSV Files to TensorFlow* Format ...................................................................... 6

3.6. Training the Model ........................................................................................................................ 7

3.6.1. Configuring the Model ........................................................................................................... 7

3.6.2. Creating Label File ................................................................................................................ 8

3.6.3. Training the Model................................................................................................................. 8

3.6.4. Freezing the Model ............................................................................................................... 9

3.7. Model Optimization ....................................................................................................................... 9

4. Testing ................................................................................................................................................. 10

5. Why TensorFlow*?............................................................................................................................... 11

6. Why TensorFlow* SSD_Inception_v2 model? ..................................................................................... 12

7. Human Activity Counting...................................................................................................................... 13



List of Abbreviations

Abbreviation Expanded Form

API Application Program Interface

CNN Convolutional Neural Network

Faster R-CNN Faster Region-based Convolutional Neural Network

FPGA Field Programmable Gate Array

IR Intermediate Representation

OpenVINO Open Visual Inferencing and Neural Network Optimization

OpenCL Open Computing Language

OpenCV Open Source Computer Vision Library

ROI Region of Interest

POC Proof of Concept


1. Introduction

There is a need to design, develop, and test a human action detection prototype to demonstrate the
feasibility of using the Intel® Distribution of OpenVINO™ toolkit for detecting human actions such as
push-ups and sit-ups with good accuracy and performance. This information also needs to be transferred
to an ISV (Avistrack*).

The Intel® Distribution of OpenVINO™ toolkit is a comprehensive toolkit for quickly developing
applications and solutions that emulate human vision. Based on CNNs, the toolkit extends computer
vision workloads across Intel® hardware, maximizing performance.

The Intel® Distribution of OpenVINO™ toolkit:

• Enables CNN-based deep learning inference on the edge.

• Supports heterogeneous execution across computer vision accelerators from Intel, using a
common API for the CPU, Intel® Movidius™ Neural Compute Stick, and FPGA. However, the
Intel® Movidius™ NCS and FPGA are not covered in this guide.

• Speeds time-to-market through an easy-to-use library of computer vision functions and
pre-optimized kernels.

• Includes optimized calls for computer vision standards, including OpenCV, OpenCL™, and
OpenVX*.


2. System Requirements

Hardware Requirements:

• Processor: x86_64 (the only supported architecture)
• Memory: At least 2 GB RAM
• Hard Disk: 32 GB HDD
• Network: Network adapter with an active internet connection
• Camera: Logitech* USB camera

Software Requirements:

• Operating System: Ubuntu* 16.04 LTS 64-bit
• Git
• Intel® Distribution of OpenVINO™ toolkit


3. Implementation

This section describes how to develop a human action detection model using TensorFlow*. The following
steps are involved:

• Installing the TensorFlow deep learning framework
• Setting up the TensorFlow Object Detection API
• Training a custom object detector using the Object Detection API
• Converting the trained model into IR format using the Intel® Distribution of OpenVINO™ toolkit
Model Optimizer

After completing these steps, you will be able to detect human actions such as sit-ups and push-ups using
the Intel® Distribution of OpenVINO™ toolkit.

3.1. Installing TensorFlow*


Install all the dependencies for TensorFlow CPU support.

3.1.1. Pre-requisites
$ sudo apt-get update
$ sudo apt install python3-dev python3-pip # install python3
$ sudo pip3 install -U virtualenv # system-wide install

3.1.2. Creating Virtual Environment


$ virtualenv --system-site-packages -p python3 ./venv
$ source ./venv/bin/activate # sh, bash, ksh, or zsh
When virtualenv is active, your shell prompt is prefixed with (venv).

3.1.3. Installing Packages


Install packages within a virtual environment without affecting the host system setup.
Start by upgrading pip:
(venv)$ pip install --upgrade pip
(venv)$ pip list # show packages installed within the virtual environment
(venv)$ pip install tensorflow==1.9

3.1.4. Verifying Installation


To verify installation, check the TensorFlow version installed:
(venv)$ python -c "import tensorflow as tf; print(tf.__version__)"
Upon successful installation, the installed TensorFlow version is displayed.


3.2. Setting up TensorFlow* Object Detection API


After successful installation of TensorFlow, install the dependencies for the TensorFlow Object Detection
API using the following:
(venv)$ pip install pillow
(venv)$ pip install lxml
(venv)$ pip install jupyter
(venv)$ pip install matplotlib

3.3. Downloading TensorFlow* Models


The next step is to clone or download the required TensorFlow models repository using the following command:
$ git clone https://github.com/tensorflow/models
Note: If the program ‘git’ is not installed, install it by typing:
$ sudo apt install git

3.4. Setting up Environment Variables


Follow the steps provided here to set up the environment variables to be used in later steps.
While installing Protoc, use the following commands:
(venv)$ cd <Protoc downloaded directory>
(venv)$ ./configure
(venv)$ make
(venv)$ make check
(venv)$ sudo make install
(venv)$ sudo ldconfig # Refresh shared library cache

You can follow this Link for detailed installation instructions.

3.5. Preprocessing Dataset


The required dataset consists of three classes of human actions: push-ups, sit-ups, and proper sit-ups
(sit-ups done with the hands kept behind the head). For this, we are using the HMDB51 and UCF101
datasets. Both datasets are in video format (.avi). Download the HMDB51 dataset from this Link.
Similarly, the UCF101 dataset can be downloaded from this Link. If you download the video files, they
must be converted into frames; alternatively, you can directly download the image datasets. For this
POC, we are using only three human actions, namely push-ups, sit-ups, and proper sit-ups.
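As one possible approach, the frames can be extracted from the .avi clips with OpenCV. The following is a minimal sketch; the input path, output directory, and sampling interval are illustrative assumptions, not part of the original datasets:

# Minimal sketch: extract frames from an .avi clip using OpenCV.
# The input path, output directory, and frame step are illustrative assumptions.
import os
import cv2

def extract_frames(video_path, out_dir, every_n_frames=5):
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    index, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:                       # end of the video stream
            break
        if index % every_n_frames == 0:  # keep only every Nth frame
            name = os.path.join(out_dir, 'frame_{:05d}.jpg'.format(saved))
            cv2.imwrite(name, frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# Example usage: extract_frames('pushup_001.avi', 'images/pushups')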


3.5.1. Annotating Objects

After we have the required dataset, the next step is to annotate the objects in the images. For object
detection, we need to annotate the objects in the images manually. This can be done using a Linux* tool
called LabelImg. This open source tool can be downloaded from here. Installation instructions are in the
link provided. For detailed usage of LabelImg for this POC, you can follow this Link.

After all the images are manually annotated, a corresponding XML file is generated for each image in the
directory specified during annotation.

3.5.2. Converting XML to CSV Format


Divide the entire dataset into two parts: 90% of the data will be used for training the model and 10% will
be used for testing. This split is required for validating the model during training.

Now create a directory named Object-Detection and create the respective subdirectories in it as shown below:

Object-Detection
-data/
--test_labels.csv
--train_labels.csv
-images/
--test/
---testingimages.jpg
--train/
---trainingimages.jpg
--...yourimages.jpg
-training
-xml_to_csv.py

Keep all the preprocessed images and their corresponding .xml files in the images directory, separating
them into test and train.
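The 90/10 split described above can be scripted. The following is a minimal sketch that assumes the annotated .jpg images and their .xml files start out directly under the images directory; the paths and split ratio are illustrative:

# Minimal sketch: split annotated images and their XML files 90/10 into
# images/train and images/test. Paths and the split ratio are illustrative.
import glob
import os
import random
import shutil

random.seed(42)                                   # reproducible split
images = glob.glob('images/*.jpg')
random.shuffle(images)
split = int(0.9 * len(images))

for subset, files in (('train', images[:split]), ('test', images[split:])):
    out_dir = os.path.join('images', subset)
    os.makedirs(out_dir, exist_ok=True)
    for img in files:
        xml = os.path.splitext(img)[0] + '.xml'   # matching annotation file
        shutil.move(img, out_dir)
        if os.path.exists(xml):
            shutil.move(xml, out_dir)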

Convert all the XML files generated during annotation to CSV format using this script.

Within the xml_to_csv.py script, make the following changes:

def main():
    image_path = os.path.join(os.getcwd(), 'annotations')
    xml_df = xml_to_csv(image_path)
    xml_df.to_csv('raccoon_labels.csv', index=None)
    print('Successfully converted xml to csv.')


To:

def main():
    for directory in ['train', 'test']:
        image_path = os.path.join(os.getcwd(), 'images/{}'.format(directory))
        xml_df = xml_to_csv(image_path)
        xml_df.to_csv('data/{}_labels.csv'.format(directory), index=None)
        print('Successfully converted xml to csv.')

Now run the xml_to_csv.py script to generate train_labels.csv and test_labels.csv in the data folder.
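For reference, the following is a minimal sketch of the xml_to_csv() helper that the main() function above calls. It assumes Pascal VOC-style XML files as produced by LabelImg; the actual script referenced above may differ in detail:

# Minimal sketch of the xml_to_csv() helper, assuming Pascal VOC-style
# annotation files as produced by LabelImg.
import glob
import os
import xml.etree.ElementTree as ET
import pandas as pd

def xml_to_csv(path):
    rows = []
    for xml_file in glob.glob(os.path.join(path, '*.xml')):
        root = ET.parse(xml_file).getroot()
        width = int(root.find('size/width').text)
        height = int(root.find('size/height').text)
        for member in root.findall('object'):
            rows.append((root.find('filename').text,
                         width,
                         height,
                         member.find('name').text,
                         int(member.find('bndbox/xmin').text),
                         int(member.find('bndbox/ymin').text),
                         int(member.find('bndbox/xmax').text),
                         int(member.find('bndbox/ymax').text)))
    columns = ['filename', 'width', 'height', 'class',
               'xmin', 'ymin', 'xmax', 'ymax']
    return pd.DataFrame(rows, columns=columns)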

3.5.3. Converting CSV Files to TensorFlow* Format


Next, convert the CSV files into the TFRecord format that is understood by the network. This conversion is
done using this script.
The only modification that you need to make in the script is in the class_text_to_int() function. You
need to change it to match your specific classes. In our case, we have just three classes. If you had more
classes, you would need to keep extending this if statement as follows:

# TO-DO replace this with label map
def class_text_to_int(row_label):
    if row_label == 'person doing pushups':
        return 1
    elif row_label == 'person doing situps':
        return 2
    elif row_label == 'person doing proper situps':
        return 3
    else:
        return 0

Run the generate_tfrecord.py script. We will run it twice, once for the train TFRecord and once for the test
TFRecord.

(venv)$ cd <Object-Detection_dir>
(venv) <Object-Detection_dir>$ python3 generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record --image_dir=images/
(venv) <Object-Detection_dir>$ python3 generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record --image_dir=images/

Now, in your data directory, you should have train.record and test.record.


3.6. Training the Model


To train the model, download the TensorFlow Object Detection API from this link. Set the environment
variables such as PYTHONPATH and the protoc path using the following commands:

(venv)$ cd models/research/
(venv) models/research$ protoc object_detection/protos/*.proto --python_out=.
(venv) models/research$ export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
If you get any errors while executing the protoc command, then protoc is not installed properly. Follow
section 3.4 for proper installation.

Navigate to the object_detection directory in the repository that you downloaded.

(venv) models/research$ cd object_detection


For training the dataset, you can choose to use a pre-trained model or develop your own custom model.
For simplicity, a pre-trained model is used here. Various pre-trained model checkpoints can be
downloaded from this GitHub link. For this POC, we are using SSD_Inception_v2_COCO. It can be
downloaded from this Link, and the required configuration file can be downloaded from here.

Copy the data, training, and images folders and the SSD_Inception_v2_COCO pre-trained model to the
object_detection directory in the downloaded repository.

3.6.1. Configuring the Model


To configure the model, change the configuration file as follows:

1. num_classes: 3
2. fine_tune_checkpoint: "ssd_inception_v2_coco_2018_01_28/model.ckpt"

Also change the following as shown:

train_input_reader: {
  tf_record_input_reader {
    input_path: "data/train.record"
  }
  label_map_path: "data/object-detection.pbtxt"
}

eval_config: {
  num_examples: 40  # change with respect to the number of test images
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "data/test.record"
  }
  label_map_path: "data/object-detection.pbtxt"
  shuffle: false
  num_readers: 1
}

Copy this configuration file to the models/research/object_detection/training directory.

To learn more about how to configure your model, see this tutorial.

3.6.2. Creating Label File


Create the label_map file that contains the names of the three classes and their corresponding item id
values. Name this file "object-detection.pbtxt".

Each class entry in the label file is made as follows:

item {
  id: 1
  name: 'person doing pushups'
}

item {
  id: 2
  name: 'person doing situps'
}

item {
  id: 3
  name: 'person doing proper situps'
}

NOTE: Similar entries can be made for other classes if any.

Copy this file, “object-detection.pbtxt” to models/research/object_detection/data.

3.6.3. Training the Model


Copy the train.py file from models/research/object_detection/legacy to models/research/object_detection/

Execute the following command for training the model:

(venv)$ cd <obj_detect_api_dir>/models/research/object_detection

(venv) <obj_detect_api_dir>/models/research/object_detection$ python3 train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_inception_v2_coco.config

In case of any errors, make sure that the environment variables are properly set. For example, protoc must
be installed and slim must be added to the PYTHONPATH.


On successful training, checkpoint files are saved in the training folder. Continue training until the loss
falls below 1, which takes approximately 5,000 steps (5 to 6 hours). Once the loss is below 1, stop training
the model.
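To monitor the loss while training is in progress, you can optionally launch TensorBoard (installed along with TensorFlow) and point it at the training directory. The directory name assumes the training/ folder used above, and the port is the TensorBoard default:

(venv) <obj_detect_api_dir>/models/research/object_detection$ tensorboard --logdir=training/

Then open http://localhost:6006 in a browser to watch the loss curve.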

3.6.4. Freezing the Model


Use the following command to generate the frozen model for the custom object detector (replace 56129
with the step number of your latest checkpoint):

(venv) <obj_detect_api_dir>/models/research/object_detection$ python3 export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path training/ssd_inception_v2_coco.config \
    --trained_checkpoint_prefix training/model.ckpt-56129 \
    --output_directory ssd_inception_v2_inference_graph

3.7. Model Optimization


Model optimization is done using the Model Optimizer of the Intel® Distribution of OpenVINO™ toolkit.
To set up the toolkit on Ubuntu, go here.

Once the model is trained and frozen, a directory named "ssd_inception_v2_inference_graph" is created
containing frozen_inference_graph.pb, checkpoint, and pipeline config files. These files are the input to the
Intel® Distribution of OpenVINO™ toolkit Model Optimizer, which generates the Intermediate
Representation (IR) format (.xml and .bin).

The TensorFlow model is converted into IR format with the following command:

$ sudo python3 <INSTALL_DIR>/deployment_tools/model_optimizer/mo_tf.py \
    --input_model=<frozen_graph_location_directory>/ssd_inception_v2_inference_graph/frozen_inference_graph.pb \
    --tensorflow_use_custom_operations_config <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json \
    --tensorflow_object_detection_api_pipeline_config <frozen_graph_location_directory>/ssd_inception_v2_inference_graph/pipeline.config \
    --reverse_input_channels

After executing the above command successfully, the frozen_inference_graph.xml, frozen_inference_graph.bin,
and frozen_inference_graph.mapping files are generated in the Model Optimizer output directory. These files
will be used for testing the model.

To understand the details of the TensorFlow model conversion mechanism, refer here.


4. Testing

To test the developed human action detection model, the Intel® Distribution of OpenVINO™ toolkit
Inference Engine is used. The toolkit's prebuilt sample application "object_detection_demo_ssd_async"
is used to test the optimized model.

Execute the following commands to run the inference:

$ cd <path to object_detection_demo_ssd_async executable>

$ ./object_detection_demo_ssd_async -i cam -m <path_to_trained_model>/frozen_inference_graph.xml --labels frozen_inference_graph.labels

After executing the command successfully, the live camera feed is displayed. Point the camera at a
person doing sit-ups or push-ups. A rectangular box is drawn around the detected action, and the
detection confidence value is displayed on top of the box.

If you want to display the label name, create a file named frozen_inference_graph.labels in the directory
where the object_detection_demo_ssd_async executable is located, and list the class names used for
training in the correct order.

Note: The listing of class names should start from the second line.
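For example, a frozen_inference_graph.labels file matching the three classes used in this POC could look like the following; the first line here is a placeholder (commonly "background") so that the class names begin on the second line, as noted above:

background
person doing pushups
person doing situps
person doing proper situps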

Figure 1 - Output Screen Showing the Detected Human Actions


5. Why TensorFlow*?

There are many deep learning frameworks available, such as Caffe*, Apache MXNet*, Darknet, and so on.
Because of the wide availability of online support and of many pre-trained models, the TensorFlow
framework is preferred here. In addition, the Intel® Distribution of OpenVINO™ toolkit currently supports
TensorFlow, Caffe, and MXNet.

Because the TensorFlow Object Detection API is readily available and well suited to object detection
applications, it is preferred for developing applications quickly.


6. Why TensorFlow* SSD_Inception_v2 model?

There are many pre-trained TensorFlow models available for object detection. The SSD_Inception_v2
model has been used here for two reasons: training speed and training accuracy.

The following table lists the available pre-trained object detection models and their corresponding speed
and accuracy values.

Model name                                    Speed (ms)   Accuracy (mAP)
ssd_mobilenet_v1_coco                         30           21
ssd_mobilenet_v1_0.75_depth_coco              26           18
ssd_mobilenet_v1_quantized_coco               29           18
ssd_mobilenet_v1_0.75_depth_quantized_coco    29           16
ssd_mobilenet_v1_ppn_coco                     26           20
ssd_mobilenet_v1_fpn_coco                     56           32
ssd_resnet_50_fpn_coco                        76           35
ssd_mobilenet_v2_coco                         31           22
ssdlite_mobilenet_v2_coco                     27           22
ssd_inception_v2_coco                         42           24
faster_rcnn_inception_v2_coco                 58           28
faster_rcnn_resnet50_coco                     89           30
rfcn_resnet101_coco                           92           30
faster_rcnn_resnet101_coco                    106          32
mask_rcnn_inception_v2_coco                   79           79

Table 1 - TensorFlow* Pre-trained Model Performance Metrics

If you need faster inferencing at the cost of lower accuracy, select the MobileNet models. If you need
higher accuracy at the cost of slower inferencing, select the Inception models. For our object detection,
we selected the SSD_Inception_v2 model, which balances speed and accuracy. In addition, it integrates
easily with the Intel® Distribution of OpenVINO™ toolkit.


7. Human Activity Counting

To count the number of actions performed by a person, such as the number of sit-ups or push-ups, a
concept called Region of Interest (ROI) is used. While inferencing with the Python object detection sample,
a rectangular box is drawn around the detected action. You then need to find the center of that rectangle
and draw a horizontal or vertical line across the output window, called the ROI line. Each time the center
of the detection rectangle crosses the ROI line twice, you count one repetition (REP), in fitness
terminology. You need to implement this logic in the Python sample object_detection_demo_async.py in
the Intel® Distribution of OpenVINO™ toolkit.
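The following is a minimal sketch of this counting logic, assuming a horizontal ROI line. The class and variable names are illustrative, and the detection box coordinates (xmin, ymin, xmax, ymax) are assumed to come from the demo's existing detection loop:

# Minimal sketch of ROI-line repetition counting (illustrative names only).
# roi_y is the vertical position of a horizontal ROI line drawn on the frame.
class RepCounter:
    def __init__(self, roi_y):
        self.roi_y = roi_y
        self.prev_side = None   # which side of the ROI line the box center was on
        self.crossings = 0
        self.reps = 0

    def update(self, xmin, ymin, xmax, ymax):
        center_y = (ymin + ymax) / 2.0
        side = 'above' if center_y < self.roi_y else 'below'
        if self.prev_side is not None and side != self.prev_side:
            self.crossings += 1          # the center crossed the ROI line
            if self.crossings % 2 == 0:  # two crossings = one repetition
                self.reps += 1
        self.prev_side = side
        return self.reps

# Inside the demo's frame loop (illustrative):
#   reps = counter.update(xmin, ymin, xmax, ymax)
#   cv2.line(frame, (0, counter.roi_y), (frame_width, counter.roi_y), (0, 255, 0), 2)
#   cv2.putText(frame, 'REPS: {}'.format(reps), (15, 30),
#               cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)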

Figure 3: Output screen showing the detected human actions and action count

In the output screen, you can see the detected action with its accuracy and the corresponding action count.

