
State Budgetary Educational Institution
Engineering School #1581

Project work

Development and creation of a system for the recognition, tracking and counting of objects for Smart City systems, as well as its integration with various services

Author: Mikhail Mikhailovich Kuznetsov

Supervisor: Elena Aleksandrovna Olkhovskaya,


teacher of physics at the State Educational
Institution "Engineering School №1581"

Moscow - 2019
Table of contents

Goal
Tasks
Relevance
Novelty and Significance
Main part
Methodology
Search for analogues
Finding ways to solve the assigned tasks
Library structure
Explanation of the code responsible for tracking
Explanation of the code responsible for object recognition
Installation
Performance
Results
Development Prospects
Sources cited
Appendix
Goal
Create an object recognition and counting system for Smart City systems.

Tasks
1. Study information about similar projects;
2. Find algorithms for identifying, tracking, and counting objects;
3. Develop a system for tracking and counting various objects based on these algorithms.

Relevance
Internet of Things" technologies, in particular "Smart City" are increasingly
being used in various spheres of activity. To ensure the correct operation of
such systems, services are needed to collect information and monitor the
state of the environment of this "Smart City. In particular, one of the main
tasks in this case is to obtain data on the number and type of objects under
study, as well as their movement and movement.

Novelty and Significance


At the moment, there are no unified systems for the recognition, tracking and counting of a large number of different types of objects. Ideas of combining these functions into a single system have so far been implemented only in the framework of scientific research. As for applied solutions, there are only highly specialized services, for example, for attendance accounting; such services are provided by the company V-Count[5]. This approach has two disadvantages: it is applicable only to people, i.e. the range of recognized objects is extremely narrow, and the services are expensive. My solution, on the other hand, is completely open, and the neural network used by the program can be further trained and retrained, i.e. my solution is adaptable to different types of tasks.

Main part
Methodology
In order to find the thematic information necessary to implement the project, various sources devoted to computer vision technologies were studied, and the existing analogues and their implementations were investigated. After analyzing the information received, it was decided that the system itself would be written in Python (as there is a large amount of documentation and useful examples for this language[4]) in the form of a library. The OpenCV[1] computer vision library was used as the basis because of its openness, extensive functionality and large amount of documentation.
Search for analogues
When studying the projects and systems aimed at solving the tasks set, or similar ones, two main drawbacks were discovered: the narrow specialization of each of the presented systems, and the impossibility of integrating them with other services due to their closed nature. To eliminate these problems, it was decided that the library would be open source and hosted in a public repository, which would provide easy access and ease of use when integrating it with different services.
Finding ways to solve the assigned tasks
All of the analogs studied used computer vision technology, so it was
logical to use this approach as well.
The first recognition method tried was the detection of objects in the foreground (Fig. 1 in the appendix). It turned out that this method is very sensitive to changes in the camera position, as well as to changes in the light level: a change in the shade of the background made the program distinguish foreground objects noticeably worse.
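For illustration, a minimal sketch of this first approach is shown below. It is not the library code; it assumes OpenCV's standard MOG2 background subtractor and illustrative parameter values and file names.

import cv2

# Sketch: extract moving (foreground) objects with a background subtractor
cap = cv2.VideoCapture("input.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    mask = subtractor.apply(frame)   # white pixels correspond to the foreground
    cv2.imshow("Foreground mask", mask)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()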
Another solution was to use neural networks. This method combines two
important advantages: robustness to noise in the input data (i.e., the
program is less likely to recognize foreign objects), and adaptability to
changes in the environment (i.e., the quality of the program will not be too
different when the environment changes).
Thus, the next task was the selection of a neural network algorithm. OpenCV has the dnn[6] (Deep Neural Networks) module for deep neural networks. It supports several frameworks, such as TensorFlow, PyTorch and others. For the initial development it was necessary to find a trained model with a large list of recognizable objects that could also work in real time, processing incoming information in a timely manner. Such a model was found; it is distributed in the Caffe[2] format (Fig. 2 in the appendix).
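As a rough illustration of how such a model can be checked for real-time suitability with the dnn module, the sketch below loads the Caffe files and times one forward pass. The file and image names are assumptions (they match the MobileNetSSD files listed later in the library structure), and this snippet is not part of the library itself.

import time
import cv2

# Sketch: load a Caffe model with the dnn module and time a single forward pass
net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")
image = cv2.imread("test.jpg")
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)),
                             0.007843, (300, 300), 127.5)
net.setInput(blob)
start = time.time()
detections = net.forward()
print("Inference took {:.3f} s".format(time.time() - start))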
Library structure
The full source code of the library is hosted in the GitHub repository[3].
The library consists of setup.py, which regulates the installation of the library, and the rct package folder. The code of setup.py is shown below:

from setuptools import setup

setup(
    name='rct',
    version='1.0',
    packages=['rct']
)

My library is called rct, and the version described in this paper is the first one. It is the only package installed when running this script (see more in "Installation").
The rct folder contains five files: object.py, recog.py, __init__.py, MobileNetSSD_deploy.caffemodel and MobileNetSSD_deploy.prototxt.
__init__.py simply marks the directory as a Python package, and the last two files contain the neural network model (Fig. 3, 4 in the appendix).
Explanation of the code responsible for tracking
The object.py file contains code for tracking recognized objects. Below is
the code of the CentroidTracker class:
from scipy.spatial import distance as dist
from collections import OrderedDict
import numpy as np


class CentroidTracker:
    def __init__(self, maxDisappeared=50, maxDistance=50):
        self.nextObjectID = 0
        self.objects = OrderedDict()       # objectID -> centroid
        self.disappeared = OrderedDict()   # objectID -> frames missing
        self.maxDisappeared = maxDisappeared
        self.maxDistance = maxDistance

    def register(self, centroid):
        # assign the next free ID to a newly detected object
        self.objects[self.nextObjectID] = centroid
        self.disappeared[self.nextObjectID] = 0
        self.nextObjectID += 1

    def deregister(self, objectID):
        # remove an object that has disappeared for too long
        del self.objects[objectID]
        del self.disappeared[objectID]

    def update(self, rects):
        # no detections: mark every tracked object as disappeared
        if len(rects) == 0:
            for objectID in list(self.disappeared.keys()):
                self.disappeared[objectID] += 1
                if self.disappeared[objectID] > self.maxDisappeared:
                    self.deregister(objectID)
            return self.objects
        # compute the centers of the incoming bounding rectangles
        inputCentroids = np.zeros((len(rects), 2), dtype="int")
        for (i, (startX, startY, endX, endY)) in enumerate(rects):
            cX = int((startX + endX) / 2.0)
            cY = int((startY + endY) / 2.0)
            inputCentroids[i] = (cX, cY)
        if len(self.objects) == 0:
            for i in range(0, len(inputCentroids)):
                self.register(inputCentroids[i])
        else:
            objectIDs = list(self.objects.keys())
            objectCentroids = list(self.objects.values())
            # distance matrix between existing and new centroids
            D = dist.cdist(np.array(objectCentroids), inputCentroids)
            rows = D.min(axis=1).argsort()
            cols = D.argmin(axis=1)[rows]
            usedRows = set()
            usedCols = set()
            for (row, col) in zip(rows, cols):
                if row in usedRows or col in usedCols:
                    continue
                if D[row, col] > self.maxDistance:
                    continue
                # associate the closest new centroid with the existing object
                objectID = objectIDs[row]
                self.objects[objectID] = inputCentroids[col]
                self.disappeared[objectID] = 0
                usedRows.add(row)
                usedCols.add(col)
            unusedRows = set(range(0, D.shape[0])).difference(usedRows)
            unusedCols = set(range(0, D.shape[1])).difference(usedCols)
            if D.shape[0] >= D.shape[1]:
                for row in unusedRows:
                    objectID = objectIDs[row]
                    self.disappeared[objectID] += 1
                    if self.disappeared[objectID] > self.maxDisappeared:
                        self.deregister(objectID)
            else:
                for col in unusedCols:
                    self.register(inputCentroids[col])
        return self.objects

The presented tracking algorithm can be broken down into sub-items:


1. Construct a rectangular area around each detected object and find the center of this area. We pass in a set of bounding rectangles with the x, y coordinates of each object detected in an individual frame. After obtaining the coordinates of the bounding rectangles, we calculate the coordinates of their centers (the central x, y coordinates of these rectangles).
2. Calculate the distances between the new bounding rectangles and the existing objects. For each subsequent frame in our video stream we apply step 1, calculating the centers of the objects; then we need to determine whether we can associate the new object centers with the old ones. To do this, we calculate the distance between each pair of existing object centers and input object centers.
3. Update x, y-coordinates of existing objects. The center tracking
algorithm assumes that a given object will potentially move between
subsequent frames, but the distance between centers for two adjacent
frames will be less than all other distances between objects. Thus, if you
associate the centers with the minimum distance between subsequent
frames, you can build an object tracker.
4. Registering new objects. If the number of detected input objects exceeds the number of tracked objects, a new object must be registered. "Register" simply means that the new object is added to our list of tracked objects: a new object ID is assigned to it, and the coordinates of the center of its bounding rectangle begin to be stored. We can then go back to step 2 and repeat the sequence of steps for each frame in our video stream.
5. Deleting old objects. An old object is deleted if it cannot be matched with any of the input objects for N > 0 consecutive frames (more than maxDisappeared frames in a row). A minimal usage sketch of the tracker is given below.
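The following sketch illustrates how the tracker is driven: the bounding rectangles are invented here for illustration, while in the library they come from the neural network detector. The import path assumes the installed rct package.

from rct.object import CentroidTracker

ct = CentroidTracker(maxDisappeared=50, maxDistance=50)
# bounding rectangles (startX, startY, endX, endY) for one frame (made up here)
rects = [(10, 10, 50, 50), (120, 40, 180, 110)]
objects = ct.update(rects)          # dictionary {objectID: centroid}
for objectID, centroid in objects.items():
    print(objectID, centroid)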
Next in the file is the TrackableObject class; its code is given below:

class TrackableObject:
    def __init__(self, objectID, centroid):
        self.objectID = objectID
        self.centroids = [centroid]
        self.counted = False

In order to track and count an object in the video stream, you need a way to store information about the object itself, including the identifier of this object, the positions of its previous centers (thanks to this you can trace the direction of movement), and whether the object has already been counted. The class constructor takes the object's identifier and center and stores them. The centroids attribute is a list, because it contains the history of the object's location in the form of its centers.
The counted attribute is initialized as False, indicating that the object has not yet been counted.
Explanation of the code responsible for object recognition
The recog.py file contains code for recognizing and counting objects. It defines the Recognition class, which contains various functions for analyzing and processing data (images and video). The source code of the file is given below:
import cv2
import numpy as np
import os
import time
import dlib
import imutils
from imutils.video import VideoStream
from imutils.video import FPS
# CentroidTracker and TrackableObject are defined in object.py of the package
from .object import CentroidTracker, TrackableObject


class Recognition:
    def __init__(self, caffe="MobileNetSSD_deploy.caffemodel",
                 prototxt="MobileNetSSD_deploy.prototxt",
                 CLASSES=["background", "aeroplane", "bicycle", "bird",
                          "boat", "bottle", "bus", "car", "cat", "chair",
                          "cow", "diningtable", "dog", "horse", "motorbike",
                          "person", "pottedplant", "sheep", "sofa", "train",
                          "tvmonitor"]):
        self.caffe = caffe
        self.prototxt = prototxt
        self.CLASSES = CLASSES

    def recog_im_dir(self, path_to_images, list_of_ignored):
        self.path_to_images = path_to_images
        self.list_of_ignored = list_of_ignored
        conf = 0.4
        IGNORE = set(list_of_ignored)
        # one colour per class so that indexing by the class id is always valid
        COLORS = np.random.uniform(0, 255, size=(len(self.CLASSES), 3))
        # model definition (.prototxt) and trained weights (.caffemodel)
        net = cv2.dnn.readNetFromCaffe(self.prototxt, self.caffe)
        for j in os.listdir(self.path_to_images):
            image = cv2.imread(os.path.join(self.path_to_images, j))
            (h, w) = image.shape[:2]
            blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)),
                                         0.007843, (300, 300), 127.5)
            net.setInput(blob)
            detections = net.forward()
            for i in np.arange(0, detections.shape[2]):
                confidence = detections[0, 0, i, 2]
                if confidence > conf:
                    idx = int(detections[0, 0, i, 1])
                    if self.CLASSES[idx] in IGNORE:
                        continue
                    box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
                    (startX, startY, endX, endY) = box.astype("int")
                    label = "{}: {:.2f}%".format(self.CLASSES[idx],
                                                 confidence * 100)
                    cv2.rectangle(image, (startX, startY), (endX, endY),
                                  COLORS[idx], 2)
                    y = startY - 15 if startY - 15 > 15 else startY + 15
                    cv2.putText(image, label, (startX, y),
                                cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)
            cv2.imwrite('rec_' + j, image)

    def count_single_object_from_video(self, path_to_video, path_to_out, obj):
        self.path_to_video = path_to_video
        self.path_to_out = path_to_out
        conf = 0.4
        sk_fr = 30  # run the detector every sk_fr frames, track in between
        net = cv2.dnn.readNetFromCaffe(self.prototxt, self.caffe)
        vs = cv2.VideoCapture(self.path_to_video)
        writer = None
        W = None
        H = None
        ct = CentroidTracker(maxDisappeared=40, maxDistance=50)
        trackers = []
        trackableObjects = {}
        totalFrames = 0
        total = 0
        fps = FPS().start()
        while True:
            frame = vs.read()
            frame = frame[1]
            if frame is None:
                # end of the video file or the stream was interrupted
                break
            frame = imutils.resize(frame, width=500)
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            if W is None or H is None:
                (H, W) = frame.shape[:2]
                fourcc = cv2.VideoWriter_fourcc(*"MJPG")
                writer = cv2.VideoWriter(self.path_to_out, fourcc, 30,
                                         (W, H), True)
            status = "Waiting"
            rects = []
            if totalFrames % sk_fr == 0:
                # detection phase: run the neural network
                status = "Detecting"
                trackers = []
                blob = cv2.dnn.blobFromImage(frame, 0.007843, (W, H), 127.5)
                net.setInput(blob)
                detections = net.forward()
                for i in np.arange(0, detections.shape[2]):
                    confidence = detections[0, 0, i, 2]
                    if confidence > conf:
                        idx = int(detections[0, 0, i, 1])
                        if self.CLASSES[idx] != obj:
                            continue
                        box = detections[0, 0, i, 3:7] * np.array([W, H, W, H])
                        (startX, startY, endX, endY) = box.astype("int")
                        tracker = dlib.correlation_tracker()
                        rect = dlib.rectangle(startX, startY, endX, endY)
                        tracker.start_track(rgb, rect)
                        trackers.append(tracker)
            else:
                # tracking phase: update the correlation trackers
                for tracker in trackers:
                    status = "Tracking"
                    tracker.update(rgb)
                    pos = tracker.get_position()
                    startX = int(pos.left())
                    startY = int(pos.top())
                    endX = int(pos.right())
                    endY = int(pos.bottom())
                    rects.append((startX, startY, endX, endY))
            objects = ct.update(rects)
            for (objectID, centroid) in objects.items():
                to = trackableObjects.get(objectID, None)
                if to is None:
                    to = TrackableObject(objectID, centroid)
                else:
                    y = [c[1] for c in to.centroids]
                    direction = centroid[1] - np.mean(y)
                    to.centroids.append(centroid)
                    if not to.counted:
                        # count the object once when it crosses the middle line
                        if direction < 0 and centroid[1] < H // 2:
                            total += 1
                            to.counted = True
                        elif direction > 0 and centroid[1] > H // 2:
                            total += 1
                            to.counted = True
                trackableObjects[objectID] = to
                text = "ID {}".format(objectID)
                cv2.putText(frame, text, (centroid[0] - 10, centroid[1] - 10),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)
                cv2.circle(frame, (centroid[0], centroid[1]), 4,
                           (255, 255, 255), -1)
            info = [
                ("Total", total),
                ("Status", status),
            ]
            for (i, (k, v)) in enumerate(info):
                # draw the counters in the lower left corner of the frame
                text = "{}: {}".format(k, v)
                cv2.putText(frame, text, (10, H - ((i * 20) + 20)),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
            if writer is not None:
                writer.write(frame)
            cv2.imshow("Frame", frame)
            key = cv2.waitKey(1) & 0xFF
            # if the `q` key was pressed, break from the loop
            if key == ord("q"):
                break
            totalFrames += 1
            fps.update()
        fps.stop()
        print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
        print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))
        if writer is not None:
            writer.release()
        vs.release()
        cv2.destroyAllWindows()
        return total
The __init__ function is the class constructor. It sets the attributes self.caffe and self.prototxt (the paths to the trained model used for recognition) and self.CLASSES (the list of objects that the neural network is able to recognize).
The recog_im_dir function is used to recognize objects in images. It is used in the situation when there is no video recording, but there is a large set of related images (taken in one area or location), and it is necessary to quickly and automatically recognize what is shown in them and trace what happens. It takes two arguments: path_to_images, the path to the folder with the images, and list_of_ignored, the list of objects that should not be highlighted during recognition. Since complex analysis requires recognizing and highlighting a large number of objects, it is the list of ignored objects, rather than the list of desired ones, that is passed as an argument.
The count_single_object_from_video function is used to count the number of specific objects in a video stream. It can be applied both to already recorded video and to real-time video streams. It returns the number of counted objects and also allows you to visualize the recognition, counting and tracking process itself.
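Assuming the library is installed and the model files are available in the working directory, a typical call might look as follows. The file names and the rct.recog import path are illustrative assumptions, not fixed by the library.

from rct.recog import Recognition

r = Recognition()
# recognize everything except the ignored classes in a folder of images
r.recog_im_dir("images/", ["chair", "pottedplant"])
# count people in a recorded video and write the annotated result
total = r.count_single_object_from_video("input.mp4", "output.avi", "person")
print("Counted objects:", total)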

Installation
First you need to install all the necessary libraries: OpenCV and its related packages (NumPy, SciPy and others). There is no need to install them one by one, because the third-party libraries are installed automatically when you enter the following command in the console:
pip install opencv-python opencv-contrib-python
To install the library itself, just execute the following commands in the console:
git clone https://github.com/mmkuznecov/rct.git
cd rct
python setup.py build
python setup.py install
After that the library will appear in the list of installed packages (Fig. 5, 6 in the appendix).
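As a suggested check (not part of the original installation steps), the installation can be verified with the following console commands:
pip show rct
python -c "import rct"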
Performance
The program for counting the number of people is able to process information coming from the computer's webcam in real time (camera resolution 1280×720, frame rate 10 FPS). When tested on a 20-second video 47.2 MB in size, the program processed it in 48 seconds. A frame from this video and its processed version are shown in Fig. 7. The program was tested on an Acer Aspire V3 laptop (Intel Core i7 processor, 8 GB of RAM).
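A rough sketch of how such a measurement could be reproduced is given below; the video file name and the import path are assumptions.

import time
from rct.recog import Recognition

r = Recognition()
start = time.time()
total = r.count_single_object_from_video("test_20s.mp4", "out.avi", "person")
print("Processed in {:.1f} s, counted {} objects".format(time.time() - start,
                                                         total))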

Results
A system for the recognition, tracking and counting of objects was written, and a face recognition system was also connected to it. The materials of this work can be useful in the development of various smart monitoring systems, security systems, traffic control systems and others. The library can also easily be connected to other systems and services. The results of the work are shown in Fig. 7, 8, 9.

Development Prospects
In the future it is planned to support the use of different types of neural networks depending on the task (YOLO[7] models, for example, are extremely accurate (Fig. 10), so it is advisable to use them in cases where the statistical error must be minimal). It is also planned to write a module that composes visual infographics (Fig. 11) directly from the information obtained during processing (for example, a graph showing the number of objects in each frame).
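A possible sketch of such an infographics module, assuming the per-frame counts have already been collected into a list, might look like this (the counts below are made up for illustration):

import matplotlib.pyplot as plt

# counts_per_frame would be gathered during video processing (assumption)
counts_per_frame = [3, 3, 4, 5, 5, 4, 6, 7, 6, 5]
plt.plot(range(len(counts_per_frame)), counts_per_frame)
plt.xlabel("Frame number")
plt.ylabel("Number of objects")
plt.title("Objects per frame")
plt.savefig("objects_per_frame.png")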

Sources cited
1. Official site of the OpenCV library with documentation [Electronic resource]. URL: https://opencv.org/
2. Official website of the Caffe framework with links to documentation [Electronic resource]. URL: http://caffe.berkeleyvision.org/
3. GitHub repository of the project [Electronic resource]. URL: https://github.com/mmkuznecov/rct
4. Learn OpenCV, a resource for learning the OpenCV library with examples [Electronic resource]. URL: https://www.learnopencv.com
5. Official website of the company V-Count with a description of their services [Electronic resource]. URL: https://v-count.com/
6. Documentation for the OpenCV dnn module from the library's official website [Electronic resource]. URL: https://docs.opencv.org/3.4/d2/d58/tutorial_table_of_content_dnn.html
7. YOLO project website [Electronic resource]. URL: https://pjreddie.com/darknet/yolo/
8. Link to the video folder [Electronic resource]. URL: https://drive.google.com/open?id=13dKnOy-PddQDxvHcQW08N4E85PqXNeWl

Appendix

Fig. 1 Using the foreground selection algorithm (the white color shows the
outlines of passing people)
Fig. 2 Example of image processing with the help of a neural network - different
objects are recognized with high accuracy.

Fig. 3,4 Library structure

Fig. 5,6 Library installation (screenshots of the terminal)


Fig. 7 Example of operation of the program for counting the number of people (left - an ordinary frame, right - a frame passed through the program; the original video and its processed version can be viewed via the link[8])

Fig. 8 Operation of the program for the purpose of counting passing cars
(clarification: the frames outlined around the cars are not the effect of the program,
they were present on the original video)
Fig. 9 Integration with facial recognition system

Fig. 10 Example of object recognition using YOLO

Fig. 11 Graph showing the number of people in each frame of the video, built
using the matplotlib library
