Report (English version)
Moscow - 2019
Table of contents
Goal
Tasks
Relevance
Novelty and Significance
Main part
Methodology
Search for analogues
Finding ways to solve assigned tasks
Library structure
Explanation of the code responsible for tracking
Explanation of the code responsible for object recognition
Installation
Performance
Results
Development Prospects
Sources cited
Appendix
Goal
Create an object recognition and counting system for Smart City systems.
Tasks
1. Study information about similar projects;
2. Find algorithms for identifying, tracking, and counting objects;
3. Develop a system for tracking and counting various objects based on
these algorithms.
Relevance
"Internet of Things" technologies, in particular "Smart City" systems, are
increasingly being used in various spheres of activity. To operate
correctly, such systems need services that collect information and monitor
the state of the "Smart City" environment. One of the main tasks here is to
obtain data on the number and type of the objects under study, as well as
on their movement.
Main part
Methodology
To find the information needed to implement the project, various sources
on computer vision technologies were studied, and existing analogues and
their implementations were investigated. After analyzing this information,
it was decided to write the system in Python (a large amount of
documentation and useful examples exists for this language[4]) in the form
of a library. The OpenCV[1] computer vision library was used to write it
because of its openness, extensive functionality and large amount of
documentation.
Search for analogues
A study of projects and systems that aimed to solve the same or similar
tasks revealed two main drawbacks: each system is narrowly specialized, and
none of them can be integrated with other services because they are closed.
To eliminate these problems, it was decided to make the library open source
and host it in a public repository, which provides easy access and
simplifies integration with different services.
Finding ways to solve assigned tasks
All of the analogs studied used computer vision technology, so it was
logical to use this approach as well.
The first recognition method tried was the detection of objects in the
foreground (Fig. 1 in the appendix). This method turned out to be very
sensitive to changes in the camera position and in the light level: either
one changes the shade of the background, so the program distinguishes
foreground objects less reliably.
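The sensitivity of foreground detection to lighting can be illustrated with a minimal sketch of background subtraction by thresholded frame differencing. This is not necessarily the algorithm used in the project (the source does not name it); the function names, the threshold of 30 grey levels and the tiny 3x4 "frames" are assumptions made purely for illustration.

```python
def foreground_mask(background, frame, threshold=30):
    """Mark a pixel as foreground (1) when it differs from the
    background model by more than `threshold` grey levels."""
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frow, brow)]
        for frow, brow in zip(frame, background)
    ]


def count_foreground(mask):
    """Number of pixels marked as foreground."""
    return sum(sum(row) for row in mask)


# Hypothetical 3x4 greyscale background (values 0..255).
background = [[100, 100, 100, 100],
              [100, 100, 100, 100],
              [100, 100, 100, 100]]

# The same scene with one bright "object" covering two pixels.
with_object = [row[:] for row in background]
with_object[1][1] = 200
with_object[1][2] = 200

# The same scene without any object, but 40 grey levels brighter.
brighter = [[min(255, p + 40) for p in row] for row in background]

object_pixels = count_foreground(foreground_mask(background, with_object))
lighting_pixels = count_foreground(foreground_mask(background, brighter))
print(object_pixels)    # 2
print(lighting_pixels)  # 12
```

In this toy case a uniform change in brightness marks every pixel as foreground, although nothing moved in the scene.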
Another solution was to use neural networks. This method combines two
important advantages: robustness to noise in the input data (i.e., the
program is less likely to recognize foreign objects) and adaptability to
changes in the environment (i.e., the quality of recognition does not
change much when the environment changes).
Thus, the next task was to select a neural network. OpenCV has a dnn[6]
(Deep Neural Networks) module for working with deep neural networks. It
supports several frameworks, such as TensorFlow, PyTorch and others. For
the initial development, a trained model was needed that recognizes a large
list of objects and is still fast enough to process incoming data in real
time. Such a model was found; it is in Caffe[2] format (Fig. 2 in the
appendix).
Library structure
The full source code of the library is located in the repository on
GitHub[3]. The repository contains setup.py, which controls the
installation of the library, and the rct package folder. The code of
setup.py is shown below:
from setuptools import setup

# Only the rct package is installed
setup(
    name='rct',
    version='1.0',
    packages=['rct'],
)
My library is called rct, and the version described in this paper is the
first one (1.0). rct is the only package installed when this script is run
(see "Installation" for details).
There are 5 files in the rct folder: object.py, recog.py, __init__.py,
MobileNetSSD_deploy.caffemodel and MobileNetSSD_deploy.prototxt.
__init__.py simply marks the directory as a Python package, and the last
two files define the neural network: its trained weights and its
architecture, respectively (Fig. 3,4 in the appendix).
Explanation of the code responsible for tracking
The object.py file contains code for tracking recognized objects. Below is
the code of the CentroidTracker class:
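from scipy.spatial import distance as dist
from collections import OrderedDict
import numpy as np


class CentroidTracker:
    def __init__(self, maxDisappeared=50, maxDistance=50):
        # ID that will be assigned to the next registered object
        self.nextObjectID = 0
        # object ID -> current centroid
        self.objects = OrderedDict()
        # object ID -> number of consecutive frames the object was missing
        self.disappeared = OrderedDict()
        # frames an object may be missing before it is dropped
        self.maxDisappeared = maxDisappeared
        # largest distance at which a detection is matched to an object
        self.maxDistance = maxDistance

    def register(self, centroid):
        # Start tracking a new object under the next free ID
        self.objects[self.nextObjectID] = centroid
        self.disappeared[self.nextObjectID] = 0
        self.nextObjectID += 1


class TrackableObject:
    def __init__(self, objectID, centroid):
        self.objectID = objectID
        # history of centroid positions, oldest first
        self.centroids = [centroid]
        # set to True once the object has been counted
        self.counted = False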
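The CentroidTracker listing above shows only the constructor and the registration step. How new detections might be associated with objects that are already tracked can be sketched as follows. This is an illustration under stated assumptions, not the author's update code: the function name match_centroids, the greedy closest-first strategy and the sample coordinates are all hypothetical. Only the role of the distance limit mirrors the maxDistance parameter.

```python
import math


def match_centroids(tracked, detections, max_distance=50):
    """Greedily pair tracked objects with new detections, closest first.

    tracked:    dict mapping object ID -> (x, y) centroid
    detections: list of (x, y) centroids found in the current frame
    Returns (matches, unmatched_ids, unmatched_detections), where
    matches maps object ID -> index into `detections`.
    """
    # All candidate pairs, sorted by distance.
    pairs = sorted(
        (math.dist(centroid, det), obj_id, j)
        for obj_id, centroid in tracked.items()
        for j, det in enumerate(detections)
    )
    matches = {}
    used = set()
    for distance, obj_id, j in pairs:
        if distance > max_distance:
            break  # every remaining pair is even farther apart
        if obj_id in matches or j in used:
            continue
        matches[obj_id] = j
        used.add(j)
    unmatched_ids = [i for i in tracked if i not in matches]
    unmatched_dets = [j for j in range(len(detections)) if j not in used]
    return matches, unmatched_ids, unmatched_dets


tracked = {0: (100, 100), 1: (300, 200)}
detections = [(305, 204), (104, 97), (600, 50)]
matches, lost, new = match_centroids(tracked, detections)
print(matches)  # {0: 1, 1: 0}
print(lost)     # []
print(new)      # [2]
```

In a tracker built this way, the IDs in the second list would have their disappeared counters incremented, and the detections in the third list would be passed to register.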
To track and count an object in the video stream, information about the
object itself has to be stored: its identifier, the positions of its
previous centers (which make it possible to trace the direction of
movement), and whether the object has already been counted. The
constructor of the TrackableObject class takes the object's identifier and
center and stores them. The centroids variable is a list because it holds
the history of the object's location in the form of its centers.
The constructor also initializes counted to False, indicating that the
object has not yet been counted.
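How the stored history of centers and the counted flag can work together is sketched below. The counting rule is an assumption for illustration only: the source does not show its counting logic, and the helper names vertical_direction and update_and_count, as well as the horizontal line at line_y, are hypothetical.

```python
class TrackableObject:
    # Same fields as the class described above.
    def __init__(self, objectID, centroid):
        self.objectID = objectID
        self.centroids = [centroid]
        self.counted = False


def vertical_direction(obj, new_centroid):
    """Difference between the new y coordinate and the mean of the
    previous ones: negative means moving up, positive means moving
    down (image y grows downward)."""
    mean_y = sum(c[1] for c in obj.centroids) / len(obj.centroids)
    return new_centroid[1] - mean_y


def update_and_count(obj, new_centroid, line_y, total):
    """Append the new position and count the object once, when it is
    moving down and has passed the hypothetical line y = line_y."""
    direction = vertical_direction(obj, new_centroid)
    obj.centroids.append(new_centroid)
    if not obj.counted and direction > 0 and new_centroid[1] > line_y:
        obj.counted = True
        total += 1
    return total


obj = TrackableObject(0, (50, 80))
total = 0
for centroid in [(52, 95), (53, 110), (55, 130)]:
    total = update_and_count(obj, centroid, line_y=100, total=total)
print(total)               # 1
print(obj.counted)         # True
print(len(obj.centroids))  # 4
```

The counted flag is what keeps the third position from being counted again after the object has already crossed the line.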
Explanation of the code responsible for object recognition
The recog.py file contains code for recognizing and counting objects. It
contains the Recognition class, whose methods analyze and process data
(images and video). The source code of the file is given below:
import cv2
import numpy as np
import os
from objects import CentroidTracker, TrackableObject
import imutils
from imutils.video import VideoStream
from imutils.video import FPS
import dlib
import time


class Recognition:
    def __init__(self, caffe="MobileNetSSD_deploy.caffemodel",
                 prototxt="MobileNetSSD_deploy.prototxt",
                 CLASSES=["background", "aeroplane", "bicycle", "bird",
                          "boat", "bottle", "bus", "car", "cat", "chair",
                          "cow", "diningtable", "dog", "horse",
                          "motorbike", "person", "pottedplant", "sheep",
                          "sofa", "train", "tvmonitor"]):
        self.caffe = caffe
        self.prototxt = prototxt
        self.CLASSES = CLASSES

    def recog_im_dir(self, path_to_images, list_of_ignored):
        self.path_to_images = path_to_images
        self.list_of_ignored = list_of_ignored
        conf = 0.4  # minimum detection confidence
        IGNORE = set(list_of_ignored)
        # one random colour per class that is not ignored
        COLORS = np.random.uniform(
            0, 255, size=(len(set(self.CLASSES) - IGNORE), 3))
        # readNetFromCaffe takes the .prototxt first, then the .caffemodel
        net = cv2.dnn.readNetFromCaffe(self.prototxt, self.caffe)
        for j in os.listdir(self.path_to_images):
            # os.listdir returns bare file names, so prepend the directory
            image = cv2.imread(os.path.join(self.path_to_images, j))
            (h, w) = image.shape[:2]
            blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)),
                                         0.007843, (300, 300), 127.5)
            net.setInput(blob)
            detections = net.forward()
            for i in np.arange(0, detections.shape[2]):
                confidence = detections[0, 0, i, 2]
                if confidence > conf:
                    pass  # the rest of this method is omitted in the report

    def count_single_object_from_video(self, path_to_video, path_to_out,
                                       obj):
        self.path_to_video = path_to_video
        self.path_to_out = path_to_out
        conf = 0.4  # minimum detection confidence
        sk_fr = 30  # run the detector on every 30th frame
        net = cv2.dnn.readNetFromCaffe(self.prototxt, self.caffe)
        vs = cv2.VideoCapture(self.path_to_video)
        writer = None
        W = None
        H = None
        ct = CentroidTracker(maxDisappeared=40, maxDistance=50)
        trackers = []
        trackableObjects = {}
        totalFrames = 0
        # totalDown = 0
        # totalUp = 0
        total = 0
        fps = FPS().start()
        while True:
            frame = vs.read()
            frame = frame[1]
            if frame is None:
                break  # end of the video
            frame = imutils.resize(frame, width=500)
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # dlib expects RGB
            if W is None or H is None:
                # first frame: remember the size and open the output file
                (H, W) = frame.shape[:2]
                fourcc = cv2.VideoWriter_fourcc(*"MJPG")
                writer = cv2.VideoWriter(self.path_to_out, fourcc, 30,
                                         (W, H), True)
            status = "Waiting"
            rects = []
            if totalFrames % sk_fr == 0:
                # detection frame: discard old trackers and detect anew
                status = "Detecting"
                trackers = []
                blob = cv2.dnn.blobFromImage(frame, 0.007843, (W, H), 127.5)
                net.setInput(blob)
                detections = net.forward()
                for i in np.arange(0, detections.shape[2]):
                    confidence = detections[0, 0, i, 2]
                    if confidence > conf:
                        idx = int(detections[0, 0, i, 1])
                        if self.CLASSES[idx] != obj:
                            continue
                        box = (detections[0, 0, i, 3:7]
                               * np.array([W, H, W, H]))
                        (startX, startY, endX, endY) = box.astype("int")
                        tracker = dlib.correlation_tracker()
                        # plain ints, since dlib may reject NumPy integers
                        rect = dlib.rectangle(int(startX), int(startY),
                                              int(endX), int(endY))
                        tracker.start_track(rgb, rect)
                        trackers.append(tracker)
            else:
                # intermediate frame: only update the existing trackers
                for tracker in trackers:
                    status = "Tracking"
                    tracker.update(rgb)
                    pos = tracker.get_position()
                    startX = int(pos.left())
                    startY = int(pos.top())
                    endX = int(pos.right())
                    endY = int(pos.bottom())
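Two numeric steps in the listing above may be easier to follow in isolation: the pixel normalization performed by blobFromImage (subtract the mean 127.5, then multiply by 0.007843, which is roughly 1/127.5) and the filtering of the detection rows. The sketch below reproduces both in plain Python without OpenCV. The helper names normalize_pixel and filter_detections, the shortened class list and the sample rows are assumptions made for illustration; the row layout [image_id, class_id, confidence, x1, y1, x2, y2] matches the indices used in the listing (1 for the class, 2 for the confidence, 3:7 for the box).

```python
def normalize_pixel(value, scale=0.007843, mean=127.5):
    """Per-pixel effect of the blob parameters: subtract the mean,
    then multiply by the scale factor."""
    return (value - mean) * scale


def filter_detections(rows, classes, wanted, width, height, conf=0.4):
    """Keep rows of the wanted class above the confidence threshold and
    convert their relative box coordinates to integer pixel coordinates.

    Each row is [image_id, class_id, confidence, x1, y1, x2, y2],
    with the box coordinates given as fractions of the frame size.
    """
    boxes = []
    for _, class_id, confidence, x1, y1, x2, y2 in rows:
        if confidence <= conf:
            continue
        if classes[int(class_id)] != wanted:
            continue
        boxes.append((int(x1 * width), int(y1 * height),
                      int(x2 * width), int(y2 * height)))
    return boxes


# Pixel values 0..255 end up in roughly the range -1..1.
print(round(normalize_pixel(0), 3), round(normalize_pixel(255), 3))  # -1.0 1.0

CLASSES = ["background", "bicycle", "car", "person"]  # shortened, hypothetical
rows = [
    [0, 3, 0.92, 0.25, 0.25, 0.50, 0.75],   # person, confident
    [0, 3, 0.25, 0.50, 0.50, 0.75, 0.875],  # person, below the threshold
    [0, 2, 0.88, 0.125, 0.50, 0.75, 0.75],  # car, not the wanted class
]
boxes = filter_detections(rows, CLASSES, "person", width=500, height=280)
print(boxes)  # [(125, 70, 250, 210)]
```

Only the first row survives: the second is rejected by the confidence threshold and the third by the class check, which corresponds to the `continue` in the listing.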
Installation
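First, the necessary third-party libraries have to be installed. NumPy does
not need to be installed by hand, because it is pulled in automatically as
a dependency of OpenCV. The other libraries imported by the code (SciPy,
dlib, imutils) are not OpenCV dependencies and must be installed separately
if they are missing. OpenCV itself is installed with the following console
command: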
pip install opencv-python opencv-contrib-python
To install the library itself, execute the following commands in the
console:
git clone https://github.com/mmkuznecov/rct.git
cd rct
python setup.py build
python setup.py install
After that, the library appears in the list of installed packages (Fig. 5,6 in the appendix).
Performance
The program for counting people is able to process input from the
computer's webcam in real time (camera resolution 1280×720, frame rate
10 FPS). When tested on a 20-second video file 47.2 MB in size, the program
processed it in 48 seconds, i.e. about 2.4 times the duration of the video.
A frame from this video and its processed version are shown in Fig. 7. The
program was tested on an Acer Aspire V3 laptop (Intel Core i7 processor,
8 GB RAM).
Results
A system for recognizing, tracking and counting objects was written, and a
face recognition system was also connected to it. The materials of this
work can be useful in developing various smart monitoring systems, security
systems, traffic control systems and others. The library can also be easily
connected to other systems and services. The results of the work are shown
in Fig. 7,8,9.
Development Prospects
In the future, it is planned to support different types of neural networks
depending on the task. For example, YOLO[7] models (Fig. 10) are extremely
accurate, so it is advisable to use them where the statistical error should
be minimal. It is also planned to write a module that builds visual
infographics (Fig. 11) directly from the information obtained, for example
a graph showing the number of objects in each frame.
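The data behind such a per-frame graph could be prepared roughly as follows. Since the infographics module is only planned, everything here is hypothetical: the function name objects_per_frame, the record format of (frame index, object ID) pairs and the sample values are assumptions. Plotting is left out; the resulting list could be passed to a plotting library such as matplotlib.

```python
from collections import Counter


def objects_per_frame(records, frame_count):
    """records: iterable of (frame_index, object_id) pairs, one per
    object visible in a frame. Returns a list with one count per frame."""
    counts = Counter(frame for frame, _ in records)
    return [counts.get(i, 0) for i in range(frame_count)]


records = [(0, 0), (1, 0), (1, 1), (2, 1), (4, 2), (4, 3), (4, 4)]
series = objects_per_frame(records, frame_count=5)
print(series)  # [1, 2, 1, 0, 3]

# A text-only preview of the graph, one bar per frame.
for i, n in enumerate(series):
    print(f"frame {i}: {'#' * n}")
```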
Sources cited
1. The official site of the OpenCV library with the cited documentation [Electronic resource]. URL: https://opencv.org/
6. Documentation for the dnn module of the OpenCV library from the library's official website [Electronic resource]. URL: https://docs.opencv.org/3.4/d2/d58/tutorial_table_of_content_dnn.html
Appendix
Fig. 1 Using the foreground selection algorithm (the white color shows the
outlines of passing people)
Fig. 2 Example of image processing with the help of a neural network - different
objects are recognized with high accuracy.
Fig. 8 The program counting passing cars (note: the frames drawn around the
cars were not added by the program; they were already present in the
original video)
Fig. 9 Integration with facial recognition system
Fig. 11 Graph showing the number of people in each frame of the video, built
using the matplotlib library