
Incorporation of Panoramic View in Fall Detection Using Omnidirectional Camera


Authors:
Nguyễn Minh Quân – CTTT KTYS – K61
Trần Thị Mến – CTTT KTYS – K61
Nguyễn Xuân Bách – CTTT KTYS – K61
Supervisor: Dr. Nguyễn Việt Dũng – ĐTYS

Abstract—Falling is one of the major problems that threaten the health of the elderly, and it is particularly dangerous for people who live alone. Recently, surveillance systems using omnidirectional cameras in general and fisheye cameras in particular have become an attractive choice, as they provide a wide Field of Vision (FOV) without the need for multiple cameras. However, objects captured by fisheye cameras are highly distorted, so computer vision approaches developed for conventional cameras require modification to work with such systems. The aim of this work is to incorporate the de-warping of the fisheye image, using a polar-to-cartesian transformation, to generate a panoramic view. Objects are detected by a background subtraction method. Depending on whether an object lies inside or outside a central circle of the omnidirectional frame, features based on the contour and rotated bounding box of the object are extracted from either the original omnidirectional view or the panoramic view. Experiments show that by incorporating both the panoramic and the omnidirectional view, we achieve a significant improvement in fall detection, particularly in peripheral areas. This result could be a useful reference for further studies.

Keywords—falling, omnidirectional, panoramic, detection

I. INTRODUCTION

The world's population is getting increasingly older. The proportion of elderly people (aged 60 and over) increased from 7.15% in 1989 to 8.93% in 2009 [1]. About 8% of all elderly people live alone, and 13% of elderly couples live by themselves. Living alone poses a great danger, as the elderly are easily affected by emergency situations: natural disasters, health problems such as stroke or heart attack, and unusual events in daily life, especially falls [2]. These situations need to be discovered and dealt with promptly, and quickly and accurately reported to relatives or caregivers. A video surveillance system is therefore extremely valuable. Over the past decades there has been extensive research on computer-vision-based surveillance systems, most of it focused on regular pin-hole cameras. While the pin-hole camera has the advantage of simplicity as well as giving an accurate representation of the human visual system, monitoring a wide area with conventional pin-hole cameras can be troublesome, as it requires a Pan-Tilt-Zoom (PTZ) camera or multiple regular cameras. Using multiple cameras is not optimal, as it costs more to install and maintain and requires more bandwidth and synchronization among cameras. PTZ cameras, on the other hand, while more flexible than conventional ones, can still capture images from only one direction at any given time, and their mechanical parts are prone to failure. To extend the FOV of a surveillance system, omnidirectional cameras can also be used.

Fig. 1. Types of omnidirectional cameras.

The omnidirectional camera, a category that includes fisheye-lens cameras and catadioptric-lens cameras (as shown in Fig. 1), allows a larger field of vision than a traditional pin-hole camera at the cost of resolution in peripheral areas and of the image's structural integrity, which makes it difficult for conventional action recognition and object detection algorithms to work on fisheye images. Some techniques have been proposed to mitigate the drawbacks of fisheye cameras. Chiang and Wang [3] applied a HOG+SVM detector and rotated people in fisheye images so that they appear upright. This practice can be computationally expensive, as in full-frame fisheye images the image must be rotated many times. Other works modify the features used for conventional cameras to work on omnidirectional cameras [4]; however, this can get complex, and the results fall behind modern standards. Some papers propose applying a Convolutional Neural Network (CNN) directly to fisheye data [5]. However, the variety of human poses captured by the fisheye camera can lead to problems, as there are currently not enough fisheye databases. Other authors use fisheye calibration to undistort the fisheye image [6]. However, to get a fully straightened, equiangular quadrilateral image, a considerable number of pixels must be cut from the edges. While this method is applicable in some situations, it is not optimal for surveillance applications. Another projection is fisheye de-warping, in which a fisheye image is unwrapped into a 360-degree equirectangular image [7]. This projection is also flawed: the nearer an object is to the center of the fisheye image, the more deformed it becomes when unwrapped.

In this work, we propose an approach that reduces the effect of omnidirectional camera distortion by incorporating a de-warped panoramic view for a better fall detection result. Section II describes our method of incorporating the panoramic view in fall detection on omnidirectional video, Section III describes the posture recognition method, experimental results are illustrated in Section IV, and conclusions are given in Section V.

II. METHOD FOR INCORPORATING PANORAMIC VIEW IN POSE RECOGNITION ON OMNIDIRECTIONAL VIDEO

A. Fisheye de-warping using polar to cartesian transformation

An easy way to visualize the de-warping technique is to imagine the fisheye image as a circle, cut a slit in it, and then uncurl it into a normal equirectangular panorama. For this process we need the position of the center and the radius of the fisheye image. We then find the fisheye source point corresponding to each cartesian point of the panorama. Let each panorama pixel be (xp, yp), with corresponding polar coordinates (r, θ) and fisheye pixel position (xf, yf); let the height and width of the panorama image be Hp and Wp, the radius of the fisheye image R, and the center of the fisheye image (Cx, Cy). Then we have:

θ = 2π · xp / Wp
r = R · yp / Hp
xf = Cx + r · cos θ
yf = Cy + r · sin θ

Then, for each pixel in the equirectangular image, we find the corresponding pixel in the fisheye image, and so map each pixel of the omnidirectional image to its corresponding location in the equirectangular image. The mapping diagram can be seen in Fig. 2.

Fig. 2. Pixel mapping diagram.

B. Background subtraction algorithm

We apply the background subtraction algorithm proposed in our previous research [8]. First, the first frame is taken as the reference background. After conversion to grayscale, the background contains a lot of noise, so we apply a Gaussian blur to smooth the image and reduce noise. The KNN background subtraction algorithm is then applied to each frame to separate people or moving objects. Next, shadow removal and noise elimination are applied to further separate the main moving object from the background. For shadow removal we use the threshold function, a segmentation method in which a pixel whose value is greater than a threshold is assigned one value (white) and otherwise another value (black). Because of lighting changes, the difference between two frames can vary too quickly, resulting in noise; to eliminate it we use the opening morphology operation.

C. Region detection and choice of processing method

The fisheye de-warping algorithm is not without flaws. Since we unwrap the image starting from the center of the fisheye camera, the closer an object is to the center of the fisheye image, the more distorted it becomes, as there are not enough pixels to unwrap onto the equirectangular image. This region near the center is therefore not suitable for processing. To solve this problem, we simply do not map the distorted central area of the fisheye image onto the equirectangular image.

However, the panoramic view is then unable to detect objects in the central area of the fisheye image, so we propose a region-detection algorithm that chooses the processing method. The algorithm is as follows: if the object is completely outside the cropped area, we process it in the panoramic view; if any part of the object is inside the cropped area, we process it in the omnidirectional view. We do this simply by creating a circle with radius r in the omnidirectional view corresponding to the cropped-out area in the panoramic view. Then, for every frame, we check whether any part of the object is inside the circle. If it is, the object is processed in the omnidirectional view; otherwise it is processed in the equirectangular view. This method is feasible because the central area of the fisheye image is much less distorted than the peripheral areas.
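The thresholding and opening steps of this chain can be sketched in a few lines of numpy. This is a simplified stand-in for illustration only: our implementation uses OpenCV's GaussianBlur, createBackgroundSubtractorKNN, and morphologyEx, and the function and array names below are ours.

```python
import numpy as np

def binary_opening(mask, k=3):
    """Erosion followed by dilation with a k x k square structuring element."""
    pad = k // 2
    # Erosion: a pixel survives only if its whole k x k neighborhood is foreground.
    p = np.pad(mask, pad, constant_values=0)
    windows = np.lib.stride_tricks.sliding_window_view(p, (k, k))
    eroded = windows.all(axis=(2, 3)).astype(np.uint8)
    # Dilation: a pixel turns on if any pixel in its neighborhood is foreground.
    p = np.pad(eroded, pad, constant_values=0)
    windows = np.lib.stride_tricks.sliding_window_view(p, (k, k))
    return windows.any(axis=(2, 3)).astype(np.uint8)

def subtract_background(frame, background, thresh=30):
    """Threshold |frame - background|, then remove speckle noise with opening."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    mask = (diff > thresh).astype(np.uint8)
    return binary_opening(mask)
```

In the real pipeline the absolute difference is replaced by the adaptive KNN background model, but the threshold-then-open structure is the same.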
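The polar-to-cartesian mapping of Section II-A, together with the excluded central region of Section II-C, can be sketched as follows. This is a minimal nearest-neighbour version; our implementation uses OpenCV remapping, and all names here are ours.

```python
import numpy as np

def build_dewarp_maps(Hp, Wp, R, Cx, Cy, r_min_frac=0.145):
    """For each panorama pixel (xp, yp), compute the source fisheye pixel:
    theta = 2*pi*xp/Wp, r = R*yp/Hp, xf = Cx + r*cos(theta), yf = Cy + r*sin(theta)."""
    yp, xp = np.mgrid[0:Hp, 0:Wp]
    theta = 2.0 * np.pi * xp / Wp
    r = R * yp / Hp
    xf = Cx + r * np.cos(theta)
    yf = Cy + r * np.sin(theta)
    valid = r >= r_min_frac * R   # drop the distorted central region (Section II-C)
    return xf, yf, valid

def dewarp(fisheye, Hp, Wp, R, Cx, Cy):
    """Nearest-neighbour unwrap of a fisheye frame into an Hp x Wp panorama."""
    xf, yf, valid = build_dewarp_maps(Hp, Wp, R, Cx, Cy)
    xi = np.clip(np.rint(xf).astype(int), 0, fisheye.shape[1] - 1)
    yi = np.clip(np.rint(yf).astype(int), 0, fisheye.shape[0] - 1)
    pano = fisheye[yi, xi]        # fancy indexing returns a new array
    pano[~valid] = 0              # black out rows mapped from the excluded circle
    return pano
```

With OpenCV, the same (xf, yf) maps can be passed to cv2.remap for interpolated sampling instead of the nearest-neighbour lookup used here.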
We can therefore use this region for processing without worrying too much about distortion, as objects in it remain mostly undistorted. Our algorithm is described by the flow chart in Fig. 3.

Fig. 3. Algorithm flowchart.

D. Posture recognition

1. Contour
A contour is a useful tool for analyzing object shape and for detecting and recognizing objects. A contour is the set of points forming a curve around pixels with the same or approximately the same color value.

2. Machine learning models
Machine learning is an application of computer science and Artificial Intelligence (AI) that gives systems the ability to learn and improve automatically from experience without being explicitly programmed.

2.1 Support vector machine (SVC)
In this algorithm, data points are represented in n-dimensional space, where n is the number of features in the input data set; each data point's values give its coordinates along the feature axes. The Support vector machine algorithm then classifies the data by finding a hyper-plane that separates the data groups, or classes; SVC can thus be interpreted as the best boundary (optimal hyperplane) between classes [11].

2.2 K-Nearest Neighbors (KNN)
The k-Nearest Neighbors algorithm assumes that similar things exist close together; in other words, similar things will be near each other [13]. The KNN algorithm therefore classifies a point by considering the group of surrounding points (neighbors) of the point being classified to decide which class it belongs to.

2.3 Naïve Bayes
Naïve Bayes is a Supervised Learning algorithm: a probability-based classifier that applies Bayes' theorem [16]. According to Bayes' theorem, the probability of class A given evidence B is

P(A|B) = P(B|A) P(A) / P(B).

Suppose the evidence B consists of n independent parts B1, B2, ..., Bn. We then have

P(B|A) = P(B1|A) P(B2|A) ... P(Bn|A),

so

P(A|B) ∝ P(A) P(B1|A) P(B2|A) ... P(Bn|A),

and the predicted class is the one for which this product is largest [16].

2.4 Decision Tree
A Decision Tree is a diagram with a tree-shaped structure consisting of:
• Nodes: test the value of a certain attribute.
• Edges/branches: correspond to the outcome of a test and connect to the next node or leaf.
• Leaf nodes: terminal nodes that predict the outcome (representing class labels or class distributions) [17].

Fig. 4. Diagram of a Decision Tree.

3. Cross validation
Cross validation is the best-known method for evaluating a machine learning model; it divides the original data into separate training and testing subsets [20]. We use K-fold cross validation: the data set is divided into k equal parts (k subsets with approximately the same amount of data) [21]. One subset is chosen as the test set, and the other (k-1) subsets are combined into a training set. The selection is repeated k times so that each of the k subsets is used as the test set exactly once.

Fig. 5. K-fold cross validation.

4. GridSearchCV
GridSearchCV is the process of adjusting parameters to determine the optimal parameter values for a given model on a data set. This is important because the performance of the entire model depends on the specified parameter values.

III. POSTURE RECOGNITION METHOD

As mentioned above, we use four ML models to classify and identify three postures: walking, sitting, and fainting.
• Block diagram of the processing pipeline (figure).
• K-fold: K-fold is used to split the dataset into a training set and a testing set.
• Model optimization:
- GridSearchCV: the GridSearchCV function pairs the values of each model's input parameters to find the parameter values that give the highest accuracy on the data set.

Fig. 13. Result of optimizing a model with GridSearchCV.

- Cross validation: cross validation is used to check the change of accuracy over the range of each input parameter; our aim is to find the best value of each model's input parameters on the dataset, which gives an accuracy graph as shown in Fig. 7.

Fig. 7. Result of optimizing a model with cross validation.
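The decision rule of Section II-C reduces to a point-in-circle test over the detected object's contour points; a sketch, where the array layout and names are our assumptions:

```python
import numpy as np

def choose_view(contour, Cx, Cy, r):
    """contour: (N, 2) array of (x, y) points of the object in the fisheye frame.
    If any part of the object lies inside the central circle of radius r,
    process it in the omnidirectional view; otherwise use the panoramic view."""
    d2 = (contour[:, 0] - Cx) ** 2 + (contour[:, 1] - Cy) ** 2
    return "omnidirectional" if bool((d2 < r * r).any()) else "panoramic"
```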
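The proportionality above can be turned into a tiny categorical classifier. This is a from-scratch sketch with invented toy feature values, using Laplace smoothing to avoid zero probabilities; our experiments use an off-the-shelf implementation.

```python
from collections import Counter, defaultdict

def train_nb(samples, labels):
    """Estimate P(class) and P(feature_i = v | class) from categorical data,
    and return a predictor that picks the class maximizing the product."""
    prior = Counter(labels)
    n = len(labels)
    # cond[(cls, i, v)] = count of feature i taking value v within class cls
    cond = defaultdict(int)
    for x, c in zip(samples, labels):
        for i, v in enumerate(x):
            cond[(c, i, v)] += 1

    def predict(x):
        scores = {}
        for c in prior:
            p = prior[c] / n                    # P(A)
            for i, v in enumerate(x):
                # Laplace-smoothed P(Bi|A); 2 assumes binary feature values
                p *= (cond[(c, i, v)] + 1) / (prior[c] + 2)
            scores[c] = p
        return max(scores, key=scores.get)

    return predict
```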
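What GridSearchCV does can be sketched as an exhaustive loop over a parameter grid, scoring each candidate with k-fold cross validation. Here it is paired with a minimal 1-D k-NN classifier; the data and names are illustrative only, and our experiments use scikit-learn's GridSearchCV.

```python
def knn_predict(train_x, train_y, x, k):
    """Label x by majority vote among the k nearest training points (1-D)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)

def kfold_indices(n, k):
    """Split range(n) into k roughly equal contiguous folds."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def grid_search(xs, ys, param_grid, folds=3):
    """Score every candidate k with k-fold CV; return the best (param, accuracy)."""
    best_param, best_acc = None, -1.0
    for k in param_grid:
        accs = []
        for test_idx in kfold_indices(len(xs), folds):
            train_idx = [i for i in range(len(xs)) if i not in test_idx]
            tx = [xs[i] for i in train_idx]
            ty = [ys[i] for i in train_idx]
            hits = sum(knn_predict(tx, ty, xs[i], k) == ys[i] for i in test_idx)
            accs.append(hits / len(test_idx))
        acc = sum(accs) / len(accs)
        if acc > best_acc:
            best_param, best_acc = k, acc
    return best_param, best_acc
```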

• Select and extract features: after applying the BS algorithm, six features are extracted, and the three postures are classified based on the following six features:
- Area of the contour (A1)
- Angle of the line fitted to the contour
- Ratio of width and height of the rotated rectangle
- Area of the bounding box (rotated rectangle) (A2)
- Ratio of areas (A1/A2)
- Angle of the bounding box

Fig. 6. a) Contour; b) line fitted to the contour; c) bounding box.

IV. IMPLEMENTATION AND RESULTS

All steps are implemented mostly in OpenCV Python, using a computer with a Core i5-6300HQ 2.3 GHz processor; the video size is 640x480 pixels. We use scenario 1 of the BOMNI database [9]. This database contains recordings captured by two omnidirectional cameras, one mounted on the ceiling and one mounted on the wall. In scenario #1, one person performs the following actions: walks, sits, drinks water, washes his hands, opens and closes a door, and falls.

1. Fisheye de-warping and background subtraction
For fisheye de-warping, we achieve an acceptable frame rate of 10 FPS with good results, as shown in Fig. 8.

Fig. 8. De-warped fisheye image with the distorted region excluded.

The optimal radius of the circle corresponding to the excluded distorted region (the black area) is calculated to be 0.145 R, with R the radius of the fisheye image; in this case r is approximately 70 pixels.

• Accuracy of the machine learning models:

Table 2. Final results

After dividing the folds based on the labels, the accuracy difference between folds decreases significantly. The accuracy of the models applied to the topcut and toppano videos is much higher than on the top and side videos (the original videos), reaching values from 97% to 100%.
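These features can be sketched from a binary object mask with numpy alone. This is a simplified stand-in: our implementation uses OpenCV's findContours, fitLine, and minAreaRect; here the fitted-line and rotated-box angles are collapsed into a single moment-based orientation, the axis-aligned box stands in for the rotated rectangle, and all names are ours.

```python
import numpy as np

def mask_features(mask):
    """Approximate shape features of the single object in a binary mask."""
    ys, xs = np.nonzero(mask)
    area = float(len(xs))                      # stand-in for contour area (A1)
    # Principal-axis angle from second-order central moments (degrees).
    x0, y0 = xs.mean(), ys.mean()
    mu20 = ((xs - x0) ** 2).mean()
    mu02 = ((ys - y0) ** 2).mean()
    mu11 = ((xs - x0) * (ys - y0)).mean()
    angle = 0.5 * np.degrees(np.arctan2(2 * mu11, mu20 - mu02))
    # Axis-aligned bounding box as a stand-in for the rotated rectangle.
    w = xs.max() - xs.min() + 1
    h = ys.max() - ys.min() + 1
    box_area = float(w * h)                    # (A2)
    return {
        "A1": area,
        "angle": angle,                        # orientation of the object
        "aspect": w / h,                       # width/height ratio
        "A2": box_area,
        "area_ratio": area / box_area,         # A1/A2
    }
```

For an upright (tall, thin) silhouette the angle is near 90 degrees and the aspect ratio is below 1; a lying silhouette flips both, which is what makes these features separable for fall detection.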
After applying the background subtraction methods, our detection algorithm is also able to accurately detect the position of the person and choose the processing region automatically. In Fig. 9a, the person is outside the circle of radius r and is therefore processed in the panoramic view. In Fig. 9b, the person is inside the circle, and our algorithm automatically processes the object in the omnidirectional view.

Fig. 9. Region detection and choice of processing method: a) object outside the r-radius circle, processed in panoramic view; b) object inside the r-radius circle, processed in omnidirectional view.

2. Posture recognition
• The dataset includes:
- Side-view camera (3549 frames): 2347 frames with 3 labels
- Top-view camera (4058 frames): 2974 frames with 3 labels
- Topcut: 602 frames with 3 labels
- Toppano: 2161 frames with 3 labels
• Model optimization: the input parameters of each model are tuned with GridSearchCV as described in Section III.

Table 1. Optimization of input parameters

Although our average result is quite high, some drawbacks remain:
- The BS algorithm has limitations, such as sensitivity to lighting (e.g., a subject standing near overly bright areas), and the shadow removal algorithm has not been optimized yet, resulting in inaccurate feature data.
- When the subject opens or closes a door, the bounding box includes both the subject and the area swept by the door, which leads our model to confuse walking with fainting.
- When the subject sits on a chair with the back to the camera, the models can only see the upper part of the body; the lower part is hidden by the chair, so our model cannot identify this case.

V. CONCLUSION

We presented a novel approach to top-view posture recognition with an omnidirectional camera using the incorporation of panoramic and omnidirectional views. Compared to previous research, our method reliably deals with the drawbacks of fisheye distortion when applying conventional detection methods, and it avoids the loss of vital information suffered by conventional fisheye calibration methods. Using a Decision Tree, we achieve satisfactory detection results on both the panoramic and the omnidirectional view. However, our approach is still sensitive to multiple-object detection as well as to occlusion and sudden changes in illumination; the authors are currently working on these problems. The results of this work can serve as a basis for further research on omnidirectional cameras and, with further optimization, can be reliably used in a real-time omnidirectional surveillance system.

REFERENCES
[1] "Viet Nam prepares to support aging population," Vietnam News, Sep. 2017. https://vietnamnews.vn/society/health/393500/viet-nam-prepares-to-support-aging-population.html#o2tLry468d42sApy.97
[2] A.-T. Chiang and Y. Wang, "Human detection in fish-eye images using HOG-based detectors over rotated windows," in ICMEW, pp. 1-6, 2014.
[3] Z. Arican and P. Frossard, "OmniSIFT: Scale invariant features in omnidirectional images," in IEEE ICIP, pp. 3505-3508, 2010.
[4] I. Cinaroglu and Y. Bastanlar, "A direct approach for human detection with catadioptric omnidirectional cameras," in 22nd Signal Processing and Communications Applications Conference (SIU), 2014.
[5] Y. Dupuis et al., "A direct approach for face detection on omnidirectional images," in IEEE ROSE, pp. 243-248, 2011.
[6] D. Scaramuzza, A. Martinelli, and R. Siegwart, "A flexible technique for accurate omnidirectional camera calibration and structure from motion."
[7] H. Kim, J. Jung, and J. Paik, "Fisheye lens camera based surveillance system for wide field of view monitoring."
[8] V. D. Nguyen, M. Tran Thi, Q. M. Nguyen, and B. Nguyen Xuan, "Evaluation of background subtraction methods in omnidirectional video," KISC, Hanoi University of Science and Technology, Vietnam, 2019.
[9] B. E. Demiröz, İ. Arı, O. Eroğlu, A. A. Salah, and L. Akarun, "Feature-based tracking on a multi-omnidirectional camera dataset," International Symposium on Communications, Control, and Signal Processing (ISCCSP12), Rome, Italy, 2012.
[10] A. Navlani, "Support Vector Machines with Scikit-learn," Dec. 28, 2019.
[11] R. Gandhi, "Support Vector Machine - Introduction to Machine Learning Algorithms," Jun. 7, 2018.
[12] P. S. Bradley and O. L. Mangasarian, "Feature selection via concave minimization and support vector machines," in Machine Learning Proceedings of the Fifteenth International Conference (ICML '98), J. Shavlik (Ed.), Morgan Kaufmann: San Francisco, California, 1998, pp. 82-90. ftp://ftp.cs.wisc.edu/math-prog/techreports/98-03.ps
[13] W. Ling and F. Dong-Mei, "Estimation of missing values using a weighted k-Nearest Neighbors algorithm," Jul. 2009.
[14] B. Kamgar-Parsi and L. N. Kanal, "An improved branch and bound algorithm for computing k-nearest neighbors," May 19, 2003.
[15] M.-L. Zhang, J. M. Peña, and V. Robles, "Feature selection for multi-label naïve Bayes classification," Jun. 5, 2019.
[16] Y. Huang and L. Li, "Naïve Bayes classification algorithm based on small sample set," Sep. 2011.
[17] A. Chakure, "Decision Tree Classification," Jul. 6, 2019.
[18] A. Navlani, "Decision Tree Classification in Python," Dec. 29, 2018.
[19] A. Priyam, R. Gupta, A. Rathee, and S. Srivastava, "Comparative analysis of decision tree classification algorithms," 2003.
[20] R. Shaikh, "Cross Validation Explained: Evaluating estimator performance," Nov. 26, 2018.
[21] Krishni, "K-Fold Cross Validation," Dec. 17, 2018.
