
Real-Time Multiple Human Activity Recognition from Tracked Body Displacements

Sagar Medikeri and Shashank Pujar


under the guidance of

Dr. Uma Mudenagudi
Department of Electronics and Communications, BVBCET, Hubli, India.

Contents
1 Introduction
    1.1 Introduction
    1.2 Motivation of the problem
2 Review of Literature
3 Human Activity Recognition from Tracked Body Displacements
    3.1 Human Detection Algorithm
    3.2 Multiple Human Tracking and Tagging the Activities
4 Results and Conclusion
    4.1 Experimental Results
    4.2 Conclusion and Future Work
        4.2.1 Conclusion
        4.2.2 Future Work

List of Figures
3.1 Activity state recognition without and with backtracking
3.2 Outputs of various processes involved in the algorithm
4.1 Results of the Activity Recognition algorithm


Chapter 1

Introduction
1.1 Introduction

Recognizing human activity is a very challenging task, ranging from low-level sensing and feature extraction from sensory data to the high-level inference algorithms used to infer the state of the subject from that data. We are interested in the scientific challenges of modeling simple activities of people such as walking, running and standing. We have built a recognizer that rejects moving objects other than humans and recognizes the activities of only the humans identified by a computer vision system. We assume that the camera position remains fixed, as in the case of a surveillance or security camera. Human activity recognition is a complex problem involving several subprocesses. It begins with (1) detecting whether there are any humans in the video captured by the camera. (2) Once a person is detected, he or she is tracked, and (3) the activity being performed by the person is identified and tagged with one of the states from the defined activity set. We define the activity set as consisting of the following states: Run Left, Walk Left, Stand Still, Walk Right and Run Right.

1.2 Motivation of the problem

Human activity recognition has scope in many fields. Real-time imaging and human motion tracking systems are applicable to areas such as surveillance, robotics, law enforcement and defense. Researchers have studied many different applications of activity recognition; examples include assisting the sick and disabled. For instance, Pollack et al. [2] show that by automatically monitoring human activities, home-based rehabilitation can be provided for people suffering from traumatic brain injuries. One can find applications ranging from security-related applications and logistics support to location-based services. Owing to its many-faceted nature, different fields may refer to activity recognition as plan recognition, goal recognition, intent recognition, behavior recognition, location estimation or location-based services.



Chapter 2

Review of Literature
In [3], Rybski and Veloso recognize human activity by tracking face displacements. A Haar classifier trained on human faces is used to detect and track faces. However, it is assumed that the person's face is always turned toward the camera. If the person's face (or, in general, any object the classifier is trained to detect) is even slightly turned away from the camera, the Haar classifier fails to detect it. This is the inherent disadvantage of Haar classifiers, apart from their increased computation and delay. In [1], Hong, Turk and Huang construct finite state machine models of gestures by learning the spatial and temporal information of the gestures separately from each other. However, it is assumed that the gestures are performed directly in front of the camera and that the individual features of the face and hands can be recognized and observed without error. Much of the related work in activity modeling relies upon fixed cameras with known poses with respect to the objects and people they track. Our efforts focus on activity models that can be tracked and observed by uncalibrated vision systems which do not have the luxury of knowing their absolute position in the environment. This approach is attractive because it minimizes the cost of setting up the system and increases its general utility.


Chapter 3

Human Activity Recognition from Tracked Body Displacements


3.1 Human Detection Algorithm

The detection algorithm consists of the following processing steps: 1. frame subtraction, 2. thresholding, 3. updating the motion history, 4. median filtering, 5. finding contours, 6. computing the centers of the contours, 7. motion-vector extraction and 8. activity state classification. All computations on each image frame are completed in one pass over the image, thereby providing high throughput. High throughput is a critical factor for achieving the goal of real-time tracking. The advantages of this algorithm include 1) identifying an object by its motion without relying on an a priori assumption of an object model, 2) using a single video camera, and 3) providing better computation speed and accuracy in detection and tracking than Haar classifiers.

(a) Frame Subtraction. Two adjacent image frames from the video sequence are denoted I1(x, y) and I2(x, y). The width and height of each frame are W and H respectively. It is safe to assume that the frame rate is sufficiently high with respect to the velocity of the movement. Under this assumption, the difference between I1(x, y) and I2(x, y) contains information about the location and incremental movement of the object. Let

    Id(x, y) = |I1(x, y) - I2(x, y)|.

Frame subtraction also serves the important function of eliminating the background and any stationary objects. This is done using the cvAbsDiff function.

(b) Thresholding. The difference image is thresholded into a binary image (silhouette) according to

    Isilh(x, y) = 1  if Id(x, y) >= theta,
    Isilh(x, y) = 0  if Id(x, y) < theta,

where theta is a threshold that determines the tradeoff between the sensitivity and the robustness of the tracking algorithm. This is done using the cvThreshold function.

(c) Updating the Motion History. The motion history image (MHI) is updated from the silhouette image as follows:

    mhi(x, y) = timestamp  if silhouette(x, y) != 0
    mhi(x, y) = 0          if silhouette(x, y) = 0 and mhi(x, y) < timestamp - duration
    mhi(x, y) = mhi(x, y)  otherwise

That is, MHI pixels where motion occurs are set to the current timestamp, while pixels where motion happened long ago are cleared. This is done using the cvUpdateMotionHistory function.

(d) Median Filtering. Even though the camera mount is assumed to be fixed in position, salt-and-pepper noise is observed in the captured video. It can be caused by the motion of tree leaves in the wind, waves on the surface of water and other such stray sources of noise. Eliminating these reduces the possibility of false detections. Median filtering is a widely adopted method for noise removal: it finds the median of a pixel neighborhood and assigns it to the center pixel. This is done using the cvSmooth function.

(e) Finding Contours. Before finding contours, the image is dilated as a pre-processing step. Dilation clubs all closely positioned motions of a person together to form a lumped region of motion, which enables proper contour detection. A contour is detected for each person in the frame, and a center is computed for each contour.
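The report names the legacy OpenCV C functions (cvAbsDiff, cvThreshold, cvUpdateMotionHistory, cvSmooth). As a rough illustration of steps (a) through (e), the sketch below uses the modern Python bindings instead; the threshold value, motion-history duration, minimum contour area and the OpenCV 4 findContours signature are our assumptions, and the motion-history update is written out from the formulas above rather than calling the library routine.

```python
import cv2
import numpy as np

THRESHOLD = 30       # theta: silhouette threshold, tuning value assumed
MHI_DURATION = 1.0   # seconds of motion history kept, assumed
MIN_AREA = 500       # smallest contour area treated as a person, assumed

def detect_moving_persons(prev_gray, curr_gray, mhi, timestamp):
    """Steps (a)-(e): return the contour centers of moving regions.

    prev_gray, curr_gray -- consecutive grayscale frames (uint8)
    mhi                  -- float32 motion-history image, same size
    timestamp            -- current time in seconds
    """
    # (a) Frame subtraction: |I1 - I2| suppresses the background
    # and any stationary objects (cvAbsDiff in the C API).
    diff = cv2.absdiff(prev_gray, curr_gray)

    # (b) Threshold the difference image into a binary silhouette
    # (cvThreshold in the C API).
    _, silh = cv2.threshold(diff, THRESHOLD, 1, cv2.THRESH_BINARY)

    # (c) Motion-history update, written out from the formulas above:
    # moving pixels take the current timestamp, stale pixels are cleared.
    mhi[silh != 0] = timestamp
    mhi[(silh == 0) & (mhi < timestamp - MHI_DURATION)] = 0

    # (d) Median filtering removes salt-and-pepper noise such as
    # moving leaves or water (cvSmooth with CV_MEDIAN in the C API).
    mask = (mhi > 0).astype(np.uint8) * 255
    mask = cv2.medianBlur(mask, 5)

    # (e) Dilation lumps the closely positioned motions of one person
    # into a single region, then contours and their centers are found.
    mask = cv2.dilate(mask, None, iterations=2)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    centers = []
    for c in contours:
        if cv2.contourArea(c) < MIN_AREA:
            continue
        m = cv2.moments(c)
        centers.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
    return centers
```

The returned centers feed the tracking and tagging stage described in the next section.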

3.2 Multiple Human Tracking and Tagging the Activities


At every frame of video, each contour that has been found yields an (x, y) position in image coordinates. The software then tracks each contour from frame to frame and stores a history of the contour positions. Because relying on absolute (x, y) positions would be brittle under the above constraints, we instead look at the difference of the contour center positions between subsequent frames of video, i.e. (Δx, Δy), where Δx = x_t - x_{t-1} and Δy = y_t - y_{t-1}.

(a) Minimum-distance contour correlation. When there is more than one person in the frame, there are as many contours. To link a contour in one frame to a contour in the next, we find the Euclidean distances from all the contour centers in the previous frame to each contour center in the current frame. The contour center pair (one from the previous frame and one from the current frame) with the minimum distance is linked. The same is done for all contour centers in the current frame. Let CONTOURS PREVIOUS and CONTOURS CURRENT denote the number of contours in the previous and current frames. Depending on these counts, three cases arise.

i. CONTOURS PREVIOUS = CONTOURS CURRENT. The number of persons detected in the current and previous frames is equal, so each contour center in the previous frame is linked to a contour center in the current frame.

ii. CONTOURS PREVIOUS > CONTOURS CURRENT. Fewer persons are detected in the current frame than in the previous one, implying that one or more persons have left the scene or are motionless. Each contour center in the current frame is linked to one contour center in the previous frame; the unlinked contour centers in the previous frame are dropped.

iii. CONTOURS PREVIOUS < CONTOURS CURRENT. More persons are detected in the current frame than in the previous one, implying that one or more persons have entered the scene or have started moving from the stand-still state. Since not every contour center in the current frame can be linked to one in the previous frame, the process is reversed and the unlinked contour center is displayed at its position.

(b) Fixed-length backtracking. Once a contour center in the current frame is linked to a contour center in the previous frame, the horizontal displacement between them is calculated. Comparing this value directly against the thresholds used to classify motion leads to erroneous results: while walking, when a person puts a foot forward, that foot is momentarily stationary while the trailing foot begins to move, so the computed contour center shifts in the direction opposite to the person's walking. Figure 3.1 illustrates this situation (video source: www.istockphoto.com). This problem can be circumvented by fixed-length backtracking. We have defined a hybrid approach in which backtracking is done over the latest k frames: when the observation at time t is received, the state at t - k/FPS is inferred, where FPS is the frame rate of the video. In a real-time system this fixed-window approach delays the state estimate, but as long as the delay is not too long, the estimate is still useful to act upon. The value of k thus represents a tradeoff between accuracy and estimation lag in a real-time system.
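The following is a minimal sketch of the minimum-distance correlation and fixed-length backtracking just described, assuming a greedy nearest-neighbour linking and an illustrative window of k = 5 frames (the report does not state its value of k); the Track bookkeeping class is our own construction.

```python
import math
from collections import deque

K = 5  # backtracking window in frames; the report leaves k unspecified

def link_contours(prev_centers, curr_centers):
    """Greedy minimum-distance correlation of contour centers.

    Returns (prev_index, curr_index) pairs. Previous centers left
    unlinked are dropped (case ii); current centers left unlinked are
    new entries (case iii); equal counts link one-to-one (case i).
    """
    links, used = [], set()
    for j, (cx, cy) in enumerate(curr_centers):
        best_i, best_d = None, float("inf")
        for i, (px, py) in enumerate(prev_centers):
            if i not in used:
                d = math.hypot(cx - px, cy - py)
                if d < best_d:
                    best_i, best_d = i, d
        if best_i is not None:
            used.add(best_i)
            links.append((best_i, j))
    return links

class Track:
    """Position history of one person, for fixed-length backtracking."""

    def __init__(self):
        self.xs = deque(maxlen=K + 1)  # last k+1 horizontal positions

    def update(self, x):
        self.xs.append(x)

    def backtracked_dx(self):
        """Per-frame horizontal displacement averaged over k frames.

        Averaging hides the momentary backward shift of the contour
        center that occurs mid-stride (Figure 3.1)."""
        if len(self.xs) < 2:
            return 0.0
        return (self.xs[-1] - self.xs[0]) / (len(self.xs) - 1)
```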


Figure 3.1: Activity state recognition without and with backtracking

(c) Tagging activities and representation. The result of fixed-length backtracking for each contour is compared against the set of thresholds to arrive at the activity state of each person. Once this is done, a rectangle is drawn around the contour and the activity state is displayed. Figure 3.2 illustrates the various processes involved in the algorithm.
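The report does not list its numeric thresholds, so the values in this sketch are placeholders to be tuned per setup; negative displacement is taken to mean motion toward the image left.

```python
# Illustrative walk/run boundaries in pixels per frame; assumed values.
WALK_THRESHOLD = 2.0
RUN_THRESHOLD = 8.0

def tag_activity(dx):
    """Map a backtracked horizontal displacement to an activity state.

    The five returned strings are the activity set defined in
    Chapter 1: Run/Walk Left/Right and Stand Still."""
    if dx <= -RUN_THRESHOLD:
        return "Run Left"
    if dx <= -WALK_THRESHOLD:
        return "Walk Left"
    if dx >= RUN_THRESHOLD:
        return "Run Right"
    if dx >= WALK_THRESHOLD:
        return "Walk Right"
    return "Stand Still"
```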



Figure 3.2: Outputs of various processes involved in the algorithm



Chapter 4

Results and Conclusion


4.1 Experimental Results

To test the algorithm, a test video was recorded on the college campus with an Olympus FE15 digital camera fixed on a tripod at a distance of about ten meters from the road, with the lens directed perpendicular to the motion of the people. Figure 4.1 depicts the results obtained at various instants of the test video. It can be seen that the algorithm correctly detected the activity states of the various persons in the video frames. The recognizer performs well in normal lighting conditions. It distinguishes between walking and running, rejects non-human moving objects such as cycles, bikes and cars, and can track any number of persons in the video. Incorporating fixed-length backtracking greatly increased the accuracy of the activity recognition. However, the algorithm made a few false detections in the presence of shadows, and it is also inefficient at detecting the stand-still activity state.

4.2 Conclusion and Future Work

4.2.1 Conclusion

The advantage of using tracked body displacements is that a person is detected irrespective of his or her orientation with respect to the camera. This is not the case with Haar classifiers, where the person's features must closely match the features of the persons in the images used during the training of the classifier.


Figure 4.1: Results of the Activity Recognition algorithm


The advantage of Haar classifiers lies in the fact that they can detect even a stationary person. We set out to achieve human activity recognition over the activity set consisting of the states Run/Walk Left/Right and Stand Still. Except for the stand-still state, all states are satisfactorily recognized. We had not anticipated that shadows would cause a problem in the detection of humans, but we found that under heavy shadows, as in the evenings or mornings, false detections occur. Nevertheless, the algorithm recognizes the activities of multiple persons remarkably well.

4.2.2 Future Work

To make the algorithm more robust, we intend to incorporate statistical modelling of the activity states. We plan to detect the stand-still state reliably by extracting features unique to that situation. We will incorporate adaptive thresholding to distinguish between the walking and running states reliably, irrespective of whether the person is near the capturing camera or far from it. We would also like to explore the option of using Haar training to tackle the aforesaid problems.
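As one hypothetical way to realize the adaptive thresholding mentioned above (not something implemented in the report), the horizontal displacement could be normalized by the person's apparent size, so that a single pair of walk/run boundaries remains meaningful at any distance from the camera.

```python
def distance_invariant_dx(dx, contour_height):
    """Normalize horizontal displacement by apparent body height.

    A person twice as far from the camera appears roughly half as
    tall and covers half as many pixels per frame, so dividing by
    the contour height keeps fixed walk/run thresholds usable at
    any distance. The normalization choice is our assumption."""
    if contour_height <= 0:
        return 0.0
    return dx / contour_height
```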


Bibliography
[1] P. Hong, M. Turk and T. S. Huang. Gesture modeling and recognition using finite state machines. In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France, 2000.

[2] M. E. Pollack and L. E. B. et al. Autominder: an intelligent cognitive orthotic system for people with memory impairment. Robotics and Autonomous Systems, 2003.

[3] Paul E. Rybski and Manuela M. Veloso. Robust real-time human activity recognition from tracked face displacements. September 2005.
