
TRACKING

KAMAL NASROLLAHI, ASSISTANT PROFESSOR, COMPUTER VISION & MEDIA TECH. LAB, AALBORG UNIVERSITY

AGENDA
Definition
Applications
Challenges
Simplification of Tracking
Object Representation
Feature Selection for Tracking
Object Detection
Object Tracking
- Kalman Filters
- Extended Kalman Filters
- Particle Filters
- Mean-Shift Tracking
- Silhouette Tracking

DEFINITION
Tracking can be defined as the problem of estimating the trajectory of an object in the image plane as it moves around a scene.

Three steps in video analysis:
1. Detection of interesting moving objects
2. Tracking of such objects from frame to frame
3. Analysis of object tracks to recognize their behavior


APPLICATIONS
motion-based recognition: human identification based on gait, automatic object detection, etc.
automated surveillance: monitoring a scene to detect suspicious activities or unlikely events
video indexing: automatic annotation and retrieval of videos in multimedia databases
human-computer interaction: gesture recognition, eye gaze tracking for data input to computers, etc.
traffic monitoring: real-time gathering of traffic statistics to direct traffic flow
vehicle navigation: video-based path planning and obstacle avoidance capabilities


CHALLENGES
- loss of information caused by projection of the 3D world onto a 2D image
- noise in images
- complex object motion
- nonrigid or articulated nature of objects
- partial and full object occlusions
- complex object shapes
- scene illumination changes
- real-time processing requirements


SIMPLIFYING THE TRACKING
Tracking can be simplified by imposing constraints on the motion and/or appearance of objects:
- almost all tracking algorithms assume that the object motion is smooth with no abrupt changes
- the object motion can be constrained to be of constant velocity or constant acceleration
- prior knowledge about the number and the size of objects, and about the object appearance and shape, can also be used to simplify the problem


OBJECT REPRESENTATION
In a tracking scenario, an object can be defined as anything that is of interest for further analysis:
- boats on the sea
- fish inside an aquarium
- vehicles on a road
- planes in the air
- people walking on a road
- bubbles in the water

In general, there is a strong relationship between the object representations and the tracking algorithms. Object representations are usually chosen according to the application domain. Objects can be represented by their shapes and appearances

OBJECT SHAPE REPRESENTATIONS

Points: the object is represented by a point (the centroid) or by a set of points. The point representation is suitable for tracking objects that occupy small regions in an image.

Primitive geometric shapes: the object is represented by a rectangle, an ellipse, etc. Object motion for such representations is usually modeled by a translation, affine, or projective (homography) transformation. Though primitive geometric shapes are more suitable for representing simple rigid objects, they are also used for tracking nonrigid objects.

Object silhouette and contour: the contour representation defines the boundary of an object; the region inside the contour is called the silhouette of the object. Silhouette and contour representations are suitable for tracking complex nonrigid shapes.

OBJECT SHAPE REPRESENTATIONS

Articulated shape models: composed of body parts that are held together with joints. The relationships between the parts are controlled by kinematic motion models, for example, joint angles. In order to represent an articulated object, one can model the different parts using cylinders or ellipses.

Skeletal models: can be extracted by applying the medial axis transform to the object silhouette. They are commonly used as a shape representation for recognizing objects and can be used to model both articulated and nonrigid objects.


FEATURE SELECTION FOR TRACKING


Selecting the right features plays a critical role in tracking. In general, the most desirable property of a visual feature is its uniqueness, so that objects can be easily distinguished in the feature space. Feature selection is closely related to the object representation. For example, color is used as a feature for histogram-based appearance representations, while for contour-based representations, object edges are usually used as features. In general, many tracking algorithms use a combination of these features.

FEATURE SELECTION FOR TRACKING

Color: the apparent color of an object is influenced primarily by two physical factors: the distribution of the illuminant and the surface reflectance properties of the object.

RGB (red, green, blue) color space: RGB is not a perceptually uniform color space; the differences between colors in RGB space do not correspond to the color differences perceived by humans. Additionally, the RGB dimensions are highly correlated.

HSV (Hue, Saturation, Value), Luv, and Lab are approximately perceptually uniform color spaces, but they are sensitive to noise.

In general, there is no last word on which color space is more efficient.
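As a small illustration of a histogram-based color representation, the sketch below builds a normalized hue histogram for an image patch. It is a minimal sketch, not a full tracker component: the function name `hue_histogram`, the 16-bin choice, and the hue range [0, 180) (OpenCV's convention for 8-bit hue) are all assumptions made here for illustration.

```python
import numpy as np

def hue_histogram(hsv_patch, bins=16):
    """Build a normalized hue histogram for an image patch.

    hsv_patch: (H, W, 3) array with hue in [0, 180) (OpenCV convention).
    Returns a length-`bins` histogram that sums to 1.
    """
    hue = hsv_patch[..., 0].ravel()
    hist, _ = np.histogram(hue, bins=bins, range=(0, 180))
    return hist / max(hist.sum(), 1)

# Toy patch: all pixels share one hue, so the histogram has a single peak.
patch = np.zeros((8, 8, 3))
patch[..., 0] = 90.0
h = hue_histogram(patch)
```

Such a histogram is exactly the kind of appearance model used later in the mean-shift/Camshift part of this lecture.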

FEATURE SELECTION FOR TRACKING


Edges: object boundaries usually generate strong changes in image intensities, and edge detection is used to identify these changes. An important property of edges is that they are less sensitive to illumination changes than color features. Algorithms that track the boundary of objects usually use edges as the representative feature. Because of its simplicity and accuracy, the most popular edge detection approach is the Canny edge detector.
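To make the idea concrete, the sketch below computes the gradient-magnitude core that edge detectors build on; note this is only the first step of Canny, which additionally applies Gaussian smoothing, non-maximum suppression, and hysteresis thresholding. The function name `sobel_edges` and the threshold value are illustrative assumptions.

```python
import numpy as np

def sobel_edges(img, thresh=1.0):
    """Gradient-magnitude edge map: the core step underlying detectors
    such as Canny (which adds smoothing, non-maximum suppression, and
    hysteresis on top of this)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # Sobel x
    ky = kx.T                                                   # Sobel y
    H, W = img.shape
    gx = np.zeros((H, W)); gy = np.zeros((H, W))
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            win = img[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = (kx * win).sum()
            gy[i, j] = (ky * win).sum()
    mag = np.hypot(gx, gy)      # gradient magnitude
    return mag > thresh         # binary edge map

# A vertical step edge between a dark and a bright region.
img = np.zeros((10, 10)); img[:, 5:] = 1.0
edges = sobel_edges(img)
```

In practice one would call an optimized library routine (e.g. OpenCV's Canny) rather than loop in Python; the loop is kept here only to expose the computation.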

FEATURE SELECTION FOR TRACKING


Optical Flow

Optical flow is a dense field of displacement vectors which defines the translation of each pixel in a region. It is computed using the brightness constraint, which assumes brightness constancy of corresponding pixels in consecutive frames. Optical flow is commonly used as a feature in motion-based segmentation and tracking applications.
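The brightness-constancy constraint can be solved in the least-squares sense over a patch, which is the Lucas-Kanade idea. The sketch below estimates a single translational flow vector for a whole patch; the function name `lucas_kanade` and the use of a periodic test signal are assumptions made for this illustration.

```python
import numpy as np

def lucas_kanade(I0, I1):
    """Estimate one translational flow vector (u, v) for a patch.

    Uses the brightness-constancy constraint Ix*u + Iy*v + It = 0 and
    solves the over-determined per-pixel system by least squares.
    """
    # Central-difference spatial gradients and the temporal derivative.
    Ix = (np.roll(I0, -1, axis=1) - np.roll(I0, 1, axis=1)) / 2.0
    Iy = (np.roll(I0, -1, axis=0) - np.roll(I0, 1, axis=0)) / 2.0
    It = I1 - I0
    # One equation per pixel: [Ix Iy] [u v]^T = -It
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# A smooth periodic intensity pattern shifted by one pixel to the right,
# so the true flow is u = +1, v = 0.
x = np.arange(16, dtype=float)
I0 = np.tile(np.sin(2 * np.pi * x / 16), (16, 1))
I1 = np.roll(I0, 1, axis=1)
u, v = lucas_kanade(I0, I1)
```

Dense optical flow methods apply this kind of estimation in a window around every pixel instead of once per patch.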

FEATURE SELECTION FOR TRACKING


Texture
Texture is a measure of the intensity variation of a surface which quantifies properties such as smoothness and regularity. Compared to color, texture requires a processing step to generate the descriptors. There are various texture descriptors:
- Gray-Level Co-occurrence Matrices (GLCMs): a 2D histogram which shows the co-occurrences of intensities in a specified direction and distance
- Laws' texture measures: filters corresponding to level, edge, spot, wave, and ripple
- wavelets: an orthogonal bank of filters
- steerable pyramids

Similar to edge features, texture features are less sensitive to illumination changes than color.

FEATURE SELECTION FOR TRACKING


Automatic feature selection methods can be divided into filter methods and wrapper methods.

Filter methods: try to select features based on a general criterion, for example, that the features should be uncorrelated. Principal Component Analysis (PCA) is an example of a filter method for feature reduction. PCA involves a transformation of a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called the principal components.
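As a concrete illustration of PCA as a filter method, the sketch below projects correlated 2D features onto uncorrelated principal components via the SVD. The function name `pca` and the toy data are assumptions of this sketch.

```python
import numpy as np

def pca(X, n_components):
    """Project data onto its top principal components.

    X: (n_samples, n_features). Returns the projected data and the
    component directions (as rows).
    """
    Xc = X - X.mean(axis=0)                        # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                 # directions of max variance
    return Xc @ components.T, components

rng = np.random.default_rng(0)
# 2D feature that is almost 1D: the second coordinate is the first
# plus small noise, so the two input dimensions are highly correlated.
x = rng.normal(size=200)
X = np.stack([x, x + 0.05 * rng.normal(size=200)], axis=1)
Z, comps = pca(X, n_components=2)
```

After the transform the projected coordinates are uncorrelated, and almost all of the variance lands in the first component, which is exactly the "smaller number of uncorrelated variables" the slide describes.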

Wrapper methods: select features based on their usefulness in a specific problem domain, for example, the classification performance obtained using a subset of features. The Adaboost algorithm is an example of a wrapper method.


OBJECT DETECTION
Every tracking method requires an object detection mechanism, either in every frame or when the object first appears in the video. A common approach for object detection is to use the information in a single frame. Some detection methods make use of temporal information computed from a sequence of frames to reduce the number of false detections; a common example is frame differencing, which highlights changing regions in consecutive frames. Given the object regions in the image, it is then the tracker's task to perform object correspondence from one frame to the next to generate the tracks.
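Frame differencing is simple enough to sketch in a few lines: threshold the absolute intensity difference between consecutive frames. The function name `frame_difference` and the threshold are illustrative assumptions.

```python
import numpy as np

def frame_difference(prev, curr, thresh=0.2):
    """Binary change mask: pixels whose intensity changed between frames."""
    return np.abs(curr.astype(float) - prev.astype(float)) > thresh

# A bright 3x3 "object" moves one pixel to the right between frames.
prev = np.zeros((10, 10)); prev[4:7, 2:5] = 1.0
curr = np.zeros((10, 10)); curr[4:7, 3:6] = 1.0
mask = frame_difference(prev, curr)
```

Note that the mask highlights only the leading and trailing edges of the motion (the overlap region is unchanged), which is one reason practical systems combine differencing with background subtraction.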

OBJECT DETECTION
Point Detectors
Background Subtraction
Segmentation
- Mean-Shift Clustering
- Image Segmentation Using Graph-Cuts
- Active Contours
Supervised Learning
- Adaptive Boosting
- Support Vector Machines

OBJECT DETECTION
Point Detectors
Point detectors are used to find interest points in images. A desirable property of an interest point is its invariance to changes in illumination and camera viewpoint. Commonly used interest point detectors include:
- Moravec's interest operator
- Harris interest point detector
- KLT detector
- SIFT detector

OBJECT DETECTION
Point Detectors
Moravec's interest operator computes the variation of the image intensities in a 4 x 4 patch in the horizontal, vertical, diagonal, and antidiagonal directions, and then selects the minimum of the four variations as the representative value for the window. A point is declared interesting if the intensity variation is a local maximum in a 12 x 12 patch.
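The per-window part of Moravec's operator can be sketched directly: compare a patch against its shifted copies in the four directions and keep the minimum sum-of-squared-differences. The function name `moravec_response` is an assumption, and the local-maximum test over a 12 x 12 neighborhood is omitted for brevity.

```python
import numpy as np

def moravec_response(img, y, x, size=4):
    """Moravec cornerness at (y, x): minimum intensity variation between
    a `size` x `size` patch and its one-pixel shifts in the horizontal,
    vertical, diagonal, and antidiagonal directions."""
    patch = img[y:y + size, x:x + size].astype(float)
    best = np.inf
    for dy, dx in [(0, 1), (1, 0), (1, 1), (1, -1)]:
        shifted = img[y + dy:y + dy + size, x + dx:x + dx + size].astype(float)
        best = min(best, ((patch - shifted) ** 2).sum())
    return best

# Corner of a bright square gives a high response; a flat region gives zero.
img = np.zeros((20, 20)); img[8:, 8:] = 1.0
corner = moravec_response(img, 6, 6)   # window straddles the square's corner
flat = moravec_response(img, 1, 1)     # uniform region
```

Taking the minimum over directions is what distinguishes corners (variation in every direction) from edges (no variation along the edge direction).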

OBJECT DETECTION
Harris interest point detector computes the first-order image derivatives, (Ix, Iy), in the x and y directions to highlight the directional intensity variations; then a second moment matrix, which encodes this variation, is evaluated for each pixel in a small neighborhood:

M = sum over the neighborhood of [ Ix^2    Ix*Iy ;
                                   Ix*Iy   Iy^2  ]

An interest point is identified using the determinant and the trace of M, which measure the variation in a local neighborhood:

R = det(M) - k * trace(M)^2

The interest points are marked by thresholding R.
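The Harris response can be sketched with plain array operations: accumulate the moment-matrix entries over a small window, then evaluate R at every pixel. The function name `harris_response`, the 3 x 3 box window, and k = 0.04 are illustrative choices (Harris' paper and common implementations use a Gaussian window and k around 0.04-0.06).

```python
import numpy as np

def harris_response(img, k=0.04):
    """Harris cornerness R = det(M) - k * trace(M)^2 at every pixel,
    with M accumulated over a 3x3 neighborhood (flat window)."""
    Ix = (np.roll(img, -1, axis=1) - np.roll(img, 1, axis=1)) / 2.0
    Iy = (np.roll(img, -1, axis=0) - np.roll(img, 1, axis=0)) / 2.0
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy

    def box(a):  # sum over the 3x3 neighborhood of each pixel
        s = np.zeros_like(a)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                s += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
        return s

    Sxx, Syy, Sxy = box(Ixx), box(Iyy), box(Ixy)
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace ** 2

# A bright square: R should be positive at its corners, negative on edges.
img = np.zeros((20, 20)); img[5:15, 5:15] = 1.0
R = harris_response(img)
```

The signs follow from the eigenvalues of M: two large eigenvalues (corner) make det dominate, one large eigenvalue (edge) makes the -k*trace^2 term dominate, and a flat region gives R near zero.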

OBJECT DETECTION
KLT detector: the same moment matrix M is used here. The interest point confidence, R, is computed using the minimum eigenvalue of M, lambda_min. Interest point candidates are selected by thresholding R, and among the candidate points KLT eliminates the candidates that are spatially close to each other. The M matrix is invariant to both rotation and translation; however, it is not invariant to affine or projective transformations.

OBJECT DETECTION
SIFT (Scale Invariant Feature Transform) consists of four steps:
1. Scale space construction: the image is convolved with Gaussian filters at different scales, and the convolved images are used to generate difference-of-Gaussians (DoG) images. Candidate interest points are then selected from the minima and maxima of the DoG images across scales.
2. Localization: the location of each candidate is updated by interpolating the intensity values using neighboring pixels.
3. Elimination: low-contrast candidates, as well as candidates along edges, are eliminated.
4. Orientation assignment: orientations are assigned to the remaining interest points based on the peaks in the histograms of gradient directions in a small neighborhood around each candidate point.

SIFT performs better than the other detectors in part because it operates at different scales and different resolutions (pyramid).

OBJECT DETECTION
Segmentation
The aim of image segmentation algorithms is to partition the image into perceptually similar regions. Every segmentation algorithm addresses two problems: the criteria for a good partition and the method for achieving an efficient partitioning.

Mean-Shift Clustering
The purpose of the mean-shift approach is to find clusters in the joint spatial and color space [l, u, v, x, y], where [l, u, v] represents the color and [x, y] represents the spatial location.

OBJECT DETECTION
Segmentation

Mean-Shift Clustering
The algorithm is initialized with a large number of hypothesized cluster centers randomly chosen from the data. Then, each cluster center is moved to the mean of the data lying inside the multidimensional ellipsoid centered on the cluster center. The vector defined by the old and the new cluster centers is called the mean-shift vector. The mean-shift vector is computed iteratively until the cluster centers do not change their positions.

Note that during the mean-shift iterations, some clusters may get merged. Mean-shift clustering is also scalable to other applications, such as edge detection, image regularization, and tracking.
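The iteration described above can be sketched with a flat (spherical) kernel instead of the ellipsoidal window, which keeps the code short. The function name `mean_shift`, the radius, and the 2D toy data are assumptions of this sketch; a real implementation would also merge centers that converge to the same mode.

```python
import numpy as np

def mean_shift(points, centers, radius=2.0, iters=50):
    """Move each hypothesized cluster center to the mean of the points
    within `radius` of it, repeating until the centers stop moving."""
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        moved = False
        for i, c in enumerate(centers):
            near = points[np.linalg.norm(points - c, axis=1) < radius]
            if len(near):
                new = near.mean(axis=0)          # mean-shift update
                if np.linalg.norm(new - centers[i]) > 1e-6:
                    centers[i] = new
                    moved = True
        if not moved:
            break                                # converged
    return centers

rng = np.random.default_rng(1)
# Two well-separated 2D blobs; centers initialized off-mode.
a = rng.normal(loc=(0, 0), scale=0.3, size=(50, 2))
b = rng.normal(loc=(8, 8), scale=0.3, size=(50, 2))
pts = np.vstack([a, b])
modes = mean_shift(pts, [[1, 1], [7, 7]])
```

Each center climbs to the nearest density mode, which is why mean shift needs no prior knowledge of cluster shapes.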

OBJECT DETECTION
Segmentation
Image Segmentation Using Graph-Cuts: the vertices (pixels), V = {u, v, ...}, of a graph (image) G are partitioned into N disjoint subgraphs (regions) Ai, with the union of the Ai equal to V and Ai intersect Aj empty for i != j, by pruning the weighted edges of the graph.

The total weight of the pruned edges between two subgraphs is called a cut. The weight is typically computed by the color, brightness, or texture similarity between the nodes.

OBJECT DETECTION
Segmentation
Active Contours: object segmentation is achieved by evolving a closed contour to the object's boundary, such that the contour tightly encloses the object region. Evolution of the contour is controlled by an energy functional which defines the fitness of the contour to the hypothesized object region.

OBJECT DETECTION
Supervised Learning
Object detection can be performed by learning different object views automatically from a set of examples, which waives the requirement of storing a complete set of templates. Supervised learning methods generate a function that maps inputs to desired outputs. A standard formulation of supervised learning is the classification problem, where the learner approximates the behavior of a function by generating an output in the form of either a continuous value (called regression) or a class label (called classification).

OBJECT DETECTION
Supervised Learning Adaptive Boosting Boosting is an iterative method of finding a very accurate classifier by combining many base classifiers, each of which may only be moderately accurate


OBJECT TRACKING
Point Tracking Objects detected in consecutive frames are represented by points, and the association of the points is based on the previous object state which can include object position and motion. This approach requires an external mechanism to detect the objects in every frame.

OBJECT TRACKING
Point Tracking

Tracking can be formulated as the correspondence of detected objects, represented by points, across frames. Point correspondence is a complicated problem, especially in the presence of occlusions, misdetections, entries, and exits of objects.

Overall, point correspondence methods can be divided into two broad categories:
- Deterministic methods use qualitative motion heuristics to constrain the correspondence problem.
- Statistical methods explicitly take the object measurement and the uncertainties into account to establish correspondence.

OBJECT TRACKING
Point Tracking
Deterministic Methods for Correspondence define a cost of associating each object in frame t-1 to a single object in frame t using a set of motion constraints. Minimization of the correspondence cost is formulated as a combinatorial optimization problem. A solution, which consists of one-to-one correspondences among all possible associations, can be obtained by optimal assignment methods, for example, the Hungarian algorithm, or by greedy search methods.
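The assignment step can be illustrated with an exhaustive search over permutations, which finds the same optimal one-to-one matching the Hungarian algorithm finds in O(n^3); brute force is only feasible for the handful of objects used here. The function name `best_assignment` and the squared-distance cost are assumptions of this sketch.

```python
import numpy as np
from itertools import permutations

def best_assignment(cost):
    """Optimal one-to-one assignment by exhaustive search.

    cost[i, j] is the cost of matching object i in frame t-1 to
    detection j in frame t. The Hungarian algorithm solves the same
    problem in polynomial time; this brute force is for illustration.
    """
    n = cost.shape[0]
    best_perm, best_cost = None, np.inf
    for perm in permutations(range(n)):
        c = sum(cost[i, perm[i]] for i in range(n))
        if c < best_cost:
            best_perm, best_cost = list(perm), c
    return best_perm, best_cost

# Cost = squared distance between previous and current detections
# (a proximity constraint from the list on the next slide).
prev_pts = np.array([[0.0, 0.0], [5.0, 5.0], [9.0, 0.0]])
curr_pts = np.array([[5.2, 5.1], [0.3, 0.1], [8.8, 0.2]])
cost = ((prev_pts[:, None, :] - curr_pts[None, :, :]) ** 2).sum(axis=2)
match, total = best_assignment(cost)
```

In practice one would call a library routine such as SciPy's `linear_sum_assignment` instead of enumerating permutations.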

OBJECT TRACKING
Point Tracking
Deterministic Methods for Correspondence
The correspondence cost is usually defined by using a combination of the following constraints:
- Proximity
- Maximum velocity
- Small velocity change (smooth motion)
- Common motion
- Rigidity


OBJECT TRACKING
Point Tracking
Statistical Methods for Correspondence take the measurement and model uncertainties into account during object state estimation. They use the state space approach to model object properties such as position, velocity, and acceleration.

OBJECT TRACKING
Point Tracking
Statistical Methods for Correspondence: state space approach
Consider a moving object in the scene. The information representing the object, for example its location, is defined by a sequence of states Xt, t = 1, 2, .... The change in state over time is governed by the dynamic equation:

Xt = f(Xt-1) + Wt

where Wt is noise. The relationship between the measurement and the state is specified by the measurement equation:

Zt = h(Xt) + Nt

where Nt is the measurement noise.

OBJECT TRACKING
Point Tracking
Statistical Methods for Correspondence: state space approach
Tracking then amounts to estimating the state Xt given all the measurements up to that moment, that is, the posterior p(Xt | Z1,...,t). This can be done with a Bayesian filter, which solves the problem in two steps.
The prediction step uses the dynamic equation and the already computed pdf of the state at time t-1 to derive the prior pdf of the current state, p(Xt | Z1,...,t-1). Then, the correction step employs the likelihood function p(Zt | Xt) of the current measurement to compute the posterior pdf p(Xt | Z1,...,t).

For a single object in the scene, the filter is applied directly; for multiple objects in the scene, the measurements first need to be associated with the corresponding object states.

OBJECT TRACKING
Point Tracking
Statistical Methods for Correspondence: single object in the scene (Kalman filtering)

If f and h are linear functions, and the initial state X and the noise have Gaussian distributions, then the optimal state estimate is given by the Kalman filter, which operates in two steps: prediction and correction.

OBJECT TRACKING
Point Tracking
Kalman Filtering

Prediction
The prediction step uses the state model to predict the new state of the variables:

X̄t = D Xt-1
Σ̄t = D Σt-1 Dᵀ + Q

where X̄t and Σ̄t are the predicted state and covariance, D is the state transition matrix which defines the relation between the state variables at time t and t-1, and Q is the covariance of the noise W.

OBJECT TRACKING
Point Tracking
Kalman Filtering

Correction

The correction step uses the current observation Zt to update the object's state:

Kt = Σ̄t Mᵀ [M Σ̄t Mᵀ + Rt]⁻¹
Xt = X̄t + Kt [Zt − M X̄t]
Σt = Σ̄t − Kt M Σ̄t

where v = Zt − M X̄t is called the innovation, M is the measurement matrix, Kt is the Kalman gain, and Rt is the covariance of the measurement noise.
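The prediction and correction steps can be sketched directly for a constant-velocity point tracker. This is a minimal sketch: the function name `kalman_step`, the state layout [position, velocity], and the noise covariances are assumptions chosen for the toy example.

```python
import numpy as np

def kalman_step(x, P, z, D, M, Q, R):
    """One predict/correct cycle of the Kalman filter.

    x, P : previous state estimate and covariance
    z    : current measurement
    D, M : state-transition and measurement matrices
    Q, R : process and measurement noise covariances
    """
    # Prediction
    x_pred = D @ x
    P_pred = D @ P @ D.T + Q
    # Correction
    K = P_pred @ M.T @ np.linalg.inv(M @ P_pred @ M.T + R)  # Kalman gain
    x_new = x_pred + K @ (z - M @ x_pred)                   # innovation term
    P_new = P_pred - K @ M @ P_pred
    return x_new, P_new

# Constant-velocity model: state = [position, velocity], measure position.
D = np.array([[1.0, 1.0], [0.0, 1.0]])
M = np.array([[1.0, 0.0]])
Q = 1e-4 * np.eye(2)
R = np.array([[0.25]])
x, P = np.zeros(2), np.eye(2)
rng = np.random.default_rng(2)
for t in range(1, 40):                    # true motion: position = 2*t
    z = np.array([2.0 * t + 0.5 * rng.normal()])
    x, P = kalman_step(x, P, z, D, M, Q, R)
```

Although only the position is measured, the filter recovers the velocity as well, because the transition matrix couples the two state variables.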


OBJECT TRACKING
Point Tracking
Extended Kalman Filtering
In the Kalman filter, the updated state X is still distributed by a Gaussian. In case the functions f and h are nonlinear, they can be linearized using the Taylor series expansion to obtain the extended Kalman filter. Similar to the Kalman filter, the extended Kalman filter assumes that the state is distributed by a Gaussian.


OBJECT TRACKING
Point Tracking
Particle Filters: the conditional state density p(Xt | Z1,...,t) at time t is represented by a set of N samples {s_t^(n) : n = 1, ..., N} (particles) with weights π_t^(n) (sampling probability). Each iteration of the filter consists of three steps: selection (resampling), prediction, and correction.
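The selection/prediction/correction cycle can be sketched for a 1D state observed with Gaussian noise. The function name `particle_filter_step`, the random-walk dynamic model, and the noise levels are assumptions of this toy example.

```python
import numpy as np

def particle_filter_step(particles, weights, z, rng,
                         process_std=0.5, meas_std=1.0):
    """One selection/prediction/correction cycle of a particle filter
    for a 1D random-walk state observed with Gaussian noise."""
    n = len(particles)
    # Selection: resample particles in proportion to their weights.
    idx = rng.choice(n, size=n, p=weights)
    particles = particles[idx]
    # Prediction: propagate each particle through the dynamic model.
    particles = particles + process_std * rng.normal(size=n)
    # Correction: reweight by the measurement likelihood p(z | x).
    weights = np.exp(-0.5 * ((z - particles) / meas_std) ** 2)
    weights /= weights.sum()
    return particles, weights

rng = np.random.default_rng(3)
n = 500
particles = rng.uniform(-10, 10, size=n)   # broad initial belief
weights = np.full(n, 1.0 / n)
true_state = 4.0                           # hidden state to be tracked
for _ in range(30):
    z = true_state + rng.normal()          # noisy measurement
    particles, weights = particle_filter_step(particles, weights, z, rng)
estimate = (particles * weights).sum()
```

Unlike the Kalman filter, nothing here requires the posterior to be Gaussian; the particle cloud can represent arbitrary, even multimodal, densities.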


OBJECT TRACKING
Kernel Tracking
Kernel refers to the object shape and appearance. For example, the kernel can be a rectangular template or an elliptical shape with an associated histogram. Objects are tracked by computing the motion of the kernel in consecutive frames. This motion is usually in the form of a parametric transformation such as a translation, rotation, or affine transformation.


OBJECT TRACKING
Kernel Tracking Mean-Shift OpenCV's tracker uses an algorithm called Camshift. Camshift stands for "Continuously Adaptive Mean Shift."

Camshift consists of four steps:


1. Create a color histogram to represent the object
2. Calculate an "object probability" for each pixel in the incoming video frames
3. Shift the location of the object rectangle in each video frame
4. Calculate the size and angle

OBJECT TRACKING
Camshift
1. Create a color histogram to represent the object: Camshift represents the object it is tracking as a histogram of color values, computed over the hue channel of the HSV color space. The height of each bar indicates how many pixels in an image region have that hue.

OBJECT TRACKING
Camshift
2. Calculate object probability: using the color histogram as a lookup table, Camshift assigns every pixel in the incoming frame the probability that it belongs to the object, producing an object-probability image (histogram back-projection).
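Histogram back-projection can be sketched as a table lookup: each pixel's hue is mapped to its bin, and the pixel is replaced by the histogram value of that bin. The function name `back_project` and the 16-bin hue model are assumptions of this sketch (OpenCV provides this as `calcBackProject`).

```python
import numpy as np

def back_project(hue_img, hist, bins=16):
    """Histogram back-projection: replace each pixel's hue with the
    object histogram's value in the corresponding bin, yielding an
    object-probability image."""
    bin_idx = np.clip((hue_img / 180.0 * bins).astype(int), 0, bins - 1)
    return hist[bin_idx]

# Object model: a histogram peaked at hue 90 (the object's color).
hist = np.zeros(16); hist[8] = 1.0       # bin 8 covers hue ~[90, 101)
# Frame: background hue 20, one 3x3 object-colored region of hue 90.
frame = np.full((8, 8), 20.0)
frame[2:5, 2:5] = 90.0
prob = back_project(frame, hist)
```

The resulting image is bright exactly where the frame matches the object's color model, which is what the mean-shift step then homes in on.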

OBJECT TRACKING
Camshift

3. Shift to a new location


With each new video frame, Camshift "shifts" its estimate of the object location, keeping it centered over the area with the highest concentration of bright pixels in the object-probability image. It finds this new location by starting at the previous location and computing the center of gravity of the object-probability values within a rectangle. It then shifts the rectangle so that it lies right over the center of gravity, and repeats this a few times to center the rectangle well.

The OpenCV function cvCamShift() implements the steps for shifting to the new location.

This process of shifting the rectangle to correspond with the center of gravity is "mean shift".
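The shift-to-center-of-gravity loop can be sketched in a few lines. This is the mean-shift core only: the function name `mean_shift_window` is an assumption, and the size/angle adaptation that makes Camshift "continuously adaptive" is omitted.

```python
import numpy as np

def mean_shift_window(prob, win, iters=10):
    """Shift a (row, col, height, width) window to the center of gravity
    of the object-probability values inside it, repeating a few times."""
    r, c, h, w = win
    for _ in range(iters):
        patch = prob[r:r + h, c:c + w]
        total = patch.sum()
        if total == 0:
            break
        ys, xs = np.mgrid[0:h, 0:w]
        cy = (ys * patch).sum() / total      # center of gravity (rows)
        cx = (xs * patch).sum() / total      # center of gravity (cols)
        new_r = int(round(r + cy - (h - 1) / 2))
        new_c = int(round(c + cx - (w - 1) / 2))
        if (new_r, new_c) == (r, c):
            break                            # converged
        r, c = new_r, new_c
    return r, c, h, w

# Probability image with a bright 5x5 blob; the window starts off-center
# and should end up centered on the blob.
prob = np.zeros((30, 30)); prob[12:17, 18:23] = 1.0
win = mean_shift_window(prob, (10, 14, 9, 9))
```

After convergence the window center coincides with the blob's center of gravity, which is the behavior cvCamShift() repeats on every frame.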

OBJECT TRACKING
Camshift
4. Calculate size and angle: the OpenCV method is called "Continuously Adaptive," and not just "Mean Shift," because it also adjusts the size and angle of the object rectangle each time it shifts it.

It does this by selecting the scale and orientation that are the best fit to the object-probability pixels inside the new rectangle location.


OBJECT TRACKING
Silhouette Tracking
Tracking is performed by estimating the object region in each frame. Silhouette tracking methods use the information encoded inside the object region. This information can be in the form of appearance density and shape models, which are usually in the form of edge maps. Given the object models, silhouettes are tracked by either shape matching or contour evolution.

EXERCISE
Either work on your mini-project, or download Camshift Wrapper, a mean-shift tracking demo, from the following address and run it in OpenCV: http://www.cognotics.com/opencv/downloads/camshift_wrapper/index.html
