
VIDEO SEGMENTATION

Lecture 29: Sarathy Ramanan Ramasamy – 2021UCD2126


BLOCK BASED APPROACHES
§ Block based methods involve dividing the video frames into regions/partitions
called blocks.
§ Each frame ‘f’ is divided into, say, ‘b’ blocks, and each block is compared with
the corresponding block in frame ‘f+1’.
§ Once the frames are partitioned into blocks, different segmentation techniques are
applied to the blocks, either independently or collectively. These could include
edge detection, watershed segmentation, thresholding and more.
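The partition-and-compare idea above can be sketched as follows. This is a minimal illustration, not the lecture's implementation: frames are assumed to be grayscale and stored as plain nested lists, and the function names, the mean-absolute-difference measure and the threshold are all assumptions made for the example.

```python
def split_into_blocks(frame, block_size):
    """Partition a 2-D grayscale frame (list of rows) into square blocks.

    Returns a dict mapping (block_row, block_col) -> flat list of pixel values.
    Border blocks are smaller when the frame size is not a multiple of block_size.
    """
    height, width = len(frame), len(frame[0])
    blocks = {}
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            pixels = [
                frame[y][x]
                for y in range(top, min(top + block_size, height))
                for x in range(left, min(left + block_size, width))
            ]
            blocks[(top // block_size, left // block_size)] = pixels
    return blocks


def changed_blocks(frame_a, frame_b, block_size, threshold):
    """Return indices of blocks whose mean absolute difference exceeds threshold."""
    blocks_a = split_into_blocks(frame_a, block_size)
    blocks_b = split_into_blocks(frame_b, block_size)
    changed = []
    for index, pixels_a in blocks_a.items():
        pixels_b = blocks_b[index]
        mad = sum(abs(a - b) for a, b in zip(pixels_a, pixels_b)) / len(pixels_a)
        if mad > threshold:
            changed.append(index)
    return changed


# Two 4x4 frames that differ only in the bottom-right 2x2 block.
frame_f = [[10] * 4 for _ in range(4)]
frame_f1 = [row[:] for row in frame_f]
for y, x in [(2, 2), (2, 3), (3, 2), (3, 3)]:
    frame_f1[y][x] = 200

print(changed_blocks(frame_f, frame_f1, block_size=2, threshold=20))  # [(1, 1)]
```

Only the block whose content changed is flagged, so a later segmentation step can concentrate on that region.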
WHY?
§ Block based approaches use local characteristics of the region, hence increasing
their robustness to camera and object movement.
§ Compared to pixel based approaches, this method is more tolerant to slow and
small changes in object motion over the frames.
§ This is because pixel based approaches compare the frames globally and do not
efficiently take into account local template/object characteristics like
texture, light intensity, deformations and so on.
§ Reduced computational complexity in the matching process, hence suitable for
real time applications.
FRAMEWORK

§ User Intervention: The user may need to specify initial parameters or provide tools for creating
an initial partition of the image. The user is required to provide an initial rough delimitation of
the object to be segmented, known as the active region. This operation needs to be performed
for each key-frame in the video sequence, particularly when there are significant changes in the
object's shape due to occlusion or disappearance. The set of frames for the user to process can be
suggested by an algorithm based on scene change detection criteria.
§ Initial Model Extraction: From each key-frame, an initial model of the object is extracted
within the specified active region. This extraction is based on a combined temporal and spatial
transitions extraction approach.
Ø Simplification Procedure: Applied to reduce irrelevant information in the image that could affect the
segmentation results. Involves reducing texture regions and enhancing contours.
Ø Filtering and Contour Detections: A map of the contour pixels is provided.
Ø Motion Information: The motion information is extracted by means of a frame temporal difference.
Ø Cost Function: Motion information, smoothness, continuity and edge strength are merged in a cost
function used to obtain the set of contour blocks that constitutes the initial object model.
§ Block Matching: Once the model initialization has been performed, the object is tracked in the
following frames based on a block matching procedure. Accordingly, the model is updated
every frame.
THE BLOCK BY BLOCK APPROACH
[Figure: Base edge patterns]
§ Each step is performed using blocks. A set of edge patterns is used for
simplification and extraction; their number and form depend on the block
dimension.
§ Each pattern is composed of two regions. The active region is divided into blocks,
and by comparing the color of the two regions within a block we decide whether the
block matches an edge pattern.
§ If the difference in color between the two regions is greater than or equal to the
threshold for the base edge patterns, we say the block has an edge.
§ For the blocks with edges, we smooth out the colors to make them look refined. We
also keep track of where the edges are.
§ For the blocks without edges, we make the colors more consistent by using the
average color of the block to enhance them.
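The edge test and the two simplification rules above can be sketched as follows. The two 4x4 patterns, the threshold value, and the choice to smooth an edged block by replacing each region with its own mean are assumptions made for illustration; the slides do not specify the actual pattern set or the smoothing filter.

```python
# Hypothetical base edge patterns for 4x4 blocks: each pixel is in region 0 or 1.
PATTERNS = {
    "vertical": [[0, 0, 1, 1] for _ in range(4)],
    "horizontal": [[0] * 4, [0] * 4, [1] * 4, [1] * 4],
}


def region_means(block, pattern):
    """Mean intensity of the two regions that the pattern defines on the block."""
    sums, counts = [0.0, 0.0], [0, 0]
    for block_row, pattern_row in zip(block, pattern):
        for value, region in zip(block_row, pattern_row):
            sums[region] += value
            counts[region] += 1
    return sums[0] / counts[0], sums[1] / counts[1]


def classify_block(block, patterns, threshold):
    """Return (pattern_name, contrast); pattern_name is None if there is no edge."""
    best_name, best_contrast = None, 0.0
    for name, pattern in patterns.items():
        mean0, mean1 = region_means(block, pattern)
        contrast = abs(mean0 - mean1)
        if contrast > best_contrast:
            best_name, best_contrast = name, contrast
    if best_contrast >= threshold:
        return best_name, best_contrast
    return None, best_contrast


def simplify_block(block, patterns, threshold):
    """Return (simplified_block, has_edge)."""
    name, _ = classify_block(block, patterns, threshold)
    if name is None:
        # No edge: replace the whole block by its average value.
        mean = sum(map(sum, block)) / (len(block) * len(block[0]))
        return [[mean] * len(row) for row in block], False
    # Edge: smooth each of the two regions to its own mean (assumed filter).
    means = region_means(block, patterns[name])
    return [[means[region] for region in row] for row in patterns[name]], True


edge_block = [[10, 12, 200, 198] for _ in range(4)]
flat_block = [[50, 52, 51, 49] for _ in range(4)]
print(classify_block(edge_block, PATTERNS, threshold=40))  # ('vertical', 188.0)
print(simplify_block(flat_block, PATTERNS, threshold=40)[1])  # False
```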

§ The edged blocks are used to extract the object model. The object's contour is represented
by a subset of these edged blocks. The subset is chosen by analyzing the blocks on
various properties: Edge Continuity, Smoothness of the Contour, Block Motion, Contour
Strength, and Closeness to User's Polygon.
§ The model is updated in each frame using a block matching procedure. Each block in the
model at frame ‘f’ is compared with corresponding blocks in the next frame ‘f+1’ to
estimate the object's motion.
§ The motion vector of each block in the model is estimated with respect to the next frame.
We then choose candidate motion vectors where the block overlaps with at least one edged
block in the next frame.
§ The output of the tracker is a sequence of closed contours that correspond to the shape of
the object being tracked. These contours outline the object's silhouette in each frame.
§ The final step is to extract the corresponding object from the video sequence. This is
achieved by filling in each pixel within the object's contour, effectively isolating the object
from the background.
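The block matching step used by the tracker can be sketched as an exhaustive search that minimises the sum of absolute differences (SAD). The SAD criterion, the search range and the function names are assumptions; the slides do not name a matching cost. The additional filter that keeps only candidates overlapping an edged block in the next frame is left out to keep the sketch short.

```python
def sad(frame_a, frame_b, top_a, left_a, top_b, left_b, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame_a[top_a + y][left_a + x] - frame_b[top_b + y][left_b + x])
        for y in range(size)
        for x in range(size)
    )


def best_motion_vector(frame_f, frame_f1, top, left, size, search_range):
    """Search frame_f1 for the block located at (top, left) in frame_f.

    Returns ((dy, dx), cost) for the displacement with the lowest SAD.
    """
    height, width = len(frame_f1), len(frame_f1[0])
    best = ((0, 0), sad(frame_f, frame_f1, top, left, top, left, size))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            new_top, new_left = top + dy, left + dx
            inside = 0 <= new_top <= height - size and 0 <= new_left <= width - size
            if not inside:
                continue  # candidate block would fall outside the frame
            cost = sad(frame_f, frame_f1, top, left, new_top, new_left, size)
            if cost < best[1]:
                best = ((dy, dx), cost)
    return best


def make_frame(obj_top, obj_left):
    """6x6 black frame containing a bright 2x2 object at the given position."""
    frame = [[0] * 6 for _ in range(6)]
    for y in range(2):
        for x in range(2):
            frame[obj_top + y][obj_left + x] = 255
    return frame


frame_f = make_frame(1, 1)
frame_f1 = make_frame(2, 3)  # object moved 1 row down and 2 columns right
print(best_motion_vector(frame_f, frame_f1, top=1, left=1, size=2, search_range=2))
# ((1, 2), 0)
```

Applying the estimated vector to every block of the model gives the updated contour position in frame ‘f+1’.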
CLUSTER BASED APPROACHES
§ Cluster-based approaches for video segmentation involve grouping similar pixels
or regions in a video into clusters based on certain criteria, such as color, texture,
motion, or intensity, i.e. their feature vectors.
§ Cluster based approaches involve extracting features from each frame or sequence
of frames in the video, such as color histograms, texture descriptors, motion vectors, or
other relevant characteristics that help in differentiating between regions.
§ Following this, clustering algorithms are applied to group similar pixels or regions
together. This is usually done by projecting the pixels onto a Euclidean feature space and
measuring similarity with a metric such as Euclidean distance.
WHY?
§ Cluster-based methods are relatively simple and straightforward to implement.
§ They can capture complex relationships between pixels or regions based on
various features, such as color, texture, motion, or intensity, allowing them to
segment videos with diverse content and characteristics.
§ Clustering algorithms can scale to handle videos of varying resolutions, frame
rates, and lengths.
§ These algorithms have various parameters that can be adjusted as needed, hence
making them flexible to work with.
VIDEO SEGMENTATION USING K-MEANS
§ Algorithm:
1. Randomly select k points {n1,n2,n3,…..,nk} as the initial centroids of the k clusters in the
feature space.
2. For each pixel p[i], find the cluster centroid n[j] nearest to the pixel’s feature under a
distance measure, and assign the pixel to cluster ‘j’.
3. Recompute each centroid as the mean of the features of the pixels assigned to its cluster.
4. Repeat steps 2 and 3 until convergence, i.e. until the change in the k centroids drops below
a defined threshold.

§ Sensitive to initialization and outliers. Need to pick the optimal number of clusters
for meaningful segmentation.
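The four steps above can be sketched in plain Python as follows. Pixel features are assumed to be RGB tuples, the distance measure is squared Euclidean distance, and the parameter names (max_iters, tol, seed) are illustrative choices, not part of the lecture.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features, k, max_iters=100, tol=1e-6, seed=0):
    """Cluster feature tuples; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(features, k)  # step 1: random initial centroids
    labels = [0] * len(features)
    for _ in range(max_iters):
        # Step 2: assign every pixel to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(f, centroids[j]))
            for f in features
        ]
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [f for f, label in zip(features, labels) if label == j]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep as is
        # Step 4: stop when no centroid moved by more than the tolerance.
        shift = max(squared_distance(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return labels, centroids


# Three dark and three bright pixels, described by their RGB features.
pixels = [
    (10, 10, 10), (12, 11, 9), (8, 9, 10),
    (200, 210, 205), (198, 205, 199), (202, 208, 207),
]
labels, centroids = kmeans(pixels, k=2)
print(labels)
print(centroids)
```

Which cluster is numbered 0 depends on the random initial centroids, which is one face of the initialization sensitivity noted above.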
VIDEO SEGMENTATION USING MEAN-SHIFT
§ Algorithm:
§ For each pixel p[i], take its feature f[i] as the initial mean: m[i] = f[i].
§ Repeat the following for each mean m[i]:
Ø Consider a window of size W around m[i].
Ø Compute the mean, m, of the features within the window and set m[i] = m.
Ø Stop if there is no shift, or if the shift in the mean is less than a defined threshold. m[i] is then the mode.
§ Pixels that have the same mode belong to the same cluster.

§ Computationally expensive, no initialization required, robust to outliers, and
clustering is dependent on window size W.
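A minimal sketch of the mean-shift procedure above, assuming one-dimensional features (gray-level intensities) and a flat window of half-width W. Merging modes that lie within W/2 of each other is an assumption made here to cope with floating-point differences; the slides only say that pixels with the same mode share a cluster.

```python
def mean_shift(features, window, tol=1e-3, max_iters=100):
    """Cluster 1-D features; returns (labels, cluster_modes)."""
    modes = []
    for start in features:
        mean = start  # initial mean m[i] = f[i]
        for _ in range(max_iters):
            neighbours = [f for f in features if abs(f - mean) <= window]
            new_mean = sum(neighbours) / len(neighbours)
            shift = abs(new_mean - mean)
            mean = new_mean
            if shift < tol:  # converged: mean is the mode for this pixel
                break
        modes.append(mean)

    # Pixels whose modes (nearly) coincide belong to the same cluster.
    labels, cluster_modes = [], []
    for mode in modes:
        for index, existing in enumerate(cluster_modes):
            if abs(mode - existing) <= window / 2:
                labels.append(index)
                break
        else:
            cluster_modes.append(mode)
            labels.append(len(cluster_modes) - 1)
    return labels, cluster_modes


intensities = [10, 12, 11, 200, 205, 198]
print(mean_shift(intensities, window=20))
# ([0, 0, 0, 1, 1, 1], [11.0, 201.0])
```

The number of clusters is not given in advance: it falls out of the data and the window size, so a much larger W would merge both groups into one.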
TEMPORAL COHERENCE IN CLUSTER BASED METHODS
§ In most cases, clustering for video segmentation is done frame by frame, with each
frame processed independently of the others. This risks ignoring temporal
consistency in the video.
§ This could occur due to the frames being processed independently, the extrinsic
conditions changing over time, movement of the object being complex, or
misalignment of clustering across the frames.
§ To combat this we integrate temporal coherence methods like optical flow or graph
modelling along with clustering.
§ For example, Graph-based segmentation methods model temporal relationships
between frames as a graph, where nodes represent pixels or regions, and edges
represent temporal connections. Segmentation is then performed by optimizing
the graph structure by analyzing energy/cost functions and likelihood measures to
maintain coherence over time.
HISTOGRAM BASED APPROACHES
§ Histogram-based approaches for video segmentation involve using histograms of
pixel values or features to segment regions/objects in videos.
§ The idea behind histogram-based approaches is that two frames with unchanging
background and unchanging (although moving) objects will have little difference
in their histograms.
§ The video segmentation process of this category of methods involves the
generation of gray level or color histograms, one for each frame, and a pair-wise
comparison of the histogram bins.
§ Hence, this can be a robust method to perform video segmentation.
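The per-frame histogram and the pair-wise bin comparison can be sketched as follows. The 16-bin gray-level histogram and the normalised sum of absolute bin differences are assumptions chosen for illustration; other bin counts and distance measures are equally possible.

```python
def gray_histogram(frame, bins=16, max_value=256):
    """Gray-level histogram of a frame given as a list of rows."""
    hist = [0] * bins
    bin_width = max_value / bins
    for row in frame:
        for value in row:
            hist[min(int(value / bin_width), bins - 1)] += 1
    return hist


def histogram_difference(hist_a, hist_b):
    """Sum of absolute bin differences, normalised to the range [0, 1]."""
    total = sum(hist_a) + sum(hist_b)
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b)) / total


def frame_with_object(top, left):
    """4x4 dark frame containing a bright 2x2 object at (top, left)."""
    frame = [[20] * 4 for _ in range(4)]
    for y in range(2):
        for x in range(2):
            frame[top + y][left + x] = 220
    return frame


frame_a = frame_with_object(0, 0)
frame_b = frame_with_object(2, 2)       # same scene, object has moved
frame_c = [[120] * 4 for _ in range(4)]  # a completely different scene

h_a, h_b, h_c = (gray_histogram(f) for f in (frame_a, frame_b, frame_c))
print(histogram_difference(h_a, h_b))  # 0.0
print(histogram_difference(h_a, h_c))  # 1.0
```

The moving object leaves the histogram unchanged, while the change of scene produces the maximum difference, which is the behaviour the bullets above rely on.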
WHY?
§ Histogram-based approaches are advantageous because they can effectively
capture the statistical distribution of pixel intensities or features in the images.
§ Histograms are invariant to image rotation and change slowly under the variations
of viewing angle and scale. Hence they’re robust in video segmentation tasks and
more efficient.
§ This allows for robust segmentation, particularly in scenarios where the
background remains constant and objects move within the scene.
WORKING
§ Feature Extraction: Relevant features are extracted from each frame of the video.
Histograms are constructed based on the extracted features for each frame. For
example, a color histogram represents the distribution of color values in the frame,
while a texture histogram represents the distribution of texture patterns.
§ Each frame is compared with its previous frame using the histogram difference.
§ Frame dissimilarities are extracted as features.
§ Based on the similarity between histograms, the video frames are segmented.
Thresholding techniques or clustering algorithms, such as K-means clustering or
Fuzzy C-Means clustering, can be applied to the histogram similarity measures to
partition the video frames into coherent segments or clusters.
§ The frames are divided into clusters: Shot Change Frames, Suspected Shot Change
Frames, and No Shot Change frames.

§ All frames in the Shot Change (SC) cluster are selected as Shot Change Frames (SCFs).
§ Possible shot change frames are selected from the Suspected Shot Change (SSC)
cluster using a heuristic approach.
§ The heuristic used in the second step involves checking all SSC frames between
consecutive SC frames. Heuristics involve temporal context analysis, multiple
feature extraction and comparison, thresholding, and so on. Based on the heuristic
the suitable SSC frames are classified as SC frames.
§ The video sequence is segmented into shots based on the identified shot change
frames (SCFs) from both the SC and SSC clusters.
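The three-way grouping and the final split into shots can be sketched as follows. Two fixed thresholds stand in for the clustering step, and the promotion rule (an SSC frame becomes an SC frame when its difference is a local peak and it lies at least min_gap frames from every SC frame) is a hypothetical heuristic invented for this example; the slides do not specify the actual one. Here differences[i] is assumed to be the histogram difference between frame i and frame i-1.

```python
def classify_frames(differences, low, high):
    """Label each frame as SC, SSC or NSC from its histogram difference."""
    labels = []
    for d in differences:
        if d >= high:
            labels.append("SC")
        elif d >= low:
            labels.append("SSC")
        else:
            labels.append("NSC")
    return labels


def promote_suspects(differences, labels, min_gap):
    """Promote SSC frames to SC using the (hypothetical) peak-and-gap rule."""
    sc_positions = [i for i, label in enumerate(labels) if label == "SC"]
    result = list(labels)
    for i, label in enumerate(labels):
        if label != "SSC":
            continue
        left = differences[i - 1] if i > 0 else 0.0
        right = differences[i + 1] if i + 1 < len(differences) else 0.0
        is_peak = differences[i] > left and differences[i] > right
        far_from_cut = all(abs(i - p) >= min_gap for p in sc_positions)
        if is_peak and far_from_cut:
            result[i] = "SC"
    return result


def split_into_shots(labels):
    """Return (first_frame, last_frame) pairs; a new shot starts at each SC frame."""
    shots, start = [], 0
    for i, label in enumerate(labels):
        if label == "SC" and i > start:
            shots.append((start, i - 1))
            start = i
    shots.append((start, len(labels) - 1))
    return shots


differences = [0.0, 0.02, 0.03, 0.9, 0.04, 0.02, 0.45, 0.05, 0.4, 0.85, 0.03]
labels = classify_frames(differences, low=0.3, high=0.7)
final_labels = promote_suspects(differences, labels, min_gap=2)
print(split_into_shots(final_labels))  # [(0, 2), (3, 5), (6, 8), (9, 10)]
```

Frame 6 is promoted because it is an isolated peak, while frame 8 is not, since its difference merely leads into the clear cut at frame 9.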
THANK YOU
