VA Lecture 27

Video Summarization
Aatish Malik (2021UCD2168)
Deepanshu (2021UCD2138)
Tanmay Nagori (2021UCD2141)

What is Video Summarization?
● Video summarization is the process of extracting the most relevant and
informative segments from a video and creating a shorter version that
preserves the main content and context.
● It aims to provide a concise representation of the video's main events, key
moments, or important information.
● Summarization can be done manually by humans or automatically using
algorithms and machine learning techniques.
● Applications of video summarization include content browsing, video
surveillance, education and training, news and media, and medical imaging.
Applications
Educational Videos: facilitate review and revision; learners with limited time or
attention spans can focus on the most important topics
Medical: allow specialists to search for similar cases of a procedure and go straight to the
details of an old case
Surveillance: generate a summary of surveillance videos that includes a specific activity,
a specific person or a specific object
Entertainment: generate a movie trailer instead of creating one manually
Sports: generate highlights of sports video recordings
News: quickly spot the important patterns shown in the news
Drones: highlight key events, anomalies, or points of interest
Let’s look at an example
Summarized video (Target)
What makes good video summaries
● Accuracy: contains main objects of interest
● Relevance: contains interesting actions/events
● Concise and Comprehensive: to the point, non-repetitive, has sufficient context
● Coherent storyline: should be engaging, visually appealing and accessible
Steps in Video Summarization Techniques using Deep Learning
Step 1: Analyze Information Sources:
Each information source needs to be analyzed so that the primary
information content can be recognized and used further.
This means breaking down each frame into its constituent elements, such as
objects, scenes, motion, and other visual features.
Commonly done using deep learning techniques like CNN.
Step 2: Measure of Relevance:
The relevance of the information content, whether generic or specialized to a
certain issue, is measured using feature-based or semantic approaches.
Deep learning models extract features from the information sources
and assign relevance scores to different elements within the video.
Step 3: Synthesize Appropriate Output:
The extracted data is structured in an understandable format and
represented as accurately as feasible as the output of the model.
The summary is generated based on the assigned relevance scores and
is represented in a specified format, like a summary video or a text
summary.
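The three steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the hand-written `frame_features` (mean brightness and contrast) stands in for CNN features, and `relevance_scores` (change from the previous frame) stands in for a learned relevance model. All function names are hypothetical and do not come from the slides.

```python
from typing import List, Sequence


def frame_features(frame: Sequence[Sequence[int]]) -> List[float]:
    """Step 1 (stand-in for a CNN): describe a grayscale frame by mean and contrast."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    contrast = max(pixels) - min(pixels)
    return [mean, float(contrast)]


def relevance_scores(features: List[List[float]]) -> List[float]:
    """Step 2 (stand-in for a learned model): score = change from the previous frame."""
    if not features:
        return []
    scores = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(features, features[1:]):
        scores.append(sum(abs(a - b) for a, b in zip(prev, cur)))
    return scores


def select_keyframes(scores: List[float], k: int) -> List[int]:
    """Step 3: keep the k highest-scoring frames, returned in temporal order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def summarize(frames, k: int) -> List[int]:
    """Run the three steps and return the indices of the selected frames."""
    feats = [frame_features(f) for f in frames]
    return select_keyframes(relevance_scores(feats), k)
```

Calling `summarize(frames, k)` returns frame indices, which could then be cut into a summary video or described as text, depending on the chosen output format.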
Techniques of video summarization
Video Summarization using Deep Learning
● Fully convolutional sequence network for video summarization
● Learning video summarization from unpaired data
References used:
https://www.youtube.com/watch?v=dHp5I0m9_zA
https://paperswithcode.com/task/video-summarization
https://link.springer.com/article/10.1007/s10462-023-10444-0
FULLY CONVOLUTIONAL SEQUENCE NETWORK (FCSN)
Why FCSN?
In previous work, such as video summarization using LSTM (Zhang et al.),
n frames were passed to the LSTM network and an output of 0 or 1 was
received for each frame.
PROS: It was good at finding long-term structural dependencies among frames.
CONS: The computation was hard to parallelize, because an LSTM processes
frames sequentially.
Semantic segmentation use in Video Summarization
Credit: Centre for health care innovation
FCSN Case study:
[Figure: FCSN architecture. Encoder: 1D convolution and pooling. Decoder: 1D deconvolution and unpooling. Labels: "Feature map"; example per-frame outputs 0, 0, 0, 1.]
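The encoder-decoder idea (1D convolution and pooling along time, then unpooling and deconvolution back to one label per frame) can be sketched as follows. This is a toy illustration, not the FCSN model itself: each frame is reduced to a single number, the kernel weights are fixed and hypothetical (a real network learns them), and a second symmetric convolution stands in for the 1D deconvolution.

```python
from typing import List, Tuple


def conv1d(x: List[float], kernel: List[float]) -> List[float]:
    """Same-length temporal convolution with zero padding (odd kernel length)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(w * padded[t + j] for j, w in enumerate(kernel))
        for t in range(len(x))
    ]


def max_pool(x: List[float], size: int = 2) -> Tuple[List[float], List[int]]:
    """Encoder: temporal max pooling that remembers where each maximum came from."""
    pooled, argmax = [], []
    for start in range(0, len(x), size):
        window = x[start:start + size]
        best = max(range(len(window)), key=lambda i: window[i])
        pooled.append(window[best])
        argmax.append(start + best)
    return pooled, argmax


def unpool(pooled: List[float], argmax: List[int], length: int) -> List[float]:
    """Decoder: put each pooled value back at its original time step."""
    out = [0.0] * length
    for value, idx in zip(pooled, argmax):
        out[idx] = value
    return out


def label_frames(per_frame_feature: List[float], threshold: float = 0.5) -> List[int]:
    """Return one 0/1 label per input frame."""
    kernel = [0.25, 0.5, 0.25]  # hypothetical fixed weights, not learned
    encoded, argmax = max_pool(conv1d(per_frame_feature, kernel))
    decoded = unpool(encoded, argmax, len(per_frame_feature))
    scores = conv1d(decoded, kernel)  # stands in for the 1D deconvolution
    return [1 if s > threshold else 0 for s in scores]
```

Because every operation here works on the whole sequence at once rather than step by step, the per-frame outputs do not depend on a recurrent state, which is what makes this family of models easier to parallelize than an LSTM.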
Loss functions used:
Challenges with FCSN
It is very difficult to get paired data.
[Figure: each training video shown next to its frame labels (0101010010110, 0100101010110, 1001001010111). Labels: "Summary videos", "Frame labels for each video".]
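Learning from unpaired data
To learn from unpaired data we use a two-player game approach.
[Figure: two-player game. The raw video goes to the keyframe selector network, which outputs per-frame selections (e.g. 010101101010101) that form the fake/predicted summary. The summary discriminator network receives the predicted summary and a real summary (from training data) and decides: Real or Fake. Labels: "Raw video", "Summarised video".]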
Thank You
SUSPICIOUS ACTIVITY DETECTION
Video Analytics
Tanmay Nagori (2021UCD2141)

What is a suspicious activity?
"Suspicious activity" typically refers to any behavior or action that
appears unusual, out of the ordinary, or potentially indicative of
wrongdoing or a security threat. In various contexts, what constitutes
suspicious activity can vary, but it often involves actions that deviate
from expected patterns or norms.
ABSTRACT
• Suspicious human activity recognition from surveillance video is an
active research area of image processing and computer vision.
• Through visual surveillance, human activities can be monitored in
sensitive and public areas such as stations, airports, schools and
colleges, roads, etc. to prevent terrorism, theft, accidents, illegal
parking, vandalism, fighting, chain snatching, crime and other
suspicious activities.
• It is very difficult to watch public places continuously; therefore an
intelligent video surveillance system is required that can monitor human
activities in real time, categorize them as usual or unusual
activities, and generate an alert.
Suspect
Another example of Suspect
• Suspicious activity detection involves the use of algorithms and
technologies to identify behavior or events that deviate from normal
patterns, indicating potential threats or illegal actions.
• This can apply to various domains, such as cybersecurity, finance, or
surveillance. Techniques include anomaly detection, machine learning, and
pattern recognition to flag activities that might be indicative of fraud,
security breaches, or other unauthorized actions.
• Goal: to enhance proactive monitoring and security measures by quickly
identifying and responding to potentially harmful activities.
Non Suspect (not within range)
Requirements:
❖ Video camera
To capture activities.
❖ Server
To run code and give continuous
monitoring results.
GENERAL SYSTEM ARCHITECTURE AND DESIGN:
• Preprocessing:
• Resize
• Grayscale conversion - An image conversion technique from digital
photography. It removes all color information and leaves only
different shades of gray, the brightest being white and the darkest
black. It makes processing fast and efficient.
• Grayscale reduces each pixel to a single intensity value.
• Feature extraction:
• LBP - Local Binary Pattern (LBP) is an effective texture descriptor for images
which thresholds the neighboring pixels based on the value of the current
pixel. LBP descriptors efficiently capture the local spatial patterns and the gray
scale contrast in an image.
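The Resize and grayscale conversion steps listed under Preprocessing can be sketched as below. The slides do not say which conversion formula or resizing method is used, so this sketch assumes the widely used ITU-R BT.601 luma weights and nearest-neighbor resizing; the function names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def to_grayscale(image: List[List[Pixel]]) -> List[List[int]]:
    """Map each (R, G, B) pixel to one intensity value.

    Assumption: BT.601 luma weights (0.299, 0.587, 0.114).
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]


def resize_nearest(gray: List[List[int]], new_h: int, new_w: int) -> List[List[int]]:
    """Nearest-neighbor resize: each output pixel copies the closest source pixel."""
    h, w = len(gray), len(gray[0])
    return [
        [gray[y * h // new_h][x * w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]
```

After conversion, each pixel carries one value instead of three, which is where the speed-up in later processing comes from.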
Local Binary Pattern (LBP)
Visualization of calculation of Local Binary Pattern (LBP):
➢ An example region of the original image is examined, with neighborhood
parameters of R = 1 (radius) and N = 8 (number of neighbors).
➢ Neighboring pixels are compared to the center pixel: in the usual convention,
neighbors greater than or equal to the center pixel are assigned 1, smaller ones 0.
➢ The binary values are concatenated into a single binary number.
➢ Its decimal value is stored in a matrix with the same width and height as
the original image, at the same position as the input center pixel.
➢ This is done for every pixel of the image. The LBP matrix can be represented as a
histogram, which is treated as the feature vector of the original image.
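The LBP calculation described above, with R = 1 and N = 8, can be sketched as follows. It assumes the common convention in which a neighbor at least as bright as the center contributes a 1 bit; with the opposite assignment the bits are simply inverted. The clockwise bit order starting at the top-left neighbor and the skipping of border pixels are also assumptions of this sketch, and the function names are hypothetical.

```python
from typing import List

# Neighbor offsets (dy, dx) for R = 1, N = 8, clockwise from the top-left corner.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_image(gray: List[List[int]]) -> List[List[int]]:
    """Return a matrix of LBP codes with the same size as the input.

    Border pixels lack a full neighborhood and are left at 0 here.
    """
    h, w = len(gray), len(gray[0])
    codes = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = gray[y][x]
            code = 0
            for dy, dx in OFFSETS:
                bit = 1 if gray[y + dy][x + dx] >= center else 0
                code = (code << 1) | bit  # concatenate the bits
            codes[y][x] = code
    return codes


def lbp_histogram(codes: List[List[int]]) -> List[int]:
    """256-bin histogram of the interior LBP codes, used as the feature vector."""
    hist = [0] * 256
    for row in codes[1:-1]:
        for code in row[1:-1]:
            hist[code] += 1
    return hist
```

For the 3x3 patch `[[5, 9, 1], [4, 6, 7], [2, 3, 8]]`, the center value 6 yields the bit string 01011000, which is 88 in decimal.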
• Dataset Splitting – Train and Test
• Classification
• CNN
• ECNN
• Performance
• Accuracy
• Error rate
• Recognition
• Suspicious Activity Detection
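The Dataset Splitting and Performance steps in the list above can be sketched as below. The classifier itself (CNN or ECNN) is out of scope here; the sketch only shows a shuffled train/test split and how accuracy and error rate relate. The 80/20 split ratio, the fixed seed, and the function names are assumptions, not values from the slides.

```python
import random
from typing import List, Sequence, Tuple


def train_test_split(
    samples: Sequence, labels: Sequence, test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List, List, List, List]:
    """Shuffle indices once, then cut them into a test part and a train part."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [samples[i] for i in train_idx],
        [samples[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def error_rate(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that are wrong, i.e. 1 - accuracy."""
    return 1.0 - accuracy(y_true, y_pred)
```

Accuracy and error rate are computed on the test part only, so that the reported performance reflects clips the classifier has not seen during training.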
Challenges:
• Complex Environments: Video surveillance often operates in complex and dynamic environments
with varying lighting conditions, occlusions, shadows, and cluttered backgrounds. These factors
can make it difficult to accurately detect and classify suspicious activities.
• False Alarms: Automated systems may generate false alarms due to factors such as
environmental changes, transient events, or benign behaviors that are misinterpreted as
suspicious. False alarms can overwhelm security personnel and reduce the effectiveness of the
system.
• Anomaly Detection: Identifying anomalous behavior requires sophisticated algorithms capable of
distinguishing between normal activities and genuinely suspicious actions. Developing accurate
anomaly detection models requires large datasets for training and validation.
• Data Privacy Concerns: Video surveillance raises concerns about privacy and civil liberties,
particularly when deploying automated surveillance systems capable of analyzing individuals'
behavior. Striking a balance between security needs and privacy rights is a significant challenge.
• Scalability: Scaling suspicious activity detection systems to large-scale deployments, such as smart
cities or extensive transportation networks, presents technical challenges in terms of processing
power, storage, and bandwidth requirements.
• Real-Time Processing: Many applications require real-time processing of video feeds to detect
and respond to suspicious activities promptly. Achieving low-latency processing while maintaining
high accuracy is challenging, especially in resource-constrained environments.
Applications
• Security and Surveillance: Video cameras are extensively used in security and surveillance
systems to monitor and detect suspicious activities in public places, airports, train stations, banks,
and other high-security areas. This includes detecting unauthorized access, loitering, vandalism,
theft, or any other behavior that deviates from normal patterns.
• Retail Loss Prevention: In retail environments, video cameras are used to detect suspicious
behaviors such as shoplifting, fraudulent returns, or other forms of retail theft. Advanced
analytics can help identify unusual patterns of behavior, such as individuals spending excessive
time in specific areas or attempting to conceal merchandise.
• Smart Cities and Public Safety: Video surveillance is a key component of smart city initiatives
aimed at enhancing public safety. Cameras deployed in urban areas can detect suspicious
activities such as traffic violations, accidents, unauthorized gatherings, or other forms of
antisocial behavior. This information can be used by law enforcement agencies to respond quickly
and effectively to incidents.
• Border Security and Immigration Control: Video surveillance is critical for monitoring
border areas and identifying potential security threats such as illegal border crossings,
smuggling activities, or suspicious behavior near border checkpoints. Automated systems
can analyze video feeds in real-time to detect anomalies and alert border patrol agents.
• Transportation Security: Video cameras installed in transportation hubs such as airports,
train stations, and bus terminals help identify suspicious behavior such as unattended
bags, unauthorized access to restricted areas, or individuals acting suspiciously in
crowded environments. This enhances passenger safety and security measures.
• Banking and Financial Institutions: Video surveillance is essential for securing banking
facilities and ATMs against threats such as robbery, fraud, or unauthorized access.
Suspicious activity detection systems can analyze video feeds to identify unusual
behavior inside bank branches or around ATM locations, triggering alerts for immediate
response.
• Critical Infrastructure Protection: Video cameras are deployed to monitor critical
infrastructure facilities such as power plants, water treatment plants, or nuclear facilities.
Suspicious activity detection helps identify potential threats such as trespassing,
sabotage, or vandalism, allowing security personnel to take appropriate action to
mitigate risks.
THANK YOU
