
VIDEO SURVEILLANCE

DATA ENGINEERING REPORT

SCENARIO:

You are part of a team building a real-time surveillance system that performs
analytics on live camera streams. The system needs to process video frames in real time,
apply analytics, and store or transmit the results. You are tasked with developing a
critical component of the system.

SCOPE:

The user might ask for the camera footage for a particular time period. Given a
timestamp and a duration, the application must return the corresponding footage to the
user. Both the timestamp and the duration of the video are supplied by the user.
VIDEO SURVEILLANCE PIPELINE:

DATA INGESTION:

The first step in the pipeline is to ingest data (a video stream) from camera-1. The
video stream is captured using the opencv-python library.

COMPUTE:

EXPLORE AND PREPARE:

Data from the video stream is analysed and prepared to meet the requirement of 25 fps.

TRANSFORM AND ENRICH:

After preparation, the data is grouped into batches based on the video length and the stream duration.

STORAGE:

Here, raw data is stored as JSON files. For conversion and future data analysis, the
data is loaded into Microsoft Azure SQL Database, which acts as the data warehouse.

MONITORING AND MANAGING:

Here, monitoring is done with Python's logging module, and errors are handled using
exception handling.

DATA VISUALIZATION:

In this task we did not use a data visualization tool such as Power BI. As a data
engineering pipeline, visualization is shown as the final stage; a data scientist would
take care of that task downstream.

REQUIREMENTS:
In Tasks 1 and 2, several libraries are used based on the requirements.
Before working with the code, install these libraries
(json, os and time are part of the Python standard library and need no separate installation):
1. Python
2. opencv-python (cv2)
3. json
4. imageio
5. os
6. time
7. pyodbc

Microsoft Azure SQL Database is also used for data storage in relational format,
so you need an Azure account and a SQL database in a resource group.

TASK 1:

Write Python code for a real-time video analytics pipeline that performs the following
tasks. For any configuration-related tasks, create a config file, and create a
SQL database for storing the information.

I. VIDEO STREAM INGESTION:

CODE:

import cv2
import json
import time
import os
import math
import logging
import pyodbc

class img_stream_process:
    def __init__(self, camera_id, geo_location, output_folder, config_filename):
        self.camera_id = camera_id
        self.geo_location = geo_location
        self.output_folder = output_folder
        self.frame_id = 0
        self.cap = cv2.VideoCapture(camera_id)
        self.config_filename = config_filename
        self.batch_duration, self.batch_size = self.read_config_duration()

        # Create the output folder
        os.makedirs(output_folder, exist_ok=True)

        # JSON data list and batch list
        self.json_data = []
        self.batches = []

    def read_config_duration(self):
        # Read the batch duration and batch size from the config file
        with open(self.config_filename, "r") as config_file:
            config_data = json.load(config_file)
        duration = config_data.get("duration")
        batch_size = config_data.get("batch_size")
        return duration, batch_size

    def video_capture(self):
        # Check if the video stream was opened successfully
        if not self.cap.isOpened():
            print("Error: Could not open video source.")
            exit()

COMMENT:

In this code we first import all the required libraries. This is followed by the class
"img_stream_process", where the instance is initialized (self.cap is the video capture
object) and two methods are defined: one that reads the JSON config file and one that
checks whether the video source was opened successfully.
The video input captured here is passed on to the next stage of the pipeline.
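Since read_config_duration expects "duration" and "batch_size" keys, the config file
referenced above might look like the following minimal sketch (the file name config.json
and the values shown are assumptions, not part of the original code):

config.json:
{
    "duration": 100,
    "batch_size": 25
}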

II. FRAME PROCESSING:

CODE:

def frame_processing(self):
    # Set the desired frames per second (fps)
    desired_fps = 25

    frame_count = 0
    frame_to_save = None

    while True:
        # Read a frame from the video stream
        ret, frame = self.cap.read()

        # Check if the frame was successfully read
        if not ret:
            print("Error: Could not read frame.")
            break

        frame_count += 1

        # Save one frame per second as an image file
        if frame_count % desired_fps == 0:
            outframe_id = frame_count // desired_fps
            frame_to_save = frame  # Save the frame to be reused
            frame_filename = os.path.join(self.output_folder,
                                          f"frame_{outframe_id}.jpg")
            cv2.imwrite(frame_filename, frame_to_save)  # Save the frame as an image

            # Append frame info to the JSON data list
            frame_info = {
                "camera_id": self.camera_id,
                "frame_id": outframe_id,
                "geo_location": self.geo_location,
                "image_path": frame_filename
            }
            self.json_data.append(frame_info)

        # Reuse the saved frame for the next 24 frames within the same second
        if frame_to_save is not None:
            cv2.imshow("Frame", frame_to_save)

        # Exit the loop when the 'r' key is pressed
        if cv2.waitKey(1) & 0xFF == ord('r'):
            break

    # Release the video capture object and close any open windows
    self.cap.release()
    cv2.destroyAllWindows()

COMMENT:

Based on the requirements given in the scenario, 25 frames are captured per second, but
only one frame per second is saved; the remaining 24 frames reuse the same saved frame.
The function sets up desired_fps, frame_count and frame_to_save. The while loop reads
each frame and, once per second, writes the current frame to an image file.
For each saved frame, a record with frame_id, geo_location, camera_id and the image file
path is appended to the JSON data list, and this continues for as long as the video is
streamed. Streaming stops when the letter "r" is pressed.
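Task 2 later reads the per-frame metadata from a JSON file ("frame_info.json"), while the
code above only keeps self.json_data in memory. A minimal sketch of a helper that could
write it out (the method name and default file name are assumptions):

def save_frame_info(self, filename="frame_info.json"):
    # Write the accumulated per-frame metadata to a JSON file so that
    # later stages (e.g. Task 2) can read it back
    with open(filename, "w") as json_file:
        json.dump(self.json_data, json_file, indent=4)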

III. Batching:

CODE:

def batch_frames(self):
    # Calculate the number of batches.
    num_batches = math.ceil(self.batch_duration / self.batch_size)

    # Create a dictionary for each batch.
    for i in range(num_batches):
        batch_id = i + 1

        # Calculate the starting and ending frame IDs for the batch.
        starting_frame_id = batch_id * self.batch_size - self.batch_size + 1
        ending_frame_id = min(starting_frame_id + self.batch_size - 1,
                              self.batch_duration)

        # Calculate the timestamp of the batch.
        timestamp = starting_frame_id / self.batch_duration

        # Add the batch to the list.
        batch = {
            "batch_id": batch_id,
            "starting_frame_id": starting_frame_id,
            "ending_frame_id": ending_frame_id,
            "timestamp": timestamp
        }
        self.batches.append(batch)
    return self.batches

COMMENT:

In this process the video duration and the batch size are specified in the config file, as
per the requirements.
The method uses those values to build, for each batch, a dictionary containing batch_id,
starting_frame_id, ending_frame_id and timestamp; each dictionary is appended to a list
called batches.
The number of batches is ceil(duration / batch_size), and the starting frame ID is
calculated as "batch_id * self.batch_size - self.batch_size + 1". For example, with
duration = 100 and batch_size = 25 there are 4 batches, and batch 2 covers frames 26 to 50
with timestamp 26/100 = 0.26.
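A short usage sketch tying the stages together and persisting the batch metadata for the
storage step below (the camera index, geo-location, folder and file names are assumed
example values, not part of the original code):

processor = img_stream_process(camera_id=0,
                               geo_location="17.38,78.48",   # assumed example location
                               output_folder="frames",
                               config_filename="config.json")
processor.video_capture()
processor.frame_processing()
batches = processor.batch_frames()

# Write the batch metadata to the file read by the storage script below
with open("batch_info.json", "w") as f:
    json.dump(batches, f, indent=4)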
IV. DATA STORAGE:

CODE:

import json
import pyodbc

# Create a connection to the database
conn = pyodbc.connect(Driver='{ODBC Driver 18 for SQL Server}',
                      Server="covid19-srv01.database.windows.net",
                      Database="covid-db", Uid="myadmin", Pwd="Hooked@8",
                      Encrypt="yes", TrustServerCertificate="no",
                      Connection_Timeout=30)

# Create a cursor object for executing SQL queries
cursor = conn.cursor()

with open('batch_info.json', 'r') as json_file:
    data = json.load(json_file)

# Define the INSERT SQL statement
sql_insert = ("INSERT INTO [dbo].[mytable1] "
              "(batch_id, starting_frame_id, ending_frame_id, timestamp) "
              "VALUES (?, ?, ?, ?)")

# Insert records from the JSON data
for record in data:
    cursor.execute(sql_insert,
                   (record['batch_id'], record['starting_frame_id'],
                    record['ending_frame_id'], record['timestamp']))

# Commit the transaction
conn.commit()

# Close the cursor and connection
cursor.close()
conn.close()

COMMENT:

Here we connect to Azure SQL Database using the pyodbc library, a Python library for
database connections. It relies on ODBC (Open Database Connectivity, from Microsoft),
and the ODBC driver must be pre-installed on the system.
The connection is built from the ODBC connection parameters. The JSON file is then
opened, and the data present in it is inserted into the database using a DML (INSERT)
query. At the end, the connection must be closed.
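The INSERT statement assumes that the target table already exists. A possible one-time
table definition matching the inserted columns (the data types shown here are
assumptions) could be executed over the same connection:

create_table_sql = """
IF OBJECT_ID('dbo.mytable1', 'U') IS NULL
CREATE TABLE dbo.mytable1 (
    batch_id          INT,
    starting_frame_id INT,
    ending_frame_id   INT,
    [timestamp]       FLOAT
);
"""
cursor.execute(create_table_sql)
conn.commit()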

V. Error Handling and Logging:

def setup_logger(self):
    logger = logging.getLogger("frame_processor")
    logger.setLevel(logging.DEBUG)

    # Create a file handler and set the log level
    log_file = os.path.join(self.output_folder, "frame_processor.log")
    file_handler = logging.FileHandler(log_file)
    file_handler.setLevel(logging.DEBUG)

    # Create a console handler and set the log level
    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.INFO)

    # Create a formatter and set it for the handlers
    formatter = logging.Formatter(
        "%(asctime)s [%(levelname)s] %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S"
    )
    file_handler.setFormatter(formatter)
    console_handler.setFormatter(formatter)

    # Add the handlers to the logger
    logger.addHandler(file_handler)
    logger.addHandler(console_handler)

    return logger

COMMENT:

Pipeline activity is logged to both a log file and the console, as required.
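A sketch of how the logger could be attached to the class and used (the exact call sites
are assumptions; setup_logger itself is shown above):

# In __init__, after the output folder has been created (assumed placement):
self.logger = self.setup_logger()

# Inside the processing methods, events can then be logged, for example:
self.logger.info("Started capture from camera %s", self.camera_id)
self.logger.debug("Saved frame %s to %s", outframe_id, frame_filename)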

CODE:

def process_frame(self, frame, timestamp):
    try:
        # Write one frame per second as an image file
        if self.frame_id % 25 == 0:
            image_filename = os.path.join(self.output_folder,
                                          f"frame_{self.frame_id // 25}.jpg")
            cv2.imwrite(image_filename, frame)

            # Create JSON object for the frame
            frame_info = {
                "camera_id": self.camera_id,
                "frame_id": self.frame_id,
                "geo_location": self.geo_location,
                "image_path": os.path.abspath(image_filename),
                "timestamp": timestamp,
            }

            # Append frame info to the JSON data list
            self.json_data.append(frame_info)

        # Check if a new batch needs to be created
        # (self.current_batch, self.duration and create_new_batch are
        # assumed to be defined elsewhere in the class)
        if self.current_batch is None or \
                timestamp - self.current_batch["timestamp"] >= self.duration:
            self.create_new_batch()

        # Increment frame ID
        self.frame_id += 1

    except Exception as e:
        self.logger.error(f"Error processing frame: {str(e)}")

COMMENT:

Exceptions raised while processing a frame are caught and logged, as required.

VI. Concurrency and Performance:

CODE:

Link:

Comment:
I changed the code so that it works with two cameras.
Its performance depends on the CPU and on how quickly the cameras' data can be processed.
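The modified two-camera code is only referenced by the link above; a minimal sketch of
how two cameras could be processed concurrently with threads (the camera indices, output
folders and config file name are assumptions):

import threading

def run_camera(camera_id, output_folder):
    # One img_stream_process instance per camera, each running in its own thread
    processor = img_stream_process(camera_id=camera_id,
                                   geo_location="17.38,78.48",  # assumed example location
                                   output_folder=output_folder,
                                   config_filename="config.json")
    processor.video_capture()
    processor.frame_processing()

threads = [
    threading.Thread(target=run_camera, args=(0, "frames_cam0")),
    threading.Thread(target=run_camera, args=(1, "frames_cam1")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

Note that OpenCV's GUI call (cv2.imshow) is not always reliable outside the main thread,
so the display step may need to be disabled when running multiple cameras this way.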

TASK 2:
Write a user-driven Python program that accepts:

➢ TIMESTAMP

➢ DURATION OF THE VIDEO FILE from the user.

Based on the above information, iterate through the batch information in the database.
Create metadata from it that will help in gathering the frame information from the JSON
file. Once the necessary frames are gathered, convert them to an MP4 file and present
them to the user.
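The database-query part of the solution is provided as a link below; a minimal sketch of
how the user input could be used to select the matching batches (the connection details
and table mirror the Task 1 storage code, while the filtering logic, variable names and
units are assumptions):

import pyodbc

# User input: timestamp of the requested footage and its duration
start_ts = float(input("Enter timestamp: "))
duration = float(input("Enter duration: "))

conn = pyodbc.connect(Driver='{ODBC Driver 18 for SQL Server}',
                      Server="covid19-srv01.database.windows.net",
                      Database="covid-db", Uid="myadmin", Pwd="Hooked@8",
                      Encrypt="yes", TrustServerCertificate="no",
                      Connection_Timeout=30)
cursor = conn.cursor()

# Collect the batches whose timestamp falls inside the requested window
cursor.execute(
    "SELECT batch_id, starting_frame_id, ending_frame_id, timestamp "
    "FROM [dbo].[mytable1] WHERE timestamp >= ? AND timestamp < ?",
    start_ts, start_ts + duration)
columns = [col[0] for col in cursor.description]
batch_metadata = [dict(zip(columns, row)) for row in cursor.fetchall()]
conn.close()

# batch_metadata now identifies which frame IDs to read from frame_info.json
print(batch_metadata)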

CODE:

LINK

Comment:

• Based on the requirement, the input is given by the user and sent to the database.
From the database the metadata is collected.
• Code:

import imageio
import json

# Load the frame metadata from the JSON file
with open("frame_info.json", "r") as json_file:
    frame_metadata = json.load(json_file)

# Create a list to store frames
frames = []

# Loop through the frame metadata and add frames to the list
for frame_info in frame_metadata:
    frame = imageio.imread(frame_info["image_path"])  # Load the frame image
    frames.append(frame)

# Define the output video file path
output_video_path = "output_video.mp4"

# Create the MP4 video from the list of frames
imageio.mimsave(output_video_path, frames, fps=25)  # Adjust the frame rate (fps) as needed

print(f"Video saved to {output_video_path}")

Every gathered frame is converted into a video, as required.
Error handling and logging are the same as in the previous task.
Across these two tasks we used real-time video data that was extracted, transformed and
loaded into the data warehouse.
It is a complete ETL process for streaming data.
