
VIDEO SURVEILLANCE

DATA ENGINEERING REPORT

SCENARIO:

You are part of a team building a real-time surveillance system that performs
analytics on live camera streams. The system needs to process video frames in real time,
apply analytics, and store or transmit the results. You are tasked with developing a
critical component of the system.

SCOPE:

The user might ask for the camera footage for a particular time period. Given a
timestamp and a duration, the application must return the corresponding footage to the
user. Both the timestamp and the duration of the video are supplied by the user.
VIDEO SURVEILLANCE PIPELINE:

DATA INGESTION:

The first step in the pipeline is to ingest data (a video stream) from camera-1. The
video stream is captured using the opencv-python library.

COMPUTE:

EXPLORE AND PREPARE:

Data from the video stream is analysed and prepared to meet the requirement of 25 fps.

TRANSFORM AND ENRICH:

After preparation, the data is grouped into batches based on the video length and the stream duration.

STORAGE:

Here, raw data is stored as JSON files. For conversion and future data analysis, the
data is loaded into Microsoft Azure SQL Database, which acts as the data warehouse.

MONITORING AND MANAGING:

Here, monitoring is done with Python's logging module, and errors are handled using
exception handling.

DATA VISUALIZATION:

In this task we did not use a data visualization tool such as Power BI. As a data
engineering pipeline, visualization is shown as the final stage; a data scientist would
take care of that task downstream.

REQUIREMENTS:
In Tasks 1 and 2, several libraries are used based on the requirements.
Before working with the code, install these libraries
(json, os and time are part of the Python standard library and need no separate installation):
1. Python
2. opencv-python (cv2)
3. json
4. imageio
5. os
6. time
7. pyodbc

Microsoft Azure SQL Database is also used for data storage in relational format,
so you need an Azure account and a SQL database in a resource group.

TASK 1:

Write Python code for a real-time video analytics pipeline that performs the following
tasks. For any configuration-related tasks, create a config file, and create a
SQL database for storing the information.

I. VIDEO STREAM INGESTION:

CODE:

import cv2
import json
import time
import os
import math
import logging
import pyodbc

class img_stream_process:
    def __init__(self, camera_id, geo_location, output_folder, config_filename):
        self.camera_id = camera_id
        self.geo_location = geo_location
        self.output_folder = output_folder
        self.frame_id = 0
        self.cap = cv2.VideoCapture(camera_id)
        self.config_filename = config_filename
        self.batch_duration, self.batch_size = self.read_config_duration()

        # Create the output folder
        os.makedirs(output_folder, exist_ok=True)

        # JSON data list and batch list
        self.json_data = []
        self.batches = []

    def read_config_duration(self):
        # Read the batch duration and batch size from the config file
        with open(self.config_filename, "r") as config_file:
            config_data = json.load(config_file)
        duration = config_data.get("duration")
        batch_size = config_data.get("batch_size")
        return duration, batch_size

    def video_capture(self):
        # Check if the video stream was opened successfully
        if not self.cap.isOpened():
            print("Error: Could not open video source.")
            exit()

COMMENT:

In this code we first import all the required libraries. This is followed by the class
"img_stream_process", where the instance is initialized (self.cap is the video capture
object) and two methods are defined: one that reads the JSON config file and one that
checks whether the video source was opened successfully.
The video input captured here is passed on to the next stage of the pipeline.
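Since read_config_duration expects "duration" and "batch_size" keys, the config file
referenced above might look like the following minimal sketch (the file name config.json
and the values shown are assumptions, not part of the original code):

config.json:
{
    "duration": 100,
    "batch_size": 25
}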

II. FRAME PROCESSING:

CODE:

def frame_processing(self):
    # Set the desired frames per second (fps)
    desired_fps = 25

    frame_count = 0
    frame_to_save = None

    while True:
        # Read a frame from the video stream
        ret, frame = self.cap.read()

        # Check if the frame was successfully read
        if not ret:
            print("Error: Could not read frame.")
            break

        frame_count += 1

        # Save one frame per second as an image file
        if frame_count % desired_fps == 0:
            outframe_id = frame_count // desired_fps
            frame_to_save = frame  # Save the frame to be reused
            frame_filename = os.path.join(self.output_folder,
                                          f"frame_{outframe_id}.jpg")
            cv2.imwrite(frame_filename, frame_to_save)  # Save the frame as an image

            # Append frame info to the JSON data list
            frame_info = {
                "camera_id": self.camera_id,
                "frame_id": outframe_id,
                "geo_location": self.geo_location,
                "image_path": frame_filename
            }
            self.json_data.append(frame_info)

        # Reuse the saved frame for the next 24 frames within the same second
        if frame_to_save is not None:
            cv2.imshow("Frame", frame_to_save)

        # Exit the loop when the 'r' key is pressed
        if cv2.waitKey(1) & 0xFF == ord('r'):
            break

    # Release the video capture object and close any open windows
    self.cap.release()
    cv2.destroyAllWindows()

COMMENT:

Based on the requirements given in the scenario, 25 frames are captured per second, but
only one frame per second is saved; the remaining 24 frames reuse the same saved frame.
The function sets up desired_fps, frame_count and frame_to_save. The while loop reads
each frame and, once per second, writes the current frame to an image file.
For each saved frame, a record with frame_id, geo_location, camera_id and the image file
path is appended to the JSON data list, and this continues for as long as the video is
streamed. Streaming stops when the letter "r" is pressed.
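Task 2 later reads the per-frame metadata from a JSON file ("frame_info.json"), while the
code above only keeps self.json_data in memory. A minimal sketch of a helper that could
write it out (the method name and default file name are assumptions):

def save_frame_info(self, filename="frame_info.json"):
    # Write the accumulated per-frame metadata to a JSON file so that
    # later stages (e.g. Task 2) can read it back
    with open(filename, "w") as json_file:
        json.dump(self.json_data, json_file, indent=4)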

III. Batching:

CODE:

def batch_frames(self):
    # Calculate the number of batches.
    num_batches = math.ceil(self.batch_duration / self.batch_size)

    # Create a dictionary for each batch.
    for i in range(num_batches):
        batch_id = i + 1

        # Calculate the starting and ending frame IDs for the batch.
        starting_frame_id = batch_id * self.batch_size - self.batch_size + 1
        ending_frame_id = min(starting_frame_id + self.batch_size - 1,
                              self.batch_duration)

        # Calculate the timestamp of the batch.
        timestamp = starting_frame_id / self.batch_duration

        # Add the batch to the list.
        batch = {
            "batch_id": batch_id,
            "starting_frame_id": starting_frame_id,
            "ending_frame_id": ending_frame_id,
            "timestamp": timestamp
        }
        self.batches.append(batch)
    return self.batches

COMMENT:

In this process the video duration and the batch size are specified in the config file, as
per the requirements.
The method uses those values to build, for each batch, a dictionary containing batch_id,
starting_frame_id, ending_frame_id and timestamp; each dictionary is appended to a list
called batches.
The number of batches is ceil(duration / batch_size), and the starting frame ID is
calculated as "batch_id * self.batch_size - self.batch_size + 1". For example, with
duration = 100 and batch_size = 25 there are 4 batches, and batch 2 covers frames 26 to 50
with timestamp 26/100 = 0.26.
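A short usage sketch tying the stages together and persisting the batch metadata for the
storage step below (the camera index, geo-location, folder and file names are assumed
example values, not part of the original code):

processor = img_stream_process(camera_id=0,
                               geo_location="17.38,78.48",   # assumed example location
                               output_folder="frames",
                               config_filename="config.json")
processor.video_capture()
processor.frame_processing()
batches = processor.batch_frames()

# Write the batch metadata to the file read by the storage script below
with open("batch_info.json", "w") as f:
    json.dump(batches, f, indent=4)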
IV. DATA STORAGE:

CODE:

import json
import pyodbc

# Create a connection to the database
conn = pyodbc.connect(Driver='{ODBC Driver 18 for SQL Server}',
                      Server="covid19-srv01.database.windows.net",
                      Database="covid-db", Uid="myadmin", Pwd="Hooked@8",
                      Encrypt="yes", TrustServerCertificate="no",
                      Connection_Timeout=30)

# Create a cursor object for executing SQL queries
cursor = conn.cursor()

with open('batch_info.json', 'r') as json_file:
    data = json.load(json_file)

# Define the INSERT SQL statement
sql_insert = ("INSERT INTO [dbo].[mytable1] "
              "(batch_id, starting_frame_id, ending_frame_id, timestamp) "
              "VALUES (?, ?, ?, ?)")

# Insert records from the JSON data
for record in data:
    cursor.execute(sql_insert,
                   (record['batch_id'], record['starting_frame_id'],
                    record['ending_frame_id'], record['timestamp']))

# Commit the transaction
conn.commit()

# Close the cursor and connection
cursor.close()
conn.close()

COMMENT:

Here we connect to Azure SQL Database using the pyodbc library, a Python library for
database connections. It relies on ODBC (Open Database Connectivity, from Microsoft),
and the ODBC driver must be pre-installed on the system.
The connection is built from the ODBC connection parameters. The JSON file is then
opened, and the data present in it is inserted into the database using a DML (INSERT)
query. At the end, the connection must be closed.
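The INSERT statement assumes that the target table already exists. A possible one-time
table definition matching the inserted columns (the data types shown here are
assumptions) could be executed over the same connection:

create_table_sql = """
IF OBJECT_ID('dbo.mytable1', 'U') IS NULL
CREATE TABLE dbo.mytable1 (
    batch_id          INT,
    starting_frame_id INT,
    ending_frame_id   INT,
    [timestamp]       FLOAT
);
"""
cursor.execute(create_table_sql)
conn.commit()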

V. Error Handling and Logging:

def setup_logger(self):
    logger = logging.getLogger("frame_processor")
    logger.setLevel(logging.DEBUG)

    # Create a file handler and set the log level
    log_file = os.path.join(self.output_folder, "frame_processor.log")
    file_handler = logging.FileHandler(log_file)
    file_handler.setLevel(logging.DEBUG)

    # Create a console handler and set the log level
    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.INFO)

    # Create a formatter and set it for the handlers
    formatter = logging.Formatter(
        "%(asctime)s [%(levelname)s] %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S"
    )
    file_handler.setFormatter(formatter)
    console_handler.setFormatter(formatter)

    # Add the handlers to the logger
    logger.addHandler(file_handler)
    logger.addHandler(console_handler)

    return logger

COMMENT:

Pipeline activity is logged to both a log file and the console, as required.
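A sketch of how the logger could be attached to the class and used (the exact call sites
are assumptions; setup_logger itself is shown above):

# In __init__, after the output folder has been created (assumed placement):
self.logger = self.setup_logger()

# Inside the processing methods, events can then be logged, for example:
self.logger.info("Started capture from camera %s", self.camera_id)
self.logger.debug("Saved frame %s to %s", outframe_id, frame_filename)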

CODE:

def process_frame(self, frame, timestamp):
    try:
        # Write one frame per second as an image file
        if self.frame_id % 25 == 0:
            image_filename = os.path.join(self.output_folder,
                                          f"frame_{self.frame_id // 25}.jpg")
            cv2.imwrite(image_filename, frame)

            # Create JSON object for the frame
            frame_info = {
                "camera_id": self.camera_id,
                "frame_id": self.frame_id,
                "geo_location": self.geo_location,
                "image_path": os.path.abspath(image_filename),
                "timestamp": timestamp,
            }

            # Append frame info to the JSON data list
            self.json_data.append(frame_info)

        # Check if a new batch needs to be created
        # (self.current_batch, self.duration and create_new_batch are
        # assumed to be defined elsewhere in the class)
        if self.current_batch is None or \
                timestamp - self.current_batch["timestamp"] >= self.duration:
            self.create_new_batch()

        # Increment frame ID
        self.frame_id += 1

    except Exception as e:
        self.logger.error(f"Error processing frame: {str(e)}")

COMMENT:

Exceptions raised while processing a frame are caught and logged, as required.

VI. Concurrency and Performance:

CODE:

Link:

Comment:
I changed the code so that it works with two cameras.
Its performance depends on the CPU and on how quickly the cameras' data can be processed.
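The modified two-camera code is only referenced by the link above; a minimal sketch of
how two cameras could be processed concurrently with threads (the camera indices, output
folders and config file name are assumptions):

import threading

def run_camera(camera_id, output_folder):
    # One img_stream_process instance per camera, each running in its own thread
    processor = img_stream_process(camera_id=camera_id,
                                   geo_location="17.38,78.48",  # assumed example location
                                   output_folder=output_folder,
                                   config_filename="config.json")
    processor.video_capture()
    processor.frame_processing()

threads = [
    threading.Thread(target=run_camera, args=(0, "frames_cam0")),
    threading.Thread(target=run_camera, args=(1, "frames_cam1")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

Note that OpenCV's GUI call (cv2.imshow) is not always reliable outside the main thread,
so the display step may need to be disabled when running multiple cameras this way.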

TASK 2:
Write a user-driven Python program that accepts:

➢ TIMESTAMP

➢ DURATION OF THE VIDEO FILE from the user.

Based on the above information, iterate through the batch information in the database.
Create metadata from it that will help in gathering the frame information from the JSON
file. Once the necessary frames are gathered, convert them to an MP4 file and present
them to the user.
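The database-query part of the solution is provided as a link below; a minimal sketch of
how the user input could be used to select the matching batches (the connection details
and table mirror the Task 1 storage code, while the filtering logic, variable names and
units are assumptions):

import pyodbc

# User input: timestamp of the requested footage and its duration
start_ts = float(input("Enter timestamp: "))
duration = float(input("Enter duration: "))

conn = pyodbc.connect(Driver='{ODBC Driver 18 for SQL Server}',
                      Server="covid19-srv01.database.windows.net",
                      Database="covid-db", Uid="myadmin", Pwd="Hooked@8",
                      Encrypt="yes", TrustServerCertificate="no",
                      Connection_Timeout=30)
cursor = conn.cursor()

# Collect the batches whose timestamp falls inside the requested window
cursor.execute(
    "SELECT batch_id, starting_frame_id, ending_frame_id, timestamp "
    "FROM [dbo].[mytable1] WHERE timestamp >= ? AND timestamp < ?",
    start_ts, start_ts + duration)
columns = [col[0] for col in cursor.description]
batch_metadata = [dict(zip(columns, row)) for row in cursor.fetchall()]
conn.close()

# batch_metadata now identifies which frame IDs to read from frame_info.json
print(batch_metadata)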

CODE:

LINK

Comment:

• Based on the requirement, the input is given by the user and sent to the database.
From the database the metadata is collected.
• Code:

import imageio
import json

# Load the frame metadata from the JSON file
with open("frame_info.json", "r") as json_file:
    frame_metadata = json.load(json_file)

# Create a list to store frames
frames = []

# Loop through the frame metadata and add frames to the list
for frame_info in frame_metadata:
    frame = imageio.imread(frame_info["image_path"])  # Load the frame image
    frames.append(frame)

# Define the output video file path
output_video_path = "output_video.mp4"

# Create the MP4 video from the list of frames
imageio.mimsave(output_video_path, frames, fps=25)  # Adjust the frame rate (fps) as needed

print(f"Video saved to {output_video_path}")

Every gathered frame is converted into a video, as required.
Error handling and logging are the same as in the previous task.
Across these two tasks we used real-time video data that was extracted, transformed and
loaded into the data warehouse.
It is a complete ETL process for streaming data.
