
Augmented Reality - TV

Image Processing - OpenCV, Python & C++

By: Rahul Kedia

Source Code:
https://github.com/KEDIARAHUL135/AR_TV.git

Table of Contents

Overview

Problem Statement

Approach

How to place the Aruco Markers

Setting up initial code

Finding Box Coordinates
    Detect Aruco markers
    Extracting Box Coordinates

Overlap Frames

Store and display the output video

Overview

“Augmented reality (AR) is an interactive experience of a real-world environment where the objects that reside in the real world are enhanced by computer-generated perceptual information, sometimes across multiple sensory modalities, including visual, auditory, haptic, somatosensory, and olfactory.”

This is a typical definition of Augmented Reality from Wikipedia. You can think of Augmented Reality as a combination of the real and virtual worlds. For example, suppose we can see a TV mounted on a wall even though no TV is actually present. Sounds fun? Let's see how this can be done using simple Image Processing techniques with OpenCV and C++/Python.

Problem Statement
We are given 2 videos as input. The first video contains 4 Aruco Markers glued on a wall in a rectangular arrangement. The second video can be any video with the same FPS as the first. We have to project the second video onto the first, in place of the rectangle formed by the outer corners of the 4 Aruco Markers, as shown in the image below.

Approach
To solve this problem, we will implement the following steps:

● Firstly, after reading the videos, we will overlap the 2 videos frame-by-frame following the steps below. As the FPS of both videos is the same, we will not face the problem of one video running slower or faster than its original form in the output.
● To overlap the 2 frames from both videos, we will first detect the aruco markers present in the first video's frame.
● Having obtained these aruco markers' positions, we can get the coordinate values of the 4 corners of the rectangle formed by them.
● We will then apply the projective transformation to the second video's frame using these rectangle coordinates.
● Finally, we will overlap the 2 frames (the frame of the first video having the aruco markers and the frame obtained after applying the projective transformation).

How to place the Aruco Markers

An ArUco marker is a synthetic square marker composed of a wide black border and an inner binary matrix which determines its identifier (ID). The black border facilitates its fast detection in the image, and the binary codification allows its identification and the application of error detection and correction techniques. The marker size determines the size of the internal matrix; for instance, a marker of size 4x4 is composed of 16 bits. [Source]

The first thing we have to do is place the aruco markers on a wall, a frame, or anywhere you like, so that we can project another video over them.

For this project, I have used Aruco Markers of size 6x6. You can use OpenCV code to create and store these Aruco Markers, but I would personally suggest using this website to create and download them easily. If you still want to do this with OpenCV code, you can visit this webpage for the C++ code.

We will use the first 4 Aruco Markers of size 6x6.
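
If you prefer generating the markers in code, the following is a minimal Python sketch using the same pre-4.7 cv2.aruco API as the rest of this project (the output file names and the 200 px marker size are my own choices):

import cv2

# Dictionary of 6x6 markers; the first 4 IDs (0-3) are the ones we need.
ArucoDict = cv2.aruco.Dictionary_get(cv2.aruco.DICT_6X6_50)
for MarkerID in range(4):
    # Draw the marker with the given ID as a 200x200 px image and save it.
    MarkerImage = cv2.aruco.drawMarker(ArucoDict, MarkerID, 200)
    cv2.imwrite("Marker_{}.png".format(MarkerID), MarkerImage)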

Note that each Aruco marker (of a given size) has a unique ID. We are using the markers with IDs {0, 1, 2, 3} for convenience. Each marker is square, and each corner of a marker is also unique. When an aruco marker is detected, its corner coordinates are returned in a fixed order: the first set of coordinates is the top left corner, the second is the top right corner, the third is the bottom right corner, and the last is the bottom left corner (when the marker is upright). Using this information, we can also find the rotation of the marker.

Let us now place the markers in the way that makes our computation easiest.

As shown in the image, the markers are placed so that they form a rectangle, with the marker of ID 0 at the top left corner of that rectangle, positioned so that its corner 0 lies on the rectangle's top left corner (see the green point). Similarly, the marker of ID 1 is placed at the top right corner of the rectangle, with its corner 1 on the rectangle's top right corner (see the green point). The same logic applies to the remaining two markers, as shown in the image.

The benefit of this positioning is that, to get the corner coordinates of the final rectangle, we simply have to take corner 0 of marker ID 0, corner 1 of marker ID 1, and so on.

Now let us begin the coding!!!

Setting up initial code
Let us begin with importing the libraries.

import cv2
import numpy as np

Here, cv2 is the OpenCV library and numpy is the numerical computing library for Python. Note that you should have the OpenCV contrib modules installed for this project, as aruco markers are part of the contrib library. See here the installation steps for the contrib package in Python.
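
If you use pip, the contrib modules ship as a separate package; a typical install command (assuming a pip-based setup, and noting that the aruco API used in this article is the one from OpenCV versions before 4.7) is:

pip install opencv-contrib-python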

In the main function, let us now read the 2 input videos.

# Reading aruco video.
ArucoVid_Cap = cv2.VideoCapture("Videos/ArucoVideo_OnWall.mp4")

# Reading video for projection.
ProjVid_Cap = cv2.VideoCapture("Videos/ProjVid.mpeg")

ArucoVid_Cap is the VideoCapture object for the video with Aruco markers glued on the
wall and ProjVid_Cap is the object for the video that will be projected on the wall.
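
As a quick sanity check (a minimal sketch; the repository code performs an equivalent check), we can verify that both videos opened successfully before processing:

# Abort early if either input video failed to open.
if not ArucoVid_Cap.isOpened() or not ProjVid_Cap.isOpened():
    print("Error opening the input videos.")
    exit()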

We will also create a VideoWriter object for saving the final output video of our project.

# Creating video writer object.
# Properties 3 and 4 are CAP_PROP_FRAME_WIDTH and CAP_PROP_FRAME_HEIGHT,
# so the output keeps the aruco video's FPS and frame size.
OutVid = cv2.VideoWriter('Videos/FinalVideo_py.avi',
                         cv2.VideoWriter_fourcc(*'XVID'),
                         ArucoVid_Cap.get(cv2.CAP_PROP_FPS),
                         (int(ArucoVid_Cap.get(3)),
                          int(ArucoVid_Cap.get(4))))

Now, inside an infinite while loop, we will read the frames from both videos and pass them to other functions for further processing. The frames are read as follows inside the loop.

# Reading frames
retArucoVid, ArucoVid_Frame = ArucoVid_Cap.read()
retProjVid, ProjVid_Frame = ProjVid_Cap.read()

Refer to the project code to see how this loop is initialized, how we check that the videos opened successfully, and how and when we exit the loop. These details are explained with the help of comments in the code.
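
For reference, here is a simplified sketch of the loop's structure (the repository code additionally handles a key-press exit):

while True:
    # Reading frames
    retArucoVid, ArucoVid_Frame = ArucoVid_Cap.read()
    retProjVid, ProjVid_Frame = ProjVid_Cap.read()

    # Exit once either video runs out of frames.
    if not retArucoVid or not retProjVid:
        break

    # ... process the two frames as described in the next sections ...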

These 2 frames are then passed to other functions for processing, described in the following sections.

Finding Box Coordinates

Now, we will have to find the coordinates of the 4 corners of the rectangle formed by the
Aruco markers so that we can project the projection video’s frame there. For this, we will first
have to detect the 4 aruco markers.

Detect Aruco markers

Detection of Aruco markers is a simple task. We just have to copy and paste the following
code for this.

# Detecting markers
GrayFrame = cv2.cvtColor(Frame, cv2.COLOR_BGR2GRAY)
ArucoDict = cv2.aruco.Dictionary_get(cv2.aruco.DICT_6X6_50)
Parameters = cv2.aruco.DetectorParameters_create()
Corners, IDs, RejectedImgPoints = cv2.aruco.detectMarkers(
    GrayFrame, ArucoDict, parameters=Parameters)

Here, the Frame variable contains the frame having the aruco markers. Make sure that the aruco marker dictionary mentioned here is the same as the one you used to create the markers. We have used the dictionary DICT_6X6_50.

Finally, we get the IDs and the corner coordinates of all the markers in the variables IDs and Corners respectively. Try printing these 2 variables and see the way the information is stored before moving ahead.
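
For example, a quick inspection could look like this (IDs is an Nx1 array of marker IDs, and Corners is a list of N arrays of shape (1, 4, 2) holding each marker's four (x, y) corners):

print(IDs)
for MarkerCorners in Corners:
    print(MarkerCorners.shape)
    print(MarkerCorners)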

Before moving ahead, make sure that all 4 markers have been found by checking the length of the variable IDs (len(IDs) = number of markers found). If all markers are not found, we simply skip to the next frame, as skipping a few frames over the complete video will not noticeably affect the output.
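
A minimal sketch of this check inside the loop (note that IDs is None when no marker at all is detected):

# Skip this frame if all 4 markers were not detected.
if IDs is None or len(IDs) != 4:
    continue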

Extracting Box Coordinates

Now let us find the 4 coordinates of the big box from the 16 coordinates found using the aruco markers. Note that the following code may vary if the aruco markers are positioned differently, for example if different markers are used or if the markers are rotated. In my case, I have placed the markers so that the following code becomes the easiest: I just have to iterate over the variable Corners once, and for the marker of ID = i, take its ith corner.

# Storing Box coordinates
BoxCoordinates = []
for i in range(4):
    BoxCoordinates.append(Corners[int(np.where(IDs == i)[0])][0][i])

Here, np.where(IDs == i)[0] returns the index of the marker with ID i in the variable IDs. Thus, we now have the coordinates of the 4 corners of the box formed by the 4 aruco markers. Note that the markers lie completely inside the box, so on overlapping the projection video, the markers will be hidden.
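
If you want to verify the result visually, here is an optional sanity check (my addition, not part of the original code) that draws the detected box on the frame:

# Draw the detected box (in green) on a copy of the frame with the markers.
DebugFrame = Frame.copy()
cv2.polylines(DebugFrame, [np.asarray(BoxCoordinates, dtype=np.int32)],
              isClosed=True, color=(0, 255, 0), thickness=2)
cv2.imshow("Detected Box", DebugFrame)
cv2.waitKey(1)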

Overlap Frames
The final step remaining in this project is to overlap the 2 frames: the frame with the aruco markers (base frame) and the projection video's frame (secondary frame).

The secondary frame will be applied/added to the base frame according to the box coordinates found earlier. For this, we will first apply the perspective transformation to the secondary frame, then find the mask image of the secondary frame, and finally perform bitwise operations to add the 2 frames. Instead of bitwise operations, you can also use mathematical operations on the frames to add them; a sketch of this alternative is shown after the code below. Have a look at the image below to understand the process and then take a look at the code.

def ProjectiveTransform(Frame, Coordinates, TransFrameShape):
    Height, Width = Frame.shape[:2]
    InitialPoints = np.float32([[0, 0], [Width-1, 0],
                                [0, Height-1], [Width-1, Height-1]])
    FinalPoints = np.float32([Coordinates[0], Coordinates[1],
                              Coordinates[3], Coordinates[2]])

    ProjectiveMatrix = cv2.getPerspectiveTransform(InitialPoints, FinalPoints)
    # warpPerspective takes the output size as (width, height), hence [::-1].
    TransformedFrame = cv2.warpPerspective(Frame, ProjectiveMatrix,
                                           TransFrameShape[::-1])

    return TransformedFrame

# Finding transformed image.
TransformedFrame = ProjectiveTransform(SecFrame, BoxCoordinates,
                                       BaseFrame.shape[:2])

# Overlapping frames.
SecFrame_Mask = np.zeros(BaseFrame.shape, dtype=np.uint8)
cv2.fillConvexPoly(SecFrame_Mask,
                   np.asarray(BoxCoordinates, dtype=np.int32),
                   (255, )*BaseFrame.shape[2])

# Black out the box region in the base frame, then OR in the warped frame.
BaseFrame = cv2.bitwise_and(BaseFrame, cv2.bitwise_not(SecFrame_Mask))
OverlapedFrame = cv2.bitwise_or(BaseFrame, TransformedFrame)

Note here that the variable BoxCoordinates contains the 4 coordinate values of the box in clockwise order starting from the top left corner. The box coordinates are passed to fillConvexPoly in this same order, since it expects the polygon's vertices in order along its boundary. For getPerspectiveTransform, however, the destination points must correspond one-to-one with InitialPoints, which are listed as top left, top right, bottom left, bottom right; this is why the last two points' positions are interchanged before passing them to that function.
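
As mentioned earlier, mathematical operations can replace the bitwise ones. Here is a minimal sketch of that alternative (not the repository's code):

# Black out the box region, then add the frames; cv2.add saturates on
# uint8 overflow instead of wrapping around.
BaseFrame[SecFrame_Mask > 0] = 0
OverlapedFrame = cv2.add(BaseFrame, TransformedFrame)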

The variable OverlapedFrame contains the final frame in which the initial 2 frames are
overlapped as required.

Store and display the output video
After obtaining the final frame, you can store and show the frames.

Storing the frame:

# Storing to output video.
OutVid.write(OverlapedFrame)

Showing the frame:

# Displaying Output video.
cv2.imshow("Output Video", OverlapedFrame)
cv2.waitKey(1)

Note that while the project is running, the displayed video may appear to run slower than the input videos. This happens because, between every two frames being shown, the code for finding the markers, finding the box coordinates, and overlapping the frames is running, which can take more time than the gap between 2 frames of the input video. The saved output file is not affected, as it is written with the input video's FPS.
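
To see how much time the processing actually takes, you can time one iteration with OpenCV's tick counter (an optional check, not part of the original code):

StartTick = cv2.getTickCount()
# ... marker detection, box extraction, and overlapping for one frame ...
ElapsedSec = (cv2.getTickCount() - StartTick) / cv2.getTickFrequency()
print("Processing time for this frame: {:.3f} s".format(ElapsedSec))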

Finally, do not forget to release all the VideoCapture objects, the VideoWriter object, and the
display window.

# Releasing video objects and destroying windows.
ArucoVid_Cap.release()
ProjVid_Cap.release()
OutVid.release()
cv2.destroyAllWindows()

The link for the final output video is shared in the code repository.

