CERTIFICATE
Mr. Sumeet Kataria of Oneirix Labs was assigned as the guide for the
entire duration of this project.
The aims decided upon at the start of the project were satisfactorily
completed in good time. Oneirix Labs is pleased with the quality of work and
the results obtained.
ACKNOWLEDGEMENT
Abstract
Art, it is said, merely reflects life and society. If there is one medium of
expression that has seduced the imagination of the populace, it is the motion
picture. From humble beginnings, the art of film-making is now nearly a century
old.
It comes as little surprise then to see a whole host of amateur filmmakers
making home-made films. This of course has been made possible due to the
surge in video and audio recording technologies. Digital video camcorders
have been designed to help this market segment by adding functionality such as
autofocus and autoexposure. This, however, does not guarantee the right results
all the time. In fact, a large portion of every film's production time is normally
spent in the post-production stage, with editing required for tasks such as colour
correction, brightness and contrast control, and frame-by-frame editing of the
video.
This requires time, training and introduces human errors, in that there will
always be a limitation on what is perceived as the right brightness level or colour
consistency over the entire duration of a scene from the film.
This is where we come into the picture. The problems with autofocus and
autoexposure in modern digital cameras are discussed in depth in the chapters
to follow.
If we understand exactly how images are formed in a video stream, it is
possible to use the concepts of image processing to correct the irregularities
caused by the camera, by using the good images to correct or improve those
lacking in quality, leading to an improved overall video. Additionally, we are
seeking to make the whole process as automated as possible to eliminate the
inevitable human-perception errors.
For the end user, it translates to shorter hours, more automation, greater
reliability and even more functionality.
Contents

7.4 Problems Faced and Their Solution
8. Large Space Matrix
9. Exposure Correction
9.1 A quantitative measure of the exposure in an image
9.2 Plotting of the brightness curve
9.3 Approach 1: The transfer curve method
9.4 Approach 2
9.5 Approach 3
9.6 Algorithm
9.7 Backslash or matrix left division
9.8 Pseudo Inverse
9.9 Implementation of the algorithm
10. Focus
10.1 Quantifying focus
10.2 Methods to measure the frequency components
10.2.1 FFT
10.2.2 DCT
10.3 Practically applying the above concepts
11. Actual Correction
11.1 Calculation of weights
12. Results
13. Conclusion and Future Scope
14. References
List of Figures

12.2 First Frame After Correction
12.3 Second Frame
12.4 Second Frame After Correction
12.5 Third Frame
12.6 Third Frame After Correction
12.7 Fourth Frame
12.8 Fourth Frame After Correction
12.9 Final Frame
12.10 Final Frame After Correction
Chapter 1
Need For the Project
Technology today has stretched out its long arms to embrace the
populace and empower them to do things once deemed impossible. Nearly
anyone today can shoot and edit films with the help of a digital camera and a
computer at home. These Digicams have spawned a generation of filmmakers of
all hues and styles.
Factors such as lighting and ambient colour cannot really be controlled
by an amateur, who lacks the resources for setting up expensive studio lights,
reflectors and the like. The video obtained from such a Digicam is bound to look
rather crude and unedited. The video-editing software available today offers
some tools that the filmmaker can use to improve the overall appearance of the
footage. These are discussed in the next section.
in the pages that follow. The traditional method has always been to rectify or
compensate for these shortcomings at the post-processing stage rather than at
the data acquisition stage itself. Thus, what one hopes to do is to take the video
obtained directly from the camera in its raw form, rectify it, and create
aesthetically appealing, professional-looking videos.
Chapter 2
Video Encoding And Decoding Concepts
MPEG stands for Moving Picture Experts Group, the industry committee
that created the standard. MPEG is, in fact, a whole family of standards for
digital video and audio signals using DCT compression. MPEG-2, which employs
DCT compression, is certain to become the dominant standard in consumer
equipment for the foreseeable future. MPEG takes the DCT compression
algorithm and defines how it is used to reduce the data rate, and how packets of
video and audio data are multiplexed together in a way that will be understood by
an MPEG decoder.
DCT, or the Discrete Cosine Transform to give it its full name, exploits the
fact that adjacent pixels in a picture (either physically close in the image (spatial) or in
successive images (temporal)) often have similar values. Small blocks of 8 x 8
pixels are 'transformed' mathematically in a way that tends to group the common
signal elements in a block together. DCT does not directly reduce the data,
but the transform tends to concentrate the energy into the first few coefficients,
and many of the higher-frequency coefficients are often close to zero. Bit-rate
reduction is achieved by not transmitting the higher-frequency elements, which
have a high probability of not carrying useful information.
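The energy-compaction behaviour described above can be seen with a small sketch. The 8 x 8 type-II DCT below is built from the orthonormal DCT matrix; it is an illustrative implementation, not the actual transform code of any MPEG encoder.

```python
import numpy as np

def dct2_8x8(block):
    """2D type-II DCT of an 8x8 block via the orthonormal DCT matrix."""
    n = 8
    k, i = np.mgrid[0:n, 0:n]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)          # DC row of the orthonormal DCT
    return C @ block @ C.T

# A smooth 8x8 gradient block: after the transform, almost all of the
# energy sits in the first few (low-frequency) coefficients.
block = np.add.outer(np.arange(8.0), np.arange(8.0))
coeffs = dct2_8x8(block)
```

For this smooth block, well over 95% of the signal energy lands in the top-left 2 x 2 corner of the coefficient matrix, which is exactly what makes discarding the high-frequency coefficients so cheap.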
MPEG's first aim was to define a video coding algorithm for application on
'digital storage media', in particular CD-ROM. Very rapidly the need for audio
coding was added, and the scope was extended from being targeted solely on
CD-ROM to defining a 'generic' algorithm capable of being used by
virtually all applications, from storage-based multimedia systems to television
broadcasting and communications applications such as VoD and videophones.
Both the MPEG-1 and MPEG-2 standards are split into three main parts:
audio coding, video coding, and system management and multiplexing. MPEG
itself is split into three main sub-groups, one responsible for each part, and a
number of other sub-groups to advise on implementation matters, to perform
subjective tests, and to study the requirements that must be supported.
MPEG-2 defines a number of profiles, though only two are currently relevant to
broadcasting: the main profile, which is essentially MPEG-1 extended to take
account of interlaced scanning and which encodes chroma as 4:2:0, and the
professional profile, which has 4:2:2 chrominance resolution and is designed for
production and post-production.
• Error-prone environments
• Remultiplexing
• Support for scrambling
To extract images (frames) from a video file, we used the following open-
source software:
libpostproc is a library containing video postprocessing routines.
libswscale is a library containing video image scaling routines.
2.2.2 Frame Dumper – a simple Windows tool for dumping frames as .bmp
images.
Usage:
FrameDumper.exe VideoName FromFrameID ToFrameID StepSize [TargetDir]
Parameters:
VideoName: The source video file name (e.g., video.mpg). Use '\' for path.
FromFrameID: The starting frame number (indexed from 1).
ToFrameID: The ending frame number (indexed from 1).
StepSize: The step size during dumping. 1 for continuous dumping.
TargetDir: Optional. Specify the dumping directory. Otherwise use current.
Chapter 3
Problems Faced By Amateurs
Most of the digital cameras available on the market today provide a host of
features, the most common being auto-exposure setting, auto-focus, auto-white
balance, image stabilizers, etc. Exposure, termed very simply, refers to the
amount of light that the camcorder's lens collects; auto exposure is a system that
controls the incoming light to prevent (among other things) over- or under-
exposure. An over-exposed shot looks washed out and overly bright, while an
underexposed shot looks shadowy and dark.
The camcorder's exposure system regulates two things: iris and shutter
speed. The iris diaphragm in the lens controls the amount of light admitted, while
the electronic circuitry referred to as the shutter governs the amount of time the chip
has to respond to the light. By manipulating the lens aperture (and sometimes
the shutter speed), the camcorder does its best to deliver optimum light to the
image sensing chip regardless of lighting conditions.
When you turn on the camera, special circuitry analyzes the amount of
light hitting the chip during the 1/60 second it takes to form an image. If that
amount is greater than the optimum value, which is usually the case, the circuitry
calculates how much to "stop down" (close) the iris diaphragm that sits between
the sensor and the light source. Then it sends the appropriate command to the
circuits controlling the servo motor of the diaphragm. The motor closes the
diaphragm down, light transmission falls to the ideal level, and the CCD forms a perfectly
exposed image.
Although the auto-exposure facility in video cameras saves the amateur
photographer a lot of hassle and trouble, it has some obvious disadvantages. The
two main problems associated with it are slow response, and insufficient
intelligence in the algorithms to decode the image that the light forms, let alone
determine which part of that image to expose properly.
readable. Above and below that range, highlights "burn out" into blobs of
pure white and shadows "block up", dropping to solid black. Four-to-one is about as
wide a contrast ratio as a video system can maintain. The problem is that we are
generally faced with much higher contrast ratios in the real world, so the
camcorder properly records those parts of the scene that fall within its contrast
range and lets the others burn out or block up. Many algorithms are used, from
simplistic average-brightness methods to sophisticated weighted averaging that
meters the centre portion more heavily. The auto-exposure modes attempt to
make whatever you are metering 18% gray (in the middle). However, even the
best of them suffer from problems, especially evident in videos panned across a
wide brightness range.
3.2 Autofocus
There are two types of autofocus systems: active and passive. Some
cameras may have a combination of both types, depending on the price of the
camera.
The two main causes of blurred pictures taken via autofocus video cameras are:
The human eye has a rather fast autofocus. For example, hold your hand up near
your face and focus on it, then quickly look at something in the distance: the
distant object appears clear almost immediately. Cameras, however, are not nearly this
quick or this precise.
Autofocus in a video camera is a passive system that also uses the central
portion of the image. Though very convenient for fast shooting, autofocus has
some problems:
Chapter 4
Post-processing Tools
Now that we have a fair idea of the problems that we are dealing with, let
us look closely at the current market scenario and the post-processing tools
available. Attention is drawn to the cost of each of the tools listed below.
Features
Broad format support
Incredible real-time effects
Comprehensive editing tools
Expanded power as the hub of Final Cut Studio
Open, extensible architecture
Final Cut Pro 6 now also supports mixed video formats (both resolution
and frame rate) in the timeline with real time support.
Cost: USD 1,299.
finishing. Media Composer is Avid's primary editing software solution. The current
version of Media Composer (MCSoft) has the following important features:
• Animatte
• 3D Warp
• Paint
• Live Matte Key
• Tracker
• Timewarps with motion estimation (FluidMotion)
• SpectraMatte (high quality chroma keyer)
• Cost: USD 1,822
While these tools listed above have tremendous capabilities, they are found
severely wanting in the following aspects:
• The software itself is extremely expensive, with costs running into a few
lakh INR, making it inaccessible to the general populace. In fact, one rarely finds
users for this software outside of big video-editing studios or
workshops.
• Sufficient time and effort needs to be invested in learning and operating
these tools. In effect, professionals who can take advantage of the array of
tools available in them are needed to make economic sense of them and to do
justice to the video itself.
• The process of analysing every frame for colour and brightness itself is
rather tedious and takes many days for a few seconds of footage.
• The most important factor, though, besides the cost, is that the process
is prone to the deficiencies of human perception. For example, even a
professional judging two or three different frames for, say, brightness
might introduce a small mismatch in the settings. If these errors
are allowed to accumulate, he/she may need to redo the whole process
after looking at the resultant video.
Therefore, our approach, which shall be discussed next, involves some basic
correction of the video's brightness and an improvement in its aesthetic appeal.
An attempt has been made to ensure that the end-user has little to do in terms of
the actual correction itself, which requires a sophisticated automation of the
whole process.
Chapter 5
Our Approach
We have, up to this point, discussed the problems that arise from using
Digicams and how difficult it is to rely on the video-editing software already available
in the market to help with the editing. We discuss hereon a different method to
automate and improve the overall video quality by relying on the familiar
procedure of post-processing.
Post-processing essentially starts from the raw video file obtained from the
Digicam. Usually, the areas that need substantial improvement arise from
motion of the camera, as the user moves it to frame different
views. As a result, those parts of the video (or more precisely, the set of frames
involved in such a transition) appear very unappealing when viewed in raw
form. A particular set of frames will typically appear extremely dark
or extremely bright, depending on the kind of transition.
More often than not, when there is a drastic change in the brightness as
discussed above, it is either due to the fast motion of the camera or due to the
Digicam’s inability to adjust its focus or brightness fast enough. However, as the
same scene or view is kept in focus for a moderate amount of time after the
transition, the camera eventually adjusts both its exposure and focus to obtain
correct images. We can think of these good frames as those containing a high
amount of information. The transitory frames then are lacking in this same
information.
Consider the simple example of a camera initially filming an open street in sunlight. The
camera is now moved to focus on an object located inside an apartment. The
apartment lighting is poorer than the bright sunlit exterior, and when the camera
rests upon the object inside, it takes a finite amount of time to adjust to the
ambient luminosity.
Note that in this transition period, there will be a number of frames that
are of poor quality. In this case, the transitory frames will appear dark as the
exposure of the camera is initially set for the street’s luminosity and thus the
camera lens’s aperture would be open only very slightly.
For human eyes there is the same transition from brightness to relative
darkness, except that our eyes adjust much faster to
changing luminosity and focal distances. Digicams that are unable to do
this produce videos that look unaesthetic and therefore unprofessional.
Figure 5.1: Frame 5 containing good information and frame 20 with poor information.
Note that in the example above, the object inside the apartment that the
camera focuses upon is present both in the bad frames as well as the good ones
with the information. Thus, what is proposed is to find the exact motion between
these frames and restore or fill in information on a pixel-by-pixel basis. This,
however, does not mean simply substituting good frames for bad ones. In fact, the
algorithm followed is a lot more complex, and it is described in the sections below.
Chapter 6
Block Diagram
Input Video File
→ Extracting Frames (FFmpeg): the video is split into individual images
→ Image Processing: motion estimation, quantifying exposure, quantifying focus
→ Image Correction
→ Compiling Frames into Video (FFmpeg), merging in the separately stored Audio File
→ Enhanced Video Output
The block diagram shown above systematically details the procedure
followed in order to try to automate the process. The MPEG file from the user’s
camera forms the input video file that is split or ripped up into frames using
ffmpeg, which has been discussed before. The individual frames in the .jpg
format are now processed. The correction of these images requires that the
motion between these frames is identified and used. Similarly the focus and
exposure values of these frames are also calculated.
The final stage, after the correction, simply involves recompiling the
frames into a video. This again is done by means of the ffmpeg software. To
maintain the sound integrity of the original video, the audio is stored
separately at the start and merged back in when the final video is compiled.
The resultant video is of an improved quality, with the few sections that
appeared bad being corrected.
Chapter 7
Motion Estimation
7.2 Algorithm for motion detection
With these basic results, we strove to enhance the algorithm, and after
considerable trial and error we decided to implement the same algorithm
using the filter function instead of the lengthy procedure of shifting the
smaller image over the larger one. The filter function readily allows us to
calculate the correlation of the images, and does so only over the overlap area
when we specify the 'valid' shape parameter.
The filter function performs the following operation:
Y = filter2(h, X, shape)
It returns the part of the correlation specified by the shape parameter,
where 'shape' is a string with one of these values:
'full' Returns the full two-dimensional correlation. In this case, Y is larger
than X.
'same' (default) Returns the central part of the correlation. In this case, Y is
the same size as X.
'valid' Returns only those parts of the correlation that are computed
without zero-padded edges. In this case, Y is smaller than X.
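A rough NumPy equivalent of this 'valid'-mode correlation search can be sketched as follows. The margin, the zero-mean normalisation and the sliding-window formulation are our illustrative choices, not the project's actual MATLAB code:

```python
import numpy as np

def estimate_motion(frame_a, frame_b, margin=8):
    """Estimate the (dy, dx) shift of frame_b relative to frame_a.

    A central block of frame_b is correlated over frame_a at every
    'valid' placement (no zero-padded edges, as with filter2's 'valid'
    option); the location of the correlation peak gives the shift.
    """
    a = frame_a.astype(float) - frame_a.mean()
    block = frame_b[margin:-margin, margin:-margin].astype(float)
    block -= block.mean()      # zero-mean so flat bright areas don't win
    # All valid placements of the block over frame_a.
    windows = np.lib.stride_tricks.sliding_window_view(a, block.shape)
    score = np.einsum('pqij,ij->pq', windows, block)
    peak = np.unravel_index(np.argmax(score), score.shape)
    # With zero motion the peak sits at (margin, margin).
    return peak[0] - margin, peak[1] - margin
```

On a pair of frames where the second is the first shifted down and to the right, the returned pair recovers that shift.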
7.3 Test Results Of Motion Algorithm
To demonstrate the use of the motion detection algorithm, these are a few
test images which can simulate a camera moving over a fixed background
towards the bottom right direction.
Figure 7.2: Sequences of images showing the apparent motion between them.
The next logical step was to implement this algorithm in order to prepare an
entire database of the motion for a complete sequence of frames. The database
so formed would be useful while evaluating and enhancing further performance
specifications of the video such as focus, exposure, jitter, etc. The graph below
shows the plot of such a database, for the test images above. As can be readily
seen from the images, the camera pans across the images towards the
downward right direction, which is very evident from the motion graph.
Figure 7.3: Plot of calculated motion, using the motion estimation algorithm.
7.4 Problems Faced and Their Solution
The main problem that was faced during motion detection was due to images
like the ones shown below. Most of the image is dark, with a grayscale value of
0. This results in highly inaccurate correlation coefficients. To overcome
this problem, we followed an approach of using only “good regions” of the image
to estimate the motion.
The images were divided into a number of blocks, say 16 (4 × 4). For each
of these blocks, certain parameters are calculated, on the basis of which the
quality of the block is defined. The best block out of these is used, depending on
the quality factor that is assigned to each of them. Motion is estimated only for
this particular block, and the same motion is assumed for the entire frame.
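The report does not list the exact parameters computed per block, so in the sketch below plain grayscale variance stands in for the quality factor (a flat, all-dark block has near-zero variance and correlates unreliably):

```python
import numpy as np

def best_block(frame, grid=4):
    """Split a grayscale frame into grid x grid blocks and return the
    location (row, col) of the most textured block plus its score."""
    h, w = frame.shape
    bh, bw = h // grid, w // grid
    best, best_score = None, -1.0
    for r in range(grid):
        for c in range(grid):
            block = frame[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            score = float(block.var())   # stand-in quality factor
            if score > best_score:
                best, best_score = (r, c), score
    return best, best_score
```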
Figure 7.5: Good block obtained from the entire frame.
This approach yielded the best of the 16 blocks, which was used to find the
motion between the two frames under consideration. This algorithm was
applied to every frame before using correlation to estimate the motion. The
results obtained were very consistent with the visual perception of
motion.
Chapter 8
Large Space Matrix – Obtained from Motion Estimation
At this juncture, it is now clear how to accurately estimate the exact motion
between two consecutive frames. This motion, in the form of x-y co-ordinates, can
now be obtained for the entire series of frames that require correction, with a few
of those frames containing good information.
The entire spatial region which the camera has panned over in the course of its
motion can be termed a 'global space' as far as the camera is concerned. A
640 x 480 frame, a part of this global space, is captured by the camera every
1/30th of a second (this is decided by the frame rate of the camera). We can
understand this by analogy with how a panoramic picture is shot: it consists of
a set of photos taken one after the other, with the result being a larger photo containing
information from all the pictures after they have been stitched together
(obviously, after compensating for motion).
Similarly, our large space is defined as frames placed so that each one
corresponds to one plane in a multi-dimensional space. The location of each
frame in this large space is dictated by its motion from the previous frame.
Shown below are some results to better understand the formation of this 'matrix':
Figure 8.1: Large space matrix with each frame as one dimension.
When the motion between the frames is compensated for, the matrix formed is as shown in
the figure above. Mathematically, each of these pixel values will be a number. The
motion shown above is purely horizontal; however, motion can exist in both directions.
Thus, the exact x-y motion co-ordinates for the entire series of frames to be
corrected are used to form a database of these values. The database formed is
now in a format that is easily usable and is used in the next stage of brightness
correction.
The following information can now be obtained very easily from the database or
‘large space’ matrix:
• The location of each point in global space, with respect to local camera
frame co-ordinates.
• Information regarding when a pixel was introduced into the database.
• Information regarding the depth of each point, i.e., the number of frames
for which a point stayed in the database.
Clearly, we are now able to decide how long (in terms of the number of
frames) that any single object persists in the video. This is what has been
referred to earlier as the depth of the point. Hence, further manipulations of this
large matrix can be carried out in the exposure correction section discussed next.
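A minimal sketch of assembling such a large space: frames are placed on a shared canvas at their motion-derived offsets, and a depth map counts how many frames cover each global point. The offsets, canvas layout and sum-accumulation here are illustrative assumptions:

```python
import numpy as np

def build_large_space(frames, offsets):
    """Place frames in 'global space' and record per-point depth.

    frames  : list of equally sized 2D arrays
    offsets : cumulative (y, x) position of each frame in global space
    Returns the accumulated values and the depth map (the number of
    frames in which each global point appears).
    """
    fh, fw = frames[0].shape
    ys = [y for y, _ in offsets]
    xs = [x for _, x in offsets]
    gh = max(ys) - min(ys) + fh
    gw = max(xs) - min(xs) + fw
    total = np.zeros((gh, gw))
    depth = np.zeros((gh, gw), dtype=int)
    for frame, (y, x) in zip(frames, offsets):
        r, c = y - min(ys), x - min(xs)
        total[r:r + fh, c:c + fw] += frame
        depth[r:r + fh, c:c + fw] += 1
    return total, depth
```

The depth map directly answers the question of how long any single point persists in the video.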
Chapter 9
Exposure Correction
The typically observed energy pattern of the frames can be shown as:
9.3 Approach 1: The transfer curve method
We tried transfer-curve functions such as the one shown below
to boost the dark regions of the image and reduce the intensity in the bright
regions. However, these too resulted in a loss of information, especially since
cameras treat all values above a certain threshold as 255 (the maximum
brightness value).
Due to this limitation of its range, the transfer-curve method tends to
reduce the intensity of portions that should not really be scaled at all, e.g.
the tube-light region in the figures below.
Figure: transfer curve, output (O/P) vs. input (I/P), saturating at 255.
This approach assumes that the brightness curve followed by each and every pixel
in a frame is the same. However, this is not true: the value (brightness) of every pixel
changes individually, and hence each pixel should be treated individually.
9.4 Approach 2
• Treat every pixel separately
• Smoothen the brightness curve of every individual pixel
Figure 9.4: Frames showing the variations of the pixel intensity in time
before the auto exposure of camera settles.
From the above images, it can be seen that each pixel needs to
be treated individually for correction. The central tube-light region in the above
frames remains at the same grayscale value of 255 (saturated) right from frame
number 1 until frame number 11. However, the peripheral region starts out at a
saturated value of 255 and ends, in frame 11, at a much lower value of around
180 or so. Thus, if we plot the brightness curves for these two regions, they will be
quite different from each other, and consequently the correction to be
implemented has to be different for these pixels.
This happens because the actual brightness of the tube
light is much higher than that of the peripheral region, but the limitations of
the camera prevent it from capturing these differences at a high exposure setting. Thus
both regions appear saturated until the auto-exposure of the camera settles
to the correct value for the frame.
Figure: brightness curves of Pixel 1 and Pixel 5.
9.5 Approach 3
Figure: brightness curves of Pixel 1 and Pixel 20, plotted on the global time axis.
Thus the additional constraint on the algorithm when obtaining new exposure values:
the new exposure values should be minimally deviant in space.
9.6 Algorithm: to calculate the new exposure curve for each pixel
A: constraint matrix
Order of A: (n × number of constraints) × n
As we can see, there are many constraints on a single value of exposure; that is,
the number of equations exceeds the number of variables. This results in an
over-determined system, which can be solved using the pseudo-inverse method.
The equation is solved using Octave's built-in matrix left division.
9.7 Backslash or matrix left division
9.8 Pseudo Inverse
The inverse A^-1 of a matrix A exists only if A is square and has full rank. In
this case, A × X = B has the solution X = A^-1 × B.
For a non-square A of full column rank, the pseudo-inverse is
A+ = (A^T A)^-1 A^T
and the solution of A × X = B is X = A+ × B.
Calculation
The best way to compute A+ is to use the singular value decomposition
A = U S V^T,
where U (m × m) and V (n × n) are orthogonal and S (m × n) is diagonal with real,
non-negative singular values. We find
A+ = V (S^T S)^-1 S^T U^T.
If the rank r of A is smaller than n, the inverse of S^T S does not exist, and one
uses only the first r singular values; S^T S then becomes an (r × r) matrix and U, V
shrink accordingly.
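The SVD route can be sketched in NumPy as follows; the matrix values below are illustrative, and the function is a demonstration rather than the project's solver:

```python
import numpy as np

def pinv_via_svd(A, tol=1e-10):
    """Moore-Penrose pseudo-inverse computed from the SVD of A.

    Singular values below tol are treated as zero, which handles the
    rank-deficient case mentioned above (only the first r singular
    values are kept).
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.where(s > tol, 1.0 / s, 0.0)   # invert nonzero sigmas only
    return Vt.T @ np.diag(s_inv) @ U.T

# An over-determined system: three constraint equations, two unknowns.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
B = np.array([1.0, 2.0, 4.0])
x = pinv_via_svd(A) @ B        # least-squares solution X = A+ B
```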
9.9 Implementation of the algorithm
Here we form the constraint matrix A and the ideal-value matrix B.
1. For each new pixel, find for how many frames it exists and what its old exposure
values are. The first constraint is
e_new(i,j,n) = e_old(i,j,n)
Convert all the pixel values of the 3D matrix (obtained by the motion detection
algorithm) into a column vector. This column vector forms the first [n × 1]
part of matrix B, and an [n × n] identity matrix forms the first part of matrix A, thus
implementing the above equation for every pixel in the 3D matrix.
2. The second constraint is
e_new(i,j,n) = e_new(i,j,n-1)
The part of matrix B corresponding to this constraint is a column vector of
zeroes. Matrix A is formed so as to implement the above equation for each (i, j)
possible: every such row of matrix A contains a 1 and a -1 at the positions
implementing e_new(i,j,n) - e_new(i,j,n-1) = 0.
For a single pixel tracked over four frames, for example, the smoothness rows read:

[ 1 -1  0  0 ]   [ e_new(1,1,0) ]   [ 0 ]
[ 0  1 -1  0 ] x [ e_new(1,1,1) ] = [ 0 ]
[ 0  0  1 -1 ]   [ e_new(1,1,2) ]   [ 0 ]
                 [ e_new(1,1,3) ]
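Stacking the two constraint blocks for a single pixel's exposure curve and solving by left division (least squares) can be sketched as below. The smoothness weight w_smooth is our own illustrative addition; the report does not specify how the two constraint sets are weighted relative to each other:

```python
import numpy as np

def solve_new_exposure(e_old, w_smooth=4.0):
    """Solve for a smoothed exposure curve for one pixel.

    Stacks the two constraints described above,
        e_new(n) = e_old(n)        (match the measured values)
        e_new(n) = e_new(n-1)      (vary minimally over time)
    into one over-determined system A x = B and solves it in the
    least-squares sense (NumPy's equivalent of matrix left division).
    """
    n = len(e_old)
    A_id = np.eye(n)                         # identity block
    # Rows with 1 and -1 implementing e_new(n) - e_new(n-1) = 0.
    A_diff = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    A_diff[idx, idx] = 1.0
    A_diff[idx, idx + 1] = -1.0
    A = np.vstack([A_id, w_smooth * A_diff])
    B = np.concatenate([np.asarray(e_old, float), np.zeros(n - 1)])
    x, *_ = np.linalg.lstsq(A, B, rcond=None)
    return x
```

Feeding in an oscillating curve returns a flattened one with the same mean, which is exactly the "minimally deviant" behaviour the constraints encode.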
Chapter 10
Focus
10.1 Quantifying focus: How to quantify the total focus quality of an image
In our effort to understand how the focus for a video varies as images are
taken in rapid succession, we must be able to somehow quantify the focus of an
image. Only then will it be possible to decide if the image is in good focus or not.
This is a rather difficult problem as the focus quality of an image is not a
physically measurable quantity.
What comes to our rescue is the fact that high-focus images
have a very clearly defined set of boundaries. That is, in well-focused images
there is a very fine distinction in the details of the objects, which
points to high-frequency components at the edges. A measure of these high-
frequency components therefore gives a very decent estimate of the image's focus
quality.
We have basically worked on two methods for focus quantification (i.e. frequency
analysis):
• The Fast Fourier Transform (FFT) for 2D images
• The Discrete Cosine Transform (DCT) for 2D images
Let us look at both these functions a little more closely.
10.2.1 FFT
Evaluating these sums directly would take O(N^2) arithmetic operations.
An FFT is an algorithm to compute the same result in only O(N log N) operations.
In general, such algorithms depend upon the factorization of N, but (contrary to
popular misconception) there are FFTs with O(N log N) complexity for all N, even
for prime N.
Many FFT algorithms depend only on the fact that e^(-2*pi*i/N) is a primitive Nth root
of unity, and thus can be applied to analogous transforms over any finite field,
such as number-theoretic transforms.
Since the inverse DFT is the same as the DFT, but with the opposite sign
in the exponent and a 1/N factor, any FFT algorithm can easily be adapted for it
as well.
This demonstration shows the FFT of a real image and its basis functions:
Note that in the images below, u* and v* are the coordinates of the pixel selected
with the red cross on F(u,v). The blue cross marks the point contributing to the same frequency.
Figure 10.1: Demonstration of FFT transforms.
10.2.2 DCT
The most common variant of discrete cosine transform is the type-II DCT,
which is often called simply "the DCT"; its inverse, the type-III DCT, is
correspondingly often called simply "the inverse DCT" or "the IDCT".
The DCT, and in particular the DCT-II, is often used in signal and image
processing, especially for lossy data compression, because it has a strong
"energy compaction" property: most of the signal information tends to be
concentrated in a few low-frequency components of the DCT, approaching the
Karhunen-Loève transform (which is optimal in the decorrelation sense) for
signals based on certain limits of Markov processes.
A DFT, like the Fourier series, implies a periodic extension of the original function. A
DCT, like a cosine series, implies an even extension of the original function.
This demonstration shows the DCT of an image:
u* and v* are the coordinates of the pixel selected with the red cross on C(u,v).
10.3 Practically applying the above concepts
Carrying out the DCT and FFT operations on images yielded a map of
their frequency components. As expected, there was a larger concentration of
signal strength in the low frequency components. This is an obvious outcome as
it represents most of the image’s uniformities. Thus, when quantifying the focus,
we assigned a low scalar or weightage for these components. After all, we are
more interested in the high-frequency components. Subsequently, the high
frequency components were assigned higher weightage. For 2D FFT
calculations, the low frequency components are shifted to the centre and a
weight assigned to them. DCT low frequency components reside in the upper left
corner. The distance from the origin i.e. u=v=0 was considered as the weight
associated with the frequency components. A simple summation yields the focus
factor. In order to normalize the operation for different amplitudes and slightly
varying images, the above mentioned focus factor was divided by the sum of the
amplitudes of all the frequency components of the respective transforms.
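The FFT variant of this focus factor can be sketched as follows; the distance weighting and the normalisation follow the description above, though the exact weighting used in the project's MATLAB code may differ:

```python
import numpy as np

def focus_factor(gray):
    """Frequency-weighted focus measure of a grayscale image.

    The 2D FFT magnitudes are shifted so that the low frequencies sit
    at the centre; every component is weighted by its distance from
    u = v = 0, summed, and normalised by the total spectral amplitude.
    """
    F = np.abs(np.fft.fftshift(np.fft.fft2(gray.astype(float))))
    h, w = F.shape
    v, u = np.mgrid[0:h, 0:w]
    dist = np.hypot(v - h / 2, u - w / 2)   # weight: distance from DC
    return (F * dist).sum() / F.sum()
```

A high-frequency pattern (a checkerboard, say) scores well above a smooth gradient, matching the intuition that sharp edges indicate good focus.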
Shown below are some sample images used with the MATLAB code written to
estimate focus quality.
The images shown above are numbered 1 to 6. The first two images represent
pictures in bad focus, whereas the 4th and 6th images are of particularly good
quality. Both a 2D FFT and a 2D DCT of all the images were taken, by first
extracting the luminance component of the RGB images and then performing the
required operations. Shown below is what the DCT and the FFT look like for the
last image. Please note that the absolute value of the 2D DCT or FFT is plotted
on a log scale.
This is the FFT for the sixth image shown above, shifted to the centre; the
low-frequency components are located at the centre. Shown below is the 2D DCT
for the same image. The top-left corner contains the high-valued low-frequency
components, whereas the other corners represent the high-frequency
components in their respective directions.
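The centring of the FFT's low-frequency components described above can be checked with a short NumPy sketch (illustrative only, not project code):

```python
import numpy as np

# A constant image has all of its FFT energy in the DC (lowest-frequency) bin.
img = np.ones((8, 8))
F = np.fft.fft2(img)       # DC component sits at index [0, 0]
Fs = np.fft.fftshift(F)    # after shifting, DC sits at the centre, index [4, 4]

dc_unshifted = abs(F[0, 0])   # sum of all pixels = 64
dc_shifted = abs(Fs[4, 4])    # same value, now at the centre
```

This is why the FFT-based focus measure shifts the spectrum first: only after `fftshift` does "distance from the centre" correspond to spatial frequency.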
Figure 10.6: DCT of an image.
Chapter 11
Actual Correction
Figure 11.1: Old and new exposure curve and original pixel values.
Consider a series of images whose exposure values are given by the red
curve in the diagram above. The new exposure curve obtained is given by the
blue curve.
Now, for a pixel whose values change as 10, 35, 50, 75, … over global
time, the ideal value of that pixel, at a particular instant of time, will be a
function of the old exposure curve, the new exposure curve, and the future and
past values of that pixel.
Let the temporary corrected value be temp, and let the resultant value be R.
The temp variable thus obtained corrects the pixel value under the assumption
that the past and future values of the pixel are functions of the old exposure
curve alone, and that they are correct.
However, this may not be true: the actual pixel value should be a function of
the new exposure curve as well.
Mathematically,

    R(t) = (1/N) × Σ_k [ W(t+k) × temp(t+k) × Enew(t+k) / Enew(t) ]

where k is the depth of that pixel in the 3D global matrix, that is, k takes
all future and past values of that particular pixel, and

    N = Σ_k W(t+k)

is the normalizing factor. W is the weight factor: it represents how much trust
should be put in the value obtained at that particular instant.
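Under this reading of the formula, the correction can be sketched as follows (illustrative Python; `corrected_value`, `temp`, `W` and `E_new` are names chosen here, not the project's identifiers):

```python
def corrected_value(t, temp, W, E_new, ks):
    """R(t) = (1/N) * sum_k W(t+k) * temp(t+k) * E_new(t+k) / E_new(t),
    with N = sum_k W(t+k).  temp, W and E_new are per-frame sequences;
    ks holds the past/future offsets k available for this pixel."""
    num = sum(W[t + k] * temp[t + k] * E_new[t + k] / E_new[t] for k in ks)
    N = sum(W[t + k] for k in ks)
    return num / N

# With equal trust in every frame and a flat new exposure curve,
# R(t) reduces to the plain average of the temp values.
temp = [10.0, 35.0, 50.0, 75.0]
W = [1.0, 1.0, 1.0, 1.0]
E_new = [2.0, 2.0, 2.0, 2.0]
r = corrected_value(1, temp, W, E_new, ks=(-1, 0, 1, 2))   # (10+35+50+75)/4 = 42.5
```

If all of the trust is instead placed on k = 0, the formula simply returns temp(t), as one would expect.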
11.1 Calculation of weights
1. Value of pixel:
In image processing, pixel values that fall in the linear range are
supposed to contain the maximum information. If the pixel value is within the
linear range, a higher weight is assigned; on the other hand, if it is too
low (noise) or too high (saturated), a lower weight is assigned.
2. Focus quality:
The higher the focus quality of the frame containing the pixel, the more weight
is assigned to that pixel.
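A minimal sketch of such a weighting scheme is shown below. The thresholds and the out-of-range penalty are illustrative assumptions made here, not values taken from the project:

```python
def pixel_weight(value, focus_quality, low=16, high=240):
    """Trust mid-range ("linear") pixel values; down-weight values that are
    too low (noise) or too high (saturated), and scale the result by the
    focus quality of the containing frame.  Thresholds are illustrative."""
    in_linear_range = low <= value <= high
    base = 1.0 if in_linear_range else 0.1   # assumed penalty for bad values
    return base * focus_quality
```

For example, a mid-range pixel in a well-focused frame receives a much larger weight than a saturated pixel or a pixel from a blurred frame, which is the trust ordering the two criteria above call for.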
Chapter 12
Results
Figure 12.1: First Frame
Figure 12.2: First Frame After Correction
Figure 12.3: Second Frame
Figure 12.4: Second Frame After Correction
Figure 12.5: Third Frame
Figure 12.6: Third Frame After Correction
Figure 12.7: Fourth Frame
Figure 12.8: Fourth Frame After Correction
Figure 12.9: Final Frame
Figure 12.10: Final Frame After Correction
The results for a set of frames are shown above. For the first image in particular,
there is a rather stark improvement in brightness, while still not jumping
immediately to the level of the last images.
Note, however, that even the corrected images increase in brightness along a
reasonable gradient, so there is no sudden peak in brightness, which might
again seem unaesthetic.
To really understand what has transpired here, attention is drawn to the first
pair of original and corrected images. Rather than simply being a case of
increased brightness, there is a substantial increase in information content as
well: the bag and some details of the clothes now appear clearly right from the
first corrected frame.
Hence, information itself has been transplanted from the later frames, where it
is clearly present, to the earlier frames.
Chapter 13
Conclusion and Future Scope
At this stage, the initial goals laid down have been achieved with good results. In
the results section, the improvement in the frames is clear for all to see. Moreover,
the entire process needed very little intervention from the user, and almost no
error due to human perception was introduced.
What has effectively been achieved here may be explained in a different way:
one may think of the video obtained after post-processing as having been taken
directly by an imaginary camera.
The feature of this camera is that it sets a different shutter time, i.e. it allows
a different exposure value, at each and every pixel. This is in stark contrast to a
normal camera, where a single exposure setting applies to the entire frame.
One may think of a scenario where we are panning the camera across a window
looking out onto the bright exterior of the building, but half covered by a curtain
with designs on it. Ordinarily, a camera would simply adjust to the bright light
and keep a short shutter time.
This would leave the curtain details in the dark. Imagine, however, that the camera
pans to reveal those details in later frames. The algorithm applied in this project
would be able to rectify even those frames where the sunlit side of the window
appears exceptionally bright.
Hence, we might even be able to see the window brightly lit on one side, and the
curtain in all its glory on the other.
As this was meant to serve as a proof of concept for a larger, more complicated
system, it has served its purpose. The opportunity to improve upon this work,
however, is great, and would probably involve complete automation of the entire
process, with even the identification of the bad frames done by the
program.
At the cost of complexity and time, the motion detection could also be made
considerably more accurate by considering factors such as rotation, foreshortening
and zoom. Extending the same idea to processing colours will also require that the
time expended by this algorithm be reduced, as it would involve
additional constraints.