Thesis Proposal - Automatic Video Genre Identification
CONTENT-BASED
AUTOMATIC
VIDEO GENRE IDENTIFICATION
By
Faryal Shamsi
TABLE OF CONTENTS
Abstract
Introduction
Problem Statement
Research Objective
Review of Literature
Methodology
References
Content-Based Automatic Video Genre Identification
ABSTRACT
Video genre identification is crucial for efficient indexing and retrieval. The contents of a video are diverse, and this makes genre identification a tricky job. Shot boundary detection and scene detection serve as the major building blocks for analyzing a video through its visual features. Conventional methods of shot boundary detection have a limitation: they rely on a threshold, and a threshold value learned from one video genre may not produce accurate results for another. For instance, shot changes may be very frequent in an animation compared to a talk show. Likewise, the similarity between shots may be very high in a sports program compared to a movie.

This study will propose an alternative approach that classifies a video into five categories (i.e. Movie, Talk-Show, Lecture, Sports or Others). The shots and scenes will contribute the features for genre identification, and they will be detected using K-Means clustering rather than a threshold value.
Key Words:
INTRODUCTION
With the evolution of the internet and social networking websites, content sharing has become a popular trend [1]. The level of facilitation such websites provide to users increases information overload, while organizing the content becomes a challenging and hectic task [2]. The most popular form of content on social media is video [3]. The nature of video content is diverse, as it combines all other types of media such as text, audio and images [4]. Top-ranking social networking sites like Facebook and YouTube allow users to explore billions of videos per day [5]. Proper organization of such videos is therefore necessary to ensure efficient indexing and retrieval.

Genre is defined as a socially agreed category of content [6]. The term content-based automatic video genre identification therefore means recognizing the category of a video on the basis of its contents. The heterogeneous nature of video content makes genre identification a challenging job.
For automatic video classification, various features can be used: text-based, audio-based and visual-based [4]. Shot boundary detection and scene detection are the major building blocks for video content analysis using visual features.
Each frame of a video is a still image; a sequence of frames collectively depicts a shot, and a collection of shots generates a scene. A video is nothing but a collection of such scenes. Videos may contain scenes captured from different cameras, as shown in section (a) of the following diagram.
Each camera generates a different sequence of frames, as shown in section (b). A shot can be defined as a sequence of frames recorded in a contiguous time slice from a single camera [8]. The aim of shot boundary detection is to automatically detect the transition from one shot to another for further content analysis.

A scene is a fragment of a video in which the shots are repetitive, as illustrated in section (c). A scene can also be defined as a continuous action within an event of a video. Scene detection is the task of automatically identifying the boundaries of such fragments.
PROBLEM STATEMENT
In spite of all the progress in multimedia mining and content-based filtering, there is still a need for a system that can automatically understand the contents of videos. A reliable automatic video genre identification system that can categorize any type of video is yet to be proposed.
The existing video indexing and searching mechanisms are fully at the mercy of the information provided by the uploader. On the other hand, an uploader enjoys full autonomy while generating and sharing any type of content. The uploader is not bound to provide the information necessary for indexing, and there is no check and balance to ensure the integrity of the information the uploader does provide. For example, an uploader has complete freedom to give a video any title, no matter how unrelated it is to the actual contents. An uploader might give his or her own name to a movie, or caption a talk show as a movie; in such cases a user may not be able to find these videos if the search string does not match the information available with the video. So, there must be a genre identification system that considers the video's contents rather than the metadata supplied with it.
Shot detection is one of the major building blocks used for video content analysis. Currently, shot detection is performed by comparing the difference between two adjacent frames with a threshold [10]. Consider:

Fi ∈ S if Fi – Fi-1 > t

where F is the set of frames within a video, S is the set of frames representing a shot transition, and t is the threshold value. If the difference between two consecutive frames is found to be greater than the threshold, then the later frame marks a shot transition [10].
The major challenge is to set a threshold value that is effective for all video genres. In a talk show or a lecture, the environment or the position of the actors rarely changes, so the threshold value learned by the system might be very small. In a movie or drama, the actors might be running, walking or dancing, which may call for a higher threshold value to detect a transition between two shots. In such cases, a threshold-based shot detection approach may not produce accurate results for video genre identification, so an alternative approach is needed.
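The threshold rule above can be sketched in a few lines of Java. This is an illustrative sketch only: each frame is reduced to a single summary value (e.g. mean intensity), and the frame values and threshold are synthetic assumptions, not data from the thesis.

```java
// Illustrative sketch of threshold-based shot boundary detection.
// Each frame is assumed to be reduced to one summary value; the
// values and the threshold below are synthetic.
public class ThresholdShotDetector {

    // A frame is flagged as a shot transition when its difference
    // to the previous frame exceeds the threshold t.
    public static boolean isTransition(double prev, double curr, double t) {
        return Math.abs(curr - prev) > t;
    }

    public static void main(String[] args) {
        double[] frames = {0.50, 0.52, 0.51, 0.90, 0.91}; // synthetic frame values
        double t = 0.2;
        for (int i = 1; i < frames.length; i++) {
            if (isTransition(frames[i - 1], frames[i], t)) {
                System.out.println("Shot transition at frame " + i); // flags frame 3 only
            }
        }
    }
}
```

The example also shows the weakness discussed above: a fixed t = 0.2 that works here would miss transitions in a genre whose inter-frame differences are all small.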
RESEARCH OBJECTIVES
This research work, as the title suggests, aims to propose a procedure to automatically identify the genre of a video by analyzing its visual contents. Shot boundary detection and scene detection are used as the dominant features to recognize the contents of a video. Traditional shot boundary and scene detection techniques are often threshold-based and may give imperfect results. This research work therefore also aims to resolve the issues of threshold-based shot detection by applying the well-known machine learning approach of K-Means clustering.
The objectives of this research work are:

1. To propose a unique approach for shot boundary and transition detection, using K-Means Clustering.
2. To classify a video into one of the following five genres:
o Movie
o Talk show
o Sports
o Lecture
o Others
Chapter 1: Introduction –
This chapter will describe the research domain (i.e. video classification) and the key terms used throughout the thesis document. Furthermore, it will address the contribution of this research work.

Chapter 2: Literature Review –
This chapter will discuss the contributions of other researchers in the field, along with the significance of their work to this research and how this work relates to it.

Chapter 3: Methodology –
The data mining models and machine learning techniques used will be presented in this chapter. The implementation and experimental details will also be part of this chapter.

Chapter 4: Results –
The overall findings of the research work will be presented, along with significant measures such as accuracy and precision.

Chapter 5: Conclusion –
This chapter will give insight into the contributions made by the research work, together with its limitations and directions for future work.
LITERATURE REVIEW
For successful video genre identification, recognizing the contents of a video is the core activity. Video comprehension and classification approaches can be divided into four categories: text-based, audio-based, visual-based and hybrid approaches [4].

Text-based approaches use the viewable text within a video to understand its contents. A movie generally contains textual information, such as cast lists and subtitles, which can give some insight into the video contents. Techniques that extract this text and weight it using TF-IDF (Term Frequency – Inverse Document Frequency) have been proposed in the literature [11, 12].

Besides text, the audio content of a video can also be used for video classification and genre identification. Various techniques proposed in the literature use audio information such as dialogue [13], music [14, 15], or even silence and the zero-crossing rate (ZCR) [16].

Among all these features, the visual features are the most dominant within video content. Color [17], shape [19, 20], luminance [18] and motion [21] are some of the commonly used visual features for analyzing video content, as proposed in the literature. Color and shape information can be statistically analyzed by generating color and shape histograms of the images extracted from the video frames [22]. MPEG motion vectors [23, 24] can likewise be used to capture motion information.
The most familiar genre identification models are the HMM (Hidden Markov Model) [25], which combines various features (e.g. color, shape and audio), and the Gaussian Mixture Model [26], whose probability distribution takes the standard form

p(x) = Σk πk N(x | µk, Σk)

where the πk are the mixing weights and N(x | µk, Σk) is a Gaussian density with mean µk and covariance Σk.

Video classification and genre identification can also be performed using hybrid approaches such as [27], which uses a three-step process for genre identification combining multiple feature types.
Shot detection, as proposed by [28], is also used for video classification. Shot detection is performed in two steps: scoring and decision. In the first step, a score is assigned to each frame on the basis of its similarity or dissimilarity with the previous frame. The similarity score may be based on the color, edge and luminance information available in the video content, as proposed by [17, 19]. This similarity check can be performed pixel by pixel, which is very expensive [19], or on the whole frame at once, which may lead to inaccurate results [29, 30, 31]. A moderate approach is therefore to divide the frame into rectangular blocks and compare the two frames block by block [32, 33]. Two popular similarity measures used for shot detection are:

1. Linear form
2. Chi-square
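As a hedged illustration of these two measures, the Java sketch below computes a linear (L1) distance and a chi-square distance between two histograms; the bin values are made up for demonstration, and the exact chi-square variant (here normalised by the bin sum) is an assumption.

```java
// Illustrative histogram similarity measures for shot detection.
public class HistogramDistance {

    // Linear (L1) distance: sum of absolute bin-wise differences.
    public static double linear(double[] h1, double[] h2) {
        double d = 0.0;
        for (int i = 0; i < h1.length; i++) d += Math.abs(h1[i] - h2[i]);
        return d;
    }

    // Chi-square distance: squared bin differences weighted by bin mass,
    // so differences in sparsely populated bins count more.
    public static double chiSquare(double[] h1, double[] h2) {
        double d = 0.0;
        for (int i = 0; i < h1.length; i++) {
            double sum = h1[i] + h2[i];
            if (sum > 0) d += (h1[i] - h2[i]) * (h1[i] - h2[i]) / sum;
        }
        return d;
    }

    public static void main(String[] args) {
        double[] a = {0.2, 0.3, 0.5};   // synthetic normalised histograms
        double[] b = {0.1, 0.4, 0.5};
        System.out.println("linear     = " + linear(a, b));
        System.out.println("chi-square = " + chiSquare(a, b));
    }
}
```

Either measure can supply the per-frame score in the scoring step; the decision step then compares that score against a threshold or, as this proposal argues, a learned cluster boundary.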
METHODOLOGY
In the first step, video decomposition is performed: a Java application using the Xuggler API and the FFMPEG package will decompose the video into frames. In the second step, another Java application using the Paul_image API will generate the color and edge histograms for each image. In the third step, the shots and scenes will be detected using K-Means clustering via the Weka API in Java. The idea is to calculate the differences between the color and edge histograms of consecutive images, partition these difference values into two clusters, and calculate the means as the centroids of the clusters:
µ(color, edge) = (1 / |Ci|) Σ Fj ,   Fj ∈ Ci

where Ci is one of the two clusters and |Ci| is the number of difference values assigned to it.
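As an illustrative sketch of the histogram step, the following uses only the standard java.awt classes rather than the Paul_image API (whose exact calls are not specified here); the choice of 8 bins per channel is an assumption for demonstration, not a value fixed by the proposal.

```java
import java.awt.Color;
import java.awt.image.BufferedImage;

// Coarse RGB colour histogram for one decoded frame (illustrative).
public class ColorHistogram {

    // 8 bins per channel -> a 24-dimensional normalised histogram.
    public static double[] histogram(BufferedImage img) {
        double[] h = new double[24];
        int pixels = img.getWidth() * img.getHeight();
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                Color c = new Color(img.getRGB(x, y));
                h[c.getRed() / 32]++;        // red bins 0..7
                h[8 + c.getGreen() / 32]++;  // green bins 8..15
                h[16 + c.getBlue() / 32]++;  // blue bins 16..23
            }
        }
        for (int i = 0; i < h.length; i++) h[i] /= pixels; // normalise
        return h;
    }

    public static void main(String[] args) {
        // Synthetic 4x4 frame filled with a single reddish colour.
        BufferedImage img = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 4; y++)
            for (int x = 0; x < 4; x++)
                img.setRGB(x, y, new Color(200, 30, 30).getRGB());
        double[] h = histogram(img);
        System.out.println("red bin 6 share = " + h[6]); // 200/32 = bin 6
    }
}
```

The per-frame difference fed to the clustering step is then simply the distance between the histograms of frame i and frame i-1.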
The above algorithm will assign each frame to the cluster whose mean is nearest to the frame's color and edge histogram difference with its previous frame. Each frame is assigned to exactly one cluster, even if it could plausibly belong to both. Automatically, one cluster will hold the small difference values (call this cluster C1) and the other the large difference values (call this cluster C2). If the difference between two frames belongs to C1, the frames are considered part of the same shot; if it belongs to C2, it is considered a shot transition.
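A minimal stand-in for this clustering step, assuming a plain one-dimensional K-Means with k = 2 instead of the planned Weka API call, might look as follows; the difference values are synthetic and the min/max centroid initialisation is an illustrative choice.

```java
import java.util.Arrays;

// 1-D K-Means (k = 2) over frame-difference values: the cluster with
// the larger centroid (C2) marks shot transitions. Illustrative only.
public class KMeansShotDetector {

    // Returns one flag per difference value: true = shot transition.
    public static boolean[] detect(double[] diffs, int iterations) {
        double c1 = Arrays.stream(diffs).min().getAsDouble(); // small-difference centroid
        double c2 = Arrays.stream(diffs).max().getAsDouble(); // large-difference centroid
        boolean[] inC2 = new boolean[diffs.length];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: each value joins its nearest centroid.
            for (int i = 0; i < diffs.length; i++)
                inC2[i] = Math.abs(diffs[i] - c2) < Math.abs(diffs[i] - c1);
            // Update step: each centroid becomes the mean of its members.
            double s1 = 0, s2 = 0;
            int n1 = 0, n2 = 0;
            for (int i = 0; i < diffs.length; i++) {
                if (inC2[i]) { s2 += diffs[i]; n2++; } else { s1 += diffs[i]; n1++; }
            }
            if (n1 > 0) c1 = s1 / n1;
            if (n2 > 0) c2 = s2 / n2;
        }
        return inC2;
    }

    public static void main(String[] args) {
        double[] diffs = {0.02, 0.03, 0.90, 0.01, 0.85, 0.04}; // synthetic
        boolean[] t = detect(diffs, 10);
        System.out.println(Arrays.toString(t)); // transitions at indices 2 and 4
    }
}
```

Note that no threshold appears anywhere: the boundary between "same shot" and "transition" is learned from the video's own difference values, which is the point of the proposed approach.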
In the fourth step, scene detection will be performed, and the video genre will be identified on the basis of the number of shots and scenes. A lecture generally has only one shot, whereas a talk show may have many shots but only one overall scene. A sports video also has only one scene but a considerable amount of motion. A movie or drama can have any number of shots and scenes.
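The genre decision described above can be sketched as a simple rule table. The concrete cut-offs (e.g. what counts as "considerable" motion) are illustrative assumptions, not values fixed by this proposal.

```java
// Rule-based genre decision from shot/scene counts and a motion score
// in [0, 1]. The 0.5 motion cut-off is an assumed, illustrative value.
public class GenreClassifier {

    public static String classify(int shots, int scenes, double motion) {
        if (scenes == 1 && motion >= 0.5) return "Sports";   // one scene, high motion
        if (shots == 1) return "Lecture";                    // a single shot throughout
        if (scenes == 1) return "Talk-Show";                 // many shots, one scene
        if (scenes > 1) return "Movie";                      // many shots and scenes
        return "Others";
    }

    public static void main(String[] args) {
        System.out.println(classify(1, 1, 0.1));    // Lecture
        System.out.println(classify(40, 1, 0.8));   // Sports
        System.out.println(classify(40, 1, 0.1));   // Talk-Show
        System.out.println(classify(300, 60, 0.5)); // Movie
    }
}
```

In the actual system these counts would come from the K-Means shot detection and the subsequent scene detection step, rather than being passed in directly.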
REFERENCES
1. Sitaram Asur, Bernardo A. Huberman. Predicting the Future with Social Media.
Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and
Intelligent Agent Technologies. (2010)
2. Jiliang Tang, Yi Chang, Huan Liu. Mining Social Media with Social Theories: A Survey.
(2014). ACM SIGKDD Explorations Newsletter: Volume 15 Issue 2
4. Darin Brezeale and Diane J. Cook. "Automatic Video Classification: A Survey of the
Literature". IEEE Transactions on Systems, Man, and Cybernetics (2007)
7. Costas Cotsaces, Nikos Nikolaidis, and Ioannis Pitas. Video Shot Detection and
Condensed Representation, A Review. IEEE Signal Processing Magazine. (2006)
11. A. Hauptmann, R. Yan, Y. Qi, R. Jin, M. Christel, M. Derthick, M.Y. Chen, R. Baron,
W.-H. Lin, and T. D. Ng, “Video classification and retrieval with the informedia digital
video library system,” in Text Retrieval Conference (TREC02), 2002
12. T. Tokunaga and M. Iwayama, “Text categorization based on weighted inverse document
frequency,” Department of Computer Science, Tokyo Institute of Technology, Technical
Report 94-TR00001, 1994.
13. Bart Lehane, Noel O’Connor, Noel Murphy. “DIALOGUE SCENE DETECTION IN
MOVIES USING LOW AND MID-LEVEL VISUAL FEATURES” Centre for Digital
Video Processing Dublin City University
14. U. Srinivasan, S. Pfeiffer, S. Nepal, M. Lee, L. Gu, and S. Barrass, “A survey of mpeg-
1 audio, video and semantic analysis techniques,” Multimedia Tools and Applications,
vol. 27, no. 1, pp. 105–141, 2005.
15. G. Lu, “Indexing and retrieval of audio: A survey,” Multimedia Tools Applications, vol.
15, no. 3, pp. 269–290, 2001.
17. Z. Cernekova, C. Kotropoulos, and I. Pitas, “Video shot segmentation using singular
value decomposition,” in Proc. 2003 IEEE Int. Conf. Multimedia and Expo, Baltimore,
Maryland, July 2003, vol. 2, pp. 301–302.
18. R. Zabih, J. Miller, and K. Mai, "A feature-based algorithm for detecting and
classifying production effects," ACM Multimedia Syst., vol. 7, no. 1, pp. 119–128,
Jan. 1999.
19. J. Nam and A. Tewfik, "Detection of gradual transitions in video sequences using B-spline
interpolation," IEEE Trans. Multimedia, vol. 7, no. 4, pp. 667–679, Aug. 2005.
20. R. Zabih, J. Miller, and K. Mai, "A feature-based algorithm for detecting and
classifying production effects," ACM Multimedia Syst., vol. 7, no. 1, pp. 119–128,
Jan. 1999.
21. Bart Lehane, Noel E. O'Connor, and Noel Murphy. "Action Sequence Detection in
Motion Pictures". Centre for Digital Video Processing, Dublin City University.
22. Jordi Mas and Gabriel F. "Shot Boundary Detection on basis of Color Histogram".
Digital Television Center (CeTVD), La Salle School of Engineering, Ramon Llull
University. Barcelona. Spain.
24. H. Wang, A. Divakaran, A. Vetro, S.-F. Chang, and H. Sun, “Survey of compressed-
domain features used in audio-visual indexing and analysis,” Journal of Visual
Communication and Image Representation, vol. 14, no. 2, pp. 150–183, June 2003.
25. R. O. Duda, P. E. Hart, and D. G. Stork, "Pattern Classification", 2nd ed. New York, NY:
John Wiley & Sons, 2001.
26. C. M. Bishop, “Pattern Recognition and Machine Learning”. New York, NY: Springer,
2006.
28. N. Vasconcelos and A. Lippman, “Statistical models of video structure for content
analysis and characterization,” IEEE Transactions on Image Processing, vol. 9, no. 1,
2000.
29. Z. Cernekova, C. Kotropoulos, and I. Pitas, “Video shot segmentation using singular
value decomposition,” in Proc. 2003 IEEE Int. Conf. Multimedia and Expo, Baltimore,
Maryland, July 2003, vol. 2, pp. 301–302.
30. J. Yu and M.D. Srinath, “An efficient method for scene cut detection,” Pattern Recognit.
Lett., vol. 22, no. 13, pp. 1379–1391, Nov.2001.
31. R. Lienhart, “Reliable dissolve detection,” in Proc. SPIE, vol. 4315, pp. 219–230,
Jan.2001.
32. D. Lelescu and D. Schonfeld, “Statistical sequential analysis for real-time video scene
change detection on compressed multimedia bitstream,” IEEE Trans. Multimedia, vol. 5,
no. 1, pp. 106–117, Mar.2003.
33. A. Hanjalic, “Shot-boundary detection: Unraveled and resolved?” IEEE Trans. Circuits
Syst. Video Technol., vol. 12, no. 2, pp. 90–105, Feb.2002.