
Content-Based Automatic Video Genre Identification

THESIS RESEARCH PROPOSAL

CONTENT-BASED
AUTOMATIC
VIDEO GENRE IDENTIFICATION
By

Faryal Shamsi

A research thesis proposal submitted to the Department of Computer Science

Sukkur Institute of Business Administration, Sukkur

in partial fulfillment of the requirements for the degree of

Master of Science (Computer Science)

Thesis Supervisor: Dr. Sher Muhammad Doudpota

Department of Computer Science

Sukkur Institute of Business Administration, Sukkur



TABLE OF CONTENTS

Abstract

Introduction

Problem Statement

Research Objective

Proposed Organization of Thesis

Review of Literature

Methodology

References


ABSTRACT

Video genre identification is crucial for efficient indexing and retrieval. The contents of a video are diverse, which makes genre identification a difficult task. Shot boundary detection and scene detection serve as major building blocks for video content analysis using visual features. Conventional methods of shot boundary detection have limitations because they rely on a threshold: a threshold value learned from one video genre may not produce accurate results for another. For instance, shot changes may be far more frequent in an animation than in a talk show; likewise, the similarity between shots may be much higher in a sports program than in a movie.

This study will propose an alternative approach that classifies a video into five categories (i.e. Movie, Talk Show, Lecture, Sports or Others). Shots and scenes will contribute the features for genre identification, and these shots and scenes will be detected using K-Means clustering rather than a threshold value.

Key Words:

Video Genre Identification – Content-Based Video Processing – Shot Boundary Detection


INTRODUCTION

With the evolution of the Internet and social networking websites, content sharing has become a popular trend [1]. The ease of sharing that such websites provide to users increases information overload, and organizing the content is becoming a challenging and hectic task [2]. The most popular form of content on social media is video [3]. The nature of video content is diverse, as it combines all other types of media such as text, audio and images [4]. Top-ranking social networking sites like Facebook and YouTube allow users to explore billions of videos per day [5]. Proper organization of such videos is therefore necessary to ensure efficient indexing and searching.

VIDEO GENRE IDENTIFICATION

Genre is defined as a socially agreed category of content [6]. The term content-based automatic video genre identification therefore means recognizing the category of a video on the basis of its contents. The heterogeneous nature of video content makes genre identification a challenging job.

For automatic video classification, various features can be used, such as text-based, audio-based and visual-based features [4]. Shot boundary detection and scene detection are the major building blocks of video classification using visual features.


SHOT BOUNDARY DETECTION & SCENE DETECTION

A frame is a single still image; a sequence of frames collectively depicts a shot, and a collection of shots generates a scene. A video is nothing but a collection of such scenes. A video may contain scenes captured from different cameras, as shown in section (a) of the following diagram.


Each camera generates a different sequence of frames, as shown in section (b). A shot can be defined as a sequence of frames recorded in a contiguous time slice from a single camera [8]. The aim of shot boundary detection is to automatically detect the transition from one shot to another for temporal analysis and segmentation of a video [9].

A scene is a fragment of a video in which the shots are repetitive, as illustrated in section (c). A scene can also be defined as a continuous action within an event of a video. Scene detection is the process of automatically detecting this repetitive pattern within a video [7].


PROBLEM STATEMENT

In spite of all the progress in the fields of multimedia mining and content-based filtering, there is still a need for a system which can automatically understand the contents of videos. A reliable automatic video genre identification system which can categorize any type of video is yet to be proposed.

The existing video indexing and searching mechanisms are fully at the mercy of the information provided by the uploader. On the other hand, an uploader enjoys full autonomy while generating and sharing any type of content. The uploader is not bound to provide the information about the content that indexing would require, and there is no check to ensure the integrity of the information the uploader does provide. For example, an uploader has complete freedom to give the video any title, no matter how irrelevant it is to the actual contents of the video. An uploader might give his or her own name to a movie, or caption a talk show as a movie; in such cases a user may not be able to find these videos if the search string does not match the information attached to the video. There must therefore be a video genre identification system which considers the contents of a video rather than the textual information provided by the uploader.


Shot detection is one of the major building blocks used for video content analysis. Currently, shot detection is performed by comparing the difference between two adjacent frames against a threshold [10]. Consider,

Fi ∈ S if d(Fi, Fi−1) > t

where F is the set of frames within a video, S is the set of frames representing a shot transition, d(Fi, Fi−1) is the difference between frame Fi and its predecessor, and t is the threshold value. If the difference between two consecutive frames is greater than the threshold value, then the later frame marks a shot transition [10].

The major challenge here is to set an appropriate threshold value that is effective for all video genres. In a talk show or a lecture, the environment and the positions of the actors rarely change, so the threshold learned by the system might be very small. In a movie or drama, the actor might be running, walking or dancing, which may require a higher threshold to detect a transition between two shots. In such cases, a threshold-based shot detection approach for video genre identification may not produce accurate results, so an alternative approach is needed.
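The threshold-based scheme above can be sketched in a few lines of Java. This is a minimal illustration, not the proposal's implementation: the frames are assumed to be already reduced to feature vectors (e.g. color histograms), and the difference measure, method names and threshold value are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of threshold-based shot boundary detection, assuming each
// frame is already reduced to a feature vector (e.g. a color histogram).
public class ThresholdShotDetector {

    // Sum of absolute differences between two histograms of equal length.
    static double difference(double[] prev, double[] curr) {
        double d = 0.0;
        for (int i = 0; i < prev.length; i++) {
            d += Math.abs(curr[i] - prev[i]);
        }
        return d;
    }

    // Returns the indices of frames where a shot transition is declared,
    // i.e. where the difference to the previous frame exceeds threshold t.
    static List<Integer> detectTransitions(double[][] frames, double t) {
        List<Integer> transitions = new ArrayList<>();
        for (int i = 1; i < frames.length; i++) {
            if (difference(frames[i - 1], frames[i]) > t) {
                transitions.add(i);
            }
        }
        return transitions;
    }

    public static void main(String[] args) {
        // Four toy "histograms": frames 0-1 are similar, frame 2 changes abruptly.
        double[][] frames = {
            {0.9, 0.1, 0.0},
            {0.8, 0.2, 0.0},
            {0.1, 0.1, 0.8},  // abrupt change -> transition at index 2
            {0.1, 0.2, 0.7}
        };
        System.out.println(detectTransitions(frames, 0.5)); // prints [2]
    }
}
```

Note that the whole result hinges on the fixed value of t, which is exactly the genre-sensitivity problem this proposal aims to remove.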


RESEARCH OBJECTIVES

This research work, as the title suggests, aims to propose a procedure to automatically identify the genre of a video by analyzing its visual contents. Shot boundary detection and scene detection are used as dominant features to recognize the contents of a video. Traditional shot boundary and scene detection techniques are often threshold-based and may give imperfect results. This research work therefore also aims to resolve the issues of threshold-based shot detection by applying the well-known machine learning approach of K-Means clustering to shot detection.

The major research objectives of this thesis are:

• To propose a unique approach for shot boundary and transition detection, using K-Means clustering.

• To classify a video into one of five genres:

o Movie

o Talk Show

o Sports

o Lecture

o Others


Proposed Organization of the Thesis

The thesis will be organized into the following chapters:

Chapter 1: Introduction –

This chapter will describe the research domain (i.e. video classification) and the key terms used throughout the thesis document. Furthermore, it will address the contribution of the research and the problems with contemporary systems.

Chapter 2: Literature Review –

This chapter will discuss the contributions of other researchers in the field, the significance of their work for this research, and how this work differs from or improves upon the prior studies.

Chapter 3: Methodology –

The data mining models and machine learning techniques used will be presented in this chapter, along with the implementation and experimental details.

Chapter 4: Results –

The overall findings of the research work, along with evaluation measures such as accuracy, precision and recall, will be reported in this chapter.

Chapter 5: Conclusion –

This chapter will give insight into the contributions made by the research work, along with its limitations. Recommended future work to extend the study will also be elucidated.


LITERATURE REVIEW

For successful video genre identification, recognizing the contents of a video is the core activity. Video comprehension and classification approaches can be divided into four categories: text-based, audio-based, visual-based and hybrid approaches [4].

Text-based approaches use the viewable text within a video to understand its contents. A movie generally carries textual information, such as cast lists and subtitles, which can give some insight into the video contents. Techniques that extract this text and apply TF-IDF (Term Frequency – Inverse Document Frequency) weighting and OCR are discussed in [11, 12].

Beyond text, the audio content of a video can also be used for video classification and genre identification. Various techniques proposed in the literature use audio information such as dialogue [13], music [14, 15], or even silence, via the zero-crossing rate (ZCR) [16], for video content analysis.

Among all these features, visual features are the most dominant within video content. Color [17], shape [19, 20], luminance [18] and motion [21] are some of the commonly used visual features for analyzing the contents of a video, as proposed in the literature. Color and shape information can be statistically analyzed by generating color and shape histograms of the images extracted from the frames of a video [22]. MPEG motion vectors [23, 24] are used as features to examine the motion information available in a video.


The most familiar genre identification models are the HMM (Hidden Markov Model) [25], which combines various features (e.g. color, shape and audio), and the Gaussian Mixture Model [26], which models the probability distribution of the features as a weighted sum of Gaussian components:

p(x) = ∑ᵏ πk N(x | µk, Σk)

where πk are the mixture weights and N(x | µk, Σk) is a Gaussian density with mean µk and covariance Σk.

Video classification and genre identification can also be performed using hybrid approaches such as [27], which uses a three-step process for genre identification and analyzes text, audio and visual features at each step.

Shot detection, as proposed by [28], is another approach used for video classification. Shot detection is performed in two steps: scoring and decision. In the first step, a score is assigned to each frame on the basis of its similarity or dissimilarity to the previous frame. The similarity score may be computed from the color, edge and luminance information available in the video contents, as proposed by [17, 19]. This similarity check can be performed pixel by pixel, which is very expensive [19], or over the whole frame at once, which may lead to inaccurate results [29, 30, 31]. A moderate approach is therefore to divide each frame into rectangular blocks and compare the two frames block by block [32, 33]. Two popular similarity measures used for shot detection are:

1. Linear form

2. Chi-square
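The two similarity measures named above can be sketched as follows. This is an illustrative Java fragment, not taken from any of the cited works; the histograms are assumed to be normalized, and bins where both histograms are empty are skipped in the chi-square case to avoid division by zero.

```java
public class HistogramSimilarity {
    // Chi-square distance between two normalized histograms; smaller means
    // more similar. Bins where both histograms are empty are skipped.
    static double chiSquare(double[] h1, double[] h2) {
        double d = 0.0;
        for (int i = 0; i < h1.length; i++) {
            double sum = h1[i] + h2[i];
            if (sum > 0) {
                double diff = h1[i] - h2[i];
                d += (diff * diff) / sum;
            }
        }
        return d;
    }

    // Linear (sum of absolute differences) distance, for comparison.
    static double linear(double[] h1, double[] h2) {
        double d = 0.0;
        for (int i = 0; i < h1.length; i++) {
            d += Math.abs(h1[i] - h2[i]);
        }
        return d;
    }

    public static void main(String[] args) {
        double[] a = {0.5, 0.3, 0.2};
        double[] b = {0.4, 0.4, 0.2};
        System.out.printf("chi-square: %.4f, linear: %.4f%n",
                chiSquare(a, b), linear(a, b));
    }
}
```

In a block-by-block scheme, either measure would be applied per block and the per-block scores aggregated into the frame's overall similarity score.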


METHODOLOGY

Video genre identification will be performed in the following steps:

Decomposition of Video into Frames → Extraction of Color & Edge Histograms → Shot & Scene Detection Using K-Means → Scoring (Number of Scenes Extracted) → Video Genre Identification

In the first step, video decomposition is performed: a Java application using the Xuggler API and the FFMPEG package will decompose the video into frames. In the second step, another Java application, using the Paul_image API, will generate the color and edge histograms for each image.
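As a simplified stand-in for the histogram-extraction step (the actual pipeline would use the image-processing API named above and would also compute an edge histogram), the sketch below builds a normalized gray-level histogram from a frame using only the Java standard library; the bin count and gray conversion are illustrative choices.

```java
import java.awt.image.BufferedImage;

public class ColorHistogram {
    // Builds a normalized gray-level histogram from a frame image.
    // A fuller pipeline would use per-channel RGB histograms plus an
    // edge histogram; one gray channel is kept here for brevity.
    static double[] grayHistogram(BufferedImage img, int bins) {
        double[] hist = new double[bins];
        int w = img.getWidth(), h = img.getHeight();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                int gray = (r + g + b) / 3;       // simple average gray value
                hist[gray * bins / 256]++;        // map 0..255 to a bin
            }
        }
        double total = (double) w * h;
        for (int i = 0; i < bins; i++) hist[i] /= total; // normalize to sum 1
        return hist;
    }

    public static void main(String[] args) {
        // Toy 2x2 frame: two black pixels and two white pixels.
        BufferedImage img = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        img.setRGB(0, 1, 0xFFFFFF);
        img.setRGB(1, 1, 0xFFFFFF);
        double[] hist = grayHistogram(img, 4);
        // Half the pixels fall in the darkest bin, half in the brightest.
        System.out.println(hist[0] + " " + hist[3]); // prints 0.5 0.5
    }
}
```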

In the third step, the shots and scenes will be detected using K-Means clustering via the Weka API in Java. The idea is to calculate the differences between the color histograms and edge histograms of consecutive frames, partition these difference values into two clusters, and compute the mean of each cluster as its centroid:

µi = (1 / |Ci|) ∑ Fj ,  Fj ∈ Ci

Each difference value Fj is then assigned to the cluster with the nearest centroid:

Ci = argmin ( ||Fj − µc1||² , ||Fj − µc2||² )


The above algorithm will assign each frame to the cluster whose mean is nearest to the frame's color and edge histogram difference with its previous frame, so each frame belongs to exactly one cluster. One cluster will automatically contain the small difference values (call it C1) and the other the large difference values (call it C2). If the difference between two frames belongs to C1, the frames are considered part of the same shot; if it belongs to C2, it is considered a shot transition.
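The two-cluster idea above can be sketched in plain Java. This is an illustrative one-dimensional K-Means written without the Weka API so the example is self-contained; the input difference values, the min/max centroid seeding, and the fixed iteration count are all assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed two-cluster K-Means over frame-difference values.
// Differences landing in the high-mean cluster C2 are treated as shot
// transitions; values in the low-mean cluster C1 belong to the same shot.
public class KMeansShotDetector {

    // Runs 1-D K-Means with k = 2 and returns the indices (into diffs)
    // assigned to the cluster with the larger centroid, i.e. transitions.
    static List<Integer> transitions(double[] diffs, int iterations) {
        double mu1 = diffs[0], mu2 = diffs[0];
        // Seed the centroids with the min and max difference values.
        for (double d : diffs) { mu1 = Math.min(mu1, d); mu2 = Math.max(mu2, d); }
        int[] assign = new int[diffs.length];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: each value goes to the nearest centroid.
            for (int i = 0; i < diffs.length; i++) {
                assign[i] = Math.abs(diffs[i] - mu1) <= Math.abs(diffs[i] - mu2) ? 1 : 2;
            }
            // Update step: recompute each centroid as its cluster mean.
            double s1 = 0, s2 = 0; int n1 = 0, n2 = 0;
            for (int i = 0; i < diffs.length; i++) {
                if (assign[i] == 1) { s1 += diffs[i]; n1++; }
                else { s2 += diffs[i]; n2++; }
            }
            if (n1 > 0) mu1 = s1 / n1;
            if (n2 > 0) mu2 = s2 / n2;
        }
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < diffs.length; i++) {
            if (assign[i] == 2) result.add(i); // high-difference cluster C2
        }
        return result;
    }

    public static void main(String[] args) {
        // Small within-shot differences, with spikes at indices 3 and 7.
        double[] diffs = {0.1, 0.2, 0.1, 1.8, 0.2, 0.1, 0.2, 2.1, 0.1};
        System.out.println(transitions(diffs, 10)); // prints [3, 7]
    }
}
```

Because the split between C1 and C2 is learned from each video's own difference values, no genre-specific threshold needs to be fixed in advance.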

In the fourth step, scene detection will be performed, and the video genre will be identified on the basis of the number of scenes. A lecture generally has only one shot, whereas a talk show may have many shots but overall only one scene. A sports video also has only one scene but a considerable amount of motion. A movie or drama can have any number of shots and scenes.


REFERENCES
1. Sitaram Asur, Bernardo A. Huberman. Predicting the Future with Social Media. Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology. (2010)

2. Jiliang Tang, Yi Chang, Huan Liu. Mining Social Media with Social Theories: A Survey.
(2014). ACM SIGKDD Explorations Newsletter: Volume 15 Issue 2

3. DH Park, IY Choi, HK Kim, JK Kim. A Review and Classification of Recommender Systems Research. 2011 International Conference on Social Science and Humanity, IPEDR vol. IACSIT Press, Singapore

4. Darin Brezeale and Diane J. Cook, Senior Member, IEEE. “Automatic Video
Classification: A Survey of the Literature”. IEEE Transactions On Systems, Man, And
Cybernetics (2007)

5. The Statistical Portal – https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/

6. Genre – Wikipedia. https://en.wikipedia.org/wiki/Genre

7. Costas Cotsaces, Nikos Nikolaidis, and Ioannis Pitas. Video Shot Detection and
Condensed Representation, A Review. IEEE Signal Processing Magazine. (2006)

8. Shot transition detection – Wikipedia. https://en.wikipedia.org/wiki/Shot_transition_detection

9. P. Balasubramaniam; R Uthayakumar. Mathematical Modelling and Scientific


Computation International Conference, ICMMSC 2012, Gandhigram, Tamil Nadu,
India, (March 16-18, 2012). Springer. pp. 421–. ISBN 978-3-642-28926-2.


10. Rainer Lienhart. Comparison of Automatic Shot Boundary Detection Algorithms. Microcomputer Research Labs, Intel Corporation, Santa Clara, CA 95052-8819.

11. A. Hauptmann, R. Yan, Y. Qi, R. Jin, M. Christel, M. Derthick, M.Y. Chen, R. Baron,
W.-H. Lin, and T. D. Ng, “Video classification and retrieval with the informedia digital
video library system,” in Text Retrieval Conference (TREC02), 2002

12. T. Tokunaga and M. Iwayama, “Text categorization based on weighted inverse document
frequency,” Department of Computer Science, Tokyo Institute of Technology, Technical
Report 94-TR00001, 1994.

13. Bart Lehane, Noel O’Connor, Noel Murphy. “DIALOGUE SCENE DETECTION IN
MOVIES USING LOW AND MID-LEVEL VISUAL FEATURES” Centre for Digital
Video Processing Dublin City University

14. U. Srinivasan, S. Pfeiffer, S. Nepal, M. Lee, L. Gu, and S. Barrass, “A survey of mpeg-
1 audio, video and semantic analysis techniques,” Multimedia Tools and Applications,
vol. 27, no. 1, pp. 105–141, 2005.

15. G. Lu, “Indexing and retrieval of audio: A survey,” Multimedia Tools Applications, vol.
15, no. 3, pp. 269–290, 2001.

16. SM Doudpota, Sumanta Guha. "Mining Movies to Extract Song Sequences". MDMKDD'11: Proceedings of the 11th International Workshop on Multimedia Data Mining. (2011)

17. Z. Cernekova, C. Kotropoulos, and I. Pitas, “Video shot segmentation using singular
value decomposition,” in Proc. 2003 IEEE Int. Conf. Multimedia and Expo, Baltimore,
Maryland, July 2003, vol. 2, pp. 301–302.

18. R. Zabih, J. Miller, and K. Mai, “A feature-based algorithm for detecting and
classification production effects,” ACM Multimedia Syst., vol. 7, no. 1, pp. 119–128,
Jan.1999.


19. J. Nam and A. Tewfik, "Detection of gradual transitions in video sequences using B-spline interpolation," IEEE Trans. Multimedia, vol. 7, no. 4, pp. 667–679, Aug. 2005.

20. R. Zabih, J. Miller, and K. Mai, “A feature-based algorithm for detecting and
classification production effects,” ACM Multimedia Syst., vol. 7, no. 1, pp. 119–128,
Jan.1999

21. Bart Lehane, Noel E. O’connor, And Noel Murphy . “ACTION SEQUENCE
DETECTION IN MOTION PICTURES”. Centre for Digital Video Processing Dublin
City University

22. Jordi Mas and Gabriel F. "Shot Boundary Detection on the basis of Color Histogram". Digital Television Center (CeTVD), La Salle School of Engineering, Ramon Llull University, Barcelona, Spain.

23. V. Kobla, D. S. Doermann, K.-I. Lin, and C. Faloutsos, "Compressed-domain video indexing techniques using DCT and motion vector information in MPEG video," in Storage and Retrieval for Image and Video Databases (SPIE), 1997, pp. 200–211.

24. H. Wang, A. Divakaran, A. Vetro, S.-F. Chang, and H. Sun, “Survey of compressed-
domain features used in audio-visual indexing and analysis,” Journal of Visual
Communication and Image Representation, vol. 14, no. 2, pp. 150–183, June 2003.

25. R. O. Duda, P. E. Hart, and D. G. Stork, "Pattern Classification", 2nd ed. New York, NY: John Wiley & Sons, 2001.

26. C. M. Bishop, “Pattern Recognition and Machine Learning”. New York, NY: Springer,
2006.

27. S. Fischer, R. Lienhart, and W. Effelsberg, “Automatic recognition of film genres,” in


MULTIMEDIA ’95: Proceedings of the third ACM international conference on
Multimedia, 1995.


28. N. Vasconcelos and A. Lippman, “Statistical models of video structure for content
analysis and characterization,” IEEE Transactions on Image Processing, vol. 9, no. 1,
2000.

29. Z. Cernekova, C. Kotropoulos, and I. Pitas, “Video shot segmentation using singular
value decomposition,” in Proc. 2003 IEEE Int. Conf. Multimedia and Expo, Baltimore,
Maryland, July 2003, vol. 2, pp. 301–302.

30. J. Yu and M.D. Srinath, “An efficient method for scene cut detection,” Pattern Recognit.
Lett., vol. 22, no. 13, pp. 1379–1391, Nov.2001.

31. R. Lienhart, “Reliable dissolve detection,” in Proc. SPIE, vol. 4315, pp. 219–230,
Jan.2001.

32. D. Lelescu and D. Schonfeld, “Statistical sequential analysis for real-time video scene
change detection on compressed multimedia bitstream,” IEEE Trans. Multimedia, vol. 5,
no. 1, pp. 106–117, Mar.2003.

33. A. Hanjalic, “Shot-boundary detection: Unraveled and resolved?” IEEE Trans. Circuits
Syst. Video Technol., vol. 12, no. 2, pp. 90–105, Feb.2002.

